         © 2018 JETIR July 2018, Volume 5, Issue 7                                            www.jetir.org  (ISSN-2349-5162) 
                 Classification of Sentences for Paraphrasing in 
                                                  Punjabi Language 
          
          
                           Ravinder Mohan Jindal                                          Vijay Rana 
                              Research Scholar                                        Assistant Professor 
                      Sant Baba Bhag Singh University                  Department of Computer Science and Applications 
                                 Jalandhar                                     Sant Baba Bhag Singh University 
                                                                                          Jalandhar 
          
         ABSTRACT 
          
         In this research article, the authors develop an algorithm to classify Punjabi sentences into simple, compound and complex
         sentences. The classification is intended to assist in generating paraphrases of Punjabi sentences. Sentences are classified
         on the basis of sentence length and morphological features such as the presence of a non-finite verb or of specific
         postpositions after the root form of a verb. Applying the proposed algorithm, the authors obtained precision of 100%, recall of
         99.4% and F-measure of 99.69% for simple sentences; precision of 100%, recall of 99.15% and F-measure of 99.57% for compound
         sentences; and precision of 100%, recall of 99.99% and F-measure of 99.94% for complex sentences.
          
         Keywords- Paraphrasing, sentence simplification, Punjabi sentences. 
          
         INTRODUCTION 
          
         Research in language processing is growing rapidly. Most automatic language processing systems have been developed for
         English, and comparatively little work has been done for Indian languages. One such task is to convert an existing sentence
         into a different form while keeping its meaning the same, which is helpful for converting a complex sentence into a simpler
         one. In Natural Language Processing, the technical term for this task is paraphrasing. Paraphrasing plays an important role in
         day-to-day life: when we read a newspaper, check email or follow instructions, we interact with text, and it is important to
         understand that text. If some of it is complex, it must be simplified in order to be understood. Paraphrasing is a technique
         for modifying natural language sentences so that their complexity is reduced and their readability and understandability are
         improved. The goal of paraphrasing is to reduce the syntactic complexity of a long sentence so as to help in the development
         and improvement of natural language processing tools. Paraphrasing can help in developing and improving applications such as
         text summarization, machine translation, grammar checking and assistive technology. In all of these, sentence simplification
         is used as a pre-processor.
          
         PUNJABI SENTENCES 
          
         Like those of other languages, sentences in Punjabi fall into four categories: simple, compound, complex and
         compound-complex. Each category has its own characteristics that help in classification. In this research the authors use two
         main characteristics of these sentences: the length of the sentence and the morphological features of conjunctions (for
         identification of complex sentences). The main features of simple, compound and complex sentences used for classification
         are listed in Table 1.
          
                                                                       
          JETIR1807185  Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org                      196 
          
                                        Table 1: Features of simple, compound and complex sentences

           Sentence type       Length of sentence            Presence of dependent clause   Number of verb phrases      Number of clauses
           Simple sentence     Less than or equal to seven   No dependent clause            One verb phrase             One
           Compound sentence   More than seven               No dependent clause            At least two verb phrases   At least two
           Complex sentence    More than seven               Contains dependent clause      At least two verb phrases   At least two
          
          
         EXISTING WORK 
          
         Various techniques have been proposed by various authors for different languages. Bautista, S. et al. (2017) [1] explained a
         technique to process numerical information present in large complex sentences. Lee, J. et al. (2017) [2] applied parsing
         technology for syntactic simplification of English sentences. Narayan, S. et al. (2017) [3] proposed a split-and-rephrase
         technique for simplification of complex sentences in English. Bingel, J. et al. (2016) [4] presented a Conditional Random
         Field model over dependency structures for text simplification and paraphrasing. Sethi, N. et al. (2016) [5] discussed an
         approach for reframing Hindi sentences to generate paraphrases. Narayan, S. et al. (2015) [6] presented an unsupervised
         technique for sentence simplification based on deep semantics. Saini, S. et al. (2015) [7] proposed a relative-clause-based
         sentence simplification method to facilitate an English-Hindi machine translation system. Cental, I. et al. (2014) [8]
         discussed a corpus-based approach to syntactically simplify complex French text. Štajner, S. et al. (2013) [9] explained the
         process of automatic simplification of complex texts in Spanish. Collados, J. C. (2013) [10] used a sentence simplification
         approach to create a simple Spanish corpus, applying syntactic simplification split rules, coordination and subordination to
         split long sentences. Wubben, S. et al. (2010) [11] presented a technique for generating paraphrases using a monolingual
         corpus. Petersen, S. E. et al. (2007) [12] performed a detailed analysis of a corpus of news articles and abridged versions
         written by a literacy organization. Bannard, C. et al. (2005) [13] proposed a method for paraphrasing using bilingual parallel
         corpora. Inui, K. et al. (2003) [14] described an ongoing research project on text simplification for Japanese. Knight, K.
         et al. (2002) [15] presented corpus-based sentence compression algorithms using noisy-channel and decision-tree approaches.
         A rule-based sentence simplification approach was proposed by Naushad UzZaman et al. (2011) [21]; a method for identifying
         complex predicates in Hindi was developed by Ankit Soni et al. (2005) [22]; and a clause boundary identification system for
         Urdu was developed by Daraksha Parveen et al. (2009) [23].
           
         PROPOSED MODEL 
         As shown in Figure 1, the input sentence is first checked for its length. The length of a sentence is simply the number of
         words in it, excluding the sentence ender. The authors analyzed 500 compound and complex sentences and observed that more
         than 97% of the compound sentences are longer than seven words. Hence sentence length is used as the first criterion for
         classification. If the length of the sentence is less than or equal to seven, the number of verb phrases is checked; if the
         sentence contains only one verb phrase, it is a simple sentence. If, on the other hand, the length of the sentence is greater
         than seven, the sentence is a candidate compound or complex sentence and is checked for the presence of a dependent clause.
         According to B. S. Cheema [24], there are four basic types of dependent clause in Punjabi: relative clauses, KI clauses,
         adverb clauses and non-finite clauses. Each type has specific morphological features: relative clauses start with the
         ਜਜ and ਜਜਜਜਜ conjunctions, KI clause starts with ਜਜ subordinate conjunction, adverb clauses starts with ਜਜ, ਜਜਜਜ, ਜਜਜਜਜ, 
         ਜਜਜਜਜ, ਜਜਜਜਜ, ਜਜ, ਜਜਜ ਜਜਜਜਜਜ etc. and non-finite clause contains non-finite verbs like ਜਜਜਜਜਜਜ, ਜਜਜਜਜ, ਜਜਜਜਜਜ i.e. 
         contains ਜਜ ਜਜਜਜ and ਜ as postfix with verb.    
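The length measure described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes whitespace tokenization and assumes that the sentence ender is one of the danda "।", "?" or "!". The function name `sentence_length` and the constant `SENTENCE_ENDERS` are hypothetical.

```python
# Assumption: these characters act as sentence enders; the paper does not list them.
SENTENCE_ENDERS = "।?!"


def sentence_length(sentence: str) -> int:
    """Count whitespace-separated words, ignoring the sentence ender."""
    tokens = sentence.split()
    # Strip ender characters from both ends of each token; a token that
    # consisted only of an ender becomes empty and is not counted.
    words = [token.strip(SENTENCE_ENDERS) for token in tokens]
    return sum(1 for word in words if word)


if __name__ == "__main__":
    # "I go to school." with the danda written as a separate token.
    print(sentence_length("ਮੈਂ ਸਕੂਲ ਜਾਂਦਾ ਹਾਂ ।"))
```

The ender is stripped per token so that the count is the same whether or not the danda is separated from the last word by a space.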
         Algorithm used:
         Step 1: Enter the Punjabi corpus.
         Step 2: For each sentence in the corpus, calculate its length.
                    If the length of the sentence is less than or equal to 7, go to Step 3; otherwise go to Step 4.
         Step 3: Check the number of verb phrases present in the sentence.
                   If there is only one verb phrase, it is a simple sentence; otherwise go to Step 4.
         Step 4: Check for the presence of a dependent clause using the dependent clause features.
                    If a dependent clause is present, it is a complex sentence; otherwise go to Step 5.
         Step 5: Check the number of verb phrases present in the sentence.
                   If there is only one verb phrase, it is a simple sentence; otherwise it is a compound sentence.
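The decision steps above can be sketched as a small Python function. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes that the sentence length, the number of verb phrases and the dependent-clause check have already been computed by upstream components that are not shown here. The names `SentenceFeatures`, `classify` and `LENGTH_THRESHOLD` are hypothetical.

```python
from dataclasses import dataclass

LENGTH_THRESHOLD = 7  # threshold used in Step 2


@dataclass
class SentenceFeatures:
    length: int                 # word count excluding the sentence ender
    verb_phrases: int           # assumed to come from an upstream analyzer
    has_dependent_clause: bool  # assumed result of the dependent-clause feature check


def classify(features: SentenceFeatures) -> str:
    """Follow Steps 2 to 5 and return 'simple', 'compound' or 'complex'."""
    # Steps 2 and 3: a short sentence with exactly one verb phrase is simple.
    if features.length <= LENGTH_THRESHOLD and features.verb_phrases == 1:
        return "simple"
    # Step 4: any sentence that reaches this point and has a dependent clause is complex.
    if features.has_dependent_clause:
        return "complex"
    # Step 5: with no dependent clause, one verb phrase means simple, otherwise compound.
    return "simple" if features.verb_phrases == 1 else "compound"


if __name__ == "__main__":
    print(classify(SentenceFeatures(length=10, verb_phrases=2, has_dependent_clause=False)))
```

Note that a short sentence with more than one verb phrase is not rejected at Step 3; it falls through to the dependent-clause check in Step 4, exactly as a long sentence does.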
         [Figure 1: Proposed flowchart of sentence classification. Nodes: Input Punjabi Sentence; Calculate Length of Sentence;
         Length < 7; Sentence contains one verb phrase; Check for presence of dependent clause; Contains dependent clause; Contains
         more than one verb phrase. Outcomes: Simple Sentence, Compound Sentence, Complex Sentence.]
          
          
          
          
            CHALLENGES 
              
            Various challenges in developing the sentence classification system are:
                       • Punjabi phrase structure is recursive in nature, so it is difficult to identify dependent clause boundaries
                         [18][19][20].
                       • There are various types of complex sentence in Punjabi [16][17][24], and they vary in structure; hence a
                         separate morphological feature has to be used to identify each type of complex sentence.
                         
            RESULTS AND DISCUSSION 
            As discussed above, the two main parameters used for classification of Punjabi sentences are length and morphological
            features. To test the proposed algorithm, 15000 sentences were collected from online sources. Of these, 2000 simple
            sentences and 1500 each of compound and complex sentences were used to check the effect of length on classification. The
            results obtained are shown in Table 2. The classification results obtained after applying the complete proposed algorithm
            are shown in Tables 3.a, 3.b, 3.c and 3.d.
             
                                            Table 2: Results obtained by classifying the sentences on the basis of their length

              Type of sentence   Total number of   Number of sentences   %age of sentences   Number of sentences   %age of sentences
                                 sentences         having length <7      having length <7    having length >7      having length >7
              Simple             2000              1991                  99.55               9                     0.45
              Compound           1500              0                     0                   1500                  100
              Complex            1500              5                     0.4                 1495                  99.6
             
                                              Table 3.a: Results obtained by applying the proposed algorithm on three datasets

              Test set   Number of    Number of      Number of      Number of      Correctly         Correctly         Correctly
              number     sentences    simple         compound       complex        classified        classified        classified
                         in the set   sentences in   sentences in   sentences in   simple            compound          complex
                                      the corpus     the corpus     the corpus     sentences by      sentences by      sentences by
                                                                                   proposed system   proposed system   proposed system
              1          5000         904            2890           1206           899               2855              1205
              2          5000         658            1700           2642           655               1689              2642
              3          5000         552            1890           2558           548               1881              2555
              Total      15000        2114           6480           6406           2102              6425              6402
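As a worked illustration of how the totals in Table 3.a relate to recall and F-measure, the following Python sketch computes both per class. It is a sketch under stated assumptions: recall is taken as correctly classified sentences divided by the sentences of that class in the corpus, the F-measure is the standard harmonic mean 2PR/(P+R), and the default precision of 1.0 is the 100% figure stated in the abstract rather than something derived from this table. The names `TABLE_3A_TOTALS`, `recall`, `f_measure` and `summarize` are hypothetical.

```python
# "Total" row of Table 3.a: class -> (sentences in corpus, correctly classified)
TABLE_3A_TOTALS = {
    "simple": (2114, 2102),
    "compound": (6480, 6425),
    "complex": (6406, 6402),
}


def recall(total: int, correct: int) -> float:
    """Fraction of the sentences of a class that were classified correctly."""
    return correct / total


def f_measure(precision: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)


def summarize(precision: float = 1.0) -> dict:
    """Return class -> (recall %, F-measure %), rounded to two decimals.

    Assumption: the same precision applies to every class (100% per the abstract).
    """
    summary = {}
    for label, (total, correct) in TABLE_3A_TOTALS.items():
        r = recall(total, correct)
        summary[label] = (round(100 * r, 2), round(100 * f_measure(precision, r), 2))
    return summary


if __name__ == "__main__":
    for label, (recall_pct, f_pct) in summarize().items():
        print(label, recall_pct, f_pct)
```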
                                                                                                  
                                                                                                  
                                                                                                  
                                                                                                  