Malayalam Verb Frames

Jisha P Jayan, Asha S Nair, Govindaru V
Centre for Development of Imaging Technology, Thiruvananthapuram
jishapjayan@gmail.com, ashanaircdit@gmail.com, neithalloor@gmail.com

S Bandyopadhyay, D S Sharma and R Sangal. Proc. of the 14th Intl. Conference on Natural Language Processing, pages 236–244, Kolkata, India. December 2017. © 2016 NLP Association of India (NLPAI)

Abstract

Verbs play a major role in describing the meaning of a sentence. A verb frame captures the syntactic distribution of a verb's occurrences in a sentence. This paper tests whether the verb frame approach developed for Hindi is applicable to Malayalam. Around 255 verbs were selected for this study, showing the basic argument structure of words with these verbs.

Keywords: verbframe; karaka relations; semantic; syntactic

1 Introduction

Verbs are the most important grammatical category in any language. They denote actions, activities and states. The arguments of a verb indicate the participants that the verb requires. Because verbs play a central part in interpreting the meaning of a sentence, studying the argument structure of verbs and their syntactic behavior provides the knowledge base needed for intelligent NLP applications. A verb frame gathers the syntactic distribution of a verb's occurrences in a sentence. The Paninian Grammatical Framework (PGF) is followed in creating the verb frames, since the verb plays the most important role in sentence analysis.

The relation of a verb to the other units of a sentence may be encoded in various ways. Among them, word order and the presence of case markers on the arguments are the ones most often used by computational linguists. There are, however, languages in which the marking can be present on the verb itself rather than on its arguments (Butt, 2010). Such relations frequently reflect the semantics of a verb; that is, the syntactic behavior of a verb provides good support for understanding its semantics. Researchers also encode other information such as tense, aspect, modality, gender, number and person with the verb, which allows language-specific variation.

This paper develops verb frames for Malayalam, a language with grammatical roots in both Dravidian and Aryan languages. Section 2 reviews the major related work. Section 3 introduces the verb frame and its description. Section 4 describes the verb frame for Malayalam. Finally, Section 5 concludes the paper.

2 State of Art

Some of the well-known linguistic resources related to verb argument structure are discussed briefly in this section. Levin's work on verb classes (Beth, 1993) indicates the relationship between the semantic and syntactic behavior of English verbs. Verb behavior can be used to gain insight into linguistically relevant aspects of verb meaning (Beth, 1995).

VerbNet (VN) (Kipper, 2000) (Kipper, 2005) is a domain-independent, hierarchical, wide-coverage online verb dictionary that extends Levin's verb classes (Beth, 1993) and provides syntactic and semantic information for English verbs. It is mapped to various language resources such as WordNet (Fellbaum, 1998), FrameNet and PropBank. Each class of verbs in VN is described by thematic roles, selectional restrictions on the arguments, and syntactic frames (Beth, 1993).

PropBank (PB) (Palmer, 2003) (Palmer, 2005) is a corpus annotated with verbal propositions and their arguments. It has been used extensively for the semantic role labeling task in recent times (CoNLL shared tasks 2004-05 and 2008-2009). PB adds a layer of semantic annotation on top of the syntactic structures. PB labels the arguments of a verb, according to its valency, as Arg0, Arg1, Arg2, etc. (Palmer, 2002). Each set of argument labels and their definitions is called a frameset. As an example, consider the frameset for the verb dance. This verb takes dancer:Arg0, dance:Arg1, partner:Arg2 and audience:Arg3 as essential roles. It also has non-essential roles such as location:Argm-loc and time:Argm-tmp, which capture the spatio-temporal aspects of verbs.

FrameNet (FN) (Baker et al., 1998) is an online lexical resource for English, based on frame semantics and supported by corpus evidence. FN groups words according to the conceptual structures, i.e., frames, that underlie them (Arun, 2008). The paper describes three major components: (1) Lexicon; (2) Frame Database; (3) Annotated Example Sentences. The Frame Database holds the description of each frame's basic conceptual structure and provides the names and descriptions of the elements participating in that structure (Begum, 2017). Annotated Sentences are marked to illustrate the semantic and morpho-syntactic properties of the lexical items. Each frame contains numerous elements, i.e., core elements (core arguments) and non-core elements (adjuncts or peripheral roles), which are considered semantic roles. For example, the core elements of the frame Getting-up are the person or animal getting up from sleep and the place of sleeping; the non-core elements are time, purpose, etc.

All these resources look into the argument structure of English verbs. They give the syntactic and semantic information and the correlation between them. These resources are also mapped to each other, which makes each individual resource much richer. In the work on creating verb frames for Hindi, the argument structure of a verb is captured using Karaka relations, which capture both syntactic and semantic information about the verbs. A mapping is made between Karaka relations, thematic roles and PropBank annotation. Begum et al. (Begum, 2008) reported their experience with the creation of Hindi verb frames. These frames are further classified, based on a Paninian grammar framework, using 6 Karaka relations. This method considered the morphology, syntactic variations and semantics of the verb to divide verbs into various classes.

Based on a similar approach, Ghosh (Ghosh, 2014) created a resource of verb frames for compound verbs in Bengali. The main aim of that paper is to investigate whether the vector verb from the compound verb is able to retain its case marking properties and argument structure. Additionally, the knowledge and syntax associated with verb frames can be used for categorizing and analyzing verbs for various NLP applications.

Soni et al. (Ghosh, 2013) explore the application of verb frames and conjuncts in sentence simplification for Hindi. The method proposed by the authors uses conjuncts as a first level of sentence simplification, followed by verb frames enhanced with tense, aspect and modality features. It is a rule-based system, and its output is evaluated manually and automatically, using the BLEU score, for ease of readability and simplification.

A semi-automatic annotator tool for verb frames was developed by Hanumant et al. (Redkar, 2016). The tool is used for extracting and generating verb frames automatically from the example sentences of the Marathi wordnet. The paper explains the concept and working of the verb frame tool, with its advantages and disadvantages. Other related work by Schulte (Walde, 2009) has also explored verb frames for English.

3 Verb Frames

In all languages, the verb is the major part-of-speech category. Verbs are used to define actions, activities and states. The ability of verbs to choose their arguments and/or adjuncts is termed 'verb sub-categorization' or 'verb valency'. The combination of functional units elicited by a verb is referred to as a verb frame. In linguistics, verb-framing and satellite-framing are typological descriptions of how verb phrases in different languages describe the path of motion or the manner of motion, respectively (Redkar, 2016).

A verb frame generally consists of the verbal propositions and arguments of the words surrounding a verb in a given sentence. Each of the prepositional words in a verb frame has arguments such as an arc label (otherwise called a semantic role label), its necessity in the frame, case markers or suffixes, lexical type, relation of the word to the head verb, position with respect to the head verb, etc. These verb frames are developed to generate dependency tree structures in a given language. Verb frames allow any verb to be categorized on the basis of its argument demands. The verb frames show the mandatory Karaka relations for a verb (see footnote 1). They are:

1. Karaka: dependency arc labels.
2. The necessity of the argument: whether it is mandatory (m) or desirable (d).
3. Case markers/Vibhakti: the post-position or the case associated with the nominal.
4. Lexical category of the arguments.
5. The position of the demanded nominal with respect to the verb: whether it is left (l) or right (r).

Footnote 1: Karakas are the typed dependency labels in the Computational Paninian Framework (Bharati, 1993).

Verb frames are built for the base form of a verb. The demands change according to the tense, aspect and modality (TAM) of the verb used in the sentence. Knowledge about the transformations that TAM induces on the base form of a verb is stored in transformation charts, one for each distinct TAM. In the present work we develop verb frames for Malayalam based on the Karaka theory developed by IIIT-Hyderabad for Hindi.

4 Malayalam Verb Frame

In semantic analysis, the verb is taken as the central element of the sentence. According to the Paninian viewpoint, there are four levels in understanding any sentence (Bharati, 1995): the surface level (the uttered sentence), the vibhakthi level, the Karaka level and the semantic level. The Karaka level relates to semantics on one side and to syntax on the other. A Karaka relation can be identified from the markers, suffixes or case endings that follow the noun. The Karaka relations in Malayalam are analyzed from the point of view of the vibhakthi and the postpositions associated with it. The types of verb and the vibhakthi markers in Malayalam are illustrated in Figure 1 and Table 1 respectively.

No  Case             Case (English)   Case markers
1   nirddeeSika      Nominative       φ
2   prathigraahika   Accusative       -e
3   samyojika        Sociative        -ooTu
4   uddeeshika       Dative           kku, nu
5   sambandhika      Genitive         -nRe, -uTe
6   aadhaarika       Locative         -il, -kal
7   prayoojika       Instrumental     aal
8   sambhoodana      Vocative         long forms
9   -                Ablative         il ninn

Table 1: Case and Case Markers

Figure 1: Verb types in Malayalam

The roles and the dependency relations based on the IIIT-H approach are shown in Table 2.

The genitive noun does not have any direct grammatical or semantic relation with the verb; only the noun modified by the genitive is related to the verb. The genitive case (saMbhndhikaa vibhakti), otherwise called the possessive, takes the markers -nRe and -uTe.

Eg(a): ramanRe aniyan vannu.
       Raman's brother came.

Eg(b): avaLuTe amma paranjnju.
       Her mother said.

Because of this, the genitive noun can be removed from the sentence without affecting the grammaticality of the sentence.

Dependency-annotated data are used for developing the Malayalam verb frames. Dependency annotation is a pipeline consisting of a tokeniser, a morphological analyser, a POS tagger, a chunker and a dependency annotator. Raw text is given as input; the pipeline converts the text into tokens, identifies the grammatical features of the individual words, assigns part-of-speech (POS) tags to each word, groups the words into phrases and draws the dependency tree diagrams. Malayalam tends to join a wide variety of suffixes to a single word, forming compound words, which makes the process more complicated. Complex words are therefore split before being analysed in the present work. As an example, consider the following sentence.

maRRu bhakshaNa saadhanangngaLe apeekshiccu pazhangkanjnjiyil b6, b12 vaiRRaminukaL dhaaraaLaM aTangngiyiTTuNTu.
In comparison to other food items, rice gruel is ...

The sentence is annotated as follows:

1    ((   NP     name='NP' drel='k2:VGF' head='saadhanangngaLe'
1.1  maRRu               QT   QTF
1.2  bhakshaNa           JJ
1.4  apeekshiccu         PSP
     ))
2    ((   NP     drel='k7:VGF'
2.1  pazhangkanjnjiyil   N    NN
     ))
3    ((   NP     name='NP3' drel='ccof:NULL CCP'
3.1  b6
3.2  ,                   RD   PUNC
     ))
4    ((   NULL CCP
4.1  NULL                CC
     ))
5    ((   NP     head='vaiRRaminukaL'
5.1  b12                 N    NN
     ))
6    ((   JJP
     ))
7    ((   VGF    name='VGF' head='aTangngiyiTTuNTu'
7.1  aTangngiyiTTuNTu    V    VM   VF
     ))
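The five frame fields listed in Section 3 and the drel attributes of the annotated sentence can be tied together in a small data structure. The following is a minimal sketch, not the authors' implementation: the class and function names (FrameSlot, VerbFrame, parse_drel, karakas_by_head, frame_from_chunks) are hypothetical, the chunk name "NP2" is a placeholder, and the default necessity, category and position values are illustrative assumptions rather than values taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FrameSlot:
    """One argument demanded by a verb (the five fields of Section 3)."""
    karaka: str     # dependency arc label, e.g. "k2"
    necessity: str  # "m" = mandatory, "d" = desirable
    vibhakti: str   # case marker or postposition on the nominal
    category: str   # lexical category of the argument
    position: str   # "l" = left of the verb, "r" = right of the verb


@dataclass
class VerbFrame:
    verb: str
    slots: List[FrameSlot] = field(default_factory=list)

    def mandatory(self) -> List[str]:
        """Karaka labels of the slots marked mandatory."""
        return [s.karaka for s in self.slots if s.necessity == "m"]


def parse_drel(drel: str) -> Tuple[str, str]:
    """Split a drel value such as 'k2:VGF' into (label, head chunk name)."""
    label, head = drel.split(":", 1)
    return label, head


def karakas_by_head(
    chunks: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Group dependency labels by the chunk they attach to."""
    grouped: Dict[str, List[str]] = {}
    for _name, drel in chunks:
        if drel is None:  # the root chunk carries no drel
            continue
        label, head = parse_drel(drel)
        grouped.setdefault(head, []).append(label)
    return grouped


def frame_from_chunks(verb, head, chunks, vibhakti) -> VerbFrame:
    """Build a frame for `head` from annotated chunks.

    Assumption: necessity "d", category "n" and position "l" are
    placeholder defaults; a real frame would set them per verb.
    """
    labels = karakas_by_head(chunks).get(head, [])
    slots = [FrameSlot(k, "d", vibhakti.get(k, ""), "n", "l") for k in labels]
    return VerbFrame(verb, slots)


# (chunk name, drel) pairs read off the annotated sentence;
# "NP2" is a placeholder name, not taken from the annotation.
CHUNKS = [
    ("NP", "k2:VGF"),
    ("NP2", "k7:VGF"),
    ("NP3", "ccof:NULL CCP"),
    ("VGF", None),
]

if __name__ == "__main__":
    frame = frame_from_chunks(
        "aTangngiyiTTuNTu", "VGF", CHUNKS, {"k2": "-e", "k7": "-il"}
    )
    for slot in frame.slots:
        print(slot)
```

Grouping by head keeps the coordination label (ccof) separate from the labels that attach directly to the finite verb chunk, so only the latter become candidate frame slots.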