Analysis Grammar of Japanese in the Mu-Project
- A Procedural Approach to Analysis Grammar -

Jun-ichi TSUJII, Jun-ichi NAKAMURA and Makoto NAGAO
Department of Electrical Engineering
Kyoto University
Kyoto, JAPAN

Abstract

Analysis grammar of Japanese in the Mu-project is presented. It is emphasized that rules expressing constraints on single linguistic structures and rules for selecting the most preferable readings are completely different in nature, and that rules for selecting preferable readings should be utilized in analysis grammars of practical MT systems. It is also claimed that procedural control is essential in integrating such rules into a unified grammar. Some sample rules are given to make the points of discussion clear and concrete.

1. Introduction

The Mu-Project is a Japanese national project supported by grants from the Special Coordination Funds for Promoting Science & Technology of STA (Science and Technology Agency), which aims to develop Japanese-English and English-Japanese machine translation systems. We currently restrict the domain of translation to abstracts of scientific and technological papers. The systems are based on the transfer approach[1], and consist of three phases: analysis, transfer and generation. In this paper, we focus on the analysis grammar of Japanese in the Japanese-English system. The grammar has been developed by using GRADE, which is a programming language specially designed for this project[2]. The grammar now consists of about 900 GRADE rules. The experiments so far show that the grammar works very well and is comprehensive enough to treat various linguistic phenomena in abstracts. In this paper we will discuss some of the basic design principles of the grammar together with its detailed construction. Some examples of grammar rules and analysis results will be shown to make the points of our discussion clear and concrete.

2. Procedural Grammar

There has been a prominent tendency in recent computational linguistics to re-evaluate CFG and use it directly or augment it to analyze sentences[3,4,5]. In these systems (frameworks), CFG rules independently describe constraints on single linguistic structures, and a universal rule application mechanism automatically produces a set of possible structures which satisfy the given constraints. It is well-known, however, that such sets of possible structures often become unmanageably large.

Because two separate rules such as

    NP ---> NP PREP-P
    VP ---> VP PREP-P

are usually prepared in CFG grammars in order to analyze noun and verb phrases modified by prepositional phrases, CFG grammars provide two syntactic analyses for

    She was given flowers by her uncle.

Furthermore, the ambiguity of the sentence is doubled by the lexical ambiguity of "by", which can be read as either a locative or an agentive preposition. Since the two syntactic structures are recognized by completely independent rules and the semantic interpretations of "by" are given by independent processes in the later stages, it is difficult to compare these four readings during the analysis to give a preference to one of these four readings.

A rule such as

    "If a sentence is passive and there is a "by"-prepositional phrase, it is often the case that the prepositional phrase fills the deep agentive case. (try this analysis first)"

seems reasonable and quite useful for choosing the most preferable interpretation, but it cannot be expressed by refining the ordinary CFG rules. This kind of rule is quite different in nature from a CFG rule. It is not a rule of constraint on a single linguistic structure (in fact, the above four readings are all linguistically possible), but it is a "heuristic" rule concerned with preference of readings, which compares several alternative analysis paths and chooses the most feasible one.
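The difference between a constraint rule and a preference rule can be illustrated with a minimal Python sketch. This is not GRADE syntax; the names `Reading`, `enumerate_readings` and `prefer_agentive_if_passive` are hypothetical, and the sketch assumes that the deep agentive reading corresponds to attaching the "by"-phrase to the verb phrase. The constraint side enumerates all four readings; the heuristic only reorders them so that the preferred one is tried first.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Reading:
    attachment: str  # "NP" or "VP": which phrase the "by"-phrase modifies
    by_sense: str    # "locative" or "agentive"


def enumerate_readings():
    """Constraint side: every linguistically possible reading (2 x 2 = 4)."""
    return [Reading(a, s)
            for a, s in product(("NP", "VP"), ("locative", "agentive"))]


def prefer_agentive_if_passive(readings, is_passive):
    """Preference side: reorder, never discard.

    If the sentence is passive, the reading in which the "by"-phrase fills
    the deep agentive case is moved to the front (assumed here to be the
    VP attachment); all other readings keep their relative order.
    """
    if not is_passive:
        return list(readings)

    def rank(reading):
        preferred = (reading.attachment == "VP"
                     and reading.by_sense == "agentive")
        return 0 if preferred else 1

    return sorted(readings, key=rank)  # sorted() is stable


# "She was given flowers by her uncle." is passive.
ordered = prefer_agentive_if_passive(enumerate_readings(), is_passive=True)
first_choice = ordered[0]
```

The heuristic returns the same four readings in a different order, so the remaining readings stay available as fallbacks if the first choice later fails.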
Human translators (or humans in general) have many such preference rules based on various sorts of cues such as morphological forms of words, collocations of words, text styles, word semantics, etc. These heuristic rules are quite useful not only for increasing efficiency but also for preventing proliferation of analysis results. As Wilks[6] pointed out, we cannot use semantic information as constraints on single linguistic structures, but just as preference cues to choose the most feasible interpretations among linguistically possible interpretations. We claim that many sorts of preference cues other than semantic ones exist in real texts which cannot be captured by CFG rules. We will show in this paper that, by utilizing various sorts of preference cues, our analysis grammar of Japanese can work almost deterministically to give the most preferable interpretation as the first output, without any extensive semantic processing (note that even "semantic" processing cannot disambiguate the above sentence. The four readings are semantically possible. It requires deep understanding of contexts or situations, which we cannot expect in a practical MT system).

In order to integrate heuristic rules based on various levels of cues into a unified analysis grammar, we have developed a programming language, GRADE. GRADE provides us with the following facilities.

- Explicit Control of Rule Applications : Heuristic rules can be ordered according to their strength (See 4-2).

- Multiple Relation Representation : Various levels of information including morphological, syntactic, semantic, logical etc. are expressed in a single annotated tree and can be manipulated at any time during the analysis. This is required not only because many heuristic rules are based on heterogeneous levels of cues, but also because the analysis grammar should perform semantic/logical interpretation of sentences at the same time and the rules for these phases should be written in the same framework as syntactic analysis rules (See 4-2, 4-4).

- Lexicon Driven Processing : We can write heuristic rules specific to a single or a limited number of words such as rules concerned with collocations among words. These rules are strong in the sense that they almost always succeed. They are stored in the lexicon and invoked at appropriate times during the analysis without decreasing efficiency (See 4-1).

- Explicit Definition of Analysis Strategies : The whole analysis phase can be divided into steps. This makes the whole grammar efficient, natural and easy to read. Furthermore, strategic consideration plays an essential role in preventing undesirable interpretations from being generated (See 4-3).

3 Organization of Grammar

In this section, we will give the organization of the grammar necessary for understanding the discussion in the following sections. The main components of the grammar are as follows.

(1) Post-Morphological Analysis
(2) Determination of Scopes
(3) Analysis of Simple Noun Phrases
(4) Analysis of Simple Sentences
(5) Analysis of Embedded Sentences (Relative Clauses)
(6) Analysis of Relationships of Sentences
(7) Analysis of Outer Cases
(8) Contextual Processing (Processing of Omitted case elements, Interpretation of 'Ha', etc.)
(9) Reduction of Structures for Transfer Phase

Each component consists of from 60 to 120 GRADE rules.

47 morpho-syntactic categories are provided for Japanese analysis, each of which has its own lexical description format. 12,000 lexical entries have already been prepared according to the formats. In this classification, Japanese nouns are categorized into 8 sub-classes according to their morpho-syntactic behaviour, and 53 semantic markers are used to characterize their semantic behaviour. Each verb has a set of case frame descriptions (CFD) which correspond to different usages of the verb. A CFD gives mapping rules between surface case markers (SCM - postpositional case particles are used as SCM's in Japanese) and their deep case interpretations (DCI - 33 deep cases are used). The DCI of an SCM often depends on verbs, so that the mapping rules are given to CFD's of individual verbs. A CFD also gives a normal collocation between the verb and SCM's (postpositional case particles). Detailed lexical descriptions are given and discussed in another paper[7].

The analysis results are dependency trees which show the semantic relationships among input words.

4. Typical Steps of Analysis Grammar

In the following, we will take some sample rules to illustrate our points of discussion.

4-1 Relative Clauses

Relative clause constructions in Japanese express several different relationships between modifying clauses (relative clauses) and their antecedents. Some relative clause constructions cannot be translated as relative clauses in English. We classified Japanese relative clauses into the following four types, according to the relationships between clauses and their antecedents.

(1) Type 1 : Gaps in Cases

One of the case elements of the relative clause is deleted and the antecedent fills the gap.

(2) Type 2 : Gaps in Case Elements

The antecedent modifies a case element in the clause. That is, a gap exists in a noun phrase in the clause.

(3) Type 3 : Apposition

The clause describes the content of the antecedent as the English "that"-clause in 'the idea that the earth is round'.

(4) Type 4 : Partial Apposition

The antecedent and the clause are related by certain semantic/pragmatic relationships. The relative clause of this type doesn't have any gaps. This type cannot be translated directly into English relative clauses. We have to interpolate in English appropriate phrases or clauses which are implicit in Japanese, in order to express the semantic/pragmatic relationships between the antecedents and relative clauses explicitly. In other words, gaps exist in the interpolated phrases or clauses.

Because the above four types of relative clauses have the same surface forms in Japanese

    ......... (verb)    (noun).
    Relative Clause     Antecedent

careful processing is required to distinguish them (note that the "antecedents" -modified nouns- are located after the relative clauses in Japanese). A sophisticated analysis procedure has already been developed, which fully utilizes various levels of heuristic cues as follows.

(Rule 1) There are a limited number of nouns which are often used as antecedents of Type 3 clauses.

(Rule 2) When nouns with certain semantic markers appear in the relative clauses and those nouns are followed by one of specific postpositional case particles, there is a high possibility that the relative clauses are Type 2. In the following example, the word "SHORISOKUDO" (processing speed) has the semantic marker AO (attribute).

[ex-1] [Type 2]

    "SHORISOKUDO"       "GA"               "HAYAI"    "KEISANKI"
    (processing speed)  (case particle:    (high)     (computer)
                         subject case)

    Relative Clause: "SHORISOKUDO" "GA" "HAYAI"
    Antecedent:      "KEISANKI"

    --> (English Translation)
    A computer whose processing speed is high

(Rule 3) Nouns such as "MOKUTEKI" (purpose), "GEN IN" (reason), "SHUDAN" (method) etc. express deep case relationships by themselves, and, when these nouns appear as antecedents, it is often the case that they fill the gaps of the corresponding deep cases in the relative clauses.

[ex-2] [Type 1]

    "KONO"  "SOUCHI"  "O"               "TSUKAT"  "TA"               "MOKUTEKI"
    (this)  (device)  (case particle:   (to use)  (tense formative:  (purpose)
                       object case)                past)

    Relative Clause: "KONO" "SOUCHI" "O" "TSUKAT" "TA"
    Antecedent:      "MOKUTEKI"

    --> (English Translation)
    The purpose for which (someone) used this device
    The purpose of using this device

(Rule 4) There is a limited number of nouns which are often used as antecedents in Type 4 relative clauses. Each of such nouns requires a specific phrase or clause to be interpolated in English.

[ex-3] [Type 4]

    "KONO"  "SOUCHI"  "O"               "TSUKAT"  "TA"               "KEKKA"
    (this)  (device)  (case particle:   (to use)  (tense formative:  (result)
                       object case)                past)

    Relative Clause: "KONO" "SOUCHI" "O" "TSUKAT" "TA"
    Antecedent:      "KEKKA"

    --> (English Translation)
    The result which was obtained by using this device

In the above example, the clause "the result which someone obtained (the result : gap)" is omitted in Japanese, which relates the antecedent "KEKKA" (result) and the relative clause "KONO SOUCHI O TSUKAT_TA" (someone used this device).

A set of lexical rules is defined for "KEKKA" (result), which basically works as follows : it examines first whether the deep object case has already been filled by a noun phrase in the relative clause. If so, the relative clause is taken as Type 4 and an appropriate phrase is interpolated as in [ex-3]. If not, the relative clause is taken as Type 1 as in the following example, where the noun "KEKKA" (result) fills the gap of object case in the relative clause.

[ex-4] [Type 1]

    "KONO"  "JIKKEN"      "GA"               "TSUKAT"  "TA"               "KEKKA"
    (this)  (experiment)  (case particle:    (to use)  (tense formative:  (result)
                           subject case)                past)

    Relative Clause: "KONO" "JIKKEN" "GA" "TSUKAT" "TA"
    Antecedent:      "KEKKA"

    --> (English Translation)
    The result which this experiment used

Such lexical rules are invoked at the beginning of the relative clause analysis by a rule in the main flow of processing. The noun "KEKKA" (result) is given a mark as a lexical property which indicates the noun has special rules to be invoked when it appears as an antecedent of a relative clause. All the nouns which require special treatments in the relative clause analysis are given the same marker. The rule in the main flow only checks this mark and invokes the lexical rules defined in the lexicon.

(Rule 5) Only the cases marked by postpositional case particles 'GA', 'WO' and 'NI' can be deleted in Type 1 relative clauses, when the antecedents are ordinary nouns. Gaps in Type 1 relative clauses can have other surface case marks, only when the antecedents are special nouns such as described in Rule (3).

4-2 Conjuncted Noun Phrases

Conjuncted noun phrases often appear in abstracts of scientific and technological papers. It is important to analyze them correctly, especially to determine scopes of conjunctions correctly, because they often lead to proliferation of analysis results. The particle "TO" plays almost the same role as the English "and" to conjunct noun phrases. There are several heuristic rules based on various levels of information to determine the scopes.

(Rule 1) Since particle "TO" is also used as a case particle, if it appears in the position:

    Noun 'TO' verb Noun,
    Noun 'TO' adjective Noun,

there are two possible interpretations, one in which "TO" is a case particle and "noun TO adjective(verb)" forms a relative clause that modifies the second noun, and the other one in which "TO" is a conjunctive particle to form a conjuncted noun phrase. However, it is very likely that the particle 'TO' is not a conjunctive particle but a post-positional case particle, if the adjective (verb) is one of adjectives (verbs) which require case elements with surface case mark 'TO' and there are no extra words between 'TO' and the adjective (verb). In the following example, "KOTONARU (to be different)" is an adjective which is often collocated with a noun phrase followed by case particle "TO".

[ex-5]

    YOSOKU-CHI         "TO"   KOTONARU            ATAI
    (predicted value)         (to be different)   (value)

    [dominant interpretation]

    YOSOKU-CHI "TO" KOTONARU    ATAI
    relative clause             antecedent

    = the value which is different from the predicted value

    [less dominant interpretation]

    YOSOKU-CHI "TO"    KOTONARU ATAI
    NP                 NP
    conjuncted noun phrase

    = the predicted value and the different value

(Rule 2) If two "TO" particles appear in the position:

    Noun-1 'TO' ......... Noun-2 'TO' 'NO' NOUN-3

the right boundary of the scope of the conjunction is almost always Noun-2. The second 'TO' plays a role of a delimiter which delimits the right boundary of the conjunction. This 'TO' is optional, but in real texts one often places it to make the scope unambiguous, especially when the second conjunct is a long noun phrase and the scope is highly ambiguous without it. Because the second ... a delimiter of the conjunction) and 'NO' following a case particle turns the preceding phrase to a
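As an illustration of Rule 2 of Section 4-2, the following Python sketch looks for the pattern Noun-1 'TO' ... Noun-2 'TO' 'NO' Noun-3 and reports Noun-2 as the right boundary of the conjunction. It is only a sketch under stated assumptions: the token format (surface form plus a coarse category), the function name `right_boundary_of_conjunction` and the placeholder nouns "N1" to "N3" are hypothetical and are not taken from the GRADE grammar.

```python
def right_boundary_of_conjunction(tokens):
    """Return the index of Noun-2 in the pattern
        Noun-1 'TO' ... Noun-2 'TO' 'NO' Noun-3
    or None if the pattern does not occur.

    tokens: list of (surface, category) pairs; category is a coarse tag
    such as "noun" or "particle" (an assumed, simplified representation).
    """
    first_to = None
    for i, (surface, category) in enumerate(tokens):
        if (surface, category) != ("TO", "particle"):
            continue
        preceded_by_noun = i > 0 and tokens[i - 1][1] == "noun"
        if not preceded_by_noun:
            continue
        if first_to is None:
            # First 'TO': it follows Noun-1 and opens the conjunction.
            first_to = i
            continue
        # Candidate second 'TO': must be followed by 'NO' and a noun.
        followed_by_no_noun = (
            i + 2 < len(tokens)
            and tokens[i + 1] == ("NO", "particle")
            and tokens[i + 2][1] == "noun"
        )
        if followed_by_no_noun:
            return i - 1  # index of Noun-2, the right boundary
    return None


# Placeholder tokens; "N1".."N3" stand for arbitrary nouns.
example = [
    ("N1", "noun"), ("TO", "particle"),
    ("N2", "noun"), ("TO", "particle"),
    ("NO", "particle"), ("N3", "noun"),
]
boundary = right_boundary_of_conjunction(example)
```

The sketch only fixes the right boundary, as the rule does; the left boundary of the first conjunct is left to other rules.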