Syntactic aggregation
This paper explores the prevalent syntactic aggregation constructs in Bengali and presents an approach towards generating fluent Bengali compound sentences using the identified constructs.


Transcript

  • 1. Syntactic Aggregation in Bengali Text Generation
    Sumit Das, Anupam Basu, Sudeshna Sarkar
    Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India – 721302
    sumit.jucse@gmail.com, {anupam,sudeshna}@cse.iitkgp.ernet.in

    Abstract
    The quality of the sentences generated by a natural language generation system can be evaluated based on their well-formedness (fluency, conciseness and coherence) and faithfulness to the communication intent. In this paper, we explore the prevalent syntactic aggregation constructs in Bengali and present an approach towards generating Bengali compound sentences using the identified constructs. The inputs to our syntactic aggregation method are the constituent simple sentences, the rhetorical relations defined over them and the discourse markers realizing the relations. The paper describes a rule-based approach to form the compound sentences, by reorganization of components followed by elimination of redundancies of lexical entities, and presents a user-based evaluation of the results obtained.

    1 Introduction
    Any Natural Language Generation (NLG) system should have the capability to remove unnecessary repetitions when generating text. Unnecessary repetitions make the text less fluent and less coherent. In NLG, the task of combining constituent simpler text spans by removing repetitions is called aggregation. According to the standard three-stage pipeline NLG architecture proposed by Reiter and Dale (2000), aggregation is a basic task of any NLG system for generating fluent, concise, and coherent text. Dalianis (1993) viewed aggregation mainly as a redundancy elimination problem that should be solved in such a way that the original meaning of the text is preserved and no undesirable implication is produced. For example, the two text spans in (1a), linked by a CONJUNCTION rhetorical relation (Mann and Thompson, 1988), can be combined as in (1b). But (1b) contains unnecessary repetitions, shown by the words in bold. So, these can be aggregated to produce (1c), which is more fluent, concise, and coherent than (1b).

    1. a. * Jack went up the hill.
          * Jill went up the hill.
       b. Jack went up the hill and Jill went up the hill.
       c. Jack and Jill went up the hill.

    Syntactic aggregation is the most common form of aggregation observed in any real discourse. Shaw (2002) proposed that in syntactic aggregation simpler linguistic components are combined in accordance with linguistic rules. Because it is a language-dependent process, linguistic knowledge such as preferred word ordering and special verb form usage is required for combining text spans. For example, in Bengali the two simple text spans in (2a), linked by a SEQUENCE rhetorical relation, can be simply combined using the appropriate discourse marker eba.n as in (2b). But in (2b), the word in bold is redundant. So, applying the conjunction reduction construct, the two text spans can be aggregated to generate (2c). But (2c) can be further aggregated to (2d) by using the non-finite verb giYe.

    2. a. * rAma mAThe giYechhila [1] (Ram went to the playground).
          * rAma phuTabala khelechhila (Ram played football).
       b. rAma mAThe giYechhila eba.n rAma phuTabala khelechhila (Ram went to

    [1] In this paper, Bengali graphemes are written using Roman script in ITRANS notation. They are written in italics font.

    Proceedings of ICON-2009: 7th International Conference on Natural Language Processing, Macmillan Publishers, India. Also accessible from http://ltrc.iiit.ac.in/proceedings/ICON-2009
  • 2. the playground and Ram played football).
       c. rAma mAThe giYechhila eba.n phuTabala khelechhila (Ram went to the playground and played football).
       d. rAma mAThe giYe phuTabala khelechhila. (Ram went to the playground and played football).

    Clearly, to syntactically aggregate smaller text spans in Bengali, an NLG system should have knowledge of Bengali grammar.

    In this work, we have studied a corpus of Bengali sentences to identify the prevalent syntactic aggregation constructs in Bengali. Then, we have proposed a method to syntactically aggregate two simple clauses using the identified constructs to generate a more fluent, concise and coherent compound sentence. The inputs are two simple clauses, the rhetorical relation between them and the discourse marker realizing that relation.

    The rest of this paper is organized as follows: In section 2, we briefly review the related work in syntactic aggregation. In section 3, we present a corpus analysis to identify the prevalent syntactic aggregation constructs in Bengali. The rhetorical relations considered in this work are listed in section 4 and the semantic representation used is described in section 5. We describe our approach in section 6 and the evaluation methods in section 7. Section 8 provides concluding remarks and some future scope relevant to this work.

    2 Related Work
    There is no general consensus regarding the exact definition of aggregation, the types of aggregation, or the component of an NLG system where aggregation tasks should be performed. The general approach is to handle the aggregation tasks in a domain- and application-specific way.

    Dalianis (1993; 1996) equated aggregation with the process of redundancy elimination. He divided it into four principal categories: syntactic, elision, lexical, and referential aggregation. In syntactic aggregation, repetitions are removed syntactically, leaving at least one item in the text to express the meaning explicitly.

    Wilkinson (1995) contradicted Dalianis's view of equating text aggregation with redundancy elimination, because in certain contexts it can be done by using suitable referring expression generation. Apart from redundancy elimination, aggregation choices can affect other characteristics of text, such as sentence complexity, focus, emphasis, theme/rheme, prosody etc.

    Reape and Mellish (1999) defined aggregation as a process to generate more concise, cohesive, and fluent text by omitting or substituting repeating entities where the reader can infer the deleted entities from the remaining text. Reape and Mellish distinguished among different types of aggregation: conceptual, discourse, semantic, syntactic, lexical, and referential. According to them, syntactic aggregation is the most common and can be stated by some grouping rules, such as subject grouping, predicate grouping etc.

    Horacek (1992) has given a more theoretical view of aggregation. He explained it by some grouping phenomena, such as content-based grouping and structurally motivated propositional grouping.

    Shaw (2002) categorized aggregation into four types: interpretive, referential, syntactic, and lexical. He focused mainly on syntactic aggregation, which he divided into two types: hypotactic and paratactic. In paratactic aggregation all the constituent text spans are of equal status. In hypotactic aggregation, on the other hand, the constituent text spans are related by some subordinate relation.

    In the Virtual Storyteller project (Marit Theune and Hendriks, 2006), different conjunctive and elliptical constructs were used to syntactically aggregate simpler text spans to generate more coherent and concise fairy tales.

    All the work in the area of text aggregation encountered so far is focused on English and other European languages. In this work, we have proposed methods to perform syntactic aggregation in Bengali text generation.

    3 Corpus Analysis
    We conducted a corpus analysis to identify the prevalent syntactic aggregation constructs used in Bengali for generating compound sentences. For this we have chosen text of narrative style because narrative texts are mainly activity or event driven. So, it is easier to model the different types of aggregation construct in narrative text. We have a corpus of 600 compound sentences collected from Bengali story books. We have randomly chosen 350 sentences from that corpus for analysis. First the selected compound sentences
  • 3. were segmented into simple clauses. A simple clause is equivalent to a simple sentence which contains only one finite verb and no coordinating conjunction. For example, the compound sentence rAma eba.n shyAma kAla skule giYechhila (Ram and Shyam went to school yesterday) contains 2 simple clauses: rAma kAla skule giYechhila (Ram went to school yesterday) and shyAma kAla skule giYechhila (Shyam went to school yesterday). By decomposing the 350 compound sentences, we got 868 simple clauses (2.48 simple clauses per sentence). This measure is important to determine the maximum number of simple clauses that can be aggregated in a single sentence. We cannot keep on aggregating an arbitrarily large number of simple clauses even if they are syntactically similar, since it may result in text that is too complex and less fluent. From the corpus analysis, we have identified two types of frequently used syntactic aggregation constructs in Bengali, namely the paratactic construct and the elliptic construct.

    • Simple paratactic construction: In this case, the two constituent simple clauses are simply connected by the conjunctive discourse marker and no word deletion is required.
        – rAma ekatA boi paRachhila eba.n shyAma phuTabala khelachhila (Ram was reading a book and Shyam was playing football).
    • Elliptic construction: Ellipsis is defined as the omission of superfluous words from the surface form which are inferable from the entities in the remaining text. The different elliptic constructs observed in Bengali are:
        – Conjunction reduction: In conjunction reduction, the subject of the second simple clause is deleted.
            * rAma khAbAra kheYechhe eba.n bandhudera sAthe sinemA dekhate gechhe (Ram has eaten food and gone to see a movie with friends).
          In the example given above, the subject of the second simple clause, i.e., rAma, is deleted using the conjunction reduction construct.
        – Right node raising (RNR): In RNR, the rightmost portion of the first simple clause is deleted.
            * rAma bhAta eba.n shyAma ruTi khAbe (Ram will eat rice and Shyam will eat roti).
          Here the rightmost portion of the first proposition (khAbe) is deleted.
        – Coordinating one constituent: In this case, one constituent entity from each of the input simple clauses is coordinated by a conjunction. This can happen to any entity of the constituent simple clauses.
            * rAma eba.n shyAma phuTabala khelachhila (Ram and Shyam were playing football).
          The subjects of the two constituent simple clauses in the above example are coordinated.
        – Non-finite verb generation: If both the input simple clauses are about events or actions performed sequentially or concurrently by the same subject, then they are aggregated using the non-finite form of the verb of the first simple clause.
            * rAma bhAta kheYe skule yAbe (Ram will eat rice and go to school).
          In the above example, the two constituent simple clauses are about two actions performed sequentially by the same subject. So, the perfect participle form of the verb khAoYA, i.e. kheYe, is used for aggregation.
      Any combination of the above four types of elliptic constructs is also allowed. For example, in (3) both conjunction reduction and RNR are used, and (4) is generated by using both conjunction reduction and a non-finite verb.
        3. rAma bhAta eba.n mAchha khAbe (Ram will eat rice and fish).
        4. rAma skule giYe phuTabala khelabe (Ram will go to school and play football).

    In summary, though for the corpus study we have considered only narrative Bengali text, it is part of a more general approach. As syntactic aggregation is a language-dependent but domain-independent task (Shaw, 2002), the contributions of this work can be extended to generate aggregated text in Bengali in other domains as well.
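The conjunction reduction and right node raising constructs described above can be illustrated with a short Python sketch. This is not the authors' system: it assumes each clause is already a list of surface tokens (the paper itself works on case frames), it treats identical tokens at the same edge of both clauses as repeated, and the function names are hypothetical.

```python
def conjunction_reduction(first, second):
    """Drop the leading tokens of `second` that repeat the start of `first`."""
    i = 0
    while i < min(len(first), len(second)) and first[i] == second[i]:
        i += 1
    return first, second[i:]


def right_node_raising(first, second):
    """Drop the trailing tokens of `first` that repeat the end of `second`."""
    i = 0
    while i < min(len(first), len(second)) and first[-1 - i] == second[-1 - i]:
        i += 1
    return first[:len(first) - i], second


def aggregate(first, second, marker="eba.n"):
    """Apply both constructs, then join the clauses with the discourse marker."""
    first, second = conjunction_reduction(first, second)
    first, second = right_node_raising(first, second)
    return " ".join(first + [marker] + second)


# Example (3): both constructs apply to the two input clauses.
print(aggregate("rAma bhAta khAbe".split(), "rAma mAchha khAbe".split()))
```

Applied to the clauses rAma bhAta khAbe and shyAma ruTi khAbe, only the right node raising step changes anything, which matches the RNR example in the list above. The sketch assumes the two clauses are not identical.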
  • 4. 4 Rhetorical Relations Considered
    From the corpus study, we know that paratactic aggregations are the most common form of syntactic aggregation in Bengali. In paratactic aggregation, the constituent text spans are of equal status and are linked by multi-nuclear rhetorical relations (Mann and Thompson, 1988). In this work, we have focused on the different paratactic constructs for syntactic aggregation of Bengali text. The multi-nuclear rhetorical relations considered in this paper are CONJUNCTION, DISJUNCTION, CONTRAST, and SEQUENCE as defined by the original Rhetorical Structure Theory (RST). In addition to these relations, we have considered another multi-nuclear temporal coherence relation, PARALLEL, as defined below:

        Two text spans are said to be related by the PARALLEL relation if the actions or the events in those two text spans are occurring simultaneously.

    For example, the two constituent clauses present in (5) are rAma khAbAra khAchchhila (Ram was eating food) and rAma Tibhi dekhachhila (Ram was watching TV). The actions in these two clauses are concurrent. So, the coherence relation between them is PARALLEL.

    5. rAma khAbAra khete khete Tibhi dekhachhila (Ram was watching TV while eating food).

    5 The Semantic Representation
    The semantic representation chosen here is a case-frame representation, also called a predicate-argument representation. The basic building block in this representation is the sentence. An example of the sentence frame is given in Figure 1. A sentence contains a clause frame and a clause-count, which denotes the number of simple clauses present in the sentence. The clause is a recursive structure that can contain clauses inside itself, which makes it capable of representing both simple and composite (compound and complex) sentences. For a simple sentence, the outer clause contains only one inner clause. For a composite sentence, on the other hand, the outer clause contains the constituent inner clauses along with the rhetorical relation (rh-rel) connecting them and the discourse marker (dm) realizing that rhetorical relation. A clause frame contains a predicate frame (pre) and a list of argument frames (arg). The pre frame contains verb-related information, such as verb root (v-root), theme, tense, aspect, mood, polarity etc. The arg frame contains the nominal entities along with the thematic role of each entity in that clause. If there is any modifier for the verb or any nominal entity in a clause, then the respective modifier frames (v-mod and w-mod frame) are present inside the corresponding pre and arg frame.

    Figure 1: Case-frame representation for the sentence "rAma pa.Dachhila eba.n shyAma khelachhila." (Ram was reading and Shyam was playing).

    6 Proposed Approach
    In our approach for syntactic aggregation, the inputs are two simple clauses, the rhetorical relation between them, and the discourse marker realizing that relation. To syntactically aggregate the two simple clauses by using the different paratactic constructs identified in section 3 we propose
  • 5. the following steps:

    • Step 1: Ordering arguments in the constituent clauses.
    • Step 2: Repeating entity identification.
    • Step 3: Ordering constituent clauses.
    • Step 4: Superfluous words deletion and non-finite verb generation.
    • Step 5: Correct surface form generation.

    The above steps are described below.

    6.1 Argument Ordering in the Constituent Clauses
    Preferred word ordering in a sentence varies with languages and it is very important for syntactic aggregation. Though Bengali is a free-word-order language, the preferred word ordering in a Bengali sentence is subject-object-verb.

    In this work, the input simple clauses are taken in their corresponding semantic case-frame representation as shown in Figure 1. The arg frames in the clause are then ordered by using a total order among the roles associated with the arg frames. These roles are neither semantic roles nor Paninian roles. The problem that rules out both semantic and Paninian roles is that neither of them can be associated with a unique postposition, which is very important for generating a sentence in Bengali. So the alternative approach is to design an intermediate representation that has sufficient granularity of roles, such that ambiguous assignments of postpositions are not possible. Bengali has a list of postpositions that are used in different contexts to convey different semantics. In this work, roles have been designed at a granularity level where one role is assigned to a semantically unique postposition. For developing the total order of the roles, we have followed an approach taken in the SANYOG system (Bhattacharya, 2004). We have taken the constituent simple clauses of the compound sentences used for corpus analysis. Each simple clause was represented in its case-frame representation and the arg frames inside it were then ordered as they appear in the surface form of the clause. In this way, the ordering among the roles of the arg frames in a clause is known. For example, the role set for (6) is {ke, kothAYa, kakhana}. From (6) we can infer that the preferred order among these roles is ke < kakhana < kothAYa. The role on the left side of < will appear before the role on the right side in the surface form.

    6. Ami AgAmIkAla skule yAba (I shall go to school tomorrow).

    Again, in (7) the role set is {ke, kothAYa, kakhana, kAra sAthe}. By using (7), the total order obtained from (6) can be extended to ke < kakhana < kAra sAthe < kothAYa.

    7. Ami AgAmIkAla bAbAra sAthe skule yAba (Tomorrow I shall go to school with my father).

    Using the above method for the entire set of simple clauses, we have identified the set of possible roles in Bengali and developed a total order among them. The arg frames in the input simple clauses are ordered using the developed total order.

    6.2 Repeating Entity Identification
    In our current approach, to remove the redundant entities we first identify the repeating entities present in both the simple clauses taken as input. We assume that nominal entities are equivalent if they have the same thematic role and root word in the constituent simple clauses. For example, in the simplified semantic representation of the compound sentence shown in Figure 2, the constituent simple clauses have one repeating nominal entity. In both the simple sentences, the thematic role of that entity is ki and its surface form is bhAta. Two verbs are equivalent if they have the same root word and the same functional parameters, such as tense, aspect, mood, polarity etc. In Figure 2, the verbs are equivalent and thus repeating. Two noun modifiers are equivalent if they have the same root word and modify two nominal entities with the same thematic role. Lastly, two verb modifiers are equivalent if they have the same root word. The repeating entities are tagged with the status REPEATING.

    6.3 Ordering Constituent Clauses
    All the rhetorical relations considered in this work, mentioned in section 4, are multi-nuclear relations. So, two simple clauses connected by any of these relations, except the SEQUENCE relation, can be realized in any order. In the case of the SEQUENCE relation, an ordering constraint is imposed by the sequence of the input clauses. So, for SEQUENCE
  • 6. relation the clauses cannot be reordered. For other relations, after identifying the repeating entities, the constituent simple clauses in the resulting compound sentence are reordered on the basis of their chronological order and polarity, following the rules mentioned below:

    Figure 2: Simplified case-frame representation for the sentence "rAma eba.n shyAma bhAta khAbe." (Ram and Shyam will eat rice). Note: ∼() denotes a frame.

    • Tense: If the two constituent clauses have different tense then they are ordered chronologically. This improves the fluency of the generated compound sentence. For example, if the two clauses in (8a), linked by a CONJUNCTION relation, are aggregated without chronological ordering then (8b) is generated. But if they are ordered according to their tense and aggregated then (8c) is generated, which is more fluent and coherent than (8b).

        8. a. · Ami bA.Di yAba. (I shall go home).
              · rAma skule gechhe. (Ram has gone to school).
           b. Ami bA.Di yAba eba.n rAma skule gechhe. (I shall go home and Ram has gone to school).
           c. rAma skule gechhe eba.n Ami bA.Di yAba. (Ram has gone to school and I shall go home).

      The chronological ordering is done when the rhetorical relation between the two constituent clauses is CONJUNCTION, DISJUNCTION or CONTRAST. As the constituent simple clauses are concurrent for the PARALLEL relation, this ordering is not required.

    • Polarity: If two simple clauses have the same tense but different polarity for the verb, then the clause with negative polarity will come first in the surface form. For example, if the simple clauses in (9a), linked by a CONJUNCTION relation, are aggregated as in (9b), then the negative polarity marker nA affects both the verbs kinabe and khAbe. So, the communicative goal is not preserved. However, if the clauses are reordered and then aggregated, (9c) results, which is grammatically correct, fluent and preserves the meaning.

        9. a. rAma chakaleTa kinabe. rAma chakaleTa khAbe nA. (Ram will buy chocolate. Ram will not eat chocolate).
           b. rAma chakaleTa kinabe eba.n khAbe nA (Ram will buy chocolate and will not eat).
           c. rAma chakaleTa khAbe nA eba.n kinabe (Ram will not eat chocolate and will buy).

      The ordering based on polarity is done when the clauses are linked by either a CONJUNCTION or a DISJUNCTION relation.

    6.4 Superfluous Words Identification and Non-finite Verb Generation
    After identifying the repeating entities and ordering the constituent clauses, the superfluous words are identified using the following two methods:

    • Forward deletion: If the entities at the beginning of the surface forms of both clauses
  • 7. are REPEATING then they are marked as DELETED in the second clause. The surface forms of both clauses are traversed from left to right and REPEATING entities are marked as DELETED in the second clause until a NON-REPEATING entity is encountered. For example, the two constituent clauses in (10), linked by a CONJUNCTION relation, have REPEATING entities with the roles ke and kakhana, and they occur at the beginning of both clauses. So, the REPEATING entities are marked DELETED in the second clause, indicated by the words in bold face.

        10. rAma gatakAla khAbAra kheYechhila eba.n rAma gatakAla skule giYechhila (Ram ate food yesterday and Ram went to school yesterday).

    • Backward deletion: If the verb and the entities at the end of the surface forms of both clauses are REPEATING then they are marked as DELETED in the first clause. The surface forms of both clauses are traversed from right to left and the REPEATING verb and entities are marked as DELETED in the first clause until a NON-REPEATING entity is encountered. For example, the two constituent clauses in (11), linked by a CONJUNCTION relation, have a REPEATING verb and a REPEATING entity with the role ki, and they occur at the end of both clauses. So, the REPEATING elements are marked DELETED in the first clause, indicated by the words in bold face.

        11. rAma bhAta khAbe eba.n shyAma bhAta khAbe (Ram will eat rice and Shyam will eat rice).

    If the two simple clauses, linked by a CONJUNCTION, DISJUNCTION or CONTRAST relation, have the same role set, then the REPEATING entities are both forward deleted and backward deleted. For example, in (12) the two simple clauses, connected by a CONJUNCTION relation, have the same set of associated roles. So, the bold-faced words in the second clause are deleted forward and those in the first clause are deleted backward. However, if the role sets are different then only forward deletion is done. As the two clauses in (13), connected by a CONTRAST relation, have different role sets, only the bold-faced words in the second clause are forward deleted.

    12. rAma Aja bhAta khAbe eba.n rAma kAle bhAta khAbe (Ram will eat rice today and Ram will eat rice tomorrow).
    13. rAma Aja bhAta khAbe kintu rAma kAle bAbAra sAthe ruti khAbe (Ram will eat rice today but Ram will eat roti with father tomorrow).

    In the case of a SEQUENCE or PARALLEL relation, only forward deletion is done. In addition, the verb of the first clause is modified to its non-finite form if the subjects of both clauses are the same. For the SEQUENCE relation, the non-finite form is the perfect participle of the verb and for the PARALLEL relation, it is the progressive participle. For example, in (14a) the two clauses are linked by a SEQUENCE relation. So, first the bold-faced words in the second clause are forward deleted and then the perfect participle form of the verb of the first clause is generated. This results in the compound sentence (14b). Similarly, the two clauses in (15a), linked by a PARALLEL relation, are aggregated to (15b) by using the progressive participle of the root verb paRA.

    14. a. rAma bA.Di yAbe eba.n rAma bhAta khAbe (Ram will go home and Ram will eat rice).
        b. rAma bA.Di giYe bhAta khAbe (Ram will go home and eat rice).
    15. a. rAma bai pa.Dachhila eba.n rAma khAbAra khAchchhila (Ram was reading a book and Ram was eating food).
        b. rAma bai pa.Date pa.Date khAbAra khAchchhila (Ram was eating food while he was reading a book).

    6.5 Correct Surface Form Generation
    The redundant words are identified in the previous step but the actual deletion is done in this step. While generating the resulting compound sentence, the entities marked as DELETED are not realized, i.e. they are deleted from the surface form.

    In the case of the subject coordinating and RNR constructs, if the subjects of the two input clauses are different then the correct surface form of the common verb should be generated. For example, in (16) the surface form used for the common verb khelA
  • 8. is khelaba, which is generated by the subject of the first clause, i.e. Ami.

    16. Ami eba.n rAma kAla phuTabala khelaba (I and Ram will play football tomorrow).

    Here we give some rules for generating the correct inflectional form of the common verb for different syntactic aggregation constructs in Bengali.

    • In the case of subject coordinating, if one of the subjects is of first person then the common verb will be inflected by that first person subject. In (17), the common verb inflection yAba is generated by the first person subject Ami.
        17. Ami eba.n tumi kAla skule yAba (I and you will go to school tomorrow).
    • In the case of subject coordinating, if one of the subjects is of second person and the other is of either second or third person, then the common verb will be inflected by that second person subject. In (18), the common verb inflection yAo is generated by the second person subject tumi.
        18. tumi eba.n rAma skule yAo (You and Ram go to school).
    • In the case of subject coordinating, if both the subjects are of third person then the subject of the complete clause will inflect the common verb. In (19), both the subjects are of third person and the common verb inflection karabena is generated by the subject of the complete clause, i.e. tini.
        19. rAma eba.n tini kAjatA karabena (Ram and he will do the work).
    • In the case of an RNR construct other than subject coordinating, the subject of the complete clause will inflect the common verb. In (20), the common verb inflection khelabe is generated by the subject of the complete clause, i.e. se.
        20. Ami krikeTa eba.n se phuTabala khelabe (I shall play cricket and he will play football).

    So, following the above rules, the correct inflectional form of the common verb is generated, which increases the fluency and naturalness of the generated text.

    7 Evaluation
    We have developed a system which performs syntactic aggregation of two simple clauses by following the steps mentioned in section 6. Evaluation of that system is important to validate our approach. We performed a user-based evaluation. The system outputs were shown to human evaluators, who were asked to rate those outputs based on some parameters. The overall system performance is measured from their feedback. We evaluated the system with three human evaluators, all native speakers of Bengali. They were only given a brief idea about the rhetorical relations considered in this work. As mentioned in section 3, from a corpus of 600 compound sentences 350 were chosen randomly for the corpus study. The remaining 250 sentences were used as test sentences in the evaluation. The test sentences were segmented into constituent simple clauses. The simple clauses, the rhetorical relation connecting them, and the appropriate discourse marker realizing that relation were given to the human evaluator as the test inputs. The evaluation is performed according to the following two criteria:

    • Well-formedness: We define the well-formedness of an output sentence by its grammatical correctness and conciseness. The grammatical correctness measures the accuracy of the syntax, word order and the morphological inflections used.
    • Faithfulness: The faithfulness of an output measures how well the communication goal is preserved by the generated output.

    For both measures, the evaluators were asked to score the outputs on a scale of 1 to 5, where 1 is the best and 5 is the worst. The scoring for well-formedness and faithfulness was done separately by each evaluator so that the score of one does not influence the score of the other. The results of each evaluator for well-formedness and faithfulness are shown in Figure 3 and Figure 4 respectively.

    To calculate the overall performance of the system, the scores given by the individual evaluators were combined as follows: if two or more evaluators have given a common score to a test sentence then it is assigned that common score; if all the evaluators have given different scores to a test sentence then it is not considered for the overall performance calculation.
  • 9. The overall performance of our system for well-formedness and faithfulness is shown in Figure 5 and Figure 6 respectively.

    Figure 3: Well-formedness Bar Graph
    Figure 4: Faithfulness Bar Graph
    Figure 5: Well-formedness Pie Chart
    Figure 6: Faithfulness Pie Chart

    The inconsistencies with respect to well-formedness of the system-generated output are mainly due to errors in word ordering and conciseness. For example, the two clauses in (21a) are connected by a SEQUENCE relation and the system syntactically aggregates them to (21b). But (21b) is very good in terms of word ordering and conciseness.

    21. a. rahima ekadina rAstAYa bhi.Da dekhechhila. rahimera mAthA ghure giYechhila (One day Rahim saw a huge mass in the street. Rahim was moved by that).
        b. rahima ekadina rAstAYa bhi.Da dekhechhila eba.n tAra mAthA ghure giYechhila (One day Rahim saw a huge mass in the street and he was moved by that).

    The errors regarding the faithfulness measure are due to wrong ordering of the constituent clauses and the absence of cues which indicate emphasis and prosody. For example, the two clauses in (22a), connected by a CONJUNCTION relation, are aggregated to (22b). But the output is ambiguous in terms of faithfulness, as both the verbs are now in the scope of the words bAbAra sAthe.

    22. a. rAma bAbAra sAthe khAbAra khAbe. rAma Tibhi dekhabe (Ram will eat food with father. Ram will watch TV).
        b. rAma bAbAra sAthe khAbAra khAbe eba.n Tibhi dekhabe (Ram will eat food with father and watch TV).

    8 Conclusion
    In this article, we have shown our methods to generate aggregated and elliptic sentences in Bengali
from clause-sized semantic representations. The current system can produce paratactic constructions and use ellipsis to omit repeated entities. We were able to produce all the desired forms of syntactic aggregation (see Section 3), though there is scope for improvement.

Deletion of the repeating words in the generated output sentence sometimes does not preserve meaning. In that case, to make the text fluent, anaphoric pronouns need to be used. For example, if the two clauses in (23a), connected by a CONJUNCTION relation, are aggregated by removing the repeating words in boldface, then the actual communicative goal is not preserved. Instead, these two clauses are correctly aggregated to (23b) by using the anaphoric pronoun tAra.

23. a. Ami rAmer sAthe phuTabala khelaba eba.n yadu rAmer sAthe sinemA dekhabe (I shall play football with Ram and Jadu will see a movie with Ram).
    b. Ami rAmer sAthe phuTabala khelaba eba.n yadu tAra sAthe sinemA dekhabe (I shall play football with Ram and Jadu will see a movie with him).

The current system takes the discourse marker as input for combining simple clauses. But it can be extended to select the appropriate discourse marker depending upon the rhetorical relation and other functional information such as polarity, prosody, emphasis etc.

The system can be extended to aggregate more than two simple clauses. In that case the document structure tree (Reiter and Dale, 2000) will be the input. Clauses can be aggregated according to the specification of the document structure tree unless the complexity of a single sentence exceeds a predefined threshold. Depending upon the resulting sentence complexity and other contextual information, a sentence break may be declared, resulting in multi-sentential text.

In our future work, we intend to handle the above-mentioned limitations to generate more natural Bengali text.

Acknowledgement

We would like to thank the anonymous reviewers for valuable comments. We would also like to thank Mr. Plaban Kumar Bhowmik and Mr. Sibansu Mukhopadhyay for their valuable advice and support. This work is supported by the project Sanyog - Phase II, funded by Media Lab Asia, and conducted in the Communication Empowerment Laboratory, Indian Institute of Technology.

References

Samit Bhattacharya. 2004. Sanyog: An iconic system for multilingual communication for people with speech and motor impairments. M.S. Thesis, IIT, Kharagpur, Supervisor-Basu, A, Sarkar, Sudeshna.

Hercules Dalianis and Eduard H. Hovy. 1993. Aggregation in natural language generation. In EWNLG ’93, Proceedings of the 4th European Workshop on Natural Language Generation, Pisa, Italy.

H. Dalianis. 1996. Aggregation as a subtask of text and sentence planning. In J.H. Stewman (ed.), Proceedings of Florida AI Research Symposium, FLAIRS-96, pages 1–5, Key West, Florida.

Helmut Horacek. 1992. An integrated view of text planning. In Proceedings of the 6th International Workshop on Natural Language Generation, pages 29–44, London, UK. Springer-Verlag.

William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281.

Feikje Hielkema, Marit Theune and Petra Hendriks. 2006. Performing aggregation and ellipsis using discourse structures. Research on Language and Computation, 4(4):353–375.

M. Reape and C. Mellish. 1999. Just what is aggregation anyway. In Proceedings of the 7th European Workshop on Natural Language Generation, pages 20–29, May.

Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Cambridge University Press, New York, NY, USA.

James Chi-Kuei Shaw. 2002. Clause aggregation: an approach to generating concise text. Ph.D. thesis, New York, NY, USA. Sponsor-Mckeown, Kathleen R.

John Wilkinson. 1995. Aggregation in natural language generation: Another look. Technical report, Computer Science Department, University of Waterloo.
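As a supplementary illustration, the score-combination rule used in the evaluation above (a test sentence receives the score on which at least two evaluators agree, and is dropped when all evaluators disagree) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names `combine_scores` and `score_distribution`, the use of integer scores, and the handling of ties between two equally frequent scores (treated here as "no agreement") are assumptions that the paper does not specify.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def combine_scores(scores: Sequence[int]) -> Optional[int]:
    """Combine the scores given by the evaluators to one test sentence.

    Returns the score shared by at least two evaluators, or None when
    the sentence should be left out of the overall calculation.
    """
    counts = Counter(scores).most_common()
    if not counts:
        return None
    top_score, top_count = counts[0]
    # All evaluators gave different scores: exclude the sentence.
    if top_count < 2:
        return None
    # Assumption: two different scores with equal support (only possible
    # with four or more evaluators) also count as "no agreement".
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    return top_score


def score_distribution(per_sentence_scores: Iterable[Sequence[int]]) -> Counter:
    """Count how many test sentences ended up with each combined score."""
    combined = (combine_scores(s) for s in per_sentence_scores)
    return Counter(c for c in combined if c is not None)


# Hypothetical ratings from three evaluators for three test sentences.
ratings = [[4, 4, 3], [1, 2, 3], [5, 5, 5]]
distribution = score_distribution(ratings)
```

A distribution of this kind is what a bar graph or pie chart of well-formedness or faithfulness scores would be drawn from; sentences on which no two evaluators agree simply do not appear in it.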