This document discusses syntactic aggregation in Bengali text generation. It analyzes a corpus of Bengali sentences to identify common syntactic aggregation constructs, including paratactic and elliptic constructions. It then proposes an approach that syntactically aggregates two simple Bengali clauses into a more fluent compound sentence based on the identified constructs. The approach takes as input the constituent clauses, their rhetorical relation, and the connecting discourse marker, and generates the aggregated sentence.
Syntactic Aggregation in Bengali Text Generation
Sumit Das, Anupam Basu, Sudeshna Sarkar
Department of Computer Science and Engineering,
Indian Institute of Technology, Kharagpur, India – 721302
sumit.jucse@gmail.com,{anupam,sudeshna}@cse.iitkgp.ernet.in
Proceedings of ICON-2009: 7th International Conference on Natural Language Processing
Macmillan Publishers, India. Also accessible from http://ltrc.iiit.ac.in/proceedings/ICON-2009

Abstract
The quality of the sentences generated by a natural language generation system can be evaluated based on their well-formedness (fluency, conciseness and coherence) and faithfulness to the communication intent. In this paper, we explore the prevalent syntactic aggregation constructs in Bengali and present an approach towards generating Bengali compound sentences using the identified constructs. The inputs to our syntactic aggregation method are the constituent simple sentences, the rhetorical relations defined over them and the discourse markers realizing the relations. The paper describes a rule based approach to form the compound sentences, by reorganization of components followed by elimination of redundancies of lexical entities, and presents a user based evaluation of the results obtained.

1 Introduction
Any Natural Language Generation (NLG) system should have the capability to remove unnecessary repetitions when generating text. Unnecessary repetitions make the text less fluent and less coherent. In NLG, the task of combining constituent simpler text spans by removing repetitions is called aggregation. According to the standard three-stage pipeline NLG architecture proposed by Reiter and Dale (2000), aggregation is a basic task of any NLG system for generating fluent, concise, and coherent text. Dalianis (1993) viewed aggregation mainly as a redundancy elimination problem that should be solved in such a way that the original meaning of the text is preserved and no undesirable implication is produced. For example, the two text spans in (1a), linked by a CONJUNCTION rhetorical relation (Mann and Thompson, 1988), can be combined as in (1b). But (1b) contains unnecessary repetitions shown by the words in bold. So, these can be aggregated to produce (1c), which is more fluent, concise, and coherent than (1b).

1. a. * Jack went up the hill.
      * Jill went up the hill.
   b. Jack went up the hill and Jill went up the hill.
   c. Jack and Jill went up the hill.

Syntactic aggregation is the most common form of aggregation observed in any real discourse. Shaw (2002) proposed that in syntactic aggregation simpler linguistic components are combined in accordance with linguistic rules. As it is a language dependent process, linguistic knowledge, such as preferred word ordering, special verb form usage etc., is required for combining text spans. For example, in Bengali the two simple text spans in (2a), linked by a SEQUENCE rhetorical relation, can be simply combined using the appropriate discourse marker eba.n as in (2b). But in (2b), the word in bold is redundant. So, applying the conjunction reduction construct, the two text spans can be aggregated to generate (2c). But (2c) can further be aggregated to (2d) by using the non-finite verb giYe.

2. a. * rAma mAThe giYechhila¹ (Ram went to the playground).
      * rAma phuTabala khelechhila (Ram played football).
   b. rAma mAThe giYechhila eba.n rAma phuTabala khelechhila (Ram went to the playground and Ram played football).
   c. rAma mAThe giYechhila eba.n phuTabala khelechhila (Ram went to the playground and played football).
   d. rAma mAThe giYe phuTabala khelechhila. (Ram went to the playground and played football).

Clearly, to syntactically aggregate smaller text spans in Bengali an NLG system should have the knowledge of Bengali grammar.
In this work, we have studied a corpus of Bengali sentences to identify the prevalent syntactic aggregation constructs in Bengali. Then, we have proposed a method to syntactically aggregate two simple clauses using the constructs identified to generate a more fluent, concise and coherent compound sentence. The inputs are two simple clauses, the rhetorical relation between them and the discourse marker realizing that relation.
The rest of this paper is organized as follows: In Section 2, we briefly mention the related work in syntactic aggregation. In Section 3, we present a corpus analysis to identify the prevalent syntactic aggregation constructs in Bengali. Rhetorical relations considered in this work are mentioned in Section 4 and the semantic representation used is described in Section 5. We describe our approach in Section 6 and the evaluation methods in Section 7. In Section 8, concluding remarks and some future scopes relevant to this work are provided.

¹ In this paper, Bengali graphemes are written using Roman script in ITRANS notation. They are written in italic font.

2 Related Work
There does not exist any general consensus regarding the exact definition of aggregation, the types of aggregation or the component of an NLG system where aggregation tasks should be performed. The general approach is to handle the aggregation tasks in a domain and application specific way.
Dalianis (1993; 1996) equated aggregation with the process of redundancy elimination. He divided it into four principal categories: syntactic, elision, lexical, and referential aggregation. In syntactic aggregation repetitions are removed syntactically, leaving one item (at least) in the text to express the meaning explicitly.
Wilkinson (1995) contradicted Dalianis's view of equating text aggregation with redundancy elimination because in certain contexts it can be done by using suitable referring expression generation. Apart from redundancy elimination, aggregation choices can affect other characteristics of text, such as sentence complexity, focus, emphasis, theme/rheme, prosody etc.
Reape and Mellish (1999) defined aggregation as a process to generate more concise, cohesive, and fluent text by omitting or substituting repeating entities where the reader can infer the deleted entities from the remaining text. Reape and Mellish distinguished among different types of aggregation: conceptual, discourse, semantic, syntactic, lexical, and referential. According to them syntactic aggregation is the most common and can be stated by some grouping rules, like subject grouping, predicate grouping etc.
Horacek (1992) has given a more theoretical view of aggregation. He explained it by some grouping phenomena, like content based grouping and structurally motivated propositional grouping.
Shaw (2002) categorized aggregation into four types: interpretive, referential, syntactic, and lexical. He focused mainly on syntactic aggregation. He divided syntactic aggregation into two types: hypotactic and paratactic. In paratactic aggregation all the constituent text spans are of equal status. On the other hand, in hypotactic aggregation the constituent text spans are related by some subordinate relation.
In the Virtual Storyteller project (Marit Theune and Hendriks, 2006) different conjunctive and elliptical constructs were used to syntactically aggregate simpler text spans to generate more coherent and concise fairy-tales.
All the works in the area of text aggregation encountered so far are focused on English and other European languages. In this work, we have proposed methods to perform syntactic aggregation in Bengali text generation.

3 Corpus Analysis
We conducted a corpus analysis to identify the prevalent syntactic aggregation constructs used in Bengali for generating compound sentences. For this we have chosen text of narrative style because narrative texts are mainly activity or event driven. So, it is easier to model the different types of aggregation construct in narrative text. We have a corpus of 600 compound sentences collected from Bengali story books. We have randomly chosen 350 sentences from that corpus for analysis. First the selected compound sentences
were segmented into simple clauses. A simple clause is equivalent to a simple sentence which contains only one finite verb and no coordinating conjunction. For example, the compound sentence rAma eba.n shyAma kAla skule giYechhila (Ram and Shyam went to school yesterday) contains 2 simple clauses: rAma kAla skule giYechhila (Ram went to school yesterday) and shyAma kAla skule giYechhila (Shyam went to school yesterday). By decomposing the 350 compound sentences, we got 868 simple clauses (2.48 simple clauses per sentence). This measure is important to determine the maximum number of simple clauses that can be aggregated in a single sentence. We cannot keep on aggregating an arbitrarily large number of simple clauses even if they are syntactically similar, since it may result in overly complex and less fluent text. From the corpus analysis, we have identified two types of frequently used syntactic aggregation constructs in Bengali, namely the paratactic construct and the elliptic construct.
• Simple paratactic construction: In this case, the two constituent simple clauses are simply connected by the conjunctive discourse marker and no word deletion is required.
  – rAma ekatA boi paRachhila eba.n shyAma phuTabala khelachhila (Ram was reading a book and Shyam was playing football).
• Elliptic construction: Ellipsis is defined as the omission of superfluous words from the surface form which are inferable from the entities in the remaining text. The different elliptic constructs observed in Bengali are:
  – Conjunction reduction: In conjunction reduction, the subject of the second simple clause is deleted.
    * rAma khAbAra kheYechhe eba.n bandhudera sAthe sinemA dekhate gechhe (Ram has eaten food and gone to see a movie with friends).
    In the example given above, the subject of the second simple clause, i.e., rAma, is deleted using the conjunction reduction construct.
  – Right node raising (RNR): In RNR, the rightmost portion of the first simple clause is deleted.
    * rAma bhAta eba.n shyAma ruTi khAbe (Ram will eat rice and Shyam will eat roti).
    Here the rightmost portion of the first proposition (khAbe) is deleted.
  – Coordinating one constituent: In this case, one constituent entity from each of the input simple clauses is coordinated by a conjunction. This can happen to any entity of the constituent simple clauses.
    * rAma eba.n shyAma phuTabala khelachhila (Ram and Shyam were playing football).
    The subjects of the two constituent simple clauses in the above example are coordinated.
  – Non-finite verb generation: If both the input simple clauses are about events or actions performed sequentially or concurrently by the same subject, then they are aggregated using the non-finite form of the verb of the first simple clause.
    * rAma bhAta kheYe skule yAbe (Ram will eat rice and go to school).
    In the above example, the two constituent simple clauses are about two actions performed sequentially by the same subject. So, the perfect participle form of the verb khAoYA, i.e., kheYe, is used for aggregation.
  Any combination of the above four types of elliptic constructs is also allowed. For example, in (3) both conjunction reduction and RNR are used and (4) is generated by using both conjunction reduction and non-finite verb.
  3. rAma bhAta eba.n mAchha khAbe (Ram will eat rice and fish).
  4. rAma skule giYe phuTabala khelabe (Ram will go to school and play football).
In summary, though for the corpus study we have considered only narrative Bengali text, it is part of a more general approach. As syntactic aggregation is a language-dependent but domain-independent task (Shaw, 2002), the contributions of this work can be extended to generate aggregated text in Bengali in other domains as well.
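As a rough illustration of two of the elliptic constructs described in this section, the following Python sketch applies conjunction reduction and right node raising to clauses given as plain word lists. The function names, the word-list input, and the assumption that the subject is the single first word of a clause are illustrative only; they are not the representation or implementation used in this work.

```python
def conjunction_reduction(first, second, marker="eba.n"):
    """Drop the subject of the second clause when it repeats the first subject.

    Assumption (illustrative): a clause is a list of words and the subject
    is the single word in position 0.
    """
    if first and second and first[0] == second[0]:
        second = second[1:]
    return " ".join(first + [marker] + second)


def right_node_raising(first, second, marker="eba.n"):
    """Delete the rightmost words of the first clause that the second repeats."""
    shared = 0
    # Count how many trailing words the two clauses have in common.
    while (shared < min(len(first), len(second))
           and first[-1 - shared] == second[-1 - shared]):
        shared += 1
    kept = first[:len(first) - shared]
    return " ".join(kept + [marker] + second)


# The conjunction reduction example of this section.
print(conjunction_reduction(
    "rAma khAbAra kheYechhe".split(),
    "rAma bandhudera sAthe sinemA dekhate gechhe".split()))
# rAma khAbAra kheYechhe eba.n bandhudera sAthe sinemA dekhate gechhe

# The RNR example of this section.
print(right_node_raising("rAma bhAta khAbe".split(),
                         "shyAma ruTi khAbe".split()))
# rAma bhAta eba.n shyAma ruTi khAbe
```

Working on surface words keeps the sketch short, but it cannot tell a subject from any other leading word, which is one reason a role-annotated representation is preferable in practice.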
4 Rhetorical Relations Considered
From the corpus study, we know that paratactic aggregations are the most common form of syntactic aggregation in Bengali. In paratactic aggregation, the constituent text spans are of equal status and are linked by a multi-nuclear rhetorical relation (Mann and Thompson, 1988). In this work, we have focused on the different paratactic constructs for syntactic aggregation of Bengali text. The multi-nuclear rhetorical relations considered in this paper are CONJUNCTION, DISJUNCTION, CONTRAST, and SEQUENCE as defined by the original Rhetorical Structure Theory (RST). In addition to these relations, we have considered another multi-nuclear temporal coherence relation, PARALLEL, as defined below:
  Two text spans are said to be related by the PARALLEL relation if the actions or the events in those two text spans are occurring simultaneously.
For example, the two constituent clauses present in (5) are rAma khAbAra khAchchhila (Ram was eating food) and rAma Tibhi dekhachhila (Ram was watching TV). The actions in these two clauses are concurrent. So, the coherence relation between them is PARALLEL.
5. rAma khAbAra khete khete Tibhi dekhachhila (Ram was watching TV while eating food).
5 The Semantic Representation
The semantic representation chosen here is a case-frame representation, also called a predicate-argument representation. The basic building block in this representation is the sentence. An example of the sentence frame is given in Figure 1. A sentence contains a clause frame and a clause-count which denotes the number of simple clauses present in the sentence. The clause is a recursive structure that can contain clauses inside itself, which makes it capable of representing both simple and composite (compound and complex) sentences. For a simple sentence, the outer clause contains only one inner clause. On the other hand, for a composite sentence the outer clause contains the constituent inner clauses along with the rhetorical relation (rh-rel) connecting them and the discourse marker (dm) realizing that rhetorical relation. A clause frame contains a predicate frame (pre) and a list of argument frames (arg). The pre frame contains verb-related information, such as verb root (v-root), theme, tense, aspect, mood, polarity etc. The arg frame contains the nominal entities along with the thematic role of each entity in that clause. If there is any modifier for the verb or any nominal entity in a clause, then the respective modifier frames (v-mod and w-mod frame) are present inside the corresponding pre and arg frame.
Figure 1: Case-frame representation for the sentence “rAma pa.Dachhila eba.n shyAma khelachhila.” (Ram was reading and Shyam was playing).
6 Proposed Approach
In our approach for syntactic aggregation, the inputs are two simple clauses, the rhetorical relation between them, and the discourse marker realizing that relation. To syntactically aggregate the two simple clauses by using the different paratactic constructs identified in section 3 we propose
the following steps:
• Step 1: Ordering arguments in the constituent clauses.
• Step 2: Repeating entity identification.
• Step 3: Ordering constituent clauses.
• Step 4: Superfluous word deletion and non-finite verb generation.
• Step 5: Correct surface form generation.
The above steps are described below.
6.1 Argument Ordering in the Constituent Clauses
Preferred word ordering in a sentence varies with languages and it is very important for syntactic aggregation. Though Bengali is a free-word-order language, the preferred word ordering in a Bengali sentence is subject-object-verb.
In this work, the input simple clauses are taken in their corresponding semantic case-frame representation as shown in Figure 1. The arg frames in the clause are then ordered by using a total order among the roles associated with the arg frames. These roles are neither semantic roles nor Paninian roles. The problem with both semantic and Paninian roles is that none of them can be associated with a unique postposition, which is very important for generating sentences in Bengali. So the alternative approach is to design an intermediate representation that has sufficient granularity of roles, such that ambiguous assignments of postpositions are not possible. Bengali has a list of postpositions that are used in different contexts to convey different semantics. In this work, roles have been designed at a granularity level where one role is assigned to a semantically unique postposition. For developing the total order of the roles, we have followed an approach taken in the SANYOG system (Bhattacharya, 2004). We have taken the constituent simple clauses of the compound sentences used for corpus analysis. Each simple clause was represented in its case-frame representation and the arg frames inside it were then ordered as they appear in the surface form of the clause. In this way, the ordering among the roles of the arg frames in a clause is known. For example, the role set for (6) is {ke, kothAYa, kakhana}. From (6) we can infer that the preferred order among these roles is ke < kakhana < kothAYa. The role on the left side of < will appear before the role on the right side in the surface form.
6. Ami AgAmIkAla skule yAba (I shall go to school tomorrow).
Again, in (7) the role set is {ke, kothAYa, kakhana, kAra sAthe}. By using (7) the total order obtained from (6) can be extended to ke < kakhana < kAra sAthe < kothAYa.
7. Ami AgAmIkAla bAbAra sAthe skule yAba (Tomorrow I shall go to school with my father).
Using the above method for the entire set of simple clauses we have identified the set of possible roles in Bengali and developed a total order among them. The arg frames in the input simple clauses are ordered using the developed total order.
6.2 Repeating Entity Identification
In our current approach, to remove the redundant entities we first identify the repeating entities present in both the simple clauses taken as input. We assume that two nominal entities are equivalent if they have the same thematic role and root word in the constituent simple clauses. For example, in the simplified semantic representation of the compound sentence shown in Figure 2, the constituent simple clauses have one repeating nominal entity. In both the simple clauses, the thematic role of that entity is ki and its surface form is bhAta. Two verbs are equivalent if they have the same root word and the same functional parameters, such as tense, aspect, mood, polarity etc. In Figure 2, the verbs are equivalent and thus repeating. Two noun modifiers are equivalent if they have the same root word and modify two nominal entities with the same thematic role. Lastly, two verb modifiers are equivalent if they have the same root word. The repeating entities are tagged with the status REPEATING.
6.3 Ordering Constituent Clauses
All the rhetorical relations considered in this work, mentioned in section 4, are multi-nuclear relations. So, two simple clauses connected by any of these relations, except the SEQUENCE relation, can be realized in any order. In case of the SEQUENCE relation, an ordering constraint is imposed by the sequence of the input clauses. So, for SEQUENCE
relation the clauses cannot be reordered. For other relations, after identifying the repeating entities, the constituent simple clauses in the resulting compound sentence are reordered on the basis of their chronological order and polarity following the rules mentioned below:
• Tense: If the two constituent clauses have different tenses then they are ordered chronologically. This improves the fluency of the generated compound sentence. For example, if the two clauses in (8a), linked by the CONJUNCTION relation, are aggregated without chronological ordering then (8b) is generated. But if they are ordered according to their tense and aggregated then (8c) is generated, which is more fluent and coherent than (8b).
  8. a. · Ami bA.Di yAba. (I shall go home).
        · rAma skule gechhe. (Ram has gone to school).
     b. Ami bA.Di yAba eba.n rAma skule gechhe. (I shall go home and Ram has gone to school).
     c. rAma skule gechhe eba.n Ami bA.Di yAba. (Ram has gone to school and I shall go home).
  The chronological ordering is done when the rhetorical relation between the two constituent clauses is CONJUNCTION, DISJUNCTION or CONTRAST. As the constituent simple clauses are concurrent for the PARALLEL relation, this ordering is not required.
• Polarity: If two simple clauses have the same tense but different polarity for the verb then the clause with negative polarity will come first in the surface form. For example, if the simple clauses in (9a), linked by the CONJUNCTION relation, are aggregated as in (9b) then the negative polarity marker nA affects both the verbs kinabe and khAbe. So, the communicative goal is not preserved. However, if the clauses are reordered and then aggregated, (9c) results, which is grammatically correct, fluent and preserves the meaning.
  9. a. rAma chakaleTa kinabe. rAma chakaleTa khAbe nA. (Ram will buy chocolate. Ram will not eat chocolate).
     b. rAma chakaleTa kinabe eba.n khAbe nA (Ram will buy chocolate and will not eat).
     c. rAma chakaleTa khAbe nA eba.n kinabe (Ram will not eat chocolate and will buy).
  The ordering based on polarity is done when the clauses are linked by either the CONJUNCTION or the DISJUNCTION relation.
Figure 2: Simplified case-frame representation for the sentence “rAma eba.n shyAma bhAta khAbe.” (Ram and Shyam will eat rice). Note: ∼() denotes a frame.
6.4 Superfluous Words Identification and Non-finite Verb Generation
After identifying the repeating entities and ordering the constituent clauses, the superfluous words are identified using the following two methods:
• Forward deletion: If the entities at the beginning of the surface forms of both clauses
are REPEATING then they are marked as DELETED in the second clause. Surface forms of both the clauses are traversed from left to right and REPEATING entities are marked as DELETED in the second clause until a NON-REPEATING entity is encountered. For example, the two constituent clauses in (10), linked by the CONJUNCTION relation, have REPEATING entities with the roles ke and kakhana and they occur at the beginning of both the clauses. So, the REPEATING entities (rAma gatakAla, set in bold face in the original) are marked DELETED in the second clause.
  10. rAma gatakAla khAbAra kheYechhila eba.n rAma gatakAla skule giYechhila (Ram ate food yesterday and Ram went to school yesterday).
• Backward deletion: If the verb and the entities at the end of the surface forms of both clauses are REPEATING then they are marked as DELETED in the first clause. Surface forms of both the clauses are traversed from right to left and the REPEATING verb and entities are marked as DELETED in the first clause until a NON-REPEATING entity is encountered. For example, the two constituent clauses in (11), linked by the CONJUNCTION relation, have a REPEATING verb and a REPEATING entity with the role ki and they occur at the end of both the clauses. So, the REPEATING elements (bhAta khAbe, set in bold face in the original) are marked DELETED in the first clause.
  11. rAma bhAta khAbe eba.n shyAma bhAta khAbe (Ram will eat rice and Shyam will eat rice).
If the two simple clauses, linked by the CONJUNCTION, DISJUNCTION or CONTRAST relation, have the same role set then the REPEATING entities are both forward deleted and backward deleted. For example, in (12) the two simple clauses, connected by the CONJUNCTION relation, have the same set of associated roles. So, the repeated subject rAma in the second clause is deleted forward and the repeated words bhAta khAbe in the first clause are deleted backward. However, if the role sets are different then only forward deletion is done. As the two clauses in (13), connected by a CONTRAST relation, have different role sets, only the repeated subject rAma in the second clause is forward deleted.
12. rAma Aja bhAta khAbe eba.n rAma kAle bhAta khAbe (Ram will eat rice today and Ram will eat rice tomorrow).
13. rAma Aja bhAta khAbe kintu rAma kAle bAbAra sAthe ruTi khAbe (Ram will eat rice today but Ram will eat roti with father tomorrow).
In case of the SEQUENCE or PARALLEL relation, only forward deletion is done. In addition to that, the verb of the first clause is changed to its non-finite form if the subjects of both the clauses are the same. For the SEQUENCE relation, the non-finite form is the perfect participle of the verb and for the PARALLEL relation, it is the progressive participle. For example, in (14a) the two clauses are linked by the SEQUENCE relation. So, first the repeated subject rAma in the second clause is forward deleted and then the perfect participle form of the verb of the first clause is generated. This results in the compound sentence (14b). Similarly, the two clauses in (15a), linked by the PARALLEL relation, are aggregated to (15b) by using the progressive participle of the root verb paRA.
14. a. rAma bA.Di yAbe eba.n rAma bhAta khAbe (Ram will go home and Ram will eat rice).
    b. rAma bA.Di giYe bhAta khAbe (Ram will go home and eat rice).
15. a. rAma bai pa.Dachhila eba.n rAma khAbAra khAchchhila (Ram was reading a book and Ram was eating food).
    b. rAma bai pa.Date pa.Date khAbAra khAchchhila (Ram was eating food while he was reading a book).
6.5 Correct Surface Form Generation
The redundant words are identified in the previous step but the actual deletion is done in this step. While generating the resulting compound sentence, the entities marked as DELETED are not realized, i.e., they are deleted from the surface form.
In case of subject coordinating and RNR constructs, if the subjects of the two input clauses are different then the correct surface form of the common verb should be generated. For example, in (16) the surface form used for the common verb khelA
is khelaba, which is determined by the subject of the first clause, i.e., Ami.
16. Ami eba.n rAma kAla phuTabala khelaba (I and Ram will play football tomorrow).
Here we give some rules for generating the correct inflectional form of the common verb for different syntactic aggregation constructs in Bengali.
• In case of subject coordinating, if one of the subjects is of first person then the common verb will be inflected by that first person subject. In (17) the common verb inflection yAba is generated by the first person subject Ami.
  17. Ami eba.n tumi kAla skule yAba (I and you will go to school tomorrow).
• In case of subject coordinating, if one of the subjects is of second person and the other is of either second or third person then the common verb will be inflected by that second person subject. In (18) the common verb inflection yAo is generated by the second person subject tumi.
  18. tumi eba.n rAma skule yAo (You and Ram go to school).
• In case of subject coordinating, if both the subjects are of third person then the subject of the complete clause will inflect the common verb. In (19) both the subjects are of third person and the common verb inflection karabena is generated by the subject of the complete clause, i.e., tini.
  19. rAma eba.n tini kAjatA karabena (Ram and he will do the work).
• In case of an RNR construct other than subject coordinating, the subject of the complete clause will inflect the common verb. In (20) the common verb inflection khelabe is generated by the subject of the complete clause, i.e., se.
  20. Ami krikeTa eba.n se phuTabala khelabe (I shall play cricket and he will play football).
So, following the above rules the correct inflectional form of the common verb is generated, which increases the fluency and naturalness of the generated text.
7 Evaluation
We have developed a system which performs syntactic aggregation of two simple clauses by following the steps mentioned in section 6. Evaluation of that system is important to validate our approach. We performed a user-based evaluation. The system outputs were shown to human evaluators, who were asked to rate those outputs on the criteria given below. The overall system performance is measured from their feedback. We evaluated the system with three human evaluators, all native speakers of Bengali. They were given only a brief idea about the rhetorical relations considered in this work. As mentioned in section 3, from a corpus of 600 compound sentences 350 were chosen randomly for the corpus study. The remaining 250 sentences were used as test sentences in the evaluation. The test sentences were segmented into constituent simple clauses. The simple clauses, the rhetorical relation connecting them, and the appropriate discourse marker realizing that relation were given to the system as the test inputs. The evaluation is performed on the following two criteria:
• Well-formedness: We define the well-formedness of an output sentence by its grammatical correctness and conciseness. The grammatical correctness measures the accuracy of the syntax, word order and the morphological inflections used.
• Faithfulness: The faithfulness of an output measures how well the communicative goal is preserved by the generated output.
For both the measures, the evaluators were asked to score the outputs on a scale of 1 to 5, where 1 is the best and 5 is the worst. The scoring for well-formedness and faithfulness was done separately by each evaluator so that the score of one does not influence the score of the other. The results of each evaluator for well-formedness and faithfulness are shown in Figure 3 and Figure 4 respectively.
To calculate the overall performance of the system, the scores given by the individual evaluators were combined as follows: if two or more evaluators have given a common score to a test sentence then the sentence is assigned that common score; if all the evaluators have given different scores to a test sentence then it is not considered for the overall performance calculation. The overall performance of our system for well-formedness and faithfulness is shown in Figure 5 and Figure 6 respectively.
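The combination rule above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample ratings are hypothetical and are not taken from the evaluation data, and the rule is assumed to be applied to exactly three scores per sentence, as in this evaluation.

```python
from collections import Counter


def combine_scores(scores):
    """Return the score shared by two or more evaluators, or None if all differ."""
    score, count = Counter(scores).most_common(1)[0]
    return score if count >= 2 else None


def overall_distribution(all_scores):
    """Count combined scores over test sentences, skipping those without agreement."""
    combined = (combine_scores(s) for s in all_scores)
    return Counter(c for c in combined if c is not None)


# Hypothetical ratings (1 = best, 5 = worst) from three evaluators for four sentences.
ratings = [(1, 1, 2), (2, 3, 2), (1, 2, 3), (4, 4, 4)]
print(overall_distribution(ratings))
```

In this made-up sample the third sentence is dropped because all three evaluators disagree, so only three sentences contribute to the distribution.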
Figure 3: Well-formedness Bar Graph
Figure 4: Faithfulness Bar Graph
Figure 5: Well-formedness Pie Chart
Figure 6: Faithfulness Pie Chart
The inconsistencies with respect to well-formedness of the system-generated output are mainly due to errors in word ordering and conciseness. For example, the two clauses in (21a) are connected by the SEQUENCE relation and the system syntactically aggregates them to (21b). But (21b) is not very good in terms of word ordering and conciseness.
21. a. rahima ekadina rAstAYa bhi.Da dekhechhila. rahimera mAthA ghure giYechhila (One day Rahim saw a huge mass in the street. Rahim was moved by that).
    b. rahima ekadina rAstAYa bhi.Da dekhechhila eba.n tAra mAthA ghure giYechhila (One day Rahim saw a huge mass in the street and he was moved by that).
The errors regarding the faithfulness measure are due to wrong ordering of the constituent clauses and the absence of cues which indicate emphasis and prosody. For example, the two clauses in (22a), connected by the CONJUNCTION relation, are aggregated to (22b). But the output is ambiguous in terms of faithfulness as both the verbs are now in the scope of the words bAbAra sAthe.
22. a. rAma bAbAra sAthe khAbAra khAbe. rAma Tibhi dekhabe (Ram will eat food with father. Ram will watch TV).
    b. rAma bAbAra sAthe khAbAra khAbe eba.n Tibhi dekhabe (Ram will eat food with father and watch TV).
8 Conclusion
In this article, we have shown our methods to generate aggregated and elliptic sentences in Bengali
from clause-sized semantic representations. The current system can produce paratactic constructions and use ellipsis to omit repeated entities. We were able to produce all the desired forms of syntactic aggregation (see Section 3), though there is scope for improvement.
Deletion of the repeating words in the generated output sentence sometimes does not preserve meaning. In that case, to make the text fluent, anaphoric pronouns need to be used. For example, if the two clauses in (23a), connected by the CONJUNCTION relation, are aggregated by removing the repeating words (the second rAmer sAthe, set in bold face in the original) then the actual communicative goal is not preserved. Instead, these two clauses are correctly aggregated to (23b) by using the anaphoric pronoun tAra.
23. a. Ami rAmer sAthe phuTabala khelaba eba.n yadu rAmer sAthe sinemA dekhabe (I shall play football with Ram and Jadu will see a movie with Ram).
    b. Ami rAmer sAthe phuTabala khelaba eba.n yadu tAra sAthe sinemA dekhabe (I shall play football with Ram and Jadu will see a movie with him).
The current system takes the discourse marker as input for combining simple clauses. But it can be extended to select the appropriate discourse marker depending upon the rhetorical relation and other functional information such as polarity, prosody, emphasis etc.
The system can be extended to aggregate more than two simple clauses. In that case the document structure tree (Reiter and Dale, 2000) will be the input. Clauses can be aggregated according to the specification of the document structure tree unless the complexity of a single sentence exceeds a predefined threshold. Depending upon the resulting sentence complexity and other contextual information, a sentence break may be declared, resulting in multi-sentential text.
In our future work, we intend to handle the above-mentioned limitations to generate more natural Bengali text.
Acknowledgement
We would like to thank the anonymous reviewers for their valuable comments. We would also like to thank Mr. Plaban Kumar Bhowmik and Mr. Sibansu Mukhopadhyay for their valuable advice and support. This work is supported by the project Sanyog - Phase II, funded by Media Lab Asia, and conducted in the Communication Empowerment Laboratory, Indian Institute of Technology.
References
Samit Bhattacharya. 2004. Sanyog: An iconic system for multilingual communication for people with speech and motor impairments. M.S. Thesis, IIT, Kharagpur, Supervisor-Basu, A, Sarkar, Sudeshna.
Hercules Dalianis and Eduard H. Hovy. 1993. Aggregation in natural language generation. In EWNLG ’93, Proceedings of the 4th European Workshop on Natural Language Generation, Pisa, Italy.
H. Dalianis. 1996. Aggregation as a subtask of text and sentence planning. In J.H.Stewman (ed.), Proceedings of Florida AI Research Symposium, FLAIRS-96, pages 1–5, Key West, Florida.
Helmut Horacek. 1992. An integrated view of text planning. In Proceedings of the 6th International Workshop on Natural Language Generation, pages 29–44, London, UK. Springer-Verlag.
William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281.
Feikje Hielkema, Marit Theune, and Petra Hendriks. 2006. Performing aggregation and ellipsis using discourse structures. Research on Language and Computation, 4(4):353–375.
M. Reape and C. Mellish. 1999. Just what is aggregation anyway. In Proceedings of the 7th European Workshop on Natural Language Generation, pages 20–29, May.
Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Cambridge University Press, New York, NY, USA.
James Chi-Kuei Shaw. 2002. Clause aggregation: an approach to generating concise text. Ph.D. thesis, New York, NY, USA. Sponsor-Mckeown, Kathleen R.
John Wilkinson. 1995. Aggregation in natural language generation: Another look. Technical report, Computer Science Department, University of Waterloo.