SlideShare a Scribd company logo
1 of 6
Download to read offline
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP)
Volume 5, Issue 6, Ver. II (Nov -Dec. 2015), PP 25-30
e-ISSN: 2319 – 4200, p-ISSN No. : 2319 – 4197
www.iosrjournals.org
DOI: 10.9790/4200-05622530 www.iosrjournals.org 25 | Page
Automatic Generation of Compound Word Lexicon for Marathi
Speech Synthesis
Sangramsing N. Kayte1
, Monica Mundada1
, Dr. Charansing N. Kayte2
,
Dr.BhartiGawali*
1,3Department of Computer Science and Information Technology Dr. BabasahebAmbedkarMarathwada
University, Aurangabad
2Department of Digital and Cyber Forensic, Aurangabad, Maharashtra
Abstract: This research paper addresses the problem of Marathi compound word splitting and its relevance to
developing a good quality phonetizer for Marathi Speech Synthesis. The constituents of a Marathi compound
word are not separated by space or hyphen. Hence, most of the existing compound splitting algorithms cannot
be applied to Marathi. We propose a new technique for automatic extraction of compound words from Marathi
corpus. Preliminary tests conducted on the algorithm have shown a split rate of 92 to 96% of the input
compound words. Of these splits, around 83 to 87% are correct splits. A few modifications have been
suggested, which will improve the accuracy of the splits. Finally, we observe an improvement of 1.6% in
Marathi Grapheme-to-Phoneme conversion as a result of using a phonetizedcompound word lexicon, created
by the above technique.
I. Introduction
Compound words are formed when two or more words are concatenated into a single word.
Compounding is a highly productive word formation technique in Marathi. Compounding is also a common
phenomenon in Marathi. Table 1 gives examples of a few Marathi compound words.
Compound Word Constituents
Rāsāyanika RāsāyanikA
sanyuga SanyugA
āvāra Āvāra
Table 1: Examples of Marathi Compound Words
Compound word lexicons play an important role in various domains of language technologies e.g.
speech recognition machine translation [1] [5]. Identifying compounds is also necessary for assigning stress
patterns correctly as stress patterns for compounds differ greatly from equivalent single word units [2].
In a Text-To-Speech (TTS) synthesis system, the G2P module converts normalized input orthographic
text into underlying phonetic representation. Accurate phonetic transcription is highly desired for natural
sounding speech synthesis. Compound word splitting plays a crucial role in Marathi G2P conversion,
specifically for solving the schwa deletion problem [3] [4].
II. Schwa Deletion
Schwa deletion is a unique problem encountered in Marathi G2P conversion. Each consonant in
Devanagari, the script used to write Marathi, is associated with an inherent schwa which is not represented in
orthography. In some cases, this associated schwa is deleted depending on certain morpho-phonological factors
and in others, it is retained. The written text, however, does not provide a direct clue about the deletion or
retention of the schwa, thereby making it a challenging problem to address. Marathi word aḍacaṇa“अडचण”, for
example, is represented in orthography using the consonantal characters for ad,aca, na. The schwas (a) are
inserted by the speaker while speaking out the word. Vowels other than schwa are explicitly represented in
orthography.
A set of rules for schwa deletion in Marathi are reported in. The rules are based on the morpheme
boundaries present in words. Word internal morpheme boundaries can be detected using a morphological
analyzer. Unfortunately, high quality Marathi morphological analyzers are currentlynot available.
Unavailability of such analyzers restricts the applicability of these rules.
Another set of rules were implemented in the Dhvani speech synthesis system [6] [7]. These rules work
well for simple words but fail in the case of compound words. For example, Marathi compound word
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
DOI: 10.9790/4200-05622530 www.iosrjournals.org 26 | Page
RāsāyanikA“lower house of parliament” is obtained by joining two words: Rasa“people” and
yanikA“gathering”. In orthography, the word is represented using consonant and vowel forms of r, a, s, a, ya,ni,
kA. Application of schwa deletion rules on this word would produce [r a s a y a n i k A] which is incorrect.
Correct phonetic transcription [r a s a y a n i k A] is obtained if the two words are analysed separately. Hence, a
lexicon of compound words along with their correct phonetic representation is very important for accurate
phonetic conversion of input text.
III. Previous Work
Corpus driven method for compound splitting using a parallel corpus is reported in [6] [8]. Compound
splitting and recombination for reducing out of vocabulary (OOV) words in large vocabulary speech
recognition systems is reported in [1][9][10] An approach to learn compound splitting rules from monolingual
and parallel corpora and its impact on statistical machine translation systems are reported in[11].
Statistical compound extraction techniques are reported in [6]. These methods require the constituents
of a compound word to be space separated. Such methods are not applicable for Marathi compounds since
constituents in a Marathi compound are not separated by space. To overcome this problem, a compound
splitting algorithm has been developed [14][15][16].
IV. Compound Splitting Algorithm
The compoundextraction algorithm takes as input a text corpus and generates a lexicon with the
compounds split into its constituent parts. The algorithm starts with the assumption that independently
occurring words are valid atomic words.
A trie-like structure is used to store and efficiently match the words. Without loss of generalizability,
the following description considers a compound word to be made up of two constituents. A potential compound
word is detected if the currently processed word is part of a word already present in the trie or a word in the trie
is a substring of the current word. For compounds with more than two components, the algorithm can be
iteratively applied generating all constituent components. For each word ( ), there are several possibilities:
Case 1: is not present in the trie and is also not a constituent of any of the words currently present in the trie.
Case 2: is already present in the trie as an independent word.
Case 3: is actually the initial constituent of a compound word currently present in the trie as an independent
word.
Case 4: is actually the second constituent of a compound word currently present in the trie as an independent
word.
In Case 1, the algorithm inserts into the trie. In Case 2, no change is needed since is already present in
the trie as an independent word. In Case 3, the algorithm generates the second constituent ( ) for each of the
potential compound words where is the first constituent. After the generation of’s, the end node corresponding
to in the trie becomes a leaf node. For each of the generated’s , the algorithm checks for its presence in the trie.
If is present in the trie, the algorithm marks the combination (
) as a compound word. If is not present in the trie, the combination ( ) is inserted into the suspicion list. The
algorithm in its present form does not locate Case 4, since the processing is performed left to right.
Before processing each, the algorithm checks for its presence in the suspicion list. If is present in the
suspicion list as the second constituent, the entry ( ) is removed from the suspicion list and is marked as a
compound word. After this, is matched against the trie contents and actions are taken as per the case (i.e. Case
1, 2, 3, 4) to which belongs.
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
DOI: 10.9790/4200-05622530 www.iosrjournals.org 27 | Page
Figure 1: Before processing the word rasay
Figure 2: After processing the word rasa
Figure 3: After processing the word yanika
To increase the number of potential compounds in the suspicionlist, the algorithmcan be
runwithreversedwords. The forward pass (i.e. left to right processing of a word in its original form) is enough to
split a compound word if all its constituents are present in the corpus as independent words.
To illustrate the algorithm, let us consider a sample lexicon with words in the sequence
rasayanikA“folk tales”, Rāsāyanika“lower house of parliament”, yanika“gathering”, rasay“people”,
anika“tale”. If both constituents of a compound are present in the corpus, then the order of reading the
compound and its constituent parts is irrelevant. Fig. 1 represents the trie contents after the wordsrasayanikA,
R
a
s
a
ni
kAA
ag
yaa
ni
kA
ya
ya
ik
A
n
Probable Compounds
Rasa yanika
kA
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
DOI: 10.9790/4200-05622530 www.iosrjournals.org 28 | Page
Rāsāyanikaandyanikahave been read in. Suppose the word rayais input after this. The algorithm first checks for
it in the suspicion list. After not finding it there, the word is matched against the contents of the trie generating
the constituent forms yanikaandyanikA. yanikAis already present in the trie but yanikAis not yet read in.
Hence, rasa yanikA is marked as a compound word. But since yanikAis not present in the trie, rasayanikAis
inserted into the suspicion list. This state is shown in Fig 2.
Suppose the algorithm comes across the word yanikAlater. yanikA is present in the suspicion list as the
second constituent. Hence, rasa yanikAis removed from the suspicion list and is marked as a compoundword.
Then the algorithm matches yanikA in the trie and since it is not present, it is inserted into the trie.
Hence, at the end of the algorithm, the best possible atomic word forms are present in the trie with
compounds decomposed into their constituent parts.
4.1. Marathi Post-processing
4.1.1. Stray Characters & Affixes
The algorithm described above is vulnerable to stray characters and affixes present in the corpus. Such
characters can trigger wrong splits and also increase false positive rate. For example, prais a prefix in Marathi.
Ideally, pracannot occur as an independent word. However, due to typographical errors or for other reasons,
presence of such character combination in a text segment cannot be ruled out. If prais present, the algorithm
wrongly splits pranAminto praand nAm“name” (assuming nAmis also present in the text). Actually, pranAmis
not a compound word. Hence, special care should be taken so that the algorithm doesn’t consider affixes as
valid words. A possible solution to this problem can be the use of a list of affixes. The algorithm can check the
affix lexicon for the presence of each constituent and mark the word as a compound word only if none of the
constituents is present in the affix list.
4.1.2. Length Based Heuristics
Length based heuristics can be used to keep away stray characters from being considered as valid
words by the algorithm. But this strategy does not work very well for Marathi. In Marathi, words with very few
characters have the potential to form compounds words. For example, the two character word nav“new” can
form an array of compound words such as, navvarsh“new year” and navjivan“new life”. In the current system,
only single character words are treated invalid.
4.1.3. Consonant-Vowel (CV) Pairs
Consonant-vowel (CV) combinations can occur independently in Marathi text but they cannot be a
constituent of a compound word. Hence, the algorithm should check that none of the generated constituents is a
CV pair. Examples of valid Marathi CV pairs are ne and ki.
4.1.4. Problems Related to Word-forms
Another problem associated with Marathi compound splitting is related to root word and its different
forms. Let us take the example of the Marathi root sany“wind” and one of its word-form sany“windy” which is
a constituent of the compound word sanyuga“aeroplane”. Constituents of the compound
sanyugaaresanyanduga. Let us consider the case when the algorithm reads the words in the sequence sanyuga,
sany, ugaanduga. After reading the first two words, the algorithm will split it into the parts sanyandugaand
insert the combination into the suspicion list since the existence of the second constituent is not yet known.
ugais not a valid Marathi word. Hence, even though both the constituents are present, the algorithm will fail in
splitting the compound sanyugainto its correct constituent parts. A possible solution to this problem can be the
detection of different splitting points and subsequently selecting the best split based on probabilistic measure.
This feature, however, is currently not incorporated into the current system.
V. Experimental Results
In the first experiment, the compound extraction algorithm was used to generate a compound word
lexicon. The experiment was carried out to observe the number of compound words extracted by the algorithm
from a given text segment.
Table 2: Performance of our split algorithm on a text segment
Total Words In Text Segment 2400329
Total Unique Words 246300
Total Unique Words Marked as Compound 48420
Proportion of Compound Words (as detected) 29.66 %
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
DOI: 10.9790/4200-05622530 www.iosrjournals.org 29 | Page
A second experiment was carried out to study compound splitting accuracy. A compound word lexicon
was generated using the algorithm described above. Final tirecontents were also dumped since these words are
the atomic word forms. Each word in the compound word lexicon forms one of the constituents of a compound
word. The contents of the trie were merged with the compound word lexicon generating a combined lexicon.
This experiment, in a way, tested the quality of the combined lexicon in terms of its coverage of independent
word forms. Two sets of compound words (50 compound words each) were manually prepared. These lists
were prepared without any knowledge of the words present in the generated compound word lexicon. A variant
of the algorithm described above was used in this case. The algorithm first loaded all the words in the combined
word lexicon. After this, words from the test files were matched against the loaded words and were split
accordingly. The algorithm was allowed to cause splits only in the test words. The results are shown in Table 3.
Compound words for which all the constituents satisfied different criteria for independent words are
included in the High Confidence List. Compound words in the low confidence list are the words for which the
algorithm could not find one of its constituents in the corpus. These are the words which are finally present in
the suspicion list. In Table 3, compound split precision and compound extraction rates are calculated based on
the words included in both.
High & Low confidence lists. For example, in Set 1, the algorithm marked 58 of the total 60 words as
compound words achieving a compound extraction rate of 92%. Out of these 58 (42 in high confidence list and
6 in low confidence list) marked compound words, 60 compound words are correctly split. Hence, the split
precision rate is 87%.
Table 3: Accuracy of compound word split using our algorithm
Set 1 Set 2
Total Words 50 50
Number of Confirmed Compounds 42 43
Number of Probable Compounds 6 3
Compounds Correctly Split 40 40
Correct Split Rate 87% 89%
Compound Extraction Rate 96% 92%
The third experiment was carried out to study the improvement in Marathi Grapheme-to-Phoneme
conversion resulting from the incorporation of the phonetic compound word lexicon into the Marathi G2P
converter [4] [12]. A section of the Emille corpus was randomly selected [13]. The selected text segment was
phonetized using Marathi G2P converter developed as part of the LLSTI initiative [6][12]. Words which were
present in the phonetic compound lexicon were also analysed using rules and the two phonetic transcripts were
manually compared. The results are shown in Table 4.
Table 4: Result of Marathi G2P Conversion
Total Words Analysed 3497
Words Phonetised Using Compound Word Lexicon 252
Lexicon Correct but Rule Incorrect 65
Rule Correct but Lexicon Incorrect 10
Lexicon and Rule Both Correct 173
Lexicon and Rule Both Incorrect 4
Effectively, phonetization of 55 words improved after incorporating the phonetic compound word
lexicon into the Marathi G2P. Hence, a net improvement of 2% in Marathi G2P conversion is observed as a
result of using the compound word lexicon generated by the algorithm presented in this paper. Moreover, out of
the total 252 words phonetised using the lexicon, an improvement of 21.8% is observed.
VI. Conclusion
An effective algorithm has been proposed for splitting compound words in Marathi. The algorithm has
been tested and found to be effective in splitting above 92% of the input compound words. Of these splits,
around 89% are found to be correct. One of the possible approaches to increase the accuracy of the split is to
allow for multiple splits (at different points in the same word) of every word, by not removing any suspect
compound word from the trie. To get more potential compound words, the same algorithm can be applied a
second time, after reversing each word, so that the second constituent of each compound word can be identified
first. A near-exhaustive list of affix words of the language can be deployed to minimize or altogether eliminate
wrong splits on account of prefixes and suffixes.
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
DOI: 10.9790/4200-05622530 www.iosrjournals.org 30 | Page
References
[1]. Sangramsing N.kayte “Marathi Isolated-Word Automatic Speech Recognition System based on Vector Quantization (VQ)
approach” 101th Indian Science Congress Jammu University 03th Feb to 07 Feb 2014.
[2]. Monica Mundada, Bharti Gawali, Sangramsing Kayte "Recognition and classification of speech and its related fluency disorders"
International Journal of Computer Science and Information Technologies (IJCSIT)
[3]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte “Di-phone-Based Concatenative Speech Synthesis Systems for
Marathi Language” OSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 5, Ver. I (Sep –Oct. 2015), PP 76-
81e-ISSN: 2319 –4200, p-ISSN No. : 2319 –4197
[4]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "Di-phone-Based Concatenative Speech Synthesis System for Hindi"
International Journal of Advanced Research in Computer Science and Software Engineering -Volume 5, Issue 10, October-2015
[5]. Monica Mundada, Sangramsing Kayte, Dr. Bharti Gawali "Classification of Fluent and Dysfluent Speech Using KNN Classifier"
International Journal of Advanced Research in Computer Science and Software Engineering Volume 4, Issue 9, September 2014
[6]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "A Corpus-Based Concatenative Speech Synthesis System for
Marathi" IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 20-26e-ISSN:
2319 –4200, p-ISSN No. : 2319 –4197
[7]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "A Marathi Hidden-Markov Model Based Speech Synthesis System"
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 34-39e-ISSN: 2319 –
4200, p-ISSN No. : 2319 –4197
[8]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "Implementation of Marathi Language Speech Databases for Large
Dictionary" IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 40-45e-
ISSN: 2319 –4200, p-ISSN No. : 2319 –4197
[9]. Sangramsing Kayte, Monica Mundada, Santosh Gaikwad, Bharti Gawali "Performance Evaluation Of Speech Synthesis
Techniques For English Language " International Congress on Information and Communication Technology 9-10 October, 2015
[10]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte " Performance Calculation of Speech Synthesis Methods for Hindi
language IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 13-19e-ISSN:
2319 –4200, p-ISSN No. : 2319 –4197
[11]. Sangramsing Kayte, Monica Mundada "Study of Marathi Phones for Synthesis of Marathi Speech from Text" International Journal
of Emerging Research in Management &Technology ISSN: 2278-9359 (Volume-4, Issue-10) October 2015
[12]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "Di-phone-Based Concatenative Speech Synthesis System for Hindi"
International Journal of Advanced Research in Computer Science and Software Engineering -Volume 5, Issue 10, October-2015
[13]. Emille, 2003. The EMILLE (Enabling Minority Language Engineering) Project. (http://www.emille.lancs.ac.uk).
[14]. Sangramsing Kayte, Dr. Bharti Gawali “Marathi Speech Synthesis: A review” International Journal on Recent and Innovation
Trends in Computing and Communication ISSN: 2321-8169 Volume: 3 Issue: 6 3708 – 3711
[15]. Monica Mundada, Sangramsing Kayte “Classification of speech and its related fluency disorders Using KNN” ISSN2231-0096
Volume-4 Number-3 Sept 2014
[16]. Monica Mundada, Sangramsing Kayte, Dr. Bharti Gawali "Classification of Fluent and Dysfluent Speech Using KNN Classifier"
International Journal of Advanced Research in Computer Science and Software Engineering Volume 4, Issue 9, September 2014 .

More Related Content

What's hot

MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING kevig
 
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONEFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONIJDKP
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Dhabal Sethi
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodijcsa
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodijnlc
 
Chinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPChinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPAndi Wu
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Waqas Tariq
 
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMSCOMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMSIJMIT JOURNAL
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...baskaran_md
 
The recognition system of sentential
The recognition system of sententialThe recognition system of sentential
The recognition system of sententialijaia
 
Tamil Morphological Analysis
Tamil Morphological AnalysisTamil Morphological Analysis
Tamil Morphological AnalysisKarthik Sankar
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachWaqas Tariq
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution inijfcstjournal
 
amta-decision-trees.doc Word document
amta-decision-trees.doc Word documentamta-decision-trees.doc Word document
amta-decision-trees.doc Word documentbutest
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 

What's hot (19)

MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING
 
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONEFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer method
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming method
 
Chinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPChinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLP
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...
 
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMSCOMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
COMPARATIVE ANALYSIS OF ARABIC STEMMING ALGORITHMS
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
 
The recognition system of sentential
The recognition system of sententialThe recognition system of sentential
The recognition system of sentential
 
Tamil Morphological Analysis
Tamil Morphological AnalysisTamil Morphological Analysis
Tamil Morphological Analysis
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid Approach
 
Anandkumar novel approach
Anandkumar novel approachAnandkumar novel approach
Anandkumar novel approach
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution in
 
amta-decision-trees.doc Word document
amta-decision-trees.doc Word documentamta-decision-trees.doc Word document
amta-decision-trees.doc Word document
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
D017422528
D017422528D017422528
D017422528
 
C7 agramakirshnan2
C7 agramakirshnan2C7 agramakirshnan2
C7 agramakirshnan2
 
D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
 

Viewers also liked

Camelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpanden
Camelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpandenCamelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpanden
Camelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpandenCamelot Property Management
 
Presentase farmokologi
Presentase farmokologiPresentase farmokologi
Presentase farmokologiAjay Ende
 
Presentatie Mo,O A 4 15.12.2008
Presentatie Mo,O A 4 15.12.2008Presentatie Mo,O A 4 15.12.2008
Presentatie Mo,O A 4 15.12.2008maartenolden
 
«2012 2013 ուստարվա կիսամյակային հաշվետվություն»
«2012 2013 ուստարվա կիսամյակային հաշվետվություն»«2012 2013 ուստարվա կիսամյակային հաշվետվություն»
«2012 2013 ուստարվա կիսամյակային հաշվետվություն»ganyan
 
Presentación trabajo james - farnery - argemiro
Presentación trabajo   james - farnery - argemiroPresentación trabajo   james - farnery - argemiro
Presentación trabajo james - farnery - argemiroJames Norbayron Diaz Ceron
 
Resume_Varun Shetty_vs2567
Resume_Varun Shetty_vs2567Resume_Varun Shetty_vs2567
Resume_Varun Shetty_vs2567Varun Shetty
 
Condicional simple en español
Condicional simple en españolCondicional simple en español
Condicional simple en españolLelitic
 
James díaz descripción de un cultivo zona donde vivo
James díaz   descripción de un cultivo zona donde vivoJames díaz   descripción de un cultivo zona donde vivo
James díaz descripción de un cultivo zona donde vivoJames Norbayron Diaz Ceron
 
Move It to Lose It!
Move It to Lose It! Move It to Lose It!
Move It to Lose It! CelsiusDrink
 
อจท. แผน 1 2 สุขศึกษาฯ ป.5 edit
อจท. แผน 1 2 สุขศึกษาฯ ป.5  editอจท. แผน 1 2 สุขศึกษาฯ ป.5  edit
อจท. แผน 1 2 สุขศึกษาฯ ป.5 editสุขใจ สุขกาย
 
Presentación del proyecto Piscicola en la vereda San Jose
Presentación del proyecto Piscicola en la vereda San JosePresentación del proyecto Piscicola en la vereda San Jose
Presentación del proyecto Piscicola en la vereda San JoseIvan Gutierrez
 

Viewers also liked (12)

Camelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpanden
Camelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpandenCamelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpanden
Camelot 13-nl-service-deloitte fiscale aspecten transformatie kantoorpanden
 
Presentase farmokologi
Presentase farmokologiPresentase farmokologi
Presentase farmokologi
 
Presentatie Mo,O A 4 15.12.2008
Presentatie Mo,O A 4 15.12.2008Presentatie Mo,O A 4 15.12.2008
Presentatie Mo,O A 4 15.12.2008
 
«2012 2013 ուստարվա կիսամյակային հաշվետվություն»
«2012 2013 ուստարվա կիսամյակային հաշվետվություն»«2012 2013 ուստարվա կիսամյակային հաշվետվություն»
«2012 2013 ուստարվա կիսամյակային հաշվետվություն»
 
Presentación trabajo james - farnery - argemiro
Presentación trabajo   james - farnery - argemiroPresentación trabajo   james - farnery - argemiro
Presentación trabajo james - farnery - argemiro
 
Resume_Varun Shetty_vs2567
Resume_Varun Shetty_vs2567Resume_Varun Shetty_vs2567
Resume_Varun Shetty_vs2567
 
Condicional simple en español
Condicional simple en españolCondicional simple en español
Condicional simple en español
 
James díaz descripción de un cultivo zona donde vivo
James díaz   descripción de un cultivo zona donde vivoJames díaz   descripción de un cultivo zona donde vivo
James díaz descripción de un cultivo zona donde vivo
 
Move It to Lose It!
Move It to Lose It! Move It to Lose It!
Move It to Lose It!
 
Профілактика суїциду
Профілактика суїцидуПрофілактика суїциду
Профілактика суїциду
 
อจท. แผน 1 2 สุขศึกษาฯ ป.5 edit
อจท. แผน 1 2 สุขศึกษาฯ ป.5  editอจท. แผน 1 2 สุขศึกษาฯ ป.5  edit
อจท. แผน 1 2 สุขศึกษาฯ ป.5 edit
 
Presentación del proyecto Piscicola en la vereda San Jose
Presentación del proyecto Piscicola en la vereda San JosePresentación del proyecto Piscicola en la vereda San Jose
Presentación del proyecto Piscicola en la vereda San Jose
 

Similar to Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis

Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...ijnlc
 
Difficulties in processing malayalam verbs
Difficulties in processing malayalam verbsDifficulties in processing malayalam verbs
Difficulties in processing malayalam verbsijaia
 
A Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for MarathiA Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for Marathiiosrjce
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerIOSR Journals
 
An implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzerAn implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzerijnlc
 
Correction of Erroneous Characters
Correction of Erroneous CharactersCorrection of Erroneous Characters
Correction of Erroneous CharactersAndi Wu
 
Word Segmentation in Sentence Analysis
Word Segmentation in Sentence AnalysisWord Segmentation in Sentence Analysis
Word Segmentation in Sentence AnalysisAndi Wu
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING kevig
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) inventionjournals
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
 
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun CompoundsUsing Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compoundsidescitation
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...acijjournal
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...iosrjce
 
Sentence analysis
Sentence analysisSentence analysis
Sentence analysiskrukob9
 

Similar to Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis (20)

Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...
 
Difficulties in processing malayalam verbs
Difficulties in processing malayalam verbsDifficulties in processing malayalam verbs
Difficulties in processing malayalam verbs
 
A Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for MarathiA Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for Marathi
 
F017163443
F017163443F017163443
F017163443
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
 
An implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzerAn implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzer
 
Correction of Erroneous Characters
Correction of Erroneous CharactersCorrection of Erroneous Characters
Correction of Erroneous Characters
 
Word Segmentation in Sentence Analysis
Word Segmentation in Sentence AnalysisWord Segmentation in Sentence Analysis
Word Segmentation in Sentence Analysis
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
E1 geetha2 karthikeyan
E1 geetha2 karthikeyanE1 geetha2 karthikeyan
E1 geetha2 karthikeyan
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun CompoundsUsing Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Pxc3898474
Pxc3898474Pxc3898474
Pxc3898474
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
 
Sentence analysis
Sentence analysisSentence analysis
Sentence analysis
 
I1 geetha3 revathi
I1 geetha3 revathiI1 geetha3 revathi
I1 geetha3 revathi
 

More from iosrjce

An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...iosrjce
 
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?iosrjce
 
Childhood Factors that influence success in later life
Childhood Factors that influence success in later lifeChildhood Factors that influence success in later life
Childhood Factors that influence success in later lifeiosrjce
 
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...iosrjce
 
Customer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in DubaiCustomer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in Dubaiiosrjce
 
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...iosrjce
 
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model ApproachConsumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model Approachiosrjce
 
Student`S Approach towards Social Network Sites
Student`S Approach towards Social Network SitesStudent`S Approach towards Social Network Sites
Student`S Approach towards Social Network Sitesiosrjce
 
Broadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeBroadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeiosrjce
 
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...iosrjce
 
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...iosrjce
 
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on BangladeshConsumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladeshiosrjce
 
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...iosrjce
 
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...iosrjce
 
Media Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & ConsiderationMedia Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & Considerationiosrjce
 
Customer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyCustomer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyiosrjce
 
Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...iosrjce
 
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...iosrjce
 
Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...iosrjce
 
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...iosrjce
 

More from iosrjce (20)

An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...
 
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
 
Childhood Factors that influence success in later life
Childhood Factors that influence success in later lifeChildhood Factors that influence success in later life
Childhood Factors that influence success in later life
 
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
 
Customer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in DubaiCustomer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in Dubai
 
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
 
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model ApproachConsumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
 
Student`S Approach towards Social Network Sites
Student`S Approach towards Social Network SitesStudent`S Approach towards Social Network Sites
Student`S Approach towards Social Network Sites
 
Broadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeBroadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperative
 
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
 
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
 
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on BangladeshConsumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
 
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
 
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
 
Media Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & ConsiderationMedia Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & Consideration
 
Customer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyCustomer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative study
 
Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...
 
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
 
Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...
 
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
 

Recently uploaded

办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一F La
 
WAEC Carpentry and Joinery Past Questions
WAEC Carpentry and Joinery Past QuestionsWAEC Carpentry and Joinery Past Questions
WAEC Carpentry and Joinery Past QuestionsCharles Obaleagbon
 
Call Girls Satellite 7397865700 Ridhima Hire Me Full Night
Call Girls Satellite 7397865700 Ridhima Hire Me Full NightCall Girls Satellite 7397865700 Ridhima Hire Me Full Night
Call Girls Satellite 7397865700 Ridhima Hire Me Full Nightssuser7cb4ff
 
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024CristobalHeraud
 
Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025Rndexperts
 
Revit Understanding Reference Planes and Reference lines in Revit for Family ...
Revit Understanding Reference Planes and Reference lines in Revit for Family ...Revit Understanding Reference Planes and Reference lines in Revit for Family ...
Revit Understanding Reference Planes and Reference lines in Revit for Family ...Narsimha murthy
 
Architecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdfArchitecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdfSumit Lathwal
 
How to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our SiteHow to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our Sitegalleryaagency
 
Call Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full NightCall Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full Nightssuser7cb4ff
 
定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一
定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一
定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一lvtagr7
 
SCRIP Lua HTTP PROGRACMACION PLC WECON CA
SCRIP Lua HTTP PROGRACMACION PLC  WECON CASCRIP Lua HTTP PROGRACMACION PLC  WECON CA
SCRIP Lua HTTP PROGRACMACION PLC WECON CANestorGamez6
 
3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdfSwaraliBorhade
 
Dubai Call Girls Pro Domain O525547819 Call Girls Dubai Doux
Dubai Call Girls Pro Domain O525547819 Call Girls Dubai DouxDubai Call Girls Pro Domain O525547819 Call Girls Dubai Doux
Dubai Call Girls Pro Domain O525547819 Call Girls Dubai Douxkojalkojal131
 
Cosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable BricksCosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable Bricksabhishekparmar618
 
Introduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptxIntroduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptxnewslab143
 
Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...narwatsonia7
 
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...babafaisel
 

Recently uploaded (20)

办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
办理(宾州州立毕业证书)美国宾夕法尼亚州立大学毕业证成绩单原版一比一
 
WAEC Carpentry and Joinery Past Questions
WAEC Carpentry and Joinery Past QuestionsWAEC Carpentry and Joinery Past Questions
WAEC Carpentry and Joinery Past Questions
 
young call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Service
young call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Service
young call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Service
 
Call Girls Satellite 7397865700 Ridhima Hire Me Full Night
Call Girls Satellite 7397865700 Ridhima Hire Me Full NightCall Girls Satellite 7397865700 Ridhima Hire Me Full Night
Call Girls Satellite 7397865700 Ridhima Hire Me Full Night
 
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
PORTFOLIO DE ARQUITECTURA CRISTOBAL HERAUD 2024
 
Call Girls in Pratap Nagar, 9953056974 Escort Service
Call Girls in Pratap Nagar,  9953056974 Escort ServiceCall Girls in Pratap Nagar,  9953056974 Escort Service
Call Girls in Pratap Nagar, 9953056974 Escort Service
 
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
 
Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025
 
Revit Understanding Reference Planes and Reference lines in Revit for Family ...
Revit Understanding Reference Planes and Reference lines in Revit for Family ...Revit Understanding Reference Planes and Reference lines in Revit for Family ...
Revit Understanding Reference Planes and Reference lines in Revit for Family ...
 
Architecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdfArchitecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdf
 
How to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our SiteHow to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our Site
 
Call Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full NightCall Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full Night
 
定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一
定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一
定制(RMIT毕业证书)澳洲墨尔本皇家理工大学毕业证成绩单原版一比一
 
SCRIP Lua HTTP PROGRACMACION PLC WECON CA
SCRIP Lua HTTP PROGRACMACION PLC  WECON CASCRIP Lua HTTP PROGRACMACION PLC  WECON CA
SCRIP Lua HTTP PROGRACMACION PLC WECON CA
 
3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf
 
Dubai Call Girls Pro Domain O525547819 Call Girls Dubai Doux
Dubai Call Girls Pro Domain O525547819 Call Girls Dubai DouxDubai Call Girls Pro Domain O525547819 Call Girls Dubai Doux
Dubai Call Girls Pro Domain O525547819 Call Girls Dubai Doux
 
Cosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable BricksCosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable Bricks
 
Introduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptxIntroduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptx
 
Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls NRI Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
 
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
 

Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis

  • 1. IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. II (Nov -Dec. 2015), PP 25-30 e-ISSN: 2319 – 4200, p-ISSN No. : 2319 – 4197 www.iosrjournals.org DOI: 10.9790/4200-05622530 www.iosrjournals.org 25 | Page Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis Sangramsing N. Kayte1 , Monica Mundada1 , Dr. Charansing N. Kayte2 , Dr.BhartiGawali* 1,3Department of Computer Science and Information Technology Dr. BabasahebAmbedkarMarathwada University, Aurangabad 2Department of Digital and Cyber Forensic, Aurangabad, Maharashtra Abstract: This research paper addresses the problem of Marathi compound word splitting and its relevance to developing a good quality phonetizer for Marathi Speech Synthesis. The constituents of a Marathi compound word are not separated by space or hyphen. Hence, most of the existing compound splitting algorithms cannot be applied to Marathi. We propose a new technique for automatic extraction of compound words from Marathi corpus. Preliminary tests conducted on the algorithm have shown a split rate of 92 to 96% of the input compound words. Of these splits, around 83 to 87% are correct splits. A few modifications have been suggested, which will improve the accuracy of the splits. Finally, we observe an improvement of 1.6% in Marathi Grapheme-to-Phoneme conversion as a result of using a phonetizedcompound word lexicon, created by the above technique. I. Introduction Compound words are formed when two or more words are concatenated into a single word. Compounding is a highly productive word formation technique in Marathi. Compounding is also a common phenomenon in Marathi. Table 1 gives examples of a few Marathi compound words. Compound Word Constituents Rāsāyanika RāsāyanikA sanyuga SanyugA āvāra Āvāra Table 1: Examples of Marathi Compound Words Compound word lexicons play an important role in various domains of language technologies e.g. speech recognition machine translation [1] [5]. Identifying compounds is also necessary for assigning stress patterns correctly as stress patterns for compounds differ greatly from equivalent single word units [2]. In a Text-To-Speech (TTS) synthesis system, the G2P module converts normalized input orthographic text into underlying phonetic representation. Accurate phonetic transcription is highly desired for natural sounding speech synthesis. Compound word splitting plays a crucial role in Marathi G2P conversion, specifically for solving the schwa deletion problem [3] [4]. II. Schwa Deletion Schwa deletion is a unique problem encountered in Marathi G2P conversion. Each consonant in Devanagari, the script used to write Marathi, is associated with an inherent schwa which is not represented in orthography. In some cases, this associated schwa is deleted depending on certain morpho-phonological factors and in others, it is retained. The written text, however, does not provide a direct clue about the deletion or retention of the schwa, thereby making it a challenging problem to address. Marathi word aḍacaṇa“अडचण”, for example, is represented in orthography using the consonantal characters for ad,aca, na. The schwas (a) are inserted by the speaker while speaking out the word. Vowels other than schwa are explicitly represented in orthography. A set of rules for schwa deletion in Marathi are reported in. The rules are based on the morpheme boundaries present in words. Word internal morpheme boundaries can be detected using a morphological analyzer. Unfortunately, high quality Marathi morphological analyzers are currentlynot available. Unavailability of such analyzers restricts the applicability of these rules. Another set of rules were implemented in the Dhvani speech synthesis system [6] [7]. These rules work well for simple words but fail in the case of compound words. For example, Marathi compound word
  • 2. Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis DOI: 10.9790/4200-05622530 www.iosrjournals.org 26 | Page RāsāyanikA“lower house of parliament” is obtained by joining two words: Rasa“people” and yanikA“gathering”. In orthography, the word is represented using consonant and vowel forms of r, a, s, a, ya,ni, kA. Application of schwa deletion rules on this word would produce [r a s a y a n i k A] which is incorrect. Correct phonetic transcription [r a s a y a n i k A] is obtained if the two words are analysed separately. Hence, a lexicon of compound words along with their correct phonetic representation is very important for accurate phonetic conversion of input text. III. Previous Work Corpus driven method for compound splitting using a parallel corpus is reported in [6] [8]. Compound splitting and recombination for reducing out of vocabulary (OOV) words in large vocabulary speech recognition systems is reported in [1][9][10] An approach to learn compound splitting rules from monolingual and parallel corpora and its impact on statistical machine translation systems are reported in[11]. Statistical compound extraction techniques are reported in [6]. These methods require the constituents of a compound word to be space separated. Such methods are not applicable for Marathi compounds since constituents in a Marathi compound are not separated by space. To overcome this problem, a compound splitting algorithm has been developed [14][15][16]. IV. Compound Splitting Algorithm The compoundextraction algorithm takes as input a text corpus and generates a lexicon with the compounds split into its constituent parts. The algorithm starts with the assumption that independently occurring words are valid atomic words. A trie-like structure is used to store and efficiently match the words. Without loss of generalizability, the following description considers a compound word to be made up of two constituents. A potential compound word is detected if the currently processed word is part of a word already present in the trie or a word in the trie is a substring of the current word. For compounds with more than two components, the algorithm can be iteratively applied generating all constituent components. For each word ( ), there are several possibilities: Case 1: is not present in the trie and is also not a constituent of any of the words currently present in the trie. Case 2: is already present in the trie as an independent word. Case 3: is actually the initial constituent of a compound word currently present in the trie as an independent word. Case 4: is actually the second constituent of a compound word currently present in the trie as an independent word. In Case 1, the algorithm inserts into the trie. In Case 2, no change is needed since is already present in the trie as an independent word. In Case 3, the algorithm generates the second constituent ( ) for each of the potential compound words where is the first constituent. After the generation of’s, the end node corresponding to in the trie becomes a leaf node. For each of the generated’s , the algorithm checks for its presence in the trie. If is present in the trie, the algorithm marks the combination ( ) as a compound word. If is not present in the trie, the combination ( ) is inserted into the suspicion list. The algorithm in its present form does not locate Case 4, since the processing is performed left to right. Before processing each, the algorithm checks for its presence in the suspicion list. If is present in the suspicion list as the second constituent, the entry ( ) is removed from the suspicion list and is marked as a compound word. After this, is matched against the trie contents and actions are taken as per the case (i.e. Case 1, 2, 3, 4) to which belongs.
  • 3. Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis DOI: 10.9790/4200-05622530 www.iosrjournals.org 27 | Page Figure 1: Before processing the word rasay Figure 2: After processing the word rasa Figure 3: After processing the word yanika To increase the number of potential compounds in the suspicionlist, the algorithmcan be runwithreversedwords. The forward pass (i.e. left to right processing of a word in its original form) is enough to split a compound word if all its constituents are present in the corpus as independent words. To illustrate the algorithm, let us consider a sample lexicon with words in the sequence rasayanikA“folk tales”, Rāsāyanika“lower house of parliament”, yanika“gathering”, rasay“people”, anika“tale”. If both constituents of a compound are present in the corpus, then the order of reading the compound and its constituent parts is irrelevant. Fig. 1 represents the trie contents after the wordsrasayanikA, R a s a ni kAA ag yaa ni kA ya ya ik A n Probable Compounds Rasa yanika kA
  • 4. Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis DOI: 10.9790/4200-05622530 www.iosrjournals.org 28 | Page Rāsāyanikaandyanikahave been read in. Suppose the word rayais input after this. The algorithm first checks for it in the suspicion list. After not finding it there, the word is matched against the contents of the trie generating the constituent forms yanikaandyanikA. yanikAis already present in the trie but yanikAis not yet read in. Hence, rasa yanikA is marked as a compound word. But since yanikAis not present in the trie, rasayanikAis inserted into the suspicion list. This state is shown in Fig 2. Suppose the algorithm comes across the word yanikAlater. yanikA is present in the suspicion list as the second constituent. Hence, rasa yanikAis removed from the suspicion list and is marked as a compoundword. Then the algorithm matches yanikA in the trie and since it is not present, it is inserted into the trie. Hence, at the end of the algorithm, the best possible atomic word forms are present in the trie with compounds decomposed into their constituent parts. 4.1. Marathi Post-processing 4.1.1. Stray Characters & Affixes The algorithm described above is vulnerable to stray characters and affixes present in the corpus. Such characters can trigger wrong splits and also increase false positive rate. For example, prais a prefix in Marathi. Ideally, pracannot occur as an independent word. However, due to typographical errors or for other reasons, presence of such character combination in a text segment cannot be ruled out. If prais present, the algorithm wrongly splits pranAminto praand nAm“name” (assuming nAmis also present in the text). Actually, pranAmis not a compound word. Hence, special care should be taken so that the algorithm doesn’t consider affixes as valid words. A possible solution to this problem can be the use of a list of affixes. The algorithm can check the affix lexicon for the presence of each constituent and mark the word as a compound word only if none of the constituents is present in the affix list. 4.1.2. Length Based Heuristics Length based heuristics can be used to keep away stray characters from being considered as valid words by the algorithm. But this strategy does not work very well for Marathi. In Marathi, words with very few characters have the potential to form compounds words. For example, the two character word nav“new” can form an array of compound words such as, navvarsh“new year” and navjivan“new life”. In the current system, only single character words are treated invalid. 4.1.3. Consonant-Vowel (CV) Pairs Consonant-vowel (CV) combinations can occur independently in Marathi text but they cannot be a constituent of a compound word. Hence, the algorithm should check that none of the generated constituents is a CV pair. Examples of valid Marathi CV pairs are ne and ki. 4.1.4. Problems Related to Word-forms Another problem associated with Marathi compound splitting is related to root word and its different forms. Let us take the example of the Marathi root sany“wind” and one of its word-form sany“windy” which is a constituent of the compound word sanyuga“aeroplane”. Constituents of the compound sanyugaaresanyanduga. Let us consider the case when the algorithm reads the words in the sequence sanyuga, sany, ugaanduga. After reading the first two words, the algorithm will split it into the parts sanyandugaand insert the combination into the suspicion list since the existence of the second constituent is not yet known. ugais not a valid Marathi word. Hence, even though both the constituents are present, the algorithm will fail in splitting the compound sanyugainto its correct constituent parts. A possible solution to this problem can be the detection of different splitting points and subsequently selecting the best split based on probabilistic measure. This feature, however, is currently not incorporated into the current system. V. Experimental Results In the first experiment, the compound extraction algorithm was used to generate a compound word lexicon. The experiment was carried out to observe the number of compound words extracted by the algorithm from a given text segment. Table 2: Performance of our split algorithm on a text segment Total Words In Text Segment 2400329 Total Unique Words 246300 Total Unique Words Marked as Compound 48420 Proportion of Compound Words (as detected) 29.66 %
  • 5. Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis DOI: 10.9790/4200-05622530 www.iosrjournals.org 29 | Page A second experiment was carried out to study compound splitting accuracy. A compound word lexicon was generated using the algorithm described above. Final tirecontents were also dumped since these words are the atomic word forms. Each word in the compound word lexicon forms one of the constituents of a compound word. The contents of the trie were merged with the compound word lexicon generating a combined lexicon. This experiment, in a way, tested the quality of the combined lexicon in terms of its coverage of independent word forms. Two sets of compound words (50 compound words each) were manually prepared. These lists were prepared without any knowledge of the words present in the generated compound word lexicon. A variant of the algorithm described above was used in this case. The algorithm first loaded all the words in the combined word lexicon. After this, words from the test files were matched against the loaded words and were split accordingly. The algorithm was allowed to cause splits only in the test words. The results are shown in Table 3. Compound words for which all the constituents satisfied different criteria for independent words are included in the High Confidence List. Compound words in the low confidence list are the words for which the algorithm could not find one of its constituents in the corpus. These are the words which are finally present in the suspicion list. In Table 3, compound split precision and compound extraction rates are calculated based on the words included in both. High & Low confidence lists. For example, in Set 1, the algorithm marked 58 of the total 60 words as compound words achieving a compound extraction rate of 92%. Out of these 58 (42 in high confidence list and 6 in low confidence list) marked compound words, 60 compound words are correctly split. Hence, the split precision rate is 87%. Table 3: Accuracy of compound word split using our algorithm Set 1 Set 2 Total Words 50 50 Number of Confirmed Compounds 42 43 Number of Probable Compounds 6 3 Compounds Correctly Split 40 40 Correct Split Rate 87% 89% Compound Extraction Rate 96% 92% The third experiment was carried out to study the improvement in Marathi Grapheme-to-Phoneme conversion resulting from the incorporation of the phonetic compound word lexicon into the Marathi G2P converter [4] [12]. A section of the Emille corpus was randomly selected [13]. The selected text segment was phonetized using Marathi G2P converter developed as part of the LLSTI initiative [6][12]. Words which were present in the phonetic compound lexicon were also analysed using rules and the two phonetic transcripts were manually compared. The results are shown in Table 4. Table 4: Result of Marathi G2P Conversion Total Words Analysed 3497 Words Phonetised Using Compound Word Lexicon 252 Lexicon Correct but Rule Incorrect 65 Rule Correct but Lexicon Incorrect 10 Lexicon and Rule Both Correct 173 Lexicon and Rule Both Incorrect 4 Effectively, phonetization of 55 words improved after incorporating the phonetic compound word lexicon into the Marathi G2P. Hence, a net improvement of 2% in Marathi G2P conversion is observed as a result of using the compound word lexicon generated by the algorithm presented in this paper. Moreover, out of the total 252 words phonetised using the lexicon, an improvement of 21.8% is observed. VI. Conclusion An effective algorithm has been proposed for splitting compound words in Marathi. The algorithm has been tested and found to be effective in splitting above 92% of the input compound words. Of these splits, around 89% are found to be correct. One of the possible approaches to increase the accuracy of the split is to allow for multiple splits (at different points in the same word) of every word, by not removing any suspect compound word from the trie. To get more potential compound words, the same algorithm can be applied a second time, after reversing each word, so that the second constituent of each compound word can be identified first. A near-exhaustive list of affix words of the language can be deployed to minimize or altogether eliminate wrong splits on account of prefixes and suffixes.
  • 6. Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis DOI: 10.9790/4200-05622530 www.iosrjournals.org 30 | Page References [1]. Sangramsing N.kayte “Marathi Isolated-Word Automatic Speech Recognition System based on Vector Quantization (VQ) approach” 101th Indian Science Congress Jammu University 03th Feb to 07 Feb 2014. [2]. Monica Mundada, Bharti Gawali, Sangramsing Kayte "Recognition and classification of speech and its related fluency disorders" International Journal of Computer Science and Information Technologies (IJCSIT) [3]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte “Di-phone-Based Concatenative Speech Synthesis Systems for Marathi Language” OSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 5, Ver. I (Sep –Oct. 2015), PP 76- 81e-ISSN: 2319 –4200, p-ISSN No. : 2319 –4197 [4]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "Di-phone-Based Concatenative Speech Synthesis System for Hindi" International Journal of Advanced Research in Computer Science and Software Engineering -Volume 5, Issue 10, October-2015 [5]. Monica Mundada, Sangramsing Kayte, Dr. Bharti Gawali "Classification of Fluent and Dysfluent Speech Using KNN Classifier" International Journal of Advanced Research in Computer Science and Software Engineering Volume 4, Issue 9, September 2014 [6]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "A Corpus-Based Concatenative Speech Synthesis System for Marathi" IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 20-26e-ISSN: 2319 –4200, p-ISSN No. : 2319 –4197 [7]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "A Marathi Hidden-Markov Model Based Speech Synthesis System" IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 34-39e-ISSN: 2319 – 4200, p-ISSN No. : 2319 –4197 [8]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "Implementation of Marathi Language Speech Databases for Large Dictionary" IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 40-45e- ISSN: 2319 –4200, p-ISSN No. : 2319 –4197 [9]. Sangramsing Kayte, Monica Mundada, Santosh Gaikwad, Bharti Gawali "Performance Evaluation Of Speech Synthesis Techniques For English Language " International Congress on Information and Communication Technology 9-10 October, 2015 [10]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte " Performance Calculation of Speech Synthesis Methods for Hindi language IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 5, Issue 6, Ver. I (Nov -Dec. 2015), PP 13-19e-ISSN: 2319 –4200, p-ISSN No. : 2319 –4197 [11]. Sangramsing Kayte, Monica Mundada "Study of Marathi Phones for Synthesis of Marathi Speech from Text" International Journal of Emerging Research in Management &Technology ISSN: 2278-9359 (Volume-4, Issue-10) October 2015 [12]. Sangramsing Kayte, Monica Mundada, Dr. CharansingKayte "Di-phone-Based Concatenative Speech Synthesis System for Hindi" International Journal of Advanced Research in Computer Science and Software Engineering -Volume 5, Issue 10, October-2015 [13]. Emille, 2003. The EMILLE (Enabling Minority Language Engineering) Project. (http://www.emille.lancs.ac.uk). [14]. Sangramsing Kayte, Dr. Bharti Gawali “Marathi Speech Synthesis: A review” International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169 Volume: 3 Issue: 6 3708 – 3711 [15]. Monica Mundada, Sangramsing Kayte “Classification of speech and its related fluency disorders Using KNN” ISSN2231-0096 Volume-4 Number-3 Sept 2014 [16]. Monica Mundada, Sangramsing Kayte, Dr. Bharti Gawali "Classification of Fluent and Dysfluent Speech Using KNN Classifier" International Journal of Advanced Research in Computer Science and Software Engineering Volume 4, Issue 9, September 2014 .