A Framework for Bangla Text to Speech Synthesis
Conference presentation slides for my paper at the 16th ICCIT conference, 2013.

My conference presentation slide for my paper in 16th ICCIT conference, 2013.

Transcript

  • 1. A Framework for Bangla Text to Speech Synthesis. Authors: K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi. Presented by: Sanjoy Dutta, Department of Computer Science & Engineering, Khulna University of Engineering and Technology, Khulna, Bangladesh.
  • 2. Contents • Problem Statement • Factors for Speech Synthesis in Bangla • Proposed Framework • Rules and Structure Development • Syllable Parser Development • Audio File Selection and Normalization • Experimental Analysis & Results • Conclusion
  • 3. Problem Statement • Develop a framework for Bangla text-to-speech synthesis.
  • 5. Factors for Speech Synthesis in Bangla • Sequential flow of diphones: A diphone is a pair of adjacent phonemes in which the transition between the two is modelled, usually from the middle of the first phoneme to the middle of the second. A phoneme is a sound, or a group of different sounds, perceived to have the same function by speakers of the language or dialect in question; for example, the K/C phoneme in the English words "skill" and "school". • Position vs. pronunciation: Consonants and vowels occur in three kinds of positions: Consonant-Vowel (CV), Vowel-Consonant (VC), and Vowel-Consonant-Vowel (VCV).
  • 7. Proposed Framework: Structure and Rules • Text normalization: transforming text into a single standard form. In text-to-speech conversion it is applied to numbers, dates, acronyms, and abbreviations. • Text normalization for position vs. pronunciation.
  • 8. Normalization rules for ‘ ’
  • 9. Normalization rules for ‘ - - - ’
  • 10. Syllable Parser Development
  • 11. Syllable Parser in Action
  • 13. Audio File Selection and Normalization • Total: 39 consonants and 11 vowels in Bangla. • After reduction: 28 independent consonants and 8 vowels (the vowel ’ ‘ is the exception).
  • 14. Audio File Selection and Normalization • Finally, 224 (28 × 8) audio files are needed for the syllables. • The 28 consonants are paired with 5 vowels to generate 140 (28 × 5) diphones. • In summary, 401 audio files need to be created: 9 vowels, 28 consonants, 224 syllables, and 140 diphones.
  • 16. Experimental Analysis and Results • Strategy of analysis: • Sample input test: various news articles from news portals. • Listener selection: anonymous persons chosen at random. • Accuracy analysis: $\text{Accuracy} = \frac{\text{words listeners were able to hear clearly on the first attempt} \times 100}{\text{total number of words in every sample}}$
  • 17. Experiment Result • Listening factors: • Duration synchronization and merging • Numerical values such as years • Constraints in Sample 1: , , , , , , • Constraints in Sample 2: , , , , , ,
  • 18. Limitations and Future Work • Detecting noun and adjective words: the noun ( ) and the adjective ( ) should both follow rule 3(a), but they do not, and their pronunciations differ.
  • 19. Conclusion • We believe the proposed framework can be useful for Bangla TTS development, detecting Bangla words with a minimal audio file requirement.
  • 20. Thank you!
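
The arithmetic on slides 14 and 16 can be checked with a short sketch. This is only an illustration, not the authors' implementation: the function names `audio_file_inventory` and `accuracy` are hypothetical, the default counts are the ones stated on slide 14, and the listener counts in the usage lines are made-up numbers chosen to show the formula.

```python
def audio_file_inventory(consonants=28, syllable_vowels=8,
                         diphone_vowels=5, vowel_files=9):
    """Count the audio files needed, using the figures from slide 14.

    All parameter names are illustrative assumptions; the defaults are the
    counts given in the slides.
    """
    syllables = consonants * syllable_vowels   # 28 * 8
    diphones = consonants * diphone_vowels     # 28 * 5
    total = vowel_files + consonants + syllables + diphones
    return {
        "vowels": vowel_files,
        "consonants": consonants,
        "syllables": syllables,
        "diphones": diphones,
        "total": total,
    }


def accuracy(words_heard_clearly, total_words):
    """Slide 16 accuracy: percentage of words heard clearly on the first attempt."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return words_heard_clearly * 100 / total_words


if __name__ == "__main__":
    print(audio_file_inventory())
    # Hypothetical listener result: 45 of 50 words heard clearly.
    print(accuracy(45, 50))
```

With the slide's counts, the inventory adds up to the 401 files quoted in the summary, which confirms that the total uses 9 vowel recordings even though only 8 vowels are combined with consonants to form syllables.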