Progress on
Bangla Text-To-Speech System
Presented By:
Dr. M. Shahidur Rahman
Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology
rahmanms@sust.edu
Outline
• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of
Bangla TTS
• Discussion…
What is a TTS?
• The goal of text-to-speech (TTS) synthesis is to convert an
arbitrary input text into intelligible and natural sounding
speech
– TTS is not a “cut-and-paste” approach that strings together
isolated words
– Instead, TTS employs linguistic analysis to infer correct
pronunciation and prosody (i.e., NLP) and acoustic
representations of speech to generate waveforms (i.e.,
DSP)
TTS Applications
Applications:
• Services for the visually impaired community
• Services for illiterate people and people with reading difficulties
• Enabling the use of computers and IT services
– Reading email aloud
– Using a word processor
– Using the Internet
Existing TTS Systems:
• Festival
• Bell Labs TTS
How TTS Works
Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (39 phonemes in English)
• Disadvantage
– Phonemes ignore the transitional sounds between neighboring phonemes
Different TTS Systems (cont’d)
Diphone-Based TTS System:
• Diphones are:
– Made up of 2 adjacent phonemes
– Incorporate transitional sound
– Produce better sounding speech
– Ex. কক = ক + কঅ + অক + ক
• Disadvantage:
– Over 1500 diphones in the English language
Text Pre-Processing
• Convert raw text, which may include numbers, abbreviations,
etc., into the equivalent of written-out words
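A minimal sketch of this step, written for English because the slides do not give Bangla normalization rules. The abbreviation table and the function names `number_to_words` and `normalize` are assumptions made for illustration; a real system would need a far larger lexicon and language-specific number rules.

```python
import re

# Hypothetical abbreviation table (illustration only).
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "Engg.": "Engineering"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out 1- to 3-digit numbers."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Longer digit runs are left untouched by this sketch.
    return re.sub(r"\b[0-9]{1,3}\b",
                  lambda m: number_to_words(int(m.group())), text)
```

For example, `normalize("Dr. Rahman has 45 books")` yields the written-out form that the later stages can phonetize.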
Word to Diphone Converter (Phonetization)
• Purpose
– Translate words to their diphone representations
(Ex. রাজা -> Diphones: {র + রআ + আজ + জআ})
– Mark the text into prosodic units such as phrases, clauses and sentences
• Resource
– Dictionary of words and their diphones
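The unit sequences in the two slide examples can be reproduced with a short sketch. It assumes the phoneme sequence of the word is already known (for instance from the dictionary mentioned above); the function name `to_diphones` and the `trailing` flag are hypothetical. The flag exists because the কক example ends with a final boundary unit while the রাজা example does not.

```python
from typing import List


def to_diphones(phonemes: List[str], trailing: bool = True) -> List[str]:
    """Return a leading boundary unit, every adjacent phoneme pair,
    and optionally a trailing boundary unit."""
    if not phonemes:
        return []
    units = [phonemes[0]]  # onset of the first phoneme
    units += [a + b for a, b in zip(phonemes, phonemes[1:])]
    if trailing:
        units.append(phonemes[-1])  # release of the last phoneme
    return units


# কক as listed on the slide: ক + কঅ + অক + ক
print(to_diphones(["ক", "অ", "ক"]))
# রাজা as listed on the slide: র + রআ + আজ + জআ
print(to_diphones(["র", "আ", "জ", "আ"], trailing=False))
```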
Prosody
[Diagram: blocks labeled Diphone Retrieval, Concatenation, Acoustic Manipulation, Diphone Database, and Prosody Parameters]
Properties of Speech
[Figure: waveform of an example utterance (cat.wav) with periodic and non-periodic segments labeled]
Altering Pitch/Duration/Amplitude
• Smooth concatenation requires adjusting pitch, duration and amplitude at each concatenation point.
Altering Pitch
[Figure: extracted pitch period (from the original diphone) × Hanning window = Hanned pitch period]
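The windowing step in the figure can be sketched in plain Python. The function names are hypothetical, and the symmetric window formula below is the standard Hann definition rather than anything stated on the slide. In PSOLA-style processing the windowed segment is commonly centered on a pitch mark, which this sketch assumes has already been located.

```python
import math
from typing import List


def hann_window(length: int) -> List[float]:
    """Symmetric Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/(length-1))."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def window_pitch_period(segment: List[float]) -> List[float]:
    """Multiply an extracted segment sample-by-sample with a Hann window."""
    window = hann_window(len(segment))
    return [s * w for s, w in zip(segment, window)]
```

The taper to zero at both ends is what lets neighboring segments be summed later without clicks at the segment edges.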
PSOLA – Pitch Synchronous Overlap and Add
• 50% overlap + add
• Pitch up: overlap > 50%
• Pitch down: overlap < 50%
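A minimal overlap-add sketch, assuming the windowed pitch periods are already available as lists of samples. The function name `overlap_add` and the `hop` parameter (the spacing in samples between the starts of consecutive frames) are assumptions for illustration. A hop of half the frame length corresponds to 50% overlap; a smaller hop packs the pitch periods closer together (more overlap, higher pitch), and a larger hop spreads them apart (less overlap, lower pitch).

```python
from typing import List


def overlap_add(frames: List[List[float]], hop: int) -> List[float]:
    """Place frame k at sample k*hop and sum the overlapping samples."""
    if hop <= 0:
        raise ValueError("hop must be positive")
    if not frames:
        return []
    length = max(k * hop + len(frame) for k, frame in enumerate(frames))
    out = [0.0] * length
    for k, frame in enumerate(frames):
        start = k * hop
        for i, sample in enumerate(frame):
            out[start + i] += sample
    return out


frames = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
half = overlap_add(frames, hop=2)   # 50% overlap
more = overlap_add(frames, hop=1)   # more than 50% overlap: shorter output
```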
Altering Duration
• Increase number of PSOLA iterations
(overlaps) to increase duration
• Decrease number of PSOLA iterations
(overlaps) to decrease duration
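One way to picture this is to repeat or drop windowed pitch periods before the overlap-add step. The sketch below is an assumption about how the frame list could be resampled; the function name `retime_frames` and the `factor` parameter are hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def retime_frames(frames: List[T], factor: float) -> List[T]:
    """Repeat (factor > 1) or drop (factor < 1) frames so that the
    number of frames scales by roughly `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    n_out = max(1, round(len(frames) * factor))
    last = len(frames) - 1
    return [frames[min(last, int(i / factor))] for i in range(n_out)]
```

With four frames, a factor of 2.0 uses each frame twice, and a factor of 0.5 keeps every second frame.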
Altering Amplitude
• Multiply the signal by a constant
– If constant > 1, amplitude increases
– If constant < 1, amplitude decreases
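As a small sketch, assuming samples are floats normalized to the range [-1, 1] (the slide does not state a sample format), the scaling can be written as below. The clipping is an added safeguard, not something the slide describes.

```python
from typing import List


def scale_amplitude(signal: List[float], gain: float) -> List[float]:
    """Multiply every sample by `gain`, clipping to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, sample * gain)) for sample in signal]
```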
Concatenation
Diphones → Word
• Using PSOLA at the joining ends
• Ensures smooth transition
Words → Sentence
• Straight joining at the end points due to the presence of pauses
Putting All Together
[Diagram: TTS System pipeline from Text through Pre-processing, Prosody and Concatenation; intermediate label: words]
Types of Concatenative Speech Synthesis
• Concatenative synthesis with a fixed inventory
– Contains one sample for each unit and performs prosodic modification to match the required prosody
• Unit-selection-based synthesis
– Stores several instances of each unit, thus improving the chances of finding a well-matched unit
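The idea of picking a well-matched instance can be sketched as a search for the stored unit closest to the required prosody. The `Unit` fields, the cost formula and the function names are assumptions chosen for illustration. Unit-selection systems typically also weigh how well neighboring units join, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    name: str           # label of the stored instance (hypothetical)
    pitch_hz: float     # average pitch of the recording
    duration_ms: float  # length of the recording


def target_cost(unit: Unit, pitch_hz: float, duration_ms: float) -> float:
    """Sum of relative deviations from the required pitch and duration."""
    return (abs(unit.pitch_hz - pitch_hz) / pitch_hz
            + abs(unit.duration_ms - duration_ms) / duration_ms)


def select_unit(candidates: List[Unit], pitch_hz: float,
                duration_ms: float) -> Unit:
    """Return the stored instance with the lowest target cost."""
    return min(candidates,
               key=lambda u: target_cost(u, pitch_hz, duration_ms))
```

With a fixed inventory the candidate list has exactly one entry per unit, so the mismatch has to be corrected afterwards by prosodic modification instead.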
Progress of Bangla TTS
• KATHA
– Developed at BRAC University
– Unit-based system using the Festival framework
– 4355 diphones
– Takes 2 sec to generate a 10 sec utterance
• BANGLA VAANI
– Syllable-based synthesis system
– Developed in Kolkata
• SUBACHAN
– Developed at SUST
– Diphone-based synthesis system
– 527 diphones
– Takes 45 ms to generate a 10 sec utterance
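The two timing figures can be compared as a ratio of synthesis time to audio length. The term "real-time factor" and the function name are additions for this sketch, not terms used on the slide, and the slide does not state the hardware or test conditions, so the comparison is only indicative.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Time spent synthesizing per second of generated audio."""
    return synthesis_seconds / audio_seconds


katha = real_time_factor(2.0, 10.0)       # 2 sec for a 10 sec utterance
subachan = real_time_factor(0.045, 10.0)  # 45 ms for a 10 sec utterance
speedup = katha / subachan                # how many times faster Subachan is
print(katha, subachan, round(speedup, 1))
```

On these figures Subachan is roughly 44 times faster than Katha for the same utterance length.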
Speech Signal From Katha and Subachan
• (Voice of Katha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন ("Although primarily a poet, he also wrote and published a number of essays and articles")
• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
• (Voice of Katha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি ("Jibanananda Das is one of the foremost modern Bangla poets of the twentieth century")
• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
Problems: Homograph Ambiguity
• Homographs are words that share the same spelling
but differ in meaning and pronunciation
Solution: Homograph Disambiguation
• Collect all possible homograph words
• Determine the POS tag of each homograph in context
Ex. ছেলেরা মাঠে বল (bol) খেলছে। ("The boys are playing ball in the field"; noun)
তুমি যাবে কি না বল (bolo)। ("Tell me whether or not you will go"; verb)
• Bayes' theorem can also be applied to estimate the likelihood of each reading of a word.
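Both ideas can be sketched briefly using the slide's bol/bolo example. The lexicon, the tag names, the probabilities and the function names below are all made up for illustration; a real system would take POS tags from a tagger and estimate the probabilities from a corpus. The second function applies Bayes' theorem with a naive independence assumption: P(reading | context) is proportional to P(reading) times the product of P(word | reading) over the context words.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: (spelling, POS tag) -> pronunciation.
PRONUNCIATIONS: Dict[Tuple[str, str], str] = {
    ("বল", "NOUN"): "bol",   # "ball"
    ("বল", "VERB"): "bolo",  # "say / tell"
}


def pronounce(word: str, pos: str) -> str:
    """Look up the pronunciation of a homograph from its POS tag."""
    return PRONUNCIATIONS[(word, pos)]


def reading_posterior(priors: Dict[str, float],
                      likelihoods: Dict[str, Dict[str, float]],
                      context: List[str],
                      floor: float = 1e-3) -> Dict[str, float]:
    """Naive Bayes posterior over readings given the context words.
    `floor` is the probability assumed for unseen context words."""
    scores = {}
    for reading, prior in priors.items():
        score = prior
        for word in context:
            score *= likelihoods[reading].get(word, floor)
        scores[reading] = score
    total = sum(scores.values())
    return {reading: score / total for reading, score in scores.items()}


# Made-up probabilities, for illustration only.
priors = {"bol": 0.5, "bolo": 0.5}
likelihoods = {
    "bol": {"মাঠে": 0.3, "খেলছে": 0.4},
    "bolo": {"তুমি": 0.3, "যাবে": 0.2},
}
posterior = reading_posterior(priors, likelihoods, ["মাঠে", "খেলছে"])
best = max(posterior, key=posterior.get)
```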
Problems: Improper Concatenation
[Figure: signal from the utterance of রাশেদ (Rashed), with a region marked "Not concatenated properly"]
Solution: Improper Concatenation
• PSOLA
• Reducing the number of concatenation points
– Ex 1. Sentence -> কামাল ভাল ছেলে। ("Kamal is a good boy.")
Diphones -> কা + আমা + আল, ভা + আলো, ছে + এলে
Instead of ক + কআ + আম + মআ + আল + ল …
– Ex 2. ফলা: পৃথিবী -> পৃ + ইথি + ইবী
• Vowel sounds are periodic, and thus suitable places to concatenate
• Use the 1000 most frequently spoken words
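The effect of larger units can be illustrated by counting joins for the word কামাল (Kamal) under both segmentations from Ex 1. The helper names are hypothetical, and using a whole-word recording for frequent words is one possible reading of the last bullet, not something the slide spells out.

```python
from typing import Callable, List, Set


def join_count(units: List[str]) -> int:
    """Number of concatenation points needed to join the units."""
    return max(0, len(units) - 1)


def choose_units(word: str, whole_words: Set[str],
                 split: Callable[[str], List[str]]) -> List[str]:
    """Use a whole-word recording when one exists (assumed inventory of
    frequent words); otherwise fall back to smaller units."""
    return [word] if word in whole_words else split(word)


small_units = ["ক", "কআ", "আম", "মআ", "আল", "ল"]  # plain diphones
large_units = ["কা", "আমা", "আল"]                 # larger units from Ex 1
print(join_count(small_units), join_count(large_units))
```

Fewer joins means fewer places where a mismatch in pitch or amplitude can become audible.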
Duration Modeling
Thank you all!
Suggestions??
Sound Synthesized by Katha
• Katha
Sound Synthesized by Subachan
• Subachan
