Progress on Bangla Text-To-Speech System by Dr. M. Shahidur Rahman

Progress on
Bangla Text-To-Speech System
Presented By:
Dr. M. Shahidur Rahman
Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology
rahmanms@sust.edu

Outline
• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of
Bangla TTS
• Discussion…

What is a TTS?
• The goal of text-to-speech (TTS) synthesis is to convert an
arbitrary input text into intelligible and natural-sounding
speech
– TTS is not a “cut-and-paste” approach that strings together
isolated words
– Instead, TTS employs linguistic analysis to infer correct
pronunciation and prosody (i.e., NLP) and acoustic
representations of speech to generate waveforms (i.e.,
DSP)

TTS Applications
Applications:
• Services for the visually impaired community
• Services for illiterate people and people with reading difficulties
• Enabling the use of computers and IT services
– Reading email aloud
– Using a word processor
– Using the Internet
Existing TTS systems:
• Festival
• Bell Labs TTS

Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (39 phonemes in English)
• Disadvantage
– Phonemes ignore the transitional sound between neighbouring units

Different TTS Systems (cont’d)
Diphone-Based TTS System:
• Diphones are:
– Made up of 2 adjacent phonemes
– Incorporate transitional sound
– Produce better sounding speech
– Ex. কক = ক + কঅ + অক + ক
• Disadvantage:
– Over 1500 diphones in the English language

Text Pre-Processing
• Convert raw text, which may include numbers, abbreviations,
etc., into the equivalent of written-out words
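
The pre-processing step can be sketched as follows. This is a minimal illustration in English, not the presented system: the lookup tables `ABBREVIATIONS` and `DIGIT_WORDS` and the function `normalize` are hypothetical, and numbers are simply read digit by digit.

```python
import re

# Hypothetical lookup tables; a real system would need far larger ones.
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "Engg.": "Engineering"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Read a run of ASCII digits one digit at a time."""
    return " ".join(DIGIT_WORDS[int(d)] for d in match.group())


def normalize(text):
    """Expand known abbreviations, then replace digit runs with words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"[0-9]+", spell_digits, text)


print(normalize("Dr. Rahman, room 42"))
```

A production front end would also handle dates, currency, and place-value number reading, which this sketch leaves out.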

Word to Diphone Converter
(Phonetization)
• Purpose
– Translate words to their diphone representations
(Ex. রাজা -> Diphones: {র + রআ + আজ + জআ})
– Mark the text into prosodic units such as phrases,
clauses and sentences
• Resource
– Dictionary of words and their diphones
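
The diphone notation used in these slides can be reproduced with a short sketch. It assumes the word has already been split into a phoneme list (the slides use a dictionary for this); the function name `to_diphones` and the `trailing` flag are illustrative only.

```python
def to_diphones(phonemes, trailing=False):
    """Return a leading unit, every adjacent phoneme pair, and optionally a trailing unit."""
    if not phonemes:
        return []
    units = [phonemes[0]]                                   # leading boundary unit
    units += [a + b for a, b in zip(phonemes, phonemes[1:])]  # adjacent pairs
    if trailing:
        units.append(phonemes[-1])                          # trailing boundary unit
    return units


# Phoneme lists are supplied by hand here; they are assumptions, not dictionary output.
print(" + ".join(to_diphones(["র", "আ", "জ", "আ"])))
print(" + ".join(to_diphones(["ক", "অ", "ক"], trailing=True)))
```

The first call follows the রাজা example, and the second follows the কক example, which also lists a trailing unit.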

Prosody
[Diagram: Diphone Retrieval, Acoustic Manipulation, and Concatenation stages, drawing on a Diphone Database and Prosody Parameters]

Properties of Speech
[Figure: waveform of cat.wav with its periodic and non-periodic segments labelled]

Altering Pitch/Duration/Amplitude
• For smooth concatenation, altering pitch,
duration and amplitude at the concatenation
point is very important.

Altering Pitch
[Figure: a pitch period extracted from the original diphone, multiplied by a Hanning window, gives the Hanned pitch period]

PSOLA – Pitch Synchronous Overlap
and Add
• Hanned pitch periods are overlapped and added
• Default: 50% overlap + add
• Pitch up: overlap > 50%
• Pitch down: overlap < 50%
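
The windowing and overlap-add steps can be sketched as below. This is a simplified illustration, assuming equal-length segments; the names `hann` and `overlap_add` and the `hop` parameter are not from the slides. A real PSOLA implementation places segments at detected pitch marks, which is omitted here.

```python
import math


def hann(n):
    """Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def overlap_add(segments, hop):
    """Window each segment and sum the copies, spaced `hop` samples apart.

    Overlap fraction is 1 - hop / segment_length, so a hop of half the
    segment length gives 50% overlap; a smaller hop gives more overlap.
    """
    seg_len = len(segments[0])
    window = hann(seg_len)
    out = [0.0] * (hop * (len(segments) - 1) + seg_len)
    for k, seg in enumerate(segments):
        for i, sample in enumerate(seg):
            out[k * hop + i] += sample * window[i]
    return out


segments = [[1.0] * 8 for _ in range(3)]
print(len(overlap_add(segments, hop=4)))  # 50% overlap
print(len(overlap_add(segments, hop=2)))  # 75% overlap, periods packed closer
```

In this sketch a smaller hop also shortens the output, which is why duration is adjusted separately by changing the number of overlap-add iterations.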

Altering Duration
• Increase number of PSOLA iterations
(overlaps) to increase duration
• Decrease number of PSOLA iterations
(overlaps) to decrease duration
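
One way to picture this is to repeat or drop pitch periods before they are overlapped and added. The sketch below is an assumption about how the iteration count could be changed; `change_duration` and `factor` are hypothetical names.

```python
def change_duration(periods, factor):
    """Repeat (factor > 1) or drop (factor < 1) pitch periods.

    Each output slot j reuses the input period at index int(j / factor).
    """
    n_out = max(1, round(len(periods) * factor))
    last = len(periods) - 1
    return [periods[min(last, int(j / factor))] for j in range(n_out)]


periods = ["p0", "p1", "p2", "p3"]   # stand-ins for windowed pitch periods
print(change_duration(periods, 2))    # twice as many overlap-add iterations
print(change_duration(periods, 0.5))  # half as many
```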

Altering Amplitude
• Multiply the signal by a constant
– If constant > 1, amplitude increases
– If constant < 1, amplitude decreases

Concatenation
Diphones -> Word
• Using PSOLA at the joining ends
• Ensures smooth transition
Words -> Sentence
• Straight joining at the end points due to
presence of pauses

Putting All Together
[Diagram: TTS system taking text through Pre-processing, Prosody, and Concatenation stages, with words passed between stages]

Types of Concatenative speech
synthesis
• Concatenative synthesis with a fixed inventory
– contains one sample for each unit, and performs
prosodic modification to match the required
prosody
• Unit-selection-based synthesis
– stores several instances of each unit, thus
improving the chances of finding a well-matched
unit
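
Unit selection can be sketched as picking the stored instance closest to the required prosody. The cost function below is a deliberately simple assumption (weighted absolute differences in pitch and duration); the field names `pitch_hz` and `duration_ms` and the function `select_unit` are hypothetical, and real systems also score how well neighbouring units join.

```python
def select_unit(candidates, target_pitch, target_duration, w_pitch=1.0, w_dur=1.0):
    """Return the candidate with the lowest weighted distance to the target prosody."""
    def cost(unit):
        return (w_pitch * abs(unit["pitch_hz"] - target_pitch)
                + w_dur * abs(unit["duration_ms"] - target_duration))
    return min(candidates, key=cost)


# Made-up instances of one unit, for illustration only.
candidates = [
    {"id": "a", "pitch_hz": 110, "duration_ms": 80},
    {"id": "b", "pitch_hz": 125, "duration_ms": 95},
    {"id": "c", "pitch_hz": 140, "duration_ms": 60},
]
print(select_unit(candidates, target_pitch=120, target_duration=90)["id"])
```

With a fixed inventory there is only one candidate per unit, so the mismatch has to be removed by prosodic modification instead.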

Progress of Bangla TTS
• KATHA
– Developed at BRAC University
– Unit-based system using the Festival framework
– 4355 diphones
– Takes 2 s to generate a 10 s utterance
• BANGLA VAANI
– Syllable-based synthesis system
– Developed in Kolkata
• SUBACHAN
– Developed at SUST
– Diphone-based synthesis system
– 527 diphones
– Takes 45 ms to generate a 10 s utterance
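
The two timing figures can be compared as a ratio of synthesis time to audio length. The term "real-time factor" and the function name are framing added here, not taken from the slides; only the four numbers come from the list above.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio duration; below 1 means faster than real time."""
    return synthesis_seconds / audio_seconds


katha = real_time_factor(2.0, 10.0)       # 2 s for a 10 s utterance
subachan = real_time_factor(0.045, 10.0)  # 45 ms for a 10 s utterance
speedup = katha / subachan

print(f"Katha: {katha:.4f}, Subachan: {subachan:.4f}, ratio: {speedup:.1f}x")
```

On these figures both systems run faster than real time, and Subachan is roughly 44 times faster than Katha.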

Speech Signal From Katha and Subachan
• (Voice of Katha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
(Although he was primarily a poet, he also wrote and published a number of essays and articles)
• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
• (Voice of Katha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
(Jibanananda Das is one of the foremost modern Bengali poets of the twentieth century)
• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি

Problems: Homograph Ambiguity
• Homographs are words that share the same spelling
but differ in meaning and pronunciation

Solution: Homograph Disambiguation
• Collect all possible homograph words
• Determine the POS tag of the homograph words
Ex. ছেলেরা মাঠে বল (bol) খেলছে। (The boys are playing ball in the field.)
তুমি যাবে কি না বল (bolo)। (Say whether you will go or not.)
• Bayes' theorem can also be applied to estimate the most likely
pronunciation of a homograph from its context.
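
The POS-based step can be sketched as a lookup keyed by word and tag. This assumes a POS tagger has already run; the tag names `NOUN` and `VERB`, the table `PRONUNCIATIONS`, and the function `pronounce` are hypothetical. The entry shown is the homograph বল, read "bol" as a noun (ball) and "bolo" as a verb (say).

```python
# Hypothetical homograph table: (word, POS tag) -> pronunciation.
PRONUNCIATIONS = {
    ("বল", "NOUN"): "bol",
    ("বল", "VERB"): "bolo",
}


def pronounce(word, pos_tag, default=None):
    """Return the pronunciation for a homograph given its POS tag, else `default`."""
    return PRONUNCIATIONS.get((word, pos_tag), default)


print(pronounce("বল", "NOUN"))
print(pronounce("বল", "VERB"))
```

Words that are not in the table fall through to `default`, so ordinary words can keep using the normal dictionary lookup.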

Problems: Improper Concatenation
[Figure: signal from the utterance of রাশেদ, with the region that is not concatenated properly marked]

Solution: Improper Concatenation
• PSOLA
• Reducing the number of concatenation points
– Ex 1. Sentence -> কামাল ভাল ছেলে।
Diphones -> কা + আমা + আল, ভা + আলো, ছে + এলে
Instead of ক + কআ + আম + মআ + আল + ল …
– Ex 2. ফলা: পৃথিবী -> পৃ + ইথি + ইবী
• Vowel sounds are periodic, and thus suitable for
smooth concatenation
• Use the 1000 most frequently spoken words
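
One way to obtain longer units like those in Ex 1 is to cut only at vowels, so that neighbouring units share a vowel and every join falls inside a periodic sound. The sketch below is an assumption about how such units could be formed, not the presented system's algorithm; `VOWELS` and `vowel_joined_units` are hypothetical, and phonemes are written with independent vowel letters.

```python
# Hypothetical, incomplete vowel set used only for this illustration.
VOWELS = {"অ", "আ", "ই", "ঈ", "উ", "ঊ", "এ", "ও"}


def vowel_joined_units(phonemes):
    """Split a phoneme list into units that start and end at vowels where possible."""
    cuts = [i for i, p in enumerate(phonemes) if p in VOWELS]
    bounds = sorted(set([0] + cuts + [len(phonemes) - 1]))
    if len(bounds) < 2:
        return ["".join(phonemes)]
    # Consecutive units overlap on the boundary vowel.
    return ["".join(phonemes[a:b + 1]) for a, b in zip(bounds, bounds[1:])]


word = ["ক", "আ", "ম", "আ", "ল"]
units = vowel_joined_units(word)
print(" + ".join(units), "->", len(units) - 1, "joins")
```

For this word the sketch yields three units and two joins, compared with six units and five joins when every phoneme boundary is a concatenation point.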

Thank you all!
Suggestions??

Sound Synthesized by Katha
• Katha

Sound Synthesized by Subachan
• Subachan
