Text To Speech Synthesis System For Marathi Language Using Concatenation Technique

Presented by
Mr. Sangramsing Nathusing Kayte
Guided by
Dr. Bharti W. Gawali
Professor & Head
Department of Computer Science & Information Technology,
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad (M.S.) India.

Presentation Overview
• Introduction to Speech Processing
• Speech Synthesis
• Techniques of Speech Synthesis
• Survey of Literature
• Objectives of the Research
• Tools Used for Implementation
• Performance Evaluation Methods
• Database Creation
• Comparative Analysis
• Contribution and Significance
• Conclusion
• References

Introduction to Speech Processing
• Speech is the most basic form of communication between human beings.
• Not everyone can read and write, but almost everyone can communicate through speech.
• Speech therefore has a strong influence on day-to-day human life.

Speech Production (figure)
Speech production involves organs such as the vocal cords, nasal cavity, mouth, teeth, and lips.

Areas of Speech Processing
1. Speech recognition
2. Speech synthesis
3. Speech coding
4. Speech compression
This work focuses on speech synthesis.

Speech Synthesis
Speech synthesis is the artificial, computer-generated production of human speech.
A system that does this from written input is called a Text-to-Speech (TTS) system: it converts text into spoken language.
[Figure: Text → Text-to-Speech System → Speech. Example input: प्राचीन भारतीय अर्थव्यवस्था अतिशय सुंदर होती]

Architecture of a Text-to-Speech System
[Figure: processing pipeline from text to speech. Example input: प्राचीन भारतीय अर्थव्यवस्था अतिशय सुंदर होती]
Text
→ Pre-processing (output: tagged text)
  • Document structure detection
  • Conversion from Unicode and fonts
→ Text Normalization (output: word sequence)
  • Handling numbers, symbols, abbreviations, etc.
→ Linguistic Analysis (output: phone sequence)
  • Part-of-speech tagging
  • Phrase breaks
  • Letter-to-sound rules
→ Prosodic Prediction
  • Duration
  • F0 contour
  • Energy
→ Waveform Generation
→ Speech

Speech Synthesis Techniques
Several synthesis methods can be used when building a TTS system:
1. Articulatory synthesis
2. Formant synthesis
3. Concatenative speech synthesis

1. Articulatory synthesis: synthesizes speech by controlling a model of the speech articulators (e.g. jaw, tongue, lips).
2. Formant synthesis: each phone is produced by specifying its formants and pitch. A set of rules modifies the pitch and formants so that the transition from one phone to the next is sufficiently smooth.

3. Concatenative synthesis: based on the concatenation (stringing together) of segments of pre-recorded speech.
Architecture of concatenative synthesis
[Figure: Input data (text input, speech recordings) → Unit segmentation → Unit database → Unit selection → Concatenation + smoothing. Example units "sh", "i", "a" are joined one after another.]
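The "concatenation + smoothing" step can be illustrated with a minimal sketch. It assumes each unit is a plain list of float samples and uses a linear crossfade at every join; the function name `crossfade_concat` and the `overlap` parameter are hypothetical, and the slides do not say which smoothing method the actual system uses.

```python
def crossfade_concat(units, overlap):
    """Join lists of samples, linearly crossfading `overlap` samples at each seam.

    Assumes `units` is a non-empty list of sample lists (hypothetical layout).
    """
    out = list(units[0])
    for unit in units[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        base = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[base + i] = (1 - w) * out[base + i] + w * unit[i]
        out.extend(unit[n:])
    return out


# Two toy "units": a constant 1.0 segment followed by a constant 0.0 segment.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
print(joined)
```

With `overlap=0` the function reduces to plain concatenation, which makes the effect of smoothing easy to compare by ear or by plotting.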

There are three main subtypes of concatenative synthesis:
1. Domain-specific synthesis
2. Diphone synthesis
3. Unit selection synthesis
1) Domain-specific synthesis: concatenates pre-recorded words and phrases to create complete utterances.
2) Diphone synthesis: uses a minimal speech database containing all the diphones (sound-to-sound transitions) occurring in a given language. Only one example of each diphone is stored in the speech database.
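To make the notion of a diphone concrete, the following sketch turns a phone sequence into the diphone names a synthesizer would look up. The silence label `pau` and the `a-b` naming scheme are assumptions for illustration, and the phone sequence is a made-up transliteration, not taken from the system described here.

```python
def to_diphones(phones, silence="pau"):
    """Return the phone-to-phone transitions covering an utterance.

    The utterance is padded with silence on both sides so that the first
    and last phones also get a transition unit.
    """
    padded = [silence] + list(phones) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phone sequence for a short word.
diphones = to_diphones(["k", "aa", "r", "a", "n"])
print(diphones)
```

A sequence of N phones always yields N + 1 diphones, and a language with P phones needs at most P × P diphone recordings, which is why the diphone database stays small.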

3) Unit selection synthesis: uses large speech databases (more than one hour of recorded speech). During database creation, each recorded utterance is segmented into some or all of the following linguistic units: phonemes, words, phrases, and sentences.
Architecture of unit selection synthesis
[Figure: Text → Preprocessing, text normalization, linguistic analysis → Phones → Unit selection from a large speech database holding n diphone units → Speech]
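Unit selection is commonly described as a search for the candidate sequence that minimizes a target cost (how well a unit matches what is wanted) plus a join cost (how well neighbouring units fit together). The sketch below is a generic dynamic-programming version of that idea under stated assumptions: the unit layout `(phone, pitch_hz, utterance, position)` and both cost functions are invented for illustration and are not the costs used by Festival's clunits module.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi-style search: choose one candidate per target, minimizing total cost."""
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][k][0] + join_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(targets[i], c), prev_j))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


# Hypothetical units: (phone, pitch in Hz, source utterance, position in utterance)
candidates = [
    [("k", 120, "a001", 0), ("k", 100, "a002", 3)],
    [("aa", 118, "a001", 1), ("aa", 101, "a003", 5)],
]
targets = [("k", 100), ("aa", 100)]  # (phone, desired pitch)


def target_cost(target, unit):
    return abs(target[1] - unit[1])


def join_cost(prev, cur):
    # Units that were adjacent in the same recording join for free.
    if prev[2] == cur[2] and cur[3] == prev[3] + 1:
        return 0
    return abs(prev[1] - cur[1]) + 10


chosen = select_units(targets, candidates, target_cost, join_cost)
print(chosen)
```

In this toy case the two units closest to the desired pitch win even though they come from different recordings; raising the join penalty would instead favour the naturally adjacent pair from `a001`.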

Survey of Literature
Speech synthesis work has been done, or is under way, for many foreign languages, such as English, Japanese, European Portuguese, Arabic, Polish, Korean, German, Turkish, Mongolian, and Greek.
India has 22 official languages, of which Hindi, Malayalam, Kannada, Bengali, Oriya, Punjabi, Gujarati, Telugu, and Marathi are currently in focus.

Systems developed for these languages include Dhvani, Shruti, HP Lab, and Vani.
Institutions working on building a Marathi speech synthesizer include IIIT-Hyderabad, CDAC-Mumbai, CDAC-Pune, and IIT Chennai.
They have built applications such as e-speak, a-speak, i-speak, and Sandesh Pathak, but mainly for Hindi, Telugu, and other languages.
Very little work has been done for Marathi.

International speech synthesis scenario by language
Language                                   Technique
Arabic (Saudi Arabia) - ar-SA              iSpeech SDK
Chinese (China) - zh-CN                    Festival framework
Chinese (Hong Kong SAR China) - zh-HK      Festival framework
Chinese (Taiwan) - zh-TW                   Festival framework
English (Australia) - en-AU                Festival framework
English (Ireland) - en-IE                  Festival framework
English (South Africa) - en-ZA             Festival framework
English (United Kingdom) - en-GB           Festival framework
English (United States) - en-US            Festival framework

National speech synthesis scenario by language
Language     Technique
Punjabi      Festival framework
Gujarati     Concatenative synthesis, unit selection, Festival
Tamil        Concatenative synthesis, unit selection, HTK
Bengali      Concatenative synthesis, unit selection
Oriya        Concatenative synthesis, unit selection
Hindi        Concatenative synthesis, unit selection
Kannada      Hidden Markov Model
Marathi      Continuous Density Hidden Markov Model approach for a Marathi speech synthesis system based on Marathi affricates
Telugu       Festival framework

International institutes where speech synthesis work is carried out
Sr. No   Institute   Languages
1   Digital Future of technology, New America   U.S. English, U.K. English, Spanish, French, German
2   Machine Intelligence Laboratory, Cambridge University Department of Engineering   U.S. English, U.K. English
3   IBM   Arabic, Chinese, French, German, U.S. English, U.K. English

National institutes where Marathi speech synthesis work is carried out
Sr. No   Institute   Languages
1   TIFR Mumbai      Hindi, Bengali, Marathi, Indian English
2   IIIT Hyderabad   Hindi, Telugu, Marathi, other languages
3   IIT Mumbai       Hindi, Marathi
4   CDAC Pune        Hindi, Marathi, Indian English, other languages
5   CDAC Noida       Hindi, Punjabi, Marathi, other languages
6   CIIL Mysore      Most of the Indian languages, including Marathi

Objectives of the Research
• The major objective of the study is to build a unit-based concatenative synthesis system.
• Create a speech database for speech synthesis.
• Develop MATLAB and Android applications based on the Marathi speech synthesis system.

TOOLS Used for Implementation
Festival Framework
MATLAB
Android

Festival Framework
The Festival Speech Synthesis System is a general multilingual speech synthesis system.
It was developed by Alan W. Black and colleagues at the Centre for Speech Technology Research (CSTR), University of Edinburgh.
Festival is designed to support multiple languages and comes with support for English (British and American pronunciation), Welsh, and Spanish.
Voice packages exist for several other languages, such as Finnish, Hindi, Italian, Marathi, Polish, and Russian.

Festival Speech tools
 festival-2.4
 speech_tools-2.4
 festvox-2.1
 festlex_CMU
 festlex_OALD
 festlex_POSLEX
 festvox_kalloc

Commands
• First, set the environment variables FESTVOXDIR and ESTDIR to their respective directories, for example:
export FESTVOXDIR=/home/drbhati/unit/festvox
export ESTDIR=/home/drbhati/unit/speech_tools
• The directory name for a voice has three parts: an institution name, a language, and a speaker.
mkdir bamu_mar_sing
• Make it the current directory.
cd bamu_mar_sing

• Set up the voice directory for the unit selection build.
$FESTVOXDIR/src/unitsel/setup_clunits bamu mar sing
• Convert the recorded wav files to mono at a 16000 Hz sampling rate.
./bin/get_wavs recording/*.wav
• Next, generate waveforms to act as prompts, or as timing cues even if the prompts are not actually played.
festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/txt.data")'
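The commands above all read the prompt list from etc/txt.data. In Festvox this file holds one parenthesized entry per utterance: an utterance ID followed by the quoted text. The sketch below writes such entries from a list of sentences; the helper name `format_prompts`, the ID prefix `a`, and the three-digit numbering are assumptions chosen to resemble the file names used in these slides.

```python
def format_prompts(sentences, prefix="a"):
    """Return the contents of a Festvox-style prompt file, one entry per line."""
    lines = []
    for i, text in enumerate(sentences, start=1):
        # Escape backslashes and double quotes so the entry stays well-formed.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'( {prefix}{i:03d} "{escaped}" )')
    return "\n".join(lines) + "\n"


# Transliterated sentence as it appears in the SayText example of this deck.
prompt_file = format_prompts(["kaarand~a aapalayaakad:ei tii padadhata naahii."])
print(prompt_file, end="")
```

Writing the returned string to etc/txt.data before running the build steps keeps utterance IDs and recorded file names in step.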

• Record the prompts; after recording, the recorded files should be in wav/.
./bin/prompt_them etc/txt.data
• Create the label (lab) files for each phone with the following command.
./bin/make_labs prompt-wav/*.wav
• Build the utterance structures.
festival -b festvox/build_clunits.scm '(build_utts "etc/txt.data")'
• The steps from here on are concerned with signal analysis, specifically pitch marking and cepstral parameter extraction. There are a number of methods for pitch mark extraction, and a number of parameters within these files may need tuning.
./bin/make_pm_wave wav/*.wav
26

• The next stage computes the Mel-frequency cepstral coefficients.
./bin/make_mcep wav/*.wav
• Building the cluster unit selection synthesizer consists of a number of stages, all driven by the controlling Festival script with the parameters described above.
festival -b festvox/build_clunits.scm '(build_clunits "etc/txt.data")'

Load the resulting voice in Festival and synthesize a test sentence:
festival festvox/bamu_mar_bharti_clunits.scm
festival> (voice_bamu_mar_bharti_clunits)
festival> (SayText "kaarand~a aapalayaakad:ei tii padadhata naahii.")
festival> (SayText "कारण आपल्याकडे ती पद्धत नाही.")

DATABASE USED FOR FESTIVAL
The speech data were collected in a recording studio using a single channel.
Table: Technical specification of data recording
Parameter                Value
Sampling rate            16000 Hz
Speakers                 Dependent
Condition of noise       Normal
Accent                   Marathi
Pre-emphasis             1 - 0.97 z^-1
Window type              Hamming, 25 milliseconds
Window step size         20 milliseconds
Distance of microphone   10-12 Meter
Database Statistics
Number of utterances: 1000
Utterances: 1
Session: 01
Total size: 1000 sentences
Ubuntu, Festival
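The pre-emphasis filter and windowing parameters in the table can be read as the following front-end steps. This is a plain-Python sketch of the standard operations, using the table's values (16000 Hz, filter 1 - 0.97 z^-1, 25 ms Hamming window, 20 ms step); the function names are hypothetical and the actual tools used for the analysis may differ in detail.

```python
import math


def pre_emphasize(x, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], i.e. the filter 1 - 0.97 z^-1."""
    return x[:1] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n):
    """Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frames(x, rate=16000, win_ms=25, step_ms=20):
    """Cut x into Hamming-windowed frames; incomplete trailing frames are dropped."""
    win = rate * win_ms // 1000    # 400 samples at 16 kHz
    step = rate * step_ms // 1000  # 320 samples at 16 kHz
    w = hamming(win)
    return [
        [s * wi for s, wi in zip(x[start:start + win], w)]
        for start in range(0, len(x) - win + 1, step)
    ]


one_second = [0.0] * 16000
windowed = frames(pre_emphasize(one_second))
print(len(windowed), len(windowed[0]))
```

At these settings consecutive frames overlap by only 5 ms (80 samples), so one second of audio yields 49 frames of 400 samples each.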

Table: Sentences and labels used for generating synthesized speech
Sr.No   Original Sentence   Original Speech File   Synthesized Speech File
1 कारण आपल्याकडे ती पद्धत नाही A001 a001
2 प्राचीन भारतीय अर्थव्यवस्था अतिशय सुंदर होती A002 a002
3 कर्नाटकात केवळ कन्नड अधिकृत आहे A003 a003
4 विकिपीडिया हा एक ज्ञानकोश आहे A004 a004
5 कारण तू माझी आई आहेस A005 a005
6 इंग्रजी भाषा आहे रोमन लिपी आहे A006 a006
7 येथे `धन म्हणजे शुद्ध लक्ष्मी A007 a007
8 संदर्भ मात्र जर्मनी येथील A008 a008
9 शेवटचे र हे अक्षर मात्र कायम A009 a009
10 धिधाडे ही कोणी शिकार करत नाही A0010 a0010
Ten sentences were used to compare the quality of the original and the synthesized speech.

Performance Evaluation Methods
There are many performance evaluation methods, but the one most widely used to judge the naturalness of synthesized speech is the Mean Opinion Score (MOS).
MOS is a test that has been used for decades in telephony networks to obtain the human user's view of the quality of the network. It is expressed as a single number in the range 1 to 5, where 1 is the lowest and 5 the highest perceived audio quality.

Performance Evaluation
MOS is calculated as a subjective quality measure for the unit selection synthesis approach. Listeners were instructed to rate the understandability of each sentence on a scale from 1 to 5 (Excellent: 5, Very good: 4, Good: 3, Satisfactory: 2, Not understandable: 1).
Table 1. Scores given by each subject for each sentence synthesized with unit selection
Sentence Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10
1 5 5 5 5 4 4 5 4 4 5
2 5 5 4 5 5 4 5 4 4 5
3 4 4 5 4 3 3 4 2 5 4
4 5 4 4 5 4 4 5 5 5 5
5 5 5 5 5 4 4 5 3 3 5
6 5 4 5 5 5 4 5 4 4 5
7 4 5 4 4 4 4 4 4 4 4
8 4 4 5 4 4 5 4 5 5 4
9 5 3 5 5 3 4 5 3 5 5
10 5 5 4 5 4 4 5 4 4 5
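As a worked example, the MOS can be computed directly from the score table above as the arithmetic mean of all ratings. The sketch copies the ten rows as given (one row per sentence, one column per subject); the variable names are illustrative only.

```python
# Ratings from the table above: rows are sentences 1-10, columns are Sub1-Sub10.
scores = [
    [5, 5, 5, 5, 4, 4, 5, 4, 4, 5],
    [5, 5, 4, 5, 5, 4, 5, 4, 4, 5],
    [4, 4, 5, 4, 3, 3, 4, 2, 5, 4],
    [5, 4, 4, 5, 4, 4, 5, 5, 5, 5],
    [5, 5, 5, 5, 4, 4, 5, 3, 3, 5],
    [5, 4, 5, 5, 5, 4, 5, 4, 4, 5],
    [4, 5, 4, 4, 4, 4, 4, 4, 4, 4],
    [4, 4, 5, 4, 4, 5, 4, 5, 5, 4],
    [5, 3, 5, 5, 3, 4, 5, 3, 5, 5],
    [5, 5, 4, 5, 4, 4, 5, 4, 4, 5],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


per_sentence = [mean(row) for row in scores]      # one MOS per sentence
per_subject = [mean(col) for col in zip(*scores)]  # one mean per listener
overall_mos = mean(s for row in scores for s in row)

print(round(overall_mos, 2))
```

For these 100 ratings the overall mean is 4.39, and sentence 3 has the lowest per-sentence score (3.8).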

Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE) Quality Measure
PSNR and MSE were used as objective quality measures for the speech synthesized with the unit selection approach.
Sr. No Original Speech File Synthesized File PSNR MSE
1 A001 a001 3.30 7.94
2 A002 a002 6.72 4.57
3 A003 a003 3.21 1.02
4 A004 a004 4.20 3.70
5 A005 a005 2.57 7.61
6 A006 a006 1.26 5.32
7 A007 a007 7.56 8.06
8 A008 a008 1.29 7.20
9 A009 a009 3.24 9.25
10 A0010 a0010 4.08 7.01
Average 3.74 6.17
Quality 96.26 93.83
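For reference, the textbook definitions of the two measures are sketched below for a pair of sample sequences. The slides do not state how the original and synthesized signals were aligned, scaled, or which peak value was used, so the `peak` parameter and the truncation to the shorter signal are assumptions.

```python
import math


def mse(reference, test):
    """Mean square error over the overlapping part of two sample sequences."""
    n = min(len(reference), len(test))
    return sum((reference[i] - test[i]) ** 2 for i in range(n)) / n


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the signals are identical."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


# Toy signals: the test signal is uniformly 0.1 below the reference.
ref = [1.0, 1.0, 1.0, 1.0]
syn = [0.9, 0.9, 0.9, 0.9]
print(mse(ref, syn), psnr(ref, syn))
```

Under these definitions a lower MSE and a higher PSNR both indicate that the synthesized signal is closer to the original.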

Synthesis System Design
The speech synthesis system was developed on two platforms:
a) MATLAB-based Marathi speech synthesis system
b) Android-based Marathi speech synthesis system

a. MATLAB-based speech-synthesizing calculator
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language.
A GUI-based calculator was developed, and the concatenative speech synthesis technique was applied to speak its results.

Prototype of Marathi Speech Talking Calculator Application
Testing table for the MATLAB Marathi Speech Talking Calculator
Sr.No NO1 Op1 NO2 Op2 Result Response Time Result in Speech
१ २ + ६ = ८ १ सेकुं द आठ
२ ५ + ९ = १४ १ सेकुं द चौदा
३ ५९ + ८९ = १४८ १.६सेकुं द एकशे अठ्ठेचाळीस
४ ६६ + ८४ = १५० १.६ सेकुं द एकशे पन्नास
५ ८७ + ९३ = १८० १.६ सेकुं द एकशेऐुंशी
६ ९६८ + ७६९ = १८३७ १.७ सेकुं द एकहजार आठशे सदतीस
७ २३६७ + ९५६३ = ११९३० २.६सेकुं द अकराहजार नऊशे तीस
८ ५८६ + ३२१ = ९०७ १.६ सेकुं द नऊशे सात
९ ५८२ + ६३९ = १२२१ १.७ सेकुं द एकहजार दोनशे एकवीस
१० ४९३ + ३५७ = ८५० १.६ सेकुं द आठशे पन्नास
११ १२.१५ + ९५.६९ = १०७.८४ २.६ सेकुं द एकशे सात परुंका चौऱ्याऐुंशी
१२ ७५.९ + ५६.७८ = १३२.६८ २.६ सेकुं द एकशे बत्तीस परुंका अडस्ठ
१३ १० - ६ = ४ १ सेकुं द चार
१४ १५ - ९ = ४ १ सेकुं द चार
१५ ८९ - ६३ = २६ १ सेकुं द सव्वीस

१६ ७९५ - ५५६ = २३९ १.३ सेकुं द दोनशे एकोणचाळीस
१७ १५६७ - २१५९ = ५९२ १.३ सेकुं द पाचशे ब्याण्णव
१८ ४५९ - ३६७ = ९२ १.३ सेकुं द ब्याण्णव
१९ ९६१५३ - ६५८९ = ८९५६४ २.६ सेकुं द एकोणनव्वद हजार पाचशे चौस्ठ
२० ५६४७८९ - २३६९ = ५६२४२० २.८ सेकुं द पाचिीक बास्ठ हजार चारशे वीस
२१ २३६५ - २५८ = २१०७ १.४ सेकुं द दोन हजार एकशे सात
२२ ८८८ - ६६६ = २२२ १.२ सेकुं द दोनशे बावीस
२३ १५९.३६ - २३.८ = १३५.५६ २.८ सेकुं द एकशे पस्तीस परुंका छप्पन्न
२४ ९५३.१२६ - ६९३.२५ = २५९.८७६ २.८ सेकुं द दोनशे एकोणसाठ परुंका आठशे शहात्तर
२५ २३ * ५९ = १३५७ २ सेकुं द एकहजार तीनशे सत्तावन्न
२६ १३ * ३६ = ३३८ १.४ सेकुं द तीनशे अडती
२७ ९ * ८ = ७२ १.४ सेकुं द बाहत्तर
२८ २६ * १२ = ३१२ १.४ सेकुं द तीनशे बारा
२९ ६६ * ९ = ५९४ १.४ सेकुं द पाचशे चौऱ्याण्णव
३० ३६ * ११ = १०५६ २ सेकुं द एकहजार छप्पन्न
३१ ५३९ * १२ = ६४६८ १.४ सेकुं द सहा हजार चारशे अडस्ठ
३२ २५७८ * ३ = १५४६८ २.८ सेकुं द पुंधरा हजार चारशे अडस्ठ
३३ ६९७५ * १२ = ८३७०० २.४ सेकुं द त्र्याऐुंशी हजार सातशे

३४ २२५८ * ५६ = १२६४४८ २.८ सेकुं द एक िीक सव्वीस हजार चारशे अठ्ठेचाळीस
३५ ५२३.६९ * १२.३ = ६४४१.३८७ ३. २ सेकुं द सहा हजार चारशे एक्के चाळीस परुंका तीनशे
सत्त्याऐुंशी
३६ ८५९६.३ * ६.३ = ५४१५६.६९ ४ सेकुं द चोपन्न हजार एकशे छप्पन्न परुंका एकोणसत्तर
३७ ५४ / ४ = १३.५ २ सेकुं द तेरा परुंका पाच
३८ ३३० / ८ = ४१.२५ २ सेकुं द एक्के चाळीसएक्काकॅ िेस परुंका पुंचवीस
३९ ७५० / ६ = १२५ १.४ सेकुं द एकशे पुंचवीस
४० ८५० / ६ = १४१.६६ ३ सेकुं द एकशे एक्के चाळीस परुंका सहास्ठ
४१ ६५० / १० = ६५ १ सेकुं द पास्ठ
४२ ६३ / ५ = १२.६ २ सेकुं द बारा परुंका सहा
४३ ३३० / १५ = २२ १ सेकुं द बावीस
४४ ५९६ / ७ = ८५.१४ २ सेकुं द पुंच्याऐुंशी परुंका चौदा
४५ २३५८९ / १० = २३५८.९ २.४ सेकुं द दोन हजार तीनशे अठ्ठावन्न परुंका नऊ
४६ ९६३२ / ८ = १२०४ १.६ सेकुं द एक हजार दोनशे चार
४७ ५६३.६ / २.५६ = २२०.१५६ ३.४ सेकुं द दोनशे वीस परुंका एकशे छप्पन्न
४८ ८५६९.२३ / १२.८५ = ६६६.८६६ ३.२ सेकुं द सहाशे सहास्ठ परुंका आठशे सहास्ठ
४९ ५९.२६३ / ९.१२ = ६.४९ २ सेकुं द सहा परुंका एकोणपन्नास
५० १५६३.२५ / २१.२५ = ७३.५३४ २ सेकुं द त्र्याहत्तर परुंका पाचशे चौतीस

Performance Evaluation: MOS on the MATLAB-based Marathi Speech Talking Calculator
MOS is calculated as a subjective quality measure for the speech synthesized with unit selection synthesis. Listeners were instructed to rate the understandability of each result on a scale from 1 to 5 (Excellent: 5, Very good: 4, Good: 3, Satisfactory: 2, Not understandable: 1).
Sentence Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10
1 5 5 5 5 4 4 5 4 5 5
2 5 5 5 5 5 4 5 4 5 5
3 5 5 5 5 5 4 5 4 5 5
4 5 5 5 5 5 5 5 4 5 5
5 5 5 5 5 5 5 5 5 5 5
6 5 5 5 5 5 5 5 5 5 5
7 5 5 5 5 4 5 5 5 5 5
8 5 5 5 5 4 4 5 5 5 5
9 5 5 5 5 4 4 5 5 5 5
10 5 5 5 5 4 4 5 4 5 5
11 5 5 5 5 4 4 5 4 5 5

12 5 5 5 5 4 4 5 4 5 5
13 5 5 5 5 4 4 5 4 5 5
14 5 5 5 5 4 4 5 4 5 5
15 5 5 5 5 4 4 5 4 5 5
16 5 5 5 5 4 4 5 4 5 5
17 5 5 5 5 4 4 5 4 5 5
18 5 5 5 5 4 4 5 4 5 5
19 5 5 5 5 4 4 5 4 5 5
20 5 5 5 5 4 4 5 4 5 5
21 5 5 5 5 4 4 5 4 5 5
22 5 5 5 5 4 4 5 4 5 5
23 5 5 5 5 4 4 5 4 5 5
24 5 5 5 5 4 4 5 4 5 5
25 5 5 5 5 4 4 5 4 5 5
26 5 5 5 5 4 4 5 4 5 5
27 5 5 5 5 4 4 5 4 5 5
28 5 5 5 5 4 4 5 4 5 5
29 5 5 5 5 4 4 5 4 5 5

30 5 5 5 5 4 4 5 4 5 5
31 5 5 5 5 4 4 5 4 5 5
32 5 5 5 5 4 4 5 4 5 5
33 5 5 5 5 4 4 5 4 5 5
34 5 5 5 5 4 4 5 4 5 5
35 5 5 5 5 4 4 5 4 5 5
36 5 5 5 5 4 4 5 4 5 5
37 5 5 5 5 4 4 5 4 5 5
38 5 5 5 5 4 4 5 4 5 5
39 5 5 5 5 4 4 5 4 5 5
40 5 5 5 5 4 4 5 4 5 5
41 5 5 5 5 4 4 5 4 5 5
42 5 5 5 5 4 4 5 4 5 5
43 5 5 5 5 4 4 5 4 5 5
44 5 5 5 5 4 4 5 4 5 5
45 5 5 5 5 4 4 5 4 5 5
46 5 5 5 5 4 4 5 4 5 5
47 5 5 5 5 4 4 5 4 5 5
48 5 5 5 5 4 4 5 4 5 5
49 5 5 5 5 4 4 5 4 5 5

The table shows that 85% of the individuals rated the synthetic speech as good and understandable; the remaining 15% rated it as perceptible but not clearly produced. The unit selection method thus provides naturalness and understandability, the two important parameters of a TTS system.
Graphical representation: percentage evaluation of the synthesized output speech for the MATLAB application
High quality and understandable: 85%
Perceptible quality speech: 15%

b. Android-based speech-synthesizing calculator
Android is a mobile operating system developed by Google, based
on the Linux kernel and designed primarily for touchscreen mobile
devices such as smartphones and tablets.
Android's user interface is mainly based on direct manipulation,
using touch gestures that loosely correspond to real-world actions,
such as swiping, tapping and pinching, to manipulate on-screen
objects, along with a virtual keyboard for text input. In addition to
touchscreen devices, Google has further developed Android TV for
televisions, Android Auto for cars, and Android Wear for wrist
watches, each with a specialized user interface. Variants of Android
are also used on notebooks, game consoles, digital cameras, and
other electronics.

Android applications in the international scenario
Sr. No   Name of Application   Language
1 Google Text to Speech English
2 Easy Text speech English
3 IVONA Text-to-Speech for iOS English, Welsh, Polish
4 Voice Dream Reader English
5 Text to Speech for iOS English
6 Type and speak English
7 eReader Prestigo:Book Reader 25 Languages
8 Announcify English
9 Select and Speak English
10 SpeakIt English
11 Dictanots English
12 Voice Recognition English
13 Text to Speech Reader English

Android applications in the national scenario
Sr. No   Name of Application   Language
1 aSpeak Telugu, Hindi
2 Sandesh Pathak Telugu, Hindi, Marathi, Tamil and Gujarati
3 Shruti Hindi and Bengali
4 HP labs Hindi
5 Vani Hindi
6 Dhvani Hindi, Malayalam, Kannada, Bengali, Oriya, Punjabi, Telugu, Marathi

Database Creation: MATLAB & Android
The vocabulary consists of 121 words. Each word was recorded as one utterance in a single session, so the database holds 121 utterances in total.
Table: Speech database design for the MTC application
Number   Marathi Pronunciation   Number   Marathi Pronunciation
1 एक 11 अकरा
2 दोन 12 बारा
3 तीन 13 तेरा
4 चार 14 चौदा
5 पाच 15 पंधरा
6 सहा 16 सोळा
7 सात 17 सतरा
8 आठ 18 अठरा
9 नऊ 19 एकोणीस
10 दहा 20 वीस

Number   Marathi Pronunciation   Number   Marathi Pronunciation   Number   Marathi Pronunciation
21 एकवीस 35 पस्तीस 49 एकोणपन्नास
22 बावीस 36 छत्तीस 50 पन्नास
23 तेवीस 37 सदतीस 51 एक्कावन्न
24 चोवीस 38 अडतीस 52 बावन्न
25 पंचवीस 39 एकोणचाळीस 53 त्रेपन्न
26 सव्वीस 40 चाळीस 54 चोपन्न
27 सत्तावीस 41 एक्केचाळीस 55 पंचावन्न
28 अठ्ठावीस 42 बेचाळीस 56 छप्पन्न
29 एकोणतीस 43 त्रेचाळीस 57 सत्तावन्न
30 तीस 44 चव्वेचाळीस 58 अठ्ठावन्न
31 एकतीस 45 पंचेचाळीस 59 एकोणसाठ
32 बत्तीस 46 सेहेचाळीस 60 साठ
33 तेहेतीस 47 सत्तेचाळीस 61 एकस्ठ
34 चौतीस 48 अठ्ठेचाळीस 62 बास्ठ

Number   Marathi Pronunciation   Number   Marathi Pronunciation   Number   Marathi Pronunciation
63 त्रेस्ठ 78 अठ्ठ्याहत्तर 93 त्र्याण्णव
64 चौस्ठ 79 एकोणऐंशी 94 चौऱ्याण्णव
65 पास्ठ 80 ऐंशी 95 पंच्याण्णव
66 सहास्ठ 81 एक्क्याऐंशी 96 शहाण्णव
67 सदस्ठ 82 ब्याऐंशी 97 सत्त्याण्णव
68 अडस्ठ 83 त्र्याऐंशी 98 अठ्ठ्याण्णव
69 एकोणसत्तर 84 चौऱ्याऐंशी 99 नव्व्याण्णव
70 सत्तर 85 पंच्याऐंशी 100 शंभर
71 एक्काहत्तर 86 शहाऐंशी 101 एकशे एक
72 बाहत्तर 87 सत्त्याऐंशी 1000 हजार
73 त्र्याहत्तर 88 अठ्ठ्याऐंशी 10,000 दहा हजार
74 चौर्याहत्तर 89 एकोणनव्वद 1,00,000 लाख
75 पंच्याहत्तर 90 नव्वद 10,00,000 दहा लाख
76 शहात्तर 91 एक्क्याण्णव 1,00,00,000 कोटी
77 सत्याहत्तर 92 ब्याण्णव 100,00,00,000 शंभर कोटी

Table: Operation (action taken) vocabulary for the MTC application
Sr. No   Operation   Pronunciation
1 + बेरीज
2 - वजाबाकी
3 * गुणाकार
4 / भागाकार
5 . दशांश
6 = बरोबर
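With a vocabulary of number words and place-value words like the one tabulated above, a calculator result can be spoken by splitting it into tokens that each correspond to one recording. The sketch below shows one possible decomposition using the Indian place-value system; the function name `to_units`, the token format, and the treatment of "शे" as a separate unit are assumptions for illustration, not the application's actual logic.

```python
def to_units(n):
    """Split an integer 0 < n < 10**9 into tokens for concatenation.

    Integer tokens (1-100) stand for the recorded number words; string
    tokens are place-value words (crore, lakh, thousand, hundred).
    """
    if not 0 < n < 10 ** 9:
        raise ValueError("only 1 .. 999,999,999 is handled in this sketch")
    tokens = []
    for value, word in ((10 ** 7, "कोटी"), (10 ** 5, "लाख"), (10 ** 3, "हजार")):
        count, n = divmod(n, value)
        if count:
            tokens += [count, word]
    hundreds, rest = divmod(n, 100)
    if hundreds == 1 and rest == 0:
        tokens.append(100)  # the vocabulary has a dedicated word for 100
    else:
        if hundreds:
            tokens += [hundreds, "शे"]
        if rest:
            tokens.append(rest)
    return tokens


print(to_units(1837))    # 1 thousand, 8 hundred, 37
print(to_units(126448))  # 1 lakh, 26 thousand, 4 hundred, 48
```

Each token would then be mapped to its recording and the recordings joined in order; decimals could be handled by speaking the fractional digits after the word for the decimal point.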

ACQUISITION SETUP (MATLAB & Android)
The speech data were collected with a Real-tech microphone and CSL, using a single channel.
Table: Technical specification of data recording
Parameter                Value
Sampling rate            16000 Hz
Speakers                 Dependent
Condition of noise       Normal
Accent                   Marathi
Pre-emphasis             1 - 0.97 z^-1
Window type              Hamming, 25 milliseconds
Window step size         20 milliseconds
Distance of microphone   10-12 Meter
Database Statistics
Number of utterances: 121
Utterances: 1
Session: 01
Total size: 121 words

b) Android-based Marathi Speech Talking Calculator
[Figure: Prototype of the Marathi Speech Talking Calculator application]

Testing table for the Android Marathi Speech Talking Calculator
Sr.No NO1 Op1 NO2 Op2 Result Response Time Result in Speech
१ २ + ६ = ८ १ सेकुं द आठ
२ ५ + ९ = १४ १ सेकुं द चौदा
३ ५९ + ८९ = १४८ १.६सेकुं द एकशे अठ्ठेचाळीस
४ ६६ + ८४ = १५० १.६ सेकुं द एकशे पन्नास
५ ८७ + ९३ = १८० १.६ सेकुं द एकशेऐुंशी
६ ९६८ + ७६९ = १८३७ १.७ सेकुं द एकहजार आठशे सदतीस
७ २३६७ + ९५६३ = ११९३० ३ सेकुं द अकराहजार नऊशे तीस
८ ५८६ + ३२१ = ९०७ १.६ सेकुं द नऊशे सात
९ ५८२ + ६३९ = १२२१ १.७ सेकुं द एकहजार दोनशे एकवीस
१० ४९३ + ३५७ = ८५० १.६ सेकुं द आठशे पन्नास
११ १२.१५ + ९५.६९ = १०७.८४ २.३ सेकुं द एकशे सात परुंका चौऱ्याऐुंशी
१२ ७५.९ + ५६.७८ = १३२.६८ ३.२ सेकुं द एकशे बत्तीस परुंका अडस्ठ
१३ १० - ६ = ४ १ सेकुं द चार
१४ १५ - ९ = ४ १ सेकुं द चार
१५ ८९ - ६३ = २६ १ सेकुं द सव्वीस

१६ ७९५ - ५५६ = २३९ १.३ सेकुं द दोनशे एकोणचाळीस
१७ १५६७ - २१५९ = ५९२ १.३ सेकुं द पाचशे ब्याण्णव
१८ ४५९ - ३६७ = ९२ १.३ सेकुं द ब्याण्णव
१९ ९६१५३ - ६५८९ = ८९५६४ ३ सेकुं द एकोणनव्वद हजार पाचशे चौस्ठ
२० ५६४७८९ - २३६९ = ५६२४२० ३.८ सेकुं द पाचिीक बास्ठ हजार चारशे वीस
२१ २३६५ - २५८ = २१०७ १.४ सेकुं द दोन हजार एकशे सात
२२ ८८८ - ६६६ = २२२ १.२ सेकुं द दोनशे बावीस
२३ १५९.३६ - २३.८ = १३५.५६ २ सेकुं द एकशे पस्तीस परुंका छप्पन्न
२४ ९५३.१२६ - ६९३.२५ = २५९.८७६ ३.२ सेकुं द दोनशे एकोणसाठ परुंका आठशे शहात्तर
२५ २३ * ५९ = १३५७ १.४ सेकुं द एकहजार तीनशे सत्तावन्न
२६ १३ * ३६ = ३३८ १.४ सेकुं द तीनशे अडती
२७ ९ * ८ = ७२ १.४ सेकुं द बाहत्तर
२८ २६ * १२ = ३१२ १.४ सेकुं द तीनशे बारा
२९ ६६ * ९ = ५९४ १.४ सेकुं द पाचशे चौऱ्याण्णव
३० ३६ * ११ = १०५६ १.४ सेकुं द एकहजार छप्पन्न
३१ ५३९ * १२ = ६४६८ १.४ सेकुं द सहा हजार चारशे अडस्ठ
३२ २५७८ * ३ = १५४६८ ३.४ सेकुं द पुंधरा हजार चारशे अडस्ठ
३३ ६९७५ * १२ = ८३७०० ३.६ सेकुं द त्र्याऐुंशी हजार सातशे
३४ २२५८ * ५६ = १२६४४८ ३.८ सेकुं द एक िीक सव्वीस हजार चारशे
अठ्ठेचाळीस
३५ ५२३.६९ * १२.३ = ६४४१.३८७ ३.३ सेकुं द सहा हजार चारशे एक्के चाळीस परुंका
तीनशे सत्त्याऐुंशी
३६ ८५९६.३ * ६.३ = ५४१५६.६९ ३ सेकुं द चोपन्न हजार एकशे छप्पन्न परुंका
एकोणसत्तर
३७ ५४ / ४ = १३.५ २ सेकुं द तेरा परुंका पाच
३८ ३३० / ८ = ४१.२५ २ सेकुं द एक्के चाळीसएक्काकॅ िेस परुंका पुंचवीस
३९ ७५० / ६ = १२५ १.४ सेकुं द एकशे पुंचवीस
४० ८५० / ६ = १४१.६६ ३ सेकुं द एकशे एक्के चाळीस परुंका सहास्ठ
४१ ६५० / १० = ६५ १ सेकुं द पास्ठ
४२ ६३ / ५ = १२.६ २ सेकुं द बारा परुंका सहा
४३ ३३० / १५ = २२ १ सेकुं द बावीस
४४ ५९६ / ७ = ८५.१४ २ सेकुं द पुंच्याऐुंशी परुंका चौदा
४५ २३५८९ / १० = २३५८.९ ३.२ सेकुं द दोन हजार तीनशे अठ्ठावन्न परुंका नऊ
४६ ९६३२ / ८ = १२०४ १.६ सेकुं द एक हजार दोनशे चार
४७ ५६३.६ / २.५६ = २२०.१५६ ३.२ सेकुं द दोनशे वीस परुंका एकशे छप्पन्न
४८ ८५६९.२३ / १२.८५ = ६६६.८६६ ३.२ सेकुं द सहाशे सहास्ठ परुंका आठशे सहास्ठ
४९ ५९.२६३ / ९.१२ = ६.४९ २ सेकुं द सहा परुंका एकोणपन्नास
५० १५६३.२५ / २१.२५ = ७३.५३४ २ सेकुं द त्र्याहत्तर परुंका पाचशे चौतीस

Performance Evaluation: MOS on the Android-based Marathi Speech Talking Calculator
MOS is calculated as a subjective quality measure for the speech synthesized with unit selection synthesis. Listeners were instructed to rate the understandability of each result on a scale from 1 to 5 (Excellent: 5, Very good: 4, Good: 3, Satisfactory: 2, Not understandable: 1).
Sentence Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10
1 5 5 5 5 4 4 5 4 5 5
2 5 5 5 5 4 4 5 4 5 5
3 5 5 5 5 4 4 5 4 5 5
4 5 5 5 5 4 4 5 4 5 5
5 5 5 5 5 4 4 5 4 5 5
6 5 5 5 5 4 4 5 4 5 5
7 5 5 5 5 4 4 5 4 5 5
8 5 5 5 5 4 4 5 4 5 5
9 5 5 5 5 4 4 5 4 5 5
10 5 5 5 5 4 4 5 4 5 5
11 5 5 5 5 4 4 5 4 5 5

12 5 5 5 5 4 4 5 4 5 5
13 5 5 5 5 4 4 5 4 5 5
14 5 5 5 5 4 4 5 4 5 5
15 5 5 5 5 4 4 5 4 5 5
16 5 5 5 5 4 4 5 4 5 5
17 5 5 5 5 4 4 5 4 5 5
18 5 5 5 5 4 4 5 4 5 5
19 5 5 5 5 4 4 5 4 5 5
20 5 5 5 5 4 4 5 4 5 5
21 5 5 5 5 4 4 5 4 5 5
22 5 5 5 5 4 4 5 4 5 5
23 5 5 5 5 4 4 5 4 5 5
24 5 5 5 5 4 4 5 4 5 5
25 5 5 5 5 4 4 5 4 5 5
26 5 5 5 5 4 4 5 4 5 5
27 5 5 5 5 4 4 5 4 5 5
28 5 5 5 5 4 4 5 4 5 5
29 5 5 5 5 4 4 5 4 5 5
30 5 5 5 5 4 4 5 4 5 5
31 5 5 5 5 4 4 5 4 5 5
32 5 5 5 5 4 4 5 4 5 5
33 5 5 5 5 4 4 5 4 5 5
34 5 5 5 5 4 4 5 4 5 5
35 5 5 5 5 4 4 5 4 5 5
36 5 5 5 5 4 4 5 4 5 5
37 5 5 5 5 4 4 5 4 5 5
38 5 5 5 5 4 4 5 4 5 5
39 5 5 5 5 4 4 5 4 5 5
40 5 5 5 5 4 4 5 4 5 5
41 5 5 5 5 4 4 5 4 5 5
42 5 5 5 5 4 4 5 4 5 5
43 5 5 5 5 4 4 5 4 5 5
44 5 5 5 5 4 4 5 4 5 5
45 5 5 5 5 4 4 5 4 5 5
46 5 5 5 5 4 4 5 4 5 5
47 5 5 5 5 4 4 5 4 5 5
48 5 5 5 5 4 4 5 4 5 5
49 5 5 5 5 4 4 5 4 5 5
50 5 5 5 5 4 4 5 4 5 5

The graph shows that 95.5% of the individuals rated the synthetic speech as good and understandable; the remaining 4.5% rated it as perceptible but not clearly produced. The unit selection method thus provides naturalness and understandability, the two important parameters of a TTS system.
Graphical representation: percentage evaluation of the synthesized output speech for the Android application
High quality and understandable: 95.5%
Perceptible quality speech: 4.5%

Comparative Analysis
Sr. No   Name of Android Application in Marathi   Utilities
1   Sandesh Pathak   Agriculture based
2   A-speak   Hindi and Telugu text-to-speech
3   Dhvani   Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu, Pashto text-to-speech system
4   Shruti   Marathi text-to-speech
7   Janbharti   Marathi text-to-speech
8   Android-based Marathi Speech Calculator   Marathi talking calculator
9   MATLAB-based Marathi Calculator   Marathi talking calculator

Contribution and Significance
The contributions and significance of this research are:
• Created a Marathi speech calculator in MATLAB.
• Created a Marathi speech database and published the Android application, built with Android Studio, on the Google Play Store.
• The database will be useful for young researchers who want to start working on regional languages.
• Such an assistive application is useful for the general public who communicate in Marathi.

Limitations of the Research
For the MATLAB- and Android-based Marathi calculator:
• The talking calculator speaks results only up to the 10-digit place value.
• It performs only the basic arithmetic operations: addition, subtraction, multiplication, and division.

Conclusion
• The literature review shows that considerable effort has gone into speech synthesis at institutions such as CDAC, TIFR, CMU, IIT Madras, CEERI, ISI Kolkata, and IIIT Hyderabad.
• TTS applications developed for Marathi include Sandesh Pathak and Janbharti.
• More research effort is still needed on naturalness for Marathi.
• The quality of the speech synthesized with Festival was evaluated.
• In this work we have attempted to design a TTS system for a specific domain, namely calculation.

Future Scope
• The application currently supports only the basic operators; we will try to implement the remaining ones.
• We will attempt to go beyond 10 digits.
• We will develop a system that reads online newspapers and books in Marathi.

Acknowledgment
I would like to thank my research advisor, Professor Bharti Gawali, for supporting me during these past four years. Bharti Gawali is someone you will instantly love and never forget once you meet her. She is the funniest research advisor and one of the smartest people I know.
I am also very grateful to MLA for his scientific advice and knowledge and for many insightful discussions and suggestions.
I also thank the faculty members and non-teaching staff of my department for their helpful career advice and suggestions in general.

List of Publications
1) Sangramsing Kayte, Kavita Waghmare, Dr. Bharti Gawali, "Marathi Speech Synthesis: A Review", International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 3, Issue 6, pp. 3708-3711 (Impact Factor 5.83).
2) Sangramsing Kayte, Bharti Gawali, "A Text-To-Speech Synthesis for Marathi Language using Festival and Festvox", International Journal of Computer Applications (0975-8887), Volume 132, No. 3, December 2015.
3) Sangramsing Kayte and Bharti Gawali, "Analysis of Pitch and Duration in Speech Synthesis using PSOLA", Communications on Applied Electronics 4(4):10-18, February 2016. Published by Foundation of Computer Science (FCS), NY, USA.

Chapters in the Book
1) Sangramsing N. Kayte, Monica Mundada, Santosh Gaikwad and Bharti Gawali, "Performance Evaluation of Speech Synthesis Techniques for English Language", in S.C. Satapathy et al. (eds.), Proceedings of the International Congress on Information and Communication Technology, Advances in Intelligent Systems and Computing 439, © Springer Science+Business Media Singapore 2016, DOI 10.1007/978-981-10-0755-2_27.

References
1. Paul Taylor, "Text to Speech Synthesis" (textbook), University of Cambridge, United Kingdom.
2. Sami Lemmetty, "Review of Speech Synthesis Technology", M.Tech., Helsinki University of Technology, Finland, 1999.
3. A. Black, P. Taylor, and R. Caley, "The Festival speech synthesis system", http://festvox.org/festival, 1999.
4. K. Prahallad, N. K. Elluru, V. Keri, S. Rajendran, and A. W. Black, "The IIIT-H Indic speech databases", in Proceedings of INTERSPEECH, Portland, Oregon, USA, 2012.
5. A. Black and K. Lenzo, "Building voices in the Festival speech synthesis system", http://festvox.org/bsv/, 2000.

Websites
1. http://tcts.fpms.ac.be/synthesis/introtts_old.html
2. http://www.festvox.org/
3. http://www.cstr.ed.ac.uk/
4. http://en.wikipedia.org/wiki/Speech_synthesis
5. http://hts.sp.nitech.ac.jp/
6. http://festvox.org/11752/packed/
7. http://audacity.sourceforge.net/
