Text To Speech Synthesis System For Marathi Language Using Concatenation Technique
Presented by: Mr. Sangramsing Nathusing Kayte
Guided by: Dr. Bharti W. Gawali, Professor & Head
Department of Computer Science & Information Technology,
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad (M.S.), India.
Presentation Overview
• Introduction to Speech Processing
• Speech Synthesis
• Techniques of Speech Synthesis
• Survey of Literature
• Objectives of the Research
• Tools Used for Implementation
• Performance Evaluation Methods
• Database Creation
• Comparative Analysis
• Contribution and Significance
• Conclusion
• References
Introduction to Speech Processing
• Speech is the most basic form of communication between human beings.
• Not every human being can read and write, but almost everyone can communicate using speech.
• Speech therefore has a strong influence on day-to-day human life.
[Figure: pictorial view of the human speech production system.]
Speech production involves organs such as the vocal cords, nasal cavity, mouth, teeth, and lips.
Areas of Speech Processing
1. Speech recognition
2. Speech synthesis
3. Speech coding
4. Speech compression
This work focuses on speech synthesis.
Speech Synthesis
Speech synthesis is the artificial production of human speech, that is, a computer-generated simulation of human speech.
A system that converts text into spoken language is also called a Text-to-Speech (TTS) system.
[Figure: Text → Text-to-Speech System → Speech, with the example input "प्राचीन भारतीय अर्थव्यवस्था अतिशय सुंदर होती".]
Architecture of a Text-to-Speech System
[Figure: processing pipeline from input text to output speech. Stages, in order:]
1. Pre-processing
• Document structure detection
• Conversion from Unicode and fonts
2. Text normalization
• Handling numbers, symbols, abbreviations, etc.
3. Linguistic analysis
• Part-of-speech tagging
• Phrase breaks
• Letter-to-sound rules
4. Prosodic prediction
• Duration
• F0 contour
• Energy
5. Waveform generation
Intermediate representations shown between the stages: tagged text, word sequence, phone sequence.
Speech Synthesis Techniques
Several synthesis methods can be used when building a TTS system:
1. Articulatory synthesis
2. Formant synthesis
3. Concatenative speech synthesis
1. Articulatory synthesis: a method of synthesizing speech by controlling the speech articulators (e.g. jaw, tongue, lips).
2. Formant synthesis: each phone is produced by specifying its formants and pitch. A set of rules is also specified to modify pitch and formants, so that the transition from one phone to the next is sufficiently smooth.
3. Concatenative synthesis: based on the concatenation (or stringing together) of segments of pre-recorded speech.
Architecture of concatenative synthesis
[Figure: the input data (text input and speech recordings) feed unit segmentation, which fills a unit database. Unit selection picks units from the database (for example "sh", "i", "a"), and concatenation plus smoothing joins them into the output waveform.]
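The "concatenation + smoothing" step in the figure can be sketched as follows. This is a minimal illustration, assuming that each unit is a list of float samples and that smoothing is a linear crossfade over a few samples at each joint; the function name `crossfade_concat` and the crossfade itself are assumptions for illustration, not the method used in this work.

```python
def crossfade_concat(units, overlap):
    """Join waveform units, linearly crossfading `overlap` samples at each joint.

    `units` is a list of sample lists (hypothetical pre-recorded segments).
    """
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never crossfade over more samples than either side has.
        n = min(overlap, len(out), len(unit))
        base = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[base + i] = (1 - w) * out[base + i] + w * unit[i]
        out.extend(unit[n:])
    return out


# Two toy "units": a constant 1.0 segment followed by a constant 0.0 segment.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
print(len(joined), joined)
```

With `overlap=0` the function reduces to plain concatenation, which makes audible discontinuities at the joints more likely; the overlap length is a tuning choice.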
There are three main subtypes of concatenative synthesis:
1) Domain-specific synthesis: concatenates pre-recorded words and phrases to create complete utterances.
2) Diphone synthesis: uses a minimal speech database containing all the diphones (sound-to-sound transitions) occurring in a given language. Only one example of each diphone is contained in the speech database.
3) Unit selection synthesis: uses large speech databases (more than one hour of recorded speech). During database creation, each recorded utterance is segmented into some or all of the following linguistic units: phonemes, words, phrases, and sentences.
Architecture of unit selection synthesis
[Figure: Text → preprocessing, text normalization, and linguistic analysis → phones → unit selection from a large speech database of n diphone units → speech.]
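To make the idea of diphones (sound-to-sound transitions) concrete, the following sketch turns a phone sequence into the list of diphone names that would have to be looked up in a database. The silence label `pau` and the `a-b` naming scheme are assumptions for illustration only.

```python
def phones_to_diphones(phones, silence="pau"):
    """Return the diphone names covering a phone sequence.

    The utterance is padded with a silence phone on both sides so that the
    first and last phones also get a transition.
    """
    padded = [silence] + list(phones) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Phones taken from the concatenative synthesis figure: "sh", "i", "a".
diphones = phones_to_diphones(["sh", "i", "a"])
print(diphones)
```

A sequence of N phones always yields N + 1 diphones under this padding scheme. A diphone synthesizer stores exactly one recording per name, whereas unit selection keeps many candidates per unit and chooses among them.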
Survey of Literature
Speech synthesis work has been done, or is in progress, for many foreign languages, such as English, Japanese, European Portuguese, Arabic, Polish, Korean, German, Turkish, Mongolian, and Greek.
India has a total of 22 official languages, of which Hindi, Malayalam, Kannada, Bengali, Oriya, Punjabi, Gujarati, Telugu, and Marathi have received attention.
Different systems have been developed for these languages, such as Dhvani, Shruti, HP Lab, and Vani.
Various institutions are working on building a Marathi speech synthesizer, such as IIIT-Hyderabad, CDAC-Mumbai, CDAC-Pune, and IIT Chennai.
They have built various applications such as e-speak, a-speak, i-speak, and Sandesh Pathak, but mostly for Hindi, Telugu, and other languages.
Comparatively little work has been done in Marathi.
International speech synthesis scenario
Language | Technique
Arabic (Saudi Arabia), ar-SA | iSpeech SDK
Chinese (China), zh-CN | Festival framework
Chinese (Hong Kong SAR China), zh-HK | Festival framework
Chinese (Taiwan), zh-TW | Festival framework
English (Australia), en-AU | Festival framework
English (Ireland), en-IE | Festival framework
English (South Africa), en-ZA | Festival framework
English (United Kingdom), en-GB | Festival framework
English (United States), en-US | Festival framework
National speech synthesis scenario
Language | Technique
Punjabi | Festival framework
Gujarati | Concatenative synthesis, unit selection, Festival
Tamil | Concatenative synthesis, unit selection, HTK
Bengali | Concatenative synthesis, unit selection
Oriya | Concatenative synthesis, unit selection
Hindi | Concatenative synthesis, unit selection
Kannada | Hidden Markov Model
Marathi | Continuous Density Hidden Markov Model approach for a Marathi speech synthesis system based on Marathi affricates
Telugu | Festival framework
International institutes where speech synthesis work is carried out
Sr. No | Institute | Language
1 | Digital Future of technology, New America | U.S. English, U.K. English, Spanish, French, German
2 | Machine Intelligence Laboratory, Cambridge University Department of Engineering | U.S. English & U.K. English
3 | IBM | Arabic, Chinese, French, German, U.S. English & U.K. English
National institutes where Marathi speech synthesis work is carried out
Sr. No | Institute | Language
1 | TIFR Mumbai | Hindi, Bengali, Marathi, Indian English
2 | IIIT Hyderabad | Hindi, Telugu, Marathi, other languages
3 | IIT Mumbai | Hindi, Marathi
4 | CDAC Pune | Hindi, Marathi, Indian English, other languages
5 | CDAC Noida | Hindi, Punjabi, Marathi, other languages
6 | CIIL Mysore | Most of the Indian languages, Marathi
Objectives of the Research
• To build a unit-based speech synthesis system (the major objective of the study).
• To create a database for speech synthesis.
• To develop MATLAB and Android applications based on the Marathi speech synthesis system.
Tools Used for Implementation
• Festival Framework
• MATLAB
• Android
Festival Framework
The Festival Speech Synthesis System is a general multi-lingual speech synthesis system.
It was originally developed by Alan W. Black at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh.
Festival is designed to support multiple languages, and comes with support for English (British and American pronunciation), Welsh, and Spanish.
Voice packages exist for several other languages, such as
Spanish, Finnish, Hindi, Italian, Marathi, Polish, Russian and
Festival and Speech Tools packages
• festival-2.4
• speech_tools-2.4
• festvox-2.1
• festlex_CMU
• festlex_OALD
• festlex_POSLEX
• festvox_kalloc
Commands
• First, set the environment variables FESTVOXDIR and ESTDIR to their respective directories, for example:
    export FESTVOXDIR=/home/drbhati/unit/festvox
    export ESTDIR=/home/drbhati/unit/speech_tools
• The directory name for a voice has three parts: an institution name, a language, and a speaker.
    mkdir bamu_mar_sing
• Make it the current directory.
    cd bamu_mar_sing
• Set up the voice directory and prompts for the unit selection process.
    $FESTVOXDIR/src/unitsel/setup_clunits bamu mar sing
• Convert the recorded wav files to a 16000 Hz sampling frequency and mono sound.
    ./bin/get_wavs recording/*.wav
• Next, generate waveforms to act as prompts, or as timing cues even if the prompts are not actually played.
    festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/txt.data")'
• After recording, the recorded files should be in wav/.
    ./bin/prompt_them etc/txt.data
• Label (lab) files are created for each phone with the following command.
    ./bin/make_labs prompt-wav/*.wav
• Build the utterance structures.
    festival -b festvox/build_clunits.scm '(build_utts "etc/txt.data")'
• The steps from here on are concerned with signal analysis, specifically pitch marking and cepstral parameter extraction. There are a number of methods for pitch mark extraction, and a number of parameters within these files that may need tuning.
    ./bin/make_pm_wave wav/*.wav
• The next stage finds the Mel-frequency cepstral coefficients.
    ./bin/make_mcep wav/*.wav
• Building the cluster unit selection synthesizer consists of a number of stages, all driven by the controlling Festival script, whose parameters are described above.
    festival -b festvox/build_clunits.scm '(build_clunits "etc/txt.data")'
• The resulting synthesized voice can then be loaded and tested in Festival.
    festival festvox/bamu_mar_bharti_clunits.scm
    festival> (voice_bamu_mar_bharti_clunits)
    festival> (SayText "kaarand~a aapalayaakad:ei tii padadhata naahii.")
    festival> (SayText "कारण आपल्याकडे ती पद्धत नाही.")
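Because the build steps above expect 16000 Hz mono recordings, it can help to verify the wav files before running them. The sketch below uses only Python's standard `wave` module; the helper names `check_recording` and `write_silence` and the file names are hypothetical and are not part of Festival or Festvox.

```python
import os
import tempfile
import wave


def check_recording(path, rate=16000, channels=1):
    """Return True if the wav file has the expected sampling rate and channel count."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == rate and wav.getnchannels() == channels


def write_silence(path, rate, channels, n_frames=160):
    """Write a short silent 16-bit wav file (used here only to demonstrate the check)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * n_frames)


# Demonstration on two throwaway files in a temporary directory.
demo_dir = tempfile.mkdtemp()
good = os.path.join(demo_dir, "a001.wav")
bad = os.path.join(demo_dir, "a002.wav")
write_silence(good, rate=16000, channels=1)
write_silence(bad, rate=44100, channels=2)
print(check_recording(good), check_recording(bad))
```

Files that fail the check would need to be resampled or downmixed first, which is the job the `get_wavs` step performs in the workflow above.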
Database Used for Festival
The speech data were collected in a recording studio using a single channel.
Table: Technical specification of data recording
Parameter | Value
Sampling rate | 16000 Hz
Speaker | Dependent
Condition of noise | Normal
Accent | Marathi
Pre-emphasis | 1 - 0.97z^-1
Window type | Hamming, 25 milliseconds
Window step size | 20 milliseconds
Distance of microphone | 10-12 Meter
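The pre-emphasis filter and windowing parameters in the table can be illustrated with a short sketch. It assumes the signal is a list of float samples at 16000 Hz; the function names are hypothetical, and how the recording tools actually apply these settings is not stated in the slides.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply the filter 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def hamming(length):
    """Return a Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(signal, rate=16000, win_ms=25, step_ms=20):
    """Cut the signal into Hamming-windowed frames (25 ms window, 20 ms step)."""
    win = rate * win_ms // 1000    # 400 samples at 16 kHz
    step = rate * step_ms // 1000  # 320 samples at 16 kHz
    window = hamming(win)
    return [[s * w for s, w in zip(signal[start:start + win], window)]
            for start in range(0, len(signal) - win + 1, step)]


one_second = [1.0] * 16000
frames = windowed_frames(pre_emphasis(one_second))
print(len(frames), len(frames[0]))
```

With a 25 ms window and a 20 ms step, consecutive frames overlap by only 5 ms, so one second of audio yields 49 full frames.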
Database Statistics
Number of Utterances: 1000
Utterances: 1
Session: 01
Total size: 1000 sentences
Ubuntu Festival
Table: Sentences and labels used for generating synthesized speech
Sr. No | Original Sentence | Original Speech File | Synthesized Speech File
1 | कारण आपल्याकडे ती पद्धत नाही | A001 | a001
2 | प्राचीन भारतीय अर्थव्यवस्था अतिशय सुंदर होती | A002 | a002
3 | कर्नाटकात केवळ कन्नड अधिकृत आहे | A003 | a003
4 | विकिपीडिया हा एक ज्ञानकोश आहे | A004 | a004
5 | कारण तू माझी आई आहेस | A005 | a005
6 | इंग्रजी भाषा आहे रोमन लिपी आहे | A006 | a006
7 | येथे `धन म्हणजे शुद्ध लक्ष्मी | A007 | a007
8 | संदर्भ मात्र जर्मनी येथील | A008 | a008
9 | शेवटचे र हे अक्षर मात्र कायम | A009 | a009
10 | गिधाडे ही कोणी शिकार करत नाही | A0010 | a0010
Ten sentences were used to compare the quality of the original and the synthesized speech.
Performance Evaluation Methods
There are many performance evaluation methods, but the one most widely used to judge the naturalness of synthesized speech is the Mean Opinion Score (MOS).
MOS is a test that has been used for decades in telephony networks to obtain the human user's view of the quality of the network. The MOS is expressed as a single number in the range 1 to 5, where 1 is the lowest perceived audio quality and 5 is the highest.
Performance Evaluation
MOS is calculated as a subjective quality measure for speech synthesized with the unit selection approach. Listeners were instructed to score each sentence for understandability on a scale from 1 to 5 (Excellent = 5, Very good = 4, Good = 3, Satisfactory = 2, Not understandable = 1).
Table 1. Scores given by each subject to each sentence synthesized with unit selection
Sentence Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10
1 5 5 5 5 4 4 5 4 4 5
2 5 5 4 5 5 4 5 4 4 5
3 4 4 5 4 3 3 4 2 5 4
4 5 4 4 5 4 4 5 5 5 5
5 5 5 5 5 4 4 5 3 3 5
6 5 4 5 5 5 4 5 4 4 5
7 4 5 4 4 4 4 4 4 4 4
8 4 4 5 4 4 5 4 5 5 4
9 5 3 5 5 3 4 5 3 5 5
10 5 5 4 5 4 4 5 4 4 5
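The MOS itself is simply the arithmetic mean of the listener scores. As a sketch, the following computes the per-sentence and overall MOS from the ten rows of Table 1; the variable names are illustrative, and the scores are copied from the table above.

```python
# Rows are sentences 1-10, columns are subjects Sub1-Sub10 (from Table 1).
scores = [
    [5, 5, 5, 5, 4, 4, 5, 4, 4, 5],
    [5, 5, 4, 5, 5, 4, 5, 4, 4, 5],
    [4, 4, 5, 4, 3, 3, 4, 2, 5, 4],
    [5, 4, 4, 5, 4, 4, 5, 5, 5, 5],
    [5, 5, 5, 5, 4, 4, 5, 3, 3, 5],
    [5, 4, 5, 5, 5, 4, 5, 4, 4, 5],
    [4, 5, 4, 4, 4, 4, 4, 4, 4, 4],
    [4, 4, 5, 4, 4, 5, 4, 5, 5, 4],
    [5, 3, 5, 5, 3, 4, 5, 3, 5, 5],
    [5, 5, 4, 5, 4, 4, 5, 4, 4, 5],
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


per_sentence_mos = [mean(row) for row in scores]
overall_mos = mean([score for row in scores for score in row])

for number, mos in enumerate(per_sentence_mos, start=1):
    print(f"sentence {number}: MOS = {mos:.2f}")
print(f"overall MOS = {overall_mos:.2f}")
```

Reporting the per-sentence values alongside the overall mean makes it easier to spot individual sentences that listeners found harder to understand.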
Sr. No Original Speech File Synthesized File P.S.N.R M.S.E
1 A001 a001 3.30 7.94
2 A002 a002 6.72 4.57
3 A003 a003 3.21 1.02
4 A004 a004 4.20 3.70
5 A005 a005 2.57 7.61
6 A006 a006 1.26 5.32
7 A007 a007 7.56 8.06
8 A008 a008 1.29 7.20
9 A009 a009 3.24 9.25
10 A0010 a0010 4.08 7.01
Average 3.74 6.17
Quality 96.26 93.83
Table: Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE) quality measures
PSNR and MSE were used as objective quality measures for speech synthesized with the unit selection approach.
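For reference, the standard definitions of MSE and PSNR between an original and a synthesized signal can be sketched as below. This assumes both signals are lists of float samples normalized to a peak of 1.0 and compared over their common length; the slides do not state the normalization, alignment, or scaling behind the values in the table, so this is illustrative rather than a reproduction of those numbers.

```python
import math


def mse(reference, test):
    """Mean square error over the common length of two sample sequences."""
    n = min(len(reference), len(test))
    if n == 0:
        raise ValueError("both signals must contain at least one sample")
    return sum((reference[i] - test[i]) ** 2 for i in range(n)) / n


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    error = mse(reference, test)
    if error == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / error)


original = [0.0, 0.5, -0.5, 0.25]
synthesized = [0.1, 0.4, -0.4, 0.35]
print(mse(original, synthesized), psnr(original, synthesized))
```

A lower MSE and a higher PSNR both indicate that the synthesized waveform is closer to the original recording.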
Synthesis System Design
The speech synthesis system was developed on two platforms:
a) MATLAB-based Marathi speech synthesis system
b) Android-based Marathi speech synthesis system
a. MATLAB-based speech-synthesized calculator
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language.
A GUI-based calculator was developed, and the concatenative speech synthesis technique was applied to speak its results.
[Figure: Prototype of the Marathi Speech Talking Calculator application.]
Testing table for the MATLAB Marathi Speech Talking Calculator
Sr.No NO1 Op1 NO2 Op2 Result Response Time Result in Speech
१ २ + ६ = ८ १ सेकुं द आठ
२ ५ + ९ = १४ १ सेकुं द चौदा
३ ५९ + ८९ = १४८ १.६सेकुं द एकशे अठ्ठेचाळीस
४ ६६ + ८४ = १५० १.६ सेकुं द एकशे पन्नास
५ ८७ + ९३ = १८० १.६ सेकुं द एकशेऐुंशी
६ ९६८ + ७६९ = १८३७ १.७ सेकुं द एकहजार आठशे सदतीस
७ २३६७ + ९५६३ = ११९३० २.६सेकुं द अकराहजार नऊशे तीस
८ ५८६ + ३२१ = ९०७ १.६ सेकुं द नऊशे सात
९ ५८२ + ६३९ = १२२१ १.७ सेकुं द एकहजार दोनशे एकवीस
१० ४९३ + ३५७ = ८५० १.६ सेकुं द आठशे पन्नास
११ १२.१५ + ९५.६९ = १०७.८४ २.६ सेकुं द एकशे सात परुंका चौऱ्याऐुंशी
१२ ७५.९ + ५६.७८ = १३२.६८ २.६ सेकुं द एकशे बत्तीस परुंका अडस्ठ
१३ १० - ६ = ४ १ सेकुं द चार
१४ १५ - ९ = ४ १ सेकुं द चार
१५ ८९ - ६३ = २६ १ सेकुं द सव्वीस
१६ ७९५ - ५५६ = २३९ १.३ सेकुं द दोनशे एकोणचाळीस
१७ १५६७ - २१५९ = ५९२ १.३ सेकुं द पाचशे ब्याण्णव
१८ ४५९ - ३६७ = ९२ १.३ सेकुं द ब्याण्णव
१९ ९६१५३ - ६५८९ = ८९५६४ २.६ सेकुं द एकोणनव्वद हजार पाचशे चौस्ठ
२० ५६४७८९ - २३६९ = ५६२४२० २.८ सेकुं द पाचिीक बास्ठ हजार चारशे वीस
२१ २३६५ - २५८ = २१०७ १.४ सेकुं द दोन हजार एकशे सात
२२ ८८८ - ६६६ = २२२ १.२ सेकुं द दोनशे बावीस
२३ १५९.३६ - २३.८ = १३५.५६ २.८ सेकुं द एकशे पस्तीस परुंका छप्पन्न
२४ ९५३.१२६ - ६९३.२५ = २५९.८७६ २.८ सेकुं द दोनशे एकोणसाठ परुंका आठशे शहात्तर
२५ २३ * ५९ = १३५७ २ सेकुं द एकहजार तीनशे सत्तावन्न
२६ १३ * ३६ = ३३८ १.४ सेकुं द तीनशे अडती
२७ ९ * ८ = ७२ १.४ सेकुं द बाहत्तर
२८ २६ * १२ = ३१२ १.४ सेकुं द तीनशे बारा
२९ ६६ * ९ = ५९४ १.४ सेकुं द पाचशे चौऱ्याण्णव
३० ३६ * ११ = १०५६ २ सेकुं द एकहजार छप्पन्न
३१ ५३९ * १२ = ६४६८ १.४ सेकुं द सहा हजार चारशे अडस्ठ
३२ २५७८ * ३ = १५४६८ २.८ सेकुं द पुंधरा हजार चारशे अडस्ठ
३३ ६९७५ * १२ = ८३७०० २.४ सेकुं द त्र्याऐुंशी हजार सातशे
३४ २२५८ * ५६ = १२६४४८ २.८ सेकुं द एक िीक सव्वीस हजार चारशे अठ्ठेचाळीस
३५ ५२३.६९ * १२.३ = ६४४१.३८७ ३. २ सेकुं द सहा हजार चारशे एक्के चाळीस परुंका तीनशे
सत्त्याऐुंशी
३६ ८५९६.३ * ६.३ = ५४१५६.६९ ४ सेकुं द चोपन्न हजार एकशे छप्पन्न परुंका एकोणसत्तर
३७ ५४ / ४ = १३.५ २ सेकुं द तेरा परुंका पाच
३८ ३३० / ८ = ४१.२५ २ सेकुं द एक्के चाळीसएक्काकॅ िेस परुंका पुंचवीस
३९ ७५० / ६ = १२५ १.४ सेकुं द एकशे पुंचवीस
४० ८५० / ६ = १४१.६६ ३ सेकुं द एकशे एक्के चाळीस परुंका सहास्ठ
४१ ६५० / १० = ६५ १ सेकुं द पास्ठ
४२ ६३ / ५ = १२.६ २ सेकुं द बारा परुंका सहा
४३ ३३० / १५ = २२ १ सेकुं द बावीस
४४ ५९६ / ७ = ८५.१४ २ सेकुं द पुंच्याऐुंशी परुंका चौदा
४५ २३५८९ / १० = २३५८.९ २.४ सेकुं द दोन हजार तीनशे अठ्ठावन्न परुंका नऊ
४६ ९६३२ / ८ = १२०४ १.६ सेकुं द एक हजार दोनशे चार
४७ ५६३.६ / २.५६ = २२०.१५६ ३.४ सेकुं द दोनशे वीस परुंका एकशे छप्पन्न
४८ ८५६९.२३ / १२.८५ = ६६६.८६६ ३.२ सेकुं द सहाशे सहास्ठ परुंका आठशे सहास्ठ
४९ ५९.२६३ / ९.१२ = ६.४९ २ सेकुं द सहा परुंका एकोणपन्नास
५० १५६३.२५ / २१.२५ = ७३.५३४ २ सेकुं द त्र्याहत्तर परुंका पाचशे चौतीस
Performance Evaluation: MOS on the MATLAB-based Marathi Speech Talking Calculator
MOS is calculated as a subjective quality measure for the speech synthesized using unit selection synthesis. Listeners were instructed to score each sentence for understandability on a scale from 1 to 5 (Excellent = 5, Very good = 4, Good = 3, Satisfactory = 2, Not understandable = 1).
Sentence Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10
1 5 5 5 5 4 4 5 4 5 5
2 5 5 5 5 5 4 5 4 5 5
3 5 5 5 5 5 4 5 4 5 5
4 5 5 5 5 5 5 5 4 5 5
5 5 5 5 5 5 5 5 5 5 5
6 5 5 5 5 5 5 5 5 5 5
7 5 5 5 5 4 5 5 5 5 5
8 5 5 5 5 4 4 5 5 5 5
9 5 5 5 5 4 4 5 5 5 5
10 5 5 5 5 4 4 5 4 5 5
11 5 5 5 5 4 4 5 4 5 5
12 5 5 5 5 4 4 5 4 5 5
13 5 5 5 5 4 4 5 4 5 5
14 5 5 5 5 4 4 5 4 5 5
15 5 5 5 5 4 4 5 4 5 5
16 5 5 5 5 4 4 5 4 5 5
17 5 5 5 5 4 4 5 4 5 5
18 5 5 5 5 4 4 5 4 5 5
19 5 5 5 5 4 4 5 4 5 5
20 5 5 5 5 4 4 5 4 5 5
21 5 5 5 5 4 4 5 4 5 5
22 5 5 5 5 4 4 5 4 5 5
23 5 5 5 5 4 4 5 4 5 5
24 5 5 5 5 4 4 5 4 5 5
25 5 5 5 5 4 4 5 4 5 5
26 5 5 5 5 4 4 5 4 5 5
27 5 5 5 5 4 4 5 4 5 5
28 5 5 5 5 4 4 5 4 5 5
29 5 5 5 5 4 4 5 4 5 5
30 5 5 5 5 4 4 5 4 5 5
31 5 5 5 5 4 4 5 4 5 5
32 5 5 5 5 4 4 5 4 5 5
33 5 5 5 5 4 4 5 4 5 5
34 5 5 5 5 4 4 5 4 5 5
35 5 5 5 5 4 4 5 4 5 5
36 5 5 5 5 4 4 5 4 5 5
37 5 5 5 5 4 4 5 4 5 5
38 5 5 5 5 4 4 5 4 5 5
39 5 5 5 5 4 4 5 4 5 5
40 5 5 5 5 4 4 5 4 5 5
41 5 5 5 5 4 4 5 4 5 5
42 5 5 5 5 4 4 5 4 5 5
43 5 5 5 5 4 4 5 4 5 5
44 5 5 5 5 4 4 5 4 5 5
45 5 5 5 5 4 4 5 4 5 5
46 5 5 5 5 4 4 5 4 5 5
47 5 5 5 5 4 4 5 4 5 5
48 5 5 5 5 4 4 5 4 5 5
49 5 5 5 5 4 4 5 4 5 5
The table shows that 85% of the individuals rated the quality of the synthetic speech as good and understandable. Only 15% of the speech was rated perceptible but not clearly produced. Thus the unit selection method provides naturalness and understandability, the two important parameters of a TTS system.
[Figure: Percentage evaluation of synthesized output speech for the MATLAB application. High quality and understandable: 85%. Perceptible-quality speech: 15%.]
b. Android-based speech synthesis
Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets.
Android's user interface is mainly based on direct manipulation, using touch gestures that loosely correspond to real-world actions, such as swiping, tapping, and pinching, to manipulate on-screen objects, along with a virtual keyboard for text input. In addition to touchscreen devices, Google has further developed Android TV for televisions, Android Auto for cars, and Android Wear for wrist watches, each with a specialized user interface. Variants of Android are also used on notebooks, game consoles, digital cameras, and other electronics.
Android applications: international scenario
Sr. No | Name of Application | Language
1 | Google Text to Speech | English
2 | Easy Text speech | English
3 | IVONA Text-to-Speech for iOS | English, Welsh, Polish
4 | Voice Dream Reader | English
5 | Text to Speech for iOS | English
6 | Type and speak | English
7 | eReader Prestigo: Book Reader | 25 languages
8 | Announcify | English
9 | Select and Speak | English
10 | SpeakIt | English
11 | Dictanots | English
12 | Voice Recognition | English
13 | Text to Speech Reader | English
Android applications: national scenario
Sr. No | Name of Application | Language
1 | aSpeak | Telugu, Hindi
2 | Sandesh Pathak | Telugu, Hindi, Marathi, Tamil, and Gujarati
3 | Shruti | Hindi and Bengali
4 | HP Labs | Hindi
5 | Vani | Hindi
6 | Dhvani | Hindi, Malayalam, Kannada, Bengali, Oriya, Punjabi, Telugu, Marathi
Database Creation: MATLAB & Android
The database contains 121 words, each recorded as a single utterance in one session, giving an overall vocabulary size of 121.
Table: Speech database design for the MTC (Marathi Talking Calculator) application
Number Marathi Pronunciation Number Marathi Pronunciation
1 एक 11 अकरा
2 दोन 12 बारा
3 तीन 13 तेरा
4 चार 14 चौदा
5 पाच 15 पंधरा
6 सहा 16 सोळा
7 सात 17 सतरा
8 आठ 18 अठरा
9 नऊ 19 एकोणीस
10 दहा 20 वीस
Number Marathi Pronunciation Number Marathi Pronunciation Number Marathi Pronunciation
21 एकवीस 35 पस्तीस 49 एकोणपन्नास
22 बावीस 36 छत्तीस 50 पन्नास
23 तेवीस 37 सदतीस 51 एक्कावन्न
24 चोवीस 38 अडतीस 52 बावन्न
25 पुंचवीस 39 एकोणचाळीस 53 त्रेपन्न
26 सव्वीस 40 चाळीस 54 चोपन्न
27 सत्तावीस 41 एक्के चाळीस 55 पुंचावन्न
28 अठ्ठावीस 42 बेचाळीस 56 छप्पन्न
29 एकोणतीस 43 त्रेचाळीस 57 सत्तावन्न
30 तीस 44 चव्वेचाळीस 58 अठ्ठावन्न
31 एकतीस 45 पुंचेचाळीस 59 एकोणसाठ
32 बत्तीस 46 सेहेचाळीस 60 साठ
33 तेहेतीस 47 सत्तेचाळीस 61 एकस्ठ
34 चौतीस 48 अठ्ठेचाळीस 62 बास्ठ
Number Marathi Pronunciation Number Marathi Pronunciation Number Marathi Pronunciation
63 त्रेस्ठ 78 अठ्ठ्याहत्तर 93 त्र्याण्णव
64 चौस्ठ 79 एकोण ऐुंशी 94 चौऱ्याण्णव
65 पास्ठ 80 ऐुंशी 95 पुंच्याण्णव
66 सहास्ठ 81 एक्क्याऐुंशी 96 शहाण्णव
67 सदस्ठ 82 ब्याऐुंशी 97 सत्त्याण्णव
68 अडस्ठ 83 त्र्याऐुंशी 98 अठ्ठ्याण्णव
69 एकोणसत्तर 84 चौऱ्याऐुंशी 99 नव्व्याण्णव
70 सत्तर 85 पुंच्याऐुंशी 100 शुंभर
71 एक्काहत्तर 86 शहाऐुंशी १०१ एकशे एक
72 बाहत्तर 87 सत्त्याऐुंशी १००० हजार
73 त्र्याहत्तर 88 अठ्ठ्याऐुंशी १०,००० दहा हजार
74 चौर्याहत्तर 89 एकोणनव्वद १,००,००० िाख
75 पुंच्याहत्तर 90 नव्वद १०,००,००० दहा िाख
76 शहात्तर 91 एक्क्याण्णव १,००,००,००० कोटी
77 सत्याहत्तर 92 ब्याण्णव १००,००,००,००० शुंभर कोटी
Table: Operation (action taken) vocabulary for the MTC application
Sr. No Operation Pronunciation
1 + बेरीज
2 - वजाबाकी
3 * गुणाकार
4 / भागाकार
5 . दशांश
6 = बरोबर
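With a vocabulary like the one above (recorded words for 1 to 100 plus place-value words), a calculator result can be spoken by splitting the number into a sequence of recorded units and concatenating their waveforms. The sketch below shows one way to do the splitting under the Indian place-value system. The romanized keys `she`, `hajar`, `lakh`, and `koti` and the function name are hypothetical stand-ins for the recorded Marathi words; they are not the file names used in this work. Zero, decimals, and negative results are deliberately left out.

```python
# Place values above one hundred in the Indian numbering system.
PLACES = [(10_000_000, "koti"), (100_000, "lakh"), (1_000, "hajar")]


def number_to_units(n):
    """Split a positive integer into keys of pre-recorded vocabulary units.

    Keys "1".."100" stand for the recorded number words; "she" is the
    hundreds suffix, as in "8 she" for eight hundred.
    """
    units = []
    for value, name in PLACES:
        count, n = divmod(n, value)
        if count:
            units += number_to_units(count) + [name]
    hundreds, rest = divmod(n, 100)
    if hundreds == 1 and rest == 0:
        units.append("100")  # exactly one hundred has its own recorded word
    elif hundreds:
        units += [str(hundreds), "she"]
    if rest:
        units.append(str(rest))
    return units


print(number_to_units(1837))    # units for "one thousand eight hundred thirty-seven"
print(number_to_units(126448))  # units for "one lakh twenty-six thousand ..."
```

Because Marathi number words from 1 to 99 are irregular, each is kept as a single recorded unit rather than being composed from tens and ones, which is consistent with the vocabulary listing every number up to 100 separately.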
Acquisition Setup: MATLAB & Android
The speech data were collected with a Real-tech microphone and CSL, using a single channel.
Table: Technical specification of data recording
Parameter | Value
Sampling rate | 16000 Hz
Speaker | Dependent
Condition of noise | Normal
Accent | Marathi
Pre-emphasis | 1 - 0.97z^-1
Window type | Hamming, 25 milliseconds
Window step size | 20 milliseconds
Distance of microphone | 10-12 Meter
Database Statistics
Number of Utterances: 121
Utterances: 1
Session: 01
Total size: 121 words
b) Android-based Marathi Speech Talking Calculator
[Figure: Prototype of the Marathi Speech Talking Calculator application.]
Testing table for the Android Marathi Speech Talking Calculator
Sr.No NO1 Op1 NO2 Op2 Result Response Time Result in Speech
१ २ + ६ = ८ १ सेकुं द आठ
२ ५ + ९ = १४ १ सेकुं द चौदा
३ ५९ + ८९ = १४८ १.६सेकुं द एकशे अठ्ठेचाळीस
४ ६६ + ८४ = १५० १.६ सेकुं द एकशे पन्नास
५ ८७ + ९३ = १८० १.६ सेकुं द एकशेऐुंशी
६ ९६८ + ७६९ = १८३७ १.७ सेकुं द एकहजार आठशे सदतीस
७ २३६७ + ९५६३ = ११९३० ३ सेकुं द अकराहजार नऊशे तीस
८ ५८६ + ३२१ = ९०७ १.६ सेकुं द नऊशे सात
९ ५८२ + ६३९ = १२२१ १.७ सेकुं द एकहजार दोनशे एकवीस
१० ४९३ + ३५७ = ८५० १.६ सेकुं द आठशे पन्नास
११ १२.१५ + ९५.६९ = १०७.८४ २.३ सेकुं द एकशे सात परुंका चौऱ्याऐुंशी
१२ ७५.९ + ५६.७८ = १३२.६८ ३.२ सेकुं द एकशे बत्तीस परुंका अडस्ठ
१३ १० - ६ = ४ १ सेकुं द चार
१४ १५ - ९ = ४ १ सेकुं द चार
१५ ८९ - ६३ = २६ १ सेकुं द सव्वीस
१६ ७९५ - ५५६ = २३९ १.३ सेकुं द दोनशे एकोणचाळीस
१७ १५६७ - २१५९ = ५९२ १.३ सेकुं द पाचशे ब्याण्णव
१८ ४५९ - ३६७ = ९२ १.३ सेकुं द ब्याण्णव
१९ ९६१५३ - ६५८९ = ८९५६४ ३ सेकुं द एकोणनव्वद हजार पाचशे चौस्ठ
२० ५६४७८९ - २३६९ = ५६२४२० ३.८ सेकुं द पाचिीक बास्ठ हजार चारशे वीस
२१ २३६५ - २५८ = २१०७ १.४ सेकुं द दोन हजार एकशे सात
२२ ८८८ - ६६६ = २२२ १.२ सेकुं द दोनशे बावीस
२३ १५९.३६ - २३.८ = १३५.५६ २ सेकुं द एकशे पस्तीस परुंका छप्पन्न
२४ ९५३.१२६ - ६९३.२५ = २५९.८७६ ३.२ सेकुं द दोनशे एकोणसाठ परुंका आठशे शहात्तर
२५ २३ * ५९ = १३५७ १.४ सेकुं द एकहजार तीनशे सत्तावन्न
२६ १३ * ३६ = ३३८ १.४ सेकुं द तीनशे अडती
२७ ९ * ८ = ७२ १.४ सेकुं द बाहत्तर
२८ २६ * १२ = ३१२ १.४ सेकुं द तीनशे बारा
२९ ६६ * ९ = ५९४ १.४ सेकुं द पाचशे चौऱ्याण्णव
३० ३६ * ११ = १०५६ १.४ सेकुं द एकहजार छप्पन्न
३१ ५३९ * १२ = ६४६८ १.४ सेकुं द सहा हजार चारशे अडस्ठ
३२ २५७८ * ३ = १५४६८ ३.४ सेकुं द पुंधरा हजार चारशे अडस्ठ
३३ ६९७५ * १२ = ८३७०० ३.६ सेकुं द त्र्याऐुंशी हजार सातशे
३४ २२५८ * ५६ = १२६४४८ ३.८ सेकुं द एक िीक सव्वीस हजार चारशे
अठ्ठेचाळीस
३५ ५२३.६९ * १२.३ = ६४४१.३८७ ३.३ सेकुं द सहा हजार चारशे एक्के चाळीस परुंका
तीनशे सत्त्याऐुंशी
३६ ८५९६.३ * ६.३ = ५४१५६.६९ ३ सेकुं द चोपन्न हजार एकशे छप्पन्न परुंका
एकोणसत्तर
३७ ५४ / ४ = १३.५ २ सेकुं द तेरा परुंका पाच
३८ ३३० / ८ = ४१.२५ २ सेकुं द एक्के चाळीसएक्काकॅ िेस परुंका पुंचवीस
३९ ७५० / ६ = १२५ १.४ सेकुं द एकशे पुंचवीस
४० ८५० / ६ = १४१.६६ ३ सेकुं द एकशे एक्के चाळीस परुंका सहास्ठ
४१ ६५० / १० = ६५ १ सेकुं द पास्ठ
४२ ६३ / ५ = १२.६ २ सेकुं द बारा परुंका सहा
४३ ३३० / १५ = २२ १ सेकुं द बावीस
४४ ५९६ / ७ = ८५.१४ २ सेकुं द पुंच्याऐुंशी परुंका चौदा
४५ २३५८९ / १० = २३५८.९ ३.२ सेकुं द दोन हजार तीनशे अठ्ठावन्न परुंका नऊ
४६ ९६३२ / ८ = १२०४ १.६ सेकुं द एक हजार दोनशे चार
४७ ५६३.६ / २.५६ = २२०.१५६ ३.२ सेकुं द दोनशे वीस परुंका एकशे छप्पन्न
४८ ८५६९.२३ / १२.८५ = ६६६.८६६ ३.२ सेकुं द सहाशे सहास्ठ परुंका आठशे सहास्ठ
४९ ५९.२६३ / ९.१२ = ६.४९ २ सेकुं द सहा परुंका एकोणपन्नास
५० १५६३.२५ / २१.२५ = ७३.५३४ २ सेकुं द त्र्याहत्तर परुंका पाचशे चौतीस
Performance Evaluation: MOS on the Android-based Marathi Speech Talking Calculator
MOS is calculated as a subjective quality measure for the speech synthesized using unit selection synthesis. Listeners were instructed to score each sentence for understandability on a scale from 1 to 5 (Excellent = 5, Very good = 4, Good = 3, Satisfactory = 2, Not understandable = 1).
Sentence Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10
1 5 5 5 5 4 4 5 4 5 5
2 5 5 5 5 4 4 5 4 5 5
3 5 5 5 5 4 4 5 4 5 5
4 5 5 5 5 4 4 5 4 5 5
5 5 5 5 5 4 4 5 4 5 5
6 5 5 5 5 4 4 5 4 5 5
7 5 5 5 5 4 4 5 4 5 5
8 5 5 5 5 4 4 5 4 5 5
9 5 5 5 5 4 4 5 4 5 5
10 5 5 5 5 4 4 5 4 5 5
11 5 5 5 5 4 4 5 4 5 5
12 5 5 5 5 4 4 5 4 5 5
13 5 5 5 5 4 4 5 4 5 5
14 5 5 5 5 4 4 5 4 5 5
15 5 5 5 5 4 4 5 4 5 5
16 5 5 5 5 4 4 5 4 5 5
17 5 5 5 5 4 4 5 4 5 5
18 5 5 5 5 4 4 5 4 5 5
19 5 5 5 5 4 4 5 4 5 5
20 5 5 5 5 4 4 5 4 5 5
21 5 5 5 5 4 4 5 4 5 5
22 5 5 5 5 4 4 5 4 5 5
23 5 5 5 5 4 4 5 4 5 5
24 5 5 5 5 4 4 5 4 5 5
25 5 5 5 5 4 4 5 4 5 5
26 5 5 5 5 4 4 5 4 5 5
27 5 5 5 5 4 4 5 4 5 5
28 5 5 5 5 4 4 5 4 5 5
29 5 5 5 5 4 4 5 4 5 5
30 5 5 5 5 4 4 5 4 5 5
31 5 5 5 5 4 4 5 4 5 5
32 5 5 5 5 4 4 5 4 5 5
33 5 5 5 5 4 4 5 4 5 5
34 5 5 5 5 4 4 5 4 5 5
35 5 5 5 5 4 4 5 4 5 5
36 5 5 5 5 4 4 5 4 5 5
37 5 5 5 5 4 4 5 4 5 5
38 5 5 5 5 4 4 5 4 5 5
39 5 5 5 5 4 4 5 4 5 5
40 5 5 5 5 4 4 5 4 5 5
41 5 5 5 5 4 4 5 4 5 5
42 5 5 5 5 4 4 5 4 5 5
43 5 5 5 5 4 4 5 4 5 5
44 5 5 5 5 4 4 5 4 5 5
45 5 5 5 5 4 4 5 4 5 5
46 5 5 5 5 4 4 5 4 5 5
47 5 5 5 5 4 4 5 4 5 5
48 5 5 5 5 4 4 5 4 5 5
49 5 5 5 5 4 4 5 4 5 5
50 5 5 5 5 4 4 5 4 5 5
The graph shows that 95.5% of the individuals rated the quality of the synthetic speech as good and understandable. Only 4.5% of the speech was rated perceptible but not clearly produced. Thus the unit selection method provides naturalness and understandability, the two important parameters of a TTS system.
[Figure: Percentage evaluation of synthesized output speech for the Android application. High quality and understandable: 95.5%. Perceptible-quality speech: 4.5%.]
Comparative Analysis
Sr. No | Name of Android Application in Marathi | Utilities
1 | Sandesh Pathak | Agriculture based
2 | A-speak | Hindi & Telugu text-to-speech
3 | Dhvani | Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu, Pashto text-to-speech system
4 | Shruti | Marathi text-to-speech
7 | Janbharti | Marathi text-to-speech
8 | Android Based Marathi Speech Calculator | Marathi Talking Calculator
9 | Matlab Based Marathi Calculator | Marathi Talking Calculator
Contribution and Significance
The contribution and significance of this research are:
• Created a Marathi Speech Calculator in MATLAB.
• Created a Marathi speech database and published it, via an application built in Android Studio, on the Google Play Store.
• The created database will be useful for young researchers who want to start their work in a regional language.
• Such an assistive application is useful for the general public who communicate in Marathi.
Limitations of the Research
For the MATLAB- and Android-based Marathi calculator:
• The talking calculator speaks results only up to the 10-digit place value.
• It performs only the basic arithmetic operations: addition, subtraction, multiplication, and division.
Conclusion
The literature review shows that considerable effort has gone into speech synthesis at many institutions, such as CDAC, TIFR, CMU, IIT Madras, CEERI, ISI Kolkata, and IIIT Hyderabad.
TTS applications developed for Marathi include Sandesh Pathak, Janbharti, etc.
However, further research effort is still needed on naturalness in Marathi.
Festival Quality of output speech synthesis.
In this work we have attempted to design a TTS system for a specific domain, namely calculation.
Future Scope
• As the developed application works only for the basic operators, we will try to implement the remaining operators.
• We will attempt to go beyond 10 digits.
• We will develop a system that reads online newspapers and books in Marathi.
Acknowledgment
I would like to thank my research advisor, Professor Bharti Gawali, for supporting me during these past four years. Bharti Gawali is someone you will instantly love and never forget once you meet her. She is the funniest research advisor and one of the smartest ladies.
I am also very grateful to MLA for his scientific advice and knowledge and for many insightful discussions and suggestions.
I also thank the faculty members and non-teaching staff of my department for their helpful career advice and suggestions in general.
List of Publication
1) Sangramsing Kayte, Kavita Waghmare, Dr. Bharti Gawali “Marathi Speech
Synthesis: A review” International Journal on Recent and Innovation Trends in
Computing and Communication ISSN: 2321-8169 Volume: 3 Issue: 6 3708 – 3711
(Impact Factor 5.83)
2) Sangramsing Kayte, Bharti Gawali "A Text-To-Speech Synthesis for Marathi
Language using Festival and Festvox" International Journal of Computer
Applications (0975 – 8887) Volume 132 – No.3, December 2015
3) Sangramsing Kayte and Bharti Gawali. Article: Analysis of Pitch and Duration in
Speech Synthesis using PSOLA. Communications on Applied Electronics 4(4):10-18,
February 2016. Published by Foundation of Computer Science (FCS), NY, USA.
Chapters in the Book
1) Sangramsing N. Kayte, Monica Mundada, Santosh Gaikwad and Bharti Gawali
"Performance Evaluation of Speech Synthesis Techniques for English Language" ©
Springer Science+Business Media Singapore 2016 S.C. Satapathy et al. (eds.),
Proceedings of the International Congress on Information and Communication
Technology, Advances in Intelligent Systems and Computing 439, DOI 10.1007/978-
981-10-0755-2_27
References
1. Paul Taylor, a text book on “Text to Speech Synthesis”, University of Cambridge,
United Kingdom
2. Sami Lemmetty “Review of Speech Synthesis Technology” M.Tech., Helsinki
University of Technology, Finland, 1999
3. A. Black, P. Taylor, and R. Caley, “The Festival speech synthesis system,”
http://festvox.org/festival, 1999.
4. K. Prahallad, N. K. Elluru, V. Keri, S. Rajendran, and A. W. Black, "The IIIT-H
Indic speech databases", in Proceedings of INTERSPEECH, Portland, Oregon, USA,
2012.
5. A. Black and K. Lenzo, “Building voices in the Festival speech synthesis system,”
http://festvox.org/bsv/, 2000.
Websites
1. http://tcts.fpms.ac.be/synthesis/introtts_old.html
2. http://www.festvox.org/
3. http://www.cstr.ed.ac.uk/
4. http://en.wikipedia.org/wiki/Speech_synthesis
5. http://hts.sp.nitech.ac.jp/
6. http://festvox.org/11752/packed/
7. http://audacity.sourceforge.net/
Thank You

Text To Speech Synthesis System For Marathi Language Using Concatenation Technique

  • 1. Presented by Mr. Sangramsing Nathusing Kayte Guided by Dr.Bharti W. Gawali Professor & Head Department of Computer Science & Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad (M.S.) India. Dr. Babasaheb Ambedkar Marathwada University Department of Computer Science and Information Technology Text To Speech Synthesis System For Marathi Language Using Concatenation Technique 1
  • 2. Presentation Overview  Introduction to Speech Processing  Speech Synthesis  Techniques of Speech Synthesis  Survey of Literature  Objectives of the Research  Tools Used for Implementation  Performance Evaluation Methods  Database Creation  Comparative Analysis  Contribution and Significance  Conclusion  References 2
  • 3. Introduction to Speech Processing  Speech is the most basic form of communication between human beings.  Every human being cannot read and write , however, can communicate using speech.  Its high influence on the day-to-day life of human being 3
  • 4. Pictorial Involves organs like Vocal chords, Nasal cavity , mouth, teeth, Lips etc. 4
  • 5. Various speech processing Phases 1. Speech Recognition 2. Speech Synthesis 3. Speech coding 4. Speech compression We are Focusing on speech synthesis. 5
  • 6. Speech Synthesis Speech Synthesis is an artificial production of human speech. It is the computer-generated simulation of human speech. Also called as Text to Speech (TTS) system that converts text into spoken language प्राचीन भारतीय अर्थव्यवस्र्ा अततशय सुंदर होती Text to Speech System Text Speech 6
  • 7. Architecture of a Text-to-Speech System Text Text Normalization Prosodic Prediction • Duration • F0 Contour • Energy Waveform-Generation Handling numbers, symbols, abbreviations etc. Pre-processing Linguistic Analysis: • Part of speech tagging • Phrase breaks • Letter to Sound Rules Speech • Document Structure Detection • Conversion from Unicode and fonts Tagged Text Word sequence Phone sequence 7 प्राचीन भारतीय अर्थव्यवस्र्ा अततशय सुंदर होती
  • 8. Speech Synthesis Techniques There are different types of synthesis methods that can be used when building a TTS synthesis system. 1 Articulatory Synthesis 2. Formant Synthesis 3. Concatenative Speech Synthesis 8
  • 9. 1 Articulatory Synthesis:- Articulatory Synthesis is a method of synthesizing speech by controlling the speech articulators (e.g. jaw, tongue, lips, etc.) 2.Formant Synthesis:- Each phone is produced by specifying the Formants and pitch. A set of rules are also specified to modify pitch and formants, so that transition from one phone to another phone is sufficiently smooth 9
  • 10. 3. Concatenative synthesis:- Concatenative synthesis is based on the concatenation (or stringing together) of segments of pre- recorded speech. Architecture of Concatenative synthesis Input data Text input Speech recordings Unit segmentation Unit database Unit selection Concatenation + smoothing i a sh +… + + … 10
  • 11. There Are Three Main Subtypes Of Concatenative Synthesis: 1.Domain-specific synthesis 2.Di-phone synthesis 3.Unit selection synthesis 1) Domain-specific synthesis:- concatenates pre-recorded words and phrases to create complete utterances. 2) Di-phone synthesis:- uses a minimal speech database containing all the Di-phones (sound-to-sound transitions) occurring in a given language. In di-phone synthesis, only one example of each di-phone is contained in the speech database. 11
  • 12. 3)Unit selection synthesis:- uses large speech databases (more than one hour of recorded speech). During database creation, each recorded utterance is segmented into some or all of the following linguistic constructs such as phonemes, words phrases and sentences Architecture of Unit Selection synthesis 12 Unit Selection • Preprocessing • Text Normalization • Linguistic analysis n - Diaphone units (large speech database) Text Speech Phones
  • 13. Survey of Literature There are various foreign languages in which work has been done or going on such as English, Japanese, European Portuguese, Arabic, Polish, Korean, German, Turkish, Mongolian, and Greek. While focusing towards Indian languages there are total 22 official languages out of which Hindi, Malayalam, Kannada, Bengali, Oriya, Punjabi, Gujarati, Telugu, and Marathi are being in focused. 13
  • 14. Different systems have been developed in these languages such as Dhvani, Shruti, HP Lab, Vani etc. There are various institutions working on build a Marathi speech synthesizer such as IIIT-Hyderabad, CDAC-Mumbai, CDAC-Pune, and IIT Chennai. They have built various applications such as e-speak, a-speak, i- speak, Sandesh Pathak but in Hindi, Telugu and other languages. Very less work has been done in Marathi. 14
  • 15. 15 International Speech Synthesis Languages scenario Language Techniques Arabic (Saudi Arabia) - ar-SA iSpeech SDK Chinese (China) - zh-CN Festival framework Chinese (Hong Kong SAR China) - zh-HK Festival framework Chinese (Taiwan) - zh-TW Festival framework English (Australia) - en-AU , English (Ireland) - en-IE English (South Africa) - en-ZA English (United Kingdom) - en-GB English (United States) - en-US Festival framework
  • 16. 16 National Speech Synthesis Languages scenario Language Techniques Punjabi Festival framework Gujarati Concatenative synthesis ,Unit selection, festival Tamil Concatenative synthesis ,Unit selection, HTK Bangali Concatenative synthesis ,Unit selection Oriya Concatenative synthesis ,Unit selection Hindi Concatenative synthesis ,Unit selection Kannada Hidden Markov Model Marathi Continuous Density Hidden Markov Model approach for Marathi Speech synthesis system based on Marathi affricates. Telugu Festival framework
  • 17. 17 International Institute, speech Synthesis work is carried out Sr. No Institute Language 1 Digital Future of technology, New America U.S. English U.K. English Spanish French German 2 Machine Intelligence Laboratory Cambridge University Department of Engineering U.S. English & U.K. English 3 IBM Arabic Chinese French German U.S. English & U.K. English
  • 18. 18 National Institute, where Marathi speech Synthesis work is carried out Sr. No Institute Language 1 TIFR Mumbai Hindi, Bengali, Marathi, Indian English 2 IIIT Hyderabad Hindi, Telugu, Other Language, Marathi 3 IIT Mumbai Hindi, Marathi 4 CDAC Pune Hindi, Marathi, Indian English, Other Language 5 CDAC Noida Hindi, Panjabi, Marathi, Other Language 6 CIIL Mysore Most of the Indian Languages, Marathi
  • 19. Objectives of the Research The major objective of the proposed study is to build a Unit based synthesis system. Creation of database for speech synthesis. Developing the MATLAB and Android APP based on the speech synthesis system in Marathi. 19
  • 20. TOOLS Used for Implementation Festival Framework MATLAB Android 20
  • 21. Festival Framework The Festival Speech Synthesis System is a general multi-lingual speech synthesis system. Developed by Alan W. Black Centre for Speech Technology Research (CSTR) at the University of Edinburgh. Festival is designed to support multiple languages, and comes with support for English (British and American pronunciation), Welsh, and Spanish. Voice packages exist for several other languages, such as Spanish, Finnish, Hindi, Italian, Marathi, Polish, Russian and 21
  • 22. Festival Speech tools  festival-2.4  speech_tools-2.4  festvox-2.1  festlex_CMU  festlex_OALD  festlex_POSLEX  festvox_kalloc 22
  • 23. Commands  Initially, set environment variables FESTVOXDIR and ESTDIR to their respective directories. For example as export FESTVOXDIR=/home/drbhati/unit/festvox export ESTDIR=/home/drbhati/unit/speech_tools  Filename for a voice uses three part names i.e a institution name, a language, and a speaker. mkdir bamu_mar_sing  Make it current directory. Cd bamu_mar_sing 23
  • 24.  To set prompts for Unit selection process. $FESTVOXDIR/src/unitsel/setup_clunits bamu mar sing  To convert the wav files into 16000 HZ sampling frequency and mono sound executable file ./bin/get_wavs recording/*.wav  The next stage- To generate waveforms to act as prompts, or timing cues even if the prompts are not actually played. festival -b festvox/build_clunits.scm ’(build_prompts_waves "etc/txt.data")’ Commands Cont.. 24
  • 25.  After recording the recorded files should be in wav/. ./bin/prompt_them etc/txt.data  Lab files are created for each phone with following command. ./bin/make_labs prompt-wav/*.wav  After here, the steps are concerned with signal analysis, specifically pitch marking and cepstral parameter extraction.  There are a number of methods for pitch mark extraction and a number of parameters within these files that may need tuning. festival -b festvox/build_clunits.scm ’(build_utts "etc/txt.data")’ 25
  • 26. Lab files are created for each phone with following command. ./bin/make_labs prompt-wav/*.wav There are a number of methods for pitch mark extraction and a number of parameters within these files that may need tuning. festival -b festvox/build_clunits.scm ’(build_utts "etc/txt.data")’  Concerned with signal analysis, specifically pitch marking and cepstral parameter extraction. ./bin/make_pm_wave wav/*.wav 26
  • 27. The next stage it find the Mel Frequency Cepstral Coefficents. ./bin/make_mcep wav/*.wav  Building the cluster unit selection synthesizer consists of a number of stages all based on the controlling Festival script.  The parameters of which are described above. festival -b festvox/build_clunits.scm ’(build_clunits "etc/txt.data")’ 27
  • 28. The resulting voice is synthesized voice festival festvox/bamu_mar_bharti_clunits.scm festival> (voice_bamu_mar_bharti_clunits) festival> (SayText "kaarand~a aapalayaakad:ei tii padadhata naahii.") festival> (SayText "कारण आपल्याकडे ती पद्धत नाही.") 28
  • 29. DATABASE USED FOR FESTIVAL The speech data are collected with the help of recording studio using the single channel. Parameter Value Sampling Rate 16000 HZ Speakers Dependent Condition of Noise Normal Accent Marathi Pre –emphasis 1-0.97z--1 Window type Hamming, 25 milliseconds Window step size 20 millisecond Distance of Microphone 10-12 Meter Table: Technical specification of Data recording 29 Database Statistics Number of Utterances: 1000 Utterances: 1 Session: 01 Total size: 1000 sentences Ubuntu Festival
  • 30. Table: Sentences and labels used for generating synthesized speech
    Sr. No | The Original Sentence | Original Speech File | Synthesized Speech File
    1 | कारण आपल्याकडे ती पद्धत नाही | A001 | a001
    2 | प्राचीन भारतीय अर्थव्यवस्था अतिशय सुंदर होती | A002 | a002
    3 | कर्नाटकात केवळ कन्नड अधिकृत आहे | A003 | a003
    4 | विकिपीडिया हा एक ज्ञानकोश आहे | A004 | a004
    5 | कारण तू माझी आई आहेस | A005 | a005
    6 | इंग्रजी भाषा आहे रोमन लिपी आहे | A006 | a006
    7 | येथे `धन म्हणजे शुद्ध लक्ष्मी | A007 | a007
    8 | संदर्भ मात्र जर्मनी येथील | A008 | a008
    9 | शेवटचे र हे अक्षर मात्र कायम | A009 | a009
    10 | धिधाडे ही कोणी शिकार करत नाहि | A0010 | a0010
    Ten sentences were used to analyze the quality of the original and synthesized speech.
  • 31. Performance Evaluation Methods: There are many performance evaluation methods, but the one most widely used to judge the naturalness of synthesized speech is the Mean Opinion Score (MOS). MOS is a test that has been used for decades in telephony networks to obtain the human user's view of the quality of the network. The MOS is expressed as a single number in the range 1 to 5, where 1 is the lowest perceived audio quality and 5 is the highest.
  • 32. Performance Evaluation: MOS is calculated for subjective quality measurement of the speech synthesized with the unit selection approach. Listeners were instructed to give each sentence a score from 1 to 5 (Excellent: 5, Very good: 4, Good: 3, Satisfactory: 2, Not understandable: 1).
    Table 1. Scores given by each subject to each sentence synthesized with unit selection
    Sentence | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10
    1 | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 4 | 5
    2 | 5 | 5 | 4 | 5 | 5 | 4 | 5 | 4 | 4 | 5
    3 | 4 | 4 | 5 | 4 | 3 | 3 | 4 | 2 | 5 | 4
    4 | 5 | 4 | 4 | 5 | 4 | 4 | 5 | 5 | 5 | 5
    5 | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 3 | 3 | 5
    6 | 5 | 4 | 5 | 5 | 5 | 4 | 5 | 4 | 4 | 5
    7 | 4 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
    8 | 4 | 4 | 5 | 4 | 4 | 5 | 4 | 5 | 5 | 4
    9 | 5 | 3 | 5 | 5 | 3 | 4 | 5 | 3 | 5 | 5
    10 | 5 | 5 | 4 | 5 | 4 | 4 | 5 | 4 | 4 | 5
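As a minimal sketch of how a MOS is obtained from such a table, the scores can be averaged per sentence and over all ratings. The score matrix below is transcribed from Table 1 (rows are sentences, columns are Sub1 to Sub10); the function names are illustrative, and the slides do not state which averaging the authors reported.

```python
# Scores transcribed from Table 1: one row per sentence, one column per subject.
SCORES = [
    [5, 5, 5, 5, 4, 4, 5, 4, 4, 5],
    [5, 5, 4, 5, 5, 4, 5, 4, 4, 5],
    [4, 4, 5, 4, 3, 3, 4, 2, 5, 4],
    [5, 4, 4, 5, 4, 4, 5, 5, 5, 5],
    [5, 5, 5, 5, 4, 4, 5, 3, 3, 5],
    [5, 4, 5, 5, 5, 4, 5, 4, 4, 5],
    [4, 5, 4, 4, 4, 4, 4, 4, 4, 4],
    [4, 4, 5, 4, 4, 5, 4, 5, 5, 4],
    [5, 3, 5, 5, 3, 4, 5, 3, 5, 5],
    [5, 5, 4, 5, 4, 4, 5, 4, 4, 5],
]


def mos_per_sentence(scores):
    """Mean of the subjects' ratings for each sentence."""
    return [sum(row) / len(row) for row in scores]


def overall_mos(scores):
    """Mean of all ratings, pooled over sentences and subjects."""
    total = sum(sum(row) for row in scores)
    count = sum(len(row) for row in scores)
    return total / count


if __name__ == "__main__":
    for number, value in enumerate(mos_per_sentence(SCORES), start=1):
        print(f"sentence {number}: MOS {value:.1f}")
    print(f"overall MOS: {overall_mos(SCORES):.2f}")
```

For the transcribed scores the pooled mean comes out at 4.39 on the 1 to 5 scale.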
  • 33. Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE) quality measure
    Sr. No | Original Speech File | Synthesized File | PSNR | MSE
    1 | A001 | a001 | 3.30 | 7.94
    2 | A002 | a002 | 6.72 | 4.57
    3 | A003 | a003 | 3.21 | 1.02
    4 | A004 | a004 | 4.20 | 3.70
    5 | A005 | a005 | 2.57 | 7.61
    6 | A006 | a006 | 1.26 | 5.32
    7 | A007 | a007 | 7.56 | 8.06
    8 | A008 | a008 | 1.29 | 7.20
    9 | A009 | a009 | 3.24 | 9.25
    10 | A0010 | a0010 | 4.08 | 7.01
    Average | | | 3.74 | 6.17
    Quality | | | 96.26 | 93.83
    The PSNR and MSE methods were used as objective quality measures for speech synthesis based on the unit selection approach.
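The slide does not say how PSNR and MSE were computed, how the original and synthesized files were aligned, or what scale the table values are on. The sketch below therefore uses the textbook definitions, with both signals assumed to be equal-length sample lists and a peak amplitude of 1.0; all names and parameters are illustrative assumptions.

```python
import math


def mse(reference, test):
    """Mean square error between two equal-length sample sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    error = mse(reference, test)
    if error == 0:
        return math.inf          # identical signals
    return 10 * math.log10(peak ** 2 / error)


if __name__ == "__main__":
    original = [0.0, 0.5, -0.5, 0.25]
    synthesized = [0.1, 0.4, -0.4, 0.15]
    print(f"MSE  = {mse(original, synthesized):.4f}")
    print(f"PSNR = {psnr(original, synthesized):.2f} dB")
```

A lower MSE and a higher PSNR both indicate that the synthesized signal is closer to the original recording.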
  • 34. Synthesis System Design: The speech synthesis system was developed on two platforms: a) MATLAB-based Marathi speech synthesis system b) Android-based Marathi speech synthesis system
  • 35. a. MATLAB-based speech synthesis calculator: MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. A GUI-based calculator was developed, and the concatenation speech synthesis technique was applied to speak its results.
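The idea behind a concatenative talking calculator can be sketched as two steps: split the numeric result into place-value chunks that each correspond to a recorded word, then join the recordings in order. The sketch below is an assumption-laden illustration, not the authors' implementation (which was written in MATLAB and is not shown on the slides). The unit labels ("she", "hajar", "lakh", "koti"), the grouping by the Indian place-value system, the supported range, and the `recordings` dictionary are all hypothetical. It also ignores special cases such as the separate word for exactly 100.

```python
# Place values of the Indian numbering system, largest first.
# The labels are hypothetical keys for recorded units, not the authors' file names.
PLACES = [(10_000_000, "koti"), (100_000, "lakh"), (1_000, "hajar"), (100, "she")]


def unit_sequence(number):
    """Split 1..999,999,999 into unit labels, e.g. 148 -> ['1', 'she', '48']."""
    if not 0 < number < 1_000_000_000:
        raise ValueError("number out of the range handled by this sketch")
    units = []
    for value, label in PLACES:
        count, number = divmod(number, value)
        if count:                      # count is always between 1 and 99 here
            units += [str(count), label]
    if number:                         # remaining 1..99 is a single recorded word
        units.append(str(number))
    return units


def concatenate(units, recordings):
    """Join the recorded sample lists of the units, in speaking order."""
    samples = []
    for unit in units:
        samples.extend(recordings[unit])
    return samples


if __name__ == "__main__":
    print(unit_sequence(11930))    # ['11', 'hajar', '9', 'she', '30']
    fake = {"1": [0.1, 0.2], "she": [0.3], "48": [0.4, 0.5]}
    print(concatenate(unit_sequence(148), fake))
```

Because every chunk is at most 99, a vocabulary of the number words from 1 to 99 plus a few place-value words is enough to speak any result in the supported range.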
  • 36. Prototype of Marathi Speech Talking Calculator Application
  • 37. Testing table for the MATLAB Marathi Speech Talking Calculator
    Sr. No | No. 1 | Op. 1 | No. 2 | Op. 2 | Result | Response Time | Result in Speech
    १ | २ | + | ६ | = | ८ | १ सेकंद | आठ
    २ | ५ | + | ९ | = | १४ | १ सेकंद | चौदा
    ३ | ५९ | + | ८९ | = | १४८ | १.६ सेकंद | एकशे अठ्ठेचाळीस
    ४ | ६६ | + | ८४ | = | १५० | १.६ सेकंद | एकशे पन्नास
    ५ | ८७ | + | ९३ | = | १८० | १.६ सेकंद | एकशे ऐंशी
    ६ | ९६८ | + | ७६९ | = | १८३७ | १.७ सेकंद | एकहजार आठशे सदतीस
    ७ | २३६७ | + | ९५६३ | = | ११९३० | २.६ सेकंद | अकराहजार नऊशे तीस
    ८ | ५८६ | + | ३२१ | = | ९०७ | १.६ सेकंद | नऊशे सात
    ९ | ५८२ | + | ६३९ | = | १२२१ | १.७ सेकंद | एकहजार दोनशे एकवीस
    १० | ४९३ | + | ३५७ | = | ८५० | १.६ सेकंद | आठशे पन्नास
    ११ | १२.१५ | + | ९५.६९ | = | १०७.८४ | २.६ सेकंद | एकशे सात पूर्णांक चौऱ्याऐंशी
    १२ | ७५.९ | + | ५६.७८ | = | १३२.६८ | २.६ सेकंद | एकशे बत्तीस पूर्णांक अडस्ठ
    १३ | १० | - | ६ | = | ४ | १ सेकंद | चार
    १४ | १५ | - | ९ | = | ४ | १ सेकंद | चार
    १५ | ८९ | - | ६३ | = | २६ | १ सेकंद | सव्वीस
  • 38. Testing table for the MATLAB Marathi Speech Talking Calculator (continued)
    Sr. No | No. 1 | Op. 1 | No. 2 | Op. 2 | Result | Response Time | Result in Speech
    १६ | ७९५ | - | ५५६ | = | २३९ | १.३ सेकंद | दोनशे एकोणचाळीस
    १७ | १५६७ | - | २१५९ | = | ५९२ | १.३ सेकंद | पाचशे ब्याण्णव
    १८ | ४५९ | - | ३६७ | = | ९२ | १.३ सेकंद | ब्याण्णव
    १९ | ९६१५३ | - | ६५८९ | = | ८९५६४ | २.६ सेकंद | एकोणनव्वद हजार पाचशे चौस्ठ
    २० | ५६४७८९ | - | २३६९ | = | ५६२४२० | २.८ सेकंद | पाच लाख बास्ठ हजार चारशे वीस
    २१ | २३६५ | - | २५८ | = | २१०७ | १.४ सेकंद | दोन हजार एकशे सात
    २२ | ८८८ | - | ६६६ | = | २२२ | १.२ सेकंद | दोनशे बावीस
    २३ | १५९.३६ | - | २३.८ | = | १३५.५६ | २.८ सेकंद | एकशे पस्तीस पूर्णांक छप्पन्न
    २४ | ९५३.१२६ | - | ६९३.२५ | = | २५९.८७६ | २.८ सेकंद | दोनशे एकोणसाठ पूर्णांक आठशे शहात्तर
    २५ | २३ | * | ५९ | = | १३५७ | २ सेकंद | एकहजार तीनशे सत्तावन्न
    २६ | १३ | * | ३६ | = | ३३८ | १.४ सेकंद | तीनशे अडतीस
    २७ | ९ | * | ८ | = | ७२ | १.४ सेकंद | बाहत्तर
    २८ | २६ | * | १२ | = | ३१२ | १.४ सेकंद | तीनशे बारा
    २९ | ६६ | * | ९ | = | ५९४ | १.४ सेकंद | पाचशे चौऱ्याण्णव
    ३० | ३६ | * | ११ | = | १०५६ | २ सेकंद | एकहजार छप्पन्न
    ३१ | ५३९ | * | १२ | = | ६४६८ | १.४ सेकंद | सहा हजार चारशे अडस्ठ
    ३२ | २५७८ | * | ३ | = | १५४६८ | २.८ सेकंद | पंधरा हजार चारशे अडस्ठ
    ३३ | ६९७५ | * | १२ | = | ८३७०० | २.४ सेकंद | त्र्याऐंशी हजार सातशे
  • 39. Testing table for the MATLAB Marathi Speech Talking Calculator (continued)
    Sr. No | No. 1 | Op. 1 | No. 2 | Op. 2 | Result | Response Time | Result in Speech
    ३४ | २२५८ | * | ५६ | = | १२६४४८ | २.८ सेकंद | एक लाख सव्वीस हजार चारशे अठ्ठेचाळीस
    ३५ | ५२३.६९ | * | १२.३ | = | ६४४१.३८७ | ३.२ सेकंद | सहा हजार चारशे एक्केचाळीस पूर्णांक तीनशे सत्त्याऐंशी
    ३६ | ८५९६.३ | * | ६.३ | = | ५४१५६.६९ | ४ सेकंद | चोपन्न हजार एकशे छप्पन्न पूर्णांक एकोणसत्तर
    ३७ | ५४ | / | ४ | = | १३.५ | २ सेकंद | तेरा पूर्णांक पाच
    ३८ | ३३० | / | ८ | = | ४१.२५ | २ सेकंद | एक्केचाळीस पूर्णांक पंचवीस
    ३९ | ७५० | / | ६ | = | १२५ | १.४ सेकंद | एकशे पंचवीस
    ४० | ८५० | / | ६ | = | १४१.६६ | ३ सेकंद | एकशे एक्केचाळीस पूर्णांक सहास्ठ
    ४१ | ६५० | / | १० | = | ६५ | १ सेकंद | पास्ठ
    ४२ | ६३ | / | ५ | = | १२.६ | २ सेकंद | बारा पूर्णांक सहा
    ४३ | ३३० | / | १५ | = | २२ | १ सेकंद | बावीस
    ४४ | ५९६ | / | ७ | = | ८५.१४ | २ सेकंद | पंच्याऐंशी पूर्णांक चौदा
    ४५ | २३५८९ | / | १० | = | २३५८.९ | २.४ सेकंद | दोन हजार तीनशे अठ्ठावन्न पूर्णांक नऊ
    ४६ | ९६३२ | / | ८ | = | १२०४ | १.६ सेकंद | एक हजार दोनशे चार
    ४७ | ५६३.६ | / | २.५६ | = | २२०.१५६ | ३.४ सेकंद | दोनशे वीस पूर्णांक एकशे छप्पन्न
    ४८ | ८५६९.२३ | / | १२.८५ | = | ६६६.८६६ | ३.२ सेकंद | सहाशे सहास्ठ पूर्णांक आठशे सहास्ठ
    ४९ | ५९.२६३ | / | ९.१२ | = | ६.४९ | २ सेकंद | सहा पूर्णांक एकोणपन्नास
    ५० | १५६३.२५ | / | २१.२५ | = | ७३.५३४ | २ सेकंद | त्र्याहत्तर पूर्णांक पाचशे चौतीस
  • 40. Performance Evaluation: MOS on the MATLAB-based Marathi Speech Talking Calculator
    MOS is calculated for subjective quality measurement of the speech synthesized with unit selection synthesis. Listeners were instructed to give each sentence a score from 1 to 5 (Excellent: 5, Very good: 4, Good: 3, Satisfactory: 2, Not understandable: 1).
    Sentence | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10
    1 | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
    2 | 5 | 5 | 5 | 5 | 5 | 4 | 5 | 4 | 5 | 5
    3 | 5 | 5 | 5 | 5 | 5 | 4 | 5 | 4 | 5 | 5
    4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 5 | 5
    5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5
    6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5
    7 | 5 | 5 | 5 | 5 | 4 | 5 | 5 | 5 | 5 | 5
    8 | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 5 | 5 | 5
    9 | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 5 | 5 | 5
    10 | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
    11 | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
  • 41. MOS on the MATLAB-based Marathi Speech Talking Calculator (continued)
    Sentence | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10
    12 to 29 (identical scores for each sentence) | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
  • 42. MOS on the MATLAB-based Marathi Speech Talking Calculator (continued)
    Sentence | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10
    30 to 49 (identical scores for each sentence) | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
  • 43. The table shows that 85% of individuals rated the quality of the synthetic speech as good and understandable. Only 15% of the speech was rated perceptible but not clearly produced. Thus the unit selection method provides naturalness and understandability, the two important parameters of a TTS system.
    Graphical representation: percentage evaluation of the synthesized output speech for the MATLAB application (high quality and understandable: 85%; perceptible quality speech: 15%).
  • 44. b. Android-based speech synthesis: Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets. Android's user interface is mainly based on direct manipulation, using touch gestures that loosely correspond to real-world actions, such as swiping, tapping and pinching, to manipulate on-screen objects, along with a virtual keyboard for text input. In addition to touchscreen devices, Google has further developed Android TV for televisions, Android Auto for cars, and Android Wear for wristwatches, each with a specialized user interface. Variants of Android are also used on notebooks, game consoles, digital cameras, and other electronics.
  • 45. Android applications in the international scenario
    Sr. No | Name of Application | Language
    1 | Google Text to Speech | English
    2 | Easy Text speech | English
    3 | IVONA Text-to-Speech for iOS | English, Welsh, Polish
    4 | Voice Dream Reader | English
    5 | Text to Speech for iOS | English
    6 | Type and speak | English
    7 | eReader Prestigo:Book Reader | 25 languages
    8 | Announcify | English
    9 | Select and Speak | English
    10 | SpeakIt | English
    11 | Dictanots | English
    12 | Voice Recognition | English
    13 | Text to Speech Reader | English
  • 46. Android applications in the national scenario
    Sr. No | Name of Application | Language
    1 | aSpeak | Telugu, Hindi
    2 | Sandesh Pathak | Telugu, Hindi, Marathi, Tamil and Gujarati
    3 | Shruti | Hindi and Bengali
    4 | HP labs | Hindi
    5 | Vani | Hindi
    6 | Dhvani | Hindi, Malayalam, Kannada, Bengali, Oriya, Punjabi, Telugu, Marathi
  • 47. Database Creation (MATLAB & Android): The vocabulary consists of 121 words. Each word was recorded as one utterance and the data were collected in 1 session, so the database has an overall vocabulary size of 121.
    Table: Speech database designed for the MTC application
    Number | Marathi Pronunciation | Number | Marathi Pronunciation
    1 | एक | 11 | अकरा
    2 | दोन | 12 | बारा
    3 | तीन | 13 | तेरा
    4 | चार | 14 | चौदा
    5 | पाच | 15 | पंधरा
    6 | सहा | 16 | सोळा
    7 | सात | 17 | सतरा
    8 | आठ | 18 | अठरा
    9 | नऊ | 19 | एकोणीस
    10 | दहा | 20 | वीस
  • 48. Speech database designed for the MTC application (continued)
    Number | Marathi Pronunciation | Number | Marathi Pronunciation | Number | Marathi Pronunciation
    21 | एकवीस | 35 | पस्तीस | 49 | एकोणपन्नास
    22 | बावीस | 36 | छत्तीस | 50 | पन्नास
    23 | तेवीस | 37 | सदतीस | 51 | एक्कावन्न
    24 | चोवीस | 38 | अडतीस | 52 | बावन्न
    25 | पंचवीस | 39 | एकोणचाळीस | 53 | त्रेपन्न
    26 | सव्वीस | 40 | चाळीस | 54 | चोपन्न
    27 | सत्तावीस | 41 | एक्केचाळीस | 55 | पंचावन्न
    28 | अठ्ठावीस | 42 | बेचाळीस | 56 | छप्पन्न
    29 | एकोणतीस | 43 | त्रेचाळीस | 57 | सत्तावन्न
    30 | तीस | 44 | चव्वेचाळीस | 58 | अठ्ठावन्न
    31 | एकतीस | 45 | पंचेचाळीस | 59 | एकोणसाठ
    32 | बत्तीस | 46 | सेहेचाळीस | 60 | साठ
    33 | तेहेतीस | 47 | सत्तेचाळीस | 61 | एकस्ठ
    34 | चौतीस | 48 | अठ्ठेचाळीस | 62 | बास्ठ
  • 49. Speech database designed for the MTC application (continued)
    Number | Marathi Pronunciation | Number | Marathi Pronunciation | Number | Marathi Pronunciation
    63 | त्रेस्ठ | 78 | अठ्ठ्याहत्तर | 93 | त्र्याण्णव
    64 | चौस्ठ | 79 | एकोणऐंशी | 94 | चौऱ्याण्णव
    65 | पास्ठ | 80 | ऐंशी | 95 | पंच्याण्णव
    66 | सहास्ठ | 81 | एक्क्याऐंशी | 96 | शहाण्णव
    67 | सदस्ठ | 82 | ब्याऐंशी | 97 | सत्त्याण्णव
    68 | अडस्ठ | 83 | त्र्याऐंशी | 98 | अठ्ठ्याण्णव
    69 | एकोणसत्तर | 84 | चौऱ्याऐंशी | 99 | नव्व्याण्णव
    70 | सत्तर | 85 | पंच्याऐंशी | 100 | शंभर
    71 | एक्काहत्तर | 86 | शहाऐंशी | 101 | एकशे एक
    72 | बाहत्तर | 87 | सत्त्याऐंशी | 1000 | हजार
    73 | त्र्याहत्तर | 88 | अठ्ठ्याऐंशी | 10,000 | दहा हजार
    74 | चौर्याहत्तर | 89 | एकोणनव्वद | 1,00,000 | लाख
    75 | पंच्याहत्तर | 90 | नव्वद | 10,00,000 | दहा लाख
    76 | शहात्तर | 91 | एक्क्याण्णव | 1,00,00,000 | कोटी
    77 | सत्याहत्तर | 92 | ब्याण्णव | 100,00,00,000 | शंभर कोटी
  • 50. Table: Operation (action taken) vocabulary for the MTC application
    Sr. No | Operation | Pronunciation
    1 | + | बेरीज
    2 | - | वजाबाकी
    3 | * | गुणाकार
    4 | / | भागाकार
    5 | . | दशांश
    6 | = | बरोबर
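The operation vocabulary above pairs each calculator key with a recorded Marathi word. One way to picture this, purely as an illustration, is a lookup table that returns both the spoken name of the operation and the arithmetic result. The Marathi words are taken from the table; the dictionary layout and the `calculate` function are assumptions and do not describe the authors' code.

```python
import operator

# Spoken names from the operation vocabulary table, paired with the arithmetic
# they stand for. The pairing structure itself is an illustrative assumption.
OPERATIONS = {
    "+": ("बेरीज", operator.add),
    "-": ("वजाबाकी", operator.sub),
    "*": ("गुणाकार", operator.mul),
    "/": ("भागाकार", operator.truediv),
}


def calculate(left, symbol, right):
    """Return the spoken name of the operation and the numeric result."""
    name, function = OPERATIONS[symbol]
    return name, function(left, right)


if __name__ == "__main__":
    for left, symbol, right in [(2, "+", 6), (89, "-", 63), (9, "*", 8), (54, "/", 4)]:
        name, result = calculate(left, symbol, right)
        print(f"{left} {symbol} {right} = {result}  ({name})")
```

The numeric result would then be passed to the number-to-speech step, so that only the four operations listed in the table need to be supported.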
  • 51. ACQUISITION SETUP
    The speech data were collected with a Real-tech microphone and CSL, using a single channel.
    Table: Technical specification of data recording (MATLAB & Android)
    Parameter | Value
    Sampling rate | 16000 Hz
    Speakers | Dependent
    Condition of noise | Normal
    Accent | Marathi
    Pre-emphasis | $1 - 0.97z^{-1}$
    Window type | Hamming, 25 milliseconds
    Window step size | 20 milliseconds
    Distance of microphone | 10-12 Meter
    Database statistics
    Number of utterances: 121
    Utterances: 1
    Session: 01
    Total size: 121 words
  • 52. b) Android-based Marathi Speech Talking Calculator: Prototype of Marathi Speech Talking Calculator Application
  • 53. Testing table for the Android Marathi Speech Talking Calculator
    Sr. No | No. 1 | Op. 1 | No. 2 | Op. 2 | Result | Response Time | Result in Speech
    १ | २ | + | ६ | = | ८ | १ सेकंद | आठ
    २ | ५ | + | ९ | = | १४ | १ सेकंद | चौदा
    ३ | ५९ | + | ८९ | = | १४८ | १.६ सेकंद | एकशे अठ्ठेचाळीस
    ४ | ६६ | + | ८४ | = | १५० | १.६ सेकंद | एकशे पन्नास
    ५ | ८७ | + | ९३ | = | १८० | १.६ सेकंद | एकशे ऐंशी
    ६ | ९६८ | + | ७६९ | = | १८३७ | १.७ सेकंद | एकहजार आठशे सदतीस
    ७ | २३६७ | + | ९५६३ | = | ११९३० | ३ सेकंद | अकराहजार नऊशे तीस
    ८ | ५८६ | + | ३२१ | = | ९०७ | १.६ सेकंद | नऊशे सात
    ९ | ५८२ | + | ६३९ | = | १२२१ | १.७ सेकंद | एकहजार दोनशे एकवीस
    १० | ४९३ | + | ३५७ | = | ८५० | १.६ सेकंद | आठशे पन्नास
    ११ | १२.१५ | + | ९५.६९ | = | १०७.८४ | २.३ सेकंद | एकशे सात पूर्णांक चौऱ्याऐंशी
    १२ | ७५.९ | + | ५६.७८ | = | १३२.६८ | ३.२ सेकंद | एकशे बत्तीस पूर्णांक अडस्ठ
    १३ | १० | - | ६ | = | ४ | १ सेकंद | चार
    १४ | १५ | - | ९ | = | ४ | १ सेकंद | चार
    १५ | ८९ | - | ६३ | = | २६ | १ सेकंद | सव्वीस
  • 54. Testing table for the Android Marathi Speech Talking Calculator (continued)
    Sr. No | No. 1 | Op. 1 | No. 2 | Op. 2 | Result | Response Time | Result in Speech
    १६ | ७९५ | - | ५५६ | = | २३९ | १.३ सेकंद | दोनशे एकोणचाळीस
    १७ | १५६७ | - | २१५९ | = | ५९२ | १.३ सेकंद | पाचशे ब्याण्णव
    १८ | ४५९ | - | ३६७ | = | ९२ | १.३ सेकंद | ब्याण्णव
    १९ | ९६१५३ | - | ६५८९ | = | ८९५६४ | ३ सेकंद | एकोणनव्वद हजार पाचशे चौस्ठ
    २० | ५६४७८९ | - | २३६९ | = | ५६२४२० | ३.८ सेकंद | पाच लाख बास्ठ हजार चारशे वीस
    २१ | २३६५ | - | २५८ | = | २१०७ | १.४ सेकंद | दोन हजार एकशे सात
    २२ | ८८८ | - | ६६६ | = | २२२ | १.२ सेकंद | दोनशे बावीस
    २३ | १५९.३६ | - | २३.८ | = | १३५.५६ | २ सेकंद | एकशे पस्तीस पूर्णांक छप्पन्न
    २४ | ९५३.१२६ | - | ६९३.२५ | = | २५९.८७६ | ३.२ सेकंद | दोनशे एकोणसाठ पूर्णांक आठशे शहात्तर
    २५ | २३ | * | ५९ | = | १३५७ | १.४ सेकंद | एकहजार तीनशे सत्तावन्न
    २६ | १३ | * | ३६ | = | ३३८ | १.४ सेकंद | तीनशे अडतीस
    २७ | ९ | * | ८ | = | ७२ | १.४ सेकंद | बाहत्तर
    २८ | २६ | * | १२ | = | ३१२ | १.४ सेकंद | तीनशे बारा
    २९ | ६६ | * | ९ | = | ५९४ | १.४ सेकंद | पाचशे चौऱ्याण्णव
    ३० | ३६ | * | ११ | = | १०५६ | १.४ सेकंद | एकहजार छप्पन्न
    ३१ | ५३९ | * | १२ | = | ६४६८ | १.४ सेकंद | सहा हजार चारशे अडस्ठ
    ३२ | २५७८ | * | ३ | = | १५४६८ | ३.४ सेकंद | पंधरा हजार चारशे अडस्ठ
    ३३ | ६९७५ | * | १२ | = | ८३७०० | ३.६ सेकंद | त्र्याऐंशी हजार सातशे
  • 55. Testing table for the Android Marathi Speech Talking Calculator (continued)
    Sr. No | No. 1 | Op. 1 | No. 2 | Op. 2 | Result | Response Time | Result in Speech
    ३४ | २२५८ | * | ५६ | = | १२६४४८ | ३.८ सेकंद | एक लाख सव्वीस हजार चारशे अठ्ठेचाळीस
    ३५ | ५२३.६९ | * | १२.३ | = | ६४४१.३८७ | ३.३ सेकंद | सहा हजार चारशे एक्केचाळीस पूर्णांक तीनशे सत्त्याऐंशी
    ३६ | ८५९६.३ | * | ६.३ | = | ५४१५६.६९ | ३ सेकंद | चोपन्न हजार एकशे छप्पन्न पूर्णांक एकोणसत्तर
    ३७ | ५४ | / | ४ | = | १३.५ | २ सेकंद | तेरा पूर्णांक पाच
    ३८ | ३३० | / | ८ | = | ४१.२५ | २ सेकंद | एक्केचाळीस पूर्णांक पंचवीस
    ३९ | ७५० | / | ६ | = | १२५ | १.४ सेकंद | एकशे पंचवीस
    ४० | ८५० | / | ६ | = | १४१.६६ | ३ सेकंद | एकशे एक्केचाळीस पूर्णांक सहास्ठ
    ४१ | ६५० | / | १० | = | ६५ | १ सेकंद | पास्ठ
    ४२ | ६३ | / | ५ | = | १२.६ | २ सेकंद | बारा पूर्णांक सहा
    ४३ | ३३० | / | १५ | = | २२ | १ सेकंद | बावीस
    ४४ | ५९६ | / | ७ | = | ८५.१४ | २ सेकंद | पंच्याऐंशी पूर्णांक चौदा
    ४५ | २३५८९ | / | १० | = | २३५८.९ | ३.२ सेकंद | दोन हजार तीनशे अठ्ठावन्न पूर्णांक नऊ
    ४६ | ९६३२ | / | ८ | = | १२०४ | १.६ सेकंद | एक हजार दोनशे चार
    ४७ | ५६३.६ | / | २.५६ | = | २२०.१५६ | ३.२ सेकंद | दोनशे वीस पूर्णांक एकशे छप्पन्न
    ४८ | ८५६९.२३ | / | १२.८५ | = | ६६६.८६६ | ३.२ सेकंद | सहाशे सहास्ठ पूर्णांक आठशे सहास्ठ
    ४९ | ५९.२६३ | / | ९.१२ | = | ६.४९ | २ सेकंद | सहा पूर्णांक एकोणपन्नास
    ५० | १५६३.२५ | / | २१.२५ | = | ७३.५३४ | २ सेकंद | त्र्याहत्तर पूर्णांक पाचशे चौतीस
  • 56. Performance Evaluation: MOS on the Android Marathi Speech Talking Calculator
    MOS is calculated for subjective quality measurement of the speech synthesized with unit selection synthesis. Listeners were instructed to give each sentence a score from 1 to 5 (Excellent: 5, Very good: 4, Good: 3, Satisfactory: 2, Not understandable: 1).
    Sentence | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10
    1 to 11 (identical scores for each sentence) | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
  • 57. MOS on the Android Marathi Speech Talking Calculator (continued)
    Sentence | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10
    12 to 30 (identical scores for each sentence) | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
  • 58. MOS on the Android Marathi Speech Talking Calculator (continued)
    Sentence | Sub1 | Sub2 | Sub3 | Sub4 | Sub5 | Sub6 | Sub7 | Sub8 | Sub9 | Sub10
    31 to 50 (identical scores for each sentence) | 5 | 5 | 5 | 5 | 4 | 4 | 5 | 4 | 5 | 5
  • 59. The graph shows that 95.5% of individuals rated the quality of the synthetic speech as good and understandable. Only 4.5% of the speech was rated perceptible but not clearly produced. Thus the unit selection method provides naturalness and understandability, the two important parameters of a TTS system.
    Graphical representation: percentage evaluation of the synthesized output speech for the Android application (high quality and understandable: 95.5%; perceptible quality speech: 4.5%).
  • 60. Comparative Analysis
    Sr. No | Name of Android Application in Marathi | Utilities
    1 | Sandesh Pathak | Agriculture based
    2 | A-speak | Hindi and Telugu text-to-speech
    3 | Dhvani | Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu, Pashto text-to-speech system
    4 | Shruti | Marathi text-to-speech
    7 | janbharti | Marathi text-to-speech
    8 | Android Based Marathi Speech Calculator | Marathi Talking Calculator
    9 | Matlab Based Marathi Calculator | Marathi Talking Calculator
  • 61. Contribution and Significance
    The contributions and significance of this research are:
    Created a Marathi speech calculator in MATLAB.
    Created a Marathi speech database and published it, through an application built in Android Studio, on the Google Play store.
    The created database will be useful for young researchers who want to start their work in a regional language.
    Such an assistive application is useful for the common masses who communicate in Marathi.
  • 62. Limitation of Research
    For the MATLAB- and Android-based Marathi calculators:
    The developed talking calculator speaks results only up to the 10-digit place value.
    It performs only the basic arithmetic operations: addition, subtraction, multiplication and division.
  • 63. Conclusion: The literature review shows that considerable effort has gone into speech synthesis at institutions such as CDAC, TIFR, CMU, IIT Madras, CEERI, ISI Kolkata and IIIT Hyderabad. TTS applications developed for Marathi include Sandesh Pathak and janbharti, but further research is still needed on naturalness in Marathi. Festival was used to build a unit selection voice and assess the quality of its output speech. In this work we have attempted to design a TTS system for a specific domain, calculation.
  • 64. Future Scope
    The developed application works only for basic operators; we will try to implement the remaining operators.
    We will attempt to go beyond 10 digits.
    We will develop a system that reads online newspapers and books in Marathi.
  • 65. Acknowledgment: I would like to thank my research advisor, Professor Bharti Gawali, for supporting me during these past four years. Bharti Gawali is someone you will instantly love and never forget once you meet her. She is the funniest research advisor and one of the smartest people I know. I am also very grateful to MLA for his scientific advice and knowledge and for many insightful discussions and suggestions. I also thank the faculty and non-teaching staff of my department for their helpful career advice and suggestions in general.
  • 66. List of Publications
    1) Sangramsing Kayte, Kavita Waghmare, Dr. Bharti Gawali, "Marathi Speech Synthesis: A review", International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 3, Issue 6, pp. 3708-3711 (Impact Factor 5.83).
    2) Sangramsing Kayte, Bharti Gawali, "A Text-To-Speech Synthesis for Marathi Language using Festival and Festvox", International Journal of Computer Applications (0975-8887), Volume 132, No. 3, December 2015.
    3) Sangramsing Kayte and Bharti Gawali, "Analysis of Pitch and Duration in Speech Synthesis using PSOLA", Communications on Applied Electronics 4(4):10-18, February 2016. Published by Foundation of Computer Science (FCS), NY, USA.
  • 67. Chapters in the Book
    1) Sangramsing N. Kayte, Monica Mundada, Santosh Gaikwad and Bharti Gawali, "Performance Evaluation of Speech Synthesis Techniques for English Language", in S.C. Satapathy et al. (eds.), Proceedings of the International Congress on Information and Communication Technology, Advances in Intelligent Systems and Computing 439, © Springer Science+Business Media Singapore 2016, DOI 10.1007/978-981-10-0755-2_27.
