SAP (SPEECH AND AUDIO PROCESSING)

Text to Speech Synthesizer
Mini Project Report
(Speech and Audio Processing ECT 359-1)

Contents :
 Introduction
 Objective
 Theoretical background
 Flowchart
 Code and Execution
 Result with Discussion
 Applications
 Advantages
 Limitations and Future scope
 References

Introduction :
 The text-to-speech (TTS) synthesis procedure
consists of two main phases.
 The first is text analysis, where the input text is
transcribed into a phonetic or some other linguistic
representation.
 The second is speech waveform generation, where the
output is produced from this phonetic and prosodic
information.
 These two phases are usually called high-level and
low-level synthesis, respectively. A simplified version of
this procedure is presented in the figure below.

Objectives :
 A text-to-speech synthesizer can be of great help to
people with visual impairment.
 A text-to-speech synthesizer helps a machine
communicate with its users.

Theoretical Background :
 Speech Synthesis is the artificial production of human
speech.
 A synthesizer can incorporate a model of the vocal tract and
other human voice characteristics to create a completely
"synthetic" voice output.
 A computer system used for this purpose is called a speech
computer or speech synthesizer.
 A text-to-speech (TTS) system converts normal language text
into speech; other systems render symbolic linguistic
representations like phonetic transcriptions into speech.

TTS overview :
The procedure consists of two main phases:
 Text analysis: the input text is transcribed into a phonetic or some other
linguistic representation.
 Speech waveform generation: the acoustic output is produced from the phonetic
and prosodic information.

Front End and Back End in TTS
 A text-to-speech system (or "engine") is composed of two
parts: a front-end and a back-end.
 The front-end converts raw text containing symbols like
numbers and abbreviations into the equivalent of written-out
words (text normalization, or tokenization). It then assigns
phonetic transcriptions to each word (grapheme-to-phoneme
conversion) and divides and marks the text into prosodic
units, like phrases, clauses, and sentences.
 The back-end, often referred to as the synthesizer, then
converts the symbolic linguistic representation into sound.
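
The first two front-end steps can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: the abbreviation table, the toy lexicon with ARPAbet-style phoneme symbols, and the function names normalize and to_phonemes are all assumptions made for the example. Marking prosodic units is left out to keep the sketch short.

```python
import re

# Hypothetical abbreviation table and toy pronunciation lexicon
# (ARPAbet-style symbols); a real front-end uses far larger resources.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "no.": "number"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "smith": ["S", "M", "IH", "TH"],
    "has": ["HH", "AE", "Z"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}


def normalize(text):
    """Expand abbreviations and digits into written-out words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            # Spell the number digit by digit to keep the sketch short.
            words.extend(DIGITS[int(d)] for d in token)
        else:
            # Drop punctuation that is not part of a known abbreviation.
            cleaned = re.sub(r"[^a-z']", "", token)
            if cleaned:
                words.append(cleaned)
    return words


def to_phonemes(words):
    """Look up each word; unknown words are flagged instead of guessed."""
    return [LEXICON.get(word, ["<unk:" + word + ">"]) for word in words]
```

For example, normalize("Dr. Smith has 2 cats.") returns ['doctor', 'smith', 'has', 'two', 'cats'], and to_phonemes maps each of those words to its phoneme list. Words missing from the lexicon come back flagged as unknown, which is where a real system would fall back to letter-to-sound rules.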

Speech Synthesizer used :
Concatenative synthesis is based on the concatenation (or
stringing together) of segments of recorded speech. Generally,
concatenative synthesis produces the most natural-sounding
synthesized speech.
 It concatenates segments of pre-recorded natural human
speech.
 It requires a database of previously recorded human speech
covering all the possible segments to be synthesized.
 A segment may be a phoneme, syllable, word, phrase, or any
combination of these.
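
The idea of stringing stored segments together can be sketched in Python as below. This is only an illustration under stated assumptions: the unit database holds short sine tones as stand-ins for recorded speech, and the names UNIT_DB, tone, and concatenate, the 8000 Hz sample rate, and the linear cross-fade at each joint are all choices made for the example rather than details from the project.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for the sketch)


def tone(freq_hz, duration_ms):
    """Stand-in for a recorded segment: a short sine tone."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit database; a real system stores recorded human speech.
UNIT_DB = {
    "HH": tone(200, 50),
    "AH": tone(300, 100),
    "L": tone(250, 50),
    "OW": tone(350, 100),
}


def concatenate(units, overlap=40):
    """Join segments, cross-fading `overlap` samples at each boundary."""
    output = []
    for name in units:
        segment = UNIT_DB[name]  # raises KeyError if the unit is missing
        n = min(overlap, len(output), len(segment))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the new segment
            idx = len(output) - n + i
            output[idx] = (1 - w) * output[idx] + w * segment[i]
        output.extend(segment[n:])
    return output
```

Calling concatenate(["HH", "AH", "L", "OW"]) returns one list of samples in which each boundary is blended rather than butted together. The KeyError on a missing unit mirrors the requirement that the database must cover every segment to be synthesized.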

Detailed Architecture of TTS systems :

.NET Framework :
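 .NET is a software framework developed by Microsoft.
 It provides a managed runtime (the Common Language Runtime)
and a programming model for building applications.
 The original .NET Framework runs primarily on Windows; its
successor, .NET (formerly .NET Core), is cross-platform.
 .NET is language independent: programs can be written in
several languages, such as C# and Visual Basic.
 It includes a large class library known as the Framework
Class Library (FCL).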
 Microsoft also produces Visual Studio, an IDE used largely
for .NET software.
 .NET provides language interoperability across several
programming languages: each language can use code written
in the others.

Result :
 In this way, the text passed as an argument to the function is converted
into artificial human speech.
 With the help of this TTS synthesizer, a blind person can listen to a book
or novel that is not available in Braille.
 This TTS synthesizer can be used in medical stores to pronounce correctly
the medicine names printed on covers or boxes.
 TTS is widely used in devices such as voice sticks and in voice assistants
such as Siri, Google Assistant, Cortana, and Alexa.

Applications :
 Talking calculators
 Computer-generated instructions
 Aids for the blind
 Telephone inquiry services
 Teaching machines
 Usage in education and daily life

Advantages :
 It is able to read large paragraphs.
 It offers a range of different accents and voices.
 It provides significant help for people with visual impairments.
 It can improve accuracy in medical systems.
 It can be adapted easily to say whatever users want it to say.
 It provides talking machines for vocally impaired or deaf people
and better aids for speech therapy.

Limitations :
 No explicit emotions
 Homographs (Pronunciation)
 Prosody
 Language specific problems
 Special characters and symbols
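
The homograph limitation listed above can be illustrated with a small Python sketch. The lexicon entries, the part-of-speech tags, and the function name candidates are assumptions made for this example. It shows that a spelling alone does not determine the pronunciation, so extra context such as a part-of-speech tag is needed.

```python
# Hypothetical lexicon: each homograph maps a part-of-speech tag to a
# pronunciation written in ARPAbet-style symbols.
HOMOGRAPHS = {
    "live": {"verb": ["L", "IH", "V"], "adj": ["L", "AY", "V"]},
    "close": {"verb": ["K", "L", "OW", "Z"], "adj": ["K", "L", "OW", "S"]},
}


def candidates(word, pos=None):
    """Return the possible pronunciations of a word.

    With a matching part-of-speech tag the choice is unique; without one,
    every stored pronunciation is returned and the caller must decide.
    """
    entries = HOMOGRAPHS.get(word.lower(), {})
    if pos in entries:
        return [entries[pos]]
    return list(entries.values())
```

Here candidates("live") returns two pronunciations, while candidates("live", "adj") narrows the result to one. A synthesizer without such context has to guess, which is the source of the mispronunciations.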

Future Scope :
 It can be extended to work in other languages such as
Marathi, Hindi, and Kannada.
 Accuracy can be improved so that symbols and special
characters are pronounced correctly.
 The variety of voices can be increased.

