Seminar on
“AI Based Character Recognition and
Speech Synthesis”
Developed By:
Kalyani Hadke Rani Kubetkar
Shreya Surjuse Ankita Jadhao
Kruttika Sorte
Guided By
Prof. H. N. Datir
Artificial Intelligence based Character Recognition and Speech Synthesis
NEED!!!
We face many problems in daily life. For example, when we capture
an image, it is sometimes unclear and the words in it cannot be
recognized.
Many people face the problem of illiteracy.
So we wish that such an image could be converted to text
for various purposes.
While studying, we do not always read the text as a regular
practice. So we wish that this text could be converted into
audio.
Apart from this, since we generally prefer listening (as with songs),
we wish that text captured in an image could be converted into audio.
Introduction to CR and SS
• Optical Character Recognition (OCR) is an electronic or
mechanical conversion process.
• OCR converts scanned images of text into machine-encoded text.
• Speech Synthesis is the artificial production of human
speech.
• Speech synthesizer: a computer system used for this
purpose.
• A TTS engine performs:
• Conversion of normal language text into speech
• Conversion of symbolic linguistic representations into speech
Overview
Image → OCR → Recognized text → Speech engine → Speech
DFD For Character Recognition System
Stages: Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network, Recognition, Network Testing
Pre-processing
• De-noising
• De-skew
• Binarization
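A minimal sketch of two of the pre-processing steps (de-noising and binarization), assuming the scanned page is a grayscale image stored as a list of rows with values from 0 (black) to 255 (white). The function names, the 3x3 median filter, and the fixed threshold of 128 are illustrative assumptions, not taken from the slides. De-skewing is omitted here.

```python
from statistics import median

def denoise(gray):
    """3x3 median filter; border pixels are copied unchanged."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [gray[j][i]
                      for j in (y - 1, y, y + 1)
                      for i in (x - 1, x, x + 1)]
            out[y][x] = median(window)
    return out

def binarize(gray, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]

# A 3x3 block of ink plus one isolated dark speck at row 2, column 5.
page = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,  10,  10,  10, 255, 255, 255],
    [255,  10,  10,  10, 255,   0, 255],
    [255,  10,  10,  10, 255, 255, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
clean = binarize(denoise(page))
```

The isolated speck disappears after filtering while the centre of the ink block survives. A median filter also rounds off sharp corners, which is one reason real systems tune the filter to the scan quality.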
SEGMENTATION
• Image segmentation is the process of partitioning
an image into multiple segments, so as to change
the representation of the image into something that
is more meaningful and easier to analyze.
• It decomposes a sequence of characters into individual
symbols.
• It directly affects the recognition rate of the script.
• It locates and identifies the boundaries of characters in the image.
1. External segmentation:
determines the character lines in the text.
2. Internal segmentation:
decomposes an image of a sequence of characters into
images of individual symbols.
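The two kinds of segmentation can be sketched with projection profiles, assuming a binarized page where 1 is ink and 0 is background, and assuming that lines and symbols are separated by completely blank rows or columns. The function names are hypothetical, and touching or overlapping characters would need a more elaborate method.

```python
def runs(flags):
    """Return (start, end) index pairs for each maximal run of True values."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans

def external_segmentation(binary):
    """Split a page into text lines: runs of rows that contain ink."""
    row_has_ink = [any(row) for row in binary]
    return [binary[top:bottom] for top, bottom in runs(row_has_ink)]

def internal_segmentation(line):
    """Split one text line into symbols: runs of columns that contain ink."""
    col_has_ink = [any(row[x] for row in line) for x in range(len(line[0]))]
    return [[row[left:right] for row in line] for left, right in runs(col_has_ink)]

page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
lines = external_segmentation(page)
symbols = [internal_segmentation(line) for line in lines]
```

Here the page splits into two text lines, the first holding three symbols and the second holding one.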
Image Digitization - Matrix matching
• Each symbol image is mapped into a
corresponding two-dimensional binary matrix.
• Issue: deciding the size of the matrix.
• A sampling strategy is needed for mapping the symbol
image onto the matrix.
[Figure: input alphabet ‘a’ sampled onto a segmented grid of 0s and 1s]
Digitization
• To feed the matrix data to the network, it must be
linearized into a single dimension.
[Figure: the grid of 0s and 1s flattened into a single vector ending ...0 1 1]
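A minimal sketch of digitization and linearization, assuming nearest-neighbour sampling onto a fixed grid. The slides do not name a sampling strategy or a matrix size, so the 3x3 grid and the function names are illustrative choices only.

```python
def digitize(symbol, rows, cols):
    """Sample a binary symbol image onto a rows x cols grid (nearest neighbour)."""
    height, width = len(symbol), len(symbol[0])
    return [[symbol[r * height // rows][c * width // cols] for c in range(cols)]
            for r in range(rows)]

def linearize(matrix):
    """Flatten the matrix row by row into one input vector for the network."""
    return [bit for row in matrix for bit in row]

# A 6x6 plus-shaped symbol, reduced to a 3x3 matrix and then to 9 inputs.
symbol = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
grid = digitize(symbol, 3, 3)    # [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
vector = linearize(grid)         # [0, 1, 0, 1, 1, 1, 0, 1, 0]
```

A larger matrix keeps more detail but gives the network more inputs to learn from, which is the size trade-off raised above.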
1. Image of scanned document: NAME
2. Sub-images of individual letters from the document: N, A, M, E
3. Binary representation of sub-images (e.g. 0 is white and 1 is black): 001110100…., 111010011…., 11001100…., 000111101…..
4. Neural network: a supervised neural network that has been trained to recognize images of characters.
5. Neural network output: numeric values corresponding to the recognized characters (14, 1, 13, 5).
6. File containing the text of the scanned document: NAME
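In the example above, the outputs 14, 1, 13, 5 match the alphabet positions of N, A, M, E. Assuming that encoding (1 for A up to 26 for Z), the last step can be sketched as below; the function name is hypothetical.

```python
def decode(outputs):
    """Map 1-based alphabet positions to uppercase letters (1 -> A, 26 -> Z)."""
    letters = []
    for value in outputs:
        if not 1 <= value <= 26:
            raise ValueError(f"output {value} is outside the range 1..26")
        letters.append(chr(ord("A") + value - 1))
    return "".join(letters)

text = decode([14, 1, 13, 5])    # "NAME"
```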
• An artificial neural network (ANN) consists of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems, analogous to the biological neurons in the brain.
• Neurons communicate through weighted links.
[Figure: two neurons joined by a weighted link; inputs X1 ... Xn with weights Wk1 ... Wkp feed a summation followed by a sigmoid function to produce the output]
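The neuron in the figure (weighted inputs, a summation, then a sigmoid function) can be sketched as follows. The bias term and the function names are assumptions added for illustration.

```python
import math

def sigmoid(v):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by the sigmoid activation."""
    v = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(v)

# The weighted sum is 0.5 * 1 + (-0.3) * 0 + (-0.5) * 1 = 0, and sigmoid(0) = 0.5.
y = neuron_output([1, 0, 1], [0.5, -0.3, -0.5])
```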
Neural Network
• Feed-forward neural network
• A multilayer perceptron
• Teaching and adaptation of the ANN
• Implementation of the ANN
[Figure: input signal → input layer → first hidden layer → second hidden layer → output layer → output signal]
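A minimal sketch of the forward pass through such a multilayer perceptron with two hidden layers. The layer sizes (9 inputs, 6 and 4 hidden neurons, 2 outputs), the random initial weights, and the function names are illustrative assumptions; the network here is untrained.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def make_layer(n_inputs, n_neurons, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]

def forward(network, signal):
    """Propagate the input signal layer by layer, with no feedback connections."""
    signal = list(signal)
    for layer in network:
        extended = signal + [1.0]    # constant input that multiplies the bias
        signal = [sigmoid(sum(w * x for w, x in zip(neuron, extended)))
                  for neuron in layer]
    return signal

rng = random.Random(0)               # fixed seed so the run is repeatable
network = [
    make_layer(9, 6, rng),           # first hidden layer
    make_layer(6, 4, rng),           # second hidden layer
    make_layer(4, 2, rng),           # output layer
]
output = forward(network, [0, 1, 0, 1, 1, 1, 0, 1, 0])
```

The input layer only passes the signal on, so it has no weights of its own in this representation.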
Neural Network
[Figure: neural network with the binary converted image as the input signal and the obtained text of the scanned image as the output signal; back-propagation is used for error calculation]
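A minimal sketch of training with back-propagation, assuming a network with one hidden layer, sigmoid neurons, and a squared-error measure. The two 3x3 glyphs, the layer sizes, the learning rate, and the number of passes are all made up for illustration and are not taken from the slides.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(net, x):
    """Return the activations of every layer, starting with the input itself."""
    acts = [list(x)]
    for layer in net:
        prev = acts[-1] + [1.0]      # constant input that multiplies the bias
        acts.append([sigmoid(sum(w * a for w, a in zip(neuron, prev)))
                     for neuron in layer])
    return acts

def train_step(net, x, target, rate=0.5):
    """One back-propagation update; returns the squared error before the update."""
    acts = forward(net, x)
    output = acts[-1]
    error = sum((o - t) ** 2 for o, t in zip(output, target))
    # Output deltas for E = 1/2 * sum((o - t)^2); the sigmoid derivative is o * (1 - o).
    deltas = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
    for li in reversed(range(len(net))):
        prev = acts[li] + [1.0]
        back = []
        if li > 0:
            # Pass the error back one layer before this layer's weights change.
            back = [sum(net[li][k][j] * deltas[k] for k in range(len(deltas)))
                    * prev[j] * (1.0 - prev[j])
                    for j in range(len(acts[li]))]
        for k, neuron in enumerate(net[li]):
            for j in range(len(neuron)):
                neuron[j] -= rate * deltas[k] * prev[j]
        deltas = back
    return error

rng = random.Random(0)
net = [
    [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in range(4)],  # hidden layer
    [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(2)],   # output layer
]
samples = [
    ([0, 1, 0, 1, 1, 1, 0, 1, 0], [1, 0]),   # plus-shaped glyph -> class 0
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], [0, 1]),   # vertical bar glyph -> class 1
]
first = sum(train_step(net, x, t) for x, t in samples)
for _ in range(2000):
    last = sum(train_step(net, x, t) for x, t in samples)
```

Comparing `first` with `last` shows the error shrinking as the weights adapt. A real recognizer would use many samples per character and keep separate data for network testing.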
Speech Synthesis
[Figure: input image → character recognition stages (Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network) → file containing text of scanned document → TTS engine (NLP, then DSP) → speech]
TTS Engine
• TTS: Text-to-Speech engine
• A computer-based system that reads any text
aloud.
• A TTS engine consists of:
Front-end: NLP (natural language processing)
Back-end: DSP (digital signal processing)
Modules of Text-to-Speech
[Figure: input TEXT → Natural language processing (Text Preprocessing, Text Analysis, Linguistic Analysis) → phonemes and prosody → Digital signal processing (Speech Synthesizer) → output SPEECH]
Figure 1. A simple but general functional diagram of a TTS system
NLP Module
• This step is called the high-level, front-end, or text-to-phoneme step.
• It consists of the following parts:
Text analysis
Automatic phonetization
Prosody generation
NLP Module
Text Analysis
• A pre-processing module
• A morphological analysis
• A contextual analysis
• A syntactic-prosodic parser
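A minimal sketch of the pre-processing part of text analysis, which turns raw text into plain words before any further analysis. The abbreviation table, the choice to read numbers digit by digit, and the function name are hypothetical simplifications.

```python
# Hypothetical abbreviation table; a real system would use a much larger one.
ABBREVIATIONS = {"Dr.": "doctor", "Prof.": "professor", "No.": "number"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def normalize(text):
    """Expand known abbreviations, spell out digits, and drop punctuation."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        word = token.strip(".,;:!?\"'()")
        if word.isdecimal():
            words.extend(DIGITS[int(d)] for d in word)   # read digit by digit
        elif word:
            words.append(word.lower())
    return " ".join(words)

spoken = normalize("Prof. Datir reads page 42.")
# spoken == "professor datir reads page four two"
```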
NLP Module
Automatic Phonetization
• Rule-based
• Dictionary-based
• Hybrid approach
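The hybrid approach can be sketched as a dictionary lookup with a rule-based fallback for words that are not in the dictionary. The tiny lexicon, the ARPAbet-style phoneme symbols, and the naive one-letter-one-sound rules are hypothetical stand-ins for real pronunciation resources.

```python
# Hypothetical pronunciation dictionary with ARPAbet-style symbols.
LEXICON = {
    "name": ["N", "EY", "M"],
    "text": ["T", "EH", "K", "S", "T"],
}

# Naive letter-to-sound rules, one letter at a time (real rules use context).
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}

def phonetize(word):
    """Dictionary-based lookup first; rule-based conversion as the fallback."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes = []
    for letter in word:
        phonemes.extend(LETTER_RULES.get(letter, "").split())
    return phonemes

known = phonetize("Name")      # found in the dictionary
unknown = phonetize("scan")    # falls back to the letter rules
```

The dictionary handles irregular words such as "name" (silent final e), while the rules give a rough answer for anything unseen.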
NLP Module
Prosody Generation
• Pitch
• Intonation
• Rhythm
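A deliberately crude sketch of prosody generation, assuming that pitch falls steadily through a statement and rises through a question (intonation), and that vowels last longer than consonants (rhythm). The vowel set, the pitch and duration values, and the function name are invented for illustration; real systems derive prosody from the syntactic-prosodic analysis.

```python
VOWELS = {"AA", "AE", "AH", "EH", "EY", "IH"}   # hypothetical subset

def add_prosody(phonemes, is_question=False, base_pitch=120.0, step=5.0):
    """Attach a pitch in Hz and a duration in ms to each phoneme."""
    direction = 1 if is_question else -1
    plan = []
    for i, phoneme in enumerate(phonemes):
        pitch = base_pitch + direction * step * i    # rising or falling contour
        duration = 120 if phoneme in VOWELS else 70  # vowels are held longer
        plan.append((phoneme, pitch, duration))
    return plan

statement = add_prosody(["N", "EY", "M"])
question = add_prosody(["N", "EY", "M"], is_question=True)
```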
DSP component
• Low-level, phoneme-to-speech step
• There are two main technologies used for
generating synthetic speech waveforms:
• Concatenative synthesis
• Formant synthesis
Formant Synthesis
• Formant synthesis is a rule-based synthesis technique.
• It does not use any human speech samples at runtime.
• The waveform is created using an acoustic model of the
human vocal tract.
• It generates artificial, somewhat robotic-sounding speech.
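A minimal sketch of the idea, assuming a simple source-filter model: an impulse train stands in for the vocal folds, and a cascade of second-order resonators stands in for the vocal tract. The formant frequencies and bandwidths are rough illustrative values for an "ah"-like vowel, and the function names, pitch, and sample rate are assumptions.

```python
import math

def resonator(signal, freq, bandwidth, sample_rate):
    """Second-order digital resonator (two-pole filter) centred on one formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c                  # normalises the gain at 0 Hz to 1
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def synthesize_vowel(formants, pitch=120, duration=0.2, sample_rate=16000):
    """Excite a cascade of formant resonators with an impulse train."""
    n = round(duration * sample_rate)
    period = sample_rate // pitch    # samples between glottal pulses
    wave = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        wave = resonator(wave, freq, bandwidth, sample_rate)
    peak = max(abs(s) for s in wave)
    return [s / peak for s in wave]  # scale into the range -1..1

# (frequency in Hz, bandwidth in Hz) for three formants; illustrative values.
samples = synthesize_vowel([(730, 60), (1090, 90), (2440, 120)])
```

No recorded speech is involved at any point, which matches the description above. The simple impulse source is also one reason the result sounds buzzy and robotic.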
Concatenative synthesis
• Based on the concatenation of segments of
recorded speech.
• Generally gives the most natural-sounding synthesized
speech.
SUBTYPES
• Diphone Concatenation Synthesis: somewhat robotic speech, sonic glitches
• Unit Concatenation Synthesis: natural speech
Approaches To Wave-form Generation: Concatenative
• Unit Concatenation Synthesis
– Algorithm
• Break the language down into small units (phonemes, syllables, etc.)
• Create a large database of recorded speech
• Label each unit: pitch, duration, prosody, position in syllable, etc.
Labeling is synthesizer-dependent
• Select the target utterance at runtime by determining the best chain
of units (HMM, decision tree)
• Use DSP to smooth transitions between units
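A minimal sketch of choosing the best chain of units. The slide names HMMs and decision trees for this step; the sketch swaps in a simpler dynamic-programming search that minimizes a target cost (distance from the wanted pitch) plus a join cost (pitch jump between neighbouring units). The database, the use of pitch as the only label, and all names are hypothetical.

```python
def select_units(targets, database, join_weight=1.0):
    """Pick one recorded unit per target so that total cost is minimal."""
    candidates = [database[name] for name, _ in targets]
    # best[i][k] = (lowest total cost ending in candidate k, index of its predecessor)
    best = [[(abs(unit["pitch"] - targets[0][1]), None) for unit in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            target_cost = abs(unit["pitch"] - targets[i][1])
            cost, back = min(
                (best[i - 1][k][0] + join_weight * abs(prev["pitch"] - unit["pitch"]), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost, back))
        best.append(row)
    # Trace the cheapest chain backwards from the last target.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][k]["id"])
        k = best[i][k][1]
    return chain[::-1]

# Hypothetical database: two recordings of each unit, labeled with pitch in Hz.
database = {
    "n":  [{"id": "n_1",  "pitch": 100}, {"id": "n_2",  "pitch": 130}],
    "ey": [{"id": "ey_1", "pitch": 105}, {"id": "ey_2", "pitch": 125}],
    "m":  [{"id": "m_1",  "pitch": 110}, {"id": "m_2",  "pitch": 140}],
}
targets = [("n", 120), ("ey", 120), ("m", 115)]   # (unit name, wanted pitch)
chain = select_units(targets, database)           # ["n_2", "ey_2", "m_1"]
```

The selected recordings would then be joined, with DSP smoothing applied at each boundary.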
Advantages
• Machine language translation
• Information retrieval
• Visual issues
(difficulty seeing text)
• Motor issues
(difficulty handling a book or paper)
QUESTIONS
????

UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
gray level transformation unit 3(image processing))
gray level transformation unit 3(image processing))gray level transformation unit 3(image processing))
gray level transformation unit 3(image processing))
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 

AI-based character recognition and speech synthesis

  • 1. Seminar on “AI Based Character Recognition and Speech Synthesis”. Developed by: Kalyani Hadke, Rani Kubetkar, Shreya Surjuse, Ankita Jadhao, Kruttika Sorte. Guided by: Prof. H. N. Datir.
  • 3. Need: We face many problems in daily life. For example, a captured image is sometimes of poor quality, so the words in it cannot be recognized. Many people are also affected by illiteracy. We therefore want the image to be converted to text for various purposes. While studying, we do not always read text as a regular practice, so we want the text to be converted into audio. Beyond that, we want whatever is captured in an image to be converted into audio, since people generally prefer listening, as they do with songs.
  • 4. Introduction to CR and SS • Optical Character Recognition (OCR) is an electronic or mechanical converter. • OCR converts scanned images of text into machine-encoded text. • Speech synthesis is the artificial production of human speech. • A speech synthesizer is a computer system used for this purpose. • A TTS engine converts: • language (text) into speech • a symbolic linguistic representation into speech
  • 5. Overview: Image → OCR → Recognized text (TEXT) → Speech engine → Speech
  • 6. DFD for Character Recognition System. Stages: Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network, Recognition, Network Testing. (Pre-processing explanation)
  • 9. Segmentation • Image segmentation decomposes a sequence of characters into individual symbols. • It directly affects the recognition rate of the script. • It locates and identifies the boundaries in the image. • Two kinds: 1. External segmentation 2. Internal segmentation
  • 10. Image segmentation is the process of partitioning an image into multiple segments, so as to change the representation of the image into something that is more meaningful and easier to analyze. External segmentation: determine the character lines in the text (lines 1, 2, 3, 4 in the figure).
  • 11. Internal segmentation: decompose an image of a sequence of characters into images of individual symbols (for example, “Image” into I, m, a, g, e).
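The two segmentation steps can be illustrated with projection profiles. This is only a minimal sketch, assuming a clean binary image (1 = ink) in which text lines are separated by blank rows and symbols by blank columns; the function names and the toy `page` are hypothetical and not taken from the slides.

```python
def runs(flags):
    """Return (start, end) index pairs for each maximal run of True values."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def external_segmentation(image):
    """Find text lines: runs of rows that contain at least one ink pixel."""
    return runs([any(row) for row in image])


def internal_segmentation(image, top, bottom):
    """Find symbols inside one text line: runs of columns that contain ink."""
    width = len(image[0])
    col_has_ink = [any(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(col_has_ink)


# Toy page (assumed data): two text lines separated by a blank row
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = external_segmentation(page)
symbols = [internal_segmentation(page, top, bottom) for top, bottom in lines]
```

Touching or overlapping characters would not be separated by this simple rule; a practical system would need a more robust method.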
  • 13. Image Digitization (matrix matching) • Mapping of the symbol image into a corresponding two-dimensional binary matrix • Issue: deciding the size of the matrix • A sampling strategy is needed for mapping the symbol image
  • 14. Digitization: the input alphabet ‘a’ is placed on a segmented grid and digitized into a binary matrix: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
  • 15. • To feed the matrix data to the network, it must be linearized to a single dimension: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 … 0 1 1
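Digitization and linearization can be sketched as follows. The slides do not say which sampling strategy or matrix size was used, so nearest-neighbour sampling and the 2 x 3 grid below are assumptions chosen only to keep the example small.

```python
def digitize(symbol, rows, cols):
    """Resample a binary symbol image onto a fixed rows x cols grid
    using nearest-neighbour sampling (an assumed sampling strategy)."""
    src_rows, src_cols = len(symbol), len(symbol[0])
    return [
        [symbol[r * src_rows // rows][c * src_cols // cols] for c in range(cols)]
        for r in range(rows)
    ]


def linearize(matrix):
    """Flatten a two-dimensional matrix row by row into a single list."""
    return [value for row in matrix for value in row]


# Toy 4 x 6 symbol image (assumed data), 1 = ink
symbol = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]

grid = digitize(symbol, 2, 3)   # fixed-size binary matrix
vector = linearize(grid)        # one-dimensional network input
```

Fixing the grid size up front means every symbol yields an input vector of the same length, which is what a network with a fixed number of input neurons requires.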
  • 16. Recognition pipeline: Image of scanned document (“NAME”) → Sub-images of the individual letters from the document (N, A, M, E) → Binary representation of the sub-images, e.g. 0 is white and 1 is black (001110100…, 111010011…, 11001100…, 000111101…) → A supervised neural network that has been trained to recognize images of characters → The neural network outputs numeric values corresponding to the recognized characters (14, 1, 13, 5) → A file containing the text of the scanned document (NAME).
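The last step of this pipeline, turning the network's numeric outputs back into text, might look like the sketch below. It assumes the numeric values are 1-based positions in the English alphabet, which is consistent with the slide's example (14, 1, 13, 5 for N, A, M, E) but is not stated explicitly.

```python
def decode(indices):
    """Map 1-based alphabet positions to upper-case letters (assumed encoding)."""
    return "".join(chr(ord("A") + index - 1) for index in indices)


recognized = decode([14, 1, 13, 5])
```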
  • 18. An artificial neural network consists of • a large number of highly interconnected processing elements (neurons) • working in unison to solve specific problems • analogous to the biological neurons in the brain. • Neurons communicate through weighted links. Figure: a neuron with inputs X1 … Xn, weights Wk1 … Wkp on the weighted links, a summation unit, and a sigmoid function producing the output.
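A single neuron of this kind computes a weighted sum of its inputs and passes it through the sigmoid function. The sketch below omits a bias term, and the input and weight values are arbitrary illustrative numbers.

```python
import math


def neuron_output(inputs, weights):
    """Weighted summation followed by the sigmoid activation."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-activation))


# Illustrative values: the weighted sum is 0.5 - 0.5 = 0, so the output is 0.5
y = neuron_output([1.0, 0.0, 1.0], [0.5, -0.3, -0.5])
```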
  • 19. • Feed-forward neural network • A multilayer perceptron • Teaching and adaptation of the ANN • Implementation of the ANN
  • 20. Neural network: Input signal → Input layer → First hidden layer → Second hidden layer → Output layer → Output signal
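A forward pass through such a network, with two hidden layers, can be sketched as below. The layer sizes (9, 6, 4, 3), the random weights and the 3 x 3 input pattern are illustrative assumptions; the slides do not give the actual dimensions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def random_layer(n_inputs, n_neurons, rng):
    """Random initial weights in [-1, 1] for a layer of n_neurons neurons."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_neurons)]


def feed_forward(signal, layers):
    """Propagate the input signal through every layer in order."""
    for weights in layers:
        signal = layer_forward(signal, weights)
    return signal


rng = random.Random(0)
layer_sizes = [9, 6, 4, 3]  # input, first hidden, second hidden, output (assumed sizes)
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

# Linearized 3 x 3 binary pattern as the input signal
output_signal = feed_forward([0, 1, 0, 0, 1, 0, 0, 1, 0], layers)
```

With untrained random weights the output carries no meaning yet; training is what makes each output neuron respond to a particular character.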
  • 22. Training the neural network: the input signal is the binary converted image, and the output signal is the obtained text of the scanned image. The error at the output is calculated and back-propagated through the network.
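Back-propagation training can be sketched on a toy problem. The example below is not the authors' network: it assumes a single hidden layer of four neurons, no bias terms, a squared-error loss, a learning rate of 0.5, and two hand-made 3 x 3 glyphs ("I" and "L") as training data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden-layer and output-layer activations for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One back-propagation update; returns the squared error before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error terms at the output layer: (output - target) * sigmoid derivative
    delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
    # Error terms propagated back to the hidden layer (using the old output weights)
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] -= lr * delta_out[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= lr * delta_hidden[j] * x[i]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


rng = random.Random(0)
samples = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], [1, 0]),  # 3 x 3 glyph "I", one-hot target
    ([1, 0, 0, 1, 0, 0, 1, 1, 1], [0, 1]),  # 3 x 3 glyph "L", one-hot target
]
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(9)] for _ in range(4)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]

losses = []
for epoch in range(2000):
    losses.append(sum(train_step(x, t, w_hidden, w_out) for x, t in samples))
```

The total error recorded in `losses` should shrink over the epochs as the weights adapt to the two glyphs.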
  • 24. Speech Synthesis: Input image → character recognition stages (Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network) → File containing the text of the scanned document → TTS engine (TEXT → NLP → DSP → SPEECH)
  • 25. Speech Synthesis • TTS: Text-to-Speech engine • A computer-based system that reads any text aloud • A TTS engine consists of a front-end (NLP) and a back-end (DSP)
  • 26. Modules of Text-to-Speech. Input: TEXT → Natural language processing (text preprocessing, text analysis, linguistic analysis) → Phonemes and prosody → Digital signal processing (speech synthesizer) → Output: SPEECH. Figure 1. A simple but general functional diagram of a TTS system.
  • 27. Speech Synthesis, with the NLP stage expanded: Input image → character recognition stages (Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network) → File containing the text of the scanned document → TTS engine (TEXT → NLP [text analysis, automatic phonetization, prosody generation] → DSP → SPEECH)
  • 28. NLP Module • This step is called the high-level, front-end, or text-to-phoneme step. • It consists of the following parts: text analysis, automatic phonetization, prosody generation
  • 30. NLP Module: Text Analysis. Text analysis consists of: • a pre-processing step • a morphological analysis • a contextual analysis • a syntactic-prosodic analysis
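The three NLP parts named on slide 28 (text analysis, automatic phonetization, prosody generation) can be illustrated with a toy front-end. Everything below is a hypothetical simplification: the four-word `LEXICON`, the single-entry `NUMBER_WORDS` table and the question-mark prosody rule stand in for the pronunciation dictionaries and linguistic rules a real TTS engine would need.

```python
import re

# Hypothetical mini-lexicon with ARPAbet-style phoneme symbols (illustrative only)
LEXICON = {
    "read": ["R", "IY", "D"],
    "the": ["DH", "AH"],
    "text": ["T", "EH", "K", "S", "T"],
    "two": ["T", "UW"],
}
NUMBER_WORDS = {"2": "two"}  # assumed expansion table for digits


def analyze_text(text):
    """Text analysis: lower-case, split into tokens, expand digits into words."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    return [NUMBER_WORDS.get(token, token) for token in tokens]


def phonetize(words):
    """Automatic phonetization by dictionary lookup; unknown words are spelled out."""
    return [LEXICON.get(word, list(word.upper())) for word in words]


def generate_prosody(text):
    """Prosody generation (toy rule): rising pitch for questions, falling otherwise."""
    return "rising" if text.strip().endswith("?") else "falling"


sentence = "Read the text 2?"
words = analyze_text(sentence)
phonemes = phonetize(words)
contour = generate_prosody(sentence)
```

The phoneme sequence and the prosody label together form the symbolic linguistic representation that the DSP back-end turns into a waveform.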
  • 33. Speech Synthesis, with the NLP and DSP stages expanded: Input image → character recognition stages (Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network) → File containing the text of the scanned document → TTS engine (TEXT → NLP [text analysis, automatic phonetization, prosody generation] → DSP [formant synthesis, concatenative synthesis] → SPEECH)
  • 36. DSP component • Low-level step: phoneme to speech • There are two main technologies used for generating synthetic speech waveforms: • Concatenative synthesis • Formant synthesis
  • 38. Formant Synthesis • Also known as rule-based synthesis • Does not use any human speech samples at runtime • The waveform is created using an acoustic model of the human vocal tract • Generates artificial, somewhat robotic-sounding speech
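A very small formant synthesizer can be sketched as an impulse-train source filtered by a cascade of second-order resonators, one per formant. This is a simplified illustration rather than the deck's implementation; the formant frequencies and bandwidths for an "ah"-like vowel, the 120 Hz pitch and the 16 kHz sample rate are all assumed, approximate values.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator that emphasises energy around one formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    c = -r * r
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, pitch_hz=120, duration_s=0.2, sample_rate=16000):
    """Generate a vowel-like waveform without using any recorded speech."""
    n = int(duration_s * sample_rate)
    period = sample_rate // pitch_hz
    # Voiced source approximated by an impulse train at the pitch period
    wave = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        wave = resonator(wave, freq_hz, bandwidth_hz, sample_rate)
    return wave


# Approximate (frequency, bandwidth) pairs in Hz for an "ah"-like vowel (assumed values)
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]
waveform = synthesize_vowel(AH_FORMANTS)
```

Because the sound comes entirely from rules and filter parameters, changing the formant list changes the vowel, which is also why the result tends to sound synthetic.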
  • 40. Concatenative synthesis • Based on the concatenation of segments of recorded speech. • Gives the most natural-sounding synthesized speech.
  • 42. Approaches to wave-form generation: concatenative. Unit concatenation synthesis algorithm: • Break the language down into small units (phonemes, syllables, etc.) • Create a large database of recorded speech • Label each unit: pitch, duration, prosody, position in syllable, etc. Labeling is synthesizer-dependent • Select the target utterance at runtime by determining the best chain of units (HMM, decision tree) • Use DSP to smooth transitions between units
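Choosing the best chain of units can be illustrated with a small dynamic-programming search. Note the substitution: the slide mentions HMMs and decision trees, whereas this sketch uses a plain cost minimisation in which both the target cost and the join cost are pitch differences. The `DATABASE` of labelled units and the target pitch values are hypothetical.

```python
def select_units(targets, database):
    """Pick one recorded unit per target phoneme, minimising the sum of
    target cost (pitch mismatch with the target) and join cost (pitch jump
    between neighbouring units)."""
    candidates = [database[t["phoneme"]] for t in targets]
    # best[i][j] = (cheapest total cost ending in candidate j at position i, back-pointer)
    best = [[(abs(u["pitch"] - targets[0]["pitch"]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            target_cost = abs(unit["pitch"] - targets[i]["pitch"])
            cost, back = min(
                (best[i - 1][k][0] + abs(unit["pitch"] - prev["pitch"]), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost, back))
        best.append(row)
    # Trace the cheapest chain back from the last position
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j]["id"])
        j = best[i][j][1]
    return chain[::-1]


# Hypothetical database of recorded units, each labelled with its pitch in Hz
DATABASE = {
    "n": [{"id": "n_1", "pitch": 100}, {"id": "n_2", "pitch": 130}],
    "ey": [{"id": "ey_1", "pitch": 105}, {"id": "ey_2", "pitch": 140}],
    "m": [{"id": "m_1", "pitch": 150}, {"id": "m_2", "pitch": 110}],
}
targets = [
    {"phoneme": "n", "pitch": 110},
    {"phoneme": "ey", "pitch": 110},
    {"phoneme": "m", "pitch": 110},
]

chain = select_units(targets, DATABASE)
```

A real unit-selection system would use richer labels (duration, prosody, position in syllable) in its costs and would then smooth the joins with DSP, as the last step on the slide says.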
  • 44. Advantages • Machine language translation • Information retrieval • Help with visual issues (difficulty seeing text) • Help with motor issues (difficulty handling a book or paper)