Speech-Recognition.pptx

OUTLINE
1. Introduction
2. Background with Literature Support
3. Motivation / Research Gaps
4. Hypothesis / Research Questions
5. Objectives
6. Methodology
   6.1 Research Flow Diagram
   6.2 Data Required
   6.3 Analytical Tools Required
7. Proposed Overall Plan for the Next 3 Years
8. Proposed Activity for Next Year

INTRODUCTION
• Speech is the expression of, or the ability to express, thoughts and feelings
by articulating sounds.
• Recognition is the identification of those sounds.
• Speech recognition, or speech-to-text, is the ability of a machine or a program
to identify words spoken aloud and convert them into readable text.
• Speech recognition remains difficult for some languages. Tribal languages are a
greater concern than widely spoken urban languages such as Telugu, Hindi, Tamil,
and Malayalam.
• For languages with large volumes of text or with linguistic difficulties, such as
tribal languages, there is demand for Automatic Speech Recognition (ASR), in
which the spoken letters or words are identified automatically.

INTRODUCTION
What is ASR (Automatic Speech Recognition)?
• ASR is a speech recognition technology that converts spoken language (an audio
signal) into written text, which is often used as a command or as input to
further processing.
• It addresses only the Speech-to-Text (STT) aspect.
• Speech processing is used in:
Digital speech coding,
Spoken language dialog systems,
Text-to-speech synthesis,
Automatic speech recognition.
• Information (such as speaker, gender, or language) can also be extracted from
speech.

Speech Production/Speech Perception[1]

BACKGROUND WITH LITERATURE SUPPORT
No. 1
Title: A review on speech processing using machine learning paradigm
Authors: Bhangale, Kishor & Kothandaraman, Mohanaprasad
Journal Name & Year: International Journal of Speech Technology
Methodology Adapted: Concentrated on distinct feature extraction techniques and machine learning classifiers for speech processing applications.
Key Findings / Gaps: Statistical feature extraction techniques such as PCA, ICA, and ZCR are suitable for small vocabularies but perform poorly on larger and noisy vocabularies.

No. 2
Title: Revisiting signal processing with spectrogram analysis on EEG, ECG and speech signals
Authors: Wang, W., Zhang, G., Yang, L., Balaji, V. S., Elamaran, V., & Arunkumar, N.
Journal Name & Year: Future Generation Computer Systems, 98
Methodology Adapted: Spectrogram analysis performed on EEG, ECG, and speech signals.
Key Findings / Gaps: The spectrogram is a better tool for ECG.

No. 3
Title: Recognition of Vowels in Continuous Speech by Using Formants
Authors: Prica, Biljana & Ilić, Siniša
Journal Name & Year: Facta Universitatis (Niš), Ser.: Elec. Energ., vol. 23, no. 3, December 2010, 379-393
Methodology Adapted: Use of the LPC method in speech analysis.
Key Findings / Gaps: Investigated the correlations between formants in each vowel and developed an algorithm to reduce the overlap of different vowels in the F1-F2 and F2-F3 planes.

No. 4
Title: Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching
Authors: Bogach, N., Boitsova, E., Chernonog, S., Lamtev, A., Lesnichaya, M., Lezhenin, I., Novopashenny, A., Svechnikov, R., Tsikach, D., Vasiliev, K., Pyshkin, E., & Blake, J. (2021)
Journal Name & Year: Electronics (Switzerland), 10(3)
Methodology Adapted: Voice activity detection, prosodic similarity evaluation.
Key Findings / Gaps: Improvements in signal processing algorithms to help in learning foreign languages.

MOTIVATION / RESEARCH GAPS
0th REVIEW 2202030001 Department of CSE
• As a vast country with diverse Indigenous cultures and traditions, India
is a hub of languages.
• In a multilingual society like ours, language plays an important role in
shaping our social experiences.
• However, many indigenous languages, especially those of the Adivasis, are
under threat for several reasons.
• In this context, some tribal languages, such as Gondi and Koya/Koi, are still
not recognized by the Government of India, which makes it difficult to
preserve them.

MOTIVATION / RESEARCH GAPS
• Another significant issue is non-tribal migration to the tribal hamlets.
Non-tribal people speak the official state language or the region's
dominant language.
• In this context, after a thorough literature survey, I propose implementing
Automatic Speech Recognition (ASR) to recognize the letters, words, or
sentences of a tribal language. Making these languages recognizable may
bring urban people closer to the Adivasis and foster a shared
understanding of their culture.

HYPOTHESIS / RESEARCH QUESTIONS
https://en.wikipedia.org/wiki/Alluri_Sitharama_Raju_district
In Andhra Pradesh, the tribal languages include Adivasi Odiya, Savara, Koya,
Kui, Gondi, Ollari, and Chenchu, among others.

OBJECTIVES
The goal of Automatic Speech Recognition (ASR):
• To transform a sequence of sound waves into a string of letters or words.
• Basic steps to implement Automatic Speech Recognition:
• The speech waveform is processed to produce a new representation: a sequence
of vectors containing the values of features or parameters.
• The parameter values extracted from raw speech are used to build acoustic
models.
• The dictionary provides pronunciations for the words found in the language model.
• A suitable problem statement/objective is then fixed on this basis.
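
The components listed above (feature vectors, acoustic model, dictionary, language model) are commonly tied together by the standard statistical decoding rule, sketched below. The symbols X (the sequence of feature vectors) and W (a candidate word sequence) are assumptions introduced here for illustration and do not appear in the slides.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid X)
        \;=\; \arg\max_{W} \; p(X \mid W)\, P(W),
\qquad X = x_1, x_2, \dots, x_T
```

Under this reading, p(X | W) is supplied by the acoustic model (with the dictionary mapping each word to its pronunciation) and P(W) by the language model.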

METHODOLOGY
• Speech recognition : Pattern recognition
• Data required : Audio recordings of human speech
Speech Signal Processing
(Feature Extraction)
• Digitisation of analog speech signal
• Blocking signal into frames
• FFT → mel filter → log → IFFT ⇒ MFCC
• Sequence of feature vectors
x1, x2, ..., xT (also written o1, o2, ..., oT)
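
The feature extraction chain above can be sketched with NumPy alone. This is a minimal illustration rather than the pipeline used in this work: the frame length, hop size, FFT size, filter count, and number of coefficients are assumed values, and the final step uses a DCT-II as the usual real-valued stand-in for the inverse transform named on the slide.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(x, frame_len, hop):
    """Block a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):      # rising edge of the triangle
            fb[i, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            fb[i, k] = (hi - k) / (hi - mid)
    return fb

def dct2_matrix(n_out, n_in):
    """Unnormalised DCT-II basis: rows are cosine functions."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc(x, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Framing -> FFT -> mel filter -> log -> cosine transform."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(np.maximum(mel_energy, 1e-10))  # floor avoids log(0)
    return log_mel @ dct2_matrix(n_ceps, n_filters).T

# One second of a synthetic 440 Hz tone stands in for recorded speech.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13): 98 frames, 13 coefficients per frame
```

Each row of `features` corresponds to one vector x_t in the sequence x1, x2, ..., xT.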
Research Flow Diagram

METHODOLOGY
Analytical Tools required :
Speech Signal Processing (Feature Extraction)
i. Time-domain approach
ii. Frequency-Domain Approach
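
The two approaches can be contrasted on a single frame of audio. The sketch below is illustrative only: the function names are hypothetical, and short-time energy, zero-crossing rate, and the dominant spectral peak are chosen here as typical examples of time-domain and frequency-domain features, not as the features fixed for this study.

```python
import numpy as np

def short_time_energy(frame):
    """Time domain: sum of squared samples in the frame."""
    return float(np.sum(np.asarray(frame, dtype=float) ** 2))

def zero_crossing_rate(frame):
    """Time domain: fraction of adjacent sample pairs that change sign."""
    signs = np.sign(np.asarray(frame, dtype=float))
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

def dominant_frequency(frame, sr):
    """Frequency domain: location of the largest magnitude-spectrum peak."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])

# A 50 ms frame of a 200 Hz tone sampled at 8 kHz (synthetic test signal).
sr = 8000
t = np.arange(400) / sr
frame = np.sin(2 * np.pi * 200.0 * t)

print(short_time_energy(frame))
print(zero_crossing_rate(frame))
print(dominant_frequency(frame, sr))
```

Voiced speech tends to show high energy and a low zero-crossing rate, while the spectral view exposes pitch and formant structure that is hard to read from the waveform directly.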

Analytical Tools required :
Source-Filter Model [7]
Fourier, harmonic, or spectral analysis is the inverse of spectral synthesis
(wave synthesis), as shown on the right.
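
The analysis/synthesis relationship can be checked numerically. In this small sketch a wave is synthesised from two sinusoids, analysed with the FFT to recover their frequencies and amplitudes, and then rebuilt with the inverse FFT. The sampling rate and the 100 Hz and 200 Hz components are arbitrary values chosen for illustration.

```python
import numpy as np

sr = 8000                      # assumed sampling rate in Hz
t = np.arange(sr) / sr         # one second of sample times

# Synthesis: add two harmonics with amplitudes 1.0 and 0.5.
wave = 1.0 * np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)

# Analysis: the spectrum reveals which frequencies are present.
spectrum = np.fft.rfft(wave)
amplitudes = 2.0 * np.abs(spectrum) / len(wave)
freqs = np.fft.rfftfreq(len(wave), d=1.0 / sr)
peaks = freqs[amplitudes > 0.1]
print(peaks)                   # frequencies of the two components

# Re-synthesis: the inverse transform recovers the original wave.
rebuilt = np.fft.irfft(spectrum, n=len(wave))
print(np.allclose(rebuilt, wave))
```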

Speech signal represented as a sequence of spectral vectors

METHODOLOGY
Models (classifiers) to be used in speech processing
• Dynamic time warping (DTW)
• Hidden Markov model (HMM)
• Gaussian mixture model (GMM)
• Combination of HMM and GMM
• Artificial Neural Network
• Deep Neural Network etc.
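
Of the models listed, dynamic time warping is the simplest to show in a few lines. The sketch below is a textbook DTW distance with a Euclidean local cost and the basic three-way step pattern. It is an illustration under those assumptions, not the classifier design chosen for this work, and the function name is hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Cumulative cost of the best alignment between two sequences.

    Each input is a sequence of feature vectors with shape (T, D);
    a 1-D sequence is treated as T vectors of dimension 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # step in both
    return float(cost[n, m])

# The same pattern spoken more slowly aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

In a template-based recogniser, an unknown utterance would be assigned the label of the stored template with the smallest DTW distance. HMM/GMM and neural models replace the templates with trained statistical models.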
