Speech Recognition

“One of the most fascinating characteristics
of humans is their capability to
communicate ideas by means of
speech”

An Advanced Method for
Speech Recognition
Prepared By :
Salma Subh Mohmmed
&
Mahmoud Abd _Elmotelb Ibrhaim Mohammed

Production of Speech
•voiced excitation
•unvoiced excitation
•transient excitation
Characteristics of the Speech
•The bandwidth of the signal is 4 kHz
•The signal is periodic with a fundamental frequency between 80 Hz and 350 Hz
•There are peaks in the spectral distribution of energy at
(2n − 1) × 500 Hz, n = 1, 2, 3, ... (1.1)
•The envelope of the power spectrum of the signal decreases with increasing frequency (−6 dB per octave)

Speech Recognition
•Speech recognition is the process by which a computer (or other type of machine) identifies spoken words. Basically, it means talking to your computer and having it correctly recognize what you are saying.

The speech recognition process contains four main stages:
•Acoustic processing
•Feature extraction
•Feature selection
•Classification and recognition
Techniques named in the diagram: UTA algorithm, Fast Fourier Transform, Mel-scale band-pass filtering, cepstral analysis.

Speech Recognition System
Three steps to do it:
1- Pre-processing (speech analysis)
2- Recognition
3- Spectral analysis (parameter conversion)
Acoustic processing

Speech Analysis Techniques Based on Linear Prediction and Filterbanks

Pre-processing
•These are the operations that precede the main processing of the sound
•They convert the sound input to the computer into a form that the recognizer can handle

Pre-processing
Data collection & acquisition
A selected group of speakers, male and female and of similar ages, is gathered, and voice prints are recorded from them.
Voiced & unvoiced detection
Speech naturally contains both voiced and unvoiced sounds.
•Voiced: the sound has a large amplitude
•Unvoiced: the sound is not prominent; its amplitude is small, so it resembles noise
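The amplitude rule described here can be sketched as follows. This is a minimal illustration, not the authors' method: the function name `classify_frames`, the 8 kHz sample rate, the test signal, and the threshold value are all assumptions chosen for the example.

```python
import numpy as np

def classify_frames(signal, frame_len, threshold):
    """Label each non-overlapping frame True (voiced) or False (unvoiced)
    by comparing its mean absolute amplitude with a threshold."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    mean_amplitude = np.abs(frames).mean(axis=1)
    return mean_amplitude > threshold

# Hypothetical test signal at 8 kHz: 20 ms of a 200 Hz tone, then 20 ms of weak noise.
fs = 8000
t = np.arange(160) / fs
rng = np.random.default_rng(0)
tone = 0.8 * np.sin(2 * np.pi * 200 * t)   # large amplitude, periodic
hiss = 0.01 * rng.standard_normal(160)     # small amplitude, noise-like
labels = classify_frames(np.concatenate([tone, hiss]), frame_len=160, threshold=0.1)
print(labels)
```

Practical detectors usually combine amplitude or energy with other cues such as the zero-crossing rate; the single threshold here only mirrors the rule stated on the slide.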

End-point detection
Determining the start and end of the useful speech.
When a person speaks there are periods of silence. During them the amplitude is not zero but takes a very small value; this is called noise.
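One simple way to locate the start and end of the useful speech is to threshold the frame amplitude. The sketch below assumes a fixed threshold and non-overlapping frames; `find_endpoints`, the synthetic signal, and all numeric values are hypothetical.

```python
import numpy as np

def find_endpoints(signal, frame_len, threshold):
    """Return (start, end) sample indices spanning the first to the last frame
    whose mean absolute amplitude exceeds the threshold, or None if none does."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    active = np.flatnonzero(np.abs(frames).mean(axis=1) > threshold)
    if active.size == 0:
        return None
    return int(active[0]) * frame_len, (int(active[-1]) + 1) * frame_len

# Hypothetical recording: low-level noise, a 150 Hz tone standing in for speech, noise again.
rng = np.random.default_rng(1)
silence_before = 0.005 * rng.standard_normal(400)
speech = 0.5 * np.sin(2 * np.pi * 150 * np.arange(800) / 8000)
silence_after = 0.005 * rng.standard_normal(400)
recording = np.concatenate([silence_before, speech, silence_after])

endpoints = find_endpoints(recording, frame_len=100, threshold=0.05)
print(endpoints)
```

The threshold has to sit above the background noise level and below the speech level, so in practice it is usually estimated from a stretch of the recording known to be silent.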

Time Warping (segmentation into frames)
For example, suppose a group of people each pronounce the same word. The recordings show that each person takes a different amount of time to say the word, so a fixed length must be set for all of the voice prints.
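A simple way to bring recordings of different durations to one fixed length is linear resampling of a feature sequence. This is only a sketch under that assumption; it is a linear time normalization, not dynamic time warping, and `normalize_length` and the example sequences are hypothetical.

```python
import numpy as np

def normalize_length(features, target_len):
    """Linearly resample a 1-D feature sequence to target_len points."""
    features = np.asarray(features, dtype=float)
    old_axis = np.linspace(0.0, 1.0, num=len(features))
    new_axis = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_axis, old_axis, features)

# The same hypothetical contour spoken slowly (9 points) and quickly (5 points).
slow = [0, 1, 2, 3, 4, 3, 2, 1, 0]
fast = [0, 2, 4, 2, 0]
slow_fixed = normalize_length(slow, 5)
fast_fixed = normalize_length(fast, 5)
print(slow_fixed, fast_fixed)
```

After normalization the two sequences have the same number of points and can be compared element by element.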

Framing
•While a given sound is being pronounced, the articulatory organs stay nearly stationary for a very short time, about 20 milliseconds
•Speech can therefore be treated as stationary within each 20 ms interval, a span so short it is almost negligible
•We divide the speech into a set of frames of 20 ms each, then take from each frame a sample that represents the characteristics of that frame
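Splitting a signal into 20 ms frames can be sketched as below. The 8 kHz sample rate is an assumption (it matches the 4 kHz bandwidth quoted earlier), the frames do not overlap, and the mean energy is just one possible value to represent each frame; `split_into_frames` is a hypothetical helper.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=20):
    """Cut a signal into consecutive non-overlapping frames of frame_ms milliseconds."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

fs = 8000                                   # assumed sample rate in Hz
one_second = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
frames = split_into_frames(one_second, fs)  # 50 frames of 160 samples each
frame_energy = (frames ** 2).mean(axis=1)   # one representative value per frame
print(frames.shape, frame_energy.shape)
```

Many systems let consecutive frames overlap so that events at frame boundaries are not missed; that refinement is left out here.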

Windowing
•This stage reduces the error that can arise from dividing the speech waveform into frames
•The most common window in speech analysis is the Hamming window
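The standard textbook definition of the Hamming window is w(n) = 0.54 − 0.46 cos(2πn / (N − 1)) for 0 ≤ n ≤ N − 1. The sketch below applies it to one frame; the 160-sample frame length is an assumption (20 ms at 8 kHz), and NumPy's built-in `np.hamming` uses the same formula.

```python
import numpy as np

def hamming(n_points):
    """Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    n = np.arange(n_points)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (n_points - 1))

frame = np.ones(160)              # hypothetical 20 ms frame at 8 kHz
window = hamming(len(frame))
windowed = frame * window         # frame edges are tapered towards 0.08
print(window[0], window.max())
```

Tapering the edges of each frame reduces the artificial discontinuities that framing introduces at the frame boundaries.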

We can now assemble a set of band-pass filters to analyse speech. Together they need to cover the whole band, that is, every frequency is covered by at least one filter so that no information is lost.
... is a popular speech coding analysis technique
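The covering property can be illustrated with an idealized filter bank built in the frequency domain. This is a sketch under strong assumptions: the bands are rectangular, equal in width and non-overlapping, whereas real filter banks (for example Mel-scale ones) use overlapping, unequal bands. `band_energies` and the test tone are hypothetical.

```python
import numpy as np

def band_energies(frame, sample_rate, n_bands):
    """Energy in n_bands contiguous, equal-width bands covering 0 Hz to Nyquist."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    edges = np.linspace(0.0, sample_rate / 2, n_bands + 1)
    # Assign every FFT bin to exactly one band so that no frequency is dropped;
    # the Nyquist bin is folded into the last band.
    band_index = np.minimum(np.digitize(freqs, edges) - 1, n_bands - 1)
    return np.bincount(band_index, weights=spectrum, minlength=n_bands)

fs = 8000
t = np.arange(160) / fs
frame = np.sin(2 * np.pi * 1250 * t)        # tone inside the 1000-1500 Hz band
energies = band_energies(frame, fs, n_bands=8)
print(int(np.argmax(energies)))
```

Because every FFT bin lands in one band, the band energies add up to the total spectral energy of the frame.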

Recognition
•The stage in which the input sound is recognized
•This stage is divided into two parts: identification and verification
Identification: relies on distance measurement, that is, finding the closest correct match within a given set, such as the stored voice prints
Verification: confirming that the result of the previous step is correct
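Identification by distance measurement followed by verification can be sketched as a nearest-template search plus a threshold check. The Euclidean distance, the threshold, the feature vectors, and the names `identify` and `verify` are all assumptions made for this illustration.

```python
import numpy as np

def identify(features, templates):
    """Return (name, distance) of the stored template closest to the features."""
    distances = {
        name: float(np.linalg.norm(np.asarray(features) - np.asarray(template)))
        for name, template in templates.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]

def verify(distance, threshold):
    """Accept the identification only if the match is close enough."""
    return distance <= threshold

# Hypothetical stored voice prints and one incoming feature vector.
templates = {"speaker_a": [1.0, 2.0, 3.0], "speaker_b": [4.0, 0.0, 1.0]}
name, distance = identify([1.1, 2.1, 2.9], templates)
accepted = verify(distance, threshold=0.5)
print(name, round(distance, 4), accepted)
```

Identification always returns the nearest template, even for an unknown speaker, which is why the separate verification step is needed.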

# Concept
Isolated word recognition (IWR)
Used to recognize separate words that are isolated from one another. It is the easiest type of recognition because it avoids the co-articulation problem, in which the sound at the end of one word runs into the sound at the start of the next word and makes recognition difficult.
Connected word recognition (CWR)
Used to recognize a group of words separated by pauses, with stops placed between the words. It resembles the previous type but is harder.
Continuous speech recognition (CSR)
Used to recognize continuous speech.
Speech understanding (SU)
The processes of understanding speech by means of special interpreters; once recognized, the speech can also be converted to text.
Speaker identification and speaker verification (SI, SV)
Word spotting
Used to search for specific words.

# Generally, there are three usual methods in speech recognition:
Dynamic Time Warping (DTW)
•measures the similarity between two time series
•determines whether two waveforms represent the same spoken phrase
Hidden Markov Model (HMM)
•has a given number of states
Artificial Neural Networks (ANNs)
•parallel distributed processing
•faster
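Of the three methods, Dynamic Time Warping is the easiest to show in a few lines. The sketch below is the classic dynamic-programming formulation with an absolute-difference local cost and no path constraints; the function name and the example sequences are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the best alignment between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]   # a contour spoken slowly
fast = [0, 1, 2, 1, 0]                  # the same contour spoken twice as fast
other = [2, 2, 2, 2, 2]                 # a different contour

same_word = dtw_distance(slow, fast)
different_word = dtw_distance(slow, other)
print(same_word, different_word)
```

The slow and fast versions align with zero cost despite their different lengths, while the unrelated contour yields a clearly larger distance.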

A hidden Markov model (HMM) is a statistical Markov model in
which the system being modeled is assumed to be a Markov process
with unobserved (hidden) states.
An HMM can be considered as the simplest dynamic Bayesian
network.
In a regular Markov model, the state is directly visible to the
observer, and therefore the state transition probabilities are the only
parameters.
In a hidden Markov model, the state is not directly visible, but the
output, which depends on the state, is visible.
Each state has a probability distribution over the possible output
tokens. Therefore the sequence of tokens generated by an HMM
gives some information about the sequence of states.

Note that the adjective 'hidden' refers to the state sequence
through which the model passes, not to the parameters of the
model; even if the model parameters are known exactly, the
model is still 'hidden'.
Hidden Markov models are especially known for their
application in temporal pattern recognition such as speech,
handwriting, gesture recognition, part-of-speech tagging,
musical score following, partial discharges and bioinformatics.
A hidden Markov model can be considered a generalization of a
mixture model where the hidden variables (or latent variables),
which control the mixture component to be selected for each
observation, are related through a Markov process rather than
independent of each other.

•From the previous diagram, we can generate the following sequences:
N1 N2 N3
N1 N2 N2 N2 N3 N3 N3 N3 N3
N1 N1 N2 N2 N3
For complex tasks such as speech recognition and language processing, the paths in these diagrams carry values, as shown in the next figure.
We now assign the values and multiply them together along each path, as follows:
N1 N2 N3 = 0.4 x 0.8 x 0.5 = 0.16
N1 N2 N2 N2 N3 N3 N3 N3 N3 = 0.4 x 0.2 x 0.2 x 0.8 x 0.5 x 0.5 x 0.5 x 0.5 x 0.5 = 0.0004
N1 N1 N2 N2 N3 = 0.6 x 0.4 x 0.2 x 0.8 x 0.5 = 0.0192
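The path products can be checked with a short script. The transition values below are inferred from the factors used in the worked products (the diagram itself is not reproduced here), so they are an assumption, as is the final exit step from N3 with probability 0.5; `path_probability` is a hypothetical helper.

```python
# Transition probabilities inferred from the worked products (assumption).
TRANSITIONS = {
    ("N1", "N1"): 0.6, ("N1", "N2"): 0.4,
    ("N2", "N2"): 0.2, ("N2", "N3"): 0.8,
    ("N3", "N3"): 0.5, ("N3", "END"): 0.5,
}

def path_probability(path):
    """Multiply the probability of every step along the path, including the exit from the last state."""
    steps = list(path) + ["END"]
    probability = 1.0
    for source, target in zip(steps, steps[1:]):
        probability *= TRANSITIONS[(source, target)]
    return probability

paths = [
    ["N1", "N2", "N3"],
    ["N1", "N2", "N2", "N2", "N3", "N3", "N3", "N3", "N3"],
    ["N1", "N1", "N2", "N2", "N3"],
]
probabilities = [path_probability(p) for p in paths]
for p, value in zip(paths, probabilities):
    print(" ".join(p), "=", round(value, 6))
```

Each state visited contributes exactly one factor, so a path with nine states has nine factors in its product.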

•This directed graph with its values is called a Markov model. Its simplicity may be surprising, but it is very effective when applied to a problem such as speech recognition.
•In speech recognition the program has to deal with thousands of words, each of which may be pronounced in more than one way. A word-by-word brute-force search is not practical at all and also consumes a great deal of time and memory. A Markov model, by contrast, lets us represent the words and also choose the appropriate pronunciation. The following example, for the pronunciation of the word tomato, illustrates this.
t ow m aa t ow - British English
t ah m ey t ow - American English
t ah m ey t a - Possible pronunciation when speaking quickly

There are three main algorithms associated with hidden Markov models:
The forward algorithm, useful for isolated word recognition
The Viterbi algorithm, useful for continuous speech recognition
The forward-backward algorithm, useful for training an HMM
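The first two algorithms can be sketched compactly for a discrete HMM. The two-state model below, with its start, transition, and emission probabilities, is entirely hypothetical and chosen only to keep the numbers small; the forward-backward training algorithm is not shown.

```python
import numpy as np

def forward(obs, start, trans, emit):
    """Forward algorithm: total probability of the observation sequence."""
    alpha = start * emit[:, obs[0]]
    for symbol in obs[1:]:
        alpha = (alpha @ trans) * emit[:, symbol]
    return float(alpha.sum())

def viterbi(obs, start, trans, emit):
    """Viterbi algorithm: most likely hidden state sequence."""
    delta = start * emit[:, obs[0]]
    backpointers = []
    for symbol in obs[1:]:
        scores = delta[:, None] * trans       # scores[i, j]: best path into i, then i -> j
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * emit[:, symbol]
    state = int(delta.argmax())
    path = [state]
    for pointers in reversed(backpointers):   # trace the best path backwards
        state = int(pointers[state])
        path.append(state)
    return path[::-1]

# Hypothetical model: 2 hidden states, 2 output symbols.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])    # trans[i, j] = P(next state j | state i)
emit = np.array([[0.9, 0.1], [0.2, 0.8]])     # emit[i, k] = P(symbol k | state i)
obs = [0, 0, 1]

likelihood = forward(obs, start, trans, emit)
best_path = viterbi(obs, start, trans, emit)
print(round(likelihood, 5), best_path)
```

For isolated word recognition, one such model would be kept per word and the word whose model gives the highest forward likelihood would be chosen; for long sequences the products are normally computed in the log domain to avoid underflow.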
