Mr. T. JAYASANKAR
ASSISTANT PROFESSOR
DEPARTMENT OF ECE
ANNA UNIVERSITY, BIT CAMPUS,
TIRUCHIRAPALLI.
A STUDY ON ELECTROMYOGRAPHY BASED SPEECH RECOGNITION
K. SRIRAM, D. VETRIVEL, S. VENGATESH
U.G. SCHOLARS, DEPARTMENT OF ECE,
ANNA UNIVERSITY,
BIT CAMPUS, TIRUCHIRAPALLI.
ABSTRACT
We present our recent study on EMG-based speech
recognition using a Silent Speech Interface (SSI).
Electromyography (EMG) detects the electric potential
generated by muscle cells when these cells are
electrically activated. This paper helps in choosing among
the available techniques by comparing their relative merits
and demerits, and it describes the techniques developed for
each stage of speech recognition. It concludes with a brief
explanation of matching techniques and of the factors used
to analyze the performance of systems.
PRODUCTION OF SPEECH
PHONETICS
• Phonetics is the precise study of human speech
sounds.
• An appropriate knowledge of phonetics enables a
person to acquire correct pronunciation, and it
describes how sounds are made.
BRANCHES:
• Articulatory phonetics
• Acoustic phonetics
• Auditory phonetics
AUTOMATIC SPEECH
RECOGNITION
WHY WE GO FOR SSI…
• Even the best speech recognition systems sometimes make
errors. If there is noise or some other sound in the room
(e.g. the television or a kettle boiling), the number of errors
will increase.
• Speech Recognition works best if the microphone is close to
the user. More distant microphones (e.g. on a table or wall)
will tend to increase the number of errors.
• Confidential and private communication in public places is
difficult due to the clearly audible speech.
• A Silent Speech Interface is an electronic device
that lets speech communication take place without
the speaker emitting an audible acoustic signal.
• As such, it is a type of electronic lip reading.
ELECTROMYOGRAPHY
• The Silent Speech Interface uses electromyography
to monitor the tiny muscular movements that occur
when we speak.
• The monitored signals are converted into electrical
pulses that can then be turned into speech, without
a sound being uttered.
• It is a technique that monitors tiny muscular
movements and the pulses they generate. The
transducers involved convert these pulses into
electric signals.
• Electromyography sensors attached to the face
record the electric signals produced by the facial
muscles and compare them with prerecorded signal
patterns of spoken words.
• When there is a match, the sound is recognized by
the system, which then performs any required task.
STAGES IN SPEECH RECOGNITION
ACQUIRING THE EMG SIGNAL
PREPROCESSING
SIGNAL PROCESSING
FEATURE EXTRACTION
ACQUIRING EMG SIGNAL
• Silver/silver chloride (Ag/AgCl) surface electrodes.
• Three-channel electromyogram recording system:
– M. digastricus
– M. zygomaticus major
– M. orbicularis oris
• Six-channel electromyogram recording system:
– the levator anguli oris,
– the zygomaticus major,
– the platysma,
– the depressor anguli oris,
– the anterior belly of the digastric, and
– the tongue.
ACQUIRING THE EMG SIGNAL
SIX CHANNEL EMG DATA ACQUISITION SYSTEM
[Block diagram: EMG SIGNAL → HIGH PASS FILTER (60 Hz) → SAMPLER (600 Hz) → FILTERED EMG SIGNALS]
ACQUIRING THE EMG SIGNAL
FOUR CHANNEL EMG DATA ACQUISITION SYSTEM
[Block diagram: EMG SIGNAL → NOTCH FILTER (60 Hz) → SAMPLER (2 kHz) → FILTERED EMG SIGNALS]
Electrode placement:
• right and left areas of the throat near the chin cleft
• 0.5 to 1 centimeter from the left and right sides of the
larynx.
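The 60 Hz notch stage of the four-channel chain can be sketched in a few lines. This is a minimal illustration, not the filter used in the study: the 60 Hz and 2 kHz figures come from the slide, while the biquad form (the widely used Audio EQ Cookbook notch), the quality factor `q=30`, and all function names are assumptions made for the example.

```python
import math

def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch coefficients, normalized so that a[0] == 1 (q is an assumed value)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad_filter(x, b, a):
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def sine(freq, n, fs):
    """Unit-amplitude test tone (stands in for a real recording)."""
    return [math.sin(2.0 * math.pi * freq * k / fs) for k in range(n)]

FS = 2000.0   # sampling rate from the slide
HUM = 60.0    # power line frequency from the slide

b, a = notch_coefficients(HUM, FS)
hum_out = biquad_filter(sine(HUM, 4000, FS), b, a)     # interference only
tone_out = biquad_filter(sine(200.0, 4000, FS), b, a)  # stand-in for EMG content

# Compare residual amplitudes once the start-up transient has died away.
print("60 Hz residual:", max(abs(v) for v in hum_out[-1000:]))
print("200 Hz residual:", max(abs(v) for v in tone_out[-1000:]))
```

A narrow notch removes the power line component while leaving nearby frequencies almost untouched, which matters because surface EMG carries useful energy on both sides of 60 Hz.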
PREPROCESSING
FILTERS:
• LPF
• HPF
• BPF
• NOTCH FILTERS
– Power line interference
WAVELET TRANSFORM:
• The wavelet transform is not only a very promising technique
for time-frequency analysis but also a noise reduction
method.
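As a rough illustration of wavelet-based noise reduction, the sketch below applies a one-level Haar transform, soft-thresholds the detail coefficients, and reconstructs the signal. The slides do not say which wavelet, decomposition depth, or threshold rule is used, so the Haar wavelet, the single level, and the threshold value are all assumptions chosen to keep the example short.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One-level Haar transform of an even-length list: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def haar_denoise(x, threshold):
    """Keep the smooth part, suppress small detail coefficients (assumed to be noise)."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))

noisy = [1.1, 0.9, 1.1, 0.9]          # constant level with a small alternating ripple
print(haar_denoise(noisy, 0.5))       # ripple falls below the threshold and is removed
```

With a threshold of zero the function returns the input unchanged, which is a quick way to check that the transform pair is consistent.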
ACCURATE RECOGNITION
• Principal Component Analysis (PCA)
– Uses an orthogonal transformation to convert correlated
variables into uncorrelated variables.
• Linear Predictive Coding (LPC)
– LPC is a tool used mostly in audio signal processing to represent
the spectral envelope of a digital speech signal in compressed form.
– The current speech sample can be closely approximated as a linear
combination of past samples.
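The idea that the current sample is a linear combination of past samples can be made concrete with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is an illustration under stated assumptions: the test signal is a synthetic first-order autoregressive sequence (not EMG or speech data), and the function names and the model order of 2 are chosen for the example.

```python
import random

def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    x[n] is approximated by a[1]*x[n-1] + ... + a[order]*x[n-order]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # remaining prediction error energy
    return a[1:], err

# Synthetic signal: x[n] = 0.9 * x[n-1] + noise (assumed test data only).
random.seed(0)
signal = [0.0]
for _ in range(4999):
    signal.append(0.9 * signal[-1] + random.gauss(0.0, 1.0))

coeffs, err = levinson_durbin(autocorrelation(signal, 2), 2)
print("estimated predictor coefficients:", coeffs)
```

For this signal the first coefficient should come out close to 0.9 and the second close to 0, since only the previous sample carries predictive information.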
SIGNAL PROCESSING
• FILTERING
– 50/60 Hz notch filter
• NORMALIZATION
– The EMG signal is very sensitive to changes in electrode
position and to temperature.
– Hence, to make amplitudes comparable, it is very important to apply a
normalization process to each recording to compensate for these changes.
• SIGNAL INTERPRETATION
» Short Time Fourier Transform (STFT)
» Wavelet Transform (WT)
» Wavelet Packet Transform (WPT)
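One simple way to realize the per-recording normalization step is z-score scaling: subtract the recording's mean and divide by its standard deviation. The slides do not name a specific normalization method, so this choice, the function name, and the sample values are assumptions for illustration.

```python
import math

def zscore(recording):
    """Scale one recording to zero mean and unit standard deviation."""
    n = len(recording)
    mean = sum(recording) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in recording) / n)
    if std == 0.0:
        return [0.0] * n          # flat recording: nothing to scale
    return [(v - mean) / std for v in recording]

# Same muscle activity recorded twice; the second session has a different
# gain and offset, as might follow from a shifted electrode (made-up numbers).
session_a = [0.1, -0.2, 0.4, -0.3, 0.0]
session_b = [3.0 * v + 0.5 for v in session_a]

print(zscore(session_a))
print(zscore(session_b))
```

After normalization the two sessions produce the same values, so a later comparison of amplitudes is not dominated by gain or offset differences between recordings.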
FEATURE EXTRACTION
• Feature extraction reduces the dimensionality of the
data.
• It is the process of isolating the most useful
components of the data for further study while
discarding the less useful aspects.
• It reduces the number of variables that must be
examined, thereby saving time and resources.
FEATURE EXTRACTION
Mel Frequency Cepstral Coefficients (MFCC)
[Block diagram: SPEECH SIGNAL → FAST FOURIER TRANSFORM → spectrum → MEL SCALE FILTERING → Mel frequency spectrum → LOG → DISCRETE COSINE TRANSFORM → cepstral coefficients → DERIVATIVES → FEATURE VECTOR]
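The MFCC chain in the diagram can be followed step by step in a small sketch. It is deliberately simplified and rests on several assumptions: a naive DFT stands in for the FFT (same values, slower), no window function is applied, the 2 kHz sampling rate, frame length, filter count, and coefficient count are arbitrary, and the derivative step is reduced to a central difference between two neighboring frames.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (an FFT gives the same values faster)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2.0 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2.0 * math.pi * k * t / n) for t in range(n))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filter_energies(spec, fs, n_filters):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to fs/2."""
    n_fft = 2 * (len(spec) - 1)
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for k, p in enumerate(spec):
            f = k * fs / n_fft
            w = min((f - lo) / (mid - lo), (hi - f) / (hi - mid))
            if w > 0.0:
                total += w * p
        energies.append(total)
    return energies

def dct2(values, n_out):
    """DCT-II, keeping the first n_out coefficients."""
    m = len(values)
    return [sum(v * math.cos(math.pi * n * (j + 0.5) / m) for j, v in enumerate(values))
            for n in range(n_out)]

def mfcc(frame, fs, n_filters=10, n_ceps=6):
    energies = mel_filter_energies(power_spectrum(frame), fs, n_filters)
    log_energies = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    return dct2(log_energies, n_ceps)

def delta(prev_ceps, next_ceps):
    """Central-difference derivative between the frames before and after."""
    return [(b - a) / 2.0 for a, b in zip(prev_ceps, next_ceps)]

FS = 2000.0  # assumed sampling rate
frame = [math.sin(2.0 * math.pi * 100.0 * t / FS) for t in range(128)]  # synthetic tone
ceps = mfcc(frame, FS)
print(ceps)
```

The feature vector for a frame would then be the cepstral coefficients, optionally extended with their `delta` values computed from neighboring frames.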
PATTERN RECOGNITION
APPROACH
• Two steps:
– Pattern Training
– Pattern Comparison
• Goal: to determine the identity of unknown speech according
to how well the patterns match
METHODS IN PATTERN
COMPARISON APPROACH
• Template Based Approach
• Stochastic Approach (HMM)
– Probabilistic Models
– Uncertainty and Incompleteness
TEMPLATE-BASED ASR
• Originally worked only for isolated words
• For each word we want to recognize, we store a template or
example based on actual data
• Patterns are stored as a dictionary of words
• Each test utterance is checked against the templates to find
the best match
• Uses the Dynamic Time Warping (DTW) algorithm
DYNAMIC TIME WARPING
• Dynamic Time Warping finds an optimal match
between two sequences of feature vectors, allowing
sections of either sequence to be stretched or
compressed.
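Template matching with DTW can be sketched as follows. This is a minimal illustration of the classic dynamic-programming formulation; the Euclidean local distance, the one-dimensional toy feature vectors, and the two-word dictionary are assumptions made for the example, not data from the study.

```python
import math

def euclidean(u, v):
    """Local distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Cost of the best alignment; frames may repeat to stretch or compress time."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b frame repeated
                                 cost[i][j - 1],      # seq_a frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(utterance, templates):
    """Return the dictionary word whose template aligns best with the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical one-dimensional feature sequences for two stored words.
TEMPLATES = {
    "yes": [[0.0], [1.0], [2.0], [1.0]],
    "no": [[2.0], [2.0], [0.0], [0.0]],
}

# The same "yes" pattern spoken more slowly: every frame but the last is doubled.
slow_yes = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0]]
print(recognize(slow_yes, TEMPLATES))
```

Because repeated frames can be absorbed by the alignment, the slower utterance still matches its template with zero cost, which is exactly the stretching and compressing that DTW is meant to tolerate.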
STOCHASTIC APPROACH
• A stochastic system changes over time in an uncertain manner. The
approach entails the use of probabilistic models to deal with uncertain or
incomplete information.
HIDDEN MARKOV MODEL
• A hidden Markov model (HMM) is a statistical model in which the
system being modeled is assumed to be a Markov process (a memoryless
process: given the present state, its future and past are independent)
with hidden states.
• To understand the use of HMMs for speech modeling, consider the
word "again" with two possible pronunciations:
(1) Again: AX G EH N
(2) Again: AX G EY N
HIDDEN MARKOV MODEL
HMM FOR WORD AGAIN
[State diagram: BEGIN → AX → G → (EH or EY) → N → END, with output probabilities P{X/AX}, P{X/G}, P{X/EH}, P{X/EY}, P{X/N}]
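The word model for "again" can be turned into a small decoding example. The state topology follows the diagram (AX, G, then EH or EY, then N), but every number below is hypothetical: the transition probabilities, the discrete observation symbols, and the emission values 0.8 and 0.05 are assumptions chosen only to show how the Viterbi algorithm picks the more likely pronunciation.

```python
import math

# Hypothetical transition probabilities; the slides give only the topology.
START = {"AX": 1.0}
TRANS = {
    "AX": {"AX": 0.5, "G": 0.5},
    "G": {"G": 0.4, "EH": 0.3, "EY": 0.3},
    "EH": {"EH": 0.5, "N": 0.5},
    "EY": {"EY": 0.5, "N": 0.5},
    "N": {"N": 0.5},   # the remaining 0.5 is the transition to END
}
FINAL = "N"
END_PROB = 0.5

def emission(state, symbol):
    """Assumed P{X/state}: 0.8 for the matching symbol, 0.05 for each of the other four."""
    return 0.8 if symbol == state.lower() else 0.05

def viterbi(observations):
    """Most likely state path that starts in AX and ends in N, with its log probability."""
    neg_inf = float("-inf")
    score = {s: (math.log(START[s] * emission(s, observations[0])) if s in START else neg_inf)
             for s in TRANS}
    back = []
    for symbol in observations[1:]:
        new_score, pointers = {}, {}
        for s in TRANS:
            best_prev, best_val = None, neg_inf
            for p in TRANS:
                if s in TRANS[p] and score[p] > neg_inf:
                    val = score[p] + math.log(TRANS[p][s])
                    if val > best_val:
                        best_prev, best_val = p, val
            pointers[s] = best_prev
            new_score[s] = (best_val + math.log(emission(s, symbol))
                            if best_prev is not None else neg_inf)
        back.append(pointers)
        score = new_score
    if score[FINAL] == neg_inf:
        return None, neg_inf          # too short to reach the final state
    path = [FINAL]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[FINAL] + math.log(END_PROB)

path, log_prob = viterbi(["ax", "g", "ey", "ey", "n"])
print(path, log_prob)
```

In a real recognizer the emission term would come from a trained model over feature vectors rather than a lookup on symbolic labels; the decoding logic stays the same.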
MATCHING TECHNIQUE
WHOLE-WORD MATCHING:
» Compares the incoming digital-audio signal against a prerecorded
template.
» Requires a large amount of storage space.
SUB-WORD MATCHING:
» Looks for sub-words, usually phonemes, and then performs
further pattern recognition.
» Requires much less storage.
EMG SPEECH RECOGNIZER
• Session Dependent (SD)
• Session Independent (SI)
• Multi Sessions (MS)
• Session Adaptive Systems (SAS)
SESSION DEPENDENT
SESSION INDEPENDENT
SILENT SOUND TECHNOLOGY
…Silence is the best answer for all
situations… even your mobile
understands!
APPLICATIONS
The technology opens up a host of
applications, such as those mentioned below:
• In space there is no medium for sound
to travel through, so this technology
can be put to good use by astronauts.
• Helping people who have lost their
voice due to illness or accident.
• Making silent calls even while
standing in a crowded place.
• Telling a trusted friend your PIN over
the phone without anyone eavesdropping,
assuming no lip-readers are around.
• Silent sound techniques can be applied in the
military for communicating secret or
confidential matters.
• Since the electrical signals are universal, they
can be translated into any language. Native
speakers can translate the message before
sending it to the other side. Hence it can be
converted into any language of choice; the
languages currently supported are German,
English and French.
RESTRICTIONS
• Translation works for the majority of languages, but in
tonal languages such as Chinese different tones carry
different meanings while the facial movements stay the
same. Hence this technology is difficult to apply in
such situations.
• From a security point of view, recognising who you
are talking to becomes complicated.
• Differentiating between people and between emotions
cannot be done. This means you will always feel
that you are talking to a robot.
• The device presently needs nine leads attached to
the face, which makes it quite impractical to use.
FUTURE PROSPECTS
• Silent sound technology opens the way to a bright future for
speech recognition technology.
• Instead of hanging all around the face, the electrodes
will be incorporated into systems.
• It may gain features such as lip reading based on image
recognition and processing rather than
electromyography.
• Electric signals are universal, so they can be translated
into any language. At present only translation
between English, French and German is available.
CONCLUSION