7. Speech recognition
● one W AH N
● two T UW
● three TH R IY
● four F AO R
● five F AY V
● six S IH K S
● seven S EH V AH N
● eight EY T
● nine N AY N
● ten T EH N
[Diagram: phonemes → words → sentences; components: AM (acoustic model), dict (pronunciation dictionary), LM (language model)]
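The dictionary step can be illustrated with a toy sketch. It uses the ten pronunciations listed on this slide; the names `PRONUNCIATIONS` and `phonemes_to_words` are hypothetical, and the greedy longest-match strategy is an assumption chosen for brevity. A real recognizer scores competing hypotheses with the acoustic and language models instead.

```python
# Toy pronunciation dictionary built from the entries on the slide.
PRONUNCIATIONS = {
    "one": "W AH N", "two": "T UW", "three": "TH R IY", "four": "F AO R",
    "five": "F AY V", "six": "S IH K S", "seven": "S EH V AH N",
    "eight": "EY T", "nine": "N AY N", "ten": "T EH N",
}

# Reverse index: phoneme tuple -> word.
BY_PHONEMES = {tuple(p.split()): w for w, p in PRONUNCIATIONS.items()}
MAX_LEN = max(len(key) for key in BY_PHONEMES)


def phonemes_to_words(phonemes):
    """Segment a phoneme sequence into words by greedy longest match.

    Raises ValueError when no dictionary entry matches at some position.
    """
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            word = BY_PHONEMES.get(tuple(phonemes[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:
            raise ValueError(f"no dictionary entry at position {i}")
    return words


print(phonemes_to_words("T UW S EH V AH N".split()))  # ['two', 'seven']
```

Greedy matching can fail on sequences that a backtracking search would segment correctly, which is one reason a language model is needed on top of the dictionary.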
11. Metaphone code computation algorithm
Remove all adjacent duplicate letters, except C.
The beginning of the word should be transformed using the
following rules:
KN → N
GN → N
PN → N
AE → E
WR → R
Remove B at the end of the word if it follows M.
Replace C using the rules below:
With X: CIA → XIA, SCH → SKH, CH → XH
With S: CI → SI, CE → SE, CY → SY
With K: C → K
Replace D using the following rules:
With J: DGE → JGE, DGY → JGY, DGI → JGI
With T: D → T
Replace GH → H, unless it is at the end of the word or before a vowel.
Replace GN → N and GNED → NED if they are at the end of the word.
Replace G using the following rules:
With J: GI → JI, GE → JE, GY → JY
With K: G → K
Remove every H that follows a vowel and does not precede a vowel.
Apply the following transformations:
CK → K
PH → F
Q → K
V → F
Z → S
Replace S with X:
SH → XH
SIO → XIO
SIA → XIA
Replace T using the following rules:
With X: TIA → XIA, TIO → XIO
With 0: TH → 0
Remove T: TCH → CH
Transform WH → W at the beginning of the word. Remove W if there is no
vowel after it.
If X is at the beginning of the word, replace X → S; otherwise replace X → KS.
Remove every Y that is not before a vowel.
Remove all vowels except a vowel at the start of the word.
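The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `metaphone`, `PRE_RULES`, and `SIMPLE` are hypothetical; rewrites that only delete or swap letters are applied first as regular expressions, and the letter-to-code rules are then applied in a single left-to-right pass so that an emitted code (for example X) is never re-matched by a later rule. The sketch also assumes that the G in DGE/DGY/DGI is covered by the J already emitted for D. Codes may therefore differ from those of library implementations.

```python
import re

VOWELS = frozenset("AEIOU")
FRONT = frozenset("IEY")  # letters that soften a preceding C or G

# Letter-level rewrites, applied in order before the coding pass.
PRE_RULES = [
    (r"([^C])\1+", r"\1"),               # drop doubled letters, except C
    (r"^[KGP]N", "N"),                   # initial KN, GN, PN -> N
    (r"^WR", "R"),
    (r"^AE", "E"),
    (r"^WH", "W"),
    (r"^X", "S"),
    (r"MB$", "M"),                       # final B after M
    (r"GH(?=[^AEIOU])", "H"),            # GH not at the end, not before a vowel
    (r"GN(?=$|ED$)", "N"),               # final GN and GNED
    (r"(?<=[AEIOU])H(?![AEIOU])", ""),   # H after a vowel, not before a vowel
    (r"TCH", "CH"),
    (r"CK", "K"),
]
SIMPLE = {"Q": "K", "V": "F", "Z": "S", "X": "KS"}


def metaphone(word: str) -> str:
    w = re.sub(r"[^A-Z]", "", word.upper())
    for pattern, repl in PRE_RULES:
        w = re.sub(pattern, repl, w)
    out = []
    for i, c in enumerate(w):
        prev = w[i - 1] if i else ""
        nxt = w[i + 1:i + 2]   # next letter, or "" at the end
        tri = w[i:i + 3]       # current letter plus up to two more
        if c in VOWELS:
            code = c if i == 0 else ""   # keep only a leading vowel
        elif c == "C":
            if tri == "CIA" or (nxt == "H" and prev != "S"):
                code = "X"
            else:                        # SCH falls through to K
                code = "S" if nxt in FRONT else "K"
        elif c == "D":
            code = "J" if tri in ("DGE", "DGY", "DGI") else "T"
        elif c == "G":
            if prev == "D" and nxt in FRONT:
                code = ""                # assumption: J already emitted for D
            else:
                code = "J" if nxt in FRONT else "K"
        elif c == "H":
            code = "" if prev in ("T", "P") else "H"  # used up by TH or PH
        elif c == "P":
            code = "F" if nxt == "H" else "P"
        elif c == "S":
            code = "X" if nxt == "H" or tri in ("SIO", "SIA") else "S"
        elif c == "T":
            if tri in ("TIA", "TIO"):
                code = "X"
            else:
                code = "0" if nxt == "H" else "T"
        elif c in ("W", "Y"):
            code = c if nxt in VOWELS else ""
        else:
            code = SIMPLE.get(c, c)
        out.append(code)
    return "".join(out)


print(metaphone("Knight"), metaphone("night"))  # NT NT
print(metaphone("Thomas"))                      # 0MS
```

Words that sound alike, such as "Knight" and "night" or "write" and "Wright", map to the same code, which is what makes the code usable as a phonetic search key.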
22. Conclusions
● a good recognition model and audio preprocessing are crucial;
consider the speed vs. accuracy trade-off
● phonetic filtering increases recall but decreases precision
● use phonetic filters as an improvement, not as a standalone solution
● consider fuzzy search
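As one possible illustration of the fuzzy search point, the sketch below matches a misspelled or misrecognized query against a small vocabulary with Python's standard `difflib` module. The vocabulary, the helper name `fuzzy_lookup`, and the 0.8 cutoff are assumptions made for the example, not values from the talk.

```python
from difflib import get_close_matches

# Hypothetical vocabulary, e.g. words extracted from recognized transcripts.
TRANSCRIPT_WORDS = ["recognition", "precision", "recall"]


def fuzzy_lookup(query, vocabulary, cutoff=0.8):
    """Return up to three vocabulary words whose similarity to query is >= cutoff."""
    return get_close_matches(query.lower(), vocabulary, n=3, cutoff=cutoff)


print(fuzzy_lookup("recogniton", TRANSCRIPT_WORDS))  # ['recognition']
```

Lowering the cutoff returns more candidates, which mirrors the recall versus precision trade-off noted for phonetic filtering.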
23. Use cases
● audio archive
● looking up broadcasts
○ opinion mining
○ collecting information
● voice control
● dictation
○ short notes
○ voice mail → text messages