Khmer ASR

  1. Khmer ASR system
     Sethserey SAM
     sam.sethserey@itc.edu.kh
  2. Part I: ASR in general
     - Definition
     - Types of ASR
     - ASR flow chart
     - Data requirements
     - Performance of ASR systems
     - Fundamental methods to create an ASR system
  3. What is an ASR system?
     - ASR: automatic speech recognition
     - An ASR system is a system or tool that converts an audio stream containing speech into text.
     [Figure: speech enters the ASR system, which produces text output; example candidates shown: "Seven", "Seven days", "Zaven", ...]
  4. ASR: what for?
     - ASR systems improve your life (work, business, communication, etc.)
  5. Typology of ASR systems
     - Speaker-dependent vs. speaker-independent
     - Language constraints:
       - isolated word recognition
       - connected word recognition
       - keyword spotting
       - continuous speech recognition
       - plus vocabulary size: small (100), medium (5 000), large (50 000)
     - Robustness constraints:
       - laboratory (office) conditions: imposed
       - microphone, channel noise …
  6. Levels of complexity
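  7. ASR flow chart
     [Figure: inside the ASR system, the speech signal for "seven" passes through signal processing (digitizing and feature extraction) and then decoding/searching, which yields candidate outputs such as "Seven", "Seven days", "Zaven", ...]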
  8. ASR data requirements
     - To train the acoustic model (AM) and the language model (LM), a huge amount of data (text and audio) is needed.
     [Figure: required resources: audio + transcription data, pronunciation dictionary, text data]
  9. ASR performance
     - English ASR system evaluations at the National Institute of Standards and Technology (NIST)
  10. Causes of ASR's error rate
      - Current ASR systems for continuous speech cannot reach a word error rate (WER) of 0%. Why?
        - The acoustic model is affected by speaker characteristics and environment: gender, age, emotion, pitch, accent, physical state, channel noise, etc.
        - The lexical model is affected by incorrect word pronunciations.
        - The language model is affected by incorrect word usage and grammar mistakes.
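The word error rate mentioned on this slide is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used for the results in this deck; the function name `word_error_rate` is hypothetical, and it assumes both strings are already split into words by spaces (for Khmer this means segmentation must happen first).

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent, assuming space-separated words (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One of two reference words is misrecognized.
    print(word_error_rate("seven days", "zaven days"))
```

Because insertions count as errors, WER can exceed 100% when the output contains many more words than the reference.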
  11. Three fundamental methods for creating a new ASR system
      - Enough training data → bootstrapping
      - Small amount of data → adaptation
      - No data → cross-language transfer
  12. Part II: Khmer language & its processing
      - Khmer language
      - Why research on Khmer ASR?
  13. Khmer language
      - Official language of Cambodia
      - Spoken by more than 15 M people
      - An atonal language
      - Writing system:
        - 33 consonants, 23 dependent vowels
        - 14 independent vowels, 13 diacritics and various signs
        - No explicit word boundary
  14. Why research on Khmer ASR?
      - An under-resourced language
        - Lack of text and speech data in digital form
        - Lack of linguistic documents (both soft and hard copies)
      - No explicit word segmentation
        - Automatic word segmentation is needed
        - State-of-the-art segmentation methods use hand-crafted lexicons, word frequencies, optimization criteria …
      - Other under-resourced, unsegmented languages in the region: Burmese, Lao, Thai, Vietnamese
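To illustrate what lexicon-based segmentation means, here is a minimal sketch of greedy forward longest matching. It is a deliberately simpler technique than the frequency-based and optimization-based methods the slide refers to, and it is not the segmenter used in this work. The function name `segment_longest_match` and the toy Latin-script lexicon are hypothetical; a real Khmer segmenter would more likely operate on character clusters than on single code points.

```python
def segment_longest_match(text, lexicon):
    """Greedy left-to-right segmentation: always take the longest lexicon word.

    Characters that start no lexicon word are emitted as one-character tokens.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    toy_lexicon = {"seven", "day", "days"}  # illustrative only
    print(segment_longest_match("sevendays", toy_lexicon))
```

Greedy matching can pick a long word that blocks a better overall split, which is one reason word frequencies and global optimization criteria are used in practice.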
  15. Part III: Khmer ASR at a glance
      - Corpus
        - Speech corpus setup
        - Text corpus setup
        - General overview
      - Current ASR system
      - Future work
  16. Corpus: speech corpus setup
      - Two types of corpus:
        - Small transcribed corpus (2007-2008)
          - Transcribed manually by engineering students at ITC
          - Only 6 hours of transcribed signal
          - Nature: radio signal (poor quality) downloaded from Radio Australia, Radio Free Asia and Voice of America
        - Large transcribed corpus (2011)
          - Corresponding text and speech were already available
          - Students helped verify the transcription
          - 21 hours of transcribed signal
          - Nature: read speech from newspapers
  17. Corpus: text corpus setup
      - Retrieving text from the Web is becoming a common approach
      - Well-selected rich-content websites vs. crawling the Web
      - Adapting ClipsTextTk, an open source tool for corpus creation, to the Khmer language:
        - Conversion from legacy character encoding to Unicode
        - Automatic segmentation
        - Conversion of special signs and numbers to text
        - Normalization of word spelling
      - Text corpus obtained from 5 sites:
        - 2,5000 HTML pages retrieved
        - After processing: 0.5 M sentences, 15 M words
        - Duration: November 2007 – January 2008
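As a small illustration of the kind of normalization listed above, the sketch below maps Khmer-script digits (Unicode U+17E0 to U+17E9) to ASCII digits, which could serve as a preliminary step before numbers are spelled out as words. This is an assumption about one possible preprocessing step, not a description of how ClipsTextTk works; the function name `khmer_digits_to_ascii` is hypothetical, and spelling numbers out as Khmer words is not covered.

```python
KHMER_ZERO = 0x17E0  # code point of KHMER DIGIT ZERO
KHMER_NINE = 0x17E9  # code point of KHMER DIGIT NINE


def khmer_digits_to_ascii(text):
    """Replace each Khmer-script digit with its ASCII counterpart."""
    out = []
    for ch in text:
        code = ord(ch)
        if KHMER_ZERO <= code <= KHMER_NINE:
            # The ten Khmer digits are contiguous, so the offset is the value.
            out.append(str(code - KHMER_ZERO))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # The escapes below are the Khmer digits two, zero, zero, seven.
    print(khmer_digits_to_ascii("\u17e2\u17e0\u17e0\u17e7"))
```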
  18. Corpus overview
      Description of the Khmer ASR corpus:

      Type                                      Small corpus                                 Large corpus
      Signal (acoustic model)                   ~6 h of transcribed signal (radio)           ~20 h of transcribed signal (read speech)
      Text (language model)                     0.5 million sentences, ~15.5 million words   to be improved
      Pronunciation dictionary (lexical model)  ~20 000 words                                to be improved
  19. Current ASR system
      Word error rate (WER, %) of the continuous ASR system:

      System        Training & testing corpus                          Context dependent (8gau)   Context dependent (16gau)
      Khmer ASR v1  LM: 15.5M words; training AM: 5h; testing: 172p    42.5                       40.3
      Khmer ASR v2  LM: 15M words; training AM: 20h; testing: 290p     36.4                       35
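The gain from v1 to v2 can be read as a relative WER reduction, computed as (old − new) / old. The short sketch below applies that formula to the four figures in the table; the helper name `relative_reduction` is hypothetical. Since the two versions were tested on different sets (172p vs. 290p), the comparison is indicative only. With these numbers the relative reduction comes out at roughly 14% for the 8gau models and roughly 13% for the 16gau models.

```python
def relative_reduction(old_wer, new_wer):
    """Relative WER reduction in percent: (old - new) / old."""
    return 100.0 * (old_wer - new_wer) / old_wer


# WER figures copied from the results table: (Khmer ASR v1, Khmer ASR v2).
results = {
    "8gau": (42.5, 36.4),
    "16gau": (40.3, 35.0),
}

if __name__ == "__main__":
    for name, (v1, v2) in results.items():
        absolute = v1 - v2
        relative = relative_reduction(v1, v2)
        print(f"{name}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```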
  20. Future work
      - Collect more text data for the language model
      - Next challenge: how can Khmer ASR be improved for speaker-independent use and across different environments?
  21. THANK YOU!!