Automatic Language Identification

My presentation about Automatic Language Identification at Barcamp London 9.

  1. (Title slide: a word cloud of the sound a cat makes in different languages: mjav, meong, miaou, meow, miau ... ?)
  2. Automatic Language Identification (LiD) • Alexander Hosford (@awhosford)
  3. The LiD Problem • The identification of a given spoken language from a speech signal • 94% of the global population speak only 6% of the world's languages
  4. Real World Situations • Automation of LiD is desirable • Offers many benefits to international service industries • Hotels, airports, global call centres
  5. Language Differences • Languages contain information that makes one discernible from another: – Phonemes – Prosody – Phonotactics – Syntax
  6. Human Abilities • Bias towards the native language arises in infancy • Prosodic features are among the first cues to be recognised • Humans can make a reasonable estimate of the language heard within 3-4 seconds of audition • Even unfamiliar languages may be plausibly judged
  7. Previous Attempts • Attempts as early as 1974 (USAF work, therefore classified) • Methodological investigations as early as 1977 (House & Neuburg) • Studies for the most part centre on phonotactic constraints and phoneme modelling • Raw acoustic waveforms have also been visited (Kwasny 1993)
  8. A Simpler Approach? • Phonotactic approaches require expert linguistic knowledge • Phoneme and phonotactic modelling is time-consuming • Given the speed of human LiD abilities, discrimination is most likely based on acoustic features
  9. System Overview • (Pipeline diagram) Pre-processed speech → onset and utterance detection → SuperCollider feature extraction (MFCC vectors, pitch contour, nPVI generation) → ARFF file → WEKA tools, called via a SuperCollider systemCmd
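
The hand-off between the two halves of the pipeline is the ARFF file. A minimal sketch of that glue step in Python (the original generates it from SuperCollider; the relation name, attribute names and file path here are illustrative assumptions):

```python
# Write per-utterance feature vectors plus a language label in WEKA's ARFF
# format. Relation/attribute names and the output path are assumptions.
def write_arff(rows, labels, path="languages.arff"):
    classes = ",".join(sorted(set(labels)))
    with open(path, "w") as f:
        f.write("@RELATION language_id\n\n")
        for i in range(len(rows[0])):
            f.write("@ATTRIBUTE feat{} NUMERIC\n".format(i))
        f.write("@ATTRIBUTE language {" + classes + "}\n\n")
        f.write("@DATA\n")
        for row, label in zip(rows, labels):
            f.write(",".join("{:.6f}".format(v) for v in row) + "," + label + "\n")

# e.g. one row of averaged MFCC/pitch/nPVI features per utterance:
write_arff([[0.12, 0.83, 41.2], [0.31, 0.65, 55.7]],
           ["english", "french"])
```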
  10. Feature Extraction • Spectral information: MFCC vectors • Pitch contour information • Handled by the SCMIR library (Collins, 2010) • Speech rhythm: Normalised Pairwise Variability Index (nPVI)

      Feature Type       Implemented Measure
      Spectral Content   MFCC Vector uGen
      Pitch Contour      Tartini uGen
      Speech Rhythm      nPVI function
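
The rhythm measure is standard: the nPVI of Grabe & Low (2002) over successive interval durations d_1 .. d_m, nPVI = 100/(m-1) * sum |d_k - d_{k+1}| / ((d_k + d_{k+1})/2). A direct Python sketch (taking onset-to-onset times as the intervals is an assumption; slide 16 notes a planned move to vowel onsets):

```python
# Normalised Pairwise Variability Index (Grabe & Low, 2002):
# nPVI = 100/(m-1) * sum_k |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)
# where d_1..d_m are successive interval durations (here assumed to be
# onset-to-onset times from the onset detector).
def npvi(durations):
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)

# Uneven (stress-timed-like) durations score higher than even ones:
print(npvi([0.18, 0.42, 0.15, 0.38]))  # high variability -> high nPVI
print(npvi([0.21, 0.23, 0.20, 0.22]))  # low variability  -> low nPVI
```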
  11. Classification • Handled by the WEKA toolkit • Uses WEKA's built-in Multilayer Perceptron • Called from the command line through a SuperCollider system command
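
The slides issue this call from SuperCollider as a system command string; the Python subprocess below is the same command-line invocation (the weka.jar path and ARFF file name are assumptions, while the classifier class name is WEKA's own):

```python
# Train and evaluate WEKA's built-in Multilayer Perceptron on the ARFF file.
# With -t only, WEKA reports 10-fold cross-validation results by default.
import subprocess

result = subprocess.run(
    ["java", "-cp", "weka.jar",                      # path to WEKA: assumption
     "weka.classifiers.functions.MultilayerPerceptron",
     "-t", "languages.arff"],                        # training data
    capture_output=True, text=True)
print(result.stdout)  # accuracy, per-class stats and confusion matrix
```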
  12. Comparisons Made • 66 language pairs from 12 languages • A comparison within language families • A comparison of all 12 languages • Averaged and segmented data
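
The 66 pairs follow directly from the 12 languages: every unordered pairing, C(12, 2) = 12 * 11 / 2 = 66. A quick check (the language names are placeholders):

```python
# All unordered pairings of 12 languages: C(12, 2) = 66
from itertools import combinations

languages = ["lang{:02d}".format(i) for i in range(12)]  # placeholder names
print(len(list(combinations(languages, 2))))  # -> 66
```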
  13. Results Within Families • (Bar chart: classification accuracy in %, 20 to 100, for the Germanic, Romantic, Slavic and SinoAltaic families plus the overall mean, across the feature sets 4 MFCC; 13 MFCC + Tartini; 41 MFCC + Tartini; All Features) *Data averaged across files
  14. Results Within Families • (Bar chart: the same family-level accuracies across 4 MFCC; 13 MFCC + Tartini; 41 MFCC + Tartini) *Data in 5 second segments
  15. Results From All Languages • (Bar chart: mean accuracy in %, 10 to 100, for averaged data vs. 5 second segments, across 4, 13 and 41 MFCC feature sets, each alone, with Tartini, and with Tartini + nPVI)
  16. Extensions • nPVI function to use vowel onsets • Phonemic segmentation • A larger dataset • Better efficiency • Real-time operation
  17. Robustness • Real-world signals are very different from processed ‘clean’ data • ‘Ideal’ LiD systems are independent and robust • A need to analyse only the part of the signal that matters
  18. CASA • A computational modelling of human ‘Auditory Scene Analysis’ (Bregman 1990) • Separation of the signal into component parts and reconstitution into meaningful ‘streams’
  19. Using Noisy Signals • I’m open to suggestions!
  20. alexanderhosford@googlemail.com • 07814 692 549 • @awhosford
