
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012


  1. 1. Telefonica Research at Mediaeval 2012 Spoken Web Search Task. Xavier Anguera
  2. 2. Outline
     • System description
       – Speech activity detection
     • Proposed systems
       – Segmental-DTW
       – IR-DTW
     • Results
  3. 3. Proposed overall system [diagram; system labels: S-DTW, IR-DTW]
  4. 4. Frontend
     • MFCC-39 features: (12 cepstra + energy) + delta + delta-delta
     • Mean and variance normalization at sentence level
     • Posterior probabilities from a GMM background model
     • L2 normalization
  5. 5. Background model training [diagram blocks: iterative Gaussian splitting (128), EM-ML GMM training, K-means assignment]
     [1] "Speaker Independent discriminant feature extraction for acoustic pattern matching", Xavier Anguera, ICASSP 2012
  6. 6. Silence modeling [diagram blocks: 10% lowest energy frames; Silence/Speech GMM training; Decode the data]
     • 1 Gaussian for noise and 4 Gaussians for speech
     • Perform 10 iterations or while % variation is high
  7. 7. 2234444343322444444444443222222234444444444444444444444443210000011222443
     Threshold set to values < 2 (i.e. silence + lowest speech)
  8. 8. Overlap postprocessing
     • We compute the percentage of overlap between all matching paths

     $$\mathrm{Ovl} = \frac{\min(End_1, End_2) - \max(Start_1, Start_2)}{\min(End_1 - Start_1,\; End_2 - Start_2)}$$

     • For pairs with > 0.5 overlap
       – Select the match with the highest score
  9. 9. [Diagram: Match1 spans Start1 to End1; Match2 spans Start2 to End2] Ovl = (min(ends) − max(starts)) / min(size1, size2) = 0.8
  10. 10. S-DTW submission
     • Based on last year's submission but with the system improvements above
  11. 11. DTW local constraints
     • No global constraints are applied, in order to allow matching of any segment in both sequences
     • Local constraints are set to allow warping up to 2X

     $$D(m,n) = \min \left\{ \begin{array}{l} \dfrac{D(m-2,\,n) + d(x_m, y_n)}{\mathrm{jumps}(m-2,\,n) + 3} \\[2ex] \dfrac{D(m,\,n-2) + d(x_m, y_n)}{\mathrm{jumps}(m,\,n-2) + 3} \\[2ex] \dfrac{D(m-2,\,n-2) + d(x_m, y_n)}{\mathrm{jumps}(m-2,\,n-2) + 4} \end{array} \right.$$

     [Diagram cell labels: (m, n), (m-2, n-1), (m-1, n-2), (m-1, n-1)]
     • Posteriorgram features distance:

     $$d(x_m, y_n) = -\log\left( \sum_{i=0}^{N-1} x_m[i] \cdot y_n[i] \right)$$
  12. 12. S-DTW algorithm [figure; axes: query term, reference term]
  13. 13. S-DTW algorithm [figure; axes: query term, reference term]
  14. 14. IR-DTW
     • Total rework of last year's system
     • Aims at keeping the same accuracy, but with:
       – Much less memory usage
       – Faster retrieval
     • IR (Information Retrieval) because we index the reference features for fast nearest-neighbor retrieval
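  15. 15. Official results

     | MTWV   | Dev-dev | Dev-eval | Eval-dev | Eval-eval |
     |--------|---------|----------|----------|-----------|
     | IR-DTW | 0.3903  | 0.3139   | 0.4983   | 0.3416    |
     | S-DTW  | 0.3745  | 0.3001   | 0.4716   | 0.3113    |

     | ATWV   | Dev-dev | Dev-eval | Eval-dev | Eval-eval |
     |--------|---------|----------|----------|-----------|
     | IR-DTW | 0.3866  | 0.3042   | 0.4219   | 0.3301    |
     | S-DTW  | 0.3644  | 0.292    | 0.3988   | 0.2942    |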
  16. 16. DEV-DEV results [Plot: miss probability (in %) vs. false alarm probability (in %). Curves: random performance; IR-DTW, MTWV=0.390, Scr=0.387; S-DTW, MTWV=0.375, Scr=0.695]
  17. 17. EVAL-EVAL results [Plot: miss probability (in %) vs. false alarm probability (in %). Curves: random performance; IR-DTW, MTWV=0.342; S-DTW, MTWV=0.311]
  18. 18. DEV-EVAL results [Plot: miss probability (in %) vs. false alarm probability (in %). Curves: random performance; IR-DTW, MTWV=0.314; S-DTW, MTWV=0.300]
  19. 19. EVAL-DEV results [Plot: miss probability (in %) vs. false alarm probability (in %). Curves: random performance; IR-DTW, MTWV=0.498; S-DTW, MTWV=0.472]
  20. 20. Summary (Xavier Anguera, xanguera@tid.es)
     • We propose 2 systems, both sharing the same framework
     • Some improvements were incorporated into the framework: speech/silence classification, new overlap detection, modified background model
     • IR-DTW is a total reimplementation of S-DTW, using information retrieval concepts
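
The overlap postprocessing on slides 8 and 9 can be sketched in Python as follows. This is a minimal illustration, not the submitted system: the function names, the `(start, end, score)` tuple layout, the clamping of negative overlaps to zero, and the greedy highest-score-first resolution of overlapping pairs are all assumptions. Only the overlap formula and the 0.5 threshold come from the slides.

```python
def overlap_ratio(start1, end1, start2, end2):
    """Overlap as on slide 8: shared span divided by the shorter match."""
    shared = min(end1, end2) - max(start1, start2)
    shortest = min(end1 - start1, end2 - start2)
    if shortest <= 0:
        return 0.0
    # Assumption: disjoint matches (negative shared span) count as 0 overlap.
    return max(shared, 0) / shortest


def prune_overlaps(matches, threshold=0.5):
    """Keep the highest-scoring match among any pair overlapping > threshold.

    `matches` is a list of hypothetical (start, end, score) tuples where a
    higher score means a better match.
    """
    kept = []
    for match in sorted(matches, key=lambda m: m[2], reverse=True):
        if all(overlap_ratio(match[0], match[1], k[0], k[1]) <= threshold
               for k in kept):
            kept.append(match)
    return kept


# Illustrative values only (not taken from the evaluation data).
example = [(0, 10, 0.9), (2, 12, 0.7), (20, 30, 0.5)]
survivors = prune_overlaps(example)
```

In this illustrative input the first two matches overlap by 0.8, so only the higher-scoring one is kept, while the third match does not overlap anything and survives.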

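The thresholding on slide 7 can be illustrated with a short sketch. It assumes that the digit string on that slide is a per-frame sequence of decoded class indices (0 for the noise Gaussian, 1 to 4 for the speech Gaussians) and that frames with a value below 2 are discarded as non-speech. The function name and the sample label string are hypothetical.

```python
from itertools import groupby


def speech_segments(labels, threshold=2):
    """Return (start, end) frame ranges, end exclusive, with label >= threshold."""
    segments = []
    position = 0
    for is_speech, run in groupby(labels, key=lambda value: value >= threshold):
        length = len(list(run))
        if is_speech:
            segments.append((position, position + length))
        position += length
    return segments


# Hypothetical label string in the style of slide 7 (not real data).
labels = [int(ch) for ch in "4432100001122"]
segments = speech_segments(labels)
```

Frames labelled 0 or 1 (silence and lowest speech) fall between the returned segments.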
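
The posteriorgram distance and the length-normalized recursion from slide 11 can be sketched as below. Several points are assumptions rather than the author's implementation: the `eps` floor inside the logarithm, the function names, the choice to store unnormalized cost and jump count separately, and the simplification that the whole query is matched against any stretch of the reference. The step set is left as a parameter; its default uses the predecessor cells labelled in the slide's diagram, (m-1, n-1), (m-2, n-1) and (m-1, n-2), with jump counts equal to the frames advanced on both axes, which is also an assumption.

```python
import math


def posterior_distance(x, y, eps=1e-12):
    """d(x, y) = -log(sum_i x[i] * y[i]); eps (an assumption) avoids log(0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return -math.log(max(dot, eps))


# Hypothetical default steps: (query step, reference step, jump count).
DEFAULT_STEPS = ((1, 1, 2), (2, 1, 3), (1, 2, 3))


def subsequence_dtw(query, reference, steps=DEFAULT_STEPS):
    """Return (normalized score, end frame in reference) of the best match."""
    M, N = len(query), len(reference)
    INF = float("inf")
    cost = [[INF] * N for _ in range(M)]   # accumulated distance
    jumps = [[0] * N for _ in range(M)]    # accumulated path length
    for n in range(N):
        # No global constraint: a path may start at any reference frame.
        cost[0][n] = posterior_distance(query[0], reference[n])
        jumps[0][n] = 1
    for m in range(1, M):
        for n in range(N):
            d = posterior_distance(query[m], reference[n])
            best = None
            for dm, dn, weight in steps:
                pm, pn = m - dm, n - dn
                if pm < 0 or pn < 0 or cost[pm][pn] == INF:
                    continue
                cand = (cost[pm][pn] + d, jumps[pm][pn] + weight)
                # Choose the predecessor with the lowest normalized cost.
                if best is None or cand[0] / cand[1] < best[0] / best[1]:
                    best = cand
            if best is not None:
                cost[m][n], jumps[m][n] = best
    ends = [(cost[M - 1][n] / jumps[M - 1][n], n)
            for n in range(N) if cost[M - 1][n] != INF]
    return min(ends) if ends else (INF, -1)


# Toy one-hot "posteriorgrams" (illustrative only).
A, B, C, X = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
score, end = subsequence_dtw([A, B, C], [X, A, B, C, X])
```

In the toy input the query occurs verbatim inside the reference, so the best path ends at the reference frame holding `C` with a normalized score of zero.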