ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection


  1. 1. Motivation Spoken Term Detection trough ASR Based on the Romanian ASR for continuous speech:  acoustic model trained with 64h of speech  language model trained with 170 million words  18% WER on clean speech Adaptation of Romanian ASR to Lwazi language Provided searching algorithms based on different outputs of ASR
  2. ASR adaptation
     • Tuning the Romanian ASR to minimize the phone error rate (PhER) at 8 kHz
     • 77 African phones mapped to 28 Romanian phones
     • Rules for mapping Lwazi phones to Romanian phones:
        1) directly, by IPA classification
        2) to the closest phone according to the full IPA chart
        3) based on the confusion matrix
     • MAP adaptation of the acoustic model with the development data set
  3. ASR accuracy

     Adaptation step                          PhER [%]
     Romanian ASR for continuous speech         36.8
     Romanian ASR, beam width tuned             31.4
     Romanian ASR, language model tuned         25.3
     African speech with Romanian ASR           61.2
     MAP adaptation with Lwazi dev set          48.1
  4. Searching techniques
     • The ASR output can be:
        - a string of characters
        - a lattice
        - confusion networks
     • Character-comparison-based techniques:
        - DTW String Search (DTWSS)
        - Sausage Technique (ST)
     • Acoustics-based technique:
        - Lattice Grammar (LG)
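  5. DTWSS
     • Sliding-window length proportional to the query length
     • Shorter DTW matches are given higher scores
     • Longer queries are given higher scores
     • The score formula:
       $$ s = (1 - \mathrm{PhER})\left(1 + \alpha\,\frac{L_Q - L_{Qm}}{L_{QM} - L_{Qm}}\right)\left(1 + \beta\,\frac{L_W - L_S}{L_Q}\right) $$
       where L_Q is the query length, L_{Qm} and L_{QM} are the minimum and maximum query lengths, L_W is the sliding-window length and L_S is the length of the DTW match.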
  6. Sausage Technique (ST)
  7. Lattice Grammar (LG)
     • Recognition of the query
     • Building a finite state grammar (FSG) from the lattice output of the ASR for the query
     • Recognition of the content with the FSG
     • Calculation of the likelihood
     • Normalization of the likelihood, which is used as the decision score
  8. Results on the evaluation data set
  9. Results on all data sets (ATWV; Q = queries, C = content)

     Method                  evalQ-evalC  evalQ-devC  devQ-evalC  devQ-devC
     DTWSS (α=0.8, β=0.4)        0.31        0.47        0.33        0.49
     DTWSS (α=0.6, β=0.6)        0.31        0.48        0.33        0.47
     DTWSS (α=0.1, β=0.4)        0.27        0.44        0.32        0.47
     ST                          0.12        0.22        0.17        0.25
     LG                          0           0.02        0           -
  10. Conclusions
     • The Romanian ASR is adapted to recognize African phones
     • DTWSS obtains by far the best results
     • Penalizing long DTW matches and short queries helped increase the ATWV
     • ST and LG suffer from the high PhER (48%)
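The three phone-mapping rules listed on slide 2 (ASR adaptation) can be read as a fallback chain: try a direct IPA match, then the nearest phone on the IPA chart, then the confusion matrix. The sketch below illustrates only that control flow. The phone inventory, the coarse (place, manner, voicing) chart coordinates, the Manhattan distance and the confusion counts are all hypothetical placeholders, not the 77-to-28 mapping actually used.

```python
# Hypothetical sketch: every phone list, feature tuple and confusion count
# below is an illustrative placeholder, not data from the ARF system.

# Rule 1 input: a small subset of Romanian phones, written as IPA symbols.
ROMANIAN_PHONES = {"a", "e", "i", "o", "u", "p", "b", "t", "d", "k", "s", "z", "m", "n"}

# Rule 2 input: assumed coarse IPA-chart coordinates (place, manner, voicing).
# Place: 0 bilabial, 2 alveolar, 3 postalveolar, 5 velar.
# Manner: 0 plosive, 1 nasal, 2 fricative. Voicing: 0 voiceless, 1 voiced.
IPA_FEATURES = {
    "p": (0, 0, 0), "b": (0, 0, 1), "m": (0, 1, 1),
    "t": (2, 0, 0), "d": (2, 0, 1), "n": (2, 1, 1),
    "s": (2, 2, 0), "z": (2, 2, 1), "k": (5, 0, 0),
    "ɓ": (0, 0, 1),  # bilabial implosive, placed next to /b/ on this coarse grid
    "ʃ": (3, 2, 0),  # voiceless postalveolar fricative
}

# Rule 3 input: hypothetical counts of how often an unmapped source phone was
# recognized as each Romanian phone (one row of a confusion matrix).
CONFUSIONS = {
    "ǀ": {"t": 12, "k": 30, "s": 5},  # dental click
}


def chart_distance(f1, f2):
    """Manhattan distance between two IPA-chart coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(f1, f2))


def map_phone(phone):
    """Return (romanian_phone, rule_used) for a source-language phone."""
    # Rule 1: the same IPA symbol exists in the Romanian inventory.
    if phone in ROMANIAN_PHONES:
        return phone, "ipa-direct"
    # Rule 2: closest Romanian phone on the chart (ties broken alphabetically).
    if phone in IPA_FEATURES:
        candidates = sorted(r for r in ROMANIAN_PHONES if r in IPA_FEATURES)
        target = min(candidates,
                     key=lambda r: chart_distance(IPA_FEATURES[phone], IPA_FEATURES[r]))
        return target, "ipa-chart"
    # Rule 3: the Romanian phone it was most often confused with.
    if phone in CONFUSIONS:
        row = CONFUSIONS[phone]
        return max(row, key=row.get), "confusion"
    raise KeyError(f"no mapping rule applies to {phone!r}")


for p in ["a", "ɓ", "ʃ", "ǀ"]:
    print(p, *map_phone(p))
```

Ordering the rules this way means the confusion matrix is only consulted for phones (such as clicks) that have no sensible neighbour on the chart.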
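The PhER figures on slide 3 (ASR accuracy) are phone error rates. The slides do not spell out the alignment, so the sketch below assumes the usual definition: a unit-cost Levenshtein alignment between reference and recognized phone strings, with PhER = (substitutions + deletions + insertions) / number of reference phones. The function name is hypothetical.

```python
def phone_error_rate(reference, hypothesis):
    """PhER in percent: (substitutions + deletions + insertions) / len(reference).

    Uses a unit-cost Levenshtein alignment between two phone sequences.
    """
    if not reference:
        raise ValueError("reference must contain at least one phone")
    # At the start of row i, prev[j] is the edit distance between
    # reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_phone != hyp_phone)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(reference)


# One substitution (a -> o) and one insertion (extra a) over 4 reference phones.
print(phone_error_rate("k a s a".split(), "k o s a a".split()))
```

For a whole test set, the error counts and reference lengths are summed over all utterances before dividing, rather than averaging per-utterance rates.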
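DTWSS, as described on slide 5, slides a window whose length grows with the query over the recognized phone string, aligns the query inside each window with DTW, and scores each hit so that shorter matches and longer queries rank higher. The sketch below is one possible reading under several assumptions: the score is taken to be s = (1 − PhER)(1 + α(L_Q − L_Qm)/(L_QM − L_Qm))(1 + β(L_W − L_S)/L_Q), with α on the query-length term and β on the match-length term; "PhER" inside the score is interpreted as the mismatch rate of the DTW path; L_W is the window length and L_S the span of the match. The 0/1 phone cost, the window step of one phone and `window_factor=1.5` are hypothetical choices; α=0.8 and β=0.4 are the best setting in the results table.

```python
import math


def subsequence_dtw(query, window):
    """Align the whole query to its best-matching contiguous span of window.

    Local cost is 0 for identical phones and 1 otherwise. Returns
    (mismatch_rate, span_length, span_start); mismatch_rate is the
    accumulated cost divided by the warping-path length.
    """
    n, m = len(query), len(window)
    # cell = (accumulated_cost, path_length, start_index_in_window)
    dp = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = 0 if query[i] == window[j] else 1
            if i == 0:
                dp[i][j] = (cost, 1, j)  # free start anywhere in the window
                continue
            preds = [dp[i - 1][j]]  # vertical step: window phone j also covers query phone i-1
            if j > 0:
                preds += [dp[i - 1][j - 1],  # diagonal step
                          dp[i][j - 1]]      # horizontal step: query phone i covers window phone j-1
            acc, length, start = min(preds, key=lambda c: (c[0], c[1]))
            dp[i][j] = (acc + cost, length + 1, start)
    # Free end: cheapest cell in the last query row, then the shortest path.
    end = min(range(m), key=lambda j: (dp[n - 1][j][0], dp[n - 1][j][1]))
    acc, length, start = dp[n - 1][end]
    return acc / length, end - start + 1, start


def dtwss_score(pher, lq, lq_min, lq_max, lw, ls, alpha, beta):
    """Assumed score form: (1 - PhER)(1 + alpha*query_term)(1 + beta*match_term)."""
    query_term = (lq - lq_min) / (lq_max - lq_min) if lq_max > lq_min else 0.0
    match_term = (lw - ls) / lq  # larger when the match is short relative to the window
    return (1 - pher) * (1 + alpha * query_term) * (1 + beta * match_term)


def dtwss_search(query, content, lq_min, lq_max, alpha=0.8, beta=0.4, window_factor=1.5):
    """Return (score, position, match_length) of the best hit, or None."""
    if not query or not content:
        return None
    lq = len(query)
    lw = min(len(content), math.ceil(window_factor * lq))  # window grows with the query
    best = None
    for offset in range(len(content) - lw + 1):
        pher, ls, start = subsequence_dtw(query, content[offset:offset + lw])
        score = dtwss_score(pher, lq, lq_min, lq_max, lw, ls, alpha, beta)
        if best is None or score > best[0]:
            best = (score, offset + start, ls)
    return best


print(dtwss_search("k a s a".split(), "t o k a s a m e".split(), lq_min=3, lq_max=8))
```

A real system would report every window whose score passes a threshold rather than only the single best one; returning one hit keeps the sketch short.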
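Slide 7 (Lattice Grammar) lists the LG steps but not the data structures or the normalization. The sketch below illustrates two of those steps under stated assumptions: the query lattice is a list of (source node, destination node, phone, weight) arcs, the FSG reuses the lattice topology with each state's outgoing weights renormalized to probabilities, and the decision score is the log-likelihood divided by the number of frames (one common normalization, not necessarily the one used). The acoustic decoding of the content with the FSG needs the ASR itself and is not reproduced; `grammar_log_prob` only scores a phone string against the grammar. All function names are hypothetical.

```python
import math
from collections import defaultdict


def lattice_to_fsg(arcs):
    """Build an FSG from lattice arcs given as (src, dst, phone, weight).

    Each state's outgoing weights are renormalized into probabilities that
    sum to one. Weights are assumed to be positive.
    """
    totals = defaultdict(float)
    for src, _dst, _phone, weight in arcs:
        totals[src] += weight
    fsg = defaultdict(list)
    for src, dst, phone, weight in arcs:
        fsg[src].append((dst, phone, weight / totals[src]))
    return dict(fsg)


def grammar_log_prob(fsg, start, final, phones):
    """Best-path log probability of `phones` under the FSG, or -inf if rejected."""
    frontier = {start: 0.0}
    for phone in phones:
        nxt = {}
        for state, logp in frontier.items():
            for dst, label, prob in fsg.get(state, []):
                if label == phone:
                    nxt[dst] = max(nxt.get(dst, -math.inf), logp + math.log(prob))
        frontier = nxt
    return frontier.get(final, -math.inf)


def normalized_score(log_likelihood, num_frames):
    """Per-frame log-likelihood, used here as the decision score."""
    return log_likelihood / num_frames


# Toy query lattice: two competing first phones, then a shared /a/.
arcs = [(0, 1, "k", 0.7), (0, 1, "g", 0.3), (1, 2, "a", 1.0)]
fsg = lattice_to_fsg(arcs)
print(grammar_log_prob(fsg, 0, 2, ["k", "a"]), grammar_log_prob(fsg, 0, 2, ["k", "o"]))
print(normalized_score(-150.0, 50))
```

Dividing by the frame count keeps long and short content segments comparable, which a raw log-likelihood would not.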
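The results on slide 9 are given as ATWV (actual term-weighted value), which the slides do not restate. The sketch below follows the term-weighted value as defined for the NIST 2006 Spoken Term Detection evaluation: TWV = 1 − mean over terms of (P_miss + β·P_FA), with one non-target trial per second of speech and β = 999.9 (C/V = 0.1, term prior 10^-4). Whether the MediaEval 2012 task used exactly these constants is an assumption, so the weight is a parameter. It is named `fa_weight` to avoid confusion with the DTWSS β; the function name is hypothetical.

```python
def term_weighted_value(counts, speech_seconds, fa_weight=999.9):
    """TWV at one decision threshold; ATWV is this value at the system's actual threshold.

    counts maps each query term to (n_true, n_correct, n_false_alarms).
    fa_weight is the NIST beta (cost/value ratio times prior odds),
    unrelated to the DTWSS beta.
    """
    penalties = []
    for n_true, n_correct, n_fa in counts.values():
        if n_true == 0:
            continue  # terms that never occur in the reference are not scored
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_fa / (speech_seconds - n_true)  # one non-target trial per second of speech
        penalties.append(p_miss + fa_weight * p_fa)
    if not penalties:
        raise ValueError("no term has reference occurrences")
    return 1.0 - sum(penalties) / len(penalties)


# One term half missed, one term found perfectly, no false alarms.
print(term_weighted_value({"kasa": (2, 1, 0), "sama": (4, 4, 0)}, speech_seconds=3600))
```

Because β is large, a single false alarm per hour of speech costs far more than a single miss, which is why ATWV values such as those of ST and LG can sit near zero.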