BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012


  1. BUT2012
     Brno University of Technology, Faculty of Information Technology, Speech@FIT
     Igor Szöke, Michal Fapšo, Karel Veselý
     MediaEval 2012 workshop – SWS task, October 4-5, 2012, Pisa
  2. Outline
     • Systems overview & underlying technologies
     • PhnRec, R-AKWS, AKWS – primary system
     • DTW
     • (GMM/HMM), not submitted
     • Calibration
     • Results and discussion
  3. System overview
     • Our internal task was
       − to build a simple and minimalistic language-dependent Query-by-Example (QbE) system.
     • Ingredients
       − Development data, neural net classifier, phoneme recognizer, acoustic keyword spotting, DTW, calibration
  4. System overview
     • Sentence mean normalization
     • Neural network based features
       − bottle-necks
       − three state phone posteriors
     • Query detector
       − AKWS
       − DTW
       − (GMM/HMM), not submitted to the evals

                   Bottle-Neck   Posteriors
     AKWS          -             X
     DTW           X             X
     (GMM/HMM)     X             -
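Sentence mean normalization, listed on the system overview slide, can be illustrated with a minimal sketch. The slides do not say at which stage it is applied or whether variance is also normalized, so the version below only subtracts the per-utterance mean of each feature dimension. The function name and the toy data are hypothetical.

```python
def sentence_mean_normalize(frames):
    """Subtract the per-utterance mean from every feature dimension.

    frames: list of equal-length feature vectors (one per frame).
    Assumption: mean-only normalization; the slides give no further detail.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each feature dimension over the whole utterance.
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy utterance: 3 frames, 2 feature dimensions (made-up values).
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = sentence_mean_normalize(utterance)
```

After normalization each dimension of the utterance has zero mean, which removes a constant per-recording offset before the features reach the classifier or detector.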
  5. Underlying technologies
     • Universal context, bottle-neck neural network based classifier
     • devC state re-alignment, reduced phone set (50 phonemes)
     • Trained by Tnet, our tool, publicly available
  6. Phnrec, R-AKWS, AKWS
     • Phoneme recognizer: free phone loop, devC 66.02% PAC
     • R-AKWS: queries extracted from phone alignment
     • AKWS: queries extracted from phone recognizer

               devQ - devC                   devQ - evalC
               MTWV    MTWVcalib   UBTWV     MTWV    MTWVcalib   UBTWV
     R-AKWS    0.739   0.786       0.859     0.653   0.703       0.789
     AKWS      0.452   0.493       0.600     0.377   0.429       0.552
  7. DTW
     • Used as a baseline. Bottle-necks are better than posteriors.

               devQ - devC                   evalQ - evalC
               MTWV    MTWVcalib   UBTWV     MTWV    MTWVcalib   UBTWV
     R-AKWS    0.739   0.786       0.859     -       -           -
     AKWS      0.452   0.493       0.600     0.470   0.530       0.672
     DTW       0.400   0.468       0.552     0.426   0.488       0.599
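The DTW baseline can be pictured with a small subsequence-DTW sketch. The slides do not state the local distance, the step pattern or the score normalization, so everything below is an assumption: Euclidean frame distance, the three classic steps (vertical, horizontal, diagonal), a free start and end inside the utterance, and division of the accumulated cost by the query length. All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance, dist=euclidean):
    """Find the best match of `query` anywhere inside `utterance`.

    Returns (cost, start, end): accumulated distance divided by the query
    length, and the inclusive frame range of the match in the utterance.
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    # cost[i][j]: best accumulated distance for query[:i+1] ending at utterance[j]
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: utterance frame where that best path began
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # Free start: the first query frame may align to any utterance frame.
        cost[0][j] = dist(query[0], utterance[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]  # vertical step
            if j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))  # diagonal
                candidates.append((cost[i][j - 1], start[i][j - 1]))  # horizontal
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + dist(query[i], utterance[j])
            start[i][j] = best_start
    # Free end: pick the utterance frame with the lowest final cost.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


# Toy 1-dimensional features: the query occurs at utterance frames 1..2.
query = [[0.0], [1.0]]
utterance = [[5.0], [0.0], [1.0], [5.0]]
result = subsequence_dtw(query, utterance)
```

In a real system the frames would be bottle-neck or posterior vectors, and one detection score per candidate region would be passed on to calibration.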
  8. GMM/HMM
     • Inspired by AKWS, not submitted due to bad results.

                MTWV    MTWVcalib   UBTWV
     R-AKWS     0.739   0.786       0.859
     AKWS       0.452   0.493       0.600
     DTW        0.400   0.468       0.552
     GMM/HMM    0.011   -           0.336
  9. Calibration
     • TWV is pooled; UBTWV is non-pooled TWV (each term has its best threshold).
     • Calibration of scores: linear combination of 12 parameters (6 features with linear and quadratic forms). Trained on UBTWV thresholds.
       − Query length (w/o outer sil), Length of inner sil
       − Score average global, Score average by phonemes
       − Phonemes count, Detections count
     • We found that Detections count and Length of inner sil work the best for AKWS (after evals).

     Parameter                      Training error AKWS   Training error DTW
     Detections count               0.1272                0.002115
     Length of inner sil            0.1577                0.002687
     Query length (w/o outer sil)   0.1626                0.002773
     Score average global           0.1635                0.002530
     Phonemes count                 0.1656                0.002779
     Score average by phonemes      0.1660                0.002746
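The "6 features with linear and quadratic forms" description can be sketched as a feature expansion followed by a weighted sum. This is only an illustration: the slide does not say how the 12 weights are fitted or how the result is applied to a detection score. The sketch assumes the linear combination predicts a per-query threshold (in the spirit of "trained on UBTWV thresholds") that is subtracted from the raw score. The feature keys, the weights and the `calibrate` step are hypothetical.

```python
# Hypothetical keys for the six calibration features named on the slide.
FEATURE_NAMES = [
    "query_length",           # query length without outer silence
    "inner_sil_length",       # length of inner silence
    "score_avg_global",       # score average, global
    "score_avg_by_phonemes",  # score average by phonemes
    "phonemes_count",
    "detections_count",
]


def expand(features):
    """Return the 12-dimensional vector: 6 linear terms, then 6 squared terms."""
    values = [features[name] for name in FEATURE_NAMES]
    return values + [v * v for v in values]


def predict_threshold(features, weights):
    """Linear combination of the 12 expanded terms (no bias term assumed)."""
    x = expand(features)
    if len(weights) != len(x):
        raise ValueError("expected one weight per expanded feature")
    return sum(w * v for w, v in zip(weights, x))


def calibrate(score, features, weights):
    """Assumed use: shift the raw score by the predicted per-query threshold."""
    return score - predict_threshold(features, weights)


# Made-up example: only the linear detections-count term has a non-zero weight.
example_features = {
    "query_length": 0.8, "inner_sil_length": 0.1, "score_avg_global": -1.2,
    "score_avg_by_phonemes": -1.0, "phonemes_count": 6, "detections_count": 4,
}
example_weights = [0.0] * 5 + [0.5] + [0.0] * 6
shifted = calibrate(3.0, example_features, example_weights)
```

With per-query shifts of this kind, a single global decision threshold can behave more like the per-term thresholds that UBTWV assumes.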
  10. Calibration AKWS
  11. Conclusion

               devQ-devC                 evalQ-evalC
               ATWV    MTWV    UBTWV     ATWV    MTWV    UBTWV
      AKWS     0.488   0.502   0.600     0.522   0.553   0.672
      DTW      0.443   0.468   0.552     0.448   0.488   0.599

      Bracketed AKWS values, in the order listed on the slide: (0.488) (0.452) (0.492) (0.530)

      • AKWS with new calibration (submitted in brackets)
      • Good and consistent data, enough to train a good Phnrec
      • GMM/HMM does not perform well in the in-language condition with 1 example per query (it was our best system last year)
      • Number of detections is an important calibration feature (due to TWV)
      • Future work: detections calibration, system fusion
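The ATWV, MTWV and UBTWV columns belong to the term-weighted value family of metrics. The sketch below assumes the usual NIST-style form, TWV(θ) = 1 − average over terms of [P_miss(θ) + β · P_FA(θ)]. MTWV is taken as the maximum over one pooled threshold and UBTWV as the value when every term gets its own best threshold, as described on the calibration slide. ATWV would be the same `twv` function evaluated at a threshold fixed in advance. The value of β, the number of non-target trials and the toy detections are made up; the evaluation's actual settings are not given in the slides.

```python
def term_cost(detections, n_true, n_nontarget, threshold, beta):
    """P_miss + beta * P_FA for one term at one threshold.

    detections: list of (score, is_hit) pairs for this term.
    """
    hits = sum(1 for score, is_hit in detections if score >= threshold and is_hit)
    false_alarms = sum(1 for score, is_hit in detections
                       if score >= threshold and not is_hit)
    return (1.0 - hits / n_true) + beta * false_alarms / n_nontarget


def candidate_thresholds(terms):
    """All observed scores plus +inf (the 'accept nothing' threshold)."""
    scores = {score for term in terms for score, _ in term["detections"]}
    return sorted(scores) + [float("inf")]


def twv(terms, threshold, beta, n_nontarget):
    """Pooled TWV: one threshold shared by all terms."""
    costs = [term_cost(t["detections"], t["n_true"], n_nontarget, threshold, beta)
             for t in terms]
    return 1.0 - sum(costs) / len(costs)


def mtwv(terms, beta, n_nontarget):
    """Maximum TWV over a single pooled threshold."""
    return max(twv(terms, th, beta, n_nontarget)
               for th in candidate_thresholds(terms))


def ubtwv(terms, beta, n_nontarget):
    """Upper-bound TWV: each term is scored at its own best threshold."""
    costs = []
    for t in terms:
        costs.append(min(
            term_cost(t["detections"], t["n_true"], n_nontarget, th, beta)
            for th in candidate_thresholds([t])))
    return 1.0 - sum(costs) / len(costs)


# Two toy terms whose scores live on different scales.
toy_terms = [
    {"n_true": 1, "detections": [(0.9, True), (0.5, False)]},
    {"n_true": 1, "detections": [(0.4, True), (0.1, False)]},
]
pooled_best = mtwv(toy_terms, beta=1.0, n_nontarget=4)
per_term_best = ubtwv(toy_terms, beta=1.0, n_nontarget=4)
```

In this toy case no single threshold separates hits from false alarms for both terms, so the pooled maximum stays below the per-term upper bound. That gap is what score calibration tries to close.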
  12. Like / Dislike / Next evals?
      • Like:
        − Adapted TWV, real KWS scoring
        − Phone alignment provided
        − Good data, great work of organizers
      • "Dislike":
        − No test data alignment
        − No speaker information
      • Next evals:
        − More examples per query?
        − Provide query and the query sentence (adaptation issue)?
        − Non-pooled scoring metric?
        − We would like to share our features (more at the poster session)
  13. Thank you for your attention.
