The Spoken Web Search Task

Transcript

  • 1. MediaEval 2012 Spoken Web Search
       Florian Metze, Marelie Davel, Etienne Barnard, Xavier Anguera, Guillaume Gravier, and Nitendra Rajput
       Pisa, October 4, 2012
  • 2. Outline
       - The Spoken Web Search Task
       - Data and Scoring
       - Organizers and Participants
       - Results
       - Discussion
  • 3. Organizers
       - Florian Metze (Carnegie Mellon)
       - Etienne Barnard, Marelie Davel, Charl v. Heerden (North-West University)
       - Xavier Anguera (Telefonica Research)
       - Guillaume Gravier (IRISA)
       - Nitendra Rajput (IBM India)
  • 4. Real-life audio content is very diverse! ("2011 Indian Data")
  • 5. Spoken Web Search Task: Motivation
       Any speech problem can be solved with enough money, time, constraints, and data. What if we have just one constraint?
       - We don't know what language or dialect is being used, and we don't have much data!
       - But we don't have to do Large Vocabulary Speech Recognition, "only" content retrieval.
       What can be done?
       - Port outside resources (i.e., run a language-independent or language-portable recognizer)
       - Build a "zero-knowledge" approach (i.e., try to directly identify similar words)
  • 6. Primary Data Source: "African Data"
       The "Lwazi" Corpus ("lwazi" means knowledge):
       - Data obtained during a targeted effort; meant as a resource for speech research, so no "found" data, unlike the "Indian Data"
       - The Lwazi project aims to develop a telephony-based, speech-driven information system
       - 11 South African languages, 3h-6h of speech per language
       - Phone sets, dictionaries, read & spontaneous speech, ...
       - 3200 utterances used, from 4 languages
       Reference: E. Barnard, M. Davel, and C. van Heerden, "ASR corpus design for resource-scarce languages," in Proc. INTERSPEECH, Brighton, UK, Sep. 2009, pp. 2847-2850.
  • 7. Evaluation Paradigm: Spoken Term Detection (STD)
       - Do not attempt to convert speech to text (full recognition, ASR)
       - Attempt to detect the occurrence (or absence) of "keywords"
       - STD is not easier than ASR, but it requires fewer resources: in particular, no strong language model
       Evaluation metrics:
       - (Spoken) Document Retrieval (SDR), when relaxing time constraints
       - Actual Term Weighted Value (ATWV and MTWV, defined by NIST)
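The term-weighted value behind ATWV and MTWV can be sketched in a few lines. This is an illustrative implementation of the NIST STD 2006 definition, TWV = 1 - average over terms of (P_miss + beta * P_FA); the per-term counts below are invented for the example:

```python
def twv(per_term, beta=999.9):
    """Term Weighted Value: 1 minus the average, over all terms, of
    P_miss + beta * P_false_alarm (NIST STD 2006 definition)."""
    penalties = []
    for n_true, n_hit, n_fa, n_nontarget in per_term:
        p_miss = 1.0 - n_hit / n_true  # fraction of true occurrences missed
        p_fa = n_fa / n_nontarget      # false alarms per non-target trial
        penalties.append(p_miss + beta * p_fa)
    return 1.0 - sum(penalties) / len(penalties)

# Hypothetical counts per term:
# (true occurrences, hits, false alarms, non-target trials)
terms = [(10, 8, 1, 100000), (5, 5, 0, 100000)]
print(round(twv(terms), 3))  # -> 0.895
```

A perfect system scores 1; because false alarms are weighted heavily by beta, a system that fires too liberally can score below 0, as the slide notes.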
  • 8. Evaluation Idea: 4 Conditions
       - Test development terms on (known) development data
       - Test (unknown) evaluation terms on (unknown) evaluation data
       - Test development terms on evaluation data
       - Test evaluation terms on development data
       Terms were provided as audio examples taken from the collections. Systems could be developed with or without external resources (i.e., other speech data); it is important to document which ones were used ("restricted" vs. "open").
  • 9. NIST Scoring Tools
       Developed for the 2006 Spoken Term Detection evaluation:
       - Generate "Actual" and "Maximum Term Weighted Value" (ATWV, MTWV)
       - Generate DET curves
       Adapted by us; the file formats are:
       - ECF = "Experiment Control File" (controls which sections to process)
       - RTTM = "Rich Transcription Time Mark" (defines the references)
       - TLIST = "Term List" files (link term IDs and the word dictionary)
       There are a few parameters to choose; they differed between 2011 and 2012, to better represent the characteristics of the SWS task (thanks, Xavi). The best possible ATWV is 1; values below 0 are possible.
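As an illustration of the RTTM reference format named above, a minimal reader for its word (LEXEME) entries might look as follows. This is a sketch: the 9-field, space-separated layout follows the NIST Rich Transcription convention, and the sample line and file ID are invented:

```python
def read_rttm_lexemes(lines):
    """Collect (file, start, duration, orthography) tuples from the
    LEXEME records of an RTTM reference file. RTTM lines have 9
    space-separated fields: type, file, channel, tbeg, tdur,
    orthography, subtype, speaker, confidence."""
    refs = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "LEXEME":
            continue  # skip SPEAKER, NON-LEX, etc.
        _type, file_id, _chan, tbeg, tdur, ortho = fields[:6]
        refs.append((file_id, float(tbeg), float(tdur), ortho))
    return refs

# Invented sample record for illustration
sample = ["LEXEME sws_audio_001 1 12.34 0.56 lwazi lex spkr1 0.5"]
print(read_rttm_lexemes(sample))
```

The scoring tool matches a system's hypothesized detections against these timed references to count hits, misses, and false alarms per term.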
  • 10. How to Interpret DET Plots
       [DET plot: miss probability vs. false alarm probability ("Combined DET Plot", float-primary-test, Max Val=0.173 Scr=1.276), with a random-performance line]
       - The most useful plot: if done right, it gives you P(Miss) over P(FA) for all decision scores
       - A "marker" shows the actual decision threshold; if it is computed using the score, it will lie on the curve
       - Used for evaluation (together with score.occ.txt)
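The DET curve described above is obtained by sweeping a decision threshold over all system scores and recording the resulting error rates. A minimal sketch, with invented detections:

```python
def det_points(scores_labels, n_true, n_trials):
    """Sweep the decision threshold over every score the system produced
    and return (P_FA, P_miss) pairs -- the points of a DET curve."""
    points = []
    for threshold, _ in sorted(scores_labels):
        hits = sum(1 for s, is_hit in scores_labels if s >= threshold and is_hit)
        fas = sum(1 for s, is_hit in scores_labels if s >= threshold and not is_hit)
        p_miss = 1.0 - hits / n_true  # true occurrences not detected
        p_fa = fas / n_trials         # false alarms per trial
        points.append((p_fa, p_miss))
    return points

# Hypothetical detections: (score, whether it matches a true occurrence)
dets = [(0.9, True), (0.7, False), (0.6, True), (0.2, False)]
for p_fa, p_miss in det_points(dets, n_true=3, n_trials=1000):
    print(p_fa, p_miss)
```

Raising the threshold moves along the curve toward fewer false alarms and more misses; the "marker" on the evaluation plots is simply the point corresponding to the system's actual decision threshold.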
  • 11. 2012 Spoken Web Search Participants
       - Haipeng Wang and Tan Lee: CUHK System for the Spoken Web Search Task at MediaEval 2012
       - Cyril Joder, Felix Weninger, Martin Wöllmer and Björn Schuller: The TUM Cumulative DTW Approach for the MediaEval 2012 Spoken Web Search Task
       - Andi Buzo, Horia Cucu, Mihai Safta, Bogdan Ionescu, and Corneliu Burileanu: ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection
       - Alberto Abad and Ramón F. Astudillo: The L2F Spoken Web Search System for MediaEval 2012
       - Jozef Vavrek, Matus Pleva and Jozef Juhar: TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
       - Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, German Bordel, and Mireia Diez: GTTS System for the Spoken Web Search Task at MediaEval 2012
       - Igor Szoke, Michal Fapšo, and Karel Veselý: BUT 2012 Approaches for Spoken Web Search, MediaEval 2012
       - Aren Jansen, Benjamin Van Durme, and Pascal Clark: The JHU-HLTCOE Spoken Web Search System for MediaEval 2012
       - Xavier Anguera (TID): Telefonica Research System for the Spoken Web Search Task at MediaEval 2012
  • 12. Summary of (Primary) Results
       Team        System                                Type        Dev     Eval
       CUHK        cuhk_phnrecgmmasm_p-fusionprf_1       open        0.7824  0.7430
       CUHK        cuhk_spch_p-gmmasmprf_1               restricted  0.6776  0.6350
       L2F         l2f_12_spch_p-phonetic4_fusion_mv_1   open        0.5313  0.5195
       BUT         BUT_spch_p-akws-devterms_1            open        0.4884  0.4918
       BUT         BUT_spch_g-DTW-devterms_1             open        0.4426  0.4477
       JHU-HLTCOE  jhu_all_spch_p-rails_1                restricted  0.3811  0.3688
       TID         sws2012_IRDTW                         restricted  0.3866  0.3301
       TUM         tum_spch_p-cdtw_1                     restricted  0.2628  0.2895
       ARF         arf_spch_p-asrDTWAlign_w15_a08_b04    open        0.4109  0.2448
       GTTS        gtts_spch_p-phone_lattice_1           open        0.0978  0.0809
       TUKE        tuke_spch_p-dtwsvm                    restricted  0.0000  0.0000
  • 13. Development Data, Development Terms
       [DET plot (miss probability vs. false alarm probability, with random-performance line); MTWV per submission: ARF 0.471, 0.491, 0.253, 0.487; BUT 0.468, 0.493; CUHK 0.735, 0.751, 0.787, 0.631, 0.680; JHU-HLTCOE 0.382; L2F 0.531; TUKE 0.000; TUM 0.354, 0.337, 0.270; TID 0.390, 0.375; GTTS 0.098, 0.105]
  • 14. Development Data, Evaluation Terms
       [DET plot (miss probability vs. false alarm probability, with random-performance line); MTWV per submission: ARF 0.443, 0.475, 0.016, 0.224, 0.466; BUT 0.481, 0.629; CUHK 0.769, 0.772, 0.805, 0.687, 0.686; JHU-HLTCOE 0.440; L2F 0.633; TUKE 0.000, 0.257; TUM 0.201, 0.396; TID 0.498, 0.300; GTTS 0.083, 0.109]
  • 15. Evaluation Data, Development Terms
       [DET plot (miss probability vs. false alarm probability, with random-performance line); MTWV per submission: ARF 0.317, 0.339, 0.000, 0.167, 0.333; BUT 0.383, 0.429; CUHK 0.707, 0.715, 0.752, 0.561, 0.620; JHU-HLTCOE 0.336; L2F 0.486; TUKE 0.000; TUM 0.236, 0.291, 0.174; TID 0.314, 0.472; GTTS 0.070, 0.081]
  • 16. Evaluation Data, Evaluation Terms
       [DET plot (miss probability vs. false alarm probability, with random-performance line); MTWV per submission: ARF 0.268, 0.310, 0.001, 0.120, 0.306; BUT 0.488, 0.530; CUHK 0.724, 0.742, 0.762, 0.589, 0.643; JHU-HLTCOE 0.384; L2F 0.523; TUKE 0.000; TUM 0.187, 0.164, 0.296; TID 0.342, 0.311; GTTS 0.070, 0.081]
  • 17. Spoken Web Search Task: Summary 1
       Second time around:
       - Last year's participants (mostly) became organizers
       - Grew from 5 to ca. 10 participants!
       - Europe, America, Asia, Africa (where are Australia and Antarctica?)
       Interesting differences in performance:
       - Thank you, all participants! It was fun and interesting.
       - Are the evaluation criteria useful and correct?
  • 18. Spoken Web Search Task: Summary 2
       We could talk a bit about JHU-HLTCOE's "RAILS" system.
       Next steps?
       - Do more joint analysis (hopefully everybody's results agree with ours?)
       - Shared publications? ICASSP? A journal?
       - Develop the task further for next year?
       - The "Speech Kitchen" idea will be presented later ...
  • 19. Thank You!
  • 20. How to Interpret the *.occ.txt File
       - Coefficients C, V: weighting of correct vs. incorrect detections
       - Probability of a Term: the expected frequency of terms
       - Average and Maximum TWV
       - P(FA) and P(Miss)
       - Optimal decision score
       Caveats:
       - The values used for padding and multi-term detections are missing
       - In some rare cases, it lists different values for the total and for a sub-class only
       - Was expecting more questions
  • 21. Parameters Used
       The tools assume you use a "decision score"; this enables plotting of DET curves, but can be confusing:
       - Submit "candidates" with a score lower than the cutoff
       - Submit "detections" with a score higher than the cutoff
       We used different parameters for the African and Indian data sets, to reflect their different use cases:
       - KoefV/KoefC are debatable: what is the cost of a wrong detection, and the benefit of a correct one?
       - -P (probability of term): how frequent are terms expected to be?
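The interaction between KoefC/KoefV and the -P probability-of-term can be made concrete: in the NIST STD 2006 formulation, they combine into the false-alarm weight beta used in TWV = 1 - (P_miss + beta * P_FA). A small sketch; the defaults shown are the NIST 2006 values, not necessarily the settings chosen for SWS 2012:

```python
def fa_weight(koef_c=0.1, koef_v=1.0, p_term=1e-4):
    """False-alarm weight beta = (C/V) * (1/P_term - 1), derived from
    the cost of a false alarm (C), the value of a correct detection (V),
    and the prior probability that a trial contains the term."""
    return (koef_c / koef_v) * (1.0 / p_term - 1.0)

print(fa_weight())              # NIST STD 2006 defaults
print(fa_weight(p_term=1e-2))   # terms assumed 100x more frequent
```

The rarer a term is assumed to be (smaller -P), the larger beta becomes and the more harshly each false alarm is punished, which is exactly why these parameters are worth debating per data set.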
  • 22. How to Interpret score.det.thresh.pdf
       [Threshold plot for float-primary-test, all data: P(Miss), P(FA), and the resulting TWV as functions of the decision score; MaxValue 0.173 @ 1.276]
       Can be used to analyze decision-score behavior:
       - P(FA): false alarms
       - P(Miss): missed detections
       - The resulting TWV
  • 23. [Bar chart: Dev-Dev MTWV-ATWV differences per submission; y-axis 0 to 0.1]
  • 24. [Bar chart: Eval-Eval MTWV-ATWV differences per submission; y-axis 0 to 0.1]
  • 25. [Bar chart: Dev-Eval MTWV-ATWV differences per submission; y-axis 0 to 0.25]
  • 26. [Bar chart: Eval-Dev MTWV-ATWV differences per submission; y-axis 0 to 0.25]