
The JHU-HLTCOE Spoken Web Search System for MediaEval 2012



  1. The JHU-HLTCOE Spoken Web Search System
     Aren Jansen, Benjamin Van Durme, Pascal Clark
     MediaEval 2012 Spoken Web Search Task, October 4, 2012
  2. Scalable Segmental DTW Search [Jansen & Van Durme, Interspeech 2012]
     • Randomized Acoustic Indexing for Logarithmic-Time Search (RAILS)
       1. Map each frame into a sortable bit signature using locality-sensitive hashing
       2. Maintain a sorted list (index) of the signatures
       3. Use the index to construct nearest-neighbor sets in logarithmic time
       4. Search for runs of similar frames with sparse image processing techniques
  3. Features and Scoring
     • Features:
       – Frequency domain linear prediction [Hermansky et al.]
       – 5 cepstral coefficients + velocity + acceleration (15D)
     • Score Normalization:
       – Split queries around non-speech regions, search each subsegment separately, and sum the DTW scores
       – Per-query z-normalization of DTW scores
  4. Results

     Query Set   Search Collection   MTWV    ATWV
     Dev         Dev                 0.381   0.381
     Dev         Eval                0.336   0.321
     Eval        Dev                 0.439   0.421
     Eval        Eval                0.384   0.369

     • z-normalization gives reliable scores across sets
       – Small gap between MTWV and ATWV with threshold selected on dev-dev
     • Evaluated system runs 1,000X faster than real-time on 1 CPU
     • Can find hits at over 500,000X real-time (but with reduced ATWV)
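
The indexing steps on the RAILS slide (bit signatures, a sorted index, logarithmic-time neighbor lookup) and the per-query z-normalization on the scoring slide can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes sign-of-random-projection hashing as the locality-sensitive hash and a single sorted list searched with a fixed window; the slides do not specify either choice. All function names and the `beam` parameter are hypothetical. The run-detection step (step 4) and the DTW scoring itself are omitted.

```python
import bisect

import numpy as np


def lsh_signatures(frames, planes):
    """Hash each frame to an integer bit signature.

    Assumption: one bit per random hyperplane (sign of the projection).
    frames: (n_frames, dim) array; planes: (n_bits, dim) array, n_bits <= 62.
    """
    bits = (frames @ planes.T) > 0                      # (n_frames, n_bits)
    n_bits = planes.shape[0]
    weights = 2 ** np.arange(n_bits, dtype=np.int64)[::-1]
    return (bits.astype(np.int64) * weights).sum(axis=1)


def build_index(signatures):
    """Sorted list of (signature, frame_id) pairs: the 'index'."""
    return sorted((int(sig), i) for i, sig in enumerate(signatures))


def neighbors(index, query_sig, beam=4):
    """Binary-search the index, then return frame ids near the hit.

    The lookup costs O(log n) comparisons; `beam` (hypothetical) is the
    number of index positions taken on each side of the insertion point.
    """
    pos = bisect.bisect_left(index, (int(query_sig), -1))
    lo, hi = max(0, pos - beam), min(len(index), pos + beam)
    return [frame_id for _, frame_id in index[lo:hi]]


def z_normalize(scores):
    """Per-query z-normalization: zero mean, unit variance over one query's scores."""
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0:
        return np.zeros_like(scores)
    return (scores - scores.mean()) / std
```

Because entries that are adjacent in the sorted list share their leading bits, a windowed lookup of this kind returns approximate rather than exact nearest neighbors. Normalizing each query's scores separately puts them on a common scale, which is what allows one detection threshold to be applied across queries.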