
The Speech Recognition Virtual Kitchen


Published in: Technology

  1. 1. The Speech Recognition Virtual Kitchen
     Florian Metze and Eric Fosler-Lussier
     INTERSPEECH 2012
  2. 2. Multimedia Retrieval and Summarization
     “Traditional” Multimedia Retrieval and Summarization
       • Select frames and shots that are most informative
       • Save user time by avoiding repetitions etc. (BBC Rushes Summarization)
     Recent Advances in Natural Language Processing
       • Replace “extractive” summarization of text with “abstractive” techniques
       • Use Statistical Machine Translation as a general technique to convert a long “foreign” symbol sequence into concise English text
     Would this not apply nicely to multimedia?
       • Easily have huge amounts of data
       • “Skimming”, “tagging” with keywords, or “liking” clearly doesn’t do justice to the relevance, complexity and potential of multimedia
  3. 3. What’s Next?
     Generate more detailed synopses, add temporal aspects, properties
     Add more modalities (sounds, etc.)
     “What is in these videos?”
       • Text could summarize multiple videos at once
       • Attract interest to (groups of) videos
     “Why is this video relevant? Or different?”
       • Text can relate a retrieved video to the query
       • Text can potentially flag false alarms, outliers
  4. 4.  Thank You!
  5. 5. Feature Definition
       • Extract candidates for relevant objects from “Event Kit”
       • Determine salient objects from MED features
       • Intersect both sets
       • Use ontologies to resolve synonyms, etc.
       • Combine data-driven and knowledge-based sources
     Example “Event Kit”:
       Event name: Changing a vehicle tire
       Definition: One or more people work to replace a tire on a vehicle
       Explication: A vehicle is any device, motorized or not, used to transport people and/or other items. Tires are ring-shaped inflated objects, usually made of rubber, that fit over the wheel of a vehicle. The process for replacing a tire includes removing the existing tire and installing the new tire onto the wheel of the vehicle. Tires typically are replaced because they are damaged or worn down. If a tire is damaged and loses air pressure as a result, it is called a "flat tire". Generally the driver of the vehicle with a flat tire will stop the vehicle as soon as possible and replace the affected tire with a temporary tire called a "spare tire", which may be stored elsewhere on/in the vehicle. In other cases, the tire may be changed not by the vehicle operator, but by a professional (e.g. a mechanic) who may use dedicated tools and work in a repair shop or similar setting.
       Evidential description: scene: garage, outdoors, street, parking lot
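The "intersect both sets" step on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for an ontology lookup, the detected concept labels are made up for illustration, and candidate extraction is reduced to plain word tokens from the Event Kit definition.

```python
import re

# Hypothetical synonym table standing in for an ontology lookup (assumption).
SYNONYMS = {"tyre": "tire", "car": "vehicle", "automobile": "vehicle"}

def normalize(term):
    """Lower-case a term and map it to a canonical form if one is known."""
    term = term.lower().strip()
    return SYNONYMS.get(term, term)

def candidates_from_text(text):
    """Return the set of normalized word tokens found in the Event Kit text."""
    return {normalize(tok) for tok in re.findall(r"[a-z]+", text.lower())}

def salient_objects(event_kit_text, detected_concepts):
    """Intersect text candidates with concepts detected in the video."""
    text_terms = candidates_from_text(event_kit_text)
    detected = {normalize(c) for c in detected_concepts}
    return text_terms & detected

kit = "One or more people work to replace a tire on a vehicle"
detected = ["Car", "tyre", "dog", "garage"]  # illustrative detector labels
print(sorted(salient_objects(kit, detected)))  # → ['tire', 'vehicle']
```

Only concepts supported by both the textual definition and the video-side detections survive, which is one simple way to combine a knowledge-based source with a data-driven one.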
  6. 6. MER Approach: Feature Extraction
     What to mention:
       • Take visual evidence (for 100s of classes) for video
       • Re-rank using manually determined “importance”
     How to mention:
       • Present as corroborating or contraindicative evidence
       • Place additional constraints
     Similar for ASR hypotheses
       • Based on unigrams for now
     Move from “hand-engineered” to automatic methods
       • Now: similar to TF-IDF measure, BOW features
       • Future: Bipartite graph matching to determine “good” concepts
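The slide describes re-ranking visual evidence with a measure "similar to" TF-IDF plus manually determined importance. The sketch below shows one way such a score could look; it is an assumption, not the method used in the talk. The function name `rank_concepts`, the detection `threshold`, the smoothed IDF formula, and the example scores are all hypothetical.

```python
import math

def rank_concepts(video_scores, collection, importance=None, threshold=0.5):
    """Rank concepts for one video with a TF-IDF-style weight.

    video_scores: dict concept -> detector confidence for this video ("tf")
    collection:   list of such dicts, one per video, used for document frequency
    importance:   optional dict concept -> manual importance weight (default 1.0)
    """
    importance = importance or {}
    n = len(collection)
    ranked = []
    for concept, score in video_scores.items():
        # Document frequency: videos in which the concept is confidently detected.
        df = sum(1 for v in collection if v.get(concept, 0.0) >= threshold)
        # Smoothed inverse document frequency (an assumed, common variant).
        idf = math.log((1 + n) / (1 + df)) + 1.0
        ranked.append((concept, score * idf * importance.get(concept, 1.0)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Illustrative detector outputs for three videos (made-up numbers).
collection = [
    {"person": 0.90, "tire": 0.80},
    {"person": 0.95, "dog": 0.70},
    {"person": 0.90, "street": 0.60},
]
ranking = rank_concepts(collection[0], collection)
print(ranking[0][0])  # "tire" outranks "person", which appears in every video
```

A concept that fires in every video, such as "person" here, is down-weighted relative to a rarer one, so the rarer concept is preferred when deciding what to mention.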
  7. 7. INTERSPEECH AFTERPARTY: “Speech Recognition Virtual Kitchen”
     Broadway 3 & 4, 4:30pm on Thursday, September 13
       • We want your input to grow this idea further: show your support
       • Come and see more demos of VMs
       • Discuss with potential users or content providers from outside the speech community
       • Present your own ideas in a short presentation(?)
     [Diagram: a Kitchen Server and a Third-Party Server, each with SW/VM and Data repositories (➀, ➁), feed a Virtual Machine and Data on the Host PC (➂). Virtual Machine contents: Software and Data, Example Scripts, Tutorials, Reference/Sample Log-files, …]