Open Text: Speech recognition in Opencast Matterhorn

Introduction and overview of activities to add CMU Sphinx speech recognition to Opencast Matterhorn.

Published in: Education
  1. Open Text: Speech recognition in Opencast Matterhorn
     Stephen Marquard, Centre for Educational Technology, University of Cape Town, June 2011
  2. Project goals
       • Integrate CMU Sphinx speech recognition engine into Opencast Matterhorn
       • Provide easy mechanism for speaker training
       • Generate automatic transcripts of recorded lectures
       • Allow users to correct and improve the transcripts
       • Use feedback to improve recognition accuracy (of the same, similar or subsequent recordings)
  3. Why is it important?
       • Video and audio are more useful if you can:
           • Navigate it easily
           • Locate relevant recordings from a large set
       • Use by students:
           • Catch up on missed lectures (continuous play or read the transcript)
           • Revision: jump to a particular point or find the lectures which cover topic X
       • On the public web:
           • Discoverability (search indexing)
       • Similar advantages to OCR recognition of slides (but harder)
  4. Why is it difficult?
       • Audio quality can dramatically affect speech recognition accuracy
           • Echo and reverberation
           • Background noise
           • Microphone location
       • Speaker-independent large-vocabulary continuous speech recognition is the hardest type of ASR
       • Best case: good acoustics, single speaker (limited dialogue), accent match with the acoustic model, limited vocabulary.
  5. Prior work in ASR for lectures
       • MIT Lecture Browser (SUMMIT recognizer)
       • U. Toronto / ePresence PhD prototype by Cosmin Munteanu (SONIC recognizer)
       • ETH Zurich: integration of CMU Sphinx with REPLAY by Samir Atitallah
  6. Speech recognition software ecosystem
       • Licensing and patents
       • Closed
       • Proprietary
       • FOSS
       • Open
  7.
  8. Accounting for context: Language model adaptation
       • Adapt a language model to more closely resemble the target speech
       • Using related text for
           • Topic modelling (vocabulary, concepts)
           • Style-of-speech modelling
       “ok and um it's quite useful to have a very good diagnostic test of of acute hepatitis um you know to prevent kind of unnecessary um surgery um so hepatitis is really one um example of a cause of acute abdominal pain that doesn't need surgery”
  9. Using Wikipedia for LM adaptation
       • Goal is to adapt a “standard” LM to be specific to the topic of the audio
       • Start somewhere: title, keywords, text from slides
       • Select a set of documents, adapt the LM
       • Using Wikipedia, select by similarity: identify the set of documents most closely related to the starting point or keywords
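The "select by similarity" step on the slide above can be illustrated with a small sketch. The slides do not say which similarity measure was used; the version below assumes TF-IDF weighting with cosine similarity, and the function names (`tokenize`, `select_similar`) and the toy documents are hypothetical stand-ins for real Wikipedia article text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(seed_text, docs, top_k=2):
    """Return up to top_k document names most similar to seed_text.

    ASSUMPTION: TF-IDF + cosine similarity is one plausible measure,
    not necessarily the one used in the project described here.
    docs maps a document name to its plain text.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        name: {t: n * idf[t] for t, n in c.items()} for name, c in counts.items()
    }
    # Seed terms that occur in no candidate document are ignored.
    seed_vec = {t: n * idf[t] for t, n in Counter(tokenize(seed_text)).items() if t in idf}
    scored = sorted(
        ((cosine(seed_vec, vec), name) for name, vec in vectors.items()),
        reverse=True,
    )
    return [name for score, name in scored[:top_k] if score > 0.0]


# Toy stand-ins for article text (hypothetical data).
articles = {
    "French Revolution": "the french revolution jacobin robespierre committee of public safety",
    "Thomas Pynchon": "american novelist pynchon fiction novel",
    "Hepatitis": "hepatitis liver inflammation acute abdominal pain",
}
print(select_similar("robespierre jacobin french revolution", articles, top_k=1))
```

The selected documents would then supply the text used to adapt the language model; that adaptation step is not shown here.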
  10. Baseline performance with Sphinx4 (HUB4 acoustic and language models)
       Lecture audio and transcripts from Open Yale Courses http://oyc.yale.edu/ Used under CC-BY-NC-SA license.
  11. Best-case comparison (30% WER)
       Transcript, HUB4 LM, Wikipedia Similarity LM

       Before launching into Pynchon today, I thought I would just take a few moments to look back over the books that we've read and talk about the visions of language that they have offered us, and also just to reflect for a moment on the relationship imagined between those visions of language and what is happening outside of fiction in what we might call the real world.
       We started this course talking about Black Boy and the way that a whole world of pressure -- political pressure, racial tension -- pushed on the borders of that work and actually changed its very material form.

       before launching into not pynchon today route just take a few moments to look back cover the books that we've brad
       and talk about the visions of language that they have offered up
       and also just to reflect for mounted on the relationship
       imagine between those visions of language
       and what is happening outside of fiction in in what we might call the real
       world we started this course talking about black boy and a weighing bat
       a whole world of pressure political pressure racial tension
       pushed on the borders and that work and actually changed its very nature eel for

       before launching into not mentioned today really does take a few moments to look back over the books that we've read
       and talk about the visions of language that they have offered up
       and also just to reflect for movement on the relationship
       imagine between those visions of language
       and what is happening outside of fiction in in what we might call the reel
       well we started this course talking about black boy and a weighing of that
       a whole world of pressure political pressure of racial tension
       pushed on the borders of bad work and actually changed its very nature eel for
  12. Worst-case comparison (61% WER)
       Transcript, HUB4 LM, Wikipedia Similarity LM

       i'd talk with the french revolution this party do in all the myself will forty-five minutes after throughout beginning
       i'm in seoul on
       on i wanted it to do
       two things unless the revolution through
       the eyes of maps that ulmus piano
       member of a treaty of public safety arguably without fascists
       i'd solicit were not member
       ah is jacobo out into an away he incarnated death jacobin chapel back he imparted the french revolution

       i've talked with the french are loose in this part to do in all the myself low forty five minutes after score of beginning
       i'm in seoul on
       bob and i wanted to do
       two things i want the revolution through
       the eyes of maps that elvis piano
       a member of the treaty of public safety are giveaway with that fascists
       i thought it were not member
       ah gee i go back into a a way he imparted that chappel been the chapel back he imparted the first revolution

       I'm going to talk about the French Revolution.
       It's hard to do.
       I'll leave myself about forty-five minutes after I screw around at the beginning.
       I want to do two things.
       I want to see the Revolution through the eyes of Maximilien de Robespierre, a member of the Committee of Public Safety -- arguably, with Saint-Just, its most important member.
       In a way, Jacobin -- he incarnated the French Revolution.
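The WER (word error rate) figures quoted in the two comparison slides are conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The slides do not state how the scores were produced, so the following is only a minimal sketch of the standard definition; the text normalization (lowercasing, dropping punctuation) is an assumption, and `word_error_rate` is a hypothetical helper name.

```python
import re


def words(text):
    """Lowercase and keep word-like tokens (ASSUMED normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = words(reference)
    hyp = words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only, not a score reported in the slides:
# 2 insertions + 1 substitution against 6 reference words.
print(word_error_rate("I want to do two things.", "on i wanted it to do two things"))
```

Because the denominator is the reference length, insertions can push WER above 100% on very poor output.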
  13. Work in progress
       • Identify requirements for recording recognition-quality audio (equipment, acoustics)
       • Implement dynamic language model adaptation
       • Integrate into Opencast Matterhorn workflow
       • Show transcript to users in UI, enable search
       • Allow users to edit / improve transcript
       • Use edits to improve recognition
  14. Other integration possibilities
       • External transcription services (automate the workflow, choice between manual or automatic transcript)
       • External speech recognition services (e.g. nexiwave.com)
  15. Find out more
       • Email me: stephen.marquard@uct.ac.za
       • Follow me on Twitter: http://twitter.com/stephenmarquard
       • Read my blog on open source language modelling and speech recognition: http://trulymadlywordly.blogspot.com
       • CMU Sphinx: http://cmusphinx.sourceforge.net/