
SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks


Need to find a specific segment in an hour-long web video, webcast, or podcast of a lecture? Want to read a transcript of that lecture? Want to bookmark, annotate, or discuss video or audio clips from an entire lecture? The SpokenMedia project at MIT is developing a web-based service to enable automatic lecture transcription. The project is also developing a suite of tools and services to improve interaction with webcasts and podcasts, enabling students and faculty to create rich media notebooks that support their learning and teaching. Presented by Brandon Muramatsu, Andrew McKinney, and Peter Wilkins at NERCOMP 2010, Providence, Rhode Island, March 9, 2010.


  1. SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks
     Brandon Muramatsu [email_address], Andrew McKinney [email_address], Peter Wilkins [email_address]
     MIT, Office of Educational Innovation and Technology
     Citation: Muramatsu, B., McKinney, A., Wilkins, P. (2010). SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks. Presented at NERCOMP 2010: Providence, Rhode Island, March 9, 2010.
     Unless otherwise specified, this work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
  2. … we now return you to your regularly scheduled presentation …
     SpokenMedia: What to do if your videos aren’t in YouTube
     BREAKING NEWS … YouTube announces captions on all videos … News at 11 …

     YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website: http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html
  3. Why are we doing this?
     - More & more videos on the Web
       - Universities recording course lectures
       - Students relying upon Web video for courses

     MIT OCW 8.01: Professor Lewin puts his life on the line in Lecture 11 by demonstrating his faith in the Conservation of Mechanical Energy.
  4. What video? Where? iTunes U
  5. What are the challenges?
     - Search
       - Volume
       - Segmented by Web, Video
       - Text title and Description

     Google Search for “angular momentum”, performed April 2009
  6. What about Bing?
     Bing Search for “angular momentum”, performed August 2009
  7. What are the challenges?
     - Description
       - Course and Lecture Title
       - Summary
       - Metadata?

     YouTube, MIT OCW Physics 8.01 - Lecture 20, retrieved August 2009
     webcast.berkeley, Physics 8A, 002, Spring 2009, retrieved August 2009
  8. What are the challenges?
     - Interaction & Use
       - Transcripts / captions
         - Do they exist?
         - Cost?
       - Full video vs. segments

     Lewin, W. (1999). Lec 20 | 8.01 Physics I: Classical Mechanics, Fall 1999. Retrieved August 1, 2009 from YouTube Website: http://www.youtube.com/watch?v=ibePFvo22x4
     “GOD!!!51 MINUTES!! i think i'll pass..” – slourdas, YouTube
  9. Search thru the Static
     - We’re living in a video world … but only have text to use for search …

     flickr @ futureatlas.com
  10. Why do we need these tools?
      - Improve search and retrieval
      - Improve user experience
      - Captioning for accessibility?
      - Facilitate translation?
      - Other uses?
  11. YouTube Announcement
      YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website: http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html
  12. Comparing SpokenMedia and YouTube Auto-Caption?
      - YouTube
        - Scale ✔
        - Research-basis ✔
        - For all videos ✔ (soon)
        - No transcript/caption export (?)
        - YouTube hosted
        - Accuracy based on general patterns (?)
        - No transcript editing (?)
      - SpokenMedia
        - Limited scale
        - Research-basis ✔
        - Service by request
        - Transcript/caption export available ✔
        - Hosted anywhere ✔
        - Accuracy based on custom models ✔ (soon)
        - Transcript editing ✔ (soon)
  13. Developing SpokenMedia…
      - What do we have at MIT?
        - Existing videos & audio, new video
        - Lecture notes, slides, etc. (descriptive text)
        - Multiple videos/audio by same lecturer
        - Diverse topics/disciplines
      - Research from Spoken Language Systems Group!!!
  14. Enabling Research
      - Spoken Lecture: research project
      - Speech recognition & automated transcription of lectures
      - Why lectures?
        - Conversational, spontaneous, starts/stops
        - Different from broadcast news, other types of speech recognition
        - Specialized vocabularies

      James Glass [email_address]
  15. Spoken Lecture Project
      - Processor, browser, workflow
      - Prototyped with lecture & seminar video
        - MIT OCW (~300 hours, lectures)
        - MIT World (~80 hours, seminar speakers)
      - Supported with iCampus MIT/Microsoft Alliance funding

      James Glass [email_address]
  16. Tech Transfer Timeline: Research -> Service
      [Timeline figure: axis from 1990 to 2010 with markers at 2006 and 2009; labeled “Spoken Language Systems Group Research”]
  17. Let’s see a demo!
  18. Demo
  19. How Does it Work? Lecture Transcription Workflow
  20. Recognizer Accuracy? ~85%
      - Accuracy
        - Domain Model and Acoustic Model
        - Internal validity measure
        - Single 100% accurate transcript for a full course

      Ongoing research by Jim Glass and his team
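
The slides do not say how the ~85% figure is computed. One common way to score a recognizer is word accuracy, defined as 1 minus the word error rate, where errors are the word-level edit distance between a hand-corrected reference and the recognizer output. The sketch below assumes that definition. The function names are hypothetical, and the reference sentence is an assumed correction of the "fork an angular momentum" example from the Transcript “Errors” slide.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Return 1 - WER, scored against the reference word count."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    errors = word_edit_distance(reference, hypothesis)
    return 1.0 - errors / len(reference)


# Assumed reference (hand-corrected) vs. recognizer output from the slides.
reference = "introduce both torque and angular momentum"
recognized = "introduce both fork an angular momentum"
print(f"{word_accuracy(reference, recognized):.2f}")  # prints 0.67
```

Two of the six reference words are wrong here ("fork" for "torque", "an" for "and"), so this short clip scores well below the deck's ~85%; a score over a whole lecture averages across many such segments.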
  21. What works today? Lecture Transcription Workflow
  22. Transcript “Errors”
      - “angular momentum and forks it’s extremely non intuitive”
        - “folks”?
        - “torques”?
      - “introduce both fork an angular momentum”
        - “torque”!
  23. That’s what we have today…
      - Features
        - Video linked transcripts
        - “Bouncing Ball” follow along
        - Search within a video
        - Multiple transcript language support
      - Challenges
        - Accuracy (partial toolset)
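
Video-linked transcripts, the “bouncing ball”, and search within a video all depend on knowing when each word is spoken. The sketch below assumes the recognizer output is available as a list of words with start times in seconds; the `TimedWord` type, the function names, and the sample timings are all hypothetical and not taken from the SpokenMedia tools.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TimedWord:
    word: str
    start: float  # seconds from the start of the video


def find_phrase(transcript, phrase):
    """Return the start time of every occurrence of the phrase."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [w.word.lower() for w in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i].start)
    return hits


def word_at(transcript, seconds):
    """Return the word being spoken at a playback time, or None before the first word."""
    starts = [w.start for w in transcript]
    index = bisect_right(starts, seconds) - 1
    return transcript[index] if index >= 0 else None


# Made-up timings for illustration only.
transcript = [
    TimedWord("we", 12.0), TimedWord("introduce", 12.3), TimedWord("both", 12.9),
    TimedWord("torque", 13.2), TimedWord("and", 13.7),
    TimedWord("angular", 13.9), TimedWord("momentum", 14.4),
]

print(find_phrase(transcript, "angular momentum"))  # [13.9]
print(word_at(transcript, 13.5).word)               # torque
```

A player could seek to the times returned by `find_phrase` for search, and call `word_at` on each playback tick to decide which word to highlight.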
  24. Where are we heading?
      - Improved accuracy
      - Automate and improve processing
      - Search across multiple video transcripts
      - Starting a lecture transcription service
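
Searching across multiple video transcripts can be illustrated with a small inverted index that maps each word to the video segments containing it. This is a minimal sketch and not the project's design: the data layout (video id mapped to timed text segments), the function names, and the sample lecture ids and text are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each word to the (video_id, start_seconds) segments that contain it.

    transcripts: dict of video_id -> list of (start_seconds, text) segments.
    """
    index = defaultdict(list)
    for video_id, segments in transcripts.items():
        for start, text in segments:
            # set() so a word repeated within a segment is indexed once.
            for word in set(text.lower().split()):
                index[word].append((video_id, start))
    return index


def search(index, query):
    """Return segments that contain every word of the query, sorted by video and time."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], []))
    for word in words[1:]:
        results &= set(index.get(word, []))
    return sorted(results)


# Hypothetical transcripts for illustration only.
transcripts = {
    "lecture-20": [
        (0.0, "today we discuss angular momentum"),
        (95.5, "torque changes angular momentum"),
    ],
    "lecture-11": [
        (30.0, "conservation of mechanical energy"),
    ],
}

index = build_index(transcripts)
print(search(index, "angular momentum"))
# [('lecture-20', 0.0), ('lecture-20', 95.5)]
```

Whitespace tokenization keeps the sketch short; punctuation handling and tolerance for recognizer errors (a "fork" that should be "torque") would need more work in practice.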
  25. Lecture Transcription Service
      - Integrate with media production workflows
        - At MIT, University of Queensland
      - Stand-alone service
        - Test with external content (video) producers
  26. A Lecture Transcription Service? Caveats
      - Lecture-style content (technology optimized)
      - Up to 85% accuracy (good for search, not sure about accessibility)
      - English-language audio (need much more research for other languages)
      - Processing hosted at MIT (current thinking)
        - Submit jobs via MIT-run service
        - Contribute audio, models, transcript for further research
  27. Test it for yourself!
      - http://spokenmedia.mit.edu/
      - http://sm.mit.edu/upload
  28. Toward Rich Media Notebooks: Improving the User Experience
      - Innovative player interfaces (soon)
        - Bookmarking and annotation
        - Clip creation and authoring
      - Transcript editing (soon)
      - Searching across collections of videos
  29. Player with Annotation Mockup
  30. Editing Interfaces: Soon (we’re designing the editing interfaces right now)
  31. Thanks! spokenmedia.mit.edu
      Brandon Muramatsu [email_address], Andrew McKinney [email_address], Peter Wilkins [email_address]
      MIT, Office of Educational Innovation and Technology
