Writing and Speech Recognition


Writing Recognition, Digital Ink, Speech Recognition

Published in: Business, Technology

  1. 1. Speech, Ink, and Slides: The Interaction of Content Channels. Richard Anderson, Crystal Hoyer, Craig Prince, Jonathan Su, Fred Videon, Steve Wolfman. (Repeat intro of self. Mention: Richard, Jonathan in audience.)
  2. 2. Background <ul><li>"Content channels" refers to the various sources of information in a given context (e.g., audio, slides, digital ink, video) </li></ul><ul><li>Our focus is on the use of digital ink in the classroom setting </li></ul><ul><li>We want to capture, play back, and analyze these channels intelligently </li></ul>
  3. 3. Why do we want to analyze content channels? <ul><li>We want to make it easier to interact with electronic materials </li></ul><ul><ul><li>Better search and navigation of presentations </li></ul></ul><ul><ul><li>Accessibility for people with hearing, learning, or visual impairments </li></ul></ul><ul><ul><li>Generating text transcripts </li></ul></ul><ul><ul><li>Recognizing high-level behaviors </li></ul></ul>Conversion to: Braille/Screen Reader
  4. 4. Distance Learning Classes
  5. 5. Classroom Presenter <ul><li>General tool for giving presentations on the Tablet PC </li></ul><ul><li>Many similar systems – our findings applicable to all such systems </li></ul><ul><li>Enables writing directly on the slides </li></ul><ul><li>Tablet PC enables high-quality digital ink </li></ul><ul><li>Used in over 100 courses so far </li></ul><ul><li>Allows us to collect real usage data </li></ul>
  6. 6. Questions We Wanted to Explore <ul><li>High Level Question: What is the potential for automatic analysis of archived content? </li></ul><ul><li>Other Questions: </li></ul><ul><ul><li>How well can digital ink be recognized by itself? </li></ul></ul><ul><ul><li>How closely are different content channels tied together? </li></ul></ul><ul><ul><ul><li>Speech and Ink? </li></ul></ul></ul><ul><ul><ul><li>Ink and Slide Content? </li></ul></ul></ul><ul><ul><li>Can we identify high level behaviors by analyzing the content channels? </li></ul></ul>
  7. 7. Research Methodology <ul><li>We wanted to understand what real presentation data is like </li></ul><ul><li>We collected several hundred hours of recorded lectures from distance learning classes </li></ul><ul><li>We analyzed the data in various ways to help answer our guiding questions </li></ul><ul><ul><li>Note: All examples given here are from real presentations! </li></ul></ul>
  8. 8. Outline <ul><li>Motivation </li></ul><ul><li>Handwriting Recognition </li></ul><ul><li>Joint Writing and Speech Recognition </li></ul><ul><li>Attentional Mark Identification </li></ul><ul><li>Activity Inference: Recognizing Corrections </li></ul>
  9. 9. Handwriting Recognition <ul><li>Classroom lectures on Tablet PC offer interesting challenges for handwriting recognition </li></ul><ul><ul><li>Somewhat Awkward </li></ul></ul><ul><ul><ul><li>Small Surface to Write On </li></ul></ul></ul><ul><ul><ul><li>Bad Angle to the Tablet PC </li></ul></ul></ul><ul><ul><li>Hastily Written </li></ul></ul><ul><ul><ul><li>Concentrating on Speaking </li></ul></ul></ul><ul><ul><ul><li>Excited / Nervous </li></ul></ul></ul>
  10. 10. Recognition Examples <ul><li>The Good: </li></ul><ul><li>The Bad: </li></ul><ul><li>The Ugly: </li></ul>Mark: Success/Failure
  11. 11. Recognition Procedure <ul><li>Studied isolated words/phrases written on slides </li></ul><ul><li>Removed all non-textual ink </li></ul><ul><li>Fed through the Microsoft Handwriting Recognizer </li></ul><ul><li>No training done! </li></ul>
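  12. 12. Handwriting Recog. Results Mention That These Results Are Surprisingly Good! Each Row Represents a Different Lecturer
      <table>
      <tr><th>Lecturer</th><th>Exact</th><th>Alternate</th><th>Close</th><th>None</th></tr>
      <tr><td>Prof. A</td><td>16 (88%)</td><td>1 (6%)</td><td>0 (0%)</td><td>1 (6%)</td></tr>
      <tr><td>Prof. B</td><td>146 (59%)</td><td>26 (10%)</td><td>6 (2%)</td><td>71 (29%)</td></tr>
      <tr><td>Prof. C</td><td>18 (42%)</td><td>5 (11%)</td><td>1 (3%)</td><td>19 (44%)</td></tr>
      <tr><td>Prof. D</td><td>262 (61%)</td><td>45 (11%)</td><td>9 (2%)</td><td>111 (26%)</td></tr>
      <tr><td>Prof. E</td><td>408 (79%)</td><td>46 (9%)</td><td>2 (&lt;1%)</td><td>58 (11%)</td></tr>
      <tr><td>Total</td><td>850 (68%)</td><td>123 (10%)</td><td>18 (1%)</td><td>260 (21%)</td></tr>
      </table>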
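The totals row of the handwriting results can be re-derived from the per-lecturer counts. The following is a minimal sketch, assuming the counts are transcribed correctly from the slide; the names `COUNTS`, `column_totals`, and `shares` are illustrative and not part of the original study.

```python
# Counts transcribed from the slide's results table.
# Column order: Exact, Alternate, Close, None.
CATEGORIES = ("Exact", "Alternate", "Close", "None")
COUNTS = {
    "Prof. A": (16, 1, 0, 1),
    "Prof. B": (146, 26, 6, 71),
    "Prof. C": (18, 5, 1, 19),
    "Prof. D": (262, 45, 9, 111),
    "Prof. E": (408, 46, 2, 58),
}


def column_totals(counts):
    """Sum each category across all lecturers."""
    return tuple(sum(column) for column in zip(*counts.values()))


def shares(row):
    """Convert a row of counts into percentages of the row total."""
    total = sum(row)
    return tuple(100.0 * n / total for n in row)


totals = column_totals(COUNTS)
for name, count, pct in zip(CATEGORIES, totals, shares(totals)):
    print(f"{name}: {count} ({pct:.0f}%)")
```

Summing the columns this way gives 850, 123, 18, and 260 words, which agrees with the totals reported on the slide.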
  13. 13. Outline <ul><li>Motivation </li></ul><ul><li>Handwriting Recognition </li></ul><ul><li>Joint Writing and Speech Recognition </li></ul><ul><li>Attentional Mark Identification </li></ul><ul><li>Activity Inference: Recognizing Corrections </li></ul>Look at Potential
  14. 14. Joint Writing and Speech Recognition <ul><li>Co-expression of ink and speech </li></ul><ul><ul><li>Is digital ink spoken as it is written? </li></ul></ul><ul><ul><ul><li>Yes, but how often? How "closely" to the written text? </li></ul></ul></ul><ul><ul><li>Can speech be used to disambiguate handwriting? </li></ul></ul><ul><ul><li>Can handwriting be used to disambiguate speech? (incl. deictic references) </li></ul></ul>In Time/Accuracy, Wanted Empirical Evidence
  15. 15. Examples <ul><li>Difficult for Speech and Ink Recognition </li></ul><ul><li>Difficult Written Abbreviations </li></ul><ul><li>Speech/Ink Used to Disambiguate Ink/Speech </li></ul>DigiMon Java 2 Enterprise Edition Eswaran, Gray, Loric, Traiger corn flakes
  16. 16. Experiment <ul><li>Examined instances of isolated word writing </li></ul><ul><li>Selected word writing episodes at random but uniformly from the various instructors </li></ul><ul><li>Generated transcripts manually from the audio </li></ul><ul><li>Checked whether the instructor spoke the exact word written </li></ul><ul><li>Measured the time between the written and spoken word </li></ul>
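The two measurements in this experiment (whether the written word was spoken, and the time between writing and speaking it) can be sketched as follows. This is only an illustration: the `Episode` record, its field names, and the sample values are hypothetical and are not data from the study.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Episode:
    """One isolated word-writing episode (hypothetical record layout)."""
    written_word: str
    written_at: float            # seconds into the lecture when the word was written
    spoken_at: Optional[float]   # when the exact word was spoken; None if never spoken


def cooccurrence_rate(episodes: List[Episode]) -> float:
    """Fraction of written words that were also spoken exactly."""
    if not episodes:
        return 0.0
    spoken = [e for e in episodes if e.spoken_at is not None]
    return len(spoken) / len(episodes)


def lags(episodes: List[Episode]) -> List[float]:
    """Spoken time minus written time; negative means spoken before written."""
    return [e.spoken_at - e.written_at for e in episodes if e.spoken_at is not None]


# Made-up sample episodes, for illustration only.
sample = [
    Episode("index", 12.0, 13.5),
    Episode("B-tree", 40.0, 38.0),
    Episode("QED", 75.0, None),
]
print(cooccurrence_rate(sample))
print(median(abs(lag) for lag in lags(sample)))
```

Keeping the sign of the lag makes it possible to ask separately how often the word is spoken before it is written and how often after.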
  17. 17. Speech/Text Co-occurrence Results Each Row Represents a Different Lecturer
  18. 18. Outline <ul><li>Motivation </li></ul><ul><li>Handwriting Recognition </li></ul><ul><li>Joint Writing and Speech Recognition </li></ul><ul><li>Attentional Mark Identification </li></ul><ul><li>Activity Inference: Recognizing Corrections </li></ul>
  19. 19. Attentional Mark Identification <ul><li>Attentional Marks are… </li></ul><ul><li>The first step is to identify a stroke as a mark </li></ul><ul><li>Tying Attentional Marks to slide content is important </li></ul><ul><li>Attentional Ink provides a concrete link between speech and slide content! </li></ul>
  20. 20. Example
  21. 21. Method <ul><li>Segmentation </li></ul><ul><ul><li>Few strokes </li></ul></ul><ul><ul><li>Close spatial and temporal proximity </li></ul></ul><ul><li>Mark Recognition </li></ul><ul><ul><li>Created hand tuned classifiers for: Circles, Lines, Bullets/Ticks </li></ul></ul><ul><li>Matched with slide content </li></ul>
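The segmentation step above groups a few strokes that are close in space and time. A rough sketch of that idea follows, together with one crude cue a hand-tuned circle classifier might use. The `Stroke` type, the thresholds (`max_gap`, `max_dist`, `max_strokes`, `tolerance`), and the closed-stroke test are all assumptions made for illustration; the slides do not give the actual classifiers or parameter values.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class Stroke:
    """A pen stroke: sampled points plus start/end times in seconds (hypothetical layout)."""
    points: List[Tuple[float, float]]
    start: float
    end: float


def centroid(stroke: Stroke) -> Tuple[float, float]:
    xs = [x for x, _ in stroke.points]
    ys = [y for _, y in stroke.points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def segment(strokes, max_gap=1.0, max_dist=50.0, max_strokes=3):
    """Group time-ordered strokes that are near the previous stroke in time and space.

    All thresholds are illustrative guesses, not values from the study.
    """
    groups: List[List[Stroke]] = []
    for stroke in sorted(strokes, key=lambda s: s.start):
        if groups and len(groups[-1]) < max_strokes:
            last = groups[-1][-1]
            (x0, y0), (x1, y1) = centroid(last), centroid(stroke)
            close_in_time = stroke.start - last.end <= max_gap
            close_in_space = hypot(x1 - x0, y1 - y0) <= max_dist
            if close_in_time and close_in_space:
                groups[-1].append(stroke)
                continue
        groups.append([stroke])
    return groups


def looks_closed(stroke: Stroke, tolerance: float = 0.2) -> bool:
    """Crude circle cue: the endpoints nearly meet relative to the stroke's extent."""
    xs = [x for x, _ in stroke.points]
    ys = [y for _, y in stroke.points]
    extent = max(max(xs) - min(xs), max(ys) - min(ys))
    if extent == 0:
        return False
    (xa, ya), (xb, yb) = stroke.points[0], stroke.points[-1]
    return hypot(xb - xa, yb - ya) <= tolerance * extent
```

A real classifier would need further cues (for example, aspect ratio or straightness for underlines and ticks), but the grouping rule already reflects the "few strokes, close proximity" criteria listed on the slide.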
  22. 22. Experiment <ul><li>Identified and Classified Attentional Marks by Hand </li></ul><ul><ul><li>Two different people per slide </li></ul></ul><ul><ul><li>Identified the type of mark as well as the slide content the mark referred to </li></ul></ul><ul><li>Identified Attentional Marks Automatically </li></ul><ul><li>Compared the Resulting Identifications </li></ul>
  23. 23. Content Matching Issues <ul><li>Hard to determine exactly what content a mark refers to </li></ul>Not just a recognition Issue, but also related to HOW people draw
  24. 24. Content Matching Cont. <ul><li>Granularity of content parsing can be an issue </li></ul>
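  25. 25. Attentional Ink Recognition Accuracy
      <table>
      <tr><th>Mark type</th><th>Exact</th><th>Exact to Punctuation</th><th>Close</th><th>Non-Match</th><th>Total</th></tr>
      <tr><td>Circles</td><td>70 (66%)</td><td>13 (12%)</td><td>6 (6%)</td><td>17 (16%)</td><td>106</td></tr>
      <tr><td>Underlines</td><td>207 (61%)</td><td>22 (6%)</td><td>44 (13%)</td><td>66 (20%)</td><td>339</td></tr>
      <tr><td>Bullets</td><td>52 (60%)</td><td>0 (0%)</td><td>0 (0%)</td><td>35 (40%)</td><td>87</td></tr>
      <tr><td>Total</td><td>329 (62%)</td><td>35 (7%)</td><td>50 (9%)</td><td>118 (22%)</td><td>532</td></tr>
      </table>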
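One way to picture the four outcome categories in the accuracy table is as a comparison between the slide text chosen automatically and the text chosen by a human labeler. The sketch below is hypothetical: the slides do not define how "Close" was judged, so the word-overlap rule here is an assumption, as are the function names.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punct(text: str) -> str:
    """Remove ASCII punctuation and surrounding whitespace."""
    return text.translate(_PUNCT_TABLE).strip()


def match_category(auto_text: str, hand_text: str) -> str:
    """Bucket an automatic content match against the hand label.

    "Close" is defined here as sharing at least one word, which is an
    assumption for illustration and not the study's criterion.
    """
    if auto_text == hand_text:
        return "Exact"
    if strip_punct(auto_text) == strip_punct(hand_text):
        return "Exact to Punctuation"
    auto_words = set(strip_punct(auto_text).lower().split())
    hand_words = set(strip_punct(hand_text).lower().split())
    if auto_words & hand_words:
        return "Close"
    return "Non-Match"


print(match_category("hash join", "hash join algorithm"))
```

Counting these categories per mark type and dividing by the row total would yield a table shaped like the one on this slide.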
  26. 26. Outline <ul><li>Motivation </li></ul><ul><li>Handwriting Recognition </li></ul><ul><li>Joint Writing and Speech Recognition </li></ul><ul><li>Attentional Mark Identification </li></ul><ul><li>Activity Inference: Recognizing Corrections </li></ul>
  27. 27. Recognizing Corrections <ul><li>Why? </li></ul><ul><ul><li>Want to answer the broad question: </li></ul></ul><ul><ul><ul><li>- "Can we recognize patterns of activity by analyzing the ink and speech channels?" </li></ul></ul></ul><ul><ul><li>Useful for Presenters </li></ul></ul><ul><ul><ul><li>- Occurs frequently (about 1-3 per lecture) </li></ul></ul></ul><ul><ul><li>But Non-trivial </li></ul></ul>Our vision allows false positives
  28. 28. Recognizing Corrections <ul><li>Identified Six Types of Corrections </li></ul>Looked through large # of lectures, wide range of marks
  29. 29. Example Results No Table Because: 1. Not a robust experiment 2. Proof of Concept
  30. 30. Wrap-up <ul><li>We wanted to understand the nature of real data to direct our focus when building tools for automatic analysis </li></ul><ul><li>Our studies provided the necessary understanding to accomplish this </li></ul>
  31. 31. Wrap-up (Cont.) <ul><li>Specific Results: </li></ul><ul><ul><li>Basic handwriting recognition is surprisingly good </li></ul></ul><ul><ul><li>Very strong co-occurrence of written and spoken words </li></ul></ul><ul><ul><li>We were able to identify attentional marks and the content associated with them </li></ul></ul><ul><ul><li>Activity Recognition: There are certain high-level activities that we can identify </li></ul></ul>ALL OPEN for Refinement
  32. 32. Questions? <ul><li>E-mail </li></ul><ul><ul><li>[email_address] </li></ul></ul><ul><ul><li>[email_address] </li></ul></ul><ul><li>Classroom Presenter Website </li></ul><ul><ul><li>http://www.cs.washington.edu/education/dl/presenter/ </li></ul></ul>