
Speech recognition - Art of the possible

Dominik Lukes (Dominik.Lukes@ctl.ox.ac.uk, @techczech)


A lightning talk presentation from Jisc's Focus on the future: new developments in accessible and assistive technologies event held on 16 March 2022 as part of Digifest community fringe.

Transcript

1. Speech recognition: Art of the possible (Dominik.Lukes@ctl.ox.ac.uk, @techczech)
2. Dominik's journey: computational linguistics, cognitive linguistics, language teaching (1990–1995); language teacher training, translation, metaphor / discourse studies (1995–2008); readability, learning / assistive technology, dyslexia teacher training (2009–present)
3. Bill Gates in 2011: "The next big thing is definitely speech and voice recognition."
4. What do we want to know? What is the current state of the art? How did we get here? Where are we going?
5. Are we asking the right questions?
6. Tasks for speech recognition, by difficulty: select word from list, interpret command, type dictation, transcribe presentation, transcribe conversation
7. How we think of it vs how it is. How we think of it: select word from list, interpret command, type dictation, transcribe presentation, transcribe conversation. How it is: transcribe conversation, transcribe presentation, type dictation, interpret command, select word from list
8. Approximate speech recognition timeline: select digit (1950s), select from 1,000 words (1970s), select from a large vocabulary (1980s), dictate word by word (1990s), dictate whole sentences (1997), transcribe a YouTube video (2012), transcribe a conversation (2019)
9. What is the actual job of speech recognition?
10. What is this word? [pʰɹɛtsɫ̩] [pɹɛtsl] /pretsəl/ <pretzel>
11. What's the problem? Aspirated /p/ at the start of a stressed syllable, devoiced /r/ following /p/, labialised /r/ following /p/, dark /l/, syllabic consonant, glottal stop
12. It gets worse: find the missing sounds
13. Course on speech recognition, 1993: "Faster computers won't help improve speech recognition. We need a new approach."
14. Dragon NaturallySpeaking, released in 1997, can recognise whole sentences. What happened?
15. How speech recognition does not work: finding individual sounds (phonemes) in the speech and matching them to letters.
16. How speech recognition actually works: P(W|C). What is the likelihood that the next word W will occur, given the context C of what came before?
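
As a minimal sketch of what P(W|C) means in practice, the toy bigram model below estimates how likely a word is given the word before it. The one-line corpus is made up purely for illustration; real recognisers pair an acoustic model with far larger language models.

```python
from collections import Counter

# Toy bigram language model: estimate P(word | previous word) from raw counts.
corpus = ("we recognise speech with a speech recogniser "
          "that can recognise speech").split()

bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def p_next(word, context):
    """P(word | context), estimated by relative frequency."""
    if context_counts[context] == 0:
        return 0.0
    return bigram_counts[(context, word)] / context_counts[context]

print(p_next("speech", "recognise"))  # 1.0: 'speech' always follows 'recognise' here
print(p_next("beach", "recognise"))   # 0.0: never seen, so the model cannot predict it
```

The second print already hints at slide 20 below: anything the model has not seen in context gets a vanishingly small probability.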
17. Actually, it is quite a bit more complicated (Huang and Deng 2009)
18. Probabilistic (stochastic) ASR enabled the change. Linguistics took a back seat.
19. Fred Jelinek (ASR pioneer, 1988?): "Every time I fire a linguist, the performance of the speech recognizer goes up."
20. Consequence of the probabilistic approach: worse on words not predictable from context, such as names, acronyms, and specialist terms
21. Question in 2011: "I recorded a lecture, can I use Dragon to transcribe it?"
22. "Caption fails" in 2014 provided a source of comedy
23. YouTube captions today are usable and useful
24. So what happened between 2014 and 2022?
25. Ingredients of success: larger data sets, more computing power, neural networks
26. Patrick Winston (2015), MIT lecture 12a in the AI course: "It was in 2010, yes, that's right. It was in 2010. We were having our annual discussion about what we would dump from 6034 in order to make room for some other stuff. And we almost killed off neural nets. That might seem strange because our heads are stuffed with neurons. … But many of us felt that the neural models of the day weren't much in the way of faithful models of what actually goes on inside our heads. And besides that, nobody had ever made a neural net that was worth a darn for doing anything."
27. 2012: ImageNet showed that neural networks are much better at computing the probabilities for complex data.
28. OK, we have neural nets; what does that mean?
29. Things to know about neural nets: everything has a probability; the same input does not always produce the same output; they have no 'sanity check' or 'common sense'
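
One way to picture "same input, different output": the recogniser holds a probability distribution over candidate words rather than one fixed answer, and different runs (or slightly different contexts) can surface different candidates. A tiny illustrative Python sketch; the words and probabilities are hypothetical:

```python
import random

candidates = {"pretzel": 0.55, "pretzels": 0.30, "press all": 0.15}

def sample(dist):
    """Draw one word at random, weighted by its probability."""
    r = random.random()
    cumulative = 0.0
    for word, p in dist.items():
        cumulative += p
        if r < cumulative:
            return word
    return word  # guard against floating-point rounding at the top end

for _ in range(5):
    print(sample(candidates))  # five runs on identical input may disagree
```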
30. What do probabilities look like?
31. "What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models", Allyson Ettinger, 2019
32. https://what-if.xkcd.com/34
33. Output changes as more information is made available. (Not always for the better.)
34. Examples from today's captions: "Crystal" > "Chris is", "Am" > "and", "experts" > "experience", "AR" > "a our"
35. Different ways of transcribing "Dua Lipa": alipa, dualipa, dua lipa, lipa, duda lipa
36. Rise and mostly fall of Google's new spell Czech
37. Tracking faces at the tips of the shoes
38. Hallucination is a big problem
39. Question asked by a faculty member in 2021: "We correct the transcripts, why doesn't the system learn the correct spelling?"
40. Adding your own word list just tweaks the probabilities.
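
Roughly what that means, as a hedged Python sketch (the words, scores, and boost factor are all hypothetical): a custom word list nudges the scores of the listed words before the top candidate is picked; it does not teach the model anything.

```python
# Candidate scores from the recogniser, plus a boost for a custom entry.
scores = {"jisc": 0.20, "gist": 0.45, "just": 0.35}
custom_boost = {"jisc": 3.0}  # boost factor for a custom vocabulary word

boosted = {w: p * custom_boost.get(w, 1.0) for w, p in scores.items()}
total = sum(boosted.values())
boosted = {w: p / total for w, p in boosted.items()}  # renormalise

print(max(boosted, key=boosted.get))  # 'jisc' now outscores 'gist'...
print(boosted)  # ...but only as a shifted probability, not as a rule
```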
41. Choosing a genre setting likewise just tweaks the probabilities.
42. Another thing to know about neural nets: they use very large data sets and can take days or weeks to train.
43. Consequences of NN size: speech recognition is often not done on device; individual input often cannot adjust the quality (except in pre-training); most applications use APIs from the big players; few open-source/free options
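
What "using an API from the big players" typically looks like, as a minimal Python sketch. The endpoint, key, and response shape below are hypothetical placeholders, not any real provider's interface:

```python
import json
import urllib.request

API_URL = "https://asr.example.com/v1/transcribe"  # placeholder endpoint
API_KEY = "your-api-key"                           # placeholder credential

def transcribe(audio_path):
    """Send audio to a hosted ASR service and return the transcript text."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    request = urllib.request.Request(
        API_URL,
        data=audio,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "audio/wav"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["transcript"]  # hypothetical response field

# print(transcribe("lecture.wav"))
```

The point of the pattern: the heavy model lives on the provider's servers, so the application developer's code stays this thin, and so does their control over quality.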
44. Big players in the field: Google, Microsoft (now also Nuance), Amazon
45. Interesting smaller companies: Verbit.ai, Carescribe.io (Caption.Ed), Otter.ai, Rev.ai
46. Interesting applications: Descript, Microsoft Reading Progress, Microsoft Presentation Coach
47. What can we expect in the future?
48. Cautionary tale by SMBC
49. The original Roomba (2002) vs the Roomba S9+ (2019). Wow!
50. What happens in speeches: fillers, repetition
51. What does conversation actually look like?
52. Possible futures? Incremental improvement (similar to the Roomba over 17 years): accurate lecture transcripts, fluent dictation with pauses, better meeting transcription. Revolutionary change (similar to the change in speech recognition over 6 years): informal conversation transcription, interactive dictation, multilingual speech transcription.
53. How should we think about accuracy? We speak 120–180 words per minute, so 99% accurate still means up to 2 errors every minute.
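
The arithmetic behind that claim, spelled out in a few lines of Python:

```python
# Errors per minute implied by a given accuracy at typical speaking rates.
accuracy = 0.99
for wpm in (120, 150, 180):
    errors_per_minute = wpm * (1 - accuracy)
    print(f"{wpm} wpm at {accuracy:.0%} accuracy: "
          f"{errors_per_minute:.1f} errors/minute")
# 120 wpm -> 1.2, 150 wpm -> 1.5, 180 wpm -> 1.8 errors every minute
```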
54. From September 2014, xkcd.com/1425: sometimes it is hard to judge how much effort will be needed to solve a seemingly easy problem.
55. Wishlist (a few hours of coding): transcripts that indicate the level of confidence; benchmarks for lecture transcripts; better manual control of transcripts (like Descript)
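
The first wishlist item really is close to a few hours of coding: recognisers generally compute a per-word confidence internally, so a transcript viewer only needs to surface it. A hypothetical Python sketch (the words, scores, and threshold are made up):

```python
# (word, confidence) pairs as an ASR engine might return them.
transcript = [("the", 0.98), ("next", 0.95), ("big", 0.97),
              ("thing", 0.93), ("is", 0.99), ("speech", 0.61),
              ("wreck", 0.42), ("ignition", 0.38)]

THRESHOLD = 0.7  # arbitrary cut-off for flagging shaky words

def render(words, threshold=THRESHOLD):
    """Wrap low-confidence words in [?...] so a human knows where to check."""
    return " ".join(w if c >= threshold else f"[?{w}]" for w, c in words)

print(render(transcript))
# the next big thing is [?speech] [?wreck] [?ignition]
```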
56. Dreamlist (5 years and a research team): multilingual transcription (identify changes in language); multimodal transcription (use information from the video); raw-to-readable transcripts
57. Welcome to the panel
58. Kate Knill, Machine Intelligence Lab, University of Cambridge; Richard Cave, MND Association (and formerly Google Project Euphonia); Richard Purcell, Caption.Ed; Irit Opher, Head of Research at Verbit.ai
59. What is the current state of the art of speech recognition in general, and in the transcription of recorded speech in particular? What are the current quality metrics and how much do they tell us about the suitability of models? Do we need better ones? After the big recent jump in performance, are we seeing a plateau with incremental growth, or can we expect another step change in quality? Where can we see the most innovation? What are the research and development blind spots where more effort is needed? What are the currently unsolved problems? How much room is there for smaller players to innovate? How much do they have to rely on pre-trained models from the big providers? Is there space for open source?
60. This presentation is licensed under a Creative Commons Attribution license except where otherwise noted. Icons and stock images are from Microsoft Office 365 creative premium; they cannot be distributed separately from this document.
