Tiny Ears
Using Speech Recognition To Teach Kids To Read
Emily Toop
Radical Robot
Brighton iPhone Creators
November 2011
What is Speech Recognition?
• Converting spoken words to text
• Not targeted to a single speaker (that would be
  voice recognition)
• Utterances converted into phonemes that are
  compared against language model & grammar
  to generate a hypothesis
• Recognition score to give confidence in
  hypothesis
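
As a rough illustration of the last two bullets: the recogniser hands back a hypothesis together with a score, and the app decides whether to trust it. A minimal Python sketch; the `Hypothesis` type, the 0.0 to 1.0 score scale and the threshold value are assumptions made for illustration, not the OpenEars or Pocketsphinx API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    text: str     # the recognised words
    score: float  # hypothetical confidence, higher is better (0.0 to 1.0 here)


def accept(hypothesis: Hypothesis, threshold: float = 0.6) -> Optional[str]:
    """Return the recognised text only if its score clears the threshold."""
    if hypothesis.score >= threshold:
        return hypothesis.text
    return None
```

Real recognisers may report scores on a different scale, so the threshold would need tuning against real recordings.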
Why is Speech Recognition Hard?

The human brain is incredibly specialised - speech recognition & vision have taken millions of years to perfect. It is hard to make a computer do the same thing.



• Background Noise
• Detecting gaps
• Too many hypotheses generated
• Accents
• Other Languages
• Dictionary words vs unknown words (e.g.
  names)
How Does Siri Work?

• Protocol Cracked - https://github.com/plamoni/SiriProxy
• Server Based because of CPU & live
  data updates - doesn’t work offline
• Limited vocabulary with well designed
  grammar
Device Based Recognition


• Works offline
• Immediate response for real time
  processing
• No need for expensive data plans for
  your app to work
Device Based Recognition

• Open Ears - http://www.politepix.com/openears
• Pocket Sphinx / CMU Sphinx - http://cmusphinx.sourceforge.net/2010/03/pocketsphinx-0-6-release/
• Limited Language Model
• Limited Grammar
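
To make "limited" concrete: a number recogniser like the one in the demo only needs a handful of words. The Python sketch below assumes a hypothetical digit vocabulary and rejects any hypothesis containing a word outside it. It illustrates the idea only and is not how OpenEars represents its language model.

```python
from typing import List, Optional

# Hypothetical vocabulary for a digits-only recogniser.
DIGIT_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def words_to_digits(hypothesis: str) -> Optional[List[int]]:
    """Map a recognised string to digits, or None if any word is out of vocabulary."""
    digits = []
    for word in hypothesis.lower().split():
        if word not in DIGIT_WORDS:
            return None
        digits.append(DIGIT_WORDS[word])
    return digits
```

A small vocabulary like this keeps the number of competing hypotheses low, which is part of why it is practical on a device.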
Demo




Number Recogniser
Number Recogniser

• Import OpenEars .xcodeproj into
  project
• Add OpenEars as target dependency
• Link libOpenEarsLibrary.a binary
• Add OpenEars, SphinxBase &
  PocketSphinx to Header Search Path
Number Recogniser
• Create and start audioSessionManager
  in the app delegate’s
  didFinishLaunchingWithOptions
Number recogniser

• Rename .m file that runs
  PocketSphinxController to .mm
• Add OpenEarsEventObserverDelegate
Number Recogniser
Number recogniser



•   -(void)pocketsphinxRecognitionLoopDidStart{}
•   -(void)fliteDidFinishSpeaking{} (if using flite for
    text to speech)
Improving Recognition with Face Detection
• Determine when user is speaking
  directly to app and not to another
  person to enhance accuracy
• Stop listening when face not detected.
• Detect when app has been abandoned
  & shut down audio manager etc.
• Start listening when face is detected
  again
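
The four bullets above amount to a small state machine driven by "is a face visible right now?". A minimal Python sketch, assuming the caller supplies the face-detection result and a timestamp; the state names, the `ListeningGate` class and the 60-second abandonment timeout are hypothetical.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"
    PAUSED = "paused"
    SHUT_DOWN = "shut_down"


class ListeningGate:
    """Decide whether to listen, based on whether a face is currently visible."""

    def __init__(self, start_time: float, abandon_after: float = 60.0):
        self.abandon_after = abandon_after  # seconds without a face before shutting down
        self.last_face_seen = start_time
        self.state = State.PAUSED

    def update(self, face_visible: bool, now: float) -> State:
        if face_visible:
            # Face present: (re)start listening.
            self.last_face_seen = now
            self.state = State.LISTENING
        elif now - self.last_face_seen >= self.abandon_after:
            # No face for a long time: treat the app as abandoned.
            self.state = State.SHUT_DOWN
        else:
            # Face briefly missing: stop listening but stay ready.
            self.state = State.PAUSED
        return self.state
```

In an app, each state change would trigger the matching audio call (suspend, resume, or shut down the audio manager).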
Demo


• Decorator
• Using Core Image for face detection -
  see WWDC Session Videos numbers 419 &
  422
Kitten Break
Kitten Break
Tiny Ears
• iPad Storybook using Speech
  Recognition to listen to children as they
  read aloud
• Detect when child stumbles or does not
  recognise a word & intervene with
  assistance to teach child to read word
• Track reading progress over time to
  provide targeted feedback.
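
One simple way to spot stumbles is to line up what the recogniser heard against the sentence on the page and flag the expected words that were replaced or skipped. A Python sketch using the standard library's `difflib`; the function name is hypothetical, and this is an illustration of the idea, not the approach Tiny Ears actually uses.

```python
from difflib import SequenceMatcher
from typing import List


def find_trouble_words(expected: str, heard: str) -> List[str]:
    """Return the expected words that were misread or skipped."""
    expected_words = expected.lower().split()
    heard_words = heard.lower().split()
    matcher = SequenceMatcher(a=expected_words, b=heard_words, autojunk=False)
    trouble = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" = a different word was heard; "delete" = the word was not heard at all.
        if tag in ("replace", "delete"):
            trouble.extend(expected_words[i1:i2])
    return trouble
```

For example, comparing "the cat sat on the mat" with a heard "the cat sat on the map" flags "mat". Words flagged repeatedly across sessions could feed the progress tracking mentioned above.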
Problems - Educational

• Large Age Range - different kids have
  different reading abilities and therefore
  require different levels of feedback/
  intervention
• Presenting learning in a fun way so
  nothing is so difficult that the child
  gives up rather than learns
Problems - Speech Recognition

• 4 year olds speak very differently from
  adults
• How do we detect errors? - unknown
  words & mispronunciations
• ‘Noise’ words: detecting coughs, laughs
  or sounds indicating distress or
  difficulty
Problems - Speech Recognition
• Is the child present?
• Is there more than one person present?
 • Whose speech should we process?
 • Can we even tell?
• Can we detect if the child is in distress
  or struggling?
• Can we detect reading ability through
  Speech Recognition?
Startup Chile

• Startup Accelerator run by Chilean
  government
• US$40k for 6 months, no equity
• Starting January 16th
• Looking for collaborators from
  education, business, artificial
  intelligence - email me
Questions?


• http://emilytoop.com
• @fluffyemily
• emily@radicalrobot.co.uk
• http://radicalrobot.co.uk

Editor's Notes

  • #4 Background Noise - possible solution: noise-rejection microphones. These are getting better but still aren’t fantastic
    Detecting gaps - need loads of training data to train a statistical model on expected speech patterns
    Hypotheses - lots of CPU required to whittle them down to the most likely
    Accents - more training data to cover accents and more CPU to match against language/grammar models
    Other Languages - need a new model for every language
  • #21 Error detection - car/care, ph vs f, and silent letters (e.g. hour)
  • #22 1) Should we ignore or accept sound input as speech?
    3) Visually or through ‘noise’ word detection