   What is Speech Recognition?
    Speech recognition, also known as automatic speech
     recognition (ASR) or computer speech recognition,
     means that the computer understands spoken input
     and performs the requested task.
   Where can it be used?
   System control/navigation
    e.g. GPS-connected digital maps: “How far is it
    to the motorway junction?”
   Commercial/Industrial applications
    in-car steering systems
    Voice dialing
    hands-free use of a mobile phone in a car, e.g. “Dial
    office”
[Figure: block diagram of a speech recognition system with the components Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display, and Feedback]
   Acoustic Model
       An acoustic model is created by taking audio recordings of
        speech, and their text transcriptions, and using software to
        create statistical representations of the sounds that make up
        each word. It is used by a speech recognition engine to
        recognize speech.
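
As a rough illustration of what "statistical representations of the sounds" could look like, the toy sketch below fits one Gaussian (mean and variance) per sound label from labeled feature values, then picks the most likely label for a new value. The labels, the single numeric feature, and the function names are all hypothetical; real engines use much richer features and models.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def train_acoustic_model(samples):
    """samples: (label, feature_value) pairs taken from transcribed audio (toy data)."""
    by_label = defaultdict(list)
    for label, value in samples:
        by_label[label].append(value)
    # Keep (mean, variance) per label; the floor avoids division by zero.
    return {label: (mean(vals), max(pvariance(vals), 1e-6))
            for label, vals in by_label.items()}

def log_likelihood(value, mu, var):
    """Log-density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mu) ** 2 / var)

def recognize_sound(model, value):
    """Return the label whose Gaussian gives the new value the highest likelihood."""
    return max(model, key=lambda label: log_likelihood(value, *model[label]))

# Hypothetical training data: one made-up feature value per recording.
training = [("ah", 700.0), ("ah", 720.0), ("ah", 690.0),
            ("ee", 300.0), ("ee", 280.0), ("ee", 310.0)]
model = train_acoustic_model(training)
print(recognize_sound(model, 705.0))
print(recognize_sound(model, 295.0))
```
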

   Language Model
         A language model, used in many natural language
        processing applications such as speech recognition, tries to
        capture the properties of a language and to predict the next
        word in a speech sequence.
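
The idea of predicting the next word can be sketched with a bigram model, which counts which word follows which in some training text. This is only one simple kind of language model, chosen here for illustration; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training sentences.
corpus = [
    "switch on the tv",
    "switch on the lamp",
    "switch to channel nine",
    "dim the lamp",
]
model = train_bigram_model(corpus)
print(predict_next(model, "switch"))
print(predict_next(model, "the"))
```

A recognizer could use such counts to prefer word sequences that are likely in the language when the audio alone is ambiguous.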
   Two types of speech recognition:

       Speaker-dependent: commonly used for dictation software

       Speaker-independent: more commonly found in telephone
        applications
   Speaker-dependent software works by learning
    the unique characteristics of a single person’s
    voice, in a way similar to voice recognition.
    New users must first “train” the software by
    speaking to it, so the computer can analyze
    how the person talks.
    This often means users have to read a few
    pages of text to the computer before they can
    use the speech recognition software.
   Speaker-independent software is designed to recognize
    anyone’s voice, so no training is involved.
    This means it is the only real option for applications
    such as interactive voice response systems — where
    businesses can’t ask callers to read pages of text before
    using the system.
   The downside is that speaker-independent software is
    generally less accurate than speaker-dependent
    software.
   Speaker-independent speech recognition engines
    generally compensate for this by limiting the
    grammars they use. With a smaller list of recognized
    words, the speech engine is more likely to correctly
    recognize what a speaker said.
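
The effect of a limited grammar can be sketched as follows: a possibly misrecognized word is snapped to the closest entry of a small vocabulary, or rejected if nothing is close. The vocabulary, the cutoff value, and the use of string similarity (in place of acoustic scores) are assumptions made for illustration only.

```python
from difflib import get_close_matches

# Hypothetical vocabulary for a telephone menu.
GRAMMAR = ["yes", "no", "balance", "transfer", "operator"]

def constrain_to_grammar(hypothesis, grammar=GRAMMAR, cutoff=0.6):
    """Map a raw hypothesis to the closest allowed word, or None if none is close enough."""
    matches = get_close_matches(hypothesis.lower(), grammar, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(constrain_to_grammar("ballance"))   # near miss of an allowed word
print(constrain_to_grammar("xylophone"))  # not in the grammar at all
```
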
   Articulation produces sound waves, which the ear
   conveys to the brain for processing.
[Figure: the acoustic waveform becomes an acoustic signal; speech recognition then consists of digitization, acoustic analysis of the speech signal, and linguistic interpretation]
• Digitization
   Analogue to digital conversion
   Sampling and quantizing
       Sampling converts a continuous-time signal into a discrete-time signal
       Quantizing approximates a continuous range of values with a finite
        set of discrete levels
• Signal processing
      – Separating speech from background noise
• Phonetics
      – Variability in human speech
• Phonology
      – The systematic use of sound to encode meaning
        in any spoken human language
      – Recognizing individual sound distinctions (similar
        phonemes)
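
The sampling and quantizing steps listed under Digitization can be sketched in a few lines. The 440 Hz test tone, the 8000 Hz sample rate, and the 8-bit depth are illustrative assumptions, not values from the slides.

```python
import math

def sample(signal, duration_s, sample_rate_hz):
    """Sampling: evaluate a continuous-time signal at evenly spaced instants."""
    n = round(duration_s * sample_rate_hz)
    return [signal(i / sample_rate_hz) for i in range(n)]

def quantize(samples, bits, lo=-1.0, hi=1.0):
    """Quantizing: map each amplitude in [lo, hi] to one of 2**bits integer levels."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    codes = []
    for x in samples:
        x = min(max(x, lo), hi)           # clip values outside the range
        codes.append(round((x - lo) / step))
    return codes

def tone(t):
    """Hypothetical input: a 440 Hz sine wave with amplitude 1."""
    return math.sin(2 * math.pi * 440 * t)

samples = sample(tone, duration_s=0.01, sample_rate_hz=8000)
codes = quantize(samples, bits=8)
print(len(samples), min(codes), max(codes))
```

Raising the sample rate captures faster changes in the signal, while adding bits shrinks the rounding error of each stored value.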
   Semantics and pragmatics
     Semantics deals with the meaning of words and sentences
     Pragmatics is concerned with bridging the
      explanatory gap between sentence meaning and
      speaker's meaning
   Lexicology and syntax
     Lexicology is the part of linguistics that
      studies words, their nature, and their meaning
     Syntax describes the arrangement of words and
      phrases to create well-formed sentences
[Figure: system block diagram with Speaker Recognition, Speech Recognition, a parsing and arbitration stage, and targets S1, S2, SK, SN]
[Figure: block diagram (Speaker Recognition, Speech Recognition, parsing and arbitration, targets S1, S2, SK, SN) with the spoken input “Switch on Channel 9”]
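[Figure: block diagram (Speaker Recognition, Speech Recognition, parsing and arbitration, targets S1, S2, SK, SN). Speaker Recognition answers “Who is speaking?” by choosing among known speakers (Annie, David, Cathy): “Authentication”]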
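[Figure: block diagram (Speaker Recognition, Speech Recognition, parsing and arbitration, targets S1, S2, SK, SN). Speech Recognition answers “What is he saying?” using a vocabulary such as On, Off, TV, Fridge, Door: “Understanding”]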
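[Figure: block diagram (Speaker Recognition, Speech Recognition, parsing and arbitration, targets S1, S2, SK, SN). Parsing and arbitration answers “What is he talking about?”: the words “Switch”, “to”, “channel”, “nine” are matched against the mappings Channel -> TV, Dim -> Lamp, On -> TV, Lamp: “Inferring and execution”]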
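
One way to picture the parsing and arbitration step is as a lookup from recognized keywords to candidate devices, followed by narrowing the candidates down. The sketch below uses the keyword mappings shown on the slide (Channel, Dim, On); resolving ambiguity by intersecting candidate sets is an assumption made for illustration, and the function name is hypothetical.

```python
# Keyword-to-device mappings taken from the slide.
KEYWORD_TO_DEVICES = {
    "channel": {"TV"},
    "dim": {"Lamp"},
    "on": {"TV", "Lamp"},
}

def infer_target(words):
    """Intersect the device sets suggested by every known keyword in the utterance."""
    candidates = None
    for word in words:
        devices = KEYWORD_TO_DEVICES.get(word.lower())
        if devices is None:
            continue  # word carries no device hint (e.g. "switch", "to", "nine")
        candidates = set(devices) if candidates is None else candidates & devices
    return sorted(candidates) if candidates else []

print(infer_target(["Switch", "to", "channel", "nine"]))
print(infer_target(["Switch", "on"]))
print(infer_target(["Switch", "on", "channel", "nine"]))
```

A result with more than one device, as for "Switch on" alone, signals that the command is still ambiguous and needs further arbitration.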
Artificial intelligence Speech recognition system
