Meaning
• A voice input device is a device in which speech is used to enter data or commands directly into a system. Such equipment relies on speech recognition and can replace or supplement other input devices. Some voice input devices recognize spoken words from a predefined vocabulary; others have to be trained for a particular speaker. When the operator utters a vocabulary item, the matching input is displayed as characters on a screen, where the operator can verify it.
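The matching step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular device works. It assumes a recognizer has already turned the utterance into a rough text guess, uses a small hypothetical vocabulary, and substitutes simple string similarity (the standard-library `difflib.get_close_matches`) for the acoustic matching a real device would perform. The matched item is returned as text for the operator to verify.

```python
import difflib
from typing import Optional

# Hypothetical predefined vocabulary (digits plus two commands), for illustration only.
VOCABULARY = ["zero", "one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "enter", "cancel"]


def match_vocabulary_item(heard: str, vocabulary: list[str],
                          cutoff: float = 0.6) -> Optional[str]:
    """Return the vocabulary item most similar to the recognizer's text guess.

    Returns None when no item is at least `cutoff` similar, so the operator
    can be asked to repeat instead of accepting a poor match.
    """
    # String similarity here stands in for acoustic matching (an assumption).
    candidates = difflib.get_close_matches(heard.strip().lower(), vocabulary,
                                           n=1, cutoff=cutoff)
    return candidates[0] if candidates else None


def text_for_operator(heard: str) -> str:
    """Build the on-screen text the operator checks before it is accepted."""
    item = match_vocabulary_item(heard, VOCABULARY)
    if item is None:
        return "Not recognized, please repeat"
    return f"{item} (confirm?)"


if __name__ == "__main__":
    print(text_for_operator("sevn"))   # a slightly garbled guess for "seven"
    print(text_for_operator("xyzzy"))  # nothing close in the vocabulary
```

The `cutoff` of 0.6 is simply `difflib`'s default; a real system would tune any such threshold to balance rejecting valid words against accepting wrong ones.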
• Speech recognition (SR) is the interdisciplinary subfield of computational linguistics that draws on knowledge and research from linguistics, computer science, and electrical engineering to develop methods and technologies that enable computers and computerized devices, such as smart technologies and robots, to recognize spoken language and translate it into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or simply "speech to text" (STT).
Siri for iOS
• Speech recognition applications include voice user interfaces such as voice dialling (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic (home automation) appliance control, search (e.g., finding a podcast in which particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., in word processors or emails), and aircraft cockpit control (usually termed Direct Voice Input).
Facts
• Speech input can eliminate the need for a typewriter, keyboard, or similar input device.
• From a technology perspective, speech recognition has a long history, with several waves of major innovation. Major players in the speech industry include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, and iFlytek (China), many of which have publicized that the core technology in their speech recognition systems is based on deep learning.
Used for translation
In tourism
• Car GPS systems
• Mobile map applications that give directions, such as Google Maps and HERE Maps
• Messaging apps such as WhatsApp and Viber, and most social networking sites
Other key areas
• Digital security locks
• Skype and other social networking sites
Accuracy As mentioned earlier in this article, accuracy of speech recognition varies in the
following:
• Vocabulary size: error rates increase as the vocabulary grows. For example, the 10 digits "zero" to "nine" can be recognized essentially perfectly, but vocabularies of 200, 5,000, or 100,000 words may have error rates of 3%, 7%, or 45%, respectively.
• Confusable words: a vocabulary is hard to recognize if it contains words that sound alike. For example, the 26 letters of the English alphabet are difficult to discriminate because many of them are confusable (most notoriously the E-set: "B, C, D, E, G, P, T, V, Z"); an 8% error rate is considered good for this vocabulary.
• Speaker dependence vs. independence: a speaker-dependent system is intended for use by a single speaker, whereas a speaker-independent system is intended for use by any speaker, which makes recognition more difficult.
• Isolated, discontinuous, or continuous speech: with isolated speech, single words are spoken, which makes the speech easier to recognize. With discontinuous speech, full sentences separated by silences are spoken, which, like isolated speech, is relatively easy to recognize. With continuous speech, sentences are spoken naturally, which makes recognition harder than for either isolated or discontinuous speech.
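• Task and language constraints: constraints imposed by the task or by the language (its syntax and meaning) can help a recognizer reject unlikely hypotheses, for example the ungrammatical "Red is apple the."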
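Error rates like those quoted above are commonly reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The Python sketch below computes it with a standard edit-distance table. It assumes plain whitespace-separated words and no text normalization (case, punctuation), which real evaluations usually handle first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("send reinforcements we're going to advance",
                          "send three and fourpence we're going to a dance"))
```

For the misheard military order quoted in this deck, the result is 5/6 (about 0.83): one substitution and two insertions around "reinforcements", plus one substitution and one insertion around "advance". Because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.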
Let's make it simple
• The trouble is, listening is much harder than it looks (or sounds): there are all sorts of different problems going on at the same time.
• When someone speaks to you in the street, there's the sheer difficulty of separating their words (what scientists would call the acoustic signal) from the background noise, especially at something like a cocktail party, where the "noise" is similar speech from other conversations.
• When people talk quickly and run all their words together in a long stream, how do we know exactly where one word ends and the next one begins? (Did they just say "dancing and smile" or "dance, sing, and smile"?)
• There's the problem that everyone's voice is a little bit different, and that our voices change from moment to moment. How do our brains figure out that a word like "bird" is the same word whoever says it?
• What about words like "red" and "read" that sound identical but mean totally different things (homophones, as they're called)? How does our brain know which word the speaker means?
• What about sentences that are misheard to mean radically different things? There's the age-old military example of "send reinforcements, we're going to advance" being misheard as "send three and fourpence, we're going to a dance", and all of us can probably think of song lyrics we've hilariously misunderstood the same way (I always chuckle when I hear Kate Bush singing about "the cattle burning over your shoulder").
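The word-boundary problem has a rough written analogue: if text had no spaces, a single string could often be split into words in several valid ways. The Python sketch below enumerates every split of an unspaced string against a small hypothetical vocabulary. It is only an analogy, assuming letters stand in for sounds; a real recognizer works on acoustic features and uses probabilities (for example, from a language model) to choose among competing splits rather than listing them all.

```python
from functools import lru_cache

# Hypothetical toy vocabulary, chosen so that one string has several parses.
EXAMPLE_VOCABULARY = frozenset({"it", "is", "now", "here", "no", "where", "nowhere"})


def segmentations(stream: str, vocabulary: frozenset[str]) -> list[list[str]]:
    """Return every way to split `stream` into words that are all in `vocabulary`."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[tuple[str, ...], ...]:
        # All parses of stream[start:], cached so each suffix is solved once.
        if start == len(stream):
            return ((),)  # exactly one way to split nothing: no words
        results = []
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word in vocabulary:
                # Extend every parse of the remainder with this first word.
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(parse) for parse in split_from(0)]


if __name__ == "__main__":
    for parse in segmentations("itisnowhere", EXAMPLE_VOCABULARY):
        print(" ".join(parse))
```

With this vocabulary, "itisnowhere" splits three ways: "it is no where", "it is now here", and "it is nowhere". Listeners (and recognizers) resolve such ambiguity from context, which this sketch deliberately leaves out.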
Thank you

Voice input and speech recognition system in tourism/social media
