Voice input and speech recognition system in tourism/social media

Meaning
 A device in which speech is used to input data or system
commands directly into a system. Such equipment
involves speech recognition processes and can
replace or supplement other input devices. Some voice
input devices can recognize spoken words from a
predefined vocabulary; others have to be trained for a
particular speaker. When the operator utters a vocabulary
item, the matching data input is displayed as characters on
a screen, where the operator can then verify it.
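
A device that recognizes words from a predefined vocabulary after being trained for a particular speaker can be illustrated with classic template matching. The sketch below is a minimal, hypothetical example, not any product's method. Each word's training utterance is stored as a sequence of feature frames. Toy one-number frames stand in for real acoustic features such as MFCCs, which is an assumption. A new utterance is matched to the closest template using dynamic time warping (DTW), which tolerates the same word being spoken faster or slower. The class and method names are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per short slice of audio (toy values here)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Dynamic time warping cost of aligning two frame sequences."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # query advances, template frame reused
                cost[i][j - 1],      # template advances, query frame reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical speaker-dependent recognizer: one stored template per word."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Frame]] = {}

    def train(self, word: str, frames: List[Frame]) -> None:
        # "Training" here simply stores the speaker's own example utterance.
        self.templates[word] = frames

    def recognize(self, frames: List[Frame]) -> str:
        # Return the vocabulary item whose template aligns most cheaply.
        return min(self.templates,
                   key=lambda word: dtw_distance(frames, self.templates[word]))


if __name__ == "__main__":
    rec = TemplateRecognizer()
    rec.train("yes", [[1.0], [3.0], [1.0]])
    rec.train("no", [[2.0], [2.0], [0.0]])
    # A slower "yes": some frames are held for longer.
    print(rec.recognize([[1.0], [1.0], [3.0], [3.0], [1.0]]))  # yes
```

Because each template comes from one speaker's voice, this style of recognizer is naturally speaker-dependent, and its cost grows with the number of vocabulary items it must compare against.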

 Speech recognition (SR) is the interdisciplinary subfield
of computational linguistics that incorporates knowledge
and research from the linguistics, computer science, and
electrical engineering fields to develop methodologies and
technologies that enable the recognition and translation of
spoken language into text by computers and computerized
devices, such as those categorized as smart
technologies and robotics. It is also known as "automatic
speech recognition" (ASR), "computer speech recognition", or
just "speech to text" (STT).

 Speech recognition applications include voice user
interfaces such as voice dialling (e.g., "Call home"),
call routing (e.g., "I would like to make a collect
call"), domotic (home automation) appliance control, search (e.g., finding a
podcast where particular words were spoken),
simple data entry (e.g., entering a credit card
number), preparation of structured documents (e.g., a
radiology report), speech-to-text processing
(e.g., word processors or emails),
and aircraft cockpit controls (usually termed Direct Voice Input).
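
Once speech has been converted to text, voice dialling and call routing still have to map the transcript to an action. The following is a minimal keyword-matching sketch. It assumes a hypothetical routing table; the phrases and action names are invented for illustration. Real systems often use trained intent classifiers rather than fixed phrases.

```python
import re
from typing import Dict, Optional

# Hypothetical routing table: key phrase -> action name (illustrative only).
ROUTES: Dict[str, str] = {
    "collect call": "transfer_to_operator",
    "call home": "dial_contact:home",
    "billing": "transfer_to_billing",
}


def normalize(transcript: str) -> str:
    """Lower-case, drop punctuation (keeping apostrophes), collapse spaces."""
    return " ".join(re.sub(r"[^\w\s']", " ", transcript.lower()).split())


def route(transcript: str) -> Optional[str]:
    """Return the action for the first key phrase found as whole words."""
    padded = f" {normalize(transcript)} "
    for phrase, action in ROUTES.items():
        if f" {phrase} " in padded:  # padding avoids matching inside longer words
            return action
    return None  # no match: e.g. fall back to a human operator


print(route("Call home"))                             # dial_contact:home
print(route("I would like to make a collect call."))  # transfer_to_operator
print(route("What's the weather?"))                   # None
```

Checking phrases in insertion order lets the more specific "collect call" win over the shorter "call home" style commands when both could apply.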

Facts
 Eliminates the need for a typewriter, keyboard, or other manual input device.
 From the technology perspective, speech recognition
has a long history with several waves of major
innovations. Leading speech industry players include
Microsoft, Google, IBM, Baidu (China), Apple,
Amazon, Nuance, and IflyTek (China), many of which
have publicized that the core technology in their speech
recognition systems is based on deep learning.

In tourism
 Car GPS systems
 Mobile map applications for directions, such as Google Maps and HERE Maps
 Apps like WhatsApp and Viber, and most social
networking sites
Other key areas
 Digital security locks
 Skype and similar voice and video calling services

Accuracy As mentioned earlier in this article, accuracy of speech recognition varies in the
following:
 Error rates increase as the vocabulary size grows:
 e.g., the 10 digits "zero" to "nine" can be recognized essentially perfectly, but
vocabulary sizes of 200, 5000, or 100000 may have error rates of 3%, 7%, or 45%,
respectively.
 Vocabulary is hard to recognize if it contains confusable words:
 e.g., the 26 letters of the English alphabet are difficult to discriminate because they
are confusable words (most notoriously, the E-set: "B, C, D, E, G, P, T, V, Z"); an 8%
error rate is considered good for this vocabulary.
 Speaker dependence vs. independence:
 A speaker-dependent system is intended for use by a single speaker.
 A speaker-independent system is intended for use by any speaker, which makes it more difficult to build.
 Isolated, discontinuous, or continuous speech:
 With isolated speech, single words are used, which makes the speech easier to recognize.
 With discontinuous speech, full sentences separated by silences are used, which, as with
isolated speech, makes recognition easier.
 With continuous speech, naturally spoken sentences are used, which makes the speech
harder to recognize than either isolated or discontinuous speech.
 Task and language constraints
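
Error rates like those quoted above are commonly reported as word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The deck does not state which metric its figures use. Treat this sketch as an illustration of the usual measure, not as the source of those numbers.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = fewest edits turning ref[:i] into hyp[:j] (word-level Levenshtein)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substituted word out of three: WER = 1/3.
print(round(word_error_rate("four two nine", "four to nine"), 3))  # 0.333
```

Because insertions are counted, WER can exceed 100% when the recognizer outputs many extra words.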

Let's make it simple
 The trouble is, listening is much harder than it looks (or
sounds): there are all sorts of different problems going on at
the same time...
 When someone speaks to you in the street, there's the
sheer difficulty of separating their words (what scientists
would call the acoustic signal) from the
background noise, especially in something like a cocktail
party, where the "noise" is similar speech from other
conversations.
 When people talk quickly and run all their words together in
a long stream, how do we know exactly when one word
ends and the next one begins? (Did they just say "dancing
and smile" or "dance, sing, and smile"?)
 There's the problem that everyone's voice is a little bit
different, and that our voices change from moment to
moment. How do our brains figure out that a word like "bird"
means exactly the same thing, no matter who says it or how?
 What about words like "red" and "read" that sound identical
but mean totally different things (homophones, as they're
called)? How does our brain know which word the speaker
means?
 What about sentences that are misheard to mean radically
different things? There's the age-old military example of
"send reinforcements, we're going to advance" being
misheard as "send three and fourpence, we're going to a
dance". All of us can probably think of song lyrics
we've hilariously misunderstood the same way (I always
chuckle when I hear Kate Bush singing about "the cattle
burning over your shoulder").
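
For homophones such as "red" and "read", the sound alone cannot settle which word was meant. Recognizers commonly lean on a language model that scores how likely each word sequence is. The sketch below assumes a tiny, made-up table of bigram (word-pair) counts with add-one smoothing. A real system would estimate probabilities from a large text corpus or use a neural language model.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Toy bigram counts (made-up numbers); unseen pairs count as 0.
BIGRAMS: Dict[Tuple[str, str], int] = {
    ("i", "read"): 8,
    ("read", "the"): 6,
    ("the", "book"): 9,
    ("the", "red"): 5,
    ("red", "car"): 7,
}
VOCAB_SIZE = 1000  # assumed vocabulary size, used for add-one smoothing

# How often each word appears as the first word of a pair.
CONTEXT_COUNTS: Counter = Counter()
for (first, _), count in BIGRAMS.items():
    CONTEXT_COUNTS[first] += count

# Words an acoustic model cannot tell apart by sound alone.
HOMOPHONES: Dict[str, List[str]] = {"red": ["red", "read"], "read": ["red", "read"]}


def log_prob(words: List[str]) -> float:
    """Log-probability of a word sequence under an add-one-smoothed bigram model."""
    total = 0.0
    for prev, word in zip(words, words[1:]):
        numerator = BIGRAMS.get((prev, word), 0) + 1
        denominator = CONTEXT_COUNTS[prev] + VOCAB_SIZE
        total += math.log(numerator / denominator)
    return total


def best_transcript(heard: List[str]) -> List[str]:
    """Try every homophone substitution and keep the most probable sentence."""
    options = [HOMOPHONES.get(word, [word]) for word in heard]
    return max((list(c) for c in product(*options)), key=log_prob)


print(best_transcript(["i", "red", "the", "book"]))  # ['i', 'read', 'the', 'book']
print(best_transcript(["the", "read", "car"]))       # ['the', 'red', 'car']
```

Trying every combination grows exponentially with the number of ambiguous words. Practical decoders instead prune candidates as they go, for example with beam search.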

More from cidroypaes