Voice input and speech recognition systems in tourism/social media
Voice input devices allow users to input data or commands using speech instead of other input methods like keyboards. Some voice input devices recognize words from a predefined vocabulary while others need to be trained for a specific speaker. When a word is spoken, the matching input is displayed on screen for verification.
Speech recognition is the process of converting spoken language to text using computer programs. It draws from linguistics, computer science, and electrical engineering. Applications include voice assistants, dictation software, call routing, and more. Accuracy depends on factors like vocabulary size, presence of similar sounding words, whether the system is designed for one speaker or many, and whether speech is isolated, connected or continuous.
Meaning
A device in which speech is used to input data or system commands directly into a system. Such equipment involves the use of speech recognition processes and can replace or supplement other input devices. Some voice input devices can recognize spoken words from a predefined vocabulary; others have to be trained for a particular speaker. When the operator utters a vocabulary item, the matching data input is displayed as characters on a screen and can then be verified by the operator.
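One classic way a speaker-trained device could match an utterance against its predefined vocabulary is template matching with dynamic time warping (DTW): during training the speaker records each vocabulary item, and at recognition time the stored template closest to the new utterance wins, even if the word is spoken faster or slower. The sketch below is only an illustration under simplifying assumptions: each recording is reduced to a short list of numbers standing in for real acoustic features, and the function names `dtw_distance` and `recognize` and the templates are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1] (stretch b)
                                 cost[i][j - 1],      # repeat a[i-1] (stretch a)
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance: Sequence[float], templates: Dict[str, List[float]]) -> str:
    """Return the vocabulary item whose stored template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical templates recorded by one speaker during training.
templates = {
    "yes": [1.0, 3.0, 4.0, 2.0],
    "no": [5.0, 5.0, 1.0, 0.0],
}
spoken = [1.0, 1.0, 3.0, 4.0, 4.0, 2.0]  # "yes", said more slowly
print(f"Recognized: {recognize(spoken, templates)}")  # operator then verifies on screen
```

Because the templates belong to one speaker, this style of matching fits the speaker-dependent, small-vocabulary case; a different voice would need its own training recordings.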
Speech recognition (SR) is the interdisciplinary subfield of computational linguistics that incorporates knowledge and research from the linguistics, computer science, and electrical engineering fields to develop methodologies and technologies that enable the recognition and translation of spoken language into text by computers and computerized devices, such as those categorized as smart technologies and robotics. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT).
Speech recognition applications include voice user interfaces such as voice dialling (e.g. "Call home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control, search (e.g. finding a podcast where particular words were spoken), simple data entry (e.g. entering a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-text processing (e.g. word processors or emails), and voice control in aircraft (usually termed Direct Voice Input).
Facts
Eliminates the need for a typewriter, keyboard, and so on.
From the technology perspective, speech recognition has a long history with several waves of major innovations. Major speech industry players include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, and iFlytek (China), many of which have publicized that the core technology in their speech recognition systems is based on deep learning.
In tourism
Car GPS systems
Mobile map applications for directions, such as Google Maps, HERE Maps, etc.
Apps like WhatsApp, Viber, and most social networking sites
Other key areas
Digital security locks
Accuracy
As mentioned earlier, the accuracy of speech recognition varies with the following factors:
Error rates increase as the vocabulary size grows:
e.g. the 10 digits "zero" to "nine" can be recognized essentially perfectly, but vocabulary sizes of 200, 5,000 or 100,000 may have error rates of 3%, 7% or 45% respectively.
Vocabulary is hard to recognize if it contains confusable words:
e.g. the 26 letters of the English alphabet are difficult to discriminate because they are confusable words (most notoriously, the E-set: "B, C, D, E, G, P, T, V, Z"); an 8% error rate is considered good for this vocabulary.
Speaker dependence vs. independence:
A speaker-dependent system is intended for use by a single speaker.
A speaker-independent system is intended for use by any speaker, which makes it more difficult to build.
Isolated, discontinuous, or continuous speech:
With isolated speech, single words are used, which makes the speech easier to recognize.
With discontinuous speech, full sentences separated by silence are used, which makes the speech about as easy to recognize as isolated speech.
With continuous speech, naturally spoken sentences are used, which makes the speech harder to recognize than either isolated or discontinuous speech.
Task and language constraints
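Error rates like those quoted above are commonly measured as a word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of words in the reference. A minimal sketch of that calculation follows; the function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from any particular toolkit.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


# "D" from the confusable E-set is misrecognized as "B" in one position.
print(word_error_rate("b c d e", "b c b e"))  # 0.25
```

On this measure, an 8% error rate on spelled letters means roughly one letter in twelve is substituted, dropped or inserted.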
Let's make it simple
The trouble is, listening is much harder than it looks (or sounds): there are all sorts of different problems going on at the same time...
When someone speaks to you in the street, there's the sheer difficulty of separating their words (what scientists would call the acoustic signal) from the background noise, especially at something like a cocktail party, where the "noise" is similar speech from other conversations.
When people talk quickly and run all their words together in a long stream, how do we know exactly when one word ends and the next one begins? (Did they just say "dancing and smile" or "dance, sing, and smile"?)
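The word-boundary problem can be illustrated with a toy example: given an unbroken string and a small lexicon, there may be several equally valid ways to split it into words. The sketch below works on letters rather than sounds, and the lexicon and the string "nowhere" are hypothetical, chosen only to show the ambiguity; a real recognizer faces the same problem over acoustic units and typically ranks the competing splits with probabilities.

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple


def segmentations(text: str, lexicon: FrozenSet[str]) -> List[Tuple[str, ...]]:
    """Return every way to split `text` into a sequence of lexicon words."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> Tuple[Tuple[str, ...], ...]:
        if start == len(text):
            return ((),)  # exactly one way to split an empty remainder: no words
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                # Keep this word and combine it with every split of the rest.
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return list(split_from(0))


lexicon = frozenset({"no", "now", "here", "where", "nowhere"})
for parse in segmentations("nowhere", lexicon):
    print(" ".join(parse))
# no where
# now here
# nowhere
```

All three splits are valid, so the string alone cannot say where the word boundaries are; something beyond the sounds themselves has to decide.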
There's the problem of how everyone's voice is a little bit different, and the way our voices change from moment to moment. How do our brains figure out that a word like "bird"
What about words like "red" and "read" that sound identical but mean totally different things (homophones, as they're called)? How does our brain know which word the speaker means?
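One way a recognizer can choose between homophones is to look at the surrounding words: the sound alone cannot separate "red" from "read", but a language model that scores how likely each word is after the previous one can. The sketch below uses a tiny table of made-up bigram counts (the `BIGRAM_COUNTS` values and the `pick_homophone` helper are hypothetical, not measured from any corpus) to pick the more likely spelling.

```python
from typing import Dict, List, Tuple

# Hypothetical counts of (previous word, next word) pairs; a real system
# would estimate these from a large text corpus.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("a", "red"): 50,
    ("a", "read"): 2,
    ("i", "read"): 40,
    ("i", "red"): 1,
}


def pick_homophone(previous_word: str, candidates: List[str]) -> str:
    """Choose the candidate seen most often after `previous_word` (unseen pairs count as 0)."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((previous_word.lower(), w), 0))


print(pick_homophone("I", ["red", "read"]))  # read
print(pick_homophone("a", ["red", "read"]))  # red
```

When no context has been seen (both counts are 0), `max` simply returns the first candidate, which is why real systems use much larger models and smoothing.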
What about sentences that are misheard to mean radically different things? There's the age-old military example of "send reinforcements, we're going to advance" being misheard as "send three and fourpence, we're going to a dance", and all of us can probably think of song lyrics we've hilariously misunderstood the same way (I always chuckle when I hear Kate Bush singing about "the cattle burning over your shoulder").