The document discusses voice browsers, which allow users to access websites and information using voice commands rather than a graphical user interface. It describes key components of voice browsers like VoiceXML for creating voice interfaces, speech recognition, text-to-speech synthesis, and call control. The document also outlines possible applications of voice browsers and standards developed by the W3C to make voice interfaces compatible across platforms.
2. TABLE OF CONTENTS
What is voice browser
Motivation
Difference between graphical browser and voice browser
Possible applications
W3C
VoiceXML
Speech Recognition
Call control
TTS
Voice style sheets
Conclusion
IT Dept. GNIT
3. WHAT IS A VOICE BROWSER?
A voice browser is a software application that
presents an interactive voice user interface to
the user in a manner analogous to the
functioning of a web browser.
It expands access to the Web.
It will allow any telephone to be used to access
appropriately designed Web-based services.
4. WHAT IS A VOICE BROWSER?
Server-based, voice portals
Interaction via keypads, spoken
commands, listening to prerecorded
speech, synthetic speech and music.
An advantage for people with visual
impairments.
Mobile Web
5. WHY A VOICE BROWSER?
Use of the hands during browsing might prove
inconvenient or impossible. Voice input is a
natural solution for such hands-busy situations.
Even in standard browser applications, using
voice input is simply more fun than the
alternatives.
The voice browser replaces the mouse in most
instances to enable hands-free browsing.
6. WHY A VOICE BROWSER?
Voice input provides direct "see and say" access
to links, eliminating the wrist strain associated
with holding a mouse, often for hours at a time.
7. MOTIVATION
Easy to use, even for people with no knowledge
of computers or a fear of them.
Voice browsers are the next generation of
call centers, which will become Voice Web
portals to the company's services and
related websites, whether accessed via the
telephone network or via the Internet.
8. GRAPHICAL & VOICE BROWSING
Graphical browsing is more passive due to
the persistence of the visual information.
Voice browsing is more active since the user
has to issue commands.
Graphical browsers can be client-based,
whereas voice browsers should be server-based.
9. POSSIBLE APPLICATIONS
Accessing business information:
The corporate "front desk" which asks callers who or
what they want.
Automated telephone ordering service.
Airline arrival and departure information.
Home banking services.
Accessing public information:
Community information such as weather, traffic
conditions, school closures, directions and events.
10. POSSIBLE APPLICATIONS (CONTD.)
Local, national and international news.
National and international stock market
information.
Business and e-commerce transactions.
Accessing personal information:
Voice mail.
Calendars, address and telephone lists.
Personal horoscope.
Personal newsletter.
To-do lists, shopping lists, and calorie
counters.
11. W3C
The World Wide Web Consortium (W3C) develops
interoperable technologies (specifications,
guidelines, software, and tools) to lead the Web to
its full potential as a forum for information,
commerce, communication, and collective
understanding.
W3C Speech Interface Framework
VoiceXML
Speech Recognition:
1. Speech Grammars
2. Stochastic (N-Gram) Language Models
3. Semantic Interpretation
4. Pronunciation Lexicon
Call control
12. VOICEXML
VoiceXML is a dialog markup language designed
for telephony applications, where users are
restricted to voice and DTMF (touch tone) input.
There are other languages: VoXML, omniviewXML
[Figure: a Web server holding text.html and text.vxml, connected over the Internet to the browser]
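To make the idea of a dialog markup language more concrete, here is a minimal sketch in Python. It embeds a small VoiceXML 2.0 document as a string and lists the prompt attached to each input field, roughly the first thing a voice browser has to do before speaking to the caller. The form id, the field name `flight_number`, and the helper `list_field_prompts` are assumptions made for illustration; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: ask the caller for a flight number by voice or keypad.
VXML_DOC = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="flight_status">
    <field name="flight_number" type="digits">
      <prompt>Please say or key in your flight number.</prompt>
    </field>
    <block>
      <prompt>Looking up flight <value expr="flight_number"/>.</prompt>
    </block>
  </form>
</vxml>
"""


def list_field_prompts(vxml_text):
    """Return (field name, prompt text) pairs for every <field> in the document."""
    root = ET.fromstring(vxml_text.strip())
    pairs = []
    for field in root.iter(f"{{{VXML_NS}}}field"):
        prompt = field.find(f"{{{VXML_NS}}}prompt")
        # Collapse whitespace so the prompt reads as one spoken sentence.
        text = " ".join("".join(prompt.itertext()).split()) if prompt is not None else ""
        pairs.append((field.get("name"), text))
    return pairs


fields = list_field_prompts(VXML_DOC)
```

A real voice browser would go on to play each prompt through speech synthesis and wait for speech or DTMF input; this sketch only reads the markup.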
15. DTMF GRAMMARS
Touch tone input is often used as an
alternative to speech recognition.
It is especially useful in noisy conditions or
when the social context makes it awkward
to speak.
The W3C DTMF grammar format allows
authors to specify the expected sequence
of digits, and to bind them to the
appropriate results.
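The binding of digit sequences to results can be sketched in a few lines of Python. This is not the W3C DTMF grammar format itself, only an illustration of the idea under assumed rules: the menu entries, the four-digit PIN rule terminated by `#`, and the function `interpret_dtmf` are all hypothetical.

```python
import re

# Hypothetical single-key menu: each key is bound to a result.
MENU = {"1": "arrivals", "2": "departures", "0": "operator"}

# Hypothetical rule: exactly four digits followed by the pound key.
PIN_RULE = re.compile(r"[0-9]{4}#")


def interpret_dtmf(keys):
    """Map a sequence of key presses to a result, or None if nothing matches."""
    if keys in MENU:
        return {"action": MENU[keys]}
    if PIN_RULE.fullmatch(keys):
        return {"pin": keys[:-1]}  # drop the terminating '#'
    return None
```

For example, `interpret_dtmf("2")` yields the departures action, while an unexpected sequence such as `"12"` yields `None`, which an application would typically answer with a reprompt.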
16. SPEECH GRAMMARS
Speech Grammars allow authors to specify
rules covering the sequences of words that
users are expected to say in particular
contexts.
These contextual clues allow the
recognition engine to focus on likely
utterances, improving the chances of a
correct match.
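A rule covering expected word sequences can be illustrated with a small Python sketch. It does not use the W3C grammar syntax; the grammar here is simply a sequence of slots, each with a list of alternatives, and the pizza-ordering vocabulary is an assumption chosen for the example.

```python
from itertools import product

# Hypothetical grammar: one alternative must be chosen from each slot, in order.
GRAMMAR = [
    ["i want", "i would like", "give me"],
    ["a small", "a medium", "a large"],
    ["cheese", "mushroom"],
    ["pizza"],
]


def accepted_sentences(grammar):
    """Enumerate every word sequence the grammar allows."""
    return {" ".join(parts) for parts in product(*grammar)}


def recognize(utterance, grammar=GRAMMAR):
    """Return True if the (case-insensitive) utterance is covered by the grammar."""
    normalized = " ".join(utterance.lower().split())
    return normalized in accepted_sentences(grammar)
```

Because only 18 sentences are possible, a recognizer restricted to this grammar has far fewer candidates to choose from than one listening for arbitrary speech, which is the benefit the slide describes.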
17. STOCHASTIC (N-GRAM) LANGUAGE MODELS
Speech grammars are of little use for
open-ended prompts ("How can I help you?").
The solution is to use a stochastic
language model. Such models specify the
probability that one word occurs following
certain others. The probabilities are
estimated from a collection of utterances
gathered from many users.
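The simplest such model is a bigram model, where the probability of a word depends only on the word before it. The following Python sketch estimates those probabilities by counting; the three training utterances are made up for illustration, and real systems would use far larger collections and some form of smoothing.

```python
from collections import Counter, defaultdict


def train_bigrams(utterances):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        # <s> and </s> mark the start and end of an utterance.
        words = ["<s>"] + utterance.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def probability(counts, prev, word):
    """Estimate P(word | prev) as a relative frequency (0.0 if prev was never seen)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


# Hypothetical utterances collected from callers.
CORPUS = [
    "i want to check my balance",
    "i want to pay a bill",
    "i need to check my order",
]
model = train_bigrams(CORPUS)
```

With this corpus, "want" follows "i" in two of three utterances, so `probability(model, "i", "want")` is 2/3, and "to" always follows "want", so that probability is 1.0.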
18. SEMANTIC INTERPRETATION
The recognition process matches an
utterance to a speech grammar, building a
parse tree as a byproduct.
There are two approaches to harvesting
semantic results from the parse tree:
1. Annotating grammar rules with
semantic interpretation tags.
2. Representing the result in XML.
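Both approaches can be sketched together in Python. In this simplified illustration, each word a rule can match carries a semantic tag, and the collected tags are then serialized as XML. The slot names, tag values, and the `<result>` element are assumptions for the example, not the W3C formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged rules: each recognized word carries a semantic value.
TAGGED_RULES = {
    "size": {"small": "S", "medium": "M", "large": "L"},
    "topping": {"cheese": "cheese", "mushroom": "mushroom", "mushrooms": "mushroom"},
}


def interpret(utterance):
    """Collect the semantic tags of every word that matches a rule."""
    result = {}
    for word in utterance.lower().split():
        for slot, tags in TAGGED_RULES.items():
            if word in tags:
                result[slot] = tags[word]
    return result


def to_xml(result):
    """Represent the semantic result as a small XML document."""
    root = ET.Element("result")
    for slot, value in result.items():
        ET.SubElement(root, slot).text = value
    return ET.tostring(root, encoding="unicode")
```

The application then works with the compact result (size `L`, topping `mushroom`) rather than with the raw words the caller happened to use.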
19. PRONUNCIATION LEXICON
Application developers sometimes need the
ability to tune speech engines, whether for
synthesis or recognition.
W3C is developing a markup language for
an open portable specification of
pronunciation information using a standard
phonetic alphabet.
The most commonly needed pronunciations
are for proper nouns such as surnames or
business names.
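At its core, a pronunciation lexicon is a lookup from spellings to phonetic strings. The Python sketch below shows that lookup with two illustrative entries written in the International Phonetic Alphabet; the entries and the function `pronounce` are assumptions for the example and say nothing about the markup W3C defines.

```python
# Hypothetical lexicon: lowercase spellings mapped to IPA pronunciations.
LEXICON = {
    "worcester": "ˈwʊstər",
    "siobhan": "ʃɪˈvɔːn",
}


def pronounce(word, lexicon=LEXICON):
    """Return the lexicon pronunciation, or None when the word is not listed.

    A None result signals that the engine should fall back to its
    general letter-to-sound rules.
    """
    return lexicon.get(word.lower())
```

Names such as these are exactly where general spelling rules tend to fail, which is why developers want to supply the pronunciation themselves.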
20. CALL CONTROL
Fine-grained control of speech (signal
processing) resources and telephony
resources in a VoiceXML telephony
platform.
Will enable application developers to use
markup to perform call screening, whisper
call waiting, call transfer, and more.
Can be used to transfer a user from one
voice browser to another on a completely
different machine.
21. TEXT TO SPEECH SYNTHESIS:
1. Pre-processing
2. Text normalization
i) digit normalization
ii) date normalization
iii) abbreviation normalization
3. Parts of speech annotation
4. Pronunciation lexicon
5. Letter to sound rules
6. Synthesis
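The text normalization step (step 2) lends itself to a short Python sketch. The abbreviation table, the assumed day/month/year date format, and the choice to read numbers digit by digit are all simplifications made for illustration; a production synthesizer would handle many more cases.

```python
import re

# Hypothetical abbreviation table.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "approx.": "approximately"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def expand_abbreviations(text):
    """Abbreviation normalization: replace each known abbreviation (naive substring match)."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return text


def expand_dates(text):
    """Date normalization: turn d/m/yyyy into 'd Month yyyy' (assumes day-first dates)."""
    def repl(match):
        day, month, year = int(match.group(1)), int(match.group(2)), match.group(3)
        if not 1 <= month <= 12:
            return match.group(0)  # not a valid month, leave untouched
        return f"{day} {MONTHS[month - 1]} {year}"
    return re.sub(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b", repl, text)


def expand_digits(text):
    """Digit normalization: read every number digit by digit."""
    return re.sub(r"\d+", lambda m: " ".join(DIGITS[int(d)] for d in m.group()), text)


def normalize(text):
    """Run the three normalization passes in order."""
    return expand_digits(expand_dates(expand_abbreviations(text)))
```

The order matters: dates are expanded before digits so that the month is named before the remaining numbers are spelled out.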
22. VOICE STYLE SHEETS!
Authors want control over how the document is
rendered. Aural style sheets provide a basis for
controlling a range of features:
Volume
Rate
Pitch
Direction
Spelling out text letter by letter
Speech fonts (male/female, adult/child etc.)
Inserted text before and after element content
Sound effects and music
23. CONCLUSION
If voice browsers are meant to replace
human operator dialog, they must be fast
in response.
Speech recognition, interpretation, and
synthesis all depend on the implementation.
When a user requests a certain document,
several related documents can be
downloaded for easier access.
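The prefetching idea in the last point can be sketched as a small cache in Python. The class `PrefetchingCache`, the `fetch` callable, and the table of related documents are all hypothetical; this version also fetches related documents synchronously for simplicity, whereas a responsive voice browser would more likely do so in the background.

```python
class PrefetchingCache:
    """Cache that also downloads documents related to the one requested."""

    def __init__(self, fetch, related):
        self.fetch = fetch      # callable: document name -> document text
        self.related = related  # dict: document name -> list of related names
        self.cache = {}

    def get(self, name):
        """Return the requested document, fetching it and its relatives if needed."""
        if name not in self.cache:
            self.cache[name] = self.fetch(name)
        # Prefetch related documents so later requests are served from the cache.
        for other in self.related.get(name, []):
            if other not in self.cache:
                self.cache[other] = self.fetch(other)
        return self.cache[name]
```

After a caller opens a main menu document, a follow-up request for one of its related documents is answered without another download, which helps keep response times short.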