Voice Browser Overview: What is it and How Does it Work

Presented by
Soumya Shuchi(14300211047)
Srirupa Das(14300211048)
Subhajit Karmakar(14300211049)
Subhendu Paul(14300211050)
Sumadhura Biswas(14300211051)
Suman Bose(14300211052)
Sumit Kr.Singh(14300211053)
IT Dept. GNIT

TABLE OF CONTENTS
• What is a voice browser
• Motivation
• Difference between graphical browser and voice browser
• Possible applications
• W3C
• VoiceXML
• Speech Recognition
• Call control
• TTS
• Voice style sheets
• Conclusion

WHAT IS A VOICE BROWSER?
• A voice browser is a software application that presents an interactive voice user interface to the user, in a manner analogous to the way a web browser presents a graphical one.
• It expands access to the Web.
• It allows any telephone to be used to access appropriately designed Web-based services.

WHAT IS A VOICE BROWSER? (CONTD.)
• Server-based; used for voice portals.
• Interaction is via keypads and spoken commands, and by listening to prerecorded speech, synthetic speech and music.
• An advantage to people with visual impairment.
• Access to the mobile Web.

WHY A VOICE BROWSER?
• Using the hands while browsing may be inconvenient or impossible. Voice input is a natural solution for such hands-busy situations.
• Even in standard browser applications, voice input can be more fun than the alternatives.
• The voice browser replaces the mouse in most instances, enabling hands-free browsing.

WHY A VOICE BROWSER? (CONTD.)
• Voice input provides direct "see and say" access to links, avoiding the wrist strain associated with holding a mouse, often for hours at a time.

MOTIVATION
• Easy to use, even for people with no knowledge of computers or a fear of them.
• Voice browsers are the next generation of call centers, which will become Voice Web portals to a company's services and related websites, whether accessed via the telephone network or via the Internet.

GRAPHICAL & VOICE BROWSING
• Graphical browsing is more passive, because the visual information persists on screen.
• Voice browsing is more active, since the user has to issue commands.
• Graphical browsers can be client-based, whereas voice browsers should be server-based.

POSSIBLE APPLICATIONS
• Accessing business information:
  ◦ The corporate "front desk", which asks callers who or what they want.
  ◦ Automated telephone ordering services.
  ◦ Airline arrival and departure information.
  ◦ Home banking services.
• Accessing public information:
  ◦ Community information such as weather, traffic conditions, school closures, directions and events.

POSSIBLE APPLICATIONS (CONTD.)
• Accessing public information (continued):
  ◦ Local, national and international news.
  ◦ National and international stock market information.
  ◦ Business and e-commerce transactions.
• Accessing personal information:
  ◦ Voice mail.
  ◦ Calendars, address and telephone lists.
  ◦ Personal horoscope.
  ◦ Personal newsletter.
  ◦ To-do lists, shopping lists, and calorie counters.

W3C
• The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding.
W3C SPEECH INTERFACE FRAMEWORK
• VoiceXML
• Speech Recognition:
  1. Speech Grammars
  2. Stochastic (N-Gram) Language Models
  3. Semantic Interpretation
  4. Pronunciation Lexicon
• Call control

VOICEXML
• VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input.
• Other dialog markup languages exist, such as VoXML and omniviewXML.
[Figure: a web server serving text.html and text.vxml over the Internet to a browser]
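
To make the idea of a dialog markup language concrete, the sketch below builds a very small VoiceXML document from Python. It is only an illustration: the element names (vxml, form, block, prompt) and the namespace come from VoiceXML 2.0, while the function name, the form id and the prompt text are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.x namespace


def build_greeting_dialog(prompt_text: str) -> str:
    """Return a one-form VoiceXML document that speaks prompt_text."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "greeting"})  # hypothetical id
    block = ET.SubElement(form, "block")  # a block holds non-interactive content
    prompt = ET.SubElement(block, "prompt")  # text rendered as speech
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")
```

Calling build_greeting_dialog("Welcome to the voice portal.") returns the markup as a string. A web server could serve it as a .vxml document in the same way it serves an .html page.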

SPEECH RECOGNITION
[Figure: the user gives touch-tone input, handled by DTMF grammars, or speech input, handled by speech grammars and stochastic language models; both feed semantic interpretation]

DTMF GRAMMARS
• Touch tone input is often used as an alternative to speech recognition.
• It is especially useful in noisy conditions or when the social context makes it awkward to speak.
• The W3C DTMF grammar format allows authors to specify the expected sequences of digits and to bind them to the appropriate results.
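
The binding of digit sequences to results can be sketched in a few lines of Python. This is a plain-Python stand-in for illustration only, not the XML-based W3C grammar format; the menu entries and the function name are hypothetical.

```python
# Hypothetical menu: expected digit sequences bound to results.
DTMF_RULES = {
    "1": "account_balance",
    "2": "recent_transactions",
    "00": "operator",
}


def interpret_dtmf(digits: str):
    """Return the result bound to a digit sequence, or None if no rule matches."""
    if not digits or any(ch not in "0123456789*#" for ch in digits):
        return None  # not a valid touch-tone sequence
    return DTMF_RULES.get(digits)
```

For example, interpret_dtmf("00") returns "operator", while a sequence with no matching rule returns None so the dialog can re-prompt the caller.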

SPEECH GRAMMARS
• Speech grammars allow authors to specify rules covering the sequences of words that users are expected to say in particular contexts.
• These contextual clues allow the recognition engine to focus on likely utterances, improving the chances of a correct match.
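
As a rough illustration of how a grammar restricts what can be recognized, the sketch below accepts only two-word utterances of the form "action name". The word lists and the function name are assumptions; a real speech grammar would be written in a grammar format and matched against audio hypotheses, not typed text.

```python
# Hypothetical grammar: <action> <name>, for example "call alice".
GRAMMAR = [
    {"call", "dial", "phone"},       # words allowed in the first position
    {"alice", "bob", "reception"},   # words allowed in the second position
]


def matches_grammar(utterance: str) -> bool:
    """Check that each word is allowed at its position in the grammar."""
    words = utterance.lower().split()
    if len(words) != len(GRAMMAR):
        return False
    return all(word in allowed for word, allowed in zip(words, GRAMMAR))
```

Because only nine word sequences are possible here, the recognizer has far fewer candidates to choose between than in open dictation.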

STOCHASTIC (N-GRAM) LANGUAGE MODELS
• Speech grammars are of little use for open-ended prompts such as "How can I help you?".
• The solution is to use a stochastic language model. Such models specify the probability that one word occurs following certain others. The probabilities are computed from a collection of utterances gathered from many users.
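
The probability estimate described above can be sketched with a bigram model (N = 2), where each word depends only on the word before it. The three training utterances are hypothetical and far too few for real use; the "<s>" start marker and the function name are also assumptions of this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(utterances):
    """Estimate P(next word | previous word) from example utterances."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        words = ["<s>"] + utterance.lower().split()  # <s> marks the start
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {word: n / sum(following.values()) for word, n in following.items()}
        for prev, following in counts.items()
    }


# Hypothetical training utterances; a real model needs many more.
CORPUS = [
    "i want to check my balance",
    "i want to pay a bill",
    "i need to pay my bill",
]
MODEL = train_bigram_model(CORPUS)
```

In this toy corpus "want" follows "i" in two of three utterances, so MODEL["i"]["want"] is 2/3 and MODEL["i"]["need"] is 1/3.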

SEMANTIC INTERPRETATION
• The recognition process matches an utterance to a speech grammar, building a parse tree as a byproduct.
• There are two approaches to harvesting semantic results from the parse tree:
  1. Annotating grammar rules with semantic interpretation tags.
  2. Representing the result in XML.
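
Both approaches can be illustrated with a simplified sketch. It skips the parse tree and attaches a semantic tag directly to each word, then shows the same result as XML. The vocabulary, slot names and function names are hypothetical, and the XML layout is an assumption, not a W3C result format.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar: each word carries a semantic tag (slot, value).
TAGGED_WORDS = {
    "small": ("size", "S"),
    "large": ("size", "L"),
    "pizza": ("item", "pizza"),
    "coffee": ("item", "coffee"),
}


def interpret(utterance: str) -> dict:
    """Collect the semantic tags of recognized words; other words are ignored."""
    result = {}
    for word in utterance.lower().split():
        if word in TAGGED_WORDS:
            slot, value = TAGGED_WORDS[word]
            result[slot] = value
    return result


def to_xml(result: dict) -> str:
    """Represent the interpretation as a small XML document."""
    root = ET.Element("result")
    for slot, value in result.items():
        ET.SubElement(root, slot).text = value
    return ET.tostring(root, encoding="unicode")
```

For "I want a large pizza", interpret returns {"size": "L", "item": "pizza"}, which the application can act on without looking at the exact words spoken.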

PRONUNCIATION LEXICON
• Application developers sometimes need the ability to tune speech engines, whether for synthesis or recognition.
• W3C is developing a markup language for an open, portable specification of pronunciation information using a standard phonetic alphabet.
• The most commonly needed pronunciations are for proper nouns such as surnames or business names.
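
A lexicon can be thought of as a lookup table consulted before the engine's default rules. The sketch below uses a plain Python dictionary in place of the markup language, with two English place names whose spelling is a poor guide to their pronunciation; the table and the function name are assumptions of this example.

```python
# Hypothetical lexicon: proper nouns mapped to IPA pronunciations.
LEXICON = {
    "leicester": "ˈlɛstər",
    "worcester": "ˈwʊstər",
}


def pronounce(word: str, lexicon=LEXICON):
    """Return the lexicon pronunciation, or None so the engine uses its own rules."""
    return lexicon.get(word.lower())
```

Returning None for unknown words leaves room for a fallback such as letter to sound rules.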

CALL CONTROL
• Provides fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform.
• Will enable application developers to use markup to perform call screening, whisper call waiting, call transfer, and more.
• Can be used to transfer a user from one voice browser to another on a completely different machine.

TEXT TO SPEECH SYNTHESIS
1. Pre-processing
2. Text normalization
   i) digit normalization
   ii) date normalization
   iii) abbreviation normalization
3. Parts of speech annotation
4. Pronunciation lexicon
5. Letter to sound rules
6. Synthesis

VOICE STYLE SHEETS
Authors want control over how the document is rendered. Aural style sheets provide a basis for controlling a range of features:
• Volume
• Rate
• Pitch
• Direction
• Spelling out text letter by letter
• Speech fonts (male/female, adult/child, etc.)
• Inserted text before and after element content
• Sound effects and music

CONCLUSION
• If voice browsers are meant to replace dialog with a human operator, they must respond quickly.
• The quality of speech recognition, interpretation and synthesis depends on the implementation.
• When a user requests a document, several related documents can be downloaded in advance for faster access.

