Voice browser

    Voice browser: Presentation Transcript

    • Presented by: Soumya Shuchi (14300211047), Srirupa Das (14300211048), Subhajit Karmakar (14300211049), Subhendu Paul (14300211050), Sumadhura Biswas (14300211051), Suman Bose (14300211052), Sumit Kr. Singh (14300211053). IT Dept., GNIT
    • TABLE OF CONTENTS
      • What is a voice browser
      • Motivation
      • Difference between graphical browser and voice browser
      • Possible applications
      • W3C
      • VoiceXML
      • Speech Recognition
      • Call control
      • TTS
      • Voice style sheets
      • Conclusion
    • WHAT IS A VOICE BROWSER?
      • A voice browser is a software application that presents an interactive voice user interface to the user in a manner analogous to the functioning of a web browser.
      • Expanding access to the Web.
      • Will allow any telephone to be used to access appropriately designed Web-based services.
    • WHAT IS A VOICE BROWSER?
      • Server-based; voice portals.
      • Interaction via keypads, spoken commands, listening to prerecorded speech, synthetic speech and music.
      • An advantage to people with visual impairment.
      • Mobile Web.
    • WHY A VOICE BROWSER?
      • Use of the hands during browsing might prove inconvenient or impossible. Voice input is a natural solution for such hands-busy situations.
      • Even in standard browser applications, using voice input is simply more fun than the alternatives.
      • The voice browser replaces the mouse in most instances to enable hands-free browsing.
    • WHY A VOICE BROWSER?
      • Voice input provides direct "see and say" access to links, eliminating the wrist strain associated with holding the mouse, often for hours at a time.
    • MOTIVATION
      • Easy to use, even for people with no knowledge of computers or with a fear of them.
      • Voice browsers are the next generation of call centers, which will become Voice Web portals to the company's services and related websites, whether accessed via the telephone network or via the Internet.
    • GRAPHICAL & VOICE BROWSING
      • Graphical browsing is more passive due to the persistence of the visual information.
      • Voice browsing is more active since the user has to issue commands.
      • Graphical browsers can be client-based, whereas voice browsers should be server-based.
    • POSSIBLE APPLICATIONS
      • Accessing business information:
        • The corporate "front desk" which asks callers who or what they want.
        • Automated telephone ordering service.
        • Airline arrival and departure information.
        • Home banking services.
      • Accessing public information:
        • Community information such as weather, traffic conditions, school closures, directions and events.
    • POSSIBLE APPLICATIONS (CONTD.)
      • Accessing public information (continued):
        • Local, national and international news.
        • National and international stock market information.
        • Business and e-commerce transactions.
      • Accessing personal information:
        • Voice mail.
        • Calendars, address and telephone lists.
        • Personal horoscope.
        • Personal newsletter.
        • To-do lists, shopping lists, and calorie counters.
    • W3C
      • The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding.
    • W3C SPEECH INTERFACE FRAMEWORK
      • VoiceXML
      • Speech Recognition:
        1. Speech Grammars
        2. Stochastic (N-Gram) Language Models
        3. Semantic Interpretation
        4. Pronunciation Lexicon
      • Call control
    • VOICEXML
      • VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input.
      • There are other languages: VoXML, omniviewXML.
      • [Figure: Web Server, Internet and Browser, with the documents text.html and text.vxml]
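To make the idea of a dialog markup language concrete, the following Python sketch builds a minimal VoiceXML document containing one form that speaks a prompt. The element names (vxml, form, block, prompt) are taken from VoiceXML 2.0; the function name build_greeting_dialog, the form id and the prompt text are illustrative assumptions and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def build_greeting_dialog(prompt_text: str) -> str:
    """Return a minimal VoiceXML document whose single form speaks a prompt."""
    # Root element with the VoiceXML 2.0 version and namespace attributes.
    vxml = ET.Element(
        "vxml",
        {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"},
    )
    # A form is the basic dialog unit; the id "greeting" is hypothetical.
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    # A block holds content that runs when the form is visited.
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


doc = build_greeting_dialog("Welcome to the voice portal.")
print(doc)
```

A web server would serve a document like this as the .vxml counterpart of an .html page, and the voice browser would render the prompt as synthetic speech.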
    • SPEECH RECOGNITION
      • [Figure: the USER supplies Touch Tone and Speech input; the components shown are DTMF Grammars, Speech Grammars, Stochastic Language Models and Semantic Interpretation]
    • DTMF GRAMMARS
      • Touch tone input is often used as an alternative to speech recognition.
      • Especially useful in noisy conditions or when the social context makes it awkward to speak.
      • The W3C DTMF grammar format allows authors to specify the expected sequence of digits, and to bind them to the appropriate results.
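The binding of expected digit sequences to results can be sketched in a few lines of Python. This is not the W3C DTMF grammar format itself, only an illustration of the idea; the menu entries and the names DTMF_GRAMMAR and match_dtmf are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical menu: each expected key sequence is bound to a result.
DTMF_GRAMMAR: Dict[str, str] = {
    "1": "account_balance",
    "2": "recent_transactions",
    "0": "operator",
    "99": "main_menu",
}

VALID_KEYS = set("0123456789*#")  # the twelve keys of a standard keypad


def match_dtmf(keys: str, grammar: Dict[str, str] = DTMF_GRAMMAR) -> Optional[str]:
    """Return the result bound to the key sequence, or None if it is not expected."""
    if not keys or any(k not in VALID_KEYS for k in keys):
        return None  # not a valid touch-tone sequence at all
    return grammar.get(keys)


print(match_dtmf("1"))
print(match_dtmf("7"))
```

A sequence outside the grammar yields None, which a dialog would typically answer by repeating the menu.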
    • SPEECH GRAMMARS
      • Speech grammars allow authors to specify rules covering the sequences of words that users are expected to say in particular contexts.
      • These contextual clues allow the recognition engine to focus on likely utterances, improving the chances of a correct match.
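A toy Python sketch of how a grammar narrows recognition: the grammar lists the alternatives allowed at each position, and the first recognizer hypothesis that the grammar accepts wins. The flight-booking phrases and the names FLIGHT_GRAMMAR, allowed_utterances and recognize are assumptions made for illustration; a real engine uses the grammar during decoding rather than as a filter afterwards.

```python
from itertools import product
from typing import List, Optional, Set

# Hypothetical grammar: each inner list holds the alternatives for one position.
FLIGHT_GRAMMAR: List[List[str]] = [
    ["i want", "i would like"],
    ["a flight", "a ticket"],
    ["to"],
    ["kolkata", "delhi", "mumbai"],
]


def allowed_utterances(grammar: List[List[str]]) -> Set[str]:
    """Expand the grammar into the full set of word sequences it accepts."""
    return {" ".join(parts) for parts in product(*grammar)}


def recognize(hypotheses: List[str], grammar: List[List[str]]) -> Optional[str]:
    """Return the first hypothesis accepted by the grammar, or None."""
    allowed = allowed_utterances(grammar)
    for text in hypotheses:
        normalized = " ".join(text.lower().split())
        if normalized in allowed:
            return normalized
    return None


# The acoustically best guess is wrong, but the grammar rejects it.
best = recognize(
    ["I want a fright to Delhi", "I want a flight to Delhi"],
    FLIGHT_GRAMMAR,
)
print(best)
```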
    • STOCHASTIC (N-GRAM) LANGUAGE MODELS
      • Speech grammars are of little use for open-ended prompts (for example, "How can I help you?").
      • The solution is to use a stochastic language model. Such models specify the probability that one word occurs following certain others. The probabilities are computed from a collection of utterances collected from many users.
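As a minimal sketch of such a model, the Python below estimates bigram probabilities (N = 2) by counting word pairs in a small collection of utterances. The three sample utterances are invented, and the estimate is the plain unsmoothed relative frequency; practical models add smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List

START = "<s>"  # marker for the beginning of an utterance


def train_bigram(utterances: List[str]) -> DefaultDict[str, Counter]:
    """Count how often each word follows each preceding word."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for utterance in utterances:
        words = [START] + utterance.lower().split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def prob(counts: DefaultDict[str, Counter], prev: str, cur: str) -> float:
    """P(cur | prev) as a relative frequency; 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0


# Invented utterances standing in for data collected from many users.
model = train_bigram([
    "i want to check my balance",
    "i want to pay a bill",
    "i need help",
])
print(prob(model, "i", "want"))   # "want" follows "i" in 2 of 3 cases
print(prob(model, "want", "to"))  # "to" always follows "want" here
```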
    • SEMANTIC INTERPRETATION
      • The recognition process matches an utterance to a speech grammar, building a parse tree as a byproduct.
      • There are two approaches to harvesting semantic results from the parse tree:
        1. Annotating grammar rules with semantic interpretation tags.
        2. Representing the result in XML.
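Both approaches can be illustrated together in a small Python sketch: each phrase in a rule carries a semantic tag, and the collected tags are then serialized as XML. Simple phrase matching stands in for a real parse tree here, and the drink-ordering rules, the element names and the functions interpret and to_xml are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical rules: rule name -> {spoken phrase: semantic tag}.
RULES: Dict[str, Dict[str, str]] = {
    "drink": {"coke": "cola", "coca cola": "cola", "lemonade": "lemonade"},
    "size": {"small": "S", "medium": "M", "large": "L"},
}


def interpret(utterance: str) -> Dict[str, str]:
    """Collect the tag of the first matching phrase for each rule."""
    # Pad with spaces so that phrases only match on whole-word boundaries.
    padded = " " + " ".join(utterance.lower().split()) + " "
    result: Dict[str, str] = {}
    for rule, phrases in RULES.items():
        for phrase, tag in phrases.items():
            if f" {phrase} " in padded:
                result[rule] = tag
                break
    return result


def to_xml(result: Dict[str, str]) -> str:
    """Represent the semantic result as a small XML document."""
    root = ET.Element("result")
    for rule, tag in result.items():
        ET.SubElement(root, rule).text = tag
    return ET.tostring(root, encoding="unicode")


meaning = interpret("I would like a large Coca Cola")
print(meaning)
print(to_xml(meaning))
```

Different wordings ("coke", "coca cola") map to the same tag, so the application sees one value regardless of what the caller said.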
    • PRONUNCIATION LEXICON
      • Application developers sometimes need the ability to tune speech engines, whether for synthesis or recognition.
      • W3C is developing a markup language for an open, portable specification of pronunciation information using a standard phonetic alphabet.
      • The most commonly needed pronunciations are for proper nouns such as surnames or business names.
    • CALL CONTROL
      • Fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform.
      • Will enable application developers to use markup to perform call screening, whisper call waiting, call transfer, and more.
      • Can be used to transfer a user from one voice browser to another on a completely different machine.
    • TEXT TO SPEECH SYNTHESIS:
      1. Pre-processing
      2. Text normalization
         i) digit normalization
         ii) date normalization
         iii) abbreviation normalization
      3. Parts of speech annotation
      4. Pronunciation lexicon
      5. Letter to sound rules
      6. Synthesis
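The text normalization step (step 2) can be sketched in Python as three small passes. Everything specific here is an assumption for illustration: the abbreviation table, the DD/MM/YYYY date format, and the choice to read numbers digit by digit. A production system would also handle quantities, ordinals, currencies and ambiguous abbreviations.

```python
import re

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Hypothetical abbreviation table.
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "No.": "Number"}


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its full form."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return text


def expand_dates(text: str) -> str:
    """Rewrite DD/MM/YYYY dates as 'D Month YYYY' (format is an assumption)."""
    def repl(match: "re.Match[str]") -> str:
        day, month, year = int(match.group(1)), int(match.group(2)), match.group(3)
        if not 1 <= month <= 12:
            return match.group(0)  # leave implausible dates untouched
        return f"{day} {MONTHS[month - 1]} {year}"
    return re.sub(r"\b([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})\b", repl, text)


def expand_digits(text: str) -> str:
    """Spell out every run of digits one digit at a time."""
    return re.sub(
        r"[0-9]+",
        lambda m: " ".join(ONES[int(d)] for d in m.group()),
        text,
    )


def normalize(text: str) -> str:
    """Apply abbreviation, date, then digit normalization, in that order."""
    return expand_digits(expand_dates(expand_abbreviations(text)))


print(normalize("Dr. Roy, Dept. 7, 05/03/2014"))
```

Dates are expanded before digits so that the slashes are still present when the date pattern is applied.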
    • VOICE STYLE SHEETS
      • Authors want control over how the document is rendered. Aural style sheets provide a basis for controlling a range of features:
        • Volume
        • Rate
        • Pitch
        • Direction
        • Spelling out text letter by letter
        • Speech fonts (male/female, adult/child etc.)
        • Inserted text before and after element content
        • Sound effects and music
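How a style sheet might attach these features to document elements can be sketched in Python. This is not aural CSS syntax; the VoiceStyle fields mirror a few of the features listed above, and the per-element rules in STYLE_RULES and the helper style_for are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class VoiceStyle:
    """Default rendering properties for spoken output."""
    volume: str = "medium"
    rate: str = "medium"
    pitch: str = "medium"
    voice: str = "female-adult"  # the "speech font"
    spell_out: bool = False      # read text letter by letter


# Hypothetical rules: element name -> properties that override the defaults.
STYLE_RULES: Dict[str, Dict[str, object]] = {
    "h1": {"volume": "loud", "pitch": "low"},
    "em": {"pitch": "high"},
    "abbr": {"spell_out": True},
}


def style_for(tag: str, base: VoiceStyle = VoiceStyle()) -> VoiceStyle:
    """Return the base style with the overrides for this element applied."""
    return replace(base, **STYLE_RULES.get(tag, {}))


print(style_for("h1"))
print(style_for("p"))
```

Elements without a rule simply inherit the base style, much as unstyled elements in a visual style sheet fall back to defaults.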
    • CONCLUSION
      • If voice browsers are meant to replace human operator dialog, they must be fast in response.
      • Speech recognition, interpretation and synthesis depend on the implementation.
      • When a user requests a certain document, several related documents can be downloaded in advance for easier access.