Voice browser


Published in: Internet, Technology

  1. Presented by
     • Soumya Shuchi (14300211047)
     • Srirupa Das (14300211048)
     • Subhajit Karmakar (14300211049)
     • Subhendu Paul (14300211050)
     • Sumadhura Biswas (14300211051)
     • Suman Bose (14300211052)
     • Sumit Kr. Singh (14300211053)
     IT Dept., GNIT
  2. TABLE OF CONTENTS
     • What is a voice browser
     • Motivation
     • Difference between graphical browser and voice browser
     • Possible applications
     • W3C
     • VoiceXML
     • Speech Recognition
     • Call control
     • TTS
     • Voice style sheets
     • Conclusion
  3. WHAT IS A VOICE BROWSER?
     • A voice browser is a software application that presents an interactive voice user interface to the user, in a manner analogous to the way a web browser works.
     • It expands access to the Web.
     • It will allow any telephone to be used to access appropriately designed Web-based services.
  4. WHAT IS A VOICE BROWSER?
     • Server-based; voice portals.
     • Interaction via keypads, spoken commands, and listening to prerecorded speech, synthetic speech and music.
     • An advantage for people with visual impairment.
     • Mobile Web.
  5. WHY A VOICE BROWSER?
     • Using the hands while browsing may be inconvenient or impossible. Voice input is a natural solution for such hands-busy situations.
     • Even in standard browser applications, using voice input is simply more fun than the alternatives.
     • The voice browser replaces the mouse in most instances to enable hands-free browsing.
  6. WHY A VOICE BROWSER?
     • Voice input provides direct "see and say" access to links, eliminating the wrist strain associated with holding a mouse, often for hours at a time.
  7. MOTIVATION
     • Easy to use, even for people with no knowledge of computers or with a fear of them.
     • Voice browsers are the next generation of call centers, which will become voice Web portals to a company's services and related websites, whether accessed via the telephone network or via the Internet.
  8. GRAPHICAL & VOICE BROWSING
     • Graphical browsing is more passive, due to the persistence of the visual information.
     • Voice browsing is more active, since the user has to issue commands.
     • Graphical browsers can be client-based, whereas voice browsers should be server-based.
  9. POSSIBLE APPLICATIONS
     • Accessing business information:
       • The corporate "front desk", which asks callers who or what they want.
       • Automated telephone ordering service.
       • Airline arrival and departure information.
       • Home banking services.
     • Accessing public information:
       • Community information such as weather, traffic conditions, school closures, directions and events.
  10. POSSIBLE APPLICATIONS (CONTD.)
      • Local, national and international news.
      • National and international stock market information.
      • Business and e-commerce transactions.
      • Accessing personal information:
        • Voice mail.
        • Calendars, address and telephone lists.
        • Personal horoscope.
        • Personal newsletter.
        • To-do lists, shopping lists, and calorie counters.
  11. W3C
      • The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding.
      W3C Speech Interface Framework
      • VoiceXML
      • Speech Recognition:
        1. Speech Grammars
        2. Stochastic (N-Gram) Language Models
        3. Semantic Interpretation
        4. Pronunciation Lexicon
      • Call control
  12. VOICEXML
      • VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input.
      • There are other languages: VoXML, omniviewXML.
      [Figure labels: Web Server (text.html, text.vxml), Internet, Browser]
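
To make the idea of a dialog markup language concrete, here is a minimal sketch in Python that holds a small VoiceXML 2.0 document in a string and lists the prompts a voice browser would speak. The form id and prompt text are invented for illustration; only the `vxml`, `form`, `block` and `prompt` elements and the namespace URI come from VoiceXML 2.0. A real voice browser would also run the dialog, recognize input and synthesize audio, none of which is attempted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML 2.0 document; the form id and prompt text are made up.
VXML_DOC = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <block>
      <prompt>Welcome to the flight information service.</prompt>
    </block>
  </form>
</vxml>
"""

# Namespace prefix mapping used only for the ElementTree queries below.
VXML_NS = {"v": "http://www.w3.org/2001/vxml"}


def list_prompts(document):
    """Return the text of every <prompt> element, in document order."""
    root = ET.fromstring(document.strip())
    prompts = []
    for prompt in root.iterfind(".//v:prompt", VXML_NS):
        # Join nested text and collapse runs of whitespace.
        prompts.append(" ".join("".join(prompt.itertext()).split()))
    return prompts


if __name__ == "__main__":
    for text in list_prompts(VXML_DOC):
        print("SAY:", text)
```
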
  14. SPEECH RECOGNITION
      [Figure labels: USER, Touch Tone, Speech, DTMF Grammars, Speech Grammars, Stochastic Language Models, Semantic Interpretation]
  15. DTMF GRAMMARS
      • Touch tone input is often used as an alternative to speech recognition.
      • It is especially useful in noisy conditions or when the social context makes it awkward to speak.
      • The W3C DTMF grammar format allows authors to specify the expected sequence of digits, and to bind them to the appropriate results.
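
The idea of specifying expected digit sequences and binding them to results can be sketched in Python. This is only an analogy, not the W3C DTMF grammar format: the menu entries, the four-digit PIN rule and the result dictionaries are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical rules: each pairs an expected key sequence with a function
# that binds the matched keys to a result.
DTMF_RULES = [
    (re.compile(r"1"), lambda m: {"menu": "arrivals"}),
    (re.compile(r"2"), lambda m: {"menu": "departures"}),
    # Four digits terminated by the pound key, bound to a "pin" result.
    (re.compile(r"(\d{4})#"), lambda m: {"pin": m.group(1)}),
]


def interpret_dtmf(keys: str) -> Optional[dict]:
    """Return the result bound to the key sequence, or None if no rule accepts it."""
    for pattern, bind in DTMF_RULES:
        match = pattern.fullmatch(keys)
        if match:
            return bind(match)
    return None


if __name__ == "__main__":
    print(interpret_dtmf("2"))      # {'menu': 'departures'}
    print(interpret_dtmf("4321#"))  # {'pin': '4321'}
    print(interpret_dtmf("99"))     # None
```
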
  16. SPEECH GRAMMARS
      • Speech grammars allow authors to specify rules covering the sequences of words that users are expected to say in particular contexts.
      • These contextual clues allow the recognition engine to focus on likely utterances, improving the chances of a correct match.
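
As a rough illustration of how a rule set narrows down the expected utterances, the Python sketch below models a grammar as a sequence of slots, each with a few allowed alternatives. The pizza-ordering phrases are invented, and the representation is a simplification chosen for this example rather than the W3C grammar format.

```python
from itertools import product

# Hypothetical grammar: one list of alternatives per slot.
# An empty string marks a slot that may be skipped.
ORDER_GRAMMAR = [
    ["i want", "i would like", ""],
    ["a", "one", "two"],
    ["small", "large"],
    ["pizza", "pizzas"],
]


def accepted_utterances(grammar):
    """Expand the grammar into the full set of word sequences it covers."""
    phrases = set()
    for choice in product(*grammar):
        phrases.add(" ".join(part for part in choice if part))
    return phrases


def matches(utterance, grammar):
    """True if the utterance (case and spacing ignored) is covered by the grammar."""
    normalized = " ".join(utterance.lower().split())
    return normalized in accepted_utterances(grammar)


if __name__ == "__main__":
    print(matches("I want a large pizza", ORDER_GRAMMAR))  # True
    print(matches("How can I help you", ORDER_GRAMMAR))    # False
```

A recognizer restricted to this grammar only has to choose among 36 candidate phrases, which is why a well-chosen grammar improves the odds of a correct match.
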
  17. STOCHASTIC (N-GRAM) LANGUAGE MODELS
      • Speech grammars are of little use for open-ended prompts such as "How can I help you?".
      • The solution is to use a stochastic language model. Such models specify the probability that one word occurs following certain others. The probabilities are computed from a collection of utterances gathered from many users.
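
A minimal sketch of such a model, assuming the simplest case (a bigram model with plain relative-frequency estimates and no smoothing), might look like this in Python. The three training utterances are invented; a real model would be trained on a large collection.

```python
from collections import Counter, defaultdict


def train_bigram(utterances):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        # <s> and </s> mark the start and end of an utterance.
        words = ["<s>"] + utterance.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def probability(counts, prev, word):
    """Estimate P(word | prev) as a relative frequency (0.0 if prev was never seen)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical utterances collected from callers.
    corpus = [
        "i want to check my balance",
        "i want to pay a bill",
        "i need to pay a bill",
    ]
    model = train_bigram(corpus)
    print(probability(model, "want", "to"))  # 1.0
    print(probability(model, "i", "want"))   # 2 of the 3 utterances: about 0.667
```
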
  18. SEMANTIC INTERPRETATION
      • The recognition process matches an utterance to a speech grammar, building a parse tree as a byproduct.
      • There are two approaches to harvesting semantic results from the parse tree:
        1. Annotating grammar rules with semantic interpretation tags.
        2. Representing the result in XML.
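
Both approaches can be illustrated together in a small Python sketch: words are annotated with semantic values, and the collected result is then written out as XML. This is a deliberate simplification. It spots keywords instead of walking a real parse tree, and the slot names and the `result` element are invented for the example, not taken from a W3C format.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations: each recognized word carries a semantic value.
SIZE_TAGS = {"small": "S", "medium": "M", "large": "L"}
TOPPING_TAGS = {"cheese": "cheese", "mushroom": "mushroom", "mushrooms": "mushroom"}


def interpret(utterance):
    """Collect the semantic values attached to the words of an utterance."""
    result = {}
    for word in utterance.lower().split():
        if word in SIZE_TAGS:
            result["size"] = SIZE_TAGS[word]
        elif word in TOPPING_TAGS:
            result["topping"] = TOPPING_TAGS[word]
    return result


def to_xml(result):
    """Represent the semantic result as a small XML fragment."""
    root = ET.Element("result")
    for name, value in sorted(result.items()):
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    meaning = interpret("I want a large pizza with mushrooms")
    print(meaning)          # {'size': 'L', 'topping': 'mushroom'}
    print(to_xml(meaning))  # <result><size>L</size><topping>mushroom</topping></result>
```
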
  19. PRONUNCIATION LEXICON
      • Application developers sometimes need the ability to tune speech engines, whether for synthesis or recognition.
      • W3C is developing a markup language for an open, portable specification of pronunciation information using a standard phonetic alphabet.
      • The most commonly needed pronunciations are for proper nouns such as surnames or business names.
  20. CALL CONTROL
      • Provides fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform.
      • Will enable application developers to use markup to perform call screening, whisper call waiting, call transfer, and more.
      • Can be used to transfer a user from one voice browser to another on a completely different machine.
  21. TEXT TO SPEECH SYNTHESIS
      1. Pre-processing
      2. Text normalization
         i) digit normalization
         ii) date normalization
         iii) abbreviation normalization
      3. Parts of speech annotation
      4. Pronunciation lexicon
      5. Letter to sound rules
      6. Synthesis
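
The text normalization step (step 2) can be sketched in Python as three small passes. The abbreviation table, the assumption that dates are written as DD/MM/YYYY, and the choice to read numbers digit by digit are all simplifications made for this example; a production TTS front end would handle far more cases.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Hypothetical abbreviation table.
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "St.": "Street"}


def normalize_abbreviations(text):
    """Expand each known abbreviation to its full form."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    return text


def normalize_dates(text):
    """Rewrite DD/MM/YYYY (assumed order) as 'D Month YYYY'."""
    def expand(match):
        day, month, year = int(match.group(1)), int(match.group(2)), match.group(3)
        if not 1 <= month <= 12:
            return match.group(0)  # leave anything that is not a valid month alone
        return f"{day} {MONTHS[month - 1]} {year}"
    return re.sub(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b", expand, text)


def normalize_digits(text):
    """Spell out every run of digits one digit at a time."""
    def spell(match):
        return " ".join(DIGIT_WORDS[int(d)] for d in match.group())
    return re.sub(r"\d+", spell, text)


def normalize(text):
    """Apply the passes in order: abbreviations, dates, then digits."""
    return normalize_digits(normalize_dates(normalize_abbreviations(text)))


if __name__ == "__main__":
    print(normalize("Dr. Roy arrives on 05/03/2014 at gate 7."))
    # Doctor Roy arrives on five March two zero one four at gate seven.
```

Dates are expanded before digits so that the month number is turned into a month name instead of being spelled out as a digit.
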
  22. VOICE STYLE SHEETS
      Authors want control over how the document is rendered. Aural style sheets provide a basis for controlling a range of features:
      • Volume
      • Rate
      • Pitch
      • Direction
      • Spelling out text letter by letter
      • Speech fonts (male/female, adult/child etc.)
      • Inserted text before and after element content
      • Sound effects and music
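
The way a style sheet attaches voice properties to elements can be sketched in Python. The property names below are loosely modeled on the features listed above, and the rules for `h1` and `acronym` are hypothetical; this is not the syntax of any W3C style sheet language.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceStyle:
    """A small subset of aural properties, with assumed default values."""
    volume: str = "medium"
    rate: str = "medium"
    pitch: str = "medium"
    voice: str = "female"
    spell_out: bool = False


DEFAULT_STYLE = VoiceStyle()

# Hypothetical style sheet: element name -> properties that override the defaults.
STYLE_SHEET = {
    "h1": {"volume": "loud", "pitch": "low", "voice": "male"},
    "acronym": {"spell_out": True},
}


def style_for(element):
    """Start from the defaults and apply the overrides for this element, if any."""
    return replace(DEFAULT_STYLE, **STYLE_SHEET.get(element, {}))


def render(element, text):
    """Return the text as it would be spoken, together with its voice style."""
    style = style_for(element)
    spoken = " ".join(text) if style.spell_out else text
    return spoken, style


if __name__ == "__main__":
    print(render("acronym", "W3C")[0])  # W 3 C
    print(style_for("h1").volume)       # loud
```
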
  23. CONCLUSION
      • If voice browsers are meant to replace dialog with a human operator, they must respond quickly.
      • Speech recognition, interpretation and synthesis depend on the implementation.
      • When a user requests a certain document, several related documents can be downloaded at the same time for easier access.
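
The last point, downloading related documents along with the requested one, amounts to a prefetching cache. Below is a minimal Python sketch under stated assumptions: the "site" is an in-memory dictionary with made-up document names, the list of related documents is given explicitly, and prefetching happens synchronously where a real voice browser would more likely do it in the background.

```python
# Hypothetical site: document name -> (content, names of related documents).
SITE = {
    "main.vxml": ("Main menu", ["weather.vxml", "news.vxml"]),
    "weather.vxml": ("Weather report", []),
    "news.vxml": ("News headlines", ["main.vxml"]),
}


class PrefetchingCache:
    """Fetch a document and, on the same request, the documents related to it."""

    def __init__(self, site):
        self.site = site
        self.cache = {}
        self.fetch_count = 0  # number of simulated network fetches

    def _fetch(self, name):
        """Stand-in for a slow network download."""
        self.fetch_count += 1
        return self.site[name]

    def get(self, name):
        """Return the document content, prefetching its related documents."""
        if name not in self.cache:
            self.cache[name] = self._fetch(name)
        content, related = self.cache[name]
        for other in related:
            if other not in self.cache:
                self.cache[other] = self._fetch(other)
        return content


if __name__ == "__main__":
    browser_cache = PrefetchingCache(SITE)
    print(browser_cache.get("main.vxml"))     # Main menu
    print(browser_cache.fetch_count)          # 3
    print(browser_cache.get("weather.vxml"))  # Weather report
    print(browser_cache.fetch_count)          # 3
```

The second request is answered from the cache without a new fetch, which is the response-time benefit the slide refers to.
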
  24. REFERENCES
      • www.w3.org/standards/webofdevices/voice
      • www.pcworld.com/article/230305/google
      • www.hwg.org/opcenter/w3c/voicebrowsers.html