The Great Mind Challenge'12
VOICE BASED WEB BROWSER
This document covers the functional and non-functional requirements of the
Voice-Based Web Browser including the physical description of the system as well as the
behavioral and other factors necessary to provide a complete and comprehensive
description of the Voice-Based Web Browser.
Software Requirements Specification
Sri Shakthi Institute of Engineering and Technology
Coimbatore – 641062
Tamil Nadu
Team Members
Sindujaa.R 10IT47
Sowndarya.P 10IT50
Sandeep.N 10IT35
Bharath.A 11ITL61
Project Guide
Ashok kumar.S
Assistant Professor
Department Of IT
TABLE OF CONTENTS
1. Introduction
1.1. Purpose
1.2. Scope
1.3. Definitions
1.4. Overview
2. Overall Description
2.1. Collaboration diagram
2.2. Use case diagram for voice based web browser
2.3. Use case diagram for voice recognizer
2.4. Use case diagram for web browser
2.5. Use case diagram for text to voice converter
3. Specific Requirements
3.1. Basic components of voice based web browser
3.2. Basic components of web browser
4. Conclusion
LIST OF FIGURES
Fig 1: collaboration diagram for voice based web browser
Fig 2: voice based web browser
Fig 3: voice recognizer
Fig 4: Web browser
Fig 5: text to voice converter
Fig 6: components of voice based web browser
Fig 7: Components for browser
1. Introduction
1.1. Purpose
This document details both the functional and non-functional requirements for the voice
based web browser.
This document serves as a contract between the team members of the voice
based web browser to ensure fulfillment of project requirements, and describes the inner
working of the voice recognizer and its interaction with the web browser.
1.2. Scope
This document covers the functional and non-functional requirements of the
Voice Based Web Browser including the physical description of the system as well as the
behavioral and other factors necessary to provide a complete and comprehensive
description of the Voice Based Web Browser.
1.3. Definitions
Term: Description
Speech synthesis: Refers to a computer's ability to produce sound that resembles human
speech. Although they cannot imitate the full spectrum of human cadences and intonations,
speech synthesis systems can read text files and output them in a very intelligible, if
somewhat dull, voice. Many systems even allow the user to choose the type of voice, for
example male or female. Speech synthesis systems are particularly valuable for visually
impaired individuals.
Speech recognition: In computer science, speech recognition (SR) is the translation of
spoken words into text. It is also known as "automatic speech recognition", "ASR",
"computer speech recognition", "speech to text", or just "STT".
1.4. Overview
The Internet has brought about an incredible improvement in human access to
knowledge and information. Voice browsers allow people to access the web using speech
synthesis, pre-recorded audio and speech recognition. This can be supplemented by
keypads and small displays. Voice may also be offered as an adjunct to conventional
desktop browsers with high-resolution graphical displays, providing an accessible
alternative to using the keyboard or screen, for instance in automobiles where hands-free
and eyes-free operation is essential. Voice interaction can escape the physical limitations
of keypads and displays as mobile devices become ever smaller. The browser will have an
integrated text extraction engine that inspects the content of the page to construct a
structured representation. The internal nodes of the structure represent various levels of
abstraction of the content. This allows easy and flexible navigation of the page, so that
the user can rapidly home in on objects of interest. Finally, the browser is integrated with
an automatic Text-To-Speech engine that outputs the selected text in the form of speech.
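The structured representation built by the text extraction engine could look roughly like
the following. This is only a sketch under stated assumptions: the specification does not
say how the structure is built, so the class OutlineBuilder, the function build_outline and
the choice of heading tags (h1 to h3) as internal nodes are hypothetical. It uses Python's
standard html.parser module.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Builds a nested outline from heading and paragraph tags (illustrative only)."""

    # Assumption: headings h1-h3 are the internal nodes of the structure.
    HEADINGS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        # Root node; internal nodes are headings, paragraph texts hang off them.
        self.root = {"title": "page", "level": 0, "children": [], "text": []}
        self._stack = [self.root]
        self._current_tag = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Collect text only while inside a heading or a paragraph.
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return
        text = " ".join("".join(self._buffer).split())
        self._current_tag = None
        if tag == "p":
            if text:
                self._stack[-1]["text"].append(text)
            return
        level = self.HEADINGS[tag]
        # Pop back to the nearest node that is shallower than this heading.
        while self._stack[-1]["level"] >= level:
            self._stack.pop()
        node = {"title": text, "level": level, "children": [], "text": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)


def build_outline(html):
    """Parse an HTML string and return the root of the outline tree."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


sample = ("<h1>News</h1><p>Intro text.</p>"
          "<h2>Sports</h2><p>Match <a href='#'>report</a>.</p>"
          "<h2>Weather</h2><p>Sunny.</p>")
outline = build_outline(sample)
```

With a tree like this, a voice command could move between sibling headings or descend into
one, and only the text attached to the selected node would be passed to the Text-To-Speech
engine.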
2. Overall Description
2.1. Collaboration diagram
1. The user opens the web browser.
2. The browser plays pre-recorded audio asking the user to select an option; the input will
be either a URL or a Search option.
3. The user speaks the option, which reaches the browser through the voice to text converter.
4. The voice to text converter converts the speech into text and sends it to the browser.
5. The browser sends the request to the server.
6. The server sends the result to the browser, which presents it through the text to voice
converter together with selection options such as links and contents.
7. The user communicates with the server by listening and answering.
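The steps above can be sketched as a small simulation. Everything here is an assumption
made for illustration: the converters are stand-ins that work on strings rather than audio,
and the names Session, speak, voice_to_text, fake_server and browse do not come from the
specification.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Records everything the user would hear, so the flow can be inspected."""
    spoken_to_user: list = field(default_factory=list)


def voice_to_text(utterance):
    # Stand-in for a real recognizer: the "audio" here is already a string.
    return utterance.strip().lower()


def speak(session, text):
    # Stand-in for audio output: record the text instead of playing sound.
    session.spoken_to_user.append(text)


def fake_server(request):
    # Hypothetical server that returns a title and a list of links.
    return {"title": "Results for " + request["value"],
            "links": ["link one", "link two"]}


def browse(session, option_utterance, value_utterance, server=fake_server):
    # Step 2: the browser plays the prompt asking for an option.
    speak(session, "Say URL or Search")
    # Steps 3-4: the spoken option is converted to text and checked.
    option = voice_to_text(option_utterance)
    if option not in ("url", "search"):
        speak(session, "Option not recognized")
        return None
    value = voice_to_text(value_utterance)
    # Step 5: the browser sends the request to the server.
    response = server({"option": option, "value": value})
    # Step 6: the result is read back together with its selection options.
    speak(session, response["title"])
    for number, link in enumerate(response["links"], start=1):
        speak(session, f"Option {number}: {link}")
    return response


session = Session()
result = browse(session, " Search ", "Voice Browsers")
```

Passing the server as a parameter keeps the flow testable without a network; a real
implementation would replace the three stand-ins with an actual recognizer, synthesizer
and HTTP client.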
Fig 1: collaboration diagram for voice based web browser
Participants: user, voice to text converter, browser, text to voice converter, server.
Messages: 1. login through voice to text converter; 2. send pre-recorded audio (give option);
3. select option; 4. voice to text; 5. send request; 6. response; 7. text to voice;
8. get voice response.
2.2. Use case diagram for voice based web browser
Text to voice converter: Technology that converts digital text to audible speech. In other
words, it allows a device to talk to the user through its speaker.
Voice input: The control and operation of computer systems by spoken commands. Also, a
peripheral device that accepts data and feeds it into a computer.
Voice output: A signal coming out of a computer that conveys information and meaning and is
useful to people.
Output data: Data generated by a computer is referred to as output. This includes data
produced at a software level, such as the result of a calculation, or at a physical level, such
as a printed document.
Data storage: Storage is frequently used to mean the devices and data connected to the
computer through input/output operations, that is, hard disk and tape systems and other
forms of storage. For an enterprise, the options for this kind of storage are of much greater
variety and expense than those related to memory.
Voice to Text converter: Ability of computer systems to accept speech input and act on
it or transcribe it into written language. Current research efforts are directed towards
applications of automatic speech recognition (ASR), where the goal is to transform the
content of speech into knowledge that forms the basis for linguistic or cognitive tasks, such
as translation into another language.
Database Manager: A database manager links two or more files together and is the
foundation for developing routine business systems. Contrast with a file manager, which
works with only one file at a time and is typically used interactively on a personal
computer for managing personal, independent files, such as name and address lists.
Connector: Used to connect the recognized voice and the web browser.
Browser Engine: Software that renders HTML pages (Web pages). It turns the HTML
layout tags in the page into the appropriate commands for the operating system. Also
called a “Layout Engine”.
Apply Grammar: Applies grammar to the recognized speech. Grammar is the study of
structural relationships in a language, sometimes including pronunciation, meaning, and
linguistic history.
Extension: A filename extension is a suffix (separated from the base filename by a dot)
to the name of a computer file, applied to indicate the encoding (file format) of its content.
Examples of filename extensions are .png, .jpeg, .exe, .dmg and .txt.
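As an illustration of the "find extension" step, the suffix after the last dot can be read
from a file name or from the path part of a URL. This is a minimal sketch using Python's
standard library; the function name find_extension is hypothetical and not part of the
specification.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def find_extension(url_or_name):
    """Return the lowercase extension without the dot, or "" if there is none."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(url_or_name).path
    return PurePosixPath(path).suffix.lower().lstrip(".")


picture_type = find_extension("http://example.com/images/photo.PNG?size=large")
```

Only the last suffix is returned, so "archive.tar.gz" gives "gz". A bare host name without
a scheme, such as "example.com", would be misread as a file name, so a real browser would
need to resolve the URL before applying this step.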
Fig 2: voice based web browser
Elements shown: user, voice input, connector, apply grammar, voice to text converter,
convert into words, find extension, server, browser engine, database manager, data storage,
output data, text to voice converter, voice output.
2.3. Use case diagram for voice recognizer
The voice recognizer converts voice into text. It captures the voice without noise and
applies grammar to check the pronunciation.
Voice: Sound uttered by the mouth, especially that uttered by human beings in speech or
song; sound thus uttered considered as possessing some special quality or character, as in
the human voice, a pleasant voice, a low voice.
Microphone: A device used in sound-reproduction systems for converting sound into
electrical energy, usually by means of a ribbon or diaphragm set into motion by the
sound waves. The vibrations are converted into the equivalent audio-frequency electric
currents. Informal name: mike.
Apply grammar: Applies grammar to the recognized speech. Grammar is the study of
structural relationships in a language, sometimes including pronunciation, meaning, and
linguistic history.
Figure out speech: An expression that uses language in a non-literal way, such as a
metaphor or synecdoche, or in a structured or unusual way, such as anaphora or chiasmus,
or that employs sounds, such as alliteration or assonance, to achieve rhetorical effect.
Voice to text converter: Ability of computer systems to accept speech input and act on
it or transcribe it into written language. Current research efforts are directed toward
applications of automatic speech recognition (ASR), where the goal is to transform the
content of speech into knowledge that forms the basis for linguistic or cognitive tasks, such
as translation into another language.
Speech to word: The faculty or act of speaking. The faculty or act of expressing or
describing thoughts, feelings, or perceptions by the articulation of words.
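One way to read "apply grammar" in a recognizer is as a check of the recognized words
against a small set of accepted command patterns. The sketch below takes that
interpretation as an assumption; the specification itself defines grammar only in the
linguistic sense. The rule names, the patterns and the function apply_grammar are
hypothetical.

```python
import re

# Hypothetical command grammar: each rule is a name and a pattern.
# Longer alternatives come first so "search for" is not split after "search".
GRAMMAR = [
    ("open_url", re.compile(r"^(?:go to|open) (?P<arg>.+)$")),
    ("search", re.compile(r"^(?:search for|search|find) (?P<arg>.+)$")),
    ("read_page", re.compile(r"^read (?:the )?page$")),
]


def apply_grammar(recognized_text):
    """Return (command, argument) for the first matching rule, else None."""
    # Normalize case and collapse repeated whitespace before matching.
    text = " ".join(recognized_text.lower().split())
    for name, pattern in GRAMMAR:
        match = pattern.match(text)
        if match:
            return name, match.groupdict().get("arg")
    return None


command = apply_grammar("Search for voice browsers")
```

Text that matches no rule returns None, which lets the browser ask the user to repeat the
command instead of acting on a misrecognized phrase.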
Fig 3: voice recognizer
Elements shown: user, voice, microphone, transform the digital audio, apply grammar,
figure out the speech, voice to text converter, speech to word.
2.4. Use case diagram for web browser
Input Request: The control and operation of computer systems by spoken commands. Also, a
peripheral device that accepts data and feeds it into a computer.
Use Extension: A filename extension is a suffix (separated from the base filename by a
dot) to the name of a computer file, applied to indicate the encoding (file format) of its
content.
Output data: Data generated by a computer is referred to as output. This includes data
produced at a software level, such as the result of a calculation, or at a physical level,
such as a printed document.
Learn keyboard shortcuts: Reduces the amount of work the user has to do.
Server: A server is a computer program that manages access to a centralized
resource or service in a network.
Stored data: A permanent storehouse of data. The term is often used to lump together the
storage of all types of data structures and is the foundation for developing routine business
systems. Contrast with a file manager, which works with only one file at a time and is
typically used interactively on a personal computer for managing personal, independent
files, such as name and address lists.
Request: A formal message requesting something that is submitted to an authority.
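To make the request use case concrete, the recognized option and text can be turned into
the URL that the browser sends to the server. This is a sketch under assumptions: the
search endpoint below is a placeholder, since the specification does not name one, and
build_request_url is a hypothetical helper.

```python
from urllib.parse import quote_plus, urlparse

# Hypothetical search endpoint; the specification does not name one.
SEARCH_ENDPOINT = "https://search.example/?q="


def build_request_url(option, text):
    """Turn a recognized option ("url" or "search") and text into a URL string."""
    text = text.strip()
    if option == "search":
        # Encode spaces and special characters for use in a query string.
        return SEARCH_ENDPOINT + quote_plus(text)
    if option == "url":
        # Add a scheme when the user only spoke a host name.
        if not urlparse(text).scheme:
            text = "http://" + text
        return text
    raise ValueError(f"unknown option: {option!r}")


request_url = build_request_url("search", "voice based browser")
```

Rejecting unknown options with an error keeps a misrecognized command from being sent to
the server as if it were a valid request.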
Fig 4: Web browser
Elements shown: user, input request, request, use extension, learn keyboard shortcuts,
server, database manager, stored data, output data.
2.5. Use case diagram for text to voice converter
Text to voice: Technology that converts digital text to audible speech. In other words, it
allows a device to talk to the user through its speaker.
Voice Input: The control and operation of computer systems by spoken commands.
Apply grammar: Applies grammar to the text. Grammar is the study of structural
relationships in a language, sometimes including pronunciation, meaning, and linguistic
history.
Fig 5: text to voice converter
Elements shown: user, voice output, apply grammar, text to voice converter, text to voice.
3. Specific requirements
3.1. Basic components of voice based web browser
User: The person who operates the voice based web browser by spoken commands.
Voice input: The control and operation of computer systems by spoken commands.
Microphone: A device used in sound-reproduction systems for converting sound into
electrical energy, usually by means of a ribbon or diaphragm set into motion by the
sound waves. The vibrations are converted into the equivalent audio-frequency electric
currents. Informal name: mike.
Voice to text converter: Ability of computer systems to accept speech input and act on
it or transcribe it into written language. Current research efforts are directed toward
applications of automatic speech recognition (ASR), where the goal is to transform the
content of speech into knowledge that forms the basis for linguistic or cognitive tasks, such
as translation into another language.
Browser: Computer program (such as Internet Explorer or Mozilla Firefox) that enables
Internet users to access, navigate and search World Wide Web sites. Browsers interpret
hypertext links (“hot links”) and allow documents formatted in Hypertext Markup
Language (HTML) to be viewed on the computer screen, and provide many other services
including email and downloading and uploading of data, audio and video files. Also called
web browser.
User request: The request spoken by the user, which the browser sends to the server after
the voice to text converter has turned it into text.
Server: A server is a computerized program that manages access to a centralized
resource or service in a network.
Speaker: The audio output device through which the browser plays pre-recorded audio and
the output of the text to voice converter to the user.
Display: The screen on which the browser can show the page content alongside the voice
output.
Pre-Recorder: Holds audio recorded at one time for playback later, that is, set down or
registered in a permanent form for reproduction.
Text to voice converter: Technology that converts digital text to audible speech. In other
words, it allows a device to talk to the user through its speaker. It is the counterpart of
speech recognition, whose practical applications include database-query systems,
information retrieval systems, and speaker identification and verification systems, as in
telebanking. Speech recognition is also promising in robotics, particularly in the
development of robots that can hear.
Output data: Data generated by a computer is referred to as output. This includes data
produced at a software level, such as the result of a calculation, or at a physical level,
such as a printed document.
Fig 6: components of voice based web browser
Elements shown: user, server, voice input, microphone, voice to text converter, browser,
speaker, display, pre-recorder, text to voice converter. Labelled flows: user request,
response.
3.2. Basic components of web browser
User interface: A user interface is the system by which people (users) interact with a
machine. The user interface includes hardware (physical) and software (logical)
components. User interfaces exist for various systems, and provide a means of:
1. Input, allowing the users to manipulate a system.
2. Output, allowing the system to indicate the effects of the users' manipulation.
Browser engine: Software that renders HTML pages (web pages). It turns the HTML
layout tags in the page into the appropriate commands for the operating system. Also
called a “Layout Engine”.
Rendering Engine: A rendering engine is used by web browsers to render HTML
pages, by mail programs to render HTML e-mail messages, and by any other
applications that need to render web page content.
Data Store: A data store is platform-independent and host-independent. Therefore, data
stores do not change when the virtual machines they contain are moved between hosts.
The scope of a data store is a data center; the data store is uniquely named within the
data center.
Networking: Working together, collaborative work, exchange of information, sharing of
knowledge and capabilities, integration, a bidirectional stream of valuable updated
information.
JavaScript Interpreter: A JavaScript interpreter is specialized computer software which
interprets and executes JavaScript (also known as ECMAScript). Although there are
several uses for a JavaScript engine, it is most commonly used in web browsers.
Back end: Back end generally refers to a place that a typical end user cannot access
(whether that be a portion of the application, or the code of the application itself). Beyond
that, it depends on the context.
Fig 7: Components for browser
Elements shown: user interface, browser engine, rendering engine, networking, JavaScript
interpreter, UI back end, data store.
4. Conclusion
Speech or voice recognition is a process that allows the elements of speech to be
recognized and analyzed so that the message of that speech can be transposed into a
meaningful form. The speech is sent to the browser, which converts it into text.
The text is sent to the server, which processes it to produce the desired result. The text
result is then converted back into voice.
Technology makes computers accessible to people who cannot see, or who have
trouble seeing the keyboard and monitor. Such technology includes solutions that enable
computers to talk, scan and read documents, and make on-screen items bigger and easier
to see, as well as Braille and magnification devices and players for audio books in
special formats.