White paper on Machine Learning
(Speech to Text Conversion using Android Platform)
Group 10
Apurva Mittal (20141009)
Ketan Gyanchandani (20141028)
Riya Giri (20141058)
Sanjeev Kumar (20141063)
Saurabh Ojha (20141064)
Vikash Kumar (20141072)
Introduction:
Machine learning is a type of artificial intelligence (AI) that provides computers with the
ability to learn without being explicitly programmed. Machine learning focuses on the
development of computer programs that can teach themselves to grow and change when
exposed to new data. The process of machine learning is similar to that of data mining. Both
systems search through data to look for patterns. However, instead of extracting data for
human comprehension, machine learning uses that data to improve the program's own
understanding. Machine learning programs detect patterns in data and adjust program actions
accordingly. For example, Facebook's News Feed changes according to the user's personal
interactions with other users. If a user frequently tags a friend in photos, writes on his wall or
"likes" his links, the News Feed will show more of that friend's activity in the user's News
Feed due to presumed closeness.
Essentially, it is a method of teaching computers to make and improve predictions or
behaviours based on some data. What is this "data"? Well, that depends entirely on the
problem. It could be readings from a robot's sensors as it learns to walk, or the correct output
of a program for certain input. Another way to think about machine learning is that it is
"pattern recognition" - the act of teaching a program to react to or recognize patterns.
Historically, speech has been used relatively little as an input to electronic devices and computers, because speech signals and sounds are complex and highly variable. Modern signal-processing methods and statistical algorithms, however, make it practical to process speech signals and recognize the words they contain. Speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT).
Background:
For the past several decades, designers have processed speech for a wide variety of
applications ranging from mobile communications to automatic reading machines.
Speech recognition is usually processed in middleware; the results are transmitted to the
user applications.
On the Android platform, speech recognition is done over the Internet by connecting to Google's servers. The speech recognition used here relies on algorithms based on hidden Markov models (HMMs) and an n-gram language model, which at the time of writing was widely regarded as the most successful and most flexible approach to speech recognition. This application is adapted to input messages in English.
A hidden Markov model (HMM) is a statistical Markov model in which the system being
modelled is assumed to be a Markov process with unobserved (hidden) states.
Markov Model:
In simple Markov models the state is directly visible or known to the observer, and therefore
the state transition probabilities are the only parameters.
Consider a four-state Markov model of the weather. Suppose that once a day (e.g. in the morning) the weather is observed to be in one of the following states, with the state transition probabilities shown in the figure:
• State 1: cloudy
• State 2: sunny
• State 3: rainy
• State 4: windy
Fig. Markov Model for weather
From the figure, the pattern of weather over a period of time can be predicted, since the initial state probabilities and the state transition probabilities are known. For example, the probability of observing the sequence “sunny, rainy, sunny, cloudy, cloudy” (states 2, 3, 2, 1, 1) is given by:
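$$P(O \mid A, \pi) = \pi_2 \, a_{23} \, a_{32} \, a_{21} \, a_{11}$$
where
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$
is the state transition probability matrix ($a_{ij}$ is the probability of moving from state $i$ to state $j$), and
$$\pi = (\pi_1, \pi_2, \pi_3, \pi_4)$$
is the initial state probability vector.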
Thus, in a simple Markov model, the probability of any sequence of events can be computed directly from the initial state probabilities and the state transition probabilities.
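The sequence probability above is simply one initial probability multiplied by the successive transition probabilities. The Python sketch below computes it for any state sequence. The values in A and the uniform initial vector PI are hypothetical, chosen only so that every row sums to 1, because the probabilities in the original figure are not available.

```python
# Hypothetical four-state weather model; the figure's real probabilities are not available.
STATES = ["cloudy", "sunny", "rainy", "windy"]  # states 1..4 in the text

# A[i][j] = P(tomorrow is state j | today is state i); every row sums to 1.
A = [
    [0.4, 0.3, 0.2, 0.1],  # from cloudy
    [0.2, 0.5, 0.2, 0.1],  # from sunny
    [0.3, 0.2, 0.4, 0.1],  # from rainy
    [0.3, 0.3, 0.1, 0.3],  # from windy
]
PI = [0.25, 0.25, 0.25, 0.25]  # hypothetical uniform initial state probabilities


def sequence_probability(sequence, A, pi, states):
    """P(sequence) for an observable Markov chain: pi[first state] times each transition."""
    idx = [states.index(s) for s in sequence]
    p = pi[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= A[prev][cur]
    return p


p = sequence_probability(["sunny", "rainy", "sunny", "cloudy", "cloudy"], A, PI, STATES)
print(p)  # pi_2 * a_23 * a_32 * a_21 * a_11 with the values above
```

Because every state is observable, no summation over alternative paths is needed; that changes once the states become hidden.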
Hidden Markov Model:
The Markov model used in speech-to-text systems is the hidden Markov model. It is called "hidden" because the state is not directly visible. Although the state cannot be observed, the output, which depends on the state, is visible: each state has a probability distribution over the possible outputs. The sequence of outputs generated by an HMM therefore gives some information about the sequence of states. Note that "hidden" refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still "hidden".
Hidden Markov models have application in temporal pattern recognition such as speech,
handwriting, gesture recognition, part-of-speech tagging, musical score following, and
bioinformatics.
We can understand the HMM better by looking at the Urn & Ball Model.
Urn & Ball Model: Hidden Markov Model
Consider N large glass urns in a room. Each urn contains a definite number of coloured balls, and the set of N urns contains balls of 6 colours (R = red, O = orange, K = black, G = green, B = blue, P = purple).
The person in the room chooses an urn in that room and randomly draws a ball from that urn.
He then puts the ball on a conveyor belt, where the observer can observe the sequence of the
balls but not the sequence of urns from which they were drawn. The person in the room has
some procedure to choose urns; the choice of the urn for the n-th ball depends only upon a
random number and the choice of the urn for the (n − 1)-th ball. The choice of urn does not
directly depend on the urns chosen before the single previous urn; therefore, this is called a
Markov process. The Markov process itself cannot be observed, and only the sequence of
labelled balls can be observed, thus this process is called Hidden Markov Process.
Fig: Urn & Ball Model
Although the observer does not know the sequence in which the urns were chosen, he knows the probability of drawing each colour of ball from each urn. The observer can therefore estimate the probability that a particular ball was drawn from a particular urn by weighing the probabilities of the possible sequences of urn choices.
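Summing over every possible urn sequence grows exponentially with the number of balls. A standard way to compute the same quantity efficiently is the forward algorithm; the original text does not describe it, so treat the sketch below as an illustration only. It assumes a hypothetical model with three urns and three of the six colours, with made-up start, transition and colour probabilities, and checks the forward result against brute-force enumeration of all urn sequences.

```python
from itertools import product

# Hypothetical model: 3 urns and 3 of the 6 ball colours; all numbers are made up.
URNS = [0, 1, 2]
COLOURS = ["red", "green", "blue"]
PI = [0.5, 0.3, 0.2]                  # P(first ball comes from urn i)
A = [[0.6, 0.3, 0.1],                 # A[i][j] = P(next urn is j | current urn is i)
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
B = [{"red": 0.7, "green": 0.2, "blue": 0.1},   # B[i][c] = P(ball colour c | urn i)
     {"red": 0.1, "green": 0.8, "blue": 0.1},
     {"red": 0.2, "green": 0.2, "blue": 0.6}]


def forward(balls):
    """P(observed colour sequence), summed over all hidden urn sequences in O(N^2 T) time."""
    alpha = [PI[i] * B[i][balls[0]] for i in URNS]
    for colour in balls[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in URNS) * B[j][colour] for j in URNS]
    return sum(alpha)


def brute_force(balls):
    """The same probability by enumerating every urn sequence explicitly (N^T terms)."""
    total = 0.0
    for urns in product(URNS, repeat=len(balls)):
        p = PI[urns[0]] * B[urns[0]][balls[0]]
        for t in range(1, len(balls)):
            p *= A[urns[t - 1]][urns[t]] * B[urns[t]][balls[t]]
        total += p
    return total


observed = ["red", "green", "green", "blue"]  # what the observer sees on the conveyor belt
print(forward(observed), brute_force(observed))
```

The two functions agree; the forward version reuses partial sums instead of re-walking every path, which is what makes HMMs practical for long observation sequences such as speech frames.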
N-gram language model: As our high school grammar teachers told us, not every random combination of words forms a grammatically acceptable sentence:
• Colourless green ideas sleep furiously
• Furiously sleep ideas green colourless
• Ideas furiously colourless sleep green
The sentence Colourless green ideas sleep furiously (made famous by the linguist Noam
Chomsky), for instance, is grammatically perfectly acceptable, but of course entirely
nonsensical. If you compare this sentence to the other two sentences, this grammaticality
becomes evident. The sentence Furiously sleep ideas green colourless is grammatically
unacceptable, and so is Ideas furiously colourless sleep green: these sentences do not play by
the rules of the English language. In other words, the fact that languages have rules
constrains the way in which words can be combined into an acceptable sentence.
Language plays by rules, and computers work with rules. Inferring a set of rules gives us a language model: a model that describes how a language, say English, works and behaves. The rules by which a language plays are very complex, and no full set of rules describing a language has ever been proposed. There are simpler ways to obtain a language model, namely by exploiting the observation that words do not combine in a random order. That is, we can learn a lot from a word and its neighbours. Language models that exploit the ordering of words are called n-gram language models, in which n represents any integer greater than zero.
N-gram models can be imagined as placing a small window over a sentence or a text, in which only n words are visible at the same time. The simplest n-gram model is therefore the so-called unigram model, in which we look at only one word at a time. The sentence Colourless green ideas sleep furiously, for instance, contains five unigrams: “colourless”, “green”, “ideas”, “sleep”, and “furiously”. Of course, this is not very informative, as these are just the words that form the sentence. N-grams start to become interesting when n is two (a bigram) or greater.
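The sliding window described above can be sketched as a small Python function in which the window size n is a parameter. The function name ngrams is illustrative and not taken from any particular library.

```python
def ngrams(words, n):
    """Slide a window of n words over the sentence and return each window as a tuple."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "Colourless green ideas sleep furiously".lower().split()
print(ngrams(sentence, 1))  # the five unigrams
print(ngrams(sentence, 2))  # four bigrams, starting with ('colourless', 'green')
```

A window longer than the sentence simply yields no n-grams.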
The same window idea extends from bigrams to n-grams of any specified length: rather than always taking two adjacent words, we make the number of words to take a parameter. When n-grams are used for language modelling, an independence assumption is made so that each word depends only on the previous n-1 words. This Markov model is used as an approximation of the true underlying language. The assumption is important because it massively simplifies the problem of learning the language model from data. In addition, because language is open-ended, it is common to group words that are unknown to the language model together (for example, under a single unknown-word token). In a simple n-gram language model, the probability of a word conditioned on some number of previous words (one word in a bigram model, two words in a trigram model, etc.) can be described as following a categorical distribution (often imprecisely called a "multinomial distribution"). In essence, an n-gram model predicts the likelihood of the next item, which may be a word or, in a character-level model, a letter. From training data, one can derive the probability distribution of the next letter given a history of the previous n-1 letters, for example a = 0.4, b = 0.0004, c = 0, ..., where the probabilities of all possible “next letters” sum to 1.0.
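As a minimal sketch, assuming a hypothetical three-sentence corpus and a word-level (rather than letter-level) bigram model estimated by plain counting with no smoothing, the following shows the categorical next-word distribution and the grouping of rare words under one unknown token. The threshold min_count=2 and the token names <s>, </s> and <unk> are assumptions made for illustration.

```python
from collections import Counter, defaultdict

UNK, BOS, EOS = "<unk>", "<s>", "</s>"  # illustrative special tokens


def train_bigram(sentences, min_count=2):
    """Maximum-likelihood bigram counts; words seen fewer than min_count times become <unk>."""
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(w for words in tokenized for w in words)
    vocab = {w for w, c in freq.items() if c >= min_count}
    counts = defaultdict(Counter)  # counts[prev][nxt] = times nxt followed prev
    for words in tokenized:
        padded = [BOS] + [w if w in vocab else UNK for w in words] + [EOS]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return vocab, counts


def next_word_distribution(model, prev):
    """Categorical distribution P(next | prev); its values sum to 1 for any seen history."""
    vocab, counts = model
    prev = prev.lower()
    if prev not in vocab and prev != BOS:
        prev = UNK  # unknown history words share the <unk> statistics
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}


model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])  # hypothetical corpus
print(next_word_distribution(model, "the"))
print(next_word_distribution(model, "dog"))  # "dog" is rare, so <unk> statistics are used
```

Without smoothing, any bigram absent from the training data gets probability zero; real systems redistribute some probability mass to unseen events.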
Problems: the following problems are faced in today’s world.
Hands-free computing:
Many people today prefer speaking to writing. There are several reasons for this: lack of time, the wish to multitask efficiently, and the convenience of dividing tasks without hassle. They have many things to do and very little time, so they need an interface they can simply talk to in order to make their daily tasks easier. With the help of this application, anyone can speak, and the speech is immediately converted into text without any extra effort.
Education and daily life:
Today’s tech-savvy young people want to get their hands on anything and everything. They are involved in so many activities that they can hardly spare the time to write their projects and assignments, and this heavy load of activities is taking a toll on their health. This application will help them with their assignments: they only have to dictate, and the application will provide them with the written document. Young people also need to learn new languages to connect with the outside world, and to work on their pronunciation too. With a multiple-language option, such an application could help people learn different languages without a tutor and without travelling to the region where the language is spoken.
In day-to-day life, where text messages are an integral part of our lives, this application will help everyone type text messages on the go while doing other tasks. For example, anyone can simply speak an urgent message to be sent while driving.
Blindness and education:
Some people are unable to write, whether because of blindness (complete or partial) or for other reasons. Students who are physically disabled or who suffer from repetitive strain injury or other injuries to the upper extremities have to worry about handwriting, typing, or working with a scribe on school assignments. They all need an interface that listens to them, performs their task, and helps them connect to the outside world instantly.
For people who cannot read or write, it is very difficult to use the texting application on a phone. This application will help them in such situations by providing a suitable platform.
Solution:
Speech recognition will be very helpful to such people. They will be able to take notes on anything and everything and send messages over long distances in one go. Students who are blind or have very low vision can use the technology to dictate words and then hear the application read them back, and can operate a computer with voice commands instead of having to look at the screen and keyboard. Speech recognition can also be useful for learning a second language: it can help teach proper pronunciation and help a person develop fluency in speaking. Because many young people today prefer speaking to writing, speech to text is a natural way for such tech-savvy users to learn, share information, and take notes on the go.
ANDROID Platform as a way out of this problem:
Android is a software environment for mobile devices that includes an operating system,
middleware and key applications.
The Android operating system (OS) architecture is divided into five layers. The application layer is visible to the end user and consists of user applications: the basic applications that come with the operating system and the applications the user installs later. Applications are written in the Java programming language. The application framework is an extensible set of software components used by all applications in the operating system. The next layer contains the libraries, written in the C and C++ programming languages, which applications access through the framework. The Dalvik Virtual Machine (DVM) forms the main part of the runtime environment; it runs the core libraries, which are written in the Java programming language.
Fig. Android Architecture
Unlike the Java virtual machine, which is stack-based, the DVM is register-based and designed for mobile devices. The last layer of the Android architecture is a kernel based on Linux, which serves as a hardware abstraction layer. The main reasons for using it are its memory and process management, its security model, its networking stack, and the fact that it is under constant development. Android applications are built from a few basic building blocks: activities, services and content providers, which are application components, and intents, the messages used to start them. An activity is the main element of every application; in simplified terms, it is a window that users see on their mobile device. An application can have one or more activities, and the main activity is the one launched at startup. A transition between activities happens when the running activity launches a new one. Each activity is a separate component implemented as a subclass of the Activity class. While an application runs, launched activities are pushed onto a stack, and the currently running activity is on top of the stack.
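The stack behaviour of activities can be pictured with a toy Python model. This is not Android code: on a device the system manages the back stack itself. The class BackStack and the activity name SmsActivity are hypothetical; VoiceRecognitionActivity follows the Voice Recognition Activity class described later in this paper.

```python
class BackStack:
    """Toy model of an activity stack (illustration only, not the Android API)."""

    def __init__(self, main_activity):
        self._stack = [main_activity]  # the launcher (main) activity is started first

    def start(self, activity):
        """Launching a new activity pushes it on top; it becomes the running activity."""
        self._stack.append(activity)

    def back(self):
        """Finishing the top activity pops it; returns the newly visible one, or None."""
        if self._stack:
            self._stack.pop()
        return self.current()

    def current(self):
        return self._stack[-1] if self._stack else None


stack = BackStack("VoiceRecognitionActivity")
stack.start("SmsActivity")   # hypothetical second activity
print(stack.current())       # SmsActivity
print(stack.back())          # VoiceRecognitionActivity
```

Returning None when the stack empties stands in for the application closing after its last activity finishes.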
An intent is a message used to start activities or services, or to deliver a broadcast to broadcast receivers. An intent can contain the name of the component to start, the action to perform, the address (URI) of the data the component needs, and a category describing the kind of component that should handle it. A service is a component that runs in the background to perform long-running operations or to perform work for remote processes. Several applications can bind to the same service at once, and a bound service keeps running until all of them have unbound from it. A content provider manages a shared set of application data. Data can be stored in the file system, an SQLite database, on the web, or in any other persistent storage location the application can access [1]. Through the content provider, other applications can query or even modify the data (if the content provider allows it).
Speech Recognition:
Speech recognition for this application is done on Google's servers, using HMM and n-gram models. The system can be divided into several blocks: feature extraction, an acoustic model database built from training data, a dictionary, a language model, and the speech recognition (decoding) algorithm.
The input audio waveform from a microphone is converted into a sequence of fixed-size acoustic vectors $Y_{1:T} = y_1, \dots, y_T$ in a process called feature extraction. The decoder then attempts to find the sequence of words $w_{1:L} = w_1, \dots, w_L$ that is most likely to have generated $Y$, i.e. the decoder tries to find
$$\hat{w} = \arg\max_{w} \{P(w \mid Y)\}$$
However, since $P(w \mid Y)$ is difficult to model directly, Bayes' rule, $P(w \mid Y) = p(Y \mid w)\,P(w) / p(Y)$, is used to transform it into an equivalent problem. Because $p(Y)$ does not depend on $w$, it can be dropped from the maximization, leaving the problem of finding:
$$\hat{w} = \arg\max_{w} \{p(Y \mid w)\, P(w)\}$$
The likelihood $p(Y \mid w)$ is determined by an acoustic model and the prior $P(w)$ is determined by a language model.
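The decision rule above can be illustrated with made-up scores for three candidate transcriptions of the same audio. Scores are kept as log-probabilities, so the product p(Y|w)P(w) becomes a sum, which avoids numerical underflow. All numbers are hypothetical; they are chosen to show how the language-model prior can overturn the choice that the acoustic score alone would make.

```python
# Hypothetical scores for three candidate transcriptions w of one utterance Y.
# log_acoustic = log p(Y | w) from an acoustic model; log_lm = log P(w) from a language model.
CANDIDATES = {
    "recognize speech":   {"log_acoustic": -12.0, "log_lm": -6.0},
    "wreck a nice beach": {"log_acoustic": -11.5, "log_lm": -10.0},
    "recognize peach":    {"log_acoustic": -12.5, "log_lm": -9.0},
}


def decode(candidates):
    """argmax_w p(Y|w) P(w), computed as argmax_w [log p(Y|w) + log P(w)]."""
    return max(candidates, key=lambda w: candidates[w]["log_acoustic"] + candidates[w]["log_lm"])


def acoustic_only(candidates):
    """The decision that would be made if the prior P(w) were ignored."""
    return max(candidates, key=lambda w: candidates[w]["log_acoustic"])


print(decode(CANDIDATES))         # recognize speech
print(acoustic_only(CANDIDATES))  # wreck a nice beach
```

A real decoder cannot list every candidate like this; it searches the space of word sequences, as described next.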
The basic unit of sound represented by the acoustic model is the phone. For example, the
word “bat” is composed of three phones /b/ /ae/ /t/. About 40 such phones are required for
English.
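Because a word's acoustic model is assembled from its phones, a recognizer needs a pronunciation dictionary that maps each word to a phone sequence. The sketch below shows only that lookup-and-concatenate step, using the /b/ /ae/ /t/ example. The other entries and the lowercase ARPAbet-style symbols are assumptions; in a real system each symbol would stand for a trained phone HMM rather than a string.

```python
# Hypothetical mini pronunciation dictionary; only "bat" comes from the text.
LEXICON = {
    "bat": ["b", "ae", "t"],
    "cat": ["k", "ae", "t"],
    "a": ["ah"],
}


def phone_sequence(words, lexicon=LEXICON):
    """Concatenate each word's phones in order, as when word models are built from phone models."""
    phones = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"{word!r} is not in the pronunciation dictionary")
        phones.extend(lexicon[word])
    return phones


print(phone_sequence(["a", "bat"]))  # ['ah', 'b', 'ae', 't']
```

Words missing from the dictionary cannot be recognized at all, which is why real lexicons are large and often supplemented by automatic pronunciation rules.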
For any given w, the corresponding acoustic model is synthesized by concatenating phone
models to make words as defined by a pronunciation dictionary. The parameters of these
phone models are estimated from training data consisting of speech waveforms and their
orthographic transcriptions. The language model is typically an N-gram model in which the
probability of each word is conditioned only on its N-1 predecessors. The N-gram parameters
are estimated by counting N-tuples in appropriate text corpora (large collections of text). The decoder
operates by searching through all possible word sequences using pruning to remove unlikely
hypotheses thereby keeping the search tractable. When the end of the utterance is reached, the
most likely word sequence is output. Alternatively, modern decoders can generate lattices
containing a compact representation of the most likely hypotheses.
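The pruned search can be sketched as a beam search. To keep it short, this sketch assumes the audio has already been cut into one segment per word, which real decoders do not know in advance, and uses hypothetical acoustic log-likelihoods and bigram log-probabilities; the constant UNSEEN is a crude floor, not proper smoothing. Only the best beam_width partial hypotheses survive each step, and an exhaustive search is included to check the result on this tiny example.

```python
import math
from itertools import product

# Hypothetical acoustic scores log p(segment | word) for three pre-segmented chunks of audio.
ACOUSTIC = [
    {"i": -0.5, "eye": -0.6, "a": -2.0},
    {"need": -0.9, "knead": -0.8, "kneed": -1.5},
    {"help": -0.4, "kelp": -0.7},
]
# Hypothetical bigram log-probabilities log P(word | previous word); "<s>" marks the start.
BIGRAM = {
    ("<s>", "i"): math.log(0.5), ("<s>", "eye"): math.log(0.05), ("<s>", "a"): math.log(0.3),
    ("i", "need"): math.log(0.2), ("i", "knead"): math.log(0.001),
    ("eye", "need"): math.log(0.01),
    ("need", "help"): math.log(0.3), ("knead", "kelp"): math.log(0.01),
}
UNSEEN = math.log(1e-5)  # crude floor for unseen bigrams (an assumption)


def score(words):
    """Total log score: acoustic log-likelihoods plus bigram log-probabilities."""
    total, prev = 0.0, "<s>"
    for segment, word in zip(ACOUSTIC, words):
        total += segment[word] + BIGRAM.get((prev, word), UNSEEN)
        prev = word
    return total


def beam_search(beam_width=2):
    """Extend hypotheses one segment at a time, keeping only the best beam_width."""
    beams = [((), 0.0)]
    for segment in ACOUSTIC:
        expanded = []
        for words, s in beams:
            prev = words[-1] if words else "<s>"
            for word, ac in segment.items():
                expanded.append((words + (word,), s + ac + BIGRAM.get((prev, word), UNSEEN)))
        expanded.sort(key=lambda h: h[1], reverse=True)
        beams = expanded[:beam_width]  # pruning: discard unlikely hypotheses
    return beams[0]


def exhaustive_search():
    """Score every possible word sequence (exponential; used here only as a check)."""
    return max(product(*[list(seg) for seg in ACOUSTIC]), key=score)


print(beam_search(beam_width=2))
print(exhaustive_search())
```

Here the acoustic score alone prefers "knead" over "need"; the bigram prior reverses that, and pruning to two hypotheses still finds the same answer as the exhaustive search.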
MAIN PARTS OF THE PROJECT:
A. Voice Recognition Activity class: the Voice Recognition Activity is the startup activity, declared as the launcher in the AndroidManifest.xml file.
This is where most of the initialization is done, including programmatic interaction with the widgets in the user interface. The activity also checks whether the mobile phone on which the application is installed supports speech recognition. If the device does not have one of Google's applications that integrate speech recognition, further use of the Voice SMS application is disabled and the message "Recognizer not present" is shown on the screen. The recognition itself is performed by one of Google's speech recognition applications. If a recognition activity is present, the user can start speech recognition by pressing the button, which calls startActivityForResult(Intent intent, int requestCode). The application uses startActivityForResult() to send an intent that requests voice recognition, including an extra parameter that specifies one of two language models (Android's RecognizerIntent.EXTRA_LANGUAGE_MODEL, with the values LANGUAGE_MODEL_FREE_FORM and LANGUAGE_MODEL_WEB_SEARCH).
Fig. Enables search after clicking image button
Fig. Processes and gives text output
B. SMS class: this class provides the interface for sending an SMS. The text is entered in the message field and displayed on the screen. When the Send SMS button is clicked, the application checks that both the message and the recipient's number have been entered before sending the message. When the cursor is placed in the recipient-number field, the visibility attribute of a contacts button changes from its default, gone, to visible. Pressing this button lets the user choose a number from the contacts; after the desired contact is selected, the message can be sent.
Fig. Interface for sending SMS
C. XML files: the application has two user interfaces. The screen shown when the user starts the application is defined in voice_recognition.xml. Its linear layout places widgets one below another. Width and height are set with the fill_parent attribute, which makes them equal to those of the parent (in this case, the screen). The second interface, defined in the sms.xml file, is displayed when the user chooses one of the offered messages.
The AndroidManifest.xml file provides the information needed to install and launch the application on the mobile device.
Economic feasibility:
As far as the economic feasibility of this project is concerned, it would be very cost-efficient for any company to incorporate this application into its own projects. The following features show its economic feasibility:
• The operating system used for development is free of cost, and so is the Eclipse IDE used for application development.
• The operating system can be used and adapted free of charge by manufacturers of mobile devices.
• Basic core applications and additional applications have equal access to resources.
• Memory use is optimized, and running applications are managed automatically.
• Development tools and a rich set of software libraries make application development quick and easy.
• Audiovisual content can be of high quality: vector graphics and most audio and video formats are supported.
• Applications can be tested on most computing platforms, including Windows and Linux.
This saves both time and money.
Conclusion:
A speech recognizer’s effectiveness depends on how accurately and how quickly it converts speech into text. STT software generally uses only one type of model. Using only one type of algorithm does serve the purpose of converting speech to text, but often results in lower quality and a lower recognition rate. Our application combines two types of model, namely the hidden Markov model and the n-gram language model, which are among the most successful approaches used in STT software so far.
Future developments:
Existing speech-to-text conversion software available online converts speech in a particular language into text in that same language only. Moreover, the few applications that do offer a multi-language translation feature often confuse users by mixing up words and frequently fail to give the desired output.
With the growing software and hardware capabilities of mobile devices, there is an increased need for device-specific content, which has changed the market. In future, we plan to develop this software so that speech can be entered in multiple languages and converted effectively into text in multiple languages, which could lay the foundation for everyday use of this technology worldwide. The user will first choose the language he wishes to speak, which will be matched against an existing database consisting of a dictionary, and will then choose the language into which the text should be converted. The system will convert the speech into the required language and display the desired output. We will focus on the various languages spoken in India, making the application one of a kind.
Management control system @ germi
 
Machine learning
Machine learningMachine learning
Machine learning
 
Legal Aspects of Business
Legal Aspects of BusinessLegal Aspects of Business
Legal Aspects of Business
 
GLOBAL PROCUREMENT DATA WAREHOUSE ENHANCES SOURCING INTELLIGENCE
GLOBAL PROCUREMENT DATA WAREHOUSE ENHANCES SOURCING INTELLIGENCEGLOBAL PROCUREMENT DATA WAREHOUSE ENHANCES SOURCING INTELLIGENCE
GLOBAL PROCUREMENT DATA WAREHOUSE ENHANCES SOURCING INTELLIGENCE
 
Developing and Executing an Effective Marketing and Media Outreach Strategy...
Developing and Executing an Effective Marketing and Media Outreach Strategy...Developing and Executing an Effective Marketing and Media Outreach Strategy...
Developing and Executing an Effective Marketing and Media Outreach Strategy...
 
Key performance indicators
Key performance indicators  Key performance indicators
Key performance indicators
 

Recently uploaded

Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 

Recently uploaded (20)

Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 

Machine Learning White Paper on Speech to Text Conversion

White paper on Machine Learning
(Speech to Text Conversion using Android Platform)
Group 10
Apurva Mittal (20141009)
Ketan Gyanchandani (20141028)
Riya Giri (20141058)
Sanjeev Kumar (20141063)
Saurabh Ojha (20141064)
Vikash Kumar (20141072)
Introduction: Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. It focuses on developing computer programs that can teach themselves to grow and change when exposed to new data. The process of machine learning is similar to that of data mining: both search through data to look for patterns. However, instead of extracting data for human comprehension, machine learning uses that data to improve the program's own understanding. Machine learning programs detect patterns in data and adjust their actions accordingly. For example, Facebook's News Feed changes according to the user's interactions with other users. If a user frequently tags a friend in photos, writes on that friend's wall or "likes" that friend's links, the News Feed will show more of that friend's activity because of the presumed closeness.

Essentially, machine learning is a method of teaching computers to make and improve predictions or behaviours based on data. What is this "data"? That depends entirely on the problem. It could be readings from a robot's sensors as it learns to walk, or the correct output of a program for a given input. Another way to think about machine learning is as "pattern recognition": teaching a program to recognize and react to patterns.

Speech has not been used much as an input in electronics and computing because of the complexity and variety of speech signals and sounds. However, modern processing techniques, algorithms and methods make it practical to process speech signals and recognize the text they contain. Speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or simply "speech to text" (STT).
Background: For the past several decades, designers have processed speech for a wide variety of applications, ranging from mobile communications to automatic reading machines.
Speech recognition is usually performed in middleware, and the results are passed on to user applications. On the Android platform, speech recognition is done over the Internet by connecting to Google's servers. Speech recognition for Voice uses algorithms based on hidden Markov models (HMMs) and an N-gram language model, which at the time of writing was the most successful and most flexible approach to speech recognition. This application is adapted to input messages in English.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states.

Markov Model: In a simple Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters. Consider a four-state Markov model of the weather. Suppose the weather is observed once a day (e.g. in the morning) and is found to be in one of the following states, with the state transition probabilities shown in the figure:
• State 1: cloudy
• State 2: sunny
• State 3: rainy
• State 4: windy
Fig. Markov Model for weather

From the figure, the weather pattern over a period of time can be predicted, because the initial state probabilities and the probabilities of moving between states are known. For example, the probability of observing the sequence O = "sunny, rainy, sunny, cloudy, cloudy", i.e. states 2, 3, 2, 1, 1, is

$$P(O \mid A, \pi) = \pi_2 \, a_{23} \, a_{32} \, a_{21} \, a_{11}$$

where

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$

is the state transition probability matrix, with $a_{ij}$ the probability of moving from state $i$ to state $j$, and

$$\pi = (\pi_1, \pi_2, \pi_3, \pi_4)$$

is the vector of initial state probabilities. Thus, in a simple Markov model, the probability of a sequence of events can be computed directly once the initial state probabilities and the transition probabilities are known.

Hidden Markov Model: The Markov model used in speech-to-text systems is the hidden Markov model. It is called "hidden" because the state is not directly visible; only the output, which depends on the state, can be seen. Each state has a probability distribution over the possible outputs, so the sequence of outputs generated by an HMM gives some information about the sequence of states. The word "hidden" refers to the state sequence through which the model passes, not to the parameters of the model: even if the model parameters are known exactly, the model is still "hidden". Hidden Markov models are applied in temporal pattern recognition, such as speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, and bioinformatics. The urn-and-ball model gives a better understanding of the HMM.

Urn & Ball Model: Consider N large glass urns in a room. Each urn contains a fixed number of coloured balls, and the balls in the set of urns come in six colours (red, orange, black, green, blue and purple). A person in the room chooses an urn and randomly draws a ball from it. He then puts the ball on a conveyor belt, where the observer can see the sequence of balls but not the sequence of urns from which they were drawn. The person has some procedure for choosing urns: the choice of urn for the n-th ball depends only on a random number and on the urn chosen for the (n-1)-th ball. Because the choice does not depend directly on any urn chosen before that single previous one, this is a Markov process.
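The weather chain and the hidden Markov model above can be made concrete with a short Java sketch. Java is used here because the paper's Android application is written in it. `chainProbability` multiplies the initial probability by the transition probabilities along a fully visible state sequence. `forward` applies the standard forward algorithm to a small urn-and-ball HMM, summing over every possible hidden sequence of urns.

The class and method names, and every probability value, are hypothetical illustrations, because the paper's figures are not reproduced. This is not the recognizer the application actually calls.

```java
import java.util.Arrays;

/**
 * Sketch only: all probabilities below are made-up illustration values,
 * not the ones in the paper's (unreproduced) figures.
 */
final class MarkovDemo {

    /** Visible Markov chain: P(q1..qT) = pi[q1] * a[q1][q2] * ... * a[q(T-1)][qT]. */
    static double chainProbability(double[] pi, double[][] a, int[] states) {
        double p = pi[states[0]];
        for (int t = 1; t < states.length; t++) {
            p *= a[states[t - 1]][states[t]];
        }
        return p;
    }

    /**
     * Hidden Markov model, forward algorithm: P(observed balls), summed over every
     * possible hidden sequence of urns. alpha[i] = P(o1..ot, urn at step t = i).
     */
    static double forward(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) {
            alpha[i] = pi[i] * b[i][obs[0]];      // start in urn i and draw the first colour
        }
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += alpha[i] * a[i][j];    // reach urn j from any urn i
                }
                next[j] = sum * b[j][obs[t]];     // then draw the observed colour from urn j
            }
            alpha = next;
        }
        return Arrays.stream(alpha).sum();
    }

    public static void main(String[] args) {
        // Weather chain, 0-based: 0 = cloudy, 1 = sunny, 2 = rainy, 3 = windy
        double[] pi = {0.25, 0.40, 0.20, 0.15};
        double[][] a = {
            {0.40, 0.20, 0.30, 0.10},
            {0.20, 0.50, 0.10, 0.20},
            {0.30, 0.20, 0.40, 0.10},
            {0.30, 0.30, 0.20, 0.20},
        };
        // sunny, rainy, sunny, cloudy, cloudy: approx. 0.00064
        System.out.println(chainProbability(pi, a, new int[] {1, 2, 1, 0, 0}));

        // Urn-and-ball HMM with two urns and three colours: 0 = red, 1 = green, 2 = blue
        double[] urnPi = {0.6, 0.4};
        double[][] urnA = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] urnB = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        // red, green, blue: approx. 0.03628
        System.out.println(forward(urnPi, urnA, urnB, new int[] {0, 1, 2}));
    }
}
```

The forward recursion gives the same answer as enumerating every urn sequence explicitly. However, its cost grows linearly with the length of the observation sequence rather than exponentially.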
The Markov process itself cannot be observed; only the sequence of labelled balls can, which is why this is called a hidden Markov process.

Fig: Urn & Ball Model

Although the observer does not know the sequence in which the urns were chosen, he knows the probability of drawing each colour of ball from each urn. He can therefore calculate the probability of an observed sequence of balls by combining the probabilities of all the possible sequences of urn choices.

N-gram language model: As our high-school grammar teachers told us, not every random combination of words forms a grammatically acceptable sentence:
• Colourless green ideas sleep furiously
• Furiously sleep ideas green colourless
• Ideas furiously colourless sleep green

The sentence "Colourless green ideas sleep furiously" (made famous by the linguist Noam Chomsky), for instance, is grammatically perfectly acceptable, but entirely nonsensical. Comparing it with the other two sentences makes this grammaticality evident. "Furiously sleep ideas green colourless" is grammatically unacceptable, and so is "Ideas furiously colourless sleep green": these sentences do not follow the rules of the English language. In other words, the fact that languages have rules constrains the way in which words can be combined into an acceptable sentence. Languages play by rules, and computers work with rules, so inferring a set of rules gives us a language model: a model that describes how a language, say English, works and behaves.

The rules by which a language plays are very complex, and no full set of rules describing a language has ever been proposed. There are simpler ways to obtain a language model, which exploit the observation that words do not combine in a random order: we can learn a lot from a word and its neighbours. Language models that exploit the ordering of words are called n-gram language models, where n is any integer greater than zero. An n-gram model can be imagined as a small window placed over a sentence or text, through which only n words are visible at a time.

The simplest n-gram model is therefore the unigram model, which looks at only one word at a time. The sentence "Colourless green ideas sleep furiously", for instance, contains five unigrams: "colourless", "green", "ideas", "sleep" and "furiously". This is not very informative, since these are just the words that form the sentence. N-grams start to become interesting when n is two (a bigram) or greater. The same sentence contains four bigrams: "colourless green", "green ideas", "ideas sleep" and "sleep furiously". A procedure that extracts bigrams generalizes directly to n-grams of any specified length: rather than always taking two elements, the number of items to take becomes an argument.

When n-grams are used for language modelling, an independence assumption is made so that each word depends only on the last n-1 words. This Markov model is used as an approximation of the true underlying language. The assumption is important because it massively simplifies the problem of learning the language model from data. In addition, because language is open-ended, it is common to group words unknown to the language model together.

In a simple n-gram language model, the probability of a word conditioned on some number of previous words (one word in a bigram model, two words in a trigram model, etc.) can be described as following a categorical distribution (often imprecisely called a "multinomial distribution"). Basically, an n-gram model predicts the likelihood of the next unit: the next word or, in a character-level model, the next letter.
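The n-gram ideas above can be sketched in a few lines of Java, assuming whitespace-tokenized input. `ngrams` takes the window length n as an argument instead of being fixed at two. `nextDistribution` estimates the probability of the next item from raw counts, using only the last n-1 items as history.

The names and the toy corpus are illustrative assumptions. Practical language models also apply smoothing so that n-grams never seen in training do not receive zero probability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: n-gram extraction and maximum-likelihood next-item probabilities from counts. */
final class NGrams {

    /** Slides a window of n items over the sequence; n is an argument rather than fixed at 2. */
    static List<List<String>> ngrams(List<String> items, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be greater than zero");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i + n <= items.size(); i++) {
            result.add(new ArrayList<>(items.subList(i, i + n)));
        }
        return result;
    }

    /**
     * Estimates P(next | history) = count(history, next) / count(history, anything),
     * where the history holds the last n-1 items (the Markov assumption).
     */
    static Map<String, Double> nextDistribution(List<String> items, List<String> history) {
        int n = history.size() + 1;
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (List<String> gram : ngrams(items, n)) {
            if (gram.subList(0, n - 1).equals(history)) {
                counts.merge(gram.get(n - 1), 1, Integer::sum);
                total++;
            }
        }
        Map<String, Double> probabilities = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            probabilities.put(e.getKey(), e.getValue() / (double) total);
        }
        return probabilities; // values sum to 1.0 whenever the history was seen
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("colourless", "green", "ideas", "sleep", "furiously");
        System.out.println(ngrams(sentence, 1)); // five unigrams
        System.out.println(ngrams(sentence, 2)); // four bigrams

        // Word level: after "the", this toy corpus contains "cat" twice and "mat" once.
        List<String> corpus = Arrays.asList("the cat sat on the mat the cat ran".split(" "));
        System.out.println(nextDistribution(corpus, Arrays.asList("the"))); // cat = 2/3, mat = 1/3

        // Letter level: the same code works on single characters.
        List<String> letters = Arrays.asList("banana".split(""));
        System.out.println(nextDistribution(letters, Arrays.asList("a"))); // {n=1.0}
    }
}
```

Because the history is just a list, the same counting code serves word-level and letter-level models. Only the tokenization changes.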
From training data, one can derive the probability distribution of the next letter given a history of size n, for example a = 0.4, b = 0.0004, c = 0, where the probabilities of all possible next letters sum to 1.0.

Problems: The following problems are faced in today's world.

Hands-free computing: Today's generation prefers speaking to writing. This may be due to many reasons: scarcity of time, the efficiency of multitasking, and hassle-free division of tasks. People have many things to do and very little time. They need an interface that they can connect to and interact with to make their daily communication easier. With this application, anyone can speak, and the voice is immediately converted into text without any extra effort.

Education and daily life: Today's tech-savvy young people want to get their hands on anything and everything. They are involved in so many activities that they cannot spare the time to write their projects and assignments, and children today are loaded with so many activities that their health is deteriorating day by day. This application will help them with their assignments: they only have to dictate their lines, and the application will provide them with the written documents. Moreover, they need to learn new languages so that they can connect with the outside world, and to work on their pronunciation skills too. With a multiple-language option, this application can help people learn different languages without a tutor and without travelling to the region where the language is spoken. In day-to-day life, where text messages are an integral part of our lives, this application will help everyone type text messages on the go while doing other tasks. For example, anyone can simply speak an urgent message while driving.

Blindness and education: Some people are unable to write, either because of blindness (complete or partial) or for other reasons. Students who are physically disabled or suffer from repetitive strain injury or other injuries to the upper extremities have to worry about handwriting, typing, or working with a scribe on school assignments. They all need an interface that listens to them, performs their tasks, and helps them connect with the outside world instantly. For people who cannot read or write, the texting applications on phones are very difficult to use. This application will help them in such situations by providing a suitable platform.

Solution: Speech recognition will be very helpful to such people. They will be able to take notes of anything and everything, and send messages across distances at once. Students who are blind or have very low vision can benefit from using the technology to convey words and then hear the application read them back, as well as use a computer through voice commands instead of having to look at the screen and keyboard. For language learning, speech recognition can be useful for learning a second language.
It can teach proper pronunciation, in addition to helping a person develop fluency in speaking. Today's generation prefers speaking to writing, and for such tech-savvy young people speech to text is an effective way to promote learning and sharing of information. It will also help them take notes on the go.

Android Platform as a way out of this problem: Android is a software environment for mobile devices that includes an operating system, middleware and key applications. The Android operating system (OS) architecture is divided into five layers.

The application layer is visible to the end user and consists of user applications: the basic applications that come with the operating system and the applications the user installs later. All applications are written in the Java programming language. The application framework is an extensible set of software components used by all applications in the operating system. The next layer consists of libraries written in the C and C++ programming languages, which the OS accesses through the framework. The Dalvik Virtual Machine (DVM) forms the main part of the runtime environment. It runs applications together with the core libraries, which are written in the Java programming language.

Fig. Android Architecture

Unlike the Java virtual machine, which is stack-based, the DVM is register-based and is intended for mobile devices. The last layer of the architecture is a kernel based on Linux, which serves as a hardware abstraction layer. The main reasons for using it are its memory and process management, its security model, its networking system, and its continuous development.

Four basic building blocks are used in constructing applications: activities, intents, services and content providers. (Strictly speaking, Android's four component types are activities, services, broadcast receivers and content providers; intents are the messages that activate them.)

An activity is the main element of every application; in simplified terms, it is a window that the user sees on the mobile device. An application can have one or more activities, and the main activity is the one used at startup. A transition between activities happens when the running activity launches a new one. Each activity is a separate component, implemented by inheriting from the Activity class. While the application runs, activities are added to a stack, and the currently running activity is at the top of the stack.

An intent is a message used to start activities or services, or to deliver broadcasts to broadcast receivers. An intent can contain the name of the component to run, the action to execute, the address of the data the component needs, and the component type.

A service is a component that runs in the background to perform long-running operations or to perform work for remote processes. One service can be bound to multiple applications, and such a service keeps running until all of them have disconnected from it.

A content provider manages a shared set of application data. The data can be stored in the file system, an SQLite database, on the web, or in any other persistent storage location the application can access [1]. Through the content provider, other applications can query or even modify the data (if the content provider allows it).

Speech Recognition: Speech recognition for this application is done on Google's servers, using HMM and n-gram algorithms. The system can be divided into several blocks:
• feature extraction
• an acoustic model database built from training data
• a dictionary
• a language model
• the speech recognition algorithm

The input audio waveform from a microphone is converted into a sequence of fixed-size acoustic vectors $Y_{1:T} = y_1, \ldots, y_T$ in a process called feature extraction. The decoder then attempts to find the sequence of words $w_{1:L} = w_1, \ldots, w_L$ that is most likely to have generated $Y$, i.e. it tries to find

$$\hat{w} = \arg\max_{w} \{ P(w \mid Y) \}$$

However, $P(w \mid Y)$ is difficult to model directly. Bayes' rule, $P(w \mid Y) = p(Y \mid w) P(w) / p(Y)$, is therefore used to transform it into the equivalent problem of finding

$$\hat{w} = \arg\max_{w} \{ p(Y \mid w) \, P(w) \}$$

where $p(Y)$ has been dropped because it does not depend on $w$. The likelihood $p(Y \mid w)$ is determined by an acoustic model and the prior $P(w)$ by a language model.

The basic unit of sound represented by the acoustic model is the phone. For example, the word "bat" is composed of three phones, /b/ /ae/ /t/, and about 40 such phones are required for English. For any given $w$, the corresponding acoustic model is synthesized by concatenating phone models to make words, as defined by a pronunciation dictionary. The parameters of these phone models are estimated from training data consisting of speech waveforms and their orthographic transcriptions.
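The decoding rule and the pronunciation dictionary described above can be illustrated with a small Java sketch. `toPhones` concatenates dictionary entries to turn words into phones. `decode` picks the candidate with the highest $\log p(Y \mid w) + \log P(w)$; log probabilities are used so that the product becomes a sum.

The candidate transcriptions and all scores are invented for illustration. The class and method names are hypothetical and are not part of any Google or Android API. A real decoder searches the candidate space rather than enumerating it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch with made-up scores: a pronunciation dictionary that concatenates
 * phones, and w_hat = argmax_w p(Y|w) P(w) evaluated in log space.
 */
final class NoisyChannelDecoder {

    /** Builds the phone sequence for a word sequence, e.g. "bat" -> b ae t. */
    static List<String> toPhones(List<String> words, Map<String, List<String>> dictionary) {
        List<String> phones = new ArrayList<>();
        for (String word : words) {
            List<String> pronunciation = dictionary.get(word);
            if (pronunciation == null) {
                throw new IllegalArgumentException("not in dictionary: " + word);
            }
            phones.addAll(pronunciation);
        }
        return phones;
    }

    /**
     * Returns the candidate with the highest log p(Y|w) + log P(w). The evidence p(Y)
     * is ignored because it is identical for every candidate. Candidates without a
     * language-model score are treated as impossible; returns null if none is possible.
     */
    static String decode(Map<String, Double> acousticLogLik, Map<String, Double> lmLogProb) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : acousticLogLik.entrySet()) {
            double score = e.getValue()
                    + lmLogProb.getOrDefault(e.getKey(), Double.NEGATIVE_INFINITY);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dictionary = Map.of(
                "bat", List.of("b", "ae", "t"),
                "cat", List.of("k", "ae", "t"));
        System.out.println(toPhones(List.of("bat", "cat"), dictionary)); // [b, ae, t, k, ae, t]

        // The acoustics slightly favour "wreck a nice beach" ...
        Map<String, Double> acoustic = Map.of("recognise speech", -12.0, "wreck a nice beach", -11.5);
        // ... but the language model strongly prefers "recognise speech".
        Map<String, Double> language = Map.of("recognise speech", -4.0, "wreck a nice beach", -9.0);
        System.out.println(decode(acoustic, language)); // recognise speech (-16.0 beats -20.5)
    }
}
```

The usage shows why the prior matters. With the acoustic scores alone, the slightly better-sounding but implausible transcription would win.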
The language model is typically an N-gram model in which the probability of each word is conditioned only on its N-1 predecessors. The N-gram parameters are estimated by counting N-tuples in appropriate text corpora (large collections of text). The decoder operates by searching through all possible word sequences, using pruning to remove unlikely hypotheses and thereby keep the search tractable. When the end of the utterance is reached, the most likely word sequence is output. Alternatively, modern decoders can generate lattices containing a compact representation of the most likely hypotheses.

MAIN PARTS OF THE PROJECT:

A. Voice Recognition Activity class: Voice Recognition Activity is the startup activity, declared as the launcher in the AndroidManifest.xml file. Most of the initialization needed to interact programmatically with the widgets of the user interface takes place here. During this initialization, the activity also checks whether the mobile phone on which the application is installed supports speech recognition. If the device does not have one of Google's applications that integrate speech recognition, further use of the Voice SMS application is disabled and the message "Recognizer not present" is shown on the screen. Recognition itself is done through one of Google's speech recognition applications.

If a recognition activity is present, the user can start speech recognition by pressing the button, which calls startActivityForResult(Intent intent, int requestCode). The application uses startActivityForResult() to send an intent that requests voice recognition. The intent includes an extra parameter that specifies one of two language models (in the Android API, RecognizerIntent.EXTRA_LANGUAGE_MODEL set to LANGUAGE_MODEL_FREE_FORM or LANGUAGE_MODEL_WEB_SEARCH).

Fig. Search is enabled after clicking the image button
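The pruned search performed by the decoder (described under Speech Recognition) can be sketched in Java as a beam search. In this hypothetical example, each step offers a few candidate words with made-up acoustic log-likelihoods, and a bigram table supplies log P(word | previous word). After every step, only the `beamWidth` best partial sequences survive.

The class, method and parameter names are assumptions for illustration. A real decoder prunes over HMM states and acoustic frames rather than whole words.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch of a pruned (beam) search over word sequences with hypothetical log scores. */
final class BeamSearch {

    /** A partial word sequence and its accumulated log score. */
    static final class Hypothesis {
        final List<String> words;
        final double score;

        Hypothesis(List<String> words, double score) {
            this.words = words;
            this.score = score;
        }
    }

    /**
     * acoustic.get(t) maps each candidate word at step t to log p(y_t | word);
     * bigram maps a previous word ("<s>" at the start) to log P(word | previous).
     */
    static List<String> decode(List<Map<String, Double>> acoustic,
                               Map<String, Map<String, Double>> bigram,
                               int beamWidth) {
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (Map<String, Double> step : acoustic) {
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beam) {
                String previous = h.words.isEmpty() ? "<s>" : h.words.get(h.words.size() - 1);
                Map<String, Double> lm = bigram.getOrDefault(previous, Map.of());
                for (Map.Entry<String, Double> candidate : step.entrySet()) {
                    Double lmScore = lm.get(candidate.getKey());
                    if (lmScore == null) {
                        continue; // unseen bigram: treated as impossible in this sketch
                    }
                    List<String> words = new ArrayList<>(h.words);
                    words.add(candidate.getKey());
                    expanded.add(new Hypothesis(words, h.score + candidate.getValue() + lmScore));
                }
            }
            // Pruning: keep only the beamWidth best hypotheses before the next step.
            expanded.sort(Comparator.comparingDouble((Hypothesis hyp) -> hyp.score).reversed());
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamWidth, expanded.size())));
        }
        if (beam.isEmpty()) {
            return List.of();
        }
        return beam.get(0).words;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> acoustic = List.of(
                Map.of("ice", -1.2, "I", -1.0),
                Map.of("cream", -1.0, "scream", -1.1));
        Map<String, Map<String, Double>> bigram = Map.of(
                "<s>", Map.of("ice", -2.0, "I", -1.5),
                "ice", Map.of("cream", -0.2, "scream", -5.0),
                "I", Map.of("cream", -5.0, "scream", -2.5));
        System.out.println(decode(acoustic, bigram, 2)); // [ice, cream]
        System.out.println(decode(acoustic, bigram, 1)); // [I, scream]: the better path was pruned
    }
}
```

The two calls show the trade-off that pruning makes. A narrower beam does less work but can discard, early on, the path that would have scored best by the end of the utterance.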
Fig. The speech is processed and the text output is shown

B. SMS class: This class acts as the interface of the SMS-sending activity. The recognized text is entered in the message field and displayed on the screen. When the Send SMS button is clicked, the application checks that both the message and the recipient's number have been entered before sending the message. When the cursor is placed in the recipient-number field, the visibility attribute of the contacts button changes from its default, gone, to visible. Pressing this button lets the user pick a number from the contacts; after the desired contact is selected, the message can be sent.
Fig. Interface for sending SMS

C. XML files: The application has two different interfaces. The screen shown when the user runs the application is defined in voice_recognition.xml. Its linear layout places the widgets one below another. Width and height are set with the fill_parent attribute (renamed match_parent in later Android versions), which makes them equal to those of the parent, in this case the screen. The second interface, defined in the sms.xml file, is displayed when the user chooses one of the offered messages. AndroidManifest.xml describes the application so that it can be installed and launched on the mobile device.

Economic feasibility: As far as the economic feasibility of this project is concerned, it would be very cost-efficient for any company to incorporate this application into its projects. The following features of the application show its economic feasibility:
• The operating system used for development is free of cost, and so is the Eclipse IDE used for application development.
• Manufacturers of mobile devices can use and adapt the operating system freely.
• Basic core applications and additional applications have equal access to resources.
• Memory use is optimized, and running applications are managed automatically.
• Applications can be developed quickly and easily using development tools and a rich set of software libraries.
• High-quality audiovisual content is supported: vector graphics and most audio and video formats can be used.
• Applications can be tested on most computing platforms, including Windows and Linux, saving time and money.

Conclusion: A speech recognizer's effectiveness depends on its recognition rate and on the quality of the pronunciation it receives. STT software generally uses only one type of model. Using only one type of algorithm does serve the purpose of converting speech to text, but it often results in lower-quality output and a low recognition rate. Our application attempts to improve on this by combining two models, the hidden Markov model and the n-gram model, which are among the most successful algorithms so far for STT software.

Future developments: Existing speech-to-text software available online converts speech in a particular language into text in that same language only. Moreover, the few applications that do offer a multi-language translation feature often confuse users by mixing up words and failing to give the desired output. As the software and hardware capabilities of mobile devices develop, there is an increasing need for device-specific content, and this has changed the market. We look forward to developing this software further so that speech can be entered in multiple languages and converted effectively into text in multiple languages, which could lay a foundation for everyday use of this technology worldwide. The user will be asked to choose the language he or she wishes to speak, which will be matched against an existing database containing the dictionary. The user will then be asked to choose the language into which the output should be converted. The system will convert the input into the required language and display the desired output. We will focus on the various languages spoken in India, making the application one of a kind.