SlideShare a Scribd company logo
1 of 5
Download to read offline
International Conference on Advances in Human Machine Interaction (HMI - 2016),
March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India
978-1-4673-8810-8/16/$31.00 ©2016 IEEE
Intelligent Hands Free Speech based SMS
System on Android
Gulbakshee Dharmale
1
, Dr. Vilas Thakare
3
, Dr. Dipti D. Patil
2
1,3
Computer Science Dept., SGB Amravati University, Amravati, INDIA.
2
Computer Engineering Dept., MKSSS’s CCOEW, Savitribai Phule, Pune University, Pune, INDIA.
Abstract--Over the years speech recognition has
taken the market. The speech input can be used
in varying domains such as automatic reader
and for inputting data to the system. Speech
recognition can minimize the use of text and
other types of input, at the same time
minimizing the calculation needed for the
process. A decade back speech recognition was
difficult to use in any system, but with elevation
in technology leading to new algorithms,
techniques and advanced tools. Now it is
possible to generate the desired speech
recognition output. One such method is the
hidden markov models which is used in this
paper. Voice or signaled input is inserted
through any speech device such as microphone,
then speech can be processed and convert it to
text hence able to send SMS, also Phone number
can be entering either by voice or you may select
it from contact list. Voice has opened up data
input for a variety of user’s such as illiterate,
handicapped, as if the person cannot write then
the speech input is a boon and other’s too which
can lead to better usage of the application.
Keywords: HMM; Speech Recognition (SR);
Android;
I. INTRODUCTION
The use of voice in mobile phones has
opened up the market for voice application for
variety of the user’s such as handicap etc. Similar
kind of technology was apple’s well known siri.
The siri enable the user’s to call contacts, send
email and many more functionality. Similar to it
was google speech recognition (SR) api. The major
difference between both of them is that siri can
understand multiple phrases and words but google
voice majorly focused for specific phrases and
keywords [1]. Speech recognition adds another
dimension to the classic keyboard input which
leads to ease of the user side i.e. manipulation of
text is far easier than the classic method. This
application uses the google api which uses the
hidden markov models (HMM) method to send
sms, In this sms application the user has to speak
the numeric characters as the contact information
on which sms has to send and message for the
receiver.
This application also included that user
can only input numeric character for contact
information, i.e. the security validation for number
is done. SR will listen to input and convert numeric
to text and will be displayed on contact information
to verify. If any user try to insert any other
character into the information an error would be
displayed e.g. if user speaks his name for contact, it
will be displayed as invalid contact. The message
box can accept any character. To use the speech
recognition user has to be loud and clear so that
command is properly executed by the system.
II. SPEECH RECOGNITION
The system shown here will use SR with
google server which uses HMM method. The brief
description of how speech is recognized is as
follows. Firstly the speech is inputted, sound can be
fluctuating set of signals which are recorded [2].
These signals depends on speaker how is his/her
voice quality and hold on the language. The input
data is divided into words and phrases, i.e.
command is divided into several parts. Lastly
comes the processing phase where accordingly
system understands command and executes it.
Speech Recognition stands majorly on five
pillars that are, feature extraction, acoustic models
database which is built based on the training data,
dictionary, language model and the speech
recognition algorithm. The input data i.e. voice is
first converted to digital signal and are sampled on
time and amplitude axis. This digitalized signal is
then processed. For processing the signal is divided
into small intervals, which depends on the
algorithm used. The generalized timestamp is
International Conference on Advances in Human Machine Interaction (HMI - 2016),
March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India
978-1-4673-8810-8/16/$31.00 ©2016 IEEE
20 ms. This division is based on the features of data
as those features are compared with database
element. Database element contains information of
feature of the word found and according the
command is created. The basic element can be a
phoneme for continuous speech or word for
isolated words recognition.
The dictionary or Database is used to
connect the frequency model i.e. the spoken word
with actual vocabulary word. The signal namely
speech has its constraints as said speech should
match the meaning of textual language brain
created. The HMM uses word for modeling [3].
The output is a hidden probability function of the
state which cannot be deterministically specified.
States sequence is never a command that is SR
system generally assumes that the signal is
realization of message which is encoded as a
sequence of symbols. Here symbols are the words
sampled. To effect the reverse operation of
recognizing the underlying symbol sequence given
a spoken utterance, the continuous speech
waveform is first converted to a sequence of
equally spaced discrete parameter vectors. Vectors
of speech characteristics consist mostly of MFCC
(Mel Frequency Cepstral Coefficients),
standardized by the European Telecommunications
Standards Institute for speech recognition. The
MFC can be easily created. The Fourier analysis is
performed on sampled i.e. divided data then
variable bandwidth triangular filters are placed
along with the Mel frequency scale and energies
are calculated by spectrum [1].
Lastly, the magnitude compression is
applied and spectrum is de-correlated using the
DCT and first coefficients represent the MFCC’s.
These vectors are called as observations which
serve to future calculation. Finally all the states
undergo same procedure one after another, all the
states vectors are calculated and are checked with
the dictionary and a command is created [4]. The
system is trained through Iterative Baum-Welch
procedure. The training is repeated until good
accuracy is reached. The training should be
accurate to an extent that if some minor changes
are present the output can be generated with same
efficiency. The decoding is done through Viterbi
algorithm; it is according to output sequence which
is defined by recursive relation created above.
A. Hidden Markov Models In Speech Recognition
An HMM algorithm is a random
probability finite state automation defined by
parameters. A HMM chain like depicted in the
following figure 1. It has a set of N distinct states
S1, S2, till SN, at regularly spaced discrete times
i.e. the sampled signal. The time instants or the
time at which signal is sampled are denoted with t
= 1, 2, and so on. The actual state at time instance t
is denoted as qt. The important feature because of
which HMM is used is that this probabilistic model
is Markov property which states that current state,
future state and past states are independent to each
other it means the states are calculated
independently. Equation (1) and (2) defines the
state transition probability as follows:
P [qt = Sj | qt-1 = Si, qt-2 = Sk...]=
P [qt = Sj| qt-1 = Si] (1)
Fig. 1 Markov chain with states
The Markov chain is also called as
observed chain as output process corresponds with
the observed states only [5]. The HMM can be
thought of as black box, where the sequence of
output symbols produced over time is observable,
but sequence of states visited over time is hidden
view. That’s why it is called a hidden markov
model [6]. Simple HMM with two states and two
output states A and B shows in below figure 2.
Fig. 2 Hidden Markov Model
aij = P[qt = Sj | qt-1 = Si], 1 ≤ i, j ≤ Naij ≥ 0
(2)
International Conference on Advances in Human Machine Interaction (HMI - 2016),
March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India
978-1-4673-8810-8/16/$31.00 ©2016 IEEE
The three basic problems which tend to
arrive are as follows: First problem states that if
observed sequence as O and model as λ, then how
probability of the sequence with the model is
efficiently computed. Second problem is that if
observed sequence as O and model as λ, then how
can inner state sequence Q explains the
observations O. Lastly how to choose parameter to
get the best and accurate output. Solutions to these
are the Forward algorithm (to get the efficient
output sequence); the Viterbi algorithm (to get the
best observed state O) and the Baum-Welch
algorithm (to choose the computing parameters) are
used respectively.
III. IMPLEMENTATION
The system here is divided into small
subsystems. In our application the models are as
follows. Firstly user has an option to select whether
to use voice as an input or select contacts manually.
If user selects manual option then a service is
called which access all the contacts in contact list
present on the cell phone. If voice is selected then
the google SR api is called and a dialog box appear
which says that speck now and a mic type image is
formed. Once the user is done speaking then api
takes a few seconds to process the data and output
is displayed on the sender’s address block.
Here validation is applied, since the
application sends message to number’s therefore
only numeric characters are allowed so that
unnecessary data is trimmed out. Similar kind of
work is done with message dialog. Here when the
user presses message box user has an option to
write the message manually or to import message
via his/her voice. If manual option is chosen then
keyboard appears and data can be inputted. If voice
is chosen then again the google api is called and a
dialog box appears saying speak now. After giving
input, the inputted data can be seen in message box.
Here no constraint is applied as message can be in
numeric as well as alphabetical characters. Once
both the fields are full in the end a button is present
which says send message and once the user clicks
it, message service is called and message can be
sent to the inputted receiver.
Here the google api is always called,
processing of api is as follows. The HMM rule is
applied and then input is taken and sampled. A
forward algorithm is applied to forward variable
and partial observation is stored until time t with
model λ. The forward variable is defined as follows
in equation (3).
= P ( … , = Si | λ) (3)
Now we have to calculate highest
probability from the given sequence which is to be
stated by inner states of the sequence for that
Viterbi algorithm is used and it helps in calculating
the highest probability along a single path at time t.
It is defined in equation (4) as follows.
Lastly the parameters are chosen by the
Baum- Welch algorithm and equation (5) is defined
it as follows.
Once the output is processed it is displayed on the
desired place.
IV. RESULT
Developed Speech recognizer system
tested for a SMS sending application and found
that it recognizes the speech to an accuracy of more
than 90%. Enter phone number by speech or select
contact from contact list. As user presses select
contact here by selecting name of person it gives all
phone numbers of that person in phone contact list
box. Now it is possible to send sms to all numbers
of same person on one click which results in
reducing time of searching each number. When
user presses the message box user has an option to
write the message manually or to import the
message via his/her voice. If manual option is
chosen then the keyboard appears and the data can
be inputted. If voice is chosen then again google
api is called and a dialog box appears saying speak
now as shown in figure 3.
αt(i) = P(O1 O2 …. Ot, qt= Si | λ)… (3)
=
(4)
ξt(i,j)=P(qt=Si,qt+1=Sj| O, λ) (5)
International Conference on Advances in Human Machine Interaction (HMI - 2016),
March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India
978-1-4673-8810-8/16/$31.00 ©2016 IEEE
Fig. 3 Speech input
Some simple AI is also added to it by
which checking validity of input speech for phone
number. If the user is giving phone number as an
input, but input speech is not a number, then the
system gives recognized speech as well as informs
user that this input is not a proper number as shown
in given figure 4.
Fig. 4 Incorrect phone number input
This system tested for various speakers
which had varying speech speed, amplitude and
frequency. The results of this system are very good
and recognized most of the speech inputs.
Fig. 5 Correct Inputs
V. CONCLUSION
An automatic speech recognizer studied
and implemented on the android platform which
gives much accuracy for both numeric and alpha
numeric inputs. The accuracy of this system is
about 90%, and delay for recognition is less than
100 ns.
We plan to implement this work for other
languages as well as test them on the SMS sending
application which is developed.
References
[1]. B. Raghavendhar Reddy, E. Mahender, “Speech to
text conversion using android platform “, IJERA
Feb-2013.
[2]. V.A.Mane, A.B.Patil,” Comparison of LDM and
HMM for an application of a speech”, International
Conference on Advances in Recent Technologies in
Communication and Computing,978-0-7695-4201-
0/10 $26.00 © 2010 IEEE, DOI
10.1109/ARTCom.2010.65-431
[3]. Brahim Patel, Dr. Y. SrinivasRao ,”Speech
recognition using HMM with MFCC- an analysis
using frequency spectral decomposion technique” ,
SIPIJ Dec 2010.
[4].
1
Anjali Arora,
2
Anil Chopra,” Query processing for
hindi keywords searching using NLP”, International
Journal of Computer Science and Information
Technology Research ISSN 2348-120X (online) Vol.
3, Issue 2,Month: April - June 2015, pp: (981-984).
International Conference on Advances in Human Machine Interaction (HMI - 2016),
March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India
978-1-4673-8810-8/16/$31.00 ©2016 IEEE
[5]. Patrick Gampp, “Hidden markov model basics”.
[6]. J. Tebelskis,”Speech recognition using neural
networks”, Pittsburgh: School of Computer Science,
Carnegie Mellon University, 1995.

More Related Content

What's hot

International journal of signal and image processing issues vol 2015 - no 1...
International journal of signal and image processing issues   vol 2015 - no 1...International journal of signal and image processing issues   vol 2015 - no 1...
International journal of signal and image processing issues vol 2015 - no 1...sophiabelthome
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...IOSR Journals
 
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLHIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLijfcstjournal
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationeSAT Publishing House
 
Development and testing of an FPT.AI-based voicebot
Development and testing of an FPT.AI-based voicebotDevelopment and testing of an FPT.AI-based voicebot
Development and testing of an FPT.AI-based voicebotjournalBEEI
 
Cleveree: an artificially intelligent web service for Jacob voice chatbot
Cleveree: an artificially intelligent web service for Jacob voice chatbotCleveree: an artificially intelligent web service for Jacob voice chatbot
Cleveree: an artificially intelligent web service for Jacob voice chatbotTELKOMNIKA JOURNAL
 
Cohesive Software Design
Cohesive Software DesignCohesive Software Design
Cohesive Software Designijtsrd
 
IRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion RecognitionIRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion RecognitionIRJET Journal
 
Analysis of sms feedback and online feedback using sentiment analysis for ass...
Analysis of sms feedback and online feedback using sentiment analysis for ass...Analysis of sms feedback and online feedback using sentiment analysis for ass...
Analysis of sms feedback and online feedback using sentiment analysis for ass...eSAT Journals
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
 
Evaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutionsEvaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutionsSajeed Mahaboob
 
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEDETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEAbdurrahimDerric
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
 

What's hot (16)

International journal of signal and image processing issues vol 2015 - no 1...
International journal of signal and image processing issues   vol 2015 - no 1...International journal of signal and image processing issues   vol 2015 - no 1...
International journal of signal and image processing issues vol 2015 - no 1...
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...
 
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLHIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Development and testing of an FPT.AI-based voicebot
Development and testing of an FPT.AI-based voicebotDevelopment and testing of an FPT.AI-based voicebot
Development and testing of an FPT.AI-based voicebot
 
Cleveree: an artificially intelligent web service for Jacob voice chatbot
Cleveree: an artificially intelligent web service for Jacob voice chatbotCleveree: an artificially intelligent web service for Jacob voice chatbot
Cleveree: an artificially intelligent web service for Jacob voice chatbot
 
Cohesive Software Design
Cohesive Software DesignCohesive Software Design
Cohesive Software Design
 
IRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion RecognitionIRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion Recognition
 
Analysis of sms feedback and online feedback using sentiment analysis for ass...
Analysis of sms feedback and online feedback using sentiment analysis for ass...Analysis of sms feedback and online feedback using sentiment analysis for ass...
Analysis of sms feedback and online feedback using sentiment analysis for ass...
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
 
Evaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutionsEvaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutions
 
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEDETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
J1803015357
J1803015357J1803015357
J1803015357
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
 
Machine learning
Machine learningMachine learning
Machine learning
 

Similar to Hands Free SMS System Using Speech Recognition

Intelligent speech based sms system
Intelligent speech based sms systemIntelligent speech based sms system
Intelligent speech based sms systemKamal Spring
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...IOSR Journals
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingiosrjce
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...IDES Editor
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)TANVIRAHMED611926
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A ReviewIRJET Journal
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognitionphyuhsan
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing SystemIRJET Journal
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...AIRCC Publishing Corporation
 
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET Journal
 
A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...TELKOMNIKA JOURNAL
 
Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...IJECEIAES
 
IRJET- Conversational Assistant based on Sentiment Analysis
IRJET- Conversational Assistant based on Sentiment AnalysisIRJET- Conversational Assistant based on Sentiment Analysis
IRJET- Conversational Assistant based on Sentiment AnalysisIRJET Journal
 
Bachelors project summary
Bachelors project summaryBachelors project summary
Bachelors project summaryAditya Deshmukh
 

Similar to Hands Free SMS System Using Speech Recognition (20)

Intelligent speech based sms system
Intelligent speech based sms systemIntelligent speech based sms system
Intelligent speech based sms system
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...
 
H010625862
H010625862H010625862
H010625862
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law companding
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
1
11
1
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
 
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
 
A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...
 
Av4103298302
Av4103298302Av4103298302
Av4103298302
 
Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
 
IRJET- Conversational Assistant based on Sentiment Analysis
IRJET- Conversational Assistant based on Sentiment AnalysisIRJET- Conversational Assistant based on Sentiment Analysis
IRJET- Conversational Assistant based on Sentiment Analysis
 
Bachelors project summary
Bachelors project summaryBachelors project summary
Bachelors project summary
 

Recently uploaded

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 

Recently uploaded (20)

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 

Hands Free SMS System Using Speech Recognition

  • 1. International Conference on Advances in Human Machine Interaction (HMI - 2016), March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India 978-1-4673-8810-8/16/$31.00 ©2016 IEEE Intelligent Hands Free Speech based SMS System on Android Gulbakshee Dharmale 1 , Dr. Vilas Thakare 3 , Dr. Dipti D. Patil 2 1,3 Computer Science Dept., SGB Amravati University, Amravati, INDIA. 2 Computer Engineering Dept., MKSSS’s CCOEW, Savitribai Phule, Pune University, Pune, INDIA. Abstract--Over the years speech recognition has taken the market. The speech input can be used in varying domains such as automatic reader and for inputting data to the system. Speech recognition can minimize the use of text and other types of input, at the same time minimizing the calculation needed for the process. A decade back speech recognition was difficult to use in any system, but with elevation in technology leading to new algorithms, techniques and advanced tools. Now it is possible to generate the desired speech recognition output. One such method is the hidden markov models which is used in this paper. Voice or signaled input is inserted through any speech device such as microphone, then speech can be processed and convert it to text hence able to send SMS, also Phone number can be entering either by voice or you may select it from contact list. Voice has opened up data input for a variety of user’s such as illiterate, handicapped, as if the person cannot write then the speech input is a boon and other’s too which can lead to better usage of the application. Keywords: HMM; Speech Recognition (SR); Android; I. INTRODUCTION The use of voice in mobile phones has opened up the market for voice application for variety of the user’s such as handicap etc. Similar kind of technology was apple’s well known siri. The siri enable the user’s to call contacts, send email and many more functionality. Similar to it was google speech recognition (SR) api. The major difference between both of them is that siri can understand multiple phrases and words but google voice majorly focused for specific phrases and keywords [1]. Speech recognition adds another dimension to the classic keyboard input which leads to ease of the user side i.e. manipulation of text is far easier than the classic method. This application uses the google api which uses the hidden markov models (HMM) method to send sms, In this sms application the user has to speak the numeric characters as the contact information on which sms has to send and message for the receiver. This application also included that user can only input numeric character for contact information, i.e. the security validation for number is done. SR will listen to input and convert numeric to text and will be displayed on contact information to verify. If any user try to insert any other character into the information an error would be displayed e.g. if user speaks his name for contact, it will be displayed as invalid contact. The message box can accept any character. To use the speech recognition user has to be loud and clear so that command is properly executed by the system. II. SPEECH RECOGNITION The system shown here will use SR with google server which uses HMM method. The brief description of how speech is recognized is as follows. Firstly the speech is inputted, sound can be fluctuating set of signals which are recorded [2]. These signals depends on speaker how is his/her voice quality and hold on the language. The input data is divided into words and phrases, i.e. command is divided into several parts. Lastly comes the processing phase where accordingly system understands command and executes it. Speech Recognition stands majorly on five pillars that are, feature extraction, acoustic models database which is built based on the training data, dictionary, language model and the speech recognition algorithm. The input data i.e. voice is first converted to digital signal and are sampled on time and amplitude axis. This digitalized signal is then processed. For processing the signal is divided into small intervals, which depends on the algorithm used. The generalized timestamp is
  • 2. International Conference on Advances in Human Machine Interaction (HMI - 2016), March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India 978-1-4673-8810-8/16/$31.00 ©2016 IEEE 20 ms. This division is based on the features of data as those features are compared with database element. Database element contains information of feature of the word found and according the command is created. The basic element can be a phoneme for continuous speech or word for isolated words recognition. The dictionary or Database is used to connect the frequency model i.e. the spoken word with actual vocabulary word. The signal namely speech has its constraints as said speech should match the meaning of textual language brain created. The HMM uses word for modeling [3]. The output is a hidden probability function of the state which cannot be deterministically specified. States sequence is never a command that is SR system generally assumes that the signal is realization of message which is encoded as a sequence of symbols. Here symbols are the words sampled. To effect the reverse operation of recognizing the underlying symbol sequence given a spoken utterance, the continuous speech waveform is first converted to a sequence of equally spaced discrete parameter vectors. Vectors of speech characteristics consist mostly of MFCC (Mel Frequency Cepstral Coefficients), standardized by the European Telecommunications Standards Institute for speech recognition. The MFC can be easily created. The Fourier analysis is performed on sampled i.e. divided data then variable bandwidth triangular filters are placed along with the Mel frequency scale and energies are calculated by spectrum [1]. Lastly, the magnitude compression is applied and spectrum is de-correlated using the DCT and first coefficients represent the MFCC’s. These vectors are called as observations which serve to future calculation. Finally all the states undergo same procedure one after another, all the states vectors are calculated and are checked with the dictionary and a command is created [4]. The system is trained through Iterative Baum-Welch procedure. The training is repeated until good accuracy is reached. The training should be accurate to an extent that if some minor changes are present the output can be generated with same efficiency. The decoding is done through Viterbi algorithm; it is according to output sequence which is defined by recursive relation created above. A. Hidden Markov Models In Speech Recognition An HMM algorithm is a random probability finite state automation defined by parameters. A HMM chain like depicted in the following figure 1. It has a set of N distinct states S1, S2, till SN, at regularly spaced discrete times i.e. the sampled signal. The time instants or the time at which signal is sampled are denoted with t = 1, 2, and so on. The actual state at time instance t is denoted as qt. The important feature because of which HMM is used is that this probabilistic model is Markov property which states that current state, future state and past states are independent to each other it means the states are calculated independently. Equation (1) and (2) defines the state transition probability as follows: P [qt = Sj | qt-1 = Si, qt-2 = Sk...]= P [qt = Sj| qt-1 = Si] (1) Fig. 1 Markov chain with states The Markov chain is also called as observed chain as output process corresponds with the observed states only [5]. The HMM can be thought of as black box, where the sequence of output symbols produced over time is observable, but sequence of states visited over time is hidden view. That’s why it is called a hidden markov model [6]. Simple HMM with two states and two output states A and B shows in below figure 2. Fig. 2 Hidden Markov Model aij = P[qt = Sj | qt-1 = Si], 1 ≤ i, j ≤ Naij ≥ 0 (2)
  • 3. International Conference on Advances in Human Machine Interaction (HMI - 2016), March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India 978-1-4673-8810-8/16/$31.00 ©2016 IEEE The three basic problems which tend to arrive are as follows: First problem states that if observed sequence as O and model as λ, then how probability of the sequence with the model is efficiently computed. Second problem is that if observed sequence as O and model as λ, then how can inner state sequence Q explains the observations O. Lastly how to choose parameter to get the best and accurate output. Solutions to these are the Forward algorithm (to get the efficient output sequence); the Viterbi algorithm (to get the best observed state O) and the Baum-Welch algorithm (to choose the computing parameters) are used respectively. III. IMPLEMENTATION The system here is divided into small subsystems. In our application the models are as follows. Firstly user has an option to select whether to use voice as an input or select contacts manually. If user selects manual option then a service is called which access all the contacts in contact list present on the cell phone. If voice is selected then the google SR api is called and a dialog box appear which says that speck now and a mic type image is formed. Once the user is done speaking then api takes a few seconds to process the data and output is displayed on the sender’s address block. Here validation is applied, since the application sends message to number’s therefore only numeric characters are allowed so that unnecessary data is trimmed out. Similar kind of work is done with message dialog. Here when the user presses message box user has an option to write the message manually or to import message via his/her voice. If manual option is chosen then keyboard appears and data can be inputted. If voice is chosen then again the google api is called and a dialog box appears saying speak now. After giving input, the inputted data can be seen in message box. Here no constraint is applied as message can be in numeric as well as alphabetical characters. Once both the fields are full in the end a button is present which says send message and once the user clicks it, message service is called and message can be sent to the inputted receiver. Here the google api is always called, processing of api is as follows. The HMM rule is applied and then input is taken and sampled. A forward algorithm is applied to forward variable and partial observation is stored until time t with model λ. The forward variable is defined as follows in equation (3). = P ( … , = Si | λ) (3) Now we have to calculate highest probability from the given sequence which is to be stated by inner states of the sequence for that Viterbi algorithm is used and it helps in calculating the highest probability along a single path at time t. It is defined in equation (4) as follows. Lastly the parameters are chosen by the Baum- Welch algorithm and equation (5) is defined it as follows. Once the output is processed it is displayed on the desired place. IV. RESULT Developed Speech recognizer system tested for a SMS sending application and found that it recognizes the speech to an accuracy of more than 90%. Enter phone number by speech or select contact from contact list. As user presses select contact here by selecting name of person it gives all phone numbers of that person in phone contact list box. Now it is possible to send sms to all numbers of same person on one click which results in reducing time of searching each number. When user presses the message box user has an option to write the message manually or to import the message via his/her voice. If manual option is chosen then the keyboard appears and the data can be inputted. If voice is chosen then again google api is called and a dialog box appears saying speak now as shown in figure 3. αt(i) = P(O1 O2 …. Ot, qt= Si | λ)… (3) = (4) ξt(i,j)=P(qt=Si,qt+1=Sj| O, λ) (5)
  • 4. International Conference on Advances in Human Machine Interaction (HMI - 2016), March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India 978-1-4673-8810-8/16/$31.00 ©2016 IEEE Fig. 3 Speech input Some simple AI is also added to it by which checking validity of input speech for phone number. If the user is giving phone number as an input, but input speech is not a number, then the system gives recognized speech as well as informs user that this input is not a proper number as shown in given figure 4. Fig. 4 Incorrect phone number input This system tested for various speakers which had varying speech speed, amplitude and frequency. The results of this system are very good and recognized most of the speech inputs. Fig. 5 Correct Inputs V. CONCLUSION An automatic speech recognizer studied and implemented on the android platform which gives much accuracy for both numeric and alpha numeric inputs. The accuracy of this system is about 90%, and delay for recognition is less than 100 ns. We plan to implement this work for other languages as well as test them on the SMS sending application which is developed. References [1]. B. Raghavendhar Reddy, E. Mahender, “Speech to text conversion using android platform “, IJERA Feb-2013. [2]. V.A.Mane, A.B.Patil,” Comparison of LDM and HMM for an application of a speech”, International Conference on Advances in Recent Technologies in Communication and Computing,978-0-7695-4201- 0/10 $26.00 © 2010 IEEE, DOI 10.1109/ARTCom.2010.65-431 [3]. Brahim Patel, Dr. Y. SrinivasRao ,”Speech recognition using HMM with MFCC- an analysis using frequency spectral decomposion technique” , SIPIJ Dec 2010. [4]. 1 Anjali Arora, 2 Anil Chopra,” Query processing for hindi keywords searching using NLP”, International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online) Vol. 3, Issue 2,Month: April - June 2015, pp: (981-984).
  • 5. International Conference on Advances in Human Machine Interaction (HMI - 2016), March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India 978-1-4673-8810-8/16/$31.00 ©2016 IEEE [5]. Patrick Gampp, “Hidden markov model basics”. [6]. J. Tebelskis,”Speech recognition using neural networks”, Pittsburgh: School of Computer Science, Carnegie Mellon University, 1995.