Mobile Application for Real Time Monitoring of Speech Rate
Based on Phoneme Segmentation Techniques
Katia Raichlin Levi 1, Aviv Sotzianu 2, Ofer Amir 3, Eran Aharonson 2, Zehava Ovadia-Blechman 1
1 Medical Engineering Department, Afeka Tel-Aviv Academic College of Engineering.
2 Software Engineering Department, Afeka Tel-Aviv Academic College of Engineering.
3 Communication Disorders Department, Tel Aviv University.
ekatirinar@mail.afeka.ac.il, avivs4@afeka.ac.il, oferamir@post.tau.ac.il, erana@afeka.ac.il,
zehava@afeka.ac.il
Abstract

Stuttering is a prevalent speech disorder. It is characterized by various speech disfluencies, such as word repetitions, syllable repetitions, prolongation of sounds, and blocking or hesitation before word completion. Although, among adults, stuttering is not regarded as a disorder that can be cured by therapy, adults who stutter can learn various techniques to control and improve their speech fluency. To that end, they are commonly required to either slow or modify their speaking rate, and to become aware of it, as they produce speech spontaneously. A common method for quantifying a client's speaking rate is to count the number of syllables or phonemes produced within a fixed time during practice.
The goal of the present study was to develop an algorithm, and a smartphone application based on it, that monitors the rate of a patient's speech in real time and presents the results to the patient both graphically and numerically. The recordings and their analysis are stored in the cloud. These data are uploaded to the therapist's website, which allows the therapist to send patients real-time feedback and instructions. Thus, in addition to the patient, the therapist can also track training results through a dedicated website.
The solution is based on a digital speech processing algorithm that receives the speech signal, extracts relevant features, and calculates the user's speech rate by recognizing the number of phonemes in each time segment. The analyzed data are presented graphically and numerically on the screen. The DSP algorithm was written in Matlab and ported to a Java implementation on an Android-based smartphone.
Our algorithm is language-agnostic; for this research we analyzed spontaneous Hebrew speech in real time.
The developed algorithm divides the user's speech signal into small overlapping speech frames. The algorithm applies a Hamming window to each frame and extracts Mel-Frequency Cepstral Coefficients (MFCC) from each one. Subsequently, a spectral transition measure (STM) is calculated for each frame. The STM step outputs candidates that represent possible transitions between two adjacent phonemes. The algorithm detects and removes false boundaries and, from the remaining boundaries, calculates the average number of phonemes spoken per minute.
Preliminary results show that phoneme-counting accuracy reaches an average of 90% (compared with human phoneme counting). Preliminary analysis shows that the main source of deviation is background noise.
In summary, our novel application was found to be reliable and has clinical potential to improve speech therapy. The algorithm provides an efficient tool that gives the patient real-time feedback during his exercises and allows more effective monitoring of speech fluency.
1. Introduction
Stuttering is a prevalent speech disorder. It is
characterized by various speech disfluencies, such
as word repetitions, syllable repetitions,
prolongation of sounds and blocking or hesitation
before word completion.
Although, among adults, stuttering is not regarded as a disorder that can be cured by therapy, adults who stutter can learn various techniques to control and improve their speech fluency. To that end, they are commonly required to either slow or modify their speaking rate, and to become aware of it, as they produce speech spontaneously. Today, the accepted clinical practice for quantifying a client's speech rate is to manually count the number of phonemes produced within a fixed time during training.
Speech therapists can use the following stuttering therapy techniques:

Auditory feedback devices:
Auditory feedback devices provide feedback such as distorted speech, delayed speech, or pulse tones. An example of such devices is electronic fluency devices, which change the sound of the user's voice in his or her ear. The disadvantage of these devices is that they are available to the client only during the therapy session.
Computer-based therapy:
Computer-based therapy programs display speech spectrograms, waveforms, pitch patterns, and other graphical representations of an individual's speech. They are available only in large clinical facilities, and the training process can take many months of repeated procedures that are costly.

Using these techniques, the patient can evaluate his speech rate only during speech therapy sessions. However, between the meetings he cannot be sure whether his exercises are improving his speech rate.
The goal of the present study was to develop a smartphone application that monitors the rate of a patient's speech in real time and presents the results both graphically and numerically. Moreover, in addition to the patient, the therapist can also track training results through a dedicated cloud service.
2. Methodology
2.1 Speech Rate Algorithms and Applications
Existing algorithms and applications that evaluate speech rate can be divided into the following groups:

Algorithms that detect phoneme boundaries in order to evaluate speech rate:
An example of such an algorithm is the Dusan and Rabiner algorithm [1]. This algorithm was not implemented on mobile devices, does not work in real time, and was not tested on Hebrew speakers.
Algorithms that measure speech rate by counting vowels per unit of time:
An example of such an algorithm is that of Pfau and Ruske [2]. This algorithm does not work in real time and was not implemented on mobile devices.
Evaluation of speech rate by counting the number of syllables per time segment:
An example of such software is AgentAssist (Real Time Speaking Rate Monitoring System). This software was designed for call centers interested in learning whether their representatives speak too fast [3]. It analyses the voice signal in real time, but it has not yet been implemented on mobile devices.
2.2 DSP algorithm
The DSP algorithm (Figure 1) is based on the Dusan and Rabiner algorithm [1] for detection of phoneme boundaries; however, we have modified the stage that detects and removes false boundaries.

Figure 1: Block Diagram of the DSP algorithm

The DSP algorithm was written in Matlab. Its sampling rate is 16 kHz. The algorithm divides the user's speech signal into time segments of 10 seconds with a 1-second overlap (chunks). It uses a process-chunk function to calculate the number of phonemes in every chunk.
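As a rough illustration of this chunking step (a sketch under our own assumptions, not the authors' Matlab or Java code; the class name Chunker and the method toChunks are hypothetical), the following Java snippet splits a 16 kHz signal into 10-second chunks advanced by 9 seconds, so that consecutive chunks overlap by 1 second:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: chunking a 16 kHz signal into 10-second segments with a 1-second overlap.
public final class Chunker {

    static final int SAMPLE_RATE = 16_000;          // 16 kHz, as stated in Section 2.2
    static final int CHUNK_LEN = 10 * SAMPLE_RATE;  // 10-second chunk
    static final int CHUNK_STEP = 9 * SAMPLE_RATE;  // advance by 9 s => 1 s overlap

    /** Returns the chunks of the signal; the (possibly shorter) tail becomes the last chunk. */
    static List<double[]> toChunks(double[] signal) {
        List<double[]> chunks = new ArrayList<>();
        for (int start = 0; start < signal.length; start += CHUNK_STEP) {
            int end = Math.min(start + CHUNK_LEN, signal.length);
            chunks.add(Arrays.copyOfRange(signal, start, end));
            if (end == signal.length) {
                break;  // last chunk; its special handling is described later in this section
            }
        }
        return chunks;
    }
}
```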
The voice signal in each chunk is divided into short speech frames (32 ms) with overlap (22 ms). The algorithm applies a Hamming window to each frame and extracts 10 Mel-Frequency Cepstral Coefficients (MFCC) from each one. Subsequently, a spectral transition measure (STM) is calculated for each frame. The STM outputs candidates that represent possible transitions between two adjacent phonemes.
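The framing and transition-measure steps might be sketched as follows. This is not the authors' implementation: the 10 MFCCs per frame are assumed to come from an external DSP library, and the STM shown here is a simplified stand-in (mean squared frame-to-frame MFCC change) for the regression-based measure of Dusan and Rabiner [1]; all names are illustrative.

```java
// Sketch only: 32 ms frames with a 10 ms hop (22 ms overlap), Hamming windowing,
// and a simplified spectral transition measure computed from precomputed MFCC vectors.
public final class FrameAnalysis {

    static final int SAMPLE_RATE = 16_000;
    static final int FRAME_LEN = (32 * SAMPLE_RATE) / 1000;  // 32 ms -> 512 samples
    static final int FRAME_HOP = (10 * SAMPLE_RATE) / 1000;  // 22 ms overlap -> 10 ms hop

    /** Splits one chunk into overlapping frames and applies a Hamming window to each. */
    static double[][] windowedFrames(double[] chunk) {
        int n = chunk.length < FRAME_LEN ? 0 : (chunk.length - FRAME_LEN) / FRAME_HOP + 1;
        double[][] frames = new double[n][FRAME_LEN];
        for (int f = 0; f < n; f++) {
            for (int i = 0; i < FRAME_LEN; i++) {
                double w = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (FRAME_LEN - 1)); // Hamming
                frames[f][i] = chunk[f * FRAME_HOP + i] * w;
            }
        }
        return frames;
    }

    /**
     * Simplified STM: for each frame, the mean squared change of its MFCC vector
     * relative to the previous frame. Peaks suggest candidate phoneme boundaries.
     * The 10 MFCCs per frame are assumed to be computed by an external DSP library.
     */
    static double[] spectralTransitionMeasure(double[][] mfccPerFrame) {
        double[] stm = new double[mfccPerFrame.length];
        for (int f = 1; f < mfccPerFrame.length; f++) {
            double sum = 0.0;
            for (int c = 0; c < mfccPerFrame[f].length; c++) {
                double d = mfccPerFrame[f][c] - mfccPerFrame[f - 1][c];
                sum += d * d;
            }
            stm[f] = sum / mfccPerFrame[f].length;
        }
        return stm;
    }
}
```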
In order to detect false boundaries, we defined a threshold for the STM: if the current STM is smaller than 0.25, it is recognized by the algorithm as a false boundary and is removed; if the STM is higher than 0.25, it is considered a phoneme boundary. We calculate the number of phonemes in each chunk by the following equation: number of phonemes = number of boundaries − 1.
We consider the last chunk as a special case: if the number of samples in the last chunk is smaller than 16000, we ignore it; if it is higher than 16000 samples, we process the last chunk and calculate the number of phonemes in it.
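A minimal sketch of this boundary-selection and counting logic, again as an illustration under our assumptions rather than the authors' code, could look like this:

```java
// Sketch only: 0.25 STM threshold, phoneme count per chunk, and the last-chunk rule.
public final class PhonemeCounter {

    static final double STM_THRESHOLD = 0.25;
    static final int MIN_LAST_CHUNK_SAMPLES = 16_000;  // one second at 16 kHz

    /** Counts phonemes in one chunk: frames whose STM exceeds 0.25 are kept as boundaries. */
    static int phonemesInChunk(double[] stm) {
        int boundaries = 0;
        for (double value : stm) {
            if (value > STM_THRESHOLD) {
                boundaries++;   // kept as a phoneme boundary
            }                   // otherwise treated as a false boundary and discarded
        }
        return Math.max(0, boundaries - 1);  // number of phonemes = number of boundaries - 1
    }

    /** Last-chunk rule: a final chunk with fewer than 16000 samples is ignored. */
    static boolean shouldProcess(double[] lastChunk) {
        return lastChunk.length >= MIN_LAST_CHUNK_SAMPLES;
    }
}
```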
2.3 The Protocol
In order to test the accuracy of the algorithm, we recorded 18 randomly selected subjects (men and women), all Hebrew speakers without speech disorders, aged 20 to 50. The subjects were asked to read a Hebrew text from the book "Habait shel Yael". We compared the number of phonemes detected by the application to a manual count of phonemes.
2.4 Smartphone and Tablet applications
We have developed a complete system consisting of a smartphone application (Android) for the patient and a tablet application (Android) for the speech therapist.
2.4.1 Smartphone (Android) Application
The smartphone (Android) application monitors the rate of the patient's speech in real time and presents the results to the patient both graphically and numerically (Figure 2). The recordings and their analysis are stored in the cloud. The data are uploaded to the therapist's website (or tablet), which allows the therapist to send patients real-time feedback and instructions. The application updates the speech-rate graph every 2 seconds, based on the arithmetic mean.
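A minimal sketch of this update logic is given below, assuming (our interpretation) that the value plotted every 2 seconds is the arithmetic mean of the per-chunk rates reported so far by the DSP stage; the class and method names, and the timer that would call currentMeanRate(), are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the value plotted every 2 seconds is the arithmetic mean of the
// per-chunk speech rates reported so far by the DSP stage.
public final class SpeechRateGraphUpdater {

    private final List<Double> ratesPerMinute = new ArrayList<>();

    /** Called whenever the DSP stage reports the phoneme rate of a finished chunk. */
    public synchronized void onChunkRate(double phonemesPerMinute) {
        ratesPerMinute.add(phonemesPerMinute);
    }

    /** Called on a 2-second timer; the returned mean is what the graph displays. */
    public synchronized double currentMeanRate() {
        if (ratesPerMinute.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double rate : ratesPerMinute) {
            sum += rate;
        }
        return sum / ratesPerMinute.size();
    }
}
```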
Figure 2: Online analysis on smartphone
3. Preliminary Results
Figure 3 presents the accuracy of the algorithm as a histogram of phoneme detection error (%) against the number of users:
Figure 3: Algorithm Accuracy Histogram
If the algorithm detects fewer boundaries than the manual count, the error is negative; if the algorithm detects more boundaries than the manual count, the error is positive (i.e., the per-subject error is, presumably, the signed difference between the detected and manual counts, expressed as a percentage of the manual count).
The preliminary results show that the accuracy of the algorithm is 92.74% (compared with the manual count). The following parameters influenced the algorithm's accuracy: environmental noise, the proximity of the microphone to the speaker, an excessively fast speech rate, the gender of the speaker, etc.
4. Discussion
Currently, patients can evaluate their speech rate only during speech therapy sessions. Therefore, there is a need for a mobile application that monitors the speaker's rate of speech in real time in order to improve speech fluency.
market, there isn’t yet any mobile application
which monitors speech rate in real time based on
counting the number of phonemes in time
interval, like our application. With further tests
and improvements, this application has the
potential to improve speech therapy process.
5. Summary
This article describes a smartphone (Android) application that monitors the speech rate of a patient and presents the results, in real time, both graphically and numerically. Our algorithm and application have the potential to improve speech therapy by providing the patient with real-time feedback and instructions on his exercises from his speech therapist, thus allowing him to learn to control his speech fluency. This application may be effective in various cases where there is a need to modify the rate of speech, such as slowing down, speeding up, or maintaining a uniform rate. Furthermore, the application may also be useful for practicing speech volume control.
Recommendations for further work
Our application has so far been tested only on subjects without speech disorders. In order to confirm that the application suits the needs of people who stutter, it is highly recommended to test it with such patients.
It is recommended to test the accuracy of the application at different rates of speech: slow, normal, and fast.
The application is currently designed for adults. With some modifications, it could also be used in children's therapy.
References
1. Dusan, Sorin, and Lawrence R. Rabiner. "On the relation between maximum spectral transition positions and phone boundaries." INTERSPEECH, 2006.
2. Pfau, Thilo, and Günther Ruske. "Estimating the speaking rate by vowel detection." Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, IEEE, 1998.
3. Pandharipande, Meghna Abhishek, and Sunil Kumar Kopparapu. "Real time speaking rate monitoring system." 2011 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), IEEE, 2011.
4. Maguire, Gerald A., Christopher Y. Yeh, and Brandon S. Ito. "Overview of the diagnosis and treatment of stuttering." Journal of Experimental & Clinical Medicine 4.2 (2012): 92-97.
5. Shrawankar, Urmila, and V. M. Thakare. "Techniques for Feature Extraction in Speech Recognition System: A Comparative Study." International Journal of Computer Applications in Engineering, Technology and Sciences (IJCAETS), ISSN 0974-3596, 2010, pp. 412-418.
6. Kalinli, Ozlem. "Automatic Phoneme Segmentation Using Auditory Attention Features." INTERSPEECH, 2012.
7. Amir, Ofer, and Reut Levine-Yundof. "Listeners' attitude toward people with dysphonia." Journal of Voice 27.4 (2013): 524.e1.