Audio Based Speech Recognition Using KNN
Classification Method
Instructor: Dr. Umasankar Kandaswamy
Santosh Kumar Chikoti
schikoti@ltu.edu
Hansong Xu
Hxu1@ltu.edu
ABSTRACT
Road traffic crashes are the ninth leading cause of death worldwide, as reported by the Association for
Safe International Road Travel [1]. Over 37 thousand people die in car accidents in the US each year.
The third-ranked cause of car accidents is distracted driving other than phone use, such as dialing,
adjusting the temperature, or reaching for objects [2]; the top two causes, speeding and mobile-phone
use, can be noticed and reduced by following the rules. In this work, we introduce an Automatic Speech
Recognition (ASR) system for car drivers that allows frequently used functions to be adjusted or
controlled through the driver's voice, such as adjusting the temperature, controlling the windows,
opening GPS, and dialing, but is not limited to these functions. Applying an ASR system while driving
can greatly reduce distracted actions and minimize the chance of car accidents.
INTRODUCTION
To maximize the applicability of ASR, we split our design into two parts: driver identification and
command classification. More specifically, in the first part we use recorded voice from the authorized
driver as training data and a new voice recording as test data to recognize the driver. Only if the new
recording is classified into the authorized driver's group can the command section be activated. For the
second part, we recorded a total of 8 speech commands for training: 'Air conditioner', 'window up',
'window down', 'engine off', 'start engine', 'make phone call', 'GPS on', and 'play music'. After
driver recognition succeeds in the first step, the authorized driver may issue speech commands to
control functions while driving.
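The two-stage flow above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function and variable names are hypothetical, a plain 1-NN rule stands in for the full classifier, and for simplicity a single feature vector is reused for both stages (the paper uses LPC features for identification and MFCC features for commands).

```python
import numpy as np

def nearest_label(train_X, train_y, x):
    # Label of the Euclidean-nearest training sample (1-NN).
    return train_y[int(np.argmin(np.linalg.norm(train_X - x, axis=1)))]

def process_utterance(driver_X, driver_y, cmd_X, cmd_y, feat, authorized):
    # Stage 1: identify the speaker from the driver training set.
    speaker = nearest_label(driver_X, driver_y, feat)
    if speaker not in authorized:
        return None  # unauthorized voice: command stage is not activated
    # Stage 2: classify the spoken command.
    return nearest_label(cmd_X, cmd_y, feat)
```

Keeping identification as a gate in front of command classification means an unrecognized voice never reaches the command stage at all.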
Paper [3] introduced a voice-based robot control system using the Linear Predictive Coefficient (LPC)
feature, since LPC performs well on recognition of isolated words; for robot commands such as 'go
origin', 'up', 'down', and 'turn', no more complex algorithm was needed. Paper [4] surveys a total of
8 voice features that can be used for voice information extraction, including LPC, Linear Predictive
Cepstral Coefficients (LPCC), Perceptual Linear Predictive coefficients (PLP), and Mel-Frequency
Cepstral Coefficients (MFCC). As for classifiers, K-Nearest Neighbor (KNN) was used in paper [5] for
Parkinson's disease detection; the KNN was trained with voices from both healthy people and Parkinson's
patients and achieved a 94.8% accuracy rate with 7 optimized features and 98.2% with 9 features, which
is considered strong performance for KNN.
Our system structure is described in the methodology section, followed by details of the LPC and MFCC
features. The simulation results are explained in the experimental result section, followed by the
conclusion.
METHODOLOGY
1. Linear Predictive Coefficients (LPC)
LPC offers a range of parameters for performance analysis and evaluation, including bit rate, overall
system delay, computational complexity, and objective performance evaluation. LPC finds the coefficients
of a 15th-order linear predictor that predicts the current value of the collected audio sample from past
samples. Linear predictive analysis is used to compress the signal for transmission and storage, and LPC
is widely used in medium- and low-bit-rate coders. When the voice or speech signal is passed through the
prediction filter, the residual error is generated as output.
In the pre-processing step, zero values are removed, as shown in the left part of Figure 1. The 13 LPC
feature values extracted from a name-speech recording are shown in the right part of Figure 1.
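As an illustration of how linear prediction coefficients can be obtained, the following sketch solves the autocorrelation (Yule-Walker) normal equations with NumPy. It is a minimal example under assumed defaults, not the MATLAB routine used in the project:

```python
import numpy as np

def lpc_coefficients(x, order=10):
    """Estimate linear prediction coefficients from the autocorrelation
    (Yule-Walker) equations: x[n] is predicted as sum_k a[k] * x[n-1-k]."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation for lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz system R a = r[1:], with R[i][j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])
```

Running this on a synthetic autoregressive signal recovers the generating coefficients to within estimation noise, which is a quick way to sanity-check an LPC implementation.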
Figure 1 Zero value removal and LPC feature extraction
2. MFCC
The use of Mel-Frequency Cepstral Coefficients can be considered one of the standard methods for
feature extraction. MFCCs are coefficients that collectively make up the Mel-Frequency Cepstrum (MFC).
They are derived from a cepstral representation of the audio clip in which the frequency bands are
equally spaced on the Mel scale; that is, the spectrum is warped according to the Mel scale. This is
very similar to perceptual linear predictive analysis of speech, where the short-term spectrum is
modified by a spectral transformation. Mel-scale cepstral analysis uses cepstral smoothing to smooth
the modified power spectrum, which is done by transforming the log power spectrum directly to the
cepstral domain using an inverse Discrete Fourier Transform (DFT).
Figure 2 The steps for extracting MFCC features:
 FFT: Fourier transform of the collected audio x
 Mel scale: Mel-frequency warping to obtain the mel spectrum
 DCT: discrete cosine transform matrix computation
 MFCC: 13 rows of cepstral coefficients
Figure 3 13 rows of MFCC feature data from command speech
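The FFT → Mel warping → DCT chain can be sketched for a single frame as follows. The Hamming window and the filter count of 26 are common defaults assumed here for illustration, not values taken from the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13):
    """MFCC for one frame: windowed FFT power spectrum -> triangular
    mel filterbank -> log energies -> DCT-II, keeping n_ceps coefficients."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Filter center points equally spaced on the mel scale from 0 to fs/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    log_e = np.log(fbank @ spec + 1e-10)
    # DCT-II matrix applied to the log filterbank energies.
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * n_filters))
    return dct @ log_e
```

Applying this to each 2-second recording frame by frame yields the 13-row feature matrix described above.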
3. KNN
K-Nearest Neighbors is a simple algorithm that stores all available cases and classifies new cases
based on a similarity measure (e.g., Euclidean distance):
 Set the training features and classes.
 Compute the features for the new data.
 Compare the test data with the training data sets.
 Assign the label of the minimum-distance training sample to the new feature.
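The steps above amount to a few lines of code. This sketch classifies a new feature vector by majority vote among its k Euclidean-nearest training samples; with k = 1 it reduces to labeling by the single minimum-distance sample:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    # Euclidean distance from x to every training sample.
    dists = np.linalg.norm(train_X - x, axis=1)
    # Indices of the k nearest training samples.
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels.
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```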
Figure 4 general structure of our proposed method
EXPERIMENTAL RESULT
Our simulation proceeds in five steps:
1. Voice input is given to the system by recording with the MATLAB recorder.
2. Features are extracted from the given input voice to build the data set.
3. With two speaker data sets, 'Jason' and 'Santosh', features are extracted separately for each
speaker's data set.
4. Using KNN, the nearest neighbors are selected and matched against the given input, with K in [1, 10].
5. The class with the most matching nearest-neighbor points gives the result.
In the simulation, we each recorded our own name 20 times, 2 seconds per recording, at a sampling rate
of Fs = 8000 samples per second; the training data set is therefore 2 × 8000 samples per frame over 20
frames per speaker. For the LPC feature, 10 feature values were extracted per frame, giving a 10 × 20
feature matrix for each of us. For command recognition, we likewise each recorded the 8 command phrases
20 times, 2 seconds each. For the command speech, MFCC features were calculated for each command data
frame; in total, 13 rows of MFCC feature data were extracted from each command's speech.
Then, for testing, we computed the LPC and MFCC features from newly recorded voice, and the K-Nearest
Neighbor classifier compared the Euclidean distances of the input features against the training sets.
Driver-identification accuracy is shown in Figure 6 (10 test runs per speaker); per-command accuracy
is shown in Figure 7 (30 test runs per command).
Person identification (20 tests)   Jason (10)   San (10)
Correctly identified               8            9
Performance                        80%          90%
Figure 6 driver identification performance
Command (30 tests each)   Correctly identified   Performance
Air conditioner           29                     96.6%
Window up                 18                     60%
Start engine              15                     50%
Engine off                29                     96.6%
Window down               22                     73.3%
Make a phone call         29                     96.6%
GPS on                    15                     50%
Play music                28                     93.3%
Figure 7 commands classification accuracy
CONCLUSION
In this project, we built a speech-based recognition system that can be used to control in-vehicle
functions. The method comprises driver identification and speech-based command classification. In this
way, the security of the vehicle itself is increased and the potential risk of car accidents is reduced
as well. We trained our KNN classifier with recorded command speech and name speech, then tested the
method with new recordings of both commands and names. The classification accuracy of our system reaches
up to 90% for driver identification and 96.6% for command classification, respectively. Future work will
address word (letter) recognition and individual voice (person) recognition.
REFERENCES:
[1] ASIRT, "Annual Global Road Crash Statistics," http://asirt.org/Initiatives/Informing-Road-Users/Road-Safety-Facts/Road-Crash-Statistics
[2] "Top 10 Causes of Car Accidents," Car Accident Attorney, December 18, 2013,
http://www.losangelespersonalinjurylawyers.co/top-10-causes-of-car-accidents/
[3] Luo Zhizeng and Zhao Jinghing, "Speech Recognition and Its Application in Voice-based Robot Control System," Proc. 2004
International Conference on Intelligent Mechatronics and Automation.
[4] U. Shrawankar and V. Thakare, "Techniques for Feature Extraction in Speech Recognition System: A Comparative Study," arXiv
e-prints, May 2013.
[5] R. Arefi Shirvan and E. Tahami, "Voice Analysis for Detecting Parkinson's Disease Using Genetic Algorithm and KNN
Classification Method," Proc. 18th Int. Conf. on Biomedical Engineering, Tehran, pp. 550-555, 2011.
[6] Tsang-Long Pao, Wen-Yuan Liao, and Yu-Te Chen, "Audio-Visual Speech Recognition with Weighted KNN-based Classification in
Mandarin Database," Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP
2007), Vol. 1, Nov. 26-28, 2007, pp. 39-42.