International Conference on Advance Research in Computer Science, Electrical and Electronics Engineering 
Sep 7, 2013 Pattaya 
SPEAKER AND SPEECH RECOGNITION FOR SECURED SMART HOME APPLICATION 
R. Gomes1, S. Shaji2, L. Nadar2, V. Vincent2 
Dept. of Electronics and Telecommunication 
Xavier Institute of Engineering, University of Mumbai 
Mahim (W), Mumbai-400016, Maharashtra, India 
1write2roger.gomes@gmail.com 
2be.speaker.recognition@gmail.com 
S. Patnaik 
Dept. of Electronics and Telecommunication 
Xavier Institute of Engineering, University of Mumbai 
Mahim (W), Mumbai-400016, Maharashtra, India 
suprava.patnaik@xavierengg.com 
ABSTRACT 
The concept of a smart home refers to the idea of intelligent devices surrounding us and responding to our various needs as the situation arises, e.g. switching lights and fans on/off when an individual enters or leaves a room, or automatically adjusting the temperature of a room depending on the ambient temperature. In the context of a smart home, an individual's interaction with all the electrical appliances is crucial, giving him complete control and freedom to operate all the devices at home. However, with this control a question of security arises: an individual would want access to all the devices in his home restricted to his family members and friends. To address this simultaneous demand for security (e.g. operation by family members only) and automation (remote operation of multiple devices), in this paper we present a concept of speaker recognition for security and speech recognition for home appliance automation. The goal is the design and implementation of text-independent speaker recognition based on Mel-Frequency Cepstrum Coefficients (MFCCs) and the Vector Quantization (VQ) algorithm for security, integrated with speaker-independent speech recognition using the Dynamic Time Warping (DTW) algorithm for home appliance automation. 
KEYWORDS: Automation, Security, Speaker Recognition, Speech Recognition, Mel Frequency Cepstrum Coefficients (MFCCs), Vector Quantization (VQ), Dynamic Time Warping (DTW) 
I. INTRODUCTION 
The human speech signal contains many discriminative features. These features are unique to every individual and serve as a biometric parameter which can be used by robust voice-based biometric systems to correctly verify an individual's identity [1]. Unlike other biometric parameters such as fingerprint and iris, voice-based biometrics offers the advantage of remotely accessing systems through the telephone network, which makes it quite valuable in real-time applications of authentication and authorization over large distances [2]. Speaker recognition is the process of automatically recognizing who is speaking on the basis of information obtained from his speech, making it possible to verify the identity of a person accessing the system [2]. In the context of automation in a smart home, only an authorized user must be given access to control the devices and appliances at home; for authenticating a user we use text-independent speaker recognition. Once access to the system has been granted to the authenticated user, all the appliances and devices connected to the system must be under his control. To accomplish this task we use isolated word speech recognition, correctly identifying the uttered words by matching them with the reference templates stored in the database. 
The proposed system in this paper involves three phases. The first phase is the speaker recognition phase to authenticate the user, the second phase is the speech recognition phase to identify the word spoken by the user for the purpose of automation and the third phase is the device control phase which involves serially communicating the results of identification to PIC16F676 to toggle the status of the devices connected to it. 
II. SPEAKER RECOGNITION 
Speaker recognition is the method of automatically identifying who is speaking on the basis of individual information embedded in speech waves [2]. The process of speaker recognition involves two phases, training and testing, both of which involve extracting feature vectors and matching them. Feature extraction is performed using the MFCC algorithm, and feature matching using VQ with codebook optimization by the Linde, Buzo and Gray (LBG) algorithm. 
Fig. 1 Block Diagram of MFCC Processor [3] 
A. Mel-frequency Cepstrum Coefficients 
The Mel-Frequency Cepstrum (MFC) is a representation of the short-term power spectrum of a sound. The MFCCs are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum") [3]. The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal cepstrum [1]. 
1) Frame Blocking: Over a long interval of time the speech signal is not stationary; however, over a sufficiently short interval, say 10-30 ms, it can be considered stationary. In frame blocking, the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). The first frame consists of the first N samples. The second frame begins M samples after the first frame, and overlaps it by N - M samples [3]. Similarly, the third frame begins 2M samples after the first frame (or M samples after the second frame) and overlaps it by N - 2M samples. Typical values for N and M are N = 256 (equivalent to ~30 ms of windowing and facilitating the fast radix-2 FFT) and M = 100 [1, 3]. 
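As a minimal sketch of this blocking scheme (ours, not the paper's Matlab scripts) using the typical values N = 256 and M = 100:

```python
import numpy as np

def frame_blocking(signal, N=256, M=100):
    """Split a 1-D speech signal into overlapping frames of N samples,
    with consecutive frames starting M samples apart (overlap = N - M)."""
    num_frames = 1 + (len(signal) - N) // M   # keep only full frames
    frames = np.empty((num_frames, N))
    for i in range(num_frames):
        frames[i] = signal[i * M : i * M + N]
    return frames

# 1 second of audio at 8 kHz -> 78 frames of 256 samples each
x = np.random.randn(8000)
frames = frame_blocking(x)
print(frames.shape)  # (78, 256)
```

With these values each frame overlaps the next by N - M = 156 samples, exactly as described above.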
2) Windowing: To minimize the signal discontinuities at the beginning and end of each frame, windowing is used: it tapers the signal to zero at both ends of each frame, minimizing spectral distortion. In other words, the Fourier Transform assumes that the signal repeats, and the end of one frame does not connect smoothly with the beginning of the next one. In this process, we multiply the given signal (frame, in this case) by a so-called window function [3, 11]. There are many 'soft windows' which can be used, but in our system the Hamming window has been used, which has the form 
w(n) = 0.54 - 0.46 cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1 (1) 
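As a quick illustrative sketch (ours, not the authors' code), the Hamming taper can be generated and applied with NumPy; N = 256 matches the frame length used above:

```python
import numpy as np

N = 256                          # frame length from the blocking step
n = np.arange(N)
# Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

frame = np.random.randn(N)       # a stand-in speech frame
windowed = frame * w             # taper toward zero at both ends

print(round(float(w[0]), 2), round(float(w[N // 2]), 2))  # 0.08 at the edge, ~1.0 mid-frame
```

NumPy's built-in `np.hamming(N)` produces the same coefficients.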
3) Fast Fourier Transform (FFT): The next processing step is the Fast Fourier Transform, which converts each frame of N samples from the time domain into the frequency domain [3]. The FFT is a fast algorithm implementing the Discrete Fourier Transform (DFT), which is defined on the set of N samples xn as 
Xk = Σ (n = 0 to N-1) xn e^(-j2πkn/N), k = 0, 1, 2, …, N - 1 (2) 
The result after this step is often referred to as the spectrum or periodogram [3, 5]. 
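To make the step concrete, here is a small example (the sampling rate and test tone are our assumptions, not values from the paper) computing a one-sided periodogram of a frame with NumPy's radix-2 FFT:

```python
import numpy as np

N = 256
fs = 8000                                   # assumed sampling rate
t = np.arange(N) / fs
frame = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone

spectrum = np.fft.fft(frame, N)             # DFT of the N-sample frame
power = np.abs(spectrum[:N // 2]) ** 2      # one-sided periodogram

peak_bin = int(np.argmax(power))
print(peak_bin * fs / N)                    # 1000.0 -- the tone's frequency
```

Each bin k corresponds to frequency k·fs/N, so the 1 kHz tone lands exactly in bin 32 here.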
4) Mel-frequency wrapping: Psychophysical studies have shown that human perception of the frequency content of speech sounds does not follow a linear scale. Thus for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the 'mel' scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels [1, 3]. Therefore we can use the following approximate formula to compute the mels for a given frequency f in Hz: 
mel(f) = 2595 log10(1 + f / 700) (3) 
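A sketch of this mapping (function names are ours; the inverse is included because it is what places the mel filterbank centre frequencies back on the Hz axis):

```python
import math

def hz_to_mel(f):
    """Approximate mel scale: mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used to position filterbank centre frequencies."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(round(hz_to_mel(1000)))              # 1000 -- matches the reference point
print(round(mel_to_hz(hz_to_mel(4000))))   # 4000 (round trip)
```

Note that hz_to_mel(1000) ≈ 1000 mels, consistent with the 1 kHz reference point defined above.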
5) Cepstrum: In this final step, we convert the log mel spectrum back to the time domain. The result is called the mel-frequency cepstrum coefficients (MFCCs). The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis. Because the mel spectrum coefficients (and so their logarithm) are real numbers, we can convert them to the time domain using the Discrete Cosine Transform (DCT). Therefore, if we denote the mel power spectrum coefficients resulting from the last step by 

S̃k, k = 1, 2, …, K (4) 

we calculate the MFCCs as 

c̃n = Σ (k = 1 to K) (log S̃k) cos[ n (k - 1/2) π / K ], n = 1, 2, …, K (5) 
By applying the procedure described above, for each speech frame of around 30 ms (with overlap) a set of mel-frequency cepstrum coefficients is computed [3, 4]. These are the result of a cosine transform of the logarithm of the short-term power spectrum expressed on a mel-frequency scale. This set of coefficients is called an acoustic vector; each input utterance is therefore transformed into a sequence of acoustic vectors. 
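As an illustrative sketch of the final DCT step (ours, not the authors' Matlab code), computing the cepstral coefficients from K log mel filterbank outputs:

```python
import numpy as np

def mfcc_from_log_mel(log_mel):
    """DCT of the K log mel power spectrum coefficients:
    c_n = sum_{k=1..K} log_mel[k] * cos(n * (k - 1/2) * pi / K)."""
    K = len(log_mel)
    n = np.arange(1, K + 1)[:, None]   # output index n = 1..K
    k = np.arange(1, K + 1)[None, :]   # filterbank index k = 1..K
    return (log_mel * np.cos(n * (k - 0.5) * np.pi / K)).sum(axis=1)

log_mel = np.log(np.random.rand(20) + 1.0)  # e.g. K = 20 mel filter outputs
c = mfcc_from_log_mel(log_mel)
print(c.shape)  # (20,)
```

In practice only the first dozen or so coefficients of each acoustic vector are typically retained.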
B. Feature matching using VQ 
The state-of-the-art in feature matching techniques used in speaker recognition includes DTW, Hidden Markov Modelling (HMM), and VQ. In this paper, the VQ approach is used, due to its ease of implementation and high accuracy [2]. Vector Quantization is the classical quantization technique from signal processing which allows the modelling of probability density functions by the distribution of prototype vectors. It works by dividing a large set of points into groups having approximately the same number of points closest to them. Each group is represented by its centroid point. The density matching property of vector quantization is powerful, especially for identifying the density of large, high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error [1].
A vector quantizer maps k-dimensional vectors in the vector space Rk into a finite set of vectors Y = {yi : i = 1, 2, …, N}. Each vector yi is called a code vector or a codeword, and the set of all the codewords is called a codebook. Associated with each codeword yi is a nearest-neighbour region called its Voronoi region, defined by 

Vi = { x ∈ Rk : ‖x - yi‖ ≤ ‖x - yj‖ for all j ≠ i } (6) 
Given an input vector, the codeword that is chosen to represent it is the one in the same Voronoi region. 
Fig. 2 Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with circles, and the Voronoi regions are separated with boundary lines [1] 
The representative codeword is determined to be the closest in Euclidean distance to the input vector. The Euclidean distance is defined by 

d(x, yi) = √( Σ (j = 1 to k) (xj - yij)² ) (7) 

where xj is the jth component of the input vector, and yij is the jth component of the codeword yi [1]. 
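A minimal sketch of this nearest-codeword lookup (toy 2-D data and names are ours):

```python
import numpy as np

def nearest_codeword(x, codebook):
    """Return the index of the codeword closest to x in Euclidean
    distance, i.e. the Voronoi region that x falls in."""
    dists = np.sqrt(np.sum((codebook - x) ** 2, axis=1))
    return int(np.argmin(dists))

# toy 2-D codebook with N = 3 codewords
Y = np.array([[0.0, 0.0],
              [1.0, 1.0],
              [2.0, 0.0]])
print(nearest_codeword(np.array([0.9, 1.2]), Y))  # 1
```

During recognition, each acoustic vector of the test utterance is quantized this way against every enrolled speaker's codebook, and the speaker with the lowest total distortion is selected.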
C. Clustering of Training Vectors using LBG algorithm 
After the enrolment session, the acoustic vectors extracted from input speech of a speaker provide a set of training vectors. As described above, the next important step is to build a speaker-specific VQ codebook for this speaker using those training vectors. There is a well-known algorithm, namely LBG algorithm [Linde, Buzo and Gray, 1980], for clustering a set of L training vectors into a set of M codebook vectors [3]. The algorithm is formally implemented by the following recursive procedure: 
1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors (hence, no iteration is required here). 
2. Double the size of the codebook by splitting each current codeword yn according to the rule 

yn+ = yn(1 + ε) 

yn- = yn(1 - ε) 

where n varies from 1 to the current size of the codebook, and ε is a splitting parameter (we choose ε = 0.01). 
3. Nearest-Neighbor Search: for each training vector, find the codeword in the current codebook that is closest (in terms of similarity measurement), and assign that vector to the corresponding cell (associated with the closest codeword). 
4. Centroid Update: update the codeword in each cell using the centroid of the training vectors assigned to 
that cell. 
5. Iteration 1: repeat steps 3 and 4 until the average distance falls below a preset threshold. 
6. Iteration 2: repeat steps 2, 3 and 4 until a codebook size of M is designed [3]. 
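The six steps above can be sketched as follows (a compact illustration under our own naming and convergence threshold `tol`, not the authors' Matlab implementation):

```python
import numpy as np

def lbg(train, M, eps=0.01, tol=1e-4):
    """LBG codebook design: start from the global centroid, split every
    codeword by (1 +/- eps), then alternate nearest-neighbour assignment
    and centroid update until convergence, repeating the split until the
    codebook holds M codewords."""
    codebook = train.mean(axis=0, keepdims=True)            # step 1
    while codebook.shape[0] < M:
        codebook = np.vstack([codebook * (1 + eps),         # step 2: split
                              codebook * (1 - eps)])
        prev_dist = np.inf
        while True:
            # step 3: nearest-neighbour search
            d = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # step 4: centroid update (empty cells keep their codeword)
            for i in range(codebook.shape[0]):
                members = train[labels == i]
                if members.size:
                    codebook[i] = members.mean(axis=0)
            avg_dist = d.min(axis=1).mean()
            if prev_dist - avg_dist < tol:                  # step 5: converged
                break
            prev_dist = avg_dist
        # step 6: outer loop repeats until the codebook reaches size M
    return codebook

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 12))       # e.g. 500 twelve-dim acoustic vectors
cb = lbg(train, M=16)
print(cb.shape)  # (16, 12)
```

Since the codebook doubles on each split, M is normally chosen as a power of two.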
III. SPEECH RECOGNITION 
Speech recognition is the ability of a computer to recognize general, naturally flowing utterances from a wide variety of users [10]. Speaker-independent isolated word recognition for the purpose of automation in a smart home is described in this paper. The process of isolated word recognition involves acquisition of the speech sequence of the word uttered by the user, followed by extraction of the MFCCs, or acoustic feature vectors, exactly as in the speaker recognition process described in the previous section. The DTW algorithm is then applied to identify the uttered word. 
A. Dynamic Time Warping 
The DTW algorithm is based on dynamic programming techniques, as described in [10]. The algorithm measures the similarity between two time series which may vary in time or speed. It is also used to find the optimal alignment between two time series when one may be "warped" non-linearly by stretching or shrinking it along its time axis. This warping can then be used to find corresponding regions between the two time series or to determine their similarity [11]. The principle of DTW is to compare two dynamic patterns and measure their similarity by calculating a minimum distance between them. The classic DTW is computed as follows. Suppose we have two time series Q and C, of length n and m respectively, where: 
Q = q1, q2, q3, …, qi, …, qn (8) 
C = c1, c2, c3, …, cj, …, cm (9) 
To align the two sequences using DTW, an n-by-m matrix is constructed in which the (i, j)th element contains the distance d(qi, cj) between the two points qi and cj [10]. Then, the absolute distance between the values of the two sequences is calculated using the Euclidean distance computation: 
d(qi, cj) = (qi - cj)² (10) 
Each matrix element (i, j) corresponds to the alignment between the points qi and cj. Then, accumulated distance is measured by: 
D(i, j) = min[ D(i-1, j-1), D(i-1, j), D(i, j-1) ] + d(i, j) (11) 
Using dynamic programming techniques, the search for the minimum distance path can be done in polynomial time P(t), using equation below: 
P(t) = O(N²V) (12) 
where, N is the length of the sequence, and V is the number of templates to be considered [11]. Theoretically, the major optimizations to the DTW algorithm arise from observations on the nature of good paths through the grid. These are outlined in Sakoe and Chiba [11,12] and can be summarized as: Monotonic condition, Continuity Condition, Boundary Condition, Adjustment window condition and Slope constraint condition. 
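The recurrence above can be implemented directly; the following is a minimal sketch (ours) using the squared-difference local distance of Eq. (10) and the three-neighbour accumulated-distance recurrence of Eq. (11):

```python
import numpy as np

def dtw_distance(Q, C):
    """Classic DTW between sequences Q (length n) and C (length m):
    local cost d(qi, cj) = (qi - cj)^2, accumulated cost
    D(i, j) = min(D(i-1, j-1), D(i-1, j), D(i, j-1)) + d(i, j)."""
    n, m = len(Q), len(C)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (Q[i - 1] - C[j - 1]) ** 2
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# identical sequences align perfectly; a time-stretched copy stays close
a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0])  # same shape, slower
print(dtw_distance(a, a))  # 0.0
print(dtw_distance(a, b))  # 0.0 -- the warping absorbs the stretch
```

In word recognition, this distance is computed between the test utterance's MFCC sequence and each stored reference template, and the template with the minimum distance wins.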
IV. SYSTEM ARCHITECTURE 
The application of speaker and speech recognition in our proposed smart home system is shown in Fig. 3. 
Fig. 3 Process flow of the proposed smart home system 
As described in Fig. 3, a prospective user must first be authenticated to use the system: his speech sequences are acquired and analyzed using MFCC and VQ-LBG, and if they match the stored speaker templates the user is granted access. The next phase is the automation phase: the authenticated user utters the name of the device/appliance he wants to use, provided the reference template of the word is stored and the device is connected to the system. The DTW algorithm ensures robust matching with the reference templates and, on correct recognition, passes the result to the PIC16F676 microcontroller using the RS232 standard communication protocol. On receiving the signal for the correctly recognized device/appliance, the microcontroller toggles its current status. 
A. Experimental Setup 
As can be seen from Fig. 4, the basic experimental setup consists of a microphone which captures the utterances of the user. Processing of the speech is done by Matlab scripts involving feature extraction using MFCC, feature matching and optimization using VQ and LBG respectively, followed by isolated word recognition using DTW. The phases of speaker and speech recognition are carried out in Matlab, after which the results of authentication and identification are serially communicated to the PIC16F676 microcontroller. 
Fig. 4 Experimental set up for speaker and speech recognition based device control 
B. PIC16F676 based RS232 Relay Board 
The PIC16F676 microcontroller has been used in our system for communicating with Matlab to acquire the recognized word over the RS232 communications protocol. Interfacing with the various devices in our system has been accomplished by making provision for an array of relays.
Fig. 5 PIC16F676 based RS232 Relay Board 
As shown in Fig. 5, our system provides provision for 8 devices, as 8 relays are connected to the PIC16F676 microcontroller; these are in turn driven by a ULN2803 high-voltage, high-current Darlington array providing the necessary switching signals to the relays. 
V. RESULTS 
The speaker and speech recognition algorithms were successfully implemented in Matlab. Speech feature vector extraction using MFCC and feature matching using VQ-LBG fulfil the objective of authenticating a user for speaker recognition. The figures below describe the results obtained. 
Fig. 6 Plot of mel-spaced filterbanks 
Fig. 7 Plot of VQ codewords 
Fig. 8 Results of successful Authentication 
Fig. 9 Results of successful word Identification 
VI. CONCLUSION 
The implemented speaker recognition system was found to have an accuracy of 80%. Accuracy is compromised if conditions such as the duration of silence, ambient noise content, and the emotional and physical health of the speaker vary between the training and testing periods; we must therefore ensure that these conditions remain the same during both phases. The accuracy of speaker recognition could be improved by using a larger database of samples for training. These samples may be taken under varying conditions and thus present a more complete representation of the speaker. 
The implemented DTW-based speech recognition system was found to have a high accuracy of 90%. Recognition was followed by serial communication of the result to the PIC16F676 microcontroller, switching the connected device on or off. Thus, the objectives of security in a smart home by authenticating a user using speaker recognition, and of automation using speech recognition, have been achieved and presented in this paper. 
REFERENCES 
1) Vibha Tiwari, "MFCC and its Application in Speaker Recognition", International Journal on Emerging Technologies, ISSN: 0975-8364, Feb 2010 
2) S. J. Abdallaha, I. M. Osman, M. E. Mustafa, "Text-Independent Speaker Identification Using Hidden Markov Model", World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 2, No. 6, 203-208, 2012 
3) Ch. Srinivasa Kumar et al., "Design of an Automatic Speaker Recognition System Using MFCC, Vector Quantization and LBG Algorithm", International Journal on Computer Science and Engineering (IJCSE), ISSN: 0975-3397, Vol. 3, No. 8, August 2011 
4) Srinivasan, "Speaker Identification and Verification using Vector Quantization and Mel Frequency Cepstral Coefficients", Research Journal of Applied Sciences, Engineering and Technology, ISSN: 2040-7467, 4(1): 33-40, 2012 
5) Anjali Bala et al., "Voice Command Recognition System on MFCC and DTW", International Journal of Engineering Science and Technology, ISSN: 0975-5462, Vol. 2 (12), 2010 
6) D. Subudhi, A. K. Patra, N. Bhattacharya, and P. Kuanar, "Embedded System Design of a Remote Voice Control and Security System", TENCON 2008, IEEE Region 10 Conference, 2008 
7) Ian McLoughlin, "Applied Speech and Audio Signal Processing", Cambridge University Press, 2009 
8) Jacob Benesty, M. Mohan Sondhi, Yiteng Huang (Eds.), "Springer Handbook of Speech Processing" 
9) A. Thakur, "Design of a Matlab based Automatic Speaker Recognition and Control System", International Journal of Advanced Engineering Sciences and Technologies, ISSN: 2230-7818, Vol. 8, Issue 1, 100-1 
10) B. Plannener, "Introduction to Speech Recognition", March 2005, www.speech-recognition.de, accessed 25 April 2013 
11) L. Muda, M. Begam and L. Elamvazuthi, "Voice Recognition Algorithms using MFCC and DTW Techniques", Journal of Computing, Volume 2, Issue 3, March 2010 
12) Steve Cassidy, "Speech Recognition: Chapter 11: Pattern Matching in Time", http://web.science.mq.edu.au/~cassidy/comp449/html/ch11s02.html, accessed 24 April 2013
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcscpconf
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlabArcanjo Salazaku
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...IDES Editor
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcsandit
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A ReviewIRJET Journal
 
A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...TELKOMNIKA JOURNAL
 
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnnAutomatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnnijcsa
 
Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depressionijsrd.com
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechIOSR Journals
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...IOSR Journals
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...IOSR Journals
 
Speaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVMSpeaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVMIRJET Journal
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...IJECEIAES
 
Frequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral componentsFrequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral componentsCSCJournals
 
Comparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewComparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewIJAEMSJORNAL
 
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...TELKOMNIKA JOURNAL
 

Similar to Speaker and speech recognition for smart home security (20)

Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlab
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
 
A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...
 
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnnAutomatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
 
Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depression
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio Speech
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...
 
Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...Speech Recognized Automation System Using Speaker Identification through Wire...
Speech Recognized Automation System Using Speaker Identification through Wire...
 
Speaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVMSpeaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVM
 
N017428692
N017428692N017428692
N017428692
 
Speaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract FeaturesSpeaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract Features
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...
 
Frequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral componentsFrequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral components
 
Comparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewComparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: Review
 
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
 

Recently uploaded

CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...Amil baba
 
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...shreenathji26
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectGayathriM270621
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunicationnovrain7111
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...KrishnaveniKrishnara1
 
Detection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackingDetection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackinghadarpinhas1
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.elesangwon
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsResearcher Researcher
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...Amil baba
 
tourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdftourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdfchess188chess188
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfDrew Moseley
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 

Recently uploaded (20)

CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
 
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subject
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunication
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
 
Detection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and trackingDetection&Tracking - Thermal imaging object detection and tracking
Detection&Tracking - Thermal imaging object detection and tracking
 
Versatile Engineering Construction Firms
Versatile Engineering Construction FirmsVersatile Engineering Construction Firms
Versatile Engineering Construction Firms
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
Uk-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Exp...
 
tourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdftourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdf
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdf
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 

Speaker and speech recognition for smart home security

KEYWORDS: Automation, Security, Speaker Recognition, Speech Recognition, Mel-Frequency Cepstrum Coefficients (MFCCs), Vector Quantization (VQ), Dynamic Time Warping (DTW)

I. INTRODUCTION

The human speech signal contains many discriminative features. These features are unique to every individual and serve as a biometric parameter that robust voice-based biometric systems can use to verify an individual's identity [1]. Unlike other biometric parameters such as the fingerprint and iris, voice-based biometrics offers the advantage of remote access to systems over the telephone network, which makes it valuable in real-time authentication and authorization across large distances [2]. Speaker recognition is the process of automatically recognizing who is speaking on the basis of information contained in the speech signal, and it makes it possible to verify the identity of a person accessing a system [2]. In the context of automation in a smart home, only an authorized user may be given control of the devices and appliances; we therefore authenticate the user with text-independent speaker recognition. Once the authenticated user has been granted access, every appliance and device connected to the system comes under his control. To accomplish this we use isolated-word speech recognition, which identifies an uttered word by matching it against reference templates stored in the database.

The system proposed in this paper involves three phases. The first is the speaker recognition phase, which authenticates the user; the second is the speech recognition phase, which identifies the word spoken by the user for the purpose of automation; and the third is the device control phase, which serially communicates the identification results to a PIC16F676 microcontroller that toggles the status of the devices connected to it.

II. SPEAKER RECOGNITION

Speaker recognition is the task of automatically identifying who is speaking on the basis of individual information embedded in the speech waveform [2]. The process involves two phases, training and testing. Both phases extract feature vectors and match them: feature extraction uses the MFCC algorithm, and feature matching uses VQ with codebooks optimized by the Linde-Buzo-Gray (LBG) algorithm.

Fig. 1 Block diagram of the MFCC processor [3]

A. Mel-Frequency Cepstrum Coefficients

The Mel-Frequency Cepstrum (MFC) is a representation of the short-term power spectrum of a sound, and the MFCCs are the coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum") [3]. The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the mel scale, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum [1].

1) Frame Blocking: Over a long interval of time the speech signal is not stationary, but over a sufficiently short interval, say 10-30 ms, it can be considered stationary. In frame blocking, the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). The first frame consists of the first N samples; the second frame begins M samples after the first and overlaps it by N − M samples [3]. Similarly, the third frame begins 2M samples after the first (or M samples after the second) and overlaps it by N − 2M samples. Typical values are N = 256 (equivalent to about 30 ms of windowing, and convenient for the fast radix-2 FFT) and M = 100 [1, 3].

2) Windowing: To minimize the signal discontinuities, and hence the spectral distortion, at the beginning and end of each frame, the signal is tapered to zero at both ends of the frame. In other words, the Fourier transform assumes the signal repeats, and the end of one frame does not connect smoothly with the beginning of the next. We therefore multiply each frame by a window function [3, 11]. Many "soft" windows can be used; our system uses the Hamming window, which has the form

w(n) = 0.54 − 0.46 cos(2πn / (N − 1)),  0 ≤ n ≤ N − 1.  (1)

3) Fast Fourier Transform (FFT): The next processing step is the Fast Fourier Transform, which converts each frame of N samples from the time domain into the frequency domain [3].
The FFT is a fast algorithm for implementing the Discrete Fourier Transform (DFT), which is defined on the set of N samples as

X_k = Σ_{n=0}^{N−1} x_n e^{−j2πkn/N},  k = 0, 1, …, N − 1.  (2)

The result of this step is often referred to as the spectrum or periodogram [5, 3].

4) Mel-Frequency Wrapping: Psychophysical studies have shown that human perception of the frequency content of speech sounds does not follow a linear scale. For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the "mel" scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz; as a reference point, the pitch of a 1 kHz tone 40 dB above the perceptual hearing threshold is defined as 1000 mels [1, 3]. The mel value for a given frequency f in Hz can therefore be approximated by

mel(f) = 2595 log10(1 + f / 700).  (3)

5) Cepstrum: In this final step the log mel spectrum is converted back to the time domain; the result is the set of mel-frequency cepstrum coefficients (MFCCs). The cepstral representation of the speech spectrum provides a good description of the local spectral properties of the signal for the given frame analysis. Because the mel spectrum coefficients (and hence their logarithms) are real numbers, they can be converted to the time domain with the Discrete Cosine Transform (DCT). Denoting the mel power spectrum coefficients resulting from the previous step by

S̃_k,  k = 1, 2, …, K,  (4)

we calculate the MFCCs as

c̃_n = Σ_{k=1}^{K} (log S̃_k) cos[n(k − 1/2)π/K],  n = 1, 2, …, K.  (5)

By applying the procedure described above, a set of mel-frequency cepstrum coefficients is computed for each speech frame of around 30 ms with overlap [3, 4]. These are the result of a cosine transform of the logarithm of the short-term power spectrum expressed on a mel-frequency scale. This set of coefficients is called an acoustic vector; each input utterance is therefore transformed into a sequence of acoustic vectors.
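The five steps above can be sketched end-to-end in NumPy. This is an illustrative reimplementation rather than the authors' code; the frame size N = 256 and shift M = 100 follow the typical values quoted in the text, while `mel_filterbank`, the filter count, and the 440 Hz test tone are stand-ins introduced here:

```python
import numpy as np

def hz_to_mel(f):
    # Eq. (3): mel(f) = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, nfft, fs):
    # Triangular filters with centers equally spaced on the mel scale up to fs/2.
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(1, num_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfcc(signal, fs, frame_len=256, frame_step=100, num_filters=20, num_ceps=12):
    # 1) Frame blocking: frames of N samples shifted by M samples (M < N).
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    frames = np.stack([signal[i * frame_step : i * frame_step + frame_len]
                       for i in range(num_frames)])
    # 2) Windowing with the Hamming window of Eq. (1).
    n = np.arange(frame_len)
    frames = frames * (0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1)))
    # 3) FFT -> power spectrum (periodogram), Eq. (2).
    spec = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2
    # 4) Mel-frequency wrapping with the triangular filterbank.
    mel_energies = np.maximum(spec @ mel_filterbank(num_filters, frame_len, fs).T,
                              1e-10)                 # avoid log(0)
    # 5) Cepstrum: DCT of the log mel energies, Eq. (5).
    k = np.arange(num_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, num_ceps + 1), k + 0.5) / num_filters)
    return np.log(mel_energies) @ dct.T              # one acoustic vector per frame

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)   # 1 s stand-in for a speech utterance
vectors = mfcc(tone, fs)
print(vectors.shape)                 # (78, 12): 78 frames, 12 coefficients each
```

Each row of `vectors` is one acoustic vector, so an utterance becomes the sequence of rows, matching the description above.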
B. Feature Matching Using VQ

The state-of-the-art feature matching techniques used in speaker recognition include DTW, Hidden Markov Modelling (HMM), and VQ. In this paper the VQ approach is used because of its ease of implementation and high accuracy [2]. Vector quantization is the classical quantization technique from signal processing that models probability density functions by the distribution of prototype vectors. It works by dividing a large set of points into groups, each having approximately the same number of points closest to it, and representing each group by its centroid. The density-matching property of vector quantization is powerful, especially for identifying the density of large, high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error [1].
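The nearest-centroid idea can be sketched as follows; `quantize` and the toy two-cluster data are illustrative names and values, not from the paper:

```python
import numpy as np

def quantize(vectors, codebook):
    """Assign each acoustic vector to its nearest codeword (Euclidean distance)
    and return the indices plus the average distortion."""
    # Pairwise distances between vectors (n, d) and codewords (M, d).
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    idx = dists.argmin(axis=1)            # index of the closest codeword
    distortion = dists.min(axis=1).mean() # average quantization error
    return idx, distortion

# Toy data: two tight clusters and a two-codeword codebook at their centers.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
vectors = np.vstack([cluster_a, cluster_b])
codebook = np.array([[0.0, 0.0], [5.0, 5.0]])

idx, distortion = quantize(vectors, codebook)
print(idx[:3], idx[-3:])   # the first points map to codeword 0, the last to codeword 1
```

In the speaker recognition phase this distortion measure, computed against each enrolled speaker's codebook, would select the speaker whose codebook gives the lowest average distortion for the test utterance.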
A vector quantizer maps k-dimensional vectors in the vector space Rk into a finite set of vectors Y = {yi : i = 1, 2, …, N}. Each vector yi is called a code vector or codeword, and the set of all codewords is called a codebook. Associated with each codeword yi is a nearest-neighbour region called its Voronoi region, defined by

Vi = { x ∈ Rk : d(x, yi) ≤ d(x, yj) for all j ≠ i }

Given an input vector, the codeword chosen to represent it is the one whose Voronoi region contains it.

Fig. 2 Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with circles, and the Voronoi regions are separated with boundary lines [1]

The representative codeword is the one closest to the input vector in Euclidean distance, defined by

d(x, yi) = √( Σj (xj − yij)² )

where xj is the jth component of the input vector and yij is the jth component of the codeword yi [1].

C. Clustering of Training Vectors using the LBG Algorithm
After the enrolment session, the acoustic vectors extracted from a speaker's input speech provide a set of training vectors. As described above, the next important step is to build a speaker-specific VQ codebook from these training vectors. A well-known algorithm, the LBG algorithm [Linde, Buzo and Gray, 1980], clusters a set of L training vectors into a set of M codebook vectors [3]. The algorithm is formally implemented by the following recursive procedure:
1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors (hence, no iteration is required here).
2. Double the size of the codebook by splitting each current codeword yn according to the rule
yn+ = yn(1 + ε), yn− = yn(1 − ε)
where n varies from 1 to the current size of the codebook, and ε is a splitting parameter (we choose ε = 0.01).
3. Nearest-Neighbour Search: for each training vector, find the codeword in the current codebook that is closest (in terms of the similarity measure), and assign that vector to the corresponding cell (associated with the closest codeword).
4. Centroid Update: update the codeword in each cell to the centroid of the training vectors assigned to that cell.
5. Iteration 1: repeat steps 3 and 4 until the average distance falls below a preset threshold.
6. Iteration 2: repeat steps 2, 3 and 4 until a codebook of size M is designed [3].

III. SPEECH RECOGNITION
Speech recognition is the ability of a computer to recognize general, naturally flowing utterances from a wide variety of users [10]. This paper describes speaker-independent isolated word recognition for the purpose of automation in a smart home. The process of isolated word recognition begins with acquisition of the speech sequence of the word uttered by the user. This is followed by extraction of the MFCCs, i.e. the acoustic feature vectors, exactly as in the speaker-recognition process described in the section above, and then by the DTW algorithm to identify the uttered word.
A. Dynamic Time Warping
The DTW algorithm is based on dynamic programming techniques, as described in [10]. It measures the similarity between two time series which may vary in time or speed, and finds the optimal alignment between them when one series is "warped" non-linearly by stretching or shrinking it along its time axis. This warping can then be used to find corresponding regions between the two time series or to determine their similarity [11]. The principle of DTW is to compare two dynamic patterns and measure their similarity by calculating a minimum distance between them. The classic DTW is computed as below.
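Steps 1–6 of the LBG procedure above can be sketched as follows (a minimal NumPy illustration, not the authors' Matlab implementation; M is assumed to be a power of two, as in the standard splitting schedule):

```python
import numpy as np

def lbg(train, M, eps=0.01, tol=1e-3):
    """Cluster training vectors (L, d) into an M-codeword codebook via LBG."""
    # Step 1: 1-vector codebook = centroid of all training vectors.
    codebook = train.mean(axis=0, keepdims=True)
    while len(codebook) < M:                       # step 6: grow until size M
        # Step 2: split each codeword yn into yn(1 + eps) and yn(1 - eps).
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        while True:
            # Step 3: assign each training vector to its nearest codeword.
            d = np.linalg.norm(train[:, None] - codebook[None], axis=2)
            nearest = d.argmin(axis=1)
            # Step 4: move each codeword to the centroid of its cell.
            for i in range(len(codebook)):
                cell = train[nearest == i]
                if len(cell):
                    codebook[i] = cell.mean(axis=0)
            # Step 5: iterate until the average distortion stops improving.
            avg = d.min(axis=1).mean()
            if prev - avg < tol:
                break
            prev = avg
    return codebook
```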
Suppose we have two time series Q and C, of length n and m respectively, where:
Q = q1, q2, q3, …, qi, …, qn (8)
C = c1, c2, c3, …, cj, …, cm (9)
To align the two sequences using DTW, an n-by-m matrix is constructed whose (i, j)th element contains the distance d(qi, cj) between the two points qi and cj [10]. The distance between the values of the two sequences is calculated using the squared Euclidean distance:
d(qi, cj) = (qi − cj)² (10)
Each matrix element (i, j) corresponds to the alignment between the points qi and cj. The accumulated distance is then measured by:
D(i, j) = min[ D(i−1, j−1), D(i−1, j), D(i, j−1) ] + d(i, j) (11)
Using dynamic programming techniques, the search for the minimum-distance path can be done in polynomial time P(t), given by:
P(t) = O(N²V) (12)
where N is the length of the sequence and V is the number of templates to be considered [11]. Theoretically, the major optimizations to the DTW algorithm arise from observations on the nature of good paths through the grid. These are outlined in Sakoe and Chiba [11, 12] and can be summarized as: the monotonic condition, continuity condition, boundary condition, adjustment window condition and slope constraint condition.

IV. SYSTEM ARCHITECTURE
The application of speaker and speech recognition in our proposed smart home system is shown in figure 3.
Fig. 3 Process flow of the proposed smart home system
As described in figure 3, a prospective user must first be authenticated to use the system: his speech sequences are acquired and analysed using MFCC and VQ-LBG, and if they match the stored speaker templates the user is granted access. The next phase is automation: the authenticated user utters the name of the device/appliance he wants to use, provided the reference template of the word is stored and the device is connected to the system. The DTW algorithm ensures robust matching with the reference templates and, on correct recognition, passes the result to the PIC16F676 microcontroller using the RS232 standard communication protocol. On receiving the appropriate signal for the correctly recognized device/appliance, its current status is toggled.

A. Experimental Setup
As can be seen from figure 4, the basic experimental setup consists of a mic which captures the utterances of the user. Processing of the speech is done by Matlab scripts which involve feature extraction using MFCC, feature matching and optimization using VQ and LBG respectively, followed by isolated word recognition using DTW. The phases of speaker and speech recognition are carried out in Matlab, after which the results of authentication and identification are serially communicated to the PIC16F676 microcontroller.
Fig. 4 Experimental set-up for speaker and speech recognition based device control (computer with mic, PIC16F676-based RS232 relay board, light bulb)

B. PIC16F676-based RS232 Relay Board
The PIC16F676 microcontroller is used in our system to communicate with Matlab and acquire the result of the recognized word over the RS232 communications protocol. Interfacing with the various devices in our system is accomplished by providing an array of relays.
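The DTW recurrence of equations (10) and (11), and the resulting isolated-word matching, can be sketched as follows (a minimal illustration over scalar sequences, not the authors' Matlab implementation; the real system aligns MFCC frame sequences, where the local cost d would be a per-frame vector distance, and the template names here are hypothetical):

```python
import numpy as np

def dtw_distance(q, c):
    """Accumulated DTW distance between sequences q and c, using
    d(qi, cj) = (qi - cj)**2 and
    D(i, j) = min(D(i-1, j-1), D(i-1, j), D(i, j-1)) + d(i, j)."""
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0  # boundary condition: the path starts at the origin
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # Monotonicity and continuity are enforced by the three predecessors.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

def recognize(utterance, templates):
    """Pick the reference template with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

Because the warping absorbs non-linear stretching along the time axis, a word spoken slightly faster or slower than its reference template still yields a small accumulated distance.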
Fig. 5 PIC16F676-based RS232 relay board (PIC16F676, ULN2803 driver, LM7805 regulator, 8 relays, light bulb)
As shown in figure 5, our system provides for 8 devices, as 8 relays are connected to the PIC16F676 microcontroller; these are in turn driven by ULN2803 high-voltage, high-current Darlington arrays, which provide the necessary switching signals to the relays.

V. RESULTS
The speaker and speech recognition algorithms were successfully implemented in Matlab. Speech feature vector extraction using MFCC and feature matching using VQ-LBG were successfully implemented in Matlab for speaker recognition, thus fulfilling the objective of authenticating a user. The figures below describe the results obtained.
Fig. 6 Plot of mel-spaced filterbanks
Fig. 7 Plot of VQ codewords
Fig. 8 Results of successful authentication
Fig. 9 Results of successful word identification

VI. CONCLUSION
The implemented speaker recognition system was found to have an accuracy of 80%. Accuracy is compromised if conditions such as the duration of silence, ambient noise content, and the emotional and physical health of the speaker vary between the training and testing periods; we therefore have to ensure that these conditions remain the same during both phases. The accuracy of speaker recognition could be improved by using a larger database of samples for training. These samples may be taken under varying conditions and can thus present a more complete representation of the speaker during training. The implemented DTW-based speech recognition system was found to have a high accuracy of 90%. Recognition was followed by serial communication of the result to the PIC16F676 microcontroller, thus switching the connected device on or off. Thus, the objectives of security in a smart home, by authenticating a user using speaker recognition, and of automation, using speech recognition, have been achieved and presented in this paper.
REFERENCES
1) Vibha Tiwari, "MFCC and its Application in Speaker Recognition", International Journal on Emerging Technologies, ISSN: 0975-8364, Feb 2010
2) S. J. Abdallaha, I. M. Osman, M. E. Mustafa, "Text-Independent Speaker Identification Using Hidden Markov Model", World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 2, No. 6, 203-208, 2012
3) Ch. Srinivasa Kumar et al., "Design of an Automatic Speaker Recognition System Using MFCC, Vector Quantization and LBG Algorithm", International Journal on Computer Science and Engineering (IJCSE), ISSN: 0975-3397, Vol. 3, No. 8, August 2011
4) Srinivasan, "Speaker Identification and Verification using Vector Quantization and Mel Frequency Cepstral Coefficients", Research Journal of Applied Sciences, Engineering and Technology, ISSN: 2040-7467, 4(1): 33-40, 2012
5) Anjali Bala et al., "Voice Command Recognition System on MFCC and DTW", International Journal of Engineering Science and Technology, ISSN: 0975-5462, Vol. 2 (12), 2010
6) D. Subudhi, A. K. Patra, N. Bhattacharya and P. Kuanar, "Embedded System Design of a Remote Voice Control and Security System", TENCON 2008 - IEEE Region 10 Conference, 2008
7) Ian McLoughlin, "Applied Speech and Audio Signal Processing", Cambridge University Press, 2009
8) Jacob Benesty, M. Mohan Sondhi and Yiteng Huang (Eds.), "Springer Handbook of Speech Processing", Springer, 2008
9) A. Thakur, "Design of a Matlab Based Automatic Speaker Recognition and Control System", International Journal of Advanced Engineering Sciences and Technologies, ISSN: 2230-7818, Vol. 8, Issue 1, 100-1
10) B. Plannerer, "Introduction to Speech Recognition", March 2005, www.speech-recognition.de, accessed 25 April 2013
11) L. Muda, M. Begam and I. Elamvazuthi, "Voice Recognition Algorithms using MFCC and DTW Techniques", Journal of Computing, Vol. 2, Issue 3, March 2010
12) Steve Cassidy, "Speech Recognition: Chapter 11: Pattern Matching in Time", http://web.science.mq.edu.au/~cassidy/comp449/html/ch11s02.html, accessed 24 April 2013