Dynamic time warping and PIC 16F676 for control of devices

A presentation given as part of the final-year project in Semester 8 of an undergraduate engineering degree.

This presentation covers one module of the project "Speaker and Speech Recognition based Embedded System Design for User Authentication and Remote Device Control": the Speech Recognition Module.

It explains the Dynamic Time Warping algorithm used for speech recognition and how it is combined with a PIC 16F676 microcontroller to control remote devices connected to the system.

Published in: Technology, Business

  1. 1st: Introduction • Proposed System Overview • A Simple Speech Recognition System and its Types • Acquisition of Speech Signal and its Analysis • Dynamic Time Warping Algorithm for Digit Recognition
     2nd: Introduction • RS-232-C and Serial Communication with MATLAB R2011b • Serial Communications with PIC 16F676 for Device Control • Interfacing Circuit Schematics and Design
     3rd: Summary • Conclusion and Results • Future Work
  2. Part 1: Introduction • Proposed System Overview • Speech Recognition and its Types • Acquisition of Speech Signal and its Analysis • Dynamic Time Warping (DTW) • DTW for Digit Recognition
  3. • The discussion so far referred to the implementation of speaker recognition for user authentication.
     • The goal of the project is to give the authenticated user access to control the devices connected to the system.
     • Speaker Recognition → Speech Recognition → Device Control
     • The devices are controlled by recognizing the spoken device ID (digits from 1 to 8) of a device connected to the system.
     • The device ID is recognized using DTW-based, speaker-independent isolated word recognition.
  5. Proposed system overview:
     • Recording Training Sequences → MFCC Feature Extraction → Speaker Model
     • Monitoring Microphone → MFCC Feature Extraction → Calculate VQ → Make Decision and Display Results
     • Monitoring Microphone → MFCC Feature Extraction → DTW based matching → Toggle Device status
  6. • The DTW algorithm is based on dynamic programming and provides a systematic way of comparing two sequences of acoustic feature vectors.
     • It measures the similarity between two time series that may vary in timing or speed.
     • Speech is represented by a series of feature vectors that are computed every 10 ms.
     • The technique finds the optimal assignment between two time series of acoustic feature vectors.
     • If one of the time series is "warped" non-linearly by stretching or shrinking it along its time axis, so that both series have time frames of comparable length, the technique is called "time warping".
  7. • Whole words consist of dozens of feature vectors. The number of vectors depends on how fast we speak.
     • Consider as an example a word 'w' with a vector sequence x̂ that is to be compared with a known sequence ŵ.
     • We need to measure the distance between these vector sequences to determine their similarity.
     • While computing the distance we need to find an "optimal assignment" between the individual vector pairs and compute the distance for each pair.
     • Vector sequences of different lengths must also be taken into account; the diagram on the slide illustrates this case.
  8. • The length Lp of the path P is determined by the maximum number of vectors in x̂ and ŵ.
     • The assignment between x̂ and ŵ is given by P and can be interpreted as time warping between the time axes of x̂ and ŵ.
     • Thus, time warping compensates for vector sequences of different lengths.
     • For a given path P, the distance between the vector sequences can now be computed as the sum of the distances between the individual vectors.
     • d(gl) denotes the vector distance for the time indices i and j defined by the grid point gl = (i, j); this distance is the Euclidean distance.
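The path distance described on this slide can be written compactly. This is a sketch of the notation only: Lp, gl and d(·) come from the slide, while the names of the individual vectors and the choice that i indexes ŵ and j indexes x̂ are assumptions made for illustration.

```latex
D(\hat{x}, \hat{w}, P) = \sum_{l=1}^{L_P} d(g_l),
\qquad
d(g_l) = \lVert \hat{w}_i - \hat{x}_j \rVert_2
\quad \text{for } g_l = (i, j)
```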
  9. • The criterion for finding the optimal path Popt is to minimize the distance D(x̂, ŵ, P).
     • However, it is not necessary to compute all paths P and the corresponding distances D to determine the optimum.
     • Since feature vectors are measured at short time intervals, time warping can be restricted to reasonable boundaries. For this purpose we need to understand local path alternatives.
     • The first and last vectors of x̂ and ŵ should be assigned to each other.
     • To warp the duration of the speech signal locally, preceding vectors may be "reused"; with these restrictions we can draw the local path alternatives.
     • The grid point (i, j) can have the possible predecessors (i − 1, j), (i − 1, j − 1) and (i, j − 1).
     • Popt is the concatenation of these local path alternatives.
  10. • Now that we have defined the local path alternatives, we can use Bellman's principle to find the optimal path Popt.
      • Bellman's principle states the following: if Popt is the optimal path through the matrix of grid points beginning at (0, 0) and ending at (TW − 1, TX − 1), and the grid point (i, j) is part of path Popt, then the partial path from (0, 0) to (i, j) is also part of Popt.
      • There are only 3 possible predecessors: (i − 1, j), (i − 1, j − 1) and (i, j − 1).
      • Assume that the optimal paths to these 3 predecessors and their accumulated distances δ have already been calculated.
      • We can now find the optimal path from (0, 0) to grid point (i, j) by selecting exactly the one path hypothesis that minimizes the accumulated distance δ(i, j).
      • Since the decision for the best predecessor path hypothesis reduces the number of paths leading to grid point (i, j) to exactly one, the possible path hypotheses are said to be recombined during the optimization step.
  11. Initialization at grid point (0, 0) → Iteration → Termination with δ(TW − 1, TX − 1)
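The initialization, iteration and termination steps can be sketched in a few lines of Python. This is a minimal illustration, not the project's MATLAB code: it assumes the three predecessors listed on the slides are all weighted equally (the slides do not show any path weights), and the function names and toy sequences are made up for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(w, x):
    """Accumulated distance delta(TW-1, TX-1) between template w and utterance x."""
    tw, tx = len(w), len(x)
    inf = float("inf")
    delta = [[inf] * tx for _ in range(tw)]

    # Initialization: the first vectors are assigned to each other.
    delta[0][0] = euclidean(w[0], x[0])

    # Iteration: keep only the best of the three predecessor hypotheses.
    for i in range(tw):
        for j in range(tx):
            if i == 0 and j == 0:
                continue
            best = min(
                delta[i - 1][j] if i > 0 else inf,
                delta[i - 1][j - 1] if i > 0 and j > 0 else inf,
                delta[i][j - 1] if j > 0 else inf,
            )
            delta[i][j] = euclidean(w[i], x[j]) + best

    # Termination: the last vectors are assigned to each other.
    return delta[tw - 1][tx - 1]


# Toy one-dimensional "feature vectors" (hypothetical data).
template = [[0.0], [1.0], [2.0]]
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]  # same shape, spoken slower
other = [[5.0], [5.0], [5.0]]

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slowed-down copy of the template yields a much smaller accumulated distance than the unrelated sequence, which is the property the recognizer relies on when it picks the stored word with the lowest total cost.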
  13. 1. Introduction
      2. RS-232-C Serial Communications with MATLAB
      3. Serial Communication with PIC16F676 for Device Control
      4. Interfacing Circuits and Schematics
  14. • The RS-232-C convention specifies that, with respect to ground, a voltage more negative than −3 V is interpreted as a 1 bit and a voltage more positive than +3 V as a 0 bit.
      • Serial communication according to RS-232-C requires that transmitter and receiver agree on a communications protocol (for example baud rate, number of data bits, parity and stop bits).
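The voltage convention and the idea of an agreed protocol can be illustrated with a small Python sketch. Everything beyond the ±3 V thresholds is an assumption for the example: the 8N1 frame format (one start bit, eight data bits sent least significant bit first, one stop bit), the ±12 V drive level and all function names are illustrative and are not stated on the slides.

```python
def frame_8n1(byte):
    """Logic bits of one assumed 8N1 frame: start bit, 8 data bits (LSB first), stop bit."""
    if not 0 <= byte <= 255:
        raise ValueError("byte must be in 0..255")
    data_bits = [(byte >> k) & 1 for k in range(8)]
    return [0] + data_bits + [1]


def line_voltage(bit, level=12.0):
    """Map a logic bit to a line voltage: 1 is driven negative, 0 positive."""
    return -level if bit == 1 else level


def read_level(voltage):
    """Interpret a line voltage; between -3 V and +3 V the level is undefined."""
    if voltage < -3.0:
        return 1
    if voltage > 3.0:
        return 0
    return None


bits = frame_8n1(ord("3"))              # ASCII character '3' as an example payload
volts = [line_voltage(b) for b in bits]
decoded = [read_level(v) for v in volts]
print(bits)
print(decoded == bits)
```

If transmitter and receiver disagreed on the frame format or the bit rate, the receiver would sample the same voltages but group them into different bytes, which is why both sides must be configured identically.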
  15. • Serial communication in MATLAB R2011b is done by writing scripts that initialize a special variable which keeps track of a serial connection: the serial object.
      • Unlike normal variables, which have a single value, objects have many "attributes" or parameters that can be set (e.g. port number, baud rate, buffer size).
      • One of those attributes is the port number, a label that identifies the port your device is connected to.
      • To send or receive data through the serial port object, it must be open. When not in use it can be closed (which is not the same as deleting it).
      • We can have many different serial objects in memory. They can all send and receive data at the same time as long as each is on a different port.
      • There can even be several objects associated with the same physical port. However, only one of the objects associated with a given port can be open (sending or receiving data) at any time.
  16. a. Creating a serial port object:
         serialPort = serial('com1')
      Resulting initializations:
         Serial Port Object : Serial-COM1
         Communication Settings
            Port: COM1
            BaudRate: 9600
            Terminator: 'LF'
         Communication State
            Status: closed
            RecordStatus: off
         Read/Write State
            TransferStatus: idle
            BytesAvailable: 0
            ValuesReceived: 0
            ValuesSent: 0
      b. Setting the parameters:
         get(serialPort, 'BaudRate')
         ans = 9600
         set(serialPort, 'BaudRate', 19200)
         get(serialPort, 'BaudRate')
         ans = 19200
  17. The method described previously is cumbersome if there are many settings to change. A better way is to set them when you create the serial object:
         serialPort_new = serial('com1', 'baudrate', 19200, 'terminator', 'CR')
      • Writing to the serial port: before you can write to the serial port, you need to open it by passing the serial object (not the port name) to fopen:
         fopen(serialPort_new)
      • Writing binary data: use the command fwrite to send four bytes of binary data:
         fwrite(serialPort_new, [0, 12, 117, 251]);
      • Reading from the serial port: you can use fread to read in data (not text). It can automatically format the data for you. For example, if the buffer currently holds 2 bytes of data:
         a = fread(serialPort_new, 2); % reads two bytes and returns them as a vector
  18. Overview of the system:
      1. Establish serial port communication with MATLAB.
      2. Acquire the results of user authentication.
      3. Display the results for the authenticated user.
      4. Display the speech recognition menu and accept the device ID uttered by the authenticated user.
      5. Send the identified device ID via the serial port to the PIC to toggle the current status of the device.
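The last step of the overview, sending a device ID and toggling the matching device, can be simulated without any hardware. The sketch below is only an assumption about how such a link might look: the slides do not say how the ID is encoded, so a single ASCII digit is used here, and the eight devices are modelled as the bits of one 8-bit output value. All names are hypothetical.

```python
def encode_device_id(device_id):
    """Host side: encode a device ID (1..8) as one ASCII digit (assumed encoding)."""
    if not 1 <= device_id <= 8:
        raise ValueError("device ID must be in 1..8")
    return str(device_id).encode("ascii")


def toggle_device(port_state, received):
    """Controller side: toggle the output bit that belongs to the received ID.

    port_state is an 8-bit integer, bit 0 for device 1 up to bit 7 for device 8.
    Anything that is not a single valid digit leaves the outputs unchanged.
    """
    if len(received) != 1:
        return port_state
    device_id = received[0] - ord("0")
    if not 1 <= device_id <= 8:
        return port_state
    return port_state ^ (1 << (device_id - 1))


state = 0b00000000
state = toggle_device(state, encode_device_id(3))  # device 3 switched on
print(format(state, "08b"))
state = toggle_device(state, encode_device_id(3))  # device 3 switched off again
print(format(state, "08b"))
```

Using an exclusive OR means the host never needs to know the current state of a device: sending the same ID twice returns the output to where it started, which matches the "toggle" behaviour described on the slide.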
  19. Registers used in asynchronous mode:
      1. The SPBRG register is set up for the selected baud rate.
      2. Asynchronous reception is enabled by clearing the SYNC bit in the TXSTA register and setting the SPEN bit in the RCSTA register.
  20. 3. To enable the receive data interrupt, the RCIE, GIE and PEIE bits must be set.
      4. Reception is activated by setting the CREN bit in RCSTA.
      5. When reception has concluded, the RCIF bit in the PIR1 register is set.
      6. Received data is retrieved by reading RCREG.
      7. If any error occurred, it is cleared by clearing the CREN bit.
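Step 1 of the list above requires a value for SPBRG. For the asynchronous mode of the mid-range PIC USART module, the baud rate is Fosc / (64 · (SPBRG + 1)) in low-speed mode and Fosc / (16 · (SPBRG + 1)) in high-speed mode. The Python sketch below computes the register value and the resulting error; the 4 MHz oscillator frequency is an assumption for the example, and whether a particular device provides these registers should be checked against its datasheet.

```python
def spbrg_for(fosc_hz, baud, high_speed=True):
    """Return (SPBRG value, actual baud rate, error in percent) for asynchronous mode."""
    divisor = 16 if high_speed else 64
    value = round(fosc_hz / (divisor * baud)) - 1
    if not 0 <= value <= 255:
        raise ValueError("baud rate not reachable with an 8-bit SPBRG")
    actual = fosc_hz / (divisor * (value + 1))
    error_pct = 100.0 * (actual - baud) / baud
    return value, actual, error_pct


# Assumed 4 MHz oscillator and the 9600 baud default shown for the MATLAB serial object.
value, actual, error_pct = spbrg_for(4_000_000, 9600, high_speed=True)
print(value, round(actual, 2), round(error_pct, 2))
```

Both ends of the link must use the same nominal rate, so the value chosen here has to match the BaudRate property configured on the MATLAB side.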
  22. • This presentation discussed the aspects involved in speaker and speech recognition and the techniques used to achieve them.
      • It covered the acquisition of acoustic feature vectors, the matching of those vectors against existing models in the database using vector quantization optimized with the LBG algorithm, and word identification using DTW.
      • It also presented serial communication between MATLAB and the PIC via the serial port using the RS-232-C standard, and finally the process of granting the authenticated user access for device control.
  23. Results per user:
      Speaker Id   Speaker recognition          Speech recognition           Accuracy in %
                   attempts / correct           attempts / correct           (speaker / speech)
      1            10 / 8                       10 / 9                       80 / 90
      2            10 / 9                       10 / 8                       90 / 80
      3            10 / 8                       10 / 9                       80 / 90
      4            10 / 9                       10 / 9                       90 / 90
      Total        40 / 34                      40 / 35                      85 / 87.5
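The totals can be recomputed from the per-user counts on this slide. The short Python check below uses only the attempt and success counts from the table; the variable names are illustrative. It gives 85 % for speaker recognition and 87.5 % for speech recognition (35 correct out of 40).

```python
# Per user: (speaker attempts, speaker correct, speech attempts, speech correct)
results = {
    1: (10, 8, 10, 9),
    2: (10, 9, 10, 8),
    3: (10, 8, 10, 9),
    4: (10, 9, 10, 9),
}

speaker_attempts = sum(r[0] for r in results.values())
speaker_correct = sum(r[1] for r in results.values())
speech_attempts = sum(r[2] for r in results.values())
speech_correct = sum(r[3] for r in results.values())

speaker_acc = 100.0 * speaker_correct / speaker_attempts
speech_acc = 100.0 * speech_correct / speech_attempts
print(speaker_correct, speaker_acc)
print(speech_correct, speech_acc)
```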
  24. • Insert a class Id
      • Speech signal duration, fs, number of bits per sample
      • Speech signal acquisition via microphone using the audiorecorder function
      • Feature extraction using Mfcc(s, fs)
      • Frame blocking using Hamming window
      • Mel-frequency filter bank
  25. • Feature matching using Vqlbg(d, k)
      • VQ codebook
  26. • Speech signal duration, fs, number of bits per sample
      • Speech signal acquisition via microphone using the audiorecorder function
      • Feature extraction using Mfcc(s, fs)
      • Frame blocking using Hamming window
      • Mel-frequency filter bank
      • Feature matching using Vqlbg(d, k)
      • VQ codebook
  27. • VQ codebook from the training phase
      • VQ codebook from the testing phase
      • Comparison of Euclidean distances
      • The user Id with the lowest Euclidean distance is authenticated
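The decision rule on this slide, authenticating the user whose codebook is closest, can be sketched in Python as follows. This is an illustration under assumptions, not the project's MATLAB code: it scores a set of test vectors by their average distance to the nearest codeword of each stored codebook, and the codebooks and test vectors are made-up two-dimensional toy data. The test-phase codebook can be passed as `features`.

```python
import math


def nearest_distance(vector, codebook):
    """Euclidean distance from one vector to its nearest codeword."""
    return min(math.dist(vector, codeword) for codeword in codebook)


def average_distortion(features, codebook):
    """Mean nearest-codeword distance of all vectors against one codebook."""
    return sum(nearest_distance(v, codebook) for v in features) / len(features)


def identify_speaker(features, codebooks):
    """Return the user ID with the lowest average distortion, plus all scores."""
    scores = {uid: average_distortion(features, cb) for uid, cb in codebooks.items()}
    return min(scores, key=scores.get), scores


# Hypothetical training-phase codebooks for two users.
codebooks = {
    1: [[0.0, 0.0], [1.0, 1.0]],
    2: [[5.0, 5.0], [6.0, 6.0]],
}
test_vectors = [[0.1, 0.0], [0.9, 1.0]]  # hypothetical test-phase vectors

best_user, scores = identify_speaker(test_vectors, codebooks)
print(best_user)
```

In practice a threshold on the lowest score would also be needed so that an unknown speaker is rejected rather than mapped to the nearest enrolled user; the slides do not describe such a threshold.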
  28. • Creation of reference templates
      • A path is provided to a separate folder that contains all the words to be recognized
  29. • Feature extraction
      • Calculation of the lowest total cost
      • Comparison of the local distance with all the stored words
  30. • Selection of the optimal path
      • The recognized word is sent to the COM port
      • The signal (device ID) is received by the PIC and the corresponding device is toggled
  31. • The proposed system could be improved considerably by implementing more efficient models for speaker identification, such as Hidden Markov Models (HMM). These use statistical theory to arrange the feature vectors into Markov chains that store the probabilities of state transitions.
      • Along with speaker recognition, an added level of voice-based biometric security could be provided using speech recognition: after verifying who the user is, the system would acquire a specific keyword unique to the system. Integrating mobile-phone-based system access would also mean controlling any system from almost anywhere in the world.
      • The fuzzy c-means clustering technique improves VQ performance at the classification stage. FVQ performance can be improved further by using the fuzzy-based hierarchical clustering approach proposed by Haipeng.
      • The performance of GMM is better than that of the other classifiers, even though FVQ improves ASR performance significantly compared with the other VQ techniques. Additional work on enhanced or alternative fuzzy clustering techniques is appropriate.
