VoCoRoBo: Remote Speech Recognition and Tilt Sensing Multi-Robotic System

Sagun Man Singh Shrestha1, Labu Manandhar2, Ritesh Bhattarai3
Department of Electronics and Computer Engineering, Tribhuvan University – Kathmandu Engineering College, Nepal
Gmail: 1sagunms, 2laburocks, 3reittes | github.com/sagunms/vocorobo

Abstract: This work is based on the implementation of real-time speech recognition using DSP algorithms such as Chebyshev IIR filters, an accelerometer for tilt-sensing, and the establishment of a short-range wireless secure link with the ARC4 cipher, all using low-cost 8-bit ATmega microcontrollers. The robot implements a simple but effective algorithm for comparing a spoken word against a dictionary of fingerprints using a modified Euclidean distance calculation. The system can also securely control the navigation of multiple robots located at remote locations wirelessly from the Control Module, gather the various environmental data collected by the Robot Modules, and display them back at the Control Module. Considering the time-critical algorithms requiring large computations as well as the variety of sensors interfaced in the system, this project demonstrates how one can build an expansible multi-robotic system from cheap and ubiquitous electronics.

Keywords: Speech Recognition, Chebyshev, Digital Signal Processing, Euclidean Distance, ARC4 Cryptography, ATmega16/32, nRF24L01+ Wireless Transceiver, MMA7260Q Accelerometer

I. INTRODUCTION

VoCoRoBo stands for Voice Controlled RoBot, in which the user can wirelessly control multiple robots with either a voice command or by tilting the controls towards the desired direction. In addition, each robot relays temperature and light sensor data securely back to the user station.

1.1 HARDWARE

A microcontroller is an integrated circuit composed of a microprocessor unit, memory, and input/output peripheral devices.
The Atmel ATmega32/16 is a low-power CMOS 8-bit microcontroller based on the AVR RISC architecture; it is used to implement the voice recognition, tilt-sensing, wireless and cryptography algorithms. An accelerometer measures the magnitude and direction of proper acceleration experienced relative to freefall, and can be used to sense orientation. Controlling the robots with fun and intuitive tilt gestures was made possible by the Freescale MMA7260Q 3-axis accelerometer.

The two parts of the system, the control and robot modules, are linked wirelessly using the popular Nordic nRF24L01+ radio transceiver. It operates in the 2.4 - 2.5 GHz ISM band with an air data rate of up to 2 Mbps, has ultra-low-power operation, and is ideally suited for remote control and data acquisition. The L293D H-bridge IC is a quad push-pull driver capable of delivering output currents of up to 600 mA per channel. To control each robot's turning simply by the speed difference between the wheels on either side, the differential drive technique was used.

1.2 SOFTWARE

Speech recognition is the process of converting an acoustic signal captured by a microphone and identifying the word from the sound. Due to speaker dependence, the system needs to be trained before use. Digital signal processing is concerned with the representation of signals by sequences of numbers and their processing. Infinite impulse response (IIR) is a property of signal processing systems whose impulse response is non-zero over an infinite length of time. An example of IIR filters are Chebyshev Type II filters, which have a steeper roll-off than Butterworth filters and confine their ripple to the stop band. They minimize the error between the idealized and the actual filter characteristic over the range of the filter.
1.2.1 Speech Analysis

In speech recognition, the frequency content of the detected word has to be analyzed. Several 4th order Chebyshev band pass filters are created by cascading two 2nd order filters, each realized with the Direct Form II Transposed difference equations (Matlab coefficient convention, a1 normalized to 1):

y(n)  = b1·x(n) + z1(n−1)
z1(n) = b2·x(n) − a2·y(n) + z2(n−1)
z2(n) = b3·x(n) − a3·y(n)

The coefficients a and b used in the above equations were obtained using the following Matlab syntax:

[B,A] = cheby2(2,40,[Freq1, Freq2]);

cheby2 designs a Chebyshev Type II digital filter from the given specifications: the order argument 2 yields a 4th order band pass filter, 40 is the stop band attenuation in dB, and Freq1 and Freq2 are the normalized cutoff frequencies. The tf2sos function is then used to convert the filter transfer function into 2nd order sections.

1.2.2 Voice-fingerprint Calculation

Due to the limited RAM on the ATmega32, the relevant information of each spoken word had to be encoded in the form of a 'fingerprint'. To compare fingerprints, the following pseudo-Euclidean distance between the stored and the sampled word fingerprints is used to find the correct word:

d(P, Q) = Σ |pi − qi|

where P = (p1, p2, ..., pn) is the dictionary fingerprint, Q = (q1, q2, ..., qn) is the sampled word fingerprint, and pi and qi are the fingerprint data points. To see if two words are the same, the distance between them is computed, and the word with the minimum distance in the database is considered the matching word. The original Euclidean distance requires squaring the difference between two points; in fixed-point arithmetic this produces too large a number, causing the variables to overflow. Thus a modified formula that neglects both the square and the square root was used, which in practice showed satisfactory results.

1.2.3 ARC4 Cryptography

ARC4 is one of the most widely used software stream ciphers, appearing in many encryption schemes including WEP, WPA, and SSL.
The main factors in ARC4's success over such a wide range of applications are its speed, simplicity and efficiency in software and hardware.

3. DESIGN AND IMPLEMENTATION

3.1 HARDWARE ARCHITECTURE

Figure 3.1: Overall Hardware Architecture. The Control Module pairs an ATmega32 @ 16 MHz (speech recognition and MMA7260Q tilt sensing, with ADC, LCD and LED connections) with an ATmega16 @ 8 MHz (nRF24L01+ wireless interface with ARC4 cryptography over SPI). Each Robot Module uses an ATmega16 @ 8 MHz (nRF24L01+ with ARC4 and L293D H-bridge interface driving two motors). The modules communicate over a 2.4 GHz wireless link carrying a 2-byte payload (control byte + count byte).
The system is divided into two broad subsystems: Control Module and Multi-Robot Module. The Control Module is further divided into two layers: the topmost layer and the second layer.

3.1.1 Control Module

The topmost layer of the control module consists of the ATmega32, where speech recognition, MMA7260Q accelerometer sensing, and output to a 16x2 text LCD are handled. The second layer consists of the ATmega16, where the nRF24L01 wireless routine as well as encryption and decryption with the ARC4 cipher are implemented.

The bridge protocol between the first and second layers of the control module (Fig. 3.1 and 3.2) is designed such that three output pins of PORTD of the ATmega32, viz. PD2, PD3 and PD4, are connected to the respective input pins of PORTA of the ATmega16, viz. PA0, PA1 and PA2. When the first layer recognizes the spoken word (front, back, left, right or stop), the equivalent bit combination is presented on PORTA of the second layer via these bridge lines. The second layer then sends out the corresponding control byte wirelessly via the SPI port. When one of the robots receives this control byte, it is decoded into the matching differential-drive motor combination that moves the robot physically in the commanded direction.

FUNCTION   | PA2 PA1 PA0 | HEX
STOP   (S) |  0   0   0  | 00H
FRONT  (F) |  0   0   1  | 01H
BACK   (B) |  0   1   0  | 02H
LEFT   (L) |  0   1   1  | 03H
RIGHT  (R) |  1   0   0  | 04H
SPD_UP (U) |  1   1   0  | 06H
SPD_DN (D) |  1   1   1  | 07H

Table 3.1: Function control byte to be sent out via wireless (SPI port) and the corresponding bit combination presented to the second layer of the Control Module (PINA).

3.1.2 Robot Module

It consists of two identical robots (A and B) which can be positioned at different locations, provided they are within signal range of the Control Module. Each robot consists of an ATmega16 with sensors that gather environmental data, specifically an LM35 temperature sensor and a light dependent resistor.
A 2.4 GHz nRF24L01 wireless transceiver is also available on-board to receive control data and to transmit the remote sensor data for data acquisition. With the L293D H-bridge driver, two differential-drive motors are controlled independently so that the robot can navigate front, back, left or right. Four input pins of the L293D, viz. IN1, IN2, IN3 and IN4, are connected to four output pins of PORTD of the ATmega16, viz. PD0, PD1, PD2 and PD3 respectively.

FUNCTION   | IN4 IN3 IN2 IN1 | HEX
STOP   (S) |  0   0   0   0  | 00H
FRONT  (F) |  0   1   1   0  | 06H
BACK   (B) |  1   0   0   1  | 09H
LEFT   (L) |  0   0   1   1  | 03H
RIGHT  (R) |  1   1   0   0  | 0CH
SPD_UP (U) |  –   –   –   –  |  –
SPD_DN (D) |  –   –   –   –  |  –

Table 3.2: Function control byte received via wireless (SPI port) and the corresponding bit combinations outputted to the H-bridge (PORTD).

3.2 SOFTWARE ARCHITECTURE

3.2.1 Input, Processing, Output

Figure 3.2: Input, Processing and Output block diagram for speech recognition (Speech → ADC → Band Pass Filters → Generate Voice Fingerprints → Compare with Fingerprint Templates → Control Signals → Output to the Robot).

At a rate of 4 kHz, the algorithm checks the ADC input for an audio signal. If the ADC value exceeds the threshold value, it is taken as the start of a half-second-long word. The sampled word passes through
8 band pass filters to be encoded into a fingerprint. The words to be matched are stored as fingerprints in a dictionary so that newly generated sample fingerprints can be compared with them later. The modified Euclidean distance calculation finds the fingerprint that is the closest match and then sends a control signal, ultimately to the robot, to perform operations like left, right, front, back and stop.

3.2.2 Initial-Threshold Calculation

All the background sound at startup is treated as a base value, which improves the accuracy of the speech recognition. At startup, the algorithm reads the ADC input using the ATmega32 Timer/Counter0 and accumulates its value 256 times. By weighting each ADC reading by 1/256 in fixed point and accumulating 256 readings, the average ADC value is calculated without an explicit multiply or divide. Three such averages are taken, with a 16.4 ms delay between the samples. The threshold is then set to four times the median of the three averages. The threshold value is used to detect whether a word has been spoken or not.

3.2.3 Voice-fingerprint Generation

Figure 3.3: Filter implementation block diagram for the generation of fingerprints (ADC → Filters 2–9 → Accumulators → Voice Fingerprint).

The program considers a word detected if a sample value from the ADC is greater than the threshold value. Once a word has been detected, each ADC sample is stored in an integer variable Ain and passed through the eight 4th order band pass filters for 2000 samples (half a second). Each filter's output is squared, and that value is accumulated with the previous squares of the filter output. After 125 samples, the accumulated value is stored as one data point in the fingerprint of that word; the accumulator is then cleared and the process begins again. After 2000 samples, 16 points have been generated from each filter, so every sampled word is divided into 16 parts.
Our assembly language code uses 8 filters; since each one yields 16 data points, every fingerprint is made up of 128 data points.

3.2.4 Filter Design and Implementation

Figure 3.4: Band pass Filter 200-400 Hz
Figure 3.5: Band pass Filter 1600-1800 Hz

3.2.5 Digital Filter Implementation

The 4th order Chebyshev digital filter with 40 dB stop band attenuation was chosen for its very sharp transition after the cutoff frequency. Most of the important frequency content in speech lies within the first 2 kHz, which usually contains the first and second speech formants. Thus 8 band pass filters covering 0.2 to 1.8 kHz were designed, as shown in the magnitude and phase plots. This also permitted sampling at 4 kHz (satisfying the Nyquist criterion for the first 2 kHz of voice frequencies) while leaving enough time to implement 8 filters. For sufficient frequency resolution to properly identify words, the bandwidth of each filter is set to 200 Hz. Each 4th order filter is created in assembly code by cascading two 2nd order IIR filters whose coefficients
are generated using Matlab (Listing 1.2.1). The floating point coefficients are converted to fixed point by multiplying them by 256 and rounding to the nearest integer. Fixed point was used instead of floating point (which would have been more accurate) because floating point calculation on the ATmega32 is too slow to run all the filters within the 4 kHz sample rate. The ATmega32 has only 2 KB of RAM, and a word sampled at 4 kHz for half a second would require the entire 2 KB; to build a fingerprint from a word, the ADC output therefore has to pass through all the filters faster than the ADC sample time of 250 µs. The output of each filter is squared in order to store the intensity of the sound rather than just the amplitude. Since the lowest and highest frequencies could be neglected without noticeable degradation in recognition accuracy, and since the memory and cycle time of the ATmega32 would not suffice to implement all ten filters, 8 band pass filters were sufficient to compartmentalize the frequencies between 200 Hz and 1.8 kHz.

3.2.5.1 Chebyshev II filter coefficients

Filter 1 (0.2–0.4 kHz), gain G1 = 80
  1st section: A112: 451,  A113: -248, B111: 21,   B112: -32,   B113: 21
  2nd section: A122: 458,  A123: -248, B121: 2225, B122: -4285, B123: 2225
Filter 2 (0.4–0.6 kHz), gain G2 = 120
  1st section: A212: 355,  A213: -248, B211: 27,   B212: -29,   B213: 27
  2nd section: A222: 366,  A223: -248, B221: 1090, B222: -1826, B223: 1090
Filter 3 (0.6–0.8 kHz), gain G3 = 140
  1st section: A312: 224,  A313: -248, B311: 31,   B312: -15,   B313: 31
  2nd section: A322: 239,  A323: -248, B321: 762,  B322: -965,  B323: 762
Filter 4 (0.8–1.0 kHz), gain G4 = 160
  1st section: A412: 72,   A413: -248, B411: 34,   B412: 4,     B413: 34
  2nd section: A422: 88,   A423: -248, B421: 633,  B422: -464,  B423: 633

Table 3.3: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 1-4

Filter 5 (1.0–1.2 kHz), gain G5 = 160
  1st section: A512: -72,  A513: -248, B511: 34,   B512: -4,    B513: 34
  2nd section: A522: -88,  A523: -248, B521: 633,  B522: 464,   B523: 633
Filter 6 (1.2–1.4 kHz), gain G6 = 140
  1st section: A612: -224, A613: -248, B611: 31,   B612: 15,    B613: 31
  2nd section: A622: -239, A623: -248, B621: 762,  B622: 965,   B623: 762
Filter 7 (1.4–1.6 kHz), gain G7 = 120
  1st section: A712: -355, A713: -248, B711: 27,   B712: 29,    B713: 27
  2nd section: A722: -366, A723: -248, B721: 1090, B722: 1826,  B723: 1090
Filter 8 (1.6–1.8 kHz), gain G8 = 80
  1st section: A812: -451, A813: -248, B811: 21,   B812: 32,    B813: 21
  2nd section: A822: -458, A823: -248, B821: 2225, B822: 4285,  B823: 2225

Table 3.4: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 5-8

3.2.6 Wireless Packet Format

The preamble byte, composed of alternating zeros and ones, is sent first, followed by a five-byte address field. A data payload of user-settable length (1-32 bytes) is sent next. Two versions of the payload were implemented: a 2-byte payload, used primarily, holding only the encrypted byte and a count byte; and an 18-byte payload designed for data acquisition from the temperature and light sensors at a remote location. The final part is the two-byte CRC.

3.2.6.1 Wireless Data Payload format

The data payload for the control module is of two types, transmitter-mode and receiver-mode payload, both 18 bytes wide. The control module has to transmit data packets to the individual robots and also receive sensor data from the replying robots, so it has to hold the entire payload of each robot (two in our case) for both transmission and reception. Two 18-byte char arrays, data_tx1 and data_tx2, store the transmitter-mode payloads, while the other two arrays, data_rx1 and data_rx2, are for the receiver. Both payloads are of size PAYLOAD_SIZE (18 bytes), defined in the wireless routine of the ATmega16.
Figure 3.6: Transmitter Mode Payload

Of the three blocks, the first, 16-byte block holds the data text to be sent from the control to the robot modules. To input the text data, we use RealTerm to send the text from the computer to the MCU via UART, for sending text messages to the individual robots at different locations. The control block is formed by the 1-byte data_control, which stores the ASCII characters 'F', 'B', 'L', 'R' and 'S', representing the control information for front, back, left, right and stop. When the targeted robot receives the control information in the transmitter payload, it interprets the ASCII control byte as the corresponding robot movement command.

The ARC4 cipher is used to encrypt the control byte and the data text block. Since ARC4 is a stream cipher, the byte count must be kept up to date (missing a packet would result in incorrect decryption from that point on), so a packet count byte is added to each packet. This allows the receiving unit to catch up to the correct byte in the PRGA (assuming the targeted robot misses fewer than 256 packets in a row).

Figure 3.7: Receiver Mode Payload

The receiver-mode payload is needed to receive the encrypted data payload sent by the individual robots, decrypt the encrypted block by syncing with the help of the packet count, segregate the sensor readings of temperature (2 bytes), light (2 bytes) and speed setting (1 byte), and store them in their respective variables for data logging. For data acquisition, the payload is divided into temperature and light blocks. The readings from the two sensors in each robot are stored in their respective integer variables and sent to the control module in packet format.
3.2.6.2 Source and destination data pipe addressing

Using switches connected to PD4 and PD5 of the ATmega16, the user can select whether the control byte generated by the roboControl function is directed to control byte data_control1 or data_control2, which are concatenated to the respective data packets for each robot. The user can thus select which robot the current command is directed to. This technique enables a multi-robot control paradigm from a single control module.

To implement a minimalistic star network topology, the receiving pipes of the control module, Robot 1 and Robot 2 are 0, 1 and 2 respectively, and the corresponding pipe addresses are E7:E7:E7:E7:E7, C2:C2:C2:C2:C2 and C2:C2:C2:C2:C3. The remaining five data pipes in each of the three linked modules are disabled, to block reception of packets whose destination is elsewhere. Prior to transmitting a data packet, the destination address must be set.

Figure 3.8: Minimalistic Star Network Topology for establishing the communication link between the Control and Robot Agent modules and their respective destination multi-pipe addressing

3.2.7 ARC4 Cryptography

ARC4 generates a pseudorandom stream of bits (the keystream) which, for encryption, is combined with the plaintext using bit-wise XOR; decryption is performed in the same way (since XOR is a symmetric operation).
To generate the keystream, the cipher
makes use of a secret internal state which consists of two parts:

• A permutation of all 256 possible bytes (denoted "S" below).
• Two 8-bit index pointers (denoted "i" and "j").

The permutation is initialized with a variable-length key, typically between 40 and 256 bits, using the key-scheduling algorithm (KSA). After this, the stream of bits is generated using the pseudo-random generation algorithm (PRGA). The ARC4 cipher is implemented in conjunction with the wireless routine of the ATmega16 in both the control and robot modules.

3.2.8 MMA7260Q Tilt Sensing

Figure 3.9: Overall accelerometer tilt sensing algorithm

The MMA7260Q has three sensor output pins, X, Y and Z, connected to three of the ADC inputs, PA3, PA4 and PA5, of the ATmega32. The robot functions (front, back, left and right) are controlled in either Speech or Accelerometer mode. In the latter, the tilt-sensing algorithm first samples the X, Y, Z values for the origin into xyzOrigin, then rapidly stores the subsequent samples into xyzADCArray. These arrays are used by three decision blocks to determine the speeds in the individual directions. In the speed and decision block, once the speed in the positive or negative direction (depending on accelerometer orientation) is determined, the algorithm decides whether the gesture is to be interpreted as front, back, left, right or stop. For this, the calculated speed in either X or Y has to exceed a predefined threshold for the movement data to be considered valid. The decision is sent to the roboControl function, which conveys it ultimately to one of the robots.
Figure 3.10: Flowchart showing xSpeed determination and decision making of robot functions (FRONT and BACK)
4. RESULTS

4.1 Time domain waveform

The figures depict the time domain waveforms of the spoken words, generated by Matlab. The spoken words front, left and right are approximately 4 s in duration. The word back has the lowest duration, 2 s, due to which it is recognized with the least accuracy relative to the other words, while stop has the highest duration, 5 s, and the highest accuracy.

Figure 4.1: Time domain representation of Back
Figure 4.2: Time domain representation of Stop

4.2 Frequency domain waveform

These figures depict the spectral analysis (discrete Fourier transform) of the sampled time domain data, generated using Matlab.

Figure 4.3: FFT of the word Back
Figure 4.4: FFT of the word Stop

4.3 Dictionary data points for voice fingerprints

Filter:  1     2     3     4     5     6     7     8
       731   831   723  2343  4838  2514  7815  1085
       681  1025   707  1057   625   309   172   672
       177   346   307   364    95    59    10     0
         0     0     0    35     4     0     0     0
      3120  3704  4341  1001  1957  5105   288    51
       156    31     0   732   175     4     0    44
       474  1188  1966   539   167   184    78     0
        30    52    30   193     0     0     0     0
      7662  4377  3991  2200  1639   347   561   134
         0    23    20  1309   874     0     0     0
      1564   789  4137  1752  1311  1629    52     5
        34    68   123   728   343   120    77    76
       385   183   306   171   553   163     3    56
        72   123    68   219   196    42    41    37
       704   764   796   950  2347  1998   489   665
       266   379   137   138   729   944  1400   516

Table 4.1: Dictionary data points for the word FRONT stored in the flash memory

128 data points for each of the five words are logged via RealTerm in a similar manner during the training stage and stored as a dictionary in the flash memory.

4.4 Speech Recognition

Figure 4.5: Recognition Probability Comparison (20 tests per word): Front 95%, Back 90%, Left 95%, Right 95%, Stop 100%.
The accuracy of the speech recognition, above 90%, was within the acceptable range of our initial expectations for the system design. However, given the basic speech algorithm, recognition is valid only for the same person who performed the preliminary voice training that initialized the dictionary fingerprints. For convenience, the recorded voice of the Oxford dictionary software, stored as a .wav file, was played in relatively quiet surroundings.

4.5 Euclidean Distance Comparison

Figure 4.6: Euclidean Distance Comparison

UART logging was done from RealTerm, and the Euclidean distance comparison was logged against all five fingerprints already stored in the EEPROM. As expected, a word was recognized as the one with the least distance among the five fingerprints compared.

4.6 Wireless Transmit and Receive

4.6.1 Correct ARC4 Key Encryption/Decryption

The data logged from RealTerm is presented below. It depicts correct ARC4 key encryption and decryption. If the private key matches in both the control and robot modules, as shown below, the encrypted data is decrypted back to the original data as the PRGA of the robot agent updates 12 times to catch up with the PRGA of the Control module.

CONTROL
  Initialized! == Control Module ==
  Private Key = SaGuN
  - TX to Robot I - Destination: C2:C2:C2:C2:C2 (Pipe 1)
  Original:  data_tx1[0]= S  data_tx1[1]=0
  Encrypted: data_tx1[0]= ‘  data_tx1[1]=0
  Packet sent! Current Sequence = 1
  - TX to Robot I - Destination: C2:C2:C2:C2:C2 (Pipe 1)
  Original:  data_tx1[0]= S  data_tx1[1]=1
  Encrypted: data_tx1[0]= ,  data_tx1[1]=1
  Packet sent! Current Sequence = 2

ROBOT
  Initialized! == Robot Module I ==
  Private Key = SaGuN
  - RX from Control - Packet received!
  Encrypted data[0]= ‘  data[1]=0
  No. of PRGA updates = 12 times
  Decrypted data[0]= S  data[1]=0
  Current Sequence = 1
  - RX from Control - Packet received!
  Encrypted data[0]= ,  data[1]=1
  Decrypted data[0]= S  data[1]=1
  Current Sequence = 2

4.6.2 Incorrect ARC4 Key Encryption/Decryption

If the private key does not match between the two modules, the encrypted data cannot be decrypted back to its original form, as shown below.

CONTROL
  Initialized! = Control Module =
  Private Key = VoCoRoBo
  - TX to Robot II - Destination: C2:C2:C2:C2:C3 (Pipe 2)
  Original:  data_tx1[0]= S  data_tx1[1]=0
  Encrypted: data_tx1[0]= j  data_tx1[1]=0
  Packet sent! Current Sequence = 1
  - TX to Robot II - Destination: C2:C2:C2:C2:C3 (Pipe 2)
  Original:  data_tx1[0]= S  data_tx1[1]=1
  Encrypted: data_tx1[0]= D  data_tx1[1]=1
  Packet sent! Current Sequence = 2

ROBOT
  Initialized! = Robot Module II =
  Private Key = SaGuN
  - RX from Control - Packet received!
  Encrypted data[0]= j  data[1]=0
  No. of PRGA updates = 7 times
  Decrypted data[0]= ƒ  data[1]=0
  Current Sequence = 1
  - RX from Control - Packet received!
  Encrypted data[0]= D  data[1]=1
  Decrypted data[0]= ~  data[1]=1
  Current Sequence = 2

5. CONCLUSION

This project is based on the implementation of real-time speech recognition using DSP algorithms such as Chebyshev IIR filters, an accelerometer for tilt-sensing, and the establishment of a short-range wireless
secure link with the ARC4 cipher, all using ubiquitous low-cost 8-bit microcontrollers. With speech recognition accuracy above 90%, it shows the feasibility of applying the system in low-cost real-time applications. It was observed that words with greater pronunciation stress were recognized better. Although recognition is currently accurate only for the person who trained the system, it can be expanded toward speaker independence through further research on the storing and retrieval of voice fingerprints from different media. A multi-channel wireless link with ARC4 was also successfully implemented to exchange control and sensor data. As the nRF24L01 is capable of higher-speed data transmission, the system can also be expanded to incorporate other sensors, such as audio or video, for richer data acquisition.

6. REFERENCES

[1] T. Aamodt, "Speech Recognition Algorithm", University of British Columbia, April 2003. http://www.eecg.toronto.edu/%7Eaamodt/ece341/speech-recognition
[2] X. Lu and S. Lee, "Voice Recognition Security System", Cornell University, 2006.
[3] A. Harison and C. Shah, "Voice Recognition Car", Cornell University, 2006.
[4] B. R. Land, "Fixed Point Mathematical Functions in GCC and Assembler; Optimized 2nd Order IIR Code", Cornell University.
[5] B. R. Land, "Fast Digital Filtering", Circuit Cellar, Issue 218, p. 40, September 2008.
[6] Application Note AVR201: "Using the AVR Hardware Multiplier", Atmel Corporation.
[7] "IIR Design", nauticom.net/www/jdtaft/iir.htm
[8] B. Ball, diyembedded.com (tutorials on the NXP LPC2148 and Microchip PIC18F452 microcontrollers and the Nordic Semiconductor nRF24L01 2.4 GHz RF link), 2007.
[9] "Interfacing nRF2401 with SPI" (White Paper), Nordic Semiconductor.
[10] T. Igoe, "MMA7260Q 3-Axis Accelerometer Report for PIC 18F252 using PicBasic Pro", Sensor Workshop at ITP, January 16, 2006.
[11] Application Note AN3447: "Implementing Auto-Zero Calibration Technique for Accelerometers", Freescale Semiconductor.

7. PICTURES

Figure 7.1: Overall System
Figure 7.2: Schematic Diagram of Control Module
Figure 7.3: Schematic Diagram of a single Robot Module
