Real-Time Voice Actuation
1. Team Jarvis
Final Presentation
Pragya Agrawal
Dominic Calabrese
David Martel
Nathan Sawicki
2. Project Goals
• Design and build real-time speech recognition system
• Build with embedded hardware
• Use the source-filter model of speech and a Support Vector Machine
classifier to recognize the spoken commands “zero” through “nine”
• Finished system runs in real time and drives GPIO-based actuation
to demonstrate working voice recognition
4. Source-Filter Model of Speech
• Word characterization should be
independent of volume, pitch, and duration
of the word
• Simplify the speech production model into two parts:
1. Source: vibration of the vocal cords
2. Filter: the vocal tract (e.g., positioning of the
tongue, mouth, etc.)
• Accurately modeling the filter provides a
basis for word recognition[4]
Broad sweeps of the spectrum (formants) result
from the filter configuration. The rapidly varying
peaks come from the source (harmonics of the vocal-cord vibration)
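A minimal C sketch of the source-filter idea, for illustration only: the source is assumed to be an idealized impulse train (one pulse per pitch period) and the filter an all-pole IIR filter with coefficients a[1..p]. The helper names impulse_train and all_pole_filter are hypothetical, not taken from the project's code.

```c
#include <stddef.h>

/* Source: an idealized glottal excitation, one unit impulse every
 * `period` samples (the pitch period). period must be > 0. */
void impulse_train(double *src, size_t n, size_t period) {
    for (size_t i = 0; i < n; i++)
        src[i] = (i % period == 0) ? 1.0 : 0.0;
}

/* Filter: all-pole (IIR) vocal-tract model H(z) = 1 / A(z) with
 * A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, so
 * y[i] = x[i] - a[1] y[i-1] - ... - a[p] y[i-p].
 * a[0] is assumed to be 1 and is not read. */
void all_pole_filter(const double *x, double *y, size_t n,
                     const double *a, size_t p) {
    for (size_t i = 0; i < n; i++) {
        double acc = x[i];
        for (size_t k = 1; k <= p && k <= i; k++)
            acc -= a[k] * y[i - k];
        y[i] = acc;
    }
}
```

Changing the period moves the harmonic peaks (a period of 80 samples at an assumed fs = 8 kHz gives harmonics 100 Hz apart), while changing a[] reshapes the envelope that the recognizer actually cares about.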
5. All-Pole Filter Coefficients
• An order-n set of filter coefficients can be
estimated from the first n + 1 lags (0 through n) of
the signal's autocorrelation
• Levinson-Durbin recursion algorithm
calculates all-pole filter coefficients from
autocorrelation
• The goal is to capture the spectral envelope, so
~10 filter coefficients are used[5]
Too many coefficients lead to over-fitting: the model
starts tracking source harmonics instead of the envelope
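The autocorrelation-to-coefficients step can be sketched in C as below. This is a generic textbook Levinson-Durbin recursion, not the project's code-generated Matlab function; the names autocorr and levinson_durbin, the sign convention A(z) = 1 + Σ a[k] z^-k, and the LD_MAX_ORDER limit are assumptions of the sketch. An order-p model consumes the p + 1 lags r[0..p].

```c
#include <stddef.h>

#define LD_MAX_ORDER 32   /* hypothetical upper bound on model order */

/* Biased autocorrelation r[k] = sum_{i=k}^{n-1} x[i] x[i-k], k = 0..p. */
void autocorr(const double *x, size_t n, double *r, size_t p) {
    for (size_t k = 0; k <= p; k++) {
        double acc = 0.0;
        for (size_t i = k; i < n; i++)
            acc += x[i] * x[i - k];
        r[k] = acc;
    }
}

/* Levinson-Durbin recursion: solves the order-p normal equations from
 * r[0..p]. On return a[0] = 1 and a[1..p] are the coefficients of
 * A(z) = 1 + sum a[k] z^-k. Returns the final prediction-error power,
 * or -1.0 if r[0] <= 0 (silent frame) or p is too large. */
double levinson_durbin(const double *r, double *a, size_t p) {
    double prev[LD_MAX_ORDER + 1];
    if (r[0] <= 0.0 || p > LD_MAX_ORDER) return -1.0;
    double err = r[0];
    a[0] = 1.0;
    for (size_t i = 1; i <= p; i++) a[i] = 0.0;
    for (size_t i = 1; i <= p; i++) {
        /* reflection coefficient for order i */
        double acc = r[i];
        for (size_t j = 1; j < i; j++)
            acc += a[j] * r[i - j];
        double k = -acc / err;
        /* order update, using a copy of the order-(i-1) coefficients */
        for (size_t j = 0; j <= i; j++) prev[j] = a[j];
        for (size_t j = 1; j < i; j++)
            a[j] = prev[j] + k * prev[i - j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return err;
}
```

With p = 10, as suggested above, the caller passes r[0..10] and receives a[0..10]; the recursion costs O(p^2) operations instead of the O(p^3) of a general linear solve.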
6. Cepstral Coefficients
• Cepstrum is useful in separating the source
and filter
• Cepstral coefficients are a very compact
representation of the spectral envelope and
are largely uncorrelated
• LP filter coefficients are highly sensitive to
numerical precision
• Better to transform LP coefficients into
cepstral coefficients[5]
Cepstral analysis of the source-filter model:
(a) DFT, (b) log magnitude of the DFT, (c) IDFT
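The figure computes the cepstrum through DFT, log magnitude, and IDFT. For an all-pole model, cepstral coefficients can instead be computed directly from the LP coefficients with a well-known recursion and no DFTs, sketched below in C. The function name, the A(z) = 1 + Σ a[k] z^-k sign convention (matching a Levinson-Durbin output with a[0] = 1), and leaving out the gain term c[0] are assumptions; this is not the project's generated code.

```c
#include <stddef.h>

/* Convert all-pole coefficients a[0..p] (a[0] = 1, A(z) = 1 + sum a[k] z^-k)
 * into cepstral coefficients c[1..nc] of the model H(z) = 1 / A(z):
 *   c[n] = -a[n] - sum_{k=1}^{n-1} (k/n) c[k] a[n-k],
 * where a[m] is taken as 0 for m > p. c must hold nc + 1 entries;
 * c[0] (log gain) is not modeled here and is set to 0. */
void lpc_to_cepstrum(const double *a, size_t p, double *c, size_t nc) {
    c[0] = 0.0;
    for (size_t n = 1; n <= nc; n++) {
        double acc = (n <= p) ? -a[n] : 0.0;
        for (size_t k = 1; k < n; k++) {
            size_t m = n - k;            /* index into a[] */
            if (m <= p)
                acc -= ((double)k / (double)n) * c[k] * a[m];
        }
        c[n] = acc;
    }
}
```

As a check, a single pole A(z) = 1 - 0.5 z^-1 should give c[n] = 0.5^n / n, which the recursion reproduces; it also yields more cepstral coefficients than LP coefficients if nc > p.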
7. Support Vector Machine Learning
• Support Vector Machine (SVM) is a supervised
learning algorithm used for classification and
regression
• We use a multi-class Support Vector Machine
• Our algorithm uses the one-against-one method to
construct k(k-1)/2 classifiers (k = number of
classes), one SVM for each pair of classes; for the
10 digits this is 45 classifiers
• LIBSVM, an integrated software library for support
vector classification (including multi-class), is used[6]
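The one-against-one scheme can be illustrated with the voting step below, a C sketch assuming a generic pairwise decision callback. LIBSVM performs this voting internally for multi-class classification, so the project would not need to write it; ovo_predict is a hypothetical name and toy_decide is a hypothetical stand-in for a trained pairwise SVM.

```c
#include <stddef.h>

#define NUM_CLASSES 10   /* digits "zero" through "nine" */

/* Pairwise decision: returns nonzero if the (i, j) classifier votes for
 * class i, zero if it votes for class j. */
typedef int (*pair_decision_fn)(size_t i, size_t j, const void *features);

/* Toy stand-in for a trained pairwise SVM: features points to a single
 * double, and the classifier votes for whichever class index is closer. */
static int toy_decide(size_t i, size_t j, const void *features) {
    double x = *(const double *)features;
    double di = x - (double)i, dj = x - (double)j;
    return di * di <= dj * dj;
}

/* One-against-one voting: k(k-1)/2 pairwise classifiers each cast one
 * vote, and the class with the most votes wins (ties go to the lower
 * index). Stores the number of classifiers evaluated in *n_evaluated. */
size_t ovo_predict(pair_decision_fn decide, const void *features,
                   size_t *n_evaluated) {
    unsigned votes[NUM_CLASSES] = {0};
    size_t evaluated = 0;
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        for (size_t j = i + 1; j < NUM_CLASSES; j++) {
            if (decide(i, j, features)) votes[i]++;
            else votes[j]++;
            evaluated++;
        }
    }
    size_t best = 0;
    for (size_t c = 1; c < NUM_CLASSES; c++)
        if (votes[c] > votes[best]) best = c;
    if (n_evaluated) *n_evaluated = evaluated;
    return best;
}
```

With k = 10 the loops evaluate 10 · 9 / 2 = 45 classifiers per utterance, each trained only on examples of its two classes, which keeps every individual SVM small.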
9. Rejected Methods
• Classification based on correlation of cepstral coefficients
• Chose the library word with the maximum correlation to the new signal
• Not robust to small variations, and did not scale well
• Classification using SVM on CRM database
• Words in the database were cut off early or contaminated by other words
• Recording conditions did not match our method
10. C5515: Vocalization Identification
• Implemented word vs. non-word
identification
• Grab a frame of 256 samples, compute
the RMS of the frame, and compare it to a threshold
• If RMS > threshold
• Accumulate frame data
• Else if RMS < threshold and frames
acquired > 3
• Compute autocorrelation
• Transmit data
• Else
• Reset stored data
• Specific values were determined experimentally
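The gating logic above can be sketched in C as a small per-frame state machine. It uses double arithmetic for readability on a desktop compiler; the C5515 code would more likely use fixed-point. The names, the enum, and comparing the mean square against the squared threshold (which avoids a square root) are assumptions. Buffering the frames and computing the autocorrelation are left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN  256   /* samples per frame, from the slide */
#define MIN_FRAMES 3     /* a word must span more than 3 loud frames */

typedef enum { VAD_ACCUMULATE, VAD_WORD_DONE, VAD_RESET } vad_action;

typedef struct {
    size_t frames_acquired;   /* loud frames stored for the current word */
} vad_state;

/* Mean square of one frame. For a non-negative threshold, comparing it
 * with threshold^2 is equivalent to comparing the RMS with threshold. */
static double frame_mean_square(const int16_t *frame) {
    double acc = 0.0;
    for (size_t i = 0; i < FRAME_LEN; i++)
        acc += (double)frame[i] * (double)frame[i];
    return acc / FRAME_LEN;
}

/* One step per frame. The caller stores the frame on VAD_ACCUMULATE,
 * computes and transmits the autocorrelation on VAD_WORD_DONE, and
 * discards stored frames on VAD_RESET. */
vad_action vad_step(vad_state *s, const int16_t *frame, double rms_threshold) {
    double ms = frame_mean_square(frame);
    double th2 = rms_threshold * rms_threshold;
    if (ms > th2) {
        s->frames_acquired++;
        return VAD_ACCUMULATE;
    } else if (ms < th2 && s->frames_acquired > MIN_FRAMES) {
        s->frames_acquired = 0;   /* word finished; start fresh next time */
        return VAD_WORD_DONE;
    } else {
        s->frames_acquired = 0;   /* too short to be a word */
        return VAD_RESET;
    }
}
```

Requiring more than three loud frames (over 768 samples) before a quiet frame filters out clicks and short noise bursts that would otherwise be sent to the classifier.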
11. C5515: UART Transmission
• Transmit Autocorrelation Coefficients
• UART settings: 115200 baud, 8 data bits,
no parity, 1 stop bit (8N1)
• Data is signed 16-bit
• Bit masking and reconstruction
on the Raspberry Pi
• BlueSMiRF pipes UART data over Bluetooth
• Abstracts wireless transmission
• Looks like a UART to the microcontroller
• Effectively plug-and-play
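Sending signed 16-bit values over an 8-bit UART means splitting each value into two bytes and reassembling it on the receiver. Below is a portable C sketch, assuming high byte first (the byte order actually used is not stated) and hypothetical function names. The C55x compiler treats char as 16 bits, so the sender side on the C5515 would hold each byte in a wider type; this sketch targets a host compiler with 8-bit bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Sender side: split each signed 16-bit value into two bytes,
 * high byte first (assumed order). */
void pack_int16(const int16_t *vals, size_t n, uint8_t *bytes) {
    for (size_t i = 0; i < n; i++) {
        uint16_t u = (uint16_t)vals[i];          /* two's-complement bits */
        bytes[2 * i]     = (uint8_t)((u >> 8) & 0xFF);
        bytes[2 * i + 1] = (uint8_t)(u & 0xFF);
    }
}

/* Receiver side (Raspberry Pi): mask and shift the bytes back together,
 * then reinterpret the 16-bit pattern as signed. The uint16_t -> int16_t
 * conversion is implementation-defined before C23 but yields the
 * two's-complement value on mainstream compilers. */
void unpack_int16(const uint8_t *bytes, size_t n, int16_t *vals) {
    for (size_t i = 0; i < n; i++) {
        uint16_t u = (uint16_t)(((uint16_t)bytes[2 * i] << 8) |
                                (uint16_t)bytes[2 * i + 1]);
        vals[i] = (int16_t)u;
    }
}
```

Because UART delivers a plain byte stream, the receiver also has to stay aligned to byte pairs; a dropped byte would swap high and low halves for every value that follows.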
12. C5515: Major Challenges Faced
• Autocorrelation Coefficient Overflow
• Function generator provided too large a voltage
• Forced the autocorrelation to overflow
• Bit-shifting worked temporarily but reduced data precision, causing
poor classifier performance and threshold variability
• Solution: Switched to a microphone
• BlueSMiRF setup
• Configuring the BlueSMiRF requires sending commands at precise times
• Solution: Implemented a long delay function on the C5515
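A worst-case bound on the zero-lag autocorrelation term illustrates the overflow and why bit-shifting cost precision. This assumes 16-bit signed samples and a 32-bit signed accumulator; the project's actual data types are not stated on the slide.

$$ r[0] = \sum_{n=0}^{N-1} x[n]^2 \le N \,(2^{15})^2 = N \cdot 2^{30} $$

For one frame, N = 256, so the bound is 2^{38}, far beyond the 2^{31} - 1 range of a signed 32-bit integer. Right-shifting every sample by s bits divides each product by 2^{2s}, so fitting the bound requires

$$ N \cdot 2^{30 - 2s} < 2^{31} \quad\Longrightarrow\quad s > \tfrac{1}{2}\left(\log_2 N - 1\right) $$

For N = 256 this gives s > 3.5, so 4 of the 16 bits in each sample are discarded, and accumulating several frames per word pushes s higher still. A lower input level avoids the need to shift at all.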
13. Raspberry Pi: Word Classification
• Implemented all-pole model of speech
for classification
• Computes LPC Coefficients from
Autocorrelation
• Converts LPC Coefficients into Cepstral
Coefficients
• LIBSVM multistage classifier
• Algorithm written in mixed C/C++
• LPC and cepstral functions generated
from Matlab with Matlab Coder
• Wrapper is hand-written
• Waits for autocorrelation input over UART
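On the Raspberry Pi side, the serial port has to be configured to match the C5515 link (115200 baud, 8 data bits, no parity, 1 stop bit). Below is a minimal C sketch using the POSIX termios interface. The function name is hypothetical, and B115200 is a widely supported extension (Linux, BSDs) rather than part of base POSIX.

```c
#include <termios.h>

/* Modify a termios struct (normally filled by tcgetattr on the serial
 * port) for raw 8N1 input/output at 115200 baud.
 * Returns 0 on success, -1 if the speed cannot be set. */
int uart_make_8n1_115200(struct termios *t) {
    /* raw input: no break/parity marking, no CR/NL translation, no XON/XOFF */
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON | IXOFF);
    t->c_oflag &= ~(tcflag_t)OPOST;                  /* raw output */
    /* no echo, no line buffering, no signal characters */
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    /* 8 data bits, no parity, 1 stop bit, receiver on, ignore modem lines */
    t->c_cflag &= ~(tcflag_t)(PARENB | CSTOPB | CSIZE);
    t->c_cflag |= CS8 | CLOCAL | CREAD;
    t->c_cc[VMIN] = 1;    /* read() blocks until at least one byte arrives */
    t->c_cc[VTIME] = 0;   /* no inter-byte timeout */
    if (cfsetispeed(t, B115200) != 0 || cfsetospeed(t, B115200) != 0)
        return -1;
    return 0;
}
```

Typical use: open the serial device, call tcgetattr(fd, &t), apply uart_make_8n1_115200(&t), then tcsetattr(fd, TCSANOW, &t). With VMIN = 1, a blocking read() then waits for the next autocorrelation bytes, which matches the "waits for input" behavior described above.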
14. Raspberry Pi: Actuation
• State Machine implemented
• Displays the infamous EECS 452 Fall 2014 image on the sequence “452”
• Displays a special Raspberry Pi image on “314”
• GPIO array drives an LED binary counter
• Capable of implementing more complex functions
• Coffee machine actuation was planned, but we ran out of time
• Renders graphics using the OpenVG library
• Displays Startup Image
• Displays Digit Image on Classification
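The actuation logic can be sketched as a small sequence detector whose state is the last three recognized digits, plus the bit pattern for the LED counter. The names are hypothetical. The four-bit pattern assumes the counter shows the recognized digit (0 to 9) in binary, and driving the actual GPIO pins and OpenVG rendering are left out.

```c
#include <stdint.h>

typedef enum { SHOW_DIGIT, SHOW_EECS452, SHOW_PI } display_action;

/* Machine state: the last three classified digits (-1 = none yet). */
typedef struct {
    int hist[3];
} seq_state;

void seq_init(seq_state *s) {
    s->hist[0] = s->hist[1] = s->hist[2] = -1;
}

/* Feed one classified digit; returns which image to display. */
display_action seq_step(seq_state *s, int digit) {
    s->hist[0] = s->hist[1];   /* shift history left */
    s->hist[1] = s->hist[2];
    s->hist[2] = digit;
    if (s->hist[0] == 4 && s->hist[1] == 5 && s->hist[2] == 2)
        return SHOW_EECS452;   /* "452" */
    if (s->hist[0] == 3 && s->hist[1] == 1 && s->hist[2] == 4)
        return SHOW_PI;        /* "314" */
    return SHOW_DIGIT;
}

/* Bit pattern for a 4-LED binary counter: LED k is lit when bit k is set. */
uint8_t led_pattern(int digit) {
    return (uint8_t)(digit & 0x0F);
}
```

Keeping only a three-digit history makes adding another trigger sequence a matter of one more comparison, which fits the note that more complex functions could be added.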
15. Raspberry Pi: Major Challenges Faced
• Initially planned to implement the code as a Simulink model
• Worked well for the algorithm
• Did not work well for I/O
• S-Functions are tricky to work with
• Solution
• Generate code for the core algorithm
• Hand-write the wrapper
• Matlab Coder Toolbox
• Converts Matlab code into ANSI C code, with processor-specific
optimizations available
• Extremely useful for complex algorithms
• Very finicky to configure properly
• Solution: Study, study, study