Candidate Name Recognition
Roshaan Siddiqui and Oliver Diamond
The Task
● Speech-to-text software fails to correctly transcribe candidate names
● Can names be identified directly from audio files?
Main Challenges
● Instances originating from different speakers
● Lack of data for uncommon names
Method
1. Extract and label known instances of names from dataset
2. Build CNN-based name recognition model
3. Identify phonetically similar instances from transcription
4. Run model on the identified similar instances
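Steps 3 and 4 above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Word` record, the 0.6 spelling-similarity threshold, and the `classify` callable (standing in for the trained CNN) are hypothetical and do not come from the project. Plain spelling similarity from `difflib` is used here in place of a phonetic comparison to keep the sketch short.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple


@dataclass
class Word:
    """One transcribed word and its position in the audio, in seconds (hypothetical record)."""
    text: str
    start: float
    end: float


def find_similar_words(words: Sequence[Word], name: str, threshold: float = 0.6) -> List[Word]:
    """Step 3: keep transcript words whose spelling is close to the candidate name."""
    target = name.lower()
    return [
        w for w in words
        if SequenceMatcher(None, w.text.lower(), target).ratio() >= threshold
    ]


def recognize_names(
    words: Sequence[Word],
    names: Sequence[str],
    samples: Sequence[float],
    sample_rate: int,
    classify: Callable[[Sequence[float]], str],
) -> List[Tuple[str, str]]:
    """Steps 3 and 4: flag similar words, then classify the matching audio slice."""
    results = []
    for name in names:
        for w in find_similar_words(words, name):
            segment = samples[int(w.start * sample_rate):int(w.end * sample_rate)]
            results.append((w.text, classify(segment)))
    return results


# Usage with a placeholder classifier (assumption: the real one would be the trained CNN).
transcript = [Word("president", 0.0, 0.5), Word("bidden", 0.5, 1.0), Word("spoke", 1.0, 1.2)]
audio = [0.0] * 1200  # 1.2 s of silence at a 1 kHz sample rate
result = recognize_names(transcript, ["Biden"], audio, 1000, lambda segment: "Unknown")
print(result)
```

Only the mistranscribed word "bidden" is passed to the classifier, so the model runs on a handful of short audio slices rather than on the whole recording.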
Training a Convolutional Neural Network (CNN)
Training: Labeled Data (e.g. Biden) → Converts to Spectrogram → Learns Features → Trained Model
Use-Case: Phonetically Similar Instance → Trained Model → Trump/Biden/Unknown
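The spectrogram conversion that feeds the CNN can be illustrated with a short-time Fourier transform. The sketch below is dependency-free on purpose; in practice a library such as NumPy or librosa would likely be used instead. The frame length, hop size, Hann window, and test tone are all assumptions, not settings from the project.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Hann window of length n, used to reduce spectral leakage at frame edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples: Sequence[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Return a magnitude spectrogram: one row per frame, frame_len // 2 + 1 bins per row."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precompute the DFT basis once; bin k corresponds to k * sample_rate / frame_len Hz.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(n_bins)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(x * b for x, b in zip(frame, row))) for row in basis])
    return frames


# Usage: a 1 kHz tone sampled at 8 kHz should peak in bin 1000 * 64 / 8000 = 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = spectrogram(tone)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), len(spec[0]), set(peak_bins))
```

The resulting two-dimensional array (time by frequency) is what a CNN would treat as an image-like input.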
Next Steps
1. Build CNN-based name recognition model for Joe Biden and Donald Trump and test its accuracy.
2. Create new data instances by adding background noise to existing name instances, and use Google Text-to-Speech to generate more.
3. Gather audio data for uncommon candidate names and retrain the CNN-based name recognition model.
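The noise-based augmentation in step 2 could look roughly like the sketch below. It mixes white Gaussian noise into a clip at a chosen signal-to-noise ratio; white noise is an assumption made for simplicity, and recorded background noise could be scaled and mixed the same way. The function names and SNR values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def add_noise(signal: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Return a copy of signal mixed with white Gaussian noise at the requested SNR (dB)."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scale the noise so that rms(signal) / rms(scaled noise) == 10 ** (snr_db / 20).
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    scale = target_noise_rms / rms(noise)
    return [s + scale * n for s, n in zip(signal, noise)]


# Usage: produce several noisy variants of one clean name instance.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * n / 400) for n in range(400)]
variants = [add_noise(clean, snr_db, rng) for snr_db in (20.0, 10.0, 5.0)]
print(len(variants), len(variants[0]))
```

Each clean instance yields one new training example per SNR level, which is one way to stretch a small set of recordings for uncommon names.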
Questions?
Additional Information
Metaphone Algorithm + Text Similarity Algorithm
Correct Candidate Spelling Encoding
100%: Direct Match
100%: Direct Match
85%: Possible Match
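The idea of combining a phonetic encoding with a text similarity score can be sketched as follows. Note the substitutions: `phonetic_key` is a crude stand-in and does not implement the real Metaphone rules, and `difflib.SequenceMatcher` is used as the similarity measure because the slides do not say which one was used. The 0.8 cutoff for a possible match and the "No Match" label are assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Tuple


def phonetic_key(word: str) -> str:
    """Crude phonetic key (a stand-in, NOT the real Metaphone algorithm)."""
    w = re.sub(r"[^a-z]", "", word.lower())
    for pattern, repl in (("ph", "f"), ("ck", "k"), ("c", "k"), ("q", "k"), ("z", "s"), ("y", "i")):
        w = w.replace(pattern, repl)
    w = re.sub(r"(.)\1+", r"\1", w)  # collapse doubled letters
    if not w:
        return ""
    # Keep the first letter, drop the remaining vowels.
    return (w[0] + re.sub(r"[aeiou]", "", w[1:])).upper()


def match_label(transcribed: str, candidate: str, possible_cutoff: float = 0.8) -> Tuple[float, str]:
    """Score two words by the similarity of their phonetic keys and label the result."""
    score = SequenceMatcher(None, phonetic_key(transcribed), phonetic_key(candidate)).ratio()
    if score == 1.0:
        return score, "Direct Match"
    if score >= possible_cutoff:  # hypothetical cutoff
        return score, "Possible Match"
    return score, "No Match"


# Usage: compare transcribed words against the correct candidate spelling.
for heard, name in (("Bidden", "Biden"), ("Trumpet", "Trump"), ("Bitten", "Biden")):
    score, label = match_label(heard, name)
    print(f"{heard} vs {name}: {score:.0%} {label}")
```

Words labeled as direct or possible matches are the "phonetically similar instances" whose audio would then be passed to the trained model.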
How Is Our Data Structured?
Donald Trump
● Training Data
● Phonetically Similar Matches
