Improving Speech Recognition with Embodied Cognition and Behaviour-based Robotics
1. Improving Speech Recognition with Embodied Cognition and Behaviour-based Robotics
Jorge Davila-Chacon
University of Hamburg - Knowledge Technology
www.informatik.uni-hamburg.de/WTM/
Spotify ML Meetup – November 3rd, 2014
4. Virtual Reality Lab
Jorge Davila-Chacon Bio-Inspired SSL for Robot ASR 4
Bauer, J., Dávila-Chacón, J., Strahl, E., Wermter, S. Smoke and Mirrors — Virtual Realities for Sensor Fusion Experiments in Biomimetic
Robotics. In: Multisensor Fusion and Integration for Intelligent Systems, 2012
6. Bio-Inspired Sound Source Localisation
Spatial cues allow sound source localisation:
• Interaural Time Difference (ITD)
• Interaural Level Difference (ILD)
[Figure labels: ITDs from Low Frequencies; ILDs from High Frequencies; Same frequency component]
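As a rough illustration of the two cues on this slide, the sketch below estimates an ITD as the lag that maximises the cross-correlation of the left and right signals, and an ILD as the level ratio in decibels. The function names, the toy pulse, the sample rate and the sign conventions are assumptions made for this example and are not taken from the talk. The toy also skips the split into low and high frequency bands.

```python
import math


def estimate_itd(left, right, max_lag, sample_rate):
    """Estimate the ITD in seconds from two equally long ear signals.

    The estimate is the lag (in samples) that maximises the cross-correlation,
    divided by the sample rate. Positive values mean the sound reached the
    left ear first (assumed sign convention).
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def estimate_ild(left, right):
    """Estimate the ILD in dB; positive values mean the left ear is louder."""
    rms_left = math.sqrt(sum(x * x for x in left) / len(left))
    rms_right = math.sqrt(sum(x * x for x in right) / len(right))
    return 20.0 * math.log10(rms_left / rms_right)


# Toy input (hypothetical): a smooth pulse that reaches the right ear
# 4 samples later and at half the amplitude.
SAMPLE_RATE = 16000
DELAY = 4
left_signal = [math.exp(-((i - 40) ** 2) / 50.0) for i in range(100)]
right_signal = [0.0] * DELAY + [0.5 * x for x in left_signal[:-DELAY]]

itd = estimate_itd(left_signal, right_signal, max_lag=10, sample_rate=SAMPLE_RATE)
ild = estimate_ild(left_signal, right_signal)
print(f"ITD = {itd * 1e6:.1f} microseconds, ILD = {ild:.2f} dB")
```

With this toy input the left ear leads and is louder, so both estimates come out positive.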
7. Interaural Time Differences - Neuroanatomy
ITDs extracted in Medial Superior Olive (MSO)
• AVCN - Anterior Ventral Cochlear Nucleus
• AN - Auditory Nerve
• IC - Inferior Colliculus
8. Interaural Time Differences - Computational Principle
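The slide names the computational principle without spelling it out in text. One classic account of ITD extraction in the MSO is the Jeffress model: an array of coincidence detectors each receives the two ear signals through a different internal delay, and the detector whose delay compensates the ITD responds most. The sketch below assumes that model. The spike times, the candidate delays, the coincidence window and all names are hypothetical.

```python
def coincidence_counts(left_spikes, right_spikes, delays, window):
    """Count coinciding spike pairs for each candidate internal delay.

    A detector with delay d counts a coincidence when a left spike, shifted
    by d, falls within `window` of a right spike. All times share one unit
    (microseconds in the example below).
    """
    counts = []
    for delay in delays:
        hits = 0
        for t_left in left_spikes:
            for t_right in right_spikes:
                if abs((t_left + delay) - t_right) <= window:
                    hits += 1
        counts.append(hits)
    return counts


def best_delay(left_spikes, right_spikes, delays, window):
    """Return the delay of the most active detector (first one on ties)."""
    counts = coincidence_counts(left_spikes, right_spikes, delays, window)
    return delays[counts.index(max(counts))]


# Toy spike trains (hypothetical): the right ear lags by 300 microseconds.
left_times = [1000, 3000, 5200, 8000]
right_times = [t + 300 for t in left_times]
candidate_delays = [-600, -300, 0, 300, 600]

counts = coincidence_counts(left_times, right_times, candidate_delays, window=50)
winner = best_delay(left_times, right_times, candidate_delays, window=50)
print(counts, winner)
```

Only the detector whose delay matches the interaural lag collects coincidences here, which is the sense in which the array maps a time difference onto a place code.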
9. Interaural Level Differences - Neuroanatomy
ILDs extracted in Lateral Superior Olive (LSO)
• MNTB - Medial Nucleus of the Trapezoid Body
• IC - Inferior Colliculus
10. Bio-Inspired Sound Source Localisation
Output of MSO and LSO integrated in IC
J. Dávila-Chacón, S. Heinrich, J. Liu, S. Wermter. Biomimetic Binaural Sound Source Localisation with Ego-Noise Cancellation.
International Conference on Artificial Neural Networks, 2012.
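One simple way to picture this integration is to treat the MSO and LSO outputs as independent evidence about the source angle and multiply them, which is a naive Bayes combination. This is only an assumed sketch. The candidate angles, the likelihood values and the function name are invented for illustration and do not reproduce the model from the cited paper.

```python
def integrate_cues(mso_likelihood, lso_likelihood, prior=None):
    """Combine per-angle likelihoods from two cues into a posterior.

    Assumes the two cues are conditionally independent given the angle.
    With no prior supplied, a uniform prior over the angles is used.
    """
    n = len(mso_likelihood)
    if len(lso_likelihood) != n:
        raise ValueError("both cues must cover the same candidate angles")
    if prior is None:
        prior = [1.0 / n] * n
    unnormalised = [p * m * l for p, m, l in zip(prior, mso_likelihood, lso_likelihood)]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("the cues assign zero probability to every angle")
    return [u / total for u in unnormalised]


# Hypothetical candidate angles (degrees) and made-up cue likelihoods.
angles = [-90, -45, 0, 45, 90]
mso = [0.05, 0.10, 0.20, 0.45, 0.20]
lso = [0.05, 0.05, 0.15, 0.40, 0.35]

posterior = integrate_cues(mso, lso)
best_angle = angles[posterior.index(max(posterior))]
print(best_angle)
```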
11. Bio-Inspired Sound Source Localisation
12. Bio-Inspired Sound Source Localisation
13. Bio-Inspired Sound Source Localisation
14. Bio-Inspired Sound Source Localisation
[Figure labels: MLP; IC]
15. Bio-Inspired Sound Source Localisation
J. Dávila-Chacón, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues for Sound Source Localisation.
International Joint Conference on Neural Networks, 2013.
16. Bio-Inspired Sound Source Localisation
17. Bio-Inspired Sound Source Localisation
Simple IC output
18. Bio-Inspired Sound Source Localisation
Complex IC output
20. Robotic Automatic Speech Recognition
Platforms used for ASR: iCub and Soundman
21. Robotic Automatic Speech Recognition
Binary measure - Static ASR
J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation.
International Conference on Artificial Neural Networks, 2014.
22. Robotic Automatic Speech Recognition
Continuous measure - Static ASR
J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation.
International Conference on Artificial Neural Networks, 2014.
23. Conclusion
● Robotics as a “sandbox” for learning ML
● Neuroscience provides clues for computational principles
● Embodiment
• iCub allows computation of spatial cues
• Interaction with environment can reduce noise
● Signal processing with ANN
• Spiking ANNs are an effective representation of spatial cues
• Bayesian integration is important for dimensionality reduction
• Softmax neural layer is robust to ego-noise and reverberation
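For readers unfamiliar with the softmax layer mentioned in the conclusion, the sketch below shows the standard softmax function, which turns raw output activations into a probability distribution over candidate source angles. The activations and angles are made-up values, and the network used in the talk is not reproduced here.

```python
import math


def softmax(activations):
    """Map raw activations to probabilities that sum to 1.

    Subtracting the maximum first keeps math.exp from overflowing
    without changing the result.
    """
    peak = max(activations)
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output activations, one per candidate angle in degrees.
candidate_angles = [-90, -45, 0, 45, 90]
raw_outputs = [0.2, 0.1, 1.5, 3.0, 0.4]

probabilities = softmax(raw_outputs)
predicted_angle = candidate_angles[probabilities.index(max(probabilities))]
print(predicted_angle)
```

Because the outputs are normalised against each other, a disturbance that raises all activations by the same amount leaves the distribution unchanged.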
24. Future Work
● Neural SSL
• Integrate GPU version of MSO and LSO
• Propagation of probabilities through time
• From discrete to continuous
● Integration with vision
• From supervised to unsupervised SSL
• Possible extension to sensorimotor contingencies
• Vision to select between multiple sound sources
• Vision for speech segregation
25. Thank you for your attention.
jorgedch@gmail.com
LinkedIn: Jorge Davila Chacon
• J. Liu, D. Perez-Gonzalez, A. Rees, H. Erwin, S. Wermter. A biologically inspired spiking neural
network model of the auditory midbrain for sound source localisation. Neurocomputing (2010)
• J. Davila-Chacon, S. Heinrich, J. Liu, S. Wermter. Biomimetic Binaural Sound Source
Localisation with Ego-Noise Cancellation. International Conference on Artificial Neural Networks
(2012)
• J. Bauer, J. Davila-Chacon, E. Strahl, S. Wermter. Smoke and Mirrors — Virtual Realities for
Sensor Fusion Experiments in Biomimetic Robotics. Multisensor Fusion and Integration for
Intelligent Systems (2012)
• J. Davila-Chacon, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues
for Sound Source Localisation. International Joint Conference on Neural Networks (2013)
• J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition
with Sound Source Localisation. International Conference on Artificial Neural Networks (2014)
30. Appendix
Levenshtein distance
J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation.
International Conference on Artificial Neural Networks, 2014.
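The appendix names the Levenshtein distance: the minimum number of insertions, deletions and substitutions needed to turn one sequence into another. A minimal sketch of the standard dynamic-programming algorithm follows. It works on strings or on lists of words, and the example sentences are invented rather than taken from the experiments.

```python
def levenshtein(a, b):
    """Return the edit distance between two sequences.

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(b).
    """
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Character-level and word-level examples (hypothetical sentences).
char_distance = levenshtein("kitten", "sitting")
reference = "put the red ball down".split()
hypothesis = "put a red ball".split()
word_distance = levenshtein(reference, hypothesis)
print(char_distance, word_distance)
```

Applied to word lists, the distance counts word-level recognition errors, so it can grade a hypothesis that is close to the reference instead of only marking it right or wrong.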