Improving Speech Recognition
with Embodied Cognition
and Behaviour-based Robotics
Jorge Davila-Chacon
University of Hamburg - Knowledge Technology
www.informatik.uni-hamburg.de/WTM/
Spotify ML Meetup – November 3rd, 2014
Motivation
Jorge Davila-Chacon Bio-Inspired SSL for Robot ASR 2
• Why is bio-inspired SSL interesting / useful?
Neurobotic Experiments
Virtual Reality Lab
Bauer, J., Dávila-Chacón, J., Strahl, E., Wermter, S. Smoke and Mirrors — Virtual Realities for Sensor Fusion Experiments in Biomimetic
Robotics. In: Multisensor Fusion and Integration for Intelligent Systems, 2012
Neurobotic Experiments
Bio-Inspired Sound Source Localisation
Spatial cues allow sound source localisation:
• Interaural Time Difference (ITD): ITDs from low frequencies
• Interaural Level Difference (ILD): ILDs from high frequencies
Figure labels: ITD, ILD, same frequency component
Interaural Time Differences: Neuroanatomy
ITDs extracted in Medial Superior Olive (MSO)
• AVCN - Anterior Ventral Cochlear Nucleus
• AN - Auditory Nerve
• IC - Inferior Colliculus
Interaural Time Differences: Computational Principle
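The figure for this slide did not survive extraction. One standard computational account of ITD extraction, which may or may not match what the slide showed, is coincidence detection over a bank of internal delays. That amounts to finding the lag that maximises the cross-correlation of the two ear signals. The sketch below works under that assumption; `estimate_itd`, the 16 kHz sampling rate, the 1 ms search window, and the synthetic noise signal are illustrative choices, not values from the talk.

```python
import math
import random


def estimate_itd(left, right, sample_rate, max_itd_s=1e-3):
    """Return the lag in seconds that best aligns `right` with `left`.

    A positive value means the sound reached the left ear first.
    """
    max_lag = int(round(max_itd_s * sample_rate))
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # One "coincidence detector" per candidate delay:
        # correlate left[n] with right[n + lag] away from the signal edges.
        score = sum(left[n] * right[n + lag]
                    for n in range(max_lag, len(left) - max_lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Synthetic test signal: the right ear hears the same noise 5 samples later.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(2005)]
left = source[5:]
right = source[:-5]
itd = estimate_itd(left, right, sample_rate=16000)
```

With broadband noise the correlation peak is unambiguous. For narrowband high-frequency signals several lags can score equally well, which is one reason ITDs are usually taken from low frequencies.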
Interaural Level Differences: Neuroanatomy
ILDs extracted in Lateral Superior Olive (LSO)
• MNTB - Medial Nucleus of the Trapezoid Body
• IC - Inferior Colliculus
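As a signal-level illustration of the cue itself, an interaural level difference can be written as the ratio of the signal levels at the two ears, expressed in decibels. The sketch below assumes the inputs are already band-limited to a high-frequency channel. The function name `ild_db` and the `eps` guard are illustrative, and this is not the spiking LSO model used in the talk.

```python
import math


def ild_db(left, right, eps=1e-12):
    """Level difference in dB; positive when the left ear signal is louder."""
    rms_left = math.sqrt(sum(x * x for x in left) / len(left))
    rms_right = math.sqrt(sum(x * x for x in right) / len(right))
    # eps avoids division by zero and log of zero for silent inputs.
    return 20.0 * math.log10((rms_left + eps) / (rms_right + eps))


# Hypothetical example: the right ear signal has half the amplitude.
left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]
difference = ild_db(left, right)
```

Halving the amplitude corresponds to a difference of roughly 6 dB.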
Bio-Inspired Sound Source Localisation
Output of MSO and LSO integrated in IC
J. Dávila-Chacón, S. Heinrich, J. Liu, S. Wermter. Biomimetic Binaural Sound Source Localisation with Ego-Noise Cancellation.
International Conference on Artificial Neural Networks, 2012.
Figure labels: MLP, IC, IC
J. Dávila-Chacón, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues for Sound Source Localisation.
International Joint Conference on Neural Networks, 2013.
Simple IC output
Complex IC output
Static SSL
Dynamic SSL
Feed-forward neural network
Robotic Automatic Speech Recognition
Platforms used for ASR: iCub and Soundman
J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation.
International Conference on Artificial Neural Networks, 2014.
Binary measure - Static ASR
Continuous measure - Static ASR
Conclusion
● Robotics as a “sandbox” for learning ML
● Neuroscience provides clues for computational principles
● Embodiment
• iCub allows computation of spatial cues
• Interaction with the environment can reduce noise
● Signal processing with ANN
• Spiking ANNs are an effective representation of spatial cues
• Bayesian integration is important for dimensionality reduction
• Softmax neural layer is robust to ego-noise and reverberation
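As a reminder of what a softmax output layer computes, the sketch below turns a vector of per-angle activations into a probability distribution and picks the most likely angle. The candidate angles and activation values are invented for illustration. The network that produces the activations in the talk is not reproduced here.

```python
import math


def softmax(activations):
    """Map real-valued activations to probabilities that sum to 1."""
    peak = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


angles_deg = [-90, -45, 0, 45, 90]        # hypothetical candidate azimuths
activations = [0.1, 0.4, 2.3, 1.1, 0.2]   # hypothetical output-layer activations
probabilities = softmax(activations)
estimate = angles_deg[probabilities.index(max(probabilities))]
```

Because the output is normalised across all candidate angles, a disturbance that raises every activation by a similar amount leaves the ranking unchanged.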
Future Work
● Neural SSL
• Integrate GPU version of MSO and LSO
• Propagation of probabilities through time
• From discrete to continuous
● Integration with vision
• From supervised to unsupervised SSL
• Possible extension to sensorimotor contingencies
• Vision to select between multiple sound sources
• Vision for speech segregation
Thank you for your attention.
jorgedch@gmail.com
LinkedIn: Jorge Davila Chacon
• J. Liu, D. Perez-Gonzalez, A. Rees, H. Erwin, S. Wermter. A biologically inspired spiking neural
network model of the auditory midbrain for sound source localisation. Neurocomputing (2010)
• J. Davila-Chacon, S. Heinrich, J. Liu, S. Wermter. Biomimetic binaural sound source
localisation with ego-noise cancellation. International Conference on Artificial Neural Networks
(2012)
• J. Bauer, J. Davila-Chacon, E. Strahl, S. Wermter. Smoke and Mirrors — Virtual Realities for
Sensor Fusion Experiments in Biomimetic Robotics. Multisensor Fusion and Integration for
Intelligent Systems (2012)
• J. Davila-Chacon, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues
for Sound Source Localisation. International Joint Conference on Neural Networks (2013)
• J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition
with Sound Source Localisation. International Conference on Artificial Neural Networks (2014)
Appendix: Best performances with clustering layer
Appendix: Bayesian IC model
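The figures for the Bayesian IC model are not recoverable from this transcript. As a generic illustration of Bayesian cue integration, and not the model from the cited papers, the sketch below combines per-angle likelihoods from an ITD channel and an ILD channel, assuming the two cues are conditionally independent given the source angle. All names and numbers are hypothetical.

```python
def posterior_over_angles(prior, itd_likelihood, ild_likelihood):
    """Combine two cue likelihoods with a prior via Bayes' rule."""
    unnormalised = [p * a * b
                    for p, a, b in zip(prior, itd_likelihood, ild_likelihood)]
    evidence = sum(unnormalised)  # normalising constant
    return [u / evidence for u in unnormalised]


angles_deg = [-90, -45, 0, 45, 90]                # hypothetical candidate azimuths
prior = [0.2] * 5                                 # uniform prior over angles
itd_likelihood = [0.05, 0.10, 0.50, 0.30, 0.05]   # hypothetical P(ITD cue | angle)
ild_likelihood = [0.05, 0.05, 0.25, 0.55, 0.10]   # hypothetical P(ILD cue | angle)
posterior = posterior_over_angles(prior, itd_likelihood, ild_likelihood)
best_angle = angles_deg[posterior.index(max(posterior))]
```

In this example the ITD channel alone favours 0 degrees, while the combined posterior favours 45 degrees because the ILD evidence for that angle is stronger.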
Appendix: Levenshtein distance
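The Levenshtein (edit) distance counts the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another, so it can serve as a graded measure of how far a recognised sentence is from the reference. Below is a minimal dynamic-programming sketch. Applying it at the word level, as done here, is an assumption about how it might be used for ASR scoring, and the example sentences are invented.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


reference = "put the ball on the table".split()
hypothesis = "put a ball on table".split()
distance = levenshtein(reference, hypothesis)
```

Unlike a binary correct/incorrect score, this distance still gives partial credit to a hypothesis that gets most of the sentence right.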
J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation.
International Conference on Artificial Neural Networks, 2014.
