Vivek Kumar
Director, Technology Incubation
Dolby Laboratories
@vivek_kumar
When Deep Learning Takes
On Signal Processing
• Introduction: Signal Processing & Deep Learning
• Toy Example: two ways of solving noise reduction
• Some Learnings & Takeaways
Agenda
Introduction
… conveying information about the behavior or
attributes of some phenomenon such as sound,
images, and biological measurements
Roland Priemer (1991). Introductory Signal Processing.
Signals
Analysis, synthesis, and modification of signals.
e.g. to improve signal transmission fidelity, storage
efficiency and to emphasize or detect components of
interest in a measured signal
Signal Processing
Using digital operations to perform signal processing
operations
Digital Signal Processing
https://en.wikipedia.org/wiki/Digital_signal_processing
Image By en:User:Cburnett [CC-BY-SA-3.0], via Wikimedia Commons
Speech & Audio
• speech recognition/synthesis, digital audio,
equalization…
Image & Video
• enhancement, compression, Computer Vision…
Communications
• Bluetooth, Wi-Fi, cellular/mobile phones, digital
television …
Other
• radar processing, sonar processing, ECG
analysis, X-ray analysis, EEG brain mappers,
consumer electronics…
Applications of DSP
[Diagram: concentric circles – Artificial Intelligence, containing Machine Learning, containing Deep Learning]
Toy Example: Noise Cancellation
Two Stories
Noise Cancellation: State of the Art
https://medium.com/@2hz_ai
Toy Example: Noise Cancellation
http://www.labbookpages.co.uk/audio/beamforming/delaySum.html
Digital Signal Processing - Beamforming
A Wavenet for Speech Denoising - Dario Rethage, Jordi Pons & Xavier Serra
Wavenet
Speech denoising - Wavenet
A Wavenet for Speech Denoising - Dario Rethage, Jordi Pons & Xavier Serra
• Signal Processing
• Understanding the signals – first principles
• Deep Learning
• Understanding the high-level intention – reframing the problem
Different Approaches
Learnings
Image - Surpassing conventional CV
• Image Recognition: 2012, 2015 (better than humans)
• Image Generation: 2015
• Image Post Processing (Super Resolution): 2015
• Image Compression: 2017
Speech – Surpassing traditional Signal Processing
• Speech Recognition: 2015, 2016 (human parity)
• Speech Synthesis: 2016
• Speech Compression: 2017 (5x–7x over WB-AMR)
Natural Language Processing (NLP/NLU)
• Text Classification
• Sentiment Analysis
• Semantic Analysis
Why Deep Learning – State of the Art
Deep Learning doesn't need domain expertise
• It takes years to develop signal processing expertise.
• e.g. which transform? (MFCC? MDCT? QMF? Wavelets?)
Generalizes to other domains
• Wavenet was derived from PixelCNN (image generation)
Why Deep Learning?
Unclear problem definition
Unclear pass/fail metric
Requires computing power, memory & data
• Lots of data - substantially more than humans
Kind of a black box – hard to draw right conclusions
• Very easy to draw the wrong conclusions.
Why not Deep Learning
Working with Domain Experts
Once upon a
time…
[Chart: ImageNet accuracy rate, 2010–2015 – traditional computer vision vs. deep learning. Source: Nvidia CES 2016 press conference. Annotation: 2012 computer vision paper by Yann LeCun]
"experimental results show a very small improvement … wonder what can be learnt from such study …"
"Without explicit representation for the features, representation and algorithm contributions are limited to such data sets … do not reflect any real world applications."
https://plus.google.com/+YannLeCunPhD/posts/gurGyczzsJ7
Rejected in 2012 because …
[Diagram: performance vs. time – a disruptive technology is first ignored, then hits a flash point as the pace of technological progress accelerates. Inspired by The Innovator's Dilemma by Clayton Christensen]
So working with
experts…
Should you ignore experts ?
Not if you want to do great things
Image © Túrelio (via Wikimedia Commons), 1986 / License: Creative Commons CC-BY-SA-2.0 de
I can do things you cannot,
you can do things I cannot;
together we can do great things.
~ Mother Teresa
Defining the Problem
• Asking the right question
• Set up the right training targets/expectations
• Note: do not prematurely limit the solution
Getting the right dataset and tools to process it.
• e.g. normalize signal levels based on speech activity (see the sketch after this slide)
• A clean (or appropriate) dataset is key to success.
Analyze performance with domain knowledge
• e.g. identify weakness with signal processing tools
• Compare with the state of the art.
Working with Experts
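A hedged sketch of the level-normalization example above, assuming a crude energy-based activity gate (a real pipeline would use a proper voice activity detector; the frame size and target level are illustrative):

    import numpy as np

    def normalize_by_speech_level(x, frame=400, target_rms=0.1):
        """Scale x so that active-speech frames hit target_rms."""
        frames = x[: len(x) // frame * frame].reshape(-1, frame)
        rms = np.sqrt((frames ** 2).mean(axis=1))        # per-frame level
        active = rms > 0.1 * rms.max()                   # crude speech-activity gate
        speech_rms = rms[active].mean()                  # level of active speech only
        return x * (target_rms / (speech_rms + 1e-12))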
System Design vs Component Design
• Deep Learning shines when optimized end-to-end
Errors: graceful degradation vs. catastrophic errors
Initial demo/proof of concept is easy
• Tuning and data wrangling is time-intensive
Knowledge transfers across different domains
Other Takeaways
Conclusion
Have a growth mindset; be a lifelong learner
There is a lot DSP & AI researchers can
learn from each other
Bring in insights from different domains
• Outside the comfort zone – but very powerful
Questions?
@vivek_kumar
In the beginner's mind there are many possibilities;
in the expert's mind there are few.
~ Zen Master Shunryu Suzuki


Editor's Notes

  • #2 Thank you all for coming to this presentation. This topic is really dear to my heart and it's great to see so many people interested in it. I work in the technology incubation group at Dolby Laboratories, and my team and I have been focused on using AI and deep learning to build new experiences, especially in the audio/video media and entertainment space. My journey has not been a straight one. My background is EE (?). I have been working with signal processing for almost 15 years, initially for wireless communication and later for audio and speech. Almost 9 years ago I joined Dolby Laboratories, where I ended up working with some of the best experts on signal processing, learning a lot from them. For the last three years or so I have been using deep learning to replace what we typically used to do with signal processing. This presentation is about that last part of my journey. The change was not just about using new tools, but also about changing the mindset and reframing how technology gets developed. Image courtesy: https://pxhere.com/en/photo/709985
  • #3 A brief introduction to DSP and deep learning so that we are talking the same language. Hopefully it will also help guide and inform other teams planning to go through a similar transition.
  • #5 Signals are essentially ways of representing information. E.g. if you want to get an idea of how temperature varies over a week, you can measure the temperature every hour and plot it against time, as sketched below.
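    The temperature example as a minimal sketch (the hourly readings here are made up; assumes numpy and matplotlib):

        # A signal: some phenomenon (here temperature) sampled over time.
        import numpy as np
        import matplotlib.pyplot as plt

        hours = np.arange(7 * 24)                        # one reading per hour for a week
        temp = 20 + 5 * np.sin(2 * np.pi * hours / 24)   # synthetic daily cycle
        temp += 0.5 * np.random.randn(hours.size)        # measurement noise

        plt.plot(hours, temp)
        plt.xlabel("hour")
        plt.ylabel("temperature (°C)")
        plt.show()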
  • #6 Anything you do to a signal – improving it, enhancing it, or modifying it – becomes signal processing.
  • #7 When we use digital operations to perform those operations. We now do most computing in the digital domain, but the real world consists of analog signals.
  • #8 <5 minutes end of this slide>
  • #9 I know everybody here is familiar with deep learning / machine learning / AI. A good way of visualizing it is to draw concentric circles. The biggest one is AI – computers or machines that mimic human behavior; this has been around forever. Then we have machine learning, which, instead of being explicitly programmed, learns from data. Even though there are multiple kinds of machine learning and we have been using them for decades, all the recent breakthroughs, like image recognition and speech recognition, have been driven by deep learning. ATTRIBUTION: MIND icon created by Delwar Hossain from Noun Project; GEAR icon created by Chrystina Angeline from Noun Project; LAYERS icon created by Shmidt Sergey from Noun Project
  • #10 I will take a toy example of noise cancellation and provide two perspectives on how we would tackle the problem.
  • #11 And when I talk about noise cancellation, I mean the algorithm which runs on your device.
  • #12 The traditional way of doing noise cancellation: the simplest approach is to use a microphone array, where the outputs of the microphones are added. If we have a signal coming from the front, the signals are in phase and get amplified; out-of-phase signals get cancelled. With this we have a focused beam pattern which amplifies signal from the front and attenuates signal from the sides (a minimal code sketch follows).
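    A minimal delay-and-sum sketch in Python (the uniform-linear-array geometry, mic spacing, and sample rate are illustrative assumptions, not values from the talk):

        import numpy as np

        C = 343.0        # speed of sound (m/s)
        FS = 16000       # sample rate (Hz)
        SPACING = 0.05   # distance between adjacent mics (m)

        def delay_and_sum(mics, angle_deg):
            """mics: (n_mics, n_samples) array; steer toward angle_deg off broadside."""
            n_mics, n_samples = mics.shape
            out = np.zeros(n_samples)
            for m in range(n_mics):
                # Arrival delay at mic m relative to mic 0 for the target direction.
                delay = m * SPACING * np.sin(np.deg2rad(angle_deg)) / C
                shift = int(round(delay * FS))
                # Advance the channel so the target direction lines up in phase
                # (np.roll's wrap-around is a simplification for this sketch).
                out += np.roll(mics[m], -shift)
            # In-phase signal from the front adds up; off-axis signals partially cancel.
            return out / n_mics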
  • #13 How many of you are familiar with Wavenet? Wavenet is a speech synthesis network developed by Google which takes in the previous audio samples and predicts the next sample. When the network is conditioned on text, it is able to synthesize speech, essentially creating a very realistic text-to-speech network which is more natural than all other speech synthesis methods (it is used in Google Assistant). The autoregressive loop is sketched below.
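    The autoregressive idea in miniature (a hedged sketch: `model` stands for any next-sample predictor over a (batch, channels, time) tensor; this is not Google's Wavenet implementation):

        import torch

        @torch.no_grad()
        def generate(model, seed, n_samples, context=1024):
            """Predict one sample at a time, feeding each prediction back in."""
            audio = list(seed)
            for _ in range(n_samples):
                window = torch.tensor(audio[-context:]).view(1, 1, -1)
                next_sample = model(window)[0, 0, -1]   # next-sample prediction
                audio.append(float(next_sample))        # feed it back as input
            return audio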
  • #14 So we have a network which can generate clean speech. Now, instead of generating speech from text, we can condition it on noisy speech and teach it to generate clean speech from noisy speech; a minimal training sketch follows.
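    A minimal sketch of that reframing (the tiny dilated-conv stack is a hypothetical stand-in, not the Wavenet architecture from the paper):

        import torch
        import torch.nn as nn

        # Stand-in denoiser: input and output are raw waveforms (batch, 1, samples).
        model = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=1),
        )
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)

        def train_step(noisy, clean):
            """Condition on noisy speech; learn to output the clean speech."""
            denoised = model(noisy)
            loss = (denoised - clean).abs().mean()   # L1 between output and clean target
            opt.zero_grad()
            loss.backward()
            opt.step()
            return loss.item()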
  • #15 Go to first principles: understand the physics behind waves and how they interact with the environment. With deep learning, on the other hand, instead of removing noise from speech or attenuating the noise, we are resynthesizing speech which is very close to the original speech.
  • #17 Sometimes it's very clear when to use deep learning – for the last 5 years it has performed better than what we could hope to achieve with signal processing. But in the noise cancellation example I used earlier, signal processing still performs better than deep learning. So why would somebody use deep learning?
  • #18 To me, a big advantage of deep learning is that it provides an easy way of achieving a goal without becoming a domain expert in the field. Domain expertise is hard and takes years to develop – a PhD or at least a Master's degree to develop these solutions, and even then it's hard to develop a new solution. Which transform to use? MFCCs (Mel-frequency cepstral coefficients), the transform used to extract features for speech recognition, took decades to develop, with leading researchers building on the work of other leading researchers. With deep learning you can do speech recognition on raw audio samples and it works reasonably easily (see the contrast sketch below). The authors of Wavenet (the state-of-the-art speech synthesis network developed by Google) had published PixelCNN, a network for generating images, before working on Wavenet, and were able to quickly (~6 months) take that network and use it for speech synthesis. (Think about it: we have been working on speech synthesis for decades, and a comparatively naive approach was able to beat the state of the art by a huge margin.) There are similar examples of algorithms developed for audio being used in NLP.
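    A sketch of the contrast (assumes the librosa package; "speech.wav" is a placeholder file name):

        import librosa

        y, sr = librosa.load("speech.wav", sr=16000)

        # DSP route: a hand-engineered transform with decades of expertise baked in
        # (which window, which filterbank, how many coefficients, ...).
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)

        # Deep-learning route: hand the raw samples to the network and let it
        # learn its own representation.
        features_for_network = y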
  • #19 Deep learning can solve a problem, but it cannot help you find the question.
  • #20 One of the biggest challenges I had was working with domain experts. So I am going to take a quick detour and tell you a story.
  • #21 And I am going to take a quick detour to
  • #22 ImageNet is the Olympics of image classification: classify millions of images into 1000 categories. With traditional CV, the accuracy was in the mid 70s, with a 1–2% improvement every year. In 2012, Alex Krizhevsky, for the very first time, successfully trained a deep network to solve this problem. Without any handcrafted features, his entry, AlexNet, won the competition. Not only that: the improvement achieved was more than the collective improvement seen in a decade. With deep learning, accuracy continued to increase – superhuman in 2016 – and in 2017 the competition stopped classifying still images. And let's be honest, classifying 13 million images into 1000 categories is not challenging anymore. Interestingly, while deep learning was improving the state of the art by significant margins, papers published by Yann LeCun, the leading authority on deep learning, were being rejected by computer vision publications. Image from http://www.slideshare.net/NVIDIA/nvidia-ces-2016-press-conference
  • #23 Why do you think that was?
  • #24 What I have observed is that in technology and in business, incumbents hardly survive disruption. This is one aspect of what Christensen described in The Innovator's Dilemma: disruptive innovation improving at an exponential rate.
  • #25 So how do you work with experts when the technology they have been working on for decades is being disrupted?
  • #26 If you are a technology company, you have to work with domain experts.
  • #27 Image : https://commons.wikimedia.org/wiki/File:MotherTeresa_094.jpg
  • #28 The key is not to be fixated on any one method, or to look for a problem to fit a solution you are emotionally vested in.
  • #30 Never be entrenched in a tool or practice. Keep an open mind, and always be prepared to drop your tools when something better comes along. Something which helped me was understanding that the learnings from working with a tool are embodied within you, not just in the tool – always be prepared to move on when faced with disruptive innovation. This is what deep learning has enabled me to do: talk to people who work in finance, medicine and …, work with them, and discuss technical problems I never dreamt we would be sharing notes on.
  • #31 At Dolby we believe in enhancing the science of sight and sound to create and enable spectacular experiences – audio, video and speech. We give content creators tools to create immersive audio (Dolby Atmos – surround sound including height), high dynamic range video, or voice, and deliver it in a format which enables those experiences. My team is exploring deep learning in all modalities to improve the experiences people have – either by enabling content creators to create better content or consumers to have a better experience. Since my team is part of technology incubation, it takes some time before a project leaves my team and is productized, and there are several projects using deep learning which I am really excited about.
  • #32 To me the biggest aspect of it is the mindset, and I would like to end with a quote from Zen Master Shunryu Suzuki.
  • #33 SIFT was developed by David Lowe over a period of 10 years, working with his PhD students – an estimated 100 person-years.
  • #34 Two dictionaries, one for speech and one for noise, need to be trained offline. Given a noisy speech signal, we first calculate the magnitude of the short-time Fourier transform (STFT). Second, we separate it into two parts via NMF: one that can be sparsely represented by the speech dictionary and another that can be sparsely represented by the noise dictionary. Third, the part represented by the speech dictionary is the estimated clean speech. A minimal sketch follows.
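    A minimal sketch of this pipeline (assumes numpy/scipy/scikit-learn; dictionary sizes and iteration counts are illustrative, and a full system would add sparsity constraints):

        import numpy as np
        from scipy.signal import stft, istft
        from sklearn.decomposition import NMF

        FS, NPERSEG = 16000, 512

        def mag_phase(x):
            _, _, Z = stft(x, fs=FS, nperseg=NPERSEG)
            return np.abs(Z), np.angle(Z)            # magnitude, phase: (freq, time)

        def train_dictionary(signal, n_atoms=32):
            """Offline step: learn a nonnegative spectral dictionary for one source."""
            mag, _ = mag_phase(signal)
            model = NMF(n_components=n_atoms, init="nndsvda", max_iter=400)
            model.fit(mag.T)                         # rows = time frames, cols = freq bins
            return model.components_                 # (n_atoms, freq) basis spectra

        def infer_activations(V, H, n_iter=200, eps=1e-10):
            """Multiplicative updates for activations W with dictionary H held fixed."""
            W = np.random.rand(V.shape[0], H.shape[0])
            for _ in range(n_iter):
                W *= (V @ H.T) / (W @ H @ H.T + eps)
            return W

        def denoise(noisy, H_speech, H_noise):
            mag, phase = mag_phase(noisy)
            H = np.vstack([H_speech, H_noise])       # stacked speech+noise dictionary
            W = infer_activations(mag.T, H)
            k = H_speech.shape[0]
            speech_mag = (W[:, :k] @ H_speech).T     # the part explained by speech atoms
            _, clean = istft(speech_mag * np.exp(1j * phase), fs=FS, nperseg=NPERSEG)
            return clean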
  • #35 Wavenet is a speech synthesis network developed by Google.
  • #37 In DSP we mostly write explicit rules; in machine learning we instead learn from data.
  • #38 Even though there are multiple kinds of machine learning and we have been using them for decades, all the recent breakthroughs have been driven by deep learning.
  • #39 Before deep learning, traditional machine learning algorithms depended on having the right features to work: the quality of the output depended on the features, and most of the time was spent identifying the right features – SIFT (David Lowe) and HOG for images, or MFCCs for speech – hundreds of people working for years. One big advantage deep learning has over traditional machine learning is that deep learning can work with raw data.