SlideShare a Scribd company logo
1 of 23
Download to read offline
Comparing ML-Based Audio
with ML-Based Vision: An
Introduction to ML Audio
for ML Vision Engineers
Josh Morris
Engineering Manager, Machine Learning
DSP Concepts
The Audio of Things Approach
DSP Concepts helps product makers deliver remarkable Audio Experience through a flexible and modular approach
within a design platform environment. This system makes the entire workflow faster and easier across prototyping,
design, debugging, tuning, production, and even over-the-air updates.
The Audio Experience Design
platform that accelerates feature
development and makes audio
innovation easy
Audio IP
• Building blocks to product-ready
solutions
Prototyping Hardware
• Test and verify performance
Debug Tools
• Pinpoint issues and reduce risk
Tuning Tools
• Optimize playback and voice control
Expertise
• Algorithm, software, tuning, testing
Worldwide Support
Santa Clara, Stuttgart, Taipei, Seoul
© 2022 DSP Concepts 2
Processing at the edge is getting more and more powerful making it possible to do things that were
reserved for the cloud
Audio is becoming increasingly popular
• Standalone applications
• Smart assistant
• Voice control
• Denoising
• Multi-modal applications
• Industrial sensing
• Anomaly detection
Motivation for Talk
© 2022 DSP Concepts 3
• Feature engineering
• How is audio different from
vision?
• How is audio like vision?
• Implementation
• Similarities
• Differences
• Common problems in vision and
their audio analogues
• Classification
• Sequence decoding
• Restoration/denoising
Agenda
© 2022 DSP Concepts 4
Features
• Audio is a sequence of
samples
• Somewhere between
video/images
• Inherent left to right structure
to data
• Sample rate
• Bit depth
How Are Audio Features Different from Vision
Features?
© 2022 DSP Concepts
https://en.wikipedia.org/wiki/Sampling_(signal_processing)
https://en.wikipedia.org/wiki/File:Mike_Austin_Sequence.JPG
6
A lot of techniques employed for ML audio-based solutions borrow techniques from
vision. This means we need to take a 1D sequence of samples and make them look
like an image.
Manipulating Audio from Time Domain to Frequency
Domain
© 2022 DSP Concepts
https://en.wikipedia.org/wiki/Sampling_(signal_processing)
https://upload.wikimedia.org/wikipedia/commons/c/c5/Spectrogram-19thC.png
7
• Fast Fourier transform (FFT)
• Shows frequency over time
• Linearly spaced frequency
bins
• Can apply processing in the
frequency domain and then
use an inverse fast Fourier
transform (IFFT) to get time
domain audio
Fast Fourier Transform
© 2022 DSP Concepts
https://commons.wikimedia.org/wiki/File:FFT-Time-Frequency-View.png
8
• Built from short time Fourier
transform
• Repeat Fourier transform with
set window and hop size
• short-time Fourier transform (STFT)
• Take magnitude squared of
frequency bins
• Common for classification
tasks
Power Spectrogram
© 2022 DSP Concepts
https://upload.wikimedia.org/wikipedia/commons/9/99/Mount_Rainier_soundscape.jpg
9
• Constant Q transform (CQT)
• Logarithmically spaced
frequency bins
• Popular for musical
applications
• More computationally efficient
since fewer bins are needed
to cover a frequency range
Constant Q Transform
© 2022 DSP Concepts
https://en.wikipedia.org/wiki/Constant-Q_transform#/media/File:CQT-piano-chord.png
10
• Mel spectrogram
• Triangular frequency windows
• Filter banks that attempt to
approximate human hearing
• Humans struggle to hear
frequencies that are close together.
This anomaly is known as masking.
Mel Spectrogram
© 2022 DSP Concepts
http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/speaker_recognition.html
11
• Feature normalization
• Multi-channel features for
complex audio
• Color channels in audio
• Real and imaginary
components in frequency time
How Are Audio Features the Same as Vision?
© 2022 DSP Concepts
https://en.wikipedia.org/wiki/Fourier_transform#/media/File:Rising_circular.gif
https://en.wikipedia.org/wiki/Grayscale#/media/File:Beyoglu_4671_tricolor.png
12
Implementation
• Take audio and create an
image-like input with fixed
dimensions.
• Take the input signal into the
frequency domain
• Take a window of feature
vectors
• Slide window with a hop,
usually smaller than the input
dimension of the model
How Do Implementations of Audio Solutions Differ
from Vision Solutions?
https://www.jonnor.com/2021/12/audio-classification-with-machine-learning-europython-2019/
© 2022 DSP Concepts 14
• Use the same layers
• Convolutional layers
• Conventional
• Depth-wise separable
• Residual blocks
• RNN
• Use the same training tricks
How Are Implementations of Audio Solutions Like
Vision Solutions?
© 2022 DSP Concepts
https://www.semanticscholar.org/paper/Singing-Voice-Separation-with-Deep-U-Net-
Networks-Jansson-Humphrey/83ea11b45cba0fc7ee5d60f608edae9c1443861d
15
• Some popular vision model
architectures show up
• EfficientNet
• Similar concepts
• Encoder-decoder network
• Transfer learning
How Are Implementations of Audio Solutions Like
Vision Solutions?
https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html
© 2022 DSP Concepts 16
Common Problems in Vision and Their Audio
Analogues
• Very similar
• Convolution layers pull out structural
information
• Trained on frequency domain features
• In audio, we usually have a sliding
window
• Like working with a camera stream
• Can be important for energy savings in
complex systems
• Motion detection
• Voice activity detection
Classification
https://www.tensorflow.org/tutorials/audio/simple_audo
© 2022 DSP Concepts 18
• Vision
• Optical character recognition
• Audio
• Automatic speech recognition
• Both look for structural information in
their inputs and decode them to a
character sequence
• Similar architectures
• RCNN
• Transformer
Sequence Decoding
© 2022 DSP Concepts
https://distill.pub/2017/ctc/
19
• Vision
• Directly regress the image
• Audio
• Regress a gain mask which is
applied to audio stream
• Applied in a streaming fashion
• Window and hop
Restoration/Denoising
© 2022 DSP Concepts
https://www.mathworks.com/help/audio/ug/denoise-speech-using-deep-
learning-networks.html
20
• Feature engineering
• After some preprocessing things
are more similar than not
• Implementation
• Windowing with a set stride and
hop allows us to deal with
streams of data
• Common problems in vision and
their audio analogues
• Classification
• Sequence decoding
• Restoration/denoising
Conclusion
© 2022 DSP Concepts 21
Resources
Getting Started with Audio
Audio Classification using Transfer Learning
https://www.tensorflow.org/tutorials/audio/transfer
_learning_audio
Speech Command Recognition
https://www.tensorflow.org/tutorials/audio/simple_
audio
Get a 30 Day Trial of Audio Weaver
https://w.dspconcepts.com/audio-weaver
© 2022 DSP Concepts 22
Audio Weaver accelerates audio feature development and enables collaboration across product teams. With
over 550 optimized processing modules, audio designs can be developed and implemented on hardware
without writing any DSP code.
The Audio Weaver Framework: Overview
© 2022 DSP Concepts
AWE Designer
Windows-based graphical design environment
 Standard Edition: Design GUI
 Pro Edition: Works with MathWorks® MATLAB®
platform
Audio IP Modules
Building blocks for product developers
 From low-level primitives to complete designs
 From DSP Concepts and our third-party partners
AWE Core
The embedded processing engine
 Optimized target-specific libraries
 Available for multiple processors
 Supports multicore and multi-instance implementation
23

More Related Content

Similar to “Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio for ML Vision Engineers,” a Presentation from DSP Concepts

ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project LaunchNetronome
 
iDiff 2008 conference #6 IP-Racine DVS
iDiff 2008 conference #6  IP-Racine DVSiDiff 2008 conference #6  IP-Racine DVS
iDiff 2008 conference #6 IP-Racine DVSBenoit Michel
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitMilind Bhandarkar
 
Track 1 session 2 - st dev con 2016 - dsp concepts - innovating iot+wearab...
Track 1   session 2 - st dev con 2016 -  dsp concepts - innovating iot+wearab...Track 1   session 2 - st dev con 2016 -  dsp concepts - innovating iot+wearab...
Track 1 session 2 - st dev con 2016 - dsp concepts - innovating iot+wearab...ST_World
 
JP Keynote Nikkei Embedded Processor Symposium 2002
JP Keynote Nikkei Embedded Processor Symposium 2002JP Keynote Nikkei Embedded Processor Symposium 2002
JP Keynote Nikkei Embedded Processor Symposium 2002Lee Flanagin
 
Efficient video perception through AI
Efficient video perception through AIEfficient video perception through AI
Efficient video perception through AIQualcomm Research
 
AudioCodes Session Border Controller Update
AudioCodes Session Border Controller UpdateAudioCodes Session Border Controller Update
AudioCodes Session Border Controller UpdateJohn D'Annunzio
 
SUT Video Collaboration Overview
SUT Video Collaboration OverviewSUT Video Collaboration Overview
SUT Video Collaboration OverviewJohn Grindley
 
Commercial pres for slideshare
Commercial pres for slideshareCommercial pres for slideshare
Commercial pres for slideshareKatherine Rush
 
Maniteja_Professional_Resume
Maniteja_Professional_ResumeManiteja_Professional_Resume
Maniteja_Professional_ResumeVaddi Maniteja
 
[DSC Europe 22] Make some noise for AI in JavaScript - Sead Delalic
[DSC Europe 22] Make some noise for AI in JavaScript - Sead Delalic[DSC Europe 22] Make some noise for AI in JavaScript - Sead Delalic
[DSC Europe 22] Make some noise for AI in JavaScript - Sead DelalicDataScienceConferenc1
 
Ccvp plus module 1
Ccvp plus module 1Ccvp plus module 1
Ccvp plus module 1Le Ngoc Viet
 
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...Net at Work
 
Design & Simulation With Verilog
Design & Simulation With Verilog Design & Simulation With Verilog
Design & Simulation With Verilog Semi Design
 
ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...
ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...
ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...Anand Bhojan
 
Introduction to Fog
Introduction to FogIntroduction to Fog
Introduction to FogCisco DevNet
 
Cisco Multi-Service FAN Solution
Cisco Multi-Service FAN SolutionCisco Multi-Service FAN Solution
Cisco Multi-Service FAN SolutionCisco DevNet
 
DTT Regionalization @ iTVF2015 - Istanbul
DTT Regionalization @ iTVF2015 - IstanbulDTT Regionalization @ iTVF2015 - Istanbul
DTT Regionalization @ iTVF2015 - IstanbulBerry Eskes
 
Janus + Audio @ Open Source World
Janus + Audio @ Open Source WorldJanus + Audio @ Open Source World
Janus + Audio @ Open Source WorldLorenzo Miniero
 

Similar to “Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio for ML Vision Engineers,” a Presentation from DSP Concepts (20)

ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
 
iDiff 2008 conference #6 IP-Racine DVS
iDiff 2008 conference #6  IP-Racine DVSiDiff 2008 conference #6  IP-Racine DVS
iDiff 2008 conference #6 IP-Racine DVS
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & Profit
 
Track 1 session 2 - st dev con 2016 - dsp concepts - innovating iot+wearab...
Track 1   session 2 - st dev con 2016 -  dsp concepts - innovating iot+wearab...Track 1   session 2 - st dev con 2016 -  dsp concepts - innovating iot+wearab...
Track 1 session 2 - st dev con 2016 - dsp concepts - innovating iot+wearab...
 
What’s new in MPEG?
What’s new in MPEG?What’s new in MPEG?
What’s new in MPEG?
 
JP Keynote Nikkei Embedded Processor Symposium 2002
JP Keynote Nikkei Embedded Processor Symposium 2002JP Keynote Nikkei Embedded Processor Symposium 2002
JP Keynote Nikkei Embedded Processor Symposium 2002
 
Efficient video perception through AI
Efficient video perception through AIEfficient video perception through AI
Efficient video perception through AI
 
AudioCodes Session Border Controller Update
AudioCodes Session Border Controller UpdateAudioCodes Session Border Controller Update
AudioCodes Session Border Controller Update
 
SUT Video Collaboration Overview
SUT Video Collaboration OverviewSUT Video Collaboration Overview
SUT Video Collaboration Overview
 
Commercial pres for slideshare
Commercial pres for slideshareCommercial pres for slideshare
Commercial pres for slideshare
 
Maniteja_Professional_Resume
Maniteja_Professional_ResumeManiteja_Professional_Resume
Maniteja_Professional_Resume
 
[DSC Europe 22] Make some noise for AI in JavaScript - Sead Delalic
[DSC Europe 22] Make some noise for AI in JavaScript - Sead Delalic[DSC Europe 22] Make some noise for AI in JavaScript - Sead Delalic
[DSC Europe 22] Make some noise for AI in JavaScript - Sead Delalic
 
Ccvp plus module 1
Ccvp plus module 1Ccvp plus module 1
Ccvp plus module 1
 
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
 
Design & Simulation With Verilog
Design & Simulation With Verilog Design & Simulation With Verilog
Design & Simulation With Verilog
 
ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...
ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...
ShowNTell: An easy-to-use tool for answering students’ questions with voice-o...
 
Introduction to Fog
Introduction to FogIntroduction to Fog
Introduction to Fog
 
Cisco Multi-Service FAN Solution
Cisco Multi-Service FAN SolutionCisco Multi-Service FAN Solution
Cisco Multi-Service FAN Solution
 
DTT Regionalization @ iTVF2015 - Istanbul
DTT Regionalization @ iTVF2015 - IstanbulDTT Regionalization @ iTVF2015 - Istanbul
DTT Regionalization @ iTVF2015 - Istanbul
 
Janus + Audio @ Open Source World
Janus + Audio @ Open Source WorldJanus + Audio @ Open Source World
Janus + Audio @ Open Source World
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 

Recently uploaded (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio for ML Vision Engineers,” a Presentation from DSP Concepts

  • 1. Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio for ML Vision Engineers Josh Morris Engineering Manager, Machine Learning DSP Concepts
  • 2. The Audio of Things Approach DSP Concepts helps product makers deliver remarkable Audio Experience through a flexible and modular approach within a design platform environment. This system makes the entire workflow faster and easier across prototyping, design, debugging, tuning, production, and even over-the-air updates. The Audio Experience Design platform that accelerates feature development and makes audio innovation easy Audio IP • Building blocks to product-ready solutions Prototyping Hardware • Test and verify performance Debug Tools • Pinpoint issues and reduce risk Tuning Tools • Optimize playback and voice control Expertise • Algorithm, software, tuning, testing Worldwide Support Santa Clara, Stuttgart, Taipei, Seoul © 2022 DSP Concepts 2
  • 3. Processing at the edge is getting more and more powerful making it possible to do things that were reserved for the cloud Audio is becoming increasingly popular • Standalone applications • Smart assistant • Voice control • Denoising • Multi-modal applications • Industrial sensing • Anomaly detection Motivation for Talk © 2022 DSP Concepts 3
  • 4. • Feature engineering • How is audio different from vision? • How is audio like vision? • Implementation • Similarities • Differences • Common problems in vision and their audio analogues • Classification • Sequence decoding • Restoration/denoising Agenda © 2022 DSP Concepts 4
  • 6. • Audio is a sequence of samples • Somewhere between video/images • Inherent left to right structure to data • Sample rate • Bit depth How Are Audio Features Different from Vision Features? © 2022 DSP Concepts https://en.wikipedia.org/wiki/Sampling_(signal_processing) https://en.wikipedia.org/wiki/File:Mike_Austin_Sequence.JPG 6
  • 7. A lot of techniques employed for ML audio-based solutions borrow techniques from vision. This means we need to take a 1D sequence of samples and make them look like an image. Manipulating Audio from Time Domain to Frequency Domain © 2022 DSP Concepts https://en.wikipedia.org/wiki/Sampling_(signal_processing) https://upload.wikimedia.org/wikipedia/commons/c/c5/Spectrogram-19thC.png 7
  • 8. • Fast Fourier transform (FFT) • Shows frequency over time • Linearly spaced frequency bins • Can apply processing in the frequency domain and then use an inverse fast Fourier transform (IFFT) to get time domain audio Fast Fourier Transform © 2022 DSP Concepts https://commons.wikimedia.org/wiki/File:FFT-Time-Frequency-View.png 8
  • 9. • Built from short time Fourier transform • Repeat Fourier transform with set window and hop size • short-time Fourier transform (STFT) • Take magnitude squared of frequency bins • Common for classification tasks Power Spectrogram © 2022 DSP Concepts https://upload.wikimedia.org/wikipedia/commons/9/99/Mount_Rainier_soundscape.jpg 9
  • 10. • Constant Q transform (CQT) • Logarithmically spaced frequency bins • Popular for musical applications • More computationally efficient since fewer bins are needed to cover a frequency range Constant Q Transform © 2022 DSP Concepts https://en.wikipedia.org/wiki/Constant-Q_transform#/media/File:CQT-piano-chord.png 10
  • 11. • Mel spectrogram • Triangular frequency windows • Filter banks that attempt to approximate human hearing • Humans struggle to hear frequencies that are close together. This anomaly is known as masking. Mel Spectrogram © 2022 DSP Concepts http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/speaker_recognition.html 11
  • 12. • Feature normalization • Multi-channel features for complex audio • Color channels in audio • Real and imaginary components in frequency time How Are Audio Features the Same as Vision? © 2022 DSP Concepts https://en.wikipedia.org/wiki/Fourier_transform#/media/File:Rising_circular.gif https://en.wikipedia.org/wiki/Grayscale#/media/File:Beyoglu_4671_tricolor.png 12
  • 14. • Take audio and create an image-like input with fixed dimensions. • Take the input signal into the frequency domain • Take a window of feature vectors • Slide window with a hop, usually smaller than the input dimension of the model How Do Implementations of Audio Solutions Differ from Vision Solutions? https://www.jonnor.com/2021/12/audio-classification-with-machine-learning-europython-2019/ © 2022 DSP Concepts 14
  • 15. • Use the same layers • Convolutional layers • Conventional • Depth-wise separable • Residual blocks • RNN • Use the same training tricks How Are Implementations of Audio Solutions Like Vision Solutions? © 2022 DSP Concepts https://www.semanticscholar.org/paper/Singing-Voice-Separation-with-Deep-U-Net- Networks-Jansson-Humphrey/83ea11b45cba0fc7ee5d60f608edae9c1443861d 15
  • 16. • Some popular vision model architectures show up • EfficientNet • Similar concepts • Encoder-decoder network • Transfer learning How Are Implementations of Audio Solutions Like Vision Solutions? https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html © 2022 DSP Concepts 16
  • 17. Common Problems in Vision and Their Audio Analogues
  • 18. • Very similar • Convolution layers pull out structural information • Trained on frequency domain features • In audio, we usually have a sliding window • Like working with a camera stream • Can be important for energy savings in complex systems • Motion detection • Voice activity detection Classification https://www.tensorflow.org/tutorials/audio/simple_audo © 2022 DSP Concepts 18
  • 19. • Vision • Optical character recognition • Audio • Automatic speech recognition • Both look for structural information in their inputs and decode them to a character sequence • Similar architectures • RCNN • Transformer Sequence Decoding © 2022 DSP Concepts https://distill.pub/2017/ctc/ 19
  • 20. • Vision • Directly regress the image • Audio • Regress a gain mask which is applied to audio stream • Applied in a streaming fashion • Window and hop Restoration/Denoising © 2022 DSP Concepts https://www.mathworks.com/help/audio/ug/denoise-speech-using-deep- learning-networks.html 20
  • 21. • Feature engineering • After some preprocessing things are more similar than not • Implementation • Windowing with a set stride and hop allows us to deal with streams of data • Common problems in vision and their audio analogues • Classification • Sequence decoding • Restoration/denoising Conclusion © 2022 DSP Concepts 21
  • 22. Resources Getting Started with Audio Audio Classification using Transfer Learning https://www.tensorflow.org/tutorials/audio/transfer _learning_audio Speech Command Recognition https://www.tensorflow.org/tutorials/audio/simple_ audio Get a 30 Day Trial of Audio Weaver https://w.dspconcepts.com/audio-weaver © 2022 DSP Concepts 22
  • 23. Audio Weaver accelerates audio feature development and enables collaboration across product teams. With over 550 optimized processing modules, audio designs can be developed and implemented on hardware without writing any DSP code. The Audio Weaver Framework: Overview © 2022 DSP Concepts AWE Designer Windows-based graphical design environment  Standard Edition: Design GUI  Pro Edition: Works with MathWorks® MATLAB® platform Audio IP Modules Building blocks for product developers  From low-level primitives to complete designs  From DSP Concepts and our third-party partners AWE Core The embedded processing engine  Optimized target-specific libraries  Available for multiple processors  Supports multicore and multi-instance implementation 23