Comparing ML-Based Audio
with ML-Based Vision: An
Introduction to ML Audio
for ML Vision Engineers
Josh Morris
Engineering Manager, Machine Learning
DSP Concepts
The Audio of Things Approach
DSP Concepts helps product makers deliver remarkable Audio Experience through a flexible and modular approach
within a design platform environment. This system makes the entire workflow faster and easier across prototyping,
design, debugging, tuning, production, and even over-the-air updates.
The Audio Experience Design
platform that accelerates feature
development and makes audio
innovation easy
Audio IP
• Building blocks to product-ready
solutions
Prototyping Hardware
• Test and verify performance
Debug Tools
• Pinpoint issues and reduce risk
Tuning Tools
• Optimize playback and voice control
Expertise
• Algorithm, software, tuning, testing
Worldwide Support
Santa Clara, Stuttgart, Taipei, Seoul
© 2022 DSP Concepts 2
Processing at the edge is becoming more powerful, making it possible to do things that were
once reserved for the cloud
Audio is becoming increasingly popular
• Standalone applications
• Smart assistant
• Voice control
• Denoising
• Multi-modal applications
• Industrial sensing
• Anomaly detection
Motivation for Talk
• Feature engineering
• How is audio different from
vision?
• How is audio like vision?
• Implementation
• Similarities
• Differences
• Common problems in vision and
their audio analogues
• Classification
• Sequence decoding
• Restoration/denoising
Agenda
Features
• Audio is a sequence of
samples
• Somewhere between video
and images
• Inherent left-to-right (time)
structure in the data
• Sample rate
• Bit depth
How Are Audio Features Different from Vision
Features?
https://en.wikipedia.org/wiki/Sampling_(signal_processing)
https://en.wikipedia.org/wiki/File:Mike_Austin_Sequence.JPG
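As a minimal sketch of the sample rate and bit depth bullets above, the following builds a 1D sequence of samples and quantizes it. The 16 kHz rate, 440 Hz tone, and function names are illustrative assumptions, not values from the talk.

```python
import math

def sine_samples(freq_hz, sample_rate_hz, duration_s):
    """Return a 1D list of float samples in [-1.0, 1.0]."""
    n = round(sample_rate_hz * duration_s)  # sample rate sets samples per second
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def quantize(samples, bit_depth):
    """Map float samples to signed integers with the given bit depth."""
    max_int = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit audio
    return [int(round(s * max_int)) for s in samples]

# Assumed example values: 10 ms of a 440 Hz tone at 16 kHz, stored as 16-bit PCM.
samples = sine_samples(freq_hz=440.0, sample_rate_hz=16000, duration_s=0.01)
pcm16 = quantize(samples, bit_depth=16)
```

Unlike an image, the result has a single axis, and its length depends on both the duration and the sample rate.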
Many ML audio solutions borrow techniques from vision. To use them, we need to
take a 1D sequence of samples and turn it into a 2D representation that looks
like an image.
Manipulating Audio from Time Domain to Frequency
Domain
https://en.wikipedia.org/wiki/Sampling_(signal_processing)
https://upload.wikimedia.org/wikipedia/commons/c/c5/Spectrogram-19thC.png
• Fast Fourier transform (FFT)
• Shows the frequency content
of a block of samples
• Linearly spaced frequency
bins
• Can apply processing in the
frequency domain and then
use an inverse fast Fourier
transform (IFFT) to get back
to time domain audio
Fast Fourier Transform
https://commons.wikimedia.org/wiki/File:FFT-Time-Frequency-View.png
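The FFT round trip described on this slide can be sketched with NumPy as follows. The 8 kHz sample rate, 256-sample block, 1 kHz test tone, and crude low-pass step are assumptions chosen for illustration.

```python
import numpy as np

sample_rate = 8000          # Hz (assumed for illustration)
n = 256                     # block length in samples
t = np.arange(n) / sample_rate
x = np.sin(2 * np.pi * 1000.0 * t)   # 1 kHz test tone

spectrum = np.fft.rfft(x)                        # complex bins, length n // 2 + 1
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)  # linearly spaced bin centers in Hz
peak_hz = freqs[np.argmax(np.abs(spectrum))]     # bin with the most energy

# Example frequency-domain processing: zero every bin above 2 kHz (a crude low-pass).
spectrum[freqs > 2000.0] = 0.0
y = np.fft.irfft(spectrum, n=n)                  # IFFT back to time-domain audio
```

Because the test tone sits below the cutoff, `y` matches `x` up to rounding error.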
• Built from the short-time Fourier
transform (STFT)
• Repeat the Fourier transform with a
set window and hop size
• Take magnitude squared of
frequency bins
• Common for classification
tasks
Power Spectrogram
https://upload.wikimedia.org/wikipedia/commons/9/99/Mount_Rainier_soundscape.jpg
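A minimal sketch of the steps above (window, hop, Fourier transform, magnitude squared), assuming a Hann window, a 256-sample window, and a 128-sample hop. None of these values come from the talk.

```python
import numpy as np

def power_spectrogram(x, window_size=256, hop_size=128):
    """Return an array of shape (num_frames, window_size // 2 + 1)."""
    window = np.hanning(window_size)
    num_frames = 1 + (len(x) - window_size) // hop_size
    frames = np.stack([
        x[i * hop_size : i * hop_size + window_size] * window
        for i in range(num_frames)
    ])
    stft = np.fft.rfft(frames, axis=1)   # one Fourier transform per frame
    return np.abs(stft) ** 2             # magnitude squared of each bin

sample_rate = 8000                                # Hz (assumed)
t = np.arange(sample_rate) / sample_rate          # one second of audio
x = np.sin(2 * np.pi * 1000.0 * t)                # 1 kHz test tone
spec = power_spectrogram(x)                       # time along axis 0, frequency along axis 1
```

The resulting 2D array is the image-like input that a vision-style model can consume.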
• Constant Q transform (CQT)
• Logarithmically spaced
frequency bins
• Popular for musical
applications
• More compact features,
since fewer bins are needed
to cover a frequency range
Constant Q Transform
https://en.wikipedia.org/wiki/Constant-Q_transform#/media/File:CQT-piano-chord.png
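To illustrate the logarithmic bin spacing, this sketch computes constant-Q center frequencies and compares the bin count with a linear grid of the same low-end resolution. The 55 Hz to 7040 Hz range and 12 bins per octave are assumed, music-style values, and the function name is hypothetical.

```python
import math

def cqt_center_frequencies(f_min_hz, f_max_hz, bins_per_octave):
    """Geometrically spaced centers: f_k = f_min * 2 ** (k / bins_per_octave)."""
    freqs = []
    k = 0
    while True:
        f = f_min_hz * 2.0 ** (k / bins_per_octave)
        if f > f_max_hz:
            break
        freqs.append(f)
        k += 1
    return freqs

freqs = cqt_center_frequencies(f_min_hz=55.0, f_max_hz=7040.0, bins_per_octave=12)

# Q is the ratio of center frequency to bin spacing; it is the same for every bin.
q = 1.0 / (2.0 ** (1.0 / 12) - 1.0)

# Bins a linearly spaced grid would need to match the resolution of the lowest bin.
linear_bins = math.ceil((freqs[-1] - freqs[0]) / (freqs[1] - freqs[0]))
```

Comparing `len(freqs)` with `linear_bins` shows how few logarithmically spaced bins are needed for the same range.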
• Mel spectrogram
• Triangular frequency windows
• Filter banks that attempt to
approximate human hearing
• Humans struggle to tell apart
frequencies that are close together,
and a loud tone can hide a quieter one
nearby in frequency. The latter effect
is known as masking.
Mel Spectrogram
http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/speaker_recognition.html
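The triangular filter bank can be sketched as below. This assumes the common 2595 * log10(1 + f / 700) mel formula, 40 filters, a 512-point FFT, and 16 kHz audio. The talk does not state which variant or sizes it uses.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Return triangular filters with shape (num_filters, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Filter edges are evenly spaced on the mel scale, so they widen in Hz.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), num_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((num_filters, len(fft_freqs)))
    for i in range(num_filters):
        lo, center, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = mel_filter_bank(num_filters=40, n_fft=512, sample_rate=16000)
power_frame = np.ones(512 // 2 + 1)     # stand-in for one power-spectrogram frame
mel_frame = bank @ power_frame          # one energy value per mel band
```

Applying `bank` to every frame of a power spectrogram yields the mel spectrogram.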
• Feature normalization
• Multi-channel features for
complex-valued audio
• Analogous to color channels
in images
• Real and imaginary
components of the
time-frequency representation
How Are Audio Features the Same as Vision?
https://en.wikipedia.org/wiki/Fourier_transform#/media/File:Rising_circular.gif
https://en.wikipedia.org/wiki/Grayscale#/media/File:Beyoglu_4671_tricolor.png
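A rough sketch of the two ideas above: stacking real and imaginary parts as channels, then normalizing each channel. The random array stands in for a real STFT, and per-channel zero-mean, unit-variance scaling is one assumed choice of normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an STFT: 61 frames x 129 complex frequency bins.
stft = rng.standard_normal((61, 129)) + 1j * rng.standard_normal((61, 129))

# Stack real and imaginary parts as two channels, like the color channels of an image.
features = np.stack([stft.real, stft.imag], axis=-1)    # shape (61, 129, 2)

# Per-channel normalization to zero mean and unit variance, as is common for images.
mean = features.mean(axis=(0, 1), keepdims=True)
std = features.std(axis=(0, 1), keepdims=True)
normalized = (features - mean) / (std + 1e-8)
```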
Implementation
• Take audio and create an
image-like input with fixed
dimensions.
• Take the input signal into the
frequency domain
• Take a window of feature
vectors
• Slide window with a hop,
usually smaller than the input
dimension of the model
How Do Implementations of Audio Solutions Differ
from Vision Solutions?
https://www.jonnor.com/2021/12/audio-classification-with-machine-learning-europython-2019/
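The window-and-hop step above can be sketched as a generator over feature vectors. The function name, the window length of 4 frames, and the hop of 2 frames are hypothetical values for illustration.

```python
def sliding_windows(feature_frames, window_len, hop):
    """Yield fixed-size windows of consecutive feature vectors."""
    for start in range(0, len(feature_frames) - window_len + 1, hop):
        yield feature_frames[start : start + window_len]

# Stand-in features: 10 frames, each a 3-element vector tagged with its frame index.
frames = [[i, i, i] for i in range(10)]
windows = list(sliding_windows(frames, window_len=4, hop=2))
first_indices = [w[0][0] for w in windows]   # frame index where each window starts
```

With a hop smaller than the window, consecutive model inputs overlap, so each frame is seen more than once.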
• Use the same layers
• Convolutional layers
• Conventional
• Depth-wise separable
• Residual blocks
• RNN
• Use the same training tricks
How Are Implementations of Audio Solutions Like
Vision Solutions?
https://www.semanticscholar.org/paper/Singing-Voice-Separation-with-Deep-U-Net-Networks-Jansson-Humphrey/83ea11b45cba0fc7ee5d60f608edae9c1443861d
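As a small worked example of the conventional versus depth-wise separable layers listed above, this sketch counts weights for each. The 3x3 kernel and 32-to-64 channel sizes are assumed, and bias terms are ignored.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights in a conventional 2D convolution (bias terms ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels

def depthwise_separable_params(kernel_h, kernel_w, in_channels, out_channels):
    """One depthwise filter per input channel, then a 1x1 pointwise convolution."""
    depthwise = kernel_h * kernel_w * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

standard = conv_params(3, 3, 32, 64)                   # 18432
separable = depthwise_separable_params(3, 3, 32, 64)   # 2336
```

The same arithmetic applies whether the input is a photo or a spectrogram, which is one reason these layers carry over to audio models on edge devices.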
• Some popular vision model
architectures show up
• EfficientNet
• Similar concepts
• Encoder-decoder network
• Transfer learning
How Are Implementations of Audio Solutions Like
Vision Solutions?
https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html
Common Problems in Vision and Their Audio
Analogues
• Very similar
• Convolution layers pull out structural
information
• Trained on frequency domain features
• In audio, we usually have a sliding
window
• Like working with a camera stream
• Can be important for energy savings in
complex systems
• Motion detection
• Voice activity detection
Classification
https://www.tensorflow.org/tutorials/audio/simple_audio
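The energy-saving idea above can be sketched as a cheap gate in front of an expensive classifier. A plain energy threshold is assumed here as a stand-in for a real voice activity detector, and `dummy_classifier` is a hypothetical placeholder for a neural network.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def gated_classify(frames, classify, energy_threshold=1e-3):
    """Run the expensive classifier only on frames the cheap gate lets through."""
    results = []
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            results.append(classify(frame))
        else:
            results.append(None)   # skipped: no model inference spent here
    return results

calls = []

def dummy_classifier(frame):       # hypothetical stand-in for a neural network
    calls.append(1)
    return "speech"

silence = [0.0] * 160
loud = [0.5, -0.5] * 80
results = gated_classify([silence, loud, silence], dummy_classifier)
```

Only the loud frame reaches the classifier, mirroring how motion detection can gate a vision model on a camera stream.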
• Vision
• Optical character recognition
• Audio
• Automatic speech recognition
• Both look for structural information in
their inputs and decode them to a
character sequence
• Similar architectures
• RCNN (convolutional layers followed by recurrent layers)
• Transformer
Sequence Decoding
https://distill.pub/2017/ctc/
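One way to decode per-frame model outputs to a character sequence is greedy CTC decoding, the topic of the article linked on this slide. The sketch below is a simplification: it takes the best symbol per frame, whereas real systems often use beam search. The alphabet and scores are made up for illustration.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded = []
    previous = None
    for index in best:
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)

alphabet = ["-", "c", "a", "t"]    # index 0 is the CTC blank
# Per-frame scores whose best path is: c c - a a - - t
frame_scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.8, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.9, 0.05, 0.03, 0.02],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
text = ctc_greedy_decode(frame_scores, alphabet)
```

Here `text` is "cat". The same decoding applies whether the frames come from columns of a text image or from windows of a spectrogram.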
• Vision
• Directly regress the image
• Audio
• Regress a gain mask which is
applied to audio stream
• Applied in a streaming fashion
• Window and hop
Restoration/Denoising
https://www.mathworks.com/help/audio/ug/denoise-speech-using-deep-learning-networks.html
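A minimal sketch of the gain-mask approach for audio. The mask here is an ideal ratio mask computed from known clean and noise spectra, used only as a stand-in for what a trained network would predict. The random arrays stand in for real STFT frames.

```python
import numpy as np

def apply_gain_mask(noisy_stft, gain_mask):
    """Scale each time-frequency bin of the noisy STFT by a gain in [0, 1]."""
    return noisy_stft * np.clip(gain_mask, 0.0, 1.0)

rng = np.random.default_rng(0)
shape = (61, 129)   # frames x frequency bins (assumed sizes)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# Stand-in for the network output: an ideal ratio mask. A trained model would
# have to predict this from `noisy` alone.
mask = np.abs(clean) ** 2 / (np.abs(clean) ** 2 + np.abs(noise) ** 2)
denoised = apply_gain_mask(noisy, mask)

error_before = np.mean(np.abs(noisy - clean) ** 2)
error_after = np.mean(np.abs(denoised - clean) ** 2)
```

Because the mask is applied row by row, the same function works in a streaming setup: each new frame produced by the window and hop is masked as it arrives, then converted back to audio with an inverse transform.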
• Feature engineering
• After some preprocessing, audio and
vision features are more similar than not
• Implementation
• Windowing with a set size and
hop allows us to deal with
streams of data
• Common problems in vision and
their audio analogues
• Classification
• Sequence decoding
• Restoration/denoising
Conclusion
Resources
Getting Started with Audio
Audio Classification using Transfer Learning
https://www.tensorflow.org/tutorials/audio/transfer_learning_audio
Speech Command Recognition
https://www.tensorflow.org/tutorials/audio/simple_audio
Get a 30 Day Trial of Audio Weaver
https://w.dspconcepts.com/audio-weaver
Audio Weaver accelerates audio feature development and enables collaboration across product teams. With
over 550 optimized processing modules, audio designs can be developed and implemented on hardware
without writing any DSP code.
The Audio Weaver Framework: Overview
AWE Designer
Windows-based graphical design environment
• Standard Edition: Design GUI
• Pro Edition: Works with MathWorks® MATLAB® platform
Audio IP Modules
Building blocks for product developers
• From low-level primitives to complete designs
• From DSP Concepts and our third-party partners
AWE Core
The embedded processing engine
• Optimized target-specific libraries
• Available for multiple processors
• Supports multicore and multi-instance implementation
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio for ML Vision Engineers,” a Presentation from DSP Concepts

  • 1. Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio for ML Vision Engineers Josh Morris Engineering Manager, Machine Learning DSP Concepts
  • 2. The Audio of Things Approach DSP Concepts helps product makers deliver remarkable Audio Experience through a flexible and modular approach within a design platform environment. This system makes the entire workflow faster and easier across prototyping, design, debugging, tuning, production, and even over-the-air updates. The Audio Experience Design platform that accelerates feature development and makes audio innovation easy Audio IP • Building blocks to product-ready solutions Prototyping Hardware • Test and verify performance Debug Tools • Pinpoint issues and reduce risk Tuning Tools • Optimize playback and voice control Expertise • Algorithm, software, tuning, testing Worldwide Support Santa Clara, Stuttgart, Taipei, Seoul © 2022 DSP Concepts 2
  • 3. Motivation for Talk • Processing at the edge is becoming more powerful, making it possible to do things that were once reserved for the cloud • Audio is becoming increasingly popular • Standalone applications • Smart assistant • Voice control • Denoising • Multi-modal applications • Industrial sensing • Anomaly detection
  • 4. Agenda • Feature engineering • How is audio different from vision? • How is audio like vision? • Implementation • Similarities • Differences • Common problems in vision and their audio analogues • Classification • Sequence decoding • Restoration/denoising
  • 6. How Are Audio Features Different from Vision Features? • Audio is a sequence of samples • Somewhere between video and images • Inherent left-to-right structure to the data • Sample rate • Bit depth https://en.wikipedia.org/wiki/Sampling_(signal_processing) https://en.wikipedia.org/wiki/File:Mike_Austin_Sequence.JPG
  • 7. Manipulating Audio from Time Domain to Frequency Domain • Many techniques used in ML-based audio solutions are borrowed from vision. This means we need to take a 1D sequence of samples and make it look like an image. https://en.wikipedia.org/wiki/Sampling_(signal_processing) https://upload.wikimedia.org/wikipedia/commons/c/c5/Spectrogram-19thC.png
  • 8. Fast Fourier Transform • Fast Fourier transform (FFT) • Shows the frequency content of a block of samples • Linearly spaced frequency bins • Can apply processing in the frequency domain and then use an inverse fast Fourier transform (IFFT) to get back to time-domain audio https://commons.wikimedia.org/wiki/File:FFT-Time-Frequency-View.png
  • 9. Power Spectrogram • Built from the short-time Fourier transform (STFT) • Repeat the Fourier transform with a set window and hop size • Take the magnitude squared of the frequency bins • Common for classification tasks https://upload.wikimedia.org/wikipedia/commons/9/99/Mount_Rainier_soundscape.jpg
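The recipe on the power spectrogram slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: it uses only the Python standard library with a naive DFT standing in for an optimized FFT, and the Hann window, the 64-sample window, the 32-sample hop and the 1 kHz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (an assumed, common window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) DFT that returns only the non-negative frequency bins."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def power_spectrogram(samples, window_size, hop_size):
    """Return one list of |X[k]|^2 values per frame (time runs along the outer list)."""
    window = hann(window_size)
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop_size):
        chunk = samples[start:start + window_size]
        frame = [s * w for s, w in zip(chunk, window)]
        frames.append([abs(c) ** 2 for c in dft(frame)])
    return frames


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
window_size = 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = power_spectrogram(tone, window_size=window_size, hop_size=32)

# The strongest bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / window_size
```

Each row of `spec` is one column of the image-like input: stacking the rows gives a time-by-frequency array that a vision-style model can consume.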
  • 10. Constant Q Transform • Constant Q transform (CQT) • Logarithmically spaced frequency bins • Popular for musical applications • Can be more computationally efficient, since fewer bins are needed to cover a given frequency range https://en.wikipedia.org/wiki/Constant-Q_transform#/media/File:CQT-piano-chord.png
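The "fewer bins" point can be made concrete by counting bins. The sketch below assumes the usual geometric spacing of CQT center frequencies, f_k = f_min · 2^(k / bins_per_octave). The 55 Hz to 7040 Hz range and 12 bins per octave are hypothetical values picked because they match a piano-like, semitone-spaced layout.

```python
import math


def cqt_center_frequencies(f_min, f_max, bins_per_octave):
    """Geometrically spaced center frequencies covering [f_min, f_max)."""
    n_bins = math.ceil(bins_per_octave * math.log2(f_max / f_min))
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


f_min, f_max = 55.0, 7040.0  # seven octaves (assumed range)
centers = cqt_center_frequencies(f_min, f_max, bins_per_octave=12)

# A linearly spaced transform needs the finest (lowest-octave) spacing everywhere.
finest_spacing_hz = centers[1] - centers[0]
n_linear = math.ceil((f_max - f_min) / finest_spacing_hz)

print(len(centers), "log-spaced bins vs", n_linear, "linear bins")
```

Under these assumptions the log-spaced layout needs 84 bins, while matching the same low-frequency resolution with linear spacing takes more than two thousand.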
  • 11. Mel Spectrogram • Mel spectrogram • Triangular frequency windows • Filter banks that attempt to approximate human hearing • Humans struggle to distinguish frequencies that are close together, and a louder sound can hide a quieter one at a nearby frequency. The latter effect is known as masking. http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/speaker_recognition.html
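A rough sketch of the triangular mel filter bank follows. It assumes the common 2595 · log10(1 + f / 700) form of the mel scale, which the slide does not specify. The filter count, FFT size and sample rate are hypothetical, and no filter normalization is applied.

```python
import math


def hz_to_mel(hz):
    """One common mel-scale formula (an assumption; several variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights in [0, 1].
    """
    top_mel = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points: filter m spans points m-1, m and m+1.
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(power_frame, bank):
    """Collapse one power-spectrum frame into one value per mel band."""
    return [sum(w * p for w, p in zip(row, power_frame)) for row in bank]


bank = mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)
flat_frame = [1.0] * 257  # hypothetical flat power spectrum
mel_frame = apply_filterbank(flat_frame, bank)
```

With a flat input, the higher bands collect more energy because their triangles are wider in Hz, which is the coarser resolution at high frequencies that the mel scale is meant to capture.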
  • 12. How Are Audio Features the Same as Vision? • Feature normalization • Multi-channel features for complex audio • The audio analogue of color channels • Real and imaginary components of the time-frequency representation https://en.wikipedia.org/wiki/Fourier_transform#/media/File:Rising_circular.gif https://en.wikipedia.org/wiki/Grayscale#/media/File:Beyoglu_4671_tricolor.png
  • 14. How Do Implementations of Audio Solutions Differ from Vision Solutions? • Take audio and create an image-like input with fixed dimensions • Take the input signal into the frequency domain • Take a window of feature vectors • Slide the window with a hop, usually smaller than the input dimension of the model https://www.jonnor.com/2021/12/audio-classification-with-machine-learning-europython-2019/
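The window-and-hop step on this slide might look like the following sketch. It assumes the features are already computed as a list of per-frame vectors. The 32-frame window, 8-frame hop and 40 bins per frame are hypothetical values, not figures from the talk.

```python
def sliding_windows(frames, window_frames, hop_frames):
    """Yield fixed-size, overlapping patches of feature frames.

    Each patch has shape (window_frames, n_bins), i.e. the fixed
    image-like input that a model expects.
    """
    for start in range(0, len(frames) - window_frames + 1, hop_frames):
        yield frames[start:start + window_frames]


# Hypothetical feature stream: 100 frames of 40 bins, each filled with its frame index.
features = [[float(t)] * 40 for t in range(100)]

patches = list(sliding_windows(features, window_frames=32, hop_frames=8))
print(len(patches), "patches of", len(patches[0]), "x", len(patches[0][0]))
```

Because the hop is smaller than the window, consecutive patches overlap, so an event that straddles one patch boundary is still seen whole in a neighboring patch.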
  • 15. How Are Implementations of Audio Solutions Like Vision Solutions? • Use the same layers • Convolutional layers • Conventional • Depth-wise separable • Residual blocks • RNN • Use the same training tricks https://www.semanticscholar.org/paper/Singing-Voice-Separation-with-Deep-U-Net-Networks-Jansson-Humphrey/83ea11b45cba0fc7ee5d60f608edae9c1443861d
  • 16. How Are Implementations of Audio Solutions Like Vision Solutions? • Some popular vision model architectures show up • EfficientNet • Similar concepts • Encoder-decoder network • Transfer learning https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html
  • 17. Common Problems in Vision and Their Audio Analogues
  • 18. Classification • Very similar • Convolution layers pull out structural information • Trained on frequency domain features • In audio, we usually have a sliding window • Like working with a camera stream • Can be important for energy savings in complex systems • Motion detection • Voice activity detection https://www.tensorflow.org/tutorials/audio/simple_audio
  • 19. Sequence Decoding • Vision • Optical character recognition • Audio • Automatic speech recognition • Both look for structural information in their inputs and decode it to a character sequence • Similar architectures • RCNN • Transformer https://distill.pub/2017/ctc/
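The slide links to an article on connectionist temporal classification (CTC), one common way to turn per-frame predictions into a character sequence for both OCR and speech. The sketch below shows greedy (best-path) CTC decoding only. The alphabet, the "-" blank symbol and the hand-made frame scores are hypothetical stand-ins for real model output.

```python
from itertools import groupby

BLANK = "-"  # assumed symbol for the CTC blank label


def ctc_greedy_decode(frame_probs, alphabet):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [idx for idx, _ in groupby(best)]
    return "".join(alphabet[i] for i in collapsed if alphabet[i] != BLANK)


def one_hot_like(index, size):
    """Hypothetical per-frame scores that strongly favor one label."""
    return [0.9 if i == index else 0.1 / (size - 1) for i in range(size)]


alphabet = [BLANK, "c", "a", "t"]


def frames_for(path):
    """Build fake frame scores that follow a given label path."""
    return [one_hot_like(alphabet.index(ch), len(alphabet)) for ch in path]


print(ctc_greedy_decode(frames_for("cc-aa--t"), alphabet))
print(ctc_greedy_decode(frames_for("a-a"), alphabet))
```

The blank label is what lets the decoder tell a held character ("aa" collapses to "a") from a genuinely repeated one ("a-a" decodes to "aa").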
  • 20. Restoration/Denoising • Vision • Directly regress the image • Audio • Regress a gain mask which is applied to the audio stream • Applied in a streaming fashion • Window and hop https://www.mathworks.com/help/audio/ug/denoise-speech-using-deep-learning-networks.html
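To illustrate what "applying a gain mask" means, here is a minimal single-frame sketch. In a real system the mask would come from a trained network and the frames would be windowed and overlap-added. Here the mask is hand-built, the two-tone "noisy" signal is hypothetical, and a naive DFT from the standard library stands in for an FFT.

```python
import cmath
import math


def dft(x):
    """Naive full-length DFT."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]


def idft(spec):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spec)).real / n
        for t in range(n)
    ]


def apply_gain_mask(frame, mask):
    """Scale each frequency bin by a real gain in [0, 1], keeping its phase.

    mask holds gains for bins 0..n/2 and is mirrored onto the negative
    frequencies so that the output stays real-valued.
    """
    n = len(frame)
    spec = dft(frame)
    masked = [c * mask[min(k, n - k)] for k, c in enumerate(spec)]
    return idft(masked)


n = 64
clean = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]        # wanted tone, bin 4
noise = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]  # unwanted tone, bin 20
noisy = [c + d for c, d in zip(clean, noise)]

# Hand-built stand-in for a model's output: keep bins below 10, suppress the rest.
mask = [1.0 if k < 10 else 0.0 for k in range(n // 2 + 1)]
denoised = apply_gain_mask(noisy, mask)
max_error = max(abs(a - b) for a, b in zip(denoised, clean))
```

Because the model only predicts per-bin gains, the noisy signal's own phase is reused, which is one reason a mask is a common regression target for audio rather than the waveform itself.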
  • 21. Conclusion • Feature engineering • After some preprocessing, things are more similar than not • Implementation • Windowing with a set window size and hop allows us to deal with streams of data • Common problems in vision and their audio analogues • Classification • Sequence decoding • Restoration/denoising
  • 22. Resources • Getting Started with Audio • Audio Classification using Transfer Learning https://www.tensorflow.org/tutorials/audio/transfer_learning_audio • Speech Command Recognition https://www.tensorflow.org/tutorials/audio/simple_audio • Get a 30 Day Trial of Audio Weaver https://w.dspconcepts.com/audio-weaver
  • 23. The Audio Weaver Framework: Overview • Audio Weaver accelerates audio feature development and enables collaboration across product teams. With over 550 optimized processing modules, audio designs can be developed and implemented on hardware without writing any DSP code. • AWE Designer: Windows-based graphical design environment • Standard Edition: Design GUI • Pro Edition: Works with MathWorks® MATLAB® platform • Audio IP Modules: Building blocks for product developers • From low-level primitives to complete designs • From DSP Concepts and our third-party partners • AWE Core: The embedded processing engine • Optimized target-specific libraries • Available for multiple processors • Supports multicore and multi-instance implementation