Deep Learning with
Audio Signals
Prepare, Process, Design, Expect
Keunwoo Choi
QMUL, UK

ETRI, S. Korea

SNU, S. Korea

@keunwoochoi (twtr, github)
Research Scientist
WARNING
THIS MATERIAL IS WRITTEN FOR ATTENDEES AT
QCON.AI, NAMELY SOFTWARE ENGINEERS AND DEEP
LEARNING PRACTITIONERS, TO PROVIDE AN
OFF-THE-SHELF GUIDE. MY ADVICE MIGHT NOT BE THE FINAL
SOLUTION FOR YOUR PROBLEM, BUT IT SHOULD BE A
GOOD STARTING POINT.

..ALSO, THERE'S NO SPOTIFY SECRET HERE :P
Content
• Prepare the dataset

• Pre-process the signal

• Design your network

• Expect the result
Prepare the datasets
or, know your data
Q. How to start an audio task?
LMGTFY
• Google them, of course

• But....
Audio dataset
• Lucky → exactly the same class(es), many of them, yay!

• Meh → same or similar classes, sounds alright..

• Ugh.. → there are 2 in freesound.org and 3 on youtube
Audio (or, sound) dataset
• Our algorithm is living in the
digital space

• So are the .wav files

• But,

the sound is in the real world
Our lovely cyberspace
Audio dataset
Source
Noise
Reverberation
Microphone
• Room reverberation image from https://johnlsayers.com/Recmanual/Pages/Reverb.htm
Audio dataset
Dear everyone,
YOU ARE ALWAYS IN THE
"UGH..." SITUATION
→ HOW TO BUILD A CORRECT
AUDIO DATASET?
What we can do
• Know your real situation

• You can mimic noise/reverberation/mic if you have

• clean/dry/high-quality source signals
DL models are robust only within the variance they've seen.
→ Good at interpolation.. only.
E.g., a model trained with clean signals probably can't deal with noisy signals
noisy environment cheap mic
Simulate the real world
clean signal + noise signal → noisy signal
dry signal → room impulse response → wet signal
original signal → band-pass filter → recorded signal
What to Google
Noise
babble noise recording
home noise recording
cafe noise recording
street noise recording
white noise, brown noise
x_noise = x + alpha * noise
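The mixing formula above leaves open how to pick alpha. A minimal sketch, assuming you want to hit a target signal-to-noise ratio in dB: the helper name `mix_at_snr`, the 10 dB target, and the synthetic sine/white-noise stand-ins are illustrative assumptions, not something from the talk.

```python
import numpy as np

def mix_at_snr(x, noise, snr_db):
    """Return x + alpha * noise, with alpha chosen to reach the target SNR (dB)."""
    # Loop or trim the noise so it matches the length of the clean signal.
    reps = int(np.ceil(len(x) / len(noise)))
    noise = np.tile(noise, reps)[: len(x)]
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR = 10 * log10(p_signal / (alpha^2 * p_noise)), solved for alpha.
    alpha = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return x + alpha * noise

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440 * t)   # stand-in for a clean recording
noise = rng.standard_normal(sr // 2)    # stand-in for a noise recording
x_noise = mix_at_snr(x, noise, snr_db=10.0)
```

Drawing a different SNR for each training example is one way to widen the variance the model sees.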
Reverberation

(maybe skip it)
room impulse responses, RIR
reverberation simulators
x_wet = np.convolve(x, rir)
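A minimal sketch of the reverberation step, using NumPy's convolution function `np.convolve`. The room impulse response here is a made-up exponentially decaying noise burst, used only as a placeholder; in practice you would load a measured or simulated RIR. The helper name `add_reverb` is hypothetical.

```python
import numpy as np

def add_reverb(x, rir):
    """Convolve a dry signal with a room impulse response (full-length output)."""
    return np.convolve(x, rir)

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)                    # stand-in for a dry signal

# Placeholder RIR (assumption): 0.25 s of noise with an exponential decay.
n_rir = sr // 4
rir = rng.standard_normal(n_rir) * np.exp(-np.arange(n_rir) / (0.05 * sr))
rir[0] = 1.0                                       # direct sound
rir /= np.max(np.abs(rir))

x_wet = add_reverb(x, rir)                         # includes the reverb tail
x_wet_same_length = x_wet[: len(x)]                # drop the tail if labels are per-sample
```

For long signals or long impulse responses, `scipy.signal.fftconvolve` computes the same result much faster.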
Microphone
band pass filter
scipy.signal filtering
microphone specification
speaker specification
microphone frequency response
scipy.signal.convolve

scipy.signal.fftconvolve

Or trimming-off your
spectrograms
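A minimal sketch of the microphone step with `scipy.signal`, assuming a Butterworth band-pass filter is an acceptable stand-in for a cheap microphone. The 300 to 3400 Hz band, the filter order, and the helper name `simulate_cheap_mic` are illustrative assumptions; a real microphone's frequency response should come from its specification.

```python
import numpy as np
from scipy import signal

def simulate_cheap_mic(x, sr, low_hz=300.0, high_hz=3400.0, order=4):
    """Band-limit a signal to roughly mimic a narrow-band microphone."""
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass",
                        fs=sr, output="sos")
    # Zero-phase filtering, so the signal is not delayed relative to its labels.
    return signal.sosfiltfilt(sos, x)

sr = 16000
t = np.arange(sr) / sr
in_band = np.sin(2 * np.pi * 1000 * t)   # survives the filter
rumble = np.sin(2 * np.pi * 50 * t)      # mostly removed by the filter
recorded = simulate_cheap_mic(in_band + rumble, sr)
```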
Pre-process the signals
or, log(melgram)
Q. What to do after loading the signals?
Digital Audio 101
• 1 second of digital audio:

size=(44100, ), dtype=int16

• MNIST: (28, 28, 1), uint8

CIFAR10: (32, 32, 3), uint8

ImageNet: (256, 256, 3), uint8

• Audio: Lots of data points in
one item!
Audio representations
(data shape and size for, e.g., 1 second at sampling rate = 44100)

Type          Description                          Data shape and size
Waveform      x                                    44100 x [int16]
Spectrograms  STFT(x)                              513 x 87 x [float32]
              Melspectrogram(x)                    128 x 87 x [float32]
              CQT(x)                               72 x 87 x [float32]
Features      MFCC(x) = some process on STFT(x)    20 x 87 x [float32]
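The shapes in the table can be reproduced with a little arithmetic. The slides do not state the analysis parameters; an FFT size of 1024 and a hop of 512 samples with centered (padded) framing are assumptions that happen to give 513 bins and 87 frames. The function name is hypothetical.

```python
def spectrogram_shape(n_samples, n_fft=1024, hop_length=512):
    """(frequency bins, time frames) of an STFT with centered framing."""
    n_bins = 1 + n_fft // 2
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames

n_bins, n_frames = spectrogram_shape(44100)   # (513, 87)

# Storage for 1 second of audio in each representation, in bytes.
sizes = {
    "waveform (int16)": 44100 * 2,
    "STFT magnitude (float32)": n_bins * n_frames * 4,
    "melspectrogram, 128 bands (float32)": 128 * n_frames * 4,
    "MFCC, 20 coefficients (float32)": 20 * n_frames * 4,
}
for name, n_bytes in sizes.items():
    print(f"{name}: {n_bytes} bytes")
```

Under these assumptions the raw STFT is larger than the waveform it came from, which is one reason to move to a compact representation before training.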
Spoiler: log10(Melspectrograms) for the win,
but let's see some details
Spectrograms
• 2-dim representation of audio signal
TODO: IMAGE
Practitioner's choice
• Rule of thumb: DISCARD ALL THE REDUNDANCY

• Sample rate, or bandwidth

• Goal: To optimize the input audio data for your model

• by resampling - can be computationally heavier

• by discarding some freq bands - can be storage-heavy
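A minimal sketch of both options with SciPy, assuming the task only needs content below 8 kHz. The 16 kHz target rate, the 8 kHz cutoff, and the STFT parameters are illustrative assumptions.

```python
import math
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
sr_in, sr_out = 44100, 16000
x = rng.standard_normal(sr_in)           # stand-in for 1 second of audio

# Option 1: resample. Extra computation, but everything downstream is smaller.
g = math.gcd(sr_in, sr_out)
x_16k = signal.resample_poly(x, sr_out // g, sr_in // g)

# Option 2: keep the original audio and discard frequency bins you do not need.
freqs, times, Z = signal.stft(x, fs=sr_in, nperseg=1024, noverlap=512)
S = np.abs(Z)
S_trimmed = S[freqs <= 8000.0]           # keep only the rows below the cutoff

print(len(x_16k), S.shape, S_trimmed.shape)
```

With option 2 the stored audio stays at the full sample rate, which is where the storage cost comes from.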
https://www.summerrankin.com/dogandponyshow/2017/10/16/catdog
Practitioner's choice
• Melspectrogram

- in decibel scale

- which only covers the frequency range you're
interested in.

• Why?

- smaller, therefore easier and faster training

- perceptual - weighing more on the freq region where
humans are more interested

- faster than CQT to compute

- decibel scale - another perceptually motivated choice
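To make the recipe concrete, here is a from-scratch sketch of a decibel-scaled melspectrogram restricted to a chosen frequency range, using only NumPy and SciPy. It uses the common HTK-style mel formula and triangular filters; library implementations differ in normalisation and framing details, so treat the function names and every parameter value (64 bands, 50 to 8000 Hz, FFT size 1024, hop 512) as assumptions.

```python
import numpy as np
from scipy import signal

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular filters, evenly spaced on the mel scale between fmin and fmax."""
    fft_freqs = np.linspace(0.0, sr / 2, 1 + n_fft // 2)
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_melspectrogram(x, sr, n_fft=1024, hop=512, n_mels=64,
                       fmin=50.0, fmax=8000.0):
    """Melspectrogram in decibels, shape (n_mels, n_frames)."""
    _, _, Z = signal.stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    power = np.abs(Z) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels, fmin, fmax) @ power
    return 10.0 * np.log10(np.maximum(mel, 1e-10))   # floor avoids log(0)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)         # 1 kHz test tone
S_db = log_melspectrogram(x, sr)
```

The frequency range is set by `fmin` and `fmax`, so bands you do not care about never reach the model.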
Q. Ok, how can I compute them?
import librosa
import madmom
• Python libraries - librosa/madmom/scipy/.. 

• Computations on CPU

• Best when all the processing will be done before
the training
import kapre
• Keras Audio Preprocessing layers

• CPU and GPU

• Best when you want to do things on the fly/GPU

= Best to optimize audio-related parameters
• pip install kapre

• There's also pytorch-audio!
Disclaimer: I'm the maintainer of kapre
Design your network
or, know the assumptions
Q. What kind of network structure do I need?
A dumb-but-strong-therefore-good-while-
annoying-since-it's-from-computer-vision
baseline approach
• Trim the signals properly (e.g. 1-sec)

• Do the classification with a 2D
convnet with 3x3 kernels (aka VGGNet)

• Raise $1B
• Retire
• Post "why i retired.." on Medium
• Happy life!
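The first step of the baseline, trimming signals to a fixed length, might look like the sketch below. The helper name `to_fixed_segments`, the zero-padding of short clips, and the optional overlap are assumptions for illustration.

```python
import numpy as np

def to_fixed_segments(x, sr, seconds=1.0, hop_seconds=None):
    """Cut a 1-D signal into equal-length segments, shape (n_segments, n_samples)."""
    seg = int(round(seconds * sr))
    hop = seg if hop_seconds is None else int(round(hop_seconds * sr))
    if len(x) < seg:
        # Zero-pad clips that are shorter than one segment.
        x = np.pad(x, (0, seg - len(x)))
    n_segments = 1 + (len(x) - seg) // hop
    return np.stack([x[i * hop : i * hop + seg] for i in range(n_segments)])

sr = 16000
rng = np.random.default_rng(0)
recording = rng.standard_normal(int(3.5 * sr))    # stand-in for a 3.5 s clip

segments = to_fixed_segments(recording, sr)                       # no overlap
overlapped = to_fixed_segments(recording, sr, hop_seconds=0.5)    # 50% overlap
short_clip = to_fixed_segments(recording[: sr // 4], sr)          # padded to 1 s
```

Each row can then be turned into a spectrogram and fed to the 2D convnet as one training example. Any trailing audio that does not fill a whole segment is dropped here.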
Go even dumber
• Just download some pre-trained networks for..

- music

- audio

- image (?)

• Re-use it for your task (aka transfer learning)

• 1B - retire - Medium - happy - repeat
Better and stronger,
by understanding assumptions
• assert "Receptive field" size == size of the target pattern

• How sparse is the target pattern?

- Bird singing sparse? 

- Voice-in-music sparse? 

- Distortion-guitar-in-Metallica sparse?
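To check the receptive-field assumption numerically, the sketch below computes the receptive field along the time axis of a stack of convolution and pooling layers, then converts it to seconds. The VGG-like layer stack and the STFT parameters (FFT size 1024, hop 512, 44100 Hz) are assumptions chosen for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input frames) of stacked layers along one axis.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= stride              # striding spreads later taps further apart
    return rf

# Assumed VGG-like stack: four blocks of (3x3 conv, 2x2 max-pool).
layers = [(3, 1), (2, 2)] * 4
rf_frames = receptive_field(layers)          # 46 frames

# Convert frames to seconds for an assumed STFT front end.
sr, n_fft, hop = 44100, 1024, 512
rf_seconds = ((rf_frames - 1) * hop + n_fft) / sr
print(rf_frames, round(rf_seconds, 3))
```

If the target pattern lasts much longer than this span, the stack needs more layers, more pooling, or a different design before the assertion on the slide can hold.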
Have no idea?
• Go see how computer vision people are doing

• Clone it

• It's ok, it's a good baseline at least
My spectrogram is 28x28 bc
the model I downloaded is
trained on MNIST
Don't use spectrograms as if
they are images
It all boils down to the
pattern recognition, they're
actually similar tasks.
the time and frequency axes
have totally different
meanings
I don't know how to
incorporate them into my
model.. BUT IT WORKS!
Expecting the result
or, know the problem
Q. How would it work?
YOU
• You are responsible for the feasibility

• Is it a task you can do?

• Is the information in the input (mel-spectrogram)?

• Are similar tasks being solved?
Think about it!
• Is it possible? To what extent? E.g., 

• Baby crying detection

• Baby crying recognition and classification

• Dog barking translation

• Hit song detection
Conclusion
• Sound is analog, you might need to think about some
analog process, too.

• Pre-process: Follow others when you're lost

• Audio is big in data size, but sparse in information.
Reduce the size. Don't start with end-to-end.

• Design: Follow others when you're lost

• Expect: Make sure it's doable
Q&A
PS. See you soon at the panel talk!

More Related Content

What's hot

Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Simplilearn
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 
SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK Kamonasish Hore
 
Object detection with Tensorflow Api
Object detection with Tensorflow ApiObject detection with Tensorflow Api
Object detection with Tensorflow ApiArwinKhan1
 
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Simplilearn
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Hojin Yang
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnnKuppusamy P
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24Keunwoo Choi
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksUsman Qayyum
 
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Simplilearn
 
RNN and its applications
RNN and its applicationsRNN and its applications
RNN and its applicationsSungjoon Choi
 
Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Databricks
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Universitat Politècnica de Catalunya
 
"Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin..."Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin...Edge AI and Vision Alliance
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Edureka!
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnRahat Yasir
 

What's hot (20)

Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
Object detection with Tensorflow Api
Object detection with Tensorflow ApiObject detection with Tensorflow Api
Object detection with Tensorflow Api
 
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
 
Cnn
CnnCnn
Cnn
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
 
RNN and its applications
RNN and its applicationsRNN and its applications
RNN and its applications
 
Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
 
"Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin..."Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin...
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 

Similar to Deep Learning with Audio Signals: Prepare, Process, Design, Expect

Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendationsBalázs Hidasi
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
Digital signal processing through speech, hearing, and Python
Digital signal processing through speech, hearing, and PythonDigital signal processing through speech, hearing, and Python
Digital signal processing through speech, hearing, and PythonMel Chua
 
Build your own speech to text dataset in 30 days
Build your own speech to text dataset in 30 daysBuild your own speech to text dataset in 30 days
Build your own speech to text dataset in 30 daysDmytro Naumov
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning AnalyticsXavier Ochoa
 
Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural NetworksDatabricks
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
AlphaZero and beyond: Polygames
AlphaZero and beyond: PolygamesAlphaZero and beyond: Polygames
AlphaZero and beyond: PolygamesOlivier Teytaud
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Erik Bernhardsson
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptRahulTr22
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .pptGanesh E
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptkalai75
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptAravind Reddy
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowNicholas McClure
 
Fun with MATLAB
Fun with MATLABFun with MATLAB
Fun with MATLABritece
 
Deep learning from scratch
Deep learning from scratch Deep learning from scratch
Deep learning from scratch Eran Shlomo
 

Similar to Deep Learning with Audio Signals: Prepare, Process, Design, Expect (20)

Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
Digital signal processing through speech, hearing, and Python
Digital signal processing through speech, hearing, and PythonDigital signal processing through speech, hearing, and Python
Digital signal processing through speech, hearing, and Python
 
Build your own speech to text dataset in 30 days
Build your own speech to text dataset in 30 daysBuild your own speech to text dataset in 30 days
Build your own speech to text dataset in 30 days
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
AlphaZero and beyond: Polygames
AlphaZero and beyond: PolygamesAlphaZero and beyond: Polygames
AlphaZero and beyond: Polygames
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014
 
Data Science
Data Science Data Science
Data Science
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in Tensorflow
 
Fun with MATLAB
Fun with MATLABFun with MATLAB
Fun with MATLAB
 
Deep learning from scratch
Deep learning from scratch Deep learning from scratch
Deep learning from scratch
 

More from Keunwoo Choi

인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)Keunwoo Choi
 
가상현실을 위한 오디오 기술
가상현실을 위한 오디오 기술가상현실을 위한 오디오 기술
가상현실을 위한 오디오 기술Keunwoo Choi
 
Conditional generative model for audio
Conditional generative model for audioConditional generative model for audio
Conditional generative model for audioKeunwoo Choi
 
Convolutional recurrent neural networks for music classification
Convolutional recurrent neural networks for music classificationConvolutional recurrent neural networks for music classification
Convolutional recurrent neural networks for music classificationKeunwoo Choi
 
The effects of noisy labels on deep convolutional neural networks for music t...
The effects of noisy labels on deep convolutional neural networks for music t...The effects of noisy labels on deep convolutional neural networks for music t...
The effects of noisy labels on deep convolutional neural networks for music t...Keunwoo Choi
 
dl4mir tutorial at ETRI, Korea
dl4mir tutorial at ETRI, Koreadl4mir tutorial at ETRI, Korea
dl4mir tutorial at ETRI, KoreaKeunwoo Choi
 
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Keunwoo Choi
 
Deep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewDeep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewKeunwoo Choi
 
딥러닝 개요 (2015-05-09 KISTEP)
딥러닝 개요 (2015-05-09 KISTEP)딥러닝 개요 (2015-05-09 KISTEP)
딥러닝 개요 (2015-05-09 KISTEP)Keunwoo Choi
 
Understanding Music Playlists
Understanding Music PlaylistsUnderstanding Music Playlists
Understanding Music PlaylistsKeunwoo Choi
 

More from Keunwoo Choi (10)

인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
인공지능의 음악 인지 모델 - 65차 한국음악지각인지학회 기조강연 (최근우 박사)
 
가상현실을 위한 오디오 기술
가상현실을 위한 오디오 기술가상현실을 위한 오디오 기술
가상현실을 위한 오디오 기술
 
Conditional generative model for audio
Conditional generative model for audioConditional generative model for audio
Conditional generative model for audio
 
Convolutional recurrent neural networks for music classification
Convolutional recurrent neural networks for music classificationConvolutional recurrent neural networks for music classification
Convolutional recurrent neural networks for music classification
 
The effects of noisy labels on deep convolutional neural networks for music t...
The effects of noisy labels on deep convolutional neural networks for music t...The effects of noisy labels on deep convolutional neural networks for music t...
The effects of noisy labels on deep convolutional neural networks for music t...
 
dl4mir tutorial at ETRI, Korea
dl4mir tutorial at ETRI, Koreadl4mir tutorial at ETRI, Korea
dl4mir tutorial at ETRI, Korea
 
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
 
Deep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewDeep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - Overview
 
딥러닝 개요 (2015-05-09 KISTEP)
딥러닝 개요 (2015-05-09 KISTEP)딥러닝 개요 (2015-05-09 KISTEP)
딥러닝 개요 (2015-05-09 KISTEP)
 
Understanding Music Playlists
Understanding Music PlaylistsUnderstanding Music Playlists
Understanding Music Playlists
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Deep Learning with Audio Signals: Prepare, Process, Design, Expect

  • 1. Deep Learning with Audio Signals: Prepare, Process, Design, Expect. Keunwoo Choi
  • 2. Keunwoo Choi QMUL, UK ETRI, S. Korea SNU, S. Korea @keunwoochoi (twtr, github) Research Scientist
  • 3. WARNING THIS MATERIAL IS WRITTEN FOR ATTENDEES IN QCON.AI, NAMELY, SOFTWARE ENGINEERS AND DEEP LEARNING PRACTITIONERS TO PROVIDE AN OFF-THE-SHELF GUIDE. MY ADVICE MIGHT NOT BE THE FINAL SOLUTION FOR YOUR PROBLEM, BUT WOULD BE A GOOD STARTING POINT. ..ALSO, THERE'S NO SPOTIFY SECRET HERE :P
  • 4. Content • Prepare the dataset • Pre-process the signal • Design your network • Expect the result
  • 5. Prepare the datasets or, know your data Q. How to start an audio task?
  • 6. LMGTFY • Google them, of course • But....
  • 7. Audio dataset • Lucky → exactly the same class(es), many of them, yay! • Meh → same or similar classes, sounds alright.. • Ugh.. → there are 2 on freesound.org and 3 on youtube
  • 8. Audio (or, sound) dataset • Our algorithm is living in the digital space • So are the .wav files • But, the sound is in the real world [image label: Our lovely cyberspace]
  • 9. Audio dataset Source Noise Reverberation Microphone • Room reverberation image from https://johnlsayers.com/Recmanual/Pages/Reverb.htm
  • 10. Audio dataset Dear everyone, YOU ARE ALWAYS IN THE "UGH..." SITUATION → HOW TO BUILD A CORRECT AUDIO DATASET?
  • 11. What we can do • Know your real situation [image labels: noisy environment, cheap mic] • You can mimic noise/reverberation/mic if you have clean/dry/high-quality source signals • DL models are robust only within the variance they've seen. → Good at interpolation.. only. E.g., a model trained with clean signals probably can't deal with noisy signals
  • 12. Simulate the real world
 - clean signal + noise signal → noisy signal
 - dry signal, convolved with a room impulse response → wet signal
 - original signal → band-pass filter → recorded signal
  • 13. What to Google
 - Noise: babble noise recording, home noise recording, cafe noise recording, street noise recording, white noise, brown noise. In code: x_noise = x + alpha * noise
 - Reverberation (maybe skip it): room impulse responses (RIR), reverberation simulators. In code: x_wet = np.convolve(x, rir)
 - Microphone: band pass filter, scipy.signal filtering, microphone specification, speaker specification, microphone frequency response. In code: scipy.signal.convolve, scipy.signal.fftconvolve, or trimming off your spectrograms
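The three simulation steps on this slide can be sketched with NumPy and SciPy. This is only a rough illustration, not the speaker's code: the 16 kHz sampling rate, the toy sine source, the synthetic decaying-noise impulse response, the 300-3400 Hz pass band, and the helper names (`add_noise`, `add_reverb`, `simulate_mic`) are all assumptions made for the example. In practice you would load recorded noise and measured room impulse responses instead.

```python
import numpy as np
from scipy import signal

SR = 16000  # assumed sampling rate in Hz


def add_noise(x, noise, alpha):
    """x_noise = x + alpha * noise; the noise is looped/cropped to len(x)."""
    reps = int(np.ceil(len(x) / len(noise)))
    noise = np.tile(noise, reps)[: len(x)]
    return x + alpha * noise


def add_reverb(x, rir):
    """Convolve a dry signal with a room impulse response, keeping len(x)."""
    wet = signal.fftconvolve(x, rir, mode="full")
    return wet[: len(x)]


def simulate_mic(x, sr, low_hz=300.0, high_hz=3400.0, order=4):
    """Band-pass filter as a crude stand-in for a cheap microphone (assumed band)."""
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return signal.sosfiltfilt(sos, x)


rng = np.random.default_rng(0)
t = np.arange(SR) / SR
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)       # toy "clean/dry" source, 1 second
noise = rng.standard_normal(SR // 2)              # placeholder for a noise recording

# Toy impulse response: 0.2 s of exponentially decaying noise (not a measured RIR).
rir_len = SR // 5
rir = rng.standard_normal(rir_len) * np.exp(-np.linspace(0.0, 8.0, rir_len))
rir /= np.sqrt(np.sum(rir ** 2))                  # unit energy

wet = add_reverb(clean, rir)                      # room
noisy = add_noise(wet, noise, alpha=0.05)         # background noise
recorded = simulate_mic(noisy, SR)                # microphone
```

Varying `alpha`, the impulse response, and the pass band per training example is one way to widen the variance the model sees.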
  • 14. Pre-process the signals or, log(melgram) Q. What to do after loading the signals?
  • 15. Digital Audio 101 • 1 second of digital audio:
 size=(44100,), dtype=int16 • MNIST: (28, 28, 1), uint8
 CIFAR10: (32, 32, 3), uint8
 ImageNet: (256, 256, 3), uint8 • Audio: Lots of data points in one item!
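To make the comparison concrete, here is the arithmetic behind these sizes as a small sketch. The shapes are the ones on the slide; the 30-second clip length is an assumption added for illustration, and images are taken to use 1 byte per value.

```python
from math import prod


def n_values(shape):
    """Number of data points in one item."""
    return prod(shape)


def n_bytes(shape, bytes_per_value):
    """Raw storage size of one item."""
    return prod(shape) * bytes_per_value


audio_1s_values = n_values((44100,))          # 44100 samples
mnist_values = n_values((28, 28, 1))          # 784 pixels
cifar_values = n_values((32, 32, 3))          # 3072 values
imagenet_values = n_values((256, 256, 3))     # 196608 values

audio_1s_bytes = n_bytes((44100,), 2)         # int16 -> 2 bytes per sample
audio_30s_bytes = n_bytes((30 * 44100,), 2)   # assumed 30-second clip

print(audio_1s_values / mnist_values)         # one second of audio vs. one MNIST digit
```

One second of audio already holds about 56 times as many values as an MNIST digit, and an assumed 30-second clip holds more values than a 256 x 256 RGB image.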
  • 16. Audio representations (type: description, data shape and size for, e.g., 1 second at sampling rate = 44100)
 - Waveform: x, 44100 x [int16]
 - Spectrograms: STFT(x), 513 x 87 x [float32]
 - Spectrograms: Melspectrogram(x), 128 x 87 x [float32]
 - Spectrograms: CQT(x), 72 x 87 x [float32]
 - Features: MFCC(x) = some process on STFT(x), 20 x 87 x [float32]
 Spoiler: log10(Melspectrograms) for the win, but let's see some details
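The shapes above follow from the analysis parameters. The slide does not state them, so the sketch below assumes an FFT size of 1024, a hop of 512 samples, and centered framing (as in librosa's defaults for framing), which happen to reproduce the 513 x 87 figure. The helper name `stft_shape` is made up for this example.

```python
def stft_shape(n_samples, n_fft, hop_length):
    """Shape of a one-sided, centered STFT: (frequency bins, time frames)."""
    n_bins = n_fft // 2 + 1
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames


n_bins, n_frames = stft_shape(n_samples=44100, n_fft=1024, hop_length=512)
print(n_bins, n_frames)  # 513 87

# Bytes per second of audio for each representation (float32 = 4 bytes).
stft_bytes = n_bins * n_frames * 4
mel_bytes = 128 * n_frames * 4        # 128 mel bands, same number of frames
waveform_bytes = 44100 * 2            # int16 = 2 bytes

print(stft_bytes, mel_bytes, waveform_bytes)
```

Under these assumptions the mel-spectrogram is roughly a quarter of the STFT size and about half the raw waveform size.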
  • 17. Spectrograms • 2-dim representation of audio signal TODO: IMAGE
  • 18. Practitioner's choice • Rule of thumb: DISCARD ALL THE REDUNDANCY • Sample rate, or bandwidth • Goal: To optimize the input audio data for your model • by resampling - can be computationally heavier • by discarding some freq bands - can be storage-heavy [image from https://www.summerrankin.com/dogandponyshow/2017/10/16/catdog]
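Both options can be sketched with SciPy. The target rate of 16 kHz, the 8 kHz cut-off, and the STFT settings are assumptions chosen for illustration; the random input stands in for a real recording.

```python
from math import gcd

import numpy as np
from scipy import signal

sr_in, sr_out = 44100, 16000
x = np.random.default_rng(0).standard_normal(sr_in)   # placeholder: 1 s of audio

# Option 1: resample the waveform (extra computation, smaller stored signal).
g = gcd(sr_in, sr_out)
x_16k = signal.resample_poly(x, sr_out // g, sr_in // g)   # up=160, down=441

# Option 2: keep the original sample rate and drop the frequency bins
# above the band of interest (no resampling, but the full-rate audio is kept).
freqs, times, Z = signal.stft(x, fs=sr_in, nperseg=1024, noverlap=512)
mag = np.abs(Z)
keep = freqs <= 8000.0
mag_cropped = mag[keep, :]

print(len(x_16k), mag.shape, mag_cropped.shape)
```

Either way the network input covers only the band up to 8 kHz; which one is cheaper depends on whether storage or compute is the bottleneck.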
  • 19. Practitioner's choice • Melspectrogram
 - in decibel scale
 - which only covers the frequency range you're interested in. • Why?
 - smaller, therefore easier and faster training
 - perceptual - puts more weight on the freq region that matters more to human hearing
 - faster than CQT to compute
 - decibel scale - another perceptually motivated choice Q. Ok, how can I compute them?
  • 20. import librosa import madmom • Python libraries - librosa/madmom/scipy/.. • Computations on CPU • Best when all the processing will be done before the training
  • 21. import kapre • Keras Audio Preprocessing layers • CPU and GPU • Best when you want to do things on the fly/GPU
 = Best to optimize audio-related parameters • pip install kapre (Disclaimer: I'm the maintainer) • There's also pytorch-audio!
  • 22. Design your network or, know the assumptions Q. What kind of network structure do I need?
  • 23. A dumb-but-strong-therefore-good-while-annoying-since-it's-from-computer-vision baseline approach • Trim the signals properly (e.g. 1-sec) • Do the classification with 2D convnet, 3x3 kernel (aka VGGNet) • Raise $1B • Retire • Post "why i retired.." on Medium • Happy life!
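The trimming step is simple but easy to get subtly wrong with clips of uneven length. One possible sketch, with hypothetical helper names `fix_length` and `split_into_segments`, pads short clips with zeros and cuts long ones into fixed-size segments so that every network input has the same shape:

```python
import numpy as np


def fix_length(x, n_samples):
    """Crop or zero-pad a 1-D signal to exactly n_samples."""
    if len(x) >= n_samples:
        return x[:n_samples]
    return np.pad(x, (0, n_samples - len(x)))


def split_into_segments(x, n_samples):
    """Cut a signal into non-overlapping segments; the last one is zero-padded."""
    n_segments = max(1, int(np.ceil(len(x) / n_samples)))
    padded = fix_length(x, n_segments * n_samples)
    return padded.reshape(n_segments, n_samples)


sr = 44100
clip = np.random.default_rng(0).standard_normal(int(2.5 * sr))   # placeholder, 2.5 s
segments = split_into_segments(clip, sr)                          # three 1-second items
print(segments.shape)
```

Each row can then be turned into a spectrogram and fed to the 2D convnet as one training example.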
  • 24. Go even dumber • Just download some pre-trained networks for..
 - music
 - audio
 - image (?) • Re-use it for your task (aka transfer learning) • 1B - retire - Medium - happy - repeat
  • 25. Better and stronger, by understanding assumptions • assert "Receptive field" size == size of the target pattern • How sparse is the target pattern?
 - Bird singing sparse? 
 - Voice-in-music sparse? 
 - Distortion-guitar-in-Metallica sparse?
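To check the receptive-field assumption, it helps to compute how much of the spectrogram one output unit actually sees. The sketch below uses the standard recurrence for stacked convolution and pooling layers along one axis; the four-block layout (3x3 conv followed by 2x2 pooling) and the hop of 512 samples at 44100 Hz are assumptions for illustration.

```python
def receptive_field(layers):
    """Receptive field along one axis for a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= stride                   # stride increases the spacing between units
    return rf


# Assumed VGG-like stack: four blocks of [3x3 conv, stride 1] + [2x2 max-pool, stride 2].
layers = [(3, 1), (2, 2)] * 4
rf_frames = receptive_field(layers)

# Convert frames to seconds for an assumed hop of 512 samples at 44100 Hz.
rf_seconds = rf_frames * 512 / 44100
print(rf_frames, round(rf_seconds, 3))   # 46 0.534
```

If the target pattern (a bird call, a sung phrase) lasts much longer than this, the stack is too shallow along the time axis; if it is much shorter, a smaller network may be enough.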
  • 26. Have no idea? • Go see how computer vision people are doing • Clone it • It's ok, it's a good baseline at least
  • 27. [dialogue]
 - "My spectrogram is 28x28 bc the model I downloaded is trained on MNIST"
 - "Don't use spectrograms as if they are images"
 - "It all boils down to the pattern recognition, they're actually similar tasks."
 - "the time and frequency axes have totally different meanings"
 - "I don't know how to incorporate them into my model.."
 - "BUT IT WORKS!"
  • 28. Expecting the result or, know the problem Q. How would it work?
  • 29. YOU • You are responsible for the feasibility • Is it a task you can do? • Is the information in the input (mel-spectrogram)? • Are similar tasks being solved?
  • 30. Think about it! • Is it possible? To what extent? E.g., • Baby crying detection • Baby crying recognition and classification • Dog barking translation • Hit song detection
  • 32. Conclusion • Sound is analog, so you might need to think about some analog processes, too. • Pre-process: Follow others when you're lost • Audio is big in data size, but sparse in information. Reduce the size. Don't start with end-to-end. • Design: Follow others when you're lost • Expect: Make sure it's doable
  • 33. Deep Learning with Audio Signals: Prepare, Process, Design, Expect. Keunwoo Choi. Q&A. PS. See you soon at the panel talk!