Chord Recognition
Also known as chord detection or audio chord estimation
Mu-Heng Yang – RA at CITI, Academia Sinica
TODO
• If no chord is present, label the segment N
C C Em Em | C C Em Em
Purpose
Evaluation
• OR = Overlap Ratio
• WAOR = Weighted Average Overlap Ratio (each song's OR weighted by its duration)
• Example: song 1 (3 min) OR = 73%, song 2 (6 min) OR = 70%, so WAOR = (3 × 73% + 6 × 70%) / 9 = 71%
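The weighting in the example can be checked with a few lines of Python. This is a minimal sketch; the function name and the (duration, ratio) input layout are assumptions made for illustration, not part of the slides.

```python
def weighted_average_overlap_ratio(songs):
    """songs: list of (duration, overlap_ratio) pairs; durations in one consistent unit."""
    total_duration = sum(duration for duration, _ in songs)
    if total_duration <= 0:
        raise ValueError("total duration must be positive")
    # Each song contributes its overlap ratio in proportion to its length.
    return sum(duration * ratio for duration, ratio in songs) / total_duration

# Slide example: song 1 is 3 min at 73%, song 2 is 6 min at 70%.
waor = weighted_average_overlap_ratio([(3, 0.73), (6, 0.70)])
print(f"WAOR = {waor:.0%}")  # prints "WAOR = 71%"
```

A plain average of the two songs would give 71.5%; the duration weighting pulls the result toward the longer song.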
Evaluation
• CSR = Chord Symbol Recall
• WCSR = Weighted Chord Symbol Recall
• The ground-truth annotation is almost continuous in time
• Discretizing the ground truth into frames therefore introduces a small error
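One way to avoid the discretization error is to score on segment durations rather than on frames. The sketch below assumes the commonly used definition of CSR (duration where the estimated label equals the annotated label, divided by the total annotated duration) and a hypothetical (start, end, label) segment layout. It also assumes that the estimated segments do not overlap each other.

```python
def chord_symbol_recall(reference, estimate):
    """Both arguments are lists of (start_sec, end_sec, label) segments."""
    matched = 0.0
    for ref_start, ref_end, ref_label in reference:
        for est_start, est_end, est_label in estimate:
            if ref_label == est_label:
                # Length of the time interval shared by the two segments.
                overlap = min(ref_end, est_end) - max(ref_start, est_start)
                matched += max(0.0, overlap)
    total = sum(end - start for start, end, _ in reference)
    return matched / total

reference = [(0.0, 2.0, "C:maj"), (2.0, 4.0, "E:min")]
estimate = [(0.0, 2.5, "C:maj"), (2.5, 4.0, "E:min")]
csr = chord_symbol_recall(reference, estimate)
print(csr)  # 0.875: the estimate is wrong only between 2.0 s and 2.5 s
```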
Chord
• Notation: <Root>:<Type>/<Inversion>
• Root: C, C#, D, …, A, A#, B
• Type: maj, min, 7, maj7, min7
• Inversion: 3, 5, or 7
• Other: aug, dim, sus, 1, 4, 5, 6, 9, 11, 13
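The notation can be split mechanically into its three parts. The following is a minimal sketch, not a full chord-label parser: the function name is hypothetical, and treating a bare root as a major triad is an assumption.

```python
def parse_chord(label):
    """Split '<Root>:<Type>/<Inversion>' into its parts; 'N' means no chord."""
    if label == "N":
        return {"root": None, "type": None, "inversion": None}
    body, _, inversion = label.partition("/")
    root, _, chord_type = body.partition(":")
    return {
        "root": root,
        "type": chord_type or "maj",  # assumption: a bare root is a major triad
        "inversion": inversion or None,
    }

print(parse_chord("C:maj7/3"))  # {'root': 'C', 'type': 'maj7', 'inversion': '3'}
print(parse_chord("A:min"))     # {'root': 'A', 'type': 'min', 'inversion': None}
```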
MIREX
• Chord root note only
• Major and minor
• Seventh chords
• Major and minor with inversions
• Seventh chords with inversions
Data
• Audio files are hard to obtain because of copyright
• Remastered releases yield different audio files for the same song
• 180 songs from the Beatles dataset
• 100 songs from the RWC Pop dataset
• 18 songs from the Zweieck dataset
• 19 songs from the Queen dataset
• 198 songs from the Billboard dataset
Ambiguity
• Fade-outs
• Staccato
• Many chords sounding at the same time
• Chords overlapping due to reverb
• Different chord mappings
Basic Model
Chroma
Median filter
HMM/GMM
Viterbi
Outline
• Feature Extraction
• Pre-processing
• Learning
• Post-filtering
HPSS
• Harmonic/Percussive Sound Separation
• Suppressing percussion improves the result from 50.9% to 74.2%
• Percussive sounds have no harmonic structure
• They have a smooth frequency envelope
• Their energy is concentrated in a short time
• Demo
2010 Ueda et al. [1]
Tuning
• The standard frequency of A4 is 440 Hz
• Recordings sometimes deviate from it (415~445 Hz)
• Correcting the tuning raised one song's WCSR from 14.5% to 73.9%
• However, estimating the tuning perfectly is very hard
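The effect of a wrong reference frequency can be seen in a small sketch that maps a frequency to its nearest pitch class. The function name and the `a4_hz` parameter are assumptions for illustration; the MIDI convention (note 69 is A4) is standard.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Nearest pitch class for a frequency, given the reference tuning of A4."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / a4_hz)
    midi_note = int(round(69 + semitones_from_a4))  # MIDI note 69 is A4
    return PITCH_CLASSES[midi_note % 12]

print(pitch_class(415.0))               # G# when 440 Hz tuning is assumed
print(pitch_class(415.0, a4_hz=415.0))  # A when the true tuning is known
```

A recording tuned to 415 Hz is therefore read a semitone flat under the 440 Hz assumption, which shifts every chord root.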
Chroma
• Also known as Pitch Class Profile (PCP)
• Sum the energies of the frequencies within each bin
• Compress the energy with a log or power function
• Sum the bins of the same pitch class across octaves to get the chroma vector
C = C1 + C2 + C3 + C4 + C5 + …
D = D1 + D2 + D3 + D4 + D5 + …
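The octave folding (C = C1 + C2 + …) can be sketched as follows. The input layout, one energy value per semitone starting at C1 (MIDI note 24), is an assumption chosen to keep the example small; real front ends differ in range and resolution.

```python
def fold_to_chroma(semitone_energy, lowest_midi_note=24):
    """Sum per-semitone energies across octaves into a 12-bin chroma vector.

    semitone_energy[i] is the energy at MIDI note lowest_midi_note + i.
    Index 0 of the result is C, index 1 is C#, and so on.
    """
    chroma = [0.0] * 12
    for i, energy in enumerate(semitone_energy):
        chroma[(lowest_midi_note + i) % 12] += energy
    return chroma

# Three octaves (C1..B3) with energy on every C and a little on E1.
energy = [0.0] * 36
energy[0] = energy[12] = energy[24] = 1.0
energy[4] = 0.5
chroma = fold_to_chroma(energy)
print(chroma[0], chroma[4])  # 3.0 0.5
```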
DNN features
• DNNs work well with lower-level features
• DNNs benefit from more information
• Keep each octave separate instead of folding it into 12 pitch classes
• Use 2~3 bins per semitone
• Some systems even use the FFT spectrum directly
Outline
• Feature Extraction
• Pre-processing
• Learning
• Post-filtering
Beat Synchronization
• Assume there is only one chord per beat
• Average all frames within each beat
• This smooths out noise and percussion
• But most NN-based models do not use it
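The per-beat averaging can be sketched in a few lines. The beat positions are assumed to come from a separate beat tracker and to be given as frame indices; both the function name and that input format are hypothetical.

```python
def beat_synchronize(frames, beat_boundaries):
    """Average feature frames between consecutive beat boundaries.

    frames: list of equal-length feature vectors, one per analysis frame.
    beat_boundaries: increasing frame indices; beat k spans
    frames[beat_boundaries[k]:beat_boundaries[k + 1]].
    """
    synced = []
    for start, end in zip(beat_boundaries, beat_boundaries[1:]):
        segment = frames[start:end]
        if not segment:
            continue  # skip empty beats rather than divide by zero
        synced.append([sum(column) / len(segment) for column in zip(*segment)])
    return synced

frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(beat_synchronize(frames, [0, 2, 4]))  # [[2.0, 0.0], [0.0, 3.0]]
```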
Median Filter
• It also smooths out noise and percussion
• Without losing too much information
• But it may smear chord boundaries
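A sliding median over one feature dimension illustrates both effects. This is a minimal sketch; the window handling at the edges (a shrunken window) is an assumption, and other padding schemes are equally plausible.

```python
from statistics import median

def median_filter(values, width=3):
    """Sliding median over one feature dimension; edges use a shrunken window."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]

# A one-frame spike at index 2 and a genuine change at index 5.
noisy = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.8, 0.8]
print(median_filter(noisy))  # [0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.8, 0.8]
```

The spike is removed while the longer change survives. A real chord that lasts only one frame would be removed in the same way, which is the boundary smearing mentioned above.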
Time-Splicing
• Include the frames before and after the current one
• For the 7th frame, concatenate it with the 6th and 8th
• More frames can be included (3~11)
• How many depends on the capacity of your DNN model
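Splicing amounts to concatenating each frame with its neighbours. The sketch below uses a hypothetical `context` parameter (frames on each side), so `context=1` gives 3 frames in total. Repeating the edge frames as padding is an assumption.

```python
def splice_frames(frames, context=1):
    """Concatenate each frame with `context` neighbours on each side.

    Edge frames are padded by repeating the first or last frame (an
    assumption; zero padding is another common choice).
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            neighbour = min(max(t + offset, 0), last)
            window.extend(frames[neighbour])
        spliced.append(window)
    return spliced

print(splice_frames([[1], [2], [3]]))  # [[1, 1, 2], [1, 2, 3], [2, 3, 3]]
```

The input dimension of the network grows by a factor of 2 × context + 1, which is why the usable context depends on the model's capacity.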
Spliced Filter
• Inspired by CNNs (convolutional neural networks)
• Use filters to include past and future frames
• Use max pooling to reduce the feature dimension
• WCSR increases from 75.5% to 91.9%!
2015 Xinquan Zhou et al. [2]
Outline
• Feature Extraction
• Pre-processing
• Learning
• Post-filtering
Learning
• Unsupervised learning (pre-training) on unlabeled data
• Supervised learning (fine-tuning) on labeled data
Dimension Reduction
• Principal Component Analysis (PCA)
• But it can only form linear combinations of the input
Autoencoder
• Use a neural network for non-linear dimension reduction
For binary input
For real input
Denoising Autoencoder
• More robust to noisy input
• For binary input, set a random fraction of the inputs to 0 or 1
• For real-valued input, add isotropic Gaussian noise
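The two corruption schemes can be sketched as below. Only the input corruption is shown, not the autoencoder itself. The function names, the `fraction` and `sigma` parameters, and the explicit random generator argument are assumptions for illustration.

```python
import random

def mask_binary(inputs, fraction, rng):
    """Force a random fraction of binary inputs to 0 or 1 (salt-and-pepper noise)."""
    corrupted = list(inputs)
    count = int(round(fraction * len(inputs)))
    for index in rng.sample(range(len(inputs)), count):
        corrupted[index] = rng.choice([0, 1])
    return corrupted

def add_gaussian_noise(inputs, sigma, rng):
    """Add zero-mean Gaussian noise with the same sigma in every dimension."""
    return [x + rng.gauss(0.0, sigma) for x in inputs]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
noisy_bits = mask_binary(clean_bits, fraction=0.3, rng=rng)
noisy_real = add_gaussian_noise([0.2, 0.5, 0.9], sigma=0.1, rng=rng)
```

During training the autoencoder receives the corrupted vector as input and is asked to reconstruct the clean one.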
Stacked Denoising Autoencoder
• More layers, trained greedily layer by layer
• No labeled data needed for pre-training
• Use the pre-trained model to initialize the DNN (faster convergence)
2014 Steenbergen et al. [3]
Supervised Learning
• Fine-tune by back-propagation after pre-training
• Predict the emission probability of each chord
• Softmax output layer at the end of the DNN
Emission  Frame1  Frame2  Frame3  Frame4  Frame5  Frame6
Cmaj      80%     70%     45%     40%     80%     85%
Fmaj      5%      5%      10%     5%      10%     5%
Gmaj      10%     15%     35%     50%     5%      5%
Amin      5%      10%     10%     5%      5%      5%
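Each column of the table sums to 100% because the softmax layer normalises the network's raw output scores. A minimal sketch of that normalisation, with made-up scores rather than values from the slides:

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for Cmaj, Fmaj, Gmaj, Amin at one frame.
probabilities = softmax([2.0, 0.0, 1.0, 0.0])
```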
Preventing Overfitting
• Overfitting happens when models are too powerful
• Song-based cross-validation
• Early stopping (20 iterations)
• Dropout and DropConnect
• Weight penalty, weight constraint
• Bottleneck architecture
2010 Hinton et al. [4]
https://www.coursera.org/course/neuralnets
Meta-Parameters
• Learning rate: adapt it by watching for sign changes in the gradient
• Mini-batch size: 10~100
• Momentum: helps escape local minima (0.5~0.9)
2010 Hinton et al. [4]
https://www.coursera.org/course/neuralnets
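The role of the momentum coefficient can be sketched with the classical momentum update on a toy problem. The function name, the learning rate of 0.1 and the quadratic objective are assumptions chosen for illustration; only the momentum range comes from the slide.

```python
def sgd_momentum_step(weight, velocity, gradient, learning_rate=0.1, momentum=0.9):
    """One update: the velocity accumulates past gradients, the weight follows it."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

# Toy problem (hypothetical): minimise f(w) = w^2, whose gradient is 2w.
weight, velocity = 5.0, 0.0
for _ in range(300):
    weight, velocity = sgd_momentum_step(weight, velocity, gradient=2.0 * weight)
print(abs(weight) < 1e-3)  # True: the weight has settled near the minimum at 0
```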
Outline
• Feature Extraction
• Pre-processing
• Learning
• Post-filtering
Viterbi Decoding
• Get the transition probabilities by counting
• Remember the best path to the current node
Transition probabilities (row: current chord, column: next chord)
          Cmaj    Fmaj    Gmaj    Amin
Cmaj      80%     5%      10%     5%
Fmaj      20%     70%     5%      5%
Gmaj      10%     10%     70%     10%
Amin      10%     20%     60%     10%
Emission  T1      T2      T3      T4      T5      T6
Cmaj      80%     70%     45%     40%     80%     85%
Fmaj      5%      5%      10%     5%      10%     5%
Gmaj      10%     15%     35%     50%     5%      5%
Amin      5%      10%     10%     5%      5%      5%
http://www.hooktheory.com/trends
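The two tables are enough to run the decoder by hand or in code. The sketch below works in the log domain and assumes a uniform initial distribution, which the slide does not specify. It also treats the emission values directly as per-frame scores.

```python
import math

CHORDS = ["Cmaj", "Fmaj", "Gmaj", "Amin"]

# TRANSITION[i][j] = P(next chord j | current chord i), copied from the slide.
TRANSITION = [
    [0.80, 0.05, 0.10, 0.05],
    [0.20, 0.70, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.20, 0.60, 0.10],
]

# EMISSION[t][j] = score of chord j at frame t (columns T1..T6 of the slide).
EMISSION = [
    [0.80, 0.05, 0.10, 0.05],
    [0.70, 0.05, 0.15, 0.10],
    [0.45, 0.10, 0.35, 0.10],
    [0.40, 0.05, 0.50, 0.05],
    [0.80, 0.10, 0.05, 0.05],
    [0.85, 0.05, 0.05, 0.05],
]

def viterbi(transition, emission):
    """Return the most likely state sequence as a list of state indices."""
    n = len(transition)
    # Assumption: uniform initial distribution over chords.
    score = [math.log(1.0 / n) + math.log(e) for e in emission[0]]
    backpointers = []
    for frame in emission[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor of state j at this frame.
            best_i = max(range(n), key=lambda i: score[i] + math.log(transition[i][j]))
            new_score.append(score[best_i] + math.log(transition[best_i][j]) + math.log(frame[j]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

framewise = [CHORDS[max(range(4), key=lambda j: frame[j])] for frame in EMISSION]
decoded = [CHORDS[s] for s in viterbi(TRANSITION, EMISSION)]
print(framewise)  # ['Cmaj', 'Cmaj', 'Cmaj', 'Gmaj', 'Cmaj', 'Cmaj']
print(decoded)    # ['Cmaj', 'Cmaj', 'Cmaj', 'Cmaj', 'Cmaj', 'Cmaj']
```

Picking the best chord frame by frame gives an isolated Gmaj at T4. The decoder removes it because leaving and re-entering Cmaj costs more than the small emission advantage of Gmaj in that frame.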
Ensemble
• Average the predictions of many models
• Models are saved during cross-validation
• Each model is trained on different data
Chord  1st Model  2nd Model  3rd Model  Ensemble
Cmaj   50%        40%        30%        40%
Amin   20%        10%        30%        20%
Fmaj   20%        30%        10%        20%
Gmaj   10%        20%        30%        20%
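The ensemble column is the plain mean of the three model columns. A minimal sketch with the slide's numbers follows; the dictionary layout and function name are assumptions.

```python
def ensemble_average(predictions):
    """Average per-chord probabilities over several models."""
    chords = predictions[0].keys()
    return {c: sum(p[c] for p in predictions) / len(predictions) for c in chords}

models = [
    {"Cmaj": 0.50, "Amin": 0.20, "Fmaj": 0.20, "Gmaj": 0.10},  # 1st model
    {"Cmaj": 0.40, "Amin": 0.10, "Fmaj": 0.30, "Gmaj": 0.20},  # 2nd model
    {"Cmaj": 0.30, "Amin": 0.30, "Fmaj": 0.10, "Gmaj": 0.30},  # 3rd model
]
averaged = ensemble_average(models)
best_chord = max(averaged, key=averaged.get)
print(best_chord)  # Cmaj
```

The third model on its own is undecided between Cmaj, Amin and Gmaj; after averaging, Cmaj leads clearly at 40%.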
RNN
• Recurrent Neural Network
• Combines an acoustic model and a language model
• Achieves 80.6% WAOR
2015 Nicolas et al. [5] [6]
RNN
• Unroll the network over time and train it with back-propagation through time (BPTT)
• Weights are shared across time steps
2015 Nicolas et al. [5] [6]
DBN with Musical Context
• Dynamic Bayesian Network
• Chords depend on other musical information
• Map back to the key of C major
• Chord progression: I V vi IV
• Average choruses 1~3
2010 Mauch et al. [7] [8]
References
• [1] HMM-based Approach for Automatic Chord Detection Using Refined Acoustic Features
• [2] Chord Detection Using Deep Learning
• [3] Chord Recognition with Stacked Denoising Autoencoders
• [4] A Practical Guide to Training Restricted Boltzmann Machines
• [5] Audio Chord Recognition with Recurrent Neural Network
• [6] Audio Chord Recognition with a Hybrid Recurrent Neural Network
• [7] Simultaneous Estimation of Chords and Musical Context From Audio
• [8] Using Musical Structure to Enhance Automatic Chord Transcription
Q&A
• Thanks for listening.

Chord recognition mac lab presentation
