Audio Source Separation Based on Low-Rank Structure and Statistical Independence

Daichi Kitamura, Assistant Professor at National Institute of Technology, Kagawa College
The University of Tokyo
Research Associate
Daichi Kitamura
Nagoya University, Lecture
May 30, 2017
Introduction
• Daichi Kitamura (北村大地)
• Research Associate at The University of Tokyo
• Academic background
– Kagawa National College of Technology (2005 ~ 2012)
• B.S. in Engineering (March 2012)
– Nara Institute of Science and Technology (2012 ~ 2014)
• M.S. in Engineering (March 2014)
– SOKENDAI (2014 ~ 2017)
• Ph.D. in Informatics (March 2017)
• Research topics
– Media signal processing
– Audio source separation
Contents
• Research background
– Audio source separation and its applications
– Demonstration
• Structural modeling of audio sources
– Time-frequency representation
– Low-rank modeling of audio spectrogram
– Supervised audio source separation
• Statistical modeling between sources
– Blind audio source separation
– Audio distribution and central limit theorem
– Maximization of independence
• Conclusion and future works
Research background
• Audio source separation
– Signal processing
– Separation of speech, music sounds, background noise, …
– Cocktail party effect by a computer
• Applications of audio source separation
– Hearing aid
• Easier conversation in a noisy environment
– Speech recognition systems
• Siri, Google search, Cortana, Amazon Echo, …
– Automatic music transcription
• Musical part separation (Vo., Gt., Ba., …)
– Remix of live-recorded music
• Professional use (improving quality), personal use (DJ remixing), …
Demonstration: speech source separation
• Real-time speech source separation (video)
Demonstration: music source separation
• Music source separation (audio demo)
– The mixture is separated into guitar, vocal, and keyboard
– Listen carefully for the three parts in the mixture.
Contents
• Research background
– Audio source separation and its applications
– Demonstration
• Structural modeling of audio sources (for monaural signals)
– Time-frequency representation
– Low-rank modeling of audio spectrogram
– Supervised audio source separation
• Statistical modeling between sources (for stereo or multichannel signals)
– Blind audio source separation
– Audio distribution and central limit theorem
– Maximization of independence
• Conclusion and future works
Time-frequency representation of audio signals
• Audio waveform in time domain (speech)
• Time-varying frequency structure
– Short-time Fourier transform (STFT)
[Figure: STFT. In the time domain, the waveform is cut into frames by a window (parameters: FFT length and shift length), and each frame is Fourier transformed. In the time-frequency domain, the result is the spectrogram, a complex-valued matrix (frequency × time). Taking the entry-wise absolute value and power gives the power spectrogram, a nonnegative real-valued matrix.]
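The STFT pipeline described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the lecture: the Hann window, the FFT length of 1024, the shift length of 256, and the 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, fft_len=1024, shift=256):
    """Return the complex spectrogram (frequency x time) of a 1-D signal."""
    window = np.hanning(fft_len)                  # assumed window type
    n_frames = 1 + (len(x) - fft_len) // shift    # frames that fit entirely
    frames = np.stack(
        [x[t * shift : t * shift + fft_len] * window for t in range(n_frames)],
        axis=1,
    )                                             # shape: (fft_len, n_frames)
    return np.fft.rfft(frames, axis=0)            # one Fourier transform per frame

fs = 16000                                        # assumed sampling rate [Hz]
time = np.arange(fs) / fs                         # 1 second of samples
x = np.sin(2 * np.pi * 1000 * time)               # hypothetical 1 kHz test tone

spectrogram = stft(x)                             # complex-valued matrix
power = np.abs(spectrogram) ** 2                  # entry-wise absolute and power
peak_hz = np.argmax(power.mean(axis=1)) * fs / 1024
```

The power spectrogram is nonnegative by construction, which is what makes the nonnegative low-rank modeling of the later slides applicable.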
[Figure: power spectrogram of speech]
[Figure: power spectrogram of music]
Structural properties
• Sparse (for both speech and music)
– Strong (yellow) components are few
– Weak (darker) components are dominant
• Continuous contour (only in speech)
– The spectrum changes continuously and dynamically
• Low rank (especially in music)
– Similar patterns (similar timbres) appear many times
Comparison of low-rankness
[Figure panels: Drums, Guitar, Vocals, Speech]
• Low-rankness (simplicity of a matrix)
– can be measured by the cumulative singular value (CSV)
– Drums and guitar are quite low-rank
• Vocals and speech are also low-rank to some extent
– A music spectrogram can be modeled by a few patterns
[Figure: CSV with the 95% line. Number of bases when the CSV reaches 95%: 7, 29, and around 90 (spectrogram size is 1025x1883)]
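One way to compute such a low-rankness measure is sketched below. The slides do not define the CSV precisely, so this assumes it is the running sum of the singular values normalized by their total; the function name `bases_for_csv` and the random test matrices are hypothetical.

```python
import numpy as np

def bases_for_csv(X, threshold=0.95):
    """Number of singular values needed for the normalized cumulative sum
    of singular values (assumed CSV definition) to reach the threshold."""
    s = np.linalg.svd(X, compute_uv=False)       # singular values, descending
    csv = np.cumsum(s) / np.sum(s)               # cumulative singular value
    n_bases = int(np.searchsorted(csv, threshold) + 1)
    return n_bases, csv

rng = np.random.default_rng(0)
# Hypothetical "music-like" matrix built from only 3 nonnegative patterns
low_rank = rng.random((64, 3)) @ rng.random((3, 200))
# Hypothetical unstructured nonnegative matrix of the same size
full_rank = np.abs(rng.standard_normal((64, 200)))

n_low, _ = bases_for_csv(low_rank)
n_full, _ = bases_for_csv(full_rank)
```

A matrix built from a few repeated patterns reaches the 95% line with far fewer bases than an unstructured matrix of the same size.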
Modeling technique of low-rank structures
• Nonnegative matrix factorization (NMF) [Lee, 1999]
– is a low-rank approximation using a limited number of bases
• Bases and their coefficients must be nonnegative
– can be applied to a power spectrogram
• Spectral patterns (typical timbres) and their time-varying gains
[Figure: NMF. The nonnegative matrix (power spectrogram; frequency × time) is approximated by the product of the basis matrix (spectral patterns; frequency × basis) and the activation matrix (time-varying gains; basis × time). The matrix sizes are set by the # of frequency bins, the # of time frames, and the # of bases.]
• Parameter optimization in NMF
– Minimize a “similarity measure” between the observed nonnegative matrix and the product of the basis and activation matrices
– An arbitrary similarity measure can be used
• Squared Euclidean distance, etc.
– A closed-form solution is still an open problem
– Iterative calculation can minimize the similarity measure
• Multiplicative update rules [Lee, 2000] (for the case of squared Euclidean distance)
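The iterative minimization can be sketched with the widely known multiplicative update rules for the squared Euclidean distance. The symbols `X` (observed matrix), `W` (basis matrix), and `H` (activation matrix) are assumptions, since the slide's own notation did not survive extraction; the small constant `eps` is added only to avoid division by zero.

```python
import numpy as np

def nmf_euclid(X, n_bases, n_iter=200, eps=1e-12, seed=0):
    """NMF X ~ W @ H with multiplicative updates (squared Euclidean distance)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_bases)) + eps   # nonnegative random init
    H = rng.random((n_bases, X.shape[1])) + eps
    costs = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        costs.append(np.sum((X - W @ H) ** 2))
    return W, H, costs

rng = np.random.default_rng(1)
# Hypothetical nonnegative matrix made of two spectral patterns
X = rng.random((20, 2)) @ rng.random((2, 50))
W, H, costs = nmf_euclid(X, n_bases=2)
```

Because the updates are purely multiplicative, no step size has to be tuned and nonnegativity is preserved automatically, which is one reason this scheme is popular.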
• Example
[Figure: the spectrogram of Pf. and Cl. as a superposition of rank-1 spectrograms]
– Pf. and Cl. are separated!
– Source separation based on NMF
• is a clustering problem of the obtained spectral bases in the basis matrix
– But how?
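Supervised audio source separation with NMF
• If source-wise training data is available,
– Supervised NMF [Smaragdis, 2007], [Kitamura1, 2014]
[Figure: Training stage: a spectral dictionary of Pf. is learned from the training data. Separation stage: the dictionary is given (fixed), and only its activations, the other bases, and the activations of the other bases are optimized.]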
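The separation stage can be sketched as an NMF in which part of the basis matrix is frozen. This is a simplified illustration with the squared Euclidean distance and no additional penalty terms, so it is not necessarily the exact formulation of the cited papers. The names `F` (given dictionary), `G` (its activations), `H` (other bases), and `U` (their activations) are assumptions.

```python
import numpy as np

def supervised_nmf(Y, F, n_other, n_iter=200, eps=1e-12, seed=0):
    """Fit Y ~ F @ G + H @ U while keeping the trained dictionary F fixed."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1]))      # activations of the dictionary
    H = rng.random((Y.shape[0], n_other))         # bases for the other sources
    U = rng.random((n_other, Y.shape[1]))         # activations of the other bases
    for _ in range(n_iter):
        V = F @ G + H @ U                         # current model of the mixture
        G *= (F.T @ Y) / (F.T @ V + eps)
        V = F @ G + H @ U
        H *= (Y @ U.T) / (V @ U.T + eps)
        V = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ V + eps)
    return G, H, U                                # F itself is never updated

rng = np.random.default_rng(1)
F = rng.random((30, 4))                           # hypothetical trained dictionary
Y = F @ rng.random((4, 100)) + rng.random((30, 2)) @ rng.random((2, 100))
F_before = F.copy()

G, H, U = supervised_nmf(Y, F, n_other=2)
target_estimate = F @ G                           # part explained by the dictionary
rel_err = np.linalg.norm(Y - F @ G - H @ U) / np.linalg.norm(Y)
```

The product `F @ G` then serves as the estimate of the target source, and `H @ U` absorbs everything else in the mixture.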
• Demonstration
– Stereo music separation with supervised NMF [Kitamura, 2015]
[Audio demo: original song; training sound of Pf. and separated sound (Pf.); training sound of Ba. and separated sound (Ba.)]
Problem of supervised approach
• Performance will be limited
– when the difference in timbre between the training data and the target source in the mixture becomes large
[Figure: the target Pf. in the mixture sound is slightly different from the Pf. in the training data. Plot "Difference of timbres": amplitude [dB] versus frequency [kHz] (0.0 to 3.0) for a real sound and an artificial sound by MIDI. Example: the mixture (actual Pf. & Tb.) is separated by supervised NMF using artificial Pf. as training data.]
Adaptive supervised audio source separation
• Supervised NMF with basis deformation [Kitamura, 2013]
– employs a deformation term to adaptively deform the pre-trained bases
[Figure: Training stage and separation stage. In the separation stage, the pre-trained bases are given, and a deformation term (with positive and negative values) absorbs the slight difference from the target.]
• Constraint in deformation term
– Range of deformation is restricted
– To avoid excess deformation of the pre-trained bases
[Figure: the range of deformation along frequency, shown as ±30% for one parameter setting. The mixture (actual Pf. & Tb.) is separated by supervised NMF and by supervised NMF with basis deformation; the training data is the same (artificial Pf. sound).]
• Demonstration
– Separate actual instrumental sounds using artificial training data produced by a MIDI synthesizer.
[Audio demo: original song (actual instruments); training sound of Sax. (produced by MIDI), separated sound (Sax.), and residual sound; training sound of Ba. (produced by MIDI), separated sound (Ba.), and residual sound]
Copyright © 2014 Yamaha Corp. All rights reserved.
Multichannel recording using microphone array
• Number of microphones and sources
– Overdetermined situation (# of sources ≤ # of mics.)
– Underdetermined situation (# of sources > # of mics.)
• A priori information
– Training data of the source, position of sources, room geometry, music scores, etc.
– Blind source separation (BSS): without any a priori info.
[Figure: sources pass through the mixing system and are observed by a microphone array; the demixing system outputs the estimated sources. Examples: a CD is a stereo signal (2-ch: L-ch and R-ch); a recording with one mic. is a monaural signal (1-ch).]
BSS and independent component analysis
• Blind source separation (BSS)
– Estimate demixing system without any prior information
about the mixing system
• Typical BSS is based on statistical independence
• Independent component analysis (ICA) [Comon, 1994]
– How can statistical independence be measured?
– Define a “distribution of audio signals”
– Find the demixing system that maximizes independence
[Figure: mixing system followed by demixing system]
What is the distribution of audio signals?
• Distribution of speech waveform
– Spikier and heavier-tailed than a Gaussian (normal) distribution
[Figure: speech waveform (amplitude vs. time samples) and its histogram (amount of components vs. amplitude), shown with a Gaussian distribution]
• Distribution of piano waveform
– Spikier and heavier-tailed than a Gaussian distribution
[Figure: piano waveform and its histogram, shown with a Laplace distribution]
• Distribution of drums waveform
– Spikier and heavier-tailed than a Gaussian distribution
[Figure: drums waveform and its histogram, shown with a Cauchy distribution]
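"Spikier and heavier-tailed than Gaussian" can be quantified, for example, by the excess kurtosis, which is zero for a Gaussian distribution and positive for such distributions. The slides do not name this statistic, so its use here is an assumption, and the samples below are synthetic random numbers rather than the audio signals of the lecture.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian distribution)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 200_000
k_gauss = excess_kurtosis(rng.standard_normal(n))      # theory: 0
k_laplace = excess_kurtosis(rng.laplace(size=n))       # theory: 3
k_uniform = excess_kurtosis(rng.uniform(-1, 1, n))     # theory: -1.2
```

The Cauchy distribution is left out on purpose: its moments do not exist, so a sample kurtosis would not converge to a meaningful value.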
Central limit theorem
• Audio source distribution is basically non-Gaussian
– But we still don't know the source distribution
• How to model them for source separation?
• Central limit theorem
– “A sum of many independent random variables of any kind approaches a Gaussian distribution.”*
• Hard to believe? Let's see
[Figure: r.v.s generated from a Laplace distribution and a uniform distribution, and a Gaussian distribution]
* Some r.v.s do not obey this, e.g., Cauchy r.v.s
• Dice example: one variable is the pips of the first die, and another is the pips of the second die
– Probability of each face is always 1/6
• Results of 1 million trials for each die
[Figure: histograms (amount) of the results]
– What about their sum?
– Not a uniform distribution any more
– Approaches a Gaussian distribution (central limit theorem)
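The dice experiment is easy to reproduce. The sketch below follows the slide's 1 million trials; the random seed, the helper names, and the choice of 10 dice for the last case are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 1_000_000

def sum_of_dice(n_dice):
    """Sum of the pips of n_dice fair dice, for every trial."""
    rolls = rng.integers(1, 7, size=(n_trials, n_dice), dtype=np.int8)
    return rolls.sum(axis=1)

def relative_frequencies(samples):
    """Map each observed value to its relative frequency."""
    values, counts = np.unique(samples, return_counts=True)
    return dict(zip(values.tolist(), (counts / len(samples)).tolist()))

freq_one = relative_frequencies(sum_of_dice(1))   # flat: every face near 1/6
freq_two = relative_frequencies(sum_of_dice(2))   # triangular, peak at 7
ten = sum_of_dice(10)                             # already close to bell-shaped
```

With one die the histogram is flat, with two it is triangular, and with more dice it approaches the bell shape that the central limit theorem predicts.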
Central limit theorem in audio signals
• Each signal is one speaker's speech signal
– Each is around 3.3 s long
[Figure sequence: waveform (amplitude vs. time samples) and histogram (amount vs. amplitude) of the mixture as speakers' signals are added]
– Almost a Gaussian dist. (central limit theorem)
Principle of ICA
• What we can say from the central limit theorem
– A Gaussian distribution is the limit of a mixture of sources
– If we maximize the non-Gaussianity of all signals, the signals will be the original sources before they were mixed
• Basic principle of ICA
– Maximizing non-Gaussianity
– More generally, maximizing independence between components
[Figure: mixing approaches a Gaussian distribution (central limit theorem); ICA departs from a Gaussian distribution]
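The idea of recovering sources by maximizing non-Gaussianity can be illustrated with a toy two-channel example. This is a deliberately simple sketch (whitening followed by a grid search over rotation angles, with the absolute excess kurtosis as the non-Gaussianity measure), not the algorithm of the cited ICA papers; the Laplace sources, the mixing matrix `A`, and all function names are assumptions.

```python
import numpy as np

def excess_kurtosis(y):
    y = y - y.mean()
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0

def whiten(X):
    """Decorrelate the channels and normalize them to unit variance."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    return (E / np.sqrt(d)) @ E.T @ X

def ica_2ch(X, n_angles=180):
    """Rotate the whitened mixtures to the angle of maximum non-Gaussianity."""
    Z = whiten(X)
    best_score, best_Y = -np.inf, Z
    # Angles in [0, pi/2) suffice: a further quarter turn only swaps/flips outputs
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        Y = np.array([[c, s], [-s, c]]) @ Z
        score = sum(abs(excess_kurtosis(y)) for y in Y)
        if score > best_score:
            best_score, best_Y = score, Y
    return best_Y

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 20000))                  # independent non-Gaussian sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])            # hypothetical mixing matrix
Y = ica_2ch(A @ S)                                # separated signals

# Absolute correlation between each separated signal and each true source
corr = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
```

Each separated signal correlates strongly with exactly one source, but only up to sign, scale, and order, because none of these can be determined from the mixtures alone.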
• Assumption in ICA
– 1. Sources are mutually independent
– 2. Each source distribution is non-Gaussian
– 3. Mixing system is invertible and time-invariant
[Figure: the sources (latent components) are 1. mutually independent and 2. non-Gaussian; the mixing matrix is 3. invertible and time-invariant and produces the mixtures (observed signals); its inverse matrix recovers the sources]
• Uncertainty in ICA
– 1. Signal scale (volume) cannot be determined
– 2. Signal permutation cannot be determined
[Figure: two examples in which the sources (latent components) are mixed into the mixtures (observed signals) and ICA outputs the separated signals (estimated by ICA)]
• Estimation in ICA
– Maximize independence between source distributions
– Log-likelihood function
[Figure: the distance to the non-Gaussian source distribution is minimized]
– Generally, the source distribution is set to an appropriate non-Gaussian distribution
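For reference, the log-likelihood commonly used for ICA in textbooks can be written as follows. The notation is an assumption, since the slide's equation did not survive extraction: $W$ is the demixing matrix, $x(t)$ the observed mixture, $y(t) = W x(t)$ the separated signals, $p(\cdot)$ the assumed non-Gaussian source distribution, $N$ the number of sources, and $T$ the number of samples.

```latex
\mathcal{L}(W) = \sum_{t=1}^{T} \sum_{n=1}^{N} \log p\bigl(y_n(t)\bigr)
               + T \log \lvert \det W \rvert ,
\qquad y(t) = W x(t)
```

Maximizing this function over $W$ is known to be equivalent, up to a constant, to minimizing the Kullback-Leibler divergence between the distribution of the separated signals and the product of the assumed source distributions.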
ICA-based separation of reverberant mixture
• Audio mixture in actual environment
– Convolutive mixture with reverberation
• E.g., the reverberation of an office room lasts 300 ms, and that of a concert hall more than 2000 ms
– Mixing coefficient becomes mixing filter
• How to deconvolve them?
– 1. Estimate deconvolution filter
• At 16 kHz sampling, a 300 ms filter has 4800 taps
• # of parameters that should be estimated explodes
– 2. Estimate demixing coefficient in frequency domain
• Frequency-wise demixing matrix should be estimated by ICA
• This encounters the permutation problem
[Figure: instantaneous mixture versus convolutive mixture; the reverberation length is the length of the convolution filter]
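The tap count and the resulting number of parameters follow from simple arithmetic, and a convolutive mixture can be simulated directly. In this sketch the two-source, two-microphone setup, the exponentially decaying random filters, and the Laplace source signals are all assumptions made for illustration.

```python
import numpy as np

fs = 16000                                   # sampling rate [Hz]
reverb_s = 0.3                               # reverberation length [s]
n_taps = int(round(fs * reverb_s))           # 16000 * 0.3 = 4800 taps

n_src, n_mic = 2, 2
n_filter_params = n_src * n_mic * n_taps     # one filter per source-mic pair

rng = np.random.default_rng(0)
s = rng.laplace(size=(n_src, fs))            # 1 s of hypothetical source signals

# Hypothetical room responses: random taps with an exponential decay
decay = np.exp(-np.arange(n_taps) / (0.1 * fs))
h = rng.standard_normal((n_mic, n_src, n_taps)) * decay

# Convolutive mixture: each mic observes the sum of the filtered sources
x = np.stack([
    sum(np.convolve(h[m, n], s[n]) for n in range(n_src))
    for m in range(n_mic)
])
```

Even this small setup needs 19200 filter coefficients in the time domain, whereas in the frequency domain only a small demixing matrix per frequency bin has to be estimated.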
• Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– Apply simple ICA to each frequency bin
[Figure: the spectrogram (frequency bin × time frame) is processed by a separate ICA in each frequency bin (ICA1, ICA2, ICA3, …); the frequency-wise demixing matrix is the inverse matrix of the frequency-wise mixing matrix]
• Permutation problem in frequency-domain ICA
– The order of the separated signals differs from frequency to frequency*
– The order must be aligned across all frequencies
*Scales are also inconsistent, but they can be easily fixed.
[Figure: ICA is applied in all frequencies to mixture 1 and mixture 2 of source 1 and source 2; a permutation solver then aligns the outputs into separated signal 1 and separated signal 2]
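To make the alignment task concrete, here is a toy permutation solver for two sources. It is a simple envelope-correlation heuristic chosen for illustration, not one of the published solvers, and it runs on synthetic power envelopes; the array layout `(n_freq, 2, n_frames)` and all names are assumptions.

```python
import numpy as np

def align_permutation(P):
    """P: power envelopes, shape (n_freq, 2, n_frames).
    Returns swap[f] = True where the two outputs of bin f should be exchanged
    to match the order of bin 0."""
    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    swap = np.zeros(P.shape[0], dtype=bool)
    ref = P[0].copy()                        # running sum of aligned envelopes
    for f in range(1, P.shape[0]):
        keep = corr(ref[0], P[f, 0]) + corr(ref[1], P[f, 1])
        flip = corr(ref[0], P[f, 1]) + corr(ref[1], P[f, 0])
        swap[f] = flip > keep
        ref += P[f, ::-1] if swap[f] else P[f]
    return swap

rng = np.random.default_rng(0)
n_freq, n_frames = 64, 200
activity = rng.exponential(size=(2, n_frames))          # two source envelopes
gains = rng.uniform(0.5, 2.0, size=(n_freq, 1, 1))      # frequency-wise scales
P_true = gains * activity[None] + 0.05 * rng.random((n_freq, 2, n_frames))

# Simulate the permutation problem: random order in each frequency bin
true_swap = rng.random(n_freq) < 0.5
P = np.where(true_swap[:, None, None], P_true[:, ::-1], P_true)

swap = align_permutation(P)
expected = true_swap ^ true_swap[0]          # order is defined relative to bin 0
```

The heuristic relies on the envelopes of one source being similar across frequencies, which holds by construction in this synthetic data and only approximately for real audio.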
• Popular permutation solvers
– Based on direction of arrival (DOA)
• Frequency-domain ICA + DOA alignment [Saruwatari, 2006]
– Based on a relative correlation among frequencies
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006]
– Based on a low-rank modeling of each source
• Independent low-rank matrix analysis (ILRMA) [Kitamura, 2016]
• Demonstration of BSS using ILRMA
– http://d-kitamura.net/en/demo_rank1_en.htm
Conclusions and future works
• Audio source separation based on
– Low-rank property
• Nonnegative matrix factorization
– Statistical independence
• Blind source separation
• For further improvement
– Separation based on training with a huge dataset
• Deep learning, denoising autoencoder, etc.
• The recording condition is just one-time
– Informed source separation
• Music scores could be powerful information
• The user can guide the system, which leads to more accurate separation
• Performance is still insufficient
– Almost there? Not at all! Make our life better. That’s engineering.
'Music and Noise Fingerprinting and Reference Cancellation Applied to Forensi... by owrpresentations
'Music and Noise Fingerprinting and Reference Cancellation Applied to Forensi...'Music and Noise Fingerprinting and Reference Cancellation Applied to Forensi...
'Music and Noise Fingerprinting and Reference Cancellation Applied to Forensi...
owrpresentations1.8K views
MLConf2013: Teaching Computer to Listen to Music by Eric Battenberg
MLConf2013: Teaching Computer to Listen to MusicMLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to Music
Eric Battenberg2.6K views
Ml conf2013 teaching_computers_share by MLconf
Ml conf2013 teaching_computers_shareMl conf2013 teaching_computers_share
Ml conf2013 teaching_computers_share
MLconf1.2K views

More from Daichi Kitamura

独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank... by
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...Daichi Kitamura
1.5K views91 slides
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価 by
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価Daichi Kitamura
1.1K views24 slides
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも) by
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Daichi Kitamura
2.8K views67 slides
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank... by
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...Daichi Kitamura
8.3K views67 slides
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s... by
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...Daichi Kitamura
4.1K views26 slides
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen... by
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...Daichi Kitamura
2.1K views15 slides

More from Daichi Kitamura(11)

独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank... by Daichi Kitamura
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
Daichi Kitamura1.5K views
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価 by Daichi Kitamura
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
Daichi Kitamura1.1K views
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも) by Daichi Kitamura
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Daichi Kitamura2.8K views
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank... by Daichi Kitamura
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...
Daichi Kitamura8.3K views
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s... by Daichi Kitamura
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
Daichi Kitamura4.1K views
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen... by Daichi Kitamura
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
Daichi Kitamura2.1K views
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm) by Daichi Kitamura
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
Daichi Kitamura2K views
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese) by Daichi Kitamura
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
Daichi Kitamura1.3K views
Evaluation of separation accuracy for various real instruments based on super... by Daichi Kitamura
Evaluation of separation accuracy for various real instruments based on super...Evaluation of separation accuracy for various real instruments based on super...
Evaluation of separation accuracy for various real instruments based on super...
Daichi Kitamura676 views
Divergence optimization based on trade-off between separation and extrapolati... by Daichi Kitamura
Divergence optimization based on trade-off between separation and extrapolati...Divergence optimization based on trade-off between separation and extrapolati...
Divergence optimization based on trade-off between separation and extrapolati...
Daichi Kitamura917 views
Depth estimation of sound images using directional clustering and activation-... by Daichi Kitamura
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...
Daichi Kitamura919 views

Recently uploaded

Determination of color fastness to rubbing(wet and dry condition) by crockmeter. by
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.ShadmanSakib63
6 views6 slides
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... by
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...SwagatBehera9
5 views36 slides
Experimental animal Guinea pigs.pptx by
Experimental animal Guinea pigs.pptxExperimental animal Guinea pigs.pptx
Experimental animal Guinea pigs.pptxMansee Arya
42 views16 slides
별헤는 사람들 2023년 12월호 전명원 교수 자료 by
별헤는 사람들 2023년 12월호 전명원 교수 자료별헤는 사람들 2023년 12월호 전명원 교수 자료
별헤는 사람들 2023년 12월호 전명원 교수 자료sciencepeople
68 views30 slides
ALGAL PRODUCTS.pptx by
ALGAL PRODUCTS.pptxALGAL PRODUCTS.pptx
ALGAL PRODUCTS.pptxRASHMI M G
7 views17 slides
Applications of Large Language Models in Materials Discovery and Design by
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignAnubhav Jain
14 views17 slides

Recently uploaded(20)

Determination of color fastness to rubbing(wet and dry condition) by crockmeter. by ShadmanSakib63
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
Determination of color fastness to rubbing(wet and dry condition) by crockmeter.
ShadmanSakib636 views
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... by SwagatBehera9
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
SwagatBehera95 views
Experimental animal Guinea pigs.pptx by Mansee Arya
Experimental animal Guinea pigs.pptxExperimental animal Guinea pigs.pptx
Experimental animal Guinea pigs.pptx
Mansee Arya42 views
별헤는 사람들 2023년 12월호 전명원 교수 자료 by sciencepeople
별헤는 사람들 2023년 12월호 전명원 교수 자료별헤는 사람들 2023년 12월호 전명원 교수 자료
별헤는 사람들 2023년 12월호 전명원 교수 자료
sciencepeople68 views
Applications of Large Language Models in Materials Discovery and Design by Anubhav Jain
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
Anubhav Jain14 views
A giant thin stellar stream in the Coma Galaxy Cluster by Sérgio Sacani
A giant thin stellar stream in the Coma Galaxy ClusterA giant thin stellar stream in the Coma Galaxy Cluster
A giant thin stellar stream in the Coma Galaxy Cluster
Sérgio Sacani20 views
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ... by ILRI
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
ILRI9 views
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance... by InsideScientific
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
InsideScientific121 views
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor... by Trustlife
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...
Trustlife154 views
Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe... by Anmol Vishnu Gupta
Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe...Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe...
Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe...
ELECTRON TRANSPORT CHAIN by DEEKSHA RANI
ELECTRON TRANSPORT CHAINELECTRON TRANSPORT CHAIN
ELECTRON TRANSPORT CHAIN
DEEKSHA RANI16 views

Audio Source Separation Based on Low-Rank Structure and Statistical Independence

  • 1. Audio Source Separation Based on Low-Rank Structure and Statistical Independence The University of Tokyo Research Associate Daichi Kitamura Nagoya University, Lecture May 30, 2017
  • 2. Introduction • Daichi Kitamura (北村大地) • Research Associate at The University of Tokyo • Academic background – Kagawa National College of Technology (2005 ~ 2012) • B.S. in Engineering (March 2012) – Nara Institute of Science and Technology (2012 ~ 2014) • M.S. in Engineering (March 2014) – SOKENDAI (2014 ~ 2017) • Ph.D. in Informatics (March 2017) • Research topics – Media signal processing – Audio source separation
  • 3. Contents • Research background – Audio source separation and its applications – Demonstration • Structural modeling of audio sources – Time-frequency representation – Low-rank modeling of audio spectrogram – Supervised audio source separation • Statistical modeling between sources – Blind audio source separation – Audio distribution and central limit theorem – Maximization of independence • Conclusion and future works 3
  • 5. Research background • Audio source separation – Signal processing – Separation of speech, music sounds, background noise, … – Cocktail party effect by a computer
  • 7. Research background • Applications of audio source separation – Hearing aids • Easier conversation in loud environments – Speech recognition systems • Siri, Google search, Cortana, Amazon Echo, … – Automatic music transcription • Musical part separation (Vo., Gt., Ba., …) – Remixing of live-recorded music • Professional use (improving quality), personal use (DJ remixing), … [Figure labels: CD, Separate, Automatic transcription]
  • 8. Demonstration: speech source separation • Real-time speech source separation (video) 8
  • 9. Demonstration: music source separation • Music source separation – The mixture of guitar, vocal, and keyboard is separated into guitar, vocal, and keyboard – Listen carefully for the three parts in the mixture.
  • 10. Contents • Research background – Audio source separation and its applications – Demonstration • Structural modeling of audio sources (for monaural signals) – Time-frequency representation – Low-rank modeling of audio spectrogram – Supervised audio source separation • Statistical modeling between sources (for stereo or multichannel signals) – Blind audio source separation – Audio distribution and central limit theorem – Maximization of independence • Conclusion and future works
  • 12. Time-frequency representation of audio signals • Audio waveform in time domain (speech) 12
  • 13. Time-frequency representation of audio signals • Time-varying frequency structure – Short-time Fourier transform (STFT): a window (FFT length) slides along the time-domain waveform by the shift length, and each windowed segment is Fourier transformed – Spectrogram (time-frequency domain): complex-valued matrix, frequency × time – Power spectrogram: nonnegative real-valued matrix, obtained by the entry-wise absolute value and power
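The STFT procedure on this slide can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation used in the lecture: the Hann window, the FFT length of 1024, the shift length of 256, and the 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(signal, fft_length=1024, shift_length=256):
    """Short-time Fourier transform: returns a (frequency x time) complex matrix."""
    window = np.hanning(fft_length)  # assumed window type
    n_frames = 1 + (len(signal) - fft_length) // shift_length
    # Cut the waveform into overlapping windowed segments, one per column.
    frames = np.stack(
        [signal[t * shift_length : t * shift_length + fft_length] * window
         for t in range(n_frames)],
        axis=1,
    )
    # Fourier transform of every segment (non-negative frequencies only).
    return np.fft.rfft(frames, axis=0)

fs = 16000                                   # assumed sampling rate [Hz]
time = np.arange(fs) / fs                    # one second of samples
waveform = np.sin(2 * np.pi * 1000 * time)   # 1 kHz test tone

spectrogram = stft(waveform)                 # complex-valued matrix
power_spectrogram = np.abs(spectrogram) ** 2 # nonnegative real-valued matrix
```

The power spectrogram is the matrix that the low-rank models on the following slides operate on; the phase is discarded by the entry-wise absolute value.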
  • 14. Power spectrogram of speech 14
  • 16. Structural properties • Sparse (for both speech and music) – Strong (yellow) components are few – Weak (darker) components are dominant • Continuous contour (only in speech) – The spectrum changes continuously and dynamically • Low rank (especially in music) – Similar patterns (similar timbres) appear many times [Figure panels: Speech, Music]
  • 17. Comparison of low-rankness [Figure panels: Drums, Guitar, Vocals, Speech]
  • 18. Comparison of low-rankness • Low-rankness (simplicity of a matrix) – can be measured by the cumulative singular value (CSV) – Drums and guitar are quite low-rank • Vocals and speech are also low-rank to some extent – A music spectrogram can be modeled by a few patterns • Number of bases when the CSV reaches the 95% line: 7, 29, and around 90 (spectrogram size is 1025 × 1883)
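As a rough sketch of the CSV measure, the code below counts how many singular values are needed before their normalized cumulative sum reaches 95%. The function name `bases_at_csv` and the random test matrices are assumptions for illustration; the real spectrograms from the slide are not reproduced here.

```python
import numpy as np

def bases_at_csv(matrix, threshold=0.95):
    """Number of singular values needed for the cumulative sum to reach `threshold`."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)  # sorted, descending
    csv = np.cumsum(singular_values) / np.sum(singular_values)
    # First index whose cumulative value is >= threshold, converted to a count.
    return int(np.searchsorted(csv, threshold) + 1)

rng = np.random.default_rng(0)
# A nonnegative matrix built from only 3 patterns (exactly rank 3).
low_rank = rng.random((100, 3)) @ rng.random((3, 200))
# A nonnegative matrix with no shared patterns (full rank).
full_rank = rng.random((100, 200))

n_low = bases_at_csv(low_rank)
n_full = bases_at_csv(full_rank)
```

A matrix made of a few repeated patterns reaches the 95% line with far fewer bases than an unstructured one, which is the sense in which drums and guitar are called low-rank above.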
  • 19. Modeling technique of low-rank structures • Nonnegative matrix factorization (NMF) [Lee, 1999] – is a low-rank approximation using a limited number of bases • Bases and their coefficients must be nonnegative – can be applied to a power spectrogram • Spectral patterns (typical timbres) and their time-varying gains • Nonnegative matrix (power spectrogram, frequency × time) ≈ basis matrix (spectral patterns, frequency × basis) × activation matrix (time-varying gains, basis × time) – Sizes: # of frequency bins, # of time frames, # of bases
  • 20. Modeling technique of low-rank structures • Parameter optimization in NMF – Minimize a “similarity measure” between the observed matrix and its low-rank model (the product of the basis and activation matrices) – An arbitrary similarity measure can be used • Squared Euclidean distance, etc. – A closed-form solution is still an open problem – Iterative calculation can minimize the measure • Multiplicative update rules [Lee, 2000] (for the case of the squared Euclidean distance)
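A minimal sketch of the multiplicative update rules for the squared Euclidean distance follows. The variable names `X`, `W`, and `H` (observed matrix, basis matrix, activation matrix) are assumptions, since the symbols on the slide did not survive; the small constant `eps` is also an addition to avoid division by zero.

```python
import numpy as np

def nmf_euclid(X, n_bases, n_iter=200, seed=0, eps=1e-12):
    """NMF with multiplicative updates for the squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_bases)) + eps   # basis matrix (spectral patterns)
    H = rng.random((n_bases, X.shape[1])) + eps   # activation matrix (time-varying gains)
    costs = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        costs.append(np.sum((X - W @ H) ** 2))
    return W, H, costs

rng = np.random.default_rng(1)
X = rng.random((30, 3)) @ rng.random((3, 40))     # toy nonnegative matrix of rank 3
W, H, costs = nmf_euclid(X, n_bases=3)
```

The recorded cost never increases from one iteration to the next, which is the property that makes the iterative calculation usable even though no closed-form solution is known.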
  • 21. Modeling technique of low-rank structures • Example: a spectrogram of Pf. and Cl. is modeled as a superposition of rank-1 spectrograms
  • 22. Modeling technique of low-rank structures • Example – Pf. and Cl. are separated! – Source separation based on NMF • is a clustering problem of the spectral bases obtained in the basis matrix – But how?
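  • 23. Supervised audio source separation with NMF • If sourcewise training data is available: supervised NMF [Smaragdis, 2007], [Kitamura1, 2014] – Training stage: a spectral dictionary of Pf. is learned from the training data – Separation stage: the spectral dictionary of Pf. is given (fixed), and other bases represent the remaining sources • Only the activations of the dictionary, the other bases, and their activations are optimized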
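The two stages can be sketched as below, under stated assumptions: the names `F`, `G`, `H`, and `U` (dictionary, its activations, other bases, their activations) are chosen for this example, the squared Euclidean distance is used for simplicity, and the toy data is random. The cited methods may use other divergences and additional penalty terms, so this is not a reproduction of them.

```python
import numpy as np

EPS = 1e-12

def train_dictionary(Y_train, n_bases, n_iter=100, seed=0):
    """Training stage: ordinary NMF on the training spectrogram; keep only the bases."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], n_bases)) + EPS
    G = rng.random((n_bases, Y_train.shape[1])) + EPS
    for _ in range(n_iter):
        F *= (Y_train @ G.T) / (F @ G @ G.T + EPS)
        G *= (F.T @ Y_train) / (F.T @ F @ G + EPS)
    return F

def separate(Y_mix, F, n_other, n_iter=100, seed=1):
    """Separation stage: F is fixed; only G, H, and U are updated."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y_mix.shape[1])) + EPS
    H = rng.random((Y_mix.shape[0], n_other)) + EPS
    U = rng.random((n_other, Y_mix.shape[1])) + EPS
    costs = []
    for _ in range(n_iter):
        G *= (F.T @ Y_mix) / (F.T @ (F @ G + H @ U) + EPS)
        H *= (Y_mix @ U.T) / ((F @ G + H @ U) @ U.T + EPS)
        U *= (H.T @ Y_mix) / (H.T @ (F @ G + H @ U) + EPS)
        costs.append(np.sum((Y_mix - F @ G - H @ U) ** 2))
    return F @ G, H @ U, costs   # target estimate, residual estimate, cost history

rng = np.random.default_rng(2)
target_bases = rng.random((20, 2))
other_bases = rng.random((20, 2))
Y_train = target_bases @ rng.random((2, 50))                   # target-only training data
Y_mix = target_bases @ rng.random((2, 60)) + other_bases @ rng.random((2, 60))

F = train_dictionary(Y_train, n_bases=2)
F_before = F.copy()
target_est, other_est, costs = separate(Y_mix, F, n_other=2)
```

Keeping `F` fixed is what ties the first term of the model to the trained source; the clustering question raised on the previous slide is answered by construction.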
  • 24. Supervised audio source separation with NMF • Demonstration – Stereo music separation with supervised NMF [Kitamura, 2015] • Original song • Training sound of Pf. → separated sound (Pf.) • Training sound of Ba. → separated sound (Ba.)
  • 25. Problem of supervised approach • Performance is limited when the timbre difference between the training data and the target source in the mixture becomes large – Example: the Pf. in the mixture differs slightly from the Pf. in the training data (real sound vs. artificial sound produced by MIDI) [Figure: difference of timbres between the real and artificial sounds; axes: Frequency [kHz] (0.0 to 3.0), Amplitude [dB] (-20 to 60)] [Audio: mixture (actual Pf. & Tb.); signal separated by supervised NMF using artificial Pf. as training data]
  • 26. Adaptive supervised audio source separation • Supervised NMF with basis deformation [Kitamura, 2013] – employs a deformation term (with positive and negative entries) to adaptively deform the pre-trained bases • Training stage: bases are trained from training data that differs slightly from the target • Separation stage: the pre-trained bases are given
  • 27. Adaptive supervised audio source separation • Constraint on the deformation term – The range of deformation is restricted (the figure shows the case of ±30%) – To avoid excessive deformation of the pre-trained bases • Demonstration with the same training data (artificial Pf. sound) – Mixture (actual Pf. & Tb.) – Separated signal by supervised NMF – Separated signal by supervised NMF with basis deformation
  • 28. Adaptive supervised audio source separation • Demonstration – Separate actual instrumental sounds using artificial training data produced by a MIDI synthesizer • Original song (actual instruments) • Training sound of Sax. (produced by MIDI) → separated sound (Sax.) and residual sound • Training sound of Ba. (produced by MIDI) → separated sound (Ba.) and residual sound Copyright © 2014 Yamaha Corp. All rights reserved.
  • 29. Contents • Research background – Audio source separation and its applications – Demonstration • Structural modeling of audio sources (for monaural signals) – Time-frequency representation – Low-rank modeling of audio spectrogram – Supervised audio source separation • Statistical modeling between sources (for stereo or multichannel signals) – Blind audio source separation – Audio distribution and central limit theorem – Maximization of independence • Conclusion and future works
  • 30. Multichannel recording using microphone array • Number of microphones and sources – Overdetermined situation (# of sources ≤ # of mics.) – Underdetermined situation (# of sources > # of mics.) • A priori information – Training data of the sources, positions of sources, room geometry, music scores, etc. – Blind source separation (BSS): without any a priori information • Signal flow: sources → mixing system → observed signals (microphone array) → demixing system → estimated signals • Examples: a CD is a stereo signal (2-ch: L-ch and R-ch); one mic. gives a monaural signal (1-ch)
  • 31. BSS and independent component analysis • Blind source separation (BSS) – Estimate the demixing system without any prior information about the mixing system • Typical BSS is based on statistical independence • Independent component analysis (ICA) [Comon, 1994] – How to measure statistical independence? – Define a “distribution of audio signals” – Find the demixing system that maximizes independence
  • 32. What is the distribution of audio signals? • Distribution of a speech waveform – Spikier and heavier-tailed than the Gaussian (normal) distribution [Figure: waveform (amplitude vs. time samples) and amplitude histogram compared with a Gaussian distribution]
  • 33. What is the distribution of audio signals? • Distribution of a piano waveform – Spikier and heavier-tailed than the Gaussian distribution [Figure: waveform (amplitude vs. time samples) and amplitude histogram compared with a Laplace distribution]
  • 34. What is the distribution of audio signals? • Distribution of a drums waveform – Spikier and heavier-tailed than the Gaussian distribution [Figure: waveform (amplitude vs. time samples) and amplitude histogram compared with a Cauchy distribution]
  • 35. Central limit theorem • Audio source distributions are basically non-Gaussian – But we still do not know the source distribution • How to model them for source separation? • Central limit theorem – “The sum of many independent random variables approaches a Gaussian distribution, whatever their individual distributions.”* • Hard to believe? Let’s see: generate r.v.s from, e.g., a Laplace or a uniform distribution, and their sum approaches a Gaussian distribution * Some r.v.s do not obey this, e.g., Cauchy r.v.s
  • 37. Central limit theorem • One variable is the number of pips on the first die, and another is the number of pips on the second die – The probability of each face is always 1/6 • Results of 1 million trials for each die – What about their sum? • Not a uniform distribution any more
  • 39. Central limit theorem • One variable is the number of pips on the first die, and another is the number of pips on the second die – The probability of each face is always 1/6 • Results of 1 million trials for each die – The distribution approaches a Gaussian distribution (central limit theorem)
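The dice experiment can be re-created with a short simulation. This is a sketch, not the code behind the slides: the function name `dice_sum_distribution`, the random seed, and the choice of 1, 2, and 10 dice are assumptions; only the 1 million trials come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def dice_sum_distribution(n_dice, n_trials=1_000_000):
    """Empirical distribution of the sum of `n_dice` fair dice."""
    pips = rng.integers(1, 7, size=(n_trials, n_dice), dtype=np.int8)  # faces 1..6
    totals = pips.sum(axis=1)
    values = np.arange(n_dice, 6 * n_dice + 1)          # possible sums
    counts = np.bincount(totals, minlength=6 * n_dice + 1)[n_dice:]
    return values, counts / n_trials

values_1, prob_1 = dice_sum_distribution(1)     # flat: every face near 1/6
values_2, prob_2 = dice_sum_distribution(2)     # triangular, peak at 7
values_10, prob_10 = dice_sum_distribution(10)  # bell-shaped around 35
```

One die gives a flat histogram, two dice already give a peaked one, and with more dice the shape becomes increasingly bell-like.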
  • 40. Central limit theorem in audio signals • Each signal is the speech of one speaker, around 3.3 s long [Figure: waveforms (amplitude vs. time samples) and amplitude histograms (amount vs. amplitude)]
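  • 44. Central limit theorem in audio signals • Each signal is the speech of one speaker, around 3.3 s long • The mixture of the speakers’ signals has almost a Gaussian distribution (central limit theorem) [Figure: waveform (amplitude vs. time samples) and amplitude histogram (amount vs. amplitude)]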
  • 45. Principle of ICA • What we can say from the central limit theorem – The Gaussian distribution is the limit of a mixture of sources – If we maximize the non-Gaussianity of all signals, they become the original sources before they were mixed • Basic principle of ICA: maximizing non-Gaussianity – More generally: maximizing independence between components • Mixing: approaching Gaussian (central limit theorem) • ICA: departing from Gaussian
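The principle can be illustrated with a toy two-source example: whiten the mixtures, then search for the rotation that maximizes non-Gaussianity. This is a deliberately simple sketch, not the ICA algorithm used in the lecture. Measuring non-Gaussianity by the absolute excess kurtosis, the grid search over angles, the Laplace sources, and the mixing matrix `A` are all assumptions made for the example.

```python
import numpy as np

def excess_kurtosis(y):
    """Zero for a Gaussian; positive for spiky, heavy-tailed signals."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

def ica_by_rotation(x, n_angles=360):
    """Two-channel ICA sketch: whitening plus a non-Gaussianity-maximizing rotation."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(x @ x.T / x.shape[1])
    whitener = np.diag(eigval ** -0.5) @ eigvec.T     # decorrelate, unit variance
    z = whitener @ x
    best_score, best_rotation = -np.inf, np.eye(2)
    # Angles in [0, pi/2) cover every rotation up to order and sign.
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s_ = np.cos(theta), np.sin(theta)
        rotation = np.array([[c, s_], [-s_, c]])
        score = sum(abs(excess_kurtosis(row)) for row in rotation @ z)
        if score > best_score:
            best_score, best_rotation = score, rotation
    return best_rotation @ whitener                   # demixing matrix

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000))                      # non-Gaussian, independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])                # assumed mixing matrix
x = A @ s                                             # observed mixtures
y = ica_by_rotation(x) @ x                            # separated signals

# Absolute correlation between each source (rows) and each output (columns).
corr = np.abs(np.corrcoef(np.vstack([s, y]))[:2, 2:])
```

Each source correlates strongly with exactly one output, although which output, and with what sign and scale, is not determined by the method.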
  • 46. Principle of ICA • Assumptions in ICA – 1. Sources are mutually independent – 2. Each source distribution is non-Gaussian – 3. Mixing system is invertible and time-invariant • Model: mixtures (observed signals) = mixing matrix × sources (latent components); the demixing matrix is the inverse of the mixing matrix
  • 47. Principle of ICA • Uncertainty in ICA – 1. Signal scale (volume) cannot be determined – 2. Signal permutation (order) cannot be determined [Figure: sources (latent components) → mixtures (observed signals) → ICA → separated signals (estimated by ICA), with scale or order changed]
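Both uncertainties follow from the mixing model itself, as the short check below suggests: rescaled and reordered sources, combined with a correspondingly adjusted mixing matrix, produce exactly the same observations. The matrices `A`, `P`, and `D` are arbitrary values assumed for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))            # sources (latent components)
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # assumed mixing matrix

P = np.array([[0.0, 1.0], [1.0, 0.0]])     # permutation: swap the two sources
D = np.diag([2.0, -0.5])                   # scaling: change volume and sign

s_alt = D @ P @ s                          # reordered and rescaled "sources"
A_alt = A @ np.linalg.inv(D @ P)           # mixing matrix adjusted to compensate

x = A @ s                                  # mixtures from the original pair
x_alt = A_alt @ s_alt                      # mixtures from the alternative pair
same_observation = np.allclose(x, x_alt)
```

Since the observed mixtures are identical in both cases, no method that sees only the mixtures can tell the two explanations apart.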
  • 48. Principle of ICA • Estimation in ICA – Maximize independence between source distributions – Formulated with a log-likelihood function: minimize the distance between the distribution of the separated signals and the non-Gaussian source distribution – Generally, the source distribution is set to an appropriate non-Gaussian distribution
  • 49. ICA-based separation of reverberant mixture • Audio mixture in an actual environment – Convolutive mixture with reverberation (not an instantaneous mixture) • E.g., the reverberation length (length of the convolution filter) is about 300 ms in an office room and more than 2000 ms in a concert hall – Mixing coefficients become mixing filters • How to deconvolve them? – 1. Estimate a deconvolution filter • At 16 kHz sampling, a 300 ms filter has 4800 taps • The # of parameters to be estimated explodes – 2. Estimate demixing coefficients in the frequency domain • A frequency-wise demixing matrix is estimated by ICA • This encounters the permutation problem
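The tap count and the reason for moving to the frequency domain can both be checked numerically. In this sketch the source is white noise and the room filter is an exponentially decaying random sequence; both are assumptions for illustration, not measured signals. The check uses one Fourier transform over the whole signal, whereas frequency-domain ICA relies on the same relation holding approximately within each STFT frame.

```python
import numpy as np

fs_hz = 16_000
reverb_ms = 300
n_taps = fs_hz * reverb_ms // 1000          # filter length in samples

rng = np.random.default_rng(0)
source = rng.standard_normal(fs_hz)         # 1 s of a toy source signal
# Toy room filter: random taps with an exponential decay.
room_filter = rng.standard_normal(n_taps) * np.exp(-np.arange(n_taps) / 1000.0)

# Time domain: the microphone observes the source convolved with the filter.
observed = np.convolve(source, room_filter)

# Frequency domain: the same relation is one multiplication per frequency.
n = len(observed)
lhs = np.fft.rfft(observed)
rhs = np.fft.rfft(source, n) * np.fft.rfft(room_filter, n)
matches = np.allclose(lhs, rhs)
```

Per frequency, the 4800-tap filter collapses to a single complex coefficient, so with several sources and microphones each frequency bin has its own small mixing matrix.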
  • 50. ICA-based separation of reverberant mixture • Frequency-domain ICA (FDICA) [Smaragdis, 1998] – Apply simple ICA to each frequency bin of the spectrogram (ICA1, ICA2, ICA3, …) – The frequency-wise demixing matrix is the inverse of the frequency-wise mixing matrix
  • 51. ICA-based separation of reverberant mixture • Permutation problem in frequency-domain ICA – The order of the separated signals differs from one frequency bin to another* – The signals have to be aligned across frequencies by a permutation solver • Flow: sources 1 and 2 → mixtures 1 and 2 → ICA in all frequency bins → permutation solver → separated signals 1 and 2 * Scales are also inconsistent, but they can be easily fixed.
  • 52. ICA-based separation of reverberant mixture • Popular permutation solvers – Based on direction of arrival (DOA) • Frequency-domain ICA + DOA alignment [Saruwatari, 2006] – Based on a relative correlation among frequencies • Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006] – Based on a low-rank modeling of each source • Independent low-rank matrix analysis (ILRMA) [Kitamura, 2016] • Demonstration of BSS using ILRMA – http://d-kitamura.net/en/demo_rank1_en.htm 52
  • 53. Contents • Research background – Audio source separation and its applications – Demonstration • Structural modeling of audio sources – Time-frequency representation – Low-rank modeling of audio spectrogram – Supervised audio source separation • Statistical modeling between sources – Blind audio source separation – Audio distribution and central limit theorem – Maximization of independence • Conclusion and future works 53
  • 54. Conclusions and future works • Audio source separation based on – Low-rank property • Nonnegative matrix factorization – Statistical independence • Blind source separation • For further improvement – Separation based on training with a huge dataset • Deep learning, denoising autoencoder, etc. • But each recording condition is just a one-time event – Informed source separation • Music scores could be powerful information • The user can guide the system, which leads to more accurate separation • Performance is still insufficient – Almost there? Not at all! Make our life better. That’s engineering.