1. High Performance Computing & Systems Lab
Unsupervised feature learning for audio classification using convolutional deep belief networks
Honglak Lee, Yan Largman, Peter Pham, Andrew Y. Ng
Computer Science Department, Stanford University, Stanford, CA 94305
Advances in Neural Information Processing Systems 22 (NIPS 2009)
Presenter: Chung il Kim
Paper Seminar, 31 Aug 2017
Contents
Abstract & Introduction
Theory & Algorithm
Convolutional Deep Belief Networks (CDBN)
On Shift-Invariant Sparse Coding (SISC)
Unsupervised Feature Learning
Application to Audio Recognition Tasks
Speech Recognition
Music Classification
Discussion and Conclusion
1. Abstract & Introduction (1)
Abstract
Deep learning approaches
Build hierarchical representations from unlabeled data
This work focuses on unlabeled auditory data
Using convolutional deep belief networks (CDBN)
Evaluate the learned features on various audio classification tasks, comparing:
RAW
MFCC
CDBN (L1, L2)
1. Abstract & Introduction (2)
Introduction
Issues in audio data recognition
Audio data is high-dimensional and complex
Previous work [1, 2]
Sparse coding learns filters that correspond to cochlear filters
Related work [3]
Efficient sparse coding algorithms for audio classification tasks
– Feature-sign search algorithm (FS-EXACT, FS-Window)
– Lagrangian dual solved via the DFT
[1] E. C. Smith and M. S. Lewicki. Efficient auditory coding. Nature, 439:978–982, 2006.
[2] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[3] R. Grosse, R. Raina, H. Kwong, and A. Y. Ng. Shift-invariant sparse coding for audio classification. In UAI, 2007.
1. Abstract & Introduction (3)
Introduction
The limits of those methods
They learn relatively shallow,
1-layer representations
Many promising approaches [4, 5, 6, 7, 8], usually applied to images:
fast learning algorithms for deep belief nets [4]
sparse representations via energy-based models [5]
greedy layer-wise training [6]
empirical evaluation of deep architectures [7]
But deep learning had not been applied to auditory data
[4] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[5] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In NIPS, 2006.
[6] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, 2006.
[7] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In ICML, 2007.
[8] H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief network model for visual area V2. In NIPS, 2008.
1. Abstract & Introduction (4)
Introduction
Deep belief network (DBN)
A generative probabilistic model
– Composed of one visible layer and multiple hidden layers
Trained well using greedy layer-wise training
Convolutional deep belief network (CDBN) [9]
Also trained in a greedy, bottom-up fashion
Good performance on several visual recognition tasks
This work: CDBN on unlabeled audio data
Evaluate the learned feature representations
– on several audio classification tasks
[9] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In ICML, 2009.
2. Convolutional Deep Belief Network (1)
Convolutional Restricted Boltzmann Machines (CRBMs)
A CDBN consists of stacked CRBM blocks
<Figure 1> Image of convolutional deep belief networks
1. Select a local receptive field
2. Compute detections through the filters
(highly overcomplete, so sparsity is needed)
3. Pool (usually max-pooling)
4. Train greedily, layer by layer
– stacking more than one layer
5. Recover the patterns of the visible data
2. Convolutional Deep Belief Network (2)
Convolutional Restricted Boltzmann Machines (CRBMs)
An extension of ‘regular’ Restricted Boltzmann Machines (RBMs)
Convolution decreases the relevant dimensionality
but leaves the representation overcomplete, creating a sparsity problem
<Figure 2> Dimensionality reduction and sparsity
2. Convolutional Deep Belief Network (3)
CDBNs
Energy function
The CRBM probability distribution is defined via this energy (next page)
<Formula 1> Energy function of CRBMs with binary (top) and real-valued (bottom) visible units:

$$E(v, h) = -\sum_{k=1}^{K} \sum_{j=1}^{n_H} \sum_{r=1}^{n_W} h_j^k W_r^k v_{j+r-1} - \sum_{k=1}^{K} b_k \sum_{j=1}^{n_H} h_j^k - c \sum_{i=1}^{n_V} v_i$$

$$E(v, h) = \frac{1}{2}\sum_{i=1}^{n_V} v_i^2 - \sum_{k=1}^{K} \sum_{j=1}^{n_H} \sum_{r=1}^{n_W} h_j^k W_r^k v_{j+r-1} - \sum_{k=1}^{K} b_k \sum_{j=1}^{n_H} h_j^k - c \sum_{i=1}^{n_V} v_i$$

n_V : length of the visible (input) array
n_W : filter length
K : number of filters
n_H : length of each hidden group, n_H = n_V − n_W + 1
b_k : shared bias for hidden group k
c : shared bias for the visible units
2. Convolutional Deep Belief Network (4)
CDBNs
Probability distribution
The CRBM probability distributions are defined via the energy function
<Formula 2> Joint and conditional probability distributions:

$$P(v, h) = \frac{1}{Z} \exp\big(-E(v, h)\big)$$

$$P(h_j^k = 1 \mid v) = \sigma\big((\tilde{W}^k *_v v)_j + b_k\big)$$

$$P(v_i = 1 \mid h) = \sigma\Big(\big(\textstyle\sum_k W^k *_f h^k\big)_i + c\Big)$$

*_v : valid convolution; *_f : full convolution; \tilde{W}^k is W^k flipped. For real-valued visible units, v_i given h is Gaussian with the same mean.
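A minimal NumPy sketch of these 1-D conditionals; all shapes and names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_given_visible(v, W, b):
    # P(h^k_j = 1 | v) = sigmoid((tilde-W^k *_v v)_j + b_k); a valid
    # convolution with the flipped filter equals valid cross-correlation.
    return np.array([sigmoid(np.correlate(v, W[k], mode='valid') + b[k])
                     for k in range(len(W))])

def visible_mean_given_hidden(h, W, c):
    # E[v_i | h] = (sum_k W^k *_f h^k)_i + c for real-valued visible units.
    n_v = h.shape[1] + W.shape[1] - 1
    mean = np.full(n_v, float(c))
    for k in range(len(W)):
        mean += np.convolve(h[k], W[k], mode='full')
    return mean

# toy usage: K = 3 filters of length n_W = 6 on an n_V = 20 signal
rng = np.random.default_rng(0)
K, n_w, n_v = 3, 6, 20
W, b, c = 0.1 * rng.normal(size=(K, n_w)), np.zeros(K), 0.0
v = rng.normal(size=n_v)
p_h = hidden_given_visible(v, W, b)            # shape (3, 15) = (K, n_V - n_W + 1)
v_mean = visible_mean_given_hidden(p_h, W, c)  # shape (20,)
```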
2. Convolutional Deep Belief Network (5)
Pooling layer
Shrinks the feature map
For classification, max-pooling is the most common choice
Input (4x4):
0    0.5  0.5  0.4
0.7  0.1  0.2  0.4
0.9  0.3  0.7  0.5
0.5  0.8  0.2  0

Output (3x3, 2x2 windows with stride 1):
0.7  0.5  0.5
0.9  0.7  0.7
0.9  0.8  0.7

<Picture 3> Image of max-pooling
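A small NumPy sketch reproducing the example above (2x2 windows with stride 1, matching the figure; the experiments later use a non-overlapping pooling ratio of 3):

```python
import numpy as np

def max_pool(x, size=2, stride=1):
    """Max-pool a 2-D map with a size x size window."""
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    return np.array([[x[i*stride:i*stride+size, j*stride:j*stride+size].max()
                      for j in range(cols)] for i in range(rows)])

x = np.array([[0.0, 0.5, 0.5, 0.4],
              [0.7, 0.1, 0.2, 0.4],
              [0.9, 0.3, 0.7, 0.5],
              [0.5, 0.8, 0.2, 0.0]])
print(max_pool(x))   # [[0.7 0.5 0.5] [0.9 0.7 0.7] [0.9 0.8 0.7]]
```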
2. Convolutional Deep Belief Network (6)
Process of CDBNs
1. Select a local receptive field
2. Compute detections through the filters (highly overcomplete, so sparsity is needed)
3. Pool (usually max-pooling)
4. Train greedily, layer by layer (stacking more than one layer)
5. Recover the patterns of the visible data
<Picture 4> Process of CDBNs
https://deeplearning4j.org/kr/convolutionnets
3. On Shift-Invariant Sparse Coding (1)
Sparsity
A typical CRBM is highly overcomplete
A sparsity penalty term is added to the log-likelihood objective
– to address overfitting in deep neural networks
– to avoid effectively full connectivity
The sparse-coding step uses the LASSO (the Least Absolute Shrinkage and Selection Operator) penalty
<Formula 3> The training objective
<Formula 4> The sparsity objective
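A hedged reconstruction of the missing formulas, assuming the standard forms from the cited work: the sparsity-regularized training objective follows Lee et al. [8], and the SISC objective uses an L1 (LASSO) penalty:

```latex
% Training objective: negative log-likelihood plus a sparsity penalty
\min_{W,b,c}\; -\sum_{l} \log P\big(v^{(l)}\big)
  \;+\; \lambda \sum_{j} \Big\| p - \tfrac{1}{m}\sum_{l}
  \mathbb{E}\big[h_j \mid v^{(l)}\big] \Big\|^2

% SISC objective: reconstruction error plus an L1 (LASSO) term
\min_{b,s}\; \sum_{l} \Big\| x^{(l)} - \sum_{k} b^{(k)} * s^{(l,k)} \Big\|_2^2
  \;+\; \beta \sum_{l,k} \big\| s^{(l,k)} \big\|_1
```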
3. On Shift-Invariant Sparse Coding (2)
Two algorithms solve SISC for audio data
For the coefficients: the feature-sign search algorithm
Efficient for short signals (low-dimensional x)
Impractical for signals longer than about one minute
<Pseudo 1> Feature-sign search algorithm 1
R. Grosse, R. Raina, H. Kwong, and A.Y. Ng. Shift-invariant sparse coding for audio classification. In UAI, 2007
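The feature-sign pseudocode image did not survive extraction. As a hedged stand-in (a substitute solver, not the paper's algorithm), scikit-learn's coordinate-descent Lasso minimizes the same L1-penalized objective:

```python
# Feature-sign search and LASSO both minimize ||y - A x||^2 + gamma * ||x||_1,
# so sklearn's Lasso recovers an equivalent sparse coefficient vector.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 256))             # dictionary of 256 basis functions
x_true = np.zeros(256)
x_true[rng.choice(256, size=5, replace=False)] = rng.normal(size=5)
y = A @ x_true + 0.01 * rng.normal(size=64)

# sklearn minimizes (1/(2*n)) ||y - A x||^2 + alpha ||x||_1,
# so alpha = gamma / (2 * n_samples) matches the objective above.
gamma = 0.1
lasso = Lasso(alpha=gamma / (2 * len(y)), max_iter=10000)
lasso.fit(A, y)
x_hat = lasso.coef_                        # sparse coefficient vector
print(np.count_nonzero(np.abs(x_hat) > 1e-6), "active coefficients")
```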
3. On Shift-Invariant Sparse Coding (3)
Two algorithms solve SISC for audio data
For the bases: the Lagrangian with the DFT
1st, apply the Discrete Fourier Transform
– to decompose the signal
2nd, form the Lagrangian
– to solve the constrained optimization
3rd, solve with Newton's method (used in this paper)
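A hedged sketch of why the DFT helps, assuming circular convolution: by Parseval's theorem the reconstruction error decouples across frequencies, so the bases subproblem can be handled per frequency ω in the Lagrangian dual:

```latex
% Parseval: convolution becomes per-frequency multiplication
\Big\| x - \sum_{k} b^{(k)} * s^{(k)} \Big\|_2^2
  \;=\; \frac{1}{n} \sum_{\omega}
  \Big| \hat{x}(\omega) - \sum_{k} \hat{b}^{(k)}(\omega)\,\hat{s}^{(k)}(\omega) \Big|^2
```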
3. On Shift-Invariant Sparse Coding (4)
Approach to the coefficient task
Uses the LASSO (L1) objective
Solved through its partial-derivative optimality conditions
The L1 penalty raises bias and lowers variance (a trade-off)
Liang Sun Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
<Pseudo 2> Feature-sign search algorithm 2
3. On Shift-Invariant Sparse Coding (5)
Approach to the coefficient task
The problem reduces to an ‘unconstrained QP’
whose analytical solution can be computed in closed form
over the active subvector of x
A discrete line search (LS) then updates x toward that solution:
collect every point where a coefficient changes sign, and update to the one with the lowest objective value
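A hedged reconstruction of the closed-form step, following the feature-sign derivation in Lee et al., Efficient sparse coding algorithms (NIPS 2006): minimizing $\|y - \hat{A}\hat{x}\|^2 + \gamma\,\hat{\theta}^{\top}\hat{x}$ over the active subvector $\hat{x}$ with fixed signs $\hat{\theta}$ gives:

```latex
% Closed-form minimizer of the unconstrained QP on the active set
\hat{x}_{\text{new}}
  = \big(\hat{A}^{\top}\hat{A}\big)^{-1}
    \Big(\hat{A}^{\top} y - \frac{\gamma\,\hat{\theta}}{2}\Big)
```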
Liang Sun Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
<Pseudo 3> Feature-sign search algorithm 2
3. On Shift-Invariant Sparse Coding (6)
Approach to the coefficient task
Finally, check the optimality conditions and repeat until they hold
Liang Sun Arizona State University, Efficient Sparse Coding Algorithms, http://slideplayer.com/slide/4953202/
<Pseudo 4> Feature-sign search algorithm 2
3. On Shift-Invariant Sparse Coding (7)
Result of FS search (learning speed)
3. On Shift-Invariant Sparse Coding (8)
Result of FS search (speech)
Speech data (TIMIT)
32 basis functions learned from 1-second speech signals
Filters compared:
SISC (with FS), MFCC (Mel-Frequency Cepstral Coefficients), RAW
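For context on the MFCC baseline, a minimal extraction sketch; librosa and all parameters here are assumptions, not the paper's tooling:

```python
# Minimal MFCC sketch on a synthetic 1-second signal.
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)   # 1 s test tone

# 13 cepstral coefficients per frame (a common, illustrative choice)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)                                       # (13, n_frames)
```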
3. On Shift-Invariant Sparse Coding (9)
Result of FS search (musical genre)
2-second segments, 5-way musical genre classification
Filters compared:
SISC (with FS), TC (Tzanetakis & Cook),
MFCC (Mel-Frequency Cepstral Coefficients), RAW
4. Unsupervised Feature Learning (1)
Description of the TIMIT data
A corpus for research on speech recognition systems
American English
In this research
Spectrogram form
Window size: 20 ms
Overlap: 10 ms
PCA whitening (with 80 components)
– to reduce the dimensionality
Research contents
Phonemes
Speaker gender
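A hedged preprocessing sketch matching the settings above; scipy/sklearn, the log-amplitude step, and the synthetic input are assumptions:

```python
# Spectrogram with 20 ms windows and 10 ms overlap, then PCA whitening
# to 80 components (as in the slide; tooling is a stand-in).
import numpy as np
from scipy import signal
from sklearn.decomposition import PCA

sr = 16000                                   # TIMIT sample rate
audio = np.random.randn(sr * 2)              # stand-in for a TIMIT utterance
nperseg = int(0.020 * sr)                    # 20 ms window
noverlap = int(0.010 * sr)                   # 10 ms overlap
f, t, spec = signal.spectrogram(audio, fs=sr, nperseg=nperseg,
                                noverlap=noverlap)

# Rows = time frames, columns = frequency bins; whiten across frequency.
frames = np.log(spec.T + 1e-8)               # log-amplitude (a common choice)
pca = PCA(n_components=80, whiten=True)
whitened = pca.fit_transform(frames)         # (n_frames, 80) CDBN input
```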
4. Unsupervised Feature Learning (2)
Layer and training settings
1st layer
300 bases
Filter length (n_W): 6
Max-pooling ratio: 3
2nd layer
300 bases (input: the 1st layer's output)
Filter length: 6
Max-pooling ratio: 3
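A small sketch of the map sizes these settings imply, assuming valid 1-D convolution (n_H = n_V − n_W + 1) and non-overlapping pooling; the input length is illustrative:

```python
def cdbn_layer_sizes(n_v, filter_len=6, pool_ratio=3, n_layers=2):
    """Detection and pooled sizes per layer for a 1-D CDBN."""
    sizes = []
    for _ in range(n_layers):
        n_h = n_v - filter_len + 1        # detection layer (valid conv)
        n_p = n_h // pool_ratio           # pooled layer
        sizes.append((n_h, n_p))
        n_v = n_p                         # pooled output feeds the next layer
    return sizes

print(cdbn_layer_sizes(n_v=100))          # [(95, 31), (26, 8)]
```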
4. Unsupervised Feature Learning (3)
Phonemes and the CDBN features
Analysis
Vowels (“ah”, “oy”)
Prominent horizontal bands
in the lower frequencies
“oy”
An upward-slanting pattern
4. Unsupervised Feature Learning (4)
Phonemes and the CDBN features
Analysis
Fricatives (“s”)
Energy concentrated in the high frequencies
“el”
High intensity in the low frequencies,
followed by low intensity in the high frequencies
4. Unsupervised Feature Learning (5)
Speaker gender information and CDBN features
Female speakers show a finer horizontal banding pattern in the low frequencies
L1 and L2 denote first- and second-layer bases
5. Speech Recognition (Speaker ID) (1)
About the data
No. of speakers: 168
Sentences per speaker: 10
Total sentences: 1,680
1. Speaker identification test
10 random trials
Training: TIMIT data
All data expressed as spectrograms
Features: RAW, MFCC, CDBN L1, CDBN L2, CDBN L1+L2
Simple summary statistics computed for each channel
Features evaluated with standard supervised classifiers:
SVM (Support Vector Machine), GDA (Gaussian Discriminant Analysis),
KNN (K-Nearest Neighbor classification)
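A hedged evaluation sketch with scikit-learn stand-ins (LDA playing the role of GDA; the features and labels are synthetic placeholders):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 80))        # stand-in summary-statistic features
y = rng.integers(0, 2, size=200)      # stand-in labels (e.g. gender)

# Same feature set, three standard supervised classifiers.
for name, clf in [("SVM", SVC()),
                  ("GDA/LDA", LinearDiscriminantAnalysis()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```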
5. Speech Recognition (Speaker ID) (2)
Speaker Identification
5. Speech Recognition (Speaker ID) (3)
2. Speaker gender classification
Randomly sampled training examples
200 test examples
20 trials
5. Speech Recognition (Speaker ID) (4)
3. Phone classification
39-way phone classification accuracy
Averaged over 5 random trials
6. Music Classification (1)
1. Genre classification
1st and 2nd layers
Music data from ISMIR
Bases: 300
Filter length: 10
Max-pooling ratio: 3
Randomly sampled 3-second segments (as training or test samples)
Genres: 5-way (classical, electronic, jazz, pop, and rock)
20 random trials for each number of training examples
6. Music Classification (2)
2. Artist classification
1st and 2nd layers (same settings as genre classification)
Music data from ISMIR
Bases: 300
Filter length: 10
Max-pooling ratio: 3
Randomly sampled 3-second segments (as training or test samples)
Classical music only
4-way artist classification
Averaged over 20 random trials
6. Music Classification (3)
2. Artist classification
7. Discussion
Not directly suited to modern speech corpora
which are much larger than the TIMIT data set
This research's target:
settings with a restricted amount of labeled data
Remaining interesting problems:
applying deep learning to larger datasets
and to more challenging tasks
8. Conclusion
Applied CDBNs to audio data
Evaluated on various audio classification tasks
without using a large amount of labeled data
The learned features often equaled or surpassed MFCC
(MFCC is hand-tailored to audio data)
Combining both achieves even higher classification accuracy
L1 CDBN features performed well on multiple audio recognition tasks
Hope: to inspire automatic learning of deep features
for audio data