Details of Lazy Deep Learning for Image Recognition in ZZ Photo app
Artem Chernodub, George Pashchenko
IMMSP NASU
Kharkov AI Club, 20 June 2015.
ZZ Photo
p(x|y) = p(y|x) p(x) / p(y)
Biologically inspired models: Neuroscience, Machine Learning
2 / 62
Biological Neural Networks
3 / 62
Artificial Neural Networks
Traditional (Shallow) Neural
Networks
Deep Neural Networks
Deep Feedforward Neural
Networks
Recurrent Neural Networks
4 / 62
Conventional Methods vs Deep
Learning
5 / 62
Deep Learning = Learning of
Representations (Features)
The traditional model of pattern recognition (since the late
50's):
fixed/engineered features + trainable classifier
Hand-crafted
Feature
Extractor
Trainable
Classifier
Trainable
Feature
Extractor
Trainable
Classifier
End-to-end learning / Feature learning / Deep learning:
trainable features + trainable classifier
6 / 62
ImageNet
Le et al. “Building high-level features using large-scale unsupervised
learning” ICML 2012.
Model | # of parameters | Accuracy, %
Deep Net | 10M | 15.8
best state-of-the-art | N/A | 9.3
Training data: 16M images, 20K categories
7 / 62
Deep Face (Facebook)
Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap
to Human-Level Performance in Face Verification // CVPR 2014.
Model | # of parameters | Accuracy, %
Deep Face Net | 128M | 97.35
Human level | N/A | 97.5
Training data: 4M facial images
8 / 62
TIMIT Phoneme Recognition
Graves, A., Mohamed, A.-R., and Hinton, G. E. (2013). Speech recognition
with deep recurrent neural networks // IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pages 6645–6649.
IEEE.
Mohamed, A. and Hinton, G. E. (2010). Phone recognition using restricted
Boltzmann machines // IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pages 4354–4357.
Model | # of parameters | Error
Hidden Markov Model (HMM) | N/A | 27.3%
Deep Belief Network (DBN) | ~4M | 26.7%
Deep RNN | 4.3M | 17.7%
Training data: 462 speakers train / 24 speakers test, 3.16 / 0.14
hrs.
9 / 62
Google Large Vocabulary Speech
Recognition
H. Sak, A. Senior, F. Beaufays. Long Short-Term Memory Recurrent Neural
Network Architectures for Large Scale Acoustic Modeling //
INTERSPEECH’2014.
K. Vesely, A. Ghoshal, L. Burget, D. Povey. Sequence-discriminative
training of deep neural networks // INTERSPEECH’2014.
Model | # of parameters | Cross-entropy
ReLU DNN | 85M | 11.3
Deep Projection LSTM RNN | 13M | 10.7
Training data: 3M utterances (1900 hrs).
10 / 62
Classic Feedforward Neural
Networks (before 2006).
• Single hidden layer (Kolmogorov-Cybenko Universal
Approximation Theorem as the main hope).
• Vanishing gradients effect prevents using more layers.
• Less than 10K free parameters.
• Feature preprocessing stage is often critical.
11 / 62
Training the traditional (shallow)
Neural Network: derivative + optimization
12 / 62
1) forward propagation pass
z_j = f( Σ_i w_ji^(1) x_i ),
ỹ(k+1) = g( Σ_j w_j^(2) z_j ),
where z_j is the postsynaptic value for the j-th hidden neuron, w^(1) are the hidden layer’s weights, f() are the hidden layer’s activation functions, w^(2) are the output layer’s weights, and g() are the output layer’s activation functions.
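A minimal NumPy sketch of this forward pass (single hidden layer, biases omitted as in the formula above; all sizes and the choice of tanh/identity activations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                  # 3 inputs
W1 = rng.normal(size=(5, 3))            # hidden layer weights w^(1), 5 hidden neurons
W2 = rng.normal(size=(1, 5))            # output layer weights w^(2), 1 output

z = np.tanh(W1 @ x)                     # z_j = f(sum_i w^(1)_ji x_i), tanh as f()
y = W2 @ z                              # y~  = g(sum_j w^(2)_j z_j), identity as g()
```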
13 / 62
2) backpropagation pass
Local gradients calculation:
δ^OUT = t(k+1) − ỹ(k+1),
δ_j^HID = f′(z_j) · w_j^(2) · δ^OUT.
Derivatives calculation:
∂E(k)/∂w_j^(2) = δ^OUT · z_j,
∂E(k)/∂w_ji^(1) = δ_j^HID · x_i.
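And the corresponding backward pass for one training pair, as a self-contained NumPy sketch (tanh hidden units, linear output, and the slide's sign convention δ^OUT = t − ỹ; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([1.0])       # one training pair
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(1, 5))

z = np.tanh(W1 @ x)                              # forward pass, as on the previous slide
y = W2 @ z                                       # linear output unit

# local gradients, following the slide's sign convention (delta^OUT = t - y)
delta_out = t - y
delta_hid = (1 - z ** 2) * (W2.T @ delta_out)    # f'(z) for tanh, times back-propagated error

# derivatives of the error w.r.t. the weights
dE_dW2 = np.outer(delta_out, z)                  # dE(k)/dw^(2)_j  = delta^OUT * z_j
dE_dW1 = np.outer(delta_hid, x)                  # dE(k)/dw^(1)_ji = delta^HID_j * x_i
```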
14 / 62
Bad effect of vanishing (exploding)
gradients: a problem
∂E(k)/∂w_ji^(m) = δ_j^(m) · z_i^(m−1),
δ_j^(m) = f′ · Σ_i w_ij^(m+1) δ_i^(m+1),
⇒ ∂E(k)/∂w_ji^(m) → 0 for m ≫ 1
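A tiny numeric illustration of why repeatedly applying this recursion shrinks the gradient (the per-layer factor is an assumed value, not taken from the slides):

```python
# each backward step multiplies the signal by roughly |f'(z)| * |w|;
# for sigmoid units |f'(z)| <= 0.25, so the per-layer factor is often well below 1
factor_per_layer = 0.25 * 0.9            # assumed |f'(z)| * |w|, illustrative
for depth in (1, 5, 10, 20):
    print(depth, factor_per_layer ** depth)
# the magnitude shrinks geometrically with depth, so dE/dw at early layers is ~0
```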
15 / 62
Bad effect of vanishing (exploding)
gradients: two hypotheses
1) increased frequency and
severity of bad local
minima
2) pathological curvature, like the type seen in the well-known Rosenbrock function:
f(x, y) = (1 − x)² + 100 (y − x²)²
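To see the "pathological curvature" point concretely, here is plain gradient descent on the Rosenbrock function; the step size, starting point and iteration count are arbitrary illustrative choices:

```python
import numpy as np

def rosenbrock(x, y):
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def grad(x, y):
    # analytic gradient of the Rosenbrock function
    dfdx = -2 * (1 - x) - 400 * x * (y - x ** 2)
    dfdy = 200 * (y - x ** 2)
    return np.array([dfdx, dfdy])

p = np.array([-1.2, 1.0])                 # classic starting point
for _ in range(100_000):
    p -= 1e-4 * grad(*p)                  # plain gradient descent, small step
print(p, rosenbrock(*p))                  # progress along the curved valley is very slow
```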
16 / 62
Deep Feedforward Neural
Networks
• 2-stage training process: i) unsupervised pre-training; ii) fine
tuning (vanishing gradients problem is beaten!).
• Number of hidden layers > 1 (usually 6-9).
• 100K – 100M free parameters.
• No (or less) feature preprocessing stage.
17 / 62
Sparse Autoencoders
18 / 62
Dimensionality
reduction
• Use a stacked RBM as deep auto-encoder
1. Train RBM with images as input & output
2. Limit one layer to few dimensions
→ Information has to pass through middle layer
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data
with Neural Networks // Science 313 (2006), p. 504 – 507.
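A much-simplified sketch of the bottleneck idea in NumPy (a single untrained encoder/decoder pair rather than the stacked-RBM deep autoencoder of Hinton & Salakhutdinov; all sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 625))            # 100 fake 25x25 "images", flattened

d_in, d_code = 625, 30                # bottleneck: 625 -> 30 dimensions
W_enc = rng.normal(scale=0.01, size=(d_in, d_code))
W_dec = rng.normal(scale=0.01, size=(d_code, d_in))

code = np.tanh(X @ W_enc)             # all information must pass through 30 numbers
recon = code @ W_dec                  # reconstruction back to 625 pixels
mse = np.mean((recon - X) ** 2)       # training would minimize this reconstruction error
```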
19 / 62
Dimensionality reduction
Original / Deep RBM / PCA
Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 → 30)
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data
with Neural Networks // Science 313 (2006), p. 504 – 507.
20 / 62
How to use unsupervised pre-
training stage / 1
21 / 62
How to use unsupervised pre-
training stage / 2
22 / 62
How to use unsupervised pre-
training stage / 3
23 / 62
How to use unsupervised pre-
training stage / 4
24 / 62
Unlabeled data
Unlabeled data is readily available
Example: Images from the web
1. Download 10’000’000 images
2. Train a 9-layer DNN
3. Concepts are formed by DNN
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data
with Neural Networks // Science 313 (2006), p. 504 – 507.
25 / 62
Dimensionality reduction
PCA / Deep RBM
804’414 Reuters news stories, reduction to 2 dimensions
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data
with Neural Networks // Science 313 (2006), p. 504 – 507.
26 / 62
Hierarchy of trained representations
Low-level feature / Middle-level feature / Top-level feature
Feature visualization of convolutional net trained on ImageNet from [Zeiler
& Fergus 2013]
27 / 62
Hessian-Free optimization: Deep
Learning with no pre-training stage
J. Martens. Deep Learning via Hessian-free Optimization // Proceedings of
the 27th International Conference on Machine Learning (ICML), 2010.
28 / 62
FLOPS comparison
https://ru.wikipedia.org/wiki/FLOPS
Type | Name | Flops | Cost
Mobile | Raspberry Pi 1st Gen, 700 MHz | 0.04 Gflops | $35
Mobile | Apple A8 | 1.4 Gflops | $700 (in iPhone 6)
CPU | Intel Core i7-4930K (Ivy Bridge), 3.7 GHz | 140 Gflops | $700
CPU | Intel Core i7-5960X (Haswell), 3.0 GHz | 350 Gflops | $1300
GPU | NVidia GTX 980 | 4612 Gflops (single precision), 144 Gflops (double precision) | $600 + cost of PC (~$1000)
GPU | NVidia Tesla K80 | 8740 Gflops (single precision), 2910 Gflops (double precision) | $4500 + cost of PC (~$1500)
29 / 62
Deep Networks Training time using
GPU
• Pretraining – from 2-3 weeks to 2-3
months.
• Fine-tuning (final supervised training) –
from 1 day to 1 week.
30 / 62
Tools for training Deep Neural
Networks
D. Kruchinin, E. Dolotov, K. Kornyakov, V. Kustikova, P. Druzhkov. The
Comparison of Deep Learning Libraries on the Problem of Handwritten
Digit Classification // Analysis of Images, Social Networks and Texts (AIST),
2015, April, 9-11th, Yekaterinburg.
31 / 62
Convolutional Neural Networks:
Return of the Jedi
Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks
Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook
32 / 62
AlexNet, CNN-Mega-HiT,
results on ILSVRC-2012
A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks // Advances in Neural Information Processing Systems 25 (NIPS 2012).
33 / 62
Lazy Deep Learning: idea
A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson. CNN Features off-the-shelf: an Astounding Baseline for Recognition // 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 23-28 June 2014, Columbus, USA, p. 512 – 519.
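The "lazy" recipe: run images through a network pre-trained on ImageNet, keep the activations of a top layer as fixed feature vectors, and train only a simple classifier on top. A minimal sketch with scikit-learn; `extract_cnn_features` is a hypothetical stand-in for a forward pass through a pre-trained CNN (e.g. AlexNet's fc7), faked here with random vectors so the sketch runs end to end:

```python
import numpy as np
from sklearn.svm import SVC

def extract_cnn_features(images):
    """Placeholder: in practice this is a forward pass through a pre-trained CNN,
    returning one ~4096-dim descriptor per image; here it is faked with random vectors."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(images), 4096))

train_images, train_labels = list(range(40)), [i % 2 for i in range(40)]
test_images,  test_labels  = list(range(10)), [i % 2 for i in range(10)]

F_train = extract_cnn_features(train_images)   # the features themselves are NOT trained
F_test  = extract_cnn_features(test_images)

clf = SVC(kernel="linear")                     # only this classifier is trained ("lazy" DL)
clf.fit(F_train, train_labels)
print("test accuracy:", clf.score(F_test, test_labels))
```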
34 / 62
Lazy Deep Learning: benchmark
results
A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson. CNN Features off-the-shelf: an Astounding Baseline for Recognition // 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 23-28 June 2014, Columbus, USA, p. 512 – 519.
35 / 62
MIT-8 toy problem: formulation
• 8 classes
• 2080 images in total
• TRAIN: 2000
images (250 per
class)
• TEST: 688 images,
86 per class
S. Banerji, A. Verma, C. Liu. Novel Color LBP Descriptors for Scene and Image Texture Classification // Cross Disciplinary Biometric Systems, 2012, 15th International Conference on Image Processing, Computer Vision, and Pattern Recognition, Las Vegas, Nevada, pp. 205-225.
36 / 62
MIT-8 toy problem: results
# | Method | Acc. TRAIN | Acc. TEST
1 | LBP + SVM with RBF kernel | 27.2% | 19.0%
2 | LPQ + SVM with RBF kernel | 38.4% | 30.5%
3 | LBP + SVM with χ2 kernel | 94.2% | 74.0%
4 | LPQ + SVM with χ2 kernel | 99.1% | 82.2%
5 | Deep CNN (AlexNet) + SVM with RBF kernel (LAZY DL) | 95.1% | 91.8%
6 | Deep CNN (AlexNet) + SVM with χ2 kernel (LAZY DL) | 100.0% | 93.2%
7 | Deep CNN (AlexNet) + MLP (LAZY DL) | 100.0% | 92.3%
Original results, to be published.
37 / 62
ZZ Photo – photo organizer
Trial version is available at http://zzphoto.me
38 / 62
Viola-Jones Object Detector
• Very popular for Human Face Detection.
• May be trained for Cat and Dog Face detection.
• Available free in OpenCV library (http://opencv.org).
O. Parkhi, A. Vedaldi, C. V. Jawahar, and A. Zisserman. The Truth about Cats and Dogs // Proceedings of the International Conference on Computer Vision (ICCV), 2011.
J. Liu, A. Kanazawa, D. Jacobs, P. Belhumeur. Dog Breed Classification Using Part Localization // Lecture Notes in Computer Science Volume 7572, 2012, pp 172-185.
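A minimal usage sketch with OpenCV's Python bindings; the cascade file and image path are illustrative (OpenCV ships pre-trained cascades for human faces, and cascades for cat/dog faces can be trained separately):

```python
import cv2

# load a pre-trained Haar cascade; the file name is illustrative
# (recent opencv-python packages expose bundled cascades via cv2.data.haarcascades)
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")                       # path is a placeholder
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale scans an image pyramid with the boosted cascade of simple features
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```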
39 / 62
Image pyramid for Viola-Jones
40 / 62
Viola-Jones Object Detector
Classifier Structure
P. Viola, M. Jones. Rapid object detection using a boosted cascade of
simple features // Proceedings of the 2001 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, CVPR 2001.
41 / 62
AlexNet design
A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet Classification with
Deep Convolutional Neural Networks // Advances in Neural Information
Processing Systems 25 (NIPS 2012).
42 / 62
Pets detection problem (Kaggle
Dataset + random Other images)
• Kaggle Dataset +
random “other”
images;
• 2 classes (cats &
dogs VS other);
• TRAIN: 5,000
samples;
• TEST: 12,000 samples.
43 / 62
Pets detection results: FAR vs FRR
graphs
Original results, to be published.
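For reference, points on such FAR/FRR graphs come from sweeping a decision threshold over classifier scores; a small self-contained sketch with synthetic scores (all numbers are illustrative, not from our experiments):

```python
import numpy as np

def far_frr(scores_pos, scores_neg, threshold):
    """FAR: fraction of negatives accepted; FRR: fraction of positives rejected."""
    far = np.mean(scores_neg >= threshold)
    frr = np.mean(scores_pos < threshold)
    return far, frr

rng = np.random.default_rng(0)
scores_pos = rng.normal(2.0, 1.0, 1000)   # detector scores for pet images (synthetic)
scores_neg = rng.normal(0.0, 1.0, 1000)   # detector scores for "other" images (synthetic)

thr = np.quantile(scores_neg, 0.995)      # threshold that keeps FAR at roughly 0.5%
print(far_frr(scores_pos, scores_neg, thr))
```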
44 / 62
Pet detection results : ROC curve
Original results, to be published.
45 / 62
Pets detection results,
FAR error is fixed to 0.5%
# | Method | FRR Error
1 | Viola-Jones Face Detector for Cats & Dogs + LBP + SVM | 79.73%
2 | AlexNet, argmax (STANDARD DL, ImageNet-2012, 1000) | 32.05%
3 | AlexNet, sum (STANDARD DL, ImageNet-2012, 1000) | 26.11%
4 | AlexNet + SVM linear (LAZY DL) | 4.35%
Original results, to be published.
46 / 62
Development of AlexNet on
OpenCV
VGG MatConvNet: CNNs for MATLAB http://www.vlfeat.org/matconvnet/
mexopencv: MATLAB-OpenCV interface
http://kyamagu.github.io/mexopencv/matlab
MatConvNet,
MATLAB + CUDA
OpenCV app,
C++
YAML
YAML, BIN
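The idea of the exchange step is simply to serialize each layer's weights in a format both sides can read. Below is a toy Python stand-in with PyYAML, just to illustrate the file layout; the actual pipeline writes from MATLAB and reads with OpenCV in C++, and all field names and shapes here are assumptions:

```python
import numpy as np
import yaml

rng = np.random.default_rng(0)

# toy "layer": a name, a filter shape and the flattened weights (all values illustrative)
layer = {
    "name": "conv_toy",
    "shape": [2, 3, 3, 3],
    "weights": rng.normal(size=2 * 3 * 3 * 3).round(4).tolist(),
}
with open("conv_toy.yml", "w") as f:
    yaml.safe_dump(layer, f)

# the consumer side reads the YAML back and restores the tensor
with open("conv_toy.yml") as f:
    restored = yaml.safe_load(f)
W = np.array(restored["weights"]).reshape(restored["shape"])
```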
47 / 62
Convolution Layer
Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks
Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook
48 / 62
Pooling layer
Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks
Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook
49 / 62
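To make the pooling operation above concrete, a naive non-overlapping 2×2 max-pooling in NumPy (single channel, illustrative only):

```python
import numpy as np

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling of a (H, W) feature map (H, W even)."""
    H, W = fmap.shape
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2x2(fmap))   # 2x2 output, each value is the max of a 2x2 block
```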
Activation functions
Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks
Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook
ReLU activation function:
f(x) = max(0, x)
f′(x) = 1 for x ≥ 0; 0 for x < 0
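The same function and its subgradient in NumPy, for reference:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # subgradient convention from the slide: 1 for x >= 0, 0 otherwise
    return (x >= 0).astype(float)
```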
50 / 62
Implementation tricks: im2col
K. Chellapilla, S. Puri, P. Simard. High Performance Convolutional Neural
Networks for Document Processing // International Workshop on Frontiers
in Handwriting Recognition, 2006.
51 / 62
Implementation tricks: im2col for
convolution
K. Chellapilla, S. Puri, P. Simard. High Performance Convolutional Neural
Networks for Document Processing // International Workshop on Frontiers
in Handwriting Recognition, 2006.
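A minimal im2col sketch: every receptive field is unrolled into a column, so the whole convolution collapses into one matrix product (stride 1, no padding, single channel; illustrative only):

```python
import numpy as np

def im2col(img, k):
    """Unroll all k x k patches of a (H, W) image into columns of a (k*k, N) matrix."""
    H, W = img.shape
    cols = [img[i:i + k, j:j + k].ravel()
            for i in range(H - k + 1) for j in range(W - k + 1)]
    return np.stack(cols, axis=1)

img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                 # a 3x3 box filter

cols = im2col(img, 3)                          # (9, 9) matrix of unrolled patches
out = (kernel.ravel() @ cols).reshape(3, 3)    # convolution as one matrix-vector product
```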
52 / 62
Matrix multiplication
Matrices’ size | C | OpenCV | C++ (STL vector) | OpenBLAS | Matlab
1000×1000 | 1.45 | 1.76 | 1.47 | 0.062 | 0.062
2000×2000 | 11.64 | 14.2 | 11.23 | 0.99 | 0.54
3000×3000 | 38.11 | 47.2 | 37.99 | 1.75 | 1.7
4000×4000 | 90.84 | 110.37 | 90.2 | 7.91 | 4.2
5000×5000 | 180.74 | 213.4 | 181.02 | 10.8 | 7.3
6000×6000 | 315.46 | 376.46 | 316.3 | 25.33 | 12.74
https://4fire.wordpress.com/2012/04/29/matrices-multiplication-on-windows-matlab-is-the-champion-again/
53 / 62
OpenBLAS
• OpenBLAS is an open source implementation of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types.
http://www.openblas.net/
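The gap in the table above is essentially BLAS versus naive loops. NumPy's `@` operator normally dispatches to an optimized BLAS (often OpenBLAS), so a rough feel for the difference can be had even from Python (timings vary by machine; illustrative only):

```python
import time
import numpy as np

n = 150
rng = np.random.default_rng(0)
A, B = rng.random((n, n)), rng.random((n, n))

t0 = time.time()
C_blas = A @ B                          # dispatched to the BLAS gemm routine
t_blas = time.time() - t0

t0 = time.time()
C_naive = np.zeros((n, n))
for i in range(n):                      # naive O(n^3) triple loop in pure Python
    for j in range(n):
        s = 0.0
        for k in range(n):
            s += A[i, k] * B[k, j]
        C_naive[i, j] = s
t_naive = time.time() - t0

print(t_blas, t_naive, np.allclose(C_blas, C_naive))
```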
54 / 62
Sizes of layers
Layer group | Size, MB
LAYER 1-4 | 0.09
LAYER 5-8 | 1.56
LAYER 9-10 | 2.25
LAYER 11-12 | 2.25
LAYER 13-15 | 2.25
LAYER 16-17 | 144.02
LAYER 18-19 | 64.02
LAYER 20-21 | 15.63
~ 8.5 MB (layers 1-15), ~ 223 MB (layers 16-21)
55 / 62
Pets test #2: data
1 mini-set:
- 500 cats
- 500 dogs
- 1000 negatives
56 / 62
Pets test #2: results
[Chart: FRR, % vs. train size (100, 200, 500, 1000, 2000, 5000, 10000, 18000) for features taken from layer 15, layer 17 and layer 19.]
57 / 62
Pets test #2: results - FRR, %
(FAR is fixed to 0.5%)
Train size | Layer 15 | Layer 16 | Layer 19
100 | 30.08 | 12.61 | 12.94
500 | 17.91 | 10.41 | 10.72
1000 | 11.59 | 7.52 | 6.80
5000 | 7.41 | 3.88 | 4.13
10000 | 6.29 | 3.66 | 2.71
18000 | 5.16 | 2.64 | 2.54
58 / 62
Calculation speed
[Chart: computation time, ms, for each of layers 1–20; annotations: ~73 ms, ~60 ms.]
59 / 62
Labeled Faces in the Wild (LFW)
Dataset
G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the
Wild: A Database for Studying Face Recognition in Unconstrained
Environments // University of Massachusetts, Amherst, Technical Report
07-49, October, 2007
• more than 13,000 images of faces collected from the web.
• Pairs comparison, restricted mode.
• test: 10-fold cross-validation, 6000 face pairs.
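The "AlexNet + Euclid" entry on the next slide uses the simplest possible verification rule for this protocol: compare two face descriptors by Euclidean distance and accept the pair if the distance is below a threshold tuned on the training folds. A sketch with random stand-in descriptors (all numbers illustrative):

```python
import numpy as np

def same_person(desc_a, desc_b, threshold):
    """Accept the pair if the Euclidean distance between descriptors is small enough."""
    return np.linalg.norm(desc_a - desc_b) < threshold

rng = np.random.default_rng(0)
a, b = rng.normal(size=4096), rng.normal(size=4096)   # stand-ins for CNN face descriptors
print(same_person(a, b, threshold=85.0))
# on LFW the threshold is tuned on 9 folds and accuracy is reported on the held-out fold
```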
60 / 62
Face Recognition on LFW, results
Y. Taigman, M. Yang, M. Ranzato, L. Wolf. DeepFace: Closing the Gap to
Human-Level Performance in Face Verification, 2014, CVPR.
# | Method | Accuracy, %
1 | Principal Component Analysis (EigenFaces) | 60.2%
2 | Local Binary Pattern Histograms (LBP) | 72.4%
3 | Deep CNN (AlexNet) + Euclid (LAZY DL) | 71.0%
4 | DeepFace by Facebook (STANDARD DL) | 97.25%
61 / 62
contact: a.chernodub@gmail.com
george.pashchenko@gmail.com
Thanks!