Details of Lazy Deep Learning for Images Recognition in ZZ Photo app

8,607 views

Published on

The talk covers deep learning for image recognition. It discusses practical aspects of training deep convolutional networks on a GPU, shares first-hand experience of porting trained neural networks into an application built on the OpenCV library, and compares the resulting pet detector, based on the Lazy Deep Learning approach, with the Viola-Jones detector.

Speakers: Artem Chernodub is an expert in artificial neural networks and artificial intelligence systems. He graduated from the Moscow Institute of Physics and Technology in 2007. He leads the Computer Vision direction at ZZ Wolf and also works as a research scientist at the Institute of Mathematical Machines and Systems Problems of NASU.

Yuriy Pashchenko is a specialist in computer vision and machine learning systems; he holds a master's degree from NTUU "Kyiv Polytechnic Institute", Faculty of Applied Mathematics (2014). He works at ZZ Wolf as an R&D Engineer.

Published in: Software

Details of Lazy Deep Learning for Images Recognition in ZZ Photo app

  1. 1. Details of Lazy Deep Learning for Images Recognition in ZZ Photo app Artem Chernodub, George Pashchenko IMMSP NASU Kharkov AI Club, 20 June 2015. ZZ Photo
  2. 2. $p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$. Biologically-inspired models: Neuroscience, Machine Learning. 2 / 62
  3. 3. Biological Neural Networks 3 / 62
  4. 4. Artificial Neural Networks Traditional (Shallow) Neural Networks Deep Neural Networks Deep Feedforward Neural Networks Recurrent Neural Networks 4 / 62
  5. 5. Conventional Methods vs Deep Learning 5 / 62
  6. 6. Deep Learning = Learning of Representations (Features). The traditional model of pattern recognition (since the late 50's): fixed/engineered features + trainable classifier (Hand-crafted Feature Extractor → Trainable Classifier). End-to-end learning / Feature learning / Deep learning: trainable features + trainable classifier (Trainable Feature Extractor → Trainable Classifier). 6 / 62
  7. 7. ImageNet. Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012.
Model | # of parameters | Accuracy, %
Deep Net | 10M | 15.8
best state-of-the-art | N/A | 9.3
Training data: 16M images, 20K categories. 7 / 62
  8. 8. Deep Face (Facebook). Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.
Model | # of parameters | Accuracy, %
Deep Face Net | 128M | 97.35
Human level | N/A | 97.5
Training data: 4M facial images. 8 / 62
  9. 9. TIMIT Phoneme Recognition. Graves, A., Mohamed, A.-R., and Hinton, G. E. (2013). Speech recognition with deep recurrent neural networks // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6645–6649. IEEE. Mohamed, A. and Hinton, G. E. (2010). Phone recognition using restricted Boltzmann machines // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4354–4357.
Model | # of parameters | Error
Hidden Markov Model, HMM | N/A | 27.3%
Deep Belief Network, DBN | ~4M | 26.7%
Deep RNN | 4.3M | 17.7%
Training data: 462 speakers train / 24 speakers test, 3.16 / 0.14 hrs. 9 / 62
  10. 10. Google Large Vocabulary Speech Recognition. H. Sak, A. Senior, F. Beaufays. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling // INTERSPEECH’2014. K. Vesely, A. Ghoshal, L. Burget, D. Povey. Sequence-discriminative training of deep neural networks // INTERSPEECH’2014.
Model | # of parameters | Cross-entropy
ReLU DNN | 85M | 11.3
Deep Projection LSTM RNN | 13M | 10.7
Training data: 3M utterances (1900 hrs). 10 / 62
  11. 11. Classic Feedforward Neural Networks (before 2006). • Single hidden layer (Kolmogorov-Cybenko Universal Approximation Theorem as the main hope). • Vanishing gradients effect prevents using more layers. • Less than 10K free parameters. • Feature preprocessing stage is often critical. 11 / 62
  12. 12. Training the traditional (shallow) Neural Network: derivative + optimization 12 / 62
  13. 13. 1) forward propagation pass: $z_j = f\left(\sum_i w_{ji}^{(1)} x_i\right)$, $\tilde{y}(k+1) = g\left(\sum_j w_j^{(2)} z_j\right)$, where $z_j$ is the postsynaptic value for the j-th hidden neuron, $w^{(1)}$ are the hidden layer’s weights, $f(\cdot)$ are the hidden layer’s activation functions, $w^{(2)}$ are the output layer’s weights, and $g(\cdot)$ are the output layer’s activation functions. 13 / 62
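For concreteness, here is a minimal NumPy sketch of this forward pass for a single hidden layer; the weight shapes, the tanh hidden activation and the linear output are illustrative assumptions, not the presenters' implementation:

```python
import numpy as np

def forward(x, W1, W2, f=np.tanh, g=lambda s: s):
    """z_j = f(sum_i w^(1)_ji x_i),  y~ = g(sum_j w^(2)_j z_j)."""
    z = f(W1 @ x)     # postsynaptic values of the hidden neurons
    y = g(W2 @ z)     # network output
    return z, y

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input vector
W1 = rng.normal(size=(3, 4))      # hidden layer weights w^(1)
W2 = rng.normal(size=3)           # output layer weights w^(2)
z, y = forward(x, W1, W2)
```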
  14. 14. 2) backpropagation pass. Local gradients calculation: $\delta^{OUT} = t(k+1) - \tilde{y}(k+1)$, $\delta_j^{HID} = f'(z_j)\, w_j^{(2)}\, \delta^{OUT}$. Derivatives calculation: $\frac{\partial E(k)}{\partial w_j^{(2)}} = \delta^{OUT} z_j$, $\frac{\partial E(k)}{\partial w_{ji}^{(1)}} = \delta_j^{HID} x_i$. 14 / 62
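And the corresponding backward pass, assuming a squared error E = 0.5 (t - y)^2 with the tanh hidden layer and linear output from the previous sketch (again an illustrative sketch, not the presenters' code):

```python
import numpy as np

def backward(x, t, W1, W2):
    """Gradients of E = 0.5*(t - y)^2 for a tanh-hidden, linear-output network."""
    z = np.tanh(W1 @ x)
    y = W2 @ z
    delta_out = t - y                          # local gradient of the output neuron
    delta_hid = (1.0 - z**2) * W2 * delta_out  # f'(z_j) * w^(2)_j * delta_out
    dE_dW2 = -delta_out * z                    # dE/dw^(2)_j (minus sign from dE/dy)
    dE_dW1 = -np.outer(delta_hid, x)           # dE/dw^(1)_ji
    return dE_dW1, dE_dW2
```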
  15. 15. Bad effect of vanishing (exploding) gradients: a problem. $\frac{\partial E(k)}{\partial w_{ji}^{(m)}} = \delta_j^{(m)} z_i^{(m-1)}$, $\delta_j^{(m)} = f' \cdot \sum_i w_{ij}^{(m+1)} \delta_i^{(m+1)}$, hence $\frac{\partial E(k)}{\partial w_{ji}^{(m)}} \to 0$ for $m \to 1$. 15 / 62
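A quick way to see the effect numerically is the illustrative sketch below: 20 random sigmoid layers with small Gaussian weights, where the norm of the backpropagated local gradient typically shrinks by orders of magnitude on the way from the output back to the input (all sizes and scales here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 100, 20
Ws = [rng.normal(scale=1.0 / np.sqrt(n), size=(n, n)) for _ in range(depth)]

# forward pass through `depth` sigmoid layers
acts = [rng.normal(size=n)]
for W in Ws:
    acts.append(1.0 / (1.0 + np.exp(-W @ acts[-1])))

# backpropagate a unit error and record the local-gradient norm per layer
delta = np.ones(n)
norms = []
for W, a in zip(reversed(Ws), reversed(acts[1:])):
    delta = delta * a * (1.0 - a)   # multiply by f'() of this layer
    norms.append(np.linalg.norm(delta))
    delta = W.T @ delta             # pass the error to the previous layer

print(norms)  # decays from the output layer towards the input layer
```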
  16. 16. Bad effect of vanishing (exploding) gradients: two hypotheses. 1) increased frequency and severity of bad local minima; 2) pathological curvature, like the type seen in the well-known Rosenbrock function: $f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2$. 16 / 62
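To see the second hypothesis in action, plain gradient descent on the Rosenbrock function crawls along its curved, flat-bottomed valley; a small sketch (step size, start point and iteration count are arbitrary choices):

```python
import numpy as np

def rosenbrock(x, y):
    return (1 - x)**2 + 100.0 * (y - x**2)**2

def grad(x, y):
    return np.array([-2 * (1 - x) - 400.0 * x * (y - x**2),
                     200.0 * (y - x**2)])

p = np.array([0.0, 0.0])
for _ in range(20000):
    p -= 1e-3 * grad(*p)          # a small step size is needed for stability

print(p, rosenbrock(*p))          # converges only slowly toward the minimum at (1, 1)
```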
  17. 17. Deep Feedforward Neural Networks • 2-stage training process: i) unsupervised pre-training; ii) fine tuning (vanishing gradients problem is beaten!). • Number of hidden layers > 1 (usually 6-9). • 100K – 100M free parameters. • No (or less) feature preprocessing stage. 17 / 62
  18. 18. Sparse Autoencoders 18 / 62
  19. 19. Dimensionality reduction • Use a stacked RBM as deep auto-encoder 1. Train RBM with images as input & output 2. Limit one layer to few dimensions ⇒ Information has to pass through middle layer. G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507. 19 / 62
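As a rough idea of what pre-training one such layer involves, here is a minimal sketch of a binary RBM trained with one-step contrastive divergence (CD-1). Biases and many practical details are omitted, the data are random stand-ins, and this is not the code behind the cited paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=30, lr=0.1, epochs=10, seed=0):
    """CD-1 for a binary RBM (no biases): data is an (n_samples, n_visible)
    array with values in [0, 1]; returns the learned weight matrix."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.normal(size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W)                              # hidden probabilities
        h0_s = (rng.random(h0.shape) < h0).astype(float)  # sampled hidden states
        v1 = sigmoid(h0_s @ W.T)                          # reconstruction
        h1 = sigmoid(v1 @ W)
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)     # CD-1 weight update
    return W

# toy usage: 200 random binary 'images' of 25x25 = 625 pixels, 30 hidden units
data = (np.random.default_rng(1).random((200, 625)) > 0.5).astype(float)
W = train_rbm(data, n_hidden=30)
```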
  20. 20. Dimensionality reduction: Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 → 30). Compared: Original / Deep RBN / PCA. G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507. 20 / 62
  21. 21. How to use unsupervised pre-training stage / 1 21 / 62
  22. 22. How to use unsupervised pre-training stage / 2 22 / 62
  23. 23. How to use unsupervised pre-training stage / 3 23 / 62
  24. 24. How to use unsupervised pre-training stage / 4 24 / 62
  25. 25. Unlabeled data Unlabeled data is readily available Example: Images from the web 1. Download 10’000’000 images 2. Train a 9-layer DNN 3. Concepts are formed by DNN G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507. 25 / 62
  26. 26. Dimensionality reduction PCA Deep RBN 804’414 Reuters news stories, reduction to 2 dimensions G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507. 26 / 62
  27. 27. Hierarchy of trained representations: Low-level feature, Middle-level feature, Top-level feature. Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]. 27 / 62
  28. 28. Hessian-Free optimization: Deep Learning with no pre-training stage J. Martens. Deep Learning via Hessian-free Optimization // Proceedings of the 27th International Conference on Machine Learning (ICML), 2010. 28 / 62
  29. 29. FLOPS comparison (https://ru.wikipedia.org/wiki/FLOPS)
Type | Name | Flops | Cost
Mobile | Raspberry Pi 1st Gen, 700 MHz | 0.04 Gflops | $35
Mobile | Apple A8 | 1.4 Gflops | $700 (in iPhone 6)
CPU | Intel Core i7-4930K (Ivy Bridge), 3.7 GHz | 140 Gflops | $700
CPU | Intel Core i7-5960X (Haswell), 3.0 GHz | 350 Gflops | $1300
GPU | NVidia GTX 980 | 4612 Gflops (single precision), 144 Gflops (double precision) | $600 + cost of PC (~$1000)
GPU | NVidia Tesla K80 | 8740 Gflops (single precision), 2910 Gflops (double precision) | $4500 + cost of PC (~$1500)
29 / 62
  30. 30. Deep Networks Training time using GPU • Pretraining – from 2-3 weeks to 2-3 months. • Fine-tuning (final supervised training) – from 1 day to 1 week. 30 / 62
  31. 31. Tools for training Deep Neural Networks. D. Kruchinin, E. Dolotov, K. Kornyakov, V. Kustikova, P. Druzhkov. The Comparison of Deep Learning Libraries on the Problem of Handwritten Digit Classification // Analysis of Images, Social Networks and Texts (AIST), 2015, April, 9-11th, Yekaterinburg. 31 / 62
  32. 32. Convolutional Neural Networks: Return of the Jedi. Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 32 / 62
  33. 33. AlexNet, CNN-Mega-HiT, results on ILSVRC-2012. A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks // Advances in Neural Information Processing Systems 25 (NIPS 2012). 33 / 62
  34. 34. Lazy Deep Learning: idea. A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson. CNN Features off-the-shelf: an Astounding Baseline for Recognition // 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 23-28 June 2014, Columbus, USA, p. 512 – 519. 34 / 62
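The whole "lazy" recipe fits in a few lines: take the activations of one of the last layers of a pretrained CNN as a fixed feature vector and train a simple classifier on top. Below is a sketch with scikit-learn; extract_cnn_features is a hypothetical stand-in that returns random vectors so the snippet runs, whereas the real pipeline would return, for example, late-layer AlexNet activations:

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

def extract_cnn_features(images, dim=4096):
    """Stand-in for the off-the-shelf CNN feature extractor (e.g. a late
    AlexNet layer); random vectors are returned so the sketch is runnable."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(images), dim))

# dummy image lists and labels just to exercise the pipeline
train_images, y_train = [None] * 100, np.repeat([0, 1], 50)
test_images, y_test = [None] * 20, np.repeat([0, 1], 10)

X_train = normalize(extract_cnn_features(train_images))   # L2-normalised CNN codes
clf = LinearSVC(C=1.0).fit(X_train, y_train)               # cheap classifier on top

X_test = normalize(extract_cnn_features(test_images))
print("accuracy:", clf.score(X_test, y_test))
```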
  35. 35. Lazy Deep Learning: benchmark results. A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson. CNN Features off-the-shelf: an Astounding Baseline for Recognition // 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 23-28 June 2014, Columbus, USA, p. 512 – 519. 35 / 62
  36. 36. MIT-8 toy problem: formulation • 8 classes • 2688 images in total • TRAIN: 2000 images (250 per class) • TEST: 688 images, 86 per class. S. Banerji, A. Verma, C. Liu. Novel Color LBP Descriptors for Scene and Image Texture Classification // Cross Disciplinary Biometric Systems, 2012, 15th International Conference on Image Processing, Computer Vision, and Pattern Recognition, Las Vegas, Nevada, pp. 205-225. 36 / 62
  37. 37. MIT-8 toy problem: results
# | Method | Acc. TRAIN | Acc. TEST
1 | LBP + SVM with RBF Kernel | 27.2% | 19.0%
2 | LPQ + SVM with RBF kernel | 38.4% | 30.5%
3 | LBP + SVM with χ2 kernel | 94.2% | 74.0%
4 | LPQ + SVM with χ2 kernel | 99.1% | 82.2%
5 | Deep CNN (AlexNet) + SVM RBF kernel (LAZY DL) | 95.1% | 91.8%
6 | Deep CNN (AlexNet) + SVM with χ2 Kernel (LAZY DL) | 100.0% | 93.2%
7 | Deep CNN (AlexNet) + MLP (LAZY DL) | 100.0% | 92.3%
Original results, to be published. 37 / 62
  38. 38. ZZ Photo – photo organizer Trial version is available at http://zzphoto.me 38 / 62
  39. 39. Viola-Jones Object Detector • Very popular for Human Face Detection. • May be trained for Cat and Dog Face detection. • Available free in OpenCV library (http://opencv.org). O. Parkhi, A. Vedaldi, C. V. Jawahar, and A. Zisserman. The Truth about Cats and Dogs // Proceedings of the International Conference on Computer Vision (ICCV), 2011. J. Liu, A. Kanazawa, D. Jacobs, P. Belhumeur. Dog Breed Classification Using Part Localization // Lecture Notes in Computer Science Volume 7572, 2012, pp 172-185. 39 / 62
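For reference, invoking the Viola-Jones baseline through OpenCV's Python bindings looks roughly like the sketch below (assuming the opencv-python package, which exposes its bundled cascades via cv2.data.haarcascades; the image path is a placeholder, and a cascade trained for cat or dog faces would be loaded the same way):

```python
import cv2
import numpy as np

# Load a pretrained Haar cascade bundled with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")                 # placeholder path
if img is None:                               # fall back to a blank image so the sketch runs
    img = np.zeros((480, 640, 3), np.uint8)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale slides the boosted cascade over an image pyramid (next slide).
for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1,
                                             minNeighbors=5, minSize=(40, 40)):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```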
  40. 40. Image pyramid for Viola-Jones 40 / 62
  41. 41. Viola-Jones Object Detector Classifier Structure P. Viola, M. Jones. Rapid object detection using a boosted cascade of simple features // Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001. 41 / 62
  42. 42. AlexNet design. A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks // Advances in Neural Information Processing Systems 25 (NIPS 2012). 42 / 62
  43. 43. Pets detection problem (Kaggle Dataset + random Other images) • Kaggle Dataset + random “other” images; • 2 classes (cats & dogs VS other); • TRAIN: 5,000 samples; • TEST: 12,000 samples. 43 / 62
  44. 44. Pets detection results: FAR vs FRR graphs Original results, to be published. 44 / 62
  45. 45. Pet detection results: ROC curve. Original results, to be published. 45 / 62
  46. 46. Pets detection results, FAR error is fixed to 0.5%
# | Method | FRR Error
1 | Viola-Jones Face Detector for Cats & Dogs + LBP + SVM | 79.73%
2 | AlexNet, argmax (STANDARD DL, ImageNet-2012, 1000) | 32.05%
3 | AlexNet, sum (STANDARD DL, ImageNet-2012, 1000) | 26.11%
4 | AlexNet + SVM linear (LAZY DL) | 4.35%
Original results, to be published. 46 / 62
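For clarity, FRR-at-fixed-FAR numbers like the ones above can be computed from raw detector scores as in the sketch below (an assumption here is that a higher score means "more pet-like"; the scores are synthetic):

```python
import numpy as np

def frr_at_fixed_far(pos_scores, neg_scores, far=0.005):
    """Pick the threshold at which `far` of the negatives are falsely accepted
    and report the fraction of positives rejected (FRR) at that threshold."""
    thr = np.quantile(neg_scores, 1.0 - far)   # only ~far of negatives exceed thr
    frr = np.mean(pos_scores <= thr)
    return frr, thr

rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, size=1000)   # scores of cat/dog images
neg = rng.normal(loc=0.0, size=1000)   # scores of 'other' images
print(frr_at_fixed_far(pos, neg, far=0.005))
```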
  47. 47. Development of AlexNet on OpenCV. VGG MatConvNet: CNNs for MATLAB http://www.vlfeat.org/matconvnet/ mexopencv: MATLAB-OpenCV interface http://kyamagu.github.io/mexopencv/matlab Pipeline: MatConvNet (MATLAB + CUDA) → YAML, BIN → OpenCV app (C++). 47 / 62
  48. 48. Convolution Layer. Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 48 / 62
  49. 49. Pooling layer. Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 49 / 62
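A pooling layer just reports a summary statistic (here the maximum) of each small window of a feature map; a minimal NumPy sketch of 2x2 max pooling with stride 2 on one single-channel map:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2; height and width assumed even."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))   # each output entry is the max of one 2x2 block
```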
  50. 50. Activation functions. Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook ReLU activation function: $f(x) = \max(0, x)$, $f'(x) = 1$ for $x \ge 0$ and $f'(x) = 0$ for $x < 0$. 50 / 62
  51. 51. Implementation tricks: im2col K. Chellapilla, S. Puri, P. Simard. High Performance Convolutional Neural Networks for Document Processing // International Workshop on Frontiers in Handwriting Recognition, 2006. 51 / 62
  52. 52. Implementation tricks: im2col for convolution K. Chellapilla, S. Puri, P. Simard. High Performance Convolutional Neural Networks for Document Processing // International Workshop on Frontiers in Handwriting Recognition, 2006. 52 / 62
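The trick: unroll every receptive field of the image into a column ("im2col"), after which the whole convolutional layer becomes one large matrix multiplication that a BLAS can execute efficiently. Below is a single-channel, no-padding, stride-1 NumPy sketch; real layers also handle channels, padding and strides:

```python
import numpy as np

def im2col(img, k):
    """Unroll every k x k patch of a 2-D image into a column."""
    h, w = img.shape
    cols = [img[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.stack(cols, axis=1)                     # (k*k, n_patches)

def conv_via_gemm(img, filters):
    """Apply `filters` of shape (n_filters, k, k) with one matrix product."""
    n_f, k, _ = filters.shape
    out = filters.reshape(n_f, -1) @ im2col(img, k)   # the single big GEMM
    oh, ow = img.shape[0] - k + 1, img.shape[1] - k + 1
    return out.reshape(n_f, oh, ow)

img = np.random.default_rng(0).normal(size=(8, 8))
filters = np.random.default_rng(1).normal(size=(4, 3, 3))
print(conv_via_gemm(img, filters).shape)              # (4, 6, 6)
```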
  53. 53. Matrix multiplication
Matrices’ size | C | OpenCV | C++ (use STL vector class) | OpenBLAS | Matlab
1000×1000 | 1.45 | 1.76 | 1.47 | 0.062 | 0.062
2000×2000 | 11.64 | 14.2 | 11.23 | 0.99 | 0.54
3000×3000 | 38.11 | 47.2 | 37.99 | 1.75 | 1.7
4000×4000 | 90.84 | 110.37 | 90.2 | 7.91 | 4.2
5000×5000 | 180.74 | 213.4 | 181.02 | 10.8 | 7.3
6000×6000 | 315.46 | 376.46 | 316.3 | 25.33 | 12.74
https://4fire.wordpress.com/2012/04/29/matrices-multiplication-on-windows-matlab-is-the-champion-again/ 53 / 62
  54. 54. OpenBLAS • OpenBLAS is an open source implementation of the BLAS (Basic Linear Algebra Subprograms) API with many hand- crafted optimizations for specific processor types. http://www.openblas.net/ 54 / 62
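The same point can be checked from Python: NumPy hands its matrix products to whatever BLAS it was built against (often OpenBLAS or MKL), so even a naive script gets BLAS-level speed. The timings obviously depend on the machine and the installed BLAS; this does not reproduce the table above, it only illustrates the effect:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
for n in (1000, 2000, 3000):
    a = rng.normal(size=(n, n))
    b = rng.normal(size=(n, n))
    t0 = time.perf_counter()
    a @ b                                    # dispatched to the underlying BLAS
    print(f"{n}x{n}: {time.perf_counter() - t0:.3f} s")
```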
  55. 55. Sizes of layers, MB: Layers 1-4: 0.09; Layers 5-8: 1.56; Layers 9-10: 2.25; Layers 11-12: 2.25; Layers 13-15: 2.25; Layers 16-17: 144.02; Layers 18-19: 64.02; Layers 20-21: 15.63 (layers 1-15: ~ 8.5 MB in total, layers 16-21: ~ 223 MB). 55 / 62
  56. 56. Pets test #2: data 1 mini-set: - 500 cats - 500 dogs - 1000 negatives 56 / 62
  57. 57. Pets test #2: results. FRR (%) as a function of training set size (100, 200, 500, 1000, 2000, 5000, 10000, 18000) for features taken from layer 15, layer 17 and layer 19. 57 / 62
  58. 58. Pets test #2: results - FRR, % (FAR is fixed to 0.5%)
Train size | Layer 15 | Layer 16 | Layer 19
100 | 30.08 | 12.61 | 12.94
500 | 17.91 | 10.41 | 10.72
1000 | 11.59 | 7.52 | 6.80
5000 | 7.41 | 3.88 | 4.13
10000 | 6.29 | 3.66 | 2.71
18000 | 5.16 | 2.64 | 2.54
58 / 62
  59. 59. Calculation speed: per-layer computation time in ms for layers 1-20 (~ 73 ms, ~ 60 ms). 59 / 62
  60. 60. Labeled Faces in the Wild (LFW) Dataset G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments // University of Massachusetts, Amherst, Technical Report 07-49, October, 2007 • more than 13,000 images of faces collected from the web. • Pairs comparison, restricted mode. • test: 10-fold cross- validation, 6000 face pairs. 60 / 62
  61. 61. Face Recognition on LFW, results. Y. Taigman, M. Yang, M. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, 2014, CVPR.
# | Method | Accuracy, %
1 | Principal Component Analysis (EigenFaces) | 60.2%
2 | Local Binary Pattern Histograms (LBP) | 72.4%
3 | Deep CNN (AlexNet) + Euclid (LAZY DL) | 71.0%
4 | DeepFace by Facebook (STANDARD DL) | 97.25%
61 / 62
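The "Deep CNN (AlexNet) + Euclid (LAZY DL)" row corresponds to the simplest possible verification rule: declare two faces the same person when the Euclidean distance between their CNN feature vectors falls below a threshold. A sketch with synthetic feature vectors is below; in the LFW restricted protocol the threshold would be chosen on the training folds of the 10-fold split:

```python
import numpy as np

def pair_accuracy(feat_a, feat_b, same, threshold):
    """Verification accuracy when pairs with distance < threshold are
    declared 'same person'."""
    dists = np.linalg.norm(feat_a - feat_b, axis=1)
    return np.mean((dists < threshold) == same)

rng = np.random.default_rng(0)
feat_a = rng.normal(size=(200, 4096))                     # stand-ins for CNN features
feat_b = feat_a + rng.normal(scale=0.5, size=(200, 4096))
same = rng.integers(0, 2, size=200).astype(bool)          # ground-truth pair labels

# choose the threshold with the best accuracy on a 'training' half of the pairs
thresholds = np.linspace(0.0, 100.0, 101)
best = max(thresholds,
           key=lambda t: pair_accuracy(feat_a[:100], feat_b[:100], same[:100], t))
print(pair_accuracy(feat_a[100:], feat_b[100:], same[100:], best))
```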
  62. 62. contact: a.chernodub@gmail.com george.pashchenko@gmail.com Thanks!
