KaoNet: Face Recognition and Generation
App using Deep Learning
Van Phu Quang Huy
Pham Quang Khang
Demo at WSDL 2017. We introduced KaoNet - a Face Recognition and Generation App using Deep Learning.

Published in: Technology
About Us
Van Phu Quang Huy
● AI Lead Engineer at Galapagos Inc
Pham Quang Khang
● Software Engineer at Works Applications
Objectives
● What do we want to do?
To introduce the whole process of creating an application based on Deep Learning
● What will be included:
○ Convolutional Neural Networks (CNNs)
○ Generative Adversarial Networks (GANs)
○ TensorFlow
Part 1: Face Recognition
First things first: the idea
● Facial recognition is a promising yet challenging research field with enormous applications:
○ Biometric security systems
○ Monitoring and people search
○ Daily applications
● All the tools needed to develop a facial recognition app are already provided by many companies => why not build a face recognition app?
The name: KaoNet
KaoNet = 顔 (Kao) + Net
It is the Network of Faces
What can the app do?
● Classify the input data into groups of faces of the same person
● Generate faces from the input such that the generated faces look as human as possible
Whose faces?
● To train a neural network, the amount of sample data must be very large. Who would have that many photos to share? => famous people
● Who would attract the most interest? => singers, models, actresses
Where to find those photos?
● Online: the internet is an infinite source of all kinds of information, so the more famous a person is, the higher the probability that his/her photos can be found with simple keywords
● The search engine we chose: Bing, because its API for crawling photos from search results is still free
How many photos?
● At first, a list of more than 50 popular people was chosen as the targets of the app; we expected at least around 1K samples for each
● Crawling: data was collected with a few simple fixed keywords searched on Bing, and the results were saved to a local server
● Result: around 1K photos per person were collected, but after removing wrong results, only around 200 correct samples per person were kept for KaoNet
But we only care about the face
● The whole photo is not a good sample since there is too much noise in the background
● Solution: cut the face out with OpenCV
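The cropping step can be sketched as below. This is a minimal NumPy sketch, not the authors' code: it assumes a bounding box has already been produced by a face detector (e.g. OpenCV's Haar cascade `detectMultiScale`, which returns `(x, y, w, h)` boxes), and only shows the array slicing that extracts the face region.

```python
import numpy as np

def crop_face(image, box, margin=0.0):
    """Crop a face region from an H x W x C image array.

    box: (x, y, w, h) as returned by a face detector such as
    OpenCV's detectMultiScale. An optional margin expands the box
    by a fraction of its size on every side, clamped to the image.
    """
    x, y, w, h = box
    mx, my = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - mx), max(0, y - my)
    x1 = min(image.shape[1], x + w + mx)
    y1 = min(image.shape[0], y + h + my)
    return image[y0:y1, x0:x1]

# Hypothetical example: a blank 200x300 RGB photo and a detected box
photo = np.zeros((200, 300, 3), dtype=np.uint8)
face = crop_face(photo, (50, 40, 100, 120), margin=0.1)
print(face.shape)  # (144, 120, 3)
```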
Finally
A Net of Kao (faces)
Result: we only had enough time to filter 26 targets
Finished?
● That is only the beginning. Now the hard part: training
● Models:
○ CNN: trained to classify samples
○ GAN: to generate a face from samples
● Framework: TensorFlow, because it has strong support for CNNs with real-time observation of the training process, and one codebase runs on both CPU and GPU
(Figure: training progress in real time, plotted against steps)
Neural Networks
(Figure: example image of a cat)
Convolutional Neural Network: convolution layer
Idea: extract the elementary features of an image using local receptive fields instead of training on all points of the original image (Yann LeCun, 1998)
(Figure: Fei-Fei Li, Stanford 2016)
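The local-receptive-field idea can be sketched in a few lines of NumPy (an illustrative sketch, not KaoNet's code): each output value is the dot product of one small weight kernel with one patch of the image, and the same weights slide over the whole image.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution of a single-channel image (implemented
    as cross-correlation, as deep learning frameworks do). Each
    output value is the dot product of the kernel with one local
    receptive field; the weights are shared across the image."""
    kh, kw = kernel.shape
    h = (image.shape[0] - kh) // stride + 1
    w = (image.shape[1] - kw) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 3x3 vertical-edge kernel applied to a toy 5x5 image
img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
print(conv2d(img, edge).shape)  # (3, 3)
```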
Pooling layer (sub-sampling)
Local averaging and sub-sampling, reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions (Yann LeCun, 1998)
(Figure: Fei-Fei Li, Stanford 2016)
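As a sketch of what a pooling layer computes (again NumPy, not the actual KaoNet code): each output value summarizes one local window, halving the resolution when the window and stride are both 2, as in KaoNet's pooling layers.

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Sub-sample a feature map by taking the max (or average) over
    local windows; size=stride=2 halves the resolution."""
    h = (fmap.shape[0] - size) // stride + 1
    w = (fmap.shape[1] - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = reduce_fn(
                fmap[i*stride:i*stride+size, j*stride:j*stride+size])
    return out

fmap = np.array([[ 1,  2,  3,  4],
                 [ 5,  6,  7,  8],
                 [ 9, 10, 11, 12],
                 [13, 14, 15, 16]], dtype=float)
print(pool2d(fmap))              # [[ 6.  8.] [14. 16.]]
print(pool2d(fmap, mode="avg"))  # [[ 3.5  5.5] [11.5 13.5]]
```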
CNN architecture in KaoNet
● Formula of convolutional layers: Convolution + Batch Normalization + ReLU + Max Pooling
● Architecture of KaoNet: 4 convolutional layers (conv) + 2 fully connected layers (fc)

layer   | size-in     | size-out    | kernel
conv1   | 128⨉128⨉3   | 128⨉128⨉32  | 7⨉7, 1
pool1   | 128⨉128⨉32  | 64⨉64⨉32    | 2⨉2, 2
conv2   | 64⨉64⨉32    | 64⨉64⨉64    | 5⨉5, 1
pool2   | 64⨉64⨉64    | 32⨉32⨉64    | 2⨉2, 2
conv3   | 32⨉32⨉64    | 32⨉32⨉128   | 3⨉3, 1
pool3   | 32⨉32⨉128   | 16⨉16⨉128   | 2⨉2, 2
conv4   | 16⨉16⨉128   | 16⨉16⨉192   | 3⨉3, 1
pool4   | 16⨉16⨉192   | 8⨉8⨉192     | 2⨉2, 2
reshape | 8⨉8⨉192     | 1⨉12288     |
fc1     | 1⨉12288     | 1⨉1024      |
fc2     | 1⨉1024      | 1⨉512       |
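The size-in/size-out column of the table can be traced with a small shape calculator. This sketch assumes 'same' padding for the stride-1 convolutions (which is what keeps H⨉W unchanged in the table) and 2⨉2 max pooling with stride 2:

```python
def kaonet_shapes():
    """Trace tensor shapes through KaoNet's conv stack: four
    conv+pool pairs from 128x128x3 down to 8x8x192, then a reshape
    to a flat feature vector before fc1."""
    h, w, c = 128, 128, 3
    shapes = []
    for out_c in (32, 64, 128, 192):
        c = out_c               # conv with 'same' padding: H, W unchanged
        shapes.append(("conv", (h, w, c)))
        h, w = h // 2, w // 2   # 2x2 max pool, stride 2: halve H and W
        shapes.append(("pool", (h, w, c)))
    shapes.append(("reshape", (1, h * w * c)))
    return shapes

for name, shape in kaonet_shapes():
    print(name, shape)
# the final reshape yields 8*8*192 = 12288 features, as in the table
```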
CNN architecture in KaoNet
(Figure: network diagram)
Hyper-parameters in KaoNet
● Number of layers: 4 conv, 2 fc
● Size and number of filters in each convolutional layer (previous slide)
● Size of the fully connected layers (previous slide)
● Weight decay (for fc only, weight decay = 0.004)
● Optimization algorithm (AdamOptimizer)
● Initial learning rate (0.004)
● Initial weights (normal distribution with mean = 0, stddev = 5e-4)
Data partition
● Data is separated into 2 parts, train data and validation data, with a ratio of 80:20
● Each epoch, the training result is applied to the validation data to evaluate the loss and prediction accuracy
● Each training step, a batch of batch_size (KaoNet: 64) samples is loaded for training; batches are drawn randomly from the training set
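The 80:20 split and random batch loading described above can be sketched as follows (an illustrative NumPy sketch, not the project's TensorFlow input pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_val_split(samples, val_ratio=0.2):
    """Shuffle the dataset and split it into train and validation
    parts with the given ratio (80:20 by default, as on the slide)."""
    idx = rng.permutation(len(samples))
    n_val = int(len(samples) * val_ratio)
    train = [samples[i] for i in idx[n_val:]]
    val = [samples[i] for i in idx[:n_val]]
    return train, val

def random_batch(train, batch_size=64):
    """Draw one random training batch (KaoNet uses batch_size=64)."""
    idx = rng.integers(0, len(train), size=batch_size)
    return [train[i] for i in idx]

data = list(range(1000))          # stand-in for 1000 face images
train, val = train_val_split(data)
print(len(train), len(val))       # 800 200
print(len(random_batch(train)))   # 64
```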
Source code
● The TensorFlow tutorials for CIFAR-10 and MNIST are good samples:
https://www.tensorflow.org/tutorials/deep_cnn
https://www.tensorflow.org/tutorials/layers
● Our source code (not public yet):
https://github.com/vanhuyz/KaoNet
Let's run
● Training with 26 targets resulted in fair accuracy on the training set but extremely poor accuracy on the validation set => overfitting
(Figure: loss and accuracy vs. steps, for the train set and the validation set)
Why did it fail?
Causes:
○ The model is too complex compared to the number of samples in each training set
○ The number of samples per target varied too much; some targets had several times more samples than others
Solutions:
● Simplify the model => not a good choice if the application is to be extended
● Increase the number of samples => not enough time
● Only train on targets with a sufficient number of samples => worth trying
The Ultimate 2
● One way to fix the problem is to use a sample set that is fairly balanced and has more data
● The Ultimate 2: 10K photos for each target
Validation accuracy is highly improved
● Loss drops close to zero after 10K training steps
● Train accuracy reached 100% before 5K steps
● Validation accuracy improved greatly compared to the previous data set
(Figure: loss and accuracy vs. steps)
Training Environment
● Use all the resources we could:
○ MacBook Pro (CPU)
○ Dell Vostro desktop (CPU)
○ AWS GPU instance g2.8xlarge ($2.7/h) → cost us about $100 in total
○ GeForce GTX 1080 (GPU) → thanks to Galapagos Inc for the support!
Demo
Embedding Visualization
● Present the vector of the last fully connected layer for each input
● Each image is represented by a 512-dimensional vector
● The high-dimensional vectors are compressed into 3-dimensional vectors using PCA for visualization
→ Let's check it on TensorBoard
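The PCA reduction from 512 to 3 dimensions can be sketched with an SVD (a minimal NumPy sketch of the same kind of projection TensorBoard's embedding projector applies, not the tool's own code):

```python
import numpy as np

def pca_project(embeddings, n_components=3):
    """Project high-dimensional embeddings (e.g. KaoNet's 512-d fc2
    vectors) onto their top principal components via SVD."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical batch of 100 face embeddings, 512-d each
emb = np.random.default_rng(0).normal(size=(100, 512))
points3d = pca_project(emb)
print(points3d.shape)  # (100, 3)
```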
Future of KaoNet
● Biometric security: use face recognition to replace physical locks
● Face search
● Criminal hunting using CCTV
Part 2: Face Generation
Generative Model [1]
● Explicitly or implicitly model the distribution of the data
● By sampling from that model, it is possible to generate synthetic data points in the data space
[1] C. Bishop, 2006. Pattern Recognition and Machine Learning, p. 43
Generative Adversarial Networks (GAN)
What are some recent and potentially upcoming breakthroughs in deep learning? (from Quora, 2016)
"The most important one, in my opinion, is adversarial training (also called GAN for Generative Adversarial Networks)... This, and the variations that are now being proposed, is the most interesting idea in the last 10 years in ML, in my opinion."
- Yann LeCun, Director of AI Research at Facebook
(https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-deep-learning)
GAN [1]
● Based on a game-theoretic scenario in which the generator network must compete against an adversary [2]
○ The generator network directly produces "fake" samples
○ The discriminator network attempts to distinguish between samples drawn from the training data and samples drawn from the generator
● Train the 2 networks simultaneously
○ The discriminator learns to correctly classify samples as real or fake
○ The generator learns to fool the discriminator into believing its samples are real
● At convergence, the generator's samples are indistinguishable from real data, and the discriminator outputs ½ everywhere
[1] Goodfellow, 2014
[2] Goodfellow et al., 2016. Deep Learning, p. 702
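The competition described above is the two-player minimax game from Goodfellow (2014), where $D(x)$ is the discriminator's probability that $x$ is real and $G(z)$ is a sample generated from noise $z$:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

For a fixed generator, the optimal discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$, which equals ½ everywhere once $p_g = p_{\text{data}}$, matching the convergence statement on the slide.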
GAN in easy words...
● A criminal tries to print fake money
● A police officer attempts to distinguish fake money from real money
● At first, with outdated technology, the criminal just prints some "random papers", so the police officer can easily detect the fake money
● The criminal learns from that, then improves his technique
GAN in easy words...
● As the fake money becomes more and more realistic, the police officer also has to improve his detection skills
● As a result, the criminal and the police officer learn from each other, and continuously improve themselves
● Finally, when the fake money looks so realistic that the police can no longer distinguish it, the world is over!
In the GAN world
● The criminal is called the Generator
● The police officer is called the Discriminator
● The Generator and Discriminator are usually neural networks (but not necessarily)
● GAN's problems:
○ Unstable training
○ Non-convergence
GAN's application: Image-to-Image Translation
(Figure: Isola et al., 2016)
Deep Convolutional Generative Adversarial Networks (DCGAN) [1]
● Both Generator and Discriminator are deep convolutional neural networks
● Apply some techniques for stable training:
○ Replace pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator)
○ Use batch normalization
○ Remove fully connected hidden layers
○ Use LeakyReLU activation in the discriminator for all layers
○ ...
[1] Radford et al., 2015
Generator Network in DCGAN
(Figure: Radford et al., 2015)
Experiment: train DCGAN on our celebrity dataset
(Figures: generated samples at Step 0, Step 1000, and Step 40000)
Experiment: train DCGAN on the Ultimate 2 dataset
(Figures: generated samples at Step 0, Step 1000, and Step 15000)
Conclusion
● We have introduced, step by step, the development of an application based on Deep Learning
● Succeeded in creating a face classification app based on a CNN
● Achieved 98% accuracy on the validation set and good results on test data
● Successfully generated images using DCGAN