Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Deep Learning for Computer Vision - ExecutiveML

Slides from my talk about deep learning & computer vision at the Executive-ML conference in Johannesburg on 2017/09/21 :)

  • Be the first to comment

Deep Learning for Computer Vision - ExecutiveML

  1. 1. Deep Learning for Computer Vision Executive-ML 2017/09/21 Neither Proprietary nor Confidential – Please Distribute ;) Alex Conway alex @ numberboost.com @alxcnwy
  2. 2. Hands up!
  3. 3. Check out the Deep Learning Indaba videos & practicals! http://www.deeplearningindaba.com/videos.html http://www.deeplearningindaba.com/practicals.html
  4. 4. Deep Learning is Sexy (for a reason!) 4
  5. 5. Image Classification 5 http://yann.lecun.com/exdb/mnist/ https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py (99.25% test accuracy in 192 seconds and 70 lines of code)
  6. 6. Image Classification 6
  7. 7. Image Classification 7
  8. 8. https://research.googleblog.com/2017/06/supercharge- your-computer-vision-models.html Object detection
  9. 9. https://www.youtube.com/watch?v=VOC3huqHrss Object detection
  10. 10. Image Captioning & Visual Attention XXX 10 https://einstein.ai/research/knowing-when-to-look-adaptive-attention- via-a-visual-sentinel-for-image-captioning
  11. 11. Image Q&A 11 https://arxiv.org/pdf/1612.00837.pdf
  12. 12. Video Q&A XXX 12 https://www.youtube.com/watch?v=UeheTiBJ0Io
  13. 13. Pix2Pix https://affinelayer.com/pix2pix/ https://github.com/affinelayer/pix2pix-tensorflow 13
  14. 14. Pix2Pix https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that- mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66 14
  15. 15. 15 Original input Rear Window (1954) Pix2pix output Fully Automated Remastered Painstakingly by Hand https://hackernoon.com /remastering-classic- films-in-tensorflow-with- pix2pix-f4d551fa0503
  16. 16. Style Transfer https://github.com/junyanz/CycleGAN 16
  17. 17. Style Transfer MAGIC https://github.com/junyanz/CycleGAN 17
  18. 18. 1. What is a neural network? 2. What is a convolutionalneural network? 3. How to use a convolutionalneural network 4. More advanced Methods 5. Case studies & applications 18
  19. 19. Big Shout Outs Jeremy Howard & Rachel Thomas http://course.fast.ai Andrej Karpathy http://cs231n.github.io François Chollet (Keras lead dev) https://keras.io/ 19
  20. 20. 1.What is a neural network?
  21. 21. What is a neuron? 21 • 3 inputs[x1,x2,x3] • 3 weights[w1,w2,w3] • Element-wise multiply and sum • Apply activationfunction f • Often add a bias too (weightof 1) – not shown
  22. 22. What is an Activation Function? 22 Sigmoid Tanh ReLU Nonlinearities … “squashing functions” … transformneuron’s output NB: sigmoid output in[0,1]
  23. 23. What is a (Deep) Neural Network? 23 Inputs outputs hidden layer 1 hidden layer 2 hidden layer 3 Outputs of one layer are inputs into the next layer
  24. 24. How does a neural network learn? 24 • You need labelled examples “trainingdata” • Initially, the network makes random predictions (weights initializedrandomly) • For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (aka “loss function”) • Use ‘backpropagation’ (really just the chain rule), to update the network parameters (weights) in the opposite direction to the error
  25. 25. How does a neural network learn? 25 New weight = Old weight Learning rate- Gradient of weight with respect to Error( )x “How much error increases when we increase this weight”
  26. 26. Gradient Descent Interpretation 26 http://scs.ryerson.ca/~aharley/neural-networks/
  27. 27. http://playground.tensorflow.org
  28. 28. What is a Neural Network? For much more detail, see: 1. Michael Nielson’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html 2. Anrej Karpathy’s CS231n Notes http://neuralnetworksanddeeplearning.com/chap1.html 28
  29. 29. 2. What is a convolutional neural network?
  30. 30. What is a Convolutional Neural Network? 30 “like a ordinary neural network but with special types of layers that work well on images” (math works on numbers) • Pixel = 3 colour channels (R, G, B) • Pixel intensity ∈[0,255] • Image has width w and height h • Therefore image is w x h x 3 numbers
  31. 31. 31 This is VGGNet – don’t panic, we’ll break it down piece by piece Example Architecture
  32. 32. 32 This is VGGNet – don’t panic, we’ll break it down piece by piece Example Architecture
  33. 33. Convolutions 33 http://setosa.io/ev/image-kernels/
  34. 34. Convolutions 34 http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
  35. 35. New Layer Type: ConvolutionalLayer 35 • 2-d weighted average when multiply kernel over pixel patches • We slide the kernel over all pixels of the image (handle borders) • Kernel starts off with “random” values and network updates (learns) the kernel values (using backpropagation) to try minimize loss • Kernels shared across the whole image (parameter sharing)
  36. 36. Many Kernels = Many “Activation Maps” = Volume 36http://cs231n.github.io/convolutional-networks/
  37. 37. New Layer Type: ConvolutionalLayer 37
  38. 38. Convolutions 38 https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
  39. 39. Convolutions 39
  40. 40. Convolutions 40
  41. 41. Convolutions 41
  42. 42. Convolution Learn Heirarchial Features 42
  43. 43. Great vid 43 https://www.youtube.com/watch?v=AgkfIQ4IGaM
  44. 44. New Layer Type: Max Pooling 44
  45. 45. New Layer Type: Max Pooling • Reduces dimensionality from one layer to next • …by replacing NxN sub-area with max value • Makes network “look” at larger areas of the image at a time • e.g. Instead of identifying fur, identify cat • Reduces overfittingsince losing information helps the network generalize 45
  46. 46. New Layer Type: Max Pooling 46
  47. 47. Softmax • Convert scores ∈ ℝ to probabilities∈ [0,1] • Then predict the class with highest probability 47
  48. 48. Bringing it all together 48 Convolution+ max pooling+ fully connected+ softmax
  49. 49. Bringing it all together 49 Convolution+ max pooling+ fully connected+ softmax
  50. 50. Bringing it all together 50 Convolution+ max pooling+ fully connected+ softmax
  51. 51. Bringing it all together 51 Convolution+ max pooling+ fully connected+ softmax
  52. 52. Bringing it all together 52 Convolution+ max pooling+ fully connected+ softmax
  53. 53. Bringing it all together 53 Convolution+ max pooling+ fully connected+ softmax
  54. 54. We need labelled training data!
  55. 55. ImageNet 55 http://image-net.org/explore 1000 object categories 1.2 million training images
  56. 56. ImageNet 56
  57. 57. ImageNet 57
  58. 58. ImageNet Top 5 Error Rate 58 Traditional Image Processing Methods AlexNet 8 Layers ZFNet 8 Layers GoogLeNet 22 Layers ResNet 152 Layers SENet Ensamble TSNet Ensamble
  59. 59. 3. How to use a convolutional neural network
  60. 60. Using a Pre-Trained ImageNet-Winning CNN 60 • We’ve been looking at “VGGNet” • Oxford Visual Geometry Group (VGG) • ImageNet 2014 Runner-up • Network is 16 layers (deep!) • Easy to fine-tune https://blog.keras.io/building-powerful-image-classification-models-using- very-little-data.html
  61. 61. Example: Classifying Product Images 61 https://github.com/alexcnwy/CTDL_CNN_TALK_20170620 Classifying products into 9 categories
  62. 62. 62 https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html Start with Pre-Trained ImageNet Model
  63. 63. “Transfer Learning” is a game changer
  64. 64. Fine-tuning A CNN To Solve A New Problem • Cut off last layer of pre-trained Imagenet winning CNN • Keep learned network (convolutions) but replace final layer • Can learn to predict new (completely different) classes • Fine-tuning is re-training new final layer - learn for new task 64
  65. 65. Fine-tuning A CNN To Solve A New Problem 65
  66. 66. 66 Before Fine-Tuning
  67. 67. 67 After Fine-Tuning
  68. 68. Fine-tuning A CNN To Solve A New Problem • Fix weights in convolutional layers (set trainable=False) • Remove final dense layer that predicts 1000 ImageNet classes • Replace with new dense layer to predict 9 categories 68 88% accuracy in under 2 minutes for classifying products into categories Fine-tuning is awesome! Insert obligatory brain analogy
  69. 69. Visual Similarity 69 • Chop off last 2 VGG layers • Use dense layer with 4096 activations • Compute nearest neighbours in the space of these activations https://memeburn.com/2017/06/spree-image-search/
  70. 70. 70 https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
  71. 71. 71 Input Image not seen by model Results Top 10 most “visually similar”
  72. 72. 4. More Advanced Methods
  73. 73. Use a Better Architecture (or all of them!) 73 “Ensambles win” learn a weighted average of many models’ predictions
  74. 74. cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf There are MANY Computer Vision Tasks
  75. 75. > Long,Shelhamer,and Darrell,“Fully Convolutional Networks for SemanticSegmentation”CVPR2015 > Noh et al, “LearningDeconvolution Networkfor SemanticSegmentation”CVPR2015 Semantic Segmentation
  76. 76. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf Object detection
  77. 77. Object detection
  78. 78. https://www.youtube.com/watch?v=VOC3huqHrss Object detection
  79. 79. http://blog.romanofoti.com/style_transfer/ https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py Style Transfer Loss = content_loss + style_loss Content loss = convolutions from pre-trained network Style loss = gram matrix from style image convolutions
  80. 80. Video Q&A XXX 80 https://www.youtube.com/watch?v=UeheTiBJ0Io
  81. 81. Video Q&A XXX 81 https://www.youtube.com/watch?v=UeheTiBJ0Io
  82. 82. 5. Case Studies
  83. 83. Image & Video Moderation TODO 83 Large internationalgay datingapp with tens of millions of users uploadinghundreds-of-thousandsofphotosperday
  84. 84. Estimating Accident Repair Cost from Photos TODO 84 Prototype for large SA insurer Detect car make & model from registrationdisk Predict repair cost using learnedmodel
  85. 85. Optical Sorting TODO 85 https://www.youtube.com/watch?v=Xf7jaxwnyso
  86. 86. Segmenting Medical Images TODO 86
  87. 87. m Counting People TODO Countshoppers,segment on age & gender facial recognition loyalty is next
  88. 88. Counting Cars TODO
  89. 89. Wine A.I.
  90. 90. Detecting Potholes
  91. 91. GET IN TOUCH! Alex Conway alex @ numberboost.com @alxcnwy
  92. 92. http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition- winners-interview-2nd-place-felix-lau/ Computer Vision Pipelines https://flyyufelix.github.io/2017/04/16/kaggle-nature- conservancy.html
  93. 93. https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/
  94. 94. Practical Tips • use a GPU – AWS p2 instances(use spot!) • when overfitting (validation_accuracy<<<training_accuracy) – Increase dropout – Early stopping • when underfitting (low training_accuracy) 1. Add moredata 2. Use data augmentation – Flipping/stretching / rotating – Slightly change hues / brightness 3. Use more complex model – Resnet / Inception /Densenet – Ensamble (average<<< weightedaveragewith learnedweights) 94

×