
An Introduction to Deep Learning (March 2018)

Talk @ AI Dev Days, Bangalore, 09/03/2018


  1. 1. An Introduction to Deep Learning Julien Simon, AI Evangelist, EMEA @julsimon https://medium.com/@julsimon
  2. 2. Agenda • An introduction to Deep Learning • Common network architectures and use cases • Demos • Resources
  3. 3. An introduction to Deep Learning
  4. 4. The neuron and activation functions: u = Σ(i=1..I) xi * wi ("multiply and accumulate"), then an activation function is applied to u. Source: Wikipedia
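
A minimal sketch of that multiply-and-accumulate step in NumPy, assuming a sigmoid activation (the slide does not single out a specific activation function):

    import numpy as np

    def neuron(x, w, b=0.0):
        # Multiply and accumulate: u = sum_i x_i * w_i (+ optional bias)
        u = np.dot(x, w) + b
        # Apply an activation function (sigmoid chosen here for illustration)
        return 1.0 / (1.0 + np.exp(-u))

    x = np.array([0.5, -1.2, 3.0])   # I = 3 input features
    w = np.array([0.8, 0.1, -0.4])   # one weight per input
    print(neuron(x, w))
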
  5. 5. Neural networks: X is an m x I matrix (m samples, I features per sample, rows x11…x1I through xm1…xmI); y holds the m labels over N2 categories, e.g. 2, 0, …, 4, which are one-hot encoded (label 2 becomes 0,0,1,0,0,…,0).
  6. 6. Neural networks: with the same X and one-hot encoded labels y, Accuracy = number of correct predictions / total number of predictions.
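
A quick sketch of one-hot encoding and accuracy with NumPy (array sizes and values are illustrative):

    import numpy as np

    y = np.array([2, 0, 4])          # m = 3 labels, N2 = 5 categories
    one_hot = np.eye(5)[y]           # label 2 becomes the row 0,0,1,0,0

    y_pred = np.array([2, 1, 4])     # predicted classes
    accuracy = np.mean(y_pred == y)  # correct predictions / total predictions
    print(one_hot)
    print(accuracy)                  # 0.666...
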
  7. 7. Neural networks: initially, the network will not predict correctly: f(X1) = Y'1. A loss function measures the difference between the real label Y1 and the predicted label Y'1: error = loss(Y1, Y'1). For a batch of samples, Σ(i=1..batch size) loss(Yi, Y'i) = batch error. The purpose of the training process is to minimize error by gradually adjusting weights.
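
The batch error can be computed like this, assuming a cross-entropy loss over one-hot labels (the slide does not name a specific loss function):

    import numpy as np

    def cross_entropy(y_true, y_pred, eps=1e-12):
        # loss(Y, Y') for one sample: -sum_k Y_k * log(Y'_k)
        return -np.sum(y_true * np.log(y_pred + eps))

    # batch of 2 samples, 3 categories
    Y  = np.array([[0, 1, 0], [1, 0, 0]])              # real labels (one-hot)
    Yp = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]])  # predicted probabilities

    batch_error = sum(cross_entropy(y, yp) for y, yp in zip(Y, Yp))
    print(batch_error)
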
  8. 8. Training: the training data set is fed to the network, weights are adjusted by backpropagation, and the result is a trained neural network. Hyperparameters: batch size, learning rate, number of epochs.
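
A hedged sketch of such a training loop in Gluon, with batch size, learning rate and number of epochs as the hyperparameters (the network, data and values below are placeholders):

    import mxnet as mx
    from mxnet import autograd, gluon

    net = gluon.nn.Dense(10)                         # placeholder network
    net.initialize()
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

    batch_size, epochs = 32, 10                      # hyperparameters
    X = mx.nd.random.uniform(shape=(1000, 20))       # dummy samples
    y = mx.nd.random.randint(0, 10, shape=(1000,)).astype('float32')
    data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                                 batch_size, shuffle=True)

    for epoch in range(epochs):
        for xb, yb in data:
            with autograd.record():                  # forward pass, record the graph
                loss = loss_fn(net(xb), yb)
            loss.backward()                          # backpropagation
            trainer.step(batch_size)                 # SGD weight update
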
  9. 9. Stochastic Gradient Descent (SGD): "Imagine you stand on top of a mountain with skis strapped to your feet. You want to get down to the valley as quickly as possible, but there is fog and you can only see your immediate surroundings. How can you get down the mountain as quickly as possible? You look around and identify the steepest path down, go down that path for a bit, again look around and find the new steepest path, go down that path, and repeat—this is exactly what gradient descent does." Tim Dettmers, University of Lugano, 2015, https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/ (Figure: a loss surface z = f(x, y).) The « step size » is called the learning rate.
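
Plain gradient descent on a toy surface, in a few lines of NumPy (the function and learning rate are chosen only for illustration):

    import numpy as np

    def f(x, y):          # the "mountain": z = f(x, y)
        return x**2 + y**2

    def grad_f(x, y):     # its gradient, the direction of steepest ascent
        return np.array([2*x, 2*y])

    pos = np.array([3.0, 4.0])
    learning_rate = 0.1                            # the "step size"
    for _ in range(50):
        pos = pos - learning_rate * grad_f(*pos)   # step downhill
    print(pos, f(*pos))                            # close to the minimum at (0, 0)
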
  10. 10. Optimizers
  11. 11. Validation: at the end of each epoch, the trained neural network predicts on the validation data set to measure the validation accuracy.
  12. 12. Early stopping: plotting training accuracy, validation accuracy and loss against epochs shows validation accuracy peaking at a best epoch, after which the network is OVERFITTING; training should stop there. « Deep Learning ultimately is about finding a minimum that generalizes well, with bonus points for finding one fast and reliably », Sebastian Ruder
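
Early stopping reduced to its core logic, using a made-up validation-accuracy curve (the numbers are invented for illustration):

    # Toy validation-accuracy curve: improves, then degrades (overfitting).
    val_accuracies = [0.61, 0.72, 0.80, 0.83, 0.82, 0.81, 0.79]

    best_acc, best_epoch = 0.0, 0
    patience, wait = 2, 0                  # stop after 2 epochs without improvement

    for epoch, val_acc in enumerate(val_accuracies):
        if val_acc > best_acc:
            best_acc, best_epoch, wait = val_acc, epoch, 0   # new best: keep this model
        else:
            wait += 1
            if wait >= patience:           # validation accuracy stopped improving
                break

    print("best epoch:", best_epoch, "validation accuracy:", best_acc)
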
  13. 13. Common network architectures and use cases
  14. 14. Convolutional Neural Networks (CNN) Le Cun, 1998: handwritten digit recognition, 32x32 pixels. Convolution and pooling reduce dimensionality. https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
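
A LeNet-style stack of convolution and pooling layers in Gluon, roughly matching the slide (the layer sizes here are illustrative, not LeCun's exact architecture):

    from mxnet.gluon import nn

    net = nn.Sequential()
    net.add(nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
            nn.MaxPool2D(pool_size=2, strides=2),   # pooling reduces dimensionality
            nn.Conv2D(channels=16, kernel_size=5, activation='relu'),
            nn.MaxPool2D(pool_size=2, strides=2),
            nn.Dense(120, activation='relu'),
            nn.Dense(84, activation='relu'),
            nn.Dense(10))                            # 10 digit classes
    net.initialize()
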
  15. 15. Object Detection https://github.com/precedenceguo/mx-rcnn https://github.com/zhreshold/mxnet-yolo
  16. 16. Object Segmentation https://github.com/TuSimple/mx-maskrcnn
  17. 17. Text Detection and Recognition https://github.com/Bartzi/stn-ocr
  18. 18. Face Detection https://github.com/tornadomeet/mxnet-face
  19. 19. Real-Time Pose Estimation https://github.com/dragonfly90/mxnet_Realtime_Multi-Person_Pose_Estimation
  20. 20. Long Short Term Memory Networks (LSTM) • An LSTM neuron computes the output based on the input and a previous state • LSTM networks have memory • They’re great at predicting sequences, e.g. machine translation
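
A minimal Gluon LSTM layer, illustrating the "input plus previous state" idea (shapes and sizes are arbitrary):

    import mxnet as mx
    from mxnet.gluon import rnn

    lstm = rnn.LSTM(hidden_size=50)            # an LSTM layer with 50 hidden units
    lstm.initialize()

    # sequence of 10 steps, batch of 4, 20 features per step
    inputs = mx.nd.random.uniform(shape=(10, 4, 20))
    state = lstm.begin_state(batch_size=4)     # the "memory" carried between steps
    output, state = lstm(inputs, state)
    print(output.shape)                        # (10, 4, 50)
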
  21. 21. Machine Translation https://github.com/awslabs/sockeye
  22. 22. GAN: Welcome to the (un)real world, Neo Generating new ”celebrity” faces https://github.com/tkarras/progressive_growing_of_gans From semantic map to 2048x1024 picture https://tcwang0509.github.io/pix2pixHD/
  23. 23. Wait! There’s more! Models can also generate text from text, text from images, text from video, images from text, sound from video, 3D models from 2D images, etc.
  24. 24. Amazon SageMaker, Build: pre-built notebook instances, highly-optimized machine learning algorithms
  25. 25. Amazon SageMaker, Build and Train: one-click training for ML, DL, and custom algorithms; easier training with hyperparameter optimization
  26. 26. Amazon SageMaker, Build, Train and Deploy: one-click deployment without engineering effort; fully-managed hosting at scale
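
A hedged sketch of that build/train/deploy flow with the SageMaker Python SDK as it looked around early 2018 (the script name, role, S3 path and instance types are placeholders):

    from sagemaker.mxnet import MXNet

    # Build: write a training script, e.g. train.py (placeholder name)
    estimator = MXNet(entry_point='train.py',
                      role='SageMakerRole',               # placeholder IAM role
                      train_instance_count=1,
                      train_instance_type='ml.p2.xlarge',
                      hyperparameters={'epochs': 10, 'learning_rate': 0.1})

    # Train: one call launches training on managed instances
    estimator.fit('s3://my-bucket/training-data')         # placeholder S3 path

    # Deploy: fully-managed hosting behind an HTTPS endpoint
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')
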
  27. 27. Demos https://github.com/juliensimon/dlnotebooks
  28. 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Resources https://aws.amazon.com/machine-learning https://aws.amazon.com/sagemaker https://aws.amazon.com/blogs/ai https://gluon.mxnet.io An overview of Amazon SageMaker https://www.youtube.com/watch?v=ym7NEYEx9x4
  29. 29. Thank you! Julien Simon, AI Evangelist, EMEA @julsimon https://medium.com/@julsimon
