Intro to Deep Learning for Computer Vision

  1. Applications of Deep Learning in Computer Vision (Christoph Körner)
  2. Outline: 1) Introduction to Neural Networks 2) Deep Learning 3) Applications in Computer Vision 4) Conclusion
  3. Why Deep Learning? ● Wins nearly every computer vision challenge (classification, segmentation, etc.) ● Can be applied in many domains (speech recognition, game prediction, computer vision, etc.) ● Beats human accuracy on some benchmarks ● Big communities and plentiful resources ● Dedicated hardware for deep learning
  4. Perceptron (1958) ● Weighted sum of inputs ● Threshold operator
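
As an aside (not on the slide): a minimal numpy sketch of that computation, with hand-picked (not learned) weights so the perceptron acts as an AND gate:

    import numpy as np

    def perceptron(x, w, b):
        # Weighted sum of inputs, followed by a hard threshold.
        return 1 if np.dot(w, x) + b > 0 else 0

    # Hand-picked weights (illustrative): this perceptron computes AND.
    w, b = np.array([1.0, 1.0]), -1.5
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(np.array(x, dtype=float), w, b))
    # -> 0, 0, 0, 1
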
  5. Artificial Neural Network (1960) ● Universal function approximator ● Can solve the XOR problem
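
A single perceptron cannot represent XOR because the classes are not linearly separable; one hidden layer fixes that. A sketch with hand-chosen (not learned) weights:

    import numpy as np

    def step(z):
        return (z > 0).astype(int)

    def xor_net(x):
        # Hidden layer: an OR-like unit and a NAND-like unit.
        W1 = np.array([[ 1.0,  1.0],
                       [-1.0, -1.0]])
        b1 = np.array([-0.5, 1.5])
        h = step(W1 @ x + b1)
        # Output unit: AND of the two hidden units gives XOR.
        return int(h @ np.array([1.0, 1.0]) - 1.5 > 0)

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, xor_net(np.array(x, dtype=float)))
    # -> 0, 1, 1, 0
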
  6. Backpropagation (1982) ● Propagates the error backwards through the network ● Allows gradient-based optimization (SGD, etc.) ● Enables training of multi-layer networks
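
To make the idea concrete, a toy numpy sketch (assumed details: squared-error loss, sigmoid units, 4 hidden neurons) that trains a 2-4-1 network on XOR with plain gradient descent; with this seed the outputs should approach [0, 1, 1, 0]:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
    sigmoid = lambda z: 1 / (1 + np.exp(-z))

    for _ in range(20000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: propagate the error through the network.
        d_out = (out - y) * out * (1 - out)   # chain rule at the output
        d_h = (d_out @ W2.T) * h * (1 - h)    # error pushed back to the hidden layer
        # Gradient-descent step (learning rate 1).
        W2 -= h.T @ d_out; b2 -= d_out.sum(axis=0)
        W1 -= X.T @ d_h;   b1 -= d_h.sum(axis=0)

    print(out.round(2).ravel())  # close to [0, 1, 1, 0]
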
  7. Convolution and Pooling (1989) ● Fewer parameters than fully connected (hidden) layers ● More efficient training
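
To illustrate the parameter savings (numbers are illustrative): a fully connected layer from a 28x28 input to a 24x24 output would need 28*28*24*24 ≈ 452k weights, while a 5x5 filter slides the same 25 weights across every position. A plain numpy sketch:

    import numpy as np

    def conv2d(img, kernel):
        # Valid convolution: slide the kernel and take dot products.
        kh, kw = kernel.shape
        H, W = img.shape
        out = np.empty((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
        return out

    def max_pool2x2(x):
        # Keep the maximum of each non-overlapping 2x2 block.
        H, W = x.shape
        return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

    img = np.random.default_rng(0).normal(size=(28, 28))
    feat = max_pool2x2(conv2d(img, np.ones((5, 5)) / 25))  # 25 shared weights
    print(feat.shape)  # (12, 12)
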
  8. Handwritten ZIP Codes (1989) ● 30 training passes ● Achieved 92% accuracy
  9. What changed between 1989 and 2011? ● Better initialization ● Better non-linearities: ReLU ● 1,000 times more training data ● More computing power ● A factor-of-a-million speedup in training time through parallelization on GPUs
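
One reason ReLU mattered (a sketch of the standard argument, not from the slides): the sigmoid derivative is at most 0.25, so error signals shrink geometrically through many sigmoid layers, while ReLU passes a gradient of 1 wherever its input is positive:

    import numpy as np

    # Worst-case gradient scaling through 20 sigmoid layers:
    print(0.25 ** 20)                        # ~9.1e-13, i.e. vanishing gradients
    # ReLU: gradient is 1 on the positive side, 0 otherwise.
    relu_grad = lambda z: (z > 0).astype(float)
    print(relu_grad(np.array([-2.0, 3.0])))  # [0. 1.]
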
  10. Deep Learning ● Conv, pool, and fully connected layers ● ReLU activations ● Deeply nested models with many parameters ● New layer types and structures ● New techniques to reduce overfitting ● Loads of training data and compute power ● 10,000,000 images ● Weeks of training on multi-GPU machines
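
A minimal sketch of that layer recipe in PyTorch (assumptions: 28x28 grayscale inputs, 10 classes, illustrative channel counts):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),  # conv layer
        nn.ReLU(),
        nn.MaxPool2d(2),                             # pool: 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),                             # pool: 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),                   # fully connected classifier
    )
    print(model(torch.zeros(1, 1, 28, 28)).shape)    # torch.Size([1, 10])
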
  11. AlexNet (2012) ● 62,378,344 parameters (250 MB) ● 24 layers
  12. VGGNet (2014) ● 102,908,520 parameters (412 MB) ● 23 layers
  13. GoogLeNet (2014) ● 6,998,552 parameters (28 MB) ● 143 layers
  14. Inception Module ● Heavy use of 1x1 convolutions (applied along the depth dimension) ● Very efficient
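
A hedged sketch of an Inception-style module in PyTorch: the 1x1 convolutions mix channels (the depth dimension) to cheaply reduce it before the expensive 3x3 and 5x5 branches. Channel counts are illustrative, not GoogLeNet's exact ones:

    import torch
    import torch.nn as nn

    class InceptionModule(nn.Module):
        def __init__(self, in_ch):
            super().__init__()
            # Four parallel branches; 1x1 convs reduce depth before 3x3/5x5.
            self.b1 = nn.Conv2d(in_ch, 64, 1)
            self.b2 = nn.Sequential(nn.Conv2d(in_ch, 32, 1), nn.ReLU(),
                                    nn.Conv2d(32, 64, 3, padding=1))
            self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                    nn.Conv2d(16, 32, 5, padding=2))
            self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(in_ch, 32, 1))

        def forward(self, x):
            # Concatenate branch outputs along the channel dimension.
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    print(InceptionModule(192)(torch.zeros(1, 192, 28, 28)).shape)
    # torch.Size([1, 192, 28, 28])  (64 + 64 + 32 + 32 output channels)
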
  15. ResNet (2015) ● Residual learning ● 152 layers
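
The key idea, residual learning, sketched in PyTorch (simplified: no batch normalization or projection shortcuts): each block learns a correction F(x) and outputs x + F(x), which keeps very deep stacks trainable:

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            # Learn the residual F(x), then add the identity shortcut.
            return self.relu(x + self.conv2(self.relu(self.conv1(x))))

    x = torch.zeros(1, 64, 56, 56)
    print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
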
  16. Applications in Computer Vision
  17. Classification ● One class per image ● Softmax layer at the end
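
For reference, the softmax layer turns the network's final scores into one probability per class; a numerically stable numpy sketch:

    import numpy as np

    def softmax(scores):
        # Subtracting the max keeps exp() numerically stable.
        e = np.exp(scores - scores.max())
        return e / e.sum()

    print(softmax(np.array([2.0, 1.0, 0.1])))
    # -> [0.659 0.242 0.099]; the image gets the highest-probability class
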
  18. Localization ● Bounding-box regression ● Sigmoid layer with 4 outputs at the end ● Can also be approached via classification
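
A sketch of such a localization head in PyTorch (the shapes and the 512-dim feature vector are assumptions for illustration): four sigmoid outputs interpreted as a box in normalized image coordinates:

    import torch
    import torch.nn as nn

    features = torch.zeros(1, 512)  # hypothetical output of a conv feature extractor

    bbox_head = nn.Sequential(
        nn.Linear(512, 4),  # 4 outputs: x, y, width, height
        nn.Sigmoid(),       # squash to [0, 1] = coordinates relative to the image
    )
    print(bbox_head(features).shape)  # torch.Size([1, 4])
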
  19. Detection ● Multiple objects, multiple classes ● Solved using multiple networks
  20. Segmentation
  21. More Applications ● Compression (auto-encoders, self-organizing maps) ● Image captioning (solved with recurrent architectures) ● Image stylization ● Clustering ● Many more...
  22. Conclusion ● Powerful: features are learned from data instead of hand-crafted ● Better than humans on some benchmarks ● Deeper is usually better, but watch for overfitting ● More data is usually better, but data quality and ground truth matter
  23. Thank you! Christoph Körner
