
Introduction to the Artificial Intelligence and Computer Vision revolution


Scientific seminar at Politecnico di Milano
Como, Italy
October 2017



  1. Introduction to the Artificial Intelligence and Computer Vision revolution. Darian Frajberg, darian.frajberg@polimi.it, October 30, 2017
  2. Introduction § What is Artificial Intelligence? Computers with the ability to reason as humans. § What is Machine Learning? Computers with the ability to learn without being explicitly programmed. § What is Deep Learning? Computers with the ability to learn by using artificial neural networks, which were inspired by the structure and function of the brain.
  3. Introduction § What is Computer Vision? The ability of computers to acquire, analyze and understand digital images/videos. “If We Want Machines to Think, We Need to Teach Them to See.” -Fei-Fei Li, Director of Stanford AI Lab and Stanford Vision Lab
  4. From Hand-crafted Features to Learned Features § Traditional Computer Vision § Deep Learning. Sven Behnke: Visual Perception using Deep Convolutional Neural Networks, Bilbao DeepLearn Summer School (2017)
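  To make the contrast concrete, the following sketch (not from the slides) trains a traditional pipeline, where hand-crafted HOG descriptors from scikit-image feed a linear SVM, next to a small Keras CNN that learns its features end to end; the 64x64 grayscale inputs, the random dummy data and the layer sizes are illustrative assumptions.

      import numpy as np
      import tensorflow as tf
      from skimage.feature import hog
      from sklearn.svm import LinearSVC
      from tensorflow.keras import layers

      # Dummy data standing in for a labeled image set: (N, 64, 64) grayscale images.
      X = np.random.rand(100, 64, 64)
      y = np.random.randint(0, 2, size=100)

      # Traditional pipeline: features are designed by hand, only the classifier is learned.
      hog_features = np.array([hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2)) for img in X])
      svm = LinearSVC().fit(hog_features, y)

      # Deep learning pipeline: the convolutional layers learn the features themselves.
      cnn = tf.keras.Sequential([
          layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
          layers.MaxPooling2D(),
          layers.Conv2D(32, 3, activation="relu"),
          layers.Flatten(),
          layers.Dense(2, activation="softmax"),
      ])
      cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
      cnn.fit(X[..., None], y, epochs=1, verbose=0)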
  5. Deep Learning breakthrough § Data set with over 15M labeled images § Approximately 22k categories § Collected from the web and labeled by Amazon Mechanical Turk (a crowdsourcing tool). http://www.image-net.org
  6. Deep Learning breakthrough. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) § Annual large-scale image classification competition held since 2010 § Classification: make 5 guesses about the image label § 1K categories § 1.2M training images § 100k test images. Russakovsky, Olga, et al. "ImageNet Large Scale Visual Recognition Challenge." International Journal of Computer Vision 115.3 (2015): 211-252.
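  The "5 guesses" protocol corresponds to the top-5 error metric. A minimal sketch of how it can be computed (my own illustration, with assumed array shapes):

      import numpy as np

      def top5_error(probabilities, labels):
          """Fraction of images whose true label is not among the 5 highest-scoring classes.

          probabilities: (N, num_classes) array of class scores; labels: (N,) true class indices.
          """
          top5 = np.argsort(probabilities, axis=1)[:, -5:]      # indices of the 5 best guesses per image
          hits = np.any(top5 == labels[:, None], axis=1)        # True where the true label is among them
          return 1.0 - hits.mean()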
  7. Deep Learning breakthrough. ILSVRC-2012 results (AlexNet):
     Place | Model         | Team        | Top-5 error (test)
     1st   | AlexNet (CNN) | SuperVision | 15.3%
     2nd   | SIFT + FVs    | ISI         | 26.2%
     3rd   | SVM           | OXFORD_VGG  | 26.97%
     AlexNet won with a margin of +10.9 percentage points over the runner-up.
     Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
  8. Deep Learning breakthrough. ILSVRC top-5 error on ImageNet (%):
     2010 (NEC, shallow): 28.19
     2011 (Xerox, shallow): 25.77
     2012 (AlexNet, first deep architecture): 15.31
     2013 (ZF): 11.2
     2014 (VGG): 7.32
     2014 (GoogleNet): 6.66
     Human: 5.1
     2015 (ResNet, first to beat human): 3.57
     2016 (Ensemble): 2.99
     2017 (Ensemble): 2.25
  9. Deep Learning breakthrough § What were the key elements to achieve these results?
  10. Data / Models § Deep Artificial Neural Networks have accomplished outstanding results § The term “deep” refers to the number of hidden layers in the neural network between the input and the output § In Computer Vision, in particular, Convolutional Neural Networks (CNNs) are the most successful architecture § This architecture leads to better models, able to learn more complex non-linear features
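  As a rough illustration of what “deep” means in practice (not part of the deck), the snippet below instantiates ResNet50, a well-known deep CNN available in Keras, next to a one-hidden-layer network, and prints how many layers each has; the shallow model's sizes are arbitrary.

      import tensorflow as tf

      # A deep CNN: ResNet50 stacks dozens of convolutional (hidden) layers.
      resnet = tf.keras.applications.ResNet50(weights=None)   # weights=None: no pretrained download
      print("ResNet50 layers:", len(resnet.layers))

      # A shallow network: a single hidden layer between input and output.
      shallow = tf.keras.Sequential([
          tf.keras.layers.Flatten(input_shape=(224, 224, 3)),
          tf.keras.layers.Dense(128, activation="relu"),       # the only hidden layer
          tf.keras.layers.Dense(1000, activation="softmax"),
      ])
      print("Shallow model layers:", len(shallow.layers))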
  11. Big Data collection
  12. Processing power § Acceleration • Training • Deployment § DL-oriented hardware • GPU • ASIC • CPU • FPGA
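  A small, hedged illustration of where this hardware enters in practice (not from the slides): TensorFlow 2.x can report which accelerators it sees and place an operation on a GPU when one is available; the matrix sizes are arbitrary.

      import tensorflow as tf

      # List the accelerators visible to TensorFlow; training falls back to the CPU if none is found.
      gpus = tf.config.list_physical_devices("GPU")
      print("GPUs:", gpus)

      if gpus:
          # Explicitly place a small matrix multiplication on the first GPU.
          with tf.device("/GPU:0"):
              a = tf.random.normal((1024, 1024))
              b = tf.random.normal((1024, 1024))
              print(tf.matmul(a, b).shape)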
  13. Deep Learning software: Caffe (UC Berkeley), Caffe2 (Facebook), Torch (NYU/Facebook), PyTorch (Facebook), Theano (U Montreal), TensorFlow (Google), Paddle (Baidu), CNTK (Microsoft), MXNet (Amazon; developed by U Washington, CMU, MIT, Hong Kong U, etc., but the main framework of choice at AWS), MatConvNet (University of Oxford), Keras (François Chollet), Deeplearning4j (Skymind), and more… http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
  14. Some Computer Vision tasks. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
  15. Some Computer Vision applications § Autonomous vehicles § Face recognition § Gesture recognition § Augmented reality § Industrial automation and inspection § Medical and biomedical § Monitoring and surveillance § Image retrieval § Photography and video enhancement
  16. Face recognition. In 2012 Facebook announced the acquisition of Face.com, a facial recognition technology company based in Israel (US$60 million). In 2014 they published a paper on “DeepFace”: Facebook used a small part of its database of users and outperformed all face recognition benchmarks § Data set: 4,000 people x 1,100 images each = 4.4M images. Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
  17. Face recognition. Evaluation on public data set benchmarks. Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
  18. Classification of skin cancer § CNN trained on 129,450 images § Comparison with dermatologists. Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639 (2017): 115-118.
  19. Data vs Performance. More data, better performance.
  20. Small labeled data sets. Problems: § Overfitting becomes much harder to avoid § Outliers become more dangerous. Solutions: § Clean data noise § Get more annotated data § Reduce model complexity, being careful not to underfit § Apply regularization (a sketch follows below) § Semi-supervised learning § Apply transfer learning § Apply data augmentation § Generate synthetic data
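  A minimal sketch of the regularization item above (my own example, not from the slides), combining dropout, L2 weight decay and early stopping on a small fully connected classifier; the input dimension, layer sizes and hyperparameter values are assumptions.

      import tensorflow as tf
      from tensorflow.keras import layers, regularizers

      model = tf.keras.Sequential([
          layers.Dense(64, activation="relu",
                       kernel_regularizer=regularizers.l2(1e-4),   # L2 weight decay penalizes large weights
                       input_shape=(100,)),
          layers.Dropout(0.5),                                     # randomly drop units during training
          layers.Dense(10, activation="softmax"),
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

      # Early stopping halts training when the validation loss stops improving.
      early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                                    restore_best_weights=True)
      # model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])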
  21. Transfer learning. The ability of an AI system to learn from one task/domain and apply the knowledge it has already acquired to a new task/domain. http://ruder.io/transfer-learning/
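  A hedged sketch of transfer learning for images (not from the slides): a network pre-trained on ImageNet is frozen and only a small new classification head is trained on the target task; the choice of MobileNetV2, the 5-class head and the commented-out training data are illustrative assumptions.

      import tensorflow as tf
      from tensorflow.keras import layers

      # Pre-learned ImageNet features, without the original 1000-class classifier.
      base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                               include_top=False, weights="imagenet", pooling="avg")
      base.trainable = False                       # freeze the transferred knowledge

      model = tf.keras.Sequential([
          base,
          layers.Dense(5, activation="softmax"),   # new task-specific head
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
      # model.fit(new_task_images, new_task_labels, epochs=5)   # small labeled set from the new domain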
  22. Data augmentation. For many problems, we can use known invariances to transform existing training samples into new training samples (not validation and test!). For example, for image classification and object recognition we have: § Translation invariance § Limited scale invariance § Limited rotation invariance § Limited photometric and color invariance
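  A minimal sketch of these invariances expressed as random transformations with the Keras preprocessing layers (available in TensorFlow 2.6+); the transformation strengths and the dummy batch are illustrative assumptions, not values from the slides.

      import numpy as np
      import tensorflow as tf
      from tensorflow.keras import layers

      augment = tf.keras.Sequential([
          layers.RandomTranslation(0.1, 0.1),   # translation invariance
          layers.RandomZoom(0.1),               # limited scale invariance
          layers.RandomRotation(0.05),          # limited rotation invariance
          layers.RandomContrast(0.2),           # limited photometric invariance
      ])

      images = np.random.rand(8, 64, 64, 3).astype("float32")   # dummy training batch
      augmented = augment(images, training=True)                # applied only at training time
      print(augmented.shape)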
  23. Synthetic data. “Any production data applicable to a given situation that are not obtained by direct measurement” -McGraw-Hill Dictionary of Scientific & Technical Terms. Problems: § Gap between synthetic and real data § The final model has to generalize and work well with real data. Solutions: § Transfer Learning § Generative Adversarial Networks
  24. MPIIGaze data set. Data set for appearance-based gaze estimation. Zhang, Xucong, et al. "Appearance-based gaze estimation in the wild." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  25. Generative Adversarial Network. Simulated+Unsupervised learning to add realism to the simulator output while preserving the annotations of the synthetic images. Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv preprint arXiv:1612.07828 (2016).
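  A rough sketch of the two losses behind this Simulated+Unsupervised idea (my own simplified rendering, not the paper's code): the refiner is pushed to fool a discriminator, while an L1 self-regularization term keeps the refined image close to the synthetic input so its annotations remain valid; the weight lam and the tensor names are assumptions.

      import tensorflow as tf

      bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

      def refiner_loss(disc_on_refined, refined, synthetic, lam=0.001):
          # Adversarial term: refined images should be classified as "real" by the discriminator.
          adv = bce(tf.ones_like(disc_on_refined), disc_on_refined)
          # Self-regularization term: stay close to the synthetic input to preserve its annotations.
          reg = tf.reduce_mean(tf.abs(refined - synthetic))
          return adv + lam * reg

      def discriminator_loss(disc_on_real, disc_on_refined):
          # The discriminator learns to separate real images from refined synthetic ones.
          real = bce(tf.ones_like(disc_on_real), disc_on_real)
          fake = bce(tf.zeros_like(disc_on_refined), disc_on_refined)
          return real + fake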
  26. Generative Adversarial Network. Results. Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv preprint arXiv:1612.07828 (2016).
  27. Challenges in Deep Learning § Society adaptation § Data Bias § Responsibility regulations and implications § Black box understanding § Multitask learning § Continuous learning § Self learning
  28. Society adaptation. “Robots will be able to do everything better than us” -Elon Musk, founder of PayPal, Tesla Motors, SpaceX, OpenAI, Neuralink, etc.
  29. Data Bias. “Forget Killer Robots—Bias Is the Real AI Danger.” -John Giannandrea, AI Chief at Google
  30. Data Bias. Microsoft released the Tay AI chatbot in March 2016. It was designed to mimic the behavior of an American teenage girl and to learn from interacting with Twitter users. After a few hours it became offensive and racist, and it had to be shut down.
  31. Responsibility regulations and implications. Autonomous vehicles: § What happens if there is an accident? § Who is responsible for it? § How can it be analyzed?
  32. Black box understanding. Deep neural networks learn features based on the data they are provided, and these features are usually not understood by humans. Problems: § Trusting the predictions without understanding the reasoning behind them § Understanding and fixing the causes of erroneous predictions. “Whether it’s an investment decision, a medical decision, or maybe a military decision, you don’t want to just rely on a black box method” -Tommi Jaakkola, Professor of Computer Science and AI at MIT. [Diagram: INPUT -> Deep Model -> OUTPUT]
  33. Black box understanding. Explaining an image classification prediction made by Google’s Inception neural network. Top-3 classes: § Electric guitar (p = 0.32) § Acoustic guitar (p = 0.24) § Labrador (p = 0.21). Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should I trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
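  The explanation on this slide comes from LIME, the method introduced in the cited paper. A hedged sketch of how such an explanation can be produced with the open-source lime package and a pretrained Keras InceptionV3 (the image file name is a placeholder, and the parameter values are common choices rather than the paper's exact settings):

      import numpy as np
      import tensorflow as tf
      from lime import lime_image

      model = tf.keras.applications.InceptionV3(weights="imagenet")

      def predict_fn(images):
          # LIME passes batches of perturbed copies of the image; return class probabilities for each.
          x = tf.keras.applications.inception_v3.preprocess_input(np.array(images, dtype=np.float32))
          return model.predict(x)

      img = tf.keras.preprocessing.image.load_img("guitar_and_dog.jpg", target_size=(299, 299))  # placeholder path
      img = np.array(img, dtype=np.double)

      explainer = lime_image.LimeImageExplainer()
      explanation = explainer.explain_instance(img, predict_fn, top_labels=3,
                                               hide_color=0, num_samples=1000)

      # Keep only the superpixels that support the top predicted class (e.g. "electric guitar").
      image, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                                   positive_only=True, num_features=5, hide_rest=True)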
  34. Black box understanding. Explaining an erroneous prediction made by a Wolf/Husky classifier. Solution: 1. Verify whether the training data set contains mostly wolves with snow in the background 2. Augment the data set with images of huskies in the snow and of wolves in other environments 3. Retrain. Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should I trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
  35. Multitask and continuous learning. Multitask learning: § Flexible and general-purpose AI, able to solve different problems instead of being built to face just a specific one. Continuous learning: § Adaptable AI, able to evolve and adjust to new knowledge without the need to re-train every time.
  36. Self learning. The ability of AI to learn features by itself, using algorithms that can learn from unlabeled data. Unlabeled data is less informative, but it can be massive and inexpensive/free, which can lead to better performance.
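  One classical way to learn features from unlabeled data is an autoencoder, sketched below as a minimal illustration (my example, not from the slides); the 28x28 grayscale input and the 32-dimensional code are assumed sizes.

      import tensorflow as tf
      from tensorflow.keras import layers

      inputs = tf.keras.Input(shape=(28, 28, 1))
      x = layers.Flatten()(inputs)
      code = layers.Dense(32, activation="relu")(x)            # compressed feature representation
      x = layers.Dense(28 * 28, activation="sigmoid")(code)
      outputs = layers.Reshape((28, 28, 1))(x)

      autoencoder = tf.keras.Model(inputs, outputs)
      autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
      # No labels are needed: the image itself is the reconstruction target.
      # autoencoder.fit(unlabeled_images, unlabeled_images, epochs=10)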
  37. Artificial Intelligence milestones. Deep Blue, chess-playing program (IBM): it defeated the world champion in 1997; it used brute-force computation and clever ad-hoc algorithms. In 2014 Google acquired DeepMind, a British company specialized in AI (US$500 million). AlphaGo Lee, Go-playing program (Google/DeepMind): it defeated the world champion in 2016; it used Deep Reinforcement Learning, trained first on games of human professionals and later on games against instances of itself.
  38. Artificial Intelligence milestones. AlphaGo Zero, Go-playing program (Google/DeepMind): it defeated previous versions of AlphaGo in 2017; it used Deep Reinforcement Learning based entirely on self-play, without any human data and using less processing power. Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
     Version        | Hardware  | Elo rating | Matches
     AlphaGo Fan    | 176 GPUs  | 3,144      | 5:0 against Fan Hui
     AlphaGo Lee    | 48 TPUs   | 3,739      | 4:1 against Lee Sedol
     AlphaGo Master | 4 TPUs v2 | 4,858      | 60:0 against professional players
     AlphaGo Zero   | 4 TPUs v2 | 5,185      | 100:0 against AlphaGo Lee; 89:11 against AlphaGo Master
  39. Conclusions § AI is not just the future; it is already the present § It is currently applied in a wide range of fields with outstanding results § It has already equaled and even outperformed humans in certain applications § It is of great interest for both industry and academia § It has huge potential, and there are still many open challenges and possible applications
  40. Thank you. Introduction to the Artificial Intelligence and Computer Vision revolution. Darian Frajberg, darian.frajberg@polimi.it
