Introduction to the Artificial Intelligence
and Computer Vision revolution
Darian Frajberg
darian.frajberg@polimi.it
October 30, 2017
Introduction
§ What is Artificial Intelligence?
Computers with the ability to
reason as humans
§ What is Machine Learning?
Computers with the ability to learn
without being explicitly
programmed
§ What is Deep Learning?
Computers with the ability to learn
by using artificial neural networks,
which were inspired by the
structure and function of the brain
§ What is Computer Vision?
The ability of computers to acquire, analyze and understand digital
images/videos
"If We Want Machines to Think, We Need to Teach Them to See."
-Fei-Fei Li, Director of the Stanford AI Lab and Stanford Vision Lab
Introduction
From Hand-crafted Features to Learned Features
§ Traditional Computer Vision
§ Deep Learning
Sven Behnke: Visual Perception using Deep Convolutional Neural Networks, Bilbao DeepLearn Summer School (2017)
Deep Learning breakthrough
§ Data set with over 15M labeled images
§ Approximately 22k categories
§ Collected from the web and labeled by Amazon Mechanical Turk (a crowdsourcing tool)
http://www.image-net.org
Deep Learning breakthrough
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
§ Annual large-scale image classification competition, held since 2010
§ Classification: the model makes 5 guesses for each image's label
§ 1K categories
§ 1.2M training images
§ 100k test images
Russakovsky, Olga, et al. "ImageNet large scale visual recognition challenge." International Journal of Computer Vision 115.3 (2015): 211-252.
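The top-5 metric above can be computed in a few lines; this is a hedged sketch (`top5_error` is my own name, not part of the official evaluation kit), where each prediction is a mapping from class name to score:

```python
def top5_error(predictions, labels):
    """predictions: one {class_name: score} dict per image.
    labels: the true class for each image.
    An image counts as correct if its label is among the 5 highest-scoring guesses."""
    misses = 0
    for scores, label in zip(predictions, labels):
        # Rank classes by score, keep the best five
        top5 = sorted(scores, key=scores.get, reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)
```

With 1,000 categories, allowing five guesses is what makes the metric tolerant of genuinely ambiguous images.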
Deep Learning breakthrough
ILSVRC-2012 Results
AlexNet
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
Place | Model | Team | Top-5 error (test)
1st | AlexNet (CNN) | SuperVision | 15.3%
2nd | SIFT + FVs | ISI | 26.2%
3rd | SVM | OXFORD_VGG | 26.97%

AlexNet won by a margin of 10.9 percentage points over the runner-up.
Deep Learning breakthrough

ILSVRC top-5 error on ImageNet (%):

Year (Model) | Top-5 error | Note
2010 (NEC) | 28.19 | shallow
2011 (Xerox) | 25.77 | shallow
2012 (AlexNet) | 15.31 | first deep architecture
2013 (ZF) | 11.2 | deep
2014 (VGG) | 7.32 | deep
2014 (GoogLeNet) | 6.66 | deep
Human | 5.1 |
2015 (ResNet) | 3.57 | first to beat human
2016 (Ensemble) | 2.99 | deep
2017 (Ensemble) | 2.25 | deep
Deep Learning breakthrough
§ What were the key elements behind these results?
Models
§ Deep Artificial Neural Networks have accomplished outstanding results
§ The term “deep” refers to the number of hidden layers in the neural
network between the input and the output
§ In Computer Vision, in particular, Convolutional Neural Networks (CNNs) are the most successful architecture
§ This architecture yields better models, capable of learning more complex non-linear features
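As a toy illustration of why depth and non-linearities matter, the sketch below (plain Python, with hand-picked illustrative weights) stacks two layers with a ReLU in between and computes XOR, a function no single linear layer can represent:

```python
def relu(v):
    # Non-linearity applied element-wise after each hidden layer
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    # One fully connected layer: `weights` holds one weight row per output neuron
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    # Apply each (weights, biases) layer; ReLU between layers, linear output
    for i, (W, b) in enumerate(layers):
        x = dense(x, W, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Hand-picked weights: 2 hidden neurons + 1 linear output neuron computing XOR
xor_net = [([[1, 1], [1, 1]], [0, -1]),
           ([[1, -2]], [0])]
```

Without the `relu` step, the two layers would collapse into a single linear map and XOR would be unreachable, which is exactly the point of the slide.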
Big Data collection
Processing power
§ Acceleration
• Training
• Deployment
§ DL oriented hardware
• GPU
• ASIC
• CPU
• FPGA
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
Deep Learning software
Caffe
(UC Berkeley)
Caffe2
(Facebook)
Torch
(NYU/Facebook)
PyTorch
(Facebook)
Theano
(U Montreal)
TensorFlow
(Google)
Paddle
(Baidu)
CNTK
(Microsoft)
MXNet
(Amazon)
Developed by U Washington, CMU, MIT, Hong Kong U, etc., but the main framework of choice at AWS
MatConvNet
(University of
Oxford)
Keras
(François Chollet)
Deeplearning4j
(Skymind)
And more…
Some Computer Vision tasks
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Some Computer Vision applications
§ Autonomous vehicles
§ Face recognition
§ Gesture recognition
§ Augmented reality
§ Industrial automation and inspection
§ Medical and biomedical
§ Monitoring and surveillance
§ Image retrieval
§ Photography and video enhancement
In 2012 Facebook announced the acquisition of Face.com, a facial recognition technology company based in Israel (US$60 million)

In 2014 they published a paper on "DeepFace". Facebook used a small part of its database of users and outperformed all face recognition benchmarks
§ Data set: 4,000 people × 1,100 images each = 4.4M images
Taigman, Yaniv, et al. "Deepface: Closing the gap to human-level performance in face verification." Proceedings of the
IEEE conference on computer vision and pattern recognition. 2014.
Face recognition
Face recognition
Evaluation on public benchmark data sets
Taigman, Yaniv, et al. "Deepface: Closing the gap to human-level performance in face verification." Proceedings of the
IEEE conference on computer vision and pattern recognition. 2014.
Classification of skin cancer
§ CNN trained on 129,450 images
§ Comparison with dermatologists
Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639
(2017): 115-118.
Data vs Performance
In general, performance keeps improving as the amount of training data grows
Small labeled data sets
Problems
§ Overfitting becomes much harder to avoid
§ Outliers become more dangerous
Solutions
§ Clean noisy data
§ Get more annotated data
§ Reduce model complexity, while watching out for underfitting
§ Apply regularization
§ Semi-supervised learning
§ Apply transfer learning
§ Apply data augmentation
§ Generate synthetic data
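As a tiny illustration of the regularization bullet, the sketch below fits a 1-D linear model by gradient descent with an L2 penalty on the weight (all names and constants here are illustrative, not from any library):

```python
def fit_line(xs, ys, lam, epochs=500, lr=0.05):
    # 1-D linear model y ≈ w*x trained by gradient descent on
    # MSE + lam * w**2  (L2 regularization, a.k.a. weight decay)
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the MSE term plus the gradient of the penalty term
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= lr * grad
    return w
```

With `lam = 0` the fit recovers the unregularized solution; increasing `lam` shrinks the weight toward zero, which is the mechanism that tames overfitting on small data sets.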
Transfer learning
The ability of a model to apply knowledge learned on one task/domain to a new task/domain
http://ruder.io/transfer-learning/
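A minimal sketch of the idea in plain Python: here `pretrained_features` is a hand-written stand-in for a frozen backbone (in practice it would be CNN layers pretrained on ImageNet), and only a small linear head is trained on the new task with the perceptron rule. All names are my own, illustrative choices:

```python
def pretrained_features(x):
    # Stand-in for a FROZEN feature extractor: these parameters are never updated
    return [x[0] + x[1], x[0] - x[1]]

def train_head(samples, labels, epochs=20, lr=0.1):
    # Train only the small linear head on the new task (perceptron rule)
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):        # y in {0, 1}
            f = pretrained_features(x)
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

def predict(x, w, b):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
```

Because only the head's few parameters are learned, very little labeled data from the new task is needed, which is why transfer learning appears among the small-data-set solutions.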
Data augmentation
For many problems, we can use known invariances to transform existing
training samples into new ones (never the validation or test samples!)
For example, for image classification and object recognition, we have:
§ Translation invariance
§ Limited scale invariance
§ Limited rotation invariance
§ Limited photometric and color invariance
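The invariances above translate directly into augmentation code; here is a minimal sketch on a tiny "image" stored as a nested list (function names are my own):

```python
def hflip(img):
    # Horizontal mirror: a valid new training sample for many object classes
    return [row[::-1] for row in img]

def translate_right(img, k, fill=0):
    # Shift every row k pixels to the right, padding the left edge with `fill`
    return [[fill] * k + row[:-k] for row in img] if k else [r[:] for r in img]
```

Each transform yields a new labeled sample for free, exploiting translation invariance and (limited) mirror invariance exactly as listed.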
Synthetic data
“Any production data applicable to a given situation that are not
obtained by direct measurement”
-McGraw-Hill Dictionary of Scientific & Technical Terms
Problems
§ Gap between synthetic and real data
§ The final model has to generalize and
work well with real data
Solutions
§ Transfer Learning
§ Generative Adversarial Networks
MPIIGaze data set
A data set for appearance-based gaze estimation
Zhang, Xucong, et al. "Appearance-based gaze estimation in the wild." Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. 2015.
Generative Adversarial Network
Simulated+Unsupervised learning to add realism to the simulator while
preserving the annotations of the synthetic images
Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv
preprint arXiv:1612.07828 (2016).
Generative Adversarial Network
Results
Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv
preprint arXiv:1612.07828 (2016).
Challenges in Deep Learning
§ Society adaptation
§ Data Bias
§ Responsibility regulations and implications
§ Black box understanding
§ Multitask learning
§ Continuous learning
§ Self learning
Society adaptation
“Robots will be able to do everything better than us”
-Elon Musk, co-founder of PayPal, Tesla Motors, SpaceX, OpenAI, Neuralink, etc.
Data Bias
"Forget Killer Robots—Bias Is the Real AI Danger."
-John Giannandrea, AI Chief at Google
Data Bias

Microsoft released the Tay AI chatbot in March 2016

It was designed to mimic the behavior of an American teenage girl and to learn from interacting with Twitter users

After a few hours it became offensive and racist, and it had to be shut down
Responsibility regulations and implications
Autonomous vehicles
§ What happens if there is an
accident?
§ Who is responsible for it?
§ How can it be analyzed?
Black box understanding
Deep neural networks learn features from the data they are given, and those features are usually not interpretable by humans
Problems
§ We must trust predictions without fully understanding the reasoning behind them
§ It is hard to understand and fix the causes of erroneous predictions
“Whether it’s an investment decision, a medical decision, or maybe a
military decision, you don’t want to just rely on a black box method”
-Tommi Jaakkola, Professor of Computer Science and AI at MIT
INPUT → Deep Model → OUTPUT
Black box understanding
Explaining an image classification prediction made by Google’s Inception
neural network
Top-3 classes:
§ Electric guitar
(p = 0.32)
§ Acoustic guitar
(p = 0.24)
§ Labrador
(p = 0.21)
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should I trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
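One way to produce this kind of explanation is to perturb the input and watch how the prediction changes. Below is a toy perturbation-based sketch in that spirit (not the LIME algorithm itself; the `model` passed in is an arbitrary stand-in, not Inception):

```python
def explain_by_occlusion(img, model):
    # Importance of each pixel = how much the model's score drops when
    # that pixel is zeroed out (a simple perturbation-based explanation)
    base = model(img)
    importance = []
    for r, row in enumerate(img):
        imp_row = []
        for c, _ in enumerate(row):
            occluded = [list(rw) for rw in img]   # copy, then occlude one pixel
            occluded[r][c] = 0
            imp_row.append(base - model(occluded))
        importance.append(imp_row)
    return importance
```

Pixels whose occlusion causes the biggest score drop are the ones the model is "looking at", which is how the guitar and dog regions above get highlighted.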
Black box understanding
Explaining an erroneous image prediction made by a Wolf/Husky classifier
Solution
1. Verify whether the training data set contains mostly wolves with snow in
the background
2. Augment the data set with images of huskies in the snow and with images of wolves in other environments
3. Retrain
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should I trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
Multitask and continuous learning
Multitask learning
§ Flexible and general purpose AI,
able to solve different problems,
instead of being built to face just a
specific one
Continuous learning
§ Adaptable AI, able to evolve and incorporate new knowledge without having to be re-trained from scratch every time
Self learning
The ability of AI to learn features by itself, using algorithms that can learn from unlabeled data
Unlabeled data is less informative, but it can be massive and inexpensive or even free, which can lead to better performance
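As a tiny illustration of extracting structure from unlabeled data, the sketch below runs a minimal 1-D k-means: it groups points into clusters with no human annotation at all (purely illustrative; real self- and unsupervised learning pipelines are far richer):

```python
def kmeans_1d(points, k=2, iters=10):
    # Group unlabeled points into k clusters; no labels involved
    centers = list(points[:k])                    # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its assigned points
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

The discovered cluster centers act as learned "features" of the data distribution, obtained for free from unlabeled samples.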
Artificial Intelligence milestones
Deep Blue
Chess-playing program (IBM)
It defeated the world champion in 1997
It used brute force computation and
clever ad-hoc algorithms
In 2014 Google acquired DeepMind, a British company specialized in AI (US$500 million)
AlphaGo Lee
Go-playing program (Google/DeepMind)
It defeated the world champion in 2016
It used Deep Reinforcement Learning based
on human professional games and later on
games against instances of itself
Artificial Intelligence milestones
AlphaGo Zero
Go-playing program (Google/DeepMind)
It defeated previous versions of AlphaGo in 2017
It used Deep Reinforcement Learning entirely
based on self-playing, without any human data
and using less processing power
Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
Version | Hardware | Elo rating | Matches
AlphaGo Fan | 176 GPUs | 3,144 | 5:0 against Fan Hui
AlphaGo Lee | 48 TPUs | 3,739 | 4:1 against Lee Sedol
AlphaGo Master | 4 TPUs v2 | 4,858 | 60:0 against professional players
AlphaGo Zero | 4 TPUs v2 | 5,185 | 100:0 against AlphaGo Lee; 89:11 against AlphaGo Master
Conclusions
§ AI is not just the future, but it is
already the present
§ It is currently applied in a wide range
of fields with outstanding results
§ It has already equaled and even
outperformed humans in certain
applications
§ It is of great interest to both industry and academia
§ It has a huge potential and there are
still many open challenges and
possible applications
Thank you
Introduction to the Artificial Intelligence
and Computer Vision revolution
Darian Frajberg
darian.frajberg@polimi.it
