A gentle introduction
to Deep Learning
Jose Fernando Rodrigues-Jr
University of Sao Paulo, Brazil
Supervision: Sihem Amer-Yahia
Université Grenoble Alpes, France
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (Fapesp)
Grant 2018/17620-5
Laboratoire d’Informatique de Grenoble
About me and my university
● The University of Sao Paulo
-Ranked number 2 among Latin American universities
-Ranked in the 250-300 stratum in the world (UGA is in the 300-350 stratum)
(source: Times Higher Education, 2019)
● Faculty at University of Sao Paulo since 2010, associate professor since 2014
● My campus is in the city of Sao Carlos, in the countryside of the state of Sao Paulo
● The HDI of Sao Carlos is 0.805 (Brazil is 0.754 and France is 0.897)
Deep Learning
-Deep Learning is number 1 in the IEEE Computer Society top 10 technology trends for 2018
https://www.computer.org/press-room/2017-news/top-technology-trends-2018;
-Not new: most of the techniques are 20, 30, even 50 years old;
-Not necessarily deep: some architectures have a single (hidden) layer;
-Myth: it is about artificial intelligence, not artificial consciousness.
Deep Learning
Specifically, Deep Learning refers to the revival of artificial intelligence (artificial
neural networks) due to four factors:
1) lots of data: while a child learns what a dog looks like from three images, a computer
demands 3 million images;
2) computing power: 20xx computers have memory and processing power orders of
magnitude higher than 19xx computers; GPUs scaled the process even further;
3) algorithmic improvements: gradient descent, backpropagation, and architectural
innovations widened the range of possibilities;
4) robust frameworks: Theano, TensorFlow, Keras, and many others made complex
parallel math computing accessible.
Image classification breakthrough
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) - ImageNet for short
Super-human performance
2017
Training: 1.2 million images
Validation: 150,000 images
Test: 50,000 images
1,000 classes
No more feature engineering
● The idea of feature engineering applies to data
processing problems that demand feature extraction.
This is not always the case;
● Yet, it is still possible to use Artificial Neural Networks
with manually extracted features; sometimes it is the
only course of action, as in regression problems.
Promising results
Soon (or already?) better than human skills:
-Computer Vision
-Text translation
-Text generation
-Games: Go, Chess, …
-Medicine: heart attack, neurodegenerative diseases, oncology, …
Esteva, A. et al.; Dermatologist-level classification of skin cancer with deep neural networks,
Nature, 2017
-super-human performance on classifying skin lesions
-identified classes still unknown in the literature
-1500+ citations in one year
Turing award 2018
Yoshua Bengio, French-Canadian (theoretical background)
Geoffrey Hinton, British-Canadian (backpropagation, AlexNet)
Yann LeCun, French (convolutional networks and engineering)
“For conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.”
Context
Biological inspiration
● Inspiration only, not simulation. It is not yet fully
understood how the brain works.
Existential parenthesis
Why neurons?
-The universe is made up of sets (for Computer Science, unordered lists without repetition)
-In a world of sets, what do smart things do?
Ans.: they build up functions (or maps, for CS)
-What is a function, broadly speaking?
Given two sets X and Y, a function defines a mapping between them: f: X → Y
X and Y can be anything: objects, emotions, concepts, abstractions, skills, music, …
-To do that, nature (evolution) designed specialized cells, named neurons
A very large number of neurons is able to build functions!
First key concept:
1) An Artificial Neural Network is a function;
Compared to numeric Math functions, smart beings deal with functions whose domains have
elements that cannot be completely foreseen, let alone listed exhaustively.
Math function: f: ℕ → ℝ; for example, f(x) = x^e
Open function: f: {all possible dogs} → {all known dog breeds}
(e.g., Akita, Alaskan Husky, Bichon Frisé, Border Terrier, Boxer, Brazilian Mastiff, …)
Principle - artificial neuron
In matrix form ⇒ Very important
● 1 input: a 1 x n feature vector (one row, columns 0 ... n-1)
● 1 processing neuron: an n x 1 weight vector (one column, rows 0 ... n-1)
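As a minimal sketch of the single-neuron case, the row vector and the column vector can be multiplied directly. The numeric values below are hypothetical, chosen only for illustration, and the bias and activation are left out.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
x = np.array([[1.0, 2.0, 3.0]])        # one input: a 1 x n feature vector (n = 3)
w = np.array([[0.5], [-1.0], [0.25]])  # one neuron: an n x 1 weight vector

# The neuron output is the dot product of input and weights (bias omitted).
z = x @ w                              # shape (1, 1)
neuron_output = float(z[0, 0])
print(x.shape, w.shape, z.shape, neuron_output)
```

The (1 x n) by (n x 1) product collapses to a single number, which is the weighted sum the neuron computes.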
Principle - artificial neuron
In matrix form ⇒ Very important
● j input 1 x n feature vectors, stacked as rows 0, 1, ..., j-1 of a matrix I of size j x n
● k processing n x 1 neurons, placed side by side as columns 0, 1, ..., k-1 of a matrix M of size n x k
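Now, remember the matrix dot product: entry (a, b) of I.M is the sum over c of I[a][c] * M[c][b].
And the neuron principle: each neuron outputs the weighted sum of its inputs, so column b of I.M holds the outputs of neuron b.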
Principle - artificial neuron
Suppose:
● j input 1 x n=10 feature vectors: I ⇒ j x 10 matrix
● k=5 processing 10 x 1 neurons: M ⇒ 10 x 5 matrix
The processing of the j 1x10 vectors by the five 10x1 neurons
corresponds to the dot product I(j x 10) . M(10 x 5)
The output is a matrix O corresponding to j new vectors, each with 5
transformed features, that is, O(j x 5)
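The shapes above can be checked with a short NumPy sketch. The batch size j = 4 and the random values are assumptions made only so the example runs; the 10 features and 5 neurons come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

j, n, k = 4, 10, 5               # j = 4 is an arbitrary batch size for this sketch
I = rng.normal(size=(j, n))      # j input vectors, one per row, 10 features each
M = rng.normal(size=(n, k))      # 5 neurons, one per column, 10 weights each

O = I @ M                        # all j x 5 neuron outputs in one product

# Entry O[a, b] is the response of neuron b to input vector a.
assert np.allclose(O[2, 3], np.dot(I[2, :], M[:, 3]))
print(O.shape)                   # (4, 5)
```

One matrix product therefore replaces a double loop over inputs and neurons, which is what makes the matrix form so important in practice.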
Supervised learning
Training: I know the answer
→ Learning, building model
Testing: I do not know the answer
→ Evaluation, using model
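The training/testing distinction can be illustrated with a toy sketch. Everything here is hypothetical: the data, the helper `train_test_split`, and the threshold "model" are stand-ins chosen for brevity, not a neural network.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle labeled samples and split them into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: (feature, label) pairs where the label is 1 when x > 5.
data = [(x, int(x > 5)) for x in range(12)]
train, test = train_test_split(data)

# Training: the labels are used to build a (very simple) model, here a threshold
# halfway between the largest class-0 feature and the smallest class-1 feature.
max_neg = max(x for x, y in train if y == 0)
min_pos = min(x for x, y in train if y == 1)
threshold = (max_neg + min_pos) / 2

# Testing: the model predicts without seeing the labels; labels only score it.
predictions = [int(x > threshold) for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(len(train), len(test), accuracy)
```

The key point is that the test labels never influence the model; they are used only to evaluate it.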
After all, an optimization problem
*Biases omitted for simplicity
Parameters (mostly weights)
Principle - artificial neuron
Suppose:
● j input 1 x n=10 feature vectors: I ⇒ j x 10 matrix (10 features per input)
● k=5 processing 10 x 1 neurons: M ⇒ 10 x 5 matrix (10 weights for each of the 5 neurons)
M is the object of the optimization: which weights lead to the desired output?
Second key concept:
1) An Artificial Neural Network is a function;
2) The training of an ANN is an optimization problem;
Attention:
- This presentation is only about the basics; in fact, it covers the concepts of
Artificial Neural Networks;
- When feature extraction is involved, as in image and audio
processing, the process is much more complex;
- Actually, the depth of "Deep Learning" has to do with these more
complex problems;
- Nevertheless, the principles are the same.
Overall (theoretical) process
1. Specify a structure and a loss function to guide the optimization;
2. Feed forward with matrix multiplication and non-linear activations;
3. while (results are not satisfactory)
a. Compute the parameter adjustments using gradient descent;
b. The network backpropagates using the multivariate chain rule;
c. Update the weights accordingly;
d. Classification/Regression.
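The steps above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the toy data, the 2-8-1 structure, the learning rate of 0.1, and the fixed budget of 500 iterations are all hypothetical choices, and biases are omitted as on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical). The target is an odd function of the
# inputs, so a tanh network without biases is able to represent it.
X = rng.uniform(-1.0, 1.0, size=(64, 2))
Y = np.sin(2.0 * X[:, :1]) - 0.5 * X[:, 1:]

# 1. Structure (2 -> 8 -> 1) and loss (mean squared error).
W1 = rng.normal(scale=0.5, size=(2, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
learning_rate = 0.1

def forward(X, W1, W2):
    H = np.tanh(X @ W1)          # 2. matrix multiplication + non-linear activation
    return H, H @ W2             # linear output layer for regression

losses = []
for epoch in range(500):         # 3. fixed budget instead of a "satisfactory" test
    H, Y_hat = forward(X, W1, W2)
    error = Y_hat - Y
    losses.append(float(np.mean(error ** 2)))

    # 3b. Backpropagation: the chain rule applied layer by layer.
    dY_hat = 2.0 * error / len(X)            # dLoss/dY_hat
    dW2 = H.T @ dY_hat                       # dLoss/dW2
    dH = dY_hat @ W2.T                       # dLoss/dH
    dW1 = X.T @ (dH * (1.0 - H ** 2))        # tanh'(z) = 1 - tanh(z)^2

    # 3a/3c. Gradient descent step on every weight.
    W1 -= learning_rate * dW1
    W2 -= learning_rate * dW2

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

After training, calling `forward` on new inputs gives the regression output (step 3d). Frameworks automate the gradient lines, but the loop keeps this same shape.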
Error landscape
● The set of parameters defines an error landscape
● We want to move along this landscape to find the best minimum
(preferably the global minimum)
How to converge to the proper parameters?
The standard solution is the gradient descent algorithm
1. Calculate the partial derivative of the loss with respect to the weights, ∂Loss/∂W
2. Backpropagate, updating W as W ← W − η · ∂Loss/∂W, where η is the learning rate
3. Use the chain rule to propagate through all the layers
[Plot: Loss as a function of W]
The learning rate states how much to move
in the direction contrary to the gradient.
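A one-parameter sketch makes the role of the learning rate concrete. The quadratic loss below is a hypothetical stand-in for a real error landscape, and the two learning rates are arbitrary values picked to show both behaviours.

```python
def loss(w):
    """Hypothetical one-parameter loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def loss_gradient(w):
    """Derivative of the loss above with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate, steps):
    """Repeatedly move w against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        w = w - learning_rate * loss_gradient(w)
    return w

w_small = gradient_descent(w=0.0, learning_rate=0.1, steps=50)   # approaches 3
w_large = gradient_descent(w=0.0, learning_rate=1.1, steps=50)   # overshoots and diverges
print(w_small, loss(w_small))
print(w_large, loss(w_large))
```

With the small rate, each step shrinks the distance to the minimum; with the large one, each step jumps past the minimum and lands farther away than before.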
Third key concept:
1) An Artificial Neural Network is a function;
2) The training of an ANN is an optimization problem;
3) Gradient descent is the standard method to move along the error landscape;
Different gradient descent methods
● There are many gradient descent-based optimizers;
● They vary with respect to the speed of convergence, processing cost,
learning rate, and decay factor;
● Adadelta is a robust and widely used option;
○ It is usually run on mini-batches (stochastic), hence it is more robust against local minima
Adadelta uses an adaptive learning
rate; the closer to a minimum, the
smaller the learning rate tends to be.
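For the curious, here is a minimal one-parameter sketch of the Adadelta update rule. The decay `rho=0.95` and `eps=1e-6` are commonly quoted defaults, while the quadratic loss and the step budget are hypothetical choices for this example; real frameworks apply the same rule per weight.

```python
import math

def adadelta_minimize(gradient, w, steps, rho=0.95, eps=1e-6):
    """Minimize a one-parameter function with the Adadelta update rule."""
    avg_sq_grad = 0.0    # running average of squared gradients
    avg_sq_step = 0.0    # running average of squared updates
    for _ in range(steps):
        g = gradient(w)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
        # The step size is a ratio of two running RMS values; no global learning rate.
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1.0 - rho) * step * step
        w += step
    return w

# Hypothetical loss (w - 3)^2, whose gradient is 2 (w - 3).
w_final = adadelta_minimize(lambda w: 2.0 * (w - 3.0), w=0.0, steps=5000)
print(w_final)
```

Note that no learning rate is passed in: the effective step is derived from the history of gradients and updates, which is what "adaptive" means here.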
Millions of parameters
● Warning: even for mid-sized networks, the number of weights adds up
to thousands, even millions;
● This is responsible for the high computational cost of deep learning
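A quick count shows how fast the numbers grow for fully connected layers. The helper `dense_parameter_count` and the 784-512-512-10 network are hypothetical, used only to make the arithmetic concrete.

```python
def dense_parameter_count(layer_sizes, with_biases=True):
    """Count the weights (and optionally biases) of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out          # one weight per input-neuron pair
        if with_biases:
            total += n_out             # one bias per neuron
    return total

# The 10-feature, 5-neuron layer used earlier in the slides: 10 x 5 weights.
print(dense_parameter_count([10, 5], with_biases=False))   # 50

# A hypothetical mid-sized network for 28 x 28 images and 10 classes.
print(dense_parameter_count([784, 512, 512, 10]))          # 669706
```

Every one of these parameters receives a gradient and an update at each training step, which is where the computational cost comes from.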
Deep Learning Frameworks
Implementing all these concepts from scratch is very hard (really!);
To ease the process, academic and industrial players built frameworks that:
-make linear algebra expressions as simple as scalar algebra expressions;
-calculate partial derivatives automatically (one line of code);
-perform backpropagation;
-distribute the computation over GPUs.
Main frameworks: Theano, Google TensorFlow, Microsoft Cognitive Toolkit,
PyTorch, Keras, Apache MXNet, NVIDIA Caffe, Chainer, and others.
Deep Learning Frameworks
Oh, it is so easy! - NO!
You still have to:
-model the data input and output; a great deal of numbers organized in
multi-dimensional arrays;
-model the layers in terms of size and connectivity -- matrix dimensionality
will give you headaches;
-implement the neurons’ computations;
-implement the updating scheme;
-get used to symbolic coding.
How to choose a Framework
● You are a PhD student, or postdoc, working on DL itself: Theano, TensorFlow, Torch
● You want to use DL only to get features: Keras, Caffe
● You work in industry: TensorFlow, Caffe
● You just started a two-month internship: Keras, Caffe
● You want to give practical assignments to your students: Keras, Caffe
● You are curious about deep learning: Caffe
● You don’t even know Python: Keras, Torch
Source: https://project.inria.fr/deeplearning/files/2016/05/DLFrameworks.pdf
Pitfalls
● Proper pre-processing;
● Optimizing the structure can be a never-ending process;
● Preventing over- or underfitting;
● Getting it to converge (to a high-quality local minimum);
● Making sure you have the right loss function;
● Doing data augmentation correctly.
Time-consuming
● Testing a single idea can take a week or more;
● Preprocessing large data takes a long time;
● Symbolic programming is tough;
● Hyper-parameters + process variations ⇒ the number of possible settings
explodes.
Further concepts beyond the introduction
● Regularization (L1, L2,...)
● Cost (Loss) Function (exponential, cross-entropy, Hellinger, …)
● Activation Function (ReLU, hyperbolic tangent, sigmoid, …)
● Output layer (softmax, linear, …)
● Linear algebra using broadcasting
● Specialized layers (convolution, pooling, embedding, ...)
● Dropout, masking, padding, ...
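A few of the items above fit in a handful of lines each. The sketch below uses only the Python standard library; the example scores passed to `softmax` are hypothetical output-layer values.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]   # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    """Loss for one sample: negative log-probability of the correct class."""
    return -math.log(probabilities[true_class])

print(relu(-2.0), relu(2.0))            # 0.0 2.0
print(sigmoid(0.0), math.tanh(0.0))     # 0.5 0.0
probs = softmax([2.0, 1.0, 0.1])        # hypothetical output-layer scores
print(probs, cross_entropy(probs, true_class=0))
```

The cross-entropy is small when the network puts high probability on the correct class and grows as that probability drops, which is why it pairs naturally with a softmax output layer.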
The DL zoo
https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464
Key concepts
1) An Artificial Neural Network is a function;
2) The training of an ANN is an optimization problem;
3) Gradient descent is the standard method to move along the error landscape.
That’s it for now
Visualization tree multiple linked analytical decisions
 
Frequency plot and relevance plot to enhance visual data exploration
Frequency plot and relevance plot to enhance visual data explorationFrequency plot and relevance plot to enhance visual data exploration
Frequency plot and relevance plot to enhance visual data exploration
 

Recently uploaded

一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 

A gentle introduction to Deep Learning

  • 1. A gentle introduction to Deep Learning Jose Fernando Rodrigues-Jr University of Sao Paulo, Brazil Supervision: Sihem Amer-Yahia Université Grenoble Alpes, France Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (Fapesp) Grant 2018/17620-5
  • 2. Laboratoire d’Informatique de Grenoble. About me and my university ● The University of Sao Paulo - Ranked number 2 among Latin American universities - Ranked in the 250-300 stratum in the world (UGA is in the 300-350 stratum) (source: Times Higher Education, 2019) ● Faculty at the University of Sao Paulo since 2010, associate professor since 2014 ● My campus is in the city of Sao Carlos, in the countryside of the state of Sao Paulo ● The HDI of Sao Carlos is 0.805 (Brazil is 0.754 and France is 0.897)
  • 4. Deep Learning - From the IEEE top 10 computing trends 2018, Deep Learning is the number 1 (https://www.computer.org/press-room/2017-news/top-technology-trends-2018); - Not new: most of the techniques are 20, 30, even 50 years old; - Not necessarily deep: some architectures have a single (hidden) layer; - Myth: it is about artificial intelligence, not artificial consciousness.
  • 5. Deep Learning. Specifically, Deep Learning refers to the revival of artificial intelligence (artificial neural networks) due to four factors: 1) lots of data: while a child learns what a dog looks like from three images, a computer demands 3 million images; 2) computing power: 20xx computers have memory and processing power orders of magnitude higher than 19xx computers; GPUs scaled the process even more; 3) algorithmic improvements: gradient descent, back propagation, and architectural innovations amplified the range of possibilities; 4) robust frameworks: Theano, TensorFlow, Keras, and many others made complex parallel math computing accessible.
  • 6. Image classification breakthrough. Large Scale Visual Recognition Challenge (ILSVRC), ImageNet for short. Super-human performance by 2017. Training: 1.2 million images; Validation: 150,000 images; Test: 50,000 images; 1,000 classes.
  • 8. No more feature engineering ● The idea of feature engineering applies to data processing problems that demand feature extraction. This is not always the case; ● Yet, it is still possible to use Artificial Neural Networks with manually extracted features; sometimes it is the only course of action, as in regression problems.
  • 10. Promising results. Soon (or already?) better than human skills: - Computer Vision - Text translation - Text generation - Games: Go, Chess, … - Medicine: heart attack, neurodegenerative diseases, oncology, … Esteva, A. et al.; Dermatologist-level classification of skin cancer with deep neural networks, Nature, 2017: - super-human performance on classifying skin lesions - identified classes still unknown in the literature - 1500+ citations in one year
  • 11. Turing award 2018. Yoshua Bengio, French-Canadian (theoretical backgrounds); Geoffrey Hinton, British-Canadian (back propagation, AlexNet); Yann LeCun, French (convolutional networks and engineering). “For conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.”
  • 12. Context
  • 13. Biology inspiration ● Inspiration only, not simulation. It is not yet fully understood how the brain works.
  • 15. Existential parenthesis. Why neurons? - The universe is made up of sets (for Computer Science, unordered lists without repetition) - In a world of sets, what do smart things do? Answer: they build up functions (or maps, for CS) - What is a function, broadly speaking? Given two sets X and Y, a function defines a mapping between them: f: X → Y. X and Y can be anything: objects, emotions, concepts, abstractions, skills, music, … - To do that, nature (evolution) designed specialized cells, named neurons. A very big bunch of neurons is able to build functions! Compared to numeric math sets, smart beings deal with sets whose elements cannot be completely foreseen or exhaustively enumerated. Math function: f: ℕ → ℝ; for example f(x) = xe. Open function: f: {all possible dogs} → {all known dog breeds}, such as Akita, Alaskan husky, Bichon Frisé, Border Terrier, Boxer, Brazilian Mastiff, …? First key concept: 1) An Artificial Neural Network is a function.
  • 18. Principle - artificial neuron
  • 20. Principle - artificial neuron. In matrix form (very important): ● 1 input: a 1 x n feature vector; ● 1 processing neuron: an n x 1 weight vector; ● j inputs: 1 x n feature vectors stacked as the rows of a matrix I of size j x n; ● k processing neurons: n x 1 weight vectors placed as the columns of a matrix M of size n x k. Now, remember the matrix dot product and the neuron principle.
  • 23. Principle - artificial neuron. Suppose: ● j input feature vectors of size 1 x n, with n = 10, so I is a j x 10 matrix (10 features per input); ● k = 5 processing neurons of size 10 x 1, so M is a 10 x 5 matrix (10 weights per neuron, 5 neurons). The processing of the j 1 x 10 vectors by the 5 neurons corresponds to the dot product $I_{j \times 10} \cdot M_{10 \times 5}$. The output is a matrix O corresponding to j new vectors, each with 5 transformed features, that is, $O_{j \times 5}$.
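The shapes above can be checked with a few lines of plain Python. This is only a sketch under assumptions: the batch size `j = 3`, the random values, and the helper name `matmul` are illustrative and not from the slides; biases and activations are left out, as in the slides.

```python
import random

def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[t] * b[t][c] for t in range(inner)) for c in range(cols)]
            for row in a]

random.seed(0)
j, n, k = 3, 10, 5  # j is a hypothetical batch size; n and k follow the slides

# I: j input vectors, one per row, 10 features each (j x 10)
I = [[random.random() for _ in range(n)] for _ in range(j)]
# M: 5 neurons, one per column, 10 weights each (10 x 5)
M = [[random.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]

# O: j output vectors with 5 transformed features each (j x 5)
O = matmul(I, M)
print(len(O), len(O[0]))  # prints: 3 5
```

Each entry `O[r][c]` is the weighted sum computed by neuron `c` for input `r`, which is why one matrix product processes all inputs through all neurons at once.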
  • 25. Supervised learning. Training: I know the answer → Learning, building model. Testing: I do not know the answer → Evaluation, using model.
  • 26. After all, an optimization problem. The parameters (mostly, weights) are what is optimized. *Biases omitted for simplicity.
  • 30. Principle - artificial neuron. Recall the setting: j input feature vectors with n = 10 features (I is a j x 10 matrix) and k = 5 processing neurons (M is a 10 x 5 matrix of weights). The matrix M is the object of the optimization: what weights lead to the desired output?
  • 31. After all, an optimization problem. Second key concept: 1) An Artificial Neural Network is a function; 2) The training of an ANN is an optimization problem.
  • 32. After all, an optimization problem. Attention: - This presentation is only about the basics; in fact, it covers concepts of Artificial Neural Networks; - When feature extraction is involved, as in image and audio processing, the process is much more complex; - Actually, the depth of "Deep Learning" has to do with these more complex problems; - Nevertheless, the principles are the same.
  • 33. Overall (theoretical) process 1. Specify a structure and a loss function to guide the optimization; 2. Feed forward with matrix multiplication and non-linear activations; 3. while (not satisfactory results) a. Compute the parameters’ adjustments using gradient descent; b. The network backpropagates using the multivariate chain rule; c. Update the weights accordingly; d. Classification/Regression.
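The loop above can be sketched in plain Python for the simplest possible case. Everything specific here is an assumption for illustration: a single linear neuron (no hidden layer, no non-linear activation, no bias), a mean squared error loss, a toy target function, and a hand-picked learning rate. With one layer, the chain rule reduces to a single derivative, so the backpropagation step is trivial.

```python
# Toy data: the target is y = 2*x1 - 3*x2 (a hypothetical function to learn).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [0.5, -1.0]]
Y = [2 * a - 3 * b for a, b in X]

def forward(w, x):
    """Feed forward: weighted sum of the inputs (one linear neuron)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w):
    """Mean squared error over the training set."""
    return sum((forward(w, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

def gradient(w):
    """Partial derivatives of the loss with respect to each weight."""
    grad = [0.0] * len(w)
    for x, y in zip(X, Y):
        err = forward(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(X)
    return grad

w = [0.0, 0.0]        # initial weights
learning_rate = 0.05  # assumed value, small enough for this data
steps = 0
while loss(w) > 1e-8 and steps < 10000:  # "while not satisfactory results"
    g = gradient(w)
    w = [wi - learning_rate * gi for wi, gi in zip(w, g)]  # update the weights
    steps += 1

print([round(wi, 3) for wi in w])  # close to [2.0, -3.0]
```

In a real network the gradient is computed layer by layer by a framework rather than written by hand, but the structure of the loop is the same.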
  • 35. Error landscape ● The set of parameters defines an error landscape ● We want to move along this landscape to find the best minimum (preferably the global minimum)
  • 36. How to converge to the proper parameters? The standard solution is the gradient descent algorithm: 1. Calculate the partial derivative of the loss with respect to W; 2. Backpropagate, updating W; 3. Use the chain rule to propagate through all the layers. The learning rate states how much to move in the direction contrary to the gradient. (Figure: loss as a function of W.)
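The formulas on these slides were images and did not survive in the transcript. The standard gradient descent update and chain rule are given below as a reminder; the symbols $\eta$ (learning rate), $L$ (loss), and $a^{(l)}$ (output of layer $l$) are notation assumed here, not taken from the slides.

```latex
% Gradient descent update: move W against the gradient, scaled by the learning rate
W \leftarrow W - \eta \, \frac{\partial L}{\partial W}

% Chain rule for a two-layer network: the gradient for the first layer's weights
% is obtained by propagating the derivative of the loss back through layer 2
\frac{\partial L}{\partial W^{(1)}}
  = \frac{\partial L}{\partial a^{(2)}}
    \cdot \frac{\partial a^{(2)}}{\partial a^{(1)}}
    \cdot \frac{\partial a^{(1)}}{\partial W^{(1)}}
```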
  • 38. How to converge to the proper parameters? Third key concept: 1) An Artificial Neural Network is a function; 2) The training of an ANN is an optimization problem; 3) Gradient descent is the ultimate method to move along the error landscape.
  • 40. Different gradient descent methods ● There are many gradient descent-based optimizers; ● They vary with respect to the speed of convergence, processing cost, learning rate, and decay factor; ● Adadelta is a robust and widely used choice; ○ It is stochastic, hence more robust against local minima. Adadelta uses an adaptive learning rate; the closer to a minimum, the smaller the learning rate.
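To make the idea of an adaptive learning rate concrete, here is a minimal single-variable sketch of the Adadelta update rule. The function name `adadelta_minimize`, the toy loss f(x) = x², the starting point, and the constants `rho = 0.95` and `eps = 1e-6` are assumptions chosen for illustration; a framework implementation works on whole weight tensors and mini-batches.

```python
import math

def adadelta_minimize(grad, x, steps=200, rho=0.95, eps=1e-6):
    """Run Adadelta on a single variable; grad(x) returns the derivative at x."""
    avg_sq_grad = 0.0  # running average of squared gradients
    avg_sq_step = 0.0  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # The effective step size is a ratio of two running averages,
        # so no global learning rate has to be chosen by hand.
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step * step
        x += step
    return x

# Toy loss f(x) = x**2, whose derivative is 2*x; the minimum is at x = 0.
x_final = adadelta_minimize(lambda x: 2 * x, x=5.0)
print(x_final)  # smaller in magnitude than the starting point 5.0
```

With these constants the first steps are tiny, so many iterations are needed before the variable gets close to the minimum; the sketch only shows the direction and the mechanics of the update.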
  • 42. Millions of parameters ● Warning: even for mid-sized networks, the number of weights sums up to thousands, even millions; ● This is responsible for the high computational cost of deep learning.
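A quick count shows how fast the number of parameters grows in fully connected layers. The helper `count_parameters` and the second layer configuration are hypothetical, picked only to illustrate the arithmetic: each layer contributes inputs × outputs weights plus one bias per output.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# The 10-feature, 5-neuron layer used earlier: 10*5 weights + 5 biases
print(count_parameters([10, 5]))              # prints: 55

# A hypothetical small image classifier: 784 inputs, two hidden layers, 10 classes
print(count_parameters([784, 512, 512, 10]))  # prints: 669706
```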
  • 44. Deep Learning Frameworks. Implementing all these concepts from scratch is very hard (really!). To ease the process, academic and industrial players built frameworks that: - make linear algebra expressions as simple as scalar algebra expressions; - calculate partial derivatives automatically (one line of code); - perform back propagation; - distribute the computation over GPUs. Main frameworks: Theano, Google TensorFlow, Microsoft Cognitive Toolkit, PyTorch, Keras, Apache MXNet, NVIDIA Caffe, Chainer, and others.
  • 45. Deep Learning Frameworks. Oh, it is so easy! - NO! You still have to: - model the data input and output: a great deal of numbers organized in multi-dimensional arrays; - model the layers in terms of size and connectivity (matrix dimensionality will give you headaches); - implement the neurons’ computations; - implement the updating scheme; - get used to symbolic coding.
  • 46. How to choose a Framework ● You are a PhD student, or postdoc, working on DL itself: Theano, TensorFlow, Torch ● You want to use DL only to get features: Keras, Caffe ● You work in industry: TensorFlow, Caffe ● You started your two-month internship: Keras, Caffe ● You want to give practical assignments to your students: Keras, Caffe ● You are curious about deep learning: Caffe ● You don’t even know Python: Keras, Torch. Source: https://project.inria.fr/deeplearning/files/2016/05/DLFrameworks.pdf
  • 48. Pitfalls ● Proper pre-processing; ● Optimizing the structure can be a never-ending process; ● Preventing over- or underfitting; ● Getting it to converge (to a high-quality local minimum); ● Making sure you have the right loss function; ● Doing data augmentation correctly. Time-consuming ● Testing a single idea can take a week or more; ● Preprocessing large data takes a long time; ● Symbolic programming is tough; ● Hyper-parameters + process variations ⇒ the number of possible settings explodes.
  • 49. Further concepts beyond the introduction ● Regularization (L1, L2, ...) ● Cost (Loss) Function (exponential, cross-entropy, Hellinger, …) ● Activation Function (ReLU, hyperbolic tangent, sigmoid, …) ● Output layer (softmax, linear, …) ● Linear algebra using broadcasting ● Specialized layers (convolution, pooling, embedding, ...) ● Dropout, masking, padding, ...
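As a small taste of two items in this list, the activation functions and the softmax output layer can be written in a few lines of plain Python. The function names are chosen here for illustration; frameworks ship their own vectorized versions.

```python
import math

def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))     # prints: 0.0 3.0
print(sigmoid(0.0))              # prints: 0.5
print(math.tanh(0.0))            # hyperbolic tangent from the standard library; prints: 0.0
print(softmax([1.0, 2.0, 3.0]))  # three probabilities, largest for the last score
```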
  • 50. The DL zoo: https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464
  • 51. Key concepts: 1) An Artificial Neural Network is a function; 2) The training of an ANN is an optimization problem; 3) Gradient descent is the ultimate method to move along the error landscape.
  • 52. That’s it for now