https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
4. Outline
1. Supervised learning: regression/classification
2. Single neuron models (perceptrons)
a. Linear regression
b. Logistic regression
c. Multiple outputs and softmax regression
3. Limitations of the perceptron
12. Regression vs Classification
Depending on the type of target y we get:
● Regression: y ∈ ℝ^N is continuous (e.g. temperatures y = {19°, 23.2°, 22.8°})
● Classification: y is discrete (e.g. y = {"dog", "cat", "ostrich"}).
15. Linear Regression (e.g. 1-D input, 1-D output)
y = w · x + b
Training a model means learning the parameters w and b from data.
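As a sketch of what learning w and b from data looks like in practice, here is a least-squares fit on made-up 1-D data (the "true" parameters w = 2 and b = 1 are assumptions of this toy example, not from the slides):

```python
import numpy as np

# Toy 1-D data: y roughly follows 2*x + 1, plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Closed-form least-squares fit of the line y = w*x + b
w, b = np.polyfit(x, y, deg=1)
print(f"learned w = {w:.2f}, b = {b:.2f}")  # close to the true 2 and 1
```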
16. Linear Regression (M-D input)
Input data can also be M-dimensional, with vector x:
y = wᵀ · x + b = w1·x1 + w2·x2 + w3·x3 + … + wM·xM + b
e.g. we want to predict the price of a house (y) based on:
x1 = square meters (sqm)
x2, x3 = location (lat, lon)
x4 = year of construction (yoc)
y = price = w1·(sqm) + w2·(lat) + w3·(lon) + w4·(yoc) + b
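The house-price prediction above reduces to a dot product plus a bias; the weight values and the example house below are made up purely for illustration:

```python
import numpy as np

# Hypothetical learned parameters for the 4 features (sqm, lat, lon, yoc)
w = np.array([3500.0, -120.0, 80.0, 150.0])
b = -250000.0

# One house: 90 sqm, located at (41.39, 2.16), built in 2001
x = np.array([90.0, 41.39, 2.16, 2001.0])

price = w @ x + b   # w^T x + b
print(price)
```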
20. Multi-class Classification
● Classification: y is discrete (e.g. y = {"dog", "cat", "ostrich"}).
○ Classes are often coded as one-hot vectors (each class corresponds to a different dimension of the output space):
One-hot representations: [1,0,0], [0,1,0], [0,0,1]
Perronin, F., CVPR Tutorial on LSVR @ CVPR'14, Output embedding for LSVR
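One-hot coding can be sketched in a few lines; the class list and the helper name below are illustrative, not from the slides:

```python
import numpy as np

classes = ["dog", "cat", "ostrich"]

def one_hot(label, classes):
    """Return a vector with a 1 in the dimension of `label`, 0 elsewhere."""
    v = np.zeros(len(classes))
    v[classes.index(label)] = 1.0
    return v

print(one_hot("cat", classes))   # [0. 1. 0.]
```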
21. Discussion
Should you treat each of these three problems as a regression problem or as a classification problem?
● Predicting whether the stock price of a company will increase tomorrow.
● Predicting the number of copies of a music album that will be sold next month.
● Predicting the gender of a person from his/her handwriting style.
23. Single neuron model (perceptron)
The perceptron is seen as an analogy to a biological neuron.
Biological neurons fire an impulse once the sum of all inputs is over a threshold.
The perceptron acts like a switch (learn how in the next slides...).
24. Single Neuron Model (Perceptron)
The perceptron can address both regression and classification problems, depending on the chosen activation function.
26. Single neuron model (perceptron)
The weights and bias are the parameters that define the neuron's behavior; they must be estimated from data.
27. Single neuron model (perceptron)
The output y is derived from a sum of the weighted inputs plus a bias term.
29. Single neuron model: Linear Regression
The perceptron solves linear regression problems when f(a) = a (the identity activation).
32. Single neuron model: Logistic Regression
The model becomes more interesting when the activation function f(a) is not the identity:
33. Single neuron model: Logistic Regression
The sigmoid function σ(x), or logistic curve, maps any input x into the interval (0, 1):
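A minimal sketch of the sigmoid and its saturating behavior at the two ends of its range:

```python
import numpy as np

def sigmoid(a):
    """Logistic function: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

print(sigmoid(0.0))    # 0.5, the midpoint
print(sigmoid(10.0))   # close to 1 for large positive inputs
print(sigmoid(-10.0))  # close to 0 for large negative inputs
```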
34. Single neuron model: Logistic Regression
The perceptron is suitable for classification problems when f(a) = σ(a) (the sigmoid). The raw pre-activation values a are called logits.
35. Single neuron model: Binary Classification
For classification, the regressed values should be collapsed into 0 and 1 to quantize the confidence of the predictions ("probabilities"), by setting a threshold (thr).
36. Single neuron model: Binary Classification
Setting a threshold (thr) at the output of the perceptron allows solving classification problems between two classes (binary):
y > thr → class 1 (e.g. green)
y ≤ thr → class 2 (e.g. red)
37. Single neuron model: Binary Classification
The classification threshold can be adjusted based on the desired precision-recall trade-off:
● High threshold → high precision, low recall for class green.
● Low threshold → low precision, high recall for class green.
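The trade-off can be illustrated with made-up scores and labels (the scores, labels and `precision_recall` helper below are all hypothetical, not from the slides):

```python
import numpy as np

# Hypothetical logistic outputs and true labels (1 = "green")
scores = np.array([0.95, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1,    1,    1,   0,   1,   0,   0,   0  ])

def precision_recall(scores, labels, thr):
    """Precision and recall for the positive class at threshold `thr`."""
    pred = scores > thr
    tp = np.sum(pred & (labels == 1))          # true positives
    precision = tp / max(np.sum(pred), 1)      # of predicted positives, how many are right
    recall = tp / np.sum(labels == 1)          # of actual positives, how many are found
    return precision, recall

# High threshold: only confident predictions -> high precision, low recall
print(precision_recall(scores, labels, 0.8))
# Low threshold: almost everything predicted positive -> lower precision, high recall
print(precision_recall(scores, labels, 0.25))
```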
39. Softmax regression: Binary case
Softmax regression introduces a normalization factor so that the output probabilities sum up to 1.
J. Alammar, "A visual and interactive guide to the Basics of Neural Networks" (2016)
40. Softmax regression: Multiclass (N classes)
Multiple classes can be predicted by putting many neurons in parallel, each producing its own score for one of the N possible classes (e.g. from the raw pixels of an unrolled image):
0.3 "dog"
0.08 "cat"
0.6 "whatever"
The normalization factor ensures that the probabilities sum up to 1.
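A minimal softmax sketch; the subtraction of the maximum logit is a standard numerical-stability trick, not something shown on the slides:

```python
import numpy as np

def softmax(logits):
    """Exponentiate and normalize so outputs are positive and sum to 1."""
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

# Three parallel neuron outputs (logits) for "dog", "cat", "whatever"
p = softmax(np.array([2.0, 0.5, 3.0]))
print(p, p.sum())  # three probabilities summing to 1
```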
45. Exercise
Consider a binary classifier implemented with a single neuron modelled by two weights w1 = 0.2 and w2 = 0.8 and a bias b = -1. Consider the activation function to be a sigmoid f(x) = 1 / (1 + e^-x).
a) Draw a scheme of the model.
b) Compute the output of the logistic regressor for a given input x = [1, 1].
c) Considering a classification threshold of y_th = 0.9 (y > 0.9 for class A, and y ≤ 0.9 for class B), which class would be predicted for the considered input x = [1, 1]?
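A quick numerical check of part (b): the pre-activation is 0.2·1 + 0.8·1 - 1 = 0, so the output is σ(0) = 0.5 (the threshold of 0.9 used below for part (c) is as stated in the exercise):

```python
import numpy as np

w = np.array([0.2, 0.8])
x = np.array([1.0, 1.0])
b = -1.0

a = w @ x + b                     # pre-activation: 0.2 + 0.8 - 1 = 0
y = 1.0 / (1.0 + np.exp(-a))      # sigmoid output
print(y)                          # 0.5
print("class A" if y > 0.9 else "class B")   # 0.5 <= 0.9 -> class B
```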
47. Limitations of the Perceptron
Minsky, Marvin, and Seymour A. Papert. Perceptrons: An Introduction to Computational Geometry. 1969.
48. Limitations of the Perceptron
[Figure: 2-D input space with data points from Class 0 and Class 1, separated by a line.]
The parameters of the line are found from the training data (the learning stage).
49. Limitations of the Perceptron
XOR logic table:
Input 1  Input 2  Desired Output
0        0        0
0        1        1
1        0        1
1        1        0
[Figure: the four XOR points in the (Input 1, Input 2) plane; no single line separates the two classes.]
Data might be non-linearly separable → one single neuron is not enough.
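One way to convince yourself that no single neuron fits the XOR table is a brute-force search over a (coarse, arbitrary) grid of weights and biases; a hard-threshold unit never classifies all four points correctly:

```python
import numpy as np
from itertools import product

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])  # XOR targets

# Try every (w1, w2, b) on a coarse grid; record the best accuracy
best = 0.0
for w1, w2, b in product(np.linspace(-2, 2, 21), repeat=3):
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
    best = max(best, np.mean(pred == t))
print(best)  # never reaches 1.0: at most 3 of 4 points correct -> 0.75
```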
50. Limitations of the Perceptron
Perceptrons can only produce linear decision boundaries. Real-world problems often need non-linear boundaries:
● Images
● Audio
● Text
51. Limitations of the Perceptron
What can we do?
1. Use a non-linear classifier
○ Decision trees (and forests)
○ K nearest neighbours
2. Engineer a suitable representation
○ One in which features are more linearly separable
○ Then use a linear model
3. Engineer a kernel
○ Design a kernel K(x1, x2)
○ Use kernel methods (e.g. SVM)
4. Learn a suitable representation space from the data
○ Deep learning, deep neural networks
○ Boosted cascade classifiers like Viola-Jones also take this approach
52. Discussion
One single perceptron can solve classification problems imposing non-linear boundaries...
● Always.
● Only if the activation function is not linear.
● Only if the loss function is the binary cross-entropy.
● Never.
53. Prepare the next lecture...
How to solve the XOR problem with two perceptrons?
Rumelhart, D. E., McClelland, J. L. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations.
54. Suggested readings for the next lecture
Neural networks as universal approximators:
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators." Neural Networks 2, no. 5 (1989): 359-366.