The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)

1
DEEP LEARNING WORKSHOP
Dublin City University
27-28 April 2017
Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politecnica de Catalunya
Technical University of Catalonia
The Perceptron
Day 1 Lecture 1
#InsightDL2017

4
Outline
1. Supervised learning: Regression/Classification
2. Linear regression
3. Logistic regression
4. The Perceptron
5. Multi-class classification

5
Machine Learning techniques
We can categorize three types of learning procedures:
1. Supervised Learning: y = ƒ(x)
2. Unsupervised Learning: ƒ(x)
3. Reinforcement Learning: y = ƒ(x)
We have a labeled dataset with pairs (x, y), e.g.
classify a signal window as containing speech or not:
x1 = [x(1), x(2), …, x(T)] y1 = “no”
x2 = [x(T+1), …, x(2T)] y2 = “yes”
x3 = [x(2T+1), …, x(3T)] y3 = “yes”
...
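The windowing above can be sketched in Python as follows. The signal values, the window length T, and the labels are made up for illustration; only the idea of pairing each window x_i with a label y_i comes from the slide.

```python
def make_windows(signal, T):
    """Split a 1D signal into consecutive, non-overlapping windows of length T."""
    n_windows = len(signal) // T  # any incomplete trailing window is dropped
    return [signal[i * T:(i + 1) * T] for i in range(n_windows)]

# Hypothetical signal and labels, for illustration only.
signal = [0.0, 0.1, 0.9, 0.8, 0.7, 0.9]
labels = ["no", "yes", "yes"]

windows = make_windows(signal, T=2)
dataset = list(zip(windows, labels))  # labeled pairs (x_i, y_i)
```

Each element of `dataset` is one training pair: a window of T samples and its label.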

6
Supervised Learning
Build a function: y = ƒ(x), x ∈ ℝᵐ, y ∈ ℝⁿ
Depending on the type of outcome we get…
● Regression: y is continuous (e.g. temperature samples y = {19º, 23º, 22º})
● Classification: y is discrete (e.g. y = {1, 2, 5, 2, 2}).
○ Beware! These are unordered categories, not numerically meaningful
outputs: e.g. code[1] = “dog”, code[2] = “cat”, code[5] = “ostrich”, ...

7
Regression motivation
Text to Speech: Textual features → Spectrum of speech (many coefficients)
[Figure: TXT → designed feature extraction (“hand-crafted” features ft 1, ft 2, ft 3) → regression module → s1, s2, s3 → wavegen]

8
Classification motivation
Automatic Speech Recognition: Acoustic features → Textual transcription (words)
[Figure: designed feature extraction (“hand-crafted” features s1, s2, s3) → classifier → “hola que tal”]

9
What “deep-models” means nowadays
Learn the representations as well, not only the final mapping → end2end
The model maps raw inputs to raw outputs, with no intermediate blocks.
[Figure: end2end model with learned extraction and classifier, output “hola que tal”]

10
Linear Regression
y = w · x + b
[Figure: plot of the linear model against x]
Training a model means learning
parameters w and b from data.
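As a minimal sketch of what "learning w and b from data" can look like in the 1D case, the snippet below uses the closed-form ordinary least-squares solution. The slide does not say which training method is used, so least squares is an assumption here, and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = w * x + b to 1D data by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of (x, y) divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)  # recovers w = 2 and b = 1 for this data
```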

11
Linear Regression
y = w · x + b
The input variable x is 1D in this example.

12
Linear Regression
Input data can also be M-dimensional with vector x:
y = wᵀ · x + b = w1·x1 + w2·x2 + w3·x3 + … + wM·xM + b
e.g. we want to predict the price of a house (y) based on:
x1 = square-meters (sqm)
x2,3 = location (lat, lon)
x4 = year of construction (yoc)
y = price = w1·(sqm) + w2·(lat) + w3·(lon) + w4·(yoc) + b
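The house-price formula can be written as a dot product plus a bias. In this sketch the weights, the bias, and the example house are all hypothetical values chosen for illustration; a trained model would learn w and b from data.

```python
def predict(w, x, b):
    """Linear model y = w^T x + b for M-dimensional w and x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical parameters, not learned from any real dataset.
w = [3000.0, 100.0, -50.0, 10.0]  # weights for sqm, lat, lon, yoc
b = 5000.0

# One hypothetical house: square meters, latitude, longitude, year of construction.
x = [80.0, 41.4, 2.2, 1990.0]

price = predict(w, x, b)
```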

13
Logistic Regression for Classification
For classification, regressed values must be bounded between 0 and 1 to
represent probabilities.
The sigmoid function maps any real input x to a value between 0 and 1:
σ(x) = 1 / (1 + e^(−x))
[Figure: linear regressor followed by a sigmoid]
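A minimal Python sketch of a linear regressor followed by a sigmoid is shown below. The weights and input are hypothetical, and the plain `math.exp` form is used for readability; it would overflow for very negative inputs, which a production implementation would need to handle.

```python
import math

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)); the result lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))

def logistic_regression(w, x, b):
    """Linear regressor w^T x + b followed by a sigmoid."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(a)

# Hypothetical weights and input: a = 2*1 + (-1)*2 + 0 = 0, so p = 0.5.
p = logistic_regression(w=[2.0, -1.0], x=[1.0, 2.0], b=0.0)
```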

14
Logistic Regression for Classification
Setting a threshold (thr) at the output of the perceptron allows solving classification
problems between two classes (binary).

15
The Perceptron (Neuron)
The Perceptron can represent both linear & logistic regression:
if ƒ(a)=a → linear
if ƒ(a)=sigmoid(a) → logistic
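The two cases above can be sketched with a single function that takes the activation ƒ as an argument. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def perceptron(x, w, b, f):
    """Weighted sum of the inputs plus a bias, passed through activation f."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(a)

def identity(a):
    """f(a) = a, which turns the perceptron into a linear regressor."""
    return a

def sigmoid(a):
    """f(a) = sigmoid(a), which turns the perceptron into a logistic regressor."""
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical input and parameters: a = 0.5*1 + (-0.25)*2 + 1 = 1.0
x, w, b = [1.0, 2.0], [0.5, -0.25], 1.0

y_linear = perceptron(x, w, b, identity)   # linear regression output
y_logistic = perceptron(x, w, b, sigmoid)  # logistic regression output
```

Only the activation changes between the two calls; the weighted sum and the bias are shared.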

16
The output is derived by a sum of the weighted inputs plus a bias term.

17
Weights and bias are the parameters that define the behavior (must be learned).

18
The Perceptron is seen as an analogy to a biological neuron.
Biological neurons fire an impulse once the sum of all inputs is over a threshold.
The sigmoid emulates the thresholding behavior → acting like a switch.
Figure credit: Introduction to AI

19
Binary classification with one neuron
Setting a threshold (thr) at the output of the perceptron allows solving classification
problems between two classes (binary):
y > thr → class 1 (e.g. YES)
y < thr → class 2 (e.g. NO)
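The thresholding rule can be sketched as below. The default threshold of 0.5 and the example outputs are assumptions; the slide also leaves the case y = thr undefined, so assigning it to the second class is a choice made only for this sketch.

```python
def classify(y, thr=0.5):
    """Map a perceptron output y to one of two classes using a threshold."""
    # y == thr is not covered by the slide; here it falls into "NO".
    return "YES" if y > thr else "NO"

# Hypothetical perceptron outputs.
predictions = [classify(y) for y in [0.9, 0.2, 0.7]]
```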

20
Binary classification with two neurons
Probability estimations for each class can also be obtained by softmax
normalization on the output of two neurons, one specialised for each class.
J. Alammar, “A visual and interactive guide to the Basics of Neural Networks” (2016)
[Figure: softmax normalization]

21
Binary classification with two neurons
The normalization factor ensures that the probabilities sum up to 1.
J. Alammar, “A visual and interactive guide to the Basics of Neural Networks” (2016)
[Figure: softmax normalization]
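A minimal sketch of softmax normalization over the outputs of two neurons follows. The two scores are hypothetical; subtracting the maximum before exponentiating is a common numerical safeguard that does not change the result.

```python
import math

def softmax(scores):
    """Exponentiate each score and divide by the sum of all exponentials."""
    m = max(scores)  # shifting by the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)  # the normalization factor
    return [e / total for e in exps]

# Hypothetical outputs of the two neurons, one per class.
p_class1, p_class2 = softmax([2.0, 0.0])
```

The two returned values are non-negative and add up to 1, so they can be read as class probabilities.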

22
Three-class classification with 3 neurons
TensorFlow, “MNIST for ML beginners”

25
Multi-class classification
Multiple classes can be predicted by putting many neurons in parallel, each
producing the output for one of the N possible classes.
[Figure: the raw pixels of the unrolled image feed N parallel neurons followed by a softmax function; example outputs: 0.3 “dog”, 0.08 “cat”, 0.6 “whatever”]
Normalization factor: remember, we want a probability distribution at the
output, so all output probabilities sum up to 1.
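The parallel-neuron arrangement can be sketched as one weight vector and one bias per class, followed by a softmax. The weights, the two-value "image", and the class names below are hypothetical and serve only to make the example runnable.

```python
import math

def softmax(scores):
    """Normalize scores into probabilities that sum up to 1."""
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, W, b, class_names):
    """One neuron per class: score_k = W[k] . x + b[k], then softmax."""
    scores = [sum(wi * xi for wi, xi in zip(w_k, x)) + b_k
              for w_k, b_k in zip(W, b)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)  # most probable class
    return class_names[best], probs

# Hypothetical 3-class model applied to an unrolled 2-pixel "image".
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
label, probs = classify([0.2, 0.9], W, b, ["dog", "cat", "whatever"])
```

Here the second neuron has the highest score, so the predicted label is the second class name.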

27
Thanks! Q&A?
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi
/ProfessorXavi
#InsightDL2017
