[course site]
Javier Ruiz Hidalgo
javier.ruiz@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Technical University of Catalonia
Loss functions
Day 3 Lecture 1
#DLUPC
Me
● Javier Ruiz Hidalgo
○ Email: j.ruiz@upc.edu
○ Office: UPC, Campus Nord, D5-008
● Teaching experience
○ Basic signal processing
○ Project based
○ Image processing & computer vision
● Research experience
○ Master's on hierarchical image representations from UEA (UK)
○ PhD on video coding from UPC (Spain)
○ Interests in image & video coding, 3D analysis and super-resolution
2
Acknowledgements
3
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Xavier Giró
xavier.giro@upc.edu
Associate Professor
ETSETB TelecomBCN
Universitat Politècnica de Catalunya
Motivation
4
Model
● 3-layer neural network (2 hidden layers)
● Tanh units (activation function)
● 512-512-10
● Softmax on top layer
● Cross entropy loss
Input: 28 × 28 images (784 pixels)

Layer   #Weights    #Biases   Total
1       784 × 512   512       401,920
2       512 × 512   512       262,656
3       512 × 10    10        5,130
Total                         669,706

How do we measure how well these parameters fit the training data? → the loss function
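As a quick check, the parameter counts in this table can be reproduced with a short Python sketch (the layer sizes are the ones listed above):

```python
# Parameter count for the 784-512-512-10 fully connected network.
layer_sizes = [(784, 512), (512, 512), (512, 10)]

total = 0
for n_in, n_out in layer_sizes:
    weights = n_in * n_out  # one weight per input-output connection
    biases = n_out          # one bias per output unit
    total += weights + biases
    print(f"{n_in} x {n_out}: {weights + biases:,} parameters")

print(f"Total: {total:,}")  # 669,706
```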
Outline
● Introduction
○ Definition, properties, training process
● Common types of loss functions
○ Regression
○ Classification
● Regularization
● Example
5
Nomenclature
loss function = cost function = objective function = error function
6
Definition
In a supervised deep learning context, the loss function measures the quality of a particular set of parameters based on how well the output of the network agrees with the ground-truth labels in the training data.
7
Loss function (1)
8
How well does our network perform on the training data?
[Diagram: input → Deep Network (parameters: weights, biases) → output; the output is compared against the labels (ground truth) to produce an error]
Loss function (2)
● The loss function is not intended to measure the overall performance of the network on a validation/test dataset; it is evaluated on the training data.
9
Loss function (3)
● The loss function is used to guide the training process in order to find a set of parameters that reduce its value.
10
Training process
Stochastic gradient descent
● Find a set of parameters which make the loss as small
as possible.
● Change the parameters at a rate determined by the partial derivatives of the loss function:
$\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}$
11
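A minimal NumPy sketch of this update rule, using a toy quadratic loss in place of a real network (theta, grad_fn and the learning rate are illustrative placeholders):

```python
import numpy as np

def sgd_step(theta, grad_fn, lr=0.1):
    """One gradient descent step: move the parameters against the
    gradient of the loss, scaled by the learning rate."""
    return theta - lr * grad_fn(theta)

# Toy example: minimize L(theta) = ||theta||^2, whose gradient is 2*theta.
theta = np.array([1.0, -2.0])
for _ in range(100):
    theta = sgd_step(theta, lambda t: 2.0 * t)
print(theta)  # very close to [0, 0], the minimum of the loss
```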
Properties (1)
● Minimum (0 value) when the
output of the network is equal
to the ground truth data.
● Increases in value when the output differs from the ground truth.
12
[Plot: loss vs. epoch, decreasing as the network output goes from different to similar to the ground truth]
Properties (2)
13
● Ideally → convex function
Properties (2)
14
● In reality → so many parameters (on the order of millions) that the loss is not convex
Properties (3)
15
● Varies smoothly with changes in the output
○ Better gradients for gradient descent
○ Easy to compute how small changes in the parameters lead to an improvement in the loss
Assumptions
● For backpropagation to work:
○ The loss function can be written as an average over the losses of individual training examples (the empirical risk):
$L = \frac{1}{n} \sum_{i=1}^{n} \ell\left(f_\theta(x_i), y_i\right)$
○ The loss function can be written as a function of the output activations of the neural network.
16
Source: Neural Networks and Deep Learning
Outline
● Introduction
○ Definition, properties, training process
● Common types of loss functions
○ Regression
○ Classification
● Regularization
● Example
17
Common types of loss functions (1)
● Loss functions depend on the type of task:
○ Regression: the network predicts continuous,
numeric variables
■ Example: length of fish in images, price of a house
■ Losses: absolute value, square error
18
18
19
Loss functions for regression
L1 distance / absolute value: $\sum_i |y_i - f_\theta(x_i)|$
→ lighter computation (only |·|)
L2 distance / Euclidean distance / square error: $\sum_i (y_i - f_\theta(x_i))^2$
→ heavier computation (square)
20
Loss functions for regression
L1 distance / absolute value:
→ less penalty for errors > 1, more penalty for errors < 1
L2 distance / Euclidean distance / square error:
→ more penalty for errors > 1, less penalty for errors < 1
21
Loss functions for regression
L1 distance / absolute value:
→ less sensitive to outliers, i.e. samples where $y_i - f_\theta(x_i)$ is very large
L2 distance / Euclidean distance / square error:
→ more sensitive to outliers, i.e. samples where $y_i - f_\theta(x_i)$ is very large
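A short NumPy sketch contrasting the two losses on the same predictions; the values are made up so that the last prediction is an outlier:

```python
import numpy as np

def l1_loss(y, y_pred):
    """Sum of absolute errors (L1 / absolute value)."""
    return np.sum(np.abs(y - y_pred))

def l2_loss(y, y_pred):
    """Sum of squared errors (L2 / square error)."""
    return np.sum((y - y_pred) ** 2)

y = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 13.0])  # last prediction is an outlier

print(l1_loss(y, y_pred))  # 10.5   -> the outlier contributes linearly
print(l2_loss(y, y_pred))  # 100.25 -> the outlier dominates the loss
```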
Outline
● Introduction
○ Definition, properties, training process
● Common types of loss functions
○ Regression
○ Classification
● Regularization
● Example
22
Common types of loss functions (2)
● Loss functions depend on the type of task:
○ Classification: the network predicts categorical
variables (fixed number of classes)
■ Example: classify email as spam, classify images
of numbers.
■ Hinge loss, cross-entropy loss
23
Classification (1)
We want the network to classify the input into a
fixed number of classes
24
class “1”
class “2”
class “3”
Classification (2)
● Each input can have only one label
○ One prediction per output class
■ The network will have “k” outputs (number of classes)
25
[Diagram: input → Network → output scores / logits, one per class: 0.1 (class “1”), 2 (class “2”), 1 (class “3”)]
Classification (3)
● How can we create a loss function to improve the scores?
○ Encode the labels (the ground truth of the data) as a vector → One-hot encoding
○ Non-probabilistic interpretation → hinge loss
○ Probabilistic interpretation: need to transform the scores into a probability
function → Softmax
26
One-hot encoding
● Transform each label into a vector (with only 1 and 0)
○ Length equal to the total number of classes “k”
○ Value of 1 for the correct class and 0 elsewhere
27
class “1” → [1, 0, 0]
class “2” → [0, 1, 0]
class “3” → [0, 0, 1]
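A minimal one-hot encoding sketch in NumPy (class indices 0, 1, 2 stand for class “1”, “2”, “3”):

```python
import numpy as np

def one_hot(label, k):
    """Return a length-k vector with 1 at the label index, 0 elsewhere."""
    v = np.zeros(k)
    v[label] = 1.0
    return v

print(one_hot(0, 3))  # [1. 0. 0.] -> class "1"
print(one_hot(1, 3))  # [0. 1. 0.] -> class "2"
print(one_hot(2, 3))  # [0. 0. 1.] -> class "3"
```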
Softmax (1)
● Convert scores into confidences/probabilities
○ From 0.0 to 1.0
○ Probability for all classes adds to 1.0
28
[Diagram: input → Network → scores / logits (0.1, 2, 1) → probabilities (0.1, 0.7, 0.2)]
Softmax (2)
● Softmax function
29
● Softmax function: $S(l_k) = \frac{e^{l_k}}{\sum_j e^{l_j}}$, where $l_k$ are the scores (logits)
[Diagram: input → Network → scores / logits (0.1, 2, 1) → softmax → probabilities (0.1, 0.7, 0.2)]
Source: Neural Networks and Deep Learning (softmax)
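A minimal NumPy implementation of the softmax above. For the slide's logits the exact probabilities are about [0.10, 0.66, 0.24], which the slide rounds to [0.1, 0.7, 0.2]:

```python
import numpy as np

def softmax(logits):
    """Convert scores/logits into probabilities that sum to 1.
    Subtracting the max first is a standard numerical-stability trick."""
    z = np.exp(logits - np.max(logits))
    return z / np.sum(z)

scores = np.array([0.1, 2.0, 1.0])  # logits from the slide
probs = softmax(scores)
print(probs)          # ~[0.10, 0.66, 0.24]
print(np.sum(probs))  # 1.0
```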
Cross-entropy loss (1)
31
[Diagram: input → Network → scores / logits $l_k$ (0.1, 2, 1) → softmax probabilities $S(l_k)$ (0.1, 0.7, 0.2), compared against one-hot labels $y_k$ (0, 1, 0)]
● Loss function: $L_i = -\sum_k y_k \log S(l_k)$
● This is the loss contribution from input $x_i$ in a training batch.
Cross-entropy loss (2)
32
[Same diagram: scores / logits $l_k$ → probabilities $S(l_k)$ and one-hot labels $y_k$]
● In a one-hot vector, only one $y_k = 1$ (the rest are 0)!
Cross-entropy loss (2)
33
[Same diagram: scores / logits $l_k$ → probabilities $S(l_k)$ and one-hot labels $y_k$]
● Only the confidence for the ground-truth class is considered: $L_i = -\log S(l_{GT})$
Cross-entropy loss (3)
34
● $-\log$ penalizes small confidence scores for the ground-truth class
[Plot: $-\log x$ on $(0, 1]$, growing without bound as the GT-class confidence $x$ approaches 0]
Cross-entropy loss (3)
● For a set of n inputs:
$L = -\sum_{i=1}^{n} \sum_{k} y_{ik} \log S(l_{ik})$
with one-hot labels $y_{ik}$ and softmax outputs $S(l_{ik})$; averaging over n is also common.
35
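A minimal NumPy sketch of this loss, summed over the batch as in the worked example later (divide by n for the averaged form):

```python
import numpy as np

def cross_entropy(probs, labels):
    """Cross-entropy over a batch.
    probs: (n, k) softmax outputs; labels: (n, k) one-hot ground truth."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(labels * np.log(probs + eps))

probs = np.array([[0.1, 0.7, 0.2]])
labels = np.array([[0.0, 1.0, 0.0]])  # ground truth is class "2"
print(cross_entropy(probs, labels))   # -log(0.7) ~= 0.36
```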
Cross-entropy loss (4)
● In general, cross-entropy loss works better than square error loss:
○ Square error loss usually puts too much emphasis on incorrect outputs.
○ With square error loss, as an output gets closer to either 0.0 or 1.0 the gradients get smaller, so the weight changes shrink and training slows down.
36
Weighted cross-entropy
37
● Dealing with class imbalance
○ Different numbers of samples for different classes
○ Introduce a weighting factor per class (related to the inverse class frequency, or treated as a hyperparameter)
[Images: class “ant”, class “flamingo”, class “panda”]
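A minimal sketch of a weighted cross-entropy in NumPy; the class weights here are made-up values standing in for inverse class frequencies:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy with a per-class weighting factor to counter
    class imbalance (e.g. weights ~ inverse class frequency)."""
    eps = 1e-12
    return -np.sum(class_weights * labels * np.log(probs + eps))

weights = np.array([1.0, 1.0, 5.0])   # hypothetical: class 3 is rare
probs = np.array([[0.2, 0.2, 0.6]])
labels = np.array([[0.0, 0.0, 1.0]])
print(weighted_cross_entropy(probs, labels, weights))  # 5 * -log(0.6) ~= 2.55
```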
Multi-label classification (1)
● Outputs can be matched to more than one label
○ “car”, “automobile”, “motor vehicle” can be applied to a same image of a car.
38Bucak, Serhat Selcuk, Rong Jin, and Anil K. Jain. "Multi-label learning with incomplete class assignments." CVPR 2011.
Multi-label classification (2)
● Outputs can be matched to more than one label
○ “car”, “automobile”, “motor vehicle” can be applied to a same image of a car.
● → directly use a sigmoid at each output independently, instead of a softmax
39
Multi-label classification (3)
● Cross-entropy loss for multi-label classification (binary cross-entropy at each output):
$L_i = -\sum_k \left[ y_k \log \sigma(l_k) + (1 - y_k) \log(1 - \sigma(l_k)) \right]$
40
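A minimal NumPy sketch of this multi-label loss, applying a sigmoid and a binary cross-entropy independently at each output (the logits and labels are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multilabel_cross_entropy(logits, labels):
    """Binary cross-entropy applied independently to each output,
    so several labels can be active at the same time."""
    eps = 1e-12
    p = sigmoid(logits)
    bce = -(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    return np.sum(bce, axis=1)

logits = np.array([[2.0, 1.5, -1.0]])  # raw scores for 3 labels
labels = np.array([[1.0, 1.0, 0.0]])   # "car" and "automobile" both active
print(multilabel_cross_entropy(logits, labels))  # ~[0.64]
```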
Outline
● Introduction
○ Definition, properties, training process
● Common types of loss functions
○ Regression
○ Classification
● Regularization
● Example
41
Regularization
● Control the capacity of the network to prevent overfitting
○ L2-regularization (weight decay): add $\lambda \sum_j w_j^2$ to the loss
○ L1-regularization: add $\lambda \sum_j |w_j|$ to the loss
where $\lambda$ is the regularization parameter
42
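A minimal sketch of both penalties in NumPy, added on top of a hypothetical data loss:

```python
import numpy as np

def l2_penalty(weights, lam):
    """Weight decay: lam * sum of squared weights."""
    return lam * np.sum(weights ** 2)

def l1_penalty(weights, lam):
    """L1 regularization: lam * sum of absolute weights."""
    return lam * np.sum(np.abs(weights))

w = np.array([0.5, -1.0, 2.0])
data_loss = 0.8  # hypothetical loss computed on the data
print(data_loss + l2_penalty(w, lam=0.01))  # 0.8 + 0.01 * 5.25 = 0.8525
print(data_loss + l1_penalty(w, lam=0.01))  # 0.8 + 0.01 * 3.5  = 0.835
```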
Outline
● Introduction
○ Definition, properties, training process
● Common types of loss functions
○ Regression
○ Classification
● Regularization
● Example
43
Example
44
Source: Why you should use cross-entropy instead of classification error

Ground Truth (one-hot): [1, 0, 0], [0, 1, 0], [0, 0, 1]
Predictions (arg max, same for both sets): [0, 0, 1], [0, 1, 0], [0, 0, 1]

Set A confidences S(l_k): [0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]
→ classification accuracy = 2/3 = 0.66, cross-entropy loss = 4.14

Set B confidences S(l_k): [0.3, 0.3, 0.4], [0.1, 0.7, 0.2], [0.1, 0.2, 0.7]
→ classification accuracy = 2/3 = 0.66, cross-entropy loss = 1.92

Same accuracy in both cases, but cross-entropy distinguishes the poorly calibrated set (A) from the confident, mostly correct one (B).
Example
45
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions.
The class with the maximum confidence is predicted.

Confidences S(l_k): [0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]
Predictions: [0, 0, 1], [0, 1, 0], [0, 0, 1]
Ground Truth: [1, 0, 0], [0, 1, 0], [0, 0, 1]
Example
46
a) Compute the accuracy of the predictions.

Confidences S(l_k): [0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]
Predictions: [0, 0, 1], [0, 1, 0], [0, 0, 1]
Ground Truth: [1, 0, 0], [0, 1, 0], [0, 0, 1]

Class predictions and ground truth can be compared with a confusion matrix:

GT \ Prediction   Class 1   Class 2   Class 3
Class 1           0         0         1
Class 2           0         1         0
Class 3           0         0         1

Accuracy = 2 / 3 = 0.66
Example
47
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions → classification accuracy = 2/3
b) Compute the cross-entropy loss of the predictions → cross-entropy loss = 4.14

Confidences S(l_k): [0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]
Ground Truth y: [1, 0, 0], [0, 1, 0], [0, 0, 1]
Example
48
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions
b) Compute the cross-entropy loss of the predictions

Confidences S(l_k): [0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]
Ground Truth y: [1, 0, 0], [0, 1, 0], [0, 0, 1]
$-\log$ of the GT-class confidence per sample: 2.3, 0.92, 0.92
Example
49
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions
b) Compute the cross-entropy loss of the predictions

Confidences S(l_k): [0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]
Ground Truth y: [1, 0, 0], [0, 1, 0], [0, 0, 1]
$-\log$ of the GT-class confidence per sample: 2.3, 0.92, 0.92
L = 2.3 + 0.92 + 0.92 = 4.14
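Both results can be checked with a few lines of NumPy (a direct transcription of the example above):

```python
import numpy as np

confidences = np.array([[0.1, 0.2, 0.7],
                        [0.3, 0.4, 0.3],
                        [0.3, 0.3, 0.4]])
ground_truth = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]])

# a) Accuracy: the predicted class is the arg max of the confidences.
accuracy = np.mean(np.argmax(confidences, axis=1)
                   == np.argmax(ground_truth, axis=1))
print(accuracy)  # 2/3 ~= 0.66

# b) Cross-entropy: -log of the GT-class confidence, summed over the batch.
loss = -np.sum(ground_truth * np.log(confidences))
print(loss)  # 2.3 + 0.92 + 0.92 ~= 4.14
```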
Assignment
50
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions
b) Compute the cross-entropy loss of the predictions
c) Compare the results with those obtained in the example and discuss how accuracy and cross-entropy have captured the quality of the predictions.

Confidences S(l_k): [0.3, 0.3, 0.4], [0.1, 0.7, 0.2], [0.1, 0.2, 0.7]
Ground Truth: [1, 0, 0], [0, 1, 0], [0, 0, 1]
References
● About loss functions
● Neural Networks and Deep Learning
● Are loss functions all the same?
● Convolutional Neural Networks for Visual Recognition
● Deep Learning, MIT Press, 2016
● On Loss Functions for Deep Neural Networks in
Classification
51
Thanks! Questions?
52
https://imatge.upc.edu/web/people/javier-ruiz-hidalgo