Monday reading books
on Machine Learning
JAOUAD DABOUNOU
FST of Settat
Hassan 1st University
February 21, 2022
004 – Introduction to Probability Theory
Introduction

Starting this Monday, January 31, we begin a reading of three books as part of "Monday reading books on machine learning".

The first book, which will serve as the common thread for the whole series:
Christopher Bishop; Pattern Recognition and Machine Learning, Springer-Verlag New York Inc, 2006

Parts of two other books will also be used, mainly:
Ian Goodfellow, Yoshua Bengio, Aaron Courville; Deep Learning, The MIT Press, 2016
and:
Ovidiu Calin; Deep Learning Architectures: A Mathematical Approach, Springer, 2020
Probability Theory

Consider two random variables: X for Fruit and Y for Box.
X can take the values x1 = 'o' (orange) and x2 = 'a' (apple).
Y can take the values y1 = 'r', y2 = 'b', y3 = 'br', y4 = 'v' and y5 = 'y', corresponding to the box colors red, blue, brown, violet and yellow.

[Figure: oranges and apples distributed among five colored boxes.]
Probability distribution

We will introduce some basic concepts of probability theory and information theory by considering the simple example of fruits and boxes.

The probability distribution for a random variable describes how the probabilities are distributed over the values of the random variable. It is the mathematical function that gives the probabilities of occurrence of the different possible outcomes. For boxes with different mixes of fruit, for example:

p(X='o') = 1,    p(X='a') = 0
p(X='o') = 0.5,  p(X='a') = 0.5
p(X='o') = 0.75, p(X='a') = 0.25

A probability distribution can be used to quantify the relative frequency of occurrence of uncertain events, and it is a basic ingredient of measurement uncertainty analysis.
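As a minimal sketch (plain Python; the dictionary name is illustrative), a discrete distribution can be written as a mapping from outcomes to probabilities, and sampling from it reproduces the relative frequencies it describes:

```python
import random

# p(X='o') = 0.75, p(X='a') = 0.75 box from the example above.
p_fruit = {'o': 0.75, 'a': 0.25}

# A valid distribution: non-negative values that sum to 1.
assert all(v >= 0 for v in p_fruit.values())
assert abs(sum(p_fruit.values()) - 1.0) < 1e-9

# Drawing outcomes according to the distribution approximates the
# relative frequencies that p describes.
draws = random.choices(list(p_fruit), weights=list(p_fruit.values()), k=10_000)
print({x: draws.count(x) / len(draws) for x in p_fruit})  # ~{'o': 0.75, 'a': 0.25}
```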
Information theory

Information theory is the mathematical approach to the quantification, storage and communication of digital information.

Claude Shannon (1916 – 2001)
Associated with information theory are the concepts of probability, uncertainty, communication and noise in data.

Low uncertainty: high knowledge, low information, low entropy, no surprise.
High uncertainty: low knowledge, high information, high entropy, great surprise.
The amount of information can be viewed as the 'degree of surprise' on learning the value of x. If we are told that a highly improbable event has just occurred, we will have received more information than if we were told that some very likely event has just occurred, and if we knew that the event was certain to happen we would receive no information. Our measure of information content will therefore depend on the probability distribution p(x), and we therefore look for a quantity h(x) that is a monotonic function of the probability p(x) and that expresses the information content.

Probability → information (amount of uncertainty):
p(X='o') = 1      →  h(X='o') = -log2 p(x) = 0
p(X='a') = 0.5    →  h(X='a') = -log2 p(x) = 1
p(X='a') = 0.125  →  h(X='a') = -log2 p(x) = 3
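This is easy to check numerically; a small sketch, assuming base-2 logarithms so that information is measured in bits:

```python
import math

def information_content(p: float) -> float:
    """h(x) = -log2 p(x): the 'surprise' of an outcome with probability p."""
    return -math.log2(p)

print(information_content(1.0))    # 0.0 bits: a certain event carries no information
print(information_content(0.5))    # 1.0 bit
print(information_content(0.125))  # 3.0 bits: rarer event, more information
```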
Entropy

Entropy is a probabilistic measure of uncertainty or ignorance; information is a measure of a reduction in that uncertainty.

p(X='o') = 0.875  →  h(X='o') = -log2 p(x) = 0.193
p(X='a') = 0.125  →  h(X='a') = -log2 p(x) = 3

Given a probability distribution p(X), the entropy H of the system can then be expressed as:

H(X) = -\sum_{k=1}^{K} p(x_k) \log_2 p(x_k)
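The formula translates directly into code. A sketch, using base-2 logarithms to stay consistent with h(x), and dropping zero-probability terms by the convention 0·log 0 = 0:

```python
import math

def entropy(p: list[float]) -> float:
    """H(X) = -sum_k p(x_k) * log2 p(x_k), with 0*log(0) taken as 0."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

print(entropy([0.875, 0.125]))  # ~0.544 bits for the box above
print(entropy([0.5, 0.5]))      # 1.0 bit: two equally likely outcomes
print(entropy([1.0, 0.0]))      # 0.0: a certain outcome carries no uncertainty
```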
Entropy H(X) reaches its maximum value if all outcomes of the random variable X have the same probability. H(X) expresses the uncertainty or ignorance about the system outcomes. H(X) = 0 if and only if one outcome has probability 1 and all the others have probability 0.

From no uncertainty to maximum uncertainty, for five example boxes: H(X) = 0, H(X) = 0.54, H(X) = 0.81, H(X) = 0.91, H(X) = 1.

Entropy can be considered as a measure of variability in a system.
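A quick numerical illustration (the exact box compositions behind the five values are in the figure, so the probabilities below are illustrative): for a two-outcome box, H grows from 0 to its maximum of 1 bit as the distribution approaches uniform:

```python
import math

def entropy(p):
    """H = -sum_k p_k * log2 p_k, skipping zero-probability outcomes."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

for p_o in (1.0, 0.875, 0.75, 0.6, 0.5):
    print(f"p(o) = {p_o}: H = {entropy([p_o, 1 - p_o]):.2f}")
# H rises from 0.00 (no uncertainty) through 0.54, 0.81, 0.97 to 1.00 (uniform)
```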
Probability Theory

Consider a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'. We make the assumption of independent and identically distributed outcomes. For a population of 20 animals:

p(cat) = 5/20 = 0.25
p(elephant) = 4/20 = 0.2
p(horse) = 4/20 = 0.2
p(dog) = 7/20 = 0.35

H(X) = -\sum_{k=1}^{K} p(x_k) \log_2 p(x_k) = 1.96
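A one-line check of this value:

```python
import math

p_animal = {'cat': 5/20, 'elephant': 4/20, 'horse': 4/20, 'dog': 7/20}
H = -sum(p * math.log2(p) for p in p_animal.values())
print(round(H, 2))  # 1.96 bits
```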
An observed sequence of 15 outcomes: c d h e c d c c h e d d h e e, where 'c' = 'cat', 'e' = 'elephant', 'h' = 'horse', 'd' = 'dog'.
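Under the i.i.d. assumption, the empirical frequencies of the sequence estimate the underlying distribution; a minimal sketch:

```python
from collections import Counter

sequence = list("cdhecdccheddhee")  # the 15 observed outcomes above
counts = Counter(sequence)
empirical = {x: round(n / len(sequence), 3) for x, n in counts.items()}
print(empirical)  # {'c': 0.267, 'd': 0.267, 'h': 0.2, 'e': 0.267}
```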
Each outcome can be represented as a one-hot vector over the four classes, ordered (cat, elephant, horse, dog):
'c' → (1, 0, 0, 0), 'e' → (0, 1, 0, 0), 'h' → (0, 0, 1, 0), 'd' → (0, 0, 0, 1).
The sequence c d h e c d c c h e d d h e e then becomes an array of 15 one-hot columns.
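A sketch of this encoding with NumPy (the helper names are illustrative):

```python
import numpy as np

classes = ['c', 'e', 'h', 'd']  # cat, elephant, horse, dog
index = {c: i for i, c in enumerate(classes)}

def one_hot(outcome: str) -> np.ndarray:
    v = np.zeros(len(classes))
    v[index[outcome]] = 1.0
    return v

sequence = list("cdhecdccheddhee")
encoded = np.stack([one_hot(x) for x in sequence])  # shape (15, 4)
print(encoded[:3])  # c -> [1,0,0,0], d -> [0,0,0,1], h -> [0,0,1,0]
```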
Now draw six samples s1, …, s6, with observed outcomes h, e, c, d, c, d. For each sample si we have the true one-hot distribution p(·|si) and a predicted distribution q(·|si), with components ordered (cat, elephant, horse, dog):

Sample   Outcome   True p(·|si)     Predicted q(·|si)
s1       h         (0, 0, 1, 0)     (0.07, 0.01, 0.60, 0.30)
s2       e         (0, 1, 0, 0)     (0.03, 0.80, 0.10, 0.07)
s3       c         (1, 0, 0, 0)     (0.40, 0.05, 0.05, 0.50)
s4       d         (0, 0, 0, 1)     (0.40, 0.01, 0.09, 0.50)
s5       c         (1, 0, 0, 0)     (0.60, 0.02, 0.03, 0.35)
s6       d         (0, 0, 0, 1)     (0.28, 0.02, 0.10, 0.60)
K-L Divergence

We want to use a metric that allows us to estimate the deviation of the probability distribution q from the probability distribution p:
p is the true probability distribution;
q is the predicted probability distribution.

For sample s1 (outcome 'h'):
(p(x1|s1), p(x2|s1), p(x3|s1), p(x4|s1)) = (0, 0, 1, 0)
(q(x1|s1), q(x2|s1), q(x3|s1), q(x4|s1)) = (0.07, 0.01, 0.60, 0.30)
For simplification purposes, we write p(x1) = p(x1|s1) and q(x1) = q(x1|s1), and similarly for the other outcomes xk.
The Kullback-Leibler divergence plays the role of a 'distance' between two probability distributions (strictly speaking a divergence, since it is not symmetric in p and q). For K classes x1, …, xK:

D_{KL}(p \,\|\, q) = \sum_{k=1}^{K} p(x_k) \log_2 \frac{p(x_k)}{q(x_k)}
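Applied to sample s1, a direct sketch of the formula (base-2 logarithm, consistent with the bit-based measures above; terms with p(x_k) = 0 are dropped by the convention 0·log 0 = 0):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_k p_k * log2(p_k / q_k), with 0*log(.) taken as 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p_s1 = [0, 0, 1, 0]               # true: horse
q_s1 = [0.07, 0.01, 0.60, 0.30]   # predicted
print(kl_divergence(p_s1, q_s1))  # -log2(0.6) ~ 0.737 bits
```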
We can also estimate the deviation of the probability distribution q from the probability distribution p using N samples s1, …, sN, averaging the per-sample divergences:

D_{KL}(p \,\|\, q) = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} p(x_k \mid s_i) \log_2 \frac{p(x_k \mid s_i)}{q(x_k \mid s_i)}

For the six samples above, each term uses the true one-hot distribution p(·|si) and the predicted distribution q(·|si) from the table.
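A sketch of the averaged form on the six samples from the table (base-2 logarithm; the resulting ≈ 0.81 is simply what these particular numbers give):

```python
import numpy as np

# One row per sample s1..s6 (columns: cat, elephant, horse, dog).
p = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0],
              [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1]], dtype=float)
q = np.array([[0.07, 0.01, 0.60, 0.30], [0.03, 0.80, 0.10, 0.07],
              [0.40, 0.05, 0.05, 0.50], [0.40, 0.01, 0.09, 0.50],
              [0.60, 0.02, 0.03, 0.35], [0.28, 0.02, 0.10, 0.60]])

ratio = np.where(p > 0, p / q, 1.0)          # 0*log(0/q) terms contribute nothing
d_kl_per_sample = (p * np.log2(ratio)).sum(axis=1)
print(d_kl_per_sample.mean())                # ~0.81 bits for these numbers
```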
K-L Divergence for Neural Networks

For a classification network, each element of the dataset provides a true one-hot distribution p (the label), and the network outputs a predicted distribution q. For a 'horse' example, p = (0, 0, 1, 0) while the network may output q = (0.07, 0.01, 0.60, 0.30); for a 'cat' example, p = (1, 0, 0, 0) while the network may output q = (0.60, 0.02, 0.03, 0.35). Minimizing the K-L divergence between p and q over the dataset drives the predicted distributions toward the true ones.
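A hedged sketch of how this is used as a training loss: when p is one-hot, minimizing D_KL(p‖q) is the same as minimizing the cross-entropy -log q(true class). All names and logits here are illustrative, not taken from the slides:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_loss(p_onehot: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """D_KL(p||q); equals the cross-entropy -log2 q[true] when p is one-hot."""
    return float(np.sum(p_onehot * np.log2(p_onehot.clip(eps) / q.clip(eps))))

# A 'horse' example: label p = (0,0,1,0), network output q from some logits.
p = np.array([0.0, 0.0, 1.0, 0.0])
q = softmax(np.array([-1.0, -3.0, 1.2, 0.5]))  # illustrative logits
print(kl_loss(p, q))  # lower is better; training pushes q[horse] toward 1
```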
THANK YOU