Monday reading books
on Machine Learning
JAOUAD DABOUNOU
FST of Settat
Hassan 1st University
February 21, 2022
004 – Introduction
Probability Theory
2
Introduction
Starting Monday, January 31, a reading of three books, as part of "Monday reading books on
machine learning".
The first book, which will serve as the guiding thread for the whole series:
Christopher Bishop; Pattern Recognition and Machine Learning, Springer-Verlag New York Inc, 2006
Parts of two other books will also be used, mainly from:
Ian Goodfellow, Yoshua Bengio, Aaron Courville; Deep Learning, The MIT Press, 2016
and from:
Ovidiu Calin; Deep Learning Architectures: A Mathematical Approach, Springer, 2020
3
Introduction
4
Introduction
Consider two random variables X for Fruit and Y for Box.
X can take the values x1 = 'o' and x2 = 'a'.
Y can take the values y1 = 'r', y2 = 'b', y3 = 'br', y4 = 'v' and y5 = 'y' corresponding to the box color.
5
Probability Theory
[Figure: five boxes colored red, blue, brown, violet and yellow, containing oranges and apples. X: Fruit, Y: Box]
We will introduce some basic concepts of probability theory and information theory by considering the simple
example of fruits and boxes.
The probability distribution for a random variable describes how the probabilities are distributed over the values
of the random variable. It is the mathematical function that gives the probabilities of occurrence of different
possible outcomes.
6
Probability distribution
p(X='o') = 1
p(X='a') = 0
Probability distribution
We will introduce some basic concepts of probability theory and information theory by considering the simple
example of fruits and boxes.
The probability distribution for a random variable describes how the probabilities are distributed over the values
of the random variable. It is the mathematical function that gives the probabilities of occurrence of different
possible outcomes.
7
Probability distribution
p(X='o') = 0.5
p(X='a') = 0.5
Probability distribution
We will introduce some basic concepts of probability theory and information theory by considering the simple
example of fruits and boxes.
The probability distribution for a random variable describes how the probabilities are distributed over the values
of the random variable. It is the mathematical function that gives the probabilities of occurrence of different
possible outcomes.
8
Probability distribution
p(X='o') = 0.75
p(X='a') = 0.25
Probability distribution
A probability distribution can be used to quantify the relative
frequency of occurrence of uncertain events.
Probability distributions are also a core tool of measurement
uncertainty analysis.
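As a minimal illustration (mine, not from the slides; the draws below are hypothetical), such a distribution can be estimated from relative frequencies:

```python
from collections import Counter

# Hypothetical draws of fruits, 'o' = orange and 'a' = apple
draws = ['o', 'o', 'a', 'o', 'o', 'a', 'o', 'o']

counts = Counter(draws)
total = sum(counts.values())

# Relative frequencies estimate the probability distribution of X
p = {fruit: n / total for fruit, n in counts.items()}
print(p)  # {'o': 0.75, 'a': 0.25}, as on the slide above
```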
Information theory is the mathematical framework for the quantification,
storage and communication of digital information.
9
Information theory
Claude Shannon (1916 - 2001)
Associated with information theory are the concepts of probability, uncertainty, communication and noise in
data.
10
Information theory
Low uncertainty
High Knowledge
Low information
Low entropy
No surprise
High uncertainty
Low Knowledge
High information
High entropy Great surprise
Associated with information theory are the concepts of probability, uncertainty, communication and noise in
data.
11
Information theory
Low uncertainty
High Knowledge
Low information
Low entropy
No surprise
High uncertainty
Low Knowledge
High information
High entropy Great surprise
Associated with information theory are the concepts of probability, uncertainty, communication and noise in
data.
12
Information theory
Low uncertainty
High Knowledge
Low information
Low entropy
No surprise
High uncertainty
Low Knowledge
High information
High entropy Great surprise
Associated with information theory are the concepts of probability, uncertainty, communication and noise in
data.
13
Information theory
Low uncertainty
High Knowledge
Low information
Low entropy
No surprise
High uncertainty
Low Knowledge
High information
High entropy Great surprise
Associated with information theory are the concepts of probability, uncertainty, communication and noise in
data.
14
Information theory
Low uncertainty
High Knowledge
Low information
Low entropy
No surprise
High uncertainty
Low Knowledge
High information
High entropy Great surprise
The amount of information can be viewed as the ‘degree of surprise’ on learning the value of x. If we are told that a
highly improbable event has just occurred, we will have received more information than if we were told that some very
likely event has just occurred, and if we knew that the event was certain to happen we would receive no information.
Our measure of information content will therefore depend on the probability distribution p(x), and we therefore look
for a quantity h(x) that is a monotonic function of the probability p(x) and that expresses the information content.
15
Information theory
p(X='o') = 1
Probability
h(X='o') = -log2 p(x) = 0
Information
p(X='a') = 0.5
h(X='a') = -log2 p(x) = 1
p(X='a') = 0.125
h(X='a') = -log2 p(x) = 3
Amount of uncertainty
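A minimal sketch (mine) of this information content h(x) = -log2 p(x):

```python
import math

def info_content(p):
    """Information content, in bits, of an event with probability p > 0."""
    return -math.log2(p)

print(info_content(1.0))    # 0.0 -> a certain event carries no information
print(info_content(0.5))    # 1.0
print(info_content(0.125))  # 3.0 -> the rarer the event, the greater the surprise
```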
Entropy is a probabilistic measure of uncertainty or ignorance. Information is a measure of a reduction in that
uncertainty.
16
Entropy
p(X='o') = 0.875
Probability
h(X='o') = -log2 p(x) = 0.193
Information
p(X='a') = 0.125
h(X='a') = -log2 p(x) = 3
Given a probability distribution p(X), the entropy H of the system can then be expressed as:
$H(X) = -\sum_{k=1}^{K} p(x_k)\,\log_2 p(x_k)$
Entropy H(X) reaches its maximum value if all outcomes of the random variable X have the same probability.
H(X) expresses the uncertainty or ignorance about the system outcomes. H(X) = 0 if and only if one outcome has
probability 1 and all others have probability 0.
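A short sketch (mine) of this entropy, using base-2 logarithms to stay consistent with h(x) = -log2 p(x):

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p given as a list of probabilities."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)  # the 0*log(0) terms are taken as 0

print(entropy([1.0, 0.0]))      # 0.0   -> one certain outcome, no uncertainty
print(entropy([0.875, 0.125]))  # ~0.54
print(entropy([0.5, 0.5]))      # 1.0   -> maximum for two equally likely outcomes
```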
17
Entropy
[Figure: five distributions, from no uncertainty to maximum uncertainty, with entropies
H(X) = 0, H(X) = 0.54, H(X) = 0.81, H(X) = 0.91, H(X) = 1]
Entropy can be considered as a measure of variability in a system.
$p(\text{cat}) = \frac{5}{20} = 0.25$
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
18
Probability Theory
$p(\text{elephant}) = \frac{4}{20} = 0.2$
$p(\text{dog}) = \frac{7}{20} = 0.35$
$p(\text{horse}) = \frac{4}{20} = 0.2$
$H(X) = -\sum_{k=1}^{K} p(x_k)\,\log_2 p(x_k) = 1.96$
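A quick check of this value (a sketch, mine, again assuming base-2 logarithms):

```python
import math

p_animal = {'cat': 0.25, 'elephant': 0.2, 'horse': 0.2, 'dog': 0.35}

H = -sum(pk * math.log2(pk) for pk in p_animal.values())
print(round(H, 2))  # 1.96 bits
```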
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
19
Probability Theory
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
20
Probability Theory
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
21
Probability Theory
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
22
Probability Theory
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
23
Probability Theory
c d h e c d c c h e d d h e e
Let
'c' = 'cat'
'e' = 'elephant'
'h' = 'horse'
'd' = 'dog'.
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
24
Probability Theory
c d h e c d c c h e d d h e e
One-hot encoding (rows: cat, elephant, horse, dog):
cat:      1 0 0 0 1 0 1 1 0 0 0 0 0 0 0
elephant: 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1
horse:    0 0 1 0 0 0 0 0 1 0 0 0 1 0 0
dog:      0 1 0 0 0 1 0 0 0 0 1 1 0 0 0
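A minimal sketch (mine) of this one-hot encoding of the sequence:

```python
ANIMALS = ['cat', 'elephant', 'horse', 'dog']
SHORT = {'c': 'cat', 'e': 'elephant', 'h': 'horse', 'd': 'dog'}

def one_hot(label):
    """One-hot vector of a label over the four animal classes, in the order above."""
    return [1 if a == label else 0 for a in ANIMALS]

sequence = ['c', 'd', 'h', 'e', 'c', 'd', 'c', 'c', 'h', 'e', 'd', 'd', 'h', 'e', 'e']
encoded = [one_hot(SHORT[s]) for s in sequence]

print(encoded[0])  # [1, 0, 0, 0] -> 'c' = 'cat'
print(encoded[1])  # [0, 0, 0, 1] -> 'd' = 'dog'
```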
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
25
Probability Theory
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
26
Probability Theory
Sample 1 : s1
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
27
Probability Theory
Sample 1 : s1 = 'h'
s1: p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
28
Probability Theory
Sample 2 : s2 = 'e'
s1: p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
s2: p = (0, 1, 0, 0)    q = (0.03, 0.8, 0.1, 0.07)
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
29
Probability Theory
Sample 3 : s3 = 'c'
s1: p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
s2: p = (0, 1, 0, 0)    q = (0.03, 0.8, 0.1, 0.07)
s3: p = (1, 0, 0, 0)    q = (0.4, 0.05, 0.05, 0.5)
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
30
Probability Theory
Sample 4 : s4 = 'd'
s1: p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
s2: p = (0, 1, 0, 0)    q = (0.03, 0.8, 0.1, 0.07)
s3: p = (1, 0, 0, 0)    q = (0.4, 0.05, 0.05, 0.5)
s4: p = (0, 0, 0, 1)    q = (0.4, 0.01, 0.09, 0.5)
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
31
Probability Theory
Sample 5 : s5 = 'c'
s1: p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
s2: p = (0, 1, 0, 0)    q = (0.03, 0.8, 0.1, 0.07)
s3: p = (1, 0, 0, 0)    q = (0.4, 0.05, 0.05, 0.5)
s4: p = (0, 0, 0, 1)    q = (0.4, 0.01, 0.09, 0.5)
s5: p = (1, 0, 0, 0)    q = (0.6, 0.02, 0.03, 0.35)
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
32
Probability Theory
Sample 6 : s6 = 'd'
s1: p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
s2: p = (0, 1, 0, 0)    q = (0.03, 0.8, 0.1, 0.07)
s3: p = (1, 0, 0, 0)    q = (0.4, 0.05, 0.05, 0.5)
s4: p = (0, 0, 0, 1)    q = (0.4, 0.01, 0.09, 0.5)
s5: p = (1, 0, 0, 0)    q = (0.6, 0.02, 0.03, 0.35)
s6: p = (0, 0, 0, 1)    q = (0.28, 0.02, 0.1, 0.6)
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
33
K-L Divergence
Sample 1 : s1 = 'h'
p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
We want to use a metric that allows
us to estimate the deviation of the
probability distribution q from the
probability distribution p.
p is the true probability distribution
q is the predicted probability distribution
Componentwise: p(·|s1) = (p(x1|s1), p(x2|s1), p(x3|s1), p(x4|s1)) and q(·|s1) = (q(x1|s1), q(x2|s1), q(x3|s1), q(x4|s1)).
We want to use a metric that allows us to estimate the deviation of the probability distribution q from the probability
distribution p. For simplicity, we write p(x1) = p(x1|s1) and q(x1) = q(x1|s1), and similarly for the other components.
34
K-L Divergence
Sample 1 : s1 = 'h'
p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
p is the true probability distribution
q is the predicted probability distribution
p = (p(x1), p(x2), p(x3), p(x4))    q = (q(x1), q(x2), q(x3), q(x4))
Sample 1 : s1
We want to use a metric that allows us to estimate the deviation of the probability distribution q from the probability
distribution p.
35
K-L Divergence
s1 = 'h':  p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
p is the true probability distribution
q is the predicted probability distribution
Distance between two probability distributions p and q, for K classes x1, …, xK:
$D_{KL}(p\,\|\,q) = \sum_{k=1}^{K} p(x_k)\,\log\frac{p(x_k)}{q(x_k)}$
p = (p(x1), p(x2), p(x3), p(x4))    q = (q(x1), q(x2), q(x3), q(x4))
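A small sketch (mine) of this divergence, applied to sample s1; base-2 logarithms are assumed, for consistency with the rest of the deck:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q), in bits, for discrete distributions given as lists; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0, 0, 1, 0]            # true (one-hot) distribution for s1 = 'h'
q = [0.07, 0.01, 0.6, 0.3]  # predicted distribution for s1

print(round(kl_divergence(p, q), 3))  # ~0.737: only the 'horse' term survives, i.e. -log2(0.6)
```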
For each sample si: p(·|si) = (p(x1|si), p(x2|si), p(x3|si), p(x4|si)) and q(·|si) = (q(x1|si), q(x2|si), q(x3|si), q(x4|si)).
We can also estimate the deviation of the probability distribution q from the probability distribution p using N samples.
36
K-L Divergence
h e c d c d
s1: p = (0, 0, 1, 0)    q = (0.07, 0.01, 0.6, 0.3)
s2: p = (0, 1, 0, 0)    q = (0.03, 0.8, 0.1, 0.07)
s3: p = (1, 0, 0, 0)    q = (0.4, 0.05, 0.05, 0.5)
s4: p = (0, 0, 0, 1)    q = (0.4, 0.01, 0.09, 0.5)
s5: p = (1, 0, 0, 0)    q = (0.6, 0.02, 0.03, 0.35)
s6: p = (0, 0, 0, 1)    q = (0.28, 0.02, 0.1, 0.6)
Probability Distribution
$D_{KL}(p\,\|\,q) = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} p(x_k|s_i)\,\log\frac{p(x_k|s_i)}{q(x_k|s_i)}$
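A sketch (mine) of this sample-averaged divergence on the six samples above. Since each p(·|si) is one-hot, each inner sum reduces to -log2 q(true class|si):

```python
import math

def kl_divergence(p, q):
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# (one-hot target, prediction) for s1..s6, components ordered (cat, elephant, horse, dog)
samples = [
    ([0, 0, 1, 0], [0.07, 0.01, 0.60, 0.30]),  # s1 = 'h'
    ([0, 1, 0, 0], [0.03, 0.80, 0.10, 0.07]),  # s2 = 'e'
    ([1, 0, 0, 0], [0.40, 0.05, 0.05, 0.50]),  # s3 = 'c'
    ([0, 0, 0, 1], [0.40, 0.01, 0.09, 0.50]),  # s4 = 'd'
    ([1, 0, 0, 0], [0.60, 0.02, 0.03, 0.35]),  # s5 = 'c'
    ([0, 0, 0, 1], [0.28, 0.02, 0.10, 0.60]),  # s6 = 'd'
]

avg = sum(kl_divergence(p, q) for p, q in samples) / len(samples)
print(round(avg, 3))  # ~0.809 bits averaged over the N = 6 samples
```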
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
37
K-L Divergence for Neural Networks
Dataset
p = (0, 0, 1, 0) (one-hot target)    q = (0.07, 0.01, 0.6, 0.3) (network output)
Consider here a random variable X for Animal. X can take the values x1 = 'cat', x2 = 'elephant', x3 = 'horse' and x4 = 'dog'.
We make the assumption of independent and identically distributed outcomes.
38
K-L Divergence for Neural Networks
Dataset
p = (1, 0, 0, 0) (one-hot target)    q = (0.6, 0.02, 0.03, 0.35) (network output)
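Because the targets p here are one-hot, D_KL(p||q) reduces to -log q(true class), i.e. the usual cross-entropy loss minimized when training a classifier; a minimal sketch (mine, not the slides' notation):

```python
import math

def cross_entropy(p_onehot, q):
    """H(p, q) = -sum_k p_k log2 q_k; equals D_KL(p || q) when p is one-hot."""
    return -sum(pk * math.log2(qk) for pk, qk in zip(p_onehot, q) if pk > 0)

# The two (target, prediction) pairs from the dataset slides above
print(round(cross_entropy([0, 0, 1, 0], [0.07, 0.01, 0.6, 0.3]), 3))   # ~0.737
print(round(cross_entropy([1, 0, 0, 0], [0.6, 0.02, 0.03, 0.35]), 3))  # ~0.737
```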
39
THANK YOU