Probability Review
• Probability
• Random variables
• One random variable
• Two or more random variables
Why Probability?
• Probability represents a (standardized)
measure of chance, and quantifies
uncertainty.
• Statistics uses rules of probability as a tool
for making inferences about or describing a
population using data from a sample
Probability vs. Statistics
Diagram showing the difference between statistics and probability. (Image by MIT OpenCourseWare.
Based on Gilbert, Norma. Statistics. W.B. Saunders Co., 1976.)
Sample Space and Events
• Sample space Ω: the set of all possible results of an experiment
Ex: If you toss a coin twice, Ω = {HH, HT, TH, TT}
• Event A: a subset of Ω
Ex: First toss is a head: A = {HH, HT}
• S: Event space, a set of events
Ex: {A | A ⊆ Ω}
Terms
• HH, HT, TH, TT are called "outcomes" or "sample points".
• The elements of Ω are called "sample points".
• That is why Ω = {HH, HT, TH, TT} is called a "sample space".
Three Probability Axioms
Probability P : S → [0, 1], where S = {A | A ⊆ Ω}, satisfies
• Nonnegativity: for every event A, P(A) ≥ 0
• Normalization: P(Ω) = 1
• Additivity: for mutually exclusive events A_1, A_2, …,
  P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)
If A is an event, then P(A) is the probability that event A occurs.
L. Wang, Department of Statistics University of South Carolina; Slide 13
Mutually Exclusive Events
• Mutually exclusive events cannot occur at the same time.
[Figure: Venn diagrams of mutually exclusive events vs. not mutually exclusive events]
Properties of Probability
(1) P(∅) = 0
(2) A_1, …, A_n mutually exclusive ⇒ P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i)
(3) P(A^c) = 1 − P(A)
(4) A ⊆ B ⇒ P(A) ≤ P(B)
(5) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Ex: If A and B are mutually exclusive, then
1. P(A ∩ B) = 0
2. P(A ∪ B) = P(A) + P(B)
Conditional Probability
• More generally, we can express the
conditional probability of B given that A has
occurred as:
• We can rewrite this formula to get the
Multiplicative Rule of Probability:
P(B | A) = P(A ∩ B) / P(A)
P(A ∩ B) = P(B | A) P(A)
Conditional Probability
• In some cases events are related, so that if we
know event A has occurred then we learn more
about an event B
• Example: Roll a die
A: observe an even number {2,4,6}
B: observe a number less than 4 {1,2,3}
If we know nothing else, then P(B) = 3/6 = 1/2
But if we know A has occurred, then P(B | A) = 1/3
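The die example above can be checked by brute enumeration of the sample space (a small Python sketch; the helper names are illustrative, not part of the slides):

```python
from fractions import Fraction

# Roll a fair die once; enumerate the sample space.
omega = {1, 2, 3, 4, 5, 6}
A = {w for w in omega if w % 2 == 0}   # even number: {2, 4, 6}
B = {w for w in omega if w < 4}        # less than 4: {1, 2, 3}

# Probability of an event under equally likely outcomes.
P = lambda E: Fraction(len(E), len(omega))

print(P(B))             # P(B) = 1/2
print(P(A & B) / P(A))  # P(B|A) = P(A∩B)/P(A) = (1/6)/(1/2) = 1/3
```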
We purchase 30% of our parts from Vendor V.
Vendor V’s defective rate is 5%.
What is the probability that
a randomly chosen part is defective and from Vendor V?
① 0.200
② 0.050
③ 0.015
④ 0.030
P(A)=0.3: We purchase 30% of our parts from Vendor V.
P(B|A)=0.05: Vendor V’s defective rate is 5%.
P(A∩B)=?: What is the probability that
a randomly chosen part is defective and from Vendor V?
[Example of Conditional Probability]
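The vendor question is a direct application of the multiplicative rule, sketched below (variable names are illustrative):

```python
# Multiplicative rule: P(A ∩ B) = P(B|A) * P(A)
p_A = 0.30          # P(A): part comes from Vendor V
p_B_given_A = 0.05  # P(B|A): defective, given it came from Vendor V

p_A_and_B = p_B_given_A * p_A
print(p_A_and_B)    # ≈ 0.015 → answer ③
```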
Independence
• Events are not always related.
• Events A and B are independent if and only if:
P(B | A) = P(B)
iff:
P(A ∩ B) = P(A) P(B)
Sample Spaces Can Be Inconvenient
Sample spaces are not always numbers.
What happens when you are dealing with six
different outcomes counted over 1000
individual trials?
We need something that allows us to visualize
the distribution of such items in a meaningful
way that charts and lists of sample spaces and
events couldn't possibly accomplish.
http://mathhelpforum.com/statistics/113891-need-random-variables.html
Sample Spaces Can Be Inconvenient
Sample spaces can be inconvenient,
so we need functions, called random variables.
The term “Random Variables”
• Random variable
-Its value is determined by the outcome of an
experiment.
-It takes on a new value each time the
experiment is performed.
-That is why it is “variable”.
• The term “random” means
-not able to be predicted, or happening by
chance.
Random Variable as a
Measurement
• Thus a random variable can be thought of
as a measurement on an experiment
Domain (정의역) Codomain (공역)
Function (함수)
Random Variables
• A random variable is a function X : Ω → R.
Ex: Roll two dice.
Let X = the sum of the outcomes.
Examples of Random Variables
• Roll two dice.
Let X = number of sixes.
– Possible values of X = {0, 1, 2}.
• Select a player on the LotteGiants.
Let X = his batting average.
– Possible values of X are
{x | 0 ≤ x ≤ 1}.
• Roll two dice; X = no. of sixes.
Event A = {(6,6)} ⇒ X = 2
P(A) = P(X = 2)
• Throw two coins; X = no. of heads.
Event A = {(H,T), (T,H)} ⇒ X = 1
P(A) = P(X = 1)
Events can be expressed by random variables:
(1) P(X = a) = P({ω | X(ω) = a})
(2) P(a ≤ X ≤ b) = P({ω | a ≤ X(ω) ≤ b})
Why Use a Random Variable?
• We are interested in the number of sixes.
• So why not let the sample space be the
possible number of sixes?
S = {0, 1, 2}
• Would that be wrong?
Why Use a Random Variable?
• The random variable allows us to set up
the sample space in any way that is
convenient.
• Then, through the random variable, we
can focus on the characteristic of interest.
• In many experiments, it is easier to deal with a
summary variable than with the original probability
structure.
[Example] In an opinion poll, we ask 50 people whether
agree or disagree with a certain issue.
– Suppose we record a "1" for agree and "0" for disagree.
– The sample space for this experiment has 2^50 elements.
• Suppose we are only interested in the number of
people who agree.
– Define X = number of "1"s recorded out of 50.
– Easier to deal with this sample space (has only 51 elements).
CS479/679 Pattern Recognition Spring 2013 – Dr. George Bebis
Why Use a Random Variable?
• We design the sample space so that it
will be easy to find the probabilities.
• This may involve more than just the
characteristic in which we are interested.
 
: [0,1] defined as
( )
F R
F x P X x

 
(Cumulative) Distribution function F of
a random variable X.
Two Types of Random Variables
• Discrete Random Variable – A random
variable whose set of possible values is a
discrete set.
• Continuous Random Variable – A
random variable whose set of possible
values is a continuous set.
• In the following two examples, are they
discrete or continuous?
Examples of Random Variables
• Roll two dice.
Let X = number of sixes.
– Possible values of X = {0, 1, 2}.
• Select a player on the LotteGiants.
Let X = his batting average.
– Possible values of X are
{x | 0 ≤ x ≤ 1}.
Two Types of Random Variables
• Discrete random variables
– Number of sales
– Number of calls
– People in line
– Mistakes per page
• Continuous random variables
– Length
– Depth
– Volume
– Time
– Weight
McClave, Statistics, 11th ed. Chapter 4: Discrete Random Variables
Discrete random variables
Probability mass function (pmf):
p(x) = P(X = x)
Example of a pmf
• Roll two dice; X = no. of sixes.
– P(X = 0) = 25/36.
– P(X = 1) = 10/36.
– P(X = 2) = 1/36.
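These three probabilities can be verified by enumerating all 36 equally likely outcomes (a Python sketch; `pmf` is an illustrative helper name):

```python
from fractions import Fraction
from itertools import product

# Roll two dice; X = number of sixes. Enumerate all 36 outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def pmf(k):
    """P(X = k): fraction of outcomes with exactly k sixes."""
    hits = [o for o in outcomes if o.count(6) == k]
    return Fraction(len(hits), len(outcomes))

print(pmf(0), pmf(1), pmf(2))  # 25/36, 10/36 (= 5/18), 1/36
```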
Example of a pmf
• Suppose that 10% of all households have
no children, 30% have one child, 40%
have two children, and 20% have three
children.
• Select a household at random and
let X = number of children.
• What is the pmf of X?
Example of a pmf
• We may list each value.
– P(X = 0) = 0.10
– P(X = 1) = 0.30
– P(X = 2) = 0.40
– P(X = 3) = 0.20
• Or we may present it as a chart.
x P(X = x)
0 0.10
1 0.30
2 0.40
3 0.20
Example of a pmf
• Or we may present it as a stick
graph.
[Stick graph: P(X = x) vs. x for x = 0, 1, 2, 3]
Example of a pmf
• Or we may present it as a histogram.
[Histogram: P(X = x) vs. x for x = 0, 1, 2, 3]
Mean and Variance of a Discrete Random Variable
• Mean of a Discrete Random Variable
– The weighted average of all of its
values. The weights are the probabilities.
• Variance of a Discrete Random Variable
The weighted average of the squared
deviations from the mean.
• Standard deviation
The square root of the variance.
The Mean of
a Discrete Random Variable
• The mean is also called the expected
value.
• However, that does not mean that it is
literally the value that we expect to see.
• “Expected value” is simply a synonym for
the mean or average.
Example of the Mean
• Recall the example where X was the
number of children in a household.
x P(X = x)
0 0.10
1 0.30
2 0.40
3 0.20
Example of the Mean
• Multiply each x by the corresponding
probability.
x P(X = x) xP(X = x)
0 0.10 0.00
1 0.30 0.30
2 0.40 0.80
3 0.20 0.60
Example of the Mean
• Add up the column of products to
get the mean.
x P(X = x) xP(X = x)
0 0.10 0.00
1 0.30 0.30
2 0.40 0.80
3 0.20 0.60
1.70 = µ
Example of the Variance
x P(X = x)
0 0.10
1 0.30
2 0.40
3 0.20
   
 
V(X) = E(X²) − [E(X)]²
     = 0²(0.1) + 1²(0.3) + 2²(0.4) + 3²(0.2) − 1.7²
     = 3.7 − 2.89 = 0.81
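The mean and variance of the children example can be computed directly from the pmf table (a Python sketch; variable names are illustrative):

```python
# pmf of X = number of children in a household (from the table above)
pmf = {0: 0.10, 1: 0.30, 2: 0.40, 3: 0.20}

mean = sum(x * p for x, p in pmf.items())      # E(X) = weighted average
ex2  = sum(x**2 * p for x, p in pmf.items())   # E(X^2)
var  = ex2 - mean**2                           # V(X) = E(X^2) - [E(X)]^2

print(mean, var)  # ≈ 1.7 and ≈ 0.81 (up to floating-point rounding)
```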
Distributions of
discrete random variables.
• Discrete Uniform Distribution
• Bernoulli Distribution
• Binomial Distribution
• Geometric Distribution
• …
Continuous random variables.
G. Baker, Department of Statistics University of South Carolina; Slide 56
Continuous Random Variable
• A continuous random variable is one
for which the outcome can be any value
in an interval of the real number line.
• Usually a measurement.
• Examples
– Let Y = length in mm
– Let Y = time in seconds
– Let Y = temperature in ºC
Continuous Random Variable
• We don’t calculate P(Y = y), we calculate
P(a < Y < b), where a and b are real
numbers.
• For a continuous random variable
P(Y = y) = 0.
Probability density function (pdf)
(1) f : pdf of X ⇒ P(X ≤ x) = ∫_{−∞}^{x} f(y) dy
(2) f : pdf of X ⇒ P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx
    f(x) = (d/dx) F(x)   (F : CDF of X)
Time Spent Waiting for a Bus
• A bus arrives at a bus stop every 30
minutes. If a person arrives at the bus
stop at a random time, what is the
probability that the person will have to
wait less than 10 minutes for the next
bus?
Identify the Random Variable
• Let Y = wait time in minutes.
• Since the arrival time is random, someone
is as likely to arrive 1 minute before a bus
arrives as 2 minutes, as 3 minutes, etc.
[Figure: pdf of Y — f(y) = 1/30 for 0 ≤ y ≤ 30, wait time in minutes]
What is the probability a person will wait less than 10 minutes?
P(Y < 10) = 10/30 ≈ 0.33  (and P(Y ≥ 10) = 20/30 ≈ 0.67)
This is called a continuous uniform distribution.
[Figure: area under f(y) = 1/30 shaded from 0 to 10]
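The uniform-distribution calculation above can be sketched with a small CDF helper (illustrative names; for the uniform pdf the integral reduces to a length ratio):

```python
# Continuous uniform on [0, 30]: f(y) = 1/30 for 0 <= y <= 30.
def cdf(y, a=0.0, b=30.0):
    """P(Y <= y) for a uniform distribution on [a, b]."""
    if y <= a:
        return 0.0
    if y >= b:
        return 1.0
    return (y - a) / (b - a)  # area under f(y) = 1/(b-a) from a to y

print(cdf(10))  # P(Y < 10) = 10/30 ≈ 0.333
```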
Properties of a pdf f
1) f(y) ≥ 0 for all possible values of y.
2) ∫_{−∞}^{∞} f(y) dy = 1
3) If y0 is a specific value of interest, then the
cumulative distribution function (cdf) is
   F(y0) = P(Y ≤ y0) = ∫_{−∞}^{y0} f(y) dy
4) If y1 and y2 are specific values of interest,
then
   P(y1 ≤ Y ≤ y2) = ∫_{y1}^{y2} f(y) dy = F(y2) − F(y1)
Expected Value for a Continuous
Random Variable
• Recall the expected value for a discrete random variable:
  E(Y) = μ = Σ_y y p(y)
• Expected value for a continuous random variable:
  E(Y) = μ = ∫_{−∞}^{∞} y f(y) dy
Variance for a Continuous
Random Variable
Recall: variance for a discrete random variable:
  Var(Y) = σ² = E[(y − μ)²] = Σ_y (y − μ)² p(y)
Variance for a continuous random variable:
  Var(Y) = σ² = E[(y − μ)²] = ∫_{−∞}^{∞} (y − μ)² f(y) dy
G. Baker, Department of Statistics
University of South Carolina; Slide 65
 
Var(Y) = σ² = E[(Y − μ)²] = ∫_{−∞}^{∞} (y − μ)² f(y) dy
  = ∫ (y² − 2μy + μ²) f(y) dy
  = ∫ y² f(y) dy − 2μ ∫ y f(y) dy + μ² ∫ f(y) dy
  = ∫ y² f(y) dy − 2μ² + μ²
  = ∫ y² f(y) dy − μ² = E(Y²) − μ²
Mean and Variance
Normal Distribution
Importance of
Normal Distribution
• 1. Describes Many Random Processes or
Continuous Phenomena
• 2. Can Be Used to Approximate Discrete
Probability Distributions
– Example: Binomial
• 3. Basis for Classical Statistical Inference
Normal Distribution
• 1. ‘Bell-shaped’ & symmetrical
• 2. Mean, median, mode are equal
• 3. ‘Middle spread’ is 1.33 σ
• 4. Random variable has infinite range
[Figure: bell curve f(X) with mean = median = mode at the center]
Normal Distribution: Useful
Properties
• About half of the “weight” is below the mean (because symmetrical)
• About 68% of probability within 1 standard deviation of the mean (at change in curve)
• About 95% of probability within 2 standard deviations
• More than 99% of probability within 3 standard deviations
[Figure: normal curve f(X) with μ ± σ, μ ± 2σ, μ ± 3σ marked; mean = median = mode]
Probability
Density Function
f(x) = (1 / (σ √(2π))) e^{ −(1/2) ((x − μ)/σ)² }
• x = value of the random variable (−∞ < x < ∞)
• σ = population standard deviation
• π = 3.14159…; e = 2.71828…
• μ = mean of the random variable x
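The density formula can be coded directly, with a crude numeric check that it integrates to 1 (a sketch; the midpoint-rule grid and function names are illustrative):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = (1 / (sigma*sqrt(2*pi))) * exp(-0.5*((x - mu)/sigma)**2)"""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Midpoint-rule integral of the standard normal pdf over [-8, 8];
# the tails beyond +/-8 carry negligible probability.
n, lo, hi = 4000, -8.0, 8.0
h = (hi - lo) / n
total = sum(normal_pdf(lo + (i + 0.5) * h) for i in range(n)) * h
print(total)  # ≈ 1.0
```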
Notation
•X is N(μ,σ)
•The random variable X has a normal
distribution (N) with mean μ and standard
deviation σ.
•X is N(40,1)
•X is N(10,5)
•X is N(50,3)
Standardize the
Normal Distribution
Z = (X − μ) / σ
• Any normal distribution X with mean μ and standard deviation σ standardizes to
the standardized normal distribution: Z is N(0, 1), i.e., μ = 0, σ = 1.
• One table!
Standardizing Example
Normal distribution: μ = 5, σ = 10, x = 6.2
Z = (X − μ) / σ = (6.2 − 5) / 10 = 0.12
Standardized normal distribution: μ = 0, σ = 1, z = 0.12
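The standardization step is one line of code (a sketch; the function name is illustrative):

```python
def z_score(x, mu, sigma):
    """Standardize: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

print(z_score(6.2, 5, 10))  # ≈ 0.12, as in the slide example
```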
Joint Distribution of
two or more
random variables
Joint Distribution of
two or more
discrete random variables
Joint Distribution of two or More Random Variables
• Sometimes more than one measurement (r.v.) is taken on each member of
the sample space. In cases like this there will be a few random variables
defined on the same probability space and we would like to explore their
joint distribution.
• The joint behavior of two random variables (continuous or discrete), X and Y, is
determined by their joint cumulative distribution function
  F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).
• n-dimensional case:
  F_{X1,…,Xn}(x1, …, xn) = P(X1 ≤ x1, …, Xn ≤ xn).
Discrete case
• Suppose X, Y are discrete random variables defined on the same probability
space.
• The joint probability mass function of 2 discrete random variables X and Y
is the function pX,Y(x,y) defined for all pairs of real numbers x and y by
  p_{X,Y}(x, y) = P(X = x and Y = y)
• For a joint pmf p_{X,Y}(x, y) we must have: p_{X,Y}(x, y) ≥ 0 for all values of x and y, and
  Σ_x Σ_y p_{X,Y}(x, y) = 1
Example for illustration
• Toss a coin 3 times. Define,
X: number of heads on 1st toss, Y: total number of heads.
• The sample space is Ω ={TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}.
• We display the joint distribution of X and Y in the following table (each of the 8 outcomes has probability 1/8):
        Y=0    Y=1    Y=2    Y=3
  X=0   1/8    2/8    1/8     0
  X=1    0     1/8    2/8    1/8
• Can we recover the probability mass function for X and Y from the joint table?
• To find the probability mass function of X we sum the appropriate rows of the
table of the joint probability function.
• Similarly, to find the mass function for Y we sum the appropriate columns.
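The row/column sums described above can be sketched by building the joint pmf from the sample space (illustrative names; `Fraction` keeps the probabilities exact):

```python
from fractions import Fraction
from itertools import product

# Toss a fair coin 3 times. X = heads on 1st toss, Y = total heads.
omega = list(product("HT", repeat=3))  # 8 equally likely outcomes

joint = {}
for w in omega:
    x = 1 if w[0] == "H" else 0
    y = w.count("H")
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)

# Marginals: sum the joint pmf over the other variable.
p_X = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in range(4)}

print(p_X)  # P(X=0) = P(X=1) = 1/2
print(p_Y)  # P(Y=0) = 1/8, P(Y=1) = 3/8, P(Y=2) = 3/8, P(Y=3) = 1/8
```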
Marginal Probability Function
• The marginal probability mass function for X is
  p_X(x) = Σ_y p_{X,Y}(x, y)
• The marginal probability mass function for Y is
  p_Y(y) = Σ_x p_{X,Y}(x, y)
• The case of several discrete random variables is analogous.
• If X1, …, Xm are discrete random variables on the same sample space with
joint probability function
  p_{X1,…,Xm}(x1, …, xm) = P(X1 = x1, …, Xm = xm),
the marginal probability function for X1 is
  p_{X1}(x1) = Σ_{x2,…,xm} p_{X1,…,Xm}(x1, x2, …, xm)
• The 2-dimensional marginal probability function for X1 and X2 is
  p_{X1,X2}(x1, x2) = Σ_{x3,…,xm} p_{X1,…,Xm}(x1, x2, x3, …, xm)
Example
(1) p_X(x) = Σ_y p_{X,Y}(x, y) = ?
(2) p_Y(y) = Σ_x p_{X,Y}(x, y) = ?
Independence of random variables
• Definition
Random variables X and Y are independent if the events {X ∈ A} and {Y ∈ B}
are independent (for all sets A and B).
Theorem
• Two discrete random variables X and Y with joint pmf pX,Y(x,y) and
marginal mass functions pX(x) and pY(y), are independent if and only if
  p_{X,Y}(x, y) = p_X(x) p_Y(y)
Conditional Probability on a joint discrete distribution
• Given the joint pmf of X and Y, we want to find
  P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y)
and
  P(Y = y | X = x) = P(X = x and Y = y) / P(X = x)
Example
(1) P(Y = 2 | X = 1) = ?
(2) P(X = 1 | Y = 1) = ?
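A sketch of how such conditional probabilities follow from a joint pmf, assuming the joint distribution is the one from the earlier three-coin-toss example (X = heads on 1st toss, Y = total heads); the table values and helper names are assumptions for illustration:

```python
from fractions import Fraction

# Assumed joint pmf from the coin-toss example (X = heads on 1st toss,
# Y = total heads in 3 tosses); entries not listed are zero.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8), (0, 2): Fraction(1, 8),
         (1, 1): Fraction(1, 8), (1, 2): Fraction(2, 8), (1, 3): Fraction(1, 8)}

def p_Y_given_X(y, x):
    """P(Y = y | X = x) = P(X = x and Y = y) / P(X = x)."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return joint.get((x, y), Fraction(0)) / p_x

def p_X_given_Y(x, y):
    """P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return joint.get((x, y), Fraction(0)) / p_y

print(p_Y_given_X(2, 1))  # 1/2
print(p_X_given_Y(1, 1))  # 1/3
```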
Why is the joint pmf/pdf useful?
Joint Distribution of
two or more
continuous random variables
The Joint Distribution of Two Continuous R.V.’s
• Definition
Random variables X and Y are (jointly) continuous if there is a non-negative
function f_{X,Y}(x, y) such that
  P((X, Y) ∈ A) = ∬_A f_{X,Y}(x, y) dx dy
for any “reasonable” 2-dimensional set A.
• f_{X,Y}(x, y) is called a joint density function for (X, Y).
• In particular, if A = {(X, Y) : X ≤ x, Y ≤ y}, the joint CDF of X, Y is
  F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) dv du
• From the Fundamental Theorem of Calculus we have
  f_{X,Y}(x, y) = ∂² F_{X,Y}(x, y) / ∂x ∂y
Properties of the joint density function
• f_{X,Y}(x, y) ≥ 0 for all (x, y) ∈ R²
• Its integral over R² is
  ∬_{R²} f_{X,Y}(x, y) dx dy = 1
Joint pdf (continuous r.v.)
For n random variables, the joint pdf assigns a
probability for each possible combination of
values:
  f(x1, x2, …, xn) ≥ 0
  ∫_R … ∫_R f(x1, x2, …, xn) dx1 … dxn = 1
Example
• Consider the following bivariate density function:
  f_{X,Y}(x, y) = (12/7)(x² + xy)  for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1;  0 otherwise
• It is a valid density function:
  f_{X,Y}(x, y) ≥ 0 for all (x, y), and ∬_{R²} f_{X,Y}(x, y) dx dy = 1
• Compute P(X > Y):
  P(X > Y) = ∬_{x>y} f_{X,Y}(x, y) dx dy = ∫₀¹ ∫₀ˣ (12/7)(x² + xy) dy dx = 9/14
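A numeric sanity check of this example (a midpoint-rule sketch, purely illustrative; the grid size is an arbitrary choice):

```python
# f(x,y) = (12/7)(x^2 + x*y) on the unit square, 0 elsewhere.
def f(x, y):
    return 12.0 / 7.0 * (x * x + x * y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# Midpoint rule on an n-by-n grid over the unit square.
n = 400
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]

total  = sum(f(x, y) for x in pts for y in pts) * h * h           # should be ~1
p_xgty = sum(f(x, y) for x in pts for y in pts if x > y) * h * h  # ~P(X > Y)

print(total, p_xgty)  # ≈ 1.0 and ≈ 9/14 ≈ 0.64
```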
Properties of the Joint Distribution Function
For random variables X, Y, the function F_{X,Y} : R² → [0,1] is given by
  F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).
• F_{X,Y}(x, y) is non-decreasing in each variable, i.e.,
  F_{X,Y}(x1, y1) ≤ F_{X,Y}(x2, y2) if x1 ≤ x2 and y1 ≤ y2.
• lim_{x→−∞, y→−∞} F_{X,Y}(x, y) = 0 and lim_{x→∞, y→∞} F_{X,Y}(x, y) = 1.
• lim_{x→∞} F_{X,Y}(x, y) = F_Y(y) and lim_{y→∞} F_{X,Y}(x, y) = F_X(x).
Marginal Densities and Distribution Functions
• The marginal (cumulative) distribution function of X is
  F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f_{X,Y}(u, y) dy du
• The marginal density of X is then
  f_X(x) = F_X′(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
• Similarly, the marginal density of Y is
  f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
Generalization to higher dimensions
Suppose X, Y, Z are jointly continuous random variables with density f(x, y, z). Then
• the marginal density of X is given by:
  f_X(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y, z) dy dz
• the marginal density of X, Y is given by:
  f_{X,Y}(x, y) = ∫_{−∞}^{∞} f(x, y, z) dz
Covariance
• Variables may change in relation to each
other
• Covariance measures how much the
movement in one variable predicts the
movement in a corresponding variable
R F Riesenfeld Sp 2010 CS5961 Comp Stat
Definition of Covariance
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
Alternative Formula
Cov (X,Y)= E(XY) – E(X)E(Y)
Variance of a Sum
Var (X+Y)= Var (X) + Var (Y)+2 Cov (X,Y)
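The variance-of-a-sum identity can be illustrated on a simulated sample, since it holds exactly for sample moments as well (a sketch; the simulated variables and helper names are illustrative):

```python
import random

# Illustrate Cov(X,Y) = E(XY) - E(X)E(Y) and
# Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y) on simulated data.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [x + random.gauss(0, 1) for x in xs]   # Y depends on X, so Cov(X,Y) > 0

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return mean([(t - m) ** 2 for t in v])

def cov(u, v):
    # Alternative formula: Cov(U,V) = E(UV) - E(U)E(V)
    return mean([a * b for a, b in zip(u, v)]) - mean(u) * mean(v)

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(abs(lhs - rhs))  # tiny: the identity holds exactly for sample moments
```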
Claim: Covariance is Bilinear
Cov(aX + b, cY + d) = E[(aX + b − E(aX + b))(cY + d − E(cY + d))]
                    = E[ac (X − μ_X)(Y − μ_Y)]
                    = ac Cov(X, Y).
What does the sign of covariance mean?
Look at Y = aX + b.
Then: Cov(X,Y) = Cov(X,aX + b) = aVar(X).
If a > 0, above the average in X goes with above the average in Y.
If a < 0, above the average in X goes with below the average in Y.
Cov(X, Y) = 0 means that there is no linear trend connecting X and Y.
[Figure: scatter plots for a > 0 (increasing) and a < 0 (decreasing), with Ave(X) and Ave(Y) marked]
Meaning of the value of Covariance
Let HI be height in inches, HC the height in centimeters, and W the weight.
Cov(HC,W) = Cov(2.54 HI,W) = 2.54 Cov (HI,W).
So the value depends on the units and is
not very informative!
Covariance and Correlation
Define the correlation coefficient:
  Corr(X, Y) = ρ = E[ ((X − E(X)) / SD(X)) · ((Y − E(Y)) / SD(Y)) ]
Using the linearity of expectation we get:
  ρ = Cov(X, Y) / (SD(X) · SD(Y)),   with −1 ≤ ρ ≤ 1
Covariance and Correlation
Notice that ρ(aX + b, cY + d) = ρ(X, Y) (for a, c > 0).
This new quantity is
independent of the change in scale.
So its value is quite informative.
Correlation and Independence
X & Y are uncorrelated iff any of the following hold
Cov(X,Y) = 0,
Corr(X,Y) = 0
E(XY) = E(X) E(Y).
In particular, if X and Y are independent
they are uncorrelated.
Example
(1) Cov(X, Y) = ?
(2) ρ = Cov(X, Y) / (σ_X σ_Y) = ?
Lecture notes, Chapter 6: Advanced Math for Engineering

  • 1.
    Probability Review • Probability •Random variables • One random variables • Two or more random variables
  • 2.
    Why Probability? • Probabilityrepresents a (standardized) measure of chance, and quantifies uncertainty. • Statistics uses rules of probability as a tool for making inferences about or describing a population using data from a sample
  • 3.
    Probability v.s. Statistics Diagramshowing the difference between statistics and probability. (Image by MIT OpenCourseWare. Based on Gilbert, Norma. Statistics. W.B. Saunders Co., 1976.)
  • 4.
    Sample space andEvents • Sample space, result of an experiment Ex: If you toss a coin twice,  • Event A: a subset of  Ex: First toss is head A= {HH,HT} • S: Event space, a set of events Ex: {A|A }
  • 5.
    Terms   , ,, are called "outcomes" or "sample points" the elements of are called "sample points" That's why are called a "sam , , ple sp , ace". HH HT T HH HT TH T H TT o T S      
  • 6.
    Three Probability Axioms Probabilitysatisfies • Nonnegativity: for every event A, • Normalization: • Additivity: for mutually exclusive events  ( ) 0 P A        1 1 ( ) ( ) i i i i P A P A    1 P       : | [0,1] P S A A i A
  • 7.
    If A isan event, then P(A) is the probability that event A occurs.
  • 8.
    L. Wang, Departmentof Statistics University of South Carolina; Slide 13 Mutually Exclusive Events • Mutually exclusive events can not occur at the same time. Mutually Exclusive Events Not Mutually Exclusive Events  
  • 9.
    Properties of Probability Ex:If A and B are mutually exclusive, then 1. P(A∩B)=0 2. P(A∪B) = P ( A )+ P ( B )     1 n 1 1 (1) 0 (2) , , : mutually exclusive ( ) ( ) (3) 1 ( ) (4) ( ) ( ) (5) ( ) ( ) ( ) ( ) n n n i i i C P A A P A P A P A P A A B P A P B P A B P A P B P A B                   
  • 10.
    Conditional Probability • Moregenerally, we can express the conditional probability of B given that A has occurred as: • We can rewrite this formula to get the Multiplicative Rule of Probability: ( ) ( | ) ( ) P A B P B A P A   ( ) ( | ) ( ) P A B P B A P A  
  • 11.
    Conditional Probability • Insome cases events are related, so that if we know event A has occurred then we learn more about an event B • Example: Roll a die A: observe an even number {2,4,6} B: observe a number less than 4 {1,2,3} If we know nothing else, then P(B) = 3/6 = 1/2 But if we know A has occurred, then P(B | A) = 1/3
  • 12.
    We purchase 30%of our parts from Vendor V. Vendor V’s defective rate is 5%. What is the probability that a randomly chosen part is defective and from Vendor V? ① 0.200 ② 0.050 ③ 0.015 ④ 0.030 L. Wang, Department of Statistics University of South Carolina; Slide 17 P(A)=0.3: We purchase 30% of our parts from Vendor V. P(B|A)=0.05: Vendor V’s defective rate is 5%. P(A∩B)=?: What is the probability that a randomly chosen part is defective and from Vendor V? [Example of Conditional Probability]
  • 13.
    Independence • Events arenot always be related. • Events A and B are independent if and only if: iff: ( | ) ( ) P B A P B  ( ) ( ) ( ) P A B P A P B  
  • 14.
    Sample spaces haveinconvenience?
  • 15.
    Because samples spacesare not always numbers. What happens when you are dealing with six different outcomes counted over 1000 individual trials? We need something that allows us to visualize the distribution of such items in a meaningful way that charts and lists of sample spaces and events couldn't possibly accomplish. http://mathhelpforum.com/statistics/113891-need-random-variables.html Sample spaces have inconvenience?
  • 16.
    Sample spaces haveinconvenience. So we need something, that are functions called Random variables. The term “random” usually means not able to be predicted or happening by chance. It
  • 17.
    Sample spaces haveinconvenience. So we need something, that are functions called Random variables.
  • 18.
    The term “RandomVariables” • Random variable -Its value is determined by the outcome of an experiment. -It takes on a new value each time the experiment is performed. -That is why it is “variable”. • The term “random” means -not able to be predicted or happening by chance. It
  • 19.
    Random Variable asa Measurement • Thus a random variable can be thought of as a measurement on an experiment
  • 20.
    Domain (정의역) Codomain(공역) Function (함수) Random Variables
  • 21.
      X Roll twodice. Let X = the sum of the outcomes. Random Variables
  • 22.
    Examples of RandomVariables • Roll two dice. Let X = number of sixes. – Possible values of X = {0, 1, 2}. • Select a player on the LotteGiants. Let X = his batting average. – Possible values of X are {x | 0 ≤ x ≤ 1}.
  • 23.
    • Roll twodice; X = no. of sixes. Event A={(6,6)}  X=2 P(A)=P(X=2) • Throw two coins; X = no. of heads. Event A={(H,T), (T,H)}  X=1 P(A)=P(X=1) Events can be expressed by random variables.
  • 24.
           (1) ( ) | ( ) (2) ( ) | ( ) P X a P X a P a X b P a X b               Events can be expressed by random variables.
  • 25.
    Why Use aRandom Variable? • We are interested in the number of sixes. • So why not let the sample space be the possible number of sixes? S = {0, 1, 2} • Would that be wrong?
  • 26.
    Why Use aRandom Variable? • The random variable allows us to set up the sample space in any way that is convenient. • Then, through the random variable, we can focus on the characteristic of interest.
  • 27.
    • In manyexperiments, it is easier to deal with a summary variable than with the original probability structure. [Example] In an opinion poll, we ask 50 people whether agree or disagree with a certain issue. – Suppose we record a "1" for agree and "0" for disagree. – The sample space for this experiment has 250 elements. • Suppose we are only interested in the number of people who agree. – Define X=number of "1“ ‘s recorded out of 50. – Easier to deal with this sample space (has only 51 elements). CS479/679 Pattern Recognition Spring 2013 – Dr. George Bebis Why Use a Random Variable?
  • 28.
    Why Use aRandom Variable? • We design the sample space so that it will be easy to find the probabilities. • This may involve more than just the characteristic in which we are interested.
  • 29.
      : [0,1]defined as ( ) F R F x P X x    (Cumulative) Distribution function F of a random variable X.
  • 30.
    Two Types ofRandom Variables • Discrete Random Variable – A random variable whose set of possible values is a discrete set. • Continuous Random Variable – A random variable whose set of possible values is a continuous set. • In the following two examples, are they discrete or continuous?
  • 31.
    Examples of RandomVariables • Roll two dice. Let X = number of sixes. – Possible values of X = {0, 1, 2}. • Select a player on the LotteGiants. Let X = his batting average. – Possible values of X are {x | 0 ≤ x ≤ 1}.
  • 32.
    Two Types ofRandom Variables • Discrete random variables – Number of sales – Number of calls – People in line – Mistakes per page • Continuous random variables – Length – Depth – Volume – Time – Weight McClave, Statistics, 11th ed. Chapter 4: Discrete Random Variables
  • 33.
  • 34.
    Probability mass function(pmf) () ( ) p x P X x  
  • 35.
    Example of apmf • Roll two dice; X = no. of sixes. – P(X = 0) = 25/36. – P(X = 1) = 10/36. – P(X = 2) = 1/36.
  • 36.
  • 37.
    Example of apmf • Suppose that 10% of all households have no children, 30% have one child, 40% have two children, and 20% have three children. • Select a household at random and let X = number of children. • What is the pmf of X?
  • 38.
    Example of apmf • We may list each value. – P(X = 0) = 0.10 – P(X = 1) = 0.30 – P(X = 2) = 0.40 – P(X = 3) = 0.20
  • 39.
    • Or wemay present it as a chart. x P(X = x) 0 0.10 1 0.30 2 0.40 3 0.20 Example of a pmf
  • 40.
    • Or wemay present it as a stick graph. x P(X = x) 0 1 2 3 0.10 0.20 0.30 0.40 Example of a pmf
  • 41.
    • Or wemay present it as a histogram. x P(X = x) 0 1 2 3 0.10 0.20 0.30 0.40 Example of a pmf
  • 42.
    • Mean ofa Discrete Random Variable – The weighted average of all of its values. The weights are the probabilities.
  • 43.
    • Variance ofa Discrete Random Variable The weighted average of the squared deviations from the mean. • Standard variance
  • 44.
    The Mean of aDiscrete Random Variable • The mean is also called the expected value. • However, that does not mean that it is literally the value that we expect to see. • “Expected value” is simply a synonym for the mean or average.
  • 45.
    Example of theMean • Recall the example where X was the number of children in a household. x P(X = x) 0 0.10 1 0.30 2 0.40 3 0.20
  • 46.
    Example of theMean • Multiply each x by the corresponding probability. x P(X = x) xP(X = x) 0 0.10 0.00 1 0.30 0.30 2 0.40 0.80 3 0.20 0.60
  • 47.
    Example of theMean • Add up the column of products to get the mean. x P(X = x) xP(X = x) 0 0.10 0.00 1 0.30 0.30 2 0.40 0.80 3 0.20 0.60 1.70 = µ
  • 48.
    Example of theVariance x P(X = x) 0 0.10 1 0.30 2 0.40 3 0.20       2 2 2 2 2 2 2 ( ) 0 0.1 1 0.3 2 0.4 3 0.2 1.7 V X E X E X           
  • 49.
    Distributions of discrete randomvariables. • Discrete Uniform Distribution • Bernoulli Distribution • Binomial Distribution • Geometric Distribution • …
  • 50.
  • 51.
    G. Baker, Departmentof Statistics University of South Carolina; Slide 56 Continuous Random Variable • A continuous random variable is one for which the outcome can be any value in an interval of the real number line. • Usually a measurement. • Examples – Let Y = length in mm – Let Y = time in seconds – Let Y = temperature in ºC
  • 52.
    Continuous Random Variable •We don’t calculate P(Y = y), we calculate P(a < Y < b), where a and b are real numbers. • For a continuous random variable P(Y = y) = 0.
  • 53.
    Probability density function(pdf) (1): pdf of ( ) ( ) (2) : pdf of ( ) ( ) ( ) ( ) ( : CDF of X ) x b a f X P X x f y dy f X P a X b f x dx d f x F x F dx            
  • 54.
    Time Spent Waitingfor a Bus • A bus arrives at a bus stop every 30 minutes. If a person arrives at the bus stop at a random time, what is the probability that the person will have to wait less than 10 minutes for the next bus?
  • 55.
    G. Baker, Departmentof Statistics University of South Carolina; Slide 60 Identify the Random Variable • Let Y = wait time in minutes. • Since the arrival time is random, someone is as likely to arrive 1 minute before a bus arrives as 2 minutes, as 3 minutes, etc. Wait Time in Minutes 0 5 10 15 20 25 30 Wait Time 1.0 1/30 pdf of Y f(y)=
  • 56.
    What is theprobability a person will wait l ess than 10 minutes? Wait Time in Minutes 0 5 10 15 20 25 30 Wait Time 10/30 = 0.33 20/30 = 0.67 This is called a continuous uniform distribution. 1/30
  • 57.
Properties of a pdf f
1) $f(y) \ge 0$ for all $y$.
2) $\int_{-\infty}^{\infty} f(y)\,dy = 1$
3) If $y_0$ is a specific value of interest, then the cumulative distribution function (cdf) is $F(y_0) = P(Y \le y_0) = \int_{-\infty}^{y_0} f(y)\,dy$.
4) If $y_1$ and $y_2$ are specific values of interest, then $P(y_1 \le Y \le y_2) = \int_{y_1}^{y_2} f(y)\,dy = F(y_2) - F(y_1)$.
Expected Value for a Continuous Random Variable
• Recall expected value for a discrete random variable: $E(Y) = \sum_y y\,p(y)$
• Expected value for a continuous random variable: $E(Y) = \int_{-\infty}^{\infty} y\,f(y)\,dy = \mu$
Variance for a Continuous Random Variable
• Recall variance for a discrete random variable: $Var(Y) = E[(Y - \mu)^2] = \sum_y (y - \mu)^2 p(y) = \sigma^2$
• Variance for a continuous random variable: $Var(Y) = E[(Y - \mu)^2] = \int_{-\infty}^{\infty} (y - \mu)^2 f(y)\,dy = \sigma^2$
$Var(Y) = \int_{-\infty}^{\infty} (y - \mu)^2 f(y)\,dy$
$\quad = \int_{-\infty}^{\infty} (y^2 - 2\mu y + \mu^2) f(y)\,dy$
$\quad = \int_{-\infty}^{\infty} y^2 f(y)\,dy - 2\mu \int_{-\infty}^{\infty} y f(y)\,dy + \mu^2 \int_{-\infty}^{\infty} f(y)\,dy$
$\quad = \int_{-\infty}^{\infty} y^2 f(y)\,dy - 2\mu^2 + \mu^2$
$\quad = \int_{-\infty}^{\infty} y^2 f(y)\,dy - \mu^2 = E(Y^2) - \mu^2$
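The shortcut Var(Y) = E(Y²) − μ² can be verified numerically for the uniform wait-time density f(y) = 1/30 from the bus example; a midpoint-rule sketch (the grid size is an arbitrary choice):

```python
# Approximate E(Y) and E(Y^2) for f(y) = 1/30 on [0, 30] by a midpoint rule.
n = 30000
dy = 30 / n
f = 1 / 30
mu = sum(((i + 0.5) * dy) * f * dy for i in range(n))        # ~15
ey2 = sum(((i + 0.5) * dy) ** 2 * f * dy for i in range(n))  # ~300
var = ey2 - mu ** 2   # ~75, matching Var = (b - a)^2 / 12 for a uniform
print(round(mu, 3), round(var, 3))
```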
Importance of Normal Distribution
• 1. Describes many random processes and continuous phenomena
• 2. Can be used to approximate discrete probability distributions
– Example: Binomial
• 3. Basis for classical statistical inference
Normal Distribution
• 1. 'Bell-shaped' and symmetrical
• 2. Mean, median, and mode are equal
• 3. 'Middle spread' is 1.33 σ
• 4. Random variable has infinite range
(Figure: bell curve with mean = median = mode at the center.)
Normal Distribution: Useful Properties
• About half of the probability ("weight") lies below the mean (because the curve is symmetrical).
• About 68% of the probability lies within 1 standard deviation of the mean (at the change in curvature).
• About 95% of the probability lies within 2 standard deviations.
• More than 99% of the probability lies within 3 standard deviations.
(Figure: bell curve marked at μ ± σ, μ ± 2σ, μ ± 3σ.)
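These percentages follow from the standard normal CDF; a short check using Python's math.erf, since P(|X − μ| < kσ) = erf(k/√2) for a normal random variable:

```python
import math

# For a normal r.v., P(|X - mu| < k*sigma) = erf(k / sqrt(2)).
within = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
print({k: round(p, 4) for k, p in within.items()})
# {1: 0.6827, 2: 0.9545, 3: 0.9973}
```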
Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
• x = value of the random variable ($-\infty < x < \infty$)
• μ = mean of the random variable x
• σ = population standard deviation
• π ≈ 3.14159, e ≈ 2.71828
Notation
• X is N(μ, σ): the random variable X has a normal distribution (N) with mean μ and standard deviation σ.
• Examples: X is N(40, 1); X is N(10, 5); X is N(50, 3).
Standardize the Normal Distribution
• $Z = \frac{X - \mu}{\sigma}$
• Z is N(0, 1): the standardized normal distribution has mean 0 and standard deviation 1.
• Any normal distribution can be converted to the standardized normal, so only one table is needed.
Standardizing Example
• Normal distribution with μ = 5, σ = 10; standardize x = 6.2:
$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$
• The value 6.2 on the normal distribution corresponds to 0.12 on the standardized normal distribution.
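The standardization step above takes two lines of Python; math.erf also gives the standard normal CDF if one wants P(X ≤ 6.2) as well:

```python
import math

# Standardize x = 6.2 when X is normal with mu = 5, sigma = 10.
mu, sigma, x = 5.0, 10.0, 6.2
z = (x - mu) / sigma                          # (6.2 - 5) / 10 = 0.12
p = 0.5 * (1 + math.erf(z / math.sqrt(2)))    # P(X <= 6.2) = P(Z <= 0.12)
print(round(z, 2))   # 0.12
```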
Joint Distribution of two or more random variables
Joint Distribution of two or more discrete random variables
Joint Distribution of two or More Random Variables
• Sometimes more than one measurement (r.v.) is taken on each member of the sample space. In cases like this there will be a few random variables defined on the same probability space, and we would like to explore their joint distribution.
• The joint behavior of two random variables (continuous or discrete), X and Y, is determined by their joint cumulative distribution function: $F_{X,Y}(x, y) = P(X \le x, Y \le y)$.
• n-dimensional case: $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n)$.
Discrete case
• Suppose X, Y are discrete random variables defined on the same probability space.
• The joint probability mass function of two discrete random variables X and Y is the function $p_{X,Y}(x, y)$ defined for all pairs of real numbers x and y by $p_{X,Y}(x, y) = P(X = x \text{ and } Y = y)$.
• For a joint pmf $p_{X,Y}(x, y)$ we must have: $p_{X,Y}(x, y) \ge 0$ for all values of x, y, and $\sum_x \sum_y p_{X,Y}(x, y) = 1$.
Example for illustration
• Toss a coin 3 times. Define X: number of heads on 1st toss, Y: total number of heads.
• The sample space is Ω = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}.
• The joint distribution of X and Y (each outcome has probability 1/8):
        y=0   y=1   y=2   y=3
  x=0:  1/8   2/8   1/8    0
  x=1:   0    1/8   2/8   1/8
• Can we recover the probability mass function for X and Y from the joint table?
• To find the probability mass function of X we sum the appropriate rows of the table of the joint probability function.
• Similarly, to find the mass function for Y we sum the appropriate columns.
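The joint table and both marginals can be rebuilt by brute-force enumeration of the eight outcomes; a small Python sketch:

```python
from itertools import product
from collections import Counter

# X = number of heads on the 1st toss, Y = total number of heads.
joint = Counter()
for toss in product("HT", repeat=3):
    x = 1 if toss[0] == "H" else 0
    y = toss.count("H")
    joint[(x, y)] += 1 / 8

# Marginals: sum rows (for X) and columns (for Y) of the joint table.
p_X = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in range(4)}
print(joint[(1, 2)], p_X[1], p_Y[1])   # 0.25 0.5 0.375
```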
Marginal Probability Function
• The marginal probability mass function for X is $p_X(x) = \sum_y p_{X,Y}(x, y)$.
• The marginal probability mass function for Y is $p_Y(y) = \sum_x p_{X,Y}(x, y)$.
• The case of several discrete random variables is analogous. If $X_1, \ldots, X_m$ are discrete random variables on the same sample space with joint probability function $p_{X_1,\ldots,X_m}(x_1, \ldots, x_m) = P(X_1 = x_1, \ldots, X_m = x_m)$, the marginal probability function for $X_1$ is $p_{X_1}(x_1) = \sum_{x_2,\ldots,x_m} p_{X_1,\ldots,X_m}(x_1, \ldots, x_m)$.
• The 2-dimensional marginal probability function for $X_1$ and $X_2$ is $p_{X_1,X_2}(x_1, x_2) = \sum_{x_3,\ldots,x_m} p_{X_1,\ldots,X_m}(x_1, x_2, x_3, \ldots, x_m)$.
Example
(1) $p_X(x) = \sum_y p_{X,Y}(x, y) = {?}$
(2) $p_Y(y) = \sum_x p_{X,Y}(x, y) = {?}$
Independence of random variables
• Definition: Random variables X and Y are independent if the events $\{X \in A\}$ and $\{Y \in B\}$ are independent.
• Theorem: Two discrete random variables X and Y with joint pmf $p_{X,Y}(x, y)$ and marginal mass functions $p_X(x)$ and $p_Y(y)$ are independent if and only if $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$.
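Applying the theorem to the earlier coin-toss example (X = heads on the 1st toss, Y = total heads) shows those two variables are not independent; a sketch:

```python
from itertools import product

# Build the joint pmf of the 3-toss example.
joint = {}
for toss in product("HT", repeat=3):
    key = (1 if toss[0] == "H" else 0, toss.count("H"))
    joint[key] = joint.get(key, 0) + 1 / 8

def p_X(x): return sum(p for (xx, _), p in joint.items() if xx == x)
def p_Y(y): return sum(p for (_, yy), p in joint.items() if yy == y)

# e.g. p(1,3) = 1/8 but p_X(1) * p_Y(3) = (1/2)(1/8) = 1/16, so the
# factorization fails somewhere and X, Y are NOT independent.
independent = all(abs(joint.get((x, y), 0.0) - p_X(x) * p_Y(y)) < 1e-12
                  for x in (0, 1) for y in range(4))
print(independent)   # False
```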
Conditional Probability on a joint discrete distribution
• Given the joint pmf of X and Y, we want to find
$P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)}$ and
$P(Y = y \mid X = x) = \frac{P(X = x \text{ and } Y = y)}{P(X = x)}$
Example
(1) $P(Y = 2 \mid X = 1) = {?}$
(2) $P(X = 1 \mid Y = 1) = {?}$
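Both conditional probabilities in this exercise can be worked by enumeration over the same 3-toss sample space; a sketch:

```python
from itertools import product

outcomes = list(product("HT", repeat=3))          # 8 equally likely outcomes
X = [1 if o[0] == "H" else 0 for o in outcomes]   # heads on 1st toss
Y = [o.count("H") for o in outcomes]              # total heads

def P(event):
    return sum(1 for i in range(len(outcomes)) if event(i)) / len(outcomes)

# P(Y=2 | X=1) = P(X=1 and Y=2) / P(X=1) = (2/8)/(4/8) = 1/2
p1 = P(lambda i: X[i] == 1 and Y[i] == 2) / P(lambda i: X[i] == 1)
# P(X=1 | Y=1) = P(X=1 and Y=1) / P(Y=1) = (1/8)/(3/8) = 1/3
p2 = P(lambda i: X[i] == 1 and Y[i] == 1) / P(lambda i: Y[i] == 1)
print(p1, round(p2, 4))
```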
Why is the joint pmf/pdf useful?
Joint Distribution of two or more continuous random variables
The Joint Distribution of two Continuous R.V.'s
• Definition: Random variables X and Y are (jointly) continuous if there is a non-negative function $f_{X,Y}(x, y)$ such that $P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$ for any "reasonable" 2-dimensional set A.
• $f_{X,Y}(x, y)$ is called a joint density function for (X, Y).
• In particular, if A = {(X, Y): X ≤ x, Y ≤ y}, the joint CDF of X, Y is $F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\,dv\,du$.
• From the Fundamental Theorem of Calculus we have $f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y)$.
Properties of joint density function
• $f_{X,Y}(x, y) \ge 0$ for all $(x, y) \in R^2$
• Its integral over $R^2$ is $\iint_{R^2} f_{X,Y}(x, y)\,dx\,dy = 1$
Joint pdf (continuous r.v.)
For n random variables, the joint pdf assigns a probability for each possible combination of values:
$f(x_1, x_2, \ldots, x_n) \ge 0$
$\int_R \cdots \int_R f(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n = 1$
Example
• Consider the following bivariate density function:
$f_{X,Y}(x, y) = \frac{12}{7}(x^2 + xy)$ for $0 \le x \le 1$, $0 \le y \le 1$; 0 otherwise.
• It's a valid density function: $f_{X,Y}(x, y) \ge 0$ for all $(x, y)$, and $\iint_{R^2} f_{X,Y}(x, y)\,dx\,dy = 1$.
• Compute $P(X > Y) = \iint_{x > y} f_{X,Y}(x, y)\,dy\,dx = \int_0^1 \int_0^x \frac{12}{7}(x^2 + xy)\,dy\,dx$.
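The integral evaluates by hand to 9/14: the inner integral gives (12/7)(x³ + x³/2) = (18/7)x³, and integrating over [0, 1] gives 18/28 = 9/14 ≈ 0.643. A crude midpoint-grid check in Python (grid size chosen arbitrarily):

```python
# Midpoint-grid approximation of P(X > Y) for f(x,y) = (12/7)(x^2 + x*y)
# on the unit square; the exact hand computation gives 9/14.
n = 400
h = 1 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(i):            # j < i guarantees y < x at midpoints
        y = (j + 0.5) * h
        total += (12 / 7) * (x * x + x * y) * h * h
# Each diagonal cell contributes only the triangle below y = x, area h^2/2.
for i in range(n):
    x = (i + 0.5) * h
    total += 0.5 * (12 / 7) * (x * x + x * x) * h * h
print(total)   # close to 9/14 = 0.642857...
```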
Properties of Joint Distribution Function
For random variables X, Y, $F_{X,Y}: R^2 \to [0, 1]$ is given by $F_{X,Y}(x, y) = P(X \le x, Y \le y)$.
• $F_{X,Y}(x, y)$ is non-decreasing in each variable, i.e. $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$ if $x_1 \le x_2$ and $y_1 \le y_2$.
• $\lim_{x, y \to -\infty} F_{X,Y}(x, y) = 0$ and $\lim_{x, y \to \infty} F_{X,Y}(x, y) = 1$.
• $\lim_{x \to \infty} F_{X,Y}(x, y) = F_Y(y)$ and $\lim_{y \to \infty} F_{X,Y}(x, y) = F_X(x)$.
Marginal Densities and Distribution Functions
• The marginal (cumulative) distribution function of X is $F_X(x) = P(X \le x) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{X,Y}(u, y)\,dy\,du$.
• The marginal density of X is then $f_X(x) = F_X'(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$.
• Similarly, the marginal density of Y is $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$.
Generalization to higher dimensions
Suppose X, Y, Z are jointly continuous random variables with density f(x, y, z). Then:
• The marginal density of X is given by $f_X(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y, z)\,dy\,dz$.
• The marginal density of X, Y is given by $f_{X,Y}(x, y) = \int_{-\infty}^{\infty} f(x, y, z)\,dz$.
Covariance
• Variables may change in relation to each other.
• Covariance measures how much the movement in one variable predicts the movement in a corresponding variable.
R. F. Riesenfeld, Sp 2010, CS5961 Comp Stat
Definition of Covariance
$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Alternative formula:
$Cov(X, Y) = E(XY) - E(X)E(Y)$
Variance of a sum:
$Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$
Claim: covariance is bilinear:
$Cov(aX + b, cY + d) = E[(aX - E(aX))(cY - E(cY))] = E[ac(X - \mu_X)(Y - \mu_Y)] = ac\,Cov(X, Y)$
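The definition and the alternative formula can be checked to agree on any small joint pmf; a sketch with a made-up four-point distribution (the numbers are illustrative only):

```python
# A hypothetical joint pmf over pairs (x, y); probabilities sum to 1.
dist = [((0, 0), 0.2), ((0, 1), 0.3), ((1, 0), 0.1), ((1, 1), 0.4)]

EX  = sum(p * x for (x, y), p in dist)
EY  = sum(p * y for (x, y), p in dist)
EXY = sum(p * x * y for (x, y), p in dist)

cov_def = sum(p * (x - EX) * (y - EY) for (x, y), p in dist)  # definition
cov_alt = EXY - EX * EY                                       # shortcut
print(round(cov_def, 6), round(cov_alt, 6))   # both 0.05
```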
What does the sign of covariance mean?
Look at Y = aX + b. Then:
$Cov(X, Y) = Cov(X, aX + b) = a\,Var(X)$
• If a > 0, above the average in X goes with above the average in Y.
• If a < 0, above the average in X goes with below the average in Y.
• Cov(X, Y) = 0 means that there is no linear trend which connects X and Y.
(Figure: scatter plots showing an upward trend for a > 0 and a downward trend for a < 0.)
Meaning of the value of Covariance
Let HI be height in inches and HC be the height in centimeters. Then
$Cov(HC, W) = Cov(2.54\,HI, W) = 2.54\,Cov(HI, W)$.
So the value depends on the units and is not very informative!
Covariance and Correlation
Define the correlation coefficient:
$Corr(X, Y) = E\left[\left(\frac{X - E(X)}{SD(X)}\right)\left(\frac{Y - E(Y)}{SD(Y)}\right)\right]$
Using the linearity of expectation we get:
$Corr(X, Y) = \frac{Cov(X, Y)}{SD(X)\,SD(Y)}$, with $-1 \le Corr(X, Y) \le 1$.
Covariance and Correlation
Notice that $\rho(aX + b, cY + d) = \rho(X, Y)$ for $a, c > 0$. This new quantity is independent of the change in scale, so its value is quite informative.
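The contrast with covariance (the inches-to-centimeters example above) can be demonstrated on simulated data using sample covariance and correlation; a sketch (the data-generating numbers are arbitrary):

```python
import math
import random

random.seed(0)
hi = [random.gauss(70, 3) for _ in range(5000)]   # heights, inches
w  = [2 * h + random.gauss(0, 5) for h in hi]     # a correlated variable

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

hc = [2.54 * h for h in hi]                # same heights, in centimeters
scale_cov  = cov(hc, w) / cov(hi, w)       # 2.54: covariance tracks units
corr_shift = corr(hc, w) - corr(hi, w)     # ~0: correlation is scale-free
print(round(scale_cov, 6), abs(corr_shift) < 1e-9)
```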
Correlation and Independence
X and Y are uncorrelated iff any of the following hold:
• Cov(X, Y) = 0
• Corr(X, Y) = 0
• E(XY) = E(X) E(Y)
In particular, if X and Y are independent they are uncorrelated. The converse does not hold: uncorrelated random variables need not be independent.
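A standard counterexample (not from the slides) shows that uncorrelated does not imply independent: take X uniform on {−1, 0, 1} and Y = X², which is uncorrelated with X yet completely determined by it:

```python
xs = [-1, 0, 1]          # X takes each value with probability 1/3
p = 1 / 3

EX  = sum(p * x for x in xs)             # E(X)   = 0
EY  = sum(p * x * x for x in xs)         # E(X^2) = 2/3
EXY = sum(p * x * (x * x) for x in xs)   # E(X^3) = 0
cov = EXY - EX * EY
print(cov == 0.0)   # True: uncorrelated
# Yet P(X = 1, Y = 0) = 0 while P(X = 1) * P(Y = 0) = (1/3)(1/3) > 0,
# so X and Y are not independent.
```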
Example
(1) If X and Y are independent, Cov(X, Y) = ?
(2) If Cov(X, Y) = 0, are X and Y independent?