The document discusses random variables and vectors. It defines random variables as functions that assign outcomes of random experiments to real numbers. There are two types of random variables: discrete and continuous. Random variables are characterized by their expected value, variance/standard deviation, and other moments. Random vectors are multivariate random variables. Key concepts covered include probability mass functions, probability density functions, expected value, variance, and how these properties change when random variables are scaled or combined linearly.
Introduction to random variables and vectors, their definitions, and the outline of topics to be discussed.
Explanation of random variables, including outcomes from experiments and examples using coin flips.
Definition of discrete random variables, their outcomes, probability mass functions, and examples including uniform and binomial distributions.
Introduction to continuous random variables, including definitions, probability density functions, and cumulative distribution functions with equations.
Discussion of how to characterize random variables by their distribution, mean, and variance, using examples from gambling.
Detailed explanation of mean and variance, their calculations, and importance in characterizing the distribution of random variables.
Effects of scaling and summing independent random variables; linearity properties of expectation and variance.
Explanation of the Central Limit Theorem and implications for sums of independent identically distributed random variables.
Introduction to random vectors, joint distributions, covariance matrices, and independence concepts.
List of recommended resources and software for further reading and statistical application.
Random Variables/Vectors
Tomoki Tsuchida
Computational & Cognitive Neuroscience Lab
Department of Cognitive Science
University of California, San Diego
Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
Random Variable
A random variable maps the elementary outcomes of a random experiment onto the real line.
Example: flipping a coin once, with H → $1 and T → $0.
• A random variable is a function of each outcome.
• The probability of the r.v. (taking a particular value) is determined by the probability of the outcome.
Example
Let X be the sum of the payoffs from two coin flips.
P(X = 0) = P({TT}) = 1/4
P(X = 1) = P({TH, HT}) = 1/2
P(X = 2) = P({HH}) = 1/4
The random variable X takes values {0, 1, 2},
with probabilities {1/4, 1/2, 1/4}.
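A quick simulation makes the pmf concrete. This is a minimal sketch in Python (an assumption; the slides themselves contain no code) that estimates the three probabilities by repeated sampling:

```python
import random

# Simulate the two-flip example: each head pays $1, and X is the
# sum of the two payoffs.
random.seed(0)
n_trials = 100_000
counts = {0: 0, 1: 0, 2: 0}
for _ in range(n_trials):
    x = sum(random.choice([0, 1]) for _ in range(2))  # 1 for H, 0 for T
    counts[x] += 1

for value in (0, 1, 2):
    print(f"P(X = {value}) is roughly {counts[value] / n_trials:.3f}")
# Expect approximately 0.25, 0.50, 0.25.
```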
Discrete Random Variables:
Variables whose outcomes are separated by gaps
Flipping a coin once (and getting paid for H): {0, 1}
Rolling a six-sided die once (and getting paid the number on the face): {1, 2, 3, 4, 5, 6}
Discrete Random Variables:
Defined by a probability mass function, P
• P(X = a) = P(a)
• 0 ≤ P(a) ≤ 1
• The probabilities of all outcomes sum to one (from the axioms!)
[Figure: pmf of rolling a fair six-sided die; each outcome 1 through 6 has probability 1/6]
Types of Probability Mass Functions:
Discrete Uniform Distribution
P(X = a) = 1/N
(where N is the total number of distinct outcomes)
[Figure: pmf of rolling a fair six-sided die; uniform, with P(a) = 1/6 for each face]
Types of Probability Mass Functions:
Binomial Distribution
[Figure: pmf of the number of heads in two flips of a fair coin; P(0) = 0.25, P(1) = 0.5, P(2) = 0.25]
Types of Probability Mass Functions:
Binomial Distribution
pmf: P(X = k) = C(n, k) p^k (1 − p)^(n−k)
k: the number of "successes" (in our case the outcome of heads is defined as success)
p: probability of success in a single observation (in our case 0.5)
n: the number of observations (in our case two)
C(n, k) = n! / (k!(n − k)!): the binomial coefficient, the number of different ways you could get k successes out of n observations
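As a sanity check, the formula above can be evaluated directly and compared against a library implementation. A small sketch, assuming Python with SciPy available:

```python
from math import comb            # n-choose-k
from scipy.stats import binom    # assumes SciPy is installed

n, p = 2, 0.5  # two flips of a fair coin
for k in range(n + 1):
    manual = comb(n, k) * p**k * (1 - p)**(n - k)  # the pmf formula above
    library = binom.pmf(k, n, p)                   # SciPy's binomial pmf
    print(k, manual, library)
# Both columns give 0.25, 0.5, 0.25, matching the two-flip example.
```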
Continuous Random Variables:
Defined by a probability density function
[Figure: a discrete pmf (number of heads in two coin flips) alongside a continuous pdf]
Continuous Random Variables:
Probability of a range of outcomes
[Figure: shaded areas under a pdf for p(179 ≤ x ≤ 181) and p(x ≤ 178)]
p(a ≤ x ≤ b) = ∫_a^b f(x) dx
p(X = x) = 0 (no single outcome has any probability!)
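Because p(a ≤ x ≤ b) = F(b) − F(a), where F is the cumulative distribution function, range probabilities reduce to two cdf evaluations. A sketch assuming Python with SciPy, treating the quantity as normal with illustrative values µ = 175 and σ = 7 (my assumptions, not from the slides):

```python
from scipy.stats import norm  # assumes SciPy is installed

mu, sigma = 175, 7  # illustrative mean and std (assumed values)

# p(179 <= x <= 181) = F(181) - F(179)
print(norm.cdf(181, mu, sigma) - norm.cdf(179, mu, sigma))

# p(x <= 178) = F(178)
print(norm.cdf(178, mu, sigma))
```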
Continuous Random Variables:
Defined by a probability density function, f
• f(a) ≥ 0
• The area under the pdf must equal 1
Types of Probability Density Functions:
Continuous Uniform Distribution
f(x) = 1/(b − a) if a ≤ x ≤ b
f(x) = 0 otherwise
a = lower bound
b = upper bound
Types of Probability Density Functions:
Normal (Gaussian) Distribution
f(x) = (1/(σ√(2π))) e^(−(x − µ)²/(2σ²))
σ = standard deviation
µ = mean
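To check the formula, one can evaluate it by hand and compare with a library's pdf. A minimal sketch, assuming Python with SciPy:

```python
import math
from scipy.stats import norm  # assumes SciPy is installed

mu, sigma, x = 0.0, 1.0, 0.5  # arbitrary test values
manual = (1 / (sigma * math.sqrt(2 * math.pi))) * \
         math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
print(manual, norm.pdf(x, mu, sigma))  # the two values agree
```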
Types of Probability Distributions
There are lots and lots of distributions!
(But we can always look them up on Wikipedia!)
Characterizing the distribution of a random variable
If we know the distribution of a random variable, we pretty much know all there is to know about the random variable.
But with real data, we don't know the full distribution.
So we want to characterize distributions by a couple of numbers ("statistics").
Characterizing the Central Tendency of a Random Variable
Normal (Gaussian) Distribution:
p(x) = (1/(σ√(2π))) e^(−(x − µ)²/(2σ²))
σ = standard deviation
µ = mean
For a Gaussian, we know everything from the mean and STD.
A Simple Gambling Game
1. Flip a fair coin
2. Possible outcomes:
Heads: I give you $2, so P(X = 2) = 1/2
Tails: You give me $1, so P(X = −1) = 1/2
A Simple Gambling Game
[Figure: pmf of the win/loss for you (in $); P(−1) = 1/2, P(2) = 1/2]
If we played this game an infinite # of times, what would the average outcome be?
µ = E(X) = Σᵢ P(X = xᵢ) xᵢ
This is the expected value, a.k.a. the "mean". Here E(X) = $0.5.
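The sum is short enough to compute directly. A minimal Python sketch of the expected-value formula applied to this game's pmf:

```python
# pmf of the coin game: win $2 or lose $1, each with probability 1/2.
pmf = {2: 0.5, -1: 0.5}

# mu = E(X) = sum over outcomes of P(X = x) * x
mu = sum(p * x for x, p in pmf.items())
print(mu)  # 0.5, i.e. E(X) = $0.5
```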
Another Gambling Game
1. Roll a fair six-sided die
2. Possible outcomes:
Die Payoff
1 $8
2 -$1
3 -$1
4 -$1
5 -$1
6 -$1
Another Gambling Game
What's the mean outcome of this game?
µ = E(X) = Σᵢ P(X = xᵢ) xᵢ
[Figure: pmf of the win/loss for you (in $); P(8) = 1/6, P(−1) = 5/6]
Ans: E(X) = $0.5
Why should you prefer the coin game?
[Figure: pmfs of the coin game and the die game side by side; both have mean $0.5, but the die game's outcomes are spread more widely]
Characterizing the Variability of a Random Variable
[Figure: pmfs of the coin game and the die game side by side]
Variance: The expected value of the squared deviation from the mean
Variance shows the "spread" of the distribution.
σ² = Var(X) = Σᵢ P(X = xᵢ)(xᵢ − µ)²
For the coin game: Var(X) = (1/2)(2 − 0.5)² + (1/2)(−1 − 0.5)² = 2.25 = 9/4 dollars squared
[Figure: pmf of the coin game's win/loss for you (in $)]
Standard Deviation: The square root of the variance
σ² = Var(X) = Σᵢ P(X = xᵢ)(xᵢ − µ)²
σ = √Var(X)
(Why? Because variance is in the units of X²; STD is in the same units as X.)
Ans: for the coin game, σ = √2.25 = $1.5
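Both games can be summarized with a few lines of code. A Python sketch with a hypothetical helper mean_var_std that applies the formulas above to any finite pmf:

```python
def mean_var_std(pmf):
    """Mean, variance, and standard deviation of a finite pmf
    given as a dict of {outcome: probability}."""
    mu = sum(p * x for x, p in pmf.items())
    var = sum(p * (x - mu) ** 2 for x, p in pmf.items())
    return mu, var, var ** 0.5

coin = {2: 1/2, -1: 1/2}
die = {8: 1/6, -1: 5/6}
print(mean_var_std(coin))  # (0.5, 2.25, 1.5)
print(mean_var_std(die))   # (0.5, 11.25, ~3.354)
```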
Coin Game: µ = $0.5, σ = $1.5. Die Game: µ = $0.5, σ ≈ $3.35.
[Figure: pmfs of the two games side by side; same mean, very different spread]
Summary: Mean & Variance

Definition | Discrete R.V.s | Continuous R.V.s
Mean: µ = E(X) | Σᵢ p(xᵢ) xᵢ | ∫_{−∞}^{∞} p(x) x dx
Variance: σ² = E((X − µ)²) | Σᵢ p(xᵢ)(xᵢ − µ)² | ∫_{−∞}^{∞} p(x)(x − µ)² dx
Moments
But why stop at the variance (~ the 2nd moment)?
3rd moment, E(X³): related to skewness
4th moment, E(X⁴): related to kurtosis
What happens if I scale a R.V.?
[Figure: pmf of the original coin game X (outcomes −1 and 2) next to the pmf of Y = 2X (outcomes −2 and 4)]
What happens if I scale a R.V.?
The New Mean:
µ_Y = Σᵢ p_Y(2xᵢ)·2xᵢ = 2 Σᵢ p_X(xᵢ) xᵢ = 2µ_X
µ_X = 0.5, so µ_Y = 1
[Figure: pmf of Y = 2X, win/loss for you (in $)]
What happens if I scale a R.V.?
The New Variance:
σ_Y² = Σᵢ p_Y(2xᵢ)(2xᵢ − µ_Y)² = Σᵢ p_Y(2xᵢ)(2xᵢ − 2µ_X)² = 4 Σᵢ p_X(xᵢ)(xᵢ − µ_X)² = 4σ_X² = 9
[Figure: pmf of Y = 2X, win/loss for you (in $)]
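The same result can be checked by simulation. A sketch assuming Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate the coin game: +2 or -1 with equal probability.
x = rng.choice([2, -1], size=1_000_000)
y = 2 * x
print(x.mean(), y.mean())  # ~0.5 and ~1.0: the mean doubles
print(x.var(), y.var())    # ~2.25 and ~9.0: the variance quadruples
```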
What happens if I sum two independent R.V.s?
[Figure: pmf of one round, X, next to the pmf of Y = X₁ + X₂, the sum of two independent rounds]
What happens if I sum two independent R.V.s?
The New Mean: µ_Y = µ_X + µ_X = 1
The New Variance: σ_Y² = σ_X² + σ_X² = 4.5
[Figure: pmf of Y = X₁ + X₂, win/loss for you (in $)]
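Again this is easy to verify numerically. A sketch assuming Python with NumPy; note the two rounds are drawn independently:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.choice([2, -1], size=1_000_000)  # first round
x2 = rng.choice([2, -1], size=1_000_000)  # second, independent round
y = x1 + x2
print(y.mean())  # ~1.0 = 0.5 + 0.5
print(y.var())   # ~4.5 = 2.25 + 2.25 (variances add for independent r.v.s)
```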
What happens if I sum two independent identically distributed R.V.s?
[Figure: pmf of one round next to the pmf of the two-round sum]
Expectation is linear
E(aX) = aE(X)
E(X + Y) = E(X) + E(Y)
E(X + c) = E(X) + c
We could've calculated the previous results using these properties!
Exercise: what happens to Var(aX) and Var(X + Y)? (See the sketch below.)
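For the exercise, the standard answers are Var(aX) = a²Var(X) and, for independent X and Y, Var(X + Y) = Var(X) + Var(Y). A sketch checking all of these numerically, assuming Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=1_000_000)
y = rng.normal(3.0, 2.0, size=1_000_000)  # independent of x
a, c = 3.0, 5.0

print((a * x).mean(), a * x.mean())         # E(aX) = aE(X)
print((x + y).mean(), x.mean() + y.mean())  # E(X + Y) = E(X) + E(Y)
print((x + c).mean(), x.mean() + c)         # E(X + c) = E(X) + c
print((a * x).var(), a**2 * x.var())        # Var(aX) = a^2 Var(X), both ~9
print((x + y).var(), x.var() + y.var())     # Var(X + Y) = Var(X) + Var(Y), both ~5
```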
What happens if I sum independent identically distributed (i.i.d.) R.V.s?
[Figures: pmfs of the # of heads in 1, 2, 3, and 4 coin flips; the binomial pmf fills out as more i.i.d. variables are summed]
What happens if I sum independent identically distributed (i.i.d.) R.V.s?
[Figure: distribution of the mean of 75 flips]
What's happening to the pmf?
Ans: it's looking more and more Gaussian.
[Figure: distribution of the mean of 150 flips; closer still to Gaussian]
Central Limit Theorem:
The sum of i.i.d. random variables is approximately normally distributed when the number of random variables is large.
This is one reason why Gaussian variables are popularly assumed when doing statistical analysis or modeling. Another reason is that it's mathematically simpler.
[Figure: normal pdf overlaid on the distribution of the mean of 150 flips]
from: Oxford Dictionary of Statistics
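The theorem is easy to see numerically: repeatedly average n fair-coin flips and look at the distribution of the averages. A sketch assuming Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
n_flips, n_repeats = 150, 10_000
# Each row is one run of 150 flips (0 = tails, 1 = heads);
# take the mean of every row.
means = rng.integers(0, 2, size=(n_repeats, n_flips)).mean(axis=1)

print(means.mean())  # ~0.5
print(means.std())   # ~sqrt(0.25 / 150), roughly 0.041
# A histogram of `means` looks close to a normal pdf.
```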
The sum of two or more r.v.'s with normal distributions is also normally distributed.
The number of random variables necessary to make the sum approximately Gaussian depends on the type of population distribution.
[Figure: normal pdf vs. the distribution of the mean of 25 samples]
From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
Wilcox says you need 100 samples from this distribution to get a decent approximation.
[Figure: distribution of the mean of 50 samples]
From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
Entropy: Another measure of variability
H = −Σᵢ p(xᵢ) log₂(p(xᵢ))
Any base is OK, but when base 2 is used entropy is said to be in units of "bits".
[Figure: pmf over UCSD voters; Democrat vs. Republican]
Entropy: Another measure of variability
H = −Σᵢ p(xᵢ) log₂(p(xᵢ))
1. Entropy is minimal (H = 0) when one outcome is certain.
2. Entropy is maximal when each of the k outcomes is equally likely:
H_max = −log₂(1/k) = log₂(k)
3. Entropy is a measure of information capacity.
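These properties are easy to confirm directly from the formula. A minimal Python sketch with a hypothetical helper entropy_bits:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; outcomes with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([1.0]))       # 0.0: a certain outcome, minimal entropy
print(entropy_bits([0.5, 0.5]))  # 1.0 bit: maximal for k = 2 (log2 2)
print(entropy_bits([0.25] * 4))  # 2.0 bits: maximal for k = 4 (log2 4)
```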
What about more than one random variable?
256 EEG sensors
120 million photoreceptors
Random Vectors
• An n-dimensional random vector consists of n random variables all associated with the same probability space (i.e., each outcome dictates the value of every random variable).
• Example 2-D random vector:
v = [X, Y]ᵀ, where X = Reaction Time and Y = Arm Length
• Sample m times from v: v₁, v₂, v₃, ..., v_m, giving the 2 × m data matrix
[x₁ x₂ x₃ ... x_m]
[y₁ y₂ y₃ ... y_m]
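In code, sampling m times from a 2-D random vector yields exactly such an array. A sketch assuming Python with NumPy; the reaction-time/arm-length means and covariance below are made-up illustrative values, not real data:

```python
import numpy as np

rng = np.random.default_rng(4)
mean = [300.0, 70.0]      # assumed: RT in ms, arm length in cm
cov = [[400.0, 30.0],     # assumed covariance matrix
       [30.0, 25.0]]
samples = rng.multivariate_normal(mean, cov, size=1000)
print(samples.shape)  # (1000, 2): one row per observation of v = [X, Y]
```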
Probability Distribution of a Random Vector:
The "joint distribution" of the constituent r.v.s: p(v) = p(X, Y)
[Figure: joint density of two normal r.v.s, plotted as a surface over the X-Y plane]
Probability Distribution of a Random Vector:
[Figure: scatterplot of 5000 observations from two normal r.v.s]
What will the scatterplot of our data look like?
[Figure: four candidate scatterplots, A through D]
Expected Value of a Random Vector
• The expected value of a random vector, v, is simply the expected value of its constituent random variables.
• Example 2-D random vector:
v = [X, Y]ᵀ
E(v) = [E(X), E(Y)]ᵀ, i.e. µ_v = [µ_X, µ_Y]ᵀ
[Figure: the point (E(X), E(Y)) marked on the joint distribution]
Variance of a Random Vector?
• Is the variance of a random vector, v, simply the variance of its constituent random variables?
• Example 2-D random vector:
v = [X, Y]ᵀ; is σ_v² = [σ_X², σ_Y²]ᵀ?
(Not quite: that would miss how X and Y vary together, which is what the covariance matrix captures.)
Covariance Matrix of a Random Vector
• Diagonal entries are the variance of that dimension
• Off-diagonal entries are the covariance between the column and row dimensions
‣ Covariance between two random variables:
Cov(X, Y) = E((X − µ_X)(Y − µ_Y))
Note: Cov(X, Y) = Cov(Y, X)
Cov(X, Y) = 0 if X and Y are independent
Cov(X, Y) ∝ Corr(X, Y)
• Our 2-D example (see the sketch below):
v = [X, Y]ᵀ,  C = [ Var(X)     Cov(Y, X) ]
                  [ Cov(X, Y)  Var(Y)    ]
Covariance of 0 does NOT entail independence!!
• Recall: Cov(X, Y) ∝ Corr(X, Y); specifically,
Corr(X, Y) = Cov(X, Y) / (σ_X σ_Y)
• PMF of two dependent variables with a covariance of 0:
p(X = 1, Y = 0) = .25    p(X = 0, Y = 1) = .25
p(X = −1, Y = 0) = .25   p(X = 0, Y = −1) = .25
• Special case: if two jointly normally distributed random variables have a covariance of 0, they ARE independent.
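The four-point pmf above can be checked numerically: the sample covariance comes out near zero even though X and Y are plainly dependent. A sketch assuming Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
# The four equally likely outcomes of (X, Y) from the pmf above.
points = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
samples = points[rng.integers(0, 4, size=100_000)]
x, y = samples[:, 0], samples[:, 1]

print(np.cov(x, y)[0, 1])  # ~0: covariance is essentially zero
# ...yet the variables are dependent: whenever Y is nonzero, X must be 0.
print(np.abs(x[y != 0]).max())  # exactly 0
```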
Recommended Resources:
The MathWorld online math encyclopedia:
http://mathworld.wolfram.com/
Gonzalez & Woods, review chapter on Linear Algebra, Probability, & Random Variables:
http://www.imageprocessingplace.com/root_files_V3/tutorials.htm
Javier Movellan's useful math facts:
http://mplab.ucsd.edu/wordpress/?page_id=75
Dana Ballard’s NaturalComputation
(some good stuff)
Dayan & Abbot
Theoretical Neuroscience
82.
Contemporary Data Analysis
RandWilcox, Applying Contemporary
Statistical Techniques
Sheldon Ross
A First Course in Probability