1. Random Variables/Vectors
Tomoki Tsuchida
Computational & Cognitive Neuroscience Lab
Department of Cognitive Science
University of California, San Diego
2. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
3. Random Variable
[Diagram: elementary outcomes of a random experiment (flipping a coin once) mapped onto the real line: H → $1, T → $0]
• A random variable is a function of each outcome.
• The probability of the r.v. taking a particular value is determined by the probability of the underlying outcome.
4. Example
Let X be the sum of the payoffs from two coin flips.
P(X = 0) = P({TT}) = 1/4
P(X = 1) = P({TH, HT}) = 1/2
P(X = 2) = P({HH}) = 1/4
The random variable X takes values {0, 1, 2},
with probabilities {1/4, 1/2, 1/4}.
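As a quick sketch (my addition, not from the slides), we can reproduce this pmf by enumerating the four equally likely outcomes in Python:

```python
from itertools import product
from collections import Counter

# Payoff per flip: heads pays $1, tails pays $0 (as in the slide above).
payoff = {"H": 1, "T": 0}

# Enumerate the four equally likely outcomes of two flips.
outcomes = list(product("HT", repeat=2))
counts = Counter(sum(payoff[f] for f in flips) for flips in outcomes)

# Each elementary outcome has probability 1/4.
pmf = {x: n / len(outcomes) for x, n in sorted(counts.items())}
print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}
```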
5. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
6. Discrete Random Variables:
Variables whose outcomes are separated by gaps
Flipping a coin once (you get paid $1 for heads): {0, 1}
Rolling a six-sided die once (you get paid the number on the face): {1, 2, 3, 4, 5, 6}
7. Discrete Random Variables:
Defined by a probability mass function, P
• P(X = a) = P(a)
• 0 ≤ P(a) ≤ 1
• The probabilities of all outcomes sum to one (from the axioms!)
[Bar chart: pmf for rolling a fair six-sided die; each outcome 1–6 has probability 1/6; x-axis: Outcome, y-axis: Probability]
8. Types of Probability Mass Functions:
Discrete Uniform Distribution
P(X = a) = 1 / N
(where N is the total number of distinct outcomes)
[Bar chart: pmf for rolling a fair six-sided die, a discrete uniform distribution with N = 6]
9. Types of Probability Mass Functions:
Binomial Distribution
Flipping a fair coin twice
[Bar chart: pmf of the number of heads in two flips; P(0) = 1/4, P(1) = 1/2, P(2) = 1/4; x-axis: Number of Heads, y-axis: Probability]
10. Types of Probability Mass Functions:
Binomial Distribution
pmf: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
k: the number of “successes” (in our case the outcome of heads is defined as success)
p: probability of success in a single observation (in our case 0.5)
n: the number of observations (in our case two)
$\binom{n}{k} = \frac{n!}{k!(n-k)!}$: the number of different ways you could get k successes out of n observations
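A minimal sketch (added here, not part of the original deck) that evaluates this pmf with Python's math.comb and reproduces the two-flip chart above:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two flips of a fair coin (n = 2, p = 0.5).
for k in range(3):
    print(k, binomial_pmf(k, 2, 0.5))  # 0.25, 0.5, 0.25
```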
11. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
13. Continuous Random Variables:
Defined by probability density function
[Side-by-side figure: a discrete pmf (bars over the number of heads in two flips) vs. a continuous pdf (a smooth curve)]
14. Continuous Random Variables:
Probability of a range of outcomes
[Figure: shaded areas under a pdf of heights illustrating p(179 ≤ x ≤ 181) and p(x ≤ 178)]
$p(a \le x \le b) = \int_a^b f(x)\,dx$
p(X = x) = 0 (no single outcome has any probability!)
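As an illustration (my addition), these shaded-area probabilities can be computed from the CDF; here assuming heights are normal with a hypothetical mean of 175 cm and SD of 7 cm:

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a Normal(mu, sigma**2) r.v., via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 175.0, 7.0  # hypothetical height distribution (cm)
print(normal_cdf(181, mu, sigma) - normal_cdf(179, mu, sigma))  # p(179 <= x <= 181)
print(normal_cdf(178, mu, sigma))                               # p(x <= 178)
```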
15. Continuous Random Variables:
Defined by probability density function, f
• f(a) ≥ 0
• The area under the pdf must equal 1
[Figure: a continuous pdf]
16. Types of Probability Density Functions:
Continuous Uniform Distribution
$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
a = lower bound
b = upper bound
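A one-function sketch (my addition) of this density:

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of a continuous Uniform(a, b) random variable."""
    return 1 / (b - a) if a <= x <= b else 0.0

print(uniform_pdf(0.5, 0.0, 2.0))  # 0.5 (inside [a, b])
print(uniform_pdf(3.0, 0.0, 2.0))  # 0.0 (outside [a, b])
```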
17. Types of Probability Density Functions:
Normal (Gaussian) Distribution
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
σ = standard deviation
µ = mean
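And the Gaussian density, evaluated directly (again my addition):

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a Normal(mu, sigma**2) random variable."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

print(normal_pdf(0.0, 0.0, 1.0))  # peak of the standard normal, ~0.3989
```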
19. Types of probability distributions
There are lots and lots of distributions!
(But we can always look them up on Wikipedia!)
20. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
21. Characterizing the distribution of a random variable
If we know the distribution of a random variable, we pretty much know all there is to know about the random variable.
But with real data, we don’t know the full distribution.
So we want to characterize distributions by a couple of numbers (“statistics”).
25. Characterizing the Central Tendency of a Random Variable
Normal (Gaussian) Distribution:
$p(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
σ = standard deviation
µ = mean
We know everything from mean and STD.
26. A Simple Gambling Game
1. Flip a fair coin
2. Possible outcomes:
• I give you $2: P(X = 2) = 1/2
• You give me $1: P(X = −1) = 1/2
27. A Simple Gambling Game
[Bar chart: pmf with P(X = −1) = 1/2 and P(X = 2) = 1/2; x-axis: Win/loss for you (in $), y-axis: Probability]
If we played this game an infinite # of times, what would the average outcome be?
$\mu = E(X) = \sum_i P(X = x_i)\, x_i$
Expected value: the “mean”. Here E(X) = $0.5.
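A quick sketch (my addition) computing this expectation directly from the pmf:

```python
def expected_value(pmf: dict[float, float]) -> float:
    """E(X) = sum over the support of P(X = x) * x."""
    return sum(p * x for x, p in pmf.items())

coin_game = {2: 0.5, -1: 0.5}
print(expected_value(coin_game))  # 0.5
```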
30. Another Gambling Game
1. Roll a fair six-sided die
2. Possible outcomes:
Die Payoff
1 $8
2 -$1
3 -$1
4 -$1
5 -$1
6 -$1
31. Another Gambling Game
[Bar chart: pmf with P(X = −1) = 5/6 and P(X = 8) = 1/6; x-axis: Win/loss for you (in $), y-axis: Probability]
What’s the mean outcome of this game?
$\mu = E(X) = \sum_i P(X = x_i)\, x_i$
Ans: E(X) = $0.5
32. Why should you prefer the coin game?
[Side-by-side bar charts: Coin Game pmf vs. Die Game pmf; x-axis: Win/loss for you (in $)]
33. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Moments
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
34. Characterizing the Variability of a Random Variable
[Side-by-side bar charts: Coin Game pmf vs. Die Game pmf; x-axis: Win/loss for you (in $)]
35. Variance: The expected value of the squared deviation from the mean
Variance shows the “spread” of the distribution.
$\sigma^2 = \mathrm{Var}(X) = \sum_i P(X = x_i)(x_i - \mu)^2$
[Bar chart: coin-game pmf; x-axis: Win/loss for you (in $)]
Ans: Var(X) = 9/4 = 2.25 dollars squared
36. Standard Deviation: The square root of the variance
$\sigma^2 = \mathrm{Var}(X) = \sum_i P(X = x_i)(x_i - \mu)^2$
$\sigma = \sqrt{\mathrm{Var}(X)}$
(Why? Because variance is in the units of X²; STD is in the same units as X.)
[Bar chart: coin-game pmf; x-axis: Win/loss for you (in $)]
Ans: σ = $1.5
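Extending the sketch above (my addition), variance and STD for both games:

```python
from math import sqrt

def variance(pmf: dict[float, float]) -> float:
    """Var(X) = sum over the support of P(X = x) * (x - mu)**2."""
    mu = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mu)**2 for x, p in pmf.items())

coin_game = {2: 0.5, -1: 0.5}
die_game = {8: 1/6, -1: 5/6}
print(variance(coin_game), sqrt(variance(coin_game)))  # 2.25, 1.5
print(variance(die_game), sqrt(variance(die_game)))    # 11.25, ~3.354
```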
37. Coin Game: µ = $0.5, σ = $1.5; Die Game: µ = $0.5, σ ≈ $3.35
[Side-by-side bar charts of the two pmfs; x-axis: Win/loss for you (in $)]
38. Summary: Mean & Variance
Mean µ = E(X): discrete r.v.s $\sum_i p(x_i)\,x_i$; continuous r.v.s $\int_{-\infty}^{\infty} p(x)\,x\,dx$
Variance σ² = E((X − µ)²): discrete r.v.s $\sum_i p(x_i)(x_i - \mu)^2$; continuous r.v.s $\int_{-\infty}^{\infty} p(x)(x - \mu)^2\,dx$
39. Moments
But why stop at the variance (~ the 2nd moment)?
3rd moment: $E(X^3)$ → related to skewness
4th moment: $E(X^4)$ → related to kurtosis
(Strictly, skewness and kurtosis are the standardized central moments $E((X-\mu)^3)/\sigma^3$ and $E((X-\mu)^4)/\sigma^4$.)
40. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
41. What happens if I scale a R.V.?
Original Coin Game: X vs. Y = 2X
[Side-by-side bar charts: X takes −1 or 2 with probability 1/2 each; Y = 2X takes −2 or 4 with probability 1/2 each; x-axis: Win/loss for you (in $)]
42. What happens if I scale a R.V.?
Y = 2X
The new mean:
$\mu_Y = \sum_i p_Y(2x_i)\,2x_i = 2\sum_i p_X(x_i)\,x_i = 2\mu_X$
$\mu_X = 0.5, \quad \mu_Y = 1$
[Bar chart: pmf of Y = 2X; x-axis: Win/loss for you (in $)]
43. What happens if I scale a R.V.?
Y = 2X
The new variance:
$\sigma_Y^2 = \sum_i p_Y(2x_i)(2x_i - \mu_Y)^2 = \sum_i p_Y(2x_i)(2x_i - 2\mu_X)^2 = 4\sum_i p_X(x_i)(x_i - \mu_X)^2 = 4\sigma_X^2 = 9$
[Bar chart: pmf of Y = 2X; x-axis: Win/loss for you (in $)]
44. What happens if I sum two independent R.V.s?
One Round (X) vs. Y = X₁ + X₂ (the sum of two independent rounds)
[Side-by-side bar charts; x-axis: Win/loss for you (in $)]
45. What happens if I sum two independent R.V.s?
Y = X₁ + X₂
The new mean: $\mu_Y = \mu_X + \mu_X = 1$
The new variance: $\sigma_Y^2 = \sigma_X^2 + \sigma_X^2 = 4.5$
[Bar chart: pmf of Y; x-axis: Win/loss for you (in $)]
46. What happens if I sum two independent identically distributed R.V.s?
[Same side-by-side bar charts as above: one round X vs. Y = X₁ + X₂; x-axis: Win/loss for you (in $)]
47. Expectation is linear
$E(aX) = aE(X)$
$E(X + Y) = E(X) + E(Y)$
$E(X + c) = E(X) + c$
We could’ve calculated the previous results using these properties!
Exercise: what happens to Var(aX) and Var(X + Y)? (See the sketch below.)
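A simulation sketch (my addition) suggesting the answers, assuming X and Y are independent draws from the coin game:

```python
import random

random.seed(0)
n = 200_000
# Independent draws from the coin game: +2 or -1, each with probability 1/2.
xs = [random.choice([2, -1]) for _ in range(n)]
ys = [random.choice([2, -1]) for _ in range(n)]

def var(v: list[float]) -> float:
    m = sum(v) / len(v)
    return sum((x - m)**2 for x in v) / len(v)

a = 2
print(var([a * x for x in xs]), a**2 * var(xs))                 # Var(aX) ~ a^2 Var(X)
print(var([x + y for x, y in zip(xs, ys)]), var(xs) + var(ys))  # Var(X+Y) ~ Var(X)+Var(Y)
```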
48. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
[Sequence of bar charts: pmf of the # of heads in 1, 2, 3, and 4 coin flips; the pmf spreads out and grows more bell-shaped]
52. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
What’s happening to the pmf?
Ans: it’s looking more and more Gaussian.
[Histogram: mean of 75 flips]
53. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
[Histogram: mean of 150 flips]
54. Central Limit Theorem:
The sum of i.i.d. random variables is approximately normally distributed when the number of random variables is large. (from: Oxford Dictionary of Statistics)
This is one reason why Gaussian variables are popularly assumed when doing statistical analysis or modeling. Another reason is that they are mathematically simpler.
[Figure: Normal pdf overlaid on the histogram of the mean of 150 flips]
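A simulation sketch (my addition) of the effect the CLT describes: means of 150 fair flips, repeated many times, pile up into a bell shape:

```python
import random
from collections import Counter

random.seed(0)
n_flips, n_trials = 150, 10_000

# Mean of 150 fair coin flips (H = 1, T = 0), repeated many times.
means = [sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips
         for _ in range(n_trials)]

# Crude text histogram: the counts cluster symmetrically around 0.5.
hist = Counter(round(m, 2) for m in means)
for value in sorted(hist):
    print(f"{value:.2f} {'#' * (hist[value] // 50)}")
```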
55. The sum of two or more r.v.’s with normal distributions is also normally distributed.
The number of random variables necessary to make the sum approximately Gaussian depends on the type of population distribution.
[Figure: Normal pdf]
61. [Histogram: mean of 25 samples]
From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
62. Wilcox says you need 100 samples from this distribution to get a decent approximation.
[Histogram: mean of 50 samples]
From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
63. Entropy: Another measure of variability
$H = -\sum_i p(x_i)\log_2(p(x_i))$
Any base is OK, but when base 2 is used entropy is said to be in units of “bits”.
[Bar chart: pmf over UCSD voters, Democrat vs. Republican; y-axis: Probability]
64. Entropy: Another measure of variability
$H = -\sum_i p(x_i)\log_2(p(x_i))$
1. Entropy is minimal (H = 0) when one outcome is certain.
2. Entropy is maximal when each of the k outcomes is equally likely:
$H_{max} = -\log_2\!\left(\frac{1}{k}\right) = \log_2 k$
3. Entropy is a measure of information capacity.
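A short sketch (my addition) that checks these properties numerically:

```python
from math import log2

def entropy_bits(pmf: list[float]) -> float:
    """H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy_bits([1.0, 0.0]))   # -0.0 (== 0): minimal, one outcome certain
print(entropy_bits([0.5, 0.5]))   # 1.0: maximal for k = 2 (= log2 2)
print(entropy_bits([0.9, 0.1]))   # ~0.469: in between
```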
65. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
(Do simple RT experiment)
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
66. What about more than one
random variable?
256 EEG sensors
120 million photoreceptors
67. Random Vectors
• An n-dimensional random vector consists of n random variables all associated with the same probability space (i.e., each outcome dictates the value of every random variable)
• Example 2-D random vector:
$v = \begin{bmatrix} X \\ Y \end{bmatrix}$, where X = Reaction Time and Y = Arm Length
• Sample m times from v:
$v_1, v_2, v_3, \ldots, v_m = \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_m \\ y_1 & y_2 & y_3 & \cdots & y_m \end{bmatrix}$
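A sketch (my addition) of sampling such a 2-D random vector m times, with hypothetical reaction-time and arm-length marginals:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5

# Hypothetical marginals: X = reaction time (s), Y = arm length (cm).
reaction_time = rng.normal(0.35, 0.05, size=m)
arm_length = rng.normal(65.0, 4.0, size=m)

# Each column is one observation of the random vector v = [X, Y]^T.
samples = np.vstack([reaction_time, arm_length])
print(samples.shape)  # (2, 5)
```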
68. Probability Distribution of a Random Vector:
Example: two normal r.v.s
“Joint distribution” of constituent r.v.s: $p(v) = p(X, Y)$
[3-D surface: joint density over the (X, Y) plane]
69. Probability Distribution of a Random Vector:
[Scatterplot of 5000 observations alongside the joint density of two normal r.v.s]
70. What will the scatterplot of our data look like?
[Four candidate scatterplots: A, B, C, D]
71. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
72. Expected Value of a Random Vector
• The expected value of a random vector, v, is simply the
expected value of its constituent random variables.
• Example 2-D random vector:
$v = \begin{bmatrix} X \\ Y \end{bmatrix}$
$E(v) = \begin{bmatrix} E(X) \\ E(Y) \end{bmatrix} = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix} = \mu_v$
[Scatterplot with the point (E(X), E(Y)) marked as E(v)]
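Continuing the sampling sketch (my addition): the sample estimate of E(v) is just the per-row mean:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0.35, 0.05, 5000),   # hypothetical X: reaction time
                     rng.normal(65.0, 4.0, 5000)])   # hypothetical Y: arm length

# Sample estimate of E(v) = [E(X), E(Y)]^T: the mean of each constituent r.v.
print(samples.mean(axis=1))  # ~[0.35, 65.0]
```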
73. Variance of a Random Vector?
• Is the variance of a random vector, v, simply the
variance of its constituent random variables?
• Example 2-D Random Vector:
$v = \begin{bmatrix} X \\ Y \end{bmatrix}, \quad \sigma_v^2 = \begin{bmatrix} \sigma_X^2 \\ \sigma_Y^2 \end{bmatrix}?$
[Scatterplot with the marginal distribution of X shown along its axis]
(No: the constituent variances miss how X and Y co-vary, which motivates the covariance matrix on the next slide.)
76. Covariance Matrix of a Random Vector
• Diagonal entries are the variance of that dimension
• Off-diagonal entries are the covariance between the
column and row dimensions
‣ Covariance between two random variables:
$\mathrm{Cov}(X,Y) = E((X - \mu_X)(Y - \mu_Y))$
Note: $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$
$\mathrm{Cov}(X,Y) = 0$ if X and Y are independent
$\mathrm{Cov}(X,Y) \propto \mathrm{Corr}(X,Y)$
• Our 2-D example:
$v = \begin{bmatrix} X \\ Y \end{bmatrix}, \quad C = \begin{bmatrix} \mathrm{Var}(X) & \mathrm{Cov}(Y,X) \\ \mathrm{Cov}(X,Y) & \mathrm{Var}(Y) \end{bmatrix}$
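A sketch (my addition) estimating such a matrix from samples with numpy, using hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated normal r.v.s (hypothetical covariance structure).
mean = [0.0, 0.0]
cov = [[1.0, 0.8],
       [0.8, 2.0]]
samples = rng.multivariate_normal(mean, cov, size=5000)  # shape (5000, 2)

# np.cov expects variables in rows and observations in columns.
C = np.cov(samples.T)
print(C)  # close to the true covariance matrix above
```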
78. Covariance of 0 does NOT entail independence!!
• Recall: $\mathrm{Cov}(X,Y) \propto \mathrm{Corr}(X,Y)$:
$\mathrm{Corr}(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
• PMF of two dependent variables with a covariance of 0:
p(X = 1, Y = 0) = .25   p(X = 0, Y = 1) = .25
p(X = −1, Y = 0) = .25   p(X = 0, Y = −1) = .25
• Special case: if two jointly normally distributed random variables have a covariance of 0, they ARE independent
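A tiny check (my addition) of the example pmf above: the covariance is exactly 0, yet X and Y are dependent:

```python
# Joint pmf over the four equally likely (x, y) points from the slide.
pmf = {(1, 0): 0.25, (0, 1): 0.25, (-1, 0): 0.25, (0, -1): 0.25}

ex = sum(p * x for (x, y), p in pmf.items())
ey = sum(p * y for (x, y), p in pmf.items())
cov = sum(p * (x - ex) * (y - ey) for (x, y), p in pmf.items())
print(cov)  # 0.0
# Yet P(Y = 0 | X = 1) = 1 while P(Y = 0) = 0.5, so X and Y are dependent.
```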
79. Talk Outline
• Random Variables Defined
• Types of Random Variables
‣ Discrete
‣ Continuous
• Characterizing Random Variables
‣ Expected Value
‣ Variance/Standard Deviation; Entropy
‣ Linear Combinations of Random Variables
• Random Vectors Defined
• Characterizing Random Vectors
‣ Expected Value
‣ Covariance
80. Recommended Resources:
The Mathworld online math encyclopedia:
http://mathworld.wolfram.com/
Gonzalez & Woods: Review Chapter on Linear
Algebra, Probability, & Random Variables:
http://www.imageprocessingplace.com/root_files_V3/
tutorials.htm
Javier Movellan’s useful math facts:
http://mplab.ucsd.edu/wordpress/?page_id=75
81. Dana Ballard’s Natural Computation
(some good stuff)
Dayan & Abbott,
Theoretical Neuroscience
82. Contemporary Data Analysis
Rand Wilcox, Applying Contemporary Statistical Techniques
Sheldon Ross, A First Course in Probability