
# Random Variables

Published in: Technology, Education
Notes
• Bernoulli
• Expected value: aka 1st moment, aka weighted average
• Mention information: if you know nothing about the outcome, observing it gives more information. The uniform distribution has the most information.
### Random Variables

1. Random Variables/Vectors. Tomoki Tsuchida, Computational & Cognitive Neuroscience Lab, Department of Cognitive Science, University of California, San Diego
2. Talk Outline
    • Random Variables Defined
    • Types of Random Variables
        ‣ Discrete
        ‣ Continuous
    • Characterizing Random Variables
        ‣ Expected Value
        ‣ Variance/Standard Deviation; Entropy
        ‣ Linear Combinations of Random Variables
    • Random Vectors Defined
    • Characterizing Random Vectors
        ‣ Expected Value
        ‣ Covariance
3. Random Variable
    Figure: the elementary outcomes of a random experiment (flipping a coin once) mapped onto the real line; one face maps to \$1, the other to \$0.
    • A random variable is a function of each outcome.
    • The probability of the r.v. taking a particular value is determined by the probability of the outcome.
4. Example: Let X be the sum of the payoffs from two coin flips.
    • $P(X = 0) = P(\{TT\}) = 1/4$
    • $P(X = 1) = P(\{TH, HT\}) = 1/2$
    • $P(X = 2) = P(\{HH\}) = 1/4$
    The random variable X takes values {0, 1, 2}, with probabilities {1/4, 1/2, 1/4}.
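The two-flip example can be checked by brute-force enumeration. The following is a minimal sketch, assuming heads pays \$1 and tails pays \$0; the names `PAYOFF` and `pmf_of_sum` are illustrative and not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed payoff per flip: heads pays $1, tails pays $0.
PAYOFF = {"H": 1, "T": 0}

def pmf_of_sum(n_flips):
    """Return {value: probability} for the summed payoff of n fair coin flips."""
    # Every sequence of faces is an equally likely elementary outcome.
    outcomes = list(product("HT", repeat=n_flips))
    # The random variable maps each outcome to a number (its total payoff).
    counts = Counter(sum(PAYOFF[face] for face in outcome) for outcome in outcomes)
    return {value: Fraction(count, len(outcomes)) for value, count in sorted(counts.items())}

pmf = pmf_of_sum(2)
for value, prob in pmf.items():
    print(f"P(X = {value}) = {prob}")
```

Using `Fraction` keeps the probabilities exact, which makes it easy to confirm that they sum to one.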
6. Discrete Random Variables: variables whose outcomes are separated by gaps
    • Rolling a six-sided die once (and getting paid the number on the face): {1, 2, 3, 4, 5, 6}
    • Flipping a coin once (and getting paid for H): {0, 1}
7. Discrete Random Variables: defined by a probability mass function, P
    • $P(X = a) = P(a)$
    • $1 \ge P(a) \ge 0$
    • The probabilities of all outcomes sum to one (from the axiom!)
    Figure: pmf of rolling a fair six-sided die (x-axis: Outcome, 1 to 6; y-axis: Probability).
8. Types of Probability Mass Functions: Discrete Uniform Distribution
    $$P(X = a) = \frac{1}{N}$$
    where N is the total number of distinct outcomes.
    Figure: pmf of rolling a fair six-sided die (x-axis: Outcome, 1 to 6; y-axis: Probability).
9. Types of Probability Mass Functions: Binomial Distribution
    Figure: pmf of flipping a fair coin twice (x-axis: Number of Heads, 0 to 2; y-axis: Probability).
10. Types of Probability Mass Functions: Binomial Distribution
    $$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
    • k: the number of “successes” (in our case the outcome of heads is defined as success)
    • p: the probability of success in a single observation (in our case .5)
    • n: the number of observations (in our case two)
    • $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$: the number of different ways you could get k successes out of n observations
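As a quick check of the binomial pmf, here is a minimal sketch using only the Python standard library. The function name `binomial_pmf` is an illustrative choice; `math.comb` computes the binomial coefficient.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Flipping a fair coin twice: n = 2, p = 0.5, k = number of heads.
two_flips = [binomial_pmf(k, n=2, p=0.5) for k in range(3)]
print(two_flips)
```

For two fair flips this reproduces the probabilities 1/4, 1/2, 1/4 for zero, one, and two heads.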
12. Continuous Random Variables: variables for which there is always another possible outcome between any two outcomes. Example: a person’s height, $\infty > a > 0$.
13. Continuous Random Variables: defined by a probability density function
    Figure: a discrete pmf (x-axis: Number of Heads, 0 to 2; y-axis: Probability) next to a continuous pdf.
14. Continuous Random Variables: probability of a range of outcomes
    Figure: areas under the pdf corresponding to $p(179 \le x \le 181)$ and $p(x \le 178)$.
    $$p(a \le x \le b) = \int_a^b f(x)\,dx$$
    $p(X = x) = 0$ (no single outcome has any probability!)
15. Continuous Random Variables: defined by a probability density function, f
    • $f(a) \ge 0$
    • The area under the pdf must equal 1
16. Types of Probability Density Functions: Continuous Uniform Distribution
    $$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
    a = lower bound, b = upper bound
17. Types of Probability Density Functions: Normal (Gaussian) Distribution
    $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
    σ = standard deviation, µ = mean
18. Cumulative distribution function
    What if we want to know $P(X \le x)$?
    Figure: the density function and the corresponding distribution function.
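The range probabilities above can be computed from the cumulative distribution function, since $p(a \le x \le b) = F(b) - F(a)$. The sketch below assumes heights are normally distributed with a mean of 175 cm and a standard deviation of 7 cm; those two numbers are hypothetical and do not come from the slides. The normal CDF is written with `math.erf` from the standard library.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """F(x) = P(X <= x) for a normal distribution with mean mu and std sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b), the area under the pdf between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Hypothetical parameters for height in cm (assumed for illustration only).
MU, SIGMA = 175.0, 7.0

p_range = prob_between(179, 181, MU, SIGMA)  # p(179 <= x <= 181)
p_below = normal_cdf(178, MU, SIGMA)         # p(x <= 178)
p_point = prob_between(180, 180, MU, SIGMA)  # a single outcome has zero width

print(p_range, p_below, p_point)
```

The last value illustrates why no single outcome of a continuous random variable has any probability: an interval of zero width has zero area.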
19. Types of probability distributions: There are lots and lots of distributions! (But we can always look them up on Wikipedia!)
24. Characterizing the distribution of a random variable
    If we know the distribution of a random variable, we pretty much know all there is to know about the random variable. But with real data, we don’t know the full distribution. So we want to characterize distributions by a couple of numbers (“statistics”).
25. Characterizing the Central Tendency of a Random Variable
    Normal (Gaussian) Distribution:
    $$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
    σ = standard deviation, µ = mean
    For a normal distribution, we know everything from the mean and standard deviation.
26. A Simple Gambling Game
    Step 1: Flip a fair coin.
    Step 2: Possible outcomes:
    • I give you \$2: $P(X = 2) = 1/2$
    • You give me \$1: $P(X = -1) = 1/2$
29. A Simple Gambling Game
    Figure: probability mass function of the game (x-axis: Win/loss for you (in \$), -1 to 2; y-axis: Probability).
    If we played this game an infinite number of times, what would the average outcome be?
    $$\mu = E(X) = \sum_i P(X = x_i)\, x_i$$
    This is the expected value, or the “mean”. Here E(X) = \$0.50.
30. Another Gambling Game
    Step 1: Roll a fair six-sided die.
    Step 2: Possible outcomes:

    | Die | Payoff |
    |-----|--------|
    | 1   | \$8    |
    | 2   | -\$1   |
    | 3   | -\$1   |
    | 4   | -\$1   |
    | 5   | -\$1   |
    | 6   | -\$1   |
31. Another Gambling Game
    Figure: probability mass function of the game (x-axis: Win/loss for you (in \$), -1 to 8; y-axis: Probability).
    What’s the mean outcome of this game?
    $$\mu = E(X) = \sum_i P(X = x_i)\, x_i$$
    Here E(X) = \$0.50.
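The expected values of the two games can be verified directly from the definition. This is a minimal sketch; the dictionaries `coin_game` and `die_game` are illustrative encodings of the payoffs described above (value mapped to probability).

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum over outcomes of P(X = x) * x, for a pmf given as {x: P(X = x)}."""
    return sum(prob * x for x, prob in pmf.items())

# Coin game: win $2 on heads, lose $1 on tails.
coin_game = {2: Fraction(1, 2), -1: Fraction(1, 2)}
# Die game: win $8 on a 1, lose $1 on any of the other five faces.
die_game = {8: Fraction(1, 6), -1: Fraction(5, 6)}

print(expected_value(coin_game))  # 1/2
print(expected_value(die_game))   # 1/2
```

Both games pay fifty cents per round on average, so the mean alone cannot distinguish them.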
32. Why should you prefer the coin game?
    Figure: pmfs of the Coin Game and the Die Game side by side (x-axis: Win/loss for you (in \$), -1 to 8; y-axis: Probability).
33. Talk Outline
    • Random Variables Defined
    • Types of Random Variables
        ‣ Discrete
        ‣ Continuous
    • Characterizing Random Variables
        ‣ Expected Value
        ‣ Variance/Standard Deviation; Moments
        ‣ Linear Combinations of Random Variables
    • Random Vectors Defined
    • Characterizing Random Vectors
        ‣ Expected Value
        ‣ Covariance
34. Characterizing the Variability of a Random Variable
    Figure: pmfs of the Coin Game and the Die Game side by side (x-axis: Win/loss for you (in \$), -1 to 8; y-axis: Probability).
35. Variance: the expected value of the squared deviation from the mean
    Variance shows the “spread” of the distribution.
    $$\sigma^2 = \mathrm{Var}(X) = \sum_i P(X = x_i)\,(x_i - \mu)^2$$
    Answer for the coin game: 2.25 = 9/4 dollars squared.
    Figure: pmf of the coin game (x-axis: Win/loss for you (in \$), -1 to 2; y-axis: Probability).
36. Standard Deviation: the square root of the variance
    $$\sigma^2 = \mathrm{Var}(X) = \sum_i P(X = x_i)\,(x_i - \mu)^2, \qquad \sigma = \sqrt{\mathrm{Var}(X)}$$
    (Why? Because variance is in units of $X^2$. The standard deviation is in the same units as X.)
    Answer for the coin game: 1.5 dollars.
    Figure: pmf of the coin game (x-axis: Win/loss for you (in \$), -1 to 2; y-axis: Probability).
37. Coin Game: µ = \$0.5, σ = \$1.5. Die Game: µ = \$0.5, σ ≈ \$3.35.
    Figure: pmfs of the Coin Game and the Die Game side by side (x-axis: Win/loss for you (in \$), -1 to 8; y-axis: Probability).
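The variance and standard deviation quoted for the two games follow from the definition. Below is a minimal sketch; the helper names and the `{value: probability}` encoding of each game are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

def mean(pmf):
    """E(X) for a pmf given as {x: P(X = x)}."""
    return sum(prob * x for x, prob in pmf.items())

def variance(pmf):
    """Var(X): the expected squared deviation from the mean."""
    mu = mean(pmf)
    return sum(prob * (x - mu) ** 2 for x, prob in pmf.items())

coin_game = {2: Fraction(1, 2), -1: Fraction(1, 2)}
die_game = {8: Fraction(1, 6), -1: Fraction(5, 6)}

for name, game in [("coin", coin_game), ("die", die_game)]:
    var = variance(game)
    # Variance is in dollars squared; its square root is back in dollars.
    print(f"{name}: mean={mean(game)}, variance={var}, std={sqrt(var):.2f}")
```

The two games share a mean of \$0.50, but the die game's spread is more than twice as large, which is the reason to prefer the coin game.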
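38. Summary: Mean & Variance

    |  | Definition | Discrete R.V.s | Continuous R.V.s |
    |---|---|---|---|
    | Mean: $\mu$ | $E(X)$ | $\sum_i p(x_i)\,x_i$ | $\int_{-\infty}^{\infty} p(x)\,x\,dx$ |
    | Variance: $\sigma^2$ | $E((X - \mu)^2)$ | $\sum_i p(x_i)\,(x_i - \mu)^2$ | $\int_{-\infty}^{\infty} p(x)\,(x - \mu)^2\,dx$ |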
39. Moments
    But why stop at the variance (~2nd moment)?
    • 3rd moment, $E(X^3)$: related to skewness
    • 4th moment, $E(X^4)$: related to kurtosis
41. What happens if I scale a R.V.?
    Figure: pmf of the original coin game X (x-axis: Win/loss for you (in \$), -1 to 2) next to the pmf of Y = 2X (x-axis: -2 to 4).
42. What happens if I scale a R.V.? (Y = 2X) The new mean:
    $$\mu_Y = \sum_i p_Y(2x_i)\,2x_i = 2\sum_i p_X(x_i)\,x_i = 2\mu_X$$
    $\mu_X = .5$, so $\mu_Y = 1$.
43. What happens if I scale a R.V.? (Y = 2X) The new variance:
    $$\sigma_Y^2 = \sum_i p_Y(2x_i)\,(2x_i - \mu_Y)^2 = \sum_i p_Y(2x_i)\,(2x_i - 2\mu_X)^2 = 4\sum_i p_X(x_i)\,(x_i - \mu_X)^2 = 4\sigma_X^2 = 9$$
44. What happens if I sum two independent R.V.s?
    Figure: pmf of one round of the coin game next to the pmf of $Y = X_1 + X_2$, the total from two independent rounds (x-axis: Win/loss for you (in \$), -2 to 4; y-axis: Probability).
45. What happens if I sum two independent R.V.s? ($Y = X_1 + X_2$, two independent rounds of the coin game)
    • The new mean: $\mu_Y = \mu_X + \mu_X = 1$
    • The new variance: $\sigma_Y^2 = \sigma_X^2 + \sigma_X^2 = 4.5$
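To see the difference between scaling one round by two and summing two independent rounds, the pmf of the sum can be built by combining every pair of outcomes. This is a minimal sketch; the function names and the `{value: probability}` encoding of the coin game are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_independent_sum(pmf_a, pmf_b):
    """pmf of A + B when A and B are independent: joint probabilities multiply."""
    total = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for a, prob_a in pmf_a.items():
        for b, prob_b in pmf_b.items():
            total[a + b] += prob_a * prob_b
    return dict(total)

def mean(pmf):
    return sum(prob * x for x, prob in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum(prob * (x - mu) ** 2 for x, prob in pmf.items())

coin_game = {2: Fraction(1, 2), -1: Fraction(1, 2)}

two_rounds = pmf_of_independent_sum(coin_game, coin_game)  # sum of two independent rounds
doubled = {2 * x: prob for x, prob in coin_game.items()}   # one round with doubled stakes

print(two_rounds, mean(two_rounds), variance(two_rounds))
print(doubled, mean(doubled), variance(doubled))
```

Both variables have a mean of 1, but two independent rounds have variance 4.5 while one doubled round has variance 9, because independent deviations partly cancel.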
46. What happens if I sum two independent identically distributed R.V.s?
    Figure: pmf of one round of the coin game next to the pmf of $Y = X_1 + X_2$, the total from two independent rounds (x-axis: Win/loss for you (in \$), -2 to 4; y-axis: Probability).
47. Expectation is linear
    • $E(aX) = aE(X)$
    • $E(X + Y) = E(X) + E(Y)$
    • $E(X + c) = E(X) + c$
    We could’ve calculated the previous results using these properties!
    Exercise: what happens to Var(aX) and Var(X+Y)?
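One way to work the exercise is to expand the definition of variance and apply the linearity properties listed above. The derivation below is a sketch; it uses the covariance $\mathrm{Cov}(X, Y) = E((X - \mu_X)(Y - \mu_Y))$, which is zero when X and Y are independent.

```latex
\begin{aligned}
\mathrm{Var}(aX) &= E\big((aX - a\mu_X)^2\big)
                  = a^2\,E\big((X - \mu_X)^2\big)
                  = a^2\,\mathrm{Var}(X) \\[4pt]
\mathrm{Var}(X + Y) &= E\big(((X - \mu_X) + (Y - \mu_Y))^2\big) \\
                    &= E\big((X - \mu_X)^2\big) + E\big((Y - \mu_Y)^2\big)
                       + 2\,E\big((X - \mu_X)(Y - \mu_Y)\big) \\
                    &= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)
\end{aligned}
```

With a = 2 this gives the factor of 4 found for Y = 2X, and with independent rounds the covariance term drops out, so the variances simply add.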
48. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
    Figure: pmf of the number of heads in 1 flip (x-axis: # of Heads, 0 to 1; y-axis: Probability).
49. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
    Figure: pmf of the number of heads in 2 flips (x-axis: # of Heads, 0 to 2; y-axis: Probability).
50. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
    Figure: pmf of the number of heads in 3 flips (x-axis: # of Heads, 0 to 3; y-axis: Probability).
51. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
    Figure: pmf of the number of heads in 4 flips (x-axis: # of Heads, 0 to 4; y-axis: Probability).
52. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
    What’s happening to the pmf? Ans: it’s looking more and more Gaussian.
    Figure: Mean of 75 flips.
53. What happens if I sum independent identically distributed (i.i.d.) R.V.s?
    Figure: Mean of 150 flips.
54. Central Limit Theorem: The sum of i.i.d. random variables is approximately normally distributed when the number of random variables is large. (From: Oxford Dictionary of Statistics)
    This is one reason why Gaussian variables are popularly assumed when doing statistical analysis or modeling. Another reason is that it’s mathematically simpler.
    Figure: normal pdf overlaid on the distribution of the mean of 150 flips.
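The theorem can be explored with a small simulation of the "mean of 150 flips" experiment. This is a minimal sketch using only the standard library; the number of repetitions (2000) and the fixed seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

N_FLIPS = 150         # flips averaged per experiment
N_EXPERIMENTS = 2000  # assumed number of repetitions (arbitrary)

def mean_of_flips(n_flips, rng):
    """Fraction of heads in n_flips fair flips (heads counts 1, tails counts 0)."""
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_means = [mean_of_flips(N_FLIPS, rng) for _ in range(N_EXPERIMENTS)]

# One fair flip has mean 0.5 and variance 0.25, so the mean of n i.i.d. flips
# has mean 0.5 and standard deviation sqrt(0.25 / n).
theory_sd = sqrt(0.25 / N_FLIPS)
observed_mean = mean(sample_means)
observed_sd = pstdev(sample_means)

# A normal distribution puts about 68% of its mass within one standard
# deviation of the mean; the simulated share should be in that neighborhood.
within = sum(abs(m - 0.5) <= theory_sd for m in sample_means)
share_within_one_sd = within / len(sample_means)

print(observed_mean, observed_sd, theory_sd, share_within_one_sd)
```

Plotting a histogram of `sample_means` (for example with the R or SciPy tooling recommended at the end of the talk) should show the bell shape described above.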
55. The sum of two or more independent r.v.s with normal distributions is also normally distributed.
    The number of random variables necessary to make the sum approximately Gaussian depends on the type of population distribution.
    Figure: normal pdf.
56. Continuous Uniform Distribution (figure)
57. Figure: Mean of 20 Observations. From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
58. Figure: 1 Observation. From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
59. Figure: Mean of 20 Observations. From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
60. Figure: 1 Observation. From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
61. Figure: Mean of 25 samples. From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
62. Figure: Mean of 50 samples. Wilcox says you need 100 samples from this distribution to get a decent approximation. From: R. R. Wilcox (2003), Applying Contemporary Statistical Techniques
63. Entropy: Another measure of variability
    $$H = -\sum_i p(x_i)\,\log_2 p(x_i)$$
    Any base is OK, but when base 2 is used entropy is said to be in units of “bits”.
    Figure: probability mass function of UCSD voters (categories: Democrat, Republican; y-axis: Probability).
64. Entropy: Another measure of variability
    $$H = -\sum_i p(x_i)\,\log_2 p(x_i)$$
    • Entropy is minimal (H = 0) when one outcome is certain.
    • Entropy is maximal when each of the k outcomes is equally likely:
    $$H_{\max} = -\log_2\!\left(\frac{1}{k}\right) = \log_2 k$$
    • Entropy is a measure of information capacity.
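The entropy properties listed above are easy to check numerically. This is a minimal sketch; the function name `entropy_bits` is illustrative, and the 60/40 split used for the two-party example is a hypothetical value, not a figure read from the slide.

```python
from math import log2

def entropy_bits(probs):
    """Shannon entropy in bits: sum of p * log2(1 / p) over outcomes with p > 0."""
    # Outcomes with zero probability contribute nothing, so they are skipped.
    return sum(p * log2(1 / p) for p in probs if p > 0)

certain = entropy_bits([1.0, 0.0])     # one outcome is certain
fair_coin = entropy_bits([0.5, 0.5])   # two equally likely outcomes
uniform_4 = entropy_bits([0.25] * 4)   # k = 4 equally likely outcomes
lopsided = entropy_bits([0.6, 0.4])    # hypothetical two-party split

print(certain, fair_coin, uniform_4, lopsided)
```

Writing the sum with `log2(1 / p)` is equivalent to the slide's $-\sum_i p(x_i)\log_2 p(x_i)$ and avoids printing a negative zero for the certain case.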
65. Talk Outline
    • Random Variables Defined
    • Types of Random Variables
        ‣ Discrete
        ‣ Continuous
    • Characterizing Random Variables
        ‣ Expected Value
        ‣ Variance/Standard Deviation; Entropy
        ‣ Linear Combinations of Random Variables
    • Random Vectors Defined
    • Characterizing Random Vectors
        ‣ Expected Value
        ‣ Covariance
    Slide note: Do simple RT experiment.
66. What about more than one random variable?
    • 256 EEG sensors
    • 120 million photoreceptors
67. Random Vectors
    • An n-dimensional random vector consists of n random variables all associated with the same probability space (i.e., each outcome dictates the value of every random variable).
    • Example 2-D random vector, with X = Reaction Time and Y = Arm Length:
    $$\mathbf{v} = \begin{bmatrix} X \\ Y \end{bmatrix}$$
    • Sample m times from v to get $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \dots, \mathbf{v}_m$:
    $$\begin{bmatrix} x_1 & x_2 & x_3 & \dots & x_m \\ y_1 & y_2 & y_3 & \dots & y_m \end{bmatrix}$$
68. Probability Distribution of a Random Vector: the “joint distribution” of the constituent r.v.s
    $$p(\mathbf{v}) = p(X, Y)$$
    Figure: example with two normal r.v.s (axes: X, Y, Probability).
69. Probability Distribution of a Random Vector
    Figure: scatterplot of 5000 observations next to the joint distribution of two normal r.v.s (axes: X, Y, Probability).
70. What will the scatterplot of our data look like?
    Figure: four candidate scatterplots, A to D.
72. Expected Value of a Random Vector
    • The expected value of a random vector, v, is simply the expected value of its constituent random variables.
    • Example 2-D random vector:
    $$\mathbf{v} = \begin{bmatrix} X \\ Y \end{bmatrix}, \qquad E(\mathbf{v}) = \begin{bmatrix} E(X) \\ E(Y) \end{bmatrix}, \qquad \mu_{\mathbf{v}} = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}$$
    Figure: E(v) shown in the plane with axes E(X) and E(Y).
73. Variance of a Random Vector?
    • Is the variance of a random vector, v, simply the variance of its constituent random variables?
    • Example 2-D random vector:
    $$\mathbf{v} = \begin{bmatrix} X \\ Y \end{bmatrix}, \qquad \sigma_{\mathbf{v}}^2 = \begin{bmatrix} \sigma_X^2 \\ \sigma_Y^2 \end{bmatrix}\ ?$$
75. X and Y each have a variance of 2.
    Figure: three scatterplots, A to C.
76. Covariance Matrix of a Random Vector
    • Diagonal entries are the variance of that dimension.
    • Off-diagonal entries are the covariance between the column and row dimensions.
        ‣ Covariance between two random variables: $\mathrm{Cov}(X, Y) = E((X - \mu_X)(Y - \mu_Y))$
        ‣ Note: $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
        ‣ $\mathrm{Cov}(X, Y) = 0$ if X and Y are independent
        ‣ $\mathrm{Cov}(X, Y) \propto \mathrm{Corr}(X, Y)$
    • Our 2-D example:
    $$\mathbf{v} = \begin{bmatrix} X \\ Y \end{bmatrix}, \qquad C = \begin{bmatrix} \mathrm{Var}(X) & \mathrm{Cov}(Y, X) \\ \mathrm{Cov}(X, Y) & \mathrm{Var}(Y) \end{bmatrix}$$
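A covariance matrix can be estimated from samples of a 2-D random vector by applying the covariance definition to each pair of dimensions. The sketch below is an illustration under stated assumptions: the data are simulated so that X and Y each have variance 2 and covariance 1.5, and the sample size, seed, and helper names are arbitrary choices, not part of the talk.

```python
import random
from math import sqrt

def covariance(xs, ys):
    """Population-style estimate: the average of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def covariance_matrix(xs, ys):
    """2x2 matrix: variances on the diagonal, Cov(X, Y) off the diagonal."""
    cov_xy = covariance(xs, ys)
    return [[covariance(xs, xs), cov_xy],
            [cov_xy, covariance(ys, ys)]]

rng = random.Random(0)  # fixed seed so the run is repeatable
RHO = 0.75              # assumed correlation, so Cov = RHO * 2 = 1.5
N_SAMPLES = 20000       # assumed sample size (arbitrary)

xs, ys = [], []
for _ in range(N_SAMPLES):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x = sqrt(2) * z1                                # Var(X) = 2
    y = RHO * x + sqrt(1 - RHO**2) * sqrt(2) * z2   # Var(Y) = 2, Cov(X, Y) = 1.5
    xs.append(x)
    ys.append(y)

C = covariance_matrix(xs, ys)
corr = C[0][1] / sqrt(C[0][0] * C[1][1])  # Corr = Cov / (sigma_X * sigma_Y)
print(C, corr)
```

The estimate divides by n rather than n - 1; with this many samples the difference is negligible, and the matrix is symmetric by construction because Cov(X, Y) = Cov(Y, X).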
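77. Which data = which covariance matrix?
    Figure: three scatterplots, A to C.
    $$Q = \begin{bmatrix} 2 & 1.5 \\ 1.5 & 2 \end{bmatrix}, \qquad S = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \qquad R = \begin{bmatrix} 2 & -1.5 \\ -1.5 & 2 \end{bmatrix}$$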
78. Covariance of 0 does NOT entail independence!!
    • Recall: $\mathrm{Cov}(X, Y) \propto \mathrm{Corr}(X, Y)$, where
    $$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
    • PMF of two dependent variables with a covariance of 0:
        ‣ $p(X = 1, Y = 0) = .25$
        ‣ $p(X = 0, Y = 1) = .25$
        ‣ $p(X = -1, Y = 0) = .25$
        ‣ $p(X = 0, Y = -1) = .25$
    • Special case: if two jointly normally distributed random variables have a covariance of 0, they ARE independent.
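The four-point pmf above can be checked directly: the covariance is zero, yet the joint probabilities do not factor into the product of the marginals. This is a minimal sketch; the dictionary `joint` and the helper `expect` are illustrative names.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)
# Joint pmf from the example: four equally likely points on the axes.
joint = {(1, 0): QUARTER, (0, 1): QUARTER, (-1, 0): QUARTER, (0, -1): QUARTER}

def expect(f):
    """E(f(X, Y)) under the joint pmf."""
    return sum(prob * f(x, y) for (x, y), prob in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Marginal probabilities, obtained by summing the joint pmf over the other variable.
p_x1 = sum(prob for (x, _), prob in joint.items() if x == 1)
p_y0 = sum(prob for (_, y), prob in joint.items() if y == 0)
p_x1_y0 = joint.get((1, 0), Fraction(0))

print("Cov(X, Y) =", cov)
print("P(X=1, Y=0) =", p_x1_y0, "but P(X=1) * P(Y=0) =", p_x1 * p_y0)
```

Independence would require the joint probability to equal the product of the marginals for every pair of values, and it fails here: knowing X = 1 forces Y = 0.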
80. Recommended Resources:
    • The Mathworld online math encyclopedia: http://mathworld.wolfram.com/
    • Gonzalez & Woods, review chapter on Linear Algebra, Probability, & Random Variables: http://www.imageprocessingplace.com/root_files_V3/tutorials.htm
    • Javier Movellan’s useful math facts: http://mplab.ucsd.edu/wordpress/?page_id=75
81. Dana Ballard’s Natural Computation (some good stuff); Dayan & Abbott, Theoretical Neuroscience
82. Contemporary Data Analysis: Rand Wilcox, Applying Contemporary Statistical Techniques; Sheldon Ross, A First Course in Probability
83. Recommended Free Stats Software: www.r-project.org, www.scipy.org