SlideShare a Scribd company logo
1 of 31
Download to read offline
Lecture 4: Random
Variables and Distributions
Goals
• Working with distributions in R
• Overview of discrete and continuous
distributions important in genetics/genomics
• Random Variables
Random Variables
!
"
0 1
-1
A rv is any rule (i.e., function) that associates
a number with each outcome in the sample
space
Two Types of Random Variables
• A discrete random variable has a
countable number of possible values
• A continuous random variable takes all
values in an interval of numbers
Probability Distributions of RVs
Discrete
Let X be a discrete rv. Then the probability mass function (pmf), f(x),
of X is:
!
f (x) =
P(X = x), x ∈ Ω
0, x ∉ Ω
Continuous
P(a " X " b) = f (x)dx
a
b
#
Let X be a continuous rv. Then the probability density function (pdf) of
X is a function f(x) such that for any two numbers a and b with a ≤ b:
a b
A a
Using CDFs to Compute Probabilities
Continuous rv:
!
F(x) = P(X " x) = f (y)dy
#$
x
%
pdf cdf
P(a " X " b) = F(b) # F(a)
Using CDFs to Compute Probabilities
Continuous rv:
!
F(x) = P(X " x) = f (y)dy
#$
x
%
pdf cdf
P(a " X " b) = F(b) # F(a)
Expectation of Random Variables
Continuous
µX = E[X] = x " f (x)dx
#$
$
%
The expected or mean value of a continuous rv X with pdf f(x) is:
Discrete
Let X be a discrete rv that takes on values in the set D and has a
pmf f(x). Then the expected or mean value of X is:
!
µX = E[X] = x " f (x)
x#D
$
Variance of Random Variables
Continuous
"X
2
= V[X] = (x # µ)2
$ f (x)dx
#%
%
& = E[(X # µ)2
]
The variance of a continuous rv X with pdf f(x) and mean µ is:
Discrete
Let X be a discrete rv with pmf f(x) and expected value µ. The
variance of X is:
!
"X
2
= V[X] = (x #
x$D
% µ)2
= E[(X # µ)2
]
Example of Expectation and Variance
• Let L1, L2, …, Ln be a sequence of n nucleotides and define the rv
Xi:
1, if Li = A
0, otherwise
Xi
• pmf is then: P(Xi = 1) = P(Li = A) = pA
P(Xi = 0) = P(Li = C or G or T) = 1 - pA
• E[X] = 1 x pA + 0 x (1 - pA) = pA
• Var[X] = E[X - µ]2 = E[X2] - µ2
= [12 x pA + 02 x (1 - pA)] - pA
2
= pA (1 - pA)
The Distributions We’ll Study
1. Binomial Distribution
2. Hypergeometric Distribution
3. Poisson Distribution
4. Normal Distribution
Binomial Distribution
• Experiment consists of n trials
– e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed
• Trials are identical and each can result in
one of the same two outcomes
– e.g., head or tail in each toss of a coin
– Generally called “success” and “failure”
– Probability of success is p, probability of failure is 1 – p
• Trials are independent
• Constant probability for each observation
– e.g., Probability of getting a tail is the same each time we
toss the coin
Binomial Distribution
!
P{X = x} = ( )px
(1" p)n"x
n
x
pmf:
E(x) = np
cdf:
!
P{X " x} = ( )py
(1# p)n#y
y= 0
x
$ n
y
Var(x) = np(1-p)
Binomial Distribution: Example 1
• A couple, who are both carriers for a recessive
disease, wish to have 5 children. They want to know
the probability that they will have four healthy kids
!
P{X = 4} = ( )0.754
" 0.251
5
4
= 0.395
0 1 2 3 4 5
p(x)
Binomial Distribution: Example 2
• Wright-Fisher model: There are i copies of the A allele
in a population of size 2N in generation t. What is the
distribution of the number of A alleles in generation t
+ 1?
!
i
2N
"
#
$
%
&
'
j
1(
i
2N
"
#
$
%
&
'
2N( j
2N
pij = j
j = 0, 1, …, 2N
Hypergeometric Distribution
• Population to be sampled consists of N
finite individuals, objects, or elements
• Each individual can be characterized as a
success or failure, m successes in the
population
• A sample of size k is drawn and the rv of
interest is X = number of successes
Hypergeometric Distribution
• Similar in spirit to Binomial distribution, but from a finite
population without replacement
20 white balls
out of
100 balls
If we randomly sample 10 balls, what is the probability that 7
or more are white?
Hypergeometric Distribution
• pmf of a hypergeometric rv:
P{X = i | n,m,k} =
m
i
n
k - i
m + n
k
For i = 0, 1, 2, 3, …
Where,
m = Number of balls in urn considered “success”
k = Number of balls selected
n = Number of balls in urn considered “failure”
m + n = Total number of balls in urn
Hypergeometric Distribution
• Extensively used in genomics to test for “enrichment”:
" = Number of annotated genes
Number of
genes of
interest
Number of
genes with
annotation
Number of genes of interest with
annotation
Poisson Distribution
• Useful in studying rare events
• Poisson distribution also used in situations
where “events” happen at certain points
in time
• Poisson distribution approximates the
binomial distribution when n is large and p
is small
Poisson Distribution
!
P{X = i} = e"# #i
i!
• A rv X follows a Poisson distribution if the pmf of X is:
For i = 0, 1, 2, 3, …
• λ is frequently a rate per unit time:
• Safely approximates a binomial experiment when n > 100, p <
0.01, np = λ < 20)
• E(X) = Var(X) = λ
λ = αt = expected number of events per unit time t
Poisson RV: Example 1
!
P{X = i} = e"d di
i!
• The number of crossovers, X, between two
markers is X ~ poisson(λ=d)
!
P{X = 0} = e"d
!
P{X "1} =1# e#d
Poisson RV: Example 2
• Recent work in Drosophila suggests the spontaneous rate of
deleterious mutations is ~ 1.2 per diploid genome. Thus, let’s
tentatively assume X ~ poisson(λ = 1.2) for humans. What is
the probability that an individual has 12 or more spontaneous
deleterious mutations?
!
P{X "12} =1# e#1.2 1.2i
i!
i= 0
11
$
= 6.17 x 10-9
Poisson RV: Example 3
• Suppose that a rare disease has an incidence of 1 in 1000 people
per year. Assuming that members of the population are affected
independently, find the probability of k cases in a population of
10,000 (followed over 1 year) for k=0,1,2.
The expected value (mean) = λ = .001*10,000 = 10
!
P(X = 0) =
(10)0
e"(10)
0!
= .0000454
P(X =1) =
(10)1
e"(10)
1!
= .000454
P(X = 2) =
(10)2
e"(10)
2!
= .00227
Normal Distribution
• “Most important” probability distribution
• Many rv’s are approximately normally
distributed
• Even when they aren’t, their sums and
averages often are (CLT)
Normal Distribution
!
f (x;µ,"2
) =
1
2#"
e$(x$µ)2
/ 2" 2
• pdf of normal distribution:
• standard normal distribution (µ = 0, σ2 = 1):
!
f (z;0,1) =
1
2"#
e$z2
/ 2
• cdf of Z:
P(Z " z) = f (y;0,1)
#$
z
% dy
Standardizing Normal RV
• If X has a normal distribution with mean µ and
standard deviation σ, we can standardize to a standard
normal rv:
!
Z =
X " µ
#
I Digress: Sampling Distributions
• Before data is collected, we regard observations as random
variables (X1,X2,…,Xn)
• This implies that until data is collected, any function (statistic)
of the observations (mean, sd, etc.) is also a random variable
• Thus, any statistic, because it is a random variable, has a
probability distribution - referred to as a sampling
distribution
• Let’s focus on the sampling distribution of the mean,
!
X
Behold The Power of the CLT
• Let X1,X2,…,Xn be an iid random sample from a distribution with mean µ and
standard deviation σ. If n is sufficiently large:
!
X ~N(µ
,
!
"
n
)
Example
• If the mean and standard deviation of serum iron values from
healthy men are 120 and 15 mgs per 100ml, respectively, what is
the probability that a random sample of 50 normal men will yield a
mean between 115 and 125 mgs per 100ml?
!
p(115 " x "125 = p
115 #120
2.12
" x "
125 #120
2.12
$
%
&
'
(
)
!
= p "2.36 # z # 2.36
( )
= p z " 2.36
( )# p z " #2.36
( )
= 0.9909 # 0.0091
= 0.9818
First, calculate mean and sd to normalize (120 and )
!
15/ 50
R
• Understand how to calculate probabilities from probability
distributions
 Normal: dnorm and pnorm
 Poisson: dpois and ppois
 Binomial: dbinom and pbinom
 Hypergeometric: dhyper and phyper
• Exploring relationships among distributions

More Related Content

Similar to lecture4.pdf

Doe02 statistics
Doe02 statisticsDoe02 statistics
Doe02 statisticsArif Rahman
 
Statistical computing2
Statistical computing2Statistical computing2
Statistical computing2Padma Metta
 
understanding-key-concepts-of-probability-and-random-variables-through-exampl...
understanding-key-concepts-of-probability-and-random-variables-through-exampl...understanding-key-concepts-of-probability-and-random-variables-through-exampl...
understanding-key-concepts-of-probability-and-random-variables-through-exampl...elistemidayo
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadRione Drevale
 
U1.4-RVDistributions.ppt
U1.4-RVDistributions.pptU1.4-RVDistributions.ppt
U1.4-RVDistributions.pptSameeraasif2
 
Probability distribution
Probability distributionProbability distribution
Probability distributionManoj Bhambu
 
Probability and Statistics : Binomial Distribution notes ppt.pdf
Probability and Statistics : Binomial Distribution notes ppt.pdfProbability and Statistics : Binomial Distribution notes ppt.pdf
Probability and Statistics : Binomial Distribution notes ppt.pdfnomovi6416
 
Chapter 4 part2- Random Variables
Chapter 4 part2- Random VariablesChapter 4 part2- Random Variables
Chapter 4 part2- Random Variablesnszakir
 
random variation 9473 by jaideep.ppt
random variation 9473 by jaideep.pptrandom variation 9473 by jaideep.ppt
random variation 9473 by jaideep.pptBhartiYadav316049
 
Probability Distribution - Binomial, Exponential and Normal
Probability Distribution - Binomial, Exponential and NormalProbability Distribution - Binomial, Exponential and Normal
Probability Distribution - Binomial, Exponential and NormalBrainware University
 
Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)jillmitchell8778
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and StatisticsMalik Sb
 
Discrete Probability Distributions
Discrete Probability DistributionsDiscrete Probability Distributions
Discrete Probability Distributionsmandalina landy
 

Similar to lecture4.pdf (20)

PTSP PPT.pdf
PTSP PPT.pdfPTSP PPT.pdf
PTSP PPT.pdf
 
Doe02 statistics
Doe02 statisticsDoe02 statistics
Doe02 statistics
 
Statistical computing2
Statistical computing2Statistical computing2
Statistical computing2
 
understanding-key-concepts-of-probability-and-random-variables-through-exampl...
understanding-key-concepts-of-probability-and-random-variables-through-exampl...understanding-key-concepts-of-probability-and-random-variables-through-exampl...
understanding-key-concepts-of-probability-and-random-variables-through-exampl...
 
Binomial probability distributions
Binomial probability distributions  Binomial probability distributions
Binomial probability distributions
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spread
 
U1.4-RVDistributions.ppt
U1.4-RVDistributions.pptU1.4-RVDistributions.ppt
U1.4-RVDistributions.ppt
 
Probability distribution
Probability distributionProbability distribution
Probability distribution
 
Probability and Statistics : Binomial Distribution notes ppt.pdf
Probability and Statistics : Binomial Distribution notes ppt.pdfProbability and Statistics : Binomial Distribution notes ppt.pdf
Probability and Statistics : Binomial Distribution notes ppt.pdf
 
Chapter 4 part2- Random Variables
Chapter 4 part2- Random VariablesChapter 4 part2- Random Variables
Chapter 4 part2- Random Variables
 
random variation 9473 by jaideep.ppt
random variation 9473 by jaideep.pptrandom variation 9473 by jaideep.ppt
random variation 9473 by jaideep.ppt
 
Prob distros
Prob distrosProb distros
Prob distros
 
U unit7 ssb
U unit7 ssbU unit7 ssb
U unit7 ssb
 
Probability Distribution - Binomial, Exponential and Normal
Probability Distribution - Binomial, Exponential and NormalProbability Distribution - Binomial, Exponential and Normal
Probability Distribution - Binomial, Exponential and Normal
 
Stats chapter 7
Stats chapter 7Stats chapter 7
Stats chapter 7
 
Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and Statistics
 
Quantification
QuantificationQuantification
Quantification
 
Discrete Probability Distributions
Discrete Probability DistributionsDiscrete Probability Distributions
Discrete Probability Distributions
 
5. RV and Distributions.pptx
5. RV and Distributions.pptx5. RV and Distributions.pptx
5. RV and Distributions.pptx
 

Recently uploaded

Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationAadityaSharma884161
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 

Recently uploaded (20)

Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint Presentation
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 

lecture4.pdf

  • 1. Lecture 4: Random Variables and Distributions
  • 2. Goals • Working with distributions in R • Overview of discrete and continuous distributions important in genetics/genomics • Random Variables
  • 3. Random Variables ! " 0 1 -1 A rv is any rule (i.e., function) that associates a number with each outcome in the sample space
  • 4. Two Types of Random Variables • A discrete random variable has a countable number of possible values • A continuous random variable takes all values in an interval of numbers
  • 5. Probability Distributions of RVs Discrete Let X be a discrete rv. Then the probability mass function (pmf), f(x), of X is: ! f (x) = P(X = x), x ∈ Ω 0, x ∉ Ω Continuous P(a " X " b) = f (x)dx a b # Let X be a continuous rv. Then the probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b: a b A a
  • 6. Using CDFs to Compute Probabilities Continuous rv: ! F(x) = P(X " x) = f (y)dy #$ x % pdf cdf P(a " X " b) = F(b) # F(a)
  • 7. Using CDFs to Compute Probabilities Continuous rv: ! F(x) = P(X " x) = f (y)dy #$ x % pdf cdf P(a " X " b) = F(b) # F(a)
  • 8. Expectation of Random Variables Continuous µX = E[X] = x " f (x)dx #$ $ % The expected or mean value of a continuous rv X with pdf f(x) is: Discrete Let X be a discrete rv that takes on values in the set D and has a pmf f(x). Then the expected or mean value of X is: ! µX = E[X] = x " f (x) x#D $
  • 9. Variance of Random Variables Continuous "X 2 = V[X] = (x # µ)2 $ f (x)dx #% % & = E[(X # µ)2 ] The variance of a continuous rv X with pdf f(x) and mean µ is: Discrete Let X be a discrete rv with pmf f(x) and expected value µ. The variance of X is: ! "X 2 = V[X] = (x # x$D % µ)2 = E[(X # µ)2 ]
  • 10. Example of Expectation and Variance • Let L1, L2, …, Ln be a sequence of n nucleotides and define the rv Xi: 1, if Li = A 0, otherwise Xi • pmf is then: P(Xi = 1) = P(Li = A) = pA P(Xi = 0) = P(Li = C or G or T) = 1 - pA • E[X] = 1 x pA + 0 x (1 - pA) = pA • Var[X] = E[X - µ]2 = E[X2] - µ2 = [12 x pA + 02 x (1 - pA)] - pA 2 = pA (1 - pA)
  • 11. The Distributions We’ll Study 1. Binomial Distribution 2. Hypergeometric Distribution 3. Poisson Distribution 4. Normal Distribution
  • 12. Binomial Distribution • Experiment consists of n trials – e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed • Trials are identical and each can result in one of the same two outcomes – e.g., head or tail in each toss of a coin – Generally called “success” and “failure” – Probability of success is p, probability of failure is 1 – p • Trials are independent • Constant probability for each observation – e.g., Probability of getting a tail is the same each time we toss the coin
  • 13. Binomial Distribution ! P{X = x} = ( )px (1" p)n"x n x pmf: E(x) = np cdf: ! P{X " x} = ( )py (1# p)n#y y= 0 x $ n y Var(x) = np(1-p)
  • 14. Binomial Distribution: Example 1 • A couple, who are both carriers for a recessive disease, wish to have 5 children. They want to know the probability that they will have four healthy kids ! P{X = 4} = ( )0.754 " 0.251 5 4 = 0.395 0 1 2 3 4 5 p(x)
  • 15. Binomial Distribution: Example 2 • Wright-Fisher model: There are i copies of the A allele in a population of size 2N in generation t. What is the distribution of the number of A alleles in generation t + 1? ! i 2N " # $ % & ' j 1( i 2N " # $ % & ' 2N( j 2N pij = j j = 0, 1, …, 2N
  • 16. Hypergeometric Distribution • Population to be sampled consists of N finite individuals, objects, or elements • Each individual can be characterized as a success or failure, m successes in the population • A sample of size k is drawn and the rv of interest is X = number of successes
  • 17. Hypergeometric Distribution • Similar in spirit to Binomial distribution, but from a finite population without replacement 20 white balls out of 100 balls If we randomly sample 10 balls, what is the probability that 7 or more are white?
  • 18. Hypergeometric Distribution • pmf of a hypergeometric rv: P{X = i | n,m,k} = m i n k - i m + n k For i = 0, 1, 2, 3, … Where, m = Number of balls in urn considered “success” k = Number of balls selected n = Number of balls in urn considered “failure” m + n = Total number of balls in urn
  • 19. Hypergeometric Distribution • Extensively used in genomics to test for “enrichment”: " = Number of annotated genes Number of genes of interest Number of genes with annotation Number of genes of interest with annotation
  • 20. Poisson Distribution • Useful in studying rare events • Poisson distribution also used in situations where “events” happen at certain points in time • Poisson distribution approximates the binomial distribution when n is large and p is small
  • 21. Poisson Distribution ! P{X = i} = e"# #i i! • A rv X follows a Poisson distribution if the pmf of X is: For i = 0, 1, 2, 3, … • λ is frequently a rate per unit time: • Safely approximates a binomial experiment when n > 100, p < 0.01, np = λ < 20) • E(X) = Var(X) = λ λ = αt = expected number of events per unit time t
  • 22. Poisson RV: Example 1 ! P{X = i} = e"d di i! • The number of crossovers, X, between two markers is X ~ poisson(λ=d) ! P{X = 0} = e"d ! P{X "1} =1# e#d
  • 23. Poisson RV: Example 2 • Recent work in Drosophila suggests the spontaneous rate of deleterious mutations is ~ 1.2 per diploid genome. Thus, let’s tentatively assume X ~ poisson(λ = 1.2) for humans. What is the probability that an individual has 12 or more spontaneous deleterious mutations? ! P{X "12} =1# e#1.2 1.2i i! i= 0 11 $ = 6.17 x 10-9
  • 24. Poisson RV: Example 3 • Suppose that a rare disease has an incidence of 1 in 1000 people per year. Assuming that members of the population are affected independently, find the probability of k cases in a population of 10,000 (followed over 1 year) for k=0,1,2. The expected value (mean) = λ = .001*10,000 = 10 ! P(X = 0) = (10)0 e"(10) 0! = .0000454 P(X =1) = (10)1 e"(10) 1! = .000454 P(X = 2) = (10)2 e"(10) 2! = .00227
  • 25. Normal Distribution • “Most important” probability distribution • Many rv’s are approximately normally distributed • Even when they aren’t, their sums and averages often are (CLT)
  • 26. Normal Distribution ! f (x;µ,"2 ) = 1 2#" e$(x$µ)2 / 2" 2 • pdf of normal distribution: • standard normal distribution (µ = 0, σ2 = 1): ! f (z;0,1) = 1 2"# e$z2 / 2 • cdf of Z: P(Z " z) = f (y;0,1) #$ z % dy
  • 27. Standardizing Normal RV • If X has a normal distribution with mean µ and standard deviation σ, we can standardize to a standard normal rv: ! Z = X " µ #
  • 28. I Digress: Sampling Distributions • Before data is collected, we regard observations as random variables (X1,X2,…,Xn) • This implies that until data is collected, any function (statistic) of the observations (mean, sd, etc.) is also a random variable • Thus, any statistic, because it is a random variable, has a probability distribution - referred to as a sampling distribution • Let’s focus on the sampling distribution of the mean, ! X
  • 29. Behold The Power of the CLT • Let X1,X2,…,Xn be an iid random sample from a distribution with mean µ and standard deviation σ. If n is sufficiently large: ! X ~N(µ , ! " n )
  • 30. Example • If the mean and standard deviation of serum iron values from healthy men are 120 and 15 mgs per 100ml, respectively, what is the probability that a random sample of 50 normal men will yield a mean between 115 and 125 mgs per 100ml? ! p(115 " x "125 = p 115 #120 2.12 " x " 125 #120 2.12 $ % & ' ( ) ! = p "2.36 # z # 2.36 ( ) = p z " 2.36 ( )# p z " #2.36 ( ) = 0.9909 # 0.0091 = 0.9818 First, calculate mean and sd to normalize (120 and ) ! 15/ 50
  • 31. R • Understand how to calculate probabilities from probability distributions  Normal: dnorm and pnorm  Poisson: dpois and ppois  Binomial: dbinom and pbinom  Hypergeometric: dhyper and phyper • Exploring relationships among distributions