Probability distributions
Dr. S. A. Rizwan, M.D.
Public	Health	Specialist
SBCM,	Joint	Program	– Riyadh
Ministry	of	Health,	Kingdom	of	Saudi	Arabia
Learning	objectives
Demystifying statistics! – Lecture 2, SBCM, Joint Program – Riyadh
• Define	probability	distributions
• Describe	the	common	types	of	probability	distributions
• Describe	sampling	distribution
• Understand	the	central	limit	theorem
Probability	distribution
• A probability distribution is a mathematical function that gives the probability of each possible outcome of an experiment.
• The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur.
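As a small illustration of the definition above, the sketch below lists every outcome of tossing a fair coin twice and tabulates the probability of each possible number of heads. The coin-toss experiment and the function name `heads_distribution` are assumptions chosen for illustration; they are not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses: int) -> dict[int, Fraction]:
    """Probability of each possible number of heads in n fair coin tosses."""
    # Every sequence of H/T is equally likely for a fair coin.
    outcomes = list(product("HT", repeat=n_tosses))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}


dist = heads_distribution(2)
for heads, prob in dist.items():
    print(heads, prob)
```

The probabilities over all possible values add up to 1, which is true of any probability distribution.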
Section	1:	
Binomial	distribution
Binomial	distribution
• The following conditions must be satisfied for a binomial experiment/distribution:
• There is a fixed number of trials, n.
• The outcome of each trial is either a “success” or a “failure”.
• The probability of success (p) remains constant from trial to trial.
• The trials are independent: the outcome of one trial is not affected by the outcome of any other trial.
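When these four conditions hold, the probability of observing exactly k successes follows the standard binomial formula. The symbol k (the number of successes) is introduced here for illustration and does not appear in the slides.

```latex
\[
P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n
\]
\[
E[X] = np, \qquad \operatorname{Var}(X) = np(1 - p)
\]
```

For instance, with n = 40 and p = 0.20 the expected number of successes is np = 8.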
Binomial	distribution	– example
• Suppose n = 40 patients will receive an experimental therapy that is believed to be better than current treatments, which historically have had a 5-year survival rate of 20%, i.e. the probability of 5-year survival is p = 0.20.
• If the new therapy is no better than current treatments, the number of patients out of 40 in our study surviving at least 5 years has a binomial distribution, i.e. X ~ BIN(40, 0.20).
Binomial	distribution	– example
• Suppose that with the new treatment 16 of the 40 patients survive at least 5 years past diagnosis.
• Q: Does this result suggest that the new therapy has a better 5-year survival rate than the current treatments, i.e. is the probability that a patient survives at least 5 years greater than 0.20 (20%) when treated with the new therapy?
Binomial	distribution	– example
• We essentially ask ourselves the following:
• If we assume that the new therapy is no better than the current treatments, what is the probability that we would see these results by chance variation alone?
• More specifically, what is the probability of seeing 16 or more successes out of 40 if the success rate of the new therapy is also 0.20 (20%)?
Binomial	distribution	– example
• This is a binomial experiment.
• There are n = 40 patients and we are counting the number of patients who survive 5 or more years. The individual patient outcomes are independent and, IF WE ASSUME the new method is NOT better, the probability of success is p = 0.20 (20%) for all patients.
• So X = number of “successes” in the clinical trial is binomial with n = 40 and p = 0.20, i.e. X ~ BIN(40, 0.20).
Binomial	distribution	– example
• X ~ BIN(40, 0.20): find the probability that 16 or more patients survive at least 5 years.
• Enter:
• n = sample size
• x = observed number of “successes”
• p = probability of “success”
• Probabilities are computed automatically for “greater than or equal to x” and “less than or equal to x”.
Binomial	distribution	– example
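• The chance of seeing 16 or more patients out of 40 survive at least 5 years, if the new method has the same chance of success as the current methods (20%), is VERY SMALL: 0.0029.
• A result this unlikely under chance variation alone suggests that the new therapy has a better 5-year survival rate.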
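The tail probability quoted above can be checked by summing the binomial probabilities directly. This is a minimal sketch using only the Python standard library; the helper names `binom_pmf` and `binom_upper_tail` are hypothetical and are not the tool used in the lecture.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ BIN(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ BIN(n, p), summed term by term."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


# 16 or more survivors out of 40 when the true survival probability is 0.20
p_value = binom_upper_tail(16, 40, 0.20)
print(round(p_value, 4))
```

The sum runs from 16 to 40 because the question asks about "16 or more" successes, not exactly 16.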
Section	2:	
Normal	distribution
Normal	distribution
• The normal distribution is a model that describes many real-world measurements well.
• It is a continuous probability distribution with infinite range (the variable can take any real value).
• It is the most important probability distribution in statistics and an important tool in the analysis of epidemiological data.
Normal	distribution	- properties
• The	normal	distribution	is	defined	by	two	parameters,	μ	and	σ.	
• You	can	draw	a	normal	distribution	for	any	μ	and	σ	combination.		
• There	is	one	normal	distribution,	Z,	that	is	special.	
• It	has	μ	=	0	and	σ	=	1.
• Also	called	standard	normal	distribution.
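For reference, the two parameters enter the normal density as shown below, and any normal variable X can be converted to the standard normal Z by subtracting the mean and dividing by the standard deviation. These are the standard textbook formulas, which the slides do not write out.

```latex
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right),
\qquad -\infty < x < +\infty
\]
\[
Z = \frac{X - \mu}{\sigma} \sim N(0, 1)
\]
```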
Normal	distribution	- properties
• Mean = Median = Mode
• Spread determined by the SD
• Bell-shaped
• Symmetric about the center
• 50% of values are less than the mean and 50% are greater than the mean
• The curve approaches the horizontal axis asymptotically: −∞ < X < +∞
• Area under the curve is 1
Normal	distribution	– example
• Assume that heart rate (HR) in healthy individuals is normally distributed with mean = 70 beats/min and standard deviation = 10 beats/min.
• Q1. What area under the curve is above 80 beats/min?
• Q2. What area under the curve is above 90 beats/min?
• Q3. What area under the curve is between 50 and 90 beats/min?
• Q4. What area under the curve is above 100 beats/min?
• Q5. What area under the curve is below 40 beats/min or above 100 beats/min?
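The five areas can be computed from the cumulative distribution function of a normal distribution with mean 70 and SD 10. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

# Heart rate model from the example: mean 70, SD 10 (beats/min)
hr = NormalDist(mu=70, sigma=10)

answers = {
    "Q1: above 80": 1 - hr.cdf(80),
    "Q2: above 90": 1 - hr.cdf(90),
    "Q3: between 50 and 90": hr.cdf(90) - hr.cdf(50),
    "Q4: above 100": 1 - hr.cdf(100),
    "Q5: below 40 or above 100": hr.cdf(40) + (1 - hr.cdf(100)),
}

for question, area in answers.items():
    print(f"{question}: {area:.4f}")
```

The cut-offs 80, 90 and 100 lie 1, 2 and 3 standard deviations above the mean, so the answers follow the familiar pattern of about 16%, 2.3% and 0.13% in the upper tail.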
Section	3:	
Sampling	distribution
Sampling	distribution
• Sampling	distribution	of	the	mean	– A	theoretical	probability	distribution	
of	sample	means	that	would	be	obtained	by	drawing	from	the	population	
all	possible	samples	of	the	same	size.
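For a very small population, "all possible samples of the same size" can be listed explicitly. The sketch below does this for a hypothetical population of five values, sampling two at a time without replacement; the numbers are invented for illustration.

```python
from itertools import combinations
from statistics import fmean

population = [2, 4, 6, 8, 10]  # hypothetical population values
sample_size = 2

# Every possible sample of size 2 (without replacement) and its mean
samples = list(combinations(population, sample_size))
sample_means = [fmean(s) for s in samples]

print(len(samples), "possible samples")
print("sample means:", sorted(sample_means))
print("population mean:", fmean(population))
print("mean of sample means:", fmean(sample_means))
```

The list of sample means is the sampling distribution of the mean for this population and sample size, and its average equals the population mean.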
Central	Limit	Theorem
• For a population with finite variance, the distribution of the sample mean across all possible samples of the same size approximates a normal distribution, whatever the shape of the population distribution, as long as the sample size is large enough (about 30 or more is a common rule of thumb).
• If we repeatedly drew samples from a population and calculated the mean of a variable (or a percentage), those sample means or percentages would be approximately normally distributed.
• It enables us to estimate the standard error from a single sample (for a mean, the sample SD divided by √n).
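A simulation can make the theorem concrete. The sketch below repeatedly draws samples of size 30 from a strongly skewed population (an exponential distribution with mean 1 and SD 1, chosen here as an assumption) and compares the spread of the sample means with the theoretical standard error σ/√n. The seed, the number of repetitions and the function name `draw_sample` are arbitrary illustrative choices.

```python
import random
from statistics import fmean, stdev

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 30                  # cases per sample
n_samples = 5000        # number of repeated samples


def draw_sample(size: int) -> list[float]:
    """One sample from a skewed population: exponential, mean 1, SD 1."""
    return [rng.expovariate(1.0) for _ in range(size)]


sample_means = [fmean(draw_sample(n)) for _ in range(n_samples)]

theoretical_se = 1.0 / n**0.5  # sigma / sqrt(n), with sigma = 1
share_within = fmean(
    abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means
)

print("mean of sample means:", round(fmean(sample_means), 3))
print("SD of sample means:  ", round(stdev(sample_means), 3))
print("theoretical SE:      ", round(theoretical_se, 3))
print("share within 1.96 SE:", round(share_within, 3))

# Standard error estimated from one single sample: s / sqrt(n)
one_sample = draw_sample(n)
estimated_se = stdev(one_sample) / n**0.5
print("SE from one sample:  ", round(estimated_se, 3))
```

Even though individual values are far from normal, roughly 95% of the sample means fall within 1.96 standard errors of the population mean, as a normal distribution would predict.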
Section	4:	
Percentiles
Percentiles
• A percentile is the value below which a given percentage of the data falls.
• For example, if 80% of people are shorter than you, you are at the 80th percentile. If your height is 1.85 m, then 1.85 m is the 80th percentile height in that group.
Percentiles
• Quantiles are cut points dividing the range of a probability distribution into contiguous intervals with equal probabilities.
• Examples: median, tertiles, quartiles, quintiles, sextiles, septiles, octiles, deciles, and percentiles (or centiles).
• Inter-quartile range: the difference between the third and first quartiles (Q3 − Q1).
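Quartiles and the inter-quartile range can be computed from data with `statistics.quantiles` in the Python standard library. The heights below are hypothetical values invented for illustration, and the function's default "exclusive" method is only one of several conventions for locating cut points, so other software may give slightly different results.

```python
from statistics import quantiles

# Hypothetical heights in cm (not data from the lecture)
heights_cm = [150, 155, 160, 162, 165, 168, 170, 172, 175, 180, 185]

# n=4 gives the three quartile cut points; n=10 gives the nine decile cut points
q1, median, q3 = quantiles(heights_cm, n=4)
iqr = q3 - q1
deciles = quantiles(heights_cm, n=10)

print("Q1, median, Q3:", q1, median, q3)
print("Inter-quartile range:", iqr)
print("Number of decile cut points:", len(deciles))
```

Dividing a distribution into k equal-probability intervals always needs k − 1 cut points, which is why four quarters are separated by three quartiles.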
Take	home	messages
• Understanding probability distributions helps us understand inferential statistics better.
Thank	you!
Email	your	queries	to	sarizwan1986@outlook.com
