P value, Power, and Type I & II Errors
Dr. S. A. Rizwan, M.D.
Public Health Specialist
SBCM, Joint Program – Riyadh
Ministry of Health, Kingdom of Saudi Arabia
Learning objectives
Demystifying statistics! – Lecture 5, SBCM, Joint Program – Riyadh
• Define p value
• Describe the meaning and limitations of the p value
• Define the power of a test and explain its meaning
• Describe type 1 and type 2 errors in hypothesis testing and how they affect the interpretation of results
• Understand how the p value and type 1 and 2 errors relate to sample size calculation
Section	1:	
P	value
P	value
• Defined as the probability of obtaining a result equal to or more extreme than the one actually observed, assuming the null hypothesis is true
• First introduced by Karl Pearson in his chi-squared test
• It can also be seen in relation to the probability of making a Type I error
P	value
The	vertical	coordinate	is	the	probability	density	of	each	outcome,	computed	under	the	null	hypothesis.	
The p-value	is	the	area	under	the	curve	past	the	observed	data	point.
P value – choice of cut-off value
• Arbitrary cut-off of 0.05 (a 5% chance of a false-positive conclusion)
• If p < 0.05: statistically significant, so reject H0 in favour of H1
• If p ≥ 0.05: not statistically significant, so fail to reject H0 (strictly, we never "accept" H0)
• When testing potentially harmful interventions, the α value is set below 0.05
• The choice depends on the research question!
P	value	– degrees	of	magnitude
• Very small (< 0.001): the results are said to be highly significant
• Near 0.05: said to be borderline significant
• Near 1.0: the data provide essentially no evidence against the null hypothesis
P	value	– how	to	calculate	it?
• For each test statistic of interest, statistical tables list the critical values corresponding to predetermined p value cut-offs
• So each type of distribution has its own table
• It is also possible to calculate exact p values with a computer instead of using such tables
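As a sketch of the computer route, the exact two-sided p value for a one-sample z test can be computed directly from the normal distribution using only Python's standard library (the numbers here are hypothetical, chosen to echo the power example later in the lecture):

```python
# Exact two-sided p value for a one-sample z test, computed directly
# instead of being read off a z table. Standard library only.
import math

def z_test_p_value(xbar, mu0, sigma, n):
    """Two-sided p value for H0: mu = mu0, given sample mean xbar."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) under the standard normal null distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers: sample mean 190, null mean 170, sigma 40, n = 16
p = z_test_p_value(xbar=190, mu0=170, sigma=40, n=16)
print(f"p = {p:.4f}")  # p = 0.0455
```

Here z = 2.0, so the table route would report "p < 0.05"; the computed value is 0.0455.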
P	value	– interpretation
• If the results are statistically significant, decide whether the observed differences are clinically important
• If not significant, check whether the sample size was adequate to detect a clinically important difference
• The power of the study tells us the confidence with which we can conclude that there is no difference between the two groups
P	value	– interpretation
• Statistical significance does not necessarily mean real-world significance
• If the sample size is large, even small differences can have a low p-value
• Lack of significance does not necessarily mean the null hypothesis is true
• If the sample size is small, there could be a real difference that we are unable to detect
• If you perform a large number of tests in one study, about 1 in 20 will be significant at α = 0.05 merely by chance
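The last point can be checked by simulation. This sketch (entirely hypothetical data, with the null hypothesis true in every single test) counts how often p < 0.05 turns up by chance alone:

```python
# Run 2,000 z tests on data where the null (mu = 0) is TRUE for all of them:
# roughly 1 in 20 still comes out "significant" at alpha = 0.05 by chance.
import math
import random

random.seed(42)
alpha, n_tests, n = 0.05, 2000, 30
false_positives = 0
for _ in range(n_tests):
    sample = [random.gauss(0, 1) for _ in range(n)]  # null is true here
    z = (sum(sample) / n) * math.sqrt(n)             # z statistic (sigma = 1)
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided p value
    if p < alpha:
        false_positives += 1

print(f"False-positive rate: {false_positives / n_tests:.3f}")
```

The printed rate hovers around 0.05, which is exactly what α promises: a 5% false-positive rate per test, not per study.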
Section	2:	
Type	1	and	2	errors
What are these errors?
• These are errors that arise during hypothesis testing and decision making
• Type 1 error (false-positive conclusion)
  • Declaring a difference when there is no real difference; its probability is alpha (α)
  • Related to the p value: how?
  • Conventionally set at 1/20 = 0.05 = 5%
  • For a two-sided test this probability sits in the tails of the normal curve, i.e., 0.025 in either tail
• Type 2 error (false-negative conclusion)
  • Declaring no difference when a real difference exists; its probability is beta (β)
  • Occurs when the sample size is too small
  • Conventional values are 0.1 or 0.2
  • Related to power: how?
What	are	these	errors?
                            Reality: no effect      Reality: effect exists
Fail to reject null         CORRECT FAILURE         TYPE 2 ERROR (β)
(conclude: no effect)       TO REJECT
Reject null                 TYPE 1 ERROR (α)        CORRECT REJECT (1 − β)
(conclude: effect exists)
• Advanced learning: did you know type 3 and type 4 errors have also been described?
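The four cells of the table can be estimated by simulation. This is a sketch under assumed settings (H0: μ = 0, σ = 1, n = 30, and a hypothetical true effect of 0.5 standard deviations when one exists), not a general tool:

```python
# Estimate the error rates of a one-sample z test by simulation.
# Assumed settings: H0: mu = 0, sigma = 1, n = 30,
# and a hypothetical true effect of 0.5 SD when an effect exists.
import math
import random

random.seed(1)

def rejection_rate(true_mu, n=30, sims=4000):
    """Fraction of simulated samples in which H0: mu = 0 is rejected."""
    z_crit = 1.96  # two-sided critical value for alpha = 0.05
    rejections = 0
    for _ in range(sims):
        xbar = sum(random.gauss(true_mu, 1) for _ in range(n)) / n
        if abs(xbar) * math.sqrt(n) > z_crit:  # |z| vs critical value
            rejections += 1
    return rejections / sims

type1 = rejection_rate(true_mu=0.0)  # null true: rejections are type 1 errors
power = rejection_rate(true_mu=0.5)  # effect exists: rejections are correct
type2 = 1 - power                    # misses here are type 2 errors
print(f"alpha ~ {type1:.3f}, beta ~ {type2:.3f}, power ~ {power:.3f}")
```

The simulated α lands near the nominal 0.05, and β falls as the effect size or sample size grows, matching the table's two error cells.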
Example 1 [slide figure]
Example 2 [slide figure]
Example 3 [slide figure]
Section	3:	
Power	of	the	study
Power	of	the	study
• The probability that a test detects an effect that truly exists
• Equivalently, the probability of not missing a real effect because of sampling error
• It is the probability of avoiding a type 2 error, i.e., power = 1 − β
• A prospective power analysis is done before collecting data, to assess the sensitivity of the design
• A retrospective power analysis is done afterwards, to judge whether the studies you are interpreting were adequately designed
Factors	affecting	power
• All else being equal:
1. As sample size increases, power increases
2. As population variance decreases, power increases
3. As the true difference increases, power increases
4. Statistical power is greater for one-tailed than for two-tailed tests
5. The greater the allowed probability of a Type I error (α), the greater the power
Calculating	Power:	Example
• A study of n = 16 fails to reject H0: μ = 170 at α = 0.05 (two-sided), with σ = 40. What was the power of this test to detect a true population mean of 190?
\[
1-\beta
= \Phi\!\left(-z_{1-\alpha/2} + \frac{|\mu_0-\mu_a|\sqrt{n}}{\sigma}\right)
= \Phi\!\left(-1.96 + \frac{|170-190|\sqrt{16}}{40}\right)
= \Phi(0.04) = 0.5160
\]
Calculating	Power:	Example
• The top curve assumes the null hypothesis is true; the bottom curve assumes the alternative is true
• α is set to 0.05 (two-sided)
• We reject the null when the sample mean exceeds 189.6 (right tail of the top curve)
• The probability of a value greater than 189.6 under the bottom curve is 0.5160, which is the power of the test
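The power calculation above can be reproduced in a few lines; Φ is built from the standard library's erfc:

```python
# Power of the one-sample z test on the slide:
# H0: mu = 170, true mu = 190, sigma = 40, n = 16, alpha = 0.05 (two-sided).
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

mu0, mu_a, sigma, n = 170, 190, 40, 16
z_crit = 1.96  # critical z for alpha/2 = 0.025

power = phi(-z_crit + abs(mu0 - mu_a) * math.sqrt(n) / sigma)
print(f"Power = {power:.4f}")  # 0.5160, as on the slide
```

A power of 0.52 means this study had barely a coin-flip's chance of detecting a true mean of 190, which motivates the sample size calculation that follows.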
Power	vs.	confidence	intervals
• Once a confidence interval has been constructed, power calculations yield no additional insight
• It is pointless to perform power calculations for hypotheses outside the confidence interval
• Confidence intervals inform readers about the possibility of an inadequate sample size better than post hoc power calculations do
How	do	the	errors	relate	to	sample	size?
• Sample size for a one-sample z test:
• 1 − β ≡ desired power
• α ≡ desired significance level (two-sided)
• σ ≡ population standard deviation
• Δ = μ0 − μa ≡ the difference worth detecting
\[
n = \frac{\sigma^2\left(z_{1-\beta} + z_{1-\alpha/2}\right)^2}{\Delta^2}
\]
How	do	the	errors	relate	to	sample	size?
• How large a sample is needed for a one-sample z test with 90% power and α = 0.05 (two-tailed) when σ = 40? Let H0: μ = 170 and Ha: μ = 190 (so Δ = μ0 − μa = 170 − 190 = −20)
• A sample size of 42 ensures adequate power.
\[
n = \frac{\sigma^2\left(z_{1-\beta} + z_{1-\alpha/2}\right)^2}{\Delta^2}
  = \frac{40^2\,(1.28 + 1.96)^2}{(-20)^2}
  = 41.99
\]
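The same arithmetic as a short sketch, rounding up as the slide does:

```python
# Sample size for a one-sample z test: 90% power, alpha = 0.05 (two-sided),
# sigma = 40, difference worth detecting Delta = 20.
import math

z_beta = 1.28    # z for 90% power (z_{1-beta})
z_alpha = 1.96   # z for alpha/2 = 0.025 (z_{1-alpha/2})
sigma, delta = 40, 20

n_exact = sigma**2 * (z_beta + z_alpha)**2 / delta**2
n = math.ceil(n_exact)  # always round UP so power is not lost
print(f"n = {n_exact:.2f}, enrol {n}")  # n = 41.99, enrol 42
```

Rounding up rather than to the nearest integer is deliberate: rounding 41.99 down to 41 would leave the study slightly underpowered.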
How	do	the	errors	relate	to	sample	size?
[Figure: sampling distributions and power compared for N = 16 vs N = 42]
Take	home	messages
• The p value, type 1 and 2 errors, alpha, beta, power, critical values, hypothesis testing and sample size are all related to each other
Thank	you!
Email	your	queries	to	sarizwan1986@outlook.com	
