Quantitative Science I
Hypothesis Testing
Hamdy F. F. Mahmoud, PhD
Collegiate Assistant Professor
Statistics Department @ VT
This lecture note covers
o Hypothesis testing for a population mean
o Steps of hypothesis testing:
▪ Null Hypothesis and Alternative Hypothesis
▪ Test statistic
▪ P-value
▪ Conclusion
o Relationship between Confidence Interval and Hypothesis Testing
o Read Chapter 6.1, 6.2
Statistical Inference
❑There are two common types of statistical inference:
▪ A confidence interval is used when your goal is to estimate a
population parameter.
▪ A test of significance is used to assess evidence in the data
about some claim.
❑A test of significance is a formal procedure for comparing
observed data with a claim (also called a hypothesis) whose truth
we want to assess.
▪ The claim is a statement about a parameter, like the population
proportion p or the population mean µ.
❑ We express the results of a significance test in terms of a
probability, called the P-value, that measures how well the data and
the claim agree.
The Reasoning of Tests of Significance
❑ Assume that you have been told that the average grade in a certain
course is 60/100 (claimed value).
▪ You take a group of students taking that course and collect the
grades of all of them.
▪ You calculate the statistic (the sample mean) and obtain X̄ = 90/100.
This looks like a high grade!
▪ We see that X̄ > 60. We would like to know just how certain we can
be that μ > 60.
▪ A confidence interval is not quite what we need. For example,
suppose a 95% CI for μ is [58, 90]. It does not directly tell us
how confident we can be that μ > 60 or how strong the evidence
against the claim is.
Tests of Significance
▪ Our aim will be to infer µ, the value of the mean for the
population.
▪ We are going to start with a very unrealistic situation:
assuming we know 𝜎, the standard deviation of the
distribution for the population.
Steps in Significance Tests
1. State the null and alternative hypotheses.
2. Calculate a test statistic to measure the compatibility between
the null hypothesis and the data.
$$\text{Test statistic} = \frac{\text{estimate from data} - \text{claimed value under } H_0}{\text{standard deviation of the estimate}}$$
3. Calculate the P-value: the probability, under the null hypothesis,
of a result at least as extreme as the statistic you measured.
4. State a conclusion regarding evidence against the null
hypothesis.
Step 1: Null and alternative hypotheses
▪ The null hypothesis is the claim which is initially
favored or believed to be true. It is often the default or
uninteresting situation of “no effect” or “no difference”.
▪ We then need to determine if there is strong
enough evidence against it.
▪ The test of significance is designed to assess the
strength of the evidence against the null hypothesis.
Back to our motivating example
Claimed value = 60/100; the value actually obtained is x̄ = 90/100.
1) Assuming that μ = 60, is it just a rare case?
2) How rare is it? Is there some evidence that maybe the
average grade is greater than 60?
▪ The statement being tested is that the mean of the population
(the value of the parameter µ) is 60: the Null Hypothesis, H0.
‐ The test of significance is designed to assess the strength
of evidence against the null hypothesis.
▪ The alternative statement is that the mean of the population
(the value of the parameter µ) is > 60: the Alternative
Hypothesis, Ha.
‐ The test of significance is designed to assess the strength
of evidence to support the alternative hypothesis.
Practice on null and alternative hypotheses formulation
Specifications for a water pipe call for a mean breaking strength μ
of more than 2000 lb per linear foot. Engineers will perform a
hypothesis test to decide whether to use a certain kind of pipe.
They will select a random sample of 1 ft sections of pipe, measure
their breaking strengths, and perform a hypothesis test. The pipe
will not be used unless the engineers can conclude that μ > 2000.
▪ How to set up the null hypothesis and the alternative hypothesis?
Hypotheses Possibilities
H0: μ = 60 vs. Ha: μ < 60
Suspect the average grade is lower. One-sided Ha.
H0: μ = 60 vs. Ha: μ > 60
Suspect the average grade is higher. One-sided Ha.
H0: μ = 60 vs. Ha: μ ≠ 60
Suspect the average grade is different. Two-sided Ha.
Note:
You must decide on the setting, based on general knowledge,
before you see the data or other measurements.
The Basic Idea
Every time we perform a hypothesis test, this is the basic
procedure that we will follow:
1. We'll make an initial assumption about the population
parameter.
2. We'll collect evidence or else use somebody else's
evidence (in either case, our evidence will come in the
form of data).
3. Based on the available evidence (data), we'll decide
whether to "reject" or "not reject" our initial assumption.
Step 2: Test Statistic: Z Test for μ
▪ We want to test the claim that the
mean of the population has a certain value μ0.
H0: μ = μ0
▪ From the data (sample size n) we measure the sample
mean X̄.
$$Z = \text{Test Statistic} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
We know that under H0 the mean value for the population is µ0.
Based on the CLT, under H0 the sample mean X̄ is approximately
N(µ0, σ/√n), where σ/√n is the standard deviation of X̄.
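The Z statistic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `z_statistic` and the numbers passed to it are hypothetical, and σ is assumed to be known.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardized distance between the sample mean and the null value mu0."""
    # Standard deviation of the sample mean (sigma assumed known).
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Hypothetical numbers, chosen only for illustration.
z = z_statistic(sample_mean=62.0, mu0=60.0, sigma=10.0, n=25)
print(z)
```

Here the sample mean lies one standard error above the null value, so the statistic equals 1.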
Step 3: P-value
❑In performing a hypothesis test, we
essentially put the null hypothesis
on trial. We begin by assuming that
H0 is true, just as we begin a trial by
assuming a defendant to be
innocent.
❑The hypothesis test involves
measuring the strength of the
disagreement between the sample
and H0 to produce a number
between 0 and 1, called a P-value.
❑The P-value is the probability,
computed assuming that H0 is true,
that the test statistic would take a
value as extreme as or more extreme
than the one actually observed.
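As a rough sketch of this definition for a Z statistic, the P-value is a tail area of the standard normal distribution, and which tail depends on the alternative hypothesis. The function name `p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are assumptions made for this illustration; `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """Tail probability of a standard normal Z statistic under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # Ha: mu > mu0, upper tail
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":         # Ha: mu < mu0, lower tail
        return std_normal.cdf(z)
    if alternative == "two-sided":    # Ha: mu != mu0, both tails
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# A statistic of 1.96 gives a two-sided P-value of roughly 0.05.
print(round(p_value(1.96), 3))
```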
More about P-value…
When the P-value is small, there are two possibilities:
1. The null hypothesis is true, and our observed effect is
extremely rare!
OR more likely…
2. The null hypothesis is false, and our data is telling us this
by the small P-value!
Significance Level
▪ We need a cut-off point (decisive value) that we can compare our
P-value to and draw a conclusion or make a decision. In other
words, how much evidence do we need to reject H0 ?
▪ This cut-off point is the significance level. It is announced in
advance and serves as a standard on how much evidence against
H0 we need to reject H0. Usually denoted α.
▪ Typical values of α: 0.05, 0.01.
▪ If not stated otherwise, assume α=0.05.
Step 4: The conclusion/decision
▪ If the P-value is smaller than a fixed significance level α, then
we reject the null hypothesis (in favor of the alternative).
▪ Otherwise we don’t have enough evidence to reject the null.
‐ If we don’t reject the null, do we accept it?
▪ Note: Always report the P-value with your conclusion,
and write the conclusion in terms of the problem.
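The decision rule can be written as a very small function. This is only a sketch: the name `decide` and the example P-value of 0.03 are hypothetical, and the default α = 0.05 follows the convention stated in these notes.

```python
def decide(p_value, alpha=0.05):
    """Compare a P-value with the significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    # Not rejecting is not the same as accepting H0.
    return "fail to reject H0"

print(decide(0.03))              # uses the default alpha = 0.05
print(decide(0.03, alpha=0.01))  # stricter significance level
```

The same P-value leads to different decisions at different significance levels, which is why α should be fixed in advance.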
Statistical Significance
The final step in performing a significance test is to draw a
conclusion: reject H0 or fail to reject H0.
▪ If our sample result is too unlikely to have happened by
chance assuming H0 is true, then we will reject H0.
▪ Otherwise, we will fail to reject H0.
• Note: A fail-to-reject H0 decision in a significance test
does not mean that H0 is true. For that reason, you
should never “accept H0” or use language implying
that you believe H0 is true.
Why “fail to reject” H0 vs. “accept” H0?
❑ H0: There are NO raccoons in the backyard.
• Observation 1: I go out at a random time and do not see raccoons.
• Conclusion: H0 “seems” to be correct for now.
• Observation 2, at a later time: I see raccoons in the yard...
• Conclusion: H0 is incorrect!
Why not “accept the null hypothesis”?
We can NOT “prove truth”, only “disprove truth”.
▪ Failing to reject H0 based on Observation 1 may be
due to a bad sample or a small sample size.
▪ Only rejection is significant: if we reject H0, we have
a significant conclusion that μ = μ0 is untrue.
Tests for a Population Mean
Example 1: [Two-sided test]
• A scale is to be calibrated by weighing a 1000 g test weight 60 times.
The 60 scale readings have mean 1000.6 g and standard deviation 2 g.
• Find the P-value for testing 𝐻0: μ = 1000 versus 𝐻1 : μ ≠ 1000.
[Figure: normal curve marking the null value 1000 and the observed mean 1000.6, with z = −2.32 and z = 2.32.]
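A minimal sketch of the calculation for Example 1 follows. It assumes that, with n = 60 readings, the sample standard deviation of 2 g can stand in for σ so that the Z test applies; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Values from Example 1; s is the sample standard deviation,
# used here in place of sigma (large-sample assumption).
n, sample_mean, s, mu0 = 60, 1000.6, 2.0, 1000.0

z = (sample_mean - mu0) / (s / math.sqrt(n))
# Two-sided alternative: count both tails beyond |z|.
p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, P-value = {p_two_sided:.3f}")
```

The statistic is about 2.32 and the two-sided P-value is about 0.02, so at α = 0.05 the data give evidence that the scale is out of calibration.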
Example 2 [One-sided Test]
▪ The article “Wear in Boundary Lubrication” (S. Hsu, R. Munro, and M.
Shen, Journal of Engineering Tribology, 2002:427–441) discusses
several experiments involving various lubricants. In one experiment, 45
steel balls lubricated with purified paraffin were subjected to a 40 kg
load at 600 rpm for 60 minutes. The average wear, measured by the
reduction in diameter, was 673.2 μm, and the standard deviation was
14.9 μm. Assume that the specification for a lubricant is that the mean
wear be less than 675 μm.
▪ Find the P-value for testing 𝐻0 : μ ≥ 675 versus 𝐻1 : μ < 675.
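One way to work Example 2 is sketched below. It assumes the sample standard deviation (14.9 μm, n = 45) can stand in for σ, and it evaluates the P-value at the boundary value μ = 675 of the null hypothesis; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Values from Example 2; s is the sample standard deviation,
# used here in place of sigma (large-sample assumption).
n, sample_mean, s, mu0 = 45, 673.2, 14.9, 675.0

z = (sample_mean - mu0) / (s / math.sqrt(n))
# One-sided alternative mu < 675: only the lower tail counts.
p_lower = NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_lower:.3f}")
```

The statistic is about −0.81 and the P-value is about 0.21, so the data do not give strong evidence that the mean wear is below 675 μm.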
One-sided vs. two-sided
▪ If, based on previous data or experience, we expect “increase”,
“more”, “better”, etc. (“decrease”, “less”, “worse”, etc.), then
we can use a one-sided test.
▪ Otherwise, by default, we use two-sided. Key words:
“different”, “departures”, “changed”…
The Relationship between Hypothesis Tests and Confidence Interval
❑In a hypothesis test for a population mean μ, we specify a
particular value of μ (the null hypothesis) and determine
whether that value is plausible.
❑In contrast, a confidence interval for a population mean μ
can be thought of as the collection of all values for μ that
meet a certain criterion of plausibility, specified by the
confidence level 100(1 − α)%.
A level α two-sided significance test rejects H0: µ = µ0 exactly when
µ0 falls outside a level 1 − α confidence interval for µ.
Using a Confidence Interval to Carry Out a Hypothesis Test
Does the claimed value from the null hypothesis fall inside the CI?
▪ Yes: Fail to reject H0.
▪ No: Reject H0.
Relationship between C.I. and H.T. – recall example 1
• A scale is to be calibrated by weighing a 1000 g test weight 60
times. The 60 scale readings have mean 1000.6 g and standard
deviation 2 g.
• Find the 90% C.I. for the mean weight of the scale readings.
C = 90% → z* = 1.645
$$\text{margin of error} = 1.645 \times \frac{2}{\sqrt{60}} = 0.425$$
C.I. = (1000.6 − 0.425, 1000.6 + 0.425) = (1000.175, 1001.025).
• At α = 0.1, since μ0 = 1000 is outside the above C.I., we
reject H0. We have significant evidence that the population
mean is different from 1000 g.
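The link between the 90% confidence interval and the two-sided test at α = 0.10 can be checked numerically for this example. The sketch below is illustrative only: it uses the sample standard deviation in place of σ, and the variable names are assumptions.

```python
import math
from statistics import NormalDist

n, sample_mean, s, mu0 = 60, 1000.6, 2.0, 1000.0
alpha = 0.10  # matches a 90% confidence level

std_normal = NormalDist()
standard_error = s / math.sqrt(n)

# Confidence interval: sample mean plus or minus z* standard errors.
z_star = std_normal.inv_cdf(1.0 - alpha / 2.0)
margin = z_star * standard_error
ci = (sample_mean - margin, sample_mean + margin)

# Two-sided Z test of H0: mu = mu0 at the same alpha.
z = (sample_mean - mu0) / standard_error
p_two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z)))

outside_ci = not (ci[0] <= mu0 <= ci[1])
rejects = p_two_sided < alpha
print(f"CI = ({ci[0]:.3f}, {ci[1]:.3f}), outside: {outside_ci}, rejects: {rejects}")
```

Both checks agree: 1000 lies outside the interval, and the two-sided test rejects H0 at α = 0.10.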
Choosing the level of significance
• α = 0.05 is the accepted standard, but…
• if concluding that Ha is true has “costly” implications, a
smaller α may be appropriate
• a decision is not always needed: describing the evidence by the
P-value may be enough
• there is no sharp border between statistically significant and
insignificant results
Statistical vs. practical significance
• Statistically significant effect may be small:
Example (“Executive” blood pressure):
• µ0 = 128
• σ = 15
• n = 1000 obs.
• sample mean = 127
‐ Z = (127-128)/ (15/sqrt(1000)) = -2.11
‐ P-value for two-sided Ha = 2*0.0174=0.0348
‐ Significant??
▪ Stat. significance is not necessarily practical significance.
▪ Outliers may produce or destroy statistical significance.
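The blood pressure numbers can be reproduced with a short sketch. The second call, with n = 100, is a hypothetical comparison that is not in the slides; it is added only to show how the same one-unit difference stops being statistically significant in a smaller sample. The function name `two_sided_z_test` is an assumption.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n):
    """Return the Z statistic and two-sided P-value (sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Values from the slide: a difference of 1 unit with n = 1000.
z_large, p_large = two_sided_z_test(127, 128, 15, 1000)

# Hypothetical comparison: the same difference with n = 100.
z_small, p_small = two_sided_z_test(127, 128, 15, 100)

print(f"n = 1000: z = {z_large:.2f}, P-value = {p_large:.3f}")
print(f"n = 100:  z = {z_small:.2f}, P-value = {p_small:.3f}")
```

With n = 1000 the P-value falls below 0.05 even though the difference is only one unit, while with n = 100 the same difference is far from significant.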
