Statistical Inference and Hypothesis Testing
by
Dr. Priyanka Dixit
TISS, Mumbai
Descriptive and Inferential Statistics
• Descriptive statistics is the term given to the analysis
of data that helps describe, show or summarize data
in a meaningful way so that patterns might emerge
from the data. It does not, however, allow us to draw
conclusions beyond the data we have analysed or
reach conclusions regarding any hypotheses we might
have made.
• It is used to describe data properly through
statistics and graphs.
Inferential Statistics
• Inferential statistics are techniques that allow
us to use samples to make generalizations
about the populations from which the samples
were drawn.
Statistical Inference
The process of generalizing, in a prescribed
manner, from a sample to its universe (population) is known as
Statistical Inference.
[Figure: a sample drawn from the universe/population]
Population Parameters
µ: Population mean
σ: Population standard deviation
Sample Statistics
x̄: Sample mean
s: Sample standard deviation
Statistical Inference
• Inductive Inference: Extension from the particular
to the general is called inductive inference.
• Inductive inference involves an element of
uncertainty in the conclusions.
• Deductive Inference
• Deductive inference can be described as
a method of deriving information from
accepted facts; it involves no
uncertainty in the conclusions. The
conclusions reached by deductive
inference are conclusive.
Population and Sample
• The population is an abstract term that refers
to the totality of all conceptually possible
observations, measurements or outcomes of
some specified kind.
• The number of conceptually possible
observations is called the size of the
population.
• The size varies according to the population
being investigated.
Contd…
• For example, a study of monthly income may be
conducted at the district, state or country level.
• In the first case, the population consists of the
incomes of all residents of one district; in the second,
of all residents of the state; and in the third, of all
citizens of the country.
• A population is finite when it consists of a
fixed number of observations and infinite when
it includes an infinite number of observations.
Sample
• A sample is a set of observations selected from
the population.
• The number of observations included in the
sample is called the size of the sample.
• In a finite population, a random sample is obtained
by giving every individual in the population an
equal chance of being chosen.
• In the case of an infinite population, a sample is random
if each observation is independent of every other
observation.
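A simple random sample from a finite population can be sketched with Python's standard `random` module. The population of 500 ID numbers and the sample size of 20 are hypothetical, chosen only for illustration; `random.sample` draws without replacement, so each individual has the same chance (20/500) of being chosen.

```python
import random

# Hypothetical finite population: individuals identified by ID numbers 1..500.
population = list(range(1, 501))

rng = random.Random(42)  # fixed seed so the draw is reproducible
# Draw 20 distinct individuals; every individual has the same chance of selection.
sample = rng.sample(population, k=20)

print("sample size:", len(sample))
print("selected IDs:", sorted(sample))
```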
Parameter/Statistics
• Populations and samples are studied through
their characteristics. The most important of
these characteristics are the mean, the
variance and the standard deviation.
• The characteristics of a population are called
parameters.
• The characteristics of a sample are called
statistics.
Parameters (Population)                Statistics (Sample)
Population mean (µ)                    Sample mean (x̄)
Population variance (σ²)               Sample variance (s²)
Population standard deviation (σ)      Sample standard deviation (s)
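The practical difference between a population variance and a sample variance is the divisor: N when the data are the whole population, n − 1 when they are a sample used to estimate σ². Python's standard `statistics` module provides both versions; the five data values below are hypothetical.

```python
import statistics

# Hypothetical data, used only to contrast the two divisors.
data = [4, 8, 6, 5, 7]

# Treated as the whole population: divide by N.
pop_mean = statistics.fmean(data)
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treated as a sample from a larger population: divide by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(f"mean = {pop_mean}")
print(f"population variance = {pop_var}, population SD = {pop_sd:.3f}")
print(f"sample variance = {samp_var}, sample SD = {samp_sd:.3f}")
```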
• The purpose of statistical inference is to make a
judgment about population parameters on
the basis of sample statistics.
• Judgments about population parameters
are of two types: one concerns estimation of
a parameter, the other testing hypotheses
about the parameter.
Hypothesis Testing
Hypothesis testing in inferential statistics involves
making inferences about the nature of the
population on the basis of observations of a
sample drawn from the population. The
hypothesis is tested against the information
provided by the sample in the form of a test statistic.
What is a Statistical Hypothesis?
A hypothesis is a statement about one or more
population parameters.
Null Hypothesis
What is null hypothesis?
A null hypothesis (H0) is a hypothesis of no
relationship or no difference.
Steps in hypothesis testing
1. State the Hypothesis
2. Set the criterion for rejecting H0
3. Compute the test statistic
4. Decide whether to reject H0
1. State the Hypothesis
In inferential statistics, the term hypothesis has a very
specific meaning: a conjecture about one or more
population parameters.
The hypothesis to be tested is called the null hypothesis
and is given the symbol H0.
Example: We use the null hypothesis that the mean
quantitative GRE score of the population of MPH
students is 455.
Thus, our null hypothesis, written in symbols, is
H0: µ = 455, or equivalently H0: µ − 455 = 0
where
µ = population mean
455 = hypothesized value to be tested
We test the null hypothesis (H0) against the
alternative hypothesis (symbolized H1), which
includes the possible outcomes not covered by the
null hypothesis.
For the above example we will use the alternative
hypothesis as
H1 : µ ≠ 455
The alternative hypothesis, often considered the
research hypothesis, can be supported only by
rejecting the null hypothesis.
2. Set the Criterion for Rejecting H0
After stating the hypotheses, the next step in hypothesis testing is
determining how different the sample statistic must be from
the hypothesized population parameter (µ) before the null
hypothesis can be rejected.
For our example, suppose we randomly select 144 MPH students
from the population and find the sample mean to be 535. Is
this sample mean (x̄ = 535) sufficiently different from what we
hypothesize for the population mean (µ = 455) to warrant rejecting
the null hypothesis?
Before answering this question, we need to consider three
concepts: (i) errors in hypothesis testing, (ii) level of significance,
and (iii) region of rejection.
Properties of Normal Distribution
The areas of a normal curve are measured in standard deviation units.
The proportions of cases in specified areas of a normal curve, as
marked by standard deviations, are constant, as detailed below:
Number of standard deviations from mean    Results lying outside this (%)
1.00                                       31.74
1.64                                       10.00
1.96                                       5.00
2.58                                       1.00
3.29                                       0.10
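The tabulated percentages can be checked against the standard normal distribution, here using `statistics.NormalDist` from Python's standard library (3.8+). This is a sketch for verification only: the exact two-tailed areas agree with the table to within rounding (for example, 1.64 SD leaves about 10.1% outside; the exact 10% cut-off is 1.645).

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution: mean 0, SD 1

def percent_outside(k: float) -> float:
    """Percentage of a normal distribution lying more than k SDs from the mean (both tails)."""
    return 2 * (1 - STD.cdf(k)) * 100

for k in (1.00, 1.64, 1.96, 2.58, 3.29):
    print(f"{k:.2f} SD: {percent_outside(k):.2f}% lies outside")
```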
i. Errors in hypothesis testing
When we decide to reject or not reject the null
hypothesis, there are four possible situations:
a. A true hypothesis is rejected.
b. A true hypothesis is not rejected.
c. A false hypothesis is not rejected.
d. A false hypothesis is rejected.
In a specific situation, we may make one of two types
of errors, as shown in the table below:
Decision made                      State of nature
                                   Null hypothesis is true    Null hypothesis is false
Reject null hypothesis             Type I error               Correct decision
Do not reject null hypothesis      Correct decision           Type II error
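Example
Verdict of jury     Defendant guilty      Defendant innocent
Not guilty          Incorrect             Correct decision
Guilty              Correct decision      Incorrect
Here the null hypothesis is that the defendant is innocent: convicting an
innocent defendant corresponds to a Type I error, and acquitting a guilty
defendant corresponds to a Type II error.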
Contd… Errors
Type I error is when we reject a true null
hypothesis.
Type II error is when we do not reject a false
null hypothesis.
ii. Level of significance
• To choose the criterion for rejecting H0, the
researcher must first select what is called the level of
significance.
• The level of significance, or alpha (α) level, is defined
as the probability of making a Type I error when
testing a null hypothesis, i.e., the probability of
rejecting H0 when it is true.
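The meaning of α as the Type I error rate can be illustrated by simulation: generate many samples from a population in which H0 is actually true, run a two-tailed z test on each, and count how often H0 is wrongly rejected. This is a sketch under stated assumptions: µ = 455 and σ = 100 follow the GRE example, while the sample size n = 25, the 4000 repetitions and the seed are arbitrary choices for speed.

```python
import random
from statistics import NormalDist, fmean

MU0, SIGMA, N = 455, 100, 25   # H0 is true: the data really come from mean 455
ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-tailed critical value, about 1.96
se = SIGMA / N ** 0.5                          # standard error of the mean

rng = random.Random(1)
reps = 4000
rejections = 0
for _ in range(reps):
    # Sample mean of N observations drawn from the H0 population.
    xbar = fmean(rng.gauss(MU0, SIGMA) for _ in range(N))
    z_stat = (xbar - MU0) / se
    if abs(z_stat) > z_crit:
        rejections += 1   # rejecting a true H0 is a Type I error

type1_rate = rejections / reps
print(f"observed Type I error rate: {type1_rate:.3f} (alpha = {ALPHA})")
```

The observed rejection rate should be close to the chosen α, which is exactly what the level of significance promises.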
Power of the Test
• A Type II error involves failing to reject H0 when it is actually false,
i.e., not finding an effect when there actually is an effect.
• β is the probability of a Type II error.
• (1 − β) is called the power of the test: the probability of finding an
effect when there actually is an effect.
• The power of a statistical test is analogous to the sensitivity of a
diagnostic test.
• α corresponds to the false-positive rate.
• β corresponds to the false-negative rate.
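Power depends on how far the true mean lies from the hypothesized one. A minimal sketch for the two-tailed one-sample z test, assuming σ is known: under a true mean µ1 the test statistic is normal with mean (µ1 − µ0)/σx̄ and SD 1, so power is the probability that it falls in either rejection region. The true mean of 470 used below is hypothetical; µ0 = 455, σ = 100 and n = 144 come from the GRE example.

```python
from statistics import NormalDist

STD = NormalDist()

def power_two_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z test when the true mean is mu1."""
    se = sigma / n ** 0.5
    z_crit = STD.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / se   # distance of the true mean from H0, in standard errors
    # Probability that the test statistic lands in the upper or lower rejection region.
    return (1 - STD.cdf(z_crit - shift)) + STD.cdf(-z_crit - shift)

power = power_two_tailed_z(mu0=455, mu1=470, sigma=100, n=144)
print(f"power = {power:.3f}, beta = {1 - power:.3f}")
```

When µ1 equals µ0 the "power" reduces to α, since every rejection is then a Type I error.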
iii. Region of Rejection
• The region of rejection is the area of the sampling
distribution that represents those values of the sample
mean that are improbable if the null hypothesis is true.
• The critical values of the test statistic are those values in
the sampling distribution that mark the beginning of the
region of rejection.
• When the alternative hypothesis is non-directional, the
region of rejection is located in both tails of the sampling
distribution. The test of the null hypothesis against this
non-directional alternative is called a two-tailed test.
• The probability of obtaining a mean as extreme as or more
extreme than the observed sample mean (x̄), given that
the null hypothesis is true, is called the p-value of the test, or
p.
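For a z test statistic, the p-value described above is a tail area of the standard normal distribution. A minimal sketch using `statistics.NormalDist`; the function name `z_p_value` and its `tail` argument are hypothetical. For example, z = 1.20 (the value obtained for x̄ = 465 in the GRE example) gives a two-tailed p of about 0.23.

```python
from statistics import NormalDist

def z_p_value(z: float, tail: str = "two") -> float:
    """p-value for a z test statistic; tail is 'two', 'right' or 'left'."""
    std = NormalDist()
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))   # as or more extreme in either direction
    if tail == "right":
        return 1 - std.cdf(z)              # as large or larger
    if tail == "left":
        return std.cdf(z)                  # as small or smaller
    raise ValueError("tail must be 'two', 'right' or 'left'")

print(f"two-tailed p for z = 1.20: {z_p_value(1.20):.4f}")
print(f"two-tailed p for z = 1.96: {z_p_value(1.96):.4f}")
```

H0 is rejected at level α whenever p < α, which is equivalent to the test statistic falling in the region of rejection.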
[Figure: Region of rejection for the sampling distribution of the mean under the null
hypothesis H0: µ = 455, with standard error σx̄ = 8.33]
3. Compute the Test Statistic
In our example
µ = 455, the hypothesized value for the parameter
n = 144, the size of the sample
x̄ = 535, the observed value for the sample statistic
σ = 100, the value of the standard deviation in the population
First, using the concept of z scores, we determine how
different x̄ is from µ, i.e., the number of standard errors
(standard deviation units of the sampling distribution) the
observed sample value lies from the hypothesized value.
In symbols,
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{144}} = 8.33$$
$$z = \frac{535 - 455}{8.33} = 9.60$$
Calculating the z score using the above formula is called
computing the test statistic.
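The arithmetic of this step can be reproduced in a few lines of Python; the values are those of the GRE example, and the variable names are illustrative only.

```python
import math

mu0 = 455     # hypothesized population mean
n = 144       # sample size
xbar = 535    # observed sample mean
sigma = 100   # known population standard deviation

se = sigma / math.sqrt(n)   # standard error of the mean: 100 / 12
z = (xbar - mu0) / se       # number of standard errors x̄ lies from µ

print(f"standard error = {se:.2f}, z = {z:.2f}")
```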
4. Decide about H0
Suppose we had found that the sample mean
for 144 students was not 535, but 465. Our
hypotheses, sampling distribution, and critical
values (+1.96 and −1.96) remain the same, but
now the test statistic is
$$z = \frac{465 - 455}{8.33} = 1.20$$
In other words, the observed sample mean (x̄ = 465) is 1.20
standard errors above the hypothesized value of the
population mean.
[Figure: Theoretical sampling distribution for the hypothesis H0: µ = 455, with critical
values −1.96 and +1.96, illustrating the test statistic z = 1.20 when x̄ = 465
(and z = 9.60 when x̄ = 535)]
Note that the test statistic (1.20) does not exceed the critical value; it does not fall
into the region of rejection, and so we should not reject the null hypothesis.
• This test statistic (1.20) is then compared to
the critical value (1.96).
• If the test statistic exceeds the critical value
in absolute value, the null hypothesis is
rejected.
• If the test statistic does not exceed the
critical value in absolute value, we do not reject
(fail to reject) the null hypothesis.
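The two-tailed decision rule can be sketched as a small function and applied to both sample means from the GRE example. The function name `two_tailed_decision` is hypothetical, and the critical value 1.96 corresponds to α = 0.05.

```python
def two_tailed_decision(z: float, z_crit: float = 1.96) -> str:
    """Reject H0 when |z| exceeds the two-tailed critical value."""
    return "reject H0" if abs(z) > z_crit else "do not reject H0"

se = 100 / 144 ** 0.5   # standard error of the mean, about 8.33
for xbar in (535, 465):
    z = (xbar - 455) / se
    print(f"x̄ = {xbar}: z = {z:.2f} -> {two_tailed_decision(z)}")
```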
Region of rejection : Directional Alternative Hypothesis
In the GRE example, we tested the null hypothesis against a
non-directional alternative:
H0 : µ = 455
H1 : µ ≠ 455
This test is called two-tailed or non-directional because the
region of rejection was located in both tails of the sampling
distribution of the mean.
Suppose a direction of the results is anticipated. A directional
hypothesis states that a parameter is either greater or less than
the hypothesized value.
For instance, in the GRE example we might use the alternative
hypothesis that the mean GRE score of our population is greater
than 455; in symbols,
H0 : µ = 455
H1 : µ > 455
An alternative hypothesis can be either non-directional
or directional.
A directional alternative hypothesis states that the
parameter is greater than or less than the
hypothesized value.
A non-directional alternative hypothesis merely
states that the parameter is different from (not equal
to) the hypothesized value.
The test of the null hypothesis against a directional
alternative is called a one-tailed test; its region of
rejection is located in one of the two tails of the
sampling distribution. The specific tail of the
distribution is determined by the direction of the
alternative hypothesis.
Now suppose the alternative hypothesis states that the
mean GRE score is less than 455. In symbols, the
hypotheses are
H0 : µ = 455
H1 : µ < 455
Here the region of rejection lies in the left tail of the
distribution.
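The effect of the direction of the alternative hypothesis on the region of rejection can be sketched by deriving the critical values from α with `statistics.NormalDist`. The function `reject_h0` and the test statistic z = 1.80 are hypothetical: such a value is not significant two-tailed (|z| < 1.96) but is significant one-tailed in the right tail (z > 1.645).

```python
from statistics import NormalDist

STD = NormalDist()

def reject_h0(z: float, alpha: float = 0.05, alternative: str = "two-sided") -> bool:
    """Decide whether to reject H0 for a z test statistic.

    alternative: "two-sided" (H1: µ ≠ µ0), "greater" (H1: µ > µ0) or "less" (H1: µ < µ0).
    """
    if alternative == "two-sided":
        return abs(z) > STD.inv_cdf(1 - alpha / 2)   # both tails, about ±1.96
    if alternative == "greater":
        return z > STD.inv_cdf(1 - alpha)            # right tail only, about +1.645
    if alternative == "less":
        return z < STD.inv_cdf(alpha)                # left tail only, about -1.645
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

z = 1.80  # hypothetical test statistic
for alt in ("two-sided", "greater", "less"):
    print(f"{alt}: reject H0? {reject_h0(z, alternative=alt)}")
```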
Type-I and Type-II Errors in Decision Making
In a specific situation, we may make one of two types of
errors, as shown in the table below:
Decision taken by          Existing reality
the investigator           Group A = Group B                Group A ≠ Group B
Group A ≠ Group B          P[Type I error] = α              Correct decision
                           (level of significance)          (power of the study, 1 − β)
Group A = Group B          Correct decision                 Type II error (β)
                           (level of confidence, 1 − α)
Testing of Hypothesis
Q=1 A random sample of 100 observations from a
population with standard deviation 60 yielded a
sample mean of 100.
(a) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ≠100) using α=0.05.
(b) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ>100) using α=0.05
Testing of Hypothesis
Ex=1 A random sample of 200 observations from a
population with standard deviation 80 yielded a
sample mean of 150.
(a) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ≠100) using α=0.05.
(b) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ>100) using α=0.05
• Ex=2 A random sample of 100 observations
from a population with standard deviation 60
yielded a sample mean of 100.
• (a)Test the null hypothesis that µ=111 against
the alternative hypothesis (µ≠111) using α=0.05.
• (b) Test the null hypothesis that µ<=111 against
the alternative hypothesis (µ>111) using α=0.05
• Explain why the results differ.
Q=2 The heights of 10 males of a given locality
are found to be as follows:
70, 67, 62, 68, 61, 68, 70, 64, 64, 66 inches.
Is it reasonable to believe that the average height
is greater than 64 inches?
What will be the finding if the alternative hypothesis
were two-tailed?
Contd.. Answer
Mean=66; S.D.=3.16 and Variance=10.00, t=2.00
• The tabulated value of t-statistic at 9 d.f. and α=0.05
(one-tailed) is 1.833
• Since calculated value is greater than the tabulated
value, we will reject the null hypothesis. We can
believe that mean height is greater than 64 inches.
• What will be the finding if alternative hypothesis was
two-tailed (answer it).
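The one-tailed answer above can be reproduced with Python's `statistics` module. Because the standard library has no t distribution, this sketch uses the tabulated critical value 1.833 quoted above rather than computing it.

```python
import math
import statistics

heights = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]
mu0 = 64   # hypothesized mean height (inches)

n = len(heights)
mean = statistics.fmean(heights)
s = statistics.stdev(heights)             # sample SD, divisor n - 1
t = (mean - mu0) / (s / math.sqrt(n))     # one-sample t statistic with n - 1 = 9 d.f.

T_CRIT_ONE_TAILED = 1.833                 # tabulated t, 9 d.f., alpha = 0.05 (one-tailed)
print(f"mean = {mean:.2f}, s = {s:.2f}, t = {t:.2f}")
print("reject H0" if t > T_CRIT_ONE_TAILED else "do not reject H0")
```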
Student’s t Distributions
Does the adjustment of using s to estimate σ have an effect on the
statistical test? It does, especially for small samples: the
normal distribution is no longer appropriate as the
sampling distribution of the mean.
At the beginning of the 20th century William S. Gosset found that,
for small samples, the sampling distribution departed substantially
from the normal distribution and that, as sample sizes changed, the
distributions changed.
This gave rise to not one distribution but a family of distributions.
The t distributions are a family of symmetrical, bell-shaped
distributions that change as the sample size changes.
Degrees of Freedom
Degrees of Freedom : The number of degrees
of freedom is a mathematical concept defined
as the number of observations less the
number of restrictions placed on them.
[Figure: Student’s t distribution for 1, 2, 5, 10, and ∞ degrees of freedom]
Point Estimates and Interval Estimates
A point estimate is a single value that represents the
best estimate of the population value. If we are
estimating the mean of a population (µ), then the
sample mean is the best point estimate.
Interval estimation builds on point estimation to arrive
at a range of values that are tenable for the
parameter and that define an interval we are
confident contains the parameter.
Confidence Interval
$$CI = \bar{x} \pm (z_{CV})(\sigma_{\bar{x}})$$
where
x̄ = sample mean
z_CV = critical value using the normal distribution, and
σx̄ = standard error of the mean
Confidence Interval
$$CI = \bar{x} \pm (t_{CV})(s_{\bar{x}})$$
where
x̄ = sample mean
t_CV = critical value using the appropriate t distribution, and
sx̄ = estimated standard error of the mean from the
sample
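The normal-based interval can be sketched in Python, applied here to the GRE example (x̄ = 535, σ = 100, n = 144) as an illustration; the function name `z_confidence_interval` is hypothetical. The t-based interval has the same form, with t_CV taken from a t table and sx̄ = s/√n.

```python
from statistics import NormalDist

def z_confidence_interval(xbar: float, sigma: float, n: int, level: float = 0.95):
    """Confidence interval x̄ ± z_CV · σx̄ for a mean with known σ."""
    z_cv = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    se = sigma / n ** 0.5                              # standard error of the mean
    return xbar - z_cv * se, xbar + z_cv * se

low, high = z_confidence_interval(535, 100, 144)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The hypothesized value 455 lies far outside this interval, consistent with rejecting H0: µ = 455 for x̄ = 535.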
Comparison of Two Means
• Q= As part of an investigation of the development of infant sleep patterns,
the sleep of 20 infants (10 male and 10 female) was monitored on several
occasions between 1 week and 6 months of age. The quiet sleep results
(in minutes) at 1 week of age for the 20 study infants are given below.
• Is there evidence of a difference in quiet sleep behavior between the two
genders?
• Is there evidence that the mean quiet sleep of male infants is higher than
that of female infants?
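Quiet sleep (male):    85 129 215 143 44 173 230 198 105 127    Mean = 144.90
Quiet sleep (female):  140 155 33 209 166 72 116 131 97 124     Mean = 124.30
$$t = \frac{\bar{x}_m - \bar{x}_f}{S_p\sqrt{\frac{1}{n_m} + \frac{1}{n_f}}}, \qquad S_p^2 = \frac{(n_m - 1)S_m^2 + (n_f - 1)S_f^2}{n_m + n_f - 2}$$
where S_p² is the pooled variance and S_m² and S_f² are the variances of the two samples.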
Contd… Answer
For male: S1 = 59.35; S1² = 3522.54; Mean = 144.90
For female: S2 = 49.48; S2² = 2448.011; Mean = 124.30
• t = 0.843 at 18 d.f.
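The pooled two-sample t statistic can be reproduced from the data with Python's `statistics` module. This sketch computes only the statistic and its degrees of freedom; the pooled formula assumes the two populations have equal variances, and the comparison with a tabulated t value is left as in the exercise.

```python
import math
import statistics

male = [85, 129, 215, 143, 44, 173, 230, 198, 105, 127]
female = [140, 155, 33, 209, 166, 72, 116, 131, 97, 124]

n1, n2 = len(male), len(female)
v1 = statistics.variance(male)     # sample variance, divisor n - 1
v2 = statistics.variance(female)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference in means
t = (statistics.fmean(male) - statistics.fmean(female)) / se
df = n1 + n2 - 2

print(f"S1² = {v1:.2f}, S2² = {v2:.2f}, pooled variance = {sp2:.2f}")
print(f"t = {t:.3f} at {df} d.f.")
```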
Paired-t-test
• As part of a study to determine the effects of a certain oral contraceptive
on weight gain, nine healthy females were weighed at the beginning of a
course of oral contraceptive use. They were reweighed after 3 months.
The results are given below. Do the results suggest evidence of weight gain?
• Longitudinal Study/Real-Cohort Study
Subject    Initial weight (lbs)    Weight at 3 months (lbs)
1          120                     123
2          141                     143
3          130                     140
4          150                     145
5          135                     140
6          140                     143
7          120                     118
8          140                     141
9          130                     132
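• Contd… Answer
• t = 1.509
• One-tailed
• Tabulated value of t at α = 0.05 and d.f. = 8 is 1.860 (one-tailed).
• Since 1.509 < 1.860, we do not reject H0: the data do not provide
evidence of weight gain at the 5% level of significance.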
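The paired t statistic is a one-sample t test on the within-subject differences. A sketch reproducing t = 1.509, with differences taken as 3-month weight minus initial weight so that a positive mean indicates gain:

```python
import math
import statistics

initial = [120, 141, 130, 150, 135, 140, 120, 140, 130]
after_3m = [123, 143, 140, 145, 140, 143, 118, 141, 132]

diffs = [a - b for a, b in zip(after_3m, initial)]   # weight gain per subject (lbs)
n = len(diffs)
d_bar = statistics.fmean(diffs)     # mean gain
s_d = statistics.stdev(diffs)       # SD of the differences, divisor n - 1
t = d_bar / (s_d / math.sqrt(n))    # paired t statistic with n - 1 = 8 d.f.

T_CRIT_ONE_TAILED = 1.860           # tabulated t, 8 d.f., alpha = 0.05 (one-tailed)
print(f"mean gain = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.3f}")
print("reject H0" if t > T_CRIT_ONE_TAILED else "do not reject H0")
```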
Male Female
42.1 41.3 42.4 43.2 41.8 42.7 43.8 42.5 43.1 44.0
41.0 41.8 42.8 42.3 42.7 43.6 43.3 43.5 41.7 44.1
Do the data provide sufficient evidence to conclude that, on the
average, the male weight is greater than female weight? Perform
the required hypothesis test at the 5% level of significance.
Proportion Test
• Q=1 In a sample of 1000 people in Maharashtra, 540 are rice
eaters and the rest are wheat eaters. Can we assume that
both rice and wheat are equally popular in this state at 1%
level of significance?
Z tabulated at 1% level of significance is 2.58 (two-tailed).
Q=2 Twenty people were attacked by a disease and only 18
survived. Will you reject the hypothesis that the survival rate,
if attacked by this disease, is 85%, in favour of the hypothesis
that it is more, at the 5% level?
Z tabulated at 5% level of significance is 1.645 (one-tailed).
Q=3 In a year there are 956 births in a town A of which 52.5%
were males, while in towns A and B combined, this proportion
in a total of 1406 births was 0.496. Is there any significant
difference in the proportion of male births in the two towns?
Z tabulated at 5% level of significance is 1.96 (two-tailed).
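The test statistic for a single proportion compares the sample proportion p̂ with the hypothesized p0, using the standard error √(p0(1 − p0)/n) computed under H0. A minimal sketch, applied to Q=1 for illustration; the function name `one_proportion_z` is hypothetical, and the normal approximation assumes the sample is large enough.

```python
import math

def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """z statistic for H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    return (p_hat - p0) / se

z = one_proportion_z(540, 1000, 0.5)   # Q=1: 540 rice eaters out of 1000
Z_CRIT = 2.58                          # two-tailed, 1% level of significance
print(f"z = {z:.2f}")
print("reject H0" if abs(z) > Z_CRIT else "do not reject H0")
```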

Lecture_Hypothesis_Testing statistics .pptx

  • 1.
    Statistical Inference andHypothesis Testing by Dr. Priyanka Dixit TISS, Mumbai
  • 2.
    Descriptive and InferentialStatistics • Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, pattern might emerge from the data. It do not, however, allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made. • It is applicable to properly describe data through statistics and graphs.
  • 3.
    Inferential Statistics • Inferentialstatistics are techniques that allow us to use these samples to make generalization about the populations from which the samples were drawn.
  • 4.
    Statistical Inference The processof generalization in prescribed manner from a sample to its universe is known as Statistical Inference. Universe/ Population µ σ SAMPL E Population Parameters µ: Population mean σ: Population standard deviation Sample Statistic x: Sample mean s: Sample standard deviation X s
  • 5.
    Statistical Inference • InductiveInference: Extension from particular to the general is called inductive inference. • Inductive inference involves element of uncertainty in the conclusions.
  • 6.
    • Deductive Inference •Deductive inference can be described as a method of deriving information from the accepted facts, involves no uncertainty in the conclusions. The conclusions reached by deductive inference are conclusive.
  • 7.
    Population and Sample •The population is an abstract term that refers to the totality of all conceptually possible observations, measurements or outcomes of some specified kind. • The number of conceptually possible observations is called the size of the population. • The size varies according to the population being investigated.
  • 8.
    Contd… • For example,a study of monthly income may be conducted at a district, state and country level. • So, in the first case, the population will consist of the income of one district, all residents of the state in the second case and in the third case income of all citizens of the country. • A population may be finite when it consists of a given number of observations and infinite when it includes infinite number of observations.
  • 9.
    Sample • A sampleis a set of observations selected from the population. • The number of observations included in the sample is called the size of the sample. • In finite population, a random sample is obtained by giving every individual in the population an equal chance of being chosen. • In case of infinite population, a sample is random if each observation is independent of every other observation.
  • 10.
    Parameter/Statistics • Population andsamples are studied through their characteristics. The most important of these characteristics are the Mean, the Variance and the Standard deviation. • The characteristics of a population are called parameters. • The characteristics of sample are called statistics.
  • 11.
    Parameters (Population) Statistics (Sample) Population MeanSample Mean Population Variance Sample Variance Population Standard Deviation Sample Standard Deviation
  • 12.
    • The purposeof statistical inference is to make a judgment about the particular parameters on the basis of sample statistics. • The judgment relating to population parameters are of two types; one is related to estimation of a parameter, the other with testing hypothesis about the parameter.
  • 13.
    Hypothesis Testing Hypothesis testingin inferential statistics involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population. The hypothesis is tested against the information provided by sample in the form of a test-statistic. What is Statistical Hypothesis? A Hypothesis is a statement about one or more population parameters.
  • 14.
    Null Hypothesis What isnull hypothesis? A null hypothesis (H0) is a hypothesis of no relationship or no difference. Steps in hypothesis testing 1. State the Hypothesis 2. Set the criterion for rejecting H0 3. Compute the test statistic 4. Decide whether to reject H0
  • 15.
    1. State theHypothesis In inferential statistics, the term hypothesis has a very specific meaning: conjecture about one or more population parameters. The hypothesis to be tested is called the null hypothesis and is given the symbol H0. Example: We use a null hypothesis that the mean quantitative GRE score of the population of MPH students is 455. Thus, our null hypothesis, written in symbols, is H0: µ = 455 OR H0: µ-455 = 0 Where µ = population mean 455= Hypothesis value to be tested
  • 16.
    We test thenull hypothesis (H0) against the alternative hypothesis (symbolized H1), which includes the possible outcomes not covered by the null hypothesis. For the above example we will use the alternative hypothesis as H1 : µ ≠ 455 The alternative hypothesis, often considered the research hypothesis, can be supported only be rejecting the null hypothesis.
  • 17.
    2. Set theCriterion for Rejecting H0 After stating the hypothesis the next step in hypothesis testing is determining how different the sample statistic must be from the hypothesized population parameter (µ) before the null hypothesis can be rejected. For our example, suppose we randomly select 144 MPH students from the population and find the sample mean to be 535. Is this sample mean =535 sufficiently different from what we hypothesize for the population mean (µ = 455) to warrant rejecting null hypothesis. Before answering this question, we need to consider three concepts: (i) errors in hypothesis testing, (ii) level of significance, and (iii) Region of rejection
  • 18.
    Properties of NormalDistribution 8. The areas of a normal curve are measured in standard deviation units. The proportions of cases in specified areas of a normal curve, as marked by standard deviations, are constant as detailed below: Number of standard Results lying outside deviation from mean this (%) 1.00 31.74 1.64 10.00 1.96 5.00 2.58 1.00 3.29 0.10
  • 19.
    i. Errors inhypothesis testing When we decide to reject or not reject the null hypothesis, there are four possible situations: a. A true hypothesis is rejected. b. A true hypothesis is not rejected. c. A false hypothesis is not rejected d. A false hypothesis is rejected
  • 20.
    In a specificsituation, we may make one of two types of errors, as shown in the figure below: Decision made State of nature Null hypothesis is true Null hypothesis is false Reject null hypothesis Type I error Correct decision Do not reject null hypothesis Correct decision Type II error
  • 21.
    Example Verdict of Jury Defendant GuiltyInnocent Not Guilty Incorrect Correct decision Guilty Correct decision Incorrect
  • 22.
    Contd… Errors Type Ierror is when we reject a true null hypothesis. Type II error is when we do not reject a false null hypothesis
  • 23.
    ii. Level ofsignificance • To choose the criterion for rejecting H0, the researcher must first select what is called the level of significance. • The level of significance or alpha (α) level is defined as the probability of making a Type I error when testing a null hypothesis. • The level of significance is the probability of making a Type I error: rejecting H0 when it is true.
  • 24.
    Power of theTest • Type II error involves acceptance of H0 when it is actually false or not finding an effect when actually there is an effect. • β is the probability of type II error. • (1-β) is called the power of the test= Probability of finding an effect when actually there is an effect. • Power of a statistical test is analogous to the sensitivity of a diagnostic test. • α being the false positive. • β being the false negative.
  • 25.
    iii. Region ofRejection • The region of rejection is the area of the sampling distribution that represents those values of the sample mean that are improbable if the null hypothesis is true. • The Critical values of the tests statistic are those values in the sampling distribution that represent the beginning of the region of rejection. • When the alternative hypothesis is non-directional, the region of rejection is located in both tails of the sampling distribution. The test of the null hypothesis against this non- directional alternative is called a two-tailed test. • The probability of obtaining a mean as extreme as or more extreme than the observed sample mean (xbar), given that the null hypothesis is true, is called the p-value of the test or p.
  • 26.
    Properties of NormalDistribution 8. Properties of Normal Distribution The areas of a normal curve are measured in standard deviation units. The proportions of cases in specified areas of a normal curve, as marked by standard deviations, are constant as detailed below: Number of standard Results lying outside deviation from mean this (%) 1.00 31.74 1.64 10.00 1.96 5.00 2.58 1.00 3.29 0.10
  • 27.
    Region of rejectionfor sampling distribution of the mean for null hypothesis H0 : µ = 455 and S.D. (σx) = 8.33
  • 28.
    3. Compute theTest Statistic In our example µ=455, the hypothesized value for the parameter n=144, the size of the sample = 535, the observed value for the sample statistic σ=100, the value of the standard deviation in the population First using the concept of z scores, we determine how Different is from µ, or the number of standard errors (standard deviation units) the observed sample value is from the hypothesized value. In symbols,
  • 29.
    calculating the zscore using above formula is called computing the test statistic
  • 30.
    4. Decide aboutH0 Suppose we had found that the sample mean for 144 students was not 535, but 465. Our hypotheses, sampling distribution, and critical values (+1.96 and -1.96) remain the same, but now the test statistic is
  • 31.
    In other words,the observed sample mean ( = 465) is 1.20 standard errors above the hypothesized value of the population mean.
  • 32.
    Theoretical sampling distributionfor the hypothesis H0:µ=45, illustrating the values of the test statistic when =465 Note that the test statistic (1.20) does not exceed the critical value; it does not fall into the region of the rejection; and we should not reject the null hypothesis . -1.96 +1.96 1.20 9.60
  • 33.
    • This teststatistic (1.20) is then compared to the critical value (1.96). • If the test statistic exceeds the critical values in absolute value, then the null hypothesis is rejected. • If the test statistic does not exceeds the critical values in absolute value, then the null hypothesis is accepted.
  • 34.
    Region of rejection: Directional Alternative Hypothesis In the GRE example, we tested the null hypothesis against a non-directional alternative: H0 : µ = 455 H1 : µ ≠ 455 This test is called two-tailed or non-directional because the region of rejection was located in both tails of the sampling distribution of the mean. Suppose a direction of the results is anticipated. A directional hypothesis states that a parameter is either greater or less than the hypothesis value. For instance, in the GRE example we might use the alternative hypothesis that the mean GRE level of our population is greater than 455, in symbols, H0 : µ = 455 H1 : µ > 455
  • 35.
    An alternative hypothesiscan be either non-directional or directional. A directional alternative hypothesis states that the parameter is greater than or less than the hypothesized value. A non-directional alternative hypothesis merely states that the parameter is different from (not equal to) the hypothesized value.
  • 36.
    The test ofthe null hypothesis against a directional alternative is called a one-tailed test, the region of rejection is located in one of the two tails of the sampling distribution. The specific tail of the distribution is determined by the direction of the alternative hypothesis. Now suppose the alternative hypothesis states that the mean GRE was less than 455. In symbols, the hypotheses are H0 : µ = 455 H1 : µ < 455 Here the critical region lies on the left tail of the distribution.
  • 37.
    Type-I and Type-IIErrors in Decision Making In a specific situation, we may make one of two types of errors, as shown in the figure below: Decision taken by the investigator Existing Reality Group A=Group B Group A # Group B Group A # Group B P[ Type-I Error] (Level of significance) Correct Decision (Power of the study) Group A=Group B Correct Decision (Level of confidence) Type – II Error
  • 38.
    Testing of Hypothesis Q=1A random sample of 100 observations from a population with standard deviation 60 yielded a sample mean of 100. (a) Test the null hypothesis that µ=100 against the alternative hypothesis (µ≠100) using α=0.05. (b) Test the null hypothesis that µ=100 against the alternative hypothesis (µ>100) using α=0.05
  • 39.
    Testing of Hypothesis Ex=1A random sample of 200 observations from a population with standard deviation 80 yielded a sample mean of 150. (a) Test the null hypothesis that µ=100 against the alternative hypothesis (µ≠100) using α=0.05. (b) Test the null hypothesis that µ=100 against the alternative hypothesis (µ>100) using α=0.05
  • 40.
    • Ex=2 Arandom sample of 100 observations from a population with standard deviation 60 yielded a sample mean of 100. • (a)Test the null hypothesis that µ=111 against the alternative hypothesis (µ≠111) using α=0.05. • (b) Test the null hypothesis that µ<=111 against the alternative hypothesis (µ>111) using α=0.05 • Explain why the results differ.
  • 42.
    Q=2 The heightsof 10 males of a given locality are found to be as follows: 70, 67, 62, 68, 61, 68, 70, 64, 64, 66 inches. Is it reasonable to believe that the average height is greater than 64 inches? What will be the finding if alternative hypothesis was two-tailed
Contd…
Answer: Mean = 66; S.D. = 3.16; Variance = 10.00; $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{66 - 64}{3.16/\sqrt{10}} = 2.00$
• The tabulated value of the t-statistic at 9 d.f. and α = 0.05 (one-tailed) is 1.833.
• Since the calculated value is greater than the tabulated value, we reject the null hypothesis and conclude that the mean height is greater than 64 inches.
• What would the finding be if the alternative hypothesis were two-tailed? (Answer it.)
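The calculation for Q=2 can be reproduced as a one-sample t-test using only Python's `statistics` module. Because the standard library has no inverse t distribution, the sketch hard-codes critical values at 9 d.f. and α = 0.05: 1.833 (one-tailed, as on the slide) and 2.262 (two-tailed, from a standard t table).

```python
import statistics

heights = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]  # inches
mu0 = 64                                             # hypothesized mean

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)        # sample SD with divisor n - 1
t = (mean - mu0) / (sd / n ** 0.5)    # one-sample t statistic, n - 1 = 9 d.f.

# Critical values of t at 9 d.f. and alpha = 0.05, read from a standard t table
T_ONE_TAILED = 1.833   # H1: mu > 64
T_TWO_TAILED = 2.262   # H1: mu != 64

reject_one_tailed = t > T_ONE_TAILED
reject_two_tailed = abs(t) > T_TWO_TAILED

print(f"mean = {mean}, s = {sd:.2f}, t = {t:.2f}")
print(f"one-tailed: reject H0 = {reject_one_tailed}")
print(f"two-tailed: reject H0 = {reject_two_tailed}")
```

Keeping both decisions side by side shows how the choice of alternative hypothesis, not the data, changes the critical value.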
Student’s t Distributions
Does the adjustment of using s to estimate σ have an effect on the statistical test? It does, especially for small samples: the normal distribution is no longer the appropriate sampling distribution of the standardized mean. At the beginning of the 20th century, William S. Gosset found that, for small samples, this sampling distribution departed substantially from the normal distribution and that it changed as the sample size changed. This gave rise to not one distribution but a family of distributions. The t distributions are a family of symmetrical, bell-shaped distributions that change as the sample size (more precisely, the degrees of freedom) changes.
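Degrees of Freedom
The number of degrees of freedom is a mathematical concept defined as the number of observations minus the number of restrictions placed on them. For example, once the mean of n observations is fixed, only n − 1 of them can vary freely, so the one-sample t-test on the 10 heights in Q=2 has 9 degrees of freedom.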
[Figure: Student’s t distribution for 1, 2, 5, 10, and ∞ degrees of freedom]
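In place of the figure, the sketch below evaluates the Student's t density for the same degrees of freedom and compares it with the standard normal density (the ∞ case). It uses the textbook density formula with `math.lgamma` for numerical stability; the evaluation points 0 and 3 are arbitrary choices to show the peak and the tail.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)); lgamma avoids overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


normal_pdf = NormalDist().pdf  # limiting case as df -> infinity

print("df        f(0)      f(3)")
for df in (1, 2, 5, 10):
    print(f"{df:<6}{t_pdf(0, df):10.4f}{t_pdf(3, df):10.4f}")
print(f"{'inf':<6}{normal_pdf(0):10.4f}{normal_pdf(3):10.4f}")
```

Reading down the columns, the peak rises and the tail density at 3 falls as the degrees of freedom increase, approaching the normal curve: small samples give heavier tails and therefore larger critical values.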
Point Estimates and Interval Estimates
A point estimate is a single value that represents the best estimate of the population value. If we are estimating the mean of a population (µ), the sample mean is the best point estimate. Interval estimation builds on point estimation to arrive at a range of values that are tenable for the parameter, defining an interval that we are confident contains the parameter.
Confidence Interval (σ known)
CI = $\bar{X} \pm (z_{CV})(\sigma_{\bar{X}})$
where $\bar{X}$ = sample mean, $z_{CV}$ = critical value from the normal distribution, and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ = standard error of the mean.
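Confidence Interval (σ estimated by s)
CI = $\bar{X} \pm (t_{CV})(s_{\bar{X}})$
where $\bar{X}$ = sample mean, $t_{CV}$ = critical value from the appropriate t distribution (n − 1 d.f.), and $s_{\bar{X}} = s/\sqrt{n}$ = estimated standard error of the mean from the sample.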
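Both interval formulas can be sketched in a few lines of Python. The z interval is applied to the Q=1 summary data (x̄ = 100, σ = 60, n = 100) and the t interval to the Q=2 heights; the t critical value 2.262 (95%, 9 d.f.) is taken from a standard t table because the standard library has no inverse t distribution.

```python
from statistics import NormalDist, mean, stdev


def z_interval(xbar, sigma, n, conf=0.95):
    """CI = xbar +/- z_cv * sigma / sqrt(n), for a known population sd."""
    z_cv = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    se = sigma / n ** 0.5
    return xbar - z_cv * se, xbar + z_cv * se


def t_interval(sample, t_cv):
    """CI = xbar +/- t_cv * s / sqrt(n); t_cv is read from a t table at n - 1 d.f."""
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / n ** 0.5   # estimated standard error of the mean
    return xbar - t_cv * se, xbar + t_cv * se


z_lo, z_hi = z_interval(100, 60, 100)   # Q=1 summary data
heights = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]
t_lo, t_hi = t_interval(heights, t_cv=2.262)  # 95% CI, 9 d.f.

print(f"95% z interval (Q=1): ({z_lo:.2f}, {z_hi:.2f})")
print(f"95% t interval (Q=2): ({t_lo:.3f}, {t_hi:.3f})")
```

The z interval for Q=1 contains 100, and also 111, which mirrors the two-tailed decisions for Q=1(a) and Ex=2(a): a two-sided test at α = 0.05 rejects exactly those µ0 values lying outside the 95% interval.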
Comparison of Two Means
• Q = As part of an investigation of the development of infant sleep patterns, the sleep of 20 infants (10 male and 10 female) was monitored on several occasions between 1 week and 6 months of age. The quiet sleep results (in minutes) at 1 week of age for the 20 study infants follow.
• Is there evidence of a difference in quiet sleep behavior between the two genders?
• Is there evidence that the mean quiet sleep of males is higher than that of females?
Quiet sleep (male): 85, 129, 215, 143, 44, 173, 230, 198, 105, 127 (Mean = 144.90)
Quiet sleep (female): 140, 155, 33, 209, 166, 72, 116, 131, 97, 124 (Mean = 124.30)
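$S_p^2 = \frac{(n_m - 1)S_m^2 + (n_f - 1)S_f^2}{n_m + n_f - 2}, \qquad t = \frac{\bar{X}_m - \bar{X}_f}{S_p\sqrt{\frac{1}{n_m} + \frac{1}{n_f}}}$ with $n_m + n_f - 2$ d.f.
where $S_p^2$ is the pooled variance ($S_p$ the pooled standard deviation), and $S_m^2$ and $S_f^2$ are the variances of the male and female samples.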
Contd…
Answer
• For males: S1 = 59.35; S1² = 3522.54; Mean = 144.90
• For females: S2 = 49.48; S2² = 2448.011; Mean = 124.30
• t = 0.843 at 18 d.f.
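The pooled two-sample t statistic for the infant sleep data can be reproduced with the sketch below, which follows the pooled-variance formula for two independent samples. The critical values at 18 d.f. and α = 0.05 (2.101 two-tailed, 1.734 one-tailed) are taken from a standard t table, since the Python standard library has no inverse t distribution.

```python
from statistics import mean, variance

male = [85, 129, 215, 143, 44, 173, 230, 198, 105, 127]      # quiet sleep, minutes
female = [140, 155, 33, 209, 166, 72, 116, 131, 97, 124]


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df  # pooled variance
    se = (sp2 * (1 / nx + 1 / ny)) ** 0.5
    return (mean(x) - mean(y)) / se, df


t, df = pooled_t(male, female)

# Critical values of t at 18 d.f., alpha = 0.05, from a standard t table
T_TWO_TAILED = 2.101   # H1: male mean != female mean
T_ONE_TAILED = 1.734   # H1: male mean > female mean

print(f"t = {t:.3f} at {df} d.f.")
print(f"two-tailed: reject H0 = {abs(t) > T_TWO_TAILED}")
print(f"one-tailed: reject H0 = {t > T_ONE_TAILED}")
```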
Paired t-test
• As part of a study to determine the effects of a certain oral contraceptive on weight gain, nine healthy females were weighed at the beginning of a course of oral contraceptive use. They were reweighed after 3 months. The results are given below. Do the results suggest evidence of weight gain?
• Design: longitudinal study / real-cohort study

Subject | Initial weight (lb) | Weight at 3 months (lb)
1 | 120 | 123
2 | 141 | 143
3 | 130 | 140
4 | 150 | 145
5 | 135 | 140
6 | 140 | 143
7 | 120 | 118
8 | 140 | 141
9 | 130 | 132
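Contd…
Answer
• $t = \bar{d}/(s_d/\sqrt{n}) = 1.509$, where d is the 3-month weight minus the initial weight
• One-tailed test
• Tabulated value of t at α = 0.05 and d.f. = 8 is 1.860 (one-tailed).
• Since the calculated value (1.509) is less than the tabulated value (1.860), we do not reject the null hypothesis: the data do not provide significant evidence of weight gain at the 5% level.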
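The paired t-test reduces to a one-sample t-test on the within-subject differences. The sketch below computes those differences from the table, the t statistic, and the one-tailed decision using the tabulated value 1.860 (8 d.f., α = 0.05) quoted on the slide.

```python
from statistics import mean, stdev

initial = [120, 141, 130, 150, 135, 140, 120, 140, 130]    # lb
after_3m = [123, 143, 140, 145, 140, 143, 118, 141, 132]   # lb

# Weight gain per subject: 3-month weight minus initial weight
diffs = [a - b for a, b in zip(after_3m, initial)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / n ** 0.5)   # one-sample t on the differences, n - 1 d.f.

T_CRIT = 1.860   # one-tailed, alpha = 0.05, 8 d.f. (value quoted on the slide)
reject = t > T_CRIT   # H1: mean gain > 0

print(f"mean gain = {mean(diffs):.2f} lb, t = {t:.3f}, d.f. = {n - 1}")
print("reject H0" if reject else "do not reject H0")
```

Pairing removes the large between-woman variation in body weight, so the test only sees the spread of the changes themselves.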
Male | Female
42.1 | 41.3
42.4 | 43.2
41.8 | 42.7
43.8 | 42.5
43.1 | 44.0
41.0 | 41.8
42.8 | 42.3
42.7 | 43.6
43.3 | 43.5
41.7 | 44.1

Do the data provide sufficient evidence to conclude that, on average, male weight is greater than female weight? Perform the required hypothesis test at the 5% level of significance.
Proportion Test
• Q=1 In a sample of 1000 people in Maharashtra, 540 are rice eaters and the rest are wheat eaters. Can we assume that rice and wheat are equally popular in this state at the 1% level of significance? Z tabulated at the 1% level of significance is 2.58 (two-tailed).
• Q=2 Twenty people were attacked by a disease and only 18 survived. Will you reject the hypothesis that the survival rate, if attacked by this disease, is 85% in favour of the hypothesis that it is more, at the 5% level? Z tabulated at the 5% level of significance is 1.645 (one-tailed).
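Both questions use the one-sample proportion z-test, z = (p̂ − p0)/√(p0(1 − p0)/n), which relies on the normal approximation to the binomial. The sketch below applies it to Q=1 (H1: p ≠ 0.5) and Q=2 (H1: p > 0.85), computing the critical values with `statistics.NormalDist` rather than reading them from a table.

```python
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), using the normal approximation."""
    p_hat = successes / n
    return (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5


z_crit_1pct_two = NormalDist().inv_cdf(1 - 0.01 / 2)  # about 2.576 (2.58 in tables)
z_crit_5pct_one = NormalDist().inv_cdf(1 - 0.05)      # about 1.645

# Q=1: H0: p = 0.5 (rice and wheat equally popular), H1: p != 0.5
z1 = proportion_z(540, 1000, 0.5)
print(f"Q=1: z = {z1:.3f}, reject H0 at 1%: {abs(z1) > z_crit_1pct_two}")

# Q=2: H0: p = 0.85, H1: p > 0.85
z2 = proportion_z(18, 20, 0.85)
print(f"Q=2: z = {z2:.3f}, reject H0 at 5%: {z2 > z_crit_5pct_one}")
```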
Q=3 In a year there were 956 births in town A, of which 52.5% were males, while in towns A and B combined, this proportion in a total of 1406 births was 0.496. Is there any significant difference in the proportion of male births in the two towns? Z tabulated at the 5% level of significance is 1.96 (two-tailed).
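One way to approach Q=3 is to recover town B's figures from the combined ones and then run a pooled two-proportion z-test, using the combined proportion 0.496 as the pooled estimate. This is a sketch of that standard approach, not necessarily the method intended on the slide; note that 52.5% and 0.496 are rounded, so the derived male count for town B is not a whole number.

```python
from statistics import NormalDist

n_a, p_a = 956, 0.525      # town A: births and proportion of males
n_ab, p_ab = 1406, 0.496   # towns A and B combined

n_b = n_ab - n_a                          # births in town B alone
p_b = (p_ab * n_ab - p_a * n_a) / n_b     # proportion of males in town B

# Pooled two-proportion z-test: the combined proportion serves as the pooled estimate
se = (p_ab * (1 - p_ab) * (1 / n_a + 1 / n_b)) ** 0.5
z = (p_a - p_b) / se
z_crit = NormalDist().inv_cdf(0.975)      # about 1.96, two-tailed at 5%

print(f"town B: n = {n_b}, proportion male = {p_b:.4f}")
print(f"z = {z:.2f}, reject H0 at 5%: {abs(z) > z_crit}")
```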