HYPOTHESIS TESTING & DATA PROCESSING
By Suresh Sundar
Data Analysis
Critical examination of the assembled and grouped data to study the
characteristics of the object under study and to determine the
patterns of relationship among the variables relating to it.
Purpose
• To summarize data into understandable and meaningful forms
• To make exact descriptions
• To identify the causal factors
• To identify the underlying complex phenomena
• To draw reliable inference from the observed data
• To make estimations or generalizations from sample surveys
Types
Descriptive:
Describes the nature of the object under study
Inferential:
Draws inferences and conclusions from the findings of a research
study
Descriptive analysis:
• It describes the population, or characteristics of the population,
under study.
• It organizes and presents data in a meaningful way
• Common measures: mean, median, mode, standard deviation, variance
Example: suppose a pet shop sells cats, dogs and fish. If 100 pets
were sold, of which 40 were dogs, then one description of the data
on pets sold would be that 40% were dogs.
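The pet shop figure and the summary measures listed above can be computed with Python's standard `statistics` module. This is only a minimal sketch: the 40-out-of-100 figure comes from the example, while the `weekly_sales` values are hypothetical and are not from the source.

```python
import statistics

# From the example: 40 of the 100 pets sold were dogs.
pets_sold = 100
dogs_sold = 40
dog_share = dogs_sold / pets_sold * 100  # percentage of pets sold that were dogs

# Hypothetical data (not from the source): pets sold in each of seven weeks.
weekly_sales = [12, 15, 15, 9, 18, 15, 16]

summary = {
    "mean": statistics.mean(weekly_sales),
    "median": statistics.median(weekly_sales),
    "mode": statistics.mode(weekly_sales),
    "variance": statistics.variance(weekly_sales),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(weekly_sales),      # sample standard deviation
}

print(f"Dogs: {dog_share:.0f}% of pets sold")
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Each of these numbers only describes the data at hand; none of them says anything about pets that were not in the data set.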
Inferential analysis:
• It draws conclusions about the population based on sample analysis
and observation
• It compares, tests and predicts data
Example: suppose we want to know the average height of all men in a
city with a population of several million residents. Measuring everyone
is impractical, so we measure a sample and infer the population average
from it.
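The height example can be sketched as follows. Everything numeric here is an assumption made for illustration: the simulated population (mean 170 cm, standard deviation 7 cm), the sample size of 400, and the use of a large-sample normal approximation for the 95% interval.

```python
import random
import statistics
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (not from the source): heights in cm of 100,000 men.
population = [random.gauss(170, 7) for _ in range(100_000)]

# In practice only a sample is observed, never the whole population.
sample = random.sample(population, 400)
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / len(sample) ** 0.5

# 95% confidence interval using the standard normal quantile (about 1.96).
z = NormalDist().inv_cdf(0.975)
ci = (sample_mean - z * std_error, sample_mean + z * std_error)

print(f"Sample mean: {sample_mean:.2f} cm")
print(f"95% interval for the population mean: {ci[0]:.2f} to {ci[1]:.2f} cm")
```

The sample mean is the estimate; the interval expresses how far the true population mean could plausibly lie from it.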
Hypothesis
• It is an assumption or a statement that may or may not be true
• In research it is a formal question that has to be resolved
• It is tested on the basis of information obtained from a sample
Hypothesis Testing
• It is a statistical test used to determine whether there is enough
evidence in a sample of data to infer that a certain condition is true
for the entire population
• They are widely used in business and industry for making decisions
Example:
How much rainfall affects plant growth
How an increase in labor affects productivity
Types
Two opposing hypotheses
• Null Hypothesis
The commonly accepted fact that researchers try to nullify
• Alternate Hypothesis
The hypothesis that the researcher is trying to prove
Null Hypothesis (Ho)
• It is the statement being tested
• Usually it is the statement of “no effect” or “no difference”
• It proposes that no statistically significant relationship exists
between the two variables in the hypothesis
• It is presumed to be true until statistical evidence rejects it in
favor of the alternate hypothesis
Example: There is no significant difference/relationship between
advertising budget and sales volume
Alternate Hypothesis (H1)
• Contrary to the null hypothesis
• It states that there is a significant difference or relationship
between the two variables under study
Example: There is a significant difference/relationship between
advertising budget and sales volume
One-tailed and two-tailed tests
One-tailed: The null hypothesis is rejected when the value of the test
statistic falls in one specified tail of its sampling distribution
Two-tailed: The null hypothesis is rejected when the value of the test
statistic falls in either of the two tails of its sampling
distribution
Example
• Consider a soft drink bottling plant which dispenses soft drinks in
bottles of 300 ml capacity. The bottling is done by an automatic
plant. Overfilling a bottle means a huge loss to the company, given
the large volume of sales, and underfilling means that customers
get less than 300 ml of drink when they are paying for 300 ml.
This could bring the company a bad reputation. The company would
therefore prefer to test whether the mean content of the bottles
is different from 300 ml.
Two-tailed/two-sided hypothesis
Ho : µ = 300ml
H1 : µ ≠ 300ml
One-tailed/one-sided hypothesis
Ho : µ = 300ml
H1 : µ > 300ml (or)
H1 : µ < 300ml
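The difference between the two-sided and one-sided alternatives shows up in how the p-value is computed from the same test statistic. The sketch below assumes a large-sample Z-test; the sample figures (50 bottles, mean 301.2 ml, standard deviation 4 ml) and the helper name `p_values` are hypothetical and not from the source.

```python
from statistics import NormalDist

def p_values(z):
    """Return the p-value of a z statistic under each alternative hypothesis."""
    std_normal = NormalDist()
    right = 1 - std_normal.cdf(z)   # H1: mu > 300 (right tail only)
    left = std_normal.cdf(z)        # H1: mu < 300 (left tail only)
    two = 2 * min(left, right)      # H1: mu != 300 (both tails)
    return {"two_tailed": two, "right_tailed": right, "left_tailed": left}

# Hypothetical sample (not from the source).
n, sample_mean, sample_sd = 50, 301.2, 4.0
mu0 = 300.0  # value of the mean under Ho

z = (sample_mean - mu0) / (sample_sd / n ** 0.5)
result = p_values(z)

print(f"z = {z:.3f}")
for alternative, p in result.items():
    print(f"{alternative}: p = {p:.4f}")
```

For the same data, the two-tailed p-value is twice the smaller one-tailed p-value, which is why the choice of alternative has to be made before looking at the sample.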
Errors
• The acceptance or rejection of a hypothesis is based upon sample
results, and there is always a possibility of the sample not being
representative of the population.
• This could result in errors, as a consequence of which the inferences
drawn could be wrong.
             Accept Ho           Reject Ho
Ho True      Correct decision    Type 1 error
Ho False     Type 2 error        Correct decision
Types
Type 1 Error: The null hypothesis Ho is rejected when it is actually
true. The probability of this error is denoted by α and is termed the
level of significance.
Type 2 Error: The null hypothesis Ho is accepted when it is actually
false. The probability of this error is conventionally denoted by β.
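A small simulation can make the two error types concrete. This is a sketch under stated assumptions, not part of the source: a two-tailed Z-test with known standard deviation, hypothetical values `MU0 = 300`, `SIGMA = 5`, `N = 30`, and a hypothetical true shift of 2 units for the Type 2 case.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is reproducible

ALPHA = 0.05
MU0, SIGMA, N = 300.0, 5.0, 30  # hypothetical values, not from the source
critical = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value

def rejects_h0(true_mean):
    """Draw one sample from a population with the given mean and test Ho: mu = MU0."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU0) / (SIGMA / N ** 0.5)
    return abs(z) > critical

TRIALS = 5000

# Type 1 error: Ho is true (true mean equals MU0) but the test rejects it.
type1_rate = sum(rejects_h0(MU0) for _ in range(TRIALS)) / TRIALS

# Type 2 error: Ho is false (true mean is MU0 + 2) but the test accepts it.
type2_rate = sum(not rejects_h0(MU0 + 2.0) for _ in range(TRIALS)) / TRIALS

print(f"Estimated Type 1 error rate: {type1_rate:.3f}")
print(f"Estimated Type 2 error rate: {type2_rate:.3f}")
```

The estimated Type 1 rate should come out close to the chosen α, while the Type 2 rate depends on how far the true mean is from the hypothesized one and on the sample size.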
Limitations
• Hypothesis testing is not decision making itself, but it helps in
decision making
• It does not explain why a difference exists; it only indicates whether
the difference is due to fluctuations in sampling or to other reasons.
• Tests are based on probabilities, so their conclusions cannot be
stated with full certainty.
• Inferences based on significance tests cannot be taken as
entirely correct evidence regarding the truth of a hypothesis.
Steps in testing of hypothesis
1. Setting up of a hypothesis
2. Setting up of a suitable significance level
3. Determination of a test statistic
4. Determination of critical region
5. Computing the value of test statistic
6. Making decisions
1. Setting up of a hypothesis
• The first step is to establish the hypothesis to be tested (an
assumption about the value of the population parameter)
Null Hypothesis (Ho)
Alternate Hypothesis (H1)
• The two hypotheses are formulated in such a way that if one is true
the other is false, and vice versa
Criteria for hypothesis formulation
• It should be empirically testable, whether it is right or wrong
• It should be specific and precise
• It should specify the variables between which the relationship is to be
established
• It should describe one issue only
• It must be consistent with known facts
2. Setting a suitable significance level (α)
• α denotes the probability of rejecting the null hypothesis when it is
true
• It varies from problem to problem, but is usually taken as either 5%
or 1%
• A 5% level of significance means that there are 5 chances out of 100
that a true null hypothesis will be rejected when it should be accepted.
• It means that, when the null hypothesis is true, the test reaches the
right decision 95% of the time.
• Therefore the confidence with which a researcher rejects or accepts a
null hypothesis depends upon α.
3. Determination of test statistic
• It is a standardized value that is calculated from sample data during
hypothesis testing.
• It measures the degree of agreement between the sample data and
what is expected under the null hypothesis.
• The further the test statistic lies from the value expected under the
null hypothesis, the smaller the p-value and the more likely you are to
reject the null hypothesis.
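The relationship between the size of the test statistic and the p-value can be checked numerically. The sketch assumes a two-tailed Z-test; the function name `two_tailed_p` and the list of z values are chosen for illustration only.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value of a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Illustrative z values, from weak to strong evidence against Ho.
z_values = [0.5, 1.0, 1.96, 2.58, 3.0]
p_list = [two_tailed_p(z) for z in z_values]

for z, p in zip(z_values, p_list):
    print(f"|z| = {z:<4}  p-value = {p:.4f}")
```

Each step up in |z| gives a smaller p-value, and |z| = 1.96 corresponds to a p-value of about 0.05.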
Types of test statistic
Hypothesis test     Test statistic
Z-test              Z-score
T-test              T-score
ANOVA               F-statistic
Chi-square test     Chi-square statistic
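For reference, the statistics in the table are commonly written as below. These are the usual textbook forms and are not given in the source; the symbols are assumptions: X̄ is the sample mean, µ0 the hypothesized mean, σ the known population standard deviation, s the sample standard deviation, n the sample size, MS the mean squares in one-way ANOVA, and O_i and E_i the observed and expected frequencies.

```latex
\begin{aligned}
Z &= \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \\
t &= \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \qquad \text{with } n - 1 \text{ degrees of freedom} \\
F &= \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}} \\
\chi^2 &= \sum_i \frac{(O_i - E_i)^2}{E_i}
\end{aligned}
```

For large samples, s is often used in place of σ in the Z formula.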
4. Determination of critical region
• The area under the sampling distribution curve is divided into two
mutually exclusive regions, called the acceptance region and the
rejection region.
• The set of values of the test statistic that lead to the rejection
of the null hypothesis is called the critical region (rejection region).
• For a significance level of α, the optimal critical region for a
two-tailed test consists of an area of α/2 in each of the right and
left hand tails of the distribution.
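The boundaries of the critical region can be read from a table or computed directly. The sketch below assumes the test statistic follows the standard normal distribution; the function name `critical_region` and its `tail` argument are hypothetical helpers, not from the source.

```python
from statistics import NormalDist

def critical_region(alpha, tail="two"):
    """Return (lower, upper) cut-offs; Ho is rejected below lower or above upper.

    A bound of None means there is no rejection region on that side.
    """
    std_normal = NormalDist()
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
        return (-c, c)
    if tail == "right":
        return (None, std_normal.inv_cdf(1 - alpha))  # area alpha in right tail
    if tail == "left":
        return (std_normal.inv_cdf(alpha), None)      # area alpha in left tail
    raise ValueError("tail must be 'two', 'right' or 'left'")

for alpha in (0.05, 0.01):
    lower, upper = critical_region(alpha)
    print(f"alpha = {alpha}: reject Ho if z < {lower:.3f} or z > {upper:.3f}")
```

At the 5% level the two-tailed cut-offs are about ±1.96, and at the 1% level about ±2.576; a one-tailed test at 5% puts the whole area in one tail, giving a cut-off of about 1.645.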
5. Computing the value of the test statistic
• The next step is to compute the value of the test statistic based on a
random sample of size ‘n’.
• Then we have to examine whether it falls in the critical/rejection
region or acceptance region.
6. Decision making
• If the value of the test statistic falls within the acceptance region,
the null hypothesis is accepted; if it falls within the critical region,
it is rejected.
• If the hypothesis is being tested at the 5% level of significance, it
is rejected if a result at least as extreme as the one observed has a
probability of less than 5% under the null hypothesis.
• In that case the difference between the sample statistic and the
hypothesized population parameter is considered to be significant;
otherwise it is not.
Example
A sample of 200 bulbs made by a company gives a lifetime mean of
1540 hours with a standard deviation of 42 hours. Is it likely that the
sample has been drawn from a population with a mean lifetime of 1500
hours? You may use 5% level of significance.
Solution:
Sample size n = 200
Sample mean X = 1540 hrs
Standard deviation s = 42 hrs
Ho : µ = 1500 (the bulbs have a mean life of 1500 hrs)
H1 : µ ≠ 1500 (the bulbs don’t have a mean life of 1500 hrs)
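Z = (X - µ) / (s/√n)
Z = (1540 - 1500) / (42/√200)
Z = 13.47
The standard normal table value at the 5% level of significance
(two-tailed) is 1.96.
Since 13.47 > 1.96, Z falls in the critical region and the null
hypothesis is rejected.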
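The bulb example can be reproduced in a few lines of Python. This is a minimal sketch of the calculation shown in the solution; the variable names are chosen here for illustration, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example.
n = 200               # sample size
sample_mean = 1540.0  # sample mean lifetime in hours
sample_sd = 42.0      # sample standard deviation in hours
mu0 = 1500.0          # mean lifetime under Ho
alpha = 0.05          # level of significance

# Test statistic: Z = (X - mu) / (s / sqrt(n))
z = (sample_mean - mu0) / (sample_sd / sqrt(n))

# Two-tailed critical value at the chosen level of significance.
critical = NormalDist().inv_cdf(1 - alpha / 2)

# Decision: reject Ho if Z falls in the critical region.
reject_h0 = abs(z) > critical

print(f"Z = {z:.2f}, critical value = {critical:.2f}, reject Ho: {reject_h0}")
# prints: Z = 13.47, critical value = 1.96, reject Ho: True
```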
