The document discusses hypothesis testing and statistical inference. It begins by defining two types of statistical inference - hypothesis testing and parameter estimation. Hypothesis testing determines if sample data is consistent with a hypothesized population parameter, while parameter estimation provides an approximate value of the population parameter.
It then discusses key aspects of hypothesis testing, including stating the null and alternative hypotheses, developing an analysis plan, analyzing sample data, and deciding whether to accept or reject the null hypothesis. Examples are provided to illustrate hypothesis testing methodology and key concepts like p-values, significance levels, directional versus non-directional hypotheses, and applying the steps of hypothesis testing to evaluate a research study's results.
3. CHAPTER AGENDA
▪ Two Types of Statistical Inference
▪ Statistical inference encompasses two procedures: hypothesis testing and
parameter estimation.
▪ A hypothesis test determines if a sample of data is consistent with or contradicts
a hypothesis about the value of a population parameter, for example, the
hypothesis that its value is less than or equal to zero.
▪ A hypothesis test tells us if an effect is present or not, whereas an estimate tells
us about the size of an effect.
▪ Parameter estimation uses the information in a sample to
determine the approximate value of a population parameter.
4. Hypothesis Test
▪ A hypothesis is an educated guess about something in the world around you. It should be
testable, either by experiment or observation. For example:
A new medicine you think might work.
A way of teaching you think might be better.
It can really be anything at all as long as you can put it to the test.
What is a Hypothesis Statement?
If you are going to propose a hypothesis, it’s customary to write a statement. Your statement will
look like this: “If I… (do this to an independent variable)… then (this will happen to the
dependent variable).”
▪ A good hypothesis statement should:
▪ Include an “if” and “then” statement (according to the University of California).
▪ Include both the independent and dependent variables.
▪ Be testable by experiment, survey or other scientifically sound technique.
▪ Be based on information in prior research (either yours or someone else’s).
▪ Have design criteria (for engineering or programming projects).
5. Hypothesis Testing
▪ A case study:
▪ Let us say that the average mathematics mark of class 8th students of ABC School is 85. We
then randomly select 30 students and calculate their average score, which comes to 95.
What can be concluded from this experiment? There are two possible conclusions:
▪ These 30 students are different from ABC School’s class 8th students, hence their average
score is better, i.e. the behavior of this randomly selected sample of 30 students is different
from the population (all of ABC School’s class 8th students), or these are two different populations.
▪ There is no difference at all. The result is due to random chance only: the population
average is 85, and a sample average could come out higher or lower than 85 because there
are students with scores below and above 85.
▪ How should we decide which explanation is correct? There are various methods to help
decide this. Here are some options:
▪ Increase the sample size
▪ Test other samples
▪ Calculate the random chance probability
▪ The first two methods require more time and budget. Hence, they are not desirable when
time or budget is a constraint.
6. Parameter Estimation V/s Hypothesis Testing
▪ The hypothesis test evaluates the veracity of a conjecture about a population
parameter leading to an acceptance or rejection of that conjecture. In contrast,
estimation is aimed at providing a plausible value or range of values for the
population parameter.
▪ In this sense, estimation is a bolder endeavor and offers potentially more useful
information. Rather than merely telling us whether we should accept or reject a
specific claim such as a rule’s average return is less than or equal to zero,
estimation approximates the average return and provides a range of values
within which the rule’s true rate of return should lie at a specified level of
probability.
▪ For example, it may tell us that the rule’s estimated return is 10% and there is a
95% probability that it falls within the range of 5% to 15%. This statement
contains two kinds of estimates:
▪ a point estimate, that the rule’s return is 10%, and
▪ an interval estimate, that the return lies in the range 5% to 15%.
▪ The rule studies discussed in Part Two use estimation as an adjunct to the
hypothesis tests.
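The point estimate and interval estimate described above can be sketched in a few lines of Python. This is only an illustration: the list of returns is made up for the sketch (it is not data from the source), and the interval uses the normal critical value of about 1.96, which is a simplification for a sample this small (a t-based value would be wider).

```python
import math
import statistics

# Hypothetical per-period returns (in percent) for an illustrative rule.
# These numbers are invented for the sketch, not taken from the source.
returns = [12.0, 4.5, 15.2, 9.8, 7.1, 13.4, 10.6, 6.9, 11.3, 9.2]

n = len(returns)
point_estimate = statistics.mean(returns)              # sample mean
std_error = statistics.stdev(returns) / math.sqrt(n)   # standard error of the mean

# 95% interval using the standard normal critical value (about 1.96).
z_crit = statistics.NormalDist().inv_cdf(0.975)
lower = point_estimate - z_crit * std_error
upper = point_estimate + z_crit * std_error

print(f"point estimate: {point_estimate:.2f}%")
print(f"95% interval estimate: {lower:.2f}% to {upper:.2f}%")
```

The point estimate is a single number, while the interval widens as the data become noisier or the sample becomes smaller.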
7. Hypothesis Tests versus Informal Inference
▪ If a rule has been profitable in a sample of historical data, this sample statistic
is an indisputable fact.
▪ But what can be inferred about the rule’s future performance? Is it likely to be
profitable because it possesses genuine predictive power, or are profits unlikely
because its past profits were due to chance?
▪ The hypothesis test is a formal and rigorous inference procedure for deciding
which of these alternatives is more likely to be correct, and so can help us
decide if it would be rational to use the rule for actual trading in the future.
▪ Confirmatory Evidence: It’s Nice, It’s Necessary, but It Ain’t Sufficient
▪ Informal inference is biased in favor of confirmatory evidence. Informal
inference makes the mistake of assuming that confirmatory evidence is
sufficient to establish the truth of an idea.
▪ This is a logical error. Confirmatory evidence does not compel the conclusion
that the idea is true.
8. Hypothesis Tests versus Informal Inference
▪ The crucial distinction between necessary evidence and sufficient
evidence can be illustrated with the following example.
▪ Suppose we wish to test the truth of the assertion: The creature I
observe is a dog. We observe that the creature has four legs (the
evidence).
▪ This evidence is consistent with (i.e., confirmatory of) the creature
being a dog.
▪ In other words, if the creature is a dog, then it will necessarily have
four legs. However, four legs are not sufficient evidence to establish
that the creature is a dog.
▪ It may very well be another four-legged creature (cat, rhino, and so
forth).
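▪ The logical basis of the hypothesis test is falsification of the
consequent: if a hypothesis implies a consequence and that
consequence is observed to be false, the hypothesis must be false.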
9. What Is a Statistical Hypothesis?
▪ A statistical hypothesis is a conjecture about the value of a population
parameter.
▪ If the observed value is close to the hypothesized value, the reasonable
inference would be that the hypothesis is correct. If, on the other
hand, the observed sample value is far away from the hypothesized
value, the truth of the hypothesis is called into question.
▪ For example, suppose it is hypothesized that a rule’s expected return
is equal to zero, but the back test produced a return of +20%. The
conclusion of the hypothesis test may say something like the
following:
▪ If the rule’s expected rate of return were truly equal to zero, there is a 0.03
probability that the back-tested return could be equal to or greater than
+20% due to chance.
10. Example of a Statistical Hypothesis
▪ If, for example, a person wants to test that a penny has exactly a 50%
chance of landing heads, the null hypothesis would be yes, it does, and the
alternative hypothesis would be no, it does not. Mathematically, the null
hypothesis would be represented as Ho: P = 0.5. The alternative
hypothesis would be denoted as "Ha" and be identical to the null
hypothesis, except with the equal sign struck through (Ha: P ≠ 0.5),
meaning that it does not equal 50%.
▪ A random sample of 100 coin flips is taken from a random population
of coin flippers, and the null hypothesis is then tested. If it is found
that the 100 coin flips were distributed as 40 heads and 60 tails, the
analyst would assume that a penny does not have a 50% chance of
landing heads, and would reject the null hypothesis and accept the
alternative hypothesis. Afterward, a new hypothesis would be tested,
this time that a penny has a 40% chance of landing heads.
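As a rough check on the coin example, the chance of seeing a result at least as lopsided as 40 heads in 100 flips of a fair coin can be computed exactly from the binomial distribution. The sketch below is one way to do that, assuming a two-sided test and doubling the lower tail (valid here because the null distribution is symmetric); the helper name `binom_pmf` is introduced only for illustration. The resulting probability lands close to the usual 5% cutoff, so the decision can depend on the test and cutoff chosen.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips, observed_heads, p_null = 100, 40, 0.5

# Probability of 40 or fewer heads if the coin is fair.
lower_tail = sum(binom_pmf(k, n_flips, p_null) for k in range(observed_heads + 1))

# Two-sided p-value: 40-or-fewer heads or 60-or-more heads.
p_value = min(1.0, 2 * lower_tail)

print(f"P(heads <= {observed_heads}) = {lower_tail:.4f}")
print(f"two-sided p-value = {p_value:.4f}")
```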
11. Steps of Hypothesis Testing
▪ Four Steps of Hypothesis Testing
▪ All hypotheses are tested using a four-step process.
▪ The first step is for the analyst to state the two hypotheses so
that only one can be right.
▪ The next step is to formulate an analysis plan, which outlines
how the data will be evaluated.
▪ The third step is to carry out the plan and physically analyze the
sample data.
▪ The fourth and final step is to analyze the results and either
accept or reject the null.
12. Dueling Hypotheses: The Null Hypothesis versus the Alternative Hypothesis
▪ A hypothesis test, therefore, involves two
hypotheses. One is called the null hypothesis
and the other the alternative hypothesis.
▪ The alternative hypothesis, the one the
scientist would like to prove, asserts the
discovery of important new knowledge.
▪ Null hypothesis simply asserts that nothing
new has been discovered.
▪ For the TA rules tested in this book, the
alternative hypothesis asserts the rule has an
expected return greater than zero. The null
hypothesis asserts that the rule does not have
an expected return greater than zero.
▪ The two hypotheses are mutually exclusive and exhaustive: exactly one
of them is true.
13. Basics of Statistics
▪ “How should we calculate the random chance probability?”
▪ Basics of Statistics: Z-Value/ Table/ p value, Central Limit Theorem, Significance Level
14. Z-Value/ Table/ P value
▪ Z value is a measure of standard
deviation, i.e. how many standard
deviations away from the mean the
observed value is.
▪ For example, z = +1.8 can be
interpreted as the observed value
being 1.8 standard deviations above
the mean.
▪ P-values are probabilities.
▪ The z value is computed as
z = (X − μ) / σ, where X is the point on
the curve, μ is the mean of the
population and σ is the standard
deviation of the population.
▪ If the population distribution is not
normal, we resort to the Central Limit
Theorem.
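A small Python sketch of the z value and the probability a z-table would report for it. The raw score, mean and standard deviation below are hypothetical numbers chosen so that z works out to +1.8, and `z_score` is a helper name introduced only for this illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical values: observed value 68, population mean 50, sigma 10.
z = z_score(68.0, 50.0, 10.0)

std_normal = NormalDist()      # standard normal: mean 0, sd 1
below = std_normal.cdf(z)      # area to the left of z (what a z-table lists)
above = 1 - below              # area to the right of z

print(f"z = {z:+.2f}")
print(f"P(Z < z) = {below:.4f}, P(Z >= z) = {above:.4f}")
```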
15. Central Limit Theorem
▪ Let’s look at the case below. Here, we have a data of
1000 students of 10th standard with their total marks.
Following are the derived key metrics of this population:
▪ Now, let’s take a sample of 40 students from this
population. How many samples can we take? Splitting the
population into non-overlapping groups gives 25 samples
(1000/40 = 25). Can we say that every sample will have the
same average marks as the population (48.4)? Ideally that
would be desirable, but in practice every sample is unlikely
to have the same average.
▪ Here we have taken 1000 samples of 40 students
(random samples generated in Excel). Let’s look at the
frequency distribution of these 1000 sample averages
and other statistical metrics:
▪ Is this some kind of distribution you can recall? Probably
not. These marks have been randomly distributed to all
the students.
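The sampling experiment on this slide can be reproduced in Python instead of Excel. The sketch below is an approximation under stated assumptions: the population is 1000 randomly generated marks between 0 and 100 (not the deck's actual data, so the mean will not be exactly 48.4), and each of the 1000 samples draws 40 students without replacement.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed population: 1000 students with marks drawn uniformly from 0..100.
population = [random.randint(0, 100) for _ in range(1000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

# Draw 1000 random samples of 40 students and record each sample average.
sample_size = 40
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(1000)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"population mean = {pop_mean:.2f}, population sd = {pop_sd:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"sd of sample means = {sd_of_means:.2f}")
print(f"sigma / sqrt(n) = {pop_sd / sample_size ** 0.5:.2f}")
```

Even though the marks themselves follow no recognizable bell shape, the sample averages cluster around the population mean with a much smaller spread, which is the behavior the Central Limit Theorem describes.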
16. Central Limit Theorem
▪ Suppose we have calculated the random chance
probability and it comes out to be 40%. Should we go with
the first conclusion or the other one? Here the
“Significance Level” will help us decide.
17. What is Significance Level?
▪ We have assumed that the probability of observing a sample mean of 95
is 40%, which is high. With a probability this high, it is more likely that
the result occurred due to randomness and not due to a behavior
difference.
▪ Had the probability been much lower, say 7%, randomness would be a
less convincing explanation and a behavior difference more plausible.
In short, a high probability favors the randomness explanation and a low
probability favors a behavior difference.
▪ Now, how do we decide what is high probability and what is low
probability?
▪ In general, across most domains, a cutoff of 5% is accepted. This 5% is called
the Significance Level, also known as the alpha level (symbolized as α).
▪ It means that if the random chance probability is less than 5%, we can
conclude that there is a difference in behavior between the two populations.
(1 − Significance Level) is known as the Confidence Level, i.e. at the 95%
confidence level we conclude that the result is not driven by randomness.
18. Directional/ Non Directional Hypothesis Testing
▪ In the previous example, our null hypothesis was that there is no difference,
i.e. the mean is 100, and the alternate hypothesis was that the mean is greater
than 100.
▪ But we could also set the alternate hypothesis as: the mean is not
equal to 100.
▪ The question is “Which alternate hypothesis is more suitable?”
▪ Certain points will help you decide which alternate hypothesis is
suitable:
▪ You are not interested in testing whether the sample mean is lower than 100;
you only want to test for a greater value.
▪ You have a strong belief that the impact of raw cornstarch is greater.
▪ In the above two cases, we go with a one-tailed test.
▪ In a one-tailed test, the alternate hypothesis states that the mean is greater
than (or less than) the hypothesized value, so it is also known as a
Directional Hypothesis test.
19. Directional/ Non Directional Hypothesis Testing
▪ On the other hand, if you don’t know whether the impact being tested is
greater or lower, then we go with a two-tailed test, also known as a Non
Directional Hypothesis test.
▪ Let’s say a research organization is coming up with a new method of
teaching. They want to test the impact of this method.
▪ But they do not know whether it has a positive or negative impact. In such
cases, we should go with a two-tailed test.
▪ In a one-tailed test, we reject the null hypothesis only if the sample mean is
extreme in the one direction specified (either positive or negative).
▪ But in a two-tailed test we can reject the null hypothesis in either
direction (positive or negative).
20. Directional/ Non Directional Hypothesis Testing
▪ Two-tailed test allots half of your alpha
to testing the statistical significance in
one direction and half of your alpha in
the other direction.
▪ This means that .025 is in each tail of the
distribution of your test statistic.
▪ Why 0.025 on both sides? Because the
normal distribution is symmetric.
▪ We can therefore conclude that the
rejection criterion for the null hypothesis
in a two-tailed test is 0.025 in each tail,
which is lower than 0.05, i.e. a two-tailed
test has stricter criteria for rejecting the
null hypothesis in a given direction.
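The difference between the two rejection criteria can be seen by computing the critical z values for α = 0.05. This is a minimal sketch using Python's standard library; the helper `reject_null` and the example z value of 1.8 are introduced here only for illustration.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-tailed test: all of alpha sits in a single tail.
one_tail_crit = std_normal.inv_cdf(1 - alpha)

# Two-tailed test: alpha is split, 0.025 in each tail.
two_tail_crit = std_normal.inv_cdf(1 - alpha / 2)

def reject_null(z, alpha=0.05, two_tailed=True):
    """Return True if z falls in the rejection region (upper tail if one-tailed)."""
    if two_tailed:
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z > std_normal.inv_cdf(1 - alpha)

print(f"one-tailed critical z: {one_tail_crit:.3f}")
print(f"two-tailed critical z: +/-{two_tail_crit:.3f}")
print("z = 1.8, one-tailed:", reject_null(1.8, two_tailed=False))
print("z = 1.8, two-tailed:", reject_null(1.8, two_tailed=True))
```

An observed z of 1.8 clears the one-tailed cutoff but not the two-tailed one, which is the sense in which the two-tailed criterion is stricter.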
21. Example
▪ Templer and Tomeo (2002) reported that the population mean score on
the quantitative portion of the Graduate Record Examination (GRE)
General Test for students taking the exam between 1994 and 1997 was
558 ± 139 (μ ± σ). Suppose we select a sample of 100 participants (n =
100). We record a sample mean equal to 585 (M = 585). Compute the
p-value to check whether or not we will retain the null hypothesis (μ = 558)
at the 0.05 level of significance (α = .05).
▪ Step-1: State the hypotheses.
▪ The population mean is 558.
▪ H0: μ= 558
H1: μ ≠ 558 (two tail test)
22. Example
▪ Step-2: Set up the significance level.
▪ As stated in the question, it is 5% (0.05). In a non-directional two-tailed
test, we divide the alpha value in half so that an equal proportion of area
is placed in the upper and lower tail.
▪ So, the significance level on either side is α/2 = 0.025, and the
z score associated with this (1 − 0.025 = 0.975) is 1.96.
▪ As this is a two-tailed test, an observed z score less than −1.96 or
greater than 1.96 is evidence to reject the null hypothesis.
▪ Step-3: Compute the random chance probability or z score.
▪ For this set of data: z = (585 − 558) / (139/√100) = 1.94
▪ Looking at the z-table, the cumulative probability associated with
z = 1.94 is 0.9738, i.e. the probability of a value less than 585 is 0.9738,
and the probability of a value greater than or equal to 585 is
(1 − 0.9738) = 0.0262, or about 0.03.
23. Example
▪ Step-4: Here, to make a decision,
we compare the obtained z value
to the critical values (+/- 1.96).
▪ We reject the null hypothesis if the
obtained value exceeds a critical
value in either direction.
▪ Here the obtained value (Zobt = 1.94) is
less than the upper critical value (1.96).
▪ It does not fall in the rejection
region.
▪ The decision is to retain the null
hypothesis.
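The four steps of this example can be checked with a short Python sketch. It assumes a two-tailed z test with a known population standard deviation, as in the example; the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values from the GRE example.
mu0 = 558           # hypothesized population mean (H0)
sigma = 139         # population standard deviation
n = 100             # sample size
sample_mean = 585   # observed sample mean (M)
alpha = 0.05        # significance level

std_error = sigma / sqrt(n)             # 139 / 10 = 13.9
z = (sample_mean - mu0) / std_error     # about 1.94

std_normal = NormalDist()
upper_tail = 1 - std_normal.cdf(z)      # P(Z >= z), about 0.026
p_two_tailed = 2 * upper_tail           # about 0.052

critical = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
decision = "reject H0" if abs(z) > critical else "retain H0"

print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}")
print(f"critical values = +/-{critical:.2f} -> {decision}")
```

Comparing z with the critical value and comparing the two-tailed p-value with α are equivalent ways of reaching the same decision.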
24. Hypothesis Testing
A hypothesis is an educated guess about something in the world
around you. It should be testable, either by experiment or
observation.
Statistical inference is the procedure of analyzing results and
drawing conclusions from data subject to random variation.
A hypothesis test determines if a sample of data is consistent with or
contradicts a hypothesis about the value of a population parameter.
Parameter estimation uses the information in a sample to determine
the approximate value of a population parameter.
25. Hypothesis Testing
A hypothesis test, therefore, involves two hypotheses. One is
called the null hypothesis and the other the alternative
hypothesis.
The alternative hypothesis, the one the scientist would like to
prove, asserts the discovery of important new knowledge.
Null hypothesis simply asserts that nothing new has been
discovered.
26. Hypothesis Testing
A hypothesis that is built upon a certain
directional relationship between two variables
and constructed upon an already existing
theory is called a directional hypothesis.
A non-directional hypothesis is open-ended: it
predicts that the independent variable will
influence the dependent variable, but the
nature or direction of the relationship between
the two variables is not defined or clear.
A non-directional hypothesis is used when a
two-tailed test of significance is run, and a
directional hypothesis when a one-tailed test
of significance is run.
27. Z Value & P Value
A z-score tells you how many standard deviations you are away
from the mean. If a z-score is equal to 0, it is on the mean. A
positive z-score indicates the raw score is higher than the
mean. For example, if a z-score is equal to +1, it is 1 standard
deviation above the mean.
The Z Table, also called the standard normal table, is a mathematical
table that gives the percentage of values below (to
the left of) a z-score in a standard normal distribution (SND).
A p-value measures the probability of obtaining results at least as
extreme as those observed, assuming that the null hypothesis is true.
The lower the p-value, the greater the statistical significance of the
observed difference.
28. Central limit theorem
The central limit theorem (CLT) states that the distribution of
sample means approximates a normal distribution as the sample
size gets larger, regardless of the population's distribution.
Sample sizes equal to or greater than 30 are often considered
sufficient for the CLT to hold.
A key aspect of CLT is that the average of the sample means
will equal the population mean, and the standard deviation of the
sample means will equal the population standard deviation divided
by the square root of the sample size (σ/√n).
A sufficiently large sample size can predict the characteristics of a
population more accurately.
29. Hypothesis Testing
Significance Level -
what is high probability and what is low probability?
In general, across most domains, a cutoff of 5% is accepted. This 5%
is called the Significance Level, also known as the alpha level
(symbolized as α).
It means that if the random chance probability is less than 5%,
we can conclude that there is a difference in behavior between the two
populations. (1 − Significance Level) is known as the
Confidence Level, i.e. at the 95% confidence level we conclude that
the result is not driven by randomness.
30. Monte Carlo Simulation Method
Monte Carlo simulation is a model used to predict the
probability of different outcomes when the intervention of
random variables is present.
Monte Carlo simulations help to explain the impact of risk
and uncertainty in prediction and forecasting models.
A variety of fields utilize Monte Carlo simulations, including
finance, engineering, supply chain, and science.
The basis of a Monte Carlo simulation involves assigning
multiple values to an uncertain variable to achieve multiple
results and then averaging the results to obtain an estimate.
Monte Carlo simulations assume perfectly efficient markets.
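To make the idea concrete, here is a minimal Monte Carlo sketch in Python applied to the coin example from earlier in the deck: it repeatedly simulates 100 flips of a fair coin and averages the results to estimate how often an outcome at least as lopsided as 40 heads and 60 tails turns up by chance. The number of trials, the seed and the helper name `count_heads` are assumptions made for this illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def count_heads(n_flips, p_heads=0.5):
    """Simulate n_flips coin flips and return the number of heads."""
    return sum(random.random() < p_heads for _ in range(n_flips))

trials = 10_000
extreme = 0
for _ in range(trials):
    heads = count_heads(100)
    # At least as far from 50 as the observed 40 heads, in either direction.
    if heads <= 40 or heads >= 60:
        extreme += 1

# Averaging over all simulated experiments gives the estimate.
estimate = extreme / trials
print(f"estimated random chance probability: {estimate:.4f}")
```

More trials give a more stable estimate at the cost of run time; the estimate approximates the probability that an exact calculation would give.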