2. Introduction
• The development of any science depends
upon empirical research in that area. The term
research refers to the systematic method of
defining the problem, formulating a
hypothesis, collecting the data, analyzing it
and drawing conclusions.
3. Hypothesis and Research
• The first step in research is framing a hypothesis.
• A hypothesis is a tentative statement about the
relationship between two or more variables. It is
a specific, testable prediction about what you
expect to happen in a study.
• Hypothesis testing is an act in statistics whereby
an analyst tests an assumption regarding a
population parameter.
• A hypothesis is simply the statement which is to be
proved or disproved.
4. Karl Popper's Explanation of Pseudoscience
• According to Popper, you observe the 1st swan and it is white...
• You observe the 2nd swan and it is white...
• You observe the 3rd swan and it is white...
• And you draw the conclusion that all swans are white.
• Popper saw Sigmund Freud's theory, which was based on human
behaviour and human phenomena, as built this way, on
confirming observations only.
• He said that the methods of Sigmund Freud can be used to
prove or disprove anything, so no observation can ever refute them.
• So that is pseudoscience.
5. Karl Popper on hypothesis
• Karl Popper argued that instead of collecting white
swans, you should start looking around for a black swan
to disprove the theory, and that is the reason we start
from a null hypothesis and try to show it is null and void.
• That is science!
• Null Hypothesis : All swans are not white
• Alternative Hypothesis : All swans are white
8. Three types of Hypothesis
Research Hypothesis
Consuming coffee has effect on sleep hours
Statistical Hypothesis
H0: There is no significant effect of consumption of coffee
on sleep hours (remember Karl Popper: we test the null hypothesis)
H1: There is a significant effect of consumption of coffee on
sleep hours
The two hypotheses are mutually exclusive.
Substantial Hypothesis
Null and Alternative Hypothesis are Mathematical
Opposites
9. Framing the Hypothesis
• The statement which we want to prove is the
alternative hypothesis. So our research starts by
trying to disprove the null hypothesis.
• First write the alternative hypothesis.
• Alternative Hypothesis: Mortality rate is high in
old-age patients affected by COVID-19. (all swans
are white)
• Null Hypothesis: There is no significant effect of
age on the mortality rate of COVID-19 patients. (all
swans are not white)
10. Are you confused about framing a hypothesis?
Normally, what we want to disprove is the null hypothesis.
• When we begin to test a theory, are we
looking to confirm it or to disconfirm it?
25. Normal Probability Distribution
• Gaussian probability distribution, named after Carl Friedrich Gauss
• The random variable is continuous
• Known as the Normal Law of Error, it stands out in the
history of mankind as one of the broadest
generalizations of natural philosophy
• A guiding instrument for researchers in the physical and
social sciences, medicine, agriculture and
engineering
• A tool for the analysis and interpretation of the
basic data obtained by observation and experiment
27. Normal Distribution
• The normal distribution is a probability function
that describes how the values of a variable are
distributed. It is a symmetric distribution where
most of the observations cluster around the
central peak and the probabilities for values
further away from the mean taper off equally in
both directions.
• A normal distribution has
some interesting properties: it has a bell shape,
the mean and median are equal, and about 68% of the
data falls within 1 standard deviation of the mean.
30. Characteristics of Normal Distribution
• Bell shaped curve where area under the curve is
the probability area
• Perfectly symmetrical curve
• Mean , Median and Mode lie at one point in the
middle
• Practically all of the area under the curve (about 99.7%)
lies within ±3 standard deviations of the mean
• Used when sample size is large
• The tails of the curve never meet the X axis
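The areas under the curve quoted on these slides can be checked numerically. The following is a minimal sketch using NormalDist from Python's standard library; the helper name area_within is an assumption made for illustration and does not come from the slides.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def area_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k):.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```

Because the tails never touch the X axis, even the ±3 SD band leaves a small area (about 0.27%) outside it.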
38. Central Limit Theorem
The central limit theorem states that if you have
a population with mean μ and standard
deviation σ and take sufficiently large random
samples from the population with
replacement, then the distribution of the
sample means will be approximately normally
distributed.
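A small simulation can make the theorem concrete. The sketch below assumes a made-up, strongly skewed population (exponential values), a sample size of 50 and 2,000 repeated samples; none of these numbers come from the slides. It compares the mean and spread of the sample means with μ and σ/√n.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population that is clearly NOT normal (right-skewed).
population = [rng.expovariate(1.0) for _ in range(10_000)]
mu = mean(population)
sigma = pstdev(population)

def sample_means(n: int, draws: int) -> list[float]:
    """Means of `draws` random samples of size n, drawn with replacement."""
    return [mean(rng.choices(population, k=n)) for _ in range(draws)]

n = 50
means = sample_means(n=n, draws=2_000)

print(f"population mean      = {mu:.3f}")
print(f"mean of sample means = {mean(means):.3f}")
print(f"sigma / sqrt(n)      = {sigma / sqrt(n):.3f}")
print(f"SD of sample means   = {pstdev(means):.3f}")
```

The mean of the sample means should land close to μ, and their standard deviation close to σ/√n, even though the population itself is skewed.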
39. Why Standardize
• Problem: Professor Willoughby is marking a test.
• Here are the students' results (out of 60 points):
• 20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17
• Most students didn't even get 30 out of 60,
and most will fail.
• The test must have been really hard, so the Prof
decides to standardize all the scores and only fail
people more than 1 standard deviation below the mean.
40. Solution
• The Mean is 23, and the Standard Deviation is
6.6, and these are the Standard Scores:
• -0.45, -1.21, 0.45, 1.36, -0.76, 0.76, 1.82,
-1.36, 0.45, -0.15, -0.91
• Only 2 students will fail (the ones who scored
15 and 14 on the test)
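The standard scores above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the population standard deviation (divide by N), which is what gives the 6.6 on the slide. Because it keeps the unrounded value (about 6.63), a few z-scores differ from the slide in the second decimal place.

```python
from statistics import mean, pstdev

scores = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]

mu = mean(scores)       # 23
sigma = pstdev(scores)  # population SD, about 6.63 (the slide rounds to 6.6)

# Standard score: how many standard deviations a value lies from the mean.
z_scores = [(x - mu) / sigma for x in scores]

# Fail only the students more than 1 standard deviation below the mean.
failing = [x for x, z in zip(scores, z_scores) if z < -1]

print(mu, round(sigma, 2))
print([round(z, 2) for z in z_scores])
print(failing)  # [15, 14]
```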
41. Check Normality of Data
Descriptive Statistics
Shapiro Wilk Test
SK Test-Kolmogorov–Smirnov test
SES Test
Ku Test
SEK Test
Graphical Method - Q-Q Plot
Formal Test
KST Test
SWT Test
ADT Test
RJT Test
43. Population Vs. Sample
• A population includes all of the elements from
a set of data.
• A sample consists of one or more observations
drawn from the population.
• A population may refer to an entire group of
people, objects, events, hospital visits, or
measurements.
• In statistics, a population is the entire pool
from which a statistical sample is drawn.
46. Inferential Statistics
• Inferential Statistics
– Many situations require information about a large group of
elements (individuals, companies, products, customers, etc.).
But, because of constraints of time, cost, etc., data can be
collected from only a small portion of the group
– The larger group of elements in a particular study is called the
population, and the smaller group is called the sample
– Statistics uses data from a sample to make estimates and test
hypotheses about the characteristics of a population through a
process referred to as statistical inference.
47. The Reality…
• We can rarely study a whole population, so we draw inferences from a
sample of the population
• There will always be random variation from sample to sample
• In general, smaller samples have less precision, reliability, and
statistical power (more sampling variability)
48. Parameter Vs. Statistics
• A parameter is a value that describes a
characteristic of an entire population, such as
the population mean.
• A statistic is a characteristic of a sample.
• If you collect a sample and calculate the mean
and standard deviation, these are sample
statistics.
• Inferential statistics allow you to use sample
statistics to make conclusions about a population.
50. Types of Tests
Parametric Test
Statistical tests that make assumptions
about the distribution of the population
and its parameters are known as parametric tests.
Non Parametric Test
The alternatives, which do not assume a
particular population distribution,
are known as non-parametric
tests.
52. How Do We State The Null and
Alternative Hypotheses?
H0: The means for all groups are the same
(equal).
H1: The means are different for at least one
pair of groups.
H0: μ1 = μ2 = … = μk
H1: μi ≠ μj for at least one pair of groups (i, j)
53. P value
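• The p-value is the probability, assuming the
null hypothesis is true, of obtaining a result
at least as extreme as the one observed. The
level at which we allow ourselves to reject a
true null hypothesis (a Type 1 Error) is the
significance level.
• A rule of thumb: if p-value < 0.05
(5% level of significance), we reject the
null hypothesis.
• If p-value ≥ 0.05 (5% level of
significance), we fail to reject the null
hypothesis.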
55. Hypothesis Testing Elements (cont’d.)
• Significance Level (alpha = α): the probability of a Type I error that
we are willing to accept when rejecting the null hypothesis
• Who decides the alpha level: the researcher, by convention choosing
a significance level of 1%, 5% or 10%
• Probability Value (p): the probability, under the sampling distribution
implied by the null hypothesis, of a statistic at least as extreme as the one observed
• If p < significance level (α = .05): reject the null hypothesis (statistically significant)
• If p ≥ significance level (α = .05): fail to reject the null hypothesis (statistically non-significant)
56. Tcal and Ttab to decide Hypothesis
• |Tcal| < Ttab – fail to reject the null hypothesis
• |Tcal| > Ttab – reject the null hypothesis
Tcal, Zcal and Fcal (calculated values) are obtained from formulae
Ttab, Ztab and Ftab (tabulated critical values) are obtained from tables
All software gives Tcal and Fcal along with p values
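The two decision rules (calculated versus tabulated value, and p value versus α) lead to the same decision. The sketch below illustrates this with a two-sided one-sample z test, since the standard normal table is available in Python's standard library. The sleep-hours data, the null mean of 8 hours and the known population SD of 1 hour are all made-up assumptions for illustration, and z_test is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z test with a known population SD."""
    std_normal = NormalDist()
    z_cal = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))  # from the formula
    z_tab = std_normal.inv_cdf(1 - alpha / 2)                   # from the "table"
    p_value = 2 * (1 - std_normal.cdf(abs(z_cal)))
    return z_cal, z_tab, p_value

# Hypothetical sleep hours of 9 coffee drinkers; H0: mean sleep is 8 hours.
sleep = [6.5, 7.0, 7.5, 6.0, 7.0, 8.0, 6.5, 7.5, 7.0]
alpha = 0.05
z_cal, z_tab, p = z_test(sleep, mu0=8.0, sigma=1.0, alpha=alpha)

reject_by_table = abs(z_cal) > z_tab  # |Zcal| > Ztab
reject_by_p = p < alpha               # p < alpha

print(f"Zcal = {z_cal:.2f}, Ztab = {z_tab:.2f}, p = {p:.4f}")
print("Reject H0" if reject_by_table else "Fail to reject H0")
```

With these made-up numbers Zcal is -3.00 against a Ztab of 1.96, so both rules reject the null hypothesis.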
57. Meaning of ‘significant’
• When we say that something is statistically significant, it
means that the probability of a result at least this extreme happening
by chance alone, if the null hypothesis were true, is less than our significance level.
58. Inferential Statistics
• Types of Errors: Type I and Type II
• Type I
• rejecting the null hypothesis when it is true
• in law, we don’t want to convict the innocent
• “controlled” by the alpha level (α = 1 − confidence level; e.g., α = 0.01 or 0.05 for a 99% or 95% confidence level)
• Type II
• NOT rejecting the null hypothesis when it is false
• in medicine, we’d rather treat someone who isn’t sick than NOT treat someone who is
• its probability is beta (β), which depends on the effect size and the alpha level; the power of a test is 1 − β
                     H0 is true      H0 is false
We reject H0         Type I error    OK
We don’t reject H0   OK              Type II error
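The link between the alpha level and Type I errors can be seen in a simulation. In the sketch below the null hypothesis is true by construction (every sample really comes from a population with mean 0 and SD 1), so every rejection is a Type I error. The sample size, number of trials and seed are arbitrary assumptions, and type_i_error_rate is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=5_000, seed=1):
    """Share of trials that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # H0 (mean = 0) is true: the data come from a mean-0, SD-1 population.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_cal = (sum(sample) / n) / (1.0 / sqrt(n))
        if abs(z_cal) > z_tab:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Type I error rate at alpha = 0.05: {rate:.3f}")
```

The printed rate should sit close to 0.05, which is the sense in which the alpha level "controls" the Type I error.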
59. Hypothesis Testing Elements (cont’d.)
Probability
By using inferential statistics to make decisions, we can quantify the
risk of a Type-I error: the p value we report is the probability of a
result at least as extreme as ours if the null hypothesis were true.
By reporting the p value, we alert readers to the possibility that
we were incorrect when we decided to reject the null hypothesis.
60. Normality Test in Excel
• Descriptive Statistics – Check
- Values of Mean, Median and Mode (should be close to one another)
- Values of Skewness and Kurtosis (should be
within ±2)
• Check Histogram – shape of the curve
• Check Box & Whisker Plot – symmetry and
outliers
• Alternatively, the K-S test can also be done in Excel
61. Skewness: “Refers to lack of symmetry”
• Excellent: skewness between -1 and +1
• Acceptable: skewness between -2 and +2
62. Skewness and Kurtosis
Statistical software packages will give some measure of skewness and
kurtosis for a given numeric variable.
Skewness measures departure from symmetry and is usually
characterized as being left or right skewed.
Kurtosis measures “peakedness” of a distribution and comes in two
forms, platykurtosis and leptokurtosis.
Kurtosis checks how sharply the tails taper off
64. Kurtosis: “degree of flatness and peakedness”
• Excellent: kurtosis between -1 and +1
• Acceptable: kurtosis between -2 and +2
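Skewness and kurtosis can be computed by hand to check the two bands above. The sketch below uses the simple moment-based formulas and assumes the bands refer to excess kurtosis (kurtosis minus 3, so a normal curve scores 0). Spreadsheet functions apply small-sample corrections, so their values may differ somewhat. The function names are hypothetical, and the data are the test scores from the standardizing example.

```python
from statistics import mean

def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def rating(value):
    """Map a value onto the bands: within ±1 excellent, within ±2 acceptable."""
    if abs(value) <= 1:
        return "excellent"
    if abs(value) <= 2:
        return "acceptable"
    return "outside the acceptable range"

scores = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]
skew, kurt = skewness_and_excess_kurtosis(scores)
print(f"skewness = {skew:.2f} ({rating(skew)})")
print(f"excess kurtosis = {kurt:.2f} ({rating(kurt)})")
```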
66. Normal Distribution Problem
• The average weight of girls in the Indian subcontinent
is 48 kg with a standard deviation of 3 kg. What
is the probability that a girl will weigh
a) Between 51 and 54 kg
b) Between 54 and 57 kg
c) Less than 39 kg
d) More than 57 kg
e) Between 39 and 42 kg
f) Number of girls between 42 and 45 kg if the total population
of girls is 3 crore (30,000,000)
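One way to work the problem is to let Python's NormalDist play the role of the standard normal table. This is a sketch under the problem's own assumption that weights follow a normal distribution with mean 48 kg and SD 3 kg; the helper name between is made up for illustration.

```python
from statistics import NormalDist

weight = NormalDist(mu=48, sigma=3)  # kg, as stated in the problem

def between(lo: float, hi: float) -> float:
    """Probability that a weight falls between lo and hi kg."""
    return weight.cdf(hi) - weight.cdf(lo)

answers = {
    "a) 51 to 54 kg": between(51, 54),      # z from +1 to +2
    "b) 54 to 57 kg": between(54, 57),      # z from +2 to +3
    "c) below 39 kg": weight.cdf(39),       # z below -3
    "d) above 57 kg": 1 - weight.cdf(57),   # z above +3
    "e) 39 to 42 kg": between(39, 42),      # z from -3 to -2
}
for label, p in answers.items():
    print(f"{label}: {p:.4f}")

# f) expected count = probability of 42 to 45 kg (z from -2 to -1) x population
girls_42_45 = between(42, 45) * 30_000_000
print(f"f) girls between 42 and 45 kg: about {girls_42_45:,.0f}")
```

By symmetry, (a) and the probability in (f) are both about 0.1359, (b) and (e) are both about 0.0214, and (c) and (d) are both about 0.0013, so (f) comes to roughly 4.08 million girls.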