3. What is statistics?
Statistics is the science of collecting, analyzing, presenting, and
interpreting data, as well as of making decisions based on such
analyses. (To test hypotheses and answer research questions)
According to Horace Secrist, statistics means “aggregate of facts
affected to a marked extent by multiplicity of causes, numerically
expressed, enumerated or estimated according to a reasonable
standard of accuracy, collected in a systematic manner for a
predetermined purpose and placed in relation to each other.”
4. Educated guesses vs pure guesses
It helps us to make educated and intelligent decisions
Statistics has 2 aspects/2 word meanings - theoretical and applied
Theoretical or mathematical statistics - deals with the development,
derivation, and proof of statistical theorems, formulas, rules, and
laws
Applied statistics- applications of those theorems, formulas, rules,
and laws to solve real-world problems.
5. Descriptive Vs Inferential
Data set, element, observation
Descriptive statistics - consists of methods for organising,
displaying, and describing data by using tables, graphs, and
summary measures
Inferential statistics - what is a population? sample? Inferential
statistics consists of methods that use sample results to help make
decisions or predictions about a population
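A minimal Python sketch of the sample-to-population idea, using made-up data. The population of 1,000 marks, the sample size of 50, and the seed are all assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: marks (0-100) of 1,000 students (made-up data).
population = [random.randint(0, 100) for _ in range(1000)]

# A sample is a portion of the population selected for study.
sample = random.sample(population, 50)

# Descriptive step: summarise the sample that was actually observed.
sample_mean = statistics.mean(sample)

# Inferential step: the sample mean serves as an estimate of the
# population mean, which in practice is usually unknown.
population_mean = statistics.mean(population)

print(f"sample mean = {sample_mean:.2f}, population mean = {population_mean:.2f}")
```

The two means will generally differ a little; inferential methods deal with how much trust to place in such an estimate.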
6. Example: number of deaths by cause
February 6, 2015 (Source: www.cdc.gov).
Cause of Death             Number of Deaths
Heart disease              611,105
Cancer                     584,881
Accidents                  130,557
Stroke                     128,978
Alzheimer’s disease        84,767
Diabetes                   75,578
Influenza and Pneumonia    56,979
Suicide                    41,149
a. What is the
b. How many
c. How many
Identify the variables, data set, element, observations
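One possible reading of this exercise, sketched in Python: the cause-of-death table is the data set, each cause is an element, "Number of Deaths" is the variable, and each count is an observation. The names `deaths`, `elements`, and `observations` are illustrative choices, not part of the slides.

```python
# The data set: one entry per element (cause of death), taken from the table.
deaths = {
    "Heart disease": 611_105,
    "Cancer": 584_881,
    "Accidents": 130_557,
    "Stroke": 128_978,
    "Alzheimer's disease": 84_767,
    "Diabetes": 75_578,
    "Influenza and Pneumonia": 56_979,
    "Suicide": 41_149,
}

variable = "Number of Deaths"         # the characteristic being recorded
elements = list(deaths.keys())        # the members the data are collected on
observations = list(deaths.values())  # the value of the variable for each element

print(f"variable: {variable}")
print(f"{len(elements)} elements, {len(observations)} observations")
print("element with the largest observation:", max(deaths, key=deaths.get))
```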
7. Scope of Statistics
Raw data into comprehensible form; convincing form of presenting facts, probability
theory and measuring uncertainty, testing hypothesis, drawing inferences regarding
the characteristics of the universe on the basis of the sample data
Ancient times: statistics as the science of statecraft; now all arenas: social economics,
sciences, politics, military and fiscal policies, five year plans & NITI Aayog
Prasanta Chandra Mahalanobis, Ronald Fisher
Research and development ; Forecasting, exit polls, politics; Economic planning;
Health, agriculture, business management, etc
Covid pandemic and statistics
8. Limitations
It is important to know what statistics can and cannot do.
Aggregates and averages, individual aspects are ignored, lack of qualitative
aspects, validity implications
Misuse of statistics through ignorance and conscious deceit
Data manipulation
Journalists, politicians and others increasingly use statistical results to make a
point or bolster an argument
Love jihad and narcotic jihad
10. Central value or typical value for a probability distribution
Average or just the central location
Measures of central tendency are defined for a population (a large set
of observations of a similar nature) or for a sample (a portion of the
observations of a population)
Descriptive statistics - to find a representative value
Mean - average; Median - midpoint; Mode - most frequently observed value
in a data set
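The three measures can be tried out with Python's standard `statistics` module. This is a small sketch; the five values in `data` are hypothetical and chosen so that one value repeats.

```python
import statistics

data = [2, 3, 3, 5, 7]  # hypothetical observations

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value once the data are sorted
mode = statistics.mode(data)      # most frequently observed value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```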
12. Variability of data (to know how homogeneous or
heterogeneous the data is): variation of the items among themselves
and around an average
Greater the variation amongst different items of a series, the more
will be the dispersion
To determine the reliability of two or more series
13. Range
The difference between the maximum and minimum value of a
sample on a given variable
Range = max - min
Example: 1, 3, 5, 6, 7 (range = 7 - 1 = 6)
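The same range calculation as a short Python sketch, using the example values 1, 3, 5, 6, 7:

```python
data = [1, 3, 5, 6, 7]  # the example values from the slide

# Range = max - min
data_range = max(data) - min(data)

print("range:", data_range)  # 7 - 1 = 6
```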
14. Quartile Deviation
The quartiles are values that divide a list of numbers into quarters.
Quartile deviation is half of the distance between the third and the
first quartile
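A sketch of the quartile deviation in Python, reusing the values 1, 3, 5, 6, 7. It relies on `statistics.quantiles` with its default ("exclusive") method; textbooks and software use several quartile conventions, so other tools may give slightly different quartiles for the same data.

```python
import statistics

data = [1, 3, 5, 6, 7]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Quartile deviation = (Q3 - Q1) / 2
quartile_deviation = (q3 - q1) / 2

print("Q1:", q1, "Q3:", q3)
print("quartile deviation:", quartile_deviation)
```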
15. Standard Deviation
Most commonly used measure of dispersion. Measure of spread of
data about the mean.
SD = square root of (sum of squared deviations from the mean divided
by the number of observations)
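The verbal formula can be followed step by step in Python. This sketch divides by the number of observations, as stated on the slide, which matches `statistics.pstdev` (the population form); the sample form, `statistics.stdev`, divides by one less than the number of observations. The data values are the same illustrative ones used for the range.

```python
import math
import statistics

data = [1, 3, 5, 6, 7]

mean = sum(data) / len(data)                          # 22 / 5 = 4.4
squared_deviations = [(x - mean) ** 2 for x in data]  # squared distance from the mean
variance = sum(squared_deviations) / len(data)        # divide by number of observations
sd = math.sqrt(variance)                              # square root gives the SD

print(f"mean = {mean}, SD = {sd:.3f}")

# Cross-check against the standard library's population standard deviation.
print(f"statistics.pstdev = {statistics.pstdev(data):.3f}")
```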
16. Parametric and Non-Parametric Tests
For testing hypotheses
Parametric tests assume that the data follow a certain distribution,
usually the normal distribution
For normal distribution - Parametric test; eg. t Test, ANOVA, Pearson
correlation
Non-normal distribution - Non parametric tests - eg. Spearman
correlation
17. Statistical Tests
Purpose of research question / hypothesis
Comparison (identifying the difference between the given items) and relationship
(identifying the connection between them)
Type of data -
1. categorical (qualitative or yes/no answers)
2. continuous (often expressed in numbers; e.g. height, weight, marks scored, etc.)
If the purpose is comparison and data type is categorical =chi-squared
If the purpose is comparison and data type is both categorical and continuous then t Test
If the purpose is identifying the relationship and type of data is continuous then
correlation
18. Chi-Squared = comparison, categorical only
t-Test = comparison, categorical and continuous
Correlation = relationship, continuous type data only
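The rule of thumb on these two slides can be written as a small lookup. This is only a sketch of the three cases listed here; the function name `choose_test` and the string labels are hypothetical, and real test selection depends on more than these two questions.

```python
def choose_test(purpose: str, data_types: set[str]) -> str:
    """Return the test suggested by the slides' rule of thumb.

    purpose: "comparison" or "relationship"
    data_types: the kinds of data involved, e.g. {"categorical", "continuous"}
    """
    if purpose == "comparison" and data_types == {"categorical"}:
        return "chi-squared"
    if purpose == "comparison" and data_types == {"categorical", "continuous"}:
        return "t-test"
    if purpose == "relationship" and data_types == {"continuous"}:
        return "correlation"
    # Any other combination is outside the three cases on the slides.
    raise ValueError("combination not covered by this rule of thumb")


print(choose_test("comparison", {"categorical"}))
print(choose_test("comparison", {"categorical", "continuous"}))
print(choose_test("relationship", {"continuous"}))
```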
19. Correlation
Correlation - the linear (extending along a straight line) relationship of variables
Parental education and child’s career achievement
If the correlation analysis shows that two characteristics are related, then it
should be investigated whether one variable can be used to predict the other
Regression
The strength of the correlation is determined by the correlation coefficient,
which varies between -1 and +1
Direction of the correlation - positive and negative
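A hand-computed Pearson correlation coefficient in Python, as a sketch of how strength and direction show up in a single number. The helper `pearson_r` and both data lists are hypothetical; the figures for parental education and child's achievement are made up for illustration and are not real results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the two means.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Made-up data: years of parental education and a child's achievement score.
parent_education = [8, 10, 12, 14, 16]
child_score = [50, 55, 65, 70, 80]

r = pearson_r(parent_education, child_score)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f} ({direction})")
```

A value near +1 or -1 indicates a strong linear relationship, and the sign gives the direction.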
20. Calculation of the correlation coefficient
Correlation analysis according to Karl Pearson’s Coefficient of Correlation
Spearman’s Rank Correlation Coefficient
SPSS
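For readers without a statistics package at hand, Spearman's rank correlation can be sketched in plain Python as the Pearson coefficient computed on ranks. The helpers `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical names, tied values are given the average of their ranks, and the data are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y rises with x but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because Spearman's coefficient works on ranks, it reaches +1 for any strictly increasing relationship, while Pearson's coefficient reaches +1 only when the points lie on a straight line.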