Basic knowledge of statistics
Data: any collection of facts and figures. It is the
raw material to be processed by a computer.
Dataset, case and variable
• Data is the information you collect from an experiment or survey.
• A dataset is a representation of data that defines a set of variables
measured on a set of cases.
• A variable is any characteristic of an object.
• A case is the recorded information about one object we observe.
• Cases: the individuals under study
• Variables: the characteristics of the individuals under study
• Dependent variable: a variable that depends on another factor (the one on which the effect is observed)
• Independent variable: a variable that does not depend on another factor (the one whose effect is studied)
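As a rough illustration of these terms, the sketch below stores a tiny dataset as a list of records. The variable names and values are made up for this example, and treating study hours as the independent variable is an assumption, not something taken from a real study.

```python
# Hypothetical dataset: each dict is one case (one respondent),
# and each key is a variable measured on that case.
dataset = [
    {"id": 1, "study_hours": 2.0, "exam_score": 55},
    {"id": 2, "study_hours": 4.5, "exam_score": 68},
    {"id": 3, "study_hours": 6.0, "exam_score": 81},
]

cases = len(dataset)
# "id" only labels the case, so it is not counted as a measured variable here.
variables = [name for name in dataset[0] if name != "id"]

independent = "study_hours"  # assumed: the variable whose effect is studied
dependent = "exam_score"     # assumed: the variable on which the effect is observed

print(f"{cases} cases, variables: {variables}")
for case in dataset:
    print(case[independent], "->", case[dependent])
```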
Data processing cycle
Statistics
• The science of collecting, organizing, presenting,
analyzing, and interpreting data to assist in making
more effective decisions
• Statistical analysis: used to manipulate,
summarize, and investigate data so that useful
decision-making information results.
Measure of dispersion
Standard deviation is the most widely used measure of
dispersion.
Should I use standard deviation or standard error?
Standard deviation is a measure of the dispersion of the data around the
mean.
It describes the typical amount of variation on either side of the mean.
Standard deviation is a descriptive
statistic, whereas standard error is an
inferential statistic.
Different samples of the same size, drawn from the same population,
will give different values of the statistic under consideration,
e.g. the sample mean.
The standard error (SE) is the standard deviation of those values
of the sample mean. It is used when comparing sample
means across populations.
Standard error is a measure of how precise our estimate of the mean is.
It gives an idea of the exactness and reliability of the estimate.
Note that the standard error of the mean depends on the sample size:
it shrinks toward 0 as the sample size increases to
infinity.
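The difference can be made concrete with a small calculation. The sketch below uses the usual estimate SE = SD / sqrt(n) for the standard error of the mean; the sample values are invented for illustration and the function name is hypothetical.

```python
import math
import statistics


def sd_and_se(sample):
    """Return (sample standard deviation, standard error of the mean)."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample SD (divides by n - 1)
    se = sd / math.sqrt(n)         # SE of the mean: SD / sqrt(n)
    return sd, se


# Made-up measurements, for illustration only.
sample = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 6.0]
sd, se = sd_and_se(sample)
print(f"SD = {sd:.3f}, SE = {se:.3f}")

# Holding the SD fixed, the SE shrinks as the sample size grows.
for n in (10, 100, 1000, 10000):
    print(n, round(sd / math.sqrt(n), 4))
```

The SD describes the spread of the eight values themselves, while the SE describes how much the mean of such a sample would be expected to vary from sample to sample.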
Decision criteria?
It depends.
If the message you want to convey is about the spread and
variability of the data, then standard deviation is the metric to
use.
If you are interested in the precision of the means, or in
comparing and testing differences between means, then
standard error is your metric.
What is skewness?
Skewness quantifies the extent to which a distribution is asymmetric
and so departs from a normal distribution.
The skewness of a normal distribution is 0.
Kurtosis is a measure of the peakedness of a distribution.
For the data to be treated as approximately normal:
• The z-values of skewness and kurtosis (each statistic divided by its standard error) should lie between -1.96 and 1.96
• The Shapiro-Wilk test p-value should be > 0.05
• The histogram should look approximately normal
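The z-value check can be sketched as follows. This is a minimal illustration, assuming simple moment-based skewness and excess kurtosis and the large-sample approximations sqrt(6/n) and sqrt(24/n) for their standard errors; statistical packages typically use small-sample corrections, so their numbers will differ somewhat. The function name and data are hypothetical, and the Shapiro-Wilk test is not reproduced here.

```python
import math


def skew_kurtosis_z(sample):
    """Return (skewness, excess kurtosis, z_skew, z_kurt) for a sample."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments of order 2, 3 and 4 (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    # Assumed large-sample standard errors (approximations).
    z_skew = skew / math.sqrt(6.0 / n)
    z_kurt = excess_kurt / math.sqrt(24.0 / n)
    return skew, excess_kurt, z_skew, z_kurt


# Made-up, symmetric data for illustration.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6]
skew, excess_kurt, z_skew, z_kurt = skew_kurtosis_z(sample)
within_limits = abs(z_skew) <= 1.96 and abs(z_kurt) <= 1.96
print(f"skewness z = {z_skew:.2f}, kurtosis z = {z_kurt:.2f}, within limits: {within_limits}")
```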
Types of statistics
• Descriptive statistics: describe and summarize the data at hand
(a sample or a whole population)
Examples: mean, median, mode, standard deviation (s.d.)
• Inferential statistics: draw inferences and predictions about a
population based on a sample of data taken from that population.
Examples: standard error (s.e.), t test, chi-square test, ANOVA
Parametric tests and non parametric tests
Level of measurement
• Nominal: no ranking, no mathematical operations (+, -, ×, ÷)
• Ordinal: ranking, but no mathematical operations
• Interval: ranking, addition and subtraction
• Ratio: zero is meaningful; ranking and all mathematical operations
Summary
Information         Nominal   Ordinal   Interval   Ratio
Classification      Yes       Yes       Yes        Yes
Rank order          -         Yes       Yes        Yes
Equal interval      -         -         Yes        Yes
Nonarbitrary zero   -         -         -          Yes
(Source: Singleton and Straits, 2005)
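The summary is cumulative: each level carries all the information of the levels before it plus one more property. A small sketch of that idea, with hypothetical names chosen for this example:

```python
# Levels and properties in the order used by the summary table.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
PROPERTIES = ["classification", "rank order", "equal interval", "nonarbitrary zero"]


def has_property(level, prop):
    """True if the given level of measurement carries the given information."""
    # The level at position i carries the first i + 1 properties.
    return PROPERTIES.index(prop) <= LEVELS.index(level)


for level in LEVELS:
    carried = [p for p in PROPERTIES if has_property(level, p)]
    print(f"{level}: {', '.join(carried)}")
```

For example, a temperature in degrees Celsius is interval-level: differences are meaningful, but zero is arbitrary, so ratios are not.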
Types of parametric tests we will be doing
One-sample t test
Paired-samples t test
Independent-samples t test
One-way ANOVA
Correlation
Types of non-parametric tests we will be doing
Chi-square test
Friedman test
Kruskal-Wallis test
Sign test
Multivariate tests we will be doing
• Binary logistic regression
