This document provides an introduction to applied statistics and various statistical concepts. It discusses the normal (Gaussian) distribution, standard deviation, standard error of the mean, and confidence intervals. Examples and explanations are provided for each concept. Hands-on examples for calculating these statistics in Excel, SPSS, and Prism are also presented. The document aims to explain key statistical terms and how they are applied in data analysis.
2. Applied Statistics
part 1
Introduction
Normal (Gaussian) Distribution
Standard Deviation
Standard Error of the Mean
Confidence Interval of the Mean
3. Which one do you hate most?
Cockroaches or statistics?
4. Introduction
There are three kinds of lies: Lies, Damn Lies, and Statistics!
(Mark Twain)
I can prove anything by statistics - except the truth. (George
Canning)
It is true we can tell a lie easily with statistics, yet it is easier to
lie without them!
Statistics is for drawing conclusions about a population as accurately as
possible.
… and no one claims it is error free.
9. Introduction (Cont'd)
With statistics we try to extrapolate from a
“sample” to a “population”.
Kinds of variables:
Categorical, Nominal, Qualitative
Measurement, Numeric, Quantitative
Continuous
Discrete
Ordinal (values are ranked, but the differences between them are not necessarily the same)
Interval (differences between values are the same)
Ratio (similar to interval, but also has a clear 0.0)
10. Normal (Gaussian) Distribution
The experimenters think it can be proved by
mathematics, and the mathematicians believe it has
been established by observation. (W. Lippmann)
Named after Carl Friedrich Gauss, 19th century
German mathematician.
It underlies the assumptions of many statistical tests.
The distribution emerges when many independent
random factors act in an additive manner to create
variability.
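The additive mechanism described above can be checked with a small simulation. This is only a sketch: the choice of 50 uniform factors per observation and the 68% / 95% benchmarks for a Gaussian distribution are assumptions brought in for illustration, not taken from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def additive_outcome(n_factors=50):
    """One observation: the sum of many small, independent random factors."""
    return sum(random.uniform(-1, 1) for _ in range(n_factors))

observations = [additive_outcome() for _ in range(20_000)]
mean = statistics.fmean(observations)
sd = statistics.stdev(observations)

# For a Gaussian distribution, about 68% of values lie within 1 SD of the
# mean and about 95% within 2 SD.
within_1sd = sum(abs(x - mean) <= sd for x in observations) / len(observations)
within_2sd = sum(abs(x - mean) <= 2 * sd for x in observations) / len(observations)
print(f"within 1 SD: {within_1sd:.3f}, within 2 SD: {within_2sd:.3f}")
```

Although each individual factor is uniformly distributed, the sums should come out close to the Gaussian benchmarks.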
11. Normal (Gaussian) Distribution (Cont'd)
μ (mu) is the mean of the population
σ (sigma) is the standard deviation of the population
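For reference, the Gaussian probability density can be written in terms of these two parameters. This is the standard textbook form; the symbol x (a value of the variable) is introduced here and does not appear on the slide.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```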
21. Standard Deviation (SD)
Standard deviation (SD) measures the variability, or scatter, of
the numbers around their mean.
SD is a number that tells you how far the numbers typically are
from their mean.
The higher the SD, the more observations are
needed to estimate the mean with reasonable precision.
For the same data, the sample formula (s, dividing by n − 1) gives a
larger SD than the population formula (σ, dividing by n). Why?
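The two SD formulas can be compared on a single data set. A minimal sketch using Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

data = [4.1, 5.3, 4.8, 6.0, 5.2]  # made-up values for illustration

sample_sd = statistics.stdev(data)       # divides by n - 1 (like Excel's STDEV.S)
population_sd = statistics.pstdev(data)  # divides by n

print(f"sample formula (n - 1): {sample_sd:.3f}")
print(f"population formula (n): {population_sd:.3f}")
```

The sample formula gives the larger value because it divides the same sum of squared deviations by a smaller number; the gap shrinks as n grows.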
22. Standard Deviation (SD)
The data may have the same mean yet their pattern is
different.
The data may have the same mean and SD, yet their
pattern is different!
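The unit of SD is the same as the unit of the variable in
question.
This makes its interpretation easier, compared to the
variance.
What is variance, and is it useful? Variance is the SD
squared. It is expressed in squared units, so it is rarely useful for
describing data, although many statistical calculations rely on it.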
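The point that data can share a mean and an SD yet look different can be shown with two small data sets. The values below are made up for illustration.

```python
import statistics

# Two made-up data sets: one spread out with a long upper tail,
# one split into two tight clusters.
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]
two_clusters = [3, 3, 3, 3, 7, 7, 7, 7]

for name, data in (("spread_out", spread_out), ("two_clusters", two_clusters)):
    print(f"{name}: mean = {statistics.fmean(data):.2f}, "
          f"SD = {statistics.stdev(data):.3f}, values = {sorted(data)}")
```

Both sets have the same mean and the same SD, which is why a plot of the raw data is worth checking before relying on summary numbers.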
25. Standard Deviation (SD)
Hands-on practice
To calculate Mean and SD in Excel:
For SD of a sample: =STDEV.S(number1,[number2],...)
For Mean of a sample: =AVERAGE(number1,[number2],...)
To calculate Mean and SD in SPSS:
Analyze => Descriptive Statistics => Frequencies => Statistics =>
Mean & Standard Deviation check boxes
Analyze => Descriptive Statistics => Descriptives => Options =>
Mean & Standard Deviation check boxes
Analyze => Descriptive Statistics => Explore => Statistics =>
Descriptive check box
To calculate Mean and SD in Prism:
Analyze => Column statistics => Mean, SD, SEM check box
27. Standard Error of The Mean (SEM)
Individual observations (X's) and means (red dots) for random
samples from a population with a parametric mean of 5 (blue line).
28. Standard Error of The Mean (SEM)
Means ±1 standard deviation of 100 random samples (N=3) from a
population with a parametric mean of 5 (blue line).
Note that there are 100 means, NOT 100 observations. The “X”s represent
the mean of 3 observations.
The calculated standard deviation of the 100 sample means is 0.63.
29. Standard Error of The Mean (SEM)
Means ±1 standard error of 100 random samples (N=3) from a population
with a parametric mean of 5 (blue line).
Note that there are 100 means, NOT 100 observations. The “X”s represent
the mean of 3 observations.
30. Standard Error of The Mean (SEM)
Means ±1 standard error of 100 random samples (N=20) from a population
with a parametric mean of 5 (blue line).
Note that there are 100 means, NOT 100 observations. The “X”s represent
the mean of 20 observations.
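The experiment behind these figures can be re-created roughly as follows. This is a sketch under stated assumptions: the slides give only the population mean (5), so the population SD of 1.0 and the Gaussian shape are assumptions, and the numbers will not match the figures exactly.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN = 5.0  # parametric mean from the slides
POP_SD = 1.0    # assumption: the slides do not state the population SD

def sd_of_sample_means(n, n_samples=100):
    """Draw n_samples random samples of size n; return the SD of their means."""
    means = [
        statistics.fmean([random.gauss(POP_MEAN, POP_SD) for _ in range(n)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

for n in (3, 20):
    print(f"N={n}: SD of 100 sample means = {sd_of_sample_means(n):.2f}, "
          f"population SD / sqrt(N) = {POP_SD / n ** 0.5:.2f}")
```

The means of larger samples scatter less around the population mean, and their scatter is close to the population SD divided by the square root of N.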
31. Standard Error of The Mean (SEM)
The SEM quantifies how precisely you know the true
mean of the population.
It is a measure of how far your sample mean is likely to
be from the true population mean.
$\mathrm{SEM} = \dfrac{\mathrm{SD}}{\sqrt{N}}$
The higher the SD, the less precise your estimate of
the population mean.
The SEM is always smaller than the SD (for N > 1). Why?
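The SEM calculation is short enough to write out directly. A minimal sketch with made-up data values:

```python
import math
import statistics

values = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]  # made-up sample

n = len(values)
sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
sem = sd / math.sqrt(n)        # SEM = SD / sqrt(N)

print(f"N = {n}, mean = {statistics.fmean(values):.2f}, "
      f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

Because the SD is divided by the square root of N, the SEM is smaller than the SD whenever there is more than one observation.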
32. Standard Error of The Mean (SEM)
Hands-on practice
To calculate SE in Excel:
For SE of a sample: =STDEV.S(sampling range)/SQRT(COUNT(sampling
range))
To calculate SE in SPSS:
Analyze => Descriptive Statistics => Frequencies =>
Statistics => S.E. mean check box
Analyze => Descriptive Statistics => Descriptives => Options
=> S.E. mean check box
To calculate SE in Prism:
Analyze => Column statistics => Mean, SD, SEM check box
35. Confidence Interval (CI)
Statistically and mathematically CI and SEM are different,
but conceptually and practically they serve the same
purpose.
How sure are you? A confidence interval (CI) is the way to
answer this.
The CI of a mean tells you how precisely you have
determined the mean.
The SEM quantifies how far the mean of the sample is likely to
be from the mean of the population; it is not a probability.
The CI is computed from the SEM. When the CI of a difference
between two means includes ZERO, the data are compatible with
NO difference!
36. Confidence Interval (CI)
Each X represents the mean of a sample, and the bars represent the SEM of
that sample. The red dots and red bars mark samples whose bars do not include the
mean of the population. The blue line is the mean of the population (which
we do NOT know).
38. Confidence Interval (CI)
By convention, the 95% CI is the one most commonly reported.
A 95% CI is a range that you can be 95% certain
contains the true mean of the population.
Don't misinterpret CI as the range that contains 95%
of the values!
Is it possible that the CI of a mean does not include
the true mean?
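The last question can be explored by simulation: draw many samples from a known population and count how often the 95% CI of the mean contains the true mean. This is a sketch under assumptions: the population (Gaussian, mean 100, SD 35) and sample size (N=5) are borrowed from the ten-sample example in this deck, and the critical value 2.776 is the tabulated two-sided 95% value of Student's t for 4 degrees of freedom.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD, N = 100.0, 35.0, 5
T_CRIT = 2.776  # two-sided 95% t critical value for N - 1 = 4 df (from a t table)

def ci_contains_true_mean():
    """Draw one sample and report whether its 95% CI covers POP_MEAN."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)
    return mean - T_CRIT * sem <= POP_MEAN <= mean + T_CRIT * sem

trials = 20_000
coverage = sum(ci_contains_true_mean() for _ in range(trials)) / trials
print(f"fraction of 95% CIs containing the true mean: {coverage:.3f}")
```

Roughly 1 interval in 20 should miss the true mean, so yes, it is possible, and with real data you cannot tell which intervals those are.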
39. Confidence Interval (CI)
The graph shows three samples (of different sizes), all sampled
from the same population.
40. Confidence Interval (CI)
Ten sets of data (N=5), from a Gaussian distribution
with a mean of 100 and a standard deviation of 35.
41. Confidence Interval (CI)
95% CI of the mean for each sample.
42. Confidence Interval (CI)
A common rule-of-thumb is that the 95% CI is
computed from the mean ± 2 SEMs.
So you may roughly double the size of the SEM error
bars, to represent them as CI error bars. Why?
With large samples, the rule is accurate; with small
ones, the CI is much wider than anticipated by this
rule.
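The size of the gap between the rule of thumb and the exact multiplier can be tabulated. A sketch, assuming the CI of the mean is built from Student's t distribution; the critical values are copied from a standard t table rather than computed, since Python's standard library has no t quantile function.

```python
from statistics import NormalDist

# Two-sided 95% critical values of Student's t, copied from a standard t table.
# Keys are sample sizes N; the degrees of freedom are N - 1.
T_CRIT_95 = {3: 4.303, 5: 2.776, 10: 2.262, 20: 2.093, 30: 2.045, 100: 1.984}

z_95 = NormalDist().inv_cdf(0.975)  # large-sample limit of the multiplier

for n, t_crit in T_CRIT_95.items():
    print(f"N={n:>3}: 95% CI = mean ± {t_crit:.3f} SEM "
          f"({t_crit / 2:.2f} times the width given by 'mean ± 2 SEM')")
print(f"large-N limit: mean ± {z_95:.3f} SEM")
```

With N=3 the exact interval is more than twice as wide as the rule suggests, while from about N=20 upward the rule is close.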
45. Confidence Interval (CI)
We can express the precision of any computed value
as a 95% CI (e.g. the CI of a best-fit slope, or the
CI of an SD).
There is a myth that when two means have
overlapping CIs, the means are not significantly
different.
Another version is: if each mean is outside the CI of
another mean, the means are significantly different.
Neither of these is true!
46. Confidence Interval (CI)
It is easy for two sets of data to have overlapping CIs,
yet still be significantly different.
Conversely, each mean can be outside the confidence
interval of the other, yet they're still not significantly
different.
Do not compare two means by visually comparing
their confidence intervals; use the appropriate
statistical test instead.
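The first case, overlapping CIs with a significant difference, can be reproduced with a small calculation. This is a sketch under assumptions: the summary statistics are hypothetical, and a large-sample z-test is used in place of a t test so that only Python's standard library is needed.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics for two independent groups
n = 100                       # observations per group
mean_a, mean_b = 10.0, 11.5   # group means
sd = 5.0                      # SD within each group

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% CI
sem = sd / math.sqrt(n)

ci_a = (mean_a - z * sem, mean_a + z * sem)
ci_b = (mean_b - z * sem, mean_b + z * sem)
cis_overlap = ci_a[1] >= ci_b[0]

# Large-sample z-test for the difference between the two means
se_diff = math.sqrt(sem ** 2 + sem ** 2)
z_stat = (mean_b - mean_a) / se_diff
p_value = 2 * (1 - NormalDist().cdf(z_stat))

print(f"CI A = ({ci_a[0]:.2f}, {ci_a[1]:.2f}), CI B = ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"CIs overlap: {cis_overlap}, p = {p_value:.3f}")
```

The intervals overlap because each uses its own SEM, while the test uses the standard error of the difference, which is smaller than the sum of the two SEMs.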
47. Confidence Interval (CI)
The error bars may be asymmetrical.
This is especially true with counts and proportions, e.g. the
number of cigarettes smoked, or the number of color-blind
men.
In these cases a lower limit of zero or below makes no sense.
We know this because there are some occurrences of the
variable in the population.
The CI is calculated by a different method in such
cases.
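One such method for a proportion is the Wilson score interval. The slides do not say which method they have in mind, so this choice is an assumption, and the counts (2 color-blind men in a sample of 50) are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion (successes out of n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

successes, n = 2, 50  # hypothetical: 2 color-blind men in a sample of 50
p_hat = successes / n
low, high = wilson_interval(successes, n)

print(f"observed proportion: {p_hat:.3f}")
print(f"95% CI: {low:.3f} to {high:.3f}")
print(f"distance below: {p_hat - low:.3f}, distance above: {high - p_hat:.3f}")
```

The lower limit stays above zero and the upper bar is longer than the lower one, which is the asymmetry described above.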
48. Confidence Interval (CI)
Hands-on practice
To calculate CI in Excel:
For normal distribution:
=CONFIDENCE.NORM(alpha,standard_dev,size)
This returns the margin of error; the CI is the mean ± this value
(alpha = 0.05 for a 95% CI).
To calculate CI in SPSS:
Analyze => Descriptive Statistics => Explore => Statistics =>
Descriptive check box
Analyze => Compare means => One sample T Test =>
Options
To calculate CI in Prism:
Analyze => Column statistics => CI of the Mean check box
51. 1.6: Introduction to Plots
• A plot (graph) is a graphical technique for
representing a data set.
• Graphs are a visual representation of the
variables and the relationships between
variables.
• Plots are very useful because people can
quickly derive an understanding from them that
would not come from lists of values.
59. Charts
Pie Chart
6/23/2009 Arsia Jamali-Students' Scientific
Research Center
[Figure: Distribution of Stage of the Pancreatic Cancer in Patients (stages I to IV)]
Bar Chart
[Figure: Distribution of Stage of the Pancreatic Cancer in Patients (x-axis: stage I to IV; y-axis: 0 to 120)]
60. Charts
Histogram
Area
61. Charts
Box Plot
Error Bar
62. Charts
Clustered Bar
[Figure: Distribution of Stage of the Pancreatic Cancer According to Age in Patients (x-axis: Stage IV, III, II, I; y-axis: Number of the Patients; series: Male, Female; values listed under the stage axis: IV 105, III 80, II 20, I 10)]
Scatter Plot
[Figure: Birth Weight (y-axis, 0 to 4500) against Gestational Week (x-axis, 0 to 50)]
63. Significance of Cluster Bar
[Figure: Distribution of Stage in Pancreatic Cancer Patients (x-axis: Stage I to Stage IV; y-axis: percentage)]
[Figure: Distribution of Gender in Pancreatic Cancer Patients (x-axis: Male, Female; y-axis: percentage)]
64. Significance of Cluster Bar
[Figure: clustered bar chart of percentage by stage (x-axis: Stage I to Stage IV; y-axis: 0% to 90%; series: Male, Female)]