1. University of South Florida
Introduction to Statistical Methods
Dr. S. Shivendu
2. Objectives
01 Differentiate parametric and non-parametric methods.
02 Analyze the role of random variables, probability distributions, and their properties in statistical thinking.
03 Describe the Central Limit Theorem and its importance in statistical inference.
3. Agenda
Random variables
Properties of random samples
Sampling
Central Limit Theorem
Statistical testing
Hypothesis test
Statistical inference
Interval estimation and interpretation
4. Statistical Thinking
Fundamental to statistical thinking is the recognition that variation exists in populations and in process data.
It involves applying rational thought and the science of statistics to critically assess data and inferences.
It focuses on the data-generating process and on missing data or missing information about unobserved variables.
5. What is a Parametric Method?
Parametric methods are based either on normality assumptions about the population or on the Central Limit Theorem.
These methods are used to compare means and to perform hypothesis tests.
The t-test is one of the most widely used tests; parametric methods lead to confidence intervals and p-values.
What about situations where we are interested in another sample statistic, such as the median, or where the data are nominal or ordinal?
6. Normality Test
Graphical Methods
An informal approach to testing normality is to compare a histogram of the sample data to a normal probability curve.
The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution.
This might be difficult to see if the sample is small.
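As a minimal sketch of this graphical check (assuming NumPy and SciPy are available; the sample here is simulated purely for illustration), one can compare the histogram's empirical density to a fitted normal curve numerically instead of only by eye:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=100, scale=15, size=500)  # illustrative normal data

# Empirical density from a histogram, evaluated against the fitted normal pdf
density, edges = np.histogram(sample, bins=20, density=True)
centers = (edges[:-1] + edges[1:]) / 2
fitted = stats.norm.pdf(centers, loc=sample.mean(), scale=sample.std(ddof=1))

max_gap = float(np.max(np.abs(density - fitted)))
print(f"largest histogram-vs-curve gap: {max_gap:.4f}")
```

With a small sample, these per-bin gaps become noisy, which is exactly why the slide warns that the visual check is unreliable for small n.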
7. Non-Parametric Methods
In general, for a statistical method to be classified as nonparametric, it must satisfy at least one of the following conditions:
The method can be used with nominal data.
The method can be used with ordinal data.
The method can be used with interval or ratio data when no assumption can be made about the population probability distribution.
8. Estimating Distributions and Distribution-free Tests
Common approach: plot a histogram and eyeball it, or run a normality test.
Normally distributed: normality tests are also used to compute how likely it is for a random variable underlying the data set to be normally distributed.
Model fitness: in descriptive statistics terms, a normality test measures the goodness of fit of a normal model.
Null hypothesis: in frequentist statistical hypothesis testing, data are tested against the null hypothesis that they are normally distributed.
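A minimal example of such a test, using SciPy's Shapiro-Wilk implementation on simulated data (the two data sets and the seed are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_data = rng.normal(size=200)     # drawn from a normal distribution
skewed_data = rng.lognormal(size=200)  # positively skewed data

# Shapiro-Wilk: the null hypothesis is that the data are normally distributed
_, p_normal = stats.shapiro(normal_data)
_, p_skewed = stats.shapiro(skewed_data)

print(f"normal sample p-value:    {p_normal:.3f}")
print(f"lognormal sample p-value: {p_skewed:.3g}")  # tiny p rejects normality
```

A large p-value means we fail to reject normality; a very small one, as for the lognormal sample, rejects it.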
9. Inference
Statistical inference is the act of generalizing from a sample to a
population with a calculated degree of certainty.
We want to learn about population parameters…
…but we can only calculate sample statistics
10. Parameters and Statistics
It is essential that we draw distinctions between parameters and statistics.

              Parameters    Statistics
Source        Population    Sample
Calculated    No            Yes
Constants     Yes           No
Examples      μ, σ, p       x̄, s, p̂
11. Parameters and Statistics
We are going to illustrate inferential concepts by considering how well a given sample mean (x-bar) reflects an underlying population mean μ.
12. Precision and Reliability
How precisely does a given sample mean (x-bar)
reflect the underlying population mean (μ)?
How reliable are our inferences?
To answer these questions, we consider a simulation experiment in which we take all possible samples of size n from the population.
13. Simulation Experiment
Population: N = 10,000; lognormal shape (positive skew); μ = 173; σ = 30
Sampling distribution: take repeated random samples, each of size n = 100, and calculate x-bar in each sample
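The experiment above can be sketched in a few lines (NumPy assumed; the lognormal parameters are back-solved so the simulated population has roughly the stated mean and standard deviation, and the number of repeated samples is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# Lognormal population with mean 173 and sd 30, as on the slide
mu, sigma = 173.0, 30.0
s2 = np.log(1 + (sigma / mu) ** 2)
population = rng.lognormal(mean=np.log(mu) - s2 / 2,
                           sigma=np.sqrt(s2), size=10_000)

# Draw many samples of n = 100 and record each sample mean
n = 100
xbars = np.array([rng.choice(population, size=n).mean()
                  for _ in range(2_000)])

print(f"population mean      ≈ {population.mean():.1f}")
print(f"mean of sample means ≈ {xbars.mean():.1f}")  # close to μ (unbiased)
print(f"sd of sample means   ≈ {xbars.std():.2f}")   # close to σ/√n = 3
```

The sample means cluster tightly around μ even though the population itself is skewed, which is exactly what the next slides summarize.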
14. Simulation Experiment Results
Population (distribution A) vs. sampling distribution (distribution B):
Distribution B is more Normal than distribution A (Central Limit Theorem).
Both distributions are centered on μ: x-bar is an unbiased estimator of μ.
Distribution B is skinnier than distribution A (related to the "square root law").
15. Reiteration of Key Findings
Simulation Experiment
Finding 1 (Central Limit Theorem): the sampling distribution of x-bar tends toward Normality even when the population is not Normal (especially strong in large samples).
Finding 2 (Unbiasedness): the expected value of x-bar is μ.
Finding 3 (Square Root Law): σ_x̄ = σ / √n
16. Standard Deviation of the Sample Mean
The standard deviation of the sampling distribution of the mean has a special name: the standard error of the mean (denoted σ_x̄ or SE_x̄).
The square root law says: SE_x̄ = σ / √n
17. Square Root Law
Quadrupling the sample size cuts the standard error of the sample mean in half.
Example: σ = 15
For n = 1: SE_x̄ = 15 / √1 = 15
For n = 4: SE_x̄ = 15 / √4 = 7.5
For n = 16: SE_x̄ = 15 / √16 = 3.75
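These numbers can be checked with a direct transcription of the formula (nothing library-specific; only the standard library is used):

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 15.0
for n in (1, 4, 16):
    print(f"n = {n:2d}: SE = {standard_error(sigma, n)}")
# Quadrupling n (1 -> 4 -> 16) halves the SE each time: 15.0, 7.5, 3.75
```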
18. Putting It Together
The sampling distribution of x-bar tends to be Normal with mean μ and σ_x̄ = σ / √n:
x̄ ~ N(μ, SE_x̄)
(Figure: example sampling distribution for n = 10.)
19. Law of Large Numbers
As the sample gets larger and larger, x-bar approaches μ. The figure shows results from an experiment done in a population with μ = 173.3.
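An experiment of this kind is easy to reproduce (NumPy assumed; μ = 173.3 comes from the slide, while the population standard deviation and seed are illustrative assumptions). The running mean wanders widely at first and settles onto μ as n grows:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 173.3, 30.0  # mu from the slide; sigma assumed for illustration

draws = rng.normal(mu, sigma, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 1_000, 100_000):
    print(f"n = {n:>7,d}: running mean = {running_mean[n - 1]:.2f}")
```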
20. Point Estimates
Estimating a population parameter from a sample statistic: the sample statistic is the point estimate of the population parameter.
What is sampling variation?
The larger the sample size, the lower the sampling
variation. Why?
21. Sampling Distribution
The sampling distribution is the distribution of a point estimate over many independently drawn samples of fixed size.
So, what is a sampling distribution? Given a sampling distribution, how do we interpret a point estimate?
The standard deviation of a sampling distribution is the "standard error" of the estimate.
SE of the sample mean = population standard deviation / √n
22. Other Estimates
Confidence interval
Central Limit Theorem (some conditions)
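As a sketch of how a confidence interval for a mean is computed (SciPy assumed; the sample summary values here are illustrative, not from the slides), using the t critical value that the next slides introduce:

```python
import math
from scipy import stats

def mean_ci(xbar: float, s: float, n: int, conf: float = 0.95):
    """t-based confidence interval for a population mean (illustrative)."""
    t_crit = stats.t.ppf((1 + conf) / 2, df=n - 1)  # two-sided critical value
    half_width = t_crit * s / math.sqrt(n)          # t* times the SE
    return xbar - half_width, xbar + half_width

lo, hi = mean_ci(xbar=173.0, s=30.0, n=100)
print(f"95% CI: ({lo:.1f}, {hi:.1f})")
```

The interval is centered on the point estimate and its half-width shrinks with √n, the square root law again.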
23. Inference for Numerical Data
One-sample mean with a small sample size: the sampling distribution is approximately a t-distribution.
The normality conditions:
The sample is from an approximately normally distributed population, and the sample observations are independent; or
The sample size is large, the population distribution is not too skewed (a small number of outliers), and the sample observations are independent.
24. What is the t-Distribution?
Centered at zero and defined by a single parameter: degrees of freedom = n − 1
t-confidence interval for the sample mean
One-sample t-test
Testing for differences in two population means
t-test for paired data
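A minimal one-sample t-test with SciPy (the data are simulated and the hypothesized mean of 173 is illustrative, echoing the earlier simulation slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=175, scale=30, size=25)  # small sample, illustrative

# One-sample t-test: H0 is that the population mean equals 173
t_stat, p_value = stats.ttest_1samp(sample, popmean=173)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, df = {len(sample) - 1}")
```

With n = 25, the test compares t_stat against a t-distribution with n − 1 = 24 degrees of freedom, the single parameter the slide mentions.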
25. Inference for Categorical Data
Population proportion
Inference for single proportion
Inference for difference in
proportions of two populations
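A sketch of inference for a single proportion via the large-sample z-test (the helper name and the counts 62 out of 100 against a null of 0.5 are illustrative assumptions):

```python
import math
from scipy import stats

def one_prop_ztest(successes: int, n: int, p0: float):
    """Large-sample z-test for a single proportion (illustrative sketch)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)        # standard error under the null
    z = (p_hat - p0) / se
    p_value = 2 * stats.norm.sf(abs(z))      # two-sided p-value
    return z, p_value

z, p = one_prop_ztest(successes=62, n=100, p0=0.5)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same idea extends to the difference of two proportions by pooling the standard errors of the two samples.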
26. You have reached the end of the presentation.