This document provides an overview of an introductory course on statistical concepts at the University of South Florida. It outlines the course objectives, which are to identify the course structure, recap foundational statistics concepts, and identify the programming structure in SAS. The agenda covers topics like data analytics, probability, statistical inference, distributions, and SAS basics. It also discusses key statistical thinking concepts like variation, inference from data, and the relationship between data, information, knowledge and wisdom. Hypothesis testing and its errors and power are explained. Issues with correlated data are also covered.
UNIVERSITY OF SOUTH FLORIDA
Overview of Statistical Concepts
Introduction to Course
Dr. S. Shivendu
Objectives
01 Identify the structure of the course.
02 Recap foundational statistics concepts.
03 Identify the programming structure in SAS.
Agenda
Data Analytics: data science, business intelligence, and statistical thinking
Probability: statistics, statistical inference, and statistical learning
Common families of distributions: parametric and non-parametric methods
SAS Basics: the SAS environment, program syntax, and running a program
Structure of data: types of data, generating the log and output
Course Textbooks
Business Analytics
Providing insight from data to the right people at the right time.
There is not a single way to define business analytics.
In this course, business analytics is about delivering decision support by…
How Do You Do It?
Business analytics is the scientific process of transforming data into insight for better decision making.
Business analytics is specific to the business context.
The value proposition is not correctness alone, but "better decisions."
What makes a decision better?
Decision Making
A process of choosing among two or more alternative courses of action for the purpose of attaining a goal.
Analytics supports decision making.
Having clear goals or objectives is key to decision making.
Goals are exogenous (set outside the analysis) but are key to value creation.
Simon’s Model of Decision Making
Herbert A. Simon
Intelligence: identify the problem or opportunity.
Design: invent or develop alternatives.
Choice: compare the alternatives and select a solution.
Simon won the Nobel Prize in Economics in 1978 "for his pioneering research into the decision-making process within economic organizations."
Use Data to “Know”
[Figure: the data, information, knowledge, and wisdom (DIKW) hierarchy, with data at the base and wisdom at the top]
Use Data to “Know”
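[Figure: Data, Information, Knowledge, and Wisdom plotted against two axes, Connectedness and Understanding. Understanding relations turns data into information, understanding patterns turns information into knowledge, and understanding principles turns knowledge into wisdom.]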
Business and Data Analytics
Modern organizations usually rely on facts for performance evaluation, improvement, and decision making.
Data: the key inputs to decision models.
Analysis: extracting larger meaning from data to support evaluation and decision making.
Data availability
Time and effort
Analysis vs. instinct
Boss's expectations
Statistical Thinking
You may not have all the data: for example, you may have a sample rather than the whole population, or a subset rather than all cases.
Decisions are usually based on incomplete information.
Variation exists in all processes. You may not know all perspectives on an issue.
Things in the future may not be consistent with what happened before.
We usually rely on the relations between variables in the data to make inferences.
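The point that we usually see only a sample, and that variation exists everywhere, can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the "population" is a hypothetical list of the numbers 1 to 1000, and `sample_means` is an illustrative helper, not part of any course material. Each random sample gives a different mean, even though the population mean never changes.

```python
import random
import statistics

# Hypothetical "population": every value we would like to know about.
POPULATION = list(range(1, 1001))


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples (without replacement) and return each sample's mean."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    print("population mean:", statistics.mean(POPULATION))
    for i, m in enumerate(sample_means(POPULATION, sample_size=30, n_samples=5), start=1):
        print(f"sample {i}: mean = {m:.1f}")
```

The spread of the printed sample means is the sampling variation that inference has to account for.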
What Is Probability?
Informal definition: Probability is used when we have some model or representation of the world and want to answer questions like "What kind of data will this truth produce?"
Formal definition: Probability is a numerical description of how likely an event is to occur, or how likely it is that a proposition is true.
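To illustrate the "model first, then data" direction of probability, the sketch below assumes a model (a fair coin flipped 10 times, so the number of heads follows a Binomial(10, 0.5) distribution). It then asks what data that truth tends to produce, both exactly and by simulation. The coin model and the helper names are illustrative assumptions.

```python
import math
import random

# Assumed model ("the truth"): a fair coin, P(heads) = 0.5, flipped n = 10 times.


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


def simulate_heads(n: int, p: float, rng: random.Random) -> int:
    """Generate one data set from the model: the number of heads in n flips."""
    return sum(rng.random() < p for _ in range(n))


if __name__ == "__main__":
    print("P(exactly 5 heads)  =", binomial_pmf(5, 10, 0.5))
    print("P(at least 8 heads) =", prob_at_least(8, 10, 0.5))
    rng = random.Random(1)
    print("five simulated data sets:", [simulate_heads(10, 0.5, rng) for _ in range(5)])
```

Statistics runs in the opposite direction: given observed heads, it asks which values of P(heads) are plausible.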
Statistical Thinking
Statistics: a set of mathematical procedures for summarizing and interpreting observations, divided into descriptive statistics and inferential statistics.
Observations: observations are typically numerical or categorical. Facts about specific people or things are usually referred to as data.
Process: statistical thinking is a thought process, not a mere "application of a set of methods."
Necessary? "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." (H.G. Wells)
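The split between descriptive and inferential statistics can be shown on one small data set. In this minimal sketch, `describe` summarizes the observations in hand, while `mean_confidence_interval` makes a statement about the unknown population mean. Both helpers and the example data are hypothetical. The interval uses a normal (z) approximation because Python's standard library has no t distribution; for small samples a t-based interval is the usual choice.

```python
import statistics
from statistics import NormalDist


def describe(data):
    """Descriptive statistics: summarize the observations we actually have."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1 denominator)
    }


def mean_confidence_interval(data, level=0.95):
    """Inferential statistics: an approximate interval for the unknown population mean.

    Uses a normal approximation (z critical value), which is reasonable only for
    fairly large samples; a t critical value is the usual choice for small n.
    """
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / n**0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * se, m + z * se


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
    print(describe(sample))
    print(mean_confidence_interval(sample))
```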
Hypothesis Testing
Steps
1. Define your hypotheses (null and alternative).
2. Specify your null distribution.
3. Do an experiment.
4. Calculate the p-value of what you observed.
5. Reject or fail to reject (loosely, "accept") the null hypothesis.
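The testing steps can be traced in a small worked case: deciding whether a coin is fair. This sketch assumes an exact two-sided binomial test in which the p-value is computed before the reject/fail-to-reject decision. The function name `exact_binomial_test`, the default α = 0.05, and the coin scenario are illustrative choices, not taken from the slides.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_test(heads, n, p0=0.5, alpha=0.05):
    """Walk through the testing steps for a coin-fairness question.

    Hypotheses: H0: P(heads) = p0; H1: P(heads) != p0.
    Null distribution: under H0, heads ~ Binomial(n, p0).
    Experiment: the caller supplies the observed number of heads.
    p-value: sum the null probabilities of all outcomes that are no more
        likely than the observed one (two-sided).
    Decision: reject H0 if the p-value is below alpha.
    """
    null_pmf = [binomial_pmf(k, n, p0) for k in range(n + 1)]
    observed = null_pmf[heads]
    # A small relative tolerance keeps tied outcomes from being lost to rounding.
    p_value = sum(q for q in null_pmf if q <= observed * (1 + 1e-9))
    p_value = min(1.0, p_value)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return p_value, decision


if __name__ == "__main__":
    print(exact_binomial_test(9, 10))  # 9 heads in 10 flips
    print(exact_binomial_test(5, 10))  # 5 heads in 10 flips
```

With 9 heads in 10 flips, only 0, 1, 9, or 10 heads are as unlikely under H0, giving p = 22/1024 (about 0.021), so H0 is rejected at α = 0.05.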
Hypothesis Testing
Error and Power
Type I error (its probability is denoted α): rejecting the null when the effect isn't real.
Type II error (its probability is denoted β): failing to reject the null when the effect is real.
Power (the flip side of Type II error, 1 - β): the probability of detecting a true effect if one exists.
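α and power can both be estimated by simulating many experiments and counting how often the test rejects. This sketch assumes a two-sided one-sample z-test of H0: mean = 0 with known σ. The test setup, sample size, and helper names are hypothetical. When the true mean is 0, the rejection rate estimates the Type I error rate. When the true mean is nonzero, it estimates power (1 - β). `analytic_power` gives the exact value for comparison.

```python
import random
import statistics
from statistics import NormalDist


def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, n_sims=10_000, seed=0):
    """Monte Carlo estimate of how often a two-sided z-test rejects H0: mean = 0.

    true_mean = 0 (H0 true): estimates the Type I error rate (about alpha).
    true_mean != 0 (H0 false): estimates power, i.e. 1 - beta.
    Assumes sigma is known, which makes this a z-test rather than a t-test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n**0.5
    rejections = 0
    for _ in range(n_sims):
        xbar = statistics.fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if abs(xbar / se) > z_crit:
            rejections += 1
    return rejections / n_sims


def analytic_power(true_mean, n=25, sigma=1.0, alpha=0.05):
    """Exact rejection probability of the same two-sided z-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = true_mean * n**0.5 / sigma
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


if __name__ == "__main__":
    print("Type I error rate (simulated):", rejection_rate(0.0))
    print("power at mean 0.5 (simulated):", rejection_rate(0.5))
    print("power at mean 0.5 (analytic): ", analytic_power(0.5))
```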
Hypothesis Testing
Pascal's Wager
The truth (rows) versus your decision (columns):
                      Reject God     Accept God
God exists            Big mistake    Correct (big payoff)
God doesn't exist     Correct        Minor mistake
Hypothesis Testing
Type I and Type II Errors in a Box
Your statistical decision (rows) versus the true state of the null hypothesis (columns):
                    H0 true             H0 false
Reject H0           Type I error (α)    Correct
Do not reject H0    Correct             Type II error (β)
Example: H0 true means the drug doesn't work; H0 false means the drug works. Rejecting H0 means you conclude that the drug works; not rejecting H0 means you conclude that there is insufficient evidence that the drug works.
Hypothesis Testing
Error and Power
Type I error rate: the probability of finding an effect that isn't real (false positive). If we require p < .05 for statistical significance, then when there is no real effect we will still get a "significant" result about 1 time in 20, just by chance.
Type II error rate: the probability of missing an effect that is real (false negative).
Statistical power: the probability of finding an effect if it is there (the probability of not making a Type II error). When we design studies, we typically aim for a power of 80%, which allows a false negative (Type II error) rate of 20%.
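Designing for 80% power comes down to choosing a sample size. This is a minimal sketch assuming a two-sided one-sample z-test with a standardized effect size d = (true mean - null mean) / σ. It uses the standard approximation n = ((z_{1-α/2} + z_{power}) / d)². The function `n_for_power` is a hypothetical helper, and real designs (two groups, t-tests, dropout) need adjustments.

```python
import math
from statistics import NormalDist


def n_for_power(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided one-sample z-test.

    effect_size is the standardized effect d = (true mean - null mean) / sigma.
    Uses n = ((z_{1 - alpha/2} + z_{power}) / d) ** 2, rounded up; it ignores the
    tiny probability of rejecting in the opposite tail.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.8, 0.5, 0.2):
        print(f"d = {d}: n = {n_for_power(d)}")
```

Halving the effect size roughly quadruples the required sample, which is why small effects need large studies.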
Pitfalls of Hypothesis Testing
Over-emphasis on p-values: clinically unimportant effects may be statistically significant if a study is large (and therefore has a small standard error and extreme precision).
Significance is not causation: statistical significance does not imply a cause-effect relationship. Interpret results in the context of the study design.
Low statistical power: results that are not statistically significant should not be interpreted as "evidence of no effect" but as "no evidence of effect." Studies may miss effects if they are insufficiently powered (lack precision).
Comparing significance: beware the fallacy of comparing statistical significance. "The effect was significant in the treatment group, but not significant in the control group" does not imply that the groups differ significantly.
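Two of these pitfalls can be checked numerically. This sketch uses hypothetical estimates and standard errors, not data from any study, together with a normal-approximation p-value. First, a tiny effect becomes highly "significant" when the study is huge. Second, one group can be significant and the other not while the difference between them is not significant. The standard error of a difference between independent estimates is √(se₁² + se₂²).

```python
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided p-value for H0: true effect = 0, using a normal (z) approximation."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


# Over-emphasis on p-values: a hypothetical effect of 0.02 standard deviations
# with n = 100,000, so the standard error is 1 / sqrt(n).
tiny_effect_p = two_sided_p(0.02, 1 / 100_000**0.5)  # far below .05

# Comparing significance: hypothetical (estimate, standard error) in two
# independent groups.
treat_est, treat_se = 25.0, 10.0
ctrl_est, ctrl_se = 10.0, 10.0
p_treat = two_sided_p(treat_est, treat_se)  # about 0.012: "significant"
p_ctrl = two_sided_p(ctrl_est, ctrl_se)     # about 0.32: "not significant"

# The correct comparison tests the difference directly.
diff_se = (treat_se**2 + ctrl_se**2) ** 0.5
p_diff = two_sided_p(treat_est - ctrl_est, diff_se)  # about 0.29: no significant difference

if __name__ == "__main__":
    print(f"tiny effect, huge study: p = {tiny_effect_p:.2g}")
    print(f"treatment p = {p_treat:.3f}, control p = {p_ctrl:.3f}, difference p = {p_diff:.3f}")
```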
Correlated Data
Are the observations independent or correlated?
Independent: observations are unrelated (usually different, unrelated people).
Correlated: some observations are related to one another, for example, the same person measured over time.
Example: split-face trial
56 subjects; the unit of observation is the side of the face.
Apply SPF 85 sunscreen on one side of the face and SPF 50 on the other half.
The outcome is sunburn (yes or no).
Subject-level variable: hours engaged in outdoor sports.
The two observations from each subject are correlated.
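For paired yes/no outcomes such as sunburn on each side of the same face, one standard analysis is the exact McNemar test. It respects the pairing by using only the discordant pairs. The slides do not name a test, so treat this as one reasonable choice rather than the course's method. The counts in the usage example are hypothetical, not results of the trial described above.

```python
import math


def mcnemar_exact(b, c):
    """Exact two-sided McNemar test for paired yes/no outcomes.

    Each subject contributes a pair (side A, side B). Only discordant pairs
    carry information about a difference between the two sides:
      b = subjects burned on side A only, c = subjects burned on side B only.
    Under H0 (no difference between sides), b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts: 1 subject burned only on the SPF 85 side,
    # 9 burned only on the SPF 50 side; concordant pairs do not enter the test.
    print(mcnemar_exact(1, 9))
```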
Correlated Data
Ignoring correlations will:
overestimate p-values for within-person or within-cluster comparisons;
underestimate p-values for between-person or between-cluster comparisons.
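The within-person case can be seen on a small hypothetical data set. People differ greatly from one another, but side B is consistently about 1 unit higher than side A within each person. Treating the two columns as independent buries the consistent difference under between-person variation and inflates the p-value; analyzing within-person differences does not. Both tests below use a normal approximation for simplicity; with only 8 pairs, a paired t-test would be the usual choice.

```python
import statistics
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a z statistic (normal approximation)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def unpaired_z(a, b):
    """Treats the two columns as independent samples (ignores the pairing)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / se


def paired_z(a, b):
    """Uses within-person differences, which removes between-person variation."""
    diffs = [y - x for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / len(diffs) ** 0.5
    return statistics.mean(diffs) / se


# Hypothetical within-person data (for example, two sides of the same face).
side_a = [10, 20, 30, 40, 50, 60, 70, 80]
side_b = [x + d for x, d in zip(side_a, [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9])]

p_paired = two_sided_p(paired_z(side_a, side_b))
p_unpaired = two_sided_p(unpaired_z(side_a, side_b))

if __name__ == "__main__":
    print(f"paired (respects correlation): p = {p_paired:.3g}")
    print(f"unpaired (ignores correlation): p = {p_unpaired:.3f}")
```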
You have reached the end of the presentation.
Editor's Notes
from Business Analytics for Managers: Taking Business Intelligence Beyond Reporting, by Gert H.N. Laursen & Jesper Thorlund
Ref: Wimalawansa et al. Am J Med 1998, 104:219-226.