SlideShare a Scribd company logo
1 of 21
Review of Chapters 1- 5
We review some important themes from the first 5 chapters
1. Introduction
• Statistics- Set of methods for collecting/analyzing data (the
art and science of learning from data). Provides methods for
• Design – Planning / Implementing a study
• Description – Graphical and numerical methods for
summarizing the data
• Inference – Methods for making predictions about a
population (total set of subjects of interest, real or
conceptual), based on a sample
2. Sampling and Measurement
• Variable – a characteristic that can vary in value among
subjects in a sample or a population.
Types of variables
• Categorical
• Quantitative
• Categorical variables can be ordinal (ordered categories) or
nominal (unordered categories)
• Quantitative variables can be continuous or discrete
• Classifications affect the analysis; e.g., for categorical
variables we make inferences about proportions and for
quantitative variables we make inferences about means
Randomization – obtaining reliable data
by reducing potential bias
Simple random sample: In a sample survey, each
possible sample of size n has the same chance of
being selected.
Randomization in a survey used to get a good cross-
section of population. With such probability sampling
methods, standard errors tell us how close sample
statistics tend to be to population parameters.
(Otherwise, the sampling error is unpredictable.)
Experimental vs. observational
studies
• Sample surveys are examples of observational
studies (merely observe subjects without any
experimental manipulation)
• Experimental studies: Researcher assigns
subjects to experimental conditions.
– Subjects should be assigned at random to the
conditions (“treatments”)
– Randomization “balances” treatment groups with
respect to lurking variables that could affect
response (e.g., demographic characteristics,
SES), makes it easier to assess cause and effect
3. Descriptive Statistics
• Numerical descriptions of center (mean and median),
variability (standard deviation – typical distance from
mean), position (quartiles, percentiles)
• Bivariate description uses regression/correlation
(quantitative variable), contingency table analysis
(categorical variables). Graphics include histogram,
box plot, scatterplot
•Mean drawn toward longer tail for skewed distributions, relative to
median.
•Properties of the standard deviation s:
• s increases with the amount of variation around the mean
•s depends on units of the data (e.g. measure euro vs $)
•Like mean, affected by outliers
•Empirical rule: If distribution approx. bell-shaped,
about 68% of data within 1 std. dev. of mean
about 95% of data within 2 std. dev. of mean
all or nearly all data within 3 std. dev. of mean
Sample statistics /
Population parameters
• We distinguish between summaries of samples
(statistics) and summaries of populations
(parameters).
Denote statistics by Roman letters, parameters by
Greek letters:
• Population mean =m, standard deviation = s,
proportion  are parameters. In practice,
parameter values are unknown, we make
inferences about their values using sample
statistics.
4. Probability Distributions
Probability: With random sampling or a randomized
experiment, the probability an observation takes a
particular value is the proportion of times that outcome
would occur in a long sequence of observations.
A probability distribution lists all the possible values and
their probabilities (which add to 1.0)
Like frequency dist’s, probability distributions
have mean and standard deviation
Standard Deviation - Measures the “typical” distance of
an outcome from the mean, denoted by σ
( ) ( )
E Y yP y
m   
Normal distribution
• Symmetric, bell-shaped
• Characterized by mean (m) and standard deviation (s),
representing center and spread
• Prob. within any particular number z of standard
deviations of m is same for all normal distributions
• An individual observation from an approximately
normal distribution satisfies:
– Probability 0.68 within 1 standard deviation of mean
– 0.95 within 2 standard deviations
– 0.997 (virtually all) within 3 standard deviations
Notes about z-scores
• z-score represents number of standard deviations that a value
falls from mean of dist.
• A value y is z = (y - µ)/σ standard deviations from µ
• The standard normal distribution is the normal dist with µ =
0, σ = 1
Sampling dist. of sample mean
• is a variable, its value varying from sample to
sample about population mean µ. Sampling
distribution of a statistic is the probability distribution
for the possible values of the statistic
• Standard deviation of sampling dist of is called the
standard error of
• For random sampling, the sampling dist. of
has mean µ and standard error
popul. std. dev.
sample size
y
n
s
s  
y
y
y
y
y
Central Limit Theorem: For random sampling
with “large” n, sampling dist of sample mean
is approximately a normal distribution
• Approx. normality applies no matter what the
shape of the population distribution
• How “large” n needs to be depends on skew of
population dist, but usually n ≥ 30 sufficient
• Can be verified empirically, by simulating with
“sampling distribution” applet at
www.prenhall.com/agresti. Following figure shows
how sampling distribution depends on n and
shape of population distribution.
y
5. Statistical Inference: Estimation
Point estimate: A single statistic value that is the
“best guess” for the parameter value (such as
sample mean as point estimate of popul. mean)
Interval estimate: An interval of numbers around the
point estimate, that has a fixed “confidence level” of
containing the parameter value. Called a
confidence interval.
(Based on sampling dist. of the point estimate, has
form point estimate plus and minus a margin of
error that is a z or t score times the standard error)
Confidence Interval for a Proportion
(in a particular category)
• Sample proportion is a mean when we let y=1 for
observation in category of interest, y=0 otherwise
• Population prop. is mean µ of prob. dist having
• The standard deviation of this prob. dist. is
• The standard error of the sample proportion is
(1) and (0) 1
P P
 
  
(1 ) (e.g., 0.50 when 0.50)
s   
  
ˆ / (1 )/
n n

s s  
  
ˆ

Finding a CI in practice
• Complication: The true standard error
itself depends on the unknown parameter!
ˆ / (1 )/
n n

s s  
  
In practice, we estimate
and then find 95% CI using formula
^
^ ^
1
(1 )
by se
n n

 
 
s
 

 
  
 
ˆ ˆ
1.96( ) to 1.96( )
se se
 
 
CI for a population mean
• For a random sample from a normal population
distribution, a 95% CI for µ is
where df = n-1 for the t-score
• Normal population assumption ensures
sampling dist. has bell shape for any n (Recall
figure on p. 93 of text and next page). Method is
robust to violation of normal assumption, more
so for large n because of CLT.
.025( ), with /
y t se se s n
 
Questions you should be able to answer:
• What is the difference between descriptive statistics and
inferential statistics? Give an example of each.
• Why do we distinguish between quantitative variables and
categorical (nominal and ordinal) variables?
• What is a simple random sample?
• Why should a survey use random sampling, and why should an
experiment use randomization in assigning subjects to
treatments?
• What is the difference between an experiment and an
observational study?
• Define the mean and median, and give an advantage and
disadvantage of each. Give an example when they would be
quite different? Which would be larger, and why?
• How do you interpret a standard deviation?
• Which measures are resistant to outliers: Mean, median,
range, standard deviation, inter-quartile range?
• What information does a box plot give you?
• How can you describe an association between two
(i) quantitative variables, (ii) categorical variables
• What is the correlation used for, and how do you interpret its
value?
• If you know or can estimate P(A) and P(B), how do you find
P(A and B) when A and B are independent?
• How do you use the normal distribution to get tail
probabilities, central probabilities, z-scores?
• What does the z-score represent?
• What is a sampling distribution?
• What is a standard error?
• What’s the difference between s, σ, se. Which is the smallest?
• Why is the normal distribution important in statistics?
• What does the Central Limit Theorem say? Why is it
important in statistical inference?
• What are the differences among (i) population distribution, (ii)
sample data distribution, (iii) sampling distribution?
• How do you interpret a confidence interval?
• Why do standard errors have to be estimated, and what
impact does that have (i) for inference about a proportion, (ii)
for inference about a mean?
• How does the t distribution depend on the df value, and how
does it compare to the standard normal distribution?
• How does a confidence interval depend on the (i) sample
size, (ii) confidence level? How do you find the z-score or t-
score for a particular confidence?
• Why do you use a t-score instead of a z-score when you find
the margin of error for a confidence interval for a mean?

More Related Content

Similar to Review of Chapters 1-5.ppt

Introduction to Statistics2312.ppt
Introduction to Statistics2312.pptIntroduction to Statistics2312.ppt
Introduction to Statistics2312.pptpathianithanaidu
 
Presentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlalPresentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlalIstiqlalEid
 
Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in ResearchManoj Sharma
 
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014RSS6
 
Introduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptIntroduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptTripthiDubey
 
Malimu statistical significance testing.
Malimu statistical significance testing.Malimu statistical significance testing.
Malimu statistical significance testing.Miharbi Ignasm
 
Standard deviation
Standard deviationStandard deviation
Standard deviationM K
 
Stats-Review-Maie-St-John-5-20-2009.ppt
Stats-Review-Maie-St-John-5-20-2009.pptStats-Review-Maie-St-John-5-20-2009.ppt
Stats-Review-Maie-St-John-5-20-2009.pptDiptoKumerSarker1
 
1. complete stats notes
1. complete stats notes1. complete stats notes
1. complete stats notesBob Smullen
 
Basics of biostatistic
Basics of biostatisticBasics of biostatistic
Basics of biostatisticNeurologyKota
 
COM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptxCOM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptxAkinsolaAyomidotun
 
Research methodology and iostatistics ppt
Research methodology and iostatistics pptResearch methodology and iostatistics ppt
Research methodology and iostatistics pptNikhat Mohammadi
 
Estimation and hypothesis
Estimation and hypothesisEstimation and hypothesis
Estimation and hypothesisJunaid Ijaz
 

Similar to Review of Chapters 1-5.ppt (20)

Introduction to Statistics2312.ppt
Introduction to Statistics2312.pptIntroduction to Statistics2312.ppt
Introduction to Statistics2312.ppt
 
Presentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlalPresentation research- chapter 10-11 istiqlal
Presentation research- chapter 10-11 istiqlal
 
estimation
estimationestimation
estimation
 
Estimation
EstimationEstimation
Estimation
 
Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in Research
 
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014
 
Introduction to Statistics53004300.ppt
Introduction to Statistics53004300.pptIntroduction to Statistics53004300.ppt
Introduction to Statistics53004300.ppt
 
Malimu statistical significance testing.
Malimu statistical significance testing.Malimu statistical significance testing.
Malimu statistical significance testing.
 
week6a.ppt
week6a.pptweek6a.ppt
week6a.ppt
 
Res701 research methodology lecture 7 8-devaprakasam
Res701 research methodology lecture 7 8-devaprakasamRes701 research methodology lecture 7 8-devaprakasam
Res701 research methodology lecture 7 8-devaprakasam
 
Standard deviation
Standard deviationStandard deviation
Standard deviation
 
template.pptx
template.pptxtemplate.pptx
template.pptx
 
Stats-Review-Maie-St-John-5-20-2009.ppt
Stats-Review-Maie-St-John-5-20-2009.pptStats-Review-Maie-St-John-5-20-2009.ppt
Stats-Review-Maie-St-John-5-20-2009.ppt
 
1. complete stats notes
1. complete stats notes1. complete stats notes
1. complete stats notes
 
Basics of biostatistic
Basics of biostatisticBasics of biostatistic
Basics of biostatistic
 
COM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptxCOM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptx
 
Research methodology and iostatistics ppt
Research methodology and iostatistics pptResearch methodology and iostatistics ppt
Research methodology and iostatistics ppt
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Estimation and hypothesis
Estimation and hypothesisEstimation and hypothesis
Estimation and hypothesis
 
Statistical analysis
Statistical  analysisStatistical  analysis
Statistical analysis
 

Recently uploaded

Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 

Recently uploaded (20)

Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 

Review of Chapters 1-5.ppt

  • 1. Review of Chapters 1- 5 We review some important themes from the first 5 chapters 1. Introduction • Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods for • Design – Planning / Implementing a study • Description – Graphical and numerical methods for summarizing the data • Inference – Methods for making predictions about a population (total set of subjects of interest, real or conceptual), based on a sample
  • 2. 2. Sampling and Measurement • Variable – a characteristic that can vary in value among subjects in a sample or a population. Types of variables • Categorical • Quantitative • Categorical variables can be ordinal (ordered categories) or nominal (unordered categories) • Quantitative variables can be continuous or discrete • Classifications affect the analysis; e.g., for categorical variables we make inferences about proportions and for quantitative variables we make inferences about means
  • 3. Randomization – obtaining reliable data by reducing potential bias Simple random sample: In a sample survey, each possible sample of size n has the same chance of being selected. Randomization in a survey used to get a good cross- section of population. With such probability sampling methods, standard errors tell us how close sample statistics tend to be to population parameters. (Otherwise, the sampling error is unpredictable.)
  • 4. Experimental vs. observational studies • Sample surveys are examples of observational studies (merely observe subjects without any experimental manipulation) • Experimental studies: Researcher assigns subjects to experimental conditions. – Subjects should be assigned at random to the conditions (“treatments”) – Randomization “balances” treatment groups with respect to lurking variables that could affect response (e.g., demographic characteristics, SES), makes it easier to assess cause and effect
  • 5. 3. Descriptive Statistics • Numerical descriptions of center (mean and median), variability (standard deviation – typical distance from mean), position (quartiles, percentiles) • Bivariate description uses regression/correlation (quantitative variable), contingency table analysis (categorical variables). Graphics include histogram, box plot, scatterplot
  • 6. •Mean drawn toward longer tail for skewed distributions, relative to median. •Properties of the standard deviation s: • s increases with the amount of variation around the mean •s depends on units of the data (e.g. measure euro vs $) •Like mean, affected by outliers •Empirical rule: If distribution approx. bell-shaped, about 68% of data within 1 std. dev. of mean about 95% of data within 2 std. dev. of mean all or nearly all data within 3 std. dev. of mean
  • 7. Sample statistics / Population parameters • We distinguish between summaries of samples (statistics) and summaries of populations (parameters). Denote statistics by Roman letters, parameters by Greek letters: • Population mean =m, standard deviation = s, proportion  are parameters. In practice, parameter values are unknown, we make inferences about their values using sample statistics.
  • 8. 4. Probability Distributions Probability: With random sampling or a randomized experiment, the probability an observation takes a particular value is the proportion of times that outcome would occur in a long sequence of observations. A probability distribution lists all the possible values and their probabilities (which add to 1.0)
  • 9. Like frequency dist’s, probability distributions have mean and standard deviation Standard Deviation - Measures the “typical” distance of an outcome from the mean, denoted by σ ( ) ( ) E Y yP y m   
  • 10. Normal distribution • Symmetric, bell-shaped • Characterized by mean (m) and standard deviation (s), representing center and spread • Prob. within any particular number z of standard deviations of m is same for all normal distributions • An individual observation from an approximately normal distribution satisfies: – Probability 0.68 within 1 standard deviation of mean – 0.95 within 2 standard deviations – 0.997 (virtually all) within 3 standard deviations
  • 11. Notes about z-scores • z-score represents number of standard deviations that a value falls from mean of dist. • A value y is z = (y - µ)/σ standard deviations from µ • The standard normal distribution is the normal dist with µ = 0, σ = 1
  • 12. Sampling dist. of sample mean • is a variable, its value varying from sample to sample about population mean µ. Sampling distribution of a statistic is the probability distribution for the possible values of the statistic • Standard deviation of sampling dist of is called the standard error of • For random sampling, the sampling dist. of has mean µ and standard error popul. std. dev. sample size y n s s   y y y y y
  • 13. Central Limit Theorem: For random sampling with “large” n, sampling dist of sample mean is approximately a normal distribution • Approx. normality applies no matter what the shape of the population distribution • How “large” n needs to be depends on skew of population dist, but usually n ≥ 30 sufficient • Can be verified empirically, by simulating with “sampling distribution” applet at www.prenhall.com/agresti. Following figure shows how sampling distribution depends on n and shape of population distribution. y
  • 14.
  • 15. 5. Statistical Inference: Estimation Point estimate: A single statistic value that is the “best guess” for the parameter value (such as sample mean as point estimate of popul. mean) Interval estimate: An interval of numbers around the point estimate, that has a fixed “confidence level” of containing the parameter value. Called a confidence interval. (Based on sampling dist. of the point estimate, has form point estimate plus and minus a margin of error that is a z or t score times the standard error)
  • 16. Confidence Interval for a Proportion (in a particular category) • Sample proportion is a mean when we let y=1 for observation in category of interest, y=0 otherwise • Population prop. is mean µ of prob. dist having • The standard deviation of this prob. dist. is • The standard error of the sample proportion is (1) and (0) 1 P P      (1 ) (e.g., 0.50 when 0.50) s       ˆ / (1 )/ n n  s s      ˆ 
  • 17. Finding a CI in practice • Complication: The true standard error itself depends on the unknown parameter! ˆ / (1 )/ n n  s s      In practice, we estimate and then find 95% CI using formula ^ ^ ^ 1 (1 ) by se n n      s           ˆ ˆ 1.96( ) to 1.96( ) se se    
  • 18. CI for a population mean • For a random sample from a normal population distribution, a 95% CI for µ is where df = n-1 for the t-score • Normal population assumption ensures sampling dist. has bell shape for any n (Recall figure on p. 93 of text and next page). Method is robust to violation of normal assumption, more so for large n because of CLT. .025( ), with / y t se se s n  
  • 19. Questions you should be able to answer: • What is the difference between descriptive statistics and inferential statistics? Give an example of each. • Why do we distinguish between quantitative variables and categorical (nominal and ordinal) variables? • What is a simple random sample? • Why should a survey use random sampling, and why should an experiment use randomization in assigning subjects to treatments? • What is the difference between an experiment and an observational study? • Define the mean and median, and give an advantage and disadvantage of each. Give an example when they would be quite different? Which would be larger, and why? • How do you interpret a standard deviation?
  • 20. • Which measures are resistant to outliers: Mean, median, range, standard deviation, inter-quartile range? • What information does a box plot give you? • How can you describe an association between two (i) quantitative variables, (ii) categorical variables • What is the correlation used for, and how do you interpret its value? • If you know or can estimate P(A) and P(B), how do you find P(A and B) when A and B are independent? • How do you use the normal distribution to get tail probabilities, central probabilities, z-scores? • What does the z-score represent? • What is a sampling distribution? • What is a standard error? • What’s the difference between s, σ, se. Which is the smallest? • Why is the normal distribution important in statistics?
  • 21. • What does the Central Limit Theorem say? Why is it important in statistical inference? • What are the differences among (i) population distribution, (ii) sample data distribution, (iii) sampling distribution? • How do you interpret a confidence interval? • Why do standard errors have to be estimated, and what impact does that have (i) for inference about a proportion, (ii) for inference about a mean? • How does the t distribution depend on the df value, and how does it compare to the standard normal distribution? • How does a confidence interval depend on the (i) sample size, (ii) confidence level? How do you find the z-score or t- score for a particular confidence? • Why do you use a t-score instead of a z-score when you find the margin of error for a confidence interval for a mean?