Basics of Statistics
Experimental design,
& sample size determination
1
2
Research Process
Research question
Hypothesis
Identify research design
Data collection
Presentation of data
Data analysis
Interpretation of data
3
4
Types of variables:
• Qualitative (or Categorical, or Attribute)
– Nominal
– Ordinal
• Quantitative (or Numerical)
– Discrete
– Continuous
5
Parametric-vs-Non-Parametric-Test-Hierarchy
Statistical Tests
Statistical test | Type of data needed | Test statistic | Example
Correlation | Two continuous variables | Pearson’s r | Are blood pressure and weight correlated?
t-tests/ANOVA | Means from a continuous variable taken from two or more groups | Student’s t (t-tests); F (ANOVA) | Do normal weight (group 1) patients have lower blood pressure than obese patients (group 2)?
Chi-square | Two categorical variables | Chi-square (χ²) | Are obese individuals (obese vs. not obese) significantly more likely to have a stroke (stroke vs. no stroke)?
Logistic regression | A dichotomous variable as the outcome | Odds ratios (OR) & 95% confidence intervals (CI) | Does obesity predict stroke (stroke vs. no stroke) when controlling for other variables?
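As a rough illustration of the chi-square and odds-ratio rows, the sketch below computes Pearson's chi-square statistic (without continuity correction) and an unadjusted odds ratio for a 2×2 table. The counts and the function names are made up for illustration and are not from the slides; an odds ratio from logistic regression would additionally adjust for other variables.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def odds_ratio(table):
    """Unadjusted odds ratio (cross-product ratio) for a 2x2 table."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts. Rows: obese / not obese; columns: stroke / no stroke.
counts = [[10, 20], [30, 40]]
chi2 = chi_square_2x2(counts)
unadjusted_or = odds_ratio(counts)
```

With one degree of freedom, the statistic would be compared to 3.84 for a test at the 5% level.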
8
Types of Statistics/Analyses
Descriptive Statistics: describing a phenomenon
– Frequencies: How many? How much?
– Basic measurements: BP, HR, BMI, IQ, etc.
Inferential Statistics: inferences about a phenomenon
– Hypothesis Testing: proving or disproving theories
– Correlation: associations between phenomena
– Confidence Intervals: whether the sample relates to the larger population
– Significance Testing (t-tests)
– Chi-square
– Prediction (Logistic Regression)
E.g., diet and health
Basic principles
1. Formulate question/goal in advance
2. Comparison/control
3. Replication
4. Randomization
5. Stratification (aka blocking)
6. Factorial experiments
9
Example
Question: Does salted drinking water affect blood
pressure (BP) in mice?
Experiment:
1. Provide a mouse with water containing 1% NaCl.
2. Wait 14 days.
3. Measure BP.
10
Comparison/control
Good experiments are comparative.
• Compare BP in mice fed salt water to BP in mice
fed plain water.
• Compare BP in strain A mice fed salt water to BP
in strain B mice fed salt water.
Ideally, the experimental group is compared to
concurrent controls (rather than to historical controls).
11
Replication
12
Why replicate?
• Reduce the effect of uncontrolled variation
(i.e., increase precision).
• Quantify uncertainty.
A related point:
An estimate is of no value without some
statement of the uncertainty in the estimate.
13
Randomization
Experimental subjects (“units”) should be assigned to
treatment groups at random.
At random does not mean haphazardly.
One needs to explicitly randomize using
• A computer, or
• Coins, dice or cards.
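A minimal sketch of explicit randomization by computer, assuming Python's standard `random` module. The mouse IDs, group names, seed, and the `randomize` helper are hypothetical and not from the slides.

```python
import random


def randomize(units, groups=("salt water", "plain water"), seed=None):
    """Randomly assign units to equally sized treatment groups."""
    if len(units) % len(groups) != 0:
        raise ValueError("number of units must be a multiple of the number of groups")
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)       # explicit randomization, not haphazard picking
    per_group = len(shuffled) // len(groups)
    return {
        group: sorted(shuffled[i * per_group:(i + 1) * per_group])
        for i, group in enumerate(groups)
    }


mice = [f"mouse{i:02d}" for i in range(1, 13)]  # hypothetical IDs
assignment = randomize(mice, seed=2024)
```

Recording the seed alongside the assignment lets the allocation be reproduced and audited later.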
14
Why randomize?
• Avoid bias.
– For example: the first six mice you grab may have
intrinsically higher BP.
• Control the role of chance.
– Randomization allows the later use of probability
theory, and so gives a solid foundation for
statistical analysis.
15
Stratification
• Suppose that some BP measurements will be made
in the morning and some in the afternoon.
• If you anticipate a difference between morning and
afternoon measurements:
– Ensure that within each period, there are equal
numbers of subjects in each treatment group.
– Take account of the difference between periods in
your analysis.
• This is sometimes called “blocking”.
16
Example
• 20 male mice and 20 female mice.
• Half to be treated; the other half left untreated.
• Can only work with 4 mice per day.
Question: How to assign individuals to treatment
groups and to days?
17
An extremely
bad design
18
Randomized
19
A stratified design
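One possible stratified layout for this example can be sketched as follows; it is not necessarily the one pictured on the slide. Each of the 10 days gets one treated and one control mouse of each sex, and which mice fill those slots is decided at random. The mouse IDs and the `stratified_design` helper are hypothetical.

```python
import random


def stratified_design(n_per_sex=20, seed=None):
    """Schedule 4 mice per day: one treated and one control mouse of each sex."""
    rng = random.Random(seed)
    males = [f"M{i:02d}" for i in range(1, n_per_sex + 1)]
    females = [f"F{i:02d}" for i in range(1, n_per_sex + 1)]
    rng.shuffle(males)      # random order decides day and treatment within each sex
    rng.shuffle(females)
    schedule = []
    for day in range(n_per_sex // 2):
        m_treated, m_control = males[2 * day], males[2 * day + 1]
        f_treated, f_control = females[2 * day], females[2 * day + 1]
        day_plan = [
            (m_treated, "treated"), (m_control, "control"),
            (f_treated, "treated"), (f_control, "control"),
        ]
        rng.shuffle(day_plan)  # also randomize the handling order within the day
        schedule.append(day_plan)
    return schedule


schedule = stratified_design(seed=7)
```

Because sex and day are balanced across treatment groups by construction, neither can be confounded with the treatment effect.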
20
Randomization
and stratification
• If you can (and want to), fix a variable.
– e.g., use only 8 week old male mice from a single
strain.
• If you don’t fix a variable, stratify it.
– e.g., use both 8 week and 12 week old male mice,
and stratify with respect to age.
• If you can neither fix nor stratify a variable, randomize it.
21
Factorial
experiments
Suppose we are interested in the effect of both salt
water and a high-fat diet on blood pressure.
Ideally: look at all 4 treatments in one experiment.
– Plain water, normal diet
– Plain water, high-fat diet
– Salt water, normal diet
– Salt water, high-fat diet
Why?
– We can learn more.
– More efficient than doing all single-factor
experiments.
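A small sketch of what the 2×2 factorial layout makes available, assuming made-up cell means (the BP values below are invented for illustration only). It enumerates the four treatment combinations and shows that the effect of salt water can be estimated separately under each diet, which is what allows an interaction to be detected.

```python
from itertools import product

water_levels = ("plain water", "salt water")
diet_levels = ("normal diet", "high-fat diet")

# All 4 treatment combinations of the two factors.
treatments = list(product(water_levels, diet_levels))

# Hypothetical average BP for each combination (invented numbers).
cell_means = {
    ("plain water", "normal diet"): 100.0,
    ("salt water", "normal diet"): 110.0,
    ("plain water", "high-fat diet"): 105.0,
    ("salt water", "high-fat diet"): 125.0,
}

# Effect of salt water within each diet.
salt_effect = {
    diet: cell_means[("salt water", diet)] - cell_means[("plain water", diet)]
    for diet in diet_levels
}

# Interaction: how much the salt effect differs between the two diets.
interaction = salt_effect["high-fat diet"] - salt_effect["normal diet"]
```

Two separate single-factor experiments would estimate each effect under one level of the other factor only, so the interaction term could not be computed.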

22
Interactions
23
Other points
• Blinding
– Measurements made by people can be influenced
by unconscious biases.
– Ideally, dissections and measurements should be
made without knowledge of the treatment applied.
• Internal controls
– It can be useful to use the subjects themselves as
their own controls (e.g., consider the response
after vs. before treatment).
– Why? Increased precision.
24
Other points
• Representativeness
– Are the subjects/tissues you are studying really
representative of the population you want to
study?
– Ideally, your study material is a random sample
from the population of interest.
25
Summary
Characteristics of good experiments:
• Unbiased
– Randomization
– Blinding
• High precision
– Uniform material
– Replication
– Stratification
• Simple
– Protect against mistakes
• Wide range of applicability
– Deliberate variation
– Factorial designs
• Able to estimate uncertainty
– Replication
– Randomization
26
Basic statistics
27
What is statistics?
“We may at once admit that any inference from the
particular to the general must be attended with some
degree of uncertainty, but this is not the same as to
admit that such inference cannot be absolutely
rigorous, for the nature and degree of the uncertainty
may itself be capable of rigorous expression.”
— Sir R. A. Fisher
28
What is statistics?
• Data exploration and analysis
• Inductive inference with probability
• Quantification of uncertainty
29
Example
• We have data on the blood
pressure (BP) of 6 mice.
• We are not interested in
these particular 6 mice.
• Rather, we want to make
inferences about the BP of
all possible such mice.
30
Sampling
31
Several samples
32
Distribution of
sample average
33
Confidence intervals
• We observe the BP of 6 mice, with average = 103.6 and standard deviation (SD) = 9.7.
• We assume that BP in the underlying population follows a normal (aka Gaussian) distribution.
• On the basis of these data, we calculate a 95% confidence interval (CI) for the underlying average BP:
103.6 ± 10.2 = (93.4 to 113.8)
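The slide does not show the arithmetic behind ± 10.2. A sketch, assuming the interval is the usual t-based one (mean ± t × SD/√n with n – 1 = 5 degrees of freedom), reproduces it. The critical value 2.5706 is taken from a standard t table, and the variable names are illustrative.

```python
from math import sqrt

n = 6
mean_bp = 103.6
sd_bp = 9.7
T_CRIT_975_DF5 = 2.5706  # 97.5th percentile of the t distribution with 5 df (t table)

sem = sd_bp / sqrt(n)                 # standard error of the mean
half_width = T_CRIT_975_DF5 * sem     # the "±" part of the interval
ci = (mean_bp - half_width, mean_bp + half_width)

print(f"{mean_bp} ± {half_width:.1f} = ({ci[0]:.1f} to {ci[1]:.1f})")
```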
34
What is a CI?
• The plausible values for the underlying population
average BP, given the data on the six mice.
• In advance, there is a 95% chance of obtaining an
interval that contains the population average.
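The "95% chance in advance" can be checked by simulation. The sketch below assumes a normal population with a made-up mean of 100 and SD of 10, draws many samples of 6, and counts how often the t-based interval contains the true mean. The population values, seed, and `coverage` helper are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

T_CRIT_975_DF5 = 2.5706  # 97.5th percentile of the t distribution with 5 df (t table)


def coverage(true_mean=100.0, true_sd=10.0, n=6, reps=4000, seed=1):
    """Fraction of simulated 95% CIs (for n = 6) that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        half_width = T_CRIT_975_DF5 * stdev(sample) / sqrt(n)
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / reps


estimated_coverage = coverage()
```

The estimate should land close to 0.95. Any single interval either contains the population average or does not; the 95% describes the procedure.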
35
100 CIs
36
CI for difference
95% CI for treatment effect = 12.6 ± 11.5
37
Significance tests
Confidence interval:
The plausible values for the effect of salt water on BP.
Test of statistical significance:
Answer the question, “Does salt water have an effect?”
Null hypothesis (H0): Salt water has no effect on BP.
Alt. hypothesis (Ha): Salt water does have an effect.
38
Two possible errors
• Type I error (“false positive”; its probability is denoted α): Find a significant
difference even though one does not exist. α is usually set at
0.05 (5%) or 0.01 (1%). Example: conclude that salt water has an
effect on BP when, in fact, it does not have an effect.
• Type II error (“false negative”; its probability is denoted β): Fail to find a
significant difference even though one exists. β is usually set
at 0.20 (20%). Power = 1 – β (i.e., usually 80%). Example: conclude
that salt water has no effect on BP when, in fact, salt
water really does have an effect on BP.
39
Type I and II errors
Conclusion | Truth: no effect | Truth: has an effect
Reject H0 | Type I error | Correct
Fail to reject H0 | Correct | Type II error
40
Conducting the test
• Calculate a test statistic using the data.
(For example, we could look at the difference
between the average BP in the treated and control
groups; let’s call this D.)
• If this statistic, D, is large, the treatment appears to
have some effect.
• How large is large?
– We compare the observed statistic to its
distribution if the treatment had no effect.
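One way to obtain the distribution of D "if the treatment had no effect" is a permutation (randomization) test, sketched below. This is a swapped-in technique for illustration and not necessarily the test the slides have in mind. The BP values and the `permutation_p_value` helper are made up.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """Two-sided exact permutation P-value for the difference in group means, D."""
    pooled = list(treated) + list(control)
    k = len(treated)
    total = sum(pooled)
    observed = abs(mean(treated) - mean(control))
    n_splits = 0
    n_extreme = 0
    # Under H0 the group labels are arbitrary, so every split of the pooled
    # data into groups of the same sizes is equally likely.
    for idx in combinations(range(len(pooled)), k):
        group_sum = sum(pooled[i] for i in idx)
        d = group_sum / k - (total - group_sum) / (len(pooled) - k)
        n_splits += 1
        if abs(d) >= observed - 1e-9:   # tolerance guards against rounding error
            n_extreme += 1
    return n_extreme / n_splits


# Hypothetical BP measurements, 6 mice per group.
treated_bp = [110, 112, 115, 118, 120, 125]
control_bp = [95, 98, 100, 102, 105, 108]
p_value = permutation_p_value(treated_bp, control_bp)
```

With 6 mice per group there are 924 possible splits. Here every treated value exceeds every control value, so only the observed split and its mirror image are as extreme, giving P = 2/924.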
41
Significance level
• We seek to control the rate of type I errors.
• Significance level (usually denoted α) = chance you
reject H0, if H0 is true; usually we take α = 5%.
• We reject H0 when |D| > C, for some C.
• C is chosen so that, if H0 is true, the chance that
|D| > C is α.
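A sketch of how C could be chosen under a simplifying assumption: D is the difference of two group means, each from n mice, and the measurement SD σ is treated as known, so that D is normal with mean 0 and SD σ√(2/n) under H0. The value σ = 9.7 is borrowed from the earlier six-mouse example as if it were the population SD, which is an assumption; a t-based test would give a somewhat larger C.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(sigma, n_per_group, alpha=0.05):
    """C such that P(|D| > C) = alpha when H0 is true (normal model, known sigma)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)     # 1.96 for alpha = 5%
    se_d = sigma * sqrt(2 / n_per_group)        # SD of D under the model
    return z * se_d


sigma = 9.7          # assumed known SD (borrowed from the six-mouse example)
n_per_group = 6
C = critical_value(sigma, n_per_group)

# Check: the chance that |D| > C under H0 equals the significance level.
null_dist = NormalDist(mu=0.0, sigma=sigma * sqrt(2 / n_per_group))
type_i_rate = 2 * (1 - null_dist.cdf(C))
```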
42
If salt has no effect
43
If salt has an effect
44
P-values
• A P-value is the probability of obtaining data as
extreme as, or more extreme than, what was observed, if the null
hypothesis were true (i.e., if the treatment has no effect).
• If your P-value is smaller than your chosen
significance level (α), you reject the null hypothesis.
• We seek to reject the null hypothesis (we seek to
show that there is a treatment effect), and so small
P-values are good.
45
Summary
• Confidence interval
– Plausible values for the true population average or treatment
effect, given the observed data.
• Test of statistical significance
– Use the observed data to answer a yes/no question, such as
“Does the treatment have an effect?”
• P-value
– Summarizes the result of the significance test.
– Small P-value ⇒ conclude that there is an effect.
Never cite a P-value without a confidence interval.
46
Data presentation
[Figure: a bad plot and a good plot comparing groups A and B; x-axis: Group]
47
Data presentation
Good table:
Treatment | Mean (SEM)
A | 11.3 (0.6)
B | 13.5 (0.8)
C | 14.7 (0.6)
Bad table:
Treatment | Mean (SEM)
A | 11.2965 (0.63)
B | 13.49 (0.7913)
C | 14.727 (0.6108)
48
Sample size determination
49
Fundamental formula
$$ n = \frac{\$\ \text{available}}{\$\ \text{per sample}} $$
50
Listen to the IACUC
Too few animals → a total waste
Too many animals → a partial waste
51
Significance test
• Compare the BP of 6 mice fed salt water to 6 mice fed plain water.
• Δ = true difference in average BP (the treatment effect).
• H0: Δ = 0 (i.e., no effect)
• Test statistic, D.
• If |D| > C, reject H0.
• C chosen so that the chance you reject H0, if H0 is true, is 5%.
[Figure: distribution of D when Δ = 0]
52
Statistical power
Power = The chance that you reject H0 when H0 is false
(i.e., you [correctly] conclude that there is a treatment
effect when there really is a treatment effect).
53
Power depends on…
• The structure of the experiment
• The method for analyzing the data
• The size of the true underlying effect
• The variability in the measurements
• The chosen significance level (α)
• The sample size
Note: We usually try to determine the sample size to
give a particular power (often 80%).
54
Effect of sample size
6 per group: Power = 70%
12 per group: Power = 94%
55
Effect of the effect
Δ = 8.5: Power = 70%
Δ = 12.5: Power = 96%
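How power grows with sample size and with the size of the true effect can be sketched with a normal approximation (SD σ treated as known, two groups of n mice each). The slides do not state the σ behind their figures, and they probably rest on a t-test, so this sketch illustrates the direction of the effects and is not expected to reproduce 70%, 94% or 96%. The value σ = 9.7 and the `power` helper are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power(delta, sigma, n_per_group, alpha=0.05):
    """Chance of rejecting H0 when the true difference is delta (normal model, known sigma)."""
    se_d = sigma * sqrt(2 / n_per_group)
    c = NormalDist().inv_cdf(1 - alpha / 2) * se_d    # reject when |D| > c
    d_dist = NormalDist(mu=delta, sigma=se_d)         # distribution of D given delta
    return (1 - d_dist.cdf(c)) + d_dist.cdf(-c)


sigma = 9.7  # assumed SD, borrowed from the earlier six-mouse example
for n in (6, 12):
    print(f"n = {n:2d} per group, delta = 8.5:  power = {power(8.5, sigma, n):.2f}")
for delta in (8.5, 12.5):
    print(f"n =  6 per group, delta = {delta}: power = {power(delta, sigma, 6):.2f}")
```

When delta is 0 the function returns α, the type I error rate, as it should.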
56
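A formula
$$ n = \frac{2\sigma^2}{\Delta^2}\left(z_{\alpha/2} + z_{1-\beta}\right)^2 $$
Here n is the sample size per group, σ the SD of the measurements, Δ the treatment effect, z_{α/2} the upper α/2 point of the standard normal distribution (1.96 for α = 5%), and z_{1–β} its 1 – β quantile (0.84 for 80% power).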
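A sketch of the standard normal-approximation sample size formula, n = 2σ²(z_{α/2} + z_{1–β})²/Δ² per group, using Python's `statistics.NormalDist` for the quantiles. The inputs σ = 9.7 and Δ = 8.5 or 12.5 are borrowed from numbers appearing elsewhere in the slides and combined here as an assumption; the `sample_size` helper is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for comparing two means (normal approximation), rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}: 1.96 for alpha = 5%
    z_power = NormalDist().inv_cdf(power)           # z_{1-beta}: 0.84 for 80% power
    n = 2 * sigma ** 2 * (z_alpha + z_power) ** 2 / delta ** 2
    return ceil(n)


n_small_effect = sample_size(delta=8.5, sigma=9.7)    # 21 per group
n_large_effect = sample_size(delta=12.5, sigma=9.7)   # 10 per group
```

Because the formula treats σ as known, it tends to give slightly smaller numbers than a calculation based on the t distribution, so it is best read as a lower bound for small studies.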
57
Various effects
• Desired power ↑ ⇒ sample size ↑
• Stringency of statistical test ↑ ⇒ sample size ↑
• Measurement variability ↑ ⇒ sample size ↑
• Treatment effect ↑ ⇒ sample size ↓
58
Determining
sample size
The things you need to know:
• Structure of the experiment
• Method for analysis
• Chosen significance level, α (usually 5%)
• Desired power (usually 80%)
• Variability in the measurements
– if necessary, perform a pilot study, or use data from prior
publications
• The smallest meaningful effect
59
Reducing sample size
• Reduce the number of treatment groups being
compared.
• Find a more precise measurement (e.g., average
time to effect rather than proportion sick).
• Decrease the variability in the measurements.
– Make subjects more homogeneous.
– Use stratification.
– Control for other variables (e.g., weight).
– Average multiple measurements on each subject.
60
Final conclusions
• Experiments should be designed.
• Good design and good analysis can lead to reduced
sample sizes.
• Consult an expert on both the analysis and the
design of your experiment.
61
Resources
• ML Samuels, JA Witmer (2003) Statistics for the Life Sciences,
3rd edition. Prentice Hall.
– An excellent introductory text.
• GW Oehlert (2000) A First Course in Design and Analysis of
Experiments. WH Freeman & Co.
– Includes a more advanced treatment of experimental design.
• Course: Statistics for Laboratory Scientists (Biostatistics
140.615-616, Johns Hopkins Bloomberg Sch. Pub. Health)
– Introductory statistics course, intended for experimental
scientists.
– Greatly expands upon the topics presented here.
62

More Related Content

What's hot

Exploratory factor analysis
Exploratory factor analysisExploratory factor analysis
Exploratory factor analysisAmmar Pervaiz
 
Data analysis 1
Data analysis 1Data analysis 1
Data analysis 1Bùi Trâm
 
項目反応理論による尺度運用
項目反応理論による尺度運用項目反応理論による尺度運用
項目反応理論による尺度運用Yoshitake Takebayashi
 
Accelerated Stability During Formulation Development of Early Stage Protein T...
Accelerated Stability During Formulation Development of Early Stage Protein T...Accelerated Stability During Formulation Development of Early Stage Protein T...
Accelerated Stability During Formulation Development of Early Stage Protein T...KBI Biopharma
 
5.7 poisson regression in the analysis of cohort data
5.7 poisson regression in the analysis of  cohort data5.7 poisson regression in the analysis of  cohort data
5.7 poisson regression in the analysis of cohort dataA M
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis TestingTushar Kumar
 
Design of Experiments
Design of Experiments Design of Experiments
Design of Experiments Furk Kruf
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IJames Neill
 
Exploratory Factor Analysis
Exploratory Factor AnalysisExploratory Factor Analysis
Exploratory Factor AnalysisMark Ng
 
Null hypothesis for Pearson Correlation (independence)
Null hypothesis for Pearson Correlation (independence)Null hypothesis for Pearson Correlation (independence)
Null hypothesis for Pearson Correlation (independence)Ken Plummer
 
Bernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionBernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionDataminingTools Inc
 
Introduction to Generalized Linear Models
Introduction to Generalized Linear ModelsIntroduction to Generalized Linear Models
Introduction to Generalized Linear Modelsrichardchandler
 
Fitting Data into Probability Distributions
Fitting Data into Probability DistributionsFitting Data into Probability Distributions
Fitting Data into Probability DistributionsNikhil Chandra Sarkar
 

What's hot (16)

Exploratory factor analysis
Exploratory factor analysisExploratory factor analysis
Exploratory factor analysis
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
 
Data analysis 1
Data analysis 1Data analysis 1
Data analysis 1
 
項目反応理論による尺度運用
項目反応理論による尺度運用項目反応理論による尺度運用
項目反応理論による尺度運用
 
Bayesian statistics
Bayesian statisticsBayesian statistics
Bayesian statistics
 
Accelerated Stability During Formulation Development of Early Stage Protein T...
Accelerated Stability During Formulation Development of Early Stage Protein T...Accelerated Stability During Formulation Development of Early Stage Protein T...
Accelerated Stability During Formulation Development of Early Stage Protein T...
 
5.7 poisson regression in the analysis of cohort data
5.7 poisson regression in the analysis of  cohort data5.7 poisson regression in the analysis of  cohort data
5.7 poisson regression in the analysis of cohort data
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
Design of Experiments
Design of Experiments Design of Experiments
Design of Experiments
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA I
 
Exploratory Factor Analysis
Exploratory Factor AnalysisExploratory Factor Analysis
Exploratory Factor Analysis
 
Narrative research
Narrative researchNarrative research
Narrative research
 
Null hypothesis for Pearson Correlation (independence)
Null hypothesis for Pearson Correlation (independence)Null hypothesis for Pearson Correlation (independence)
Null hypothesis for Pearson Correlation (independence)
 
Bernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionBernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial Distribution
 
Introduction to Generalized Linear Models
Introduction to Generalized Linear ModelsIntroduction to Generalized Linear Models
Introduction to Generalized Linear Models
 
Fitting Data into Probability Distributions
Fitting Data into Probability DistributionsFitting Data into Probability Distributions
Fitting Data into Probability Distributions
 

Similar to Basics of Statistics.pptx

Statistics basics for oncologist kiran
Statistics basics for oncologist kiranStatistics basics for oncologist kiran
Statistics basics for oncologist kiranKiran Ramakrishna
 
Chapter 28 clincal trials
Chapter 28 clincal trials Chapter 28 clincal trials
Chapter 28 clincal trials Nilesh Kucha
 
Parametric tests
Parametric testsParametric tests
Parametric testsheena45
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...David Pratap
 
Research method ch07 statistical methods 1
Research method ch07 statistical methods 1Research method ch07 statistical methods 1
Research method ch07 statistical methods 1naranbatn
 
Application of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptxApplication of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptxHalim AS
 
ChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfDikshathawait
 
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptxSAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptxssuserd509321
 
Lecture 7 Hypothesis testing.pptx
Lecture 7 Hypothesis testing.pptxLecture 7 Hypothesis testing.pptx
Lecture 7 Hypothesis testing.pptxshakirRahman10
 
Common statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureCommon statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureKadium
 
Common statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureCommon statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureKadium
 
Common statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureCommon statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureKadium
 
Soni_Biostatistics.ppt
Soni_Biostatistics.pptSoni_Biostatistics.ppt
Soni_Biostatistics.pptOgunsina1
 
Crash Course in A/B testing
Crash Course in A/B testingCrash Course in A/B testing
Crash Course in A/B testingWayne Lee
 
Bio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchBio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchShinjan Patra
 

Similar to Basics of Statistics.pptx (20)

Statistics basics for oncologist kiran
Statistics basics for oncologist kiranStatistics basics for oncologist kiran
Statistics basics for oncologist kiran
 
Chapter 28 clincal trials
Chapter 28 clincal trials Chapter 28 clincal trials
Chapter 28 clincal trials
 
Parametric tests
Parametric testsParametric tests
Parametric tests
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...
 
Research method ch07 statistical methods 1
Research method ch07 statistical methods 1Research method ch07 statistical methods 1
Research method ch07 statistical methods 1
 
How to do the maths
How to do the mathsHow to do the maths
How to do the maths
 
Application of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptxApplication of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptx
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
ChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdf
 
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptxSAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
 
Lecture 7 Hypothesis testing.pptx
Lecture 7 Hypothesis testing.pptxLecture 7 Hypothesis testing.pptx
Lecture 7 Hypothesis testing.pptx
 
Common statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureCommon statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literature
 
Common statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureCommon statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literature
 
Common statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literatureCommon statistical tests and applications in epidemiological literature
Common statistical tests and applications in epidemiological literature
 
3b. Introductory Statistics - Julia Saperia
3b. Introductory Statistics - Julia Saperia3b. Introductory Statistics - Julia Saperia
3b. Introductory Statistics - Julia Saperia
 
Lund 2009
Lund 2009Lund 2009
Lund 2009
 
Soni_Biostatistics.ppt
Soni_Biostatistics.pptSoni_Biostatistics.ppt
Soni_Biostatistics.ppt
 
Crash Course in A/B testing
Crash Course in A/B testingCrash Course in A/B testing
Crash Course in A/B testing
 
Hypo
HypoHypo
Hypo
 
Bio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchBio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical research
 

More from Ammar Sami AL-Bayati

More from Ammar Sami AL-Bayati (9)

Plant Physiology in Brief.ppt
Plant Physiology in Brief.pptPlant Physiology in Brief.ppt
Plant Physiology in Brief.ppt
 
Understanding Horticulture Plant Physiology.ppt
Understanding Horticulture Plant Physiology.pptUnderstanding Horticulture Plant Physiology.ppt
Understanding Horticulture Plant Physiology.ppt
 
Breeding Of Field & Horticultural Crops-2016.pdf
Breeding Of Field & Horticultural Crops-2016.pdfBreeding Of Field & Horticultural Crops-2016.pdf
Breeding Of Field & Horticultural Crops-2016.pdf
 
Horticulture Quiz 6.pdf
Horticulture Quiz 6.pdfHorticulture Quiz 6.pdf
Horticulture Quiz 6.pdf
 
Horticulture Quiz 2.pdf
Horticulture Quiz 2.pdfHorticulture Quiz 2.pdf
Horticulture Quiz 2.pdf
 
Horticulture Quiz 5.pdf
Horticulture Quiz 5.pdfHorticulture Quiz 5.pdf
Horticulture Quiz 5.pdf
 
Horticulture Quiz 1.pdf
Horticulture Quiz 1.pdfHorticulture Quiz 1.pdf
Horticulture Quiz 1.pdf
 
Horticulture Quiz 3.pdf
Horticulture Quiz 3.pdfHorticulture Quiz 3.pdf
Horticulture Quiz 3.pdf
 
Horticulture Quiz 4.pdf
Horticulture Quiz 4.pdfHorticulture Quiz 4.pdf
Horticulture Quiz 4.pdf
 

Recently uploaded

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 

Recently uploaded (20)

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Basics of Statistics.pptx

  • 1. Basics of Statistics: Experimental design & sample size determination
  • 2. Research Process: Research question; Hypothesis; Identify research design; Data collection; Presentation of data; Data analysis; Interpretation of data
  • 3. 3
  • 4. Types of variables: A variable is either Qualitative (also called Categorical or Attribute), subdivided into Nominal and Ordinal, or Quantitative (also called Numerical), subdivided into Discrete and Continuous.
  • 5. 5
  • 7. Statistical Tests (statistical test: type of data needed; test statistic; example)
      – Correlation: two continuous variables; Pearson’s r; e.g., are blood pressure and weight correlated?
      – T-tests/ANOVA: means from a continuous variable taken from two or more groups; Student’s t (F statistic for ANOVA); e.g., do normal-weight patients (group 1) have lower blood pressure than obese patients (group 2)?
      – Chi-square: two categorical variables; chi-square (χ²); e.g., are obese individuals (obese vs. not obese) significantly more likely to have a stroke (stroke vs. no stroke)?
      – Logistic regression: a dichotomous variable as the outcome; odds ratios (OR) & 95% confidence intervals (CI); e.g., does obesity predict stroke (stroke vs. no stroke) when controlling for other variables?
  • 8. Types of Statistics/Analyses
      – Descriptive Statistics (describing a phenomenon): Frequencies (How many? How much?); Basic measurements (BP, HR, BMI, IQ, etc.)
      – Inferential Statistics (inferences about a phenomenon): Hypothesis Testing; Correlation; Confidence Intervals; Significance Testing (t-tests); Chi-square; Prediction (Logistic Regression). These address proving or disproving theories, associations between phenomena, and whether the sample relates to the larger population (e.g., diet and health).
  • 9. Basic principles 1. Formulate question/goal in advance 2. Comparison/control 3. Replication 4. Randomization 5. Stratification (aka blocking) 6. Factorial experiments 9
  • 10. Example Question: Does salted drinking water affect blood pressure (BP) in mice? Experiment: 1. Provide a mouse with water containing 1% NaCl. 2. Wait 14 days. 3. Measure BP. 10
  • 11. Comparison/control Good experiments are comparative. • Compare BP in mice fed salt water to BP in mice fed plain water. • Compare BP in strain A mice fed salt water to BP in strain B mice fed salt water. Ideally, the experimental group is compared to concurrent controls (rather than to historical controls). 11
  • 13. Why replicate? • Reduce the effect of uncontrolled variation (i.e., increase precision). • Quantify uncertainty. A related point: An estimate is of no value without some statement of the uncertainty in the estimate. 13
  • 14. Randomization Experimental subjects (“units”) should be assigned to treatment groups at random. At random does not mean haphazardly. One needs to explicitly randomize using • A computer, or • Coins, dice or cards. 14
  • 15. Why randomize? • Avoid bias. – For example: the first six mice you grab may have intrinsically higher BP. • Control the role of chance. – Randomization allows the later use of probability theory, and so gives a solid foundation for statistical analysis. 15
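As a minimal sketch of "explicitly randomize using a computer", the following Python snippet shuffles a list of subjects and splits it into two equal groups. The mouse IDs, the function name `randomize_two_groups`, and the seed are illustrative assumptions, not part of the slides.

```python
import random

def randomize_two_groups(units, seed=None):
    """Shuffle the units and split them into two equal-sized groups."""
    rng = random.Random(seed)   # seeded generator so the allocation can be reproduced
    shuffled = list(units)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# 12 hypothetical mouse IDs; the first group gets salt water, the second plain water.
mice = [f"mouse{i}" for i in range(1, 13)]
salt_water, plain_water = randomize_two_groups(mice, seed=2024)
print("salt water: ", salt_water)
print("plain water:", plain_water)
```

Recording the seed alongside the allocation makes the assignment auditable, which "grabbing the first six mice" is not.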
  • 16. Stratification • Suppose that some BP measurements will be made in the morning and some in the afternoon. • If you anticipate a difference between morning and afternoon measurements: – Ensure that within each period, there are equal numbers of subjects in each treatment group. – Take account of the difference between periods in your analysis. • This is sometimes called “blocking”. 16
  • 17. Example • 20 male mice and 20 female mice. • Half to be treated; the other half left untreated. • Can only work with 4 mice per day. Question: How to assign individuals to treatment groups and to days? 17
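One possible answer to the question above, sketched in Python: stratify by sex and by day, so that each of the 10 days has 2 males and 2 females, with one mouse of each sex treated. This particular design, the mouse labels, and the function name `assign_mice` are assumptions made for illustration; the slides do not spell out the allocation.

```python
import random

def assign_mice(seed=None):
    """Return a list of (day, mouse, treatment) rows for 20 males and 20 females."""
    rng = random.Random(seed)
    males = [f"M{i}" for i in range(1, 21)]
    females = [f"F{i}" for i in range(1, 21)]
    rng.shuffle(males)      # random order decides which day each mouse is used
    rng.shuffle(females)

    plan = []
    for day in range(10):   # 4 mice per day -> 10 days
        same_sex_pairs = (males[2 * day: 2 * day + 2],
                          females[2 * day: 2 * day + 2])
        for pair in same_sex_pairs:
            treated_index = rng.randrange(2)   # coin flip within the pair
            for k, mouse in enumerate(pair):
                treatment = "treated" if k == treated_index else "untreated"
                plan.append((day + 1, mouse, treatment))
    return plan

plan = assign_mice(seed=1)
for row in plan[:4]:        # show the first day's schedule
    print(row)
```

With this layout, sex and day are both balanced across the two treatment groups, so neither can be confounded with the treatment.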
  • 21. Randomization and stratification • If you can (and want to), fix a variable. – e.g., use only 8 week old male mice from a single strain. • If you don’t fix a variable, stratify it. – e.g., use both 8 week and 12 week old male mice, and stratify with respect to age. • If you can neither fix nor stratify a variable, randomize it. 21
  • 22. Factorial experiments Suppose we are interested in the effect of both salt water and a high-fat diet on blood pressure. Ideally: look at all 4 treatments in one experiment, crossing water (plain or salt) with diet (normal or high-fat). Why? – We can learn more. – More efficient than doing all single-factor experiments.
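A small Python sketch of the 2 × 2 layout: it enumerates the four water-by-diet combinations and randomly assigns an equal number of subjects to each. The number of mice (12), their IDs, and the helper name `assign_factorial` are hypothetical.

```python
import itertools
import random

water = ["plain water", "salt water"]
diet = ["normal diet", "high-fat diet"]
cells = list(itertools.product(water, diet))   # the 4 treatment combinations

def assign_factorial(units, cells, seed=None):
    """Randomly split the units evenly across the treatment combinations."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    per_cell = len(shuffled) // len(cells)
    return {cell: shuffled[i * per_cell:(i + 1) * per_cell]
            for i, cell in enumerate(cells)}

design = assign_factorial([f"mouse{i}" for i in range(1, 13)], cells, seed=7)
for cell, group in design.items():
    print(cell, group)
```

Every mouse contributes to the estimate of both the water effect and the diet effect, which is where the efficiency gain over two single-factor experiments comes from.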
  • 24. Other points • Blinding – Measurements made by people can be influenced by unconscious biases. – Ideally, dissections and measurements should be made without knowledge of the treatment applied. • Internal controls – It can be useful to use the subjects themselves as their own controls (e.g., consider the response after vs. before treatment). – Why? Increased precision. 24
  • 25. Other points • Representativeness – Are the subjects/tissues you are studying really representative of the population you want to study? – Ideally, your study material is a random sample from the population of interest. 25
  • 26. Summary: Characteristics of good experiments • Unbiased – Randomization – Blinding • High precision – Uniform material – Replication – Stratification • Simple – Protect against mistakes • Wide range of applicability – Deliberate variation – Factorial designs • Able to estimate uncertainty – Replication – Randomization
  • 28. What is statistics? “We may at once admit that any inference from the particular to the general must be attended with some degree of uncertainty, but this is not the same as to admit that such inference cannot be absolutely rigorous, for the nature and degree of the uncertainty may itself be capable of rigorous expression.” — Sir R. A. Fisher 28
  • 29. What is statistics? • Data exploration and analysis • Inductive inference with probability • Quantification of uncertainty 29
  • 30. Example • We have data on the blood pressure (BP) of 6 mice. • We are not interested in these particular 6 mice. • Rather, we want to make inferences about the BP of all possible such mice. 30
  • 34. Confidence intervals • We observe the BP of 6 mice, with average = 103.6 and standard deviation (SD) = 9.7. • We assume that BP in the underlying population follows a normal (aka Gaussian) distribution. • On the basis of these data, we calculate a 95% confidence interval (CI) for the underlying average BP: 103.6 ± 10.2 = (93.4 to 113.8)
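The ± 10.2 can be reproduced with the usual t-based interval, mean ± t × SD / √n. The sketch below assumes that is the method behind the slide; the critical value 2.571 is the tabulated 97.5th percentile of the t distribution with 5 degrees of freedom, hard-coded here to avoid a dependency on a statistics package.

```python
import math

n = 6            # number of mice
mean_bp = 103.6  # sample average from the slide
sd_bp = 9.7      # sample standard deviation from the slide

T_CRIT_5DF = 2.571   # 97.5th percentile of t with n - 1 = 5 df (table value)

standard_error = sd_bp / math.sqrt(n)
half_width = T_CRIT_5DF * standard_error
lower, upper = mean_bp - half_width, mean_bp + half_width

print(f"95% CI: {mean_bp} ± {half_width:.1f} = ({lower:.1f} to {upper:.1f})")
```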
  • 35. What is a CI? • The plausible values for the underlying population average BP, given the data on the six mice. • In advance, there is a 95% chance of obtaining an interval that contains the population average. 35
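The "95% chance in advance" statement can be checked by simulation: draw many samples of 6 from a known normal population, build the interval each time, and count how often it contains the true mean. The population mean and SD (100 and 10) and the number of repetitions are arbitrary choices for illustration.

```python
import math
import random
import statistics

T_CRIT_5DF = 2.571   # 97.5th percentile of t with 5 df; valid only for samples of 6
N = 6

def coverage(true_mean=100.0, true_sd=10.0, reps=2000, seed=0):
    """Fraction of simulated 95% t-intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(N)]
        centre = statistics.mean(sample)
        half_width = T_CRIT_5DF * statistics.stdev(sample) / math.sqrt(N)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / reps

print(f"simulated coverage: {coverage():.3f}")   # should be close to 0.95
```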
  • 37. CI for difference 95% CI for treatment effect = 12.6 ± 11.5 37
  • 38. Significance tests Confidence interval: The plausible values for the effect of salt water on BP. Test of statistical significance: Answer the question, “Does salt water have an effect?” Null hypothesis (H0): Salt water has no effect on BP. Alt. hypothesis (Ha): Salt water does have an effect. 38
  • 39. Two possible errors • Type I error (“false positive”; its probability is α): Find a significant difference even though one does not exist. α is usually set at 0.05 (5%) or 0.01 (1%). Conclude that salt water has an effect on BP when, in fact, it does not have an effect. • Type II error (“false negative”; its probability is β): Fail to find a significant difference even though one exists. β is usually set at 0.20 (20%). Power = 1 – β (i.e., usually 80%). Conclude that salt water has no effect on BP when, in fact, salt water really does have an effect on BP.
  • 40. Type I and II errors. If the truth is “no effect”: rejecting H0 is a Type I error; failing to reject H0 is correct. If the truth is “has an effect”: rejecting H0 is correct; failing to reject H0 is a Type II error.
  • 41. Conducting the test • Calculate a test statistic using the data. (For example, we could look at the difference between the average BP in the treated and control groups; let’s call this D.) • If this statistic, D, is large, the treatment appears to have some effect. • How large is large? – We compare the observed statistic to its distribution if the treatment had no effect.
  • 42. Significance level • We seek to control the rate of type I errors. • Significance level (usually denoted α) = chance you reject H0, if H0 is true; usually we take α = 5%. • We reject H0 when |D| > C, for some C. • C is chosen so that, if H0 is true, the chance that |D| > C is α.
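To make "C is chosen so that, if H0 is true, the chance that |D| > C is 5%" concrete, the sketch below simulates the distribution of D under no effect and takes C as its empirical 95th percentile of |D|. It assumes normally distributed BP with SD 10 and 6 mice per group; those numbers and the function names are illustrative, not taken from the slides.

```python
import random

def simulate_D(n_per_group, true_effect, sd, rng):
    """Difference in average BP (treated minus control) for one simulated experiment."""
    treated = [rng.gauss(100.0 + true_effect, sd) for _ in range(n_per_group)]
    control = [rng.gauss(100.0, sd) for _ in range(n_per_group)]
    return sum(treated) / n_per_group - sum(control) / n_per_group

def critical_value(n_per_group=6, sd=10.0, alpha=0.05, reps=20000, seed=0):
    """Empirical (1 - alpha) quantile of |D| when the treatment has no effect."""
    rng = random.Random(seed)
    abs_d = sorted(abs(simulate_D(n_per_group, 0.0, sd, rng)) for _ in range(reps))
    return abs_d[int((1 - alpha) * reps)]

def rejection_rate(C, true_effect, n_per_group=6, sd=10.0, reps=20000, seed=1):
    """Fraction of simulated experiments in which |D| > C."""
    rng = random.Random(seed)
    rejections = sum(abs(simulate_D(n_per_group, true_effect, sd, rng)) > C
                     for _ in range(reps))
    return rejections / reps

C = critical_value()
type_one_rate = rejection_rate(C, true_effect=0.0)
print(f"C = {C:.2f}; rejection rate under H0 = {type_one_rate:.3f}")
```

Calling `rejection_rate(C, true_effect=...)` with a non-zero effect gives the corresponding power for the same cutoff.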
  • 43. If salt has no effect 43
  • 44. If salt has an effect 44
  • 45. P-values • A P-value is the probability of obtaining data as extreme as was observed, if the null hypothesis were true (i.e., if the treatment has no effect). • If your P-value is smaller than your chosen significance level (α), you reject the null hypothesis. • We seek to reject the null hypothesis (we seek to show that there is a treatment effect), and so small P-values are good.
  • 46. Summary • Confidence interval – Plausible values for the true population average or treatment effect, given the observed data. • Test of statistical significance – Use the observed data to answer a yes/no question, such as “Does the treatment have an effect?” • P-value – Summarizes the result of the significance test. – Small P-value ⇒ conclude that there is an effect. Never cite a P-value without a confidence interval.
  • 48. Data presentation. Good table, Mean (SEM) by treatment: A 11.3 (0.6); B 13.5 (0.8); C 14.7 (0.6). Bad table, Mean (SEM) by treatment: A 11.2965 (0.63); B 13.49 (0.7913); C 14.727 (0.6108).
  • 51. Listen to the IACUC. Too few animals ⇒ a total waste. Too many animals ⇒ a partial waste.
  • 52. Significance test • Compare the BP of 6 mice fed salt water to 6 mice fed plain water. • Δ = true difference in average BP (the treatment effect). • H0: Δ = 0 (i.e., no effect) • Test statistic, D. • If |D| > C, reject H0. • C chosen so that the chance you reject H0, if H0 is true, is 5%. [Figure: distribution of D when Δ = 0]
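One way to compute such a P-value without distributional assumptions is a permutation test: under H0 the group labels are arbitrary, so the observed difference is compared with the differences from every possible relabelling. The slides do not say which test was used, so this is a swapped-in illustration, and the BP readings below are made-up numbers, not data from the deck.

```python
from itertools import combinations

def permutation_p_value(treated, control):
    """Two-sided exact permutation P-value for the difference in group means."""
    pooled = list(treated) + list(control)
    k = len(treated)
    total = sum(pooled)

    def mean_difference(sum_of_first_group):
        rest = total - sum_of_first_group
        return sum_of_first_group / k - rest / (len(pooled) - k)

    observed = abs(mean_difference(sum(treated)))
    as_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), k):
        d = mean_difference(sum(pooled[i] for i in indices))
        n_splits += 1
        if abs(d) >= observed - 1e-12:   # tolerance guards against float round-off
            as_extreme += 1
    return as_extreme / n_splits

# Hypothetical BP readings for 6 salt-water mice and 6 plain-water mice.
salt = [110, 108, 115, 104, 120, 112]
plain = [95, 100, 102, 98, 105, 99]
p = permutation_p_value(salt, plain)
print(f"P-value = {p:.4f}")
```

With 6 mice per group there are only 924 ways to split the 12 readings, so the full enumeration is instant.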
  • 53. Statistical power Power = The chance that you reject H0 when H0 is false (i.e., you [correctly] conclude that there is a treatment effect when there really is a treatment effect). 53
  • 54. Power depends on… • The structure of the experiment • The method for analyzing the data • The size of the true underlying effect • The variability in the measurements • The chosen significance level (α) • The sample size Note: We usually try to determine the sample size to give a particular power (often 80%).
  • 55. Effect of sample size. 6 per group: Power = 70%. 12 per group: Power = 94%.
  • 56. Effect of the effect. Δ = 8.5: Power = 70%. Δ = 12.5: Power = 96%.
  • 57. A formula: $n = 2\sigma^2 \left( \frac{z_{\alpha/2} + z_{1-\beta}}{\Delta} \right)^2$, where n is the sample size per group, σ is the SD of the measurements, Δ is the treatment effect, z_{α/2} is the upper α/2 critical value of the standard normal distribution (1.96 for α = 5%), and z_{1-β} is its quantile at the desired power (0.84 for 80% power).
  • 58. Various effects • Desired power ↑ ⇒ sample size ↑ • Stringency of statistical test ↑ ⇒ sample size ↑ • Measurement variability ↑ ⇒ sample size ↑ • Treatment effect ↑ ⇒ sample size ↓
  • 59. Determining sample size The things you need to know: • Structure of the experiment • Method for analysis • Chosen significance level, α (usually 5%) • Desired power (usually 80%) • Variability in the measurements – if necessary, perform a pilot study, or use data from prior publications • The smallest meaningful effect
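How power moves with sample size and effect size can be sketched with a normal approximation for a two-group comparison of means. The slides do not state the measurement SD or the method used for their percentages, so the SD of 6 below is an assumed value and the printed numbers are not expected to match the slides exactly; the approximation also ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test for a difference in means."""
    z = NormalDist()
    standard_error = sd * sqrt(2.0 / n_per_group)   # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)               # 1.96 for alpha = 0.05
    return z.cdf(abs(effect) / standard_error - z_crit)

ASSUMED_SD = 6.0   # hypothetical; not given in the slides
for n in (6, 12):
    print(f"effect 8.5, n = {n:2d} per group: power = {approx_power(8.5, ASSUMED_SD, n):.2f}")
for effect in (8.5, 12.5):
    print(f"effect {effect}, n = 6 per group: power = {approx_power(effect, ASSUMED_SD, 6):.2f}")
```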
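The sample-size formula and the directions listed above can be checked numerically. The sketch below reads the formula as n = 2σ²((z_{α/2} + z_{1−β})/Δ)² per group, with z_{α/2} the upper α/2 critical value of the standard normal and z_{1−β} its quantile at the desired power; that reading, the function name, and the example inputs (SD 6, effect 8.5) are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-group comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)           # e.g. 0.84 for power = 0.80
    n = 2 * sd ** 2 * ((z_alpha + z_power) / effect) ** 2
    return ceil(n)                       # round up to whole subjects

baseline = sample_size_per_group(effect=8.5, sd=6.0)
print("baseline:          ", baseline)
print("higher power (90%):", sample_size_per_group(8.5, 6.0, power=0.90))
print("stricter alpha (1%):", sample_size_per_group(8.5, 6.0, alpha=0.01))
print("more variability:  ", sample_size_per_group(8.5, 8.0))
print("larger effect:     ", sample_size_per_group(12.5, 6.0))
```

Because the formula uses normal rather than t quantiles, it tends to understate n slightly for very small groups, so the result is best treated as a starting point.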
  • 60. Reducing sample size • Reduce the number of treatment groups being compared. • Find a more precise measurement (e.g., average time to effect rather than proportion sick). • Decrease the variability in the measurements. – Make subjects more homogeneous. – Use stratification. – Control for other variables (e.g., weight). – Average multiple measurements on each subject. 60
  • 61. Final conclusions • Experiments should be designed. • Good design and good analysis can lead to reduced sample sizes. • Consult an expert on both the analysis and the design of your experiment. 61
  • 62. Resources • ML Samuels, JA Witmer (2003) Statistics for the Life Sciences, 3rd edition. Prentice Hall. – An excellent introductory text. • GW Oehlert (2000) A First Course in Design and Analysis of Experiments. WH Freeman & Co. – Includes a more advanced treatment of experimental design. • Course: Statistics for Laboratory Scientists (Biostatistics 140.615-616, Johns Hopkins Bloomberg Sch. Pub. Health) – Introductory statistics course, intended for experimental scientists. – Greatly expands upon the topics presented here. 62