Elementary Statistics for the
Biological and Life Sciences
STAT 205
University of South Carolina
Columbia, SC
© 2011, University of South Carolina. All rights reserved, except where previous rights
exist. No part of this material may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means — electronic, mechanical, photoreproduction,
recording, or scanning — without the prior written consent of the University of South
Carolina.
STAT205 – Elementary Statistics for the Biological and Life Sciences 2
Motivation: why analyze data?
 Clinical trials/drug development:
compare existing treatments with new
methods to cure disease.
 Agriculture: enhance crop yields,
improve pest resistance
 Ecology: study how ecosystems
develop/respond to environmental
impacts
 Lab studies: learn more about
biological tissue/cellular activity
Chapter 2: Description of
Populations and Samples
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Statistics is:
 Statistics is the science of
• collecting,
• summarizing,
• analyzing, and
• interpreting
data.
 Goal: to understand the underlying
biological phenomena that generate
the data.
Random Variables
 Data are generated by some random
process or phenomenon.
 Any observed datum represents the
outcome of a Random Variable.
 NOTATION: upper case letter, W, X, Y, etc.
Types of Random
Variables
 Qualitative
• Categorical (e.g., blood type: A, B, AB, O)
• Ordinal (e.g., therapy response: none,
some, cured)
 Quantitative
• Discrete (e.g., number of nests – 0,1,2,…)
• Continuous (e.g., cholesterol conc. – 220.2,
210.4, 180.9, etc.)
Random Samples
 We take data as samples from a larger
population.
 DEF’N: A SAMPLE is a collection of
‘subjects’ upon which we measure one
or more variables.
 DEF’N: The SAMPLE SIZE is the
number of subjects in a sample.
NOTATION: n.
Observations
 DEF’N: The OBSERVATIONAL UNIT is
the type of subject being sampled.
Example: observational units could be
(i) baby, (ii) moth, (iii) Petri dish, etc.
 DEF’N: An OBSERVATION is a
recorded outcome of a variable from a
random sample.
NOTATION: lower case letter, x, y, etc.
Frequency Distributions
 DEF’N: A FREQUENCY DISTRIBUTION is a
summary display of the frequencies of
occurrence of each value in a sample.
 For continuous (Ex. 2.4, 2.6, 2.7, & 2.8) or
categorical (Ex. 2.1, 2.2, 2.3, & 2.5) data.
 DEF’N: A RELATIVE FREQUENCY is a raw
frequency divided by n:
$\text{Rel. Freq.} = \dfrac{\text{Freq}}{n}$
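As a minimal sketch of the two definitions above, the following Python snippet tabulates raw and relative frequencies. The litter sizes are made-up illustrative values (they are not the Table 2.4 data), and the helper name `relative_frequencies` is hypothetical.

```python
from collections import Counter

def relative_frequencies(sample):
    """Return {value: (frequency, relative frequency)} for a sample."""
    n = len(sample)
    counts = Counter(sample)
    return {value: (freq, freq / n) for value, freq in sorted(counts.items())}

# Hypothetical litter sizes, for illustration only (NOT the Table 2.4 data)
litter_sizes = [10, 12, 10, 9, 11, 10, 12, 8, 11, 10]
table = relative_frequencies(litter_sizes)
for value, (freq, rel) in table.items():
    print(f"{value:>3}  freq = {freq}  rel. freq = {rel:.2f}")
```

The relative frequencies always sum to 1, which is a quick check on any hand-built table.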
Example 2.4
Ex. 2.4: Y = no. of piglets surviving 21
days (litter size).
A sample of n=36 pigs (sows) generated
the data in Table 2.4.
Dot Plot
 A DOT PLOT is a simple graphic where
dots indicate observed data in a sample.
 Ex. 2.4: Fig. 2.4 gives the dot plot for the
litter size data:
Histogram
 A HISTOGRAM is a simple bar chart where
the bars replace the dots in a dot plot.
 Ex. 2.4 (cont’d): Fig. 2.5 gives the
histogram for the litter size data.
Stemplot
 A STEMPLOT (a.k.a. STEM-LEAF
DIAGRAM) is a dot plot (often drawn on its
side) with data information replacing the
dots.
 The ‘stems’ are the core values of the data,
set in common groups.
 The ‘leaves’ are the last digits of each
datum.
Example 2.8
 Ex. 2.8: Y = radish growth. Data in Table 2.8:
 (Ordered) stemplot in Fig. 2.15:
Frequency Distn’s
 Frequency distributions come in varied
shapes:
• Symmetric & bell-shaped
• Symmetric, not bell-shaped
• Asymmetric & skewed right (rt. tail longer)
• Asymmetric & skewed left (left tail longer)
• Bimodal (two distinct clumps)
 We use histograms, etc., to visualize these
shapes in the data.
Histogram for continuous
data
 For continuous data, a histogram is built by
sorting the observations into bins.
 Let y(1) be the smallest (min) and y(n) be the
largest (max) values in the data set.
 Divide the interval [y(1), y(n)] into, say, 5 to 20
intervals of equal length (more
intervals for more data, e.g. about n/5 bins total).
 Count how many obs. fall in each bin...that's it!
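The binning recipe above can be sketched in a few lines of Python. The data values and the function name `bin_counts` are hypothetical, and the sketch assumes the observations are not all identical (otherwise the bin width would be zero).

```python
def bin_counts(data, n_bins):
    """Count observations in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for y in data:
        # Clamp the index so the maximum falls in the last bin
        k = min(int((y - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts

# Hypothetical continuous measurements, for illustration only
data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.0, 4.6, 5.2]
counts = bin_counts(data, 4)
print(counts)
```

Plotting these counts as adjacent bars gives the histogram; changing `n_bins` shows how the apparent shape depends on the number of bins.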
Descriptive Statistics
 DEF’N: The SAMPLE MEAN is the
arithmetic average of a set of n data values.
 NOTATION: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}$
 The sample mean is often viewed as a kind
of ‘balance point’ in the data.
Example 2.15
 Ex. 2.15: Y = weight gain (lb) of lambs on
special diet. Data: {11, 13, 19, 2, 10, 1}
 n = 6: $\bar{y} = \frac{11 + 13 + 19 + 2 + 10 + 1}{6} = \frac{56}{6} = 9.33$ lb
 See Fig. 2.27.
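The arithmetic in this example can be checked with a short Python sketch; the variable names are illustrative only.

```python
lamb_gain = [11, 13, 19, 2, 10, 1]  # weight gains (lb) from Ex. 2.15

n = len(lamb_gain)
total = sum(lamb_gain)
y_bar = total / n  # sample mean: sum of the observations divided by n
print(f"n = {n}, sum = {total}, mean = {y_bar:.2f} lb")
```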
Sample Median
 DEF’N: The SAMPLE MEDIAN is the
value of the data nearest to their
middle. It splits the data in half.
 Find the median by ordering the
data, and calculating their middle
point (n odd) or the average of their
two middle points (n even).
 NOTATION: Q2
New notation
 Original data: y1, y2,…, yn.
 Ordered data: y(1), y(2),…, y(n).
 Example: y1=3.7, y2=-2.0, y3=-7.5, y4=2.1.
 Then: y(1)=-7.5, y(2)=-2.0, y(3)=2.1, y(4)=3.7.
 If n is odd then Q2 is middle ordered value.
 If n is even then Q2 is average of middle
two ordered values.
Example 2.17
 Ex. 2.17: (2.15 cont’d) Lamb weight gain.
n = 6 is even, so find Q2 as avg. of two
middle points
 ordered data: y(1) = 1, y(2) = 2,
y(3) = 10, y(4) = 11, y(5) = 13, y(6) = 19.
$Q_2 = \frac{10 + 11}{2} = 10.5$ lb
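A minimal Python sketch of the odd/even rule for the median follows. The function name `sample_median` is hypothetical; Python's standard `statistics.median` applies the same rule.

```python
def sample_median(data):
    """Middle ordered value (n odd) or average of the two middle values (n even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

lamb_gain = [11, 13, 19, 2, 10, 1]  # Ex. 2.15 data (lb)
q2 = sample_median(lamb_gain)
print(f"Q2 = {q2} lb")
```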
Example 2.19
Ex. 2.19: Y = cricket singing times.
Data in Table 2.10:
Example 2.19 (cont’d)
Skewness
 Mean & median indicate skewness:
• If data are skewed right, mean > median.
• If data are skewed left, mean < median.
• If data are symmetric, mean ≈ median.
 Both the mean and the median are useful
summary measures of location. The
median is slightly more ROBUST to
extreme values of yi, but of course, the
mean is easier to calculate.
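To illustrate the robustness remark, here is a small sketch that reuses the lamb weight gains from Ex. 2.15 and then replaces one value with a deliberately extreme one. The modified data set is hypothetical (imagine 19 mistyped as 190).

```python
from statistics import mean, median

lamb_gain = [11, 13, 19, 2, 10, 1]       # Ex. 2.15 data (lb)
with_extreme = [11, 13, 190, 2, 10, 1]   # hypothetical: 19 mistyped as 190

for label, data in [("original", lamb_gain), ("with extreme value", with_extreme)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

The median is unchanged by the extreme value while the mean is pulled far to the right, which is also the pattern (mean > median) expected for right-skewed data.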
Quartiles
DEF’N: The QUARTILES of a distribution are
points that separate the data into quarters or
fourths:
• The first quartile separates the lower 25% of
the data from the upper 75%. NOTATION: Q1
• The second quartile separates the lower 50%
of the data from the upper 50%. NOTATION: Q2
• The third quartile separates the lower 75% of
the data from the upper 25%. NOTATION: Q3
Example 2.20
 Ex. 2.20: Y = Systolic blood pressure
(mm Hg) in men; n= 7.
 Ordered data:
y(1) = 113, y(2) = 124, y(3) = 124,
y(4) = 132,
y(5) = 146, y(6) = 151, y(7) = 170.
 Q1 = 124
 Q2 = 132
 Q3 = 151
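The quartiles in this example can be reproduced with the sketch below. It assumes one common hand-calculation convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when n is odd. Other conventions exist, and statistical software may return slightly different quartiles. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves.

    Assumed convention: when n is odd, the middle value is excluded
    from both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

blood_pressure = [113, 124, 124, 132, 146, 151, 170]  # Ex. 2.20 (mm Hg)
q1, q2, q3 = quartiles(blood_pressure)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```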
IQR
 DEF’N: The INTER-QUARTILE RANGE is
IQR = Q3 – Q1
 DEF’N: The MINIMUM is the smallest value
of a data set or distribution.
NOTATION: y(1)
 DEF’N: The MAXIMUM is the largest value
of a data set or distribution.
NOTATION: y(n)
Five Number Summary
 DEF’N: The FIVE NUMBER SUMMARY is
{y(1), Q1, Q2, Q3, y(n)}
 DEF’N: A BOXPLOT is a graphic plot of the
5-no. summary, with a box spanning the
IQR and bridging the quartiles:
[Boxplot schematic: whiskers run from y(1) to y(n); the box spans Q1 to Q3, with a line at Q2]
Example 2.22
Ex. 2.22: Y = radish growth data from Ex.
2.8. Five-no. summary is {8, 15, 21, 30, 37}.
Boxplot is given in Fig. 2.30:
Example 2.23
Ex. 2.23: Y = radish growth data over three
different growth regimes (see Ex. 2.9).
In Fig. 2.32, we use boxplots for comparative
purposes.
Outliers
 DEF’N: An OUTLIER is an obsv’n that differs
dramatically from the rest of the data.
Formally: Yi is an outlier if
Yi < Q1 – (1.5 × IQR) [the “lower fence”] or Yi > Q3 + (1.5 × IQR) [the “upper fence”]
Example 2.25
 Ex. 2.25: Y = radish growth data in full light
(from Ex. 2.23). The ordered data are:
3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21
 IQR = Q3 – Q1 = 10 – 7 = 3
 Upper fence = Q3 + (1.5 × IQR)
= 10 + (1.5)(3) = 14.5
 Lower fence = Q1 – (1.5 × IQR)
= 7 – (1.5)(3) = 2.5
 y = 20 and y = 21 are outliers.
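The fence calculation can be sketched in Python as follows. The helper names `quartiles` and `find_outliers` are hypothetical, and the quartiles are computed with an assumed convention (medians of the lower and upper halves, excluding the middle value when n is odd), which reproduces Q1 = 7 and Q3 = 10 for these data.

```python
from statistics import median

def quartiles(data):
    """(Q1, Q2, Q3) via medians of the lower/upper halves (assumed convention)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def find_outliers(data):
    """Return (lower fence, upper fence, observations outside the fences)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [y for y in data if y < lower_fence or y > upper_fence]
    return lower_fence, upper_fence, flagged

radish = [3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21]  # Ex. 2.25
lower_fence, upper_fence, flagged = find_outliers(radish)
print(lower_fence, upper_fence, flagged)
```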
Dispersion
 DEF’N: The SAMPLE RANGE is
Range = Y(n) – Y(1) = Max. – Min.
 DEF’N: The SAMPLE VARIANCE is
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
 DEF’N: The SAMPLE STANDARD
DEVIATION (SD) is $S = \sqrt{S^2}$
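As a worked illustration of these three dispersion measures, the sketch below applies them to the lamb weight gains from Ex. 2.15. The function name `sample_variance` is hypothetical; the standard library's `statistics.variance` uses the same n – 1 divisor.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    y_bar = sum(data) / n
    return sum((y - y_bar) ** 2 for y in data) / (n - 1)

lamb_gain = [11, 13, 19, 2, 10, 1]  # Ex. 2.15 data (lb)
sample_range = max(lamb_gain) - min(lamb_gain)
s2 = sample_variance(lamb_gain)
s = math.sqrt(s2)
print(f"range = {sample_range}, S^2 = {s2:.4f}, S = {s:.4f}")
```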
The Empirical Rule
The sample mean and the sample SD
are useful in describing data sets. The
EMPIRICAL RULE states that
• ~68% of the data lie between $\bar{Y} - S$ and $\bar{Y} + S$
• ~95% of the data lie between $\bar{Y} - 2S$ and $\bar{Y} + 2S$
• >99% of the data lie between $\bar{Y} - 3S$ and $\bar{Y} + 3S$
Example 2.36
Ex. 2.36: Suppose Y = pulse rate after 5 mins.
of exercise. For n = 28 subjects, we find $\bar{Y}$ =
98 (beats/min) and S = 13.4 (beats/min).
Thus, e.g., from the empirical rule we expect
~95% of the data to lie between
98 – (2)(13.4) = 98 – 26.8 = 71.2 beats/min
and
98 + (2)(13.4) = 98 + 26.8 = 124.8 beats/min.
Inference
 DEF’N: The POPULATION is the larger
group of subjects (organisms, plots,
regions, ecosystems, etc.) on which we
wish to draw inferences.
 DEF’N: A PARAMETER is a quantified
population characteristic. E.g., the popl’n
mean is µ, the popl’n SD is σ.
 DEF’N: A STATISTIC is a sample quantity
used to estimate a popl’n parameter.
Proportions
 DEF’N: The POPULATION PROPORTION
is the proportion of subjects exhibiting a
particular trait or outcome in the popl’n.
(It generalizes to the probability that any
popl’n element will exhibit the trait.)
NOTATION: p
 DEF’N: The SAMPLE PROPORTION is the
number of sample elements exhibiting the
trait, divided by the sample size, n.
NOTATION: $\hat{p}$
Chapter 3: Random
Sampling, Probability, and the
Binomial Distribution
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Random Samples
 DEF’N: A SIMPLE RANDOM SAMPLE of n
items is a data set where
(a) every popl’n element has an equal chance
of selection, and
(b) every popl’n element is chosen
independently of every other element.
 This draws upon the larger concept of
RANDOMIZATION: selection of data that
avoids sources of possible bias.
Random Sampling
To choose a random sample:
1. assign each popl’n element a unique
code (or set of codes);
2. from a random number table (Table 1,
p. 670) or via computer, in a systematic
manner select n random digits whose
range corresponds to the codes assigned
above; and
3. select every element if its code appears in
step (2), ignoring repeated codes or those
with no assignment.
Example 3.1
Ex. 3.1: Simple random sample of size n =
6 from population of 75 elements.
1. label each element 01, 02, …, 75
2. select random digits from a source such
as Table 1, a TI-84, or R.
3. choose elements for the sample if they
correspond to the selected random digits
(ignore repeats and drop-outs)
See Table 3.1.
Example 3.1 (cont’d)
 The sample uses elements 23, 38, 59, 21, 08, 09
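The three-step procedure can be imitated on a computer. The sketch below is an illustration only: it draws two-digit codes (00 to 99) in place of reading a random number table, skips repeats and codes with no assigned element, and stops once n elements are chosen. The function name `draw_sample` and the seed are arbitrary choices, so the elements selected will differ from those in Table 3.1, and the approach as written only suits populations of at most 99 elements.

```python
import random

def draw_sample(population_size, n, rng):
    """Pick n distinct labels from 1..population_size using two-digit random codes."""
    chosen = []
    while len(chosen) < n:
        code = rng.randint(0, 99)  # one two-digit "table" entry, 00 to 99
        # Ignore codes with no assigned element and codes already selected
        if 1 <= code <= population_size and code not in chosen:
            chosen.append(code)
    return chosen

rng = random.Random(205)  # arbitrary seed, used only to make the run repeatable
sample = draw_sample(75, 6, rng)
print(sample)
```

In practice `random.sample(range(1, 76), 6)` does the same job in one call; the loop above is spelled out only to mirror the table-based steps.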
Probability
 DEF’N: A PROBABILITY is the chance of
some event, E, occurring in a specified
manner. NOTATION: P{E}
 We often view probabilities from a
Relative Frequency Interpretation:
$P\{E\} = \dfrac{\#\ \text{ways } E \text{ occurs}}{\#\ \text{total events}}$
Example 3.12
Ex. 3.12: Toss a fair coin twice. We know
P{H} = 1/2 (see Ex. 3.8). What is P{HH}?
 Consider all possible outcomes:
HH, HT, TH, TT
 If each outcome is equally likely, then
$P\{HH\} = \dfrac{\#\ HH}{\#\ \text{all outcomes}} = \dfrac{1}{4}$
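The same counting argument can be carried out by enumerating the outcomes in Python. This is a sketch that assumes, as the slide does, that all four outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two tosses, in the order HH, HT, TH, TT
outcomes = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Relative-frequency interpretation: favourable outcomes / all outcomes
p_hh = Fraction(outcomes.count("HH"), len(outcomes))
print(outcomes, p_hh)
```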
Probability Rules
 Rule 1: 0 ≤ P{E} ≤ 1.
 Rule 2: The entirety of events has
probability = 1. That is, if E1, ..., Ek are
all the possible events, ∑P{Ei} = 1.
(here, E1, ..., Ek are disjoint!)
 Rule 3: (The Complement Rule):
If $E^c$ = {not E}, then $P\{E^c\} = 1 - P\{E\}$.
Example 3.19
 Ex. 3.19: U.S. Blood types:
P{O} = 0.44 P{A} = 0.42
P{B} = 0.10 P{AB} = 0.04
 Note: (1) all are between 0 and 1 ✓
and (2) P{O} + P{A} + P{B} + P{AB}
= 0.44 + 0.42 + 0.10 + 0.04
= 1.00 ✓
 So, e.g., $P\{O^c\}$ = 1 – P{O} = 1 – 0.44 = 0.56
Probability (cont’d)
 DEF’N: Two events, E1 and E2, are
DISJOINT (a.k.a MUTUALLY EXCLUSIVE) if
they cannot occur simultaneously.
 DEF’N: The UNION of two events, E1 and
E2, is the event that E1 or E2 (or both)
occurs.
 DEF’N: The INTERSECTION of two
events, E1 and E2, is the event that both E1 and
E2 occur.
Venn Diagrams
 A useful graphic to conceptualize how
events interrelate is the Venn Diagram.
 For example, Fig. 3.8 shows a Venn Diagram
with 2 intersecting events, E1 and E2:
Probability Rules (cont’d)
 We often denote the entirety of events as
the Sample Space, S. Conversely, the
Null Space is $\emptyset = S^c$.
 Rule 4: If E1 and E2 are disjoint, then
P{E1 or E2} = P{E1} + P{E2}.
 Rule 5: If E1 and E2 are any two events,
then
P{E1 or E2} = P{E1} + P{E2} – P{E1 and E2}.
Example 3.20
Ex. 3.20: Hair/Eye color of 1770 men. We
have the following distribution of traits:
So, e.g., P{Black Hair} = 500/1770, etc.
Example 3.20 (cont’d)
Find P{Black Hair OR Red Hair}.
Clearly, E1 = {Black Hair} and
E2 = {Red Hair} are disjoint,
so from Rule 4,
P{Black Hair OR Red Hair}
= P{Black Hair} + P{Red Hair}
= 500/1770 + 70/1770 = 570/1770
= 0.32.
Example 3.20 (cont’d)
Now, find P{Black Hair OR Blue Eyes}.
Here, E1 = {Black Hair} and
E3 = {Blue Eyes} are NOT disjoint,
so apply Rule 5:
P{Black Hair OR Blue Eyes}
= P{Black Hair} + P{Blue Eyes}
– P{Black Hair AND Blue Eyes}
= 500/1770 + 1050/1770 – 200/1770
= 1350/1770 = 0.76.
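Both addition rules can be checked with exact fractions. The sketch below uses only the counts quoted in Example 3.20 (500 with black hair, 70 with red hair, 1050 with blue eyes, 200 with both black hair and blue eyes, out of 1770 men); the variable names are illustrative.

```python
from fractions import Fraction

N = 1770  # total number of men
p_black = Fraction(500, N)
p_red = Fraction(70, N)
p_blue = Fraction(1050, N)
p_black_and_blue = Fraction(200, N)

# Rule 4: disjoint events, so the probabilities simply add
p_black_or_red = p_black + p_red

# Rule 5: overlapping events, so subtract the intersection once
p_black_or_blue = p_black + p_blue - p_black_and_blue

print(f"{float(p_black_or_red):.2f}, {float(p_black_or_blue):.2f}")
```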
Probability (cont’d)
 DEF’N: Two events, E1 and E2, are
INDEPENDENT if knowledge that E1 occurs
does not affect P{E2} and vice versa.
If two events are not independent, they are
DEPENDENT.
 DEF’N: A CONDITIONAL PROBABILITY is
the probability that one event occurs, given
that the other has already occurred.
NOTATION: P{E1 | E2}.
Probability Rules (cont’d)
 Rule 6: If E1 and E2 are independent, then
P{E1 and E2} = P{E1} × P{E2}.
 Rule 7: If E1 and E2 are any two events, then
P{E1 and E2} = P{E1} × P{E2 | E1}
= P{E2} × P{E1 | E2}.
 Consequences:
• if E1 and E2 are independent, then
P{E1} = P{E1 | E2} and P{E2} = P{E2 | E1}
• also, P{E2 | E1} = P{E1 and E2}/P{E1} if P{E1}≠0.
Examples 3.21–3.22
Exs. 3.21–3.22 (3.20, cont’d): Hair/Eye color
of 1770 men.
Refer back to Table 3.3. There, we saw
P{Blue Eyes AND Black Hair} = 200/1770,
while P{Black Hair} = 500/1770. So,
$P\{\text{Blue Eyes} \mid \text{Black Hair}\} = \dfrac{P\{\text{Blue Eyes AND Black Hair}\}}{P\{\text{Black Hair}\}} = \dfrac{200/1770}{500/1770} = \dfrac{200}{500} = 0.40$
Example 3.25
Ex. 3.25 (3.20, cont’d): Hair/Eye color of 1770
men.
In Table 3.3, there is no evidence of independence
between Hair & Eye color. So, e.g.,
P{Red Hair AND Brown Eyes}
= P{Red Hair} × P{Brown Eyes | Red Hair}
$= \dfrac{70}{1770} \cdot \dfrac{20}{70} = \dfrac{20}{1770}$,
which agrees with the display in Table 3.3.
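The conditional-probability formula and the general multiplication rule (Rule 7) can both be verified with exact fractions, using only the counts quoted in Examples 3.21 to 3.25. Variable names are illustrative.

```python
from fractions import Fraction

N = 1770  # total number of men

# Conditional probability: P{Blue | Black} = P{Blue AND Black} / P{Black}
p_black = Fraction(500, N)
p_blue_and_black = Fraction(200, N)
p_blue_given_black = p_blue_and_black / p_black

# Rule 7: P{Red AND Brown} = P{Red} x P{Brown | Red}
p_red = Fraction(70, N)
p_brown_given_red = Fraction(20, 70)
p_red_and_brown = p_red * p_brown_given_red

print(float(p_blue_given_black), p_red_and_brown == Fraction(20, N))
```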
Bayes’ rule
 Bayes’ rule is a powerful identity for
obtaining conditional probabilities:
 $P\{A \mid B\} = P\{B \mid A\}P\{A\} / P\{B\}$.
 Can get $P\{B\} = P\{B \mid A\}P\{A\} + P\{B \mid A^c\}P\{A^c\}$.
 Useful in diagnostic screening
applications.
Diagnostic tests
 Say a test is positive or negative: T+ or T-.
 A subject has the disease or not: D+ or D-.
 P(D+) is the prevalence of the disease.
 P(T+|D+) is the sensitivity of the test.
 P(T-|D-) is the specificity of the test.
 P(D+|T+) and P(D-|T-) are the predictive
values positive and negative, respectively.
Screening for hepatitis C at an
STD clinic
 (Weisbord, Trepka, Zhang, Smith, and Brewer,
2003). At an STD clinic in Miami, Florida,
patients were screened for hepatitis C using
CDC screening criteria in the form of a
questionnaire.
 Study concluded P(T+ |D+) = 0.61, P(T− |D−)
= 0.91 and P(D+) = 0.047.
Law of total probability for
P(T+)
 P(T+)=P(T+|D+)P(D+)+P(T+|D−)P(D−)
=P(T+|D+)P(D+)+[1−P(T−|D−)][1−P(D+)]
= 0.61×0.047+(1−0.91)(1−0.047)
= 0.114.
 Say the CDC criteria tell me I’m at risk for
hepatitis C, i.e. my questionnaire yields T+.
 What is the probability that I really have it?
To C or not to C?
 P(D+|T+) = P(T+|D+)P(D+)/P(T+)
= 0.61 × 0.047 / 0.114
= 0.25.
 There’s still only a 1 in 4 chance I’ve got
hepatitis C. But this is much larger than
P(D+)=0.047, the probability before knowing
T+.
 Better get a blood test.
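The screening calculation above can be wrapped in a small function. This is a sketch: the function name `screen` is hypothetical, and the inputs are the sensitivity, specificity, and prevalence quoted from the study.

```python
def screen(sensitivity, specificity, prevalence):
    """Return (P(T+), P(D+|T+)) via the law of total probability and Bayes' rule."""
    # P(T+) = P(T+|D+)P(D+) + [1 - P(T-|D-)][1 - P(D+)]
    p_t_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # P(D+|T+) = P(T+|D+)P(D+) / P(T+)
    ppv = sensitivity * prevalence / p_t_pos
    return p_t_pos, ppv

p_t_pos, ppv = screen(sensitivity=0.61, specificity=0.91, prevalence=0.047)
print(f"P(T+) = {p_t_pos:.3f}, P(D+|T+) = {ppv:.2f}")
```

Re-running with a larger prevalence shows how strongly the predictive value positive depends on how common the disease is in the screened group.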
Density Curves
 DEF’N: A RANDOM VARIABLE is a
measured outcome of some random
process.
 When a random variable is discrete, it is
usually straightforward to interpret
probabilities associated with it.
 For instance, if Y = {# leaves on tree}:
 P{Y = 122} = 0.42 is interpretable
 P{Y = 18} = 0.02 is interpretable
 but P{Y=120.472} is not interpretable.
Probability Histogram
A probability histogram is used to
visualize discrete probability masses:
[Figure: probability histogram of P{Y=k} against k = 1, …, 9; vertical axis from 0 to 0.5]
Notice: each “mass” has area=probability,
and all masses sum to 1.
Continuous Random Variables
 By contrast, a continuous random variable
has a different probability interpretation.
 Extending the probability histogram to the
continuous case, we say Y has a
PROBABILITY DENSITY CURVE, where area
still represents probability.
Continuous Random Variables
Consequences of the continuous probability
model:
• P{Y = a} = 0 = P{Y = b} (area of a line is zero)
• So, P{Y ≤ a} = P{Y < a} + P{Y = a} = P{Y < a}
• And for that matter:
P{a ≤ Y ≤ b} = P{a < Y ≤ b}
= P{a ≤ Y < b} = P{a < Y < b}
(all if Y is continuous).
Example 3.30
Ex. 3.30: Y = diameter (in.) of tree trunk.
• Suppose the density has the form given in
Fig. 3.13:
• Then, for example, P{Y > 8} =
P{8 < Y ≤ 10} + P{Y > 10} = 0.12 + 0.07 = 0.19
Mean and Expected Value
 DEF’N: If Y is a discrete random variable,
its POPULATION MEAN is given by
µY = ∑yiP{Y = yi}
(where the sum is taken over all possible
yi’s)
 More generally, the EXPECTED VALUE of Y
is E(Y) = ∑yiP{Y = yi}.
Example 3.35
Ex. 3.35: Y = # tail vertebrae in fish.
From Table 3.4 we find
yi          20    21    22    23
P{Y = yi}   .03   .51   .40   .06
So, E(Y) = ∑yiP{Y = yi}
= (20)(.03) + (21)(.51) + (22)(.40) + (23)(.06)
= … = 21.49.
Variance
 DEF’N: If Y is a discrete random variable,
its POPULATION VARIANCE is given by
$\sigma_Y^2 = \sum (y_i - \mu_Y)^2 P\{Y = y_i\}$
One can show this is also
$\sigma_Y^2 = E(Y^2) - \{E(Y)\}^2 = E(Y^2) - \mu_Y^2$
 From this, the POPULATION STANDARD
DEVIATION of Y is $\sigma_Y = (\sigma_Y^2)^{1/2}$.
Example 3.37
Ex. 3.37: (3.35, cont’d). From Table 3.4 we
were given the values of P{Y = yi}.
Recall µY = 21.49.
So, $\sigma_Y^2 = \sum (y_i - \mu_Y)^2 P\{Y = y_i\}$
$= (20 - 21.49)^2(.03) + (21 - 21.49)^2(.51) + (22 - 21.49)^2(.40) + (23 - 21.49)^2(.06)$
= … = 0.4299.
Example 3.37 (cont’d)
So $\sigma_Y^2 = 0.4299$.
But, it’s a lot easier to use
$\sigma_Y^2 = E(Y^2) - \mu_Y^2$
$= \{(20)^2(.03) + (21)^2(.51) + (22)^2(.40) + (23)^2(.06)\} - (21.49)^2$
= 462.25 – 461.8201
= 0.4299.
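Both routes to the population variance can be checked numerically. The sketch below uses the tail-vertebrae distribution quoted from Table 3.4; the variable names are illustrative.

```python
values = [20, 21, 22, 23]          # possible numbers of tail vertebrae
probs = [0.03, 0.51, 0.40, 0.06]   # P{Y = y_i}, as quoted from Table 3.4

# Population mean: E(Y) = sum of y_i * P{Y = y_i}
mu = sum(y * p for y, p in zip(values, probs))

# Variance from the definition: sum of (y_i - mu)^2 * P{Y = y_i}
var_definition = sum((y - mu) ** 2 * p for y, p in zip(values, probs))

# Variance from the shortcut: E(Y^2) - mu^2
e_y2 = sum(y ** 2 * p for y, p in zip(values, probs))
var_shortcut = e_y2 - mu ** 2

sd = var_definition ** 0.5
print(f"mu = {mu:.2f}, var = {var_definition:.4f} = {var_shortcut:.4f}, sd = {sd:.4f}")
```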
Rules of Expected Value
 E(·) is a mathematical operator.
 It has certain general properties:
• Rule E1: E(aX + bY) = aE(X) + bE(Y)
= aµX + bµY
• Rule E2: E(a + bY) = a + bE(Y) = a +
bµY
(a “linear operator”)
Rules of Variance
The special variance operator also has
certain general properties:
• Rule E3: If X and Y are independent, then
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$.
• Rule E4: If X and Y are independent, then
$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$.
• General rule: If X and Y are independent,
then
$\sigma_{aX+bY}^2 = a^2\sigma_X^2 + b^2\sigma_Y^2$.
Example 3.41
Ex. 3.41: X = mass of cylinder from balance.
Y = mass of cylinder from 2nd balance.
Suppose $\sigma_X = 0.03$ and $\sigma_Y = 0.04$. Then, if we
calculate the difference between the two
weighings, X – Y, and the weighings are independent (as Rule E4 requires), we know
$\sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2} = \sqrt{0.03^2 + 0.04^2} = \sqrt{0.0009 + 0.0016} = \sqrt{0.0025} = 0.05$
Independent Trials
 DEF’N: The INDEPENDENT TRIALS
MODEL occurs when
(i) n independent trials are studied
(ii) each trial results in a single binary obsv’n
(iii) each trial’s success has (constant)
probability: P{success} = p
Notice that if P{success} = p, P{failure} = 1–p.
 We call this a BInS (Binary / Indep. / n is
const. / Same p) setting.
Example 3.43
Ex 3.43: Suppose 39% of organisms in a
popl’n exhibit a mutant trait. Sample n=5
organisms randomly and check for
mutation:
• Binary? ✓ (mutant vs. non-mutant)
• Indep.? ✓ (if no bias in sampling)
• n const.? ✓ (n=5)
• Same p? ✓ (p = 0.39)
Binomial Distribution
 DEF’N: In a BInS setting, if we let
Y = {# successes} then Y has a
BINOMIAL DISTRIBUTION.
 NOTATION: Y ~ Bin(n,p).
 The binomial probability function is
$P\{Y = j\} = {}_nC_j \, p^j (1 - p)^{n-j}$
(j = 0,1,…,n).
Binomial Coefficient
 In the binomial probability function
$P\{Y = j\} = {}_nC_j \, p^j (1 - p)^{n-j}$
the BINOMIAL COEFFICIENT is
${}_nC_j = \dfrac{n!}{j!\,(n-j)!}$
 Also, j! is the FACTORIAL OPERATOR:
j! = j(j–1)(j–2)…(2)(1)
 We define 0! = 1.
Factorial Operator
Example of factorial operator: at n = 5,
5! = (5)(4)(3)(2)(1) = 120
4! = (4)(3)(2)(1) = 24
3! = (3)(2)(1) = 6
2! = (2)(1) = 2
So:
j     0   1   2   3   4   5
nCj   1   5  10  10   5   1
(Also see Table 3.6.)
Values of nCj are given in Table 2 (p. 674)
Table 3.6
Example 3.45
Ex 3.45 (Ex. 3.43 cont’d): Y ~ Bin(5, 0.39);
So $P\{Y = 3\} = {}_5C_3 (.39)^3 (.61)^2$
= (10)(.0593)(.3721) = 0.22.
Can also find this via TI-84 or R.
Table 3.7 gives the full distribution.
Figure 3.15 gives a probability histogram.
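The binomial probability function is straightforward to code. The sketch below assumes Python 3.8 or later (for `math.comb`); the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(j, n, p):
    """P{Y = j} for Y ~ Bin(n, p): nCj * p^j * (1 - p)^(n - j)."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)

n, p = 5, 0.39  # Ex. 3.45: five organisms, 39% mutant
coefficients = [comb(n, j) for j in range(n + 1)]
pmf = [binomial_pmf(j, n, p) for j in range(n + 1)]

print(coefficients)
for j, prob in enumerate(pmf):
    print(f"P{{Y = {j}}} = {prob:.4f}")
```

Printing the whole list gives the full distribution of Y; its entries sum to 1, as any probability distribution must.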
Binomial Mean & Variance
 If Y ~ Bin(n,p), the population mean and
variance are:
$\mu_Y = np$ and $\sigma_Y^2 = np(1 - p)$
 Ex. 3.49: Y = {# Rh+ in BInS sample}. We’re
given p = P{Rh+} = 0.85. So, if n = 6, we
expect µY = (6)(0.85) = 5.1 Rh+ in the
sample, with $\sigma_Y^2$ = (6)(.85)(.15) = 0.765, so
that $\sigma_Y$ = √.765 = 0.87.
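As a check on the shortcut formulas, the mean and variance can also be computed directly from the binomial probabilities using the general definitions of E(Y) and the population variance. This is an illustrative sketch (Python 3.8+ for `math.comb`), not part of the original example.

```python
from math import comb, sqrt

n, p = 6, 0.85  # Ex. 3.49: six subjects, P{Rh+} = 0.85

# Full Bin(n, p) distribution
pmf = [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]

# Mean and variance from the general definitions for a discrete random variable
mean_from_pmf = sum(j * pj for j, pj in enumerate(pmf))
var_from_pmf = sum((j - mean_from_pmf) ** 2 * pj for j, pj in enumerate(pmf))

print(f"mean = {mean_from_pmf:.3f} (np = {n * p:.3f})")
print(f"variance = {var_from_pmf:.3f} (np(1-p) = {n * p * (1 - p):.3f})")
print(f"SD = {sqrt(var_from_pmf):.2f}")
```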
Chapter 4:
The Normal Distribution
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the
Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
Normal Distribution
 DEF’N: A continuous random variable
Y has a NORMAL DISTRIBUTION if its
probability density can be written as
$f(y) = \dfrac{1}{\sigma_Y \sqrt{2\pi}} \, e^{-(y - \mu_Y)^2 / (2\sigma_Y^2)}$
over –∞ < y < ∞.
 NOTATION: $Y \sim N(\mu_Y, \sigma_Y^2)$
 The mean and variance of a normal dist’n
are $E(Y) = \mu_Y$ and $E[(Y - \mu_Y)^2] = \sigma_Y^2$.
Normal Dist’n Examples
 The Normal distribution appears in many
biological contexts:
 Ex. 4.1: Y = serum cholesterol (mg/dLi)
 Ex. 4.2: Y = eggshell thickness (mm)
 Ex. 4.3: Y = nerve cell interspike times (ms)
Normal Curve
The Normal density curve is
(i) continuous over –∞ < y < ∞
(ii) symmetric about y = µ
(iii) unimodal, and hence “bell-shaped”
Figure 4.7
Since each (µ, σ²) pair indexes a different
Normal dist’n, this represents a rich family
of curves:
Standard Normal
 DEF’N: The STANDARDIZATION FORMULA
for Y ~ N(µ, σ²) is
Z = (Y – µ)/σ
This is often called a ‘Z-score’.
 If Y ~ N(µ, σ²), then Z ~ N(0,1) and we say Z
has a STANDARD NORMAL dist’n.
 Std. Normal probab’s are tabulated in Table
3 (p. 675) and on text’s inside front cover.
(Portion of) Table 3, p.675
P(Z ≤ z)
Example: (p. 124) Suppose Z ~ N(0,1).
Find P{Z ≤ 1.53}.
In Table 3, split 1.53 into the row 1.5 and the column 0.03; the entry where they meet is 0.9370.
Hint: “always draw the picture”
P(a < Z ≤ b)
 If Z ~ N(0,1), and we find P{Z ≤ 1.53} = 0.937,
notice then that P{Z > 1.53} = 1 – 0.937
= 0.063.
 Example: (p. 125) Suppose Z ~ N(0,1); then
P{–1.20 < Z ≤ 0.80} = P{Z ≤ 0.80} – P{Z ≤ –1.20}
= 0.7881 – 0.1151 = 0.6730.
(See Fig. 4.11)
 Can also find Std. Normal probabilities using TI-84
or R!
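Besides a TI-84 or R, Python's standard library can produce the same standard Normal probabilities. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; results agree with Table 3 up to the table's four-decimal rounding.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard Normal distribution

p_below = Z.cdf(1.53)                    # P{Z <= 1.53}
p_above = 1 - p_below                    # P{Z > 1.53}, by the complement rule
p_between = Z.cdf(0.80) - Z.cdf(-1.20)   # P{-1.20 < Z <= 0.80}

print(f"{p_below:.4f}  {p_above:.4f}  {p_between:.4f}")
```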
Empirical Rule, revisited
 If Z ~ N(0,1), it mimics the empirical rule
very closely:
 The same effect holds for any Y ~ N(µ, σ²).
Example 4.5
Ex. 4.5: Y = length of herrings (mm).
Suppose Y ~ N(54, 20.25). Then we know
$Z = \dfrac{Y - 54}{\sqrt{20.25}} = \dfrac{Y - 54}{4.5} \sim N(0,1)$
(a) What % of fish are less than 60 mm long?
$P[Y < 60] = P\left[\dfrac{Y - 54}{4.5} < \dfrac{60 - 54}{4.5}\right] = P\left[Z < \dfrac{6}{4.5}\right] = P[Z < 1.33] = 0.9082$
Example 4.5 (cont’d)
Y = length of herrings ~ N(54, 20.25).
(c) What % of fish are between 51 and 60 mm
long?
$P[51 < Y < 60] = P\left[\dfrac{51 - 54}{4.5} < \dfrac{Y - 54}{4.5} < \dfrac{60 - 54}{4.5}\right]$
$= P\left[\dfrac{-3}{4.5} < Z < \dfrac{6}{4.5}\right]$
$= P[-.67 < Z < 1.33]$
$= P[Z \le 1.33] - P[Z < -.67]$
$= 0.9082 - 0.2514 = 0.6568$
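The herring calculations can be reproduced without standardizing by hand. This sketch assumes Python 3.8+ (`statistics.NormalDist`). Because the software does not round the Z-scores to two decimals, its answers differ from the table-based values in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Herring length Y ~ N(54, 20.25): mean 54 mm, SD = sqrt(20.25) = 4.5 mm
Y = NormalDist(mu=54, sigma=sqrt(20.25))

p_less_60 = Y.cdf(60)                # part (a): P[Y < 60]
p_51_to_60 = Y.cdf(60) - Y.cdf(51)   # part (c): P[51 < Y < 60]

print(f"P[Y < 60] = {p_less_60:.4f}")
print(f"P[51 < Y < 60] = {p_51_to_60:.4f}")
```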
Std. Normal Tail Areas
 We can also INVERT the std. Normal table
(Table 3):
 Z ~ N(0,1), so find P{Z < 1.96} = 0.975. Then we
know P{Z > 1.96} = 1 – 0.975 = 0.025.
 So, 2.5% of std. normal popl’n exceeds 1.96.
$z_\alpha$
More generally, if we find some number
$z_\alpha$ such that $P\{Z \le z_\alpha\} = 1 - \alpha$, we know
$P\{Z > z_\alpha\} = \alpha$ and vice versa.
Std. Normal Critical Point
 DEF’N: The UPPER-α CRITICAL POINT
from Z ~ N(0,1) is the value $z_\alpha$ such that
$P\{Z > z_\alpha\} = \alpha$.
 Find $z_\alpha$ by:
• carefully inverting Table 3
• reading off the bottom row (df = ∞) of
Table 4 (p. 677)
• using TI-84 Normal dist’n calculator or R
Percentiles
 DEF’N: The point of a distribution below
which p% lies is the pth PERCENTILE of
the dist’n.
 If Z ~ N(0,1), $z_\alpha$ is the 100(1 – α)th percentile
of Z.
 We often ask what value is the pth
percentile of a biological population (see
Ex. 4.6).
Example 4.6
Example 4.6 (cont’d)
 We want to find y* such that P{Y < y*} = 0.70.
This is
$P\left[\dfrac{Y - 54}{4.5} < \dfrac{y^* - 54}{4.5}\right] = P\left[Z < \dfrac{y^* - 54}{4.5}\right]$
 Now, from Table 3 we find P{Z < 0.52} =
0.6985 is close to 0.70. This tells us to
equate (approximately) 0.52 and (y*–54)/4.5
⇒ y* – 54 ≈ (0.52)(4.5)
⇒ y* ≈ (0.52)(4.5) + 54 = 56.34
Example 4.6 (conclusion)
So, we find that approximately 70% (69.85%,
exactly) of herring are less than 56.34 mm
long.
Notice also that we derived the critical point
$z_{0.30}$ ≈ 0.52. (More precisely, we found $z_{0.3015}$
= 0.52.)
Using TI-84, we can find $z_{0.30}$ = 0.5244: this
yields the more precise value y* = (0.5244)(4.5) + 54
= 56.36 for Example 4.6.
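The same percentile can be found in Python by inverting the Normal distribution function. This sketch assumes Python 3.8+ (`statistics.NormalDist`) and the N(54, 4.5²) herring-length model used in the calculation above.

```python
from statistics import NormalDist

# Upper-0.30 critical point: the z with P{Z <= z} = 0.70
z_030 = NormalDist().inv_cdf(0.70)

# Un-standardize: y* = mu + z * sigma
y_star = 54 + z_030 * 4.5

# Equivalent one-step calculation on the original scale
y_star_direct = NormalDist(mu=54, sigma=4.5).inv_cdf(0.70)

print(f"z_0.30 = {z_030:.4f}, y* = {y_star:.2f} mm")
```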

Basic Statistics for Beginners 123458921

  • 1. Elementary Statistics for the Biological and Life Sciences STAT 205 University of South Carolina Columbia, SC © 2011, University of South Carolina. All rights reserved, except where previous rights exist. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means — electronic, mechanical, photoreproduction, recording, or scanning — without the prior written consent of the University of South Carolina.
  • 2. STAT205 – Elementary Statistics for the Biological and Life Sciences 2 Motivation: why analyze data?  Clinical trials/drug development: compare existing treatments with new methods to cure disease.  Agriculture: enhance crop yields, improve pest resistance  Ecology: study how ecosystems develop/respond to environmental impacts  Lab studies: learn more about biological tissue/cellular activity
  • 3. STAT205 – Elementary Statistics for the Biological and Life Sciences 3 Chapter 2: Description of Populations and Samples Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by per- mission.
  • 4. STAT205 – Elementary Statistics for the Biological and Life Sciences 4 Statistics is:  Statistics is the science of • collecting, • summarizing, • analyzing, and • interpreting data.  Goal: to understand the underlying biological phenomena that generate the data.
  • 5. STAT205 – Elementary Statistics for the Biological and Life Sciences 5 Random Variables  Data are generated by some random process or phenomenon.  Any observed datum represents the outcome of a Random Variable.  NOTATION: upper case letter, W, X, Y, etc.
  • 6. STAT205 – Elementary Statistics for the Biological and Life Sciences 6 Types of Random Variables  Qualitative • Categorical (e.g., blood type: A, B, AB, O) • Ordinal (e.g., therapy response: none, some, cured)  Quantitative • Discrete (e.g., number of nests – 0,1,2,…) • Continuous (e.g., cholesterol conc. – 220.2, 210.4, 180.9, etc.)
  • 7. STAT205 – Elementary Statistics for the Biological and Life Sciences 7 Random Samples  We take data as samples from a larger population.  DEF’N: A SAMPLE is a collection of ‘subjects’ upon which we measure one or more variables.  DEF’N: The SAMPLE SIZE is the number of subjects in a sample. NOTATION: n.
  • 8. STAT205 – Elementary Statistics for the Biological and Life Sciences 8 Observations  DEF’N: The OBSERVATIONAL UNIT is the type of subject being sampled. Example: observational units could be (i) baby, (ii) moth, (iii), Petri dish, etc.  DEF’N: An OBSERVATION is a recorded outcome of a variable from a random sample. NOTATION: lower case letter, x, y, etc.
  • 9. STAT205 – Elementary Statistics for the Biological and Life Sciences 9 Frequency Distributions  DEF’N: A FREQUENCY DISTRIBUTION is a summary display of the frequencies of occurrence of each value in a sample.  For continuous (Ex. 2.4, 2.6, 2.7, & 2.8) or categorical (Ex. 2.1, 2.2, 2.3, & 2.5) data.  DEF’N: A RELATIVE FREQUENCY is a raw frequency divided by n: Rel. Freq. = Freq n
  • 10. STAT205 – Elementary Statistics for the Biological and Life Sciences 10 Example 2.4 Ex. 2.4: Y = no. of piglets surviving 21 days (litter size). A sample of n=36 pigs (sows) generated the data in Table 2.4.
  • 11. STAT205 – Elementary Statistics for the Biological and Life Sciences 11 Dot Plot  A DOT PLOT is a simple graphic where dots indicate observed data in a sample.  Ex. 2.4: Fig. 2.4 gives the dot plot for the litter size data:
  • 12. STAT205 – Elementary Statistics for the Biological and Life Sciences 12 Histogram  A HISTOGRAM is a simple bar chart where the bars replace the dots in a dot plot.  Ex. 2.4 (cont’d): Fig. 2.5 gives the histogram for the litter size data.
  • 13. STAT205 – Elementary Statistics for the Biological and Life Sciences 13 Stemplot  A STEMPLOT (a.k.a. STEM-LEAF DIAGRAM) is a dot plot (often drawn on its side) with data information replacing the dots.  The ‘stems’ are the core values of the data, set in common groups.  The ‘leaves’ are the last digits of each datum.
  • 14. STAT205 – Elementary Statistics for the Biological and Life Sciences 14 Example 2.8  Ex. 2.8: Y = radish growth. Data in Table 2.8:  (Ordered) stemplot in Fig. 2.15:
  • 15. STAT205 – Elementary Statistics for the Biological and Life Sciences 15 Frequency Distn’s  Frequency distributions come in varied shapes: • Symmetric & bell-shaped • Symmetric, not bell-shaped • Asymmetric & skewed right (rt. tail longer) • Asymmetric & skewed left (left tail longer) • Bimodal (two distinct clumps)  We use histograms, etc., to visualize these shapes in the data.
  • 16. Histogram for continuous data  For cont. data, histogram is defined by constructing bins to toss data in.  Let y(1) be the smallest (min) and y(n) be the largest (max) values in the data set.  Divide interval (y(1), y(n)) into, say, 5 to 20 intervals of equal-sized length (more intervals for more data, e.g. n/5 bins total).  Count how many obs. in each bin...that's it! STAT205 – Elementary Statistics for the Biological and Life Sciences 16
  • 17. STAT205 – Elementary Statistics for the Biological and Life Sciences 17 Descriptive Statistics  DEF’N: The SAMPLE MEAN is the arithmetic average of a set of n data values.  NOTATION:  The sample mean is often viewed as a kind of ‘balance point’ in the data. y = 1 n yi  i=1 n = y1 + y2 + + yn n
  • 18. STAT205 – Elementary Statistics for the Biological and Life Sciences 18 Example 2.15  Ex. 2.15: Y = weight gain (lb) of lambs on special diet. Data: {11, 13, 19, 2, 10, 1}  n = 6:  Fig. 2.27: y = 11 + 13 + 19 + 2 + 10 + 1 6 = 56 6 = 9.33 lb
  • 19. STAT205 – Elementary Statistics for the Biological and Life Sciences 19 Sample Median  DEF’N: The SAMPLE MEDIAN is the value of the data nearest to their middle. It splits the data in half.  Find the median by ordering the data, and calculating their middle point (n odd) or the average of their two middle points (n even).  NOTATION: Q2
  • 20. New notation  Original data: y1, y2,…, yn.  Ordered data: y(1), y(2),…, y(n).  Example: y1=3.7, y2=-2.0, y3=-7.5, y4=2.1.  Then: y(1)=-7.5, y(2)=-2.0, y(3)=2.1, y(4)=3.7.  If n is odd then Q2 is middle ordered value.  If n is even then Q2 is average of middle two ordered values.
  • 21. STAT205 – Elementary Statistics for the Biological and Life Sciences 21 Example 2.17  Ex. 2.17: (2.15 cont’d) Lamb weight gain. n = 6 is even , so find Q2 as avg. of two middle points  ordered data: y(1) = 1, y(2) = 2, y(3) = 10, y(4) = 11, y(5) = 13, y(6) = 19. Q2 = 10 + 11 2 = 10.5 lb
  • 22. STAT205 – Elementary Statistics for the Biological and Life Sciences 22 Example 2.19 Ex. 2.19: Y = cricket singing times. Data in Table 2.10:
  • 23. STAT205 – Elementary Statistics for the Biological and Life Sciences 23 Example 2.19 (cont’d)
  • 24. STAT205 – Elementary Statistics for the Biological and Life Sciences 24 Skewness  Mean & median indicate skewness: • If data are skewed right, mean > median. • If data are skewed left, mean < median. • If data are symmetric, mean ≈ median.  Both the mean and the median are useful summary measures of location. The median is slightly more ROBUST to extreme values of yi, but of course, the mean is easier to calculate.
  • 25. STAT205 – Elementary Statistics for the Biological and Life Sciences 25 Quartiles DEF’N: The QUARTILES of a distribution are points that separate the data into quarters or fourths: • The first quartile separates the lower 25% of the data from the upper 75%. NOTATION: Q1 • The second quartile separates the lower 50% of the data from the upper 50%. NOTATION: Q2 • The third quartile separates the lower 75% of the data from the upper 25%. NOTATION: Q3
  • 26. STAT205 – Elementary Statistics for the Biological and Life Sciences 26 Example 2.20  Ex. 2.20: Y = Systolic blood pressure (mm Hg) in men; n= 7.  Ordered data: y(1) = 113, y(2) = 124, y(3) = 124, y(4) = 132, y(5) = 146, y(6) = 151, y(7) = 170.  Q1 = 124  Q2 = 132  Q3 = 151
  • 27. STAT205 – Elementary Statistics for the Biological and Life Sciences 27 IQR  DEF’N: The INTER-QUARTILE RANGE is IQR = Q3 – Q1  DEF’N: The MINIMUM is the smallest value of a data set or distribution. NOTATION: y(1)  DEF’N: The MAXIMUM is the largest value of a data set or distribution. NOTATION: y(n)
  • 28. STAT205 – Elementary Statistics for the Biological and Life Sciences 28 Five Number Summary  DEF’N: The FIVE NUMBER SUMMARY is {y(1), Q1, Q2, Q3, y(n)}  DEF’N: A BOXPLOT is a graphic plot of the 5-no. summary, with a box spanning the IQR and bridging the quartiles: y(1) y(n) Q1 Q2 Q3
  • 29. STAT205 – Elementary Statistics for the Biological and Life Sciences 29 Example 2.22 Ex. 2.22: Y = radish growth data from Ex. 2.8. Five-no. summary is {8, 15, 21, 30, 37}. Boxplot is given in Fig. 2.30:
  • 30. STAT205 – Elementary Statistics for the Biological and Life Sciences 30 Example 2.23 Ex. 2.23: Y = radish growth data over three different growth regimes (see Ex. 2.9). In Fig. 2.32, we use boxplots for compar- ative purposes. 
  • 31. STAT205 – Elementary Statistics for the Biological and Life Sciences 31 Outliers  DEF’N: An OUTLIER is an obsv’n that differs dramatically from the rest of the data. Formally: Yi is an outlier if Yi < Q1 – (1.5  IQR) or Yi > Q3 + (1.5  IQR) “lower fence” “upper fence”
  • 32. STAT205 – Elementary Statistics for the Biological and Life Sciences 32 Example 2.25  Ex. 2.25: Y = radish growth data in full light (from Ex. 2.23). The ordered data are: 3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21  IQR = Q3 – Q1 = 10 – 7 = 3  Upper fence = Q3 + (1.5  IQR) = 10 + (1.5)(3) = 14.5  Lower fence = Q1 – (1.5  IQR) = 7 – (1.5)(3) = 2.5  y = 20 and y = 21 are outliers.
  • 33. STAT205 – Elementary Statistics for the Biological and Life Sciences 33 Dispersion  DEF’N: The SAMPLE RANGE is Range = Y(n) – Y(1) = Max. – Min.  DEF’N: The SAMPLE VARIANCE is  DEF’N: The SAMPLE STANDARD DEVIATION (SD) is S = S2 S 2 = 1 n-1 (Yi - Y) 2  i=1 n
  • 34. STAT205 – Elementary Statistics for the Biological and Life Sciences 34 The Empirical Rule The sample mean and the sample SD are useful in describing data sets. The EMPIRICAL RULE states that • ~68% of the data lie between • ~95% of the data lie between • >99% of the data lie between Y - S and Y + S Y - 2S and Y + 2S Y - 3S and Y + 3S
  • 35. STAT205 – Elementary Statistics for the Biological and Life Sciences 35 Example 2.36 Ex. 2.36: Suppose Y = pulse rate after 5 mins. of exercise. For n = 28 subjects, we find Y = 98 (beats/min) and S = 13.4 (beats/min). Thus, e.g., from the empirical rule we expect ~95% of the data to lie between 98 – (2)(13.4) = 98 – 26.8 = 71.2 beats/min and 98 + (2)(13.4) = 98 + 26.8 = 124.8 beats/min.
  • 36. STAT205 – Elementary Statistics for the Biological and Life Sciences 36 Inference  DEF’N: The POPULATION is the larger group of subjects (organisms, plots, regions, ecosystems, etc.) on which we wish to draw inferences.  DEF’N: A PARAMETER is a quantified population characteristic. E.g., the popl’n mean is µ, the popl’n SD is s.  DEF’N: A STATISTIC is a sample quantity used to estimate a popl’n parameter.
  • 37. STAT205 – Elementary Statistics for the Biological and Life Sciences 37 Proportions  DEF’N: The POPULATION PROPORTION is the proportion of subjects exhibiting a particular trait or outcome in the popl’n. (It generalizes to the probability that any popl’n element will exhibit the trait.) NOTATION: p  DEF’N: The SAMPLE PROPORTION is the number of sample elements exhibiting the trait, divided by the sample size, n. NOTATION: p
  • 38. STAT205 – Elementary Statistics for the Biological and Life Sciences 38 Chapter 3: Random Sampling, Probability, and the Binomial Distribution Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by per- mission.
  • 39. STAT205 – Elementary Statistics for the Biological and Life Sciences 39 Random Samples  DEF’N: A SIMPLE RANDOM SAMPLE of n items is a data set where (a) every popl’n element has an equal chance of selection, and (b) every popl’n element is chosen independently of every other element.  This draws upon the larger concept of RANDOMIZATION: selection of data that avoids sources of possible bias.
  • 40. STAT205 – Elementary Statistics for the Biological and Life Sciences 40 Random Sampling To choose a random sample: 1. assign each popl’n element a unique code (or set of codes); 2. from a random number table (Table 1, p. 670) or via computer, in a systematic manner select n random digits whose range corresponds to the codes assigned above; and 3. select every element if its code appears in step (2), ignoring repeated codes or those with no assignment.
  • 41. STAT205 – Elementary Statistics for the Biological and Life Sciences 41 Example 3.1 Ex. 3.1: Simple random sample of size n = 6 from population of 75 elements. 1. label each element 01, 02, …, 75 2. select random digits from a source such as Table 1, a TI-84, or R. 3. choose elements for the sample if they correspond to the selected random digits (ignore repeats and drop-outs) See Table 3.1 
  • 42. STAT205 – Elementary Statistics for the Biological and Life Sciences 42 Example 3.1 (cont’d)  The sample uses elements 23, 38, 59, 21, 08, 09
  • 43. STAT205 – Elementary Statistics for the Biological and Life Sciences 43 Probability  DEF’N: A PROBABILITY is the chance of some event, E, occurring in a specified manner. NOTATION: P{E}  We often view probabilities from a Relative Frequency Interpretation: P{E} = # ways E occurs # total events
  • 44. STAT205 – Elementary Statistics for the Biological and Life Sciences 44 Example 3.12 Ex. 3.12: Toss a fair coin twice. We know P{H} = 1/2 (see Ex. 3.8). What is P{HH}?  Consider all possible outcomes: HH, HT, TH, TT  If each outcome is equally likely, then P{HH} = # HH # all outcomes = 1 4
  • 45. STAT205 – Elementary Statistics for the Biological and Life Sciences 45 Probability Rules  Rule 1: 0 ≤ P{E} ≤ 1.  Rule 2: The entirety of events has probability = 1. That is, if E1, ..., Ek are all the possible events, ∑P{Ei} = 1. (here, E1, ..., Ek are disjoint!)  Rule 3: (The Complement Rule): If E c = {not E}, then P{E c } = 1 – P{E}.
  • 46. STAT205 – Elementary Statistics for the Biological and Life Sciences 46 Example 3.19  Ex. 3.19: U.S. Blood types: P{O} = 0.44 P{A} = 0.42 P{B} = 0.10 P{AB} = 0.04  Note: (1) all are between 0 and 1  and (2) P{O} + P{A} + P{B} + P{AB} = 0.44 + 0.42 + 0.10 + 0.04 = 1.00   So, e.g., P{Oc } = 1 – P{O} = 1 – 0.44 = 0.56
  • 47. STAT205 – Elementary Statistics for the Biological and Life Sciences 47 Probability (cont’d)  DEF’N: Two events, E1 and E2, are DISJOINT (a.k.a MUTUALLY EXCLUSIVE) if they cannot occur simultaneously.  DEF’N: The UNION of two events, E1 and E2, is the event that E1 or E2 (or both) occurs.  DEF’N: The INTERSECTION of two events, E1 and E2, is the event that E1 and E2 occurs.
  • 48. STAT205 – Elementary Statistics for the Biological and Life Sciences 48 Venn Diagrams  A useful graphic to conceptualize how events interrelate is the Venn Diagram.  For example, Fig. 3.8 shows a Venn Diagram with 2 intersecting events, E1 and E2:
  • 49. STAT205 – Elementary Statistics for the Biological and Life Sciences 49 Probability Rules (cont’d)  We often denote the entirety of events as the Sample Space, S. Conversely, the Null Space is  = S c  Rule 4: If E1 and E2 are disjoint, then P{E1 or E2} = P{E1} + P{E2}.  Rule 5: If E1 and E2 are any two events, then P{E1 or E2} = P{E1} + P{E2} – P{E1 and E2}.
  • 50. STAT205 – Elementary Statistics for the Biological and Life Sciences 50 Example 3.20 Ex. 3.20: Hair/Eye color of 1770 men. We have the following distribution of traits: So, e.g., P{Black Hair} = 500/1770, etc.
  • 51. STAT205 – Elementary Statistics for the Biological and Life Sciences 51 Example 3.20 (cont’d) Find P{Black Hair OR Red Hair}. Clearly, E1 = {Black Hair} and E2 = {Red Hair} are disjoint, so from Rule 4, P{Black Hair OR Red Hair} = P{Black Hair} + P{Red Hair} = 500/1770 + 70/1770 = 570/1770 = 0.32.
  • 52. STAT205 – Elementary Statistics for the Biological and Life Sciences 52 Example 3.20 (cont’d) Now, find P{Black Hair OR Blue Eyes}. Here, E1 = {Black Hair} and E3 = {Blue Eyes} are NOT disjoint, so apply Rule 5: P{Black Hair OR Blue Eyes} = P{Black Hair} + P{Blue Eyes} – P{Black Hair AND Blue Eyes} = 500/1770 + 1050/1770 – 200/1770 = 1350/1770 = 0.76.
  • 53. STAT205 – Elementary Statistics for the Biological and Life Sciences 53 Probability (cont’d)  DEF’N: Two events, E1 and E2, are INDEPENDENT if knowledge that E1 occurs does not affect P{E2} and vice versa. If two events are not independent, they are DEPENDENT.  DEF’N: A CONDITIONAL PROBABILITY is the probability that 1 event occurs, given that the other has already occurred. NOTATION: P{E1 | E2}.
  • 54. STAT205 – Elementary Statistics for the Biological and Life Sciences 54 Probability Rules (cont’d)  Rule 6: If E1 and E2 are independent, then P{E1 and E2} = P{E1}  P{E2}.  Rule 7: If E1 and E2 are any two events, then P{E1 and E2} = P{E1}  P{E2 | E1} = P{E2}  P{E1 | E2}.  Consequences: • if E1 and E2 are independent, then P{E1} = P{E1 | E2} and P{E2} = P{E2 | E1} • also, P{E2 | E1} = P{E1 and E2}/P{E1} if P{E1}≠0.
  • 55. STAT205 – Elementary Statistics for the Biological and Life Sciences 55 Examples 3.21–3.22 Exs. 3.21–3.22 (3.20, cont’d): Hair/Eye color of 1770 men. Refer back to Table 3.3. There, we saw P{Blue Eyes AND Black Hair} = 200/1770, while P{Black Hair} = 500/1770. So, P{Blue Eyes | Black Hair} = P{Blue Eyes AND Black Hair } P{Black Hair} = 200/1770 500/1770 = 200 500 = 0.40
  • 56. STAT205 – Elementary Statistics for the Biological and Life Sciences 56 Example 3.25 Ex. 3.25 (3.20, cont’d): Hair/Eye color of 1770 men. In Table 3.3, there is no evidence of indepen- dence between Hair & Eye color. So, e.g., P{Red Hair AND Brown Eyes} = P{Red Hair} P{Brown Eyes | Red Hair} which agrees with the display in Table 3.3. = 70 1770 20 70 = 20 1770
  • 57. Bayes’ rule  Bayes rule is a powerful identity for obtaining conditional probabilities:  P{A|B}=P{B|A}P{A} / P{B}.  Can get P{B}=P{B|A}P{A}+P{B|Ac}P{Ac}.  Useful in diagnostic screening applications.
  • 58. Diagnostic tests ▪ Say a test is positive or negative: T+ or T−. ▪ A subject has the disease or not: D+ or D−. ▪ P(D+) is the prevalence of the disease. ▪ P(T+|D+) is the sensitivity of the test. ▪ P(T−|D−) is the specificity of the test. ▪ P(D+|T+) and P(D−|T−) are the predictive values positive and negative, respectively.
  • 59. Screening for hepatitis C at an STD clinic ▪ (Weisbord, Trepka, Zhang, Smith, and Brewer, 2003). At an STD clinic in Miami, Florida, patients were screened for hepatitis C using CDC screening criteria in the form of a questionnaire. ▪ The study concluded P(T+|D+) = 0.61, P(T−|D−) = 0.91 and P(D+) = 0.047.
  • 60. Law of total probability for P(T+) ▪ P(T+) = P(T+|D+)P(D+) + P(T+|D−)P(D−) = P(T+|D+)P(D+) + [1 − P(T−|D−)][1 − P(D+)] = 0.61 × 0.047 + (1 − 0.91)(1 − 0.047) = 0.114. ▪ Say the CDC criteria tell me I’m at risk for hepatitis C, i.e., my questionnaire yields T+. ▪ What is the probability that I really have it?
  • 61. To C or not to C? ▪ P(D+|T+) = P(T+|D+)P(D+)/P(T+) = 0.61 × 0.047 / 0.114 = 0.25. ▪ There’s still only a 1 in 4 chance I’ve got hepatitis C. But this is much larger than P(D+) = 0.047, the probability before knowing T+. ▪ Better get a blood test.
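The hepatitis C screening calculation combines the law of total probability with Bayes’ rule. The sketch below reproduces it from the three quantities reported on the slides; the function name `positive_predictive_value` is an illustrative choice, not something taken from the course or the cited study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return (P(D+|T+), P(T+)) via the law of total probability and Bayes' rule."""
    # P(T+) = P(T+|D+)P(D+) + [1 - P(T-|D-)][1 - P(D+)]
    p_t_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # P(D+|T+) = P(T+|D+)P(D+) / P(T+)
    ppv = sensitivity * prevalence / p_t_pos
    return ppv, p_t_pos

# Values quoted on the slides for the Miami STD clinic study.
ppv, p_t_pos = positive_predictive_value(
    sensitivity=0.61, specificity=0.91, prevalence=0.047
)

print(round(p_t_pos, 3))  # 0.114
print(round(ppv, 2))      # 0.25
```

Trying other prevalence values in the same function shows how strongly the predictive value positive depends on how common the disease is in the screened population.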
  • 62. Density Curves ▪ DEF’N: A RANDOM VARIABLE is a measured outcome of some random process. ▪ When a random variable is discrete, it is usually straightforward to interpret probabilities associated with it. ▪ For instance, if Y = {# leaves on tree}: • P{Y = 122} = 0.42 is interpretable • P{Y = 18} = 0.02 is interpretable • but P{Y = 120.472} is not interpretable.
  • 63. Probability Histogram A probability histogram is used to visualize discrete probability masses. [Figure: probability histogram of P{Y = k} against k = 1, …, 9; vertical axis from 0 to 0.5.] Notice: each “mass” has area = probability, and the areas of all masses sum to 1.
  • 64. Continuous Random Variables ▪ By contrast, a continuous random variable has a different probability interpretation. ▪ Extending the probability histogram to the continuous case, we say Y has a PROBABILITY DENSITY CURVE, where area still represents probability.
  • 65. Continuous Random Variables Consequences of the continuous probability model: • P{Y = a} = 0 = P{Y = b} (area of a line is zero) • So, P{Y ≤ a} = P{Y < a} + P{Y = a} = P{Y < a} • And for that matter: P{a ≤ Y ≤ b} = P{a < Y ≤ b} = P{a ≤ Y < b} = P{a < Y < b} (all if Y is continuous).
  • 66. Example 3.30 Ex. 3.30: Y = diameter (in.) of tree trunk. • Suppose the density has the form given in Fig. 3.13. • Then, for example, P{Y > 8} = P{8 < Y ≤ 10} + P{Y > 10} = 0.12 + 0.07 = 0.19.
  • 67. Mean and Expected Value ▪ DEF’N: If Y is a discrete random variable, its POPULATION MEAN is given by $\mu_Y = \sum_i y_i P\{Y = y_i\}$ (where the sum is taken over all possible $y_i$’s). ▪ More generally, the EXPECTED VALUE of Y is $E(Y) = \sum_i y_i P\{Y = y_i\}$.
  • 68. Example 3.35 Ex. 3.35: Y = # tail vertebrae in fish. From Table 3.4 we find
      yi           20    21    22    23
      P{Y = yi}   .03   .51   .40   .06
    So, E(Y) = ∑ yi P{Y = yi} = (20)(.03) + (21)(.51) + (22)(.40) + (23)(.06) = … = 21.49.
  • 69. Variance ▪ DEF’N: If Y is a discrete random variable, its POPULATION VARIANCE is given by $\sigma_Y^2 = \sum_i (y_i - \mu_Y)^2 P\{Y = y_i\}$. One can show this is also $\sigma_Y^2 = E(Y^2) - \{E(Y)\}^2 = E(Y^2) - \mu_Y^2$. ▪ From this, the POPULATION STANDARD DEVIATION of Y is $\sigma_Y = (\sigma_Y^2)^{1/2}$.
  • 70. Example 3.37 Ex. 3.37: (3.35, cont’d). From Table 3.4 we were given the values of P{Y = yi}. Recall $\mu_Y = 21.49$. So, $\sigma_Y^2 = \sum_i (y_i - \mu_Y)^2 P\{Y = y_i\} = (20 - 21.49)^2(.03) + (21 - 21.49)^2(.51) + (22 - 21.49)^2(.40) + (23 - 21.49)^2(.06) = \dots = 0.4299$.
  • 71. Example 3.37 (cont’d) So $\sigma_Y^2 = 0.4299$. But it’s a lot easier to use $\sigma_Y^2 = E(Y^2) - \mu_Y^2 = \{(20)^2(.03) + (21)^2(.51) + (22)^2(.40) + (23)^2(.06)\} - (21.49)^2 = 462.25 - 461.8201 = 0.4299$.
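Examples 3.35 and 3.37 can be reproduced in a few lines. This sketch takes the distribution of Y (number of tail vertebrae) from the slides and computes the variance both ways; the variable names are illustrative.

```python
# Distribution of Y = number of tail vertebrae, as quoted from Table 3.4.
values = [20, 21, 22, 23]
probs = [0.03, 0.51, 0.40, 0.06]

# A valid probability distribution must sum to 1.
assert abs(sum(probs) - 1.0) < 1e-12

# Population mean: sum of y_i * P{Y = y_i}.
mean = sum(y * p for y, p in zip(values, probs))

# Variance from the definition: sum of (y_i - mean)^2 * P{Y = y_i}.
var_definition = sum((y - mean) ** 2 * p for y, p in zip(values, probs))

# Variance from the shortcut: E(Y^2) - mean^2.
e_y_squared = sum(y ** 2 * p for y, p in zip(values, probs))
var_shortcut = e_y_squared - mean ** 2

print(round(mean, 2))            # 21.49
print(round(var_definition, 4))  # 0.4299
print(round(var_shortcut, 4))    # 0.4299
```

The shortcut subtracts two large, nearly equal numbers (462.25 and 461.8201), so when working by hand it is worth keeping all the decimal places until the final step.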
  • 72. Rules of Expected Value ▪ E(·) is a mathematical operator (a “linear operator”). ▪ It has certain general properties: • Rule E1: E(aX + bY) = aE(X) + bE(Y) = aµX + bµY • Rule E2: E(a + bY) = a + bE(Y) = a + bµY
  • 73. Rules of Variance The special variance operator also has certain general properties: • Rule E3: If X and Y are independent, then $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$. • Rule E4: If X and Y are independent, then $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$. • General rule: If X and Y are independent, then $\sigma_{aX+bY}^2 = a^2\sigma_X^2 + b^2\sigma_Y^2$.
  • 74. Example 3.41 Ex. 3.41: X = mass of cylinder from 1st balance. Y = mass of cylinder from 2nd balance. Suppose $\sigma_X = 0.03$ and $\sigma_Y = 0.04$. Then, if we calculate the difference between the two weighings, X – Y, and the weighings are independent, Rule E4 gives $\sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2} = \sqrt{0.03^2 + 0.04^2} = \sqrt{0.0009 + 0.0016} = \sqrt{0.0025} = 0.05$.
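The general variance rule can be wrapped in a small helper to show that variances add while standard deviations do not. This is a sketch under the independence assumption stated in the rules; the function name `sd_linear_combination` is hypothetical.

```python
from math import sqrt

def sd_linear_combination(a, b, sd_x, sd_y):
    """SD of aX + bY for INDEPENDENT X and Y: sqrt(a^2 var_X + b^2 var_Y)."""
    return sqrt(a ** 2 * sd_x ** 2 + b ** 2 * sd_y ** 2)

# Example 3.41: difference X - Y corresponds to a = 1, b = -1.
sd_difference = sd_linear_combination(1, -1, 0.03, 0.04)

# Rule E3: the sum X + Y has the same SD as the difference.
sd_sum = sd_linear_combination(1, 1, 0.03, 0.04)

print(round(sd_difference, 4))  # 0.05
print(round(sd_sum, 4))         # 0.05
```

Note that the answer is 0.05, not 0.03 + 0.04 = 0.07 and not 0.04 – 0.03 = 0.01: the squares of the standard deviations are what combine.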
  • 75. Independent Trials ▪ DEF’N: The INDEPENDENT TRIALS MODEL occurs when (i) n independent trials are studied (ii) each trial results in a single binary obsv’n (iii) each trial’s success has (constant) probability: P{success} = p. Notice that if P{success} = p, P{failure} = 1 – p. ▪ We call this a BInS (Binary / Indep. / n is const. / Same p) setting.
  • 76. Example 3.43 Ex 3.43: Suppose 39% of organisms in a popl’n exhibit a mutant trait. Sample n = 5 organisms randomly and check for mutation: • Binary? ✓ (mutant vs. non-mutant) • Indep.? ✓ (if no bias in sampling) • n const.? ✓ (n = 5) • Same p? ✓ (p = 0.39)
  • 77. Binomial Distribution ▪ DEF’N: In a BInS setting, if we let Y = {# successes} then Y has a BINOMIAL DISTRIBUTION. ▪ NOTATION: Y ~ Bin(n,p). ▪ The binomial probability function is $P\{Y = j\} = {}_nC_j\, p^j (1 - p)^{n-j}$ (j = 0, 1, …, n).
  • 78. Binomial Coefficient ▪ In the binomial probability function $P\{Y = j\} = {}_nC_j\, p^j (1 - p)^{n-j}$ the BINOMIAL COEFFICIENT is ${}_nC_j = \dfrac{n!}{j!\,(n-j)!}$. ▪ Also, j! is the FACTORIAL OPERATOR: j! = j(j–1)(j–2)…(2)(1) ▪ We define 0! = 1.
  • 79. Factorial Operator Example of factorial operator: at n = 5, 5! = (5)(4)(3)(2)(1) = 120, 4! = (4)(3)(2)(1) = 24, 3! = (3)(2)(1) = 6, 2! = (2)(1) = 2. So:
      j      0    1    2    3    4    5
      nCj    1    5   10   10    5    1
    (Also see Table 3.6.) Values of nCj are given in Table 2 (p. 674).
  • 80. Table 3.6
  • 81. Example 3.45 Ex 3.45 (Ex. 3.43 cont’d): Y ~ Bin(5, 0.39); So $P\{Y = 3\} = {}_5C_3 (.39)^3 (.61)^2 = (10)(.0593)(.3721) = 0.22$. Can also find this via TI-84 or R. Table 3.7 gives the full distribution. Figure 3.15 gives a probability histogram.
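The slides mention the TI-84 and R for this calculation; as an alternative illustration, here is a minimal Python sketch of the binomial probability function using the standard library’s `math.comb` (Python 3.8 or later). The helper name `binomial_pmf` is an assumption made for this example.

```python
from math import comb

def binomial_pmf(j, n, p):
    """P{Y = j} for Y ~ Bin(n, p): nCj * p^j * (1 - p)^(n - j)."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)

n, p = 5, 0.39  # Example 3.43/3.45: five organisms, 39% mutant

# Binomial coefficients nCj for j = 0, ..., 5.
coefficients = [comb(n, j) for j in range(n + 1)]
print(coefficients)  # [1, 5, 10, 10, 5, 1]

# Full distribution of Y; the probabilities sum to 1.
pmf = [binomial_pmf(j, n, p) for j in range(n + 1)]
print(round(pmf[3], 2))  # 0.22
```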
  • 82. Binomial Mean & Variance ▪ If Y ~ Bin(n,p), the population mean and variance are: $\mu_Y = np$ and $\sigma_Y^2 = np(1 - p)$. ▪ Ex. 3.49: Y = {# Rh+ in BInS sample}. We’re given p = P{Rh+} = 0.85. So, if n = 6, we expect $\mu_Y = (6)(0.85) = 5.1$ Rh+ in the sample, with $\sigma_Y^2 = (6)(.85)(.15) = 0.765$, so that $\sigma_Y = \sqrt{.765} = 0.87$.
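The formulas np and np(1 – p) are special cases of the general discrete mean and variance definitions. The sketch below checks this for Example 3.49 by summing over the binomial probabilities directly; the helper `binomial_pmf` is again an illustrative name.

```python
from math import comb, sqrt

def binomial_pmf(j, n, p):
    """P{Y = j} for Y ~ Bin(n, p)."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)

n, p = 6, 0.85  # Example 3.49: six subjects, P{Rh+} = 0.85
pmf = [binomial_pmf(j, n, p) for j in range(n + 1)]

# Mean and variance from the general definitions for a discrete Y.
mean = sum(j * pmf[j] for j in range(n + 1))
variance = sum((j - mean) ** 2 * pmf[j] for j in range(n + 1))

# Closed-form binomial results for comparison.
mean_formula = n * p
variance_formula = n * p * (1 - p)

print(round(mean, 2), round(mean_formula, 2))          # 5.1 5.1
print(round(variance, 3), round(variance_formula, 3))  # 0.765 0.765
print(round(sqrt(variance), 2))                        # 0.87
```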
  • 83. Chapter 4: The Normal Distribution Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
  • 84. Normal Distribution ▪ DEF’N: A continuous random variable Y has a NORMAL DISTRIBUTION if its probability density can be written as $f(y) = \dfrac{1}{\sigma_Y\sqrt{2\pi}}\, e^{-(y-\mu_Y)^2/(2\sigma_Y^2)}$ over –∞ < y < ∞. ▪ NOTATION: $Y \sim N(\mu_Y, \sigma_Y^2)$ ▪ The mean and variance of a normal dist’n are $E(Y) = \mu_Y$ and $E[(Y - \mu_Y)^2] = \sigma_Y^2$.
  • 85. Normal Dist’n Examples ▪ The Normal distribution appears in many biological contexts: • Ex. 4.1: Y = serum cholesterol (mg/dLi) • Ex. 4.2: Y = eggshell thickness (mm) • Ex. 4.3: Y = nerve cell interspike times (ms)
  • 86. Normal Curve The Normal density curve is (i) continuous over –∞ < y < ∞ (ii) symmetric about y = µ (iii) unimodal, and hence “bell-shaped”
  • 87. Figure 4.7 Since each (µ, σ²) pair indexes a different Normal dist’n, this represents a rich family of curves.
  • 88. Standard Normal ▪ DEF’N: The STANDARDIZATION FORMULA for Y ~ N(µ, σ²) is Z = (Y – µ)/σ. This is often called a ‘Z-score’. ▪ If Y ~ N(µ, σ²), then Z ~ N(0,1) and we say Z has a STANDARD NORMAL dist’n. ▪ Std. Normal probab’s are tabulated in Table 3 (p. 675) and on text’s inside front cover.
  • 89. (Portion of) Table 3, p. 675
  • 90. P(Z ≤ z) Example: (p. 124) Suppose Z ~ N(0,1). Find P{Z ≤ 1.53}. In Table 3, read across the row for z = 1.5 to the column for 0.03: the entry is 0.9370, so P{Z ≤ 1.53} = 0.9370. Hint: “always draw the picture”.
  • 91. P(a < Z ≤ b) ▪ If Z ~ N(0,1), and we find P{Z ≤ 1.53} = 0.937, notice then that P{Z > 1.53} = 1 – 0.937 = 0.063. ▪ Example: (p. 125) Suppose Z ~ N(0,1); then P{–1.20 < Z ≤ 0.80} = P{Z ≤ 0.80} – P{Z ≤ –1.20} = 0.7881 – 0.1151 = 0.6730. (See Fig. 4.11) ▪ Can also find Std. Normal probabilities using TI-84 or R!
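Besides the TI-84 and R mentioned on the slide, the table lookups can be reproduced with Python’s standard library class `statistics.NormalDist` (Python 3.8 or later). This is offered as an illustrative alternative, not as the course’s method.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Lower-tail area, as read from Table 3.
p_le_153 = Z.cdf(1.53)

# Upper-tail area by the complement rule.
p_gt_153 = 1 - p_le_153

# Area between two points: difference of two lower-tail areas.
p_between = Z.cdf(0.80) - Z.cdf(-1.20)

print(round(p_le_153, 3))   # 0.937
print(round(p_gt_153, 3))   # 0.063
print(round(p_between, 3))  # 0.673
```

The software works with unrounded table entries, so the fourth decimal place can differ slightly from a hand calculation that subtracts two rounded table values.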
  • 92. Empirical Rule, revisited ▪ If Z ~ N(0,1), it mimics the empirical rule very closely. ▪ The same effect holds for any Y ~ N(µ, σ²).
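The areas within 1, 2, and 3 standard deviations of the mean can be computed directly to compare against the empirical rule’s approximate 68%, 95%, and more than 99%. A minimal sketch using `statistics.NormalDist`; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P{-k < Z < k} for k = 1, 2, 3 standard deviations.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(k, round(prob, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because any Y ~ N(µ, σ²) standardizes to Z, the same three areas apply to the intervals µ ± σ, µ ± 2σ, and µ ± 3σ.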
  • 93. Example 4.5 Ex. 4.5: Y = length of herrings (mm). Suppose Y ~ N(54, 20.25). Then we know $Z = \dfrac{Y - 54}{\sqrt{20.25}} = \dfrac{Y - 54}{4.5} \sim N(0,1)$. (a) What % of fish are less than 60 mm long? $P[Y < 60] = P\left[\dfrac{Y - 54}{4.5} < \dfrac{60 - 54}{4.5}\right] = P\left[Z < \dfrac{6}{4.5}\right] = P[Z < 1.33] = 0.9082$.
  • 94. Example 4.5 (cont’d) Y = length of herrings ~ N(54, 20.25). (c) What % of fish are between 51 and 60 mm long? $P[51 < Y < 60] = P\left[\dfrac{51 - 54}{4.5} < \dfrac{Y - 54}{4.5} < \dfrac{60 - 54}{4.5}\right] = P\left[\dfrac{-3}{4.5} < Z < \dfrac{6}{4.5}\right] = P[-.67 < Z < 1.33] = P[Z \le 1.33] - P[Z < -.67] = 0.9082 - 0.2514 = 0.6568$.
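Example 4.5 can also be computed without standardizing by hand. In this sketch, `NormalDist` takes the standard deviation (4.5), not the variance (20.25); the results differ from the slide values in the third or fourth decimal place because the table method rounds z to two decimals (1.33 and –0.67).

```python
from math import sqrt
from statistics import NormalDist

# Y = herring length (mm) ~ N(54, 20.25); sigma is the square root of the variance.
herring = NormalDist(mu=54, sigma=sqrt(20.25))

# (a) Proportion of fish shorter than 60 mm.
p_less_60 = herring.cdf(60)

# (c) Proportion of fish between 51 and 60 mm.
p_51_to_60 = herring.cdf(60) - herring.cdf(51)

# The same answer via the standardization formula Z = (Y - mu) / sigma.
Z = NormalDist()
p_less_60_via_z = Z.cdf((60 - 54) / 4.5)

print(round(p_less_60, 2))   # 0.91
print(round(p_51_to_60, 2))  # 0.66
```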
  • 95. Std. Normal Tail Areas ▪ We can also INVERT the std. Normal table (Table 3): ▪ Z ~ N(0,1), so find P{Z < 1.96} = 0.975. Then we know P{Z > 1.96} = 1 – 0.975 = 0.025. ▪ So, 2.5% of std. normal popl’n exceeds 1.96.
  • 96. $z_\alpha$ More generally, if we find some number $z_\alpha$ such that $P\{Z \le z_\alpha\} = 1 - \alpha$, we know $P\{Z > z_\alpha\} = \alpha$ and vice versa.
  • 97. Std. Normal Critical Point ▪ DEF’N: The UPPER-$\alpha$ CRITICAL POINT from Z ~ N(0,1) is the value $z_\alpha$ such that $P\{Z > z_\alpha\} = \alpha$. ▪ Find $z_\alpha$ by: • carefully inverting Table 3 • reading off the bottom row (df = ∞) of Table 4 (p. 677) • using TI-84 Normal dist’n calculator or R
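Inverting the table corresponds to evaluating the inverse of the cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` from Python’s standard library; the wrapper name `upper_critical_point` is hypothetical.

```python
from statistics import NormalDist

def upper_critical_point(alpha):
    """Return z_alpha such that P{Z > z_alpha} = alpha for Z ~ N(0, 1)."""
    # P{Z <= z_alpha} = 1 - alpha, so invert the lower-tail area 1 - alpha.
    return NormalDist().inv_cdf(1 - alpha)

z_025 = upper_critical_point(0.025)
print(round(z_025, 2))  # 1.96

# Check: the upper-tail area beyond z_025 is 0.025.
tail_area = 1 - NormalDist().cdf(z_025)
print(round(tail_area, 3))  # 0.025
```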
  • 98. Percentiles ▪ DEF’N: The point of a distribution below which p% lies is the pth PERCENTILE of the dist’n. ▪ If Z ~ N(0,1), $z_\alpha$ is the 100(1 – $\alpha$)th percentile of Z. ▪ We often ask what value is the pth percentile of a biological population (see Ex. 4.6).
  • 99. Example 4.6
  • 100. Example 4.6 (cont’d) ▪ We want to find y* such that P{Y < y*} = 0.70. This is $P\left[\dfrac{Y - 54}{4.5} < \dfrac{y^* - 54}{4.5}\right] = P\left[Z < \dfrac{y^* - 54}{4.5}\right]$. ▪ Now, from Table 3 we find P{Z < 0.52} = 0.6985 is close to 0.70. This tells us to equate (approximately) 0.52 and (y* – 54)/4.5 ⇒ y* – 54 ≈ (0.52)(4.5) ⇒ y* ≈ (0.52)(4.5) + 54 = 56.34.
  • 101. Example 4.6 (conclusion) So, we find that approximately 70% (more precisely, 69.85%) of herring are less than 56.34 mm long. Notice also that we derived the critical point $z_{0.30} \approx 0.52$. (More precisely, we found $z_{0.3015} = 0.52$.) Using TI-84, we can find $z_{0.30} = 0.5244$: this yields the more accurate value y* = (0.5244)(4.5) + 54 = 56.36 for Example 4.6.
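The 70th percentile in Example 4.6 can be found in one step with the inverse CDF, or in two steps by first finding the standard normal critical point and then un-standardizing. A minimal sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 54, 4.5  # herring length (mm): Y ~ N(54, 20.25)

# One step: the 70th percentile of Y directly.
y_star = NormalDist(mu=mu, sigma=sigma).inv_cdf(0.70)

# Two steps: z_0.30 is the 70th percentile of Z, then y* = z * sigma + mu.
z_030 = NormalDist().inv_cdf(0.70)
y_star_via_z = z_030 * sigma + mu

print(round(z_030, 4))   # 0.5244
print(round(y_star, 2))  # 56.36
```

The two-step route mirrors the table method on the slides, with the un-standardizing step y* = zσ + µ being the standardization formula solved for Y.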