2. In this lecture we will discuss the different methods to measure central tendency and dispersion in a statistical sample.
3. Central tendency is just a technical way of saying, what’s typical of this sample? For example, out of all Carlow students, which gender is the more typical one? Male or female? Out of all the products listed on Amazon, which is the best seller? And out of all the eBay listings of “Tickle Me Elmo,” which price is the most common one?
4. The three measures of central tendency (mean, median, and mode) are discussed in detail by Khan Academy. Here are some brief summaries. We will also discuss the normal distribution. One key idea is this: if the sample is normally distributed, meaning it looks like a symmetrical bell curve, then the mean, median, and mode will be approximately the same number. However, if the sample is skewed either to the left or to the right, then these three numbers will take on different values.
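To make the key idea concrete, here is a minimal Python sketch that compares the three measures on two small samples. Both samples are made up for illustration (they are not from the slides): one is roughly symmetric, the other has a long tail on the right.

```python
from statistics import mean, median, mode

# Hypothetical samples, invented for illustration only.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]      # bell-shaped, centered on 3
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]  # long tail on the right

def summarize(sample):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(sample), median(sample), mode(sample)

sym_summary = summarize(symmetric)
skew_summary = summarize(right_skewed)

print("symmetric    (mean, median, mode):", sym_summary)
print("right-skewed (mean, median, mode):", skew_summary)
```

In the symmetric sample all three measures land on the same value. In the skewed sample the few large values pull the mean above the median, and the median sits above the mode.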
5. Concepts like “mean” and “standard deviation” are really based on the theory of the normal curve. Note it’s a theory, a conceptualization of how data should be distributed in an ideal world. In reality, distributions are often not perfectly normal. The next slide is an example. Note that in a normal distribution, the “mean” sits at the 50th percentile.
6. Look at this distribution of salary data. It’s heavy on the left side, with a long skinny tail on the right. Definitely not symmetrical.
7. When we superimpose the normal curve on top of the salary distribution, we see that the normal curve only captures the right tail well. For the left tail, the normal curve doesn’t describe the actual distribution very well. This is because the salary data is positively skewed. In skewed data, “mode” and “median” describe the central tendency better than the “mean”.
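A quick way to see why the mean is a poor summary of positively skewed data is to try it on a small example. The salaries below are invented for illustration and are not the data from the slide; the sketch only assumes one very high earner in an otherwise modest group.

```python
from statistics import mean, median

# Hypothetical annual salaries in dollars, invented for illustration only.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 45_000, 250_000]

avg = mean(salaries)
mid = median(salaries)

# Count how many people earn less than the mean.
below_mean = sum(1 for s in salaries if s < avg)

print(f"mean:   {avg:,.0f}")
print(f"median: {mid:,.0f}")
print(f"{below_mean} of {len(salaries)} salaries are below the mean")
```

The single large salary in the right tail drags the mean well above what most people in the sample earn, while the median stays with the bulk of the data.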
8. In addition to central tendency, we also need a way to describe how spread out the distribution is, and how weird a case is (relative to the mean). When a case is very close to the mean, we have an average joe. When a case is far off from the mean, on the tip of a long tail, we have a weirdo!
In real life, we often discuss dispersion without realizing it. For example: In which percentile is my child’s height? How many people in this class will get an A? Is the customer’s credit score above or below average? By how much? Is a donation of $30,000 pretty common or very rare? How rare is it?
This slide illustrates the distribution of total purchase after a customer clicks on a link. Look at the data, the mean, the distribution, and reflect on the following questions:
How likely is an average customer to spend $200 per order? Very unlikely: it’s at the end of the curve, in a tail.
How about $35? Much more likely: it’s the average order.
In what percentile is a $67 order? The 84th. We know because it’s one standard deviation above the mean: 50% of orders fall below the mean, and another 34% fall within one standard deviation above it.
The next slide explains what a standard deviation is.
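The percentile reasoning for the purchase example can be checked with Python's standard library. This is only a sketch under two assumptions that are not stated outright on the slide: that order totals are approximately normal, and that the standard deviation is $32 (inferred from $67 being described as one standard deviation above the $35 mean).

```python
from statistics import NormalDist

# Assumed model: mean $35 (from the slide), standard deviation $32 (inferred).
orders = NormalDist(mu=35, sigma=32)

# Percentile of a $67 order: the share of orders at or below $67.
pct_67 = orders.cdf(67) * 100

# Share of orders expected to be $200 or more under this model.
share_above_200 = 1 - orders.cdf(200)

print(f"A $67 order is at about the {pct_67:.0f}th percentile")
print(f"Share of orders above $200: {share_above_200:.7f}")
```

Under these assumptions the $67 order comes out at roughly the 84th percentile, and a $200 order is so far out in the tail that it is expected almost never.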
9. Standard deviation is a standardized measure of dispersion. It tells you whether the distribution is short and fat (with a big standard deviation) or tall and skinny (with a small standard deviation). The calculation is explained well by Khan Academy (see the Khan Academy video clips linked in this session). The basic idea to take away is: the standard deviation tells you, roughly, how far away the data points are from the mean on average.
For example, let’s say that the Steelers have an average score of 25 per game, and the standard deviation is 1. Let’s also say that the Green Bay Packers have an average score of 25 per game, and a standard deviation of 7. In this example, both teams are comparable in terms of average scores, but the Steelers have a much smaller standard deviation. This means the Steelers’ performance is pretty consistent over time: their scores may be above or below 25, but typically only by a point or two. If you plot their scores on a chart, you would see that most of them pack around 25, with a nice narrow distribution that peaks around 25.
In contrast, the Packers may average around 25, but their performance varies widely from game to game. One day they may score 18 (25-7) and the next day they may score 32 (25+7). If you plot their widely varied scores on a chart, you would get a short and fat distribution. (Go Steelers Go!)
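To see how a standard deviation is actually computed, here is a small Python sketch. The per-game scores are invented so that both teams average 25; they are not real results, and the resulting standard deviations only approximate the 1 and 7 used in the example above.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-game scores, invented for illustration only.
steelers = [24, 26, 25, 24, 26, 25]
packers = [18, 32, 25, 15, 35, 25]

def population_sd(scores):
    """Square root of the average squared distance from the mean."""
    m = mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

for name, scores in [("Steelers", steelers), ("Packers", packers)]:
    # statistics.pstdev is the library version of the same calculation.
    print(f"{name}: mean = {mean(scores)}, "
          f"sd = {population_sd(scores):.2f}, pstdev = {pstdev(scores):.2f}")
```

Both teams have the same mean, but the Packers' standard deviation is several times larger, which is the "short and fat" versus "tall and skinny" contrast in numbers. The sketch uses the population formula (dividing by n); `statistics.stdev` divides by n - 1 instead and gives a slightly larger value for small samples.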
10. What are practical ways to use the standard deviation? With a normal distribution, the mean divides it evenly in the middle. The portion below the mean covers 50% of the population, and the portion above the mean covers the other 50%. The first standard deviation away from the mean, on either side, covers about 34% of the distribution. In other words, 1 standard deviation above the mean = 50% + 34% = 84% = 84th percentile.
Let’s say that the average weight for a one-year-old is 25 lbs, with a standard deviation of 2 lbs.
Connor is 23 lbs. That’s 1 standard deviation below the mean. In other words, he is at 50% - 34%, or in the 16th percentile of the population.
Nardia is 27 lbs. That’s 1 standard deviation above the mean. In other words, she is at 50% + 34%, or in the 84th percentile of the population.
Nearly the entire distribution (about 99.7%) is covered by roughly 6 standard deviations: 3 above the mean and 3 below the mean. The name of the quality management program “Six Sigma” also refers to standard deviations (sigma is the usual symbol for the standard deviation).
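The percentages in the weight example can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch assumes that one-year-old weights follow a normal distribution with the mean (25 lbs) and standard deviation (2 lbs) given above; the variable names are just for illustration.

```python
from statistics import NormalDist

# Assumed model: normally distributed weights, mean 25 lbs, sd 2 lbs.
weights = NormalDist(mu=25, sigma=2)

# Percentile = share of the population at or below a given weight.
connor_pct = weights.cdf(23) * 100   # 1 standard deviation below the mean
nardia_pct = weights.cdf(27) * 100   # 1 standard deviation above the mean

# Share of the population within 1 and within 3 standard deviations.
within_1_sd = weights.cdf(27) - weights.cdf(23)
within_3_sd = weights.cdf(31) - weights.cdf(19)

print(f"Connor: about the {connor_pct:.0f}th percentile")
print(f"Nardia: about the {nardia_pct:.0f}th percentile")
print(f"Within 1 sd: {within_1_sd:.1%}, within 3 sd: {within_3_sd:.1%}")
```

The 34% figure is a rounding of about 34.1% on each side of the mean, so roughly 68% of the population falls within one standard deviation and about 99.7% within three.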
11. More examples: Given a mean and a standard deviation, you have a pretty good idea of what the distribution is like: is it fat and short, or tall and skinny? We can then map out individual scores on the distribution and tell the average joes from the weirdos!
12. The z score is the number of standard deviations from the mean. With our previous example, Connor would have a z score of negative 1 (that is, 1 standard deviation below the mean), while Nardia has a z score of 1 (that is, 1 standard deviation above the mean).
The average joes would have z scores close to zero (e.g., 0.0006, -0.0029), whereas the weirdos have z scores far from zero in either direction (e.g., 3.07, -2.99).
Again: the z score is the number of standard deviations that a data point is away from the mean. Let's say that the average weight for all American women is 150 lbs, and the standard deviation is 20 lbs. If your weight is 130, then your z score is -1, because you're exactly 1 standard deviation below the mean. If Peggy's weight is 170, then her z score is 1, because she is exactly 1 standard deviation above the mean.
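The z score calculation is simple enough to write as a one-line function: subtract the mean, then divide by the standard deviation. The function name `z_score` below is just an illustrative choice, and the inputs are the numbers from the examples above.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

# Weight example: mean 150 lbs, standard deviation 20 lbs.
you = z_score(130, mean=150, sd=20)
peggy = z_score(170, mean=150, sd=20)

# One-year-old example: mean 25 lbs, standard deviation 2 lbs.
connor = z_score(23, mean=25, sd=2)
nardia = z_score(27, mean=25, sd=2)

print(you, peggy, connor, nardia)
```

Because the z score is expressed in standard deviations rather than pounds, Connor and the 130 lb adult end up with the same z score even though their weights are on very different scales.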
13. Questions? Schedule a chat/phone meeting with the instructor for more assistance.