Introduction to Descriptive Statistics-Central Tendency & Dispersion-FA2013

This lecture presentation complements Khan’s tutorials.

1

In this lecture we will discuss the different methods to measure central tendency and dispersion in a statistical sample.

2

Central tendency is just a technical way of asking: what’s typical of this sample? For example, out of all Carlow students, which gender is the more typical one? Male or female? Out of all the products listed on Amazon, which is the best seller? And out of all the eBay listings of “Tickle Me Elmo,” which price is the most common one?

3

These three measures (mean, median, and mode) are discussed in detail by Khan Academy. Here are some brief summaries.

We will discuss the normal distribution.

One key idea is this:

If the sample is normally distributed, meaning it looks like a symmetrical bell curve, then the mean, median, and mode will be the same number.

However, if the sample is skewed either to the left or to the right, then these three numbers will generally take on different values.
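To make the point concrete, here is a minimal sketch in Python using the standard library's statistics module. Both samples are invented for illustration: the first is a mirror image around its middle value, and the second has a long tail on the right.

```python
from statistics import mean, median, mode

# Hypothetical samples, invented for illustration only.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]        # mirror image around 3
right_skewed = [1, 2, 2, 2, 3, 3, 4, 10, 18]   # long tail on the right


def summarize(sample):
    """Return the three measures of central tendency as (mean, median, mode)."""
    return mean(sample), median(sample), mode(sample)


for label, sample in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(label, summarize(sample))
```

In the symmetric sample all three measures land on 3. In the right-skewed sample the two large values pull the mean up to 5, while the median stays at 3 and the mode at 2.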

4

Concepts like “mean” and “standard deviation” are really based on the theory of the normal curve.

Note it’s a theory, a conceptualization of how data should be distributed in an ideal world.

In reality, distributions are often not perfectly normal.

The next slide is an example.

Note that in a normal distribution the “mean” = the 50th percentile.

5

Look at this distribution of salary data.

It’s heavy on the left side, with a long skinny tail on the right.

Definitely not symmetrical.

6

When we superimpose the normal curve on top of the salary distribution, we see that the normal curve only captures the right tail well.

For the left tail, the normal curve doesn’t describe the actual distribution very well.

This is because the salary data is positively skewed.

In skewed data, the “mode” and “median” describe the central tendency better than the “mean”.

7

In addition to central tendency, we also need a way to describe how spread out the distribution is, and how weird a case is (relative to the mean).

When a case is very close to the mean, we have an average joe.

When a case is far off from the mean, on the tip of a long tail, we have a weirdo!

In real life, we often discuss dispersion without realizing it. For example:

In which percentile is my child’s height?

How many people in this class will get an A?

Is the customer’s credit score above or below average? By how much?

Is a donation of $30,000 pretty common or very rare? How rare is it?

This slide illustrates the distribution of total purchase after a customer clicks on a link.

Look at the data, the mean, the distribution, and reflect on the following questions:

How likely is it that an average customer would spend $200 per order?

Very unlikely: it’s at the end of the curve, in a tail.

How about $35?

Much more likely: it’s the average order.

In what percentile is a $67 order?

The 84th. We know because it’s one standard deviation (34%) above the mean (50%).

The next slide explains what a standard deviation is.

8

Standard deviation is a standardized measure of dispersion.

It tells you whether the distribution is short and fat (with a big standard deviation) or tall and skinny (with a small standard deviation).

The calculation is explained well by Khan (see the Khan Academy video clips linked in this session).

The basic idea to take away is:

The standard deviation tells you, roughly, how far away the data points are from the mean on average.

For example, let’s say that the Steelers have an average score of 25 per game, and the standard deviation is 1. Let’s also say that the Green Bay Packers have an average score of 25 per game, and a standard deviation of 7.

In this example, both teams are comparable in terms of average scores, but the Steelers have a much smaller standard deviation. This means the Steelers’ performance is pretty consistent over time: their scores may be above or below 25, but only by 1-2 points on average. If you plot their scores on a chart, you would see that most of them pack around 25, giving a nice narrow distribution that peaks around 25.

In contrast, the Packers may average around 25, but their performance varies widely from game to game. One day they may score 18 (25-7) and the next day they may score 32 (25+7). If you plot their widely varied scores on a chart, you would get a short and fat distribution.

(Go Steelers Go!)
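The calculation itself can be sketched in a few lines of Python. The scores below are invented for illustration and are not real game results; they were chosen only so that both teams average 25. The helper name population_sd is hypothetical. Note that the standard deviation is the square root of the average squared distance from the mean, which is close to, but not exactly, the plain average distance.

```python
import math
import statistics

# Invented scores for illustration only; these are not real game results.
consistent_team = [24, 26, 25, 24, 26, 25, 26, 24]   # tightly packed around 25
erratic_team = [18, 32, 25, 17, 33, 20, 30, 25]      # same mean, much wider spread


def population_sd(scores):
    """Square root of the average squared distance from the mean."""
    center = sum(scores) / len(scores)
    squared_distances = [(score - center) ** 2 for score in scores]
    return math.sqrt(sum(squared_distances) / len(scores))


for label, scores in [("consistent", consistent_team), ("erratic", erratic_team)]:
    print(label, "mean:", statistics.mean(scores), "sd:", round(population_sd(scores), 2))
```

Both teams average 25, but the first set of scores has a standard deviation below 1 while the second is close to 6. The hand-written helper matches statistics.pstdev from the standard library, which treats the scores as the whole population.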

9

What are practical ways to use the standard deviation?

With a normal distribution, the mean divides it up evenly in the middle. The portion below the mean covers 50% of the population, and the portion above the mean covers the other 50%.

The first standard deviation away from the mean, on either side, covers about 34% of the distribution.

In other words, 1 standard deviation above the mean = 50% + 34% = 84% = 84th percentile.

Let’s say that the average weight for a one year old is 25 lbs, with a standard deviation of 2 lbs.

Connor is 23 lbs. That’s 1 standard deviation below the mean. In other words, he is at 50% - 34%, or in the 16th percentile of the population.

Nardia is 27 lbs. That’s 1 standard deviation above the mean. In other words, she is at 50% + 34%, or in the 84th percentile of the population.
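These percentiles can be checked with a short Python sketch. It uses NormalDist from the standard library (Python 3.8 or later) and assumes, as the example does, that weights follow a normal distribution exactly; the helper name percentile is hypothetical.

```python
from statistics import NormalDist

# Figures from the example above: mean 25 lbs, standard deviation 2 lbs.
# Assumption: weights follow a normal distribution exactly.
weights = NormalDist(mu=25, sigma=2)


def percentile(weight_lbs):
    """Percent of the population at or below the given weight."""
    return 100 * weights.cdf(weight_lbs)


print("Connor (23 lbs):", round(percentile(23), 1))
print("Nardia (27 lbs):", round(percentile(27), 1))
```

The results come out at about 15.9 and 84.1, which the rule of thumb rounds to the 16th and 84th percentiles.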

Nearly the entire distribution is covered by roughly 6 standard deviations: 3 above the mean and 3 below the mean.

Hence the name of the quality management program “Six Sigma.”
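How much of a normal distribution falls within 1, 2, and 3 standard deviations of the mean can be computed directly. This is a minimal sketch using NormalDist from the Python standard library; the helper name share_within is hypothetical, and the figures apply only to a perfectly normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

One standard deviation on each side covers about 68% (the two 34% portions), and three on each side cover about 99.7%, which is why 6 standard deviations span almost everything.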

10

More examples:

Given a mean and a standard deviation, you have a pretty good idea of what the distribution is like: is it fat and short, or tall and skinny?

We can then map out individual scores on the distribution and tell the average joes from the weirdos!

11

The z score is the number of standard deviations from the mean.

With our previous example, Connor would have a z score of negative 1 (that is, 1 standard deviation below the mean), while Nardia has a z score of 1 (that is, 1 standard deviation above the mean).

The average joes would have z scores close to zero (e.g., 0.0006, -0.0029),

whereas the weirdos have extremely large or small z scores (e.g., 3.07, -2.99).

Again:

The z score is the number of standard deviations that a data point is away from the mean.

Let's say that the average weight for all American women is 150 lbs, and the standard deviation is 20 lbs.

If your weight is 130, then your z score is -1, because you're exactly 1 standard deviation below the mean.

If Peggy's weight is 170, then her z score is 1, because she is exactly 1 standard deviation above the mean.
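In formula terms, the z score is the data point minus the mean, divided by the standard deviation. A minimal Python sketch follows, using the figures from the examples in this lecture; the function name z_score is hypothetical.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation


# Figures from the examples above.
print(z_score(130, 150, 20))   # a weight of 130 lbs
print(z_score(170, 150, 20))   # Peggy
print(z_score(23, 25, 2))      # Connor
print(z_score(27, 25, 2))      # Nardia
```

The sign shows the direction (below or above the mean) and the size shows how unusual the case is, so weights and other measurements on different scales can be compared on the same footing.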

12

Questions? Schedule a chat/phone meeting with the instructor for more assistance.

13
