1.
INTRODUCTION TO
STATISTICS & PROBABILITY
Chapter 1:
Looking at Data—Distributions
Dr. Nahid Sultana
2.
Chapter 1:
Looking at Data—Distributions
1.1 Displaying Distributions with Graphs
1.2 Describing Distributions with Numbers
1.3 Density Curves and Normal Distributions
3.
1.1 Displaying Distributions with Graphs
Objectives
Variables; Types of variables
Graphs for categorical variables
Bar graphs
Pie charts
Graphs for quantitative variables
Histograms
Stemplots
Stemplots versus histograms; Interpreting histograms
Time plots
4.
Variables
Statistics is the science of learning from data.
Individuals are the objects described in a set of data.
Individuals can be people, animals, plants, or any object of
interest.
A variable is any characteristic of an individual.
A variable varies among individuals; that is, it can take different
values for different individuals.
Examples: age, height, blood pressure, ethnicity, first language
5.
Two types of variables
Variables can be either categorical
Something that falls into one of several categories.
Example: Your blood type (A, B, AB, O), your hair color, your ethnicity,
whether you paid income tax last tax year or not.
Or quantitative
Something that takes numerical values for which arithmetic
operations, such as adding and averaging, make sense.
Example: How tall you are, your age, your blood cholesterol level, the
number of credit cards you own.
6.
How do you decide if a variable is categorical or
quantitative?
Ask:
What are the n individuals examined?
What is being recorded about those n individuals?
Is that a number (quantitative) or a statement (categorical)?
Individuals studied   Diagnosis       Age at death
Patient A             Heart disease   56
Patient B             Stroke          70
Patient C             Stroke          75
Patient D             Lung cancer     60
Patient E             Heart disease   80
Patient F             Accident        73
Patient G             Diabetes        69
Diagnosis: each individual is given a description.
Age at death: each individual is given a meaningful number.
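As a rough illustration of the distinction, the patient table can be summarized in Python. A categorical variable is summarized by counting how often each category occurs, while arithmetic such as averaging only makes sense for a quantitative variable. The variable names below are illustrative; the data are the seven patients listed above.

```python
from collections import Counter
from statistics import mean

# (individual, diagnosis, age at death) for the seven patients in the table
patients = [
    ("Patient A", "Heart disease", 56),
    ("Patient B", "Stroke", 70),
    ("Patient C", "Stroke", 75),
    ("Patient D", "Lung cancer", 60),
    ("Patient E", "Heart disease", 80),
    ("Patient F", "Accident", 73),
    ("Patient G", "Diabetes", 69),
]

# Categorical variable: count how many individuals fall into each category.
diagnosis_counts = Counter(diagnosis for _, diagnosis, _ in patients)

# Quantitative variable: averaging the values is meaningful.
mean_age = mean(age for _, _, age in patients)

for diagnosis, count in diagnosis_counts.items():
    print(f"{diagnosis:<15}{count}")
print("Mean age at death:", mean_age)
```

Averaging the diagnoses would be meaningless, which is a quick practical test for whether a variable is quantitative.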
7.
A study examined the condition of deer after a
particularly nasty winter. The sex and condition
(good or poor) of each deer in a random sample of 61
were recorded. Data from such a study could
appear in either of these two formats:
Frequency table
Raw data
Note: “Count” is NOT the variable
studied – it’s a summary statistic
for the data set.
Who/what are the individuals? The 61 deer.
What are the variables, and are they quantitative or categorical?
Sex (categorical) and condition (categorical).
8.
Distribution of a Variable
To examine a single variable, we graphically display its
distribution.
The distribution of a variable tells us what values it takes and
how often it takes these values.
Distributions can be displayed using a variety of graphical
tools. The proper choice of graph depends on the nature of
the variable.
Categorical Variable
Pie chart
Bar graph
Quantitative Variable
Histogram
Stemplot
9.
Distribution of Categorical Variables
Most common ways to graph categorical data:
Bar graphs represent each category as a bar whose height
represents either the count of individuals with that characteristic
(the frequency) or the percent of individuals with that
characteristic (the relative frequency).
A bar graph makes it easy to compare the sizes of the groups.
Pie charts show the distribution of a categorical variable as a
“pie” whose slices are sized by the percents for the categories.
A pie chart requires that you include all the categories that make up a whole.
10.
Bar Graphs and Pie Charts
Marital Status   Count (millions)   Percent
Single           41.8               22.6
Married          113.3              61.1
Widowed          13.9               7.5
Divorced         16.3               8.8
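The percents in the marital status table are relative frequencies: each count divided by the total, times 100. A minimal sketch of that calculation, using the counts from the table (the variable names are illustrative):

```python
# Counts in millions, as given in the marital status table.
counts = {"Single": 41.8, "Married": 113.3, "Widowed": 13.9, "Divorced": 16.3}

total = sum(counts.values())

# Relative frequency (percent) = count / total * 100, rounded to one decimal.
percents = {status: round(100 * n / total, 1) for status, n in counts.items()}

for status, pct in percents.items():
    print(f"{status:<10}{pct:>6.1f}%")
```

Because the four categories make up the whole, the percents add up to 100 (up to rounding), which is the condition a pie chart needs.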
12.
Pie Charts and Bar Graphs (cont…)
Bar Graphs
Data in the graph can be ordered any way we want (alphabetical, by
increasing value, by year, by personal preference, etc.)
16.
Distribution of Quantitative Variables
Tells us what values the variable takes on and how often it
takes those values.
Can be displayed using:
Histograms
Stemplots
Time plots
Histograms and stemplots are summary graphs for a single
variable. They are very useful to understand the pattern of
variability in the data.
A time plot shows the behavior of observations over time.
17.
Histogram
Histograms show the distribution of a quantitative variable by
using bars whose height represents the number of individuals
who take on a value within a particular class.
To draw a histogram:
Divide the possible values into equal-size intervals (classes).
This makes up the horizontal axis.
Count how many observations fall into each interval (counts may
be converted to percents).
For each class on the horizontal axis, draw a bar. The
height of the bar represents the count (or percent) of data
points that fall in that class interval.
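The counting step can be sketched in Python as follows. This is only one possible approach: the function name `class_counts` and the sample weights are hypothetical, invented for illustration, and each class is assumed to include its lower bound but not its upper bound.

```python
def class_counts(values, start, width, n_classes):
    """Count how many values fall into each class [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        # Values outside the covered range are ignored in this sketch.
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Hypothetical weights, invented for illustration only.
weights = [104, 118, 120, 125, 139, 141, 158, 160, 199]
counts = class_counts(weights, start=100, width=20, n_classes=5)

for i, count in enumerate(counts):
    lower = 100 + 20 * i
    print(f"{lower} - <{lower + 20}: {count}")
```

Note that a value sitting exactly on a boundary, such as 120, goes into the class that starts there, so every observation lands in exactly one class.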
18.
Histogram (Cont…)
Example: Weight Data―Introductory Statistics Class
Weight Group   Count
100 - <120     7
120 - <140     12
140 - <160     7
160 - <180     8
180 - <200     12
200 - <220     4
220 - <240     1
240 - <260     0
260 - <280     1
[Figure: histogram of the weight data. Horizontal axis: Weight; vertical axis: Number of Students (scale 0 to 15).]
The first bar represents all students with weight 100-<120. The
height of this bar shows how many students' weights are in this
range.
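As a quick check on the class counts, the weight data can be drawn as a rough text histogram. This sketch assumes the counts pair with the weight groups in the order listed in the example, and it draws the bars sideways, so bar length plays the role of bar height.

```python
# Class lower bounds and counts from the Weight Data example.
lower_bounds = [100, 120, 140, 160, 180, 200, 220, 240, 260]
counts = [7, 12, 7, 8, 12, 4, 1, 0, 1]
width = 20  # every class is 20 units wide

lines = []
for lower, count in zip(lower_bounds, counts):
    label = f"{lower} - <{lower + width}"
    # One asterisk per student in the class.
    lines.append(f"{label:<12}| {'*' * count}")

print("\n".join(lines))
print("Total students:", sum(counts))
```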
19.
Stemplots
Stemplots separate each observation into a stem and a leaf
that are then plotted to display the distribution while maintaining
the original values of the variable.
To construct a stemplot:
Separate each observation into a stem (first part of the
number) and a leaf (the remaining part of the number).
Write the stems in a vertical column; draw a vertical line to
the right of the stems.
Write each leaf in the row to the right of its stem; order
leaves if desired.
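The construction steps can be sketched in Python. This is a minimal version under stated assumptions: the values are non-negative whole numbers, the stem is the tens part and the leaf is the ones digit, and the function name `stemplot` and the sample data are hypothetical.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows; stem = tens part, leaf = ones digit."""
    leaves_by_stem = defaultdict(list)
    # Sorting first means the leaves on each stem come out in order.
    for v in sorted(values):
        leaves_by_stem[v // 10].append(v % 10)

    rows = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows


# Hypothetical data, invented for illustration only.
data = [29, 31, 38, 46, 46, 52, 70, 77, 99]
rows = stemplot(data)
print("\n".join(rows))
```

Every original value can be read back from the plot by joining a stem with one of its leaves, which is what distinguishes a stemplot from a histogram.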
20.
Stemplots (Cont…)
Example: Stemplot of the percents of females who are literate.
21.
Stemplots (Cont…)
(a) Write the stems.
(b) Go through the data and write each leaf on the proper stem.
(c) Arrange the leaves on each stem in order out from the stem.
22.
Stemplots (Cont…)
To compare two related distributions, a back-to-back stemplot
with common stems is useful.
Example:
This back-to-back stemplot compares
the distributions of female and male
literacy rates.
Values on the left are the female percents,
ordered out from the stem from right to
left.
Values on the right are the male percents.
It is clear that literacy is generally higher
among males than among females in these
countries.
23.
Stemplots (Cont…)
Stemplots do not work well for large datasets.
When the observed values have too many digits, trim the numbers
before making a stemplot.
If there are very few stems (when the data cover only a very small
range of values), then we may want to create more stems by
splitting the original stems.
Example: If all of the data values were between 150 and 179, then
we may choose to use the following stems:
15
15
16
16
17
17
Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on
each lower stem (second “15”).
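Splitting stems can be sketched in Python as well. The function name `split_stemplot` and the sample values are hypothetical; the rule is the one stated above, with leaves 0 to 4 on the upper copy of each stem and leaves 5 to 9 on the lower copy.

```python
def split_stemplot(values):
    """Stemplot with each stem split in two: leaves 0-4 above, 5-9 below."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1  # 0 = upper stem, 1 = lower stem
        rows.setdefault((stem, half), []).append(leaf)

    lowest = min(stem for stem, _ in rows)
    highest = max(stem for stem, _ in rows)

    lines = []
    for stem in range(lowest, highest + 1):
        for half in (0, 1):
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem:>3} | {leaves}")
    return lines


# Hypothetical values between 150 and 179, invented for illustration.
data = [151, 153, 156, 158, 159, 160, 162, 164, 167, 171, 172, 178]
lines = split_stemplot(data)
print("\n".join(lines))
```

With unsplit stems these twelve values would crowd onto three rows; splitting gives six rows and a clearer picture of the shape.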
24.
Examining Distributions
When describing the distribution of a quantitative variable,
we look for the overall pattern and for striking deviations
from that pattern.
We can describe the overall pattern of a histogram by
its shape, center, and spread.
Histogram with a smoothed
curve highlighting the overall
pattern of the distribution
25.
Examining Distributions (Cont…)
A distribution is symmetric if the right and left sides of the
graph are approximately mirror images of each other.
A distribution is skewed to the left (left-skewed) if the left side of the
graph is much longer than the right side.
A distribution is skewed to the right (right-skewed) if the right
side of the graph is much longer than the left side.
26.
Outliers
Outliers are observations that lie outside the overall pattern
of a distribution.
The overall pattern is
fairly symmetric
except for two states,
Alaska and Florida.
A large gap in the
distribution is typically a
sign of an outlier.
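The idea that a large gap can signal an outlier can be sketched as a simple screening heuristic. This is not a formal outlier rule: the function name `flag_by_gaps`, the threshold `min_gap`, and the data are all hypothetical, and any flagged value still needs to be examined in context.

```python
def flag_by_gaps(values, min_gap):
    """Return values cut off from the main body of the data by a large gap.

    The sorted data are split wherever two neighbours differ by at least
    min_gap; the largest resulting cluster is treated as the main body.
    """
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_gap:
            clusters.append([])  # a large gap starts a new cluster
        clusters[-1].append(cur)

    main_body = max(clusters, key=len)
    return [v for cluster in clusters if cluster is not main_body for v in cluster]


# Hypothetical data, invented for illustration only.
data = [4, 10, 11, 11, 12, 12, 13, 13, 14, 19]
print(flag_by_gaps(data, min_gap=3))
```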
27.
Time Plots
Time plot of a variable plots each observation against the time
at which it was measured.
Always put time on the horizontal axis and the variable being
measured on the vertical axis of the plot.
Connecting the data
points with lines helps
emphasize any change
over time.
28.
Time Plots (cont…)
Scale matters
Look at the scales
29.
Graphing time series
Data collected over time are displayed in a time plot, with time on
the horizontal axis and the variable of interest on the vertical axis.
In a time series, a trend is a rise or fall that persists over time,
despite small irregularities.
o This plot is a graph of a time series.
o It shows that there is a decreasing trend in the data.