Copyright © 2017, 2013, and 2009, Pearson Education, Inc.
Chapter 2
Exploring Data with Graphs and Numerical Summaries
Section 2.1: Different Types of Data
A variable is any characteristic observed in a study.
Examples: high temperature, low temperature, cloud cover,
whether it rained, and number of centimeters of precipitation
A variable can be classified as either
 Categorical (in Categories), or
 Quantitative (Numerical)
Variable
A variable can be classified as categorical if each
observation belongs to one of a set of distinct categories.
Examples:
 Gender (Male or Female)
 Religious Affiliation (Catholic, Jewish, …)
 Type of Residence (Apartment, Condo, …)
 Belief in Life After Death (Yes or No)
Categorical Variable
A variable is called quantitative if observations on it take
numerical values that represent different magnitudes of the
variable.
Examples:
 Age
 Number of Siblings
 Annual Income
Quantitative Variable
For Quantitative variables: key features are the center and
variability (spread) of the data.
Example: What’s a typical annual amount of precipitation? Is
there much variation from year to year?
For Categorical variables: a key feature is the relative
number of observations in the various categories.
Example: What percentage of days were sunny in a given
year?
Main Features of Quantitative and Categorical
Variables
A quantitative variable is discrete if its possible values form
a set of separate numbers, such as 0, 1, 2, 3, ….
Discrete variables usually have a finite number of possible values, although a count with no upper limit is also discrete.
Examples:
 Number of pets in a household
 Number of children in a family
 Number of foreign languages spoken by an individual
Discrete Quantitative Variable
A quantitative variable is continuous if its possible values
form an interval.
Continuous variables have an infinite continuum of possible
values.
Examples:
 Height/Weight
 Age
 Amount of time to complete an assignment
Continuous Quantitative Variable
The distribution of a variable describes how the
observations fall (are distributed) across the range of
possible values.
Graphs and frequency tables are used to look for key
features of a distribution.
Distribution of a Variable
A frequency table is a listing of possible values for a
variable, together with the number of observations and/or
relative frequencies for each value.
Frequency Table
Table 2.1 Frequency of Shark Attacks in Various Regions for 2004–2013
Proportion & Percentage (Relative
Frequencies)
The proportion of observations falling in a certain category
is the number of observations in that category divided by the
total number of observations.
$\text{proportion} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}}$
The percentage is the proportion multiplied by 100.
Proportions and percentages are also called relative
frequencies.
There were 203 reported shark attacks in Florida between
2004 and 2013. Table 2.1 classifies 689 shark attacks
reported from 2004 through 2013, so, for Florida,
 203 is the frequency.
 203/689 = 0.295 is the proportion (the relative
frequency).
 29.5% is the percentage: 0.295 × 100 = 29.5%.
Frequency, Proportion, & Percentage
Example
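The Florida calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the variable names are illustrative and the two counts are the ones quoted above.

```python
# Counts quoted in the shark-attack example: 203 Florida attacks
# out of the 689 attacks classified in Table 2.1.
florida_frequency = 203
total_frequency = 689

# Proportion = frequency of the class / sum of all frequencies.
proportion = florida_frequency / total_frequency
# Percentage = proportion multiplied by 100.
percentage = proportion * 100

print(f"proportion = {proportion:.3f}")   # proportion = 0.295
print(f"percentage = {percentage:.1f}%")  # percentage = 29.5%
```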
To show the distribution for a discrete quantitative
variable, we would similarly list the distinct values and the
frequency of each one occurring.
Discrete Quantitative Variable
For a continuous quantitative variable (or when the
number of possible outcomes is very large for a discrete
variable), we divide the numeric scale on which the variable
is measured into a set of nonoverlapping intervals and
count the number of observations that fall in each interval.
The frequency table then shows these intervals together with
the corresponding count.
Continuous Quantitative Variable
Section 2.2: Graphical Summaries of Data
The two primary graphical displays for summarizing a
categorical variable are the pie chart and the bar graph.
Pie Chart: A circle having a “slice of pie” for each category
Bar Graph: A graph that displays a vertical bar for each
category
Graphs for Categorical Variables
Pie Charts:
 Used for summarizing a categorical variable.
 Drawn as a circle where each category is represented
as a “slice of the pie”.
 The size of each pie slice is proportional to the
percentage of observations falling in that category.
Pie Charts
Example: Shark Attacks in the United States
Table 2.2 Unprovoked Shark Attacks in the U.S., 2004 - 2013
Figure 2.1 Pie Chart of Shark Attacks Across U.S. States. The label for each
slice of the pie gives the category and the percentage of attacks in a state. The slice that
represents the percentage of attacks reported in Hawaii is 13% of the total area of the pie.
Question: Why is it beneficial to label the pie wedges with the percent?
Example: Shark Attacks in the United States
Bar graphs are used for summarizing a categorical variable.
 Bar graphs display a vertical bar for each category.
 The height of each bar represents either counts
(“frequencies”) or percentages (“relative frequencies”) for
that category.
 It is usually easier to compare categories with a bar
graph rather than with a pie chart.
 Bar graphs are called Pareto charts when the
categories are ordered by their frequency, from the
tallest bar to the shortest bar.
Bar Graphs
Figure 2.2 Bar Graph of Shark Attacks Across U.S. States. Except for the Other
category, which is shown last, the bars are ordered from largest to smallest based on
the frequency of shark attacks. Question: What is the advantage of ordering the bars
this way rather than alphabetically?
Example: Shark Attacks in the United States
Dot Plot: shows a dot for each observation placed above its
value on a number line.
Stem-and-Leaf Plot: portrays the individual observations.
Histogram: uses bars to portray the data.
Graphs for Quantitative Variables
Dot plots are used for summarizing a quantitative variable.
To construct a dot plot
1. Draw a horizontal line.
2. Label it with the name of the variable.
3. Mark regular values of the variable on it.
4. For each observation, place a dot above its value on
the number line.
Dot Plots
Each observation is represented by a stem and a leaf.
To construct a stem-and-leaf plot
1. Sort the data in order from smallest to largest.
2. Place the stems in a column, starting with the
smallest.
3. Place a vertical line to their right.
4. On the right side of the vertical line, indicate each leaf
(final digit) that has a particular stem.
5. List the leaves in increasing order.
Stem-and-Leaf Plots
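The construction steps can be sketched in Python as follows. The function name `stem_and_leaf` and the exam scores are hypothetical, invented only for illustration, and the sketch assumes non-negative whole numbers whose final digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]}, where the leaf is the final digit of each value."""
    plot = defaultdict(list)
    for v in sorted(values):          # step 1: sort, so leaves come out in increasing order
        stem, leaf = divmod(v, 10)    # stem = all digits but the last, leaf = final digit
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical exam scores, made up for this illustration.
scores = [72, 85, 91, 78, 85, 64, 99, 70, 88]

plot = stem_and_leaf(scores)
for stem in range(min(plot), max(plot) + 1):   # step 2: stems in a column, smallest first
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")                # steps 3-5: vertical line, then ordered leaves
```

Stems with no observations are still printed (with no leaves), so gaps in the data remain visible.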
A histogram provides a versatile way to picture the
distribution of a large data set.
1. Divide the range of the data into intervals of equal width.
2. Count the number of observations in each interval,
creating a frequency table.
3. On the horizontal axis, label the values or the endpoints
of the intervals.
4. Draw a bar over each value or interval with height equal
to its frequency (or percentage), values of which are
marked on the vertical axis.
5. Label and title appropriately.
Steps for Constructing a Histogram
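Steps 1 and 2 can be sketched in Python using the 20 cereal sodium values listed in this chapter's mean example. The interval width of 40 mg is an assumption chosen for illustration; the slides do not state the intervals used in Table 2.4, so these counts need not match that table. The helper name `frequency_table` is also hypothetical.

```python
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

WIDTH = 40  # assumed interval width, not taken from Table 2.4

def frequency_table(values, width):
    """Count non-negative integer observations in equal-width intervals
    [0, width), [width, 2*width), and so on."""
    n_bins = max(values) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[v // width] += 1
    return counts

counts = frequency_table(sodium, WIDTH)
for k, count in enumerate(counts):
    low, high = k * WIDTH, (k + 1) * WIDTH - 1
    pct = 100 * count / len(sodium)
    # A row of '#' marks gives a rough text version of the histogram bar.
    print(f"{low:3d} to {high:3d}: {'#' * count:<6} {count:2d} ({pct:.0f}%)")
```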
Example: Sodium in Cereals
Table 2.4 Frequency Table for Sodium in 20 Breakfast Cereals. The table
summarizes the sodium values using eight intervals and lists the number of
observations in each, as well as the proportions and percentages.
Figure 2.6 Histogram of Breakfast Cereal Sodium Values. The rectangular bar over
an interval has height equal to the number of observations in the interval.
Histogram for Sodium in Cereals
How do we decide which to use? Here are some guidelines:
Dot-plot and stem-and-leaf plot
 More useful for small data sets
 Data values are retained
Histogram
 More useful for large data sets
 Most compact display
 More flexibility in defining intervals
Choosing a Graph Type
Overall pattern consists of center, spread, and shape.
 Assess where a distribution is centered by finding the
median (50% of data below median and 50% of data
above).
 Assess the spread of a distribution.
 Shape of a distribution: roughly symmetric, skewed to
the right, or skewed to the left
Interpreting Histograms
Shape
A distribution is unimodal
if it has a single mound or
peak.
A distribution is bimodal
if it has two distinct
mounds.
The value that occurs the most often is called the mode.
Shape
A distribution is skewed to the left if the left tail is longer than the right tail.
A distribution is skewed to the right if the right tail is longer than the left tail.
A distribution is symmetric if the left and right sides of the histogram are mirror images of each other.
Examples of Skewness
A data set collected over time is called a time series.
Time plots are used to display time series data graphically.
 Plots each observation on the vertical scale against the
time it was measured on the horizontal scale.
 Common patterns in the data over time, known as
trends, should be noted.
 To see a trend more clearly, it is beneficial to connect
the data points in their time sequence.
Time Plots
The figure shows a time plot of the incidence rate from 1980 to
2012, based on numbers reported by the Centers for Disease
Control and Prevention.
Time Plot Example
Section 2.3: Measuring the Center of Quantitative Data
The mean is the sum of the observations divided by the
number of observations.
Mean
$\bar{x} = \frac{\sum x}{n}$
Example: Center of the Cereal Sodium Data
We find the mean by adding all the observations and then
dividing this sum by the number of observations, which is 20:
0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
140, 180, 190, 160, 290, 50, 220, 180, 200, 210
Mean = (0 + 340 + 70 + . . . + 210)/20 = 3340/20 = 167
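As a quick check of this arithmetic, a short Python sketch (the variable names are illustrative) sums the 20 values and divides by the number of observations:

```python
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

total = sum(sodium)          # sum of the observations
n = len(sodium)              # number of observations
mean = total / n             # mean = sum / n

print(total, n, mean)        # 3340 20 167.0
```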
The median is the middle value of the observations when they are
ordered from the smallest to the largest (or from the largest to
smallest).
How to Determine the Median:
Put the n observations in order of their size.
If the number of observations, n, is:
 odd, then the median is the middle observation.
 even, then the median is the average of the two middle
observations.
Median
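The odd/even rule can be written as a small Python function. This is a minimal sketch; the function name `median` is chosen here for illustration (Python's standard library also provides `statistics.median`).

```python
def median(values):
    """Middle value of the ordered observations (average of the two middle values if n is even)."""
    ordered = sorted(values)          # put the n observations in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: the single middle observation
        return ordered[mid]
    # n even: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 2]))       # 2
print(median([4, 1, 3, 2]))    # 2.5
```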
An outlier is an observation that falls well above or well below
the overall bulk of the data.
Outlier
CO2 pollution levels in the 9 largest nations, measured in metric
tons per person:
The CO2 values have n = 9 observations. The ordered
values are: 0.3, 0.4, 0.8, 1.4, 1.8, 2.1, 5.9, 11.6, 16.9.
Example: CO2 Pollution
Since n is odd, the median is the middle value, 1.8.
The relatively high value of 16.9 falls well above the rest of the
data. It is an outlier.
The size of the outlier affects the calculation of the mean but not
the median.
Example: CO2 Pollution
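To see how much the outlier pulls the mean compared with the median, the following Python sketch recomputes both summaries with and without the value 16.9. Dropping the outlier is done here only to illustrate the effect; it is not a step taken on the slide.

```python
from statistics import mean, median

co2 = [0.3, 0.4, 0.8, 1.4, 1.8, 2.1, 5.9, 11.6, 16.9]
without_outlier = co2[:-1]            # same data with 16.9 left out

for label, data in (("all 9 values", co2), ("without 16.9", without_outlier)):
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
# all 9 values: mean = 4.6, median = 1.8
# without 16.9: mean = 3.0, median = 1.6
```

The mean drops by about 1.5 metric tons per person when the outlier is removed, while the median moves by only 0.2.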
The shape of a distribution influences whether the mean is
larger or smaller than the median.
 Perfectly symmetric, the mean equals the median.
 Skewed to the left, the mean is smaller than the
median.
 Skewed to the right, the mean is larger than the
median.
Comparing the Mean and Median
In a skewed distribution, the mean is farther out in the long tail
than is the median.
 For skewed distributions the median is preferred because
it better represents what is typical.
Figure 2.9 Relationship Between the Mean and Median. Question: For skewed
distributions, what causes the mean and median to differ?
Comparing the Mean and Median
A numerical summary measure is resistant if extreme
observations (outliers) have little, if any, influence on its
value.
 The Median is resistant to outliers.
 The Mean is not resistant to outliers.
Resistant Measures
Mode
 Value that occurs most often.
 Highest bar in the histogram.
 The mode is most often used with categorical data.
The Mode
Section 2.4: Measuring the Variability of Quantitative Data
One way to measure the variability of a distribution is to
calculate the range.
The range is the difference between the largest and
smallest values in the data set:
Range = maximum value – minimum value
The range is simple to compute and easy to understand,
but it uses only the extreme values and ignores the other
values. Therefore, it’s affected severely by outliers.
Range
The deviation of an observation $x$ from the mean $\bar{x}$ is
$(x - \bar{x})$, the difference between the observation and the
sample mean.
 Each observation has a deviation from the mean.
 A deviation is positive if the value falls above the mean
and negative if the value falls below the mean.
 The sum of the deviations for all the values in a data
set is always zero.
Standard Deviation
The deviation of an observation $x$ from the mean $\bar{x}$ is
$(x - \bar{x})$, the difference between the observation and the
sample mean.
 Summary measures of variability from the mean use
either the squared deviations or their absolute values.
 The average of the squared deviations (adjusted by dividing by $n - 1$ rather than $n$) is called the
variance.
 The quantity $\sum (x - \bar{x})^2$ is called a sum of squares.
Standard Deviation
For the cereal sodium values, the mean is $\bar{x} = 167$. The
observation of 210 for Honeycomb has a deviation of
210 − 167 = 43. The observation of 50 for Honey Smacks
has a deviation of 50 − 167 = −117. Figure 2.11 shows
these deviations.
Figure 2.11 Dot Plot for Cereal Sodium Data, Showing Deviations for Two
Observations. Question: When is a deviation positive and when is it negative?
Standard Deviation Example
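The two deviations, and the fact that the deviations over the whole data set sum to zero, can be verified with a short Python sketch (variable names are illustrative):

```python
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

mean = sum(sodium) / len(sodium)             # 167.0
deviations = [x - mean for x in sodium]      # one deviation per observation

print(210 - mean)        # Honeycomb: 43.0 (above the mean, so positive)
print(50 - mean)         # Honey Smacks: -117.0 (below the mean, so negative)
print(sum(deviations))   # 0.0
```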
The standard deviation gives a measure of variation by summarizing the deviations
of each observation from the mean and calculating an
adjusted average of these deviations.
The larger the standard deviation, s, the greater the
variability of the data.
The Standard Deviation s of n
Observations
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Standard Deviation Example
Women’s and Men’s Ideal Number of Children
Men: 0, 0, 0, 2, 4, 4, 4 Women: 0, 2, 2, 2, 2, 2, 4
Both men and women have a mean of 2 and a range of 4.
The standard deviation for men is
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{24}{6}} = 2.0$.
The standard deviation for women is 1.2.
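Both values can be reproduced with a direct Python translation of the formula for s. The function name `sample_sd` is hypothetical; the standard library's `statistics.stdev` computes the same quantity.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: square root of the sum of squares divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sqrt(sum_of_squares / (n - 1))

men = [0, 0, 0, 2, 4, 4, 4]
women = [0, 2, 2, 2, 2, 2, 4]

print(round(sample_sd(men), 1))    # 2.0
print(round(sample_sd(women), 1))  # 1.2
```

For women the sum of squares is 8, so s = sqrt(8/6), about 1.15, which rounds to the 1.2 shown on the slide. The men's responses sit farther from the shared mean of 2, which is why their standard deviation is larger even though the range is the same.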
The most basic property of the standard deviation is this:
The larger the standard deviation s, the greater the variability of the
data.
 s measures the spread of the data.
 s = 0 only when all observations have the same value, otherwise
s > 0. As the spread of the data increases, s gets larger.
 s has the same units of measurement as the original
observations. The variance, $s^2$, has units that are squared.
 s is not resistant. Strong skewness or a few outliers can greatly
increase s.
Properties of the Standard Deviation
Magnitude of s: The Empirical Rule
If a distribution of data is bell-shaped, then approximately:
 68% of the observations fall within 1 standard deviation of the
mean, that is, between the values of $\bar{x} - s$ and $\bar{x} + s$
(denoted $\bar{x} \pm s$).
 95% of the observations fall within 2 standard deviations of
the mean ($\bar{x} \pm 2s$).
 All or nearly all observations fall within 3 standard deviations
of the mean ($\bar{x} \pm 3s$).
Figure 2.12 The Empirical Rule. For bell-shaped distributions, this tells us approximately
how much of the data fall within 1, 2, and 3 standard deviations of the mean. Question:
About what percentage would fall more than 2 standard deviations from the mean?
Magnitude of s: The Empirical Rule
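One way to see how closely a data set follows the empirical rule is to count the observations within 1, 2, and 3 standard deviations of the mean. The Python sketch below does this for the 20 cereal sodium values from this chapter; the helper name `fraction_within` is hypothetical, and with only 20 observations the percentages can only roughly track 68%, 95%, and nearly 100%.

```python
from statistics import mean, stdev

def fraction_within(values, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    center, s = mean(values), stdev(values)
    inside = [x for x in values if center - k * s <= x <= center + k * s]
    return len(inside) / len(values)

sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

for k in (1, 2, 3):
    print(k, fraction_within(sodium, k))
# 1 0.75
# 2 0.9
# 3 1.0
```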
Section 2.5: Using Measures of Position to Describe Variability
The pth percentile is a value such that p percent of the
observations fall below or at that value.
Percentile
Quartiles
Figure 2.14 The Quartiles Split the Distribution Into Four Parts. 25% is below the first
quartile (Q1), 25% is between the first quartile and the second quartile (the median, Q2), 25%
is between the second quartile and the third quartile (Q3), and 25% is above the third quartile.
Question: Why is the second quartile also the median?
SUMMARY: Finding Quartiles
 Arrange the data in order.
 Consider the median. This is the second quartile, Q2.
 Consider the lower half of the observations (excluding
the median itself if n is odd). The median of these
observations is the first quartile, Q1.
 Consider the upper half of the observations (excluding
the median itself if n is odd). Their median is the third
quartile, Q3.
Consider the sodium values for the 20 breakfast cereals. What are the
quartiles for the 20 cereal sodium values? From Table 2.3, the sodium values, in
ascending order, are:
0 50 70 100 130 140 140 150 160 180
180 180 190 200 200 210 210 220 290 340
The median of the 20 values is the average of the 10th and 11th
observations, 180 and 180, which is Q2 = 180 mg.
The first quartile Q1 is the median of the 10 smallest observations (in the
top row), which is the average of 130 and 140, Q1 = 135 mg.
The third quartile Q3 is the median of the 10 largest observations (in the
bottom row), which is the average of 200 and 210, Q3 = 205 mg.
Example: Cereal Sodium Data
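The quartile-finding summary can be sketched in Python and checked against this example. The function names are illustrative, and the sketch assumes at least two observations. Note that software packages often use slightly different quartile conventions, so library results may differ a little from this median-of-halves method.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)          # arrange the data in order
    half = len(ordered) // 2
    lower = ordered[:half]            # lower half (excludes the median itself if n is odd)
    upper = ordered[-half:]           # upper half (same exclusion)
    return median(lower), median(ordered), median(upper)

sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

print(quartiles(sodium))   # (135.0, 180.0, 205.0)
```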
The interquartile range is the distance between the third
quartile and first quartile:
IQR = Q3 − Q1
The IQR gives the spread of the middle 50% of the data.
The Interquartile Range (IQR)
In Words: If the interquartile range of U.S. music teacher salaries equals $16,000, this
means that the middle 50% of the distribution of salaries stretches over a distance of $16,000.
Examining the data for unusual observations, such as
outliers, is important in any statistical analysis. Is there a
formula for flagging an observation as potentially being an
outlier?
The 1.5 x IQR Criterion for Identifying Potential Outliers
An observation is a potential outlier if it falls a distance of
more than 1.5 x IQR below the first quartile or a distance of
more than 1.5 x IQR above the third quartile.
Detecting Potential Outliers
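Applied to the cereal sodium data, the criterion works out as in the Python sketch below. The quartiles Q1 = 135 and Q3 = 205 are the ones found in the cereal example; the variable names (`low_fence`, `high_fence`) are illustrative.

```python
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

q1, q3 = 135, 205                 # quartiles from the cereal sodium example
iqr = q3 - q1                     # 70
low_fence = q1 - 1.5 * iqr        # 30.0: anything below is a potential outlier
high_fence = q3 + 1.5 * iqr       # 310.0: anything above is a potential outlier

flagged = [x for x in sodium if x < low_fence or x > high_fence]
print(low_fence, high_fence, flagged)   # 30.0 310.0 [0, 340]
```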
The five-number summary is the basis of a graphical display
called the box plot, and consists of
 Minimum value
 First Quartile
 Median
 Third Quartile
 Maximum value
Five-Number Summary of Positions
 A box goes from Q1 to Q3.
 A line is drawn inside the box at the median.
 A line goes from the lower end of the box to the
smallest observation that is not a potential outlier and
from the upper end of the box to the largest
observation that is not a potential outlier.
 The potential outliers are shown separately.
SUMMARY: Constructing a Box Plot
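The numbers needed to draw a box plot can be gathered in one place. The following Python sketch is one possible arrangement, assuming the median-of-halves quartiles and the 1.5 × IQR criterion from this section; the function name `box_plot_stats` and the dictionary keys are hypothetical.

```python
from statistics import median

def box_plot_stats(values):
    """Five-number summary, whisker ends, and potential outliers for a box plot."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1, q2, q3 = median(ordered[:half]), median(ordered), median(ordered[-half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations that are not potential outliers.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "five_number": (ordered[0], q1, q2, q3, ordered[-1]),
        "whiskers": (inside[0], inside[-1]),
        "potential_outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

stats = box_plot_stats(sodium)
print(stats["five_number"])          # (0, 135.0, 180.0, 205.0, 340)
print(stats["whiskers"])             # (50, 290)
print(stats["potential_outliers"])   # [0, 340]
```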
Figure 2.15 shows a box plot for the sodium values. Labels
are also given for the five-number summary of positions.
Example: Box Plot for Cereal Sodium
Data
Figure 2.15 Box Plot and Five-Number Summary for 20 Breakfast Cereal Sodium Values.
The central box contains the middle 50% of the data. The line in the box marks the median.
Whiskers extend from the box to the smallest and largest observations that are not identified
as potential outliers. Potential outliers are marked separately. Question: Why is the left whisker
drawn down only to 50 rather than to 0?
Side-by-Side Box Plots Help to Compare
Groups
A box plot does not portray certain features of a distribution, such as
distinct mounds and possible gaps, as clearly as does a histogram.
Box plots are useful for identifying potential outliers.
Figure 2.16 Box Plots of Male and Female College Student Heights. The box plots use the
same scale for height. Question: What are approximate values of the quartiles for the two
groups?
The z-score also identifies position and potential outliers.
The z-score for an observation is the number of standard
deviations that it falls from the mean. A positive z-score indicates
the observation is above the mean. A negative z-score indicates
the observation is below the mean. For sample data, the z-score
is calculated as:
$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$
An observation from a bell-shaped distribution is a potential outlier
if its z-score < -3 or > +3 (3 standard deviation criterion).
Z-Score
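A z-score calculation can be sketched in Python as follows, using the cereal sodium values from this chapter. The function name `z_score` is illustrative, and the sample standard deviation comes from the standard library's `statistics.stdev`.

```python
from statistics import mean, stdev

def z_score(x, values):
    """Number of sample standard deviations that x falls from the mean of values."""
    return (x - mean(values)) / stdev(values)

sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

for x in (0, 340):
    z = z_score(x, sodium)
    print(x, round(z, 2), abs(z) > 3)
# 0 -2.16 False
# 340 2.24 False
```

Neither extreme value is more than 3 standard deviations from the mean, so the 3 standard deviation criterion does not flag them, even though both fall outside the 1.5 × IQR limits for these data. The two criteria need not agree.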
Section 2.6: Recognizing and Avoiding Misuses of Graphical Summaries
 Label both axes and provide proper headings.
 To better compare relative size, the vertical axis should start
at 0.
 Be cautious in using anything other than bars, lines, or points.
 It can be difficult to portray more than one group on a single
graph when the variable values differ greatly. Consider
instead using separate graphs, or plotting relative sizes such
as ratios or percentages.
Summary: Guidelines for Constructing
Effective Graphs
Example: Beware of Poor Graphs
Figure 2.18 An Example of a Poor Graph. Question: What’s misleading about the
way the data are presented?
Figure 2.19 A Better Graph for the Data in Figure 2.18. Question: What trends do
you see in the enrollments from 2004 to 2012?
Example: Beware of Poor Graphs

More Related Content

What's hot

Reliability and validity
Reliability and validityReliability and validity
Reliability and validityKaimrc_Rss_Jd
 
reliablity and validity in social sciences research
reliablity and validity  in social sciences researchreliablity and validity  in social sciences research
reliablity and validity in social sciences researchSourabh Sharma
 
Presentation validity
Presentation validityPresentation validity
Presentation validityAshMusavi
 
General Aptitude Test Battery (GATB)
General Aptitude Test Battery (GATB)General Aptitude Test Battery (GATB)
General Aptitude Test Battery (GATB)Natasha Gupta
 
Correlation Research Design
Correlation Research DesignCorrelation Research Design
Correlation Research DesignSu Qee
 
Differential item functioning2
Differential item functioning2Differential item functioning2
Differential item functioning2Carlo Magno
 
Classical Test Theory and Item Response Theory
Classical Test Theory and Item Response TheoryClassical Test Theory and Item Response Theory
Classical Test Theory and Item Response Theorysaira kazim
 
Validity and Reliability - Research Mangement
Validity and Reliability - Research MangementValidity and Reliability - Research Mangement
Validity and Reliability - Research MangementVinu Arpitha
 
Statistics and probability lecture 01
Statistics and probability  lecture 01Statistics and probability  lecture 01
Statistics and probability lecture 01MuhammadTufailKaran
 
What is a partial correlation?
What is a partial correlation?What is a partial correlation?
What is a partial correlation?Ken Plummer
 
Validity, Reliability and Feasibility
Validity, Reliability and FeasibilityValidity, Reliability and Feasibility
Validity, Reliability and FeasibilityJasna3134
 
Stratified Random Sampling - Problems
Stratified Random Sampling -  ProblemsStratified Random Sampling -  Problems
Stratified Random Sampling - ProblemsSundar B N
 
Sarjinder singh
Sarjinder singhSarjinder singh
Sarjinder singhHina Aslam
 
Parametric & non-parametric
Parametric & non-parametricParametric & non-parametric
Parametric & non-parametricSoniaBabaee
 

What's hot (20)

Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Reliability bachman 1990 chapter 6
Reliability bachman 1990 chapter 6Reliability bachman 1990 chapter 6
Reliability bachman 1990 chapter 6
 
lfstat3e_ppt_02_rev.ppt
lfstat3e_ppt_02_rev.pptlfstat3e_ppt_02_rev.ppt
lfstat3e_ppt_02_rev.ppt
 
reliablity and validity in social sciences research
reliablity and validity  in social sciences researchreliablity and validity  in social sciences research
reliablity and validity in social sciences research
 
Presentation validity
Presentation validityPresentation validity
Presentation validity
 
General Aptitude Test Battery (GATB)
General Aptitude Test Battery (GATB)General Aptitude Test Battery (GATB)
General Aptitude Test Battery (GATB)
 
Correlation Research Design
Correlation Research DesignCorrelation Research Design
Correlation Research Design
 
Experimental research
Experimental researchExperimental research
Experimental research
 
Differential item functioning2
Differential item functioning2Differential item functioning2
Differential item functioning2
 
Classical Test Theory and Item Response Theory
Classical Test Theory and Item Response TheoryClassical Test Theory and Item Response Theory
Classical Test Theory and Item Response Theory
 
Validity and Reliability - Research Mangement
Validity and Reliability - Research MangementValidity and Reliability - Research Mangement
Validity and Reliability - Research Mangement
 
Ch4 ppt
Ch4 pptCh4 ppt
Ch4 ppt
 
TONI-4 Test Review
TONI-4 Test ReviewTONI-4 Test Review
TONI-4 Test Review
 
Statistics and probability lecture 01
Statistics and probability  lecture 01Statistics and probability  lecture 01
Statistics and probability lecture 01
 
Cap.9. Recoleccion De Datos.Escalograma de Guttman.
Cap.9. Recoleccion De Datos.Escalograma de Guttman.Cap.9. Recoleccion De Datos.Escalograma de Guttman.
Cap.9. Recoleccion De Datos.Escalograma de Guttman.
 
What is a partial correlation?
What is a partial correlation?What is a partial correlation?
What is a partial correlation?
 
Validity, Reliability and Feasibility
Validity, Reliability and FeasibilityValidity, Reliability and Feasibility
Validity, Reliability and Feasibility
 
Stratified Random Sampling - Problems
Stratified Random Sampling -  ProblemsStratified Random Sampling -  Problems
Stratified Random Sampling - Problems
 
Sarjinder singh
Sarjinder singhSarjinder singh
Sarjinder singh
 
Parametric & non-parametric
Parametric & non-parametricParametric & non-parametric
Parametric & non-parametric
 

Similar to Data analysis techniques

Statistik Chapter 2
Statistik Chapter 2Statistik Chapter 2
Statistik Chapter 2WanBK Leo
 
Data and graphs
Data and graphsData and graphs
Data and graphsMy_VivJaan
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricinginventionjournals
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricinginventionjournals
 
Triola 11 chapter 2 notes f11
Triola 11 chapter 2 notes f11Triola 11 chapter 2 notes f11
Triola 11 chapter 2 notes f11babygirl5810
 
Triola 11 chapter 2
Triola 11 chapter 2Triola 11 chapter 2
Triola 11 chapter 2babygirl5810
 
Research Method for Business chapter 12
Research Method for Business chapter 12Research Method for Business chapter 12
Research Method for Business chapter 12Mazhar Poohlah
 
Data Description-Numerical Measure-Chap003 2 2.ppt
Data Description-Numerical Measure-Chap003 2 2.pptData Description-Numerical Measure-Chap003 2 2.ppt
Data Description-Numerical Measure-Chap003 2 2.pptArkoKesha
 
(8) Lesson 9.2
(8) Lesson 9.2(8) Lesson 9.2
(8) Lesson 9.2wzuri
 
Forecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear RegressionForecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear Regressionijtsrd
 
MSC III_Research Methodology and Statistics_Descriptive statistics.pdf
MSC III_Research Methodology and Statistics_Descriptive statistics.pdfMSC III_Research Methodology and Statistics_Descriptive statistics.pdf
MSC III_Research Methodology and Statistics_Descriptive statistics.pdfSuchita Rawat
 
STATISTICSINFORMED DECISIONS USING DATAFifth EditionChapte.docx
STATISTICSINFORMED DECISIONS USING DATAFifth EditionChapte.docxSTATISTICSINFORMED DECISIONS USING DATAFifth EditionChapte.docx
STATISTICSINFORMED DECISIONS USING DATAFifth EditionChapte.docxrafaelaj1
 

Data analysis techniques

  • 1. Chapter 2: Exploring Data with Graphs and Numerical Summaries. Copyright © 2017, 2013, and 2009, Pearson Education, Inc.
  • 2. Section 2.1: Different Types of Data
  • 3. Variable: A variable is any characteristic observed in a study. Examples: high temperature, low temperature, cloud cover, whether it rained, and number of centimeters of precipitation. A variable can be classified as either categorical (in categories) or quantitative (numerical).
  • 4. Categorical Variable: A variable is categorical if each observation belongs to one of a set of distinct categories. Examples: gender (male or female); religious affiliation (Catholic, Jewish, …); type of residence (apartment, condo, …); belief in life after death (yes or no).
  • 5. Quantitative Variable: A variable is called quantitative if observations on it take numerical values that represent different magnitudes of the variable. Examples: age; number of siblings; annual income.
  • 6. Main Features of Quantitative and Categorical Variables: For quantitative variables, key features are the center and variability (spread) of the data. Example: What’s a typical annual amount of precipitation? Is there much variation from year to year? For categorical variables, a key feature is the relative number of observations in the various categories. Example: What percentage of days were sunny in a given year?
  • 7. Discrete Quantitative Variable: A quantitative variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3, …. Discrete variables have a finite number of possible values. Examples: number of pets in a household; number of children in a family; number of foreign languages spoken by an individual.
  • 8. Continuous Quantitative Variable: A quantitative variable is continuous if its possible values form an interval. Continuous variables have an infinite number of possible values. Examples: height/weight; age; amount of time to complete an assignment.
  • 9. Distribution of a Variable: The distribution of a variable describes how the observations fall (are distributed) across the range of possible values. Graphs and frequency tables are used to look for key features of a distribution.
  • 10. Frequency Table: A frequency table is a listing of possible values for a variable, together with the number of observations and/or relative frequencies for each value. Table 2.1: Frequency of Shark Attacks in Various Regions for 2004–2013.
  • 11. Proportion & Percentage (Relative Frequencies): The proportion of observations falling in a certain category is the number of observations in that category divided by the total number of observations: $\text{proportion} = \dfrac{\text{frequency of that class}}{\text{sum of all frequencies}}$. The percentage is the proportion multiplied by 100. Proportions and percentages are also called relative frequencies.
  • 12. Frequency, Proportion, & Percentage Example: There were 203 reported shark attacks in Florida between 2004 and 2013. Table 2.1 classifies 689 shark attacks reported from 2004 through 2013, so, for Florida: 203 is the frequency; 203/689 = 0.295 is the proportion (relative frequency); 0.295 × 100 = 29.5% is the percentage.
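The Florida arithmetic can be checked with a short Python sketch. The function name `relative_frequency` is a hypothetical helper chosen for illustration; only the counts 203 and 689 come from the slide.

```python
def relative_frequency(count, total):
    """Return (proportion, percentage) for one category of a frequency table."""
    proportion = count / total
    return proportion, proportion * 100


# Counts quoted on the slide: 203 Florida attacks out of 689 in Table 2.1.
proportion, percentage = relative_frequency(203, 689)
print(f"proportion = {proportion:.3f}, percentage = {percentage:.1f}%")
```

Rounded to three and one decimal places respectively, this reproduces the slide's 0.295 and 29.5%.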
  • 13. Discrete Quantitative Variable: To show the distribution for a discrete quantitative variable, we would similarly list the distinct values and the frequency of each one occurring.
  • 14. Continuous Quantitative Variable: For a continuous quantitative variable (or when the number of possible outcomes is very large for a discrete variable), we divide the numeric scale on which the variable is measured into a set of nonoverlapping intervals and count the number of observations that fall in each interval. The frequency table then shows these intervals together with the corresponding counts.
  • 15. Section 2.2: Graphical Summaries of Data
  • 16. Graphs for Categorical Variables: The two primary graphical displays for summarizing a categorical variable are the pie chart and the bar graph. Pie chart: a circle having a “slice of pie” for each category. Bar graph: a graph that displays a vertical bar for each category.
  • 17. Pie Charts: Used for summarizing a categorical variable. Drawn as a circle where each category is represented as a “slice of the pie”. The size of each pie slice is proportional to the percentage of observations falling in that category.
  • 18. Example: Shark Attacks in the United States. Table 2.2: Unprovoked Shark Attacks in the U.S., 2004–2013.
  • 19. Example: Shark Attacks in the United States. Figure 2.1: Pie Chart of Shark Attacks Across U.S. States. The label for each slice of the pie gives the category and the percentage of attacks in a state. The slice that represents the percentage of attacks reported in Hawaii is 13% of the total area of the pie. Question: Why is it beneficial to label the pie wedges with the percent?
  • 20. Bar Graphs: Bar graphs are used for summarizing a categorical variable. Bar graphs display a vertical bar for each category. The height of each bar represents either counts (“frequencies”) or percentages (“relative frequencies”) for that category. It is usually easier to compare categories with a bar graph than with a pie chart. Bar graphs are called Pareto charts when the categories are ordered by their frequency, from the tallest bar to the shortest bar.
  • 21. Example: Shark Attacks in the United States. Figure 2.2: Bar Graph of Shark Attacks Across U.S. States. Except for the Other category, which is shown last, the bars are ordered from largest to smallest based on the frequency of shark attacks. Question: What is the advantage of ordering the bars this way rather than alphabetically?
  • 22. Graphs for Quantitative Variables: Dot plot: shows a dot for each observation placed above its value on a number line. Stem-and-leaf plot: portrays the individual observations. Histogram: uses bars to portray the data.
  • 23. Dot Plots: Dot plots are used for summarizing a quantitative variable. To construct a dot plot: 1. Draw a horizontal line. 2. Label it with the name of the variable. 3. Mark regular values of the variable on it. 4. For each observation, place a dot above its value on the number line.
  • 24. Stem-and-Leaf Plots: Each observation is represented by a stem and a leaf. To construct a stem-and-leaf plot: 1. Sort the data in order from smallest to largest. 2. Place the stems in a column, starting with the smallest. 3. Place a vertical line to their right. 4. On the right side of the vertical line, indicate each leaf (final digit) that has a particular stem. 5. List the leaves in increasing order.
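The five construction steps for a stem-and-leaf plot can be sketched in Python as follows. This is a minimal illustration, assuming non-negative integer data where the leaf is the final digit and the stem is everything before it. The function name `stem_and_leaf` and the sample values are made up for the example and do not come from the slides.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a text stem-and-leaf plot (leaf = final digit)."""
    leaves = defaultdict(list)
    # Step 1: sort, so leaves end up in increasing order within each stem (step 5).
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    rows = []
    # Steps 2-4: stems in a column, a vertical bar, then the leaves.
    # Empty stems between the smallest and largest are kept to preserve the scale.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}")
    return rows


# Hypothetical illustrative values, not taken from the slides.
scores = [63, 47, 52, 79, 61, 58, 70, 63, 84, 75]
print("\n".join(stem_and_leaf(scores)))
```

Each row shows one stem followed by its leaves, so the individual data values remain readable from the plot.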
  • 25. Steps for Constructing a Histogram: A histogram provides a versatile way to picture the distribution of a large data set. 1. Divide the range of the data into intervals of equal width. 2. Count the number of observations in each interval, creating a frequency table. 3. On the horizontal axis, label the values or the endpoints of the intervals. 4. Draw a bar over each value or interval with height equal to its frequency (or percentage), values of which are marked on the vertical axis. 5. Label and title appropriately.
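Steps 1 and 2 (equal-width intervals, then counts) can be sketched in Python using the 20 cereal sodium values that appear later in this transcript. The interval width of 50 mg is an assumption made for illustration and is not necessarily the one used in Table 2.4. `frequency_table` is a hypothetical helper name.

```python
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]


def frequency_table(values, width):
    """Count observations in equal-width intervals [start, start + width)."""
    low = (min(values) // width) * width
    counts = {}
    start = low
    # Step 1: lay out the intervals so they cover the whole range of the data.
    while start <= max(values):
        counts[(start, start + width)] = 0
        start += width
    # Step 2: count the observations that fall in each interval.
    for value in values:
        start = low + ((value - low) // width) * width
        counts[(start, start + width)] += 1
    return counts


table = frequency_table(sodium, 50)  # width of 50 is an illustrative choice
for (lo, hi), count in table.items():
    # A row of '#' characters stands in for the histogram bar.
    print(f"[{lo}, {hi}): {'#' * count} {count} ({count / len(sodium):.0%})")
```

Changing `width` shows how the choice of intervals alters the apparent shape of the histogram.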
  • 26. Example: Sodium in Cereals. Table 2.4: Frequency Table for Sodium in 20 Breakfast Cereals. The table summarizes the sodium values using eight intervals and lists the number of observations in each, as well as the proportions and percentages.
  • 27. Histogram for Sodium in Cereals: Figure 2.6: Histogram of Breakfast Cereal Sodium Values. The rectangular bar over an interval has height equal to the number of observations in the interval.
  • 28. Choosing a Graph Type: How do we decide which to use? Here are some guidelines. Dot plot and stem-and-leaf plot: more useful for small data sets; data values are retained. Histogram: more useful for large data sets; most compact display; more flexibility in defining intervals.
  • 29. Interpreting Histograms: The overall pattern consists of center, spread, and shape. Assess where a distribution is centered by finding the median (50% of the data below the median and 50% above). Assess the spread of a distribution. Describe the shape of a distribution: roughly symmetric, skewed to the right, or skewed to the left.
  • 30. Shape: A distribution is unimodal if it has a single mound or peak. A distribution is bimodal if it has two distinct mounds. The value that occurs the most often is called the mode.
  • 31. Shape: A distribution is symmetric if the left and right sides of the histogram are mirror images of each other. A distribution is skewed to the left if the left tail is longer than the right tail. A distribution is skewed to the right if the right tail is longer than the left tail.
  • 32. Examples of Skewness
  • 33. Time Plots: A data set collected over time is called a time series. Time plots are used to display time series data graphically. A time plot charts each observation on the vertical scale against the time it was measured on the horizontal scale. Common patterns in the data over time, known as trends, should be noted. To see a trend more clearly, it is beneficial to connect the data points in their time sequence.
  • 34. Time Plot Example: The figure shows a time plot of the incidence rate from 1980 to 2012, based on numbers reported by the Centers for Disease Control and Prevention.
  • 35. Section 2.3: Measuring the Center of Quantitative Data
  • 36. Mean: The mean is the sum of the observations divided by the number of observations: $\bar{x} = \dfrac{\sum x}{n}$.
  • 37. Example: Center of the Cereal Sodium Data. We find the mean by adding all the observations and then dividing this sum by the number of observations, which is 20. The observations are 0, 340, 70, 140, 200, 180, 210, 150, 100, 130, 140, 180, 190, 160, 290, 50, 220, 180, 200, 210. Mean = (0 + 340 + 70 + … + 210)/20 = 3340/20 = 167.
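The sum of 3340 and the mean of 167 can be verified directly. This is a minimal sketch in Python using the 20 values listed on the slide; the variable names are arbitrary.

```python
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

total = sum(sodium)           # sum of the observations
n = len(sodium)               # number of observations
mean = total / n              # x-bar = (sum of x) / n
print(total, n, mean)         # 3340 20 167.0
```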
  • 38. Median: The median is the middle value of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest). How to determine the median: put the n observations in order of their size. If the number of observations, n, is odd, then the median is the middle observation; if n is even, then the median is the average of the two middle observations.
  • 39. Outlier: An outlier is an observation that falls well above or well below the overall bulk of the data.
  • 40. Example: CO2 Pollution. CO2 pollution levels in the 9 largest nations, measured in metric tons per person: the CO2 values have n = 9 observations. The ordered values are 0.3, 0.4, 0.8, 1.4, 1.8, 2.1, 5.9, 11.6, 16.9.
  • 41. Example: CO2 Pollution. Since n is odd, the median is the middle value, 1.8. The relatively high value of 16.9 falls well above the rest of the data. It is an outlier. The size of the outlier affects the calculation of the mean but not the median.
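The odd/even rule for the median, and the point that an outlier's size moves the mean but not the median, can be illustrated with a short Python sketch on the nine CO2 values. Replacing 16.9 with 100.0 is a hypothetical change made only to demonstrate the effect; the helper names are arbitrary.

```python
def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(values):
    return sum(values) / len(values)


co2 = [0.3, 0.4, 0.8, 1.4, 1.8, 2.1, 5.9, 11.6, 16.9]
# Hypothetical variant: make the outlier far more extreme.
co2_extreme = co2[:-1] + [100.0]

print(f"original: median = {median(co2)}, mean = {mean(co2):.2f}")
print(f"extreme:  median = {median(co2_extreme)}, mean = {mean(co2_extreme):.2f}")
```

The median stays at 1.8 in both cases, while the mean rises from about 4.58 to about 13.81.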
  • 42. Comparing the Mean and Median: The shape of a distribution influences whether the mean is larger or smaller than the median. Perfectly symmetric: the mean equals the median. Skewed to the left: the mean is smaller than the median. Skewed to the right: the mean is larger than the median.
  • 43. Comparing the Mean and Median: In a skewed distribution, the mean is farther out in the long tail than is the median. For skewed distributions the median is preferred because it better represents what is typical. Figure 2.9: Relationship Between the Mean and Median. Question: For skewed distributions, what causes the mean and median to differ?
  • 44. Resistant Measures: A numerical summary measure is resistant if extreme observations (outliers) have little, if any, influence on its value. The median is resistant to outliers. The mean is not resistant to outliers.
  • 45. The Mode: The mode is the value that occurs most often. It corresponds to the highest bar in the histogram. The mode is most often used with categorical data.
  • 46. Section 2.4: Measuring the Variability of Quantitative Data
  • 47. Range: One way to measure the variability of a distribution is to calculate the range. The range is the difference between the largest and smallest values in the data set: Range = maximum value − minimum value. The range is simple to compute and easy to understand, but it uses only the extreme values and ignores the other values. Therefore, it is affected severely by outliers.
  • 48. Standard Deviation: The deviation of an observation $x$ from the mean $\bar{x}$ is $(x - \bar{x})$, the difference between the observation and the sample mean. Each observation has a deviation from the mean. A deviation is positive if the value falls above the mean and negative if the value falls below the mean. The sum of the deviations for all the values in a data set is always zero.
  • 49. Standard Deviation: The deviation of an observation $x$ from the mean $\bar{x}$ is $(x - \bar{x})$, the difference between the observation and the sample mean. Summary measures of variability from the mean use either the squared deviations or their absolute values. The average of the squared deviations is called the variance. The symbol $\sum (x - \bar{x})^2$ is called a sum of squares.
  • 50. Standard Deviation Example: For the cereal sodium values, the mean is $\bar{x} = 167$. The observation of 210 for Honeycomb has a deviation of 210 − 167 = 43. The observation of 50 for Honey Smacks has a deviation of 50 − 167 = −117. Figure 2.11 shows these deviations. Figure 2.11: Dot Plot for Cereal Sodium Data, Showing Deviations for Two Observations. Question: When is a deviation positive and when is it negative?
  • 51. The Standard Deviation s of n Observations: The standard deviation gives a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$. The larger the standard deviation, $s$, the greater the variability of the data.
  • 52. Standard Deviation Example: Women’s and Men’s Ideal Number of Children. Men: 0, 0, 0, 2, 4, 4, 4. Women: 0, 2, 2, 2, 2, 2, 4. Both men and women have a mean of 2 and a range of 4. The standard deviation for men is $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{24}{6}} = 2.0$. The standard deviation for women is 1.2.
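The two standard deviations can be recomputed from the formula with n − 1 in the denominator. This is a minimal Python sketch; `sample_std` is a hypothetical helper name, and the data are the two lists from the slide.

```python
import math


def sample_std(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))


men = [0, 0, 0, 2, 4, 4, 4]
women = [0, 2, 2, 2, 2, 2, 4]

print(f"men:   s = {sample_std(men):.1f}")    # sqrt(24/6) = 2.0
print(f"women: s = {sample_std(women):.1f}")  # sqrt(8/6), about 1.2
```

Both groups share the same mean and range, so the standard deviation is what separates them here: the men's answers sit at the extremes, while the women's cluster at 2.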
  • 53. Properties of the Standard Deviation: The most basic property of the standard deviation is this: the larger the standard deviation $s$, the greater the variability of the data. $s$ measures the spread of the data. $s = 0$ only when all observations have the same value; otherwise $s > 0$. As the spread of the data increases, $s$ gets larger. $s$ has the same units of measurement as the original observations. The variance, $s^2$, has units that are squared. $s$ is not resistant: strong skewness or a few outliers can greatly increase $s$.
  • 54. Magnitude of s: The Empirical Rule. If a distribution of data is bell-shaped, then approximately: 68% of the observations fall within 1 standard deviation of the mean, that is, between the values of $\bar{x} - s$ and $\bar{x} + s$ (denoted $\bar{x} \pm s$); 95% of the observations fall within 2 standard deviations of the mean ($\bar{x} \pm 2s$); all or nearly all observations fall within 3 standard deviations of the mean ($\bar{x} \pm 3s$).
  • 55. Magnitude of s: The Empirical Rule. Figure 2.12: The Empirical Rule. For bell-shaped distributions, this tells us approximately how much of the data fall within 1, 2, and 3 standard deviations of the mean. Question: About what percentage would fall more than 2 standard deviations from the mean?
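One way to see the empirical rule at work is to simulate bell-shaped data and count how much of it lies within 1, 2, and 3 standard deviations of the mean. The sketch below is in Python. The simulated normal data, with an assumed mean of 100 and standard deviation of 15, and the helper `share_within` are illustrative assumptions, not part of the slides.

```python
import random
import statistics


def share_within(values, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    inside = sum(1 for x in values if abs(x - mean) <= k * s)
    return inside / len(values)


# Simulated bell-shaped data; the seed only makes the run repeatable.
rng = random.Random(0)
data = [rng.gauss(100, 15) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} s of the mean: {share_within(data, k):.1%}")
```

The printed shares should land close to 68%, 95%, and nearly 100%. Running the same function on strongly skewed data would typically give different shares, because the rule applies only to bell-shaped distributions.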
  • 56. Copyright © 2017, 20013, and 2009, Pearson Education, Inc. 2-56 Section 2.5 Using Measures of Position to Describe Variability
  • 57. 2-57 Copyright © 2017, 20013, and 2009, Pearson Education, Inc. The th percentile is a value such that percent of the observations fall below or at that value. Percentile p p
  • 58. 2-58 Copyright © 2017, 20013, and 2009, Pearson Education, Inc. Quartiles Figure 2.14 The Quartiles Split the Distribution Into Four Parts. 25% is below the first quartile (Q1), 25% is between the first quartile and the second quartile (the median, Q2), 25% is between the second quartile and the third quartile (Q3), and 25% is above the third quartile. Question: Why is the second quartile also the median?
  • 59. SUMMARY: Finding Quartiles. (1) Arrange the data in order. (2) Consider the median. This is the second quartile, Q2. (3) Consider the lower half of the observations (excluding the median itself if n is odd). The median of these observations is the first quartile, Q1. (4) Consider the upper half of the observations (excluding the median itself if n is odd). Their median is the third quartile, Q3.
  • 60. Example: Cereal Sodium Data. Consider the sodium values for the 20 breakfast cereals. What are the quartiles for the 20 cereal sodium values? From Table 2.3, the sodium values are listed on the slide in ascending order, in two rows of ten. The median of the 20 values is the average of the 10th and 11th observations, 180 and 180, which is Q2 = 180 mg. The first quartile Q1 is the median of the 10 smallest observations (in the top row), which is the average of 130 and 140, Q1 = 135 mg. The third quartile Q3 is the median of the 10 largest observations (in the bottom row), which is the average of 200 and 210, Q3 = 205 mg.
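The quartile procedure from the summary slide can be sketched in a few lines. The function names and the two small data sets are hypothetical. Statistical software often uses slightly different quartile conventions, so results may differ from this split-the-halves method.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the split-the-halves method."""
    data = sorted(values)            # step 1: arrange the data in order
    n = len(data)
    half = n // 2
    lower = data[:half]              # lower half (median excluded if n is odd)
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)

# Hypothetical data sets, one with even n and one with odd n.
even_q = quartiles([2, 4, 4, 5, 7, 8, 9, 10, 12, 15])
odd_q = quartiles([1, 3, 5, 7, 9, 11, 13])
print("even n:", even_q)
print("odd n: ", odd_q)

# The averages quoted in the cereal example can be checked the same way.
q1_cereal = median([130, 140])   # the two middle values of the lower half
q3_cereal = median([200, 210])   # the two middle values of the upper half
print("cereal Q1, Q3:", q1_cereal, q3_cereal)
```

The last two lines reproduce only the averaging step of the cereal example (135 mg and 205 mg); the full list of 20 sodium values is not needed for that check.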
  • 61. The Interquartile Range (IQR). The interquartile range is the distance between the third quartile and the first quartile: IQR = Q3 − Q1. The IQR gives the spread of the middle 50% of the data. In words: if the interquartile range of U.S. music teacher salaries equals $16,000, this means that the middle 50% of the distribution stretches over a distance of $16,000.
  • 62. Detecting Potential Outliers. Examining the data for unusual observations, such as outliers, is important in any statistical analysis. Is there a formula for flagging an observation as potentially being an outlier? The 1.5 × IQR Criterion for Identifying Potential Outliers: an observation is a potential outlier if it falls a distance of more than 1.5 × IQR below the first quartile or a distance of more than 1.5 × IQR above the third quartile.
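Applied to the cereal sodium quartiles given earlier (Q1 = 135 mg, Q3 = 205 mg), the 1.5 × IQR criterion works out as follows. The function names are hypothetical, and the value 320 is a made-up observation used only to show the upper cutoff.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs from the 1.5 x IQR criterion."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_potential_outlier(x, q1, q3):
    """True if x lies more than 1.5 x IQR below Q1 or above Q3."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high

q1, q3 = 135, 205                    # quartiles from the cereal sodium example
low, high = outlier_fences(q1, q3)   # IQR = 70, so the cutoffs are 30 and 310
print(f"IQR = {q3 - q1} mg, cutoffs: below {low} mg or above {high} mg")

for value in (0, 50, 320):           # 320 is a hypothetical observation
    print(value, "potential outlier" if is_potential_outlier(value, q1, q3) else "not flagged")
```

With cutoffs at 30 mg and 310 mg, a sodium value of 0 is flagged while a value of 50 is not.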
  • 63. Five-Number Summary of Positions. The five-number summary is the basis of a graphical display called the box plot, and consists of the minimum value, first quartile, median, third quartile, and maximum value.
  • 64. SUMMARY: Constructing a Box Plot. (1) A box goes from Q1 to Q3. (2) A line is drawn inside the box at the median. (3) A line goes from the lower end of the box to the smallest observation that is not a potential outlier and from the upper end of the box to the largest observation that is not a potential outlier. (4) The potential outliers are shown separately.
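The construction steps above translate into a small calculation of the numbers a box plot needs. This is a sketch under stated assumptions: `box_plot_summary` is a hypothetical helper, the data are made up, and the quartiles use the split-the-halves method from the earlier summary slide.

```python
from statistics import median

def box_plot_summary(values):
    """Compute the box, median line, whisker ends, and potential outliers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q2 = median(data)
    q3 = median(data[half + 1:] if n % 2 == 1 else data[half:])

    iqr = q3 - q1
    low_cut, high_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Whiskers stop at the most extreme observations that are not flagged.
    inside = [x for x in data if low_cut <= x <= high_cut]
    outliers = [x for x in data if x < low_cut or x > high_cut]

    return {
        "box": (q1, q3),
        "median": q2,
        "whiskers": (inside[0], inside[-1]),
        "potential_outliers": outliers,
    }

# Hypothetical data with one unusually small and one unusually large value.
summary = box_plot_summary([1, 10, 12, 13, 14, 15, 16, 18, 19, 40])
for part, value in summary.items():
    print(f"{part}: {value}")
```

For these data the box runs from 12 to 18, the whiskers end at 10 and 19, and the values 1 and 40 are plotted separately as potential outliers.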
  • 65. Example: Box Plot for Cereal Sodium Data. Figure 2.15 shows a box plot for the sodium values. Labels are also given for the five-number summary of positions. Figure 2.15: Box Plot and Five-Number Summary for 20 Breakfast Cereal Sodium Values. The central box contains the middle 50% of the data. The line in the box marks the median. Whiskers extend from the box to the smallest and largest observations that are not identified as potential outliers. Potential outliers are marked separately. Question: Why is the left whisker drawn down only to 50 rather than to 0?
  • 66. Side-by-Side Box Plots Help to Compare Groups. A box plot does not portray certain features of a distribution, such as distinct mounds and possible gaps, as clearly as does a histogram. Box plots are useful for identifying potential outliers. Figure 2.16: Box Plots of Male and Female College Student Heights. The box plots use the same scale for height. Question: What are approximate values of the quartiles for the two groups?
  • 67. Z-Score. The z-score also identifies position and potential outliers. The z-score for an observation is the number of standard deviations that it falls from the mean. A positive z-score indicates the observation is above the mean. A negative z-score indicates the observation is below the mean. For sample data, the z-score is calculated as: z = (observation − mean) / (standard deviation). An observation from a bell-shaped distribution is a potential outlier if its z-score < −3 or > +3 (3 standard deviation criterion).
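A minimal sketch of the z-score calculation and the 3 standard deviation criterion follows. The mean of 100 and standard deviation of 15 are made-up values for illustration, and the function names are hypothetical.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation falls from the mean."""
    return (observation - mean) / sd

def is_potential_outlier(observation, mean, sd):
    """3 standard deviation criterion (intended for bell-shaped distributions)."""
    return abs(z_score(observation, mean, sd)) > 3

# Hypothetical summary statistics for a bell-shaped sample.
mean, sd = 100, 15

for x in (85, 130, 150):
    z = z_score(x, mean, sd)
    flag = "potential outlier" if is_potential_outlier(x, mean, sd) else "not flagged"
    print(f"observation {x}: z = {z:+.2f} ({flag})")
```

An observation of 85 lies one standard deviation below the mean (z = −1), while 150 lies more than three standard deviations above it and is flagged.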
  • 68. Section 2.6: Recognizing and Avoiding Misuses of Graphical Summaries
  • 69. Summary: Guidelines for Constructing Effective Graphs. (1) Label both axes and provide proper headings. (2) To better compare relative size, the vertical axis should start at 0. (3) Be cautious in using anything other than bars, lines, or points. (4) It can be difficult to portray more than one group on a single graph when the variable values differ greatly. Consider instead using separate graphs, or plotting relative sizes such as ratios or percentages.
  • 70. Example: Beware of Poor Graphs. Figure 2.18: An Example of a Poor Graph. Question: What’s misleading about the way the data are presented?
  • 71. Example: Beware of Poor Graphs. Figure 2.19: A Better Graph for the Data in Figure 2.18. Question: What trends do you see in the enrollments from 2004 to 2012?