• Objective: To use the techniques of exploratory data analysis (EDA), including boxplots and five-number summaries, to discover various aspects of data.
Traditional data analysis
• Organizes data using a frequency distribution
• Graphic representation: histogram, frequency polygon, ogive
• Purpose: to confirm various conjectures about the nature of the data
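A minimal sketch of organizing data into a frequency distribution, assuming whole-number data and equal-width classes. The function name `frequency_distribution`, the starting value 20, and the class width 10 are illustrative choices, not part of the lesson. The data are the stockbroker's daily client counts from the example in this section.

```python
from collections import Counter


def frequency_distribution(data, start, width):
    """Count whole-number values in classes start..start+width-1, and so on.

    Assumes every value is >= start. Empty classes between the first and
    last class are kept with a frequency of 0.
    """
    counts = Counter((x - start) // width for x in data)  # class index -> frequency
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        upper = lower + width - 1
        table.append((lower, upper, counts.get(k, 0)))
    return table


clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]
for lower, upper, f in frequency_distribution(clients, start=20, width=10):
    print(f"{lower}-{upper}: {f}")
```

A histogram, frequency polygon, or ogive would then be drawn from these class frequencies.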
Exploratory data analysis
• Organizes data using a stem-and-leaf plot
• Measure of central tendency used: median
• Measure of variation: interquartile range, IQR = Q3 – Q1
• Graphic representation: box-and-whisker plot (boxplot)
• Purpose: to examine the data and find out what can be discovered about it, such as its center and spread
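As a rough sketch of the stem-and-leaf organization used in exploratory analysis, the function below splits two-digit whole numbers into a tens-digit stem and a ones-digit leaf. The function name `stem_and_leaf` and the `stem | leaves` layout are assumptions for illustration; the data are the stockbroker counts from the example in this section.

```python
def stem_and_leaf(data):
    """Return stem-and-leaf rows for two-digit whole numbers.

    The stem is the tens digit and the leaf is the ones digit. Leaves are
    listed in increasing order, and stems with no leaves are still shown
    so that gaps in the data stay visible.
    """
    ordered = sorted(data)
    stems = {}
    for x in ordered:
        stems.setdefault(x // 10, []).append(x % 10)
    rows = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]
print("\n".join(stem_and_leaf(clients)))
```

Reading down the leaves gives the ordered data directly, which is why this display pairs naturally with the median and quartiles.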
• Five-number summary (the five values used):
1. Minimum value
2. Q1 (first quartile)
3. Median (Q2)
4. Q3 (third quartile)
5. Maximum value
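A minimal sketch of computing the five-number summary, assuming the median-of-halves method for quartiles (Q1 is the median of the values below the median, Q3 the median of the values above it, with the median left out of both halves when n is odd). That convention matches the worked example in this section; other software, such as NumPy's default percentile interpolation, can give slightly different quartiles. The function name and the data set `[2, 4, 6, 8, 10, 12]` are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Quartiles use the median-of-halves method: when n is odd, the middle
    value is excluded from both the lower and the upper half.
    """
    x = sorted(data)
    n = len(x)
    lower = x[: n // 2]        # values below the median
    upper = x[(n + 1) // 2 :]  # values above the median
    return (x[0], median(lower), median(x), median(upper), x[-1])


values = [2, 4, 6, 8, 10, 12]  # hypothetical data set with an even count
mn, q1, med, q3, mx = five_number_summary(values)
print(mn, q1, med, q3, mx)  # 2 4 7.0 10 12
print("IQR =", q3 - q1)     # IQR = 6
```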
• A boxplot is a graph of a data set obtained by:
1. Drawing a horizontal line from the minimum value to Q1
2. Drawing a horizontal line from Q3 to the maximum value
3. Drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through Q2 (the median)
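A plotting library would normally draw a boxplot, but to keep the three construction steps visible, the sketch below draws a rough one-line text boxplot from a five-number summary. The function name `ascii_boxplot`, the character symbols, and the scale limits 20 to 60 are assumptions made for illustration; the summary passed in is the one for the stockbroker example in this section.

```python
def ascii_boxplot(summary, lo, hi, width=41, step=10):
    """Return a rough two-line text boxplot: the plot row and a labeled scale.

    Values are mapped linearly from [lo, hi] onto `width` columns, so
    positions are accurate only to one column, and all five values must lie
    in [lo, hi]. Symbols: '|' marks the minimum, median and maximum,
    '[' and ']' are the box sides at Q1 and Q3, '-' draws the two lines
    outside the box and '=' fills the box.
    """
    mn, q1, med, q3, mx = summary

    def col(v):
        return round((v - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(mn), col(q1)):          # line from the minimum to Q1
        row[c] = "-"
    for c in range(col(q3) + 1, col(mx) + 1):  # line from Q3 to the maximum
        row[c] = "-"
    for c in range(col(q1) + 1, col(q3)):      # interior of the box
        row[c] = "="
    row[col(q1)], row[col(q3)] = "[", "]"      # box sides through Q1 and Q3
    for v in (mn, med, mx):                    # minimum, median, maximum
        row[col(v)] = "|"

    axis = [" "] * (width + len(str(hi)))      # room for the last label
    for v in range(lo, hi + 1, step):
        axis[col(v):col(v) + len(str(v))] = str(v)
    return "".join(row).rstrip() + "\n" + "".join(axis).rstrip()


print(ascii_boxplot((23, 29, 33, 42, 51), lo=20, hi=60))
```

A one-unit-per-column scale (width 41 for a range of 40) keeps every value in this example on an exact column.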
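• Example: A stockbroker recorded the number of clients she saw each day over an 11-day period. The data are shown. Construct a boxplot for the data.
• 33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31
1. Order the data: 23, 27, 29, 30, 31, 33, 38, 40, 42, 43, 51
2. Find the median, Q1, and Q3:
▪ Median = 33 (the 6th of the 11 ordered values)
▪ Q1 = 29 (the median of the lower half: 23, 27, 29, 30, 31)
▪ Q3 = 42 (the median of the upper half: 38, 40, 42, 43, 51)
3. Draw a scale for the data on the x-axis.
4. Locate the five numbers (23, 29, 33, 42, 51) above the scale.
5. Draw a box from Q1 to Q3, draw a vertical line through the median, and draw lines from the minimum value to Q1 and from Q3 to the maximum value.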
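Steps 1 and 2 of the example can be checked with a short script. This is a sketch that assumes the same median-of-halves rule used in the example (the median is excluded from both halves because n = 11 is odd); the variable names are illustrative.

```python
from statistics import median

clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]

# Step 1: order the data.
ordered = sorted(clients)
n = len(ordered)  # 11 days, so the median is the 6th ordered value

# Step 2: Q1 and Q3 are the medians of the lower and upper halves,
# leaving the median itself out of both halves since n is odd.
med = median(ordered)
q1 = median(ordered[: n // 2])        # 23, 27, 29, 30, 31
q3 = median(ordered[(n + 1) // 2 :])  # 38, 40, 42, 43, 51

print(ordered)
print("median =", med, "Q1 =", q1, "Q3 =", q3)  # median = 33 Q1 = 29 Q3 = 42
print("five-number summary:", (ordered[0], q1, med, q3, ordered[-1]))
print("IQR =", q3 - q1)  # IQR = 13
```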
• If the median is near the center of the box, the distribution is approximately symmetric.
• If the median falls to the left of the center of the box, the distribution is positively skewed.
• If the median falls to the right of the center of the box, the distribution is negatively skewed.
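The median-position rules can be written as a small classifier. The slides do not say how close counts as "near the center", so the tolerance `tol` (10% of the box width) is an assumption, as is the function name `skew_from_median`.

```python
def skew_from_median(q1, med, q3, tol=0.1):
    """Classify skewness from where the median sits inside the box.

    Assumption: the median counts as "near the center" when it lies within
    tol * (Q3 - Q1) of the box's midpoint.
    """
    center = (q1 + q3) / 2
    if abs(med - center) <= tol * (q3 - q1):
        return "approximately symmetric"
    if med < center:
        return "positively skewed"  # median left of center
    return "negatively skewed"      # median right of center


# Stockbroker example: the box runs from 29 to 42, so its center is 35.5.
print(skew_from_median(29, 33, 42))  # positively skewed
```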
• If the lines (whiskers) are approximately the same length, the distribution is approximately symmetric.
• If the right line is longer than the left, the distribution is positively skewed.
• If the left line is longer than the right, the distribution is negatively skewed.
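A similar sketch applies the whisker-length rules. Again, the tolerance for "approximately the same length" (within 10% of the longer line) and the function name `skew_from_whiskers` are assumptions, not part of the lesson.

```python
def skew_from_whiskers(mn, q1, q3, mx, tol=0.1):
    """Classify skewness by comparing the lines outside the box.

    Assumption: the lines count as about equal when their lengths differ
    by at most tol times the longer line.
    """
    left = q1 - mn   # line from the minimum to Q1
    right = mx - q3  # line from Q3 to the maximum
    if abs(right - left) <= tol * max(left, right):
        return "approximately symmetric"
    if right > left:
        return "positively skewed"
    return "negatively skewed"


# Stockbroker example: left line 29 - 23 = 6, right line 51 - 42 = 9.
print(skew_from_whiskers(23, 29, 42, 51))  # positively skewed
```

Both rules point the same way for the stockbroker data; when they disagree, the boxplot alone does not give a clear verdict.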
• A resistant statistic is one that is relatively unaffected by extreme values (outliers) or by an extremely skewed distribution.
• Examples: median and IQR (from the five-number summary)
• A nonresistant statistic is one that is affected by extreme values or by an extremely skewed distribution.
• Examples: mean, standard deviation
• When a distribution is skewed or contains outliers, the median and IQR summarize the data more accurately than the mean and standard deviation.
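To illustrate resistance, the sketch below compares the four statistics on the stockbroker data before and after introducing a hypothetical outlier (the 51 replaced by 151; this value is invented for the demonstration). The `iqr` helper is an assumed name and uses the median-of-halves quartiles from the example.

```python
from statistics import mean, median, stdev


def iqr(data):
    """Q3 - Q1 using medians of the lower and upper halves."""
    x = sorted(data)
    n = len(x)
    return median(x[(n + 1) // 2 :]) - median(x[: n // 2])


clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]
# Hypothetical outlier: the busiest day recorded as 151 instead of 51.
with_outlier = [151 if x == 51 else x for x in clients]

for name, data in [("original", clients), ("with outlier", with_outlier)]:
    print(f"{name:>12}: mean={mean(data):.2f}  sd={stdev(data):.2f}  "
          f"median={median(data)}  IQR={iqr(data)}")
# The median (33) and IQR (13) do not change; the mean and SD both increase.
```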
• pp. 157-158, #1, 5, 7-10, 13, 15

3.5 Exploratory Data Analysis
