AP Review Exploring Data
Describing a Distribution: Discuss center, shape, and spread in context. Center: mean or median. Shape: roughly symmetric, right-skewed, or left-skewed. Spread: standard deviation, IQR, or range.
Checking for Outliers: A survey was conducted to gather ratings of the quality of service at local restaurants at a nearby mall. Respondents were to rate overall service using values between 0 (terrible) and 100 (excellent). The five-number summary is 32, 47.5, 51, 63.5, 92. The data values above Q3 are 65, 66, 70, 71, and 92. Are there outliers on the high end?
Checking for Outliers: A value is a high outlier if it exceeds Q3 + 1.5(IQR). Here IQR = 63.5 – 47.5 = 16, so the cutoff is 63.5 + 1.5(16) = 87.5. Since 92 > 87.5, 92 is an outlier; 65, 66, 70, and 71 are not.
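The 1.5(IQR) check can be sketched in a few lines of Python. The function name `upper_fence` is an illustrative assumption; the numbers are the ones given on the slide.

```python
def upper_fence(q1, q3, k=1.5):
    """Return the cutoff above which a value counts as a high outlier."""
    iqr = q3 - q1
    return q3 + k * iqr

# Quartiles from the five-number summary 32, 47.5, 51, 63.5, 92
q1, q3 = 47.5, 63.5
fence = upper_fence(q1, q3)

# The data values above Q3, as listed on the slide
high_values = [65, 66, 70, 71, 92]
high_outliers = [x for x in high_values if x > fence]

print(fence)          # 87.5
print(high_outliers)  # [92]
```

The same rule on the low end would use Q1 – 1.5(IQR) as the cutoff.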
Robust and Sensitive Statistics: Robust (resistant to extreme values): median, IQR. Sensitive (affected by extreme values): mean, standard deviation s, range.
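A small illustration of the difference, using made-up ratings (the data below are hypothetical, not from the review): replacing the largest value with an extreme one leaves the median alone but moves the mean, s, and range.

```python
import statistics

# Hypothetical ratings, invented for illustration only
ratings = [47, 50, 51, 53, 55]
with_extreme = [47, 50, 51, 53, 155]  # largest value replaced by an extreme one

for label, data in [("original", ratings), ("with extreme value", with_extreme)]:
    print(label)
    print("  median:", statistics.median(data))  # robust
    print("  mean:  ", statistics.mean(data))    # sensitive
    print("  s:     ", statistics.stdev(data))   # sensitive
    print("  range: ", max(data) - min(data))    # sensitive
```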
Parameters and Statistics Parameters are numerical values that describe a population. Statistics are numerical values that describe a sample.
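Z-Scores and Percentiles (Barron’s p. 41 #10): Assuming that batting averages have a bell-shaped distribution, arrange in ascending order: I. An average with a z-score of –1. II. An average with a percentile rank of 20%. III. An average at the first quartile, Q1. Answer: I, II, III, because a z-score of –1 sits at about the 16th percentile, which is below the 20th percentile, which in turn is below Q1 (the 25th percentile).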
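One way to check the ordering is to put all three items on the z-score scale. This is a sketch that assumes an exactly normal distribution; it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Express each item as a z-score
z_scores = {
    "I": -1.0,                        # given directly
    "II": std_normal.inv_cdf(0.20),   # 20th percentile, about -0.84
    "III": std_normal.inv_cdf(0.25),  # first quartile, about -0.67
}

# Sort the labels from lowest to highest z-score
order = sorted(z_scores, key=z_scores.get)
print(order)  # ['I', 'II', 'III']

# Equivalent check on the percentile scale: z = -1 is about the 16th percentile
print(round(std_normal.cdf(-1.0) * 100, 1))
```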
Normal Distribution (Barron’s p. 367 #3): The average yearly snowfall in a city is 55 inches. What is the standard deviation if 15% of the years have snowfalls above 60 inches? Assume yearly snowfalls are normally distributed.
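A sketch of one way to work the snowfall problem: 15% above 60 inches means 60 is at the 85th percentile, so find the z-score for 0.85 and solve z = (60 – 55)/σ for σ. The variable names are illustrative; the calculation uses `statistics.NormalDist` in place of a z-table.

```python
from statistics import NormalDist

mean = 55.0        # average yearly snowfall, in inches
cutoff = 60.0      # snowfall exceeded in 15% of years
upper_tail = 0.15

# z-score with 85% of the area below it
z = NormalDist().inv_cdf(1 - upper_tail)

# Solve z = (cutoff - mean) / sigma for sigma
sigma = (cutoff - mean) / z

print(round(z, 3))      # about 1.036
print(round(sigma, 2))  # about 4.82 inches

# Sanity check: the fitted distribution should put about 15% above 60
print(round(1 - NormalDist(mean, sigma).cdf(cutoff), 3))
```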
Linear Regression: Don’t forget about the formulas on the chart. r is the correlation coefficient; r^2 is the coefficient of determination. r has no units. A strong r indicates association, not causation. r is unchanged if x and y are reversed, if a constant is added to or subtracted from each x (or each y), or if each x (or each y) is multiplied or divided by the same positive constant; multiplying by a negative constant changes only the sign of r.
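These properties of r can be checked numerically. The sketch below computes r from its definition on made-up data (the x and y values are hypothetical); the helper name `correlation` is an assumption for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
r_swapped = correlation(y, x)                            # x and y reversed
r_rescaled = correlation([2.54 * v + 10 for v in x], y)  # change of units on x
r_negated = correlation([-v for v in x], y)              # negative multiplier

print(r, r_swapped, r_rescaled)  # all equal up to rounding
print(r_negated)                 # same size, opposite sign
```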
Linear Regression: r^2 is the percent of the variation in the dependent variable, y, that is explained by the linear relationship (LSRL) with the independent variable, x. PUT IN CONTEXT! When discussing r, describe the relationship between x and y as a weak, moderate, or strong linear relationship.
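As a sketch of interpreting r^2 in context, the code below computes r and r^2 for made-up data and prints an interpretation sentence. The data and the variable names "hours studied" and "quiz score" are hypothetical.

```python
import math

# Hypothetical data, invented for illustration only
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 5, 4, 5]

n = len(hours_studied)
mean_x = sum(hours_studied) / n
mean_y = sum(quiz_score) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours_studied, quiz_score))
sxx = sum((x - mean_x) ** 2 for x in hours_studied)
syy = sum((y - mean_y) ** 2 for y in quiz_score)

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"About {100 * r_squared:.0f}% of the variation in quiz score is explained "
      f"by the linear relationship with hours studied.")
```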
Linear Regression: Influential point: pulls the regression line toward it; an influential point is usually an outlier in the x-direction. Outlier: usually extreme in the y-direction, so it shows up in the residual plot as a large residual.
Linear Regression: When performing linear regression, do the following: (1) create a scatterplot; (2) calculate the equation of the regression line; (3) plot the residuals. A residual is the observed y – predicted y.
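The calculation steps can be sketched as follows, using the standard least-squares formulas for slope and intercept on made-up data (the x and y values are hypothetical, and `least_squares_line` is an illustrative helper name). Plotting is omitted; the residuals are printed against x instead.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
print(f"predicted y = {a:.1f} + {b:.1f}x")

# Residual = observed y - predicted y
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, res in zip(x, residuals):
    print(f"x = {xi}: residual = {res:+.1f}")
```

For a least-squares line the residuals sum to zero, which is a quick check on the arithmetic.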
Barron’s Problems: Multiple Choice, p. 370 #13, 14, 16, 19, 21, 24, 27, 30, 38. Free Response, p. 430 #2.
