Displaying quantitative data 1.1: Presentation Transcript

  • Displaying Quantitative Data with Graphs Section 1.1
  • What you’ll learn: to create and interpret the following graphs: dotplot; stem and leaf plot (regular, split, and back-to-back); histogram; time plot; ogive.
  • To learn how to display and describe quantitative data, we will be using some baseball statistics. The following table shows the number of home runs in a single season for three well-known baseball players: Hank Aaron, Barry Bonds, and Babe Ruth.
    Hank Aaron: 13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, 20
    Barry Bonds: 16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73
    Babe Ruth: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
  • Dotplot
    Label the horizontal axis with the name of the variable and title the graph.
    Scale the axis based on the values of the variable.
    Mark a dot (we’ll use x’s) above the number on the axis corresponding to each data value.
    [Figure: dot plot "Number of Home Runs in a Single Season" for Ruth; horizontal axis from 20 to 60]
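The dotplot steps can be sketched in a few lines of Python. This is only an illustration, not the tool used for the slides: the function name `dotplot_rows` is made up here, and the plot is drawn sideways (one row per value on the scale) so that it fits in plain text. Values with no observations still get a row, which keeps gaps visible.

```python
from collections import Counter

# Babe Ruth's season home run totals, copied from the table in this section.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def dotplot_rows(values, mark="x"):
    """Return one text row per integer on the scale, with one mark per observation."""
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        # Counter returns 0 for values that never occur, so empty rows show gaps.
        rows.append(f"{value:>3} {mark * counts[value]}".rstrip())
    return rows

rows = dotplot_rows(ruth)
print("Number of Home Runs in a Single Season (Ruth)")
print("\n".join(rows))
```

Reading down the rows, the stack of three x’s at 46 and the two isolated values near the low end stand out in the same way they do on the slide’s dotplot.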
  • Describing a Distribution
    We describe a distribution (the values the variable takes on and how often it takes these values) using the acronym SOCS.
    Shape: We describe the shape of a distribution in one of two ways. Symmetric/approximately symmetric:
    [Figure: two dot plots on axes running from about -3 to 4, labeled "Symmetric" and "Uniform"]
  • Skewed (right or left):
    [Figure: two dot plots on axes running from about -4 to 4, each with its “tail” marked, labeled "Right Skewed" and "Left Skewed"]
    Notice that the direction of the “skew” is the same direction as the “tail”.
  • Outliers: These are observations that we would consider “unusual”: pieces of data that don’t “fit” the overall pattern of the data.
    Babe Ruth had two seasons that appear to be somewhat different from the rest of his career. These may be “outliers”. (We’ll learn a numerical way to determine if observations are truly “unusual” later.)
    [Figure: dot plot "Number of Home Runs in a Single Season" for Ruth; axis from 20 to 65; annotated "Unusual observation???"]
    The season in which Barry Bonds hit 73 home runs does not appear to fit the overall pattern. This piece of data may be an outlier.
    [Figure: dot plot "Number of Home Runs in a Single Season" for Bonds; axis from 10 to 80; annotated "Unusual observation???"]
  • Center: A single value that describes the entire distribution. A “typical” value that gives a concise summary of the whole batch of numbers.
    [Figure: dot plot "Number of Home Runs in a Single Season" for Ruth; axis from 20 to 65]
    A typical season for Babe Ruth appears to be approximately 46 home runs.
    *We’ll learn about three different numerical measures of center in the next section.
  • Spread: Since we know that not everyone is typical, we also need to talk about the variation of a distribution. Are the values of the distribution tightly clustered around the center, making it easy to predict, or do the values vary a great deal from the center, making prediction more difficult?
    [Figure: dot plot "Number of Home Runs in a Single Season" for Ruth; axis from 20 to 65]
    Babe Ruth’s number of home runs in a single season varies from a low of 22 to a high of 60.
    *We’ll learn about three different numerical measures of spread in the next section.
  • Distribution Description using SOCS
    The distribution of Babe Ruth’s number of home runs in a single season is approximately symmetric [1], with two possible unusual observations at 22 and 25 home runs [2]. He typically hits about 46 home runs in a season [3]. Over his career, the number of home runs has varied from a low of 22 to a high of 60 [4].
    Key: [1] Shape, [2] Outliers, [3] Center, [4] Spread
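The center and spread figures in this description can be checked against the data table. The sketch below is an assumption-laden shortcut: the slides read the “typical” value off the dotplot and defer numerical measures to the next section, so using the median here is our choice of one such measure, not the slides’ method.

```python
import statistics

# Babe Ruth's season home run totals, copied from the table in this section.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

low = min(ruth)                     # smallest season total
high = max(ruth)                    # largest season total
typical = statistics.median(ruth)   # middle value of the sorted data (assumed measure of center)

print(f"Center: about {typical} home runs")
print(f"Spread: from a low of {low} to a high of {high}")
```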
  • Stem and Leaf Plot
    Creating a stem and leaf plot:
    Order the data points from least to greatest.
    Separate each observation into a stem (all but the rightmost digit) and a leaf (the final digit). Ex. 123 -> 12 (stem) : 3 (leaf)
    In a T-chart, write the stems vertically in increasing order on the left side of the chart.
    On the right side of the chart, write each leaf to the right of its stem, spacing the leaves equally.
    Include a key and title for the graph.
    Number of Home Runs in a Single Season: Hank Aaron
    1 | 3
    2 | 04679
    3 | 0244899
    4 | 00444457
    Key: 4 | 6 = 46
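The construction steps above can be sketched as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers so that integer division by 10 gives the stem and the remainder gives the leaf.

```python
from collections import defaultdict

# Hank Aaron's season home run totals, copied from the table in this section.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its leaves, sorted, as one string."""
    stems = defaultdict(list)
    for value in sorted(values):          # step 1: order the data
        stems[value // 10].append(value % 10)   # step 2: split into stem and leaf
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in stems.items()}

plot = stem_and_leaf(aaron)
print("Number of Home Runs in a Single Season: Hank Aaron")
for stem in range(min(plot), max(plot) + 1):
    # Stems with no leaves are still printed so that gaps stay visible.
    print(f"{stem} | {plot.get(stem, '')}")
print("Key: 4 | 6 = 46")
```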
  • Split Stem and Leaf Plot
    If the data in a distribution are concentrated in just a few stems, the picture may be more descriptive if we “split” the stems.
    When we “split” stems, we want the same number of leaf digits to be possible in each stem. This means that each original stem can be split into 2 or 5 new stems.
    A good rule of thumb is to have a minimum of 5 stems overall.
    Let’s look at how splitting stems changes the look of the distribution of Hank Aaron’s home run data.
  • Split each stem into 2 new stems. This means that the first stem includes the leaves 0-4 and the second stem has the leaves 5-9. Splitting the stems helps us to “see” the shape of the distribution in this case.
    Number of Home Runs in a Single Season: Hank Aaron
    1 | 3
    1 |
    2 | 04
    2 | 679
    3 | 0244
    3 | 899
    4 | 004444
    4 | 57
    Key: 4 | 6 = 46
  • Back-to-Back Stem and Leaf
    Back-to-back stem and leaf plots allow us to quickly compare two distributions. Use SOCS to make comparisons between distributions.
    Number of Home Runs in a Single Season
         Aaron |   | Ruth
             3 | 1 |
               | 1 |
            40 | 2 | 2
           976 | 2 | 5
          4420 | 3 | 4
           998 | 3 | 5
        444400 | 4 | 11
            75 | 4 | 66679
               | 5 | 44
               | 5 | 9
               | 6 | 0
    Key: 4 | 6 = 46
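A back-to-back plot with split stems can be sketched as below. The function name `back_to_back` is made up for this example, and the sketch assumes two-way splitting (leaves 0-4, then 5-9) and non-negative whole numbers. Leaves on the left side are written in decreasing order so that they grow outward from the stem. Under the 0-4 / 5-9 rule, Ruth’s 59 falls on the second 5 stem.

```python
# Season home run totals, copied from the table in this section.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def back_to_back(left, right):
    """Return (left leaves, stem, right leaves) rows with each stem split in two."""
    rows = []
    first_stem = min(left + right) // 10
    last_stem = max(left + right) // 10
    for stem in range(first_stem, last_stem + 1):
        for low, high in ((0, 4), (5, 9)):
            def leaves(values):
                return sorted(v % 10 for v in values
                              if v // 10 == stem and low <= v % 10 <= high)
            left_text = "".join(str(d) for d in reversed(leaves(left)))
            right_text = "".join(str(d) for d in leaves(right))
            rows.append((left_text, stem, right_text))
    # Drop trailing rows that are empty on both sides.
    while rows and rows[-1][0] == "" and rows[-1][2] == "":
        rows.pop()
    return rows

rows = back_to_back(aaron, ruth)
print(f"{'Aaron':>8} |   | Ruth")
for left_text, stem, right_text in rows:
    print(f"{left_text:>8} | {stem} | {right_text}")
print("Key: 4 | 6 = 46")
```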
  • Advantages and Disadvantages of dotplots/stem and leaf plots
    Advantages:
    Preserves each piece of data.
    Shows features of the distribution with regard to shape, such as clusters, gaps, outliers, etc.
    Disadvantages:
    If creating by hand, large data sets can be cumbersome.
    Data that are widely varied may be difficult to graph.
  • Histograms
    A histogram is one of the most common graphs used for quantitative variables.
    Although a histogram looks like a bar chart, there are some important differences:
    In a histogram, the “bars” touch each other.
    Histograms do not necessarily preserve individual data pieces.
    Changing the “scale” or “bin width” can drastically alter the picture of the distribution, so caution must be used when describing a distribution when only a histogram has been used.
  • Creating a histogram
    Divide the range of data into classes of equal width. Count the number of observations in each class. (Remember that the width is somewhat arbitrary and you might choose a different width than someone else.)
    Barry Bonds: The data range from 16 to 73, so we choose for our classes 15 ≤ # of HR ≤ 19, ..., 70 ≤ # of HR ≤ 74.
    We can then determine the counts for each “bin”.
  • So the frequency distribution looks like:
    Class   Frequency
    15-19   2
    20-24   1
    25-29   2
    30-34   4
    35-39   2
    40-44   2
    45-49   2
    50-54   0
    55-59   0
    60-64   0
    65-69   0
    70-74   1
    The horizontal axis represents the variable values, so using the lower bound of each class to scale is appropriate.
    The vertical axis can represent frequency, relative frequency, cumulative frequency, or relative cumulative frequency. We’ll use frequency.
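The class counts can be reproduced from the raw data. In this sketch the function name `class_frequencies` and its parameters (`start`, `width`, `stop`) are our own; the defaults simply encode the classes chosen on the slide, 15-19 through 70-74. The printed bars are a rough text stand-in for the histogram.

```python
# Barry Bonds's season home run totals, copied from the table in this section.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]

def class_frequencies(values, start=15, width=5, stop=75):
    """Count the observations in each equal-width class [lower, lower + width - 1]."""
    table = {}
    for lower in range(start, stop, width):
        upper = lower + width - 1
        table[f"{lower}-{upper}"] = sum(lower <= v <= upper for v in values)
    return table

table = class_frequencies(bonds)
for label, count in table.items():
    # One '#' per observation gives a sideways histogram.
    print(f"{label}  {count}  {'#' * count}")
```

Changing `width` (for example to 10) is a quick way to see how much the bin width alters the picture of the distribution.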
  • Label and scale your axes. Title your graph. Draw a bar that represents the frequency for each class. Remember that the bars of the histogram should touch each other.
    [Figure: histogram titled "Barry Bonds"; horizontal axis HomeRuns from 10 to 90; vertical axis Count from 1 to 7]
  • Interpretation
    We interpret a histogram in the same way we interpret a dotplot or stem and leaf plot. ALWAYS use SOCS: Shape, Outliers, Center, Spread.
  • Time Plots
    Sometimes our data are collected at intervals over time and we are looking for changes or patterns that have occurred. We use a time plot for this type of data.
    A time plot uses both the horizontal and vertical axes. The horizontal axis represents the time intervals. The vertical axis represents the variable values.
  • Creating a Time Plot
    Label and scale the axes. Title your graph.
    Plot a point corresponding to the data taken at each time interval.
    A line segment drawn between each point may be helpful to see patterns in the data.
    Year   HR     Year   HR
    1986   16     1994   37
    1987   25     1995   33
    1988   24     1996   42
    1989   19     1997   40
    1990   33     1998   37
    1991   25     1999   34
    1992   34     2000   49
    1993   46     2001   73
    [Figure: line plot titled "Barry Bonds"; horizontal axis Year from 1986 to 2002; vertical axis BondsHR from 10 to 80]
  • Describing Time Plots
    When describing time plots, you should look for trends in the data.
    Although the number of home runs does not show a constant increase from year to year, we note that overall, the number of home runs hit by Barry Bonds has increased over time, with the most notable increase being between 1999 and 2001.
    [Figure: line plot titled "Barry Bonds"; horizontal axis Year from 1986 to 2002; vertical axis BondsHR from 10 to 80]
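One way to back up a statement about trends is to compute the year-to-year change. This is a small sketch, not part of the original slides; the variable names are ours, and the data are the Barry Bonds values from the time plot table.

```python
# Barry Bonds's home runs by season, 1986 through 2001.
years = list(range(1986, 2002))
home_runs = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]

# Change from the previous season, keyed by the later year.
changes = {year: current - previous
           for year, previous, current in zip(years[1:], home_runs, home_runs[1:])}

largest_year = max(changes, key=changes.get)
for year, change in changes.items():
    print(f"{year - 1} to {year}: {change:+d}")
print(f"Largest one-year increase: {changes[largest_year]:+d} in {largest_year}")
```

The two largest jumps are the consecutive changes into 2000 and 2001, which agrees with the slide’s observation about 1999 to 2001.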
  • Relative frequency, Cumulative frequency, Percentiles, and Ogives
    Sometimes we are interested in describing the relative position of an observation.
    For example: you have no doubt been told at one time or another that you scored at the 80th percentile. This means that 80% of the people taking the test scored the same as or lower than you did.
    How can we model this?
  • Ogive (Relative cumulative frequency graph)
    We first start by creating a frequency table. We’ll look at how each column is created in the next few slides.
    # of home runs in a season   Frequency   Relative Frequency   Cumulative Frequency   Relative Cumulative Frequency
    15-19                        2           0.125                2                      0.125
    20-24                        1           0.0625               3                      0.1875
    25-29                        2           0.125                5                      0.3125
    30-34                        4           0.25                 9                      0.5625
    35-39                        2           0.125                11                     0.6875
    40-44                        2           0.125                13                     0.8125
    45-49                        2           0.125                15                     0.9375
    50-54                        0           0                    15                     0.9375
    55-59                        0           0                    15                     0.9375
    60-64                        0           0                    15                     0.9375
    65-69                        0           0                    15                     0.9375
    70-74                        1           0.0625               16                     1
  • Relative Frequency
    The “# of home runs in a season” and “Frequency” columns are the same columns as we created for the histogram.
    To find the values for the “Relative Frequency” column, compute: Relative Frequency = Frequency Value / Total # of observations
    # of home runs in a season   Frequency   Relative Frequency *
    15-19                        2           0.125
    20-24                        1           0.0625
    25-29                        2           0.125
    30-34                        4           0.25
    35-39                        2           0.125
    40-44                        2           0.125
    45-49                        2           0.125
    50-54                        0           0
    55-59                        0           0
    60-64                        0           0
    65-69                        0           0
    70-74                        1           0.0625
    * Within rounding, the sum of this column should equal 1.
  • Cumulative Frequency
    Cumulative frequency simply adds the counts in the frequency column that fall in or below the current class level.
    For example: to find the “13”, add the frequencies for the classes 15-19 through 40-44 (marked with an oval on the slide): 2+1+2+4+2+2=13
    # of home runs in a season   Frequency   Relative Frequency   Cumulative Frequency
    15-19                        2           0.125                2
    20-24                        1           0.0625               3
    25-29                        2           0.125                5
    30-34                        4           0.25                 9
    35-39                        2           0.125                11
    40-44                        2           0.125                13
    45-49                        2           0.125                15
    50-54                        0           0                    15
    55-59                        0           0                    15
    60-64                        0           0                    15
    65-69                        0           0                    15
    70-74                        1           0.0625               16
  • Relative Cumulative Frequency
    Relative cumulative frequency divides the cumulative frequency by the total number of observations.
    For example: 0.8125 = 13/16
    # of home runs in a season   Frequency   Relative Frequency   Cumulative Frequency   Relative Cumulative Frequency
    15-19                        2           0.125                2                      0.125
    20-24                        1           0.0625               3                      0.1875
    25-29                        2           0.125                5                      0.3125
    30-34                        4           0.25                 9                      0.5625
    35-39                        2           0.125                11                     0.6875
    40-44                        2           0.125                13                     0.8125
    45-49                        2           0.125                15                     0.9375
    50-54                        0           0                    15                     0.9375
    55-59                        0           0                    15                     0.9375
    60-64                        0           0                    15                     0.9375
    65-69                        0           0                    15                     0.9375
    70-74                        1           0.0625               16                     1
    Sum                          16          1
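The three derived columns can all be computed from the frequency column. The sketch below uses the Barry Bonds class frequencies from the table; the variable names are ours.

```python
from itertools import accumulate

# Class labels and frequencies for Barry Bonds, as in the frequency table.
classes = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44",
           "45-49", "50-54", "55-59", "60-64", "65-69", "70-74"]
frequencies = [2, 1, 2, 4, 2, 2, 2, 0, 0, 0, 0, 1]

total = sum(frequencies)                       # total number of observations (16)
relative = [f / total for f in frequencies]    # frequency / total
cumulative = list(accumulate(frequencies))     # running sum of the frequencies
relative_cumulative = [c / total for c in cumulative]  # cumulative / total

for row in zip(classes, frequencies, relative, cumulative, relative_cumulative):
    print("{:<6} {:>2} {:>7} {:>3} {:>7}".format(*row))
```

The row for 40-44 shows a cumulative frequency of 13 and a relative cumulative frequency of 13/16 = 0.8125, matching the worked examples on the slides.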
  • Creating the Ogive
    Label and scale the axes. Horizontal: variable. Vertical: relative cumulative frequency (percentile).
    Plot a point corresponding to the relative cumulative frequency in each class interval at the left endpoint of the next class interval.
    The last point you plot should be at a height of 100%.
  • # of home runs in a season   Relative Cumulative Frequency
    15-19                        0.125
    20-24                        0.1875
    25-29                        0.3125
    30-34                        0.5625
    35-39                        0.6875
    40-44                        0.8125
    45-49                        0.9375
    50-54                        0.9375
    55-59                        0.9375
    60-64                        0.9375
    65-69                        0.9375
    70-74                        1
    [Figure: scatter plot titled "Barry Bonds"; horizontal axis HR from 10 to 80; vertical axis Relcumfreq from 0.0 to 1.2]
    A line segment from point to point can be added for analysis.
  • Types of Info from Ogives
    Finding an individual observation within the distribution: find the relative standing of a season in which Barry Bonds hit 40 home runs.
    [Figure: scatter plot titled "Barry Bonds"; horizontal axis HR from 10 to 80; vertical axis Relcumfreq from 0.0 to 1.2]
    A season with 40 home runs lies at the 60th percentile, meaning that approximately 60% of his seasons had 40 or fewer home runs.
  • Locating an observation corresponding to a percentile: how many home runs must be hit in a season to correspond to the 75th percentile?
    [Figure: scatter plot titled "Barry Bonds"; horizontal axis HR from 10 to 80; vertical axis Relcumfreq from 0.0 to 1.2]
    To be better than 75% of Mr. Bonds’s seasons, approximately 42 home runs must be hit.
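Reading a value off the ogive at a given percentile can be approximated numerically. The sketch below makes several assumptions that go beyond the slides: each relative cumulative frequency is plotted at the left endpoint of the next class (as the slides describe), a starting point at (15, 0) is added, and the points are joined by straight line segments so that linear interpolation applies. The function name `value_at_percentile` is hypothetical.

```python
# Lower class bounds and relative cumulative frequencies for Barry Bonds.
lower_bounds = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
rel_cum_freq = [0.125, 0.1875, 0.3125, 0.5625, 0.6875, 0.8125,
                0.9375, 0.9375, 0.9375, 0.9375, 0.9375, 1.0]
width = 5

# Each class's value is plotted at the left endpoint of the NEXT class.
# The starting point (15, 0.0) is an assumption, not something the slides state.
points = [(lower_bounds[0], 0.0)] + [
    (lower + width, height) for lower, height in zip(lower_bounds, rel_cum_freq)
]

def value_at_percentile(points, p):
    """Linearly interpolate the variable value where the ogive reaches height p."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Skip flat segments; use the first rising segment that contains p.
        if y0 < y1 and y0 <= p <= y1:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p must be between 0 and 1")

home_runs_75th = value_at_percentile(points, 0.75)
print(f"75th percentile: about {home_runs_75th} home runs")
```

Under these assumptions the 75th percentile falls halfway between 40 and 45 home runs, in line with the slide’s reading of approximately 42.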
  • A little History on the word Ogive (sometimes called an Ogee)
    It was first used by Sir Francis Galton, who borrowed a term from architecture to describe the cumulative normal curve (more about that next chapter).
    The ogive in architecture was a common decorative element in many of the English churches around 1400. The picture at right shows the door to the Church of The Holy Cross at the village of Caston in Norfolk. In this image you can see the use of the ogive in the design of the door and repeated in the windows above.
    Find more about this term at Mathwords.