Ap stats chapter 2 miller revised
Published in: Education


  • 1. Chapter 2: Graphical Methods for Describing Data Distributions. Created by Kathy Fritz / Revised by S. Miller, September 2012
  • 2. Variable: any characteristic whose value may change from one individual to another
  • 3. Data: the values for a variable from individual observations
  • 4. Suppose that a PE coach records the height of each student in his class. This is an example of univariate data. Univariate: data that consist of observations on a single variable made on individuals in a sample or population
  • 5. Suppose that the PE coach records the height and weight of each student in his class. This is an example of bivariate data. Bivariate: data that consist of pairs of numbers from two variables for each individual in a sample or population
  • 6. Suppose that the PE coach records the height, weight, number of sit-ups, and number of push-ups for each student in his class. This is an example of multivariate data. Multivariate: data that consist of observations on two or more variables
  • 7. Two types of variables: categorical and numerical
  • 8. Categorical variables
      • Qualitative
      • Consist of categorical responses
      1. Car model
      2. Birth year
      3. Type of cell phone
      4. Your zip code
      5. Which club you have joined
      Which of these variables are NOT categorical variables? They are all categorical variables!
  • 9. Numerical variables
      • Quantitative
      • Observations or measurements take on numerical values
      It makes sense to perform math operations on these values. There are two types of numerical variables: discrete and continuous.
      1. GPAs
      2. Height of students
      3. Codes to combination locks
      4. Number of text messages per day
      5. Weight of textbooks
      Which of these variables are NOT numerical? Does it make sense to find an average code to combination locks?
  • 10. Two types of variables: categorical and numerical. Numerical variables are either discrete or continuous.
  • 11. Discrete (numerical) • Isolated points along a number line • Usually counts of items • Example: number of textbooks purchased
  • 12. Continuous (numerical) • Variable that can be any value in a given interval • Usually measurements of something • Examples: GPAs or height or weight
  • 13. Are the following variables categorical or numerical (discrete or continuous)?
      1. The color of cars in the teacher’s lot: Categorical
      2. The number of calculators owned by students at your college: Discrete numerical
      3. The zip code of an individual: Categorical
      4. The amount of time it takes students to drive to school: Continuous numerical
      5. The appraised value of homes in your city: Discrete numerical (Is money a measurement or a count?)
  • 14. Use the following table to determine an appropriate graphical display for a data set.
      Graphical Display | Variable Type | Data Type | Purpose
      Bar chart | Univariate | Categorical | Display data distribution
      Comparative bar chart | Univariate for 2 or more groups | Categorical | Compare 2 or more groups
      Dotplot | Univariate | Numerical | Display data distribution
      Comparative dotplot | Univariate for 2 or more groups | Numerical | Compare 2 or more groups
      Stem-and-leaf display | Univariate | Numerical | Display data distribution
      Comparative stem-and-leaf | Univariate for 2 groups | Numerical | Compare 2 or more groups
      Histogram | Univariate | Numerical | Display data distribution
      Scatterplot | Bivariate | Numerical | Investigate relationship between 2 variables
      Time series plot | Univariate, collected over time | Numerical | Investigate trend over time
      What types of graphs can be used with categorical data? In section 2.3, we will see how the various graphical displays for univariate, numerical data compare.
  • 15. Displaying Categorical Data: Bar Charts, Comparative Bar Charts
  • 16. Bar Chart. When to Use: Univariate, Categorical data. A bar chart is a graphical display for categorical data.
      To comply with new standards from the U. S. Department of Transportation, helmets should reach the bottom of the motorcyclist’s ears. The report “Motorcycle Helmet Use in 2005 – Overall Results” (National Highway Traffic Safety Administration, August 2005) summarized data collected by observing 1700 motorcyclists nationwide at selected roadway locations. Each time a motorcyclist passed by, the observer noted whether the rider was wearing no helmet (N), a noncompliant helmet (NC), or a compliant helmet (C). The data are summarized in this table:
      Helmet Use | Frequency
      N | 731
      NC | 153
      C | 816
      Total | 1700
      This is called a frequency distribution. A frequency distribution is a table that displays the possible categories along with the associated frequencies or relative frequencies. The frequency for a particular category is the number of times that category appears in the data set. The frequencies should add up to the total number of observations.
  • 17. Bar Chart. The same helmet data (1700 motorcyclists; N = no helmet, NC = noncompliant helmet, C = compliant helmet), now with relative frequencies:
      Helmet Use | Frequency | Relative Frequency
      N | 731 | 0.430
      NC | 153 | 0.090
      C | 816 | 0.480
      Total | 1700 | 1.000
      The relative frequencies should add up to 1 (allowing for rounding).
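The relative frequencies in the helmet table can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and do not come from the slides.

```python
# Frequencies from the motorcycle helmet table (N, NC, C).
helmet_counts = {"N": 731, "NC": 153, "C": 816}

# The frequencies should add up to the number of observations (1700).
total = sum(helmet_counts.values())

# Relative frequency = frequency / total number of observations.
relative_freq = {use: count / total for use, count in helmet_counts.items()}

for use, rel in relative_freq.items():
    print(f"{use:>5}  {helmet_counts[use]:>4}  {rel:.3f}")
print(f"Total  {total:>4}  {sum(relative_freq.values()):.3f}")
```

The last line prints the two column totals, which is a quick way to confirm that the frequencies sum to the sample size and the relative frequencies sum to 1.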
  • 18. Bar Chart. How to construct
      1. Draw a horizontal line; write the categories or labels below the line at regularly spaced intervals
      2. Draw a vertical line; label the scale using frequency or relative frequency
      3. Place a rectangular bar above each category label with a height determined by its frequency or relative frequency
      All bars should have the same width so that both the height and the area of the bar are proportional to the frequency or relative frequency of the corresponding categories.
  • 19. Bar Chart. What to Look For: Frequently or infrequently occurring categories. Here is the completed bar chart for the motorcycle helmet data. Describe this graph.
  • 20. Comparative Bar Charts. When to Use: Univariate, Categorical data for two or more groups. Bar charts can also be used to provide a visual comparison of two or more groups.
      How to construct
      • Constructed by using the same horizontal and vertical axes for the bar charts of two or more groups
      • Usually color-coded to indicate which bars correspond to each group
      • Should use relative frequencies on the vertical axis
      Why? You use relative frequency rather than frequency on the vertical axis so that you can make meaningful comparisons even if the sample sizes are not the same.
  • 21. Each year the Princeton Review conducts a survey of students applying to college and of parents of college applicants. In 2009, 12,715 high school students responded to the question “Ideally how far from home would you like the college you attend to be?” Also, 3007 parents of students applying to college responded to the question “How far from home would you like the college your child attends to be?” Data are displayed in the frequency table below.
      Ideal Distance | Students (frequency) | Parents (frequency)
      Less than 250 miles | 4450 | 1594
      250 to 500 miles | 3942 | 902
      500 to 1000 miles | 2416 | 331
      More than 1000 miles | 1907 | 180
      Create a comparative bar chart with these data. What should you do first?
  • 22. Relative frequencies for the ideal-distance data:
      Ideal Distance | Students (relative frequency) | Parents (relative frequency)
      Less than 250 miles | .35 | .53
      250 to 500 miles | .31 | .30
      500 to 1000 miles | .19 | .11
      More than 1000 miles | .15 | .06
      The student column is found by dividing each frequency by the total number of students; the parent column is found by dividing each frequency by the total number of parents. What does this graph show about the ideal distance college should be from home?
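Because the two groups have very different sizes (12,715 students and 3007 parents), the first step is to convert each column to relative frequencies. A short sketch of that step follows; the function name `relative_frequencies` is a hypothetical helper, not something from the slides.

```python
# Frequencies from the Princeton Review table (ideal distance from home).
distances = ["Less than 250 miles", "250 to 500 miles",
             "500 to 1000 miles", "More than 1000 miles"]
students = [4450, 3942, 2416, 1907]
parents = [1594, 902, 331, 180]


def relative_frequencies(counts):
    """Divide each frequency by that group's own total."""
    total = sum(counts)
    return [count / total for count in counts]


student_rel = relative_frequencies(students)
parent_rel = relative_frequencies(parents)

for label, s, p in zip(distances, student_rel, parent_rel):
    print(f"{label:<22} {s:.2f}  {p:.2f}")
```

Each group is divided by its own total, which is what makes the bars comparable even though the sample sizes differ.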
  • 23. Displaying Numerical Data: Dotplots, Stem-and-leaf Displays, Histograms
  • 24. Dotplot. When to Use: Univariate, Numerical data.
      How to construct
      1. Draw a horizontal line and mark it with an appropriate numerical scale
      2. Locate each value in the data set along the scale and represent it by a dot. If there are two or more observations with the same value, stack the dots vertically
  • 25. Dotplot. What to Look For
      • A representative or typical value (center) in the data set
      • The extent to which the data values spread out
      • The nature of the distribution (shape) along the number line
      • The presence of unusual values (gaps and outliers)
      An outlier is an unusually large or small data value. A precise rule for deciding when an observation is an outlier is given in Chapter 3. What we look for with univariate, numerical data sets is similar for dotplots, stem-and-leaf displays, and histograms.
  • 26. Professor Norm gave a 10-question quiz last week in his introductory statistics class. The number of correct answers for each student is recorded below.
      6 8 6 5 4 7 9 4 5 8 5 6 4
      7 7 3 8 7 6 7 6 6 6 5 5 9
      First draw a horizontal line with an appropriate scale. The first three observations are plotted; note that you stack the points if values are repeated. This is the completed dotplot. Write a sentence or two describing this distribution.
      [Dotplot: number of correct answers, scale 2 to 10]
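The construction steps for a dotplot can be sketched in plain text: count how often each value occurs, then stack one dot per occurrence above that value. The quiz scores below are the values recovered from this slide; the layout choices (three characters per scale position, `.` as the dot) are assumptions made only for this illustration.

```python
from collections import Counter

# Professor Norm's quiz scores (number of correct answers out of 10).
scores = [6, 8, 6, 5, 4, 7, 9, 4, 5, 8, 5, 6, 4,
          7, 7, 3, 8, 7, 6, 7, 6, 6, 6, 5, 5, 9]

counts = Counter(scores)          # how many dots to stack at each value
scale = range(2, 11)              # horizontal scale from 2 to 10
tallest = max(counts.values())    # height of the tallest stack

# Build the plot from the top row down; a dot appears at a value
# only if that value occurs at least `level` times.
dotplot_rows = []
for level in range(tallest, 0, -1):
    row = "".join(" . " if counts[value] >= level else "   " for value in scale)
    dotplot_rows.append(row)

axis = "".join(f"{value:^3}" for value in scale)
print("\n".join(dotplot_rows))
print(axis)
print("Number of correct answers")
```

The tallest stack sits at 6 and the stacks fall away about evenly on both sides, which matches the description of a roughly symmetric distribution centered near 6.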
  • 27. What to Look For
      • The representative or typical value (center) in the data set
      • The extent to which the data values spread out
      • The nature of the distribution (shape) along the number line
      • The presence of unusual values
      [Dotplot of Professor Norm's quiz data: number of correct answers, scale 2 to 10]
      A symmetrical distribution is one that has a vertical line of symmetry where the left half is a mirror image of the right half. If we draw a curve, smoothing out this dotplot, we will see that there is ONLY one peak. Distributions with a single peak are said to be unimodal. Distributions with two peaks are bimodal, and those with more than two peaks are multimodal.
      The center for the distribution of the number of correct answers is about 6. There is not a lot of variability in the observations. The distribution is approximately symmetrical with no unusual observations.
  • 28. Comparative Dotplots. When to Use: Univariate, numerical data with observations from 2 or more groups.
      How to construct
      • Constructed using the same numerical scale for two or more dotplots
      • Be sure to include group labels for the dotplots in the display
      What to Look For: Comment on the same four attributes, but comparing the dotplots displayed.
  • 29. In another introductory statistics class, Professor Skew also gave a 10-question quiz. The number of correct answers for each student is recorded below.
      8 6 10 8 8 7 9 8 10 7 9 8 8 8 7 7 3 7 8 7 6 6 6 5 5 9 8
      Create a comparative dotplot with the data sets from the two statistics classes, Professor Norm's and Professor Skew's.
      [Comparative dotplot: Prof. Skew and Prof. Norm; number of correct answers, scale 2 to 10]
      Is the distribution for Prof. Skew’s class symmetric? Why or why not? Notice that the left side (or lower tail) of the distribution is longer than the right side (or upper tail). This distribution is said to be negatively skewed (or skewed to the left). Distributions where the right tail is longer than the left are said to be positively skewed (or skewed to the right). The direction of skewness is always in the direction of the longer tail.
      Write a few sentences comparing these distributions. The center of the distribution for the number of correct answers in Prof. Skew’s class is larger than the center for Prof. Norm’s class. There is also more variability in Prof. Skew’s distribution. Prof. Skew’s distribution appears to have an unusual observation where one student had only 2 answers correct, while there were no unusual observations in Prof. Norm’s class. The distribution for Prof. Skew is negatively skewed while Prof. Norm’s distribution is more symmetrical.
  • 30. Stem-and-Leaf Displays. When to Use: Univariate, Numerical data. Stem-and-leaf displays are an effective way to summarize univariate numerical data when the data set is not too large.
      Each observation is split into two parts:
      Stem: consists of the first digit(s)
      Leaf: consists of the final digit(s)
      How to construct
      • Select one or more of the leading digits for the stem
      • List the possible stem values in a vertical column (be sure to list every stem from the smallest to the largest value)
      • Record the leaf for each observation beside the corresponding stem value
      • Indicate the units for stems and leaves someplace in the display
  • 31. Stem-and-Leaf Displays. What to Look For
      • A representative or typical value (center) in the data set
      • The extent to which the data values spread out
      • The presence of unusual values (gaps and outliers)
      • The extent of symmetry in the data distribution
      • The number and location of peaks
  • 32. The article “Going Wireless” (AARP Bulletin, June 2009) reported the estimated percentage of households with only wireless phone service (no landline) for the 50 U.S. states and the District of Columbia. Data for the 19 Eastern states are given here.
      5.6 5.7 20.0 16.8 16.5 13.4 8.0 10.8 9.3 11.6 11.4 16.3 14.0 10.8 7.8 20.6 10.8 5.1 11.6
      What is the variable of interest? Wireless percent. A stem-and-leaf display is an appropriate way to summarize these data. (A dotplot would also be a reasonable choice.)
      Let 5.6% be represented as 05.6% so that all the numbers have two digits in front of the decimal. If we use the 2-digit stems, we would have stems from 05 to 20, and that’s way too many stems! So let’s just use the first digit (tens) as our stems. The leaf will then be the last two digits. With 05.6%, the leaf is 5.6 and it will be written behind the stem 0. For the second number, 5.7 is also written behind the stem 0 (with a comma between). What is the leaf for 20.0% and where should that leaf be written?
      The completed stem-and-leaf display is shown below.
      0 | 5.6, 5.7, 9.3, 8.0, 7.8, 5.1
      1 | 6.8, 6.5, 3.4, 0.8, 1.6, 1.4, 6.3, 4.0, 0.8, 0.8, 1.6
      2 | 0.0, 0.6
      However, it is somewhat difficult to read due to the 2-digit leaves. A common practice is to drop all but the first digit in the leaf. This makes the display easier to read, and DOES NOT change the overall distribution of the data set.
      0 | 5 5 9 8 7 5
      1 | 6 6 3 0 1 1 6 4 0 0 1
      2 | 0 0
  • 33. “Going Wireless” data for the 19 Eastern states, with the leaves written in order:
      0 | 5 5 5 7 8 9
      1 | 0 0 0 1 1 1 3 4 6 6 6
      2 | 0 0
      Stem: tens. Leaf: ones.
      While it is not necessary to write the leaves in order from smallest to largest, by doing so, the center of the distribution is more easily seen.
      Write a few sentences describing this distribution. The center of the distribution for the estimated percentage of households with only wireless phone service is approximately 11%. There does not appear to be much variability. This display appears to be a unimodal, symmetric distribution with no outliers.
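The stem-and-leaf construction for the Eastern states can be sketched as follows. The sketch assumes the convention used on the slides: tens digit as stem, ones digit as leaf, with the decimal part dropped (truncated, not rounded). The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Estimated percentage of wireless-only households, 19 Eastern states.
eastern = [5.6, 5.7, 20.0, 16.8, 16.5, 13.4, 8.0, 10.8, 9.3, 11.6,
           11.4, 16.3, 14.0, 10.8, 7.8, 20.6, 10.8, 5.1, 11.6]


def stem_and_leaf(values):
    """Return {stem: sorted leaves}, with stem = tens digit, leaf = ones digit."""
    leaves = defaultdict(list)
    for value in values:
        # int() drops the decimal part; divmod splits tens from ones.
        stem, leaf = divmod(int(value), 10)
        leaves[stem].append(leaf)
    # List every stem from the smallest to the largest, even if empty.
    return {stem: sorted(leaves[stem])
            for stem in range(min(leaves), max(leaves) + 1)}


display = stem_and_leaf(eastern)
for stem, stem_leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in stem_leaves)}")
print("Stem: tens  Leaf: ones")
```

Sorting the leaves is optional, as the slide notes, but it makes the center of the distribution easier to see.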
  • 34. Comparative Stem-and-Leaf Displays. When to Use: Univariate, numerical data with observations from 2 or more groups.
      How to construct
      • List the leaves for one data set to the right of the stems
      • List the leaves for the second data set to the left of the stems
      • Be sure to include group labels to identify which group is on the left and which is on the right
  • 35. The article “Going Wireless” (AARP Bulletin, June 2009) reported the estimated percentage of households with only wireless phone service (no landline) for the 50 U.S. states and the District of Columbia. Data for the 13 Western states are given here.
      11.7 18.9 9.0 16.7 8.0 22.1 9.2 10.8 21.1 17.7 25.5 16.3 11.4
      Create a comparative stem-and-leaf display comparing the distributions of the Eastern and Western states.
      Western States |   | Eastern States
                 998 | 0 | 555789
             8766110 | 1 | 00011134666
                 521 | 2 | 00
      Stem: tens. Leaf: ones.
      Write a few sentences comparing these distributions. The center of the distribution of the estimated percentage of households with only wireless phone service for the Western states is a little larger than the center for the Eastern states. Both distributions are symmetrical with approximately the same amount of variability.
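A back-to-back display like the one for the Western and Eastern states can be built by sorting the left-hand leaves in descending order so that they read outward from the stem. This is a sketch under the same truncation convention (tens as stem, ones as leaf); `leaves_by_stem` is a hypothetical helper name.

```python
eastern = [5.6, 5.7, 20.0, 16.8, 16.5, 13.4, 8.0, 10.8, 9.3, 11.6,
           11.4, 16.3, 14.0, 10.8, 7.8, 20.6, 10.8, 5.1, 11.6]
western = [11.7, 18.9, 9.0, 16.7, 8.0, 22.1, 9.2, 10.8,
           21.1, 17.7, 25.5, 16.3, 11.4]


def leaves_by_stem(values):
    """Group the ones digits (leaves) by tens digit (stem)."""
    leaves = {}
    for value in values:
        stem, leaf = divmod(int(value), 10)  # int() drops the decimal part
        leaves.setdefault(stem, []).append(leaf)
    return leaves


west = leaves_by_stem(western)
east = leaves_by_stem(eastern)
stems = range(min(min(west), min(east)), max(max(west), max(east)) + 1)

rows = []
for stem in stems:
    # Left side reads outward from the stem, so sort it in descending order.
    left = "".join(str(d) for d in sorted(west.get(stem, []), reverse=True))
    right = "".join(str(d) for d in sorted(east.get(stem, [])))
    rows.append(f"{left:>8} | {stem} | {right}")

print(" Western |   | Eastern")
print("\n".join(rows))
```

Sharing one column of stems is what puts both groups on the same scale, the same idea as using a common axis for comparative dotplots.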
  • 36. Histograms. When to Use: Univariate numerical data. Dotplots and stem-and-leaf displays are not effective ways to summarize numerical data when the data set contains a large number of data values. Histograms are displays that don’t work well for small data sets but do work well for larger numerical data sets.
      Histograms are constructed differently for discrete versus continuous data. Discrete numerical data almost always result from counting. In such cases, each observation is a whole number.
      How to construct (discrete data)
      • Draw a horizontal scale and mark it with the possible values for the variable
      • Draw a vertical scale and mark it with frequency or relative frequency
      • Above each possible value, draw a rectangle centered at that value with a height corresponding to its frequency or relative frequency
      What to look for: Center or typical value; spread; general shape and location and number of peaks; and gaps or outliers
  • 37. Queen honey bees mate shortly after they become adults. During a mating flight, the queen usually takes multiple partners, collecting sperm that she will store and use throughout the rest of her life. A paper, “The Curious Promiscuity of Queen Honey Bees” (Annals of Zoology [2001]: 255-265), provided the following data on the number of partners for 30 queen bees.
      12 2 4 6 6 7 8 7 8 11
      8 3 5 6 7 10 1 9 7 6
      9 7 5 4 7 4 6 7 8 10
      Here is a dotplot of these data.
      [Dotplot: number of partners, scale 2 to 12]
  • 38. Queen honey bees continued. The variable, number of partners, is discrete. To create a histogram: we already have a horizontal axis, so we need to add a vertical axis for frequency. The bars should be centered over the discrete data values and have heights corresponding to the frequency of each data value.
      [Histogram: frequency against number of partners]
      We built the histogram on top of the dotplot to show that the bars are centered over the discrete data values and that heights of the bars are the frequency of each data value. In practice, histograms for discrete data ONLY show the rectangular bars.
      The distribution for the number of partners of queen honey bees is approximately symmetric with a center at 7 partners and a somewhat large amount of variability. There don’t appear to be any outliers.
  • 39. Here are two histograms showing the “queen bee data set”. One uses frequency on the vertical axis, while the other uses relative frequency. What do you notice about the shapes of these two histograms?
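One way to see why the two queen bee histograms have the same shape is to tabulate both heights side by side. The sketch below uses the 30 values from the slide; the text-bar output is only an illustration, not the slide's graphic.

```python
from collections import Counter

# Number of partners for 30 queen bees.
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11,
            8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
            9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

n = len(partners)
frequency = Counter(partners)
# Every relative frequency is the frequency divided by the same constant n.
relative_frequency = {value: count / n for value, count in frequency.items()}

print("partners  freq  rel.freq")
for value in range(min(partners), max(partners) + 1):
    count = frequency[value]
    # One '#' per observation, standing in for a bar centered at this value.
    print(f"{value:>8}  {count:>4}  {count / n:>8.3f}  {'#' * count}")
```

Since every bar height is divided by the same number (30), switching to relative frequency only relabels the vertical axis and leaves the shape unchanged.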
  • 40. Histograms with equal width intervals. When to Use: Univariate numerical data (continuous data).
      How to construct
      • Mark the boundaries of the class intervals on the horizontal axis
      • Use either frequency or relative frequency on the vertical axis
      • Draw a rectangle for each class interval directly above that interval. The height of each rectangle is the frequency or relative frequency of the corresponding interval
      What to look for: Center or typical value; spread; general shape and location and number of peaks; and gaps or outliers
  • 41. Consider the following data on carry-on luggage weight for 25 airline passengers. This is a continuous numerical data set.
      25.0 17.9 30.0 18.0 28.2 27.8 10.1 27.6 28.7 20.9 20.8 28.5 33.8
      31.4 27.6 21.9 19.9 28.0 24.9 22.7 22.4 26.4 22.0 34.5 25.3
      Here is a dotplot of this data set. Looking at this dotplot, it is easy to see that we could use intervals with a width of 5. The first interval includes 10 and all values up to but not including 15. The next interval will include 15 and all values up to but not including 20, and so on.
      With continuous data, the rectangular bars cover an interval of data values (not just one value). The top dotplot shows all the data values in each interval stacked in the middle of the interval.
  • 42. From the dotplot, it is easy to see how the continuous histogram is created.
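The class-interval rule described for the luggage data (each interval includes its lower boundary and excludes its upper boundary) can be sketched as a simple tally. The weights are the 25 values recovered from the slide; the constants `LOWER`, `WIDTH`, and `NUM_CLASSES` are names chosen for this example.

```python
# Carry-on luggage weights for 25 airline passengers.
weights = [25.0, 17.9, 30.0, 18.0, 28.2, 27.8, 10.1, 27.6, 28.7, 20.9,
           20.8, 28.5, 33.8, 31.4, 27.6, 21.9, 19.9, 28.0, 24.9, 22.7,
           22.4, 26.4, 22.0, 34.5, 25.3]

LOWER = 10        # lower boundary of the first class interval
WIDTH = 5         # every interval has the same width
NUM_CLASSES = 5   # 10 to <15, 15 to <20, ..., 30 to <35

counts = [0] * NUM_CLASSES
for weight in weights:
    # Floor division places a boundary value (e.g. 25.0) in the
    # interval that starts at that boundary, not the one that ends there.
    index = int((weight - LOWER) // WIDTH)
    counts[index] += 1

for i, count in enumerate(counts):
    low = LOWER + i * WIDTH
    print(f"{low:>2} to <{low + WIDTH:<2}  {count:>2}  {count / len(weights):.2f}")
```

The printed frequencies are the bar heights of the equal-width histogram; the last column gives the relative-frequency version.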
  • 43. Comparative Histograms
      • Must use two separate histograms with the same horizontal axis and relative frequency on the vertical axis
      The article “Early Television Exposure and Subsequent Attention Problems in Children” (Pediatrics, April 2004) investigated the television viewing habits of U.S. children. These graphs show the viewing habits of 1-year-old and 3-year-old children.
      [Histograms: 1-yr-olds and 3-yr-olds]
      The biggest difference between the two histograms is at the low end, with a much higher proportion of 3-year-old children falling in the 0-2 TV hours interval than 1-year-old children.
  • 44. Histograms with unequal width intervals. When to use: when you have a concentration of data in the middle with some extreme values.
      How to construct: construct similar to histograms with continuous data, but with density on the vertical axis:
      density = (relative frequency for interval) / (width of interval)
  • 45. When people are asked for values such as age or weight, they sometimes shade the truth in their responses. The article “Self-Report of Academic Performance” (Social Methods and Research [November 1981]: 165-185) focused on SAT scores and grade point average (GPA). For each student in the sample, the difference between reported GPA and actual GPA was determined. Positive differences resulted from individuals reporting GPAs larger than the correct value.
      Class Interval | Relative Frequency
      -2.0 to < -0.4 | 0.023
      -0.4 to < -0.2 | 0.055
      -0.2 to < -0.1 | 0.097
      -0.1 to < 0 | 0.210
      0 to < 0.1 | 0.189
      0.1 to < 0.2 | 0.139
      0.2 to < 0.4 | 0.116
      0.4 to < 2.0 | 0.171
      When using relative frequency on the vertical axis, the proportional area principle is violated. Notice the relative frequency for the interval 0.4 to < 2.0 is smaller than the relative frequency for the interval -0.1 to < 0, but the area of the bar is MUCH larger.
  • 46. GPAs continued. To fix this problem, we need to find the density of each interval: density = (relative frequency for interval) / (width of interval).
      Class Interval | Relative Frequency | Width | Density
      -2.0 to < -0.4 | 0.023 | 1.6 | 0.014
      -0.4 to < -0.2 | 0.055 | 0.2 | 0.275
      -0.2 to < -0.1 | 0.097 | 0.1 | 0.970
      -0.1 to < 0 | 0.210 | 0.1 | 2.100
      0 to < 0.1 | 0.189 | 0.1 | 1.890
      0.1 to < 0.2 | 0.139 | 0.1 | 1.390
      0.2 to < 0.4 | 0.116 | 0.2 | 0.580
      0.4 to < 2.0 | 0.171 | 1.6 | 0.107
      This is a correct histogram with unequal widths.
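The density column can be reproduced directly from the class boundaries and relative frequencies. This sketch takes the third class as -0.2 to < -0.1, which is what its listed width of 0.1 implies; the tuple layout and variable names are assumptions made for the example.

```python
# (lower boundary, upper boundary, relative frequency) for each class
# of reported GPA minus actual GPA.
classes = [
    (-2.0, -0.4, 0.023),
    (-0.4, -0.2, 0.055),
    (-0.2, -0.1, 0.097),
    (-0.1, 0.0, 0.210),
    (0.0, 0.1, 0.189),
    (0.1, 0.2, 0.139),
    (0.2, 0.4, 0.116),
    (0.4, 2.0, 0.171),
]

densities = []
for lower, upper, rel_freq in classes:
    width = upper - lower
    # density = relative frequency for interval / width of interval
    density = rel_freq / width
    densities.append(density)
    print(f"{lower:>4.1f} to < {upper:>4.1f}  {rel_freq:.3f}  {width:.1f}  {density:.3f}")

# With density on the vertical axis, bar area = density * width,
# so the bar areas add up to the total relative frequency.
total_area = sum(d * (upper - lower)
                 for d, (lower, upper, _) in zip(densities, classes))
print(f"Total area: {total_area:.3f}")
```

Because each bar's area now equals its relative frequency, the wide 0.4 to < 2.0 bar no longer looks larger than the narrow -0.1 to < 0 bar, which restores the proportional area principle.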
  • 47. Displaying Bivariate Numerical Data: Scatterplots, Time Series Plots
  • 48. Scatterplots. When to Use: Bivariate numerical data.
      How to construct
      1. Draw horizontal and vertical axes. Label the horizontal axis and include an appropriate scale for the x-variable. Label the vertical axis and include an appropriate scale for the y-variable.
      2. For each (x, y) pair in the data set, add a dot in the appropriate location in the display.
      What to look for: Relationship between x and y
  • 49. The accompanying table gives the cost (in dollars) and an overall quality rating for 10 different brands of men’s athletic shoes.
      Cost | 65 45 45 80 110 110 30 80 110 70
      Rating | 71 70 62 59 58 57 56 52 51 51
      Is there a relationship between x = cost and y = quality rating? A scatterplot can help answer this question.
  • 50. Athletic shoes continued.
      Cost | 65 45 45 80 110 110 30 80 110 70
      Rating | 71 70 62 59 58 57 56 52 51 51
      Is there a relationship between x = cost and y = quality rating? First, draw and label appropriate horizontal and vertical axes. Next, plot each (x, y) pair. Here is the completed scatterplot.
      [Scatterplot: Rating (50 to 70) against Cost (20 to 100)]
      There appears to be a negative relationship between cost of athletic shoes and their quality rating. Does that surprise you?
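The direction of the relationship seen in the shoe scatterplot can also be checked numerically. The slides do not compute a correlation for this example, so the following is only an illustrative sketch using the standard Pearson correlation formula; the function name `correlation` is hypothetical.

```python
from math import sqrt

# Cost (dollars) and quality rating for 10 brands of men's athletic shoes.
cost = [65, 45, 45, 80, 110, 110, 30, 80, 110, 70]
rating = [71, 70, 62, 59, 58, 57, 56, 52, 51, 51]


def correlation(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(cost, rating)
print(f"r = {r:.3f}")
# A negative r agrees with the downward pattern in the scatterplot.
print("negative relationship" if r < 0 else "non-negative relationship")
```

A negative value only describes how the two variables vary together in this sample; it does not by itself show that higher cost causes lower ratings.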
  • 51. Time Series Plots. When to Use: Bivariate data with time and another variable.
      How to construct
      1. Draw horizontal and vertical axes. Label the horizontal axis and include an appropriate scale for the x-variable. Label the vertical axis and include an appropriate scale for the y-variable.
      2. For each (x, y) pair in the data set, add a dot in the appropriate location in the display.
      3. Connect each dot in order
      What to look for: trends or patterns over time
  • 52. The Christmas Price Index is computed each year by PNC Advisors. It is a humorous look at the cost of giving all the gifts described in the popular Christmas song “The Twelve Days of Christmas”. Describe any trends or patterns that you see. Why is there a downward trend between 1993 & 1995?
  • 53. Graphical Displays in the Media: Pie Charts, Segmented Bar Charts
  • 54. Pie (Circle) Chart. When to Use: Categorical data.
      How to construct
      • A circle is used to represent the whole data set.
      • “Slices” of the pie represent the categories
      • The size of a particular category’s slice is proportional to its frequency or relative frequency.
      • Most effective for summarizing data sets when there are not too many categories
  • 55. Pie (Circle) Chart. The article “Fred Flintstone, Check Your Policy” (The Washington Post, October 2, 2005) summarized a survey of 1014 adults conducted by the Life and Health Insurance Foundation for Education. Each person surveyed was asked to select which of five fictional characters had the greatest need for life insurance: Spider-Man, Batman, Fred Flintstone, Harry Potter, and Marge Simpson. The data are summarized in the pie chart. The survey results were quite different from the assessment of an insurance expert. The insurance expert felt that Batman, a wealthy bachelor, and Spider-Man did not need life insurance as much as Fred Flintstone, a married man with dependents!
  • 56. Segmented (or Stacked) Bar Charts. A pie chart can be difficult to construct by hand. The circular shape sometimes makes it difficult to compare areas for different categories, particularly when the relative frequencies are similar. So, we could use a segmented bar chart.
      When to Use: Categorical data
      How to construct
      • Use a rectangular bar rather than a circle to represent the entire data set.
      • The bar is divided into segments, with different segments representing different categories.
      • The area of the segment is proportional to the relative frequency for the particular category.
  • 57. Segmented (or Stacked) Bar Charts. Each year, the Higher Education Research Institute conducts a survey of college seniors. In 2008, approximately 23,000 seniors participated in the survey (“Findings from the 2008 Administration of the College Senior Survey,” Higher Education Research Institute, June 2009). This segmented bar chart summarizes student responses to the question: “During the past year, how much time did you spend studying and doing homework in a typical week?”
  • 58. Common Mistakes
  • 59. Avoid these Common Mistakes
      1. Areas should be proportional to frequency, relative frequency, or magnitude of the number being represented.
      The eye is naturally drawn to large areas in graphical displays. Sometimes, in an effort to make the graphical displays more interesting, designers lose sight of this important principle. Consider this graph (USA Today, October 3, 2002). By replacing the bars of a bar chart with milk buckets, areas are distorted. The two buckets for 1980 represent 32 cows, whereas the one bucket for 1970 represents 19 cows.
  • 60. Avoid these Common Mistakes
      1. Areas should be proportional to frequency, relative frequency, or magnitude of the number being represented.
      Another common distortion occurs when a third dimension is added to bar charts or pie charts. This distorts the areas and makes the graph much more difficult to interpret.
  • 61. Avoid these Common Mistakes
      2. Be cautious of graphs with broken axes (axes that don’t start at 0).
      • The use of broken axes in a scatterplot does not result in a misleading picture of the relationship of bivariate data.
      • In time series plots, broken axes can sometimes exaggerate the magnitude of change over time.
      • In bar charts and histograms, the vertical axis should NEVER be broken. This violates the “proportional area” principle.
  • 62. Avoid these Common Mistakes
      2. Be cautious of graphs with broken axes (axes that don’t start at 0).
      This bar chart is similar to one in an advertisement for a software product designed to raise student test scores. Areas of the bars are not proportional to the magnitude of the numbers represented: the area of the rectangle representing 68 is more than three times the area of the rectangle representing 55!
  • 63. Avoid these Common Mistakes
      3. Watch out for unequal time spacing in time series plots.
      If observations over time are not made at regular time intervals, special care must be taken in constructing the time series plot. Notice that the intervals between observations are irregular, yet the points in the plot are equally spaced along the time axis. This makes it difficult to assess the rate of change over time. Here is a correct time series plot.
  • 64. Avoid these Common Mistakes
      4. Be careful how you interpret patterns in scatterplots.
      Consider the following scatterplot showing the relationship between the number of Methodist ministers in New England and the amount of Cuban rum imported into Boston from 1860 to 1940.
      [Scatterplot: Number of Barrels of Imported Rum (5000 to 35000) against Number of Methodist Ministers (0 to 300); r = .999973]
      Does an increase in the number of Methodist ministers CAUSE the increase in imported rum? A strong pattern in a scatterplot means that the two variables tend to vary together in a predictable way, BUT it does not mean that there is a cause-and-effect relationship.
  • 65. Avoid these Common Mistakes
      5. Make sure that a graphical display creates the right first impression.
      Consider the following graph from USA Today (June 25, 2001). Although this graph does not violate the proportional area principle, the way the “bar” for the none category is displayed makes this graph difficult to read. A quick glance at this graph may leave the reader with an incorrect impression.