
Displaying Distributions with Graphs



Introduction to Statistics and Probability Theory

Published in: Education, Technology


  1. INTRODUCTION TO STATISTICS & PROBABILITY
     Chapter 1: Looking at Data—Distributions
     Dr. Nahid Sultana
  2. Chapter 1: Looking at Data—Distributions
     • 1.1 Displaying Distributions with Graphs
     • 1.2 Describing Distributions with Numbers
     • 1.3 Density Curves and Normal Distributions
  3. 1.1 Displaying Distributions with Graphs
     Objectives
     • Variables; Types of variables
     • Graphs for categorical variables
       • Bar graphs
       • Pie charts
     • Graphs for quantitative variables
       • Histograms
       • Stemplots
     • Stemplots versus histograms; Interpreting histograms
     • Time plots
  4. Variables
     • Statistics is the science of learning from data.
     • Individuals are the objects described in a set of data. Individuals can be people, animals, plants, or any object of interest.
     • A variable is any characteristic of an individual.
     • A variable varies among individuals; that is, it can take different values for different individuals.
     Examples: age, height, blood pressure, ethnicity, first language.
  5. Two types of variables
     Variables can be either:
     • Categorical: something that falls into one of several categories. Examples: your blood type (A, B, AB, O), your hair color, your ethnicity, whether or not you paid income tax last tax year.
     • Quantitative: something that takes numerical values for which arithmetic operations, such as adding and averaging, make sense. Examples: how tall you are, your age, your blood cholesterol level, the number of credit cards you own.
  6. How do you decide if a variable is categorical or quantitative?
     Ask:
     • What are the n individuals examined?
     • What is being recorded about those n individuals?
     • Is that a number (quantitative) or a statement (categorical)?

     Individuals studied   Diagnosis       Age at death
     Patient A             Heart disease   56
     Patient B             Stroke          70
     Patient C             Stroke          75
     Patient D             Lung cancer     60
     Patient E             Heart disease   80
     Patient F             Accident        73
     Patient G             Diabetes        69

     Diagnosis: each individual is given a description.
     Age at death: each individual is given a meaningful number.
  7. A study examined the condition of deer after a particularly nasty winter. The sex and condition (good or poor) of a random sample of 61 deer were noted. Data from such a study could appear in either of two formats: raw data or a frequency table.
     Note: “Count” is NOT the variable studied; it is a summary statistic for the data set.
     • Who or what are the individuals? The 61 deer.
     • What are the variables, and are they quantitative or categorical? Sex (categorical) and condition (categorical).
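
The link between the raw-data format and the frequency-table format can be sketched in a few lines of Python. The records below are hypothetical, invented only to illustrate the tally; they are not the deer study's data, and the name `frequency_table` is an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical raw data: one (sex, condition) record per individual deer.
# These records are made up for illustration; they are not the study's data.
raw_data = [
    ("female", "good"),
    ("female", "poor"),
    ("male", "good"),
    ("female", "good"),
    ("male", "poor"),
    ("male", "poor"),
]


def frequency_table(records):
    """Tally how many individuals share each combination of values."""
    return Counter(records)


table = frequency_table(raw_data)

# Each row of the frequency table is a combination of categories plus a count.
for (sex, condition), count in sorted(table.items()):
    print(f"{sex:<8}{condition:<6}{count}")
```

The counts always add up to the number of individuals, which is one way to see that "Count" summarizes the data rather than being a variable recorded for each deer.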
  8. Distribution of a Variable
     To examine a single variable, we graphically display its distribution.
     • The distribution of a variable tells us what values it takes and how often it takes these values.
     • Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the nature of the variable.
     Categorical variable: pie chart, bar graph.
     Quantitative variable: histogram, stemplot.
  9. Distribution of Categorical Variables
     Most common ways to graph categorical data:
     • Bar graphs represent each category as a bar whose height represents either the count of individuals with that characteristic (the frequency) or the percent of individuals with that characteristic (the relative frequency). A bar graph makes it easy to compare the sizes of the groups quickly.
     • Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the percents for the categories. A pie chart requires that you include all the categories that make up a whole.
  10. Bar Graphs and Pie Charts

      Marital status   Count (millions)   Percent
      Single           41.8               22.6
      Married          113.3              61.1
      Widowed          13.9               7.5
      Divorced         16.3               8.8
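
As a quick check on the table, the percent column can be recomputed from the counts: each percent is the category count divided by the total, times 100. The sketch below uses the counts from the table; the function name `relative_frequencies` and the text-bar rendering are illustrative choices, not part of the slides.

```python
# Counts (in millions) taken from the marital status table.
counts = {"Single": 41.8, "Married": 113.3, "Widowed": 13.9, "Divorced": 16.3}


def relative_frequencies(counts):
    """Convert category counts to percents of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {
        category: round(100 * count / total, 1)
        for category, count in counts.items()
    }


percents = relative_frequencies(counts)

# A rough text bar graph: one '#' per two percentage points.
for category, pct in percents.items():
    print(f"{category:<10}{'#' * round(pct / 2)} {pct}%")
```

The recomputed values agree with the Percent column (22.6, 61.1, 7.5, 8.8).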
  11. Pie Charts and Bar Graphs (cont…)
  12. Pie Charts and Bar Graphs (cont…)
      Bar graphs
      • Data in the graph can be ordered any way we want (alphabetical, by increasing value, by year, by personal preference, etc.).
  13. Pie Charts and Bar Graphs (cont…)
  14. Pie Charts and Bar Graphs (cont…)
      Pie graphs
  15. Pie Charts and Bar Graphs (cont…)
      Pie graphs
  16. Distribution of Quantitative Variables
      • The distribution tells us what values the variable takes on and how often it takes those values.
      • It can be displayed using:
        • Histograms
        • Stemplots
        • Time plots
      Histograms and stemplots are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data. A time plot shows the behavior of observations over time.
  17. Histogram
      Histograms show the distribution of a quantitative variable by using bars whose heights represent the number of individuals who take on a value within a particular class.
      To draw a histogram:
      • Divide the possible values into equal-size intervals (classes). These make up the horizontal axis.
      • Count how many observations fall into each interval (the counts may be converted to percents).
      • For each class on the horizontal axis, draw a bar. The height of the bar represents the count (or percent) of data points that fall in that class interval.
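
The steps above can be sketched in Python as follows. This is a minimal illustration, assuming equal-width classes of the form [lower, lower + width); the weights are hypothetical values invented for the example, and `histogram_counts` is a name chosen for this sketch.

```python
def histogram_counts(values, start, width, n_classes):
    """Count observations in equal-width classes [lo, lo + width).

    Class 0 covers [start, start + width), class 1 the next interval, and so on.
    Values outside the covered range are ignored in this sketch.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Hypothetical weights, made up for illustration.
weights = [104, 118, 125, 131, 139, 142, 155, 163, 171, 198]
counts = histogram_counts(weights, start=100, width=20, n_classes=5)

# Draw one text bar per class; the bar length is the class count.
for i, count in enumerate(counts):
    lo = 100 + 20 * i
    print(f"{lo} - <{lo + 20}: {'#' * count}")
```

Using half-open classes means a value such as 120 falls in exactly one class (120 to <140), which matches the class labels used in the weight example.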
  18. Histogram (Cont…)
      Example: Weight Data, Introductory Statistics Class

      Weight group   Count
      100 - <120     7
      120 - <140     12
      140 - <160     7
      160 - <180     8
      180 - <200     12
      200 - <220     4
      220 - <240     1
      240 - <260     0
      260 - <280     1

      [Figure: histogram of the weight data; horizontal axis: Weight; vertical axis: Number of Students]
      The first bar represents all students with weights in the range 100 to <120. The height of this bar shows how many students’ weights are in this range.
  19. Stemplots
      Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable.
      To construct a stemplot:
      • Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number).
      • Write the stems in a vertical column; draw a vertical line to the right of the stems.
      • Write each leaf in the row to the right of its stem; order leaves if desired.
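
The construction steps can be sketched in Python. This is a simplified illustration that assumes non-negative whole-number observations, with the final digit as the leaf and the remaining digits as the stem; the data values are hypothetical and `stemplot` is a name chosen for this sketch.

```python
from collections import defaultdict


def stemplot(values):
    """Build stemplot rows: stem = all digits but the last, leaf = last digit."""
    leaves = defaultdict(list)
    # Sorting first means the leaves on each stem come out in increasing order.
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include every stem between the smallest and largest, even if it has no leaves,
    # so that gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem:>2} | " + "".join(str(leaf) for leaf in leaves[stem]))
    return rows


# Hypothetical observations, made up for illustration.
values = [38, 46, 46, 52, 70, 71, 75]
rows = stemplot(values)
for row in rows:
    print(row)
```

Because every leaf is a digit of an original observation, the data can be read back off the plot, which is the sense in which a stemplot maintains the original values.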
  20. Stemplots (Cont…)
      Example: Stemplot of the percents of females who are literate.
  21. Stemplots (Cont…)
      (a) Write the stems.
      (b) Go through the data and write each leaf on the proper stem.
      (c) Arrange the leaves on each stem in order out from the stem.
  22. Stemplots (Cont…)
      • To compare two related distributions, a back-to-back stemplot with common stems is useful.
      Example: a back-to-back stemplot comparing the distributions of female and male literacy rates. Values on the left are the female percents, ordered out from the stem from right to left. Values on the right are the male percents. It is clear that literacy is generally higher among males than among females in these countries.
  23. Stemplots (Cont…)
      • Stemplots do not work well for large datasets.
      • When the observed values have too many digits, trim the numbers before making a stemplot.
      • If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.
      Example: If all of the data values were between 150 and 179, then we may choose to use the following stems:

      15
      15
      16
      16
      17
      17

      Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).
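
Splitting stems can be sketched in Python as follows. The sketch assumes whole-number observations with the last digit as the leaf; the data values are hypothetical values between 150 and 179, and `split_stemplot` is a name chosen for this illustration.

```python
def split_stemplot(values):
    """Stemplot with each stem split in two.

    Leaves 0-4 go on the upper row of a stem and leaves 5-9 on the lower row.
    """
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1  # 0 = upper row, 1 = lower row
        rows.setdefault((stem, half), []).append(leaf)

    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in (0, 1):
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}")
    return lines


# Hypothetical observations between 150 and 179, made up for illustration.
values = [150, 153, 157, 158, 161, 164, 166, 172, 179]
lines = split_stemplot(values)
for line in lines:
    print(line)
```

With these values, the three original stems (15, 16, 17) become six rows, which spreads the leaves out enough to show the shape of the distribution.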
  24. Examining Distributions
      • When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern.
      • We can describe the overall pattern of a histogram by its shape, center, and spread.
      [Figure: histogram with a smoothed curve highlighting the overall pattern of the distribution]
  25. Examining Distributions (Cont…)
      • A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
      • A distribution is skewed to the left (left-skewed) if the left side of the graph extends much farther out than the right side.
      • A distribution is skewed to the right (right-skewed) if the right side of the graph extends much farther out than the left side.
  26. Outliers
      • Outliers are observations that lie outside the overall pattern of a distribution.
      The overall pattern is fairly symmetrical except for two states, Alaska and Florida. A large gap in the distribution is typically a sign of an outlier.
  27. Time Plots
      • A time plot of a variable plots each observation against the time at which it was measured.
      • Always put time on the horizontal axis and the variable being measured on the vertical axis of the plot.
      • Connecting the data points with lines helps emphasize any change over time.
  28. Time Plots (cont…)
      Scale matters: look at the scales.
  29. Graphing time series
      • Data collected over time are displayed in a time plot, with time on the horizontal axis and the variable of interest on the vertical axis.
      • In a time series, a trend is a rise or fall that persists over time, despite small irregularities.
        • This plot is a graph of a time series.
        • It shows that there is a decreasing trend in the data.