The Role of Box Plots in Comparing Multiple Data Sets

• 1. CITOOLKIT Box Plot The Role of Box Plots in Comparing Multiple Data Sets
• 2. citoolkit.com Definition: One of the best ways to analyze any process is to plot the data on a graph or chart.
• 3. Definition: A box plot is a graph that summarizes the distribution of numeric data values. It is also referred to as a box-and-whisker plot because it displays the data in a box-and-whiskers format.
• 4. Applications: Box plots are mainly used to explore data and to present it in an easy-to-understand manner. They are widely used in statistics, process improvement, scientific research, economics, and the social and human sciences.
• 5. Vertical vs. Horizontal: Box plots can be drawn either vertically or horizontally. The length of the box plot indicates the spread of the data.
• 6. Characteristics: Box plots provide a quick way to examine the central tendency and variation present in the data. A box plot that spans a wider range indicates more variability. Outliers are usually plotted as asterisks.
• 7. Characteristics: Box plots are useful for comparing several data sets (for example, groups A, B, C, and D) in terms of central tendency and variability.
• 8. Characteristics: The same continuous data can be presented graphically using either a histogram or a box plot.
• 9. Characteristics: Box plots are less detailed than histograms and take up less space, which makes them more practical for comparing multiple data sets.
• 10. Uses: Box plots are used to check whether there is a significant difference in a process, in terms of central tendency and variability, before and after a process improvement is implemented.
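The before/after comparison can be sketched numerically as a minimal illustration. The measurements below are hypothetical, the helper name `center_and_spread` is made up for this sketch, and the "inclusive" quartile convention is an assumption (statistical packages may use a different one).

```python
import statistics

def center_and_spread(values):
    """Return (median, interquartile range) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return median, q3 - q1

# Hypothetical cycle times before and after a process improvement.
before = [12, 15, 11, 18, 14, 20, 13, 17, 16]
after = [10, 11, 10, 12, 11, 12, 11, 10, 12]

for label, data in (("BEFORE", before), ("AFTER", after)):
    median, iqr = center_and_spread(data)
    print(f"{label}: median = {median}, IQR = {iqr}")
```

A lower median with a smaller IQR after the change would show up on the chart as a box that is both shifted and shorter. Whether the difference is statistically significant still needs a formal test.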
• 11. Structure: A box plot is made up of a box and two whiskers. The box spans the interquartile range (IQR). The maximum length of a whisker is limited to 1.5 times the IQR.
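The 1.5 × IQR limit can be illustrated with a short sketch. It assumes the common convention that each whisker stops at the most extreme data point still inside the limit. The function name `whisker_ends`, the sample values, and the "inclusive" quartile method are assumptions made for illustration.

```python
import statistics

def whisker_ends(values, k=1.5):
    """Return the data values where the lower and upper whiskers end."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr  # a whisker may not reach below this
    upper_limit = q3 + k * iqr  # a whisker may not reach above this
    # Each whisker stops at the most extreme point inside its limit.
    lower_end = min(x for x in data if x >= lower_limit)
    upper_end = max(x for x in data if x <= upper_limit)
    return lower_end, upper_end

sample = [2, 4, 4, 5, 6, 7, 8, 9, 25]  # hypothetical data
print(whisker_ends(sample))  # prints (2, 9)
```

Here Q1 = 4 and Q3 = 8, so the IQR is 4 and the upper limit is 8 + 1.5 × 4 = 14. The value 25 lies beyond it, so the upper whisker ends at 9.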
• 12. Structure: Box plots summarize key statistics from the data: the minimum value, lower quartile, median, upper quartile, and maximum value. The middle line is the median of the data points.
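As a rough sketch, these five statistics can be computed with Python's standard library. The sample values are hypothetical and `five_number_summary` is a name chosen for this example. Quartile conventions differ between tools, so the "inclusive" method used here is an assumption and other software may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # n=4 splits the data into quartile groups and returns the three cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

sample = [2, 4, 4, 5, 6, 7, 8, 9, 10]  # hypothetical measurements
print(five_number_summary(sample))  # prints (2, 4.0, 6.0, 8.0, 10)
```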
• 13. Structure: The data is plotted such that the box (the interquartile range) contains the middle 50% of the data points, quartile group 4 contains the top 25%, and quartile group 1 contains the bottom 25%. Note: When the median line is not visible in the box plot, it suggests that the median coincides with one of the quartiles.
• 14. Structure: Sometimes the mean is displayed with a special character. Other characters, such as a diamond or a plus sign, can also indicate the mean.
• 15. Structure: Any data points beyond the whiskers are considered outliers. Outliers often reflect errors in data recording or data entry. If the values are real, you should investigate what was going on in the process at that time.
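A minimal sketch of this rule flags every point that lies more than 1.5 × IQR outside the box. The readings are hypothetical, `find_outliers` is an illustrative name, and the "inclusive" quartile method is an assumption.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values that fall beyond the whisker limits."""
    q1, _, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return [x for x in values if x < lower_limit or x > upper_limit]

# Hypothetical readings; 40 might be a data-entry error or a real event.
readings = [12, 15, 14, 13, 16, 15, 14, 40, 13, 15, 14]
print(find_outliers(readings))  # prints [40]
```

The code only flags candidates. Deciding whether a flagged value is a recording error or a real process event still requires looking at what happened at that time.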
• 16. Structure: [Figure: annotated box plot labeling the box, the whiskers, the median, the mean, the lower and upper quartiles, the interquartile range, quartile groups 1 and 4 (25% each), the minimum and maximum values, and an outlier.]
• 17. Data Size: Like histograms, box plots are used for moderate to large amounts of data. The size of the box plot can vary significantly if the data size is too small (the slide compares N = 40 with N = 14).
• 18. Individual Value Plots: Individual value plots are preferred over box plots when representing small amounts of data.
• 19. Symmetric Distribution: A box plot can show whether the distribution is symmetric or skewed. In a symmetric distribution, the mean and median are nearly the same, and the two whiskers have almost the same length.
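Both clues (the mean compared with the median, and the two whisker lengths) can be checked numerically. The sketch below is illustrative only: `symmetry_clues` is a made-up helper, the two data sets are hypothetical, and the quartiles use the "inclusive" convention as an assumption.

```python
import statistics

def symmetry_clues(values, k=1.5):
    """Return the mean-median gap and the two whisker lengths."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers end at the most extreme points within 1.5 x IQR of the box.
    lower_end = min(x for x in data if x >= q1 - k * iqr)
    upper_end = max(x for x in data if x <= q3 + k * iqr)
    return {
        "mean_minus_median": statistics.mean(data) - median,
        "lower_whisker_length": q1 - lower_end,
        "upper_whisker_length": upper_end - q3,
    }

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical, evenly spread
skewed = [1, 2, 2, 3, 3, 4, 5, 9, 20]     # hypothetical, long right tail

print(symmetry_clues(symmetric))
print(symmetry_clues(skewed))
```

For the symmetric set the gap is zero and both whiskers have length 2. For the skewed set the mean sits well above the median and the upper whisker is four times longer than the lower one.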
• 20. Example – Crop Yield: The following box plots display the yield of a crop resulting from the application of two different fertilizers. [Figure: box plots of yield (axis from 35 to 55) for Fertilizer 1 and Fertilizer 2. Data source: Minitab.] Fertilizer #2 appears to produce higher crop yields than Fertilizer #1. What other comments would you make about these box plots? Think about the variation as well as the presence of any unusual values or outliers.
• 21. Example – Diabetes Test: The box plots below illustrate an analysis that was conducted to diagnose the presence of diabetes at a workplace. [Figure: box plots of test results (axis from 0 to 500) by gender, female and male, with several outliers marked by asterisks.] The plots suggest that females have, in general, higher glucose levels than males. To confirm the statistical significance of this difference, ANOVA can be applied to compare the means of the two groups.
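To make the ANOVA step concrete, the sketch below computes the one-way ANOVA F statistic by hand using only the standard library. The glucose values are hypothetical and are not the data behind the slide, and `one_way_anova_f` is a name chosen for this example. The sketch stops at the F statistic; judging significance also requires comparing it against the F distribution with (k − 1, N − k) degrees of freedom, which a statistics package would normally handle.

```python
import statistics

def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical glucose test results, for illustration only.
female = [110, 125, 140, 98, 132]
male = [95, 105, 120, 88, 101]
print(f"F = {one_way_anova_f(female, male):.2f}")
```

A larger F means the gap between the group means is large relative to the spread within each group.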
• 22. Further Information: Many applications and online services (such as Minitab, JMP, and SPSS) allow box plots to be created quickly and easily.