# Box whisker show

Application of the box-whisker plot in outlier detection and in assessing change in learning performance

1. Application of Box-whisker Plot in Psychological Research
   Dr. D. Dutta Roy, Ph.D.
   Psychology Research Unit, INDIAN STATISTICAL INSTITUTE
   203, B.T. Road, Kolkata – 700 108
   E-mail: ddroy@isical.ac.in
   http://www.isical.ac.in/~ddroy
   Venue: Psychology Research Unit, ISI, Kolkata
2. Box-Whisker Plot (John Wilder Tukey, 1915-2000)
   • It is a plot that displays summary information about the distribution of the values.
   • SPSS and STATISTICA are statistical software packages that can draw box-whisker plots.
3. PROPERTIES
4. HINGES
   • There are two hinges: the 25th and 75th percentiles.
   • The lower boundary of the box is the 25th percentile and the upper boundary of the box is the 75th percentile.
   • The horizontal line inside the box represents the median.
   • 50% of the cases are included within the box.
5. WHISKERS
   • The largest and smallest observed values that are not outliers are marked, and lines are drawn from the ends of the box to these values.
   • These lines are called whiskers.
6. OUTLYING VALUES
   • Cases with values more than 3 box-lengths from the upper or lower edge of the box are called extreme values. These are designated with an asterisk (*).
   • Cases with values between 1.5 and 3 box-lengths from the upper or lower edge of the box are called outliers and are designated with a circle (O).
   • [Figure: two example box plots of DATA (N = 6), with the 1.5 and 3 box-length distances marked]
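
The hinge, whisker and outlying-value rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `box_summary` and the `scores` data are hypothetical, and the quartiles come from `statistics.quantiles` with the inclusive method, which may differ slightly from the hinge rule used by SPSS or STATISTICA.

```python
from statistics import quantiles


def box_summary(values):
    """Return hinges, whisker ends, outliers and extreme values (illustrative)."""
    data = sorted(values)
    # Assumption: quartiles via the inclusive method stand in for the hinges.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1

    def distance_from_box(x):
        # How far x lies beyond the nearest edge of the box (0 if inside it).
        if x < q1:
            return q1 - x
        if x > q3:
            return x - q3
        return 0.0

    regular = [x for x in data if distance_from_box(x) <= 1.5 * box_length]
    outliers = [x for x in data
                if 1.5 * box_length < distance_from_box(x) <= 3 * box_length]
    extremes = [x for x in data if distance_from_box(x) > 3 * box_length]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most distant values that are not outlying.
        "whisker_low": min(regular), "whisker_high": max(regular),
        "outliers": outliers,   # drawn as circles (O)
        "extremes": extremes,   # drawn as asterisks (*)
    }


# Hypothetical scores, not data from the slides.
scores = [2, 3, 4, 4, 5, 5, 6, 6, 7, 12, 30]
summary = box_summary(scores)
print(summary)
```

With these made-up scores the box runs from 4 to 6.5, so 12 falls between 1.5 and 3 box-lengths above the box (an outlier) and 30 lies more than 3 box-lengths above it (an extreme value).
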
7. Normal Probability Curve Properties
   • Mean, median and mode values = 0 (for the standard normal curve).
   • The mean, median and mode all coincide, and there is perfect balance between the right and left halves of the curve.
   • Between the mean ± 1 SD (the middle two-thirds) = 68.27% of total cases.
   • Between the mean ± 2 SD = 95.45% (about 95%) of total cases.
   • Between the mean ± 3 SD = 99.73% (practically 100%) of total cases.
   • Skewness = 0. When the tail of a distribution spreads to the left, it is negatively skewed; when the tail spreads to the right, it is positively skewed.
   • Peakedness = mesokurtic.
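
The percentages of cases within 1, 2 and 3 SD of the mean can be checked numerically. The sketch below uses `NormalDist` from Python's standard library; the helper name `coverage` is an assumption introduced for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1


def coverage(k):
    """Share of cases lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {coverage(k):.2%}")
# prints 68.27%, 95.45% and 99.73%
```
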
8. Box-Plot and NPC
9. Application
   • Outlier detection
   • Detecting changes in learning process
10. What is an outlier?
    • Outliers are observations with a unique combination of characteristics identifiable as distinctly different from the other observations.
    • Do you find outliers in the pictures?
11. Impact of outliers: Pearson correlation between income and expenditure

    | Data | Pearson correlation | Sig. (2-tailed) | N (income) | N (income-expenditure pairs) |
    |---|---|---|---|---|
    | All cases | 0.16988 | 0.20646 | 60 | 57 |
    | After eliminating 99999 | 0.50 | 0.00 | 58 | 55 |

    Note: the correlation of 0.50 is significant at the 0.01 level (2-tailed).
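
The effect of a stray code such as 99999 on a correlation can be reproduced with a small sketch. The income and expenditure values below are invented for illustration and are not the data behind the slide; `pearson_r` is a hand-written helper and `MISSING` is a hypothetical name for the 99999 code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


MISSING = 99999  # assumed code for "no answer", left in the data by mistake

# Hypothetical values, not the data behind the slide.
income = [12, 15, 18, 20, 22, 25, 28, 30, MISSING, 35]
expenditure = [10, 12, 15, 15, 19, 20, 24, 25, 14, MISSING]

r_raw = pearson_r(income, expenditure)

# Keep only the pairs in which neither value is the 99999 code.
pairs = [(x, y) for x, y in zip(income, expenditure) if MISSING not in (x, y)]
r_clean = pearson_r(*zip(*pairs))

print(f"with 99999 codes: r = {r_raw:.3f}")
print(f"after eliminating 99999: r = {r_clean:.3f}")
```

In this made-up sample the two 99999 entries pull the coefficient close to zero, while the remaining eight pairs are strongly correlated.
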
12. Types of Outliers
13. Type 1: Procedural error. This is a data entry error or a mistake in coding.
14. Type 2: Extraordinary event for which the researcher has an explanation.
    [Table: the income-expenditure correlation tables of slide 11 are shown again]
15. Type 3: Extraordinary event for which the researcher has no explanation.
    Type 4: Observations that fall within the ordinary range of values on each of the variables but are unique in their combination of values across the variables.
16. Is an outlier harmful?
    • Outliers cannot be categorized as either beneficial or problematic; they must be viewed within the context of the analysis and evaluated by the types of information they may provide.
17. Can an outlier be detected?
    • Non-robust statistics such as the correlation coefficient are seriously affected by outliers. Outlier detection is therefore a prelude to item analysis, or to testing the reliability and validity of a psychological instrument using correlation coefficients.
    • In univariate statistics, outliers can be detected with stem-and-leaf plots and box-whisker plots.
    • In bivariate statistics the scatter plot, and in multivariate statistics Mahalanobis D², is useful for outlier detection.
    • [Figure: example box plot of DATA (N = 6)]
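
As a minimal sketch of the multivariate case, Mahalanobis D² can be computed by hand for two variables. The function name `mahalanobis_d2` and the data points are hypothetical, and the closed-form inverse of the 2 x 2 covariance matrix is used only to keep the example free of external libraries.

```python
def mahalanobis_d2(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d' S^-1 d written out with the closed-form 2 x 2 inverse.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical data: the last point is ordinary on each variable alone
# (x = 8 and y = 2 both occur in the range) but unusual in combination.
points = [(1, 1.2), (2, 2.1), (3, 2.9), (4, 4.2), (5, 5.1),
          (6, 5.8), (7, 7.1), (8, 8.2), (9, 8.8), (8, 2)]
d2 = mahalanobis_d2(points)
most_unusual = points[d2.index(max(d2))]
print(most_unusual)
```

A univariate box plot of x or of y would not flag the last point, whereas its D² is the largest in the sample.
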
18. The Information out of properties
    The box-plot contains an impressive amount of information.
    • From the median one can determine the central tendency, or location.
    • From the length of the box one can determine the spread, or variability, of the observations.
    • If the median is not in the centre of the box, the observed values are skewed.
    • If the median is closer to the bottom of the box than to the top, the data are positively skewed.
    • If the median is closer to the top of the box than to the bottom, the distribution is negatively skewed.
    • The length of the tail is shown by the whiskers and by the outlying and extreme points.
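
The rule for reading skewness from the position of the median inside the box can be sketched as follows. The function name `box_skew` and the sample data are assumptions for illustration, and the quartiles again come from `statistics.quantiles` with the inclusive method.

```python
from statistics import quantiles


def box_skew(values):
    """Direction of skew judged from where the median sits inside the box."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    lower_part = median - q1   # bottom of the box up to the median
    upper_part = q3 - median   # median up to the top of the box
    if lower_part < upper_part:
        return "positive"      # median closer to the bottom of the box
    if lower_part > upper_part:
        return "negative"      # median closer to the top of the box
    return "symmetric"


# Hypothetical samples.
print(box_skew([1, 2, 2, 3, 3, 4, 8, 12, 20]))        # positive
print(box_skew([1, 9, 13, 17, 18, 18, 19, 19, 20]))   # negative
print(box_skew([1, 2, 3, 4, 5, 6, 7, 8, 9]))          # symmetric
```
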
19. CASE STUDY ON APPLICATION OF BOX-WHISKER PLOT IN DETECTING CHANGE IN LEARNING PROCESS
20. Detecting change in learning process
    • Learning is the modification of behaviour through practice and experience.
    • Change in the learning process can usually be detected using a learning curve.
    • A learning curve is a graphical representation of the changing rate of learning for a given activity or tool.
    • Typically, the increase in retention of information is sharpest after the initial attempts and then gradually evens out, meaning that less and less new information is retained after each repetition.
21. CASE STUDY
    • 25 students were trained with 7 training modules of Fast ForWord.
    • Results were analyzed in terms of box-whisker plots.
22. Circus Sequence (CS)
    • The exercise develops the participant's listening accuracy by presenting sound sweeps at different frequencies and durations, and with different lengths of time between sounds. The frequencies and durations of the sound sweeps correspond to the rapid transitions in the sounds of the English language.
23. Results of CS
    [Figure: Box & Whisker Plot (CS exercise, Treatments = 34); y-axis: Percentage of Success; x-axis: trials T1 to T34; legend: Min-Max, 25%-75%, Median value]
    • Size: Box size gradually becomes larger, indicating inclusion of a larger number of cases in the learning competency group.
    • Location of median: The median moves upward with successive trials. This indicates successive learning competency across trials.
    • Whiskers: The upper whisker gradually vanishes and the lower whisker moves upward. This indicates that most cases achieved learning competency, though a few cases found it difficult to achieve.
    • After 100% achievement, box size increases, indicating fluctuation of attention or the operation of other intervening factors once the goal has been achieved.
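
The quantities read off the plot (box size, location of the median, Min-Max whiskers) can be tabulated per trial. The sketch below assumes a hypothetical `trials` dictionary of percentage-of-success scores; it does not reproduce the Fast ForWord data. The whiskers follow the Min-Max legend of the figure rather than the 1.5 box-length rule.

```python
from statistics import quantiles


def trial_summary(scores):
    """Min-Max whiskers, box edges, median and box length for one trial."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores), "q1": q1, "median": median,
        "q3": q3, "max": max(scores), "box_length": q3 - q1,
    }


# Hypothetical percentage-of-success scores for three trials.
trials = {
    "T1": [10, 20, 30, 40, 50],
    "T2": [30, 40, 50, 60, 80],
    "T3": [60, 70, 80, 90, 100],
}

summaries = {name: trial_summary(scores) for name, scores in trials.items()}
medians = [summaries[name]["median"] for name in trials]

# The median moving upward across successive trials suggests learning.
median_rising = all(a < b for a, b in zip(medians, medians[1:]))

for name, s in summaries.items():
    print(name, s)
print("median rises across trials:", median_rising)
```
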
24. Old MacDonald’s Flying Farm (OM)
    • Students use the computer mouse to catch and hold a flying animal. The animal repeats a single syllable several times, and students must release the animal when they hear a change in the syllable.
25. Results of OM
    [Figure: Box & Whisker Plot (OM exercise, Treatment = 20); y-axis: Percentage of Success; x-axis: trials T1 to T20; legend: Min-Max, 25%-75%, Median value]
26. Phonic Words (PW)
    • Students see two pictures representing two similar words that differ only by initial or final consonant (“tack” versus “tag”). When students hear the word representing one of the pictures, they must click the picture that matches the word.
    [Figure: Box & Whisker Plot (PW Exercise, Treatment = 20); y-axis: Percentage of Success; x-axis: trials T1 to T20; legend: Min-Max, 25%-75%, Median value]
27. Compare relative effectiveness of training modules
    [Figure: three box & whisker plots shown together: PW Exercise (Treatment = 20), OM exercise (Treatment = 20) and CS exercise (Treatments = 34); y-axis: Percentage of Success; x-axis: trials; legend: Min-Max, 25%-75%, Median value]
28. SUMMARY
    • The box-whisker plot is a useful statistical tool for detecting outliers and for detecting change in the learning process.
    • The box plot is an effective statistical tool for comparing the relative effectiveness of different training modules.