Your SlideShare is downloading. ×
0
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Descriptives & Graphing
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Descriptives & Graphing

16,964

Published on

Overviews descriptive and graphical approaches to analysis of univariate data.

Overviews descriptive and graphical approaches to analysis of univariate data.

Published in: Education, Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
16,964
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
664
Comments
0
Likes
3
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • 7126/6667 Survey Research & Design in Psychology
    Semester 1, 2015, University of Canberra, ACT, Australia
    James T. Neill
    http://www.slideshare.net/jtneill/descriptives-graphing
    http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Lectures/Descriptives_%26_graphing
    Image source: http://commons.wikimedia.org/wiki/File:3D_Bar_Graph_Meeting.jpg
    Image author: lumaxart, http://www.flickr.com/photos/lumaxart/2136954043/
    Image license: Creative Commons Attribution Share Alike 2.0 unported, http://creativecommons.org/licenses/by-sa/2.0/deed.en
    Description: Overviews descriptive and graphical approaches to analysis of univariate data.
  • You are adding tools to your toolkit
    Image: Clipart
  • <number>
  • <number>
  • Image source: http://www.flickr.com/photos/rnddave/5094020069
    By David (rnddave), http://www.flickr.com/photos/rnddave/
    License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
  • Image source: http://www.flickr.com/photos/analytik/1356366068/
    By analytic http://www.flickr.com/photos/analytik/
    License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
  • Image source: http://www.flickr.com/photos/elmoalves/2932572231/
    By analytic Elmo Alves http://www.flickr.com/photos/elmoalves/
    License: CC-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en
  • Image source: http://www.flickr.com/photos/lloydm/2429991235/
    By analytic fakelvis http://www.flickr.com/photos/lloydm/
    License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
  • Image source: http://www.flickr.com/photos/peanutlen/2228077524/ by Smile My Day
    Image author: Terence Chang, http://www.flickr.com/photos/peanutlen/
    Image license: CC-by-A 2.0
  • <number>
  • <number>
  • <number>
  • <number>
  • This lecture focuses on univariate descritive statistics and graphs
  • By using univariate statistics and possibly also graphs, we want to give a meaningful snapshop summary which captures the main features of each variable’s distribution.
  • Nominal Mode e.g., what’s the favourite colour?
    Ordinal Median e.g.,
    See also http://www.quickmba.com/stats/centralten/
    OVERHEAD p.84 Bryman & Duncan (1997)
  • Nominal data consists of labels - e.g., 1=no, 2=yes
    Note that if you want to test whether one frequency is significantly higher than another, then use binomial test or a contingency test (chi-square).
    Also note that nominal variables can be used as IV’s in tests of mean differences, but not in parametric tests of association such as correlations. To use in parametric tests of association, nominal data can be dummy coded (i.e., converted into a series of dichotomous variables).
  • Note: Can use ordinal data as IV’s in tests of mean differences, but not in parametric tests of association.
  • Image source: Unknown.
  • e.g., for a distribution with 32%, 33%, and 34%, the mode would be misleading to report; instead it would be appropriate to report the similar % for each of the three categories.
  • Note: Can use ordinal data as IV’s in tests of mean differences, but not in parametric tests of association.
    Crosstabs (contingency table) is the bivariate equivalent of frequencies
  • Image source: Bell Curve http://www.flickr.com/photos/trevorblake/3200899889/
    By Trevor Blake http://www.flickr.com/photos/trevorblake/
    License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
  • Image source: Unknown
    Karl Pearson in his 1893 letter to Nature suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution
    Moments around the mean:
    http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
  • Imge sources: Clipart
  • Standard deviation is related to the scale of measurement, e.g.
    If SD = 1 one for cms, it would be 10 for ms and 1000 for km
    So, don’t assume SD = 50 is big or SD = .1 is small – it all depends on what scale is used.
    Be aware that the lower the N, the lower the SD –> large samples reduce the SD
    N-1 is the formula when generalising from a sample to a population; otherwise use N if its the SD for the sample.
  • Image sources: Image source: http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
  • Image source: Unknown.
    The kurtosis reflects the extent to which the density of the empirical distribution differs from the probability densities of the normal curve.
    Mesokurtic = 0
    http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
  • The significance tests for skewness / kurtosis are not often used, at least in part because they are subject to sample size, so with a small size sample they are less likely to be significant than with a large sample size.
  • Image source: Unknown.
  • Image source: Unknown.
    The significance tests for skewness / kurtosis are subject to sample size, so with a small size sample they are less likely to be significant than with a large sample size.
  • Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
    Roughly normal, with positive skew
  • Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
    Bimodal
  • Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
    Bimodal, with positive skew
  • Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
    At what age do you think you will die?
    There is an outlier near zero which is minimising the positive skew; the data is also leptokurtic.
  • Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
    This distribution is bi-modal. It should not be treated as normal.
    In fact, if one looks more closely, it would sense to break down the distribution by gender.
    From the Quick Fun Survey data in Tutorial 1.
  • Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
    This is the distribution for males; it has a ceiling effect, with ‘very feminine’ not being selected at all (and not shown on the graph – it should be). It is negatively skewed and leptokurtic. Note though that because ‘Very feminine’ has no cases and is not shown, the population data would probably be even more skewed than this sample indicates. It is probably leptokurtic.
  • OVERHEAD BOX-PLOT p.84 Bryman & Duncan (1997)
  • The more skewed a distribution is, the more important it is to use the median tends as a measure of central tendency
  • Image source: Unknown
  • Image source: http://www.flickr.com/photos/pagedooley/2121472112/
    By Kevin Dooley http://www.flickr.com/photos/pagedooley/
    License: CC-by-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en
  • Image source: http://en.wikipedia.org/wiki/File:FAE_visualization.jpg
    License: Public domain
  • http://www.youtube.com/watch?v=dvM4JPGsmVw
    Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.png
    Image author: FRacco, http://commons.wikimedia.org/wiki/User:FRacco
    Image license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en
  • For more examples of bias questions, see Nardi (2006). pp. 66-67
    Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.png
    Image author: FRacco, http://commons.wikimedia.org/wiki/User:FRacco
    Image license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en
  • Image source:http://www.processtrends.com/TOC_data_visualization.htm
    Cleveland, William S., Elements of Graphing Data, 1985
    License: Unknown
    Cleveland (1984) conducted experiments to measure people's accuracy in interpreting graphs, with findings as follows (Robbins):
    Position along a common scale
    Position along non aligned scales
    Length
    Angle-slope
    Area
    Volume
    Color hue - saturation - density
  • Image source: Unknown
  • Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
    DV = height (ratio)
    IV = Gender (categorical)
  • Image source: Unknown.
    A bit of a plug and plea for stem & leaf plots – they are underused. They are powerful because they are:
    Efficient – e.g., they contain all the data succinctly – others could use the data in a stem & leaf plot to do further analysis
    Visual and mathematical: As well as containing all the data, the stem & leaf plot presents a powerful, recognizable visual of the data, akin to a bar graph. Turning a stem & leaf plot 90 degrees counter-clockiwse is recommend – this makes the visual display more conventional and is easy to recognise, and the numbers are are less obvious, hence emphasizing the visual histogram shape.
  • Image source: Unknown.
    This is a univariate precursor to a scatterplot (a plot of a ratio by ratio variable).
    It works if there is a small amount of data; otherwise use a histogram to indicate the frequency within equal interval ranges.
    From: http://www.physics.csbsju.edu/stats/display.distribution.html
    Image source: Unknown.
    Karl Pearson in his 1893 letter to Nature suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution
    Moments around the mean:
    http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
    Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
    Histogram: At what age do you think you will die?
    There is an outlier near zero which is minimising the positive skew; the data is also quite strongly leptokurtic.
  • Image source: Unknown.
  • Tufte, Edward R., The Visual Display of Quantitative Information, 1983
  • Tufte, Edward R., The Visual Display of Quantitative Information, 1983
  • Visualizing Quantitative Data, Tufte E. R., Graphics Press, 2001
    Graphical Methods for Data Analysis, Chambers J., Cleveland, B. Kleiner, and P. Tukey, Duxbury Press, Boston, 1983
    Exploratory Data Analysis, Tukey J., Addison-Wesley Pub Co., 1977
  • Transcript

    • 1. Lecture 3 Survey Research & Design in Psychology James Neill, 2015 Descriptives & Graphing
    • 2. 2 Overview: Descriptives & Graphing 1. Steps with data 2. How to approach data 3. Parametric vs. non-parametric & LOM 4. Descriptive statistics 5. Normal distribution 6. Non-normal distributions 7. Effect of skew on central tendency 8. Principles of graphing 9. Univariate graphical techniques
    • 3. 3 Steps with data
    • 4. 4 Steps with dataCheck and screen data Explore, describe, & graph Test hypotheses
    • 5. 5 Data checking • Survey data can be checked by having one person read the survey responses aloud to another person who checks the electronic data file. • For a large study, a proportion of the surveys can be checked and the error-rate declared in the research report.
    • 6. 6 Data screening • Carefully 'screening' a data file helps to minimise errors and maximise validity. • For example, screen for: –Out of range values (min. and max.) –Mis-entered data –Missing cases –Duplicate cases –Missing data
    • 7. 7 Don't be afraid - you can't break data!
    • 8. 8 Play with the data – get to know it.
    • 9. 9 Get intimate with your data
    • 10. 10 Clearly report the data's main features find a meaningful, accurate way to depict the ‘true story’ of the data
    • 11. 11 Parametric & non-parametric statistics & level of measurement
    • 12. 12 Golden rule of data analysis Level of measurement determines type of descriptive statistics and graphs Level of measurement determines which types of descriptive statistics and which types of graphs are appropriate.
    • 13. 13 Levels of measurement and non-parametric vs. parametric Categorical & ordinal DVs → non-parametric (Does not assume a normal distribution) Interval & ratio DVs → parametric (Assumes a normal distribution) → non-parametric (If distribution is non-normal) DVs = dependent variables
    • 14. 14 Parametric statistics • Procedures which estimate parameters of a population, usually based on the normal distribution –M, SD, skewness, kurtosis → t-tests, ANOVAs –r → linear regression, multiple linear regression
    • 15. 15 • More powerful (more sensitive) • Have more assumptions (normal distribution) • More vulnerable to violations of assumptions Parametric statistics
    • 16. 16 Non-parametric statistics (Distribution-free tests) • Have fewer assumptions (do not assume a normal distribution) • Common non-parametrics statistics: –Frequency → sign test, chi-squared –Rank order → Mann-Whitney U test, Wilcoxon matched-pairs signed-ranks test
    • 17. 17 Univariate descriptive statistics
    • 18. 18 Number of variables Univariate = one variable Bivariate = two variables Multivariate = more than two variables e.g., mean, median, mode, histogram, bar chart, box plot e.g., correlation, t-test, scatterplot, clustered bar chart e.g., reliability analysis, factor analysis, multiple linear regression
    • 19. 19 What do we want to describe? The distributional properties of underlying variables, based on: ● Central tendency(ies): Frequencies, Mode, Median, Mean ● Shape: Skewness, Kurtosis ● Spread (dispersion): Min., Max., Range, IQR, Percentiles, Var/SD for sampled data.
    • 20. 20 Measures of central tendency Statistics which represent the ‘centre’ of a frequency distribution: –Mode (most frequent) –Median (50th percentile) –Mean (average) Which ones to use depends on: –Type of data (level of measurement) –Shape of distribution (esp. skewness) Reporting more than one may be appropriate.
    • 21. 21 Measures of central tendency √√If meaningfulRatio √√√Interval √Ordinal √Nominal MeanMedianMode / Freq. /%s If meaningful x x x
    • 22. 22 Measures of distribution • Measures of shape, spread, dispersion, and deviation from the central tendency Non-parametric: • Min and max • Range • Percentiles Parametric: • SD • Skewness • Kurtosis
    • 23. 23 √√√Ratio √√Interval √Ordinal Nominal Var / SDPercentileMin / Max, Range Measures of spread / dispersion / deviation If meaningful √ x x x x
    • 24. 24 Descriptives for nominal data • Nominal = Labelled categories • Descriptive statistics: –Most frequent? (Mode – e.g., females) –Least frequent? (e.g., Males) –Frequencies (e.g., 20 females, 10 males) –Percentages (e.g. 67% females, 33% males) –Cumulative percentages –Ratios (e.g., twice as many females as males)
    • 25. 25 Descriptives for ordinal data • Ordinal = Conveys order but not distance (e.g., ranks) • Descriptives approach is as for nominal (frequencies, mode etc.) • Plus percentiles (including median) may be useful
    • 26. 26 Descriptives for interval data • Interval = order and distance, but no true 0 (0 is arbitrary). • Central tendency (mode, median, mean) • Shape/Spread (min., max., range, SD, skewness, kurtosis) Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)
    • 27. 27 Descriptives for ratio data • Ratio = Numbers convey order and distance, meaningful 0 point • As for interval, use median, mean, SD, skewness etc. • Can also use ratios (e.g., Category A is twice as large as Category B)
    • 28. 28 Mode (Mo) • Most common score - highest point in a frequency distribution – a real score – the most common response • Suitable for all levels of data, but may not be appropriate for ratio (continuous) • Not affected by outliers • Check frequencies and bar graph to see whether it is an accurate and useful statistic
    • 29. 29 Frequencies (f) and percentages (%) • # of responses in each category • % of responses in each category • Frequency table • Visualise using a bar or pie chart
    • 30. 30 Median (Mdn) • Mid-point of distribution (Quartile 2, 50th percentile) • Not badly affected by outliers • May not represent the central tendency in skewed data • If the Median is useful, then consider what other percentiles may also be worth reporting
    • 31. 31 Summary: Descriptive statistics Depending on the level of measurement and whether data is parametric (normal): • Describe the central tendency –Frequencies, Percentages –Mode, Median, Mean • Describe the variability: –Min, Max, Range, Quartiles –Standard Deviation, Variance
    • 32. 32 Properties of the normal distribution
    • 33. 33 The four moments of a normal distribution Row 1 Row 2 Row 3 Row 4 0 2 4 6 8 10 12 Column 1 Column 2 Column 3 Mean ←SD→ -ve Skew +ve Skew ←Kurt→
    • 34. 34 The four moments of a normal distribution Four mathematical qualities (parameters) can describe a continuous distribution which as least roughly follows a bell curve shape: • 1st = mean (central tendency) • 2nd = SD (dispersion) • 3rd = skewness (lean / tail) • 4th = kurtosis (peakedness / flattness)
    • 35. 35 Mean (1st moment ) • Average score Mean = Σ X / N • For normally distributed ratio or interval (if treating it as continuous) data. • Influenced by extreme scores (outliers)
    • 36. 36 Beware inappropriate averaging... With your head in an oven and your feet in ice you would feel, on average, just fine The majority of people have more than the average number of legs (M = 1.9999).
    • 37. 37 Standard deviation (2nd moment ) • SD = square root of the variance = Σ (X - X)2 N – 1 • For normally distributed interval or ratio data • Affected by outliers • Can also derive the Standard Error (SE) = SD / square root of N
    • 38. 38 Skewness (3rd moment ) • Lean of distribution – +ve = tail to right – -ve = tail to left • Can be caused by an outlier, or ceiling or floor effects • Can be accurate (e.g., cars owned per person would have a skewed distribution)
    • 39. 39 Skewness (3rd moment) (with ceiling and floor effects) ● Negative skew ● Ceiling effect ● Positive skew ● Floor effect
    • 40. 40 Kurtosis (4th moment ) • Flatness or peakedness of distribution +ve = peaked -ve = flattened • By altering the X &/or Y axis, any distribution can be made to look more peaked or flat – add a normal curve to help judge kurtosis visually.
    • 41. 41 Kurtosis (4th moment ) Blue = Negative (platykurtic) Red = Positive (leptokurtic)
    • 42. 42 Judging severity of skewness & kurtosis • View histogram with normal curve • Deal with outliers • Rule of thumb: Skewness and kurtosis > -1 or < 1 is generally considered to sufficiently normal for meeting the assumptions of parametric inferential statistics • Significance tests of skewness: Tend to be overly sensitive (therefore avoid using)
    • 43. 43 Areas under the normal curve If distribution is normal (bell-shaped - or close): ~68% of scores within +/- 1 SD of M ~95% of scores within +/- 2 SD of M ~99.7% of scores within +/- 3 SD of M
    • 44. 44 Areas under the normal curve
    • 45. 45 Non-normal distributions
    • 46. 46 Types of non-normal distribution • Modality –Uni-modal (one peak) –Bi-modal (two peaks) –Multi-modal (more than two peaks) • Skewness –Positive (tail to right) –Negative (tail to left) • Kurtosis –Platykurtic (Flat) –Leptokurtic (Peaked)
    • 47. 47 Non-normal distributions
    • 48. 48 Histogram of weight WEIGHT 110.0100.090.080.070.060.050.040.0 Histogram Frequency 8 6 4 2 0 Std. Dev = 17.10 Mean = 69.6 N = 20.00
    • 49. 49 Histogram of daily calorie intake
    • 50. 50 Histogram of fertility
    • 51. 51 Example ‘normal’ distribution 1 60 50 40 30 20 10 0 Frequency Mean =81.21 Std. Dev. =18.228 N =188
    • 52. 52 Example ‘normal’ distribution 2 Very masculineFairly masculineAndrogynousFairly feminineVery feminine Femininity-Masculinity 60 40 20 0 Count This bimodal graph actually consists of two different, underlying normal distributions.
    • 53. 53 Very masculineFairly masculineAndrogynousFairly feminineVery feminine Femininity-Masculinity 60 40 20 0 Count Very masculineFairly masculineAndrogynousFairly feminine Femininity-Masculinity 50 40 30 20 10 0 Count Gender: male Distribution for females Distribution for males Very masculineFairly masculineAndrogynousFairly feminineVery feminine Femininity-Masculinity 60 40 20 0 Count Gender: female
    • 54. 54 Non-normal distribution: Use non-parametric descriptive statistics • Min. & Max. • Range = Max.-Min. • Percentiles • Quartiles –Q1 –Mdn (Q2) –Q3 –IQR (Q3-Q1)
    • 55. 55 Effects of skew on measures of central tendency +vely skewed distributions mode < median < mean symmetrical (normal) distributions mean = median = mode -vely skewed distributions mean < median < mode
    • 56. 56 Effects of skew on measures of central tendency
    • 57. 57 Transformations • Converts data using various formulae to achieve normality and allow more powerful tests • Loses original metric • Complicates interpretation
    • 58. 58 1.If a survey question produces a ‘floor effect’, where will the mean, median and mode lie in relation to one another? Review questions
    • 59. 59 2.Would the mean # of cars owned in Australia to exceed the median? Review questions
    • 60. 60 3.Would you expect the mean score on an easy test to exceed the median performance? Review questions
    • 61. 61 Graphical techniques
    • 62. 62 Visualisation “Visualization is any technique for creating images, diagrams, or animations to communicate a message.” - Wikipedia
    • 63. 63 Science is beautiful (Nature Video) (Youtube – 5:30 mins)
    • 64. 64 Is Pivot a turning point for web exploration? (Gary Flake) (TED talk - 6 min.)
    • 65. 65 Principles of graphing • Clear purpose • Maximise clarity • Minimise clutter • Allow visual comparison
    • 66. 66 Graphs (Edward Tufte) • Visualise data • Reveal data – Describe – Explore – Tabulate – Decorate • Communicate complex ideas with clarity, precision, and efficiency
    • 67. 67 Graphing steps 1. Identify purpose of the graph (make large amounts of data coherent; present many #s in small space; encourage the eye to make comparisons) 2. Select type of graph to use 3. Draw and modify graph to be clear, non-distorting, and well- labelled (maximise clarity, minimise clarity; show the data; avoid distortion; reveal data at several levels/layers)
    • 68. 68 Software for data visualisation (graphing) 1. Statistical packages ● e.g., SPSS Graphs or via Analyses 2. Spreadsheet packages ● e.g., MS Excel 3. Word-processors ● e.g., MS Word – Insert – Object – Micrograph Graph Chart
    • 69. 69 Cleveland’s hierarchy
    • 70. 70 Univariate graphs • Bar graph • Pie chart • Histogram • Stem & leaf plot • Data plot / Error bar • Box plot Non-parametric i.e., nominal, ordinal, or non- normal interval or ratio} } Parametric i.e., normally distributed interval or ratio
    • 71. 71 Bar chart (Bar graph) AREA Biology Anthropology Information Technolo Psychology Sociology Count 13 12 12 11 11 10 10 9 9 AREA Biology Anthropology Information Technolo Psychology Sociology Count 12 11 10 9 8 7 6 5 4 3 2 1 0 • Allows comparison of heights of bars • X-axis: Collapse if too many categories • Y-axis: Count/Frequency or % - truncation exaggerates differences • Can add data labels (data values for each bar) Note truncated Y-axis
    • 72. 72 Biology Anthropology Information Technolo Psychology Sociology • Use a bar chart instead • Hard to read –Difficult to show • Small values • Small differences –Rotation of chart and position of slices influences perception Pie chart
    • 73. 73 Histogram Participant Age 62.552.542.532.522.512.5 3000 2000 1000 0 Std. Dev = 9.16 Mean = 24.0 N = 5575.00 Participant Age 63.0 58.0 53.0 48.0 43.0 38.0 33.0 28.0 23.0 18.0 13.0 8.0 600 500 400 300 200 100 0 Std. Dev = 9.16 Mean = 24.0 N = 5575.00 Participant Age 65 61 57 53 49 45 41 37 33 29 25 21 17 13 9 1000 800 600 400 200 0 Std. Dev = 9.16 Mean = 24 N = 5575.00 • For continuous data (Likert?, Ratio) • X-axis needs a happy medium for # of categories • Y-axis matters (can exaggerate)
    • 74. 74 Histogram of male & female heights Wild & Seber (2000)
    • 75. 75 Stem & leaf plot ● Use for ordinal, interval and ratio data (if rounded) ● May look confusing to unfamiliar reader
    • 76. 76 • Contains actual data • Collapses tails • Underused alternative to histogram Stem & leaf plot Frequency Stem & Leaf 7.00 1 . & 192.00 1 . 22223333333 541.00 1 . 444444444444444455555555555555 610.00 1 . 6666666666666677777777777777777777 849.00 1 . 88888888888888888888888888899999999999999999999 614.00 2 . 0000000000000000111111111111111111 602.00 2 . 222222222222222233333333333333333 447.00 2 . 4444444444444455555555555 291.00 2 . 66666666677777777 240.00 2 . 88888889999999 167.00 3 . 000001111 146.00 3 . 22223333 153.00 3 . 44445555 118.00 3 . 666777 99.00 3 . 888999 106.00 4 . 000111 54.00 4 . 222 339.00 Extremes (>=43)
    • 77. 77 Box plot (Box & whisker) ● Useful for interval and ratio data ● Represents min., max, median, quartiles, & outliers
    • 78. 78 • Alternative to histogram • Useful for screening • Useful for comparing variables • Can get messy - too much info • Confusing to unfamiliar reader Box plot (Box & whisker) Participant Gender FemaleMaleMissing 10 8 6 4 2 0 Time Management-T1 Self-Confidence-T1 44954162578259628414042044353275182341862330517623006559128211495 3201419358828475475400198324512898200336473 52157129504268724318255928345427211669040523444 4423423635403519067273946893137 3562338330403962312229 12255255545 2385410773323584004 552433515563 28294482267253154120226228451504231939983902646355221793020527435314997364541416412902548168628144167196326144171955174443826882822262617931747148 218736735510399522434250553623594998649620510638344230032962562527 35644317149302843626902101233519693009296541539905538229314216883634 27433593251521081985531655582138303424526783352317 2480296024926454284316542285186 419324766472662291 6084308 17 2699 3556334 1503275241623466255243493045 304032431371222596415943511907247380 4028 18082659 197862231372721142861 226520672270403852527688296021515564300430321938532836535506271835192336608405435012183292849986302224518624385114882241 27806412743294423212570661146542792576430229232476 231214932334 4308292014254307 569 5491
    • 79. 79 Data plot & error bar Data plot Error bar
    • 80. 80 • Alternative to histogram • Implies continuity e.g., time • Can show multiple lines Line graph OVERALL SCALES-T3 OVERALL SCALES-T2 OVERALL SCALES-T1 OVERALL SCALES-T0 Mean 8.0 7.5 7.0 6.5 6.0 5.5 5.0
    • 81. 81 Graphical integrity (part of academic integrity)
    • 82. 82 "Like good writing, good graphical displays of data communicate ideas with clarity, precision, and efficiency. Like poor writing, bad graphical displays distort or obscure the data, make it harder to understand or compare, or otherwise thwart the communicative effect which the graph should convey." Michael Friendly – Gallery of Data Visualisation
    • 83. 83 Tufte’s graphical integrity • Some lapses intentional, some not • Lie Factor = size of effect in graph size of effect in data • Misleading uses of area • Misleading uses of perspective • Leaving out important context • Lack of taste and aesthetics
    • 84. 84 Review exercise: Fill in the cells in this table Level Properties Examples Descriptive Statistics Graphs Nominal /Categorical Ordinal / Rank Interval Ratio Answers: http://goo.gl/Ln9e1
    • 85. 85 Links • Presenting Data – Statistics Glossary v1.1 - http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html • A Periodic Table of Visualisation Methods - http://www.visual-literacy.org/periodic_table/periodic_table.html • Gallery of Data Visualization - http://www.math.yorku.ca/SCS/Gallery/ • Univariate Data Analysis – The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm • Pitfalls of Data Analysis – http://www.vims.edu/~david/pitfalls/pitfalls.htm • Statistics for the Life Sciences – http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/Handouts.
    • 86. 86 References 1. Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth. 2. Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta. 3. Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press. 4. Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
    • 87. 87 Open Office Impress ● This presentation was made using Open Office Impress. ● Free and open source software. ● http://www.openoffice.org/product/impress.html

    ×