# Descriptives & Graphing

## by James Neill, Lecturer at University of Canberra on Mar 11, 2008

Overviews descriptive and graphical approaches to analysis of univariate data.

## Descriptives & Graphing Presentation Transcript

• Descriptives & Graphing Lecture 3 Survey Research & Design in Psychology James Neill, 2014
• Overview: Descriptives & Graphing 1. Steps with data 2. How to approach data 3. Parametric vs. non-parametric & LOM 4. Descriptive statistics 5. Normal distribution 6. Non-normal distributions 7. Effect of skew on central tendency 8. Principles of graphing 9. Univariate graphical techniques 2
• Steps with data 3
• Steps with data: Collect, enter, check & screen → Explore, describe, & graph → Test hypotheses 4
• Data checking • Survey data can be checked by having one person read the survey responses aloud to another person who checks the electronic data file. • For a large study, a proportion of the surveys can be checked and the error-rate declared in the research report. 5
• Data screening • Carefully 'screening' a data file helps to minimise errors and maximise validity. • Screen for: – Out of range values – Mis-entered data – Missing cases – Duplicate cases – Missing data 6
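The screening checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own procedure: the records, the plausible age range, and the 1-5 response scale are all invented assumptions.

```python
from collections import Counter

# Hypothetical survey records: (case_id, age, satisfaction on a 1-5 scale).
# None marks a missing response. All values are invented for illustration.
records = [
    (1, 23, 4),
    (2, 31, 5),
    (3, 19, None),   # missing data
    (4, 230, 3),     # probable mis-entry: age out of range
    (2, 31, 5),      # duplicate case
    (5, 27, 7),      # out-of-range scale value
]

AGE_RANGE = (16, 100)   # assumed plausible range for this sample
SCALE_RANGE = (1, 5)    # assumed response scale


def out_of_bounds(value, low, high):
    """True if a non-missing value lies outside [low, high]."""
    return value is not None and not (low <= value <= high)


id_counts = Counter(case_id for case_id, _, _ in records)
duplicate_ids = sorted(cid for cid, n in id_counts.items() if n > 1)
missing_ids = sorted({cid for cid, age, sat in records
                      if age is None or sat is None})
out_of_range_ids = sorted({cid for cid, age, sat in records
                           if out_of_bounds(age, *AGE_RANGE)
                           or out_of_bounds(sat, *SCALE_RANGE)})

print("Duplicate cases:", duplicate_ids)
print("Cases with missing data:", missing_ids)
print("Cases with out-of-range values:", out_of_range_ids)
```

Flagged cases would then be checked against the original survey forms rather than corrected automatically.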
• How to approach data 7
• Don't be afraid - you can't break data! 8
• Play with the data – get to know it. 9
• Get intimate with your data 10
• Exploring, describing & graphing data THE CHALLENGE: to find a meaningful, accurate way to depict the ‘true story’ of the data 11
• Clearly report the data's main features 12
• Summary: Steps with data Spend ‘quality time’ investigating (exploring and describing) your data 1. Check and screen the data 2. How to approach data: (a) Don't be afraid – you can't break the data (b) Play with the data – get to know it (c) Get intimate 3. Report key features 13
• Parametric & non-parametric statistics & level of measurement 14
• Golden rule of data analysis Level of measurement determines type of descriptive statistics and graphs The level of measurement (see previous lecture) determines which types of descriptive statistics and which types of graphs are appropriate. 15
• Levels of measurement and non-parametric vs. parametric Categorical & ordinal DVs → non-parametric (Does not assume a normal distribution) Interval & ratio DVs → parametric (Assumes a normal distribution) → non-parametric (If distribution is non-normal) DVs = dependent variables 16
• Parametric statistics • Procedures which estimate parameters of a population, usually based on the normal distribution –M, SD, skewness, kurtosis → t-tests, ANOVAs –r → linear regression, multiple linear regression 17
• Parametric statistics • More powerful (more sensitive) • Have more assumptions • More vulnerable to violations of assumptions 18
• Non-parametric statistics (Distribution-free tests) • Do not rely on estimates of population parameters –Frequency → sign test, chi-squared –Rank order → Mann-Whitney U test, Wilcoxon matched-pairs signed-ranks test 19
• Univariate descriptive statistics 20
• Number of variables Univariate = one variable (e.g., mean, median, mode, histogram, bar chart, box plot) Bivariate = two variables (e.g., correlation, t-test, scatterplot, clustered bar chart) Multivariate = more than two variables (e.g., reliability analysis, factor analysis, multiple linear regression) 21
• What do we want to describe? The distributional properties of underlying variables, based on: ● Central tendency(ies): Frequencies, Mode, Median, Mean ● Shape: Skewness, Kurtosis ● Spread (dispersion): Min., Max., Range, IQR, Percentiles, Var/SD for sampled data. 22
• Measures of central tendency Statistics which represent the ‘centre’ of a frequency distribution: – Mode (most frequent) – Median (50th percentile) – Mean (average) Which ones to use depends on: – Type of data (level of measurement) – Shape of distribution (esp. skewness) Reporting more than one may be appropriate. 23
• Measures of central tendency 24

| Level of measurement | Mode / Freq. / %s | Median | Mean |
|---|---|---|---|
| Nominal | √ | x | x |
| Ordinal | √ | If meaningful | x |
| Interval | √ | √ | √ |
| Ratio | If meaningful | √ | √ |

• Measures of distribution • Measures of shape, spread, dispersion and deviation from the central tendency • Non-parametric: Min and max, Range, Percentiles • Parametric: SD, Skewness, Kurtosis 25
• Measures of spread / dispersion / deviation 26

| Level of measurement | Min / Max, Range | Percentile | Var / SD |
|---|---|---|---|
| Nominal | x | x | x |
| Ordinal | √ | If meaningful | x |
| Interval | √ | √ | √ |
| Ratio | √ | √ | √ |

• Descriptives for nominal data • Nominal = Labelled categories • Descriptive statistics: –Most frequent? (Mode – e.g., females) –Least frequent? (e.g., males) –Frequencies (e.g., 20 females, 10 males) –Percentages (e.g., 67% females, 33% males) –Cumulative percentages –Ratios (e.g., twice as many females as males) 27
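As a rough sketch of these nominal descriptives, the slide's own example (20 females, 10 males) can be tabulated with Python's standard library. The variable names are illustrative only.

```python
from collections import Counter

# Sample matching the slide's example: 20 females, 10 males.
gender = ["female"] * 20 + ["male"] * 10

counts = Counter(gender)                      # frequencies per category
n = len(gender)
mode_category, mode_freq = counts.most_common(1)[0]
percentages = {cat: round(100 * f / n) for cat, f in counts.items()}
ratio = counts["female"] / counts["male"]     # females per male

print("Frequencies:", dict(counts))
print("Percentages:", percentages)
print("Mode:", mode_category)
print("Ratio of females to males:", ratio)
```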
• Descriptives for ordinal data • Ordinal = Conveys order but not distance (e.g., ranks) • Descriptives approach is as for nominal (frequencies, mode etc.) • Plus percentiles (including median) may be useful 28
• Descriptives for interval data • Interval = order and distance, but no true 0 (0 is arbitrary). • Central tendency (mode, median, mean) • Shape/Spread (min, max, range, SD, skewness, kurtosis) Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals) 29
• Descriptives for ratio data • Ratio = Numbers convey order and distance, meaningful 0 point • Descriptives approach is as for interval (i.e., median, mean, SD, skewness etc.) • Ratios 30
• Mode (Mo) • Most common score - highest point in a frequency distribution – a real score – for most no. of participants • Suitable for all levels of data, but may not be appropriate for ratio (continuous) • Not affected by outliers • Check frequencies and bar graph to see whether it is an accurate and useful statistic. 31
• Frequencies (f) and percentages (%) • # of responses in each category • % of responses in each category • Frequency table • Visualise using a bar or pie chart 32
• Median (Mdn) • Mid-point of distribution (Quartile 2, 50th percentile) • Not badly affected by outliers • May not represent the central tendency in skewed data • If the Median is useful, then consider what other percentiles may also be worth reporting. 33
• Summary: Descriptive statistics Depending on the level of measurement and whether data is parametric (normal): • Describe the central tendency –Frequencies, Percentages –Mode, Median, Mean • Describe the variability: –Min, Max, Range, Quartiles –Standard Deviation, Variance 34
• Properties of the normal distribution 35
• The four moments of a normal distribution [Figure: bell curve annotated with Mean, ←SD→, ←Kurt→, -ve Skew, and +ve Skew] 36
• The four moments of a normal distribution Four mathematical qualities (parameters) can describe a continuous distribution which at least roughly follows a bell curve shape: • 1st = mean (central tendency) • 2nd = SD (dispersion) • 3rd = skewness (lean / tail) • 4th = kurtosis (peakedness / flatness) 37
• Mean (1st moment ) • Average score Mean = Σ X / N • For normally distributed ratio or interval (if treating it as continuous) data. • Influenced by extreme scores (outliers) 38
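To illustrate how extreme scores influence the mean, here is a small sketch with invented scores. It compares the mean and median before and after adding one outlier.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 6, 7]      # hypothetical scores
with_outlier = scores + [60]     # the same scores plus one extreme value

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean is pulled strongly towards the outlier; the median barely moves.
print(f"Without outlier: M = {mean_before:.2f}, Mdn = {median_before}")
print(f"With outlier:    M = {mean_after:.2f}, Mdn = {median_after}")
```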
• Beware inappropriate averaging... With your head in an oven and your feet in ice you would feel, on average, just fine The majority of people have more than the average number of legs (M = 1.9999). 39
• Standard deviation (2nd moment) • SD = square root of the variance: $SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$ • For normally distributed interval or ratio data • Affected by outliers • Can also derive the Standard Error: $SE = SD / \sqrt{N}$ 40
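The SD and SE formulas can be checked by hand on a small data set. The sketch below uses invented scores and compares the manual N – 1 calculation with `statistics.stdev`, which also uses the N – 1 denominator.

```python
import math
from statistics import mean, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores
n = len(scores)
m = mean(scores)

# Sum of squared deviations from the mean, divided by N - 1.
variance = sum((x - m) ** 2 for x in scores) / (n - 1)
sd = math.sqrt(variance)
se = sd / math.sqrt(n)               # standard error of the mean

print(f"M = {m}, variance = {variance:.3f}, SD = {sd:.3f}, SE = {se:.3f}")
print("Matches statistics.stdev:", math.isclose(sd, stdev(scores)))
```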
• Skewness (3rd moment ) • Lean of distribution – +ve = tail to right – -ve = tail to left • Can be caused by an outlier, or ceiling or floor effects • Can be accurate e.g., no. of cars owned per person 41
• Skewness (3rd moment) (with ceiling and floor effects) ● ● Negative skew Ceiling effect ● ● Positive skew Floor effect 42
• Kurtosis (4th moment ) • Flatness or peakedness of distribution +ve = peaked -ve = flattened • By altering the X &/or Y axis, any distribution can be made to look more peaked or flat – add a normal curve to help judge kurtosis visually. 43
• Kurtosis (4th moment ) Red = Positive (leptokurtic) Blue = Negative (platykurtic) 44
• Judging severity of skewness & kurtosis • View histogram with normal curve • Deal with outliers • Rule of thumb: Skewness and kurtosis between -1 and +1 are generally considered sufficiently normal for meeting the assumptions of parametric inferential statistics • Significance tests of skewness: Tend to be overly sensitive (therefore avoid using) 45
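The rule of thumb can be sketched as a simple check. This example assumes the plain moment-based formulas for skewness and excess kurtosis. Statistical packages typically report bias-adjusted sample versions, so their values may differ slightly, especially for small samples. The data and the function names are illustrative.

```python
def central_moment(xs, k):
    """k-th central moment: mean of (x - M)**k."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness (3rd moment, standardised)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal curve scores 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


def roughly_normal(xs, limit=1.0):
    """Rule of thumb: both statistics fall between -limit and +limit."""
    return abs(skewness(xs)) < limit and abs(excess_kurtosis(xs)) < limit


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]    # hypothetical, bell-like
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 20]      # hypothetical, long right tail

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(f"{name}: skew = {skewness(data):.2f}, "
          f"kurtosis = {excess_kurtosis(data):.2f}, "
          f"roughly normal = {roughly_normal(data)}")
```

A check like this complements, rather than replaces, viewing the histogram with a normal curve.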
• Areas under the normal curve If distribution is normal (bell-shaped - or close): ~68% of scores within +/- 1 SD of M ~95% of scores within +/- 2 SD of M ~99.7% of scores within +/- 3 SD of M 46
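These percentages can be verified with `statistics.NormalDist` from Python's standard library (Python 3.8+). The helper name `area_within` is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # M = 0, SD = 1


def area_within(k):
    """Proportion of a normal distribution within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} SD: {area_within(k):.1%}")
# within +/- 1 SD: 68.3%
# within +/- 2 SD: 95.4%
# within +/- 3 SD: 99.7%
```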
• Areas under the normal curve 47
• Summary: Normal distribution [Figure: bell curve annotated with Mean, ←SD→, ←Kurt→, -ve Skew, and +ve Skew] 48
• Non-normal distributions 49
• Types of non-normal distribution • Modality – Uni-modal (one peak) – Bi-modal (two peaks) – Multi-modal (more than two peaks) • Skewness – Positive (tail to right) – Negative (tail to left) • Kurtosis – Platykurtic (Flat) – Leptokurtic (Peaked) 50
• Non-normal distributions 51
• Histogram of weight [Figure: histogram of WEIGHT (X-axis, 40.0 to 110.0) by Frequency (Y-axis); Mean = 69.6, Std. Dev = 17.10, N = 20] 52
• Histogram of daily calorie intake 53
• Histogram of fertility 54
• Example ‘normal’ distribution 1 [Figure: histogram with Frequency on the Y-axis; Mean = 81.21, Std. Dev. = 18.228, N = 188] 55
• Example ‘normal’ distribution 2 This bimodal graph actually consists of two different, underlying normal distributions. [Figure: bar chart of Count by Femininity-Masculinity (Very feminine, Fairly feminine, Androgynous, Fairly masculine, Very masculine)] 56
• [Figure: three bar charts of Count by Femininity-Masculinity: the combined distribution, the distribution for males (Gender: male), and the distribution for females (Gender: female)] 57
• Non-normal distribution: Use non-parametric descriptive statistics • Min & Max • Range = Max-Min • Percentiles • Quartiles –Q1 –Mdn (Q2) –Q3 –IQR (Q3-Q1) 58
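A minimal sketch of these non-parametric descriptives, using invented scores. Note that quartile values depend on the interpolation method. This example relies on the default ("exclusive") method of `statistics.quantiles`, which may not match the method used by a given statistics package.

```python
from statistics import median, quantiles

scores = [1, 2, 2, 3, 4, 6, 9, 15, 40]   # hypothetical, positively skewed

minimum, maximum = min(scores), max(scores)
data_range = maximum - minimum            # Range = Max - Min
q1, q2, q3 = quantiles(scores, n=4)       # three cut points: Q1, Mdn (Q2), Q3
iqr = q3 - q1                             # IQR = Q3 - Q1

print(f"Min = {minimum}, Max = {maximum}, Range = {data_range}")
print(f"Q1 = {q1}, Mdn = {q2}, Q3 = {q3}, IQR = {iqr}")
```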
• Effects of skew on measures of central tendency • +vely skewed distributions: mode < median < mean • symmetrical (normal) distributions: mean = median = mode • -vely skewed distributions: mean < median < mode 59
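The ordering of mode, median, and mean under skew can be demonstrated on small invented data sets: a floor-limited variable (cars owned) and its mirror image (scores on an easy test out of 10).

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most people own few cars, a few own many.
cars = [0, 1, 1, 1, 1, 2, 2, 3, 4, 9]

# Hypothetical negatively skewed data: most people score high on an easy test.
easy_test = [10 - x for x in cars]

for name, data in (("cars (+ve skew)", cars), ("easy test (-ve skew)", easy_test)):
    print(f"{name}: mode = {mode(data)}, "
          f"median = {median(data)}, mean = {mean(data)}")
```

In the first data set mode < median < mean, and in the second mean < median < mode, as the slide describes.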
• Effects of skew on measures of central tendency 60
• Transformations • Converts data using various formulae to achieve normality and allow more powerful tests • Loses original metric • Complicates interpretation 61
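As one illustration of a transformation, the sketch below applies a natural-log transform to invented, positively skewed scores and compares moment-based skewness before and after. The log transform is only one of several possible formulae, and the choice here is an assumption for the example.

```python
import math


def skewness(xs):
    """Moment-based skewness: 3rd central moment over SD cubed."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


raw = [1, 2, 2, 3, 4, 6, 9, 15, 40]     # hypothetical, long right tail
logged = [math.log(x) for x in raw]     # requires all values > 0

skew_raw = skewness(raw)
skew_logged = skewness(logged)

# The mean of the logged scores is in log units, not the original metric.
# Back-transforming it gives the geometric mean, not the arithmetic mean.
mean_logged = sum(logged) / len(logged)
back_transformed = math.exp(mean_logged)

print(f"Skewness before: {skew_raw:.2f}, after log: {skew_logged:.2f}")
print(f"Mean of logs: {mean_logged:.2f}, back-transformed: {back_transformed:.2f}")
```

The back-transformed value differs from the arithmetic mean of the raw scores, which is one way the transformation complicates interpretation.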
• Summary: Non-normal distributions 1. Modality (unimodal, bimodal, multimodal) 2. Skewness 3. Kurtosis Rule of thumb: Skewness and kurtosis in the range of -1 to +1 can be treated as approx. normal 62
• Review questions 1. If a survey question produces a ‘floor effect’, where will the mean, median and mode lie in relation to one another? 63
• Review questions 2. Would you expect the mean # of cars owned in Australia to exceed the median? 64
• Review questions 3. Would you expect the mean score on an easy test to exceed the median performance? 65
• Graphical techniques 66
• Visualisation “Visualization is any technique for creating images, diagrams, or animations to communicate a message.” - Wikipedia 67
• Science is beautiful (Nature Video) (Youtube – 5:30 mins) 68
• Is Pivot a turning point for web exploration? (Gary Flake) (TED talk - 6 min.) 69
• Graphs (Edward Tufte) • Visualise data • Reveal data – Describe – Explore – Tabulate – Decorate • Communicate complex ideas with clarity, precision, and efficiency 70
• Graphing steps 1. Identify purpose of the graph (make large amounts of data coherent; present many #s in small space; encourage the eye to make comparisons) 2. Select type of graph to use 3. Draw and modify graph to be clear, non-distorting, and well-labelled (maximise clarity, minimise clutter; show the data; avoid distortion; reveal data at several levels/layers) 4. Disseminate (e.g., include in report with verbal description) 71
• Software for data visualisation (graphing) 1. Statistical packages ● e.g., SPSS Graphs or via Analyses 2. Spreadsheet packages ● e.g., MS Excel 3. Word-processors ● e.g., MS Word – Insert – Object – Microsoft Graph Chart 72
• Cleveland’s hierarchy 73
• Univariate graphs 74
• Univariate graphs • Non-parametric (i.e., nominal, ordinal, or non-normal interval or ratio): Bar graph, Pie chart • Parametric (i.e., normally distributed interval or ratio): Histogram, Stem & leaf plot, Data plot / Error bar, Box plot 75
• Bar chart (Bar graph) • Allows comparison of heights of bars • X-axis: Collapse if too many categories • Y-axis: Count/Frequency or % - truncation exaggerates differences • Can add data labels (data values for each bar) [Figure: two bar charts of Count by AREA (Sociology, Information Technolo…, Psychology, Anthropology, Biology); note the truncated Y-axis in one chart] 76
• Pie chart • Use a bar chart instead • Hard to read – Difficult to show small values and small differences – Rotation of chart and position of slices influences perception [Figure: pie chart with slices for Biology, Sociology, Anthropology, Psychology, and Information Technolo…] 77
• Histogram • For continuous data (Likert?, Ratio) • X-axis needs a happy medium for # of categories • Y-axis matters (can exaggerate) [Figure: three histograms of Participant Age drawn with different numbers of X-axis categories; Mean = 24.0, Std. Dev = 9.16, N = 5575] 78
• Histogram of male & female heights Wild & Seber (2000) 79
• Stem & leaf plot ● Use for ordinal, interval and ratio data (if rounded) ● May look confusing to unfamiliar reader 80
• Stem & leaf plot • Contains actual data • Collapses tails • Underused alternative to histogram 81

| Frequency | Stem | Leaf |
|---|---|---|
| 7.00 | 1 . | & |
| 192.00 | 1 . | 22223333333 |
| 541.00 | 1 . | 444444444444444455555555555555 |
| 610.00 | 1 . | 6666666666666677777777777777777777 |
| 849.00 | 1 . | 88888888888888888888888888899999999999999999999 |
| 614.00 | 2 . | 0000000000000000111111111111111111 |
| 602.00 | 2 . | 222222222222222233333333333333333 |
| 447.00 | 2 . | 4444444444444455555555555 |
| 291.00 | 2 . | 66666666677777777 |
| 240.00 | 2 . | 88888889999999 |
| 167.00 | 3 . | 000001111 |
| 146.00 | 3 . | 22223333 |
| 153.00 | 3 . | 44445555 |
| 118.00 | 3 . | 666777 |
| 99.00 | 3 . | 888999 |
| 106.00 | 4 . | 000111 |
| 54.00 | 4 . | 222 |
| 339.00 | Extremes | (>=43) |

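For readers unfamiliar with the format, a basic stem & leaf plot can be built in a few lines. This sketch uses invented ages with tens as stems and units as leaves. It does not reproduce the package output shown on the slide, which splits each stem across several rows and collapses extreme values.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves} with tens as stems and units as leaves."""
    plot = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem] += str(leaf)
    return dict(plot)


ages = [18, 19, 19, 21, 22, 22, 25, 30, 34, 41]   # hypothetical ages
plot = stem_and_leaf(ages)

print("Freq  Stem . Leaf")
for stem, leaves in plot.items():
    print(f"{len(leaves):>4}  {stem:>4} . {leaves}")
```

Each leaf is an actual data value, so the plot shows the shape of the distribution without losing the individual scores.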
• Box plot (Box & whisker) ● Useful for interval and ratio data ● Represents min., max, median, quartiles, & outliers 82
• Box plot (Box & whisker) • Alternative to histogram • Useful for screening • Useful for comparing variables • Can get messy - too much info • Confusing to unfamiliar reader [Figure: box plots of Time Management-T1 and Self-Confidence-T1 by Participant Gender (Missing, Male, Female), with many outliers labelled by case number] 83
• Data plot & error bar [Figure: two panels labelled Data plot and Error bar] 84
• Line graph • Alternative to histogram • Implies continuity e.g., time • Can show multiple lines [Figure: line graph of Mean (Y-axis, 5.0 to 8.0) for OVERALL SCALES at T0, T1, T2, and T3] 85
• Graphical integrity (part of academic integrity) 86
• "Like good writing, good graphical displays of data communicate ideas with clarity, precision, and efficiency. Like poor writing, bad graphical displays distort or obscure the data, make it harder to understand or compare, or otherwise thwart the communicative effect which the graph should convey." Michael Friendly – Gallery of Data Visualisation 87
• Tufte’s graphical integrity • Some lapses intentional, some not • $\text{Lie Factor} = \dfrac{\text{size of effect in graph}}{\text{size of effect in data}}$ • Misleading uses of area • Misleading uses of perspective • Leaving out important context • Lack of taste and aesthetics 88
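A worked sketch of the Lie Factor for a bar chart with a truncated Y-axis. It assumes "size of effect" is measured as the relative change between two values, which is how Tufte defines it. The counts and the axis baseline are invented.

```python
def relative_change(first, second):
    """Size of effect as the change from first to second, relative to first."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Size of effect shown in the graph divided by size of effect in the data."""
    return (relative_change(shown_first, shown_second)
            / relative_change(data_first, data_second))


# Hypothetical counts for two categories.
count_a, count_b = 10, 12

# If the Y-axis starts at 9 instead of 0, the drawn bar heights are 1 and 3.
baseline = 9
height_a, height_b = count_a - baseline, count_b - baseline

honest = lie_factor(count_a, count_b, count_a, count_b)
truncated = lie_factor(count_a, count_b, height_a, height_b)

print(f"Lie Factor with Y-axis from 0: {honest:.1f}")
print(f"Lie Factor with Y-axis from {baseline}: {truncated:.1f}")
```

A Lie Factor of 1 means the graph shows the effect at its true size. Here the truncated axis makes a 20% difference look like a 200% difference.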
• Summary: Principles of graphing • Clear purpose • Maximise clarity • Minimise clutter • Cleveland's hierarchy – Allow visual comparison 89
• Review exercise: Fill in the cells in this table (Answers: http://goo.gl/Ln9e1) 90

| Level | Properties | Examples | Descriptive Statistics | Graphs |
|---|---|---|---|---|
| Nominal / Categorical | | | | |
| Ordinal / Rank | | | | |
| Interval | | | | |
| Ratio | | | | |

• Summary: Univariate graphical techniques • Bar graph / Pie chart • Histogram • Stem & leaf plot • Box plot (Box & whisker) • Data plot / Error bar 91
• Links • Presenting Data – Statistics Glossary v1.1 http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html • A Periodic Table of Visualisation Methods http://www.visual-literacy.org/periodic_table/periodic_table.html • Gallery of Data Visualization http://www.math.yorku.ca/SCS/Gallery/ • Univariate Data Analysis – The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm • Pitfalls of Data Analysis – http://www.vims.edu/~david/pitfalls/pitfalls.htm • Statistics for the Life Sciences – http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/Handouts. 92
• References 1. Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth. 2. Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta. 3. Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press. 4. Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley. 93
• Open Office Impress ● This presentation was made using Open Office Impress. ● Free and open source software. ● http://www.openoffice.org/product/impress.html 94