Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Lecture 3
Survey Research & Design in Psychology
James Neill, 2017
Creative Commons Attribution 4.0
Descriptives & Graphin...
2
Overview:
Descriptives & Graphing
1. Getting to know a data set
2. LOM & types of statistics
3. Descriptive statistics
4...
3
Getting to know
a data-set
(how to approach data)
4
Play with the data –
get to know it.Image source: http://www.flickr.com/photos/analytik/1356366068/
5
Don't be afraid - you
can't break data!Image source: http://www.flickr.com/photos/rnddave/5094020069
6
Check & screen the data –
keep signal, reduce noise
Image source: https://commons.wikimedia.org/wiki/File:Nasir-al_molk_...
7
Data checking: One
person reads the survey responses
aloud to another person who checks
the electronic data file.
For la...
8
Data screening: Carefully
'screening' a data file helps to
remove errors and maximise validity.
For example, screen for:...
9
Explore the data
Image source: https://commons.wikimedia.org/wiki/File:Kazimierz_Nowak_in_jungle_2.jpgI
10
Get intimate
with the data
Image source: http://www.flickr.com/photos/elmoalves/2932572231/
11
Describe the data's
main features
find a
meaningful,
accurate
way to
depict the
‘true story’ of
the data
Image source: ...
12
Test hypotheses
to answer research questions
Image source: https://pixabay.com/en/light-bulb-current-light-glow-1042480/
13
Level of
measurement &
types of statistics
Image source: http://www.flickr.com/photos/peanutlen/2228077524/
14
Golden rule of data analysis
A variable's level of
measurement determines the
type of statistics that can be
used, incl...
15
Levels of measurement and
non-parametric vs. parametric
Categorical & ordinal data DV
→ non-parametric
(Does not assume...
16
Parametric statistics
• Statistics which estimate
parameters of a population, based
on the normal distribution
–Univari...
17
• More powerful
(more sensitive)
• More assumptions
(population is normally distributed)
• Vulnerable to violations of
...
18
Non-parametric statistics
• Statistics which do not assume
sampling from a population which
is normally distributed
–Th...
19
Non-parametric statistics
• Less powerful
(less sensitive)
• Fewer assumptions
(do not assume a normal distribution)
• ...
20
Univariate
descriptive
statistics
21
Number of variables
Univariate
= one variable
Bivariate
= two variables
Multivariate
= more than two variables
mean, me...
22
What do we want to describe?
The distributional properties of
variables, based on:
● Central tendency(ies): e.g.,
frequ...
23
Measures of central tendency
Statistics which represent the
‘centre’ of a frequency distribution:
–Mode (most frequent)...
24
Measures of central tendency
√√If meaningfulRatio
√√√Interval
√Ordinal
√Nominal
MeanMedianMode /
Freq. /%s
If meaningfu...
25
Measures of distribution
• Measures of shape, spread,
dispersion, and deviation from the
central tendency
Non-parametri...
26
√√√Ratio
√√Interval
√Ordinal
Nominal
Var / SDPercentileMin / Max,
Range
Measures of spread /
dispersion / deviation
If ...
27
Descriptives for nominal data
• Nominal LOM = Labelled categories
• Descriptive statistics:
–Most frequent? (Mode – e.g...
28
Descriptives for ordinal data
• Ordinal LOM = Conveys order but
not distance (e.g., ranks)
• Descriptives approach is a...
29
Descriptives for interval data
• Interval LOM = order and
distance, but no true 0 (0 is
arbitrary).
• Central tendency ...
30
Descriptives for ratio data
• Ratio = Numbers convey order
and distance, meaningful 0 point
• As for interval, use medi...
31
Mode (Mo)
• Most common score - highest point in a
frequency distribution – a real score – the most
common response
• S...
32
Frequencies (f) and
percentages (%)
• # of responses in each category
• % of responses in each category
• Frequency tab...
33
Median (Mdn)
• Mid-point of distribution
(Quartile 2, 50th
percentile)
• Not badly affected by outliers
• May not repre...
34
Summary: Descriptive statistics
• Level of measurement and
normality determines whether
data can be treated as parametr...
35
Properties of the
normal distribution
Image source: http://www.flickr.com/photos/trevorblake/3200899889/
36
Four moments of a
normal distribution
Row 1 Row 2 Row 3 Row 4
0
2
4
6
8
10
12
Column 1
Column 2
Column 3
Mean
←SD→
-ve ...
37
Four moments of a
normal distribution
Four mathematical qualities
(parameters) can describe a
continuous distribution w...
38
Mean
(1st moment )
• Average score
Mean = Σ X / N
• For normally distributed ratio or
interval (if treating it as conti...
39
Beware inappropriate averaging...
With your head in an oven
and your feet in ice
you would feel,
on average,
just fine
...
40
Standard deviation
(2nd moment)
• SD = square root of the variance
= Σ (X - X)2
N – 1
• For normally distributed interv...
41
Skewness
(3rd moment )
• Lean of distribution
– +ve = tail to right
– -ve = tail to left
• Can be caused by an outlier,...
42
Skewness (3rd moment)
(with ceiling and floor effects)
● Negative skew
● Ceiling effect
● Positive skew
● Floor effect
...
43
Kurtosis
(4th moment )
• Flatness or peakedness of
distribution
+ve = peaked
-ve = flattened
• By altering the X &/or Y...
44
Kurtosis
(4th moment )
Image source: https://classconnection.s3.amazonaws.com/65/flashcards/2185065/jpg/kurtosis-142C11...
45
Judging severity of
skewness & kurtosis
• View histogram with normal curve
• Deal with outliers
• Rule of thumb:
Skewne...
46
Areas under the normal curve
If distribution is normal
(bell-shaped - or close):
~68% of scores within +/- 1 SD of M
~9...
47
Areas under the normal curve
Image source: https://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG
48
Non-normal
distributions
49
Types of non-normal distribution
• Modality
–Uni-modal (one peak)
–Bi-modal (two peaks)
–Multi-modal (more than two pea...
50
Non-normal distributions
51
Histogram of people's weight
WEIGHT
110.0100.090.080.070.060.050.040.0
Histogram
Frequency 8
6
4
2
0
Std. Dev = 17.10
M...
52
Histogram of daily calorie intake
N = 75
53
Histogram of fertility
54
Example ‘normal’ distribution
1
140120100806040200
Die
60
50
40
30
20
10
0
Frequency
Mean =81.21
Std. Dev. =18.228
N =1...
55
Example ‘normal’ distribution
2
Very masculineFairly masculineAndrogynousFairly feminineVery feminine
Femininity-Mascul...
56
Very masculineFairly masculineAndrogynousFairly feminineVery feminine
Femininity-Masculinity
60
40
20
0
Count
Very masc...
57
Non-normal distribution:
Use non-parametric descriptive statistics
• Min. & Max.
• Range = Max. - Min.
• Percentiles
• ...
58
Effects of skew on measures
of central tendency
+vely skewed distributions
mode < median < mean
symmetrical (normal) di...
59
Effects of skew on measures of
central tendency
60
Transformations
• Converts data using various
formulae to achieve normality
and allow more powerful tests
• Loses origi...
61
1. If a survey question produces a
‘floor effect’, where will the mean,
median and mode lie in relation to
one another?...
62
2. Would the mean # of cars owned in
Australia to exceed the median?
Review questions
63
3. Would the mean score on an easy
test exceed the median
performance?
Review questions
64
Graphical
techniques
Image source: http://www.flickr.com/photos/pagedooley/2121472112/
65
Visualisation
“Visualization is any technique
for creating images, diagrams, or
animations to communicate a message.”
-...
66
Science is
beautiful
(Nature Video)
(Youtube – 5:30 mins)
Image source:http://commons.wikimedia.org/wiki/File:Parodyfil...
67
Is Pivot a turning
point for web
exploration?
(Gary Flake)
(TED talk - 6 min.)
Image source:http://commons.wikimedia.or...
68
Principles of graphing
• Clear purpose
• Maximise clarity
• Minimise clutter
• Allow visual comparison
69
Graphs
(Edward Tufte)
• Visualise data
• Reveal data
– Describe
– Explore
– Tabulate
– Decorate
• Communicate complex i...
70
Graphing steps
1. Identify purpose of the graph
(make large amounts of data coherent;
present many #s in small space;
e...
71
Software for
data visualisation (graphing)
1. Statistical packages
● e.g., SPSS Graphs or via Analyses
2. Spreadsheet p...
72
Cleveland’s hierarchyImage source:http://www.processtrends.com/TOC_data_visualization.htm
73
Cleveland’s hierarchyImage source:https://priceonomics.com/how-william-cleveland-turned-data-visualization/
74
Univariate graphs
• Bar graph
• Pie chart
• Histogram
• Stem & leaf plot
• Data plot /
Error bar
• Box plot
Non-paramet...
75
Bar chart (Bar graph)
AREA
Biology
Anthropology
Information Technolo
Psychology
Sociology
Count
13
12
12
11
11
10
10
9
...
76
Biology
Anthropology
Information Technolo
Psychology
Sociology
• Use a bar chart instead
• Hard to read
–Difficult to s...
77
Pie chart → Use bar chart instead
Image source: https://priceonomics.com/how-william-cleveland-turned-data-visualizatio...
78
Histogram
Participant Age
62.552.542.532.522.512.5
3000
2000
1000
0
Std. Dev = 9.16
Mean = 24.0
N = 5575.00
Participant...
79
Histogram of male & female heights
Wild & Seber (2000)
Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance enco...
80
Stem & leaf plot
● Use for ordinal, interval and ratio data
(if rounded)
● May look confusing to unfamiliar reader
81
• Contains actual data
• Collapses tails
• Underused alternative to histogram
Stem & leaf plot
Frequency Stem & Leaf
7....
82
Box plot
(Box &
whisker)
● Useful for
interval and
ratio data
● Represents
min., max,
median,
quartiles, &
outliers
83
• Alternative to histogram
• Useful for screening
• Useful for comparing variables
• Can get messy - too much info
• Co...
84
Data plot & error bar
Data plot Error bar
85
• Alternative to histogram
• Implies continuity e.g., time
• Can show multiple lines
Line graph
OVERALL SCALES-T3
OVERA...
86
Graphical
integrity
(part of
academic
integrity)
87
"Like good writing, good graphical
displays of data communicate ideas
with clarity, precision, and efficiency.
Like poo...
88
Tufte’s graphical integrity
• Some lapses intentional, some not
• Lie Factor = size of effect in graph
size of effect i...
89
Review exercise:
Fill in the cells in this table
Level Properties Examples Descriptive
Statistics
Graphs
Nominal
/Categ...
90
References
1. Chambers, J., Cleveland, B., Kleiner, B., & Tukey, P. (1983).
Graphical methods for data analysis. Boston...
91
Open Office Impress
● This presentation was made using
Open Office Impress.
● Free and open source software.
● http://w...
Upcoming SlideShare
Loading in …5
×

Descriptives & Graphing

19,807 views

Published on

Overviews descriptive and graphical approaches to analysis of univariate data.

Published in: Education, Technology
  • But kurtosis does not measure anything about the peak. It measures the rare, extreme observations only. Please see https://en.wikipedia.org/wiki/Talk:Kurtosis#Why_kurtosis_should_not_be_interpreted_as_.22peakedness.22
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Descriptives & Graphing

  1. 1. Lecture 3 Survey Research & Design in Psychology James Neill, 2017 Creative Commons Attribution 4.0 Descriptives & Graphing Image source: http://commons.wikimedia.org/wiki/File:3D_Bar_Graph_Meeting.jpg
  2. 2. 2 Overview: Descriptives & Graphing 1. Getting to know a data set 2. LOM & types of statistics 3. Descriptive statistics 4. Normal distribution 5. Non-normal distributions 6. Effect of skew on central tendency 7. Principles of graphing 8. Univariate graphical techniques
  3. 3. 3 Getting to know a data-set (how to approach data)
  4. 4. 4 Play with the data – get to know it.Image source: http://www.flickr.com/photos/analytik/1356366068/
  5. 5. 5 Don't be afraid - you can't break data!Image source: http://www.flickr.com/photos/rnddave/5094020069
  6. 6. 6 Check & screen the data – keep signal, reduce noise Image source: https://commons.wikimedia.org/wiki/File:Nasir-al_molk_-1.jpg
  7. 7. 7 Data checking: One person reads the survey responses aloud to another person who checks the electronic data file. For large studies, check a proportion of the surveys and declare the error-rate in the research report. Image source: http://maxpixel.freegreatpicture.com/Business-Team-Two-People-Meeting-Computers-Office-1209640
  8. 8. 8 Data screening: Carefully 'screening' a data file helps to remove errors and maximise validity. For example, screen for: Out of range values Mis-entered data Missing cases Duplicate cases Missing data Image source: https://commons.wikimedia.org/wiki/File:Archaeology_dirt_screening.jpg
  9. 9. 9 Explore the data Image source: https://commons.wikimedia.org/wiki/File:Kazimierz_Nowak_in_jungle_2.jpgI
  10. 10. 10 Get intimate with the data Image source: http://www.flickr.com/photos/elmoalves/2932572231/
  11. 11. 11 Describe the data's main features find a meaningful, accurate way to depict the ‘true story’ of the data Image source: http://www.flickr.com/photos/lloydm/2429991235/
  12. 12. 12 Test hypotheses to answer research questions Image source: https://pixabay.com/en/light-bulb-current-light-glow-1042480/
  13. 13. 13 Level of measurement & types of statistics Image source: http://www.flickr.com/photos/peanutlen/2228077524/
  14. 14. 14 Golden rule of data analysis A variable's level of measurement determines the type of statistics that can be used, including types of: • descriptive statistics • graphs • inferential statistics
  15. 15. 15 Levels of measurement and non-parametric vs. parametric Categorical & ordinal data DV → non-parametric (Does not assume a normal distribution) Interval & ratio data DV → parametric (Assumes a normal distribution) → non-parametric (If distribution is non-normal) DVs = dependent variables
  16. 16. 16 Parametric statistics • Statistics which estimate parameters of a population, based on the normal distribution –Univariate: mean, standard deviation, skewness, kurtosis, t-tests, ANOVAs –Bivariate: correlation, linear regression –Multivariate: multiple linear regression
  17. 17. 17 • More powerful (more sensitive) • More assumptions (population is normally distributed) • Vulnerable to violations of assumptions (less robust) Parametric statistics
  18. 18. 18 Non-parametric statistics • Statistics which do not assume sampling from a population which is normally distributed –There are non-parametric alternatives for many parametric statistics –e.g., sign test, chi-squared, Mann- Whitney U test, Wilcoxon matched-pairs signed-ranks test.
  19. 19. 19 Non-parametric statistics • Less powerful (less sensitive) • Fewer assumptions (do not assume a normal distribution) • Less vulnerable to assumption violation (more robust)
  20. 20. 20 Univariate descriptive statistics
  21. 21. 21 Number of variables Univariate = one variable Bivariate = two variables Multivariate = more than two variables mean, median, mode, histogram, bar chart correlation, t-test, scatterplot, clustered bar chart reliability analysis, factor analysis, multiple linear regression
  22. 22. 22 What do we want to describe? The distributional properties of variables, based on: ● Central tendency(ies): e.g., frequencies, mode, median, mean ● Shape: e.g., skewness, kurtosis ● Spread (dispersion): min., max., range, IQR, percentiles, variance, standard deviation
  23. 23. 23 Measures of central tendency Statistics which represent the ‘centre’ of a frequency distribution: –Mode (most frequent) –Median (50th percentile) –Mean (average) Which ones to use depends on: –Type of data (level of measurement) –Shape of distribution (esp. skewness) Reporting more than one may be appropriate.
  24. 24. 24 Measures of central tendency √√If meaningfulRatio √√√Interval √Ordinal √Nominal MeanMedianMode / Freq. /%s If meaningful x x x
  25. 25. 25 Measures of distribution • Measures of shape, spread, dispersion, and deviation from the central tendency Non-parametric: • Min. and max. • Range • Percentiles Parametric: • SD • Skewness • Kurtosis
  26. 26. 26 √√√Ratio √√Interval √Ordinal Nominal Var / SDPercentileMin / Max, Range Measures of spread / dispersion / deviation If meaningful √ x x x x
  27. 27. 27 Descriptives for nominal data • Nominal LOM = Labelled categories • Descriptive statistics: –Most frequent? (Mode – e.g., females) –Least frequent? (e.g., Males) –Frequencies (e.g., 20 females, 10 males) –Percentages (e.g. 67% females, 33% males) –Cumulative percentages –Ratios (e.g., twice as many females as males)
  28. 28. 28 Descriptives for ordinal data • Ordinal LOM = Conveys order but not distance (e.g., ranks) • Descriptives approach is as for nominal (frequencies, mode etc.) • Plus percentiles (including median) may be useful
  29. 29. 29 Descriptives for interval data • Interval LOM = order and distance, but no true 0 (0 is arbitrary). • Central tendency (mode, median, mean) • Shape/Spread (min., max., range, SD, skewness, kurtosis) Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)
  30. 30. 30 Descriptives for ratio data • Ratio = Numbers convey order and distance, meaningful 0 point • As for interval, use median, mean, SD, skewness etc. • Can also use ratios (e.g., Category A is twice as large as Category B)
  31. 31. 31 Mode (Mo) • Most common score - highest point in a frequency distribution – a real score – the most common response • Suitable for all levels of data, but may not be appropriate for ratio (continuous) • Not affected by outliers • Check frequencies and bar graph to see whether it is an accurate and useful statistic
  32. 32. 32 Frequencies (f) and percentages (%) • # of responses in each category • % of responses in each category • Frequency table • Visualise using a bar or pie chart
  33. 33. 33 Median (Mdn) • Mid-point of distribution (Quartile 2, 50th percentile) • Not badly affected by outliers • May not represent the central tendency in skewed data • If the Median is useful, then consider what other percentiles may also be worth reporting
  34. 34. 34 Summary: Descriptive statistics • Level of measurement and normality determines whether data can be treated as parametric • Describe the central tendency –Frequencies, Percentages –Mode, Median, Mean • Describe the variability: –Min., Max., Range, Quartiles –Standard Deviation, Variance
  35. 35. 35 Properties of the normal distribution Image source: http://www.flickr.com/photos/trevorblake/3200899889/
  36. 36. 36 Four moments of a normal distribution Row 1 Row 2 Row 3 Row 4 0 2 4 6 8 10 12 Column 1 Column 2 Column 3 Mean ←SD→ -ve Skew +ve Skew ←Kurtosis→
  37. 37. 37 Four moments of a normal distribution Four mathematical qualities (parameters) can describe a continuous distribution which at least roughly follows a bell curve shape: • 1st = mean (central tendency) • 2nd = SD (dispersion) • 3rd = skewness (lean / tail) • 4th = kurtosis (peakedness / flattness)
  38. 38. 38 Mean (1st moment ) • Average score Mean = Σ X / N • For normally distributed ratio or interval (if treating it as continuous) data. • Influenced by extreme scores (outliers)
  39. 39. 39 Beware inappropriate averaging... With your head in an oven and your feet in ice you would feel, on average, just fine The majority of people have more than the average number of legs (M = 1.9999).
  40. 40. 40 Standard deviation (2nd moment) • SD = square root of the variance = Σ (X - X)2 N – 1 • For normally distributed interval or ratio data • Affected by outliers • Can also derive the Standard Error (SE) = SD / square root of N
  41. 41. 41 Skewness (3rd moment ) • Lean of distribution – +ve = tail to right – -ve = tail to left • Can be caused by an outlier, or ceiling or floor effects • Can be accurate (e.g., cars owned per person would have a skewed distribution)
  42. 42. 42 Skewness (3rd moment) (with ceiling and floor effects) ● Negative skew ● Ceiling effect ● Positive skew ● Floor effect Image source http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
  43. 43. 43 Kurtosis (4th moment ) • Flatness or peakedness of distribution +ve = peaked -ve = flattened • By altering the X &/or Y axis, any distribution can be made to look more peaked or flat – add a normal curve to help judge kurtosis visually.
  44. 44. 44 Kurtosis (4th moment ) Image source: https://classconnection.s3.amazonaws.com/65/flashcards/2185065/jpg/kurtosis-142C1127AF2178FB244.jpg
  45. 45. 45 Judging severity of skewness & kurtosis • View histogram with normal curve • Deal with outliers • Rule of thumb: Skewness and kurtosis > -1 or < 1 is generally considered to sufficiently normal for meeting the assumptions of parametric inferential statistics • Significance tests of skewness: Tend to be overly sensitive (therefore avoid using)
  46. 46. 46 Areas under the normal curve If distribution is normal (bell-shaped - or close): ~68% of scores within +/- 1 SD of M ~95% of scores within +/- 2 SD of M ~99.7% of scores within +/- 3 SD of M
  47. 47. 47 Areas under the normal curve Image source: https://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG
  48. 48. 48 Non-normal distributions
  49. 49. 49 Types of non-normal distribution • Modality –Uni-modal (one peak) –Bi-modal (two peaks) –Multi-modal (more than two peaks) • Skewness –Positive (tail to right) –Negative (tail to left) • Kurtosis –Platykurtic (Flat) –Leptokurtic (Peaked)
  50. 50. 50 Non-normal distributions
  51. 51. 51 Histogram of people's weight WEIGHT 110.0100.090.080.070.060.050.040.0 Histogram Frequency 8 6 4 2 0 Std. Dev = 17.10 Mean = 69.6 N = 20.00
  52. 52. 52 Histogram of daily calorie intake N = 75
  53. 53. 53 Histogram of fertility
  54. 54. 54 Example ‘normal’ distribution 1 140120100806040200 Die 60 50 40 30 20 10 0 Frequency Mean =81.21 Std. Dev. =18.228 N =188 At what age do you think you will die?
  55. 55. 55 Example ‘normal’ distribution 2 Very masculineFairly masculineAndrogynousFairly feminineVery feminine Femininity-Masculinity 60 40 20 0 Count This bimodal graph actually consists of two different, underlying normal distributions.
  56. 56. 56 Very masculineFairly masculineAndrogynousFairly feminineVery feminine Femininity-Masculinity 60 40 20 0 Count Very masculineFairly masculineAndrogynousFairly feminine Femininity-Masculinity 50 40 30 20 10 0 Count Gender: male Distribution for females Distribution for males Very masculineFairly masculineAndrogynousFairly feminineVery feminine Femininity-Masculinity 60 40 20 0 Count Gender: female
  57. 57. 57 Non-normal distribution: Use non-parametric descriptive statistics • Min. & Max. • Range = Max. - Min. • Percentiles • Quartiles –Q1 –Mdn (Q2) –Q3 –IQR (Q3-Q1)
  58. 58. 58 Effects of skew on measures of central tendency +vely skewed distributions mode < median < mean symmetrical (normal) distributions mean = median = mode -vely skewed distributions mean < median < mode
  59. 59. 59 Effects of skew on measures of central tendency
  60. 60. 60 Transformations • Converts data using various formulae to achieve normality and allow more powerful tests • Loses original metric • Complicates interpretation
  61. 61. 61 1. If a survey question produces a ‘floor effect’, where will the mean, median and mode lie in relation to one another? Review questions
  62. 62. 62 2. Would the mean # of cars owned in Australia to exceed the median? Review questions
  63. 63. 63 3. Would the mean score on an easy test exceed the median performance? Review questions
  64. 64. 64 Graphical techniques Image source: http://www.flickr.com/photos/pagedooley/2121472112/
  65. 65. 65 Visualisation “Visualization is any technique for creating images, diagrams, or animations to communicate a message.” - Wikipedia Image source: http://en.wikipedia.org/wiki/File:FAE_visualization.jpg
  66. 66. 66 Science is beautiful (Nature Video) (Youtube – 5:30 mins) Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.png
  67. 67. 67 Is Pivot a turning point for web exploration? (Gary Flake) (TED talk - 6 min.) Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.png
  68. 68. 68 Principles of graphing • Clear purpose • Maximise clarity • Minimise clutter • Allow visual comparison
  69. 69. 69 Graphs (Edward Tufte) • Visualise data • Reveal data – Describe – Explore – Tabulate – Decorate • Communicate complex ideas with clarity, precision, and efficiency
  70. 70. 70 Graphing steps 1. Identify purpose of the graph (make large amounts of data coherent; present many #s in small space; encourage the eye to make comparisons) 2. Select type of graph to use 3. Draw and modify graph to be clear, non-distorting, and well- labelled (maximise clarity, minimise clarity; show the data; avoid distortion; reveal data at several levels/layers)
  71. 71. 71 Software for data visualisation (graphing) 1. Statistical packages ● e.g., SPSS Graphs or via Analyses 2. Spreadsheet packages ● e.g., MS Excel 3. Word-processors ● e.g., MS Word – Insert – Object – Micrograph Graph Chart
  72. 72. 72 Cleveland’s hierarchyImage source:http://www.processtrends.com/TOC_data_visualization.htm
  73. 73. 73 Cleveland’s hierarchyImage source:https://priceonomics.com/how-william-cleveland-turned-data-visualization/
  74. 74. 74 Univariate graphs • Bar graph • Pie chart • Histogram • Stem & leaf plot • Data plot / Error bar • Box plot Non-parametric i.e., nominal or ordinal } } Parametric i.e., normally distributed interval or ratio
  75. 75. 75 Bar chart (Bar graph) AREA Biology Anthropology Information Technolo Psychology Sociology Count 13 12 12 11 11 10 10 9 9 AREA Biology Anthropology Information Technolo Psychology Sociology Count 12 11 10 9 8 7 6 5 4 3 2 1 0 • Allows comparison of heights of bars • X-axis: Collapse if too many categories • Y-axis: Count/Frequency or % - truncation exaggerates differences • Can add data labels (data values for each bar) Note truncated Y-axis
  76. 76. 76 Biology Anthropology Information Technolo Psychology Sociology • Use a bar chart instead • Hard to read –Difficult to show • Small values • Small differences –Rotation of chart and position of slices influences perception Pie chart
  77. 77. 77 Pie chart → Use bar chart instead Image source: https://priceonomics.com/how-william-cleveland-turned-data-visualization/
  78. 78. 78 Histogram Participant Age 62.552.542.532.522.512.5 3000 2000 1000 0 Std. Dev = 9.16 Mean = 24.0 N = 5575.00 Participant Age 63.0 58.0 53.0 48.0 43.0 38.0 33.0 28.0 23.0 18.0 13.0 8.0 600 500 400 300 200 100 0 Std. Dev = 9.16 Mean = 24.0 N = 5575.00 Participant Age 65 61 57 53 49 45 41 37 33 29 25 21 17 13 9 1000 800 600 400 200 0 Std. Dev = 9.16 Mean = 24 N = 5575.00 • For continuous data (Likert?, Ratio) • X-axis needs a happy medium for # of categories • Y-axis matters (can exaggerate)
  79. 79. 79 Histogram of male & female heights Wild & Seber (2000) Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
  80. 80. 80 Stem & leaf plot ● Use for ordinal, interval and ratio data (if rounded) ● May look confusing to unfamiliar reader
  81. 81. 81 • Contains actual data • Collapses tails • Underused alternative to histogram Stem & leaf plot Frequency Stem & Leaf 7.00 1 . & 192.00 1 . 22223333333 541.00 1 . 444444444444444455555555555555 610.00 1 . 6666666666666677777777777777777777 849.00 1 . 88888888888888888888888888899999999999999999999 614.00 2 . 0000000000000000111111111111111111 602.00 2 . 222222222222222233333333333333333 447.00 2 . 4444444444444455555555555 291.00 2 . 66666666677777777 240.00 2 . 88888889999999 167.00 3 . 000001111 146.00 3 . 22223333 153.00 3 . 44445555 118.00 3 . 666777 99.00 3 . 888999 106.00 4 . 000111 54.00 4 . 222 339.00 Extremes (>=43)
  82. 82. 82 Box plot (Box & whisker) ● Useful for interval and ratio data ● Represents min., max, median, quartiles, & outliers
  83. 83. 83 • Alternative to histogram • Useful for screening • Useful for comparing variables • Can get messy - too much info • Confusing to unfamiliar reader Box plot (Box & whisker) Participant Gender FemaleMaleMissing 10 8 6 4 2 0 Time Management-T1 Self-Confidence-T1 44954162578259628414042044353275182341862330517623006559128211495 3201419358828475475400198324512898200336473 52157129504268724318255928345427211669040523444 4423423635403519067273946893137 3562338330403962312229 12255255545 2385410773323584004 552433515563 28294482267253154120226228451504231939983902646355221793020527435314997364541416412902548168628144167196326144171955174443826882822262617931747148 218736735510399522434250553623594998649620510638344230032962562527 35644317149302843626902101233519693009296541539905538229314216883634 27433593251521081985531655582138303424526783352317 2480296024926454284316542285186 419324766472662291 6084308 17 2699 3556334 1503275241623466255243493045 304032431371222596415943511907247380 4028 18082659 197862231372721142861 226520672270403852527688296021515564300430321938532836535506271835192336608405435012183292849986302224518624385114882241 27806412743294423212570661146542792576430229232476 231214932334 4308292014254307 569 5491
  84. 84. 84 Data plot & error bar Data plot Error bar
  85. 85. 85 • Alternative to histogram • Implies continuity e.g., time • Can show multiple lines Line graph OVERALL SCALES-T3 OVERALL SCALES-T2 OVERALL SCALES-T1 OVERALL SCALES-T0 Mean 8.0 7.5 7.0 6.5 6.0 5.5 5.0
  86. 86. 86 Graphical integrity (part of academic integrity)
  87. 87. 87 "Like good writing, good graphical displays of data communicate ideas with clarity, precision, and efficiency. Like poor writing, bad graphical displays distort or obscure the data, make it harder to understand or compare, or otherwise thwart the communicative effect which the graph should convey." Michael Friendly – Gallery of Data Visualisation
  88. 88. 88 Tufte’s graphical integrity • Some lapses intentional, some not • Lie Factor = size of effect in graph size of effect in data • Misleading uses of area • Misleading uses of perspective • Leaving out important context • Lack of taste and aesthetics
  89. 89. 89 Review exercise: Fill in the cells in this table Level Properties Examples Descriptive Statistics Graphs Nominal /Categorical Ordinal / Rank Interval Ratio Answers: http://goo.gl/Ln9e1
  90. 90. 90 References 1. Chambers, J., Cleveland, B., Kleiner, B., & Tukey, P. (1983). Graphical methods for data analysis. Boston, MA: Duxbury Press. 2. Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth. 3. Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta. 4. Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press. 5. Tufte. E. R. (2001). Visualizing quantitative data. Cheshire, CT: Graphics Press. 6. Tukey J. (1977). Exploratory data analysis. Addison-Wesley. 7. Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
  91. 91. 91 Open Office Impress ● This presentation was made using Open Office Impress. ● Free and open source software. ● http://www.openoffice.org/product/impress.html

×