3. Quantitative data analysis
Quantitative data can be either:
Continuous data, e.g. examination scores
Categorical data, e.g. marital status, gender
We usually describe or summarise data using:
Visual displays
Statistics (descriptive or inferential)
4. Quantitative data analysis (cont’d)
Data analysis should always begin with preliminary analysis using
Visual displays (also referred to as exploratory data analysis, EDA)
Descriptive analysis
This helps one to
understand the data
select an appropriate statistic to use to analyse the data
5. Statistics
Two approaches in statistics are used to make sense of data:
Descriptive statistics: used to organise or summarise data (reducing it to a manageable
form); primarily aimed at describing data
Examples: visual displays, measures of central tendency, measures of variation
Inferential statistics: tools for finding out information about a population from the
descriptive characteristics of a sample drawn from that population
Examples: correlation coefficient, t-test, chi-square, regression analysis
6. Descriptive statistics
Descriptive statistics include:
Visual displays
Frequency distribution, histograms, stem and leaf, box plots and
scatter plots
Measures of central tendency
Mean, median and mode (provide a sense of the location of the
distribution)
Measures of variability or dispersion
Range, Standard deviation, variance
7. Measures of central tendency
Also known as measures of location
Three measures of central tendency are mode, median and mean
8. Mode
[Figure: bar chart, "Number of students by MSCE grades"; x-axis: MSCE grades (1 to 9); y-axis: number of students (0 to 900)]
The most common observation or score, i.e. the most frequently occurring
value
If two adjacent values occur with equal (highest) frequency, take the
average of the two as the mode
Form 3A: 13, 14, 14, 14, 15, 15, 15, 17, 18, 19, 22, 23
Mode = (14 + 15) / 2 = 14.5
If two non-adjacent values occur with equal (highest) frequency, report
both as modes; the distribution is called bimodal
Form 3B: 12, 13, 14, 14, 14, 15, 16, 16, 19, 19, 19, 21
Mode = 14 and 19
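The two rules above can be sketched in Python. This is only an illustrative sketch: the helper name `slide_mode` is hypothetical, "adjacent" is assumed to mean neighbouring among the distinct observed values, and averaging two adjacent modes is the convention stated on this slide, not something the standard library does on its own.

```python
from statistics import multimode

def slide_mode(values):
    """Return the mode(s) using the slide's convention (hypothetical helper)."""
    modes = sorted(multimode(values))   # all values tied for the highest frequency
    ordered = sorted(set(values))       # distinct observed values, ascending
    if len(modes) == 2:
        i, j = ordered.index(modes[0]), ordered.index(modes[1])
        if j - i == 1:                  # assumption: adjacent = neighbouring distinct values
            return [(modes[0] + modes[1]) / 2]
    return modes

form_3a = [13, 14, 14, 14, 15, 15, 15, 17, 18, 19, 22, 23]
form_3b = [12, 13, 14, 14, 14, 15, 16, 16, 19, 19, 19, 21]

print(slide_mode(form_3a))  # [14.5]
print(slide_mode(form_3b))  # [14, 19]
```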
9. Median
Is the value that lies in the middle of the distribution when the values are arranged in
ascending order.
It is the value that divides the distribution in half.
For an odd number of values, the median is the value that lies in the middle of
the distribution, such that half the values fall above it and half below it.
3, 8, 12, 15, 19 Median = 12
For an even number of values, the median is the average (arithmetic mean) of the
two middle values.
3, 8, 12, 14, 15, 19 Median = (12 + 14) / 2 = 13
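As a quick check of the two cases above, a minimal sketch using the standard-library `statistics.median` (the variable names are illustrative only):

```python
from statistics import median

odd_scores = [19, 3, 12, 8, 15]       # unsorted on purpose; median() sorts internally
even_scores = [3, 8, 12, 14, 15, 19]

median_odd = median(odd_scores)       # middle value of the sorted list
median_even = median(even_scores)     # average of the two middle values

print(median_odd)   # 12
print(median_even)  # 13.0
```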
10. Mean
Is the average
Is defined as the sum (total) of values/scores divided by number of observations or
scores
Is usually designated as $\bar{X}$ (X-bar):
$\bar{X} = \frac{\sum X}{N} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$
11. Calculate mean age, mode and median (in years)
for each class of form 3 students
Form 3 A Form 3 B
13 12
14 13
14 14
14 14
14 14
15 15
15 16
17 16
18 19
19 19
22 19
23 21
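One way to check answers to this exercise is a short Python sketch using the standard-library `statistics` module. The function name `summarise` is hypothetical, and `multimode` simply lists every value tied for the highest frequency.

```python
from statistics import mean, median, multimode

# Ages (in years) copied from the table above
form_3a = [13, 14, 14, 14, 14, 15, 15, 17, 18, 19, 22, 23]
form_3b = [12, 13, 14, 14, 14, 15, 16, 16, 19, 19, 19, 21]

def summarise(ages):
    """Return the three measures of central tendency for one class."""
    return {"mean": mean(ages), "median": median(ages), "mode": multimode(ages)}

summary_a = summarise(form_3a)
summary_b = summarise(form_3b)
print(summary_a)  # mean 16.5, median 15, mode [14]
print(summary_b)  # mean 16, median 15.5, mode [14, 19]
```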
12. An important property of mean
The sum of the deviations of all values in a set from their mean is 0
Take 7, 13, 22, 9, 11, 4: mean = 66 / 6 = 11
7 - 11 = -4
13 - 11 = 2
22 - 11 = 11
9 - 11 = -2
11 - 11 = 0
4 - 11 = -7
(-4) + 2 + 11 + (-2) + 0 + (-7) = 0
This property of the mean makes it amenable to algebraic calculations, which are the
basis of many inferential statistics techniques
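The zero-sum property can be verified with a few lines of Python; this is only an illustrative sketch on the slide's six values. With a non-integer mean, floating-point rounding may leave a tiny non-zero remainder.

```python
values = [7, 13, 22, 9, 11, 4]

mean_value = sum(values) / len(values)          # 66 / 6 = 11.0
deviations = [x - mean_value for x in values]   # [-4.0, 2.0, 11.0, -2.0, 0.0, -7.0]
total_deviation = sum(deviations)

print(total_deviation)  # 0.0
```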
13. Advantages and disadvantages of the
mode
Advantages
Represents a value that actually appears in the data
Represents the largest number of values or scores
It is unaffected by extreme scores/ values
Disadvantages
Not based upon all observations
Cannot be used in mathematical operations
It is strongly affected by sampling fluctuations
Difficult to interpret when data set contains more than one mode
14. Advantages and disadvantages of the
median
Advantages
It is unaffected by extreme scores / values
Disadvantages
Not based upon all observations
Cannot be used in mathematical operations
15. Advantages and disadvantages of the
mean
Advantages
Its calculation is based on all observations
It is least affected by sampling fluctuations
Best measure for comparing 2 or more series of data
It can be manipulated algebraically
Disadvantages
It may not appear in the actual data, so it is theoretical
It is affected by extreme values
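The sensitivity of the mean to extreme values, and the robustness of the median, can be illustrated with a small sketch. The second list is hypothetical data: the median example from earlier with its largest score replaced by an extreme value.

```python
from statistics import mean, median

scores = [3, 8, 12, 15, 19]
with_outlier = [3, 8, 12, 15, 190]   # hypothetical: 19 replaced by an extreme score

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # 11.4 12
print(mean_after, median_after)    # 45.6 12
```

The mean is pulled from 11.4 to 45.6 by a single value, while the median stays at 12.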
16. Measures of central tendency and scales of
measurement
Mean is an appropriate measure of central location for interval and ratio data
Median is an ordinal statistic. Its calculation is based on ordinal properties of data
Mode is appropriate for nominal data
17. Measures of variation or dispersion
Are a group of statistics that provide information on how a set of scores or values
are distributed.
Show how spread out the distribution of observations (scores or set of values) is
from the mean of the distribution
In other words, how much, on average, scores or values differ from the mean
18. Performance on an exam for 2 schools might appear to
be similar when one considers measures of central
tendency
Score    Freq. Class A    Score x freq.    Freq. Class B    Score x freq.
60       2                120              7                420
70       3                210              0                0
80       5                400              1                80
90       3                270              0                0
100      2                200              7                700
Total    15               1200             15               1200
Mean     80 (= 1200 / 15)                  80 (= 1200 / 15)
Median   80                                80
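The mean and median in this table can be recomputed from the frequencies alone. The sketch below is illustrative; the helper names `expand` and `freq_mean` are hypothetical.

```python
from statistics import median

scores = [60, 70, 80, 90, 100]
freq_a = [2, 3, 5, 3, 2]
freq_b = [7, 0, 1, 0, 7]

def expand(scores, freqs):
    """Repeat each score as many times as its frequency."""
    return [s for s, f in zip(scores, freqs) for _ in range(f)]

def freq_mean(scores, freqs):
    """Mean from a frequency table: sum of (score x frequency) / total frequency."""
    return sum(s * f for s, f in zip(scores, freqs)) / sum(freqs)

mean_a, mean_b = freq_mean(scores, freq_a), freq_mean(scores, freq_b)
median_a = median(expand(scores, freq_a))
median_b = median(expand(scores, freq_b))

print(mean_a, median_a)  # 80.0 80
print(mean_b, median_b)  # 80.0 80
```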
19. Two distributions with equal mean and median
but different variation of values
[Figure: scores on an examination for school A: n = 15, mean = 80, median = 80; x-axis: score (50 to 100); y-axis: frequency (0 to 6)]
[Figure: scores on an examination for school B: n = 15, mean = 80, median = 80; x-axis: score (50 to 100); y-axis: frequency (0 to 8)]
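The difference in spread between the two schools can be quantified with the sample standard deviation. This is a sketch that rebuilds the scores from the frequencies in the table on the previous slide and uses the standard-library `statistics.stdev`, which divides by n - 1.

```python
from statistics import mean, stdev

# Scores rebuilt from the frequency table on the previous slide
school_a = [60] * 2 + [70] * 3 + [80] * 5 + [90] * 3 + [100] * 2
school_b = [60] * 7 + [80] * 1 + [100] * 7

sd_a = stdev(school_a)   # sample standard deviation (divides by n - 1)
sd_b = stdev(school_b)

print(mean(school_a), mean(school_b))  # 80 80
print(round(sd_a, 2), round(sd_b, 2))  # 12.54 20.0
```

Both schools have the same mean, but the scores in school B lie much further from it on average.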
20. Measures of variation
The first group measures variation in a distribution in terms of the distance from
the smaller values to the larger values.
Range, interquartile range (IQR), semi-interquartile range (SIQR)
The second group measures variation in terms of a summary measure of each score's
deviation from the mean
Variance, standard deviation
21. Range
It is the simplest measure of variation
It is defined as the distance between the smallest and the largest value in a data set
It is the difference between the largest and the smallest values in a set of data
10, 12, 15, 18, 20: range = 20 - 10 = 10
2, 8, 15, 22, 28: range = 28 - 2 = 26
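A minimal sketch of the range calculation for the two data sets above; the function name `value_range` is hypothetical.

```python
def value_range(values):
    """Range: difference between the largest and the smallest value."""
    return max(values) - min(values)

range_1 = value_range([10, 12, 15, 18, 20])
range_2 = value_range([2, 8, 15, 22, 28])

print(range_1)  # 10
print(range_2)  # 26
```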
22. Variance
It is based on the sum of squares of deviations about the mean (the sum is then divided by N or N - 1)
1, 4, 7, 10, 13: sum = 1 + 4 + 7 + 10 + 13 = 35; mean = 35 / N = 35 / 5 = 7
Deviations from the mean:
1-7 = -6
4-7 = -3
7-7 = 0
10-7 = 3
13-7 = 6
23. Sum of squares
X      Deviation from mean (X - X̄)    Squared deviation (X - X̄)²
1      1 - 7 = -6                      (-6)² = 36
4      4 - 7 = -3                      (-3)² = 9
7      7 - 7 = 0                       0² = 0
10     10 - 7 = 3                      3² = 9
13     13 - 7 = 6                      6² = 36
X̄ = 7  ∑(X - X̄) = 0                   ∑(X - X̄)² = 90
24. Sample variance
Is given by the definitional formula
\[ S^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{36 + 9 + 0 + 9 + 36}{5} = \frac{90}{5} = 18 \]
Or, as an unbiased estimate,
\[ S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{36 + 9 + 0 + 9 + 36}{5 - 1} = \frac{90}{4} = 22.5 \]
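Both versions of the formula are available in Python's standard library, which gives a quick way to check the arithmetic. A sketch assuming `statistics.pvariance` for the divide-by-N form and `statistics.variance` for the divide-by-(N - 1) form:

```python
from statistics import pvariance, variance

data = [1, 4, 7, 10, 13]

var_n = pvariance(data)          # sum of squares / N       = 90 / 5
var_n_minus_1 = variance(data)   # sum of squares / (N - 1) = 90 / 4

print(var_n, var_n_minus_1)
```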
26. Variance (cont’d)
Variance tells us how representative the mean is of the scores in the distribution
The closer the scores are to the mean, the smaller the variance
Conversely, the farther the scores are from the mean, the greater the variance and the
likelihood that the mean is not representative of the scores
27. Standard deviation
It is the square root of the variance
It is a value that indicates the average variability of the scores.
It tells us the distance, on average, of the scores from the mean
31. Steps in calculating variance and standard deviation
Step 1: Calculate the mean, X̄ (X-bar) = ∑X / N
Step 2: Subtract the mean from each value: (X - X̄)
Step 3: Square each of the differences: (X - X̄)²
Step 4: Add up the squared deviations: ∑(X - X̄)²
Step 5: Divide the sum of squares by N - 1: S² = ∑(X - X̄)² / (N - 1)
Step 6: Take the square root of the result in step 5: S = √[∑(X - X̄)² / (N - 1)]
Worked example for the values 10, 14, 6, 18:
X      X - X̄           (X - X̄)²
10     10 - 12 = -2     (-2)² = -2 x -2 = 4
14     14 - 12 = 2      (2)² = 2 x 2 = 4
6      6 - 12 = -6      (-6)² = -6 x -6 = 36
18     18 - 12 = 6      (6)² = 6 x 6 = 36
X̄ = (10 + 14 + 6 + 18) / 4 = 48 / 4 = 12
∑(X - X̄)² = 4 + 4 + 36 + 36 = 80
S² = 80 / (4 - 1) = 26.67
S = √26.67 = 5.16
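The six steps can be followed one line at a time in Python. This is a plain, illustrative sketch of the worked example (the variable names are not from the slides), using N - 1 in the denominator as in step 5.

```python
import math

values = [10, 14, 6, 18]

mean_value = sum(values) / len(values)                 # Step 1: 48 / 4 = 12.0
deviations = [x - mean_value for x in values]          # Step 2: [-2.0, 2.0, -6.0, 6.0]
squared = [d ** 2 for d in deviations]                 # Step 3: [4.0, 4.0, 36.0, 36.0]
sum_of_squares = sum(squared)                          # Step 4: 80.0
sample_variance = sum_of_squares / (len(values) - 1)   # Step 5: 80 / 3
sample_sd = math.sqrt(sample_variance)                 # Step 6

print(round(sample_variance, 2), round(sample_sd, 2))  # 26.67 5.16
```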
32. Standard deviation cont’d
Variance and standard deviation are very sensitive to extreme scores
In research, the standard deviation is normally reported with the mean
Editor's Notes
Form 3A: mean = 16.5, mode = 14, median = 15
Form 3B: mean = 16, mode = 14 and 19, median = 15.5