descriptive statistics- 1.pptx

Statistical techniques and
data analysis

Quantitative data analysis
Quantitative data can either be:
Continuous data e.g. examination scores
Categorical data, e.g. marital status, gender
We usually describe or summarise data using:
Visual displays
Statistics (descriptive or inferential)

Quantitative data analysis (cont’d)
 Data analysis should always begin with preliminary analysis using
Visual displays (also referred to as exploratory data analysis, EDA)
Descriptive analysis
 This helps one to
 understand the data
 select an appropriate statistic to use to analyse the data

Statistics
 Two approaches are used in statistics to make sense of data
 Descriptive statistics: used to organise or summarise data (reducing them to a manageable form); the primary aim is to describe the data
 Examples: visual displays, measures of central tendency, measures of variation
 Inferential statistics: tools for finding out information about a population from the descriptive characteristics of a sample drawn from that population
 Examples: correlation coefficient, t-test, chi-square test, regression analysis

Descriptive statistics
Descriptive statistics include:
 Visual displays
 Frequency distribution, histograms, stem and leaf, box plots and
scatter plots
 Measures of central tendency-
 mean, median, and mode (provide a sense of location of the
distribution).
 Measures of variability or dispersion
 Range, Standard deviation, variance

Measures of central tendency
 Also known as measures of location
 Three measures of central tendency are mode, median and mean

Mode
[Figure: bar chart "Number of students by MSCE grades"; x-axis: MSCE grades (1 to 9); y-axis: number of students (0 to 900)]
The most common observation or score, i.e. the most frequently occurring value
If two adjacent values occur with equal (highest) frequency, take the average of the two as the mode
Form 3A: 13, 14, 14, 14, 15, 15, 15, 17, 18, 19, 22, 23
Mode = (14 + 15) / 2 = 14.5
If two non-adjacent values occur with equal (highest) frequency, report both as modes; the distribution is then called bimodal
Form 3B: 12, 13, 14, 14, 14, 15, 16, 16, 19, 19, 19, 21
Mode = 14 and 19
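
The rule on this slide can be sketched in Python. This is only an illustration: the helper name `mode_by_slide_rule` is hypothetical, and "adjacent" is assumed to mean neighbouring values among the distinct sorted observations. Averaging tied adjacent modes is this slide's convention; `statistics.multimode` on its own simply returns every tied value.

```python
from statistics import multimode

def mode_by_slide_rule(values):
    """Return the mode(s) as a list, following the slide's rule (illustrative helper)."""
    modes = sorted(multimode(values))   # every value tied for the highest frequency
    distinct = sorted(set(values))      # distinct observations in ascending order
    if len(modes) == 2:
        first = distinct.index(modes[0])
        # Assumption: "adjacent" means neighbours among the distinct sorted values.
        if distinct[first + 1] == modes[1]:
            return [(modes[0] + modes[1]) / 2]
    return modes

form_3a = [13, 14, 14, 14, 15, 15, 15, 17, 18, 19, 22, 23]
form_3b = [12, 13, 14, 14, 14, 15, 16, 16, 19, 19, 19, 21]

print(mode_by_slide_rule(form_3a))  # [14.5]
print(mode_by_slide_rule(form_3b))  # [14, 19]
```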

Median
 Is the value that lies in the middle of the distribution when the values are arranged in ascending order.
 It is the value that divides the distribution in half.
 For an odd number of values, the median is the middle value, such that half the values fall above it and half below it.
 3, 8, 12, 15, 19: median = 12
 For an even number of values, the median is the average (arithmetic mean) of the two middle values.
 3, 8, 12, 14, 15, 19: median = (12 + 14) / 2 = 13
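
The odd and even cases can be written out as a short Python sketch. The function name `median_of` is chosen here for illustration; the standard library's `statistics.median` gives the same results.

```python
def median_of(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

odd_case = median_of([3, 8, 12, 15, 19])       # 12
even_case = median_of([3, 8, 12, 14, 15, 19])  # 13.0
```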

Mean
 Is the average
 Is defined as the sum (total) of the values or scores divided by the number of observations or scores
 Is usually designated as $\bar{X}$ (X-bar):
 $\bar{X} = \frac{\sum X}{N} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$

Calculate mean age, mode and median (in years)
for each class of form 3 students
Form 3 A Form 3 B
13 12
14 13
14 14
14 14
14 14
15 15
15 16
17 16
18 19
19 19
22 19
23 21
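
One way to check answers to this exercise is with Python's standard `statistics` module. The variable names below are illustrative, and `multimode` reports every tied mode without applying the averaging rule for adjacent modes described earlier.

```python
from statistics import mean, median, multimode

# Ages in years, copied from the table above.
form_3a = [13, 14, 14, 14, 14, 15, 15, 17, 18, 19, 22, 23]
form_3b = [12, 13, 14, 14, 14, 15, 16, 16, 19, 19, 19, 21]

summary = {
    name: {"mean": mean(ages), "median": median(ages), "modes": multimode(ages)}
    for name, ages in {"Form 3A": form_3a, "Form 3B": form_3b}.items()
}

for name, stats in summary.items():
    print(name, stats)
```

Form 3A has a mean of 16.5, a median of 15 and a mode of 14; Form 3B has a mean of 16, a median of 15.5 and two modes, 14 and 19.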

An important property of mean
 The sum of the deviations of all values in a set from their mean is 0
 Take 7, 13, 22, 9, 11, 4: mean = 66 / 6 = 11
 7 - 11 = -4
 13 - 11 = 2
 22 - 11 = 11
 9 - 11 = -2
 11 - 11 = 0
 4 - 11 = -7
 (-4) + 2 + 11 + (-2) + 0 + (-7) = 0
 This property of the mean makes it amenable to algebraic calculations, which are the basis of many inferential statistical techniques
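
A quick numerical check of this property, as a minimal sketch using the slide's six values. With other data the sum may differ from zero by a tiny floating-point rounding error.

```python
values = [7, 13, 22, 9, 11, 4]

mean = sum(values) / len(values)          # 66 / 6 = 11.0
deviations = [x - mean for x in values]   # one deviation per value
total_deviation = sum(deviations)         # deviations above and below the mean cancel out

print(mean, deviations, total_deviation)
```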

Advantages and disadvantages of the
mode
 Advantages
 Represents a value that actually appears in the data
 Represents the largest number of values or scores
 It is unaffected by extreme scores/ values
 Disadvantages
 Not based upon all observations
 Cannot be used in mathematical operations
 It is strongly affected by sampling fluctuations
 Difficult to interpret when data set contains more than one mode

Median
 Advantages
 It is unaffected by extreme scores / values
 Disadvantages
 Not based upon all observations
 Cannot be used in mathematical operations

Mean
 Advantages
 Its calculation is based on all observations
 It is least affected by sampling fluctuations
 Best measure for comparing 2 or more series of data
 It can be manipulated algebraically
 Disadvantages
 It may not be represented in the actual data, so it is theoretical
 It is affected by extreme values

Measures of central tendency and scales of
measurement
 Mean is an appropriate measure of central location for interval and ratio data
 Median is an ordinal statistic. Its calculation is based on ordinal properties of data
 Mode is appropriate for nominal data

Measures of variation or dispersion
 Are a group of statistics that provide information on how a set of scores or values is distributed.
 Show how spread out the distribution of observations (scores or set of values) is from the mean of the distribution
 In other words, they show how much, on average, scores or values differ from the mean

Performance on an exam for 2 schools might appear to be similar when one considers only measures of central tendency
Score    Freq. A   Score x freq. A   Freq. B   Score x freq. B
60       2         120               7         420
70       3         210               0         0
80       5         400               1         80
90       3         270               0         0
100      2         200               7         700
Total    15        1200              15        1200
Mean               80                          80
Median             80                          80
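
The table can be reproduced with a short Python sketch. The helper `expand` and the variable names are illustrative. The population standard deviation (`pstdev`, a measure covered later in these slides) is included only to show that the two sets of scores differ in spread even though their means and medians agree.

```python
from statistics import mean, median, pstdev

def expand(freq_table):
    """Turn a {score: frequency} table into a flat list of individual scores."""
    return [score for score, freq in freq_table.items() for _ in range(freq)]

school_a = expand({60: 2, 70: 3, 80: 5, 90: 3, 100: 2})
school_b = expand({60: 7, 70: 0, 80: 1, 90: 0, 100: 7})

for name, scores in (("A", school_a), ("B", school_b)):
    print(name, len(scores), mean(scores), median(scores), round(pstdev(scores), 2))
```

Both schools have n = 15, mean 80 and median 80, but the spread is about 12.11 for A and about 19.32 for B.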

Two distributions with equal mean and median but different variation of values
[Figure: two bar charts of examination scores; x-axis: score (50 to 100); y-axis: frequency. School A: n = 15, mean = 80, median = 80. School B: n = 15, mean = 80, median = 80.]

Measures of variation
 The first group measures variation in a distribution in terms of the distance from smaller values to larger values.
 Range, interquartile range (IQR), semi-interquartile range (SIQR)
 The second group measures variation in terms of a summary measure of each score's deviation from the mean
 Variance, standard deviation

Range
 It is the simplest measure of variation
 It is defined as the distance between the smallest and the largest value in a data set
 It is the difference between the largest and the smallest values in a set of data
 10, 12, 15, 18, 20 = 20-10 range is 10
 2, 8, 15, 22, 28 = 28 – 2 range is 26

Variance
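 It is the mean of the squared deviations about the mean, i.e. the sum of squares divided by N (or by N - 1 for the unbiased sample estimate)
 Example: 1, 4, 7, 10, 13; mean = (1 + 4 + 7 + 10 + 13) / N = 35 / 5 = 7
 Deviations from the mean: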
 1-7 = -6
 4-7 = -3
 7-7 = 0
 10-7 = 3
 13-7 = 6

Sum of squares
X      Deviation from mean, $X - \bar{X}$     Squared deviation, $(X - \bar{X})^2$
1      1 - 7 = -6                             (-6)^2 = 36
4      4 - 7 = -3                             (-3)^2 = 9
7      7 - 7 = 0                              0^2 = 0
10     10 - 7 = 3                             3^2 = 9
13     13 - 7 = 6                             6^2 = 36
$\bar{X} = 7$, $\sum (X - \bar{X}) = 0$, $\sum (X - \bar{X})^2 = 90$

Sample variance
 Is given by the definitional formula
 $S^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{36 + 9 + 0 + 9 + 36}{5} = \frac{90}{5} = 18$
 Or, as the unbiased estimate,
 $S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{36 + 9 + 0 + 9 + 36}{5 - 1} = \frac{90}{4} = 22.5$
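
Both results can be checked with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by N - 1. The variable names are illustrative.

```python
from statistics import pvariance, variance

scores = [1, 4, 7, 10, 13]

divided_by_n = pvariance(scores)          # sum of squares / N       = 90 / 5 = 18
divided_by_n_minus_1 = variance(scores)   # sum of squares / (N - 1) = 90 / 4 = 22.5

print(divided_by_n, divided_by_n_minus_1)
```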

Computational formula for Variance
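
A commonly taught computational form is given below. It is stated as an assumption, since the slide's own formula is not reproduced in the text. It works from the raw scores and their squares, so individual deviations need not be computed.

```latex
S^2 = \frac{\sum X^2 - \frac{\left(\sum X\right)^2}{N}}{N}
\qquad \text{or, for the unbiased estimate,} \qquad
S^2 = \frac{\sum X^2 - \frac{\left(\sum X\right)^2}{N}}{N - 1}
```

For the data 1, 4, 7, 10, 13: $\sum X^2 = 1 + 16 + 49 + 100 + 169 = 335$ and $(\sum X)^2 / N = 35^2 / 5 = 245$, so the sum of squares is 335 - 245 = 90. This gives 90 / 5 = 18 or 90 / 4 = 22.5, the same values as the definitional formula.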

Variance (cont’d)
 Variance tells us how representative the mean is of the scores in the distribution
 The closer the scores are to the mean, the smaller the variance
 Conversely, the farther the scores are from the mean, the greater the variance and the likelihood that the mean is not representative of the scores

Standard deviation
 It is the square root of the variance
 It is a value that indicates the average variability of the scores.
 It tells us about the distance, on average, of the scores from the mean

Definitional formula of standard deviation

Variance vs standard deviation
 Standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
 Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{N}$

Computational formulae for Standard
deviation
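
A minimal Python sketch of a commonly used computational form, in which the sum of squares is obtained as ∑X² - (∑X)²/N and the standard deviation is the square root of that sum divided by N - 1 (or by N). This is an assumption about what the slide showed; the function name `sd_computational` and its `ddof` parameter are illustrative.

```python
import math
import statistics

def sd_computational(values, ddof=1):
    """Standard deviation from raw sums; ddof=1 divides by N - 1, ddof=0 divides by N."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    sum_of_squares = sum_x_squared - sum_x ** 2 / n   # equals the sum of squared deviations
    return math.sqrt(sum_of_squares / (n - ddof))

scores = [10, 14, 6, 18]
sample_sd = sd_computational(scores)              # divides by N - 1
population_sd = sd_computational(scores, ddof=0)  # divides by N

print(round(sample_sd, 2), round(population_sd, 2))
```

The results agree with `statistics.stdev` and `statistics.pstdev`, which use the definitional approach.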

Steps in calculating variance and standard deviation
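 Step 1: Calculate the mean, $\bar{X} = \frac{\sum X}{N}$
 Step 2: Subtract the mean from each value: $X - \bar{X}$
 Step 3: Square each of the differences: $(X - \bar{X})^2$
 Step 4: Add up the squared deviations: $\sum (X - \bar{X})^2$
 Step 5: Divide the sum of squares by N - 1: $\frac{\sum (X - \bar{X})^2}{N - 1}$
 Step 6: Take the square root of the result in step 5: $\sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
Worked example
X      $X - \bar{X}$       $(X - \bar{X})^2$
10     10 - 12 = -2        (-2)^2 = 4
14     14 - 12 = 2         2^2 = 4
6      6 - 12 = -6         (-6)^2 = 36
18     18 - 12 = 6         6^2 = 36
Mean: (10 + 14 + 6 + 18) / 4 = 48 / 4 = 12
Sum of squares: 4 + 4 + 36 + 36 = 80
Variance: S^2 = 80 / (4 - 1) = 26.67
Standard deviation: S = √26.67 = 5.16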
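
The six steps can be followed line by line in Python. This is a sketch using the slide's four values; the variable names are illustrative.

```python
import math

scores = [10, 14, 6, 18]
n = len(scores)

mean = sum(scores) / n                              # Step 1: 48 / 4 = 12
deviations = [x - mean for x in scores]             # Step 2: -2, 2, -6, 6
squared_deviations = [d ** 2 for d in deviations]   # Step 3: 4, 4, 36, 36
sum_of_squares = sum(squared_deviations)            # Step 4: 80
variance = sum_of_squares / (n - 1)                 # Step 5: 80 / 3
std_dev = math.sqrt(variance)                       # Step 6: square root of the variance

print(mean, sum_of_squares, round(variance, 2), round(std_dev, 2))
```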

Standard deviation cont’d
 Variance and standard deviation are very sensitive to extreme scores
 In research, the standard deviation is normally reported together with the mean
