Descriptive
Statistics
Arithmetic Mean
Median
Mode
Approach of describing numerical data
Variance
Standard Deviation
Coefficient of Variation
Range
Interquartile Range
Central Tendency Variation
Central Tendency
• Numerical central value of a set observation is called measures of central
tendency.
• It is a central or typical value for a probability distribution.
• It may also be called a center or location of the distribution.
• Measures of central tendency:
Mean
Median
Mode
Measures of Central Tendency
Central Tendency
Mean Median Mode
n
x
x
n
1
i
i



Midpoint of
ranked values
Most frequently
observed value
Arithmetic
average
Mean
Mean is a single and typical value used to represent a set of data. It also
referred as the average.
Objective:
• To get a single value that represents the entire data
• To facilitate the comparison between groups of data of similar nature
Classification of mean
• Arithmetic Mean (AM)
• Geometric Mean (GM)
• Harmonic Mean (HM)
Arithmetic Mean
The arithmetic mean (mean) is the most common measure of central tendency
• For a population of N values:
• For a sample of size n:
Sample size
n
x
x
x
n
x
x n
2
1
n
1
i
i






  Observed
values
N
x
x
x
N
x
μ N
2
1
N
1
i
i






 
Population size
Population
values
Arithmetic Mean
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
0 1 2 3 4 5 6 7 8 9 10
Mean = 3
0 1 2 3 4 5 6 7 8 9 10
Mean = 4
3
5
15
5
5
4
3
2
1






4
5
20
5
10
4
3
2
1






Properties of mean
• It takes all observations into account reflecting the value
• It is used in other statistical tools
• It is most reliable for drawing inferences
• It is the easiest to use in advanced statistics technique
Limitations of mean
• Highly affected by extreme values, even just one extreme value
• Sometimes negative and zero values can not be counted
Median
In an ordered list, the median is the “middle” number (50% above, 50%
below)
• It is not affected by extreme values
0 1 2 3 4 5 6 7 8 9 10
Median = 3
0 1 2 3 4 5 6 7 8 9 10
Median = 3
Median
• It is the middle value of a set of numbers which have been ordered by
magnitude
• The median is also the number that is halfway into the set.
• The location of the median:
• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two middle
numbers
Median
For grouped frequency
Where,
L = lower limit of the median class
N = total number of observations
F = cumulative frequency of preceding median class
fm = frequency of the median class
C = class interval of the median class
SBP Range
(mmHg)
Frequency
Cumulative
Frequency
101-105 2 2
106-110 3 5
111-115 5 10
116-120 8 18
121-125 6 24
126-130 4 28
131-135 2 30
136-140 1 31
L = 121
N= 31
F= 18
fm=6
C=5
118.92
L = 116
N= 31
F= 10
fm=8
C=5
119.43
Properties of median
• Not affected by extreme value
• Perfect statistical example for skewed distribution
• Can be calculated from frequency distribution
• It is not influenced by the position of items
Limitations of median
• It is not based on all observations
• Compared to mean it is less reliable
• Not suitable for further analysis
Mode
• The mode is the value of a data set that occurs most frequently.
• It is the commonly observed value which occurs maximum number
times
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Mode = 9
0 1 2 3 4 5 6
No Mode
Mode
SBP Range (mmHg) Frequency
101-105 2
106-110 3
111-115 5
116-120 8
121-125 6
126-130 4
131-135 2
136-140 1
l = 116
f1 = 8
f0 = 5
f2 = 6
h = 5
119
SBP Range (mmHg) Frequency
101-105 2
106-110 3
111-115 5
116-120 8
121-125 6
126-130 4
131-135 8
136-140 1
L = 131
f1 = 8
f0 = 4
f2 = 1
h = 5
132.82
Properties of mode
• Not affected by extreme value
• For large number of data, mode happens to be
meaningful as an average
• Can be calculated from frequency distribution
• Do not affected by small and large numbers
• It is not based on all observations
• Compared to mean it is less reliable
• Not suitable for further advanced analysis
Limitations of mode
• Mean is generally used, unless extreme values (outliers) exist
• Then median is often used, since the median is not sensitive to
extreme values.
Which one is the “best” measurement?
Dispersions/variability
• Dispersions are the measures of extent of deviation of individual from
the central value (average).
• It determines how much representative the central value is.
• It may be small if the values are closely bunched about their mean and
it is large when the values are scattered widely about their mean.
• To determine the reliability of an average
• For controlling the variability
• For comparing two or more series of data regarding their variability
• For facilitating the use of other statistical measures
Objectives of Dispersions Measurement
• It should be rigidly defined
• It should be easy to calculate and easy to understand
• It should be based on all observations
• It should be suitable for further mathematical treatment
• It should be affected as little as possible to the sampling fluctuation
Characteristics of a good measure of
Dispersions
Shape of a Distribution
• Describes how data are distributed
• Measures of shape: Symmetric or skewed
Mean = Median
Mean < Median Median < Mean
Right-Skewed
Left-Skewed Symmetric
Same center,
different variation
Measures of Variability
Variation
Variance Standard
Deviation
Coefficient
of Variation
Range Interquartile
Range
Measures of variation give
information on the spread or
variability of the data values
Range
• Simplest measure of variation
• Difference between the largest and the smallest
observations:
Range = Xlargest – Xsmallest
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Range = 14 - 1 = 13
Example:
• Ignores the way in which data are distributed
• Sensitive to outliers
7 8 9 10 11 12
Range = 12 - 7 = 5
7 8 9 10 11 12
Range = 12 - 7 = 5
Characteristics of the Range
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 5 - 1 = 4
Range = 120 - 1 = 119
Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values
per segment
25% 25% 25% 25%
 The first quartile, Q1, is the value for which 25% of the observations
are smaller and 75% are larger
 Q2 is the same as the median (50% are smaller, 50% are larger)
 Only 25% of the observations are greater than the third quartile
Q1 Q2 Q3
Quartile Formulas
Find a quartile by determining the value in the appropriate
position in the ranked data, where
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1)
(median)
Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values
(n = 9)
Q1 = is in the 0.25(9+1) = 2.5 position of the ranked data
so use the value half-way between the 2nd
and 3rd
values,
Q1 = 12.5
Quartiles
Sample Ranked Data: 11 12 13 16 16 17 18 21 22
 Example: Find the first, second and third quartiles
 second and third quartiles ??
Interquartile Range
Example:
Median
(Q2)
X
maximum
X
minimum Q1 Q3
25% 25% 25% 25%
12 30 45 57 70
Interquartile range
= 57 – 30 = 27
Interquartile range = 3rd
quartile – 1st
quartile
IQR = Q3 – Q1
Variance
Variance measures how far each number in the set is from the mean.
It is calculated by
• taking the differences between each number in the set and the mean
• squaring the differences
• dividing the sum of the squares by the number of values in the set.
Standard deviation
• It is a measure of how spread-out numbers are. Its symbol is σ. It is the
square root of the deviations of individual items from their arithmetic
mean.
• 8, 9, 11, 12 Ava. = 10
• 18, 19, 2, 1 Ava. = 10
• Calculate standard deviation, consider a sample of IQ scores given by
96, 104, 126, 134 and 140.
Examples
Calculate standard deviation, consider a sample of IQ scores given by 96,
104, 126, 134 and 140.
The mean of this data is (96+104+126+134+140)/5 =120.
σ = √[ ∑(x-120)^2 / 5 ]
The deviation from the mean is given by
96-120 = -24,
104-120 = -16,
126-120 = 6,
134-120 = 14,
140-120 = 20.
σ = √[ ((-24)^2+(-16)^2+(6)^2+(14)^2+(20)^2) / 5 ]
σ = √[ (1464) / 5 ] = ± 17.11
Comparing Standard Deviations
Mean = 15.5
s = 3.338
11 12 13 14 15 16 17 18 19 20 21
11 12 13 14 15 16 17 18 19 20 21
Data B
Data A
Mean = 15.5
s = 0.926
11 12 13 14 15 16 17 18 19 20 21
Mean = 15.5
s = 4.570
Data C
Measuring variation
Small standard deviation
Large standard deviation
Standard Deviation
• Most commonly used measure of variation
• Each value in the data set is used in the calculation
• Shows variation from the mean
• Has the same units as the original data
• It cannot be negative.
• A standard deviation close to 0 indicates that the data points tend to be close
to the mean.
• The further the data points are from the mean, the greater the standard
deviation
Important Note
• Standard deviation of sample data of a population
• Variance of sample data of a population
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data
measured in different units
100%
x
s
CV 









Comparing Coefficient of Variation
• Stock A:
• Average price last year = $50
• Standard deviation = $5
• Stock B:
• Average price last year = $100
• Standard deviation = $5
Both stocks
have the same
standard
deviation, but
stock B is less
variable relative
to its price
10%
100%
$50
$5
100%
x
s
CVA 












5%
100%
$100
$5
100%
x
s
CVB 












Measure of Locations of Data
Percentiles
• Percentile is a measure of position in a set of observations.
• It is a number where a certain percentage of scores fall below that
percentile.
• It is a measure used in statistics indicating the value below which a
given percentage of observations in a group of observations falls.
For example, the 29th percentile is the value of a variable such that
29% of the observations are less than the value and 71% of the
observations are greater.
• Suppose, you got 80th
percentile on GRE analytical score, that means
80% of GRE test taker have marks less than you and 20% of GRE test
taker have more marks than you.
• Standard error (SE) is the standard deviation of the sampling
distribution of a statistic.
• If the statistic is the sample mean, it is called the standard error of the
mean (SEM)
Standard Error of the sample mean
Standard error of the mean is a
measure of the dispersion of sample
means around population mean
Find out the standard error among the following data
Exercise
Drug
Concentration
(μg/ml)
Absorption Mean ± SE
25
0.286
??
0.214
0.255
50
0.482
??
0.510
0.524
100
1.119
??
1.225
1.316
Percentile rank
A percentile rank is the percentage of scores that fall at or below a given
score.
• 10 marks
• 2,5,6,3, 6,8,10,1, 4,6,7,2
• 1,2,2,3, 4,5,6,6, 6,7,8,10
• 10 marks
• 7,10,10,10 9,8,9,9, 10,9,10,7
• 7, 7,8,9,9,9,9,
10,10,10,10,10
Practical problems
1. Find the mean, median & mood of each of the following sets of blood
pressure reading
145, 146, 148, 146, 145, 147, 144, 144, 138, 142, 140, 152, 160, 158,
148, 148.
2. Calculate the appropriate average for prolactin levels (ng/L) obtained
during a clinical trial involving 10 subjects, 9.4, 7.0, 7.6, 6.7, 6.3, 8.6,
6.8, 10.6, 8.9, 9.4

Descriptive statistics: Mean, Mode, Median

  • 1.
  • 2.
    Arithmetic Mean Median Mode Approach ofdescribing numerical data Variance Standard Deviation Coefficient of Variation Range Interquartile Range Central Tendency Variation
  • 3.
    Central Tendency • Numericalcentral value of a set observation is called measures of central tendency. • It is a central or typical value for a probability distribution. • It may also be called a center or location of the distribution. • Measures of central tendency: Mean Median Mode
  • 4.
    Measures of CentralTendency Central Tendency Mean Median Mode n x x n 1 i i    Midpoint of ranked values Most frequently observed value Arithmetic average
  • 5.
    Mean Mean is asingle and typical value used to represent a set of data. It also referred as the average. Objective: • To get a single value that represents the entire data • To facilitate the comparison between groups of data of similar nature Classification of mean • Arithmetic Mean (AM) • Geometric Mean (GM) • Harmonic Mean (HM)
  • 6.
    Arithmetic Mean The arithmeticmean (mean) is the most common measure of central tendency • For a population of N values: • For a sample of size n: Sample size n x x x n x x n 2 1 n 1 i i         Observed values N x x x N x μ N 2 1 N 1 i i         Population size Population values
  • 7.
    Arithmetic Mean • Themost common measure of central tendency • Mean = sum of values divided by the number of values • Affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 0 1 2 3 4 5 6 7 8 9 10 Mean = 4 3 5 15 5 5 4 3 2 1       4 5 20 5 10 4 3 2 1      
  • 8.
    Properties of mean •It takes all observations into account reflecting the value • It is used in other statistical tools • It is most reliable for drawing inferences • It is the easiest to use in advanced statistics technique
  • 9.
    Limitations of mean •Highly affected by extreme values, even just one extreme value • Sometimes negative and zero values can not be counted
  • 10.
    Median In an orderedlist, the median is the “middle” number (50% above, 50% below) • It is not affected by extreme values 0 1 2 3 4 5 6 7 8 9 10 Median = 3 0 1 2 3 4 5 6 7 8 9 10 Median = 3
  • 11.
    Median • It isthe middle value of a set of numbers which have been ordered by magnitude • The median is also the number that is halfway into the set. • The location of the median: • If the number of values is odd, the median is the middle number • If the number of values is even, the median is the average of the two middle numbers
  • 12.
    Median For grouped frequency Where, L= lower limit of the median class N = total number of observations F = cumulative frequency of preceding median class fm = frequency of the median class C = class interval of the median class
  • 13.
    SBP Range (mmHg) Frequency Cumulative Frequency 101-105 22 106-110 3 5 111-115 5 10 116-120 8 18 121-125 6 24 126-130 4 28 131-135 2 30 136-140 1 31 L = 121 N= 31 F= 18 fm=6 C=5 118.92 L = 116 N= 31 F= 10 fm=8 C=5 119.43
  • 14.
    Properties of median •Not affected by extreme value • Perfect statistical example for skewed distribution • Can be calculated from frequency distribution • It is not influenced by the position of items Limitations of median • It is not based on all observations • Compared to mean it is less reliable • Not suitable for further analysis
  • 15.
    Mode • The modeis the value of a data set that occurs most frequently. • It is the commonly observed value which occurs maximum number times
  • 16.
    Mode • A measureof central tendency • Value that occurs most often • Not affected by extreme values • Used for either numerical or categorical data • There may be no mode • There may be several modes 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Mode = 9 0 1 2 3 4 5 6 No Mode
  • 17.
  • 18.
    SBP Range (mmHg)Frequency 101-105 2 106-110 3 111-115 5 116-120 8 121-125 6 126-130 4 131-135 2 136-140 1 l = 116 f1 = 8 f0 = 5 f2 = 6 h = 5 119
  • 19.
    SBP Range (mmHg)Frequency 101-105 2 106-110 3 111-115 5 116-120 8 121-125 6 126-130 4 131-135 8 136-140 1 L = 131 f1 = 8 f0 = 4 f2 = 1 h = 5 132.82
  • 20.
    Properties of mode •Not affected by extreme value • For large number of data, mode happens to be meaningful as an average • Can be calculated from frequency distribution • Do not affected by small and large numbers • It is not based on all observations • Compared to mean it is less reliable • Not suitable for further advanced analysis Limitations of mode
  • 21.
    • Mean isgenerally used, unless extreme values (outliers) exist • Then median is often used, since the median is not sensitive to extreme values. Which one is the “best” measurement?
  • 22.
    Dispersions/variability • Dispersions arethe measures of extent of deviation of individual from the central value (average). • It determines how much representative the central value is. • It may be small if the values are closely bunched about their mean and it is large when the values are scattered widely about their mean.
  • 23.
    • To determinethe reliability of an average • For controlling the variability • For comparing two or more series of data regarding their variability • For facilitating the use of other statistical measures Objectives of Dispersions Measurement
  • 24.
    • It shouldbe rigidly defined • It should be easy to calculate and easy to understand • It should be based on all observations • It should be suitable for further mathematical treatment • It should be affected as little as possible to the sampling fluctuation Characteristics of a good measure of Dispersions
  • 25.
    Shape of aDistribution • Describes how data are distributed • Measures of shape: Symmetric or skewed Mean = Median Mean < Median Median < Mean Right-Skewed Left-Skewed Symmetric
  • 26.
    Same center, different variation Measuresof Variability Variation Variance Standard Deviation Coefficient of Variation Range Interquartile Range Measures of variation give information on the spread or variability of the data values
  • 27.
    Range • Simplest measureof variation • Difference between the largest and the smallest observations: Range = Xlargest – Xsmallest 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Range = 14 - 1 = 13 Example:
  • 28.
    • Ignores theway in which data are distributed • Sensitive to outliers 7 8 9 10 11 12 Range = 12 - 7 = 5 7 8 9 10 11 12 Range = 12 - 7 = 5 Characteristics of the Range 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 Range = 5 - 1 = 4 Range = 120 - 1 = 119
  • 29.
    Quartiles Quartiles split theranked data into 4 segments with an equal number of values per segment 25% 25% 25% 25%  The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger  Q2 is the same as the median (50% are smaller, 50% are larger)  Only 25% of the observations are greater than the third quartile Q1 Q2 Q3
  • 30.
    Quartile Formulas Find aquartile by determining the value in the appropriate position in the ranked data, where First quartile position: Q1 = 0.25(n+1) Second quartile position: Q2 = 0.50(n+1) (median) Third quartile position: Q3 = 0.75(n+1) where n is the number of observed values
  • 31.
    (n = 9) Q1= is in the 0.25(9+1) = 2.5 position of the ranked data so use the value half-way between the 2nd and 3rd values, Q1 = 12.5 Quartiles Sample Ranked Data: 11 12 13 16 16 17 18 21 22  Example: Find the first, second and third quartiles  second and third quartiles ??
  • 32.
    Interquartile Range Example: Median (Q2) X maximum X minimum Q1Q3 25% 25% 25% 25% 12 30 45 57 70 Interquartile range = 57 – 30 = 27 Interquartile range = 3rd quartile – 1st quartile IQR = Q3 – Q1
  • 33.
    Variance Variance measures howfar each number in the set is from the mean. It is calculated by • taking the differences between each number in the set and the mean • squaring the differences • dividing the sum of the squares by the number of values in the set.
  • 34.
    Standard deviation • Itis a measure of how spread-out numbers are. Its symbol is σ. It is the square root of the deviations of individual items from their arithmetic mean.
  • 35.
    • 8, 9,11, 12 Ava. = 10 • 18, 19, 2, 1 Ava. = 10 • Calculate standard deviation, consider a sample of IQ scores given by 96, 104, 126, 134 and 140.
  • 36.
    Examples Calculate standard deviation,consider a sample of IQ scores given by 96, 104, 126, 134 and 140. The mean of this data is (96+104+126+134+140)/5 =120. σ = √[ ∑(x-120)^2 / 5 ] The deviation from the mean is given by 96-120 = -24, 104-120 = -16, 126-120 = 6, 134-120 = 14, 140-120 = 20. σ = √[ ((-24)^2+(-16)^2+(6)^2+(14)^2+(20)^2) / 5 ] σ = √[ (1464) / 5 ] = ± 17.11
  • 37.
    Comparing Standard Deviations Mean= 15.5 s = 3.338 11 12 13 14 15 16 17 18 19 20 21 11 12 13 14 15 16 17 18 19 20 21 Data B Data A Mean = 15.5 s = 0.926 11 12 13 14 15 16 17 18 19 20 21 Mean = 15.5 s = 4.570 Data C
  • 38.
    Measuring variation Small standarddeviation Large standard deviation
  • 39.
    Standard Deviation • Mostcommonly used measure of variation • Each value in the data set is used in the calculation • Shows variation from the mean • Has the same units as the original data • It cannot be negative. • A standard deviation close to 0 indicates that the data points tend to be close to the mean. • The further the data points are from the mean, the greater the standard deviation
  • 40.
    Important Note • Standarddeviation of sample data of a population • Variance of sample data of a population
  • 41.
    Coefficient of Variation •Measures relative variation • Always in percentage (%) • Shows variation relative to mean • Can be used to compare two or more sets of data measured in different units 100% x s CV          
  • 42.
    Comparing Coefficient ofVariation • Stock A: • Average price last year = $50 • Standard deviation = $5 • Stock B: • Average price last year = $100 • Standard deviation = $5 Both stocks have the same standard deviation, but stock B is less variable relative to its price 10% 100% $50 $5 100% x s CVA              5% 100% $100 $5 100% x s CVB             
  • 43.
    Measure of Locationsof Data Percentiles • Percentile is a measure of position in a set of observations. • It is a number where a certain percentage of scores fall below that percentile. • It is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 29th percentile is the value of a variable such that 29% of the observations are less than the value and 71% of the observations are greater.
  • 44.
    • Suppose, yougot 80th percentile on GRE analytical score, that means 80% of GRE test taker have marks less than you and 20% of GRE test taker have more marks than you.
  • 45.
    • Standard error(SE) is the standard deviation of the sampling distribution of a statistic. • If the statistic is the sample mean, it is called the standard error of the mean (SEM) Standard Error of the sample mean Standard error of the mean is a measure of the dispersion of sample means around population mean
  • 46.
    Find out thestandard error among the following data Exercise Drug Concentration (μg/ml) Absorption Mean ± SE 25 0.286 ?? 0.214 0.255 50 0.482 ?? 0.510 0.524 100 1.119 ?? 1.225 1.316
  • 47.
    Percentile rank A percentilerank is the percentage of scores that fall at or below a given score.
  • 48.
    • 10 marks •2,5,6,3, 6,8,10,1, 4,6,7,2 • 1,2,2,3, 4,5,6,6, 6,7,8,10 • 10 marks • 7,10,10,10 9,8,9,9, 10,9,10,7 • 7, 7,8,9,9,9,9, 10,10,10,10,10
  • 49.
    Practical problems 1. Findthe mean, median & mood of each of the following sets of blood pressure reading 145, 146, 148, 146, 145, 147, 144, 144, 138, 142, 140, 152, 160, 158, 148, 148. 2. Calculate the appropriate average for prolactin levels (ng/L) obtained during a clinical trial involving 10 subjects, 9.4, 7.0, 7.6, 6.7, 6.3, 8.6, 6.8, 10.6, 8.9, 9.4