Descriptive statistics:
Numerical summary
measures
Tufa Kolola
(MPH, Ass’t. Prof.)
1
Contents
§ Introduction
§ Measures of central tendency
§ Measures of relative standing
§ Shape of distribution
§ Measures of dispersion
2
Learning
objectives
By the end of this session you will be able
to:
§ Compute and interpret the mean, median, and
mode for a set of data
§ Construct and interpret a box-and-whisker plot
§ Compute and interpret the range, variance,
standard deviation, and coefficient of variation for a
set of data
§ Use numerical measures along with graphs,
charts, and tables to describe data
3
Numerical
summary measures
Numerical summary measure: a descriptive
measure that summarizes a data set with a
single number
§ Unlike frequency distributions, these measures indicate the
average (middle) value and the spread of
the values
4
Summary Measures
Numerical summary measures
§ Measures of central tendency (Location)
– Mean
– Median
– Mode
– Weighted Mean
§ Measures of Relative Standing
– Percentiles
– Quartiles
§ Measures of dispersion (Variation)
– Range
– Interquartile Range
– Variance
– Standard Deviation
– Coefficient of Variation
5
Measures of central
tendency (MCT)
§ On the scale of values of a variable, there is usually a
point around which the largest number of items tend to
cluster
§ Since this point usually lies near the centre of the distribution,
the tendency of statistical data to concentrate
around a certain value is called “central tendency”
§ The various methods of determining the point about
which the observations tend to concentrate are called
measures of central tendency (MCT)
6
Characteristics of
good MCT
1. It should be based on all the observations
2. It should not be affected by the extreme values
3. It should be as close to the minimum & maximum
number of values as possible
4. It should have a definite value
5. It should not be subjected to complicated and
tedious calculations
6. It should be capable of further algebraic treatment
7. It should be stable with regard to sampling
7
Measures of central
tendency (MCT)
Center and Location
§ Mean
§ Median
§ Mode
§ Weighted Mean
– Sample: $\bar{X}_W = \dfrac{\sum w_i x_i}{\sum w_i}$
– Population: $\mu_W = \dfrac{\sum w_i x_i}{\sum w_i}$
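The weighted mean formula can be sketched in a few lines of Python. The function name `weighted_mean` and the scores and credit-hour weights are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: three course scores weighted by credit hours.
scores = [80, 90, 70]
credit_hours = [3, 4, 2]
course_average = weighted_mean(scores, credit_hours)
print(round(course_average, 2))
```

With equal weights the function reduces to the ordinary arithmetic mean.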
8
Arithmetic Mean:
ungrouped data
§ The mean is the average of a data set: the sum of
all the observations divided by the total number of
observations
– Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
– Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N} = \dfrac{x_1 + x_2 + \cdots + x_N}{N}$
n = Sample Size
N = Population Size
9
Arithmetic Mean
§ The most common measure of central tendency
§ Affected by extreme values (outliers)
Example (values plotted on a 0 to 10 scale):
– Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
– Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
10
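As a quick check of the outlier effect, the two five-value data sets from the Arithmetic Mean slide can be run through Python's standard `statistics.mean`. The variable names are illustrative only.

```python
from statistics import mean

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, largest value replaced by 10

mean_a = mean(without_outlier)
mean_b = mean(with_outlier)

# Changing a single extreme value shifts the mean from 3 to 4.
print(mean_a, mean_b)
```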
Grouped data
§ In calculating the mean from grouped data, we
assume that all values falling into a particular
class interval are located at the midpoint of the
interval. It is calculated as follows:
Sample mean: $\bar{x} = \dfrac{\sum m_i f_i}{\sum f_i}$
§ Where:
$m_i$ = the midpoint of the ith class interval
$f_i$ = the frequency of the ith class interval
Example. Compute the mean age of 169 subjects
from the grouped data
12
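The age table for the 169 subjects is not reproduced in this text, so the sketch below applies the grouped-mean formula to a made-up frequency table. The class boundaries, the frequencies, and the function name `grouped_mean` are all assumptions for illustration.

```python
def grouped_mean(class_boundaries, frequencies):
    """Approximate the mean by placing every value at its class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_boundaries]
    total = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / total


# Hypothetical table (not the slide's data): five classes of width 10.
boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5)]
counts = [5, 10, 20, 10, 5]

approx_mean = grouped_mean(boundaries, counts)
print(approx_mean)
```

Because the made-up frequencies are symmetric around the middle class, the result is that class's midpoint, 34.5.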
Properties of
Arithmetic Mean
§ For a given set of data there is one and only one
arithmetic mean (uniqueness)
§ Easy to calculate and understand (simple)
§ Influenced by each and every value in the data set
§ Greatly affected by the extreme values
§ Poor measure of location if the underlying
distribution is not normal (or not Gaussian)
§ In case of grouped data, if any class interval is
open, the arithmetic mean cannot be calculated
13
Median: Ungrouped
data
§ In an ordered array, the median is the “middle”
number
– If n or N is odd, the median is the middle number
– If n or N is even, the median is the average of the
two middle numbers
[Figure: two dot plots on a 0 to 10 scale; Median = 3 in both]
§ The median is the value of the middle term in a
data set that has been ranked in increasing order
14
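The odd and even cases can be sketched as follows. The function name is illustrative; the first two data sets are the ones used on the Arithmetic Mean slide, and the third is a made-up even-sized example.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))   # odd n: the 3rd ordered value
print(median([1, 2, 3, 4, 10]))  # the outlier 10 does not move the median
print(median([1, 2, 3, 4]))      # even n: average of 2 and 3
```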
Grouped data
§ In calculating the median from grouped data, we
assume that the values within a class-interval are
evenly distributed through the interval
§ The first step is to locate the class interval in
which the median is located, using the following
procedure
§ Find n/2 and identify the first class interval whose
cumulative frequency is at least n/2; this is the median class
§ Then, use the following formula:
$\tilde{x} = L_m + \left(\dfrac{\frac{n}{2} - F_c}{f_m}\right) W$
where,
Lm = lower true class boundary of the interval containing the
median
Fc = cumulative frequency of the interval just before (listed above) the
median class interval
fm = frequency of the interval containing the median
W = class interval width
n = total number of observations
17
Example: Compute the median age of 169
subjects from the grouped data
§ n/2 = 169/2 = 84.5
§ The 84.5th value falls in the 3rd class interval
§ Lower boundary = 29.5, upper boundary = 39.5, so W = 10
§ Frequency of the class (fm) = 47
§ (n/2 – Fc) = 84.5 – 70 = 14.5
Median = 29.5 + (14.5/47)(10) ≈ 32.59 ≈ 33
19
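The worked example can be reproduced with a small function. This is a sketch under the evenly-distributed-within-class assumption stated above; the function and parameter names are illustrative, while the numbers are the ones quoted in the example (lower boundary 29.5, cumulative frequency 70, class frequency 47, width 10, n = 169).

```python
def grouped_median(lower_boundary, cum_freq_before, class_freq, width, n):
    """L_m + ((n/2 - F_c) / f_m) * W for the median class."""
    return lower_boundary + ((n / 2 - cum_freq_before) / class_freq) * width


median_age = grouped_median(
    lower_boundary=29.5,   # L_m
    cum_freq_before=70,    # F_c
    class_freq=47,         # f_m
    width=10,              # W
    n=169,
)
print(round(median_age, 2))
```

The unrounded result is about 32.585, which rounds to roughly 33 years.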
Properties of
Median
§ There is only one median for a given set of data
(uniqueness)
§ The median is easy to calculate
§ Median is a positional average and hence it is
insensitive to very large or very small values
§ Median can be calculated even in the case of open-ended
intervals if the sample size is known
§ It is determined mainly by the middle points and
is less sensitive to the remaining data points
(weakness)
20
Mode: Ungrouped
data
§ Value that occurs most often
§ Not affected by extreme values
§ Used for either numerical or categorical data
§ There may be no mode
§ There may be several modes
[Figure: dot plot on a 0 to 14 scale; Mode = 5]
[Figure: dot plot on a 0 to 6 scale; No Mode]
21
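A minimal sketch of finding the mode, covering the "no mode" and "several modes" cases listed above. It assumes the convention that there is no mode when every distinct value occurs equally often; the function name and all sample data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value stands out."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if all distinct values tie, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 5, 5, 7]))         # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))   # no mode
print(modes([1, 1, 2, 2, 3]))         # two modes
print(modes(["A", "B", "A", "O"]))    # works for categorical data too
```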
Mode: Grouped
data
§ To find the mode of grouped data, we usually
refer to the modal class, where the modal class
is the class interval with the highest frequency
§ If a single value for the mode of grouped data
must be specified, it is taken as the mid-point of
the modal class interval
22
Properties of
Mode
§ It is not affected by extreme values
§ It can be calculated for distributions with open end
classes
§ Often its value is not unique
§ The main drawback of mode is that often it does
not exist
23
Measures of
Relative Standing
§ Where does one particular measurement stand
in relation to the other measurements in the data
set?
§ Descriptive measures that locate the relative
position of an observation in relation to the other
observations are called measures of relative
standing
24
Measures of
Relative Standing
Percentiles
§ The pth percentile in a data
array is a number such that
p% of the observations of
the data set fall below it and
(100-p)% of the observations
fall above it (where 0 ≤ p ≤
100)
Quartiles
§ 1st quartile = 25th percentile
§ 2nd quartile = 50th percentile
= median
§ 3rd quartile = 75th percentile
25
Percentiles
§ The pth percentile in an ordered array of n values
is the value in the ith position, where
$i = \dfrac{p}{100}(n + 1)$
§ Example: The 60th percentile in an ordered array
of 19 values is the value in the 12th position:
$i = \dfrac{p}{100}(n + 1) = \dfrac{60}{100}(19 + 1) = 12$
26
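The position rule can be checked with a one-line function. The name `percentile_position` is illustrative; the computation is simply p(n + 1)/100.

```python
def percentile_position(p, n):
    """1-based position of the p-th percentile in an ordered array of n values."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    return p * (n + 1) / 100


position = percentile_position(60, 19)
print(position)  # the 12th ordered value is the 60th percentile
```

When the position is not a whole number, the value is usually found by interpolating between the two neighbouring ordered values, as in the quartile examples.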
Percentiles
§ Commonly used percentiles
– First (lower) decile = 10th percentile
– First (lower) quartile, Q1 = 25th percentile
– Second (middle) quartile, Q2 = 50th percentile
– Third quartile, Q3 = 75th percentile
– Ninth (upper) decile = 90th percentile
27
Quartiles
§ Quartiles split ordered data into 4 equal
portions
§ Q1 and Q3 are measures of non-central location
§ Q2 = the median
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
28
Quartiles
§ Each Quartile has position and value
– With the data in an ordered array, the position of Qi
is: $\text{Position of } Q_i = \dfrac{i(n + 1)}{4}$
– The value of Qi is the value associated with that
position in the ordered array
§ Example:
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of $Q_1 = \dfrac{1(9 + 1)}{4} = 2.5$, so $Q_1 = \dfrac{12 + 13}{2} = 12.5$
29
Example
The prices ($) of 18 brands of walking shoes:
40 60 65 65 67 68 68 70 70
70 70 70 70 74 75 75 90 95
– Position of Q1 = 0.25(18 + 1) = 4.75, so Q1 is 3/4 of the way between the 4th and 5th
ordered measurements, or
Q1 = 65 + .75(67 - 65) = 66.5.
30
Example
The prices ($) of 18 brands of walking shoes:
40 60 65 65 67 68 68 70 70
70 70 70 70 74 75 75 90 95
– Position of Q3 = 0.75(18 + 1) = 14.25, so Q3 is 1/4 of the way between the 14th and 15th
ordered measurements, or
Q3 = 74 + .25(75 - 74) = 74.25
– And IQR = Q3 – Q1 = 74.25 – 66.5 = 7.75
31
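The quartile position rule with interpolation can be sketched as below and checked against the walking-shoe prices. The function name `quartile` is illustrative; the data are the 18 prices from the example.

```python
def quartile(ordered, i):
    """Value of the i-th quartile (i = 1, 2, 3) at position i(n + 1)/4."""
    n = len(ordered)
    position = i * (n + 1) / 4     # 1-based, may be fractional
    lower = int(position)          # rank of the lower neighbour
    fraction = position - lower    # how far to move towards the next value
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


prices = [40, 60, 65, 65, 67, 68, 68, 70, 70,
          70, 70, 70, 70, 74, 75, 75, 90, 95]

q1 = quartile(prices, 1)
q3 = quartile(prices, 3)
iqr = q3 - q1
print(q1, q3, iqr)
```

The same function gives Q1 = 12.5 for the nine-value array 11 12 13 16 16 17 18 21 22.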
Shape of a
Distribution
§ Describes how data are distributed
§ Measures of Shape
- Symmetric or skewed (asymmetric)
– Left-Skewed (longer tail extends to left): Mean < Median < Mode
– Symmetric: Mean = Median = Mode
– Right-Skewed (longer tail extends to right): Mode < Median < Mean
32
The Five Number
Summary
§ One way to give a nice profile of a data set is the
“five-number summary,” which consists of:
1. The smallest measurement
2. The first quartile, Q1
3. The median, Q2
4. The third quartile, Q3
5. The largest measurement
§ Displayed visually using a box-and-whiskers plot
33
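A five-number summary can be sketched with Python's standard `statistics.quantiles`. Its default `method="exclusive"` appears to use the same (n + 1)-based positions as the quartile slides, but that is worth confirming for your Python version (3.8 or later is assumed). The data are the walking-shoe prices from the quartile example, and the function name is illustrative.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)  # default method="exclusive"
    return ordered[0], q1, q2, q3, ordered[-1]


prices = [40, 60, 65, 65, 67, 68, 68, 70, 70,
          70, 70, 70, 70, 74, 75, 75, 90, 95]

summary = five_number_summary(prices)
print(summary)
```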
The Box-and-
Whisker plot
§ 5-number summary
– Median, Q1, Q3, Xsmallest, Xlargest
§ Box Plot
– Graphical display of data using 5-number
summary
[Figure: box-and-whisker plot on an axis from 4 to 12, marking Minimum, Q1, Median (Q2), Q3, and Maximum]
34
Distribution Shape &
Box-and-Whisker Plot
[Figure: box-and-whisker plots for left-skewed, symmetric, and right-skewed distributions, each marking Q1, Q2, and Q3]
§ Skewed distributions usually have a long whisker in the
direction of the skewness
35
Shape of a Distribution
and Quartiles
§ If the distribution is symmetric, then the upper and
lower quartiles should be approximately equally
spaced from the median
§ If the upper quartile is farther from the median than
the lower quartile, then the distribution is positively
skewed
§ If the lower quartile is farther from the median than
the upper quartile, then the distribution is negatively
skewed
36
Outlier
§ A value located at a distance of more than
1.5(IQR) from the box
– Lower fence: Q1 – 1.5 IQR
– Upper fence: Q3 + 1.5 IQR
§ Measurements beyond the upper or lower fence
are outliers and are marked with * on the box plot
37
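The fence rule can be sketched as follows, again using the walking-shoe prices from the quartile example. `statistics.quantiles` with its default "exclusive" method is assumed to reproduce the slide's Q1 = 66.5 and Q3 = 74.25; the function name `find_outliers` is illustrative.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = quantiles(sorted(values), n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


prices = [40, 60, 65, 65, 67, 68, 68, 70, 70,
          70, 70, 70, 70, 74, 75, 75, 90, 95]

lower_fence, upper_fence, outliers = find_outliers(prices)
print(lower_fence, upper_fence, outliers)
```

For these prices the fences work out to 54.875 and 85.875, so 40, 90, and 95 would be flagged.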
Measures of
Variation
Variation
§ Range
§ Interquartile Range
§ Variance
– Population Variance
– Sample Variance
§ Standard Deviation
– Population Standard Deviation
– Sample Standard Deviation
§ Coefficient of Variation
38
Measures of
Variation
§ Measures that quantify the variation or dispersion
of a set of data from its central location
§ The amount of dispersion is small when the values are
close together and large when the values are far
apart from each other
§ If all the values are the same, there is no dispersion
§ How much are the observations spread out
around the mean value?
39
Measures of
Variation
§ Measures of variation give information on the
spread or variability of the data values
[Figure: distributions with the same center but different variation]
40
Measures of
Variation
§ The more spread out (dispersed) the data, the larger
the measures of variation
§ The more concentrated the data, the smaller the
measures of variation
§ If all observations are equal, the measures of variation
equal zero
§ All measures of variation are non-negative
41
Range
§ Simplest measure of variation
§ Difference between the largest and the smallest
observations:
Range = xmaximum – xminimum
Example: for data plotted on a 0 to 14 scale with smallest value 1 and largest value 14,
Range = 14 - 1 = 13
42
Disadvantages of the
Range
§ Ignores the way in which data are distributed
[Figure: two differently distributed dot plots on a 7 to 12 scale; Range = 12 - 7 = 5 for both]
§ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
43
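The sensitivity of the range to a single outlier can be checked directly with the two 25-value data sets from the slide. The function name `value_range` is illustrative.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, then the last two values.
typical = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_typical = value_range(typical)
range_outlier = value_range(with_outlier)
print(range_typical, range_outlier)
```

Only one of the 25 values differs, yet the range jumps from 4 to 119.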
Interquartile
Range
§ We can eliminate some outlier problems by using
the interquartile range
§ Eliminate some high-and low-valued observations
and calculate the range from the remaining values
§ Also known as midspread
– Spread in the middle 50%
§ Interquartile range = 3rd quartile – 1st quartile
44
Interquartile
Range
Example:
Xminimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Xmaximum = 70 (25% of the values fall in each of the four segments)
Interquartile range
= 57 – 30 = 27
§ Not affected by extreme values
45
Variance
§ Shows variation about the mean
§ Average of squared deviations of values from the
mean
– Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
– Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
46
Standard
Deviation
§ Most commonly used measure of variation
§ Shows variation about the mean
§ Has the same units as the original data
- Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
47
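The variance and standard deviation formulas can be sketched directly from their definitions. The eight values below are hypothetical, picked so that the population figures come out as whole numbers; the function names are illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

s2 = sample_variance(data)
sigma2 = population_variance(data)
s = math.sqrt(s2)          # sample standard deviation
sigma = math.sqrt(sigma2)  # population standard deviation
print(s2, s, sigma2, sigma)
```

Python's standard `statistics` module offers `variance`, `stdev`, `pvariance`, and `pstdev` for the same four quantities.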
Variance vs.
Standard Deviation
§ Both measure the average “scatter” about the mean
§ Variance computations produce “squared” units which
makes interpretation more difficult
– For example, a variance expressed in kg² is hard to interpret.
§ Since it is the square root of the Variance, the
Standard Deviation is expressed in the same units as
the original data
§ Therefore, the Standard Deviation is the most
commonly used measure of variation
48
Comparing Standard
Deviations
[Figure: dot plots of three data sets on an 11 to 21 scale]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
49
Coefficient of
Variation
§ Measures relative variation
§ Always expressed as a percentage (%)
§ Shows variation relative to the mean
§ Is used to compare two or more sets of data
measured in different units
Population: $CV = \left(\dfrac{\sigma}{\mu}\right) \times 100\%$
Sample: $CV = \left(\dfrac{s}{\bar{X}}\right) \times 100\%$
50
Compare the Coefficient of
Variation between Data A, Data B,
and Data C
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
§ Which data set is more spread out around the mean?
51
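One way to answer the comparison is to compute the sample CV from the summary figures quoted for Data A, B, and C (mean 15.5 in each case). The function and variable names are illustrative.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean_value * 100


# (mean, sample standard deviation) as quoted on the slide.
summaries = {"A": (15.5, 3.338), "B": (15.5, 0.9258), "C": (15.5, 4.57)}

cv = {name: coefficient_of_variation(s, m) for name, (m, s) in summaries.items()}
most_variable = max(cv, key=cv.get)

for name, value in cv.items():
    print(f"Data {name}: CV = {value:.2f}%")
print("Most spread out relative to the mean:", most_variable)
```

Because the three means are equal, the CV ranking here matches the ranking by standard deviation; the CV becomes more informative when means or units differ.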
The Empirical Rule
§ If the data distribution is bell-shaped, then the
interval:
§ $\mu \pm 1\sigma$ contains about 68% of the values in
the population
§ $\mu \pm 2\sigma$ contains about 95% of the values in
the population
§ $\mu \pm 3\sigma$ contains about 99.7% of the values
in the population
52
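The three percentages can be checked against an exactly normal distribution using `statistics.NormalDist` from the Python standard library (Python 3.8 or later is assumed). Real bell-shaped data will only match these figures approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} sigma: {share:.1%}")
```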
Summary
§ Quantitative data are usually described by a
measure of central tendency and a measure of
variation
§ In describing data, it is important to select the
measure of central tendency that most accurately
represents the data
§ To do so, it is important to know if data is
symmetrical or skewed
54
55
Thank you

More Related Content

Similar to 3. Descriptive statistics.pdf

3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplotsLong Beach City College
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptxVanmala Buchke
 
Chapter 3 Ken Black 2.ppt
Chapter 3 Ken Black 2.pptChapter 3 Ken Black 2.ppt
Chapter 3 Ken Black 2.pptNurinaSWGotami
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsBurak Mızrak
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxAnusuya123
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersionAbhinav yadav
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptxCalvinAdorDionisio
 
polar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhh
polar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhhpolar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhh
polar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhhNathanAndreiBoongali
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersionDrZahid Khan
 
ap_stat_1.3.ppt
ap_stat_1.3.pptap_stat_1.3.ppt
ap_stat_1.3.pptfghgjd
 
Analysis of students’ performance
Analysis of students’ performanceAnalysis of students’ performance
Analysis of students’ performanceGautam Kumar
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersionGilbert Joseph Abueg
 
measure of dispersion
measure of dispersion measure of dispersion
measure of dispersion som allul
 

Similar to 3. Descriptive statistics.pdf (20)

3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots3.3 Measures of relative standing and boxplots
3.3 Measures of relative standing and boxplots
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptx
 
Chapter 3 Ken Black 2.ppt
Chapter 3 Ken Black 2.pptChapter 3 Ken Black 2.ppt
Chapter 3 Ken Black 2.ppt
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptx
 
Mod mean quartile
Mod mean quartileMod mean quartile
Mod mean quartile
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersion
 
Statistics 3, 4
Statistics 3, 4Statistics 3, 4
Statistics 3, 4
 
Central Tendency.pptx
Central Tendency.pptxCentral Tendency.pptx
Central Tendency.pptx
 
Dscriptive statistics
Dscriptive statisticsDscriptive statistics
Dscriptive statistics
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptx
 
polar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhh
polar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhhpolar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhh
polar pojhjgfnbhggnbh hnhghgnhbhnhbjnhhhhhh
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
ap_stat_1.3.ppt
ap_stat_1.3.pptap_stat_1.3.ppt
ap_stat_1.3.ppt
 
Descriptive
DescriptiveDescriptive
Descriptive
 
Analysis of students’ performance
Analysis of students’ performanceAnalysis of students’ performance
Analysis of students’ performance
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
 
measure of dispersion
measure of dispersion measure of dispersion
measure of dispersion
 

More from YomifDeksisaHerpa

More from YomifDeksisaHerpa (6)

yom seminar TWO.pptx
yom seminar TWO.pptxyom seminar TWO.pptx
yom seminar TWO.pptx
 
1Basic biostatistics.pdf
1Basic biostatistics.pdf1Basic biostatistics.pdf
1Basic biostatistics.pdf
 
2Analysis of Variance.pdf
2Analysis of Variance.pdf2Analysis of Variance.pdf
2Analysis of Variance.pdf
 
2. Descriptive Statistics.pdf
2. Descriptive Statistics.pdf2. Descriptive Statistics.pdf
2. Descriptive Statistics.pdf
 
Delivering effective presentations.ppt
Delivering effective presentations.pptDelivering effective presentations.ppt
Delivering effective presentations.ppt
 
ethical dillema.pptx
ethical dillema.pptxethical dillema.pptx
ethical dillema.pptx
 

Recently uploaded

Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...Miss joya
 
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment BookingHousewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Bookingnarwatsonia7
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.MiadAlsulami
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...narwatsonia7
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipurparulsinha
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknownarwatsonia7
 
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000aliya bhat
 
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safenarwatsonia7
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiNehru place Escorts
 
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...Miss joya
 
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...Miss joya
 
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbersBook Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbersnarwatsonia7
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowSonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowRiya Pathan
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowNehru place Escorts
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Modelssonalikaur4
 

Recently uploaded (20)

Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
 
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment BookingHousewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
 
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
 
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
 
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
 
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
 
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbersBook Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
 
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCREscort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
 
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowSonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
 
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Servicesauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
 

3. Descriptive statistics.pdf

  • 2. Contents § Introduction § Measures of central tendency § Measures of relative standing § Shape of distribution § Measures of dispersion 2
  • 3. Learning objectives After the end of this session you will be able to: § Compute and interpret the mean, median, and mode for a set of data § Construct and interpret a box and whiskers plot § Compute and interpret the range, variance, standard deviation coefficient of variation for a set of data § Use numerical measures along with graphs, charts, and tables to describe data 3
  • 4. Numerical summary measures Numerical summary measures : A descriptive measure which summarize the data set by a single number § Unlike frequency distributions, indicate the average value or (the middle) and the spread of the values 4
  • 5. Summary Measures § Numerical summary measures fall into three groups: – Measures of central tendency (location): mean, median, mode, weighted mean – Measures of relative standing: percentiles, quartiles – Measures of dispersion (variation): range, interquartile range, variance, standard deviation, coefficient of variation
  • 6. Measures of central tendency (MCT) § On the scale of values of a variable, there is a certain point at which the largest number of items tend to cluster § Since this point is usually in the centre of the distribution, the tendency of statistical data to concentrate at a certain value is called “central tendency” § The various methods of determining the point about which the observations tend to concentrate are called MCT
  • 7. Characteristics of a good MCT 1. It should be based on all the observations 2. It should not be affected by extreme values 3. It should be as close to the minimum & maximum number of values as possible 4. It should have a definite value 5. It should not require complicated and tedious calculations 6. It should be capable of further algebraic treatment 7. It should be stable with regard to sampling
  • 8. Measures of central tendency (MCT) § Center and location: mean, median, mode, weighted mean § Weighted mean: $\bar{X}_W = \frac{\sum_i w_i x_i}{\sum_i w_i}$, where $w_i$ is the weight attached to observation $x_i$
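The weighted mean on this slide can be sketched in a few lines of Python. The function name `weighted_mean` and the scores and credit hours below are hypothetical, chosen only to illustrate the formula; they do not come from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: three exam scores weighted by credit hours.
scores = [70, 80, 90]
credits = [2, 3, 5]
wm = weighted_mean(scores, credits)  # (140 + 240 + 450) / 10
print(wm)
```

With equal weights the result reduces to the ordinary arithmetic mean.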
  • 9. Arithmetic Mean: ungrouped data § The mean is the average of a data set: the sum of all the observations divided by the total number of observations – Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ – Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$ § n = sample size, N = population size
  • 10. Arithmetic Mean § The most common measure of central tendency § Affected by extreme values (outliers) § Example: for the values 1, 2, 3, 4, 5 the mean is $\frac{1+2+3+4+5}{5} = \frac{15}{5} = 3$; replacing 5 with 10 gives $\frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$
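The effect of a single extreme value on the mean can be checked directly. This is a minimal sketch using the two data sets from the slide; the helper name `arithmetic_mean` is an assumption made for illustration.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)


original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the largest value replaced by an outlier

mean_original = arithmetic_mean(original)     # 15 / 5
mean_outlier = arithmetic_mean(with_outlier)  # 20 / 5
print(mean_original, mean_outlier)
```

Changing one observation out of five shifts the mean by a full unit.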
  • 11. Grouped data § In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the midpoint of the interval. It is calculated as follows: Sample mean $\bar{x} = \frac{\sum m_i f_i}{\sum f_i}$ § Where: $m_i$ = the midpoint of the ith class interval, $f_i$ = the frequency of the ith class interval
  • 12. Example. Compute the mean age of 169 subjects from the grouped data
  • 13. Properties of Arithmetic Mean § For a given set of data there is one and only one arithmetic mean (uniqueness) § Easy to calculate and understand (simple) § Influenced by each and every value in a data set § Greatly affected by extreme values § Poor measure of location if the underlying distribution is not normal (not Gaussian) § In the case of grouped data, if any class interval is open, the arithmetic mean cannot be calculated
  • 14. Median: Ungrouped data § The median is the value of the middle term in a data set that has been ranked in increasing order § In an ordered array, the median is the “middle” number – If n or N is odd, the median is the middle number – If n or N is even, the median is the average of the two middle numbers § [Figure: two number lines from 0 to 10, each with Median = 3]
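The odd and even cases can be sketched as below. The function name `median_of` is an assumption; the first two data sets reuse the values from the arithmetic mean example, and the four-value set is hypothetical.

```python
def median_of(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median_odd = median_of([1, 2, 3, 4, 5])       # middle of five values
median_outlier = median_of([1, 2, 3, 4, 10])  # same middle value despite the outlier
median_even = median_of([1, 2, 3, 4])         # average of 2 and 3
print(median_odd, median_outlier, median_even)
```

Unlike the mean, the median does not move when the largest value is replaced by an outlier.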
  • 16. Grouped data § In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval § The first step is to locate the class interval that contains the median, using the following procedure § Find n/2 and identify the first class interval whose cumulative frequency reaches or exceeds n/2 § Then, use the following formula
  • 17. Median $\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$ where $L_m$ = lower true class boundary of the interval containing the median, $F_c$ = cumulative frequency of the interval just before the median class interval, $f_m$ = frequency of the interval containing the median, $W$ = class interval width, $n$ = total number of observations
  • 18. Example: Compute the median age of 169 subjects from the grouped data
  • 19. § n/2 = 169/2 = 84.5 § n/2 = 84.5 falls in the 3rd class interval § Lower true class boundary = 29.5, upper true class boundary = 39.5, so W = 10 § Frequency of the class, $f_m$ = 47 § $n/2 - F_c$ = 84.5 - 70 = 14.5 § Median = 29.5 + (14.5/47)(10) = 32.59 ≈ 33
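The grouped-median formula can be checked with the figures quoted on this slide (n = 169, lower boundary 29.5, cumulative frequency 70 before the median class, class frequency 47, width 10). The function and parameter names are assumptions made for this sketch.

```python
def grouped_median(lower_boundary, cum_freq_before, class_freq, width, n):
    """Median = L_m + ((n/2 - F_c) / f_m) * W for grouped data."""
    if class_freq <= 0:
        raise ValueError("the median class must have a positive frequency")
    return lower_boundary + ((n / 2 - cum_freq_before) / class_freq) * width


median_age = grouped_median(
    lower_boundary=29.5,  # L_m, lower true class boundary
    cum_freq_before=70,   # F_c, cumulative frequency before the median class
    class_freq=47,        # f_m
    width=10,             # W
    n=169,
)
print(round(median_age, 2))
```

The result depends on the assumption that values are spread evenly within the median class.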
  • 20. Properties of Median § There is only one median for a given set of data (uniqueness) § The median is easy to calculate § The median is a positional average and hence is insensitive to very large or very small values § The median can be calculated even in the case of open-ended intervals if the sample size is known § It is determined mainly by the middle points and is less sensitive to the remaining data points (weakness)
  • 21. Mode: Ungrouped data § Value that occurs most often § Not affected by extreme values § Used for either numerical or categorical data § There may be no mode § There may be several modes § [Figure: two dot plots, one on an axis from 0 to 14 with Mode = 5, one on an axis from 0 to 6 with no mode]
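The three situations on this slide (one mode, several modes, no mode) can be sketched with a frequency count. The function name `modes` and all data below are hypothetical; the sketch also assumes the convention that data in which every distinct value occurs equally often has no mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value stands out."""
    counts = Counter(values)
    if not counts:
        return []
    # Assumed convention: if several distinct values all occur equally often,
    # report "no mode".
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


single = modes([1, 2, 5, 5, 5, 9])      # one mode
several = modes([1, 1, 2, 2, 3])        # two modes
none = modes([0, 1, 2, 3, 4, 5, 6])     # every value occurs once
categorical = modes(["A", "B", "B"])    # works for categorical data too
print(single, several, none, categorical)
```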
  • 22. Mode: Grouped data § To find the mode of grouped data, we usually refer to the modal class, where the modal class is the class interval with the highest frequency § If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval
  • 23. Properties of Mode § It is not affected by extreme values § It can be calculated for distributions with open end classes § Often its value is not unique § The main drawback of mode is that often it does not exist
  • 24. Measures of Relative Standing § Where does one particular measurement stand in relation to the other measurements in the data set? § Descriptive measures that locate the relative position of an observation in relation to the other observations are called measures of relative standing
  • 25. Measures of Relative Standing § Percentiles and quartiles – 1st quartile = 25th percentile – 2nd quartile = 50th percentile = median – 3rd quartile = 75th percentile § The pth percentile in a data array is a number such that p% of the observations of the data set fall below it and (100-p)% of the observations fall above it (where 0 ≤ p ≤ 100)
  • 26. Percentiles § The pth percentile in an ordered array of n values is the value in the ith position, where $i = \frac{p}{100}(n+1)$ § Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position: $i = \frac{p}{100}(n+1) = \frac{60}{100}(19+1) = 12$
  • 27. Percentiles § Commonly used percentiles – First (lower) decile = 10th percentile – First (lower) quartile, Q1 = 25th percentile – Second (middle) quartile, Q2 = 50th percentile – Third (upper) quartile, Q3 = 75th percentile – Ninth (upper) decile = 90th percentile
  • 28. Quartiles § Quartiles split ordered data into 4 equal portions of 25% each, separated by Q1, Q2, and Q3 § Q1 and Q3 are measures of non-central location § Q2 = the median
  • 29. Quartiles § Each quartile has a position and a value – With the data in an ordered array, the position of $Q_i$ is $\frac{i(n+1)}{4}$ – The value of $Q_i$ is the value associated with that position in the ordered array § Example: Data in ordered array: 11 12 13 16 16 17 18 21 22. Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$, so $Q_1 = \frac{12+13}{2} = 12.5$
  • 30. Example The prices ($) of 18 brands of walking shoes: 40 60 65 65 67 68 68 70 70 70 70 70 70 74 75 75 90 95 § The position of Q1 is 0.25(18 + 1) = 4.75, so Q1 is 3/4 of the way between the 4th and 5th ordered measurements: Q1 = 65 + 0.75(67 - 65) = 66.5
  • 31. Example The prices ($) of 18 brands of walking shoes: 40 60 65 65 67 68 68 70 70 70 70 70 70 74 75 75 90 95 § The position of Q3 is 0.75(18 + 1) = 14.25, so Q3 is 1/4 of the way between the 14th and 15th ordered measurements: Q3 = 74 + 0.25(75 - 74) = 74.25 § IQR = Q3 – Q1 = 74.25 – 66.5 = 7.75
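The position rule i = p(n + 1)/100 with interpolation between neighbouring values can be sketched as follows, using the shoe-price data from the example. The function name `percentile` is an assumption; other textbooks and software packages use slightly different quartile conventions and may give slightly different values.

```python
def percentile(ordered, p):
    """Value at 1-based position p(n + 1)/100, interpolating between neighbours."""
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of empty data is undefined")
    position = p * (n + 1) / 100
    lower = int(position)        # whole part of the position
    fraction = position - lower  # how far to move toward the next value
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


prices = [40, 60, 65, 65, 67, 68, 68, 70, 70, 70, 70, 70, 70, 74, 75, 75, 90, 95]
q1 = percentile(prices, 25)  # position 4.75
q3 = percentile(prices, 75)  # position 14.25
iqr = q3 - q1
print(q1, q3, iqr)
```

Applied to the nine-value array 11 12 13 16 16 17 18 21 22, the same function places Q1 at position 2.5, halfway between 12 and 13.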
  • 32. Shape of a Distribution § Describes how data is distributed § Measures of shape: symmetric or skewed (asymmetric) – Left-skewed (longer tail extends to left): Mean < Median < Mode – Symmetric: Mean = Median = Mode – Right-skewed (longer tail extends to right): Mode < Median < Mean
  • 33. The Five Number Summary § One way to give a nice profile of a data set is the “five-number summary,” which consists of: 1. The smallest measurement 2. The first quartile, Q1 3. The median, Q2 4. The third quartile, Q3 5. The largest measurement § Displayed visually using a box-and-whiskers plot
  • 34. The Box-and-Whisker plot § 5-number summary – Median, Q1, Q3, Xsmallest, Xlargest § Box plot – Graphical display of data using the 5-number summary § [Figure: box plot on an axis from 4 to 12, marking Minimum, Q1, Median (Q2), Q3, and Maximum]
  • 35. Distribution Shape & Box-and-Whisker Plot § Skewed distributions usually have a long whisker in the direction of the skewness § [Figure: box plots showing Q1, Q2, and Q3 for left-skewed, symmetric, and right-skewed distributions]
  • 36. Shape of a Distribution and Quartiles § If the distribution is symmetric, then the upper and lower quartiles should be approximately equally spaced from the median § If the upper quartile is farther from the median than the lower quartile, then the distribution is positively skewed § If the lower quartile is farther from the median than the upper quartile, then the distribution is negatively skewed
  • 37. Outlier § A value located at a distance of more than 1.5(IQR) from the box – Lower fence: Q1 - 1.5 IQR – Upper fence: Q3 + 1.5 IQR § Measurements beyond the upper or lower fence are outliers and are marked with * on the box plot
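The fence rule can be applied to the walking-shoe prices from the quartile example, taking Q1 = 66.5 and Q3 = 74.25 as computed there. The function name `fences` is an assumption made for this sketch.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


prices = [40, 60, 65, 65, 67, 68, 68, 70, 70, 70, 70, 70, 70, 74, 75, 75, 90, 95]
lower_fence, upper_fence = fences(q1=66.5, q3=74.25)  # quartiles from the example
outliers = [x for x in prices if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

Under this rule the cheapest pair and the two most expensive pairs fall outside the fences and would be starred on the box plot.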
  • 38. Measures of Variation § Range § Interquartile range § Variance: population variance, sample variance § Standard deviation: population standard deviation, sample standard deviation § Coefficient of variation
  • 39. Measures of Variation § Measures that quantify the variation or dispersion of a set of data from its central location § The amount is small when the values are close together and large when the values are far apart from each other § If all the values are the same, there is no dispersion § How much are the observations spread out around the mean value?
  • 40. Measures of Variation § Measures of variation give information on the spread or variability of the data values § [Figure: two distributions with the same center but different variation]
  • 41. Measures of Variation § The more spread out or dispersed the data, the larger the measures of variation § The more concentrated the data, the smaller the measures of variation § If all observations are equal, the measures of variation are zero § All measures of variation are non-negative
  • 42. Range § Simplest measure of variation § Difference between the largest and the smallest observations: Range = $x_{maximum} - x_{minimum}$ § Example: for data with smallest value 1 and largest value 14, Range = 14 - 1 = 13
  • 43. Disadvantages of the Range § Ignores the way in which data are distributed: [Figure: two differently distributed data sets on an axis from 7 to 12, each with Range = 12 - 7 = 5] § Sensitive to outliers: – 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4 – 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
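The sensitivity of the range to a single outlier can be reproduced with the two 25-value data sets from this slide. The function name `value_range` is an assumption made for illustration.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    if not values:
        raise ValueError("range of empty data is undefined")
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, then 4 and either 5 or 120.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = common + [5]
with_outlier = common + [120]

range_without = value_range(without_outlier)
range_with = value_range(with_outlier)
print(range_without, range_with)
```

Only one of the 25 values differs between the two data sets, yet the range grows from 4 to 119.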
  • 44. Interquartile Range § We can eliminate some outlier problems by using the interquartile range § Eliminate some high- and low-valued observations and calculate the range from the remaining values § Also known as midspread – Spread in the middle 50% § Interquartile range = 3rd quartile – 1st quartile
  • 45. Interquartile Range § Example: X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70, with 25% of the observations in each of the four segments; Interquartile range = 57 – 30 = 27 § Not affected by extreme values
  • 46. Variance § Shows variation about the mean § Average of squared deviations of values from the mean – Sample variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ – Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
  • 47. Standard Deviation § Most commonly used measure of variation § Shows variation about the mean § Has the same units as the original data – Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ – Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
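The difference between the sample (n - 1) and population (N) divisors can be seen in a short sketch. The function names are assumptions, and the data 1, 2, 3, 4, 5 are reused from the arithmetic mean example.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    if big_n == 0:
        raise ValueError("population variance of empty data is undefined")
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


data = [1, 2, 3, 4, 5]
s2 = sample_variance(data)          # 10 / 4
sigma2 = population_variance(data)  # 10 / 5
s = sqrt(s2)                        # sample standard deviation
sigma = sqrt(sigma2)                # population standard deviation
print(s2, sigma2, s, sigma)
```

The standard deviations are the square roots of the variances, which returns them to the units of the original data.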
  • 48. Variance vs. Standard Deviation § Both measure the average “scatter” about the mean § Variance computations produce “squared” units, which makes interpretation more difficult – For example, kg² is meaningless § Since it is the square root of the variance, the standard deviation is expressed in the same units as the original data § Therefore, the standard deviation is the most commonly used measure of variation
  • 49. Comparing Standard Deviations § Three data sets plotted on the same axis (11 to 21), all with Mean = 15.5: – Data A: s = 3.338 – Data B: s = 0.9258 – Data C: s = 4.57
  • 50. Coefficient of Variation § Measures relative variation § Always in percentage (%) § Shows variation relative to mean § Is used to compare two or more sets of data measured in different units – Population: $CV = \left(\frac{\sigma}{\mu}\right) \times 100\%$ – Sample: $CV = \left(\frac{s}{\bar{X}}\right) \times 100\%$
  • 51. Compare the coefficient of variation between Data A, Data B, and Data C, all with Mean = 15.5: – Data A: s = 3.338 – Data B: s = 0.9258 – Data C: s = 4.57 § Which data set is more spread out around the mean?
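The comparison can be worked through from the summary figures on this slide alone (mean 15.5 for each data set, with the three sample standard deviations shown). The function and variable names are assumptions made for this sketch.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / mean) * 100, expressed as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# (mean, sample standard deviation) as quoted on the slide.
summary = {"A": (15.5, 3.338), "B": (15.5, 0.9258), "C": (15.5, 4.57)}
cv = {name: coefficient_of_variation(s, mean) for name, (mean, s) in summary.items()}
most_spread = max(cv, key=cv.get)

for name, value in cv.items():
    print(f"Data {name}: CV = {value:.1f}%")
print("Most spread out relative to the mean: Data", most_spread)
```

Because the three means are equal, ranking by CV gives the same order as ranking by standard deviation; the CV becomes more informative when the means or units differ.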
  • 52. The Empirical Rule § If the data distribution is bell-shaped, then the interval: – $\mu \pm 1\sigma$ contains about 68% of the values in the population – $\mu \pm 2\sigma$ contains about 95% of the values in the population – $\mu \pm 3\sigma$ contains about 99.7% of the values in the population
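The three percentages can be checked against the normal distribution, which is the usual model behind "bell-shaped". This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the function name `coverage` is an assumption.

```python
from statistics import NormalDist


def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k) * 100:.1f}%")
```

For real data the rule is only approximate, and it can be badly off when the distribution is clearly skewed.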
  • 54. Summary § Quantitative data are usually described by a measure of central tendency and a measure of variation § In describing data, it is important to select the measure of central tendency that most accurately represents the data § To do so, it is important to know if data is symmetrical or skewed