Descriptive statistics

DESCRIPTIVE STATISTICS
PRESENTED BY
AKHIL C A
FIRST YEAR PG
DEPT.OF PUBLIC HEALTH DENTISTRY
SCB DENTAL COLLEGE,CUTTACK
SEMINAR-2

CONTENTS
 INTRODUCTION
 BASIC CONSIDERATIONS
 MEASURES OF FREQUENCY
 MEASURES OF CENTRAL TENDENCY
 MEASURES OF DISPERSION
 MEASURES OF LOCATION
 MEASURE OF SHAPE
 BOX-AND-WHISKER PLOTS
 SUMMARY
 CONCLUSION
 PUBLIC HEALTH SIGNIFICANCE
 REFERENCE

INTRODUCTION
 Nearly every day, statistics are used to support assertions about health
and what people can do to improve it, such as the roles of diet, exercise
and the environment.
 Because the effects are often small and vary greatly from person to
person, an understanding of statistics and of how it allows researchers to
draw conclusions from data is essential for everyone interested in
public health.
 Statistics plays a crucial role in research, planning and decision making
in the health sciences.
Perez S, Ruizb M. Descriptive statistics .Allergol Immunopathol.2009;37(6):314–320
Kaur P, Stoltzfus J, Yellapu V. Descriptive statistics.Int J Acad Med 2018;4:60-63

INTRODUCTION
 Statistics is the set of procedures and techniques used to collect, organise
and analyse data, which are the basis for making decisions in situations of
uncertainty.
 Data generally consist of an extensive number of measurements or
observations that are too numerous or complicated to be understood
through simple observation.
 There are ways to condense and organize information into a set of
descriptive measures and visual devices that enhance the understanding
of complex data.

BASIC CONSIDERATIONS
 Statistics is divided into descriptive and inferential statistics.
Descriptive statistics refers to the collection, presentation, description,
analysis and interpretation of the data collected.
 Its purpose is to summarise the findings from a set of values.
 It helps to draw conclusions about the data themselves.
 It can be used to summarise or describe any data set, whether a population
or a sample.

Inferential statistics refers to the set of techniques used to draw conclusions
about the population through analysis of the sample data.
 It is the process of making generalisations about the population from a
representative sample of data.
 Before analyzing any dataset, one should be familiar with the different types of
variables.
Variable: any quantity that varies; any attribute, phenomenon, or event that
can have different values.
Porta M, Last JM, Greenland S. A Dictionary of Epidemiology. 5th ed. Oxford: Oxford University Press (OUP); 2008.

 Types of variable:
• Quantitative variable
• Qualitative variable
Quantitative Variables
 Quantitative variables are numerical scales that measure the amount of
something.
 Examples: the height and weight of preschool children, and the age of patients
seen in a dental clinic.

 There are two types of numerical scales:
continuous and discrete.
 Continuous numerical variables can take any value between two
points (e.g., weight, length), so they can take values with decimals.
 Discrete numerical variables take values from discrete scales, such as integer
values (e.g., number of patients).

Qualitative variables/Categorical variables
 Some characteristics are not capable of being measured but can be
categorized only.
 For example, when an ill person is given a medical diagnosis, a
person is designated as belonging to an ethnic group, or a person,
place, or object is said to possess or not to possess some
characteristic of interest.
 In such cases measuring consists of categorizing.

Measurement Scales
Scale: a device or system for measuring equal portions.
The Nominal Scale
 The lowest measurement scale is the nominal scale.
 As the name implies it consists of “naming” observations or classifying
them into various mutually exclusive and collectively exhaustive
categories.
 Examples include such dichotomies as male–female and adult–child,
or diagnostic categories such as periodontal disease and dental caries.
Daniel WW, Cross CL. Biostatistics: basic concepts and methodology for the Health Sciences. 10th ed . New Delhi : Wiley; 2014.
Porta M, Last JM, Greenland S. A Dictionary of Epidemiology. 5th ed. Oxford: Oxford University Press (OUP); 2008.

The Ordinal Scale
 Whenever observations are not only different from category to category
but can be ranked according to some criterion, they are said to be
measured on an ordinal scale.
 Individuals may be classified according to socioeconomic status as
low, medium, or high.
 The intelligence of children may be above average, average, or below
average.
 Oral hygiene or prognosis may be rated as poor, fair, or good.

The Interval Scale
 The interval scale is more sophisticated than the nominal or ordinal
scales: not only is it possible to order measurements, but the
distance between any two measurements is also known.
 The difference between a measurement of 20 and a measurement of 30 is
equal to the difference between measurements of 30 and 40.
 The ability to do this implies the use of a unit distance and a zero point,
both of which are arbitrary.

 The selected zero point is not necessarily a true zero.
 The best example of an interval scale is provided by the way in which
temperature is usually measured (degrees Fahrenheit or Celsius).
The Ratio Scale
 The highest level of measurement is the ratio scale. This scale is
characterized by the fact that equality of ratios as well as equality of
intervals may be determined.
 Fundamental to the ratio scale is a true zero point.
 The measurement of such familiar traits as height, weight, and length
makes use of the ratio scale.

MEASURES OF FREQUENCY
Absolute Frequency
Is the number of times a particular value occurs in the data.
Relative Frequency
 Is the number of times a particular value occurs in the data (absolute
frequency) relative to the total number of values for that variable.
 The relative frequency may be expressed as a proportion or a
percentage.

Frequency Distribution of the Ages of Subjects
CLASS INTERVAL    ABSOLUTE FREQUENCY    RELATIVE FREQUENCY
30-39             11                    .0582
40-49             46                    .2434
50-59             70                    .3704
60-69             45                    .2381
70-79             16                    .0847
80-89             1                     .0053
TOTAL             189                   1.0001
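As a rough illustration of the relative-frequency calculation, the sketch below recomputes the proportions in the age table using plain Python. The variable names are illustrative only and do not come from the slides. Note that the unrounded proportions sum to exactly 1; the tabulated total of 1.0001 appears to arise from rounding each entry to four decimals.

```python
# Absolute frequencies per class interval, copied from the age table.
age_counts = {
    "30-39": 11, "40-49": 46, "50-59": 70,
    "60-69": 45, "70-79": 16, "80-89": 1,
}

total = sum(age_counts.values())  # total number of subjects

# Relative frequency = absolute frequency / total number of values.
relative = {interval: count / total for interval, count in age_counts.items()}

for interval, count in age_counts.items():
    share = relative[interval]
    print(f"{interval}: n = {count:3d}, proportion = {share:.4f}, "
          f"percentage = {100 * share:.2f}%")

# Summing the rounded entries reproduces the table's total of 1.0001.
rounded_total = round(sum(round(v, 4) for v in relative.values()), 4)
print("sum of rounded proportions:", rounded_total)
```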

Rate
 A rate measures the occurrence of some particular event
(development of disease or the occurrence of death) in a population during a
given time period.
 It is a statement of the risk of developing a condition.
Ratio
 Another measure of disease frequency is a ratio.
 It expresses a relation in size between two random quantities.
 The numerator is not a component of the denominator.
 The ratio of white blood cells relative to red cells is 1:600 or 1/600,
meaning that for each white cell, there are 600 red cells.
Park K. Park's textbook of Preventive and social medicine. 25th ed. India: Bhanot Publishers; 2017.

Proportion
 A proportion is a ratio which indicates the relation in magnitude
of a part to the whole. The numerator is always included in the
denominator.
 A percentage is another way of expressing a proportion, as a fraction of 100.
 The percentages across an entire dataset should always add up to 100%.
For example, in a total of thirty participants, of whom 2 experience adverse
drug effects, 2/30 = 0.067 × 100 = 6.7% of participants experience adverse
effects.

 Measures of frequency are often expressed visually in the form of
tables, histograms (for quantitative variables), or bar graphs (for
qualitative variables) to make the information more easily interpretable.
[Figures: histogram; bar diagram]

MEASURES OF CENTRAL TENDENCY
 When given a set of raw data, one of the most useful ways of
summarising the data is to find an average of the set.
 An average is a measure of the centre of the data set.
 There are three common ways of describing the centre of a set of
numbers:
-The mean
-The median
-The mode
Pagano M, Gauvreau K. Principles of Biostatistics. 2nd ed. New Delhi: Cengage Learning India Pvt Ltd; 2000.

Arithmetic Mean
 The most familiar measure of central tendency is the arithmetic mean.
 It is the descriptive measure most people have in mind when they
speak of the “average.”
 To obtain the mean, the individual observations are first added
together, and then divided by the number of observations
Park K. Park's textbook of Preventive and social medicine. 25th ed. India:
Bhanot Publishers; 2017.

Properties of the Mean
 Uniqueness. For a given set of data there is one and only one
arithmetic mean.
 Simplicity. The arithmetic mean is easily understood and easy to
compute.
 Since every value in a set of data enters into the
computation of the mean, it is affected by each value. Extreme values can
distort it so much that it becomes undesirable as a measure of central tendency.

The Sample Mean
 We use $\bar{x}$ to designate the sample mean and n to indicate the number of
values in the sample: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
The Population Mean
 For a finite population of values $x_1, x_2, \ldots, x_N$, where N is the number of
values in the population, we use the Greek letter µ to stand for the
population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$.

Forced expiratory volumes
SUBJECT    FEV1 (liters)
1          2.30
2          2.15
3          3.50
4          2.60
5          2.75
6          2.82
7          4.05
8          2.25
9          2.68
10         3.00
11         4.02
12         2.85
13         3.38
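The arithmetic mean of the FEV1 values in the table can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the original slides.

```python
# FEV1 values (liters) for the 13 subjects, in subject order.
fev1_liters = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
               2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

n = len(fev1_liters)                 # number of observations
sample_mean = sum(fev1_liters) / n   # add the observations, divide by n

print(f"n = {n}, mean FEV1 = {sample_mean:.2f} liters")
```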

 The mean can be used as a summary measure for both discrete and
continuous measurements.
 In general, however, it is not appropriate for either nominal or ordinal data.
 One exception to this rule applies when we have dichotomous data and the
two possible outcomes are represented by the values 0 and 1.
 In this situation, the mean of the observations is equal to the proportion of 1s
in the data set.
 E.g., with sex coded as 1 for male and 0 for female among 13 asthmatic subjects:
(0 + 1 + 1 + 0 + 0 + 1 + 1 + 1 + 0 + 1 + 1 + 1 + 0)/13 = 8/13 = 0.615.
Therefore, 61.5% of the study subjects are males.

Grouped Data
Duration of transfusion in sickle cell disease
Subject    Duration (years)
1          12
2          11
3          12
4          6
5          11
6          11
7          8
8          5
9          5
10         5
Pagano M, Gauvreau K. Principles of Biostatistics. 2nd ed. New Delhi: Cengage Learning India Pvt Ltd; 2000.

 To compute the mean of a set of data arranged as a frequency
distribution, we begin by assuming that all the values that fall into a
particular interval are equal to the midpoint of that interval.
 To find the mean of the grouped data (the grouped mean), we first sum the measurements by
multiplying the midpoint of each interval by the corresponding frequency
and adding these products; we then divide by the total number of values:
$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$
k is the number of intervals in the table.
$m_i$ is the midpoint of the ith interval.
$f_i$ is the frequency associated with the ith interval.

Absolute frequencies of serum cholesterol levels
Cholesterol Level (mg/100 ml)    Number of Men
80-119                           13
120-159                          150
160-199                          442
200-239                          299
240-279                          115
280-319                          34
320-359                          9
360-399                          5
Total                            1067
Pagano M, Gauvreau K. Principles of Biostatistics. 2nd ed. New Delhi: Cengage Learning India Pvt Ltd; 2000.
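The grouped-mean procedure described above can be sketched in Python using the serum cholesterol table. The midpoint of each interval is assumed here to be the average of its two printed limits (for example, (80 + 119)/2 = 99.5); the variable names are illustrative.

```python
# (lower limit, upper limit, number of men) for each cholesterol interval.
intervals = [
    (80, 119, 13), (120, 159, 150), (160, 199, 442), (200, 239, 299),
    (240, 279, 115), (280, 319, 34), (320, 359, 9), (360, 399, 5),
]

# Assumption: every value in an interval is treated as equal to its midpoint.
midpoints = [(low + high) / 2 for low, high, _ in intervals]
frequencies = [freq for _, _, freq in intervals]

total_men = sum(frequencies)
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
grouped_mean = weighted_sum / total_men

print(f"total = {total_men}, grouped mean = {grouped_mean:.1f} mg/100 ml")
```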

Geometric mean
 The geometric mean is a type of average, usually used for
growth rates, like population growth or interest rates.
 While the arithmetic mean adds items, the geometric
mean multiplies items.
 Also, you can only get the geometric mean for positive numbers.

 Suppose a rectangle has dimensions of 9 m × 4 m.
 What is the side length of a square with equivalent area?
 Square side length = (9 × 4)^(1/2) = 6 m
[Figure: a 9 m × 4 m rectangle and a 6 m × 6 m square of equal area]

 Suppose a stipend increases in value by 10% at the end of the first year, 20%
the second year, and 30% the third year. What is the average rate of increase per
year?
Year 0: 1 lakh (× 1.1)
Year 1: 1.1 lakh (× 1.2)
Year 2: 1.32 lakh (× 1.3)
Year 3: 1.716 lakh
(1.1) × (1.2) × (1.3) = 1.716
G.M = (1.716)^(1/3) = 1.1972, i.e. an average increase of about 19.72% per year.
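The stipend example can be reproduced with a short Python sketch. It assumes Python 3.8 or later for `math.prod`; the variable names are illustrative.

```python
import math

# Yearly growth factors: +10%, +20%, +30%.
growth_factors = [1.1, 1.2, 1.3]

# The geometric mean multiplies the items and takes the n-th root.
overall_growth = math.prod(growth_factors)
geometric_mean = overall_growth ** (1 / len(growth_factors))

# For comparison only: the arithmetic mean of the same factors.
arithmetic_mean = sum(growth_factors) / len(growth_factors)

print(f"overall growth over 3 years: {overall_growth:.3f}")
print(f"geometric mean factor:       {geometric_mean:.4f}")
print(f"arithmetic mean factor:      {arithmetic_mean:.4f}")
```

Applying the geometric mean factor three times recovers the overall growth of 1.716, whereas applying the arithmetic mean factor (1.2) three times gives 1.728, a slight overestimate.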

Harmonic mean
 The Harmonic Mean (HM) is defined as the reciprocal of the arithmetic
mean of the reciprocals of the given data values.
 It is based on all the observations, and it is rigidly defined.
 The harmonic mean gives less weight to large values and more
weight to small values.
 It is applied to times and average rates.

 Since the harmonic mean is the reciprocal of the arithmetic mean of the
reciprocals, the formula for the harmonic mean “HM” is as follows:
 If x1, x2, x3,…, xn are the individual items up to n terms, then
 Harmonic Mean, HM = n / [(1/x1)+(1/x2)+(1/x3)+…+(1/xn)]
 E.g.:
 We travel 10 km at 60 km/hr, then another 10 km at 20 km/hr. What is our
average speed?
 H.M = 2/[(1/60)+(1/20)] = 30 km/hr
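A minimal Python sketch of the average-speed example follows. It computes the harmonic mean from the formula, cross-checks it with `statistics.harmonic_mean` from the standard library, and confirms the result as total distance divided by total time. Variable names are illustrative.

```python
from statistics import harmonic_mean

speeds_kmh = [60, 20]     # speed on each leg
leg_distance_km = 10      # both legs cover the same distance

# HM = n / (1/x1 + 1/x2 + ... + 1/xn)
hm_manual = len(speeds_kmh) / sum(1 / s for s in speeds_kmh)
hm_library = harmonic_mean(speeds_kmh)

# Cross-check: average speed = total distance / total time.
total_distance = leg_distance_km * len(speeds_kmh)
total_time = sum(leg_distance_km / s for s in speeds_kmh)
average_speed = total_distance / total_time

print(hm_manual, hm_library, average_speed)
```

The harmonic mean is the appropriate average here only because the two legs cover equal distances.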

Median
 The median is an average of a different kind, which does not depend
upon the total and number of items.
 To obtain the median, the data is first arranged in an ascending or
descending order of magnitude, and then the value of the middle
observation is located, which is called the median.
 It is the positional average of a data set.
 The median of a finite set of values is that value which divides the set
into two equal parts.
 If the number of values is odd, the median will be the middle value
when all values have been arranged in order of magnitude.

 When the number of values is even, there is no single middle value.
Instead there are two middle values.
 In this case the median is taken to be the mean of these two middle
values, when all values have been arranged in the order of their
magnitudes.
 Therefore, if a set of data contains a total of ‘n’ observations where ‘n’ is
odd, the median is the middle value, or the
[(n + 1)/2]th largest measurement.
 If n is even, the median is usually taken to be the average of the two
middlemost values, the (n/2)th and [(n/2) + 1]th observations.

Properties of the Median
 Uniqueness. As is true with the mean, there is only one median for a
given set of data.
 Simplicity. The median is easy to calculate.
 It is not as drastically affected by extreme values as is the mean, which
makes it the best measure of central tendency when the data are
skewed.

 For example, consider the ordered set of FEV1 measurements:
2.15, 2.25, 2.30, 2.60, 2.68, 2.75, 2.82, 2.85, 3.00, 3.38, 3.50, 4.02, 4.05.
 Since there is an odd number of observations in the list, the median is the
(13 + 1)/2 = 7th observation, or 2.82 liters. Six of the measurements are less than or
equal to 2.82 liters, and six are greater than or equal to 2.82 liters.
 Suppose the value for subject 11 had been recorded as 40.2 rather than 4.02:
2.15, 2.25, 2.30, 2.60, 2.68, 2.75, 2.82, 2.85, 3.00, 3.38, 3.50, 4.05, 40.2.
 The median FEV1 would remain 2.82 liters because the median, unlike the mean, is much less
sensitive to unusual data points.
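The robustness of the median can be illustrated with a small Python sketch that compares the mean and median of the FEV1 data with and without the recording error. Variable names are illustrative.

```python
from statistics import mean, median

fev1_correct = [2.15, 2.25, 2.30, 2.60, 2.68, 2.75, 2.82,
                2.85, 3.00, 3.38, 3.50, 4.02, 4.05]

# Same data, but with 4.02 mistakenly recorded as 40.2.
fev1_with_error = [40.2 if x == 4.02 else x for x in fev1_correct]

median_correct = median(fev1_correct)
median_with_error = median(fev1_with_error)
mean_correct = mean(fev1_correct)
mean_with_error = mean(fev1_with_error)

print(f"median: {median_correct} -> {median_with_error}")
print(f"mean:   {mean_correct:.2f} -> {mean_with_error:.2f}")
```

The median is unchanged by the error, while the mean is pulled strongly towards the outlying value.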

Mode
 The mode is the most commonly occurring value in a distribution of data.
 It is the most frequent item or the most "fashionable" value in a series of
observations.
 The mode is located from the frequency distribution table, taking the value
of the variable with the maximum frequency.
 When the mode is ill defined, it can be estimated using the empirical relation:
Mode = 3 Median − 2 Mean
 If all the values are different there is no mode;
on the other hand, a set of values may have more than one mode.

 Example:
 Let us consider a laboratory with 10 employees whose ages are
20, 21, 20, 20, 34, 22, 24, 27, 27, 27
 We could say that these data have two modes, 20 and 27.
 The sample consisting of the values 10, 21, 33, 53, 54 has no mode.
 If one needs to know the value that has the greatest influence in the series,
the mode may be computed.
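The two examples above can be checked with a short Python sketch. The helper `modes` is hypothetical (it is not from the slides or a library); it follows the convention stated here that a data set in which every value occurs once has no mode, and it assumes a non-empty input.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values different: no mode
    return sorted(v for v, c in counts.items() if c == highest)

ages = [20, 21, 20, 20, 34, 22, 24, 27, 27, 27]
print(modes(ages))                   # two modes
print(modes([10, 21, 33, 53, 54]))   # no mode
```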

 The mode may be used also for describing qualitative data.
 For example, suppose the patients seen in a mental health clinic during a
given year received one of the following diagnoses: mental retardation,
organic brain syndrome, psychosis, neurosis, and personality disorder.
 The diagnosis occurring most frequently in the group of patients would be
called the modal diagnosis.

 The advantages of the mode are that it is easy to understand and is not
affected by extreme items.
 The disadvantages are that its exact location is often uncertain and it is
often not clearly defined.
 Therefore, the mode is not often used in biological or medical statistics.

Rationale for using Mean, Median and Mode
 The best measure of central tendency for a given set of data often
depends on the way in which the values are distributed.
 If they are symmetric and unimodal then the mean, the median and the
mode should all be roughly the same.

 If the distribution of values is symmetric but bimodal, so that the
corresponding frequency polygon would have two peaks, then the mean
and the median should again be approximately the same.
 This common value could lie between the two peaks, and hence be a
measurement that is extremely unlikely to occur.
 Here it might be better to report two modes rather than the mean or the
median.

 When the data are not symmetric, the median is often the best measure of
central tendency.
 Because the mean is sensitive to extreme observations, it is pulled in the
direction of the outlying data values.

SKEWNESS
 Skewness is a measure of the asymmetry of the distribution of a variable
about its mean.
 If a distribution is symmetric, the left half of its graph (histogram or
frequency polygon) will be a mirror image of its right half.

 When the left half and right half of the graph of a distribution are not
mirror images of each other, the distribution is asymmetric.
 If the graph (histogram or frequency polygon) of a distribution is
asymmetric, the distribution is said to be skewed .

 If a distribution is not symmetric because its graph extends further to
the right than to the left, that is, if it has a long tail to the right, it is said to be
positively skewed. Its mean is greater than its mode.

 If a distribution is not symmetric because its graph extends further to
the left than to the right, that is, if it has a long tail to the left, it is said to be negatively
skewed. Its mean is less than its mode.

MEASURES OF DISPERSION
 A measure of dispersion conveys information regarding the amount of
variability present in a set of data.
 If all the values are the same, there is no dispersion; if they are not all
the same, dispersion is present in the data.
 The amount of dispersion may be small when the values, though
different, are close together.
 Other terms used synonymously with dispersion include variation,
spread, and scatter.
Nicholas J. Introduction to descriptive statistics. Sydney: Mathematics Learning Centre, University of Sydney; 1990.

 Measures of dispersion include:
 Range
 Mean Deviation
 Variance
 Standard deviation
 Coefficient of variation.

Range
 The range is by far the simplest measure of dispersion.
 The range is the difference between the largest and smallest values in a
set of observations. If we denote the range by R, the largest value by
XL, and the smallest value by XS, then
 R = XL − XS
 If we have grouped data, the range is taken as the difference between
the mid-points of the extreme categories.

 Ordinarily in medical practice, the normal range covers the
observations falling in 95% confidence limits.
 The main advantage in using the range is the simplicity of its
computation.
 As a measure of dispersion the range is severely limited. Since it
depends only on two observations, the lowest and the highest, we will
get a misleading idea of dispersion if these values are outliers.

 E.g.: the marks of a group of thirty students on two tests.
 On test A, the range is 70 − 45 = 25.
 On test B, the range is 72 − 40 = 32, but apart from the outliers, the
distribution of marks on test B is clearly less spread out than that of A.

Mean Deviation
 If the mean blood pressure of a large representative series is taken, some
observations are found above the mean (plus) and others below
the mean (minus).
 On summing the differences or deviations from the mean in any
distribution, the plus and minus differences are equal in total and
the net balance is zero.

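 The mean deviation is therefore the average of the absolute deviations from
the mean: $\text{M.D.} = \frac{\sum |x - \bar{x}|}{n}$
 The vertical parallel lines (modulus) indicate that deviations from the mean
are taken as positive, ignoring any negative sign.
 Though simple and easy, the mean deviation is little used in statistical
analyses, being of less mathematical value, particularly in drawing
inferences.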

Example: The diastolic blood pressure of 10 individuals was as follows:
 83, 75, 81, 79, 71, 95, 75, 77, 84, 90.
What is the mean deviation?
B.P    MEAN    DEVIATION
83     81      2
75     81      -6
81     81      0
79     81      -2
71     81      -10
95     81      14
75     81      -6
77     81      -4
84     81      3
90     81      9
 Mean = 810/10 = 81
 M.D = 56/10 = 5.6
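The mean deviation in this example can be verified with a short Python sketch. It also shows that the signed deviations sum to zero, which is why absolute values are needed. Variable names are illustrative.

```python
# Diastolic blood pressure of the 10 individuals.
diastolic_bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

n = len(diastolic_bp)
bp_mean = sum(diastolic_bp) / n

# Signed deviations from the mean cancel out exactly.
deviations = [x - bp_mean for x in diastolic_bp]
net_balance = sum(deviations)

# Mean deviation: average of the absolute deviations.
mean_deviation = sum(abs(d) for d in deviations) / n

print(f"mean = {bp_mean}, net balance = {net_balance}, "
      f"mean deviation = {mean_deviation}")
```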

Variance
 The variance quantifies the amount of variability, or spread, around the
mean of the measurements.
 To accomplish this, we might simply attempt to calculate the average
distance of the individual observations from $\bar{x}$; however, the deviations
from the mean always sum to zero, so each deviation is squared first.
 The variance can therefore be described as the
average squared deviation of individual values from the mean of that set.

 In computing the variance of a sample of values, the square of each
difference is used to ensure a positive numerator and hence a much
more valuable measure of dispersion.
 For a sample, the sum of squared deviations is divided by n − 1:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
 When the values of a set of observations lie close to their
mean, the dispersion is less than when they are scattered over a wide
range.

Standard Deviation (R.M.S deviation)
 The variance represents squared units and, therefore, is not an
appropriate measure of dispersion when we wish to express this
concept in terms of the original units.
 To obtain a measure of dispersion in original units, we merely take the
square root of the variance. The result is called the standard deviation.
 It is the most frequently used
measure of dispersion.

Uses of standard deviation
 It summarizes the deviations of a large distribution from the mean in one
figure, used as a unit of variation.
 It indicates whether the deviation of an individual observation from the
mean is due to chance or to some special reason.
 It also helps in finding a suitable sample size.

B.P      Deviation From Mean    Squared Deviation
83       2                      4
75       -6                     36
81       0                      0
79       -2                     4
71       -10                    100
95       14                     196
75       -6                     36
77       -4                     16
84       3                      9
90       9                      81
Total                           482
 n = 10
 Mean = 81
 Variance (s²) = 482/9 = 53.56
 S.D = (variance)^(1/2) = (482/9)^(1/2) = 7.32
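As a check on the worked example, the sketch below computes the sample variance and standard deviation both by hand and with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor. Variable names are illustrative.

```python
from statistics import stdev, variance

diastolic_bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

n = len(diastolic_bp)
bp_mean = sum(diastolic_bp) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - bp_mean) ** 2 for x in diastolic_bp)

# Sample variance divides by n - 1; the SD is its square root.
sample_variance = sum_sq_dev / (n - 1)
sample_sd = sample_variance ** 0.5

print(f"sum of squares = {sum_sq_dev}")
print(f"variance = {sample_variance:.2f}, SD = {sample_sd:.2f}")
print(f"library check: {variance(diastolic_bp):.2f}, {stdev(diastolic_bp):.2f}")
```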

Coefficient of Variation
 When one desires to compare the dispersion in two sets of data,
comparing the two standard deviations may lead to fallacious results. It
may be that the two variables involved are measured in different units.
 Needed in situations like these is a measure of relative variation rather
than absolute variation.
 Coefficient of variation (CV) is used to compare the variability of one
character in two different groups having different magnitude of values
or two characters in the same group by expressing in percentage.

 The coefficient of variation is calculated from the standard deviation and
mean of the characteristic: it is the SD expressed as a percentage of the mean,
CV = (SD / mean) × 100.
 Example:
In two series, of adults aged 21 years and children 3 months old, the
following values were obtained for height. Which series shows
greater variation?
Persons     Mean Height (cm)    Standard Deviation
Adults      160                 10
Children    60                  5
CV-adults = (10/160) × 100 = 6.25%
CV-children = (5/60) × 100 = 8.33%
 Thus, we find that heights in children show greater variation
than in adults, the ratio being 8.33%/6.25% or 1.3:1.0.
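The comparison of the two series can be sketched in Python as follows. The helper name `coefficient_of_variation` is illustrative and not taken from the slides.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_adults = coefficient_of_variation(sd=10, mean=160)
cv_children = coefficient_of_variation(sd=5, mean=60)
cv_ratio = cv_children / cv_adults

print(f"CV adults   = {cv_adults:.2f}%")
print(f"CV children = {cv_children:.2f}%")
print(f"ratio       = {cv_ratio:.1f}:1.0")
```

Although the adults have the larger absolute SD (10 against 5), the children show the larger relative variation.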

Degrees of Freedom
 The phrase “degrees of freedom” was introduced by Sir Ronald Fisher in
1922 without mentioning its purpose.
 Everitt (2002) explains degrees of freedom as:
‘Essentially the term means the number of independent units of information in
a sample relevant to the estimation of a parameter or the calculation of a
statistic.’
Eisenhauer JG. Degrees of Freedom. Teaching Statistics. 2008;30(3):75–8.

 A set of independent results x1, x2, …, xn has n degrees of freedom, but
“n − 1” if the mean x̄ is known, since any one of the xi is then determined by the
sum of the others.
 E.g., take a sample of n = 5 and suppose that, in calculating the sample variance, we find x̄ = 10.
 Since n x̄ = 50, the sum of all five observations equals 50; thus four
observations could be freely altered, but once any four of the observations
are fixed, the final observation is determined by default.

 Consequently there are only “n − 1” degrees of freedom (df) for use in
calculating the sample variance; the effective sample size has been
reduced to df = n − 1.
 Note that a sample of size n retains n degrees of freedom if the population
mean μ is known, since this does not determine xi for
i = 1, …, n even if the other (n − 1) values are known.
 The concept is of importance in statistical inference since it defines the
effective size of a sample.

MEASURES OF LOCATION
PERCENTILES AND QUARTILES
 The mean and median are special cases of a family of parameters known
as location parameters.
 These descriptive measures are called location parameters because they
can be used to designate certain positions on the horizontal axis when the
distribution of a variable is graphed.
 In that sense the so-called location parameters “locate” the distribution on
the horizontal axis.

 Centiles or percentiles are values in a series of observations
arranged in ascending order of magnitude which divide the distribution
into 100 equal parts.
 Given a set of n observations; x1; x2; ... xn
 The pth percentile P is the value of X such that p percent or less of the
observations are less than P and (100 –p) percent or less of the
observations are greater than P.

 Thus, the median is 50th centile. The 50th percentile will have 50%
observations on either side.
 Accordingly, 10th percentile should have 10% observations to the left
and 90% to the right.
 E.g., if the age of 3½ years is the 10th percentile of children's ages, 10% of
the entire population is below 3½ years of age and 90% is above that age.

 Quartiles are three points located on the range of a
variable such as height: Q1, Q2 and Q3.
 Q1, or the lower quartile, has 25% of the observations of height falling on its
left and 75% on its right.
 Q2, or the median, has 50% of the observations on either side.
 Q3, or the upper quartile, has 75% of the observations on its left and 25% on its right.
E.g.:
 Suppose we have a small data set of twelve observations:
15 18 19 20 20 20 21 23 23 24 24 25
 The median (Q2) is the mean of the 6th and 7th observations: (20 + 21)/2 = 20.5.
 To find the first quartile we consider the observations less than the
median:
15 18 19 20 20 20
First quartile (Q1) = (19 + 20)/2 = 19.5
 To find the third quartile we consider the observations greater than the median:
21 23 23 24 24 25
Third quartile (Q3) = (23 + 24)/2 = 23.5
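The median-of-halves method used in this example can be sketched in Python. The function name `quartiles_by_halves` is hypothetical. For an odd number of observations it excludes the median itself from both halves, which is an assumption of this sketch; other conventions exist and software packages may give slightly different quartiles.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set, upper half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle observation (assumption of this sketch).
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

data = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]
q1, q2, q3 = quartiles_by_halves(data)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```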

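 To find the observed value corresponding to a given percentile p, compute
 k = (n × p)/100
If k is not an integer, round it up to the next integer; the kth ordered observation is the percentile.
If k is an integer, the percentile is the midpoint between the kth and (k + 1)th ordered observations.
 E.g., n = 12 (previous example):
75th percentile: k = (12 × 75)/100 = 9
Since k = 9 is an integer, the midpoint between the 9th and 10th observations, (23 + 24)/2 = 23.5, corresponds to the
75th percentile.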
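The percentile rule above can be written as a small Python function. This is a sketch of the rule as stated on the slide (round k up when it is not an integer, take the midpoint of the kth and (k + 1)th ordered observations when it is); the function name `percentile_by_rank` is hypothetical, and statistical software often uses other interpolation rules.

```python
import math

def percentile_by_rank(values, p):
    """p-th percentile (0 < p < 100) using k = n * p / 100 on ordered data."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    ordered = sorted(values)
    k = len(ordered) * p / 100
    if k == int(k):
        k = int(k)
        # Integer k: midpoint of the k-th and (k+1)-th observations (1-based).
        return (ordered[k - 1] + ordered[k]) / 2
    # Non-integer k: round up and take that observation (1-based).
    return ordered[math.ceil(k) - 1]

data = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]
p25 = percentile_by_rank(data, 25)
p50 = percentile_by_rank(data, 50)
p75 = percentile_by_rank(data, 75)
p10 = percentile_by_rank(data, 10)   # k = 1.2, rounded up to 2
print(p10, p25, p50, p75)
```

For this data set the 25th, 50th and 75th percentiles agree with the quartiles found by the median-of-halves method.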

Interquartile Range
 The interquartile range (IQR) is the difference between the third and
first quartiles.
 The range provides a crude measure of the variability present in a set of
data.A similar measure that reflects the variability among the middle 50
percent of the observations in a data set is the interquartile range.

 A large IQR indicates a large amount of variability among the middle 50
percent of the relevant observations;
a small IQR indicates a small amount of variability among them.
 When the distribution is graphed, 50% of the area lies
between the first and third quartiles.
 This means that 50% of the observations lie between the first
and third quartiles.

MEASURE OF SHAPE
KURTOSIS
 Kurtosis is a measure of the degree to which a distribution is “peaked”
or flat in comparison to a normal distribution whose graph is
characterized by a bell-shaped appearance.

Mesokurtic
 A normal, or bell-shaped, distribution is said to be
mesokurtic.

Platykurtic
 A distribution, in comparison to a normal distribution, may possess an
excessive proportion of observations in its tails, so that its graph exhibits a
flattened appearance. Such a distribution is said to be platykurtic.

Leptokurtic
 Conversely, a distribution, in comparison to a
normal distribution, may possess a smaller proportion of observations in
its tails, so that its graph exhibits a more peaked appearance. Such a
distribution is said to be leptokurtic.

 A perfectly mesokurtic distribution has a kurtosis measure of 3 based
on the equation.

 Most computer algorithms reduce the measure by 3,
so that the kurtosis measure of a:
Mesokurtic distribution is zero
Leptokurtic distribution is positive
Platykurtic distribution is negative
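The equation referred to on these slides did not survive in the text, so the sketch below assumes the common moment-based definition of kurtosis: the fourth central moment divided by the square of the second central moment, which equals 3 for a normal distribution. The function names are illustrative, and statistical packages may apply additional small-sample corrections.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (assumed definition, about 3 if normal)."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - center) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2

def excess_kurtosis(values):
    """Kurtosis reduced by 3, so that a mesokurtic distribution scores zero."""
    return kurtosis(values) - 3

flat_data = [1, 2, 3, 4, 5]                       # evenly spread values
peaked_data = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]     # central peak, two extremes

print(f"flat:   {excess_kurtosis(flat_data):+.2f}")    # negative
print(f"peaked: {excess_kurtosis(peaked_data):+.2f}")  # positive
```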

BOX-AND-WHISKER PLOTS
 The box-plot is another way of representing a data set graphically.
 It is constructed using the quartiles, and gives a good indication of the
spread of the data set and its symmetry (or lack of symmetry).
 It is a very useful method for comparing two or more data sets.
 The box-plot consists of a scale, a box drawn between the first and third
quartile, the median placed within the box, whiskers on both sides of the
box and outliers (if any).

Steps:
1. Represent the variable of interest on the horizontal axis.
2. Draw a box in the space above the horizontal axis in such a way that
the left end of the box aligns with the first quartile Q1 and the right end
of the box aligns with the third quartile Q3.
3. Divide the box into two parts by a vertical line that aligns with the
median Q2.
4. Draw a horizontal line called a whisker from the left end of the box to a
point that aligns with the smallest measurement in the data set.
5. Draw another horizontal line, or whisker, from the right end of the box
to a point that aligns with the largest measurement in the data set.

Outliers
 Outliers are data points uncommonly distant from the rest of the sample.
 One method for identifying outliers is by calculating percentiles. Outliers
are considered to be those values which fall outside the range
P25 − 1.5 IQR to P75 + 1.5 IQR.
 A graphical way to verify the existence of outliers is through the
box-and-whisker diagram or boxplot.
 The boxplot represents a box whose edges are the 25th and 75th percentiles,
with the median of the data marked inside and lines or whiskers that extend to the
P25 − 1.5 IQR and P75 + 1.5 IQR quantities; points that fall
outside this range are identified as outliers.

 E.g.:
Step 1: Marks of 21 students out of 100
30 44 45 46 47 48 49
49 50 51 52 53 53 54
54 59 59 61 70 81 81
 Median (Q2) = 52
 First quartile (Q1) = 47.5
 Third quartile (Q3) = 59
 IQR = 11.5
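The quartiles and the 1.5 × IQR outlier limits for these marks can be sketched in Python. The quartiles are taken as medians of the lower and upper halves with the overall median excluded (an assumption that reproduces the values above); the variable names are illustrative.

```python
from statistics import median

marks = [30, 44, 45, 46, 47, 48, 49,
         49, 50, 51, 52, 53, 53, 54,
         54, 59, 59, 61, 70, 81, 81]

ordered = sorted(marks)
n = len(ordered)
half = n // 2

q2 = median(ordered)
q1 = median(ordered[:half])                                        # lower half
q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])       # upper half
iqr = q3 - q1

# Outlier limits: P25 - 1.5 IQR and P75 + 1.5 IQR.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < lower_limit or x > upper_limit]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"limits = ({lower_limit}, {upper_limit}), outliers = {outliers}")
```

Under this rule the mark of 30 and the two marks of 81 fall outside the limits and would be plotted as outliers.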

BOX PLOT

SUMMARY
 Descriptive statistics are used to summarize data in an organized manner
by describing the characteristics of variables in a sample or population.
 Calculating descriptive statistics should always precede
inferential statistical comparisons.
 Descriptive statistics cover the types of variables (nominal, ordinal, interval,
and ratio) as well as measures of frequency, central tendency,
dispersion/variation, location, and shape.
 Descriptive statistics condense data into a simpler summary, which makes
further analysis much easier.

CONCLUSION
 Descriptive statistics form a critical part of initial data analysis.
 However, descriptive statistics do not allow us to draw conclusions
beyond the data we have analysed or to reach conclusions regarding any
hypotheses we might have made.
 They do, however, provide the foundation for comparing variables with inferential
statistical tests.
 Therefore, as part of good research practice, it is essential that one reports
the most appropriate descriptive statistics using a systematic approach to
reduce the likelihood of presenting misleading results.

PUBLIC HEALTH SIGNIFICANCE
 Accurate, comprehensive, high-quality data and statistics form a core
element of evidence-based public health policy.
 By raising health awareness among the general public, they can also help
achieve better social and health outcomes and reduce health inequalities.
 Since the results of statistical analysis are fundamental in influencing the
future of public health and health sciences, the appropriate use of
descriptive statistics allows health-care administrators and providers to
weigh the impact of health policies and programs more effectively.

REFERENCES
1. Park K. Park's Textbook of Preventive and Social Medicine. 25th ed. India: Bhanot Publishers; 2017.
2. Kothari CR. Research Methodology: Methods and Techniques. 2nd revised ed. New Delhi: New Age International Publishers; 2004.
3. Peter S. Essentials of Public Health Dentistry. 5th ed. New Delhi: Arya Medi Publishing House; 2013.
4. Mahajan BK. Methods in Biostatistics for Medical Students and Research Workers. 6th ed. New Delhi: Jaypee Publishers; 1997.
5. Daniel WW, Cross CL. Biostatistics: Basic Concepts and Methodology for the Health Sciences. 10th ed. New Delhi: Wiley; 2014.
6. Pagano M, Gauvreau K. Principles of Biostatistics. 2nd ed. New Delhi: Cengage Learning India Pvt Ltd; 2000.
7. Kim JS, Dailey RJ. Biostatistics for Oral Healthcare. Ames: Blackwell Munksgaard; 2008.
8. Nicholas J. Introduction to Descriptive Statistics. Sydney: Mathematics Learning Centre, University of Sydney; 1990.
9. Porta M, Last JM, Greenland S. A Dictionary of Epidemiology. 5th ed. Oxford: Oxford University Press; 2008.
10. Eisenhauer JG. Degrees of Freedom. Teaching Statistics. 2008;30(3):75–78.
11. Kaur P, Stoltzfus J, Yellapu V. Descriptive statistics. Int J Acad Med. 2018;4:60–63.
12. Perez S, Ruizb M. Descriptive statistics. Allergol Immunopathol. 2009;37(6):314–320.
