Health information

Health information & basic
medical statistics (1)
Dr. Kuntala Ray
Associate Professor
1

Learning objective
• At the end of the class you will be able to:
• Define a health information system (HIS)
• List the different sources of health information
• Describe the uses of an HIS
• Define descriptive statistics
• Organise data
• Present data in tabular and graphical form
• Discuss the measures of central tendency and
dispersion
2

Health Information System
Definition : A mechanism for the collection, processing, analysis
and transmission of information required for organizing and
operating health services, and also for research and training.
Objectives:
• To provide reliable, relevant, up-to-date, adequate, timely and
reasonably complete information for health managers at all
levels
• To support the sharing of technical and scientific (including
bibliographical) information by all health personnel
participating in the health services of a country;
• To provide, at periodic intervals, data that will show the general
performance of the health services; and
• To assist planners in studying their current functioning and
trends in demand and workload.
3

Components of health information system
• Demography and vital events;
• Environmental health statistics;
• Health status : mortality, morbidity, disability, and
quality of life;
• Health resources : facilities, beds, manpower;
• Utilization and non-utilization of health services :
attendance, admissions, waiting lists;
• Indices of outcome of medical care; and
• Financial statistics (cost, expenditure) related to
the particular objective.
4

Uses of health information
• To measure the health status of the people
and to quantify their health problems and
medical and health care needs;
• For local, national and international
comparisons of health status.
• For planning, administration and effective
management of health services and
programmes
5

Uses of health information
• For assessing whether health services are
accomplishing their objectives in terms of
their effectiveness and efficiency;
• For assessing the attitudes and degree of
satisfaction of the beneficiaries with the
health system; and
• For research into particular problems of
health and disease.
6

Sources of health information
1. Census
2. Registration of vital events
3. Sample Registration System (SRS)
4. Notification of diseases
5. Hospital records
6. Disease registers
7. Record linkage
8. Epidemiological surveillance
9. Other health service records
10. Environmental health data
11. Health manpower statistics
12. Population surveys
7

Census
• Census means “to enumerate”
• It is taken at regular intervals, usually of 10 years.
• A census is defined by the United Nations as "the total process of
collecting, compiling and publishing demographic, economic and
social data pertaining, at a specified time or times, to all persons in
a country or delimited territory".
• A census is a massive undertaking to contact every member of the
population in a given time and collect a variety of information.
• It provides information on total population, population density
per square kilometre of land area, decadal growth
rate, literacy rate, economic conditions, occupational
characteristics, and selected indicators of mortality such as the overall
death rate and infant mortality rate.
8

Census
• Methods of collection of data:
(a) De facto method: Persons are enumerated according
to their location at the time of enumeration. This method
is used in developing countries like India.
(b) De jure method: Persons are assigned according
to their "usual" place of residence and not according to
their location at the time of the census, as in the de
facto method. This method is used in developed
countries like the U.S.A.
• The de jure method gives a better indication of the permanent
population and related socio-demographic factors of an
area, though it is more expensive.
• The main drawback of a census is that the full results are usually
not available quickly.
9

Sample Registration System
• The SRS is undertaken under the authority of the Registrar General of India.
• At present there are more than 3,700 sample units, comprising villages and
urban blocks.
• Each rural sampling unit is a complete village (maximum population of
1,500), while each urban sampling unit is equivalent to an urban census
enumeration block with a population of 750 to 1,000.
• The SRS is based on a double-recording method.
• The first part of record collection is done by a part-time enumerator
(usually a local school teacher) in his or her area.
• In the second part, once every six months, a full-time enumerator from the SRS
department independently collects data from all the
households in the sample villages and urban blocks.
• These two sources of information are cross-checked, and data that
do not match, or only partially match, are then verified in the
field.
• The reports are published as an annual report and half-yearly bulletins,
which provide age and sex structure, fertility indicators and age- and sex-specific
death rates.
10

Registration of vital events
• It is the precursor of health statistics.
• If registration of vital events is complete and accurate, it can
serve as a reliable source of health information.
• The United Nations defines a vital events registration system as the "legal registration, statistical recording
and reporting of the occurrence of, and the collection,
compilation, presentation, analysis and distribution of statistics
pertaining to vital events, i.e., live births, deaths, foetal
deaths, marriages, divorces, adoptions, legitimations,
recognitions, annulments and legal separations".
• In practice, these data are grossly deficient in accuracy,
timeliness, completeness and coverage.
• Contributing reasons include a lack of uniformity in the
collection, compilation and transmission of data, which
differ between rural and urban areas, and multiple registration
agencies.
11

The Central Births and Deaths
Registration Act, 1969
• Govt. of India promulgated the Central Births and Deaths
Registration Act in 1969. The Act came into force on 1 April 1970.
• The Act provides for compulsory registration of births and deaths
throughout the country, and compilation of vital statistics.
• The Act also fixes the responsibility for reporting births and
deaths. While the public (e.g., parents, relatives) are to report
events occurring in their households, the heads of hospitals,
nursing homes, hotels, jails or dharmashalas are to report events
occurring in such institutions to the Registrar concerned.
• The time limit for registering births and
deaths is 21 days, uniformly all over India.
• In case of default a late fee can be imposed.
12

Notification of disease
• Historically notification of infectious diseases was the first health information
sub-system to be established.
• The primary purpose is to effect prevention and/or control of the disease.
• Notification is also a valuable source of morbidity data
• Usually diseases which are considered to be serious menaces to public health
are included in the list of notifiable diseases
• At the international level, the following diseases are notifiable to WHO in
Geneva under the International Health Regulations (IHR) , viz. cholera, plague
and yellow fever.
• A few others (louse-borne typhus, relapsing fever, polio, influenza, malaria,
rabies and salmonellosis) are subject to international surveillance.
• Limitations:
(a) Notification covers only a small part of the total sickness in the community.
(b) The system suffers from a good deal of under-reporting.
(c) Atypical and subclinical cases in particular escape notification due to
non-recognition.
13

Hospital record
• In countries like India, where registration of vital events is defective and
notification of infectious diseases is extremely inadequate, hospital data
constitute a basic and primary source of information about diseases
prevalent in the community.
• The main drawbacks:
(a) They constitute only the "tip of the iceberg", i.e., they provide
information on only those patients who seek medical care, not on
a representative sample of the population. Mild and subclinical
cases are usually missed.
(b) The admission policy may vary from hospital to hospital; therefore
hospital statistics tend to be highly selective.
(c) The population served by a hospital (population at risk) cannot be defined.
There are no precise boundaries to the catchment area of a hospital. In
effect, hospital statistics provide only the numerator (i.e., the cases),
not the denominator. Extrapolating hospital data to an entire
community to estimate frequency rates is highly conjectural.
14

Hospital record
• Hospital data provide information on the following
aspects:
(a) geographic sources of patients
(b) age and sex distribution of different diseases and
duration of hospital stay
(c) distribution of diagnoses
(d) association between different diseases
(e) period between disease and hospital admission
(f) distribution of patients according to different
social and biological characteristics
(g) cost of hospital care
15

Organization and Presentation of Data
(Descriptive Statistics)
17

Statistics
• Descriptive: organising, summarising & describing data
• Correlational: relationships
• Inferential: generalising, significance
18

What is Descriptive Statistics?
• Before we can interpret or communicate the
information provided by an investigation, the raw
data must be organized and presented in a clear and
intelligible way.
• Descriptive statistics describe specific characteristics
of data:
- how many cases fall into a category,
- typical values,
- the degree of inter-relationship or correlation
among measurements/variables.
19

First Step: MASTER TABLE
• It is the first step of data compilation
• All the data/information are compiled in a table
• All the data for each subject are recorded row-wise (one row per subject)
• Variables are mentioned at the top of each column
• All tables and diagrams are prepared from this Master
Table.
• The title of the research work/project is the title of the
master table.
• Example: an Excel data sheet
20
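A master table can also be held in code rather than in a spreadsheet. The following is a minimal Python sketch, assuming a small hypothetical data set (the subject IDs, variable names and values are invented for illustration): each row is one subject and each key is one variable.

```python
# Hypothetical master table: one row (dict) per subject, one key per variable.
master_table = [
    {"id": 1, "gender": "Male", "age": 67, "height_cm": 165.0},
    {"id": 2, "gender": "Female", "age": 72, "height_cm": 158.5},
    {"id": 3, "gender": "Female", "age": 65, "height_cm": 161.0},
]

# The variables are the column headings of the table.
variables = list(master_table[0].keys())

# Later tables and diagrams are derived from the master table,
# e.g. the values of one variable across all subjects.
genders = [row["gender"] for row in master_table]

print(variables)
print(genders)
```

Keeping one row per subject makes it straightforward to derive any frequency table from a single source.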

Organization and Presentation of Discrete Data
(Nominal or Ordinal Scale)
• Tabular Presentation
- Frequency Distribution Table
• Graphic Presentation
- Bar Diagram/Chart
- Pie Diagram/Chart
- Pictogram
- Spot Map
21

Organization and Presentation of Continuous Data
(Interval or Ratio Scale)
• Tabular Presentation
- Grouped Frequency Distribution Table
• Graphic Presentation
- Histograms
- Frequency Polygons
- Ogive
- Scatter Diagram
22

General Principles in Designing a Table
• Tables should be numbered (Table 1, Table 2)
• Title of table: simple, brief & self explanatory
• Headings of columns & rows: clear & concise
• Data must be presented meaningfully, according
to size or importance, alphabetically or
geographically
• Footnotes may be given if necessary
24

Discrete/Categorical Data:
Frequency Distribution Table
• Table 1: Gender-wise distribution of geriatric population of
Matigara block, Darjeeling
Gender      Number    Percentage
Male            50          47.6
Female          55          52.4
Total          105         100.0
25
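The counts and percentages of a frequency distribution table for categorical data can be produced as follows. This is a minimal sketch: the raw observations are rebuilt from the counts in Table 1, and the function name `frequency_table` is an illustrative choice.

```python
from collections import Counter

# Rebuild the raw observations from the counts in Table 1 (50 men, 55 women).
observations = ["Male"] * 50 + ["Female"] * 55


def frequency_table(values):
    """Return (category, count, percentage) rows for categorical data."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.items()]


table = frequency_table(observations)
for category, number, percentage in table:
    print(f"{category:<8}{number:>6}{percentage:>8.1f}")
```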

Continuous Data:
Grouped Frequency Distribution Table
• Table 2: Distribution of the geriatric population according to
their height (N = 105)
Height in cm    Number    Percentage
160 – 162            4           3.8
162 – 164           11          10.5
164 – 166           12          11.4
166 – 168           12          11.4
168 – 170           16          15.2
170 – 172           15          14.3
172 – 174           12          11.4
174 – 176           10           9.5
176 – 178            8           7.6
178 – 180            5           4.8
Total              105         100.0
26
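Grouping continuous data into equal class intervals can be sketched as below. The raw heights are hypothetical (the slide gives only the grouped counts), and the convention that a value on a shared boundary such as 162 falls in the upper class is an assumption, since the table does not state it.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count values in equal-width classes [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class containing v
        if 0 <= k < n_classes:
            counts[k] += 1
    return [
        (start + k * width, start + (k + 1) * width, counts[k])
        for k in range(n_classes)
    ]


# Hypothetical raw heights in cm (not the data behind Table 2).
heights = [160.5, 161.9, 162.0, 163.4, 165.2, 165.9, 166.0, 167.7]
classes = grouped_frequency(heights, start=160, width=2, n_classes=4)

for lower, upper, count in classes:
    print(f"{lower} - {upper}: {count}")
```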

Contingency Table (2 × N)
Table 3: Prevalence of metabolic syndrome in menopausal women (n = 123)
                  Metabolic syndrome
Menopause     Yes           No            Total
Yes           33 (60)       22 (40)       55 (100)
No            17 (25)       51 (75)       68 (100)
Total         50 (40.7)     73 (59.3)     123 (100)
Note: Figures in parentheses indicate row percentages.
Test statistic: Pearson Chi-Square = 16.982, df = 1, P value < 0.05 (0.000)
27

Relative Frequency Table
Serum CK (U/L)   Frequency   Relative Frequency   Cumulative Rel. Frequency
20-39                1            0.028                  0.028
40-59                4            0.111                  0.139
60-79                7            0.194                  0.333
80-99                8            0.222                  0.556
100-119              8            0.222                  0.778
120-139              3            0.083                  0.861
140-159              2            0.056                  0.917
160-179              1            0.028                  0.944
180-199              0            0.000                  0.944
200-219              2            0.056                  1.000
Total               36            1.000
28
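Relative and cumulative relative frequencies follow directly from the class frequencies. The sketch below uses the serum CK frequencies from the table; it accumulates the counts first and divides once, which avoids adding up rounding errors in the third decimal place.

```python
from itertools import accumulate

# Class labels and frequencies from the serum CK table.
classes = ["20-39", "40-59", "60-79", "80-99", "100-119",
           "120-139", "140-159", "160-179", "180-199", "200-219"]
freqs = [1, 4, 7, 8, 8, 3, 2, 1, 0, 2]

total = sum(freqs)

# Relative frequency = class frequency / total number of observations.
rel = [f / total for f in freqs]

# Cumulative relative frequency = running total of counts / total.
cum_rel = [c / total for c in accumulate(freqs)]

for label, f, r, c in zip(classes, freqs, rel, cum_rel):
    print(f"{label:<10}{f:>4}{r:>8.3f}{c:>8.3f}")
```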

Frequency Distribution Tables: Comparative Features
For Categorical Data:
• Presentation is very simple
• There is no range of variability
• No class interval
• Frequency is displayed according to categories
For Continuous Data:
• Presentation is more cumbersome
• First split into convenient groups
• Frequency in each group is shown in adjacent column
• Class interval should be equal and not too broad or too narrow
• The number of classes should not be too many or too few
29

Graphical Presentation
(Discrete/Categorical Data)
30

Bar Diagram
• The length of the bar indicates the frequency
• Frequency is usually marked on the vertical axis
• The characteristic is shown on the base line
• It presents the relative values of a qualitative
character in discrete data
• The space between two bars should be adjusted so
that the presentation appears neat, clean and
easily comprehensible
• Types: simple, multiple and proportional bar diagrams
31

Fig-1: Distribution of the components of metabolic syndrome among the
study population.
(Bar diagram; x-axis: components of metabolic syndrome, from 5 criteria to 0 criteria; y-axis: number, 0 to 70.)
32

Fig-2: Religion-wise distribution of population in 4 towns
(x-axis: Town)
33

Fig-3: Gender-wise distribution of school children in 3
schools of Matigara, Darjeeling
(x-axis: Schools; y-axis: Frequency)
34

Pie or Sector diagram
• The data are presented in a circle
• The angles and areas of the sectors
denote the relative frequency of each character.
• First, the frequency of each character is
expressed as a percentage; each percentage is then
multiplied by 3.6 to get the angle of the corresponding
sector (100% corresponds to 360°).
35
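The percentage-to-angle rule can be sketched in a few lines. The example reuses the gender counts from Table 1; the function name `sector_angles` is an illustrative choice.

```python
def sector_angles(counts):
    """Convert category counts to pie-sector angles in degrees."""
    total = sum(counts.values())
    angles = {}
    for category, n in counts.items():
        percentage = 100 * n / total
        angles[category] = round(percentage * 3.6, 1)  # 1% = 3.6 degrees
    return angles


# Counts from Table 1 (50 men, 55 women).
angles = sector_angles({"Male": 50, "Female": 55})
print(angles)
```

The angles should add up to 360 degrees, apart from rounding.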

Fig-4: Pie diagram showing distribution of study population according to caste.
(Sector labels: General, Scheduled Caste, Scheduled Tribes, OBC; sector percentages shown: 40%, 17%, 6%, 37%.)
36

Graphical Presentation
(Continuous Data)
37

Histogram
• It is a set of contiguously drawn bars showing a
frequency distribution of continuous data.
• The bars are drawn for each group of values such
that the area is proportional to the frequency in
that group.
• The vertical axis can represent percentages instead
of frequencies.
• The variable should be continuous.
• Also known as an area diagram.
38

Frequency Histogram for Blood Pressure Data
(Figure: histogram; x-axis: Systolic Blood Pressure, class boundaries from 99.5 to 211.5 in steps of 14; y-axis: Frequency, 0 to 18.)
39

Frequency Polygon
• A frequency polygon can be drawn over a histogram.
• The frequencies are plotted at the midpoints of the class intervals.
• The points are joined by straight lines.
(Figure: Frequency Polygon for B.P.; x-axis: Systolic Pressure, class midpoints from 92.5 to 218.5 in steps of 14; y-axis: Frequency, 0 to 18.)
40
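The x-coordinates of a frequency polygon are the class midpoints. The sketch below derives them from the class boundaries shown on the blood pressure histogram axis (99.5 to 211.5 in steps of 14). Adding one empty class at each end, so that the polygon returns to the axis, is a common convention assumed here; the bar heights are not recoverable from the slide, so only the x-values are computed.

```python
# Class boundaries read from the blood pressure histogram axis.
boundaries = [99.5 + 14 * k for k in range(9)]  # 99.5, 113.5, ..., 211.5

# Midpoint of each class interval = (lower boundary + upper boundary) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Assumed convention: one empty class at each end closes the polygon.
width = boundaries[1] - boundaries[0]
polygon_x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]

print(midpoints)
print(polygon_x)
```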

Cumulative Frequency Polygon or Ogive
• A frequency distribution table is made.
• It is converted to a cumulative frequency table.
• The cumulative frequency of a group is the total number of
observations from the lowest value up to and including
that group.
• The cumulative frequencies are plotted against
the upper limits of the groups.
• Joining the points by a freehand curve gives the ogive.
41
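The points of an ogive can be computed as a running total of the class frequencies. This sketch uses the height classes and counts from Table 2, pairing each cumulative frequency with the upper limit of its class.

```python
from itertools import accumulate

# Upper class limits (cm) and frequencies from Table 2.
upper_limits = [162, 164, 166, 168, 170, 172, 174, 176, 178, 180]
freqs = [4, 11, 12, 12, 16, 15, 12, 10, 8, 5]

# Cumulative frequency: observations up to and including each class.
cum_freqs = list(accumulate(freqs))

# Points to plot: (upper limit, cumulative frequency).
ogive_points = list(zip(upper_limits, cum_freqs))
print(ogive_points)
```

The last cumulative frequency equals the total number of observations (105).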

Ogive for Blood Pressure Data
(Figure: Blood Pressures of 50 Subjects; x-axis: Systolic Pressure, 99.5 to 211.5; y-axis: Cumulative Frequency, 0 to 60.)
42

Scatter Diagram
• Graphic presentation to show the nature of the
correlation between two variables.
• Shows the direction of the relationship.
• It may be positive linear, negative linear or
non-linear correlation.
• Also known as correlation diagram.
43

Scatter Diagrams
(Figure: six scatter plots, panels a) to f), illustrating negative correlation, positive correlation and constant correlation.)
44

Simple Descriptive Statistics
• After summarizing data in a frequency distribution,
simple 'statistics' are calculated for:
- comparing relative frequencies in specific categories
- understanding comparative trends in the data
• Commonly used statistics:
- Ratio
- Proportion
- Percentage
- Rate (Example: Incidence Rate, Prevalence Rate)
46

Measures of Rate, Ratio and Proportion
• Rate: Measures the occurrence of some particular events in a
population during a given period of time.
Four elements: Numerator, Denominator, time specification and multiplier.
Examples:
- Crude rate: actual observed rate
- Specific rate: actual observed rate due to specific cause
- Standardized rate: by direct or indirect method of standardization
• Ratio: Expresses the relation in size between two quantities.
• Proportion: It is a ratio in which the numerator is a part of the
denominator, i.e. it expresses the relation of a part to the whole.
47
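The difference between a proportion, a ratio and a rate can be illustrated numerically. In this sketch the proportion and ratio use the counts from Table 3 (50 women with and 73 without metabolic syndrome); the figures used for the rate (30 new cases in one year in a population of 10,000, multiplier 1,000) are hypothetical and chosen only to show the four elements of a rate.

```python
# Counts from Table 3: metabolic syndrome present / absent among 123 women.
with_ms, without_ms = 50, 73
total = with_ms + without_ms

# Proportion: the numerator is part of the denominator (part / whole).
proportion = with_ms / total
percentage = 100 * proportion

# Ratio: relation in size between two separate quantities.
ratio = with_ms / without_ms

# Rate: numerator, denominator, time specification and multiplier.
# Hypothetical figures: 30 new cases in one year, population 10,000.
new_cases, population, multiplier = 30, 10_000, 1000
rate_per_1000_per_year = new_cases * multiplier / population

print(f"Proportion: {proportion:.3f} ({percentage:.1f}%)")
print(f"Ratio (with : without): {ratio:.2f}")
print(f"Rate: {rate_per_1000_per_year:.1f} per 1000 per year")
```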

Measures of Central Tendency and
Dispersion
• Measures of Central Tendency
- Mean
- Mode
- Median
• Measures of Dispersion
- Range
- Average deviation
- Variance
- Standard deviation
48

Measures of Central Tendency (Location)
Measures of location indicate where on the
number line the data are to be found. Common
measures of location are:
(i) the Arithmetic Mean,
(ii) the Median, and
(iii) the Mode
49

The Mean
• Let x1, x2, x3, …, xn be the realised values of a random variable X,
from a sample of size n. The sample arithmetic mean is
defined as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example: The systolic blood pressures of seven middle-aged
men were as follows: 151, 124, 132, 170, 146, 124
and 113.
$$\bar{x} = \frac{151 + 124 + 132 + 170 + 146 + 124 + 113}{7} = 137.14$$
50

The Median and Mode
• If the sample data are arranged in increasing
order, the median is
(i) the middle value if n is an odd number, or
(ii) midway between the two middle values if n is
an even number
• The mode is the most commonly occurring value.
51

Example 1: if n is odd
The reordered systolic blood pressure data seen earlier are:
113, 124, 124, 132, 146, 151, and 170.
The Median is the middle value of the ordered data, i.e. 132.
Two individuals have systolic blood pressure = 124 mm Hg,
so the Mode is 124.
Example 2: if n is even
Six men with high cholesterol participated in a study to investigate the effects of diet
on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL)
were as follows:
366, 327, 274, 292, 274 and 230.
Rearrange the data in numerical order as follows: 230, 274, 274, 292, 327 and 366.
The Median is halfway between the middle two readings, i.e. (274 + 292) ÷ 2 = 283.
Two men have the same cholesterol level, so the Mode is 274.
52
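The two worked examples can be checked with Python's standard `statistics` module. This is a minimal sketch using the blood pressure and cholesterol values given in the examples.

```python
from statistics import mean, median, mode

# Systolic blood pressures (mm Hg) of the seven men.
bp = [151, 124, 132, 170, 146, 124, 113]

# Cholesterol levels (mg/dL) of the six men.
chol = [366, 327, 274, 292, 274, 230]

bp_mean = mean(bp)          # sum of values divided by n
bp_median = median(bp)      # middle value, n odd
bp_mode = mode(bp)          # most common value

chol_median = median(chol)  # average of the two middle values, n even
chol_mode = mode(chol)

print(round(bp_mean, 2), bp_median, bp_mode)
print(chol_median, chol_mode)
```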

Mean versus Median
• Large sample values tend to inflate the mean. This will
happen if the histogram of the data is right-skewed.
• The median is not influenced by large sample values and is a
better measure of centrality if the distribution is skewed.
• Note: if mean = median = mode, the data are said to be
symmetrical.
- e.g. in the CK measurement study, the sample mean =
98.28 and the median = 94.5, i.e. the mean is larger than the median,
indicating that the mean is inflated by the two large data values 201
and 203.
53

Measures of Dispersion
• Measures of dispersion characterise how spread
out the distribution is, i.e., how variable the data
are.
• Commonly used measures of dispersion include:
1. Range
2. Variance & Standard deviation
3. Coefficient of Variation (or relative standard deviation)
4. Inter-quartile range
54

Range
• The sample range is the difference between
the largest and smallest observations in the
sample
• Easy to calculate
– Blood pressure example: min = 113 and
max = 170, so the range = 57 mmHg
• Useful for "best" or "worst" case scenarios
• Sensitive to extreme values
55

Sample Variance
• The sample variance, s², is, apart from the divisor n − 1 used in place of n, the arithmetic
mean of the squared deviations from the
sample mean:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
56

Standard Deviation
• The sample standard deviation, s, is the square root
of the variance:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
• s has the advantage of being in the same units
as the original variable x.
57

Example
Data (x)        Deviation (x − x̄)    Deviation² (x − x̄)²
151                  13.86                 192.02
124                 -13.14                 172.73
132                  -5.14                  26.45
170                  32.86                1079.59
146                   8.86                  78.45
124                 -13.14                 172.73
113                 -24.14                 582.88
Sum = 960.0      Sum = 0.00           Sum = 2304.86
$\bar{x} = 137.14$
58

Example (contd.)
$$\sum_{i=1}^{7}(x_i - \bar{x})^2 = 2304.86$$
$$s = \sqrt{\frac{2304.86}{7 - 1}} = 19.6$$
59
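The calculation above can be reproduced step by step. The sketch below computes the deviations, their squares, the sample variance (divisor n − 1) and the standard deviation for the blood pressure data, and compares the result with `statistics.stdev` from the standard library.

```python
from statistics import mean, stdev

# Systolic blood pressures (mm Hg) of the seven men.
bp = [151, 124, 132, 170, 146, 124, 113]

n = len(bp)
x_bar = mean(bp)

# Deviation of each observation from the sample mean.
deviations = [x - x_bar for x in bp]

# Sum of squared deviations.
sum_sq = sum(d ** 2 for d in deviations)

# Sample variance uses the divisor n - 1; the SD is its square root.
variance = sum_sq / (n - 1)
sd = variance ** 0.5

print(round(sum_sq, 2), round(variance, 2), round(sd, 1))
print(round(stdev(bp), 1))  # library result for comparison
```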

Coefficient of Variation
• The coefficient of variation (CV) or relative standard
deviation (RSD) is the sample standard deviation
expressed as a percentage of the mean, i.e.
$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$
• The CV is not affected by multiplicative changes in scale.
• Consequently, it is a useful way of comparing the dispersion
of variables measured on different scales.
60

Example
The CV of the blood pressure data is:
$$CV = 100 \times \frac{19.6}{137.1}\% = 14.3\%$$
i.e., the standard deviation is 14.3% as large
as the mean.
61
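The CV of the blood pressure data can be verified in code, together with the claim that it is unchanged by a multiplicative change of scale. This is a minimal sketch; the factor of 10 used for rescaling is an arbitrary illustrative choice.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)


# Systolic blood pressures (mm Hg) of the seven men.
bp = [151, 124, 132, 170, 146, 124, 113]
cv = coefficient_of_variation(bp)

# Multiplying every value by a constant leaves the CV unchanged.
cv_rescaled = coefficient_of_variation([10 * x for x in bp])

print(round(cv, 1), round(cv_rescaled, 1))
```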

Inter-quartile range
• The Median divides a distribution into two halves.
• The first and third quartiles (denoted Q1 and Q3) are defined
as follows:
– 25% of the data lie below Q1 (and 75% is above Q1),
– 25% of the data lie above Q3 (and 75% is below Q3)
• The inter-quartile range (IQR) is the difference between the
first and third quartiles, i.e. IQR = Q3- Q1
62
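Quartiles and the IQR can be computed with `statistics.quantiles`. This sketch applies it to the blood pressure data used earlier. Several conventions exist for locating quartiles in small samples; the function's default "exclusive" method is assumed here, and other conventions can give slightly different values.

```python
from statistics import quantiles

# Systolic blood pressures (mm Hg) of the seven men.
bp = [151, 124, 132, 170, 146, 124, 113]

# n=4 splits the data into quarters and returns Q1, Q2 (median) and Q3.
q1, q2, q3 = quantiles(bp, n=4)

iqr = q3 - q1
print(q1, q2, q3, iqr)
```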

Normal Distribution: Normal Curve
63

Approaching a Continuous Density Curve
64

Cut the width of the Class Intervals in Half
65

Cut the width of the Class Intervals in Half Again
66

We Approach a Continuous Density [Normal] Curve
68

Normal Curve
• Described by the mathematician Gauss (hence also known as the
Gaussian curve)
• Many biological variables are approximately normally distributed.
• Can be represented as a frequency polygon.
• Can be plotted for a continuous variable.
• An equal number of cases fall below and above the mean.
• Bell shaped; most data occur in the middle.
• Symmetric, with mean, median and mode at the same point.
69

Standard Scores
• One use of the normal curve is to explore Standard
Scores. Standard Scores are expressed in standard
deviation units, making it much easier to compare
variables measured on different scales.
• There are many kinds of Standard Scores. The most
common standard score is the 'z' score.
• A 'z' score states the number of standard deviations by
which the original score lies above or below the mean of
a normal curve.
70

The Z Score
• The normal curve is not a single curve but a family of
curves, each of which is determined by its mean and
standard deviation.
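• In order to work with a variety of normal curves, we
cannot have a table for every possible combination of
means and standard deviations.
• Instead, values are converted to z scores so that a single
table for the standard normal curve can be used.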
71

Z Score (Standard Score)
• $Z = \frac{X - \mu}{\sigma}$
• Z indicates how many standard
deviations away from the mean the
point X lies.
• The Z score is calculated to 2 decimal
places.
72
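The z score formula translates directly into code. In this sketch the mean (137.1) and standard deviation (19.6) from the blood pressure example are used as if they were the population values μ and σ, which is an assumption made purely for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies from the mean."""
    return (x - mu) / sigma


# Assumed values for illustration: mean 137.1 and SD 19.6 mm Hg.
mu, sigma = 137.1, 19.6

z_high = round(z_score(170, mu, sigma), 2)  # a value above the mean
z_low = round(z_score(113, mu, sigma), 2)   # a value below the mean

print(z_high, z_low)
```

A positive z score means the value lies above the mean, a negative one below it.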

The Standard Normal Curve
• The Standard Normal Curve (z distribution) is the
distribution of normally distributed standard scores
with mean equal to zero and a standard deviation of
one.
• A z score is nothing more than a figure, which
represents how many standard deviation units a raw
score is away from the mean.
73

Characteristics of Normal Distribution
• The curve is symmetrical about the mean; hence Mean = Median
• The total area under the curve is 1 (or 100%)
• Normal Distribution has the same shape as
Standard Normal Distribution.
74

Essential Features
• The mean ± 1 standard deviation covers
68.3% of the area under the curve
• The mean ± 2 standard deviations covers 95.4%
of the area under the curve
• The mean ± 3 standard deviations covers
99.7% of the area under the curve
76
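The areas under the normal curve can be checked numerically with `statistics.NormalDist` from the Python standard library. This is a minimal sketch using the standard normal curve (mean 0, standard deviation 1); the helper name `area_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve between mean - k SD and mean + k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * area_within(k):.1f}%")
```

Because every normal curve has the same shape once expressed in z scores, the same areas apply whatever the mean and standard deviation.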
