Arba Minch University
College of Medicine & Health Sciences
School of Public Health, Department of
Public Health
Epidemiology and Biostatistics unit
By Kusse Otayto(BSc,MPH in Epi/Biostatistics)
Descriptive statistics
 It deals with the description of data in a clear &
informative manner using tables, graphs &
numerical summary
 It involves the organization & summarization of a
body of data with one or more meaningful tools.
 It helps to identify the general features & trends in a
set of data & to extract useful information
 Also very important in conveying the final results of
a study
 Data
 Are information collected from the source or
 Are the raw materials of statistics
 Are numbers which can be obtained by
measurements or counting
 Data are made up of a set of variables
 They can be obtained from counting, routinely kept
records, surveys, experiments, reports…
 Types of data
1. Primary data
2. Secondary data
1. Primary data:
 Are data collected directly from the items or individual
respondents by the researchers themselves for the
purpose of a study.
 Advantages of primary data
1. The data is original
2. Possibility of flexibility
3. Source for extensive research
 Disadvantages of primary data
1. Expensive & time consuming
2. Possibility of personal prejudice (biases)
2. Secondary data:
 Are data that have been collected & statistically treated
by other people or organizations; the information they
contain is then used for another purpose by other people
 Obtained from journals, reports, government
publications
 Advantages of secondary data
1. Are readymade
2. Relatively cheaper
3. Lesser degree of personal prejudice
 Disadvantages of secondary data
1. Lacks originality
2. May or may not suit the objects of enquiry (Not source
for extensive research)
3. Must be used with great care & caution
Methods of data collection
 Before any statistical work can be done, data must be
collected.
 Data collection is a crucial stage in the planning &
implementation of a study
 If the data collection has been superficial, biased or
incomplete, data analysis becomes difficult, & the research
report will be of poor quality.
 Therefore, we should concentrate all possible efforts on
developing appropriate tools, & should test them several
times.
 Depending on the type of variable & the objective of the
study different data collection methods can be employed:
observation, interview, & self-administered written
questionnaire.
A. Observation
 Is a technique that involves systematically selecting,
watching & recording behavior & characteristics of
living things, objects or phenomena.
 It includes all methods from simple visual
observation to the use of sophisticated equipment
 It can be undertaken in the following ways:
1. Participant observation:
 The observer takes part in the situation he or she
observes.
2. Non-participant observation:
 The observer watches the situation, openly or
concealed, but does not participate
 Observations can give additional, more accurate information
on behavior of people than interviews or questionnaires
 Observations can also be made on objects
 Outline the guidelines for the observations prior to actual
data collection.
 Advantages
 Gives relatively more accurate data on behavior &
activities
 Disadvantages:
 The investigator's or observer's own biases
 Needs more resources & skilled personnel when
sophisticated equipment is used.
B. Interview (face-to-face)
 Is a data collection technique that involves oral
questioning of respondents, either individually or as a
group
 Answers to the questions posed during an interview can
be recorded by:
1. Writing them down (either during the interview itself
or immediately after the interview) or
2. By tape-recording the responses, or
3. By a combination of both.
 Advantages of face-to-face interview
 Can stimulate & maintain the respondent’s interest
 Can create rapport (a bond of understanding & concord)
 Observations can be made as well.
 Disadvantage
 It is time consuming & expensive
1. In-depth interview
 It is a conversation between the researcher & the
subject about the research area or topic.
 It is designed to allow the respondent to tell their
story in their own way
 Issues are covered in detail; the respondent leads the
interview & sets the agenda; there is no fixed order
 Important in:
 Highly sensitive issues
 Geographically dispersed respondents
 When peer pressure is expected to distort facts
 It costs more & takes more time than an FGD (focus group discussion)
2. Focus group discussions
 It allows a group of 8-12 informants to freely discuss
a certain subject with the guidance of a facilitator or
reporter
Advantages
 Group interaction stimulates richer responses & the
emergence of new ideas
 The researcher observes & gets first-hand insights
 Can be done more quickly & is generally less expensive
than in-depth interviews
Disadvantage
 Not suitable for highly sensitive issues
C. Using self-administered written questionnaire
 Is a data collection tool in which written questions
are presented that are to be answered by the
respondents in written form
 It can be administered in different ways, such as by:
 Sending questionnaires by mail with clear
instructions
 Gathering all or part of the respondents in one place
at one time, giving oral or written instructions, &
letting the respondents fill them out
 Hand-delivering questionnaires to respondents &
collecting them later
 The questions can be either open-ended or
closed
A. Example of closed ended question
1. What is the current breastfeeding status of the mother?
A. Exclusive breastfeeding
B. Partial breastfeeding
C. Not breastfeeding
B. Example of Open ended question
1. At what age should the child start supplementary
food? Why?
Advantages
 Is simpler & cheaper than interview
 Can be administered to many persons
simultaneously
 Can be sent by post.
Disadvantages
 It demands a certain level of education & skill of
respondents
 If the questionnaire is mailed, people of low socio-
economic status are less likely to respond to it
Variable
 Is a characteristic which takes different values in
different PPT (persons, places, or things).
 Any aspect of an individual or object that is
measured (e.g. BP) or recorded (e.g. age, sex) & that
can take different values.
 There may be one or many variables in a study
Types of variables
A. Qualitative (categorical) variables
 Nominal
 Ordinal
B. Quantitative (numerical) variables
 Continuous
 Discrete
By their role in a study:
1. Dependent (outcome, response) variable
2. Independent (exposure, explanatory) variable
Variable
1. Categorical(Qualitative) variable
 A variable that cannot be measured in
quantitative form but can only be sorted by name or
category
 It cannot be measured the way we measure height or
weight
 The notion of magnitude is absent or only implicit.
 Categories must not overlap & must cover all
possibilities
Categorical variable is divided into two:
1. Nominal variable
 The values fall into un-ordered categories or classes
 Uses names, labels or symbols to assign each
measurement.
 Examples: Blood type (A, B, AB, O), Sex
(male/female)
2. Ordinal variable
 Assigns each measurement to one of a limited number of
categories that are ranked in terms of order.
 Although non-numerical, can be considered to have a
natural ordering
 Examples:
1. Cancer stages: 1, 2, 3, 4
2. Pain severity: no pain, slight pain, moderate pain, severe
pain
B. Quantitative (numerical) variable
 A variable that can be measured or counted & expressed
numerically.
 Has the notion of magnitude.
 E.g. Height, weight, # of children, etc.
 Quantitative variable is divided into two:
1. Discrete variable
 It can take only distinct, separate values, usually
whole numbers (counts)
 Characterized by gaps or interruptions in the values.
 Both the order & magnitude of the values matter.
 The values are not just labels, but are actual measurable
quantities.
 E.g. Number of children in a household (0, 1, 2, 3, etc.)
2. Continuous variable
 It can have an infinite number of possible values in
any given interval or within some range
 Both the magnitude & the order of the values matter
 Does not possess the gaps or interruptions
 E.g. Weight (50.123...), Height (1.342...)
Manipulation of variables
 Continuous variables can be discretized
 E.g. Age (e.g. 1 1/12 years) can be rounded to whole
numbers (1 year)
 Continuous or discrete variables can be categorized
 E.g. Age categories- 1(1-5), 2(6-10), 3(11-15)
 Categorical variables can be re-categorized
 E.g. marital status (Single, Married, Divorced,
Widowed) can be lumped from 4 categories down to 2
(married, single)
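The three manipulations above can be sketched in Python. This is a minimal illustration and is not part of the original notes. The function names are hypothetical, and ages are discretized by keeping completed years (floor). The mapping of "Divorced" and "Widowed" to "single" is an assumption, since the notes only say the four categories are lumped into "married" and "single".

```python
import math

def discretize_age(age_years: float) -> int:
    """Continuous -> discrete: keep completed whole years (e.g. 1 1/12 years -> 1)."""
    return math.floor(age_years)

def age_category(age: int) -> int:
    """Discrete -> categorical, using the example groups 1 (1-5), 2 (6-10), 3 (11-15)."""
    if 1 <= age <= 5:
        return 1
    if 6 <= age <= 10:
        return 2
    if 11 <= age <= 15:
        return 3
    raise ValueError(f"age {age} is outside the 1-15 range used in the example")

# Re-categorizing: 4 marital-status categories lumped into 2.
# ASSUMPTION: "Divorced" and "Widowed" are counted as "single".
MARITAL_LUMPING = {
    "Single": "single",
    "Married": "married",
    "Divorced": "single",
    "Widowed": "single",
}

ages = [1 + 1 / 12, 4.5, 7.9, 12.2]
whole_years = [discretize_age(a) for a in ages]       # [1, 4, 7, 12]
categories = [age_category(a) for a in whole_years]    # [1, 1, 2, 3]
marital = [MARITAL_LUMPING[s] for s in ["Single", "Married", "Divorced", "Widowed"]]
print(whole_years, categories, marital)
```

Each step discards information: once ages are grouped, the exact values cannot be recovered from the categories.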
1. Independent variables
 Precede (come before) dependent variables in time
 Are often manipulated by the researcher
2. Dependent variables
 What is measured as an outcome in a study
 Values depend on the independent variable
 Example
1. Health education involving active participation of mothers
will produce more positive changes in child feeding than
health education based on lectures.
 Independent variable:
 Type of health education
 Dependent variable:
 Changes in child feeding
Scales of Measurement
 Scales of measurement
 Is an assignment of numbers to subjects, objects or
events(variables) in which we are interested according to
a set of rules
 Measurement is a way of refining our ordinary
observations so that we can assign numerical values to
our observations.
 These numbers will provide the raw material for our
statistical analysis.
 Why do we measure things, or worry about the different forms
that measurement may take?
 It allows us to go beyond simply describing the presence
or absence of an event or thing to specifying how much,
how long, or how intense it is.
 With measurement, our observations become more
accurate & more reliable.
 There are four types of scales of measurement.
1. Nominal scale
 Used when data are classified into one of two or
more categories
 The values fall into unordered categories or classes
(they aren't hierarchical; one category isn't "better" or
"higher" than another)
 Uses names, labels or symbols to assign each
measurement.
 Labeling or naming allows us to make qualitative
distinctions or to categorize & then count the
frequency of persons, objects, or things in each
category.
 It should be: Exhaustive & Mutually exclusive
1. Exhaustive :
 Should include all possible answerable responses.
2. Mutually exclusive :
 No respondent should be able to have two attributes
simultaneously
 Not really a ‘scale’ because it does not scale objects along
any dimension
 Assignment of numbers to the categories has no
mathematical meaning, simply for identification
purposes.
 Examples:
1. Marital status(Single, Married, Divorced)
2. Religion (Muslim, Protestant, Orthodox, Catholic)
2. Ordinal scale
 Used when data are classified into logically ordered ranks
 Assigns each measurement to one of a limited number of
categories that are logically ranked in terms of order
 Although non-numerical, it can be considered to have a
natural ordering (the numbers have limited meaning:
4 > 3 > 2 > 1)
 No consistent distance between points of measurement
 Example: Social class (Very poor, Poor, Rich, Very rich)
 The intervals between adjacent categories are not
necessarily equal
3. Interval scale
 Used when data are classified on a scale that assumes
equal distance between numbers
 There are Magnitude + Constant distance b/n points
+ No true zero point + Equal interval b/n adjacent
numbers
 Example: Temperature in °F on 4 consecutive days
Day | A | B | C | D
Temp. (°F) | 50 | 55 | 60 | 65
 For these data, day A (50°F) is not only cooler than
day D (65°F) but is exactly 15°F cooler.
 It has no true zero point ("0" is arbitrarily chosen &
doesn't reflect the absence of temperature)
4. Ratio scale
 Used when data are classified on a scale that assumes
equal distance & a true zero value
 Measurement begins at a true zero point & the scale has
equal intervals
 There are Magnitude + Constant distance b/n points +
Equal ratios + True zero.
 Examples: Height, weight, BP, etc.
 Zero weight or height means the complete absence of
weight or height.
 A 100-kg person has one-half the weight of a 200-kg
person & twice the weight of a 50-kg person.
 It is the most sensitive & powerful type of scale because it
contains the most precise information about each
observation that is made
Decision tree to determine the appropriate scale of
measurement:
Question 1: Is there any order to the numbers?
 No → Nominal scale
 Yes → go to Question 2
Question 2: Are there equal intervals b/n adjacent numbers?
 No → Ordinal scale
 Yes → go to Question 3
Question 3: Is there an absolute (true) zero?
 No → Interval scale
 Yes → Ratio scale
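The three-question decision tree can be written as a small function. This is a minimal sketch: the function name and the boolean parameters are illustrative and are not taken from the notes.

```python
def measurement_scale(ordered: bool, equal_intervals: bool, true_zero: bool) -> str:
    """Walk the three-question decision tree for the scale of measurement."""
    if not ordered:            # Question 1: any order to the numbers?
        return "Nominal"
    if not equal_intervals:    # Question 2: equal intervals b/n adjacent numbers?
        return "Ordinal"
    if not true_zero:          # Question 3: absolute (true) zero?
        return "Interval"
    return "Ratio"

print(measurement_scale(False, False, False))  # blood type
print(measurement_scale(True, False, False))   # pain severity
print(measurement_scale(True, True, False))    # temperature in °F
print(measurement_scale(True, True, True))     # weight in kg
```

The questions are asked in order, so a later question is reached only when the earlier answers are "yes". This mirrors how each scale adds one property to the one before it.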
Why Is Level of Measurement Important?
 Helps you to decide
1. What kind of data display or summary method &
What statistical analysis is appropriate on the values
that were assigned &
2. How to interpret the data from that variable.
Data Organization & Presentation
1. For categorical variables
A. Using table of frequency distribution
1. Frequency counts
2. Relative frequency
3. Cumulative frequency
4. Relative cumulative frequency
B. Using pictorial forms
1. Bar charts(graph)
2. Pie charts
 Ordered array:
 A simple arrangement of individual observations in
order of magnitude.
 Very difficult with a large sample size
2. For quantitative variables
A. Using table of frequency distributions
1. Frequency counts
2. Relative frequency
3. Cumulative frequencies
4. Relative cumulative frequency
B. Using pictorial forms
1. Histogram
2. Frequency polygon
3. Line graph
4. Scatter plot
5. Box plot
6. Ogive/cumulative frequency…
 Frequency table:
 It involves a listing of all the observed values of the variable
being studied & how many times each value is observed.
 Frequency distribution:
 The distribution of the total number of observations among
the various categories is called a frequency distribution.
 Simple & effective way for summarizing large amounts of
data
 Relative Frequency
 It is the proportion or percentages of observations in each
category.
 The distribution of proportions is called the relative
frequency distribution of the variable
 Given a total number of observations, the relative frequency
distribution is easily derived from the frequency distribution.
Frequency table & Frequency Distributions…
Cumulative frequency
 It is the number of observations in the category plus
observations in all categories smaller than it.
Cumulative relative frequency
 It is the proportion of observations in the category
plus observations in all categories smaller than it.
 It is obtained by dividing the cumulative frequency
by the total number of observations.
For example (categorical variable): Birth weight of newborns with levels:
1. Very low
2. Low
3. Normal &
4. Big
Table 1. Distribution of birth weight of newborns b/n 1976-1996 at "X" town.
BWT | Freq. | Cum. freq. | Rel. freq. (%) | Cum. rel. freq. (%)
Very low | 43 | 43 | 43/9974*100 = 0.4 | 43/9974*100 = 0.4
Low | 793 | 43+793 = 836 | 793/9974*100 = 8.0 | 836/9974*100 = 8.4
Normal | 8870 | 836+8870 = 9706 | 8870/9974*100 = 88.9 | 9706/9974*100 = 97.3
Big | 268 | 9706+268 = 9974 | 268/9974*100 = 2.7 | 9974/9974*100 = 100
Total | 9974 | | 100 |
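The four columns of Table 1 can be derived from the raw counts alone. The following Python sketch assumes only the counts shown in the table, listed from the lowest to the highest birth-weight category.

```python
from itertools import accumulate

# Birth-weight categories and counts from Table 1 (ordered lowest to highest)
categories = ["Very low", "Low", "Normal", "Big"]
freqs = [43, 793, 8870, 268]

total = sum(freqs)
cum_freqs = list(accumulate(freqs))               # running totals: 43, 836, ...
rel_pct = [100 * f / total for f in freqs]        # relative frequency (%)
cum_rel_pct = [100 * c / total for c in cum_freqs]  # cumulative relative frequency (%)

for cat, f, cf, rf, crf in zip(categories, freqs, cum_freqs, rel_pct, cum_rel_pct):
    print(f"{cat:<9} {f:>5} {cf:>5} {rf:6.1f} {crf:6.1f}")
print(f"{'Total':<9} {total:>5}")
```

Cumulative columns only make sense because the categories have a natural order. For a nominal variable, only the first two columns would be meaningful.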
 For quantitative variables,
 Select a set of continuous, non-overlapping intervals
such that each value can be placed in one & only one
of the intervals.
 The first consideration is how many intervals to
include
 To determine the number of class intervals & the
corresponding width, we may use:
 Sturges' rule:
$$K = 1 + 3.322\,\log_{10} n, \qquad W = \frac{L - S}{K}$$
 Where
K = Number of class intervals
n = No. of observations
W = Width of the class interval
L = Largest observation
S = Smallest observation
Quantitative variable
1. Example: Leisure time (hours) per week for 40
college students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20 22
14 13 10 19 27 29 22 38 28 34 32 23 19 21 31
16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322 (log n)
K = 1 + 3.322 (log40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
W = (L - S)/K
W = (38 - 10)/6 = 4.67 ≈ 5
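Sturges' rule for the leisure-time data can be checked in a few lines of Python. This is a sketch under one stated assumption: K is rounded to the nearest whole number (as in 6.32 ≈ 6), while W is rounded up so that the classes are sure to cover the whole range. The rounding-up choice is ours, not a rule from the notes.

```python
import math

# Leisure time (hours per week) for the 40 college students in the example
hours = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22,
         14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
         16, 28, 19, 18, 12, 27, 15, 21, 25, 16]

n = len(hours)
k_exact = 1 + 3.322 * math.log10(n)          # Sturges' rule
k = round(k_exact)                           # nearest whole number of classes
w_exact = (max(hours) - min(hours)) / k      # W = (L - S) / K
w = math.ceil(w_exact)                       # ASSUMPTION: round width up

print(f"n={n}, K={k_exact:.2f} -> {k}, W={w_exact:.2f} -> {w}")
```

With K = 6 and W = 5 starting at the minimum value 10, the classes 10-14, ..., 35-39 cover every observation up to the maximum of 38.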
Time (Hours) | Frequency | Relative Frequency | Cumulative Relative Frequency
10-14 | 5 | 0.125 | 0.125
15-19 | 11 | 0.275 | 0.400
20-24 | 12 | 0.300 | 0.700
25-29 | 7 | 0.175 | 0.875
30-34 | 3 | 0.075 | 0.950
35-39 | 2 | 0.050 | 1.00
Total | 40 | 1.00 |
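The table above can be reproduced by counting the 40 observations into classes of width 5. This is a minimal sketch. It assumes whole-number data, so a class with lower limit `lo` contains the values `lo` to `lo + 4`.

```python
from itertools import accumulate

hours = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22,
         14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
         16, 28, 19, 18, 12, 27, 15, 21, 25, 16]

width = 5
lower_limits = list(range(10, 40, width))   # 10, 15, 20, 25, 30, 35

# Count values with lower limit <= x <= upper limit (integer data assumed)
freqs = [sum(lo <= x <= lo + width - 1 for x in hours) for lo in lower_limits]
rel = [f / len(hours) for f in freqs]        # relative frequency
cum_rel = list(accumulate(rel))              # cumulative relative frequency

for lo, f, r, c in zip(lower_limits, freqs, rel, cum_rel):
    print(f"{lo}-{lo + width - 1}  {f:>2}  {r:.3f}  {c:.3f}")
```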
 Class Limit: The range for each class
 Upper class limit
 Lower class limit
 Mid-point (Class mark):
 The value of the interval which lies midway b/n the
lower & the upper limits of a class.
 Class boundary (True limits):
 Are those limits that make an interval of a continuous
variable continuous in both directions
 Upper class boundary
 Lower class boundary
 Subtract 0.5 from the lower class limit & add 0.5 to the upper class limit
(0.5 is half a unit for data recorded in whole numbers, as in this example)
Time (Hours) | True limits (class boundaries) | Mid-point | Frequency
10-14 | 9.5-14.5 | (10+14)/2 = 12 | 5
15-19 | 14.5-19.5 | (15+19)/2 = 17 | 11
20-24 | 19.5-24.5 | (20+24)/2 = 22 | 12
25-29 | 24.5-29.5 | (25+29)/2 = 27 | 7
30-34 | 29.5-34.5 | (30+34)/2 = 32 | 3
35-39 | 34.5-39.5 | (35+39)/2 = 37 | 2
Total | | | 40
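Class boundaries and mid-points follow directly from the class limits. This is a small sketch that assumes whole-number data, so half a unit is 0.5.

```python
# Class limits of the leisure-time table; data are whole numbers, so half a unit = 0.5
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
half_unit = 0.5

boundaries = [(lower - half_unit, upper + half_unit) for lower, upper in classes]  # true limits
midpoints = [(lower + upper) / 2 for lower, upper in classes]                      # class marks

for (lower, upper), (lb, ub), m in zip(classes, boundaries, midpoints):
    print(f"{lower}-{upper}: boundaries {lb}-{ub}, mid-point {m}")
```

Adjacent boundaries coincide (14.5, 19.5, ...), which is what makes the intervals continuous.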
Guidelines for constructing tables
1. Keep them simple (Limit the number of variables to
three or less)
2. All tables should be self-explanatory (Include clear
title telling what, when & where)
3. Clearly label the rows & columns
4. State clearly the unit of measurement used
5. Explain codes & abbreviations in a footnote
6. Show totals
7. If the data are not original, indicate the source in a
footnote.
Pictorial /Diagrammatic presentation
Importance of diagrammatic presentation
1. Diagrams have greater attraction than mere figures
2. They give quick overall impression of the data
3. They have greater memorizing value than mere figures
4. They facilitate comparison
5. Used to understand patterns & trends
 E.g.,
 Skewed or symmetric distribution
 Multiple peaks / modes
 Are there any outliers?
 Relationship between variables.
1. Bar charts (Graphs)
1. Graphical equivalent of a frequency table
2. Categories are listed on the horizontal axis (X-axis)
3. Frequencies or relative frequencies are represented
on the Y-axis (ordinate)
4. The height of each bar is proportional to the
frequency or relative frequency of observations in
that category
Qualitative variable presentation
A. Simple bar chart:-used to represent a single
variable
Fig. 1. Immunization status of children in Adami Tulu Woreda, Feb. 1995 (simple bar chart; x-axis: immunization status (Not immunized, Partially immunized, Fully immunized); y-axis: number of children)
B. Sub-divided (component) bar chart
1. If there are different quantities forming the sub-
divisions of the totals, simple bars may be sub-
divided in the ratio of the various sub-divisions to
exhibit the relationship of the parts to the whole.
2. The order in which the components are shown in one
bar is followed in all bars used in the diagram
Example of a 100% component bar chart:
Fig. 1. Plasmodium species distribution for confirmed malaria cases, Zeway, 2003 (x-axis: August, October, December 2003; y-axis: percent; components: P. falciparum, P. vivax, Mixed)
 Method of constructing bar chart
1. All the bars must have equal width
2. The bars are not joined together (leave space b/n
bars)
3. The different bars should be separated by equal
distances
4. All the bars should rest on the same line called the
base
5. Both axes should be clearly labeled
 Instead of "stacks" rising up from the horizontal (bar
chart), we could instead plot the shares of a pie.
2. Pie chart
1. It shows the relative frequency for each category by
dividing a circle into sectors
2. The angles are proportional to the relative frequency.
3. Used for a single categorical variable
4. Use percentage distributions
 Steps to construct a pie-chart
1. Construct a frequency table
2. Change the frequency into percentage (P)
3. Change the percentages into degrees, where,
 Degrees = (Percentage/100) × 360°
4. Draw a circle & divide it accordingly
Example: Distribution of deaths for females, in England and Wales, 1989.
Cause of death | No. of deaths | Percentage | Angle
Circulatory system | 100 000 | 42.4 | 100,000/236,000 × 360° ≈ 153°
Neoplasm | 70 000 | 29.7 | 70,000/236,000 × 360° ≈ 107°
Respiratory system | 30 000 | 12.7 | 30,000/236,000 × 360° ≈ 46°
Injury & poisoning | 6 000 | 2.5 | 6,000/236,000 × 360° ≈ 9°
Digestive system | 10 000 | 4.2 | 10,000/236,000 × 360° ≈ 15°
Others | 20 000 | 8.5 | 20,000/236,000 × 360° ≈ 31°
Total | 236 000 | 100% | 360°
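The percentage and sector angle for each cause can be computed directly from the counts. The following Python sketch uses only the death counts in the table. The dictionary names are illustrative.

```python
# Female deaths by cause, England and Wales, 1989 (counts from the table)
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury & poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())
percents = {cause: 100 * n / total for cause, n in deaths.items()}  # percentage
angles = {cause: n / total * 360 for cause, n in deaths.items()}    # sector angle (degrees)

for cause in deaths:
    print(f"{cause:<20} {percents[cause]:5.1f}%  {angles[cause]:6.1f} deg")
```

The unrounded angles always sum to 360°. Rounding each one to a whole degree can make the total drift by a degree.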
 Recalling that a circle has 360 degrees, that 50% means 180 degrees, 25%
means 90 degrees, etc., we can identify "wedges" according to relative
frequency
Fig.: Distribution of cause of death for females, in England and Wales, 1989 (pie chart: Circulatory system 42%, Neoplasms 30%, Respiratory system 13%, Injury and poisoning 3%, Digestive system 4%, Others 8%)
3. Histogram
1. Histograms are frequency distributions with
continuous class interval that have been turned into
graphs
2. A histogram is a type of bar chart, but there are no
spaces b/n the bars(continuous data)
3. Histograms are used to visually represent frequency
distributions of continuous data
4. Given a set of numerical data, we can obtain
impression of the shape of its distribution by
constructing a histogram
Quantitative variable presentation
5. Constructed by choosing a set of non-overlapping class
intervals & counting the number of observations that fall in
each class.
6. It is necessary that the class intervals be non-overlapping so
that each observation falls in one & only one interval.
7. Bars are drawn over the intervals
8. The area of each bar is proportional to the frequency of
observations in the interval
 Two problems with histograms
1. They are somewhat difficult to construct
2. The actual values within the respective groups are lost
& difficult to reconstruct
 Stem-and-leaf plot overcomes these problems
Example: Distribution of the age of women at the time of marriage
Age group | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49
Number | 11 | 36 | 28 | 13 | 7 | 3 | 2
Histogram: Age of women at the time of marriage (x-axis: age group, true limits 14.5-19.5 to 44.5-49.5; y-axis: number of women)
5. Frequency polygon
1. Instead of drawing bars for each class interval,
sometimes a single point is drawn at the midpoint of
each class interval & consecutive points are joined by
straight lines.
2. Graphs drawn in this way are called frequency
polygons
3. The total area under the frequency polygon is equal
to the area under the histogram (when the polygon is
closed at the mid-points of the empty classes at each end)
4. Frequency polygons are superior to histograms for
comparing two or more sets of data.
Frequency polygon: Age of women at the time of marriage (x-axis: age, class mid-points from 12 to 47; y-axis: number of women)
6. Scatter plot
1. Most studies in medicine involve measuring more
than one characteristic
2. For two quantitative variables we use bivariate plots
(also called scatter plots or scatter diagrams).
3. In the study on percentage saturation of bile,
information was also collected on the age of each patient
to see whether a relationship existed between the
two measures.
 E.g. Saturation of bile & age
 When both variables are qualitative, we can
use a bar graph.
 When one of the characteristics is qualitative & the
other is quantitative, the data can be displayed in box
& whisker plots.
 A scatter diagram is constructed by drawing X- & Y-
axes.
 Each point (dot) represents a pair of values measured
for a single study subject
 The graph suggests the possibility of a positive
relationship between age & percentage saturation of
bile in women.
Fig.: Age and percentage saturation of bile for women patients in hospital Z, 1998 (scatter plot; x-axis: age, 0-80; y-axis: saturation of bile (%), 0-160)
7. Line graph
1. Useful for assessing the trend of a particular situation
over time.
2. Helps for monitoring the trend of epidemics.
3. Values for each category are connected by
continuous line.
4. Sometimes two or more graphs are drawn on the
same graph taking the same scale so that the plotted
graphs are comparable.
Fig 5: Malaria parasite prevalence rates in Ethiopia, 1967-1979 E.C. (line graph; x-axis: year; y-axis: rate (%))
Figure (1): Maternal mortality rate of (country), 1960-2000 (line graph; x-axis: year; y-axis: MMR/1000)
Year | MMR
1960 | 50
1970 | 45
1980 | 26
1990 | 15
2000 | 12
Reading assignment
1. Ogive curve
2. Box & whisker plot
3. Stem-and-leaf plot
Numerical summary measures
1. Measures of central tendency
2. Measures of dispersion
1. Measures of Central Tendency
 Statistic:–
 Descriptive measure computed from sample data
 Parameter:–
 Descriptive measure computed from population data
 Measures of central tendency:-
 Are the measures used to summarize, in a single
number or statistic, the point at which the data tend
to cluster.
 The most commonly used measures of central
tendency are:
1. Arithmetic Mean,
2. Median &
3. Mode.
1. Arithmetic mean
 It is the average of the data set
 The sum of the observations divided by the number of
observations.
 Mean for ungrouped data
 Mean of a sample: $\bar{X} = \dfrac{\sum X}{n}$
 Mean of a population: $\mu = \dfrac{\sum X}{N}$
 Where
$\bar{X}$ (X bar) refers to the mean of a sample &
$\mu$ refers to the mean of a population
$\sum X$ denotes the sum of all of the X values
n is the total number of values in a sample &
N is the total number of values in a population
 Example: 19 21 20 20 34 22 24 27 27 27
 Calculate the mean, n = 10
 Mean = (19 + 21 + 20 + 20 + 34 + 22 + 24 + 27 + 27 + 27)/10 = 241/10 = 24.1
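The same calculation in Python is shown below, both by hand and with the standard-library `statistics.mean`. It also checks the property, listed among the properties of the arithmetic mean, that deviations from the mean sum to zero (up to floating-point error).

```python
from statistics import mean

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

manual_mean = sum(data) / len(data)            # sum of observations / n
deviations = [x - manual_mean for x in data]   # deviation of each value from the mean

print(sum(data), manual_mean, mean(data))
print(abs(sum(deviations)) < 1e-9)             # algebraic sum of deviations is (numerically) zero
```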
 Mean for grouped data
 We assume that all values falling into a particular class
interval are located at the mid-point of the interval.
 It is calculated as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
Where,
k = the number of class intervals
$m_i$ = the mid-point of the ith class interval
$f_i$ = the frequency of the ith class
Example. Compute the mean age of 169 subjects from the
grouped data.
Class interval | Mid-point (mi) | Frequency (fi) | mifi
10-19 | 14.5 | 4 | 58.0
20-29 | 24.5 | 66 | 1617.0
30-39 | 34.5 | 47 | 1621.5
40-49 | 44.5 | 36 | 1602.0
50-59 | 54.5 | 12 | 654.0
60-69 | 64.5 | 4 | 258.0
Total | | 169 | 5810.5
Mean = 5810.5/169 = 34.4 years
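The grouped-mean formula applied to the age table is shown below as a short Python sketch. Each class is stored as (lower limit, upper limit, frequency), which is an illustrative layout, not one from the notes.

```python
# Grouped age data for 169 subjects: (lower limit, upper limit, frequency)
classes = [(10, 19, 4), (20, 29, 66), (30, 39, 47), (40, 49, 36), (50, 59, 12), (60, 69, 4)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]    # m_i
freqs = [f for _, _, f in classes]                      # f_i
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))   # sum of m_i * f_i
grouped_mean = sum_mf / sum(freqs)

print(f"sum(mf) = {sum_mf}, n = {sum(freqs)}, mean = {grouped_mean:.2f}")
```

The result (about 34.38) is only an approximation of the true mean, because every value is replaced by its class mid-point.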
Properties of the arithmetic mean
1. Can be used for both discrete & continuous data.
 However, it is not appropriate for either nominal
or ordinal data.
2. For given set of data there is one & only one
arithmetic mean.
3. It is easily understood & easy to compute.
4. Algebraic sum of the deviations of the given values
from their arithmetic mean is always zero.
5. It is greatly affected by the extreme values.
2. Median
 It is the value that divides a series of values in half when
they are listed in order
 If the number of observations is odd, the median is the
[(n+1)/2]th observation.
 E.g. 19 20 20 21 22 23 24 27 27 27 34, n = 11
 Median = [(n+1)/2]th = [(11+1)/2]th = 6th observation = 23
 If the number of observations is even, there is no single
middle observation, so the median is the average of the two
middle values: [(n/2)th + ((n/2)+1)th]/2
 E.g. 19 20 20 21 22 24 27 27 27 34, n = 10
 Median = [(10/2)th + ((10/2)+1)th]/2 = (5th + 6th)/2 = (22 + 24)/2 = 23
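The two positional rules (odd and even n) can be expressed as one small function. This is a sketch: the function name is ours, and the 1-based positions in the notes are shifted to Python's 0-based indexing.

```python
def median(values):
    """Median of ungrouped data using the positional rules for odd and even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                  # [(n+1)/2]th value (1-based)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2    # mean of (n/2)th and (n/2 + 1)th

odd = [19, 20, 20, 21, 22, 23, 24, 27, 27, 27, 34]
even = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
print(median(odd), median(even))
```

Sorting first matters: the positional rules only hold for values listed in order.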
 Median for grouped data
 We assume that the values within a class interval are
evenly distributed through the interval.
 The first step is to locate the class interval in which the
median lies.
 Find n/2 & locate the class interval with the minimum
cumulative frequency that is ≥ n/2; this is the median class.
 Note: Every class interval from the median class onward has a
cumulative frequency ≥ n/2, so always take the first one.
To find a unique median value, use the following
interpolation formula:
$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$
 Where,
• Lm = lower true class boundary of the interval containing the median
• Fc = cumulative frequency of the interval just above (i.e., preceding) the
median class interval
• fm = frequency of the interval containing the median
• W= class interval width
• n = total number of observations
Ex. Compute the median age of 169 subjects from the
grouped data.
Class interval | Mid-point (mi) | Frequency (fi) | Cum. freq.
10-19 | 14.5 | 4 | 4
20-29 | 24.5 | 66 | 70
30-39 | 34.5 | 47 | 117
40-49 | 44.5 | 36 | 153
50-59 | 54.5 | 12 | 165
60-69 | 64.5 | 4 | 169
Total | | 169 |
 Median:
n/2 = 169/2 = 84.5, which falls in the 3rd class interval (30-39),
the first class with cumulative frequency ≥ 84.5
Lower boundary (Lm) = 29.5
Upper boundary = 39.5, so W = 10
Frequency of the class (fm) = 47
Cumulative frequency of the class above (Fc) = 70
Median = 29.5 + ((84.5 - 70)/47) × 10 = 29.5 + 3.09 = 32.59 ≈ 33 years
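The interpolation formula can be implemented as a short function that also finds the median class. This is a minimal sketch. The function name and the list of lower true boundaries are illustrative inputs taken from the table of 169 subjects.

```python
from itertools import accumulate

# Lower true boundaries of classes 10-19, 20-29, ..., 60-69 and their frequencies; W = 10
lower_bounds = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [4, 66, 47, 36, 12, 4]
width = 10

def grouped_median(lower_bounds, freqs, width):
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency is >= n/2
    idx = next(i for i, c in enumerate(cum) if c >= half)
    f_c = cum[idx - 1] if idx > 0 else 0   # Fc: cumulative frequency before the median class
    f_m = freqs[idx]                       # fm: frequency of the median class
    return lower_bounds[idx] + (half - f_c) / f_m * width

print(round(grouped_median(lower_bounds, freqs, width), 2))
```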
Properties of median
1. Can be used for ordinal, discrete & continuous data.
 However, it is not appropriate for nominal data.
2. There is only one median for a given set of data
3. The median is easy to calculate
4. Median is a positional average & hence it is not
drastically affected by extreme values
5. It is not a good representative of data if the number
of items is small
3. Mode
 It is the value/observation that occurs most frequently.
 Most distributions have one peak & are described as
unimodal.
 E.g. 19 21 20 20 34 22 24 27 27 27
 Mode = 27
 The mode of grouped data usually refers to the modal class,
the class with the highest frequency.
 The modal class corresponds to the highest bar in a histogram
 Not a good summary on its own
 It is possible to have one mode, more than one mode, or no mode
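Finding the mode of ungrouped data is a frequency count. The sketch below returns every value tied for the highest count, so it can report one mode, several modes or none. Treating "every value occurs once" as "no mode" is one common convention, not a rule stated in the notes.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency; [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []      # ASSUMPTION: no repeated value is treated as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([19, 21, 20, 20, 34, 22, 24, 27, 27, 27]))   # unimodal
print(modes([1, 1, 2, 2, 3]))                            # bimodal
print(modes([5, 6, 7]))                                  # no mode
```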
To find a single value of mode for grouped data, use
the following formula:
$$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$
 Where:
 $L_{mo}$ is the lower boundary of the modal class
 i is the class width
 $\Delta_1$ is the difference b/n the frequency of the modal class & the frequency
of the class before (above) the modal class
 $\Delta_2$ is the difference b/n the frequency of the modal class & the frequency
of the class after (below) the modal class
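Ex. Find the mode for the following data
Class interval | Mid-point (mi) | Frequency (fi) | Cum. freq.
10-19 | 14.5 | 4 | 4
20-29 | 24.5 | 66 | 70
30-39 | 34.5 | 47 | 117
40-49 | 44.5 | 36 | 153
50-59 | 54.5 | 12 | 165
60-69 | 64.5 | 4 | 169
Total | | 169 |
 Solution
 The modal class is 20-29 (highest frequency, 66)
 Lmo = 19.5, frequency of the modal class = 66, frequency of the class
before it (10-19) = 4, frequency of the class after it (30-39) = 47, i = 10
 Δ1 = 66 - 4 = 62, Δ2 = 66 - 47 = 19
 Mode = 19.5 + (62/(62 + 19)) × 10 = 19.5 + 7.65 = 27.15 ≈ 27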
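The grouped-mode formula can be written as a short function. This is a sketch under two assumptions that are not from the notes. First, if several classes tie for the highest frequency, the first one is used. Second, a missing neighbouring class at either end is treated as having frequency 0.

```python
def grouped_mode(lower_bounds, freqs, width):
    # Modal class: the class with the highest frequency (first one if tied)
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    f_before = freqs[m - 1] if m > 0 else 0                # ASSUMPTION: 0 beyond the first class
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0    # ASSUMPTION: 0 beyond the last class
    d1 = freqs[m] - f_before   # Delta_1: modal class vs class before it
    d2 = freqs[m] - f_after    # Delta_2: modal class vs class after it
    return lower_bounds[m] + d1 / (d1 + d2) * width

lower_bounds = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [4, 66, 47, 36, 12, 4]
print(round(grouped_mode(lower_bounds, freqs, 10), 2))
```

Because the class after the modal class (47) is much fuller than the class before it (4), the estimated mode is pulled toward the upper end of 20-29.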
Properties of mode
1. Can be used for nominal, ordinal, discrete &
continuous data.
 However, it is more appropriate for nominal &
ordinal data.
2. It is not affected by extreme values
3. Often its value is not unique
4. The main drawback of mode is that often it does not
exist
2. Measures of Dispersion
 Measures that quantify the variation or dispersion of a
set of data from its central location
 Dispersion of a set of observations is the variety exhibited by
the observations
1. If all the values are the same→ There is no dispersion
2. If all the values are different → There is a dispersion
3. If the values are close to each other → The amount of
dispersion is small
4. If the values are widely scattered/spread → The
dispersion is greater
Common measures of dispersion
1. Range
2. Inter-quartile range
3. Variance
4. Standard deviation
5. Coefficient of variation
1. Range (R)
 Is the difference b/n the largest & smallest
observations in a sample.
 The range depends on only two values
 Range = Maximum value – Minimum value
 The range is the simplest measure of dispersion.
 A data set with higher range shows more variability
 Example –
 Data values: 5, 9, 12, 16, 23, 34, 37, 42
 Maximum value= 42,
 Minimum value= 5
 Range = 42-5 = 37
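A one-line Python sketch of the range, using the example data. The name `data_range` is a hypothetical helper, chosen so that it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Range = maximum value - minimum value (values must be non-empty)."""
    return max(values) - min(values)


data = [5, 9, 12, 16, 23, 34, 37, 42]
print(data_range(data))  # 37
```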
 Properties of range
1. It is the simplest, crudest measure & is easily understood
2. It takes into account only two values, which makes it a poor measure of dispersion
3. It is very sensitive to extreme observations (outliers)
4. The larger the sample size, the larger the range tends to be
2. Inter-quartile range (IQR)
 It is used when the median is used as the measure of
central tendency.
 It gives the range in which the middle 50% of the
distribution lies.
 The inter-quartile range quantifies the difference b/n
the third & first quartiles.
IQR = Q3 - Q1
 A large IQR indicates a large amount of variability among the middle 50% of the observations, &
 A small IQR indicates a small amount of variability
 The inter-quartile range is particularly useful to
describe data sets where there are a few extreme
values.
 Unlike the range, & to a lesser extent the standard
deviation, it is not sensitive to extreme values as it
relies on the spread of the middle 50% of the
distribution.
 So, for data sets that have extreme values, it can be more appropriate to use the median to describe central tendency & the inter-quartile range to describe the spread.
What do quartiles mean?
 If the data are divided into four equal parts, we speak of quartiles.
 The three quartiles (Q1, Q2, Q3) divide the ordered data into 4 equal parts, with 25% of the observations in each part.
 The first quartile(Q1):
 Is the point which gives us 25% of the area to the left of
it & 75% to the right of it.
 This means that 25% of the observations are less than or equal to the first quartile & 75% of the observations are greater than or equal to it.
 The first quartile is also called the 25th percentile.
 The second quartile (Q2):
 The point which gives us 50% of the area to the left
of it & 50% to the right of it
 The second quartile is called the median.
 The third quartile (Q3):
 Is the point which gives us 75% of the area to the left
of it & 25% of the area to the right of it.
 This means that 75% of the observations are less than or equal to the third quartile & 25% of the observations are greater than or equal to it.
 The third quartile is also called the 75th percentile.
 Ex.1: Suppose we have a small data set of
twelve observations
 15 18 19 20 20 20 21 23 23 24 24 25
1. We want to divide the data into four equal sets
2. First, we find the median
 15 18 19 20 20 20 ↑ median 21 23 23 24 24 25
 Median = 20.5 (halfway b/n the 6th & 7th observations)
 The median divides the data into two equal sets with exactly 50% of the observations in each:
 The 1st - 6th observations in the first set &
 The 7th - 12th observations in the other.
 To find the first quartile we consider the observations
less than the median.
 15 18 19 ↑ 20 20 20
 The first quartile is the median of these data.
 In this case, the first quartile is halfway b/n the 3rd & 4th observations & is equal to 19.5.
 Now, we consider the observations which are greater than
the median.
 21 23 23 ↑ 24 24 25
 The third quartile is the median of these data & is equal to
23.5.
 15 18 19 ↑ 20 20 20 ↑ 21 23 23 ↑ 24 24 25
Q1 Q2 Q3
 IQR = Q3 - Q1 = 23.5 - 19.5 = 4
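The median-of-halves procedure above can be sketched in Python with the standard `statistics.median` function. The helper name is hypothetical. For an odd number of observations, the sketch leaves the middle value out of both halves; this is an assumption, because the example only covers an even n.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, upper half.

    Needs at least 2 observations. For odd n the middle value is left out
    of both halves (an assumed convention; others include it).
    """
    x = sorted(values)
    n = len(x)
    half = n // 2
    lower = x[:half]
    upper = x[half + 1:] if n % 2 else x[half:]
    return median(lower), median(x), median(upper)


data = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3, q3 - q1)  # 19.5 20.5 23.5 4.0
```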
 Ex.2: Suppose the first & third quartiles for the weights of girls 12 months of age are 8.8 kg & 10.2 kg, respectively.
 IQR = 10.2 kg - 8.8 kg = 1.4 kg
 i.e., the middle 50% of 12-month-old girls weigh between 8.8 & 10.2 kg.
 Ex.3: Given the following data set (age of patients):
 18, 59, 24, 42, 21, 23, 24, 32
 Find the inter-quartile range
 Solution: 18 21 23 24 24 32 42 59
 Q1 is at position (n+1)/4 = 9/4 = 2.25, so Q1 = 21 + (23 - 21) × 0.25 = 21.5
 Q3 is at position 3(n+1)/4 = 27/4 = 6.75, so Q3 = 32 + (42 - 32) × 0.75 = 39.5
 Hence, IQR = 39.5 - 21.5 = 18
 Ex.4: Given these data: 13, 7, 9, 15, 11, 5, 8, 4
a. Arrange the observations in increasing order.
 4, 5, 7, 8, 9, 11, 13, 15
b. Find the positions of the 1st & 3rd quartiles.
 n = 8
 Position of Q1 = ¼(n+1) = ¼(8+1) = 2.25th
 Q1 lies between the 2nd & 3rd observations
 Position of Q3 = ¾(n+1) = ¾(8+1) = 6.75th
 Q3 lies between the 6th & 7th observations
c. Identify the values of the 1st & 3rd quartiles.
 The value of Q1 is equal to the value of the 2nd
observation plus 1/4th the difference b/n the values of
the 3rd & 2nd observations:
 Value of the 3rd observation =7
 Value of the 2nd observation = 5
 Q1 = 5 +1/4(7-5) = 5 +2/4 = 5.5
 The value of Q3 is equal to the value of the 6th
observation plus 3/4ths of the difference b/n the value
of the 7th & 6th observations:
 Value of the 7th observation =13
 Value of the 6th observation=11
 Q3 = 11 +3/4 (13-11) = 11 +3(2)/4 = 11+6/4 = 12.5
d. Calculate the inter-quartile range
 Q3 = 12.5 ; Q1 = 5.5
 IQR = Q3-Q1 = 12.5–5.5 = 7
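The (n+1)/4 interpolation rule used in the two examples above (the patients' ages, and the data 13, 7, 9, 15, 11, 5, 8, 4) can be sketched in Python. The function name is hypothetical. Positions are 1-based, as in the text. Positions that fall outside 1..n are clamped to the smallest or largest value, which is an assumption that only matters for very small samples.

```python
def quartile_interp(values, k):
    """k-th quartile (k = 1, 2, 3) by linear interpolation at position k(n+1)/4."""
    x = sorted(values)
    n = len(x)
    pos = k * (n + 1) / 4
    pos = min(max(pos, 1), n)   # clamp to the range of available positions
    lo = int(pos)               # whole part: the lower neighboring observation
    frac = pos - lo             # fractional part: how far toward the next one
    if lo == n:
        return x[-1]
    # x[lo - 1] is the lo-th observation in 1-based counting
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])


data = [13, 7, 9, 15, 11, 5, 8, 4]
q1, q3 = quartile_interp(data, 1), quartile_interp(data, 3)
print(q1, q3, q3 - q1)  # 5.5 12.5 7.0
```

Statistical software offers several quartile conventions. This interpolation can therefore differ slightly from a program's default output.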
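 Another commonly used rule (it can give slightly different values from the (n+1)/4 rule above):
1. If kn/4 is an integer, Qk is the average of the (kn/4)th & (kn/4 + 1)th observations
2. If kn/4 is not an integer, Qk is the (j + 1)th observation, where j is the integer part of kn/4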
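 Quartiles for grouped data
 Apply the same method as for the median:

$$Q_1 = L_{Q_1} + \frac{n/4 - F_c}{f_{Q_1}} \times i \qquad Q_3 = L_{Q_3} + \frac{3n/4 - F_c}{f_{Q_3}} \times i$$

 Where L is the lower boundary of the quartile class, Fc is the cumulative frequency of the classes before it, f is the frequency of the quartile class, & i is the class width
 To find the class of each quartile, use the cumulative frequencies to locate the class containing the (n/4)th observation (Q1) & the (3n/4)th observation (Q3)
 IQR = Q3 - Q1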
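A minimal Python sketch of the grouped quartile formula, applied to the grouped ages of 169 subjects (the same frequency table used for the mode & the grouped variance). The function name and its `fraction` parameter are hypothetical. Passing 0.25 gives Q1 and passing 0.75 gives Q3.

```python
def grouped_quantile(lower_boundaries, freqs, width, fraction):
    """L + (fraction*n - Fc) / f * i for the class where fraction*n is reached.

    The class is the first one (with a non-zero frequency) whose cumulative
    frequency reaches fraction*n; Fc is the cumulative frequency before it.
    """
    n = sum(freqs)
    target = fraction * n
    cum_before = 0
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("fraction must be between 0 and 1")


lower = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freq = [4, 66, 47, 36, 12, 4]
q1 = grouped_quantile(lower, freq, 10, 0.25)   # class 20-29
q3 = grouped_quantile(lower, freq, 10, 0.75)   # class 40-49
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))  # 25.3 42.21 16.91
```

The formula depends only on the fraction of n. The same sketch therefore also gives grouped percentiles (e.g., `fraction=0.9`).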
Properties of IQR
1. It is a simple & versatile measure
2. It encloses the central 50% of the observations
3. It is not based on all observations but only on two
specific values
4. It is important in selecting cut-off points in the
formulation of clinical standards
5. Since it excludes the lowest & highest 25% of values, it is largely unaffected by outliers
6. It is less sensitive to the size of the sample
Percentiles
 Percentiles divide the ordered data into 100 equal parts.
 They are less sensitive to outliers &
 They are not greatly affected by the sample size (n).
 P0:
 The minimum
 P25:
 25% of the sample values are less than or equal to this
value.
 1st Quartile, P25 means 25th percentile
 P50:
 50% of the sample values are less than or equal to this value.
 2nd Quartile (the median)
 P75:
 75% of the sample values are less than or equal to this
value.
 3rd Quartile
 P100:
 The maximum
 The pth percentile:
 Is a value that is greater than or equal to p% of the observations & less than or equal to the remaining (100 - p)%.
 In the rules below, p is written as a proportion (e.g., p = 0.10 for the 10th percentile):
 It is the observation at position p(n+1) if p(n+1) is an integer
 It is the average of the kth & (k+1)th observations if p(n+1) is not an integer, where k is the largest integer less than p(n+1)
 E.g., if p(n+1) = 3.6, it is the average of the 3rd & 4th observations
 Example: Birth weight in grams
 2069, 2581, 2759, 2834, 2838, 2841, 3031,
 3101, 3200, 3245, 3248, 3260, 3265, 3314,
 3323, 3484, 3541, 3609, 3649, 4146
 Find the 10th & 90th percentiles of the data set (n = 20).
 Solution (10th percentile): np = 20 × 0.10 = 2, which is an integer,
 So P10 is the average of the 2nd & 3rd values
 = (2581 + 2759)/2 = 2670 g
 Solution (90th percentile): np = 20 × 0.90 = 18, which is an integer,
 So P90 is the average of the 18th & 19th values
 = (3609 + 3649)/2 = 3629 g
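 Generally, we apply this rule (with p = t/100 for the tth percentile):
1. If tn/100 is an integer, Pt is the average of the (tn/100)th & (tn/100 + 1)th observations
2. If tn/100 is not an integer, Pt is the (j + 1)th observation, where j is the integer part of tn/100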
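Both percentile rules on these slides can be sketched in Python: the p(n+1) rule and the np-based rule. The np rule averages the (np)th and (np+1)th values when np is a whole number and otherwise takes the next value up. The function names are hypothetical. Passing the percentile as a whole-number percentage t is also an assumption, made so that the "is it an integer?" test uses exact integer arithmetic instead of floating point.

```python
def percentile_np(values, t):
    """t-th percentile (t = 1..99) by the np rule, with p = t/100."""
    x = sorted(values)
    n = len(x)
    j, remainder = divmod(t * n, 100)     # j = integer part of np
    if remainder == 0:                    # np is a whole number
        return (x[j - 1] + x[j]) / 2      # average of 1-based positions j, j + 1
    return x[j]                           # 1-based position j + 1


def percentile_p_n_plus_1(values, t):
    """t-th percentile (t = 1..99) by the p(n+1) rule, with p = t/100.

    Positions outside 1..n are clamped to the smallest/largest value
    (an assumption for very small or very large p).
    """
    x = sorted(values)
    n = len(x)
    k, remainder = divmod(t * (n + 1), 100)   # k = integer part of p(n+1)
    if remainder == 0:                        # p(n+1) is a whole number
        return x[k - 1]
    if k < 1:
        return x[0]
    if k >= n:
        return x[-1]
    return (x[k - 1] + x[k]) / 2              # average of kth and (k + 1)th


birth_weights = [2069, 2581, 2759, 2834, 2838, 2841, 3031,
                 3101, 3200, 3245, 3248, 3260, 3265, 3314,
                 3323, 3484, 3541, 3609, 3649, 4146]

for t in (10, 90):
    print(t, percentile_np(birth_weights, t), percentile_p_n_plus_1(birth_weights, t))
# 10 2670.0 2670.0
# 90 3629.0 3629.0
```

For the birth weights the two rules agree. On other data sets they can give slightly different values.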
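 For grouped data, use the following formula:

$$P = L_P + \frac{pn - F_c}{f_P} \times i$$

 To find the percentile class, use the pn value: it is the first class whose cumulative frequency is at least pn
 Where
 p represents the percentile we're finding, as a proportion (e.g., 0.90 for the 90th percentile),
 n is the total number of observations in the data set,
 LP is the lower boundary of the percentile class, Fc is the cumulative frequency of the classes before it, fP is the frequency of the percentile class, & i is the class width.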
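As a worked illustration, the grouped formula can be applied to the ages of 169 subjects from the grouped frequency table (classes 10-19 to 60-69). For the 90th percentile, 0.90 × 169 = 152.1. The cumulative frequency first reaches this value in the 40-49 class, where it is 117 before the class and 153 at its end. That class has lower boundary 39.5, frequency 36 and class width 10:

$$P_{90} = 39.5 + \frac{0.90 \times 169 - 117}{36} \times 10 = 39.5 + \frac{152.1 - 117}{36} \times 10 = 39.5 + 9.75 = 49.25 \text{ years}$$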
Variance (σ², s²)
 The variance
 Is the average of the squared deviations taken from the mean (for a sample, the sum is divided by n - 1 rather than n)
 A good measure of dispersion should make use of all the data.
 The variance achieves this by averaging the squares of the deviations from the mean.
 The sample variance of the set x1, x2, …, xn of n observations with mean x̄ is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
 The deviations are squared because the sum of the (unsquared) deviations of the individual observations of a sample about the sample mean is always zero
 Degrees of freedom
 In computing the variance there are (n-1) degrees of
freedom because only (n-1) of the deviations are
independent from each other
 This is because the sum of the deviations from their
mean (Xi-Mean) must add to zero.
 The last one can always be calculated from the
others automatically (It is not free to vary).
 Example
 Data: 43,66,61,64,65,38,59,57,57,50.
 Find the sample variance of the data.
 Mean = 56
 s² = [(43 - 56)² + (66 - 56)² + … + (50 - 56)²]/(10 - 1) = 810/9 = 90
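The variance example can be reproduced with Python's standard `statistics` module. `statistics.variance` uses the n - 1 denominator, so it should agree with the hand calculation above.

```python
from statistics import mean, variance

data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

x_bar = mean(data)
# Sum of squared deviations from the mean, divided by n - 1 degrees of freedom
s2_manual = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

print(float(x_bar), s2_manual, float(variance(data)))  # 56.0 90.0 90.0
```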
Variance for grouped data

$$s^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\left(\sum_{i=1}^{k} f_i\right) - 1}$$

 Where
 mi = the mid-point of the ith class interval
 fi = the frequency of the ith class interval
 x̄ = the sample mean
 k = the number of class intervals
 Ex. Compute the variance of the age of 169 subjects from the grouped data.
 Mean = Σ mi fi / Σ fi = 5810.5/169 = 34.38 years

Class interval   Mid-point (mi)   Frequency (fi)   (mi - x̄)   (mi - x̄)²   (mi - x̄)² fi
10-19            14.5             4                -19.88     395.21      1580.86
20-29            24.5             66               -9.88      97.61       6442.55
30-39            34.5             47               0.12       0.0144      0.68
40-49            44.5             36               10.12      102.41      3686.92
50-59            54.5             12               20.12      404.81      4857.77
60-69            64.5             4                30.12      907.21      3628.86
Total                             169                         1907.29     20197.63

 s² = 20197.63/(169 - 1) = 120.22
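A minimal Python sketch of the grouped-data mean & variance, using the mid-points & frequencies from the table. The function name is hypothetical. Since Σ mi fi = 5810.5, the mean used for the deviations is 5810.5/169 ≈ 34.38 years. The sketch keeps the mean unrounded.

```python
def grouped_mean_variance(midpoints, freqs):
    """Return (n, mean, sum of squares, sample variance) for grouped data."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    # Squared deviations of each mid-point from the mean, weighted by frequency
    ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    return n, mean, ss, ss / (n - 1)


mid = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freq = [4, 66, 47, 36, 12, 4]
n, grouped_mean, ss, grouped_var = grouped_mean_variance(mid, freq)
print(n, round(grouped_mean, 2), round(ss, 2), round(grouped_var, 2))
# 169 34.38 20197.63 120.22
```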
Properties of Variance
1. The main disadvantage of the variance is that its unit is the square of the unit of the original measurement values
2. The variance gives more weight to extreme values than to values near the mean, because the deviations are squared.
3. The unit problem of the variance is overcome by the standard deviation, which is expressed in the original units.
Standard deviation (σ, s)
 It is the square root of the variance.
 This produces a measure having the same scale as that of the individual values.
 It shows variation about the mean

$$\sigma = \sqrt{\sigma^2} \quad \text{and} \quad s = \sqrt{s^2}$$

Properties of SD
1. It has the advantage of being expressed in the same units of measurement as the mean
2. It is the most widely used measure of dispersion, partly because of its role in describing the theoretical normal curve.
3. However, if the variables of two data sets are not measured in the same units, their variability can't be compared by comparing the values of the SD.
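The standard deviation of the earlier variance example (43, 66, 61, …, 50) can be computed with `statistics.stdev`. Like `statistics.variance`, it uses n - 1, so the result is simply the square root of 90.

```python
import math
from statistics import stdev

data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
s = stdev(data)                         # sample SD: square root of the n - 1 variance
print(round(s, 2))                      # 9.49
print(math.isclose(s, math.sqrt(90)))   # True
```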
[Figure: wide spread results in higher SDs; narrow spread results in lower SDs]
Coefficient of variation (CV)
 When two data sets have different units of
measurements, or their means differ sufficiently in
size, the CV should be used as a measure of
dispersion.
 It is a suitable measure for comparing the variability of two series or sets of observations.
 Data with a smaller coefficient of variation are considered more consistent.
 CV is the ratio of the SD to the mean, multiplied by 100.

$$CV = \frac{s}{\bar{x}} \times 100$$

 Example: "Cholesterol is more variable than systolic blood pressure"

Variable       SD          Mean        CV (%)
SBP            15 mmHg     130 mmHg    11.5
Cholesterol    40 mg/dl    200 mg/dl   20.0
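A minimal sketch of the CV formula, applied to the SBP & cholesterol figures from the table. The function name is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean * 100 (generally meaningful only for a positive mean)."""
    return sd / mean * 100


print(round(coefficient_of_variation(15, 130), 1))  # 11.5  (SBP)
print(round(coefficient_of_variation(40, 200), 1))  # 20.0  (cholesterol)
```

The CV has no units, so the two variables can be compared even though their SDs are in mmHg and mg/dl.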
Distributions
Distributions used in statistical analysis:
1. Discrete random variables:
1) Binomial,
2) Poisson &
3) Hypergeometric distributions.
 E.g., the analysis of a discrete random variable, such as the position of a nucleotide in a given sequence, may use techniques based on a binomial distribution rather than techniques that assume a normal distribution.
2. Continuous random variables:
1) Normal distribution,
2) Z (standard normal) distribution.
Normal distribution
 It is symmetric about its mean: one half of the curve is the mirror image of the other half
 The mean, median, & mode are equal & are located at the same point, the center of the curve
 The highest point is at its mean
 The height of the curve decreases as one moves away
from the mean in either direction, approaching, but
never reaching zero
[Figure: a normal curve. A normal distribution is symmetric about its mean; the highest point of the curve is at the mean; as one moves away from the mean in either direction, the height of the curve decreases, approaching but never reaching zero]
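Python's `statistics.NormalDist` can illustrate these properties. The mean of 100 and SD of 15 are arbitrary values chosen only for illustration.

```python
from statistics import NormalDist

# Arbitrary illustrative curve: mean 100, SD 15
dist = NormalDist(mu=100, sigma=15)

# For a normal distribution the mean, median and mode all sit at the center
print(dist.mean, dist.median, dist.mode)

# Equal distances on either side of the mean give equal heights (symmetry),
# and the height shrinks as the distance from the mean grows
for d in (0, 15, 30, 45):
    left, right = dist.pdf(100 - d), dist.pdf(100 + d)
    print(f"{d:>2} units from the mean: left {left:.6f}, right {right:.6f}")
```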
Skewed distributions
 The data are not distributed symmetrically in
skewed distributions
 The mean, median, & mode are not equal & are in
different positions
 Scores are clustered at one end of the distribution
 A small number of extreme values are located in the tail at the opposite end
 Skew is always toward the direction of the longer tail
A. Negatively skewed distribution
 Occurs when the majority of scores are at the right end of the curve & a few small scores are scattered at the left end
 Also described as skewed to the left
B. Positively skewed distribution
 Occurs when the majority of scores are at the left end of the curve & a few extreme large scores are scattered at the right end
 Also described as skewed to the right
[Figure: three distribution curves showing the positions of the mean, median & mode]
(a) Symmetric distribution: Mean = Median = Mode
(b) Distribution skewed to the right: Mean > Median > Mode
(c) Distribution skewed to the left: Mean < Median < Mode
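The ordering of the mean, median & mode in skewed data can be illustrated with a small hypothetical data set. The values are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values small, a long tail of large values
right = [2, 2, 2, 3, 3, 4, 5, 7, 10, 15]
# Its mirror image (20 - x) is skewed to the left
left = [20 - x for x in right]

for name, data in (("right-skewed", right), ("left-skewed", left)):
    print(name, mean(data), median(data), mode(data))
# right-skewed 5.3 3.5 2
# left-skewed 14.7 16.5 18
```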
Which measures to use?
1. When the distribution is symmetric & unimodal, summarize the data using means & standard deviations.
2. When the data are skewed, it is preferable to use the median & quartiles as summary statistics.
3. Unlike means & standard deviations, the median & quartiles are not easily influenced by extreme values in a skewed distribution.
A. Symmetric & unimodal distribution:
 Mean, median, & mode should all be approximately the same
B. Skewed to the right (positively skewed):
 The mean is sensitive to extreme values, so the median might be more appropriate
C. Skewed to the left (negatively skewed):
 The mean is sensitive to extreme values, so the median might be more appropriate
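The difference in sensitivity can be illustrated by adding a single hypothetical extreme value (400) to the data from the range example.

```python
from statistics import mean, median

data = [5, 9, 12, 16, 23, 34, 37, 42]   # data from the range example
with_outlier = data + [400]             # one hypothetical extreme value added

print(mean(data), median(data))                            # 22.25 19.5
print(round(mean(with_outlier), 2), median(with_outlier))  # 64.22 23
```

The single extreme value almost triples the mean, while the median moves only from 19.5 to 23.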
Basic concepts in biostatistics edited pc-1.pptx

  • 1. Arba Minch University College of Medicine & Health Sciences ,School of public health Department of Public Health Epidemiology and Biostatistics unit By Kusse Otayto(BSc,MPH in Epi/Biostatistics) By: Kusse Otayto(BSc, MPH( Epidemiology & Biostatistics)) 1
  • 2. Descriptive statistics  It deals with the description of data in a clear & informative manner using tables, graphs & numerical summary  It involves the organization & summarization of a body of data with one or more meaningful tools.  It helps to identify the general features & trends in a set of data & extracting useful information  Also very important in conveying the final results of a study 2
  • 3. Descriptive statistics  Data  Are information collected from the source or  Are the raw materials of statistics  Are numbers which can be obtained by measurements or counting  Data are made up of a set of variables  It Can be obtained from Counting, Routinely kept records, Surveys, Experiments, Reports…  Types of data 1. Primary data 2. Secondary data 3
  • 4. 1. Primary data 1. Primary data:  Are data collected from the items or individual respondents directly by the researcher themselves for the purpose of a study.  Advantages of primary data 1. The data is original 2. Possibility of flexibility 3. Source for extensive research  Disadvantages of primary data 1. Expensive & time consuming 2. Possibility of personal prejudice(biases) 4
  • 5. 2. Secondary data
    - Data that were collected & statistically treated by other people or organizations, & whose information is then used for another purpose by other people
    - Obtained from journals, reports & government publications
    - Advantages: 1. ready-made 2. relatively cheap 3. lesser degree of personal prejudice
    - Disadvantages: 1. lack originality 2. may or may not suit the objectives of the enquiry (not a source for extensive research) 3. must be used with great care & caution
  • 6. Methods of data collection
    - Before any statistical work can be done, data must be collected.
    - Data collection is a crucial stage in the planning & implementation of a study.
    - If data collection has been superficial, biased or incomplete, data analysis becomes difficult & the research report will be of poor quality.
    - Therefore, every possible effort should go into developing appropriate tools, which should be tested several times.
    - Depending on the type of variable & the objective of the study, different data collection methods can be used: observation, interview & self-administered written questionnaires.
  • 7. A. Observation
    - A technique that involves systematically selecting, watching & recording the behavior & characteristics of living things, objects or phenomena
    - Ranges from simple visual observation to the use of high-level equipment
    - Can be undertaken in the following ways:
      1. Participant observation: the observer takes part in the situation he or she observes.
      2. Non-participant observation: the observer watches the situation, openly or concealed, but does not participate.
  • 8. Observation (cont.)
    - Observation can give additional, more accurate information on people's behavior than interviews or questionnaires
    - Observations can also be made on objects
    - Outline the guidelines for the observations before actual data collection
    - Advantage: gives relatively more accurate data on behavior & activities
    - Disadvantages: the investigator's or observer's own biases; needs more resources & skilled personnel when high-level equipment is used
  • 9. B. Interview (face-to-face)
    - A data collection technique that involves oral questioning of respondents, either individually or as a group
    - Answers can be recorded by: 1. writing them down (during the interview itself or immediately after it), 2. tape-recording the responses, or 3. a combination of both
    - Advantages: can stimulate & maintain the respondent's interest; can create rapport (a bond of understanding); observations can be made as well
    - Disadvantage: time-consuming & expensive
  • 10. Interview (cont.): 1. In-depth interview
    - A conversation between the researcher & the subject about the research area or topic
    - Designed to allow respondents to tell their story in their own way
    - Issues are covered in detail; the respondent leads the interview & sets the agenda; there is no fixed order
    - Useful for: highly sensitive issues; geographically dispersed respondents; situations where peer pressure is expected to distort facts
    - Costs more & takes more time than an FGD
  • 11. 2. Focus group discussion (FGD)
    - Allows a group of 8-12 informants to discuss a certain subject freely under the guidance of a facilitator or reporter
    - Advantages: group interaction stimulates richer responses & the emergence of new ideas; the researcher observes & gains first-hand insights; can be done more quickly & is generally less expensive than in-depth interviews
    - Disadvantage: not suitable for highly sensitive issues
  • 12. C. Self-administered written questionnaire
    - A data collection tool in which written questions are presented, to be answered by the respondents in writing
    - Can be administered in different ways, such as:
      - sending questionnaires by mail with clear instructions
      - gathering all or part of the respondents in one place at one time, giving oral or written instructions, & letting them fill out the questionnaires
      - hand-delivering questionnaires to respondents & collecting them later
  • 13. Questionnaire (cont.)
    - Questions can be either closed-ended or open-ended
    - A. Example of a closed-ended question: "What is the current breastfeeding status of the mother? A. Exclusive breastfeeding B. Partial breastfeeding C. Not breastfeeding"
    - B. Example of an open-ended question: "At what age should the child start supplementary food? Why?"
  • 14. Questionnaire (cont.)
    - Advantages: simpler & cheaper than an interview; can be administered to many persons simultaneously; can be sent by post
    - Disadvantages: demands a certain level of education & skill from respondents; if the questionnaire is mailed, people of low socio-economic status are less likely to respond
  • 15. Variable
    - A characteristic that takes different values in different persons, places or things
    - Any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) & can take different values
    - A study may involve one or many variables
  • 16. Types of variables
    - A. Qualitative (categorical) variables: nominal, ordinal
    - B. Quantitative (numerical) variables: continuous, discrete
    - According to their role in a study: 1. dependent (outcome, response) variable 2. independent (exposure, explanatory) variable
  • 17. A. Categorical (qualitative) variable
    - A variable that cannot be measured in quantitative form but can only be sorted by name or category
    - Cannot be measured the way height or weight are measured
    - The notion of magnitude is absent or only implicit
    - Categories must not overlap & must cover all possibilities
  • 18. Categorical variables are divided into two:
    1. Nominal variable
      - Values fall into unordered categories or classes
      - Uses names, labels or symbols to assign each measurement
      - Examples: blood type (A, B, AB, O); sex (male/female)
    2. Ordinal variable
      - Assigns each measurement to one of a limited number of categories that are ranked in order
      - Although non-numerical, the categories have a natural ordering
      - Examples: cancer stages 1, 2, 3, 4; pain severity (no pain, slight pain, moderate pain, severe pain)
  • 19. B. Quantitative (numerical) variable
    - A variable that can be measured or counted & expressed numerically; it has the notion of magnitude
    - E.g., height, weight, number of children
    - Quantitative variables are divided into two:
      1. Discrete variable
        - Can take only a countable set of distinct values, typically whole numbers (integers)
        - Characterized by gaps or interruptions between possible values
        - Both the order & the magnitude of the values matter; the values are actual measurable quantities, not just labels
        - E.g., number of children in a household (0, 1, 2, 3, …)
  • 20. Variables (cont.)
      2. Continuous variable
        - Can take an infinite number of possible values in any given interval or range
        - Both the magnitude & the order of the values matter
        - Has no gaps or interruptions between possible values
        - E.g., weight (50.123…), height (1.342…)
  • 21. Variables (cont.): manipulation of variables
    - Continuous variables can be discretized, e.g., an age of 1 1/12 years can be rounded to a whole number (1 year)
    - Continuous or discrete variables can be categorized, e.g., age categories 1 (1-5), 2 (6-10), 3 (11-15)
    - Categorical variables can be re-categorized, e.g., marital status (single, married, divorced, widowed) lumped from 4 categories down to 2 (married, single)
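The three manipulations above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides. The names `age_category` and `MARITAL_LUMPING` are hypothetical. Mapping "Divorced" and "Widowed" to "Single" is an assumption, because the slide names only the two target categories.

```python
def age_category(age_years: int) -> int:
    """Code a whole-number age into the example's groups: 1 (1-5), 2 (6-10), 3 (11-15)."""
    if 1 <= age_years <= 5:
        return 1
    if 6 <= age_years <= 10:
        return 2
    if 11 <= age_years <= 15:
        return 3
    raise ValueError(f"age {age_years} is outside the 1-15 range of the example")


# Discretize: an age of 1 1/12 years rounded to a whole number of years.
discretized_age = round(1 + 1 / 12)

# Re-categorize: lump 4 marital-status categories into 2.
# Treating divorced and widowed people as "Single" is an assumption for illustration.
MARITAL_LUMPING = {
    "Single": "Single",
    "Married": "Married",
    "Divorced": "Single",
    "Widowed": "Single",
}

ages = [1, 4, 6, 10, 11, 15]
print(discretized_age)                  # 1
print([age_category(a) for a in ages])  # [1, 1, 2, 2, 3, 3]
print([MARITAL_LUMPING[s] for s in ["Married", "Widowed", "Divorced"]])
```

Note that each step loses information: once ages are coded 1-3, the exact ages cannot be recovered.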
  • 22. Variables (cont.)
    1. Independent variables
      - Precede (come before) dependent variables in time
      - Are often manipulated by the researcher
    2. Dependent variables
      - What is measured as the outcome in a study
      - Their values depend on the independent variable
    - Example: "Health education involving active participation of mothers will produce more positive changes in child feeding than health education based on lectures."
      - Independent variable: type of health education
      - Dependent variable: changes in child feeding
  • 24. Scales of measurement
    - Measurement is the assignment of numbers to subjects, objects or events (variables) of interest according to a set of rules
    - Measurement refines our ordinary observations so that we can assign numerical values to them; these numbers provide the raw material for statistical analysis
    - Why do we measure things, or worry about the different forms measurement may take?
      - Measurement lets us go beyond simply describing the presence or absence of an event or thing to specifying how much, how long or how intense it is
      - With measurement, our observations become more accurate & more reliable
  • 25. Scales (cont.)
    - There are four types of scales of measurement.
    1. Nominal scale
      - Used when data are classified into one of two or more categories
      - Values fall into unordered categories or classes (not hierarchical; one category is not "better" or "higher" than another)
      - Uses names, labels or symbols to assign each measurement
      - Labeling or naming lets us make qualitative distinctions, categorize, & then count the frequency of persons, objects or things in each category
  • 26. Scales (cont.)
    - Nominal categories should be exhaustive & mutually exclusive:
      1. Exhaustive: include all possible responses
      2. Mutually exclusive: no respondent can have two attributes simultaneously
    - Not really a "scale", because it does not scale objects along any dimension
    - Numbers assigned to the categories have no mathematical meaning; they are simply for identification
    - Examples: 1. marital status (single, married, divorced) 2. religion (Muslim, Protestant, Orthodox, Catholic)
  • 27. Scales (cont.)
    2. Ordinal scale
      - Used when data are classified into logically ordered ranks
      - Assigns each measurement to one of a limited number of categories that are logically ranked in order
      - Although non-numerical, the categories have a natural ordering (the numbers have limited meaning: 4 > 3 > 2 > 1)
      - There is no consistent distance between points of measurement: the intervals between adjacent numbers are not equal
      - Example: social class (very poor, poor, rich, very rich)
  • 28. Scales (cont.)
    3. Interval scale
      - Used when data are classified on a scale that assumes equal distances between numbers
      - Properties: magnitude + constant distance between points (equal intervals between adjacent numbers) + no true zero point
      - Example: temperature in °F on 4 consecutive days:
          Day:        A   B   C   D
          Temp. (°F): 50  55  60  65
      - For these data, day A (50 °F) is not only cooler than day D (65 °F) but is 15 °F cooler.
      - There is no true zero point: 0 °F is arbitrarily chosen & does not mean the absence of temperature.
  • 29. Scales (cont.)
    4. Ratio scale
      - Used when data are classified on a scale that assumes equal distances & a true zero value
      - Measurement begins at a true zero point & the scale has equal spacing
      - Properties: magnitude + constant distance between points + equal ratios + true zero
      - Examples: height, weight, BP
      - Zero weight or height means the complete absence of weight or height
      - A 100-kg person has half the weight of a 200-kg person & twice the weight of a 50-kg person
      - The most sensitive & powerful scale, because it contains the most precise information about each observation
  • 30. Decision tree to determine the appropriate scale of measurement
    - Question 1: Is there any order to the numbers? No → nominal scale; Yes → go to Question 2
    - Question 2: Are there equal intervals between adjacent numbers? No → ordinal scale; Yes → go to Question 3
    - Question 3: Is there an absolute (true) zero? No → interval scale; Yes → ratio scale
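The decision tree reads naturally as three yes/no checks applied in order. Below is a minimal Python sketch. The function name `scale_of_measurement` is hypothetical, and the example variables are the ones used on the earlier slides.

```python
def scale_of_measurement(ordered: bool, equal_intervals: bool = False, true_zero: bool = False) -> str:
    """Answer the three decision-tree questions in turn and return the scale name."""
    if not ordered:          # Question 1: is there any order to the numbers?
        return "Nominal"
    if not equal_intervals:  # Question 2: equal intervals between adjacent numbers?
        return "Ordinal"
    if not true_zero:        # Question 3: is there an absolute (true) zero?
        return "Interval"
    return "Ratio"


print(scale_of_measurement(ordered=False))                                       # blood type -> Nominal
print(scale_of_measurement(ordered=True))                                        # pain severity -> Ordinal
print(scale_of_measurement(ordered=True, equal_intervals=True))                  # temperature in °F -> Interval
print(scale_of_measurement(ordered=True, equal_intervals=True, true_zero=True))  # weight -> Ratio
```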
  • 32. Why is level of measurement important?
    - It helps you decide:
      1. what kind of data display or summary method, & what statistical analysis, is appropriate for the values that were assigned; &
      2. how to interpret the data from that variable.
  • 33. Data organization & presentation
  • 34. Data organization & presentation
    1. For categorical variables
      A. Using a frequency distribution table: 1. frequency counts 2. relative frequency 3. cumulative frequency 4. relative cumulative frequency
      B. Using pictorial forms: 1. bar charts (graphs) 2. pie charts
    - Ordered array: a simple arrangement of individual observations in order of magnitude; very difficult with a large sample size
  • 35. Data organization & presentation (cont.)
    2. For quantitative variables
      A. Using a frequency distribution table: 1. frequency counts 2. relative frequency 3. cumulative frequency 4. relative cumulative frequency
      B. Using pictorial forms: 1. histogram 2. frequency polygon 3. line graph 4. scatter plot 5. box plot 6. ogive (cumulative frequency curve)…
  • 36. Frequency tables & frequency distributions
    - Frequency table: lists all the observed values of the variable being studied & how many times each value is observed
    - Frequency distribution: the distribution of the total number of observations among the various categories; a simple & effective way of summarizing large amounts of data
    - Relative frequency: the proportion or percentage of observations in each category; the distribution of these proportions is called the relative frequency distribution of the variable
    - Given the total number of observations, the relative frequency distribution is easily derived from the frequency distribution
  • 37. Frequency tables & frequency distributions (cont.)
    - Cumulative frequency: the number of observations in a category plus the observations in all categories below it (i.e., all preceding categories)
    - Cumulative relative frequency: the proportion of observations in a category plus those in all categories below it; it is obtained by dividing the cumulative frequency by the total number of observations
  • 38. For categorical variables. Example: birth weight (BWT) of newborns with four levels: 1. very low 2. low 3. normal & 4. big
    Table 1. Distribution of birth weight of newborns between 1976 & 1996 at "X" town
      BWT        Freq.   Cum. freq.          Rel. freq. (%)          Cum. rel. freq. (%)
      Very low      43   43                  43/9974×100 = 0.4       43/9974×100 = 0.4
      Low          793   43+793 = 836        793/9974×100 = 8.0      836/9974×100 = 8.4
      Normal      8870   836+8870 = 9706     8870/9974×100 = 88.9    9706/9974×100 = 97.3
      Big          268   9706+268 = 9974     268/9974×100 = 2.7      9974/9974×100 = 100
      Total       9974                       100
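The arithmetic in Table 1 can be reproduced with a short Python sketch that uses only the standard library. Variable names are illustrative. Running totals give the cumulative frequencies, and dividing by the total of 9974 gives the relative columns in percent.

```python
from itertools import accumulate

categories = ["Very low", "Low", "Normal", "Big"]
frequencies = [43, 793, 8870, 268]

total = sum(frequencies)                           # 9974
cum_freq = list(accumulate(frequencies))           # running totals: 43, 836, 9706, 9974
rel_pct = [100 * f / total for f in frequencies]   # relative frequency (%)
cum_rel_pct = [100 * c / total for c in cum_freq]  # cumulative relative frequency (%)

print(f"{'BWT':<9}{'Freq.':>7}{'Cum.':>7}{'Rel.%':>7}{'Cum.%':>7}")
for row in zip(categories, frequencies, cum_freq, rel_pct, cum_rel_pct):
    print("{:<9}{:>7}{:>7}{:>7.1f}{:>7.1f}".format(*row))
```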
  • 39. Quantitative variables
    - Select a set of continuous, non-overlapping intervals such that each value can be placed in one & only one of the intervals
    - The first consideration is how many intervals to include
    - To determine the number of class intervals & the corresponding width, we may use Sturges' rule:
        $K = 1 + 3.322 \log_{10} n$,  $W = \frac{L - S}{K}$
      where K = number of class intervals, n = number of observations, W = width of the class interval, L = largest value & S = smallest value
  • 40. Quantitative variables (cont.)
    - Example: leisure time (hours) per week for 40 college students:
        23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13 10 19 27
        29 22 38 28 34 32 23 19 21 31 16 28 19 18 12 27 15 21 25 16
    - K = 1 + 3.322 (log 40) = 6.32 ≈ 6
    - Maximum value = 38, minimum value = 10
    - W = (L − S)/K = (38 − 10)/6 = 4.67 ≈ 5
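Sturges' rule for the leisure-time data can be checked with a minimal Python sketch. The rounding choices are assumptions. K is rounded to the nearest whole number. W is rounded up so that the classes cover the whole range. Here rounding up and rounding to the nearest give the same width, 5.

```python
import math

data = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21,
        16, 15, 19, 20, 22, 14, 13, 10, 19, 27,
        29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
        16, 28, 19, 18, 12, 27, 15, 21, 25, 16]

n = len(data)                         # 40 observations
k_raw = 1 + 3.322 * math.log10(n)     # Sturges' rule, about 6.32
k = round(k_raw)                      # 6 class intervals
w_raw = (max(data) - min(data)) / k   # (38 - 10) / 6, about 4.67
w = math.ceil(w_raw)                  # class width 5 (rounded up)

print(n, round(k_raw, 2), k, round(w_raw, 2), w)  # 40 6.32 6 4.67 5
```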
  • 41. Quantitative variables (cont.)
      Time (hours)   Frequency   Relative frequency   Cumulative relative frequency
      10-14              5            0.125                  0.125
      15-19             11            0.275                  0.400
      20-24             12            0.300                  0.700
      25-29              7            0.175                  0.875
      30-34              3            0.075                  0.950
      35-39              2            0.050                  1.000
      Total             40            1.000
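Counting the 40 observations into the six classes (10-14, …, 35-39) reproduces the frequency table above. In this Python sketch each value's class is found by integer division. That assumes whole-number data and equal class widths starting at 10. Cumulative relative frequencies are computed from cumulative counts, which avoids adding up rounded proportions.

```python
from itertools import accumulate

data = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21,
        16, 15, 19, 20, 22, 14, 13, 10, 19, 27,
        29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
        16, 28, 19, 18, 12, 27, 15, 21, 25, 16]

first_lower_limit, width, k = 10, 5, 6  # classes 10-14, 15-19, ..., 35-39
counts = [0] * k
for x in data:
    counts[(x - first_lower_limit) // width] += 1  # index of the class containing x

n = len(data)
rel_freq = [c / n for c in counts]
cum_rel_freq = [c / n for c in accumulate(counts)]

for i, (c, r, cr) in enumerate(zip(counts, rel_freq, cum_rel_freq)):
    lower = first_lower_limit + i * width
    print(f"{lower}-{lower + width - 1}  {c:>2}  {r:.3f}  {cr:.3f}")
```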
  • 42. Quantitative variables (cont.)
    - Class limits: the range of each class, given by the lower class limit & the upper class limit
    - Mid-point (class mark): the value midway between the lower & upper limits of a class
    - Class boundaries (true limits): limits that make the intervals of a continuous variable continuous in both directions, given by the lower class boundary & the upper class boundary
    - To obtain them, subtract 0.5 from the lower class limit & add 0.5 to the upper class limit (0.5 is half the gap of 1 between adjacent classes when the data are whole numbers)
  • 43. Quantitative variables (cont.)
      Time (hours)   True limits (class boundaries)   Mid-point          Frequency
      10-14          9.5-14.5                         (10+14)/2 = 12         5
      15-19          14.5-19.5                        (15+19)/2 = 17        11
      20-24          19.5-24.5                        (20+24)/2 = 22        12
      25-29          24.5-29.5                        (25+29)/2 = 27         7
      30-34          29.5-34.5                        (30+34)/2 = 32         3
      35-39          34.5-39.5                        (35+39)/2 = 37         2
      Total                                                                 40
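Below is a minimal Python sketch of the class-boundary and mid-point calculations. It assumes whole-number data, so the gap between adjacent classes is 1. `class_table` is a hypothetical helper name.

```python
def class_table(classes, gap=1):
    """Return (limits, true limits, mid-point) for each class.

    gap is the distance between one class's upper limit and the next class's
    lower limit (1 for whole-number data); half of it is subtracted from the
    lower limit and added to the upper limit to get the class boundaries.
    """
    rows = []
    for lower, upper in classes:
        boundaries = (lower - gap / 2, upper + gap / 2)
        midpoint = (lower + upper) / 2
        rows.append(((lower, upper), boundaries, midpoint))
    return rows


classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
for limits, boundaries, midpoint in class_table(classes):
    print(f"{limits[0]}-{limits[1]}  {boundaries[0]}-{boundaries[1]}  {midpoint}")
```

For data recorded to one decimal place, the gap would be 0.1 and the boundaries would move by 0.05 instead of 0.5.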
  • 44. Guidelines for constructing tables
    1. Keep them simple (limit the number of variables to three or fewer)
    2. Make all tables self-explanatory (include a clear title telling what, when & where)
    3. Clearly label the rows & columns
    4. State clearly the unit of measurement used
    5. Explain codes & abbreviations in a footnote
    6. Show totals
    7. If the data are not original, indicate the source in a footnote
  • 45. Pictorial/diagrammatic presentation
    - Importance of diagrammatic presentation:
      1. Diagrams are more attractive than mere figures
      2. They give a quick overall impression of the data
      3. They are easier to remember than mere figures
      4. They facilitate comparison
      5. They help in understanding patterns & trends, e.g., skewed or symmetric distributions, multiple peaks (modes), outliers, & relationships between variables
  • 46. Qualitative variable presentation: 1. Bar charts (graphs)
    1. The graphical equivalent of a frequency table
    2. Categories are listed on the horizontal axis (X-axis)
    3. Frequencies or relative frequencies are represented on the vertical axis (Y-axis, ordinate)
    4. The height of each bar is proportional to the frequency or relative frequency of observations in that category
  • 47. A. Simple bar chart: used to represent a single variable
    [Figure: simple bar chart; X-axis: immunization status (not immunized, partially immunized, fully immunized); Y-axis: number of children (0-100). Fig. 1. Immunization status of children in Adami Tulu Woreda, Feb. 1995]
  • 48. Qualitative variable presentation: B. Sub-divided (component) bar chart
    1. If different quantities form the sub-divisions of the totals, simple bars may be sub-divided in the ratio of the various sub-divisions to show the relationship of the parts to the whole.
    2. The order in which the components are shown in one bar is followed in all bars in the diagram.
  • 49. Example of a 100% component bar chart
    [Figure: 100% component bar chart; X-axis: August, October, December 2003; Y-axis: percent (0-100); components: mixed, P. vivax, P. falciparum. Fig.1 Plasmodium species distribution for confirmed malaria cases, Zeway, 2003]
  • 50. Qualitative variable presentation: method of constructing a bar chart
    1. All bars must have equal width
    2. Bars are not joined together (leave a space between bars)
    3. Bars should be separated by equal distances
    4. All bars should rest on the same line, called the base
    5. Label both axes clearly
    - Instead of "stacks" rising up from the horizontal axis (bar chart), we could instead plot the shares of a pie.
  • 51. Qualitative variable presentation: 2. Pie chart
    1. Shows the relative frequency of each category by dividing a circle into sectors
    2. The angles are proportional to the relative frequencies
    3. Used for a single categorical variable
    4. Uses percentage distributions
    - Steps to construct a pie chart:
      1. Construct a frequency table
      2. Convert the frequencies into percentages (P)
      3. Convert the percentages into degrees: degree = (P/100) × 360°
      4. Draw a circle & divide it accordingly
  • 52. Steps to construct a pie chart. Example: distribution of deaths for females, in England and Wales, 1989
      Cause of death        No. of deaths   Angle of sector
      Circulatory system    100 000         100,000/236,000 × 360° ≈ 153°
      Neoplasm               70 000          70,000/236,000 × 360° ≈ 107°
      Respiratory system     30 000          30,000/236,000 × 360° ≈ 46°
      Injury & poisoning      6 000           6,000/236,000 × 360° ≈ 9°
      Digestive system       10 000          10,000/236,000 × 360° ≈ 15°
      Others                 20 000          20,000/236,000 × 360° ≈ 30°
      Total                 236 000         100% (360°)
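Here is the percentage-to-degrees conversion from the pie-chart steps, applied to the example as a Python sketch (names are illustrative). The unrounded angles add up to exactly 360°. Rounding each sector to whole degrees can make the total drift by a degree: "Others" is 30.5°, for example. One sector may then need adjusting so the circle closes.

```python
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury & poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())  # 236 000 deaths
for cause, count in deaths.items():
    percentage = 100 * count / total  # step 2: frequency -> percentage
    degrees = percentage / 100 * 360  # step 3: percentage -> degrees
    print(f"{cause:<20} {percentage:5.1f}%  {degrees:6.1f}°")
```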
  • 53. Pie chart (cont.)
    - Instead of "stacks" rising up from the horizontal axis (bar chart), we could instead plot the shares of a pie.
    - Since a circle has 360 degrees, 50% corresponds to 180 degrees, 25% to 90 degrees, & so on; we can therefore draw "wedges" according to relative frequency.
    [Figure: pie chart. Distribution of cause of death for females, in England and Wales, 1989: circulatory system 42%, neoplasms 30%, respiratory system 13%, injury & poisoning 3%, digestive system 4%, others 8%]
  • 54. Quantitative variable presentation: 3. Histogram
    1. Histograms are frequency distributions with continuous class intervals that have been turned into graphs
    2. A histogram is a type of bar chart, but with no spaces between the bars (continuous data)
    3. Histograms are used to represent frequency distributions of continuous data visually
    4. Given a set of numerical data, we can get an impression of the shape of its distribution by constructing a histogram
  • 55. Quantitative variable presentation (cont.): 3. Histogram
    5. Constructed by choosing a set of non-overlapping class intervals & counting the number of observations that fall in each class
    6. The class intervals must be non-overlapping so that each observation falls in one & only one interval
    7. Bars are drawn over the intervals
    8. The area of each bar is proportional to the frequency of observations in the interval
    - Two problems with histograms: 1. they are somewhat difficult to construct; 2. the actual values within the respective groups are lost & difficult to reconstruct
    - A stem-and-leaf plot overcomes these problems
  • 56. Histogram example: distribution of the age of women at the time of marriage
      Age group   15-19  20-24  25-29  30-34  35-39  40-44  45-49
      Number         11     36     28     13      7      3      2
    [Figure: histogram "Age of women at the time of marriage"; X-axis: age group, with bars over the class boundaries 14.5-19.5, 19.5-24.5, …, 44.5-49.5; Y-axis: number of women (0-40)]
  • 57. Quantitative variable presentation (cont.): 5. Frequency polygon
    1. Instead of drawing bars for each class interval, a single point is sometimes drawn at the mid-point of each class interval & consecutive points are joined by straight lines.
    2. Graphs drawn in this way are called frequency polygons.
    3. The total area under the frequency polygon is equal to the area under the histogram (when the polygon is closed at the mid-points of the empty classes at each end).
    4. Frequency polygons are better than histograms for comparing two or more sets of data.
  • 58. [Figure: frequency polygon "Age of women at the time of marriage"; X-axis: age (class mid-points 12, 17, 22, 27, 32, 37, 42, 47); Y-axis: number of women (0-40)]
  • 59. Quantitative variable presentation (cont.): 6. Scatter plot
    1. Most studies in medicine involve measuring more than one characteristic.
    2. For two quantitative variables we use bivariate plots (also called scatter plots or scatter diagrams).
    3. E.g., in the study on percentage saturation of bile, information was also collected on the age of each patient, to see whether a relationship existed between the two measures (saturation of bile & age).
  • 60. Quantitative variable presentation (cont.): 6. Scatter plot
    - When both variables are qualitative, we can use a bar graph.
    - When one characteristic is qualitative & the other is quantitative, the data can be displayed in box-and-whisker plots.
    - A scatter diagram is constructed by drawing X- & Y-axes.
    - Each point (dot) represents a pair of values measured for a single study subject.
    - The graph suggests the possibility of a positive relationship between age & percentage saturation of bile in women.
  • 61. [Figure: scatter plot "Age and percentage saturation of bile for women patients in hospital Z, 1998"; X-axis: age (0-80); Y-axis: saturation of bile (0-160)]
  • 62. Quantitative variable presentation (cont.): 7. Line graph
    1. Useful for assessing the trend of a particular situation over time
    2. Helps in monitoring the trend of epidemics
    3. Values for each category are connected by a continuous line
    4. Sometimes two or more graphs are drawn on the same axes using the same scale, so that the plotted graphs are comparable
  • 63. Line graph
    [Figure: line graph; X-axis: year (1967-1979); Y-axis: rate (%) (0.0-5.5). Fig 5: Malaria parasite prevalence rates in Ethiopia, 1967-1979 E.C.]
  • 64. Line graph
      Year   MMR/1000
      1960   50
      1970   45
      1980   26
      1990   15
      2000   12
    [Figure: line graph of the values above; X-axis: year; Y-axis: MMR/1000. Figure (1): Maternal mortality rate of (country), 1960-2000]
  • 65. Reading assignment
    1. Ogive curve
    2. Box-and-whisker plot
    3. Stem-and-leaf plot
  • 66. Numerical summary measures: 1. measures of central tendency 2. measures of dispersion
  • 67. Measures of central tendency
  • 68. 1. Measures of central tendency
    - Statistic: a descriptive measure computed from sample data
    - Parameter: a descriptive measure computed from population data
    - Measures of central tendency summarize, in a single number (statistic), the point around which the data tend to cluster
    - The most commonly used measures of central tendency are: 1. arithmetic mean 2. median 3. mode
  • 69. 1. Arithmetic mean
    - The average of the data set: the sum of the observations divided by the number of observations
    - Mean for ungrouped data:
      - Mean of a sample: $\bar{X} = \frac{\sum X}{n}$
      - Mean of a population: $\mu = \frac{\sum X}{N}$
    - where $\bar{X}$ (X bar) is the mean of a sample & $\mu$ is the mean of a population; $\sum X$ is the sum of all the X values; n is the number of values in the sample & N is the number of values in the population
  • 70. Arithmetic mean (cont.)
    - Example: 19 21 20 20 34 22 24 27 27 27; calculate the mean (n = 10):
        Mean = (19 + 21 + 20 + 20 + 34 + 22 + 24 + 27 + 27 + 27)/10 = 241/10 = 24.1
    - Mean for grouped data: we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as:
        $\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$
      where k = the number of class intervals, $m_i$ = the mid-point of the ith class interval & $f_i$ = the frequency of the ith class
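Here is a short Python check of the ungrouped example. It also illustrates a property of the mean listed with its other properties: the deviations from the mean sum to zero, up to floating-point rounding.

```python
values = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

mean = sum(values) / len(values)  # sum of X / n = 241 / 10
deviations = [x - mean for x in values]

print(mean)             # 24.1
print(sum(deviations))  # zero, up to floating-point rounding
```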
  • 71. Arithmetic mean (cont.)
    - Example: compute the mean age of 169 subjects from the grouped data.
      Class interval   Mid-point (mi)   Frequency (fi)   mi × fi
      10-19            14.5               4                58.0
      20-29            24.5              66              1617.0
      30-39            34.5              47              1621.5
      40-49            44.5              36              1602.0
      50-59            54.5              12               654.0
      60-69            64.5               4               258.0
      Total            __               169              5810.5
    - Mean = 5810.5/169 ≈ 34.4 years
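The grouped-mean formula applied to this table, as a Python sketch. Each class is represented by its mid-point $m_i$ weighted by its frequency $f_i$. 5810.5/169 is about 34.38, i.e., 34.4 years to one decimal place.

```python
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))  # sum of m_i * f_i = 5810.5
n = sum(frequencies)                                         # sum of f_i = 169
grouped_mean = sum_mf / n

print(sum_mf, n, round(grouped_mean, 2))  # 5810.5 169 34.38
```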
  • 72. Properties of the arithmetic mean
    1. Can be used for both discrete & continuous data; however, it is not appropriate for nominal or ordinal data
    2. For a given set of data there is one & only one arithmetic mean
    3. It is easily understood & easy to compute
    4. The algebraic sum of the deviations of the values from their arithmetic mean is always zero
    5. It is greatly affected by extreme values
  • 73. 2. Median
    - The value that divides a series of values in half when they are listed in order
    - If the number of observations is odd, the median is the [(n+1)/2]th observation
      - E.g., 19 20 20 21 22 23 24 27 27 27 34 (n = 11): median = [(11+1)/2]th = 6th observation = 23
    - If the number of observations is even, there is no single middle observation, so the median is the average of the two middle values: [(n/2)th + ((n/2)+1)th]/2
      - E.g., 19 20 20 21 22 24 27 27 27 34 (n = 10): median = [(10/2)th + ((10/2)+1)th]/2 = (5th + 6th)/2 = (22 + 24)/2 = 23
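Both cases of the median rule can be written as a Python sketch. The helper name `median` is illustrative. Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # 0-based index of the ((n+1)/2)th observation when n is odd
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and ((n/2)+1)th observations


odd = [19, 20, 20, 21, 22, 23, 24, 27, 27, 27, 34]
even = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
print(median(odd), median(even))  # 23 23.0
```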
  • 74. Median (cont.): median for grouped data
    - We assume that the values within a class interval are evenly distributed through the interval
    - The first step is to locate the class interval that contains the median (the median class)
    - Find n/2 & look for the first class interval whose cumulative frequency is ≥ n/2; that class, & only that class, contains the median
  • 75. Median (cont.)
    - To find a unique median value, use the following interpolation formula:
        $\tilde{x} = L_m + \left( \frac{n/2 - F_c}{f_m} \right) W$
    - where
      - $L_m$ = lower true class boundary of the interval containing the median
      - $F_c$ = cumulative frequency of the interval just above the median class interval (i.e., the class preceding it in the table)
      - $f_m$ = frequency of the interval containing the median
      - W = class interval width
      - n = total number of observations
  • 76. Median (cont.)
    - Example: compute the median age of 169 subjects from the grouped data.
      Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq.
      10-19            14.5               4                  4
      20-29            24.5              66                 70
      30-39            34.5              47                117
      40-49            44.5              36                153
      50-59            54.5              12                165
      60-69            64.5               4                169
      Total                             169
  • 77. Median (cont.)
    - n/2 = 169/2 = 84.5, so the median lies in the 3rd class interval (30-39), the first whose cumulative frequency is ≥ 84.5
    - Lower boundary $L_m$ = 29.5 (upper boundary 39.5), so W = 10
    - Frequency of the median class $f_m$ = 47
    - Cumulative frequency of the class above it $F_c$ = 70
    - Median = 29.5 + ((84.5 − 70)/47) × 10 = 32.59 ≈ 33 years
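The interpolation formula can be written as a Python sketch. `grouped_median` is a hypothetical helper. It walks down the classes and keeps the cumulative frequency of the classes before the current one as $F_c$. It stops at the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Interpolated median: L_m + ((n/2 - F_c) / f_m) * W."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequencies must contain at least one observation")
    half = n / 2
    cum_before = 0  # F_c: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if cum_before + f >= half:  # first class whose cumulative frequency reaches n/2
            return lower + (half - cum_before) / f * width
        cum_before += f


lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]
print(round(grouped_median(lower_boundaries, frequencies, 10), 2))  # 32.59
```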
  • 78. Properties of the median
    1. Can be used for ordinal, discrete & continuous data; however, it is not appropriate for nominal data
    2. There is only one median for a given set of data
    3. The median is easy to calculate
    4. The median is a positional average & hence is not drastically affected by extreme values
    5. It is not a good representative of the data if the number of items is small
  • 79. 3. Mode
    - The value (observation) that occurs most frequently
    - Most distributions have one peak & are described as unimodal
    - E.g., 19 21 20 20 34 22 24 27 27 27: mode = 27
    - For grouped data, the mode usually refers to the modal class, the class with the highest frequency; it corresponds to the tallest bar in a histogram
    - Not a good summary on its own: a data set can have one mode, more than one mode, or no mode
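Here is a Python sketch that finds the mode(s) of ungrouped data with `collections.Counter`. It covers the one-mode, several-modes and no-mode cases mentioned above. Returning an empty list when every value occurs equally often is one common convention for "no mode" and is assumed here. `modes` is a hypothetical helper name.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), sorted; [] when no value stands out."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # assumed convention: every value equally frequent -> no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([19, 21, 20, 20, 34, 22, 24, 27, 27, 27]))  # [27]
print(modes([1, 1, 2, 2, 3]))                           # [1, 2]  (bimodal)
print(modes([5, 6, 7]))                                 # []      (no mode)
```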
  • 80. Mode (cont.)
    - To find a single value of the mode for grouped data, use the following formula:
        $\text{Mode} = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$
    - where
      - $L_{mo}$ = lower boundary of the modal class
      - $\Delta_1$ = frequency of the modal class minus the frequency of the class before (above) it
      - $\Delta_2$ = frequency of the modal class minus the frequency of the class after (below) it
      - i = class width
  • 81. Mode (cont.)
    - Example: find the mode for the following data.
      Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq.
      10-19            14.5               4                  4
      20-29            24.5              66                 70
      30-39            34.5              47                117
      40-49            44.5              36                153
      50-59            54.5              12                165
      60-69            64.5               4                169
      Total                             169
    - Solution: the modal class is 20-29 (frequency 66), so $L_{mo}$ = 19.5 & i = 10; the class before it (10-19) has frequency 4 & the class after it (30-39) has frequency 47
      - Δ1 = 66 − 4 = 62 (modal class minus the class before it), Δ2 = 66 − 47 = 19 (modal class minus the class after it)
      - Mode = 19.5 + (62/(62 + 19)) × 10 = 19.5 + 7.65 = 27.15 ≈ 27 years
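The grouped-mode formula can be written as a Python sketch. `grouped_mode` is a hypothetical helper. It follows the standard convention: Δ1 compares the modal class with the class before it, and Δ2 compares it with the class after it. Classes beyond the ends of the table are assumed to have frequency 0. For the 169 subjects the modal class is 20-29. Its larger neighbour (30-39, frequency 47) comes after it, so the mode is pulled above the class mid-point, to about 27.2 years.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode = L_mo + (d1 / (d1 + d2)) * i, for the class with the highest frequency."""
    k = max(range(len(frequencies)), key=lambda j: frequencies[j])  # modal class (first if tied)
    f_mo = frequencies[k]
    f_before = frequencies[k - 1] if k > 0 else 0                    # class before (above) it
    f_after = frequencies[k + 1] if k + 1 < len(frequencies) else 0  # class after (below) it
    d1 = f_mo - f_before
    d2 = f_mo - f_after
    return lower_boundaries[k] + d1 / (d1 + d2) * width


lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]
print(round(grouped_mode(lower_boundaries, frequencies, 10), 2))  # 27.15
```

The formula is undefined (division by zero) when both neighbours have the same frequency as the modal class.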
  • 82. Properties of the mode
    1. Can be used for nominal, ordinal, discrete & continuous data; however, it is more appropriate for nominal & ordinal data
    2. It is not affected by extreme values
    3. Its value is often not unique
    4. Its main drawback is that it may not exist
  • 83. 2. Measures of dispersion
  • 84. 2. Measures of dispersion
    - Measures that quantify the variation or dispersion of a set of data around its central location
    - The dispersion of a set of observations is the variety exhibited by the observations:
      1. If all the values are the same → there is no dispersion
      2. If the values are not all the same → there is dispersion
      3. If the values are close to each other → the amount of dispersion is small
      4. If the values are widely scattered → the dispersion is greater
  • 85. Common measures of dispersion: 1. range 2. inter-quartile range 3. variance 4. standard deviation 5. coefficient of variation
  • 86. 1. Range (R)
    - The difference between the largest & smallest observations in a sample
    - Depends on only two values: Range = maximum value − minimum value
    - The range is the simplest measure of dispersion; a data set with a larger range shows more variability
    - Example: data values 5, 9, 12, 16, 23, 34, 37, 42; maximum value = 42, minimum value = 5; range = 42 − 5 = 37
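Here is the range in Python. A hypothetical extra observation (100) is added to show how much a single outlier can change it.

```python
values = [5, 9, 12, 16, 23, 34, 37, 42]
data_range = max(values) - min(values)
print(data_range)  # 37

# One hypothetical extreme observation changes the range drastically.
with_outlier = values + [100]
print(max(with_outlier) - min(with_outlier))  # 95
```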
  • 87. Properties of the range
    1. It is the simplest, crudest measure & is easily understood
    2. It takes into account only two values, which makes it a poor measure of dispersion
    3. It is very sensitive to extreme observations (outliers)
    4. It tends to increase as the sample size increases
  • 88. 2. Inter-quartile range (IQR)
    - Used when the median is used as the measure of central tendency
    - Gives the range within which the middle 50% of the distribution lies
    - Quantifies the difference between the third & first quartiles: IQR = Q3 − Q1
    - A large IQR indicates a large amount of variability among the middle 50% of the observations, & a small IQR indicates a small amount of variability
  • 89. 2. Inter-quartile range (IQR)..... The inter-quartile range is particularly useful for describing data sets that contain a few extreme values. Unlike the range, & to a lesser extent the standard deviation, it is not sensitive to extreme values, because it relies only on the spread of the middle 50% of the distribution. So, for data sets with extreme values, it can be more appropriate to use the median to describe central tendency & the inter-quartile range to describe spread.
  • 90. What do quartiles mean? When the ordered data are divided into four equal parts, the dividing points are called quartiles. The three quartiles (Q1, Q2, Q3) split the data into four parts, each containing 25% of the observations. The first quartile (Q1) is the point with 25% of the area to its left & 75% to its right: 25% of the observations are less than or equal to Q1 & 75% are greater than or equal to Q1. The first quartile is also called the 25th percentile.
  • 91. What do quartiles mean?.... The second quartile (Q2) is the point with 50% of the area to its left & 50% to its right; it is the median. The third quartile (Q3) is the point with 75% of the area to its left & 25% to its right: 75% of the observations are less than or equal to Q3 & 25% are greater than or equal to Q3. The third quartile is also called the 75th percentile.
  • 92. What do quartiles mean?.... Ex.1: Suppose we have a small data set of twelve observations: 15 18 19 20 20 20 21 23 23 24 24 25. We want to divide the data into four equal sets. First, we find the median: 15 18 19 20 20 20 ↑ 21 23 23 24 24 25. Median = (20 + 21)/2 = 20.5 (halfway between the 6th & 7th observations). The median divides the data into two equal sets with exactly 50% of the observations in each: the 1st-6th observations in the first set & the 7th-12th observations in the other.
  • 93. What do quartiles mean?.... To find the first quartile, we consider the observations below the median: 15 18 19 ↑ 20 20 20. The first quartile is the median of these data; here it is halfway between the 3rd & 4th observations & equals (19 + 20)/2 = 19.5. Now we consider the observations above the median: 21 23 23 ↑ 24 24 25. The third quartile is the median of these data & equals (23 + 24)/2 = 23.5. Summary: 15 18 19 ↑(Q1) 20 20 20 ↑(Q2) 21 23 23 ↑(Q3) 24 24 25. IQR = Q3 - Q1 = 23.5 - 19.5 = 4.
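The median-of-halves procedure used in Ex.1 can be sketched in Python as below. The handling of the middle observation when n is odd is an assumption here: it is excluded from both halves. Textbooks differ on this point, so results for odd n may not match other conventions.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1, Q2, Q3 as the medians of the lower half, the whole set and the upper half.

    Assumption: for odd n the middle observation is left out of both halves.
    """
    x = sorted(values)
    n = len(x)
    half = n // 2
    lower = x[:half]
    upper = x[half + 1:] if n % 2 else x[half:]
    return median(lower), median(x), median(upper)

data = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3, q3 - q1)  # 19.5 20.5 23.5 4.0
```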
  • 94. What do quartiles mean?.... Example 1: Suppose the first & third quartiles for the weights of girls 12 months of age are 8.8 kg & 10.2 kg, respectively. IQR = 10.2 kg - 8.8 kg = 1.4 kg, i.e., the middle 50% of the infant girls weigh between 8.8 & 10.2 kg. Example 2: Given the following data set (ages of patients): 18, 59, 24, 42, 21, 23, 24, 32, find the inter-quartile range. Solution: ordered data: 18 21 23 24 24 32 42 59 (n = 8). Q1 is at position (n+1)/4 = 2.25, so Q1 = 21 + (23 - 21) × 0.25 = 21.5. Q3 is at position 3(n+1)/4 = 6.75, so Q3 = 32 + (42 - 32) × 0.75 = 39.5. Hence, IQR = 39.5 - 21.5 = 18.
  • 95. What do quartiles mean?.... Ex.2: Given these data: 13, 7, 9, 15, 11, 5, 8, 4. a. Arrange the observations in increasing order: 4, 5, 7, 8, 9, 11, 13, 15. b. Find the positions of the 1st & 3rd quartiles (n = 8): position of Q1 = ¼(n+1) = ¼(8+1) = 2.25th, so Q1 lies between the 2nd & 3rd observations; position of Q3 = ¾(n+1) = ¾(8+1) = 6.75th, so Q3 lies between the 6th & 7th observations.
  • 96. What do quartiles mean?.... c. Identify the values of the 1st & 3rd quartiles. The value of Q1 equals the value of the 2nd observation plus ¼ of the difference between the values of the 3rd & 2nd observations: 3rd observation = 7, 2nd observation = 5, so Q1 = 5 + ¼(7 - 5) = 5 + 2/4 = 5.5. The value of Q3 equals the value of the 6th observation plus ¾ of the difference between the values of the 7th & 6th observations: 7th observation = 13, 6th observation = 11, so Q3 = 11 + ¾(13 - 11) = 11 + 6/4 = 12.5.
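  • 97. What do quartiles mean?.... d. Calculate the inter-quartile range: Q3 = 12.5, Q1 = 5.5, so IQR = Q3 - Q1 = 12.5 - 5.5 = 7. In general, the kth quartile (k = 1, 2, 3) lies at position k(n+1)/4 of the ordered data; if this position is not a whole number, interpolate between the two neighbouring observations as above. (Another common convention uses position kn/4: if kn/4 is a whole number, Qk is the average of the (kn/4)th & (kn/4 + 1)th observations; otherwise Qk is the observation at the next whole-number position above kn/4. The conventions can give slightly different values, as can the median-of-halves method of Ex.1.) Quartiles for grouped data: apply the same method as for the grouped median, $Q_1 = L_{Q_1} + \frac{n/4 - cf}{f_{Q_1}} \times i$ & $Q_3 = L_{Q_3} + \frac{3n/4 - cf}{f_{Q_3}} \times i$, where $L$ is the lower class boundary of the quartile class, $cf$ is the cumulative frequency of the classes before it, $f$ is its frequency & $i$ is the class width. The Q1 class is the first class whose cumulative frequency reaches n/4, & the Q3 class is the first whose cumulative frequency reaches 3n/4; then IQR = Q3 - Q1.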
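A minimal Python sketch of the k(n+1)/4 position rule with linear interpolation, checked against Example 2 of slide 94 and Ex.2. Clamping positions below 1 or above n to the minimum or maximum is an assumption that only matters for very small samples. As a cross-check, the standard library's `statistics.quantiles` uses the same (n+1) positioning with its default `method='exclusive'`, so the two should agree.

```python
from statistics import quantiles

def quartile_n_plus_1(values, k):
    """k-th quartile (k = 1, 2, 3) at position k(n+1)/4, linearly interpolated."""
    x = sorted(values)
    n = len(x)
    pos = k * (n + 1) / 4        # 1-based position in the ordered data
    if pos <= 1:                 # assumption: clamp to the minimum
        return float(x[0])
    if pos >= n:                 # assumption: clamp to the maximum
        return float(x[-1])
    whole = int(pos)             # the observation just below the position
    frac = pos - whole           # how far to move toward the next observation
    return x[whole - 1] + frac * (x[whole] - x[whole - 1])

ages = [18, 59, 24, 42, 21, 23, 24, 32]
q1, q3 = quartile_n_plus_1(ages, 1), quartile_n_plus_1(ages, 3)
print(q1, q3, q3 - q1)  # 21.5 39.5 18.0

ex2 = [13, 7, 9, 15, 11, 5, 8, 4]
q1, q3 = quartile_n_plus_1(ex2, 1), quartile_n_plus_1(ex2, 3)
print(q1, q3, q3 - q1)  # 5.5 12.5 7.0

# Cross-check with the standard library (default method='exclusive')
print(quantiles(ex2, n=4))  # [5.5, 8.5, 12.5]
```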
  • 98. Properties of IQR 1. It is a simple & versatile measure. 2. It encloses the central 50% of the observations. 3. It is not based on all observations but only on two specific values. 4. It is important in selecting cut-off points when formulating clinical standards. 5. Since it excludes the lowest & highest 25% of values, it is resistant to outliers. 6. It is less sensitive to the size of the sample than the range.
  • 99. Percentiles: Percentiles divide the ordered data into 100 equal parts. They are less sensitive to outliers & are not greatly affected by the sample size (n).
  • 100. Percentiles…. P0: the minimum. P25: 25% of the sample values are less than or equal to this value (the 1st quartile; P25 is the 25th percentile). P50: 50% of the sample values are less than or equal to this value (the 2nd quartile, i.e., the median). P75: 75% of the sample values are less than or equal to this value (the 3rd quartile). P100: the maximum.
  • 101. Percentiles….. The pth percentile (with p written as a proportion, 0 < p < 1) is a value such that 100p% of the observations are less than or equal to it & the remaining 100(1 - p)% are greater than or equal to it. It is the observation at position p(n+1) if p(n+1) is an integer; otherwise it is the average of the kth & (k+1)th observations, where k is the largest integer less than p(n+1). For example, if p(n+1) = 3.6, it is the average of the 3rd & 4th observations.
  • 102. Percentiles….. Example: birth weights in grams (already ordered, n = 20): 2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146. Find the 10th & 90th percentiles of the data set. Solution, 10th percentile: using position tn/100, 20 × 0.10 = 2 is a whole number, so P10 is the average of the 2nd & 3rd values = (2581 + 2759)/2 = 2670 g. Solution, 90th percentile: 20 × 0.90 = 18 is a whole number, so P90 is the average of the 18th & 19th values = (3609 + 3649)/2 = 3629 g. (The p(n+1) rule of slide 101 gives the same results here: 0.1 × 21 = 2.1 & 0.9 × 21 = 18.9.)
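A Python sketch of the tn/100 position rule used in this example. For a position that is not a whole number it takes the next observation up. That is the usual companion convention, but it is an assumption here because the example never needs it. `t` is assumed to be a whole number so that the whole-number test can be done exactly with integer arithmetic.

```python
def percentile_np_rule(values, t):
    """t-th percentile (t a whole number from 0 to 100) using position t*n/100."""
    x = sorted(values)
    n = len(x)
    k, remainder = divmod(t * n, 100)   # t*n/100 = k + remainder/100
    if remainder == 0:
        # Whole-number position: average the k-th and (k+1)-th observations
        # (1-based), clamping at the ends for t = 0 or t = 100.
        lo = x[max(k, 1) - 1]
        hi = x[min(k + 1, n) - 1]
        return (lo + hi) / 2
    # Assumption: otherwise take the observation at the next whole-number
    # position up, i.e. position k + 1 (1-based), which is index k.
    return x[k]

birth_weights = [2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
                 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146]
print(percentile_np_rule(birth_weights, 10))  # 2670.0
print(percentile_np_rule(birth_weights, 90))  # 3629.0
```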
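  • 103. Percentiles….. In general, with position tn/100: 1. if tn/100 is a whole number, Pt is the average of the (tn/100)th & (tn/100 + 1)th ordered observations; 2. otherwise, Pt is the observation at the next whole-number position above tn/100. For grouped data use: $P_t = L + \frac{tn/100 - cf}{f} \times i$, where L is the lower class boundary of the percentile class (the first class whose cumulative frequency reaches tn/100), cf is the cumulative frequency of the classes before it, f is its frequency, i is the class width & n is the total number of observations.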
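A Python sketch of the grouped-data percentile formula, applied to the age frequency table of 169 subjects from the variance slides. It assumes ages are recorded in whole years, so the class 10-19 has lower boundary 9.5 and every class has width 10. These boundaries are an assumption and are not given in the table. With t = 25, 50 and 75, the same function gives the grouped quartiles and the grouped median.

```python
def grouped_percentile(lower_bounds, width, freqs, t):
    """P_t = L + (t*n/100 - cf) / f * i for data in equal-width classes.

    lower_bounds: lower class boundaries; width: class width i;
    freqs: class frequencies; t: percentile between 0 and 100.
    """
    if not 0 <= t <= 100:
        raise ValueError("t must be between 0 and 100")
    n = sum(freqs)
    target = t * n / 100
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        # The percentile class is the first whose cumulative frequency reaches target
        if f > 0 and cf + f >= target:
            return lower + (target - cf) / f * width
        cf += f
    raise ValueError("frequencies must not all be zero")

# Age classes 10-19, ..., 60-69 (assumed boundaries 9.5, 19.5, ..., 59.5)
lower_bounds = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [4, 66, 47, 36, 12, 4]

q1 = grouped_percentile(lower_bounds, 10, freqs, 25)
median = grouped_percentile(lower_bounds, 10, freqs, 50)
q3 = grouped_percentile(lower_bounds, 10, freqs, 75)
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}")
# Q1 = 25.30, median = 32.59, Q3 = 42.21, IQR = 16.91
```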
  • 104. Variance (σ², s²) The variance is essentially the average of the squared deviations from the mean. A good measure of dispersion should make use of all the data; the variance achieves this by averaging the squared deviations of every observation from the mean. The sample variance of a set $x_1, x_2, \ldots, x_n$ of n observations with mean $\bar{x}$ is $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$. Degrees of freedom: n - 1 is used because the deviations must sum to zero, so once n - 1 deviations are known, the nth is determined.
  • 105. Variance (σ², s²) The deviations are squared because the sum of the deviations of the observations about the sample mean is always zero. Degrees of freedom: in computing the variance there are (n - 1) degrees of freedom, because only (n - 1) of the deviations are independent of each other. This is because the deviations from the mean, $(x_i - \bar{x})$, must add to zero, so the last one can always be calculated from the others (it is not free to vary).
  • 106. Variance (σ², s²) Example: Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50. Find the sample variance of the data. Mean = 560/10 = 56. $s^2 = \frac{(43 - 56)^2 + (66 - 56)^2 + \cdots + (50 - 56)^2}{10 - 1} = \frac{810}{9} = 90$. Variance for grouped data: $s^2 = \frac{\sum_{i=1}^{k}(m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1}$, where $m_i$ = the midpoint of the ith class interval, $f_i$ = the frequency of the ith class interval, $\bar{x}$ = the sample mean & k = the number of class intervals.
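The same calculation in Python, written out step by step. It also confirms that the raw deviations sum to zero, and cross-checks the result against the standard library's `statistics.variance`, which likewise divides by n - 1.

```python
from statistics import variance

data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
n = len(data)
x_bar = sum(data) / n                       # sample mean
ss = sum((x - x_bar) ** 2 for x in data)    # sum of squared deviations
s2 = ss / (n - 1)                           # divide by n - 1 degrees of freedom
print(x_bar, ss, s2)  # 56.0 810.0 90.0

# Deviations from the mean always sum to zero, which is why they are squared
print(sum(x - x_bar for x in data))  # 0.0

# Cross-check with the standard library (also uses n - 1)
print(variance(data) == s2)  # True
```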
  • 107. Variance (σ², s²) Ex. Compute the variance of the ages (in years) of 169 subjects from the grouped data below.
| Class interval | Midpoint (m_i) | Frequency (f_i) | m_i - Mean | (m_i - Mean)² | (m_i - Mean)² × f_i |
|---|---|---|---|---|---|
| 10-19 | 14.5 | 4 | -19.88 | 395.2144 | 1580.86 |
| 20-29 | 24.5 | 66 | -9.88 | 97.6144 | 6442.55 |
| 30-39 | 34.5 | 47 | 0.12 | 0.0144 | 0.68 |
| 40-49 | 44.5 | 36 | 10.12 | 102.4144 | 3686.92 |
| 50-59 | 54.5 | 12 | 20.12 | 404.8144 | 4857.77 |
| 60-69 | 64.5 | 4 | 30.12 | 907.2144 | 3628.86 |
| Total | | 169 | | | 20197.63 |
Mean = Σ m_i f_i / Σ f_i = 5810.5/169 = 34.38 years. S² = 20197.63/(169 - 1) = 120.22 years².
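A Python sketch of the grouped-data variance formula applied to the age table. It recomputes the mean from the midpoints and frequencies instead of copying it from the table, which is a simple guard against arithmetic slips in a hand-built table.

```python
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]   # class midpoints m_i
freqs = [4, 66, 47, 36, 12, 4]                      # class frequencies f_i

n = sum(freqs)
x_bar = sum(m * f for m, f in zip(midpoints, freqs)) / n
ss = sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, freqs))
s2 = ss / (n - 1)   # sample variance: divide by (sum of f_i) - 1

print(n, round(x_bar, 2), round(ss, 2), round(s2, 2))  # 169 34.38 20197.63 120.22
```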
  • 108. Properties of Variance 1. The main disadvantage of the variance is that its unit is the square of the unit of the original measurements. 2. The variance gives more weight to extreme values than to values near the mean, because the deviations are squared. 3. The unit problem of the variance is overcome by the standard deviation.
  • 109. Standard deviation (σ, s) The standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$ & $s = \sqrt{s^2}$. This produces a measure on the same scale (in the same units) as the individual values. It shows variation about the mean.
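A minimal Python sketch: the SD is the square root of the sample variance. For the ten observations in the variance example, s = √90 ≈ 9.49, in the same units as the data. The standard library's `statistics.stdev`, also based on n - 1, should give the same value.

```python
import math
from statistics import stdev

data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
s2 = 90.0                # sample variance from the variance example
s = math.sqrt(s2)        # standard deviation, in the original units
print(round(s, 2))       # 9.49
print(math.isclose(stdev(data), s))  # True
```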
  • 111. Properties of SD 1. It has the advantage of being expressed in the same units of measurement as the mean. 2. It is the most widely used measure of dispersion, largely because of its role in the theoretical normal curve. 3. However, if two data sets are measured in different units, their variability cannot be compared by comparing their SDs.
  • 112. Standard deviation (σ, s)..... [Figure: a wide spread of values results in a higher SD; a narrow spread results in a lower SD]
  • 113. Coefficient of variation (CV) When two data sets have different units of measurement, or their means differ substantially in size, the CV should be used as the measure of dispersion. It is the best measure for comparing the variability of two sets of observations. Data with a lower coefficient of variation are considered more consistent. The CV is the ratio of the SD to the mean, multiplied by 100.
  • 114. Coefficient of variation (CV)..... $CV = \frac{s}{\bar{x}} \times 100$. Example: "Cholesterol is more variable than systolic blood pressure (SBP)."
| Variable | SD | Mean | CV (%) |
|---|---|---|---|
| SBP | 15 mmHg | 130 mmHg | 11.5 |
| Cholesterol | 40 mg/dl | 200 mg/dl | 20.0 |
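A short Python sketch of the CV comparison in the table. Because the CV is a ratio of two quantities in the same units, the units cancel, and that is what allows SBP (mmHg) to be compared with cholesterol (mg/dl). The function name is illustrative only.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean x 100; meaningful for data on a ratio scale with mean > 0."""
    return 100 * sd / mean

cv_sbp = coefficient_of_variation(15, 130)   # SBP: SD and mean in mmHg
cv_chol = coefficient_of_variation(40, 200)  # cholesterol: SD and mean in mg/dl
print(f"SBP: {cv_sbp:.1f}%  Cholesterol: {cv_chol:.1f}%")  # SBP: 11.5%  Cholesterol: 20.0%
```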
  • 115. Distributions Distributions used in statistical analysis: 1. Discrete random variables: 1) binomial, 2) Poisson & 3) hypergeometric distributions. E.g., the analysis of discrete random variables, such as the position of a nucleotide in a given sequence, may use techniques based on a binomial distribution rather than techniques that assume a normal distribution. 2. Continuous random variables: 1) the normal distribution, 2) the standard normal (Z) distribution, i.e., the normal distribution with mean 0 & SD 1.
  • 116. Normal distribution It is symmetric about its mean: one half of the curve is the mirror image of the other half. The mean, median & mode are equal & located at the same point (the centre of the distribution). The highest point is at the mean. The height of the curve decreases as one moves away from the mean in either direction, approaching, but never reaching, zero.
  • 117. Normal distribution….. [Figure: normal curve with its highest point at the mean; labels: a normal distribution is symmetric about its mean; as one moves away from the mean in either direction the height of the curve decreases, approaching, but never reaching, zero]
  • 118. Skewed distributions In skewed distributions the data are not distributed symmetrically. The mean, median & mode are not equal & are in different positions. Scores are clustered at one end of the distribution, & a small number of extreme values lie in the tail at the opposite end. Skew is always toward the direction of the longer tail.
  • 119. Skewed distributions…. A. Negatively skewed (skewed to the left) distribution: occurs when the majority of scores are at the right end of the curve & a few small scores are scattered at the left end. B. Positively skewed (skewed to the right) distribution: occurs when the majority of scores are at the left end of the curve & a few extremely large scores are scattered at the right end.
  • 120. [Figure: (a) Symmetric distribution: Mean = Median = Mode; (b) Distribution skewed to the right: Mean > Median > Mode; (c) Distribution skewed to the left: Mean < Median < Mode]
  • 121. Which measures to use? 1. When the distribution is symmetric & unimodal, summarize the data using the mean & standard deviation. 2. When the data are skewed, it is preferable to use the median & quartiles (e.g., the IQR) as summary statistics. 3. Unlike the mean & standard deviation, the median & quartiles are not easily influenced by extreme values in a skewed distribution. A. Symmetric & unimodal distribution: the mean, median & mode should all be approximately the same. B. Skewed to the right (positively skewed): the mean is pulled toward the extreme values, so the median may be more appropriate. C. Skewed to the left (negatively skewed): the mean is pulled toward the extreme values, so the median may be more appropriate.