1. intro_biostatistics.pptx

Debre Tabor University is new and different
7/4/2023
Asaye.A
1
Debre Tabor University
College of Health Science
Social and Public Health
Biostatistics Course for Health Science Students
Debre Tabor, Ethiopia

Contact detail
• Asaye Alamneh (Lecturer of Biostatistics at DTU)
• Debre Tabor University
• College of Health Science
• Department: Social and Public Health
• Qualifications:
  • BSc in Statistics, MPH in Biostatistics
• Contacts:
  • Email: asaye2127stat@gmail.com
  • Location: Debre Tabor University

Introduction to biostatistics

Outline of presentation
• Definition of statistics and biostatistics
• Basic statistical concepts
• Classification of statistics
• Types of variables
• Applications and limitations of biostatistics

Objectives
After completing this chapter, the student will be able to:
• Define statistics and biostatistics
• List some basic terms
• Define and identify the different types of data and understand why we need to classify variables
• Describe the importance and limitations of statistics
• Identify sources of data

Definition of statistics
• The word "statistics" comes from the Latin "status", which refers to a political state or government.
• Statistics can be defined in two senses: plural and singular.
1. Plural sense: statistics are aggregates of facts and figures expressed in numerical form.
• For example: statistics on industrial production, or population growth in a country in different years.

Definition of statistics
2. Singular sense: statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data.
• It makes data simple and easy for everyone to understand.
• It helps us use numbers to communicate ideas.
• For example: a study of the distribution of weights of health science students at DTU.

Biostatistics
Biostatistics: the application of statistical methods to medical, biological, and public health problems.
• When the data being analyzed are derived from the biological sciences and medicine, we use the term biostatistics. It helps us manage medical uncertainties.

Biostatistics….
• It is the scientific treatment given to medical data derived from groups of individuals or patients. It involves:
  • Collection of data.
  • Presentation of the collected data.
  • Analysis and interpretation of the results.
  • Making decisions on the basis of such analysis.

Types of biostatistics
Based on how the data are used, biostatistics can be classified into two main categories.
1. Descriptive statistics:
• Methods of collecting, organizing, summarizing, and presenting the data at hand in a concise manner to get an impression of the data.
• Used to organize and describe the sample or population, simplifying large amounts of data in sensible ways.
• It also shows the final results in the form of tables and graphs.

Types……..
2. Inferential statistics: methods for using sample data to draw general conclusions (inferences) about populations.
• It draws conclusions about the population that go beyond the available data.
For example:
• Probability distributions,
• Estimation,
• Confidence intervals,
• Hypothesis testing,
• Regression analysis, etc.

Type of Biostatistics
Biostatistics
• Descriptive statistics: collecting, organizing, summarizing, and presenting data
• Inferential statistics: making inferences, hypothesis testing, determining relationships, making predictions

Stages of statistical investigation
A) Collection of data: measuring or gathering numerical data.
B) Organization of data: organizing and classifying the collected data.
C) Presentation of data: giving an overview of the data in the form of tables, graphs, and charts.
D) Analysis of data: extracting relevant information from the summarized data.
E) Interpretation of data: drawing conclusions and generalizing to the target population.

Definitions of some basic terms
Population: a large group possessing a given characteristic or set of characteristics.
A population may be finite or infinite.
Parameter: a characteristic obtained from the population, i.e. a single summary value of the population.
Example: population mean (μ), population standard deviation (σ)
Statistic: a characteristic obtained from a sample.
Example: sample mean, mode, median, SD, variance, etc.

Cont….
Sampling: the technique of selecting a sample from the entire population.
Sample: a subset of the population selected by some sampling technique.
Census: complete enumeration of the population.
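
The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population here is simulated: the 500 weights, the centre of 60 kg and the spread of 8 kg are made-up assumptions, not course data, and simple random sampling stands in for whichever sampling technique a real study would use.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical finite population: weights (kg) of 500 students (made-up values).
population = [round(random.gauss(60, 8), 1) for _ in range(500)]

# Parameters are computed from every member of the population (a census).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# A statistic is computed from a sample; here a simple random sample of 50.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample mean is usually close to, but not equal to, the population mean; that gap is the sampling variation that inferential statistics has to account for.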

Cont….
• Data: raw, unorganized facts that need to be processed.
• When data are processed, organized, structured, or presented in a given context so as to make them useful, they are called information.
• Figure 1: Relation between data and information

Variable
• Variable: a characteristic that takes different values in different persons, places, or things; any aspect of a population unit that is measured or recorded.
• E.g. height, weight, marital status, etc.
• Random variables: variables whose values are determined by chance.
• Data: sets of values of one or more variables.
• They are numbers that can be measurements or can be obtained by counting.
• Data set: a collection of observations on a variable.

Types of Variables (1)
• Depending on the characteristic of the measurement, variables can be classified into two types.
1. Quantitative (numerical) variable: one that can be measured and expressed numerically.
• It is the result of measuring or counting attributes of a population.
• Quantitative variables are subdivided into two types:
  A. Discrete variable
  B. Continuous variable

Types …..
A. Discrete variable:
• A variable whose values are countable and are assigned whole numbers.
• It takes no decimal values.
• E.g. the number of daily hospital admissions, the number of live births per 1000 women, the number of motor vehicle accidents in Debre Tabor town.

Types …..
B. Continuous variable: one that has no gaps or interruptions.
• A variable that can assume any value, including decimals, over a certain interval.
For example:
• Serum cholesterol level of a patient,
• Weight,
• Age,
• Laboratory result,
• Time, arm circumference

Types …..
2. Qualitative (categorical) variable: one that cannot be quantified or measured numerically, but is measured by assigning names to items (events).
E.g. sex, marital status, race or ethnic group, occupational status, eye color, etc.
A. Nominal variables: variables with no inherent order or ranking sequence.
B. Ordinal variables: variables whose categories have a natural order.

Types of Variables (2)
• Dependent variable: the outcome of interest, which should change in response to some intervention.
  • Sometimes called the outcome or response variable.
• Independent variable: the intervention, or what is being manipulated.
  • A variable that you believe might influence your outcome measure.
  • An independent variable is a hypothesized cause of, or influence on, a dependent variable.

Type of scales of measurement
• Based on the nature of the variable, variables can be measured at four different levels of measurement.
• Measurement is defined as the assignment of numbers, symbols, and/or names to objects or events.

Type of scales….
• Each scale of measurement has certain properties, which in turn determine which statistical analyses are appropriate.
• The scale assigned to data depends on three properties of measurement: order, distance, and a fixed (true) zero.
• The four scales/levels of measurement are nominal, ordinal, interval, and ratio.

1. Nominal scale
• It is the lowest level of measurement.
• It simply consists of "naming" or classifying observations into mutually exclusive, all-inclusive categories on which no order or ranking can be imposed.
• When numbers are assigned to categories, they are used only for coding purposes and do not convey a sense of size.

Cont…
• No arithmetic or relational operations can be applied.
• Nominal measurements have none of the three properties (order, distance, true zero).
For example:
• Sex of a person (M, F),
• Eye color (e.g. brown, blue),
• Religion (Muslim, Christian),
• Place of residence (urban, rural),
• Race (e.g. black, white).

2. Ordinal scale
• A level of measurement that classifies data into categories that can be ranked.
• Relational operations such as greater than and less than are applicable.
• However, precise differences between the ranks do not exist.

Cont…
Examples:
• Socio-economic status (very low, low, medium, high, very high),
• Patient status (unimproved, improved, much improved),
• Height of patients (very short, short, tall, very tall),
• Blood pressure (very low, low, high, very high),
• Job satisfaction level (highly dissatisfied, dissatisfied, satisfied, highly satisfied), etc.

3. Interval Scale
• It is possible to rank or order the measurements and to tell the real distance between any two of them.
• However, there is no meaningful zero, so ratios are meaningless.
• Differences can be added and subtracted meaningfully, but ratios (division) cannot be formed.
• Relational operations are also possible.

Cont…
• The selected zero point is not necessarily a true zero: it does not have to indicate a total absence of the quantity being measured.
• Note that zero degrees Celsius is arbitrary, so it does not make sense to say that 20 degrees Celsius is twice as hot as 10 degrees Celsius.
Examples:
• Body temperature in °F or °C, time of day, days of the year, test score, IQ…
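
A small Python sketch can make the Celsius point concrete. It compares the naive ratio of two Celsius readings with the ratio of the same temperatures on the Kelvin scale, which does have a true zero. The helper name `celsius_to_kelvin` is chosen here only for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (a scale with a true zero)."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
difference = 20 - 10  # 10 degrees, the same on both scales

# The naive ratio of Celsius readings suggests "twice as hot" ...
naive_ratio = 20 / 10

# ... but on the Kelvin scale the ratio is only about 1.035.
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(f"difference = {difference} degrees")
print(f"naive ratio = {naive_ratio:.3f}, ratio in kelvin = {true_ratio:.3f}")
```

Because the ratio changes when the arbitrary zero is moved, ratios of Celsius values carry no physical meaning, while differences stay the same.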

4. Ratio scale
• It is the highest level of measurement.
• It classifies data that can be ranked, for which differences are meaningful, and for which there is a true zero. True ratios exist between different units of measure.
• There is always a true zero point, which indicates the absence of the quantity being measured.
• All arithmetic and relational operations are applicable.
Example: volume, height, weight, length, number of items, etc.

Summary of level of measurement

Summary of levels of measurement

Level of      Put data in  Arrange data  Subtract     Determine if one data value
measurement   categories   in order      data values  is a multiple of another
Nominal       Yes          No            No           No
Ordinal       Yes          Yes           No           No
Interval      Yes          Yes           Yes          No
Ratio         Yes          Yes           Yes          Yes


Why we need Biostatistics?
• The central concern of statistics is variability.
• Besides the sources listed below, there can also be instrumental variability and observer variability.
1. Handling variation.
  a. Biological variation: variation among individuals as well as within individuals over time.
     For example: height, weight, blood pressure, …
  b. Sampling variation: biomedical research projects are usually carried out on small numbers of study subjects.

2. Essential for the scientific method of investigation:
• Formulate a hypothesis.
• Design a study to test the hypothesis objectively.
• Collect reliable and unbiased data.
• Process and evaluate the data rigorously.
• Interpret the results and draw appropriate conclusions.
• These statistical methods are designed to contribute to the process of making scientific judgments in the face of uncertainty and variation.

• It helps the researcher arrive at a scientific judgment about a hypothesis.
• It studies the association between two or more attributes.
• It evaluates the efficacy of drugs.
• It determines the success or failure of health care programs.
• It defines and measures the extent of disease.
• Statistical methods help us understand public health issues and disease, and quantify the uncertainties present in the basic medical sciences.

Limitations of statistics
• It deals only with subjects of investigation that can be quantitatively measured and numerically expressed.
• It deals only with aggregates of facts and attaches no importance to individual items.
• Statistical data are only approximately, not mathematically, correct.
• Statistics can easily be misused and should therefore be used by experts.

Sources of data
• There are two basic sources of statistical data:
1. Primary data: first-hand data collected directly from items or individual respondents by the researcher, primarily for the purpose of a particular study.

Primary data…
• The major primary sources of data are:
  • Surveys,
  • Surveillance,
  • Census,
  • Observation, and
  • Experimental studies.

Secondary data
2. Secondary data: data that were collected and statistically treated by other people or agencies, and whose information is then used for another purpose.
• For example: hospital records, magazines, CSA (Central Statistical Authority) reports, DHS (Demographic and Health Survey) reports, and vital statistics:
  • Birth reports,
  • Death reports,
  • Epidemic reports,
  • Reports of laboratory utilization (including laboratory test results)

Exercises
For each of the following variables, indicate whether it is quantitative or qualitative and specify its measurement scale:
1. Blood pressure (mmHg)
2. Cholesterol (mmol/l)
3. Diabetes (yes/no)
4. Body mass index (kg/m²)
5. Age (years)
6. Employment (paid work/retired/housewife)
7. Smoking status (smoker/non-smoker/ex-smoker)
8. Exercise (hours per week)
9. Alcohol consumption (units per week)
10. Level of pain (mild/moderate/severe)

Methods of data collection and presentation

Methods of data collection and
presentation….
• At the end of this chapter, you should be able to:
– Understand methods of data collection
– Identify methods of data presentation
  • Tabular presentation
  • Diagrammatic presentation
  • Graphic presentation

1. Methods of data collection
• Data collection techniques allow us to systematically collect data about our objects of study (people, objects, and phenomena) and about the settings in which they occur.
• Data can be obtained in a variety of ways. One of the most common is through surveys.
• Surveys can be carried out using a variety of data collection methods.

Methods…
There are various methods of data collection
1. Observation
2. Interviews
3. Questionnaires
4. Extraction of data from records

Methods……
1. Observation: a technique that involves systematically selecting, watching, and recording the behavior of people or other phenomena for the purpose of obtaining specified information.
• It includes all methods, from simple visual observation to the use of high-level machines and measurements, sophisticated equipment, or facilities.

Methods……
• Advantage: it gives relatively more accurate data on behavior and activities.
• Disadvantages:
  • Investigator's (observer's) bias
  • It requires more resources
  • It requires skilled manpower when high-level machines are used.

Methods……
2. Interview: a commonly used technique for collecting research data.
• Face-to-face interview,
• Telephone interview.
• Interviewee (respondent), interviewer (asker)

Methods…
I. Direct personal interview
• The investigator meets the informant in person and questions him/her directly.
• Best suited to situations where the problem is not completely understood, where questions cannot be formulated beforehand, and where one question leads to another.
Disadvantages
• It is time consuming
• It is not suited to large groups of informants

Methods…
II. Interviewing using a questionnaire
• A detailed questionnaire is drafted.
• The investigator appoints agents, known as enumerators, who go to the respondents personally with the questionnaire,
• ask them the questions given therein, and
• record their replies.
• The interviews can be
  • face-to-face or
  • by telephone.

Methods…
Face-to-face interviews
Advantages
• The interviewer knows exactly who is responding to the questionnaire.
• The interviewer can help the respondent if he/she has difficulty understanding the questions (e.g. language, concentration).
• There is more flexibility in presenting the items; they can range from closed to open.
• Observations can be made as well.

Methods…
Disadvantages
• An untrained interviewer may distort the meaning of questions.
• Attributes of the interviewer, such as his/her biases and social or ethnic characteristics, may affect the responses given.
• Higher cost in terms of time and money (training and salaries of interviewers).

Methods…
Telephone interviews
Advantages
• Less expensive in time and money than face-to-face interviews
• The interviewer can help the respondent if he/she does not understand the question
• Broad, representative samples can be obtained among those who have telephone lines
• May ensure uniformity if the interviewer is the same.

Methods…
Disadvantages
• Under-representation of groups that do not have telephones
• The respondent may be substituted by another person
• Problems with questions that have multiple answer options and with complicated questions

3. Questionnaires
• Self-administered questionnaires: the respondents read the questions and fill in the answers by themselves.
• Advantages
  • Simpler and cheaper.
  • Can be administered to many persons simultaneously (e.g. to a class of students).
• Disadvantage
  • They demand a certain level of education and skill on the part of the respondents.

Methods…
Postal questionnaire
• The questionnaires are sent by post to the informants together with a polite covering letter
  • giving detailed information,
  • explaining the aims and objectives of collecting the information, and
  • requesting the respondents to cooperate by furnishing correct replies and returning the duly completed questionnaire.
• The return postage is usually paid by the investigator.

Methods…
The main problems with postal questionnaires are:
• Response rates tend to be relatively low, and
• There may be under-representation of less literate subjects.

Methods…
Mailed questionnaire
• The questionnaire is mailed to respondents to be filled in.
• Sometimes known as self-enumeration.
Advantages
• Cheap
• No need for trained interviewers.
• No interviewer bias.
• Can be coordinated from one central location.

Methods…
Disadvantages
• Low response rate.
• Incomplete questionnaires due to omissions or invalid responses.
• No assurance that the questionnaire was answered by the right person.
• Needs intense follow-up to achieve a high response rate.

Methods……
4. Extraction of data from records
• Clinical and other personal records, death certificates, published mortality statistics, census publications, etc.
Examples:
1. Official publications of the Central Statistical Authority
2. Publications of the Ministry of Health and other ministries
3. Newspapers and journals.

Methods…
4. International publications, such as those of WHO, the World Bank, and UNICEF.
5. Records of hospitals or other health institutions.
• Although data from documents take less time and cost relatively little to use, take care over their quality and completeness.

Problems in gathering data
Common problems might include:
• Language barriers
• Lack of adequate time
• Expense
• Inadequately trained and experienced staff
• Bias
• Cultural norms

Choosing method of data collection
• To choose a good data collection method, we have to focus on the relevance, timeliness, accuracy, and usability of the information.
• Some methods prioritize timeliness and cost reduction.
• Others prioritize accuracy and the scientific rigor of the method.

Cont…
• The selection of the method of data collection is also based on practical considerations, such as:
  • The need for personnel, skills, equipment, etc. in relation to what is available.
  • The acceptability of the procedures to the subjects.
  • The probability that the method will provide good coverage, i.e. will supply the required information about all or almost all members of the population.

Types of questions
• Before looking at the steps in questionnaire design, we need to review the types of questions.
• There are two types of questions:
1. Open-ended (free response)
2. Closed-ended (restricted choice)
1. Open-ended
E.g. In your opinion, what is the biggest barrier to patients attending your hospital's ANC unit?

Types……
• Advantages:
  • It stimulates the respondent's free thought
  • Helpful for obtaining information on sensitive issues
• Disadvantages:
  • There may be problems recalling answers
  • It is not suitable for mailed questionnaires
  • Answers are difficult to code for statistical analysis
  • Poor handwriting can be a problem

Types……
2. Closed-ended: provides fixed answers
E.g. Including your present visit, how many times have you visited this hospital in the past two years?
A. Once B. Twice C. 3 times D. 4 times E. More than 4 times
• Advantages:
  • Suitable for many forms of statistical analysis
  • Not difficult to code
• Disadvantage: limits the variety and detail of responses

Types……
Partially open-ended question
• Advantages:
  • Provides an alternative if certain options are overlooked
  • Identifies missing categories for future use
• Disadvantage: respondents may ignore the other options
E.g. If the household lost any of its members due to death in the last 12 months, what was the cause of death?
1. Malaria 2. Famine/hunger 3. Car accident 4. Other (specify)

Requirements of questions
• Must have face validity.
  • The question we design should give an obviously valid and relevant measurement of the variable.
• Must be clear and unambiguous.
  • Each question should contain only one idea, and all respondents should understand it in the same way.
• Must not be offensive (avoid questions that may offend the respondent).

Cont…
• The questions should be fair (should not be loaded).
• Sensitive questions: it may not be possible to avoid asking "sensitive" questions that may offend respondents.
• In such situations the interviewer should ask them very carefully and tactfully.

Cont…
• Start with an interesting but non-controversial question (preferably open) that is directly related to the subject of the study.
• Pose more sensitive questions as late as possible in the interview.
• Use simple language.
• Make the questionnaire as short as possible.

What to consider before designing a questioning tool
• What exactly do we want to know, according to the objectives and variables we identified earlier?
• Of whom will we ask questions, and what techniques will we use?
• Are our informants mainly literate or illiterate?
• How large is the sample that will be interviewed?

Types…
Question type
• Open-ended
• Closed-ended
  • Simple dichotomy
  • Multiple choice
  • Determinant choice
  • Check-list

Types of closed format
Choice of categories
• Q. What is your marital status?
  • Single
  • Married
  • Divorced
  • Widowed
Likert-style scale
• Q. Biostatistics is an interesting subject
  • Strongly disagree
  • Disagree
  • Cannot decide
  • Agree
  • Strongly agree

Cont…
Checklists
• Circle the public health specialties you are particularly interested in:
  • Epidemiology and Biostatistics
  • Reproductive health
  • Nutrition
  • Health informatics
  • Health service management
  • General

Cont…
Ranking
• Please rank your interest in the following specialties (1 = most interesting, 4 = least interesting):
  • Epidemiology and Biostatistics
  • Reproductive health
  • Nutrition
  • Health informatics

2. Methods of data organization and presentation
• The most convenient method of organizing data is to construct a frequency distribution.
• A frequency distribution is the organization of raw data in table form, using classes and frequencies.
• Frequency distribution table: lists categories of scores along with their corresponding frequencies.
• For this purpose, different techniques of data organization and presentation, such as ordered arrays, tables, and diagrams, are used.

Array (ordered array)
• A serial arrangement of numerical data in ascending or descending order.
• A simple arrangement of individual observations in order of magnitude.
• It shows the range over which the items are spread and gives an idea of their general distribution.
• It is an appropriate way of presenting data when the data set is small (usually fewer than 20 observations).
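
As a minimal illustration of an ordered array, the Python sketch below sorts a small data set and reads off its range. The ten weights are hypothetical values invented for this example.

```python
# Hypothetical weights (kg) of 10 patients, in the order they were recorded.
weights_kg = [62, 55, 71, 48, 66, 59, 80, 53, 68, 74]

# The ordered array: observations arranged in ascending order of magnitude.
ordered = sorted(weights_kg)

# The range is the spread between the largest and smallest observations.
data_range = ordered[-1] - ordered[0]

print("ordered array:", ordered)
print("smallest =", ordered[0], "largest =", ordered[-1], "range =", data_range)
```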

Frequency Distribution (F.D.)
A frequency distribution is an organization of the values of a variable, arranged in order of magnitude either individually (for a discrete variable), in classes (for a continuous variable), or in categories (for qualitative data), along with their frequencies.

Frequency Distribution (F.D.)…
A frequency distribution has two main parts; namely,
i. The values of the variable (if quantitative) or the
categories (if qualitative), and
ii. The number of observations (frequency)
corresponding to the values or categories.

Frequency Distribution (F.D.)…
There are two types of frequency distributions:
i. Categorical (or qualitative)
ii. Numerical (or quantitative)
1. Categorical frequency distribution
• Data are classified according to non-numerical categories.
• Categories must be mutually exclusive and exhaustive.
• Used to organize nominal and ordinal data.

Cont…
a) Nominal data: Here the construction is straightforward: count the occurrences in each category and find the totals.
Example: The marital status of 60 adults, classified as single, married, divorced, and widowed, is presented in an FD as below:

Marital status  Single  Married  Divorced  Widowed  Total
Frequency       25      20       8         7        60

Cont…
b) Ordinal data: The construction is identical to the nominal case. However, the categories should be listed in order.
Example: Satisfaction with the teaching method in a class of 60 students is presented in an FD as shown below.

Numerical F.D
2. Numerical Frequency Distribution
• Data are classified according to numerical size.
• Used to organize interval and ratio data.
• May be discrete or continuous, depending on whether the variable is discrete or continuous.

Numerical F.D…
a) Discrete (ungrouped) frequency distribution
• Count the number of times each possible value is repeated.
Example: In a survey of 30 families, the number of children per family was recorded, giving the following data:
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4 5 4 5 4 3 5 2 7 3 3 6 7 3 8 4.
The distribution of children in the 30 families would be:

No. of children      2  3  4  5  6  7  8  Total
No. of families (f)  5  7  8  4  1  2  3  30
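
The tally in this example can be reproduced with a few lines of Python. This is only a sketch: it uses the 30 values listed above and the standard library `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

# Number of children per family for the 30 surveyed families (from the example).
children = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2, 2, 8, 5, 3, 4,
            5, 4, 5, 4, 3, 5, 2, 7, 3, 3, 6, 7, 3, 8, 4]

# Count how many times each distinct value occurs.
freq = Counter(children)

# Print the ungrouped frequency distribution in order of magnitude.
print("No. of children  No. of families (f)")
for value in sorted(freq):
    print(f"{value:>15}  {freq[value]:>19}")
print(f"{'Total':>15}  {sum(freq.values()):>19}")
```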

Continuous grouped F.D
b) Continuous (grouped) frequency distribution
o Arises from continuous variables/data.
o Unlike in a discrete FD, a class cannot be allocated to each value of a continuous variable.
o The categories into which the observations are distributed are called classes or class intervals.
o Classes should be exhaustive and mutually exclusive.

Example
Time spent Frequency
10 – 14 8
15 – 19 28
20 – 24 27
25 – 29 12
30 – 34 4
35 – 39 1

Steps in constructing continuous frequency distribution
1. Determine the number of classes (k): the number of groups into which the data are divided.
• Decide k with the help of Sturges' rule:
k = 1 + 3.322 log(n),
rounded up or down to the nearest integer,
where n = number of observations and log = common logarithm (logarithm to base 10).

Cont…
• Example: if n = 10, k = 4.32 ≈ 4; if n = 100, k = 7.64 ≈ 8; if n = 1000, k = 10.97 ≈ 11.
2. Determine the class width (w): the difference between the lower (or upper) boundaries, or limits, of two consecutive classes.
• We can use w = Range / k.
• Note that w is rounded up or down to the nearest integer.
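
Sturges' rule and the class-width formula can be checked numerically with a short Python sketch. The function names `sturges_classes` and `class_width` are illustrative choices, and rounding to the nearest integer is used here, as in the three examples above.

```python
import math

def sturges_classes(n):
    """Unrounded number of classes from Sturges' rule: k = 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(data_range, k):
    """Unrounded class width: w = range / k."""
    return data_range / k

# Reproduce the three examples: n = 10, 100, 1000.
for n in (10, 100, 1000):
    k = sturges_classes(n)
    print(f"n = {n:>4}: k = {k:.3f}, rounded to {round(k)}")

# Hypothetical range of 30 units spread over k = 8 classes.
print("w =", class_width(30, 8))
```

In practice the class width is often rounded up rather than to the nearest integer, so that k classes are guaranteed to cover the whole range.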

Cont…
3. Determine the class limits
• Class limits separate one class from another; there is a gap between the upper limit of one class and the lower limit of the next class.
• The lower class limit of the first class should be the smallest observation.
• Add the class width to a lower class limit to obtain the lower class limit of the next class.
• Unit of measure (U): the smallest possible difference between successive values or measures, e.g. 1, 0.1, 0.01, 0.001, …

Cont…
• To find the upper limit of the first class, subtract U from the lower limit of the second class.
• Then keep adding the class width to this upper limit to find the rest of the upper limits, or
• Obtain the upper class limits by adding the class width minus the unit of measure to the corresponding lower class limits, i.e. UCL = LCL + (W − U), which is LCL + (W − 1) when U = 1.

Cont…
4. Determine the class boundaries
• Class boundaries make the intervals of a continuous variable continuous in both directions, so that no gap exists between classes.
• Let U = LCL of the second class − UCL of the preceding class.
• Add half of this difference (U/2) to all upper class limits to get the upper class boundaries (UCBs), and subtract U/2 from all lower class limits to get the lower class boundaries (LCBs).
• UCBi = UCLi + U/2
• LCBi = LCLi − U/2
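
The boundary formulas also apply when the unit of measure is not 1. The Python sketch below uses hypothetical class limits recorded to one decimal place (U = 0.1); the helper `class_boundaries` and the limits themselves are assumptions made for illustration.

```python
def class_boundaries(limits, unit):
    """Return (LCB, UCB) pairs: LCB = LCL - U/2 and UCB = UCL + U/2."""
    half = unit / 2
    # Rounding only tidies floating-point noise such as 2.9499999999999997.
    return [(round(lcl - half, 10), round(ucl + half, 10)) for lcl, ucl in limits]

# Hypothetical class limits measured to the nearest 0.1, so U = 0.1.
limits = [(2.0, 2.9), (3.0, 3.9), (4.0, 4.9)]
unit = 0.1

boundaries = class_boundaries(limits, unit)
for (lcl, ucl), (lcb, ucb) in zip(limits, boundaries):
    print(f"limits {lcl} to {ucl}  ->  boundaries {lcb} to {ucb}")
```

Each upper class boundary coincides with the lower class boundary of the next class, which is exactly what removes the gaps between classes.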

Cont…
5. Class mark (C.M.) or midpoint: the average of the lower and upper class limits, or equivalently of the lower and upper class boundaries.
6. Determine the frequency of each class: simply count the number of observations belonging to each class.
7. Cumulative frequency: the number of observations less than (or more than) or equal to a specific value.
8. Cumulative frequency above (greater-than type): the total frequency of all values greater than or equal to the lower class boundary of a given class.

Cont…
9. Cumulative frequency below (less than type): it is the total frequency of
all values less than or equal to the upper class boundary of a given class.
10. Relative frequency (rf): it is the frequency divided by the total frequency.
11. Relative cumulative frequency (rcf): it is the cumulative frequency
divided by the total frequency.
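
These definitions can be tried out on the "Time spent" frequency table shown earlier in this deck. The Python sketch below is one possible way to compute the columns; the variable names are illustrative.

```python
from itertools import accumulate

# Classes and frequencies from the "Time spent" example table.
classes = ["10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
freq = [8, 28, 27, 12, 4, 1]
total = sum(freq)

# Less-than type: running total from the first class downwards.
less_than = list(accumulate(freq))

# Greater-than type: running total from the last class upwards.
more_than = list(accumulate(reversed(freq)))[::-1]

# Relative and relative cumulative frequencies.
rel = [f / total for f in freq]
rel_cum = [c / total for c in less_than]

print("class   f   <cf  >cf  rf      rcf")
for row in zip(classes, freq, less_than, more_than, rel, rel_cum):
    print("{:<6} {:>3} {:>4} {:>4}  {:.4f}  {:.4f}".format(*row))
```

The last less-than value and the first greater-than value both equal the total frequency, and the relative frequencies sum to 1, which makes a quick check on the arithmetic.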

Cont…
Example: The blood glucose level for 50 patients is shown below.
Construct a frequency distribution for the following data.

Cont…
Solution:
Step 1: Find the highest and the lowest values: H = 88, L = 42.
Step 2: Find the range: R = H − L = 88 − 42 = 46.
Step 3: Select the number of classes using Sturges' formula: k = 1 + 3.322 log(50) = 6.64 ≈ 7 (rounding up).
Step 4: Find the class width: w = R/k = 46/7 = 6.57 ≈ 7 (rounding up).

Cont…
Step 5: Select the starting point for the lowest class limit (usually the lowest observation).
• Add the class width to it to get the lower limit of the next class.
• Keep adding until there are 7 classes: 42, 49, 56, 63, 70, 77, 84 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 49 − U = 49 − 1 = 48. Adding the class width repeatedly gives the upper class limits 48, 55, 62, 69, 76, 83, 90.

Cont…
 So combining step 5 and step 6, one can construct the following
classes.
Step 7: Find the class boundaries by subtracting 0.5 from each lower
class limit and adding 0.5 to the UCL.
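Steps 5 to 7 can be reproduced with the following Python sketch. It assumes a unit of measurement U = 1, a class width of 7 and 7 classes, as in the worked example; the names are illustrative.

```python
lowest, width, k, unit = 42, 7, 7, 1

# Step 5: lower class limits start at the lowest observation
lower_limits = [lowest + i * width for i in range(k)]

# Step 6: each upper class limit is one unit below the next lower limit
upper_limits = [lcl + width - unit for lcl in lower_limits]

# Step 7: class boundaries extend the limits by half a unit on each side
boundaries = [(lcl - unit / 2, ucl + unit / 2)
              for lcl, ucl in zip(lower_limits, upper_limits)]

# Class marks are the averages of the lower and upper class limits
class_marks = [(lcl + ucl) / 2 for lcl, ucl in zip(lower_limits, upper_limits)]

for row in zip(lower_limits, upper_limits, boundaries, class_marks):
    print(row)
```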

Cont…
7/4/2023
Asaye.A
102
Example: For class 1: LCBi =LCLi - U/2 = 42-0.5 = 41.5 and UCBi =
UCLi + U/2 = 48+0.5 = 48.5.
 Then continue adding W on both boundaries to obtain the rest
boundaries.
 By doing so one can obtain the following classes.

Cont…
7/4/2023
Asaye.A
103
Step 8: Find the frequencies
Step 9: Find cumulative frequency.
Step 10: Find relative frequency and /or relative cumulative frequency.

Example:
7/4/2023
Asaye.A
105
Construct a continuous FD for the following raw data of ages of
patients admitted at DTU hospital in a given week.

Cont…
7/4/2023
Asaye.A
107
 Here is the FD

Cont…
7/4/2023
Asaye.A
108
 The class marks and class boundaries of the above Example are:

Continuous/grouped F.D …
7/4/2023
Asaye.A
109
Cumulative frequency distributions
o Tells us how often the values fall below or above that class. There
are two types of CFD:
The “less than” cumulative F.D.
o Obtained by adding the frequency of all the preceding classes
including the frequency of that class.
The “more than” cumulative F.D.
o Obtained by adding the frequency of the succeeding classes
including the frequency of that class.

Cont…
7/4/2023
Asaye.A
110
 For the data in the above Example, both cumulative frequency
distributions are given below:

Following the rules for grouping data
7/4/2023
Asaye.A
111
 The groups must not overlap: otherwise it is unclear to which group
a measurement belongs.
 There must be continuity from one group to the next: Otherwise some
measurements may not fit in a group.
 The groups must range from the lowest measurement to the highest
measurement.
 The groups should normally be of an equal width.

Methods of data presentation
7/4/2023
Asaye.A
112
Commonly, there are two ways of presenting
statistical data:
1. Statistical tables
2. Graphs/Diagrams

1. Tabulation methods of data
presentations
7/4/2023
Asaye.A
113
1. Statistical tables
o A statistical table is an orderly and systematic presentation of data
in rows and columns.
Rows : are horizontal arrangements.
Columns: are vertical arrangements.
o Use of tables for organizing data that involves grouping the data
into mutually exclusive categories of the variables and counting
the number of occurrences (frequency) to each category.

Cont….
7/4/2023
Asaye.A
114
 Based on the purpose for which the table is designed and the
complexity of the relationship, a table could be either of
simple frequency table or cross tabulation.
 Simple frequency table is used when the individual
observations involve only a single variable.
 Cross tabulation is used to obtain the frequency distribution of
one variable by another variable.

General principles to construct
tables
7/4/2023
Asaye.A
115
1. Tables should be as simple as possible.
2. Tables should be self-explanatory.
Title should be clear and placed above the table. A good title
answers: what? when? where? how classified?
Each row and column should be labeled.
Numerical entries of zero should be explicitly written rather
than indicated by a dash.
Dashes are reserved for missing or unobserved data.
Totals should be shown either in the top row and the first
column or in the last row and last column.
3. If data are not original, their source should be given in a footnote.

A) Simple or one-way table
7/4/2023
Asaye.A
116
Simple frequency table: most basic table is a simple
frequency distribution with one variable.
Example:
Table. Blood group of voluntary blood donors examined in Red Cross blood bank within a day, May 2006 (n=548)
Blood group   Number of students   Percent
A             240                  43.8
B             146                  26.6
AB            57                   10.4
O             105                  19.2
Total         548                  100

Two and three variable table
7/4/2023
Asaye.A
117
 If two variables are cross tabulated, it is a two variable table
 If the tabulation is among three variables, it is three variable
table .
 In cross tabulated frequency distributions where there are row
and column totals, the decision for the denominator is based
on the variable of interest to be compared over the subset of
the other variable.

Common form of a two by two
variable
7/4/2023
Asaye.A
119
 It is a special form of table favorite among
epidemiologist.
 It is used to compare whether there is relationship
between the two variables.
Exposure      Numbers of subjects    Total
              Cases     Controls
Exposed       23        23           46
Non-exposed   4         139          143
Total         27        162          189

Composite/ Higher Order Table
7/4/2023
Asaye.A
120
It is a large table combining several separate variable/tables
Age, sex and other demographic variables may be combined
to form a single table
Example: Distribution of Health Professional by Sex
and Residence

Diagrammatic and Graphical methods of data presentation
7/4/2023
Asaye.A
121
Advantages
 To understand the information easily.
 To make the data attractive.
 To make comparisons of items easily.
 To draw attention of the observer.
 The purpose of graphs and diagrams is not to provide exact and detailed
information, but simple comparisons.
 Any further information shall rather be obtained from the original data.

Limitations of Diagrammatic presentation
7/4/2023
Asaye.A
122
 The technique is made use only for purposes of comparison. It is
not to be used when comparison is either not possible or is not
necessary.
 is not an alternative to tabulation. It only strengthens the textual
exposition of a subject, and cannot serve as a complete substitute
for statistical data.
 It can give only an approximate idea and as such where greater
accuracy is needed diagrams will not be suitable.
 They fail to bring to light small differences.

2. Diagrammatic Presentation of data
7/4/2023
Asaye.A
123
 Diagrams are appropriate for presenting discrete as well as
qualitative data.
 The three most commonly used diagrammatic presentation of
data are:
 Pie charts
 Bar charts
 Pictograms

1. Pie chart
7/4/2023
Asaye.A
125
 Pie chart can be used to compare the relation between the
whole and its components.
 It is useful for qualitative or discrete quantitative data.
 Pie chart is a circular diagram and the area of the sector of a
circle is used in pie chart.

Cont…
7/4/2023
Asaye.A
126


Example:
7/4/2023
Asaye.A
127
Draw a suitable diagram to represent the following
population in a town.
Men Women Girls Boys
2500 2000 4000 1500
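A pie chart for these figures needs the angle of each sector, which is the category's share of the total multiplied by 360 degrees. The following Python sketch works that out; the dictionary and variable names are illustrative.

```python
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

# Sector angle = (category count / total) * 360 degrees
angles = {group: count * 360 / total for group, count in population.items()}

# Percentage share of each category
percents = {group: count * 100 / total for group, count in population.items()}

for group in population:
    print(group, angles[group], percents[group])
```

The angles should add up to 360 degrees, which is a useful check before drawing the chart.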

2. Bar charts (or graphs)
7/4/2023
Asaye.A
130
 Categories are listed on the horizontal axis (X-axis).
 Frequencies or relative frequencies are represented on the Y-
axis.
 The height of each bar is proportional to the frequency or
relative frequency of observations in that category.
 There are three types of bars.

Tips for constructing bar diagrams
7/4/2023
Asaye.A
131
1. Whenever possible it is better to construct a bar diagram on a graph
paper
2. All bars drawn in any single study should be of the same width
3. The different bars should be separated by equal distances
4. All the bars should rest on the same line called the base
5. Whenever possible, it is advisable to draw bars in order of
magnitude

Cont…
7/4/2023
Asaye.A
132
A. Simple bar chart:- used to represent a single variable classified on spatial,
quantitative or temporal basis.

Cont…
7/4/2023
Asaye.A
133
Example: Construct a bar chart for the following data

Cont…
7/4/2023
Asaye.A
135
B. Sub-divided bar chart (component)
o is used to represent data in which the total magnitude is divided into
different parts or components
o Example: Plasmodium species distribution for confirmed malaria
cases, Zeway, 2003

Cont…
7/4/2023
Asaye.A
136
C. Multiple bar chart
 is used when two or more sets of inter-related data are represented
(a multiple bar diagram facilitates comparison between more than one
phenomenon).
 The following figure shows a multiple bar chart to represent the
import and export of Canada (values in $) for the years 1991 to
1995.

3. Graphical Presentation of data
7/4/2023
Asaye.A
139
The histogram, frequency polygon and cumulative frequency graph (ogive) are
most commonly applied graphical representation for continuous data.
Procedures for constructing statistical graphs
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label
it on the Y axes.
• Represent the class boundaries for the histogram or ogive and the mid points
for the frequency polygon on the X axes.
• Plot the points.
• Draw the bars or lines to connect the points.

Graphical Presentation of
data
7/4/2023
Asaye.A
140
1. Histogram
 A graph which places the class boundaries on the horizontal axis
and the frequencies on a vertical axis
 Class marks and class limits are sometimes used as the quantity on the
X axis.
 Non-overlapping intervals that cover all of the data values must be
used.

Cont…
7/4/2023
Asaye.A
141
 Bars are drawn over the intervals in such a way that the areas of the
bars are all proportional in the same way to their interval
frequencies.
 To avoid crowding, you can use class midpoints.
 Example: Distribution of the age of women at the time of marriage

Cont…
7/4/2023
Asaye.A
142
Histogram

Cont…
7/4/2023
Asaye.A
143
2. Frequency polygon
 Line graph of class marks against class frequencies.
 To draw a frequency polygon we connect the midpoints of class
boundaries of the histogram by a straight line

Cont…
7/4/2023
Asaye.A
144
 It can be also drawn without erecting rectangles by joining the top
midpoints of the intervals representing the frequency of the classes as
follows:

Cont…
7/4/2023
Asaye.A
145
3. Ogive Curve (Cumulative Frequency Polygon)
 A graph showing the cumulative frequency (less than or more than type)
plotted against upper or lower class boundaries respectively.
 Ogive uses class boundaries along the horizontal axis, and cumulative
frequency along vertical axis.
 Less than Ogive uses less than cumulative frequency on y axis.
 More than Ogive uses more than cumulative frequency on 𝑦 axis.
 The points are joined by a free hand curve

Cont…
7/4/2023
Asaye.A
150
4. Line graph
o A variable is taken along X-axis and the frequency of occurrence of
each of its observed values along the Y-axis.
o The points are plotted and joined by line.
o An arithmetic scale line graph shows patterns or trends over some
variable, usually time.

7/4/2023
Asaye.A
153
Summary Measures

learning outcomes
7/4/2023
Asaye.A
154
 After completing this chapter, a student will be able to:
 List and calculate measures of central tendency
 List and calculate measures of dispersion
 Describe types of shape.

Numerical Summary
Measures
7/4/2023
Asaye.A
155
 They are the single numbers which quantify the characteristics
of a distribution of values.
 They are two types;
1. Measures of central tendency or location
2. Measures of dispersion

Measures of Central Tendency/ Measures of Location
7/4/2023
Asaye.A
156
 Measures of central Tendency: the methods of determining the
actual value at which the data tend to concentrate.
 The tendency of the statistical data to get concentrated at a certain
value is called “central tendency”
 The objective of calculating MCT is to determine a single figure
which may be used to represent the whole data set.
 Since a MCT represents the entire data, it facilitates comparison
within one group or between groups of data

Characteristics of a good MCT
7/4/2023
Asaye.A
157
 A MCT is good or satisfactory if it possesses the following characteristics:
o It should be based on all the observations
o It should not be affected by the extreme values
o It should be as close to the maximum number of values as possible
o It should have a definite value
o It should not be subjected to complicated and tedious calculations
o It should be capable of further algebraic treatment
o It should be stable with regard to sampling

Cont…
7/4/2023
Asaye.A
158
 The most common measures of central tendency include:
 Arithmetic Mean
 Median
 Mode

1. Arithmetic Mean
7/4/2023
Asaye.A
159
1. Ungrouped Data
 The arithmetic mean is the "average" of the data set and by far the
most widely used measure of central location.
 Is the sum of all the observations divided by the total number of
observations.

Arithmetic…..
7/4/2023
Asaye.A
160
The heart rates for n=10 patients were as follows (beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150.
What is the arithmetic mean for the heart rate of these patients?
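The calculation can be sketched in Python as the sum of the observations divided by their number; the variable names are illustrative.

```python
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

# Arithmetic mean = sum of all observations / number of observations
total = sum(heart_rates)
n = len(heart_rates)
mean_rate = total / n

print(total, n, mean_rate)
```

Note that the single low value of 40 pulls the mean down, which illustrates how the mean is affected by extreme values.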

Cont…
7/4/2023
Asaye.A
161
 When the data are given in the form of a frequency
distribution, i.e. there are k distinct values such that a value Xi has
frequency fi (i = 1, 2, …, k), then the arithmetic mean is given as:
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$

Cont…
7/4/2023
Asaye.A
162
Solution

Cont…
7/4/2023
Asaye.A
163
Exercise
Consider the following frequency distribution table
Calculate the average of this data set?

Cont…
7/4/2023
Asaye.A
164
2. For grouped data
 In calculating the mean from grouped data, we assume that all values falling
into a particular class interval are located at the midpoint of each interval.
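 Therefore, the mean for grouped data is calculated as:
$\bar{X} = \frac{\sum f_i m_i}{\sum f_i}$, where mi is the midpoint and fi the frequency of the ith class.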

Arithmetic…..
7/4/2023
Asaye.A
165
Example
Compute the mean age of 169 subjects from the grouped data.
Class interval   Mid-point (mi)   Frequency (fi)   mifi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total            __               169              5810.5
Mean = 5810.5/169 = 34.38 years
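The grouped mean for this table can be verified with a short Python sketch. It treats every observation in a class as if it sat at the class midpoint, as assumed for grouped data; the variable names are illustrative.

```python
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [4, 66, 47, 36, 12, 4]

n = sum(freqs)                                          # total frequency
total_fx = sum(m * f for m, f in zip(midpoints, freqs)) # sum of mi * fi
grouped_mean = total_fx / n

print(n, total_fx, round(grouped_mean, 2))
```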

Arithmetic…..
7/4/2023
Asaye.A
166
 The mean can be thought of as a “balancing point”, “center of gravity”
 It is possible in extreme cases for all but one of the sample points to be on
one side of the arithmetic mean & in this case, the mean is a poor measure
of central location or does not reflect the center of the sample.

Properties of the Arithmetic Mean
7/4/2023
Asaye.A
167
 The mean can be used as a summary measure for both discrete and
continuous data, but it is not appropriate for either of nominal or ordinal
data.
 For a given set of data there is only one arithmetic mean (uniqueness).
 Easy to calculate and understand (simple).
 Influenced by each and every value in a data set
 Greatly affected by the extreme values.
 In case of grouped data if any class interval is open, arithmetic mean can
not be calculated.

2. Median
o It is an alternative measure of central tendency, second in popularity
to the arithmetic mean.
o Suppose there are n observations in a sample.
o If these observations are ordered from smallest to largest, then the median
is defined as follows:
o The median is a value such that at least half of the observations are less
than or equal to it and at least half of the observations are greater
than or equal to it.
 The median is the midpoint of the data array.

2. Median….
7/4/2023
Asaye.A
169
Ungrouped data
 The median is the value which divides the data set into two equal parts.
 If the number of values is odd, the median will be the middle value when
all values are arranged in order of magnitude.
 When the number of observations is even, there is no single middle value
but two middle observations.
 In this case the median is the mean of these two middle observations, when
all observations have been arranged in the order of their magnitude.

Cont…
7/4/2023
Asaye.A
170
1. For ungrouped data
• If the number of observations is odd, the median is defined as the
[(n+1)/2]th observation.
• If the number of observations is even the median is the average of
the two middle (n/2)th and [(n/2)+1]th values.
• To find the median of a data set:
• Arrange the data in ascending order.
• Find the middle observation of this ordered data.

Cont…
7/4/2023
Asaye.A
171
Example1: where n is even: 19,20, 20, 21, 22, 24, 27, 27, 27,34
Then, the median = (22 + 24)/2 = 23
Example2: The number of children with asthma during a specific year
in seven local districts clinic is shown.
Find the median for this data set.
253, 125, 328, 417, 201, 70, 90

Cont…
7/4/2023
Asaye.A
172
Solution:
First we must arrange the data in ascending order
70, 90, 125, 201, 253, 328, 417
Therefore, the fourth observation is the median of the data, i.e. the value 201
is the median value.
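The rule for ungrouped data (middle value when n is odd, mean of the two middle values when n is even) can be sketched in Python as follows. The function name `median_of` is an assumption made for this illustration.

```python
def median_of(values):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the [(n+1)/2]th observation
        return ordered[mid]
    # Even n: average of the (n/2)th and [(n/2)+1]th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

even_example = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
asthma_cases = [253, 125, 328, 417, 201, 70, 90]

print(median_of(even_example))
print(median_of(asthma_cases))
```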

Exercise
7/4/2023
Asaye.A
173
The actual waiting time for the first job on the selected sample of nine people
having different field of specialization was given below.
waiting time(in months): 11.6,11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8
Calculate the median of the waiting time.

Cont…
7/4/2023
Asaye.A
174
2. For grouped data
-If data are given in the form of a continuous frequency distribution, the
median is defined as:
$\text{Median} = L_{med} + \frac{W}{f_{med}}\left(\frac{n}{2} - f_c\right)$
Where: Lmed = lower class boundary of the median class, fmed = the frequency
of the median class, W = the size of the median class, n = total number of
observations, fc = the less-than cumulative frequency of the class preceding the
median class.
Note: the median class is the class with the smallest cumulative frequency (less
than type) greater than or equal to n/2.
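A minimal Python sketch of the grouped median follows. It locates the median class as the first class whose less-than cumulative frequency reaches n/2 and then interpolates within it. The function name and the choice of test data (the age distribution of 169 subjects used in the grouped-mean example, with class boundaries 9.5, 19.5, … and width 10) are assumptions made for illustration.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of a grouped frequency distribution with equal class widths."""
    n = sum(freqs)
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lcb, f in zip(lower_boundaries, freqs):
        if f > 0 and cum + f >= half:
            # Median class found: interpolate inside it
            return lcb + (width / f) * (half - cum)
        cum += f
    raise ValueError("the distribution has no observations")

lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [4, 66, 47, 36, 12, 4]

median_age = grouped_median(lower_boundaries, 10, freqs)
print(round(median_age, 2))
```

Here n/2 = 84.5, the cumulative frequency first reaches that value in the class 29.5 to 39.5, and the preceding cumulative frequency is 70.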

Cont…
7/4/2023
Asaye.A
175
 Example; find the median for the following distribution

Cont…
7/4/2023
Asaye.A
176
Solution

Cont…
7/4/2023
Asaye.A
177
 We can compute the median value as follows:

Merit and demerit of median
7/4/2023
Asaye.A
178
Merits:
 Median is a positional average and hence not influenced by extreme
observations.
 Can be calculated in the case of open end intervals.
 The median can be used as a summary measure for ordinal, discrete and
continuous data, in general however, it is not appropriate for nominal data.
Demerits:
 It is not a good representative of the data if the number of items is small.
 It is not amenable to further algebraic treatment.
 It is vulnerable to sampling fluctuations.

3. Mode
7/4/2023
Asaye.A
179
 Mode is a value which occurs most frequently in a set of values.
 The mode may not exist and even if it does exist, it may not be
unique.
 If in a set of observed values, all values occur once or equal
number of times, there is no mode

Cont…
7/4/2023
Asaye.A
180
Examples:
1. Find the mode of 5, 3, 5, 8, and 9 ; Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, 5; Mode =8 and 9
3. Find the mode of 4, 12, 3, 6, and 7. No mode/ mode doesn’t exist.

Cont…
7/4/2023
Asaye.A
181
Mode for Grouped data
 NB: The mode for grouped data is modal class.
 The Modal class is the class with the largest frequency.
 $\text{mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times W$
 Where L = the lower class boundary of the modal class;
W = the size of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
fmod = frequency of the modal class
Δ1 = fmod − f1, Δ2 = fmod − f2

Cont…
7/4/2023
Asaye.A
182
Example: Calculate the modal age for the age distribution of 228 patients
below.

Cont…
7/4/2023
Asaye.A
183
Solution
By inspection (simply looking at the frequencies), the mode lies in the
fourth class, where L = 29.5, fmod = 57, f1 = 50, f2 = 48 and w = 5, so that
Δ1 = 57 − 50 = 7 and Δ2 = 57 − 48 = 9.
Therefore, the modal age is
$\text{mode} = 29.5 + \frac{7}{7 + 9} \times 5 = 29.5 + 2.2 = 31.7$
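The grouped-mode formula can be checked with a small Python sketch using the values from this solution. The function name and parameter names are illustrative.

```python
def grouped_mode(lower_boundary, width, f_mod, f_prev, f_next):
    """Mode of grouped data from the modal class and its two neighbours."""
    delta1 = f_mod - f_prev   # modal frequency minus preceding frequency
    delta2 = f_mod - f_next   # modal frequency minus succeeding frequency
    return lower_boundary + delta1 / (delta1 + delta2) * width

modal_age = grouped_mode(lower_boundary=29.5, width=5,
                         f_mod=57, f_prev=50, f_next=48)
print(round(modal_age, 1))
```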

Properties of Mode
7/4/2023
Asaye.A
184
 The mode can be used as a summary measure for nominal,
ordinal, discrete and continuous data, in general however, it is
more appropriate for nominal and ordinal data.
 It is not affected by extreme values
 It can be calculated for distributions with open end classes
 Sometimes its value is not unique
 The main drawback of mode is that it may not exist

Merit and Demerit of Mode
7/4/2023
Asaye.A
185
Merits:
 It is not affected by extreme observations.
 Easy to calculate and simple to understand.
 It can be calculated for distribution with open end class.

Cont…
7/4/2023
Asaye.A
186
Demerits:
 It is not rigidly defined. i.e. its value is not unique.
 It is not based on all observations.
 It is not suitable for further mathematical treatment.
 It is not stable average, i.e. it is affected by fluctuations of
sampling to some extent.

Measure of location
7/4/2023
Asaye.A
187
Quartiles
- Quartiles are measures that divide the frequency distribution into four
equal parts.
- The value of the variables corresponding to these divisions are denoted
Q1, Q2, and Q3 often called the first, the second and the third quartile
respectively.
- Q1 is a value which has 25% items which are less than or equal to it
- Similarly Q2 has 50% items with value less than or equal
to it.

Cont…
7/4/2023
Asaye.A
188
− Q3 has 75% items whose values are less than or equal to it.
 Quartile for ungrouped data.
 Arrange data in ascending order.
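 If the number of observations is
A. Odd
 $Q_i = \left[\frac{i(n+1)}{4}\right]^{\text{th}}$ item
 B. Even
 $Q_i = \frac{\left(\frac{in}{4}\right)^{\text{th}} \text{item} + \left(\frac{in}{4} + 1\right)^{\text{th}} \text{item}}{2}$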

For grouped data
7/4/2023
Asaye.A
189

Percentiles
7/4/2023
Asaye.A
190
 Simply divide the data into 100 pieces
 Shows the percentage of values that fall below the particular value in a set
of data scores.

Cont…
7/4/2023
Asaye.A
191
 Arrange the numbers in ascending order.
Percentiles for individual series
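A. Odd
$P_i = \left[\frac{i(n+1)}{100}\right]^{\text{th}}$ item
B. Even
$P_i = \frac{\left(\frac{in}{100}\right)^{\text{th}} \text{item} + \left(\frac{in}{100} + 1\right)^{\text{th}} \text{item}}{2}$
Percentiles for grouped data
$P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right)$, i = 1, 2, ..., 99.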

Cont…
7/4/2023
Asaye.A
192


Cont…
7/4/2023
Asaye.A
193
 For example: suppose that 50% of a cohort survived at least 4 years.
 This means also that 50% survived at most 4 years.
 We say that 4 years is the median.
 The median is also called the 50th percentile.
 We write p50= 4 years.

Example
7/4/2023
Asaye.A
194
Marks of 50 students out of 85 are given below. Based on the data, find
Q1 and P7.
Marks   46-50   51-55   56-60   61-65   66-70   71-75   76-80
fi      4       8       15      5       9       5       4
Solution: First find the class boundaries (CB) and the cumulative frequency (CF) distribution.
Marks   46-50       51-55       56-60       61-65       66-70       71-75       76-80
CB      45.5-50.5   50.5-55.5   55.5-60.5   60.5-65.5   65.5-70.5   70.5-75.5   75.5-80.5
fi      4           8           15          5           9           5           4
CF      4           12          27          32          41          46          50
Second, determine the quartile and percentile classes.
For Q1: the quartile class is the class with the smallest CF ≥ i*N/4 = 1*50/4 = 12.5

Cont…
7/4/2023
Asaye.A
195
 The CF values ≥ 12.5 are 27, 32, 41, 46 and 50, and the smallest of these is 27, so the
quartile class is the third class (55.5-60.5).
 $Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.67 \approx 55.7$
 For percentiles
 P7 is the (7n/100)th value = 3.5th value, which lies in the class 45.5
– 50.5.
 $P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$
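The same interpolation serves quartiles, deciles and percentiles, so it can be sketched as one Python function. The function name `grouped_quantile` and its parameters are assumptions made for this illustration: `i` is the index of the quantile and `parts` is 4 for quartiles, 10 for deciles and 100 for percentiles.

```python
def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """i-th quantile (out of `parts`) of a grouped distribution."""
    n = sum(freqs)
    target = i * n / parts    # e.g. n/4 for Q1, 7n/100 for P7
    cum = 0                   # cumulative frequency before the current class
    for lcb, f in zip(lower_boundaries, freqs):
        if f > 0 and cum + f >= target:
            # Quantile class found: interpolate inside it
            return lcb + (width / f) * (target - cum)
        cum += f
    raise ValueError("the distribution has no observations")

lower_boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
freqs = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(lower_boundaries, 5, freqs, i=1, parts=4)
p7 = grouped_quantile(lower_boundaries, 5, freqs, i=7, parts=100)
print(round(q1, 2), p7)
```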

Cont…
7/4/2023
Asaye.A
196
1. Calculate Q1, Q2, Q3, D4, P40 and P90 for the data given
in the table below.
x   10   11   12   13   14   15   16   17   18
f   2    8    25   48   65   40   20   9    2
2. The following frequency distribution represents the magnitude of
earthquakes.
Compute the median, verify that it is equal to the second quartile,
and find the 72nd percentile.
Magnitude   0-0.9   1-1.9   2-2.9   3-3.9   4-4.9   5-5.9   6-6.9   7-7.9
Frequency   20      50      45      30      10      8       6       1

Summary
7/4/2023
Asaye.A
197
1. The arithmetic mean is used for interval and ratio data and for
symmetric distribution.
2. The median and quartiles are used for ordinal, interval and ratio data
whose distribution is skewed.
3. For nominal data mode is the appropriate MCT.

Measures of variation/dispersion
7/4/2023
Asaye.A
198
 The scatter or spread of items of a distribution is known as
dispersion or variation.
 In other words, the degree to which numerical data tend to
spread about an average value is called dispersion or variation
of the data.
 Measures of dispersions are statistical measures which provide
ways of measuring the extent in which data are dispersed or spread
out.

A good measure of variation possesses the following properties:
7/4/2023
Asaye.A
199
o It should be easy to compute and understand.
o It should be based on all observations.
o It should be uniquely defined.
o It should be capable of further algebraic treatment.
o It should be affected as little as possible by extreme values.

Cont…
7/4/2023
Asaye.A
200
o Measures of dispersion include:
o Range
o Inter-quartile range
o Variance
o Standard deviation
o Coefficient of variation
o Standard scores (Z-scores)

Range
7/4/2023
Asaye.A
201
 It is the difference between the largest and smallest observation from the
data.
 Example: Consider the data on the weight (in Kg) of 10 new born
children at Debre tabor hospital within a month: 2.51, 3.01, 3.25,
2.02,1.98, 2.33, 2.33, 2.98, 2.88, 2.43

Cont…
7/4/2023
Asaye.A
202
Solution:
 The range for the dataset can be computed by first arranging all
observation in to ascending order as: 1.98, 2.02, 2.33, 2.33, 2.43, 2.51,
2.88, 2.98, 3.01, 3.25.
 Range = Maximum – Minimum = 3.25-1.98
= 1.27

Cont…
7/4/2023
Asaye.A
203
Limitations of Range
 It is based upon two extreme cases in the entire distribution, the range
may be considerably changed if either of the extreme cases happens to
drop out, while the removal of any other case would not affect it at all.
 It wastes information , it takes no account of the entire data.

Inter-quartile range
7/4/2023
Asaye.A
204
The inter-quartile range (IQR) is the difference between the third and the first
quartiles.
Example: Suppose the first and third quartile for weights of girls 12 months of
age are 8.8 Kg and 10.2 Kg respectively.
The IQR = 10.2 Kg – 8.8 Kg = 1.4 Kg

Variance and standard deviation
7/4/2023
Asaye.A
205
 Variance measures how far, on average, the observations deviate
from the mean.
 Variance is the average of the squared distances of each value
from the mean.

Cont…
7/4/2023
Asaye.A
206
 For ungrouped data
$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$

Cont…
7/4/2023
Asaye.A
207
 For the case of a frequency distribution it is expressed as:
$S^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1}$, where n = Σfi
 Why use n−1?
− To obtain an unbiased estimate of the population variance, or
− To describe the spread of the population.

Cont…
7/4/2023
Asaye.A
208
 A drawback of the variance is that, because the deviations are squared,
its units are also squared; taking the square root (the standard deviation)
restores the original unit of measurement.

Example1
7/4/2023
Asaye.A
209
Consider the following three datasets
 Dataset 1: 7, 7, 7, 7, 7, 7, mean=7, sd=0
 Dataset 2: 6, 7, 7, 7, 7, 8, mean=7, sd=0.63
 Dataset 3: 3, 2, 7, 8, 9, 13, mean=7, sd=4.05
 The three datasets have the same mean but different variation.
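The three datasets can be compared with Python's standard `statistics` module, whose `stdev` function uses the n−1 (sample) denominator. This is only a quick check; the dictionary names are illustrative.

```python
import statistics

datasets = {
    "Dataset 1": [7, 7, 7, 7, 7, 7],
    "Dataset 2": [6, 7, 7, 7, 7, 8],
    "Dataset 3": [3, 2, 7, 8, 9, 13],
}

# Mean and sample standard deviation (n - 1 denominator) of each dataset
means = {name: statistics.mean(values) for name, values in datasets.items()}
sds = {name: statistics.stdev(values) for name, values in datasets.items()}

for name in datasets:
    print(name, means[name], round(sds[name], 2))
```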

Example2
7/4/2023
Asaye.A
210
Find the variance and standard deviation based on the given data 35, 45,
30, 35, 40, 25
Solution; Firstly we find the mean
Next subtract the mean from each value and square it:
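The steps above can be sketched in Python, using the n−1 denominator for the sample variance; the variable names are illustrative.

```python
import math

values = [35, 45, 30, 35, 40, 25]
n = len(values)

# Step 1: the mean
mean = sum(values) / n

# Step 2: subtract the mean from each value and square the result
squared_devs = [(x - mean) ** 2 for x in values]

# Step 3: sample variance (divide by n - 1) and standard deviation
variance = sum(squared_devs) / (n - 1)
sd = math.sqrt(variance)

print(mean, squared_devs, variance, round(sd, 2))
```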

Exercise
7/4/2023
Asaye.A
212
 The areas of surfaces sprayable with DDT from a sample of 15 houses
are measured as follows (in m2):
101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145
Find the variance and standard deviation of the given data set.

Cont…
7/4/2023
Asaye.A
213


Example
7/4/2023
Asaye.A
214
Find the variance and the standard deviation for the frequency
distribution of the given data set below.

Cont…
7/4/2023
Asaye.A
215
Class       Frequency (fi)   Midpoint (xm)   fi.xm   fi(xm - mean)^2
5.5-10.5    1                8               8       1*(8-24.5)^2 = 272.25
10.5-15.5   2                13              26      2*(13-24.5)^2 = 264.5
15.5-20.5   3                18              54      3*(18-24.5)^2 = 126.75
20.5-25.5   5                23              115     5*(23-24.5)^2 = 11.25
25.5-30.5   4                28              112     4*(28-24.5)^2 = 49
30.5-35.5   3                33              99      3*(33-24.5)^2 = 216.75
35.5-40.5   2                38              76      2*(38-24.5)^2 = 364.5
Total       n = 20                           490     1,305
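The column totals in this table lead directly to the variance and standard deviation. The Python sketch below reproduces them, assuming the n−1 denominator discussed earlier for the sample variance; the variable names are illustrative.

```python
import math

midpoints = [8, 13, 18, 23, 28, 33, 38]
freqs = [1, 2, 3, 5, 4, 3, 2]

n = sum(freqs)                                            # 20
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n   # 490 / 20

# Sum of fi * (xm - mean)^2 over all classes
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))

variance = sum_sq / (n - 1)   # sample variance
sd = math.sqrt(variance)

print(n, mean, sum_sq, round(variance, 2), round(sd, 2))
```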

Cont…
7/4/2023
Asaye.A
217
Properties of Variance:
 The main demerit of variance is that its unit is the square of the unit
of the original measurement values.
 The variance gives more weight to the extreme values as compared
to those which are near to mean value, because the difference is
squared in variance.
 The drawbacks of variance are overcome by the standard deviation.

Cont…
7/4/2023
Asaye.A
218
SD Vs. Standard Error (SE)
 SD describes the variability among individual values in a given data set.
 SE is used to describe the variability among separate sample means
obtained from one sample to another.
 The SE of the mean indicates how far the mean of another similarly conducted
study may be expected to lie from the observed mean (roughly within ± SE).

Cont…
7/4/2023
Asaye.A
219
 The SD has the advantage of being expressed in the same units of
measurement as the mean.
 SD is considered to be the best measure of dispersion and is used
widely because of the properties of the theoretical normal curve.
 However, if the units of measurement of the variables in two data sets
are not the same, then their variability cannot be compared by
comparing the values of SD.

Coefficient of variation
7/4/2023
Asaye.A
220
 When two data sets have different units of measurements, or their means
differ sufficiently in size, the CV should be used as a measure of
dispersion.
 It is the best measure to compare the variability of two series of sets of
observations.
 A series with less coefficient of variation is considered more consistent.
 $CV = \frac{S}{\bar{X}} \times 100\%$

Cont…
7/4/2023
Asaye.A
221
 Example -“Cholesterol is more variable than systolic blood pressure”

Standard score (Z-scores)
7/4/2023
Asaye.A
222
 It is obtained by subtracting the mean of the data set from
the value and dividing the result by the standard deviation
of the data set.
 It tells us how many standard deviations a specific value is
above or below the mean value of the data set.
 The z-score is the number of standard deviations the data
value falls above (positive z-score) or below (negative z-
score) the mean for the data set.

Cont…
7/4/2023
Asaye.A
223
 Z-score computed from the population
$Z = \frac{X - \mu}{\sigma}$
 Z-score computed from the sample
$Z = \frac{X - \bar{X}}{S}$
Example: Suppose that a student scored 66 in biostatistics and 80 in anatomy.
The summary of the scores in the two courses is given below.
Course          Average score   Standard deviation of the score
Biostatistics   51              12
Anatomy         72              16
In which course did the student score better compared to his classmates?

Solution:
7/4/2023
Asaye.A
224
Z-score of student in Biostatistics: $Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$
Z-score of student in Anatomy: $Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$
From these two standard scores, we can conclude that the
student has scored better in the Biostatistics course relative to his
classmates than in Anatomy.
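The comparison can be reproduced with a short Python sketch; the function name `z_score` is illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / sd

z_biostatistics = z_score(66, mean=51, sd=12)
z_anatomy = z_score(80, mean=72, sd=16)

print(z_biostatistics, z_anatomy)
# The larger z-score marks the course with the better relative performance
print("Biostatistics" if z_biostatistics > z_anatomy else "Anatomy")
```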

Moments
 The rth moment about the mean (the rth central moment) is defined as
$M_r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^r}{n}$, r = 0, 1, 2, …
 For continuous grouped data
$M_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$
where the $X_i$ are the class marks.
Exercise: Find the first three central moments of the numbers 2, 3 and 7.
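A minimal Python sketch of the central-moment definition for ungrouped data is given below, applied to the three numbers in the exercise. The function name is an assumption made for illustration.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum of (x - mean)^r divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

numbers = [2, 3, 7]
m1 = central_moment(numbers, 1)
m2 = central_moment(numbers, 2)
m3 = central_moment(numbers, 3)

print(m1, round(m2, 3), m3)
```

The first central moment is always zero, because the deviations from the mean cancel out.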

Measure of shape
7/4/2023
Asaye.A
226
 There are different types of measures of shape:
I. Skewness
II. Kurtosis

1. Skewness
7/4/2023
Asaye.A
227
o Measures of central tendency and variation do not reveal the
shape of a frequency distribution.
o Skewness is the degree of asymmetry or departure from
symmetry of a distribution.
o A skewed frequency distribution is one that is not symmetrical.
o Skewness is concerned with the shape of the curve not size.

Concept of skewness
7/4/2023
Asaye.A
228
o The skewness of a distribution is defined as the lack of symmetry.
o In a symmetrical distribution, mean, median, and mode are equal to
each other.

Skewness…
7/4/2023
Asaye.A
229
• For moderately skewed distribution, the following relation holds
among the three commonly used measures of central tendency.
 Mean-Mode=3*(Mean-Median)
 There are two types of skewness, based on the shape of the distribution.
 Positively skewed: Smaller observations are more frequent than larger
observations. i.e. the majority of the observations have a value below an
average and it has a long tail in the positive direction (Mean > Median).

Cont…
7/4/2023
Asaye.A
230
Skewed to the right (positively skewed)
Mode
Median
Mean

Cont…
7/4/2023
Asaye.A
231
 Negatively (left) skewed: Smaller observations are less frequent
than larger observations. i.e. the majority of the observations have a
value above an average. i.e. Mean < Median.
Mean
Median
Mode

Measures of Skewness
7/4/2023
Asaye.A
232
1. Karl Pearson’s Coefficient of Skewness (SK):
$SK = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
or
$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
If SK = 0, then the distribution is symmetrical.
If SK > 0, then the distribution is positively skewed.
If SK < 0, then the distribution is negatively skewed.

Cont…
7/4/2023
Asaye.A
233
2. Moment Coefficient of Skewness
 Moment coefficient of skewness is based on moments. The formula
for calculating the coefficient of skewness is:
$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$
where $M_r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^r}{n}$
α3 > 0, the distribution is positively skewed
α3 = 0, the distribution is symmetric
α3 < 0, the distribution is negatively skewed

2. Kurtosis
7/4/2023
Asaye.A
234
o Kurtosis is a measure of peakedness of a distribution, and measured
relative to the peakedness of a normal curve.
o The peakedness of a distribution can be classified into three:
o Leptokurtic: -
- A distribution having relatively high peak.
- A curve is more peaked than the normal curve .

Cont…
7/4/2023
Asaye.A
235
o Mesokurtic: -
- Normal peak
- The curve is properly peaked
o Platykurtic:
 Flat-topped
 The observations are spread across the middle interval, so that
many values occur with a low frequency.

Measures of kurtosis
7/4/2023
Asaye.A
237
 The moment coefficient of kurtosis β2:
$\beta_2 = \frac{M_4}{M_2^2}$
where $M_2$ and $M_4$ are central moments.
 If 𝛽2 = 3, then the distribution is Mesokurtic.
 If 𝛽2 > 3, then the distribution is Leptokurtic.
 If 𝛽2 < 3, then the distribution is Platykurtic.

Example:
7/4/2023
Asaye.A
238
Based on the following data:
𝑀0 = 1, 𝑀1 = -0.6, 𝑀2 = 1.6, 𝑀3 = -2.4, 𝑀4 = 5.8
a) Find the coefficient of skewness and discuss the distribution type.
b) Find the coefficient of kurtosis and discuss the distribution type.
Solution
a) $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{-2.4}{1.6^{3/2}} = -1.19 < 0$, so the distribution is negatively
skewed.
b) $\beta_2 = \frac{M_4}{M_2^2} = \frac{5.8}{1.6^2} = 2.27 < 3$, so the curve is platykurtic.
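Both coefficients can be checked with a few lines of Python, taking the moments given in the example as inputs. The variable names and the classification strings are illustrative.

```python
m2, m3, m4 = 1.6, -2.4, 5.8   # moments given in the example

# Moment coefficient of skewness: M3 / M2^(3/2)
alpha3 = m3 / m2 ** 1.5

# Moment coefficient of kurtosis: M4 / M2^2
beta2 = m4 / m2 ** 2

if alpha3 > 0:
    skew_type = "positively skewed"
elif alpha3 < 0:
    skew_type = "negatively skewed"
else:
    skew_type = "symmetric"

if beta2 > 3:
    kurtosis_type = "leptokurtic"
elif beta2 < 3:
    kurtosis_type = "platykurtic"
else:
    kurtosis_type = "mesokurtic"

print(round(alpha3, 2), skew_type)
print(round(beta2, 2), kurtosis_type)
```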
