Debre Tabor University is new and different
7/4/2023
Asaye.A
1
Debre Tabor University
College of Health Science
Social and Public Health
Biostatistics Course for Health Science Students
Debre Tabor, Ethiopia
Contact detail
 Asaye Alamneh (Lecturer of Biostatistics at DTU)
 Debre Tabor University
 College of Health Science
 Department: Social and Public Health
 Qualifications:
 BSc in Statistics, MPH in Biostatistics
 Contacts:
 Email: asaye2127stat@gmail.com
 Location: Debre Tabor University
Introduction to biostatistics
Outlines of presentation
 Definition of statistics and biostatistics
 Basic statistical concepts
 Classification of statistics
 Types of variables
 Application and limitation of biostatistics
Objectives
 After completing this chapter, the student will be able to:
 Define statistics and biostatistics
 List some basic terms
 Define and identify the different types of data and
understand why we need to classify variables
 Describe the importance and limitations of statistics
 Identify sources of data
Definition of statistics
 The word statistics comes from the Latin “status”, which refers to
a political state or government.
 Statistics can be defined in two ways: in the plural sense and in the singular
sense.
1. Plural sense: statistics are aggregates of facts and figures that are
expressed in numerical form.
 For example: statistics on industrial production, population growth
in the country in different years, etc.
2. Singular sense: statistics refers to the science of collection,
organization, presentation, analysis, and interpretation of numerical
data.
 It makes data simple and easy for everyone to understand.
 It helps us to use numbers to communicate ideas.
 For example: a study of the distribution of
weights of health science students at DTU.
Biostatistics
Biostatistics: application of statistical methods to medical, biological and
public health related problems.
 When the data being analyzed are derived from the biological science and
medicine, we use the term biostatistics to manage medical uncertainties.
 Biostatistics is the scientific treatment given to medical data derived from
a group of individuals or patients. It involves:
 Collection of data.
 Presentation of the collected data.
 Analysis and interpretation of the results.
 Making decisions on the basis of such analysis.
Types of biostatistics
Based on how the data are used, biostatistics can be classified into
two main categories.
1. Descriptive statistics:
 Ways of collecting, organizing, summarizing, and presenting the data at
hand in a concise manner to get an impression of the data.
 Used to organize and describe the sample/population and to simplify large
amounts of data in sensible ways.
 It also shows the final results in the form of tables and graphs.
2. Inferential statistics: are methods for using sample data to make general
conclusions (inferences) about populations.
 Making conclusions for the population that is beyond available data.
For example:
 Probability distribution,
 Estimation,
 Confidence interval,
 Hypothesis testing,
 Regression analysis, etc.
Types of biostatistics (summary)
 Descriptive statistics: collection, organizing, summarizing and presenting of data
 Inferential statistics: making inferences, hypothesis testing, determining relationships and making predictions
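The contrast between the two branches can be illustrated with a minimal sketch. The weights below are hypothetical values invented for illustration (they are not from the lecture), and the confidence interval uses the normal critical value 1.96 as a simplifying assumption.

```python
import statistics

# Hypothetical weights (kg) of a sample of 10 students; illustrative only.
weights = [52, 61, 58, 70, 65, 49, 73, 68, 55, 60]

# Descriptive statistics: summarize the data at hand.
n = len(weights)
mean = statistics.mean(weights)
sd = statistics.stdev(weights)  # sample standard deviation

# Inferential statistics: an approximate 95% confidence interval for the
# population mean. The value 1.96 is the normal critical value; for a
# sample this small a t critical value would usually be preferred.
margin = 1.96 * sd / n ** 0.5
ci = (mean - margin, mean + margin)

print(f"mean = {mean:.1f} kg, SD = {sd:.1f} kg")
print(f"approximate 95% CI: ({ci[0]:.1f}, {ci[1]:.1f}) kg")
```

The first two quantities only describe the ten observed students; the interval is a statement about the wider population they were drawn from.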
Stages of statistical investigation
A) Collection of data: measuring or gathering numerical data.
B) Organization of data: organizing and classifying the collected data.
C) Presentation of data: overview of the data in form of tables, graphs
and charts.
D) Analysis of data: extracting relevant information from the
summarized data
E) Interpretation of data: making generalization to the target
population.
Definitions of some basic terms
Population: A large group possessing a given characteristic or set of
characteristics.
A population may be finite or infinite.
Parameter: a characteristic (summary value) obtained from the population.
Example: population mean (μ), population standard deviation (σ)
Statistic: a characteristic (summary value) obtained from a sample.
Example: sample mean, mode, median, standard deviation (SD), variance, etc.
Sampling: The technique of selecting a sample from the entire population
Sample: A subset of the population selected by some sampling
technique
Census: Complete enumeration of the population
 Data: raw, unorganized facts that need to be processed.
 When data are processed, organized, structured or presented in a given context
so as to make them useful, they are called information.
 Figure 1: Relation between data and information
Variable
 Variable: a characteristic which takes different values in different
persons, places or things, or any aspect of a population unit that is
measured or recorded.
 e.g. height, weight, marital status, etc.
 Random variables: variables whose values are determined by
chance.
 Data: sets of values of one or more variables.
 They are numbers which can be measurements or can be obtained by
counting.
 Data set: a collection of observations on a variable.
Types of Variables (1)
 Depending on the characteristic of the measurement, variables can
be classified into two types.
1. Quantitative (numerical) variable: one that can be measured
and expressed quantitatively or numerically.
 It is the result of measuring or counting attributes of the population.
 Quantitative variables are subdivided into two types:
 A. Discrete variable
 B. Continuous variable
A. Discrete variable:
 A variable whose values are countable and take whole
numbers.
 There are no decimal values.
 E.g. the number of daily hospital admissions, the number of live
births per 1000 women, the number of motor vehicle accidents in Debre
Tabor town.
B. Continuous variable: one that does not have gaps or interruptions.
 A variable that can assume any value, including decimals, over a certain
interval.
For example:
 Serum cholesterol level of a patient,
 Weight,
 Age,
 Laboratory result,
 Time, arm circumference
2. Qualitative (categorical) variable: one that cannot be quantified or
measured numerically, but is measured by assigning names to
items (events).
E.g. sex, marital status, race or ethnic group, occupational status,
eye color, etc.
A. Nominal variables: variables with no inherent order or ranking
sequence.
B. Ordinal variables: variables with an ordered series of categories.
Types of Variables (2)
 Dependent variable: the outcome of interest, which should change in
response to some intervention.
 Sometimes called the outcome or response variable.
 Independent variable: the intervention, or what is being
manipulated.
 A variable that you believe might influence your outcome
measure.
 An independent variable is a hypothesized cause of or influence on a
dependent variable.
Type of scales of measurement
 Based on the nature of the variable, variables can be measured
at four different levels of measurement.
 Measurement is defined as the assignment of numbers, symbols
and/or names to objects or events.
 Each scale of measurement has certain properties, which in turn
determine the appropriateness of certain statistical analyses.
 The scale assigned to data is based on three properties of
measurement: order, distance and a fixed (true) zero.
 The four scales/levels of measurement are nominal, ordinal, interval
and ratio.
1. Nominal scale
 It is the lowest level of measurement.
 It simply consists of "naming" observations or classifying them into various
mutually exclusive, all-inclusive categories in which no order or
ranking can be imposed on the data.
 When numbers are assigned to categories, they are used only for coding
purposes and do not provide a sense of size.
 No arithmetic or relational operations can be applied.
 Nominal measurements have none of the three properties (order, distance, true zero).
For example:
 Sex of a person (M, F),
 Eye color (e.g. brown, blue),
 Religion (Muslim, Christian),
 Place of residence (urban, rural),
 Race (e.g. black, white).
2. Ordinal scale
 Level of measurement which classifies data into categories that
can be ranked.
 Relational operations of greater than and less than are applicable.
 However, precise differences between the ranks do not exist.
Example;-
 Socio-economic status (very low, low, medium, high, very high)
 Patient status (unimproved, improved, much improved),
 Height of patients (very short, short, tall, very tall),
 Blood pressure (very low, low, high, very high),
 Job satisfaction level (highly dissatisfied, dissatisfied, satisfied,
highly satisfied), etc
3. Interval Scale
 It is possible to rank or order and tell the real distance between any two
measurements.
 However, there is no meaningful zero, so ratios are meaningless.
 All arithmetic operations except division are applicable.
 Relational operations are also possible.
 The selected zero point is not necessarily a true zero, in that it does not
have to indicate a total absence of the quantity being measured.
 Note that zero degrees Celsius is arbitrary, so it does not make sense to
say that 20 degrees Celsius is twice as hot as 10 degrees Celsius.
Examples:
 Body temperature in °F or °C, time of the day, days of the year, test
score, IQ…
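The point about ratios on an interval scale can be checked numerically. The sketch below is an illustration added here, not part of the lecture; it uses the standard Celsius-to-Kelvin conversion (Kelvin has a true zero) and the helper name `celsius_to_kelvin` is chosen only for this example.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15

# On the Celsius (interval) scale the ratio looks like 2 ...
ratio_celsius = 20 / 10

# ... but on the Kelvin (ratio) scale the same two temperatures
# differ by only about 3.5 percent.
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences, in contrast, are the same on both scales.
diff_celsius = 20 - 10
diff_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)

print(ratio_celsius, round(ratio_kelvin, 4))
print(diff_celsius, round(diff_kelvin, 2))
```

Because the "ratio" changes when the arbitrary zero moves while the difference does not, only differences are meaningful on an interval scale.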
4. Ratio scale
 Is the highest level of measurement.
 It classifies data that can be ranked, differences are meaningful, and
there is a true zero. True ratios exist between the different units of
measure.
 There is always a true zero point, which shows the absence of
condition.
 All arithmetic and relational operations are applicable.
Example: volume, height, weight, length, number of items, etc.
Summary of levels of measurement

Level of      Put data in   Arrange data   Subtract      Determine if one data value
measurement   categories    in order       data values   is a multiple of another
Nominal       Yes           No             No            No
Ordinal       Yes           Yes            No            No
Interval      Yes           Yes            Yes           No
Ratio         Yes           Yes            Yes           Yes
Why we need Biostatistics?
 The main theory of statistics lies in the term variability.
1. Handling variation.
a. Biological variation: variation among individuals as well as within
individuals over time.
For example: height, weight, blood pressure, etc.
b. Sampling variation: biomedical research projects are usually carried out on
small numbers of study subjects.
 We can also have instrument variability and observer variability.
2. Essential for scientific methods of investigation.
 Formulate a hypothesis.
 Design a study to objectively test the hypothesis.
 Collect reliable and unbiased data.
 Process and evaluate the data rigorously.
 Interpret the results and draw appropriate conclusions.
 These statistical methods are designed to contribute to the process of
making scientific judgments in the face of uncertainty and variation.
 It helps the researcher to arrive at a scientific judgment about a
hypothesis.
 To study the association between two or more attributes
 To evaluate the efficacy of drugs
 To determine the success or failure of a health care program
 To define and measure the extent of a disease
 Statistical methods help us to understand public health issues and
diseases, and to quantify the uncertainties present in basic medical
sciences.
Limitations of statistics
 It deals only with subjects of investigation that are capable of being
quantitatively measured and numerically expressed.
 It deals only with aggregates of facts and attaches no importance to individual items.
 Statistical data are only approximately, not mathematically, correct.
 Statistics can easily be misused and therefore should be used by experts.
Sources of data
 There are two basic sources of statistical data:
1. Primary data: first-hand data collected from the items or
individual respondents directly by the researcher, primarily for the purpose of
a certain study.
 The major primary sources of data are :-
 Surveys,
 Surveillance,
 Census,
 Observation and,
 Experimental studies.
Secondary data
2. Secondary data: data that were collected by other people or agencies and
statistically treated, and whose information is then used for another purpose.
 For example: hospital records, magazines, CSA, DHS, and vital statistics:
 Birth reports,
 Death reports,
 Epidemic reports
 Reports of laboratory utilization (including laboratory test results)
Exercises
For each of the following variables, indicate whether it is quantitative or
qualitative and specify its measurement scale:
1. Blood pressure (mmHg)
2. Cholesterol (mmol/l)
3. Diabetes (yes/no)
4. Body mass index (kg/m²)
5. Age (years)
6. Employment (paid work/retired/housewife)
7. Smoking status (smoker/non-smoker/ex-smoker)
8. Exercise (hours per week)
9. Alcohol consumption (units per week)
10. Level of pain (mild/moderate/severe)
Methods of data collection and presentation
 At the end of this chapter, you should be able to:
– Understand methods of data collection
– Identify methods of data presentation
 Tabular presentation
 Diagrammatic presentation
 Graphic presentation
1. Methods of data collection
 Data collection techniques allow us to systematically collect
data about our objects of study (people, objects, and
phenomena) and about the setting in which they occur
 Data can be obtained by a variety of ways. One of the most
common is through the use of surveys
 Surveys can be done by using a variety of data collection
methods
There are various methods of data collection
1. Observation
2. Interviews
3. Questionnaires
4. Extraction of data from records
 1. Observation- a technique that involves systematically
selecting, watching and recording the behavior of people or other
phenomena for the purpose of gaining specified
information.
 It includes all methods from simple visual observations to the
use of high-level machines and measurements, sophisticated
equipment or facilities.
 Advantage- it gives relatively more accurate data on behavior and
activities.
 Disadvantage: -
 Investigator’s (observers) bias
 It requires more resource and
 Skilled human power during use of high level machines.
2. Interview- a commonly used research data collection technique.
 Face-to-face interview,
 Telephone interview.
 Interviewee (respondent), interviewer (asker)
I. Direct personal interview
 The investigator presents himself/herself personally before the
informant and questions him/her personally.
 Best suited to situations where problems are not completely
understood, where questions cannot be formulated beforehand
and where one question leads to another.
Disadvantage
 It is time consuming
 It is not suited to large groups of informants
II . Interviewing using questionnaire
 One drafts a detailed questionnaire
 The investigator appoints agents known as enumerators, who go to the respondents
personally with the questionnaire,
 Ask them the questions given there in, and
 Record their replies
 They can be
 Face-to-face or
 Telephone interviews.
Face-to-face interviews
Advantage
 The interviewer knows exactly who is responding to the questionnaire.
 The interviewer can help the respondent if he/she has difficulty in
understanding the questions e.g. language, concentration.
 There is more flexibility in presenting the items ;they can range from
closed to open.
 Observations can be made as well.
Disadvantage
 Untrained interviewer may distort the meaning of questions.
 Attributes of the interviewer may affect the responses given due to
bias of the interviewer and his/ her social or ethnic characteristics.
 More cost in terms of time and money (training and salary of
interviewers).
Telephone interviews
Advantage
 Less expensive in time and money compared with face to face interviews
 The interviewer is able to help the respondent if he/she doesn’t understand
the question
 Broad representative samples can be obtained for those who have
telephone lines
 May assure the uniformity if interviewer is the same.
Disadvantage
 Under-representation of groups that do not have telephones
 The respondent may be substituted by another person
 Problems with questions that have multiple answer options and with complicated
questions
3. Questionnaires
 Self-administered questionnaires: the respondent reads the questions and
fills in the answers by himself/herself.
 Advantage
 They are simpler and cheaper.
 They can be administered to many persons simultaneously (e.g. to a class
of students).
 Disadvantage
 They demand a certain level of education and skill on the part of the
respondents.
Postal questionnaire
 The questionnaires are sent by post to the informants together with
a polite covering letter by explaining
 The detail information
 The aims and objectives of collecting the information, and
 Requesting the respondents to cooperate by furnishing the
correct replies and returning the questionnaire duly filled in.
 The return postage expenses are usually covered by the investigator.
The main problems with postal questionnaire are :
 Response rates tend to be relatively low, and
 There may be under representation of less literate subjects.
Mailed Questionnaire
 The questionnaire is mailed to respondents to be filled in.
 Sometimes known as self-enumeration.
Advantage
 Cheap
 No need for trained interviewers.
 No interviewer bias.
 They can be coordinated from one central location.
Disadvantage
 Low response rate.
 Uncompleted questionnaires due to omission or invalid
response.
 No assurance that the questionnaire was answered by right
person.
 Needs intense follow up to get a high response rate.
4. Extraction of data from records
 Clinical and other personal records, death certificates, published mortality
statistics, census publications, etc.
Examples;
1. Official publications of Central Statistical Authority
2. Publication of Ministry of Health and Other Ministries
3. News Papers and Journals.
4. International Publications like Publications by WHO, World Bank,
UNICEF.
5. Records of hospitals or any Health Institutions.
 During the use of data from documents, though they are less time
consuming and relatively have low cost, take care on the quality and
completeness of the data.
Problems in gathering data
Common problems might include:
 Language barriers
 Lack of adequate time
 Expense
 Inadequately trained and experienced staff
 Bias
 Cultural norms
Choosing method of data collection
 To choose a better data collection method, we have to focus on
the relevance, timeliness, accuracy and usability of the information.
 Some methods pay attention to timeliness and reduction in cost.
 Others pay attention to accuracy and the strength of the method
in using scientific approaches.
 The selection of the method of data collection is also based on
practical considerations, such as:
 The need for personnel, skills, equipment, etc. in relation to what
is available.
 The acceptability of the procedures to the subjects.
 The probability that the method will provide a good coverage.
i.e. will supply the required information about all or almost all
members of the population.
Types of questions
 Before looking at the steps in questionnaire design, we need to review
the types of questions.
 There are two types of questions:
1. Open ended (free-response)
2. Close ended (restricted choice)
1. Open ended
e.g. In your opinion, what is the biggest barrier to getting patients to your hospital's
ANC unit?
 Advantages- it stimulates free thought by the respondent
 Helpful for obtaining information on sensitive issues
 Disadvantages- respondents may have problems recalling answers
 Not suitable for mailed questionnaires
 Answers are difficult to code for statistical analysis
 Poor handwriting can be a problem
2. Close ended- provides fixed answers
e.g. Including your present visit, how many times have you visited this
hospital in the past two years?
A. Once B. Twice C. 3 times D. 4 times E. More than 4 times
 Advantage- suitable for many forms of statistical analysis
 Not difficult to code
 Disadvantage- limits the variety of detail in the answers
Partially open-ended question
 Advantage- provides alternatives if certain options are overlooked
 It identifies missing categories for future use
 Disadvantage- respondents may ignore the other options
e.g. If the household lost any of its members due to death in the last 12
months, what was the cause of death?
1. Malaria
2. Famine/hunger
3. Car accident
4. Others (specify)
Requirements of questions
 Must have face validity
 The questions that we design should give an obviously
valid and relevant measurement of the variable.
 Must be clear and unambiguous.
 Each question should contain only one idea, and all respondents should
understand it in the same way.
 Must not be offensive (avoid questions that may offend the
respondent).
 The questions should be fair (should not be loaded).
 Sensitive questions - It may not be possible to avoid asking
‘sensitive’ questions that may offend respondents
 In such situations the interviewer (questioner) should do it very
carefully and wisely
 Start with an interesting but non-controversial question
(preferably open) that is directly related to the subject of the
study.
 Pose more sensitive questions as late as possible in the
interview
 Use simple language.
 Make the questionnaire as short as possible.
What to consider before designing a questioning tool
 What exactly do we want to know, according to the objectives
and variables we identified earlier?
 Of whom will we ask questions and what techniques will we
use?
 Are our informants mainly literate or illiterate?
 How large is the sample that will be interviewed?
Types of questions (summary)
 Open ended
 Closed ended
 Simple dichotomy
 Multiple choice
 Determinant choice
 Check-list
Types of closed format
Choice of categories
 Q. What is your marital status?
 Single
 Married
 Divorced
 Widowed
Likert-style (or similar) scale
 Q. Biostatistics is an interesting subject
 Strongly disagree
 Disagree
 Undecided
 Agree
 Strongly agree
Checklists
 Circle the public health specialties you are particularly interested in
 Epidemiology and Biostatistics
 Reproductive health
 Nutrition
 Health informatics
 Health service management
 General
Ranking
 Please rank your interest in the following specialties
(1=most interesting, 4=least interesting )
 Epidemiology and Biostatistics
 Reproductive health
 Nutrition
 Health informatics
2. Methods of data organization and presentation
 The most convenient method of organizing data is to construct a frequency
distribution.
 A frequency distribution is the organization of raw data in table form, using
classes and frequencies.
 Frequency distribution table: lists categories of scores along with their
corresponding frequencies.
 For this, different techniques of data organization and presentation, such as ordered
arrays, tables and diagrams, are used.
Array (ordered array)
 A serial arrangement of numerical data in an ascending or
descending order.
 A simple arrangement of individual observations in order of
magnitude.
 This will enable us to know the range over which the items are
spread and will also get an idea of their general distribution.
 It is an appropriate way of presentation when the data are small in
size (usually less than 20).
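As a minimal sketch of an ordered array, the snippet below sorts a small set of ages. The ages are hypothetical values made up for illustration; they do not come from the lecture.

```python
# Hypothetical ages (years) of eight patients; illustrative only.
ages = [34, 21, 45, 29, 38, 21, 50, 27]

# An ordered array is the same data arranged by magnitude.
ascending = sorted(ages)
descending = sorted(ages, reverse=True)

# Once the data are ordered, the range can be read off the two ends.
data_range = ascending[-1] - ascending[0]

print(ascending)
print(descending)
print(data_range)
```

With only eight observations the ordered list already shows the spread and the repeated value; for larger data sets a frequency distribution is usually more practical.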
Frequency Distribution (F.D.)
A frequency distribution is an organization of the values of a
variable arranged in order of magnitude either individually (for a
discrete variable), or into classes (for a continuous variable), or
into categories (in the case of qualitative data), along with their
frequencies.
A frequency distribution has two main parts; namely,
i. The values of the variable (if quantitative) or the
categories (if qualitative), and
ii. The number of observations (frequency)
corresponding to the values or categories.
There are two types of frequency distributions
i. Categorical (or qualitative)
ii. Numerical (or quantitative)
1. Categorical Frequency Distribution
 Data are classified according to non-numerical categories.
 Categories must be mutually exclusive and exhaustive.
 Used to organize nominal and ordinal data.
a) Nominal data: the construction is straightforward: count the
occurrences in each category and find the totals.
Example: The marital status of 60 adults classified as single, married,
divorced and widowed is presented in a FD as below:
Marital status   Single   Married   Divorced   Widowed   Total
Frequency        25       20        8          7         60
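A categorical frequency distribution is often reported with relative frequencies as well. The sketch below takes the marital status counts from the table above and converts them to percentages; the variable names are chosen for this illustration only.

```python
# Frequencies taken from the marital status table above.
marital_status_counts = {
    "Single": 25,
    "Married": 20,
    "Divorced": 8,
    "Widowed": 7,
}

total = sum(marital_status_counts.values())

# Relative frequency of each category, expressed as a percentage.
percentages = {
    status: round(100 * count / total, 1)
    for status, count in marital_status_counts.items()
}

for status, count in marital_status_counts.items():
    print(f"{status:<10}{count:>4}{percentages[status]:>7}%")
print(f"{'Total':<10}{total:>4}")
```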
b) Ordinal data: the construction is identical to
the nominal case. However, the categories
should be put in order.
Example: Satisfaction with the teaching method in a
class of size 60 is presented in a FD as shown
below
2. Numerical Frequency Distribution
 data are classified according to numerical size.
 used to organize interval and ratio data.
 may be discrete or continuous, depending on whether the
variable is discrete or continuous.
a) Discrete (Ungrouped) Frequency Distribution
 Count the number of times each possible value is repeated.
Example: In a survey of 30 families, the number of children per
family was recorded, giving the following data:
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4 5 4 5 4 3 5 2 7 3 3 6 7 3 8 4.
The distribution of children in 30 families would be:
No. of children        2  3  4  5  6  7  8  Total
No. of families (f)    5  7  8  4  1  2  3  30
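The counting in this example can be reproduced with a short sketch. It uses the 30 observations listed above and Python's standard `collections.Counter`; the variable names are chosen for illustration.

```python
from collections import Counter

# Number of children per family for the 30 surveyed families (from the example).
children = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2, 2, 8, 5, 3, 4,
            5, 4, 5, 4, 3, 5, 2, 7, 3, 3, 6, 7, 3, 8, 4]

# Count how many times each value occurs, then list the values in order.
frequency = dict(sorted(Counter(children).items()))

for value, count in frequency.items():
    print(f"{value} children: {count} families")
print("total:", sum(frequency.values()))
```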
b) Continuous/grouped Frequency Distribution
o Arises from continuous variables/data.
o Unlike for a discrete FD, a class cannot be allocated to
each value of a continuous variable.
o The categories into which the observations are distributed are
called classes or class intervals.
o Classes should be exhaustive and mutually exclusive.
Example
Time spent Frequency
10 – 14 8
15 – 19 28
20 – 24 27
25 – 29 12
30 – 34 4
35 – 39 1
Steps in constructing continuous frequency distribution
1. Determine the number of classes (k): the number of groups into
which the observations are divided.
 Decide “k” with the help of Sturges’ rule:
k = 1 + 3.322 log(n)
rounded up or down to the nearest integer,
where n = number of observations and log = common logarithm
(logarithm to base 10).
 Example: if n=10, k=4.32≈4; if n=100, k=7.644≈8; if n=1000,
k=10.97≈11.
 2. Determine the class width (w): the difference between the
upper (or lower) boundaries of two consecutive classes (class
limits may be used instead).
 We can use W = Range / k.
 Note that “W” is rounded up or down to the nearest integer.
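Sturges' rule and the class width formula can be sketched as two small functions. The function names `sturges_classes` and `class_width` are hypothetical helpers introduced for this illustration; the constant 3.322 and the base-10 logarithm follow the rule as stated above.

```python
import math

def sturges_classes(n):
    """Unrounded number of classes from Sturges' rule: k = 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(data_range, k):
    """Unrounded class width: W = Range / k."""
    return data_range / k

# The three sample sizes used in the example above.
for n in (10, 100, 1000):
    k = sturges_classes(n)
    print(f"n = {n}: k = {k:.3f}, rounded to {round(k)}")

# A range of 46 split into 7 classes, with the width rounded up.
w = class_width(46, 7)
print(f"W = {w:.2f}, rounded up to {math.ceil(w)}")
```

Rounding the width up rather than down is a common choice because it guarantees that k classes of that width cover the whole range.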
3. Determine the Class Limits
 Class limits separate one class from another; there is a gap between the upper
limit of one class and the lower limit of the next class.
 The lower class limit of the first class should be the smallest value of
the observations.
 Add the class width to the lower class limit to obtain the
lower class limit of the next class.
 Unit of measure (U): the smallest possible difference between
successive values or measures, e.g. 1, 0.1, 0.01, 0.001, ...
 To find the upper limit of the first class, subtract U from the lower
limit of the second class.
 Then continue to add the class width to this upper limit to find the
rest of the upper limits, or
 Obtain the upper class limits by adding the class width minus the unit of measure to the
corresponding lower class limits, i.e. UCL = LCL + (W − U), which is UCL = LCL + (W − 1) when U = 1.
4. Determine the Class boundaries
 Making an interval of a continuous variable continuous in both directions,
no gap exists between classes.
 let U =LCL of the second class – UCL of preceding class.
 Add half of this difference (U/2) to all upper class limits to get the upper
class boundaries (UCBs), and subtract (U/2) from all lower class limits to
get the lower class boundaries (LCBs).
 UCBi = UCLi +U/2
 LCBi = LCLi – U/2
5. Class mark (C.M) or Mid points: it is the average of the lower and upper
class limits or the average of upper and lower class boundary.
6. Determine the frequency of each class: determined simply by counting
the number of observations belonging to each class.
7. Cumulative frequency is the number of observations less than/ more than
or equal to a specific value.
8. Cumulative frequency above (Greater than type): it is the total
frequency of all values greater than or equal to the lower class boundary of a
given class.
9. Cumulative frequency below (less than type): it is the total frequency of
all values less than or equal to the upper class boundary of a given class.
10. Relative frequency (rf): it is the frequency divided by the total frequency.
11. Relative cumulative frequency (rcf): it is the cumulative frequency
divided by the total frequency.
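The cumulative and relative frequencies defined above can be computed in a few lines. As an illustration, this sketch uses the frequencies from the "Time spent" table shown earlier in this chapter; the variable names are chosen for this example only.

```python
from itertools import accumulate

# Classes and frequencies from the "Time spent" example table.
classes = ["10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
frequencies = [8, 28, 27, 12, 4, 1]
total = sum(frequencies)

# "Less than" type: running total from the first class downwards.
less_than_cf = list(accumulate(frequencies))

# "More than" type: running total from the last class upwards.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Relative and relative cumulative frequencies.
relative = [f / total for f in frequencies]
relative_cumulative = [cf / total for cf in less_than_cf]

for row in zip(classes, frequencies, less_than_cf, more_than_cf,
               relative, relative_cumulative):
    print("{:<6}{:>4}{:>5}{:>5}{:>8.4f}{:>8.4f}".format(*row))
```

The last "less than" value and the first "more than" value both equal the total number of observations, which is a quick check on the arithmetic.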
Example: The blood glucose level for 50 patients is shown below.
Construct a frequency distribution for the following data.
Solution:
Step 1: Find the highest and the lowest values: H=88, L=42.
Step 2: Find the range: R=H-L=88-42=46.
Step 3: Select the number of classes using Sturges’ formula:
k=1+3.322log(50)=6.64≈7 (rounding up).
Step 4: Find the class width: w=R/k=46/7=6.57≈7 (rounding up).
Step 5: Select the starting observation as the lowest class limit (this is
usually the lowest observation).
 Add the class width to that observation to get the lower limit of the
next class.
 Keep adding until there are 7 classes: 42, 49, 56, 63, 70, 77, 84 are
the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 49 − U = 49 − 1 = 48.
Adding the class width repeatedly gives 48, 55, 62, 69, 76, 83, 90 as the upper class limits.
 So combining step 5 and step 6, one can construct the following
classes.
Step 7: Find the class boundaries by subtracting 0.5 from each lower
class limit and adding 0.5 to the UCL.
Example: For class 1: LCBi =LCLi - U/2 = 42-0.5 = 41.5 and UCBi =
UCLi + U/2 = 48+0.5 = 48.5.
 Then keep adding the class width W to both boundaries to obtain the
remaining class boundaries.
 By doing so one can obtain the following classes.
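As a minimal sketch of Step 7, the class boundaries and class marks can be derived from the class limits found in Steps 5 and 6, assuming a unit of measurement U = 1. The names are illustrative.

```python
unit = 1
limits = [(42, 48), (49, 55), (56, 62), (63, 69), (70, 76), (77, 83), (84, 90)]

# LCB = LCL - U/2 and UCB = UCL + U/2
boundaries = [(lcl - unit / 2, ucl + unit / 2) for lcl, ucl in limits]

# Class mark = average of the lower and upper class limits
class_marks = [(lcl + ucl) / 2 for lcl, ucl in limits]

for (lcb, ucb), mark in zip(boundaries, class_marks):
    print(f"{lcb}-{ucb}  class mark {mark}")
```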
Step 8: Find the frequencies
Step 9: Find cumulative frequency.
Step 10: Find relative frequency and /or relative cumulative frequency.
Example:
Construct a continuous FD for the following raw data of ages of
patients admitted at DTU hospital in a given week.
 Here is the FD
 The class marks and class boundaries of the above Example are:
Continuous/grouped F.D …
Cumulative frequency distributions
o Tells us how often the values fall below or above that class. There
are two types of CFD:
The “less than” cumulative F.D.
o Obtained by adding the frequency of all the preceding classes
including the frequency of that class.
The “more than” cumulative F.D.
o Obtained by adding the frequency of the succeeding classes
including the frequency of that class.
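The two cumulative distributions can be sketched in Python. The class frequencies below are illustrative only (they are borrowed from the students' marks example later in this deck), and the variable names are assumptions.

```python
from itertools import accumulate

freqs = [4, 8, 15, 5, 9, 5, 4]   # illustrative class frequencies

# "Less than" type: running total from the first class downwards
less_than = list(accumulate(freqs))

# "More than" type: running total from the last class upwards
more_than = list(accumulate(reversed(freqs)))[::-1]

print(less_than)
print(more_than)
```

The last "less than" value and the first "more than" value both equal the total frequency, which is a quick check on the arithmetic.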
 For the data in the above Example, both cumulative frequency
distributions are given below:
Following the rules for grouping data
 The groups must not overlap; otherwise it is unclear to which group
a measurement belongs.
 There must be continuity from one group to the next: Otherwise some
measurements may not fit in a group.
 The groups must range from the lowest measurement to the highest
measurement.
 The groups should normally be of an equal width.
Methods of data presentation
Commonly, there are two ways of presenting
statistical data:
1. Statistical tables
2. Graphs/Diagrams
1. Tabulation methods of data
presentations
1. Statistical tables
o A statistical table is an orderly and systematic presentation of data
in rows and columns.
Rows : are horizontal arrangements.
Columns: are vertical arrangements.
o Organizing data in a table involves grouping the data
into mutually exclusive categories of the variables and counting
the number of occurrences (frequency) in each category.
 Depending on the purpose for which the table is designed and the
complexity of the relationship, a table is either a
simple frequency table or a cross tabulation.
 A simple frequency table is used when the individual
observations involve only a single variable.
 A cross tabulation is used to obtain the frequency distribution of
one variable by another variable.
General principles to construct
tables
1. Tables should be as simple as possible.
2. Tables should be self-explanatory.
The title should be clear and placed above the table. A good title
answers: what? when? where? how classified?
Each row and column should be labeled.
Numerical entries of zero should be explicitly written rather
than indicated by a dash.
Dashes are reserved for missing or unobserved data.
Totals should be shown either in the top row and the first
column or in the last row and last column.
3. If data are not original, their source should be given in a footnote.
A) Simple or one-way table
Simple frequency table: most basic table is a simple
frequency distribution with one variable.
Example:
Table. Blood group of voluntary blood donors examined in the Red Cross blood bank within a day, May 2006 (n=548)
Blood group | Number of students | Percent
A | 240 | 43.8
B | 146 | 26.6
AB | 57 | 10.4
O | 105 | 19.2
Total | 548 | 100
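The percent column of the table is each count divided by the total, times 100. A minimal Python sketch using the counts from the table (the variable names are illustrative):

```python
counts = {"A": 240, "B": 146, "AB": 57, "O": 105}

total = sum(counts.values())
# Relative frequency of each blood group, expressed as a percentage
percent = {group: round(100 * c / total, 1) for group, c in counts.items()}

print(total)
print(percent)
```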
Two and three variable table
 If two variables are cross tabulated, it is a two variable table.
 If the tabulation is among three variables, it is a three variable
table.
 In cross tabulated frequency distributions where there are row
and column totals, the decision for the denominator is based
on the variable of interest to be compared over the subset of
the other variable.
Example
Common form of a two by two table
 It is a special form of table, a favorite among
epidemiologists.
 It is used to assess whether there is a relationship
between the two variables.
Exposure | Number of subjects: Cases | Number of subjects: Controls | Total
Exposed | 23 | 23 | 46
Non-exposed | 4 | 139 | 143
Total | 27 | 162 | 189
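As a rough sketch, the marginal totals and the percentage exposed within each column can be computed from the four cell counts. Using the column totals as denominators is an assumption made here for illustration (it compares exposure between cases and controls); the names are hypothetical.

```python
table = {
    "Exposed":     {"Cases": 23, "Controls": 23},
    "Non-exposed": {"Cases": 4,  "Controls": 139},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table)
              for col in ("Cases", "Controls")}
grand_total = sum(row_totals.values())

# Percentage exposed among cases and among controls (column denominators)
pct_exposed = {col: round(100 * table["Exposed"][col] / col_totals[col], 1)
               for col in col_totals}

print(row_totals, col_totals, grand_total)
print(pct_exposed)
```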
Composite/ Higher Order Table
It is a large table combining several separate variable/tables
Age, sex and other demographic variables may be combined
to form a single table
Example: Distribution of Health Professional by Sex
and Residence
Diagrammatic and Graphical methods of data presentation
Advantages
 To understand the information easily.
 To make the data attractive.
 To make comparisons of items easily.
 To draw attention of the observer.
 The purpose of graphs and diagrams is not to provide exact and detailed
information, but simple comparisons.
 Any further information shall rather be obtained from the original data.
Limitations of Diagrammatic presentation
 The technique is made use only for purposes of comparison. It is
not to be used when comparison is either not possible or is not
necessary.
 A diagram is not an alternative to tabulation. It only strengthens the textual
exposition of a subject, and cannot serve as a complete substitute
for statistical data.
 It can give only an approximate idea and as such where greater
accuracy is needed diagrams will not be suitable.
 They fail to bring to light small differences.
2. Diagrammatic Presentation of data
 Diagrams are appropriate for presenting discrete as well as
qualitative data.
 The three most commonly used diagrammatic presentation of
data are:
 Pie charts
 Bar charts
 Pictograms
1. Pie chart
 A pie chart can be used to compare the relation between the
whole and its components.
 It is useful for qualitative or quantitative discrete data.
 A pie chart is a circular diagram in which the area of each sector
represents the share of one component.
Example:
Draw a suitable diagram to represent the following
population in a town.
Men | Women | Girls | Boys
2500 | 2000 | 4000 | 1500
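For a pie chart, each group's sector angle is its share of the total multiplied by 360 degrees. A minimal Python sketch for the town data above (the variable names are illustrative):

```python
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}

total = sum(population.values())
# Sector angle in degrees and share in percent for each group
angles = {group: count * 360 / total for group, count in population.items()}
shares = {group: count * 100 / total for group, count in population.items()}

for group in population:
    print(f"{group}: {shares[group]}% -> {angles[group]} degrees")
```

The angles should add up to 360 degrees, which is a useful check before drawing.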
2. Bar charts (or graphs)
 Categories are listed on the horizontal axis (X-axis).
 Frequencies or relative frequencies are represented on the Y-
axis.
 The height of each bar is proportional to the frequency or
relative frequency of observations in that category.
 There are three types of bars.
Tips for constructing bar diagrams
1. Whenever possible it is better to construct a bar diagram on a graph
paper
2. All bars drawn in any single study should be of the same width
3. The different bars should be separated by equal distances
4. All the bars should rest on the same line called the base
5. Whenever possible, it is advisable to draw bars in order of
magnitude
A. Simple bar chart:- used to represent a single variable classified on spatial,
quantitative or temporal basis.
Example: Construct a bar chart for the following data
B. Sub-divided bar chart (component)
o It is used to represent data in which the total magnitude is divided into
different parts or components.
o Example: Plasmodium species distribution for confirmed malaria
cases, Zeway, 2003
C. Multiple bar chart
 It is used when two or more sets of inter-related data are represented
(a multiple bar diagram facilitates comparison between more than one
phenomenon).
 The following figure shows a multiple bar chart to represent the
import and export of Canada (values in $) for the years 1991 to
1995.
3. Graphical Presentation of data
The histogram, frequency polygon and cumulative frequency graph (ogive) are
the most commonly applied graphical representations for continuous data.
Procedures for constructing statistical graphs
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label
it on the Y axis.
• Represent the class boundaries for the histogram or ogive and the mid points
for the frequency polygon on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
1. Histogram
 A graph which places the class boundaries on the horizontal axis
and the frequencies on a vertical axis
 Class marks or class limits are sometimes used on the X axis
instead.
 Non-overlapping intervals that cover all of the data values must be
used.
 Bars are drawn over the intervals in such a way that the areas of the
bars are all proportional in the same way to their interval
frequencies.
 To avoid crowding, you can use class midpoints.
 Example: Distribution of the age of women at the time of marriage
Histogram
2. Frequency polygon
 Line graph of class marks against class frequencies.
 To draw a frequency polygon, we connect the midpoints of the tops of
the histogram bars with straight lines.
 It can be also drawn without erecting rectangles by joining the top
midpoints of the intervals representing the frequency of the classes as
follows:
3. Ogive Curve (Cumulative Frequency Polygon)
 A graph showing the cumulative frequency (less than or more than type)
plotted against upper or lower class boundaries respectively.
 Ogive uses class boundaries along the horizontal axis, and cumulative
frequency along vertical axis.
 Less than Ogive uses less than cumulative frequency on y axis.
 More than Ogive uses more than cumulative frequency on 𝑦 axis.
 The points are joined by a free hand curve
4. Line graph
o A variable is taken along X-axis and the frequency of occurrence of
each of its observed values along the Y-axis.
o The points are plotted and joined by line.
o An arithmetic scale line graph shows patterns or trends over some
variable, usually time.
Summary Measures
learning outcomes
 After completing this chapter, a student will be able to:
 List and calculate measures of central tendency
 List and calculate measures of dispersion
 Describe types of shape.
Numerical Summary
Measures
 They are the single numbers which quantify the characteristics
of a distribution of values.
 There are two types:
1. Measures of central tendency or location
2. Measures of dispersion
Measures of Central Tendency/ Measures of Location
 Measures of central Tendency: the methods of determining the
actual value at which the data tend to concentrate.
 The tendency of the statistical data to get concentrated at a certain
value is called “central tendency”
 The objective of calculating MCT is to determine a single figure
which may be used to represent the whole data set.
 Since a MCT represents the entire data, it facilitates comparison
within one group or between groups of data
Characteristics of a good MCT
 A MCT is good or satisfactory if it possesses the following characteristics:
o It should be based on all the observations
o It should not be affected by the extreme values
o It should be as close to the maximum number of values as possible
o It should have a definite value
o It should not be subjected to complicated and tedious calculations
o It should be capable of further algebraic treatment
o It should be stable with regard to sampling
 The most common measures of central tendency include:
 Arithmetic Mean
 Median
 Mode
1. Arithmetic Mean
1. Ungrouped Data
 The arithmetic mean is the "average" of the data set and by far the
most widely used measure of central location.
 Is the sum of all the observations divided by the total number of
observations.
The heart rates for n=10 patients were as follows (beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150.
What is the arithmetic mean for the heart rate of these patients?
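A quick way to check the answer is to apply the definition directly: the sum of the observations divided by their number. A minimal Python sketch using the ten heart rates above (the variable names are illustrative):

```python
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

total = sum(heart_rates)
mean = total / len(heart_rates)   # sum of observations / number of observations

print(total, mean)
```

The single low value of 40 pulls the mean down noticeably, which illustrates how the mean responds to extreme observations.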
 When the data are given in the form of a frequency
distribution, i.e. there are k distinct values such that a value Xi has
frequency fi (i=1,2,…,k), the arithmetic mean is given by:
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$
Solution
Exercise
Consider the following frequency distribution table
Calculate the average of this data set?
2. For grouped data
 In calculating the mean from grouped data, we assume that all values falling
into a particular class interval are located at the midpoint of each interval.
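 Therefore, the mean for grouped data is calculated as:
$\bar{X} = \frac{\sum f_i m_i}{\sum f_i}$, where $m_i$ is the midpoint and $f_i$ the frequency of class i.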
Example
Compute the mean age of 169 subjects from the grouped data.
Class interval | Mid-point (mi) | Frequency (fi) | mifi
10-19 | 14.5 | 4 | 58.0
20-29 | 24.5 | 66 | 1617.0
30-39 | 34.5 | 47 | 1621.5
40-49 | 44.5 | 36 | 1602.0
50-59 | 54.5 | 12 | 654.0
60-69 | 64.5 | 4 | 258.0
Total | __ | 169 | 5810.5
Mean = 5810.5/169 = 34.38 years
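The grouped-mean calculation can be reproduced with a few lines of Python, treating every value in a class as if it sat at the class midpoint. The midpoints and frequencies are those of the example; the variable names are illustrative.

```python
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [4, 66, 47, 36, 12, 4]

n = sum(freqs)
weighted_sum = sum(m * f for m, f in zip(midpoints, freqs))   # sum of mi*fi
mean = weighted_sum / n

print(n, weighted_sum, round(mean, 2))   # the mean comes to about 34.38 years
```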
 The mean can be thought of as a “balancing point”, “center of gravity”
 It is possible in extreme cases for all but one of the sample points to be on
one side of the arithmetic mean & in this case, the mean is a poor measure
of central location or does not reflect the center of the sample.
Properties of the Arithmetic Mean
 The mean can be used as a summary measure for both discrete and
continuous data, but it is not appropriate for nominal or ordinal
data.
 For a given set of data there is only one arithmetic mean (uniqueness).
 Easy to calculate and understand (simple).
 Influenced by each and every value in a data set
 Greatly affected by the extreme values.
 For grouped data, if any class interval is open, the arithmetic mean
cannot be calculated.
2. Median
o It is an alternative measure of central tendency, second in popularity
next to the arithmetic mean.
o Suppose there are n observations in a sample
o If these observations are ordered from smallest to largest, then the median
is defined as follows:
o The median, is a value such that at least half of the observations are less
than or equal to median and at least half of the observations are greater
than or equal to median.
 The median is the midpoint of the data array.
Ungrouped data
 The median is the value which divides the data set into two equal parts.
 If the number of values is odd, the median will be the middle value when
all values are arranged in order of magnitude.
 When the number of observations is even, there is no single middle value
but two middle observations.
 In this case the median is the mean of these two middle observations, when
all observations have been arranged in the order of their magnitude.
1. For ungrouped data
• If the number of observations is odd, the median is defined as the
[(n+1)/2]th observation.
• If the number of observations is even the median is the average of
the two middle (n/2)th and [(n/2)+1]th values.
• To find the median of a data set:
• Arrange the data in ascending order.
• Find the middle observation of this ordered data.
Example1: where n is even: 19,20, 20, 21, 22, 24, 27, 27, 27,34
Then, the median = (22 + 24)/2 = 23
Example2: The number of children with asthma during a specific year
in seven local districts clinic is shown.
Find the median for this data set.
253, 125, 328, 417, 201, 70, 90
Solution:
First we must arrange the data in ascending order
70, 90, 125, 201, 253, 328, 417
Therefore, the fourth observation is the median of the data, i.e. the value 201
is the median value.
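The rule for ungrouped data (middle value for odd n, mean of the two middle values for even n) can be sketched as a small Python function. The function name is illustrative; the two calls use the data of Example 1 and Example 2.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle observation(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the [(n+1)/2]th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and [(n/2)+1]th

even_example = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
asthma_cases = [253, 125, 328, 417, 201, 70, 90]

print(median(even_example))   # n is even
print(median(asthma_cases))   # n is odd
```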
Exercise
The actual waiting times for the first job for a selected sample of nine people
with different fields of specialization are given below.
Waiting time (in months): 11.6, 11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8
Calculate the median of the waiting time.
2. For grouped data
- If data are given in the form of a continuous frequency distribution, the
median is defined as:
$\text{Median} = L_{med} + \frac{W}{f_{med}}\left(\frac{n}{2} - f_c\right)$
Where: $L_{med}$ = lower class boundary of the median class, $f_{med}$ = the frequency
of the median class, W = the size of the median class, n = total number of
observations, $f_c$ = the less than type cumulative frequency of the class preceding the
median class.
Note: the median class is the class with the smallest cumulative frequency (less
than type) greater than or equal to n/2.
 Example; find the median for the following distribution
Solution
 We can compute the median value as follows:
Merit and demerit of median
Merits:
 Median is a positional average and hence not influenced by extreme
observations.
 Can be calculated in the case of open end intervals.
 The median can be used as a summary measure for ordinal, discrete and
continuous data, in general however, it is not appropriate for nominal data.
Demerits:
 It is not a good representative of the data if the number of items is small.
 It is not amenable to further algebraic treatment.
 It is vulnerable to sampling fluctuations.
3. Mode
 Mode is a value which occurs most frequently in a set of values.
 The mode may not exist and even if it does exist, it may not be
unique.
 If in a set of observed values, all values occur once or equal
number of times, there is no mode
Examples:
1. Find the mode of 5, 3, 5, 8, and 9 ; Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, 5; Mode =8 and 9
3. Find the mode of 4, 12, 3, 6, and 7. No mode/ mode doesn’t exist.
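The three examples can be reproduced with a small Python function that counts how often each value occurs. This is a sketch: the function name is illustrative, and returning an empty list when all values occur equally often is a convention chosen here to match the "no mode" rule stated earlier.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 3, 5, 8, 9]))          # one mode
print(modes([8, 9, 9, 7, 8, 2, 5]))    # two modes
print(modes([4, 12, 3, 6, 7]))         # no mode
```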
Mode for Grouped data
 NB: For grouped data, the mode lies in the modal class.
 The modal class is the class with the largest frequency.
 $\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$
 Where L = the lower class boundary of the modal class;
w = the size of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
fmod = frequency of the modal class
$\Delta_1$ = fmod - f1, $\Delta_2$ = fmod - f2
Example: Calculate the modal age for the age distribution of 228 patients
below.
Solution
By inspection (simply looking at the frequencies), the mode lies in the
fourth class, where L = 29.5, fmod = 57, f1 = 50, f2 = 48, w = 5, and
$\Delta_1$ = 57 - 50 = 7, $\Delta_2$ = 57 - 48 = 9.
Therefore, the modal age is $29.5 + \frac{7}{7 + 9} \times 5 = 29.5 + 2.2 = 31.7$.
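The same calculation as a minimal Python sketch, using only the quantities stated in the solution (L = 29.5, fmod = 57, f1 = 50, f2 = 48, w = 5); the variable names are illustrative.

```python
L, f_mod, f1, f2, w = 29.5, 57, 50, 48, 5

delta1 = f_mod - f1      # difference from the preceding class
delta2 = f_mod - f2      # difference from the succeeding class
mode = L + delta1 / (delta1 + delta2) * w

print(delta1, delta2, round(mode, 1))
```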
Properties of Mode
 The mode can be used as a summary measure for nominal,
ordinal, discrete and continuous data, in general however, it is
more appropriate for nominal and ordinal data.
 It is not affected by extreme values
 It can be calculated for distributions with open end classes
 Sometimes its value is not unique
 The main drawback of mode is that it may not exist
Merit and Demerit of Mode
Merits:
 It is not affected by extreme observations.
 Easy to calculate and simple to understand.
 It can be calculated for distribution with open end class.
Demerits:
 It is not rigidly defined. i.e. its value is not unique.
 It is not based on all observations.
 It is not suitable for further mathematical treatment.
 It is not stable average, i.e. it is affected by fluctuations of
sampling to some extent.
Measure of location
Quartiles
- Quartiles are measures that divide the frequency distribution in to four
equal parts.
- The value of the variables corresponding to these divisions are denoted
Q1, Q2, and Q3 often called the first, the second and the third quartile
respectively.
- Q1 is a value which has 25% items which are less than or equal to it
- Similarly Q2 has 50% items with value less than or equal
to it.
− Q3 has 75% items whose values are less than or equal to it.
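 Quartiles for ungrouped data
 Arrange the data in ascending order.
 If the number of observations is
A. Odd: $Q_i = \left[\frac{i(n+1)}{4}\right]^{th}$ item
B. Even: $Q_i = \frac{\left(\frac{in}{4}\right)^{th}\text{ item} + \left(\frac{in}{4} + 1\right)^{th}\text{ item}}{2}$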
For grouped data
Percentiles
 Percentiles divide the data into 100 equal parts.
 A percentile shows the percentage of values that fall below a particular value in a set
of data scores.
 Arrange the numbers in ascending order.
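Percentiles for individual series
A. Odd: $P_i = \left[\frac{i(n+1)}{100}\right]^{th}$ item
B. Even: $P_i = \frac{\left(\frac{in}{100}\right)^{th}\text{ item} + \left(\frac{in}{100} + 1\right)^{th}\text{ item}}{2}$
Percentiles for grouped data
$P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right)$, i = 1, 2, ..., 99.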
 For example: suppose that 50% of a cohort survived at least 4 years.
 This means also that 50% survived at most 4 years.
 We say that 4 years is the median.
 The median is also called the 50th percentile.
 We write p50= 4 years.
Example
The marks of 50 students out of 85 are given below. Based on the data, find Q1 and P7.
Marks | 46-50 | 51-55 | 56-60 | 61-65 | 66-70 | 71-75 | 76-80
fi | 4 | 8 | 15 | 5 | 9 | 5 | 4
Solution: first find the class boundaries (CB) and the cumulative frequency (CF) distribution.
Marks | 46-50 | 51-55 | 56-60 | 61-65 | 66-70 | 71-75 | 76-80
CB | 45.5-50.5 | 50.5-55.5 | 55.5-60.5 | 60.5-65.5 | 65.5-70.5 | 70.5-75.5 | 75.5-80.5
fi | 4 | 8 | 15 | 5 | 9 | 5 | 4
CF | 4 | 12 | 27 | 32 | 41 | 46 | 50
Second, determine the quartile and percentile classes.
For Q1: find the smallest CF ≥ i*N/4 = 1*50/4 = 12.5
 The CFs ≥ 12.5 are 27, 32, 41, 46 and 50, and the smallest of these is 27, so the
quartile class is the third class (55.5-60.5).
 $Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.7$
 For percentiles
 P7 is the (7n/100)th value = 3.5th value, which lies in the class 45.5
– 50.5.
 $P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$
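Both results follow the same pattern: locate the class whose cumulative frequency first reaches i*n/parts, then interpolate within it. A hedged Python sketch of that pattern, applied to the marks data (the function and parameter names are assumptions made for illustration):

```python
def grouped_quantile(lower_boundaries, freqs, width, i, parts):
    """i-th quantile out of `parts` (4 = quartiles, 100 = percentiles)."""
    n = sum(freqs)
    target = i * n / parts          # position of the quantile, e.g. n/4 for Q1
    cum = 0                         # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cum + f >= target:
            return lower + (width / f) * (target - cum)
        cum += f
    raise ValueError("quantile position exceeds total frequency")

lower_boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
freqs = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(lower_boundaries, freqs, 5, 1, 4)
p7 = grouped_quantile(lower_boundaries, freqs, 5, 7, 100)
print(round(q1, 1), p7)
```

The same function gives deciles with parts = 10, and the median with i = 2 and parts = 4.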
1. Calculate Q1, Q2, Q3, D4, P40 and P90 for the data given
in the table below.
x | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18
f | 2 | 8 | 25 | 48 | 65 | 40 | 20 | 9 | 2
2. The following frequency distribution represents the magnitude of
earthquakes.
Magnitude | 0-0.9 | 1-1.9 | 2-2.9 | 3-3.9 | 4-4.9 | 5-5.9 | 6-6.9 | 7-7.9
Frequency | 20 | 50 | 45 | 30 | 10 | 8 | 6 | 1
Compute the median, verify that it is equal to the second quartile,
and find the 72nd percentile.
Summary
1. The arithmetic mean is used for interval and ratio data and for
symmetric distribution.
2. The median and quartiles are used for ordinal, interval and ratio data
whose distribution is skewed.
3. For nominal data mode is the appropriate MCT.
Measures of variation/dispersion
 The scatter or spread of items of a distribution is known as
dispersion or variation.
 In other words, the degree to which numerical data tend to
spread about an average value is called dispersion or variation
of the data.
 Measures of dispersions are statistical measures which provide
ways of measuring the extent in which data are dispersed or spread
out.
A good measure of variation possesses:
o It should be easy to compute and understand.
o It should be based on all observations.
o It should be uniquely defined.
o It should be capable of further algebraic treatment.
o It should be affected as little as possible by extreme values.
o Measures of dispersion include:
o Range
o Inter-quartile range
o Variance
o Standard deviation
o Coefficient of variation
o Standard scores (Z-scores)
Range
 It is the difference between the largest and smallest observation from the
data.
 Example: Consider the data on the weight (in Kg) of 10 new born
children at Debre tabor hospital within a month: 2.51, 3.01, 3.25,
2.02,1.98, 2.33, 2.33, 2.98, 2.88, 2.43
Solution:
 The range for the dataset can be computed by first arranging all
observation in to ascending order as: 1.98, 2.02, 2.33, 2.33, 2.43, 2.51,
2.88, 2.98, 3.01, 3.25.
 Range = Maximum – Minimum = 3.25-1.98
= 1.27
Limitations of Range
 It is based on only the two extreme cases in the entire distribution, so the range
may change considerably if either of the extreme cases happens to
drop out, while the removal of any other case would not affect it at all.
 It wastes information, because it takes no account of the rest of the data.
Inter-quartile range
The inter-quartile range (IQR) is the difference between the third and the first
quartiles.
Example: Suppose the first and third quartile for weights of girls 12 months of
age are 8.8 Kg and 10.2 Kg respectively.
The IQR = 10.2 Kg – 8.8 Kg = 1.4 Kg
Variance and standard deviation
 Variance measures how far, on average, scores deviate or differ
from the mean.
 Variance is the average of the squared distances of each value
from the mean.
 For ungrouped data: $S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
 For a frequency distribution it is expressed as: $S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n-1}$
 Why do we use n-1?
− To obtain an unbiased estimate of the population variance, or
− To describe the spread of the population.
 A drawback of the variance is that, because the deviations are squared,
its units are also squared. To get back to the original unit of
measurement, we take the square root, which gives the standard deviation.
Example1
Consider the following three datasets
 Dataset 1: 7, 7, 7, 7, 7, 7; mean=7, sd=0
 Dataset 2: 6, 7, 7, 7, 7, 8; mean=7, sd=0.63
 Dataset 3: 3, 2, 7, 8, 9, 13; mean=7, sd=4.05
 The three datasets have the same mean but different variation.
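The three datasets can be checked with Python's standard `statistics` module, whose `stdev` function uses the n-1 (sample) divisor. The variable names are illustrative.

```python
import statistics

datasets = {
    "Dataset 1": [7, 7, 7, 7, 7, 7],
    "Dataset 2": [6, 7, 7, 7, 7, 8],
    "Dataset 3": [3, 2, 7, 8, 9, 13],
}

means = {name: statistics.mean(values) for name, values in datasets.items()}
# Sample standard deviation (divides by n - 1)
sds = {name: round(statistics.stdev(values), 2) for name, values in datasets.items()}

for name in datasets:
    print(name, means[name], sds[name])
```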
Example2
Find the variance and standard deviation based on the given data 35, 45,
30, 35, 40, 25
Solution; Firstly we find the mean
Next subtract the mean from each value and square it:
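These steps can be followed through in a short Python sketch. It assumes the sample formula with the n-1 divisor; the variable names are illustrative.

```python
import math

data = [35, 45, 30, 35, 40, 25]

n = len(data)
mean = sum(data) / n                                  # first find the mean
squared_devs = [(x - mean) ** 2 for x in data]        # subtract the mean, then square
variance = sum(squared_devs) / (n - 1)                # divide by n - 1 (sample variance)
sd = math.sqrt(variance)                              # back to the original units

print(mean, squared_devs, variance, round(sd, 2))
```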
Exercise
 The areas of sprayable surfaces with DDT from a sample of 15 houses
are measured as follows (in m2):
101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145
Find the variance and standard deviation of the given data set.
Example
Find the variance and the standard deviation for the frequency
distribution of the given data set below.
Class | Frequency (fi) | Midpoint (xm) | fi.xm | fi*(xm - mean)^2
5.5-10.5 | 1 | 8 | 8 | 1*(8-24.5)^2 = 272.25
10.5-15.5 | 2 | 13 | 26 | 2*(13-24.5)^2 = 264.5
15.5-20.5 | 3 | 18 | 54 | 3*(18-24.5)^2 = 126.75
20.5-25.5 | 5 | 23 | 115 | 5*(23-24.5)^2 = 11.25
25.5-30.5 | 4 | 28 | 112 | 4*(28-24.5)^2 = 49
30.5-35.5 | 3 | 33 | 99 | 3*(33-24.5)^2 = 216.75
35.5-40.5 | 2 | 38 | 76 | 2*(38-24.5)^2 = 364.5
Total | n = 20 | | 490 | 1,305
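The column totals in the table lead directly to the variance and standard deviation. A minimal Python sketch, assuming the n-1 divisor is used for grouped data as well; the variable names are illustrative.

```python
import math

midpoints = [8, 13, 18, 23, 28, 33, 38]
freqs = [1, 2, 3, 5, 4, 3, 2]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n          # 490 / 20
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
variance = sum_sq / (n - 1)                                      # assumed n - 1 divisor
sd = math.sqrt(variance)

print(n, mean, sum_sq, round(variance, 2), round(sd, 2))
```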
Properties of Variance:
 The main demerit of variance is that its unit is the square of the unit
of the original measurement values.
 The variance gives more weight to the extreme values as compared
to those which are near to mean value, because the difference is
squared in variance.
 The drawbacks of variance are overcome by the standard deviation.
SD Vs. Standard Error (SE)
 SD describes the variability among individual values in a given data set.
 SE describes the variability among sample means
obtained from one sample to another.
 We interpret the SE of the mean as follows: another similarly conducted
study may give a mean that lies within ± SE of the observed mean.
 The SD has the advantage of being expressed in the same units of
measurement as the mean.
 SD is considered to be the best measure of dispersion and is used
widely because of the properties of the theoretical normal curve.
 However, if the units of measurement of the variables in two data sets
are not the same, then their variability cannot be compared by
comparing the values of the SD.
Coefficient of variation
 When two data sets have different units of measurements, or their means
differ sufficiently in size, the CV should be used as a measure of
dispersion.
 It is the best measure to compare the variability of two series of sets of
observations.
 A series with less coefficient of variation is considered more consistent.
 $CV = \frac{S}{\bar{X}} \times 100\%$
 Example -“Cholesterol is more variable than systolic blood pressure”
Standard score (Z-scores)
 It is obtained by subtracting the mean of the data set from
the value and dividing the result by the standard deviation
of the data set.
 It tells us how many standard deviations a specific value is
above or below the mean value of the data set.
 The z-score is the number of standard deviations the data
value falls above (positive z-score) or below (negative z-
score) the mean for the data set.
 Z-score computed from the population: $Z = \frac{X - \mu}{\sigma}$
 Z-score computed from the sample: $Z = \frac{X - \bar{X}}{S}$
Example: Suppose that a student scored 66 in biostatistics and 80 in anatomy.
A summary of the scores in the two courses is given below.
Course | Average score | Standard deviation of the score
Biostatistics | 51 | 12
Anatomy | 72 | 16
In which course did the student score better compared to his classmates?
Solution:
Z-score of the student in Biostatistics: $Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$
Z-score of the student in Anatomy: $Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$
From these two standard scores, we can conclude that the
student has scored better in Biostatistics course relative to his
classmates than in Anatomy.
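The comparison can be reproduced with a small Python sketch using the course summaries from the example. The function name and dictionary layout are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

courses = {
    "Biostatistics": {"score": 66, "mean": 51, "sd": 12},
    "Anatomy":       {"score": 80, "mean": 72, "sd": 16},
}

z = {name: z_score(c["score"], c["mean"], c["sd"]) for name, c in courses.items()}
best = max(z, key=z.get)   # course with the highest relative standing

print(z, best)
```

Although the raw Anatomy score is higher, the Biostatistics score sits further above its class average in standard deviation units.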
Moments
 The rth moment about the mean (the rth central moment) is defined as
$M_r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^r}{n}$, r = 0, 1, 2, …
 For continuous grouped data
$M_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$
where the $X_i$ are the class marks.
Example: Find the first three central moments of the numbers 2, 3 and 7.
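A minimal Python sketch of the definition, applied to the numbers 2, 3 and 7 (the function name is illustrative):

```python
def central_moment(values, r):
    """r-th moment about the mean: sum of (x - mean)^r divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

data = [2, 3, 7]
m1 = central_moment(data, 1)   # always 0 for central moments
m2 = central_moment(data, 2)   # 14/3, about 4.67
m3 = central_moment(data, 3)

print(m1, round(m2, 2), m3)
```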
Measure of shape
 There are different type of measure of shape;
I. Skewness
II. Kurtosis
1. Skewness
o Measure of central tendency and variation do not reveal the
shape of frequency distribution.
o Skewness is the degree of asymmetry or departure from
symmetry of a distribution.
o A skewed frequency distribution is one that is not symmetrical.
o Skewness is concerned with the shape of the curve not size.
Concept of skewness
o The skewness of a distribution is defined as the lack of symmetry.
o In a symmetrical distribution, mean, median, and mode are equal to
each other.
• For moderately skewed distribution, the following relation holds
among the three commonly used measures of central tendency.
 Mean-Mode=3*(Mean-Median)
 There are two types of skewness, based on the shape of the distribution.
 Positively skewed: Smaller observations are more frequent than larger
observations. i.e. the majority of the observations have a value below an
average and it has a long tail in the positive direction (Mean > Median).
[Figure: distribution skewed to the right (positively skewed), with Mode, Median and Mean marked from left to right]
 Negatively (left) skewed: Smaller observations are less frequent
than larger observations. i.e. the majority of the observations have a
value above an average. i.e. Mean < Median.
[Figure: negatively skewed distribution, with Mean, Median and Mode marked from left to right]
Measures of Skewness
1. Karl Pearson's Coefficient of Skewness (SK):
$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$ or $S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
If SK = 0, then the distribution is symmetrical.
If SK > 0, then the distribution is positively skewed.
If SK < 0, then the distribution is negatively skewed.
Cont…
2. Moment Coefficient of Skewness
 The moment coefficient of skewness is based on moments. The formula
for calculating the coefficient of skewness is:
$$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$$
where $M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^r$.
If $\alpha_3 > 0$, the distribution is positively skewed.
If $\alpha_3 = 0$, the distribution is symmetric.
If $\alpha_3 < 0$, the distribution is negatively skewed.
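As an illustration, the moment coefficient of skewness can be computed directly from raw data. The sketch below is a minimal version; the function names and both data sets are hypothetical, chosen only to show the sign of the coefficient.

```python
def central_moment(values, r):
    """Return the r-th central moment: sum((x - mean)**r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def moment_skewness(values):
    """Moment coefficient of skewness: M3 / M2**(3/2)."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 10]  # hypothetical data, long right tail
symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric data

print(moment_skewness(right_tailed))   # positive value
print(moment_skewness(symmetric))      # zero
```

The symmetric data give a coefficient of zero because the positive and negative cubed deviations cancel, while the single large value 10 makes the coefficient of the first data set positive.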
2. Kurtosis
o Kurtosis is a measure of the peakedness of a distribution, measured
relative to the peakedness of a normal curve.
o The peakedness of a distribution can be classified into three types:
o Leptokurtic:
- A distribution having a relatively high peak.
- The curve is more peaked than the normal curve.
Cont…
o Mesokurtic:
- Normal peak
- The curve is moderately peaked, like the normal curve
o Platykurtic:
- Flat-topped
- The observations are spread widely across the middle interval, each with
relatively low frequency, so the curve is flatter than the normal curve.
Measures of kurtosis
 The moment coefficient of kurtosis, $\beta_2$:
$$\beta_2 = \frac{M_4}{M_2^2}$$
where $M_2$ and $M_4$ are central moments.
 If $\beta_2 = 3$, then the distribution is Mesokurtic.
 If $\beta_2 > 3$, then the distribution is Leptokurtic.
 If $\beta_2 < 3$, then the distribution is Platykurtic.
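The classification rule can be sketched in Python as follows. The function names, the tolerance used to decide whether the coefficient equals 3, and the small data set are all assumptions made for illustration.

```python
import math


def central_moment(values, r):
    """Return the r-th central moment: sum((x - mean)**r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def moment_kurtosis(values):
    """Moment coefficient of kurtosis: M4 / M2**2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def classify_kurtosis(beta2, tol=1e-9):
    """Compare the coefficient with 3, the value for a normal curve."""
    if math.isclose(beta2, 3.0, abs_tol=tol):
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


data = [2, 3, 7]  # hypothetical small data set
beta2 = moment_kurtosis(data)
label = classify_kurtosis(beta2)
print(beta2, label)
```

For these three numbers M2 = 14/3 and M4 = 98/3, so the coefficient is 1.5, which is below 3 and therefore platykurtic. With real data the coefficient is rarely exactly 3, so in practice values close to 3 are usually treated as approximately mesokurtic.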
Example:
Based on the following data:
𝑀0 = 1, 𝑀1 = -0.6, 𝑀2 = 1.6, 𝑀3 = -2.4, 𝑀4 = 5.8
a) Find the coefficient of skewness and discuss the distribution type.
b) Find the coefficient of kurtosis and discuss the distribution type.
Solution
a) $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{-2.4}{1.6^{3/2}} \approx -1.19 < 0$, so the distribution is negatively
skewed.
b) $\beta_2 = \frac{M_4}{M_2^2} = \frac{5.8}{1.6^2} \approx 2.27 < 3$, so the curve is Platykurtic.
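The arithmetic in this example can be checked with a few lines of Python. This is only a quick sketch that plugs the given moments into the two formulas; the variable names are illustrative.

```python
# Moments given in the example
m2, m3, m4 = 1.6, -2.4, 5.8

alpha3 = m3 / m2 ** 1.5   # coefficient of skewness: M3 / M2**(3/2)
beta2 = m4 / m2 ** 2      # coefficient of kurtosis: M4 / M2**2

print(round(alpha3, 2), round(beta2, 2))
```

Rounded to two decimals, the skewness coefficient is -1.19 and the kurtosis coefficient is 2.27, which agrees with the conclusions of a negatively skewed, platykurtic distribution.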

1. intro_biostatistics.pptx

  • 1. Debre Tabor University is new and different 7/4/2023 Asaye.A 1 Debre Tabor University College of Heath Science Social and Public Health Biostatistics Course for Health Science Students Debre Tabor, Ethiopia
  • 2. Contact detail 7/4/2023 Asaye.A 2  Asaye Alamneh (Lecturer of Biostatistics at DTU)  Debre Tabor University  College of Health Science  Department: Social and Public Health  Qualifications:  BSc in Statistics, MPH in Biostatistics  Contacts:  Email: asaye2127stat@gmail.com  Location: Debre Tabor University
  • 4. Outlines of presentation 7/4/2023 Asaye.A 4  Definition of statistics and biostatistics  Basic statistical concepts  Classification of statistics  Types of variables  Application and limitation of biostatistics
  • 5. Objectives 7/4/2023 Asaye.A 5  After completing this chapter, the student will be able to:  Define Statistics and Biostatistics  List some basic terms  Define and identify the different types of data and understand why we need to classify variables  Describe the importance and limitations of statistics  Identify source of data
  • 6. Definition of statistics 7/4/2023 Asaye.A 6  The word statistics come from the Latin “status” which refers to political state or government.  Statistics can be defined in two ways:- plural sense and singular sense. 1. Plural sense: are the aggregate of facts and figures, which are expressed in numerical form.  For example: Statistics on industrial production, Population growth in the country in different years, etc.
  • 7. Definition of statistics 7/4/2023 Asaye.A 7 2. Singular sense: Statistics refers to the science of collection, organization, presentation, analysis, and interpretation of numerical of data.  It is useful to make data simple and easy to be understood by entire population.  Help us to use numbers to communicate ideas.  For example: if we want to have a study about the distribution of weights of the health science students in DTU.
  • 8. Biostatistics 7/4/2023 Asaye.A 8 Biostatistics: application of statistical methods to medical, biological and public health related problems.  When the data being analyzed are derived from the biological science and medicine, we use the term biostatistics to manage medical uncertainties.
  • 9. Biostatistics…. 7/4/2023 Asaye.A 9  It is the scientific treatment given to the medical data derived from group of individuals or patients.  Collection of data.  Presentation of the collected data.  Analysis and interpretation of the results.  Making decisions on the basis of such analysis.
  • 10. Types of biostatistics 7/4/2023 Asaye.A 10 Based on how the data can be used, biostatistics can be classified in to two main categories . 1. Descriptive statistics:  Ways of collecting, organizing, summarizing, and presenting data at hand into concise manner to get an impression of the data.  Use to organize and describe the sample/population to simplify large amount of data in sensible ways .  It also show the final results in the form of table and graph.
  • 11. Types…….. 7/4/2023 Asaye.A 11 2. Inferential statistics: are methods for using sample data to make general conclusions (inferences) about populations.  Making conclusions for the population that is beyond available data. For example:  Probability distribution,  Estimation,  Confidence interval,  Hypothesis testing,  Regression analysis, etc.
  • 12. Type of Biostatistics 7/4/2023 Asaye.A 12 Collection Organizing Summarizing Presenting of data Descriptive Statistics Making inferences Hypothesis testing Determining relationship Making the prediction Inferential Statistics Biostatistics
  • 13. Stages of statistical investigation 7/4/2023 Asaye.A 13 A) Collection of data: measuring or gathering numerical data. B) Organization of data: organizing and classifying the collected data. C) Presentation of data: overview of the data in form of tables, graphs and charts. D) Analysis of data: extracting relevant information from the summarized data E) Interpretation of data: making generalization to the target population.
  • 14. Definitions of some basic terms 7/4/2023 Asaye.A 14 Population: A large group possessing a given characteristic or set of characteristics . A population may be finite or infinite Parameter: characteristics obtained from the population or a single measurement of population value. Example: population mean (μ),population standard deviation (δ) Statistic: characteristics obtained from the samples Example: sample mean , mode , median SD, Variance etc
  • 15. Cont…. 7/4/2023 Asaye.A 15 Sampling: The technique of sample selection from the entire population Sample: A subset of the population selecting by same sampling techniques Census: Complete enumeration of the population
  • 16. Cont…. 7/4/2023 Asaye.A 16  Data: is raw, unorganized facts that need to be processed.  When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.  Figure1: Relation between data and information
  • 18. Variable 7/4/2023 Asaye.A 18  Variable- a characteristic which take different values in different persons, places or things or any aspect of population unit that is measured or recorded.  e.g. height, weight, marital status, etc.  Random variables: are variables whose value are determined by chance.  Data: are sets of values of one or more variables.  Are numbers which can be measurements or can be obtained by counting.  Data set: it is a collection of observation on a variable.
  • 20. Types of Variables (1) 7/4/2023 Asaye.A 20  Depending on the characteristic of the measurement, variable can be classified into two types. 1. Quantitative (numerical variable): it is one that can be measured and expressed quantitatively or numerically.  It is the result of measuring or counting attributes population.  Quantitative variables are also subdivided into two types:-  A. Discrete variable  B. Continuous variable:
  • 21. Types ….. 7/4/2023 Asaye.A 21 A. Discrete variable:  A variable whose values are countable and assign a whole number.  There is no decimal number.  E.g. the number of daily admission of hospital, number of live births per 1000 women, number of motor vehicle accident in Debre Tabor town.
  • 22. Types ….. 7/4/2023 Asaye.A 22 B. Continuous variable: the one that does not have gaps or interruption.  A variable that can assume any decimal number value over a certain intervals. For example;  Serum cholesterol level of a patient,  Weight,  Age,  Laboratory result,  Time, Arm circumference
  • 23. Types ….. 7/4/2023 Asaye.A 23 2. Qualitative (categorical) variable: it can not be quantified or measured numerically, but measured by assigning names to items (events). E.g. sex, marital status, race or ethnic group, occupational status, eye color etc. A. Nominal variables: variables with no inherent order or ranking sequence. B. Ordinal variables: variables with an ordered series .
  • 24. Types of Variables (2) 7/4/2023 Asaye.A 24  Dependent variable: the outcome of interest, which should change in response to some intervention.  Some times called as out come or response variable.  Independent variable: is the intervention, or what is being manipulated.  a variable that you believe might influence your outcome measure.  An independent variable is a hypothesized cause or influence on a dependent variable.
  • 25. Type of scales of measurement 7/4/2023 Asaye.A 25  Based on the nature of the variable, variables can be measured into four d/t levels of measurement.  Measurement is defined as the assignment of numbers, symbols and/or names to objects or events.
  • 26. Type of scales…. 7/4/2023 Asaye.A 26  Each scale of measurement has certain properties which in turn determine the appropriateness for use of certain statistical analyses.  The property of value assigned to data based on the three properties of measurement such as, order, distance and fixed zero/true zero.  The four scales/levels of measurement are nominal, ordinal, interval and ratio.
  • 27. 1. Nominal scale 7/4/2023 Asaye.A 27  It is the lowest level of measurement.  It simply consists of "naming" or classifying them into various mutually exclusive, all inclusive categories in which no order or ranking can be imposed on the data.  When numbers are assigned to categories, it only used for coding purposes and it does not provide a sense of size.
  • 28. Cont… 7/4/2023 Asaye.A 28  No arithmetic and relational operation can be applied.  Nominal measurements have no three properties among values. For example;  Sex of a person (M, F),  Eye color (e.g. brown, blue),  Religion (Muslim, Christian),  Place of residence (urban, rural),  Race (e.g. black, white).
  • 29. 2. Ordinal scale 7/4/2023 Asaye.A 29  Level of measurement which classifies data into categories that can be ranked. Differences between the ranks do not exist.  Relational operations of greater than, less than are applicable,  The real difference between the ranks do not exist.
  • 30. Cont… 7/4/2023 Asaye.A 30 Example;-  Socio-economic status (very low, low, medium, high, very high)  Patient status (unimproved, improved, much improved),  Height of patients (very short, short, tall, very tall),  Blood pressure (very low, low, high, very high),  Job satisfaction level (highly dissatisfied, dissatisfied, satisfied, highly satisfied), etc
  • 31. 3. Interval Scale 7/4/2023 Asaye.A 31  It is possible to rank or order and tell the real distance between any two measurements.  However, there is no meaningful zero, so ratios are meaningless.  All arithmetic operations except division are applicable.  Relational operations are also possible.
  • 32. Cont… 7/4/2023 Asaye.A 32  The selected zero point is not necessarily a true zero in which it doesn't have to indicate a total absence of the quantity being measured.  Not that, zero degree Celsius is arbitrary so it does not make sense to say that 20 degree Celsius is twice hot as 10 degree Celsius. Examples:  Body temperature in OF or OC, time of the day, days of the year, test score, IQ…
  • 33. 4. Ratio scale 7/4/2023 Asaye.A 33  Is the highest level of measurement.  It classifies data that can be ranked, differences are meaningful, and there is a true zero. True ratios exist between the different units of measure.  There is always a true zero point, which shows the absence of condition.  All arithmetic and relational operations are applicable. Example: volume, height, weight, length, number of items, etc.
  • 35. Summary of levels of measurement No No No Yes Nominal No No Yes Yes Ordinal No Yes Yes Yes Interval Yes Yes Yes Yes Ratio Determine if one data value is a multiple of another Subtract data values Arrange data in order Put data in categories Level of measurement
  • 36. Summary of levels of measurement
  • 37. Why we need Biostatistics? 7/4/2023 Asaye.A 37  The main theory of statistics lies in the term variability.  We can also have instrumental variability and observers variability. 1. Handling variation. 1. Biological variation: variation among individuals as well as within individuals over time. For example; height, weight, blood pressure,…. 2. Sample variation: biomedical research project are usually carried out on small numbers of study subjects.
  • 38. Why we need Biostatistics? 7/4/2023 Asaye.A 38 2. Essential for scientific statistical methods of investigation.  Formulate hypothesis.  Design study to objectively to test hypothesis.  Collect reliable and unbiased data.  Process and evaluate data rigorously.  Interpretate and making appropriate conclusion.  These statistical methods are designed to contribute to the process of making scientific judgment in the face of uncertainties and variation.
  • 39. Why we need Biostatistics? 7/4/2023 Asaye.A 39  It helps the researcher to arrive at a scientific judgment about a hypothesis.  It study the association between two or more attributes  To evaluate the efficacy of drugs  To determine the success or failure of health care program  To define and measure the extent of the disease  Statistical methods help us to understand public health issues and disease, also quantifying uncertainties present in basic medical sciences.
  • 40. Limitations of statistics 7/4/2023 Asaye.A 40  It deals with only those subjects of investigation that are capable of being quantitatively measured and numerically expressed.  It deals on only aggregates of facts and no importance to individual items.  Statistical data are only approximately and not mathematically correct.  Statistics can be easily misused and therefore should be used be experts.
  • 41. Sources of data 7/4/2023 Asaye.A 41  There are two basic sources of statistical data: These are: 1. Primary data: The first hand data were collected from the items or individual respondents directly by researcher primarily for the purpose of certain study.
  • 42. Primary data… 7/4/2023 Asaye.A 42  The major primary sources of data are :-  Surveys,  Surveillance,  Census,  Observation and,  Experimental studies.
  • 43. Secondary data 7/4/2023 Asaye.A 43 2. Secondary data: which had been collected by certain people or agency, and statistically treated and the information contained in it is used for other purpose.  For example: hospital records, magazines, CSA, DHS, and vital statistics:  Birth reports,  Death reports,  Epidemic reports  Reports of laboratory utilization (including laboratory test results)
  • 44. Exercises 7/4/2023 Asaye.A 44 For each of the following variable indicate whether it is quantitative or qualitative and specify the measurement scale for each variable : 1. Blood Pressure (mmHg) 2. Cholesterol (mmol/l) 3. Diabetes (Yes/No) 4. Body Mass Index (Kg/m2) 5. Age (years) 6. Employment (paid work/retired/housewife) 7. Smoking Status (smokers/non-smokers, ex-smokers) 8. Exercise (hours per week) 9. Drink alcohol (units per week) 10. Level of pain (mild/moderate/severe)
  • 45. Methods of data collection and presentation
  • 46. Methods of data collection and presentation….
      At the end of this chapter, you should be able to:
      - Understand methods of data collection
      - Identify methods of data presentation:
          - Tabular presentation
          - Diagrammatic presentation
          - Graphic presentation
  • 47. 1. Methods of data collection
      - Data collection techniques allow us to systematically collect data about our objects of study (people, objects, and phenomena) and about the setting in which they occur.
      - Data can be obtained in a variety of ways. One of the most common is through the use of surveys.
      - Surveys can be carried out using a variety of data collection methods.
  • 48. Methods…
      There are various methods of data collection:
      1. Observation
      2. Interviews
      3. Questionnaires
      4. Extraction of data from records
  • 49. Methods……
      1. Observation: a technique that involves systematically selecting, watching and recording the behavior of people or other phenomena in order to obtain specified information.
      - It includes all methods, from simple visual observation to the use of high-level machines and measurements and sophisticated equipment or facilities.
  • 50. Methods……
      - Advantage: it gives relatively more accurate data on behavior and activities.
      - Disadvantages:
          - Investigator's (observer's) bias
          - It requires more resources
          - It requires skilled human power when high-level machines are used
  • 51. Methods……
      2. Interview: a commonly used technique for collecting research data.
      - Face-to-face interview
      - Telephone interview
      - The interviewee is the respondent; the interviewer is the person asking the questions.
  • 52. Methods…
      I. Direct personal interview
      - The investigator presents himself/herself personally before the informant and questions him/her personally.
      - Best suited to situations where the problem is not completely understood, where questions cannot be formulated beforehand, and where one question leads to another.
      Disadvantages
      - It is time consuming.
      - It is not suited to a large group of informants.
  • 53. Methods…
      II. Interviewing using a questionnaire
      - A detailed questionnaire is drafted.
      - The investigator appoints agents, known as enumerators, who go to the respondents personally with the questionnaire, ask them the questions it contains, and record their replies.
      - These interviews can be:
          - Face-to-face, or
          - Telephone interviews.
  • 54. Methods…
      Face-to-face interviews
      Advantages
      - The interviewer knows exactly who is responding to the questionnaire.
      - The interviewer can help the respondent if he/she has difficulty in understanding the questions (e.g. language, concentration).
      - There is more flexibility in presenting the items; they can range from closed to open.
      - Observations can be made as well.
  • 55. Methods…
      Disadvantages
      - An untrained interviewer may distort the meaning of questions.
      - Attributes of the interviewer may affect the responses given, due to the interviewer's bias and his/her social or ethnic characteristics.
      - Higher cost in terms of time and money (training and salary of interviewers).
  • 56. Methods…
      Telephone interviews
      Advantages
      - Less expensive in time and money than face-to-face interviews.
      - The interviewer is able to help the respondent if he/she doesn't understand the question.
      - Broad representative samples can be obtained among those who have telephone lines.
      - May ensure uniformity if the interviewer is the same.
  • 57. Methods…
      Disadvantages
      - Under-representation of groups that do not have a telephone.
      - The respondent may be substituted by another person.
      - Problems with questions that have multiple answer options and with complicated questions.
  • 58. 3. Questionnaires
      - Self-administered questionnaires: the respondents read the questions and fill in the answers by themselves.
      - Advantages
          - Simpler and cheaper.
          - Can be administered to many persons simultaneously (e.g. to a class of students).
      - Disadvantage
          - They demand a certain level of education and skill on the part of the respondents.
  • 59. Methods…
      Postal questionnaire
      - The questionnaires are sent by post to the informants together with a polite covering letter that:
          - Gives the detailed information,
          - Explains the aims and objectives of collecting the information, and
          - Requests the respondents to cooperate by furnishing correct replies and returning the questionnaire duly filled in.
      - The return postage expenses are usually covered by the investigator.
  • 60. Methods…
      The main problems with postal questionnaires are:
      - Response rates tend to be relatively low, and
      - There may be under-representation of less literate subjects.
  • 61. Methods…
      Mailed questionnaire
      - The questionnaire is mailed to respondents to be filled in.
      - Sometimes known as self-enumeration.
      Advantages
      - Cheap.
      - No need for trained interviewers.
      - No interviewer bias.
      - Can be coordinated from one central location.
  • 62. Methods…
      Disadvantages
      - Low response rate.
      - Incomplete questionnaires due to omissions or invalid responses.
      - No assurance that the questionnaire was answered by the right person.
      - Needs intense follow-up to get a high response rate.
  • 63. Methods……
      4. Extraction of data from records
      - Clinical and other personal records, death certificates, published mortality statistics, census publications, etc.
      Examples:
      1. Official publications of the Central Statistical Authority
      2. Publications of the Ministry of Health and other ministries
      3. Newspapers and journals
  • 64. Methods…
      4. International publications, such as those of WHO, the World Bank and UNICEF
      5. Records of hospitals or any health institutions
      - Data from documents take less time to collect and cost relatively little, but take care over the quality and completeness of the data.
  • 65. Problems in gathering data
      Common problems might include:
      - Language barriers
      - Lack of adequate time
      - Expense
      - Inadequately trained and experienced staff
      - Bias
      - Cultural norms
  • 66. Choosing method of data collection
      - To choose a better data collection method, we have to focus on the relevance, timeliness, accuracy and usability of the information.
      - Some methods pay attention to timeliness and reduction in cost.
      - Others pay attention to accuracy and to the strength of the method in using scientific approaches.
  • 67. Cont…
      The selection of the method of data collection is also based on practical considerations, such as:
      - The need for personnel, skills, equipment, etc. in relation to what is available.
      - The acceptability of the procedures to the subjects.
      - The probability that the method will provide good coverage, i.e. will supply the required information about all or almost all members of the population.
  • 68. Types of questions
      - Before looking at the steps in questionnaire design, we need to review the types of questions.
      - There are two types of questions:
          1. Open ended (free response)
          2. Close ended (restricted choice)
      1. Open ended, e.g. "In your opinion, what is the biggest barrier in getting patients to your hospital's ANC unit?"
  • 69. Types……
      - Advantages:
          - It stimulates the respondent's free thoughts.
          - Helpful for obtaining information on sensitive issues.
      - Disadvantages:
          - There may be a problem of recalling answers.
          - It is not suitable for mailed questionnaires.
          - Answers are difficult to code for statistical analysis.
          - The problem of poor handwriting.
  • 70. Types……
      2. Close ended: provides fixed answers, e.g. "Including your present visit, how many times did you visit this hospital in the past two years?"
          A. Once  B. Twice  C. 3 times  D. 4 times  E. More than 4 times
      - Advantages:
          - Suitable for many forms of statistical analysis.
          - Not difficult to code.
      - Disadvantage: it limits the variety of detail.
  • 71. Types……
      Partially open-ended question
      - Advantages:
          - It provides an alternative if certain options are overlooked.
          - It identifies missing categories for future use.
      - Disadvantage: the respondent may ignore the other options.
      e.g. "If the household lost any of its members due to death in the last 12 months, what was the cause of death?"
          1. Malaria  2. Famine/hunger  3. Car accident  4. Others (specify)
  • 72. Requirements of questions
      - Must have face validity: the question we design should give an obviously valid and relevant measurement of the variable.
      - Must be clear and unambiguous: each question should contain only one idea, and all respondents should understand it in the same way.
      - Must not be offensive (avoid questions that may offend the respondent).
  • 73. Cont…
      - The questions should be fair (should not be loaded).
      - Sensitive questions: it may not be possible to avoid asking "sensitive" questions that may offend respondents.
      - In such situations the interviewer (questioner) should ask them very carefully and wisely.
  • 74. Cont…
      - Start with an interesting but non-controversial question (preferably open) that is directly related to the subject of the study.
      - Pose the more sensitive questions as late as possible in the interview.
      - Use simple language.
      - Make the questionnaire as short as possible.
  • 75. What to consider before designing a questioning tool
      - What exactly do we want to know, according to the objectives and variables we identified earlier?
      - Of whom will we ask questions, and what techniques will we use?
      - Are our informants mainly literate or illiterate?
      - How large is the sample that will be interviewed?
  • 77. Types of closed format
      Choice of categories
      - Q. What is your marital status?
          - Single
          - Married
          - Divorced
          - Widowed
      Likert-style scale
      - Q. Biostatistics is an interesting subject.
          - Strongly disagree
          - Disagree
          - Undecided
          - Agree
          - Strongly agree
  • 78. Cont…
      Checklists
      - Circle the public health specialties you are particularly interested in:
          - Epidemiology and Biostatistics
          - Reproductive health
          - Nutrition
          - Health informatics
          - Health service management
          - General
  • 79. Cont…
      Ranking
      - Please rank your interest in the following specialties (1 = most interesting, 4 = least interesting):
          - Epidemiology and Biostatistics
          - Reproductive health
          - Nutrition
          - Health informatics
  • 80. 2. Methods of data organization and presentation
      - The most convenient method of organizing data is to construct a frequency distribution.
      - A frequency distribution is the organization of raw data in table form, using classes and frequencies.
      - Frequency distribution table: lists categories of scores along with their corresponding frequencies.
      - For this purpose, different techniques of data organization and presentation, such as ordered arrays, tables and diagrams, are used.
  • 81. Array (ordered array)
      - A serial arrangement of numerical data in ascending or descending order.
      - A simple arrangement of individual observations in order of magnitude.
      - It enables us to know the range over which the items are spread and gives an idea of their general distribution.
      - It is an appropriate way of presenting data when the data set is small (usually fewer than 20 observations).
  • 82. Frequency Distribution (F.D.)
      A frequency distribution is the organization of the values of a variable, arranged in order of magnitude either individually (for a discrete variable), into classes (for a continuous variable), or into categories (for qualitative data), along with their frequencies.
  • 83. Frequency Distribution (F.D.)…
      A frequency distribution has two main parts, namely:
      i. The values of the variable (if quantitative) or the categories (if qualitative), and
      ii. The number of observations (frequency) corresponding to the values or categories.
  • 84. Frequency Distribution (F.D.)…
      There are two types of frequency distributions:
      i. Categorical (or qualitative)
      ii. Numerical (or quantitative)
      1. Categorical Frequency Distribution
      - Data are classified according to non-numerical categories.
      - Categories must be mutually exclusive and exhaustive.
      - Used to organize nominal and ordinal data.
  • 85. Cont…
      a) Nominal data: here the construction is straightforward: count the occurrences in each category and find the totals.
      Example: The marital status of 60 adults, classified as single, married, divorced and widowed, is presented in a FD as below:
      Marital status | Frequency
      Single         | 25
      Married        | 20
      Divorced       | 8
      Widowed        | 7
      Total          | 60
  • 86. Cont…
      b) Ordinal data: the construction is identical to the nominal case. However, the categories should be put in an ordered manner.
      Example: Satisfaction with the teaching method in a class of size 60 is presented in a FD as shown below
  • 87. Numerical F.D
      2. Numerical Frequency Distribution
      - Data are classified according to numerical size.
      - Used to organize interval and ratio data.
      - May be discrete or continuous, depending on whether the variable is discrete or continuous.
  • 88. Numerical F.D…
      a) Discrete (Ungrouped) Frequency Distribution
      - Count the number of times each possible value is repeated.
      Example: In a survey of 30 families, the number of children per family was recorded, giving the following data:
      4 2 4 3 2 8 3 4 4 2 2 8 5 3 4 5 4 5 4 3 5 2 7 3 3 6 7 3 8 4
      The distribution of children in the 30 families would be:
      No. of children     | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total
      No. of families (f) | 5 | 7 | 8 | 4 | 1 | 2 | 3 | 30
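The counting step for a discrete frequency distribution can be sketched in a few lines of Python. This is only an illustration using the 30 family sizes from the example; the variable names `children` and `freq` are chosen here and are not part of the original slides.

```python
from collections import Counter

# Number of children per family, copied from the survey of 30 families above.
children = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2,
            2, 8, 5, 3, 4, 5, 4, 5, 4, 3,
            5, 2, 7, 3, 3, 6, 7, 3, 8, 4]

# Counter tallies how many times each distinct value occurs.
freq = Counter(children)

print("No. of children | No. of families (f)")
for value in sorted(freq):
    print(f"{value:>15} | {freq[value]}")
print(f"{'Total':>15} | {sum(freq.values())}")
```

Sorting the distinct values before printing keeps the table in order of magnitude, as the definition of a frequency distribution requires.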
  • 89. Continuous grouped F.D
      b) Continuous/grouped Frequency Distribution
      - Arises from continuous variables/data.
      - Unlike in a discrete FD, a class cannot be allocated to each value of a continuous variable.
      - The categories into which the observations are distributed are called classes or class intervals.
      - Classes should be exhaustive and mutually exclusive.
  • 90. Example
      Time spent | Frequency
      10-14      | 8
      15-19      | 28
      20-24      | 27
      25-29      | 12
      30-34      | 4
      35-39      | 1
  • 91. Steps in constructing continuous frequency distribution
      1. Determine the number of classes (k), i.e. the number of class intervals into which the data are grouped.
      - Decide "k" with the help of Sturges' rule: k = 1 + 3.322 log(n), rounded up or down to the nearest integer, where n = number of observations and log = common logarithm (base 10).
  • 92. Cont…
      - Example: if n = 10, k = 4.32 ≈ 4; if n = 100, k = 7.644 ≈ 8; if n = 1000, k = 10.96 ≈ 11.
      2. Determine the class width (W): the difference between the lower (or upper) boundaries of two consecutive classes (class limits may be used instead).
      - We can use W = Range / k.
      - Note that "W" is rounded up or down to the nearest integer.
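As a rough sketch, Sturges' rule and the class width can be computed as follows. The function names `sturges_classes` and `class_width` are illustrative, not from the slides. Rounding the width up (rather than to the nearest integer) is an assumption made here so that k classes are sure to cover the whole range.

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))

def class_width(lowest, highest, k):
    """Class width W = range / k, rounded up (an assumption of this sketch)."""
    return math.ceil((highest - lowest) / k)

for n in (10, 100, 1000):
    print(n, sturges_classes(n))

# Width for a data set with 50 observations ranging from 42 to 88.
k = sturges_classes(50)
print(k, class_width(42, 88, k))
```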
  • 93. Cont…
      3. Determine the class limits
      - Class limits separate one class from another; there is a gap between the upper limit of one class and the lower limit of the next class.
      - The lower class limit of the first class should be the smallest observation.
      - Add the class width to a lower class limit to obtain the lower class limit of the next class.
      - Unit of measure (U): the smallest possible difference between successive values or measures, e.g. 1, 0.1, 0.01, 0.001, …
  • 94. Cont…
      - To find the upper limit of the first class, subtract U from the lower limit of the second class.
      - Then continue to add the class width to this upper limit to find the rest of the upper limits, or
      - Obtain the upper class limits by adding the class width minus U to the corresponding lower class limits, i.e. UCL = LCL + (W - U); when U = 1 this becomes UCL = LCL + (W - 1).
  • 95. Cont…
      4. Determine the class boundaries
      - Class boundaries make the intervals of a continuous variable continuous in both directions, so that no gap exists between classes.
      - Let U = LCL of the second class - UCL of the preceding class.
      - Add half of this difference (U/2) to all upper class limits to get the upper class boundaries (UCBs), and subtract U/2 from all lower class limits to get the lower class boundaries (LCBs):
          - UCBi = UCLi + U/2
          - LCBi = LCLi - U/2
  • 96. Cont…
      5. Class mark (C.M.) or midpoint: the average of the lower and upper class limits, or the average of the lower and upper class boundaries.
      6. Determine the frequency of each class: obtained simply by counting the number of observations belonging to each class.
      7. Cumulative frequency: the number of observations less than, or more than or equal to, a specific value.
      8. Cumulative frequency above (greater than type): the total frequency of all values greater than or equal to the lower class boundary of a given class.
  • 97. Cont…
      9. Cumulative frequency below (less than type): the total frequency of all values less than or equal to the upper class boundary of a given class.
      10. Relative frequency (rf): the frequency divided by the total frequency.
      11. Relative cumulative frequency (rcf): the cumulative frequency divided by the total frequency.
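To make the definitions of relative frequency and relative cumulative frequency concrete, here is a minimal sketch that applies them to the class frequencies of the earlier "Time spent" example (8, 28, 27, 12, 4, 1). The variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies from the "Time spent" example (classes 10-14 ... 35-39).
frequencies = [8, 28, 27, 12, 4, 1]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total frequency.
relative = [f / total for f in frequencies]

# "Less than" cumulative frequency, then relative cumulative frequency.
cumulative = list(accumulate(frequencies))
relative_cumulative = [cf / total for cf in cumulative]

print("  f      rf   cf     rcf")
for f, rf, cf, rcf in zip(frequencies, relative, cumulative, relative_cumulative):
    print(f"{f:>3}  {rf:.4f}  {cf:>3}  {rcf:.4f}")
```

The relative frequencies sum to 1, and the last relative cumulative frequency is always 1, which is a quick check on the arithmetic.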
  • 98. Cont…
      Example: The blood glucose level for 50 patients is shown below. Construct a frequency distribution for the following data.
  • 99. Cont…
      Solution:
      Step 1: Find the highest and the lowest values: H = 88, L = 42.
      Step 2: Find the range: R = H - L = 88 - 42 = 46.
      Step 3: Select the number of classes using Sturges' formula: k = 1 + 3.322 log(50) = 6.64 ≈ 7 (rounding up).
      Step 4: Find the class width: W = R/k = 46/7 = 6.57 ≈ 7 (rounding up).
  • 100. Cont…
      Step 5: Select the starting observation as the lowest class limit (this is usually the lowest observation).
      - Add the class width to that observation to get the lower limit of the next class.
      - Keep adding until there are 7 classes: 42, 49, 56, 63, 70, 77, 84 are the lower class limits.
      Step 6: Find the upper class limits; e.g. the first upper class limit = 49 - U = 49 - 1 = 48. Adding the class width repeatedly gives the rest, so the upper class limits are 48, 55, 62, 69, 76, 83, 90.
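Steps 5 and 6 can be reproduced with a short loop. The sketch below assumes the values from the worked example (lowest observation 42, class width 7, 7 classes, unit of measure U = 1); the function name `class_limits` is hypothetical.

```python
def class_limits(lowest, width, k, unit=1):
    """Return a list of (lower limit, upper limit) pairs for k classes."""
    limits = []
    for i in range(k):
        lower = lowest + i * width      # add the class width for each new class
        upper = lower + width - unit    # next class's lower limit minus U
        limits.append((lower, upper))
    return limits

limits = class_limits(lowest=42, width=7, k=7)
for lower, upper in limits:
    print(f"{lower} - {upper}")
```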
  • 101. Cont…
      - Combining step 5 and step 6, one can construct the following classes.
      Step 7: Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit.
  • 102. Cont…
      Example: For class 1: LCBi = LCLi - U/2 = 42 - 0.5 = 41.5 and UCBi = UCLi + U/2 = 48 + 0.5 = 48.5.
      - Then continue adding W to both boundaries to obtain the remaining boundaries.
      - By doing so one can obtain the following classes.
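The boundary and class-mark calculations for this example can be sketched as follows, starting from the class limits 42-48, 49-55, ..., 84-90 obtained in steps 5 and 6. The variable names are illustrative.

```python
# Class limits from the blood glucose example.
limits = [(42, 48), (49, 55), (56, 62), (63, 69), (70, 76), (77, 83), (84, 90)]

# U = lower limit of the second class minus upper limit of the first class.
unit = limits[1][0] - limits[0][1]
half = unit / 2

# LCB = LCL - U/2 and UCB = UCL + U/2 for every class.
boundaries = [(lcl - half, ucl + half) for lcl, ucl in limits]

# Class mark = average of the lower and upper class limits.
class_marks = [(lcl + ucl) / 2 for lcl, ucl in limits]

for (lcb, ucb), mark in zip(boundaries, class_marks):
    print(f"{lcb} - {ucb}  class mark {mark}")
```

Each upper boundary equals the next class's lower boundary, which is what removes the gaps between classes.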
  • 103. Cont…
      Step 8: Find the frequencies.
      Step 9: Find the cumulative frequency.
      Step 10: Find the relative frequency and/or relative cumulative frequency.
  • 105. Example:
      Construct a continuous FD for the following raw data of ages of patients admitted at DTU hospital in a given week.
  • 108. Cont…
      - The class marks and class boundaries of the above example are:
  • 109. Continuous/grouped F.D …
      Cumulative frequency distributions
      - A cumulative frequency distribution tells us how often the values fall below or above a given class.
      There are two types of CFD:
      The "less than" cumulative F.D.
      - Obtained by adding the frequencies of all the preceding classes to the frequency of that class.
      The "more than" cumulative F.D.
      - Obtained by adding the frequencies of all the succeeding classes to the frequency of that class.
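Both cumulative distributions can be built from the class frequencies alone. The sketch below uses the frequencies of the earlier "Time spent" example as input; the function name `cumulative_frequencies` is hypothetical.

```python
def cumulative_frequencies(frequencies):
    """Return the 'less than' and 'more than' cumulative frequencies."""
    less_than = []
    running = 0
    for f in frequencies:
        running += f                 # preceding classes plus this class
        less_than.append(running)

    more_than = []
    remaining = running              # total frequency
    for f in frequencies:
        more_than.append(remaining)  # this class plus succeeding classes
        remaining -= f
    return less_than, more_than

less_than, more_than = cumulative_frequencies([8, 28, 27, 12, 4, 1])
print(less_than)
print(more_than)
```

The last "less than" value and the first "more than" value both equal the total number of observations.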
  • 110. Cont…
      - For the data in the above example, both cumulative frequency distributions are given below:
  • 111. Following the rules for grouping data
      - The groups must not overlap, so that there is no confusion about which group a measurement belongs to.
      - There must be continuity from one group to the next; otherwise some measurements may not fit in any group.
      - The groups must range from the lowest measurement to the highest measurement.
      - The groups should normally be of equal width.
  • 112. Methods of data presentation
      Commonly, there are two ways of presenting statistical data:
      1. Statistical tables
      2. Graphs/Diagrams
  • 113. 1. Tabulation methods of data presentations
      1. Statistical tables
      - A statistical table is an orderly and systematic presentation of data in rows and columns.
          - Rows are horizontal arrangements.
          - Columns are vertical arrangements.
      - Using tables to organize data involves grouping the data into mutually exclusive categories of the variables and counting the number of occurrences (frequency) in each category.
  • 114. Cont….
      - Depending on the purpose for which the table is designed and the complexity of the relationship, a table can be either a simple frequency table or a cross tabulation.
      - A simple frequency table is used when the individual observations involve only a single variable.
      - A cross tabulation is used to obtain the frequency distribution of one variable by another variable.
  • 115. General principles to construct tables
      1. Tables should be as simple as possible.
      2. Tables should be self-explanatory:
          - The title should be clear and placed above the table. A good title answers: what? when? where? how classified?
          - Each row and column should be labeled.
          - Numerical entries of zero should be written explicitly rather than indicated by a dash. Dashes are reserved for missing or unobserved data.
          - Totals should be shown either in the top row and the first column or in the last row and the last column.
      3. If the data are not original, their source should be given in a footnote.
  • 116. A) Simple or one-way table
      Simple frequency table: the most basic table is a simple frequency distribution with one variable.
      Example:
      Table. Blood group of voluntary blood donors examined in Red Cross blood bank within a day, May 2006 (n=548)
      Blood group | Number of students | Percent
      A           | 240                | 43.8
      B           | 146                | 26.6
      AB          | 57                 | 10.4
      O           | 105                | 19.2
      Total       | 548                | 100
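The percent column of a simple frequency table is each count divided by the total, times 100. A minimal sketch using the blood group counts from the table (the dictionary name `counts` is illustrative):

```python
# Counts per blood group, taken from the table above.
counts = {"A": 240, "B": 146, "AB": 57, "O": 105}
total = sum(counts.values())

# Percent = count / total * 100, rounded to one decimal place as in the table.
percents = {group: round(n / total * 100, 1) for group, n in counts.items()}

print(f"{'Blood group':<12}{'Number':>8}{'Percent':>9}")
for group, n in counts.items():
    print(f"{group:<12}{n:>8}{percents[group]:>9}")
print(f"{'Total':<12}{total:>8}{100:>9}")
```

Because each percentage is rounded separately, the rounded values may not always add up to exactly 100.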
  • 117. Two and three variable table
      - If two variables are cross tabulated, it is a two-variable table.
      - If the tabulation is among three variables, it is a three-variable table.
      - In cross-tabulated frequency distributions where there are row and column totals, the choice of denominator is based on the variable of interest to be compared over the subsets of the other variable.
  • 119. Common form of a two by two variable
      - It is a special form of table that is a favorite among epidemiologists.
      - It is used to examine whether there is a relationship between the two variables.
      Exposure    | Number of subjects | Total
                  | Cases  | Controls  |
      Exposed     | 23     | 23        | 46
      Non-exposed | 4      | 139       | 143
      Total       | 27     | 162       | 189
  • 120. Composite/ Higher Order Table
      - It is a large table combining several separate variables/tables.
      - Age, sex and other demographic variables may be combined to form a single table.
      Example: Distribution of Health Professional by Sex and Residence
  • 121. Diagrammatic and Graphical methods of data presentation
      Advantages
      - To understand the information easily.
      - To make the data attractive.
      - To make comparisons of items easily.
      - To draw the attention of the observer.
      - The purpose of graphs and diagrams is not to provide exact and detailed information, but simple comparisons.
      - Any further information should rather be obtained from the original data.
  • 122. Limitations of Diagrammatic presentation
      - The technique is used only for purposes of comparison. It is not to be used when comparison is either not possible or not necessary.
      - It is not an alternative to tabulation. It only strengthens the textual exposition of a subject and cannot serve as a complete substitute for statistical data.
      - It can give only an approximate idea, so diagrams are not suitable where greater accuracy is needed.
      - Diagrams fail to bring small differences to light.
  • 123. 2. Diagrammatic Presentation of data
      - Diagrams are appropriate for presenting discrete as well as qualitative data.
      - The three most commonly used diagrammatic presentations of data are:
          - Pie charts
          - Bar charts
          - Pictograms
  • 125. 1. Pie chart
      - A pie chart can be used to compare the relation between the whole and its components.
      - It is useful for qualitative or quantitative discrete data.
      - A pie chart is a circular diagram in which the area of each sector of the circle represents a component.
  • 127. Example:
      Draw a suitable diagram to represent the following population in a town.
      Men  | Women | Girls | Boys
      2500 | 2000  | 4000  | 1500
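One way to prepare a pie chart for this example is to convert each group's count into a percentage of the whole and into the angle of its sector (out of 360 degrees). The sketch below does only that arithmetic and does not draw the chart; the names `population` and `angles` are illustrative.

```python
# Population of the town by group, from the example above.
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

# Each sector's angle is the group's share of the total, times 360 degrees.
angles = {group: count * 360 / total for group, count in population.items()}
percents = {group: count * 100 / total for group, count in population.items()}

for group in population:
    print(f"{group:<6} {percents[group]:5.1f}%  {angles[group]:6.1f} degrees")
```

The sector angles add up to 360 degrees, so the sectors together fill the whole circle.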
  • 130. 2. Bar charts (or graphs)
      - Categories are listed on the horizontal axis (X-axis).
      - Frequencies or relative frequencies are represented on the Y-axis.
      - The height of each bar is proportional to the frequency or relative frequency of observations in that category.
      - There are three types of bar charts.
  • 131. Tips for constructing bar diagrams
      1. Whenever possible, it is better to construct a bar diagram on graph paper.
      2. All bars drawn in any single study should be of the same width.
      3. The different bars should be separated by equal distances.
      4. All the bars should rest on the same line, called the base.
      5. Whenever possible, it is advisable to draw the bars in order of magnitude.
  • 132. Cont…
      A. Simple bar chart: used to represent a single variable classified on a spatial, quantitative or temporal basis.
  • 133. Cont…
      Example: Construct a bar chart for the following data
  • 135. Cont…
      B. Sub-divided (component) bar chart
      - It is used to represent data in which the total magnitude is divided into different parts or components.
      - Example: Plasmodium species distribution for confirmed malaria cases, Zeway, 2003
  • 136. Cont…
      C. Multiple bar chart
      - It is used when two or more sets of inter-related data are represented (a multiple bar diagram facilitates comparison between more than one phenomenon).
      - The following figure shows a multiple bar chart representing the imports and exports of Canada (values in $) for the years 1991 to 1995.
  • 139. 3. Graphical Presentation of data
      The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly applied graphical representations for continuous data.
      Procedures for constructing statistical graphs
      - Draw and label the X and Y axes.
      - Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
      - Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the X axis.
      - Plot the points.
      - Draw the bars or lines to connect the points.
  • 140. Graphical Presentation of data
      1. Histogram
      - A graph which places the class boundaries on the horizontal axis and the frequencies on the vertical axis.
      - Class marks and class limits are sometimes used as the quantity on the X axis.
      - Non-overlapping intervals that cover all of the data values must be used.
  • 141. Cont…
      - Bars are drawn over the intervals in such a way that the areas of the bars are all proportional in the same way to their interval frequencies.
      - To avoid crowding, you can use class midpoints.
      - Example: Distribution of the age of women at the time of marriage
  • 143. Cont…
      2. Frequency polygon
      - A line graph of class marks against class frequencies.
      - To draw a frequency polygon, connect the midpoints of the tops of the histogram bars with straight lines.
  • 144. Cont…
      - It can also be drawn without erecting rectangles, by joining the points plotted above the midpoints of the intervals at heights equal to the class frequencies, as follows:
  • 145. Cont…
      3. Ogive Curve (Cumulative Frequency Polygon)
      - A graph showing the cumulative frequency (less than or more than type) plotted against the upper or lower class boundaries respectively.
      - An ogive uses class boundaries along the horizontal axis and cumulative frequency along the vertical axis.
      - The less than ogive uses the less than cumulative frequency on the Y axis.
      - The more than ogive uses the more than cumulative frequency on the Y axis.
      - The points are joined by a freehand curve.
  • 150. Cont…
      4. Line graph
      - A variable is taken along the X-axis and the frequency of occurrence of each of its observed values along the Y-axis.
      - The points are plotted and joined by a line.
      - An arithmetic scale line graph shows patterns or trends over some variable, usually time.
  • 154. Learning outcomes
      After completing this chapter, a student will be able to:
      - List and calculate measures of central tendency
      - List and calculate measures of dispersion
      - Describe types of shape.
  • 155. Numerical Summary Measures
      - They are single numbers which quantify the characteristics of a distribution of values.
      - There are two types:
          1. Measures of central tendency or location
          2. Measures of dispersion
  • 156. Measures of Central Tendency/ Measures of Location
      - Measures of central tendency (MCT): methods of determining the actual value at which the data tend to concentrate.
      - The tendency of statistical data to concentrate at a certain value is called "central tendency".
      - The objective of calculating an MCT is to determine a single figure which may be used to represent the whole data set.
      - Since an MCT represents the entire data set, it facilitates comparison within one group or between groups of data.
  • 157. Characteristics of a good MCT
      An MCT is good or satisfactory if it possesses the following characteristics:
      - It should be based on all the observations.
      - It should not be affected by extreme values.
      - It should be as close to the maximum number of values as possible.
      - It should have a definite value.
      - It should not be subject to complicated and tedious calculations.
      - It should be capable of further algebraic treatment.
      - It should be stable with regard to sampling.
  • 158. Cont…
      The most common measures of central tendency include:
      - Arithmetic mean
      - Median
      - Mode
  • 159. 1. Arithmetic Mean
      1. Ungrouped Data
      - The arithmetic mean is the "average" of the data set and by far the most widely used measure of central location.
      - It is the sum of all the observations divided by the total number of observations: mean = (x1 + x2 + … + xn) / n.
  • 160. Arithmetic…..
      The heart rates for n = 10 patients were as follows (beats per minute): 167, 120, 150, 125, 150, 140, 40, 136, 120, 150. What is the arithmetic mean for the heart rate of these patients?
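The heart-rate question can be checked with a short calculation: add the ten values and divide by n. This is a minimal sketch; the variable names are illustrative, and the standard-library `statistics.mean` function is shown alongside as a cross-check.

```python
import statistics

# Heart rates (beats per minute) of the n = 10 patients listed above.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

# Arithmetic mean = sum of all observations / number of observations.
total = sum(heart_rates)
mean_rate = total / len(heart_rates)

print(f"sum = {total}, n = {len(heart_rates)}, mean = {mean_rate}")
print(statistics.mean(heart_rates))  # same result from the standard library
```

Note that the single low value of 40 pulls the mean down, which illustrates how the mean responds to an extreme observation.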
  • 161. Cont…
      - When the data are arranged or given in the form of a frequency distribution, i.e. there are k distinct values such that a value xi has frequency fi (i = 1, 2, …, k), the arithmetic mean is given by: mean = (f1x1 + f2x2 + … + fkxk) / (f1 + f2 + … + fk).
  • 163. Cont…
      Exercise
      Consider the following frequency distribution table. Calculate the average of this data set.
  • 164. Cont…
      2. For grouped data
      - In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the midpoint of that interval.
      - Therefore, the mean for grouped data is calculated as: mean = (m1f1 + m2f2 + … + mkfk) / (f1 + f2 + … + fk), where mi is the midpoint and fi the frequency of the i-th class.
  • 165. Arithmetic…..
      Example: Compute the mean age of 169 subjects from the grouped data.
      Class interval | Mid-point (mi) | Frequency (fi) | mifi
      10-19          | 14.5           | 4              | 58.0
      20-29          | 24.5           | 66             | 1617.0
      30-39          | 34.5           | 47             | 1621.5
      40-49          | 44.5           | 36             | 1602.0
      50-59          | 54.5           | 12             | 654.0
      60-69          | 64.5           | 4              | 258.0
      Total          | __             | 169            | 5810.5
      Mean = 5810.5/169 = 34.38 years
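The grouped-data calculation can be re-run from the class limits and frequencies in the table. The sketch below recomputes the midpoints, the products of midpoint and frequency, their sum, and the mean; the variable names are illustrative.

```python
# Class limits and frequencies from the age table above.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
frequencies = [4, 66, 47, 36, 12, 4]

# Midpoint of each class = average of its lower and upper limits.
midpoints = [(low + high) / 2 for low, high in classes]

# Grouped mean = sum(midpoint * frequency) / sum(frequency).
products = [m * f for m, f in zip(midpoints, frequencies)]
n = sum(frequencies)
grouped_mean = sum(products) / n

for (low, high), m, f, mf in zip(classes, midpoints, frequencies, products):
    print(f"{low}-{high}  {m:5.1f}  {f:3d}  {mf:7.1f}")
print(f"Total n = {n}, sum = {sum(products)}, mean = {grouped_mean:.2f}")
```

Because every observation in a class is treated as if it sat at the class midpoint, the result is an approximation of the mean of the raw data.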
  • 166. Arithmetic…..
    - The mean can be thought of as a "balancing point" or "center of gravity".
    - In extreme cases, all but one of the sample points can lie on one side of the arithmetic mean. In such a case the mean is a poor measure of central location and does not reflect the center of the sample.
  • 167. Properties of the Arithmetic Mean
    - The mean can be used as a summary measure for both discrete and continuous data, but it is not appropriate for either nominal or ordinal data.
    - For a given set of data there is only one arithmetic mean (uniqueness).
    - Easy to calculate and understand (simple).
    - Influenced by each and every value in a data set.
    - Greatly affected by extreme values.
    - In the case of grouped data, if any class interval is open, the arithmetic mean cannot be calculated.
  • 168. 2. Median
    o It is an alternative measure of central tendency, second in popularity to the arithmetic mean.
    o Suppose there are n observations in a sample, ordered from smallest to largest.
    o The median is a value such that at least half of the observations are less than or equal to it and at least half of the observations are greater than or equal to it.
    o The median is the midpoint of the data array.
  • 169. 2. Median….
    Ungrouped data
    - The median is the value which divides the data set into two equal parts.
    - If the number of values is odd, the median is the middle value when all values are arranged in order of magnitude.
    - When the number of observations is even, there is no single middle value but two middle observations.
    - In this case the median is the mean of these two middle observations, when all observations have been arranged in order of magnitude.
  • 170. Cont…
    1. For ungrouped data
    • If the number of observations is odd, the median is defined as the [(n+1)/2]th observation.
    • If the number of observations is even, the median is the average of the two middle values, the (n/2)th and the [(n/2)+1]th.
    • To find the median of a data set:
      • Arrange the data in ascending order.
      • Find the middle observation of this ordered data.
  • 171. Cont…
    Example 1 (n is even): 19, 20, 20, 21, 22, 24, 27, 27, 27, 34
    The two middle observations are 22 and 24, so the median = (22 + 24)/2 = 23.
    Example 2: The number of children with asthma during a specific year in seven local district clinics is shown. Find the median for this data set.
    253, 125, 328, 417, 201, 70, 90
  • 172. Cont…
    Solution: First we must arrange the data in ascending order:
    70, 90, 125, 201, 253, 328, 417
    Since n = 7 is odd, the median is the [(7+1)/2]th = 4th observation, i.e. the value 201.
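Both median examples can be reproduced with Python's standard `statistics.median`, which sorts the data and applies the odd/even rule described above. This is only an illustrative sketch, and the variable names are assumptions.

```python
from statistics import median

# Example 1: even number of observations (n = 10).
values_even = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]

# Example 2: asthma counts in seven district clinics (n = 7, odd).
asthma_cases = [253, 125, 328, 417, 201, 70, 90]

# median() orders the data itself; for even n it averages the two middle values.
median_even = median(values_even)
median_odd = median(asthma_cases)

print(median_even, median_odd)
```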
  • 173. Exercise
    The actual waiting time for the first job for a selected sample of nine people with different fields of specialization is given below.
    Waiting time (in months): 11.6, 11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8
    Calculate the median of the waiting time.
  • 174. Cont…
    2. For grouped data
    If the data are given in the form of a continuous frequency distribution, the median is defined as:
    $\text{Median} = L_{med} + \frac{W}{f_{med}}\left(\frac{n}{2} - f_c\right)$
    Where:
    - $L_{med}$ = lower class boundary of the median class,
    - $f_{med}$ = frequency of the median class,
    - W = size (width) of the median class,
    - n = total number of observations,
    - $f_c$ = cumulative frequency (less than type) of the class preceding the median class.
    Note: the median class is the class with the smallest cumulative frequency (less than type) greater than or equal to n/2.
  • 175. Cont…
    Example: Find the median for the following distribution.
  • 177. Cont…
    We can compute the median value as follows:
  • 178. Merit and demerit of median
    Merits:
    - The median is a positional average and hence is not influenced by extreme observations.
    - It can be calculated in the case of open-ended intervals.
    - The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.
    Demerits:
    - It is not a good representative of the data if the number of items is small.
    - It is not amenable to further algebraic treatment.
    - It is vulnerable to sampling fluctuations.
  • 179. 3. Mode
    - The mode is the value which occurs most frequently in a set of values.
    - The mode may not exist, and even if it does exist, it may not be unique.
    - If, in a set of observed values, all values occur once or an equal number of times, there is no mode.
  • 180. Cont…
    Examples:
    1. Find the mode of 5, 3, 5, 8, and 9. Mode = 5
    2. Find the mode of 8, 9, 9, 7, 8, 2, 5. Mode = 8 and 9
    3. Find the mode of 4, 12, 3, 6, and 7. No mode (the mode does not exist).
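The three mode examples can be sketched in Python. The helper `modes` below is hypothetical (not from the slides) and follows the slide's convention that a data set in which every value occurs equally often has no mode; the standard library's `statistics.multimode` would instead return every value in that case.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or [] when all values occur equally often."""
    counts = Counter(values)
    highest = max(counts.values())
    # Slide convention: equal frequencies for all values means "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

example_1 = modes([5, 3, 5, 8, 9])        # one mode
example_2 = modes([8, 9, 9, 7, 8, 2, 5])  # two modes (bimodal)
example_3 = modes([4, 12, 3, 6, 7])       # no mode

print(example_1, example_2, example_3)
```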
  • 181. Cont…
    Mode for grouped data
    - NB: The mode for grouped data lies in the modal class.
    - The modal class is the class with the largest frequency.
    - $\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$
    Where:
    - L = the lower class boundary of the modal class
    - w = the size of the modal class
    - $f_1$ = frequency of the class preceding the modal class
    - $f_2$ = frequency of the class succeeding the modal class
    - $f_{mod}$ = frequency of the modal class
    - $\Delta_1 = f_{mod} - f_1$, $\Delta_2 = f_{mod} - f_2$
  • 182. Cont… 7/4/2023 Asaye.A 182 Example: Calculate the modal age for the age distribution of 228 patients below.
  • 183. Cont…
    Solution: By inspection (simply looking at the frequencies), the mode lies in the fourth class, where L = 29.5, $f_{mod}$ = 57, $f_1$ = 50, $f_2$ = 48 and w = 5, so that
    $\Delta_1 = 57 - 50 = 7$ and $\Delta_2 = 57 - 48 = 9$.
    Therefore, the modal age is
    $\hat{x} = 29.5 + \frac{7}{7 + 9} \times 5 = 29.5 + 2.2 = 31.7$
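A small Python sketch of the grouped-data mode formula, applied to the values quoted in the solution above. The function name `grouped_mode` and its parameter names are illustrative assumptions.

```python
def grouped_mode(lower_boundary, width, f_mod, f_prev, f_next):
    """Mode = L + d1 / (d1 + d2) * w for the modal class."""
    d1 = f_mod - f_prev  # modal frequency minus preceding class frequency
    d2 = f_mod - f_next  # modal frequency minus succeeding class frequency
    return lower_boundary + d1 / (d1 + d2) * width

# Values from the solution: L = 29.5, w = 5, f_mod = 57, f1 = 50, f2 = 48.
modal_age = grouped_mode(29.5, 5, 57, 50, 48)

print(f"Modal age: {modal_age:.1f}")
```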
  • 184. Properties of Mode
    - The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
    - It is not affected by extreme values.
    - It can be calculated for distributions with open-ended classes.
    - Sometimes its value is not unique.
    - The main drawback of the mode is that it may not exist.
  • 185. Merit and Demerit of Mode
    Merits:
    - It is not affected by extreme observations.
    - Easy to calculate and simple to understand.
    - It can be calculated for distributions with open-ended classes.
  • 186. Cont…
    Demerits:
    - It is not rigidly defined, i.e. its value is not unique.
    - It is not based on all observations.
    - It is not suitable for further mathematical treatment.
    - It is not a stable average, i.e. it is affected by fluctuations of sampling to some extent.
  • 187. Measure of location
    Quartiles
    - Quartiles are measures that divide the frequency distribution into four equal parts.
    - The values of the variable corresponding to these divisions are denoted Q1, Q2, and Q3, often called the first, the second and the third quartile respectively.
    - Q1 is a value which has 25% of the items less than or equal to it.
    - Similarly, Q2 has 50% of the items with values less than or equal to it.
  • 188. Cont…
    - Q3 has 75% of the items with values less than or equal to it.
    Quartiles for ungrouped data
    - Arrange the data in ascending order.
    - If the number of observations is
      A. Odd: $Q_i = \left[\frac{i(n+1)}{4}\right]^{\text{th}}$ item
      B. Even: $Q_i = \frac{1}{2}\left[\left(\frac{in}{4}\right)^{\text{th}} \text{item} + \left(\frac{in}{4} + 1\right)^{\text{th}} \text{item}\right]$
  • 190. Percentiles
    - Percentiles divide the data into 100 equal parts.
    - A percentile shows the percentage of values that fall below a particular value in a set of data scores.
  • 191. Cont…
    - Arrange the numbers in ascending order.
    Percentiles for individual series
      A. Odd: $P_i = \left[\frac{i(n+1)}{100}\right]^{\text{th}}$ item
      B. Even: $P_i = \frac{1}{2}\left[\left(\frac{in}{100}\right)^{\text{th}} \text{item} + \left(\frac{in}{100} + 1\right)^{\text{th}} \text{item}\right]$
    Percentiles for grouped data
      $P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right)$, i = 1, 2, ..., 99
  • 193. Cont…
    - For example, suppose that 50% of a cohort survived at least 4 years.
    - This also means that 50% survived at most 4 years.
    - We say that 4 years is the median.
    - The median is also called the 50th percentile.
    - We write P50 = 4 years.
  • 194. Example
    Marks of 50 students out of 85 are given below. Based on the data, find $Q_1$ and $P_7$.

    Marks   46-50   51-55   56-60   61-65   66-70   71-75   76-80
    fi      4       8       15      5       9       5       4

    Solution: First find the class boundaries (CB) and the cumulative frequency (CF) distribution. Second, determine the quartile and percentile classes.

    Marks   CB          fi   CF
    46-50   45.5-50.5    4    4
    51-55   50.5-55.5    8   12
    56-60   55.5-60.5   15   27
    61-65   60.5-65.5    5   32
    66-70   65.5-70.5    9   41
    71-75   70.5-75.5    5   46
    76-80   75.5-80.5    4   50

    For $Q_1$: find the smallest CF ≥ i·N/4 = 1 × 50/4 = 12.5.
  • 195. Cont…
    - The CF values ≥ 12.5 are 27, 32, 41, 46, and 50, and the smallest of these is 27, so the quartile class is the third class (55.5-60.5).
    - $Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.7$
    For percentiles:
    - $P_7$ is the (7n/100)th value = 3.5th value, which lies in the class 45.5-50.5.
    - $P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$
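The quartile and percentile calculation for the marks data can be sketched with one hypothetical helper, `grouped_quantile(i, parts)`, which locates the class with the smallest cumulative frequency at or above i·n/parts and then interpolates inside it. The names and structure are assumptions made for illustration; equal class widths are assumed, as in the table.

```python
# Lower class boundaries and frequencies for the marks of 50 students.
lower_boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
frequencies = [4, 8, 15, 5, 9, 5, 4]
width = 5  # every class has the same width

def grouped_quantile(i, parts):
    """i-th quantile when the data are split into `parts` (4 = quartiles, 100 = percentiles)."""
    n = sum(frequencies)
    target = i * n / parts
    cumulative = 0  # cumulative frequency of the classes before the current one
    for boundary, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= target:
            # Interpolate within the first class whose CF reaches the target.
            return boundary + width / f * (target - cumulative)
        cumulative += f
    raise ValueError("quantile position exceeds the total frequency")

q1 = grouped_quantile(1, 4)
p7 = grouped_quantile(7, 100)

print(f"Q1 = {q1:.1f}, P7 = {p7:.3f}")
```

The same helper with `parts=10` would give deciles, which is one way to approach the exercise that follows.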
  • 196. Cont…
    Exercises
    1. Calculate $Q_1$, $Q_2$, $Q_3$, $D_4$, $P_{40}$ and $P_{90}$ for the data given in the table below.

    x   10   11   12   13   14   15   16   17   18
    f    2    8   25   48   65   40   20    9    2

    2. The following frequency distribution represents the magnitude of earthquakes. Compute the median, verify that it is equal to the second quartile, and find the 72nd percentile.

    Magnitude   0-0.9   1-1.9   2-2.9   3-3.9   4-4.9   5-5.9   6-6.9   7-7.9
    Frequency   20      50      45      30      10      8       6       1
  • 197. Summary 7/4/2023 Asaye.A 197 1. The arithmetic mean is used for interval and ratio data and for symmetric distribution. 2. The median and quartiles are used for ordinal, interval and ratio data whose distribution is skewed. 3. For nominal data mode is the appropriate MCT.
  • 198. Measures of variation/dispersion
    - The scatter or spread of items of a distribution is known as dispersion or variation.
    - In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.
    - Measures of dispersion are statistical measures which provide ways of measuring the extent to which data are dispersed or spread out.
  • 199. A good measure of variation possesses the following:
    o It should be easy to compute and understand.
    o It should be based on all observations.
    o It should be uniquely defined.
    o It should be capable of further algebraic treatment.
    o It should be affected as little as possible by extreme values.
  • 200. Cont…
    Measures of dispersion include:
    o Range
    o Inter-quartile range
    o Variance
    o Standard deviation
    o Coefficient of variation
    o Standard scores (Z-scores)
  • 201. Range
    - It is the difference between the largest and smallest observations in the data.
    - Example: Consider the data on the weight (in Kg) of 10 newborn children at Debre Tabor hospital within a month: 2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43
  • 202. Cont…
    Solution:
    - The range for the data set can be computed by first arranging all observations in ascending order: 1.98, 2.02, 2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01, 3.25.
    - Range = Maximum - Minimum = 3.25 - 1.98 = 1.27 Kg
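For completeness, the range of the birth-weight data can be computed in a couple of lines of Python (an illustrative sketch; sorting is not actually needed, since `max` and `min` find the extremes directly).

```python
# Weights (Kg) of the 10 newborn children from the slide.
weights_kg = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]

# Range = largest observation - smallest observation.
data_range = max(weights_kg) - min(weights_kg)

print(f"Range = {data_range:.2f} Kg")
```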
  • 203. Cont…
    Limitations of Range
    - It is based only on the two extreme cases in the entire distribution. The range may change considerably if either of the extreme cases happens to drop out, while the removal of any other case would not affect it at all.
    - It wastes information, because it takes no account of the rest of the data.
  • 204. Inter-quartile range
    The inter-quartile range (IQR) is the difference between the third and the first quartiles.
    Example: Suppose the first and third quartiles for weights of girls 12 months of age are 8.8 Kg and 10.2 Kg respectively. The IQR = 10.2 Kg - 8.8 Kg = 1.4 Kg.
  • 205. Variance and standard deviation
    - Variance measures how far, on average, scores deviate or differ from the mean.
    - Variance is the average of the squared distances of each value from the mean.
  • 207. Cont…
    - For the case of a frequency distribution, the sample variance is expressed as: $S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$, where $n = \sum f_i$.
    - Why do we use n-1?
      - To obtain an unbiased estimate of the population variance, or
      - To describe the spread of the population.
  • 208. Cont…
    - A drawback of the variance is that, because the deviations are squared, its units are also squared. To get back to the original unit of measurement we take the square root of the variance, which gives the standard deviation.
  • 209. Example1
    Consider the following three data sets:
    - Dataset 1: 7, 7, 7, 7, 7, 7; mean = 7, sd = 0
    - Dataset 2: 6, 7, 7, 7, 7, 8; mean = 7, sd = 0.63
    - Dataset 3: 3, 2, 7, 8, 9, 13; mean = 7, sd = 4.05
    The three data sets have the same mean but different variation.
  • 210. Example2
    Find the variance and standard deviation of the given data: 35, 45, 30, 35, 40, 25
    Solution: First we find the mean. Next, subtract the mean from each value and square it:
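The steps of this example can be sketched in Python. The sketch assumes the sample variance with the n-1 divisor discussed on the earlier slide, and cross-checks the manual steps against the standard library's `statistics.variance` and `statistics.stdev`, which use the same divisor. Variable names are illustrative.

```python
from statistics import mean, stdev, variance

data = [35, 45, 30, 35, 40, 25]

# Step 1: the mean.
x_bar = mean(data)

# Step 2: squared deviations of each value from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in data]

# Step 3: sample variance (divide by n - 1) and its square root.
sample_variance = sum(squared_deviations) / (len(data) - 1)
sample_sd = sample_variance ** 0.5

print(x_bar, squared_deviations, sample_variance, round(sample_sd, 2))

# The library functions give the same results as the manual steps.
assert abs(variance(data) - sample_variance) < 1e-9
assert abs(stdev(data) - sample_sd) < 1e-9
```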
  • 212. Exercise
    The areas of sprayable surfaces with DDT from a sample of 15 houses are measured as follows (in m²):
    101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145
    Find the variance and standard deviation of the given data set.
  • 214. Example 7/4/2023 Asaye.A 214 Find the variance and the standard deviation for the frequency distribution of the given data set below.
  • 215. Cont…
    With mean = 490/20 = 24.5:

    Class       Frequency (fi)   Midpoint (xm)   fi·xm   fi·(xm - mean)²
    5.5-10.5    1                 8                8     1 × (8 - 24.5)²  = 272.25
    10.5-15.5   2                13               26     2 × (13 - 24.5)² = 264.5
    15.5-20.5   3                18               54     3 × (18 - 24.5)² = 126.75
    20.5-25.5   5                23              115     5 × (23 - 24.5)² = 11.25
    25.5-30.5   4                28              112     4 × (28 - 24.5)² = 49
    30.5-35.5   3                33               99     3 × (33 - 24.5)² = 216.75
    35.5-40.5   2                38               76     2 × (38 - 24.5)² = 364.5
    Total       n = 20                           490     1,305
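The table totals can be checked, and the calculation finished, with a short Python sketch. It assumes the n-1 divisor for the sample variance, as on the earlier slide; the final variance and standard deviation are computed by the code rather than quoted from the slides.

```python
# Midpoints (x_m) and frequencies (f_i) from the frequency distribution table.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

n = sum(frequencies)
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Sum of f_i * (x_m - mean)^2, the last column of the table.
sum_sq = sum(f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, midpoints))

# Sample variance (assumed n - 1 divisor) and standard deviation.
grouped_variance = sum_sq / (n - 1)
grouped_sd = grouped_variance ** 0.5

print(n, grouped_mean, sum_sq, round(grouped_variance, 2), round(grouped_sd, 2))
```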
  • 217. Cont…
    Properties of Variance:
    - The main demerit of variance is that its unit is the square of the unit of the original measurement values.
    - The variance gives more weight to extreme values than to those near the mean, because the differences are squared.
    - The drawbacks of variance are overcome by the standard deviation.
  • 218. Cont…
    SD vs. Standard Error (SE)
    - SD describes the variability among individual values in a given data set.
    - SE describes the variability among sample means obtained from one sample to another.
    - We interpret the SE of the mean to mean that another similarly conducted study may give a mean that lies within ± SE of the observed mean.
  • 219. Cont…
    - The SD has the advantage of being expressed in the same units of measurement as the mean.
    - SD is considered to be the best measure of dispersion and is used widely because of the properties of the theoretical normal curve.
    - However, if the units of measurement of the variables in two data sets are not the same, then their variability cannot be compared by comparing the values of SD.
  • 220. Coefficient of variation
    - When two data sets have different units of measurement, or their means differ sufficiently in size, the coefficient of variation (CV) should be used as a measure of dispersion.
    - It is the best measure for comparing the variability of two series or sets of observations.
    - A series with a smaller coefficient of variation is considered more consistent.
    - $CV = \frac{S}{\bar{X}} \times 100\%$
  • 221. Cont… 7/4/2023 Asaye.A 221  Example -“Cholesterol is more variable than systolic blood pressure”
  • 222. Standard score (Z-scores)
    - It is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set.
    - It tells us how many standard deviations a specific value lies above (positive z-score) or below (negative z-score) the mean of the data set.
  • 223. Cont…
    - Z-score computed from the population: $Z = \frac{X - \mu}{\sigma}$
    - Z-score computed from the sample: $Z = \frac{X - \bar{X}}{S}$
    Example: Suppose that a student scored 66 in biostatistics and 80 in anatomy. The summary of the scores in the two courses is given below. In which course did the student score better compared to his classmates?

    Course          Average score   Standard deviation of the score
    Biostatistics   51              12
    Anatomy         72              16
  • 224. Solution:
    Z-score of the student in Biostatistics: $Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$
    Z-score of the student in Anatomy: $Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$
    From these two standard scores, we can conclude that the student scored better, relative to his classmates, in Biostatistics than in Anatomy.
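The comparison can be expressed as a tiny Python sketch. The function `z_score` is an illustrative helper, not something defined in the slides.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Student's scores with the course averages and standard deviations from the table.
z_biostatistics = z_score(66, 51, 12)
z_anatomy = z_score(80, 72, 16)

# The larger z-score marks the course with the better relative performance.
better_course = "Biostatistics" if z_biostatistics > z_anatomy else "Anatomy"

print(z_biostatistics, z_anatomy, better_course)
```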
  • 225. Moments
    - The rth moment about the mean (the rth central moment) is defined as $M_r = \frac{\sum (X_i - \bar{X})^r}{n}$, r = 0, 1, 2, …
    - For continuous grouped data, $M_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$, where the $X_i$ are the class marks.
    Exercise: Find the first three central moments of the numbers 2, 3 and 7.
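A minimal Python sketch of the central-moment definition, applied to the numbers 2, 3 and 7 from the exercise. The helper name `central_moment` is an assumption made for illustration.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean)^r) / n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** r for x in values) / n

numbers = [2, 3, 7]

m1 = central_moment(numbers, 1)  # always 0 for any data set
m2 = central_moment(numbers, 2)  # the variance with divisor n
m3 = central_moment(numbers, 3)  # sign indicates the direction of skew

print(m1, round(m2, 3), m3)
```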
  • 226. Measure of shape
    There are different types of measures of shape:
    I. Skewness
    II. Kurtosis
  • 227. 1. Skewness
    o Measures of central tendency and variation do not reveal the shape of a frequency distribution.
    o Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
    o A skewed frequency distribution is one that is not symmetrical.
    o Skewness is concerned with the shape of the curve, not its size.
  • 228. Concept of skewness
    o The skewness of a distribution is defined as the lack of symmetry.
    o In a symmetrical distribution, the mean, median, and mode are equal to each other.
  • 229. Skewness…
    • For a moderately skewed distribution, the following relation holds among the three commonly used measures of central tendency:
      Mean - Mode = 3 × (Mean - Median)
    • There are two types of skewness, based on the shape of the distribution.
    • Positively skewed: Smaller observations are more frequent than larger observations, i.e. the majority of the observations have a value below the average, and the distribution has a long tail in the positive direction (Mean > Median).
  • 230. Cont…
    [Figure: curve skewed to the right (positively skewed), with Mode, Median and Mean marked from left to right]
  • 231. Cont…
    • Negatively (left) skewed: Smaller observations are less frequent than larger observations, i.e. the majority of the observations have a value above the average (Mean < Median).
    [Figure: negatively skewed curve, with Mean, Median and Mode marked from left to right]
  • 232. Measures of Skewness
    1. Karl Pearson's Coefficient of Skewness (SK):
    $SK = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$ or $SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
    - If SK = 0, then the distribution is symmetrical.
    - If SK > 0, then the distribution is positively skewed.
    - If SK < 0, then the distribution is negatively skewed.
  • 233. Cont…
    2. Moment Coefficient of Skewness
    - The moment coefficient of skewness is based on moments. The formula for calculating it is:
      $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$, where $M_r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}$
    - If $\alpha_3 > 0$, the distribution is positively skewed.
    - If $\alpha_3 = 0$, the distribution is symmetric.
    - If $\alpha_3 < 0$, the distribution is negatively skewed.
  • 234. 2. Kurtosis
    o Kurtosis is a measure of the peakedness of a distribution, measured relative to the peakedness of a normal curve.
    o The peakedness of a distribution can be classified into three types:
    o Leptokurtic:
      - A distribution having a relatively high peak.
      - The curve is more peaked than the normal curve.
  • 235. Cont…
    o Mesokurtic:
      - Normal peak.
      - The curve is properly peaked.
    o Platykurtic:
      - Flat topped.
      - A large number of observations with low frequency are spread across the middle interval.
  • 237. Measures of kurtosis
    - The moment coefficient of kurtosis, $\beta_2$: $\beta_2 = \frac{M_4}{M_2^2}$, where $M_2$ and $M_4$ are central moments.
    - If $\beta_2 = 3$, then the distribution is Mesokurtic.
    - If $\beta_2 > 3$, then the distribution is Leptokurtic.
    - If $\beta_2 < 3$, then the distribution is Platykurtic.
  • 238. Example:
    Based on the following data: $M_0 = 1$, $M_1 = -0.6$, $M_2 = 1.6$, $M_3 = -2.4$, $M_4 = 5.8$
    a) Find the coefficient of skewness and discuss the distribution type.
    b) Find the coefficient of kurtosis and discuss the distribution type.
    Solution
    a) $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{-2.4}{1.6^{3/2}} = -1.19 < 0$, so the distribution is negatively skewed.
    b) $\beta_2 = \frac{M_4}{M_2^2} = \frac{5.8}{1.6^2} = 2.27 < 3$, so the curve is Platykurtic.
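As a hedged check of this example, the two moment coefficients can be computed directly from the given moments in Python. The classification helpers simply encode the decision rules stated on the earlier slides; all names are illustrative.

```python
# Moments given in the example.
m2, m3, m4 = 1.6, -2.4, 5.8

# Moment coefficient of skewness: M3 / M2^(3/2).
alpha_3 = m3 / m2 ** 1.5

# Moment coefficient of kurtosis: M4 / M2^2.
beta_2 = m4 / m2 ** 2

def skew_type(a3):
    """Classify skewness by the sign of the coefficient."""
    if a3 > 0:
        return "positively skewed"
    if a3 < 0:
        return "negatively skewed"
    return "symmetric"

def kurtosis_type(b2):
    """Classify peakedness relative to the normal curve (beta_2 = 3)."""
    if b2 > 3:
        return "leptokurtic"
    if b2 < 3:
        return "platykurtic"
    return "mesokurtic"

print(round(alpha_3, 2), skew_type(alpha_3))
print(round(beta_2, 2), kurtosis_type(beta_2))
```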