University of Gondar
College of medicine and health science
Department of Epidemiology and Biostatistics
Basic Biostatistics
Wullo S. (BSc, MPH, Assistant professor)
Tuesday, December 26, 2023 1
Chapter One
1.1 Introduction to Biostatistics
❖ Objectives of the chapter
➢ After completing this chapter, the student will be able to:
– Define Statistics and Biostatistics
– Identify the branches of biostatistics
– Enumerate the importance and limitations of biostatistics
– Define and Identify the different types of data and
understand why we need to classify variables
Definition and classification of Biostatistics
❖ Statistics is the science of
➢ collecting,
➢ organizing,
➢ presenting,
➢ analysing, and drawing conclusions (inferences) from
data for the purpose of making decisions.
❖ Biostatistics: The application of statistical methods
to the fields of biological and health sciences.
Classification of Biostatistics
Descriptive biostatistics
❖ A statistical method that is concerned with the collection,
organization, summarization, and analysis of data from a
sample of population.
Inferential biostatistics
❖ A statistical method that is concerned with drawing
conclusions (inferences) about a particular population by
selecting and measuring a random sample from that population.
Biostatistics has two branches:
▪ Descriptive statistics: collecting, organizing, summarizing, and presenting data
▪ Inferential statistics: making inferences, hypothesis testing, determining relationships, and making predictions
Descriptive Biostatistics
• Some statistical summaries which are especially common in
descriptive analyses are:
✓ Measures of central tendency
✓ Measures of dispersion
✓ Measures of association
✓ Cross-tabulation, contingency table
✓ Histogram
✓ Quantile, Q-Q plot
✓ Scatter plot
✓ Box plot
Inferential Biostatistics
1.2 Stages in statistical investigation
There are five stages or steps in any statistical investigation.
1. Collection of data
▪ The process of obtaining measurements or counts.
2. Organization of data
▪ Includes editing, classifying, and tabulating the data collected.
3. Presentation of data
▪ Gives an overall view of what the data actually look like.
▪ Facilitates further statistical analysis.
▪ Can be done in the form of tables, graphs, or diagrams.
4. Analysis of data
▪ Digs out useful information for decision making.
▪ Involves extracting relevant information from the data (such as the mean, median, mode, range, variance, …).
5. Interpretation of data
▪ Concerned with drawing conclusions from the data collected and analyzed, and with giving meaning to the analysis results.
▪ A difficult task that requires a high degree of skill and experience.
1.3 Definition of Some Basic terms
Population: is the complete set of possible measurements for which
inferences are to be made.
Census: a complete enumeration of the population. But in most real
problems it cannot be realized, hence we take sample.
Sample: A sample from a population is the set of measurements that are
actually collected in the course of an investigation.
Parameter: Characteristic or measure obtained from a population.
Statistic: A statistic (as distinct from the field of Statistics) refers to a
numerical quantity computed from sample data (e.g. the mean, the
median, the maximum).
Parameter and statistic
Sampling: The process or method of sample selection from the
population.
Sample size: The number of elements or observation to be
included in the sample.
Variable: A characteristic or attribute that can assume different
values in different persons, places, or things.
Some examples of variables include:
▪ Diastolic blood pressure,
▪ Heart rate,
▪ Height,
▪ Weight
Data: Refers to a collection of facts, values, observations, or
measurements that the variables can assume.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of
complex phenomena. The following are some uses of statistics:
▪ Estimating unknown population characteristics.
▪ Formulating and testing hypotheses.
▪ Studying the relationship between two or more variables.
▪ Forecasting future events.
▪ Measuring the magnitude of variations in data.
▪ Furnishes a technique of comparison.
Limitations of statistics
As a science statistics has its own limitations. The following are
some of the limitations:
▪ Deals with only quantitative information.
▪ Deals with only aggregate of facts and not with individual data
items.
▪ Statistical results are only approximately, not mathematically,
correct.
▪ Statistics can easily be misused and therefore should be used
by experts.
1.5 Types of Variables and Measurement Scales
A variable is a characteristic or attribute that can assume
different values in different persons, places, or things.
Examples :
▪ age,
▪ diastolic blood pressure,
▪ heart rate,
▪ the height of adult males,
▪ the weights of preschool children,
▪ gender of Biostatistics students,
▪ marital status of instructors at University of Gondar,
▪ ethnic group of patients
A. Depending on the characteristic of the measurement, variable can be:
❖ Qualitative(Categorical) variable
✓ A variable or characteristic which cannot be measured in quantitative
form but can only be identified by name or categories,
✓ for instance place of birth, ethnic group, type of drug, stages of
breast cancer (I, II, III, or IV), degree of pain (minimal, moderate,
severe, or unbearable).
✓ The categories should be clear cut, not overlapping, and cover all the
possibilities. For example, sex (male or female), vital status (alive or
dead), disease stage (depends on disease), ever smoked (yes or no).
Quantitative(Numerical) variable:
➢ is one that can be measured and expressed numerically.
Example: survival time, systolic blood pressure, number of
children in a family, height, age, body mass index.
➢ they can be of two types
Discrete Variables
✓ Have a set of possible values that is either finite or countably
infinite.
✓ The values of a discrete variable are usually whole
numbers.
✓ Numerical discrete data occur when the observations are
integers that correspond with a count of some sort.
Some common examples are:
▪ Number of pregnancies,
▪ The number of bacteria colonies on a plate,
▪ The number of cells within a prescribed area upon
microscopic examination,
▪ The number of heart beats within a specified time interval,
▪ A mother’s history of numbers of births ( parity) and
pregnancies
▪ The number of episodes of illness a patient experiences
during some time period, etc.
Continuous Variables
✓ A continuous variable has a set of possible values including
all values in an interval of the real line.
✓ No gaps between possible values.
✓ Each observation theoretically falls somewhere along a
continuum.
Example: body mass index, height, blood pressure, serum
cholesterol level, weight, age etc.
✓ Observations are not restricted to take on certain numerical
values: Often measurements (e.g., height, weight, age).
✓ Continuous data are used to report a measurement of the
individual that can take on any value within an acceptable
range.
Nominal Scale
Level of measurement which classifies data into mutually exclusive,
all-inclusive categories in which no order or ranking can be imposed on
the data.
▪ Assign subjects to groups or categories
▪ No order or distance relationship
▪ No arithmetic origin
▪ Only count numbers in categories
▪ Only present percentages of categories
▪ Chi-square is the most often used test of statistical significance
Nominal Scale: other examples
▪ Sex
▪ Social status
▪ Marital status
▪ Days of the week (months)
▪ Geographic location
▪ Seasons
▪ Ethnic group
▪ Types of restaurants
▪ Brand choice
▪ Religion
▪ Job type: executive, technical, clerical
[Figure: two categories, one coded as “0” and the other coded as “1”]
Ordinal Scale
Level of measurement which classifies data into categories that can be
ranked. Precise differences between the ranks do not exist.
▪ Classifies data according to some order or rank.
▪ With ordinal data, it is fair to say that one
response is greater or less than another.
▪ E.g. if people were asked to rate the hotness of 3 chili
peppers, a scale of "hot", "hotter" and "hottest"
could be used. Values of "1" for "hot", "2" for
"hotter" and "3" for "hottest" could be assigned.
Ordinal Scales
• Arithmetic operations are not applicable but relational
operations are applicable.
• Ordering is the sole property of ordinal scale.
Examples:
Letter grades (A, B, C, D, F).
Rating scales (Excellent, Very good, Good, Fair, poor).
Military status.
Interval Scales
• Level of measurement which classifies data that can be ranked
and differences are meaningful. However, there is no meaningful
zero, so ratios are meaningless.
• All arithmetic operations except division are applicable.
• Relational operations are also possible.
Examples:
IQ
Temperature in °F.
Interval Scale
Numerically equal distances on the scale represent equal values in
the characteristic being measured. An interval scale contains all the
information of an ordinal scale, but it also allows you to compare the
differences between objects.
▪ Assumes that the measurements are made in equal units,
i.e. gaps between whole numbers on the scale are equal.
▪ E.g. Fahrenheit and Celsius temperature scales.
▪ An interval scale does not have a true zero.
E.g. a temperature of "zero" does not mean that
there is no temperature; it is just an arbitrary zero
point.
▪ Permissible statistics: count/frequencies, mode, median,
mean, standard deviation.
Ratio Scales
• Level of measurement which classifies data that can be ranked,
differences are meaningful, and there is a true zero. True ratios
exist between the different units of measure.
• All arithmetic and relational operations are applicable.
Examples: Weight
Height
Number of students
Age
Primary Scales of Measurement
Scale      Description                              Values for three runners
Nominal    Numbers assigned to runners              4, 81, 9
Ordinal    Rank order of winners                    Third place, Second place, First place
Interval   Performance rating on a 0 to 10 scale    8.2, 9.1, 9.6
Ratio      Time to finish, in seconds               15.2, 14.1, 13.4
Statistics appropriate to each scale
Scale      Descriptive                      Inferential
Nominal    Percentages, Mode                Chi-square, Binomial test
Ordinal    Percentile, Median               Rank-order correlation, ANOVA
Interval   Range, Mean, SD                  Correlations, t-tests, ANOVA, Regression, Factor Analysis
Ratio      Geometric Mean, Harmonic Mean    Coefficient of Variation (CV)
Exercise
Categorize the following variables into nominal, ordinal,
interval or ratio
➢ Gender
➢ Grade (A, B, C, D and F)
➢ Rating scale (poor, good, excellent)
➢ Eye colour
➢ Political affiliation
➢ Religious affiliation
➢ Ranking of tennis players
➢ Major field
➢ Nationality
➢ Height
➢ Weight
➢ Time
➢ Age
➢ IQ
➢ Temperature
➢ Salary
Chapter 2
Organization and Presentation of data
• Having collected and edited the data, the next important step
is to organize it.
• The process of arranging data into classes or categories
according to similarities is called classification.
• Classification is a preliminary step that prepares the ground for
proper presentation of data.
• The presentation of data is broadly classified into the
following two categories:
• Tabular presentation
• Diagrammatic and Graphic presentation.
Tabular presentation of data
• Frequency distribution: is the organization of raw
data in table form using classes and frequencies.
• There are three basic types of frequency
distributions
❖ Categorical frequency distribution
❖Ungrouped frequency distribution
❖Grouped frequency distribution
Categorical frequency Distribution:
Used for data that can be placed in specific categories, such
as nominal or ordinal data.
Example1: a social worker collected the following data on
marital status for 25 persons.(M=married, S=single,
W=widowed, D=divorced)
M S D W D S S M M M W D S M M W D D S S S W W D D
Class   Frequency   Percent
M       6           24
S       7           28
D       7           28
W       5           20
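The counting behind a categorical frequency distribution can be sketched in Python. This is a minimal illustration using the 25 marital-status codes of Example 1; the variable names (`raw`, `counts`, `table`) are chosen here for illustration only.

```python
from collections import Counter

# Marital status codes from Example 1
# (M=married, S=single, W=widowed, D=divorced)
raw = "M S D W D S S M M M W D S M M W D D S S S W W D D".split()

counts = Counter(raw)   # frequency of each class
n = len(raw)            # total number of persons

# One row per class: (class, frequency, percent)
table = [(c, counts[c], 100 * counts[c] / n) for c in ("M", "S", "D", "W")]

for cls, freq, pct in table:
    print(f"{cls}  {freq:>2}  {pct:.0f}%")
```

The percent column is simply the frequency divided by the total, times 100, so the four percentages add up to 100.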
Example 2
• Consider for example, the variable birth weight with levels ‘Very
low ’, ‘Low’, ‘Normal’ and ‘Big’. The frequency distribution for
newborns is obtained simply by counting by the number of
newborns in each birth weight category.
Table 2. Distribution of birth weight of newborns between 1976-1996 at TAH.
BWT        Freq.   Rel. freq. (%)   Cum. freq.   Cum. rel. freq. (%)
Very low   43      0.4              43           0.4
Low        793     8.0              836          8.4
Normal     8870    88.9             9706         97.3
Big        268     2.7              9974         100
Total      9974    100
2. Ungrouped frequency Distribution
• Is a table of all the potential raw score values that could possibly occur
in the data along with the number of times each actually occurred.
• Example: The following data represent the marks of 20 students:
80, 76, 90, 85, 80, 70, 60, 62, 70, 85, 65, 60, 63, 74, 75, 76, 70, 70, 80, 85
Construct an ungrouped frequency distribution.
Mark   Frequency
60     2
62     1
63     1
65     1
70     4
74     1
75     1
76     2
80     3
85     3
90     1
3.Grouped frequency distribution
• When the range of the data is large, the data must be grouped
in to classes that are more than one unit in width.
Definitions of some terms:
• Grouped Frequency Distribution: a frequency distribution
when several numbers are grouped in one class.
• Class limits: Separates one class in a grouped frequency
distribution from another. The limits could actually appear in
the data and have gaps between the upper limits of one class
and lower limit of the next.
• Units of measurement (U): the distance between two possible
consecutive measures. It is usually taken as 1, 0.1, 0.01, 0.001,
-----.
• Class boundaries: Separates one class in a grouped frequency
distribution from another. The boundaries have one more
decimal place than the raw data and therefore do not appear
in the data. There is no gap between the upper boundary of
one class and lower boundary of the next class. The lower
class boundary is found by subtracting U/2 from the
corresponding lower class limit and the upper class boundary
is found by adding U/2 to the corresponding upper class limit.
• Class width: the difference between the upper and lower
class boundaries of any class. It is also the difference between
the lower limits of any two consecutive classes or the
difference between any two consecutive class marks.
• Class mark (Mid points): it is the average of the lower and
upper class limits or the average of upper and lower class
boundary.
• Cumulative frequency: is the number of observations less
than/more than or equal to a specific value.
• Cumulative frequency above: it is the total frequency of all
values greater than or equal to the lower class boundary of a
given class.
• Cumulative frequency below: it is the total frequency of all
values less than or equal to the upper class boundary of a
given class.
• Cumulative Frequency Distribution (CFD): it is the
tabular arrangement of class interval together with
their corresponding cumulative frequencies. It can be
more than or less than type, depending on the type
of cumulative frequency used.
• Relative frequency (rf): it is the frequency divided by
the total frequency.
• Relative cumulative frequency (rcf): it is the
cumulative frequency divided by the total frequency.
Steps for constructing Grouped
frequency Distribution
1. Find the largest and smallest values
2. Compute the Range(R) = Maximum - Minimum
3. Select the number of classes desired, usually
between 5 and 20, or use Sturges' rule
$k = 1 + 3.322 \log_{10} n$,
where k is the number of classes desired and n is the total
number of observations.
4. Find the class width by dividing the range by the
number of classes and rounding up, not off.
5. Pick a suitable starting point less than or equal to the
minimum value. The starting point is called the lower limit of
the first class. Continue to add the class width to this lower
limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the
lower limit of the second class. Then continue to add the class
width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower
limits and adding U/2 units to the upper limits. The
boundaries are also half-way between the upper limit of one
class and the lower limit of the next class. Depending on what
you're trying to accomplish, it may not be necessary to find
the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on
what you're trying to accomplish, it may not be
necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or
relative cumulative frequencies
Example*:
• Construct a frequency distribution for the following data.
11, 29 , 6 ,33 , 14 ,31, 22 , 27 , 19 ,20 ,18 ,17 ,22 ,38 ,23 ,21 ,26
,34 ,39 ,27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33
Step 3: Select the number of classes desired using Sturges'
formula; $k = 1 + 3.322 \log_{10} 20 \approx 5.32$, so k = 6 (rounding up).
• Step 4: Find the class width; w = R/k = 33/6 = 5.5 ≈ 6
(rounding up)
• Step 5: Select the starting point, let it be the
minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper
class limit = 12 - U = 12 - 1 = 11
11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the
following classes.
• Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
• Step 7: Find the class boundaries;
• E.g. for class 1 Lower class boundary=6-U/2=5.5 Upper class
boundary =11+U/2=11.5
• Then continue adding w to both boundaries to obtain the remaining
boundaries. By doing so one can obtain the following classes.
• Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: tally the data.
Step 9: Write the numeric values for the tallies in the frequency
column
• Step 10: Find cumulative frequency.
• Step 11: Find relative frequency or/and relative
cumulative frequency.
• The complete frequency distribution follows:
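The steps of this example can be sketched in Python as follows. This is a minimal illustration, not the only way to do it: it assumes the usual statement of Sturges' rule, k = 1 + 3.322 log10(n), rounds both k and the class width up, and takes the minimum observation as the starting point, as in Step 5. The names `rows`, `freq`, `cum_freq` and `rel_freq` are chosen here for illustration.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U = 1  # unit of measurement: the data are whole numbers

n = len(data)
R = max(data) - min(data)                   # Step 2: range
k = math.ceil(1 + 3.322 * math.log10(n))    # Step 3: Sturges' rule (assumed form), rounded up
w = math.ceil(R / k)                        # Step 4: class width, rounded up

rows = []
cum = 0
for i in range(k):
    lower = min(data) + i * w               # Step 5: lower class limit
    upper = lower + w - U                   # Step 6: upper class limit
    freq = sum(lower <= x <= upper for x in data)   # Steps 8-9: tally and count
    cum += freq                             # Step 10: "less than" cumulative frequency
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - U / 2, upper + U / 2),  # Step 7
        "freq": freq,
        "cum_freq": cum,
        "rel_freq": freq / n,               # Step 11
    })

for r in rows:
    print(r["limits"], r["boundaries"], r["freq"], r["cum_freq"], f'{r["rel_freq"]:.2f}')
```

For these 20 observations the class frequencies come out as 2, 2, 7, 4, 3, 2, which sum to 20 as they should.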
Diagrammatic and Graphic
presentation of data.
• These are techniques for presenting data in visual
displays using geometric figures and pictures, mainly for
categorical (qualitative) types of data.
Importance:
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.
Diagrams are appropriate for presenting discrete data.
• The three most commonly used diagrammatic presentation
for discrete as well as qualitative data are:
• Pie charts
• Pictogram
• Bar charts
Pie chart
A pie chart is a circle that is divided in to sections or wedges
according to the percentage of frequencies in each category
of the distribution. The angle of each sector is obtained using:
angle of the sector = RF × 360°, where RF is the relative frequency of the category.
• Example: Draw a suitable diagram to represent the following
population in a town.
Men Women Girls Boys
2500 2000 4000 1500
Solutions:
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and
write its name corresponding percentage.
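Steps 1 and 2 can be sketched in Python. This is a minimal illustration using the town population figures above; the dictionary names `population` and `sectors` are chosen here for illustration.

```python
# Population of the town by group (from the example)
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

# For each group: (percentage, angle of the sector in degrees)
sectors = {}
for group, count in population.items():
    percent = 100 * count / total   # Step 1: percentage
    angle = 360 * count / total     # Step 2: relative frequency times 360 degrees
    sectors[group] = (percent, angle)

for group, (percent, angle) in sectors.items():
    print(f"{group:<6} {percent:5.1f}%  {angle:6.1f} degrees")
```

The four angles (90, 72, 144 and 54 degrees) add up to 360, which is a quick check before drawing the chart.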
[Figure: pie chart with four sectors, labelled 1 to 4]
Bar Charts
 The frequency distribution of a categorical variable is often
presented graphically as a bar chart or pie chart.
 Bar charts: display the frequency distribution for nominal
or ordinal data.
 In a bar chart the various categories into which the
observation fall are represented along horizontal axis and a
vertical bar is drawn above each category such that the
height of the bar represents either the frequency or the
relative frequency of observation within the class.
 The vertical axis should always start from 0 but the
horizontal can start from anywhere.
• There are different types of bar charts. The most common
being :
• Simple bar chart
• Component or sub divided bar chart.
• Multiple bar charts.
Simple bar chart:
Are used to display data on one variable.
They are thick lines (narrow rectangles) having the same
breadth.
The magnitude of a quantity is represented by the height /length
of the bar.
• Example: The following data represent sales
by product in 1957 for a given company with
three products A, B, C.
Product   Sales ($) in 1957
A         12
B         24
C         24
• Component Bar chart
-When there is a desire to show how a total (or aggregate) is
divided in to its component parts, we use component bar chart.
• Example: The following data represent sales by product,
1957-1959, for a given company with three products A, B, C.
Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  28                  18
C         24                  30                  36
• Multiple Bar charts
- These are used to display data on more than
one variable.
- They are used for comparing different
variables at the same time. Example: Draw a
multiple bar chart to represent the sales
by product from 1957 to 1959.
Graphical Presentation of data
The histogram, frequency polygon and cumulative frequency
graph or ogive are most commonly applied graphical
representation for continuous data.
Procedures for constructing statistical graphs:
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative
frequencies and label it on the Y axes.
• Represent the class boundaries for the histogram or ogive or
the mid points for the frequency polygon on the X axes.
• Plot the points.
• Draw the bars or lines to connect the points.
Stem-and-Leaf
Represents data by separating each value into
two parts: the stem (the left most digit) and
the leaf (such as the rightmost digit).
• Are most effective with relatively small data
sets
• Are not suitable for reports and other
communications,
• Help researchers to understand the nature of
their data
Example: 43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29,
36, 66, 72, 41
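A stem-and-leaf display for the example values can be sketched in Python. This is a minimal illustration that assumes two-digit values, so the tens digit is the stem and the units digit is the leaf; the name `stems` is chosen here for illustration.

```python
from collections import defaultdict

values = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]

# Group the leaves (units digits) under their stems (tens digits)
stems = defaultdict(list)
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    stems[stem].append(leaf)

# Print one row per stem, leaves in increasing order
for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

Sorting the values first keeps the leaves within each stem in increasing order, which makes the shape of the distribution easier to read.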
Histogram
• A graph which displays the data by using vertical bars of various heights
to represent frequencies.
• Class boundaries are placed along the horizontal axis.
• Class marks and class limits are sometimes used as the quantity on the X
axis.
Table 3: Distribution of the age of women at the time of marriage
Age group No. of women
15-19 11
20-24 36
25-29 28
30-34 13
35-39 7
40-44 3
45-49 2
[Figure: histogram of the age of women at the time of marriage. X axis: age group, with class boundaries 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5. Y axis: number of women, scaled from 0 to 40.]
Frequency Polygon
• Frequency Polygon : - A line graph. The
frequency is placed along the vertical axis
and classes mid points are placed along the
horizontal axis.
• It is customary to add the next higher and next lower
class intervals, each with a frequency
of zero, to make it a complete
polygon. Example: Draw a frequency
polygon for the above data.
Ogive (cumulative frequency polygon)
- A graph showing the cumulative frequency (less than
or more than type) plotted against upper or lower
class boundaries respectively.
- That is class boundaries are plotted along the
horizontal axis and the corresponding cumulative
frequencies are plotted along the vertical axis.
- The points are joined by a free hand curve.
- Example: Draw an ogive curve(less than type) for the
above data.
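The points of a "less than" ogive for Table 3 (age of women at the time of marriage) can be sketched in Python. This is a minimal illustration; starting the curve at the lowest class boundary with a cumulative frequency of zero is a common convention assumed here, and the names `less_than_cf` and `points` are chosen for illustration.

```python
from itertools import accumulate

# Table 3: upper class boundaries and class frequencies
upper_boundaries = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [11, 36, 28, 13, 7, 3, 2]

# "Less than" cumulative frequency at each upper class boundary
less_than_cf = list(accumulate(frequencies))

# Points to plot: (class boundary, cumulative frequency),
# starting from the lowest boundary with a count of zero (assumed convention)
points = [(14.5, 0)] + list(zip(upper_boundaries, less_than_cf))

for boundary, cf in points:
    print(boundary, cf)
```

The last cumulative frequency equals the total number of women (100), which is a useful check on the table.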
Chapter 3
Numerical summary measures
A. Measures of location
• It is often useful to summarize, in a single number or
statistic, the general location of the data or the point at
which the data tend to cluster.
• Such statistics are called measures of location or
measures of central tendency.
• We describe three of them: the mean, the median and the mode.
Arithmetic mean
• The arithmetic mean, usually abbreviated to
‘mean’ is the sum of the observations divided
by the number of observations.
• We use the following data set of 10 numbers
to illustrate the computations:
19 21 20 20 34 22 24 27 27 27
• Then, mean = (19 + 21 + … + 27)/10 = 241/10 = 24.1
• General formula
a) Ungrouped mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
where n is the number of observations and $x_i$ is the ith observation.
Estimation of the mean from a grouped frequency distribution
In calculating the mean from grouped data, we assume that all
values falling into a particular class interval are located at the
mid-point of the interval.
It is calculated as follows:
$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$
where,
k = the number of class intervals
x_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval
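As an illustration, the grouped mean can be computed for Table 3 (age of women at the time of marriage). This is a minimal sketch: it weights each class mid-point by its frequency and divides by the total frequency, and the names `classes`, `midpoints` and `grouped_mean` are chosen here for illustration.

```python
# Table 3: class limits and frequencies
classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [11, 36, 28, 13, 7, 3, 2]

# Mid-point of each class interval (average of its lower and upper limits)
midpoints = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)  # total frequency
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

print(grouped_mean)  # 26.3
```

Because every value in a class is treated as if it sat at the mid-point, the result is an estimate of the mean of the raw data, not its exact value.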
Properties of the arithmetic mean
• The mean can be used as a summary measure for both
discrete and continuous data, in general however, it is not
appropriate for either nominal or ordinal data.
• For given set of data there is one and only one arithmetic
mean.
• The arithmetic mean is easily understood and easy to
compute.
• Algebraic sum of the deviations of the given values from their
arithmetic mean is always zero.
• The arithmetic mean is greatly affected by the extreme values.
• In grouped data if any class interval is open, arithmetic mean
can not be calculated.
Reading Assignment
A. Weighted mean
B. Correct and wrong mean
C. Combined mean
D. Geometric mean
E. Harmonic mean
Median
• With the observations arranged in increasing or decreasing
order, the median is defined as the middle observation.
• If the number of observations is odd, the median is defined as
the [(n+1)/2]th observation.
• If the number of observations is even, so that there is no
middle observation, the median is defined as the average of
the two middle observations, i.e. the average of the (n/2)th and
[(n/2)+1]th values.
• Example : 19 20 20 21 22 24 27 27 27 34
• Then, the median = (22 + 24)/2 = 23
Median for Grouped data
In calculating the median from grouped data, we assume that the
values within a class-interval are evenly distributed through the
interval.
– The first step is to locate the class interval in which it is located.
– Find n/2 and identify the first class interval whose cumulative
frequency is at least n/2; this is the median class.
– The median is then
$\text{Median} = LCB + \left(\frac{n/2 - F_c}{f_c}\right) W$
• where:
• LCB = lower class boundary of the median class
• Fc = cumulative frequency of the class just before the median class
• fc = frequency of the median class
• W = class width and n = number of observations.
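As an illustration, the grouped median can be computed for Table 3 (age of women at the time of marriage). This is a minimal sketch assuming the usual interpolation formula, Median = LCB + ((n/2 - Fc)/fc) × W, with the symbols defined above; the unit of measurement U = 1 and the helper names are assumptions made here for illustration.

```python
from itertools import accumulate

# Table 3: lower class limits and frequencies
lower_limits = [15, 20, 25, 30, 35, 40, 45]
freqs = [11, 36, 28, 13, 7, 3, 2]
U = 1   # unit of measurement (ages recorded in whole years)
W = 5   # class width

n = sum(freqs)
cum = list(accumulate(freqs))   # "less than" cumulative frequencies

# Median class: first class whose cumulative frequency reaches n/2
m = next(i for i, c in enumerate(cum) if c >= n / 2)

LCB = lower_limits[m] - U / 2       # lower class boundary of the median class
Fc = cum[m - 1] if m > 0 else 0     # cumulative frequency just before the median class
fc = freqs[m]                       # frequency of the median class

median = LCB + (n / 2 - Fc) / fc * W
print(round(median, 2))
```

Here n/2 = 50 falls in the 25-29 class (boundaries 24.5 to 29.5), so the median is a little above 25 years.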
Properties of median
• The median can be used as a summary measure for discrete
and continuous data, in general however, it is not
appropriate for nominal data.
• There is only one median for a given set of data
• The median is easy to calculate
• Median is a positional average and hence it is not drastically
affected by extreme values
• Median can be calculated even in the case of open end
intervals
Mode
• Any observation of a variable at which the distribution reaches
a peak is called a mode.
• Most distributions encountered in practice have one peak and
are described as uni-modal.
• E.g. Consider the example of ten numbers
19 21 20 20 34 22 24 27
27 27
• In the above data set, the mode is 27, because the value 27
occurs three times (the most frequent).
• The mode of grouped data, usually refers to the modal class,
where the modal class is the class interval with the highest
frequency.
• If a single value for the mode of grouped data must be
specified, it is taken as the mid point of the modal class
interval
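The three measures of location can be checked for the ten-number data set with Python's standard `statistics` module. This is a minimal sketch; it simply confirms the hand calculations shown in this chapter.

```python
import statistics

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

mean = statistics.mean(data)      # sum of observations divided by their number
median = statistics.median(data)  # average of the two middle values when n is even
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

The mean (24.1) is pulled above the median (23) by the single large value 34, which illustrates how the mean reacts to extreme values while the median does not.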
Properties of mode
• The mode can be used as a summary measure for nominal,
ordinal, discrete and continuous data, in general however, it is
more appropriate for nominal and ordinal data.
• It is not affected by extreme values
• It can be calculated for distributions with open end classes
• Often its value is not unique
• The main drawback of mode is that often it does not exist
Proportion
• As we have seen from the previous section, a variable can be
either categorical or numerical
• Proportion is one of the summary measures for a categorical variable,
and also for a numerical variable if we are counting the number of
cases in a specific category.
• If we denote “x” as the number of successes in an experiment and
“n” as the number of trials, then the proportion of successes in n
trials is given by x/n.
Percentiles, Quartiles and Inter-quartile Range
• The quartiles are the values which divide the ordered distribution into
four parts such that there are an equal number of observations
in each part.
– Q1 = [(n+1)/4]th observation
– Q2 = [2(n+1)/4]th observation
– Q3 = [3(n+1)/4]th observation
• The inter-quartile range is the difference between the third and
the first quartiles.
– Q3 - Q1
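Example 1: We use the data set of 11 numbers:
19 21 20 20 34 22 24 27 27 27 28
– Arranged in increasing order: 19 20 20 21 22 24 27 27 27 28 34
– The first quartile is the 3rd observation, 20, and the third quartile is the 9th observation, 27
– The inter quartile range = 27 – 20 = 7.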
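The positional rule for quartiles can be sketched in Python. This is a minimal illustration of the [j(n+1)/4]th-observation rule; the function name `quartile` is hypothetical, and the linear interpolation used when the position is not a whole number is an assumption made here (the example data never need it). The sketch does not guard against very small samples.

```python
def quartile(ordered, j):
    """Return Q_j as the [j(n+1)/4]-th value of an already sorted list."""
    n = len(ordered)
    pos = j * (n + 1) / 4        # 1-based position of the quartile
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        return ordered[whole - 1]
    # Assumed convention: interpolate between the two neighbouring observations
    return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])


data = sorted([19, 21, 20, 20, 34, 22, 24, 27, 27, 27, 28])

q1 = quartile(data, 1)
q2 = quartile(data, 2)
q3 = quartile(data, 3)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

With n = 11 the positions are 3, 6 and 9, so the quartiles are observed values and no interpolation takes place.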
• Percentiles divide the data into 100 parts with an equal number of
observations in each part.
• It follows that the 25th percentile is the first quartile, the 50th
percentile is the median and the 75th percentile is the third
quartile.
B. Measures of Dispersion (Variation)
• The scatter or spread of items of a distribution is known as
dispersion or variation.
• In other words the degree to which numerical data tend to
spread about an average value is called dispersion or
variation of the data.
• The most commonly used measures of dispersions are:
1) Range and relative range
2) Quartile deviation and coefficient of Quartile deviation
3) Mean deviation and coefficient of Mean deviation
4) variance
5) Standard deviation and coefficient of variation.
Range
• The range is the largest score minus the smallest
score.
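• It is a quick and dirty measure of variability,
because it depends on only two observations and is
greatly affected by extreme scores.
Relative Range (RR)
• It is also sometimes called the coefficient of range and
is given by:
$RR = \frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}} = \frac{R}{x_{\max} + x_{\min}}$
where $x_{\max}$ is the largest observation, $x_{\min}$ is the smallest observation, and R is the range.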
• Example:
1. If the range and relative range of a series are 4 and
0.25 respectively, what are the values of the smallest
and largest observations?
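A worked solution can be sketched as follows. It assumes the usual definition of the relative range, the range divided by the sum of the largest and smallest observations, and writes these two observations as $x_{\max}$ and $x_{\min}$ (notation chosen here for illustration).

```latex
\begin{align*}
R  &= x_{\max} - x_{\min} = 4 \\
RR &= \frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}} = 0.25
      \;\Longrightarrow\; x_{\max} + x_{\min} = \frac{4}{0.25} = 16 \\
\text{Adding the two equations: } 2\,x_{\max} &= 20
      \;\Longrightarrow\; x_{\max} = 10 \\
x_{\min} &= 16 - 10 = 6
\end{align*}
```

So, under this definition, the smallest observation is 6 and the largest is 10; checking back, the range is 10 - 6 = 4 and the relative range is 4/16 = 0.25.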
The Quartile Deviation (Semi-inter quartile range)
The inter quartile range is the difference between the
third and the first quartiles of a set of items and
semi-inter quartile range is half of the inter quartile
range.
Variance
• variance is the "average squared deviation from the mean".
• A good measure of dispersion should make use of all the data.
• For the case of frequency distribution it is expressed as:
the variance is limited as a descriptive statistic because it is not
in the same units as in the observations.
Cont….
For the case of frequency distribution it is
expressed as:
Coefficient of Variation (C.V)
It is defined as the ratio of the standard deviation to
the mean, usually expressed as a percentage.
The distribution with the smaller C.V is said to be less
variable or more consistent.
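The measures of dispersion listed in this section can be computed in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical, the relative range is taken as (L − S)/(L + S), and the sample variance (dividing by n − 1) is an assumption about which version of the variance is intended.

```python
import statistics

# Hypothetical sample of ages in years (illustrative data, not from the slides).
ages = [18, 20, 20, 22, 24, 25, 27, 27, 30]

largest, smallest = max(ages), min(ages)
value_range = largest - smallest
relative_range = value_range / (largest + smallest)

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(ages, n=4)
quartile_deviation = (q3 - q1) / 2      # semi-interquartile range

mean = statistics.mean(ages)
variance = statistics.variance(ages)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)        # square root of the sample variance
cv_percent = std_dev / mean * 100       # coefficient of variation in percent

print("range:", value_range, "relative range:", round(relative_range, 3))
print("quartile deviation:", quartile_deviation)
print("variance:", round(variance, 2), "SD:", round(std_dev, 2))
print("CV (%):", round(cv_percent, 1))
```

Because the C.V is unit-free, it can be used to compare the variability of two series measured in different units.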
Which measures to use?
• When the distribution of the data is symmetric and unimodal (i.e.
the data are approximately normally distributed), it is usual to
summarize the data using means and standard deviations.
• However when the data are skewed, it is preferable to use the
median and quartiles as summary statistics.
• Median and quartiles are not easily influenced by extreme values
in a skewed distribution unlike means and standard deviations.
• Remark:
– The mean and median of symmetric distribution coincide.
– When the distribution is skewed to the right, its mean is larger than its
median.
– When the distribution is skewed to the left, its mean is smaller than its
median. [See Figures 2(a-c)].
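The remark about the mean and median can be checked on small data sets. This is an illustrative sketch only; the three lists are hypothetical values chosen to be symmetric, right-skewed and left-skewed.

```python
import statistics

# Hypothetical data sets chosen to illustrate each shape.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # one large value pulls the mean up
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]  # one small value pulls the mean down

summary = {}
for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    summary[name] = (statistics.mean(data), statistics.median(data))
    print(name, "mean =", summary[name][0], "median =", summary[name][1])
```

In the skewed examples the median stays close to the bulk of the data while the mean is dragged toward the extreme value, which is why the median is preferred for skewed data.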
Fig. 2(a). Symmetric Distribution: Mean = Median = Mode
Fig. 2(b). Distribution skewed to the right: Mean > Median > Mode
Fig. 2(c). Distribution skewed to the left: Mean < Median < Mode
Chapter 4
Introduction to probability
Objectives
• After completing this chapter, you should be able to
– Determine sample spaces and find the probability of an event
– Understand the different properties of probability
– Explain the various types of probability distributions – emphasis will be
given to the two widely used probability distributions
Introduction
• Many medical decisions are made on a statistical
basis since individuals differ in their reactions to
medications or surgery in an unpredictable way.
• In that case the treatment applied is based on
getting the best outcome for as many patients as
possible
– Life as we experience it consists of a series of events
– “Probability” is a very useful concept and is used in
everyday communication.
Introduction con'td…
 An understanding of probability is fundamental for
 quantifying the uncertainty in the decision-making
process
 drawing conclusions about a population of patients based on known
information about a sample of patients drawn from that population.
Probability can be defined as the chance of an event
occurring.
Many people are familiar with probability from
observing or playing games of chance, such as card
games, slot machines, or lotteries.
Probability theory is used in various fields, such as
insurance, investments, and weather forecasting.
Basic concepts
• The following definitions and terms are used in
studying the theory of probability.
– Random experiment: a chance process that leads to
well-defined results called outcomes, where the result
cannot be predicted in advance. Tossing coins and throwing
dice are examples of random experiments.
– Trial: Performing a random experiment is called a
trial.
– Outcomes: The results of a random experiment are
called its outcomes. When two coins are tossed the
possible outcomes are HH, HT, TH, TT.
Basic concepts con'td….
– Sample space: Each conceivable outcome of an
experiment is called a sample point. The totality of all
sample points is called a sample space and is denoted
by S.
– Event: An outcome or a combination of outcomes of a
random experiment is called an event. It is a subset of
the sample space of a random experiment.
– Equally-likely Approach: If an experiment must result in
n equally likely outcomes, then each possible outcome
must have probability 1/n of occurring.
– Mutually exclusive events: when the occurrence of any
one event excludes the occurrence of the other event.
Mutually exclusive events cannot occur simultaneously.
Basic concepts cont’d…..
• Some sample spaces for various probability experiments are.
• Probability attempts to quantify an uncertain situation and
make the relative likelihood of events more concrete.
• Probability is used to quantify the likelihood, or chance, that an
outcome of a random experiment will occur.
• Probability is a number between 0 and 1 that expresses how likely the
event is to occur.
Basic concepts cont’d…..
• Example: Find the sample space for the gender of the children if
a family has three children. Use B for boy and G for girl.
– Solution: There are two genders, male and female, and each
child could be either gender. Hence, there are eight
possibilities, as shown here.
S= {BBB, BBG, BGB, GBB, GGG, GGB, GBG, BGG}
• Note: the ways to find all possible outcomes of a probability
experiment (the sample space) are:
– by observation and reasoning;
– use a tree diagram (a device consisting of line segments
emanating from a starting point and also from the outcome
point.)
Tree diagram of the above example
Types of probability
1. Classical Probability
If the number of outcomes belonging to an event E is NE, and the total
number of outcomes is N, then the probability of event E is defined
as P(E)=NE/N
Example: A couple wants to have exactly 3 children. Assume that each
child is either a boy or a girl, that both are equally likely, and that
there are no multiple births. List all possible orderings for the three
children and find the probability that exactly two of them will be boys.
Solution: S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}
Then the event E = {BBG, BGB, GBB}
P(E) = 3/8
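The same answer can be obtained by listing the sample space programmatically and counting. A minimal sketch, assuming the eight orderings are equally likely:

```python
from fractions import Fraction
from itertools import product

# Every ordering of three children, each either a boy (B) or a girl (G).
sample_space = ["".join(children) for children in product("BG", repeat=3)]

# Event E: exactly two of the three children are boys.
event = [outcome for outcome in sample_space if outcome.count("B") == 2]

# Classical probability: P(E) = N_E / N
probability = Fraction(len(event), len(sample_space))

print(sample_space)
print(event, probability)
```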
2. Relative Frequency Probabilities
This approach to probability is well-suited to a wide range of scientific
disciplines. It is based on the idea that the underlying probability of
an event can be measured by repeated trials
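As an illustration of the relative frequency idea, the sketch below simulates repeated tosses of a fair coin and reports the proportion of heads. The number of tosses and the random seed are arbitrary choices made for this example.

```python
import random

random.seed(1)        # fixed seed only so the run is repeatable
n_trials = 100_000    # number of simulated coin tosses

# Count a "head" whenever the random number falls below 0.5.
heads = sum(random.random() < 0.5 for _ in range(n_trials))
relative_frequency = heads / n_trials

print(relative_frequency)
```

With more trials the relative frequency tends to settle closer to the underlying probability of 0.5.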
3. Subjective probability
• In this view, probability is treated as a quantifiable level of belief
ranging from 0 (complete disbelief) to 1 (complete belief)
• For instance, an experienced physician may say “this patient has a
50% chance of recovery.”
• The various types of probability are not mutually exclusive, and
fortunately all obey the same mathematical laws, and their methods
of calculation are similar.
• All probabilities are a type of relative frequency: the number of
times something can occur divided by the total number of
possibilities or occurrences.
Rules of Probability
• Any probability assigned must be a nonnegative number.
• The probability of the sample space (the collection of all possible
outcomes) is equal to 1.
• The probability of A or B involves addition.
• P(A or B) = P(A) + P(B) if the two are mutually exclusive.
• The probability of A and B involves multiplication
• P(A and B) = P(A) P(B) if the two are independent
• P( Not A) = 1- P(A)
• P(At least one) = 1- P(none)
• P(none) = P(each event not happening)^number of events, when the events are independent and each has the same probability
Addition Rule
• General rule: if two events, A and B, are not mutually exclusive,
then the probability that event A or event B occurs is:
– P(A or B) = P(A) + P(B) – P(A and B)
– P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
• Special case: when two events, A and B, are mutually exclusive,
then the probability that event A or event B occurs is:
– P(A or B) = P(A) + P(B)
– P(A ∪ B) = P(A) + P(B)
• since P(A and B) = 0 for mutually exclusive events
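The general addition rule can be verified on a small example. The sketch below uses a single roll of a fair die; the events A (even number) and B (number greater than 3) are hypothetical choices that overlap, so the correction term P(A and B) matters.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                    # faces of a fair die
A = {x for x in outcomes if x % 2 == 0}        # even number: {2, 4, 6}
B = {x for x in outcomes if x > 3}             # greater than 3: {4, 5, 6}

def prob(event):
    """Classical probability of an event on equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

direct = prob(A | B)                           # P(A or B) counted directly
by_rule = prob(A) + prob(B) - prob(A & B)      # P(A) + P(B) - P(A and B)

print(direct, by_rule)
```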
Conditional Probabilities
• Sometimes the chance a particular event happens depends on the
outcome of some other event. This applies obviously with many
events that are spread out in time.
• The probability that an event occurs subject to the condition that
another event has already occurred is called conditional probability
• If A and B are events with Pr(A) > 0, the conditional probability of
B given A is
$$\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)}$$
• Given A is independent from B, what is the relationship between
Pr(A|B) and Pr(A)?
• Example: Drug test
          Women   Men
Success   200     1800
Failure   1800    200
A = {Patient is a woman}
B = {Drug fails}
Pr(B|A) = 1800/2000
Pr(A|B) =
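The conditional probabilities in the drug test example can be read directly off the table counts. The sketch below is one way to do this; the dictionary layout and variable names are illustrative choices, and only the four counts come from the table.

```python
from fractions import Fraction

# Counts from the drug test table: counts[outcome][sex]
counts = {
    "success": {"women": 200, "men": 1800},
    "failure": {"women": 1800, "men": 200},
}

total = sum(sum(row.values()) for row in counts.values())
n_women = counts["success"]["women"] + counts["failure"]["women"]
n_failure = sum(counts["failure"].values())
n_women_and_failure = counts["failure"]["women"]

# A = patient is a woman, B = drug fails
p_b_given_a = Fraction(n_women_and_failure, n_women)     # Pr(B | A)
p_a_given_b = Fraction(n_women_and_failure, n_failure)   # Pr(A | B)
p_b = Fraction(n_failure, total)                         # Pr(B)

print(p_b_given_a, p_a_given_b, p_b)
```

Here Pr(B|A) differs from Pr(B), so in this table drug failure and the patient's sex are not independent.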
Conditional probability:
• The conditional probability of an event A given an
event B is present is:
– P(A | B) = P(A ∩ B)/P(B)
• where P(B)≠0
• Joint probability
• The joint probability of an event A and an event B is
– P(A ∩ B) = P(A and B)
• When events A and B are mutually exclusive, then
– P(A and B) = 0
Multiplication rule
• General rule: the multiplication rule specifies the
joint probability as:
• P(A ∩ B) = P(B) P(A | B)
• Special case: When events A and B are independent,
then:
– P(A|B) = P(A)
– P(A ∩ B) = P(A) P(B)
Example
1. Find the probability of at least one male birth in ten consecutive
births, assuming each birth is independently male or female with probability 0.5.
Solution
• P(At least one male) = 1 − P(all females)
• P(all females) = P(each single birth is a female)^10 = (0.5)^10
= 9.77 × 10^-4
• So P(At least one male) = 1 − 9.77 × 10^-4 = 0.999023.
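The complement rule used in this solution translates directly into code. A short sketch, assuming independent births with P(female) = 0.5 as in the solution:

```python
p_female = 0.5     # assumed probability that a single birth is female
n_births = 10      # number of consecutive births

# Independence lets us multiply: P(all females) = 0.5 ** 10
p_all_female = p_female ** n_births

# Complement rule: P(at least one male) = 1 - P(no males)
p_at_least_one_male = 1 - p_all_female

print(p_all_female, p_at_least_one_male)
```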
Exercises
1. 50% of the students in a school weigh more than 140 pounds, but
70% weigh less than 170. If a student is randomly selected, what
is the probability the student will weigh between 140 and 170?
• Solution
• P(w<140) = 0.5, P(w>170) = 0.3, and P(w<140) + P(140<w<170) +
P(w>170) = 1
• Then
• 0.5 + P(140<w<170) + 0.3 = 1, so P(140<w<170) = 1 − 0.8 = 0.2
Probability Distribution
• A probability distribution is a table or an equation
that links each outcome of a statistical experiment
with its probability of occurrence.
• Probability distribution for discrete variable
• Probability distribution for continuous variable
Probability Distribution for discrete Variable
1. Binomial distribution
• A binomial experiment (a sequence of Bernoulli trials) is a statistical
experiment that has the following properties:
• The experiment consists of a fixed number, n, of repeated trials.
• Each trial can result in just two possible outcomes. We call one of these
outcomes a success and the other, a failure.
• The probability of success, denoted by p, is the same on every trial.
• The trials are independent; that is, the outcome on one trial does not
affect the outcome on other trials.
• The probability distribution of this experiment is known as the binomial
probability distribution.
• The binomial distribution describes the distribution of "success" in a series
of trials, that is, out of n trials, what is the probability that k of them
succeed.
Binomial formula
• If the probability of success on an individual trial is p, then the binomial
probability is defined by:
$$P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}$$
• Where
• k = the number of successes
• p = probability of success
• n = the number of trials
• 1 − p = probability of failure
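A minimal sketch of the binomial probability, assuming the standard form P(X = k) = C(n, k) p^k (1 − p)^(n − k). The function name `binomial_pmf` and the example values n = 10, p = 0.5 are illustrative choices, not taken from the slides.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 trials, each with success probability 0.5.
n, p = 10, 0.5
prob_three = binomial_pmf(3, n, p)                        # exactly 3 successes
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))  # all outcomes together

print(prob_three, total)
```

Summing the probabilities over k = 0, 1, ..., n gives 1, which is a quick check that the function describes a probability distribution.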
2. Poisson Distribution
• The Poisson Distribution is a discrete distribution which takes on the values
X = 0, 1, 2, 3, and so on.
• It is often used as a model for the number of events in a specific time
period.
• It is determined by one parameter, lambda. The Poisson random variable
satisfies the following conditions:
• The numbers of successes in two disjoint time intervals are independent.
• The probability of a success during a small time interval is proportional to
the length of that time interval.
• Some of the examples in which Poisson distribution is appropriate are:
➢ birth defects and genetic mutations
➢ rare diseases (like Leukemia, but not AIDS because it is infectious and so
not independent)
➢ car accidents
➢ traffic flow and ideal gap distance, and so on
Poisson formula
• The probability distribution of a Poisson random variable X representing the
number of successes occurring in a given time interval or a specified region of
space is given by the formula:
$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
Where
• x = number of successes per unit time
• e = the base of the natural logarithm
• λ = the expected number of successes per unit time
• If λ is the average number of successes occurring in a given time interval or
region in the Poisson distribution, then the mean and the variance of the
Poisson distribution are both equal to λ.
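The statement that the mean and variance both equal λ can be checked numerically. This sketch assumes the standard Poisson formula P(X = x) = e^(−λ) λ^x / x!; the value λ = 2 and the truncation of the sum at 60 terms are arbitrary choices for illustration.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when the expected count is lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0             # hypothetical expected number of events per unit time
support = range(60)   # terms beyond this are negligibly small for lam = 2

mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

print(round(mean, 6), round(variance, 6))
```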
Probability Distribution for continuous Variables
• If a random variable is a continuous variable, its probability
distribution is called a continuous probability distribution.
• A continuous probability distribution differs from a discrete
probability distribution in several ways:
• The outcome of a continuous random variable is not limited to
categories or counts.
• Because a continuous random variable X can take on an
uncountably infinite number of values, the probability
associated with any single value is zero.
• As a result, a continuous probability distribution cannot be
expressed in tabular form.
Normal distribution
• The normal distribution refers to a family of continuous
probability distributions described by the normal equation:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
where
• X is a normal random variable,
• μ is the mean
• σ is the standard deviation
• π is approximately 3.14159, and e is approximately 2.71828.
• The random variable X in the normal equation is called the
normal random variable.
Characteristics of Normal Distribution
• It links frequency distribution to probability distribution
• Has a Bell Shape Curve and is Symmetric
• It is Symmetric around the mean: Two halves of the curve are the
same (mirror images)
• Hence Mean = Median
• The total area under the curve is 1 (or 100%)
• Normal Distribution has the same shape as Standard Normal
Distribution.
Normal Curve
• The graph of the normal distribution depends on two factors:
✓the mean and the standard deviation.
• The mean of the distribution determines the location of the center of the
graph, and the standard deviation determines the height and width of the
graph.
• When the standard deviation is large, the curve is short and wide; when the
standard deviation is small, the curve is tall and narrow.
• All normal distributions look like a symmetric, bell-shaped curve.
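The effect of the standard deviation on the height of the curve can be seen by evaluating the normal density at the mean. A small sketch using `statistics.NormalDist` from the Python standard library; the three σ values are arbitrary illustrations.

```python
from math import pi, sqrt
from statistics import NormalDist

# Height of the curve at its centre for several standard deviations.
peaks = {}
for sigma in (0.5, 1.0, 2.0):
    curve = NormalDist(mu=0, sigma=sigma)
    peaks[sigma] = curve.pdf(0)          # density at the mean
    print(sigma, round(peaks[sigma], 4))

# The peak height equals 1 / (sigma * sqrt(2 * pi)),
# so doubling sigma halves the height of the curve.
```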
Standard Normal Distribution
• It makes life a lot easier for us if we standardize our normal curve, with a mean
of zero and a standard deviation of 1 unit.
• We can transform all the observations of any normal random variable X with
mean μ and variance σ² to a new set of observations of another normal random
variable Z with mean 0 and variance 1 using the following transformation:
$$Z = \frac{X - \mu}{\sigma}$$
• About 95% of the area under the curve falls within 2 standard deviations
of the mean.
• About 99.7% of the area under the curve falls within 3 standard deviations
of the mean.
• A graph of this standardized (mean 0 and variance 1) normal curve is given
in Graph:
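The areas quoted for the standard normal curve can be checked numerically. The sketch below uses `statistics.NormalDist` to compute the area within 1, 2 and 3 standard deviations of the mean; the 1 standard deviation case is included for comparison.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal distribution

# Area under the curve within k standard deviations of the mean.
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(k, round(area, 4))
```

The computed areas are roughly 0.68, 0.95 and 0.997, in line with the rounded percentages used on the slides.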
Table of normal
• Example 1: Suppose we want to compute the area under the
normal curve to the right of 1.45.
• The table gives the area under the curve between the mean
(z = 0) and a given z. The value is read from the table by
combining the row for 1.4 in the first column with the column
for 0.05 in the first row.
• The green shaded area in the diagram represents the area
between the mean and 1.45 standard deviations above it. The
area of this shaded portion is 0.4265 (or 42.65% of the total
area under the curve).
• The area to the right of 1.45 is therefore 0.5 − 0.4265 = 0.0735.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
Example 2
• Assume the heart rate (H.R) in normal healthy individuals is
normally distributed with Mean = 70 and Standard Deviation = 10 beats/min.
Then:
1) What area under the curve is above 80 beats/min?
We know that Z = (X − M)/SD,
where X = 80, M = 70, SD = 10.
So we have to find the value of Z.
For this we need to draw the figure and find the area which corresponds to Z.
• Since M = 70 and SD = 10, the area under the curve above 80 beats
per minute corresponds to the area more than 1 standard deviation
above the mean.
• Substituting the values in the formula Z = (X − M)/SD gives
Z = (80 − 70)/10 = +1.00.
• The table value for z = 1.00 is 0.3413, which is the area between
the mean and z = 1.00. The area above z = 1.00 is therefore
0.5 − 0.3413 = 0.1587, or about 15.9%. How do we interpret this?
• This means that 15.9% of normal healthy individuals have a heart
rate more than one standard deviation above the mean (greater than
80 beats per minute).
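[Figure: Diagram of Exercise. Normal curve with the horizontal axis marked -3, -2, -1, μ, 1, 2, 3 and the area under each section labelled; the area above +1 is 0.159.]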
Example …
2) What area of the curve is above 90
beats/min?
3) What area of the curve is between
50-90 beats/min?
4)What area of the curve is above 100
beats/min?
5) What area of the curve is below 40 beats per
min or above 100 beats per min?
Solution
2) 2.3% or 0.023
3) 95.4% or 0.954
4) 0.15% or 0.0015
5) 0.3% or 0.003 (0.0015 for each tail)
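These answers can be cross-checked without a table. The sketch below uses `statistics.NormalDist` with the mean of 70 and standard deviation of 10 from the example. The slide answers rest on the rounded 68-95-99.7 percentages, so the exact tail areas beyond 3 standard deviations come out slightly smaller (about 0.13% per tail).

```python
from statistics import NormalDist

heart_rate = NormalDist(mu=70, sigma=10)   # beats per minute

above_80 = 1 - heart_rate.cdf(80)                          # z > 1
above_90 = 1 - heart_rate.cdf(90)                          # z > 2
between_50_90 = heart_rate.cdf(90) - heart_rate.cdf(50)    # -2 < z < 2
above_100 = 1 - heart_rate.cdf(100)                        # z > 3
below_40_or_above_100 = heart_rate.cdf(40) + above_100     # both tails beyond 3 SD

for label, value in [("above 80", above_80),
                     ("above 90", above_90),
                     ("between 50 and 90", between_50_90),
                     ("above 100", above_100),
                     ("below 40 or above 100", below_40_or_above_100)]:
    print(label, round(value, 4))
```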
Example 3
• Suppose scores on an IQ test are normally distributed. If the test has a
mean of 100 and a standard deviation of 10, what is the probability that a
person who takes the test will score between 90 and 110?
• Solution:
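A sketch of one way to work this, assuming the standardization Z = (X − μ)/σ and the table entry 0.3413 for z = 1.00:

```latex
\begin{aligned}
z_1 &= \frac{90 - 100}{10} = -1, \qquad z_2 = \frac{110 - 100}{10} = 1,\\
P(90 < X < 110) &= P(-1 < Z < 1) = 2 \times 0.3413 = 0.6826 .
\end{aligned}
```

So about 68% of people taking the test would be expected to score between 90 and 110.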
Application/Uses of Normal Distribution
• Its application goes beyond describing distributions
• It is used by researchers and modelers.
• The major use of normal distribution is the role it plays
in statistical inference.
• The z-score, along with the t-score, chi-square and F-statistics,
is important in hypothesis testing.
• It helps managers/management make decisions.
Exercises
• Find the probability under the normal curve of the
following:
• The area greater than 1.25
• The area lower than 0.87
• The area greater than -2.36
• The area lower than -1.96
• The area between 0.87 and 1.25
• The area between -1.96 and 1.25
• The value of z which cuts the lower 20%
• The value of z which cuts the upper 10%
• The values of z which cut the middle 80%
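Exercises of this kind involve two operations: finding an area from a z value, and finding a z value from an area. As a hedged illustration of both, the sketch below works a few of the cases with `statistics.NormalDist`, using `cdf` for areas and `inv_cdf` for cut-off values; the remaining cases follow the same pattern.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Areas from z values: cdf(a) is the area to the left of a.
area_greater_than_1_25 = 1 - z.cdf(1.25)
area_between_0_87_and_1_25 = z.cdf(1.25) - z.cdf(0.87)

# z values from areas: inv_cdf(p) is the z with area p to its left.
z_lower_20 = z.inv_cdf(0.20)                        # cuts off the lower 20%
z_upper_10 = z.inv_cdf(0.90)                        # cuts off the upper 10%
z_middle_80 = (z.inv_cdf(0.10), z.inv_cdf(0.90))    # bounds of the middle 80%

print(round(area_greater_than_1_25, 4), round(area_between_0_87_and_1_25, 4))
print(round(z_lower_20, 2), round(z_upper_10, 2))
```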
Chapter 5
Sampling methods and sample size
determination
• Sampling is a procedure by which some members of
the given population are selected as representative
of the entire population
➢ Theoretical population
➢ Target population
➢ Study population
➢ sampling frame
➢ Sampling unit
➢ Study unit
Hierarchy of sampling
Error in sampling
• No sample is the exact mirror image of the population
• There are two potential sources of error in research.
1. Sampling error is any type of bias that is attributable to
mistakes in either drawing a sample or determining the
sample size
➢ Sampling error (chance) cannot be avoided or totally
eliminated
➢ The causes are
– Chance: the error that occurs just because of bad
luck
– Design error
– Unrepresentativeness of the sample
Error in sampling con’d..
2. Non-sampling error is any error which will be
committed during data collection, coding, entry, and
so on
✓ Observational error
✓ Respondent error
✓ Lack of preciseness of definition
✓ Error in editing and tabulation of the data
Advantage of sampling
➢ Feasibility: it may be the only feasible method
of collecting data
➢ Reduced cost: sampling reduces demands on
resources such as finance, personnel and
material
➢ Greater accuracy: sampling may lead to better
accuracy of collecting data/detailed
information
➢ Greater speed: data can be collected and
summarized more quickly
Disadvantage of sampling
• There is always sampling error
• Sampling may create a feeling of discrimination within
the population
• It may be inadvisable where every unit in the
population is legally required to have a record
• Demands more rigid control in undertaking sample
operation
• The presence of bias creates difference between the
parameter and the statistic
• Minority sub-groups that are small in number often
make the study's results for them suspect.
• Sample results are good approximations at best.
Types of sampling
I. Probability sampling
• Is any method of sampling that utilizes some form of random
selection.
• probability sampling is a procedure for sampling from a
population in which
– The selection of a sample unit is based on chance
– Every element of the population has a known and non-zero
probability of being selected
– Random sampling helps produce representative samples by
eliminating voluntary response bias and guarding against
undercoverage bias
❖ In simple random sampling, every individual of the target population
has an equal chance of being included in the sample.
1. Simple random sample (SRS)
• Objective: To select n units out of N
• If the population is homogenous
• If frame is available
• If the study area is not very wide
– Note: Homogeneity refers to the similarity of the population with regard
to the outcome variable .
• Procedure:
✓Use a table of random numbers: takes on values
0,1,2,…….,9 with equal probability
✓ a computer random number generator
✓ mechanical device to select the sample.
✓RAND() function from Excel sheet if frame is available
✓Lottery method
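The "computer random number generator" option can be sketched with Python's standard `random` module. The population size N = 500, sample size n = 20 and the seed are hypothetical values chosen for illustration; the sampling frame is assumed to be a list of units numbered 1 to N.

```python
import random

random.seed(42)   # fixed seed only so the example is repeatable

N, n = 500, 20    # hypothetical population size and sample size
sampling_frame = list(range(1, N + 1))      # units numbered 1..N

# random.sample draws n distinct units, i.e. sampling without replacement,
# so every unit in the frame has the same chance of selection.
sample = random.sample(sampling_frame, n)

print(sorted(sample))
```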
2. Stratified random sampling
• Stratified Random Sampling involves dividing your population
into homogeneous subgroups and then taking a simple random
sample in each subgroup.
• Objective: Divide the population into non-overlapping groups
(i.e., strata) N1, N2, N3, ... Ni, such that N1 + N2 + N3 + ... + Ni = N.
Then do a simple random sample within each stratum, depending on the
type of allocation
➢ Proportional allocation: $n_i = \dfrac{n}{N} \times N_i$
➢ Equal allocation
Example:
➢ An agency has clients from three ethnic groups and the agency
wants to assess clients' views of quality of service for the last year.
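Proportional allocation followed by a simple random sample within each stratum can be sketched as follows. The stratum names, the stratum sizes and the total sample size are hypothetical; the allocation uses n_i = (n / N) × N_i.

```python
import random

random.seed(11)   # fixed seed only so the example is repeatable

# Hypothetical strata sizes N_i (illustrative numbers only).
strata = {"group_1": 500, "group_2": 300, "group_3": 200}
N = sum(strata.values())   # total population size
n = 100                    # total sample size

# Proportional allocation: n_i = (n / N) * N_i.
# Rounding can make the n_i miss n by a unit or two in general;
# with these numbers the allocation is exact.
allocation = {name: round(n * size / N) for name, size in strata.items()}

# Simple random sample of unit numbers 1..N_i within each stratum.
samples = {
    name: random.sample(range(1, strata[name] + 1), allocation[name])
    for name in strata
}

print(allocation)
```

With equal allocation, each stratum would instead receive n divided by the number of strata, regardless of its size.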
Stratified random sampling
3. Systematic random sampling
▪ Here are the steps you need to follow in order to achieve a
systematic random sample:
❖number the units in the population from 1 to N
❖decide on the n (sample size) that you want or need
❖k = N/n = the interval size
❖randomly select an integer between 1 and k as the starting point
❖then take every kth unit from that starting point
• Assumptions
– Homogeneous population
– Frame is not available
– If the study area is not very wide
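The steps above can be sketched in a few lines. The values N = 100 and n = 20 are hypothetical and were chosen so that the interval k = N/n is a whole number.

```python
import random

random.seed(7)    # fixed seed only so the example is repeatable

N, n = 100, 20    # hypothetical population size and sample size
k = N // n        # sampling interval

# Random start between 1 and k (inclusive), then every kth unit.
start = random.randint(1, k)
sample = list(range(start, N + 1, k))

print(k, start, sample)
```

When N is not an exact multiple of n, the interval has to be rounded and the resulting sample size can differ slightly from n.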
Systematic random sampling
• Example
Wullo S. 136
Tuesday, December 26, 2023
Wullo S. 137
Tuesday, December 26, 2023
Wullo S. 138
Tuesday, December 26, 2023
4. Cluster (area) random sampling
• The problem with random sampling methods when we have to
sample a population that is dispersed across a wide geographic
region is that we will have to cover a lot of ground geographically
in order to get to each of the units we sampled
• In cluster sampling, we follow these steps:
✓ divide population into clusters (usually along geographic
boundaries)
✓ randomly sample clusters
✓ measure all units within sampled clusters
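The three steps of cluster sampling can be sketched as below. The population is hypothetical: 10 clusters (labelled as villages) of 8 households each, from which 3 clusters are drawn at random and every household in them is taken.

```python
import random

random.seed(3)    # fixed seed only so the example is repeatable

# Step 1: hypothetical population divided into 10 clusters of 8 households.
clusters = {
    f"village_{c}": [f"v{c}_hh{h}" for h in range(1, 9)]
    for c in range(1, 11)
}

# Step 2: randomly sample clusters (here 3 of the 10).
chosen = random.sample(sorted(clusters), 3)

# Step 3: measure all units within the sampled clusters.
sample = [unit for name in chosen for unit in clusters[name]]

print(chosen, len(sample))
```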
Example
• In the figure we see a map of the counties in New York State.
Let's say that we have to do a survey of town governments that
will require us going to the towns personally. If we do a simple
random sample state-wide we'll have to cover the entire state
geographically
5. Multi-stage sampling
• The four methods we've covered so far -- simple, stratified,
systematic and cluster -- are the simplest random sampling
strategies
• When we combine sampling methods, we call this multi-stage
sampling
• Consider the problem of sampling students in grade schools.
We might begin with a national sample of school districts
stratified by educational level. Within selected districts, we
might do a simple random sample of schools. Within schools,
we might do a simple random sample of classes or grades.
And, within classes, we might even do a simple random
sample of students. In this case, we have three or four stages
in the sampling process and we use both stratified and simple
random sampling.
II. Non probability sampling
• Non-probability sampling does not involve random selection.
• Does that mean that non-probability samples aren't representative
of the population? Not necessarily.
• It does mean that non-probability samples cannot depend upon the
rationale of probability theory.
• Most sampling methods are purposive in nature because we usually
approach the sampling problem with a specific plan in mind.
Convenience sampling
Example
➢ Man in the street (attitude of foreigners about Ethiopia)
➢ College students for psychological study
Purposive sampling
• In purposive sampling, we sample with a purpose in mind
1. Modal Instance Sampling
✓ In statistics, the mode is the most frequently occurring value
in a distribution.
✓ In modal instance sampling, we sample the most frequent case,
or the "typical" case
✓ We could say that the modal voter is a person who is of
average age, educational level, and income in the
population
Purposive sampling…
2. Expert Sampling
✓ Expert sampling involves the assembling of a sample of persons
with known or demonstrable experience and expertise in some
area.
3. Quota Sampling
✓ In quota sampling, you select people non-randomly according to
some fixed quota
✓ There are two types of quota sampling: proportional and
non-proportional
4. Heterogeneity Sampling
✓ We sample for heterogeneity when we want to include all
opinions or views, and we aren't concerned about representing
these views proportionately.
✓ Another term for this is sampling for diversity.
Purposive sampling…
5. Snowball sampling
✓ In snowball sampling, you begin by identifying someone who
meets the criteria for inclusion in your study
✓ You then ask them to recommend others who they may know
who also meet the criteria
✓ Snowball sampling is especially useful when you are trying to
reach populations that are inaccessible or hard to find
✓ For instance, if you are studying the homeless, you are not
likely to be able to find good lists of homeless people within a
specific geographical area. However, if you go to that area and
identify one or two, you may find that they know very well
who the other homeless people in their vicinity are and how
you can find them
Summary
▪ Selecting a sampling method depends on
• Population to be studied
✓ Size and geographic distributions
✓ Heterogeneity with respect to the variable studied
• Resources available
• Level of precision required
• Importance of having a precise estimate of sampling error
Sample size determination
• Determining the sample size for a study is a crucial
component of study design.
• The goal is to include sufficient numbers of subjects so that
statistically significant results can be detected.
• Among the questions that a researcher should ask when
planning a survey or study is "How large a sample do I
need?"
• The answer will depend on the aims, nature and scope of the
study and on the expected result, all of which should be
carefully considered at the planning stage
Sample size determination ….
• In general, sample size depends on
– Objective of the study
– Design of the study
– Plan for statistical analysis
– Accuracy of the measurement to be made
– Degree of precision required for generalization
– Degree of confidence
• We can use the following approaches to determine sample size
– Rules of thumb for determining the sample size
– Statistical formula
• Confidence interval approach
• Hypothesis testing approach
Rules of thumb
• For small populations (N < 100), there is little point in sampling.
Survey the entire population.
• If the population size is around 500 (give or take 100), 50%
should be sampled.
• If the population size is around 1500, 20% should be sampled.
• Beyond a certain point (N = 5000), the population size is
almost irrelevant and a sample size of 400 may be adequate.
• Statistician maximalist: at least 500
• To make generalizations about entire population, need a total
sample size of 200-400 (depending on total population and
confidence level desired)
confidence interval
• A confidence interval for a mean (or proportion) has the form
$$\text{mean (proportion)} \pm z_{\alpha/2} \cdot s.e$$
• Hence the absolute precision denoted by d is given as
$$d = z_{\alpha/2} \cdot s.e$$
• Where s.e is the standard error of the estimator of the
parameter of interest.
• The margin of error (d) measures the precision of the estimate
– Small value of d indicates high precision
– It lies in the interval (0%; 5%]
– For p close to 50%, d is assumed to be close to 5%
– For smaller value of p, d is assumed to be lower than 5%
Estimating a single population mean
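The required sample size is
$$n = \left(\frac{z_{\alpha/2}\, \sigma}{d}\right)^{2}$$
Where the standard deviation σ can be estimated by;
✓ From a previous study, if there is one
✓ From a pilot study
✓ From an educated guess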
Single population proportion
• Let p denote the proportion of success, then
$$n = \frac{z_{\alpha/2}^{2}\, p (1 - p)}{d^{2}}$$
Where the proportion p can be estimated by;
✓ From a previous study, if there is one
✓ From a pilot study
✓ p = 50%, if no estimate is available
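The sample size calculations for a single mean and a single proportion can be sketched as below. This assumes the usual formulas n = (z σ / d)² for a mean and n = z² p(1 − p) / d² for a proportion, which follow from d = z × s.e. The function names and the example inputs (σ = 15, d = 3; p = 0.5, d = 0.05) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided critical value z_(alpha/2) for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma: float, d: float, confidence: float = 0.95) -> int:
    """n = (z * sigma / d)^2, rounded up to the next whole subject."""
    return ceil((z_value(confidence) * sigma / d) ** 2)

def sample_size_proportion(p: float, d: float, confidence: float = 0.95) -> int:
    """n = z^2 * p * (1 - p) / d^2, rounded up to the next whole subject."""
    z = z_value(confidence)
    return ceil(z**2 * p * (1 - p) / d**2)

# Hypothetical inputs at 95% confidence.
n_mean = sample_size_mean(sigma=15, d=3)
n_prop = sample_size_proportion(p=0.5, d=0.05)

print(n_mean, n_prop)
```

Using p = 0.5 gives the largest value of p(1 − p), and therefore the most conservative (largest) sample size for a given margin of error.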
Point to be considered
Example
Chapter six
Inferential statistics
• After completing this session you will be able to do
– Parameter estimations
– Point estimate
– Confidence interval
– Hypothesis testing
– Z-test
– T-test
– Testing associations
– Chi-Square test
Introduction
Introduction cont'd
• Before beginning statistical analyses, it is essential to examine
the distribution of the variable for
– skewness (tails),
– kurtosis (peaked or flat distribution),
– spread (range of the values), and
– outliers (data values separated from the rest of the data).
• Information about each of these characteristics
determines which statistical analyses are appropriate, so that
results can be accurately explained and interpreted.
Introduction cont'd
• Statistical tests can be either parametric or non-parametric
• The pathway for the analysis of continuous variables
is shown below
Sampling Distribution
• The frequency distribution of all these samples forms the sampling
distribution of the sample statistic
Sampling distribution .......
• Three characteristics of the sampling distribution of a statistic:
– its mean
– its variance
– its shape
• If we repeatedly take samples of the same size n from a
population, the means of the samples form a sampling
distribution of means, and the mean of this distribution is equal
to the population mean.
• In practice we do not take repeated samples from a population,
i.e. we do not encounter sampling distributions empirically, but it
is necessary to know their properties in order to draw statistical
inferences.
The Central Limit Theorem
• Regardless of the shape of the frequency distribution of
a characteristic in the parent population, the means of a
large number of samples of size n (independent
observations) from the population will approximately follow
a normal distribution, provided n is large enough. The mean
of the sample means approaches the population mean μ, and
their standard deviation is σ/√n.
• Inferential statistical techniques have various assumptions
that must be met before valid conclusions can be obtained:
– Samples must be randomly selected.
– The sample size must be large (n ≥ 30), or
– the population must be normally or approximately normally distributed if the
sample size is less than 30.
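The theorem can be checked with a small simulation. The sketch below is illustrative only: the skewed Exponential(1) parent population (mean 1, standard deviation 1), the sample size n = 50, the number of repetitions and the seed are all assumptions made for the demonstration, not values from the slides.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from an Exponential(1) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


if __name__ == "__main__":
    n = 50
    means = simulate_sample_means(n, num_samples=4000)
    # The mean of the sample means should be close to the population mean (1),
    # and their standard deviation close to sigma / sqrt(n) = 1 / sqrt(50).
    print("mean of sample means:", round(statistics.fmean(means), 3))
    print("sd of sample means:  ", round(statistics.stdev(means), 3))
    print("sigma / sqrt(n):     ", round(1 / n ** 0.5, 3))
```

A histogram of `means` would look roughly bell-shaped even though the parent population is strongly right-skewed.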
Sampling Distribution......
• E.g. Sampling distribution of the mean
• Suppose we choose a random sample of size n; the
sampling distribution of the sample mean x̄ possesses the
following properties.
– The sample mean x̄ is an estimate of the population
mean μ.
– The standard deviation of x̄ is σ/√n (called the standard
error of the mean).
– Provided n is large enough, the shape of the sampling
distribution of x̄ is normal.
Sampling Distribution ..........
• Proportion
• Suppose we choose a random sample of size n; the sampling
distribution of the sample proportion p possesses the following
properties.
– The sample proportion p is an estimate of the
population proportion.
– The standard deviation of p is $\sqrt{p(1-p)/n}$ (called the
standard error of the proportion).
– Provided n is large enough, the shape of the sampling
distribution of p is normal.
Parameter Estimations
• In parameter estimation, we generally assume that the
underlying (unknown) distribution of the variable of interest is
adequately described by one or more (unknown) parameters,
referred as population parameters.
• As it is usually not possible to make measurements on every
individual in a population, parameters cannot usually be
determined exactly.
• Instead we estimate parameters by calculating the
corresponding characteristics from a random sample; these
are called estimates.
• Estimation is the process of estimating the value of a parameter
from information obtained from a sample. It takes two forms:
– Point estimation
– Interval estimation
Point estimation
• A point estimate is a specific numerical value estimate of a
parameter.
• Sample measures (i.e., statistics) are used to estimate population
measures (i.e., parameters). These statistics are called estimators.
• Point estimate for the population mean µ is
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
• Point estimate for the population proportion is given by
$\hat{p} = \dfrac{x}{n}$
• Where x is the total number of successes (events)
Three Properties of a Good Estimator
• The estimator should be an unbiased estimator. That is, the
expected value or the mean of the estimates obtained from
samples of a given size is equal to the parameter being
estimated.
• The estimator should be consistent. For a consistent
estimator, as sample size increases, the value of the estimator
approaches the value of the parameter estimated.
• The estimator should be a relatively efficient estimator. That
is, of all the statistics that can be used to estimate a parameter,
the relatively efficient estimator has the smallest variance.
Some BLUE estimators
Confidence Interval estimate
• However the value of the sample statistic will vary from
sample to sample therefore, to simply obtain an
estimate of the single value of the parameter is not
generally acceptable.
– We need also a measure of how precise our estimate is likely
to be
– We need to take into account the sample to sample variation
of the statistic
• A confidence interval defines an interval within which
the true population parameter is likely to fall (interval
estimate).
Confidence intervals…
• Confidence interval therefore takes into account the sample to
sample variation of the statistic and gives the measure of
precision.
• The general formula used to calculate a confidence interval is
Estimate ± k × Standard Error, where k is called the reliability coefficient
• Confidence intervals express the inherent uncertainty in any
medical study by expressing upper and lower bounds for
anticipated true underlying population parameter.
• The confidence level is the probability that the interval estimate
will contain the parameter, assuming that a large number of
samples are selected and that the estimation process on the
same parameter is repeated
• Most commonly the 95% confidence intervals are calculated,
however 90% and 99% confidence intervals are sometimes used.
Confidence interval ……
A (1-α)100% confidence interval for the unknown population mean
and population proportion is given as follows:
Mean: $\left[\bar{x} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \ \bar{x} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right]$
Proportion: $\left[p - z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}},\ \ p + z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}\right]$
Interval estimation
Confidence intervals…
• The 95% confidence interval is calculated in such a way that,
under the conditions assumed for underlying distribution, the
interval will contain true population parameter 95% of the time.
• Loosely speaking, you might interpret a 95% confidence interval
as one which you are 95% confident contains the true parameter.
• 90% CI is narrower than 95% CI since we are only 90% certain that
the interval includes the population parameter.
• On the other hand 99% CI will be wider than 95% CI; the extra
width meaning that we can be more certain that the interval will
contain the population parameter. But to obtain a higher
confidence from the same sample, we must be willing to accept a
larger margin of error (a wider interval).
Confidence intervals…
• For a given confidence level (i.e. 90%, 95%, 99%) the
width of the confidence interval depends on the
standard error of the estimate which in turn depends
on the
– 1. Sample size:-The larger the sample size, the narrower the
confidence interval (this is to mean the sample statistic will
approach the population parameter) and the more precise
our estimate. Lack of precision means that in repeated
sampling the values of the sample statistic are spread out or
scattered. The result of sampling is not repeatable.
Confidence intervals…
- To increase precision (of an SRS), use a larger sample.
You can make the precision as high as you want by
taking a large enough sample. The margin of error
decreases as √n increases.
• 2. Standard deviation: The more the variation among
the individual values, the wider the confidence
interval and the less precise the estimate. The standard
deviation itself does not shrink with sample size; it is the
standard error (SD/√n) that decreases as sample size increases.
• z is the value from the standard normal distribution (SND)
• 90% CI, z=1.645
• 95% CI, z=1.96
• 99% CI, z=2.58
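How the margin of error responds to sample size and confidence level can be sketched as below. This is a minimal example assuming a known standard deviation; the function name and the inputs (σ = 10, n = 100 and 400) are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of a z-based confidence interval: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    # Quadrupling n halves the margin of error (it scales with 1 / sqrt(n)).
    for n in (100, 400):
        print(n, round(margin_of_error(10, n), 3))
    # A higher confidence level widens the interval for the same sample.
    for level in (0.90, 0.95, 0.99):
        print(level, round(margin_of_error(10, 100, level), 3))
```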
Confidence interval ……
• If the population standard deviation is unknown and
the sample size is small (<30), the formula for the
confidence interval for sample mean is: x ± t (s/√n)
– x is the sample mean
– s is the sample standard deviation
– n is the sample size
– t is the value from the t-distribution with (n-1) degrees of freedom
Mean Example
• An SRS of 16 apparently healthy subjects yielded the following values of
urine excreted (milligrams per day):
0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005, 0.032, 0.04, 0.009,
0.014, 0.011, 0.022, 0.009, 0.008
• Compute the point estimate of the population mean
If $x_1, x_2, \ldots, x_n$ are n observed values, then $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{0.295}{16} = 0.01844$
• Construct 90%, 95% and 98% confidence intervals for the mean
(sample standard deviation s = 0.0123, √n = 4)
90%: (0.01844-1.65x0.0123/4, 0.01844+1.65x0.0123/4)=(0.0134, 0.0235)
95%: (0.01844-1.96x0.0123/4, 0.01844+1.96x0.0123/4)=(0.0124, 0.0245)
98%: (0.01844-2.33x0.0123/4, 0.01844+2.33x0.0123/4)=(0.0113, 0.0256)
• Note: these intervals use z values. Because n = 16 < 30 and σ is
estimated by s, the t-distribution with 15 degrees of freedom would
give slightly wider intervals.
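The calculation in this example can be reproduced with the sketch below. It follows the slide in using z-based intervals with the sample standard deviation; the function name `mean_ci` is an assumption made for illustration.

```python
import math
import statistics
from statistics import NormalDist

# The 16 observations from the example (milligrams per day).
values = [0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
          0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008]


def mean_ci(data, confidence):
    """z-based confidence interval for the mean, using the sample SD for sigma."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)            # sample SD (n - 1 in the denominator)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    print("point estimate:", round(statistics.fmean(values), 5))
    for level in (0.90, 0.95, 0.98):
        low, high = mean_ci(values, level)
        print(f"{level:.0%} CI: ({low:.4f}, {high:.4f})")
```

Because the code uses unrounded z values, the last digit can differ slightly from hand calculations based on 1.65, 1.96 and 2.33.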
Proportion example
• In a survey of 300 automobile drivers in one city, 123 reported
that they wear seat belts regularly. Estimate the seat belt rate of
the city and 95% confidence interval for true population
proportion.
• Answer: p = 123/300 = 0.41 = 41%, n = 300
Estimate of the seat belt rate of the city at 95% confidence:
CI = p ± z × √(p(1-p)/n) = 0.41 ± 1.96 × √(0.41 × 0.59/300) = 0.41 ± 0.056 = (0.35, 0.47)
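The same interval can be computed programmatically. The sketch below assumes the large-sample normal approximation used on the slide; `proportion_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a single proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


if __name__ == "__main__":
    low, high = proportion_ci(123, 300)   # seat belt survey from the example
    print(f"95% CI: ({low:.2f}, {high:.2f})")
```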
Summary
• Students sometimes have difficulty deciding whether to use
$z_{\alpha/2}$ or $t_{\alpha/2}$ values when finding confidence intervals
HYPOTHESIS TESTING
Introduction
– Researchers are interested in answering many types of questions. For example, a
physician might want to know whether a new medication will lower a person’s
blood pressure.
– These types of questions can be addressed through statistical hypothesis testing,
which is a decision-making process for evaluating claims about a population.
Hypothesis Testing
• The formal process of hypothesis testing provides us with a
means of answering research questions.
• Hypothesis is a testable statement that describes the nature of
the proposed relationship between two or more variables of
interest.
• In hypothesis testing, the researcher must define the
population under study, state the particular hypotheses that will
be investigated, give the significance level, select a sample from
the population, collect the data, perform the calculations
required for the statistical test, and reach a conclusion.
Idea of hypothesis testing
Types of hypotheses
• Null hypothesis (represented by H0) is the statement about the value of the
population parameter. That is, the null hypothesis postulates that ‘there is no
difference between factor and outcome’ or ‘there is no intervention effect’.
• Alternative hypothesis (represented by HA) states the ‘opposing’ view that
‘there is a difference between factor and outcome’ or ‘there is an intervention
effect’.
Methods of hypothesis testing
• Hypotheses are statements about parameters
which may or may not be true
• The three methods used to test hypotheses
are
– The traditional method
– The P-value method (new approach)
– The confidence interval method
Steps in hypothesis testing
1. Identify the null hypothesis H0 and the alternate hypothesis HA.
2. Choose α. The value should be small, usually less than 10%. It is
important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample
data. This value is called the observed value of the test statistic.
Remember that a t statistic is usually appropriate for small samples;
for larger samples, a z statistic can work well if data are normally
distributed.
4. Compare the observed value of the statistic to the critical value
obtained for the chosen α.
5. Make a decision.
6. Conclusion
Test Statistics
 Because of random variation, even an unbiased sample may not
accurately represent the population as a whole.
 As a result, it is possible that any observed differences or
associations may have occurred by chance.
• A test statistic is a value we can compare with the known
distribution of what we expect when the null hypothesis is true.
• The general formula of the test statistic is:
$\text{Test statistic} = \dfrac{\text{Observed value} - \text{Hypothesized value}}{\text{Standard error}}$
• The known distributions are the normal distribution, Student’s t-distribution,
the chi-square distribution, ...
Critical value
• The critical value separates the critical region from the noncritical region
for a given level of significance
Decision making
• Accept (fail to reject) or reject the null hypothesis
• There are 2 types of errors
• Type I error is usually treated as the more serious error; its
probability is the level of significance α
• Power is the probability of rejecting a false null hypothesis and it is
given by 1-β

Type of decision    H0 true                   H0 false
Reject H0           Type I error (α)          Correct decision (1-β)
Accept H0           Correct decision (1-α)    Type II error (β)
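Types of tests
• One tailed test (left): H0: μ = μ0 vs H1: μ < μ0; rejection region of area α in the left tail
• One tailed test (right): H0: μ = μ0 vs H1: μ > μ0; rejection region of area α in the right tail
• Two tailed test: H0: μ = μ0 vs H1: μ ≠ μ0; rejection regions of area α/2 in each tail
• The critical value(s) mark the boundary of the rejection region(s)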
Two tailed test
• Hypotheses: $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$
• Test statistic: $z_{cal} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, with $z_{tabulated} = z_{\alpha/2}$ for a two tailed test
• Decision: if $|z_{cal}| \geq z_{tab}$, reject $H_0$; if $|z_{cal}| < z_{tab}$, do not reject $H_0$
Steps in hypothesis testing…..
If the test statistic falls in the critical
region:
Reject H0 in favour of HA.
If the test statistic does not fall in the
critical region:
Conclude that there is not enough
evidence to reject H0.
One tailed tests
• Test statistic: $z_{cal} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, with $z_{tabulated} = z_{\alpha}$ for a one tailed test
• Case 1: $H_0: \mu = \mu_0$ versus $H_A: \mu < \mu_0$
Decision: if $z_{cal} \leq -z_{tab}$, reject $H_0$; if $z_{cal} > -z_{tab}$, do not reject $H_0$
• Case 2: $H_0: \mu = \mu_0$ versus $H_A: \mu > \mu_0$
Decision: if $z_{cal} \geq z_{tab}$, reject $H_0$; if $z_{cal} < z_{tab}$, do not reject $H_0$
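The one tailed and two tailed decision rules can be sketched in a single function. This is an illustrative sketch, assuming a known population standard deviation; the function name, the `alternative` labels and the numbers in the usage example (sample mean 52, μ0 = 50, σ = 8, n = 64) are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (z_cal, z_tab, reject) for a one-sample z test.

    alternative: "two-sided" (HA: mu != mu0), "less" (HA: mu < mu0)
                 or "greater" (HA: mu > mu0).
    """
    z_cal = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        z_tab = std_normal.inv_cdf(1.0 - alpha / 2.0)
        reject = abs(z_cal) >= z_tab
    elif alternative == "less":
        z_tab = std_normal.inv_cdf(1.0 - alpha)
        reject = z_cal <= -z_tab
    elif alternative == "greater":
        z_tab = std_normal.inv_cdf(1.0 - alpha)
        reject = z_cal >= z_tab
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z_cal, z_tab, reject


if __name__ == "__main__":
    # Hypothetical data: z_cal = (52 - 50) / (8 / 8) = 2.0
    for alt in ("two-sided", "less", "greater"):
        z_cal, z_tab, reject = z_test(52, 50, 8, 64, alternative=alt)
        print(alt, round(z_cal, 2), round(z_tab, 3), "reject H0" if reject else "do not reject H0")
```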
The P- Value
• In most applications, the outcome of performing a hypothesis
test is to produce a p-value.
• The p-value measures how compatible the observed difference is with
chance variation alone, assuming the null hypothesis is true.
• A small p-value implies that the probability of the value observed
occurring just by chance is low when the null hypothesis is true.
• That is, a small p-value suggests that there might be sufficient
evidence for rejecting the null hypothesis.
• The p-value is defined as the probability of observing the
computed test statistic value or a larger one, if H0
is true. For example, $P(Z \geq z_{cal} \mid H_0 \text{ true})$.
P-value……
• A p-value is the probability of getting the
observed difference, or one more extreme, in
the sample purely by chance from a
population where the true difference is zero.
• If the p-value is greater than 0.05 then, by
convention, we conclude that the observed
difference could have occurred by chance and
there is no statistically significant evidence (at
the 5% level) for a difference between the
How to calculate P-value
o Use statistical software like SPSS, SAS, ...
o Hand calculations
– obtain the test statistic (z calculated or t calculated)
– find the probability for the test statistic from the
standard normal table (the area between 0 and the test statistic)
– subtract the probability from 0.5
– the result is the one tailed p-value
Note: if the test is two tailed, multiply the result by 2.
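The hand calculation can be mirrored in code by replacing the table lookup with the standard normal cumulative distribution function. This is a sketch for z statistics only; the function name and the `alternative` labels are assumptions.

```python
from statistics import NormalDist


def p_value_from_z(z_cal, alternative="two-sided"):
    """p-value for a z statistic under the standard normal distribution."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        # Tail area beyond |z_cal|, doubled for the two tails.
        return 2.0 * (1.0 - cdf(abs(z_cal)))
    if alternative == "greater":
        return 1.0 - cdf(z_cal)
    if alternative == "less":
        return cdf(z_cal)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


if __name__ == "__main__":
    print(round(p_value_from_z(1.96), 3))                # about 0.05
    print(round(p_value_from_z(1.645, "greater"), 3))    # about 0.05
```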
P-value and confidence interval
• Confidence intervals and p-values are based upon the same
theory and mathematics and will lead to the same conclusion
about whether a population difference exists.
• Confidence intervals are preferable because they give
information about the size of any difference in the population,
and they also (very usefully) indicate the amount of
uncertainty remaining about the size of the difference.
• When the null hypothesis is rejected in a hypothesis-testing
situation, the confidence interval for the mean using the same
level of significance will not contain the hypothesized mean.
The P- Value …..
• But for what values of p-value should we reject the null
hypothesis?
– By convention, a p-value of 0.05 or smaller is considered
sufficient evidence for rejecting the null hypothesis.
– By using p-value of 0.05, we are allowing a 5% chance of
wrongly rejecting the null hypothesis when it is in fact
true.
• When the p-value is less than or equal to 0.05, we often say that the
result is statistically significant.
SUMMARY
Hypothesis testing for single population mean
• EXAMPLE 1: A researcher reports that the mean IQ of a sample of 16
students is 110, while the expected value for the whole population is 100
with a standard deviation of 10. Test the hypothesis.
• Solution
1. H0: µ=100 vs HA: µ≠100
2. Assume α=0.05
3. Test statistic: z = (110-100)/(10/√16) = 4
4. z-critical at α/2 = 0.025 is equal to 1.96.
5. Decision: reject the null hypothesis since 4 ≥ 1.96
6. Conclusion: the mean IQ of the population is different
from 100 at the 5% level of significance.
Hypothesis testing for single proportions
• Example: In a study of childhood abuse in psychiatric patients, Brown found
that 166 in a sample of 947 patients reported histories of physical or sexual
abuse.
a) Construct a 95% confidence interval
b) Test the hypothesis that the true population proportion is 30%
• Solution (a)
• The 95% CI for P is given by
$p \pm z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}} = 0.175 \pm 1.96\sqrt{\dfrac{0.175 \times 0.825}{947}} = 0.175 \pm 1.96 \times 0.0124 = [0.151;\ 0.2]$
Example……
• To test the hypothesis we need to follow these steps
Step 1: State the hypothesis
Ho: P = Po = 0.3
Ha: P ≠ Po (P ≠ 0.3)
Step 2: Fix the level of significance (α=0.05)
Step 3: Compute the calculated and tabulated values of the test statistic
$z_{cal} = \dfrac{p - P_0}{\sqrt{P_0(1-P_0)/n}} = \dfrac{0.175 - 0.3}{\sqrt{0.3(0.7)/947}} = \dfrac{-0.125}{0.0149} = -8.39, \qquad z_{tab} = 1.96$
Example……
• Step 4: Comparison of the calculated and tabulated values of the
test statistic
• Step 5: Decision: since the absolute calculated value (8.39) is larger
than the tabulated value (1.96), we reject the null hypothesis.
• Step 6: Conclusion
• Hence we conclude that the proportion of childhood abuse in
psychiatric patients is different from 0.3
• If the sample size is small (np<5 or n(1-p)<5), then use Student’s
t-statistic for the tabulated value of the test statistic.
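The test in this example can be checked with the sketch below. It assumes the large-sample z test with the standard error computed under H0; `one_proportion_z` is a hypothetical helper name. The code uses the unrounded proportion 166/947, so the statistic comes out near -8.37 rather than the -8.39 obtained by hand with p rounded to 0.175.

```python
import math
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z statistic for H0: P = p0, using the standard error under H0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    return (p_hat - p0) / standard_error


if __name__ == "__main__":
    z_cal = one_proportion_z(166, 947, 0.3)
    z_tab = NormalDist().inv_cdf(0.975)       # two tailed, alpha = 0.05
    print("z_cal =", round(z_cal, 2))
    print("reject H0" if abs(z_cal) >= z_tab else "do not reject H0")
```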
Two sample mean and proportion
• So far we have seen estimation for only a single mean and a single
proportion. However, it is also possible to compute point and interval
estimates for the difference of two sample means.
• Let x1, x2, …, xn1 be a sample from the first population and y1, y2,
…, yn2 be a sample from the second population.
• Let the sample mean for the first population be $\bar{X}$
• and the sample mean for the second population be $\bar{Y}$
• Then the point estimate for the difference of means (µ1-µ2) is
given by $(\bar{X} - \bar{Y})$
Two sample estimation
• Confidence interval estimation
• A (1-α)100% confidence interval for the
difference of means is given by
$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
• If $\sigma_1$ and $\sigma_2$ are unknown, then they can be estimated
by $s_1$ and $s_2$
Hypothesis testing for two sample means
• The steps to test a hypothesis about the difference of means are the
same as for a single mean
Step 1: State the hypothesis
Ho: µ1-µ2 = 0
VS
HA: µ1-µ2 ≠ 0, or HA: µ1-µ2 < 0, or HA: µ1-µ2 > 0
Step 2: Significance level (α)
Step 3: Test statistic
$z_{cal} = \dfrac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Hypothesis …
• $z_{tabulated} = z_{\alpha/2}$ for a two tailed test; $z_{tabulated} = z_{\alpha}$ for a one tailed test
• For $H_A: \mu_1 - \mu_2 \neq 0$: if $|z_{cal}| \geq z_{tab}$, reject $H_0$; if $|z_{cal}| < z_{tab}$, do not reject $H_0$
• For $H_A: \mu_1 - \mu_2 < 0$: if $z_{cal} \leq -z_{tab}$, reject $H_0$; if $z_{cal} > -z_{tab}$, do not reject $H_0$
• For $H_A: \mu_1 - \mu_2 > 0$: if $z_{cal} \geq z_{tab}$, reject $H_0$; if $z_{cal} < z_{tab}$, do not reject $H_0$
Small sample size and population variance is not given
• The test statistic will be Student’s t-statistic with degrees of
freedom equal to n1+n2 -2
• Hence the tabulated value of t is read from the table.
• The decision rule remains the same
Example
• Researchers wish to know if the data they have collected provide
sufficient evidence to indicate a difference in mean serum uric
acid levels between normal individuals and individuals with Down’s
syndrome. The data consist of serum uric acid readings on 12
individuals with Down’s syndrome and 15 normal individuals. The
means are 4.5 mg/100 ml and 3.4 mg/100 ml with standard
deviations of 2.9 and 3.5 mg/100 ml respectively.
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \neq 0$
SOLUTION
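$z_{cal} = \dfrac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{2.9^2}{12} + \dfrac{3.5^2}{15}}} = \dfrac{1.1}{\sqrt{1.5175}} = \dfrac{1.1}{1.23} = 0.89$
$z_{0.025} = 1.96$
• Since $|z_{cal}| = 0.89 < 1.96$, we do not reject $H_0$ at the 5% level of significance.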
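The two-sample statistic can be recomputed from the summary figures stated in the example (means 4.5 and 3.4, standard deviations 2.9 and 3.5, sample sizes 12 and 15). The sketch below treats the sample standard deviations as if they were the population values, as the z formula does; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 with separate variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - delta0) / standard_error


if __name__ == "__main__":
    z_cal = two_sample_z(4.5, 2.9, 12, 3.4, 3.5, 15)
    z_tab = NormalDist().inv_cdf(0.975)       # two tailed, alpha = 0.05
    print("z_cal =", round(z_cal, 2))
    print("reject H0" if abs(z_cal) >= z_tab else "do not reject H0")
```

With samples this small and unknown population variances, the t-statistic with n1+n2-2 degrees of freedom is the alternative the slides describe; the z version is shown here only because it matches the formula being illustrated.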
Estimation and hypothesis testing for two population proportion
• Let n1 and n2 be the sample sizes from the two populations. If x and
y are the numbers of outcomes of interest, then the point estimate for each
population is given by p1=x/n1 and p2=y/n2 respectively.
• The point estimate of π1-π2 is p1-p2
• If the sample sizes are large (n1p1>5, n1(1-p1)>5, n2p2>5, n2(1-p2)>5), then
the interval estimate for the difference of proportions is given by
$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
Hypothesis testing for two proportions
• To test the hypothesis
Ho: π1-π2 =0
VS
HA: π1-π2 ≠0
The test statistic is given by
$z_{cal} = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$
Small sample size
• If the sample size is small (for example n1p1>5 but n2p2<5),
then use Student’s t-statistic with n1+ n2-2
degrees of freedom at the given level of
significance.
Chi-square test
• In recent years, the use of specialized statistical methods for
categorical data has increased dramatically, particularly for
applications in the biomedical and social sciences.
• Categorical scales occur frequently in the health sciences, for
measuring responses.
• E.g.
• patient survives an operation (yes, no),
• severity of an injury (none, mild, moderate, severe),
and
• stage of a disease (initial, advanced).
• Studies often collect data on categorical variables that can be
summarized as a series of counts and commonly arranged in
a tabular format known as a contingency table
Chi-square Test Statistic cont’d…
• As with the z and t distributions, there is a different chi-square
distribution for each possible value of degrees of freedom.
Chi-square distributions with a small number of degrees of freedom
are highly skewed; however, this skewness is attenuated as the
number of degrees of freedom increases.
The chi-squared distribution is concentrated over nonnegative
values. It has mean equal to its degrees of freedom (df), and its
standard deviation equals √(2df ). As df increases, the distribution
concentrates around larger values and is more spread out.
The distribution is skewed to the right, but it becomes more bell-
shaped (normal) as df increases.
The degrees of freedom for tests of hypothesis that involve an r × c
contingency table are equal to (r-1)x(c-1).
Test of Association
• The chi-squared (χ²) test statistic is widely used in the
analysis of contingency tables.
• It compares the actual observed frequency in each group
with the expected frequency (the latter is based on theory,
experience or comparison groups).
• The chi-squared test (Pearson’s χ²) allows us to test for
association between categorical (nominal) variables.
• The null hypothesis for this test is that there is no association
between the variables. Consequently a significant p-value
implies association.
Test Statistic: χ²-test with d.f. = (r-1)x(c-1)
$\chi^2 = \sum_{i,j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$E_{ij} = \dfrac{i^{th}\ \text{row total} \times j^{th}\ \text{column total}}{\text{grand total}} = \dfrac{R_i \times C_j}{n}$
Oij = observed frequency, Eij = expected frequency of the cell at the
juncture of the i-th row and j-th column
Procedures of Hypothesis Testing
1. State the hypothesis
2. Fix level of significance
3. Find the critical value (χ² (df, α))
4. Compute the test statistics
5. Decision rules; reject null hypothesis if test statistics > table
value.
Test of associations for 2x2 tables
• If we call the frequencies in the four cells of a 2x2 table a, b, c and d, then
the table is given by

                      Disease status
Exposure status    D       ND      Row total
E                  a       b       a+b
NE                 c       d       c+d
Column total       a+c     b+d     N
Test of association
• If the contingency table is 2x2, then
$\chi^2 = \dfrac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}$, where n = a+b+c+d is the grand total
• If the table is rxc, then
$\chi^2 = \sum_{i,j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
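Both forms of the statistic can be sketched in a few lines, which also shows that the 2x2 shortcut agrees with the general formula. The cell counts in the usage example (10, 20, 30, 40) are hypothetical, and the function names are assumptions. The critical value 3.841 is the tabulated χ² value for 1 degree of freedom at α = 0.05.

```python
def expected_counts(table):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def chi_square_2x2(a, b, c, d):
    """Shortcut formula for a 2x2 table with cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


if __name__ == "__main__":
    observed = [[10, 20], [30, 40]]           # hypothetical counts
    statistic = chi_square(observed)
    print(round(statistic, 4), round(chi_square_2x2(10, 20, 30, 40), 4))
    critical_value = 3.841                    # chi-square table, df = 1, alpha = 0.05
    print("reject H0" if statistic > critical_value else "do not reject H0")
```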
Assumptions of the χ² test
The chi-squared test assumes that
• Data must be categorical
• The data be a frequency data
– the numbers in each cell are ‘not too small’. No expected frequency should be
less than 1, and
– no more than 20% of the expected frequencies should be less than 5.
• If this does not hold row or column variables categories can
sometimes be combined (re-categorized) to make the expected
frequencies larger or use Yates continuity correction.
Yates correction
• It is a requirement that a chi-squared test be applied to discrete data. Counting
numbers are appropriate, continuous measurements are not. Assuming
continuity in the underlying distribution distorts the p value and may make false
positives more likely.
• Frank Yates proposed a correction to the chi-squared formula: subtracting a small
term (0.5) from each absolute difference |O − E| before squaring. This tends to
increase the p-value and makes the test more conservative, making false positives
less likely. However, the test may now be *too* conservative.
• Additionally, the chi-squared test should not be used when the expected values in a
cell are <5. It is, at times, not inappropriate to pad an empty cell with a small
value, though, as one can only assume the result would be more significant with
no value there.
230
Wullo S.
Tuesday, December 26, 2023
Biostatistics ppt.pdf
Biostatistics ppt.pdf
Biostatistics ppt.pdf
Biostatistics ppt.pdf

More Related Content

Similar to Biostatistics ppt.pdf

1Measurements of health and disease_Introduction.pdf
1Measurements of health and disease_Introduction.pdf1Measurements of health and disease_Introduction.pdf
1Measurements of health and disease_Introduction.pdfAmanuelDina
 
Lect 1_Biostat.pdf
Lect 1_Biostat.pdfLect 1_Biostat.pdf
Lect 1_Biostat.pdfBirhanTesema
 
Data Display and Summary
Data Display and SummaryData Display and Summary
Data Display and SummaryDrZahid Khan
 
Medical Statistics.ppt
Medical Statistics.pptMedical Statistics.ppt
Medical Statistics.pptssuserf0d95a
 
Statistics for Data Analytics
Statistics for Data AnalyticsStatistics for Data Analytics
Statistics for Data AnalyticsSSaudia
 
1 Introduction to Biostatistics.pdf
1 Introduction to Biostatistics.pdf1 Introduction to Biostatistics.pdf
1 Introduction to Biostatistics.pdfbayisahrsa
 
Chapter-one.pptx
Chapter-one.pptxChapter-one.pptx
Chapter-one.pptxAbebeNega
 
Introduction to statistics.pptx
Introduction to statistics.pptxIntroduction to statistics.pptx
Introduction to statistics.pptxUnfold1
 
2021f_Cross-sectional study.pptx
2021f_Cross-sectional study.pptx2021f_Cross-sectional study.pptx
2021f_Cross-sectional study.pptxtamanielkhair
 
Sample &amp; data collection method,sample size estimation,variables
Sample &amp; data collection method,sample size estimation,variablesSample &amp; data collection method,sample size estimation,variables
Sample &amp; data collection method,sample size estimation,variablesBipulBorthakur
 

Similar to Biostatistics ppt.pdf (20)

CH1.pdf
CH1.pdfCH1.pdf
CH1.pdf
 
Ch1
Ch1Ch1
Ch1
 
1Measurements of health and disease_Introduction.pdf
1Measurements of health and disease_Introduction.pdf1Measurements of health and disease_Introduction.pdf
1Measurements of health and disease_Introduction.pdf
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Biostatistics khushbu
Biostatistics khushbuBiostatistics khushbu
Biostatistics khushbu
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Understanding statistics in research
Understanding statistics in researchUnderstanding statistics in research
Understanding statistics in research
 
Lect 1_Biostat.pdf
Lect 1_Biostat.pdfLect 1_Biostat.pdf
Lect 1_Biostat.pdf
 
Data Display and Summary
Data Display and SummaryData Display and Summary
Data Display and Summary
 
Medical Statistics.ppt
Medical Statistics.pptMedical Statistics.ppt
Medical Statistics.ppt
 
Statistics for Data Analytics
Statistics for Data AnalyticsStatistics for Data Analytics
Statistics for Data Analytics
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
1 Introduction to Biostatistics.pdf
1 Introduction to Biostatistics.pdf1 Introduction to Biostatistics.pdf
1 Introduction to Biostatistics.pdf
 
Biostatistics
Biostatistics Biostatistics
Biostatistics
 
Chapter-one.pptx
Chapter-one.pptxChapter-one.pptx
Chapter-one.pptx
 
Medical Statistics.pptx
Medical Statistics.pptxMedical Statistics.pptx
Medical Statistics.pptx
 
Introduction to statistics.pptx
Introduction to statistics.pptxIntroduction to statistics.pptx
Introduction to statistics.pptx
 
2021f_Cross-sectional study.pptx
2021f_Cross-sectional study.pptx2021f_Cross-sectional study.pptx
2021f_Cross-sectional study.pptx
 
Sample &amp; data collection method,sample size estimation,variables
Sample &amp; data collection method,sample size estimation,variablesSample &amp; data collection method,sample size estimation,variables
Sample &amp; data collection method,sample size estimation,variables
 

Biostatistics ppt.pdf

  • 1. University of Gondar, College of Medicine and Health Science, Department of Epidemiology and Biostatistics. Basic Biostatistics. Wullo S. (BSc, MPH, Assistant Professor). Tuesday, December 26, 2023
  • 2. Chapter One. 1.1 Introduction to Biostatistics. Objectives of the chapter: after completing this chapter, the student will be able to: define statistics and biostatistics; identify the branches of biostatistics; enumerate the importance and limitations of biostatistics; define and identify the different types of data and understand why variables need to be classified.
  • 3. Definition and classification of Biostatistics. Statistics is the science of collecting, organizing, presenting, and analysing data and drawing conclusions (inferences) from them for the purpose of making decisions. Biostatistics: the application of statistical methods to the biological and health sciences.
  • 4. Classification of Biostatistics. Descriptive biostatistics: statistical methods concerned with the collection, organization, summarization, and analysis of data from a sample of a population. Inferential biostatistics: statistical methods concerned with drawing conclusions (inferences) about a particular population by selecting and measuring a random sample from that population.
  • 5. Cont… Biostatistics divides into Descriptive Statistics (collecting, organizing, summarizing, and presenting data) and Inferential Statistics (making inferences, testing hypotheses, determining relationships, and making predictions).
  • 6. Descriptive Biostatistics. Some statistical summaries that are especially common in descriptive analyses are: measures of central tendency; measures of dispersion; measures of association; cross-tabulations (contingency tables); histograms; quantiles and Q-Q plots; scatter plots; box plots.
  • 8. 1.2 Stages in statistical investigation. There are five stages or steps in any statistical investigation. 1. Collection of data: the process of obtaining measurements or counts. 2. Organization of data: includes editing, classifying, and tabulating the data collected. 3. Presentation of data: gives an overall view of what the data actually look like and facilitates further statistical analysis; it can be done in the form of tables, graphs, or diagrams.
  • 9. Cont… 4. Analysis of data: digging out useful information for decision making; it involves extracting relevant summaries from the data (such as the mean, median, mode, range, and variance). 5. Interpretation of data: concerned with drawing conclusions from the data collected and analyzed, and giving meaning to the analysis results; a difficult task that requires a high degree of skill and experience.
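The summaries named in the analysis stage can be sketched with Python's standard `statistics` module. The data below are hypothetical (an assumed count of clinic visits for seven patients) and serve only to show the calls; the variance shown is the sample variance, which divides by n - 1.

```python
import statistics

# Hypothetical sample: number of clinic visits recorded for seven patients.
visits = [4, 8, 6, 5, 3, 8, 9]

summary = {
    "mean": statistics.mean(visits),
    "median": statistics.median(visits),
    "mode": statistics.mode(visits),
    "range": max(visits) - min(visits),
    # Sample variance: squared deviations from the mean divided by n - 1.
    "variance": statistics.variance(visits),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Interpretation, the fifth stage, is not something the code can do: deciding what a mean of about six visits implies still depends on the study question.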
  • 10. 1.3 Definition of Some Basic Terms. Population: the complete set of possible measurements for which inferences are to be made. Census: a complete enumeration of the population; in most real problems it cannot be carried out, so a sample is taken instead. Sample: the set of measurements from a population that are actually collected in the course of an investigation. Parameter: a characteristic or measure obtained from a population. Statistic: a statistic (as distinct from the field of Statistics) is a numerical quantity computed from sample data (e.g. the mean, the median, the maximum).
  • 12. Cont... Sampling: the process or method of selecting a sample from the population. Sample size: the number of elements or observations to be included in the sample. Variable: a characteristic or attribute that can assume different values in different persons, places, or things; examples include diastolic blood pressure, heart rate, height, and weight. Data: a collection of facts, values, observations, or measurements that the variables can assume.
  • 13. Uses of statistics: The main function of statistics is to enlarge our knowledge of complex phenomena. Some uses of statistics are: estimating unknown population characteristics; formulating and testing hypotheses; studying the relationship between two or more variables; forecasting future events; measuring the magnitude of variation in data; furnishing a technique of comparison.
  • 14. Limitations of statistics. As a science, statistics has its own limitations. Some of them are: it deals only with quantitative information; it deals only with aggregates of facts and not with individual data items; statistical results are only approximately, not mathematically, correct; statistics can easily be misused and therefore should be used by experts.
  • 15. 1.5 Types of Variables and Measurement Scales. A variable is a characteristic or attribute that can assume different values in different persons, places, or things. Examples: age; diastolic blood pressure; heart rate; the height of adult males; the weights of preschool children; gender of Biostatistics students; marital status of instructors at University of Gondar; ethnic group of patients.
  • 16. A. Depending on the characteristic of the measurement, a variable can be qualitative or quantitative. Qualitative (categorical) variable: a variable or characteristic that cannot be measured in quantitative form but can only be identified by name or category, for instance place of birth, ethnic group, type of drug, stage of breast cancer (I, II, III, or IV), or degree of pain (minimal, moderate, severe, or unbearable). The categories should be clear cut, not overlapping, and cover all the possibilities, for example sex (male or female), vital status (alive or dead), disease stage (depends on the disease), ever smoked (yes or no).
  • 17. Quantitative (numerical) variable: one that can be measured and expressed numerically. Examples: survival time, systolic blood pressure, number of children in a family, height, age, body mass index. Quantitative variables can be of two types, discrete and continuous. Discrete variables: have a set of possible values that is either finite or countably infinite; the values of a discrete variable are usually whole numbers; numerical discrete data occur when the observations are integers that correspond to a count of some sort.
  • 18. Some common examples of discrete variables are: the number of pregnancies; the number of bacterial colonies on a plate; the number of cells within a prescribed area upon microscopic examination; the number of heart beats within a specified time interval; a mother's history of number of births (parity) and pregnancies; the number of episodes of illness a patient experiences during some time period.
  • 19. Continuous variables: a continuous variable has a set of possible values that includes all values in an interval of the real line; there are no gaps between possible values; each observation theoretically falls somewhere along a continuum. Examples: body mass index, height, blood pressure, serum cholesterol level, weight, age.
  • 20. Cont… Observations are not restricted to certain numerical values; they are often measurements (e.g., height, weight, age). Continuous data are used to report a measurement of the individual that can take on any value within an acceptable range.
  • 21. Nominal Scale. Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data. Properties: assigns subjects to groups or categories; no order or distance relationship; no arithmetic origin; only counts of numbers in categories; only percentages of categories are presented; the chi-square test is the most often used test of statistical significance.
  • 22. Nominal Scale. Examples: sex; social status; marital status; days of the week (months); geographic location; seasons; ethnic group; types of restaurants; brand choice; religion; job type (executive, technical, clerical). Categories may be given numeric codes such as “0” and “1”.
  • 23. Ordinal Scale. Level of measurement which classifies data into categories that can be ranked. Precise differences between the ranks do not exist. It classifies data according to some order or rank; with ordinal data, it is fair to say that one response is greater or less than another. E.g. if people were asked to rate the hotness of 3 chili peppers, a scale of "hot", "hotter" and "hottest" could be used, and values of "1" for "hot", "2" for "hotter" and "3" for "hottest" could be assigned.
  • 24. Ordinal Scales. Arithmetic operations are not applicable, but relational operations are. Ordering is the sole property of the ordinal scale. Examples: letter grades (A, B, C, D, F); rating scales (excellent, very good, good, fair, poor); military status.
  • 25. Interval Scales. Level of measurement which classifies data that can be ranked and for which differences are meaningful. However, there is no meaningful zero, so ratios are meaningless. All arithmetic operations except division are applicable. Relational operations are also possible. Examples: IQ; temperature in °F.
  • 26. Interval Scale. Numerically equal distances on the scale represent equal values in the characteristic being measured. An interval scale contains all the information of an ordinal scale, but it also allows you to compare the differences between objects. It assumes that the measurements are made in equal units, i.e. gaps between whole numbers on the scale are equal (e.g. the Fahrenheit and Celsius temperature scales). An interval scale does not have a true zero: a temperature of "zero" does not mean that there is no temperature; it is just an arbitrary zero point. Permissible statistics: counts/frequencies, mode, median, mean, standard deviation.
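A small sketch can illustrate why ratios are not meaningful on an interval scale while comparisons of differences are. It uses the standard Celsius-to-Fahrenheit conversion; the three temperatures are arbitrary illustrative values, not data from the slides.

```python
def c_to_f(celsius: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Arbitrary illustrative temperatures in degrees Celsius.
low_c, mid_c, high_c = 10.0, 20.0, 30.0
low_f, mid_f, high_f = (c_to_f(t) for t in (low_c, mid_c, high_c))

# The "ratio" of two temperatures changes with the unit, so it carries no meaning.
ratio_c = mid_c / low_c  # 20 / 10
ratio_f = mid_f / low_f  # 68 / 50

# Equal differences stay equal in either unit, so differences can be compared.
equal_steps_c = (high_c - mid_c) == (mid_c - low_c)
equal_steps_f = (high_f - mid_f) == (mid_f - low_f)

print(ratio_c, ratio_f, equal_steps_c, equal_steps_f)
```

Because the zero point of both scales is arbitrary, "20 °C is twice as hot as 10 °C" does not survive a change of unit, whereas "the two 10-degree steps are equal" does.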
  • 27. Ratio Scales. Level of measurement which classifies data that can be ranked, for which differences are meaningful, and which has a true zero. True ratios exist between the different units of measure. All arithmetic and relational operations are applicable. Examples: weight; height; number of students; age.
  • 28. Primary Scales of Measurement
      Scale      Illustration                              Values
      Nominal    Numbers assigned to runners               4, 81, 9
      Ordinal    Rank order of winners                     Third place, Second place, First place
      Interval   Performance rating on a 0 to 10 scale     8.2, 9.1, 9.6
      Ratio      Time to finish, in seconds                15.2, 14.1, 13.4
  • 29. Statistics by scale
      Scale      Descriptive                       Inferential
      Nominal    Percentages, mode                 Chi-square, binomial test
      Ordinal    Percentile, median                Rank-order correlation, ANOVA
      Interval   Range, mean, SD                   Correlations, t-tests, ANOVA, regression, factor analysis
      Ratio      Geometric mean, harmonic mean     Coefficient of variation (CV)
  • 30. Exercise. Categorize the following variables as nominal, ordinal, interval, or ratio: gender; grade (A, B, C, D, and F); rating scale (poor, good, excellent); eye colour; political affiliation; religious affiliation; ranking of tennis players; major field; nationality; height; weight; time; age; IQ; temperature; salary.
  • 31. Chapter 2. Organization and Presentation of data. Having collected and edited the data, the next important step is to organize it. The process of arranging data into classes or categories according to similarities is called classification. Classification is a preliminary step that prepares the ground for proper presentation of data. The presentation of data is broadly classified into two categories: tabular presentation; diagrammatic and graphic presentation.
  • 32. Tabular presentation of data. Frequency distribution: the organization of raw data in table form using classes and frequencies. There are three basic types of frequency distribution: categorical frequency distribution; ungrouped frequency distribution; grouped frequency distribution.
  • 33. 1. Categorical frequency distribution: used for data that can be placed in specific categories, such as nominal or ordinal data. Example 1: a social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).
      M S D W D S S M M M W D S M M W D D S S S W W D D
      Class   Frequency   Percent
      M       6           24
      S       7           28
      D       7           28
      W       5           20
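The tally for the marital-status example can be sketched with `collections.Counter`. The data string is the one given on the slide; the variable names are illustrative.

```python
from collections import Counter

# Marital status codes for the 25 persons, as listed on the slide.
raw = "M S D W D S S M M M W D S M M W D D S S S W W D D".split()

counts = Counter(raw)
total = sum(counts.values())

# Frequency and percentage for each class, in the order used on the slide.
table = {status: (counts[status], 100 * counts[status] / total) for status in "MSDW"}

print("Class  Frequency  Percent")
for status, (freq, pct) in table.items():
    print(f"{status:<6} {freq:<10} {pct:.0f}")
```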
  • 34. Example 2. Consider the variable birth weight with levels ‘Very low’, ‘Low’, ‘Normal’ and ‘Big’. The frequency distribution for newborns is obtained simply by counting the number of newborns in each birth weight category.
      Table 2. Distribution of birth weight of newborns between 1976-1996 at TAH.
      BWT        Freq.   Rel. freq. (%)   Cum. freq.   Cum. rel. freq. (%)
      Very low   43      0.4              43           0.4
      Low        793     8.0              836          8.4
      Normal     8870    88.9             9706         97.3
      Big        268     2.7              9974         100
      Total      9974    100
  • 35. 2. Ungrouped frequency distribution: a table of all the raw score values that could possibly occur in the data, along with the number of times each actually occurred. Example: the following data represent the marks of 20 students. Construct an ungrouped frequency distribution.
      80, 76, 90, 85, 80, 70, 60, 62, 70, 85, 65, 60, 63, 74, 75, 76, 70, 70, 80, 85
      Mark   Frequency
      60     2
      62     1
      63     1
      65     1
      70     4
      74     1
      75     1
      76     2
      80     3
      85     3
      90     1
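An ungrouped frequency distribution is a count of each distinct value, so the marks example can be checked with a short sketch. The marks are copied from the slide; everything else is illustrative.

```python
from collections import Counter

# Marks of the 20 students, as listed on the slide.
marks = [80, 76, 90, 85, 80, 70, 60, 62, 70, 85,
         65, 60, 63, 74, 75, 76, 70, 70, 80, 85]

frequency = Counter(marks)

print("Mark  Frequency")
for mark in sorted(frequency):
    print(f"{mark:<5} {frequency[mark]}")
```

A quick check that the frequencies sum to the number of observations (20 here) catches most tallying slips.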
  • 36. 3. Grouped frequency distribution. When the range of the data is large, the data must be grouped into classes that are more than one unit in width. Definitions of some terms. Grouped frequency distribution: a frequency distribution in which several numbers are grouped in one class. Class limits: separate one class in a grouped frequency distribution from another; the limits could actually appear in the data, and there are gaps between the upper limit of one class and the lower limit of the next. Unit of measurement (U): the distance between two possible consecutive measures; it is usually taken as 1, 0.1, 0.01, 0.001, and so on.
  • 37. Cont… Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting U/2 from the corresponding lower class limit, and the upper class boundary is found by adding U/2 to the corresponding upper class limit. Class width: the difference between the upper and lower class boundaries of any class. It is also the difference between the lower limits of any two consecutive classes, or the difference between any two consecutive class marks.
  • 38. Cont… Class mark (midpoint): the average of the lower and upper class limits, or equivalently the average of the lower and upper class boundaries. Cumulative frequency: the number of observations less than (or more than) or equal to a specific value. Cumulative frequency above: the total frequency of all values greater than or equal to the lower class boundary of a given class. Cumulative frequency below: the total frequency of all values less than or equal to the upper class boundary of a given class.
  • 39. Cont… Cumulative frequency distribution (CFD): the tabular arrangement of class intervals together with their corresponding cumulative frequencies. It can be of the "more than" or the "less than" type, depending on the type of cumulative frequency used. Relative frequency (rf): the frequency divided by the total frequency. Relative cumulative frequency (rcf): the cumulative frequency divided by the total frequency.
  • 40. Steps for constructing a grouped frequency distribution. 1. Find the largest and smallest values. 2. Compute the range: R = maximum - minimum. 3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule, k = 1 + 3.322 log10(n), where k is the number of classes desired and n is the total number of observations. 4. Find the class width by dividing the range by the number of classes and rounding up, not off.
  • 41. Cont… 5. Pick a suitable starting point less than or equal to the minimum value. The starting point is called the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits. 6. To find the upper limit of the first class, subtract U from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits. 7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units to the upper limits. The boundaries are also halfway between the upper limit of one class and the lower limit of the next class. Depending on the purpose, it may not be necessary to find the boundaries.
  • 42. Cont… 8. Tally the data. 9. Find the frequencies. 10. Find the cumulative frequencies. Depending on what you are trying to accomplish, it may not be necessary to find the cumulative frequencies. 11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
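Steps 3 and 4 can be sketched as two small helpers. The sketch assumes the usual form of Sturges' rule, k = 1 + 3.322 log10(n), with k rounded up, and assumes integer data with a unit of measurement of 1; the function names are illustrative.

```python
import math

def sturges_classes(n: int) -> int:
    """Number of classes from Sturges' rule, k = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(minimum: int, maximum: int, k: int) -> int:
    """Class width: the range divided by k, rounded up (assumes a unit of 1)."""
    return math.ceil((maximum - minimum) / k)

# Example call: 20 observations ranging from 6 to 39.
k = sturges_classes(20)
w = class_width(6, 39, k)
print(k, w)
```

Rounding up rather than to the nearest value matters: it guarantees that k classes of width w cover the whole range.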
  • 43. Example: Construct a frequency distribution for the following data: 11, 29, 6, 33, 14, 31, 22, 27, 19, 20, 18, 17, 22, 38, 23, 21, 26, 34, 39, 27. Solution. Step 1: Find the highest and the lowest values: H = 39, L = 6. Step 2: Find the range: R = H - L = 39 - 6 = 33. Step 3: Select the number of classes desired using Sturges' formula: k = 1 + 3.322 log10(20) ≈ 5.32, rounded up to k = 6.
  • 44. Step 4: Find the class width: w = R/k = 33/6 = 5.5, rounded up to 6. Step 5: Select the starting point; let it be the minimum observation. Then 6, 12, 18, 24, 30, 36 are the lower class limits. Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11. Then 11, 17, 23, 29, 35, 41 are the upper class limits. Combining step 5 and step 6, one can construct the following classes.
  • 45. Class limits: 6 – 11, 12 – 17, 18 – 23, 24 – 29, 30 – 35, 36 – 41. Step 7: Find the class boundaries. E.g. for class 1: lower class boundary = 6 - U/2 = 5.5; upper class boundary = 11 + U/2 = 11.5.
  • 46. Then continue adding w to both boundaries to obtain the rest of the boundaries. Doing so gives the following classes. Class boundaries: 5.5 – 11.5, 11.5 – 17.5, 17.5 – 23.5, 23.5 – 29.5, 29.5 – 35.5, 35.5 – 41.5. Step 8: Tally the data. Step 9: Write the numeric values for the tallies in the frequency column.
  • 47. Cont… Step 10: Find the cumulative frequencies. Step 11: Find the relative frequencies and/or relative cumulative frequencies. The complete frequency distribution follows:
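Steps 5 to 11 of the worked example can be reproduced with a short sketch. It takes the data, the class width w = 6, the unit U = 1, and the k = 6 classes from the slides as given; the row layout and variable names are assumptions made for illustration.

```python
# Data and settings taken from the worked example on the slides.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
width, unit, k = 6, 1, 6
start = min(data)  # lower limit of the first class

rows = []
cumulative = 0
for i in range(k):
    lower = start + i * width           # lower class limit
    upper = lower + width - unit        # upper class limit
    freq = sum(lower <= x <= upper for x in data)
    cumulative += freq                  # "less than" cumulative frequency
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - unit / 2, upper + unit / 2),
        "mark": (lower + upper) / 2,
        "freq": freq,
        "cum_freq": cumulative,
        "rel_freq": freq / len(data),
        "rel_cum_freq": cumulative / len(data),
    })

print("Limits   Boundaries    Mark  f   cf  rf    rcf")
for r in rows:
    lo, hi = r["limits"]
    bl, bu = r["boundaries"]
    print(f"{lo:>2}-{hi:<4}  {bl:>4}-{bu:<6}  {r['mark']:>4}  {r['freq']:<3} "
          f"{r['cum_freq']:<3} {r['rel_freq']:.2f}  {r['rel_cum_freq']:.2f}")
```

The last cumulative frequency should equal the number of observations, and the last relative cumulative frequency should equal 1.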
  • 48. Diagrammatic and graphic presentation of data. These are techniques for presenting data in visual displays, using geometric figures and pictures, for categorical (qualitative) types of data. Importance: they have greater attraction; they facilitate comparison; they are easily understandable. Diagrams are appropriate for presenting discrete data.
  • 49. Cont… The three most commonly used diagrammatic presentations for discrete as well as qualitative data are: pie charts; pictograms; bar charts. Pie chart: a pie chart is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution. The angle of each sector is obtained using: angle of the sector = RF × 360°, where RF is the relative frequency of the category.
  • 50. Cont… Example: Draw a suitable diagram to represent the following population in a town.
      Men    Women   Girls   Boys
      2500   2000    4000    1500
      Solution. Step 1: Find the percentage for each class. Step 2: Find the number of degrees for each class. Step 3: Using a protractor and compass, draw each section and label it with its name and corresponding percentage.
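Steps 1 and 2 of the pie chart example can be sketched as follows. The population counts are those on the slide, and the sector angle is the relative frequency times 360 degrees; the variable names are illustrative.

```python
# Population of the town by group, as given on the slide.
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

# For each group: (percentage of the total, sector angle in degrees).
sectors = {
    group: (100 * count / total, 360 * count / total)
    for group, count in population.items()
}

for group, (percent, angle) in sectors.items():
    print(f"{group:<6} {percent:5.1f}%  {angle:6.1f} degrees")
```

The angles should sum to 360 degrees, which is a convenient check before drawing the chart.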
  • 51. Cont… [Figure: pie chart of the town population, with four sectors labelled 1 to 4]
  • 52. Bar Charts. The frequency distribution of a categorical variable is often presented graphically as a bar chart or pie chart. Bar charts display the frequency distribution for nominal or ordinal data. In a bar chart, the categories into which the observations fall are represented along the horizontal axis, and a vertical bar is drawn above each category such that the height of the bar represents either the frequency or the relative frequency of observations within the class. The vertical axis should always start from 0, but the horizontal axis can start from anywhere.
  • 53. Cont… There are different types of bar chart, the most common being: simple bar charts; component (subdivided) bar charts; multiple bar charts. Simple bar charts are used to display data on one variable. The bars are thick lines (narrow rectangles) of the same breadth, and the magnitude of a quantity is represented by the height or length of its bar.
  • 54. Cont… • Example: The following data represent sales by product, in 1957, of a given company for three products A, B, C.
      Product   Sales ($) in 1957
      A         12
      B         24
      C         24
  • 55. Cont… • Component Bar chart - When there is a desire to show how a total (or aggregate) is divided into its component parts, we use a component bar chart. • Example: The following data represent sales by product, 1957-1959, of a given company for three products A, B, C.
      Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
      A         12                  14                  18
      B         24                  28                  18
      C         24                  30                  36
  • 56. Cont… • Multiple Bar charts - These are used to display data on more than one variable. - - They are used for comparing different variables at the same time. Example: Draw a component bar chart to represent the sales by product from 1957 to 1959. Tuesday, December 26, 2023 Wullo S. 56
  • 57. Graphical Presentation of data The histogram, frequency polygon and cumulative frequency graph or ogive are most commonly applied graphical representation for continuous data. Procedures for constructing statistical graphs: • Draw and label the X and Y axes. • Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axes. • Represent the class boundaries for the histogram or ogive or the mid points for the frequency polygon on the X axes. • Plot the points. • Draw the bars or lines to connect the points. Tuesday, December 26, 2023 Wullo S. 57
  • 58. Stem-and-Leaf Represents data by separating each value into two parts: the stem (the left most digit) and the leaf (such as the rightmost digit). • Are most effective with relatively small data sets • Are not suitable for reports and other communications, • Help researchers to understand the nature of their data Example: 43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41 Tuesday, December 26, 2023 Wullo S. 58
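A stem-and-leaf display for the example values can be sketched as follows. The sketch assumes two-digit values, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):
        plot[value // 10].append(value % 10)
    return dict(plot)


data = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the values first keeps both the stems and the leaves within each stem in increasing order, which is the usual convention.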
  • 59. Histogram • A graph which displays the data by using vertical bars of various heights to represent frequencies. • Class boundaries are placed along the horizontal axis. • Class marks and class limits are sometimes used as the quantity on the X axis. Table 3: Distribution of the age of women at the time of marriage
      Age group   No. of women
      15-19       11
      20-24       36
      25-29       28
      30-34       13
      35-39       7
      40-44       3
      45-49       2
  • 60. Cont… • A histogram of the age of women at the time of marriage [Figure: histogram titled "Age of women at the time of marriage"; horizontal axis: Age group, class boundaries 14.5-19.5 to 44.5-49.5; vertical axis: No of women, 0 to 40]
  • 61. Frequency Polygon • Frequency Polygon: A line graph. The frequency is placed along the vertical axis and the class mid points are placed along the horizontal axis. • It is customary to add the next higher and the next lower class interval, each with a corresponding frequency of zero, so that the graph forms a complete polygon. Example: Draw a frequency polygon for the above data
  • 62. Ogive (cumulative frequency polygon) - A graph showing the cumulative frequency (less than or more than type) plotted against upper or lower class boundaries respectively. - That is class boundaries are plotted along the horizontal axis and the corresponding cumulative frequencies are plotted along the vertical axis. - The points are joined by a free hand curve. - Example: Draw an ogive curve(less than type) for the above data. Tuesday, December 26, 2023 Wullo S. 62
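The points of a less-than ogive can be computed before plotting. The sketch below uses the class boundaries and frequencies of Table 3 (age of women at marriage); the function name is an assumption, and plotting itself is left out.

```python
from itertools import accumulate

# Class boundaries and frequencies from Table 3.
boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [11, 36, 28, 13, 7, 3, 2]


def less_than_ogive_points(boundaries, frequencies):
    """Pair each class boundary with the number of observations below it."""
    cumulative = [0] + list(accumulate(frequencies))
    return list(zip(boundaries, cumulative))


for boundary, cum_freq in less_than_ogive_points(boundaries, frequencies):
    print(f"less than {boundary}: {cum_freq}")
```

The first point has cumulative frequency 0 at the lowest boundary, and the last point equals the total number of observations.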
  • 63. Chapter 3 Numerical summary measures A. Measures of location • It is often useful to summarize, in a single number or statistic, the general location of the data or the point at which the data tend to cluster. • Such statistics are called measures of location or measures of central tendency. • The ones described here are the mean, the median and the mode.
  • 64. Cont… Arithmetic mean • The arithmetic mean, usually abbreviated to ‘mean’ is the sum of the observations divided by the number of observations. • We use the following data set of 10 numbers to illustrate the computations:
  • 65. Arithmetic Mean 19 21 20 20 34 22 24 27 27 27 • Then, mean = (19 + 21 + … + 27)/10 = 241/10 = 24.1 • General formula a) Ungrouped mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ Estimation of the mean from a grouped frequency distribution: In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:
  • 66. Cont… $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$ where, k = the number of class intervals, x_i = the mid-point of the ith class interval, f_i = the frequency of the ith class interval
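The grouped mean is the frequency-weighted average of the class mid-points. As a hedged illustration, the sketch below applies it to the age-at-marriage table (Table 3); the slides do not work this example themselves, and the function name `grouped_mean` is an assumption.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of grouped data, treating every value as its class mid-point."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total


# Class limits and frequencies from Table 3 (age of women at marriage).
age_classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
women = [11, 36, 28, 13, 7, 3, 2]

print(f"estimated mean age: {grouped_mean(age_classes, women):.2f}")
```

Because individual ages are replaced by mid-points, the result is an estimate of the mean rather than its exact value.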
  • 67. Properties of the arithmetic mean • The mean can be used as a summary measure for both discrete and continuous data, in general however, it is not appropriate for either nominal or ordinal data. • For given set of data there is one and only one arithmetic mean. • The arithmetic mean is easily understood and easy to compute. • Algebraic sum of the deviations of the given values from their arithmetic mean is always zero. • The arithmetic mean is greatly affected by the extreme values. • In grouped data if any class interval is open, arithmetic mean can not be calculated. 67 Wullo S. Tuesday, December 26, 2023
  • 68. Reading Assignment A. Weighted mean B. Correct and wrong mean C. Combined mean D. Geometric mean E. Harmonic mean
  • 69. Median • With the observations arranged in increasing or decreasing order, the median is defined as the middle observation. • If the number of observations is odd, the median is the [(n+1)/2]th observation. • If the number of observations is even, so that there is no single middle observation, the median is the average of the two middle observations, i.e. {(n/2)th + [(n/2)+1]th}/2. • Example : 19 20 20 21 22 24 27 27 27 34 • Here n = 10, so the median is the average of the 5th and 6th observations: (22 + 24)/2 = 23
  • 70. Median for Grouped data In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval. – The first step is to locate the class interval in which the median lies (the median class). – Find n/2 and take the first class interval whose cumulative frequency is at least n/2. – The median is then $\text{Median} = LCB + \left(\frac{n/2 - F_c}{f_c}\right) W$ • Where: • LCB = lower class boundary of the median class • Fc = cumulative frequency of the class just before the median class • fc = frequency of the median class • W = class width and n = number of observations.
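The grouped-median procedure (find the median class, then interpolate inside it as LCB + ((n/2 − Fc)/fc) × W) can be sketched as below. Applying it to Table 3 is an illustration chosen here, not an example from the slides, and the function name is hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data by linear interpolation in the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("at least one observation is required")
    cumulative_before = 0  # Fc: cumulative frequency before the current class
    for lcb, f in zip(lower_boundaries, frequencies):
        if cumulative_before + f >= n / 2:  # first class that reaches n/2
            return lcb + (n / 2 - cumulative_before) / f * width
        cumulative_before += f


# Lower class boundaries and frequencies from Table 3, class width 5.
lower_boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
women = [11, 36, 28, 13, 7, 3, 2]

print(f"estimated median age: {grouped_median(lower_boundaries, women, 5):.2f}")
```

Here n/2 = 50 falls in the class 24.5-29.5 (Fc = 47, fc = 28), so the estimate is 24.5 + (3/28) × 5.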
  • 71. Properties of median • The median can be used as a summary measure for discrete and continuous data, in general however, it is not appropriate for nominal data. • There is only one median for a given set of data • The median is easy to calculate • Median is a positional average and hence it is not drastically affected by extreme values • Median can be calculated even in the case of open end intervals 71 Wullo S. Tuesday, December 26, 2023
  • 72. Mode • Any observation of a variable at which the distribution reaches a peak is called a mode. • Most distributions encountered in practice have one peak and are described as uni-modal. • E.g. Consider the example of ten numbers 19 21 20 20 34 22 24 27 27 27 • In the above data set, the mode is 27, because the value 27 occurs three times (the most frequent). • The mode of grouped data, usually refers to the modal class, where the modal class is the class interval with the highest frequency. • If a single value for the mode of grouped data must be specified, it is taken as the mid point of the modal class interval 72 Wullo S. Tuesday, December 26, 2023
  • 73. Properties of mode • The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data, in general however, it is more appropriate for nominal and ordinal data. • It is not affected by extreme values • It can be calculated for distributions with open end classes • Often its value is not unique • The main drawback of mode is that often it does not exist 73 Wullo S. Tuesday, December 26, 2023
  • 74. Proportion • As we have seen from the previous section, a variable can be either categorical or numerical. • Proportion is one of the summary measures for a categorical variable, and also for a numerical variable if we count the number of cases falling in a specific category. • If we denote by “x” the number of successes in an experiment and by “n” the number of trials, then the proportion of successes in n trials is given by x/n.
  • 75. Percentiles, Quartiles and Inter-quartile Range • The quartiles are the three values which divide the ordered distribution into four parts such that there are an equal number of observations in each part. – Q1 = [(n+1)/4]th observation – Q2 = [2(n+1)/4]th observation – Q3 = [3(n+1)/4]th observation • The inter-quartile range is the difference between the third and the first quartiles. – IQR = Q3 − Q1 Example 1: We use the data set of 11 numbers: 19 21 20 20 34 22 24 27 27 27 28 – Arranged in increasing order: 19 20 20 21 22 24 27 27 27 28 34 – The first quartile is the 3rd observation, 20, and the third quartile is the 9th observation, 27. – The inter-quartile range = 27 – 20 = 7.
  • 76. Percentiles, Quartiles and Inter-quartile Range • Percentiles divide the ordered data into 100 parts with an equal number of observations in each part. • It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.
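The quartile positions [(n+1)/4]th, [2(n+1)/4]th and [3(n+1)/4]th can be turned into a short sketch. One assumption is added here: when a position is not a whole number, the value is interpolated linearly between the two neighbouring observations, which the slides do not specify. The function names are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position, interpolating between neighbours."""
    n = len(ordered)
    position = min(max(position, 1), n)  # keep the position inside the data
    lower = int(position)
    fraction = position - lower
    if lower == n:
        return float(ordered[-1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartiles(values):
    """Return (Q1, Q2, Q3) using the k(n+1)/4 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27, 28]
q1, q2, q3 = quartiles(data)
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```

With 11 observations the positions are the 3rd, 6th and 9th ordered values, so no interpolation is needed for this data set.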
  • 77. B. Measures of Dispersion (Variation) • The scatter or spread of items of a distribution is known as dispersion or variation. • In other words the degree to which numerical data tend to spread about an average value is called dispersion or variation of the data. • The most commonly used measures of dispersions are: 1) Range and relative range 2) Quartile deviation and coefficient of Quartile deviation 3) Mean deviation and coefficient of Mean deviation 4) variance 5) Standard deviation and coefficient of variation. Tuesday, December 26, 2023 Wullo S. 77
  • 78. Range • The range is the largest score minus the smallest score: R = L − S. • It is a quick and dirty measure of variability, because it is greatly affected by extreme scores and depends on only two observations. Relative Range (RR) • It is also sometimes called the coefficient of range and is given by: $RR = \frac{L - S}{L + S}$, where L is the largest and S the smallest observation.
  • 79. Cont… • Example: 1. If the range and relative range of a series are 4 and 0.25 respectively, what are the values of the smallest and largest observations? The Quartile Deviation (Semi-inter quartile range) The inter-quartile range is the difference between the third and the first quartiles of a set of items, and the semi-inter-quartile range is half of the inter-quartile range: QD = (Q3 − Q1)/2.
  • 80. Variance • A good measure of dispersion should make use of all the data. • The variance is the "average squared deviation from the mean". For a sample of n observations: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ • For the case of a frequency distribution it is expressed as: $s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$, with $n = \sum_{i=1}^{k} f_i$ • The variance is limited as a descriptive statistic because it is not in the same units as the observations.
  • 81. Cont…. Tuesday, December 26, 2023 Wullo S. 81 For the case of frequency distribution it is expressed as:
  • 82. Coefficient of Variation (C.V) It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage: $CV = \frac{s}{\bar{x}} \times 100\%$. The distribution having the smaller C.V is said to be less variable or more consistent.
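The measures of dispersion in this section can be checked on the ten-number data set used earlier for the mean. The sketch uses Python's standard `statistics` module; treating the data as a sample (divisor n − 1) is an assumption made here, and the population version is shown alongside for comparison.

```python
import statistics

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

mean = statistics.mean(data)
data_range = max(data) - min(data)                        # L - S
relative_range = data_range / (max(data) + min(data))     # (L - S) / (L + S)
sample_variance = statistics.variance(data)               # divides by n - 1
population_variance = statistics.pvariance(data)          # divides by n
sample_sd = statistics.stdev(data)                        # square root of the sample variance
cv_percent = sample_sd / mean * 100                       # coefficient of variation

print(f"mean = {mean:.1f}, range = {data_range}, relative range = {relative_range:.3f}")
print(f"sample variance = {sample_variance:.2f}, population variance = {population_variance:.2f}")
print(f"standard deviation = {sample_sd:.2f}, CV = {cv_percent:.1f}%")
```

The standard deviation is in the same units as the observations, which is why it is usually reported instead of the variance.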
  • 83. Which measures to use? • When the distribution of the data is symmetric and unimodal (i.e. the data are approximately normally distributed), it is usual to summarize the data using means and standard deviations. • However, when the data are skewed, it is preferable to use the median and quartiles as summary statistics. • Unlike means and standard deviations, the median and quartiles are not easily influenced by extreme values in a skewed distribution. • Remark: – The mean and median of a symmetric distribution coincide. – When the distribution is skewed to the right, its mean is larger than its median. – When the distribution is skewed to the left, its mean is smaller than its median. [See Figures 2(a-c)].
  • 84. Fig. 2(a). Symmetric distribution: Mean = Median = Mode. Fig. 2(b). Distribution skewed to the right: Mean > Median > Mode. Fig. 2(c). Distribution skewed to the left: Mean < Median < Mode.
  • 85. Chapter 4 Introduction to probability Objectives • After completing this chapter, you should be able to – Determine sample spaces and find the probability of an event – Understand the different properties of probability – Explain the various types of probability distributions – Emphasis will be given to the two widely used probability distributions
  • 86. Introduction • Many medical decisions are made on a statistical basis, since individuals differ in their reactions to medications or surgery in an unpredictable way. • In that case the treatment applied is based on getting the best outcome for as many patients as possible. – Life as we experience it consists of a series of events. – “Probability” is a very useful concept and is used in everyday communication.
  • 87. Introduction cont'd…  An understanding of probability is fundamental for  quantifying the uncertainty in the decision-making process  drawing conclusions about a population of patients based on known information about a sample of patients drawn from that population. Probability can be defined as the chance of an event occurring. Many people are familiar with probability from observing or playing games of chance, such as card games, slot machines, or lotteries. Probability theory is used in various fields, such as insurance, investments and weather forecasting.
  • 88. Basic concepts • The following definitions and terms are used in studying the theory of probability. – Random experiment: a chance process that leads to well-defined results called outcomes, that is the result cannot be predicted. E.g. Tossing of coins, throwing of dice are some examples of random experiments. – Trial: Performing a random experiment is called a trial. – Outcomes: The results of a random experiment are called its outcomes. When two coins are tossed the possible outcomes are HH, HT, TH, TT. 88 Wullo S. Tuesday, December 26, 2023
  • 89. Basic concepts cont'd…. – Sample space: Each conceivable outcome of an experiment is called a sample point. The totality of all sample points is called a sample space and is denoted by S. – Event: An outcome or a combination of outcomes of a random experiment is called an event. It is a subset of the sample space of a random experiment. – Equally-likely Approach: If an experiment must result in one of n equally likely outcomes, then each possible outcome must have probability 1/n of occurring. – Mutually exclusive events: events for which the occurrence of any one event excludes the occurrence of the other. Mutually exclusive events cannot occur simultaneously.
  • 90. Basic concepts cont’d….. • Some sample spaces for various probability experiments are. • Probability attempts to quantify an uncertain situation and to make the occurrence of events more concrete. • Probability is used to quantify the likelihood, or chance, that an outcome of a random experiment will occur. • Probability is a number between 0 and 1 that expresses how likely the event is to occur.
  • 91. Basic concepts cont’d….. • Example: Find the sample space for the gender of the children if a family has three children. Use B for boy and G for girl. – Solution: There are two genders, male and female, and each child could be either gender. Hence, there are eight possibilities, as shown here. S= {BBB, BBG, BGB, GBB, GGG, GGB, GBG, BGG} • Note: the way to find all possible outcomes of a probability experiment (the sample spaces) – by observation and reasoning; – use a tree diagram (a device consisting of line segments emanating from a starting point and also from the outcome point.) 91 Wullo S. Tuesday, December 26, 2023
  • 92. Tree diagram of the above example 92 Wullo S. Tuesday, December 26, 2023
  • 93. Types of probability 1. Classical Probability If the number of outcomes belonging to an event E is NE, and the total number of equally likely outcomes is N, then the probability of event E is defined as P(E) = NE/N. Example: A couple wants to have exactly 3 children. Assume that each child is either a boy or a girl and that there are no multiple births. List all possible orderings for the three children and find the probability that exactly two of them will be boys. Solution: S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}. The event is E = {BBG, BGB, GBB}, so P(E) = 3/8. 2. Relative Frequency Probabilities This approach to probability is well-suited to a wide range of scientific disciplines. It is based on the idea that the underlying probability of an event can be measured by repeated trials.
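The two approaches can be compared on the three-children example. The sketch below first enumerates the sample space (classical probability) and then estimates the same probability by repeated simulated trials (relative frequency). It assumes boys and girls are equally likely and births are independent; the number of trials and the seed are arbitrary choices.

```python
import random
from fractions import Fraction
from itertools import product

# Classical approach: list every equally likely outcome and count.
sample_space = ["".join(children) for children in product("BG", repeat=3)]
event = [outcome for outcome in sample_space if outcome.count("B") == 2]
classical = Fraction(len(event), len(sample_space))


def relative_frequency(trials, seed=0):
    """Estimate P(exactly two boys) from repeated simulated families."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        family = [rng.choice("BG") for _ in range(3)]
        if family.count("B") == 2:
            hits += 1
    return hits / trials


print("sample space:", sample_space)
print("classical P(E) =", classical)
print("relative frequency estimate:", relative_frequency(100_000))
```

As the number of trials grows, the relative frequency tends to settle near the classical value 3/8.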
  • 94. 3. Subjective probability • In this view, probability is treated as a quantifiable level of belief ranging from 0 (complete disbelief) to 1 (complete belief). • For instance, an experienced physician may say “this patient has a 50% chance of recovery.” • The various types of probability are not mutually exclusive. Fortunately, all obey the same mathematical laws, and their methods of calculation are similar. • All probabilities are a type of relative frequency: the number of times something can occur divided by the total number of possibilities or occurrences.
  • 95. Rules of Probability • Any probability assigned must be a nonnegative number. • The probability of the sample space (the collection of all possible outcomes) is equal to 1. • The probability of A or B involves addition. • P(A or B) = P(A) + P(B) if the two are mutually exclusive. • The probability of A and B involves multiplication. • P(A and B) = P(A) P(B) if the two are independent. • P(Not A) = 1 − P(A) • P(At least one) = 1 − P(none) • P(none) = [P(each event not happening)]^(number of events), for independent events that each have the same probability
  • 96. Addition Rule • General rule: if two events, A and B, are not mutually exclusive, then the probability that event A or event B occurs is: – P(A or B) = P(A) + P(B) – P(A and B) – P(AuB) = P(A) + P(B) – P(AnB) • Special case: when two events, A and B, are mutually exclusive, then the probability that event A or event B occurs is: – P(A or B) = P(A) + P(B) – P(AuB) = P(A) + P(B) • since P(A and B) = 0 for mutually exclusive events Wullo S. 96 Tuesday, December 26, 2023
  • 97. Conditional Probabilities • Sometimes the chance that a particular event happens depends on the outcome of some other event. This applies obviously with many events that are spread out in time. • The probability that an event occurs, subject to the condition that another event has already occurred, is called conditional probability. • If A and B are events with Pr(A) > 0, the conditional probability of B given A is $\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)}$ • Example: Drug test
                Women   Men
      Success   200     1800
      Failure   1800    200
    A = {Patient is a woman}, B = {Drug fails}. Pr(B|A) = 1800/2000, Pr(A|B) = ? • Given A is independent from B, what is the relationship between Pr(A|B) and Pr(A)?
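The drug-test table can be used to compute conditional probabilities directly from the definition Pr(B|A) = Pr(AB)/Pr(A). The following is a small sketch with exact fractions; the dictionary layout and the helper name `prob` are assumptions for illustration.

```python
from fractions import Fraction

# Counts from the drug-test example: (sex, outcome) -> number of patients.
table = {
    ("Women", "Success"): 200,
    ("Men", "Success"): 1800,
    ("Women", "Failure"): 1800,
    ("Men", "Failure"): 200,
}
total = sum(table.values())


def prob(predicate):
    """Probability that a randomly chosen patient satisfies the predicate."""
    return Fraction(sum(n for key, n in table.items() if predicate(key)), total)


p_a = prob(lambda key: key[0] == "Women")             # A: patient is a woman
p_b = prob(lambda key: key[1] == "Failure")           # B: drug fails
p_ab = prob(lambda key: key == ("Women", "Failure"))  # A and B together

p_b_given_a = p_ab / p_a
p_a_given_b = p_ab / p_b

print("Pr(B|A) =", p_b_given_a)
print("Pr(A|B) =", p_a_given_b)
print("independent?", p_ab == p_a * p_b)
```

If A and B were independent, Pr(A|B) would equal Pr(A); in this table the two differ, so sex and drug failure are not independent.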
  • 98. Conditional probability: • The conditional probability of an event A given an event B is present is: – P(A | B) =P(AnB)/P(B) • where P(B)≠0 • Joint probability • The joint probability of an event A and an event B is – P(AnB) = P(A and B) • When events A and B are mutually exclusive, then – P(A and B) = 0 Wullo S. 98 Tuesday, December 26, 2023
  • 99. Multiplication rule • General rule: the multiplication rule specifies the joint probability as: • P(AnB) = P(B) P(A|B) • Special case: When events A and B are independent, then: – P(A|B) = P(A) – P(AnB) = P(A) P(B)
  • 100. Example 1. Find the probability of at least one male birth in ten consecutive births. Solution • P(At least one male) = 1 − P(all females) • P(all females) = [P(each single birth is a female)]^10 = (0.5)^10 = 9.77 × 10^-4 • So P(At least one male) = 1 − 9.77 × 10^-4 = 0.999023.
  • 101. Exercises 1. 50% of the students in a school weigh more than 140 pounds, but 70% weigh less than 170. If a student is randomly selected, what is the probability the student will weigh between 140 and 170? • Solution • P(w<140)=0.5, P(w>170)=0.3, and P(w<140)+P(140<w<170)+ P(w>170)= 1 • Then • 0.5+P(140<w<170)+ 0.3= 1 P(140<w<170)=1-0.8=0.2 Tuesday, December 26, 2023 Wullo S. 101
  • 102. Probability Distribution • A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. • Probability distribution for discrete variable • Probability distribution for continuous variable Wullo S. 102 Tuesday, December 26, 2023
  • 103. Probability Distribution for discrete Variable 1. Binomial distribution • A binomial experiment (a sequence of Bernoulli trials) is a statistical experiment that has the following properties: • The experiment consists of a fixed number, n, of repeated trials. • Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure. • The probability of success, denoted by p, is the same on every trial. • The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials. • The probability distribution of this experiment is known as the binomial probability distribution. • The binomial distribution describes the distribution of "successes" in a series of trials, that is, out of n trials, what is the probability that k of them succeed.
  • 104. Binomial formula • If the probability of success on an individual trial is p, then the binomial probability is defined by: $P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n$ • Where • k = the number of successes • p = probability of success • n = the number of trials • 1 − p = probability of failure
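The binomial probability, the number of ways to choose k successes out of n trials times p^k (1 − p)^(n − k), can be sketched with the standard library. The example reuses the earlier question about ten consecutive births, assuming each birth is male with probability 0.5; the function name is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_births, p_male = 10, 0.5

p_no_male = binomial_pmf(0, n_births, p_male)
p_at_least_one_male = 1 - p_no_male

print(f"P(no male in 10 births)        = {p_no_male:.6f}")
print(f"P(at least one male in 10)     = {p_at_least_one_male:.6f}")
print(f"P(exactly 5 males in 10)       = {binomial_pmf(5, n_births, p_male):.6f}")
```

The probabilities for k = 0, 1, …, n add up to 1, which is a quick check that a table of binomial probabilities is complete.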
  • 105. 2. Poisson Distribution • The Poisson Distribution is a discrete distribution which takes on the values X = 0, 1, 2, 3, and so on. • It is often used as a model for the number of events in a specific time period. • It is determined by one parameter, lambda. The Poisson random variable satisfies the following conditions: • The number of successes in two disjoint time intervals is independent. • The probability of a success during a small time interval is proportional to the entire length of the time interval. • Some of the examples in which Poisson distribution is appropriate are: ➢ birth defects and genetic mutations ➢ rare diseases (like Leukemia, but not AIDS because it is infectious and so not independent) ➢ car accidents ➢ traffic flow and ideal gap distance, and so on Tuesday, December 26, 2023 Wullo S. 105
  • 106. Poisson formula • The probability distribution of a Poisson random variable X representing the number of successes occurring in a given time interval or a specified region of space is given by the formula: $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$ Where • x = number of successes per unit time • e = the base of the natural logarithm • λ = the expected number of successes per unit time • If λ is the average number of successes occurring in a given time interval or region in the Poisson distribution, then the mean and the variance of the Poisson distribution are both equal to λ.
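The Poisson probability e^(−λ) λ^x / x! and the statement that the mean and variance both equal λ can be checked numerically. In the sketch below λ = 2 is a hypothetical rate chosen only for illustration, and the infinite sums are truncated at 60 terms, which is more than enough for such a small λ.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.0  # hypothetical expected number of events per unit time
support = range(60)  # truncation of the infinite support 0, 1, 2, ...

total_probability = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

print(f"P(X = 0) = {poisson_pmf(0, lam):.4f}")
print(f"sum of probabilities = {total_probability:.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Both the computed mean and the computed variance come out equal to λ, up to rounding.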
  • 107. Probability Distribution for continuous Variables • If a random variable is a continuous variable, its probability distribution is called a continuous probability distribution. • A continuous probability distribution differs from a discrete probability distribution in several ways: • The outcome of a continuous random variable is not limited to categories or counts. • Because a continuous random variable X can take on an uncountably infinite number of values, the probability associated with any one particular value is zero; probabilities are assigned to intervals of values instead. • As a result, a continuous probability distribution cannot be expressed in tabular form.
  • 108. Normal distribution • The normal distribution refers to a family of continuous probability distributions described by the normal equation: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ where • X is a normal random variable, • μ is the mean • σ is the standard deviation • π is approximately 3.14159, and e is approximately 2.71828. • The random variable X in the normal equation is called the normal random variable.
  • 109. Characteristics of Normal Distribution • It links frequency distribution to probability distribution • Has a Bell Shape Curve and is Symmetric • It is Symmetric around the mean: Two halves of the curve are the same (mirror images) • Hence Mean = Median • The total area under the curve is 1 (or 100%) • Normal Distribution has the same shape as Standard Normal Distribution. 109 Wullo S. Tuesday, December 26, 2023
  • 110. Normal Curve • The graph of the normal distribution depends on two factors: ✓the mean and the standard deviation. • The mean of the distribution determines the location of the center of the graph, and the standard deviation determines the height and width of the graph. • When the standard deviation is large, the curve is short and wide; when the standard deviation is small, the curve is tall and narrow. • All normal distributions look like a symmetric, bell-shaped curve. 110 Wullo S. Tuesday, December 26, 2023
  • 111. Standard Normal Distribution • It makes life a lot easier for us if we standardize our normal curve, with a mean of zero and a standard deviation of 1 unit. • We can transform all the observations of any normal random variable X with mean μ and variance σ² to a new set of observations of another normal random variable Z with mean 0 and variance 1 using the following transformation: $Z = \frac{X - \mu}{\sigma}$
  • 112. • About 95% of the area under the curve falls within 2 standard deviations of the mean. • About 99.7% of the area under the curve falls within 3 standard deviations of the mean. • A graph of this standardized (mean 0 and variance 1) normal curve is given in Graph: 112 Wullo S. Tuesday, December 26, 2023
  • 113. Table of normal • Example 1: Suppose we want to compute the area under the normal curve to the right of 1.45. • This area can be computed by finding the probability under the normal curve. The probability is read from the normal table by combining the value 1.4 in the first column with 0.05 in the first row. • The green shaded area in the diagram represents the area between the mean and 1.45 standard deviations above it. The area of this shaded portion is 0.4265 (or 42.65% of the total area under the curve). • Since the area to the right of the mean is 0.5, the area to the right of 1.45 is 0.5 − 0.4265 = 0.0735.
  • 114. Area under the standard normal curve between 0 and z
      z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
      0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
      0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
      0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
      0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
      0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
      0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
      0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
      0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
      0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
      0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
      1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
      1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
      1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
      1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
      1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
      1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
      1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
      1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
      1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
      1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
      2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
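The entries of such a table (the area between the mean and z) can be reproduced with the error function from Python's standard library. This is a sketch for checking table look-ups, not the method used to build the table in the slides; the function names are assumptions.

```python
from math import erf, sqrt


def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z (table entry)."""
    return 0.5 * erf(z / sqrt(2))


def area_right_of(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 - area_mean_to_z(z)


z = 1.45
print(f"area between 0 and {z}: {area_mean_to_z(z):.4f}")
print(f"area to the right of {z}: {area_right_of(z):.4f}")
```

For z = 1.45 the first value matches the table entry in row 1.4, column 0.05, and the second is the right-tail area asked for in Example 1.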
  • 115. Example 2 • Assume the heart rate (H.R) in normal healthy individuals is normally distributed with Mean = 70 and Standard Deviation = 10 beats/min. Then: 1) What area under the curve is above 80 beats/min? We know Z = (X − M)/SD, with X = 80, M = 70 and SD = 10, so we have to find the value of Z. For this we need to draw the figure and find the area which corresponds to Z.
  • 116. • Since M = 70 and SD = 10, a heart rate of 80 beats per minute lies 1 standard deviation above the mean, so the area under the curve above 80 beats per minute is the area above +1 standard deviation. • Substituting the values in the formula gives Z = (X − M)/SD = (80 − 70)/10 = +1.00. • From the table, the area between the mean and Z = 1.00 is 0.3413, so the area above Z = 1.00 is 0.5 − 0.3413 = 0.1587, or about 15.9%. How do we interpret this? • This means that 15.9% of normal healthy individuals have a heart rate more than one standard deviation above the mean (greater than 80 beats per minute).
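  • 117. [Figure: Diagram of Exercise. Normal curve with the horizontal axis marked at −3, −2, −1, μ, 1, 2, 3 standard deviations and the areas of the sections labelled, including 0.159 above +1 standard deviation.]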
  • 118. Example … 2) What area of the curve is above 90 beats/min? 3) What area of the curve is between 50-90 beats/min? 4)What area of the curve is above 100 beats/min? 5) What area of the curve is below 40 beats per min or above 100 beats per min? 118 Wullo S. Tuesday, December 26, 2023
  • 119. Solution 2) 2.3% or 0.023 3) 95.4% or 0.954 4) 0.15% or 0.0015 5) 0.3% or 0.003 (0.15% or 0.0015 for each tail)
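The heart-rate questions can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The slide answers are rounded percentages, so the computed values agree only approximately; the variable names below are assumptions.

```python
from statistics import NormalDist

# Heart rate model from Example 2: mean 70, standard deviation 10 beats/min.
heart_rate = NormalDist(mu=70, sigma=10)

above_80 = 1 - heart_rate.cdf(80)
above_90 = 1 - heart_rate.cdf(90)
between_50_and_90 = heart_rate.cdf(90) - heart_rate.cdf(50)
above_100 = 1 - heart_rate.cdf(100)
below_40_or_above_100 = heart_rate.cdf(40) + above_100

print(f"1) above 80:               {above_80:.4f}")
print(f"2) above 90:               {above_90:.4f}")
print(f"3) between 50 and 90:      {between_50_and_90:.4f}")
print(f"4) above 100:              {above_100:.4f}")
print(f"5) below 40 or above 100:  {below_40_or_above_100:.4f}")
```

Each question is first a standardization problem: 80, 90 and 100 beats/min are 1, 2 and 3 standard deviations above the mean, and 50 and 40 are 2 and 3 below it.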
  • 120. Example 3 • Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard deviation of 10, what is the probability that a person who takes the test will score between 90 and 110? • Solution: 120 Wullo S. Tuesday, December 26, 2023
  • 121. Application/Uses of Normal Distribution • Its application goes beyond describing distributions. • It is used by researchers and modelers. • The major use of the normal distribution is the role it plays in statistical inference. • The z-score, along with the t-score, chi-square and F statistics, is important in hypothesis testing. • It helps managers/management make decisions.
  • 122. Exercises • Find the probability under the normal curve of the following: • The area greater than 1.25 • The area lower than 0.87 • The area greater than -2.36 • The area lower than -1.96 • The area between 0.87 and 1.25 • The area between -1.96 and 1.25 • The value of z which cuts the lower 20% • The value of z which cuts the upper 10% • The values of z which cut the middle 80% 122 Wullo S. Tuesday, December 26, 2023
  • 123. Chapter s Sampling methods and sample size determination • Sampling is a procedure by which some members of the given population are selected as representative of the entire population ➢ Theoretical population ➢ Target population ➢ Study population ➢ sampling frame ➢ Sampling unit ➢ Study unit
  • 124. Hierarchy of sampling Wullo S. 124 Tuesday, December 26, 2023
  • 125. Error in sampling • No sample is an exact mirror image of the population • There are two potential sources of error in research. 1. Sampling error is any type of bias that is attributable to mistakes in either drawing a sample or determining the sample size ➢ Sampling error due to chance cannot be avoided or totally eliminated ➢ The causes are – Chance: the error that occurs just because of bad luck – Design error – Unrepresentativeness of the sample
  • 126. Error in sampling con’d.. 2. Non-sampling error is any error which will be committed during data collection, coding, entry, and so on ✓ Observational error ✓ Respondent error ✓ Lack of preciseness of definition ✓ Error in editing and tabulation of the data Wullo S. 126 Tuesday, December 26, 2023
  • 128. Advantages of sampling ➢ Feasibility: it may be the only feasible method of collecting data ➢ Reduced cost: sampling reduces demands on resources such as finance, personnel and material ➢ Greater accuracy: sampling may lead to better accuracy in collecting data and more detailed information ➢ Greater speed: data can be collected and summarized more quickly
  • 129. Disadvantages of sampling • There is always sampling error • Sampling may create a feeling of discrimination within the population • It may be inadvisable where every unit in the population is legally required to have a record • It demands more rigid control in undertaking the sampling operation • The presence of bias creates a difference between the parameter and the statistic • Small numbers in minority sub-groups often make the findings for those groups suspect. • Sample results are good approximations at best.
  • 130. Types of sampling I. Probability sampling • Is any method of sampling that utilizes some form of random selection. • Probability sampling is a procedure for sampling from a population in which – The selection of a sample unit is based on chance – Every element of the population has a known and non-zero probability of being selected – Random sampling helps produce representative samples by eliminating voluntary response bias and guarding against undercoverage bias ❖ In simple random sampling, every individual of the target population has an equal chance of being included in the sample.
  • 131. 1. Simple random sample (SRS) • Objective: To select n units out of N • If the population is homogenous • If frame is available • If the study area is not very wide – Note: Homogeneity refers to the similarity of the population with regard to the outcome variable . • Procedure: ✓Use a table of random numbers: takes on values 0,1,2,…….,9 with equal probability ✓ a computer random number generator ✓ mechanical device to select the sample. ✓RAND() function from Excel sheet if frame is available ✓Lottery method Wullo S. 131 Tuesday, December 26, 2023
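The "computer random number generator" option in the procedure above can be sketched in a few lines. This is a minimal illustration that assumes a sampling frame is available as a list; the frame of 500 numbered units, the sample size of 20 and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Select n units from the frame without replacement, each unit equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1..500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, n=20, seed=1)
print(sorted(sample))
```

Sampling without replacement matches the lottery method: once a unit is drawn it cannot be drawn again.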
  • 133. 2. Stratified random sampling • Stratified random sampling involves dividing your population into homogeneous subgroups and then taking a simple random sample in each subgroup. • Objective: Divide the population into non-overlapping groups (i.e., strata) N1, N2, N3, ... Ni, such that N1 + N2 + N3 + ... + Ni = N. Then take a simple random sample within each stratum, depending on the type of allocation ➢ Proportional allocation: $n_i = \frac{N_i}{N} \times n$ ➢ Equal allocation Example: ➢ An agency has clients from three ethnic groups and wants to assess clients' views of the quality of service over the last year.
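Proportional allocation assigns each stratum a share of the sample equal to its share of the population, n_i = (N_i / N) × n. The sketch below illustrates this with hypothetical stratum sizes for the three-group agency example (the counts 500, 300 and 200 and the total sample of 100 are assumptions, not figures from the slides).

```python
def proportional_allocation(stratum_sizes, n):
    """Return the sample size for each stratum: n_i = (N_i / N) * n, rounded."""
    N = sum(stratum_sizes.values())
    return {name: round(n * N_i / N) for name, N_i in stratum_sizes.items()}

# Hypothetical client counts in three strata.
strata = {"Group A": 500, "Group B": 300, "Group C": 200}
allocation = proportional_allocation(strata, n=100)
print(allocation)
```

Because of rounding, the allocated sizes may not always add up exactly to n; in practice one stratum is adjusted by a unit or two. A simple random sample of the allocated size is then drawn within each stratum.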
  • 134. Stratified random sampling Wullo S. 134 Tuesday, December 26, 2023
  • 135. 3. Systematic random sampling ▪ Here are the steps you need to follow in order to achieve a systematic random sample: ❖ number the units in the population from 1 to N ❖ decide on the n (sample size) that you want or need ❖ k = N/n = the interval size ❖ randomly select an integer between 1 and k ❖ then take every kth unit from that starting point • Assumptions – Homogeneous population – Frame is not available – If the study area is not very wide
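The steps above can be sketched as a short function. This is an illustration under the assumption that N is at least n and that the interval is taken as the whole-number part of N/n; the population of 1000 units and sample of 50 are hypothetical.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return the unit numbers chosen by systematic sampling from units 1..N."""
    k = N // n                      # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random start between 1 and k (inclusive)
    return [start + i * k for i in range(n)]  # every kth unit after the start

units = systematic_sample(N=1000, n=50, seed=3)
print(units[:5], "...", units[-1])
```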
  • 136. Systematic random sampling • Example Wullo S. 136 Tuesday, December 26, 2023
  • 139. 4. Cluster (area) random sampling • The problem with random sampling methods when we have to sample a population that is dispersed across a wide geographic region is that you will have to cover a lot of ground geographically in order to get to each of the units you sampled • In cluster sampling, we follow these steps: ✓ divide the population into clusters (usually along geographic boundaries) ✓ randomly sample clusters ✓ measure all units within the sampled clusters
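The three cluster-sampling steps can be illustrated with a small sketch. The cluster names, the five towns per county and the choice of three clusters are all hypothetical and only stand in for a real geographic frame.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly pick m clusters, then return every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)           # step 2: sample clusters
    units = [u for c in chosen for u in clusters[c]]   # step 3: take all their units
    return chosen, units

# Step 1: hypothetical population already divided into 10 clusters of 5 towns each.
clusters = {
    f"county_{i}": [f"county_{i}_town_{j}" for j in range(1, 6)]
    for i in range(1, 11)
}
chosen, towns = cluster_sample(clusters, m=3, seed=7)
print(chosen)
print(len(towns), "towns to visit")
```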
  • 140. Example • In the figure we see a map of the counties in New York State. Let's say that we have to do a survey of town governments that will require us going to the towns personally. If we do a simple random sample state-wide we'll have to cover the entire state geographically Wullo S. 140 Tuesday, December 26, 2023
  • 141. 5. Multi-stage sampling • The four methods we've covered so far -- simple, stratified, systematic and cluster -- are the simplest random sampling strategies • When we combine sampling methods, we call this multi-stage sampling • Consider the problem of sampling students in grade schools. We might begin with a national sample of school districts stratified by educational level. Within selected districts, we might do a simple random sample of schools. Within schools, we might do a simple random sample of classes or grades. And, within classes, we might even do a simple random sample of students. In this case, we have three or four stages in the sampling process and we use both stratified and simple random sampling. Wullo S. 141 Tuesday, December 26, 2023
  • 143. II. Non probability sampling • Non probability sampling does not involve random selection • Does that mean that non probability samples aren‘t representative of the population? • It does mean that non probability samples cannot depend upon the rationale of probability theory • Most sampling methods are purposive in nature because we usually approach the sampling problem with a specific plan in mind. Wullo S. 143 Tuesday, December 26, 2023
  • 144. Convenience sampling Example ➢ Man in the street (attitude of foreigners about Ethiopia) ➢ College students for psychological study Wullo S. 144 Tuesday, December 26, 2023
  • 145. Purposive sampling • In purposive sampling, we sample with a purpose in mind 1. Modal Instance Sampling ✓ In statistics, the mode is the most frequently occurring value in a distribution. ✓ we are sampling the most frequent case, or the "typical" case ✓ We could say that the modal voter is a person who is of average age, educational level, and income in the population Wullo S. 145 Tuesday, December 26, 2023
  • 146. Purposive sampling… 2. Expert Sampling ✓ Expert sampling involves the assembling of a sample of persons with known or demonstrable experience and expertise in some area. 3. Quota Sampling ✓ In quota sampling, you select people non randomly according to some fixed quota ✓ There are two types of quota sampling: proportional and non proportional 4. Heterogeneity Sampling ✓ We sample for heterogeneity when we want to include all opinions or views, and we aren't concerned about representing these views proportionately. ✓ Another term for this is sampling for diversity. Wullo S. 146 Tuesday, December 26, 2023
  • 147. Purposive sampling… 5. Snowball sampling ✓ In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study ✓ You then ask them to recommend others who they may know who also meet the criteria ✓ Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find ✓ For instance, if you are studying the homeless, you are not likely to be able to find good lists of homeless people within a specific geographical area. However, if you go to that area and identify one or two, you may find that they know very well who the other homeless people in their vicinity are and how you can find them Wullo S. 147 Tuesday, December 26, 2023
  • 148. Summary ▪ Selecting a sampling method depends on • Population to be studied ✓ Size and geographic distributions ✓ Heterogeneity with respect to the variable studied • Resource available • Level of precision required • Importance of having a precise estimate of sampling error Wullo S. 148 Tuesday, December 26, 2023
  • 149. Sample size determination • Determining the sample size for a study is a crucial component of study design. • The goal is to include sufficient numbers of subjects so that statistically significant results can be detected. • Among the questions that a researcher should ask when planning a survey or study is that "How large a sample do I need?" • The answer will depend on the aims, nature and scope of the study and on the expected result. • All of which should be carefully considered at the planning stage Wullo S. 149 Tuesday, December 26, 2023
  • 150. Sample size determination …. • In general, sample size depends on – Objective of the study – Design of the study – Plan for statistical analysis – Accuracy of the measurement to be made – Degree of precision required for generalization – Degree of confidence • We can use three approaches to determine sample size – Rules of thumb for determining the sample size – Statistical formula • Confidence interval approach • Hypothesis testing approach Wullo S. 150 Tuesday, December 26, 2023
  • 151. Rules of thumb • For small populations (N < 100), there is little point in sampling. Survey the entire population. • If the population size is around 500 (give or take 100), 50% should be sampled. • If the population size is around 1500, 20% should be sampled. • Beyond a certain point (N = 5000), the population size is almost irrelevant and a sample size of 400 may be adequate. • Statistician maximalist: at least 500 • To make generalizations about an entire population, a total sample size of 200-400 is needed (depending on the total population and the confidence level desired)
  • 152. Confidence interval • A confidence interval for a mean (or proportion) has the form: estimate $\pm\ z_{\alpha/2} \cdot s.e.$ • Hence the absolute precision, denoted by d, is given as $d = z_{\alpha/2} \cdot s.e.$ • Where s.e. is the standard error of the estimator of the parameter of interest. • The margin of error (d) measures the precision of the estimate – A small value of d indicates high precision – It lies in the interval (0%, 5%] – For p close to 50%, d is assumed to be close to 5% – For smaller values of p, d is assumed to be lower than 5%
  • 153. Estimating a single population mean Where the standard deviation σ can be estimated: ✓ From a previous study, if there is one ✓ From a pilot study ✓ From an educated guess
  • 154. Single population proportion • Let p denote the proportion of success; then p can be estimated: ✓ From a previous study, if there is one ✓ From a pilot study ✓ By taking p = 50%
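The sample size formulas themselves appear to have been images and are missing from these two slides. As a hedged reconstruction, solving d = z × s.e. for n with s.e. = σ/√n gives n = (z σ / d)² for a mean, and with s.e. = √(p(1 − p)/n) gives n = z² p(1 − p)/d² for a proportion. The sketch below assumes those standard formulas; the function names and the example inputs (σ = 10, d = 2, p = 0.5, d = 0.05) are illustrative, not taken from the slides.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided z value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma, d, confidence=0.95):
    """n = (z * sigma / d)^2, rounded up to a whole subject."""
    z = z_value(confidence)
    return math.ceil((z * sigma / d) ** 2)

def sample_size_proportion(p, d, confidence=0.95):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole subject."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

n_mean = sample_size_mean(sigma=10, d=2)          # hypothetical inputs
n_prop = sample_size_proportion(p=0.5, d=0.05)    # p = 50%, 5% margin of error
print(n_mean, n_prop)
```

Taking p = 50% when nothing is known about p gives the largest value of p(1 − p), and therefore the most conservative (largest) sample size.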
  • 155. Point to be considered Wullo S. 155 Tuesday, December 26, 2023
  • 157. Example Wullo S. 157 Tuesday, December 26, 2023
  • 158. Chapter six Inferential statistics • After completing this session you will be able to do – Parameter estimation – Point estimate – Confidence interval – Hypothesis testing – Z-test – T-test – Testing associations – Chi-Square test
  • 160. Introduction cont'd • Before beginning statistical analyses – it is essential to examine the distribution of the variable for skewness (tails), – kurtosis (peaked or flat distribution), spread (range of the values) and – outliers (data values separated from the rest of the data). • Information about each of these characteristics determines which statistical analyses to choose, so that the results can be accurately explained and interpreted.
  • 161. Introduction con’td ……. • Statistical tests can be either parametric or non- parametric • The path way for the analysis of continuous variables is shown below 161 Wullo S. Tuesday, December 26, 2023
  • 162. Sampling Distribution • The frequency distribution of all these samples forms the sampling distribution of the sample statistic 162 Wullo S. Tuesday, December 26, 2023
  • 163. Sampling distribution ....... • Three characteristics of the sampling distribution of a statistic: its mean, its variance, its shape • If we repeatedly take samples of the same size n from a population, the means of the samples form a sampling distribution of means of size n, whose mean is equal to the population mean. • In practice we do not take repeated samples from a population, i.e. we do not encounter sampling distributions empirically, but it is necessary to know their properties in order to draw statistical inferences.
  • 164. The Central Limit Theorem • Regardless of the shape of the frequency distribution of a characteristic in the parent population, • the means of a large number of samples (independent observations) from the population will follow a normal distribution (with the mean of means approaching the population mean μ, and a standard deviation of σ/√n). • Inferential statistical techniques have various assumptions that must be met before valid conclusions can be obtained • Samples must be randomly selected. • The sample size should be large (n ≥ 30) • The population must be normally or approximately normally distributed if the sample size is less than 30.
  • 165. Sampling Distribution...... • E.g. Sampling distribution of the mean • Suppose we choose a random sample of size n; the sampling distribution of the sample mean x̄ possesses the following properties. – The sample mean x̄ will be an estimate of the population mean μ. – The standard deviation of x̄ is σ/√n (called the standard error of the mean). – Provided n is large enough, the shape of the sampling distribution of x̄ is normal.
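These properties can be seen in a small simulation. The sketch below is an illustration only: it assumes a deliberately skewed parent population (exponential with mean 1 and standard deviation 1), a sample size of 36 and 5000 repeated samples, none of which come from the slides. The mean of the sample means should land close to the population mean, and their standard deviation close to σ/√n.

```python
import random
from statistics import fmean, stdev

rng = random.Random(42)   # fixed seed so the simulation is reproducible
n = 36                    # size of each sample
reps = 5000               # number of repeated samples

# Parent population: exponential with mean 1 and SD 1 (strongly right-skewed).
sample_means = [
    fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = fmean(sample_means)   # should be near the population mean, 1
se_observed = stdev(sample_means)     # spread of the sample means
se_theory = 1.0 / n ** 0.5            # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"observed SE: {se_observed:.3f}, theoretical SE: {se_theory:.3f}")
```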
  • 166. Sampling Distribution .......... • Proportion • Suppose we choose a random sample of size n; the sampling distribution of the sample proportion p possesses the following properties. • The sample proportion p will be an estimate of the population proportion P. • The standard deviation of p is √(p(1-p)/n) (called the standard error of the proportion). • Provided n is large enough, the shape of the sampling distribution of p is normal.
  • 167. Parameter Estimation • In parameter estimation, we generally assume that the underlying (unknown) distribution of the variable of interest is adequately described by one or more (unknown) parameters, referred to as population parameters. • As it is usually not possible to make measurements on every individual in a population, parameters cannot usually be determined exactly. • Instead we estimate parameters by calculating the corresponding characteristics from a random sample; these are called estimates. • Estimation is the process of estimating the value of a parameter from information obtained from a sample. • Point estimation • Interval estimation
  • 168. Point estimation • A point estimate is a specific numerical value estimate of a parameter. • Sample measures (i.e., statistics) are used to estimate population measures (i.e., parameters). These statistics are called estimators. • Point estimate for population mean µ is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ • Point estimate for population proportion is given by $p = \frac{x}{n}$ • Where x is the total number of successes (events)
  • 169. Three Properties of a Good Estimator • The estimator should be an unbiased estimator. That is, the expected value or the mean of the estimates obtained from samples of a given size is equal to the parameter being estimated. • The estimator should be consistent. For a consistent estimator, as sample size increases, the value of the estimator approaches the value of the parameter estimated. • The estimator should be a relatively efficient estimator. That is, of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has the smallest variance. 169 Wullo S. Tuesday, December 26, 2023
  • 170. Some BLUE estimators 170 Wullo S. Tuesday, December 26, 2023
  • 171. Confidence Interval estimate • However, the value of the sample statistic will vary from sample to sample; therefore, simply obtaining an estimate of the single value of the parameter is not generally acceptable. – We also need a measure of how precise our estimate is likely to be – We need to take into account the sample to sample variation of the statistic • A confidence interval defines an interval within which the true population parameter is likely to fall (interval estimate).
  • 172. Confidence intervals… • Confidence interval therefore takes into account the sample to sample variation of the statistic and gives the measure of precision. • The general formula used to calculate a Confidence interval is Estimate ± K × Standard Error, k is called reliability coefficient • Confidence intervals express the inherent uncertainty in any medical study by expressing upper and lower bounds for anticipated true underlying population parameter. • The confidence level is the probability that the interval estimate will contain the parameter, assuming that a large number of samples are selected and that the estimation process on the same parameter is repeated • Most commonly the 95% confidence intervals are calculated, however 90% and 99% confidence intervals are sometimes used. 172 Wullo S. Tuesday, December 26, 2023
  • 173. Confidence interval …… A (1-α)100% confidence interval for the unknown population mean and population proportion is given as follows: for the mean, $\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$; for the proportion, $\left[p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right]$
  • 178. Confidence intervals… • The 95% confidence interval is calculated in such a way that, under the conditions assumed for underlying distribution, the interval will contain true population parameter 95% of the time. • Loosely speaking, you might interpret a 95% confidence interval as one which you are 95% confident contains the true parameter. • 90% CI is narrower than 95% CI since we are only 90% certain that the interval includes the population parameter. • On the other hand 99% CI will be wider than 95% CI; the extra width meaning that we can be more certain that the interval will contain the population parameter. But to obtain a higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval). 178 Wullo S. Tuesday, December 26, 2023
  • 179. Confidence intervals… • For a given confidence level (i.e. 90%, 95%, 99%) the width of the confidence interval depends on the standard error of the estimate which in turn depends on the – 1. Sample size:-The larger the sample size, the narrower the confidence interval (this is to mean the sample statistic will approach the population parameter) and the more precise our estimate. Lack of precision means that in repeated sampling the values of the sample statistic are spread out or scattered. The result of sampling is not repeatable. 179 Wullo S. Tuesday, December 26, 2023
  • 180. Confidence intervals… - To increase precision (of an SRS), use a larger sample. You can make the precision as high as you want by taking a large enough sample. The margin of error decreases as √n increases. • 2. Standard deviation:-The more the variation among the individual values, the wider the confidence interval and the less precise the estimate. As sample size increases, the standard error (σ/√n) decreases, although the standard deviation of the individual values does not. • Z is the value from the standard normal distribution (SND) • 90% CI, z=1.64 • 95% CI, z=1.96 • 99% CI, z=2.58
  • 181. Confidence interval …… • If the population standard deviation is unknown and the sample size is small (<30), the formula for the confidence interval for sample mean is: x ± t (s/√n) – x is the sample mean – s is the sample standard deviation – n is the sample size – t is the value from the t-distribution with (n-1) degrees of freedom 181 Wullo S. Tuesday, December 26, 2023
  • 182. Mean Example • A SRS of 16 apparently healthy subjects yielded the following values of urine excreted (milligram per day): 0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005, 0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008 • Compute the point estimate of the population mean: if $x_1, x_2, \ldots, x_n$ are n observed values, then $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.295}{16} = 0.01844$ • Construct 90%, 95% and 98% confidence intervals for the mean: 90%: (0.01844 − 1.65 × 0.0123/4, 0.01844 + 1.65 × 0.0123/4) = (0.0134, 0.0235) 95%: (0.01844 − 1.96 × 0.0123/4, 0.01844 + 1.96 × 0.0123/4) = (0.0124, 0.0245) 98%: (0.01844 − 2.33 × 0.0123/4, 0.01844 + 2.33 × 0.0123/4) = (0.0113, 0.0256)
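The arithmetic in this example can be reproduced with the standard library. The sketch follows the slide's own approach of using z values with the sample standard deviation (s = 0.0123); note that, by the rule on the previous slide, a t value with 15 degrees of freedom would usually be preferred for a sample this small, which would give slightly wider intervals. Variable and function names are illustrative.

```python
from statistics import NormalDist, mean, stdev

values = [0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
          0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008]

n = len(values)
x_bar = mean(values)          # point estimate of the population mean
s = stdev(values)             # sample standard deviation (n - 1 in the denominator)
se = s / n ** 0.5             # standard error of the mean

def z_interval(confidence):
    """z-based confidence interval for the mean, as computed on the slide."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar - z * se, x_bar + z * se

intervals = {c: z_interval(c) for c in (0.90, 0.95, 0.98)}
print(f"mean = {x_bar:.5f}, s = {s:.4f}")
for c, (lo, hi) in intervals.items():
    print(f"{int(c * 100)}% CI: ({lo:.4f}, {hi:.4f})")
```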
  • 183. Proportion example • In a survey of 300 automobile drivers in one city, 123 reported that they wear seat belts regularly. Estimate the seat belt use rate of the city and the 95% confidence interval for the true population proportion. • Answer: p = 123/300 = 0.41 = 41%, n = 300. The 95% CI for the seat belt use rate of the city = p ± z × √(p(1-p)/n) = 0.41 ± 1.96 × 0.0284 = (0.35, 0.47)
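A minimal sketch of the same calculation, assuming the large-sample (normal approximation) interval used on the slide; the function name `proportion_ci` is illustrative.

```python
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Return the sample proportion and its z-based confidence interval."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = (p * (1 - p) / n) ** 0.5      # standard error of the proportion
    return p, (p - z * se, p + z * se)

p_hat, (lower, upper) = proportion_ci(x=123, n=300)
print(f"p = {p_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```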
  • 184. Summary • Students sometimes have difficulty deciding whether to use Za/2 or t a/2 values when finding confidence intervals 184 Wullo S. Tuesday, December 26, 2023
  • 185. HYPOTHESIS TESTING Introduction – Researchers are interested in answering many types of questions. For example, A physician might want to know whether a new medication will lower a person’s blood pressure. – These types of questions can be addressed through statistical hypothesis testing, which is a decision-making process for evaluating claims about a population. 185 Wullo S. Tuesday, December 26, 2023
  • 186. Hypothesis Testing • The formal process of hypothesis testing provides us with a means of answering research questions. • A hypothesis is a testable statement that describes the nature of the proposed relationship between two or more variables of interest. • In hypothesis testing, the researcher must define the population under study, state the particular hypotheses that will be investigated, give the significance level, select a sample from the population, collect the data, perform the calculations required for the statistical test, and reach a conclusion.
  • 187. Idea of hypothesis testing 187 Wullo S. Tuesday, December 26, 2023
  • 188. Types of hypotheses • The null hypothesis (represented by HO) is a statement about the value of the population parameter. That is, the null hypothesis postulates that ‘there is no difference between factor and outcome’ or ‘there is no intervention effect’. • The alternative hypothesis (represented by HA) states the ‘opposing’ view that ‘there is a difference between factor and outcome’ or ‘there is an intervention effect’.
  • 189. Methods of hypothesis testing • Hypotheses are statements about parameters which may or may not be true • The three methods used to test hypotheses are • The traditional method • The P-value method (new approach) • The confidence interval method
  • 190. Steps in hypothesis testing 1 Identify the null hypothesis H0 and the alternative hypothesis HA. 2 Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors. 3 Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for small samples; for larger samples, a z statistic can work well if the data are normally distributed. 4 Compare the observed value of the statistic to the critical value obtained for the chosen α. 5 Make a decision. 6 State the conclusion.
  • 191. Test Statistics • Because of random variation, even an unbiased sample may not accurately represent the population as a whole. • As a result, it is possible that any observed differences or associations may have occurred by chance. • A test statistic is a value we can compare with the known distribution of what we expect when the null hypothesis is true. • The general formula of the test statistic is: $\text{Test statistic} = \frac{\text{Observed value} - \text{Hypothesized value}}{\text{Standard error}}$ • The known distributions are the normal distribution, Student's t distribution, the chi-square distribution ….
  • 192. Critical value • The critical value separates the critical region from the noncritical region for a given level of significance 192 Wullo S. Tuesday, December 26, 2023
  • 193. Decision making • Reject or fail to reject (accept) the null hypothesis • There are 2 types of errors • Type I error (rejecting a true H0) is usually regarded as the more serious error; its probability is α, the level of significance • Type II error (accepting a false H0) has probability β • Power is the probability of rejecting a false null hypothesis and it is given by 1-β • Type of decision, when H0 is true / when H0 is false: Reject H0: Type I error (α) / Correct decision (1 − β); Accept H0: Correct decision (1 − α) / Type II error (β)
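  • 197. Types of tests • One-tailed tests: H0: μ = μ0 vs H1: μ < μ0 (rejection region of area α in the lower tail); H0: μ = μ0 vs H1: μ > μ0 (rejection region of area α in the upper tail) • Two-tailed test: H0: μ = μ0 vs H1: μ ≠ μ0 (rejection regions of area α/2 in each tail) • The critical value(s) mark the boundary of the rejection region(s).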
  • 199. Steps in hypothesis testing….. 199 If the test statistic falls in the critical region: Reject H0 in favour of HA. If the test statistic does not fall in the critical region: Conclude that there is not enough evidence to reject H0. Wullo S. Tuesday, December 26, 2023
  • 201. The P-Value • In most applications, the outcome of performing a hypothesis test is to produce a p-value. • The p-value is the probability of obtaining a difference at least as large as the one observed, by chance alone, when the null hypothesis is true. • A small p-value implies that the probability of the observed value occurring just by chance is low when the null hypothesis is true. • That is, a small p-value suggests that there might be sufficient evidence for rejecting the null hypothesis. • The p-value is defined as the probability of observing the computed test statistic value or a more extreme one, if H0 is true. For example, P[Z ≥ Zcal | H0 true].
  • 202. P-value…… • A p-value is the probability of getting the observed difference, or one more extreme, in the sample purely by chance from a population where the true difference is zero. • If the p-value is greater than 0.05 then, by convention, we conclude that the observed difference could have occurred by chance and there is no statistically significant evidence (at the 5% level) for a difference between the 202 Wullo S. Tuesday, December 26, 2023
  • 203. How to calculate P-value o Use statistical software like SPSS, SAS…….. o Hand calculations – obtain the test statistic (Z calculated or t calculated) – find the probability for the test statistic from the standard normal table (the area between 0 and the calculated value) – subtract the probability from 0.5 – the result is the P-value. Note: if the test is two-tailed, multiply the result by 2.
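The hand calculation above can be mirrored in code for a z statistic. This sketch computes the tail area beyond the calculated value directly rather than through a printed table; it assumes that, for a one-tailed test, the calculated statistic lies in the direction stated by HA. The function name is illustrative.

```python
from statistics import NormalDist

def p_value_from_z(z_calculated, two_tailed=True):
    """Tail area beyond |z| under the standard normal curve (doubled if two-tailed)."""
    tail = 1 - NormalDist().cdf(abs(z_calculated))  # same as 0.5 minus the area from 0 to |z|
    return 2 * tail if two_tailed else tail

print(f"two-tailed p for z = 1.96: {p_value_from_z(1.96):.4f}")
print(f"one-tailed p for z = 1.645: {p_value_from_z(1.645, two_tailed=False):.4f}")
```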
  • 204. P-value and confidence interval • Confidence intervals and p-values are based upon the same theory and mathematics and will lead to the same conclusion about whether a population difference exists. • Confidence intervals are preferable because they give information about the size of any difference in the population, and they also (very usefully) indicate the amount of uncertainty remaining about the size of the difference. • When the null hypothesis is rejected in a hypothesis-testing situation, the confidence interval for the mean using the same level of significance will not contain the hypothesized mean.
  • 205. The P-Value ….. • But for what values of the p-value should we reject the null hypothesis? – By convention, a p-value of 0.05 or smaller is considered sufficient evidence for rejecting the null hypothesis. – By using a p-value of 0.05, we are allowing a 5% chance of wrongly rejecting the null hypothesis when it is in fact true. • When the p-value is less than or equal to 0.05, we often say that the result is statistically significant.
  • 207. Hypothesis testing for single population mean • EXAMPLE 1: A researcher finds that the mean IQ of a sample of 16 students is 110, while the expected value for the whole population is 100 with a standard deviation of 10. Test the hypothesis. • Solution 1. Ho: µ = 100 vs HA: µ ≠ 100 2. Assume α = 0.05 3. Test statistic: z = (110 − 100)/(10/√16) = (110 − 100) × 4/10 = 4 4. The critical z value for α/2 = 0.025 is 1.96. 5. Decision: reject the null hypothesis since 4 ≥ 1.96 6. Conclusion: the population mean IQ is different from 100 at the 5% level of significance.
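The six steps of this example can be sketched as a small function. It assumes a two-tailed z test with known population standard deviation, as in the example; the function name and return layout are illustrative.

```python
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 with known population SD."""
    z_cal = (x_bar - mu0) / (sigma / n ** 0.5)          # observed test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # critical value for alpha/2
    p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))    # two-tailed p-value
    return z_cal, z_crit, p_value, abs(z_cal) >= z_crit

z_cal, z_crit, p_value, reject_h0 = one_sample_z_test(x_bar=110, mu0=100, sigma=10, n=16)
print(f"z = {z_cal:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```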
  • 208. Hypothesis testing for single proportions • Example: In a study of childhood abuse in psychiatric patients, Brown found that 166 in a sample of 947 patients reported histories of physical or sexual abuse. a) Construct a 95% confidence interval b) Test the hypothesis that the true population proportion is 30%. • Solution (a) • The 95% CI for P is given by $p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.175 \pm 1.96\sqrt{\frac{0.175 \times 0.825}{947}} = 0.175 \pm 1.96 \times 0.0124 = [0.151,\ 0.199]$
  • 209. Example…… • To test the hypothesis we need to follow the steps Step 1: State the hypothesis Ho: P = Po = 0.3, Ha: P ≠ Po (P ≠ 0.3) Step 2: Fix the level of significance (α = 0.05) Step 3: Compute the calculated and tabulated values of the test statistic: $z_{cal} = \frac{p - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} = \frac{0.175 - 0.3}{\sqrt{\frac{0.3(0.7)}{947}}} = \frac{-0.125}{0.0149} = -8.39$ and $z_{tab} = 1.96$
  • 210. Example…… • Step 4: Comparison of the calculated and tabulated values of the test statistic • Step 5: Decision: since the tabulated value (1.96) is smaller than the absolute value of the calculated test statistic (8.39), we reject the null hypothesis. • Step 6: Conclusion • Hence we conclude that the proportion of childhood abuse in psychiatric patients is different from 0.3 • If the sample size is small (if np<5 and n(1-p)<5) then use Student's t-statistic for the tabulated value of the test statistic.
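The proportion test in this example can be checked with a short sketch. It assumes the large-sample z test with the standard error computed under H0, as on the slide. Using the unrounded sample proportion 166/947 gives z of about −8.37 rather than the slide's −8.39 (which uses p rounded to 0.175); the decision is the same. Names are illustrative.

```python
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, alpha=0.05):
    """Two-tailed z test of H0: P = p0 using the standard error under H0."""
    p_hat = x / n
    se0 = (p0 * (1 - p0) / n) ** 0.5
    z_cal = (p_hat - p0) / se0
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)
    return p_hat, z_cal, z_tab, abs(z_cal) > z_tab

p_hat, z_cal, z_tab, reject_h0 = one_proportion_z_test(x=166, n=947, p0=0.3)
print(f"p = {p_hat:.3f}, z = {z_cal:.2f}, tabulated z = {z_tab:.2f}, reject H0: {reject_h0}")
```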
  • 211. Two sample mean and proportion • So far we have seen estimates for only a single mean and a single proportion. However, it is possible to compute point and interval estimates for the difference of two sample means. • Let x1, x2, …, xn1 be a sample from the first population and y1, y2, …, yn2 be a sample from the second population. • Let the sample mean for the first population be $\bar{X}$ • and the sample mean for the second population be $\bar{Y}$ • Then the point estimate for the difference of means (µ1-µ2) is given by $(\bar{X} - \bar{Y})$
  • 212. Two sample estimation • Confidence interval estimation • A (1-α)100% confidence interval for the difference of means is given by $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ • If $\sigma_1$ and $\sigma_2$ are unknown, then they can be estimated by $s_1$ and $s_2$
  • 213. Hypothesis testing for two sample means • The steps to test a hypothesis about the difference of means are the same as for a single mean. Step 1: State the hypotheses. Ho: µ1−µ2 = 0 vs HA: µ1−µ2 ≠ 0, or HA: µ1−µ2 < 0, or HA: µ1−µ2 > 0. Step 2: Significance level (α). Step 3: Test statistic: $z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
  • 215. Small sample size and population variance is not given • The test statistic will be Student's t-statistic with degrees of freedom equal to n1 + n2 − 2. • Hence the tabulated value of t is read from the t table. • The decision rule remains the same.
  • 216. Example • Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, with standard deviations of 2.9 and 3.5 mg/100 ml, respectively. $H_O: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$
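The worked solution is not part of this section, so the following is only a hedged sketch of how the test could be carried out from the summary statistics. It assumes the pooled-variance form of Student's t, which is the form that corresponds to n1 + n2 − 2 degrees of freedom; the function name `pooled_t` is hypothetical, and the critical value 2.060 is the usual t-table entry for 25 degrees of freedom at a two-sided 5% level.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (assumed equal variances)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean1 - mean2) / se, df

# Summary data from the example (group 1: Down's syndrome, group 2: normal).
t_cal, df = pooled_t(4.5, 2.9, 12, 3.4, 3.5, 15)
T_TAB = 2.060   # assumption: t-table value for df = 25, alpha = 0.05, two-sided
reject = abs(t_cal) > T_TAB
print(f"t_cal = {t_cal:.3f}, df = {df}, reject Ho: {reject}")
```

Under these assumptions the calculated statistic is well below the tabulated value, so the data would not provide sufficient evidence of a difference in mean serum uric acid levels.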
  • 218. Estimation and hypothesis testing for two population proportions • Let n1 and n2 be the sample sizes from the two populations. If x and y are the numbers of outcomes of interest in each sample, then the point estimate for each population proportion is given by p1 = x/n1 and p2 = y/n2, respectively. • The point estimate of π1−π2 is p1−p2. • If the sample sizes are large and n1p1 > 5, n1(1−p1) > 5, n2p2 > 5, n2(1−p2) > 5, then the interval estimate for the difference of proportions is given by $(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
  • 219. Hypothesis testing for two proportions • To test the hypothesis Ho: π1−π2 = 0 vs HA: π1−π2 ≠ 0, the test statistic is given by $z_{cal} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$
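The interval estimate and test statistic for two proportions can be sketched as follows. The counts used here (40 of 200 and 30 of 250) are hypothetical numbers chosen only for illustration, and the function name `two_proportion_z` is likewise an assumption. The standard error is the unpooled one written in the formulas above; many textbooks instead pool the two proportions under Ho, which gives a slightly different value.

```python
import math

def two_proportion_z(x, n1, y, n2, z_tab=1.96):
    """Difference of two proportions: z statistic and confidence interval."""
    p1, p2 = x / n1, y / n2
    # Unpooled standard error, as in the formula for the interval estimate.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    z_cal = diff / se                      # Ho: pi1 - pi2 = 0
    ci = (diff - z_tab * se, diff + z_tab * se)
    return z_cal, ci

# Hypothetical data: 40 events in 200 subjects vs 30 events in 250 subjects.
z_cal, ci = two_proportion_z(40, 200, 30, 250)
print(f"z_cal = {z_cal:.3f}, 95% CI = [{ci[0]:.4f}, {ci[1]:.4f}]")
```

For these made-up counts the interval excludes zero and |z_cal| exceeds 1.96, so the two conclusions agree, as they must when the same standard error is used for both.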
  • 220. Small sample size • If the sample size is small, so that n1p1 < 5 or n2p2 < 5, then use Student's t-statistic with n1 + n2 − 2 degrees of freedom at the given level of significance.
  • 221. Chi-square test • In recent years, the use of specialized statistical methods for categorical data has increased dramatically, particularly for applications in the biomedical and social sciences. • Categorical scales occur frequently in the health sciences for measuring responses. • E.g. • whether a patient survives an operation (yes, no), • severity of an injury (none, mild, moderate, severe), and • stage of a disease (initial, advanced). • Studies often collect data on categorical variables that can be summarized as a series of counts, commonly arranged in a tabular format known as a contingency table.
  • 222. Chi-square Test Statistic cont’d… • As with the t distribution, there is a different chi-square distribution for each possible value of the degrees of freedom (df). • The chi-squared distribution is concentrated over nonnegative values. It has mean equal to df, and its standard deviation equals √(2df). • Chi-square distributions with a small number of degrees of freedom are highly skewed to the right; this skewness is attenuated as df increases, so the distribution concentrates around larger values, is more spread out, and becomes more bell-shaped (normal).
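These properties can be made concrete with a small sketch. The helper names `chi2_summary` and `simulate_chi2` are hypothetical. The skewness formula √(8/df) is a standard result added here for illustration; it is not stated on the slide. The simulation relies on the fact that a chi-square variable with df degrees of freedom is a sum of df squared standard normal variables.

```python
import math
import random
from statistics import fmean

def chi2_summary(df):
    """Closed-form mean, standard deviation and skewness of chi-square(df)."""
    return {"mean": df, "sd": math.sqrt(2 * df), "skewness": math.sqrt(8 / df)}

def simulate_chi2(df, n_draws=20000, seed=0):
    """Draw chi-square(df) values as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

for df in (1, 4, 16):
    s = chi2_summary(df)
    sim_mean = fmean(simulate_chi2(df))
    print(f"df={df}: mean={s['mean']}, sd={s['sd']:.2f}, "
          f"skewness={s['skewness']:.2f}, simulated mean={sim_mean:.2f}")
```

As df grows the mean and spread increase while the skewness shrinks toward zero, which is the sense in which the distribution becomes more bell-shaped.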
  • 223. The degrees of freedom for tests of hypothesis that involve an r×c contingency table are equal to (r−1)×(c−1), where r is the number of rows and c is the number of columns.
  • 224. Test of Association • The chi-squared (χ²) test statistic is widely used in the analysis of contingency tables. • It compares the actual observed frequency in each group with the expected frequency (the latter is based on theory, experience or comparison groups). • The chi-squared test (Pearson’s χ²) allows us to test for association between categorical (nominal) variables. • The null hypothesis for this test is that there is no association between the variables. Consequently, a significant p-value implies association.
  • 225. Test Statistic: χ²-test with d.f. = (r−1)×(c−1): $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij} = \frac{(i^{th}\ \text{row total}) \times (j^{th}\ \text{column total})}{\text{grand total}} = \frac{R_i \times C_j}{n}$. Oij = observed frequency, Eij = expected frequency of the cell at the juncture of the i-th row and j-th column.
  • 226. Procedures of Hypothesis Testing 1. State the hypotheses 2. Fix the level of significance 3. Find the critical value (χ²(df, α)) 4. Compute the test statistic 5. Decision rule: reject the null hypothesis if the test statistic > table value.
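The procedure above can be sketched in code for a general r×c table. The 2×3 table of counts is hypothetical, the function name `chi_square_test` is an assumption, and the critical value 5.991 is the usual chi-square table entry for 2 degrees of freedom at α = 0.05.

```python
def chi_square_test(observed):
    """Pearson chi-square statistic, df and expected counts for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)                                   # grand total
    # E_ij = (i-th row total) * (j-th column total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Hypothetical 2 x 3 table of observed counts.
table = [[10, 20, 30],
         [20, 20, 20]]
chi2, df, expected = chi_square_test(table)
CRITICAL = 5.991   # assumption: chi-square table value for df = 2, alpha = 0.05
print(f"chi2 = {chi2:.3f}, df = {df}, reject Ho: {chi2 > CRITICAL}")
```

For this made-up table every expected count is at least 15, the statistic is about 5.33, and the null hypothesis of no association is not rejected at the 5% level.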
  • 227. Test of associations for 2x2 tables • If we call the frequencies in the four cells of a 2×2 table a, b, c and d, then the table is given by
      Exposure status | Disease status: D | Disease status: ND | Row total
      E               | a                 | b                  | a+b
      NE              | c                 | d                  | c+d
      Column total    | a+c               | b+d                | N
  • 228. Test of association • If the contingency table is 2×2, then $\chi^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}$, where n = a+b+c+d is the grand total. • If the table is r×c, then $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
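The 2×2 shortcut is algebraically the same as the general formula, which can be verified numerically. The sketch below uses hypothetical cell counts, and the function names are assumptions introduced for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square formula for a 2 x 2 table with cells a, b, c, d."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def chi_square_general(observed):
    """General formula: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )

# Hypothetical counts: a = 12, b = 38, c = 25, d = 25.
shortcut = chi_square_2x2(12, 38, 25, 25)
general = chi_square_general([[12, 38], [25, 25]])
print(f"shortcut = {shortcut:.4f}, general = {general:.4f}")
```

Both routes give the same value up to floating-point rounding, so the shortcut is simply a convenience for hand calculation.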
  • 229. Assumptions of the χ²-test The chi-squared test assumes that • The data must be categorical. • The data must be frequency data (counts). • The numbers in each cell are ‘not too small’: – no expected frequency should be less than 1, and – no more than 20% of the expected frequencies should be less than 5. • If this does not hold, row or column categories can sometimes be combined (re-categorized) to make the expected frequencies larger, or Yates' continuity correction can be used.
  • 230. Yates correction • It is a requirement that a chi-squared test be applied to discrete data: counts are appropriate, continuous measurements are not. However, the chi-squared distribution itself is continuous, and approximating discrete counts by a continuous distribution distorts the p-value and may make false positives more likely. • Frank Yates proposed a correction to the chi-squared formula that subtracts 0.5 from each absolute difference |O − E| before squaring. This tends to increase the p-value and makes the test more conservative, making false positives less likely. However, the test may now be too conservative. • Additionally, the chi-squared test should not be used when the observed value in a cell is < 5. At times it is not inappropriate to pad an empty cell with a small value, though, as one can only assume the result would be more significant with no value there.
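The effect of the correction can be illustrated on a 2×2 table. This is a sketch under stated assumptions: the cell counts are hypothetical, the function name and the `yates` flag are illustrative, and the corrected statistic uses the common 2×2 form n(|ad − bc| − n/2)² / [(a+b)(c+d)(a+c)(b+d)], floored at zero so the correction never overshoots. The critical value 3.841 is the usual chi-square table entry for 1 degree of freedom at α = 0.05.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for a 2 x 2 table, optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; floored at zero so it cannot overshoot.
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: a = 20, b = 30, c = 30, d = 20.
uncorrected = chi_square_2x2(20, 30, 30, 20)
corrected = chi_square_2x2(20, 30, 30, 20, yates=True)
CRITICAL = 3.841   # assumption: chi-square table value for df = 1, alpha = 0.05
print(f"uncorrected = {uncorrected:.2f}, reject: {uncorrected > CRITICAL}")
print(f"corrected   = {corrected:.2f}, reject: {corrected > CRITICAL}")
```

For these made-up counts the uncorrected statistic exceeds the critical value while the corrected one does not, which shows how the correction makes the test more conservative.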