National Collection Program
PUH 5302, Applied Biostatistics 1
Course Learning Outcomes for Unit II
Upon completion of this unit, students should be able to:
2. Analyze relevant scientific evidence.
2.1 Compute the appropriate data to compare the extent of
disease between groups.
2.2 Summarize data collection in a sample.
7. Evaluate the role of biostatistical analysis in public health
research.
7.1 Prepare an outline of a selected topic related to
biostatistical analysis.
Course/Unit Learning Outcomes    Learning Activity
2.1                              Unit Lesson; Chapter 3; Unit II Problem Solving
2.2                              Unit Lesson; Chapter 4; Unit II Problem Solving
7.1                              Unit II Research Paper Outline
Reading Assignment
Chapter 3: Quantifying the Extent of Disease
Chapter 4: Summarizing Data Collected in the Sample
Unit Lesson
Welcome to Unit II. In the previous unit, we had an overview of
biostatistics as a branch of applied statistics that helps us
find meaning in, and interpret, some of the public health issues we
face. We defined some terms and familiarized ourselves with some
common study designs as well. We also briefly discussed some
differences between descriptive and inferential statistics and their
application in biostatistics.
In this unit, we will discuss how scientific data are organized,
identified, and retrieved. Specifically, we will
select, compute, and interpret appropriate measures to compare
the extent of disease between groups, and
we will discuss how to summarize collected data in a sample.
Be prepared. We will be solving some statistical
problems as part of our learning in this unit.
Quantifying the Extent of a Disease
In addressing prevalence and incidence rates, we have to deploy
both descriptive and inferential statistics in order to make
generalizations about a given population. The first step is normally
descriptive statistics, which summarize the data and give the
researcher a basis for generating inferences. These concepts help us
address prevalence and incidence. First, let’s attempt to answer a
question: What is disease prevalence?
Prevalence refers to how commonly something occurs. With regard to
diseases, prevalence is the number of cases of a disease present in a
particular population at a given time (Koch, 2015). For comparison
purposes, many experts use the prevalence rate, which is the
proportion of persons in a population who have a particular disease
at a specific time or over a specified period of time. Many people
confuse prevalence with incidence, but they are different. Prevalence
covers all cases, both old and new, within a population at a specific
time. Incidence deals only with new cases.
UNIT II STUDY GUIDE
Organizing, Identifying, and Retrieving
Relevant Scientific Evidence
The prevalence rate is used by public health workers to estimate how
likely it is that a person in a population has a disease. People who
have the disease may be referred to as cases or prevalent cases. To
find the prevalence rate of a disease in a given population,
epidemiologists or public health practitioners count the total number
of cases, or infected persons, that actually exist in the population
and divide by the total population. For example, if 1,500 people are
infected with HIV out of a total population of 10,000 people, the
prevalence rate is 1,500 / 10,000 = 0.15, or 15%. Multiplying by 10^n
with n = 5 expresses the same rate as 15,000 per 100,000 people.
Point prevalence (PP) refers to the prevalence of a disease
measured at a specific time. It is the proportion of
persons with a particular disease attribute on a specific date or
point in time (Sullivan, 2018).
Calculating prevalence: Prevalence of a disease is determined
by using the following formula:
Prevalence = (all new and old cases in a specific time /
population during the specific period) x 10^n
Attribute prevalence (AP) is determined by using this formula:
AP = (persons with an attribute during a specific time /
population during that specific period) x 10^n
Note: 10^n = 1 or 100; 1,000; 100,000
Point prevalence (PP) is determined by using the following
formula:
PP = number of persons with disease / number of persons examined
at baseline
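The prevalence formula can be checked with a few lines of Python. This is a minimal sketch, not part of the textbook: the function name `prevalence` and its `per` parameter (the 10^n multiplier) are illustrative assumptions, and the figures are the HIV example given earlier in this lesson.

```python
def prevalence(cases: int, population: int, per: int = 1) -> float:
    """Return cases / population, scaled by `per` (the 10^n multiplier)."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must be between 0 and population")
    return cases / population * per


# HIV example from the lesson: 1,500 cases in a population of 10,000.
hiv_proportion = prevalence(1500, 10_000)               # as a proportion
hiv_per_100k = prevalence(1500, 10_000, per=100_000)    # per 100,000 people

print(f"Prevalence: {hiv_proportion:.2f}")
print(f"Prevalence: {hiv_per_100k:,.0f} per 100,000")
```

Changing `per` only changes how the same proportion is reported; it does not change the underlying prevalence.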
Let’s solve a problem involving prevalence of cardiovascular
disease (CVD) using data from Table 3-1 on
page 24 in your textbook as an example.
Details: Total men examined = 1792
Total men affected = 244
Total women examined = 2007
Total women affected = 135
Let’s find the prevalence. We will express our answers in
percentages.
Prevalence of CVD = (244 + 135) / (1792 + 2007) = 379 / 3799 = 0.0998 =
9.98%
Prevalence of CVD in Men = 244 / 1792 = 0.1362 = 13.62%
Prevalence of CVD in Women = 135 / 2007 = 0.0673 = 6.73%
Incidence
Incidence refers to the frequency or extent of occurrence. For
example, there is a high incidence of HIV among drug users. Public
health researchers may decide to follow a certain population for a
period of time. In so doing, they may be able to calculate the
incidence of a disease. Incidence is also considered how likely
someone is to develop a disease over time (Sullivan, 2018). In
other words, disease incidence is the number of new infections or
cases during a given period of time.
Incidence rate (IR) is the number of new cases of a disease
divided by the number of persons at risk for the
disease (Sullivan, 2018). For example, if over a one-year period 10
men are diagnosed with HIV out of a total male study population of
400 with no infection at the beginning of the period, then the
incidence of HIV in this population was 0.025 (or 2,500 per 100,000
man-years of study). The incidence rate is usually multiplied by
some multiple of 10 (for example, 10; 100; 1000; 10,000). The
IR is reported as a rate in relation to a specific
time interval. Taking our example, the IR of HIV is reported as
2.5 per 100 person-years.
Let’s see how we got this answer. The incidence rate is
determined in the following way:
\[ \text{Incidence rate} = \frac{\text{Number of persons who develop disease during a specified period}}{\text{Sum of the lengths of time during which persons are disease-free}} \]
From the example above, treating each of the 400 men as contributing
one year of disease-free follow-up, the incidence rate = 10 / 400 = 0.025
= 0.025 x 100
= 2.5 per 100 person-years
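The same arithmetic can be sketched in Python. The helper name `incidence_rate` is hypothetical, and the sketch makes the simplifying assumption that every one of the 400 men contributes a full year of disease-free follow-up, so the denominator is 400 person-years; in a real study, person-time would be summed individually.

```python
def incidence_rate(new_cases: int, person_time: float, per: int = 1) -> float:
    """Return new cases per unit of disease-free person-time, scaled by `per`."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time * per


men_at_risk = 400        # disease-free at the start of the period
years_followed = 1       # assumption: each man is followed for one year
person_years = men_at_risk * years_followed

ir_per_100 = incidence_rate(10, person_years, per=100)
print(f"Incidence rate: {ir_per_100:.1f} per 100 person-years")
```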
Incidence proportion (IP) is the proportion of an initially
disease-free population that develops disease, becomes injured, or
dies during a specified period of time. Some experts use incidence
synonymously with rate, risk, probability of getting disease, and
cumulative incidence. They have used proportions to measure
incidence for comparison purposes. IP is given using the
following formula:
IP = number of new cases of the disease during a specific time / size
of the population at the beginning of that period
When incidence is measured over the whole study, from its start to
its end, the result is referred to as cumulative incidence
(Sullivan, 2018). Cumulative incidence is calculated in the
following way:
\[ \text{Cumulative incidence} = \frac{\text{Number of persons who develop disease during a specified period}}{\text{Number of persons at risk at baseline}} \]
Measuring Differences Between Groups
Public health experts often report differences between groups in
order to compare disease trends among
different groups or demographics. They do so by using simple
differences and ratios. A ratio is a quantitative relation between
two amounts or quantities that shows the number of times one value is
contained within the other. For example, 1 is to 2, also written 1:2
or ½.
Differences are measured using the following:
Risk difference (excess risk) is the difference in point
prevalence, cumulative incidence, or incidence rates
between the groups (Sullivan, 2018). It shows the absolute effect
of the exposure on the disease condition.
Population attributable risk (PAR) expresses the relationship
between a risk factor and the likelihood of a disease as the
proportion of cases in the population that can be attributed to the
risk factor (Sullivan, 2018). The PAR is calculated by using the
formula below.
\[ \text{PAR} = \frac{\text{Prevalence}_{\text{overall}} - \text{Prevalence}_{\text{unexposed}}}{\text{Prevalence}_{\text{overall}}} \]
Let’s solve some problems with these formulas using
information from Table 3-2 in your textbook.
Using the data from Table 3-2, we will calculate the risk difference
(RD) of CVD between smokers and non-smokers and the
population attributable risk.
1. RD = Exposed (81 / 744 = 0.1089) - Unexposed (298 / 3055 =
0.0975) = 0.0114 (1.14%)
This shows that the prevalence of CVD is 0.0114 (1.14 percentage
points) higher in smokers than in non-smokers.
2.
\[ \text{PAR} = \frac{\text{Prevalence}_{\text{overall}} - \text{Prevalence}_{\text{unexposed}}}{\text{Prevalence}_{\text{overall}}} = \frac{0.0998 - 0.0975}{0.0998} = 0.023 = 2.3\% \]
This means that 2.3% of the cases of CVD can be attributed to smoking
or smoke-related exposure.
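As a check on the hand calculation, the sketch below recomputes RD and PAR from the counts quoted in this lesson for Table 3-2 (81 of 744 smokers and 298 of 3,055 non-smokers with CVD). The function and variable names are illustrative assumptions. Because the code keeps unrounded prevalences, its results differ slightly from the worked answers above, which use prevalences rounded to four decimals.

```python
def risk_difference(p_exposed: float, p_unexposed: float) -> float:
    """Absolute difference in prevalence between exposed and unexposed."""
    return p_exposed - p_unexposed


def population_attributable_risk(p_overall: float, p_unexposed: float) -> float:
    """(overall prevalence - unexposed prevalence) / overall prevalence."""
    if p_overall <= 0:
        raise ValueError("overall prevalence must be positive")
    return (p_overall - p_unexposed) / p_overall


# Counts quoted in the lesson from Table 3-2.
cvd_smokers, n_smokers = 81, 744
cvd_nonsmokers, n_nonsmokers = 298, 3055

p_exposed = cvd_smokers / n_smokers
p_unexposed = cvd_nonsmokers / n_nonsmokers
p_overall = (cvd_smokers + cvd_nonsmokers) / (n_smokers + n_nonsmokers)

rd = risk_difference(p_exposed, p_unexposed)
par = population_attributable_risk(p_overall, p_unexposed)

print(f"RD  = {rd:.4f}")
print(f"PAR = {par:.4f}")
```

With unrounded inputs, RD comes out at about 0.0113 and PAR at about 0.022 (2.2%), compared with 0.0114 and 2.3% from the rounded hand calculation.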
Relative risk (RR) is another ratio used to compare prevalence
between groups. RR is calculated using the
following formula:
RR = PP exposed / PP unexposed
The odds ratio is another ratio. It approximates the RR when the
disease condition is rare, that is, when prevalence is less than 10
percent. The odds ratio is also computed in place of the relative
risk under certain study designs, such as case-control studies, where
relative risk computation is not possible.
\[ \text{Odds ratio} = \frac{\text{Prevalence}_{\text{exposed}} / (1 - \text{Prevalence}_{\text{exposed}})}{\text{Prevalence}_{\text{unexposed}} / (1 - \text{Prevalence}_{\text{unexposed}})} \]
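To see how the two ratios compare, the sketch below applies the RR and odds ratio formulas to the smoking counts quoted earlier in this lesson (81 of 744 smokers and 298 of 3,055 non-smokers with CVD). The lesson does not work these two values out, so treat the output as an illustration of the formulas. All function names are assumptions.

```python
def odds(p: float) -> float:
    """Convert a proportion p into odds, p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def relative_risk(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of prevalence in the exposed to prevalence in the unexposed."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the odds of disease in the exposed to the odds in the unexposed."""
    return odds(p_exposed) / odds(p_unexposed)


p_smokers = 81 / 744        # prevalence of CVD in smokers
p_nonsmokers = 298 / 3055   # prevalence of CVD in non-smokers

rr = relative_risk(p_smokers, p_nonsmokers)
or_ = odds_ratio(p_smokers, p_nonsmokers)

print(f"RR = {rr:.2f}")
print(f"OR = {or_:.2f}")
```

Both ratios are slightly above 1 and close to each other, which is what the text leads us to expect when prevalence is near or below 10 percent.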
Summarizing Data Collected in a Sample
Statisticians collect information, or data, from a sample on a
phenomenon under study. Data are facts, observations, or numerical
information organized for analysis or used as the basis for making a
decision. Data are collected in different ways using the various
scales discussed below.
Nominal scales are used when you want to place data into
categories without giving data structure or order
(e.g., yes or no; male and female; color of hair—black, brown,
white, gray, other). Nominal scales do not imply
any ordering among responses (M&E Studies, n.d.).
Ordinal scales are used when you want to rank variables. For
example, patients are asked to rank their pain
level from 0–10, with 10 being the worst. These pain levels may
vary from patient to patient. The levels are ordered, but the
distance between them is not defined; it is a rough order that is
subjective to the patient’s feeling.
A researcher may also choose to measure patient satisfaction
with services delivered during inpatient
admission. The researcher may use an ordinal scale that asks
patients to specify their feelings as very dissatisfied,
somewhat dissatisfied, somewhat satisfied, or very satisfied in order
to rank patient satisfaction.
Interval scales are used widely in statistics. A typical example
is the Likert scale. In a Likert scale, you may be
asked to rate your level of satisfaction on a 5-point
scale: strongly satisfied, satisfied, neutral,
dissatisfied, or strongly dissatisfied.
Example of an ordinal scale
(Weis, 2015)
Summarizing data collected can be done in various ways. The
data/variables must be organized in order to
be meaningful for descriptive or inferential analysis. There are
several types of variables.
• Dichotomous variables have two levels (e.g., male and female, yes
and no).
• Ordinal variables may also have two or more categories, except
ordinal variables can be ranked or ordered. For example, very
satisfied, satisfied, dissatisfied, very dissatisfied.
• Continuous variables can take any value within a certain range
of real numbers. Examples include height, time, age, and
temperature (Sullivan, 2018).
Data are collected from a sample within a population. A
population simply refers to an entire group having
common observable characteristics: for example, a population
living in a specific geographical location. Often,
when the population is too large, it is impossible to collect data
from everyone. Therefore, we select a sample
from that population. The results obtained from the study are
generalized to the entire population.
Data Interpretation
Data are summarized or interpreted in various ways using charts
and statistics. Some of the most familiar
methods are displayed below.
Example of a Likert scale
(Smith, 2011)
Statistics and Charts
[Two-column list of statistics and charts; only fragments survive: "the frequencies", "given values", "-point", "from the normal (Boeree, n.d.)", "a group", and box-and-whisker plots for continuous variables.]
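The sketch below shows one way such summaries might be computed with Python's standard library: a frequency table for an ordinal variable, and the mean, median, standard deviation, and quartiles (the values a box-and-whisker plot is drawn from) for a continuous variable. The responses and ages are hypothetical values made up for illustration; they do not come from the textbook.

```python
from collections import Counter
import statistics

# Hypothetical ordinal data: patient satisfaction responses.
levels = ["very dissatisfied", "dissatisfied", "satisfied", "very satisfied"]
responses = [
    "very satisfied", "satisfied", "satisfied", "dissatisfied",
    "satisfied", "very dissatisfied", "very satisfied", "satisfied",
]

# Frequency and relative frequency of each level, in ranked order.
counts = Counter(responses)
for level in levels:
    n = counts[level]
    print(f"{level:<18} {n:>2} {n / len(responses):6.1%}")

# Hypothetical continuous data: ages in years.
ages = [23, 35, 41, 29, 52, 47, 35, 60]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
sd_age = statistics.stdev(ages)                 # sample standard deviation
q1, q2, q3 = statistics.quantiles(ages, n=4)    # quartiles

print(f"mean={mean_age}, median={median_age}, sd={sd_age:.2f}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}")
```

Frequencies suit dichotomous and ordinal variables, while the mean, median, standard deviation, and quartiles suit continuous variables.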
In summary, public health practitioners and other researchers
have used statistical theories and principles to
analyze, interpret, and report research data for several decades.
Public health professionals are required to
report findings relating to diseases and other population health
issues. The use of statistical methods has
played a major role in achieving this milestone.
References
Boeree, C. G. (n.d.). Descriptive statistics. Retrieved from http://webspace.ship.edu/cgboer/descstats.html
Koch, G. (2015). Basic allied health statistics and analysis (4th ed.). Stamford, CT: Cengage Learning.
M&E Studies. (n.d.). Types of measurement scales. Retrieved from http://www.mnestudies.com/research/types-measurement-scales
Smith, N. (2011). Example Likert scale [Image]. Retrieved from https://commons.wikimedia.org/wiki/File:Example_Likert_Scale.jpg
Sullivan, L. M. (2018). Essentials of biostatistics in public health (3rd ed.). Burlington, MA: Jones & Bartlett Learning.
Weis, R. (2015). Children’s pain scale [Image]. Retrieved from https://commons.wikimedia.org/wiki/File:Children%27s_pain_scale.JPG
Learning Activities (Nongraded)
Nongraded Learning Activities are provided to aid students in
their course of study. You do not have to submit
them. If you have questions, contact your instructor for further
guidance and information.
Complete the following Chapter 3 practice problems: 2, 4, 7,
and 10 on pages 31–33 of your textbook. Also,
complete the following Chapter 4 practice problems: 12–19 on
page 65 in your textbook. Be sure to show all of
your work.
• Factoring in all elements of situational awareness
should create an overview of current security risk
• Descriptors such as high, medium, and low are too
vague to be helpful
• Security risk levels should be linked with actionable
items
Managing Vulnerability Information
• Situational awareness for national infrastructure
protection requires a degree of attention to daily
trivia around vulnerability information
• Practical heuristics for managing vulnerability
information
– Structured collection
– Worst case assumptions
– Nondefinitive conclusions
Managing Vulnerability Information
• Three basic rules for managers
– Always assume adversary knows as much or more about
your infrastructure
– Assume the adversary is always keeping vulnerability-
related secrets from you
National Awareness Program
1. Research paper assignment is to write a research paper that
explains how defense-in-depth (Chapter 6) and awareness
(Chapter 10) are complementary techniques to detect emerging
threats and strengthen countermeasures.
2. List of sources must be in APA format, and you MUST cite
your reference in the body of the paper using APA in-text
citation format
3. A source is any paper or article that you will reference in
your paper
4. NO PLAGIARISM AT ALL
5.
a. 2 peer reviewed resources
b. Paper MUST address: How defense-in-depth (Chapter 6) and
awareness (Chapter 10) are complementary techniques to
detect emerging threats and strengthen countermeasures
c. Cited sources must directly support your paper (i.e., not
incidental references)
d. At least 500 words in length (but NOT longer than 1000
words)
6. Please use 5 to 6 reference citations
7. TEXT BOOK: Amoroso, E. G. (2012). Cyber attacks:
protecting national infrastructure. Elsevier.
8. If you are not sure how to identify peer reviewed papers or
articles, please visit the following resources:
a. http://diy.library.oregonstate.edu/using-google-scholar-find-peer-reviewed-articles
b. http://libguides.gwu.edu/education/peer-reviewed-articles
Research Paper Rubric
Each component is scored at one of five levels: 100%, 75%, 50%, 25%, or 0.
Basic Requirements
100%: Formatted correctly, at least 500 words in length, citation page and internal citations correct (APA format), at least 2 cited peer reviewed sources.
0: Does not meet required page length, and/or does not have 2 cited peer reviewed sources.
Thesis Statement
100%: Engaging, challenging, and clearly focuses the paper. Effectively stated in the introduction and carried throughout the paper.
75%: Clear and articulate, engaging and clearly focuses the paper, but is not challenging. Is effectively carried throughout the paper.
50%: Clearly stated in the introduction, attempts to be engaging, is adequate, but lacks insight and focus, and is carried through the paper.
25%: Included in the introduction, but is vague. Lacks insight and focus, and is not carried throughout the paper.
0: Is vague or may be lacking in the introduction; is not focused and lacks development; is not carried throughout the paper.
Introduction
100%: Strong and effective, it is engaging and clearly defines the thesis, as well as provides a foundation for the body of the paper.
75%: Effective and engaging, defines the thesis and provides foundation for the body of the paper.
50%: Introduces the topic of the paper and builds a connection between the topic, the thesis, and the body of the paper. Informative but not engaging or strong.
25%: Introduces the topic of the paper loosely and includes the thesis statement. Provides little information regarding the topic.
0: Includes little more than the thesis and shows no demonstrable knowledge of the topic of the paper.
Content
100%: Strongly and vividly supports the thesis and is reflective of strong, thorough research. Illustrates extensive knowledge of the topic. Every aspect of the thesis is supported by quality academic research.
75%: Strongly supports the thesis and is reflective of good, thorough research. Illustrates knowledge of the topic, but could be extended. Most aspects of the thesis are supported by quality academic research.
50%: Supports the thesis and reflects research, and illustrates adequate knowledge of the topic. Could be extended and shows some gaps in understanding of the topic. There may be some inconsistencies in support from quality academic research.
25%: Related to the thesis but reflects inadequate research and knowledge of the topic, and demonstrates a lack of understanding. There may be a lack of support from quality academic research.
0: Does not convey adequate understanding of the topic, the research, or the thesis. There are many unsupported aspects of the thesis and the research lacks quality sources.
Organization
100%: Effectively organized. Logical structure of points and smooth transitions convey both understanding of topic and care in writing.
75%: Well organized, but may lack some transitions between ideas. Logical structure of most ideas conveys understanding of topic and composition.
50%: Ideas are logically structured, but may lack transitions between ideas. Could benefit from reorganizing 1 or 2 ideas.
25%: Some significant gaps in organization are present but the basic framework of ideas is logical. Overall organization could be improved.
0: Much of the paper lacks organization of ideas, making it difficult to understand the ideas expressed in the paper.
Citation Format
100%: APA format is used accurately as needed throughout the entire paper.
75%: APA format is used throughout the entire paper, but may show variations or slight inconsistencies of format.
50%: APA format is used throughout the entire paper, but may be noticeably inconsistent in format.
25%: APA format is used inaccurately and inconsistently in the paper.
0: APA is not used (regardless of the number of sources or citations).
Conclusion
100%: Strongly and clearly connects the thesis statement to the research to draw a specific conclusion that does not leave the reader with questions regarding the thesis.
75%: Clearly connects the thesis statement and the research to draw a clear conclusion that draws the research to a logical close.
50%: Connects the thesis statement and research to draw a conclusion regarding the research. Restates the topic statements throughout the paper.
25%: Restates the thesis and the topic statements, but does not draw any specific conclusion about the research or the thesis.
0: There is no conclusion; it restates the thesis at best.
Conventions
100%: Conventions of standard written English are used with accuracy; there are few, if any, minor errors.
75%: Conventions of standard written English are used; there may be several minor errors of usage.
50%: Conventions of standard written English are used; however, there may be a few major errors and few minor errors of usage.
25%: Conventions of standard written English are used with numerous major errors and several minor errors of usage.
0: The paper shows significant errors in conventions of standard written English.