Research in Nursing & Health, 2005, 28, 408–418
Focus on Research Methods
Analysis of Count Data Using Poisson Regression*
M. Katherine Hutchinson¹,† and Matthew C. Holtman²,‡
¹University of Pennsylvania School of Nursing, 420 Guardian Drive, Philadelphia, Pennsylvania 19104-6096
²Fels Institute of Government and Department of Criminology, University of Pennsylvania
Accepted 16 May 2005
Abstract: Nurses and other health researchers are often concerned with
infrequently occurring, repeatable, health-related events such as number of
hospitalizations, pregnancies, or visits to a health care provider. Reports on
the occurrence of such discrete events take the form of non-negative integer
or count data. Because the counts of infrequently occurring events tend to be
non-normally distributed and highly positively skewed, the use of ordinary
least squares (OLS) regression with non-transformed data has several
shortcomings. Techniques such as Poisson regression and negative binomial
regression may provide more appropriate alternatives for analyzing
these data. The purpose of this article is to compare and contrast the use of
these three methods for the analysis of infrequently occurring count data. The
strengths, limitations, and special considerations of each approach are
discussed. Data from the National Longitudinal Study of Adolescent Health
(AddHealth) are used for illustrative purposes. © 2005 Wiley Periodicals, Inc. Res
Nurs Health 28:408–418, 2005
Keywords: Poisson regression; count data; data analysis
*This research uses data from Add Health, a program project designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris, and funded by a grant P01-HD31921 from the National Institute of Child Health and Human Development, with cooperative funding from 17 other agencies. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Persons interested in obtaining data files from Add Health should contact Add Health, Carolina Population Center, 123 W. Franklin Street, Chapel Hill, NC 27516-2524 (www.cpc.unc.edu/addhealth/contract.html).
Contract grant sponsor: National Institute of Mental Health (to MKH); Contract grant number: R03 MH63659.
Contract grant sponsor: National Institute of Child Health and Human Development; Contract grant number: P01-HD31921.
Correspondence to M. Katherine Hutchinson.
†Assistant Professor and Associate Director.
‡Lecturer.
Published online in Wiley InterScience (www.interscience.wiley.com)
DOI: 10.1002/nur.20093
408 © 2005 Wiley Periodicals, Inc.
Nurses and other health researchers are often concerned with infrequently occurring, repeatable, health-related events such as number of hospitalizations, pregnancies, or visits to a health care provider. Reports on the occurrence of such discrete events take the form of non-negative integer or count data. Counts of infrequently occurring, repeatable events tend to cluster around the values of 0 and/or 1 and exhibit low frequencies at higher values. This type of distribution has a positive skew: it is truncated at 0 and gradually trails off toward higher values, and its mean is characteristically low but greater than the median because of the influence of a few relatively large observations. In a regression model, the distribution of the error term mirrors the distribution of the dependent variable itself (Lewis-Beck, 1989). Because ordinary least squares (OLS) regression assumes normality in the distribution of error terms and hence in the dependent variable, its use with this type of data is problematic if the data are not transformed to address the effects of the positive skew (Lewis-Beck).
Poisson regression and negative binomial regression may provide more appropriate alternatives for the analysis of infrequently occurring, untransformed count data. Neither of these alternative types of regression analysis assumes normally distributed error terms or dependent variables. Poisson regression assumes a Poisson distribution, a specific type of distribution in which scores take the form of non-negative whole number (integer) values. The Poisson distribution is truncated at 0, skewed in the positive direction (highly so when its mean is small), and exhibits equidispersion (i.e., a mean that is equal to the variance; Allison, 1999; Cameron & Trivedi, 1998). For uncorrected Poisson regression to be appropriate, these characteristics should be present. When overdispersion is present (i.e., the variance is greater than the mean), Poisson regression may still be employed, but statistical corrections must be incorporated into the model to account for the overdispersion.
In contrast, negative binomial regression is based on the assumption of a Poisson-like distribution (Allison, 1999), and no assumption of equidispersion is made. When overdispersion is present, negative binomial regression can be employed without any special corrections.
The purpose of this article is to compare and contrast the use of these three methods (OLS, Poisson, and negative binomial regression) for the analysis of infrequently occurring count data. The strengths, limitations, and special considerations of each approach are discussed. Data from the National Longitudinal Study of Adolescent Health (AddHealth) public use dataset are used for illustrative purposes.
METHODS
In our initial attempts to build comparative regression models, we used independent variables from Wave I of the AddHealth public use dataset to predict number of pregnancies reported at Wave II. However, in order to demonstrate the range of options available in Poisson regression, we felt that it was important to include a varying exposure variable (e.g., years of sexual activity) and to model an outcome variable that exhibited overdispersion (i.e., a dependent variable with a variance greater than its mean). When we restricted our sample to sexually experienced young women and used Wave II reports of numbers of pregnancies as our outcome, the resultant model did not exhibit overdispersion. Therefore, although we would have preferred to use more than one wave of data in order to more accurately model the longitudinal dynamics of pregnancy, for illustrative purposes we have confined the analyses to a cross-sectional analysis using only data from Wave I.
National Longitudinal Study of Adolescent Health
The National Longitudinal Study of Adolescent Health (AddHealth) was mandated to the National Institute of Child Health and Human Development (NICHD) by a Congressional act. Detailed information about the study can be obtained on its web site at http://www.cpc.unc.edu/addhealth. The study includes a school-based sample of adolescents in 7th through 12th grades. Students who completed the in-school questionnaire (n = 90,000) and those who were listed on the school roster were used as a sampling frame for a core random sample of 12,105 adolescents, stratified by gender and grade. In-home interviews were conducted with this core sample, as well as oversamples of ethnic minorities and special populations based on self-report data from the in-school questionnaire. The analyses reported here were limited to Wave I data from in-home interviews with adolescents who were included in the public use dataset subsample.
Sample
The AddHealth public use dataset includes a subsample of approximately 6,500 respondents. Of the 3,356 female respondents, 36% (n = 1,241) were sexually experienced, based on self-reports of ever having sexual intercourse. Analyses were limited to those respondents who were sexually experienced and whose years of sexual experience could be calculated from their current age and their age at first sexual intercourse. We deleted a few outlier cases whose reported ages at first intercourse were so low that their calculated years of sexual experience exceeded 10. Of those
ANALYSIS OF COUNT DATA / HUTCHINSON AND HOLTMAN 409
sexually experienced girls remaining in the analysis (n = 1,241), most (n = 1,011, 81%) reported no pregnancies. Of those who had been pregnant, 190 (15%) reported only one pregnancy; 40 (3%) reported more than one pregnancy (Table 1).
Independent Variables
For our analyses, we included nine important predictor variables that have been shown to be related to adolescent pregnancy: age, race (African American, Asian, and Hispanic/Latino, each entered as a separate indicator), marital status, sexual experience, college plans, contraceptive self-efficacy, and consistent contraceptive use. The descriptive statistics for all variables are summarized in Table 2.
Age. Age was included in all of our analyses, as sexual activity, childbearing intentions, and contraceptive behaviors tend to vary with age. Age, as self-reported at Wave I, was recorded in years.
Race. Race was included in the analyses because age at first intercourse and rates of adolescent pregnancy have been shown to vary by race (Blum et al., 2000; Kann et al., 2000). In addition, African American adolescents have been found to be less compliant with oral contraceptive use than their White peers (Scher, Emans, & Grace, 1982). Three dichotomous dummy variables, coded 1 for yes and 0 for no, were included to represent respondents' self-identification as African American, Hispanic/Latino, and/or Asian.
Marital status. Although the vast majority
(98%) of girls in the sample had never been
married, those who had been married may have
had different attitudes and intentions towards
becoming pregnant than girls who were never
married. Marital status was coded as single/never
married (1) or other (0) and included as a
dichotomous dummy variable in our analyses.
Sexual experience. Pregnancy is directly
related to sexual activity. Sexual experience was
calculated as the number of years that had elapsed
between the year when the Wave I interview was
conducted and the year the respondent reported
having sexual intercourse for the first time.
College plans. Because feelings of hopelessness and lack of future plans may act as contributing factors in adolescent sexual risk-taking and unintended pregnancy, whether or not respondents had college plans was assessed at Wave I. The single-item measure was worded: “How much do you want to go to college?” The item was scored on a Likert-type scale from 1 to 5; higher scores indicated a greater desire to attend college. The mean score was 4.4 (SD = 1.1).
Contraceptive self-efficacy. Greater sexual self-efficacy has consistently been shown to be related to both condom use and contraceptive use (DiClemente, Lodico, & Grinstead, 1996; Hutchinson, 2002; Hutchinson & Cooney, 1998; Hutchinson, Jemmott, Jemmott, Braverman, & Fong, 2003; Jemmott, Jemmott, & Hacker, 1992). Four items on the Wave I survey assessed self-efficacy related to sexual behavior and contraceptive use. Items were scored from 1 to 5; higher scores indicated greater contraceptive self-efficacy. The overall contraceptive self-efficacy score was computed as the simple average of the four items. Total possible scores ranged from 1 to 5; the mean contraceptive self-efficacy score was 3.8 (SD = .8).
Consistent contraceptive use. In the AddHealth survey, a number of questions addressed contraceptive behavior. For the present analysis, we created a proxy measure for consistent contraceptive use based on two questions that asked whether respondents used contraception during intercourse the first time and the most recent time they had sexual intercourse. Possible values on the scale range from 0 (did not use contraception either time) to 2 (used contraception both times). The average score on this measure was 1.29 (SD = .8).
Table 1. Frequency Distribution of Outcome Variable
Pregnancies reported       n       %
0                      1,011    81.5
1                        190    15.3
2                         32     2.6
3                          6     0.5
4                          2     0.2
Table 2. Descriptive Statistics for Sample: Girls Reporting Sexual Experience
Variable                                           n    Mean     SD   Minimum   Maximum
Number of pregnancies                          1,241     .23    .52        .0       4.0
Current age (age)                              1,241   17.01   1.39      12.7      20.7
African American (Black)                       1,241     .30    .46        .0       1.0
Hispanic (Hisp)                                1,241     .10    .30        .0       1.0
Asian (Asian)                                  1,241     .03    .18        .0       1.0
Never married (single) (0 = false; 1 = true)   1,236     .98    .14        .0       1.0
Years of sexual activity (years)               1,169    1.74   1.42        .0       8.0
College plans (college)                        1,237    4.38   1.06       1.0       5.0
Contraceptive self-efficacy (effic)            1,233    3.76    .81       1.0       5.0
Consistent contraceptive use (contrac)         1,241    1.27    .80        .0       2.0
410 RESEARCH IN NURSING & HEALTH
In summary, we designated nine predictor variables to include in our models (with race represented by three indicators): age, race, marital status, sexual experience, college plans, contraceptive self-efficacy, and consistent contraceptive use. In all of our models, we expected age, race, marital status, and sexual experience to be positively associated with number of pregnancies. College plans, contraceptive self-efficacy, and consistent contraceptive use were expected to be inversely related to number of pregnancies.
Outcome Variable
The outcome variable of interest, number of pregnancies, was taken from respondents' self-reports. We did not include information on whether pregnancies were carried to term or whether they resulted in live births. Almost 19% of the sexually experienced female adolescents who participated in AddHealth reported having been pregnant at least once at Wave I; slightly less than 82% reported they had never been pregnant. The number of reported pregnancies ranged from 0 to 4. The number of pregnancies was non-normally distributed and highly skewed, with a mean of .22 and a variance of .27.
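Whether these counts look Poisson can be checked directly from the frequency distribution in Table 1. The Python sketch below computes the sample mean and variance from the grouped frequencies and compares the observed frequencies with those expected under a Poisson distribution with the same mean. It ignores all predictors, so it is only a rough, unconditional check of equidispersion, not a test of any fitted model.

```python
import math

# Table 1: number of pregnancies -> number of girls reporting that count
FREQ = {0: 1011, 1: 190, 2: 32, 3: 6, 4: 2}

n = sum(FREQ.values())
mean = sum(k * f for k, f in FREQ.items()) / n
# Sample variance (n - 1 denominator) computed from the grouped frequencies
var = sum(f * (k - mean) ** 2 for k, f in FREQ.items()) / (n - 1)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson variable with mean lam equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Frequencies a Poisson distribution with the same mean would produce
expected = {k: n * poisson_pmf(k, mean) for k in FREQ}

print(f"n = {n}, mean = {mean:.3f}, variance = {var:.3f}, variance/mean = {var / mean:.2f}")
for k, observed in FREQ.items():
    print(f"{k} pregnancies: observed {observed:5d}, Poisson-expected {expected[k]:7.1f}")
```

The mean rounds to .23 (as in Table 2) and the variance to .27. The observed data contain more zeros and more multiple pregnancies than the Poisson-expected frequencies, which is the pattern that extra-Poisson variation produces.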
DATA ANALYSES AND RESULTS
Our goal was to compare the use of OLS, Poisson, and negative binomial regressions for modeling the effects of nine independent variables on the number of pregnancies experienced by sexually experienced adolescent females from the National Longitudinal Study of Adolescent Health.
The outcome of interest, number of pregnancies, is an infrequently occurring, discrete, and repeatable event. More than 80% of the sample had a pregnancy count of 0. Of those who had been pregnant, 190 had had only one pregnancy; 40 reported more than one pregnancy.
OLS Regression
Ignoring the specific characteristics of a Poisson-distributed outcome variable, a typical OLS regression model to address these effects might look like the following:
$$\text{Number of pregnancies} = a + b_1(\text{age}) + b_2(\text{African American}) + b_3(\text{Hispanic}) + b_4(\text{Asian}) + b_5(\text{single}) + b_6(\text{years of sexual experience}) + b_7(\text{college}) + b_8(\text{contraceptive self-efficacy}) + b_9(\text{consistent contraceptive use})$$
The results from this model are shown in Table 3. All but two of the effects are significant at the .05 level. In interpreting the estimated effects, we see that, controlling for the other variables, each additional year of age increases the predicted number of pregnancies by .05. African American girls are expected to have .18 more pregnancies than White girls. Being single decreases the expected number of pregnancies by nearly .72. Each year of sexual activity increases the predicted number of pregnancies by .07. Greater aspiration for college reduces the predicted number of pregnancies by .04 per point, and greater contraceptive self-efficacy by .05 per point. Each additional point on the consistent contraceptive use scale decreases the predicted number of pregnancies by .07.
So what is wrong with using the OLS approach?
The answer is that OLS is inappropriate for models
in which the dependent variable is highly skewed.
Table 3. OLS Regression Output for Model Predicting Number of Pregnancies
                                      Parameter    Standard
Variable                        DF     Estimate       Error   t value   p > |t|
Intercept                        1       .37410      .22441      1.67     .0958
Current age                      1       .04718      .01223      3.86     .0001
African American                 1       .17686      .03219      5.49    <.0001
Hispanic                         1       .03127      .04971       .63     .5295
Asian                            1       .08816      .08171      1.08     .2808
Never married                    1      −.71671      .09796     −7.32    <.0001
Years of sexual activity         1       .07221      .01092      6.61    <.0001
College plans                    1      −.03711      .01402     −2.65     .0082
Contraceptive self-efficacy      1      −.04573      .02005     −2.28     .0227
Consistent contraceptive use     1      −.07412      .01879     −3.94    <.0001
OLS, ordinary least squares.
Our dependent variable, number of pregnancies, is in this category. Its range is restricted by the fact that its lower bound is zero. Furthermore, although relatively few girls have been pregnant, a small handful reported several pregnancies, up to a maximum of 4. The result is a highly skewed distribution.
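Because the outcome cannot fall below zero, an unbounded linear model can produce impossible predictions. The Python sketch below plugs one hypothetical combination of predictor values (invented for illustration, not a case from the data) into the OLS coefficients of Table 3 and into the Poisson coefficients reported later in Table 4. The OLS prediction comes out negative, whereas exponentiating the Poisson linear predictor always yields a positive expected count.

```python
import math

# Coefficients from Table 3 (OLS, identity link) and Table 4 (Poisson, log link)
OLS = {"intercept": 0.37410, "age": 0.04718, "black": 0.17686, "hisp": 0.03127,
       "asian": 0.08816, "single": -0.71671, "years": 0.07221, "college": -0.03711,
       "effic": -0.04573, "contrac": -0.07412}
POISSON = {"intercept": -3.5182, "age": 0.2278, "black": 0.6958, "hisp": 0.1498,
           "asian": 0.5333, "single": -1.1035, "years": 0.2195, "college": -0.1198,
           "effic": -0.1939, "contrac": -0.3106}

def linear_predictor(coefs: dict, profile: dict) -> float:
    """Intercept plus the sum of coefficient * value over all predictors."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in profile.items())

# Hypothetical profile: a never-married 15-year-old who has just initiated sex
# (0 years), with the highest possible college, self-efficacy, and contraception scores.
profile = {"age": 15, "black": 0, "hisp": 0, "asian": 0, "single": 1,
           "years": 0, "college": 5, "effic": 5, "contrac": 2}

ols_pred = linear_predictor(OLS, profile)                    # can fall below 0
poisson_pred = math.exp(linear_predictor(POISSON, profile))  # always > 0

print(f"OLS prediction:     {ols_pred:.3f} pregnancies")
print(f"Poisson prediction: {poisson_pred:.3f} pregnancies")
```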
An OLS model does not perform well under these conditions. OLS requires, as a basic assumption, that the dependent variable and the error term in the model be at least approximately normally distributed (Lewis-Beck, 1989). With an outcome variable this skewed, these assumptions are violated. Using OLS also risks violating the homoscedasticity assumption of OLS, which is that the error terms have constant variance across values of the independent variables; with count data, the variance typically grows with the mean. When one or both of these violations occur, the standard errors of the parameters will be estimated incorrectly, and as a result, the t-tests associated with the parameters will be inaccurate. The user will be unable to tell whether the effects are statistically significant or not. Worse, because one can almost always mechanically run such a model (even when the model is inappropriate), there is a risk of taking the results at face value, making a Type I error, and reaching the wrong conclusions.
A second reason why a Poisson model is preferred over OLS regression is that risk exposure to the outcome of interest varies. OLS assumes linear relationships between each independent variable and the outcome. Although using OLS allowed us to control for years of sexual activity in the model, in doing so we were implicitly assuming that the relationship between years and pregnancies was linear. With a highly skewed outcome variable like pregnancy, this assumption is not reasonable. As we will see, Poisson regression has a more natural way of incorporating this information into the model, producing more realistic estimates of how likely a girl is to get pregnant, holding constant the number of years she has been sexually active.
Alternative Poisson-Type Models
There are two Poisson-type models (a true Poisson
and a negative binomial model, which is a
generalization of the Poisson) that are better
suited than OLS to analyzing the type of data in
the example. In specifying a Poisson-type model,
there are two initial choices to make: (a) whether
or not to incorporate time-frame adjustments in
the model and (b) whether to specify a true Poisson
approach or a negative binomial approach.
Unlike OLS models, Poisson-type models can be specified in such a way as to take varying time-frames or levels of exposure into account. Varying time-frames or levels of exposure can be addressed by including a predictor variable that adjusts the model to account for the time-frame in which the observations were made, or by standardizing the outcome variable per time unit. Investigators can also choose not to include such adjustments. In the present example, the appropriate variable to think about is the length of the interval of sexual activity. Even if the risk of pregnancy is constant for each sexual encounter, the cumulative risk of becoming pregnant increases with the number of encounters, and the number of encounters is likely to increase with the length of the interval of exposure. Unfortunately, there is no direct measure of how many times a girl has had sex. Therefore, we used years of sexual activity to give us an idea of how much exposure each girl is likely to have experienced. An 18-year-old girl who has been sexually active since age 14, for example, has probably had many more chances to become pregnant than a girl who initiated sexual activity at age 17. There may, of course, be other differences: the girl who started at 14 may be, in many ways, less responsible or less able to plan than the other girl. The time variable does not capture this kind of effect; it captures only different levels of exposure.
A Poisson-type regression can incorporate such an exposure variable in one of two ways: by including exposure as a standard predictor variable, or by incorporating exposure as an offset. In the latter case, the researcher would include the log-transformed value of the exposure variable (years sexually active) on the right-hand side of the equation and instruct the computer to assume that its coefficient is equal to 1. In this kind of model, the quantity being modeled is the log of the expected pregnancy rate per unit of exposure (e.g., per year of sexual activity). The reasons for these adjustments are algebraic, as shown in the offset example later in this article. The other alternative is simply to include the untransformed (i.e., not logged) exposure variable as a predictor on the right-hand side of the equation. This slightly simpler specification gives fairly comparable results, but requires a slight shift in interpretation of the coefficients.
The second major option is whether to use a true
Poisson specification, or a negative binomial
specification. A true Poisson model assumes that
the distribution of the outcome variable (number of
pregnancies) has a mean equal to its variance. This
is part of the definition of the Poisson distribution.
The negative binomial (NB) distribution, on the
other hand, makes no such assumption and can
have a variance that is larger than its mean. The negative binomial distribution is said, therefore, to be overdispersed compared to the true Poisson distribution. The coefficients of the two models are interpreted in exactly the same way. The researcher needs to be aware of the possibility of overdispersion when estimating a model, and to pick the better of the two models (Poisson or NB) depending on whether there is evidence for overdispersion. When overdispersion is present, true Poisson modeling can still be employed; however, a standardized correction must be made for the overdispersion.
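The difference between the two distributions lies in how the variance is tied to the mean. The Python sketch below assumes the widely used "NB2" negative binomial variance function, μ + αμ², where α ≥ 0 is a dispersion parameter (α = 0 gives back the Poisson). The α value is a crude method-of-moments figure computed from the unconditional mean and SD in Table 2; it is an illustrative assumption, not an estimate from a fitted negative binomial regression, and in a regression part of the extra variance may be explained by the predictors.

```python
# Variance functions: Poisson ties the variance to the mean; the NB2-form
# negative binomial adds an extra term alpha * mu**2 (alpha >= 0).

def poisson_variance(mu: float) -> float:
    """Poisson: variance equals the mean (equidispersion)."""
    return mu

def nb_variance(mu: float, alpha: float) -> float:
    """Negative binomial, NB2 form: variance exceeds the mean whenever alpha > 0."""
    return mu + alpha * mu ** 2

def moment_alpha(mean: float, variance: float) -> float:
    """Solve variance = mean + alpha * mean**2 for alpha; 0 if not overdispersed."""
    return max(0.0, (variance - mean) / mean ** 2)

# Unconditional summary statistics from Table 2 (illustrative inputs only)
mean_preg = 0.23
var_preg = 0.52 ** 2

alpha = moment_alpha(mean_preg, var_preg)
print(f"method-of-moments alpha = {alpha:.2f}")
for mu in (0.1, 0.23, 0.5, 1.0):
    print(f"mu = {mu:.2f}: Poisson var = {poisson_variance(mu):.3f}, "
          f"NB var = {nb_variance(mu, alpha):.3f}")
```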
Poisson modeling. A Poisson model uses a maximum likelihood estimation technique (much like a logistic regression), and can be run in SAS using the GENMOD procedure. A Poisson version of our model is shown below. In this variation, we used the simpler approach for the exposure variable by including it in raw form as a predictor variable. Notice the line that specifies the log link function. Poisson and negative binomial regression model the natural log of the expected (mean) value of the outcome variable rather than the outcome itself; this keeps every predicted count positive.
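In symbols, as a sketch in standard notation rather than the authors' own, the Poisson model specified above treats each girl's count $Y_i$ as Poisson with mean $\mu_i$ and links the log of that mean to the predictors (variable names follow the SAS code):

$$\Pr(Y_i = y) = \frac{e^{-\mu_i}\,\mu_i^{\,y}}{y!}, \qquad y = 0, 1, 2, \ldots, \qquad \operatorname{Var}(Y_i) = \mu_i$$

$$\log(\mu_i) = a + b_1\,\text{age}_i + b_2\,\text{black}_i + b_3\,\text{hisp}_i + b_4\,\text{asian}_i + b_5\,\text{single}_i + b_6\,\text{years}_i + b_7\,\text{college}_i + b_8\,\text{effic}_i + b_9\,\text{contrac}_i$$

Because the log is applied to the mean $\mu_i$ rather than to the observed counts, girls with zero pregnancies pose no problem even though log(0) is undefined.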
Example SAS instructions are provided below
and the output from these instructions is included
in Table 4.
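proc genmod data = adhealth.PoissonAnalysis;
   /* numpreg = number of pregnancies; years enters as an ordinary predictor */
   model numpreg = age black hisp asian single
                   years college effic contrac
         / link = log        /* model the log of the expected count */
           dist = poisson;   /* Poisson error distribution */
run;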
The output is similar to that of an OLS regression: a parameter estimate and a standard error for each predictor variable, and a p value that is based on a (Wald) χ² statistic. The upper and lower limits of a 95% confidence interval for the parameter also are provided. Finally, an extra line labeled Scale, which is set to 1 in this first model, appears. The scale parameter identifies how much the output was adjusted to take overdispersion into account. We have not dealt with overdispersion yet in this model, so this parameter is not yet relevant.
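How the columns of Table 4 relate to one another can be sketched in Python. Assuming the usual Wald formulas (estimate ± 1.96 × SE for the 95% limits, (estimate/SE)² for the 1-df χ², and the upper-tail probability of that χ² for the p value), the snippet reproduces several rows of the table from the estimate and standard error alone; small discrepancies come from rounding of the printed standard errors.

```python
import math

# Selected rows of Table 4: name -> (estimate, standard error)
TABLE4 = {
    "Intercept": (-3.5182, 0.9551),
    "African American": (0.6958, 0.1297),
    "Hispanic": (0.1498, 0.2195),
    "Years of sexual activity": (0.2195, 0.0381),
}

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_summary(est: float, se: float) -> tuple:
    """Return (lower 95% limit, upper 95% limit, Wald chi-square, p value)."""
    lower, upper = est - Z_975 * se, est + Z_975 * se
    z = est / se
    chi2 = z ** 2                          # 1-df Wald chi-square
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p = P(chi2_1 > z**2)
    return lower, upper, chi2, p

for name, (est, se) in TABLE4.items():
    lo, hi, chi2, p = wald_summary(est, se)
    print(f"{name:26s} CI [{lo:.4f}, {hi:.4f}]  chi2 = {chi2:.2f}  p = {p:.4f}")
```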
There are significant effects (p < .05) for most of the predictors, just as before. Each of the effects is again in the expected direction. In interpreting the coefficients, recall that the model predicts the natural log (log base e, the power to which e must be raised to get the original number, where e is about 2.718) of the expected number of pregnancies. This is a somewhat unnatural way to think about pregnancies (“how many logged pregnancies did you have?”), so it may be helpful to transform the parameter estimates into a more accessible form. This can be accomplished using spreadsheet software or a scientific calculator. For years of sexual experience, for example, the estimated coefficient is .22. Taking e to the .22 power gives $e^{.22} = 1.25$; that is, each additional year of sexual experience multiplies the expected number of pregnancies by 1.25, holding the other predictors constant.
There is a simple way to interpret these estimates. The percentage change in the outcome count (Y) expected with each one-unit increase in the independent variable (X) equals 100 times the inverse natural log of the coefficient minus one ($Y\% = 100 \times [e^{B} - 1]$; Allison, 1999). In this example, the percent increase in the expected number of pregnancies for each additional year of sexual experience would be $100 \times (1.25 - 1) = 25\%$.
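The same conversion can be done in a few lines of Python instead of a spreadsheet. The sketch below applies $e^{b}$ and $100 \times (e^{b} - 1)$ to the Table 4 estimates. With the unrounded coefficient for years of sexual activity (.2195), the change is about 24.5%, which agrees with the 25% obtained from the rounded value 1.25.

```python
import math

# Poisson estimates from Table 4 (years of sexual activity entered as a predictor)
COEF = {
    "Current age": 0.2278,
    "African American": 0.6958,
    "Never married": -1.1035,
    "Years of sexual activity": 0.2195,
    "College plans": -0.1198,
    "Contraceptive self-efficacy": -0.1939,
    "Consistent contraceptive use": -0.3106,
}

def rate_ratio(b: float) -> float:
    """Multiplicative change in the expected count per one-unit increase: e**b."""
    return math.exp(b)

def percent_change(b: float) -> float:
    """Y% = 100 * (e**b - 1): percentage change in the expected count."""
    return 100.0 * (math.exp(b) - 1.0)

for name, b in COEF.items():
    print(f"{name:30s} e^b = {rate_ratio(b):.2f}  change = {percent_change(b):+.1f}%")
```

Negative coefficients translate into percentage decreases; for example, consistent contraceptive use (−.3106) corresponds to roughly a 27% lower expected count per scale point.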
Table 4. Poisson Regression Output for Model Predicting Number of Pregnancies, with Years of Sexual Activity Included as a Predictor
                                            Standard  Wald 95% Confidence Limits
Parameter                      DF  Estimate    Error     Lower     Upper      χ²  Probability > χ²
Intercept                       1   −3.5182    .9551   −5.3903   −1.6462   13.57             .0002
Current age                     1     .2278    .0556     .1188     .3368   16.78            <.0001
African American                1     .6958    .1297     .4416     .9500   28.78            <.0001
Hispanic                        1     .1498    .2195    −.2804     .5801     .47             .4949
Asian                           1     .5333    .3290    −.1116    1.1782    2.63             .1051
Never married                   1   −1.1035    .2122   −1.5194    −.6876   27.04            <.0001
Years of sexual activity        1     .2195    .0381     .1447     .2942   33.13            <.0001
College plans                   1    −.1198    .0515    −.2208    −.0188    5.41             .0201
Contraceptive self-efficacy     1    −.1939    .0852    −.3609    −.0268    5.17             .0230
Consistent contraceptive use    1    −.3106    .0812    −.4697    −.1514   14.62             .0001
Scale                           0         1        0         1         1
But how reasonable are these estimates? What is the predicted number of pregnancies for an average girl in the sample, that is, a girl who has average values on all of the predictor variables? No such girl may actually exist in the dataset, but applying the averages of all the predictors is an easy way to see whether the model gives reasonable results. In calculating the predicted number of pregnancies, it is necessary to take into account the effects of all the predictor variables (not just years of sexual experience or exposure), and those effects have to be added up before taking the exponent of e. Referring to the descriptive statistics in Table 2, multiply each predictor variable's average (current age, Black, Hispanic, Asian, single, years of sexual activity, college plans, contraceptive self-efficacy, consistent contraceptive use) by its estimated effect from the model in Table 4, and include the intercept:
$$-3.5182 + (.2278 \times 17.01) + (.6958 \times .30) + (.1498 \times .10) + (.5333 \times .03) + (-1.1035 \times .98) + (.2195 \times 1.74) + (-.1198 \times 4.38) + (-.1939 \times 3.76) + (-.3106 \times 1.27) = -1.75$$
The total obtained is −1.75 for the log of the predicted number of pregnancies, or .17 for the predicted number of pregnancies ($e^{-1.75} = .17$). The .17 figure is not far off the average number of pregnancies in the sample, .20.
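The same arithmetic can be scripted. The Python sketch below uses the unrounded Table 4 estimates and the Table 2 means; it sums the terms on the log scale first and exponentiates once, as described above.

```python
import math

# Poisson estimates from Table 4 paired with predictor means from Table 2
INTERCEPT = -3.5182
COEF_AND_MEAN = {
    "age":     (0.2278, 17.01),
    "black":   (0.6958, 0.30),
    "hisp":    (0.1498, 0.10),
    "asian":   (0.5333, 0.03),
    "single":  (-1.1035, 0.98),
    "years":   (0.2195, 1.74),
    "college": (-0.1198, 4.38),
    "effic":   (-0.1939, 3.76),
    "contrac": (-0.3106, 1.27),
}

# Linear predictor: intercept + sum of coefficient * mean, added up BEFORE exponentiating
eta = INTERCEPT + sum(b * xbar for b, xbar in COEF_AND_MEAN.values())
predicted = math.exp(eta)  # back-transform from the log scale to a count

print(f"log of predicted count = {eta:.2f}; predicted pregnancies = {predicted:.2f}")
```

A prediction at the means is not the same as the average of the girls' individual predicted counts; with a log link the two generally differ, so this comparison is a rough plausibility check rather than a formal measure of fit.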
Model fit can be assessed using the deviance and Pearson χ² statistics that are reported in the model output. Under certain large-sample conditions, both statistics are approximately distributed as χ² with degrees of freedom equal to the number of observations minus the number of parameters. In this example, the deviance is 781.6 and the Pearson χ² is 3,359.9, both with 1,144 degrees of freedom. The deviance is non-significant (p ≈ 1; p values can be calculated with the χ² distribution function of standard spreadsheet software), which suggests that the fit between the model and the data is good. The Pearson χ², however, is nearly three times its degrees of freedom and is highly significant (p < .001), an early sign of the overdispersion discussed below. A significant deviance or χ² value indicates poor correspondence between the model and the data, perhaps due to inappropriate use of the Poisson specification or due to the omission of an important predictor variable. In practice, the model fit statistics will tend to get larger and become statistically significant as the dataset gets larger, so they are not always useful for assessing a single model by itself. An alternative is to use the deviance statistic to compare nested models with each other; for details, refer to Cameron and Trivedi (1998).
The next step is to attempt a better specification of the Poisson model using the offset method to adjust for varying lengths of exposure to pregnancy risk, in this case the interval of being sexually active. In this approach, the log-transformed value of years sexually active will be included, calculated using the following SAS code:
logyrs = log(actvyrs);  /* DATA step assignment; SAS returns a missing value for LOG(0) */
In setting up the model, this variable is not included with the other predictors; instead, the following option is added to the MODEL statement after the slash:
offset = logyrs
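The resultant SAS code reads as follows:
proc genmod data = adhealth.PoissonAnalysis;
   /* years is dropped from the predictors; its log enters as an offset */
   model numpreg = age black hisp asian single
                   college effic contrac
         / link = log
           dist = poisson
           offset = logyrs;   /* coefficient of logyrs fixed at 1 */
run;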
This sets the coefficient for logyrs equal to 1 and adjusts the other estimates accordingly. It is done this way as an algebraic reduction of the ratio log(pregnancies/years). The logarithm of any ratio is equal to the log of the numerator minus the log of the denominator:
$$\log(a/b) = \log(a) - \log(b), \quad\text{therefore}\quad \log(\text{pregnancies}/\text{years}) = \log(\text{pregnancies}) - \log(\text{years})$$
To estimate a model predicting log(pregnancies) alone, add log(years) to both sides of the regression equation. The log(years) term then cancels out of the left side, leaving log(pregnancies) by itself. Starting from
$$\log(\text{pregnancies}/\text{years}) = a + b_1(\text{age}) + b_2(\text{African American}) + b_3(\text{Hispanic}) + b_4(\text{Asian}) + b_5(\text{single}) + b_6(\text{college}) + b_7(\text{effic}) + b_8(\text{contrac})$$
and adding log(years) to both sides of the equation, the result is:
$$\log(\text{pregnancies}) = a + \log(\text{years}) + b_1(\text{age}) + b_2(\text{African American}) + b_3(\text{Hispanic}) + b_4(\text{Asian}) + b_5(\text{single}) + b_6(\text{college}) + b_7(\text{effic}) + b_8(\text{contrac})$$
The coefficient for log(years) is set to one in order to maintain the correct scale. The results of this model are shown in Table 5. In contrast to the previous models, the effect for college plans is no longer significant. If the sum of the products of the means of all the predictor variables with their coefficients is calculated (together with the intercept), the following is the result:
$$-2.0420 + (.1220 \times 17.01) + (.6604 \times .30) + (.2070 \times .10) + (.4896 \times .03) + (-.9294 \times .98) + (-.0871 \times 4.38) + (-.2241 \times 3.76) + (-.2729 \times 1.27) = -2.21$$
The model predicts a log rate of −2.21 (log pregnancies per year) for an average girl, or $e^{-2.21} = .11$ pregnancies per year. Because the average number of years of sexual activity in the sample is 1.7, this would give a total of 1.7 × .11 = .19 pregnancies for an average girl, one who has average values on all the predictor variables that we included in the model. This estimate is closer to the observed sample mean than the .17 predicted by the model with years of sexual activity entered as an ordinary predictor.
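A short Python sketch makes the role of the offset concrete. It recomputes the log rate for an average girl from the Table 5 estimates and Table 2 means, then shows that, with the coefficient of log(years) fixed at 1, the expected count is simply the yearly rate multiplied by the years of exposure. The exposure values 1.0 and 5.0 are arbitrary illustrations; 1.7 is the sample average used in the text.

```python
import math

# Poisson estimates from Table 5 (offset model) with predictor means from Table 2.
# Years of sexual activity is absent: it enters only through the offset log(years).
INTERCEPT = -2.0420
COEF_AND_MEAN = {
    "age":     (0.1220, 17.01),
    "black":   (0.6604, 0.30),
    "hisp":    (0.2070, 0.10),
    "asian":   (0.4896, 0.03),
    "single":  (-0.9294, 0.98),
    "college": (-0.0871, 4.38),
    "effic":   (-0.2241, 3.76),
    "contrac": (-0.2729, 1.27),
}

log_rate = INTERCEPT + sum(b * xbar for b, xbar in COEF_AND_MEAN.values())
rate_per_year = math.exp(log_rate)

def expected_count(years: float) -> float:
    """Offset model: log(mu) = log(years) + log_rate, hence mu = years * rate."""
    return math.exp(math.log(years) + log_rate)

print(f"log rate = {log_rate:.2f}; rate = {rate_per_year:.3f} pregnancies per year")
for years in (1.0, 1.7, 5.0):
    print(f"{years:.1f} years of exposure: expected pregnancies = {expected_count(years):.3f}")
```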
Overdispersion. Overdispersion means having a Poisson-like distribution that is not quite Poisson, because its variance is larger than its mean. When overdispersion is present, Poisson regression coefficients are reliable, but the variance is larger than the statistical program would expect for a Poisson distribution. As a result, the standard
errors calculated by the program are smaller than the true standard errors, which may lead to overly liberal significance tests and a greater likelihood of Type I errors.
Overdispersion can be detected by comparing the reported Pearson χ² statistic for the model with its degrees of freedom (divide χ² by the degrees of freedom). A ratio close to 1 is expected under equidispersion; an overdispersed model will have a ratio greater than 1, and a ratio greater than 2 signals substantial overdispersion. The greater the ratio, the greater the overdispersion. An omitted predictor variable can result in apparent overdispersion or underdispersion; the researcher should investigate this possibility while examining the diagnostics.
Overdispersion is corrected for by multiplying each standard error by the square root of the model Pearson χ² divided by its degrees of freedom. SAS will perform this adjustment automatically if the PSCALE option is specified (Allison, 1999). In this example, the code is as follows. Notice that it is the same as before except for the PSCALE option at the end.
proc genmod data = adhealth.PoissonAnalysis;
   model numpreg = age black hisp asian single
                   college effic contrac
         / link = log
           dist = poisson
           offset = logyrs
           pscale;   /* scale SEs by sqrt(Pearson chi-square / df) */
run;
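The PSCALE adjustment can be checked by hand. The Python sketch below uses only the estimates and standard errors printed in Tables 5 and 6: it recovers the inflation factor applied to each row and recomputes the Wald χ² values, (estimate / corrected SE)². The implied Pearson χ²/df of about 2.93 is close to the ratio 3,359.9/1,144 ≈ 2.94 given in the model fit discussion, and the recomputed χ² values match Table 6 to two decimals for the rows shown.

```python
# Values for rows shown in both Table 5 (uncorrected) and Table 6 (PSCALE):
# name -> (estimate, SE without correction, SE with PSCALE)
ROWS = {
    "Intercept":        (-2.0420, 0.9607, 1.6456),
    "Current age":      (0.1220, 0.0543, 0.0929),
    "African American": (0.6604, 0.1287, 0.2205),
    "Hispanic":         (0.2070, 0.2186, 0.3745),
    "Asian":            (0.4896, 0.3294, 0.5643),
}

# PSCALE multiplies every SE by the same factor, sqrt(Pearson chi-square / df),
# so the ratio of corrected to uncorrected SE should be (nearly) constant.
factors = {name: se_adj / se for name, (_, se, se_adj) in ROWS.items()}
scale = sum(factors.values()) / len(factors)
print(f"implied scale = {scale:.3f}, implied Pearson chi-square/df = {scale ** 2:.2f}")

# Corrected Wald chi-square for each row: (estimate / corrected SE) squared
for name, (est, _, se_adj) in ROWS.items():
    print(f"{name:18s} Wald chi-square = {(est / se_adj) ** 2:.2f}")
```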
The output, presented in Table 6, looks very similar to the previous Poisson analysis. The coefficient estimates are exactly the same. The difference is in the estimated standard errors, which are now about 1.7 times larger. The scale parameter has also been adjusted upward from 1, reflecting the degree of adjustment made. The increase in the standard errors affects the hypothesis tests, reducing the χ² statistics and increasing the p values. This means that some statistically significant effect estimates will no longer reach significance. In this case, only African American race and marital status remain as significant effects. The correction for overdispersion takes a conservative approach to estimating the standard errors, which means it tends to err on the side of Type II errors, saying that an effect is not significant even if it might be. In order to gain a little more power to detect significant effects,
Table 5. Poisson Regression Output for Model Predicting Number of Pregnancies, with Years of Sexual
Activity Included as an Offset to Account for Varying Time-Frame of Exposure
Parameter DF Estimate
Standard
Error
Wald 95% Confidence
Limits w2
Probability > w2
Intercept 1 À2.0420 .9607 À3.9248 À.1591 4.52 .0335
Current age 1 .1220 .0543 .0156 .2283 5.05 .0246
African American 1 .6604 .1287 .4082 .9126 26.33 <.0001
Hispanic 1 .2070 .2186 À.2215 .6354 .90 .3438
Asian 1 .4896 .3294 À.1561 1.1352 2.21 .1373
Never married 1 À.9294 .2080 À1.3371 À.5218 19.97 <.0001
College plans 1 À.0871 .0515 À.1881 .0139 2.86 .0909
Contraceptive self-efficacy 1 À.2241 .0845 À.3897 À.0585 7.04 .0080
Consistent
contraceptive use
1 À.2729 .0825 À.4346 À.1113 10.95 .0009
Scale 0 1 0 1 1
Table 6. Poisson Regression Output for Model with Correction for Overdispersion
Parameter DF Estimate
Standard
Error
Wald 95% Confidence
Limits w2
Probability > w2
Intercept 1 À2.0420 1.6456 À5.2673 1.1834 1.54 .2147
Current age 1 .1220 .0929 À.0602 .3041 1.72 .1894
African American 1 .6604 .2205 .2283 1.0925 8.97 .0027
Hispanic 1 .2070 .3745 À.5270 .9409 .31 .5805
Asian 1 .4896 .5643 À.6165 1.5956 .75 .3857
Never married 1 À.9294 .3563 À1.6277 À.2312 6.81 .0091
College plans 1 À.0871 .0883 À.2601 .0859 .97 .3237
Contraceptive self-efficacy 1 À.2241 .1447 À.5078 .0596 2.40 .1215
Consistent contraceptive use 1 À.2729 .1413 À.5498 .0039 3.73 .0533
Scale 0 1.713 0 1.713 1.713
ANALYSIS OF COUNT DATA / HUTCHINSON AND HOLTMAN 415
an alternative approach would be to use a negative
binomial specification.
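The relationship between Tables 5 and 6 can be reproduced by hand. The sketch below (Python used as a calculator; the function name is hypothetical) multiplies an uncorrected standard error by the scale parameter and recomputes the 1-df Wald χ² and its two-sided p value, using the African American row as an example.

```python
import math

def scaled_wald(estimate, se, scale):
    """Rescale a Poisson standard error for overdispersion and redo the Wald test.

    scale is sqrt(Pearson chi-square / DF), reported by PROC GENMOD as "Scale".
    Returns (corrected SE, Wald chi-square with 1 df, two-sided p value).
    """
    se_corrected = se * scale
    wald_chi2 = (estimate / se_corrected) ** 2
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(wald_chi2 / 2))
    return se_corrected, wald_chi2, p_value

# African American row: estimate and SE from Table 5, scale from Table 6.
se_c, chi2, p = scaled_wald(0.6604, 0.1287, 1.713)
print(f"SE = {se_c:.4f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

Because the same scale multiplies every standard error, every Wald χ² in Table 5 is divided by 1.713² ≈ 2.93, which is why all of the p values in Table 6 are larger.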
Negative binomial modeling. Another option if overdispersion is present is to use the negative binomial specification rather than the Poisson. This is easily accomplished by changing the DIST option in the MODEL statement from POISSON to NB, as follows:
proc genmod data = adhealth.PoissonAnalysis;
  model numpreg = age black hisp asian single
                  college effic contrac
        / link = log
          dist = nb   /* negative binomial instead of Poisson */
          offset = lactvyrs;
run;
The output is very similar and can be interpreted in the same way. Unlike the PSCALE adjustment, the negative binomial specification does not reproduce the Poisson coefficients exactly, but in this example its estimates and standard errors differ from those of the uncorrected Poisson model only in the fourth decimal place. This is because the estimated dispersion parameter is very close to zero (.0007), and with a dispersion of zero the negative binomial model reduces to the Poisson. The negative binomial specification takes care of the overdispersion problem but is less conservative than using corrected standard errors. With this model, the effects of age, African American race, marital status, contraceptive self-efficacy, and consistent contraceptive use are statistically significant, and the effect of college plans is marginally significant (p = .09). Table 7 presents the results from this analytical approach.
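Why a dispersion estimate near zero makes the negative binomial model behave like the Poisson can be seen from the variance functions. The sketch below assumes the usual negative binomial parameterization used by PROC GENMOD, Var(y) = μ + kμ², where k is the dispersion parameter; the mean of .19 comes from the text, and the larger dispersion value of 0.5 is purely hypothetical for contrast.

```python
def poisson_variance(mu):
    """Poisson assumption: the variance equals the mean."""
    return mu

def negbin_variance(mu, dispersion):
    """Negative binomial variance: mu + k * mu**2, where k is the dispersion parameter."""
    return mu + dispersion * mu ** 2

mu = 0.19  # expected pregnancies for an "average" girl, from the text
for k in (0.0007, 0.5):  # 0.0007 from Table 7; 0.5 is a hypothetical contrast
    print(f"k = {k}: Poisson var = {poisson_variance(mu):.4f}, "
          f"NB var = {negbin_variance(mu, k):.4f}")
```

With k = .0007 the extra term kμ² is negligible at counts this small, so the two models imply essentially the same variance and therefore essentially the same standard errors.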
Table 8 compares the parameter estimates and standard errors from all four models: OLS, Poisson, Poisson with the correction for overdispersion, and negative binomial. The parameter estimates for the Poisson-type models are nearly identical, and the standard errors for the negative binomial model match those of the uncorrected rather than the corrected Poisson model. Because the OLS and Poisson-type parameter estimates cannot be compared directly (the former deal with raw pregnancies and the latter with log pregnancies), the last column of the table gives the expected percent change in pregnancies for a one-unit change in each predictor under the Poisson and negative binomial models. These transformed parameter estimates were calculated in the way described above.
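The last column of Table 8 is consistent with the usual log-link transformation, percent change = 100 × (e^b − 1). The Python sketch below assumes that formula and applies it to the Poisson coefficients from Table 5 ("Never married" corresponds to "Single" in Table 8); the negative binomial coefficients in Table 7 are nearly identical and give the same rounded values.

```python
import math

def percent_change(b):
    """Percent change in the expected count for a one-unit increase in a predictor."""
    return 100.0 * (math.exp(b) - 1.0)

# Poisson coefficients from Table 5.
poisson_coefficients = {
    "Current age": 0.1220,
    "African American": 0.6604,
    "Hispanic": 0.2070,
    "Asian": 0.4896,
    "Never married": -0.9294,
    "College plans": -0.0871,
    "Contraceptive self-efficacy": -0.2241,
    "Consistent contraceptive use": -0.2729,
}

for name, b in poisson_coefficients.items():
    print(f"{name:<30} {percent_change(b):6.1f}%")
```

For example, being African American is associated with e^.6604 ≈ 1.94 times as many expected pregnancies per year of exposure, a 93.6% increase.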
DISCUSSION
When deciding which modeling approach is most appropriate for a given data set, consider how closely the distribution of the outcome variable approximates normality. If the dependent variable is fairly normally distributed, OLS regression may be the simplest approach and an appropriate choice in many cases. However, when the outcome variable of interest consists of infrequently occurring counts with highly skewed distributions, Poisson or negative binomial regression may be more appropriate. With SAS or similar statistical software, such analyses are fairly simple to run and produce results that are meaningful and easy to interpret.
In choosing between Poisson and negative binomial regression, the factors to consider are overdispersion and power. When overdispersion is present, violating the Poisson assumption of equidispersion, either a Poisson model with corrected standard errors or a negative binomial model may be used. Negative binomial modeling may provide somewhat more statistical power.

Negative binomial regression does not assume equidispersion, so no adjustments need to be made when overdispersion is present. If Poisson modeling is chosen, the statistical adjustments described above must be made to correct the standard errors. Hutchinson et al. (2003) provided examples of Poisson regressions that incorporate corrections for overdispersion. In their analysis of sexual risk behaviors among inner-city adolescent females, only one of the three sexual risk outcomes measured (number of male sexual partners during the past 3 months) showed equidispersion. The other two outcomes (number of days of sexual intercourse and number of days of unprotected sexual intercourse) were overdispersed and were corrected using the procedures described above.

Table 7. Negative Binomial Regression Output

| Parameter | DF | Estimate | Standard Error | Wald 95% CI, Lower | Wald 95% CI, Upper | χ² | Probability > χ² |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | −2.0414 | .9608 | −3.9245 | −.1584 | 4.51 | .0336 |
| Current age | 1 | .1220 | .0543 | .0156 | .2283 | 5.05 | .0246 |
| African American | 1 | .6605 | .1287 | .4082 | .9128 | 26.33 | <.0001 |
| Hispanic | 1 | .2070 | .2187 | −.2216 | .6355 | .90 | .3439 |
| Asian | 1 | .4897 | .3295 | −.1561 | 1.1354 | 2.21 | .1372 |
| Never married | 1 | −.9297 | .2081 | −1.3375 | −.5219 | 19.96 | <.0001 |
| College plans | 1 | −.0871 | .0515 | −.1882 | .0139 | 2.86 | .0909 |
| Contraceptive self-efficacy | 1 | −.2241 | .0845 | −.3898 | −.0585 | 7.03 | .0080 |
| Consistent contraceptive use | 1 | −.2730 | .0825 | −.4347 | −.1113 | 10.96 | .0009 |
| Dispersion | 0 | .0007 | 0 | .0007 | .0007 | | |
As described above and illustrated in Table 8, OLS, Poisson, and negative binomial regressions yield coefficients with the same sign for every predictor, although the OLS coefficients are on the raw-count scale and are not directly comparable in size. However, because the outcome is not normally distributed, the standard errors, and therefore the significance levels of the coefficients, differ across models. The inappropriate use of OLS regression could lead one to commit Type I errors and erroneously conclude that some variables are significant predictors of the number of adolescent pregnancies when in fact their effects are null.
A related problem is that OLS models pregnancies as a linear function of the predictor variables, which can lead to inaccurate predictions, including impossible negative counts, for girls who score even moderately high or low on those variables. Because the Poisson and negative binomial models build nonlinearity into the model through the log link, they can fit count data better and yield more realistic predicted values, which are never negative.
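A toy calculation shows the difference between additive (OLS) and multiplicative (log-link) effects. The sketch assumes a hypothetical girl whose predicted count is .19 under both models before her marital-status indicator changes by one unit; the two coefficients for "Single" are taken from Table 8, and everything else is illustrative.

```python
import math

baseline = 0.19          # hypothetical predicted count before the change
ols_single = -0.72       # OLS coefficient for Single (Table 8), raw-count scale
poisson_single = -0.93   # Poisson coefficient for Single (Table 8), log scale

# OLS: effects add on the count scale, so a prediction can fall below zero.
ols_prediction = baseline + ols_single
# Log link: effects multiply the expected count by exp(b), which is always positive.
poisson_prediction = baseline * math.exp(poisson_single)

print(f"OLS prediction:     {ols_prediction:.3f}")
print(f"Poisson prediction: {poisson_prediction:.3f}")
```

The OLS prediction of −.53 pregnancies is impossible, whereas the log-link prediction shrinks toward zero without crossing it.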
Finally, the Poisson and negative binomial models have a natural way of dealing with differential exposure among subjects. In our example, we take into account each girl's number of years of sexual experience, an important predictor of pregnancies. By entering years of experience or exposure as an “offset” variable (on the log scale, to match the log link), the models are automatically adjusted to give results that reflect the risk of pregnancy per year of exposure.
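A minimal sketch of what the offset does, under stated assumptions: an intercept-only Poisson model with offset log(t) is fit by Newton-Raphson on hypothetical data (six girls, made-up counts and years). The estimated rate e^b0 equals total events divided by total exposure, that is, a rate per year rather than a raw count. The function name and data are illustrative, not from the article.

```python
import math

def fit_rate_with_offset(counts, exposure, iterations=50):
    """Intercept-only Poisson regression with offset log(exposure), fit by Newton-Raphson.

    Model: log(E[y_i]) = b0 + log(t_i), i.e., E[y_i] = t_i * exp(b0),
    so exp(b0) is the expected count per unit of exposure.
    """
    offsets = [math.log(t) for t in exposure]  # offset enters with a fixed coefficient of 1
    b0 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + o) for o in offsets]
        score = sum(y - m for y, m in zip(counts, mu))  # first derivative of the log-likelihood
        information = sum(mu)                           # negative second derivative
        b0 += score / information
    return b0

# Hypothetical data: pregnancies and years of sexual activity for six girls.
counts = [0, 1, 0, 2, 0, 1]
years = [1.0, 2.0, 0.5, 3.0, 1.5, 2.0]

b0 = fit_rate_with_offset(counts, years)
print(f"estimated rate per year:         {math.exp(b0):.3f}")
print(f"total pregnancies / total years: {sum(counts) / sum(years):.3f}")
```

When predictors are added, the same offset makes each coefficient a log rate ratio per year of exposure rather than a log ratio of raw counts.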
In conclusion, Poisson and negative binomial
regression may provide more appropriate means
for modeling infrequently occurring repeatable
events or counts. Besides being better suited to skewed outcome variables, these approaches can accommodate differential exposure and nonlinear effects. Although researchers
may be less familiar with these regression models,
they are no more difficult to execute than tradi-
tional OLS regression when using SAS or similar
statistical software packages.
Table 8. Comparison of Four Regression Models

| Parameter | OLS Estimate | OLS SE | Poisson Estimate | Poisson SE | Poisson Corrected SE | Negative Binomial Estimate | Negative Binomial SE | Predicted % Change per Unit (Poisson and NB Models) |
|---|---|---|---|---|---|---|---|---|
| Intercept | .37 | .22 | −2.04 | .96 | 1.65 | −2.04 | .96 | |
| Current age | .05 | .01 | .12 | .05 | .09 | .12 | .05 | 13.0 |
| African American | .18 | .03 | .66 | .13 | .22 | .66 | .13 | 93.6 |
| Hispanic | .03 | .05 | .21 | .22 | .37 | .21 | .22 | 23.0 |
| Asian | .09 | .08 | .49 | .33 | .56 | .49 | .33 | 63.2 |
| Single | −.72 | .10 | −.93 | .21 | .36 | −.93 | .21 | −60.5 |
| Years of sexual activity | .07 | .01 | | | | | | |
| College plans | −.04 | .01 | −.09 | .05 | .09 | −.09 | .05 | −8.3 |
| Contraceptive self-efficacy | −.05 | .02 | −.22 | .08 | .14 | −.22 | .08 | −20.1 |
| Consistent contraceptive use | −.07 | .02 | −.27 | .08 | .14 | −.27 | .08 | −23.9 |

n for all models = 1,154.

REFERENCES
Allison, P. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: The SAS Institute.
Blum, R.W., Beuhring, T., Shew, M.L., Bearinger, L.H., Sieving, R.E., & Resnick, M.D. (2000). The effects of race/ethnicity, income, and family structure on adolescent risk behaviors. American Journal of Public Health, 90, 1879–1884.
Cameron, A.C., & Trivedi, P.K. (1998). Regression analysis of count data. New York: Cambridge University Press.
DiClemente, R.J., Lodico, M., & Grinstead, O.A. (1996). African American adolescents residing in high-risk urban environments do use condoms: Correlates and predictors of condom use among adolescents in public housing developments. Pediatrics, 98, 269–278.
Hutchinson, M.K. (2002). Sexual risk communication with mothers and fathers: Influence on the sexual risk behaviors of adolescent daughters. Family Relations, 51, 238–247.
Hutchinson, M.K., & Cooney, T.M. (1998). Parent-teen sexual risk communication implications for intervention. Family Relations, 47, 185–194.
Hutchinson, M.K., Jemmott, J.B. III, Jemmott, L.S., Braverman, P., & Fong, G.T. (2003). The role of mother–daughter sexual risk communication in reducing sexual risk behaviors among urban adolescent females: A prospective study. Journal of Adolescent Health, 33, 98–107.
Jemmott, J.B. III, Jemmott, L.S., & Hacker, C.I. (1992). Predicting intentions to use condoms among African American adolescents: The theory of planned behavior as a model of HIV risk-associated behavior. Ethnicity and Disease, 2, 371–380.
Kann, L., Kinchen, S., Williams, B., Ross, J., Lowery, R., Grunbaum, J., et al. (2000). Youth risk behavior surveillance–U.S., 1999. Morbidity and Mortality Weekly Report, 49, 1–96.
Lewis-Beck, M. (1989). Applied regression: An introduction. Newbury Park, CA: Sage.
Scher, P.W., Emans, S.J., & Grace, E.M. (1982). Factors associated with compliance to oral contraceptive use in an adolescent population. Journal of Adolescent Health Care, 3, 120–123.

fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 
Q3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast SlidesQ3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast SlidesMarketing847413
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfHenry Tapper
 
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service AizawlVip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawlmakika9823
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managmentfactical
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHenry Tapper
 
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsHigh Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Roomdivyansh0kumar0
 

Recently uploaded (20)

Interimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
Interimreport1 January–31 March2024 Elo Mutual Pension Insurance CompanyInterimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
Interimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
 
Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]
 
VIP Call Girls Service Begumpet Hyderabad Call +91-8250192130
VIP Call Girls Service Begumpet Hyderabad Call +91-8250192130VIP Call Girls Service Begumpet Hyderabad Call +91-8250192130
VIP Call Girls Service Begumpet Hyderabad Call +91-8250192130
 
Log your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignLog your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaign
 
Chapter 2.ppt of macroeconomics by mankiw 9th edition
Chapter 2.ppt of macroeconomics by mankiw 9th editionChapter 2.ppt of macroeconomics by mankiw 9th edition
Chapter 2.ppt of macroeconomics by mankiw 9th edition
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
 
(DIYA) Bhumkar Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(DIYA) Bhumkar Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(DIYA) Bhumkar Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(DIYA) Bhumkar Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Instant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School DesignsInstant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School Designs
 
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
 
Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 
Q3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast SlidesQ3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast Slides
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
 
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service AizawlVip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managment
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview document
 
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsHigh Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Monthly Economic Monitoring of Ukraine No 231, April 2024
Monthly Economic Monitoring of Ukraine No 231, April 2024Monthly Economic Monitoring of Ukraine No 231, April 2024
Monthly Economic Monitoring of Ukraine No 231, April 2024
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
 

Poisson Regression Analysis of Count Data

Research in Nursing & Health, 2005, 28, 408–418
Focus on Research Methods

Analysis of Count Data Using Poisson Regression*

M. Katherine Hutchinson¹,†
Matthew C. Holtman²,‡

¹ University of Pennsylvania School of Nursing, 420 Guardian Drive, Philadelphia, Pennsylvania 19104-6096
² Fels Institute of Government and Department of Criminology, University of Pennsylvania

Accepted 16 May 2005

Abstract: Nurses and other health researchers are often concerned with infrequently occurring, repeatable, health-related events such as number of hospitalizations, pregnancies, or visits to a health care provider. Reports on the occurrence of such discrete events take the form of non-negative integer or count data. Because the counts of infrequently occurring events tend to be non-normally distributed and highly positively skewed, the use of ordinary least squares (OLS) regression with non-transformed data has several shortcomings. Techniques such as Poisson regression and negative binomial regression may provide more appropriate alternatives for analyzing these data. The purpose of this article is to compare and contrast the use of these three methods for the analysis of infrequently occurring count data. The strengths, limitations, and special considerations of each approach are discussed. Data from the National Longitudinal Study of Adolescent Health (AddHealth) are used for illustrative purposes. © 2005 Wiley Periodicals, Inc. Res Nurs Health 28:408–418, 2005

Keywords: Poisson regression; count data; data analysis

Nurses and other health researchers are often concerned with infrequently occurring, repeatable, health-related events such as number of hospitalizations, pregnancies, or visits to a health care provider. Reports on the occurrence of such discrete events take the form of non-negative integer or count data.
Counts of infrequently occurring, repeatable events tend to cluster around the values of 0 and/or 1 and exhibit low frequencies at higher values. This type of distribution has a positive skew: it is truncated at 0 and trails off gradually toward higher values, and the mean is characteristically low but greater than the median because of the influence of a few relatively large observations.

*This research uses data from Add Health, a program project designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris, and funded by grant P01-HD31921 from the National Institute of Child Health and Human Development, with cooperative funding from 17 other agencies. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Persons interested in obtaining data files from Add Health should contact Add Health, Carolina Population Center, 123 W. Franklin Street, Chapel Hill, NC 27516-2524 (www.cpc.unc.edu/addhealth/contract.html).
Contract grant sponsor: National Institute of Mental Health (to MKH); Contract grant number: R03 MH63659.
Contract grant sponsor: National Institute of Child Health and Human Development; Contract grant number: P01-HD31921.
Correspondence to M. Katherine Hutchinson.
† Assistant Professor and Associate Director.
‡ Lecturer.
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nur.20093
© 2005 Wiley Periodicals, Inc.

In a regression model, the distribution of the error term mirrors the distribution of the dependent variable itself (Lewis-Beck, 1989). Because ordinary least squares (OLS) regression assumes normality in the distribution of the error terms, and hence in the dependent variable, its use with this type of data is problematic if the data are not transformed to address the effects of the positive skew (Lewis-Beck).

Poisson regression and negative binomial regression may provide more appropriate alternatives for the analysis of infrequently occurring, untransformed count data. Neither of these alternative types of regression analysis assumes a normal distribution of the error terms or the dependent variable. Poisson regression assumes a Poisson distribution, a specific type of distribution in which scores take the form of non-negative integer values. The Poisson distribution is truncated at 0, is highly skewed in the positive direction when its mean is small, and exhibits equidispersion (i.e., a mean that is equal to the variance; Allison, 1999; Cameron & Trivedi, 1998). For uncorrected Poisson regression to be appropriate, these characteristics should be present. When overdispersion is present (i.e., the variance is greater than the mean), Poisson regression may still be employed, but statistical corrections for the overdispersion must be incorporated into the model. In contrast, negative binomial regression is based on the assumption of a Poisson-like distribution (Allison, 1999), and no assumption of equidispersion is made. When overdispersion is present, negative binomial regression can be employed without any special corrections.
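As a quick illustration of equidispersion, the sketch below takes the pregnancy frequencies reported in Table 1, computes their mean and variance, and compares the observed frequencies with those a Poisson distribution with the same mean would predict. It is a minimal Python sketch using only the standard library; the function names are illustrative, and it examines only the unconditional distribution of the outcome, not overdispersion after adjusting for predictors.

```python
import math

# Frequencies from Table 1: number of pregnancies reported -> number of girls
FREQ = {0: 1011, 1: 190, 2: 32, 3: 6, 4: 2}


def mean_and_variance(freq):
    """Mean and sample variance (n - 1 denominator) of a count frequency table."""
    n = sum(freq.values())
    mean = sum(k * f for k, f in freq.items()) / n
    squared_dev = sum(f * (k - mean) ** 2 for k, f in freq.items())
    return mean, squared_dev / (n - 1)


def poisson_expected(freq, mu):
    """Expected number of girls at each count if counts followed a Poisson(mu) distribution."""
    n = sum(freq.values())
    return {k: n * math.exp(-mu) * mu ** k / math.factorial(k) for k in freq}


mean, var = mean_and_variance(FREQ)
print(f"mean = {mean:.3f}, variance = {var:.3f}, variance/mean = {var / mean:.2f}")
for k, expected in poisson_expected(FREQ, mean).items():
    print(f"{k} pregnancies: observed {FREQ[k]:5d}, Poisson expected {expected:7.1f}")
```

With these frequencies the mean is about .226 and the variance about .275 (a ratio of roughly 1.2), and the observed table has more zeros and more girls with 2 to 4 pregnancies than the Poisson column predicts (for example, 6 observed versus about 2 expected at 3 pregnancies): the unconditional version of the pattern the text calls overdispersion.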
The purpose of this article is to compare and contrast the use of these three methods for the analysis of infrequently occurring count data. The strengths, limitations, and special considerations of each approach are discussed. Data from the National Longitudinal Study of Adolescent Health (AddHealth) public use dataset are used for illustrative purposes.

METHODS

In our initial attempts to build comparative regression models, we used independent variables from Wave I of the AddHealth public use dataset to predict the number of pregnancies reported at Wave II. However, to demonstrate the range of options available in Poisson regression, we wanted to include a varying exposure variable (e.g., years of sexual activity) and to model an outcome variable that exhibited overdispersion (i.e., a dependent variable with a variance greater than its mean). When we restricted our sample to sexually experienced young women and used Wave II reports of the number of pregnancies as our outcome, the resulting model did not exhibit overdispersion. Therefore, although we would have preferred to use more than one wave of data to model the longitudinal dynamics of pregnancy more accurately, for illustrative purposes we confined ourselves to a cross-sectional analysis of Wave I data only.

National Longitudinal Study of Adolescent Health

The National Longitudinal Study of Adolescent Health (AddHealth) was mandated to the National Institute of Child Health and Human Development (NICHD) by a Congressional act. Detailed information about the study can be obtained on its web site at http://www.cpc.unc.edu/addhealth.
The study includes a school-based sample of adolescents in 7th through 12th grades. Students who completed the in-school questionnaire (n = 90,000) and those who were listed on the school roster were used as a sampling frame for a core random sample of 12,105 adolescents, stratified by gender and grade. In-home interviews were conducted with this core sample, as well as with oversamples of ethnic minorities and special populations identified from self-report data on the in-school questionnaire. The analyses reported here were limited to Wave I data from in-home interviews with adolescents who were included in the public use dataset subsample.

Sample

The AddHealth public use dataset includes a subsample of approximately 6,500 respondents. Of the 3,356 female respondents, 36% (n = 1,241) were sexually experienced, based on self-reports of ever having had sexual intercourse. Analyses were limited to those respondents who were sexually experienced and whose years of sexual experience could be calculated from their current age and their age at first sexual intercourse. We deleted a few outlier cases whose reported ages at first intercourse were so low that their calculated years of sexual experience exceeded 10. Of the sexually experienced girls remaining in the analysis (n = 1,241), most (n = 1,011, 81%) reported no pregnancies. Of those who had been pregnant, 190 (15%) reported only one pregnancy; 40 (3%) reported more than one pregnancy (Table 1).

Independent Variables

For our analyses, we included nine predictor variables that have been shown to be related to adolescent pregnancy: age, race (represented by three dummy variables: African American, Asian, Hispanic/Latino), marital status, sexual experience, college plans, contraceptive self-efficacy, and consistent contraceptive use. The descriptive statistics for all variables are summarized in Table 2.

Age. Age was included in all of our analyses because sexual activity, childbearing intentions, and contraceptive behaviors tend to vary with age. Age, as self-reported at Wave I, was recorded in years.

Race. Race was included in the analyses because age at first intercourse and rates of adolescent pregnancy have been shown to vary by race (Blum et al., 2000; Kann et al., 2000). In addition, African American adolescents have been found to be less compliant with oral contraceptive use than their White peers (Scher, Emans, & Grace, 1982). Three dichotomous dummy variables, coded 1 for yes and 0 for no, were included to represent respondents' self-identification as African American, Hispanic/Latino, and/or Asian.

Marital status. Although the vast majority (98%) of girls in the sample had never been married, those who had been married may have had different attitudes and intentions toward becoming pregnant than girls who had never married. Marital status was coded as single/never married (1) or other (0) and included as a dichotomous dummy variable in our analyses.

Sexual experience. Pregnancy is directly related to sexual activity. Sexual experience was calculated as the number of years that had elapsed between the year in which the Wave I interview was conducted and the year in which the respondent reported having sexual intercourse for the first time.
College plans. Because feelings of hopelessness and lack of future plans may act as contributing factors in adolescent sexual risk-taking and unintended pregnancy, whether or not respondents had college plans was assessed at Wave I. The single-item measure was worded: "How much do you want to go to college?" The item was scored on a Likert-type scale from 1 to 5; higher scores indicated a greater desire to attend college. The mean score was 4.4 (SD = 1.1).

Contraceptive self-efficacy. Greater sexual self-efficacy has consistently been shown to be related to both condom use and contraceptive use (DiClemente, Lodico, & Grinstead, 1996; Hutchinson, 2002; Hutchinson & Cooney, 1998; Hutchinson, Jemmott, Jemmott, Braverman, & Fong, 2003; Jemmott, Jemmott, & Hacker, 1992). Four items on the Wave I survey assessed self-efficacy related to sexual behavior and contraceptive use. Items were scored from 1 to 5, with higher scores indicating greater contraceptive self-efficacy. The overall contraceptive self-efficacy score was computed as the simple average of the four items, so possible scores ranged from 1 to 5; the mean contraceptive self-efficacy score was 3.8 (SD = .8).

Consistent contraceptive use. In the AddHealth survey, a number of questions addressed contraceptive behavior. For the present analysis, we created a proxy measure for consistent contraceptive use based on two questions that asked whether respondents used contraception the first time and the most recent time they had sexual intercourse. Possible values on the scale range from 0 (did not use contraception either time) to 2 (used contraception both times). The average score on this measure was 1.29 (SD = .8).

In summary, we designated nine predictor variables to include in our models: age, race, marital status, sexual experience, college plans, contraceptive self-efficacy, and consistent contraceptive use. In all of our models, we expected age, race, marital status, and sexual experience to be positively associated with number of pregnancies. College plans, contraceptive self-efficacy, and consistent contraceptive use were expected to be inversely related to number of pregnancies.

Table 1. Frequency Distribution of Outcome Variable

Pregnancies reported    n        %
0                       1,011    81.5
1                       190      15.3
2                       32       2.6
3                       6        0.5
4                       2        0.2

Table 2. Descriptive Statistics for Sample: Girls Reporting Sexual Experience

Variable                                        n        Mean     SD      Minimum   Maximum
Number of pregnancies                           1,241    .23      .52     .0        4.0
Current age (age)                               1,241    17.01    1.39    12.7      20.7
African American (Black)                        1,241    .30      .46     .0        1.0
Hispanic (Hisp)                                 1,241    .10      .30     .0        1.0
Asian (Asian)                                   1,241    .03      .18     .0        1.0
Never married (single) (0 = false; 1 = true)    1,236    .98      .14     .0        1.0
Years of sexual activity (years)                1,169    1.74     1.42    .0        8.0
College plans (college)                         1,237    4.38     1.06    1.0       5.0
Contraceptive self-efficacy (effic)             1,233    3.76     .81     1.0       5.0
Consistent contraceptive use (contrac)          1,241    1.27     .80     .0        2.0

Outcome Variable

The outcome variable of interest, number of pregnancies, was taken from respondents' self-reports. We did not include information on whether pregnancies were carried to term or whether they resulted in live births. Almost 19% of the sexually experienced female adolescents who participated in AddHealth reported having been pregnant at least once at Wave I; slightly less than 82% reported they had never been pregnant. The number of reported pregnancies ranged from 0 to 4. The number of pregnancies was non-normally distributed and highly skewed, with a mean of .22 and a variance of .27.

DATA ANALYSES AND RESULTS

Our goal was to compare the use of OLS, Poisson, and negative binomial regressions for modeling the effects of nine independent variables on the number of pregnancies experienced by sexually experienced adolescent females from the National Longitudinal Study of Adolescent Health. The outcome of interest, number of pregnancies, is an infrequently occurring, discrete, and repeatable event. More than 80% of the sample had a pregnancy count of 0.
Of those who had been pregnant, 190 had had only one pregnancy; 40 reported more than one pregnancy.

OLS Regression

Ignoring the specific characteristics of a Poisson-distributed outcome variable, a typical OLS regression model to address these effects might look like the following:

$$\text{Number of pregnancies} = a + b_1(\text{age}) + b_2(\text{African American}) + b_3(\text{Hispanic}) + b_4(\text{Asian}) + b_5(\text{single}) + b_6(\text{years of sexual experience}) + b_7(\text{college}) + b_8(\text{contraceptive self-efficacy}) + b_9(\text{consistent contraceptive use})$$

The results from this model are shown in Table 3. All but two of the effects are significant at the .05 level. In interpreting the estimated effects, we see that, controlling for the other variables, each additional year of age increases the predicted number of pregnancies by .05. African American girls are expected to have .18 more pregnancies than White girls. Being single decreases the expected number of pregnancies by nearly .72. Each year of sexual activity increases the predicted number of pregnancies by .07. Greater aspiration for college reduces the predicted number of pregnancies by .04 per point, and greater contraceptive self-efficacy reduces it by .05 per point. Each additional point on the consistent contraceptive use scale decreases the predicted number of pregnancies by .07.

So what is wrong with using the OLS approach? The answer is that OLS is inappropriate for models in which the dependent variable is highly skewed.

Table 3. OLS Regression Output for Model Predicting Number of Pregnancies

Variable                        DF   Parameter Estimate   Standard Error   t value   p > |t|
Intercept                       1    .37410               .22441           1.67      .0958
Current age                     1    .04718               .01223           3.86      .0001
African American                1    .17686               .03219           5.49      <.0001
Hispanic                        1    .03127               .04971           .63       .5295
Asian                           1    .08816               .08171           1.08      .2808
Never married                   1    −.71671              .09796           −7.32     <.0001
Years of sexual activity        1    .07221               .01092           6.61      <.0001
College plans                   1    −.03711              .01402           −2.65     .0082
Contraceptive self-efficacy     1    −.04573              .02005           −2.28     .0227
Consistent contraceptive use    1    −.07412              .01879           −3.94     <.0001

Note. OLS = ordinary least squares.
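Because the OLS model in Table 3 is linear, its predicted count is simply the intercept plus the sum of each coefficient times the predictor's value. The minimal Python sketch below (an illustration, not the authors' code) applies the Table 3 estimates to the Table 2 means and then to a hypothetical low-risk profile, invented here only for illustration, to show that nothing in the OLS specification keeps predicted counts at or above zero.

```python
# OLS estimates from Table 3 (intercept plus nine predictors)
OLS_INTERCEPT = 0.37410
OLS_COEFS = {
    "age": 0.04718, "black": 0.17686, "hisp": 0.03127, "asian": 0.08816,
    "single": -0.71671, "years": 0.07221, "college": -0.03711,
    "effic": -0.04573, "contrac": -0.07412,
}


def ols_predict(profile):
    """Linear OLS prediction: intercept plus the sum of coefficient * predictor value."""
    return OLS_INTERCEPT + sum(OLS_COEFS[name] * value for name, value in profile.items())


# Sample means from Table 2
MEANS = {"age": 17.01, "black": 0.30, "hisp": 0.10, "asian": 0.03, "single": 0.98,
         "years": 1.74, "college": 4.38, "effic": 3.76, "contrac": 1.27}

# Hypothetical profile (not taken from the data): a 15-year-old, not African American,
# Hispanic, or Asian, never married, 0 years of sexual activity, and the top score
# on college plans, contraceptive self-efficacy, and consistent contraceptive use
LOW_RISK = {"age": 15, "black": 0, "hisp": 0, "asian": 0, "single": 1,
            "years": 0, "college": 5, "effic": 5, "contrac": 2}

print(f"OLS prediction at the means:      {ols_predict(MEANS):.3f}")
print(f"OLS prediction, low-risk profile: {ols_predict(LOW_RISK):.3f}")
```

At the means the prediction is about .23, close to the sample mean, but for the hypothetical profile it is about −.20, a negative number of pregnancies. That is one concrete symptom of fitting a linear model to a count that is bounded below by zero.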
Our dependent variable, number of pregnancies, falls into this category. Its range is restricted by the fact that its lower bound is zero. Furthermore, although the number of girls who have been pregnant is quite low, a small handful of girls reported several pregnancies, up to a maximum of 4. The result is a highly skewed distribution.

An OLS model does not perform well under these conditions. OLS requires, as a basic assumption, that the dependent variable and the error term in the model be at least approximately normally distributed (Lewis-Beck, 1989). With an outcome variable this skewed, these assumptions are violated. OLS also assumes homoscedasticity, that is, that the variance of the error terms is constant across values of the independent variables; with count data this assumption is also at risk, because the variance of a count tends to grow with its mean. When one or both of these violations occur, the standard errors of the parameters will be estimated incorrectly, and the t-tests associated with the parameters will therefore be inaccurate. The user will be unable to tell whether the effects are statistically significant or not. Worse, because one can almost always run such a model mechanically (even when the model is inappropriate), there is a risk of taking the results at face value, making a Type I error, and reaching the wrong conclusions.

A second reason why a Poisson model is preferred over OLS regression is that exposure to the risk of the outcome of interest varies. OLS assumes linear relationships between each independent variable and the outcome. Although OLS allowed us to control for years of sexual activity in the model, in doing so we were implicitly assuming that the relationship between years and pregnancies was linear. With a highly skewed outcome variable like pregnancy, this assumption is not reasonable.
As we will see, Poisson regression has a more natural way of incorporating this information into the model, producing more realistic estimates of how likely a girl is to get pregnant, holding constant the number of years she has been sexually active.

Alternative Poisson-Type Models

There are two Poisson-type models (a true Poisson model and a negative binomial model, which is a generalization of the Poisson) that are better suited than OLS to analyzing the type of data in the example. In specifying a Poisson-type model, there are two initial choices to make: (a) whether to incorporate time-frame adjustments in the model and (b) whether to specify a true Poisson approach or a negative binomial approach.

Unlike OLS models, Poisson-type models can be specified so as to take varying time-frames or levels of exposure into account. This can be done by including a predictor variable that adjusts the model for the time-frame in which the observations were made, or by standardizing the outcome variable per time unit. Investigators can also choose not to include such adjustments. In the present example, the relevant variable is the length of the interval of sexual activity. Even if the risk of pregnancy from each sexual encounter is constant, the cumulative risk increases with the number of encounters, and the number of encounters is likely to increase with the length of the interval of exposure. Unfortunately, there is no direct measure of how many times a girl has had sex. Therefore, we used years of sexual activity as an indication of how much exposure each girl is likely to have experienced. An 18-year-old girl who has been sexually active since age 14, for example, has probably had many more chances to become pregnant than a girl who initiated sexual activity at age 17.
There may, of course, be other differences: the girl who started at 14 may be, in many ways, less responsible or less able to plan than the other girl. The time variable does not capture effects of this kind, only different levels of exposure.

A Poisson-type regression can incorporate such an exposure variable in one of two ways: by including exposure as a standard predictor variable, or by incorporating exposure as an offset. In the latter case, the researcher includes the log-transformed value of the exposure variable (years sexually active) on the right-hand side of the equation and instructs the computer to fix its coefficient at 1. In this kind of model, the quantity predicted is the log of the pregnancy rate per unit of exposure (e.g., per year of sexual activity). The reason for this adjustment is algebraic: with exposure $t$ and expected count $\mu$, the equation $\log(\mu) = \log(t) + a + b_1X_1 + \cdots + b_9X_9$ is equivalent to $\log(\mu/t) = a + b_1X_1 + \cdots + b_9X_9$. The other alternative is simply to include the untransformed (i.e., not logged) exposure variable as a predictor on the right-hand side of the equation. This slightly simpler specification gives fairly comparable results but requires a slight shift in the interpretation of the coefficients.

The second major option is whether to use a true Poisson specification or a negative binomial specification. A true Poisson model assumes that the distribution of the outcome variable (number of pregnancies) has a mean equal to its variance; this is part of the definition of the Poisson distribution. The negative binomial (NB) distribution, on the other hand, makes no such assumption and can have a variance that is larger than its mean. The negative binomial distribution is said, therefore, to be overdispersed compared to the true Poisson distribution. The coefficients of the two models are interpreted in exactly the same way. The researcher needs to be aware of the possibility of overdispersion when estimating a model and should pick the better of the two models (Poisson or NB) depending on whether there is evidence of overdispersion. When overdispersion is present, true Poisson modeling can still be employed; however, a standardized correction must be made for the overdispersion.

Poisson modeling. A Poisson model uses a maximum likelihood estimation technique (much like a logistic regression) and can be run in SAS using the GENMOD procedure. A Poisson version of our model is shown below. In this variation, we used the simpler approach for the exposure variable by including it in raw form as a predictor variable. Notice the option that specifies the log link function (link = log): Poisson and negative binomial regression model the natural log of the expected value of the outcome variable, which keeps every predicted count positive. Example SAS instructions are provided below, and the output from these instructions is included in Table 4.

    /* Poisson regression of number of pregnancies; years of sexual activity entered as an ordinary predictor */
    proc genmod data = adhealth.PoissonAnalysis;
       model numpreg = age black hisp asian single years college effic contrac
             / link = log dist = poisson;
    run;

The output is similar to that of an OLS regression: a parameter estimate and a standard error for each predictor variable, and a p value that is based on a χ² statistic. The upper and lower limits of a 95% confidence interval for each parameter are also provided. Finally, an extra line labeled Scale, which is set to 1 in this first model, appears. The scale parameter identifies how much the output was adjusted to take overdispersion into account. We have not dealt with overdispersion yet in this model, so this parameter is not yet relevant. There are significant effects (p < .05) for most of the predictors, just as before.
Table 4. Poisson Regression Output for Model Predicting Number of Pregnancies, with Years of Sexual Activity Included as a Predictor

| Parameter | DF | Estimate | Standard Error | Wald 95% CL Lower | Wald 95% CL Upper | Wald χ² | Pr > χ² |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | −3.5182 | .9551 | −5.3903 | −1.6462 | 13.57 | .0002 |
| Current age | 1 | .2278 | .0556 | .1188 | .3368 | 16.78 | <.0001 |
| African American | 1 | .6958 | .1297 | .4416 | .9500 | 28.78 | <.0001 |
| Hispanic | 1 | .1498 | .2195 | −.2804 | .5801 | .47 | .4949 |
| Asian | 1 | .5333 | .3290 | −.1116 | 1.1782 | 2.63 | .1051 |
| Never married | 1 | −1.1035 | .2122 | −1.5194 | −.6876 | 27.04 | <.0001 |
| Years of sexual activity | 1 | .2195 | .0381 | .1447 | .2942 | 33.13 | <.0001 |
| College plans | 1 | −.1198 | .0515 | −.2208 | −.0188 | 5.41 | .0201 |
| Contraceptive self-efficacy | 1 | −.1939 | .0852 | −.3609 | −.0268 | 5.17 | .0230 |
| Consistent contraceptive use | 1 | −.3106 | .0812 | −.4697 | −.1514 | 14.62 | .0001 |
| Scale | 0 | 1 | 0 | 1 | 1 | | |

Each of the effects is again in the expected direction. In interpreting the coefficients, recall that the model is really for the natural log (log base e: the power to which e must be raised to obtain the original number, where e is about 2.718) of the expected number of pregnancies. This is a somewhat unnatural way to think about pregnancies ("how many logged pregnancies did you have?"), so it may be helpful to transform the parameter estimates into a more accessible form, which can be done with spreadsheet software or a scientific calculator. For years of sexual experience, for example, the estimated coefficient is .22. Raising e to the .22 power gives $e^{.22} = 1.25$: each additional year of sexual experience multiplies the expected number of pregnancies by about 1.25. This leads to a simple interpretation. The percentage change in the expected outcome count (Y) for each one-unit increase in an independent variable (X) equals 100 times the quantity e raised to the coefficient, minus one: $Y\% = 100 \times (e^{B} - 1)$ (Allison, 1999). In this example, the percentage increase in the expected number of pregnancies for each additional year of sexual experience is $100 \times (1.25 - 1) = 25\%$.

But how reasonable are these estimates? What is the predicted number of pregnancies for an average girl in the sample, that is, a girl who has average values on all of the predictor variables? No such girl may actually exist in the dataset, but applying the averages of all the predictors is an easy way to see whether the model gives reasonable results. In calculating the predicted number of
pregnancies, it is necessary to take into account the effects of all the predictor variables (not just years of sexual experience or exposure), and those effects have to be added up before exponentiating. Using the descriptive statistics in Table 2, multiply each predictor variable's average (current age, Black, Hispanic, Asian, single, years of sexual activity, college plans, contraceptive self-efficacy, consistent contraceptive use) by its estimated effect from the model, and add the intercept (the total below was computed with the unrounded estimates in Table 4):

$$-3.52 + (.23 \times 17.01) + (.70 \times .30) + (.15 \times .10) + (.53 \times .03) + (-1.10 \times .98) + (.22 \times 1.74) + (-.12 \times 4.38) + (-.19 \times 3.76) + (-.31 \times 1.27) \approx -1.75$$

The total, −1.75, is the log of the predicted number of pregnancies, so the predicted number of pregnancies is $e^{-1.75} = .17$. The .17 figure is not far from the average number of pregnancies in the sample, .20.

Model fit can be assessed using the deviance and Pearson χ² statistics reported in the model output. Under certain large-sample conditions, both statistics are approximately distributed as χ² with degrees of freedom equal to the number of observations minus the number of parameters. In this example, the deviance is 781.6 and the Pearson χ² is 3,359.9, both with 1,144 degrees of freedom. The deviance is non-significant (p ≈ 1), but the Pearson χ² is highly significant (p < .0001), and its ratio to the degrees of freedom (about 2.9) is an early sign of the overdispersion discussed below. A significant deviance or χ² value indicates poor correspondence between the model and the data, perhaps due to inappropriate use of the Poisson specification or to the omission of an important predictor variable. In practice, the fit statistics tend to get larger and become statistically significant as the dataset gets larger, so they are not always useful for assessing a single model by itself.
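The transformations and fit statistics used in this section can be summarized as follows. This is a sketch in notation of our own choosing ($\hat\beta_j$ for an estimated coefficient, $\bar x_j$ for a predictor mean, $\hat\mu_i$ for a fitted count, $n - p$ for the residual degrees of freedom). The numbers come from Table 4 and the text; the deviance and Pearson formulas are the usual textbook definitions for a Poisson model, with $y_i \log(y_i/\hat\mu_i)$ taken as 0 when $y_i = 0$.

```latex
\begin{aligned}
\%\Delta_j &= 100\,\bigl(e^{\hat\beta_j} - 1\bigr):
  \qquad 100\,\bigl(e^{.2195} - 1\bigr) \approx 24.5\% \approx 25\% \\
\hat\mu_{\text{avg}} &= \exp\Bigl(\hat\beta_0 + \textstyle\sum_j \hat\beta_j \bar x_j\Bigr):
  \qquad e^{-1.75} \approx .17 \\
D &= 2 \sum_i \Bigl[\, y_i \log\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i) \Bigr],
  \qquad \frac{D}{n - p} = \frac{781.6}{1144} \approx .68 \\
X^2 &= \sum_i \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i},
  \qquad \frac{X^2}{n - p} = \frac{3359.9}{1144} \approx 2.94
\end{aligned}
```

The Pearson ratio, about 2.94, is the overdispersion diagnostic used in the Overdispersion section of this article.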
An alternative is to use the deviance statistic to compare nested models with each other; for details, refer to Cameron and Trivedi (1998).

The next step is to attempt a better specification of the Poisson model, using the offset method to adjust for varying lengths of exposure to pregnancy risk, in this case the time spent sexually active. In this approach, the log-transformed number of years sexually active is included, calculated with the following SAS statement in a DATA step:

    logyrs = log(actvyrs);

In setting up the model, this variable is not included with the other predictors; instead, it is named as an option after the slash in the MODEL statement:

    / offset = logyrs

The resulting SAS code reads as follows:

    proc genmod data = adhealth.PoissonAnalysis;
      model numpreg = age black hisp asian single college effic contrac
            / link = log dist = poisson offset = logyrs;
    run;

This sets the coefficient for logyrs equal to 1 and adjusts the other estimates accordingly. It works as an algebraic rearrangement of the ratio log(pregnancies/years). The logarithm of any ratio equals the log of the numerator minus the log of the denominator, $\log(a/b) = \log(a) - \log(b)$; therefore $\log(\text{pregnancies}/\text{years}) = \log(\text{pregnancies}) - \log(\text{years})$. The model of interest is

$$\log(\text{pregnancies}/\text{years}) = a + b_1(\text{age}) + b_2(\text{African American}) + b_3(\text{Hispanic}) + b_4(\text{Asian}) + b_5(\text{single}) + b_6(\text{college}) + b_7(\text{effic}) + b_8(\text{contrac})$$

To estimate a model predicting log(pregnancies) alone, add log(years) to both sides of the equation. The log(years) term then cancels out of the left side, leaving log(pregnancies) by itself:

$$\log(\text{pregnancies}) = a + \log(\text{years}) + b_1(\text{age}) + b_2(\text{African American}) + b_3(\text{Hispanic}) + b_4(\text{Asian}) + b_5(\text{single}) + b_6(\text{college}) + b_7(\text{effic}) + b_8(\text{contrac})$$

The coefficient for log(years) is fixed at 1 in order to maintain the correct scale. The results of this model are shown in Table 5. In contrast to the previous model, the effects of age and college plans are weaker: age remains significant (p = .02), but college plans no longer is (p = .09).
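Written in standard notation (ours, not the article's: $t_i$ for girl $i$'s years of sexual activity, $\mu_i$ for her expected number of pregnancies, and $x_{i1}, \dots, x_{i8}$ for the eight remaining predictors), the offset model is a model for the pregnancy rate per year of exposure. This is a sketch of the algebra, not SAS output:

```latex
\begin{aligned}
\log \mu_i &= \log t_i + \beta_0 + \beta_1 x_{i1} + \cdots + \beta_8 x_{i8} \\
\mu_i &= t_i \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_8 x_{i8}) \\
\frac{\mu_i}{t_i} &= \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_8 x_{i8})
\end{aligned}
```

Each $e^{\beta_j}$ is therefore a rate ratio: a one-unit increase in $x_j$ multiplies the expected number of pregnancies per year of exposure by $e^{\beta_j}$, whatever the value of $t_i$. For example, using the African American coefficient in Table 5, $e^{.6604} \approx 1.94$.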
If the products of the means of all the predictor variables and their coefficients are summed (using the unrounded estimates in Table 5), the result is:

$$-2.04 + (.12 \times 17.01) + (.66 \times .30) + (.21 \times .10) + (.49 \times .03) + (-.93 \times .98) + (-.09 \times 4.38) + (-.22 \times 3.76) + (-.27 \times 1.27) \approx -2.21$$

The model therefore predicts a log rate of −2.21 for an average girl, that is, $e^{-2.21} = .11$ pregnancies per year. Because the average number of years of sexual activity in the sample is 1.7, this gives a total of $1.7 \times .11 = .19$ pregnancies for an average girl, one who has average values on all the predictor variables included in the model. This estimate is closer to the sample average of .20 pregnancies than the .17 predicted by the previous model.

Overdispersion. Overdispersion means having a Poisson-like distribution that is not quite Poisson, because its variance is larger than its mean. When overdispersion is present, Poisson regression coefficients are still reliable, but the variance is larger than the statistical program assumes for a Poisson distribution. As a result, the standard
errors calculated by the program are artificially smaller than the true standard errors, which may lead to overly liberal significance tests and a greater likelihood of Type I errors. Overdispersion can be detected by dividing the model's Pearson χ² statistic by its degrees of freedom. A ratio near 1 is expected under the Poisson model; an overdispersed model will have a ratio greater than 2, and the greater the ratio, the greater the overdispersion. An omitted predictor variable can produce apparent overdispersion or underdispersion; the researcher should investigate this possibility while examining the diagnostics.

Overdispersion is corrected by multiplying each standard error by the square root of the model's Pearson χ² divided by its degrees of freedom. SAS performs this adjustment automatically if the PSCALE option is specified (Allison, 1999). In this example, the code is as follows; it is the same as before except for the PSCALE option at the end.

    proc genmod data = adhealth.PoissonAnalysis;
      model numpreg = age black hisp asian single college effic contrac
            / link = log dist = poisson offset = logyrs pscale;
    run;

The output, presented in Table 6, looks very similar to the previous Poisson analysis. The coefficient estimates are exactly the same. The difference is in the estimated standard errors, which are now larger by a factor of 1.713. The scale parameter has been adjusted upward from 1 to 1.713, reflecting the degree of adjustment made. The larger standard errors affect the hypothesis tests, reducing the χ² statistics and increasing the p values, so some effects that were statistically significant no longer reach significance. In this case, only African American race and marital status remain significant.
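The arithmetic of the PSCALE correction can be checked against Tables 5 and 6. The sketch below uses our own notation, $\hat\phi$ for Pearson's χ² divided by its degrees of freedom. SAS reports $\sqrt{\hat\phi}$ as the Scale value (1.713 in Table 6); every standard error is multiplied by it, so each Wald χ² is divided by $\hat\phi$:

```latex
\begin{aligned}
\hat\phi &= \frac{X^2}{n - p}, \qquad \sqrt{\hat\phi} = 1.713 \;\Rightarrow\; \hat\phi \approx 2.93 \\
\mathrm{SE}_{\text{adj}} &= \sqrt{\hat\phi}\,\mathrm{SE}:
  \qquad 1.713 \times .1287 \approx .2205 \quad \text{(African American)} \\
W_{\text{adj}} &= \Bigl(\frac{\hat\beta}{\mathrm{SE}_{\text{adj}}}\Bigr)^{2} = \frac{W}{\hat\phi}:
  \qquad \Bigl(\frac{.6604}{.2205}\Bigr)^{2} \approx 8.97
\end{aligned}
```

The same factor approximately reproduces the other corrected standard errors in Table 6, for example $1.713 \times .9607 \approx 1.6456$ for the intercept.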
The correction for overdispersion takes a conservative approach to estimating the standard errors, which means it tends to err on the side of Type II errors: concluding that an effect is not significant even when it might be.

Table 5. Poisson Regression Output for Model Predicting Number of Pregnancies, with Years of Sexual Activity Included as an Offset to Account for Varying Time Frame of Exposure

| Parameter | DF | Estimate | Standard Error | Wald 95% CL Lower | Wald 95% CL Upper | Wald χ² | Pr > χ² |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | −2.0420 | .9607 | −3.9248 | −.1591 | 4.52 | .0335 |
| Current age | 1 | .1220 | .0543 | .0156 | .2283 | 5.05 | .0246 |
| African American | 1 | .6604 | .1287 | .4082 | .9126 | 26.33 | <.0001 |
| Hispanic | 1 | .2070 | .2186 | −.2215 | .6354 | .90 | .3438 |
| Asian | 1 | .4896 | .3294 | −.1561 | 1.1352 | 2.21 | .1373 |
| Never married | 1 | −.9294 | .2080 | −1.3371 | −.5218 | 19.97 | <.0001 |
| College plans | 1 | −.0871 | .0515 | −.1881 | .0139 | 2.86 | .0909 |
| Contraceptive self-efficacy | 1 | −.2241 | .0845 | −.3897 | −.0585 | 7.04 | .0080 |
| Consistent contraceptive use | 1 | −.2729 | .0825 | −.4346 | −.1113 | 10.95 | .0009 |
| Scale | 0 | 1 | 0 | 1 | 1 | | |

Table 6. Poisson Regression Output for Model with Correction for Overdispersion

| Parameter | DF | Estimate | Standard Error | Wald 95% CL Lower | Wald 95% CL Upper | Wald χ² | Pr > χ² |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | −2.0420 | 1.6456 | −5.2673 | 1.1834 | 1.54 | .2147 |
| Current age | 1 | .1220 | .0929 | −.0602 | .3041 | 1.72 | .1894 |
| African American | 1 | .6604 | .2205 | .2283 | 1.0925 | 8.97 | .0027 |
| Hispanic | 1 | .2070 | .3745 | −.5270 | .9409 | .31 | .5805 |
| Asian | 1 | .4896 | .5643 | −.6165 | 1.5956 | .75 | .3857 |
| Never married | 1 | −.9294 | .3563 | −1.6277 | −.2312 | 6.81 | .0091 |
| College plans | 1 | −.0871 | .0883 | −.2601 | .0859 | .97 | .3237 |
| Contraceptive self-efficacy | 1 | −.2241 | .1447 | −.5078 | .0596 | 2.40 | .1215 |
| Consistent contraceptive use | 1 | −.2729 | .1413 | −.5498 | .0039 | 3.73 | .0533 |
| Scale | 0 | 1.713 | 0 | 1.713 | 1.713 | | |

In order to gain a little more power to detect significant effects,
an alternative approach would be to use a negative binomial specification.

Negative binomial modeling. Another option when overdispersion is present is to use the negative binomial specification rather than the Poisson. This is easily accomplished by changing the DIST= option in the MODEL statement from POISSON to NB, as follows:

    proc genmod data = adhealth.PoissonAnalysis;
      model numpreg = age black hisp asian single college effic contrac
            / link = log dist = nb offset = logyrs;
    run;

The output is very similar and can be interpreted in the same way. As with the PSCALE adjustment, the negative binomial specification produces essentially the same coefficient estimates as the main Poisson model (they differ at most in the fourth decimal place), with standard errors that are at most very slightly larger. The negative binomial specification is designed to address the overdispersion problem and is less conservative than using corrected standard errors. With this model, most of the predictors have highly significant effects, and the effects of age and college plans are weaker (p = .02 and .09, respectively). Table 7 presents the results from this analytical approach.

Table 8 compares the parameter estimates and standard errors from all four models: OLS, Poisson, Poisson with the correction for overdispersion, and negative binomial. The parameter estimates for the Poisson-type models are nearly identical, and the standard errors for the negative binomial model are much closer to those of the uncorrected Poisson model than to those of the corrected one. Because the OLS and Poisson-type parameter estimates cannot be compared directly (the former refer to raw pregnancies and the latter to the log of expected pregnancies), the last column of the table gives the expected percent change in pregnancies for a one-unit change in each predictor under the Poisson and negative binomial models. These transformed parameter estimates were calculated in the way described above, as $100 \times (e^{B} - 1)$.
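For comparison, the Poisson and negative binomial models differ in their variance functions. The following is a sketch assuming the parameterization that, as we understand the SAS documentation, PROC GENMOD uses for DIST=NB, with $k$ the value reported as Dispersion in Table 7 and $\mu = .20$ taken from the sample mean number of pregnancies:

```latex
\begin{aligned}
\text{Poisson:} \quad & \operatorname{Var}(Y_i) = \mu_i \\
\text{Negative binomial:} \quad & \operatorname{Var}(Y_i) = \mu_i + k\,\mu_i^{2}, \qquad \hat k = .0007 \\
\text{At } \mu = .20: \quad & .20 + .0007 \times (.20)^{2} = .20 + .000028 \approx .20
\end{aligned}
```

With such a small estimated $k$, the negative binomial variance is practically the Poisson variance at this mean, which is consistent with the nearly identical estimates and standard errors in Tables 5 and 7.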
Table 7. Negative Binomial Regression Output

| Parameter | DF | Estimate | Standard Error | Wald 95% CL Lower | Wald 95% CL Upper | Wald χ² | Pr > χ² |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | −2.0414 | .9608 | −3.9245 | −.1584 | 4.51 | .0336 |
| Current age | 1 | .1220 | .0543 | .0156 | .2283 | 5.05 | .0246 |
| African American | 1 | .6605 | .1287 | .4082 | .9128 | 26.33 | <.0001 |
| Hispanic | 1 | .2070 | .2187 | −.2216 | .6355 | .90 | .3439 |
| Asian | 1 | .4897 | .3295 | −.1561 | 1.1354 | 2.21 | .1372 |
| Never married | 1 | −.9297 | .2081 | −1.3375 | −.5219 | 19.96 | <.0001 |
| College plans | 1 | −.0871 | .0515 | −.1882 | .0139 | 2.86 | .0909 |
| Contraceptive self-efficacy | 1 | −.2241 | .0845 | −.3898 | −.0585 | 7.03 | .0080 |
| Consistent contraceptive use | 1 | −.2730 | .0825 | −.4347 | −.1113 | 10.96 | .0009 |
| Dispersion | 0 | .0007 | 0 | .0007 | .0007 | | |

DISCUSSION

When deciding which modeling approach is most appropriate for a given dataset, consider the degree of normality or non-normality in the distribution of the outcome variable. If the dependent variable is fairly normally distributed, OLS regression may be the simplest approach and an appropriate choice in many cases. However, when the outcome variable of interest consists of infrequently occurring counts with highly skewed distributions, Poisson or negative binomial regression may be more appropriate. With SAS or similar statistical software, such analyses are also fairly simple to execute and produce results that are meaningful and easy to interpret.

In choosing between Poisson and negative binomial regression, the factors to consider are overdispersion and power. When overdispersion is present and the Poisson assumption of equidispersion is violated, either a Poisson model with corrected standard errors or a negative binomial model may be used. Negative binomial modeling may offer a little more statistical power. Negative binomial regression makes no assumption of equidispersion, and no adjustments need to be made when overdispersion is present. If Poisson modeling is chosen, the statistical adjustments described above must be made to correct the standard errors. Hutchinson et al. (2003) provided examples of Poisson regressions that incorporate corrections for overdispersion. In their analysis of
sexual risk behaviors among inner-city adolescent females, only one of the three sexual risk outcomes measured (number of male sexual partners during the past 3 months) showed equidispersion. The other two outcomes (number of days of sexual intercourse and number of days of unprotected sexual intercourse) were overdispersed and were corrected using the procedures described above.

As described above and illustrated in Table 8, the Poisson and negative binomial regressions yield nearly identical coefficients, and the OLS coefficients, although on a different scale, agree with them in sign. However, because of the non-normality of the distributions, the size of the standard errors, and thus the significance of the coefficients, varies across models. The inappropriate use of OLS regression could lead one to commit Type I errors and erroneously conclude that some variables are significant predictors of the number of adolescent pregnancies when in fact their effects are null. A related problem is that OLS models pregnancies as a linear function of the predictor variables, which can lead to inaccurate predictions, including impossible negative counts, for girls who score even moderately high or low on those variables. Because the Poisson and negative binomial models build nonlinearity into the model through the log link, they give a much better fit to the data and more realistic predicted values. Finally, the Poisson and negative binomial models have a natural way of dealing with differential exposure among subjects. In our example, we took into account the number of years of sexual experience of each girl, an important predictor of pregnancies. By using the log of years of experience (the exposure) as an "offset" variable, the models are automatically adjusted to give results that reflect the risk of pregnancy per year of exposure.

In conclusion, Poisson and negative binomial regression may provide more appropriate means for modeling infrequently occurring repeatable events or counts.
In addition to being better suited to the data when the outcome variable is skewed, these approaches have the advantage of accommodating differential exposure and nonlinear effects. Although researchers may be less familiar with these regression models, they are no more difficult to execute than traditional OLS regression when using SAS or similar statistical software packages.

Table 8. Comparison of Four Regression Models

| Parameter | OLS Estimate | OLS SE | Poisson Estimate | Poisson SE | Poisson Corrected SE | NB Estimate | NB SE | Predicted % Change per Unit (Poisson and NB) |
|---|---|---|---|---|---|---|---|---|
| Intercept | .37 | .22 | −2.04 | .96 | 1.65 | −2.04 | .96 | |
| Current age | .05 | .01 | .12 | .05 | .09 | .12 | .05 | 13.0 |
| African American | .18 | .03 | .66 | .13 | .22 | .66 | .13 | 93.6 |
| Hispanic | .03 | .05 | .21 | .22 | .37 | .21 | .22 | 23.0 |
| Asian | .09 | .08 | .49 | .33 | .56 | .49 | .33 | 63.2 |
| Single | −.72 | .10 | −.93 | .21 | .36 | −.93 | .21 | −60.5 |
| Years of sexual activity | .07 | .01 | | | | | | |
| College plans | −.04 | .01 | −.09 | .05 | .09 | −.09 | .05 | −8.3 |
| Contraceptive self-efficacy | −.05 | .02 | −.22 | .08 | .14 | −.22 | .08 | −20.1 |
| Consistent contraceptive use | −.07 | .02 | −.27 | .08 | .14 | −.27 | .08 | −23.9 |

n for all models = 1,154.

REFERENCES

Allison, P. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: The SAS Institute.

Blum, R.W., Beuhring, T., Shew, M.L., Bearinger, L.H., Sieving, R.E., & Resnick, M.D. (2000). The effects of race/ethnicity, income, and family structure on adolescent risk behaviors. American Journal of Public Health, 90, 1879–1884.

Cameron, A.C., & Trivedi, P.K. (1998). Regression analysis of count data. New York: Cambridge University Press.
DiClemente, R.J., Lodico, M., & Grinstead, O.A. (1996). African American adolescents residing in high-risk urban environments do use condoms: Correlates and predictors of condom use among adolescents in public housing developments. Pediatrics, 98, 269–278.

Hutchinson, M.K. (2002). Sexual risk communication with mothers and fathers: Influence on the sexual risk behaviors of adolescent daughters. Family Relations, 51, 238–247.

Hutchinson, M.K., & Cooney, T.M. (1998). Parent-teen sexual risk communication implications for intervention. Family Relations, 47, 185–194.

Hutchinson, M.K., Jemmott, J.B. III, Jemmott, L.S., Braverman, P., & Fong, G.T. (2003). The role of mother–daughter sexual risk communication in reducing sexual risk behaviors among urban adolescent females: A prospective study. Journal of Adolescent Health, 33, 98–107.

Jemmott, J.B. III, Jemmott, L.S., & Hacker, C.I. (1992). Predicting intentions to use condoms among African American adolescents: The theory of planned behavior as a model of HIV risk-associated behavior. Ethnicity and Disease, 2, 371–380.

Kann, L., Kinchen, S., Williams, B., Ross, J., Lowery, R., Grunbaum, J., et al. (2000). Youth risk behavior surveillance—U.S., 1999. Morbidity and Mortality Weekly Report, 49, 1–96.

Lewis-Beck, M. (1989). Applied regression: An introduction. Newbury Park, CA: Sage.

Scher, P.W., Emans, S.J., & Grace, E.M. (1982). Factors associated with compliance to oral contraceptive use in an adolescent population. Journal of Adolescent Health Care, 3, 120–123.