Count data analysis

I
ADDIS ABABA UNIVERSITY
COLLEGE OF VETERINARY MEDICINE AND AGRICULTURE
Assignment for the course “Advanced Biostatistics” ON ANALYSIS OF COUNT DATA
By Walkite Furgasa Chala (DVM) ID NO., GSR/2792/10
Submitted to;
Samson Leta (DVM, MSc, Assistant Professor )
December, 2017
Bishoftu, Ethiopia

II
Table of Contents Page
LIST OF TABLE ........................................................................................................................III
LIST OF FIGURES ....................................................................................................................IV
LIST OF ABBREVATIONS ........................................................................................................V
SUMMARY.................................................................................................................................VI
1. INTRODUCTION..................................................................................................................... 1
2. STATISTICAL TESTS TO ANALYZE COUNT DATA ..................................................... 2
2.1 Poisson Regression................................................................................................................. 2
2.2 Negative Binomial Regression................................................................................................ 3
2.3 Zero Inflated Regression........................................................................................................ 4
3. ANALYSIS OF A COUNT DATA .......................................................................................... 5
3.1. Source of Data....................................................................................................................... 5
3.2. Types of Variables of the Data............................................................................................... 8
3.3. Poisson Regression Analysis and Its Interpretation............................................................... 8
3.4. Negative Binomial Regression Analysis and Its Interpretation........................................... 14
4 REFERENCES......................................................................................................................... 21

III
LIST OF TABLE
Table 1: Raw data of the Assignment

IV
LIST OF FIGURES
Figure 1. Q-Q plot of poission regression analysis
Figure 2. Q-Q plot of negative binomial regression analysis

V
LIST OF ABBREVATIONS
AIC Akaike Information Criterion
EPG Egg pergram of feaces
GLM Generalized linear model
IRR Incident rate ratio
NBREG Negative binomial regression model
ZINB Zero inflated negative binomial model
ZIP Zero inflated poisson model

VI
SUMMARY
In statistics, count data is a statistical data type in which the observations can take only the non-
negative integer values. Count models are a subset of discrete response regression models and
are distributed as non-negative integers, are intrinsically heteroskedastic, right skewed, and
have a variance that increases with the mean. An individual piece of count data is often termed
as a count variable. When such a variable is treated as a random variable,
the Poisson and negative binomial distributions are commonly used to represent its distribution
and if there is excess zeros, zero Inflated Regression was used. The objective of this assignment
was to write and analyze certain data on count data using R software. The title of the the data is
“Detection of Anthelmintic Resistance in Gastrointestinal Nematodes of Small Ruminants in
Haramaya University Farms”. The sheep and goats infected with gastrointestinal nematodes were
selected and I took 30 goats and 30 sheep. The goats and sheep were grouped into Albendazole
group(10), Ivermectin group(10) and the control(10). The egg was counted before treatment and
after treatment in treated group and again the egg was also counted twice in control group in
parallel to treated groups. The change of egg count was taken from treated groups and the second
egg count was taken in control group for these assignment. The data was analyzed with R
software through poisson regression and negative binomial regression models. The poisson
model didn`t fit the data because the result of overdispersion test indicate there is evidence of
overdispersion (c is estimated to be 872.046) which speaks quite strong against the assumption
of equidispersion that means when c=0. Pchisq p-value also nonsignificant(0) which indicates
the data was not fit. The normal quartile plot also indicates that the error is not normally
distributed. So generally since almost all assumption were violated or the goodness of fit of the
Poisson model indicates that the model is not fit. The ‘dispersiontest’indicate the data to be over
dispersed but the negative binomial regression model fit the data. The data was interpreted based
on the result obtain through negative binomial regression.
Keywords: Analysis, count data

1
1. INTRODUCTION
In statistics, count data is a statistical data type in which the observations can take only the
non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting.
The statistical treatment of count data is distinct from that of binary data, in which the
observations can take only two values, usually represented by 0 and 1, and from ordinal data,
which may also consist of integers but where the individual values fall on an arbitrary scale
and only the relative ranking is important(Cameron and Trivedi, 2013).
Count models are a subset of discrete response regression models. Count data are
distributed as non-negative integers, right skewed, and have a variance that increases with
the mean. Example, count data include such situations as length of hospital stay, the
number of a certain species of fish per defined area in the ocean, the number of lights
displayed by fireflies over specified time periods, the classic case of the number of deaths
and the number of occurrences of thunderstorms in a calendar year. An individual piece of
count data is often termed a count variable. When such a variable is treated as a random
variable, the Poisson and negative binomial distributions are commonly used to represent its
distribution (Cameron and Trivedi, 1986).
Graphical examination of count data may be aided by the use of data transformations chosen
to have the property of stabilising the sample variance. In particular, the square root
transformation might be used when data can be approximated by a Poisson
distribution (although other transformation have modestly improved properties), while an
inverse sine transformation is available when a binomial distribution is preferred(Hilbe,
2011b).

2
2. STATISTICAL TESTS TO ANALYZE COUNT DATA
2.1 Poisson Regression
The Poisson distribution can form the basis for some analyses of count data and in this
case Poisson regression may be used. This is a special case of the class of generalized linear
models which also contains specific forms of model capable of using the binomial
distribution (binomial regression, logistic regression) or the negative binomial distribution
where the assumptions of the Poisson model are violated, in particular when the range of
count values is limited or when overdispersion is present(Hilbe, 2011a).
A key feature of the Poisson model is the equality of the mean and variance functions. When
the variance of a Poisson model exceeds its mean, the model is termed overdispersed.
Simulation studies have demonstrated that overdispersion is indicated when the Pearson
χ2dispersion is greater than 1.0. The dispersion statistic is defined as the Pearson χ2 divided
by the model residual degrees of freedom. Overdispersion, common to most Poisson models,
biases the parameter estimates and fitted values. When Poisson overdispersion is real, and
not merely apparent, a count model other than Poisson is required(Hilbe, 2008).
Poisson regression is the basic model from which a variety of count models are based. It is
derived from the Poisson probability mass function. The Poisson regression model is the
benchmark model for count data in much the same way as the normal linear model is the
benchmark for real-valued continuous data(Cameron and Trivedi, 1986).
The Poisson model is simple, and it is robust. If the only interest of the analysis lies in
estimating the parameters of a log-linear mean function, there is hardly any reason (except
for efficiency) to ever contemplate anything other than the Poisson regression model. In
fact, its applicability extends well beyond the traditional domain of count data. The
Poisson regression model can be used for any constant elasticity mean function, whether
the dependent variable is a count, and there are good reasons why it should be preferred
over the more common log transformation of the dependent variable. In fact, its
applicability extends well beyond the traditional domain of count data. And yet, there are
instances where the Poisson regression model is unsuited. Essentially, the Poisson model is

3
always overly restrictive when it comes to estimating features of the population other than
the mean, such as the variance or the probability of single outcomes.
The Poisson distribution has a positive mean µ. Although a GLM can model a positive mean
using the identity link, it is more common to model the log of the mean. Like the linear
predictor α+βx, the log mean can take any real value. The log mean is the natural parameter
for the Poisson distribution, and the log link is the canonical link for a Poisson GLM. A
Poisson loglinear GLM assumes a Poisson distribution for Y and uses the log link. The
Poisson loglinear model with explanatory variable X is logµ=α+βx. For this model, the mean
satisfies the exponential relationship µ=exp(α+βx)=eα(eβ)x. A one unit increase in x has a
multiplicative impact of eβ on µ. The mean at x+1equals the mean at x multiplied by eβ.(Re)
.
In some contexts, the Poisson distribution describes the number of events that occur in a
given time period where its mean µ is the average number of events per period. It has the
unusual feature that its mean equals its variance. Its probability density function is Pr(Y = y )
= e-µµy/y!, y=0,1,2,. . .where e is the base of the natural logarithms and y ! is the factorial of
y . The skewness of the Poisson distribution is (1/µ) and the kurtosis is (3 + 1/µ), so that for
large µ, the distribution approaches the Normal N (µ,µ) with skewness of zero and kurtosis
of three (Christopher,2010)
2.2 Negative Binomial Regression
A limitation of the Poisson distribution is the equality of its mean and variance. It may often
observe count data processes where this equality is not reasonable: in particular, where the
conditional variance is larger than the conditional mean. This is termed overdispersion, and its
presence renders the assumption of a Poisson distribution for the error process untenable. It is
particularly likely to occur in the case of unobserved heterogeneity. In this circumstance, a
reasonable alternative is negative binomial regression. The negative binomial is a conjugate
mixture distribution for count data. The negative binomial (NB) distribution is a two-parameter
distribution. For positive integer n, it is the distribution of the number of failures that occur in a
sequence of trials before n successes have occurred, where the probability of success in each trial
is p. The distribution is defined for any positive n. The negative binomial distribution is a

4
mixture of the Poisson distribution and the Gamma distribution, or generalized factorial function.
Unlike the Poisson, which is fully characterized by its mean µ, the NB distribution is a function
of both µ and α . Its mean is still µ, but its conditional variance is µ(1 +α). As evident, as α=0,
the distribution becomes the Poisson distribution(Christopher, 2010)
2.3 Zero Inflated Regression
In many studies count data may possess excess amount of zeros. If data consist of non-
negative, highly skewed sequence counts with a large proportion of zeros. Zero-Inflated
Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB) Models and Hurdle models are
useful for analysing of such data. Zero counts may not occur in the same process as other
positive counts. Zero-inflated count data may not have equality of mean and variance. In
such case over-dispersion (or under-dispersion) need to be taken into account. (Lambert,
1992)

5
3. ANALYSIS OF A COUNT DATA
3.1. Source of Data and Its Description .
The data was normally my DVM thesis. The data was on East Africa Journal of veterinary and
Animal science 03 gallery proof. walkite et al., 2017. The title of the the data or the research is
“Detection of Anthelmintic Resistance in Gastrointestinal Nematodes of Small Ruminants in
Haramaya University Farms”. The sheep and goats infected with gastrointestinal nematodes
were selected and I took 30 goats and 30 sheep. The goats and sheep were grouped into
Albendazole group(10), Ivermectin group(10) and the control(10). The egg was counted
before treatment and after treatment in treated group and again the egg was also counted twice
in control group in parallel to treated groups. The change of egg count was taken from treated
groups and the second egg count was taken in control group for these assignment(Walkite et
al., 2017).
Table 1:- The raw data of the Assignment
No, ID age species sex treatment EPG
1 1546 >3yrs goat male Albendazole 1050
2 1595 <-1yrs goat male Albendazole 2500
3 1612
2yrs-
3yrs goat male Albendazole 2800
5 1576
2yrs-
3yrs goat male Albendazole 9050
8 1608 <-1yrs goat female Albendazole 650
9 1526
2yrs-
3yrs goat female Albendazole 1850
10 1605 <-1yrs goat female Albendazole 2350
11 63
2yrs-
3yrs goat female Ivermectin 400
12 42
2yrs-

6
13 110
2yrs-
3yrs goat male Ivermectin 5750
14 111
2yrs-
15 28
2yrs-
16 1425
2yrs-
17 80
2yrs-
18 96
2yrs-
19 72
2yrs-
20 87
2yrs-
21 1536 >3yrs goat female control 2550
23 1580 >3yrs goat male control 2250
31 106 <-1yrs sheep male Albendazole 300
32 13 2yrs-3yrs sheep female Albendazole 2050
35 95 >1yrs sheep male Albendazole 5100
36 190 <-1yrs sheep male Albendazole 250
38 89 >3yrs sheep female Albendazole 1150
40 187 >3yrs sheep female Albendazole 2100
41 109 2yrs-3yrs sheep male Ivermectin 1100

7
42 5 2yrs-3yrs sheep female Ivermectin 350
43 110 >3yrs sheep male Ivermectin 500
44 168 >1yrs sheep female Ivermectin 1200
45 120 >yrs sheep male Ivermectin 2350
46 20 2yrs-3yrs sheep male Ivermectin 300
47 83 1yrs sheep male Ivermectin 1850
50 909 >3yrs sheep male Ivermectin 800
51 6 >3yrs sheep female control 1350
53 11 2yrs-3yrs sheep female control 350
54 86 2yrs-3yrs sheep male control 1200

8
3.2. Types of Variables of the Data
The EPG is the count response variables and sex,species, age and treatment are the
explanatory variables.
3.3. Poisson RegressionAnalysis and Its Interpretation
attach(walkite_Assignment_)
names(walkite_Assignment_)
View(walkite_Assignment_)
nematode<-glm(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),family =
"poisson",data = walkite_Assignment_)
nematode
summary(nematode)
coef <- coefficients(nematode)
coef
IRR <- exp(coefficients(nematode))
IRR
# predicted values and residual error
pred <- predict(nematode, type="response") # estimate predicted values
pred
res <- residuals(nematode, type="deviance") # estimate residuals
res
qqnorm(res, plot.it = TRUE)
qqline(res)
#Evaluating the fitness of Poisson regression models
?pchisq
pchisq(nematode$deviance,df=nematode$df.residual,lower.tail = FALSE)
library(AER)
dispersion <- dispersiontest(nematode,trafo=1)

9
dispersion
###################################################
library(readxl)
> walkite_Assignment_ <- read_excel("~/walkite Assignment .xlsx")
> View(walkite_Assignment_)
> attach(walkite_Assignment_)
The following object is masked _by_ .GlobalEnv:
age
The following objects are masked from walkite_Assignment_ (pos = 3):
age, EPG, ID, no,, sex, species, treatment
> names(walkite_Assignment_)
[1] "no," "ID" "age" "species" "sex" "treatment"
[7] "EPG"
> View(walkite_Assignment_)
>nematode<-glm(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),family =
> nematode
Call: glm(formula = EPG ~ factor(age) + factor(species) + factor(sex) +

10
factor(treatment), family = "poisson", data = walkite_Assignment_)
Coefficients:
(Intercept) factor(age)>3yrs
7.452148 0.005106
factor(age)2yrs-3yrs factor(species)sheep
0.308401 -0.500520
factor(sex)male factor(treatment)control
0.393118 -0.356036
factor(treatment)Ivermectin
-0.307651
Degrees of Freedom: 59 Total (i.e. Null); 53 Residual
Null Deviance: 64280
Residual Deviance: 49460 AIC: 50000
> summary(nematode)
Call:
glm(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
factor(treatment), family = "poisson", data = walkite_Assignment_)
Deviance Residuals:
Min 1Q Median 3Q Max
-41.835 -28.155 -6.764 14.689 78.557
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.452148 0.009425 790.677 <2e-16 ***
factor(age)>3yrs 0.005106 0.010897 0.469 0.639
factor(age)2yrs-3yrs 0.308401 0.009429 32.708 <2e-16 ***
factor(species)sheep -0.500520 0.006708 -74.620 <2e-16 ***

11
factor(sex)male 0.393118 0.006936 56.681 <2e-16 ***
factor(treatment)control -0.356036 0.009612 -37.039 <2e-16 ***
factor(treatment)Ivermectin -0.307651 0.008409 -36.586 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 64280 on 59 degrees of freedom
Residual deviance: 49456 on 53 degrees of freedom
AIC: 50005
Number of Fisher Scoring iterations: 5
> coef <- coefficients(nematode)
> coef
7.452147624 0.005106036
0.308400816 -0.500519597
0.393117562 -0.356036156
-0.307651001
> IRR <- exp(coefficients(nematode))
> IRR
1723.5607335 1.0051191
1.3612465 0.6062156

12
1.4815926 0.7004473
0.7351718
> pred <- predict(nematode, type="response") # estimate predicted values
> pred
1 2 3 4 5 6 7
2566.6869 2553.6148 3476.0991 2553.6148 3476.0991 2553.6148 2553.6148
8 9 10 11 12 13 14
1723.5607 2346.1910 1723.5607 1724.8536 1724.8536 2555.5302 2555.5302
15 16 17 18 19 20 21
2555.5302 2555.5302 2555.5302 2555.5302 1724.8536 1724.8536 1213.4435
22 23 24 25 26 27 28
1213.4435 1797.8289 1797.8289 1797.8289 1797.8289 1797.8289 1797.8289
29 30 31 32 33 34 35
1213.4435 1213.4435 1548.0411 1422.2976 1422.2976 1422.2976 1548.0411
36 37 38 39 40 41 42
1548.0411 1555.9656 1050.1981 1555.9656 1050.1981 1549.2023 1045.6331
43 44 45 46 47 48 49
1143.9021 768.1439 1138.0762 1549.2023 1138.0762 1045.6331 1045.6331
50 51 52 53 54 55 56
1143.9021 735.6084 735.6084 996.2445 1476.0284 996.2445 735.6084
57 58 59 60
1476.0284 1476.0284 996.2445 735.6084
> res <- residuals(nematode, type="deviance") # estimate residuals
> res
1 2 3 4 5 6
-34.0049962 -1.0647241 -11.8729495 -23.7904413 78.5573439 -5.1054781
7 8 9 10 11 12
-33.7774785 -29.6545718 -10.6411543 14.2908883 -38.4780564 33.6401126
13 14 15 16 17 18
54.1929143 41.1171517 -15.7910551 -32.5049447 -7.2062497 -33.8108615

13
19 20 21 22 23 24
-40.4132669 -5.5385905 33.3812205 10.5744782 10.2584010 -41.8351111
25 26 27 28 29 30
21.8329977 34.5404042 -29.5821101 -32.8446551 -6.3216015 -16.0217059
31 32 33 34 35 36
-38.8780694 15.6018159 -32.0896195 -40.7419996 71.1128294 -41.0419275
37 38 39 40 41 42
-0.1513335 3.0327221 -1.4274364 28.4749380 -12.0440289 -25.0030992
43 44 45 46 47 48
-21.4525464 14.3849662 31.3689294 -38.9021433 19.3334906 28.6354384
49 50 51 52 53 54
-4.6148356 -10.7546288 20.2621346 14.1032342 -23.6695434 -7.4280964
55 56 57 58 59 60
10.6268694 -26.3476376 -27.6793360 -7.4280964 10.6268694 20.2621346
> qqnorm(res, plot.it = TRUE)
> qqline(res)
*The normal quartile plot indicates that the error is not normally distributed
?pchisq
> pchisq(nematode$deviance,df=nematode$df.residual,lower.tail = FALSE)

14
[1] 0
Interpretation: In this result the p-value zero (0) which indicates it is significant, indicating the lack of
fit of the data. The significance of the p-value in this result shows that there is presence of
overdispersion and it reveals that the poisson model data does not fit the data
> library(AER)
> dispersion <- dispersiontest(nematode,trafo=1)
> dispersion
Overdispersion test
data: nematode
z = 4.2675, p-value = 9.884e-06
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
871.0029
The result of overdispersion test indicate there is evidence of overdispersion (c is estimated to be
872.046) which speaks quite strong against the assumption of equidispersion that means when c=0. So
generally since almost all assumption were violated or the goodness of fit of the Poisson model indicates
that the model is not fit. The ‘dispersiontest’indicate the data to be over dispersed. The normal quartile
plot also indicates that the error is not normally distributed. Thus, it is better to look for
Negative Binomial Regression.
3.4. Negative Binomial RegressionAnalysis and Its Interpretation
#Negative Binomial regression
library(MASS)
NBREG<-glm.nb(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),data =

15
walkite_Assignment_)
NBREG
summary(NBREG)
#####Checking the model assumption
library(lmtest)
lrtest(nematode,NBREG)
coef <- coefficients(NBREG)
coef
IRR <- exp(coefficients(NBREG))
IRR
# predicted values and residual error
pred <- predict(NBREG, type="response") # estimate predicted values
pred
res <- residuals(NBREG, type="deviance") # estimate residuals
res
qqnorm(res, plot.it = TRUE)
qqline(res)
################################
> library(MASS)
>NBREG<-glm.nb(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),data =
> NBREG
Call: glm.nb(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
factor(treatment), data = walkite_Assignment_, init.theta = 1.923394949,
link = log)
Coefficients:
7.59952 -0.08562

16
0.06591 -0.48364
0.29248 -0.29743
-0.21070
Degrees of Freedom: 59 Total (i.e. Null); 53 Residual
Null Deviance: 79.85
Residual Deviance: 65.06 AIC: 1005
> summary(NBREG)
Call:
glm.nb(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
factor(treatment), data = walkite_Assignment_, init.theta = 1.923394949,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9960 -1.0922 -0.1551 0.4745 1.9744
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.59952 0.28454 26.708 <2e-16 ***
factor(age)>3yrs -0.08562 0.30986 -0.276 0.7823
factor(age)2yrs-3yrs 0.06591 0.28140 0.234 0.8148
factor(species)sheep -0.48364 0.18917 -2.557 0.0106 *
factor(sex)male 0.29248 0.19578 1.494 0.1352
factor(treatment)control -0.29743 0.26745 -1.112 0.2661
factor(treatment)Ivermectin -0.21070 0.24985 -0.843 0.3991
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

17
(Dispersion parameter for Negative Binomial(1.9234) family taken to be 1)
Null deviance: 79.853 on 59 degrees of freedom
Residual deviance: 65.056 on 53 degrees of freedom
AIC: 1005.4
Number of Fisher Scoring iterations: 1
Theta: 1.923
Std. Err.: 0.326
2 x log-likelihood: -989.360
> library(lmtest)
>nematode<-glm(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),family =
>NBREG<-glm.nb(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),data =
> lrtest(nematode,NBREG)
Likelihood ratio test
Model 1: EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment)
Model 2: EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment)
#Df LogLik Df Chisq Pr(>Chisq)
1 7 -24995.3
2 8 -494.7 1 49001 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpretation: In this model checking the associated chi-squared value estimated from

18
2*(logLik(nematode) – logLik(NBREG)) is 49001 with one degree of freedom and the p-value is
less than the significance level. This strongly suggests that the negative binomial model,
estimating the dispersion parameter, is more appropriate than the Poisson model.
> coef <- coefficients(NBREG)
> coef
7.59951872 -0.08562393
0.06591159 -0.48363536
0.29247990 -0.29743050
-0.21070237
> IRR <- exp(coefficients(NBREG))
> IRR
1997.2344396 0.9179394
1.0681323 0.6165380
1.3397458 0.7427242
0.8100151
> pred <- predict(NBREG, type="response") # estimate predicted values
> pred
1 2 3 4 5 6 7
2456.2098 2675.7865 2858.0939 2675.7865 2858.0939 2675.7865 2675.7865
8 9 10 11 12 13 14
1997.2344 2133.3106 1997.2344 1728.0138 1728.0138 2315.0993 2315.0993
15 16 17 18 19 20 21
2315.0993 2315.0993 2315.0993 2315.0993 1728.0138 1728.0138 1361.6661

19
22 23 24 25 26 27 28
1361.6661 1824.2864 1824.2864 1824.2864 1824.2864 1824.2864 1824.2864
29 30 31 32 33 34 35
1361.6661 1361.6661 1649.7240 1315.2670 1315.2670 1315.2670 1649.7240
36 37 38 39 40 41 42
1649.7240 1514.3466 1130.3238 1514.3466 1130.3238 1427.3466 1065.3861
43 44 45 46 47 48 49
1226.6436 997.4290 1336.3013 1427.3466 1336.3013 1065.3861 1065.3861
50 51 52 53 54 55 56
1226.6436 839.5189 839.5189 976.8806 1308.7717 976.8806 839.5189
57 58 59 60
1308.7717 1308.7717 976.8806 839.5189
> res <- residuals(NBREG, type="deviance") # estimate residuals
> res
1 2 3 4 5 6
-1.03229229 -0.09315152 -0.02837324 -0.77077140 1.97435995 -0.20463991
7 8 9 10 11 12
-1.12245881 -1.31177836 -0.19293868 0.23175489 -1.63311996 1.00491029
13 14 15 16 17 18
1.48540908 1.18739110 -0.33482919 -0.91794012 -0.07009783 -0.96869028
19 20 21 22 23 24
-1.75162396 -0.19161862 0.97087411 0.22971859 0.30126846 -1.79873459
25 26 27 28 29 30
0.63951789 0.98797747 -1.14541053 -1.30127519 -0.40687757 -0.83012023
31 32 33 34 35 36
-1.84436364 0.66416970 -1.37752773 -1.99254932 1.92366710 -1.99600958
37 38 39 40 41 42
0.03237883 0.02398299 -0.01317219 0.95704739 -0.34600465 -1.30165058
43 44 45 46 47 48
-1.08206029 0.26433726 0.86351808 -1.71879153 0.47665214 1.05999520
49 50 51 52 53 54

20
-0.22734653 -0.55289232 0.71465995 0.46009582 -1.21473337 -0.11852987
55 56 57 58 59 60
0.47377501 -1.85712317 -1.04996037 -0.11852987 0.47377501 0.71465995
> qqnorm(res, plot.it = TRUE)
> qqline(res)
>
*The normal quartile plot indicates that the error is almost normally distributed. Thus the
negative binomial regression fit the data.
IT’S INTERPRETATION
The interpretation should be based on negative binomial regression analysis because the poission
model does not fit the Data. In the above Negative binomial regression analysis ‘Albendazole’
from –treatment-, ‘female’ from –sex- and ‘<-1yrs’ from -age, `goat` from species were used
as references. Sex and age have statistically nonsignificant impact on EPG count and control
group has nonsignificant effect on EPG count. Species has significant impact on EPG count. The
reduction factor caused by Ivermectin drug is (exp(-0.21070)-1)*100= -18.998. Even if there is
reduction in EPG count the Ivermectin drug has nonsignificant impact on EPG count because the
p-value for Ivermectin is 0.3991. Hence this indicates that there is resistance of the parasite or
the efficacy of the drug is not good. Generally; since the control group (p-value=0.2661) has

21
nonsignifant effect on EPG count, in both Albendazole and Ivermectin resistance of parasite
were detected.
4. REFERENCES
Cameron, A.C., Trivedi, P.K., 1986. Econometric models based on count data. Comparisons and
applications of some estimators and tests. Journal of applied econometrics 1, 29-53.
Cameron, A.C., Trivedi, P.K., 2013. Regression analysis of count data. Cambridge university
press.
Christopher, B., 2010. Models for Count Data and Categorical Response Data.
Hilbe, J.M., 2008. Brief overview on interpreting count model risk ratios: An addendum to
negative binomial regression. Cambridge University Press Cambridge.
Hilbe, J.M., 2011a. Modeling count data. International Encyclopedia of Statistical Science.
Springer, pp. 836-839.
Hilbe, J.M., 2011b. Negative binomial regression. Cambridge University Press.
Lambert, D., 1992. Zero-inflated Poisson regression, with an application to defects in
manufacturing. Technometrics 34, 1-14.
Walkite, F., Negesse, M., Anwar, H., 2017. Detection of Antihelmintic resistance in
gastrointestinal nematode parasite in small ruminant in Haramaya university farms. pp. 13-19.

Count data analysis

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Count data analysis

Similar to Count data analysis (20)

Recently uploaded

Recently uploaded (20)

Count data analysis