ADDIS ABABA UNIVERSITY
COLLEGE OF VETERINARY MEDICINE AND AGRICULTURE
Assignment for the course “Advanced Biostatistics” on ANALYSIS OF COUNT DATA
By Walkite Furgasa Chala (DVM), ID No. GSR/2792/10
Submitted to:
Samson Leta (DVM, MSc, Assistant Professor)
December, 2017
Bishoftu, Ethiopia
Table of Contents
LIST OF TABLE
LIST OF FIGURES
LIST OF ABBREVIATIONS
SUMMARY
1. INTRODUCTION
2. STATISTICAL TESTS TO ANALYZE COUNT DATA
2.1 Poisson Regression
2.2 Negative Binomial Regression
2.3 Zero Inflated Regression
3. ANALYSIS OF A COUNT DATA
3.1. Source of Data
3.2. Types of Variables of the Data
3.3. Poisson Regression Analysis and Its Interpretation
3.4. Negative Binomial Regression Analysis and Its Interpretation
4. REFERENCES
LIST OF TABLE
Table 1: Raw data of the Assignment
LIST OF FIGURES
Figure 1. Q-Q plot of Poisson regression analysis
Figure 2. Q-Q plot of negative binomial regression analysis
LIST OF ABBREVIATIONS
AIC Akaike Information Criterion
EPG Eggs per gram of faeces
GLM Generalized linear model
IRR Incidence rate ratio
NBREG Negative binomial regression model
ZINB Zero-inflated negative binomial model
ZIP Zero-inflated Poisson model
SUMMARY
In statistics, count data are a data type in which the observations can take only non-negative
integer values. Count models are a subset of discrete response regression models; count data
are intrinsically heteroskedastic, right skewed, and have a variance that increases with the
mean. An individual piece of count data is often termed a count variable. When such a variable
is treated as a random variable, the Poisson and negative binomial distributions are commonly
used to represent its distribution, and zero-inflated regression is used when there are excess
zeros. The objective of this assignment was to describe and analyze a count data set using R
software. The data come from the study “Detection of Anthelmintic Resistance in
Gastrointestinal Nematodes of Small Ruminants in Haramaya University Farms”. Sheep and goats
infected with gastrointestinal nematodes were selected: 30 goats and 30 sheep. Within each
species the animals were grouped into an Albendazole group (10), an Ivermectin group (10) and
a control group (10). Eggs were counted before and after treatment in the treated groups, and
twice in the control group in parallel with the treated groups. For this assignment, the change
in egg count was used for the treated groups and the second egg count for the control group.
The data were analyzed in R with Poisson and negative binomial regression models. The Poisson
model did not fit the data: the overdispersion test gave strong evidence of overdispersion
(alpha estimated at 871.0029, where equidispersion corresponds to alpha = 0), the deviance
goodness-of-fit test computed with pchisq gave a significant p-value (reported as 0), and the
normal quantile plot showed that the deviance residuals are not normally distributed. The
negative binomial regression model fitted the data, so the results were interpreted on the
basis of the negative binomial regression.
Keywords: Analysis, count data
1. INTRODUCTION
In statistics, count data are a data type in which the observations can take only the
non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting.
The statistical treatment of count data is distinct from that of binary data, in which the
observations can take only two values, usually represented by 0 and 1, and from that of ordinal
data, which may also consist of integers but where the individual values fall on an arbitrary
scale and only the relative ranking is important (Cameron and Trivedi, 2013).
Count models are a subset of discrete response regression models. Count data are
distributed as non-negative integers, are right skewed, and have a variance that increases
with the mean. Examples of count data include the length of hospital stay, the number of
fish of a certain species per defined area of the ocean, the number of lights displayed by
fireflies over specified time periods, the number of deaths, and the number of thunderstorms
in a calendar year. An individual piece of count data is often termed a count variable. When
such a variable is treated as a random variable, the Poisson and negative binomial
distributions are commonly used to represent its distribution (Cameron and Trivedi, 1986).
Graphical examination of count data may be aided by data transformations chosen to stabilise
the sample variance. In particular, the square root transformation might be used when the
data can be approximated by a Poisson distribution (although other transformations have
modestly improved properties), while an inverse sine transformation is available when a
binomial distribution is preferred (Hilbe, 2011b).
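The variance-stabilising effect of the square root transformation can be illustrated with a small simulation. This is only a sketch on simulated Poisson counts, not on the assignment data; the means, the sample size and the seed are arbitrary assumptions chosen for illustration.

```r
# Simulated illustration (not the assignment data): the square-root
# transformation roughly stabilises the variance of Poisson counts.
set.seed(1)                        # arbitrary seed for reproducibility
means <- c(20, 80, 320)            # hypothetical Poisson means

# Variance of the raw counts grows with the mean ...
raw_var  <- sapply(means, function(m) var(rpois(1e5, m)))
# ... while the variance of sqrt(count) stays near 1/4 for every mean.
sqrt_var <- sapply(means, function(m) var(sqrt(rpois(1e5, m))))

round(rbind(mean = means, raw_var = raw_var, sqrt_var = sqrt_var), 3)
```

On the raw scale the variance tracks the mean, whereas on the square root scale it stays close to 0.25, which is why plots of square-root counts are easier to compare across groups.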
2. STATISTICAL TESTS TO ANALYZE COUNT DATA
2.1 Poisson Regression
The Poisson distribution can form the basis for some analyses of count data, and in this
case Poisson regression may be used. It is a special case of the class of generalized linear
models, which also contains models based on the binomial distribution (binomial regression,
logistic regression) and on the negative binomial distribution. These alternatives are used
where the assumptions of the Poisson model are violated, in particular when the range of
count values is limited or when overdispersion is present (Hilbe, 2011a).
A key feature of the Poisson model is the equality of the mean and variance functions. When
the variance of a Poisson model exceeds its mean, the model is termed overdispersed.
Simulation studies have demonstrated that overdispersion is indicated when the Pearson χ²
dispersion statistic is greater than 1.0. The dispersion statistic is defined as the Pearson
χ² divided by the model residual degrees of freedom. Overdispersion, common to most Poisson
models, biases the parameter estimates and fitted values. When Poisson overdispersion is real,
and not merely apparent, a count model other than Poisson is required (Hilbe, 2008).
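The Pearson dispersion statistic described above can be computed directly from a fitted Poisson GLM. The following is a minimal sketch on simulated data; the helper name pearson_dispersion and all simulated values are assumptions made for illustration and are not part of the assignment analysis.

```r
# Simulated example (not the assignment data): Pearson dispersion statistic
set.seed(42)                                   # arbitrary seed
x  <- rnorm(500)
mu <- exp(1 + 0.5 * x)                         # hypothetical log-linear mean
y_pois <- rpois(500, mu)                       # equidispersed counts
y_nb   <- rnbinom(500, mu = mu, size = 1)      # overdispersed counts

# Pearson chi-square divided by the residual degrees of freedom
pearson_dispersion <- function(model) {
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}

fit_pois <- glm(y_pois ~ x, family = poisson)
fit_nb   <- glm(y_nb ~ x, family = poisson)

pearson_dispersion(fit_pois)   # expected to be close to 1
pearson_dispersion(fit_nb)     # expected to be well above 1
```

A value near 1 is consistent with the Poisson assumption, while a value well above 1 points to overdispersion.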
Poisson regression is the basic model on which a variety of count models are based. It is
derived from the Poisson probability mass function. The Poisson regression model is the
benchmark model for count data in much the same way as the normal linear model is the
benchmark for real-valued continuous data (Cameron and Trivedi, 1986).
The Poisson model is simple, and it is robust. If the only interest of the analysis lies in
estimating the parameters of a log-linear mean function, there is hardly any reason (except
for efficiency) to ever contemplate anything other than the Poisson regression model. In
fact, its applicability extends well beyond the traditional domain of count data. The
Poisson regression model can be used for any constant elasticity mean function, whether or
not the dependent variable is a count, and there are good reasons why it should be preferred
over the more common log transformation of the dependent variable. And yet, there are
instances where the Poisson regression model is unsuited. Essentially, the Poisson model is
always overly restrictive when it comes to estimating features of the population other than
the mean, such as the variance or the probability of single outcomes.
The Poisson distribution has a positive mean µ. Although a GLM can model a positive mean
using the identity link, it is more common to model the log of the mean. Like the linear
predictor α + βx, the log mean can take any real value. The log mean is the natural parameter
for the Poisson distribution, and the log link is the canonical link for a Poisson GLM. A
Poisson loglinear GLM assumes a Poisson distribution for Y and uses the log link. The
Poisson loglinear model with explanatory variable X is $\log \mu = \alpha + \beta x$. For this
model, the mean satisfies the exponential relationship
$\mu = \exp(\alpha + \beta x) = e^{\alpha}(e^{\beta})^{x}$. A one-unit increase in x has a
multiplicative impact of $e^{\beta}$ on µ: the mean at x + 1 equals the mean at x multiplied
by $e^{\beta}$.
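The multiplicative effect follows in one step from the loglinear form. The numerical value β = 0.1 used below is a hypothetical illustration only.

```latex
\frac{\mu(x+1)}{\mu(x)}
  = \frac{\exp\{\alpha + \beta (x+1)\}}{\exp\{\alpha + \beta x\}}
  = e^{\beta}
% Hypothetical example: \beta = 0.1 gives e^{0.1} \approx 1.105,
% i.e. the mean increases by about 10.5\% for each one-unit increase in x.
```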
In some contexts, the Poisson distribution describes the number of events that occur in a
given time period, where its mean µ is the average number of events per period. It has the
unusual feature that its mean equals its variance. Its probability mass function is
$\Pr(Y = y) = e^{-\mu}\mu^{y}/y!$, $y = 0, 1, 2, \ldots$, where e is the base of the natural
logarithms and y! is the factorial of y. The skewness of the Poisson distribution is
$1/\sqrt{\mu}$ and the kurtosis is $3 + 1/\mu$, so that for large µ the distribution
approaches the Normal N(µ, µ), with skewness of zero and kurtosis of three (Christopher, 2010).
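The probability function and the equality of mean and variance can be checked numerically in R. This is a small sketch; the value µ = 3 is a hypothetical choice made only for illustration.

```r
mu <- 3                                    # hypothetical Poisson mean
y  <- 0:5

# Pr(Y = y) written out from the formula, compared with R's dpois()
manual <- exp(-mu) * mu^y / factorial(y)
all.equal(manual, dpois(y, mu))

# Mean and variance computed from the probabilities over a wide support
support <- 0:200
p <- dpois(support, mu)
m <- sum(support * p)
v <- sum((support - m)^2 * p)
c(mean = m, variance = v)                  # both equal mu, up to rounding
```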
2.2 Negative Binomial Regression
A limitation of the Poisson distribution is the equality of its mean and variance. One often
observes count data processes where this equality is not reasonable, in particular where the
conditional variance is larger than the conditional mean. This is termed overdispersion, and its
presence renders the assumption of a Poisson distribution for the error process untenable. It is
particularly likely to occur in the case of unobserved heterogeneity. In this circumstance, a
reasonable alternative is negative binomial regression. The negative binomial is a conjugate
mixture distribution for count data. The negative binomial (NB) distribution is a two-parameter
distribution. For positive integer n, it is the distribution of the number of failures that occur
in a sequence of trials before n successes have occurred, where the probability of success in
each trial is p. The distribution is defined for any positive n. The negative binomial
distribution is a mixture of the Poisson distribution and the Gamma distribution.
Unlike the Poisson, which is fully characterized by its mean µ, the NB distribution is a function
of both µ and α. Its mean is still µ, but its conditional variance is µ(1 + αµ). As α
approaches 0, the distribution becomes the Poisson distribution (Christopher, 2010). In the
parameterization used by glm.nb in the MASS package, the reported theta equals 1/α.
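The Poisson-Gamma mixture can be sketched by simulation. The code below assumes the NB2 parameterization used by rnbinom(mu =, size =) and by glm.nb, in which the variance is µ + µ²/θ with θ = 1/α; the values of µ and θ, the sample size and the seed are hypothetical.

```r
set.seed(7)                      # arbitrary seed
n     <- 2e5
mu    <- 4                       # hypothetical mean
theta <- 2                       # hypothetical NB shape; alpha = 1 / theta

# Step 1: draw a Gamma-distributed rate with mean mu for each observation
lambda <- rgamma(n, shape = theta, rate = theta / mu)
# Step 2: draw a Poisson count given that rate (Poisson-Gamma mixture)
y_mix <- rpois(n, lambda)
# Direct negative binomial draws for comparison
y_nb  <- rnbinom(n, mu = mu, size = theta)

c(mean_mix   = mean(y_mix),
  var_mix    = var(y_mix),
  var_nb     = var(y_nb),
  var_theory = mu + mu^2 / theta)
```

Both simulated variances should lie close to the theoretical value and well above the mean, which is the overdispersion that the Poisson model cannot represent.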
2.3 Zero Inflated Regression
In many studies count data contain an excess of zeros. When the data consist of
non-negative, highly skewed counts with a large proportion of zeros, Zero-Inflated
Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB) and hurdle models are
useful for analysing them. Zero counts may not arise from the same process as the
positive counts. Zero-inflated count data may not have equal mean and variance; in
such cases over-dispersion (or under-dispersion) needs to be taken into account (Lambert,
1992).
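The idea behind the zero-inflated Poisson model can be written as a short probability function. This is a sketch only: the function name dzip and the values of mu and pi0 (the probability of a structural zero) are hypothetical, and no zero-inflated model was fitted in this assignment.

```r
# Zero-inflated Poisson pmf: a point mass at zero mixed with a Poisson count.
dzip <- function(y, mu, pi0) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dpois(0, mu),   # structural or sampling zero
         (1 - pi0) * dpois(y, mu))         # positive counts
}

mu  <- 2      # hypothetical Poisson mean of the count process
pi0 <- 0.3    # hypothetical probability of a structural zero

p        <- dzip(0:100, mu, pi0)
zip_mean <- sum(0:100 * p)
zip_var  <- sum((0:100 - zip_mean)^2 * p)

c(p_zero_zip  = dzip(0, mu, pi0),
  p_zero_pois = dpois(0, mu),
  mean        = zip_mean,
  variance    = zip_var)
```

The zero probability is higher than under a plain Poisson with the same mu, and the variance exceeds the mean, so excess zeros on their own already produce overdispersion.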
3. ANALYSIS OF A COUNT DATA
3.1. Source of Data and Its Description
The data originate from my DVM thesis, which appeared as a galley proof in the East Africa
Journal of Veterinary and Animal Science (Walkite et al., 2017). The title of the research is
“Detection of Anthelmintic Resistance in Gastrointestinal Nematodes of Small Ruminants in
Haramaya University Farms”. Sheep and goats infected with gastrointestinal nematodes were
selected: 30 goats and 30 sheep. Within each species the animals were grouped into an
Albendazole group (10), an Ivermectin group (10) and a control group (10). Eggs were counted
before and after treatment in the treated groups, and twice in the control group in parallel
with the treated groups. For this assignment, the change in egg count was used for the treated
groups and the second egg count for the control group (Walkite et al., 2017).
Table 1: The raw data of the Assignment
No.  ID    Age        Species  Sex     Treatment    EPG
1    1546  >3yrs      goat     male    Albendazole  1050
2    1595  <-1yrs     goat     male    Albendazole  2500
3    1612  2yrs-3yrs  goat     male    Albendazole  2800
4    1599  <-1yrs     goat     male    Albendazole  1450
5    1576  2yrs-3yrs  goat     male    Albendazole  9050
6    1593  <-1yrs     goat     male    Albendazole  2300
7    1609  <-1yrs     goat     male    Albendazole  1050
8    1608  <-1yrs     goat     female  Albendazole  650
9    1526  2yrs-3yrs  goat     female  Albendazole  1850
10   1605  <-1yrs     goat     female  Albendazole  2350
11   63    2yrs-3yrs  goat     female  Ivermectin   400
12   42    2yrs-3yrs  goat     female  Ivermectin   3300
13   110   2yrs-3yrs  goat     male    Ivermectin   5750
14   111   2yrs-3yrs  goat     male    Ivermectin   4900
15   28    2yrs-3yrs  goat     male    Ivermectin   1800
16   1425  2yrs-3yrs  goat     male    Ivermectin   1100
17   80    2yrs-3yrs  goat     male    Ivermectin   2200
18   96    2yrs-3yrs  goat     male    Ivermectin   1050
19   72    2yrs-3yrs  goat     female  Ivermectin   350
20   87    2yrs-3yrs  goat     female  Ivermectin   1500
21   1536  >3yrs      goat     female  control      2550
22   1543  >3yrs      goat     female  control      1600
23   1580  >3yrs      goat     male    control      2250
24   13    >3yrs      goat     male    control      350
25   68    >3yrs      goat     male    control      2800
26   6     >3yrs      goat     male    control      3450
27   5     >3yrs      goat     male    control      700
28   21    >3yrs      goat     male    control      600
29   31    >3yrs      goat     female  control      1000
30   259   >3yrs      goat     female  control      700
31   106   <-1yrs     sheep    male    Albendazole  300
32   13    2yrs-3yrs  sheep    female  Albendazole  2050
33   237   2yrs-3yrs  sheep    female  Albendazole  400
34   42    2yrs-3yrs  sheep    female  Albendazole  200
35   95    >1yrs      sheep    male    Albendazole  5100
36   190   <-1yrs     sheep    male    Albendazole  250
37   148   >3yrs      sheep    male    Albendazole  1550
38   89    >3yrs      sheep    female  Albendazole  1150
39   158   >3yrs      sheep    male    Albendazole  1500
40   187   >3yrs      sheep    female  Albendazole  2100
41   109   2yrs-3yrs  sheep    male    Ivermectin   1100
42   5     2yrs-3yrs  sheep    female  Ivermectin   350
43   110   >3yrs      sheep    male    Ivermectin   500
44   168   >1yrs      sheep    female  Ivermectin   1200
45   120   >yrs       sheep    male    Ivermectin   2350
46   20    2yrs-3yrs  sheep    male    Ivermectin   300
47   83    1yrs       sheep    male    Ivermectin   1850
48   60    2yrs-3yrs  sheep    female  Ivermectin   2100
49   14    2yrs-3yrs  sheep    female  Ivermectin   900
50   909   >3yrs      sheep    male    Ivermectin   800
51   6     >3yrs      sheep    female  control      1350
52   218   >3yrs      sheep    female  control      1150
53   11    2yrs-3yrs  sheep    female  control      350
54   86    2yrs-3yrs  sheep    male    control      1200
55   220   2yrs-3yrs  sheep    female  control      1350
56   217   >3yrs      sheep    female  control      150
57   147   2yrs-3yrs  sheep    male    control      550
58   15    2yrs-3yrs  sheep    male    control      1200
59   2     2yrs-3yrs  sheep    female  control      1350
60   9     >3yrs      sheep    female  control      1350
3.2. Types of Variables of the Data
EPG is the count response variable; sex, species, age and treatment are the
explanatory variables.
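Because all explanatory variables are categorical, R estimates each coefficient relative to a reference level, which by default is the first level in sorted order. The sketch below shows how a reference level could be set explicitly. The small data frame is hypothetical and only reuses the column names of Table 1; choosing the control group as reference is an alternative shown for illustration, not the choice made in the models of this assignment.

```r
# Hypothetical mini data frame using the column names from Table 1.
d <- data.frame(
  treatment = c("Albendazole", "Ivermectin", "control", "control"),
  species   = c("goat", "sheep", "goat", "sheep")
)

# Option 1: state the level order explicitly; the first level is the reference.
d$treatment <- factor(d$treatment,
                      levels = c("control", "Albendazole", "Ivermectin"))

# Option 2: keep the default order but move one level to the front.
d$species <- relevel(factor(d$species), ref = "goat")

levels(d$treatment)   # reference category listed first
levels(d$species)
```

With the control group as reference, each treatment coefficient would compare a drug directly with the untreated animals.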
3.3. Poisson Regression Analysis and Its Interpretation
# Load the packages and the data (file path as in the session log below)
library(readxl)
library(AER)   # provides dispersiontest()
walkite_Assignment_ <- read_excel("~/walkite Assignment .xlsx")
names(walkite_Assignment_)
View(walkite_Assignment_)
# Poisson regression of EPG on age, species, sex and treatment.
# attach() is not needed because the data frame is passed through `data =`.
nematode <- glm(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment),
                family = "poisson", data = walkite_Assignment_)
nematode
summary(nematode)
coef <- coefficients(nematode)
coef
IRR <- exp(coefficients(nematode))   # incidence rate ratios
IRR
# predicted values and residual error
pred <- predict(nematode, type = "response")   # estimate predicted values
pred
res <- residuals(nematode, type = "deviance")  # estimate residuals
res
qqnorm(res, plot.it = TRUE)
qqline(res)
# Evaluating the fit of the Poisson regression model
?pchisq
pchisq(nematode$deviance, df = nematode$df.residual, lower.tail = FALSE)
dispersion <- dispersiontest(nematode, trafo = 1)
dispersion
###################################################
library(readxl)
> walkite_Assignment_ <- read_excel("~/walkite Assignment .xlsx")
> View(walkite_Assignment_)
> attach(walkite_Assignment_)
The following object is masked _by_ .GlobalEnv:
age
The following objects are masked from walkite_Assignment_ (pos = 3):
age, EPG, ID, no,, sex, species, treatment
The following objects are masked from walkite_Assignment_ (pos = 4):
age, EPG, ID, no,, sex, species, treatment
The following objects are masked from walkite_Assignment_ (pos = 12):
age, EPG, ID, no,, sex, species, treatment
> names(walkite_Assignment_)
[1] "no," "ID" "age" "species" "sex" "treatment"
[7] "EPG"
> View(walkite_Assignment_)
>nematode<-glm(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),family =
"poisson",data = walkite_Assignment_)
> nematode
Call: glm(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
factor(treatment), family = "poisson", data = walkite_Assignment_)
Coefficients:
(Intercept) factor(age)>3yrs
7.452148 0.005106
factor(age)2yrs-3yrs factor(species)sheep
0.308401 -0.500520
factor(sex)male factor(treatment)control
0.393118 -0.356036
factor(treatment)Ivermectin
-0.307651
Degrees of Freedom: 59 Total (i.e. Null); 53 Residual
Null Deviance: 64280
Residual Deviance: 49460 AIC: 50000
> summary(nematode)
Call:
glm(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
factor(treatment), family = "poisson", data = walkite_Assignment_)
Deviance Residuals:
Min 1Q Median 3Q Max
-41.835 -28.155 -6.764 14.689 78.557
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.452148 0.009425 790.677 <2e-16 ***
factor(age)>3yrs 0.005106 0.010897 0.469 0.639
factor(age)2yrs-3yrs 0.308401 0.009429 32.708 <2e-16 ***
factor(species)sheep -0.500520 0.006708 -74.620 <2e-16 ***
factor(sex)male 0.393118 0.006936 56.681 <2e-16 ***
factor(treatment)control -0.356036 0.009612 -37.039 <2e-16 ***
factor(treatment)Ivermectin -0.307651 0.008409 -36.586 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 64280 on 59 degrees of freedom
Residual deviance: 49456 on 53 degrees of freedom
AIC: 50005
Number of Fisher Scoring iterations: 5
> coef <- coefficients(nematode)
> coef
(Intercept) factor(age)>3yrs
7.452147624 0.005106036
factor(age)2yrs-3yrs factor(species)sheep
0.308400816 -0.500519597
factor(sex)male factor(treatment)control
0.393117562 -0.356036156
factor(treatment)Ivermectin
-0.307651001
> IRR <- exp(coefficients(nematode))
> IRR
(Intercept) factor(age)>3yrs
1723.5607335 1.0051191
factor(age)2yrs-3yrs factor(species)sheep
1.3612465 0.6062156
factor(sex)male factor(treatment)control
1.4815926 0.7004473
factor(treatment)Ivermectin
0.7351718
> pred <- predict(nematode, type="response") # estimate predicted values
> pred
1 2 3 4 5 6 7
2566.6869 2553.6148 3476.0991 2553.6148 3476.0991 2553.6148 2553.6148
8 9 10 11 12 13 14
1723.5607 2346.1910 1723.5607 1724.8536 1724.8536 2555.5302 2555.5302
15 16 17 18 19 20 21
2555.5302 2555.5302 2555.5302 2555.5302 1724.8536 1724.8536 1213.4435
22 23 24 25 26 27 28
1213.4435 1797.8289 1797.8289 1797.8289 1797.8289 1797.8289 1797.8289
29 30 31 32 33 34 35
1213.4435 1213.4435 1548.0411 1422.2976 1422.2976 1422.2976 1548.0411
36 37 38 39 40 41 42
1548.0411 1555.9656 1050.1981 1555.9656 1050.1981 1549.2023 1045.6331
43 44 45 46 47 48 49
1143.9021 768.1439 1138.0762 1549.2023 1138.0762 1045.6331 1045.6331
50 51 52 53 54 55 56
1143.9021 735.6084 735.6084 996.2445 1476.0284 996.2445 735.6084
57 58 59 60
1476.0284 1476.0284 996.2445 735.6084
> res <- residuals(nematode, type="deviance") # estimate residuals
> res
1 2 3 4 5 6
-34.0049962 -1.0647241 -11.8729495 -23.7904413 78.5573439 -5.1054781
7 8 9 10 11 12
-33.7774785 -29.6545718 -10.6411543 14.2908883 -38.4780564 33.6401126
13 14 15 16 17 18
54.1929143 41.1171517 -15.7910551 -32.5049447 -7.2062497 -33.8108615
19 20 21 22 23 24
-40.4132669 -5.5385905 33.3812205 10.5744782 10.2584010 -41.8351111
25 26 27 28 29 30
21.8329977 34.5404042 -29.5821101 -32.8446551 -6.3216015 -16.0217059
31 32 33 34 35 36
-38.8780694 15.6018159 -32.0896195 -40.7419996 71.1128294 -41.0419275
37 38 39 40 41 42
-0.1513335 3.0327221 -1.4274364 28.4749380 -12.0440289 -25.0030992
43 44 45 46 47 48
-21.4525464 14.3849662 31.3689294 -38.9021433 19.3334906 28.6354384
49 50 51 52 53 54
-4.6148356 -10.7546288 20.2621346 14.1032342 -23.6695434 -7.4280964
55 56 57 58 59 60
10.6268694 -26.3476376 -27.6793360 -7.4280964 10.6268694 20.2621346
> qqnorm(res, plot.it = TRUE)
> qqline(res)
Figure 1. Q-Q plot of Poisson regression analysis
*The normal quantile plot indicates that the deviance residuals are not normally distributed.
?pchisq
> pchisq(nematode$deviance,df=nematode$df.residual,lower.tail = FALSE)
[1] 0
Interpretation: The p-value is reported as zero (0), which is significant and indicates lack of fit.
The significant p-value is consistent with the presence of overdispersion and shows that the
Poisson model does not fit the data.
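The same goodness-of-fit check can be reproduced from the two numbers reported in the summary(nematode) output above, without access to the data file. The ratio of the residual deviance to its degrees of freedom is added here as a rough informal indicator; it is not a statistic computed in the assignment.

```r
# Values copied from the summary(nematode) output above.
resid_dev <- 49456   # residual deviance
resid_df  <- 53      # residual degrees of freedom

# Upper-tail chi-square probability: small values indicate lack of fit
p_value <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

# Informal check: this ratio should be near 1 for a well-fitting Poisson model
deviance_ratio <- resid_dev / resid_df

c(p_value = p_value, deviance_ratio = deviance_ratio)
```

The p-value underflows to 0 and the deviance is several hundred times its degrees of freedom, in line with the interpretation above.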
> library(AER)
> dispersion <- dispersiontest(nematode,trafo=1)
> dispersion
Overdispersion test
data: nematode
z = 4.2675, p-value = 9.884e-06
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
871.0029
The overdispersion test gives evidence of overdispersion (alpha is estimated to be 871.0029, so
the variance is about 872 times the mean), which speaks strongly against the assumption of
equidispersion, that is, alpha = 0. The goodness-of-fit test likewise indicates that the Poisson
model does not fit, and the normal quantile plot shows that the deviance residuals are not
normally distributed. Since the ‘dispersiontest’ indicates that the data are overdispersed, it is
better to look for
Negative Binomial Regression.
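For reference, the variance function tested by dispersiontest with trafo = 1 can be written out as follows. This is a sketch based on the AER package documentation; c denotes the dispersion, as in the text above.

```latex
\operatorname{Var}(Y) = \mu + \alpha\,\mu = (1 + \alpha)\,\mu = c\,\mu,
\qquad H_0:\ \alpha = 0 \ (c = 1), \qquad H_1:\ \alpha > 0 \ (c > 1)
% With the estimate reported in the output above:
\hat{\alpha} = 871.0029 \;\Rightarrow\; \hat{c} = 1 + \hat{\alpha} = 872.0029
```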
3.4. Negative Binomial Regression Analysis and Its Interpretation
# Negative binomial regression
library(MASS)     # provides glm.nb()
library(lmtest)   # provides lrtest()
NBREG <- glm.nb(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment),
                data = walkite_Assignment_)
NBREG
summary(NBREG)
# Checking the model assumption: likelihood ratio test of the Poisson fit
# (nematode, from section 3.3) against the negative binomial fit
lrtest(nematode, NBREG)
coef <- coefficients(NBREG)
coef
IRR <- exp(coefficients(NBREG))   # incidence rate ratios
IRR
# predicted values and residual error
pred <- predict(NBREG, type = "response")   # estimate predicted values
pred
res <- residuals(NBREG, type = "deviance")  # estimate residuals
res
qqnorm(res, plot.it = TRUE)
qqline(res)
################################
> library(MASS)
>NBREG<-glm.nb(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),data =
walkite_Assignment_)
> NBREG
Call: glm.nb(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
factor(treatment), data = walkite_Assignment_, init.theta = 1.923394949,
link = log)
Coefficients:
(Intercept) factor(age)>3yrs
7.59952 -0.08562
factor(age)2yrs-3yrs factor(species)sheep
0.06591 -0.48364
factor(sex)male factor(treatment)control
0.29248 -0.29743
factor(treatment)Ivermectin
-0.21070
Degrees of Freedom: 59 Total (i.e. Null); 53 Residual
Null Deviance: 79.85
Residual Deviance: 65.06 AIC: 1005
> summary(NBREG)
Call:
glm.nb(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
factor(treatment), data = walkite_Assignment_, init.theta = 1.923394949,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9960 -1.0922 -0.1551 0.4745 1.9744
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.59952 0.28454 26.708 <2e-16 ***
factor(age)>3yrs -0.08562 0.30986 -0.276 0.7823
factor(age)2yrs-3yrs 0.06591 0.28140 0.234 0.8148
factor(species)sheep -0.48364 0.18917 -2.557 0.0106 *
factor(sex)male 0.29248 0.19578 1.494 0.1352
factor(treatment)control -0.29743 0.26745 -1.112 0.2661
factor(treatment)Ivermectin -0.21070 0.24985 -0.843 0.3991
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(1.9234) family taken to be 1)
Null deviance: 79.853 on 59 degrees of freedom
Residual deviance: 65.056 on 53 degrees of freedom
AIC: 1005.4
Number of Fisher Scoring iterations: 1
Theta: 1.923
Std. Err.: 0.326
2 x log-likelihood: -989.360
> library(lmtest)
>nematode<-glm(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),family =
"poisson",data = walkite_Assignment_)
>NBREG<-glm.nb(EPG~factor(age)+factor(species)+factor(sex)+factor(treatment),data =
walkite_Assignment_)
> lrtest(nematode,NBREG)
Likelihood ratio test
Model 1: EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment)
Model 2: EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment)
#Df LogLik Df Chisq Pr(>Chisq)
1 7 -24995.3
2 8 -494.7 1 49001 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpretation: In this model check, the chi-squared value estimated from
2*(logLik(NBREG) - logLik(nematode)) is 49001 with one degree of freedom, and the p-value is
less than the significance level. This strongly suggests that the negative binomial model,
which estimates the dispersion parameter, is more appropriate than the Poisson model.
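The likelihood ratio statistic can be recomputed by hand from the two log-likelihoods printed by lrtest() above. This sketch uses only those rounded values, so the statistic is reproduced only up to rounding.

```r
# Log-likelihoods copied from the lrtest() output above.
ll_pois <- -24995.3   # Poisson model (nematode)
ll_nb   <- -494.7     # negative binomial model (NBREG)

# Twice the gain in log-likelihood from estimating the dispersion parameter
lr_stat <- 2 * (ll_nb - ll_pois)

# Reference distribution: chi-square with 1 degree of freedom
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(lr_stat = lr_stat, p_value = p_value)
```

Because the Poisson model lies on the boundary of the negative binomial parameter space, the p-value from this test is often halved; with a statistic this large the conclusion is the same either way.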
> coef <- coefficients(NBREG)
> coef
(Intercept) factor(age)>3yrs
7.59951872 -0.08562393
factor(age)2yrs-3yrs factor(species)sheep
0.06591159 -0.48363536
factor(sex)male factor(treatment)control
0.29247990 -0.29743050
factor(treatment)Ivermectin
-0.21070237
> IRR <- exp(coefficients(NBREG))
> IRR
(Intercept) factor(age)>3yrs
1997.2344396 0.9179394
factor(age)2yrs-3yrs factor(species)sheep
1.0681323 0.6165380
factor(sex)male factor(treatment)control
1.3397458 0.7427242
factor(treatment)Ivermectin
0.8100151
> pred <- predict(NBREG, type="response") # estimate predicted values
> pred
1 2 3 4 5 6 7
2456.2098 2675.7865 2858.0939 2675.7865 2858.0939 2675.7865 2675.7865
8 9 10 11 12 13 14
1997.2344 2133.3106 1997.2344 1728.0138 1728.0138 2315.0993 2315.0993
15 16 17 18 19 20 21
2315.0993 2315.0993 2315.0993 2315.0993 1728.0138 1728.0138 1361.6661
22 23 24 25 26 27 28
1361.6661 1824.2864 1824.2864 1824.2864 1824.2864 1824.2864 1824.2864
29 30 31 32 33 34 35
1361.6661 1361.6661 1649.7240 1315.2670 1315.2670 1315.2670 1649.7240
36 37 38 39 40 41 42
1649.7240 1514.3466 1130.3238 1514.3466 1130.3238 1427.3466 1065.3861
43 44 45 46 47 48 49
1226.6436 997.4290 1336.3013 1427.3466 1336.3013 1065.3861 1065.3861
50 51 52 53 54 55 56
1226.6436 839.5189 839.5189 976.8806 1308.7717 976.8806 839.5189
57 58 59 60
1308.7717 1308.7717 976.8806 839.5189
> res <- residuals(NBREG, type="deviance") # estimate residuals
> res
1 2 3 4 5 6
-1.03229229 -0.09315152 -0.02837324 -0.77077140 1.97435995 -0.20463991
7 8 9 10 11 12
-1.12245881 -1.31177836 -0.19293868 0.23175489 -1.63311996 1.00491029
13 14 15 16 17 18
1.48540908 1.18739110 -0.33482919 -0.91794012 -0.07009783 -0.96869028
19 20 21 22 23 24
-1.75162396 -0.19161862 0.97087411 0.22971859 0.30126846 -1.79873459
25 26 27 28 29 30
0.63951789 0.98797747 -1.14541053 -1.30127519 -0.40687757 -0.83012023
31 32 33 34 35 36
-1.84436364 0.66416970 -1.37752773 -1.99254932 1.92366710 -1.99600958
37 38 39 40 41 42
0.03237883 0.02398299 -0.01317219 0.95704739 -0.34600465 -1.30165058
43 44 45 46 47 48
-1.08206029 0.26433726 0.86351808 -1.71879153 0.47665214 1.05999520
49 50 51 52 53 54
-0.22734653 -0.55289232 0.71465995 0.46009582 -1.21473337 -0.11852987
55 56 57 58 59 60
0.47377501 -1.85712317 -1.04996037 -0.11852987 0.47377501 0.71465995
> qqnorm(res, plot.it = TRUE)
> qqline(res)
Figure 2. Q-Q plot of negative binomial regression analysis
*The normal quantile plot indicates that the deviance residuals are almost normally distributed.
Thus the negative binomial regression fits the data.
INTERPRETATION
The interpretation is based on the negative binomial regression analysis because the Poisson
model does not fit the data. In the negative binomial regression above, ‘Albendazole’ for
treatment, ‘female’ for sex, ‘<-1yrs’ for age and ‘goat’ for species were used as reference
categories. Sex and age have no statistically significant effect on EPG count, and the control
group does not differ significantly from the Albendazole reference group. Species has a
significant effect on EPG count (p-value = 0.0106): the expected count in sheep is about 0.62
times that in goats. The percentage change in EPG count for Ivermectin relative to the
Albendazole reference group is (exp(-0.21070)-1)*100 = -18.998. Even though the estimated EPG
count is lower, the effect of Ivermectin is not significant because its p-value is 0.3991.
This indicates resistance of the parasites or poor efficacy of the drug. In general, since the
control group (p-value = 0.2661) has no significant effect on EPG count, resistance of the
parasites to both Albendazole and Ivermectin was detected.
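The incidence rate ratios and percentage changes quoted above can be reproduced from the coefficient table of summary(NBREG). The sketch below also adds approximate 95% Wald confidence intervals; these intervals are an addition for illustration and were not computed in the assignment.

```r
# Estimates and standard errors copied from summary(NBREG) above.
est <- c(Ivermectin = -0.21070, control = -0.29743, sheep = -0.48364)
se  <- c(Ivermectin =  0.24985, control =  0.26745, sheep =  0.18917)

z <- qnorm(0.975)   # about 1.96, for an approximate 95% Wald interval

irr <- data.frame(
  IRR        = exp(est),              # incidence rate ratio
  lower      = exp(est - z * se),     # lower confidence limit
  upper      = exp(est + z * se),     # upper confidence limit
  pct_change = (exp(est) - 1) * 100   # percentage change vs reference level
)
round(irr, 3)
```

An interval that contains 1 corresponds to a non-significant effect, which is the case for Ivermectin and the control group, while the interval for sheep lies entirely below 1.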
4. REFERENCES
Cameron, A.C., Trivedi, P.K., 1986. Econometric models based on count data: comparisons and
applications of some estimators and tests. Journal of Applied Econometrics 1, 29-53.
Cameron, A.C., Trivedi, P.K., 2013. Regression Analysis of Count Data. Cambridge University
Press.
Christopher, B., 2010. Models for Count Data and Categorical Response Data.
Hilbe, J.M., 2008. Brief overview on interpreting count model risk ratios: an addendum to
Negative Binomial Regression. Cambridge University Press, Cambridge.
Hilbe, J.M., 2011a. Modeling count data. International Encyclopedia of Statistical Science.
Springer, pp. 836-839.
Hilbe, J.M., 2011b. Negative Binomial Regression. Cambridge University Press.
Lambert, D., 1992. Zero-inflated Poisson regression, with an application to defects in
manufacturing. Technometrics 34, 1-14.
Walkite, F., Negesse, M., Anwar, H., 2017. Detection of anthelmintic resistance in
gastrointestinal nematode parasite in small ruminant in Haramaya University farms. pp. 13-19.

Analysis of count data using Poisson and negative binomial regression models

  • 3. LIST OF TABLES
    Table 1: Raw data of the assignment
  • 4. LIST OF FIGURES
    Figure 1. Q-Q plot of Poisson regression analysis
    Figure 2. Q-Q plot of negative binomial regression analysis
  • 5. LIST OF ABBREVIATIONS
    AIC    Akaike Information Criterion
    EPG    Eggs per gram of faeces
    GLM    Generalized linear model
    IRR    Incidence rate ratio
    NBREG  Negative binomial regression model
    ZINB   Zero-inflated negative binomial model
    ZIP    Zero-inflated Poisson model
  • 6. SUMMARY
    Count data are a statistical data type in which observations take only non-negative integer values. Count models are a subset of discrete response regression models; count responses are non-negative integers, intrinsically heteroskedastic and right skewed, with a variance that increases with the mean. A single count measurement is often termed a count variable. When such a variable is treated as a random variable, the Poisson and negative binomial distributions are commonly used to represent its distribution, and zero-inflated regression is used when there are excess zeros.
    The objective of this assignment was to describe the analysis of count data and to analyze a count data set using R software. The data come from the study “Detection of Anthelmintic Resistance in Gastrointestinal Nematodes of Small Ruminants in Haramaya University Farms”. Sheep and goats infected with gastrointestinal nematodes were selected, and I took 30 goats and 30 sheep. Within each species the animals were divided into an Albendazole group (10), an Ivermectin group (10) and a control group (10). Eggs were counted before and after treatment in the treated groups, and twice in the control group in parallel with the treated groups. For this assignment, the change in egg count was used for the treated groups and the second egg count for the control group.
    The data were analyzed in R using Poisson and negative binomial regression models. The Poisson model did not fit the data: the overdispersion test gave strong evidence of overdispersion (c estimated as 872.046), which speaks against the assumption of equidispersion (c = 0). The goodness-of-fit p-value from pchisq was effectively 0, that is, significant, which also indicates lack of fit. The normal quantile (Q-Q) plot likewise indicated that the residuals were not normally distributed. Since almost all assumptions were violated, the Poisson model was judged not to fit the data. The negative binomial regression model, which accommodates the overdispersion indicated by ‘dispersiontest’, did fit the data, and the results were interpreted from that model.
    Keywords: Analysis, count data
  • 7. 1. INTRODUCTION
    In statistics, count data are a data type in which observations take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting. The statistical treatment of count data is distinct from that of binary data, in which observations take only two values, usually represented by 0 and 1, and from that of ordinal data, which may also consist of integers but where the individual values fall on an arbitrary scale and only the relative ranking is important (Cameron and Trivedi, 2013).
    Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are right skewed, and have a variance that increases with the mean. Examples of count data include length of hospital stay, the number of a certain species of fish per defined area in the ocean, the number of lights displayed by fireflies over specified time periods, the classic case of the number of deaths, and the number of occurrences of thunderstorms in a calendar year. An individual piece of count data is often termed a count variable. When such a variable is treated as a random variable, the Poisson and negative binomial distributions are commonly used to represent its distribution (Cameron and Trivedi, 1986).
    Graphical examination of count data may be aided by data transformations chosen to stabilise the sample variance. In particular, the square root transformation might be used when the data can be approximated by a Poisson distribution (although other transformations have modestly improved properties), while an inverse sine transformation is available when a binomial distribution is preferred (Hilbe, 2011b).
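The variance-stabilising effect of the square root transformation can be illustrated with a small simulation. This is only a sketch: the means, sample size and seed below are arbitrary illustrative choices, not values from the assignment data.

```r
# Simulated Poisson counts at several means (illustrative values only)
set.seed(1)
means <- c(10, 40, 160, 640)
n <- 10000

# Variance of the raw counts and of their square roots at each mean
raw_var  <- sapply(means, function(m) var(rpois(n, m)))
sqrt_var <- sapply(means, function(m) var(sqrt(rpois(n, m))))

# Raw variances grow with the mean; after the square root
# transformation they stay roughly constant (close to 1/4)
print(data.frame(mean = means, raw_var = raw_var, sqrt_var = sqrt_var))
```

Plotting `sqrt(y)` rather than `y` therefore puts groups with very different means on a comparable spread, which is the reason the transformation helps graphical examination.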
  • 8. 2. STATISTICAL TESTS TO ANALYZE COUNT DATA
    2.1 Poisson Regression
    The Poisson distribution can form the basis for some analyses of count data, in which case Poisson regression may be used. Poisson regression is a special case of the class of generalized linear models, which also contains models based on the binomial distribution (binomial regression, logistic regression) and on the negative binomial distribution, used where the assumptions of the Poisson model are violated, in particular when the range of count values is limited or when overdispersion is present (Hilbe, 2011a).
    A key feature of the Poisson model is the equality of the mean and variance functions. When the variance of a Poisson model exceeds its mean, the model is termed overdispersed. Simulation studies have demonstrated that overdispersion is indicated when the Pearson $\chi^2$ dispersion statistic is greater than 1.0. The dispersion statistic is defined as the Pearson $\chi^2$ divided by the model residual degrees of freedom. Overdispersion, common to most Poisson models, biases the parameter estimates and fitted values. When Poisson overdispersion is real, and not merely apparent, a count model other than Poisson is required (Hilbe, 2008).
    Poisson regression is the basic model on which a variety of count models are based. It is derived from the Poisson probability mass function. The Poisson regression model is the benchmark model for count data in much the same way as the normal linear model is the benchmark for real-valued continuous data (Cameron and Trivedi, 1986). The Poisson model is simple, and it is robust. If the only interest of the analysis lies in estimating the parameters of a log-linear mean function, there is hardly any reason (except for efficiency) to ever contemplate anything other than the Poisson regression model. In fact, its applicability extends well beyond the traditional domain of count data.
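The Pearson dispersion statistic described in this section can be computed directly from a fitted `glm` object. The following is a minimal sketch on simulated data; the helper name `pearson_dispersion`, the covariate `x` and all parameter values are hypothetical and chosen only for illustration.

```r
set.seed(42)
n  <- 500
x  <- rnorm(n)
mu <- exp(1 + 0.5 * x)

y_pois <- rpois(n, mu)                   # equidispersed counts
y_over <- rnbinom(n, mu = mu, size = 1)  # overdispersed counts

# Pearson chi-square divided by the residual degrees of freedom
# (hypothetical helper, not part of any package)
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

fit_pois <- glm(y_pois ~ x, family = poisson)
fit_over <- glm(y_over ~ x, family = poisson)

d_pois <- pearson_dispersion(fit_pois)  # expected to be close to 1
d_over <- pearson_dispersion(fit_over)  # expected to be well above 1
print(c(poisson_data = d_pois, overdispersed_data = d_over))
```

A value near 1 is consistent with the Poisson assumption, while a value far above 1 suggests that a different count model should be considered.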
    The Poisson regression model can be used for any constant elasticity mean function, whether or not the dependent variable is a count, and there are good reasons why it should be preferred over the more common log transformation of the dependent variable. And yet, there are instances where the Poisson regression model is unsuited. Essentially, the Poisson model is always overly restrictive when it comes to estimating features of the population other than the mean, such as the variance or the probability of single outcomes.
  • 9. The Poisson distribution has a positive mean $\mu$. Although a GLM can model a positive mean using the identity link, it is more common to model the log of the mean. Like the linear predictor $\alpha + \beta x$, the log mean can take any real value. The log mean is the natural parameter for the Poisson distribution, and the log link is the canonical link for a Poisson GLM. A Poisson loglinear GLM assumes a Poisson distribution for $Y$ and uses the log link. The Poisson loglinear model with explanatory variable $X$ is $\log \mu = \alpha + \beta x$. For this model, the mean satisfies the exponential relationship $\mu = \exp(\alpha + \beta x) = e^{\alpha}(e^{\beta})^{x}$. A one-unit increase in $x$ has a multiplicative impact of $e^{\beta}$ on $\mu$: the mean at $x + 1$ equals the mean at $x$ multiplied by $e^{\beta}$ (Re).
    In some contexts, the Poisson distribution describes the number of events that occur in a given time period, where its mean $\mu$ is the average number of events per period. It has the unusual feature that its mean equals its variance. Its probability mass function is $\Pr(Y = y) = e^{-\mu}\mu^{y}/y!$, $y = 0, 1, 2, \ldots$, where $e$ is the base of the natural logarithms and $y!$ is the factorial of $y$. The skewness of the Poisson distribution is $1/\sqrt{\mu}$ and the kurtosis is $3 + 1/\mu$, so that for large $\mu$ the distribution approaches the Normal $N(\mu, \mu)$, with skewness of zero and kurtosis of three (Christopher, 2010).
    2.2 Negative Binomial Regression
    A limitation of the Poisson distribution is the equality of its mean and variance. We often observe count data processes where this equality is not reasonable, in particular where the conditional variance is larger than the conditional mean. This is termed overdispersion, and its presence renders the assumption of a Poisson distribution for the error process untenable. It is particularly likely to occur in the case of unobserved heterogeneity.
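How unobserved heterogeneity produces overdispersion can be sketched by giving each unit its own unobserved rate multiplier. The simulation below is an illustration only: the gamma-distributed multiplier, the names `frailty` and `alpha`, and the parameter values are assumptions made for this example.

```r
set.seed(7)
n     <- 20000
mu    <- 4
alpha <- 0.5  # hypothetical heterogeneity parameter

# Unobserved multiplier with mean 1 and variance alpha
frailty <- rgamma(n, shape = 1 / alpha, rate = 1 / alpha)

y_plain <- rpois(n, mu)            # same rate for every unit
y_mixed <- rpois(n, mu * frailty)  # rate varies across units

# Both samples have mean close to mu, but only the plain Poisson
# sample has variance close to its mean; under this gamma mixing the
# theoretical variance is mu + alpha * mu^2
print(rbind(plain = c(mean = mean(y_plain), var = var(y_plain)),
            mixed = c(mean = mean(y_mixed), var = var(y_mixed))))
```

The mixed sample keeps the same mean but has a clearly larger variance, which is the pattern a Poisson model cannot reproduce.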
    In this circumstance, a reasonable alternative is negative binomial regression. The negative binomial is a conjugate mixture distribution for count data. The negative binomial (NB) distribution is a two-parameter distribution. For positive integer $n$, it is the distribution of the number of failures that occur in a sequence of trials before $n$ successes have occurred, where the probability of success in each trial is $p$. The distribution is defined for any positive $n$. The negative binomial distribution is a mixture of the Poisson distribution and the Gamma distribution, or generalized factorial function.
  • 10. Unlike the Poisson, which is fully characterized by its mean $\mu$, the NB distribution is a function of both $\mu$ and $\alpha$. Its mean is still $\mu$, but its conditional variance is $\mu(1 + \alpha\mu)$ in the commonly used NB2 parameterization. As $\alpha \to 0$, the distribution becomes the Poisson distribution (Christopher, 2010).
    2.3 Zero Inflated Regression
    In many studies count data contain an excess of zeros. When the data consist of non-negative, highly skewed counts with a large proportion of zeros, Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB) and hurdle models are useful for analysing them. Zero counts may not arise from the same process as the positive counts. Zero-inflated count data may not have equality of mean and variance, in which case over-dispersion (or under-dispersion) needs to be taken into account (Lambert, 1992).
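A quick informal check for excess zeros is to compare the observed share of zeros with the share a plain Poisson distribution with the same mean would produce. The sketch below uses simulated data; the structural-zero probability `p_zero` and the rate `lambda` are hypothetical values chosen for illustration.

```r
set.seed(3)
n      <- 5000
lambda <- 3
p_zero <- 0.3  # hypothetical probability of a structural zero

# A separate process forces some observations to zero;
# the rest follow an ordinary Poisson distribution
structural_zero <- rbinom(n, 1, p_zero) == 1
y <- ifelse(structural_zero, 0L, rpois(n, lambda))

observed_zero <- mean(y == 0)
expected_zero <- dpois(0, mean(y))  # zero share under Poisson with the same mean

print(c(observed = observed_zero, expected_under_poisson = expected_zero))
```

When the observed share is much larger than the Poisson expectation, a zero-inflated or hurdle model is worth considering. This comparison is a screening device, not a formal test.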
  • 11. 3. ANALYSIS OF A COUNT DATA
    3.1. Source of Data and Its Description
    The data originally come from my DVM thesis. The study appeared in the East Africa Journal of Veterinary and Animal Science (03, galley proof; Walkite et al., 2017). The title of the research is “Detection of Anthelmintic Resistance in Gastrointestinal Nematodes of Small Ruminants in Haramaya University Farms”. Sheep and goats infected with gastrointestinal nematodes were selected, and I took 30 goats and 30 sheep. Within each species the animals were grouped into an Albendazole group (10), an Ivermectin group (10) and a control group (10). Eggs were counted before and after treatment in the treated groups, and twice in the control group in parallel with the treated groups. For this assignment, the change in egg count was taken for the treated groups and the second egg count for the control group (Walkite et al., 2017).
    Table 1: The raw data of the assignment
    No  ID    Age        Species  Sex     Treatment    EPG
    1   1546  >3yrs      goat     male    Albendazole  1050
    2   1595  <-1yrs     goat     male    Albendazole  2500
    3   1612  2yrs-3yrs  goat     male    Albendazole  2800
    4   1599  <-1yrs     goat     male    Albendazole  1450
    5   1576  2yrs-3yrs  goat     male    Albendazole  9050
    6   1593  <-1yrs     goat     male    Albendazole  2300
    7   1609  <-1yrs     goat     male    Albendazole  1050
    8   1608  <-1yrs     goat     female  Albendazole  650
    9   1526  2yrs-3yrs  goat     female  Albendazole  1850
    10  1605  <-1yrs     goat     female  Albendazole  2350
    11  63    2yrs-3yrs  goat     female  Ivermectin   400
    12  42    2yrs-3yrs  goat     female  Ivermectin   3300
  • 12. 6 13 110 2yrs- 3yrs goat male Ivermectin 5750 14 111 2yrs- 3yrs goat male Ivermectin 4900 15 28 2yrs- 3yrs goat male Ivermectin 1800 16 1425 2yrs- 3yrs goat male Ivermectin 1100 17 80 2yrs- 3yrs goat male Ivermectin 2200 18 96 2yrs- 3yrs goat male Ivermectin 1050 19 72 2yrs- 3yrs goat female Ivermectin 350 20 87 2yrs- 3yrs goat female Ivermectin 1500 21 1536 >3yrs goat female control 2550 22 1543 >3yrs goat female control 1600 23 1580 >3yrs goat male control 2250 24 13 >3yrs goat male control 350 25 68 >3yrs goat male control 2800 26 6 >3yrs goat male control 3450 27 5 >3yrs goat male control 700 28 21 >3yrs goat male control 600 29 31 >3yrs goat female control 1000 30 259 >3yrs goat female control 700 31 106 <-1yrs sheep male Albendazole 300 32 13 2yrs-3yrs sheep female Albendazole 2050 33 237 2yrs-3yrs sheep female Albendazole 400 34 42 2yrs-3yrs sheep female Albendazole 200 35 95 >1yrs sheep male Albendazole 5100 36 190 <-1yrs sheep male Albendazole 250 37 148 >3yrs sheep male Albendazole 1550 38 89 >3yrs sheep female Albendazole 1150 39 158 >3yrs sheep male Albendazole 1500 40 187 >3yrs sheep female Albendazole 2100 41 109 2yrs-3yrs sheep male Ivermectin 1100
  • 13. 7 42 5 2yrs-3yrs sheep female Ivermectin 350 43 110 >3yrs sheep male Ivermectin 500 44 168 >1yrs sheep female Ivermectin 1200 45 120 >yrs sheep male Ivermectin 2350 46 20 2yrs-3yrs sheep male Ivermectin 300 47 83 1yrs sheep male Ivermectin 1850 48 60 2yrs-3yrs sheep female Ivermectin 2100 49 14 2yrs-3yrs sheep female Ivermectin 900 50 909 >3yrs sheep male Ivermectin 800 51 6 >3yrs sheep female control 1350 52 218 >3yrs sheep female control 1150 53 11 2yrs-3yrs sheep female control 350 54 86 2yrs-3yrs sheep male control 1200 55 220 2yrs-3yrs sheep female control 1350 56 217 >3yrs sheep female control 150 57 147 2yrs-3yrs sheep male control 550 58 15 2yrs-3yrs sheep male control 1200 59 2 2yrs-3yrs sheep female control 1350 60 9 >3yrs sheep female control 1350
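    Before fitting any model, a quick look at the mean and variance of the counts already hints at whether the Poisson assumption of equal mean and variance is plausible. The sketch below uses only the EPG values of animals 1 to 10 in Table 1 (goats, Albendazole group); the variable names `epg`, `m`, `v` and `ratio` are chosen here for illustration and do not appear in the original script.

```r
# EPG values of animals 1-10 in Table 1 (goats, Albendazole group)
epg <- c(1050, 2500, 2800, 1450, 9050, 2300, 1050, 650, 1850, 2350)

m <- mean(epg)   # sample mean
v <- var(epg)    # sample variance
ratio <- v / m   # close to 1 if mean and variance were equal (Poisson-like)

c(mean = m, variance = v, ratio = ratio)
```

    A ratio far above 1 in such a subgroup is only a descriptive warning sign of overdispersion; the formal checks remain the goodness-of-fit and dispersion tests applied to the fitted model.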
  • 14. 3.2. Types of Variables of the Data
    EPG is the count response variable; sex, species, age and treatment are the explanatory variables.
    3.3. Poisson Regression Analysis and Its Interpretation
    # Load the data (same file as in the console session)
    library(readxl)
    walkite_Assignment_ <- read_excel("~/walkite Assignment .xlsx")
    attach(walkite_Assignment_)  # optional: the models below pass data = explicitly
    names(walkite_Assignment_)
    View(walkite_Assignment_)
    # Fit the Poisson regression model
    nematode <- glm(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment),
                    family = "poisson", data = walkite_Assignment_)
    nematode
    summary(nematode)
    # Coefficients and incidence rate ratios (IRR)
    coef <- coefficients(nematode)
    coef
    IRR <- exp(coefficients(nematode))
    IRR
    # Predicted values and residuals
    pred <- predict(nematode, type = "response")   # predicted values
    pred
    res <- residuals(nematode, type = "deviance")  # deviance residuals
    res
    qqnorm(res, plot.it = TRUE)
    qqline(res)
    # Evaluating the fit of the Poisson regression model
    ?pchisq
    pchisq(nematode$deviance, df = nematode$df.residual, lower.tail = FALSE)
    library(AER)
    dispersion <- dispersiontest(nematode, trafo = 1)
  • 15. dispersion
    ###################################################
    > library(readxl)
    > walkite_Assignment_ <- read_excel("~/walkite Assignment .xlsx")
    > View(walkite_Assignment_)
    > attach(walkite_Assignment_)
    The following object is masked _by_ .GlobalEnv:
        age
    The following objects are masked from walkite_Assignment_ (pos = 3):
        age, EPG, ID, no,, sex, species, treatment
    The following objects are masked from walkite_Assignment_ (pos = 4):
        age, EPG, ID, no,, sex, species, treatment
    The following objects are masked from walkite_Assignment_ (pos = 12):
        age, EPG, ID, no,, sex, species, treatment
    > names(walkite_Assignment_)
    [1] "no,"       "ID"        "age"       "species"   "sex"       "treatment"
    [7] "EPG"
    > View(walkite_Assignment_)
    > nematode <- glm(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment), family = "poisson", data = walkite_Assignment_)
    > nematode
    Call: glm(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
  • 16. factor(treatment), family = "poisson", data = walkite_Assignment_)
    Coefficients:
                    (Intercept)             factor(age)>3yrs
                       7.452148                     0.005106
           factor(age)2yrs-3yrs         factor(species)sheep
                       0.308401                    -0.500520
                factor(sex)male     factor(treatment)control
                       0.393118                    -0.356036
    factor(treatment)Ivermectin
                      -0.307651
    Degrees of Freedom: 59 Total (i.e. Null); 53 Residual
    Null Deviance: 64280
    Residual Deviance: 49460    AIC: 50000
    > summary(nematode)
    Call:
    glm(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
        factor(treatment), family = "poisson", data = walkite_Assignment_)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -41.835  -28.155   -6.764   14.689   78.557
    Coefficients:
                                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)                  7.452148    0.009425  790.677    <2e-16 ***
    factor(age)>3yrs             0.005106    0.010897    0.469     0.639
    factor(age)2yrs-3yrs         0.308401    0.009429   32.708    <2e-16 ***
    factor(species)sheep        -0.500520    0.006708  -74.620    <2e-16 ***
  • 17. factor(sex)male              0.393118    0.006936   56.681    <2e-16 ***
    factor(treatment)control    -0.356036    0.009612  -37.039    <2e-16 ***
    factor(treatment)Ivermectin -0.307651    0.008409  -36.586    <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    (Dispersion parameter for poisson family taken to be 1)
    Null deviance: 64280 on 59 degrees of freedom
    Residual deviance: 49456 on 53 degrees of freedom
    AIC: 50005
    Number of Fisher Scoring iterations: 5
    > coef <- coefficients(nematode)
    > coef
                    (Intercept)             factor(age)>3yrs
                    7.452147624                  0.005106036
           factor(age)2yrs-3yrs         factor(species)sheep
                    0.308400816                 -0.500519597
                factor(sex)male     factor(treatment)control
                    0.393117562                 -0.356036156
    factor(treatment)Ivermectin
                   -0.307651001
    > IRR <- exp(coefficients(nematode))
    > IRR
                    (Intercept)             factor(age)>3yrs
                   1723.5607335                    1.0051191
           factor(age)2yrs-3yrs         factor(species)sheep
                      1.3612465                    0.6062156
                factor(sex)male     factor(treatment)control
  • 18.               1.4815926                    0.7004473
    factor(treatment)Ivermectin
                      0.7351718
    > pred <- predict(nematode, type="response") # estimate predicted values
    > pred
            1         2         3         4         5         6         7
    2566.6869 2553.6148 3476.0991 2553.6148 3476.0991 2553.6148 2553.6148
            8         9        10        11        12        13        14
    1723.5607 2346.1910 1723.5607 1724.8536 1724.8536 2555.5302 2555.5302
           15        16        17        18        19        20        21
    2555.5302 2555.5302 2555.5302 2555.5302 1724.8536 1724.8536 1213.4435
           22        23        24        25        26        27        28
    1213.4435 1797.8289 1797.8289 1797.8289 1797.8289 1797.8289 1797.8289
           29        30        31        32        33        34        35
    1213.4435 1213.4435 1548.0411 1422.2976 1422.2976 1422.2976 1548.0411
           36        37        38        39        40        41        42
    1548.0411 1555.9656 1050.1981 1555.9656 1050.1981 1549.2023 1045.6331
           43        44        45        46        47        48        49
    1143.9021  768.1439 1138.0762 1549.2023 1138.0762 1045.6331 1045.6331
           50        51        52        53        54        55        56
    1143.9021  735.6084  735.6084  996.2445 1476.0284  996.2445  735.6084
           57        58        59        60
    1476.0284 1476.0284  996.2445  735.6084
    > res <- residuals(nematode, type="deviance") # estimate residuals
    > res
              1           2           3           4           5           6
    -34.0049962  -1.0647241 -11.8729495 -23.7904413  78.5573439  -5.1054781
              7           8           9          10          11          12
    -33.7774785 -29.6545718 -10.6411543  14.2908883 -38.4780564  33.6401126
             13          14          15          16          17          18
     54.1929143  41.1171517 -15.7910551 -32.5049447  -7.2062497 -33.8108615
  • 19.      19          20          21          22          23          24
    -40.4132669  -5.5385905  33.3812205  10.5744782  10.2584010 -41.8351111
             25          26          27          28          29          30
     21.8329977  34.5404042 -29.5821101 -32.8446551  -6.3216015 -16.0217059
             31          32          33          34          35          36
    -38.8780694  15.6018159 -32.0896195 -40.7419996  71.1128294 -41.0419275
             37          38          39          40          41          42
     -0.1513335   3.0327221  -1.4274364  28.4749380 -12.0440289 -25.0030992
             43          44          45          46          47          48
    -21.4525464  14.3849662  31.3689294 -38.9021433  19.3334906  28.6354384
             49          50          51          52          53          54
     -4.6148356 -10.7546288  20.2621346  14.1032342 -23.6695434  -7.4280964
             55          56          57          58          59          60
     10.6268694 -26.3476376 -27.6793360  -7.4280964  10.6268694  20.2621346
    > qqnorm(res, plot.it = TRUE)
    > qqline(res)
    *The normal quantile (Q-Q) plot indicates that the errors are not normally distributed.
    > ?pchisq
    > pchisq(nematode$deviance,df=nematode$df.residual,lower.tail = FALSE)
  • 20. [1] 0
    Interpretation: The p-value is reported as 0, that is, far below any usual significance level, so the goodness-of-fit test is significant and indicates lack of fit. The residual deviance (49456 on 53 degrees of freedom) is much larger than a Poisson model would produce, which points to overdispersion and shows that the Poisson model does not fit the data.
    > library(AER)
    > dispersion <- dispersiontest(nematode,trafo=1)
    > dispersion
    Overdispersion test
    data: nematode
    z = 4.2675, p-value = 9.884e-06
    alternative hypothesis: true alpha is greater than 0
    sample estimates:
    alpha
    871.0029
    The overdispersion test gives clear evidence of overdispersion: α is estimated to be 871.0029, so the dispersion c = 1 + α is about 872, which speaks strongly against the assumption of equidispersion (α = 0, i.e. c = 1). In summary, the goodness-of-fit test shows that the Poisson model does not fit, ‘dispersiontest’ indicates that the data are overdispersed, and the normal quantile (Q-Q) plot indicates that the errors are not normally distributed. Thus, it is better to fit a Negative Binomial Regression.
    3.4. Negative Binomial Regression Analysis and Its Interpretation
    # Negative Binomial regression
    library(MASS)
    NBREG <- glm.nb(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment), data =
  • 21. walkite_Assignment_)
    NBREG
    summary(NBREG)
    # Checking the model assumption
    library(lmtest)
    lrtest(nematode, NBREG)
    coef <- coefficients(NBREG)
    coef
    IRR <- exp(coefficients(NBREG))
    IRR
    # Predicted values and residuals
    pred <- predict(NBREG, type = "response")   # predicted values
    pred
    res <- residuals(NBREG, type = "deviance")  # deviance residuals
    res
    qqnorm(res, plot.it = TRUE)
    qqline(res)
    ################################
    > library(MASS)
    > NBREG <- glm.nb(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment), data = walkite_Assignment_)
    > NBREG
    Call: glm.nb(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
        factor(treatment), data = walkite_Assignment_, init.theta = 1.923394949,
        link = log)
    Coefficients:
                    (Intercept)             factor(age)>3yrs
                        7.59952                     -0.08562
           factor(age)2yrs-3yrs         factor(species)sheep
  • 22.                 0.06591                     -0.48364
                factor(sex)male     factor(treatment)control
                        0.29248                     -0.29743
    factor(treatment)Ivermectin
                       -0.21070
    Degrees of Freedom: 59 Total (i.e. Null); 53 Residual
    Null Deviance: 79.85
    Residual Deviance: 65.06    AIC: 1005
    > summary(NBREG)
    Call:
    glm.nb(formula = EPG ~ factor(age) + factor(species) + factor(sex) +
        factor(treatment), data = walkite_Assignment_, init.theta = 1.923394949,
        link = log)
    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.9960  -1.0922  -0.1551   0.4745   1.9744
    Coefficients:
                                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)                   7.59952     0.28454   26.708    <2e-16 ***
    factor(age)>3yrs             -0.08562     0.30986   -0.276    0.7823
    factor(age)2yrs-3yrs          0.06591     0.28140    0.234    0.8148
    factor(species)sheep         -0.48364     0.18917   -2.557    0.0106 *
    factor(sex)male               0.29248     0.19578    1.494    0.1352
    factor(treatment)control     -0.29743     0.26745   -1.112    0.2661
    factor(treatment)Ivermectin  -0.21070     0.24985   -0.843    0.3991
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  • 23. (Dispersion parameter for Negative Binomial(1.9234) family taken to be 1)
    Null deviance: 79.853 on 59 degrees of freedom
    Residual deviance: 65.056 on 53 degrees of freedom
    AIC: 1005.4
    Number of Fisher Scoring iterations: 1
    Theta: 1.923
    Std. Err.: 0.326
    2 x log-likelihood: -989.360
    > library(lmtest)
    > nematode <- glm(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment), family = "poisson", data = walkite_Assignment_)
    > NBREG <- glm.nb(EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment), data = walkite_Assignment_)
    > lrtest(nematode,NBREG)
    Likelihood ratio test
    Model 1: EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment)
    Model 2: EPG ~ factor(age) + factor(species) + factor(sex) + factor(treatment)
      #Df    LogLik  Df  Chisq  Pr(>Chisq)
    1   7  -24995.3
    2   8    -494.7   1  49001   < 2.2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Interpretation: In this model check, the associated chi-squared value estimated from
  • 24. 2*(logLik(NBREG) - logLik(nematode)) is 49001 with one degree of freedom, and the p-value (< 2.2e-16) is less than the significance level. This strongly suggests that the negative binomial model, which estimates the dispersion parameter, is more appropriate than the Poisson model.
    > coef <- coefficients(NBREG)
    > coef
                    (Intercept)             factor(age)>3yrs
                     7.59951872                  -0.08562393
           factor(age)2yrs-3yrs         factor(species)sheep
                     0.06591159                  -0.48363536
                factor(sex)male     factor(treatment)control
                     0.29247990                  -0.29743050
    factor(treatment)Ivermectin
                    -0.21070237
    > IRR <- exp(coefficients(NBREG))
    > IRR
                    (Intercept)             factor(age)>3yrs
                   1997.2344396                    0.9179394
           factor(age)2yrs-3yrs         factor(species)sheep
                      1.0681323                    0.6165380
                factor(sex)male     factor(treatment)control
                      1.3397458                    0.7427242
    factor(treatment)Ivermectin
                      0.8100151
    > pred <- predict(NBREG, type="response") # estimate predicted values
    > pred
            1         2         3         4         5         6         7
    2456.2098 2675.7865 2858.0939 2675.7865 2858.0939 2675.7865 2675.7865
            8         9        10        11        12        13        14
    1997.2344 2133.3106 1997.2344 1728.0138 1728.0138 2315.0993 2315.0993
           15        16        17        18        19        20        21
    2315.0993 2315.0993 2315.0993 2315.0993 1728.0138 1728.0138 1361.6661
  • 25.    22        23        24        25        26        27        28
    1361.6661 1824.2864 1824.2864 1824.2864 1824.2864 1824.2864 1824.2864
           29        30        31        32        33        34        35
    1361.6661 1361.6661 1649.7240 1315.2670 1315.2670 1315.2670 1649.7240
           36        37        38        39        40        41        42
    1649.7240 1514.3466 1130.3238 1514.3466 1130.3238 1427.3466 1065.3861
           43        44        45        46        47        48        49
    1226.6436  997.4290 1336.3013 1427.3466 1336.3013 1065.3861 1065.3861
           50        51        52        53        54        55        56
    1226.6436  839.5189  839.5189  976.8806 1308.7717  976.8806  839.5189
           57        58        59        60
    1308.7717 1308.7717  976.8806  839.5189
    > res <- residuals(NBREG, type="deviance") # estimate residuals
    > res
              1           2           3           4           5           6
    -1.03229229 -0.09315152 -0.02837324 -0.77077140  1.97435995 -0.20463991
              7           8           9          10          11          12
    -1.12245881 -1.31177836 -0.19293868  0.23175489 -1.63311996  1.00491029
             13          14          15          16          17          18
     1.48540908  1.18739110 -0.33482919 -0.91794012 -0.07009783 -0.96869028
             19          20          21          22          23          24
    -1.75162396 -0.19161862  0.97087411  0.22971859  0.30126846 -1.79873459
             25          26          27          28          29          30
     0.63951789  0.98797747 -1.14541053 -1.30127519 -0.40687757 -0.83012023
             31          32          33          34          35          36
    -1.84436364  0.66416970 -1.37752773 -1.99254932  1.92366710 -1.99600958
             37          38          39          40          41          42
     0.03237883  0.02398299 -0.01317219  0.95704739 -0.34600465 -1.30165058
             43          44          45          46          47          48
    -1.08206029  0.26433726  0.86351808 -1.71879153  0.47665214  1.05999520
             49          50          51          52          53          54
  • 26. -0.22734653 -0.55289232  0.71465995  0.46009582 -1.21473337 -0.11852987
             55          56          57          58          59          60
     0.47377501 -1.85712317 -1.04996037 -0.11852987  0.47377501  0.71465995
    > qqnorm(res, plot.it = TRUE)
    > qqline(res)
    *The normal quantile (Q-Q) plot indicates that the errors are almost normally distributed. Thus the negative binomial regression fits the data.
    INTERPRETATION
    The interpretation is based on the negative binomial regression analysis because the Poisson model does not fit the data. In the negative binomial regression above, ‘Albendazole’ (treatment), ‘female’ (sex), ‘<-1yrs’ (age) and ‘goat’ (species) were used as the reference levels. Sex and age have no statistically significant effect on EPG count, and neither does the control group. Species has a significant effect on EPG count (p-value = 0.0106). The percentage change in EPG count associated with Ivermectin, relative to the reference treatment, is (exp(-0.21070) - 1) * 100 = -18.998, a reduction of about 19%. Although there is a reduction in EPG count, the effect of Ivermectin is not significant because its p-value is 0.3991. Hence this indicates that the parasites are resistant or that the efficacy of the drug is poor. In general, since the control group (p-value = 0.2661) has
  • 27. a nonsignificant effect on EPG count, resistance of the parasites to both Albendazole and Ivermectin was detected.
    4. REFERENCES
    Cameron, A.C., Trivedi, P.K., 1986. Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics 1, 29-53.
    Cameron, A.C., Trivedi, P.K., 2013. Regression analysis of count data. Cambridge University Press.
    Christopher, B., 2010. Models for Count Data and Categorical Response Data.
    Hilbe, J.M., 2008. Brief overview on interpreting count model risk ratios: An addendum to Negative Binomial Regression. Cambridge University Press, Cambridge.
    Hilbe, J.M., 2011a. Modeling count data. International Encyclopedia of Statistical Science. Springer, pp. 836-839.
    Hilbe, J.M., 2011b. Negative binomial regression. Cambridge University Press.
    Lambert, D., 1992. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34, 1-14.
    Walkite, F., Negesse, M., Anwar, H., 2017. Detection of Antihelmintic resistance in gastrointestinal nematode parasite in small ruminant in Haramaya university farms. pp. 13-19.