KNM XVII 11-14 Juni 2014 ITS, Surabaya
MODELLING ROAD TRAFFIC ACCIDENT
DEATHS IN SOUTH AFRICA USING
GENERALIZED LINEAR MODELS
SHARON OGOLLA¹, SONY SUNARYO², IRHAMAH³
¹ Institut Teknologi Sepuluh Nopember Surabaya, sha.ogolla@gmail.com
² Institut Teknologi Sepuluh Nopember Surabaya, sony_s@statistika.its.ac.id
³ Institut Teknologi Sepuluh Nopember Surabaya, irhamahn@yahoo.com
Abstract
The World Health Organization (WHO) reports that over 1.2 million people die annually
in road accidents, and the number of deaths resulting from road traffic crashes has been
projected to reach 8.4 million in the year 2020. To analyze mortality data, it is necessary
to consider the mortality rates of specific age groups, so that the groups with the highest
prevalence of deaths can be identified. The model is developed using the Generalized
Linear Modeling (GLM) method. The data are first analyzed with Poisson regression
models. The data were found to be over-dispersed, with a variance greater than the mean.
As a result, the Negative Binomial model was used as an alternative, and it was found to
fit the data well. Incremental addition of relevant explanatory variables further expanded
the basic model into a comprehensive model. The analysis shows that the 35–49 age group
has the highest prevalence of road traffic accident deaths, at 26.6%. Females had an
expected death rate 65.4% lower than that of males at all ages. The effect of being in the
35–49 age group, compared with those aged 65 and over, is to multiply the mean death
rate by $e^{\hat\beta} = 0.557$, that is, to decrease the mean death rate by an estimated
44.3%, for both genders.
Keywords : Generalized Linear Models, Negative Binomial Regression, Poisson
Regression, South Africa
1. Introduction
Generalized linear models play a very important role in statistical inference. They
provide a general class of statistical models for quantifying the relationship between a
response variable and a set of explanatory variables. The generalized linear model (GLM),
originally introduced by Nelder and Wedderburn [1], is an extension of the classical linear
model. It includes linear regression models, analysis of variance models, logistic
regression models, Poisson regression models, zero-inflated Poisson regression models,
negative binomial regression models and log-linear models, as well as many other models.
Several studies have applied generalized linear models to real problems. Umar et al. [2]
carried out a study to determine the impact of running headlights on conspicuity-related
motorcycle accidents in Malaysia. A generalized linear model with a Poisson distribution
and log link was used to describe the frequency of conspicuity-related motorcycle
accidents. The explanatory variables were the influence of time trends, changes in the
recording system, the effect of fasting during the month of Ramazan, and Balik Kampong,
a religious holiday unique to the multi-cultural society of Malaysia. To account for
over-dispersion in the data, the quasi-likelihood technique was used. Russo et al. [3] used
Poisson regression models to model the number of deaths in Santo Angelo, Brazil. In
health, Jahangeer et al. [4] used generalized linear models to analyze the factors
influencing exclusive breastfeeding.
Studies by Odero et al. [5] and Balogun et al. [6] have shown that road traffic accidents
are a leading cause of death among adolescents and young adults. There is evidence that
minimum safety standards, improvements in vehicle crashworthiness, seatbelt-use laws and
reduced alcohol use can substantially reduce deaths on the road (Leon [7]). The situation
in developing countries, including South Africa, differs from that in developed countries:
road traffic accidents are increasing with time, and mortality due to road traffic accidents
is also rising (Asogwa [8]). Peden et al. [9] reported that, when population figures are
taken into account, developing countries in Sub-Saharan Africa have the highest frequency
of various accidents worldwide.
In South Africa, 3,280,931 deaths were recorded between 2001 and 2006, of which 9.5%
were due to non-natural causes [10]. Road traffic accident deaths comprised 9.3% of
non-natural deaths. Data from the National Injury Mortality Surveillance System (NIMSS)
showed that in 2005, transport-related injuries accounted for 74.3% of all accidental
(unintentional) deaths [11]. An analysis of the injury burden in South Africa by Norman
et al. [12] showed that the age-standardized road traffic injury mortality rates for South
Africa were about double the global rate for both males and females.
This study aims to provide scientific insights concerning generalized linear models and to
create a platform for future studies that model numbers of deaths using generalized linear
models.
2. Literature Review
A. Generalized Linear Models
Generalized linear models are a natural generalization of classical linear models that
allow the mean of a population to depend on a linear predictor through a non-linear link
function. They also allow the response probability distribution to be any member of the
exponential family of distributions.
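A generalized linear model (GLM) consists of three components:
1. A random component, which specifies the conditional distribution of the response
variable $Y_i$, given the explanatory variables $x_{i1}, \dots, x_{ik}$, as a member of the
exponential family.
2. A linear function of the regression variables, called the linear predictor,
$$\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} = \mathbf{x}_i^{T}\boldsymbol\beta \qquad (1)$$
where $\mathbf{x}_i = (1, x_{i1}, \dots, x_{ik})^{T}$, on which the expected value $\mu_i = E(Y_i)$ of $Y_i$ depends.
3. An invertible link function,
$$g(\mu_i) = \eta_i \qquad (2)$$
which transforms the expectation of the response to the linear predictor. The inverse of
the link function is sometimes called the mean function:
$$\mu_i = g^{-1}(\eta_i) \qquad (3)$$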
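As a small illustration of the three components, the sketch below evaluates the linear predictor (1), the log link (2) and its inverse, the mean function (3), for the log link used later in this paper. The coefficient and covariate values are hypothetical, chosen only to show how η and μ are related; the function names are illustrative.

```python
import math

def linear_predictor(beta, x):
    """Equation (1): eta = beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

def log_link(mu):
    """Equation (2) with the log link: g(mu) = ln(mu)."""
    return math.log(mu)

def inverse_log_link(eta):
    """Equation (3), the mean function: mu = g^{-1}(eta) = exp(eta)."""
    return math.exp(eta)

# Hypothetical coefficients and covariate values, for illustration only
beta = [0.5, -1.2, 0.3]
x = [1.0, 2.0]
eta = linear_predictor(beta, x)   # 0.5 - 1.2 + 0.6 = -0.1
mu = inverse_log_link(eta)        # exp(-0.1); the log link keeps the mean positive
```

With the log link, any real-valued η maps to a positive mean, which is why it is a natural choice for counts.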
B. Poisson Regression Model
The Poisson regression model is a specific type of GLM in which the mean is a non-linear
function of the regression parameters. Poisson regression analysis is a technique for
modelling dependent variables that are counts [13]. The Poisson regression model has
often been applied to estimate standardized mortality and incidence ratios in cohort
studies and in ecological investigations.
The primary equation of the model is the Poisson probability function
$$P(Y_i = y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \dots \qquad (4)$$
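The most common formulation of this model is the log-linear specification
$$\ln \mu_i = \mathbf{x}_i^{T}\boldsymbol\beta \qquad (5)$$
The expected number of events per period is given by
$$E(Y_i \mid \mathbf{x}_i) = \mu_i = \exp(\mathbf{x}_i^{T}\boldsymbol\beta) \qquad (6)$$
The Poisson regression model is a specific type of generalized linear model whose
parameters can be estimated by maximum likelihood, with the likelihood function
$$L(\boldsymbol\beta) = \prod_{i=1}^{n}\frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \prod_{i=1}^{n}\frac{\exp\{-\exp(\mathbf{x}_i^{T}\boldsymbol\beta)\}\,\{\exp(\mathbf{x}_i^{T}\boldsymbol\beta)\}^{y_i}}{y_i!} \qquad (7)$$
and the log-likelihood function
$$\ln L(\boldsymbol\beta) = \sum_{i=1}^{n} y_i\,\mathbf{x}_i^{T}\boldsymbol\beta - \sum_{i=1}^{n}\exp(\mathbf{x}_i^{T}\boldsymbol\beta) - \sum_{i=1}^{n}\ln(y_i!) \qquad (8)$$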
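To make equation (8) concrete, the sketch below computes the Poisson log-likelihood term by term and checks it against the sum of log-probabilities from (4) with $\mu_i = \exp(\mathbf{x}_i^T\boldsymbol\beta)$. The data and coefficients are hypothetical; the first column of `X` plays the role of the intercept.

```python
import math

def poisson_loglik(beta, X, y):
    """Equation (8): sum y_i x_i'b - sum exp(x_i'b) - sum ln(y_i!)."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))               # x_i' beta
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)  # lgamma(y+1) = ln(y!)
    return total

def poisson_logpmf(y, mu):
    """Logarithm of the Poisson probability (4)."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

# Hypothetical data; the first column of X is the intercept
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2, 3, 6, 9]
beta = [0.6, 0.4]

ll = poisson_loglik(beta, X, y)
# The same value obtained by summing log-probabilities with mu_i from (6)
ll_direct = sum(poisson_logpmf(yi, math.exp(sum(b * v for b, v in zip(beta, xi))))
                for xi, yi in zip(X, y))
```

Using `math.lgamma(y + 1)` for $\ln(y!)$ avoids computing large factorials directly.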
C. Solving For Over-dispersion In Poisson Regression
Over-dispersion may be modeled using compound Poisson distributions. With this
model the count y is Poisson distributed with mean λ, but λ is itself a random variable
which causes the variation to exceed that expected if the Poisson mean were fixed [14].
Thus suppose λ is regarded as a positive continuous random variable with probability
density function g(λ). Given λ, the count is distributed as P(λ). Then the probability
function of y is
$$f(y) = \int_{0}^{\infty}\frac{e^{-\lambda}\lambda^{y}}{y!}\,g(\lambda)\,d\lambda \qquad (9)$$
A convenient choice for g(λ) is the gamma probability function G(μ, ν), implying (9) is
NB (μ, κ) where κ = 1/ν. In other words the negative binomial arises when there are
different groups of risks, each group characterized by a separate Poisson mean, and with
the means distributed according to the gamma distribution [14].
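A quick numerical check of the mixture result: the sketch below evaluates the integral in (9) by composite Simpson's rule and compares it with the closed-form negative binomial probability. It assumes that G(μ, ν) denotes a gamma density with mean μ and shape ν (variance μ²/ν), the parameterization under which κ = 1/ν; the helper names and the values μ = 4, ν = 2 are illustrative choices, not taken from [14].

```python
import math

def gamma_pdf(lam, mu, nu):
    """Gamma density with mean mu and shape nu (variance mu**2 / nu); assumes nu > 1."""
    if lam <= 0.0:
        return 0.0
    rate = nu / mu
    return math.exp(nu * math.log(rate) + (nu - 1.0) * math.log(lam)
                    - rate * lam - math.lgamma(nu))

def poisson_pmf(y, lam):
    """Poisson probability P(Y = y | lambda)."""
    if lam <= 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def mixture_pmf(y, mu, nu, upper=60.0, n=12000):
    """Equation (9) by composite Simpson's rule on [0, upper]; n must be even."""
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        lam = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * poisson_pmf(y, lam) * gamma_pdf(lam, mu, nu)
    return total * h / 3.0

def nb_pmf(y, mu, kappa):
    """Closed-form NB(mu, kappa) probability with kappa = 1/nu."""
    r = 1.0 / kappa
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(1.0 / (1.0 + kappa * mu))
                    + y * math.log(kappa * mu / (1.0 + kappa * mu)))

for count in range(5):
    print(count, round(mixture_pmf(count, mu=4.0, nu=2.0), 6),
          round(nb_pmf(count, mu=4.0, kappa=0.5), 6))
```

The two columns agree to numerical-integration accuracy, illustrating that a gamma-distributed Poisson mean yields negative binomial counts.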
D. Negative Binomial Regression Model
The negative binomial distribution can be approached in many ways. Twelve approaches
to the negative binomial distribution have been described, among them as a Poisson–gamma
mixture distribution, as a compound Poisson distribution, as a sequence of Bernoulli
trials, or as the inverse of the binomial distribution [15].
When data are over-dispersed, the common way to account for this is to use a negative
binomial model [15]. Negative binomial regression is a type of generalized linear model
in which the dependent variable Y is a count of the number of times an event occurs.
Statistical comparisons between Poisson and negative binomial regression models confirm
that in most cases the negative binomial represents observed counts better than the
Poisson [15]. Hilbe [15] gives the parameterization of the negative binomial model as
$$f(y_i; \mu_i, \alpha) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(y_i + 1)\,\Gamma(\alpha^{-1})}\left(\frac{1}{1 + \alpha\mu_i}\right)^{\alpha^{-1}}\left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i} \qquad (10)$$
where $\mu_i$ is the mean of $Y_i$ and $\alpha$ is the heterogeneity parameter, so that
$\operatorname{Var}(Y_i) = \mu_i + \alpha\mu_i^{2}$. Hilbe [15] derives this parameterization as a
Poisson–gamma mixture, or alternatively as the number of failures before the $(1/\alpha)$th
success, though we will not require $1/\alpha$ to be an integer. The parameters of the
negative binomial model are estimated using the Newton–Raphson method.
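The sketch below implements the probability function (10) and confirms numerically that, for one illustrative choice μ = 5 and α = 0.4, the probabilities sum to 1, the mean is μ and the variance is μ + αμ² = 15, which exceeds the mean. The function names and parameter values are illustrative only.

```python
import math

def nb2_pmf(y, mu, alpha):
    """Equation (10): NB2 probability with mean mu and heterogeneity alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(1.0 / (1.0 + alpha * mu))
             + y * math.log(alpha * mu / (1.0 + alpha * mu)))
    return math.exp(log_p)

def moments(mu, alpha, y_max=2000):
    """Total probability, mean and variance obtained by summing the pmf up to y_max."""
    probs = [nb2_pmf(y, mu, alpha) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

total, mean, var = moments(mu=5.0, alpha=0.4)   # expect about 1, 5 and 15
```

As α approaches 0 the variance approaches μ, recovering the Poisson case.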
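The likelihood function of the negative binomial model is
$$L(\boldsymbol\beta, \alpha) = \prod_{i=1}^{n}\frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,y_i!}\left(\frac{1}{1 + \alpha\mu_i}\right)^{\alpha^{-1}}\left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i} \qquad (11)$$
Taking logarithms of equation (11) gives the log-likelihood
$$\ln L(\boldsymbol\beta, \alpha) = \sum_{i=1}^{n}\left[\ln\frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,y_i!} + y_i\ln(\alpha\mu_i) - (y_i + \alpha^{-1})\ln(1 + \alpha\mu_i)\right] \qquad (12)$$
where $\mu_i = \exp\{\mathbf{x}_i^{T}\boldsymbol\beta\}$.
Substituting $\mu_i$ into (12), and using $\Gamma(y_i + \alpha^{-1})/\Gamma(\alpha^{-1}) = \prod_{r=0}^{y_i-1}(r + \alpha^{-1})$ for
integer $y_i$ (an empty product equal to 1 when $y_i = 0$), the log-likelihood becomes
$$\ln L(\boldsymbol\beta, \alpha) = \sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\ln(1 + r\alpha) - \ln(y_i!) + y_i\,\mathbf{x}_i^{T}\boldsymbol\beta - (y_i + \alpha^{-1})\ln\left(1 + \alpha\exp\{\mathbf{x}_i^{T}\boldsymbol\beta\}\right)\right] \qquad (13)$$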
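The step from (12) to (13) replaces the ratio of gamma functions with a finite sum, which avoids evaluating Γ at large arguments. The sketch below computes both forms on a small hypothetical dataset and checks that they agree; the data, coefficients and α = 0.35 are illustrative only.

```python
import math

def nb2_loglik_lgamma(beta, alpha, X, y):
    """Equation (12): log-likelihood written with log-gamma functions."""
    inv_alpha = 1.0 / alpha
    ll = 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        ll += (math.lgamma(yi + inv_alpha) - math.lgamma(inv_alpha) - math.lgamma(yi + 1)
               + yi * math.log(alpha * mu) - (yi + inv_alpha) * math.log(1.0 + alpha * mu))
    return ll

def nb2_loglik_sum(beta, alpha, X, y):
    """Equation (13): the gamma-function ratio replaced by a finite sum over r."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))               # x_i' beta
        ll += sum(math.log(1.0 + r * alpha) for r in range(yi))  # empty when y_i = 0
        ll += (-math.lgamma(yi + 1) + yi * eta
               - (yi + 1.0 / alpha) * math.log(1.0 + alpha * math.exp(eta)))
    return ll

# Hypothetical data; the first column of X is the intercept
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 3, 2, 7, 11]
ll_a = nb2_loglik_lgamma([0.7, 0.5], 0.35, X, y)
ll_b = nb2_loglik_sum([0.7, 0.5], 0.35, X, y)
```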
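To maximize the log-likelihood (13), its first partial derivatives are found:
$$\frac{\partial \ln L}{\partial \boldsymbol\beta} = \sum_{i=1}^{n}\frac{(y_i - \mu_i)\,\mathbf{x}_i}{1 + \alpha\mu_i}, \qquad \frac{\partial \ln L}{\partial \alpha} = \sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\frac{r}{1 + r\alpha} + \frac{\ln(1 + \alpha\mu_i)}{\alpha^{2}} - \frac{(y_i + \alpha^{-1})\,\mu_i}{1 + \alpha\mu_i}\right] \qquad (14)$$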
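A standard way to confirm analytic derivatives such as (14) is to compare them with central finite differences of the log-likelihood. The sketch below does this on hypothetical data; `score` follows (14) and `loglik` follows (13), while the step size `h` is an illustrative choice.

```python
import math

def loglik(beta, alpha, X, y):
    """Equation (13): NB2 log-likelihood with mu_i = exp(x_i' beta)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += (sum(math.log(1.0 + r * alpha) for r in range(yi)) - math.lgamma(yi + 1)
               + yi * eta - (yi + 1.0 / alpha) * math.log(1.0 + alpha * math.exp(eta)))
    return ll

def score(beta, alpha, X, y):
    """Equation (14): analytic first derivatives [d/d beta_0..beta_k, d/d alpha]."""
    g = [0.0] * (len(beta) + 1)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        d = 1.0 + alpha * mu
        for j, v in enumerate(xi):
            g[j] += v * (yi - mu) / d
        g[-1] += (sum(r / (1.0 + r * alpha) for r in range(yi))
                  + math.log(d) / alpha ** 2 - (yi + 1.0 / alpha) * mu / d)
    return g

def numeric_score(beta, alpha, X, y, h=1e-6):
    """Central finite differences of loglik, used to check score()."""
    theta = list(beta) + [alpha]
    grad = []
    for k in range(len(theta)):
        up, dn = theta[:], theta[:]
        up[k] += h
        dn[k] -= h
        grad.append((loglik(up[:-1], up[-1], X, y) - loglik(dn[:-1], dn[-1], X, y)) / (2 * h))
    return grad

# Hypothetical data; the first column of X is the intercept
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 2.0]]
y = [0, 3, 2, 5, 9, 4]
analytic = score([0.5, 0.4], 0.3, X, y)
numeric = numeric_score([0.5, 0.4], 0.3, X, y)
```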
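The next step is to calculate the second partial derivatives of the log-likelihood function,
which form the Hessian matrix. The second partial derivatives of the log-likelihood with
respect to the regression coefficients β and the heterogeneity parameter α are:
$$\frac{\partial^2 \ln L}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta^{T}} = -\sum_{i=1}^{n}\frac{\mu_i(1 + \alpha y_i)}{(1 + \alpha\mu_i)^2}\,\mathbf{x}_i\mathbf{x}_i^{T}$$
$$\frac{\partial^2 \ln L}{\partial\boldsymbol\beta\,\partial\alpha} = -\sum_{i=1}^{n}\frac{\mu_i(y_i - \mu_i)}{(1 + \alpha\mu_i)^2}\,\mathbf{x}_i$$
$$\frac{\partial^2 \ln L}{\partial\alpha\,\partial\boldsymbol\beta^{T}} = \left(\frac{\partial^2 \ln L}{\partial\boldsymbol\beta\,\partial\alpha}\right)^{T}$$
$$\frac{\partial^2 \ln L}{\partial\alpha^2} = \sum_{i=1}^{n}\left[-\sum_{r=0}^{y_i-1}\left(\frac{r}{1 + r\alpha}\right)^{2} - \frac{2\ln(1 + \alpha\mu_i)}{\alpha^{3}} + \frac{2\mu_i}{\alpha^{2}(1 + \alpha\mu_i)} + \frac{(y_i + \alpha^{-1})\,\mu_i^{2}}{(1 + \alpha\mu_i)^{2}}\right]$$
Based on these second partial derivatives, the Hessian matrix is
$$\mathbf{H}(\boldsymbol\theta) = \begin{pmatrix} \dfrac{\partial^2 \ln L}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta^{T}} & \dfrac{\partial^2 \ln L}{\partial\boldsymbol\beta\,\partial\alpha} \\[2ex] \dfrac{\partial^2 \ln L}{\partial\alpha\,\partial\boldsymbol\beta^{T}} & \dfrac{\partial^2 \ln L}{\partial\alpha^{2}} \end{pmatrix} \qquad (15)$$
where $\boldsymbol\theta = (\boldsymbol\beta^{T}, \alpha)^{T}$. With $\boldsymbol\beta = (\beta_0, \dots, \beta_k)^{T}$, $\mathbf{H}$ is a $(k+2) \times (k+2)$ matrix.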
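The second derivatives in (15) can be checked the same way as the gradient: the sketch below builds the analytic Hessian and compares it with central differences of the analytic score (14). The data, coefficients and α = 0.3 are hypothetical, chosen only for the check.

```python
import math

def score(beta, alpha, X, y):
    """Equation (14), returned as [d/d beta_0, ..., d/d beta_k, d/d alpha]."""
    g = [0.0] * (len(beta) + 1)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        d = 1.0 + alpha * mu
        for j, v in enumerate(xi):
            g[j] += v * (yi - mu) / d
        g[-1] += (sum(r / (1.0 + r * alpha) for r in range(yi))
                  + math.log(d) / alpha ** 2 - (yi + 1.0 / alpha) * mu / d)
    return g

def hessian(beta, alpha, X, y):
    """Equation (15): second derivatives of ln L with respect to (beta, alpha)."""
    p = len(beta)
    H = [[0.0] * (p + 1) for _ in range(p + 1)]
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        d = 1.0 + alpha * mu
        for j in range(p):
            for k in range(p):
                H[j][k] -= mu * (1.0 + alpha * yi) / d ** 2 * xi[j] * xi[k]
            H[j][p] -= mu * (yi - mu) / d ** 2 * xi[j]
        H[p][p] += (-sum((r / (1.0 + r * alpha)) ** 2 for r in range(yi))
                    - 2.0 * math.log(d) / alpha ** 3
                    + 2.0 * mu / (alpha ** 2 * d)
                    + (yi + 1.0 / alpha) * mu ** 2 / d ** 2)
    for j in range(p):
        H[p][j] = H[j][p]   # the Hessian is symmetric
    return H

def numeric_hessian(beta, alpha, X, y, h=1e-6):
    """Central differences of the analytic score, one column per parameter."""
    theta = list(beta) + [alpha]
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for k in range(n):
        up, dn = theta[:], theta[:]
        up[k] += h
        dn[k] -= h
        g_up = score(up[:-1], up[-1], X, y)
        g_dn = score(dn[:-1], dn[-1], X, y)
        for j in range(n):
            H[j][k] = (g_up[j] - g_dn[j]) / (2 * h)
    return H

# Hypothetical data; the first column of X is the intercept
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 2.0]]
y = [0, 3, 2, 5, 9, 4]
H_analytic = hessian([0.5, 0.4], 0.3, X, y)
H_numeric = numeric_hessian([0.5, 0.4], 0.3, X, y)
```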
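The gradient vector and the Hessian matrix are used in the iterative Newton–Raphson
procedure, which is repeated until the maximization of the log-likelihood converges; the
converged values are used as the estimates of the parameters
$\boldsymbol\theta = (\boldsymbol\beta^{T}, \alpha)^{T}$. The Newton–Raphson algorithm for the negative binomial model
proceeds as follows:
1. Determine initial parameter estimates $\hat{\boldsymbol\theta}^{(0)}$ for iteration $m = 0$.
2. Form the gradient vector $\mathbf{g}(\boldsymbol\theta) = \left(\partial\ln L/\partial\boldsymbol\beta^{T},\ \partial\ln L/\partial\alpha\right)^{T}$ from (14).
3. Form the Hessian matrix $\mathbf{H}(\boldsymbol\theta)$ from (15).
4. Substitute $\hat{\boldsymbol\theta}^{(m)}$ into the elements of $\mathbf{g}$ and $\mathbf{H}$ to obtain $\mathbf{g}(\hat{\boldsymbol\theta}^{(m)})$ and $\mathbf{H}(\hat{\boldsymbol\theta}^{(m)})$.
5. Iterate, starting from $m = 0$, using
$$\hat{\boldsymbol\theta}^{(m+1)} = \hat{\boldsymbol\theta}^{(m)} - \mathbf{H}^{-1}(\hat{\boldsymbol\theta}^{(m)})\,\mathbf{g}(\hat{\boldsymbol\theta}^{(m)})$$
6. Update $m$ to $m + 1$ and repeat until the estimates converge, that is, until
$\lVert\hat{\boldsymbol\theta}^{(m+1)} - \hat{\boldsymbol\theta}^{(m)}\rVert \le \varepsilon$ for a small positive $\varepsilon$.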
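The six steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a hypothetical two-group dataset (an intercept plus a 0/1 indicator), and it adds two safeguards that are not part of the steps as stated, namely step-halving (so that α stays positive and the log-likelihood does not decrease) and a fallback to the gradient direction when the Newton step is not an ascent direction. For this design the maximum likelihood estimate of each group mean equals the group sample mean whatever α is, which gives a convenient check.

```python
import math

def loglik(theta, X, y):
    """Equation (13); theta = [beta_0, ..., beta_k, alpha]."""
    *beta, alpha = theta
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += (sum(math.log(1.0 + r * alpha) for r in range(yi)) - math.lgamma(yi + 1)
               + yi * eta - (yi + 1.0 / alpha) * math.log(1.0 + alpha * math.exp(eta)))
    return ll

def score_and_hessian(theta, X, y):
    """Gradient vector (14) and Hessian matrix (15) evaluated at theta."""
    *beta, alpha = theta
    p = len(beta)
    g = [0.0] * (p + 1)
    H = [[0.0] * (p + 1) for _ in range(p + 1)]
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        d = 1.0 + alpha * mu
        for j in range(p):
            g[j] += xi[j] * (yi - mu) / d
            for k in range(p):
                H[j][k] -= mu * (1.0 + alpha * yi) / d ** 2 * xi[j] * xi[k]
            H[j][p] -= mu * (yi - mu) / d ** 2 * xi[j]
        g[p] += (sum(r / (1.0 + r * alpha) for r in range(yi))
                 + math.log(d) / alpha ** 2 - (yi + 1.0 / alpha) * mu / d)
        H[p][p] += (-sum((r / (1.0 + r * alpha)) ** 2 for r in range(yi))
                    - 2.0 * math.log(d) / alpha ** 3 + 2.0 * mu / (alpha ** 2 * d)
                    + (yi + 1.0 / alpha) * mu ** 2 / d ** 2)
    for j in range(p):
        H[p][j] = H[j][p]                       # symmetry
    return g, H

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def newton_raphson(X, y, theta0, eps=1e-10, max_iter=200):
    """Steps 1-6, plus step-halving and a gradient fallback (added safeguards)."""
    theta = list(theta0)                            # step 1: initial estimates
    for m in range(max_iter):
        g, H = score_and_hessian(theta, X, y)       # steps 2-4
        step = solve(H, g)                          # H^{-1} g
        newton = sum(gi * si for gi, si in zip(g, step)) <= 0  # ascent (or zero) direction?
        if not newton:
            step = [-gi for gi in g]                # fall back to the gradient direction
        current = loglik(theta, X, y)
        t = 1.0
        while True:                                 # step 5, halving the step if needed
            new = [a - t * s for a, s in zip(theta, step)]
            if new[-1] > 0 and loglik(new, X, y) >= current - 1e-12 * abs(current):
                break
            t /= 2.0
            if t < 1e-12:
                return theta, m                     # no further ascent possible
        if newton and max(abs(s) for s in step) <= eps:   # step 6: convergence
            return new, m + 1
        theta = new
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical counts for two groups of eight units (indicator x = 0 or 1)
X = [[1.0, 0.0]] * 8 + [[1.0, 1.0]] * 8
y = [0, 1, 3, 7, 2, 9, 0, 4, 5, 12, 3, 20, 8, 1, 15, 6]
theta_hat, iterations = newton_raphson(X, y, [math.log(6.0), 0.0, 0.5])
```

At convergence `theta_hat[0]` should equal ln(3.25) and `theta_hat[1]` should equal ln(8.75/3.25), the log group means, with a positive α estimate. Production software typically uses more robust variants (for example, iteratively reweighted least squares for β with a separate update for α).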
3. Analysis And Results
A. Descriptive Statistics
From 2001 to 2006, a total of 28,890 people were killed in road traffic accidents in South
Africa, an average of 4,815 deaths per year. Figure 1 shows the age distribution of people
killed in road traffic accidents in South Africa from 2001 to 2006. It is clear from the
figure that young and middle-aged people are most affected by road traffic accidents, and
that males are the main victims. The highest death rate over 2001–2006 was in the male
35–49 age group, at 28.62 deaths per 100,000 population, followed closely by the male
25–34 age group at 25.69 deaths per 100,000 population. The lowest male death rate,
4.42 deaths per 100,000 population, was in the 0–14 age group.
Whilst male death rates peak in the 35–49 age group (as do the death rates for both sexes
combined), female death rates increase roughly linearly from the 0–14 age group to the
65-and-over age group. Thus among females, the elderly experienced the highest death
rates due to road traffic accidents. This is notable given that most people in this age
group are pensioners and retirees who do not travel regularly. Figure 1 also shows that
male road traffic accident death rates rise steeply from childhood to the 35–49 age group
and then decline. Thus the risk of dying in a road traffic accident in South Africa peaks
at ages 35–49.
Figure 1 Age Distribution of people killed in road traffic accidents
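The statements about peaks can be read directly off the rates plotted in Figure 1. The sketch below, using the per-100,000 rates recovered from the figure, finds the peak age group for each sex and the male-to-female rate ratio in each age group; the variable names are illustrative.

```python
# Death rates per 100,000 population, 2001-2006, as plotted in Figure 1
ages = ["0-14", "15-24", "25-34", "35-49", "50-64", "65+"]
female = [3.35, 4.78, 5.75, 7.77, 7.37, 10.05]
male = [4.42, 14.11, 25.69, 28.62, 21.92, 19.6]

def peak(rates):
    """Return (age group, rate) for the highest rate."""
    i = max(range(len(rates)), key=lambda k: rates[k])
    return ages[i], rates[i]

male_peak = peak(male)        # ("35-49", 28.62)
female_peak = peak(female)    # ("65+", 10.05)
# Male-to-female rate ratio in each age group
ratio = {a: m / f for a, m, f in zip(ages, male, female)}
```

The male rate exceeds the female rate in every age group, and in the 35–49 group it is more than three times higher.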
B. Data Analysis
The deviance of the final Poisson model was 1375.22 on 64 degrees of freedom, and the
scaled deviance per degree of freedom (Value/DF = 21.49) was much greater than 1,
indicating over-dispersion (Table 1). Because of this over-dispersion, a negative binomial
model was fitted instead. The negative binomial model fitted well for all the model
specifications considered. For the best model, with all explanatory variables included, the
deviance was 71.95 on 64 degrees of freedom, and both the scaled deviance and the
Pearson χ² divided by their degrees of freedom were close to 1 (1.12 and 1.10
respectively), indicating a good fit (Table 3). The model improves as explanatory variables
are included. Age, population and gender were all significant (p < 0.05). However, the
individual coefficients for the 25–34, 35–49 and 50–64 age groups were not significant
(p > 0.05).
Likelihood ratio (LR) statistics were computed for Type I and Type III analyses (Table 2).
The Type I analysis tests each explanatory variable sequentially, assuming that the
previously entered explanatory variables are already in the model. Over the sequence of
fits, twice the log-likelihood increases by 146.2, from 301685.765 for the intercept-only
model to 301831.96 for the full model that includes population.
Deaths per 100,000 population by age group and gender (data plotted in Figure 1, "Age Distribution of People Killed by Road Traffic Accidents")
Age group   0–14   15–24   25–34   35–49   50–64   65+
Female      3.35    4.78    5.75    7.77    7.37   10.05
Male        4.42   14.11   25.69   28.62   21.92   19.6
Table 1 Poisson Regression Model Information
Distribution          Poisson
Link function         Log
Dependent variable    deaths
Offset variable       l_popn

Criteria for Assessing Goodness of Fit
Criterion                   DF    Value          Value/DF
Deviance                    64    1375.2179      21.4878
Scaled Deviance             64    1375.2179      21.4878
Pearson Chi-Square          64    1467.7835      22.9341
Scaled Pearson X2           64    1467.7835      22.9341
Log Likelihood                    150368.9831
Full Log Likelihood               -961.1206
AIC (smaller is better)           1938.2411
AICC (smaller is better)          1940.5268
BIC (smaller is better)           1956.4545
This is highly significant (p < 0.05) as judged against the χ² distribution. In the
presence of gender, the inclusion of age raises twice the log-likelihood to 301825.00, an
increase of 68.53 over the gender-only fit (139.24 relative to the intercept-only model).
This indicates a much improved fit, achieved at a cost of five degrees of freedom, since
five parameters are associated with the categorical age variable. The statistic has
p < 0.0001 on the χ² distribution with 5 degrees of freedom, indicating that age is highly
significant.
Table 2. LR Statistics for Type I and Type III Analysis
                                         Type I                        Type III
Source       df   2 × Log Likelihood     χ²       p-value        χ²       p-value
Intercept         301685.765
Age           5   301825.00              68.53    <0.0001        73.89    <0.0001
Gender        1   301756.46              70.70    <0.0001        92.16    <0.0001
Popl          1   301831.96               6.96    0.0083          6.96    0.0083
The Type III analysis tests each explanatory variable under the assumption that all other
variables are included in the model. Gender, in the presence of age and population, has a
deviance reduction of χ² = 92.16 with p < 0.0001. Age, in the presence of gender and
population, has χ² = 73.89 with p < 0.0001 (as in the Type I analysis). The population
statistic is unchanged at 6.96, because population is the last variable entered in the
Type I analysis, so its Type I and Type III tests condition on the same set of variables.
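The sequential (Type I) statistics in Table 2 can be recomputed from the reported values. The sketch below assumes that the unlabelled value column is twice the log-likelihood, which is consistent with Table 3, where 2 × 150915.9792 = 301831.9584 matches the full-model entry; the order of entry (gender, then age, then population) is inferred from the differences between rows. Upper-tail χ² probabilities are computed with closed forms valid for integer degrees of freedom; `chi2_sf` is an illustrative helper, not a library function.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with a positive integer number of df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) sum_{j=1}^{(df-1)/2} x^(j-1) / (2j-1)!!
    term, series = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * series)

# Type I entries of Table 2 in order of entry: (source, df, 2 x log-likelihood)
two_loglik = [("Intercept", None, 301685.765), ("Gender", 1, 301756.46),
              ("Age", 5, 301825.00), ("Popl", 1, 301831.96)]

type1 = {}
for (_, _, prev), (name, df, cur) in zip(two_loglik, two_loglik[1:]):
    stat = cur - prev                      # LR statistic = increase in 2 log L
    type1[name] = (stat, chi2_sf(stat, df))
```

The population step gives χ² = 6.96 with p ≈ 0.0083, and the gender and age steps give p-values far below 0.0001, in line with Table 2.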
The Akaike Information Criterion (AIC) was used to select the best model. Table 4 shows
that each explanatory variable added to the model improves the fit; the best model is the
one with the smallest AIC.
Table 3 Negative Binomial Model Information
Distribution          Negative Binomial
Link function         Log
Dependent variable    deaths
Offset variable       l_popn

Criteria for Assessing Goodness of Fit
Criterion                   DF    Value          Value/DF
Deviance                    64    71.9595        1.1244
Scaled Deviance             64    71.9595        1.1244
Pearson Chi-Square          64    70.1381        1.0959
Scaled Pearson X2           64    70.1381        1.0959
Log Likelihood                    150915.9792
Full Log Likelihood               -414.1245
AIC (smaller is better)           846.2490
AICC (smaller is better)          849.1522
BIC (smaller is better)           866.7389
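The information criteria in Tables 1 and 3 follow from the full log-likelihood. The sketch below recomputes them using the standard formulas AIC = −2 ln L + 2k, AICC = AIC + 2k(k+1)/(n − k − 1) and BIC = −2 ln L + k ln n. The inputs n = 72 and k are inferred rather than stated in the paper: the full model has 8 regression parameters (intercept, five age indicators, gender, population), so n = 64 residual df + 8 = 72, and the negative binomial model adds α as a ninth parameter. With these assumptions the reported values are reproduced.

```python
import math

def info_criteria(full_loglik, k, n):
    """AIC, AICC and BIC from the full log-likelihood, k parameters and n observations."""
    aic = -2.0 * full_loglik + 2 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * full_loglik + k * math.log(n)
    return aic, aicc, bic

n = 72                                           # inferred: 64 residual df + 8 parameters
poisson = info_criteria(-961.1206, k=8, n=n)     # Table 1
negbin = info_criteria(-414.1245, k=9, n=n)      # Table 3: 8 regression parameters + alpha
```

Because the negative binomial model's full log-likelihood is far higher, its AIC (846.25) is much smaller than the Poisson AIC (1938.24) despite the extra parameter.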
The fit improves as more explanatory variables are included. Since the scaled deviance per
degree of freedom is close to 1 (1.12), there is no remaining evidence of over-dispersion,
and the negative binomial model was therefore chosen as the best model.
Table 4. Comparison between the Poisson and Negative Binomial models with their
respective AIC and deviance values

Poisson Regression Model
No.  Explanatory Variables        AIC         Scaled Deviance   Value/DF
1    Age                          8833.86     8274.84           125.37
2    Gender                       6428.59     5877.57            83.97
3    Population                   13020.65    12469.62          178.14
4    Age & Gender                 2026.19     1465.17            22.54
5    Age, Gender & Population     1938.24     1375.22            21.49

Negative Binomial Regression Model
No.  Explanatory Variables        AIC         Scaled Deviance   Value/DF
1    Age                          952.82      74.61             1.13
2    Gender                       909.74      73.29             1.05
3    Population                   980.37      76.36             1.09
4    Age & Gender                 851.21      72.64             1.12
5    Age, Gender & Population     846.25      71.96             1.12
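The Value/DF column in Table 4 is the deviance divided by its residual degrees of freedom, n − p. The sketch below recomputes it under the inferred values n = 72 and p = number of regression parameters in each model (intercept plus five age indicators, one gender indicator and one population term, as applicable); small differences in the last digit reflect rounding in the table.

```python
n = 72  # inferred: 64 residual df + 8 parameters in the full model

# Regression parameters per model (intercept + 5 age dummies + gender + population)
n_params = {"Age": 6, "Gender": 2, "Population": 2, "Age & Gender": 7,
            "Age, Gender & Population": 8}

deviance = {
    "Poisson": {"Age": 8274.84, "Gender": 5877.57, "Population": 12469.62,
                "Age & Gender": 1465.17, "Age, Gender & Population": 1375.22},
    "Negative Binomial": {"Age": 74.61, "Gender": 73.29, "Population": 76.36,
                          "Age & Gender": 72.64, "Age, Gender & Population": 71.96},
}

def value_per_df(dev, p, n=n):
    """Deviance divided by its residual degrees of freedom n - p."""
    return dev / (n - p)

ratios = {model: {k: value_per_df(d, n_params[k]) for k, d in devs.items()}
          for model, devs in deviance.items()}
```

Every Poisson ratio is far above 1 (over-dispersion), while every negative binomial ratio is close to 1.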
By the smallest-AIC criterion, model number 5 is the best, with an AIC value of 846.25.
The fitted model was
where the indicator variables represent the age groups 0–14, 15–24, 25–34, 35–49 and
50–64 respectively (with 65 and over as the reference group), a further indicator
represents the female gender, and the remaining covariate represents population.
4. Conclusion
This study has shown that, for over-dispersed data, the negative binomial model performs
better than the Poisson regression model. The Poisson distribution has the special
property that its mean is equal to its variance, whereas over-dispersion means that the
variance is greater than the mean. The negative binomial regression model is more
flexible because it allows the variance to be greater than the mean. The results also
revealed that males are the group most affected by road traffic deaths in South Africa.
Females had an expected death rate 65.4% lower than that of males at all ages. In
comparison with the 65-and-over age group, the estimated death rate was lower by 89.6%
in the 0–14 age group, by 73.3% in the 15–24 age group, by 54.9% in the 25–34 age group,
by 44.3% in the 35–49 age group and by 38.8% in the 50–64 age group, for both genders.
It was also found that the road traffic accident death rate increases with population, so
that larger populations are associated with more deaths. Road traffic accident deaths also
increase as the years go by, so further road safety measures and policies are needed to
reduce road traffic accident deaths in South Africa.
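With a log link, a coefficient β multiplies the mean death rate by $e^{\beta}$, so a percentage decrease of d% corresponds to a rate ratio of 1 − d/100. The sketch below converts the percentages reported above into rate ratios and back-calculated coefficients. The back-calculated β values come from rounded percentages and are illustrative only; they are not the estimates from the fitted model.

```python
import math

# Percentage decreases in the expected death rate reported in the conclusion,
# each relative to its reference group (males for gender, 65+ for age)
pct_decrease = {"female": 65.4, "0-14": 89.6, "15-24": 73.3,
                "25-34": 54.9, "35-49": 44.3, "50-64": 38.8}

def rate_ratio(pct):
    """Multiplicative effect exp(beta) implied by a percentage decrease."""
    return 1.0 - pct / 100.0

def pct_change(beta):
    """Percentage change in the mean death rate for a log-link coefficient beta."""
    return (math.exp(beta) - 1.0) * 100.0

ratios = {k: rate_ratio(v) for k, v in pct_decrease.items()}
betas = {k: math.log(rr) for k, rr in ratios.items()}   # back-calculated from rounded inputs
```

For the 35–49 group this gives the multiplier 0.557 quoted in the abstract, and `pct_change` maps the corresponding coefficient back to a 44.3% decrease.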
REFERENCES
[1] Nelder, J.A. and Wedderburn, R.W.M. (1972). “Generalized linear models”. Journal
of the Royal Statistical Society, Series A (General), 135(3), 370-384.
[2] Radin Umar, R.S., Norghani, M., Hussain, H., Shahrom, B. and Hamdan, M.M. (1998).
Research Report 1, National Road Safety Council Malaysia, Kuala Lumpur.
[3] Russo, S., Flender, D. and da Silva, G.F. (2012). “Poisson Regression Models for
Count Data: Use in the Number of Deaths in the Santo Angelo (Brazil)”. Journal of
Basic & Applied Sciences, 8, 266-269.
[4] Cheika J., Naushad M.K. and Maleika H.M.K. (2009). “Analyzing the factors
influencing exclusive breastfeeding using the Generalized Poisson Regression
model”. World Academy of Science, Engineering and Technology, Vol. 3, 2009-11-29.
[5] Odero, W., Garner, P. and Zwi, A. (1997). “Road traffic injuries in developing
countries: a comprehensive review of epidemiological studies”. Tropical Medicine
and International Health, 2(5), 445-460.
[6] Balogun, J.A. and Abereoje, O.K. (1992). “Pattern of road traffic accident cases in a
Nigerian university teaching hospital between 1987 and 1990”. Journal of Tropical
Medicine and Hygiene, 95(1), 239.
[7] Leon, S.R. (1996). “Reducing death on the road: the effects of minimum safety
standards, publicized crash tests, seat belts, and alcohol”. American Journal of
Public Health, 86(1), 31-3.
[8] Asogwa, S.E. (1992). “Road traffic accidents in Nigeria: a review and a
reappraisal”. Accident Analysis and Prevention, 23(5), 343-35.
[9] Peden, M. (Ed.) (2004). “World Report on Road Traffic Injury Prevention”. World
Health Organization, Geneva.
[10] Statistics South Africa (2008). “Mortality and causes of death in South Africa, 2006:
Findings from death notification”. Statistics South Africa.
[11] Medical Research Council and UNISA (2007). “A profile of fatal injuries in South
Africa: 7th Annual Report of the National Injury Mortality Surveillance System
2005”. MRC/UNISA Crime, Violence and Injury Lead Programme, July 2007.
[12] Norman, R., Matzopoulos, R., Groenewald, P. and Bradshaw, D. (2007). “The high
burden of injuries in South Africa”. Bulletin of the World Health Organization,
85(9), September 2007. WHO, Geneva.
[13] Cameron, A.C. and Trivedi, P.K. (1998). “Regression Analysis of Count Data”.
Cambridge University Press, Cambridge, U.K.
[14] de Jong, P. and Heller, G.Z. (2008). “Generalized Linear Models for Insurance
Data”. International Series on Actuarial Science, Cambridge University Press.
ISBN-13 978-0-511-38877-4.
[15] Hilbe, J.M. (2011). “Negative Binomial Regression” (2nd edition). Cambridge
University Press, New York.

More Related Content

Viewers also liked

PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)
PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)
PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)Pat Lim
 
Hola
HolaHola
6961
69616961
6920
69206920
Регулювання оборотних плугів
Регулювання оборотних плугівРегулювання оборотних плугів
Регулювання оборотних плугів
Николай Завирюха
 
Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah)
Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah) Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah)
Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah)
Hossam Shafiq I
 
Diagnostic imaging in head and neck pathology
Diagnostic imaging in head and neck pathologyDiagnostic imaging in head and neck pathology
Diagnostic imaging in head and neck pathology
Hayat Youssef
 

Viewers also liked (7)

PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)
PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)
PCP - 什么是私募股权投资(PE)基金 (14 Dec 2014)
 
Hola
HolaHola
Hola
 
6961
69616961
6961
 
6920
69206920
6920
 
Регулювання оборотних плугів
Регулювання оборотних плугівРегулювання оборотних плугів
Регулювання оборотних плугів
 
Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah)
Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah) Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah)
Lecture 01 Introduction (Traffic Engineering هندسة المرور & Dr. Usama Shahdah)
 
Diagnostic imaging in head and neck pathology
Diagnostic imaging in head and neck pathologyDiagnostic imaging in head and neck pathology
Diagnostic imaging in head and neck pathology
 

Similar to Makalah Seminar_KNM XVII_ITS

Pedestrian Accident Scenario of Dhaka City and Development of a Prediction Model
Pedestrian Accident Scenario of Dhaka City and Development of a Prediction ModelPedestrian Accident Scenario of Dhaka City and Development of a Prediction Model
Pedestrian Accident Scenario of Dhaka City and Development of a Prediction Model
RafidTahmid1
 
Analysis Of Count Data Using Poisson Regression
Analysis Of Count Data Using Poisson RegressionAnalysis Of Count Data Using Poisson Regression
Analysis Of Count Data Using Poisson Regression
Amy Cernava
 
Assessing spatial heterogeneity
Assessing spatial heterogeneityAssessing spatial heterogeneity
Assessing spatial heterogeneity
Johan Blomme
 
Modeling of driver lane choice behavior with artificial neural networks (ann)...
Modeling of driver lane choice behavior with artificial neural networks (ann)...Modeling of driver lane choice behavior with artificial neural networks (ann)...
Modeling of driver lane choice behavior with artificial neural networks (ann)...
cseij
 
EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...
EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...
EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...
ijcsa
 
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MININGUNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
IJDKP
 
Generalized Additive and Generalized Linear Modeling for Children Diseases
Generalized Additive and Generalized Linear Modeling for Children DiseasesGeneralized Additive and Generalized Linear Modeling for Children Diseases
Generalized Additive and Generalized Linear Modeling for Children Diseases
QUESTJOURNAL
 
research journal
research journalresearch journal
research journal
rikaseorika
 
published in the journal
published in the journalpublished in the journal
published in the journal
rikaseorika
 
journals public
journals publicjournals public
journals public
rikaseorika
 
journal in research
journal in research journal in research
journal in research
rikaseorika
 
Mine Death Estimation
Mine Death EstimationMine Death Estimation
Mine Death Estimation
Jun Steed Huang
 
Mixed Model Analysis for Overdispersion
Mixed Model Analysis for OverdispersionMixed Model Analysis for Overdispersion
Mixed Model Analysis for Overdispersion
theijes
 
General Linear Model | Statistics
General Linear Model | StatisticsGeneral Linear Model | Statistics
General Linear Model | Statistics
Transweb Global Inc
 
Modelo Generalizado
Modelo GeneralizadoModelo Generalizado
Modelo Generalizado
Julio Martinez Andrade
 
Mixed models
Mixed modelsMixed models
Mixed models
Arun Nagarajan
 
Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...
Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...
Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...
International Journal of Power Electronics and Drive Systems
 
Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...
Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...
Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...
TELKOMNIKA JOURNAL
 
Classification with Random Forest Based on Local Tangent Space Alignment and ...
Classification with Random Forest Based on Local Tangent Space Alignment and ...Classification with Random Forest Based on Local Tangent Space Alignment and ...
Classification with Random Forest Based on Local Tangent Space Alignment and ...
International Journal of Modern Research in Engineering and Technology
 
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...
IOSRJM
 

Similar to Makalah Seminar_KNM XVII_ITS (20)

Pedestrian Accident Scenario of Dhaka City and Development of a Prediction Model
Pedestrian Accident Scenario of Dhaka City and Development of a Prediction ModelPedestrian Accident Scenario of Dhaka City and Development of a Prediction Model
Pedestrian Accident Scenario of Dhaka City and Development of a Prediction Model
 
Analysis Of Count Data Using Poisson Regression
Analysis Of Count Data Using Poisson RegressionAnalysis Of Count Data Using Poisson Regression
Analysis Of Count Data Using Poisson Regression
 
Assessing spatial heterogeneity
Assessing spatial heterogeneityAssessing spatial heterogeneity
Assessing spatial heterogeneity
 
Modeling of driver lane choice behavior with artificial neural networks (ann)...
Modeling of driver lane choice behavior with artificial neural networks (ann)...Modeling of driver lane choice behavior with artificial neural networks (ann)...
Modeling of driver lane choice behavior with artificial neural networks (ann)...
 
EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...
EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...
EVALUATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM IN PREDICTION OF THE CAR ...
 
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MININGUNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
UNDERSTANDING LEAST ABSOLUTE VALUE IN REGRESSION-BASED DATA MINING
 
Generalized Additive and Generalized Linear Modeling for Children Diseases
Generalized Additive and Generalized Linear Modeling for Children DiseasesGeneralized Additive and Generalized Linear Modeling for Children Diseases
Generalized Additive and Generalized Linear Modeling for Children Diseases
 
research journal
research journalresearch journal
research journal
 
published in the journal
published in the journalpublished in the journal
published in the journal
 
journals public
journals publicjournals public
journals public
 
journal in research
journal in research journal in research
journal in research
 
Mine Death Estimation
Mine Death EstimationMine Death Estimation
Mine Death Estimation
 
Mixed Model Analysis for Overdispersion
Mixed Model Analysis for OverdispersionMixed Model Analysis for Overdispersion
Mixed Model Analysis for Overdispersion
 
General Linear Model | Statistics
General Linear Model | StatisticsGeneral Linear Model | Statistics
General Linear Model | Statistics
 
Modelo Generalizado
Modelo GeneralizadoModelo Generalizado
Modelo Generalizado
 
Mixed models
Mixed modelsMixed models
Mixed models
 
Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...
Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...
Differential Evolution Algorithm with Triangular Adaptive Control Parameter f...
 
Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...
Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...
Integration Method of Local-global SVR and Parallel Time Variant PSO in Water...
 
Classification with Random Forest Based on Local Tangent Space Alignment and ...
Classification with Random Forest Based on Local Tangent Space Alignment and ...Classification with Random Forest Based on Local Tangent Space Alignment and ...
Classification with Random Forest Based on Local Tangent Space Alignment and ...
 
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...
Application of Semiparametric Non-Linear Model on Panel Data with Very Small ...
 

Makalah Seminar_KNM XVII_ITS

  • 1. KNM XVII 11-14 Juni 2014 ITS, Surabaya 1 MODELLING ROAD TRAFFIC ACCIDENT DEATHS IN SOUTH AFRICA USING GENERALIZED LINEAR MODELS SHARON OGOLLA 1 , SONY SUNARYO 2 , IRHAMAH 3 1 Institut Teknologi Sepuluh Nopember Surabaya, sha.ogolla@gmail.com 2 Institut Teknologi Sepuluh Nopember Surabaya, sony_s@statistika.its.ac.id 3 Institut Teknologi Sepuluh Nopember Surabaya, irhamahn@yahoo.com Abstract World Health Organization (WHO) reports that over 1.2 million people die annually due to road accidents. The numbers of deaths resulting from road traffic crashes have been projected to reach 8.4 million in the year 2020. To analyze the mortality data it is necessary to consider the mortality rate of certain age groups, so that we can find data which shows the prevalence of major groups of deaths. The model is developed by the Generalized Linear Modeling (GLM) method. The analysis of data is followed by subsequent formulation of the Poisson regression models. It was further found that the data analyzed over dispersion variance greater than average. As a result, Negative Binomial model was used as an alternative and it found to fit the data perfectly. Incremental addition of relevant explanatory variables further expanded the basic model into a comprehensive model. At the end of this study, it could be seen through the analysis of the data that age group from 35-49 is prevalent to road traffic accident deaths with 26.6%. Females had an expected death rate of , which is 65.4% lower, at all ages. The effect of being in the 35–49 year age group, compared with 65> year olds, is to multiply the mean death rate by = 0.557, that is to decrease the mean death rate by an estimated 44.3%, for both genders. Keywords : Generalized Linear Models, Negative Binomial Regression, Poisson Regression, South Africa 1. Introduction Generalized linear models play a very important role in statistical inference. 
They provide a mathematical way of quantifying the relationship between a response variable and a set of explanatory variables, and they include a broad class of statistical models. Originally introduced by Nelder and Wedderburn [1], the generalized linear model (GLM) is an extension of the classical linear model. It includes linear regression models, analysis of variance models, logistic regression models, Poisson regression models, zero-inflated Poisson regression models, negative binomial regression models and log-linear models, among many others.

Several studies have applied generalized linear models to real problems. Umar et al. [2] studied the impact of running headlights on conspicuity-related motorcycle accidents in Malaysia, using a generalized linear model with a Poisson distribution and log link to describe the frequency of these accidents. The explanatory variables were time trends, changes in the recording system, the effect of fasting during the month of Ramadan, and Balik Kampong, the festive-season return to home towns that is characteristic of Malaysia's multicultural society. To overcome over-dispersion in the data, the quasi-likelihood technique was used. Russo et al. [3] used generalized linear models in Brazil to model the number of deaths in Santo Angelo. In health, Jahangeer et al. [4] used generalized linear models to analyze the factors influencing exclusive breastfeeding.

Studies carried out worldwide by Odero et al. [5] and Balogun et al. [6] have shown that road traffic accidents are a leading cause of death among adolescents and young adults. There is evidence that minimum safety standards, improved vehicle crashworthiness, seat-belt laws and reduced alcohol use can substantially reduce deaths on the road (Leon [7]). In developing countries, including South Africa, the situation differs from that in developed countries: road traffic accidents are increasing over time, and mortality due to road traffic accidents is also rising (Asogwa [8]). Peden et al. [9] reported that, when population size is taken into account, developing countries in Sub-Saharan Africa have the highest frequency of such accidents worldwide. In South Africa, 3,280,931 deaths were recorded between 2001 and 2006, of which 9.5% were due to non-natural causes [10]. Road traffic accident deaths made up 9.3% of non-natural deaths. Data from the National Injury Mortality Surveillance System (NIMSS) showed that in 2005, transport-related injuries accounted for 74.3% of all accidental (unintentional) deaths [11]. An analysis of the injury burden in South Africa by Norman et al. [12] showed that the age-standardized road traffic injury mortality rates for South Africa were about double the global rate for both males and females.

This study aims to provide scientific insight into Generalized Linear Models and to create a platform for future studies that model numbers of deaths with Generalized Linear Models.

2. Literature Review
A. Generalized Linear Models
Generalized linear models are a natural generalization of classical linear models that allow the mean of a population to depend on a linear predictor through a non-linear link function, and allow the response probability distribution to be any member of the exponential family of distributions. A generalized linear model (GLM) consists of three components:
1. A random component, which specifies the conditional distribution of the response variable $Y_i$ given the explanatory variables $x_{i1}, \dots, x_{ik}$.
2. A linear function of the explanatory variables, called the linear predictor,
$$\eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} = \mathbf{x}_i^T\boldsymbol\beta, \tag{1}$$
where $\mathbf{x}_i = (1, x_{i1}, \dots, x_{ik})^T$, on which the expected value $\mu_i = E(Y_i)$ depends.
3. An invertible link function,
$$g(\mu_i) = \eta_i, \tag{2}$$
which transforms the expectation of the response to the linear predictor. The inverse of the link function is sometimes called the mean function:
$$\mu_i = g^{-1}(\eta_i). \tag{3}$$

B. Poisson Regression Model
The Poisson regression model is a specific type of GLM and is non-linear in its parameters. Poisson regression analysis is a technique for modelling dependent variables that are counts [13]. The Poisson regression model has often been applied to estimate standardized mortality and incidence ratios in cohort studies and in ecological investigations. The probability function underlying the model is
$$P(Y_i = y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \dots \tag{4}$$
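As a small illustration of the three GLM components and equation (4), the sketch below (Python, standard library only) builds the linear predictor, maps it to the mean through the inverse of the log link, and evaluates the Poisson probability function. The function names, the coefficient values and the optional offset argument are illustrative assumptions, not the authors' code; the offset is included only because the models fitted later in this paper use one (l_popn).

```python
import math

def linear_predictor(x, beta, offset=0.0):
    """Equation (1), optionally plus an offset: eta = offset + x^T beta (x[0] = 1 for the intercept)."""
    return offset + sum(b * xj for b, xj in zip(beta, x))

def inverse_log_link(eta):
    """Equation (3) for the log link g(mu) = ln(mu): mu = exp(eta)."""
    return math.exp(eta)

def poisson_pmf(y, mu):
    """Equation (4), evaluated on the log scale to avoid overflow in mu**y and y!."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Hypothetical covariates and coefficients, chosen only for illustration.
mu = inverse_log_link(linear_predictor([1.0, 1.0], [0.5, 0.7]))  # exp(1.2)
probs = [poisson_pmf(y, mu) for y in range(60)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(mean, 6), round(var, 6))  # both equal mu = exp(1.2), about 3.320117
```

The equal mean and variance printed at the end are the equidispersion property that the later sections test against the data.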
The most common formulation of this model is the log-linear specification
$$\ln \mu_i = \mathbf{x}_i^T\boldsymbol\beta. \tag{5}$$
The expected number of events per period is therefore
$$E(Y_i \mid \mathbf{x}_i) = \mu_i = \exp(\mathbf{x}_i^T\boldsymbol\beta). \tag{6}$$
The parameters of the Poisson regression model can be estimated by maximum likelihood, with likelihood function
$$L(\boldsymbol\beta) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \prod_{i=1}^{n} \frac{\exp\{-\exp(\mathbf{x}_i^T\boldsymbol\beta)\}\,\{\exp(\mathbf{x}_i^T\boldsymbol\beta)\}^{y_i}}{y_i!} \tag{7}$$
and log-likelihood function
$$\ln L(\boldsymbol\beta) = -\sum_{i=1}^{n} \exp(\mathbf{x}_i^T\boldsymbol\beta) + \sum_{i=1}^{n} y_i\,\mathbf{x}_i^T\boldsymbol\beta - \sum_{i=1}^{n} \ln(y_i!). \tag{8}$$

C. Solving for Over-dispersion in Poisson Regression
Over-dispersion may be modelled using compound Poisson distributions. In this model the count $y$ is Poisson distributed with mean $\lambda$, but $\lambda$ is itself a random variable, which causes the variation to exceed that expected if the Poisson mean were fixed [14]. Suppose $\lambda$ is regarded as a positive continuous random variable with probability density function $g(\lambda)$. Given $\lambda$, the count is distributed as $P(\lambda)$. Then the probability function of $y$ is
$$f(y) = \int_0^\infty \frac{e^{-\lambda}\lambda^{y}}{y!}\, g(\lambda)\, d\lambda. \tag{9}$$
A convenient choice for $g(\lambda)$ is the gamma probability function $G(\mu, \nu)$, which makes (9) the negative binomial distribution $NB(\mu, \kappa)$ with $\kappa = 1/\nu$. In other words, the negative binomial arises when there are different groups of risks, each characterized by a separate Poisson mean, with the means distributed according to the gamma distribution [14].

D. Negative Binomial Regression Model
The negative binomial distribution can be derived in many ways. Twelve approaches are known, among them the negative binomial as a Poisson-gamma mixture, as a compound Poisson distribution, as the outcome of a sequence of Bernoulli trials, and as the inverse of the binomial distribution [15]. When data are over-dispersed, the common way to account for this is to use a negative binomial model [15]. Negative binomial regression is a type of generalized linear model in which the dependent variable Y is a count of the number of times an event occurs.
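The Poisson-gamma mixture of equation (9) can be checked numerically. The following is a minimal sketch under stated assumptions, not part of the original analysis: it takes $g(\lambda)$ to be a gamma density with mean $\mu$ and variance $\alpha\mu^2$ (shape $1/\alpha$, scale $\alpha\mu$), integrates the Poisson probability against it with Simpson's rule, and compares the result with the closed-form negative binomial probability in the parameterization given as equation (10) below. The values $\mu = 3$, $\alpha = 0.5$ and all function names are arbitrary choices for illustration.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability e^{-lam} lam^y / y!, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def gamma_pdf(lam, shape, scale):
    """Gamma density with mean shape * scale and variance shape * scale**2 (lam > 0)."""
    return math.exp((shape - 1.0) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def mixture_pmf(y, mu, alpha, upper=80.0, n=20000):
    """Equation (9) by composite Simpson's rule on [0, upper], g = gamma(shape 1/alpha, scale alpha*mu)."""
    shape, scale = 1.0 / alpha, alpha * mu

    def integrand(lam):
        # With shape > 1 the gamma density vanishes at lam = 0, so the integrand is 0 there.
        return 0.0 if lam <= 0.0 else poisson_pmf(y, lam) * gamma_pdf(lam, shape, scale)

    h = upper / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def nb2_pmf(y, mu, alpha):
    """Closed-form negative binomial probability with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    - r * math.log1p(alpha * mu)
                    + y * math.log(alpha * mu / (1.0 + alpha * mu)))

mu, alpha = 3.0, 0.5  # illustrative values (shape 1/alpha = 2 > 1)
for y in range(6):
    print(y, f"{mixture_pmf(y, mu, alpha):.8f}", f"{nb2_pmf(y, mu, alpha):.8f}")

mean = sum(y * nb2_pmf(y, mu, alpha) for y in range(300))
var = sum((y - mean) ** 2 * nb2_pmf(y, mu, alpha) for y in range(300))
print(round(mean, 6), round(var, 6))  # 3.0 and 7.5, i.e. mu and mu + alpha * mu**2
```

The two columns agree to the accuracy of the quadrature, and the variance exceeding the mean is exactly the extra flexibility that motivates the negative binomial model.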
Statistical comparisons between Poisson and negative binomial regression models confirm that, in most cases, the negative binomial represents observed counts better than the Poisson [15]. Hilbe [15] gives the parameterization of the negative binomial model as
$$f(y_i; \mu_i, \alpha) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(y_i + 1)\,\Gamma(\alpha^{-1})} \left(\frac{1}{1+\alpha\mu_i}\right)^{\alpha^{-1}} \left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y_i}, \tag{10}$$
where $\mu_i$ is the mean of $Y_i$ and $\alpha$ is the heterogeneity parameter, so that $\mathrm{Var}(Y_i) = \mu_i + \alpha\mu_i^2$. Hilbe [15] derives this parameterization as a Poisson-gamma mixture, or alternatively as the number of failures before the $(1/\alpha)$th success, though we will not require $1/\alpha$ to be an integer. The negative binomial model is estimated with the Newton-Raphson method.
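The Newton-Raphson estimation mentioned above, built from the log-likelihood (13), the score (14), the Hessian (15) and the iteration steps derived next, can be sketched in standard-library Python. This is a minimal sketch under our own assumptions, not the software used in the paper: the digamma and trigamma differences are computed through the exact finite sums $\psi(y+1/\alpha) - \psi(1/\alpha) = \sum_{r=0}^{y-1} 1/(r+1/\alpha)$ and $\psi'(1/\alpha) - \psi'(y+1/\alpha) = \sum_{r=0}^{y-1} 1/(r+1/\alpha)^2$, which hold for integer counts; the step-halving safeguard that keeps $\alpha$ positive is our addition; and the names `nb2_loglik`, `nb2_score_hessian` and `fit_nb2` are hypothetical.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def nb2_loglik(beta, alpha, X, y, offset):
    """Log-likelihood (13) with mu_i = exp(offset_i + x_i^T beta)."""
    total = 0.0
    for xi, yi, oi in zip(X, y, offset):
        mu = math.exp(oi + sum(b * x for b, x in zip(beta, xi)))
        total += (math.lgamma(yi + 1.0 / alpha) - math.lgamma(1.0 / alpha) - math.lgamma(yi + 1)
                  - (yi + 1.0 / alpha) * math.log1p(alpha * mu) + yi * math.log(alpha * mu))
    return total


def nb2_score_hessian(beta, alpha, X, y, offset):
    """Score (14) and Hessian (15) in theta = (beta, alpha); y must hold integer counts."""
    p = len(beta)
    g = [0.0] * (p + 1)
    H = [[0.0] * (p + 1) for _ in range(p + 1)]
    for xi, yi, oi in zip(X, y, offset):
        mu = math.exp(oi + sum(b * x for b, x in zip(beta, xi)))
        d = 1.0 + alpha * mu
        L = math.log(d)
        s1 = sum(1.0 / (r + 1.0 / alpha) for r in range(yi))       # psi(y + 1/a) - psi(1/a)
        s2 = sum(1.0 / (r + 1.0 / alpha) ** 2 for r in range(yi))  # psi'(1/a) - psi'(y + 1/a)
        for j in range(p):
            g[j] += (yi - mu) * xi[j] / d
            H[j][p] -= mu * (yi - mu) * xi[j] / d ** 2
            for k in range(p):
                H[j][k] -= mu * (1.0 + alpha * yi) * xi[j] * xi[k] / d ** 2
        g[p] += (L - s1) / alpha ** 2 + (yi - mu) / (alpha * d)
        H[p][p] += (-2.0 * (L - s1) / alpha ** 3 + mu / (alpha ** 2 * d) - s2 / alpha ** 4
                    - (yi - mu) * (1.0 + 2.0 * alpha * mu) / (alpha * d) ** 2)
    for j in range(p):
        H[p][j] = H[j][p]  # the Hessian is symmetric
    return g, H


def fit_nb2(X, y, offset=None, beta0=None, alpha0=0.5, tol=1e-8, max_iter=100):
    """Newton-Raphson: theta(m+1) = theta(m) - H^{-1} g until the largest change is below tol."""
    offset = offset if offset is not None else [0.0] * len(y)
    theta = (list(beta0) if beta0 is not None else [0.0] * len(X[0])) + [alpha0]
    for _ in range(max_iter):
        g, H = nb2_score_hessian(theta[:-1], theta[-1], X, y, offset)
        step = _solve(H, g)
        new = [t - s for t, s in zip(theta, step)]
        while new[-1] <= 0.0:  # our safeguard, not part of the paper: halve the step to keep alpha > 0
            step = [s / 2.0 for s in step]
            new = [t - s for t, s in zip(theta, step)]
        if max(abs(a - b) for a, b in zip(new, theta)) < tol:
            return new
        theta = new
    raise RuntimeError("Newton-Raphson did not converge")


# Toy intercept-only example with made-up counts (not the paper's data).
X = [[1.0]] * 10
y = [0, 1, 1, 2, 3, 5, 8, 0, 2, 10]
beta_hat, alpha_hat = fit_nb2(X, y, beta0=[math.log(3.2)], alpha0=0.8)
print(math.exp(beta_hat), alpha_hat)  # exp(beta_hat) reproduces the sample mean 3.2
```

In an intercept-only negative binomial model the score for $\beta_0$ vanishes when the fitted mean equals the sample mean, which gives a quick sanity check on the output. Starting values matter for Newton-Raphson; here they are set near the moment estimates.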
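The likelihood function of the negative binomial model is
$$L(\boldsymbol\beta, \alpha) = \prod_{i=1}^{n} \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\, y_i!} \left(\frac{1}{1+\alpha\mu_i}\right)^{\alpha^{-1}} \left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y_i}. \tag{11}$$
Taking logarithms of (11) gives the log-likelihood
$$\ell(\boldsymbol\beta, \alpha) = \ln L(\boldsymbol\beta, \alpha) = \sum_{i=1}^{n} \ln f(y_i; \mu_i, \alpha), \tag{12}$$
where $\mu_i = \exp(\mathbf{x}_i^T\boldsymbol\beta)$. Substituting (10) into (12), the log-likelihood becomes
$$\ell(\boldsymbol\beta, \alpha) = \sum_{i=1}^{n} \Big[ \ln\Gamma(y_i + \alpha^{-1}) - \ln\Gamma(\alpha^{-1}) - \ln(y_i!) - (y_i + \alpha^{-1})\ln\big(1 + \alpha e^{\mathbf{x}_i^T\boldsymbol\beta}\big) + y_i \ln\alpha + y_i\,\mathbf{x}_i^T\boldsymbol\beta \Big]. \tag{13}$$
To maximize (13), the first derivatives are found:
$$\frac{\partial \ell}{\partial \boldsymbol\beta} = \sum_{i=1}^{n} \frac{(y_i - \mu_i)\,\mathbf{x}_i}{1 + \alpha\mu_i}, \qquad \frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{n} \left\{ \frac{1}{\alpha^2}\Big[\ln(1 + \alpha\mu_i) + \psi(\alpha^{-1}) - \psi(y_i + \alpha^{-1})\Big] + \frac{y_i - \mu_i}{\alpha(1 + \alpha\mu_i)} \right\}, \tag{14}$$
where $\psi(\cdot)$ is the digamma function.

The next step is to calculate the second partial derivatives of the log-likelihood, which form the Hessian matrix. The second partial derivatives with respect to the regression coefficients $\boldsymbol\beta$ and the heterogeneity parameter $\alpha$ are:
$$\frac{\partial^2 \ell}{\partial \boldsymbol\beta\, \partial \boldsymbol\beta^T} = -\sum_{i=1}^{n} \frac{\mu_i(1 + \alpha y_i)}{(1 + \alpha\mu_i)^2}\, \mathbf{x}_i\mathbf{x}_i^T,$$
$$\frac{\partial^2 \ell}{\partial \boldsymbol\beta\, \partial \alpha} = -\sum_{i=1}^{n} \frac{\mu_i (y_i - \mu_i)}{(1 + \alpha\mu_i)^2}\, \mathbf{x}_i,$$
$$\frac{\partial^2 \ell}{\partial \alpha\, \partial \boldsymbol\beta^T} = \left(\frac{\partial^2 \ell}{\partial \boldsymbol\beta\, \partial \alpha}\right)^T,$$
$$\frac{\partial^2 \ell}{\partial \alpha^2} = \sum_{i=1}^{n} \left\{ -\frac{2}{\alpha^3}\Big[\ln(1+\alpha\mu_i) + \psi(\alpha^{-1}) - \psi(y_i + \alpha^{-1})\Big] + \frac{\mu_i}{\alpha^2(1+\alpha\mu_i)} - \frac{1}{\alpha^4}\Big[\psi'(\alpha^{-1}) - \psi'(y_i+\alpha^{-1})\Big] - \frac{(y_i - \mu_i)(1 + 2\alpha\mu_i)}{\alpha^2(1+\alpha\mu_i)^2} \right\},$$
where $\psi'(\cdot)$ is the trigamma function. Based on these second partial derivatives, the Hessian matrix is obtained as follows: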
$$\mathbf{H}(\boldsymbol\theta) = \begin{pmatrix} \dfrac{\partial^2 \ell}{\partial \boldsymbol\beta\, \partial \boldsymbol\beta^T} & \dfrac{\partial^2 \ell}{\partial \boldsymbol\beta\, \partial \alpha} \\[2ex] \dfrac{\partial^2 \ell}{\partial \alpha\, \partial \boldsymbol\beta^T} & \dfrac{\partial^2 \ell}{\partial \alpha^2} \end{pmatrix}, \qquad \boldsymbol\theta = (\boldsymbol\beta^T, \alpha)^T, \tag{15}$$
a matrix of size $(k+2) \times (k+2)$. This matrix is used in the iterative Newton-Raphson procedure for maximizing the log-likelihood, and the values to which the iterations converge are used as the estimates of the parameters. The Newton-Raphson algorithm for the negative binomial model proceeds as follows:
1. Determine initial parameter estimates $\hat{\boldsymbol\theta}^{(0)}$ for iteration $m = 0$.
2. Form the gradient vector $\mathbf{g}(\boldsymbol\theta) = \left(\dfrac{\partial \ell}{\partial \boldsymbol\beta^T}, \dfrac{\partial \ell}{\partial \alpha}\right)^T$.
3. Form the Hessian matrix $\mathbf{H}(\boldsymbol\theta)$.
4. Substitute $\hat{\boldsymbol\theta}^{(m)}$ into the elements of the gradient vector and the Hessian matrix to obtain $\mathbf{g}(\hat{\boldsymbol\theta}^{(m)})$ and $\mathbf{H}(\hat{\boldsymbol\theta}^{(m)})$.
5. Iterate, starting from $m = 0$, with
$$\hat{\boldsymbol\theta}^{(m+1)} = \hat{\boldsymbol\theta}^{(m)} - \mathbf{H}^{-1}(\hat{\boldsymbol\theta}^{(m)})\, \mathbf{g}(\hat{\boldsymbol\theta}^{(m)}).$$
6. Update $m$ to $m + 1$ and repeat until the estimates converge, that is, until $\|\hat{\boldsymbol\theta}^{(m+1)} - \hat{\boldsymbol\theta}^{(m)}\| \le \varepsilon$ for a small positive $\varepsilon$.

3. Analysis and Results
A. Descriptive Statistics
From 2001 to 2006, a total of 28,890 people were killed in road traffic accidents in South Africa, an average of 4,815 deaths per year. Figure 1 shows the age distribution of people killed in road traffic accidents in South Africa from 2001 to 2006. Young and middle-aged people are clearly the most affected, and males are the main victims. The highest death rate over 2001-2006 was among males aged 35-49, at 28.62 deaths per 100,000 population, followed closely by males aged 25-34, at 25.69 deaths per 100,000. The lowest male death rate, 4.42 deaths per 100,000, was in the 0-14 age group. Whereas male death rates peak in the 35-49 age group (as do the rates for both sexes combined), female death rates increase roughly linearly from the 0-14 age group to the group aged 65 and over.
Thus, among females, the elderly had the highest death rates from road traffic accidents. At this age, most people are pensioners and retirees and hence do not travel regularly. Figure 1 also shows that male road traffic accident deaths rise steeply from childhood to the 35-49 age group and then decrease again. The peak risk of dying in a road traffic accident in South Africa is therefore at ages 35-49.

Figure 1. Age distribution of people killed in road traffic accidents (deaths per 100,000 population)
Age group    0-14    15-24    25-34    35-49    50-64    65+
Female       3.35    4.78     5.75     7.77     7.37     10.05
Male         4.42    14.11    25.69    28.62    21.92    19.6

B. Data Analysis
The final Poisson model had a deviance of 1375.22 on 64 degrees of freedom, a deviance/DF ratio of 21.49. A ratio far greater than 1 indicates over-dispersion, so a negative binomial model was fitted instead. The negative binomial model fitted adequately in every specification considered, with deviance/DF ratios between 1.05 and 1.13 (Table 4). For the best model, with all the explanatory variables included, the negative binomial deviance was 71.95 on 64 degrees of freedom; the deviance and Pearson chi-square per degree of freedom (1.12 and 1.10) are close to 1, indicating a good fit. Including all the explanatory variables improves the model. Age, gender and population were all significant (p < 0.05); however, the individual coefficients for the 25-34, 35-49 and 50-64 age groups were not significant (p > 0.05).

Likelihood ratio statistics for Type I and Type III analyses were computed (Table 2). The Type I analysis tests each explanatory variable sequentially, assuming that the previously entered variables are already in the model. The Type I column gives twice the log-likelihood after each term enters; its final value, 301831.96, is twice the negative binomial log-likelihood of 150915.9792. With the entry of all three explanatory variables (gender, age and population), twice the log-likelihood increases by 146.2, from 301685.765 for the intercept-only model to 301831.96.
This is highly significant (p < 0.05) as judged against the chi-square distribution. Adding gender to the intercept-only model raises twice the log-likelihood from 301685.765 to 301756.46, an increase of 70.70 on one degree of freedom. In the presence of gender, the inclusion of age brings it up to 301825.00, a further increase of 68.53 (139.24 in total above the intercept-only model). This indicates a much improved fit, achieved at a cost of five degrees of freedom, since there are five parameters associated with the categorical age variable. The statistic has a p-value < 0.0001 on the chi-square distribution with 5 degrees of freedom, indicating that age is highly significant.

Table 1. Poisson regression model
Model Information
Distribution           Poisson
Link Function          Log
Dependent Variable     deaths
Offset Variable        l_popn

Criteria For Assessing Goodness Of Fit
Criterion                    DF    Value          Value/DF
Deviance                     64    1375.2179      21.4878
Scaled Deviance              64    1375.2179      21.4878
Pearson Chi-Square           64    1467.7835      22.9341
Scaled Pearson X2            64    1467.7835      22.9341
Log Likelihood                     150368.9831
Full Log Likelihood                -961.1206
AIC (smaller is better)            1938.2411
AICC (smaller is better)           1940.5268
BIC (smaller is better)            1956.4545

Table 2. LR statistics for Type I and Type III analysis
                     Type I                                 Type III
Source       df      2 ln L         Chi-square   p-value    Chi-square   p-value
Intercept            301685.765
Age          5       301825.00      68.53        <0.0001    73.89        <0.0001
Gender       1       301756.46      70.70        <0.0001    92.16        <0.0001
Popl         1       301831.96      6.96         0.0083     6.96         0.0083

The Type III analysis tests each explanatory variable under the assumption that all the other variables are in the model. Gender, in the presence of age and population, has a likelihood ratio statistic (deviance reduction) of 92.16, with p-value < 0.0001. Age, in the presence of gender and population, has a statistic of 73.89, with p-value < 0.0001, so it is highly significant, as in the Type I analysis. The population statistic is the same in both analyses, 6.96 (p = 0.0083), because population was the last term entered in the Type I sequence.

The Akaike Information Criterion (AIC) was used to select the best model. Table 4 shows that adding explanatory variables improves the fit; the best model is the one with the smallest AIC.
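The p-values in Table 2 can be reproduced from the Type I column, since each likelihood ratio statistic is the increase in twice the log-likelihood when a term enters, referred to a chi-square distribution with the term's degrees of freedom. The sketch below is an illustration, not the authors' code: `chi2_sf` is a hypothetical helper that uses the standard closed-form chi-square upper-tail probability for integer degrees of freedom (Python standard library only), and the entry order gender, age, population is inferred from the differences in the table.

```python
import math

def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square variable with integer k >= 1 degrees of freedom."""
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{j=0}^{k/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, k // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_{j=1}^{(k-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (k - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total

# Twice the log-likelihood from Table 2, in the order the terms entered, with each term's df.
two_loglik = [("intercept only", 301685.765, 0), ("+ gender", 301756.46, 1),
              ("+ age", 301825.00, 5), ("+ population", 301831.96, 1)]
for (_, prev, _), (name, cur, df) in zip(two_loglik, two_loglik[1:]):
    lr = cur - prev  # likelihood ratio statistic for the term just added
    print(f"{name:<14} LR = {lr:6.2f} on {df} df, p = {chi2_sf(lr, df):.2g}")
```

Small differences from the tabulated statistics (for example 68.54 versus 68.53 for age) come from the rounding of the published log-likelihood values.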
Table 3. Negative binomial model
Model Information
Distribution           Negative Binomial
Link Function          Log
Dependent Variable     deaths
Offset Variable        l_popn

Criteria For Assessing Goodness Of Fit
Criterion                    DF    Value          Value/DF
Deviance                     64    71.9595        1.1244
Scaled Deviance              64    71.9595        1.1244
Pearson Chi-Square           64    70.1381        1.0959
Scaled Pearson X2            64    70.1381        1.0959
Log Likelihood                     150915.9792
Full Log Likelihood                -414.1245
AIC (smaller is better)            846.2490
AICC (smaller is better)           849.1522
BIC (smaller is better)            866.7389

The fit improves as the number of explanatory variables increases. Because the scaled deviance per degree of freedom is close to 1, there is no evidence of remaining over-dispersion, and the negative binomial model was therefore chosen as the best model.

Table 4. Comparison of the Poisson and negative binomial models by AIC and scaled deviance

Poisson regression model
No.   Explanatory variables        AIC         Scaled deviance   Value/DF
1     Age                          8833.86     8274.84           125.37
2     Gender                       6428.59     5877.57           83.97
3     Population                   13020.65    12469.62          178.14
4     Age & Gender                 2026.19     1465.17           22.54
5     Age, Gender & Population     1938.24     1375.22           21.49

Negative binomial regression model
No.   Explanatory variables        AIC         Scaled deviance   Value/DF
1     Age                          952.82      74.61             1.13
2     Gender                       909.74      73.29             1.05
3     Population                   980.37      76.36             1.09
4     Age & Gender                 851.21      72.64             1.12
5     Age, Gender & Population     846.25      71.96             1.12
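The information criteria in Tables 1 and 3 follow the usual definitions AIC = -2 ln L + 2p, AICC = AIC + 2p(p+1)/(n-p-1) and BIC = -2 ln L + p ln n, where ln L is the full log-likelihood, p the number of estimated parameters and n the number of observations. The sketch below checks the reported values. The counts are our inference rather than figures stated in the paper: p = 8 for the Poisson model (intercept, five age indicators, gender and population), p = 9 for the negative binomial (adding the heterogeneity parameter), and n = 72, which is consistent with 64 residual degrees of freedom plus 8 coefficients and reproduces the published AICC and BIC. The function name is hypothetical.

```python
import math

def information_criteria(full_loglik, p, n):
    """AIC, small-sample corrected AIC and BIC from the full log-likelihood."""
    aic = -2.0 * full_loglik + 2 * p
    aicc = aic + 2.0 * p * (p + 1) / (n - p - 1)
    bic = -2.0 * full_loglik + p * math.log(n)
    return aic, aicc, bic

n = 72  # inferred: 64 residual degrees of freedom + 8 regression coefficients
for name, loglik, p in [("Poisson", -961.1206, 8), ("Negative binomial", -414.1245, 9)]:
    aic, aicc, bic = information_criteria(loglik, p, n)
    print(f"{name}: AIC = {aic:.4f}, AICC = {aicc:.4f}, BIC = {bic:.4f}")

# Value/DF: deviance divided by its residual degrees of freedom (Tables 1 and 3).
print(round(1375.2179 / 64, 4), round(71.9595 / 64, 4))  # 21.4878 and 1.1244
```

The large drop in AIC from the Poisson to the negative binomial model (1938.24 to 846.25) reflects the much higher log-likelihood, which outweighs the penalty for the one extra parameter.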
By the smallest-AIC criterion, model 5 is the best, with an AIC of 846.25. The fitted model has the form
$$\ln \hat\mu = o + \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_5 x_5 + \hat\beta_6 x_6 + \hat\beta_7 x_7,$$
where $o$ is the offset variable l_popn, $x_1, \dots, x_5$ represent the age groups 0-14, 15-24, 25-34, 35-49 and 50-64 respectively (with 65 and over as the reference group), $x_6$ represents the female gender and $x_7$ represents population.

4. Conclusion
This study has shown that, for over-dispersed data, the negative binomial model is better than the Poisson regression model. The Poisson distribution has the special property that the mean equals the variance, whereas over-dispersion means that the variance is greater than the mean. The negative binomial regression model is more flexible because it allows the variance to exceed the mean.

The results also revealed that most of the people who die in road accidents in South Africa are male. Females had an expected death rate 0.346 times that of males, i.e., 65.4% lower, at all ages. Compared with the group aged 65 and over, the death rate was estimated to be 89.6% lower for the 0-14 age group, 73.3% lower for the 15-24 age group, 54.9% lower for the 25-34 age group and 44.3% lower for the 35-49 age group, for both genders, and 38.8% lower for the 50-64 age group. The road traffic accident death rate was also estimated to increase with population: the larger the population, the greater the number of deaths. Road traffic accident deaths have also been increasing over the years, so more care and better policies are needed to reduce road traffic accident deaths in South Africa.

REFERENCES
[1] Nelder, J.A. and Wedderburn, R.W.M. (1972). "Generalized linear models". Journal of the Royal Statistical Society, Series A, 135(3), 370-384.
[2] Radin Umar, R.S., Norghani, M., Hussain, H., Shahrom, B. and Hamdan, M.M. (1998). Research Report 1, National Road Safety Council Malaysia, Kuala Lumpur.
[3] Russo, S., Flender, D. and da Silva, G.F. (2012). "Poisson regression models for count data: use in the number of deaths in the Santo Angelo (Brazil)". Journal of Basic & Applied Sciences, 8, 266-269.
[4] Cheika, J., Naushad, M.K. and Maleika, H.M.K. (2009). "Analyzing the factors influencing exclusive breastfeeding using the Generalized Poisson Regression model". World Academy of Science, Engineering and Technology, Vol. 3, 2009-11-29.
[5] Odero, W., Garner, P. and Zwi, A. (1997). "Road traffic injuries in developing countries: a comprehensive review of epidemiological studies". Tropical Medicine and International Health, 2(5), 445-460.
[6] Balogun, J.A. and Abereoje, O.K. (1992). "Pattern of road traffic accident cases in a Nigerian university teaching hospital between 1987 and 1990". Journal of Tropical Medicine and Hygiene, 95(1), 239.
[7] Leon, S.R. (1996). "Reducing death on the road: the effects of minimum safety standards, publicized crash tests, seat belts, and alcohol". American Journal of Public Health, 86(1), 31-3.
[8] Asogwa, S.E. (1992). "Road traffic accidents in Nigeria: a review and a reappraisal". Accident Analysis and Prevention, 23(5), 343-35.
[9] Peden, M. (Ed.) (2004). "World Report on Road Traffic Injury Prevention". World Health Organization, Geneva.
[10] Statistics South Africa (2008). "Mortality and causes of death in South Africa, 2006: Findings from death notification". Statistics South Africa.
[11] Medical Research Council and UNISA (2007). "A profile of fatal injuries in South Africa: 7th Annual Report of the National Injury Mortality Surveillance System 2005". MRC/UNISA Crime, Violence and Injury Lead Programme, July 2007.
[12] Norman, R., Matzopoulos, R., Groenewald, P. and Bradshaw, D. (2007). "The high burden of injuries in South Africa". Bulletin of the World Health Organization, 85(9). WHO, Geneva.
[13] Cameron, A.C. and Trivedi, P.K. (1998). "Regression Analysis of Count Data". Cambridge University Press, Cambridge, U.K.
[14] de Jong, P. and Heller, G.Z. (2008). "Generalized Linear Models for Insurance Data". International Series on Actuarial Science, Cambridge University Press. ISBN-13 978-0-511-38877-4.
[15] Hilbe, J.M. (2011). "Negative Binomial Regression" (2nd edition). New York: Cambridge University Press.