Modelling The Extinction (And
Colonisation) Of Species On The
Mascarene Islands
Fergus Boyd-Jones
A thesis presented for the degree of
Master of Science in Statistics
School of Statistics, Mathematics and Actuarial Science
University of Kent
United Kingdom
26th August 2016
Contents

1 Summary
2 Introduction
2.1 Occupancy Modelling
2.2 Existing Approaches For Estimating Time Of Extinction
2.3 Data
2.4 Maximum Likelihood Estimation
3 Likelihood Construction
3.1 Population Models
3.2 Parameters to Estimate
4 Functions For Simulations
5 Parameter Redundancy
6 Linear Population Decline Model
6.1 Confidence Intervals Based on Asymptotic Normality of MLEs
6.2 Likelihood-ratio based CIs
6.3 Simulations
6.4 Real Data
7 Changepoint Models
7.1 MLE Fitting
7.2 Known Changepoint
8 Models Assuming a Common Probability of Individual Detection
8.1 Extinction Times Confidence Intervals
8.2 Parameter Redundancy
8.3 Simulations
9 Discussion
10 Code
11 Acknowledgements
1 Summary
Using occupancy data from the Mascarene Islands provided by Cheke and Hume (2008), we attempted to estimate extinction times for various species native to the islands using the method of maximum likelihood, and used simulations to test the effectiveness of these estimators. Models that assumed the population declined exponentially or linearly were both considered, as well as models that assumed the population was constant until a changepoint at which it begins to decline. Issues such as the sparseness of the data and parameter redundancy hindered efforts, and we were only able to estimate extinction times when the population was assumed to decrease linearly, unless we incorporated external information estimating the population at a given time. Simulations show that the estimated extinction times from the linear model work well when the population is decreasing linearly, and tend to underestimate extinction times when the population is decreasing exponentially.
2 Introduction
2.1 Occupancy Modelling
This thesis builds heavily on concepts related to occupancy modelling. Occupancy modelling is a method which estimates the probability that a site is occupied by a particular species. The general structure of this approach was proposed independently by Hoeting et al. (2000), Young (2002), MacKenzie et al. (2002), and Tyre et al. (2003). The occupancy model requires a number of model assumptions to be made.
In occupancy modelling, surveyors are sent to various sites to record whether or not they detect a member of a species. Detection is recorded as:

$$D_{ij} = \begin{cases} 1 & \text{if at least one individual is observed at site } i \text{ in time period } j \\ 0 & \text{if no individual is observed at site } i \text{ in time period } j \end{cases} \qquad (1)$$
Surveyors do not record how many individuals were detected, nor do they mark individ-
uals for future detection. The data are simply a record of presence or absence of detection
of the species at each site.
To construct the occupancy model, the following parameters are defined: $p$ is the probability of detection given that the site is occupied, and $\psi$ is the probability that a site is occupied. If a species is detected, then it must also be present, therefore the probability of detection in a single trial is $\psi p$. If a species is not detected, it may be present but not detected, $\psi(1-p)$, or not present, $1-\psi$, so the probability of non-detection in a single trial is equal to $\psi(1-p) + (1-\psi)$, or equivalently $1-\psi p$.
If we have $T$ sampling occasions we can also work out the joint probability, or likelihood, of our data. It is assumed that the species is either present or absent at every sampling occasion at the same site. Therefore, if we have even a single sighting at a site, we assume that every non-sighting is a case of present but not detected. This means that if we have a confirmed sighting, we only need to include $\psi$ once for each site.

Assume we have data (010, 000, 110) from three different sites. In the first site, we have one sighting and two non-sightings. The sighting tells us that this site is occupied for each survey occasion, so we write the likelihood as $\psi(1-p)p(1-p)$. In the second site, we have no recorded observations, therefore the site may be occupied or unoccupied. The likelihood is $\psi(1-p)^3 + (1-\psi)$: the probability that a site is occupied and has no sightings, plus the probability that a site is unoccupied. In the third site, the likelihood is $\psi p^2(1-p)$.
The joint probability is simply the product of these probabilities. Finding the values of our parameters that maximise the joint probability will give us the maximum likelihood estimates (MLEs) for this model.
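To make this concrete, here is a minimal R sketch of the single-season occupancy likelihood for the three detection histories above and its numerical maximisation (function and variable names are our own illustration, not the thesis code in Section 10):

# Negative log-likelihood for the single-season occupancy model.
# histories: a list of 0/1 detection vectors, one per site.
occ_negloglik <- function(par, histories) {
  psi <- plogis(par[1])   # occupancy probability (logit-scale input)
  p   <- plogis(par[2])   # detection probability (logit-scale input)
  ll <- sum(sapply(histories, function(h) {
    if (any(h == 1)) {
      # At least one sighting: site known occupied, psi appears once
      log(psi) + sum(h * log(p) + (1 - h) * log(1 - p))
    } else {
      # No sightings: occupied-but-undetected plus unoccupied
      log(psi * (1 - p)^length(h) + (1 - psi))
    }
  }))
  -ll
}

histories <- list(c(0, 1, 0), c(0, 0, 0), c(1, 1, 0))
fit <- optim(c(0, 0), occ_negloglik, histories = histories)
plogis(fit$par)   # MLEs of psi and p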
Single season modelling assumes that the state of each site remains the same between
surveys. This means that occupied sites do not become unoccupied, unoccupied sites do not
become occupied and the probability of detection remains constant.
Multiple season modelling relaxes some of the assumptions of single-season modelling and allows for sites to become occupied or unoccupied between seasons. A season in this case is a longer measure of time, such as a year. The new parameters used in this model are $\gamma_t$, the probability of an unoccupied site being colonised between seasons $t$ and $t+1$, and $\epsilon_t$, the probability of a local extinction at an occupied site between seasons $t$ and $t+1$. $\psi$ becomes the probability that a site is occupied at the first season. For example, suppose we have data (01, 00, 11) from one site over three seasons with two surveys taken each season. We know that the site must have been occupied during the first season as there are observations from this period, therefore the probability is $\psi(1-p)p$. There are no observations during the second season. It is possible that the site remained occupied during this time and no individuals were detected, or it is possible that the species underwent a local extinction but recolonised before the third season. The probability is therefore $(1-\epsilon_1)(1-p)^2 + \epsilon_1\gamma_2$. In the third season, we have observations at both surveys, therefore the probability is $p^2$.
The Royle-Nichols model (Royle and Nichols, 2003) extends this idea and introduces the notion that the probability of detection is related to the abundance of the species: the greater the abundance, the more probable the detection. In occupancy abundance modelling, it is assumed that each individual has an equal and independent probability of detection $r$. The population at each site is considered constant and is denoted $N_i$ at site $i$. The probability of detecting at least one individual of a species may be thought of as the complement of not detecting any individual. The probability of not detecting any one individual is $1-r$, the probability of not detecting a single individual from a population of size $N$ is $(1-r)^N$, therefore the probability of detecting at least one individual can be written as $1-(1-r)^N$. Note that there is no probability of occupancy in this model, as the probability of a site being unoccupied is equivalent to $N = 0$. The Royle-Nichols method assumes that the $N_i$ follow a distribution such as the Poisson or negative binomial distribution.
In this thesis, we do not make the assumption that populations are constant within sites.
We assume that the population is changing (decreasing) according to a predictable population
model. Our objective is to use the method of maximum likelihood to fit a model wherein the population of each species (and therefore the probability of detection) is declining over time, and to use this model to infer the time of extinction (TOE) for each species.
2.2 Existing Approaches For Estimating Time Of Extinction
One method described by Roberts & Solow (2003), the optimal linear model, uses the result that the joint distribution of the $k$ most recent sightings $t_{n-k+1} < t_{n-k+2} < \cdots < t_n$ follows a Weibull distribution regardless of the parent density. There are multiple methods of inference based on this result: a maximum likelihood method, a minimum distance method and an optimal linear estimation method. The optimal linear estimation method is preferred, as the other two fail to give consistent estimators with existent confidence intervals.
2.3 Data
Our data come from over 400 years of historical records from the Mascarene Islands, collected in Cheke and Hume's book Lost Land of the Dodo: An Ecological History of Mauritius, Réunion & Rodrigues, published in 2008. Since most of our data predates the development of formal statistical methods of occupancy modelling, the data are not presented to us in the conventional manner. We do not have sighting occasions - instead the sightings are pooled into bins, wherein a confirmed sighting means that at least one individual of the species was observed at least once in the time period between the dates given.
Our data may be thought of as a series of independent Bernoulli trials where the probability $p$ of detection depends on both the probability $r$ of a single individual being detected (assumed equal and independent across individuals) and the total population at time $t_j$, denoted $N_j$.
It is difficult to visualise the frequency of observations in our data. We can make it easier to visually evaluate our data by using a grouping technique: observations $1$ to $x$ are grouped together and the total number of observations in this group is calculated; next, observations $2$ to $(x+1)$ are counted, and so on. Smaller groups will be more volatile than larger groups and may make random fluctuations appear more significant than they really are. We find that using groups of size 10 works well for our data.
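This moving-window count is straightforward to compute; a minimal R sketch (our own illustration):

# Rolling count of detections in windows of a fixed size.
# detections: 0/1 vector, one entry per time bin.
rolling_count <- function(detections, window = 10) {
  starts <- seq_len(length(detections) - window + 1)
  sapply(starts, function(i) sum(detections[i:(i + window - 1)]))
}

set.seed(1)
dets <- rbinom(70, 1, seq(0.8, 0.05, length.out = 70))  # a declining species
plot(rolling_count(dets), type = "l", ylab = "Detections per 10 bins")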
Figure 1 gives examples of these plots for several of our species. Some of the species, such as the Red Hen, the Mascarene Coot and the Dodo, can clearly be seen to decline in probability of detection in these visualisations, suggesting a decline in population and ultimately extinction. Other species, such as the Common Moorhen and the Whimbrel, appear to be rising in probability of detection, and are unlikely to be worth considering for extinction times.

Figure 1: Visualisation of the frequency of detection for various species (panels: Red Hen, White-throated Rail, Common Moorhen, Mascarene Coot, Whimbrel, Dodo; grouped detection counts against time-bin index).
The aim in this study is to estimate the time of extinction for various species. Our data
includes species for which there is not a single confirmed sighting and also species which were
seen as recently as the year 2000. In the former case, statistical analysis is unnecessary, as
the lack of data means that the MLE time of extinction would occur before or at the exact
time of the first recorded year. In the latter case, the species has clearly not become extinct
during the time period of data collection. However, if the population is declining, we may
still be able to use the data to predict future extinctions.
2.4 Maximum Likelihood Estimation
The likelihood function is a function of the vector of parameters given the data. If the
observations in the data are independent and identically distributed, the likelihood function
is equivalent to the joint distribution. In general, the aim of maximum likelihood estimation
is to find the values of parameters that give the largest value of the likelihood function. If the
likelihood function is relatively simple, this may be done explicitly by differentiating the likelihood function (or equivalently the log-likelihood, which is often easier to differentiate) with respect to the vector of parameters, setting the result equal to zero and solving for each of the parameters. In our case, this is not possible, and we instead use numerical optimisation methods - primarily Nelder-Mead simplex optimisation (Nelder and Mead, 1965) via the optim() function in R's stats package.
3 Likelihood Construction
As explained in the introduction, our model for each species is based on a single population that is believed to be decreasing.

$$\begin{cases} 1-(1-r)^{N_j} & \text{probability of detection at time } t_j \\ (1-r)^{N_j} & \text{probability of no detection at time } t_j \end{cases} \qquad (2)$$

and

$$D_j = \begin{cases} 1 & \text{if at least one individual is detected at time } t_j \\ 0 & \text{if no individual is detected at time } t_j \end{cases} \qquad (3)$$
Since our detection probabilities are believed to be independent, we construct the likeli-
hood simply by taking the product of each probability.
$$\prod_{j=1}^{n} \left( \left(1-(1-r)^{N_j}\right)^{D_j} \cdot \left((1-r)^{N_j}\right)^{1-D_j} \right) \qquad (4)$$
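A hedged R sketch of this likelihood under the exponential population model of Section 3.1 (our own illustration; as Section 5 will show, r and N0 cannot in fact be separately estimated, so a fit of this function will sit on a likelihood ridge):

# Negative log-likelihood (equation 4) under an exponentially declining population.
# D: 0/1 detection vector; parameters transformed so optim() is unconstrained.
negloglik_exp <- function(par, D) {
  N0 <- exp(par[1])       # initial population, kept positive
  r  <- plogis(par[2])    # individual detection probability, kept in (0, 1)
  lambda <- par[3]        # exponential rate of population change
  j <- seq_along(D)
  N <- N0 * exp(lambda * j)   # population at each occasion
  p <- 1 - (1 - r)^N          # P(at least one individual detected)
  -sum(D * log(p) + (1 - D) * log(1 - p))
}

# fit <- optim(c(log(50), qlogis(0.05), -0.05), negloglik_exp, D = mydata)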
3.1 Population Models
At this point, we introduce the parameter $N_0$. $N_0$ represents the population at the unobserved time $t_0$, one unit of time before the first sampling occasion.

There are a number of possible forms the model could take: the population could change exponentially, the population could change linearly, or the population could remain constant before changing. We will consider each of these separately.

An exponential population model assumes that the change in population at any given time is proportional to the population at that time: $N_j = N_0 \exp(\lambda j)$, where $N_0$ is the population at $t_0$, one unit of time before the study began. For the rate parameter ($\lambda$, or $\theta$ in the linear model below):

$$\begin{cases} <0 & \text{population is declining} \\ =0 & \text{population is constant} \\ >0 & \text{population is increasing} \end{cases} \qquad (5)$$
Our linear population model makes the assumption that in every time period the population changes by exactly the same amount, regardless of what the population is. In this case, $N_j = N_0 + \theta j$. One immediate issue with this model is the fact that it allows for negative populations - a physical impossibility which would result in nonsensical probabilities. This can be resolved by defining:

$$N_j = \begin{cases} N_0 + \theta j & j < -N_0/\theta \\ 0 & j \ge -N_0/\theta \end{cases} \qquad (6)$$

so that the population remains constant at zero once it has reached zero.
3.2 Parameters to Estimate
There are three parameters with respect to which we need to maximise our likelihood function: $N_0$, the hypothetical initial population; $r$, the probability of individual detection; and $\lambda$ or $\theta$, the rate at which the population is decreasing for the exponential and linear population models respectively.

This likelihood cannot be maximised symbolically; instead we need to optimise it numerically.
4 Functions For Simulations
We have three main types of models for our population, each beginning from an initial population of $N_0$:

• Constant population: the population remains at a constant $N_0$.
• Linearly changing population: the population at time $j$ is equal to $N_0 + \theta j$.
• Exponentially changing population: the population at time $j$ is equal to $N_0 \exp(\theta j)$.

Given an initial population size and a value for $\theta$, we can easily model the change in population for each of these models.
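A minimal R simulator along these lines (our own illustration, assuming the three population models above):

# Simulate a 0/1 detection history of length n_occ under one population model.
simulate_detections <- function(n_occ, N0, r, theta = 0,
                                model = c("constant", "linear", "exponential")) {
  model <- match.arg(model)
  j <- seq_len(n_occ)
  N <- switch(model,
    constant    = rep(N0, n_occ),
    linear      = pmax(N0 + theta * j, 0),  # floored at zero, as in equation (6)
    exponential = N0 * exp(theta * j))
  p <- 1 - (1 - r)^N                         # detection probability per occasion
  rbinom(n_occ, 1, p)
}

set.seed(42)
simulate_detections(50, N0 = 100, r = 0.01, theta = -2, model = "linear")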
Figure 2: Population and probability of detection over time for the exponential decay model (θ = -0.05, -0.1, -0.2, -0.3).
The change in population affects the probability of observation. If a species has a population of $N$, and each member of the species can be observed with probability $r$, then the probability that at least one individual is observed is $1-(1-r)^N$. Since we are able to model the change in $N$, we are able to derive the probability of observation at each time and use this to simulate datasets.
We are also interested in models that switch at a point $s$ from one model to another. The most useful are models that switch from a constant population to an exponentially or linearly decreasing population. This type of model is useful for species that do not start to decline until a given point, such as the introduction of an invasive species.
Figure 2 shows the population and probability of capture over time following each model. The differences between the models can be seen. Looking first at the exponential model, we notice that the change in probability of detection over time resembles the change in population over time. Since the population never reaches zero, the probability of detection never quite reaches zero either.

From the left-hand graph of Figure 3, we can see that under the linear model the population declines - as you would expect - linearly. $\theta$ is equivalent to the change in population each year: if $\theta = -5$ then the population will decrease by 5 individuals each year until the population reaches zero, at which point it remains at zero. From the right-hand graph of Figure 3, we see that the decline in probability of detection is parabolic; however, the probability of detection does still reach zero at the exact same time as the population reaches zero.

Figure 3: Population and probability of detection over time for the linear decay model (θ = -2, -5, -10, -20).
Despite the fact that all of our datasets start at the same time point in 1595, it is not
necessarily true that researchers were actively looking for each species in our data at this
point. The Striated Heron for example is a species native to the Mascarene Islands, however
it is not observed until 1825-1829 - after which sightings become frequent and it is also
observed in 1830-1834 and 1835-1839. It is possible that these records are accurate, but it is more likely that observations of the Striated Heron prior to 1825 were simply not being recorded. For this reason, we believe that it may be wise not to include years before the first recorded sighting for each species.
The construction of our likelihood function so far has made the assumption that there is
a single observational period at each time point and that each time point is spaced equally
apart. In practice this is not the case. There is not a single sampling occasion in our Mascarene data: a species is recorded as seen if a single member of the species is observed at any point over the time period. In addition, the intervals between sampling occasions are not uniform: they range from 5 to 13 years. There are two primary ways in which we can interpret these sampling occasions.
• A sighting in an interval of x years implies that the species has been seen at least once in those x years. We refer to this as the pooled survey model.
• A single sampling occasion spaced x years from its neighbours. We refer to this as the different interval model.
The second interpretation is much simpler to model: we simply need to evaluate the population model, as we have been doing, at the date of the sampling occasion. Suppose we have a vector of the dates of our surveys (e.g. 1595, 1600, 1605, ...) and subtract from each element the first date minus one, so that we are left with the vector (1, 6, 11, ...); call this vector $s$. If there are $d$ survey occasions, then the general form of the likelihood function may be written:

$$\prod_{j=1}^{d} \left( \left(1-(1-r)^{N_{s_j}}\right)^{D_j} \cdot \left((1-r)^{N_{s_j}}\right)^{1-D_j} \right) \qquad (7)$$
In our data, a sighting at time $s_j$ means that there has been a sighting in at least one of the years $s_j$ to $s_{j+1}-1$. For example, a sighting in the column "1595" means that there has been a sighting in any of the years 1595, 1596, 1597, 1598 or 1599. The last column is the year 2000, and we assume that the range for sightings extends until the year 2006. This method requires a slightly different interpretation of $r$. Previously, $r$ was the probability of a single individual being detected in any given interval. Our new interpretation is that $r$ is the probability of a single individual being detected in a single year. Since we have 70 sampling periods over 405 years, this means that $r$ will be smaller and $\theta$ will be larger (closer to zero).
Under this method we need to redefine $p_j$. As in our simpler model, the probability of at least one observation in any of the years from $s_j$ to $s_{j+1}-1$ is equal to 1 - P(no observations in this period). This may be written as

$$p_j = 1 - \prod_{i=s_j}^{s_{j+1}-1} q^{\exp(\lambda i)} \qquad (8)$$

Now the general form of our likelihood function is

$$\prod_{j=1}^{d} \left( p_j^{D_j} \cdot (1-p_j)^{1-D_j} \right) \qquad (9)$$
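A sketch of the pooled-survey probability (8) and likelihood (9) in R, using the $q$ and $\lambda$ notation of equation (8) (our own illustration, not the thesis code):

# Pooled-survey negative log-likelihood (equations 8 and 9).
# D: 0/1 detections per bin; starts, ends: first and last year covered by each
# bin (relative to the year before the study).
negloglik_pooled <- function(par, D, starts, ends) {
  q <- plogis(par[1]); lambda <- par[2]
  p <- mapply(function(a, b) {
    years <- a:b
    1 - prod(q^exp(lambda * years))  # 1 - P(no sightings in any year of the bin)
  }, starts, ends)
  -sum(D * log(p) + (1 - D) * log(1 - p))
}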
Figures 4 and 5 show the detection probability over time using the actual intervals in our data, for the pooled survey and different interval models respectively. The model for irregularly spaced surveys is very similar to the basic models, where the probability of detection is constantly decreasing as the population decreases. However, in the pooled survey model, we can see that the probability of detection actually rises in the year 1900. This is because prior to this year the bins were at 5-year intervals, but after 1900 they become bins of 12-13 years. More years means more opportunities to detect each species, therefore the probability of detection may rise despite the decreasing population.

Figure 4: Probability of detection over time for the pooled survey model (exponential and linear panels).

Figure 5: Probability of detection over time for the different interval model (exponential and linear panels).
5 Parameter Redundancy
A model $M(\theta)$ with parameters $\theta$ is parameter redundant if $M(\theta)$ can be written as a function of $\beta$, where $\beta$ is a function of $\theta$ and the dimension of $\beta$ is less than the dimension of $\theta$. In plain terms, this means that the model has too many parameters, which cannot all be estimated separately from one another. For example, in the occupancy model with one survey, the parameters $\psi$ and $p$ are only ever encountered as a product of each other. This means that if the MLE of $\psi p$ is $\hat x$, there will be a ridge in the likelihood with an infinite number of solutions defined by $\psi = \hat x/p$. This parameter redundancy may be overcome by performing more than one survey at more than one site. For example, if we have occupancy data from two surveys at three sites with data 00, 01 and 11, the probability of each result would respectively be $\psi(1-p)^2 + (1-\psi)$, $\psi p(1-p)$ and $\psi p^2$. The likelihood may be constructed as the product of these individual probabilities, and the parameter redundancy has been overcome. However, it is not always possible to overcome parameter redundancy in this manner.
Figure 6 shows a contour plot of the log-likelihood evaluated at different values of $r$ and $N_0$. Ideally, we would find a peak where the log-likelihood is maximised; instead we have a ridge where the function is maximised by a potentially infinite number of combinations of $r$ and $N_0$. This means that we have parameter redundancy, which must be accounted for.

Figure 6: Contour plot of the log-likelihood for r and N0.
According to Catchpole and Morgan (1997), a method of identifying parameter redundant models is to form an exhaustive summary $\kappa(\theta)$ of the parameters: $\kappa(\theta)$ is an exhaustive summary if knowledge of $\kappa(\theta)$ uniquely determines $M(\theta)$. In our case, the exhaustive summary consists of the contribution to the likelihood - or equivalently the log-likelihood, which is easier in our case to work with - of an observation and a non-observation at every point in the trial. This would result in a vector of length 140; however, due to the extension theorem (Catchpole and Morgan, 1997), we do not need to consider every single possibility. We can test a model with three sampling occasions for parameter redundancy and extend the results to a model with any number of sampling occasions. Thus, our exhaustive summary
for our model that assumes the population is exponentially decreasing is

$$\kappa(\theta) = \big[\log(1-(1-r)^{N_0 e^{\lambda}}),\ \log(1-(1-r)^{N_0 e^{2\lambda}}),\ \log(1-(1-r)^{N_0 e^{3\lambda}}),\ \log((1-r)^{N_0 e^{\lambda}}),\ \log((1-r)^{N_0 e^{2\lambda}}),\ \log((1-r)^{N_0 e^{3\lambda}})\big].$$

The next step is to find the matrix of partial derivatives of this exhaustive summary with respect to $\theta$. Our parameters are $N_0$, $r$ and $\lambda$, and our exhaustive summary has length 6, therefore we create a $6 \times 3$ derivative matrix $D$ (Figure 7) by finding the derivative of each element of $\kappa(\theta)$ with respect to each of our parameters.

Figure 7: Derivative matrix.
The rank of this matrix is 2, which again tells us that we are able to estimate two parameters. We can determine which parameters are estimable by finding solutions to $\alpha^T D = 0$. If a field in the vector is equal to 0, then that parameter is estimable. The only solution to our problem is:

$$\alpha^T = \left[\, 1 \quad 0 \quad \frac{N}{(1-r)\ln(1-r)} \,\right] \qquad (10)$$

The second field corresponds to $\lambda$, which tells us that $\lambda$ is always estimable.
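This rank calculation can also be reproduced numerically; a sketch in R using the numDeriv package (an assumption on our part - the thesis does not specify its tooling):

library(numDeriv)  # provides jacobian(); assumed to be installed

# Exhaustive summary for three sampling occasions under the exponential model:
# log-probabilities of detection and of non-detection at occasions 1 to 3.
kappa <- function(par) {
  N0 <- par[1]; r <- par[2]; lambda <- par[3]
  N <- N0 * exp(lambda * (1:3))
  c(log(1 - (1 - r)^N), log((1 - r)^N))
}

# 6x3 derivative matrix at a generic parameter point; its numerical rank is 2,
# not 3, confirming that only two quantities are estimable.
Dmat <- jacobian(kappa, c(100, 0.05, -0.1))
qr(Dmat)$rank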
Parameter redundancy cannot be overcome with more datapoints - we must either change the model or reparameterise and interpret the new parameters.

For each of our three population models, it is impossible to separately estimate $r$ and $N_0$. We can, however, reparameterise our likelihood in order to derive new parameters which we are able to estimate and interpret.

For each of our models, we find that $(1-r)^{N_0}$ may be reparameterised as $q$, the probability of no individuals being detected on the first sampling occasion. In the exponential case, $(1-r)^{N_0 \exp(\lambda t_i)}$ simply becomes $q^{\exp(\lambda t_i)}$. $\lambda$ is unaffected by this reparameterisation and still measures the population decline, despite the fact that we do not know what the population is. The linear population model requires another change of parameter. Define $k = \theta/N_0$. We can now rewrite $(1-r)^{N_0+\theta}$ as $(1-r)^{N_0(1+k)}$; substituting $q$, we arrive at $q^{1+k}$.

$k$ can be thought of as the constant change in population relative to the original population. Like $\theta$, positive values of $k$ correspond to an increasing population, negative values of $k$ correspond to a decreasing population, and the population is constant when $k = 0$. If $k$ is negative (i.e., the population is decreasing) then the point of extinction will occur at time $-1/k$. This means that, unlike the exponential model, it is still possible to derive the MLE extinction times from the linear model.

This parameter redundancy is caused by the fact that there are an infinite number of possible combinations of $N_0$ and $r$ that will give the same $q$, and therefore it is not possible to distinguish between them. If $r$ is known, or there is an estimate for the population at any time, these can be substituted into the model and used to find the true values.
The reparameterised likelihood for the exponential model is then

$$\prod_{j=1}^{n} \left( \left(1-q^{\exp(\lambda t_j)}\right)^{D_j} \cdot \left(q^{\exp(\lambda t_j)}\right)^{1-D_j} \right) \qquad (11)$$

and we will consistently be able to find MLEs for $\lambda$ and $q$.
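A sketch of the reparameterised likelihood (11) in R (our own illustration; compare negloglik_exp above, which suffers the ridge in Figure 6):

# Negative log-likelihood in the identifiable (q, lambda) parameterisation.
negloglik_q <- function(par, D) {
  q <- plogis(par[1]); lambda <- par[2]
  t <- seq_along(D)
  pnone <- q^exp(lambda * t)   # P(no detection at time t)
  -sum(D * log(1 - pnone) + (1 - D) * log(pnone))
}

# fit <- optim(c(0, -0.05), negloglik_q, D = mydata)  # a unique peak, as in Figure 9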
Figure 8 shows the probability of detection over time for the reparameterised model. If the initial probability of detection is the same - and, in the case of the linear model, the parameter $k$ is chosen to be the equivalent of $\theta$ - there should not be any differences between this model and the simple model.

Figure 9 shows an example of a contour plot of the log-likelihood for different values of $q$ and $\lambda$. Notice that unlike the contour plot shown in Figure 6, there is a clear point where the log-likelihood peaks.
Figure 8: Probability of detection over time for the parameter-redundancy-fixed models (linear panel: k = -0.02, -0.05, -0.1, -0.2; exponential panel: λ = -0.05, -0.1, -0.2, -0.3).
Figure 9: Contour plot of the log-likelihood for q and λ.
6 Linear Population Decline Model
It is likely that assuming the population decreases linearly is less realistic than assuming that the population decreases exponentially; however, it has the advantage that the probability of detection will eventually reach, rather than merely approach, zero. This means that we are able to estimate extinction times despite being unable to estimate the individual probability of capture or the population at any given time.

The probability of detection at time $t_j$ is defined as:

$$1 - q^{1+t_j k} \qquad (12)$$

We know that we are able to estimate $k$, and that this probability will be equal to zero when $t_j = -1/k$.
6.1 Confidence Intervals Based on Asymptotic Normality of MLEs
If $\hat\theta$ is the MLE of a parameter vector $\theta$, then $\hat\theta$ is asymptotically distributed as $N(\theta, 1/(nI(\theta)))$, where $I(\theta)$ is the Fisher information, defined as $-E\left[\frac{\partial^2 \log f(X;\theta)}{\partial \theta^2}\right]$. In practice, we are unable to find this exactly, as we are maximising our likelihood functions numerically rather than symbolically; however, the negative Hessian matrix $H$ as given by R may be used in place of $nI(\theta)$ in this formula. Therefore we may create 95% confidence intervals using the formula:

$$\hat\theta \pm z_{\alpha/2} \cdot \frac{1}{\sqrt{H(\hat\theta)}} \qquad (13)$$
Since we are only interested in the distribution of $k$ (as $k$ wholly defines the point of extinction), we do not need to consider confidence intervals for $q$, although these could be constructed using the same method.

This method is primitive and comes with several disadvantages. In order to generate a Hessian matrix, it must be possible to calculate the gradient of the function. Given the nature of our data, certain choices of parameter will lead to events with probability zero occurring and events with probability effectively one not occurring. This leads to a negative log-likelihood being calculated as infinite, and makes the gradient - and therefore the Hessian matrix - impossible to calculate for the majority of our data.

This method also assumes that $\hat k$ is asymptotically normally distributed; however, this assumption may not be reasonable. Normally distributed variables may range over the entire real line, whereas $k$ is much more tightly constrained. If $\hat k$ is close to $-1/t_{\text{last sighted}}$ and
the standard errors are large, this could result in the time of extinction having a lower bound
that is earlier than the last sighting.
6.2 Likelihood-ratio based CIs
Another way of obtaining confidence intervals for $\hat k$ is to use likelihood ratios. We know that the distribution of $2(l_1 - l_0)$ is asymptotically chi-square with degrees of freedom equal to $c$, where $c$ is the difference in the dimension of the parameter space between the two models. Once we have found the value of $\hat k$ that minimises the negative log-likelihood, we find the maximum and minimum values of $k$ such that the difference between the negative log-likelihoods is not significant at the 95% level according to the chi-square distribution. To do this we calculate the profile likelihood for $k$ and find the minimum and maximum values such that $l(\hat k_{\min})$ and $l(\hat k_{\max}) \le l(\hat k) + \chi^2_{0.975}(c)/2$.
An advantage of this method is that it will work for every species. This CI will not necessarily be symmetrical, which is likely to be more natural, as the time of extinction has a lower bound (the last sighting) and no upper bound. It is impossible for this method of forming confidence intervals to place the lower bound before the last observation, as this would mean an event with probability zero (observing an extinct species) had occurred, which would have an infinite negative log-likelihood. It is also possible, if the data are sparse, that the upper bound will be extraordinarily large. For example, suppose we have 70 observational periods for a species which is detected on the tenth observational period and never again. Fitting our linear model gives us the output:
our linear model gives us the output:
> optim(c(0,0),LikeP, data=data, meth=LinP)
$par
[1] 2.13038976 -0.05181823
$value
[1] 3.917649
According to this model, the most likely time of extinction is $-1/(-0.05181823) \approx 20$, with a negative log-likelihood of 3.92. Comparing this to a model where $k$ is fixed at 0 (the population is constant), we find a negative log-likelihood of $-\log\left(\frac{1}{70}\cdot\left(1-\frac{1}{70}\right)^{69}\right) = 5.241318$, a difference of 1.32 from our linearly decreasing model, which is not significant. This means that the upper bound will be infinite, as there is no significant evidence
that the population is decreasing at all. Attempting to find an upper bound for these species can be computationally exhausting, and therefore we have set an arbitrary limit: the upper bound cannot go beyond the year 2000. If there is no significant difference between the last year of available data and the MLE time of extinction, then we interpret this as a possibility that the species has not gone extinct at all.
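A minimal R sketch of this profile-likelihood search for the linear model (our own illustration; the grid, bounds and chi-square cutoff are assumptions):

# 95% profile-likelihood interval for k: scan a grid of k, re-optimising q at
# each value, and keep the k whose excess negative log-likelihood is below the
# chi-square cutoff.
profile_ci_k <- function(D, k_grid, cutoff = qchisq(0.95, df = 1) / 2) {
  nll_at_k <- function(k) {
    optimize(function(lq) {
      q <- plogis(lq)
      t <- seq_along(D)
      pnone <- q^pmax(1 + k * t, 0)            # exponent 0 once extinct: pnone = 1
      pnone <- pmin(pmax(pnone, 1e-12), 1 - 1e-12)
      -sum(D * log(1 - pnone) + (1 - D) * log(pnone))
    }, interval = c(-10, 10))$objective
  }
  nll <- sapply(k_grid, nll_at_k)
  range(k_grid[nll <= min(nll) + cutoff])      # smallest and largest acceptable k
}

# A CI for the TOE follows as -1/k evaluated at the two endpoints.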
6.3 Simulations
Simulations were run at various values of $q$ and $k$ in order to determine which values are associated with the highest precision and accuracy. Since the Hessian matrix method of generating CIs has been found in practice to work on very few datasets, it is ignored for this section; instead we will exclusively be considering likelihood-ratio based CIs.

Values of $q$ ranging from 0.1 to 0.9 in steps of 0.1 were considered. Values of 0 and 1 were left out for different reasons. A value of 0 means that the probability of detection at time $t_j$ is equal to $1 - 0^{1+t_j k}$; clearly $0^{1+t_j k}$ will always be equal to zero unless $(1+t_j k) = 0$, i.e. the population has gone extinct. This means that detection will be perfect (a vector of ones) until the species goes extinct, at which point the probability of detection will instantly switch to zero. A value of 1 means that the species is already extinct by $t_0$, and therefore simulating this value will just produce a vector of zeroes. Values of $k$ were chosen so as to give extinction points at every ten-year interval. Each possible combination of $q$ and $k$ was simulated with 100 observational periods 1000 times. The variance of the estimated extinction time about the true value was calculated for each combination from the squared differences between the true and estimated values. 95% confidence intervals were calculated in order to empirically investigate how well this CI works.
In Table 1, the column names refer to the true time of extinction and the row names refer to the initial value of $q$. The proportion of estimates for which the true extinction time falls within the 95% confidence interval may be seen as a measurement of the estimation's accuracy. Considering $q$ and $k$ separately, we see that the estimation is least accurate for large values of $q$. The accuracy increases as $q$ decreases, peaking somewhere around 0.3, at which point the accuracy begins to fall again. There appears to be a plateau between 0.6 and 0.4.
        10    20    30    40    50    60    70    80    90    100
0.1    86.9  91.8  94.8  94.9  93.5  95.5  94.8  95.9  96.1  95.1
0.2    87.9  93.5  94.1  95.2  93.5  94.8  95.8  96.4  96.0  95.1
0.3    90.6  95.0  93.8  95.2  95.8  96.0  96.4  95.8  95.4  94.1
0.4    89.6  92.1  94.2  94.1  95.4  95.5  94.8  95.0  95.3  95.3
0.5    88.9  93.9  95.0  95.0  95.0  94.2  95.2  94.7  95.4  93.9
0.6    90.2  93.5  94.0  94.5  94.6  94.3  95.3  95.7  95.3  94.2
0.7    88.7  93.6  93.8  93.5  93.6  95.5  95.1  96.4  95.0  93.3
0.8    88.7  93.3  93.4  91.9  94.1  94.1  95.0  94.0  94.6  92.9
0.9    93.0  96.0  96.5  94.3  94.4  94.1  93.0  90.7  89.8  86.5

Table 1: Coverage estimates (%) for different values of q (rows) and true time of extinction (columns)

A greater proportion of estimated CIs contain the true extinction time as $k$ becomes larger (i.e., the true extinction point is farther away from the first observational period); however, coverage begins to decrease when the true time to extinction is greater than 70. This is likely due to
the fact that the true extinction period is close to the end of the study, and therefore the
trailing zeroes associated with extinction are not enough to confirm extinction.
The greater coverage is likely not due to greater accuracy but rather to wide confidence intervals caused by flatter likelihoods. A larger $q$ means that the initial probability of capture is smaller, and a smaller (further below zero) $k$ means that the probability of detection decreases faster. If we simulate 100 sampling occasions where $q = 0.9$ and $k = -0.1$, we find that the probability that we do not detect a single individual on any sampling occasion is 0.62. This means that for 62% of trials with these parameters the MLE will be $q = 1$ (probability of detection 0 for every sampling occasion), and the choice of $k$ will be arbitrary and have no effect on the value of the likelihood (which will be 1). This means that the algorithm for determining confidence intervals will continue to run until it reaches the bounds we set, resulting in confidence intervals for the time of extinction that range from one to positive infinity. However, in the 38% of trials with at least one detection (in the first 10 observational periods) the confidence intervals will tend to be narrow.
The other situation that will often lead to wide confidence intervals is when the true time of extinction is close to the last observational period, especially if the initial probability of detection is already low. If we choose parameter values $q = 0.9$ and $k = -0.01$, then the probability of detection in the first trial is 0.099 and the probability of detection in the ninety-ninth trial is 0.001. If we choose $q = 0.1$ then the equivalent probabilities are 0.90 and 0.02. In the latter case the probability of detection decreases by a factor of 45, meaning that there is usually a clear drop in the frequency of detections as the experiment goes on. In the former case the probability of detection decreases by a factor of nearly 100, but both probabilities are small; the sparseness of the data means it is difficult to tell the difference between a decreasing probability and a constant low probability of detection.

        10     20     30     40     50     60     70     80      90       100
0.1    1.979  2.652  2.897  3.218  3.676  3.808  4.005  4.098   4.274    5.998
0.2    2.512  3.199  3.566  3.953  4.390  4.505  4.769  4.984   5.340    7.179
0.3    2.885  3.483  4.194  4.411  5.002  5.277  5.361  5.992   6.193    9.216
0.4    3.619  4.671  4.995  5.465  6.007  6.390  6.542  7.021   7.366   11.318
0.5    4.193  5.208  5.662  6.204  7.270  7.513  7.843  8.006   8.788   85.58
0.6    4.934  6.322  7.363  8.089  8.560  9.204  9.420  9.505  17.961   60.216
0.7    5.684  7.899  9.169 10.288 10.973 11.192 11.908 11.951  18.306  136.349
0.8    6.842 10.531 12.729 13.989 15.040 15.764 16.448 18.013  72.092 7912.621
0.9    7.850 13.852 18.722 21.157 23.385 25.983 31.978 88.61 1324.906 2445.194

Table 2: Variance of the estimated time of extinction about the true time of extinction, for different values of q (rows) and true extinction time (columns)
In Table 2 we can see the variance of the estimated time of extinction from the true
time of extinction over our 1000 simulations. The variance becomes larger as the true time
of extinction and q increase, likely due to the fact that the data is sparser under these
conditions.
These simulations are based on a model for which the population is actually linearly
decreasing. This is unlikely to truly be the case for our data. However there are many
obstacles in determining how well our linear model works when applied to exponential data.
The first obstacle is that there is no definitive extinction time for the exponential model as, by definition, the population will approach but never reach zero. In order to test our model's effectiveness, datasets following the exponential population model were simulated for various values of $N_0$ and $\lambda$, and a linear model was fitted. Let $t_e$ be the estimated time of extinction; we are interested in finding the true population (according to the simulated exponential model) at time $t_e$. To do this we use the formula $N_0 \exp(\lambda t_e) = N_e$.

        -0.1  -0.0775  -0.055  -0.0325  -0.01
50      3.52   2.63     2.18    2.80    11.90
100     3.88   3.30     2.65    4.37    21.77
150     4.55   3.45     3.19    6.27    31.90
200     4.94   3.79     3.62    8.10    42.51
250     4.62   3.93     3.82    9.77    52.96

Table 3: Median true population at the estimated time of extinction for different values of N0 (rows) and λ (columns)

Table 3 shows the median true remaining population at the estimated time of extinction. The surviving population seems to be greater when $N_0$ is larger and when $\lambda$ is closer to zero. Both of these conditions are associated with later extinction times, so it seems that the model fails when the species does not become extinct during the course of the experiment. All of these values are greater than 1, meaning that the estimated time of extinction is consistently earlier than the true time of extinction. For this reason it is worthwhile to consider the upper 95% confidence interval as an alternative estimate.

        -0.1  -0.0775  -0.055  -0.0325  -0.01
50      0.12   0.19     0.26    0.49    0
100     0.57   0.52     0.82    1.54    0
150     1.02   1.03     1.20    2.97    0
200     1.10   1.22     1.54    4.09    0.36
250     1.43   1.41     1.75    5.22    0.33

Table 4: Median true population at the upper 95% CI for different values of N0 (rows) and λ (columns)
Table 4 shows the median true remaining population at the time of the upper 95% CI for the TOE. Having accounted for the influence of the extreme outliers, these results are more acceptable; however, they still tend to underestimate the time of extinction. There is a very clear pattern that the remaining population gets larger when the initial population is larger and when the rate of decline is slower.

Choosing the upper bound as an alternative estimate gives a more conservative estimate; however, it will overestimate if the initial population is small or if the population decline is too slow.

For this reason, this method is most successful when the true extinction time is believed to occur during the experiment. If it is not obvious whether or not this is the case, one can check post hoc whether the model has determined that the most likely extinction time happened during the experiment.
6.4 Real Data
So far this model has assumed the most basic structure, where observation periods occur at single points spaced equally apart. Since this is not true of our data, it is necessary to use the alternative model. Applying this method to our real data, we find that only 31 of our 70 species are estimated as having gone extinct at all. For the dodo, we have an estimate of 1670 with a 95% confidence interval of 1660 to 1702. This is earlier than the estimate provided by Roberts and Solow based on the optimal linear method, who estimated the time of extinction at 1693 with a confidence interval of 1688 to 1715. However, our method looks at data from the Mascarene Islands as a whole, not Mauritius alone, and therefore a direct comparison is of limited value.

Table 5 shows the estimated TOE of each species that was estimated as becoming extinct. Bear in mind that it is likely that the exponential model is a better fit for the decline in population of these species; therefore, in accordance with our findings, it is also likely that the 95% upper bound is a better estimate of the true time of extinction than the MLE estimate. Generally, the earlier a species is estimated to have gone extinct, the narrower the confidence interval. The dodo, for example, has a confidence interval of 42 years, whereas the frigatebirds, estimated to have gone extinct in the year 1922, have a confidence interval spanning over 17000 years. Other species estimated to have become extinct in the 20th century, such as the tropical shearwater and the pigeon hollandais, have similarly wide confidence intervals. We interpret these as meaning that the data are not sufficient for there to be significant evidence that these species have become extinct.

Species                      Estimated TOE   95% Lower Bound   95% Upper Bound
Dugong                            1856            1804              2125
Loggerhead Turtle                 1605            1602              1623
Wedge-tailed Shearwater           1600            1597              1618
Tropical Shearwater               1905            1869              2476
Abbott's Booby                    1670            1667              1688
Frigatebirds                      1922            1841             19139
Pink-backed Pelican               1600            1597              1618
Rougette                          1897            1865              2032
Microbats                         1812            1802              1873
X Herons                          1622            1617              1643
Dimorphic Egret                   1635            1626              1680
Mauritius Night Heron             1732            1695              2368
Greater Flamingo                  1852            1801              2108
Mascarene Teal                    1720            1695              1795
Mauritius Sheldgoose              1702            1683              1782
Reunion Harrier                   1641            1627              1777
Red Hen                           1712            1690              1775
White-throated Rail               1810            1807              1828
Mascarene Coot                    1732            1695              1988
Dodo                              1670            1660              1702
X Pigeons / doves                 1762            1738              1846
Pigeon Hollandais                 1943            1855              2693
X Parrots                         1862            1806              2168
Raven Parrot                      1697            1673              1803
Thirioux's Grey Parrot            1817            1749              1994
Commerson's Lizard Owl            1922            1796             11404
X Swallows                        1872            1814              3075
Tortoises (Two Species)           1736            1719              1792
X Lizards                         1806            1778              3477
Bojer's Skink                     1867            1856              1912
X Green Day Geckos                2058            1934              8543

Table 5: Estimated time of extinction with 95% confidence intervals for each species
7 Changepoint Models
All models considered so far have made the assumption that the population decline began at, or had begun prior to, the start of the experiment. However, this is not necessarily the case. It is possible that the population of a species remained constant for a while until beginning to decline at a later date. In order to model this, we need to combine a constant population model with a linearly or exponentially decreasing model. The model has a fixed probability of detection $1-q$ for every sampling occasion until $t_s$, henceforth referred to as the changepoint, at which point the probability begins to decline.

If we attempt to fit a model to data with a changepoint that occurs before the first observation, the MLE of the probability of detection pre-change will be 0. This means that if there are any observations after this changepoint, the probability of detection cannot
decrease or remain constant - it must increase to avoid impossible events occurring. It also means that the model will fit too perfectly until the chosen changepoint (events with probability 0 not occurring have probability 1), and means that the negative log-likelihood may be lower for these estimates in a way that does not reflect their goodness of fit. This results in a nonsensical estimate. One possible solution may be to limit the range where the changepoint may be estimated so that it cannot fall before the first observation; however, this ignores the possibility that the changepoint may indeed have occurred before the first detection. Alternatively, we could in these cases replace the MLE of 0 with an approximation of the upper bound of the probability. Hanley's "rule of three" (Hanley, 1983) shows that the 95% confidence interval for $\hat p$ when there have been 0 successes in $n$ independent Bernoulli trials is between 0 and $3/n$.
7.1 MLE Fitting
Figure 10 shows the profile negative log-likelihood for different values of $s$ for the dugong occupancy data, assuming that the population model switches from constant to exponentially decreasing. The vertical dotted line represents the time of the last sighting of the dugong. The minimum negative log-likelihood coincides exactly with the last observation, meaning that the MLE of the changepoint at which the species began to decline occurs at the time of last observation. We find that for many of our datasets the likelihood is maximised when $s = t_{\text{last sighted}}$, and that in each of them this time is at the very least a local minimum of the negative log-likelihood.

Figure 10: Left: profile negative log-likelihood for s. Right: probability of detection over time evaluated at the greatest likelihood.

Looking at the parameter values estimated at $\hat s$ for the dugong dataset, we find that the MLE values are $\hat q = 0.64$ and $\hat\lambda = -15.08$. The right-hand graph in Figure 10 shows the probability over time for these parameter values and explains why this relationship occurs. The value of $\hat\lambda$ is so drastic that the population, and hence the probability of detection, effectively drops to zero before the time of the next sampling occasion. Since there are no sightings after the last sighting, the probability of each of these $n - s$ sampling occasions is 1. The dugong is observed on 15 sampling occasions and is last seen on the 42nd sampling occasion. Treating these 42 sampling occasions as a series of Bernoulli trials, we would calculate $\hat q = 1 - \frac{15}{42} = 0.64$, which is the same as our estimate of $\hat q$ derived from fitting a changepoint model using optim(). This pattern is also found in the rest of our datasets.
Using the method of maximum likelihood to choose $s$ is inappropriate, as it tends to choose the time of last observation; however, it is possible that local maxima of the likelihood prior to the last observation may correspond to the actual times at which a constant population began to decline. In order to test this, mixed-model datasets were simulated at various values of $s$, $k$ and $q$. For each set of parameters, five datasets were simulated and used to find the average profile likelihood for $s$, with the intention that this would be less susceptible to random noise than a single dataset. These profile likelihoods can be seen in Figures 11 and 12, with vertical lines corresponding to the true changepoints. If the method of maximum likelihood were sensitive to the true changepoint, then we would see dips or valleys close to the vertical lines; however, a visual inspection was unable to detect any evidence of such a phenomenon. This suggests that it is entirely inappropriate to attempt to use the method of maximum likelihood to determine where a changepoint occurs.

Figure 11: Average profile likelihood for s at different values of q and k (panels: true k = -0.05 or -0.025, true q = 0.3 or 0.7).

Figure 12: Average profile likelihood for s at different values of q and k (panels: true k = -0.05 or -0.025, true q = 0.3 or 0.7).
7.2 Known Changepoint
Is it better to fit a mixed model or a simple one if the changepoint is known? Assume that the changepoint is known, as it can be in our simulated datasets. Our desired model assumes a constant probability of detection before time $s$; after time $s$ the probability of detection will decline. Our alternative model makes no assumption about the probability of detection before time $s$, and thus the datapoints from this period are not used. Our motivation for believing that our desired model will give a better fit is the assumption that the data from before $s$ will help us obtain a more accurate estimate of $q$, and therefore our estimates of $\lambda$ and $k$ will be better informed.
Simply comparing the likelihoods, or even the AICs, is inappropriate, as the likelihood for the alternative method is based on fewer datapoints than the desired method. We compare the two methods by measuring the deviance of the estimated time of extinction from the true time of extinction in each method, using the formula:

$$\frac{\sum\left(\text{True Time of Extinction} - \text{Estimated Time of Extinction}\right)^2}{\text{Number of Simulations}} \qquad (14)$$

For each value of $k$ and $q$, we generate 100 datasets of length 100 with $s$ fixed at 25, and take the average difference in fit between the two models.
            q = 0.2  q = 0.4  q = 0.6  q = 0.8
k = -0.1      0.70     2.19     5.44     3.51
k = -0.05     0.66     3.69     6.13     5.74
k = -0.0333   2.36     3.55    16.30    21.14
k = -0.02     2.63     4.54    11.37    29.22

Table 6: Difference in fit between the full model and the model beginning after the changepoint, for different values of k (rows) and q (columns)

Table 6 shows the results of this test evaluated over 1000 simulations at various values of $q$ and $k$. Unsurprisingly, the model which includes the entire dataset provides a fit that is consistently better than the model which excludes the data from before the changepoint. It seems to work best when $q$ is large and $k$ is close to zero. We know from Table 2 in Section 6.3 that large values of $q$ and values of $k$ close to zero are associated with greater variance, due to the sparseness of the data and the persistence of sightings. In these situations, fitting the full model gives a more accurate estimate of $q$, which in turn makes the estimate of $k$ more accurate. However, the improvement is negligible when $q$ is small and $k$ is substantial. This method is therefore advised when the probability of detection is low and the species is estimated as becoming extinct late in the dataset.
8 Models Assuming a Common Probability of Individual Detection
Reparameterising our model enables us to estimate our parameters; however, for the exponential model it has made it impossible to estimate extinction, as we cannot know the remaining population at any time. To make the exponential population model useful, we need to involve new information. In basic occupancy modelling, parameter redundancy is overcome by simultaneously conducting surveys in more than one site, so perhaps a similar assumption may help overcome parameter redundancy in our case.

Assume we have three closely related species of rabbit. If they are similar in size and behaviour, it is possible that we might be able to assume that the individual probability of detection $r$ is common to each of them. Remembering that our parameter redundancy was caused by the fact that $r$ and $N_0$ are always encountered in the same relationship, it is possible that making this assumption would allow us to avoid our parameter redundancy.

We assume that $r$, the rate of individual detection, is the same for species a, b and c. The probability of detection of each species at time $j$ is equal to:

$$P(a) = 1 - (1-r)^{N_0 \exp(\lambda j)} \qquad (15)$$
$$P(b) = 1 - (1-r)^{b \cdot N_0 \exp(\theta j)} \qquad (16)$$
$$P(c) = 1 - (1-r)^{c \cdot N_0 \exp(\mu j)} \qquad (17)$$

where $N_0$ is the initial population of species a, and $b$ and $c$ are the sizes of the initial populations of species b and c relative to species a; e.g., if species b has twice the population of species a, then $b = 2$.
It would also be possible to use this method with the linear population model, however
since it is already possible to estimate extinction times from the reparameterised model, it
is unnecessary to incorporate new information into the model.
8.1 Extinction Times Confidence Intervals
Since the exponential model for population decline will never reach zero, it is necessary to choose a minimum viable population, such that once the population dips below this point the species is considered extinct. Potential values such as 1 or 0.5 will be considered; for now we will use $\alpha$ to refer to this value. In order to work out the variance of the time $t_j$ at which the population reaches $\alpha$:

$$N_0 \exp(\lambda t_j) = \alpha \qquad (18)$$
$$t_j = \log(\alpha/N_0)/\lambda \qquad (19)$$
We can use the delta method to derive the variance of $t_j$. First, we must find the Jacobian vector for this equation, $\left[\frac{\partial t_j}{\partial N_0}, \frac{\partial t_j}{\partial \lambda}\right]$:

$$J^T = \left[ \frac{-1}{N_0 \lambda}, \; \frac{-\log(\alpha/N_0)}{\lambda^2} \right] \qquad (20)$$

We then use the formula $\mathrm{var}(t_j) = J^T S J$, where $S$ is the sample covariance matrix of $N_0$ and $\lambda$. This can be found numerically by inverting the Hessian matrix given by optim(). We may use this variance to construct confidence intervals based on asymptotic normality.
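A short R sketch of this delta-method calculation (our own illustration; fit is assumed to be an optim() result, with hessian = TRUE, for the negative log-likelihood in the parameters (N0, lambda)):

# Delta-method variance of the time at which the population reaches alpha.
toe_variance <- function(fit, alpha) {
  N0 <- fit$par[1]; lambda <- fit$par[2]
  S <- solve(fit$hessian)              # covariance matrix of (N0, lambda)
  J <- c(-1 / (N0 * lambda),           # d t_j / d N0
         -log(alpha / N0) / lambda^2)  # d t_j / d lambda
  as.numeric(t(J) %*% S %*% J)
}

# toe <- log(alpha / N0) / lambda
# 95% CI: toe +/- 1.96 * sqrt(toe_variance(fit, alpha = 0.5))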
8.2 Parameter Redundancy
Unfortunately, even with this additional data, we still find the same type of parameter redundancy that we found when attempting to fit a model with one species. It is still possible to make the substitution $q = (1-r)^{N_0}$. If we do this, then our probabilities may be rewritten as:

$$P(a) = 1 - q^{\exp(\lambda j)} \qquad (21)$$
$$P(b) = 1 - q^{b \cdot \exp(\theta j)} \qquad (22)$$
$$P(c) = 1 - q^{c \cdot \exp(\mu j)} \qquad (23)$$

If we take a further step and redefine $s = q^b$ and $w = q^c$, we find that we effectively return to a state where there are no shared parameters between the species, and we are effectively estimating the parameters for each species independently of the other species in the same model. The model for each separate species only contains two items of information: the initial probability of detection and the rate at which the probability is decreasing between time periods. Any method that allows each individual species to have different initial probabilities and different rates of decline will always result in parameter redundancy.
However, even if this model cannot be used to determine time of extinction it may still
be useful in other contexts. You are still able to estimate the size of populations relative to
each other - as well as the rate of decline. This means that if you know the population of
one of the species, you are able to infer the others.
The time of extinction cannot be estimated with an exponential population model without
additional information that does not come from the occupancy data. If a technique such
as the Lincoln-Petersen estimate (Lincoln, 1930) or the Chapman estimate (Chapman, 1951)
were used to estimate the population at any point in the experiment, it would be possible to
estimate the extinction time, as the model would no longer be parameter redundant. Let \hat{P}_j be the
population estimate according to either of these methods at time t_j. In order to incorporate
this information into the model, we first fit the reparameterised (redundancy-fixed) model
with parameters q and \lambda. We are then able to equate 1 - \hat{q}^{\exp(\hat{\lambda} j)} to 1 - (1 - \hat{r})^{\hat{P}_j}. Since
we have estimates \hat{q}, \hat{\lambda}, and \hat{P}_j, we can express \hat{r} in terms of them:
\hat{r} = 1 - \hat{q}^{\exp(\hat{\lambda} j)/\hat{P}_j}. This estimate of \hat{r} may in turn be used to estimate \hat{N}_0: since the parameter
q has been defined as (1-r)^{N_0}, it follows that \hat{N}_0 = \log(\hat{q})/\log(1 - \hat{r}).
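This back-substitution is straightforward to code. A minimal sketch (the function name is ours; qhat and lamhat are the MLEs from the reparameterised fit, and Pjhat is the external population estimate at occasion j):
#================= Recover r and N0 from an external population estimate (sketch)
RecoverPars <- function(qhat, lamhat, j, Pjhat)
#=================
{
  rhat <- 1 - qhat^(exp(lamhat*j)/Pjhat) # from q^exp(lambda*j) = (1-r)^Pj
  N0hat <- log(qhat)/log(1-rhat)         # since q = (1-r)^N0
  c(r=rhat, N0=N0hat)
}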
If estimates of all parameters are available, it is possible to use the model to evaluate
the population at each time period and determine when the species has gone extinct. It is
still unclear, however, at what point the population becomes small enough for extinction to
have occurred. An obvious choice is to conclude that the species has gone extinct once the
population drops below 1, as this is the earliest point that implies that there are no remaining living
specimens. Another, more conservative, choice is the point at which
the population drops below 0.5. At this point, the population is closer to 0 than it is to 1; therefore,
this is the point at which zero remaining specimens becomes more likely than one remaining
specimen. There is no purely statistical solution to this uncertainty, so we choose to
use 0.5 for our purposes.
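As a small worked example, suppose we have estimates N0 = 100 and lambda = -0.1 under the exponential model; the two candidate thresholds give noticeably different extinction times:

log(1/100)/(-0.1)   # alpha = 1:   population drops below 1 at t of about 46.1
log(0.5/100)/(-0.1) # alpha = 0.5: population drops below 0.5 at t of about 53.0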
8.3 Simulations
Simulations were run in order to estimate times of extinction and CIs for species for which
it is appropriate to use this method. To overcome our parameter redundancy, we assume
that we know the true population at t0 at site a, and we use this to infer the
populations at the other two sites.
In practice, confidence intervals could not be derived for these estimates using
the delta method as above, because the estimate of the population at site a involves data from a
different source than our occupancy model, and we therefore cannot determine the covariance
of this estimate with the estimates derived from our occupancy data. There are also too
many combinations of these parameters for us to be able to explore the relationship between
every combination. We instead focus on two areas:
• Three sites with the same initial population and different values of λ, evaluated at
different values of r. Simulations are run at three different initial populations in order
to determine whether this method is most effective when N0, λ, and r are small, medium,
or large, and whether there appear to be any interaction effects. Results are given in Table 7.

• Three sites with the same λ and different values of N0, in order to determine whether
this method is more effective when the values of N0 are closer to one another or more
varied. Results are given in Table 8.

                 N0 = 50                N0 = 100                N0 = 150
λ          -0.07   -0.1  -0.15    -0.07   -0.1  -0.15    -0.07   -0.1  -0.15
True TOE      66     47     31       76     53     36       82     58     39
r = 0.01   61.69  43.83  28.13    72.49   51.3  33.08    86.51  56.19  36.31
r = 0.05   75.46  46.78  31.92    89.37  54.03  36.97    96.21  56.89  38.25
r = 0.1    74.39  44.28   31.6    91.57     54  37.16    97.14  58.43  40.73

Table 7: Estimated TOEs for simulated data with identical initial populations at different
values of r (rows) and λ (columns). True extinction times are given below the values of λ.

N0            90    100    110       50    100    150       50    150    300
True TOE      75     76     78       66     76     82       66     82     92
r = 0.01   73.96  74.24  77.78     52.7  67.52  76.45    42.99  70.77   79.6
r = 0.05   86.49  76.18  77.29    62.99  74.89  86.66     60.9   80.2  97.23
r = 0.1    96.83  80.73  81.39    73.38  76.36  85.12    64.55  81.08  91.29

Table 8: Estimated TOEs for simulated data with identical λ at different values of N0
(columns) and r (rows). True TOEs are given below the values of N0.
Table 7 shows average estimated TOEs for various values of r and λ for three different
sites with identical initial populations. The estimates seem to be just as effective for large
values of N0 as they are for small values of N0. Estimates tend to underestimate when r is
small and overestimate when r is large. They also underestimate for more negative values of λ and
overestimate for values of λ that are closer to zero. Small values of r are associated with a
lower probability of detection, and more negative values of λ are associated with the probability of
detection decreasing more rapidly, suggesting that this could be due to the relative sparseness of
the data leading to earlier estimates of extinction. However, these estimates are still fairly
accurate.
9 Discussion
Based on the limited form of the data provided, it is difficult to obtain any meaningful re-
sults from maximum likelihood estimation. We are unable to estimate the population at
any point, and we cannot estimate when a species becomes extinct unless we assume that
the population is linearly decreasing - which is not realistic in practice. This method consis-
tently underestimates the true TOE when applied to data where the population is exponentially
decreasing, so the upper bound of the 95% CI may work as an alternative, more accurate estimate.
If the population is believed to have been constant up to a changepoint at which point
the population began to decline, we are unable to reliably estimate where this changepoint
might have occurred as the method of maximum likelihood commonly finds the most likely
changepoint to be the last observation, and we have not found any evidence that there is
even a local maximum associated with the true changepoint.
Models which assume that species at different sites share the same individual probability of
detection do not allow us to estimate the population sizes - only the relative sizes of the populations
proportional to one another and the rate of decline - which may be useful in their own right.
A possible avenue for additional research might be to attempt to combine these models with
the assumptions of the Royle-Nichols model - namely that the populations in each site are
distributed according to some discrete distribution, usually the Poisson or negative binomial.
However this was beyond the scope of this thesis.
Our methods may also be applied to species which colonise a site as well as to species that
become extinct. For the linearly increasing population model, we could achieve this by fixing
the initial population at 0 and fitting the model. This would not work for the exponentially
increasing population model, as N_0 \exp(\lambda j) will be 0 for any value of \lambda when N_0 = 0. Instead, the
initial population would need to be set at some suitably small number such as 0.001.
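A minimal sketch of such a colonisation likelihood, reusing the population functions listed in Section 10 with the initial population held fixed rather than estimated (the function name and defaults are ours):
#================= NLL for a colonising species with N0 fixed (sketch)
LikeColonise <- function(lambda, data, meth=c(ConstN,LinN,ExpN), N0=0)
#=================
{
  r <- 1/(1+exp(-lambda[1])) # individual detection probability
  theta <- lambda[2]         # positive theta corresponds to a growing population
  n <- length(data)
  N <- meth(N0, n, theta)    # N0 = 0 for LinN; use e.g. 0.001 for ExpN
  p <- 1-(1-r)^N
  -sum(data*log(p) + (1-data)*log(1-p))
}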
10 Code
Several thousand lines of code were written for this thesis. A comprehensive list of every
function and script used would be excessive; however, much of the code consisted of subtle
variations of the same basic structure.
For each of the population models (linear, exponential, constant), a function was created
to generate a vector of populations of length n with initial population N0 and rate of decay (in the case of
the linear and exponential models) θ.
#================= Simulates Constant N
ConstN <- function(N0,n,theta)
#=================
{
  rep(N0,n)  # population stays at N0 for all n occasions; theta is unused
}
#================= Simulates Linearly changing N
LinN <- function(N0,n,theta)
#=================
{
  i <- 1:n
  N <- N0 + theta*i   # population changes by theta each occasion
  N[which(N<0)] <- 0  # the population cannot fall below zero
  N
}
#================= Simulates Exponentially changing N
ExpN <- function(N0,n,theta)
#=================
{
  i <- 1:n
  N0*exp(i*theta)  # exponential change at rate theta
}
These functions were called within another function SimDat() which took as arguments:
• N0 = Population at t0.
• n = Number of observational periods.
• r = individual probability of detection.
• θ = rate of decline (for the linear and exponential models).
• method = Specifies which population model is to be used.
#================= Simulates Dataset
SimDat <- function(N0,n,r,theta,method=c(ConstN,LinN,ExpN))
#=================
{
N <- method(N0,n, theta)
p <- 1-(1-r)^N
Data <- numeric(n)
for(i in 1:n){
Data[i] <- rbinom(1,1,p[i])
}
list(Data=Data,N=N,p=p)
}
This function calls the population function specified by the choice of method, calculates
the probability of detection at each sampling occasion according to the formula p_j = 1 - (1-r)^{N_j},
and uses this probability in conjunction with the rbinom() function to randomly
generate a dataset. It gives this dataset as output along with the values of N and p.
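For example, a dataset for a single, exponentially declining population might be simulated as follows (the parameter values are arbitrary and purely illustrative):

set.seed(1)
sim <- SimDat(N0=100, n=70, r=0.05, theta=-0.05, method=ExpN)
sim$Data  # vector of 70 simulated detections/non-detections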
The function SimDatMix() uses a similar principle to generate datasets from changepoint
models. This function retains many of the same arguments as SimDat(); however, it introduces a
new argument s (the changepoint, 1 < s < n) and replaces θ and method with θ1, θ2,
method1 and method2, where method1 and θ1 apply to the dates before the changepoint and
method2 and θ2 apply to the dates after it. Note that although only models where method1 is constant are considered,
the function still allows for any combination of population models. The code for this is:
#================= Simulates Mixed Dataset
SimDatMix <- function(N0,n,s,r,theta1,theta2,method1=c(ConstN,LinN,ExpN),
method2=c(ConstN,LinN,ExpN))
#=================
{
Na <- method1(N0,s, theta1)
Nb <- method2(Na[s],(n-s), theta2)
N <- c(Na,Nb)
p <- 1-(1-r)^N
Data <- numeric(n)
for(i in 1:n){
Data[i] <- rbinom(1,1,p[i])
}
list(Data=Data,N=N,p=p)
}
The next function calculates the negative log-likelihood for given parameters, data and
a specified population model.
#================= Calculates Negative Log Likelihood
Like <- function(lambda,data,meth=c(ConstN,LinN,ExpN))
#=================
{
N0 <- lambda[1]                # initial population
r <- 1/(1+exp(-lambda[2]))     # logistic transform keeps r in (0,1)
theta <- lambda[3]
n <- length(data)
N <- meth(N0,n, theta)         # population at each sampling occasion
p <- 1-(1-r)^N                 # detection probability at each occasion
like <- p^data*(1-p)^(1-data)  # Bernoulli likelihood contributions
-sum(log(like))
}
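A fit can then be obtained by passing this function to optim(), for instance using the dataset simulated by SimDat() above. The starting values below are arbitrary; recall from Section 5 that N0 and r are not separately estimable under the exponential model, so only theta should be trusted from such a fit.

fit <- optim(c(100, 0, -0.05), Like, data=sim$Data, meth=ExpN, hessian=TRUE)
fit$par  # MLEs of N0, logit(r) and theta; theta is the estimable quantity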
#================= Calculates Mixed Negative Log Likelihood
LikeMix <- function(lambda,data,meth1=c(ConstN,LinN,ExpN),meth2=c(ConstN,LinN,ExpN))
#=================
{
N0 <- lambda[1]
r <- 1/(1+exp(-lambda[2]))
theta1 <- lambda[3]
theta2 <- lambda[4]
n <- length(data)
s <- lambda[5]
Na <- meth1(N0,s, theta1)
Nb <- meth2(Na[s],(n-s), theta2)
N <- c(Na,Nb)
p <- 1-(1-r)^N
like <- p^(data)*(1-p)^(1-data)
-sum(log(like))
}
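For the known-changepoint analyses of Section 7.2, the changepoint can be fixed by wrapping LikeMix() so that optim() searches only over the remaining parameters. A sketch, assuming a dataset simmix generated by SimDatMix() (the wrapper name and values are ours):
#================= Known changepoint s = 30: optimise the remaining parameters
NllFixedS <- function(par, data, s=30)
#=================
{
  LikeMix(c(par, s), data, meth1=ConstN, meth2=LinN)
}
fit <- optim(c(50, 0, 0, -1), NllFixedS, data=simmix$Data)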
Models fixed for parameter redundancy, models that start at the first observation, and models
that take the different time periods into account are all based on alterations of this basic structure.
11 Acknowledgements
Thanks are given to Rachel McCrea and David Roberts for supervision and guidance with
respect to the direction of this thesis and for their knowledge of statistics and ecology, to Cheke
and Hume for provision of the data used, and to the R community for the tools used.
References
[1] Catchpole, E., and Morgan, B. Detecting Parameter Redundancy. Biometrika 84.1 (1997):
187-96.
[2] Catchpole, E., and Morgan, B. Deficiency of Parameter-redundant Models. Biometrika
88.2 (2001): 593-98.
[3] Chapman, D.G. Some properties of the hypergeometric distribution with applications to
zoological sample censuses. Berkeley, University of California Press (1951): 131-159
[4] Cheke, A.S., and Hume, J. Lost Land of the Dodo: An Ecological History of Mauritius,
Réunion & Rodrigues. New Haven: Yale UP, 2008.
[5] Dantzig, G.B., Orden, A., and Wolfe, P. The generalized simplex method for minimizing a linear
form under linear inequality restraints. Pacific Journal of Mathematics (1955): 183-195.
[6] Hanley, J.A. If Nothing Goes Wrong, Is Everything All Right? Interpreting Zero Numer-
ators. JAMA: The Journal of the American Medical Association 249.13 (1983): 1743-745.
[7] Hoeting, J.A., Leecaster, M. and Bowden, D. An Improved Model for Spatially Correlated
Binary Responses. Journal of Agricultural, Biological, and Environmental Statistics 5.1
(2000): 102.
[8] Lincoln, F.C. Calculating Waterfowl Abundance on the Basis of Banding Returns. Wash-
ington, DC: United States Department of Agriculture, Circular 118 (1930).
[9] Mackenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A., and Langtimm,
C.A. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One.
Ecology 83.8 (2002): 2248.
[10] R Core Team R: A language and environment for statistical computing. R Foundation
for Statistical Computing, Vienna, Austria. (2015) URL https://www.R-project.org/.
[11] Roberts, D.L., and Solow, A.R. Flightless Birds: When Did the Dodo Become Extinct?
Nature 426.6964 (2003): 245.
[12] Royle, J.A., and Nichols, J.D. Estimating Abundance from Repeated
Presence-Absence Data or Point Counts. Ecology 84.3 (2003): 777-90.
[13] Tyre, A.J., Tenhumberg, B., Field, S.A., Niejalke, D., Parris, K., and Possingham, H.P. Im-
proving Precision and Reducing Bias in Biological Surveys: Estimating False-Negative
Error Rates. Ecological Applications 13.6 (2003): 1790-801.
[14] Young, E. Statistical Methods for Timed Species Counts. Master’s thesis, University of
Kent. (2002)
38

More Related Content

Similar to Dissertation

Self-organzing maps in Earth Observation Data Cube Analysis
Self-organzing maps in Earth Observation Data Cube AnalysisSelf-organzing maps in Earth Observation Data Cube Analysis
Self-organzing maps in Earth Observation Data Cube AnalysisLorena Santos
 
Kevin_Park_OSU_ Master_Project Report
Kevin_Park_OSU_ Master_Project ReportKevin_Park_OSU_ Master_Project Report
Kevin_Park_OSU_ Master_Project ReportKevin Park
 
Quantifying Extinctions
Quantifying ExtinctionsQuantifying Extinctions
Quantifying ExtinctionsBen Rowe
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
Poisson distribution assign
Poisson distribution assignPoisson distribution assign
Poisson distribution assignAbdul Kader
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_ReportRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report​Iván Rodríguez
 
Different types of distributions
Different types of distributionsDifferent types of distributions
Different types of distributionsRajaKrishnan M
 
Statistics and probability pp
Statistics and  probability ppStatistics and  probability pp
Statistics and probability ppRuby Vidal
 
Descriptive statistics-Skewness-Kurtosis-Correlation.ppt
Descriptive statistics-Skewness-Kurtosis-Correlation.pptDescriptive statistics-Skewness-Kurtosis-Correlation.ppt
Descriptive statistics-Skewness-Kurtosis-Correlation.pptRama Krishna
 
ECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docxECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docxbudabrooks46239
 
ECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docxECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docxtidwellveronique
 
Temporal dynamics of human behavior in social networks (i)
Temporal dynamics of human behavior in social networks (i)Temporal dynamics of human behavior in social networks (i)
Temporal dynamics of human behavior in social networks (i)Esteban Moro
 
As pi re2015_abstracts
As pi re2015_abstractsAs pi re2015_abstracts
As pi re2015_abstractsJoseph Park
 
Poisson
PoissonPoisson
PoissonJRisi
 
Poisson
PoissonPoisson
PoissonJRisi
 
Poisson
PoissonPoisson
PoissonJRisi
 
Poisson
PoissonPoisson
PoissonJRisi
 
The Probability distribution of a Simple Stochastic Infection and Recovery Pr...
The Probability distribution of a Simple Stochastic Infection and Recovery Pr...The Probability distribution of a Simple Stochastic Infection and Recovery Pr...
The Probability distribution of a Simple Stochastic Infection and Recovery Pr...IOSRJM
 
Capture recapture estimation for elusive events with two lists
Capture recapture estimation for elusive events with two listsCapture recapture estimation for elusive events with two lists
Capture recapture estimation for elusive events with two listsAlexander Decker
 

Similar to Dissertation (20)

Self-organzing maps in Earth Observation Data Cube Analysis
Self-organzing maps in Earth Observation Data Cube AnalysisSelf-organzing maps in Earth Observation Data Cube Analysis
Self-organzing maps in Earth Observation Data Cube Analysis
 
Kevin_Park_OSU_ Master_Project Report
Kevin_Park_OSU_ Master_Project ReportKevin_Park_OSU_ Master_Project Report
Kevin_Park_OSU_ Master_Project Report
 
Quantifying Extinctions
Quantifying ExtinctionsQuantifying Extinctions
Quantifying Extinctions
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
Poisson distribution assign
Poisson distribution assignPoisson distribution assign
Poisson distribution assign
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_ReportRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
 
Different types of distributions
Different types of distributionsDifferent types of distributions
Different types of distributions
 
Statistics and probability pp
Statistics and  probability ppStatistics and  probability pp
Statistics and probability pp
 
Descriptive statistics-Skewness-Kurtosis-Correlation.ppt
Descriptive statistics-Skewness-Kurtosis-Correlation.pptDescriptive statistics-Skewness-Kurtosis-Correlation.ppt
Descriptive statistics-Skewness-Kurtosis-Correlation.ppt
 
ECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docxECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docx
 
ECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docxECOL203403 – Ecology Populations to Ecosystems Assignment .docx
ECOL203403 – Ecology Populations to Ecosystems Assignment .docx
 
Temporal dynamics of human behavior in social networks (i)
Temporal dynamics of human behavior in social networks (i)Temporal dynamics of human behavior in social networks (i)
Temporal dynamics of human behavior in social networks (i)
 
As pi re2015_abstracts
As pi re2015_abstractsAs pi re2015_abstracts
As pi re2015_abstracts
 
Poisson
PoissonPoisson
Poisson
 
Poisson
PoissonPoisson
Poisson
 
Poisson
PoissonPoisson
Poisson
 
Poisson
PoissonPoisson
Poisson
 
The Probability distribution of a Simple Stochastic Infection and Recovery Pr...
The Probability distribution of a Simple Stochastic Infection and Recovery Pr...The Probability distribution of a Simple Stochastic Infection and Recovery Pr...
The Probability distribution of a Simple Stochastic Infection and Recovery Pr...
 
F0742328
F0742328F0742328
F0742328
 
Capture recapture estimation for elusive events with two lists
Capture recapture estimation for elusive events with two listsCapture recapture estimation for elusive events with two lists
Capture recapture estimation for elusive events with two lists
 

Dissertation

  • 1. Modelling The Extinction (And Colonisation) Of Species On The Mascarene Islands Fergus Boyd-Jones A thesis presented for the degree of Master of Science in Statistics School of Statistics, Mathematics and Actuarial Science University of Kent United Kingdom 26th August 2016
  • 2. Contents 1 Summary 2 2 Introduction 3 2.1 Occupancy Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Existing Approaches For Estimating Time Of Extinction . . . . . . . . . . . . 5 2.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.4 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 Likelihood Construction 7 3.1 Population Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 Parameters to Estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4 Functions For Simulations 8 5 Parameter Redundancy 13 6 Linear Population Decline Model 17 6.1 Confidence Intervals Based on Asymptotic Normality of MLEs . . . . . . . . 17 6.2 Likelihood-ratio based CIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.4 Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 7 Changepoint Models 23 7.1 MLE Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 7.2 Known Changepoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 8 Models Assuming a Common Probability of Individual Detection 28 8.1 Extinction Times Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . 29 8.2 Parameter Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 8.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 9 Discussion 33 10 Code 33 11 Acknowledgements 37 1
  • 3. 1 Summary Using occupancy data from the Mascarene islands provided by Cheke and Hume (2008), we attempted to estimate extinction times for various native species native to the islands using the method of maximum likelihood as well as using simulations to test the e↵ectiveness of these estimators. Models that assumed that the population declined exponentially and linearly were both considered as well as models that assumed the population was constant until a changepoint where the population begins to decline were considered. Issues such as the sparseness of the data and parameter redundancy hindered e↵orts, and we were only able to estimate extinction times when the population was assumed to linearly decrease - unless we involved external information estimating the population at a given time. Simulations show that the estimated extinction times from the linear model work well when the population is decreasing linearly and tend to underestimate extinction times when the population is decreasing linearly. 2
  • 4. 2 Introduction 2.1 Occupancy Modelling This thesis builds heavily on concepts related to the study of occupancy modelling. Occu- pancy modelling is a method which estimates the probability that a site is occupied by a particular species. The general structure of approach was proposed independently by Hoet- ing et al. (2000), Young (2002), MacKenzie et al. (2002), and Tyre et al. (2003). The occupancy model requires a number of model assumptions to be made. In occupancy modelling, surveyers are sent to various sites to record whether or not they detect a member of a species. Detection is recorded as: Dij = 8 >< >: 1 If at least one individual is observed in site i in time period j 0 If no individual is observed in site i in time period j (1) Surveyors do not record how many individuals were detected, nor do they mark individ- uals for future detection. The data are simply a record of presence or absence of detection of the species at each site. To construct the occupancy model, the following parameters are defined. p is the proba- bility of detection given that the site is occupied. is the probability that a site is occupied. If a species is detected, then it must also be present, therefore the probability of detection in a single trial is p. If a species is not detected, it may be present but not detected (1 p) or not present 1 , so the probability of non-detection in a single trial is equal to (1 p) + 1 or equivalently 1 p. If we have T sampling occasions we can also work out the joint probability or likelihood of our data. It is assumed that the species is either present or absent at every sampling occasion at the same site. Therefore, if we have even a single sighting at a site, we assume that every non-sighting is a case of present but not detected. This means that if we have a confirmed sighting, we only need to include once for each site. Assume we have data (010,000,110) from three di↵erent sites. In the first site, we have one sighting and two non-sightings. The sighting tells us that this site is occupied for each survey occasion, so we write the likelihood as (1 p)p(1 p). In the second site, we have no recorded observations, therefore the site may be occupied or unoccupied. The likelihood is (1 p)3 + (1 ): The probability that a site is occupied and has no sightings plus the probability that a site is unoccupied. In the third site, the likelihood is p2(1 p). 3
  • 5. The joint probability is simply the product of these probabilities. Finding the values of our parameters that maximise the joint probability will give us the maximum likelihood estimates (also known as most likely estimators or MLEs) for this model. Single season modelling assumes that the state of each site remains the same between surveys. This means that occupied sites do not become unoccupied, unoccupied sites do not become occupied and the probability of detection remains constant. Multiple season modelling relaxes some of the assumptions of single-season modelling and allows for sites to become occupied or to become unoccupied in between seasons. A season in this case is a longer measure of time such as a year. The new parameters used in this model are t - the probability of an unoccupied site being colonised between season t and t+1, and ✏t - the probability of a local extinction at an occupied site in between seasons t and t + 1. becomes the probability that a site is occupied at the first season. For example, if we have data (01,00,11) from one site over three seasons with two surveys taken each survey. We know that the site must have been occupied during the first season as there are observations from this period, therefore the probability is (1 p)p. There are no observations during the second season. It is possible that the site remained occupied during this time and no species were detected, or it is possible that the species underwent a local extinction, but recovered before the third season. The probability is therefore (1 ✏1)(1 p)2 + ✏1 2. In the third season, we have observations at both surveys, therefore the probability is p2. The Royle-Nichols model (Royce and Nichols, 2003) extends this idea and introduces the idea that the probability of detection is related to the abundance of the species. ie, the greater the abundance, the more probable the detection. In occupancy abundance modelling, it is assumed that each individual has an equal and independent probability of detection r. The population at each site is considered constant and is denoted Ni at site i. The probability of detecting at least one individual of a species may be thought of as the compliment of not detecting any individual. The probability of not detecting any one individual is 1 r, the probability of not detecting a single individual from a population of size N is (1 r)N , therefore the probability of detecting at least one individual can be written as 1 (1 r)N . Note that there is no probability of occupancy in this model, as the probability of a site being unoccupied is equivalent to N = 0. The Royle-Nichols method assumes that the Nis follow a distribution such as the Poisson or negative binomial distribution. In this thesis, we do not make the assumption that populations are constant within sites. We assume that the population is changing (decreasing) according to a predictable population 4
  • 6. model. Our objective is to use the method of maximum likelihood to fit a model wherein the population of each species (and therefore the probability of detection) is declining over time and using this model to infer the time of extinction (TOE) for each species. 2.2 Existing Approaches For Estimating Time Of Extinction One method described by Roberts & Solow (2003), the optimal linear model, uses the result that the joint distribution of the k most recent sightings tn k+1 < tn k+2 < . . . < tn follow a Weibull distribution regardless of the parent density. There are multiple methods of inference based on this result: a maximum likelihood method, a minimum distance method and an optimal linear estimation method. The optimal linear estimation method is preferred as the other two fail to give consistent estimators with existent confidence intervals. 2.3 Data Our data comes from over 400 years of historical data from the Mascarene Islands collected in the Cheke and Hume’s book Lost Land of the Dodo: An Ecological History of Mauritius, R´eunion & Rodrigues. published in 2008. Since most of our data predates the development of formal statistical methods of occupancy modelling, the data are not presented to us in the conventional manner. We do not have sighting occasions - instead the sightings are pooled into bins wherein a confirmed sighting means that at least one individual of the species was observed at least once in the time period between the dates given. Our data may be thought of as a series of independent, identically distributed Bernoulli trials where the probability p of detection depends on both the probability r of a single individual being detected (also assumed to be i.i.d.) and the total population at time tj - denoted as Nj. It is di cult to visualise the frequency of observations in our data. We can make it easier to visually evaluate our data by using a grouping technique. observations 1-x are grouped together and the total number of observations in this group is calculated. Next, the observations from group 2-(x + 1) are counted and so on. Smaller groups will be more volatile than larger groups and may make random fluctuations appear more significant that they really are. We find that using groups of size 10 works well for our data. Figure 1 gives examples of these plots for several of our species. Some of the species such as the Red Hen, the Mascarene Coot and the Dodo can clearly be seen to decline in probability of detection in these visualisations, suggesting a decline in population and 5
  • 7. 0 10 20 30 40 50 60 01234567 Red.Hen Index c 0 10 20 30 40 50 60 0.00.20.40.60.81.0 White.throated.Rail Index c 0 10 20 30 40 50 60 01234567 Common.Moorhen Index c 0 10 20 30 40 50 60 0.00.51.01.52.0 Mascarene.Coot Index c 0 10 20 30 40 50 60 01234567 Whimbrel Index c 0 10 20 30 40 50 60 01234567 Dodo Index c Figure 1: Visualisation of the frequency of detection for various species. ultimately extinction. Other species such as the Common Moorhen and the Whimbrel appear to be rising in probability of detection, and are unlikely to be worth considering for extinction times. The aim in this study is to estimate the time of extinction for various species. Our data includes species for which there is not a single confirmed sighting and also species which were seen as recently as the year 2000. In the former case, statistical analysis is unnecessary, as the lack of data means that the MLE time of extinction would occur before or at the exact time of the first recorded year. In the latter case, the species has clearly not become extinct during the time period of data collection. However, if the population is declining, we may still be able to use the data to predict future extinctions. 2.4 Maximum Likelihood Estimation The likelihood function is a function of the vector of parameters given the data. If the observations in the data are independent and identically distributed, the likelihood function is equivalent to the joint distribution. In general, the aim of maximum likelihood estimation is to find the values of parameters that give the largest value of the likelihood function. If the likelihood function is relatively simple, this may be derived explicitly by di↵erentiating the likelihood function (or equivalently the log-likelihood which is often easier to di↵erentiate) with respect to the vector of parameters, setting the result equal to zero and solving for 6
  • 8. each of the parameters. In our case, this is not possible, and we instead use numerical optimisation methods - primarily simplex optimisation (Dantzig et al, 1955) via the optim() function included in R’s “Graphic” package. 3 Likelihood Construction As explained in the introduction our model for each species is based on a single population that is believed to be decreasing. 8 >< >: 1 (1 r)Nj probability of detection at time tj (1 r)Nj probability of no detection at time tj (2) and Dj = 8 >< >: 1 If at least one individual is detected at time tj 0 If no individual is detected at time tj (3) Since our detection probabilities are believed to be independent, we construct the likeli- hood simply by taking the product of each probability. nY j=1 ⇣ 1 (1 r)Nj Dj · (1 r)Nj 1 Dj ⌘ (4) 3.1 Population Models At this point, we introduce the parameter N0. N0 represents the population at the unobserved time t0 or one unit of time before the first sampling occasion. There are a number of possible forms the model could take. The population could change exponentially, the population could change linearly, the population could remain constant before changing. We will consider each of these methods separately. An exponential population model assumes that the change in population at any given time is proportional to the population at that time. Nj = N0 exp( j) where N0 is the population at t0, one unit of time before the study began. 7
  • 9. or ✓ 8 >>>>>< >>>>>: < 0 Population is declining = 0 Population is constant > 0 Population is increasing (5) Our linear population model makes the assumption that every time period, the population changes by exactly the same amount - regardless of what the population is. In this case, Nj = N0 + ✓j. One immediate issue with this model is the fact that it allows for negative populations - a physical impossibility which would result in nonsensical probabilities. This can be resolved by defining: Nj = 8 >< >: N0 + ✓j j < N0/✓ 0 j N0/✓ (6) So that the population remains constant at zero once it has reached zero. 3.2 Parameters to Estimate There are three parameters which we need to maximise our likelihood function with respect to. N0, the hypothetical initial population; r, the probability of individual capture and or ✓, the rate at which the population is decreasing for the exponential and linear population models respectively. This likelihood cannot be maximised symbolically, instead we need to optimise it numer- ically. 4 Functions For Simulations We have three main types of models for our population, each beginning from an initial population of N0. • Constant Population. The population remains at a constant N0 • Linearly Changing Population. The population at time j is equal to N0 + j✓ • Exponentially Changing Population. The population at time j is equal to N0 · exp(j✓) Given an initial population size and a value for ✓, we can easily model the change in population for each of these models. 8
  • 10. 0 10 20 30 40 50 020406080 Population Over Time Sampling Occasion Population Theta= −0.05 −0.1 −0.2 −0.3 0 10 20 30 40 50 0.00.10.20.30.40.50.6 Detection Probability Over Time Sampling OccasionProb.Observed Theta= −0.05 −0.1 −0.2 −0.3 Figure 2: Population and Probability of Detection over time for the exponential decay model The change in population a↵ects the probability of observation. If a species has a pop- ulation of N and each member of the species can be observed with probability r then the probability that at least one individual is observed is 1 (1 r)N . Since we are able to model the change in N, we are able to derive the probability of observation at each time and use this to simulate datasets. We are also interested in models that switch at a point s from one model to another. The most useful are models that switch from a constant population to an exponentially or linearly decreasing population. This type of model is useful for species that do not start to decline until a given point, such as the introduction of invasive species. Figure 2 shows the population and probability of capture over time following each model. The di↵erences between each model can be seen. Looking first at the exponential model, we notice that the change in probability detection over time resembles the change in population over time. Since the population never reaches zero, the probability of detection never quite reaches zero either. From the left hand graph of Figure 3, we can see that under the linear model, the popu- lation declines - as you would expect - linearly. ✓ is equivalent to the change in population each year: If ✓ = 5 then the population will decrease by 5 individuals each year until the population reaches zero at which point it remains at zero. From the righthand graph of Fig- ure 3, we see that the decline in probability detection is parabolic, however the probability 9
  • 11. 0 10 20 30 40 50 020406080100 Population Over Time Sampling Occasion Population Theta= −2 −5 −10 −20 0 10 20 30 40 50 0.00.10.20.30.40.50.6 Detection Probability Over Time Sampling OccasionProb.Observed Theta= −2 −5 −10 −20 Figure 3: Population and Probability of Detection over time for the linear decay model of detection does still reach zero at the exact same time as the population reaches zero. Despite the fact that all of our datasets start at the same time point in 1595, it is not necessarily true that researchers were actively looking for each species in our data at this point. The Striated Heron for example is a species native to the Mascarene Islands, however it is not observed until 1825-1829 - after which sightings become frequent and it is also observed in 1830-1834 and 1835-1839. Perhaps it is possible that these results are accurate, but it is probably more likely that observations of the Striated Heron prior to 1825 were simply not being recorded. For this reason, we believe that it may be wise not to include years before the first recorded sighting for each species. The construction of our likelihood function so far has made the assumption that there is a single observational period at each time point and that each time point is spaced equally apart. In practise this is not the case. There is not a simple sampling occasion in our Mascarene data: a species is recorded as seen if a single member of the species is observed at any point over the time period. In addition, the intervals between time periods are not uniform: instead they range from 5 to 13 years. We also must take into account the uneven intervals between sampling occasions - which range from 5-12 years. There are two primary ways in which we can interpret these sampling occasions. 10
  • 12. • A sighting in an interval of x years implies that it has been seen at least once in x years. We refer to this as the pooled survey model. • A single sampling occasion spaced x years apart. The di↵erent interval model. The second interpretation is much simpler to model. We simply need to evaluate the population model as we have been doing at the date of the sampling occasion. If we have a vector of the dates of our surveys (eg 1595, 1600, 1605...) and subtract from each element the first element minus one so that we are left with the vector (1,6,11...) we may call this vector s and if there are d survey occasions then the general form of the likelihood function may be written: dY j=1 ✓⇣ 1 (1 r)Nsj ⌘Dj · ⇣ (1 r)Nsj ⌘1 Dj ◆ (7) In our data, a sighting at time sj means that there has been a sighting in at least one of the years sj to sj 1. For example a sighting in the column “1595” means that there has been a sighting in any of the years 1595, 1596, 1597, 1598 or 1599. The last column is the year 2000, and we assume that the range for sightings extends until the year 2006. This method requires a slightly di↵erent interpretation of r. Previously, r was the probability of a single individual being detected in any given interval. Our new interpretation of r is that r is the probability of a single individual being detected in a year. Since we have 70 sampling periods over 405 years, this means that r will be smaller and and ✓ will be larger. Under this method we need to redefine pj. As in our simpler model, the probability of at least one observation in any of the years from sj to sj 1 is equal to 1-P(No observations in this period). This may be written as pj = 1 sj+1 1 Y i=sj qexp( tj) (8) Now the general form of our likelihood function is dY j=1 ⇣ p Dj j · (1 pj)1 Dj ⌘ (9) Figure 4 and Figure 5 shows the detection probability over time using the actual inter- vals in our data for the pooled time and irregular timing models respectively. The model for irregularly spaced surveys is very similar to the basic models where the probability of detection is constantly decreasing as the population is decreasing. However in the combined 11
  • 13. 1600 1700 1800 1900 2000 0.00.20.40.60.81.0 Exponential year ProbabilityofDetection 1600 1700 1800 1900 2000 0.00.20.40.60.81.0 Linear year ProbabilityofDetection Figure 4: Probability of detection over time for the pooled survey model. 1600 1700 1800 1900 2000 0.00.20.40.60.8 Exponential Year ProbabilityofDetection 1600 1700 1800 1900 2000 0.00.10.20.30.40.50.60.7 Linear Year ProbabilityofDetection Figure 5: Probability of detection over time for the di↵erent interval model. 12
  • 14. model, we can see that the probability of detection actually rises in the year 1900. This is because prior to this year, the bins were in 5 year intervals but after 1900 they become bins of 12-13 years. More years, means more opportunities to detect each species, therefore the probability of detection may rise despite the decreasing population. 5 Parameter Redundancy A model M(✓) with parameters ✓ is parameter redundant if M(✓) can be written as a function of where is a function of ✓ and the dimension of is less than the dimension of ✓. In plain terms, this means that the model has too many parameters that cannot be estimated separately from one another. For example, in the occupancy model with one survey, the parameters and p are only ever encountered as a product of each other. This means that if the MLE if p is ˆx, this means that there will be a ridge in the likelihood where there will be an infinite number of solutions defined by = ˆx/p. This parameter redundancy may be overcome by performing more than one survey at more than one sites. For example, if we have occupancy data from two surveys at three sites with data 00, 01 and 11 the probability of each result would respectively be, (1 p)2 + (1 ), p(1 p) and p2. The likelihood may be constructed as the product of these individual probabilities and the parameter redundancy has been overcome. However, it is not always possible to overcome parameter redundancy in this matter. Figure 6 shows a contour plot of the log-likelihood evaluated at di↵erent values of r and N0. Ideally, we would find a peak where the log-likelihood was maximised, however we have a ridge where the function is maximised by a potentially infinite number of combinations of rs and N0s. This means that we have parameter redundancy, which must be accounted for. According to Catchpole and Morgan (1997), a method of identifying parameter redun- dant models is to form an exhaustive summary (✓) of the parameters. (✓) is an exhaustive summary if knowledge of (✓) uniquely determines M(✓). In our case, the exhaustive sum- mary consists of the contribution to the likelihood - or equivalently the log-likelihood which is easier in our case to work with - of an observation and a non-observation at every point in the trial. This would result in a vector with length 140, however due to the extension theorem (Catchpole and Morgan, 1997), we do not need to consider every single possibility. We can test a model with three sampling occasions for parameter redundancy and extend the results to a model with any number of sampling occasions. Thus, our exhaustive summary 13
  • 15. 60 80 100 120 140 0.020.040.060.080.10 Contour Plot of r and N0 N0 r Figure 6: Contour plot of the log-likelihood for r and N0 Figure 7: Derivative Matrix for our model that assumes the population is exponentially decreasing (✓) consists of log(1 (1 r)N0 exp( )), log(1 (1 r)N0 exp(2 )), log(1 (1 r)N0 exp(3 )), log((1 r)N0 exp( )), log((1 r)N0 exp(2 )), log((1 r)N0 exp(3 )). The next step is to find the matrix of partial derivatives of this matrix relative to ✓. Our parameters are N0, r and and our exhaustive summary has length 6, therefore we create a 6x3 matrix by finding the derivative of each element (✓) for each of our parameters. The rank of this matrix is 2, which again tells us that we are able to estimate two parameters. We can determine which parameters are estimable by finding solutions to ↵T D = 0. If a field in the vector is equal to 0, then that parameter is estimable. The only solution to our problem is: ↵T = h 1 0 N (1 r) ln(1 r) i (10) The second field corresponds to which tells us that is always estimable. 14
  • 16. Parameter redundancy cannot be overcome with more datapoints - we must either change the model or reparameterise and interpret the new parameters. For each of our three population models, it is impossible to separately estimate r and N0. We can - however - reparameterise our likelihood in order to derive new parameters which we are able to estimate and interpret. For each of our models, we find that (1 r)N0 may be reparameterised as q - the probability of no individuals being detected on the first sampling occasion. In the exponential case (1 r)N0 exp ti simply becomes qexp ti . is una↵ected by this reparameterisation and still measures the population decline despite the fact that we do not know what the population is. The linear population model requires another change of parameter. Define k as ✓/N0. We can now rewrite (1 r)N0+✓ as (1 r)N0(1+k). Substituting q we arrive at q1+k. k can be thought of as the constant change in population relative to the original popu- lation. Like ✓, positive values of k correspond to an increasing population, negative values of k correspond to a decreasing population and the population is constant when k = 0. If k is negative (ie, the population is decreasing) then the point of extinction will occur at time 1/k. This means that unlike the exponential model, it is still possible to derive the MLE extinction times from the linear model. This parameter redundancy is caused by the fact that there are an infinite number of possible combinations of N0 and r that will give the same q and therefore it is not possible to distinguish between them. If r is known, or there is an estimate for the population at any time - these can be substituted into the model and used to find the true values. nY j=1 ✓⇣ 1 qexp( tj) ⌘Dj · ⇣ qexp( tj) ⌘1 Dj ◆ (11) Then we will consistently be able to find MLEs and q. Figure 8 shows the probability of detection over time for the parameter redundant model. If the initial probability of detection is the same - and in the case of the linear model, the parameter k is chosen to be the equivalent of ✓ - there should not be any di↵erences between this model and the simple model. Figure 9 shows an example of a contour plot for the log-likelihood for di↵erent values of q and . Notice that unlike the contour plot shown in Figure 6, there is a clear point where the log-likelihood peaks. 15
  • 17. 0 10 20 30 40 50 0.00.10.20.30.40.50.6 Linear Sampling Occasion Population k= −0.02 −0.05 −0.1 −0.2 0 10 20 30 40 50 0.00.10.20.30.40.50.6 Exponential Sampling Occasion Population Lambda= −0.05 −0.1 −0.2 −0.3 Figure 8: Probability of detection over time for the parameter redundancy fixed models 0.2 0.4 0.6 0.8 −1.0−0.8−0.6−0.4−0.20.0 Contour Plot of q and lambda q lambda Figure 9: Contour plot of the log-likelihood for q and 16
  • 18. 6 Linear Population Decline Model It is likely that assuming the population decreases linearly is less realistic than assuming that the population decreases exponentially, however it has the advantage that the probability of detection will eventually reach - rather than approach zero. This means that we are able to estimate extinction times despite not being able to estimate the individual probability of capture, nor the population at any given time. The probability at time tj is defined as: 1 q1+tjk (12) We know that we are able to estimate k and that this equation will be equal to zero when tj = 1/k. 6.1 Confidence Intervals Based on Asymptotic Normality of MLEs If ˆ✓ is the MLE of a parameter vector ✓, then ˆ✓ is asymptotically distributed ⇠ N(✓, 1/nI(✓)) where I(✓) is the Fischer’s Information criteria, defined as E ⇣ 2 log f(X) ✓2 ⌘ . In practice, we are unable to find this as we are maximising our likelihood functions numerically rather than symbolically. However the negative Hessian matrix (H) as given by R may be used in place of nI(✓) in this formula. Therefore we may create 95% confidence intervals using the formula: ˆ✓ ± z↵/2 · 1 p H(✓) (13) Since we are only interested in the distribution of k (as k wholly defines the point of extinction), we do not need to consider confidence intervals for q although these could be constructed using the same method. This method is primative and comes with several disadvantages. In order to generate a Hessian matrix, it must be possible to calculate the gradient of the function. Given the nature of our data, certain choices of parameter will lead to events with probability zero occurring and events with probability e↵ectively one not occuring. This leads to a negative log-likelihood being calculated as infinite, and makes the gradient - and therefore Hessian matrix - impossible to calculate for the majority of our data. This method assumes that k is asymptotically normally distributed, however this as- sumption may not be reasonable. Normally distributed variables may range over the entire real numbers, however k is much more tightly constrained. If ˆk is close to 1/tlastsighted and 17
  • 19. the standard errors are large, this could result in the time of extinction having a lower bound that is earlier than the last sighting. 6.2 Likelihood-ratio based CIs Another way of obtaining confidence intervals for ˆk is to use likelihood-ratios. We know that the distribution of 2(l1 l0) is chi-square with degrees of freedom equal to c where c is the di↵erence in parameter space between the two models. Once we have found the value of ˆk that minimises the negative log-likelihood, we find the maximum and minimum values of ˆk such that the di↵erence between the negative log-likelihood is not significant at the 95% level according to the chi-square distribution. To do this we calculate the profile likelihood for k using and find the minimum and maximum values such that l(ˆkmin) and l(ˆkmax)  l(ˆk) + 2 0.975(c)/2 . An advantage of this method is that it will work for every species. This CI will not necessarily be symmetrical - which is likely to be more natural as the time of extinction has a lower bound (the last sighting) and no upper bound. It is impossible for this method of forming confidence intervals to place the lower bound before the last observation, as this would mean an event with probability zero (observing an extinct species) will occur, which will have an infinite negative log-likelihood. It is also possible if the data is sparse that the upper bound will be extraordinarily large. For example, if we have 70 observational periods for a species which is detected on the tenth observational period and never again. Fitting our linear model gives us the output: > optim(c(0,0),LikeP, data=data, meth=LinP) $par [1] 2.13038976 -0.05181823 $value [1] 3.917649 According to this model, the most likely time of extinction is 1/ 0.05181823 =⇠ 20 with a negative log-likelihood of 3.92. Comparing this to a model where ✓ is fixed at 0 (the population is constant). we find a negative log-likelihood of log 1 70 · (1 1 70 )69 = 5.241318, a di↵erence of 1.32 from our linearly decreasing model, a di↵erence which is not 18
  • 20. significant. This means that the upper bound will be infinite as there is no significant evidence that the population is decreasing at all. Attempting to find an upper-bound for these species can be exhaustive and therefore we have set an arbitrary limit the upper bound cannot go beyond the year 2000. If there is no significant di↵erence between the last year of available data and the MLE time of extinction, then we interpret this as a possibility that the species has not gone extinct at all. 6.3 Simulations Simulations were run at various values of q and k in order to determine which values are associated with the highest precision and accuracy. Since the Hessian Matrix method of generating CIs has been found in practice to work on very few datasets, it is ignored for this section, instead we will be exclusively be considering likelihood-ratio based CIs. Values of q ranging from 0.1 to 0.9 in steps of 0.1 were considered. Values of 0 and 1 were left out for di↵erent reasons. A value of 0 means that the probability of detection at time tj is equal to 1 01+k. Clearly this will always be equal to zero unless (1 + k) = 0, ie the population has gone extinct. This means that detection will be perfect (a vector of ones) until the species goes extinct at which point the probability of detection will instantly switch to zero. A value of 1 means that the species is already extinct by t0 and therefore simulating this value will just correspond to a vector of zeroes. Values of k were chosen so as to give extinction points at every ten year interval. Each possible combination of q and k was simulated with 100 observational periods 1000 times. The standard error for the estimated extinction time was calculated for each combination by the sum of the square of the di↵erence between the true value and the estimated value. 95% confidence intervals were calculated in order to empirically investigate how well this CI works. In Table 1 the column names refer to the true time of extinction and the row names refer to the initial value of q. The proportion of estimates for which the true extinction time falls within the 95% confidence interval may be seen as a measurement of the estimations accuracy. Considering q and k separately, we see that the estimation is least accurate for large values of q. The accuracy increases as q decreases, peaking somewhere around 0.3 at which point the accuracy begins to fall again. There appears to be a plateau between 0.6 and 0.4 A greater proportion of estimated CIs contain the true extinction time as k becomes larger (ie, the true extinction point is farther away from the first observational period), however it 19
  • 21. 10 20 30 40 50 60 70 80 90 100 0.1 86.9 91.8 94.8 94.9 93.5 95.5 94.8 95.9 96.1 95.1 0.2 87.9 93.5 94.1 95.2 93.5 94.8 95.8 96.4 96 95.1 0.3 90.6 95 93.8 95.2 95.8 96 96.4 95.8 95.4 94.1 0.4 89.6 92.1 94.2 94.1 95.4 95.5 94.8 95 95.3 95.3 0.5 88.9 93.9 95 95 95 94.2 95.2 94.7 95.4 93.9 0.6 90.2 93.5 94 94.5 94.6 94.3 95.3 95.7 95.3 94.2 0.7 88.7 93.6 93.8 93.5 93.6 95.5 95.1 96.4 95 93.3 0.8 88.7 93.3 93.4 91.9 94.1 94.1 95 94 94.6 92.9 0.9 93 96 96.5 94.3 94.4 94.1 93 90.7 89.8 86.5 Table 1: Coverage estimates for di↵erent vales of q and k begins to decrease when the true time to extinction is greater than 70. This is likely due to the fact that the true extinction period is close to the end of the study, and therefore the trailing zeroes associated with extinction are not enough to confirm extinction. The greater coverage is likely not due to greater accuracy but rather wide confidence intervals caused by flatter likelihoods. A larger q means that the initial probability of capture is smaller and a smaller (further than less than zero) k means that the probability of detection decreases faster. If we choose to simulate 100 sampling occasions where q = 0.9 and k = 0.1 we find that the probability that we do not detect a single individual in a single sampling occassion to be 0.62. This means that for 62% of trials with these parameters, the MLE will be q = 1 (probability of detection =0 for every sampling occasion) and the choice of k will be arbitrary and have no e↵ect on the value of the likelihood (which will be 1). This means that the algorithm for determining confidence intervals will continue to run until it reaches the bounds we set - resulting in confidence intervals for the time of extinction that range from one to positive infinity. However, in the 38% of trials with at least one detection (in the first 10 observational periods) the confidence intervals will tend to be narrow. The other situation that will often lead to wide confidence intervals is when the true time of extinction is close to the last observational period - especially if the initial probability of detection is already low. If we chose parameter values q = 0.9 and k = 0.01, then the probability of detection in the first trial is 0.099 and the probability of detection in the ninety-nineth trial is 0.001. If we chose q = 0.1 then the equivalent probabilities are 0.90 and 0.02. In the latter case, the probability of detection decreases by a factor of 45, meaning 20
In Table 2 we can see the variance of the estimated time of extinction about the true time of extinction over our 1000 simulations.

        10      20      30      40      50      60      70      80       90      100
0.1  1.979   2.652   2.897   3.218   3.676   3.808   4.005   4.098    4.274    5.998
0.2  2.512   3.199   3.566   3.953   4.390   4.505   4.769   4.984    5.340    7.179
0.3  2.885   3.483   4.194   4.411   5.002   5.277   5.361   5.992    6.193    9.216
0.4  3.619   4.671   4.995   5.465   6.007   6.390   6.542   7.021    7.366   11.318
0.5  4.193   5.208   5.662   6.204   7.270   7.513   7.843   8.006    8.788   85.580
0.6  4.934   6.322   7.363   8.089   8.560   9.204   9.420   9.505   17.961   60.216
0.7  5.684   7.899   9.169  10.288  10.973  11.192  11.908  11.951   18.306  136.349
0.8  6.842  10.531  12.729  13.989  15.040  15.764  16.448  18.013   72.092 7912.621
0.9  7.850  13.852  18.722  21.157  23.385  25.983  31.978  88.610 1324.906 2445.194

Table 2: Variance of the estimated time of extinction about the true time of extinction, for different values of q (rows) and true extinction time (columns)

The variance becomes larger as the true time of extinction and q increase, likely because the data are sparser under these conditions.

These simulations are based on a model in which the population really is linearly decreasing, which is unlikely to be the case for our data. There are, however, several obstacles to determining how well our linear model works when applied to exponential data. The first is that there is no definitive extinction time under the exponential model, as by definition the population approaches but never reaches zero. In order to test our model's effectiveness, datasets following the exponential population model were simulated for various values of N0 and λ, and a linear model was fitted to each. Let t_e be the estimated time of extinction; we are then interested in the true population (according to the simulated exponential model) at time t_e, which we obtain from the formula

N0 exp(λ t_e) = N_e.
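A minimal sketch of this check follows, reusing NegLogLik() from the coverage sketch in Section 6.3. The individual detection probability r used to generate the exponential data is our own assumption, as is the starting value supplied to optim().

#================= Sketch: true remaining population at the estimated TOE
RemainingAtTOE <- function(N0, lambda, r, n = 100, nsim = 200)
#=================
{
  Ne <- numeric(nsim)
  for (i in 1:nsim) {
    N <- N0 * exp(lambda * (1:n))       # the true, exponentially declining population
    y <- rbinom(n, 1, 1 - (1 - r)^N)    # simulated detection history
    fit <- optim(c(0, log(1 / n)), NegLogLik, y = y)  # fit the linear model
    te <- exp(-fit$par[2])              # estimated TOE: -1/k.hat with k.hat = -exp(par[2])
    Ne[i] <- N0 * exp(lambda * te)      # true population at the estimated TOE
  }
  median(Ne)                            # corresponds to one cell of Table 3
}
# RemainingAtTOE(N0 = 100, lambda = -0.055, r = 0.05)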
Table 3 shows the median true remaining population at the estimated time of extinction. The surviving population seems to be greater when N0 is larger and when λ is closer to zero. Both of these conditions are associated with later extinction times, so it seems that the model fails when the species does not become extinct during the course of the experiment.

         λ = -0.1   -0.0775   -0.055   -0.0325   -0.01
N0 = 50     3.52      2.63     2.18      2.80    11.90
    100     3.88      3.30     2.65      4.37    21.77
    150     4.55      3.45     3.19      6.27    31.90
    200     4.94      3.79     3.62      8.10    42.51
    250     4.62      3.93     3.82      9.77    52.96

Table 3: Median true population at the estimated time of extinction for different values of N0 and λ

All of these values are greater than 1, meaning that the estimated time of extinction is consistently earlier than the true time of extinction. For this reason it is worthwhile to consider the upper end of the 95% confidence interval as an alternative estimate. Table 4 shows the median true remaining population at the time given by the upper 95% confidence limit for the TOE.

         λ = -0.1   -0.0775   -0.055   -0.0325   -0.01
N0 = 50     0.12      0.19     0.26      0.49     0.00
    100     0.57      0.52     0.82      1.54     0.00
    150     1.02      1.03     1.20      2.97     0.00
    200     1.10      1.22     1.54      4.09     0.36
    250     1.43      1.41     1.75      5.22     0.33

Table 4: Median true population at the upper 95% confidence limit for different values of N0 and λ

Having accounted for the influence of the extreme outliers, these results are more acceptable, although the time of extinction still tends to be underestimated. There is a very clear pattern: the remaining population is larger when the initial population is larger and when the rate of decline is slower. Choosing the upper bound as an alternative therefore gives a more conservative estimate, but it will overestimate the time of extinction if the initial population is small or if the population decline is very slow. For this reason, the method is most reliable when the true extinction time is believed to fall within the span of the experiment. If it is not obvious whether this is the case, one can check post hoc whether the model places the most likely extinction time during the experiment.
6.4 Real Data

So far this model has assumed the most basic structure, in which observation periods occur at single points spaced equally apart. Since this is not true of our data, it is necessary to use the alternative model. Applying this method to our real data, we find that only 31 of our 70 species are estimated as having gone extinct at all. For the dodo we obtain an estimate of 1670, with a 95% confidence interval of 1660 to 1702. This is earlier than the estimate of Roberts and Solow (2003), who, using the optimal linear method, estimated the time of extinction at 1693 with a confidence interval of 1688 to 1715. However, our method uses data from the Mascarene Islands as a whole rather than from Mauritius alone, and a direct comparison is therefore not meaningful.

Table 5 shows the estimated TOE of each species estimated as having become extinct. Bear in mind that the exponential model is likely a better fit for the population decline of these species, so in accordance with our findings above it is also likely that the 95% upper bound is a better estimate of the true time of extinction than the MLE. Generally, the earlier a species is estimated to have gone extinct, the narrower the confidence interval. The dodo, for example, has a confidence interval spanning 42 years, whereas the frigatebirds, estimated to have gone extinct in 1922, have a confidence interval spanning over 17,000 years. Other species estimated to have become extinct in the 20th century, such as the tropical shearwater and the pigeon hollandais, have similarly wide confidence intervals. We interpret these as meaning that the data are not sufficient to provide significant evidence that these species have become extinct.
Species                     Estimated TOE   95% Lower Bound   95% Upper Bound
Dugong                           1856            1804              2125
Loggerhead Turtle                1605            1602              1623
Wedge-tailed Shearwater          1600            1597              1618
Tropical Shearwater              1905            1869              2476
Abbott's Booby                   1670            1667              1688
Frigatebirds                     1922            1841             19139
Pink-backed Pelican              1600            1597              1618
Rougette                         1897            1865              2032
Microbats                        1812            1802              1873
X Herons                         1622            1617              1643
Dimorphic Egret                  1635            1626              1680
Mauritius Night Heron            1732            1695              2368
Greater Flamingo                 1852            1801              2108
Mascarene Teal                   1720            1695              1795
Mauritius Sheldgoose             1702            1683              1782
Reunion Harrier                  1641            1627              1777
Red Hen                          1712            1690              1775
White-throated Rail              1810            1807              1828
Mascarene Coot                   1732            1695              1988
Dodo                             1670            1660              1702
X Pigeons / doves                1762            1738              1846
Pigeon Hollandais                1943            1855              2693
X Parrots                        1862            1806              2168
Raven Parrot                     1697            1673              1803
Thirioux's Grey Parrot           1817            1749              1994
Commerson's Lizard Owl           1922            1796             11404
X Swallows                       1872            1814              3075
Tortoises (Two Species)          1736            1719              1792
X Lizards                        1806            1778              3477
Bojer's Skink                    1867            1856              1912
X Green Day Geckos               2058            1934              8543

Table 5: Estimated time of extinction with 95% confidence intervals for each species

7 Changepoint Models

All models considered so far have assumed that the population decline began at, or prior to, the start of the experiment. This is not necessarily the case: the population of a species may have remained constant for a while before beginning to decline at a later date. In order to model this, we need to combine a constant population model with a linearly or exponentially decreasing one. The model has a fixed probability of detection 1 - q for every sampling occasion until time t_s, henceforth referred to as the changepoint, at which point the probability begins to decline.

If we attempt to fit a model to data with a changepoint that occurs before the first observation, the MLE of the pre-change probability of detection will be 0.
This means that if there are any observations after this changepoint, the probability of detection cannot decrease or remain constant; it must increase to avoid impossible events occurring. It also means that the model will fit too perfectly up to the chosen changepoint (non-detections assigned a detection probability of 0 each contribute probability 1 to the likelihood), so the negative log-likelihood may be lower for these estimates in a way that does not reflect their goodness of fit. This results in a nonsensical estimate. One possible solution is to limit the range over which the changepoint may be estimated so that it cannot fall before the first observation, but this ignores the possibility that the changepoint may indeed have occurred before the first detection. Alternatively, in these cases we could replace the MLE of 0 with an approximation of the upper bound of the probability: Hanley's "rule of three" (Hanley, 1983) shows that the 95% confidence interval for p̂, when there have been 0 successes in n independent Bernoulli trials, is between 0 and 3/n.

7.1 MLE Fitting

Figure 10 shows the profile negative log-likelihood over different values of s for the dugong occupancy data, assuming that the population model switches from constant to exponentially decreasing. The vertical dotted line represents the time of the last sighting of the dugong. The minimum of the negative log-likelihood coincides exactly with the last observation, meaning that the MLE of the changepoint at which the species began to decline occurs at the time of the last observation.

[Figure 10. Left: profile negative log-likelihood for s. Right: probability of detection over time, evaluated at the maximum likelihood estimates.]
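The profiling just described can be sketched as follows. This is a hypothetical illustration rather than the thesis script: it reuses the LikeMix(), ConstN(), ExpN() and SimDatMix() functions listed in Section 10, and the starting values passed to optim() are illustrative. In practice the likelihood may also need guarding against fitted probabilities of exactly 0 or 1.

#================= Sketch: profile negative log-likelihood over the changepoint s
ProfileS <- function(data, s.grid)
#=================
{
  sapply(s.grid, function(s) {
    # with s held fixed, optimise over N0, logit(r), theta1 and theta2
    optim(c(50, 0, 0, -0.05),
          function(par) LikeMix(c(par, s), data, ConstN, ExpN))$value
  })
}
# dat <- SimDatMix(N0 = 50, n = 70, s = 30, r = 0.05, theta1 = 0,
#                  theta2 = -0.1, method1 = ConstN, method2 = ExpN)
# prof <- ProfileS(dat$Data, 2:68)
# plot(2:68, prof, type = "l")   # cf. the left panel of Figure 10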
We find that for many of our datasets the likelihood is maximised at s = t_last sighting, and that in each of them this time is at the very least a local minimum of the negative log-likelihood. Looking at the parameter values estimated at ŝ for the dugong dataset, we find the MLE values q̂ = 0.64 and λ̂ = -15.08. The right-hand graph in Figure 10 shows the probability of detection over time for these parameter values and explains why this relationship occurs. The value of λ̂ is so drastic that the population, and hence the probability of detection, effectively drops to zero before the time of the next sampling occasion. Since there are no sightings after the last sighting, the probability of each of these n - s sampling occasions is 1. The dugong is observed on 15 sampling occasions and is last seen on the 42nd. Treating these 42 sampling occasions as a series of Bernoulli trials, we would calculate q̂ = 1 - 15/42 = 0.64, which is the same as the estimate of q̂ derived from fitting the changepoint model using optim(). The same pattern is found in the rest of our datasets.

Using the method of maximum likelihood to choose s is therefore inappropriate, as it tends to choose the time of the last observation. It is possible, however, that local maxima of the likelihood prior to the last observation correspond to the actual times at which a constant population began to decline. In order to test this, mixed-model datasets were simulated at various values of s, k and q. For each set of parameters, five datasets were simulated and used to find the average profile likelihood for s, with the intention that this would be less susceptible to random noise than a single dataset. These profile likelihoods can be seen in Figures 11 and 12, with vertical lines corresponding to the true changepoints. If the method of maximum likelihood were sensitive to the true changepoint, we would see dips or valleys close to the vertical lines; however, a visual inspection was unable to detect any evidence of such a phenomenon. This suggests that it is entirely inappropriate to attempt to use the method of maximum likelihood to determine where a changepoint occurs.

7.2 Known Changepoint

Is it better to fit a mixed model or a simple one if the changepoint is known? Assume that the changepoint is known, as it can be for our simulated datasets. Our desired model assumes a constant probability of detection before time s, after which the probability of detection declines. Our alternative model makes no assumption about the probability of detection before time s, and the datapoints from this period are therefore not used. Our motivation for believing that the desired model will give a better fit is that the data from before s should help us obtain a more accurate estimate of q, and therefore our estimates of λ and k will be better informed.
[Figure 11: Average profile likelihood for different values of q and θ. Panels show the average profile likelihood against time for true k = -0.05 or -0.025 and true q = 0.3 or 0.7.]

[Figure 12: Average profile likelihood for different values of q and θ. Panels show the average profile likelihood against time for true k = -0.05 or -0.025 and true q = 0.3 or 0.7.]
Simply comparing the likelihoods, or even the AICs, is inappropriate, as the likelihood for the alternative method is based on fewer datapoints than that of the desired method. We instead compare the two methods by measuring the deviation of the estimated time of extinction from the true time of extinction using the formula

Σ (True Time of Extinction - Estimated Time of Extinction)² / No. of Simulations.   (14)

For each value of k and q we generate 100 datasets of length 100 with s fixed at 25, and take the average difference in fit between the two models. Table 6 shows the results of this test at various values of q and k.

            q = 0.2    0.4     0.6     0.8
k = -0.1       0.70   2.19    5.44    3.51
   -0.05       0.66   3.69    6.13    5.74
   -0.0˙3      2.36   3.55   16.30   21.14
   -0.02       2.63   4.54   11.37   29.22

Table 6: Difference in fit between the full model and the model beginning after the changepoint, for different values of k (rows) and q (columns)

Unsurprisingly, the model which uses the entire dataset provides a fit that is consistently better than the model which excludes the data from before the changepoint. The advantage is greatest when q is large and k is close to zero. We know from Table 2 in Section 6.3 that large values of q and values of k close to zero are associated with greater variance, due to the sparseness of the data and the persistence of sightings. In these situations, fitting the full model gives a more accurate estimate of q, which in turn makes the estimate of k more accurate. The improvement is negligible, however, when q is small and k is substantial. The full model is therefore most worthwhile when the probability of detection is low and the species is estimated as becoming extinct late in the dataset.
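A sketch of this comparison is given below. It is our own reconstruction under stated assumptions: CPLik(), the known-changepoint likelihood with constant probability 1 - q before s and the reparameterised linear decline after it, is our naming; NegLogLik() is reused from the coverage sketch in Section 6.3, fitted here to the post-changepoint data only; and the starting values are illustrative.

#================= Sketch: negative log-likelihood with a known changepoint s
CPLik <- function(par, y, s)
#=================
{
  q <- plogis(par[1]); k <- -exp(par[2])
  j <- seq_along(y)
  p <- ifelse(j <= s, 1 - q, 1 - q^pmax(1 + k * (j - s), 0))
  p <- pmin(pmax(p, 1e-12), 1 - 1e-12)
  -sum(dbinom(y, 1, p, log = TRUE))
}

#================= Sketch: equation (14) for the full and truncated fits
CompareFits <- function(q, k, s = 25, n = 100, nsim = 100)
#=================
{
  true.toe <- s - 1 / k          # decline starts at s and reaches zero 1/|k| later
  err <- matrix(NA_real_, nsim, 2)
  for (i in 1:nsim) {
    p <- c(rep(1 - q, s), 1 - q^pmax(1 + k * (1:(n - s)), 0))
    y <- rbinom(n, 1, p)
    full  <- optim(c(0, log(-k)), CPLik, y = y, s = s)          # uses all n points
    trunc <- optim(c(0, log(-k)), NegLogLik, y = y[(s + 1):n])  # discards pre-s data
    err[i, ] <- (true.toe - (s + 1 / exp(c(full$par[2], trunc$par[2]))))^2
  }
  colMeans(err)                  # equation (14) for each of the two models
}
# CompareFits(q = 0.6, k = -0.05)   # cf. the corresponding cell of Table 6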
8 Models Assuming a Common Probability of Individual Detection

Reparameterising our model enables us to estimate our parameters, but for the exponential model it makes it impossible to estimate extinction, as we cannot recover the remaining population at any time. To make the exponential population model useful, we need to introduce new information. In basic occupancy modelling, parameter redundancy is overcome by simultaneously conducting surveys at more than one site, so perhaps a similar assumption may help overcome parameter redundancy in our case. Assume we have three closely related species of rabbit. If they are similar in size and behaviour, it may be reasonable to assume that the individual probability of detection r is common to all of them. Remembering that our parameter redundancy was caused by the fact that r and N0 are always encountered in the same relationship, making this assumption might allow us to avoid it. We assume that r, the rate of individual detection, is the same for species a, b and c. The probability of detection of each species at time j is then

P(a) = 1 - (1 - r)^(N0 exp(λj)),   (15)
P(b) = 1 - (1 - r)^(b N0 exp(θj)),   (16)
P(c) = 1 - (1 - r)^(c N0 exp(μj)),   (17)

where N0 is the initial population of species a, and b and c are the sizes of the initial populations of species b and c relative to species a; e.g., if species b has twice the population of species a, then b = 2. It would also be possible to use this method with the linear population model, but since it is already possible to estimate extinction times from the reparameterised linear model, it is unnecessary to incorporate new information there.

8.1 Extinction Times Confidence Intervals

Since the exponential model for population decline never reaches zero, it is necessary to choose a minimum viable population, such that once the population dips below this point the species is considered extinct. Potential values such as 1 or 0.5 will be considered; for now we use α to refer to this value. In order to work out the variance of the time t_j at which the population reaches α, we have

N0 exp(λ t_j) = α,   (18)
t_j = log(α / N0) / λ.   (19)

We can use the delta method to derive the variance of t_j. First we find the Jacobian vector of partial derivatives [∂t_j/∂N0, ∂t_j/∂λ],

J^T = [ -1 / (λ N0),  -log(α / N0) / λ² ],   (20)

and use the formula var(t_j) = J^T · S · J, where S is the covariance matrix of N̂0 and λ̂. This can be found numerically by inverting the Hessian matrix returned by optim(). We may use this variance to construct confidence intervals based on asymptotic normality.
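The interval can be computed numerically as in the sketch below. This assumes the optimisation was carried out directly over (N0, λ), in that order, and that optim() was called with hessian = TRUE on the negative log-likelihood; the function name DeltaVarTOE and the argument fit are hypothetical.

#================= Sketch: delta-method CI for the time the population reaches alpha
DeltaVarTOE <- function(fit, alpha = 0.5)
#=================
{
  N0 <- fit$par[1]
  lambda <- fit$par[2]
  tj <- log(alpha / N0) / lambda        # equation (19)
  J <- c(-1 / (lambda * N0),            # d tj / d N0
         -log(alpha / N0) / lambda^2)   # d tj / d lambda; together, equation (20)
  S <- solve(fit$hessian)               # inverse Hessian estimates cov(N0.hat, lambda.hat)
  se <- sqrt((t(J) %*% S %*% J)[1, 1])  # var(tj) = J' S J
  c(estimate = tj, lower = tj - 1.96 * se, upper = tj + 1.96 * se)
}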
8.2 Parameter Redundancy

Unfortunately, even with this additional data, we find the same type of parameter redundancy that we found when attempting to fit a model to one species. It is still possible to make the substitution q = (1 - r)^N0. If we do this, the probabilities may be rewritten as

P(a) = 1 - q^(exp(λj)),   (21)
P(b) = 1 - q^(b exp(θj)),   (22)
P(c) = 1 - q^(c exp(μj)).   (23)

If we take a further step and redefine s = q^b and w = q^c, we find that we effectively return to a state in which there are no shared parameters between the species, so that we are effectively estimating the parameters for each species independently within the same model. The model for each separate species contains only two items of information: the initial probability of detection, and the rate at which that probability decreases between time periods. Any method that allows each individual species its own initial probability and its own rate of decline will therefore always result in parameter redundancy.

Even if this model cannot be used to determine times of extinction, it may still be useful in other contexts. It still allows the sizes of the populations to be estimated relative to one another, as well as the rates of decline, so that if the population of one of the species is known, the others can be inferred.

The time of extinction cannot be estimated with an exponential population model without additional information from outside the occupancy data. If a technique such as the Lincoln-Petersen estimate (Lincoln, 1930) or the Chapman estimate (Chapman, 1951) were used to estimate the population at some point in the experiment, it would become possible to estimate extinction, as the model would no longer be parameter redundant. Let P_j be the population estimate according to either of these methods at time t_j. To incorporate this information into the model, we first fit the model in its redundancy-free parameterisation, with parameters q and λ. We can then equate 1 - q̂^(exp(λ̂ j)) with 1 - (1 - r̂)^(P̂_j). Since we have estimates q̂, λ̂ and P̂_j, we can express r̂ in terms of them as r̂ = 1 - q̂^(exp(λ̂ j) / P̂_j). This estimate of r̂ may in turn be used to estimate N0: since q was defined as (1 - r)^N0, we have N̂0 = log(q̂) / log(1 - r̂).
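A small sketch of this back-calculation is given below. The inputs q.hat and lambda.hat are assumed to come from the redundancy-free exponential fit, and Pj.hat from a Lincoln-Petersen or Chapman estimate at occasion j; all of the names are hypothetical.

#================= Sketch: recovering r and N0 from an external population estimate
RecoverN0 <- function(q.hat, lambda.hat, Pj.hat, j)
#=================
{
  r.hat  <- 1 - q.hat^(exp(lambda.hat * j) / Pj.hat)  # from equating the two forms
  N0.hat <- log(q.hat) / log(1 - r.hat)               # since q = (1 - r)^N0
  # note that algebraically N0.hat = Pj.hat * exp(-lambda.hat * j), i.e. the
  # external population estimate projected back to time zero
  c(r = r.hat, N0 = N0.hat)
}
# RecoverN0(q.hat = 0.2, lambda.hat = -0.05, Pj.hat = 40, j = 10)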
If estimates of all parameters are available, it is possible to use the model to evaluate the population at each time period and determine when the species has gone extinct. It is still unclear, however, at what point the population becomes small enough for extinction to have occurred. An obvious choice is to conclude that the species has gone extinct once the population falls below 1, as this is the earliest point implying that there are no remaining living specimens. A more conservative choice is the point at which the population falls below 0.5: here the population is closer to 0 than to 1, so zero remaining specimens becomes more likely than one remaining specimen. There is no purely statistical resolution to this uncertainty; we choose 0.5 for our purposes.

8.3 Simulations

Simulations were run in order to estimate times of extinction and CIs for species for which it is appropriate to use this method. To overcome our parameter redundancy, we assume that we know the true population at t_0 of the species at site a, and we use this to infer the populations at the other two sites. In practice, confidence intervals could not be derived for these estimates using the delta method as above, because the estimate of the population at site a involves data from a different source than our occupancy model, so we cannot determine the covariance of this estimate with the estimates derived from our occupancy data. There are too many combinations of these parameters for us to explore every one, so we instead focus on two settings:

• Three sites with the same initial population and different values of λ, evaluated at different values of r. Simulations are run at three different initial populations in order to determine whether this method is most effective when N0, λ and r are small, medium or large, and whether there appear to be any interaction effects.
• Three sites with the same λ and different values of N0, in order to determine whether this method is more effective when the values of N0 are close to one another or more varied.

Table 7 shows the average estimated TOEs for various values of r and λ for three sites with identical initial populations.

                 N0 = 50                N0 = 100               N0 = 150
  λ =      -0.07   -0.1  -0.15    -0.07   -0.1  -0.15    -0.07   -0.1  -0.15
True TOE      66     47     31       76     53     36       82     58     39
r = 0.01   61.69  43.83  28.13    72.49  51.30  33.08    86.51  56.19  36.31
    0.05   75.46  46.78  31.92    89.37  54.03  36.97    96.21  56.89  38.25
    0.10   74.39  44.28  31.60    91.57  54.00  37.16    97.14  58.43  40.73

Table 7: Estimated TOEs for simulated data with identical initial populations at different values of r (rows) and λ (columns). True extinction times are given below the values of λ.

              N0 = 90, 100, 110    N0 = 50, 100, 150    N0 = 50, 150, 300
True TOE        75     76     78     66     76     82     66     82     92
r = 0.01     73.96  74.24  77.78  52.70  67.52  76.45  42.99  70.77  79.60
    0.05     86.49  76.18  77.29  62.99  74.89  86.66  60.90  80.20  97.23
    0.10     96.83  80.73  81.39  73.38  76.36  85.12  64.55  81.08  91.29

Table 8: Estimated TOEs for simulated data with identical λ at different values of N0 (columns) and r (rows). True TOEs are given below the values of N0.

The estimates seem to be just as effective for large values of N0 as for small ones. They tend to underestimate when r is small and to overestimate when r is large; they also underestimate for values of λ further below zero and overestimate for values of λ closer to zero. Small values of r are associated with a lower probability of detection, and more negative values of λ with a probability of detection that decreases more rapidly, suggesting that the underestimation could be due to the relative sparseness of the data leading to earlier estimates of extinction. These estimates are nevertheless still fairly accurate.
9 Discussion

Given the limited form of the data provided, it is difficult to obtain meaningful results from maximum likelihood estimation. We are unable to estimate the population at any point, and we cannot estimate when a species becomes extinct unless we assume that the population is linearly decreasing, which is not realistic in practice. This method consistently underestimates the true TOE when applied to data where the population is exponentially decreasing, so the upper 95% CI may serve as an alternative, more accurate estimate.

If the population is believed to have been constant up to a changepoint at which it began to decline, we are unable to reliably estimate where this changepoint occurred: the method of maximum likelihood commonly finds the most likely changepoint to be the last observation, and we have found no evidence of even a local maximum associated with the true changepoint.

Models which assume that species at different sites share a common individual probability of detection do not allow us to estimate the population itself, only the sizes of the populations relative to one another and the rates of decline, which may be useful in their own right. A possible avenue for additional research would be to combine these models with the assumptions of the Royle-Nichols model (Royle and Nichols, 2003), namely that the populations at each site are distributed according to some discrete distribution, usually the Poisson or the negative binomial. However, this was beyond the scope of this thesis.

Our methods may also be applied to species which colonise a site, as well as to species that become extinct. For the linearly increasing population model, we could achieve this by fixing the initial population at 0 and fitting the model. This would not work for the exponentially increasing population model, since 0 · exp(λt) is 0 for any value of λ; instead, the initial population would need to be set at some suitably small number such as 0.001.

10 Code

Several thousand lines of code were written for this thesis. A comprehensive list of every function and script used would be excessive, and much of the code consisted of subtle variations on the same basic structure. For each of the population models (constant, linear and exponential), a function was created to generate a vector of length n with initial population N0 and, in the case of the linear and exponential models, rate of decay θ.
#================= Simulates Constant N
ConstN <- function(N0,n,theta)
#=================
{
  # theta is ignored; the population remains at N0 throughout
  rep(N0,n)
}

#================= Simulates Linearly changing N
LinN <- function(N0,n,theta)
#=================
{
  i <- 1:n
  N <- N0 + theta*i
  N[which(N<0)] <- 0   # the population cannot fall below zero
  N
}

#================= Simulates Exponentially changing N
ExpN <- function(N0,n,theta)
#=================
{
  i <- 1:n
  N0*exp(i*theta)
}

These functions were called within another function, SimDat(), which took as arguments:

• N0 = population at t_0;
• n = number of observational periods;
• r = individual probability of detection;
• θ = rate of decline (for the linear and exponential models);
• method = specifies which population model is to be used.

#================= Simulates Dataset
SimDat <- function(N0,n,r,theta,method=c(ConstN,LinN,ExpN))
#=================
{
  N <- method(N0,n,theta)   # population at each sampling occasion
  p <- 1-(1-r)^N            # detection probability at each occasion
  Data <- numeric(n)
  for(i in 1:n){
    Data[i] <- rbinom(1,1,p[i])
  }
  list(Data=Data,N=N,p=p)
}

This function calls the population function specified by the choice of method (in practice a single function such as LinN is supplied, since the default c(ConstN,LinN,ExpN) merely documents the options), calculates the probability of detection at each sampling occasion according to the formula p_j = 1 - (1 - r)^(N_j), and uses this probability in conjunction with rbinom() to randomly generate a dataset. It returns the dataset as output, along with the values of N and p.

The function SimDatMix() uses a similar principle to generate datasets from changepoint models. It retains many of the arguments of SimDat(), but introduces the new argument s, the changepoint (1 < s < n), and replaces θ and method with θ1, θ2, method1 and method2, where method1 and θ1 apply to the dates before the changepoint and method2 and θ2 to the dates after it. Note that although only models where method1 is constant are considered, the function allows any combination of population models. The code for this is:

#================= Simulates Mixed Dataset
SimDatMix <- function(N0,n,s,r,theta1,theta2,method1=c(ConstN,LinN,ExpN),
                      method2=c(ConstN,LinN,ExpN))
#=================
{
  Na <- method1(N0,s,theta1)          # population up to the changepoint
  Nb <- method2(Na[s],(n-s),theta2)   # population afterwards, starting from N at s
  N <- c(Na,Nb)
  p <- 1-(1-r)^N
  Data <- numeric(n)
  for(i in 1:n){
    Data[i] <- rbinom(1,1,p[i])
  }
  list(Data=Data,N=N,p=p)
}
The next function calculates the negative log-likelihood for given parameters, data and a specified population model.

#================= Calculates Negative Log Likelihood
Like <- function(lambda,data,meth=c(ConstN,LinN,ExpN))
#=================
{
  N0 <- lambda[1]
  r <- 1/(1+exp(-lambda[2]))   # logistic transform keeps 0 < r < 1
  theta <- lambda[3]
  n <- length(data)
  N <- meth(N0,n,theta)
  p <- 1-(1-r)^N
  like <- p^data*(1-p)^(1-data)   # Bernoulli likelihood of each observation
  -sum(log(like))
}

#================= Calculates Mixed Negative Log Likelihood
LikeMix <- function(lambda,data,meth1=c(ConstN,LinN,ExpN),meth2=c(ConstN,LinN,ExpN))
#=================
{
  N0 <- lambda[1]
  r <- 1/(1+exp(-lambda[2]))
  theta1 <- lambda[3]
  theta2 <- lambda[4]
  n <- length(data)
  s <- lambda[5]
  Na <- meth1(N0,s,theta1)
  Nb <- meth2(Na[s],(n-s),theta2)
  N <- c(Na,Nb)
  p <- 1-(1-r)^N
  like <- p^(data)*(1-p)^(1-data)
  -sum(log(like))
}
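The sketch below shows how these pieces fit together in a typical run; the parameter values and starting values are illustrative only. Note that Like() can return a non-finite value if a candidate parameter set implies a detection probability of exactly 0 at an occasion with a sighting, so starting values may need some care.

#================= Sketch: example usage with illustrative values
set.seed(1)
dat <- SimDat(N0 = 100, n = 100, r = 0.05, theta = -1, method = LinN)
fit <- optim(c(50, qlogis(0.1), -0.5), Like, data = dat$Data, meth = LinN)
fit$par                                           # MLEs of N0, logit(r) and theta
which(LinN(fit$par[1], 100, fit$par[3]) == 0)[1]  # first occasion with fitted N = 0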
Models fixed for parameter redundancy, models that start at the first observation, and models that take the different time periods into account are all based on variations of this style.

11 Acknowledgements

Thanks are given to Rachel McCrea and David Roberts for supervision and guidance with respect to the direction of this thesis and for their knowledge of statistics and ecology, to Cheke and Hume for provision of the data used, and to the R community for the tools used.

References

[1] Catchpole, E., and Morgan, B. Detecting Parameter Redundancy. Biometrika 84.1 (1997): 187-196.

[2] Catchpole, E., and Morgan, B. Deficiency of Parameter-redundant Models. Biometrika 88.2 (2001): 593-598.

[3] Chapman, D.G. Some Properties of the Hypergeometric Distribution with Applications to Zoological Sample Censuses. Berkeley: University of California Press (1951): 131-159.

[4] Cheke, A.S., and Hume, J. Lost Land of the Dodo: An Ecological History of Mauritius, Réunion and Rodrigues. New Haven: Yale UP, 2008.
[5] Dantzig, G.B., Orden, A., and Wolfe, P. The Generalized Simplex Method for Minimizing a Linear Form under Linear Inequality Restraints. Pacific Journal of Mathematics (1955): 183-195.

[6] Hanley, J.A. If Nothing Goes Wrong, Is Everything All Right? Interpreting Zero Numerators. JAMA: The Journal of the American Medical Association 249.13 (1983): 1743-1745.

[7] Hoeting, J.A., Leecaster, M., and Bowden, D. An Improved Model for Spatially Correlated Binary Responses. Journal of Agricultural, Biological, and Environmental Statistics 5.1 (2000): 102.

[8] Lincoln, F.C. Calculating Waterfowl Abundance on the Basis of Banding Returns. Washington, DC: United States Department of Agriculture, Circular 118 (1930).

[9] MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A., and Langtimm, C.A. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83.8 (2002): 2248.

[10] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2015). URL https://www.R-project.org/.

[11] Roberts, D.L., and Solow, A.R. Flightless Birds: When Did the Dodo Become Extinct? Nature 426.6964 (2003): 245.

[12] Royle, J.A., and Nichols, J.D. Estimating Abundance from Repeated Presence-Absence Data or Point Counts. Ecology 84.3 (2003): 777-790.

[13] Tyre, A.J., Tenhumberg, B., Field, S.A., Niejalke, D., Parris, K., and Possingham, H.P. Improving Precision and Reducing Bias in Biological Surveys: Estimating False-Negative Error Rates. Ecological Applications 13.6 (2003): 1790-1801.

[14] Young, E. Statistical Methods for Timed Species Counts. Master's thesis, University of Kent (2002).