Data Science can be used to inform environmental management and policy. We have been working with several NSW Government stakeholders to give insight into how the Sydney community interacts with its beautiful harbour.
17. Species distribution models
A model that relates environmental predictors to known species locations across a landscape, to provide understanding or prediction.
18. Fig. 1. (a) Example presence-only data: atlas records of where the tree species Angophora costata has been reported to be present, west of Sydney, Australia. The study region is shaded. (b) A map of minimum temperature (°C) over the study region. Variables such as this are used to model how intensity of A. costata presence relates to the environment. (c) A species distribution model, modeling the association between A. costata and a suite of environmental variables. This is the fitted intensity function for A. costata records per km², modeled as a quadratic function of four environmental variables using a point process model as in Section 4.
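A minimal Python sketch of what "a quadratic function of four environmental variables" could look like. It assumes a log link and no interaction terms, neither of which the caption states. The function names and all numbers are hypothetical, not taken from the paper.

```python
import math

def quadratic_features(x):
    """Expand one site's environmental values into linear and squared terms.

    Assumption: no interaction (cross-product) terms are included.
    """
    return list(x) + [v * v for v in x]

def quadratic_intensity(beta0, coefs, x):
    """Intensity exp(beta0 + sum of coef * feature), assuming a log link."""
    feats = quadratic_features(x)
    if len(coefs) != len(feats):
        raise ValueError("need one coefficient per linear and squared term")
    return math.exp(beta0 + sum(c * f for c, f in zip(coefs, feats)))

# Hypothetical coefficients for four variables: only the first variable matters
# here, with linear coefficient 2 and squared coefficient -1.
coefs = [2.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
for value in (0.5, 1.0, 1.5):
    print(value, quadratic_intensity(0.0, coefs, [value, 0.0, 0.0, 0.0]))
```

A negative squared coefficient gives a hump-shaped response, so the modeled intensity peaks at an intermediate value of the variable (here at 1.0) rather than rising or falling throughout.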
example is given in Figure 1(a). This figure gives all locations where a par-
perform well in characterizing the natural distributions of species (within their current range)
Occurrence points | Environmental predictor | Model prediction
Warton and Shepherd (2010) Ann. Appl. Stat.
Elith and Leathwick (2009) Annu. Rev. Ecol. Evol. Syst.
Warton and Aarts (2013) J. Anim. Ecol.
useful ecological insight and strong predictive capability
19. Ecological Applications, 24(1), 2014, pp. 71–83
© 2014 by the Ecological Society of America
Prediction of fishing effort distributions using boosted regression trees
CANDAN U. SOYKAN (1,2,3), TOMOHARU EGUCHI (1), SUZANNE KOHIN (2), AND HEIDI DEWAR (2)
(1) Marine Mammal and Turtle Division, Southwest Fisheries Science Center, National Marine Fisheries Service,
National Oceanic and Atmospheric Administration, 8901 La Jolla Shores Drive, La Jolla, California 92037 USA
(2) Fisheries Resources Division, Southwest Fisheries Science Center, National Marine Fisheries Service,
National Oceanic and Atmospheric Administration, 8901 La Jolla Shores Drive, La Jolla, California 92037 USA
Abstract. Concerns about bycatch of protected species have become a dominant factor
shaping fisheries management. However, efforts to mitigate bycatch are often hindered by a
lack of data on the distributions of fishing effort and protected species. One approach to
overcoming this problem has been to overlay the distribution of past fishing effort with known
locations of protected species, often obtained through satellite telemetry and occurrence data,
to identify potential bycatch hotspots. This approach, however, generates static bycatch risk
maps, calling into question their ability to forecast into the future, particularly when dealing
with spatiotemporally dynamic fisheries and highly migratory bycatch species. In this study,
we use boosted regression trees to model the spatiotemporal distribution of fishing effort for
two distinct fisheries in the North Pacific Ocean, the albacore (Thunnus alalunga) troll fishery
and the California drift gillnet fishery that targets swordfish (Xiphias gladius). Our results
suggest that it is possible to accurately predict fishing effort using <10 readily available
predictor variables (cross-validated correlations between model predictions and observed data
~0.6). Although the two fisheries are quite different in their gears and fishing areas, their
respective models had high predictive ability, even when input data sets were restricted to a
fraction of the full time series. The implications for conservation and management are
encouraging: Across a range of target species, fishing methods, and spatial scales, even a
relatively short time series of fisheries data may suffice to accurately predict the location of
fishing effort into the future. In combination with species distribution modeling of bycatch
species, this approach holds promise as a mitigation tool when observer data are limited. Even
in data-rich regions, modeling fishing effort and bycatch may provide more accurate estimates
of bycatch risk than partial observer coverage for fisheries and bycatch species that are heavily
influenced by dynamic oceanographic conditions.
Key words: albacore; bycatch mitigation; dynamic oceanographic conditions; fisheries management;
marine spatial planning; species distribution modeling; swordfish.
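A rough Python sketch of the two ideas in the abstract above: boosted regression trees, and a cross-validated correlation between predictions and observed data. It is not the authors' implementation. It uses one-split trees (stumps) with squared-error loss for brevity, and the toy "effort" data, predictor values, and all function names are invented for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def fit_stump(X, resid):
    """Best single split (feature, threshold, left mean, right mean) by squared error."""
    best = None
    for j in range(len(X[0])):
        cuts = sorted(set(row[j] for row in X))
        for lo, hi in zip(cuts, cuts[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, resid) if row[j] <= t]
            right = [r for row, r in zip(X, resid) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        raise ValueError("no predictor has two distinct values to split on")
    return best[1:]

def fit_brt(X, y, n_trees=30, learning_rate=0.1):
    """Boosting: each stump is fitted to the residuals of the current ensemble."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, resid)
        stumps.append((j, t, ml, mr))
        pred = [p + learning_rate * (ml if row[j] <= t else mr)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict_brt(model, X):
    base, lr, stumps = model
    return [base + lr * sum(ml if row[j] <= t else mr for j, t, ml, mr in stumps)
            for row in X]

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def cv_correlation(X, y, k=5, seed=0):
    """Correlation between held-out predictions and observations over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    held_out = [0.0] * len(y)
    for fold in (idx[i::k] for i in range(k)):
        in_fold = set(fold)
        train = [i for i in idx if i not in in_fold]
        model = fit_brt([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(fold, predict_brt(model, [X[i] for i in fold])):
            held_out[i] = p
    return pearson(held_out, y)

def make_toy_data(n=40, seed=1):
    """Invented data: 'effort' steps up when the first predictor exceeds 0.5."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [(3.0 if row[0] > 0.5 else 1.0) + rng.gauss(0, 0.3) for row in X]
    return X, y

X, y = make_toy_data()
print("cross-validated correlation:", round(cv_correlation(X, y), 2))
```

In practice a library implementation would replace the hand-written stumps. The sketch only shows why held-out predictions, rather than in-sample fits, are the ones correlated against the observations.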
INTRODUCTION
… or negligible given the costs and logistics associated with
such efforts. Although such obstacles impede direct …
FIG. 1. Maps of cumulative fishing effort: (A) West Coast drift gillnet (DGN; measured as number of gear sets) and (B) North
Pacific albacore troll (AT; measured as number of days fished) fisheries. Individual grid cells are 10′ × 10′ for the drift gillnet fishery
and 1° × 1° for the albacore troll fishery. The drift gillnet fishery data cover the period 1981–2001, and the albacore troll fishery data
cover the period 1991–2010. Grid cells with fewer than three total sets or days fished have been censored for confidentiality.
January 2014, PREDICTING FISHING EFFORT DISTRIBUTIONS, p. 75
20. City of Sydney
Rose Bay
Lane Cove River
Manly
Sydney Institute of Marine Science
22. Figure: predictors, occurrences, prediction
model number of presence points n and their location (yi). This has not
previously been proposed for the analysis of presence-only data, despite
the extensive literature on the analysis of presence-only data. We consider
inhomogeneous Poisson point process models [Cressie (1993); Diggle (2003)],
which make the following two assumptions:
1. The locations of the n point events (y1,...,yn) are independent.
2. The intensity at point yi [λ(yi), denoted as λi for convenience], the limiting
expected number of presences per unit area [Cressie (1993)], can
be modeled as a function of the k explanatory variables. We assume a
log-linear specification:
$$\log(\lambda_i) = \beta_0 + \sum_{j=1}^{k} x_{ij}\beta_j \qquad (2.1)$$
although note that the linearity assumption can be relaxed in the usual
way (e.g., using quadratic terms or splines). The parameters of the model
for the λi are stored in the vector β = (β0,β1,...,βk).
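A small Python sketch of the log-linear intensity in (2.1), together with the standard inhomogeneous Poisson point process log-likelihood (up to an additive constant): the sum of log λ at the presence points minus the integral of λ over the study region. The integral is approximated here by summing over grid cells. The cell layout, covariate values, and function names are hypothetical and not taken from the paper.

```python
import math

def log_intensity(beta, x):
    """log(lambda_i) = beta_0 + sum_j x_ij * beta_j, as in equation (2.1)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def ppm_log_likelihood(beta, presence_x, cell_x, cell_area):
    """Poisson point process log-likelihood, up to an additive constant.

    presence_x: covariate vectors at the n presence points
    cell_x:     covariate vectors at grid-cell centres covering the region
    cell_area:  area of each grid cell (a crude quadrature rule, by assumption)
    """
    point_term = sum(log_intensity(beta, x) for x in presence_x)
    integral = sum(math.exp(log_intensity(beta, x)) * a
                   for x, a in zip(cell_x, cell_area))
    return point_term - integral

# Hypothetical toy region: 5 unit-area cells, 3 presence records, one covariate.
presence_x = [[0.1], [0.5], [0.9]]
cell_x = [[0.0], [0.25], [0.5], [0.75], [1.0]]
cell_area = [1.0] * 5

# With no covariate effect (beta_1 = 0), the likelihood is maximised when the
# intensity equals records per unit area, here 3 / 5.
for beta0 in (math.log(0.5), math.log(0.6), math.log(0.7)):
    print(round(math.exp(beta0), 2),
          ppm_log_likelihood([beta0, 0.0], presence_x, cell_x, cell_area))
```

The constant-intensity case is a quick sanity check that the fitted intensity is measured in presences per unit area, matching the definition of λi in assumption 2.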
Note that the process being modeled here is locations where an organism has
been reported rather than locations where individuals of the organism occur.
Hence, the independence assumption would only be violated by interactions
between records of sightings rather than by interactions between individual
organisms per se. The atlas data of Figure 1 consist of 721 A. costata
records accumulated over a period of 35 years in a region of 86,000 km², so
model
explanation | correlation
MaxEnt
Boosting
GLM | GAM
Random Forest