Providing Statistical Algorithms as-a-Service
Gianpaolo Coro, Pasquale Pagano, Leonardo Candela
ISTI-CNR, Pisa, Italy
Statistical Manager
Statistical Manager is a set of web services that aim to:
• Help scientists tackle statistical problems in marine, biological and climatic domains
• Supply ready-made state-of-the-art algorithms as-a-Service
• Perform calculations on Cloud computing resources in a way that is transparent to users
• Share input, results, parameters and comments with colleagues by means of Virtual Research Environments in the D4Science e-Infrastructure
[Diagram labels: Setup and execution; Statistical Manager; Sharing; D4Science Computational Facilities]
Architecture
Internal Work
Resources and Sharing
Statistical Manager - Interface
Experiment Execution
Computations Check

Summary of the input, output and parameters of the experiment
Data Space - Sharing and Import
Hosted Algorithms
Application Fields

• Ecology
• Environment
• Biodiversity
• Life
Ecology
Niche Modelling
• AquaMaps – Suitable Habitat
• AquaMaps – Native Habitat
• AquaMaps for 2050
• Artificial Neural Networks
[Figure panels for Gadus morhua: AquaMaps - ANN; AquaMaps - Suitable Habitat]
Outliers Detection
Presence points: Cetorhinus maximus
Density-based clustering and outliers detection: DBScan
Distance-based clustering: K-Means, X-Means
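A minimal sketch of density-based outlier detection on presence points, assuming plain (x, y) coordinates and Euclidean distance. The function name, the eps and min_pts values, and the sample points are illustrative assumptions, not the Statistical Manager implementation.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical presence points: two dense groups and one isolated record.
presence = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (20.0, -30.0),
]
labels = dbscan(presence, eps=0.5, min_pts=3)
outliers = [p for p, label in zip(presence, labels) if label == -1]
```

Points labelled -1 belong to no dense region and are the candidate outliers; here only the isolated record ends up in `outliers`.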
Effects of Climate Change on Species
Bioclimate HSpec
Overall occupancy in time
Estimated impact of climate changes over 20 years on 11 549 species.
The occupancy of Pseudanthias evansi decreases in Area 71 but increases in Area 77.
Similarity between habitats
Habitat Representativeness Score (HRS):
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
[Figure: Latimeria chalumnae, HRS=10.5]
Environment
Rasterization

A polygonal map is transformed into a raster map or into a point map
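A minimal sketch of rasterization, assuming a single polygon without holes and a regular grid sampled at cell centres. The function names, the grid parameters and the sample polygon are illustrative assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies strictly inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_min, cell, n_cols, n_rows):
    """Return a row-major grid of 0/1 values, one per cell centre."""
    grid = []
    for r in range(n_rows):
        y = y_min + (r + 0.5) * cell
        row = []
        for c in range(n_cols):
            x = x_min + (c + 0.5) * cell
            row.append(1 if point_in_polygon(x, y, polygon) else 0)
        grid.append(row)
    return grid


# Hypothetical polygon: a square covering the middle of a 4 x 4 grid.
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
grid = rasterize(square, x_min=0.0, y_min=0.0, cell=1.0, n_cols=4, n_rows=4)

# Point map: the centres of the cells that fall inside the polygon.
points = [
    (0.0 + (c + 0.5) * 1.0, 0.0 + (r + 0.5) * 1.0)
    for r, row in enumerate(grid)
    for c, value in enumerate(row)
    if value
]
```

The same grid yields both outputs: `grid` is the raster map and `points` is the point map.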
Maps Comparison
Compares:
• Species Distribution maps
• Environmental layers
• SAR Images
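A minimal sketch of one way two maps could be compared, assuming both are rasters of equal size on the same grid. The two measures (mean absolute difference and share of cells agreeing within a tolerance), the function name and the sample values are assumptions chosen for illustration; the slides do not state which measures are used.

```python
def compare_maps(map_a, map_b, tolerance=0.1):
    """Return (mean absolute difference, fraction of cells within tolerance)."""
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(map_a, map_b)
        for a, b in zip(row_a, row_b)
    ]
    mean_abs = sum(diffs) / len(diffs)
    agreement = sum(1 for d in diffs if d <= tolerance) / len(diffs)
    return mean_abs, agreement


# Hypothetical 2 x 2 probability maps for the same species.
model_1 = [[0.0, 0.5], [1.0, 0.25]]
model_2 = [[0.0, 0.5], [0.5, 0.25]]
mean_abs, agreement = compare_maps(model_1, model_2)
```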
Environmental Signal Processing
• Periodicity and Seasonality Extraction Tools
• Fourier Analysis
• Spectrogram
• Resampling
[Figure annotation: Periodicity: 12 months]
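A minimal sketch of periodicity extraction by Fourier analysis, assuming a uniformly sampled monthly series. It uses a direct discrete Fourier transform and reports the period of the strongest component; the function name and the synthetic series are illustrative assumptions.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest Fourier component."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]  # drop the constant offset
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centred)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:  # constant signal: no periodicity to report
        return None
    return n / best_k


# Synthetic example: 10 years of monthly values with a yearly cycle
# plus a weaker 40-month component.
series = [
    15 + 5 * math.sin(2 * math.pi * t / 12) + 0.5 * math.cos(2 * math.pi * t / 40)
    for t in range(120)
]
period = dominant_period(series)
```

With one sample per month, `period` is expressed in months.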
Biodiversity
Occurrence Points
Sources: Occurrence Data from GBIF, Occurrence Data from OBIS
Operations on two record sets A and B:
• Intersection (∩)
• Union (∪)
• Difference (-)
• Duplicates Deletion (DD)
Records are compared by similarity of: x,y; Event Date; Modif Date; Author; Species Scientific Name
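A minimal sketch of the set operations on occurrence records. It swaps the similarity comparison for an exact match on a normalised key (rounded coordinates, lower-cased species name, event date), which is a simplifying assumption; the field names and sample records are hypothetical.

```python
def record_key(rec, decimals=2):
    """Normalised identity of a record, used in place of a similarity score."""
    return (
        round(rec["x"], decimals),
        round(rec["y"], decimals),
        rec["species"].strip().lower(),
        rec["event_date"],
    )


def deduplicate(records):
    """Duplicates Deletion: keep the first record seen for each key."""
    seen, unique = set(), []
    for rec in records:
        key = record_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def union(a, b):
    return deduplicate(a + b)


def intersection(a, b):
    keys_b = {record_key(rec) for rec in b}
    return [rec for rec in deduplicate(a) if record_key(rec) in keys_b]


def difference(a, b):
    keys_b = {record_key(rec) for rec in b}
    return [rec for rec in deduplicate(a) if record_key(rec) not in keys_b]


# Hypothetical records standing in for two occurrence data sources.
set_a = [
    {"x": 12.50, "y": 43.10, "species": "Gadus morhua", "event_date": "2010-05-01"},
    {"x": 12.50, "y": 43.10, "species": "gadus morhua ", "event_date": "2010-05-01"},
    {"x": 3.20, "y": 55.00, "species": "Gadus morhua", "event_date": "2011-07-15"},
]
set_b = [
    {"x": 12.501, "y": 43.099, "species": "Gadus morhua", "event_date": "2010-05-01"},
    {"x": -5.00, "y": 48.30, "species": "Gadus morhua", "event_date": "2012-01-20"},
]
```

A similarity-based version would replace `record_key` equality with a per-field distance and a threshold.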
BiOnym
A flexible workflow approach to taxon name matching

Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of Taxa names

Workflow:
1. Raw input string, e.g. Gadus morua Lineus 1758
2. Preprocessing and parsing
3. Taxon name Matcher 1, Matcher 2, ..., Matcher n, working against reference sources (ASFIS, FISHBASE, WoRMS, other sources in DwC-A)
4. Postprocessing
5. Correct transcriptions, e.g. Gadus morhua (Linnaeus, 1758)
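A minimal sketch of a single matcher in such a workflow, assuming the scientific name is the first two tokens of the raw string and using the fuzzy string matching from Python's standard difflib module. The function names, the cutoff and the tiny reference dictionary (standing in for a real reference source) are illustrative assumptions, not the BiOnym matchers.

```python
import difflib


def parse_name(raw):
    """Naive parsing: first two tokens are the name, the rest is authorship."""
    tokens = raw.replace("(", " ").replace(")", " ").replace(",", " ").split()
    return " ".join(tokens[:2]), " ".join(tokens[2:])


def match_taxon(raw, reference, cutoff=0.8):
    """Return the reference transcription closest to the raw string, or None."""
    name, _authorship = parse_name(raw)
    candidates = difflib.get_close_matches(name, list(reference), n=1, cutoff=cutoff)
    return reference[candidates[0]] if candidates else None


# Hypothetical reference source: scientific name -> accepted transcription.
reference = {
    "Gadus morhua": "Gadus morhua (Linnaeus, 1758)",
    "Cetorhinus maximus": "Cetorhinus maximus",
    "Latimeria chalumnae": "Latimeria chalumnae",
}
matched = match_taxon("Gadus morua Lineus 1758", reference)
```

Chaining several matchers, each with its own reference source and similarity measure, follows the same pattern: try each in turn and collect the candidates for postprocessing.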
Trendylyzer

• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species

Can we recognize big changes in species presence?
[Figure panels: Herring recovered after the fishing ban; Plankton regime shift]
Life
Length-Weight Relationships
Calculate the a and b parameters for 14 230 species by means of Bayesian methods
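A minimal sketch of what the a and b parameters mean, assuming the standard length-weight relationship W = a · L^b. It estimates them by ordinary least squares on log-transformed data, which is a simpler technique than the Bayesian approach named above and is used here only for illustration; the function name and the synthetic measurements are assumptions.

```python
import math


def fit_length_weight(lengths, weights):
    """Fit W = a * L**b via a straight line in log10-log10 space."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope of the regression line is b, intercept is log10(a).
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = 10 ** (mean_y - b * mean_x)
    return a, b


# Synthetic measurements generated from a = 0.01 and b = 3 (no noise).
lengths_cm = [10.0, 20.0, 30.0, 40.0, 50.0]
weights_g = [0.01 * length ** 3 for length in lengths_cm]
a, b = fit_length_weight(lengths_cm, weights_g)
```

A Bayesian fit would additionally place priors on a and b and return posterior distributions rather than point estimates.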

Approach:
• Collaborative development with the final user
• Integration of the user's R scripts
• Usage of Cloud computing for R scripts
• Periodic runs


Porting the scripts to the D4Science Statistical Manager allowed them to run in a distributed fashion.
The time reduction was from 20 days to 11 hours (95.4% reduction).
Functions Simulation - Spawning Stock Biomass vs Recruits
• Estimate biological limits for 50 Northeast Atlantic fish stocks
• Use real measures
• Rely on previous expert knowledge
• Use Bayesian models to combine information
[Figure labels: Re-estimated SSB limit; Re-estimated HS; Rule-based HS; Re-estimated precautionary limit]
Future Work
Plan

• Make the Statistical Manager algorithms accessible through the OGC WPS standard (currently available via SOAP and Java API)
• Invoke the algorithms from a Workflow Management System (e.g. Taverna)
• Expand the system with new algorithms
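A minimal sketch of what a WPS invocation could look like from a client, assuming the key-value (HTTP GET) encoding of a WPS 1.0.0 Execute request. The endpoint URL, the process identifier and the input names are hypothetical placeholders, not an actual D4Science service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute request URL with key-value encoding."""
    # Inputs are joined as name=value pairs separated by semicolons.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode(
        {
            "service": "WPS",
            "version": "1.0.0",
            "request": "Execute",
            "Identifier": identifier,
            "DataInputs": data_inputs,
        }
    )
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process identifier and parameters.
url = wps_execute_url(
    "https://example.org/wps",
    "DBSCAN",
    {"epsilon": 0.5, "min_points": 3},
)

# Decode the query string again to inspect what a server would receive.
params = parse_qs(urlsplit(url).query)
```

A workflow system that speaks WPS could issue the same request, which is what makes the standard a natural bridge to tools such as Taverna.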
Thank you
