BlueBRIDGE receives funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No. 675680 www.bluebridge-vres.eu
Using e-Infrastructures for Biodiversity
Conservation
Gianpaolo Coro
CNR, Italy
gianpaolo.coro@isti.cnr.it
(on behalf of the InfraScience group of ISTI-CNR, Pisa, Italy)
Context
Progress in Information Technology has changed
the paradigms of Science
• The large and fast increase in the volume and
complexity of data requires new approaches to
collecting, curating and analysing data
• This requires new tools to guarantee the exchange
and longevity of data and the re-execution
of experiments
Big Data
• Large volume
• High generation velocity
• Large variety
• Untrustworthy (veracity)
• High complexity (variability)
• Value
Big Data: a dataset with large volume, variety and generation velocity, containing complex and
untrustworthy information, that requires non-conventional methods to extract, manage and
process information within a reasonable time.
New Science Paradigms
• Open Science: make scientific research, data and dissemination
accessible to all levels of an inquiring society, amateur or
professional.
Keywords: Open Access, Open Research, Open Notebook Science
• E-Science: computationally intensive science carried out in highly
distributed network environments that use large data sets and
require distributed computing and collaborative tools.
Keywords: Provenance of the scientific process, Scientific workflows
• Science 2.0: process and publish large data sets using a
collaborative approach. Share everything from raw data to experimental
results and processes. Support collaborative experiments and
Reproducibility-Repeatability-Reusability (R-R-R) of Science.
Keywords: collaborative and repeatable Science
Requirements for IT systems
• Support collaborative research and experimentation
• Implement Reproducibility-Repeatability-Reusability of
Science
• Allow sharing data, processes and findings
• Grant free access to the produced scientific knowledge
• Tackle Big Data challenges
• Sustainability: low operational and maintenance costs
• Manage heterogeneous access policies for data and processes
• Meet the requirements of industrial processes
e-Infrastructures
e-Infrastructures enable researchers at different locations across the world
to collaborate in the context of their home institutions or in national or multinational
scientific initiatives.
• People can work together having shared access to unique or distributed scientific
facilities (including data, instruments, computing and communications).
Examples:
Belief, http://www.beliefproject.org/
OpenAire, http://www.openaire.eu/
i-Marine, http://www.i-marine.eu/
EU-Brazil OpenBio,
http://www.eubrazilopenbio.eu/
Virtual Research Environments
• Define sub-communities
• Allow temporary dedicated
assignment of computational,
storage, and data resources
• Manage policies
• Support data and information
sharing
Diagram labels: e-Infrastructure (Unified Resource Space); integrates External e-Infrastructures (WPS); enables VREs
Virtual Research Environments
Innovative, web-based, community-oriented, comprehensive, flexible, and
secure working environments.
• Communities are provided with applications to interact with the VRE services
• Client services are provided both with APIs (Java, R) and simple HTTP-REST interfaces
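As an illustration of the HTTP-based access mentioned above, the following minimal sketch assembles a key-value-pair Execute request as defined by the OGC WPS 1.0.0 standard. The endpoint, the process identifier and the input names are hypothetical placeholders, not actual D4Science values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_wps_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 key-value-pair Execute request URL."""
    # In the KVP encoding, DataInputs are "name=value" pairs separated by semicolons.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "datainputs": data_inputs,
    }
    # urlencode percent-encodes the reserved characters inside each value.
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, process name and inputs, for illustration only.
url = build_wps_execute_url(
    "https://example.org/wps/WebProcessingService",
    "org.example.SPECIES_MAP",
    {"species": "Gadus morhua", "resolution": "0.5"},
)
print(url)
```

A real service would normally also require some form of authentication, which is deliberately left out of this sketch.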
VREs Example
The D4Science e-Infrastructure
D4Science supports scientists in several domains
1. More than 25 000 taxonomic studies per month
www.i-marine.eu
2. More than 60 000 species distribution maps produced and hosted
www.d4science.eu
3. Used to build a pan-European geothermal energy map
www.egip.d4science.org
4. Processing and management of heterogeneous environmental and Earth system data
www.envriplus.eu
5. Enhances communication and exchange in Linguistic Studies, Humanities, Cultural Heritage, History and Archaeology
www.parthenos-project.eu
BlueBRIDGE VREs
Stock Assessment
assess the health status of fisheries stocks.
http://www.bluebridge-vres.eu/services/stock-assessment
CMSY model
Marine Protected Areas
reduce the adverse impact of human activities
(e.g. fishing, aquaculture, tourism) on
ecosystems, and ensure that these activities are
properly embedded in policy frameworks.
http://www.bluebridge-vres.eu/services/protected-area-impact-maps
Education VREs
Lecture-style: the emphasis given to each course topic
depends on the audience
Interactive: after each topic is explained, students run
experiments
Experimental: students reproduce the experiment
shown by the teacher and possibly repeat it on their own
data
Social: students communicate via messaging or the VRE
discussion panel
• 1 course/year in Pisa
• 1 course/year in Paris
• 12 courses in Copenhagen
www.bluebridge-vres.eu
International Council for
the Exploration of the Sea
• 38 courses all over the world
+1000 attendees
Social networking is key to sharing information in e-Infrastructures
BlueBRIDGE offers a continuously updated list of events and news produced by users
and applications
User-shared
News
Application-
shared News
Share News
BlueBRIDGE VREs: Social Networking
A free-to-use, folder-based file system allows managing and sharing
information objects.
Information objects can be
• files, datasets, workflows, experiments, etc.
• organized into folders
• shared
• disseminated via public URLs
BlueBRIDGE VREs:
The Workspace – an online file storage system
Storage
Databases Cloud storage Geospatial data
Metadata generation
and management
Harmonisation Sharing
Data
management
Cloud computing Elastic resources
assignment
Multi-platform: R,
Java, Fortran
Processing
BlueBRIDGE Facilities: Overview
Innovation Through Integration
Vision: integration, sharing, and remote hosting help
inform people and support decision-making
Data Processing
• Experiments on Big Data
• Sharing inputs and results
• Save the provenance of experiments
• Support R-R-R of experiments
• Input/Output
• Parameters
• Provenance
Cloud Computing
Platform
WPS
REST
NEW
Workspace
Prov-O
(https://www.w3.org/TR/prov-o/)
“Provenance is information about
entities, activities, and people
involved in producing a piece of data
or thing, which can be used to form
assessments about its quality,
reliability or trustworthiness.”
The PROV Ontology (PROV-O)
expresses the PROV Data Model
using the OWL2 Web Ontology
Language (OWL2).
It provides a set of classes,
properties, and restrictions that can
be used to represent and interchange
provenance information generated in
different systems and under different
contexts.
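To make the PROV-O vocabulary more concrete, the sketch below serialises the provenance of a single computation as Turtle text. It uses only terms that exist in PROV-O (prov:Entity, prov:Activity, prov:Agent, prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith, prov:wasAttributedTo); the `ex:` namespace and all resource names are hypothetical, and this is not necessarily the layout the platform itself produces.

```python
def provenance_turtle(activity, user, inputs, outputs, base="http://example.org/"):
    """Describe one computation with PROV-O terms, serialised as Turtle text."""
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        f"@prefix ex: <{base}> .",  # hypothetical namespace for this example
        "",
        f"ex:{user} a prov:Agent .",
        f"ex:{activity} a prov:Activity ;",
        f"    prov:wasAssociatedWith ex:{user} .",
    ]
    # Inputs are entities that the activity used.
    for name in inputs:
        lines.append(f"ex:{name} a prov:Entity .")
        lines.append(f"ex:{activity} prov:used ex:{name} .")
    # Outputs are entities generated by the activity and attributed to the user.
    for name in outputs:
        lines.append(f"ex:{name} a prov:Entity ;")
        lines.append(f"    prov:wasGeneratedBy ex:{activity} ;")
        lines.append(f"    prov:wasAttributedTo ex:{user} .")
    return "\n".join(lines)


# Hypothetical experiment: one input table, one output map.
doc = provenance_turtle("run42", "alice", ["occurrences_csv"], ["distribution_map"])
print(doc)
```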
BlueBRIDGE Computational
Capabilities
Project resources:
• 28 Virtual Machines (VMs) with 418 CPU cores, 636GB of RAM and 4TB of
ephemeral storage
• 100 VMs with 200 CPU cores, 800GB of RAM and 2TB of ephemeral storage
• Storage: 350TB
Processes:
• ~ 225 algorithms hosted in all the VREs
• ~ 20 contributing institutes
• ~ 30,000 requests per month
• ~ 2000 scientists/students in 44 countries using VREs
• Programming languages: R, Java, Python, Fortran, Linux-compiled
External providers (European Grid Infrastructure):
• 6 VMs: 8 virtual CPU cores, 16GB of RAM and 100GB of storage
• 2 VMs: 16 virtual CPU cores, 32GB of RAM and 100GB of storage
• 24 VMs: 2 virtual CPU cores, 8GB of RAM and 50GB of storage
• 5 VMs: 4 virtual CPU cores, 8GB of RAM and 80GB of storage
Integrating new processes
Integration: putting a script or a process that works offline into
the Cloud computing platform.
R script
Computing platform Web interface and Web service
SAI - Importing tool
Automatic
Coro G., Panichi G., Pagano P. A Web application to publish R scripts as-a-Service on a Cloud computing platform.
In: Bollettino di Geofisica Teorica e Applicata, vol. 52 article n. 51. Istituto Nazionale di Oceanografia e di Geofisica
Sperimentale, 2016.
https://wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner
https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer
Statistical Algorithms Importer (SAI)
System features:
1. RStudio-like interface
2. Simple definition of script input and output
3. Global variables
4. Associate data type to the I/O
5. Request packages
6. Automatic software production
7. Automatic deployment
SAI Work Flow
Advantages
• The process is available as-a-Service
• Invoked via communication standards
• Higher computational capabilities
• Automatic creation of a Web interface
• Provenance management
• Storage of results on a high-availability system
• Collaboration and sharing
• Re-usability, Reproducibility, Repeatability, also
from other software (e.g. QGIS)
Collaborative experiments
WS
Shared online folders
Inputs
Outputs
Results
Computational system
In the e-Infrastructure
Through third party software
Scientific Workflow with Code Privacy Guarantee
Script provider: updates the script on his/her private Workspace
The service downloads the script on-the-fly
A user executes an experiment on his/her data
The output, the input and the parameters can be shared with another user
This user can execute the experiment again and share the computation with the other user
Limitations and requirements
Input Output Script
Script
Required Provided
Issues:
• Code is often designed for one specific data set
• Prototype scripts often contain code that cannot be separated from the I/O
In the context of e-Infrastructures and Science 2.0:
• Modularity is necessary for integration
• Scripts should be re-organised so that they can be re-used on other data without
changing the code
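The separation described above can be sketched as follows: the computation lives in a function that knows nothing about files, while a thin wrapper handles the I/O and receives the field names as parameters. The function names, field names and sample data are hypothetical and only illustrate the idea.

```python
import csv
import io


def mean_by_group(rows, group_field, value_field):
    """Pure computation: average of value_field per group. No file access here."""
    sums, counts = {}, {}
    for row in rows:
        key = row[group_field]
        sums[key] = sums.get(key, 0.0) + float(row[value_field])
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def run(input_stream, output_stream, group_field, value_field):
    """I/O wrapper: the only part that knows the data arrive and leave as CSV."""
    result = mean_by_group(csv.DictReader(input_stream), group_field, value_field)
    writer = csv.writer(output_stream)
    writer.writerow([group_field, "mean_" + value_field])
    for key in sorted(result):
        writer.writerow([key, result[key]])


# Hypothetical data set; any other CSV with two suitable columns works unchanged.
sample = io.StringIO("species,length\ncod,60\ncod,80\nherring,30\n")
out = io.StringIO()
run(sample, out, "species", "length")
print(out.getvalue())
```

Because inputs, outputs and parameters are explicit, the same code can be applied to another data set by changing only the arguments.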
Vs
WS
Self-consistent comp. object
Repeatability
Provenance Prov-O
Reusability
Use of standards
Reproducibility
Towards Science 2.0
Examples
Geospatial data processing
Maps
comparison
NetCDF
file
Data extraction
Signal processing Periodicity detection
Maps generation
Maps Comparison
compare
Compares:
• Species Distribution
maps
• Environmental layers
• SAR Images
Coro, G., Pagano, P., & Ellenbroek, A.
(2014). Comparing heterogeneous
distribution maps for marine
species. GIScience & Remote
Sensing, 51(5), 593-611.
Clustering and Outlier Detection
Presence
Points
Density-based
Clustering
and Outlier Detection
Distance-based Clustering
K-Means
X-Means
DBSCAN
Cetorhinus maximus
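As a minimal sketch of density-based clustering with outlier detection, the following is a plain-Python version of the textbook DBSCAN algorithm applied to 2D presence points. It is not the platform's implementation; the coordinates and the `eps` and `min_pts` values are invented for illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE for outliers."""

    def neighbours(i):
        # Indices of all points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(reachable)
    return labels


# Hypothetical presence points: two dense groups and one isolated record.
presence = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(presence, eps=1.5, min_pts=3)
print(labels)
```

Points labelled `NOISE` are the candidates for outlier inspection.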
Ecological Niche Modelling
Atlantic cod
Coelacanth
Giant squid
AquaMaps
Neural
Networks
Maximum
Entropy
Coro, G., Magliozzi, C., Ellenbroek, A., & Pagano, P. (2015). Improving data quality to build a robust
distribution model for Architeuthis dux. Ecological Modelling, 305, 29-39.
Estimating Similarity Between Habitats
Habitat Representativeness Score:
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
HRS=10.5
Habitat
Representativeness
Score
Latimeria chalumnae
Coro, G., Pagano, P., & Ellenbroek, A. (2013). Combining simulated expert knowledge with Neural Networks
to produce Ecological Niche Models for Latimeria chalumnae. Ecological modelling, 268, 55-63.
Occurrence Data from GBIF
(www.gbif.org)
Occurrence Data from OBIS
(www.iobis.org)
∩ Intersection
- Difference
∪ Union
A
x,y
Event Date
Modif Date
Author
Species Scientific Name
Occurrence Points Processing
B
x,y
Event Date
Modif Date
Author
Species Scientific Name
Records
Similarity
DD
Duplicates Deletion
Candela, L., Castelli, D., Coro, G., Lelii, L., Mangiacrapa, F., Marioli, V., & Pagano, P. (2015). An infrastructure-
oriented approach for supporting biodiversity research. Ecological Informatics, 26, 162-172.
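The set operations and the duplicates deletion listed above can be sketched with a record-similarity test in place of strict equality. The similarity rule used here (same species name, same event date, coordinates within a tolerance) is an assumption made for illustration, as are the record values; the actual service may weigh the fields differently.

```python
from collections import namedtuple

Occurrence = namedtuple("Occurrence", "x y event_date species")


def similar(a, b, tol=0.01):
    """Hypothetical similarity rule: same species and date, coordinates within tol."""
    return (a.species.strip().lower() == b.species.strip().lower()
            and a.event_date == b.event_date
            and abs(a.x - b.x) <= tol
            and abs(a.y - b.y) <= tol)


def delete_duplicates(records, tol=0.01):
    """Keep the first record of every group of mutually similar records."""
    kept = []
    for record in records:
        if not any(similar(record, k, tol) for k in kept):
            kept.append(record)
    return kept


def intersection(a, b, tol=0.01):
    return [r for r in delete_duplicates(a, tol) if any(similar(r, s, tol) for s in b)]


def difference(a, b, tol=0.01):
    return [r for r in delete_duplicates(a, tol) if not any(similar(r, s, tol) for s in b)]


def union(a, b, tol=0.01):
    return delete_duplicates(list(a) + list(b), tol)


# Invented records standing in for two occurrence collections A and B.
set_a = [
    Occurrence(10.0, 43.0, "2010-05-01", "Gadus morhua"),
    Occurrence(10.0, 43.0, "2010-05-01", "gadus morhua "),
    Occurrence(12.5, 44.1, "2011-07-15", "Gadus morhua"),
]
set_b = [
    Occurrence(10.005, 43.0, "2010-05-01", "Gadus morhua"),
    Occurrence(3.0, 55.0, "2012-01-20", "Gadus morhua"),
]
print(len(intersection(set_a, set_b)), len(difference(set_a, set_b)), len(union(set_a, set_b)))
```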
Absence Locations Estimation
Coro, G., Magliozzi, C., Berghe, E. V., Bailly, N.,
Ellenbroek, A., & Pagano, P. (2016). Estimating
absence locations of marine species from data of
scientific surveys in OBIS. Ecological Modelling, 323,
61-76.
• Intersect survey data
focussing on a target
species
• Maximise the
separation between
locations with and
without occurrences
• Spatially aggregate
• Estimate absence
locations
Detecting Trends in Species Abundance
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
Plankton regime shift
Herring recovered after the fishing ban
Appeltans W., Pissierssens P., Coro G., Italiano A., Pagano P., Ellenbroek A., Webb T. Trendylyzer: a long-term trend analysis on biogeographic data. In: Bollettino di Geofisica Teorica e Applicata:
an International Journal of Earth Sciences, vol. 54 (Suppl.) pp. 203 - 205. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca
(Italy). OGS - Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, 2013.
Estimating Climate Change Effects on Species
Distributions
AquaMaps actual (native)
distribution
Today vs 2050
(~11 500 maps)
Discover classes of
changes
by means of cluster
analysis
Coro, G., Magliozzi, C., Ellenbroek, A., Kaschner, K., & Pagano, P. (2015). Automatic classification of climate change effects on marine species distributions in 2050
using the AquaMaps model. Environmental and Ecological Statistics, 1-26.
Cluster Analysis to Detect Common Species
Columns (each is an average per cluster; names are truncated as in the source):
(a) average_number_of_species_occurrences_per_dataset
(b) number_of_datasets_containing_at_least_one_observation_for_the_
(c) number_of_6_minute_cells_containing_at_least_one_observation_fo
(d) number_of_months_containing_at_least_one_occurrence_record_for_
(e) no_months_with_a_least_10_occurrences
(f) nInd/nOcc

Cluster      (a)      (b)      (c)      (d)      (e)      (f)
Cluster 0    100      100      100      100      100      100
Cluster 1    14.46    78.57    41.05    88.90    79.65    11.14
Cluster 2    2.43     63.04    12.90    66.16    31.16    5.64
Cluster 3    0.16     53.57    1.62     27.12    1.36     0.41
Normalization with respect to the maximum value for each column
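The column-wise normalization can be sketched as below: every value is divided by the maximum of its column and expressed on a 0-100 scale, so the cluster with the highest average in a column gets 100. The raw numbers are invented, and the sketch assumes every column has a positive maximum.

```python
def normalise_columns(rows):
    """Rescale each column so that its maximum value becomes 100."""
    # zip(*rows) transposes the table, giving one tuple per column.
    maxima = [max(column) for column in zip(*rows)]
    return [
        [round(100.0 * value / maximum, 2) for value, maximum in zip(row, maxima)]
        for row in rows
    ]


# Invented per-cluster averages (rows = clusters, columns = features).
raw = [
    [4, 10],
    [2, 5],
    [1, 0],
]
print(normalise_columns(raw))
```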
Common: frequent, widespread, high individual
density
Moderate Commonness: moderately frequent,
moderately widespread, medium individual
density
Moderate-Low Commonness: poorly
widespread, low-moderately frequent, low
individual density
Low Commonness: quite localized, not frequent,
usually low individual density
• The term “common species” refers
intuitively to a species that is abundant
in a certain area, widespread and at
low risk of extinction.
• Consequently, “rare species” are
less abundant and possibly threatened.
• Automatically detecting common and
rare species, and how their status
changes through time, is an important
step in understanding the
consequences of environmental
change for ecosystem functioning.
Coro, G., Webb, T. J., Appeltans, W., Bailly, N., Cattrijsse, A., & Pagano, P. (2015). Classifying degrees of species
commonness: North Sea fish as a case study. Ecological Modelling, 312, 272-280.
Invasive species
• Seven data mining techniques were used to
estimate the spread of the puffer
fish in the Mediterranean Sea;
• The approach is also applicable to
other species;
• Produced impact maps on FAO Areas,
EEZs and GSAs.
Under publication
Search in Large Taxonomic Names Repositories
A flexible workflow approach
to taxon name matching
Accounts for:
• Variations in the spelling and
interpretation of taxonomic
names
• Combination of data from
different sources
• Harmonization and reconciliation
of Taxa names
Raw Input String
Gadus morua Lineus 1758
Correct Transcription:
Gadus morhua (Linnaeus, 1758)
Preprocessing
And
Parsing
Taxon name
Matcher 1
Taxon name
Matcher 2
Taxon name
Matcher n
PostProcessing
Reference
Source
(ASFIS)
Reference
Source
(FISHBASE)
Reference
Source
(WoRMS)
Reference
Source
(OBIS)
Berghe, E. V., Coro, G., Bailly, N., Fiorellato, F., Aldemita, C., Ellenbroek, A., & Pagano, P. (2015). Retrieving taxa names
from large biodiversity data collections using a flexible matching workflow. Ecological Informatics, 28, 29-41.
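A single matcher in such a workflow can be sketched with approximate string matching from Python's standard library. This is a simplified stand-in for the matchers of the cited workflow: the parsing rule (first two tokens are genus and species, the rest is authorship), the similarity cutoff and the tiny reference list are assumptions for illustration.

```python
import difflib
import re


def parse_name(raw):
    """Preprocessing: keep the first two tokens (genus, species), drop authorship."""
    tokens = re.sub(r"[(),]", " ", raw).split()
    return " ".join(tokens[:2]).lower()


def match_taxon(raw, reference_names, cutoff=0.8):
    """Return the reference name most similar to the parsed input, or None."""
    lookup = {name.lower(): name for name in reference_names}
    hits = difflib.get_close_matches(parse_name(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# Tiny, invented stand-in for a reference source.
reference = ["Gadus morhua", "Gadus macrocephalus", "Clupea harengus"]
best = match_taxon("Gadus morua Lineus 1758", reference)
print(best)
```

Running several such matchers against different reference sources and reconciling their answers in a post-processing step gives the structure shown in the diagram.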
Vessel data analysis
Detection of the most exploited locations
Route interpolation
Fishing activity estimation
Coro, G., Fortunati, L., & Pagano, P. (2013, June). Deriving fishing monthly effort and caught species from vessel
trajectories. In OCEANS-Bergen, 2013 MTS/IEEE (pp. 1-5).
Forecasting Fishery Statistics
Frequency and time-series
structure detection with Singular Spectrum Analysis (SSA)
was used to forecast effort, catch
and locations of purse seine
fishing in the Indian Ocean.
Coro, G., Large, S., Magliozzi, C., & Pagano, P. (2016).
Analysing and forecasting fisheries time series: purse
seine in Indian Ocean as a case study. ICES Journal of
Marine Science: Journal du Conseil, fsw131.
Stock assessment
Length-Weight Relations: estimates length-weight
relation parameters for marine species,
using Bayesian methods. Developed by R. Froese, T.
Thorson and R. B. Reyes
SGVM interpolation: interpolation of vessel
trajectories. Developed by the Study Group on VMS,
involving ICES
FAO MSY: stock assessment for FAO catch data.
Developed by the Resource Use and Conservation
Division of the FAO Fisheries and Aquaculture
Department (ref. Y. Ye - FAO)
ICCAT VPA: stock assessment method for
International Commission for the Conservation
of Atlantic Tunas (ICCAT) data. Developed by
Ifremer and IRD (ref. S. Bonhommeau, J. Bard)
CMSY: estimates Maximum Sustainable Yield
from catch statistics. Prime choice for ICES as
main stock assessment tool. Developed by R.
Froese, G. Coro, N. Demirel, K. Kleisner and H. Winker
Atlantic herring
BlueBRIDGE reduced time-to-market:
State-of-the-art models to estimate
Maximum Sustainable Yield
computational time reduced by 95%
on average
Froese, R., Demirel, N., Coro, G., Kleisner, K. M., & Winker, H. (2016).
Estimating fisheries reference points from catch and resilience. Fish and
Fisheries.
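For orientation, the sketch below shows the Schaefer surplus-production model on which catch-based MSY estimation of this kind builds: biomass evolves as B(t+1) = B(t) + r·B(t)·(1 − B(t)/k) − C(t), and MSY = r·k/4. It is not the CMSY algorithm itself, which searches for plausible r and k values from catch and resilience information; the values of r, k and the catches are invented.

```python
def schaefer_msy(r, k):
    """Maximum Sustainable Yield of the Schaefer model: MSY = r * k / 4."""
    return r * k / 4.0


def project_biomass(b0, r, k, catches):
    """Project biomass forward: B[t+1] = B[t] + r*B[t]*(1 - B[t]/k) - C[t]."""
    biomass = [b0]
    for catch in catches:
        current = biomass[-1]
        following = current + r * current * (1.0 - current / k) - catch
        biomass.append(max(following, 0.0))  # biomass cannot become negative
    return biomass


# Invented parameters: intrinsic growth rate r and carrying capacity k.
r, k = 0.5, 1000.0
msy = schaefer_msy(r, k)
# Fishing exactly at MSY from B = k/2 keeps the biomass constant.
trajectory = project_biomass(k / 2, r, k, [msy, msy, msy])
print(msy, trajectory)
```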
Links
Web Portals
• bluebridge.d4science.org
• services.d4science.org
Web sites
• www.bluebridge-vres.eu
• www.gcube-system.org
• www.d4science.org
• www.i-marine.eu
