BlueBRIDGE receives funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No. 675680 www.bluebridge-vres.eu
Using e-Infrastructures for Biodiversity
Conservation
Gianpaolo Coro
CNR, Italy
gianpaolo.coro@isti.cnr.it
(on behalf of the InfraScience group of ISTI-CNR, Pisa, Italy)
Context
Progress in Information Technology has changed
the paradigms of Science
• The large and fast increase in the volume and
complexity of data requires new approaches to
collect, curate and analyse the data
• This requires new tools to guarantee the exchange
and longevity of the data and the reapplication
of the experiments
Big Data
• Large volume
• High generation velocity
• Large variety
• Untrustworthy
(veracity)
• High complexity
(variability)
Big Data: a dataset with large volume, variety, generation velocity, containing complex and
untrustworthy information that requires nonconventional methods to extract, manage and
process information within a reasonable time.
• Value
New Science Paradigms
• Open Science: make scientific research, data and dissemination
accessible to all levels of an inquiring society, amateur or
professional.
Keywords: Open Access, Open research, Open Notebook Science
• E-Science: computationally intensive science is carried out in highly
distributed network environments that use large data sets and
require distributed computing and collaborative tools.
Keywords: Provenance of the scientific process, Scientific workflows
• Science 2.0: process and publish large data sets using a
collaborative approach. Share from raw data to experimental
results and processes. Support collaborative experiments and
Reproducibility-Repeatability-Reusability (R-R-R) of Science.
Keywords: collaborative and repeatable Science
Requirements for IT systems
• Support collaborative research and experimentation
• Implement Reproducibility-Repeatability-Reusability of
Science
• Allow sharing data, processes and findings
• Grant free access to the produced scientific knowledge
• Tackle Big Data challenges
• Sustainability: low operational costs, low maintenance
costs
• Manage heterogeneous data/processes access policies
• Meet the requirements of industrial processes
e-Infrastructures
e-Infrastructures enable researchers at different locations across the world
to collaborate in the context of their home institutions or in national or multinational
scientific initiatives.
• People can work together having shared access to unique or distributed scientific
facilities (including data, instruments, computing and communications).
Examples:
Belief, http://www.beliefproject.org/
OpenAire, http://www.openaire.eu/
i-Marine, http://www.i-marine.eu/
EU-Brazil OpenBio,
http://www.eubrazilopenbio.eu/
Virtual Research Environments
• Define sub-communities
• Allow temporary dedicated
assignment of computational,
storage, and data resources
• Manage policies
• Support data and information
sharing
Integrates
e-Infrastructure
Unified Resource Space
Enables
VRE VRE VRE
WPS
External e-Infrastructures
Virtual Research Environments
Innovative, web-based, community-oriented, comprehensive, flexible, and
secure working environments.
• Communities are provided with applications to interact with the VRE services
• Client services are provided both with APIs (Java, R) and simple HTTP-REST interfaces
VREs Example
The D4Science e-Infrastructure
D4Science supports scientists in several domains
1. More than 25 000
taxonomic
studies per month
www.i-marine.eu
2. More than 60 000
species distribution
maps produced and
hosted
www.d4science.eu
3. Used to build a
pan-European
geothermal energy
map
www.egip.d4science.org
4. Processing and
management of
heterogeneous
environmental and
Earth system data
www.envriplus.eu
5. Enhances
communication and
exchange in Linguistic
Studies, Humanities,
Cultural Heritage,
History and
Archaeology
www.parthenos-project.eu
BlueBRIDGE VREs
Stock Assessment
assess the health status of fisheries stocks.
http://www.bluebridge-vres.eu/services/stock-assessment
CMSY model
Marine Protected Areas
reduce adverse impact of human activities
(e.g. fishing, aquaculture, tourism) on
ecosystems, and ensure these activities are
properly embedded in policy frameworks.
http://www.bluebridge-vres.eu/services/protected-area-impact-maps
Education VREs
Lecture-style: the emphasis given to each course topic
depends on the audience
Interactive: after each explained topic, students do
experiments
Experimental: students reproduce the experiment
shown by the teacher and possibly repeat it on their own
data
Social: students communicate via messaging or VRE
discussion panel
• 1 course/year
In Pisa
• 1 course/year
In Paris
• 12 courses
In Copenhagen
www.bluebridge-vres.eu
International Council for
the Exploration of the Sea
• 38 courses
All over the world
+1000 attendees
Social networking is key to sharing information in an e-Infrastructure
BlueBRIDGE offers a continuously updated list of events / news produced by users
and applications
User-shared
News
Application-shared News
Share News
BlueBRIDGE VREs: Social Networking
A free-to-use, folder-based file system allows managing and sharing
information objects.
Information objects can be
• files, datasets, workflows,
experiments, etc.
• organized
into folders
• shared
• disseminated via public
URLs
BlueBRIDGE VREs:
The Workspace – an online file storage system
Storage
Databases Cloud storage Geospatial data
Metadata generation
and management
Harmonisation Sharing
Data
management
Cloud computing Elastic resources
assignment
Multi-platform: R,
Java, Fortran
Processing
BlueBRIDGE Facilities: Overview
Innovation Through Integration
Vision: integration, sharing, and remote hosting help
inform people and support decision-making
Data Processing
• Experiments on Big Data
• Sharing inputs and results
• Save the provenance of experiments
• Supports R-R-R of experiments
• Input/Output
• Parameters
• Provenance
Cloud Computing
Platform
WPS
REST
NEW
Workspace
Prov-O
(https://www.w3.org/TR/prov-o/)
“Provenance is information about
entities, activities, and people
involved in producing a piece of data
or thing, which can be used to form
assessments about its quality,
reliability or trustworthiness.”
The PROV Ontology (PROV-O)
expresses the PROV Data Model
using the OWL2 Web Ontology
Language (OWL2).
It provides a set of classes,
properties, and restrictions that can
be used to represent and interchange
provenance information generated in
different systems and under different
contexts.
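A minimal sketch of what a PROV-O record for one computation could look like, written as Turtle text. The terms prov:Entity, prov:Activity, prov:Agent, prov:used, prov:wasGeneratedBy and prov:wasAssociatedWith come from PROV-O; the `ex:` namespace, the function name and the resource names are hypothetical and are not taken from the BlueBRIDGE services.

```python
PROV_PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix ex: <http://example.org/> .\n"   # hypothetical namespace
)


def provenance_turtle(activity, agent, inputs, outputs):
    """Return a Turtle document describing one computation with PROV-O terms."""
    lines = [PROV_PREFIXES]
    # The computation is a prov:Activity that used the inputs
    # and was associated with the agent who launched it.
    lines.append(f"ex:{activity} a prov:Activity ;")
    for name in inputs:
        lines.append(f"    prov:used ex:{name} ;")
    lines.append(f"    prov:wasAssociatedWith ex:{agent} .")
    lines.append(f"ex:{agent} a prov:Agent .")
    # Inputs and outputs are prov:Entity resources; outputs also
    # record which activity generated them.
    for name in inputs:
        lines.append(f"ex:{name} a prov:Entity .")
    for name in outputs:
        lines.append(f"ex:{name} a prov:Entity ;")
        lines.append(f"    prov:wasGeneratedBy ex:{activity} .")
    return "\n".join(lines) + "\n"


# Hypothetical experiment: one input table, one output map.
doc = provenance_turtle("run42", "alice", ["occurrences_csv"], ["distribution_map"])
print(doc)
```

Storing such a record next to the inputs and outputs is one way to let another user see which data and which run produced a result.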
BlueBRIDGE Computational
Capabilities
Project resources:
• 28 Virtual Machines (VMs) with 418 CPU cores, 636GB of RAM and 4TB of
ephemeral storage
• 100 VMs with 200 CPU cores, 800GB of RAM and 2TB of ephemeral storage
• Storage: 350TB
Processes:
• ~ 225 algorithms hosted in all the VREs
• ~ 20 contributing institutes
• ~ 30,000 requests per month
• ~ 2000 scientists/students in 44 countries using VREs
• Programming languages: R, Java, Python, Fortran, Linux-compiled
External providers (European Grid Infrastructure):
• 6 VMs: 8 virtual CPU cores, 16GB of RAM and 100GB of storage
• 2 VMs: 16 virtual CPU cores, 32GB of RAM and 100GB of storage
• 24 VMs: 2 virtual CPU cores, 8GB of RAM and 50GB of storage
• 5 VMs: 4 virtual CPU cores, 8GB of RAM and 80GB of disk
Integrating new processes
Integration: putting a script or a process that works offline into
the Cloud computing platform.
R script
Computing platform Web interface and Web service
SAI - Importing tool
Automatic
Coro G., Panichi G., Pagano P. A Web application to publish R scripts as-a-Service on a Cloud computing platform.
In: Bollettino di Geofisica Teorica e Applicata, vol. 52 article n. 51. Istituto Nazionale di Oceanografia e di Geofisica
Sperimentale, 2016.
https://wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner
https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer
Algorithms Importer (SAI)
System features:
1. RStudio-like
interface
2. Simple definition
of script input and
output
3. Global variables
4. Associate data
type to the I/O
5. Request packages
6. Automatic
software
production
7. Automatic
deployment
SAI Work Flow
Advantages
• The process is available as-a-Service
• Invoked via communication standards
• Higher computational capabilities
• Automatic creation of a Web interface
• Provenance management
• Storage of results on a high-availability system
• Collaboration and sharing
• Re-usability, Reproducibility, Repeatability, also
from other software (e.g. QGIS)
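As an illustration of invocation via a communication standard, the sketch below builds an OGC WPS 1.0.0 Execute request in its key-value-pair (HTTP GET) form. The endpoint, the process identifier and the input names are hypothetical placeholders; a real service may also require an authorisation token, which is not shown here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute request in key-value-pair (GET) form."""
    # WPS packs all process inputs into a single DataInputs parameter,
    # written as name=value pairs separated by semicolons.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


url = wps_execute_url(
    "https://example.org/wps/WebProcessingService",  # hypothetical endpoint
    "org.example.Clustering",                         # hypothetical process id
    {"epsilon": 10, "min_points": 2},                 # hypothetical inputs
)
print(url)

# Round trip: decode the query string to check what the server would receive.
decoded = parse_qs(urlsplit(url).query)
print(decoded["DataInputs"][0])
```

Because the request is a plain URL, any client that can issue an HTTP GET (a browser, a script, desktop GIS software) can run the same process.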
Collaborative experiments
WS
Shared online folders
Inputs
Outputs
Results
Computational system
In the e-Infrastructure
Through third party software
Scientific Workflow with Code Privacy Guarantee
Script provider
Updates the script on
his/her private Workspace
The service downloads
the script on-the-fly
A user executes an
experiment on
his/her data
The output, the input
and the parameters can
be shared with another
user
This user can execute the
experiment again
and share the
computation with the
other user
Limitations and requirements
Input Output Script
Script
Required Provided
Issues:
• Code is often designed for one precise data set
• Often, prototype scripts have code that is not separable from the I/O
In the context of e-Infrastructures and Science 2.0:
• Modularity is necessary for integration
• Scripts should be reorganised so that they can be reused on other data without
changing the code
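One possible way to meet this requirement is to move every file name and tuning value out of the script body and into explicit parameters. The sketch below is only an illustration: the function name, the `depth` column and the threshold are hypothetical, and the demo data are made up.

```python
import csv
import os
import tempfile


def filter_occurrences(input_path, output_path, column="depth", max_value=200.0):
    """Copy the rows whose `column` is <= max_value from input_path to output_path."""
    kept = 0
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if float(row[column]) <= max_value:
                writer.writerow(row)
                kept += 1
    return kept


# Demo on a temporary file: the same function runs on any other CSV
# by changing only the arguments, never the code.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.csv")
    dst_path = os.path.join(tmp, "out.csv")
    with open(src_path, "w", newline="") as f:
        f.write("species,depth\nGadus morhua,150\nArchiteuthis dux,900\n")
    n_kept = filter_occurrences(src_path, dst_path, column="depth", max_value=200.0)
    with open(dst_path, newline="") as f:
        result = f.read()

print(n_kept)
print(result)
```

With inputs, outputs and parameters declared in one place, an importing tool can map each of them to a typed field of a Web interface.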
Vs
WS
Self-consistent comp. object
Repeatability Provenance Prov-O
Reusability
Use of standards
Reproducibility
Towards Science 2.0
Examples
Geospatial data processing
Maps
comparison
NetCDF
file
Data extraction
Signal processing Periodicity detection
Maps generation
Maps Comparison
compare
Compares:
• Species Distribution
maps
• Environmental layers
• SAR Images
Coro, G., Pagano, P., & Ellenbroek, A.
(2014). Comparing heterogeneous
distribution maps for marine
species. GIScience & Remote
Sensing, 51(5), 593-611.
Clustering and Outliers Detection
Presence
Points
Density-based
Clustering
and Outliers detection
Distance Based Clustering
K-Means
X-Means
DBScan
Cetorhinus maximus
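To make the density-based option concrete, here is a compact, pure-Python sketch of DBSCAN applied to presence points. It treats coordinates as planar and uses a brute-force neighbour search, so it is only suitable for small examples; the points, `eps` and `min_pts` values are made up and this is not the implementation hosted in the VREs.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for an outlier."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(reachable)
    return labels


# Two tight groups of presence points and one isolated record.
presence = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(presence, eps=1.5, min_pts=3)
print(labels)
```

Points labelled -1 are the candidates for outlier inspection; the remaining labels identify the dense groups.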
Ecological Niche Modelling
Atlantic cod
Coelacanth
Giant squid
AquaMaps
Neural
Networks
Maximum
Entropy
Coro, G., Magliozzi, C., Ellenbroek, A., & Pagano, P. (2015). Improving data quality to build a robust
distribution model for Architeuthis dux. Ecological Modelling, 305, 29-39.
Estimating Similarity Between Habitats
Habitat Representativeness Score:
1. Measures the similarity between the environmental features of two areas
2. Assesses the quality of models and environmental features
HRS=10.5
Habitat
Representativeness
Score
Latimeria chalumnae
Coro, G., Pagano, P., & Ellenbroek, A. (2013). Combining simulated expert knowledge with Neural Networks
to produce Ecological Niche Models for Latimeria chalumnae. Ecological modelling, 268, 55-63.
Occurrence Data from GBIF
(www.gbif.org)
Occurrence Data from OBIS
(www.iobis.org)
∩
Intersection
-
Difference
∪
Union
A
x,y
Event Date
Modif Date
Author
Species Scientific Name
Occurrence Points Processing
B
x,y
Event Date
Modif Date
Author
Species Scientific Name
Records
Similarity
DD
Duplicates Deletion
Candela, L., Castelli, D., Coro, G., Lelii, L., Mangiacrapa, F., Marioli, V., & Pagano, P. (2015). An infrastructure-oriented
approach for supporting biodiversity research. Ecological Informatics, 26, 162-172.
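The operations on two occurrence collections A and B (union, intersection, difference, duplicates deletion) can be sketched as follows. The similarity rule used here is an assumption made for illustration: two records count as the same when the scientific name, the event date and the coordinates rounded to two decimals coincide. The record type, field names and sample values are hypothetical.

```python
from typing import NamedTuple


class Occurrence(NamedTuple):
    name: str        # species scientific name
    x: float         # longitude
    y: float         # latitude
    event_date: str


def key(rec, decimals=2):
    """Similarity key: same name, same date and same rounded position."""
    return (rec.name.strip().lower(), round(rec.x, decimals),
            round(rec.y, decimals), rec.event_date)


def deduplicate(records, decimals=2):
    seen = {}
    for rec in records:
        seen.setdefault(key(rec, decimals), rec)   # keep the first of each group
    return list(seen.values())


def union(a, b, decimals=2):
    return deduplicate(list(a) + list(b), decimals)


def intersection(a, b, decimals=2):
    keys_b = {key(r, decimals) for r in b}
    return [r for r in deduplicate(a, decimals) if key(r, decimals) in keys_b]


def difference(a, b, decimals=2):
    keys_b = {key(r, decimals) for r in b}
    return [r for r in deduplicate(a, decimals) if key(r, decimals) not in keys_b]


a = [Occurrence("Gadus morhua", 5.001, 60.002, "2010-05-01"),
     Occurrence("Gadus morhua", 5.002, 60.001, "2010-05-01"),   # near-duplicate
     Occurrence("Gadus morhua", 7.5, 58.0, "2011-06-10")]
b = [Occurrence("gadus morhua", 5.0, 60.0, "2010-05-01"),
     Occurrence("Gadus morhua", 3.0, 55.0, "2012-01-15")]

print(len(deduplicate(a)), len(union(a, b)), len(intersection(a, b)), len(difference(a, b)))
```

Rounding is a crude similarity test: two points that fall on opposite sides of a rounding boundary are not matched, so a distance threshold would be a more robust choice.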
Absence Locations Estimation
Coro, G., Magliozzi, C., Berghe, E. V., Bailly, N.,
Ellenbroek, A., & Pagano, P. (2016). Estimating
absence locations of marine species from data of
scientific surveys in OBIS. Ecological Modelling, 323,
61-76.
• Intersect survey data
focussing on a target
species
• Maximise the
separation between
locations with and
without occurrences
• Spatially aggregate
• Estimate absence
locations
Detecting Trends in Species Abundance
• Fill some knowledge gaps on marine species
• Account for sampling biases
• Define trends for common species
Plankton regime shift
Herring recovered after the fish ban
Appeltans W., Pissierssens P., Coro G., Italiano A., Pagano P., Ellenbroek A., Webb T. Trendylyzer: a long-term trend analysis on biogeographic data. In: Bollettino di Geofisica Teorica e Applicata:
an International Journal of Earth Sciences, vol. 54 (Suppl.) pp. 203 - 205. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca
(Italy). OGS - Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, 2013.
Estimating Climate Change Effects on Species
Distributions
AquaMaps actual (native)
distribution
Today vs 2050
(~11 500 maps)
Discover classes of
changes
by means of cluster
analysis
Coro, G., Magliozzi, C., Ellenbroek, A., Kaschner, K., & Pagano, P. (2015). Automatic classification of climate change effects on marine species distributions in 2050
using the AquaMaps model. Environmental and Ecological Statistics, 1-26.
Cluster Analysis to Detect Common Species
Columns (average of):
(1) average_number_of_species_occurrences_per_dataset
(2) number_of_datasets_containing_at_least_one_observation_for_the_
(3) number_of_6_minute_cells_containing_at_least_one_observation_fo
(4) number_of_months_containing_at_least_one_occurrence_record_for_
(5) no_months_with_a_least_10_occurrences
(6) nInd/nOcc

            (1)     (2)     (3)     (4)     (5)     (6)
Cluster 0   100     100     100     100     100     100
Cluster 1   14.46   78.57   41.05   88.90   79.65   11.14
Cluster 2   2.43    63.04   12.90   66.16   31.16   5.64
Cluster 3   0.16    53.57   1.62    27.12   1.36    0.41
Normalization with respect to the maximum value for each column
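The normalisation step can be sketched as below: each column is divided by its own maximum and expressed on a 0-100 scale, so the row holding the largest value in a column shows 100 for it. The raw numbers in the example are made up and are not the study's data.

```python
def normalise_columns(rows):
    """Rescale every column so that its maximum becomes 100."""
    maxima = [max(col) for col in zip(*rows)]
    return [[round(100.0 * v / m, 2) if m else 0.0 for v, m in zip(row, maxima)]
            for row in rows]


# Made-up cluster averages for two features.
raw = [[40.0, 28.0],
       [10.0, 22.0],
       [1.0, 14.0]]
scaled = normalise_columns(raw)
for row in scaled:
    print(row)
```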
Common: frequent, widespread, high individual
density
Moderate Commonness: moderately frequent,
moderately widespread, medium individual
density
Moderate-Low Commonness: poorly
widespread, low-moderately frequent, low
individual density
Low Commonness: quite localized, not frequent,
usually low individual density
• The term “common species” refers
intuitively to a species that is abundant
in a certain area, widespread and at
low risk of extinction.
• Consequently, “rare species” are
less abundant and possibly threatened.
• Automatically detecting common and
rare species, and how their status
changes through time, is an important
step in understanding the
consequences of environmental
change for ecosystem functioning.
Coro, G., Webb, T. J., Appeltans, W., Bailly, N., Cattrijsse, A., & Pagano, P. (2015). Classifying degrees of species
commonness: North Sea fish as a case study. Ecological Modelling, 312, 272-280.
Invasive species
• Seven data mining techniques to
estimate the spread of the puffer
fish in the Mediterranean Sea;
• The approach is applicable also to
other species;
• Produced impact maps on FAO-
AREAs, EEZs and GSAs.
Under publication
Search in Large Taxonomic Names Repositories
A flexible workflow approach
to taxon name matching
Accounts for:
• Variations in the spelling and
interpretation of taxonomic
names
• Combination of data from
different sources
• Harmonization and reconciliation
of Taxa names
Raw Input String
Gadus morua Lineus 1758
Correct Transcription:
Gadus morhua (Linnaeus, 1758)
Preprocessing
And
Parsing
Taxon name
Matcher 1
Taxon name
Matcher 2
Taxon name
Matcher n
PostProcessing
Reference
Source
(ASFIS)
Reference
Source
(FISHBASE)
Reference
Source
(WoRMS)
Reference
Source
(OBIS)
Berghe, E. V., Coro, G., Bailly, N., Fiorellato, F., Aldemita, C., Ellenbroek, A., & Pagano, P. (2015). Retrieving taxa names
from large biodiversity data collections using a flexible matching workflow. Ecological Informatics, 28, 29-41.
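A toy version of the parse-then-match idea, using the raw string from the slide. The parsing rule (keep the first two alphabetic tokens as genus and species) and the single matcher based on Python's `difflib` are simplifying assumptions; the published workflow chains several matchers and reference sources, which this sketch does not reproduce. The short reference list is illustrative only.

```python
import difflib
import re


def parse_name(raw):
    """Keep the first two alphabetic tokens (genus and species epithet)."""
    tokens = re.findall(r"[A-Za-z]+", raw)
    if len(tokens) < 2:
        return raw.strip()
    return f"{tokens[0].capitalize()} {tokens[1].lower()}"


def match_taxon(raw, reference_names, cutoff=0.8):
    """Return the closest reference name, or None if nothing is similar enough."""
    candidate = parse_name(raw)
    hits = difflib.get_close_matches(candidate, reference_names, n=1, cutoff=cutoff)
    return hits[0] if hits else None


# Illustrative reference list; a real run would query a source such as
# ASFIS, FishBase, WoRMS or OBIS.
reference = ["Gadus morhua", "Gadus macrocephalus",
             "Latimeria chalumnae", "Architeuthis dux"]

best = match_taxon("Gadus morua Lineus 1758", reference)
print(best)
```

Raising `cutoff` makes the matcher stricter; lowering it tolerates more spelling variation at the cost of more false matches.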
Vessels data analysis
Most exploited locations detection
Routes interpolation
Fishing activity estimation
Coro, G., Fortunati, L., & Pagano, P. (2013, June). Deriving fishing monthly effort and caught species from vessel
trajectories. In OCEANS-Bergen, 2013 MTS/IEEE (pp. 1-5).
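Route interpolation can be illustrated with the simplest possible scheme: straight-line interpolation between consecutive position fixes. This is an assumption chosen for brevity and is not necessarily the method used in the cited work or in the SGVM algorithm. Times are in arbitrary units and coordinates are treated as planar; the fixes are made up.

```python
def interpolate_track(fixes, step):
    """Add linearly interpolated (t, x, y) points every `step` between consecutive fixes."""
    fixes = sorted(fixes)
    if not fixes:
        return []
    track = []
    for (t0, x0, y0), (t1, x1, y1) in zip(fixes, fixes[1:]):
        t = t0
        while t < t1:
            w = (t - t0) / (t1 - t0)          # fraction of the segment covered
            track.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
            t += step
    track.append(fixes[-1])                    # always keep the last reported fix
    return track


# Three reported positions, one per hour; interpolate every 30 minutes.
fixes = [(0, 0.0, 0.0), (60, 6.0, 3.0), (120, 6.0, 9.0)]
track = interpolate_track(fixes, step=30)
for point in track:
    print(point)
```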
Forecasting Fishery Statistics
Frequency and time series
structure detection (with SSA)
was used to forecast effort, catch
and locations of purse seine
fishing in the Indian Ocean.
Coro, G., Large, S., Magliozzi, C., & Pagano, P. (2016).
Analysing and forecasting fisheries time series: purse
seine in Indian Ocean as a case study. ICES Journal of
Marine Science: Journal du Conseil, fsw131.
Stock assessment
Length-Weight Relations: estimates Length-Weight
relation parameters for marine species,
using Bayesian methods. Developed by R. Froese, T.
Thorson and R. B. Reyes
SGVM interpolation: interpolation of vessel
trajectories. Developed by the Study Group on VMS,
involving ICES
FAO MSY: stock assessment for FAO catch data.
Developed by the Resource Use and Conservation
Division of the FAO Fisheries and Aquaculture
Department (ref. Y. Ye - FAO)
ICCAT VPA: stock assessment method for
International Commission for the Conservation
of Atlantic Tunas (ICCAT) data. Developed by
Ifremer and IRD (ref. S. Bonhommeau, J. Bard)
CMSY: estimates Maximum Sustainable Yield
from catch statistics. Prime choice for ICES as
main stock assessment tool. Developed by R.
Froese, G. Coro, N. Demirel, K. Kleisner and H. Winker
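For the Length-Weight entry above, the relation is commonly written W = a · L^b, which becomes linear after taking logarithms: log W = log a + b · log L. The sketch below fits a and b by ordinary least squares on the logs. This is a plain substitute chosen for illustration, not the Bayesian method the hosted algorithm uses, and the data are synthetic.

```python
import math


def fit_length_weight(lengths, weights):
    """Ordinary least squares on log10(W) = log10(a) + b * log10(L)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope of the log-log line
    a = 10 ** (mean_y - b * mean_x)     # intercept, back-transformed
    return a, b


lengths = [10.0, 20.0, 40.0, 80.0]                       # synthetic lengths
weights = [0.01 * length ** 3.0 for length in lengths]   # generated with a=0.01, b=3
a, b = fit_length_weight(lengths, weights)
print(a, b)
```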
Atlantic herring
BlueBRIDGE reduced time-to-market:
State-of-the-art models to estimate
Maximum Sustainable Yield
computational time reduced by 95%
on average
Froese, R., Demirel, N., Coro, G., Kleisner, K. M., & Winker, H. (2016).
Estimating fisheries reference points from catch and resilience. Fish and
Fisheries.
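As background for Maximum Sustainable Yield estimation, the sketch below assumes the basic Schaefer surplus-production model, in which a stock with intrinsic growth rate r and carrying capacity k has MSY = r·k/4, reached at a biomass of k/2. It only evaluates these reference points and projects biomass under a given catch series; the search over plausible r-k pairs that a full assessment method performs is not shown, and all numbers are made up.

```python
def schaefer_reference_points(r, k):
    """Reference points of the Schaefer surplus-production model."""
    return {"MSY": r * k / 4.0, "Bmsy": k / 2.0, "Fmsy": r / 2.0}


def project_biomass(b0, r, k, catches):
    """B[t+1] = B[t] + r*B[t]*(1 - B[t]/k) - C[t], floored at zero."""
    biomass = [b0]
    for catch in catches:
        b = biomass[-1]
        biomass.append(max(0.0, b + r * b * (1.0 - b / k) - catch))
    return biomass


ref = schaefer_reference_points(r=0.4, k=1000.0)
# Fishing exactly MSY from a stock at Bmsy keeps the biomass constant.
trajectory = project_biomass(b0=500.0, r=0.4, k=1000.0, catches=[100.0] * 5)
print(ref)
print(trajectory)
```

Catches above MSY in the same setting would make the projected biomass decline year after year, which is the signal a stock assessment looks for.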
Links
Web Portals
• bluebridge.d4science.org
• services.d4science.org
Web sites
• www.bluebridge-vres.eu
• www.gcube-system.org
• www.d4science.org
• www.i-marine.eu

More Related Content

What's hot

FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617
EDINA, University of Edinburgh
 

What's hot (20)

The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 
ELIXIR UK Node presentation to the ELIXIR Board
ELIXIR UK Node presentation to the ELIXIR BoardELIXIR UK Node presentation to the ELIXIR Board
ELIXIR UK Node presentation to the ELIXIR Board
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
 
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data ManagementD4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 
Workshop about research data archiving and open access publishing at the Rese...
Workshop about research data archiving and open access publishing at the Rese...Workshop about research data archiving and open access publishing at the Rese...
Workshop about research data archiving and open access publishing at the Rese...
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
 
Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction)
 
BeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN sessionBeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN session
 
GBIF BIFA mentoring, Day 5a Data management, July 2016
GBIF BIFA mentoring, Day 5a Data management, July 2016GBIF BIFA mentoring, Day 5a Data management, July 2016
GBIF BIFA mentoring, Day 5a Data management, July 2016
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!
 
Web service technologies, at CGIAR ICT-KM workshop in Rome (2005)
Web service technologies, at CGIAR ICT-KM workshop in Rome (2005)Web service technologies, at CGIAR ICT-KM workshop in Rome (2005)
Web service technologies, at CGIAR ICT-KM workshop in Rome (2005)
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK Story
 
Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617
 
Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013
 
Building Federated FAIR Data Spaces, Yann Le Franc, EOSC-Pillar
Building Federated FAIR Data Spaces, Yann Le Franc, EOSC-PillarBuilding Federated FAIR Data Spaces, Yann Le Franc, EOSC-Pillar
Building Federated FAIR Data Spaces, Yann Le Franc, EOSC-Pillar
 

Similar to Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)

RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
Carole Goble
 
D4science-II Codata
D4science-II CodataD4science-II Codata
D4science-II Codata
FAO
 
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
FAO
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
Carole Goble
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
Dan Taylor
 
Paul hu bupdate_i_digbio_ecn_2012
Paul hu bupdate_i_digbio_ecn_2012Paul hu bupdate_i_digbio_ecn_2012
Paul hu bupdate_i_digbio_ecn_2012
ECNOfficer
 

Similar to Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR) (20)

The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
Bridging Environmental Data Providers and SeaDataNet DIVA Service within a Co...
Bridging Environmental Data Providers and SeaDataNet DIVA Service within a Co...Bridging Environmental Data Providers and SeaDataNet DIVA Service within a Co...
Bridging Environmental Data Providers and SeaDataNet DIVA Service within a Co...
 
Virtual research environments for implementing long tail open science
Virtual research environments for implementing long tail open scienceVirtual research environments for implementing long tail open science
Virtual research environments for implementing long tail open science
 
Sgci esip-7-20-18
Sgci esip-7-20-18Sgci esip-7-20-18
Sgci esip-7-20-18
 
Virtual Research Environments supporting tailor-made data management service...
Virtual Research Environments supporting tailor-made data management service...Virtual Research Environments supporting tailor-made data management service...
Virtual Research Environments supporting tailor-made data management service...
 
Supporting open science oriented skills building by virtual research environm...
Supporting open science oriented skills building by virtual research environm...Supporting open science oriented skills building by virtual research environm...
Supporting open science oriented skills building by virtual research environm...
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
D4science-II Codata
D4science-II CodataD4science-II Codata
D4science-II Codata
 
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...D4Science:An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Re...
 
Capacity building, validation and repeatability
Capacity building, validation and repeatabilityCapacity building, validation and repeatability
Capacity building, validation and repeatability
 
The AGINFRA+ Virtual Research Environment (VRE)
The AGINFRA+ Virtual Research Environment (VRE)The AGINFRA+ Virtual Research Environment (VRE)
The AGINFRA+ Virtual Research Environment (VRE)
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 
EGI Services
EGI Services EGI Services
EGI Services
 
E Infrastructure for OA
E Infrastructure for OAE Infrastructure for OA
E Infrastructure for OA
 
UK e-Infrastructure: Widening Access, Increasing Participation
UK e-Infrastructure: Widening Access, Increasing ParticipationUK e-Infrastructure: Widening Access, Increasing Participation
UK e-Infrastructure: Widening Access, Increasing Participation
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Paul hu bupdate_i_digbio_ecn_2012
Paul hu bupdate_i_digbio_ecn_2012Paul hu bupdate_i_digbio_ecn_2012
Paul hu bupdate_i_digbio_ecn_2012
 

More from Blue BRIDGE

More from Blue BRIDGE (20)

PerformFISH: Consumer Driven Production - Integrating Innovative Approaches f...
PerformFISH: Consumer Driven Production - Integrating Innovative Approaches f...PerformFISH: Consumer Driven Production - Integrating Innovative Approaches f...
PerformFISH: Consumer Driven Production - Integrating Innovative Approaches f...
 
BlueBRIDGE supporting education
BlueBRIDGE supporting educationBlueBRIDGE supporting education
BlueBRIDGE supporting education
 
LME: LEARN & IOC Capacity Building Activities
LME: LEARN & IOC Capacity Building ActivitiesLME: LEARN & IOC Capacity Building Activities
LME: LEARN & IOC Capacity Building Activities
 
Machine Learning methods to estimate the performance of aquafarms
Machine Learning methods to estimate the performance of aquafarms Machine Learning methods to estimate the performance of aquafarms
Machine Learning methods to estimate the performance of aquafarms
 
Environmental observation data to detect aquaculture structures: merging Cope...
Environmental observation data to detect aquaculture structures: merging Cope...Environmental observation data to detect aquaculture structures: merging Cope...
Environmental observation data to detect aquaculture structures: merging Cope...
 
Application of Earth Observation (EO) Data for Detection, Characterization an...
Application of Earth Observation (EO) Data for Detection, Characterization an...Application of Earth Observation (EO) Data for Detection, Characterization an...
Application of Earth Observation (EO) Data for Detection, Characterization an...
 
Fostering global data management with public tuna fisheries data
Fostering global data management with public tuna fisheries dataFostering global data management with public tuna fisheries data
Fostering global data management with public tuna fisheries data
 
Understanding biodiversity features in marine protected areas
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)

  • 1. BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675680 www.bluebridge-vres.eu Using e-Infrastructures for Biodiversity Conservation Gianpaolo Coro CNR, Italy gianpaolo.coro@isti.cnr.it (on behalf of the InfraScience group of ISTI-CNR, Pisa, Italy)
  • 2. Context Progress in Information Technology has changed the paradigms of Science • The large and fast increase of volume and complexity of data requires new approaches to collect-curate-analyse the data • This requires new tools to guarantee exchange and longevity of the data and of the reapplication of the experiments
  • 3. Big Data • Large volume • High generation velocity • Large variety • Untrustworthy (veracity) • High complexity (variability) Big Data: a dataset with large volume, variety, generation velocity, containing complex and untrustworthy information that requires nonconventional methods to extract, manage and process information within a reasonable time. • Value
  • 4. New Science Paradigms • Open Science: make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. Keywords: Open Access, Open research, Open Notebook Science • E-Science: computationally intensive science is carried out in highly distributed network environments that use large data sets and require distributed computing and collaborative tools. Keywords: Provenance of the scientific process, Scientific workflows • Science 2.0: process and publish large data sets using a collaborative approach. Share from raw data to experimental results and processes. Support collaborative experiments and Reproducibility-Repeatability-Reusability (R-R-R) of Science. Keywords: collaborative and repeatable Science
  • 5. Requirements for IT systems • Support collaborative research and experimentation • Implement Reproducibility-Repeatability-Reusability of Science • Allow sharing data, processes and findings • Grant free access to the produced scientific knowledge • Tackle Big Data challenges • Sustainability: low operational costs, low maintenance prices • Manage heterogeneous data/processes access policies • Meet industrial processes requirements
  • 6. e-Infrastructures e-Infrastructures enable researchers at different locations across the world to collaborate in the context of their home institutions or in national or multinational scientific initiatives. • People can work together having shared access to unique or distributed scientific facilities (including data, instruments, computing and communications). Examples: Belief, http://www.beliefproject.org/ OpenAire, http://www.openaire.eu/ i-Marine, http://www.i-marine.eu/ EU-Brazil OpenBio, http://www.eubrazilopenbio.eu/
  • 7. Virtual Research Environments • Define sub-communities • Allow temporary dedicated assignment of computational, storage, and data resources • Manage policies • Support data and information sharing [Figure: e-Infrastructure; Unified Resource Space; VREs; WPS; External e-Infrastructures. Relations shown: Integrates, Enables]
  • 8. Virtual Research Environments Innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments. • Communities are provided with applications to interact with the VRE services • Client services are provided both with APIs (Java, R) and simple HTTP-REST interfaces
  • 9. VREs Example: The D4Science e-Infrastructure. D4Science supports scientists in several domains: 1. More than 25 000 taxonomic studies per month (www.i-marine.eu) 2. More than 60 000 species distribution maps produced and hosted (www.d4science.eu) 3. Used to build a pan-European geothermal energy map (www.egip.d4science.org) 4. Processing and management of heterogeneous environmental and Earth system data (www.envriplus.eu) 5. Enhances communication and exchange in Linguistic Studies, Humanities, Cultural Heritage, History and Archaeology (www.parthenos-project.eu)
  • 10. BlueBRIDGE VREs. Stock Assessment: assess the health status of fisheries stocks (CMSY model). http://www.bluebridge-vres.eu/services/stock-assessment Marine Protected Areas: reduce the adverse impact of human activities (e.g. fishing, aquaculture, tourism) on ecosystems, and ensure these activities are properly embedded in policy frameworks. http://www.bluebridge-vres.eu/services/protected-area-impact-maps
  • 11. Education VREs. Lecture-style: the emphasis on course topics varies with the audience. Interactive: after each topic is explained, students do experiments. Experimental: students reproduce the experiment shown by the teacher and possibly repeat it on their own data. Social: students communicate via messaging or the VRE discussion panel. • 1 course/year in Pisa • 1 course/year in Paris • 12 courses in Copenhagen www.bluebridge-vres.eu International Council for the Exploration of the Sea • 38 courses all over the world, more than 1000 attendees
  • 12. BlueBRIDGE VREs: Social Networking. Social networking is key to sharing information in an e-Infrastructure. BlueBRIDGE offers a continuously updated list of events and news produced by users and applications. [Figure: User-shared News; Application-shared News; Share News]
  • 13. BlueBRIDGE VREs: The Workspace, an online file storage system. A free-to-use, folder-based file system allows managing and sharing information objects. Information objects can be • files, datasets, workflows, experiments, etc. • organized into folders • shared • disseminated via public URLs
  • 14. BlueBRIDGE Facilities: Overview. Storage: Databases, Cloud storage, Geospatial data. Data management: Metadata generation and management, Harmonisation, Sharing. Processing: Cloud computing, Elastic resources assignment, Multi-platform (R, Java, Fortran)
  • 15. Innovation Through Integration. Vision: integration, sharing, and remote hosting help to inform people and support decision-making
  • 17. Cloud Computing Platform • Experiments on Big Data • Sharing inputs and results • Saves the provenance of experiments • Supports R-R-R of experiments [Figure: Cloud Computing Platform accessed via WPS and REST; Workspace (NEW) storing Input/Out, Parameters, Provenance]
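The slide names WPS as one of the access routes to the computing platform. As a minimal sketch of what such a call can look like, the snippet below builds a WPS 1.0.0 key-value-pair `Execute` URL. The endpoint, process identifier and parameter names are hypothetical placeholders, not actual BlueBRIDGE or D4Science values, and authentication is left out entirely.

```python
from urllib.parse import urlencode


def build_wps_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 key-value-pair Execute request URL."""
    # In the KVP encoding, inputs are name=value pairs joined with ';'.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process identifier and parameters (assumptions).
url = build_wps_execute_url(
    "https://example.org/wps/WebProcessingService",
    "org.example.DBSCAN",
    {"epsilon": 10, "min_points": 1},
)
print(url)
```

Because the request is a plain URL, the same process can be invoked from a browser, a script, or third-party software that speaks WPS.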
  • 18. Prov-O (https://www.w3.org/TR/prov-o/) “Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.” The PROV Ontology (PROV-O) expresses the PROV Data Model using the OWL2 Web Ontology Language (OWL2). It provides a set of classes, properties, and restrictions that can be used to represent and interchange provenance information generated in different systems and under different contexts.
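To make the PROV-O vocabulary concrete, here is a small sketch that writes the provenance of one experiment as Turtle text using only the Python standard library. The `prov:` terms (`Entity`, `Activity`, `Agent`, `used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasAttributedTo`) come from PROV-O; the `ex:` namespace and all identifiers are invented for illustration, and this is not how the platform itself serialises provenance.

```python
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def provenance_turtle(output_id, activity_id, input_ids, agent_id):
    """Describe one computation: inputs -> activity -> output, run by an agent."""
    if not input_ids:
        raise ValueError("at least one input is required")
    lines = [PREFIXES, f"ex:{agent_id} a prov:Agent ."]
    for input_id in input_ids:
        lines.append(f"ex:{input_id} a prov:Entity .")
    used = ", ".join(f"ex:{input_id}" for input_id in input_ids)
    lines.append(
        f"ex:{activity_id} a prov:Activity ;\n"
        f"    prov:used {used} ;\n"
        f"    prov:wasAssociatedWith ex:{agent_id} ."
    )
    lines.append(
        f"ex:{output_id} a prov:Entity ;\n"
        f"    prov:wasGeneratedBy ex:{activity_id} ;\n"
        f"    prov:wasAttributedTo ex:{agent_id} ."
    )
    return "\n".join(lines)


# Hypothetical identifiers for an experiment, its inputs and its user.
ttl = provenance_turtle("map_v1", "run42", ["occurrences", "parameters"], "alice")
print(ttl)
```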
  • 19. BlueBRIDGE Computational Capabilities. Project resources: • 28 Virtual Machines (VMs) with 418 CPU cores, 636GB of RAM and 4TB of ephemeral storage • 100 VMs with 200 CPU cores, 800GB of RAM and 2TB of ephemeral storage • Storage: 350TB. Processes: • ~ 225 algorithms hosted in all the VREs • ~ 20 contributing institutes • ~ 30,000 requests per month • ~ 2000 scientists/students in 44 countries using VREs • Programming languages: R, Java, Python, Fortran, Linux-compiled. External providers (European Grid Infrastructure): • 6 VMs: 8 virtual CPU cores, 16GB of RAM and 100GB of storage • 2 VMs: 16 virtual CPU cores, 32GB of RAM and 100GB of storage • 24 VMs: 2 virtual CPU cores, 8GB of RAM and 50GB of storage • 5 VMs: 4 virtual CPU cores, 8GB of RAM and 80GB of disk
  • 20. Integrating new processes. Integration: putting a script or a process that works offline into the Cloud computing platform. [Figure: R script; SAI - Importing tool; Automatic; Computing platform; Web interface and Web service] Coro G., Panichi G., Pagano P. A Web application to publish R scripts as-a-Service on a Cloud computing platform. In: Bollettino di Geofisica Teorica e Applicata, vol. 52 article n. 51. Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, 2016. https://wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_DataMiner https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer
  • 21. Algorithms Importer (SAI) System features: 1. RStudio-like interface 2. Simple definition of script input and output 3. Global variables 4. Associate data type to the I/O 5. Request packages 6. Automatic software production 7. Automatic deployment
  • 23. Advantages • The process is available as-a-Service • Invoked via communication standards • Higher computational capabilities • Automatic creation of a Web interface • Provenance management • Storage of results on a high-availability system • Collaboration and sharing • Re-usability, Reproducibility, Repeatability, also from other software (e.g. QGIS)
  • 24. Collaborative experiments WS Shared online folders Inputs Outputs Results Computational system In the e-Infrastructure Through third party software
  • 25. Scientific Workflow with Code Privacy Guarantee: the script provider updates the script on their private Workspace; the service downloads the script on the fly; a user executes an experiment on their own data; the output, the input and the parameters can be shared with another user; this user can execute the experiment again and share the computation with the first user.
  • 26. Limitations and requirements [Figure: what is required (Input, Script, Output) vs what is provided (Script)] Issues: • Code is often designed for one precise data set • Often, prototype scripts have code that is not separable from the I/O. In the context of e-Infrastructures and Science 2.0: • Modularity is necessary for integration • Scripts should be re-organised so that they can be re-used on other data without changing the code
  • 27. Towards Science 2.0 [Figure labels: WS; Self-consistent comp. object; Repeatability; Provenance (Prov-O); Reusability; Use of standards; Reproducibility]
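One way to read the modularity requirement is sketched below: the computation is a pure function, and the input source, output target and parameters are passed in rather than hard-coded, so the same code can run on other data unchanged. The function names, the CSV layout and the statistic computed are all illustrative assumptions, not taken from the slides.

```python
import csv
import io


def mean_of_column(rows, column):
    """Pure computation: no file names or paths appear here."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


def run(input_stream, output_stream, column):
    """I/O wrapper: the caller decides where data comes from and goes to."""
    rows = list(csv.DictReader(input_stream))
    result = mean_of_column(rows, column)
    output_stream.write(f"{column}_mean,{result}\n")
    return result


# Hypothetical data set; any CSV with a numeric column works the same way.
sample = io.StringIO("depth,temperature\n10,15.0\n20,13.0\n")
out = io.StringIO()
mean_temperature = run(sample, out, "temperature")
print(out.getvalue())
```

With this shape, an importing tool only needs to know the declared inputs (a table and a column name) and outputs (one table) to wrap the script as a service.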
  • 29. Geospatial data processing Maps comparison NetCDF file Data extraction Signal processing Periodicity detection Maps generation
  • 30. Maps Comparison. Compares: • Species distribution maps • Environmental layers • SAR images. Coro, G., Pagano, P., & Ellenbroek, A. (2014). Comparing heterogeneous distribution maps for marine species. GIScience & Remote Sensing, 51(5), 593-611.
  • 31. Clustering and Outlier Detection on presence points (example species: Cetorhinus maximus). Density-based clustering and outlier detection: DBScan. Distance-based clustering: K-Means, X-Means.
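As an illustration of density-based clustering with outlier detection, the following is a compact, unoptimised DBSCAN written from the textbook description. It treats coordinates as planar, which is a simplification (real longitude/latitude data would need a geodesic distance), and the presence points are invented; it is not the implementation hosted on the platform.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster index, or NOISE for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # A point counts as its own neighbour, as in the usual formulation.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical presence points: two dense groups and one isolated record.
presence = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(presence, eps=1.5, min_pts=3)
print(labels)
```

Points labelled `-1` are the candidates for outlier review; unlike K-Means, the number of clusters does not have to be fixed in advance.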
  • 32. Ecological Niche Modelling Atlantic cod Coelacanth Giant squid AquaMaps Neural Networks Maximum Entropy Coro, G., Magliozzi, C., Ellenbroek, A., & Pagano, P. (2015). Improving data quality to build a robust distribution model for Architeuthis dux. Ecological Modelling, 305, 29-39.
  • 33. Estimating Similarity Between Habitats. Habitat Representativeness Score (HRS): 1. Measures the similarity between the environmental features of two areas 2. Assesses the quality of models and environmental features. [Figure: Latimeria chalumnae, HRS=10.5] Coro, G., Pagano, P., & Ellenbroek, A. (2013). Combining simulated expert knowledge with Neural Networks to produce Ecological Niche Models for Latimeria chalumnae. Ecological Modelling, 268, 55-63.
  • 34. Occurrence Points Processing. Sources: occurrence data from GBIF (www.gbif.org) and from OBIS (www.iobis.org). Two record sets, A and B, with fields x,y; Event Date; Modif Date; Author; Species Scientific Name. Operations: ∩ Intersection, - Difference, ∪ Union, and Duplicates Deletion (DD), based on Records Similarity. Candela, L., Castelli, D., Coro, G., Lelii, L., Mangiacrapa, F., Marioli, V., & Pagano, P. (2015). An infrastructure-oriented approach for supporting biodiversity research. Ecological Informatics, 26, 162-172.
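The set operations on occurrence records can be sketched as follows. For simplicity, two records are considered the same when their rounded coordinates, event date and lower-cased scientific name coincide; this exact-key rule is an assumption standing in for the similarity measure mentioned on the slide, and the records are invented.

```python
from collections import namedtuple

Occurrence = namedtuple("Occurrence", "x y event_date species")


def record_key(rec, decimals=2):
    """Simplified identity rule (assumption): rounded position, date, name."""
    return (round(rec.x, decimals), round(rec.y, decimals),
            rec.event_date, rec.species.strip().lower())


def delete_duplicates(records):
    unique = {}
    for rec in records:
        unique.setdefault(record_key(rec), rec)  # keep the first occurrence
    return list(unique.values())


def union(a, b):
    return delete_duplicates(list(a) + list(b))


def intersection(a, b):
    keys_b = {record_key(rec) for rec in b}
    return [rec for rec in delete_duplicates(a) if record_key(rec) in keys_b]


def difference(a, b):
    keys_b = {record_key(rec) for rec in b}
    return [rec for rec in delete_duplicates(a) if record_key(rec) not in keys_b]


# Hypothetical records standing in for extracts from two sources, A and B.
set_a = [
    Occurrence(12.50, 43.70, "2010-05-01", "Gadus morhua"),
    Occurrence(12.501, 43.70, "2010-05-01", "gadus morhua"),  # near-duplicate
    Occurrence(3.0, 55.0, "2011-01-01", "Clupea harengus"),
]
set_b = [
    Occurrence(12.5, 43.7, "2010-05-01", "Gadus morhua"),
    Occurrence(-5.0, 48.0, "2012-03-03", "Thunnus thynnus"),
]
print(len(union(set_a, set_b)), len(intersection(set_a, set_b)),
      len(difference(set_a, set_b)))
```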
  • 35. Absence Locations Estimation Coro, G., Magliozzi, C., Berghe, E. V., Bailly, N., Ellenbroek, A., & Pagano, P. (2016). Estimating absence locations of marine species from data of scientific surveys in OBIS. Ecological Modelling, 323, 61-76. • Intersect survey data focussing on a target species • Maximise the separation between locations with and without occurrences • Spatially aggregate • Estimate absence locations
  • 36. Detecting Trends in Species Abundance • Fill some knowledge gaps on marine species • Account for sampling biases • Define trends for common species Plankton regime shift Herring recovered after the fish ban Appeltans W., Pissierssens P., Coro G., Italiano A., Pagano P., Ellenbroek A., Webb T. Trendylyzer: a long-term trend analysis on biogeographic data. In: Bollettino di Geofisica Teorica e Applicata: an International Journal of Earth Sciences, vol. 54 (Suppl.) pp. 203 - 205. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca (Italy). OGS - Istituto Nazionale di Oceanografia e di Geofisica Sperimentale, 2013.
  • 37. Estimating Climate Change Effects on Species Distributions AquaMaps actual (native) distribution Today vs 2050 (~11 500 maps) Discover classes of changes by means of cluster analysis Coro, G., Magliozzi, C., Ellenbroek, A., Kaschner, K., & Pagano, P. (2015). Automatic classification of climate change effects on marine species distributions in 2050 using the AquaMaps model. Environmental and Ecological Statistics, 1-26.
  • 38. Cluster Analysis to Detect Common Species
    Cluster averages, normalized with respect to the maximum value of each column. Columns:
    (A) Average of average_number_of_species_occurrences_per_dataset
    (B) Average of number_of_datasets_containing_at_least_one_observation_for_the_
    (C) Average of number_of_6_minute_cells_containing_at_least_one_observation_fo
    (D) Average of number_of_months_containing_at_least_one_occurrence_record_for_
    (E) Average of no_months_with_a_least_10_occurrences
    (F) Average of nInd/nOcc

    Cluster        (A)      (B)      (C)      (D)      (E)      (F)
    Cluster 0   100      100      100      100      100      100
    Cluster 1    14.46    78.57    41.05    88.90    79.65    11.14
    Cluster 2     2.43    63.04    12.90    66.16    31.16     5.64
    Cluster 3     0.16    53.57     1.62    27.12     1.36     0.41

    Commonness classes:
    Common: frequent, widespread, high individual density
    Moderate Commonness: moderately frequent, moderately widespread, medium individual density
    Moderate-Low Commonness: poorly widespread, low-moderately frequent, low individual density
    Low Commonness: quite localized, not frequent, usually low individual density

    • The term “common species” refers intuitively to a species that is abundant in a certain area, widespread and at low risk of extinction. • Consequently, “rare species” are less abundant and possibly threatened. • Automatically detecting common and rare species, and how their status changes through time, is an important step in understanding the consequences of environmental change for ecosystem functioning.
    Coro, G., Webb, T. J., Appeltans, W., Bailly, N., Cattrijsse, A., & Pagano, P. (2015). Classifying degrees of species commonness: North Sea fish as a case study. Ecological Modelling, 312, 272-280.
  • 39. Invasive species • Seven data mining techniques to estimate the spread of the puffer fish in the Mediterranean Sea • The approach is also applicable to other species • Produced impact maps on FAO areas, EEZs and GSAs. Under publication
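The normalization step named on the slide (each column scaled by its maximum, so the top cluster reads 100) can be written in a few lines. The raw numbers below are invented for illustration and are not the study's data; the sketch also assumes every column has a positive maximum.

```python
def normalize_by_column_max(rows):
    """Scale each column so that its maximum becomes 100."""
    n_cols = len(rows[0])
    maxima = [max(row[c] for row in rows) for c in range(n_cols)]
    # Assumes every column maximum is positive (no division by zero).
    return [[100.0 * row[c] / maxima[c] for c in range(n_cols)] for row in rows]


# Hypothetical raw cluster averages: one row per cluster, one column per feature.
raw = [
    [200.0, 28.0],
    [50.0, 14.0],
    [10.0, 7.0],
]
normalized = normalize_by_column_max(raw)
print(normalized)
```

Scaling per column makes features with very different units comparable across clusters.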
  • 40. Search in Large Taxonomic Names Repositories. A flexible workflow approach to taxon name matching. Accounts for: • Variations in the spelling and interpretation of taxonomic names • Combination of data from different sources • Harmonization and reconciliation of taxa names. Example: raw input string "Gadus morua Lineus 1758"; correct transcription: Gadus morhua (Linnaeus, 1758). Workflow: Preprocessing and Parsing; Taxon name Matcher 1, Taxon name Matcher 2, ..., Taxon name Matcher n; PostProcessing. Reference sources: ASFIS, FISHBASE, WoRMS, OBIS. Berghe, E. V., Coro, G., Bailly, N., Fiorellato, F., Aldemita, C., Ellenbroek, A., & Pagano, P. (2015). Retrieving taxa names from large biodiversity data collections using a flexible matching workflow. Ecological Informatics, 28, 29-41.
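A toy version of the parse-then-match idea is shown below, using the generic string similarity from Python's standard `difflib` module. This is a stand-in technique, not one of the matchers from the cited workflow; the reference list is a made-up three-name extract, and the parsing rule (keep the first two words as genus and species) is a deliberate simplification.

```python
import difflib

# Hypothetical extract of a reference source (assumption).
REFERENCE_NAMES = ["Gadus morhua", "Gadus macrocephalus", "Thunnus albacares"]


def parse_binomial(raw_name):
    """Naive parsing: keep genus and species, drop authorship and year."""
    return " ".join(raw_name.lower().split()[:2])


def match_taxon(raw_name, reference_names, cutoff=0.8):
    """Return the closest reference name, or None if nothing is similar enough."""
    lookup = {name.lower(): name for name in reference_names}
    hits = difflib.get_close_matches(
        parse_binomial(raw_name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


matched = match_taxon("Gadus morua Lineus 1758", REFERENCE_NAMES)
unmatched = match_taxon("Xyz abc", REFERENCE_NAMES)
print(matched, unmatched)
```

Chaining several matchers with different strictness, as the slide describes, would replace the single `get_close_matches` call with a sequence of such functions.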
  • 41. Vessels data analysis Most exploited locations detection Routes interpolation Fishing activity estimation Coro, G., Fortunati, L., & Pagano, P. (2013, June). Deriving fishing monthly effort and caught species from vessel trajectories. In OCEANS-Bergen, 2013 MTS/IEEE (pp. 1-5).
  • 42. Forecasting Fishery Statistics Frequency and time series structure detection (with SSA) was used to forecast effort, catch and locations of purse seine fishing in the Indian Ocean. Coro, G., Large, S., Magliozzi, C., & Pagano, P. (2016). Analysing and forecasting fisheries time series: purse seine in Indian Ocean as a case study. ICES Journal of Marine Science: Journal du Conseil, fsw131.
  • 43. Stock assessment. Length-Weight Relations: estimates Length-Weight relation parameters for marine species, using Bayesian methods. Developed by R. Froese, T. Thorson and R. B. Reyes. SGVM interpolation: interpolation of vessel trajectories. Developed by the Study Group on VMS, involving ICES. FAO MSY: stock assessment for FAO catch data. Developed by the Resource Use and Conservation Division of the FAO Fisheries and Aquaculture Department (ref. Y. Ye - FAO). ICCAT VPA: stock assessment method for International Commission for the Conservation of Atlantic Tunas (ICCAT) data. Developed by Ifremer and IRD (ref. S. Bonhommeau, J. Bard). CMSY: estimates Maximum Sustainable Yield from catch statistics. Prime choice for ICES as main stock assessment tool. Developed by R. Froese, G. Coro, N. Demirel, K. Kleisner and H. Winker. [Figure: Atlantic herring] BlueBRIDGE reduced time-to-market: for state-of-the-art models to estimate Maximum Sustainable Yield, computational time was reduced by 95% on average. Froese, R., Demirel, N., Coro, G., Kleisner, K. M., & Winker, H. (2016). Estimating fisheries reference points from catch and resilience. Fish and Fisheries.
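For readers unfamiliar with length-weight relations, the standard form is W = a·L^b, which becomes a straight line after taking logarithms. The sketch below estimates a and b by ordinary least squares on the log-log data. Note that this is a simpler technique than the Bayesian method the slide refers to, and the sample measurements are synthetic.

```python
import math


def fit_length_weight(lengths, weights):
    """Least-squares fit of log(W) = log(a) + b*log(L); returns (a, b)."""
    xs = [math.log(length) for length in lengths]
    ys = [math.log(weight) for weight in weights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic measurements generated from a = 0.01, b = 3 (assumed values).
lengths_cm = [10.0, 20.0, 30.0, 40.0]
weights_g = [0.01 * length ** 3 for length in lengths_cm]
a, b = fit_length_weight(lengths_cm, weights_g)
print(a, b)
```

A Bayesian treatment would additionally place priors on a and b and return distributions rather than point estimates.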
  • 44. Links Web Portals • bluebridge.d4science.org • services.d4science.org Web sites • www.bluebridge-vres.eu • www.gcube-system.org • www.d4science.org • www.i-marine.eu