Using e-Infrastructures for
Biodiversity Conservation
Gianpaolo Coro
ISTI-CNR, Pisa, Italy
Module 4 - Outline
1. Data processing requirements by communities of
practice
2. The D4Science Statistical Manager
3. Ecological modelling
D4Science
D4Science is both a Data and a Computational e-Infrastructure
• Used by several Projects: i-Marine, EUBrazil OpenBio, ENVRI;
• Implements the notion of e-Infrastructure as-a-Service: it offers on-demand access to
data management services and computational facilities;
• Hosts several VREs for Fisheries Managers, Biologists, Statisticians…and Students.
D4Science - Resources
• A large set of connected biodiversity and taxonomic datasets
• A network to distribute and access geospatial data
• A distributed storage system to store datasets and documents
• A social network to share opinions and useful news
• Algorithms for biology-related experiments
Data Processing
1. Data processing requirements by communities of
practice
2. The D4Science Statistical Manager
3. Ecological modelling
Communities of practice in Computational Statistics are interested in:
1. Repeating and validating experiments
2. Using algorithms in several contexts
3. Hiding the complexity of the calculations
4. Facilitating the management and publication of algorithms
Issues
…practically speaking, they look for:
1. Modular and pluggable solutions
2. Access by means of standard protocols
3. Hiding the complexity of parallel processing
4. Hiding the complexity of software management and provisioning
5. Active contribution with new algorithms and use cases
Issues
1. Data processing requirements by communities of
practice
2. The D4Science Statistical Manager
3. Ecological modelling
The Statistical Manager is a set of web services that aim to:
• Help scientists in computational statistics experiments
• Supply ready-made state-of-the-art algorithms as-a-Service
• Perform calculations with Map-Reduce in a way that is transparent to users
• Share inputs, results, parameters and comments with colleagues through the Virtual
Research Environments of the D4Science e-Infrastructure
Statistical Manager – Users’ View
[Diagram labels: Statistical Manager; D4Science Computational Facilities; Sharing; Setup and execution]
Open Platform Approach
[Diagram labels: External Computing Facility; OGC WPS Interface]
People can contribute with:
• R scripts
• Java programs
• Linux programs
• OGC-WPS services
The Statistical Manager allows developers to:
• Develop distributed computations easily (Statistical Manager Framework)
• Parallelize R scripts, possibly without changing the code
• Automatically produce a user interface to perform experiments
• Reuse models and best practices developed by the community
• Connect external computational facilities via the OGC WPS standard
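As an illustration of what access through the OGC WPS standard looks like in practice, the sketch below builds WPS 1.0.0 key-value-pair request URLs in Python. The endpoint, the process identifier and the input names are hypothetical placeholders, not the actual D4Science addresses or algorithm names; no network call is made.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and process identifier, used only for illustration.
WPS_ENDPOINT = "https://example.org/wps/WebProcessingService"
PROCESS_ID = "org.example.MY_CLUSTERING_ALGORITHM"


def wps_url(endpoint, request, **params):
    """Build a WPS 1.0.0 key-value-pair (KVP) request URL."""
    query = {"service": "WPS", "request": request}
    # DescribeProcess and Execute require an explicit version parameter.
    if request != "GetCapabilities":
        query["version"] = "1.0.0"
    query.update(params)
    return endpoint + "?" + urlencode(query)


def encode_inputs(inputs):
    """Encode process inputs as the semicolon-separated DataInputs string."""
    return ";".join(f"{name}={value}" for name, value in inputs.items())


# List the processes offered by the service.
capabilities_url = wps_url(WPS_ENDPOINT, "GetCapabilities")
# Ask for the inputs and outputs of one process.
describe_url = wps_url(WPS_ENDPOINT, "DescribeProcess", identifier=PROCESS_ID)
# Run the process with two (hypothetical) input parameters.
execute_url = wps_url(
    WPS_ENDPOINT,
    "Execute",
    identifier=PROCESS_ID,
    datainputs=encode_inputs({"Epsilon": 10, "MinPoints": 2}),
)
print(execute_url)
```

Any WPS client could send these URLs with an HTTP GET; the three request types are the ones defined by the WPS 1.0.0 standard.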
Statistical Manager – Developers’ View
Architecture
Internal Work
The Context: Resources and Sharing
Statistical Manager - Interface
Experiment Execution
Computations Check
Summary of the Input, Output
and Parameters of the experiment
Data Space - Sharing and Import
100 Hosted Algorithms
Numbers
[Chart: Users. Labels: FishBase (US, CA, TW); Geomar; Naturhistoriska riksmuseet; Agrocampus; Anonymous Individuals; INRA; King Abdullah University of Science and Technology; ISTI]
                                        2013            2014
Avg users per month                     200             20100
Number of algorithms                    50              100
Number of contributing organizations    2               7
providing algorithms                    (CNR, Geomar)   (CNR, Geomar, FIN, FAO, T2, IRD, Agrocampus)
Publications                            8               13
Sum Impact Factor                       2.66            12.17
2012
1. L. Candela, G. Coro, P. Pagano, ”Supporting Tabular Data Characterization in a Large Scale Data Infrastructure by Lexical Matching Techniques”, In M. Agosti et al. (Eds.): IRCDL 2012, Communications in Computer
and Information Science Volume 354, pp. 21–32. Springer, Heidelberg (2012).
2013
2. R. Froese, J. Thorson, R. B. Reyes Jr. A Bayesian approach for estimating length-weight relationships in fishes. Journal of Applied Ichthyology. Volume 30, Issue 1, pages 78–85, 2013
3. G. Coro, P. Pagano, A. Ellenbroek, ”Combining Simulated Expert Knowledge with Neural Networks to Produce Ecological Niche Models for Latimeria chalumnae”, Ecological Modelling, DOI
10.1016/j.ecolmodel.2013.08.005, Ed. Elsevier.
4. G. Coro, L. Fortunati, P. Pagano. Deriving Fishing Monthly Effort and Caught Species from Vessel Trajectories. Oceans 2013, Proceedings of MTS/IEEE.
5. P. Pagano, G. Coro, D. Castelli, L. Candela, F. Sinibaldi, A. Manzi. Cloud Computing for Ecological Modeling in the D4Science Infrastructure. Proceedings of EGI Community Forum 2013.
6. D. Castelli, P. Pagano, G. Coro, F. Sinibaldi, ”Modellazione della Nicchia Ecologica di Specie Marine (Marine Species Ecological Niche Modelling)”. In “Le Tecnologie del CNR per il Mare” (CNR Marine Technologies)
pp. 140, Ed. CNR (Roma, Italy).
7. D. Castelli, P. Pagano, G. Coro, ”Variazioni Climatiche ed Effetto sulle Specie Marine (Climate Changes and Effect on Marine Species)”. In ”Le Tecnologie del CNR per il Mare” (CNR Marine Technologies) pp. 139,
Ed. CNR (Roma, Italy).
8. D. Castelli, P. Pagano, G. Coro, ”Elaborazione di Dati Trasmessi da Pescherecci (Processing of fishing vessel transmitted information)”. In “Le Tecnologie del CNR per il Mare” (CNR Marine Technologies). pp. 133,
Ed. CNR (Roma, Italy).
9. G. Coro, P. Pagano, A. Ellenbroek. Automatic Procedures to Assist in Manual Review of Marine Species Distribution Maps. To be published in M. Tomassini et al. (Eds.): International Conference on Adaptive and
Natural Computing Algorithms (ICANNGA’13), Springer, Heidelberg (2013).
10. Candela L., Castelli D., Coro G., Pagano P., Sinibaldi F. Species distribution modeling in the cloud. In: Concurrency and Computation-Practice & Experience, Geoffrey C. Fox, David W. Walker (eds.). Wiley,
11. Appeltans W., Pissierssens P., Coro G., Italiano A., Pagano P., Ellenbroek A., Webb T. Trendylyzer: a long-term trend analysis on biogeographic data. In: Bollettino di Geofisica Teorica e Applicata: an International
Journal of Earth Sciences, vol. 54 (Suppl.) pp. 203 - 205. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca (Italy). OGS - Istituto Nazionale di
Oceanografia e di Geofisica Sperimentale, 2013.
12. Coro G., Gioia A., Pagano P., Candela L. A service for statistical analysis of marine data in a distributed e-infrastructure. In: Bollettino di Geofisica Teorica e Applicata: an International Journal of Earth Sciences,
vol. 54 (Suppl.) pp. 68 - 70. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca (Italy). OGS - Istituto Nazionale di Oceanografia e di Geofisica
Sperimentale, 2013.
13. Castelli D., Pagano P., Candela L., Coro G. The iMarine data bonanza: improving data discovery and management through a hybrid data infrastructure. In: Bollettino di Geofisica Teorica e Applicata: an
International Journal of Earth Sciences, vol. 54 (Suppl.) pp. 105 - 107. Supplement: IMDIS 2013 - International Conference on Marine Data and Information Systems, 23-25 September, Lucca (Italy). OGS - Istituto
Nazionale di Oceanografia e di Geofisica Sperimentale, 2013.
14. Coro G. A Lightweight Guide on Gibbs Sampling and JAGS. Technical report, 2013.
15. Vanden Berghe E., Bailly N., Aldemita C., Fiorellato F., Coro G., Ellenbroek A., Pagano P. BiOnym - a flexible workflow approach to taxon name matching. In: TDWG 2013 - Taxonomic Database Working Group
2013 (Firenze, 28-31 October 2013).
16. Coro G., Pagano P., Candela L. Providing Statistical Algorithms as-a-Service. In: TDWG 2013 - Taxonomic Database Working Group 2013 (Firenze, 28-31 October 2013).
2014
17. Candela L., Castelli D., Coro G., De Faveri F., Italiano A., Lelii L., Mangiacrapa F., Marioli V., Pagano P. Integrating Species Occurrence Databases to Facilitate Data Analysis. Approved for the Ecological Informatics
Journal, Elsevier 2014.
18. Froese R., Coro G., Kleisner K., Demirel N. Revisiting Safe Biological Limits in Fisheries. Submitted to the Fish and Fisheries Journal, Wiley 2014.
19. Coro G., Candela L., Pagano P., Italiano A., Liccardo L. Parallelising the Execution of Native Data Mining Algorithms for Computational Biology. Submitted to Concurrency and Computation-Practice & Experience,
Wiley 2014.
20. Coro G. , Pagano P., Ellenbroek A. Comparing Heterogeneous Distribution Maps for Marine Species. Submitted to GIScience & Remote Sensing, Taylor & Francis 2014.
2015
21. G. Coro, C. Magliozzi, A. Ellenbroek, P. Pagano, Improving data quality to build a robust distribution model for Architeuthis dux, Ecological Modelling, Volume 305, 10 June 2015, Pages 29-39, ISSN 0304-3800
22. G. Coro, C. Magliozzi, E. Vanden Berghe, N. Bailly, A. Ellenbroek, P. Pagano, Estimating absence locations of marine species from data of scientific surveys
23. R. Froese, N. Demirel, G. Coro, K. Kleisner, H. Winker, Estimating Fisheries Reference Points from Catch and Resilience
24. E. Vanden Berghe, N. Bailly, G. Coro, F. Fiorellato, C. Aldemita, A. Ellenbroek, P. Pagano. Retrieving taxa names from large biodiversity data collections using a flexible matching workflow
25. G. Coro, C. Magliozzi, A. Ellenbroek, K. Kaschner, P. Pagano. Automatic classification of climate change effects on marine species distributions in 2050 using the AquaMaps model
26. E. Trumpy, G. Coro, A. Manzella, P. Pagano, D. Castelli, P. Calcagno, A. Nador, T. Bragasson, S. Grellet. Building a European Geothermal Information Network using a
Publications around the Statistical Manager
1. Data processing requirements by communities of
practice
2. The D4Science Statistical Manager
3. Ecological modelling
Niche Modelling
Scope:
• characterize the environmental conditions that are suitable for the species to
subsist;
• identify where suitable environment is distributed in geographical space;
• estimate the actual and potential geographic distributions of a species.
Actual distribution: areas that are truly occupied by the species
Fundamental niche: the full range of abiotic conditions within which the species is viable
Potential distribution: areas with abiotic conditions that fall within the fundamental niche
Niche Modelling and Absence and Presence Points
Approaches:
Mechanistic models: incorporate physiological limits in a species' tolerance to
environmental conditions;
Correlative models: automatically estimate the environmental conditions that are
suitable for a species by relying on examples.
Presence points: occurrence records, i.e. places where the species has been observed
in its habitat
Absence points: locations where the environment is
considered unsuitable for the species.
In many cases, absence points must be simulated
(pseudo-absence points), because reliable data are rare.
Examples: Potential Distributions of the Coelacanth
[Figure panels: Presence-only: MaxEnt; Presence-only: GARP; Expert (semi-mechanistic): AquaMaps; Presence/absence: Artificial Neural Networks]
Comparison of several approaches to estimating the potential distribution of the Coelacanth.
Which approach performs best depends on the quality of the data.
Thus, cleaning operations are very important!
C-squares (concise spatial query and representation system):
• A system of geocodes that provides a basis for simple spatial indexing of
geographic features
• Devised by Tony Rees of CSIRO Marine and Atmospheric Research
• A compact encoding of Latitude and Longitude and Resolution
Example:
C-square code: 3414:227:3
Resolution: 0.5°
N,S,W,E limits: -42.5,-43.0,147.0,147.5
A useful converter: http://www.marine.csiro.au/marq/csq_builder.init
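The encoding can be sketched in a few lines of Python. This is a minimal illustration of the c-squares scheme at 0.5° resolution, written from the public description of the system rather than taken from the converter above; the function names are hypothetical.

```python
def intermediate_quadrant(lat_high, lon_high):
    """Return the 1-4 digit that says which half of the parent cell is meant."""
    return 1 + 2 * int(lat_high) + int(lon_high)


def csquare_half_degree(lat, lon):
    """Encode a latitude/longitude pair as a 0.5-degree c-square code."""
    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Work on absolute values; keep the poles and the antimeridian in the last cell.
    alat = min(abs(lat), 89.999999)
    alon = min(abs(lon), 179.999999)
    lat_deg, lon_deg = int(alat), int(alon)
    # 10-degree square: quadrant, tens of latitude, tens of longitude (two digits).
    ten_degree = f"{quadrant}{lat_deg // 10}{lon_deg // 10:02d}"
    # 1-degree square: intermediate quadrant, then the unit digits of lat and lon.
    lat_unit, lon_unit = lat_deg % 10, lon_deg % 10
    one_degree = f"{intermediate_quadrant(lat_unit >= 5, lon_unit >= 5)}{lat_unit}{lon_unit}"
    # 0.5-degree square: intermediate quadrant of the fractional part.
    half_degree = str(intermediate_quadrant(alat - lat_deg >= 0.5, alon - lon_deg >= 0.5))
    return f"{ten_degree}:{one_degree}:{half_degree}"


# A point inside the example cell (42.5-43.0 S, 147.0-147.5 E).
print(csquare_half_degree(-42.75, 147.25))  # 3414:227:3
```

The point (-42.75, 147.25) lies inside the limits given in the example, and the function reproduces the code 3414:227:3.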
C-square codes
Contains information on:
a) cell codes
b) statistical cell properties (center, limits, and area);
c) membership in relevant areas (FAO areas, EEZs or LMEs);
d) physical attributes (depth, salinity or temperature);
e) biological properties (e.g. primary production).
Data gathered from:
Sea Around Us Project
CSIRO
Kansas Geological Survey
Compiled by:
Kristin Kaschner & Jonathan Ready
HCAF (Half-degree Cells Authority File)
Contains information used for describing the environmental
tolerance and preference of a species:
• distribution using FAO areas and bounding box
• range of values per environmental parameter (min., preferred
min., preferred max., max.)
HSPEN (Half-degree Species Environmental Envelope)
Online experiment:
the i-Marine Filtering Facilities
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
A Niche model relying on expert
knowledge
Contains the assignment of a species to a half-degree cell and
the corresponding probability of occurrence of the species in
a given cell;
The assignment probability is the product of the probabilities computed for
each of the environmental parameters (SST, salinity, primary
production, sea ice concentration, distance to land).
HSPEC (Half-degree Species Assignment)
AquaMaps
Gadus morhua
A Presence-only species model that relies on expert knowledge about the species habitat
• AquaMaps Suitable: estimates the Potential Distribution
• AquaMaps Native: estimates the Actual Distribution
• Maps have 0.5 degrees resolution;
• Expert knowledge is used in modelling the habitat parameters;
• AquaMaps adopts mechanistic assumptions combined with an automatic estimation of
parameter values.
• “good cells” - within bounding box or known FAO areas
• a minimum of 10 “good cells” is needed for extracting parameters
Bounding box or FAO area limits serve as independent verification of the validity of occurrence records.
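A minimal Python sketch of the good-cell selection described above. The record layout (`lat`, `lon`, `fao_area`), the function names and the sample values are assumptions made for illustration; only the "bounding box or FAO area" rule and the threshold of 10 cells come from the slide. Bounding boxes that cross the antimeridian are not handled.

```python
MIN_GOOD_CELLS = 10  # threshold stated on the slide


def good_cells(cells, bbox=None, fao_areas=frozenset()):
    """Keep the occurrence cells inside the bounding box or a known FAO area."""
    def in_bbox(cell):
        return (
            bbox is not None
            and bbox["south"] <= cell["lat"] <= bbox["north"]
            and bbox["west"] <= cell["lon"] <= bbox["east"]
        )
    return [c for c in cells if in_bbox(c) or c["fao_area"] in fao_areas]


def can_extract_parameters(cells, bbox=None, fao_areas=frozenset()):
    """True when enough good cells exist to derive the environmental envelope."""
    return len(good_cells(cells, bbox, fao_areas)) >= MIN_GOOD_CELLS


# Hypothetical occurrence cells: twelve inside the box, one far outside it.
bbox = {"north": 70.0, "south": 40.0, "west": -60.0, "east": 30.0}
cells = [{"lat": 50.0 + i, "lon": -10.0, "fao_area": 27} for i in range(12)]
cells.append({"lat": -30.0, "lon": 150.0, "fao_area": 81})

print(len(good_cells(cells, bbox, fao_areas={27})))        # 12
print(can_extract_parameters(cells, bbox, fao_areas={27}))  # True
```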
AquaMaps – Good Cells
Taken from: http://www.aquamaps.org/main/presentations/Part%20II%20-%20AquaMaps%20behind%20the%20scene.pdf
Global grid of 259,200
half degree cells
Good cells are used to derive the range of environmental parameters within the species’ native range.
AquaMaps – Extracting Environmental Parameters
Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
• Depth ranges: typically from literature; depth estimate based on habitat description
• Min = 25th percentile - 1.5 * interquartile or absolute minimum in extracted data (whichever is greater)
• Max = 75th percentile + 1.5 * interquartile or absolute maximum in extracted data (whichever is greater)
• PrefMin = 10th percentile of observed variation in an environmental parameter
• PrefMax = 90th percentile of observed variation in an environmental parameter
• Surface values for species with min depth ≤ 200m
• Bottom values for species with min depth > 200m
The environmental envelopes describe tolerances of a species with respect to each environmental
parameter.
AquaMaps – Environmental Envelopes
Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
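[Figure: environmental envelope. Horizontal axis: predictor (Min, Preferred min, Preferred max, Max); vertical axis: relative probability of occurrence (PMax)]
$P_c = P_{\text{bathymetry},c} \times P_{\text{SST},c} \times P_{\text{salinity},c} \times P_{\text{chl a},c} \times P_{\text{IceDist},c} \times P_{\text{LandDist},c}$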
Probabilities of species occurrence are generated by matching the species environmental envelope against local
environmental conditions to determine relative suitability of a given area.
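The matching step can be sketched in Python as follows. The sketch assumes a trapezoidal envelope, as suggested by the axis labels of the figure (Min, Preferred min, Preferred max, Max, PMax): the probability is 0 outside [Min, Max], PMax between the preferred limits, and rises or falls linearly in between. The function names and the sample envelope values are hypothetical.

```python
def envelope_probability(value, vmin, pref_min, pref_max, vmax, p_max=1.0):
    """Relative probability of one predictor under a trapezoidal envelope."""
    if value < vmin or value > vmax:
        return 0.0
    if value < pref_min:
        # Rising edge between Min and Preferred min.
        return p_max * (value - vmin) / (pref_min - vmin)
    if value > pref_max:
        # Falling edge between Preferred max and Max.
        return p_max * (vmax - value) / (vmax - pref_max)
    return p_max


def cell_probability(cell, envelopes):
    """Multiply the per-predictor probabilities to obtain P_c for one cell."""
    probability = 1.0
    for predictor, limits in envelopes.items():
        probability *= envelope_probability(cell[predictor], *limits)
    return probability


# Hypothetical envelope: (Min, Preferred min, Preferred max, Max) per predictor.
envelopes = {"sst": (0.0, 5.0, 15.0, 20.0), "salinity": (30.0, 33.0, 36.0, 38.0)}

print(cell_probability({"sst": 2.5, "salinity": 34.0}, envelopes))   # 0.5
print(cell_probability({"sst": 25.0, "salinity": 34.0}, envelopes))  # 0.0
```

Because the predictors are multiplied, a single unsuitable parameter is enough to bring the probability of the whole cell to zero.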
Probability of Occurrence
AquaMaps – Environmental Envelopes
Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
The probability is calculated for each 0.5° cell in the oceans.
A color is associated with the probability values.
AquaMaps – Probability
$P_c = P_{\text{bathymetry},c} \times P_{\text{SST},c} \times P_{\text{salinity},c} \times P_{\text{chl a},c} \times P_{\text{IceDist},c} \times P_{\text{LandDist},c}$
Online experiment:
AquaMaps
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
What if Expert Knowledge was missing?
Artificial Neural Network
Presence/Absence
Points examples
Probability
(1/ 0)
• Learns from positive (presence) and negative (absence) examples (training mode);
• Adapts the network weights to produce the correct outputs on the examples;
• Produces probability values for new input (test mode).
Artificial Neural Networks Maps
Examples and Exercises:
AquaMaps - Neural Networks
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Climate change analysis
• HCAF Scenarios can be simulated by
means of interpolation.
• Interpolation produces half-degree
values between a start and an end date
• Once new HCAFs are available we can
produce an HSPEC for each HCAF
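A minimal Python sketch of this interpolation step. It assumes linear interpolation between the two dates, which the slide does not specify; the HCAF layout (a dictionary from cell code to feature values), the years and the temperatures are hypothetical.

```python
def interpolate_hcaf(start_cells, end_cells, start_year, end_year, year):
    """Linearly interpolate every feature of every cell for an intermediate year."""
    if not start_year <= year <= end_year or start_year == end_year:
        raise ValueError("year must lie within a non-empty [start_year, end_year]")
    fraction = (year - start_year) / (end_year - start_year)
    return {
        code: {
            feature: value + fraction * (end_cells[code][feature] - value)
            for feature, value in features.items()
        }
        for code, features in start_cells.items()
    }


# Hypothetical HCAF excerpts: one half-degree cell, one feature.
hcaf_start = {"3414:227:3": {"sst": 14.0}}
hcaf_end = {"3414:227:3": {"sst": 16.0}}

hcaf_mid = interpolate_hcaf(hcaf_start, hcaf_end, 2010, 2050, 2030)
print(hcaf_mid)  # {'3414:227:3': {'sst': 15.0}}
```

Each interpolated HCAF can then be fed to the niche model to produce the HSPEC for that year.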
Simulation of HCAF Scenarios
Climate Changes Effects on Species
Estimated impact of climate changes over 20 years on
11549 species.
Bioclimate HSpec
Overall occupancy in time
Online experiment:
BioClimate Analysis
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Grouping the occurrence points and the
environmental features
of different species
• Group points by spatial distance or density
• Detect outliers
Occurrence Points Clustering
DBSCAN acts on the density of the points
Parameters:
• Epsilon = 10
• Min Points = 2
Outliers
Density Clustering
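To show how density clustering separates clusters from outliers, here is a compact pure-Python sketch of the DBSCAN algorithm. It is not the implementation hosted on the platform; it assumes Euclidean distance in coordinate units, and the sample points are hypothetical. The parameter values are the ones on the slide (Epsilon = 10, Min Points = 2).

```python
from math import dist  # Python 3.8+

NOISE = -1  # label given to outliers


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; outliers are labelled NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical occurrence points: two dense groups and one isolated record.
occurrences = [(0, 0), (1, 0), (0, 1), (50, 50), (51, 50), (200, 200)]
print(dbscan(occurrences, eps=10, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

The isolated record has no neighbour within Epsilon, so it is reported as an outlier instead of being forced into a cluster.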
XMeans: K = [20,30]; Min Points = 2; MaxIter = 1000. No Outliers Detected!
KMeans: K = 24; Min Points = 2; MaxIter = 1000; MaxOptSteps = 1000. No Outliers Detected!
Distance Clustering
Online experiment:
Clustering
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Discovering similarities
among habitats
Similarity between habitats
Habitat Representativeness Score:
• Measures the degree to which sampled habitats are representative for a certain
area of study;
• Has been used for assessing the minimum number of surveys on a study area that
are needed to cover a good heterogeneity of species habitat variables.
Can be used to:
• Measure the similarity between the environmental features of two areas;
• Assess the quality of models and environmental features.
[Figure: Habitat Representativeness Score, HRS = 10.6]
[Figure: presence and absence points (A+P): HRS 10.58; presence points only (P): HRS 10.61]
Habitat Representativeness Score
[Map legend: Absence, Presence]
The HRS is too high, so all the maps can be unreliable and need expert validation
HRS is in [0;2] for each feature
The overall HRS is the sum of the HRSs of the environmental features
Habitat Representativeness Score for each Feature
Feature                        Presence, Absence   Presence only
Overall HRS                    10.58               10.61
mean depth in t.c.             1.90                1.92
max depth in t.c.              0.87                0.86
min depth in t.c.              0.04                0.04
mean annual s surface temp     1.19                1.13
mean annual s bottom temp      1.59                1.56
mean salinity in t.c.          1.23                1.29
mean bottom salinity in t.c.   0.44                0.34
mean primary production        0.61                0.64
annual ice concentration       0.71                0.78
distance from land             0.46                0.49
ocean area in t.c.             1.54                1.55
Presence, Absence: the most representative feature is the minimum depth in a cell of 0.5 degrees.
Presence only: even in this case the most representative feature is the minimum depth in a cell of 0.5 degrees.
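The arithmetic behind these figures can be checked with a short Python sketch. The per-feature scores are the Presence, Absence values reported on the slide; the variable names are hypothetical. The overall HRS is the sum of the per-feature scores, and the most representative feature is the one with the lowest score.

```python
# Per-feature HRS values for the Presence, Absence case, as reported on the slide.
hrs_presence_absence = {
    "mean depth in t.c.": 1.90,
    "max depth in t.c.": 0.87,
    "min depth in t.c.": 0.04,
    "mean annual s surface temp": 1.19,
    "mean annual s bottom temp": 1.59,
    "mean salinity in t.c.": 1.23,
    "mean bottom salinity in t.c.": 0.44,
    "mean primary production": 0.61,
    "annual ice concentration": 0.71,
    "distance from land": 0.46,
    "ocean area in t.c.": 1.54,
}

# The overall score is the sum over the environmental features.
overall_hrs = sum(hrs_presence_absence.values())
# The lowest per-feature score marks the most representative feature.
most_representative = min(hrs_presence_absence, key=hrs_presence_absence.get)

print(round(overall_hrs, 2))  # 10.58
print(most_representative)    # min depth in t.c.
```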
Online experiment:
Habitat Representativeness Score
https://i-marine.d4science.org/group/biodiversitylab/processing-tools
Retrieving taxonomic
information for a set of species
BiOnym
A workflow approach to taxon name matching.
Accounts for:
• Variations in the spelling and interpretation of taxonomic names
• Combination of data from different sources
• Harmonization and reconciliation of taxa names
Workflow: Raw Input String (e.g. Gadus morua Lineus 1758) -> Preprocessing and Parsing -> Taxon Matcher 1, Taxon Matcher 2, … Taxon Matcher n -> PostProcessing -> Correct Transcriptions (e.g. Gadus morhua (Linnaeus, 1758))
Reference Sources: ASFIS, FISHBASE, WoRMS, Other in DwC-A
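The idea of a chain of matchers can be illustrated with a small Python sketch. This is not BiOnym's implementation: the parser is deliberately naive, the two matchers (exact and fuzzy, based on the standard `difflib` module) are stand-ins chosen for illustration, and the reference list is a hypothetical excerpt.

```python
from difflib import get_close_matches

# Hypothetical excerpt of a reference source.
REFERENCE_NAMES = [
    "Gadus morhua",
    "Gadus macrocephalus",
    "Latimeria chalumnae",
    "Architeuthis dux",
]


def parse_name(raw):
    """Naive preprocessing: keep the first two tokens as genus and species."""
    return " ".join(raw.split()[:2])


def exact_matcher(name, reference):
    """Case-insensitive exact match."""
    return [ref for ref in reference if ref.lower() == name.lower()]


def fuzzy_matcher(name, reference, cutoff=0.8):
    """Approximate match that tolerates small misspellings."""
    return get_close_matches(name, reference, n=3, cutoff=cutoff)


def match_taxon(raw, reference, matchers=(exact_matcher, fuzzy_matcher)):
    """Run the matchers in order and return the first non-empty candidate list."""
    name = parse_name(raw)
    for matcher in matchers:
        candidates = matcher(name, reference)
        if candidates:
            return candidates
    return []


print(match_taxon("Gadus morua Lineus 1758", REFERENCE_NAMES))  # ['Gadus morhua']
```

The misspelled input is not found by the exact matcher, so the request falls through to the fuzzy matcher, which recovers the correct species name; authorship and year are ignored in this simplified version.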
GSAy
GSAY
GSrAy
GSrAY
GSA
Complete match
Step Rate
GSAy 950
GSAY 940
GSrAy 930
GSrAY 920
GSA 910
GSrA 900
GSY 890
GSrY 880
SAy 870
SAY 860
SrAy 850
SrAY 840
GAy 830
GAY 820
…
Parentheses issue
Gender agreement issues
Gender agreement and parentheses issues
Year issues
GSA
Year issues
Matcher Example - GSAy
GSY
GS
SrAy
Rest
Author issues, misspelling or wrong
Step Rate
GSAy 950
GSAY 940
GSrAy 930
GSrAY 920
GSA 910
GSrA 900
GSY 890
GSrY 880
SAy 870
SAY 860
SrAy 850
SrAY 840
GAy 830
GAY 820
…
Homonyms
Other combinations
Taxamatch
GAY
Visual check
Matcher Example - GSAy
BiOnym - Output
Online experiment:
BiOnym
https://i-marine.d4science.org/group/biodiversitylab/processing-tools

More Related Content

Viewers also liked

графики функций и их применение
графики функций и их применениеграфики функций и их применение
графики функций и их применение
artem2905
 
quản lý nợ nước ngoài
quản lý nợ nước ngoài quản lý nợ nước ngoài
quản lý nợ nước ngoài
Trang Toét
 
Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59
Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59
Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59
Trang Toét
 

Viewers also liked (13)

Hao dinh solution that matters ux malaysia v2
Hao dinh solution that matters ux malaysia v2Hao dinh solution that matters ux malaysia v2
Hao dinh solution that matters ux malaysia v2
 
Bram pitoyo cluttered-v2
Bram pitoyo  cluttered-v2Bram pitoyo  cluttered-v2
Bram pitoyo cluttered-v2
 
The slippery road of change
The slippery road of changeThe slippery road of change
The slippery road of change
 
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 1
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 1USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 1
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 1
 
овз
овзовз
овз
 
Jonathan hirsch real-world ux
Jonathan hirsch   real-world uxJonathan hirsch   real-world ux
Jonathan hirsch real-world ux
 
School board meeting presentation
School board meeting presentationSchool board meeting presentation
School board meeting presentation
 
Mike Lai creating organizational change v2 uxmy
Mike Lai   creating organizational change v2 uxmyMike Lai   creating organizational change v2 uxmy
Mike Lai creating organizational change v2 uxmy
 
графики функций и их применение
графики функций и их применениеграфики функций и их применение
графики функций и их применение
 
quản lý nợ nước ngoài
quản lý nợ nước ngoài quản lý nợ nước ngoài
quản lý nợ nước ngoài
 
GBORREROS CV
GBORREROS CVGBORREROS CV
GBORREROS CV
 
Yu-Hsiu Li Design as one UXMY
Yu-Hsiu Li Design as one UXMYYu-Hsiu Li Design as one UXMY
Yu-Hsiu Li Design as one UXMY
 
Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59
Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59
Tóm tắt các điều kiện niêm yết trên hnx và hose trong nđ59
 

Similar to USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

Mercator Ocean newsletter 48
Mercator Ocean newsletter 48Mercator Ocean newsletter 48
Mercator Ocean newsletter 48
Mercator Ocean International
 

Similar to USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4 (20)

Spatial planning: towards a new approach in fisheries management
Spatial planning: towards a new approach in fisheries managementSpatial planning: towards a new approach in fisheries management
Spatial planning: towards a new approach in fisheries management
 
Building on iMarine for fostering Innovation, Decision making, Governance and...
Building on iMarine for fostering Innovation, Decision making, Governance and...Building on iMarine for fostering Innovation, Decision making, Governance and...
Building on iMarine for fostering Innovation, Decision making, Governance and...
 
The BlueBRIDGE Project - Pasquale Pagano
The BlueBRIDGE Project - Pasquale PaganoThe BlueBRIDGE Project - Pasquale Pagano
The BlueBRIDGE Project - Pasquale Pagano
 
The role of Earth Observations in DOPA, a Digital Observatory for Protected A...
The role of Earth Observations in DOPA, a Digital Observatory for Protected A...The role of Earth Observations in DOPA, a Digital Observatory for Protected A...
The role of Earth Observations in DOPA, a Digital Observatory for Protected A...
 
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems MedicineNicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
 
Identifying and Linking Physical Samples with Data: Using IGSN
Identifying and Linking Physical Samples with Data: Using IGSNIdentifying and Linking Physical Samples with Data: Using IGSN
Identifying and Linking Physical Samples with Data: Using IGSN
 
2021_10_15 «Enseñando conciencia medioambiental en espacios de aprendizaje co...
2021_10_15 «Enseñando conciencia medioambiental en espacios de aprendizaje co...2021_10_15 «Enseñando conciencia medioambiental en espacios de aprendizaje co...
2021_10_15 «Enseñando conciencia medioambiental en espacios de aprendizaje co...
 
Strategic Report for 1st Quarter 2023.docx
Strategic Report for 1st Quarter 2023.docxStrategic Report for 1st Quarter 2023.docx
Strategic Report for 1st Quarter 2023.docx
 
C7.05: Ocean Observations Research Coordination Network - Hans-Peter Plag
C7.05: Ocean Observations Research Coordination Network - Hans-Peter PlagC7.05: Ocean Observations Research Coordination Network - Hans-Peter Plag
C7.05: Ocean Observations Research Coordination Network - Hans-Peter Plag
 
Neale biosketch 2014-revised
Neale biosketch 2014-revisedNeale biosketch 2014-revised
Neale biosketch 2014-revised
 
Fostering global data management with public tuna fisheries data
Fostering global data management with public tuna fisheries dataFostering global data management with public tuna fisheries data
Fostering global data management with public tuna fisheries data
 
land health surveillance highlights
land health surveillance highlightsland health surveillance highlights
land health surveillance highlights
 
GBIF and Biodiversity informatics for museums, 15 March 2021
GBIF and Biodiversity informatics for museums, 15 March 2021GBIF and Biodiversity informatics for museums, 15 March 2021
GBIF and Biodiversity informatics for museums, 15 March 2021
 
Seeding Science Knowledge by engaging local experts
Seeding Science Knowledge by engaging local expertsSeeding Science Knowledge by engaging local experts
Seeding Science Knowledge by engaging local experts
 
Future direction of geoinfomatics
Future direction of geoinfomaticsFuture direction of geoinfomatics
Future direction of geoinfomatics
 
C4.01: Overview of the Coastal Zone Community of Practice & Services for the ...
C4.01: Overview of the Coastal Zone Community of Practice & Services for the ...C4.01: Overview of the Coastal Zone Community of Practice & Services for the ...
C4.01: Overview of the Coastal Zone Community of Practice & Services for the ...
 
morningkeynote.pdf
morningkeynote.pdfmorningkeynote.pdf
morningkeynote.pdf
 
Presentation Template
Presentation TemplatePresentation Template
Presentation Template
 
ILIAD and CoCoast @ Noordzeedagen 2021
ILIAD and CoCoast @ Noordzeedagen 2021ILIAD and CoCoast @ Noordzeedagen 2021
ILIAD and CoCoast @ Noordzeedagen 2021
 
Mercator Ocean newsletter 48
Mercator Ocean newsletter 48Mercator Ocean newsletter 48
Mercator Ocean newsletter 48
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...

USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 4

  • 1. Using e-Infrastructures for Biodiversity Conservation Gianpaolo Coro ISTI-CNR, Pisa, Italy
  • 2. Module 4 - Outline 1. Data processing requirements by communities of practice 2. The D4Science Statistical Manager 3. Ecological modelling
  • 3. D4Science D4Science is both a Data and a Computational e-Infrastructure • Used by several Projects: i-Marine, EUBrazil OpenBio, ENVRI; • Implements the notion of e-Infrastructure as-a-Service: it offers on demand access to data management services and computational facilities; • Hosts several VREs for Fisheries Managers, Biologists, Statisticians…and Students.
  • 4. D4Science - Resources: a large set of connected biodiversity and taxonomic datasets; a network to distribute and access geospatial data; a distributed storage system to store datasets and documents; a social network to share opinions and useful news; algorithms for biology-related experiments.
  • 6. 1. Data processing requirements by communities of practice 2. The D4Science Statistical Manager 3. Ecological modelling
  • 7. Issues. Some interests of communities of practice in Computational Statistics: 1. Repetition and validation of experiments 2. Exploitation of algorithms in several contexts 3. Hiding the complexity of the calculations 4. Facilitating the management and publication of the algorithms
  • 8. Issues. …practically speaking, they search for: 1. Modular and pluggable solutions 2. Access by means of standard protocols 3. Hiding the complexity of parallel processing 4. Hiding the complexity of software management and provisioning 5. Active contribution with new algorithms and use cases
  • 9. 1. Data processing requirements by communities of practice 2. The D4Science Statistical Manager 3. Ecological modelling
  • 10. Statistical Manager – Users’ View. The Statistical Manager is a set of web services that aim to: • Help scientists in computational statistics experiments • Supply precooked state-of-the-art algorithms as-a-Service • Perform calculations with Map-Reduce in a way that is seamless for the users • Share input, results, parameters and comments with colleagues by means of Virtual Research Environments in the D4Science e-Infrastructure. Diagram labels: Statistical Manager; D4Science Computational Facilities; Sharing; Setup and execution.
  • 11. Open Platform Approach. People can contribute with: • R scripts • Java programs • Linux programs • OGC-WPS services. Diagram labels: External Computing Facility; OGC WPS Interface.
  • 12. Statistical Manager – Developers’ View. The Statistical Manager makes it possible to: • Develop distributed computations easily (Statistical Manager Framework) • Parallelize R scripts, possibly without changing the code • Automatically produce a User Interface to perform experiments • Reuse models and best practices developed by the community • Connect external computational facilities via the OGC WPS standard
  • 15. The Context: Resources and Sharing
  • 18. Computations Check Summary of the Input, Output and Parameters of the experiment
  • 19. Data Space - Sharing and Import
  • 21. Numbers. Users: FishBase (US, CA, TW); Geomar; Naturhistoriska riksmuseet; Agrocampus; Anonymous Individuals; INRA; King Abdullah University of Science and Technology; ISTI. 2013 vs 2014: Avg users per month: 200 vs 20100; Number of algorithms: 50 vs 100; Number of contributing organizations providing algorithms: 2 (CNR, Geomar) vs 7 (CNR, Geomar, FIN, FAO, T2, IRD, Agrocampus); Publications: 8 vs 13; Sum Impact Factor: 2.66 vs 12.17.
  • 22. Publications around the Statistical Manager
  • 23. 1. Data processing requirements by communities of practice 2. The D4Science Statistical Manager 3. Ecological modelling
  • 24. Niche Modelling Scope: • characterize the environmental conditions that are suitable for the species to subsist; • identify where suitable environment is distributed in geographical space; • estimate the actual and potential geographic distributions of a species. Actual distribution: areas that are truly occupied by the species Fundamental niche: the full range of abiotic conditions within which the species is viable Potential distribution: areas with abiotic conditions that fall within the fundamental niche
  • 25. Niche Modelling and Absence and Presence Points Approaches: Mechanistic models: incorporate physiological limits in a species tolerance to environmental conditions; Correlative models: automatically estimate the environmental conditions that are suitable for a species by relying on examples. Presence points: occurrence records, i.e. places where the species has been observed in its habitat Absence points: locations where the environment is considered unsuitable for the species. In many cases, absence points must be simulated (pseudo-absence points), because reliable data are rare.
  • 26. Examples: Potential Distributions of the Coelacanth. Presence-only: MaxEnt; Presence-only: GARP; Expert (semi-mechanistic): AquaMaps; Presence/Absence: Artificial Neural Networks. Comparison between several approaches to estimating the potential distribution of the Coelacanth. The best approach depends on the quality of the data, so data cleaning operations are very important!
  • 27. C-square codes. C-squares (Concise Spatial Query and Representation System): • A system of geocodes that provides a basis for simple spatial indexing of geographic features • Devised by Tony Rees of CSIRO Marine and Atmospheric Research • A compact encoding of latitude, longitude and resolution. Example: C-square code: 3414:227:3; Resolution: 0.5°; N, S, W, E limits: -42.5, -43.0, 147.0, 147.5. A useful converter: http://www.marine.csiro.au/marq/csq_builder.init
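The encoding on the C-squares slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual C-squares layout (a 10° square, then a 1° square, then a 0.5° quadrant digit); the function name `csquare_half_degree` is hypothetical and the handling of points exactly on the poles or the antimeridian is a simplification.

```python
def csquare_half_degree(lat, lon):
    """Return the 0.5-degree C-square code for a latitude/longitude pair."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")

    # Global quadrant digit: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5

    # Work with absolute values; nudge the extreme edges inside the grid
    # (simplifying assumption for lat = +/-90 and lon = +/-180).
    alat = min(abs(lat), 89.999999)
    alon = min(abs(lon), 179.999999)

    lat_tens, lon_tens = int(alat // 10), int(alon // 10)
    lat_units, lon_units = int(alat % 10), int(alon % 10)
    lat_frac, lon_frac = alat - int(alat), alon - int(alon)

    def intermediate(lat_high, lon_high):
        # 1 = low lat/low lon, 2 = low lat/high lon,
        # 3 = high lat/low lon, 4 = high lat/high lon.
        return 2 * int(lat_high) + int(lon_high) + 1

    five_deg = intermediate(lat_units >= 5, lon_units >= 5)
    half_deg = intermediate(lat_frac >= 0.5, lon_frac >= 0.5)

    return (f"{quadrant}{lat_tens}{lon_tens:02d}"
            f":{five_deg}{lat_units}{lon_units}:{half_deg}")


# A point inside the cell given on the slide (42.5-43.0 S, 147.0-147.5 E).
print(csquare_half_degree(-42.75, 147.25))
```

Any point inside the cell bounded by -42.5, -43.0, 147.0 and 147.5 should map back to the code shown on the slide.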
  • 28. HCAF (Half-degree Cells Authority File). Contains information on: a) cell codes; b) statistical cell properties (center, limits, and area); c) membership in relevant areas (FAO areas, EEZs or LMEs); d) physical attributes (depth, salinity or temperature); e) biological properties (e.g. primary production). Data gathered from: Sea Around Us Project, CSIRO, Kansas Geological Survey. Compiled by: Kristin Kaschner & Jonathan Ready.
  • 29. HSPEN (Half-degree Species Environmental Envelope). Contains information used for describing the environmental tolerance and preference of a species: • distribution using FAO areas and bounding box • range of values per environmental parameter (min., preferred min., preferred max., max.)
  • 30. Online experiment: the i-Marine Filtering Facilities https://i-marine.d4science.org/group/biodiversitylab/processing-tools
  • 31. A Niche model relying on expert knowledge
  • 32. HSPEC (Half-degree Species Assignment). Contains the assignment of a species to a half-degree cell and the corresponding probability of occurrence of the species in that cell. The assignment probability is the product of the probabilities computed for each environmental parameter (SST, salinity, primary production, sea ice concentration, distance to land).
  • 33. AquaMaps (example map: Gadus morhua). A presence-only species model that relies on expert knowledge about the species habitat • AquaMaps Suitable: estimates the Potential Distribution • AquaMaps Native: estimates the Actual Distribution • Maps have 0.5 degrees resolution; • Expert knowledge is used in modelling the habitat parameters; • AquaMaps adopts mechanistic assumptions combined with an automatic estimation of parameter values.
  • 34. AquaMaps – Good Cells. • “good cells”: cells within the bounding box or known FAO areas • a minimum of 10 “good cells” is needed for extracting parameters. Bounding box or FAO area limits serve as independent verification of the validity of occurrence records. Taken from: http://www.aquamaps.org/main/presentations/Part%20II%20-%20AquaMaps%20behind%20the%20scene.pdf
  • 35. Global grid of 259,200 half degree cells Good cells are used to derive the range of environmental parameters within the species’ native range. AquaMaps – Extracting Environmental Parameters Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
  • 36. • Depth ranges: typically from literature; depth estimate based on habitat description • Min = 25th percentile - 1.5 * interquartile or absolute minimum in extracted data (whichever is greater) • Max = 75th percentile + 1.5 * interquartile or absolute maximum in extracted data (whichever is greater) • PrefMin = 10th percentile of observed variation in an environmental parameter • PrefMax = 90th percentile of observed variation in an environmental parameter • Surface values for species with min depth ≤ 200m • Bottom values for species with min depth > 200m The environmental envelopes describe tolerances of a species with respect to each environmental parameter. AquaMaps – Environmental Envelopes Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
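The percentile rules on the Environmental Envelopes slide can be illustrated with a short sketch. It follows the slide's wording literally, including "whichever is greater" for both Min and Max, so the bounds should be checked against the AquaMaps documentation before any real use. The function names and the linear-interpolation percentile are assumptions made for this example.

```python
def percentile(values, p):
    """Percentile with linear interpolation between the closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("no values")
    rank = (len(data) - 1) * p / 100.0
    low = int(rank)
    high = min(low + 1, len(data) - 1)
    return data[low] + (data[high] - data[low]) * (rank - low)


def envelope(values):
    """Min, preferred min, preferred max and max for one environmental parameter."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    return {
        # As worded on the slide: the greater of the lower fence and the absolute minimum.
        "min": max(q1 - 1.5 * iqr, min(values)),
        "pref_min": percentile(values, 10),
        "pref_max": percentile(values, 90),
        # As worded on the slide: the greater of the upper fence and the absolute maximum.
        "max": max(q3 + 1.5 * iqr, max(values)),
    }


# Hypothetical values of one parameter extracted from eleven "good cells".
print(envelope([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]))
```

The input list stands in for the values of one parameter (for example sea surface temperature) read from the good cells of a species.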
  • 37. AquaMaps – Probability of Occurrence. Probabilities of species occurrence are generated by matching the species environmental envelope against local environmental conditions to determine the relative suitability of a given area: $P_c = P_{\text{bathymetry},c} \times P_{\text{SST},c} \times P_{\text{salinity},c} \times P_{\text{chl a},c} \times P_{\text{IceDist},c} \times P_{\text{LandDist},c}$. [Figure: relative probability of occurrence (PMax) against a predictor, with Min, Preferred min, Preferred max and Max marked.] Taken from: http://www.aquamaps.org/main/presentations/AquaMaps_General0908.pdf
  • 38. AquaMaps – Probability. The probability is calculated for each 0.5° cell in the oceans, and a color is associated with the probability values. $P_c = P_{\text{bathymetry},c} \times P_{\text{SST},c} \times P_{\text{salinity},c} \times P_{\text{chl a},c} \times P_{\text{IceDist},c} \times P_{\text{LandDist},c}$
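A minimal sketch of the per-cell product follows. The trapezoidal response (0 outside Min and Max, 1 between the preferred bounds, linear in between) is an assumption read off the figure labels, with PMax taken as 1. The function names, predictor names and envelope values are hypothetical.

```python
def envelope_probability(value, env):
    """Relative suitability of one predictor value for one envelope.

    env is (min, preferred min, preferred max, max); the trapezoid shape
    is an assumption made for this illustration.
    """
    lo, pref_lo, pref_hi, hi = env
    if value < lo or value > hi:
        return 0.0
    if value < pref_lo:
        return (value - lo) / (pref_lo - lo)
    if value > pref_hi:
        return (hi - value) / (hi - pref_hi)
    return 1.0


def cell_probability(cell, envelopes):
    """Multiply the per-predictor probabilities for one half-degree cell."""
    probability = 1.0
    for name, env in envelopes.items():
        probability *= envelope_probability(cell[name], env)
    return probability


# Hypothetical envelopes and cell values, for illustration only.
envelopes = {"sst": (0.0, 5.0, 15.0, 20.0), "salinity": (30.0, 33.0, 36.0, 38.0)}
print(cell_probability({"sst": 2.5, "salinity": 34.0}, envelopes))
print(cell_probability({"sst": 25.0, "salinity": 34.0}, envelopes))
```

Because the terms are multiplied, a single predictor outside its envelope is enough to set the cell probability to zero.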
  • 40. What if Expert Knowledge was missing?
  • 41. Artificial Neural Network. Input: presence/absence point examples; output: probability (1/0). • Learns from positive (presence) and negative (absence) examples (training mode); • Adapts the network weights to produce the correct outputs on the examples; • Produces probability values for new input (test mode).
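The training and test modes described above can be sketched with the smallest possible network, a single logistic neuron. This is not the network used in the D4Science experiments, which the slides do not describe in detail. The features, examples, learning rate and number of epochs are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, rate=0.5):
    """Training mode: adapt weights so outputs match the presence (1) / absence (0) labels."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            output = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = label - output
            # Gradient step on the log-loss of a single logistic neuron.
            weights = [w + rate * error * x for w, x in zip(weights, features)]
            bias += rate * error
    return weights, bias


def predict(model, features):
    """Test mode: probability of presence for a new point."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical, normalised environmental features (e.g. temperature, salinity).
examples = [
    ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1),   # presence points
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.2), 0),   # absence points
]
model = train(examples)
print(predict(model, (0.85, 0.85)), predict(model, (0.15, 0.15)))
```

A real experiment would use more layers, many more points, and absence or pseudo-absence locations chosen with care.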
  • 43. Examples and Exercises: AquaMaps - Neural Networks https://i-marine.d4science.org/group/biodiversitylab/processing-tools
  • 45. • HCAF Scenarios can be simulated by means of interpolation. • Interpolation produces half-degree values between a start and an end date • Once new HCAFs are available we can produce an HSPEC for each HCAF Simulation of HCAF Scenarios
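The interpolation step can be sketched as follows, assuming a simple linear interpolation between the start and end HCAF values. The slides do not state which interpolation is used, so this is only one possible choice. The function name, the years and the cell values are hypothetical.

```python
def interpolate_hcaf(start, end, start_year, end_year, year):
    """Linearly interpolate every feature of every cell for an intermediate year."""
    if not start_year <= year <= end_year:
        raise ValueError("year must lie between the start and end years")
    t = (year - start_year) / (end_year - start_year)
    return {
        cell: {
            name: value + t * (end[cell][name] - value)
            for name, value in features.items()
        }
        for cell, features in start.items()
    }


# Hypothetical values for a single half-degree cell.
hcaf_start = {"3414:227:3": {"sst": 12.0, "salinity": 35.0}}
hcaf_end = {"3414:227:3": {"sst": 14.0, "salinity": 35.5}}
print(interpolate_hcaf(hcaf_start, hcaf_end, 2010, 2050, 2030))
```

Each interpolated HCAF can then be fed to the niche model to produce one HSPEC per scenario.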
  • 46. Climate Changes Effects on Species Estimated impact of climate changes over 20 years on 11549 species. Bioclimate HSpec Overall occupancy in time
  • 48. Grouping the occurrence points and the environmental features of different species
  • 49. • Group points by spatial distance or density • Detect outliers Occurrence Points Clustering
  • 50. Density Clustering. DBScan acts on the density of the points. Parameters: • Epsilon = 10 • Min Points = 2. Figure label: Outliers.
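A compact, pure-Python DBSCAN shows how density clustering flags outliers with the two parameters on the slide (Epsilon = 10, Min Points = 2). It is a sketch for small point sets, not the implementation used on the platform, and the coordinates are invented. The neighbour count includes the point itself, as in the original formulation of the algorithm.

```python
import math

NOISE = -1


def dbscan(points, eps, min_points):
    """Return one cluster label per point; outliers are labelled NOISE (-1)."""
    labels = [None] * len(points)          # None means "not visited yet"

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:        # not dense enough: provisional outlier
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:         # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_points:   # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical occurrence points: two dense groups and one isolated record.
occurrences = [(0, 0), (1, 1), (2, 0), (50, 50), (51, 51), (200, 200)]
print(dbscan(occurrences, eps=10, min_points=2))
```

The isolated record gets the noise label, which is how the outliers on the slide would be reported.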
  • 51. Distance Clustering. XMeans: K = [20,30], Min Points = 2, MaxIter = 1000. No outliers detected! KMeans: K = 24, Min Points = 2, MaxIter = 1000, MaxOptSteps = 1000. No outliers detected!
  • 54. Habitat Representativeness Score. Similarity between habitats. The Habitat Representativeness Score (HRS): • Measures the degree to which sampled habitats are representative of a certain area of study; • Has been used for assessing the minimum number of surveys on a study area that are needed to cover a good heterogeneity of species habitat variables. Can be used to: • Measure the similarity between the environmental features of two areas; • Assess the quality of models and environmental features. Example value: HRS = 10.6
  • 55. Habitat Representativeness Score. Absence and Presence (A+P): HRS 10.58. Presence only (P): HRS 10.61. The HRS is too high, so all the maps can be unreliable and need expert validation. The HRS is in [0;2] for each feature. The overall HRS is the sum of the HRSs of the environmental features.
  • 56. Habitat Representativeness Score for each Feature. Presence and absence (HRS 10.58): mean depth in t.c. 1.90; max depth in t.c. 0.87; min depth in t.c. 0.04; mean annual sea surface temp 1.19; mean annual sea bottom temp 1.59; mean salinity in t.c. 1.23; mean bottom salinity in t.c. 0.44; mean primary production 0.61; annual ice concentration 0.71; distance from land 0.46; ocean area in t.c. 1.54. The most representative feature is the minimum depth in a cell of 0.5 degrees. Presence only (HRS 10.61): mean depth in t.c. 1.92; max depth in t.c. 0.86; min depth in t.c. 0.04; mean annual sea surface temp 1.13; mean annual sea bottom temp 1.56; mean salinity in t.c. 1.29; mean bottom salinity in t.c. 0.34; mean primary production 0.64; annual ice concentration 0.78; distance from land 0.49; ocean area in t.c. 1.55. Even in this case the most representative feature is the minimum depth in a cell of 0.5 degrees.
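The aggregation of the per-feature scores can be checked with a few lines of Python. The sketch uses the presence and absence values from the slide, sums them into the overall HRS and picks the feature with the lowest score. Reading "most representative" as "lowest score" is inferred from the slide, which names the minimum depth (0.04). The function names are hypothetical.

```python
# Per-feature scores for the presence and absence case, copied from the slide.
presence_absence = {
    "mean depth in t.c.": 1.90,
    "max depth in t.c.": 0.87,
    "min depth in t.c.": 0.04,
    "mean annual sea surface temp": 1.19,
    "mean annual sea bottom temp": 1.59,
    "mean salinity in t.c.": 1.23,
    "mean bottom salinity in t.c.": 0.44,
    "mean primary production": 0.61,
    "annual ice concentration": 0.71,
    "distance from land": 0.46,
    "ocean area in t.c.": 1.54,
}


def overall_hrs(scores):
    """Overall HRS: the sum of the per-feature scores (each in [0, 2])."""
    return sum(scores.values())


def most_representative(scores):
    """Feature with the lowest score, assumed here to be the most representative one."""
    return min(scores, key=scores.get)


print(round(overall_hrs(presence_absence), 2))
print(most_representative(presence_absence))
```

With eleven features the overall score can range from 0 to 22, which puts the reported values of about 10.6 near the middle of the scale.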
  • 57. Online experiment: Habitat Representativeness Score https://i-marine.d4science.org/group/biodiversitylab/processing-tools
  • 59. BiOnym. A workflow approach to taxon name matching. Accounts for: • Variations in the spelling and interpretation of taxonomic names • Combination of data from different sources • Harmonization and reconciliation of taxa names. Workflow: Raw Input String (e.g. Gadus morua Lineus 1758) → Preprocessing and Parsing → Taxon Matcher 1, Taxon Matcher 2, …, Taxon Matcher n → PostProcessing → Correct Transcriptions (e.g. Gadus morhua (Linnaeus, 1758)). Reference sources: ASFIS, FISHBASE, WoRMS, other sources in DwC-A.
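The idea of running a raw name through a chain of matchers can be sketched with Python's standard `difflib` module. This is not BiOnym's algorithm: it uses an exact matcher followed by a generic string-similarity matcher, and the two reference lists are toy stand-ins rather than the contents of any real source. The function name, the cutoff and the parsing rule (the first two tokens are genus and species) are assumptions.

```python
import difflib

# Toy reference lists standing in for real reference sources (hypothetical content).
REFERENCE_SOURCES = {
    "source A": ["Gadus morhua", "Gadus macrocephalus"],
    "source B": ["Latimeria chalumnae", "Architeuthis dux"],
}


def match_name(raw, sources, cutoff=0.8):
    """Return (matched name, source, score) or None if nothing is close enough."""
    # Parsing: keep genus and species, drop the author and year tokens.
    name = " ".join(raw.split()[:2])

    # Matcher 1: exact match against every reference source.
    for source, names in sources.items():
        if name in names:
            return name, source, 1.0

    # Matcher 2: fuzzy match, keeping the best-scoring candidate overall.
    best = None
    for source, names in sources.items():
        for candidate in difflib.get_close_matches(name, names, n=1, cutoff=cutoff):
            score = difflib.SequenceMatcher(None, name, candidate).ratio()
            if best is None or score > best[2]:
                best = (candidate, source, score)
    return best


print(match_name("Gadus morua Lineus 1758", REFERENCE_SOURCES))
```

A fuller workflow would also compare author and year, which is what the step codes on the following slides appear to encode.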
  • 60. Matcher Example - GSAy. Steps shown: GSAy, GSAY, GSrAy, GSrAY, GSA. Outcomes shown: Complete match; Parentheses issue; Gender agreement issues; Gender agreement and parentheses issues; Year issues. Step and rate: GSAy 950; GSAY 940; GSrAy 930; GSrAY 920; GSA 910; GSrA 900; GSY 890; GSrY 880; SAy 870; SAY 860; SrAy 850; SrAY 840; GAy 830; GAY 820; …
  • 61. Matcher Example - GSAy. Steps shown: GSY, GS, SrAy, Rest, GAY. Labels shown: Author issues, misspelling or wrong; Homonyms; Other combinations; Taxamatch; Visual check. Step and rate: GSY 950; GSAY 940; GSrAy 930; GSrAY 920; GSA 910; GSrA 900; GSY 890; GSrY 880; SAy 870; SAY 860; SrAy 850; SrAY 840; GAy 830; GAY 820; …