Providing Statistical Algorithms as-a-Service
Gianpaolo Coro, Pasquale Pagano, Leonardo Candela
ISTI-CNR, Pisa, Italy
In computational statistics, algorithms often have specialized implementations that address very specific problems. Occasionally, these algorithms are also applicable to problems other than the original ones. Today, interest is growing in modular and pluggable solutions that make it possible to repeat and validate the experiments of other scientists and to reuse those algorithms in other contexts. Furthermore, such procedures are expected to be remotely hosted and to "hide" the complexity of the calculations, which are managed behind the scenes by remote computational infrastructures. For these reasons, the usual solution of supplying modular software libraries that contain implementations of algorithms is giving way to Web Services that host such implementations and are accessible through standard protocols. The protocols describing the computational capabilities of these Services are becoming increasingly elaborate, so that modular workflows can rely on them.

Published in: Design, Technology, Education

  1. Providing Statistical Algorithms as-a-Service
     Gianpaolo Coro, Pasquale Pagano, Leonardo Candela
     ISTI-CNR, Pisa, Italy
  2. Statistical Manager
     Statistical Manager is a set of web services that aim to:
     • Help scientists manage marine, biological or climatic statistical problems
     • Supply ready-to-use state-of-the-art algorithms as-a-Service
     • Perform calculations using Cloud computing, transparently to the users
     • Share inputs, results, parameters and comments with colleagues by means of Virtual Research Environments in the D4Science e-Infrastructure
     Diagram labels: Setup and execution; Statistical Manager; Sharing; D4Science Computational Facilities
  3. Architecture
  4. Internal Work
  5. Resources and Sharing
  6. Statistical Manager - Interface
  7. Experiment Execution
  8. Computations Check
     Summary of the Input, Output and Parameters of the experiment
  9. Data Space - Sharing and Import
  10. Hosted Algorithms
  11. Application Fields
      o Ecology
      o Environment
      o Biodiversity
      o Life
  12. Ecology
  13. Niche Modelling
      • AquaMaps – Suitable Habitat
      • AquaMaps – Native Habitat
      • AquaMaps for 2050
      • Artificial Neural Networks
      • AquaMaps - ANN
      Figure labels: Gadus morhua; AquaMaps - Suitable Habitat
  14. Outliers Detection
      Presence Points: Cetorhinus maximus
      Density-based Clustering and Outliers detection
      Distance Based Clustering
      Algorithms: DBScan, K-Means, X-Means
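The density-based outlier detection named on this slide can be illustrated with a minimal DBSCAN sketch in plain Python. This is not the Statistical Manager implementation: the function name, the parameters `eps` and `min_pts`, and the presence points are assumptions made up for illustration. Points that do not belong to any dense neighbourhood are labelled -1 and treated as outliers.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for an outlier."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be re-labelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Hypothetical presence points (longitude, latitude): five close records, one far away.
presence = [(10.0, 50.0), (10.1, 50.1), (9.9, 50.0), (10.0, 49.9), (10.1, 49.9), (40.0, -20.0)]
labels = dbscan(presence, eps=0.5, min_pts=3)
outliers = [p for p, label in zip(presence, labels) if label == -1]
print(outliers)
```

The distance here is Euclidean on raw coordinates, which is only a rough stand-in for geographic distance.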
  15. Climate Changes Effects on Species
      Bioclimate HSpec: overall occupancy in time
      Estimated impact of climate changes over 20 years on 11549 species.
      Example: Pseudanthias evansi. The occupancy of Pseudanthias evansi decreases in Area 71 but increases in Area 77.
  16. Similarity between habitats
      Habitat Representativeness Score:
      1. Measures the similarity between the environmental features of two areas
      2. Assesses the quality of models and environmental features
      Figure labels: Latimeria chalumnae, HRS=10.5; Habitat Representativeness Score
  17. Environment
  18. Rasterization
      A polygonal map is transformed into a raster map or into a point map
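A polygon-to-raster transformation of the kind described on this slide can be sketched by testing each grid cell centre against the polygon. The function names, the grid parameters and the example polygon are assumptions for illustration; the slides do not say how the hosted algorithm works internally.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, x_min, y_min, cell, n_cols, n_rows):
    """Return a raster (rows of 0/1) and the point map of covered cell centres."""
    raster, points = [], []
    for r in range(n_rows):
        y = y_min + (r + 0.5) * cell
        row = []
        for c in range(n_cols):
            x = x_min + (c + 0.5) * cell
            covered = point_in_polygon(x, y, polygon)
            row.append(1 if covered else 0)
            if covered:
                points.append((x, y))
        raster.append(row)
    return raster, points

# Hypothetical square polygon on a 4 x 4 grid with unit cells.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
raster, points = rasterize(square, x_min=0, y_min=0, cell=1, n_cols=4, n_rows=4)
for row in raster:
    print(row)
```

Sampling only the cell centre is the simplest coverage rule; an area-weighted rule would give smoother edges at a higher cost.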
  19. Maps Comparison
      Compares:
      • Species Distribution maps
      • Environmental layers
      • SAR Images
  20. Periodicity and Seasonality Extraction Tools
      Fourier Analysis
      Figure label: Periodicity: 12 months
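Periodicity extraction through Fourier analysis can be sketched as finding the frequency with the largest spectral power. The function below is an illustrative assumption, not the hosted tool: it uses a plain discrete Fourier transform, and the monthly temperature-like series is synthetic.

```python
import cmath
import math

def dominant_period(signal):
    """Return the period (in samples) of the strongest Fourier component, or None."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]  # remove the constant (zero-frequency) term
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None

# Synthetic series: 10 years of monthly values with a yearly cycle.
series = [15 + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]
print(dominant_period(series))  # period in months
```

The resolution of the estimate depends on the series length: only periods of the form n / k can be returned.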
  21. Environmental Signal Processing
      • Spectrogram
      • Resampling
  22. Biodiversity
  23. Occurrence Points
      Sources: Occurrence Data from GBIF; Occurrence Data from OBIS
      Operations:
      ∩ Intersection
      ∪ Union
      - Difference
      DD Duplicates Deletion
      Diagram labels: A, B; record fields x,y, Event Date, Modif Date, Author, Species Scientific Name; Similarity
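The four occurrence-point operations listed on this slide can be sketched as set operations over a record key. The key below (species name, rounded coordinates, event date) is an assumption chosen for illustration; the slide suggests the actual comparison relies on field similarity, which this exact-match sketch does not reproduce. The records are made up.

```python
def record_key(rec, decimals=2):
    """Identity of a record: normalized name, rounded coordinates, event date."""
    return (rec["species"].strip().lower(),
            round(rec["x"], decimals), round(rec["y"], decimals),
            rec["event_date"])

def delete_duplicates(records):
    """Keep the first record seen for each key."""
    seen = {}
    for rec in records:
        seen.setdefault(record_key(rec), rec)
    return list(seen.values())

def intersection(a, b):
    keys_b = {record_key(r) for r in b}
    return [r for r in delete_duplicates(a) if record_key(r) in keys_b]

def union(a, b):
    return delete_duplicates(a + b)

def difference(a, b):
    keys_b = {record_key(r) for r in b}
    return [r for r in delete_duplicates(a) if record_key(r) not in keys_b]

# Hypothetical occurrence records from two sources.
gbif = [
    {"species": "Gadus morhua", "x": 10.001, "y": 60.002, "event_date": "2010-05-01"},
    {"species": "Gadus morhua", "x": 10.004, "y": 60.001, "event_date": "2010-05-01"},
    {"species": "Gadus morhua", "x": 12.5, "y": 61.0, "event_date": "2011-06-01"},
]
obis = [
    {"species": "gadus morhua", "x": 10.0, "y": 60.0, "event_date": "2010-05-01"},
    {"species": "Gadus morhua", "x": 20.0, "y": 65.0, "event_date": "2012-07-15"},
]
print(len(delete_duplicates(gbif)), len(intersection(gbif, obis)),
      len(union(gbif, obis)), len(difference(gbif, obis)))
```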
  24. BiOnym
      A flexible workflow approach to taxon name matching
      Accounts for:
      • Variations in the spelling and interpretation of taxonomic names
      • Combination of data from different sources
      • Harmonization and reconciliation of Taxa names
      Workflow:
      1. Raw Input String, e.g. Gadus morua Lineus 1758
      2. Preprocessing and Parsing
      3. Taxon name Matcher 1, Taxon name Matcher 2, Taxon name Matcher n
      4. PostProcessing
      5. Correct Transcriptions, e.g. Gadus morhua (Linnaeus, 1758)
      Reference Sources: ASFIS, FISHBASE, WoRMS, Other in DwC-A
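The parse-then-match idea of this workflow can be sketched with a single fuzzy matcher from the Python standard library. This is not BiOnym's algorithm: the tiny reference list, the `parse` and `match` helpers and the 0.8 similarity cutoff are assumptions for illustration, and a real workflow would chain several matchers over full reference sources.

```python
import difflib

# Toy reference source: normalized binomial -> accepted transcription.
REFERENCE = {
    "gadus morhua": "Gadus morhua (Linnaeus, 1758)",
    "cetorhinus maximus": "Cetorhinus maximus (Gunnerus, 1765)",
    "latimeria chalumnae": "Latimeria chalumnae Smith, 1939",
}

def parse(raw):
    """Preprocessing: keep genus and species, drop authorship and year."""
    tokens = raw.split()
    return " ".join(tokens[:2]).lower()

def match(raw, cutoff=0.8):
    """Return the closest reference transcription, or None if nothing is close enough."""
    name = parse(raw)
    hits = difflib.get_close_matches(name, list(REFERENCE), n=1, cutoff=cutoff)
    return REFERENCE[hits[0]] if hits else None

print(match("Gadus morua Lineus 1758"))
print(match("Homo sapiens"))
```

Matching only the binomial keeps misspelled authorship such as "Lineus" from lowering the similarity score.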
  25. Trendylyzer
      • Fill some knowledge gaps on marine species
      • Account for sampling biases
      • Define trends for common species
      Can we recognize big changes in species presence?
      Figure labels: Herring recovered after the fish ban; Plankton regime shift
  26. Life
  27. Length-Weight Relationships
      Calculate the a and b parameters for 14 230 species by means of Bayesian Methods
      Approach:
      • Collaborative development with the final user
      • Integration of user's R Scripts
      • Usage of Cloud computing for R Scripts
      • Periodic runs
      Image credit: bluewatermag.com.au
      Porting the scripts to the D4Science Statistical Manager allowed them to run in a distributed fashion.
      The time reduction was from 20 days to 11 hours! 95.4% reduction
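The a and b parameters refer to the standard length-weight relationship W = a · L^b. The slide's estimates come from Bayesian methods run as R scripts; the sketch below swaps in a much simpler technique, ordinary least squares on log-transformed data, purely to show what the two parameters mean. The function name and the measurements are made up.

```python
import math

def fit_length_weight(lengths, weights):
    """Least-squares fit of log10(W) = log10(a) + b * log10(L); returns (a, b)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(weight) for weight in weights]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = 10 ** (mean_y - b * mean_x)
    return a, b

# Synthetic measurements generated from a = 0.01, b = 3 (lengths in cm, weights in g).
lengths = [10, 20, 30, 40]
weights = [0.01 * length ** 3 for length in lengths]
a, b = fit_length_weight(lengths, weights)
print(round(a, 4), round(b, 4))
```

Unlike a Bayesian fit, this point estimate uses no prior knowledge and gives no uncertainty on a and b.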
  28. Functions Simulation - Spawning Stock Biomass vs Recruits
      Estimate biological limits for 50 Northeast Atlantic fish stocks
      • Use real measures
      • Rely on previous expert knowledge
      • Use Bayesian models to combine information
      Plot legend: Re-estimated SSB limit; Re-estimated HS; Rule-based HS; Re-estimated precautionary limit
  29. Future Work
  30. Plan
      • Make the Statistical Manager Algorithms accessible through the OGC WPS standard (currently available via SOAP and Java API)
      • Invoke the algorithms from a Workflow Management System (e.g. Taverna)
      • Expand the system with new algorithms
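As an illustration of what WPS access could look like, the sketch below builds a WPS 1.0.0 Execute request in key-value-pair (GET) form. The endpoint URL, the algorithm identifier and the input names are hypothetical; the slides do not describe the eventual service interface.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute URL; inputs are encoded as name=value pairs joined by ';'."""
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "Identifier": identifier,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"

# Hypothetical endpoint and algorithm name, for illustration only.
url = wps_execute_url(
    "https://example.org/wps",
    "org.example.DBSCAN",
    {"epsilon": 0.5, "min_points": 3},
)
print(url)
```

A workflow management system would typically first call GetCapabilities and DescribeProcess to discover the available identifiers and their inputs.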
  31. Thank you
