iMarine Products and Services delivery

The iMarine initiative provides a data infrastructure aimed at facilitating open access, data sharing, collaborative analysis, data processing and mining, and the dissemination of newly generated knowledge. The iMarine data infrastructure is developed to support decision making in high-level challenges that require policy decisions typical of the ecosystem approach. The iMarine offering can be articulated in six bundles. A “bundle” is a set of services and technologies grouped according to a family of related tasks for achieving a common objective. Bundles can be customized and/or enriched into flexible, purpose-built Virtual Research Environments (VREs). VREs offer flexible and secure web-based, community-centric platforms, so researchers can work together on common challenges. Each VRE in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other iMarine applications.

Published in: Technology

  1. 4th iMarine Board, Rome, 17-18 October 2013
     iMarine Products and services delivery
     Pasquale Pagano (CNR), iMarine Technical Director, pasquale.pagano@isti.cnr.it
  2. Outline
     Products and services development progress report
       • BiolCube
       • StatsCube
       • GeosCube
       • ConnectCube
     Products and services catalogue at project conclusion
       • Tiny selection of products
  3. Google Analytics iMarine portal
  4. Application Bundles
       • Management and interpretation of biological and ecological data in the environment
       • Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
       • Storage and interpretation of geospatially explicit information, including WPS processing
       • Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
     A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective
  5. A fraction of the products and services belonging to BiolCube
     PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
  6. Species Data Discovery
       • Search for multiple species
       • Search across several data providers
       • Search for all occurrences of a set of species and their synonyms
       • Search occurrences for all species belonging to a taxon group
  7. Species Data Discovery
     Search in GBIF all the occurrences of 'sarda sarda' and its synonyms found in WoRMS
       • SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN GBIF RETURN Occurrence
     Search in CoL all the taxa for 'sarda sarda' and its synonyms found in WoRMS
       • SEARCH BY SN 'sarda sarda' EXPAND WITH WoRMS IN CoL RETURN TAXON
     Search all occurrences for the species commonly known as 'shark' in WoRMS and their synonyms as recognized by CoL. Accept only the results with coordinates less than or equal to (15.12, 16.12).
       • SEARCH BY CN 'shark' RESOLVE WITH WoRMS EXPAND WITH CoL WHERE coordinate <= 15.12, 16.12 RETURN Occurrence
     Search in OBIS all the occurrences for 'sarda sarda' and 'Carcharodon carcharias' expanded with synonyms from WoRMS and CoL. Accept only the results with an event date between 2000 and 2005.
       • SEARCH BY SN 'sarda sarda', 'Carcharodon carcharias' EXPAND WITH WoRMS, CoL IN OBIS WHERE eventDate >= '2000' AND eventDate <= '2005' RETURN Occurrence
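The queries on slide 7 follow a regular clause pattern, so they can be assembled programmatically. The following is a minimal sketch that only builds query strings. The helper `build_query` and its parameter names are hypothetical, and the clause order is inferred from the four examples on the slide, not from a formal grammar.

```python
def build_query(terms, by="SN", resolve_with=None, expand_with=None,
                sources=None, where=None, returns="Occurrence"):
    """Assemble a species-discovery query string (hypothetical helper).

    Clause order mirrors the slide examples:
    SEARCH BY .. [RESOLVE WITH ..] [EXPAND WITH ..] [IN ..] [WHERE ..] RETURN ..
    """
    quoted = ", ".join(f"'{t}'" for t in terms)
    parts = [f"SEARCH BY {by} {quoted}"]
    if resolve_with:
        parts.append("RESOLVE WITH " + ", ".join(resolve_with))
    if expand_with:
        parts.append("EXPAND WITH " + ", ".join(expand_with))
    if sources:
        parts.append("IN " + ", ".join(sources))
    if where:
        parts.append("WHERE " + " AND ".join(where))
    parts.append(f"RETURN {returns}")
    return " ".join(parts)


# Rebuild the last example of the slide.
q = build_query(
    ["sarda sarda", "Carcharodon carcharias"],
    expand_with=["WoRMS", "CoL"],
    sources=["OBIS"],
    where=["eventDate >= '2000'", "eventDate <= '2005'"],
)
print(q)
```

Keeping each clause optional lets the same helper cover queries without an `IN` clause, such as the 'shark' example.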
  8. Occurrence Points
     Inputs: Occurrence Data from GBIF, Occurrence Data from OBIS
     Operations on record sets A and B: Intersection (∩), Union (∪), Difference (-), Duplicates Deletion (DD)
     Record fields (similarity comparison): x,y; Event Date; Modif Date; Author; Species Scientific Name
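The four operations on slide 8 can be illustrated with plain Python collections. This is a sketch under stated assumptions, not the service's implementation: the field names, the made-up records, and the matching rule (same name, same event date, coordinates equal after rounding) are all hypothetical, whereas the slide suggests the real tool uses a similarity measure over the record fields.

```python
def record_key(rec, precision=2):
    """Key deciding that two occurrence records describe the same point.

    Assumption: records match when scientific name, event date and rounded
    coordinates agree.
    """
    return (rec["scientificName"].lower(),
            round(rec["x"], precision), round(rec["y"], precision),
            rec["eventDate"])


def delete_duplicates(records):
    """Duplicates Deletion (DD): keep the first record seen for each key."""
    seen = {}
    for rec in records:
        seen.setdefault(record_key(rec), rec)
    return list(seen.values())


def intersection(a, b):
    """Records of A that also appear in B."""
    keys_b = {record_key(r) for r in b}
    return [r for r in delete_duplicates(a) if record_key(r) in keys_b]


def difference(a, b):
    """Records of A that do not appear in B."""
    keys_b = {record_key(r) for r in b}
    return [r for r in delete_duplicates(a) if record_key(r) not in keys_b]


def union(a, b):
    """All distinct records of A and B."""
    return delete_duplicates(list(a) + list(b))


# Made-up records for illustration only.
gbif = [
    {"scientificName": "Sarda sarda", "x": 12.501, "y": 41.902, "eventDate": "2003-05-01"},
    {"scientificName": "Sarda sarda", "x": 12.499, "y": 41.898, "eventDate": "2003-05-01"},
    {"scientificName": "Sarda sarda", "x": 15.100, "y": 38.200, "eventDate": "2004-07-12"},
]
obis = [
    {"scientificName": "sarda sarda", "x": 12.500, "y": 41.900, "eventDate": "2003-05-01"},
    {"scientificName": "Sarda sarda", "x": 20.000, "y": 35.000, "eventDate": "2001-01-15"},
]

print(len(delete_duplicates(gbif)), len(intersection(gbif, obis)),
      len(difference(gbif, obis)), len(union(gbif, obis)))
```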
  9. Similarity between habitats
     Habitat Representativeness Score (HRS):
       1. Measures the similarity between the environmental features of two areas
       2. Assesses the quality of models and environmental features
     Latimeria chalumnae: HRS=10.5
  10. BiOnym
      A flexible workflow approach to taxon name matching
      Accounts for:
        • Variations in the spelling and interpretation of taxonomic names
        • Combination of data from different sources
        • Harmonization and reconciliation of Taxa names
      Workflow: Raw Input String (e.g. Gadus morua Lineus 1758) → Preprocessing and Parsing → Taxon Matcher 1, Taxon Matcher 2, ..., Taxon Matcher n → PostProcessing → Correct Transcriptions (e.g. Gadus morhua (Linnaeus, 1758))
      Reference Sources: ASFIS, FISHBASE, OBIS, other sources in DwC-A
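The chain of matchers on slide 10 can be sketched as a list of functions tried in order. This is an illustrative sketch only. The tiny `REFERENCE` dictionary stands in for a real reference source, the parsing rule (first two tokens form the binomial) is a simplification, and the fuzzy matcher uses Python's `difflib` as a stand-in, not whatever algorithms BiOnym actually applies.

```python
import difflib

# Stand-in for a reference source such as ASFIS or FishBase (illustrative only).
REFERENCE = {
    "Gadus morhua": "(Linnaeus, 1758)",
    "Sarda sarda": "(Bloch, 1793)",
    "Carcharodon carcharias": "(Linnaeus, 1758)",
}


def parse(raw):
    """Split a raw string into a binomial name and the remaining authorship text."""
    tokens = raw.split()
    return " ".join(tokens[:2]), " ".join(tokens[2:])


def exact_matcher(name, reference):
    """Matcher 1: accept only names present verbatim in the reference."""
    return [name] if name in reference else []


def fuzzy_matcher(name, reference, cutoff=0.8):
    """Matcher 2: accept the closest reference name above a similarity cutoff."""
    return difflib.get_close_matches(name, list(reference), n=1, cutoff=cutoff)


def match(raw, reference=REFERENCE, matchers=(exact_matcher, fuzzy_matcher)):
    """Run the matchers in order and return the first hit, or None."""
    name, _authorship = parse(raw)
    for matcher in matchers:
        hits = matcher(name, reference)
        if hits:
            best = hits[0]
            return f"{best} {reference[best]}"
    return None


print(match("Gadus morua Lineus 1758"))
```

Because the matchers share one signature, further matchers can be appended to the tuple without touching `match`, which is the flexibility the workflow diagram points at.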
  11. Trendylyzer - Scope
        • Fill some knowledge gaps on marine species
        • Account for sampling biases
        • Define trends for common species
      We focus on the OBIS database
      Is the Fulmar losing its common species status among the seabirds?
      Herring recovered after the fishing ban
      Can we recognize big changes in species presence?
      Plankton regime shift
  12. Trendylyzer - Most Observed Taxa
  13. Trendylyzer – Observation ranks on Large Marine Ecosystems
  14. Trendylyzer – Observation ranks on Marine Ecoregions of the World
  15. Length-Weight Relationships
      Objective: Calculate the a and b parameters for several species.
      Requirements: Account for...
        • Many studies about a single species
        • Single study
        • Use existing studies to inform new studies
      Solution: Combine existing knowledge with new data by means of Bayesian methods.
      Approach:
        • Collaborative development with the ‘stakeholder’
        • Integration of R Scripts
        • Usage of Cloud computing for R Scripts
      (Image: bluewatermag.com.au)
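Slide 15 does not spell out what a and b are. In the usual formulation of length-weight relationships they are the coefficient and exponent of a power law, which becomes a straight line on logarithmic axes. The last line below is only a generic statement of the Bayesian idea mentioned on the slide. The actual priors and likelihood used in the R scripts are not given in the deck.

```latex
% Length-weight relationship: weight W as a power function of length L
W = a \, L^{b}
% Taking logarithms gives a straight line with intercept \log_{10} a and slope b
\log_{10} W = \log_{10} a + b \, \log_{10} L
% Bayesian update: the prior p(a, b) summarises existing studies,
% the likelihood carries the new data
p(a, b \mid \text{data}) \;\propto\; p(\text{data} \mid a, b) \, p(a, b)
```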
  16. LWR - Performance
      Porting to the D4Science Statistical Manager made it possible to run the scripts in a distributed fashion
      The scientist’s original procedure took 20 days
      After optimization on our R development machines, the sequential run took 10 days
      The run on the Statistical Manager took 11 hours!
      Time reduction of 95.4% (relative to the 10-day sequential run)
      The script has been run periodically and currently solves LWR for 37 234 species
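The 95.4% figure on slide 16 can be checked with a few lines of arithmetic. The sketch below uses only the three durations quoted on the slide, and the helper name `reduction` is ours.

```python
HOURS_PER_DAY = 24

original_h = 20 * HOURS_PER_DAY      # scientist's original procedure: 20 days
sequential_h = 10 * HOURS_PER_DAY    # optimized sequential run: 10 days
distributed_h = 11                   # run on the Statistical Manager: 11 hours


def reduction(before_h, after_h):
    """Percentage of running time saved when going from before_h to after_h."""
    return 100 * (1 - after_h / before_h)


print(f"vs. sequential run:     {reduction(sequential_h, distributed_h):.1f}%")
print(f"vs. original procedure: {reduction(original_h, distributed_h):.1f}%")
```

The quoted 95.4% matches the 10-day sequential baseline (11 h out of 240 h). Measured against the original 20-day procedure, the saving is about 97.7%.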
  17. A fraction of the products and services belonging to StatsCube
      PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
  18. Tabular Data Manager
      A completely new application for managing data workflows. It lets users *manage* a *flow of data* and create reports on the management activities.
        • flow of data: datasets compliant with a template that are generated and updated in chunks.
        • manage: import, store, transform, validate, access, analyze, visualize, and export.
  19. Tabular Data Manager: Templates
        • A table template defines:
            – Table definition
            – Columns definition
            – A set of table transformations
            – A set of validation procedures
        • Can be applied to any dataset
        • Can be modified and shared among people
  20. Tabular Data Manager: Menu
        • Ribbon style menu
        • Buttons behavior depends on current document
        • Alt messages on mouseover
  21. Tabular Data Manager: Panels
  22. Tabular Data Manager: Import
  23. Infrastructure: Computing as Service
      330 Cores Currently Allocated
      Hadoop
        • MapReduce
      Statistical Manager
        • Analysis/clustering/modeling
      R clusters
        • Windows and Linux
  24. A fraction of the products and services belonging to GeosCube
      PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
  25. Rasterization
      A polygonal map is transformed into a raster map or into a point map
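As an illustration of the rasterization step on slide 25, the sketch below marks a grid cell as filled when its centre falls inside a polygon, using a standard ray-casting test. It is a minimal sketch and not the GeosCube implementation. The function names, the grid parameters and the L-shaped polygon are made up for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_min, cell_size, n_cols, n_rows):
    """Return rows of 0/1 cells (top row first); a cell is 1 when its centre is inside."""
    grid = []
    for row in range(n_rows):
        y = y_min + (n_rows - row - 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell_size, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


# Made-up L-shaped polygon on a 4 x 4 grid of unit cells.
l_shape = [(0.2, 0.2), (3.8, 0.2), (3.8, 1.8), (1.8, 1.8), (1.8, 3.8), (0.2, 3.8)]
raster = rasterize(l_shape, 0.0, 0.0, 1.0, 4, 4)
for line in raster:
    print("".join("#" if cell else "." for cell in line))
```

The centres of the filled cells give the point-map variant mentioned on the slide.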
  26. Maps Comparison
      Compares:
        • Species Distribution maps
        • Environmental layers
        • SAR Images
  27. Periodicity and Seasonality
      Periodicity: 12 months
      Extraction Tools
      Fourier Analysis
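Slide 27 reports a 12-month periodicity obtained through Fourier analysis. The sketch below shows the idea on a synthetic monthly series with a plain discrete Fourier transform. The signal is invented for the example, the function name `dominant_period` is ours, and the actual extraction tools are not described in the deck.

```python
import cmath
import math


def dominant_period(samples):
    """Return the period (in samples) of the strongest non-constant DFT component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]   # remove the constant (k = 0) term
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        raise ValueError("constant signal: no periodic component")
    return n / best_k


# Synthetic monthly signal: 10 years with an annual cycle plus a weaker 5-year cycle.
signal = [15 + 5 * math.sin(2 * math.pi * m / 12) + 1.5 * math.sin(2 * math.pi * m / 60)
          for m in range(120)]
print(dominant_period(signal))
```

The direct DFT costs O(n²) operations, which is acceptable for short monthly series. An FFT library would be the natural choice for longer signals.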
  28. Environmental Signal Processing
        • Spectrogram
        • Resampling
  29. Environmental Enrichment: Approach
        • (Oozie) workflow to optimize the processing chain:
            – Extract occurrences for Carcharodon carcharias (White Shark) for a given time of interest
            – Apply the dbscan algorithm (R implementation) to identify geospatial clusters
            – Create bounding boxes around the clusters
            – Use the bounding boxes as queryables for the WCS request
            – Apply BEAM Pixel Extraction (same algorithm as BioOracle environmental enrichment service)
            – Create the time series
            – Visualize the time series
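The "create bounding boxes around the clusters" step of slide 29 can be sketched as follows. The workflow itself uses an R dbscan implementation inside Oozie, so the Python below is only an illustration in a different language. The coordinates and labels are made up, and treating label 0 as noise is an assumption about the clustering output.

```python
from collections import defaultdict

NOISE = 0  # assumption: label given to points that belong to no cluster


def bounding_boxes(points, labels, margin=0.0):
    """Return {cluster_label: (min_lon, min_lat, max_lon, max_lat)}."""
    grouped = defaultdict(list)
    for (lon, lat), label in zip(points, labels):
        if label != NOISE:
            grouped[label].append((lon, lat))
    boxes = {}
    for label, members in grouped.items():
        lons = [p[0] for p in members]
        lats = [p[1] for p in members]
        boxes[label] = (min(lons) - margin, min(lats) - margin,
                        max(lons) + margin, max(lats) + margin)
    return boxes


# Made-up occurrence coordinates (lon, lat) and the cluster label of each point.
points = [(18.2, -34.1), (18.6, -34.4), (18.4, -33.9),
          (151.3, -33.8), (151.6, -34.0), (0.0, 0.0)]
labels = [1, 1, 1, 2, 2, 0]
boxes = bounding_boxes(points, labels)
print(boxes)
```

Each box can then be passed as the spatial subset of a coverage request. The `margin` parameter is a hypothetical convenience for padding boxes around tight clusters.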
  30. Environmental Enrichment: results
  31. SPREAD
        • Interactive investigation process for statisticians & scientists to compare data from different domains (e.g. Statistics vs. GIS data) and batch processing of data reallocation hypotheses
      DATA IMPORT / CURATION: Estimates dataset by EEZ – high seas; Catch dataset by FAO area; FAO Areas GIS
      DATA DISCOVERY, SEARCHING & SHARING: Available Target Areas
      DATA SELECTION (e.g. Filter): Geographic intersection FAO Areas / EEZs – High seas
      REALLOCATION: Species distributions
  32. Legacy Processes (IRD)
        • iX Catches per Species: per Ocean / Area, per Fishing Gear type, per Month / Year, and kernel density for biodiversity / ecological datasets (IRD+OBIS+GBIF)
  33. A fraction of the products and services belonging to ConnectCube
      PRODUCTS AND SERVICES DEVELOPMENT PROGRESS REPORT
  34. MarineTLO
      Version 2.0.0
        – Species
        – Scientific Name of Species
        – FAO Species Code
        – IRD Species Code
        – WoRMS Species Code
        – Predators and Prey
        – Competitors
        – Biological Classification of Species (e.g. WoRMS)
      Version 3.0.0
        – MarineTLO Version 2.0.0
        – Water Areas
        – Species connected to Water Areas
        – Countries
        – Countries connected to Water Areas
        – Species connected to Countries
        – Ecosystems
        – Ecosystems connected to Countries
        – Species connected to Ecosystems
        – Exclusive Economical Zones
        – Fishing Gears
        – Fishing Vessels
        – More species and more Predators
        – Common Names of Species
  35. Requirements as Competency Queries
      For a scientific name of a species (e.g. Thunnus albacares or Poromitra crassiceps), find/give me:
        Q1  the biological environments (e.g. ecosystems) in which the species has been introduced and more general descriptive information about it (such as the country)
        Q2  its common names and their complementary info (e.g. languages and countries where they are used)
        Q3  the water areas and their FAO codes in which the species is native
        Q4  the countries in which the species lives
        Q5  the water areas and the FAO portioning code associated with a country
        Q6  the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economical Zone (of the water area)
        Q7  the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)
        Q8  a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification
        Q9  who discovered it, in which year, the biological classification, the identification information, and the common names, providing for each common name the language and the countries where it is used
  36. The MarineTLO-based warehouse Evolution
      RDF Triple Store: TLOMarine
      Sources and mappings:
        • FLOD: Copy FLOD (by FAO), FLOD2TLO mapping
        • ECOSCOPE: Copy ECOSCOPE (by IRD), ECOSCOPE2TLO mapping
        • WoRMS: Copy WoRMS (part) (generated by SPD & TLO wrapper), WoRMS2TLO mapping
        • DBpedia: Copy DBpedia (part) (by DBpedia SPARQL Endpoint), DBpediaS2TLO mapping
        • Fishbase: Copy Fishbase (part) (by Fishbase RDMS), FB2TLO mapping
  37. Warehouse V3
      Sources: Ecoscope, FLOD, WoRMS, DBpedia, Fishbase
      Concepts: Species, Scientific Names, Authorships, Common Names, Predators, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ
  38. TLO warehouse V2 vs V3
      V2 contains information about 19,000 distinct marine species
        Source      Species Number
        DBpedia     14,291
        FLOD        10,849
        WoRMS       1124
        Ecoscope    277
        Common Species (size of intersections): DBpedia/FLOD 3,046; DBpedia/WoRMS 731; DBpedia/Ecoscope 56; FLOD/WoRMS 768; FLOD/Ecoscope 73; WoRMS/Ecoscope 53
      V3 contains information about 37,000 distinct marine species
        Source      Species Number
        DBpedia     14,291
        FLOD        10,849
        WoRMS       1124
        Ecoscope    277
        FishBase    31,277
        Common Species (size of intersections): DBpedia/FLOD 3,046; DBpedia/WoRMS 731; DBpedia/Ecoscope 56; DBpedia/FishBase 9833; FLOD/WoRMS 768; FLOD/Ecoscope 73; FLOD/FishBase 6141; WoRMS/Ecoscope 53; WoRMS/FishBase 1288
  39. A tiny fraction of the products and services belonging to BiolCube
      PRODUCTS AND SERVICES CATALOGUE AT PROJECT CONCLUSION
  40. Trendylyzer – Definition of Common Species
      Grey = not a common species in 1990
      Trends for common species can be indicators of ecological changes
      A formal definition of common species is not trivial
      A definition based on the distribution of occurrences gives interesting results but is affected by sampling biases
  41. Trendylyzer – Definition of Common Species
      We are searching for a more formal definition of common species (C.S.) that accounts for the biases in the database
      We defined a commonness score function
      The terms influencing the commonness of a species are given a weight using pattern recognition models
      For each species:
        1. Nr of observations
        2. Nr of individuals per observation
        3. Nr of observations per dataset
        4. Nr of datasets
        5. Nr of geographical cells
        6. Temporal frequency of the observations
      Normalizing => relative commonness. Create score or rank by taxonomic group
      We are assessing the performance against the indications by FishBase and IUCN on some benchmark species
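One simple way to read the commonness score of slide 41 is as a weighted sum of the six terms after normalization within a taxonomic group. The sketch below follows that reading and is not the Trendylyzer function. The equal weights are placeholders for the weights the slide says come from pattern recognition models, the max-normalization is one possible choice, and the species statistics are invented.

```python
FEATURES = [
    "observations",
    "individuals_per_observation",
    "observations_per_dataset",
    "datasets",
    "geographical_cells",
    "temporal_frequency",
]

# Placeholder weights (assumption): equal for all six terms.
WEIGHTS = {name: 1.0 for name in FEATURES}


def relative_commonness(species_stats, weights=WEIGHTS):
    """Score each species of one taxonomic group in [0, 1].

    Every feature is divided by its maximum within the group, then the
    weighted mean of the normalized features is taken.
    """
    maxima = {f: max(s[f] for s in species_stats.values()) or 1.0 for f in FEATURES}
    total_weight = sum(weights[f] for f in FEATURES)
    return {
        name: sum(weights[f] * stats[f] / maxima[f] for f in FEATURES) / total_weight
        for name, stats in species_stats.items()
    }


def rank(scores):
    """Species names ordered from most to least common."""
    return sorted(scores, key=scores.get, reverse=True)


# Invented statistics for three species of one taxonomic group.
group = {
    "species_a": dict(zip(FEATURES, [900, 4.0, 30.0, 30, 120, 0.9])),
    "species_b": dict(zip(FEATURES, [300, 2.0, 15.0, 20, 60, 0.6])),
    "species_c": dict(zip(FEATURES, [30, 1.0, 10.0, 3, 5, 0.1])),
}
scores = relative_commonness(group)
print(rank(scores))
```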
  42. Trendylyzer - Performance
      A preliminary definition of common species (CS) was made using:
        1. Nr of observations per dataset in one year
        2. Nr of datasets containing the species in one year
      On a ‘trustable’ benchmark with 255 species, the classification agreed with an expert classification in 99.21% of cases!
      The complex approximating function, which also includes time and geographical extent, gave 80% agreement with an expert classification on a ‘wild’ benchmark (80 species)
      The results are very promising!
  43. A tiny fraction of the products and services belonging to StatsCube
      PRODUCTS AND SERVICES CATALOGUE AT PROJECT CONCLUSION
  44. Tabular Data Manager gCube Releases
  45. Tabular Data Manager: 2.18
        • Transformations support: table/column type, labels management
        • Validation: multiple codes warning, redundant tuples, table types checks (codelist, dataset)
        • Generic table metadata support
        • Batch replace (according to an expression)
        • Single tuple modification
        • Full Workspace integration
        • Support for JSON document
        • Templates
  46. Tabular Data Manager: 2.19
        • Operations bundle
            – Aggregation, Union, Filtering, Denormalisation
            – Column merging
            – Import Postprocessing
            – Notification
            – Custom codelist creation use cases
      more to come…
  47. Tabular Data Manager: Next releases
        • 2.20
            – SDMX Datasource
            – Codelist georeferencing
            – Maps visualization
        • 2.21
            – Data Analysis: mondrian support
            – Graphs
            – UI: Jpivot, stpivot
  48. Harmonize: Cotrix
  49. Discussion time
      Thank you for your attention
      www.i-marine.eu
