
Aspects of Reproducibility in Earth Science


Ongoing Work within EVEREST EU Project

Published in: Science

  1. Aspects of Reproducibility in Earth Science – ongoing work
     Raul Palma, Poznan Supercomputing and Networking Center, Poland
     Dagstuhl seminar: Reproducibility of Data-Oriented Experiments in e-Science, January 2016
  2. Context
     • Acronym: EVER-EST
     • Full title: European Virtual Environment for Research - Earth Science Themes
     • Type of funding scheme: Research and Innovation Actions
     • Work Programme topic addressed: Call EINFRA-9-2015 – e-Infrastructures for Virtual Research Environments (VRE)
     • Project ID: 674907
     • Project Type: RIA
     • Start Date: 01.10.2015
     • Duration: 36 Months
     • Website: TBC
     • Maximum Grant Amount: 6,649,002 €
     • Total funded effort in person/months: 663
     • Coordinator: European Space Agency
     • Contact Person: Mirko Albani (ESA)
  3. EVEREST Consortium
  4. Key objectives
     • Establish a VRE e-infrastructure for Earth Science
       • addressing the needs of different ES communities
       • to facilitate their collaborative working and research
     • Discover, access, assess and process existing and new heterogeneous ES datasets and preserved knowledge held by distributed data centres
     • Share data, models, algorithms, scientific results and their own experiences within a community or across communities
     • Capture, annotate and store the workflows, processes and results from their research activities
     • Ensure the long-term sustainability and preservation of data, models, workflows, tools and services developed by existing communities
     • Validate the VRE with four main Virtual Research Communities
       • Sea Monitoring VRC
       • Natural Hazards VRC (floods, geological, weather, wildfires)
       • Land Monitoring VRC
       • Supersites VRC (volcanoes and seismic)
  5. Key objectives
     • Define, implement and validate the Research Object (RO) concepts and technologies within the ES context as the means for sharing information and establishing more effective collaboration in the VRE
  6. Reproducibility aspects
  7. Earth Science Research and Information Lifecycle (high level story)
  8. Experimental Science (to compare)
     [Diagram; recoverable labels:]
     • Background; Hypothesis; Assumptions; Input data; Method
     • Experiment; Results (data); Scientific Interpretation
     • Publication; Results (Data); Contribution to Science
     • Communicate contribution to the community; Contribution to Research Community
     • Peer review: “Are these novel findings? Was the method sound?”
     • Reader: “I trust that this method is sound.”
     • Reuse (incremental)
  9. Supersite Science - ES VRC (more concrete story)
     • Historical science mostly based on past observations, as opposed to experimental science
     • Testing of hypothesis is not normally the main activity
     • Main activities of the VRC:
       • measure geophysical parameters in the natural environment,
       • derive information on the effects of the phenomena and processes,
       • model this information to generate space/time representations of geophysical phenomena,
       • provide these representations to risk management stakeholders,
       • use the information to develop theories or confirm hypotheses
  10. Supersite VRC operational scenario
      With a Supersite agreement in place:
      • In situ data providers (normally local monitoring agencies) provide open access to their data collections (with a data policy), including raw and processed data
      • Space agencies acquire and distribute satellite EO data (personal licenses to sign)
      • Authorized scientists should be able to access and display the data online, process them using community tools, validate the results, model the validated data, generate research products and build consensus on scientific information for end-users
      • Authorized end-users (local) should be able to access the scientific information online and provide feedback
      • The general public should be able to browse part of the data, the published results, part of the scientific information provided to users (if the latter authorize disclosure)
  11. Research Objects in Supersite VRC
      Current main use scenarios
      • Documentation/communication
      • Reproducibility of scientific results
  12. Research Objects in Supersite VRC
      Documentation/communication
      • Document best practices (WFs, analysis methods, monitoring methods, etc.)
      • Training purposes
      • Provide long term preservation of scientific knowledge (how data are analyzed, how results are validated, etc.)
      • Provide long term preservation of end-user stories (demonstrating scientist-end-user interactions)
      • Public dissemination
      • Provide good management of intellectual property, through licensing and PID/DOI, to allow fast work recognition
      • Others tbd
  13. Research Objects in Supersite VRC
      Reproducibility of scientific results
      • Execute “standard” WFs for data analysis/modelling.
        • validating results
        • generate “standard” products (e.g. deformation maps) as mass products
        • training
      • Testing algorithms and data, either
        • modifying the WF to execute new analysis methods/models on the same dataset, or
        • executing the original WF on different Supersites datasets
      • Others tbd
  14. Some issues in reproducibility
      • The VRC is not (yet) using formalized WFs. Their use, and the use of ROs, must be promoted through a simple, incremental approach.
      • Data access may be tricky, since data formats and metadata could depend on the Supersite.
      • Some datasets (and most results) are not maintained by external sources and should be stored in the VRE (and exported as web services to the outside).
      • WF reproducibility can be a problem, since WFs could use a mix of COTS and scientific SW, with licensing, HW compatibility, and computational resource issues.
      • They do not use web processing services at present.
      • WFs are rarely fully automated.
        • Some may require considerable manual intervention.
        • Others use a trial-and-error procedure: during repeated executions one could discard some data or choose different parameters.
      • In general, some internal WF decisions may be based on expert judgment and should be documented.
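The last issue on this slide (manual, trial-and-error steps whose decisions should be documented) could be handled by recording each manual choice next to the workflow run. The following is only a minimal Python sketch of that idea, not part of any EVER-EST tooling: the names `Decision` and `RunLog`, the workflow name, the scene identifier and the parameter value are all hypothetical and chosen for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List


@dataclass
class Decision:
    """One manual choice made during a workflow run (hypothetical structure)."""
    step: str                # workflow step where the choice was made
    action: str              # e.g. "discard_data" or "set_parameter"
    detail: Dict[str, Any]   # what was discarded or which value was chosen
    rationale: str           # the expert judgment, in the scientist's own words


@dataclass
class RunLog:
    """Ordered record of the manual decisions taken in one workflow execution."""
    workflow: str
    decisions: List[Decision] = field(default_factory=list)

    def record(self, step: str, action: str,
               detail: Dict[str, Any], rationale: str) -> None:
        self.decisions.append(Decision(step, action, detail, rationale))

    def to_json(self) -> str:
        # asdict() converts the nested dataclasses into plain dicts and lists
        return json.dumps(asdict(self), indent=2)


# Illustrative values only: the scene name and threshold are made up.
log = RunLog(workflow="insar-time-series")
log.record("image-selection", "discard_data",
           {"scene": "scene-017"}, "strong atmospheric artefacts")
log.record("unwrapping", "set_parameter",
           {"coherence_threshold": 0.35}, "previous attempt left large gaps")
print(log.to_json())
```

A log of this kind could be stored with the workflow and its results, so that a later re-run can repeat the same data exclusions and parameter choices or deliberately vary them.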
  15. Research Object example
  16. RO example for the Supersite VRC
      RO for Volcano deformation mapping
      • Ground deformation mapping is a typical use case for this VRC.
      • It may be carried out by different researchers on different volcanoes or even on the same volcano.
      • It normally consists of two consecutive WFs:
        • the analysis of a multitemporal InSAR image dataset to calculate ground displacement time series
        • the validation of the results by comparison with other data or results.
  17. RO example for the Supersite VRC
      RO for Volcano deformation mapping
      • The main engine of the WF is the analysis SW (COTS): SarScape, which requires IDL.
      • Other scientists may be more comfortable using other SW, or even using remote processing services (such as those provided by the GEP).
      • Input data are normally accessed through remote web services:
        • ESA Virtual Archive, Sentinel Hub, DLR Supersite portal, ASI Data Gateway.
      • Validation data (GPS time series, previous deformation data, levelling data) are not always provided as a service.
      • Output results must be placed in the VRC database, and exported as web services.
      • They are subsequently used by other scientists during a consensus process to generate a final product for the End-users.
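To make the volcano deformation RO more concrete, the sketch below lists the resources named on these two slides together with a role and a location for each. It is a simplified illustration in Python under stated assumptions, not the Research Object model itself: the classes `Resource` and `ResearchObject`, the role and location labels, and the assignment of a location to each resource are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """One item aggregated by the RO (hypothetical, simplified structure)."""
    name: str
    role: str       # "workflow", "input", "validation" or "output"
    location: str   # "remote-service", "vre-storage" or "local"


@dataclass
class ResearchObject:
    title: str
    resources: List[Resource] = field(default_factory=list)

    def add(self, name: str, role: str, location: str) -> None:
        self.resources.append(Resource(name, role, location))

    def by_role(self, role: str) -> List[str]:
        return [r.name for r in self.resources if r.role == role]

    def not_served_remotely(self) -> List[str]:
        """Resources that a re-run could not fetch from a remote web service."""
        return [r.name for r in self.resources
                if r.location != "remote-service"]


ro = ResearchObject("Volcano deformation mapping")
# The two consecutive workflows described on the slides
ro.add("InSAR time series analysis", "workflow", "vre-storage")
ro.add("Validation against other data", "workflow", "vre-storage")
# Input data are normally accessed through remote web services
ro.add("Multitemporal InSAR image dataset", "input", "remote-service")
# Validation data are not always provided as a service (assumed local here)
ro.add("GPS time series", "validation", "local")
# Outputs are placed in the VRC database
ro.add("Ground displacement time series", "output", "vre-storage")

print(ro.by_role("workflow"))
print(ro.not_served_remotely())
```

Listing the resources that are not served remotely gives a quick view of what would have to be stored inside the VRE for the RO to remain re-executable.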