Aspects of Reproducibility in Earth Science – ongoing work
Raul Palma
Poznan Supercomputing and Networking Center, Poland
Dagstuhl seminar: Reproducibility of Data-Oriented Experiments in e-Science
January, 2016
Context
Acronym: EVER-EST
Full title: European Virtual Environment for Research - Earth Science Themes
Type of funding scheme: Research and Innovation Actions
Work Programme topic addressed: Call EINFRA-9-2015 – e-Infrastructures for Virtual Research Environments (VRE)
• Project ID: 674907
• Project Type: RIA
• Start Date: 01.10.2015
• Duration: 36 Months
• Website: TBC
• Maximum Grant Amount: 6,649,002 €
• Total funded effort in person-months: 663
• Coordinator: European Space Agency
• Contact Person: Mirko Albani (ESA)
EVER-EST Consortium
Key objectives
• Establish a VRE e-infrastructure for Earth Science
  • addressing the needs of different ES communities
  • to facilitate their collaborative working and research
• Discover, access, assess and process existing and new heterogeneous ES datasets and preserved knowledge held by distributed data centres
• Share data, models, algorithms, scientific results and their own experiences within a community or across communities
• Capture, annotate and store the workflows, processes and results from their research activities
• Ensure the long-term sustainability and preservation of data, models, workflows, tools and services developed by existing communities
• Validate the VRE with four main Virtual Research Communities
  • Sea Monitoring VRC
  • Natural Hazards VRC (floods, geological, weather, wildfires)
  • Land Monitoring VRC
  • Supersites VRC (volcanoes and seismic)
Key objectives
Define, implement and validate the Research Object (RO) concepts and technologies within the ES context as the means for sharing information and establishing more effective collaboration in the VRE
Reproducibility aspects
Earth Science Research and Information Lifecycle (high level story)
Experimental Science (to compare)
[Figure: diagram of the experimental science lifecycle. Labels: Background, Hypothesis, Assumptions, Input data, Method; Experiment; Results (data); Scientific Interpretation; Publication; Contribution to Science; Communicate contribution to the community; Contribution to Research Community; Reuse (incremental). Peer review: “Are these novel findings? Was the method sound?” Reader: “I trust that this method is sound.”]
Supersite Science - ES VRC (more concrete story)
• Historical science, mostly based on past observations, as opposed to experimental science
• Hypothesis testing is not normally the main activity
• Main activities of the VRC:
  • measure geophysical parameters in the natural environment,
  • derive information on the effects of the phenomena and processes,
  • model this information to generate space/time representations of geophysical phenomena,
  • provide these representations to risk management stakeholders,
  • use the information to develop theories or confirm hypotheses
Supersite VRC operational scenario
With a Supersite agreement in place:
• In situ data providers (normally local monitoring agencies) provide open access to their data collections (with a data policy), including raw and processed data
• Space agencies acquire and distribute satellite EO data (personal licenses must be signed)
• Authorized scientists should be able to access and display the data online, process them using community tools, validate the results, model the validated data, generate research products and build consensus on scientific information for end-users
• Authorized end-users (local) should be able to access the scientific information online and provide feedback
• The general public should be able to browse part of the data, the published results, and part of the scientific information provided to users (if the latter authorize disclosure)
Research Objects in Supersite VRC
Current main use scenarios
• Documentation/communication
• Reproducibility of scientific results
Research Objects in Supersite VRC
Documentation/communication
• Document best practices (WFs, analysis methods, monitoring methods, etc.)
• Training purposes
• Provide long-term preservation of scientific knowledge (how data are analyzed, how results are validated, etc.)
• Provide long-term preservation of end-user stories (demonstrating scientist/end-user interactions)
• Public dissemination
• Provide good management of intellectual property, through licensing and PID/DOI, to allow fast work recognition
• Others TBD
Research Objects in Supersite VRC
Reproducibility of scientific results
• Executing “standard” WFs for data analysis/modelling, for
  • validating results
  • generating “standard” products (e.g. deformation maps) as mass products
  • training
• Testing algorithms and data, by either
  • modifying the WF to execute new analysis methods/models on the same dataset, or
  • executing the original WF on different Supersite datasets
• Others TBD
Some issues in reproducibility
• The VRC is not (yet) using formalized WFs. Their use, and the use of ROs, must be promoted through a simple, incremental approach.
• Data access may be tricky, since data formats and metadata can depend on the Supersite.
  • Some datasets (and most results) are not maintained by external sources and should be stored in the VRE (and exported as web services to the outside).
• WF reproducibility can be a problem, since WFs may use a mix of COTS and scientific SW, with licensing, HW compatibility, and computational resource issues.
  • They do not use web processing services at present.
• WFs are rarely fully automated.
  • Some may require considerable manual intervention.
  • Others use a trial-and-error procedure: during repeated executions, one may discard some data or choose different parameters.
  • In general, some internal WF decisions may be based on expert judgment and should be documented.
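One way to document the manual and expert-judgment decisions mentioned above is to record them alongside each workflow run. The following is only a minimal sketch under stated assumptions: the class names (`Decision`, `RunLog`), the workflow name, the scene identifier and the threshold value are all hypothetical and are not part of the EVER-EST tooling or of the Research Object model.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class Decision:
    """One manual or expert-judgment choice made during a workflow run."""
    step: str                 # workflow step where the choice was made
    action: str               # e.g. "discard_data" or "set_parameter"
    detail: dict[str, Any]    # what was discarded or which value was chosen
    rationale: str            # free-text justification from the scientist


@dataclass
class RunLog:
    """Hypothetical log for one execution of a partly manual workflow."""
    workflow: str
    run: int
    decisions: list[Decision] = field(default_factory=list)

    def record(self, step: str, action: str, detail: dict[str, Any], rationale: str) -> None:
        # Append the decision so later readers can see why the run differs.
        self.decisions.append(Decision(step, action, detail, rationale))

    def to_json(self) -> str:
        # Serialise the whole log so it can be stored next to the results.
        return json.dumps(asdict(self), indent=2)


# Illustrative values only (assumptions, not taken from a real run).
log = RunLog(workflow="insar-time-series", run=2)
log.record("scene-selection", "discard_data",
           {"scene": "scene-017"}, "strong atmospheric artefacts")
log.record("unwrapping", "set_parameter",
           {"coherence_threshold": 0.35}, "run 1 left too few pixels")
print(log.to_json())
```

A log of this kind could be stored with the workflow outputs, so that a trial-and-error sequence of runs remains traceable even when the workflow itself is not fully automated.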
Research Object example
RO example for the Supersite VRC
RO for Volcano deformation mapping
• Ground deformation mapping is a typical use case for this VRC.
• It may be carried out by different researchers on different volcanoes or even on the same volcano.
• It normally consists of two consecutive WFs:
  • the analysis of a multitemporal InSAR image dataset to calculate ground displacement time series
  • the validation of the results by comparison with other data or results.
RO example for the Supersite VRC
RO for Volcano deformation mapping
• The main engine of the WF is the analysis SW (COTS): SarScape, which requires IDL.
  • Other scientists may be more comfortable using other SW, or even using remote processing services (such as those provided by the GEP).
• Input data are normally accessed through remote web services:
  • ESA Virtual Archive, Sentinel Hub, DLR Supersite portal, ASI Data Gateway.
• Validation data (GPS time series, previous deformation data, levelling data) are not always provided as a service.
• Output results must be placed in the VRC database, and exported as web services.
  • They are subsequently used by other scientists during a consensus process to generate a final product for the end-users.
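The resources named for the volcano deformation mapping example can be pictured as a simple aggregation. The sketch below is an illustration only: it does not use the Research Object model vocabulary, the `Resource` class and the `must_be_stored_in_vre` helper are hypothetical, and the `remote_service` flags are assumptions derived from the bullets above (input data reached through remote web services, validation data not always offered as a service, outputs placed in the VRC database).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str             # "workflow", "software", "input", "validation" or "output"
    remote_service: bool  # True if reachable through a remote web service


# Names come from the slides; the flags are assumptions for illustration.
volcano_deformation_ro = [
    Resource("InSAR time-series analysis", "workflow", False),
    Resource("Validation against other data", "workflow", False),
    Resource("SarScape (requires IDL)", "software", False),
    Resource("ESA Virtual Archive", "input", True),
    Resource("Sentinel Hub", "input", True),
    Resource("DLR Supersite portal", "input", True),
    Resource("ASI Data Gateway", "input", True),
    Resource("GPS time series", "validation", False),
    Resource("Levelling data", "validation", False),
    Resource("Ground displacement time series", "output", False),
]


def must_be_stored_in_vre(resources):
    """Return data resources with no remote service, which the VRE would hold itself."""
    data_kinds = {"input", "validation", "output"}
    return [r.name for r in resources
            if r.kind in data_kinds and not r.remote_service]


print(must_be_stored_in_vre(volcano_deformation_ro))
```

Listing resources this way makes it easy to see which parts of the example depend on external services and which would have to be preserved inside the VRE for the workflows to be rerun.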


Editor's Notes

  1. On the one hand, reviewers need to evaluate whether the findings are worthwhile and novel and the method sound. On the other hand, the reader needs to trust what she reads. Scientific communications have at least two goals: to announce a result, and to convince readers that the result is correct. Science is built incrementally on results that can be reused and therefore reproduced for validation. Experimental science should describe the results and provide a clear enough protocol for successful repetition and extension.