Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

GlobusWorld 2015


Published on

A Reproducible Framework Powered by Globus

Published in: Technology
  • Be the first to comment

  • Be the first to like this

GlobusWorld 2015

  1. 1. A Reproducible Framework Powered By Globus Tanu Malik, Kyle Chard, Ian Foster Computation Institute University of Chicago and Argonne National Laboratory GeoDataspace
  2. 2. Share and Reproduce Alice wants to share her models and simulation output with Bob, and Bob wants to re-execute Alice’s application to validate her inputs and outputs.
  3. 3. Alice’s Options 1. A tar and gzip 2. Build a website with model code, parameters, and data 3. Submit to a repository 4. Create a virtual machine
  4. 4. Bob’s Frustration 1. I do not find the required for building the model. 2. How do I? Lack of easy and efficient methods for sharing and reproducibility Amount of pain Bob suffers Amount of pain Alice suffers
  5. 5. Some Reproducibility Requirements • Automatically solve the “dependency hell” problem • “I have an incompatible version of the library” • Connect programs with data and capture dataflows • Which version of my program produced this data? • Allows easy annotation of human knowledge • “Insufficient documentation to install or run the program” • Enables reproducibility efficiently and with minimal intervention • “No change of programming or authoring environments”
  6. 6. Reproducible Framework Machine A Application Machine B data system files (/bin, /lib, ...) source code parameters configuration network connections SciUnits (Docker Hub, GitHub, DataHub) Globus Catalog Globus Publish SciUnits Execution Platform (Docker, PTU, chroot) Share/ Transfer 1 2 3 3 4 1. Capture the scientific activity • Capture the source code, the data, the environment, including the flows of data from process to process (local or distributed) 2. Preserve as SciUnits • Preserve the captured information as physical files or as detailed metadata (annotations and provenance) 3. Share & Distribute • Share the sciunits with others including detailed metadata 4. Re-execute and Re-analyze • Users can run the complete package without installation or configuration. Queries for detailed provenance of data
  7. 7. CI Components • SciUnits • Units of scientific activity/research output • Metadata Catalog • A scalable, flexible catalog for annotations conforming to open-world assumption • Globus services for sharing, transfering and publishing sciunits • Share/Publish sciunits for others to use • Replay capability through native re-execution, Docker orVagrant • Run sciunits without installation or configuration and metadata information
  8. 8. Simplifying  Data  Management   for Geoscience  Models Tanu  Malik,  Ian  Foster,  Kyle  Chard, Joseph  Baker,  Mike  Gurnis,  Jonathan  Goodall,  Sco=  Peckham   GeoDataspace
  9. 9. Science Drivers Solid Earth Space Science Hydrology CSDMS GeoDataspace • • Software, Source code, Science Usecases, Reports, Presentations, News
  10. 10. Project Goals • GeoDataspace Project Goals: • Establish the reproducible framework with Globus • Enable three use cases for establishing geounits in Space Science, Hydrology, and Seismology • Making the geounit Client widely accessible to the EarthCube community • connect with a model and data repository (CSDMS)
  11. 11. Science Usecases • Seismology: geounits of 2D and 3D kinematic geoscience models, visualized through GPlates and modifying GPML data files • End Goal: Sharing, preserving, and publishing visualization sessions with data • Space Science geounits on SuperDARN data with analysis tools as available from the Baker Laboratory atVirginia Tech • End Goal: Sharing and publishing geounits • Hydrology geounits of IRODs workflows on hydrologyVIC models • End Goal: Demonstrating end-to-end reproducibility with iRODS that does not support data provenance or data publishing
  12. 12. Acknowledgements Funders: Community:
  13. 13. Reproducible Framework Client Provenance Data Annotations Application Virtualization (Source Code, Data, Environment, Library Dependencies) CorePlugins Clipboard Events Commands Web Browsing History Metadata from Files Ontologies, Dictionaries, Vocabularies Globus Services (Catalog,Transfer, Share, Publish) g gg g Provenance Services (PROV Data Management)
  14. 14. geounit Client GeoDatasp 1. Support for application virtualization. 2. Provenance collection: audit <program name>, exec <program name> [activity] specific version information collected if part of aVMS 3.Annotation: addannotation <file|dir|g> <key:value> 4. Create packages (Docker or vagrant compliant) 5. Queries: why, what, where 6.Visualizers
  15. 15. PROVaaS