GeoDataspace: Simplifying Data Management Tasks with Globus

Tanu Malik, Research Associate Scientist
Simplifying Data Management Tasks with Globus

Tanu Malik, Ian Foster, Kyle Chard, Roselyne Tchoua, Joseph Baker, Mike Gurnis, Jonathan Goodall, Scott Peckham

GeoDataspace
Share and Reproduce
Alice wants to share her models and simulation output with Bob, and Bob wants to re-execute Alice’s application to validate her inputs and outputs.
Alice’s Options
1. Tar and gzip the files
2. Build a website with the model code, parameters, and data
3. Create a virtual machine
Bob’s Frustration
1. I cannot find the lib.so required to build the model.
2. How do I?
Lack of easy and efficient methods for sharing and reproducibility
[Figure: amount of pain Bob suffers vs. amount of pain Alice suffers]
GeoDataspace
• Goal: Sharing and reproducibility hand-in-hand
• Target users: Computational geoscientists
• Data and model integration
• Research Output is More Than "Just" a Research Paper
GeoDataspace
CI Components
• The geounits
• Units of scientific activity/research output
• How to capture and track this activity
• Globus Catalog
• A scalable, flexible catalog for annotations, conforming to the open-world assumption
• Publishing geounits with Globus Publish, and reproducing them
• Share/Publish geounits for others
• Replay geounits for analysis
geounits: package data, source code, and environment
geounit Client:
Provenance is key
1. audit <program name>
2. PROV-compliant database
3. exec <program name> [activity]
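The three steps above (audit a program, record its provenance in a PROV-compliant store, then exec to replay it) can be illustrated with a minimal Python sketch of the first two. It assumes the caller lists the input and output files explicitly, whereas the geounit client discovers them by tracing the program; the function names `audit` and `sha256_of` and the `ex:` namespace are hypothetical, and the record only follows the general shape of W3C PROV-JSON (entities, an activity, and `used`/`wasGeneratedBy` relations).

```python
import hashlib
import json
import os
import subprocess
import sys
import tempfile
from datetime import datetime, timezone


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(cmd, inputs, outputs):
    """Run cmd and return a PROV-JSON-style dict describing the run.

    Hypothetical helper: a ptrace-based auditor discovers the files a
    program touches; this sketch relies on the caller to list them.
    """
    start = datetime.now(timezone.utc).isoformat()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    end = datetime.now(timezone.utc).isoformat()

    act = "ex:run"
    record = {
        "prefix": {"ex": "http://example.org/geounit#"},  # placeholder namespace
        "activity": {act: {"prov:startTime": start, "prov:endTime": end,
                           "ex:command": " ".join(cmd)}},
        "entity": {},
        "used": {},
        "wasGeneratedBy": {},
    }
    for i, path in enumerate(inputs):
        ent = f"ex:input{i}"
        record["entity"][ent] = {"ex:path": path, "ex:sha256": sha256_of(path)}
        record["used"][f"ex:used{i}"] = {"prov:activity": act, "prov:entity": ent}
    for i, path in enumerate(outputs):
        ent = f"ex:output{i}"
        record["entity"][ent] = {"ex:path": path, "ex:sha256": sha256_of(path)}
        record["wasGeneratedBy"][f"ex:gen{i}"] = {"prov:entity": ent,
                                                  "prov:activity": act}
    return record


if __name__ == "__main__":
    # Toy program: copy an input file to an output file.
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "in.txt"), os.path.join(d, "out.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        cmd = [sys.executable, "-c", f"import shutil; shutil.copy({src!r}, {dst!r})"]
        print(json.dumps(audit(cmd, [src], [dst]), indent=2))
```

Hashing each file lets a later replay check that it is working from the same inputs the original run used.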
geounit Client: Features
• Based on the ptrace and okapi functionality of CDE (Code, Data, and Environment)
• Data/code can be local or distributed
• Data/code files are not copied into the package until it is ready to share; until then, the package holds only their descriptions
• Specify the granularity of auditing
• Partial replay
• Unpack into Docker or Vagrant
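The idea that a package holds only descriptions of its data and code until it is shared can be sketched as follows. This is an illustration under stated assumptions, not the geounit client's implementation: `FileDescriptor`, `describe`, and `LazyPackage` are hypothetical names, descriptors cover local files only, and a SHA-256 check guards against a file changing between the time it is described and the time it is shared.

```python
import hashlib
import os
import shutil
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileDescriptor:
    """What the package stores in place of a file's bytes."""
    path: str     # local path; remote data would need a fetcher (not shown)
    size: int
    sha256: str


def describe(path: str) -> FileDescriptor:
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return FileDescriptor(path, os.path.getsize(path), digest)


@dataclass
class LazyPackage:
    """Hypothetical geounit package: descriptions now, bytes only on share."""
    files: List[FileDescriptor] = field(default_factory=list)

    def add(self, path: str) -> None:
        self.files.append(describe(path))

    def materialize(self, dest_dir: str) -> List[str]:
        """Copy each described file into dest_dir after checking it is unchanged.

        Files are placed by base name, so name clashes are not handled here.
        """
        os.makedirs(dest_dir, exist_ok=True)
        copied = []
        for fd in self.files:
            if describe(fd.path).sha256 != fd.sha256:
                raise ValueError(f"{fd.path} changed since it was described")
            target = os.path.join(dest_dir, os.path.basename(fd.path))
            shutil.copy2(fd.path, target)
            copied.append(target)
        return copied
```

Deferring the copy keeps the package small while it is being built, at the cost of having to verify at share time that the described files still match.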
Globus Catalog: hosts geounits
• Dataset Management Model
  – Catalog: a hosted resource that groups related datasets
  – Dataset: a virtual collection of (schemaless) metadata and distributed data elements, such as files and provenance
  – Annotation: a piece of metadata that exists within the context of a dataset or data member
Globus Catalog
• Dataset Service
  – Virtual views of data based on user-defined and/or automatically extracted metadata (annotations)
  – Implemented as a service with web and REST interfaces
  – Relies on Globus Nexus for user authentication and group management
• Client-side Tooling
  – Dataset ingest: automatic creation of datasets and extraction of metadata from various common data formats and directory structures
  – Globus endpoints: associate data (in files and directories) with one or more datasets
  – Python client library
• Integration with external services
  – Transfer: moving datasets from their storage endpoint(s) to a selected destination
• Faceted Browser Search
  – Search based on provenance entities and activities
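Faceted search over annotations reduces to two operations: counting the values of an annotation name across datasets, and filtering on the values a user selects. A minimal sketch, assuming each dataset is represented simply as a dict of its annotations; the annotation names `model` and `activity` used in practice here are illustrative, not the catalog's schema.

```python
from collections import Counter
from typing import Any, Dict, List


def facet_counts(datasets: List[Dict[str, Any]], facet: str) -> Counter:
    """Count how many datasets carry each value of one annotation name."""
    return Counter(ds[facet] for ds in datasets if facet in ds)


def filter_by(datasets: List[Dict[str, Any]], **selected: Any) -> List[Dict[str, Any]]:
    """Keep datasets whose annotations match every selected facet value."""
    return [ds for ds in datasets
            if all(ds.get(k) == v for k, v in selected.items())]
```

A browser would show `facet_counts` next to each facet and call `filter_by` with the values the user has clicked.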
Globus Catalog: REST interface
Approach
• Hosted user-defined catalogs
• Based on the annotation model <dataset/member, name, value>
• Association of data members
• Fine-grained access control
• Flexible query language
  – Name:value, free text, facets,…
• Integrated with other services
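The annotation model <dataset/member, name, value> and a name:value plus free-text query can be sketched together. This is a hypothetical, simplified evaluator, not the catalog's actual query language: every term must match (AND semantics), name:value terms match exactly, and any other term matches as a case-insensitive substring of an annotation value.

```python
from typing import List, NamedTuple, Set


class Annotation(NamedTuple):
    """One <dataset/member, name, value> triple."""
    subject: str   # dataset or member identifier
    name: str      # must not contain ':' in this sketch
    value: str


def query(annotations: List[Annotation], q: str) -> Set[str]:
    """Return the subjects that match every whitespace-separated term of q."""
    subjects = {a.subject for a in annotations}
    for term in q.split():
        if ":" in term:
            name, value = term.split(":", 1)
            hits = {a.subject for a in annotations
                    if a.name == name and a.value == value}
        else:
            hits = {a.subject for a in annotations
                    if term.lower() in a.value.lower()}
        subjects &= hits
    return subjects
```

Keeping every annotation as a flat triple is what makes the model schemaless: a new name needs no schema change, only new triples.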
/geodataspace
/geodataspace/annotation
/geodataspace/geounit
/geodataspace/geounit/annotation
/geodataspace/geounit/acl
/geodataspace/geounit/members
/geodataspace/geounit/members/annotation
/geodataspace/geounit/provenance
/geodataspace/geounit/version
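A client for resources laid out like the paths above might build URLs and fetch JSON as sketched below. The host name, the bearer-token header, and the reading of `geounit` as a placeholder for a geounit identifier are all assumptions made for illustration; the service's own documentation defines its real endpoints and authentication.

```python
import json
from typing import Any, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://catalog.example.org"  # hypothetical host


def resource_path(catalog: str, geounit: Optional[str] = None,
                  sub: Optional[str] = None) -> str:
    """Build /<catalog>[/<geounit>][/<sub>], where sub is e.g. 'annotation',
    'acl', 'members', 'members/annotation', 'provenance', or 'version'."""
    parts = [quote(catalog, safe="")]
    if geounit is not None:
        parts.append(quote(geounit, safe=""))  # escape spaces, slashes, etc.
    if sub is not None:
        parts.append(sub)
    return "/" + "/".join(parts)


def get_json(path: str, token: str) -> Any:
    """GET a resource and decode its JSON body (bearer-token auth assumed)."""
    req = Request(BASE_URL + path, headers={"Authorization": f"Bearer {token}",
                                            "Accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

For example, `resource_path("geodataspace", "unit-7", "provenance")` yields `/geodataspace/unit-7/provenance`.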
Publish and Re-execute geounits
• Still in the works
• Each geounit can be published through Globus Publish and re-executed through an analysis platform
Science Drivers
Solid Earth
Space Science
Hydrology
CSDMS
Solid Earth
• Enable reproducible, replayable geounits of GPlates
• GPlates
  – The software package has several dependencies
• Create geounits of the Kinematic Representation of the Surface of the Earth (3D and 4D models), comprising:
  – the GPlates software,
  – the GPML files (XML for plate tectonics) used in the model, and
  – the output: GPML files in simple X/Y format, or visualization files (a global set of visualization output, including images)
• Integrating geounits in Python workflows
  – Incorporate metadata from workflows and use geounit metadata to inform workflows
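Capturing metadata from Python workflows, and feeding it back into later runs, could look like the decorator below. `geounit_step`, `last_params`, and the in-memory `GEOUNIT_LOG` are hypothetical; a real integration would write these records into a geounit rather than a module-level list.

```python
import functools
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory stand-in for the metadata a geounit would hold.
GEOUNIT_LOG: List[Dict[str, Any]] = []


def geounit_step(name: str) -> Callable:
    """Decorator that records each call of a workflow step.

    Only keyword arguments are recorded, to keep the sketch short.
    """
    def wrap(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            GEOUNIT_LOG.append({
                "step": name,
                "function": fn.__name__,
                "params": dict(kwargs),
                "seconds": time.perf_counter() - start,
            })
            return result
        return inner
    return wrap


def last_params(step: str) -> Optional[Dict[str, Any]]:
    """Return the most recently recorded parameters of a step, if any,
    so a later workflow run can be informed by earlier geounit metadata."""
    for rec in reversed(GEOUNIT_LOG):
        if rec["step"] == step:
            return rec["params"]
    return None
```

Decorating a step with `@geounit_step("reconstruct")` records each call's keyword arguments and run time, and `last_params("reconstruct")` returns the parameters of the latest call.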
Hydrology
• Data processing steps for the VIC model
[Figure: the VIC data processing steps, captured as geounit 1, geounit 2, geounit 3, and geounit 4]
Objective: Monitor changes in the data processing steps and compare them across runs
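Monitoring changes across runs can be framed as comparing per-step fingerprints. The sketch below assumes each run is summarized as a mapping from step name (for example, "geounit 1" to "geounit 4") to a hash of that step's parameters and output; `fingerprint` and `compare_runs` are hypothetical helpers.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(params: dict, output: bytes) -> str:
    """Hash a step's parameters (canonical JSON) together with its output bytes."""
    h = hashlib.sha256(json.dumps(params, sort_keys=True).encode("utf-8"))
    h.update(output)
    return h.hexdigest()


def compare_runs(run_a: Dict[str, str], run_b: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two runs, each mapping step name -> fingerprint."""
    return {
        "changed": sorted(s for s in run_a.keys() & run_b.keys()
                          if run_a[s] != run_b[s]),
        "only_in_a": sorted(run_a.keys() - run_b.keys()),
        "only_in_b": sorted(run_b.keys() - run_a.keys()),
    }
```

Serializing parameters with `sort_keys=True` makes the hash independent of dictionary ordering, so only real changes show up as differences.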
Space Science
• Create geounits of SuperDARN data and its plotting products
• Publish them for validation
CSDMS
• How geounits should be coupled
• Metadata alignment issues
• If we create geounits of CSDMS models, how do we enable suitable search interfaces over both the provenance metadata and the CSDMS metadata?
Current Work
• Working with use cases to bootstrap geounits
• Populating geounits from Python workflows and incorporating geounits into workflows
• Interfacing the geounit Client with Globus Catalog
• Improving distributed search functionality
Track it!
• http://workspace.earthcube.org/geodataspace
• Software, Source code, Science Use Cases, Reports, Presentations, News
Acknowledgements
• National Science Foundation
• EarthCube Community
• Globus team
• CI team