Process automation
for data-driven science


Ian Foster
Computation Institute
Argonne National Laboratory & The University of Chicago

Talk at Materials Genome Initiative Workshop, May 14-15, DC
                                                              www.ci.anl.gov
                                                              www.ci.uchicago.edu
Where we want to get to
    Imagine if, when tackling a problem, we could
    easily, both alone and within a distributed team:
    • Assemble, integrate, and interpret all relevant
       data—organized within a knowledge network
    • Be informed of anomalies, patterns, and gaps
    • Formulate and evaluate computational models
    • Launch automated processes to test
       hypotheses & expand the knowledge network
    All within an environment in which productive
    strategies could be easily scaled—and repeated
The attractive vs. the pragmatic
•   Some attractive goals expressed yesterday
    – “Record the complete process used to generate data”
    – “Define standard formats and metadata”
    – “Make users rate data every time they use it”
    – “Eliminate incorrect data from databases”


•   My pragmatic take on how best to proceed
    –   “Identify, automate, and streamline key
         processes to make desirable behaviors
         easy”

TripIt exemplifies process automation

    Me             Other services
    Book flights   Record flights
                   Suggest hotel
    Book hotel     Record hotel
                   Get weather
                   Prepare maps
                   Share info
                   Check prices
                   Monitor flight
Process automation for science
    Run experiment
    Collect data
    Move data
    Check data            >5,000 registered users, >4 PB moved

    Annotate data
    Share data
    Find similar data     >25,000 registered users, >1 PB access

    Link to literature
    Analyze data
                          >45,000 metagenomes, 12 Tbp
    Publish data
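
The chain of steps on this slide can be read as a pipeline in which each stage hands its result to the next. A minimal sketch in Python, assuming hypothetical stage functions and record fields (none of these names come from the talk or from any Globus API):

```python
from typing import Callable, Dict, List

# Hypothetical record passed between stages; the fields are illustrative only.
Record = Dict[str, object]
Stage = Callable[[Record], Record]


def collect(record: Record) -> Record:
    # Stand-in for reading instrument output.
    return {**record, "data": [1.0, 2.0, 3.0]}


def check(record: Record) -> Record:
    # Stand-in for a validity check: reject empty datasets.
    if not record.get("data"):
        raise ValueError("no data collected")
    return {**record, "checked": True}


def annotate(record: Record) -> Record:
    # Stand-in for attaching metadata so the data can be found later.
    return {**record, "metadata": {"points": len(record["data"])}}


def run_pipeline(stages: List[Stage], record: Record) -> Record:
    """Apply each stage in order and remember which steps ran."""
    history: List[str] = []
    for stage in stages:
        record = stage(record)
        history.append(stage.__name__)
    return {**record, "history": history}


result = run_pipeline([collect, check, annotate], {"experiment": "demo"})
```

Because every stage has the same signature, further steps (move, share, publish) could be appended to the list without changing `run_pipeline`; the recorded history is one cheap way to capture part of the process used to generate the data.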
A simple take on “big process for science”
    Research Data Management-as-a-Service
    SaaS:  Globus Transfer | Globus Storage | Globus Collaborate | Globus Catalog | …
    PaaS:  Globus Integrate | …




Globus Transfer: Data movement
Globus Transfer details
• Reliable file transfer.
     –   Easy “fire-and-forget” transfers
     –   Automatic fault recovery
     –   High performance
     –   Across multiple security domains
• No IT required.
     – Software as a Service (SaaS)
           • No client software installation
           • New features automatically available
     – Consolidated support & troubleshooting
     – Works with existing GridFTP servers; Globus Connect for “last mile”
• >5000 users, >4 Petabytes and 500,000,000 files moved
• >99.9% uptime in 2012
Adopted by Advanced Photon Source, NERSC, Blue Waters, campuses
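
The "fire-and-forget" and "automatic fault recovery" bullets describe a pattern that can be sketched generically: the caller submits a job and moves on, while a background worker retries after transient failures. The Python below is an illustration only, not how Globus Transfer is implemented; `TransientError`, `transfer_with_retry`, and `make_flaky_copy` are hypothetical names, and the copy operation is simulated.

```python
import concurrent.futures
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical error type standing in for a recoverable network fault."""


def transfer_with_retry(copy_once: Callable[[], int], max_attempts: int = 5,
                        base_delay: float = 0.01) -> int:
    """Call copy_once until it succeeds, backing off between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return copy_once()
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: wait longer after each failure.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def make_flaky_copy(failures: int) -> Callable[[], int]:
    """Build a fake copy operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def copy_once() -> int:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated dropped connection")
        return state["calls"]  # number of attempts it took

    return copy_once


# "Fire and forget": submit the job, carry on, and collect the result later.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(transfer_with_retry, make_flaky_copy(failures=2))
attempts_needed = future.result(timeout=5)
executor.shutdown()
```

In a hosted service the retry loop runs on the provider's side, so the user's machine does not need to stay connected while the transfer completes.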
Globus Storage and Globus Collaborate
Globus Storage: For when you want to …

•    Place your data where you want
•    Access it from anywhere via different protocols
     (Globus Transfer, HTTP/REST, Desktop sync)
•    Update it, version it, and take snapshots
•    Share versions with who you want
•    Synchronize among locations

     Diagram labels: Globus Storage volume; Commercial storage service
     provider; National research center; Campus computing center
Globus Collaborate: For when you want to
Join with a few or many people to:
• Share documents
• Track tasks
• Send email
• Share data
• Do whatever


With:
• Common groups
• Delegated management
Globus Storage & Collaborate in action
    People shown:
    • Kyle
    • Bryce
    • DTI Group (Kyle, Bryce)

    Actions shown in the diagram:
    • Globus Storage: Create volume and share with TBI group
    • Globus Connect: Move MRI files to TBI shared volume
    • Globus Nexus: Add Bryce to TBI collaboration
    • Globus Transfer: Copy TBI data to compute cluster (PADS Compute Cluster)
    • Globus Transfer: Move DTI results to shared volume
    • Globus Storage: Create snapshot to share with group
    • Globus Connect: Move DTI results to Bryce’s laptop
    • Globus Collaborate: Publish DTI data to TBI web site

    Storage shown alongside the “TBI” volume:
    • Amazon S3
    • SDSC Cloud
    • UChicago Object Store
    • Cornell Red Cloud

    TBI = Traumatic Brain Injury
    DTI = Diffusion Tensor Imaging
    MRI = Magnetic Resonance Imaging
Use case: Earth System Grid




Outsource data transfer to Globus
 – Data download from search
 – Data transfer to another server
 – Replication between sites
Next step is automated publication
No ESGF client software needed
Data acquisition, management, analysis




           Experiments    Literature (don’t forget!)    Computations




     Big Data (volume, velocity, variety, variability)
       … demands Big Process in order for discovery to scale
How to proceed
•    Top down:
     – Large-scale integration, standardized formats,
       common protocols, etc.
     – Good if achieved, but likely to be slow and painful


•    Bottom up:
     – Consider opportunities to encourage useful
       behaviors via outsourcing and automation
     – Making data accessible is the first (and easiest?) 90%
     – Facilitate sharing, annotation, emergence of
       (localized) structure, bridging among structures
Acknowledgements
•    Thanks for vital and much appreciated support:
     – DOE Office of Advanced Scientific Computing
       Research (ASCR)
     – NSF Office of Cyberinfrastructure (OCI)
     – National Institutes of Health
     – The University of Chicago
•    Thanks to the Globus Online team at the
     University of Chicago and Argonne for their
     amazing work. See
     https://www.globusonline.org/about/goteam/
Thank you!


foster@anl.gov
foster@uchicago.edu





Editor's Notes

  1. Given continued exponential growth along so many dimensions … process efficiencies must improve at a comparable rate just to maintain constant progress
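
The arithmetic behind the note above can be made concrete with a toy model. The sketch assumes, purely for illustration, that the effort needed equals data volume divided by process efficiency and that both grow by a fixed factor each year; the growth rates are made-up inputs, not figures from the talk.

```python
def relative_effort(data_growth: float, efficiency_growth: float, years: int) -> float:
    """Effort relative to year 0, assuming effort = data volume / process efficiency."""
    return (data_growth / efficiency_growth) ** years


# Assumed rates: data volume doubles every year.
effort_matched = relative_effort(2.0, 2.0, 10)  # efficiency also doubles yearly
effort_lagging = relative_effort(2.0, 1.5, 10)  # efficiency grows 50% per year
```

Under these assumptions, effort stays flat only when efficiency keeps pace with the data; with the lagging rate, the same team would face roughly 18 times the effort after ten years.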