GlobusWorld 2019 Opening Keynote

This presentation was given by Steve Tuecke, Ian Foster and Rachana Ananthakrishnan, all from the University of Chicago and Argonne National Lab.

Published in: Technology


  1. 1. Manage Data with Assurance. Ian Foster, Rachana Ananthakrishnan, Steve Tuecke, Vas Vasiliadis
  2. 2. Mission: Increase the efficiency and effectiveness of researchers engaged in data-driven science and scholarship through sustainable software
  3. 3. Data keeps moving!
  4. 4. Globus by the numbers: 7,400 active shared endpoints; 100+ subscribers; 600 PB moved; 22,000 active personal endpoints; 90 billion files processed; 1,800 active server endpoints; 3 months: longest running transfer; 1 PB: largest single transfer to date; 99.9% availability; 600+ identity providers; 2,000+: most shared endpoints at a single institution; 138,000 registered users
  5. 5. Chart: Active Endpoints by Month, Jan-14 to Jan-19 (vertical axis 0 to 4,500; series: Free, Subscribed)
  6. 6. Globus User Story Highlights: What makes it all worthwhile. Themes: File Sharing Value; Improved Performance; Ease of Use; Connector Benefits; Platform Development. Quotes: “We needed an easy way to share terabytes of data on a regular basis with dozens of researchers. Thanks to Globus sharing, it’s easy for us to get our researchers the data they need.” “Now Canadian researchers have a single repository where data can easily and securely be accessed, searched and shared.” “With Globus, our researchers have one less thing to worry about!” “I routinely have to move hundreds of gigabytes of data – Globus makes it easy, so I can execute these transfers with very little effort.” “Users can quickly, effectively, and securely share data with their research community or the broader public.” “WVU uses Globus to archive research data out to Google Drive.” “[BlackPearl with Globus] enables us to archive and share petabytes of information in a convenient solution.” Usage Briefs: www.globus.org/usage-brief-library User Stories: www.globus.org/user-stories
  7. 7. “Whatever you are studying right now, if you are not getting up to speed on deep learning, neural networks, etc., you lose. We are going through the process where software will automate software, automation will automate automation.” -- Mark Cuban
  8. 8. Opportunities for AI in science: Research today. Diagram labels: Configure apparatus/write code; Run experiments; Analyze and plan; Create knowledge; Solve societal problems; “What scientists want to do”; “Most scientist time”
  9. 9. Opportunities for AI in science: Research tomorrow. Diagram labels: Configure apparatus/write code; Run experiments; Analyze and plan; Create knowledge; Solve societal problems; AI assistants; “Most scientist time”
  10. 10. AI at Argonne: data-driven discovery. Strong and weak lensing in sky survey data; Prediction of antimicrobial resistance phenotypes; Prediction of radiation stopping power; Identification and tracking of storms; Parameter extraction in atom probe tomography; Learning for dynamic sampling in spectroscopy; Structure-property-process triangle in additive manufacturing; Vehicle energy consumption prediction; Photometric red shift estimation; New materials for efficient solar cells; Cosmic Microwave Background emulation; Enhancement of noisy tomographic images; Nowcasting with convolutional LSTMs; Efficient climate model emulators; Defect-level prediction in semiconductors; Flying object detector for edge deployment; Discovery of new energy storage materials; Reduced order modeling of laser sintering
  11. 11. Rethinking Data infrastructure for Science AI (diagram). Processing steps: Data ingest; Data enhancement; Data QA/QC; Feature selection; Model creation; Model training; HPO; UQ; Model reduction; Inference; Active/reinforcement learning. Sources and actors: Scientific instruments (Major user facilities, Laboratory equipment, Automated labs, …); Sensors (Environmental, Laboratories, Mobile, …); Simulation codes (Computational results, Function memorization, …); Databases (Reference data, Experimental data, Computed properties, Scientific literature, …); Scientists (Expert input, Goal setting, …); AI industry, academia (New methods, Open source codes, AI accelerators, …). Other labels: AI Workflows; Data; Models; Accelerators; Compute; Agile Infrastructure; Surrogates
  12. 12. Rethinking Data infrastructure for Science AI (same diagram as slide 11), adding Agile services: Data transfer; Registries; Data sharing; Containers; Integrity; Automation; FaaS; Identifiers
  13. 13. Rethinking Data infrastructure for Science AI (same diagram as slide 12), adding: Transfer; Auth; Sharing
  14. 14. Rethinking Data infrastructure for Science AI (same diagram as slide 13), adding: funcX; Automate; Identifiers
  15. 15. Rethinking Data infrastructure for Science AI (same diagram as slide 14), adding: DLHub; xDF; Parsl; Petrel; CANDLE
  16. 16. DLHub: Organizing and Serving Models (Models and Processing Logic as a Service) • Collect, publish, categorize models • Serve models via API with access controls to simplify sharing, consumption, and access • Leverage ALCF resources and prepare for Exascale ML • Deploy and scale automatically • Provide citable DOI for reproducible science. Example areas shown: X-Ray Science; Energy Storage; Tomography. Work cited: Cherukara et al.; Ward et al.; TomoGAN: Liu et al. Figure labels: Input; Output. Argonne Advanced Computing LDRD. www.dlhub.org
  17. 17. funcX: Think “compute endpoints”
  18. 18. funcX: Think “compute endpoints”
  19. 19. Automation: Ripple Pipelines
  20. 20. Automation: Neuroanatomy Web form User input Search Ingest Share Set policy Identifier Mint DOI funcX Auth Get credentials Automate Run job Describe Get metadata Transfer Transfer data funcX Run job Transfer Transfer data
  21. 21. Manage Protected Data: Higher assurance levels for HIPAA and other regulated data • Support for managed data transfer of protected data such as health-related information • Share data with collaborators while meeting compliance requirements • Administration and management of access • Includes BAA option
  22. 22. Globus for high assurance data management • Restricted data handling – PHI (Protected Health Information) – PII (Personally identifiable information) – Controlled Unclassified Information • University of Chicago security controls – NIST 800-53 Low – Superset of 800-171 Low • Business Associate Agreements (BAA) between University of Chicago and our subscribers
  23. 23. Services in scope • Globus Services: Auth, Transfer & Sharing, Groups • Globus Connect Server v5.2 and above • Globus Connect Personal v3.x • Web app (app.globus.org) • Globus Command Line Interface (CLI) • Connectors: POSIX, Google Drive, AWS S3, CEPH
  24. 24. Restricted data disclosure to Globus • Globus never sees file contents – File contents can contain restricted data • File paths/names can contain restricted data (e.g. PHI) • No other elements (endpoint definitions, labels, collection definitions) can contain restricted data
  25. 25. Product enhancements for high assurance • Additional authentication assurance – Authenticate with specific identity within specific time within a session • Isolation of applications – Authentication context is per application, per session (~browser session) • Enforces encryption of all user data in transit • Audit logging – Both at the institution and Globus services
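
The "authenticate with specific identity within specific time within a session" rule on slide 25 can be read as a freshness check over the logins recorded in a session. The sketch below is only an illustrative model of that idea, not Globus Auth's implementation: `SessionAuth`, `meets_assurance`, the example identities, and the 30-minute window are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SessionAuth:
    """One authentication event recorded in a session (hypothetical model)."""
    identity: str      # identity the user logged in with
    auth_time: float   # login time, in seconds since the epoch


def meets_assurance(
    session: Iterable[SessionAuth],
    required_identity: str,
    max_age_seconds: float,
    now: float,
) -> bool:
    """Return True if the session holds a recent-enough login by the required identity."""
    return any(
        auth.identity == required_identity
        and 0 <= now - auth.auth_time <= max_age_seconds
        for auth in session
    )


# Assumed policy: a login by alice@university.example within the last 30 minutes.
session = [
    SessionAuth("alice@personal.example", auth_time=1_000.0),
    SessionAuth("alice@university.example", auth_time=2_000.0),
]
fresh = meets_assurance(session, "alice@university.example", 1_800, now=3_000.0)
stale = meets_assurance(session, "alice@university.example", 1_800, now=4_000.0)
print(fresh, stale)
```

In this model a login with a different linked identity never satisfies the policy, and an old login by the right identity stops satisfying it once the window has passed, at which point the user would need to authenticate again.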
  26. 26. Product enhancements for high assurance • Additional security requirements enforced on management of all high assurance resources – Data access, and any interaction that can lead to data access – Examples: Groups, Management Console • Enhanced user interfaces for seamless management of protected data – Webapp and CLI
  27. 27. Operational enhancements for high assurance • Intrusion detection and prevention • Encryption • Enhanced logging • Secure remote access, access control, and secure practices for laptops • Uniform configuration management and change control • AWS best practices for secure environment: VPCs, security groups, IAM best practices
  28. 28. New subscription levels • High Assurance – 33% uplift on Standard subscription and on premium connectors used for high assurance data • BAA – All High Assurance features + BAA with University of Chicago – 50% uplift on Standard subscription and on premium connectors used under a BAA
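
As a worked example of the uplift arithmetic on slide 28, the sketch below assumes that "uplift" means a multiplicative surcharge (33% becomes a factor of 1.33) applied to the Standard subscription fee plus the premium connectors used for high assurance or BAA data, while other connectors are charged at their normal fee. The fee amounts and the `annual_cost` helper are hypothetical; the slide gives percentages only.

```python
from decimal import Decimal
from typing import Iterable

# Uplift percentages taken from the slide; "standard" is the no-uplift baseline.
UPLIFT = {
    "standard": Decimal("0"),
    "high_assurance": Decimal("0.33"),
    "baa": Decimal("0.50"),
}


def annual_cost(
    standard_fee: int,
    covered_connector_fees: Iterable[int],
    other_connector_fees: Iterable[int],
    level: str,
) -> Decimal:
    """Apply the uplift to the Standard fee and to connectors used at that level."""
    uplifted_base = Decimal(standard_fee) + sum(
        (Decimal(fee) for fee in covered_connector_fees), Decimal(0)
    )
    untouched = sum((Decimal(fee) for fee in other_connector_fees), Decimal(0))
    return uplifted_base * (Decimal(1) + UPLIFT[level]) + untouched


# Hypothetical fees: 10,000 Standard, one covered connector and one other at 2,000 each.
high = annual_cost(10_000, [2_000], [2_000], "high_assurance")
baa = annual_cost(10_000, [2_000], [2_000], "baa")
print(high, baa)
```

With these made-up figures the uplifted base is 12,000, so the High Assurance total is 12,000 × 1.33 + 2,000 and the BAA total is 12,000 × 1.50 + 2,000.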
  29. 29. High Assurance Demonstration
  30. 30. Web app enhancements • Accessibility – Target WCAG 2.0 AA compliance • Responsiveness and touch • Works with new connectors collections.globus.org
  31. 31. Web app enhancements • Customizable interface • Full screen view • Compact file listing display • Remember user configuration – Single vs. dual panel – Columns displayed • Continue incorporating user feedback
  32. 32. CLI enhancements • Support for use with high assurance collections • '--format UNIX' flag - output suitable for line-oriented processing with typical Unix tools • 'globus rm' command • 'globus whoami --linked-identities' flag to show all linked identities • '--timeout-exit-code' flag overrides the default exit code for commands which wait on tasks • Enhancements to SDK as needed.
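
For scripting, the CLI flags listed on slide 32 can be assembled into argument lists and passed to `subprocess`. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the lower-case spelling `unix` for the format value and the use of `globus task wait` with `--timeout` as the command that waits on a task are assumptions to confirm with `globus --help`, and nothing is executed unless a `globus` executable is found on the PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def whoami_argv() -> List[str]:
    """List all linked identities in line-oriented output."""
    return ["globus", "whoami", "--linked-identities", "--format", "unix"]


def rm_argv(endpoint_id: str, path: str) -> List[str]:
    """Delete a path on an endpoint; the target is written as ENDPOINT_ID:PATH."""
    return ["globus", "rm", f"{endpoint_id}:{path}"]


def task_wait_argv(task_id: str, timeout_seconds: int, timeout_exit_code: int) -> List[str]:
    """Wait on a task, choosing the exit code returned if the wait times out."""
    return [
        "globus", "task", "wait", task_id,
        "--timeout", str(timeout_seconds),
        "--timeout-exit-code", str(timeout_exit_code),
    ]


def run(argv: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run the command if the CLI is installed; return None otherwise."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    result = run(whoami_argv())
    if result is None:
        print("globus CLI not found; would have run:", " ".join(whoami_argv()))
    else:
        print(result.stdout)
```

Building the argument list separately from running it keeps the destructive `globus rm` call easy to review or log before it is executed.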
  33. 33. Connector updates • Enhanced user experience for credential handling for several connectors (GCSv5) • AWS S3 – Automated multi-region support • Google Drive – Enhancement to retry handling for large transfers • HPSS – Support added for HPSS 7.5 (7.3 to 7.5 supported) – Improved asynchronous staging from tape – New home for documentation: docs.globus.org/premium-storage-connectors/hpss
  34. 34. S3 compatible systems • Initial customer deployments • Validation, testing and vendor engagement planned • Additional systems driven by customer demand
  35. 35. Announcing our latest connector… beta globus.org/connectors/box
  36. 36. Globus for Box • Extends the value of your Box deployment • Unifies access to cloud and on-prem storage • Transitions protected data (HIPAA-regulated, CUI) seamlessly between Box and other storage systems
  37. 37. Box for Globus Demonstration
  38. 38. Make Box part of your research storage ecosystem globus.org/connectors/box docs.globus.org/premium-storage-connectors/box
  39. 39. Globus Connect Server v5.3 • Subsumes GCS versions 5.0, 5.1, 5.2 • Standard and high assurance guest collections (sharing) • High assurance mapped collections • Connectors: POSIX, AWS S3, CEPH, Google Drive, Box • Data access protocols: GridFTP and HTTPS • A single deployment supports both high assurance and standard gateways • Upgrade all v5.x deployments to v5.3
  40. 40. Recent Transfer enhancements • Verify transfer using client-provided checksums – User-provided checksum used rather than source checksum for verification • Improvements for scaling transfer service – Multiple nodes for transfer service for higher availability and reliability – Allows for code updates with no downtime
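
The idea of verifying against a client-provided checksum (slide 40) can be sketched locally: hash the file at the destination and compare the result with the value the client supplied, rather than with a checksum computed at the source. This is an illustration of the technique only, not the Globus Transfer API; the helper names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_client_checksum(dest_path: str, client_checksum: str,
                                   algorithm: str = "sha256") -> bool:
    """Compare the destination file's hash with the checksum the client supplied."""
    return file_checksum(dest_path, algorithm) == client_checksum.strip().lower()


# Demo: the "client" knows the expected checksum before the file arrives.
payload = b"example research data\n"
expected = hashlib.sha256(payload).hexdigest()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    dest = tmp.name
try:
    ok = verify_against_client_checksum(dest, expected)
    bad = verify_against_client_checksum(dest, "0" * 64)
finally:
    os.remove(dest)
print(ok, bad)
```

Checking against a value the client already holds also catches the case where the source copy itself was corrupted before the transfer started, which a source-versus-destination comparison would miss.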
  41. 41. SSH with OAuth • Securely access resources using SSH with federated identity – Facilitates automation, eliminates SSH key management – Replacement for deprecated GSI OpenSSH • First version released – Server side PAM module with Globus Auth support – Command line client • Open source, community support – Not part of the standard subscription – OAuth SSH Client: https://pypi.org/project/oauth-ssh/ – OAuth SSH Server PAM module: https://github.com/xsede/oauth-ssh
  42. 42. Where are we headed?
  43. 43. Enhancing the core: Transfer Building the future: Platform
  44. 44. Globus Transfer: A complete solution ☑ Bulk transfer and sync ☑ Good end-to-end performance in a myriad of real-world settings ☑ End-to-end reliability ☑ Robust security, with federated identities ☑ Layers onto diverse storage systems ☑ Web-compatible client/server remote access ☑ Easy to use interfaces ☑ Easy installation and administration ☑ Sharing data with guest users ☑ Dedicated professional support
  45. 45. HTTPS and what it enables • Browser based up/download • Allow your (research) storage to be “on the web” • Enforce same security policies
  46. 46. Globus Connect Server v5 Milestones: v5.0: Google Drive; v5.1: POSIX guest collections, HTTPS; v5.2: High assurance; v5.3; v5.4: …; v5.x: v4 feature parity+. Also listed: • Multi DTN support • Additional storage systems • Endpoint specific identity providers • … Other features
  47. 47. GCSv5: Key enabling technology for the future • Challenge: Managing an increasing amount of shared, dynamic state among multiple DTNs – Endpoint configuration – Multiple storage gateway configurations – Collection configurations – Credentials (user and system) • Approach: Stateless DTNs – No persistent state on DTN – Multi-DTN endpoints without a shared file system • GCS state stored in the cloud – Dynamic sync of state to each DTN – Enabled by our use of AWS AppSync • Customer managed encryption keys with optional escrow – Only you can see and modify your endpoint’s state • Facilitates creation of new Globus Connect features
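
The stateless-DTN approach on slide 47 can be pictured with a toy model: configuration lives in one central store, and each node keeps only an in-memory copy that it refreshes whenever the store's version changes. This sketch is purely illustrative and makes several assumptions: `CloudStateStore` and `StatelessDTN` are hypothetical names, the store is an in-process object rather than a cloud service, nodes poll rather than receive pushed updates, and the customer-managed encryption described on the slide is omitted.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CloudStateStore:
    """Hypothetical stand-in for centrally stored endpoint state."""
    version: int = 0
    state: Dict[str, Any] = field(default_factory=dict)

    def update(self, key: str, value: Any) -> None:
        """Change one configuration item and bump the version."""
        self.state[key] = value
        self.version += 1


class StatelessDTN:
    """A node with no persistent state: everything is re-read from the store."""

    def __init__(self, name: str, store: CloudStateStore) -> None:
        self.name = name
        self._store = store
        self._cache: Dict[str, Any] = {}
        self._seen_version = -1  # forces a sync on first use

    def sync(self) -> bool:
        """Refresh the in-memory copy if the store has changed; report whether it did."""
        if self._seen_version == self._store.version:
            return False
        self._cache = copy.deepcopy(self._store.state)
        self._seen_version = self._store.version
        return True

    def config(self, key: str) -> Any:
        self.sync()
        return self._cache[key]


store = CloudStateStore()
store.update("collection", {"base_path": "/data"})

dtn_a = StatelessDTN("dtn-a", store)
dtn_b = StatelessDTN("dtn-b", store)      # a second node needs no shared file system
store.update("collection", {"base_path": "/data/v2"})
replacement = StatelessDTN("dtn-c", store)  # a rebuilt node recovers state from the store
print(dtn_a.config("collection"), dtn_b.config("collection"), replacement.config("collection"))
```

Because no node holds the authoritative copy, replacing a failed node is the same operation as adding a new one.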
  48. 48. GCSv5 has significant admin benefits • Greatly simplified multi-DTN deployment – Bootstrap DTN from only client id & secret, and encryption key – No more copy-pasting GCS config files with every change – Command line, REST API, and (eventually) web admin of GCS – Automatic synchronization amongst DTNs • Rapid recovery from failures – Restore all nodes from stored state with minimal effort – No local backups of GCS state required • Lost client ID/secret? Recover them from Auth. • Enables us to roll out new features more quickly
  49. 49. What does it mean for you? • No sudden moves! • Ready for GCS v4 to v5 migration late this year • Tools will be available for migration from GCS v4 • Comprehensive documentation • Long migration period with parallel support of v5 & v4 • Only use GCS v5 today if you need its specific features, otherwise continue to use GCS v4
  50. 50. Planned Features for Globus Transfer • S3 compatible HTTPS interface to GCSv5 storage • Browser based up/downloaders • Multiple checksum algorithm support • Manifest support • Automated recurring replication as a service • …
  51. 51. Rethinking data publication • Limited adoption – Not easily customizable • Maintenance Challenges – Costly to maintain – JRE licensing concerns • Going forward – Code will be open source – Leverage platform • Invest in higher priorities
  52. 52. Platform challenge • Transform how research applications, services, and workflows are created, delivered, used, and sustained – Scientific instrument data processing – Repositories: Make data more FAIR – Science gateways • Interoperable ecosystem
  53. 53. Globus platform services • Identity and Access Management (IAM) – Federated identity login, Groups, Attributes, Access Control – Auth: OAuth authorization provider • Connect • Transfer – Will become a family of services • Execution • Search, Identifiers • Automation – Queues, Events, Actions, Triggers – Flows
  54. 54. Globus Platform: Automation
  55. 55. Platform status • Generally Available in a few years • Separate product with separate sustainability model • Early engagements help shape product direction – Argonne Leadership Computing Facility, Materials Data Facility, NCAR Research Data Archive, NSO, … – Use in Globus products • Multiple integrations facilitate a more complete solution – e.g. Django, JupyterHub – Follow progress: globus-integration-examples.readthedocs.io • Currently accessible via professional services team
  56. 56. We are committed to doing all this sustainably
  57. 57. Our focus is you, the research community
  58. 58. Why not do a for-profit? Focus: Investor ROI → can’t serve you properly!
  59. 59. Sustainability >> $$ No single points of failure
  60. 60. Subscriber Value = Engineering (DevOps) + Customer facing operations (support, sales, outreach, training, professional services)
  61. 61. Freemium means managing tension! Meeting current customer needs…
  62. 62. …and furthering strategic aspirations
  63. 63. Customer community; Delivering on requests; Product planning process; Contractual challenges
  64. 64. Is there a better model? Internet2-like membership?
  65. 65. Network infrastructure services provider; Research software provider
  66. 66. Member fee ≈ sustainability Governance model ≈ product influence
  67. 67. Do the dynamics change? - Willingness to join/pay? - Sufficient revenue growth? - Greater subscriber satisfaction?
  68. 68. Why now? Increasing view of Globus as “enterprise” service: RCC → CIO
  69. 69. Data management needs are increasingly pervasive ✓ Network ✓ Cycles ✓ Storage Robust data management for all?
  70. 70. Expand the dialogue HPC Management + IT Leadership + Researcher Community
  71. 71. From “Purchase” to “Invest” Everyone derives more value if Globus is a strategic partner
  72. 72. Intrigued? Confused? Amused? Share your thoughts with us!
  73. 73. Thank you to our sponsors... U.S. Department of Energy
  74. 74. THANK YOU, subscribers!
  75. 75. Program Preview • Today – Lightning talks – Guest keynotes: Tom Barton, Bobby Kasthuri – Reception • Tomorrow – Tutorials – Office Hours • Friday morning – Customer forum globusworld.org/conf/program
  76. 76. #globusworld @globus
