I was invited to talk at the 18th GENI Engineering Conference on experiences in the Grid community with creating and operating large shared infrastructures. I chose to focus on our experiences using Software as a Service (SaaS: aka Cloud) to reduce barriers to the use of the capabilities required to create and operate virtual organizations.

  • Foster, Kesselman, and Tuecke claimed that grids were all about “virtual organizations.” The way one should interpret that claim, I would assert, is in the context of Gilder’s comments. Things are distributed for one reason or another: via a deliberate disintegration process, via outsourcing, or because they just started out distributed. Now we need to reassemble them in a controlled manner. We gave some examples.
  • 173 TB/day
  • Question: Which steps can we outsource in that way?
  • Globus Nexus makes it easy for individuals, teams, and institutions to create web applications for the science community. It provides a flexible, powerful Platform-as-a-Service to which developers can outsource their identity, group, and profile management needs. Users encounter intuitive interfaces with a common look and feel across different services.
  • Four obstacles to collaborative application development. Build collaborative applications: outsource identity, group, and profile management; REST API for flexible integration; intuitive, customizable interfaces.
  • Slide 6: groups should have a use case. KBase is a good example. A few things we do for them: all users that log in to the KBase branded site are automatically added to a KBase group, and they then create sub-groups under that for various things; they use groups to provide access control to various of their resources; they use Nexus OAuth to get tokens that their clients can use to authenticate with the KBase REST APIs. Can define policies on groups (membership acceptance, invitation, etc.). Can set requirements for custom attributes for joining. Groups can be used for authorization decisions. We use groups for Crowd/Confluence and Drupal.
  • Invitations, roles, policies
  • Different interfaces. Amazon-based infrastructure, highly available and elastic. Distributed architecture (AWS), uses ELBs to allocate workload, stateless Nexus services. Scalable, extensible graph model: we can change the model easily and quickly. Distributed NoSQL databases to store the schemaless graph efficiently. Professional hosting, with lots of other services like monitoring, logging, and security that are managed across GO.
  • More specifically, the opportunity is to apply a very modern technology, software as a service (SaaS), to address a very modern problem, namely the enormous challenges inherent in translating revolutionary 21st century technologies into scientific advances. Our SaaS approach will address these challenges, both making powerful tools far more widely available and reducing the cycle time associated with research and discovery. Achieve economies of scale. Reduce cost per researcher dramatically. Achieve positive returns to scale (PRTS). Most academic solutions do NOT have PRTS. Most industrial solutions DO have PRTS.
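The group notes above (admission policies, required custom attributes, groups used for authorization decisions) can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the Globus Nexus API: the Group class, its fields, the authorize helper, and the kbase_curators subgroup are hypothetical names chosen for the example. The member statuses mirror the active, pending, and rejected categories reported on the usage slide.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical in-memory group; not the Globus Nexus data model."""
    name: str
    required_attributes: set = field(default_factory=set)  # profile fields needed to join
    auto_accept: bool = False                               # membership acceptance policy
    members: dict = field(default_factory=dict)             # username -> status

    def request_join(self, username, profile):
        """Apply the admission policy and record the resulting status."""
        provided = {key for key, value in profile.items() if value}
        missing = self.required_attributes - provided
        if missing:
            status = "rejected"   # a required custom attribute is absent
        elif self.auto_accept:
            status = "active"     # policy admits without review
        else:
            status = "pending"    # waits for a group administrator
        self.members[username] = status
        return status

    def is_active(self, username):
        return self.members.get(username) == "active"


def authorize(username, resource_groups):
    """Allow access if the user is an active member of any group tied to the resource."""
    return any(group.is_active(username) for group in resource_groups)


# Assumed setup: one auto-admitting top-level group and one reviewed subgroup.
kbase_users = Group("kbase_users", required_attributes={"email"}, auto_accept=True)
curators = Group("kbase_curators", required_attributes={"email", "institution"})

kbase_users.request_join("alice", {"email": "alice@example.org"})
curators.request_join("alice", {"email": "alice@example.org", "institution": "ANL"})
curators.request_join("bob", {"email": "bob@example.org"})

print(authorize("alice", [kbase_users]))  # active member of kbase_users
print(authorize("alice", [curators]))     # still pending in the subgroup
```

The point of the design is that the application only asks "is this user an active member of one of these groups?"; who gets admitted, and on what terms, stays with the group service and its policies.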
    1. 1. Hosted services for managing shared cyberinfrastructure Ian Foster Argonne National Laboratory & The University of Chicago Joint work with Rachana Ananthakrishnan, Josh Bryan, Kyle Chard, Mattias Lidman, Steven Tuecke, and others GENI Engineering Conference, NYC, October 28, 2013
    2. 2. Using cloud services to accelerate discovery Ian Foster Argonne National Laboratory & The University of Chicago Joint work with Rachana Ananthakrishnan, Josh Bryan, Kyle Chard, Mattias Lidman, Steven Tuecke, and others GENI Engineering Conference, NYC, October 28, 2013
    3. 3. Cyberinfrastructure • “a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge” [Wikipedia] • AKA eScience, eResearch, Computer Supported Collaborative Work, Grid, …
    4. 4. “The Anatomy of the Grid,” 2001: The … problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).
    5. 5. Grid technology accelerates discovery. Higgs discovery “only possible because of the extraordinary achievements of … grid computing” (Rolf Heuer, CERN DG). Large Hadron Collider
    6. 6. LHC Computing Grid “virtual organizations”
    7. 7. Complexity in research is large and growing: Run experiment; Collect data; Move data; Check data; Annotate data; Share data; Find similar data; Link to literature; Analyze data; Publish data
    8. 8. Process automation for discovery: Run experiment; Collect data; Move data; Check data; Annotate data; Share data; Find similar data; Link to literature; Analyze data; Publish data. Discovery IT as a service
    9. 9. First: File transfer as a service. (1) User initiates transfer request. (2) Globus Online moves and syncs files from Data Source to Data Destination. (3) Globus Online notifies user. Easy, fast, reliable, available, secure
    10. 10. Early adoption is encouraging 12
    11. 11. Early adoption is encouraging: 12,000 registered users; >150 daily; >25 PB moved; >1B files; 10x (or better) performance vs. scp; 99.9% availability; entirely hosted on Amazon
    12. 12. Next: Share big data from existing storage. (1) User A selects file(s) to share, selects user or group, and sets permissions. (2) Globus Online tracks shared files; no need to move files to cloud storage! (3) User B logs in to Globus Online and accesses shared file. Data Source permissions: File X: Users A, B: RW; Directory Y: Group G: R
    13. 13. Globus Online is SaaS for science. [Diagram labels: Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Connect; Globus Toolkit; SaaS]
    14. 14. We are now expanding to a platform. [Diagram labels: Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Online APIs; Globus Connect; Globus Toolkit; SaaS; PaaS]
    15. 15. Globus Online: Platform-as-a-Service. [Diagram labels: Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Online APIs; Globus Connect; Globus Toolkit]
    16. 16. The identity challenge in science • Research communities often need to – Assign identities to their users – Manage user profiles – Organize users into groups for authorization • Obstacles to high-quality implementations – Complexity of associated security protocols – Creation of identity silos – Multiple credentials for users – Reliability, availability, scalability, security
    17. 17. Streamline collaborative tool development • Allows developers to focus on core application logic • Simplifies integration with campus infrastructure. [Diagram labels: Custom Web Application; Globus Nexus (Identity, group, & profile management); Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Online APIs; Globus Connect; Globus Toolkit]
    18. 18. Nexus provides four key capabilities • Identity provisioning – Create, manage Globus identities • Identity hub – Link with other identities; use to authenticate to services • Group hub – User-managed groups; groups can be used for authorization • Profile management – User-managed attributes; can use in group admission. Key points: 1) Outsource identity, group, profile management 2) REST API for flexible integration 3) Intuitive, customizable Web interfaces
    19. 19. Identity provisioning • Globus Nexus can act as an identity provider (IDP) for a project – User management, email validation, … • DOE Systems Biology Knowledge Base (kBase) is an example of such a project • ~400 identities to date
    20. 20. Identity hub • Link identities from other federated IDP(s) with a Nexus identity – E.g., InCommon/Campus (SAML), Google (OpenID), XSEDE (OAuth MyProxy), IGTF-certified X.509 CA, SSH • Use linked identity to authenticate to Nexus – E.g., use campus identity, XSEDE identity (via OAuth) • Leverage Nexus federated IDP to 3rd-party services – Via OAuth or LDAP – E.g., to Jira, Zendesk, Drupal, Confluence • Have Nexus cache delegated credentials – X.509, via CILogon and MyProxy
    21. 21. Identity management 23
    22. 22. Identity hub: Biomedical science • Dr. Smith creates a Nexus id, via BIRN project interface • Dr. Smith links campus id and XSEDE id • Dr. Smith can then: – Authenticate to BIRN with campus id – Query catalog (Nexus/BIRN id) – Request data transfer from BIRN to campus (Nexus and campus ids) – Request transfer from BIRN to XSEDE (Nexus and XSEDE ids) – Repeat these tasks: use cached credentials. (BIRN = Biomedical Informatics Research Network.) [Diagram labels: Nexus identity (Name: Dr. Smith; Email: ; Linked id: Campus; Linked id: XSEDE); Campus (SAML); OAuth; BIRN Gateway; Campus identity; XSEDE identity; XSEDE; BIRN; Campus]
    23. 23. Use linked identity 25 25
    24. 24. Group hub • User-managed group creation, management • Flexible control over admission policies and visibility • Groups can be used in authorization decisions. Example: kBase • Every kBase user added to kbase_users • Subgroups also created • Groups used for access control
    25. 25. Group membership interface 27 27
    26. 26. Branded sites: XSEDE; Open Science Grid; University of Chicago; DOE kBase; Indiana University; University of Exeter; NERSC; NIH BIRN; Globus Online
    27. 27. Implementation and deployment. [Diagram labels: Elastic Load Balancer; three Nexus instances, each with REST API and Web; OSSEC; Logging; Monitoring]
    28. 28. Globus Nexus usage as of 9/13 • >12,000 users and 4977 linked identities • 557 groups totaling: – 1638 active members – 229 pending or invited members – 162 rejected or suspended members • Largest group (kbase) has 402 members. [Charts: total users over time; users in group]
    29. 29. Identities and groups in XSEDE • Proposal: Replace current ad-hoc systems with Globus Nexus identity and group service – Reduce complexity, reduce cost, increase capability • Careful process of documentation and review – “Architecture and development requirements: User and identity management” – “User management proposal: Affected use cases” – “User management proposal: Motivating stories” – “Proposal: Refactoring XSEDE identity and group capabilities” • Hope to reach closure by end of 2013
    30. 30. Cloud services to accelerate discovery. Accelerate discovery and innovation worldwide by providing research IT as a service. Leverage software-as-a-service to • provide millions of researchers with unprecedented access to powerful tools; • enable a massive shortening of cycle times in time-consuming research processes; and • reduce research IT costs dramatically via economies of scale
    31. 31. Thanks to ... U.S. DEPARTMENT OF ENERGY
    32. 32. Thank you! Questions?
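The sharing example on slide 12 (File X: Users A, B: RW; Directory Y: Group G: R) suggests a simple rule table consulted at access time, with the data staying on its original storage. The Python sketch below is one way such a check might look. It is not the Globus Online sharing implementation: AccessRule, can_access, the /data paths, and the assumption that a directory rule covers everything beneath it are all illustrative choices.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AccessRule:
    path: str         # shared file or directory on the existing storage
    principal: str    # "user:<name>" or "group:<name>"
    permissions: str  # "r" (read-only) or "rw" (read and write)


def can_access(rules, user, user_groups, path, mode):
    """Return True if any rule grants `mode` ("r" or "w") on `path`.

    Assumption: a rule on a directory also covers everything beneath it.
    """
    principals = {f"user:{user}"} | {f"group:{g}" for g in user_groups}
    target = PurePosixPath(path)
    for rule in rules:
        shared = PurePosixPath(rule.path)
        covers = target == shared or shared in target.parents
        if covers and rule.principal in principals and mode in rule.permissions:
            return True
    return False


# The permissions from slide 12, with hypothetical paths.
rules = [
    AccessRule("/data/X", "user:A", "rw"),
    AccessRule("/data/X", "user:B", "rw"),
    AccessRule("/data/Y", "group:G", "r"),
]

# User B may write file X; a member of group G may read, but not write, under Y.
print(can_access(rules, "B", [], "/data/X", "w"))
print(can_access(rules, "C", ["G"], "/data/Y/run1/out.csv", "r"))
print(can_access(rules, "C", ["G"], "/data/Y/run1/out.csv", "w"))
```

Because the rules name groups as well as users, changing who can read directory Y is a group membership change rather than an edit to the rule table.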