Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Ian Foster
Science Services and
Science Platforms
Using the cloud to accelerate
and democratize discovery
Talk at NorduGri...
Thanks to co-authors and Globus team
Globus services (globus.org)
 Foster, I. Globus Online: Accelerating and democratizi...
Thank you to our sponsors!
3
Civilization advances by extending
the number of important operations
which we can perform
without thinking about them
Alf...
Computation may someday be organized
as a public utility …
The computing utility could become the
basis for a new and impo...
The grid vision
Accelerate discovery and innovation by
providing on-demand access to computing
• “the average computing en...
Another pioneer: NorduGrid
Example grid scenarios
 “The application service providers, storage service
providers, cycle providers, and consultants
e...
Higgs discovery “only possible because
of the extraordinary achievements of
… grid computing”—Rolf Heuer,
CERN DG
10s of P...
What has changed?
• Thousands of people learned about the joys of
large-scale distributed systems
• Virtual organization c...
Looking forward
 Exploding data volumes and earlier successes
mean that many more people face challenges
of big data, big...
Cloud has transformed how software
is developed and delivered
13
Infrastructure as a service: IaaS
Platform as a service: ...
The right platform can do
the same for science
14
We can leverage cloud to provide solutions that
span the vast majority o...
The right platform can do
the same for science
15
We can leverage cloud to provide solutions that
span the vast majority o...
A science platform can spur a
discovery cloud ecosystem
16
Infrastructure as a service: IaaS
Platform as a service: PaaS
S...
A science platform can spur a
discovery cloud ecosystem
17
Infrastructure as a service: IaaS
Platform as a service: PaaS
S...
What can we automate and
outsource in science?
Run experiment
Collect data
Move data
Check data
Annotate data
Share data
F...
Data
challenges
Identity and authorization challenges
Data access: we have the highways
but not the delivery service
Our highways
encompass the Internet,
ultra-high-speed netwo...
Globus: Research data
management as a service
Essential research data
management services
 File transfer
 Data sharing
...
Globus and the research data lifecycle
23
Researcher initiates
transfer request; or
requested automatically
by script, sci...
Globus by the numbers
4
major services
13
national labs
use Globus
160 PB
transferred
10,000
active endpoints
25 billion
f...
Platforms can slash costs, simplify
access, increase
interoperability
For example, by
providing:
 Federated identity
syst...
Globus PaaS: Ecosystem
enabler
26
Auth & Groups
…
Globus Toolkit
GlobusAPIs
GlobusConnect
Data Publication & Discovery
Fil...
Globus PaaS and Open
Science Grid
27
Simple web app server login
Jetstream cloud service
29
Serving a global community:
NCAR’s Research Data Archive
Serving a global community
 17+ PB virtual
processing
 45,000+ custom
orders, 4,000
users, 380 TB
served in 2014 Courtes...
PaaS enabled automated
workflow
User logs in with
NCAR or other
campus identity
Selected dataset
copied to staging
area (s...
Desktop
Globus Cloud
Firewall
Science DMZ
Research data portal pattern
 Move portal storage into Science DMZ, with Globus...
Desktop
Globus Cloud
Firewall
Science DMZ
Research data portal pattern
 Move portal storage into Science DMZ, with Globus...
https://github.com/globus/globus-sample-data-portal
Let us know if you’d
like to participate in
future workshops
Desktop
Globus Cloud
Firewall
Science DMZ
Research data portal pattern
 Move portal storage into Science DMZ, with Globus...
Globus Auth
 Foundational identity and access management
(IAM) platform service
 Simplify creation and integration of ad...
Based on widely used web
standards
 OAuth 2.0 Authorization Framework
 aka OAuth2
 OpenID Connect Core 1.0
 aka OIDC
...
Globus Auth uses
 Login to web app
 “Log in with Globus”
 Mobile, desktop, command line apps coming
 Protect all REST ...
Desktop
Globus Cloud
Firewall
Science DMZ
Research data portal pattern
 Move portal storage into Science DMZ, with Globus...
Globus transfer API
Nearly all Globus Web App functionality
implemented via public Transfer API
https://docs.globus.org/ap...
Globus Python SDK
Python client library for the Globus Auth and
Transfer REST APIs
http://globus.github.io/globus-sdk-pyth...
Jupyter (iPython) notebooks
https://github.com/globus/globus-jupyter-notebooks
45
Desktop
Globus Cloud
Firewall
Science DMZ
Research data portal pattern
 Move portal storage into Science DMZ, with Globus...
Globus helper pages
Globus-provided web pages
designed for use
by your web apps
 Browse Endpoint
 Select Group
 Logout
...
Globus helper pages
48
Branding
Can skin Globus Auth pages
Header
Text
Default IDP
49
Desktop
Globus Cloud
Firewall
Science DMZ
Research data portal pattern
 Move portal storage into Science DMZ, with Globus...
Globus Connect HTTPS
 The future of research CI is … the web
 Globus Connect HTTPS unlocks all research
storage to the w...
Desktop
Globus Cloud
Firewall
Science DMZ
Research data portal pattern
 Move portal storage into Science DMZ, with Globus...
Why create your own services?
 Front-end / back-end within your portal
 Remote backend for portal
 Backend for pure Jav...
Why Globus Auth for your
service?
 Outsource all identity management, authentication
 Federated identity with InCommon, ...
A science platform can spur a
discovery cloud ecosystem
55
Infrastructure as a service: IaaS
Platform as a service: PaaS
S...
The Discovery Platform
Research
Service
Research
Service
Research
Service Research
Service
Research
Service
Commercial Iaa...
Together we can create
an integrated ecosystem
of services and applications
for the research and education
community
Thank...
Science Services and Science Platforms: Using the Cloud to Accelerate and Democratize Discovery
Science Services and Science Platforms: Using the Cloud to Accelerate and Democratize Discovery
Upcoming SlideShare
Loading in …5
×

Science Services and Science Platforms: Using the Cloud to Accelerate and Democratize Discovery

701 views

Published on

Ever more data- and compute-intensive science makes computing increasingly important for research. But for advanced computing infrastructure to benefit more than the scientific 1%, we need new delivery methods that slash access costs, new sustainability models beyond direct research funding, and new platform capabilities to accelerate the development of new, interoperable tools and services.

The Globus team has been working towards these goals since 2010. We have developed software-as-a-service methods that move complex and time-consuming research IT tasks out of the lab and into the cloud, thus greatly reducing the expertise and resources required to use them. We have demonstrated a subscription-based funding model that engages research institutions in supporting service operations. And we are now also showing how the platform services that underpin Globus applications can accelerate the development and use of an integrated ecosystem of advanced science applications, such as NCAR’s Research Data Archive and OSG Connect, thus enabling access to powerful data and compute resources by many more people than is possible today.

In this talk, I introduce Globus services and the underlying Globus platform. I present representative applications and discuss opportunities that this platform presents for both small science and large facilities.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Science Services and Science Platforms: Using the Cloud to Accelerate and Democratize Discovery

  1. 1. Ian Foster Science Services and Science Platforms Using the cloud to accelerate and democratize discovery Talk at NorduGrid annual conference, Košice, Slovakia, June 2nd, 2016
  2. 2. Thanks to co-authors and Globus team Globus services (globus.org)  Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing(May/June):70-73, 2011.  Chard, K., Tuecke, S. and Foster, I. Efficient and Secure Transfer, Synchronization, and Sharing of Big Data. Cloud Computing, IEEE, 1(3):46-55, 2014.  Chard, K., Foster, I. and Tuecke, S. Globus Platform-as-a-Service for Collaborative Science Applications. Concurrency - Practice and Experience, 27(2):290-305, 2014. Publication (globus.org/data-publication)  Chard, K., Pruyne, J., Blaiszik, B., Ananthakrishnan, R., Tuecke, S. and Foster, I., Globus Data Publication as a Service: Lowering Barriers to Reproducible Science. 11th IEEE International Conference on eScience Munich, Germany, 2015 Discovery engines  Foster, I., Ananthakrishnan, R., Blaiszik, B., Chard, K., Osborn, R., Tuecke, S., Wilde, M. and Wozniak, J. Networking materials data: Accelerating discovery at an experimental facility. Big Data and High Performance Computing, 2015.
  3. 3. Thank you to our sponsors! 3
  4. 4. Civilization advances by extending the number of important operations which we can perform without thinking about them Alfred North Whitehead (1911)
  5. 5. Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry. John McCarthy (1961)
  6. 6. The grid vision Accelerate discovery and innovation by providing on-demand access to computing • “the average computing environment remains inadequate for [many] computationally sophisticated purposes” • “if mechanisms are in place to allow reliable, transparent, and instantaneous access to high-end resources, then it is as if those resources are devoted to them” [The Grid, Chapter 2, 1998]
  7. 7. Another pioneer: NorduGrid
  8. 8. Example grid scenarios  “The application service providers, storage service providers, cycle providers, and consultants engaged by a car manufacturer to perform scenario evaluation during planning for a new factory”  “Members of an industrial consortium bidding on a new aircraft”  “A crisis management team and the databases and simulation systems that they use to plan a response to an emergency situation”  “Members of a large, international, multiyear high- energy physics collaboration” From: The Anatomy of the Grid, 2001
  9. 9. Higgs discovery “only possible because of the extraordinary achievements of … grid computing”—Rolf Heuer, CERN DG 10s of PB, 100s of institutions, 1000s of scientists, 100Ks of CPUs, Bs of tasks 9
  10. 10. What has changed? • Thousands of people learned about the joys of large-scale distributed systems • Virtual organization concepts and technologies • Now routine to move 100s of terabytes (e.g., GridFTP moves >2 petabyte per day) • High throughput computing is mainstream (e.g., Condor and Globus run millions of jobs per day) • Large Hadron Collider found the Higgs • Earth System Grid supports >25,000 users • Commercial cloud computing has exploded
  11. 11. Looking forward  Exploding data volumes and earlier successes mean that many more people face challenges of big data, big compute, big collaboration  Networks are several orders of magnitude faster than when Grid started  Commercial cloud providers provide a substrate on which powerful new capabilities can be built with new economies of scale
  12. 12. Cloud has transformed how software is developed and delivered 13 Infrastructure as a service: IaaS Platform as a service: PaaS Software as a service: SaaS PaaS enables more rapid, cheap, and scalable delivery of powerful apps—as SaaS (web & mobile apps)
  13. 13. The right platform can do the same for science 14 We can leverage cloud to provide solutions that span the vast majority of researcher needs σ Big science
  14. 14. The right platform can do the same for science 15 We can leverage cloud to provide solutions that span the vast majority of researcher needs σ Big science
  15. 15. A science platform can spur a discovery cloud ecosystem 16 Infrastructure as a service: IaaS Platform as a service: PaaS Software as a service: SaaS (web and mobile apps) In so doing, we can slash costs, improve quality, and accelerate discovery across the sciences
  16. 16. A science platform can spur a discovery cloud ecosystem 17 Infrastructure as a service: IaaS Platform as a service: PaaS Software as a service: SaaS (web and mobile apps) 2010- 2014- In so doing, we can slash costs, improve quality, and accelerate discovery across the sciences
  17. 17. What can we automate and outsource in science? Run experiment Collect data Move data Check data Annotate data Share data Find similar data Link to literature Analyze data Publish data Automate and outsource Discovery Cloud
  18. 18. Data challenges
  19. 19. Identity and authorization challenges
  20. 20. Data access: we have the highways but not the delivery service Our highways encompass the Internet, ultra-high-speed networks, science DMZs, data transfer nodes, high-speed transport protocols A good delivery service automates, schedules, accelerates, adapts. It provides APIs for experts and casual users. Cuts costs and saves time.
  21. 21. Globus: Research data management as a service Essential research data management services  File transfer  Data sharing  Data publication  Identity and groups Builds on 15 years of research Outsourced and automated  High availability, reliability, performance, scalability  Convenient for  Casual users: Web interfaces  Power users: APIs  Administrators: Install, manage globus.org
  22. 22. Globus and the research data lifecycle 23 Researcher initiates transfer request; or requested automatically by script, science gateway 1 Instrument Compute Facility Globus transfers files reliably, securely 2 Globus controls access to shared files on existing storage; no need to move files to cloud storage! 4 Curator reviews and approves; data set published on campus or other system 7 Researcher selects files to share, selects user or group, and sets access permissions 3 Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus 5 Researcher assembles data set; describes it using metadata (Dublin core and domain- specific) 6 6 Peers, collaborators search and discover datasets; transfer and share using Globus 8 Publication Repository Personal Computer Transfer Share Publish Discover • SaaS  Only a web browser required • Use storage system of your choice • Access using your campus credentials
  23. 23. Globus by the numbers 4 major services 13 national labs use Globus 160 PB transferred 10,000 active endpoints 25 billion files processed ~400 active daily users 38,000 registered users 99.9% uptime 35+ institutional subscribers 1 PB largest single transfer to date 3 months longest continuously managed transfer 130 federated campus identities
  24. 24. Platforms can slash costs, simplify access, increase interoperability For example, by providing:  Federated identity system with fine- grained authorization  Data management easily integrated with application workflows Via RESTful APIs
  25. 25. Globus PaaS: Ecosystem enabler 26 Auth & Groups … Globus Toolkit GlobusAPIs GlobusConnect Data Publication & Discovery File Sharing File Transfer & Replication
  26. 26. Globus PaaS and Open Science Grid 27
  27. 27. Simple web app server login
  28. 28. Jetstream cloud service 29
  29. 29. Serving a global community: NCAR’s Research Data Archive
  30. 30. Serving a global community  17+ PB virtual processing  45,000+ custom orders, 4,000 users, 380 TB served in 2014 Courtesy of Thomas Cram, NCAR (2014) Fully automated delivery using portal developed w/ PaaS
  31. 31. PaaS enabled automated workflow User logs in with NCAR or other campus identity Selected dataset copied to staging area (shared endpoint) User granted read permission for shared endpoint User receives email with link to access files ACLs deleted after five days
  32. 32. Desktop Globus Cloud Firewall Science DMZ Research data portal pattern  Move portal storage into Science DMZ, with Globus endpoint  Leave Portal Web server behind firewall  Globus handles the security and data heavy lifting 34 Globus Transfer Service Portal Web Server (Client) Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Other Services Globus Web Widgets
  33. 33. Desktop Globus Cloud Firewall Science DMZ Research data portal pattern  Move portal storage into Science DMZ, with Globus endpoint  Leave Portal Web server behind firewall  Globus handles the security and data heavy lifting 35 Globus Transfer Service Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Globus Web Widgets Portal Web Server (Client) Other Services
  34. 34. https://github.com/globus/globus-sample-data-portal
  35. 35. Let us know if you’d like to participate in future workshops
  36. 36. Desktop Globus Cloud Firewall Science DMZ Research data portal pattern  Move portal storage into Science DMZ, with Globus endpoint  Leave Portal Web server behind firewall  Globus handles the security and data heavy lifting 38 Globus Transfer Service Portal Web Server (Client) Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Other Services Globus Web Widgets
  37. 37. Globus Auth  Foundational identity and access management (IAM) platform service  Simplify creation and integration of advanced apps and services  Brokers authentication and authorization interactions between:  End-users  Identity providers: XSEDE, InCommon, web apps  Resource servers: services with REST APIs  Clients: web, mobile, desktop, command line apps  Resource servers acting as clients to other resource servers https://docs.globus.org/api/auth 39
  38. 38. Based on widely used web standards  OAuth 2.0 Authorization Framework  aka OAuth2  OpenID Connect Core 1.0  aka OIDC  Allows use of standard OAuth2 and OIDC libraries  E.g., Google OAuth Client Libraries (Java, Python, etc.), Apache mod_auth_openidc 40
  39. 39. Globus Auth uses  Login to web app  “Log in with Globus”  Mobile, desktop, command line apps coming  Protect all REST API communications  App  Globus service  App  non-Globus service  Service  service 41
  40. 40. Desktop Globus Cloud Firewall Science DMZ Research data portal pattern  Move portal storage into Science DMZ, with Globus endpoint  Leave Portal Web server behind firewall  Globus handles the security and data heavy lifting 42 Globus Transfer Service Portal Web Server (Client) Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Other Services Globus Web Widgets
  41. 41. Globus transfer API Nearly all Globus Web App functionality implemented via public Transfer API https://docs.globus.org/api/transfer/ 43
  42. 42. Globus Python SDK Python client library for the Globus Auth and Transfer REST APIs http://globus.github.io/globus-sdk-python/ 44
  43. 43. Jupyter (iPython) notebooks https://github.com/globus/globus-jupyter-notebooks 45
  44. 44. Desktop Globus Cloud Firewall Science DMZ Research data portal pattern  Move portal storage into Science DMZ, with Globus endpoint  Leave Portal Web server behind firewall  Globus handles the security and data heavy lifting 46 Globus Transfer Service Portal Web Server (Client) Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Other Services Globus Web Widgets
  45. 45. Globus helper pages Globus-provided web pages designed for use by your web apps  Browse Endpoint  Select Group  Logout https://docs.globus.org/api/helper-pages/
  46. 46. Globus helper pages 48
  47. 47. Branding Can skin Globus Auth pages Header Text Default IDP 49
  48. 48. Desktop Globus Cloud Firewall Science DMZ Research data portal pattern  Move portal storage into Science DMZ, with Globus endpoint  Leave Portal Web server behind firewall  Globus handles the security and data heavy lifting 50 Globus Transfer Service Portal Web Server (Client) Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Other Services Globus Web Widgets
  49. 49. Globus Connect HTTPS  The future of research CI is … the web  Globus Connect HTTPS unlocks all research storage to the web  Globus Auth provides security glue using standard web security  GridFTP doesn’t go away – async, bulk data transfer is important, but its not the end-all, be-all
  50. 50. Desktop Globus Cloud Firewall Science DMZ Research data portal pattern  Move portal storage into Science DMZ, with Globus endpoint  Leave Portal Web server behind firewall  Globus handles the security and data heavy lifting 52 Globus Transfer Service Portal Web Server (Client) Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Other Services Globus Web Widgets
  51. 51. Why create your own services?  Front-end / back-end within your portal  Remote backend for portal  Backend for pure Javascript browser apps  Extend your portal with a public REST API, so that other app and service developers can integrate with and extend your portal 53
  52. 52. Why Globus Auth for your service?  Outsource all identity management, authentication  Federated identity with InCommon, Google, etc.  Outsource your REST API security  Consent, token issuance, validation, revocation  You provide service-specific authorization  Apps use your service like all others  Its standard OAuth2 and OIDC  Your service can seamlessly leverage other services  Other services can leverage your service Add your service to the international science platform 54
  53. 53. A science platform can spur a discovery cloud ecosystem 55 Infrastructure as a service: IaaS Platform as a service: PaaS Software as a service: SaaS (web and mobile apps) 2010- 2014- In so doing, we can slash costs, improve quality, and accelerate discovery across the sciences
  54. 54. The Discovery Platform Research Service Research Service Research Service Research Service Research Service Commercial IaaS/PaaS Core Services Authentication Groups Attributes Data Services Transfer Sharing Publication Commercial PaaS Discovery Exchange Enabling the Discovery Cloud Research facilitiesResearch facilities
  55. 55. Together we can create an integrated ecosystem of services and applications for the research and education community Thank you! foster@uchicago.edu @ianfoster 57

×