Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
Services for Science
2
Thanks!
DOE Office of Science
NSF Office of Cyberinfrastructure
National Institutes of Health
Colleagues at Argonne, U.Chicago, USC/ISI,
OSU, Manchester, and elsewhere
3
Scientific
Communication, ~1600
Brahe Kepler
4
1980
5
Scientific Communication, ~2000
Data ArchivesData Archives
User
Analysis toolsAnalysis tools
Gateway
Figure: S. G. Djorgovski
Discovery toolsDiscovery tools
Service-Oriented Science
6
Application Scenario
Location A
Microarray, Protein,
Image data
Location B
Microarray, Protein,
Image data
Location C
Microarray, Protein,
Image data
Location C
Image Analysis
Location D
Image Analysis
Microarray and protein
databases at other institutions
Different database
systems, data
representations,
security
Different
program
invocation,
remote access,
data transfer
7
caBIG: sharing of infrastructure, applications, and
data.
Data
Integration!
Services
& Cancer Biology
Globus
8
Service-Oriented Science
People create services (data, code, instr.) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
 I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
 I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
9
Creating Services
People create services (data, code, instr.) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
 I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
 I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
10
Anatomy of a Service
op1 opN (meta)data
Implementation(s)
Clients
Registry
Management
Clients
…
Service
Service
Attribute
Authority
Attribute
Authority
Persistence
11
Creating Services (~2005)
“This full-day tutorial provides an
introduction to programming Java
services with the latest version of the
Globus Toolkit version 4 (GT4). The
tutorial teaches how to build a Java
Service that makes use of GT4
mechanisms for state management,
security, registry and related topics.”
12
Appln
Service
Create
Index
service
Store
Repository
Service
Advertize
Discover
Invoke;
get results
Introduce
Container
Transfer
GAR
Deploy
Ohio State University and Argonne/U.Chicago
Creating Services in 2008
Introduce and gRAVI
Introduce
Define service
Create skeleton
Discover types
Add operations
Configure security
Grid Remote
Application
Virtualization
Infrastructure
Wrap executables
Globus
Demonstration:
Creating Services
Introduce + gRAVI
Shannon Hastings
Scott Oster
David Ervin
Stephen Langella
Kyle Chard
Ravi Madduri
14Center for Enabling Distributed Petascale Science
Workflow Automation at DOE Facilities
Automation
Reproducibility
Security
Reusability
Storage
Metadata
Analysis
Visualization
Advanced Photon
Source
15
Discovering Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
 I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
 I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
16
 The ultimate arbiter?
 Types, ontologies
 Can I use it?
 Billions of services
Discovering Services
Assume success
Syntax,
semantics
Permissions
Reputation
A B
17
Discovery (1):
Registries
Globus
18
Discovery (2):
Standardized Vocabularies
Core Services
Grid
Service
Uses Terminology
Described In
Cancer Data
Standards
Repository
Enterprise
Vocabulary
Services
References
Objects
Defined in
Service
Metadata
Publishes
Subscribes to
and Aggregates
Queries
Service
Metadata
Aggregated In
Registers To
Discovery
Client API
Index
Service
Globus
19
20
Discovery (3): Tagging
& Social Networking
GLOSS:
Generalized
Labels Over Scientific
data Sources
(Foster, Nestorov)
21
Discovery (3): Tagging
& Social Networking
David de Roure,
Carole Goble,
et al.
22
Composing Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
 I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
 I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
23
Composing Services:
E.g., BPEL Workflow System
Data Service
@ uchicago.edu
Analytic service
@ osu.edu
Analytic service
@ duke.edu
<BPEL
Workflow
Doc>
<Workflow
Inputs>
<Workflow
Results>
BPEL
Engine
link
caBiG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.
link
link
link
See also Kepler & Taverna
Globus
24
Composing Services:
Taverna
caGrid Scavenger with
semantic/metadata-
based caGrid service query
A sample
caGrid
workflow
Globus
25
Composing
Services
Globus
Demonstration:
Composing Services
Taverna + GT4
Taverna team
Wei Tan
Ravi Madduri
27
Publishing Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
 I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
 I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
28
Publishing Services
Description  Syntax, semantics
State  Availability, load, …
Policies  Who, what, when, …
Hosting  Location, scalability, …
29
Authorization: SAML & XACML
VOMS Shibboleth LDAP PERMIS…
GT4 Client
GT4 Server
PDP
Attributes
Authorization
Decision
PIP PIP PIP
SAML
XACML
Globus
30
Hosting Services
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.
 I find “someone else” to host services,
so I don’t have to become an expert in
operating services & computers!
 I hope that this “someone else” can
manage security, reliability, scalability, …
!!
“Service-Oriented Science”, Science, 2005
31
The Two Dimensions
of Service-Oriented Science
Decompose across network
Clients integrate dynamically
Select & compose services
Select “best of breed” providers
Publish result as new services
Decouple resource & service providers
Function
Resource
Data Archives
Analysis tools
Discovery toolsUsers
Fig: S. G. Djorgovski
32
The geWorkbench/caGrid/TeraGrid
Interface
33
Putting It Together for the Example Scenario
Location A
Microarray, Protein,
Image data
Location B
Microarray, Protein,
Image data
Location C
Microarray, Protein,
Image data
Location C
Image Analysis
Location D
Image Analysis
caGrid Service
Interfaces
caGrid
Environ-
ment
Registered
Object
Definitions
Advertise-
ment
Log on, Grid
credentials
Query and
Analysis
Workflow
Discovery
Microarray & protein
databases at other
institutions
34
Lessons Learned
A convenient higher-level abstraction
Suitable for a subset of scientific use cases
Infrastructure need to be sustainable
Integrates well with hospital/cancer
center/experimental facility IT
infrastructure
Workflows are attractive to users
Scalability and provenance are important
No vendor lock-in (if you are careful)
User experience remains ambiguous
Early adopters are enthusiastic (50+ services)
35
Services for Science
A new approach to communicating
A (not-so new) approach to structuring systems
They’re real
Excellent infrastructure and tools (Globus,
Introduce, gRAVI, Taverna, Swift, etc., etc.)
Substantial numbers of services out there
They’re challenging
Sociology: incentives, rewards
Infrastructure: hosting
Provenance: justifying “results”
Scaling: services, requests

Services for Science v2 (APAN26)

  • 1.
    Ian Foster Computation Institute ArgonneNational Lab & University of Chicago Services for Science
  • 2.
    2 Thanks! DOE Office ofScience NSF Office of Cyberinfrastructure National Institutes of Health Colleagues at Argonne, U.Chicago, USC/ISI, OSU, Manchester, and elsewhere
  • 3.
  • 4.
  • 5.
    5 Scientific Communication, ~2000 DataArchivesData Archives User Analysis toolsAnalysis tools Gateway Figure: S. G. Djorgovski Discovery toolsDiscovery tools Service-Oriented Science
  • 6.
    6 Application Scenario Location A Microarray,Protein, Image data Location B Microarray, Protein, Image data Location C Microarray, Protein, Image data Location C Image Analysis Location D Image Analysis Microarray and protein databases at other institutions Different database systems, data representations, security Different program invocation, remote access, data transfer
  • 7.
    7 caBIG: sharing ofinfrastructure, applications, and data. Data Integration! Services & Cancer Biology Globus
  • 8.
    8 Service-Oriented Science People createservices (data, code, instr.) … which I discover (& decide whether to use) … & compose to create a new function ... & then publish as a new service.  I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!  I hope that this “someone else” can manage security, reliability, scalability, … !! “Service-Oriented Science”, Science, 2005
  • 9.
    9 Creating Services People createservices (data, code, instr.) … which I discover (& decide whether to use) … & compose to create a new function ... & then publish as a new service.  I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!  I hope that this “someone else” can manage security, reliability, scalability, … !! “Service-Oriented Science”, Science, 2005
  • 10.
    10 Anatomy of aService op1 opN (meta)data Implementation(s) Clients Registry Management Clients … Service Service Attribute Authority Attribute Authority Persistence
  • 11.
    11 Creating Services (~2005) “Thisfull-day tutorial provides an introduction to programming Java services with the latest version of the Globus Toolkit version 4 (GT4). The tutorial teaches how to build a Java Service that makes use of GT4 mechanisms for state management, security, registry and related topics.”
  • 12.
    12 Appln Service Create Index service Store Repository Service Advertize Discover Invoke; get results Introduce Container Transfer GAR Deploy Ohio StateUniversity and Argonne/U.Chicago Creating Services in 2008 Introduce and gRAVI Introduce Define service Create skeleton Discover types Add operations Configure security Grid Remote Application Virtualization Infrastructure Wrap executables Globus
  • 13.
    Demonstration: Creating Services Introduce +gRAVI Shannon Hastings Scott Oster David Ervin Stephen Langella Kyle Chard Ravi Madduri
  • 14.
    14Center for EnablingDistributed Petascale Science Workflow Automation at DOE Facilities Automation Reproducibility Security Reusability Storage Metadata Analysis Visualization Advanced Photon Source
  • 15.
    15 Discovering Services People createservices (data or functions) … which I discover (& decide whether to use) … & compose to create a new function ... & then publish as a new service.  I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!  I hope that this “someone else” can manage security, reliability, scalability, … !! “Service-Oriented Science”, Science, 2005
  • 16.
    16  The ultimatearbiter?  Types, ontologies  Can I use it?  Billions of services Discovering Services Assume success Syntax, semantics Permissions Reputation A B
  • 17.
  • 18.
    18 Discovery (2): Standardized Vocabularies CoreServices Grid Service Uses Terminology Described In Cancer Data Standards Repository Enterprise Vocabulary Services References Objects Defined in Service Metadata Publishes Subscribes to and Aggregates Queries Service Metadata Aggregated In Registers To Discovery Client API Index Service Globus
  • 19.
  • 20.
    20 Discovery (3): Tagging &Social Networking GLOSS: Generalized Labels Over Scientific data Sources (Foster, Nestorov)
  • 21.
    21 Discovery (3): Tagging &Social Networking David de Roure, Carole Goble, et al.
  • 22.
    22 Composing Services People createservices (data or functions) … which I discover (& decide whether to use) … & compose to create a new function ... & then publish as a new service.  I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!  I hope that this “someone else” can manage security, reliability, scalability, … !! “Service-Oriented Science”, Science, 2005
  • 23.
    23 Composing Services: E.g., BPELWorkflow System Data Service @ uchicago.edu Analytic service @ osu.edu Analytic service @ duke.edu <BPEL Workflow Doc> <Workflow Inputs> <Workflow Results> BPEL Engine link caBiG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al. link link link See also Kepler & Taverna Globus
  • 24.
    24 Composing Services: Taverna caGrid Scavengerwith semantic/metadata- based caGrid service query A sample caGrid workflow Globus
  • 25.
  • 26.
    Demonstration: Composing Services Taverna +GT4 Taverna team Wei Tan Ravi Madduri
  • 27.
    27 Publishing Services People createservices (data or functions) … which I discover (& decide whether to use) … & compose to create a new function ... & then publish as a new service.  I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!  I hope that this “someone else” can manage security, reliability, scalability, … !! “Service-Oriented Science”, Science, 2005
  • 28.
    28 Publishing Services Description Syntax, semantics State  Availability, load, … Policies  Who, what, when, … Hosting  Location, scalability, …
  • 29.
    29 Authorization: SAML &XACML VOMS Shibboleth LDAP PERMIS… GT4 Client GT4 Server PDP Attributes Authorization Decision PIP PIP PIP SAML XACML Globus
  • 30.
    30 Hosting Services People createservices (data or functions) … which I discover (& decide whether to use) … & compose to create a new function ... & then publish as a new service.  I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!  I hope that this “someone else” can manage security, reliability, scalability, … !! “Service-Oriented Science”, Science, 2005
  • 31.
    31 The Two Dimensions ofService-Oriented Science Decompose across network Clients integrate dynamically Select & compose services Select “best of breed” providers Publish result as new services Decouple resource & service providers Function Resource Data Archives Analysis tools Discovery toolsUsers Fig: S. G. Djorgovski
  • 32.
  • 33.
    33 Putting It Togetherfor the Example Scenario Location A Microarray, Protein, Image data Location B Microarray, Protein, Image data Location C Microarray, Protein, Image data Location C Image Analysis Location D Image Analysis caGrid Service Interfaces caGrid Environ- ment Registered Object Definitions Advertise- ment Log on, Grid credentials Query and Analysis Workflow Discovery Microarray & protein databases at other institutions
  • 34.
    34 Lessons Learned A convenienthigher-level abstraction Suitable for a subset of scientific use cases Infrastructure need to be sustainable Integrates well with hospital/cancer center/experimental facility IT infrastructure Workflows are attractive to users Scalability and provenance are important No vendor lock-in (if you are careful) User experience remains ambiguous Early adopters are enthusiastic (50+ services)
  • 35.
    35 Services for Science Anew approach to communicating A (not-so new) approach to structuring systems They’re real Excellent infrastructure and tools (Globus, Introduce, gRAVI, Taverna, Swift, etc., etc.) Substantial numbers of services out there They’re challenging Sociology: incentives, rewards Infrastructure: hosting Provenance: justifying “results” Scaling: services, requests

Editor's Notes

  • #8 I would mention data integration here: integration of different data types: tissue, (pre)clinical, genomics, proteomics….
  • #15 Developer benefits: Developers spend time developing scientific applications instead of rewriting code to turn into services. Developer time spared through code reuse—legacy apps are wrappered into services instead of rewritten as services Learning gRavi easier than learning service development—especially marshaling of security resources! Traditional applications are not secure and can not provide an audit trail Scientific benefits: Scientists spend time developing workflows rather than moving data from application to application Scientists interact with GUI to focus on scientific process not how to run applications. Services provide easy access to all scientific tools in one integrated GUI. Access to grid resources promotes cross disciplinary collaboration as it becomes easier to share data and applications across institutional boundaries
  • #17 Use funnel animation?
  • #25 The GT4 plug-in support semantic based service query. Users can input multiple (up to three, in current scenario three is enough. We can add more in our program upon request) service query criteria and input the corresponding value. We can combine multiple criteria. The initial GUI only shows one query criteria, but more can be added by clicking “Add Service Query” button. For example, we can query the caGrid services whose “Research Center” name is “Ohio State University”, with Service Name “DICOMDataService”, and has operation “PullOp”.