building global software
earthcube  sciencecloud
Ian Foster
Argonne National Laboratory
University of Chicago
foster@uchicago.edu
all software
must be
global software*
Rhys Francis
Globus endpoints, Azimuthal Equidistant projection
* “all” of course doesn’t mean 100%
automation ✦ economies of scale
simplicity ✦ network effects
7
data locations in
a study of breast
cancer genomics
some identities required for Dark Energy Survey
(Don Petravick)
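import subprocess

CLI = "ssh ian@cli.globusonline.org"

def run(command):
    # Assumed helper (not on the slide): returns (stdout, stderr), or None on failure.
    p = subprocess.run(command, shell=True, capture_output=True, text=True)
    return (p.stdout, p.stderr) if p.returncode == 0 else None

def transfer(src, dst):  # "from" is a Python keyword, so the arguments are renamed
    generated = run(CLI + " transfer --generate-id")
    if generated is None:
        return None
    task_id = generated[0].strip()
    if run(CLI + " transfer --taskid " + task_id + " -- " + src + " " + dst) is None:
        return None
    waited = run(CLI + " wait -q " + task_id)    # block until the task finishes
    details = run(CLI + " details -f status -O csv " + task_id)
    if waited is not None and details is not None:
        return (details[0].strip(), waited[1], waited[0])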
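
The slide builds each remote command by plain string concatenation, which breaks as soon as a path contains a space or a shell metacharacter. A minimal sketch of one way around that, assuming the remote side parses its command line like a POSIX shell (not confirmed by the slide): quote every argument and hand ssh an argv list. The helper names `ssh_argv` and `transfer_argvs`, the task id, and the endpoint paths are hypothetical; only the subcommands and flags shown on the slide are used.

```python
import shlex

HOST = "ian@cli.globusonline.org"  # account and host as shown on the slide


def ssh_argv(*remote_args):
    """Build an argv list for subprocess.run: ["ssh", HOST, "<quoted remote command>"]."""
    # Quote each argument so the remote shell (assumed POSIX-like) sees it as one word.
    remote_command = " ".join(shlex.quote(arg) for arg in remote_args)
    return ["ssh", HOST, remote_command]


def transfer_argvs(task_id, src, dst):
    """Return the three commands the slide issues once a task id is known."""
    return [
        ssh_argv("transfer", "--taskid", task_id, "--", src, dst),
        ssh_argv("wait", "-q", task_id),
        ssh_argv("details", "-f", "status", "-O", "csv", task_id),
    ]


if __name__ == "__main__":
    # Hypothetical task id and endpoint paths, for illustration only.
    for argv in transfer_argvs("abc-123", "src#ep/run 1.dat", "dst#ep/out/"):
        print(argv)
```

Each list can be passed to `subprocess.run(argv, capture_output=True, text=True)` without `shell=True`, so only the remote side interprets the quoting.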
research data management as a service
research data management as a service
research data management as a service
identity, group management as a service
8,000 endpoints and 25,000 users
10B files and 95 PB moved
Globus endpoints, Mercator projection
science cloud platforms
can accelerate service
development
science cloud platforms
can accelerate service
development
three take home messages
all software must be global software
Software-as-a-service (SaaS) allows for
global impact at reduced cost
broad adoption of SaaS in science
requires science platforms
Globus endpoints, Azimuthal Equidistant projection
three questions
what activity in your research is the
most painful and time-consuming?
what data management activities can
you imagine outsourcing?
do you use Globus services?
If not, why not? 
Ian Foster, foster@uchicago.edu
Globus endpoints, Azimuthal Equidistant projection

building global software/earthcube->sciencecloud

  • 1.
    building global software earthcube sciencecloud Ian Foster Argonne National Laboratory University of Chicago foster@uchicago.edu
  • 2.
    all software must be global software* Rhys Francis Globus endpoints, Azimuthal Equidistant projection * “all” of course doesn’t mean 100%
  • 6.
    automation ✦ economies of scale simplicity ✦ network effects
  • 7.
    7 data locations in a study of breast cancer genomics
  • 8.
    some identities required for Dark Energy Survey (Don Petravick)
  • 9.
    CLI = "ssh ian@cli.globusonline.org"
    def transfer(from, to):
        id = run( CLI + " transfer --generate-id" )
        p = run( CLI + " transfer --taskid " + id
                 + " -- " + from + " " + to)
        if p != None:
            (stdout, stderr) = run( CLI + " wait -q " + id)
            dp = run( CLI + " details -f status -O csv " + id)
            if dp != None:
                return ( dp, stderr, stdout )
    research data management as a service
  • 10.
  • 11.
  • 12.
  • 13.
    8,000 endpoints and 25,000 users 10B files and 95 PB moved Globus endpoints, Mercator projection
  • 14.
    science cloud platforms can accelerate service development
  • 15.
    science cloud platforms can accelerate service development
  • 16.
    three take home messages all software must be global software Software-as-a-service (SaaS) allows for global impact at reduced cost broad adoption of SaaS in science requires science platforms Globus endpoints, Azimuthal Equidistant projection
  • 17.
    three questions what activity in your research is the most painful and time-consuming? what data management activities can you imagine outsourcing? do you use Globus services? If not, why not? Ian Foster, foster@uchicago.edu Globus endpoints, Azimuthal Equidistant projection

Editor's Notes

  • #2 Two cryptic phrases … both of which I will explain in my remarks.
  • #3 What did Rhys mean? Many of the properties to which we aspire for software (and indeed for data), such as quality, reproducibility, longevity, and sustainability, are hard to achieve if the developers of the software do not have the vision, skills, and persistence to achieve extremely broad adoption.
  • #4 Any software that lacks those properties will inevitably follow this all-too-common trajectory: enthusiastic development, completion, and eventually sinking (often along with the science that depends on it) in a sea of technical debt. Experience outside science suggests some potential solutions to this crisis. Consumer and enterprise software has undergone a profound revolution over the past 10 years, variously referred to as Cloud, SaaS, etc. We ourselves can archive all of our digital photos for a few dollars per month, or work collaboratively on documents with people worldwide.
  • #5 Companies can outsource all of their IT, so that it is quite feasible to run a company from a coffee shop. They can subscribe to services for web presence, accounting, data analytics, and so on.
  • #6 Companies can outsource all of their IT, so that it is quite feasible to run a company from a coffee shop. They can subscribe to services for web presence, accounting, data analytics, and so on.
  • #7 What makes all this possible is radical simplification via Web 2.0 and large economies of scale. Why can’t we do the same thing for science?
  • #8 Outsource, for example, the challenges inherent in moving, locating, and publishing diverse data. The picture is from cancer genomics, but the challenges should be familiar.
  • #9 Managing different identities, credentials, and group memberships
  • #15 RDA: outsource data sharing and transfer
  • #16 KBase: outsource identity and group management