Science as a Service
Ian Foster, The University of Chicago and Argonne National Laboratory
November 14, 2013
A time of disruptive change
Most labs have limited resources
Heidorn: NSF grants in 2007
[Chart: grant size on a scale from $1,000 to $1,000,000; horizontal axis from 2000 to 8000]
Awards < $350,000: 80% of awards, 50% of grant $$
Automation is required to
apply more sophisticated
methods to far more data
Outsourcing is needed to
achieve economies of scale
in the use of automated
methods
Building a discovery cloud
• Identify time-consuming activities amenable to automation and outsourcing
• Implement as high-quality, low-touch SaaS
• Leverage IaaS for reliability, economies of scale
• Extract common elements as research automation platform
[Diagram: layered stack of Software as a service, Platform as a service, Infrastructure as a service]
Bonus question: Sustainability
We aspire (initially) to create a
great user experience for
research data management
What would a “dropbox for
science” look like?
• Collect
• Move
• Sync
• Share
• Analyze

• Annotate
• Publish
• Search
• Backup
• Archive

BIG DATA
It should be trivial to
Collect, Move, Sync, Share, Analyze, Annotate, Publish,
Search, Backup, & Archive BIG DATA
… but in reality it’s often very challenging
[Diagram: data moving among Staging Store, Ingest Store, Registry, Community Store, Analysis Store, Archive, and Mirror, with failures flagged along the way: “Expired credentials”, “Permission denied”, “Quota exceeded”, “Network failed. Retry.”]
Move, Sync, Share: capabilities delivered using the Software-as-a-Service (SaaS) model
[Diagram: transferring files from a Data Source to a Data Destination]
1 User initiates transfer request
2 Globus Online moves/syncs files
3 Globus Online notifies user
[Diagram: sharing files in place on a Data Source]
1 User A selects file(s) to share; selects user/group, sets share permissions
2 Globus Online tracks shared files; no need to move files to cloud storage!
3 User B logs in to Globus Online and accesses shared file
Extreme ease of use
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Web interface, REST API, command line
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
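The "reliability via transfer retries" item can be pictured as a retry loop with exponential backoff. The sketch below is illustrative only: `transfer_with_retries`, `TransientTransferError`, and `make_flaky_copy` are hypothetical names, not part of the Globus Online API, and the slides do not say which retry policy the service actually uses.

```python
import time


class TransientTransferError(Exception):
    """Hypothetical error for a failure worth retrying (e.g. a network drop)."""


def transfer_with_retries(do_transfer, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call do_transfer() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**(attempt - 1) seconds between attempts and
    returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            do_transfer()
            return attempt
        except TransientTransferError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_copy(failures_before_success):
    """Build a stand-in transfer that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def flaky_copy():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientTransferError("Network failed. Retry.")

    return flaky_copy


if __name__ == "__main__":
    delays = []  # record the waits instead of actually sleeping
    attempts = transfer_with_retries(make_flaky_copy(2), sleep=delays.append)
    print(attempts, delays)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; a real client would leave the default in place.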
Early adoption is encouraging

>12,000 registered users; >150 daily
>27 PB moved; >1B files
10x (or better) performance vs. scp
99.9% availability
Entirely hosted on Amazon
Amazon Web Services used
• EC2 for hosting Globus services
• ELB to use multiple availability zones for reliability and uptime
• SES and SNS to send notifications of transfer status
• S3 to store historical state
• PostgreSQL for active state
K. Heitmann (Argonne) moves 22 TB of cosmology data from LANL to ANL at 5 Gb/s
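As a rough check on what 22 TB at 5 Gb/s means in wall-clock time, the arithmetic can be written out as below. This assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and a rate sustained for the whole transfer; the slide states neither, so treat the result as an estimate. The function name `transfer_hours` is ours.

```python
def transfer_hours(size_tb, rate_gbps):
    """Hours needed to move size_tb terabytes at a sustained rate_gbps gigabits per second.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 Gb = 10**9 bits.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (rate_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    print(f"{transfer_hours(22, 5):.1f} hours")
```

Under those assumptions the transfer takes a little under ten hours.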
B. Winjum (UCLA) moves 900K-file plasma physics datasets from UCLA to NERSC
Dan Kozak (Caltech) replicates 1
PB LIGO astronomy data for
resilience
Erin Miller (PNNL)
collects data at
Advanced Photon
Source, renders at
PNNL, and views at ANL
Credit: Kerstin Kleese-van Dam
Globus Online already does a lot
Sharing Service
Transfer Service
Globus Nexus (Identity, Group, Profile)
Globus Toolkit
Globus Connect
Globus Online APIs
The identity challenge in science
• Research communities often need to
– Assign identities to their users
– Manage user profiles
– Organize users into groups for authorization

• Obstacles to high-quality implementations
– Complexity of associated security protocols
– Creation of identity silos
– Multiple credentials for users
– Reliability, availability, scalability, security
Nexus provides four key capabilities
• Identity provisioning
– Create, manage Globus identities

• Identity hub
– Link with other identities; use to authenticate to services

• Group hub
– User-managed groups; groups can be used for authorization

• Profile management
– User-managed attributes; can use in group admission
Key points:
1) Outsource identity, group, profile management
2) REST API for flexible integration
3) Intuitive, customizable Web interfaces
Branded sites
XSEDE

Open Science Grid

University of Chicago

DOE kBase

Indiana University

University of Exeter

Globus Online

NERSC

NIH BIRN
A platform for integration
Data management SaaS (Globus) +
Next-gen sequence analysis pipelines (Galaxy) +
Cloud IaaS (Amazon) =
Flexible, scalable, easy-to-use genomics analysis for
all biologists
Globus Genomics
We are adding capabilities
Dataset Services
Sharing Service
Transfer Service
Globus Nexus (Identity, Group, Profile)
Globus Toolkit
Globus Connect
Globus Online APIs
• Ingest and publication
– Imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, converts

• Cataloging
– Virtual views of data based on user-defined and/or automatically
extracted metadata

• Computation
– Associate computational procedures, orchestrate application,
catalog results, record provenance
Next Gen Sequencing Analysis for Everyone –
No IT Required
Ravi K Madduri, The University of Chicago and Argonne National Laboratory

November 14, 2013
One slide to get your attention
Outline
• Globus Vision
• Challenges in Sequencing Analysis
– Big Data Management
– Analysis at Scale
– Reproducibility

• Proposed Approach Using Globus Genomics
• Example Collaborations
• Q&A
Globus Vision
Goal: Accelerate discovery and innovation worldwide
by providing research IT as a service
Leverage software-as-a-service to:
– provide millions of researchers with unprecedented access to
powerful tools for managing Big Data
– reduce research IT costs dramatically via economies of scale

“Civilization advances by extending the number of important
operations which we can perform without thinking of them”
(Alfred North Whitehead, 1911)
Challenges in Sequencing Analysis
Data Movement and Access Challenges
• Data is distributed in different locations
• Research labs need access to the data for analysis
• Be able to share data with other researchers/collaborators
– Inefficient ways of data movement
• Data needs to be available on the local and distributed compute resources
– Local clusters, cloud, grid
[Diagram: Public Data, Sequencing Centers, Storage, Research Lab, Seq Center, Local Cluster/Cloud]
How do we analyze this sequence data?
Manual Data Analysis
• Manually move the data to the compute node
• Install all the tools required for the analysis
– BWA, Picard, GATK, filtering scripts, etc.
• Shell scripts to sequentially execute the tools
• Manually modify the scripts for any change
– Error prone, difficult to keep track, messy
• Difficult to maintain and transfer the knowledge
[Diagram: Fastq and Ref Genome feed a script that runs Alignment, Picard, and GATK variant calling; the loop around it is Install, Modify, (Re)Run]
Globus Genomics
Data Management
• Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints
Data Analysis
• Galaxy-based workflow management system
• Globus integrated within Galaxy
• Web-based UI
• Drag-and-drop workflow creation
• Easily modify workflows with new tools
• Analytical tools are automatically run on the scalable compute resources when possible
Globus Genomics on Amazon EC2
[Diagram: Public Data, Sequencing Centers, Research Lab, Seq Center, Storage, Galaxy Data Libraries, Local Cluster/Cloud]
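One way to see what a workflow system adds over a sequential shell script is to describe the pipeline as a dependency graph and let a scheduler decide what can run, and what can run in parallel. The sketch below uses only Python's standard library (`graphlib`, Python 3.9+). The step names and dependency edges are assumptions based on the tool order shown on the slides; this is not how Galaxy or Globus Genomics represents workflows internally.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# Step names and edges are illustrative assumptions, not a real Galaxy workflow.
PIPELINE = {
    "stage_fastq": set(),
    "stage_ref_genome": set(),
    "alignment": {"stage_fastq", "stage_ref_genome"},
    "picard": {"alignment"},
    "variant_calling": {"picard"},
}


def execution_batches(pipeline):
    """Group steps into batches; steps within one batch do not depend on each other."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(execution_batches(PIPELINE), start=1):
        print(number, batch)
```

Adding a tool then means adding one entry to the graph rather than editing a script by hand, and the two staging steps land in the same batch, so they could run concurrently.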
Globus Genomics Architecture

Figure 2: Globus Genomics Architecture
Globus Genomics Usage
[Poster image. Panel titles: Challenges in Next-Gen Sequencing Analysis; Parallel Workflows on Globus Genomics; High Performance, Reusable Consensus Calling Pipeline. Affiliations shown: (1) Computation Institute, University of Chicago, Chicago, IL, USA; (2) Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA; (3) Section of Genetic Medicine, University of Chicago, Chicago, IL]
Globus Genomics
• Computational profiles for various analysis tools
• Resources can be provisioned on demand with Amazon Web Services cloud-based infrastructure
• GlusterFS as a shared file system between head nodes and compute nodes
• Provisioned I/O on EBS
Coming soon!
• Integration with Globus Catalog
– Better data discovery and metadata management

• Integration with Globus Sharing
– Easy and secure method to share large datasets with collaborators

• Integration with Amazon Glacier for data archiving
• Support for high-throughput computational modalities through Apache Mesos
– MapReduce and MPI clusters

• Dynamic storage strategies using S3 and/or an LVM-based shared file system
Our vision for a 21st century
discovery infrastructure
Provide more capability for
more people at lower cost by
building a “Discovery Cloud”
Delivering “Science as a service”
Thank you to our sponsors
For more information
• More information on Globus Genomics and to sign up: www.globus.org/genomics
• More information on Globus: www.globusonline.org
• Follow us on Twitter: @ianfoster, @madduri, @globusgenomics, @globusonline
Thank you!
Please give us your feedback on this
presentation

BDT 310
As a thank you, we will select prize
winners daily for completed surveys!

GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

re:Invent 2013-foster-madduri

  • 1. Science as a Service Ian Foster, The University of Chicago and Argonne National Laboratory November 14, 2013
  • 2. A time of disruptive change
  • 3. A time of disruptive change
  • 4. Most labs have limited resources. Heidorn: NSF grants in 2007. Awards < $350,000 account for 80% of awards and 50% of grant $$. [Chart residue: axis ticks $1,000 to $1,000,000 and 2000 to 8000]
  • 5. Automation is required to apply more sophisticated methods to far more data
  • 6. Automation is required to apply more sophisticated methods to far more data Outsourcing is needed to achieve economies of scale in the use of automated methods
  • 7. Building a discovery cloud • Identify time-consuming activities amenable to automation and outsourcing • Implement as high-quality, low-touch SaaS • Leverage IaaS for reliability, economies of scale • Extract common elements as research automation platform [Stack diagram: Software as a service; Platform as a service; Infrastructure as a service] Bonus question: Sustainability
  • 8. We aspire (initially) to create a great user experience for research data management What would a “dropbox for science” look like?
  • 9. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA
  • 10. It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging [Diagram: Staging Store, Ingest Store, Registry, Community Store, Analysis Store, Archive, Mirror; failure callouts: Expired credentials; Permission denied; Quota exceeded; Network failed. Retry.]
  • 11. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA
  • 12. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA. Highlighted: Move, Sync, Share. Capabilities delivered using Software-as-a-Service (SaaS) model
  • 14. 1) User A selects file(s) to share; selects user/group, sets share permissions 2) Globus Online tracks shared files; no need to move files to cloud storage! 3) User B logs in to Globus Online and accesses shared file [Diagram label: Data Source]
  • 15. Extreme ease of use • InCommon, OAuth, OpenID, X.509, … • Credential management • Group definition and management • Transfer management and optimization • Reliability via transfer retries • Web interface, REST API, command line • One-click “Globus Connect” install • 5-minute Globus Connect Multi User install
  • 16. Early adoption is encouraging
  • 17. Early adoption is encouraging • >12,000 registered users; >150 daily • >27 PB moved; >1B files • 10x (or better) performance vs. scp • 99.9% availability • Entirely hosted on Amazon
  • 18. Amazon web services used • EC2 for hosting Globus services • ELB to use multiple availability zones for reliability and uptime • SES and SNS to send notifications of transfer status • S3 to store historical state • PostgreSQL for active state
  • 19. K. Heitmann (Argonne) moves 22 TB of cosmology data between LANL and ANL at 5 Gb/s
  • 20. B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC
  • 21. Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
  • 22. Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL. Credit: Kerstin Kleese-van Dam
  • 23. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA. Highlighted: Move, Sync, Share. Capabilities delivered using Software-as-a-Service (SaaS) model
  • 24. • Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA
  • 25. Globus Online already does a lot [Stack diagram: Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Toolkit; Globus Connect; Globus Online APIs]
  • 26. The identity challenge in science • Research communities often need to – Assign identities to their users – Manage user profiles – Organize users into groups for authorization • Obstacles to high-quality implementations – Complexity of associated security protocols – Creation of identity silos – Multiple credentials for users – Reliability, availability, scalability, security
  • 27. Nexus provides four key capabilities • Identity provisioning – Create, manage Globus identities • Identity hub – Link with other identities; use to authenticate to services • Group hub – User-managed groups; groups can be used for authorization • Profile management – User-managed attributes; can use in group admission Key points: 1) Outsource identity, group, profile management 2) REST API for flexible integration 3) Intuitive, customizable Web interfaces
  • 28. Branded sites: XSEDE, Open Science Grid, University of Chicago, DOE kBase, Indiana University, University of Exeter, Globus Online, NERSC, NIH BIRN
  • 29. A platform for integration
  • 30. A platform for integration
  • 31. A platform for integration
  • 32. Data management SaaS (Globus) + Next-gen sequence analysis pipelines (Galaxy) + Cloud IaaS (Amazon) = Flexible, scalable, easy-to-use genomics analysis for all biologists globus genomics
  • 33. We are adding capabilities [Stack diagram: Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Toolkit; Globus Connect; Globus Online APIs]
  • 34. We are adding capabilities [Stack diagram: Dataset Services; Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Toolkit; Globus Connect; Globus Online APIs]
  • 35. We are adding capabilities • Ingest and publication – Imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts • Cataloging – Virtual views of data based on user-defined and/or automatically extracted metadata • Computation – Associate computational procedures, orchestrate application, catalog results, record provenance
  • 36. Next Gen Sequencing Analysis for Everyone – No IT Required Ravi K Madduri, The University of Chicago and Argonne National Laboratory November 14, 2013
  • 37. One slide to get your attention
  • 38. Outline • Globus Vision • Challenges in Sequencing Analysis – Big Data Management – Analysis at Scale – Reproducibility • Proposed Approach Using Globus Genomics • Example Collaborations • Q&A
  • 39. Globus Vision. Goal: Accelerate discovery and innovation worldwide by providing research IT as a service. Leverage software-as-a-service to: – provide millions of researchers with unprecedented access to powerful tools for managing Big Data – reduce research IT costs dramatically via economies of scale. “Civilization advances by extending the number of important operations which we can perform without thinking of them” (Alfred North Whitehead, 1911)
  • 40. Challenges in Sequencing Analysis. Data Movement and Access Challenges: • Data is distributed in different locations • Research labs need access to the data for analysis • Be able to share data with other researchers/collaborators • Inefficient ways of data movement • Data needs to be available on the local and distributed compute resources (Local Clusters, Cloud, Grid). Manual Data Analysis: • Shell scripts to sequentially execute the tools • Manually modify the scripts for any change • Manually move the data to the compute node • Install all the tools required for the analysis (BWA, Picard, GATK, filtering scripts, etc.) • Difficult to maintain and transfer the knowledge • Error prone, difficult to keep track, messy. [Diagram labels: Public Data, Storage, Sequencing Centers, Seq Center, Research Lab, Local Cluster/Cloud, Fastq, Ref Genome, Install, Modify, (Re)Run Script, Picard, GATK, Alignment, Variant Calling] How do we analyze this sequence data?
  • 41. Globus Genomics. Data Management: Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints. Data Analysis: Galaxy-based workflow management system • Galaxy Data Libraries • Globus integrated within Galaxy • Web-based UI • Drag-drop workflow creation • Easily modify workflows with new tools • Analytical tools are automatically run on the scalable compute resources when possible. Globus Genomics on Amazon EC2. [Diagram labels: Public Data, Sequencing Centers, Seq Center, Research Lab, Storage, Local Cluster/Cloud]
  • 42. Globus Genomics Architecture Figure 2: Globus Genomics Architecture
  • 44. [Poster excerpt] Affiliations: 1 Computation Institute, University of Chicago, Chicago, IL, USA. 2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA. 3 Section of Genetic Medicine, University of Chicago, Chicago, IL. Panel titles: Challenges in Next-Gen Sequencing Analysis; Parallel Workflows on Globus Genomics; High Performance, Reusable Consensus Calling Pipeline
  • 45. Globus Genomics • Computational profiles for various analysis tools • Resources can be provisioned on demand with Amazon Web Services cloud-based infrastructure • GlusterFS as a shared file system between head nodes and compute nodes • Provisioned I/O on EBS
  • 46. Coming soon! • Integration with Globus Catalog – Better data discovery and metadata management • Integration with Globus Sharing – Easy and secure method to share large datasets with collaborators • Integration with Amazon Glacier for data archiving • Support for high-throughput computational modalities through Apache Mesos – MapReduce and MPI clusters • Dynamic storage strategies using S3 and/or LVM-based shared file system
  • 48. Our vision for a 21st century discovery infrastructure: Provide more capability for more people at lower cost by building a “Discovery Cloud”, delivering “Science as a service”
  • 49. Thank you to our sponsors
  • 50. For more information • More information on Globus Genomics and to sign up: www.globus.org/genomics • More information on Globus: www.globusonline.org • Follow us on Twitter: @ianfoster, @madduri, @globusgenomics, @globusonline
  • 52. Please give us your feedback on this presentation (BDT 310). As a thank you, we will select prize winners daily for completed surveys!
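
Slide 15 lists "Reliability via transfer retries", and slide 10 shows a "Network failed. Retry." failure. The idea can be sketched in Python as below. This is only an illustration and not how Globus Online is implemented: the names `TransferError`, `transfer_with_retries`, and `make_flaky_transfer`, and the exponential backoff schedule, are assumptions introduced here and do not come from the slides.

```python
import time
from typing import Callable


class TransferError(Exception):
    """Hypothetical error raised when a single transfer attempt fails."""


def transfer_with_retries(
    attempt_transfer: Callable[[], str],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call attempt_transfer until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_transfer()
        except TransferError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last failure.
            # Assumed schedule: wait 1x, 2x, 4x, ... the base delay.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def make_flaky_transfer(failures_before_success: int) -> Callable[[], str]:
    """Build a stand-in transfer that fails a fixed number of times first."""
    remaining = {"n": failures_before_success}

    def attempt() -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransferError("Network failed. Retry.")
        return "SUCCEEDED"

    return attempt
```

Passing `sleep` as a parameter lets a caller substitute a recorder for `time.sleep`, so the retry schedule can be checked without waiting. For example, `transfer_with_retries(make_flaky_transfer(2), sleep=lambda s: None)` retries twice and then returns the stand-in's success value.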

Editor's Notes

  1. For example in genomics
  2. For example in genomics
  3. 10,000 80% of awards and 50% of grant $$ are < $350K
  4. Concern:
  5. Many in this room are probably users of Dropbox or similar services for keeping their files synced across multiple machines. Well, the scientific research equivalent is a little different.
  6. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data … and not just the 2 GB of PowerPoints, or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  7. So how would such a Dropbox for science be used? Let’s look at a very typical scientific data workflow. Data is generated by some instrument (a sequencer at JGI or a light source like APS/ALS). Since these instruments are in high demand, users have to get their data off the instrument to make way for the next user. So the data is typically moved from a staging area to some type of ingest store. Etcetera for analysis, sharing of results with collaborators, annotation with metadata for future search, backup/sync/archival, …
  8. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data … and not just the 2 GB of PowerPoints, or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  9. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data … and not just the 2 GB of PowerPoints, or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  10. And when we spoke with IT folks at various research communities they insisted that some things were not up for negotiation
  11. And when we spoke with IT folks at various research communities they insisted that some things were not up for negotiation
  12. And when we spoke with IT folks at various research communities they insisted that some things were not up for negotiation
  13. This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth bioFilm Fermenter (CDFF).  The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.   
  14. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data … and not just the 2 GB of PowerPoints, or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  15. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data … and not just the 2 GB of PowerPoints, or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  16. Four obstacles to collaborative application development. Build collaborative applications: – Outsource identity, group and profile management – REST API for flexible integration – Intuitive, customizable interfaces
  17. Total: over 350K core hours in the last 6 months
  18. Questions remain: -- What capabilities? Where does time go? -- How do we turn them into usable solutions? -- How do we scale from thousands to millions? -- How do we incentivize contributions? Long tail.