Streamlined
data sharing and analysis
to accelerate cancer research
Ian Foster
The University of Chicago and Argonne National Laboratory
1
2
Thesis: We enhance
sharing and analysis
by eliminating friction
3
1919 Motor Transport Corps convoy
Washington, DC., to San Francisco
56 days, average speed of 9 km/h
2016: 41 hours by road, 5.5 hours by air
5
2 minutes by web
<1 second by API
Cloud: Outsourcing and automation
6
Software as a service: SaaS
Infrastructure as a service: IaaS
Platform as a service: PaaS
(web & mobile apps)
Cloud: Outsourcing and automation
7
Software as a service: SaaS
Infrastructure as a service: IaaS
Platform as a service: PaaS
(web & mobile apps)Saas for
science
Data challenges:
Olopade Lab
Inherited hematological malignancies
Impact:
• Familial blood cancer syndromes are being included in the 2016 revision of World Health Organization
Classification of Hematological Malignancies; NCCN guidelines; European LeukemiaNet
• Identification of germline mutations is important for prevention/intervention and early diagnosis, and
may change treatment (e.g., stem cell transplant from related donor w/o mutation or matched
unrelated donor)
Background:
• Familial predisposition to blood cancers has not been widely appreciated,
like some solid cancers
• Identifying the genes involved informs understanding of biology and may
impact patient care (prevention, diagnosis and treatment)
Jane Churpek, MD Lucy Godley, MD, PhD
Research highlights:
• With samples from >500 families, the team has identified novel germline
mutations that predispose to familial myelodysplastic syndromes and leukemia
• These mutations are much more common than previously known
• Specific genes with identified mutations include RUNX1, ETV6, DDX41, ANKRD26
The RUNX1
International
Sequencing
Consortium
(RISC)
Inherited hematological malignancies, Lucy Godley, UChicago
Notable areas of friction
• Moving data rapidly, securely, and reliably from lab to lab
• Accessing data at other labs
• Controlling who can access data
• Tracking what data is where
• Discovering available data within a rapidly growing haystack
• Computing at scale
• Complying with rules on personal health information
• Archive and backup
11
Sequencing center
Publication
repository
Personal Computer
Compute facility
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Publication
repository
Personal Computer
1
Sequencing center Compute facility
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Compute facilityGlobus transfers
files reliably,
securely
2
Personal Computer
Transfer
1
Sequencing center
Publication
repository
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Researcher
selects files to
share, selects user
or group, and sets
access permissions
Publication
repository
Personal Computer
1 3
Share
Compute facilityGlobus transfers
files reliably,
securely
2
Transfer
Sequencing center
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Globus controls access to
shared files on existing
storage; no need to move
files to cloud storage!
Researcher
selects files to
share, selects user
or group, and sets
access permissions
Publication
repository
Personal Computer
1 3
Share
4
Compute facilityGlobus transfers
files reliably,
securely
2
Transfer
Sequencing center
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Researcher
selects files to
share, selects user
or group, and sets
access permissions
Collaborator logs in
to access shared
files; no local
account needed;
download via
Globus
Publication
repository
Personal Computer
1 3
Share
5
Compute facilityGlobus transfers
files reliably,
securely
2
Transfer
Sequencing center
Globus controls access to
shared files on existing
storage; no need to move
files to cloud storage!
4
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Researcher
selects files to
share, selects user
or group, and sets
access permissions
Collaborator logs in
to access shared
files; no local
account needed;
download via
Globus
Researcher
assembles data set;
attaches metadata
(Dublin core,
domain-specific) Publication
repository
Personal Computer
1 3
Share
Publish
5
6
6
Compute facilityGlobus transfers
files reliably,
securely
2
Transfer
Sequencing center
Globus controls access to
shared files on existing
storage; no need to move
files to cloud storage!
4
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Curator reviews and
approves; data set published
on campus or other system
Researcher
selects files to
share, selects user
or group, and sets
access permissions
Collaborator logs in
to access shared
files; no local
account needed;
download via
Globus
Researcher
assembles data set;
attaches metadata
(Dublin core,
domain-specific) Publication
repository
Personal Computer
1 3
Share
Publish
5
6
6
7
Compute facilityGlobus transfers
files reliably,
securely
2
Transfer
Globus controls access to
shared files on existing
storage; no need to move
files to cloud storage!
4
Sequencing center
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Curator reviews and
approves; data set published
on campus or other system
Researcher
selects files to
share, selects user
or group, and sets
access permissions
Collaborator logs in
to access shared
files; no local
account needed;
download via
Globus
Researcher
assembles data set;
attaches metadata
(Dublin core,
domain-specific)
Peers, collaborators
search and discover
datasets; transfer and
share using Globus
Publication
repository
Personal Computer
1 3
Share
Publish
Discover
5
6
6
7
8
Compute facilityGlobus transfers
files reliably,
securely
2
Transfer
Globus controls access to
shared files on existing
storage; no need to move
files to cloud storage!
4
Sequencing center
Researcher
initiates transfer
request; or requested
automatically by script,
science gateway
Curator reviews and
approves; data set published
on campus or other system
Researcher
selects files to
share, selects user
or group, and sets
access permissions
Collaborator logs in
to access shared
files; no local
account needed;
download via
Globus
Researcher
assembles data set;
attaches metadata
(Dublin core,
domain-specific)
Peers, collaborators
search and discover
datasets; transfer and
share using Globus
Publication
repository
Personal Computer
• Only Web browser required
• Use any storage system
• Access using any credential
1 3
Share
Publish
Discover
5
6
6
7
8
Compute facilityGlobus transfers
files reliably,
securely
2
Transfer
Sequencing center
Globus controls access to
shared files on existing
storage; no need to move
files to cloud storage!
4
How Globus adds value…
• Ease of use, consistent user interface across systems
• “Fire-and-forget” reliable file transfer
• Low-overhead external collaboration
• Secure access, multi-tier security model
• Maximized wide area network throughput
• Rapid deployment via standard packages
• Highly automatable: CLI, RESTful API
23
24
25
26
27
28
29
30
Storage connectors
Standard (Posix)
Linux
Windows
MacOS
Lustre, GPFS, OrangeFS, ...
31
Premium
HPSS
HDFS
Amazon S3
Ceph RadosGW (S3 API)
Spectra Logic BlackPearl
Google Drive*
* Coming soon
Globus accelerates disk-to-disk throughput
0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000
scp
scp (w/HPN)
sftp
GridFTP
(1 stream)
GridFTP
(4 streams)
Disk-to-Disk Throughput (Mbps)
32Source: ESnet (2016)
• Berkeley, CA to Argonne, IL
(RTT: 53 ms, Capacity: 10Gbps)
• scp is 24x slower than GridFTP
• >1 Gbps (125 MB/s) disk-to-disk
requires RAID array
34
35
36
Globus is widely used
4
major services
13
national labs
190 PB
transferred
10,000
active endpoints
20 billion
files processed
10,000
active users
50,000
registered users
99.9%
uptime
35+
institutional
subscribers
1 PB
largest single
transfer to date
3 months
longest
continuously
managed transfer
130
federated
campus identities
0
20
40
60
80
100
120
140
160
180
2015-03
2015-04
2015-05
2015-06
2015-07
2015-08
2015-09
2015-10
2015-11
2015-12
2016-01
2016-02
2016-03
2016-04
2016-05
2016-06
2016-07
2016-08
Terabytes
Year and Month
Terabytes per Month
0
20
40
60
80
100
120
140
2015-03
2015-04
2015-05
2015-06
2015-07
2015-08
2015-09
2015-10
2015-11
2015-12
2016-01
2016-02
2016-03
2016-04
2016-05
2016-06
2016-07
2016-08
Users
Year and Month
Users per Month
Globus @ NIH
Globus subscriptions for sustainability
• Standard subscription
• Shared endpoints
• Data publication
• HTTPS support*
• Management console
• Usage reporting
• Priority support
• Application integration
• Branded Web Site
• Premium Storage Connectors
• Amazon S3, Ceph, HPSS, Spectra, Google Drive*, …
• Alternate Identity Provider (InCommon is standard) 39*Available late 2016
Representativesubscribers
41
Cloud: Outsourcing and automation
42
Software as a service: SaaS
Infrastructure as a service: IaaS
Platform as a service: PaaS
(web & mobile apps)
PaaS for
science
43
44
45
46
10GE10GE
10GE
10GE
Border Router
WAN
Science DMZ
Switch/Router
Firewall
Enterprise
perfSONAR
perfSONAR
10GE
10GE
10GE
10GE
DTN
DTN
API DTNs
(data access governed
by portal)
DTN
DTN
perfSONAR
Filesystem
(data store)
10GE
Portal
Server
Browsing path
Query path
Portal server applications:
web server
search
database
authentication
Data Path
Data Transfer Path
Portal Query/Browse Path
47
https://fasterdata.es.net/
Globus
leverages
Science
DMZs
Prototypical research data portal
• Move portal storage
into Science DMZ,
with Globus endpoint
• Leave portal web
server behind firewall
• Globus handles
security and data
heavy lifting
48
Desktop
Globus Cloud
Firewall
Science DMZ
Globus
Transfer
Service
Portal Web
Server (Client)
Globus Auth
Browser
User’s
Endpoint
(optional)
Portal
Endpoint
Other
Endpoints
HTTPS
GridFTP
REST Other
Services
Globus Web
Widgets
50
https://github.com/globus/globus-sample-data-portal
51
https://www.globusworld.org/tour/
Workflows can be easily defined
and automated with integrated
Galaxy Platform capabilities
Data movement is streamlined
with integrated Globus transfer
Resources can be provisioned on-
demand with Amazon Web Services
cloud based infrastructure
Globus Genomics: Genomics analysis as a service
Ravi Madduri et al., University of Chicago
Globus Genomics use cases
A profile of inherited predisposition to breast cancer among Nigerian women
Y. Zheng, T. Walsh, F. Yoshimatsu, M. Lee, S. Gulsuner, S. Casadei, A. Rodriguez, T. Ogundiran, C. Babalola, O.
Ojengbede, D. Sighoko, R. Madduri, M.-C. King, O. Olopade
A case study for high throughput analysis of NGS data for translational research
using Globus Genomics
D. Sulakhe, A. Rodriguez, K. Bhuvaneshwar, Y. Gusev, R. Madduri, L. Lacinski, U. Dave, I. Foster, S. Madhavan
Globus Genomics at a glance
30
institutions, groups
10s
million core hours
2 PBs
raw sequence
analyzed
1,500
analysis tools
10,000
genomes
processed
50
workflows
99%
uptime over the past
two years
1 PB
data generated
43
steps in longest
pipeline
100s
species
75
largest user group
5 days
longest running
workflow
Cost-aware provisioning on cloud resources
55
1. Filter instance types with profiles
2. Determine price for each instance
type across all availability zones
3. Rank potential requests
4. Make requests and monitor
5. Cancel or repurpose excess active
requests once one is fulfilled
Can reduce costs by 95% or more!
$$$
???
R. Chard et al. Cost-aware cloud provisioning, 11th IEEE International Conference on e-Science (e-Science), 2015.
What’s coming soon: Richer endpoints
HTTPS access to endpoints
• Enhanced use of research storage:
• Asynchronous, bulk transfer: GridFTP
• Synchronous remote access: HTTPS
• Enhanced Globus web app
• Browser-based upload/download
• Inline file viewer
• Integration with clients, web apps
56
GridFTP
What’s coming soon: Richer endpoints
57
GridFTP
Collections
• Groupings of files that are to be
treated as logical units
• Can be named and described
HTTPS access to endpoints
• Enhanced use of research storage:
• Asynchronous, bulk transfer: GridFTP
• Synchronous remote access: HTTPS
• Enhanced Globus web app
• Browser-based upload/download
• Inline file viewer
• Integration with clients, web apps
What’s coming soon: Richer endpoints
58
Data search
• Automated metadata harvesting
• From Globus endpoints
• Submitted via REST API
• Rich search capabilities
• Free text, faceted, boosted
GridFTP
HTTPS access to endpoints
• Enhanced use of research storage:
• Asynchronous, bulk transfer: GridFTP
• Synchronous remote access: HTTPS
• Enhanced Globus web app
• Browser-based upload/download
• Inline file viewer
• Integration with clients, web apps
Collections
• Groupings of files that are to be
treated as logical units
• Can be named and described
Thank you to our sponsors
U . S . D E P A R T M E N T O F
ENERGY
59
Thanks to: Rachana Ananthakrishnan, Kyle Chard, Ravi Madduri,
Brigitte Raumann, Steve Tuecke, Vas Vasiliadis,
and others in the Globus team at the University of Chicago
Globus provides a new global-scale data fabric that can accelerate
discovery by streamlining scientific data sharing and analysis
• Globus-enabled storage systems enable robust, secure access
• Globus cloud services implement transfer, sharing, publication,
discovery, and other capabilities
This fabric is:
• Being applied in cancer research
• Spreading rapidly by word of mouth (scientists like it!)
• Widely deployed across universities and labs (thanks, NSF & DOE)
• On a path to sustainability based on subscriptions
• Being integrated into research infrastructures and applications 60
To accelerate impact in biomedicine:
•Integrate biomedical research facilities into the fabric
•Encourage subscriptions to address sustainability
•Provide HIPAA compliance for applications involving PHI
•Cultivate an ecosystem of data portals and applications
that leverage the platform
•Continue to add capabilities
61
www.globus.org foster@uchicago.edu

Streamlined data sharing and analysis to accelerate cancer research

  • 1.
    Streamlined data sharing andanalysis to accelerate cancer research Ian Foster The University of Chicago and Argonne National Laboratory 1
  • 2.
    2 Thesis: We enhance sharingand analysis by eliminating friction
  • 3.
    3 1919 Motor TransportCorps convoy Washington, DC., to San Francisco 56 days, average speed of 9 km/h
  • 4.
    2016: 41 hoursby road, 5.5 hours by air
  • 5.
    5 2 minutes byweb <1 second by API
  • 6.
    Cloud: Outsourcing andautomation 6 Software as a service: SaaS Infrastructure as a service: IaaS Platform as a service: PaaS (web & mobile apps)
  • 7.
    Cloud: Outsourcing andautomation 7 Software as a service: SaaS Infrastructure as a service: IaaS Platform as a service: PaaS (web & mobile apps)Saas for science
  • 8.
  • 9.
    Inherited hematological malignancies Impact: •Familial blood cancer syndromes are being included in the 2016 revision of World Health Organization Classification of Hematological Malignancies; NCCN guidelines; European LeukemiaNet • Identification of germline mutations is important for prevention/intervention and early diagnosis, and may change treatment (e.g., stem cell transplant from related donor w/o mutation or matched unrelated donor) Background: • Familial predisposition to blood cancers has not been widely appreciated, like some solid cancers • Identifying the genes involved informs understanding of biology and may impact patient care (prevention, diagnosis and treatment) Jane Churpek, MD Lucy Godley, MD, PhD Research highlights: • With samples from >500 families, the team has identified novel germline mutations that predispose to familial myelodysplastic syndromes and leukemia • These mutations are much more common than previously known • Specific genes with identified mutations include RUNX1, ETV6, DDX41, ANKRD26
  • 10.
  • 11.
    Notable areas offriction • Moving data rapidly, securely, and reliably from lab to lab • Accessing data at other labs • Controlling who can access data • Tracking what data is where • Discovering available data within a rapidly growing haystack • Computing at scale • Complying with rules on personal health information • Archive and backup 11
  • 13.
  • 14.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Publication repository Personal Computer 1 Sequencing center Compute facility
  • 15.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Compute facilityGlobus transfers files reliably, securely 2 Personal Computer Transfer 1 Sequencing center Publication repository
  • 16.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Researcher selects files to share, selects user or group, and sets access permissions Publication repository Personal Computer 1 3 Share Compute facilityGlobus transfers files reliably, securely 2 Transfer Sequencing center
  • 17.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Globus controls access to shared files on existing storage; no need to move files to cloud storage! Researcher selects files to share, selects user or group, and sets access permissions Publication repository Personal Computer 1 3 Share 4 Compute facilityGlobus transfers files reliably, securely 2 Transfer Sequencing center
  • 18.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Researcher selects files to share, selects user or group, and sets access permissions Collaborator logs in to access shared files; no local account needed; download via Globus Publication repository Personal Computer 1 3 Share 5 Compute facilityGlobus transfers files reliably, securely 2 Transfer Sequencing center Globus controls access to shared files on existing storage; no need to move files to cloud storage! 4
  • 19.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Researcher selects files to share, selects user or group, and sets access permissions Collaborator logs in to access shared files; no local account needed; download via Globus Researcher assembles data set; attaches metadata (Dublin core, domain-specific) Publication repository Personal Computer 1 3 Share Publish 5 6 6 Compute facilityGlobus transfers files reliably, securely 2 Transfer Sequencing center Globus controls access to shared files on existing storage; no need to move files to cloud storage! 4
  • 20.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Curator reviews and approves; data set published on campus or other system Researcher selects files to share, selects user or group, and sets access permissions Collaborator logs in to access shared files; no local account needed; download via Globus Researcher assembles data set; attaches metadata (Dublin core, domain-specific) Publication repository Personal Computer 1 3 Share Publish 5 6 6 7 Compute facilityGlobus transfers files reliably, securely 2 Transfer Globus controls access to shared files on existing storage; no need to move files to cloud storage! 4 Sequencing center
  • 21.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Curator reviews and approves; data set published on campus or other system Researcher selects files to share, selects user or group, and sets access permissions Collaborator logs in to access shared files; no local account needed; download via Globus Researcher assembles data set; attaches metadata (Dublin core, domain-specific) Peers, collaborators search and discover datasets; transfer and share using Globus Publication repository Personal Computer 1 3 Share Publish Discover 5 6 6 7 8 Compute facilityGlobus transfers files reliably, securely 2 Transfer Globus controls access to shared files on existing storage; no need to move files to cloud storage! 4 Sequencing center
  • 22.
    Researcher initiates transfer request; orrequested automatically by script, science gateway Curator reviews and approves; data set published on campus or other system Researcher selects files to share, selects user or group, and sets access permissions Collaborator logs in to access shared files; no local account needed; download via Globus Researcher assembles data set; attaches metadata (Dublin core, domain-specific) Peers, collaborators search and discover datasets; transfer and share using Globus Publication repository Personal Computer • Only Web browser required • Use any storage system • Access using any credential 1 3 Share Publish Discover 5 6 6 7 8 Compute facilityGlobus transfers files reliably, securely 2 Transfer Sequencing center Globus controls access to shared files on existing storage; no need to move files to cloud storage! 4
  • 23.
    How Globus addsvalue… • Ease of use, consistent user interface across systems • “Fire-and-forget” reliable file transfer • Low-overhead external collaboration • Secure access, multi-tier security model • Maximized wide area network throughput • Rapid deployment via standard packages • Highly automatable: CLI, RESTful API 23
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
    Storage connectors Standard (Posix) Linux Windows MacOS Lustre,GPFS, OrangeFS, ... 31 Premium HPSS HDFS Amazon S3 Ceph RadosGW (S3 API) Spectra Logic BlackPearl Google Drive* * Coming soon
  • 32.
    Globus accelerates disk-to-diskthroughput 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 scp scp (w/HPN) sftp GridFTP (1 stream) GridFTP (4 streams) Disk-to-Disk Throughput (Mbps) 32Source: ESnet (2016) • Berkeley, CA to Argonne, IL (RTT: 53 ms, Capacity: 10Gbps) • scp is 24x slower than GridFTP • >1 Gbps (125 MB/s) disk-to-disk requires RAID array
  • 34.
  • 35.
  • 36.
  • 37.
    Globus is widelyused 4 major services 13 national labs 190 PB transferred 10,000 active endpoints 20 billion files processed 10,000 active users 50,000 registered users 99.9% uptime 35+ institutional subscribers 1 PB largest single transfer to date 3 months longest continuously managed transfer 130 federated campus identities
  • 38.
    0 20 40 60 80 100 120 140 160 180 2015-03 2015-04 2015-05 2015-06 2015-07 2015-08 2015-09 2015-10 2015-11 2015-12 2016-01 2016-02 2016-03 2016-04 2016-05 2016-06 2016-07 2016-08 Terabytes Year and Month Terabytesper Month 0 20 40 60 80 100 120 140 2015-03 2015-04 2015-05 2015-06 2015-07 2015-08 2015-09 2015-10 2015-11 2015-12 2016-01 2016-02 2016-03 2016-04 2016-05 2016-06 2016-07 2016-08 Users Year and Month Users per Month Globus @ NIH
  • 39.
    Globus subscriptions forsustainability • Standard subscription • Shared endpoints • Data publication • HTTPS support* • Management console • Usage reporting • Priority support • Application integration • Branded Web Site • Premium Storage Connectors • Amazon S3, Ceph, HPSS, Spectra, Google Drive*, … • Alternate Identity Provider (InCommon is standard) 39*Available late 2016
  • 41.
  • 42.
    Cloud: Outsourcing andautomation 42 Software as a service: SaaS Infrastructure as a service: IaaS Platform as a service: PaaS (web & mobile apps) PaaS for science
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
    10GE10GE 10GE 10GE Border Router WAN Science DMZ Switch/Router Firewall Enterprise perfSONAR perfSONAR 10GE 10GE 10GE 10GE DTN DTN APIDTNs (data access governed by portal) DTN DTN perfSONAR Filesystem (data store) 10GE Portal Server Browsing path Query path Portal server applications: web server search database authentication Data Path Data Transfer Path Portal Query/Browse Path 47 https://fasterdata.es.net/ Globus leverages Science DMZs
  • 48.
    Prototypical research dataportal • Move portal storage into Science DMZ, with Globus endpoint • Leave portal web server behind firewall • Globus handles security and data heavy lifting 48 Desktop Globus Cloud Firewall Science DMZ Globus Transfer Service Portal Web Server (Client) Globus Auth Browser User’s Endpoint (optional) Portal Endpoint Other Endpoints HTTPS GridFTP REST Other Services Globus Web Widgets
  • 50.
  • 51.
  • 52.
    Workflows can beeasily defined and automated with integrated Galaxy Platform capabilities Data movement is streamlined with integrated Globus transfer Resources can be provisioned on- demand with Amazon Web Services cloud based infrastructure Globus Genomics: Genomics analysis as a service Ravi Madduri et al., University of Chicago
  • 53.
    Globus Genomics usecases A profile of inherited predisposition to breast cancer among Nigerian women Y. Zheng, T. Walsh, F. Yoshimatsu, M. Lee, S. Gulsuner, S. Casadei, A. Rodriguez, T. Ogundiran, C. Babalola, O. Ojengbede, D. Sighoko, R. Madduri, M.-C. King, O. Olopade A case study for high throughput analysis of NGS data for translational research using Globus Genomics D. Sulakhe, A. Rodriguez, K. Bhuvaneshwar, Y. Gusev, R. Madduri, L. Lacinski, U. Dave, I. Foster, S. Madhavan
  • 54.
    Globus Genomics ata glance 30 institutions, groups 10s million core hours 2 PBs raw sequence analyzed 1,500 analysis tools 10,000 genomes processed 50 workflows 99% uptime over the past two years 1 PB data generated 43 steps in longest pipeline 100s species 75 largest user group 5 days longest running workflow
  • 55.
    Cost-aware provisioning oncloud resources 55 1. Filter instance types with profiles 2. Determine price for each instance type across all availability zones 3. Rank potential requests 4. Make requests and monitor 5. Cancel or repurpose excess active requests once one is fulfilled Can reduce costs by 95% or more! $$$ ??? R. Chard et al. Cost-aware cloud provisioning, 11th IEEE International Conference on e-Science (e-Science), 2015.
  • 56.
    What’s coming soon:Richer endpoints HTTPS access to endpoints • Enhanced use of research storage: • Asynchronous, bulk transfer: GridFTP • Synchronous remote access: HTTPS • Enhanced Globus web app • Browser-based upload/download • Inline file viewer • Integration with clients, web apps 56 GridFTP
  • 57.
    What’s coming soon:Richer endpoints 57 GridFTP Collections • Groupings of files that are to be treated as logical units • Can be named and described HTTPS access to endpoints • Enhanced use of research storage: • Asynchronous, bulk transfer: GridFTP • Synchronous remote access: HTTPS • Enhanced Globus web app • Browser-based upload/download • Inline file viewer • Integration with clients, web apps
  • 58.
    What’s coming soon:Richer endpoints 58 Data search • Automated metadata harvesting • From Globus endpoints • Submitted via REST API • Rich search capabilities • Free text, faceted, boosted GridFTP HTTPS access to endpoints • Enhanced use of research storage: • Asynchronous, bulk transfer: GridFTP • Synchronous remote access: HTTPS • Enhanced Globus web app • Browser-based upload/download • Inline file viewer • Integration with clients, web apps Collections • Groupings of files that are to be treated as logical units • Can be named and described
  • 59.
    Thank you toour sponsors U . S . D E P A R T M E N T O F ENERGY 59 Thanks to: Rachana Ananthakrishnan, Kyle Chard, Ravi Madduri, Brigitte Raumann, Steve Tuecke, Vas Vasiliadis, and others in the Globus team at the University of Chicago
  • 60.
    Globus provides anew global-scale data fabric that can accelerate discovery by streamlining scientific data sharing and analysis • Globus-enabled storage systems enable robust, secure access • Globus cloud services implement transfer, sharing, publication, discovery, and other capabilities This fabric is: • Being applied in cancer research • Spreading rapidly by word of mouth (scientists like it!) • Widely deployed across universities and labs (thanks, NSF & DOE) • On a path to sustainability based on subscriptions • Being integrated into research infrastructures and applications 60
  • 61.
    To accelerate impactin biomedicine: •Integrate biomedical research facilities into the fabric •Encourage subscriptions to address sustainability •Provide HIPAA compliance for applications involving PHI •Cultivate an ecosystem of data portals and applications that leverage the platform •Continue to add capabilities 61 www.globus.org foster@uchicago.edu

Editor's Notes

  • #4 Colonel Dwight D. Eisenhower
  • #7 Amazon VPC Microsoft one Add consumers Consumers, SMBs, large enterprises
  • #8 Amazon VPC Microsoft one Add consumers Consumers, SMBs, large enterprises
  • #35 Data Publication and Discovery
  • #43 Amazon VPC Microsoft one Add consumers Consumers, SMBs, large enterprises
  • #54 We built this pipeline to create high quality variants using multiple genotyping algorithms