Delivering a Campus Research Data Service with Globus

Keynote talk at the 2014 GlobusWorld conference (www.globusworld.org). Reviews science success stories, new features introduced over the past year, the status of adoption, and our sustainability plans, and previews our new publication service.

Speaker notes:
  • Review what the Globus team has done over the past year. Announce an exciting new capability.
  • Peter Higgs
  • Joel Brownstein is the data archivist of the Sloan Digital Sky Survey-IV. He transfers daily telescope observations to the University of Utah, where a large cluster runs the survey's various data reduction pipelines. Using the Globus command-line interface within their Python API, Joel has moved more than 70 TB of data so far.
  • Ann develops numerical simulations of severe storms using the Weather Research and Forecasting (WRF) model and uses several HPC facilities throughout the country. She has moved more than 100 TB of data using Globus (50 TB last January alone!), between various XSEDE resources, NCSA's mass storage system, and PSC's data archiver.
  • Bill collects tissue samples from young patients and their families, then extracts, sequences, and analyzes the genetic material to understand the underlying cause of disease. He uses Globus to move NGS data to and from public clouds, where he runs analysis pipelines. More on Bill's work later in this talk (under Globus Genomics).
  • A number of common patterns, each supported by Globus plus services deployed on various campus resources. This explains why there are so many endpoints!
  • Can be deployed with standard tools such as apt and yum. Uses a configuration file and allows incremental configuration changes. Multi-server setup: multiple I/O nodes, an ID node (MyProxy), and a Web node (OAuth).
  • Allows site administrators to monitor traffic to and from their site. Ultimately, it will also allow them to control that traffic.
  • Steve: 5 minutes
  • Science DMZ: increasingly means a dedicated research network, separate from the administrative network, without firewalls, etc.
  • Geoffrey Moore
  • Highlight CI Connect (coming up in Rob Gardner's talk). Highlight XSEDE's planned adoption of user, group, and profile management.
  • Competitive TCO. Alternatives are campus computing cores and commercial sequence analysis services.
  • A Collection is a set of Datasets; a Dataset is data plus metadata; a Collection sits within a Community. Policies on a Collection: metadata, access control, curation workflow, license, storage.
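The speaker note on the publication service defines its terms: a Community contains Collections, a Collection is a set of Datasets, a Dataset is data plus metadata, and policies (metadata, access control, curation workflow, license, storage) are set on a Collection. The sketch below models that hierarchy as Python dataclasses, assuming a simple "required metadata keys" policy and a public/restricted flag. Every class, field, and method name here (`CollectionPolicy`, `submit`, `search`, `storage_endpoint`, and so on) is hypothetical and illustrative; it is not the Globus data publication API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset is data plus metadata."""
    data_paths: list[str]        # hypothetical: file paths on the owner's own storage
    metadata: dict[str, str]


@dataclass
class CollectionPolicy:
    """Policies set on a collection (all field names are illustrative)."""
    required_metadata: list[str]  # metadata keys every dataset must supply
    public: bool                  # public vs. restricted collection
    license: str
    storage_endpoint: str         # "bring your own storage"
    curation_steps: list[str]     # ordered curation workflow (not enforced here)


@dataclass
class Collection:
    """A collection is a set of datasets governed by one policy."""
    name: str
    policy: CollectionPolicy
    datasets: list[Dataset] = field(default_factory=list)

    def submit(self, dataset: Dataset) -> None:
        """Accept a dataset only if it carries the metadata the policy requires."""
        missing = [k for k in self.policy.required_metadata if k not in dataset.metadata]
        if missing:
            raise ValueError(f"missing required metadata: {missing}")
        self.datasets.append(dataset)


@dataclass
class Community:
    """A community groups related collections."""
    name: str
    collections: list[Collection] = field(default_factory=list)

    def search(self, key: str, value: str) -> list[Dataset]:
        """Naive discovery: exact metadata match, public collections only."""
        return [
            d
            for c in self.collections
            if c.policy.public
            for d in c.datasets
            if d.metadata.get(key) == value
        ]
```

In this sketch, a dataset missing a required metadata key is rejected at submission, and `search` skips restricted collections entirely. A real service would also need per-user access control and an actual curation workflow, which this model deliberately leaves out.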

    1. Delivering a Campus Research Data Service with Globus. GlobusWorld 2014 Keynote
    2. Give me your data, your terabytes, Your huddled files yearning to breathe free … Building campus research data services. GlobusWorld 2014
    3. “It’s déjà vu all over again.” (Yogi Berra) Globus Toolkit, Globus Online, Globus, Globus
    4. Higgs discovery “only possible because of the extraordinary achievements of … grid computing” (Rolf Heuer, CERN DG). 10s of PB, 100s of institutions, 1000s of scientists, 100Ks of CPUs, Bs of tasks
    5. What is Globus (today)? Big data transfer and sharing… …with Dropbox-like simplicity… …directly from your own storage systems
    6. Reliable, secure, high-performance file transfer and synchronization • “Fire-and-forget” transfers • Automatic fault recovery • Seamless security integration • Powerful GUI and APIs. How it works: (1) the user initiates a transfer request from a data source to a data destination; (2) Globus moves and syncs the files; (3) Globus notifies the user.
    7. Simple, secure sharing off existing storage systems. How it works: (1) User A selects file(s) on the data source to share, selects a user or group, and sets permissions; (2) Globus tracks the shared files, so there is no need to move files to cloud storage; (3) User B logs in to Globus and accesses the shared file. • Easily share large data with any user or group • No cloud storage required
    8. 15,000 registered users
    9. 8,000 active endpoints (in the past year)
    10. 3 billion files transferred
    11. Globus is enabling… the study of the structure and evolution of galaxies, the nature of dark energy, and the cosmological history of the universe. Sloan Digital Sky Survey (source: University of Utah). Joel Brownstein, University of Utah
    12. Globus is enabling… the development of numerical simulations of severe storms for improved responsiveness to weather events. Weather Research and Forecasting Model (source: UCAR). Ann Syrowski, University of Illinois
    13. Globus is enabling… pediatric brain research by enhancing analysis of genetic material in pursuit of the underlying cause. Communication impairment by genetic variants (source: Wikimedia Commons). William Dobyns, U. Washington
    14. “I need a good place to store / backup / archive my (big) research data, at a reasonable price.” Public Cloud Archive, Mass Store, Campus Store
    15. “I need to easily, quickly, & reliably move or mirror portions of my data to other places.” Research Computing HPC Cluster, Lab Server, Campus Home Filesystem, Desktop Workstation, Personal Laptop, XSEDE Resource, Public Cloud
    16. “I need to easily and securely share my data with my colleagues at other institutions.”
    17. “I need to get data from a scientific instrument to my analysis server.” Next Gen Sequencer, Light Sheet Microscope, MRI, Advanced Light Source
    18. Product highlights since GlobusWorld 2013
    19. Sharing generally available
    20. Much improved Web UI
    21. Globus Connect Server • Native RPM and Debian packaging • Improved configuration management • Multi-server setup • OAuth support
    22. Management console: “Flight Control”
    23. Amazon S3 Endpoints
    24. Demonstration
    25. 85 U.S. campuses
    26. Best practice (fasterdata.es.net): create Data Transfer Nodes on existing (or new) storage with Globus Connect Server… …deploy in a Science DMZ… …use Globus as the interface
    27. We are a non-profit, delivering a production-grade service to the non-profit research community
    28. Our challenge: Sustainability. We are a non-profit, delivering a production-grade service to the non-profit research community
    29. Globus Provider Subscriptions • Managed Endpoints – Priority support – Management console – Usage reports – Mass Storage System optimization – Host shared endpoints – Integration support • Plus Subscriptions – Create and manage shared endpoints – Personal transfers • Branded Web Site • Alternate Identity Provider (InCommon is standard) https://www.globus.org/provider-plans
    30. NET+ Globus • Internet2 members get discounted Globus Provider subscriptions • Completing “Service Validation” phase – Sponsors: Cornell, U. Michigan, Yale, U. Missouri, and U. Chicago • Available to “Early Adopters” soon
    31. Bridging the gap to sustainability • $500,000 from Sloan Foundation • Recognition of what it takes to “cross the chasm” • Funds non-R&D activities – User Support – Operations – Marketing
    32. Globus Under the Covers: Identity, Group, Profile Management Services … Sharing Service, Transfer Service, Globus Toolkit, Globus Connect
    33. Globus Platform-as-a-Service: Identity, Group, Profile Management Services … Sharing Service, Transfer Service, Globus Toolkit, Globus APIs, Globus Connect
    34. Globus Genomics: flexible, scalable, affordable genomics analysis for all biologists
    35. Data management PaaS + Next-gen sequence analysis SaaS + Scalable IaaS
    36. Globus Genomics on AWS
    37. Exome: $3 – $20 • Whole Genome: $20 – $50 • RNA-Seq: <$5 • Alternatives cost 10-20x as much
    38. Dobyns Lab: exome analysis, 20x speed-up (next: 50x)
    39. Cox Lab: consensus variant calling, 134 samples in 4 days, <0.01% Mendel error rate (next: 13,000 samples)
    40. Campus Data Service User Stories • “I need a good place to store / backup / archive my (big) research data, at a reasonable price.” • “I need to easily, quickly, and reliably move or mirror portions of my data to other places.” • “I need a way to easily and securely share my data with my colleagues at other institutions.”
    41. Campus Data Service User Stories • “I need a good place to store / backup / archive my (big) research data, at a reasonable price.” • “I need to easily, quickly, and reliably move or mirror portions of my data to other places.” • “I need a way to easily and securely share my data with my colleagues at other institutions.” • “I want to publish my data.” • “I want to discover published data.”
    42. An all-too-familiar tale …
    43. What does it mean to publish? Data is: identified, described, curated, verifiable, accessible, preserved
    44. What does it mean to discover? I can: search, browse, access the data
    45. Announcing… Globus data publication services
    46. Teeing Up a Few Terms … A Community contains Collections; each Collection holds Datasets (Dataset = Data + Metadata) and carries Policies: Metadata, Access Control, License, Storage, Curation Workflow
    47. Demonstration
    48. Recap: Globus Data Publication • SaaS for publishing large research data • Bring your own storage • Extensible metadata • Publication and curation workflows • Public and restricted collections • Rich discovery model
    49. Looking for 3-5 early adopters. Summer: use and provide feedback on alpha. Fall: test beta on your campus. Winter: celebrate General Availability. Spring: tell us about it at GlobusWorld 2015!
    50. Our vision for 21st century research data management: to provide affordable, advanced capabilities for all researchers, delivering sustainable services that aggregate and federate existing resources
    51. Thank you to our sponsors! U.S. Department of Energy
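Slide 6 describes “fire-and-forget” transfers with automatic fault recovery: the user submits a request, Globus moves the files, and Globus notifies the user when it is done. The following is a minimal, hypothetical sketch of that pattern (retry each file on transient faults with exponential backoff, then send a single notification), assuming a caller-supplied `copy_file` and `notify`. It is not how the Globus Transfer service is implemented, and all names (`fire_and_forget`, `TransientError`, `max_retries`, `base_delay`) are illustrative.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable fault, e.g. a dropped connection."""


def fire_and_forget(
    files: Iterable[str],
    copy_file: Callable[[str], None],
    notify: Callable[[str], None],
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> dict[str, int]:
    """Copy every file, retrying transient faults, then notify the user once.

    Returns the number of attempts each file needed.
    """
    attempts: dict[str, int] = {}
    for path in files:
        for attempt in range(1, max_retries + 1):
            try:
                copy_file(path)
                attempts[path] = attempt
                break
            except TransientError:
                if attempt == max_retries:
                    # Give up on this file, tell the user, and re-raise the fault.
                    notify(f"transfer failed: {path}")
                    raise
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    # The user hears back exactly once, after every file has been copied.
    notify(f"transfer complete: {len(attempts)} files")
    return attempts
```

For example, if `copy_file` raises `TransientError` twice for one file and then succeeds, that file's entry in the returned dictionary is 3, and the user receives a single "transfer complete" message. A production service needs considerably more (for example, resuming partially copied files), which this sketch does not attempt.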