Delivering a Campus Research Data Service with Globus

Keynote talk at the 2014 GlobusWorld conference (www.globusworld.org). Reviews science success stories, new features introduced over the past year, status of adoption, and our sustainability plans. Previewed our new publication service.

Usage Rights

CC Attribution License

  • Review what the Globus team has done over the past year. Announce an exciting new capability.
  • Peter Higgs
  • Joel Brownstein is the data archivist of the Sloan Digital Sky Survey-IV. Transfers daily telescope observations to the University of Utah, where they have a large cluster to run their various data reduction pipelines. Uses the Globus command-line interface within their Python API. Joel has moved more than 70 TB of data so far.
  • Ann develops numerical simulations of severe storms using the Weather Research and Forecasting (WRF) model. Uses several HPC facilities throughout the country. Moved more than 100 TB of data using Globus (50 TB last January alone!). Moves data between various XSEDE resources, NCSA's mass storage system, and PSC's data archiver.
  • Collects tissue samples from young patients and their families, then extracts, sequences, and analyzes the genetic material to understand the underlying cause of disease. Uses Globus to move NGS data to and from public clouds, where he runs analysis pipelines. More on Bill’s work later in this talk (under Globus Genomics).
  • A number of common patterns, each supported by Globus, plus services deployed on various campus resources. Explains why so many endpoints!
  • Can use standard tools such as apt and yum to deploy. Uses a configuration file. Allows incremental config changes. Multiple I/O nodes. ID node (MyProxy). Web node (OAuth).
  • Allows site administrators to monitor traffic to/from their site. Ultimately will allow for control.
  • Steve: 5 minutes
  • Science DMZ: Increasingly means a dedicated research network, separate from administrative network, without firewalls etc.
  • Geoffrey Moore
  • Highlight CI Connect; coming up in Rob Gardner’s talk. Highlight XSEDE’s planned adoption of user, group and profile management.
  • Competitive TCO. Alternatives are campus computing cores and commercial sequence analysis services.
  • Collection is a set of Datasets. Dataset is data + metadata. Collection is within a Community. Policies on a Collection: metadata, access control, curation workflow, license, storage.
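
The terms in the last note (a Community contains Collections, a Collection is a set of Datasets with policies attached, a Dataset is data plus metadata) can be sketched as a small data model. This is only an illustration of the relationships described in the talk, not the Globus publication service's schema or API; every class, field, and value name below is a hypothetical choice made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """A dataset is data plus the metadata describing it."""
    data_paths: List[str]
    metadata: Dict[str, str]


@dataclass
class CollectionPolicies:
    """Policies attached to a collection (field names are hypothetical)."""
    required_metadata: List[str]   # metadata policy
    allowed_readers: List[str]     # access control
    curation_required: bool        # curation workflow
    license: str
    storage_endpoint: str          # where published data is stored


@dataclass
class Collection:
    """A collection is a set of datasets governed by one set of policies."""
    name: str
    policies: CollectionPolicies
    datasets: List[Dataset] = field(default_factory=list)

    def submit(self, dataset: Dataset) -> None:
        # Enforce the collection's metadata policy before accepting a dataset.
        missing = [key for key in self.policies.required_metadata
                   if key not in dataset.metadata]
        if missing:
            raise ValueError("missing required metadata: " + ", ".join(missing))
        self.datasets.append(dataset)


@dataclass
class Community:
    """A community groups related collections."""
    name: str
    collections: List[Collection] = field(default_factory=list)
```

In this sketch the policies live on the Collection rather than on each Dataset, mirroring the note that policies are set "on a Collection"; a submission that lacks a required metadata field is rejected before it joins the collection.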

Delivering a Campus Research Data Service with Globus: Presentation Transcript

  • Delivering a Campus Research Data Service with Globus GlobusWorld 2014 Keynote
  • Give me your data, your terabytes, Your huddled files yearning to breathe free … Building campus research data services GlobusWorld 2014
  • “It’s deja vu all over again.” Yogi Berra Globus Toolkit Globus Online Globus Globus
  • Higgs discovery “only possible because of the extraordinary achievements of … grid computing” Rolf Heuer, CERN DG 10s of PB, 100s of institutions,1000s of scientists, 100Ks of CPUs, Bs of tasks
  • What is Globus (today)? Big data transfer and sharing… …with Dropbox-like simplicity… …directly from your own storage systems
  • Reliable, secure, high-performance file transfer and synchronization • “Fire-and-forget” transfers • Automatic fault recovery • Seamless security integration • Powerful GUI and APIs. Diagram (Data Source to Data Destination): 1. User initiates transfer request; 2. Globus moves and syncs files; 3. Globus notifies user
  • Simple, secure sharing off existing storage systems • Easily share large data with any user or group • No cloud storage required. Diagram (Data Source): 1. User A selects file(s) to share, selects user or group, and sets permissions; 2. Globus tracks shared files; no need to move files to cloud storage!; 3. User B logs in to Globus and accesses shared file
  • 15,000 registered users
  • 8,000 active endpoints (in the past year)
  • 3 billion files transferred
  • Globus is enabling… Study of the structure and evolution of galaxies, the nature of dark energy, and cosmological history of the universe Sloan Digital Sky Survey Source: University of Utah Joel Brownstein University of Utah
  • Globus is enabling… Development of numerical simulations of severe storms for improved responsiveness to weather events Weather Research and Forecasting Model Source: UCAR Ann Syrowski University of Illinois
  • Globus is enabling… Pediatric brain research by enhancing analysis of genetic material in pursuit of the underlying cause Communication impairment by genetic variants Source: Wikimedia Commons William Dobyns U. Washington
  • “I need a good place to store / backup / archive my (big) research data, at a reasonable price.” Public Cloud Archive Mass Store Campus Store
  • “I need to easily, quickly, & reliably move or mirror portions of my data to other places.” Research Computing HPC Cluster Lab Server Campus Home Filesystem Desktop Workstation Personal Laptop XSEDE Resource Public Cloud
  • “I need to easily and securely share my data with my colleagues at other institutions.”
  • “I need to get data from a scientific instrument to my analysis server.” Next Gen Sequencer Light Sheet Microscope MRI Advanced Light Source
  • Product highlights since GlobusWorld 2013
  • Sharing generally available
  • Much improved Web UI
  • Globus Connect Server • Native RPM and Debian packaging • Improved configuration management • Multi-server setup • OAuth support
  • Management console: “Flight Control”
  • Amazon S3 Endpoints
  • Demonstration
  • 85 U.S. campuses
  • Best practice (fasterdata.es.net) Create Data Transfer Nodes on existing (or new) storage with Globus Connect Server …deploy in a Science DMZ… …use Globus as the interface
  • We are a non-profit, delivering a production-grade service to the non-profit research community
  • Our challenge: Sustainability We are a non-profit, delivering a production-grade service to the non-profit research community
  • Globus Provider Subscriptions • Managed Endpoints – Priority support – Management console – Usage reports – Mass Storage System optimization – Host shared endpoints – Integration support • Plus Subscriptions – Create and manage shared endpoints – Personal transfers • Branded Web Site • Alternate Identity Provider (InCommon is standard) https://www.globus.org/provider-plans
  • NET+ Globus • Internet2 members get discounted Globus Provider subscriptions • Completing “Service Validation” phase – Sponsors: Cornell, U.Michigan, Yale, U.Missouri, and U.Chicago • Available to “Early Adopters” soon
  • Bridging the gap to sustainability • $500,000 from Sloan Foundation • Recognition of what it takes to “cross the chasm” • Funds non-R&D activities – User Support – Operations – Marketing
  • Globus Under the Covers Identity, Group, Profile Management Services … Sharing Service Transfer Service Globus Toolkit Globus Connect
  • Globus Platform-as-a-Service Identity, Group, Profile Management Services … Sharing Service Transfer Service Globus Toolkit Globus APIs Globus Connect
  • globus genomics Flexible, scalable, affordable genomics analysis for all biologists
  • + Data management PaaS Next-gen sequence analysis SaaS + Scalable IaaS
  • Globus Genomics on AWS
  • Exome: $3 – $20 Whole Genome: $20 – $50 RNA-Seq: <$5 Alternatives are at 10-20x
  • Dobyns Lab Exome analysis 20x speed-up Next: 50x
  • Cox Lab Consensus variant calling 134 samples; 4 days <0.01% Mendel error rate Next: 13,000 samples
  • Campus Data Service User Stories • “I need a good place to store / backup / archive my (big) research data, at a reasonable price.” • “I need to easily, quickly, and reliably move or mirror portions of my data to other places.” • “I need a way to easily and securely share my data with my colleagues at other institutions.”
  • Campus Data Service User Stories • “I need a good place to store / backup / archive my (big) research data, at a reasonable price.” • “I need to easily, quickly, and reliably move or mirror portions of my data to other places.” • “I need a way to easily and securely share my data with my colleagues at other institutions.” • “I want to publish my data.” • “I want to discover published data.”
  • An all-too familiar tale …
  • What does it mean to publish? Data is: Identified, Described, Curated, Verifiable, Accessible, Preserved
  • What does it mean to discover? I can: Search, Browse, Access the data
  • Announcing… Globus data publication services
  • Teeing Up a Few Terms … A Community contains Collections; a Collection contains Datasets; each Dataset is Data + Metadata. Policies on a Collection: Metadata, Access Control, License, Storage, Curation Workflow
  • Demonstration
  • Recap: Globus Data Publication • SaaS for publishing large research data • Bring your own storage • Extensible metadata • Publication and curation workflows • Public and restricted collections • Rich discovery model
  • Looking for 3-5 early adopters Summer: Use and provide feedback on alpha Fall: Test beta on your campus Winter: Celebrate General Availability Spring: Tell us about it at GlobusWorld 2015!
  • Our vision for 21st century research data management: To provide affordable, advanced capabilities for all researchers, delivering sustainable services that aggregate and federate existing resources
  • Thank you to our sponsors! U.S. Department of Energy
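
The three-step transfer flow from the transcript (the user initiates a request, the service moves and syncs files with automatic fault recovery, the service notifies the user) can be illustrated with a local, standard-library sketch. This is not how Globus is implemented and it does not use any Globus API; the function `sync_directory`, its parameters, and the report format are all hypothetical names chosen for the example, and plain file copies stand in for network transfers.

```python
import hashlib
import shutil
import time
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_directory(source: Path, dest: Path, max_attempts: int = 3,
                   backoff_seconds: float = 0.1,
                   copy=shutil.copy2, notify=print) -> dict:
    """Copy files from source to dest, retrying failures, then notify once."""
    copied, skipped, failed = [], [], []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = dest / rel
        # Sync: skip files whose destination copy already matches.
        if dst.exists() and checksum(dst) == checksum(src):
            skipped.append(rel.as_posix())
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        for attempt in range(1, max_attempts + 1):
            try:
                copy(src, dst)
                # Verify integrity; a mismatch is treated like any other fault.
                if checksum(dst) != checksum(src):
                    raise OSError("checksum mismatch")
                copied.append(rel.as_posix())
                break
            except OSError:
                if attempt == max_attempts:
                    failed.append(rel.as_posix())
                else:
                    # Fault recovery: wait a little longer, then retry.
                    time.sleep(backoff_seconds * attempt)
    report = {"copied": copied, "skipped": skipped, "failed": failed}
    # Notification: the caller learns the outcome without watching the copy.
    notify(report)
    return report
```

Calling `sync_directory(Path("data"), Path("mirror"))` a second time on unchanged files copies nothing, because every file is skipped by the checksum comparison. The `copy` and `notify` parameters are injection points so that a different transport or notification channel could be substituted in the sketch.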