Data Sharing via Globus in the NIH Intramural Program

This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Susan Chacko from the National Institutes of Health (NIH).


  1. Data Sharing via Globus in the NIH Intramural Program. Susan Chacko, High Performance Computing, National Institutes of Health.
  2. (image-only slide, no text content)
  3. Biowulf: the NIH Intramural Program HPC system. The NIH intramural program's large-scale high-performance computing resource, completely dedicated to biomedical computing.
      • High availability and high data durability
      • Designed for general-purpose scientific computing (not dedicated to any single application type)
      • Dedicated staff with expertise in high-performance computing and computational biology
  4. Biowulf in 2019: 95,000 compute cores, 560 GPUs, 35 PB storage, 640 Principal Investigators (labs), 2,200 users, 650+ scientific applications.
  5. Globus -- 2014
  6. Globus Transfers since 2014: charts of outbound and inbound data volume (TB) per year, 2015-2019; transfers moved from a single host to 8 DTNs (data transfer nodes).
  7. Globus Transfers in the last year: ~3 PB of biomedical data transferred, 450 unique users, 2,000 unique hosts. High points: 24 million files in Oct 2018; 300 TB in March 2019. NIH site license; ~20 endpoints at NIH.
  8. Data Sharing via Globus: many NIH researchers have outside and international collaborators, so they use Globus shares. In Oct 2018, the Globus SDK was used to get a list of user shares: 1,900+ user shares via Globus on the NIH HPC systems! (A minimal SDK sketch follows the slide list.)
  9. Globus Shares on NIH HPC: ~50 new shares/week; 35% of shares are defunct.
  10. Shares per User: ~100 users with 1 share each; 8 users with more than 100 shares each.
  11. Data Sharing via Globus:
      • NCI Sequencing Core Facility: serves 150 labs and collaborators.
      • NICHD Sequencing Facility: serves 11 labs; 10,000 samples sequenced and shared since 2014; 150 TB of data shared off the NIH HPC in 2018; additional data shared off their own Globus endpoint, with transfers of ~15 TB/year.
  12. Wishlist:
      • Admin ability to delete endpoints
      • Admin ability to prohibit 'world-write' shared endpoints (and maybe 'world-read' as well)
      • Admin ability to get the 'create date' for a share
      • Users who set up a shared endpoint would like to know when their data has been downloaded
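
Slide 8 mentions using the Globus SDK to enumerate user shares. Below is a minimal sketch of how such a query could look with the Globus Python SDK (globus_sdk), assuming a native-app client ID registered in Globus Auth, an endpoint-manager/admin role on the host endpoint, and a placeholder endpoint UUID; the exact script used by the NIH HPC staff is not shown in the presentation.

```python
# Minimal sketch: list the shared endpoints hosted on a mapped endpoint
# with the Globus Python SDK. CLIENT_ID and HOST_ENDPOINT_ID are
# placeholders, not values from the presentation.
import globus_sdk

CLIENT_ID = "REPLACE-WITH-NATIVE-APP-CLIENT-ID"
HOST_ENDPOINT_ID = "REPLACE-WITH-HOST-ENDPOINT-UUID"

# Interactive native-app login to obtain a Transfer API token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all"
)
print("Log in at:", auth_client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code here: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(auth_code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# List shared endpoints hosted on the mapped endpoint. This call requires
# an endpoint-manager/admin role on the host endpoint; very long listings
# may need additional paging.
shares = list(tc.endpoint_manager_hosted_endpoint_list(HOST_ENDPOINT_ID))
print(f"{len(shares)} shared endpoints hosted on {HOST_ENDPOINT_ID}")
for share in shares:
    print(share["id"], share["owner_string"], share["display_name"])
```

With the list in hand, per-user counts like those on slides 9 and 10 are a simple aggregation over each share's owner_string.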
