This document discusses using the Globus data management platform to build protected data sharing networks that advance cancer risk assessment and treatment. It describes how Globus can securely manage and share data from instruments and analysis between personal resources, national resources, public clouds, and collaborators. Globus also supports publishing data for discoverability and establishing automated workflows that let cancer research consortia leverage high-performance networks and cloud infrastructure.
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment and Treatment
1. Building Protected Data Sharing
Networks to Advance Cancer Risk
Assessment and Treatment
Ian Foster (with Ravi Madduri & Brigitte Raumann)
University of Chicago
foster@uchicago.edu
2. Securely, reliably move/sync/backup across tiers
Research Computing HPC
Desktop Workstation
Mass Storage Next-Gen Sequencer
National Resources
Personal Resources
Public Cloud
3. Manage data from instruments
Analysis store
Next Gen
Sequencer
Light Sheet
Microscope
MRI
Advanced
Light Source
Personal computer
Remote visualization
11. • Workflows can be easily
defined and automated with
integrated Galaxy Platform
capabilities
• Data management is
streamlined using Globus
• Resources can be provisioned
on-demand with Amazon Web
Services cloud-based
infrastructure
Ravi Madduri et al., University of Chicago
Globus Genomics
12. Globus security and data management platform
leveraged for the NIH Data Commons infrastructure
14. ITCR Collaboration with LSU Cancer Center
• LSU Health has a new, NSF-funded network,
but cancer researchers can’t use it to full
advantage with existing tools.
• Cancer center sequence core delivers data
on hard drives.
• Establish an automated, secure data management platform for the Louisiana Cancer
Research Consortium Translational Genomics Core that leverages the
new NSF-funded high-performance network and Science DMZ.
• Establish network of Globus endpoints in Cancer Center labs to
connect LSU Health to collaborators at other institutions, national
resources, cloud resources, data portals.
Enable a network of researchers to assess the genetic basis of hematologic cancer risk. We will work with a system of regional hubs to establish a Protected Hematologic Cancer Research Network for sharing identified data from families to assess the genetic contribution to risk of blood and bone marrow cancers.
Data will be generated in one location. Secure data movement and sharing via Globus will let each part of our collaborative network use common primary sequencing data with synchronized annotations to ask separate, complementary questions based on each hub’s expertise, and will facilitate real-time discussion of the centralized data.
https://imputation.sanger.ac.uk/?instructions=1
This service, operated by the Wellcome Trust Sanger Institute at https://imputation.sanger.ac.uk, allows you to upload files containing genome-wide association study (GWAS) data from the 23andMe genotyping service and receive back the results of imputation and other analyses that identify gene variants you are likely to possess based on those GWAS data (McCarthy et al., 2016).
The JGI Genome Portal provides unified access to all JGI genomic databases and analytical tools. A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes.
Aim 1: Develop an automated data distribution pipeline for delivery of genomic data.
The current data distribution practice at the Louisiana Cancer Research Consortium Translational Genomics Core (TGC) requires copying sequence data onto physical media.
We will establish an automated, secure data management platform for the TGC that leverages the new NSF-funded high-performance network and Science DMZ to allow near-real-time data access by researchers. The solution ensures data integrity, offers fine-grained access control, and provides secure data delivery. Data delivery to the end user will be integrated into the experimental workflow of the facility in a way that is reliable and scalable to thousands of users and datasets.
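As a rough illustration of the data-integrity guarantee mentioned above, the sketch below compares the SHA-256 digest of a delivered file with that of the copy held at the sequencing core. It is not how Globus implements integrity checking (Globus transfers can verify checksums on their own); the function names `sha256_of` and `verify_delivery` are hypothetical, and only the Python standard library is used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large sequence files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_delivery(source: Path, delivered: Path) -> bool:
    """Return True when the delivered copy has the same digest as the source."""
    return sha256_of(source) == sha256_of(delivered)
```

In practice a facility would more likely record the source digest once at staging time and compare against that stored value, rather than re-reading the source file for every check.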
TGC staff will have access to administrative and monitoring capabilities that allow them to check for successful transfer of the data. Using the management interfaces provided by Globus, transfer of data by the user can be monitored and the data will be automatically deleted or archived, depending on the policies set by the TGC.
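The delete-or-archive rule described above could be expressed along the following lines. This is only a sketch under assumed policy settings: the `RetentionPolicy` fields, the `Delivery` record, and the `next_action` function are hypothetical names introduced for illustration, not part of Globus or of the TGC's actual configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep"
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass(frozen=True)
class RetentionPolicy:
    # Hypothetical policy settings; real values would be chosen by the TGC.
    grace_period: timedelta          # how long staged data stays after pickup
    archive_instead_of_delete: bool  # True: archive, False: delete


@dataclass(frozen=True)
class Delivery:
    dataset: str
    transferred_at: Optional[datetime]  # None until the user's transfer succeeds


def next_action(delivery: Delivery, policy: RetentionPolicy, now: datetime) -> Action:
    """Decide what to do with a staged dataset under the given policy."""
    if delivery.transferred_at is None:
        return Action.KEEP  # not yet transferred: never remove
    if now - delivery.transferred_at < policy.grace_period:
        return Action.KEEP  # transferred, but still inside the grace period
    return Action.ARCHIVE if policy.archive_instead_of_delete else Action.DELETE
```

Keeping the decision in a pure function that takes `now` as an argument makes the policy easy to test and audit separately from whatever system actually monitors transfers and removes data.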
The LSU Health Identity Provider will be integrated with Globus to allow researchers to use their familiar login for data management.
Aim 2: Connect LSU Health Cancer Center members to data sharing and transfer networks.
We will establish a network of five to seven Globus endpoints in laboratories of Cancer Center members, allowing researchers to take full advantage of the new high-speed network when exchanging data with collaborators or when transferring data from the TGC. Depending on the needs of LSU Health cancer researchers, we will support installation of Globus endpoints at collaborating institutions that are not already Globus enabled, for example Tulane University and Xavier University of Louisiana.
The network of Globus endpoints will not only allow researchers to share data among themselves, but also connect them to a broader national and international network of hundreds of institutions and data portals already using Globus to exchange and distribute data.