NCI Cancer Research Data Commons - Overview
1. NCI Cancer Research Data Commons
Todd Pihl, Ph.D., PMP
Technical Project Manager
Frederick National Laboratory for Cancer Research
Imaging Community Call
October 7th, 2019
2. Goals of the NCI Cancer Research Data Commons
• Enable the cancer research community to share diverse data types across programs and institutions.
• Provide easy access to data, regardless of where it is stored.
• Provide mechanisms for innovative tool discovery, access, usage.
• Help NCI Data Coordinating Centers sustain and share their data publicly.
• Develop a set of reusable components - a framework - for the community to use to build interoperable data commons.
3. Cancer Research Data Commons
• Data are stored in domain-specific repositories, called Data Nodes (e.g., genomic, proteomic, imaging).
• Data access is controlled through a common Authentication and Authorization mechanism that secures the data.
• Researchers can bring their own data and tools to the cloud and combine them with the data in the CRDC for integrative analysis.
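The idea of one access rule shared by all Data Nodes can be sketched as follows. This is a toy model only: the `User`, `DataFile`, and `can_access` names, and the grant strings, are hypothetical and do not come from the slides or from any CRDC service.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DataFile:
    """A file held by a Data Node (hypothetical structure)."""
    node: str                             # e.g. "genomic", "imaging"
    name: str
    required_grant: Optional[str] = None  # None means open access


@dataclass
class User:
    username: str
    grants: set = field(default_factory=set)  # grants issued by the common mechanism


def can_access(user: User, data_file: DataFile) -> bool:
    """Apply the same rule no matter which node holds the file."""
    if data_file.required_grant is None:
        return True
    return data_file.required_grant in user.grants


def accessible_files(user: User, files: List[DataFile]) -> List[DataFile]:
    """Return the files the user may read, preserving input order."""
    return [f for f in files if can_access(user, f)]
```

Because every node consults the same check, a researcher's grants apply uniformly when combining files from several nodes in one analysis.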
4. [Architecture diagram of the CRDC. Labels shown: Cancer Data Aggregator; Common Data / Metadata Model (CRDC-H); Portals & Applications; NCI Cloud Resources (Analytic Tools, User Workspaces); and four Data Nodes: Genomic Data Commons, Imaging Data Commons, Proteomic Data Commons, and Integrated Canine Data Commons. Each node is drawn with APIs, a Node Portal, a Data Model (labeled Genomic, Imaging, Proteomic, and I/O respectively), IndexD, a Cloud-based Data Repository, and DCF Digital ID / Metadata Services.]
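The role an indexing and digital ID service plays in each node, mapping a stable identifier to wherever the file is actually stored, can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the IndexD API: the `ToyIndex` class, its methods, the record field names, and the bucket URLs are all illustrative.

```python
import hashlib
import uuid


class ToyIndex:
    """In-memory stand-in for an indexing service: identifier -> record."""

    def __init__(self):
        self._records = {}

    def register(self, content: bytes, urls):
        """Store size, checksum, and storage locations under a new identifier."""
        guid = str(uuid.uuid4())
        self._records[guid] = {
            "did": guid,  # illustrative field names
            "size": len(content),
            "hashes": {"sha256": hashlib.sha256(content).hexdigest()},
            "urls": list(urls),
        }
        return guid

    def resolve(self, guid: str):
        """Return every known storage location for the identifier."""
        return list(self._records[guid]["urls"])

    def verify(self, guid: str, content: bytes) -> bool:
        """Check that retrieved bytes match the registered size and checksum."""
        record = self._records[guid]
        return (
            len(content) == record["size"]
            and hashlib.sha256(content).hexdigest() == record["hashes"]["sha256"]
        )
```

A client holding only the identifier can then locate the file in any cloud that hosts a copy and confirm it received the right bytes, which is what lets data be accessed "regardless of where it is stored".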
5. NCI Cloud Resources
Goal: democratize access to NCI-generated genomic and related data, and create a cost-effective way to provide scalable computational capacity to the cancer research community.
Cloud Resources provide:
• Access to large genomic data sets without the need to download them
• The ability for researchers to bring their own tools and pipelines to the data
• The ability for researchers to bring their own data and analyze it in combination with existing genomic data
• Workspaces where researchers can save and share their data and the results of analyses
Cloud Resources shown: SBG CGC, Broad FireCloud, ISB CGC
6. Center for Cancer Data Harmonization (CCDH)
• Create a harmonized data model and provide cross-mapping to
domain-specific node data models.
• Provide semantic concierge services to CRDC nodes and NCI Data
Coordinating Centers related to data models, metadata, and
terminology.
• Implement a web-based portal.
• Create, adapt, and disseminate data harmonization tools for use by
CRDC nodes and NCI Data Coordinating Centers.
• As needed, develop new terminology, metadata, mappings, and
models to support data aggregation through the CRDC.
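Cross-mapping from node-specific data models to a harmonized model can be sketched as a per-node lookup table. This is a minimal illustration, assuming simple one-to-one field renames; the `CROSS_MAP` table, the `harmonize` function, and the common-model names `subject_id` and `anatomic_site` are hypothetical and are not the CCDH model.

```python
# Illustrative mapping: node-specific field name -> hypothetical common-model name.
CROSS_MAP = {
    "genomic": {"submitter_id": "subject_id", "primary_site": "anatomic_site"},
    "imaging": {"PatientID": "subject_id", "BodyPartExamined": "anatomic_site"},
}


def harmonize(node, record):
    """Split a node record into harmonized fields and fields with no mapping yet."""
    mapping = CROSS_MAP[node]
    harmonized = {}
    unmapped = {}
    for key, value in record.items():
        if key in mapping:
            harmonized[mapping[key]] = value
        else:
            unmapped[key] = value
    return harmonized, unmapped


if __name__ == "__main__":
    record = {"PatientID": "P-001", "BodyPartExamined": "LUNG", "Modality": "CT"}
    print(harmonize("imaging", record))
```

Returning the unmapped fields separately makes gaps visible, which is where new terminology or mappings would need to be developed.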
7. Status of CRDC Components
• Data Commons Framework
  • Authentication and Authorization (Fence) and ID & indexing (IndexD) are in use by Cloud Resources and some Data Nodes
  • Other components are developed and ready for use by Data Nodes
• Cloud Resources
  • Fully operational
• Genomic Data Commons (GDC)
  • Fully operational
• Proteomic Data Commons (PDC)
  • Contract awarded September 2017
  • Pilot launched October 2018; production version Spring 2020
• Cancer Immuno-oncology Data Commons (CIDC)
  • Awarded September 2017
• Integrated Canine Data Commons (ICDC)
  • Awarded September 2018
• Imaging Data Commons (IDC)
  • Awarded July 2019
• Population Cohort(s) Data Commons
  • Concept phase
• Center for Cancer Data Harmonization
  • Awarded September 2019
• Cancer Data Aggregator
  • RFP: https://frederick.cancer.gov/workwithus/solicitations/s20-001
• Cancer Data Services (CDS)
  • Operational
8. TCIA and IDC
• TCIA is one of the data collection centers that feed data into the IDC. No change in TCIA's function is intended.
• TCIA will provide radiology and pathology data to the IDC.
• The IDC will focus on a cloud-only, no-download analysis approach.