The Cancer Imaging Archive (TCIA) is a large online archive of medical images and associated clinical data from cancer patients. It covers imaging modalities such as CT, MRI, and PET across many cancer types. The archive aims to support precision medicine by linking imaging data to molecular and genomic data from sources such as The Cancer Genome Atlas. It hosts a growing collection of more than 40,000 subjects and 70 data sets that are frequently used in research publications and challenge competitions. TCIA relieves researchers of much of the data-sharing burden and provides hosting, de-identification, and support services to submitters and users of the archive.
An introduction to The Cancer Imaging Archive (Hands on)
1. http://cancerimagingarchive.net
Justin Kirby – justin.kirby@nih.gov
Frederick National Laboratory for Cancer Research
Leidos Biomedical Research, Inc.
Support to: Cancer Imaging Program/DCTD/NCI
Hands on workshop
RSNA 2017
2. The Cancer Imaging Archive
• Covers most modalities (CT/MR/PET/RT)
• Wide variety of cancers + phantoms
• Patient populations vary from a handful to >26,000 (NLST)
• Many have associated meta-data
Demographics/outcomes/therapy
Pathology imaging
Radiologist expert and automated computational analyses (segmentations, features)
• ‘Omics via TCGA, CPTAC, and GEO
3. Special focus on precision medicine data sets
Genomic and proteomic data derived from tissue lack critical information about tumor location, size, heterogeneity, and surrounding tissue
4. The Cancer Genome Atlas (TCGA)
Data types
• Clinical diagnosis
• Treatment history
• Histologic diagnosis
• Pathologic report/images
• Tissue anatomic site
• Surgical history
• Gene expression
• RNA sequence
• Chromosomal copy number
• Loss of heterozygosity
• Methylation patterns
• miRNA expression
• DNA sequence
• RPPA (protein)
• Subset for Mass Spec
Imaging data types
• CT
• MRI
• PET
Clinical Features
Genomic Features
Pathologic Features
Imaging Features
5. Clinical Proteomic Tumor Analysis Consortium
Adapted from - http://metabolomics.se/sites/default/files/courses_files/History%20of%20Omics%20cascade_Wheelock.pdf
Raghu Vikram 2012
Figure labels (omics cascade): TRANSCRIPT(OMICS): what appears to be happening; PROTEOME(OMICS): what makes it happen; METABOLOME(OMICS): what has happened; IMAGING(OMICS): phenotype; additional label: what can happen
6. Source data for challenge competitions
PROSTATEx Classification Challenge
7. A growing user community
40,000+ total subjects in the archive
70+ data sets currently available
• 21 from The Cancer Genome Atlas project
• 10 from the Quantitative Imaging Network
• NCI Clinical trials
526 publications based on TCIA data
Source data for 10 challenge competitions
Over 7,000 active users per month
Downloads of ~40TB per month
8. TCIA Site Architecture
The Cancer Imaging Archive
Data Collection Center
• Tools and staffing to support data collection, curation, and de-identification
Data Access
• Browse (home page)
• Filter/Search (Data Portal)
• REST API
• Analysis Data
Data Analysis Centers
• 3rd party web sites or tools that connect to TCIA’s API or mirror its data
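Example (not from the original slides): the REST API listed under Data Access can be called from a short script. The sketch below is illustrative only. The base URL, the endpoint names (getCollectionValues, getSeries), and the shape of the JSON response are assumptions that should be checked against the current TCIA API guide before use. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; verify against the current TCIA REST API guide.
BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def build_query_url(endpoint, **params):
    """Return the full URL for an endpoint, dropping parameters set to None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/{endpoint}"
    return f"{url}?{query}" if query else url


def parse_collections(payload):
    """Extract sorted collection names from a JSON list of
    {"Collection": name} records (assumed response shape)."""
    return sorted(record["Collection"] for record in json.loads(payload))


def fetch_collections(timeout=30):
    """Query the archive over the network and return collection names."""
    url = build_query_url("getCollectionValues")
    with urlopen(url, timeout=timeout) as response:
        return parse_collections(response.read().decode("utf-8"))
```

Calling fetch_collections() performs the network request. build_query_url and parse_collections can be used offline, which also makes the sketch easy to adapt if the endpoint names differ.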
9. TCIA services
Relieves the PI of most of the data-sharing burden and risk
• Data hosting with >99% uptime
• De-identification using a pre-configured RSNA Clinical Trials Processor (CTP) and the DICOM PS 3.15 Annex E standard
• Multi-tiered QC process inspects both DICOM headers and pixels for PHI and for the integrity of the data set
Phone/email support available for end users and submitters
Extensive documentation throughout the site
Publish your data and gain exposure to a large community of researchers
• Increase visibility of your work, get more citations!
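Example (not from the original slides): the de-identification and header QC steps listed above can be pictured with a toy script. This is not CTP and not the Annex E profile itself. The rule table is a hypothetical, heavily abbreviated stand-in, the header is modeled as a plain dictionary of attribute keywords rather than a real DICOM file, and pixel inspection is not covered.

```python
# Hypothetical, abbreviated rule table: attribute keyword -> action.
# The real Annex E profile covers many more attributes and actions.
RULES = {
    "PatientName": "replace",
    "PatientID": "replace",
    "PatientBirthDate": "empty",
    "InstitutionName": "remove",
    "ReferringPhysicianName": "empty",
}


def deidentify(header, pseudonym):
    """Return a cleaned copy of `header`; the input dict is not modified."""
    cleaned = {}
    for keyword, value in header.items():
        action = RULES.get(keyword, "keep")
        if action == "remove":
            continue  # drop the attribute entirely
        if action == "empty":
            cleaned[keyword] = ""  # keep the attribute with no value
        elif action == "replace":
            cleaned[keyword] = pseudonym  # substitute the study pseudonym
        else:
            cleaned[keyword] = value
    return cleaned


def remaining_phi(header, known_values):
    """QC helper: list keywords whose value still contains a known identifier."""
    return [
        keyword
        for keyword, value in header.items()
        if any(s and s in str(value) for s in known_values)
    ]
```

The second helper shows why a QC pass is still needed after rule-based cleaning: identifiers can survive in free-text fields that no rule targets, such as a study description that repeats the patient name.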
10. Publishing data in addition to manuscripts
Data citations for both primary and analysis data to enable reproducible research
Analysis Dataset Citation (derived image features)
Gutman DA, Cooper LA, Hwang SN, Holder CA, Gao J, Aurora TD, Dunn WD Jr, Scarpace L, Mikkelsen T, Jain R, Wintermark M, Jilwan M, Raghavan P, Huang E, Clifford RJ, Mongkolwat P, Kleper V, Freymann J, Kirby J, Zinn PO, Moreno CS, Jaffe C, Colen R, Rubin DL, Saltz J, Flanders A, Brat DJ. (2014). MR Imaging Predictors of Molecular Profile and Survival: Multi-institutional Study of the TCGA Glioblastoma Data Set. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2014.4HTXYRCN
Publication Citation (cites specific data used)
MR imaging predictors of molecular profile and survival: multi-institutional study of the TCGA glioblastoma data set. Radiology. 2013 May;267(2):560-9. doi: 10.1148/radiol.13120118. Epub 2013 Feb 7. PubMed PMID: 23392431; PubMed Central PMCID: PMC3632807.
Primary Data Citation (TCIA images used for study)
Smith K, Clark K, Bennett W, Nolan T, Kirby J, Wolfsberger M, Moulton J, Vendt B, Freymann J. Radiology Data from The Cancer Genome Atlas Glioblastoma Multiforme (TCGA-GBM) collection. http://dx.doi.org/10.7937/K9/TCIA.2016.RNYFUYE9
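Example (not from the original slides): the three citations above write their DOIs in three different forms (http://doi.org/..., doi: ..., http://dx.doi.org/...). A small helper can normalize them to one resolvable form, which is useful when checking that a publication and its data citations point at the intended records. The function names are hypothetical, and the 10.7937 prefix check simply reflects the prefix seen in the two TCIA data citations on this slide.

```python
import re

# Matches a DOI such as 10.7937/K9/TCIA.2016.RNYFUYE9 anywhere in a string.
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)")


def normalize_doi(text):
    """Return the DOI found in `text` as an https://doi.org/ URL."""
    match = DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    doi = match.group(1).rstrip(".,;")  # drop trailing sentence punctuation
    return f"https://doi.org/{doi}"


def has_tcia_prefix(text):
    """True if the DOI uses the prefix seen in the TCIA data citations."""
    return normalize_doi(text).startswith("https://doi.org/10.7937/")
```

For instance, normalize_doi("doi: 10.1148/radiol.13120118.") drops the trailing period and returns the https://doi.org/ form of the Radiology article's DOI.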
GEO is the Gene Expression Omnibus. In future, linking to a clinical trial data website and the Genomic Data Commons
Imaging must be included to add the critical aspects of location and heterogeneity: a biopsy (Bx) can sample only a few mm of a multi-cm tumor
TCIA collected the clinical imaging for the TCGA project cases, so the imaging can be correlated and combined with the data TCGA was making available.
Provide DOIs to collections and meta-collections (article’s analysis)
Publication can refer to the specific data sets used via the DOIs in the data citations
Currently working with NLM, collaborating with Nature Scientific Data and other publications