1. International Cancer Genome Consortium Data Coordination Center Update
Brett Whitty, DCC Curation Group
7th International Workshop
Heidelberg, Germany
2. DCC Updates in 2012
August 2012 • ICGC 9, first data release from Canadian Pediatric
Medulloblastoma (MAGIC) project, full update of 18
cancer types from the U.S. TCGA project including 6
new databases, update of German Pediatric Brain
Tumour (PedBrain) project. Added 3,029 donors.
November 2012 • ICGC 10, two new cancer types from the U.S. TCGA
project as well as updates to 18 other TCGA projects,
updates from the Spanish Chronic Lymphocytic
Leukemia project including new methylation data,
updates to U.K. Breast Carcinoma and Chronic Myeloid
Disorders project databases. Added 432 donors.
December 2012 • ICGC 11, first data release from 4 projects: German
Malignant Lymphoma, Canadian Prostate Cancer,
German Prostate Cancer, U.K. Prostate Cancer; new
data from an additional 4 projects: Australian Pancreatic
Cancer, Canadian Pancreatic Cancer, Japanese Liver
Cancer, German Pediatric Brain Tumour (PedBrain).
Added 336 donors.
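The "Added N donors" figures above can be turned into per-release donor totals with a little arithmetic. The sketch below is illustrative only: it assumes the three figures are net increases between consecutive releases, and it takes the release 11 total of 7,358 donors from the dataset summary slide. The function name `donors_by_release` is hypothetical, and the back-calculated totals are derived values, not numbers reported by the DCC.

```python
# Donors added in each 2012 release, as stated on this slide.
DONORS_ADDED = {9: 3_029, 10: 432, 11: 336}

# Total donors at release 11, as stated on the dataset summary slide.
TOTAL_AT_RELEASE_11 = 7_358


def donors_by_release(total_latest: int, added: dict[int, int]) -> dict[int, int]:
    """Walk backwards from the latest total, subtracting each release's additions.

    Assumes (not confirmed by the slides) that the 'added' figures are net
    increases and that release numbers are consecutive.
    """
    totals: dict[int, int] = {}
    running = total_latest
    for release in sorted(added, reverse=True):
        totals[release] = running
        running -= added[release]
    # Whatever remains is the implied total for the release before the first one listed.
    totals[min(added) - 1] = running
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for release, total in donors_by_release(TOTAL_AT_RELEASE_11, DONORS_ADDED).items():
        print(f"ICGC {release}: {total:,} donors")
```

Under these assumptions, the 3,797 donors added across the three releases account for roughly half of the release 11 total.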
3. ICGC Dataset Version 11 (December 2012)
• Cancer types: 42 (including TCGA, TSP, JHU)
• Donors: 7,358 (14,645 specimens)
• Simple somatic mutations: 3,761,508
• Copy number mutations: 25,227,388
• Structural rearrangements: 2,079
• Genes affected* by simple somatic mutations: 19,901
• Genes affected* by non-synonymous coding mutations: 19,675
• Genes affected* by copy number mutations: 20,109
• Genes affected* by structural rearrangements: 1,060
*out of 21,420 protein-coding genes annotated in Ensembl Human release 66
• Open tier and controlled data currently available
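To put the "genes affected" counts in context, each can be expressed as a share of the 21,420 protein-coding genes cited in the footnote. The snippet below is a small sketch of that arithmetic; the dictionary keys and the function name `percent_of_genes` are illustrative, not ICGC field names.

```python
# Denominator from the slide footnote: protein-coding genes in Ensembl Human release 66.
PROTEIN_CODING_GENES = 21_420

# Gene counts as listed on the slide (keys are illustrative labels).
GENES_AFFECTED = {
    "simple somatic mutations": 19_901,
    "non-synonymous coding mutations": 19_675,
    "copy number mutations": 20_109,
    "structural rearrangements": 1_060,
}


def percent_of_genes(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    for label, count in GENES_AFFECTED.items():
        print(f"{label}: {count:,} genes ({percent_of_genes(count)}%)")
```

By this calculation, more than 90% of annotated protein-coding genes carry at least one simple somatic or copy number mutation in the dataset, while structural rearrangements touch roughly 5%.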
4. ICGC DCC Has Received Data for 7,358 Cancer Genomes and Counting
[Figure: chart with series for Release 7, Release 8, Release 9, Release 10 and Release 11]
7. Completeness of Data for Genomic Analysis Types in DCC Datasets (ICGC 11)
[Figure: number of donors (# DONORS) with data for each analysis type: Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation]
8. Completeness of Genomic Analysis Data Types in DCC Datasets (2)
[Figure panels: Simple Somatic Mutations, Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Splicing Variation, DNA Methylation]
9. Completeness of Genomic Analysis Data Types in DCC Datasets (3)
[Figure panels: Simple Somatic Mutations, Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Splicing Variation, DNA Methylation]
10. Completeness of Genomic Analysis Data Types in DCC Datasets (4)
[Figure panels: Simple Somatic Mutations, Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Splicing Variation, DNA Methylation]
11. Completeness of Genomic Analysis Data Types in DCC Datasets (5)
[Figure panels: Simple Somatic Mutations, Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Splicing Variation, DNA Methylation]
13. Clinical Data Completeness Overview

Donor Data Element                              Average % Complete
donor sex                                       94.8
donor diagnosis icd10                           94.5
donor age at diagnosis                          84.7
donor vital status                              71.2
donor age at last followup                      64.9
donor notes                                     57.7
donor interval of last followup                 55.5
disease status last followup                    52.5
donor region of residence                       52.5
donor tumour staging system at diagnosis        49.9
donor tumour stage at diagnosis                 33.4
donor survival time                             30.3
donor age at enrollment                         28.4
donor tumour stage at diagnosis supplemental    14.8
donor relapse interval                          5.8
donor relapse type                              4.5

Specimen Data Element                           Average % Complete
specimen type                                   97.7
tumour confirmed                                68.3
specimen storage other                          54.2
specimen notes                                  52.4
specimen processing other                       51.7
digital image of stained section                51.6
tumour grade                                    25.0
tumour grading system                           24.4
specimen storage                                22.6
specimen donor treatment type                   21.1
specimen processing                             21.0
tumour histological type                        18.6
tumour stage                                    18.2
tumour stage system                             14.5
specimen type other                             14.4
specimen interval                               10.7
specimen available                              9.4
tumour stage supplemental                       2.3
tumour grade supplemental                       1.1
specimen donor treatment type other             0.9
specimen biobank                                0.0
specimen biobank id                             0.0

Analyzed Sample Data Element                    Average % Complete
analyzed sample type                            95.2
analyzed sample notes                           48.8
analyzed sample type other                      12.4
analyzed sample interval                        4.9
Disclaimer:
A data element was considered “complete” in an
individual donor’s clinical data if a non-null value was
provided for that data element at least once in the
donor record, or in any of the donor-associated
specimen and sample records.
Averages were calculated for each field across all
donors from all projects.
The intention is only to provide a high-level overview of
how “complete” the ICGC release 11 clinical dataset is.
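The completeness rule in the disclaimer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DCC's actual implementation: the nested dictionary layout (`specimens`, `samples`), the underscore field names, and the helper names are all hypothetical, empty strings are treated as null, and the "average" is taken as the percentage of pooled donors for whom the element is complete.

```python
from typing import Any, Iterable, Iterator, Mapping

Record = Mapping[str, Any]


def _has_value(record: Record, field: str) -> bool:
    """A value counts as non-null if it is neither None nor an empty string (assumption)."""
    value = record.get(field)
    return value is not None and value != ""


def _all_records(donor: Record) -> Iterator[Record]:
    """Yield the donor record, then each associated specimen and sample record."""
    yield donor
    for specimen in donor.get("specimens", []):
        yield specimen
        yield from specimen.get("samples", [])


def is_complete(donor: Record, field: str) -> bool:
    """True if the field has a non-null value at least once anywhere in the donor's records."""
    return any(_has_value(record, field) for record in _all_records(donor))


def percent_complete(donors: Iterable[Record], field: str) -> float:
    """Percentage of donors, pooled across all projects, for whom the field is complete."""
    donor_list = list(donors)
    if not donor_list:
        return 0.0
    complete = sum(is_complete(donor, field) for donor in donor_list)
    return 100.0 * complete / len(donor_list)


# Made-up records for illustration only; these are not ICGC data.
example_donors = [
    {"donor_sex": "female", "specimens": [{"tumour_grade": "2", "samples": [{}]}]},
    {"donor_sex": "male", "specimens": [{"tumour_grade": None, "samples": []}]},
    {"donor_sex": "", "specimens": []},
    {"donor_sex": "male"},
]

if __name__ == "__main__":
    for field in ("donor_sex", "tumour_grade"):
        print(f"{field}: {percent_complete(example_donors, field):.1f}% complete")
```

Because a single non-null value anywhere in a donor's records marks the element complete, this measure is generous: a donor with ten specimens and one recorded tumour grade counts the same as a donor with all ten recorded.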
19. DCC Helpdesk
• 110 helpdesk inquiries received at dcc-support@oicr.on.ca since the
Cannes meeting
◦ …this doesn’t include requests that arrive directly in my inbox
Some frequent topics of enquiry include:
• Controlled data access
◦ How do I obtain access?
◦ Why am I unable to log into my account?
• Questions related to analysis methods, e.g. how data was
normalized
• Questions from ICGC member projects related to data
submissions, data encoding, etc.
20. Key DCC Activities for 2013
• Improved data & metadata curation at EGA; better linking
of data held at DCC to ICGC data in other repositories
• Improved data quality/integrity checking through new
submission/validation system; review of submission file
specifications
• Integration of new data submission system and portal
infrastructure with project and user information managed at
ICGC.org
21. Acknowledgements and Thanks
• ICGC DCC software team @ OICR
• ICGC Secretariat Office
• All the great ICGC members!
Editor's Notes
Slide 3:
13 of 16 non-TCGA projects have data under ICGC DAC in EGA!
NCC liver exome study (EGAS00001000389) is not currently associated with ICGC DAC