Brett Whitty, DCC Curation Group
7th International Workshop
Heidelberg, Germany
International Cancer Genome Consortium
Data Coordination Center Update
DCC Updates in 2012
August 2012 • ICGC 9: first data release from the Canadian Pediatric Medulloblastoma (MAGIC) project; full update of 18 cancer types from the U.S. TCGA project, including 6 new databases; update of the German Pediatric Brain Tumour (PedBrain) project. Added 3,029 donors.
November 2012 • ICGC 10: two new cancer types from the U.S. TCGA project, as well as updates to 18 other TCGA projects; updates from the Spanish Chronic Lymphocytic Leukemia project, including new methylation data; updates to the U.K. Breast Carcinoma and Chronic Myeloid Disorders project databases. Added 432 donors.
December 2012 • ICGC 11: first data release from 4 projects (German Malignant Lymphoma, Canadian Prostate Cancer, German Prostate Cancer, U.K. Prostate Cancer); new data from an additional 4 projects (Australian Pancreatic Cancer, Canadian Pancreatic Cancer, Japanese Liver Cancer, German Pediatric Brain Tumour (PedBrain)). Added 336 donors.
ICGC dataset version 11 (December 2012)
• Cancer types: 42 (including TCGA, TSP, JHU)
• Donors: 7,358 (14,645 specimens)
• Simple somatic mutations: 3,761,508
• Copy number mutations: 25,227,388
• Structural rearrangements: 2,079
• Genes affected* by simple somatic mutations: 19,901
• Genes affected* by non-synonymous coding mutations: 19,675
• Genes affected* by copy number mutations: 20,109
• Genes affected* by structural rearrangements: 1,060
*out of 21,420 protein-coding genes annotated in Ensembl Human release 66
• Open tier and controlled data currently available
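The "genes affected" counts in the version 11 summary are easier to compare as fractions of the 21,420 protein-coding genes in the footnote. The following is a minimal sketch of that arithmetic only; the function name `percent_affected` and the dictionary layout are illustrative assumptions, not part of any DCC tooling.

```python
# Counts copied from the ICGC dataset version 11 summary slide.
PROTEIN_CODING_GENES = 21_420  # Ensembl Human release 66, per the footnote

genes_affected = {
    "simple somatic mutations": 19_901,
    "non-synonymous coding mutations": 19_675,
    "copy number mutations": 20_109,
    "structural rearrangements": 1_060,
}


def percent_affected(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    for mutation_type, count in genes_affected.items():
        print(f"Genes affected by {mutation_type}: {percent_affected(count)}%")
```

On these numbers, roughly nine in ten protein-coding genes carry at least one simple somatic or copy number mutation somewhere in the dataset, while structural rearrangements touch about one gene in twenty.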
ICGC DCC Has Received Data for 7,358 Cancer Genomes and Counting
[Chart: one entry per release, Release 7 through Release 11]
Total Mutation Observation Counts by Cancer Project (Release 11)
[Chart: logarithmic axis from 1 to 10,000,000; series: SSM Observations, CNSM Observations, STSM Observations]
Completeness of Data for Genomic Analysis Types in DCC Datasets (ICGC 11)
[Figure, continued across five slides numbered (1) to (5); panels: Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation; axis: # donors]
Clinical Data Completeness Overview

Donor Data Element                            Average % Complete
donor sex                                     94.8
donor diagnosis icd10                         94.5
donor age at diagnosis                        84.7
donor vital status                            71.2
donor age at last followup                    64.9
donor notes                                   57.7
donor interval of last followup               55.5
disease status last followup                  52.5
donor region of residence                     52.5
donor tumour staging system at diagnosis      49.9
donor tumour stage at diagnosis               33.4
donor survival time                           30.3
donor age at enrollment                       28.4
donor tumour stage at diagnosis supplemental  14.8
donor relapse interval                        5.8
donor relapse type                            4.5

Specimen Data Element                         Average % Complete
specimen type                                 97.7
tumour confirmed                              68.3
specimen storage other                        54.2
specimen notes                                52.4
specimen processing other                     51.7
digital image of stained section              51.6
tumour grade                                  25.0
tumour grading system                         24.4
specimen storage                              22.6
specimen donor treatment type                 21.1
specimen processing                           21.0
tumour histological type                      18.6
tumour stage                                  18.2
tumour stage system                           14.5
specimen type other                           14.4
specimen interval                             10.7
specimen available                            9.4
tumour stage supplemental                     2.3
tumour grade supplemental                     1.1
specimen donor treatment type other           0.9
specimen biobank                              0.0
specimen biobank id                           0.0

Analyzed Sample Data Element                  Average % Complete
analyzed sample type                          95.2
analyzed sample notes                         48.8
analyzed sample type other                    12.4
analyzed sample interval                      4.9

Disclaimer:
A data element was considered “complete” in an individual donor’s clinical data if a non-null value was provided for that data element at least once in the donor record, or in any of the donor-associated specimen and sample records.
Averages were calculated for each field across all donors from all projects.
The intention is only to provide a high-level overview of how “complete” the ICGC release 11 clinical dataset is.
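The completeness rule in the disclaimer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DCC's actual procedure: the input layout (a `records` list per donor holding the donor record plus its specimen and sample records), the underscore field names, and the choice to treat empty strings as null are all hypothetical.

```python
from collections import defaultdict


def is_non_null(value):
    """Treat None and empty or whitespace-only strings as null (an assumption)."""
    return value is not None and str(value).strip() != ""


def field_completeness(donors):
    """Return {field: percent of donors with at least one non-null value}.

    Each donor is a hypothetical dict whose "records" list holds the donor
    record and every associated specimen and sample record (each a dict).
    """
    complete_counts = defaultdict(int)
    all_fields = set()
    for donor in donors:
        seen = set()  # fields that are non-null at least once for this donor
        for record in donor["records"]:
            for field, value in record.items():
                all_fields.add(field)
                if is_non_null(value):
                    seen.add(field)
        for field in seen:
            complete_counts[field] += 1
    total = len(donors)
    # Average across all donors, regardless of project.
    return {
        field: round(100.0 * complete_counts[field] / total, 1)
        for field in sorted(all_fields)
    }


# Two made-up donors, purely for illustration.
example_donors = [
    {"records": [{"donor_sex": "female", "donor_relapse_type": None},
                 {"tumour_grade": ""},
                 {"tumour_grade": "2"}]},
    {"records": [{"donor_sex": None, "donor_relapse_type": None},
                 {"tumour_grade": None}]},
]

if __name__ == "__main__":
    print(field_completeness(example_donors))
```

In the example, the first donor counts as complete for `tumour_grade` because one of its two specimen records carries a value, which is the "at least once" part of the rule.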
Overview of Clinical Data Completeness (ICGC 10)
ICGC Release 11 Raw Data Availability
Raw Data Availability at EGA by Project and Data Type
(# of Samples in Available Datasets by Data Type)

Project                             Whole Genome  Exome       Transcriptome  Whole Genome      Whole Genome       Unspecified  Total Project
                                    Sequencing    Sequencing  Sequencing     Expression Array  Methylation Array  Type         Samples
CLL, Spain                          11            227         107            205               171                224          945
Breast Carcinoma, UK                173           442         TBD            -                 -                  174          789
Myeloproliferative Disease, UK      6             476         -              -                 -                  -            482
Pediatric Medulloblastoma, Germany  236           -           -              -                 -                  -            236
Pancreatic Cancer, Australia        -             -           -              -                 -                  192          192
Osteosarcoma, UK                    3             140         -              -                 -                  -            143
Liver Cancer, France                -             48          -              -                 -                  -            48
Malignant Lymphoma, Germany         12            12          4 (+TBD)       -                 -                  -            28
Oral Cancer, India                  -             21          -              -                 -                  -            21
Prostate Cancer, Germany            18            -           -              -                 -                  -            18
Prostate Cancer, UK                 4             -           -              -                 -                  -            4
Pancreatic Cancer, Canada           -             2           -              -                 -                  -            2
Pediatric Medulloblastoma, Canada   -             -           -              -                 -                  TBD          -
Total Samples by Type               463           1368        111            205               171                590          2908
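The row and column totals in the EGA availability table can be cross-checked with a short script. This is a sketch for readers who want to re-tally the table after later updates; the names `ROWS`, `known_sum`, `column_totals`, and `mismatched_rows` are illustrative, and treating "-" and "TBD" cells as missing (with "4 (+TBD)" recorded as 4) is an assumption about how the slide's totals were formed.

```python
# Sample counts transcribed from the EGA availability table.
# None stands for a "-" or "TBD" cell; "4 (+TBD)" is recorded as 4 (assumption).
DATA_TYPES = [
    "Whole Genome Sequencing", "Exome Sequencing", "Transcriptome Sequencing",
    "Whole Genome Expression Array", "Whole Genome Methylation Array",
    "Unspecified Type",
]

# project -> (per-type counts in DATA_TYPES order, stated project total)
ROWS = {
    "CLL, Spain": ([11, 227, 107, 205, 171, 224], 945),
    "Breast Carcinoma, UK": ([173, 442, None, None, None, 174], 789),
    "Myeloproliferative Disease, UK": ([6, 476, None, None, None, None], 482),
    "Pediatric Medulloblastoma, Germany": ([236, None, None, None, None, None], 236),
    "Pancreatic Cancer, Australia": ([None, None, None, None, None, 192], 192),
    "Osteosarcoma, UK": ([3, 140, None, None, None, None], 143),
    "Liver Cancer, France": ([None, 48, None, None, None, None], 48),
    "Malignant Lymphoma, Germany": ([12, 12, 4, None, None, None], 28),
    "Oral Cancer, India": ([None, 21, None, None, None, None], 21),
    "Prostate Cancer, Germany": ([18, None, None, None, None, None], 18),
    "Prostate Cancer, UK": ([4, None, None, None, None, None], 4),
    "Pancreatic Cancer, Canada": ([None, 2, None, None, None, None], 2),
    "Pediatric Medulloblastoma, Canada": ([None] * 6, None),
}


def known_sum(cells):
    """Sum the cells that hold a number, skipping missing (None) cells."""
    return sum(c for c in cells if c is not None)


def column_totals(rows):
    """Total samples per data type, in DATA_TYPES order."""
    return [
        known_sum(cells[i] for cells, _ in rows.values())
        for i in range(len(DATA_TYPES))
    ]


def mismatched_rows(rows):
    """Projects whose stated total differs from the sum of their known cells."""
    return [
        name for name, (cells, total) in rows.items()
        if total is not None and known_sum(cells) != total
    ]


if __name__ == "__main__":
    print(dict(zip(DATA_TYPES, column_totals(ROWS))))
    print("Grand total:", sum(column_totals(ROWS)))
    print("Rows with inconsistent totals:", mismatched_rows(ROWS))
```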
Web Usage Overview
DCC Helpdesk
• 110 helpdesk inquiries received at dcc-support@oicr.on.ca since the Cannes meeting
◦ …this doesn’t include requests that arrive directly in my inbox
Some frequent topics of enquiry include:
• Controlled data access
◦ How do I obtain access?
◦ Why am I unable to log into my account?
• Questions related to analysis methods, e.g., how data was normalized
• Questions from ICGC member projects related to data submissions, data encoding, etc.
Key DCC Activities for 2013
• Improved data & metadata curation at EGA; better linking of data held at DCC to ICGC data in other repositories
• Improved data quality/integrity checking through new submission/validation system; review of submission file specifications
• Integration of new data submission system and portal infrastructure with project and user information managed at ICGC.org
Acknowledgements and Thanks
• ICGC DCC software team @ OICR
• ICGC Secretariat Office
• All the great ICGC members!

2012-ICGC-Heidelberg-Whitty-DCC 2


Editor's Notes

  1. 3
  2. 13/16 non-TCGA projects have data under ICGC DAC in EGA! NCC liver exome study (EGAS00001000389) is not currently associated with ICGC DAC