Canceromatic III - Session I: Pan-Cancer analysis
- Changing landscape of data and tools available
for reproducible cancer genomics workflows: report
from the ICGC trenches.
Nov 14th 2016
B.F. Francis Ouellette francis@oicr.on.ca
• Senior Scientist & Associate Director,
Informatics and Biocomputing, Ontario Institute for
Cancer Research, Toronto, ON
• Associate Professor, Department of Cell and Systems Biology,
University of Toronto, Toronto, ON.
ONTARIO INSTITUTE FOR CANCER RESEARCH
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and respect the rights
and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero.
Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at;
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
@bffo
E-mail: francis@oicr.on.ca
Cancer-om-atics Jul 6-9 2009
Cancer-om-atics II Mar 28-30 2011
Canceromatics III Nov 13 -16 2016
Disclaimers
I do not (and will not) profit in any way, shape or form, from
any of the brands, products or companies I may mention.
I am a big proponent of Open Access, Open Source, Open
Data and Open Courseware.
I am on the SAB of many NIH funded projects (SGD, Galaxy,
GenomeSpace, H3ABionet, and HMP2), as well as Elixir and
Genome Canada’s SIAC, and the NRC’s KMAC.
This comes with a bias on how science should be done!
Outline
Introduction
ICGC
PCAWG
Closing remarks
adapted from https://goo.gl/fQJAz1
ICGC PCAWG
Docker
Testing
Cancer is a Disease
of the Genome
Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different
Adapted from Tom Hudson; https://www.cancer.gov/research/areas/genomics
Large-Scale Studies of Cancer Genomes
• Johns Hopkins
> 18,000 genes analyzed for mutations
11 breast and 11 colon tumors
L.D. Wood et al, Science, Oct. 2007
• Wellcome Trust Sanger Institute
518 genes analyzed for mutations
210 tumors of various types
C. Greenman et al, Nature, Mar. 2007
• TCGA (NIH)
Multiple technologies
brain (glioblastoma multiforme), lung (squamous carcinoma),
and ovarian (serous cystadenocarcinoma).
F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007
Lessons learned from early studies
• Heterogeneity within and across tumor types
• High rate of abnormalities (driver vs passenger)
• Sample quality matters
• Consent and controlled data access are complicated
MR Stratton et al. Nature 458, 719-724 (2009) doi:10.1038/nature07943
Analysis Data Types
Simple Somatic Mutations (SSM or SNV)
Copy Number Alterations (CNA or CNV)
Structural Variants (SV)
Germline variants (SNPs)
Gene Expression (micro-arrays and RNASeq)
miRNA Expression (RNASeq)
Epigenomics (Arrays and Methylation)
Splicing Variation (RNASeq)
Protein Expression (Arrays)
Rationale for the ICGC:
Scope is huge
Reduce duplication of effort
Standardization and uniform quality
measures
Merging of datasets
Spectrum of many cancers varies
across the world
Accelerate the dissemination of
genomic and analytical methods
International Cancer Genome Consortium
Collect ~500 tumour/normal pairs from each of 50 different
major cancer types; 25,000 T/N pairs!
Comprehensive genome analysis of each T/N pair:
Genome
Transcriptome
Methylome
Clinical data
Make the data available to the research community & public.
Identify genome changes:
…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…
Adapted from Tom Hudson
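The "identify genome changes" idea on this slide can be sketched as a base-by-base comparison of two sequences. This is only a toy illustration: it assumes the first sequence on the slide is the normal and the second is the tumour (the slide does not label them), and the function name `find_substitutions` is hypothetical. Real pipelines work on aligned reads with statistical variant callers.

```python
def find_substitutions(normal: str, tumour: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, normal base, tumour base) for each mismatch."""
    if len(normal) != len(tumour):
        raise ValueError("toy comparison needs equal-length sequences")
    return [
        (pos, n_base, t_base)
        for pos, (n_base, t_base) in enumerate(zip(normal, tumour))
        if n_base != t_base
    ]


# Sequences copied from the slide; which one is "normal" is an assumption.
normal_seq = "GATTATTCCAGGTAT"
tumour_seq = "GATTATTGCAGGTAT"

for pos, ref, alt in find_substitutions(normal_seq, tumour_seq):
    print(f"position {pos}: {ref}>{alt}")  # prints "position 7: C>G"
```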
International Cancer Genome Consortium: http://icgc.org
Data submission flow: Data Submission → Validation (dictionary) → Validation (across fields) → Indexing → Happy Users
http://goo.gl/1EcyR
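The two validation stages named on this slide (dictionary validation, then validation across fields) can be sketched as follows. The field names, allowed values, and the cross-field rule are hypothetical placeholders chosen for illustration; they are not the actual ICGC data dictionary.

```python
# Hypothetical dictionary: field name -> allowed values (None means free text).
DICTIONARY = {
    "donor_id": None,
    "donor_sex": {"male", "female"},
    "donor_vital_status": {"alive", "deceased"},
    "donor_survival_time": None,
}


def validate_dictionary(record: dict) -> list[str]:
    """Check that every field is known and every coded value is allowed."""
    errors = []
    for field, value in record.items():
        if field not in DICTIONARY:
            errors.append(f"unknown field: {field}")
        elif DICTIONARY[field] is not None and value not in DICTIONARY[field]:
            errors.append(f"invalid value for {field}: {value!r}")
    return errors


def validate_across_fields(record: dict) -> list[str]:
    """Hypothetical cross-field rule: deceased donors need a survival time."""
    errors = []
    deceased = record.get("donor_vital_status") == "deceased"
    if deceased and not record.get("donor_survival_time"):
        errors.append(
            "donor_survival_time required when donor_vital_status is 'deceased'"
        )
    return errors


def validate(record: dict) -> list[str]:
    """Run dictionary checks first; run cross-field checks only if they pass."""
    errors = validate_dictionary(record)
    return errors if errors else validate_across_fields(record)


good = {"donor_id": "D1", "donor_sex": "female", "donor_vital_status": "alive"}
bad = {"donor_id": "D2", "donor_sex": "female", "donor_vital_status": "deceased"}
print(validate(good))
print(validate(bad))
```

Only records that return an empty error list would move on to indexing in this sketch.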
ICGC needs to deal with different
kinds of users!
Biologists/Clinicians:
Web interface to processed data, providing:
Affected gene lists with consequences
Impact on pathways
Power users:
Application Programming Interface (API) to get to data
Availability and Integration with cloud resources
ICGC Data Coordinating Centre:
dcc.icgc.org
BRAF missense mutations in colorectal cancer
https://dcc.icgc.org/
https://dcc.icgc.org/icgc-in-the-cloud
http://www.cancercollaboratory.org/
http://docs.icgc.org/
User and submitter documentation
Software development discussions
https://discuss.icgc.org/
Some challenges:
So, we have lots of data; is
it all generated the same way?
Every country/group has basically been
submitting:
Simple Somatic Mutations (SSM or SNV)
Copy Number Alterations (CNA or CNV)
Structural Variants (SV)
Germline variants (SNPs)
Gene Expression (micro-arrays and RNASeq)
miRNA Expression (RNASeq)
Epigenomics (Arrays and Methylation)
Splicing Variation (RNASeq)
Protein Expression (Arrays)
Are they all using the same pipelines?
No
http://goo.gl/CekF6y
Missing Clinical Data?
Are we all using the same definition for
controlled access data?
No
ICGC BAM/FASTQ
TCGA BAM/FASTQ
ICGC Open Data (includes TCGA Open Data)
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
Region of residence
Risk factors
Examination
Surgery
Radiation
Sample
Slide
Specific histological features
Analyte
Aliquot
Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
ICGC Open Access (OA) Datasets
• Cancer Pathology
Histologic type or subtype
Histologic nuclear grade
• Patient/Person
Gender, Age range,
Vital status, Survival time
Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and
Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV
ICGC
TCGA
Differences between ICGC & TCGA
• Different tumour types
• Different geographic rules
• Many countries vs one jurisdiction
• Different definitions of what is controlled
• Different data access rules
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
• Germ line variants
ICGC Open Access Datasets
• Cancer Pathology
Histologic type or subtype
Histologic nuclear grade
• Patient/Person
Gender, Age range,
Vital status, Survival time
Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and
Loss of Heterozygosity
• Somatic variants from Exome or WGS
http://goo.gl/w4mrV
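The ICGC open/controlled split listed on this slide can be expressed as a simple lookup, sketched below. The category strings are copied from the slide; the function name `access_tier` and the idea of checking a request against such a table are illustrative assumptions, not part of any ICGC tool.

```python
# Data types as listed on the slide, grouped by access tier.
CONTROLLED = {
    "Detailed Phenotype and Outcome data",
    "Gene Expression (probe-level data)",
    "Raw genotype calls",
    "Gene-sample identifier links",
    "Genome sequence files",
    "Germ line variants",
}
OPEN = {
    "Cancer Pathology",
    "Patient/Person",
    "Gene Expression (normalized)",
    "DNA methylation",
    "Computed Copy Number and Loss of Heterozygosity",
    "Somatic variants from Exome or WGS",
}


def access_tier(data_type: str) -> str:
    """Return 'controlled' or 'open' for a data type named on the slide."""
    if data_type in CONTROLLED:
        return "controlled"
    if data_type in OPEN:
        return "open"
    raise KeyError(f"data type not listed on the slide: {data_type}")


print(access_tier("Germ line variants"))
print(access_tier("DNA methylation"))
```

Raising on unlisted types, rather than defaulting to "open", is a deliberately cautious choice for this sketch.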
TCGA Controlled Access Datasets
• Primary sequence data
(BAM and FASTQ files)
• SNP6 array level 1 and level 2 data
• Exon array level 1 and level 2 data
• Somatic variants from whole
genome sequencing
• Certain information in MAFs
• A full list of controlled-access
data types can be found at:
http://goo.gl/K1h7zu
TCGA Open Access Datasets
• De-identified clinical and
demographic data
• Gene expression data
• Copy number alterations in regions
of the genome
• Epigenetic data
• Summaries of data compiled across
individuals
• Anonymized single amplicon DNA
sequence data
• Somatic variants from scrubbed
exome sequencing
http://goo.gl/A1rMRB
Can we do better?
From ICGC/TCGA
Each group has been free to decide on its own whether
to sequence exomes or whole genomes.
A bit more than 10% of all genomes were done
with Whole Genome Sequencing.
A steering committee was formed and we decided to
analyze these whole genomes in a robust way, with the primary
question of figuring out what was hidden in the genomic
sequence of cancer patients!
Steering Committee of PCAWG
Peter Campbell, Sanger Inst.
Gady Getz, Broad
Jan Korbel, EMBL
Lincoln Stein, OICR
Josh Stuart, UCSC
PanCancer Analysis of Whole Genomes
(PCAWG)
> 2,800 T/N pairs, with clinical data, from 20
tumour types, for whole genome analysis.
Aligned with one standard pipeline.
Genomic Variants determined with 3
pipelines
17 working groups
Start writing papers now
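One way to compare variant calls from three pipelines is to count how many pipelines support each call, sketched below. The slide only says that variants were determined with 3 pipelines; the "keep calls supported by at least two pipelines" rule, the pipeline names, and the variant tuples are all assumptions made up for illustration, not the PCAWG method.

```python
from collections import Counter


def support_counts(call_sets: dict) -> Counter:
    """Count, for each variant, how many pipelines called it."""
    counts = Counter()
    for calls in call_sets.values():
        counts.update(set(calls))  # each pipeline counts at most once per variant
    return counts


def consensus(call_sets: dict, min_support: int = 2) -> list:
    """Keep variants called by at least `min_support` pipelines (assumed rule)."""
    counts = support_counts(call_sets)
    return sorted(v for v, n in counts.items() if n >= min_support)


# Made-up calls: (chromosome, position, reference base, alternate base).
calls = {
    "pipeline_a": {("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")},
    "pipeline_b": {("chr1", 100, "C", "G")},
    "pipeline_c": {
        ("chr1", 100, "C", "G"),
        ("chr2", 200, "A", "T"),
        ("chr3", 300, "G", "A"),
    },
}

print(consensus(calls))
```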
Deliverables for PCAWG will include:
1st PANCANCER analysis on > 2,800 cancer tumours
from a WGS perspective
RNA, SSM, CNV, Methylation analysis & germline
Published (executable) pipelines
Docker / Dockstore
Multiple cloud access to data
Multiple portal access to data
https://dcc.icgc.org/pcawg
Working Groups (1/2)
1 Novel somatic mutation calling methods
2 Analysis of mutations in regulatory regions
3 Integration of transcriptome and genome
4 Integration of epigenome and genome
5 Consequences of somatic mutations on pathway
and network activity
6 Patterns of structural variations, signatures, genomic
correlations, retrotransposons, mobile elements
7 Mutation signatures and processes
8 Germline cancer genome
Working Groups (2/2)
9 Inferring driver mutations and identifying cancer genes
and pathways
10 Translating cancer genomes to the clinic
11 Evolution and heterogeneity
12 Exploratory: portals, visualization and software
infrastructure
13 Molecular subtypes and classification
14 Analysis of mutations in non-coding RNA
15 Exploratory: mitochondrial
16 Exploratory: pathogens
Tech Technical working group
http://dockstore.org
PCAWG pipelines now on Dockstore
DOCKSTORE testing group
Andrew Duncan, OICR
Christina Yung, OICR
Denis Yuen, OICR
Zhibin Lu, OICR
Brian O’Connor, UCSC
Alex Buchanan, OHSU
Kyle Ellrott, OHSU
Francis Ouellette, OICR
Gordon Saksena, Broad
Junjun Zhang, OICR
Miguel Vazquez, CNIO
Oliver Hofmann, Australia
Solomon Shorser, OICR
Adam Strucka, OHSU
Challenges:
Too many conference calls!
Too many clouds
Even though we learned from what not to do with ICGC,
we had to learn what not to do in the clouds.
TCGA and ICGC have different authorization protocols
Not all data can exist everywhere
Dockstore testing is taking too long!
Other projects in planning
ICGC to finish in Spring of 2018
Planning for ICGCmed
ICGC 1: 25,000 tumours (DNA, RNA, Epigenome,
Clinical data)
ICGCmed: 200,000 Tumours (DNA, RNA,
Epigenome, Clinical trial)
ICGC1 was the picture, ICGCmed will be the movie
(before and after treatment).
Submission system with one place for data and
metadata
Tools/links directory portal
29,647
2,834
1477
915
20
17
http://bioinformatics.ca/
12
0-Toronto, 1-Bethesda, 2-Hinxton, 3-Madrid, 4-Queensland, 5-Kyoto,
6-Cannes, 7-Heidelberg, 8-Toronto, 9-Beijing, 10-Mumbai, 11-Boston
10
Informatics & BioComputing @ OICR
9
1
Bioinformatics.ca workshops Content
http://bioinformatics-ca.github.io/
https://goo.gl/CGu13q
Acknowledgments
ICGC/OICR
Project leaders:
Tom Hudson
John McPherson
Lincoln Stein
Jared Simpson
Paul Boutros
Vincent Ferretti
Francis Ouellette
Jennifer Jennings
Christine Yung
DCC Software Developer
Vincent Ferretti
Dusan Andric
Phuong-My Do
Francois Gerthoffert
Terry Lin
Michael Moncada
Vitalii Slobodianyk
Bob Tiernay
Douglas Wong
Linda Xiang
Junjun Zhang
Ouellette Lab
Alysha Moncrieffe
Ann Meyer
Zhibin Lu
Web Dev
Joseph Yamada
Kaman Wu
Kim Cullion
Koji Miyauchi
Miyuki Fukuma
ICGC DCC Biocuration
Hardeep Nahal
Marc Perry
Research IT/Systems
David Sutton
Bob Gibson
David Magda
Rob Naccarato
Brian Ott
Gino Yearwood
EGA
Jordi Rambla De Argila
Arcadi Navarro
Audald Iloret
Mauricio Moldes
http://oicr.on.ca http://icgc.org
… and all the patients and their
families that are putting
their hopes into our work!
http://icgc.org
http://dcc.icgc.org
http://docs.icgc.org
info@icgc.org
http://bioinformatics.ca
We are hiring:
• OICR Director
• Genome Technology Director
• Junior Faculty in Informatics
& Biocomputing
• PDFs
Interested? Ask Paul Boutros or me
Muchas gracias!

Madrid icgc pcawg_2016_slideshare
