Thousands of tumor genomes and exomes are being sequenced as part of the International Cancer Genome Consortium (ICGC), The Cancer Genome Atlas (TCGA) and other initiatives. For the first time, this makes it possible to obtain a comprehensive picture of the mutations, genes and pathways involved in the cancer phenotype across tumor types. We have developed computational methods that identify signals of positive selection in the pattern of tumor somatic mutations; these signals point to genes and pathways directly involved in tumor development. We have applied these approaches to 3205 tumors from 12 different cancer types of the TCGA Pan-Cancer project, identifying 291 high-confidence cancer driver genes acting on those tumors (Tamborero et al., 2013). We have also developed IntOGen-mutations (http://www.intogen.org/mutations), a novel web platform for cancer genome interpretation, which analyses not only TCGA Pan-Cancer data but also mutation data from ICGC and other initiatives. The resource allows users to identify driver mutations, genes and pathways acting on more than 6000 tumors originating in 17 different cancer sites, and to analyze newly sequenced tumor genomes. Among the novel cancer drivers identified are chromatin regulatory factors and splicing factors, which are emerging as important genes in cancer development and are regarded as interesting candidate targets for cancer treatment. In my talk I will summarize these recent findings.
More info: http://bg.upf.edu/blog/2013/10/my-slides-on-identification-of-cancer-drivers-across-tumor-types/
Identification of cancer drivers across tumor types
1. Identification of cancer drivers across tumor types
Nuria Lopez-Bigas
ICREA Research Professor at Universitat Pompeu Fabra
Barcelona
http://bg.upf.edu
3. BRAF is frequently mutated in melanoma (V600E)
Vemurafenib
Dibb et al., Nature Reviews Cancer 2004
Davies et al. Nature 2002
August 2011
6. Sequencing tumor genomes
Mrs. McDaniel
Normal Cell
Tumor Cell
Sequencing
Which mutations are drivers?
Somatic mutations
7. Cancer is an evolutionary process
Yates and Campbell, Nat Rev Genet 2012
8. How to differentiate drivers from passengers?
ACTGCCTACGTCTCACCGTCGACTTCAAATCGCTTAACCCGTACTCCCATGCTACTGC
ATCTCGGGTTAACTCGACGTTTTTCATGCATGTGTGCACCCCAATATATATGCAACTT
TTGTGCACCTCTGTCACGCGCGAGTTGGCACTGTCGCCCCTGTGTGCATGTGCACTGT
CTCTCGCTGCACTGCCTACGTCTCACCGTCGACTTCAAATCGCTTAACCCGTACTCCC
ATGCTACTGCATCTCGGGTTAACTCGACGTTTTGCATGCATGTGTGCACCCCAATATA
TATGCAACTTTTGTGCACCTCTGTCACGCGCGAGTTGGCACTGTCGCCCCTGTGTGCA
TGTGCACTGTCTCTCGAGTTTTGCATGCATGTGTGCACTGTGCACCTCTGTTACGTCT
Find signals of positive selection across tumour re-sequenced genomes
9. Signals of positive selection
Recurrence (R): MuSiC-SMG / MutSig
Identify genes mutated more frequently than background mutation rate
10. Signals of positive selection
Recurrence (R): MuSiC-SMG / MutSigCV
Identify genes mutated more frequently than background mutation rate
Challenge: Background mutation rate varies across patients and genomic regions
Replication time
Stamatoyannopoulos et al., Nature Genetics 2009
Chromatin organization
Schuster-Böckler and Lehner, Nature 2011
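As a rough illustration of the recurrence signal, the sketch below asks whether a gene carries more mutations than a given background rate would predict, using an exact one-sided binomial test. It is not the MuSiC-SMG or MutSigCV algorithm; the function name, the per-base rates and the gene figures are all hypothetical. The challenge stated on the slide shows up directly: the same mutation count looks significant or unremarkable depending on which background rate is assumed.

```python
from math import comb

def recurrence_p_value(n_mutations, gene_length_bp, n_samples, bg_rate_per_bp):
    """One-sided binomial test: P(X >= n_mutations) under the background rate."""
    trials = gene_length_bp * n_samples  # every base in every tumor is one trial
    # P(X >= k) = 1 - P(X <= k - 1)
    p_fewer = sum(
        comb(trials, i) * bg_rate_per_bp**i * (1 - bg_rate_per_bp) ** (trials - i)
        for i in range(n_mutations)
    )
    return max(0.0, 1.0 - p_fewer)

# Hypothetical gene: 1,500 coding bases, 5 mutations observed across 100 tumors.
p_low_background = recurrence_p_value(5, 1500, 100, bg_rate_per_bp=1e-6)
p_high_background = recurrence_p_value(5, 1500, 100, bg_rate_per_bp=3e-5)
print(p_low_background, p_high_background)
```

With the low assumed rate the gene looks strongly recurrent, while with the high rate the same five mutations are unremarkable, which is why a background rate that varies by patient and region matters so much.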
11. Signals of positive selection
Functional impact bias (FMbias) (F): OncodriveFM
How to measure the functional impact of mutations?
• Based on the consequence of each mutation (e.g. synonymous is lowest; stop-gain and frameshift indel are highest)
• SIFT, PolyPhen2 (PPH2) and MutationAssessor (MA) scores for missense mutations
Gonzalez-Perez and Lopez-Bigas. NAR 2012
12. Signals of positive selection
Functional impact bias (FMbias) (F): OncodriveFM
Main Advantages of FM bias approach
• It does not depend on background mutation rates
• Only needs list of somatic mutations
• It is computationally cheap
Gonzalez-Perez and Lopez-Bigas. NAR 2012
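The FM bias idea can be sketched as a permutation test: compare the mean functional impact score of the mutations in one gene with the means of equally sized random draws from all observed mutations. This is a simplified illustration and not the OncodriveFM implementation; the function name, the scores and the pool are made up. Note that it needs only a list of scored somatic mutations and no background mutation rate, in line with the advantages listed above.

```python
import random
from statistics import mean

def fm_bias_p_value(gene_scores, all_scores, n_permutations=2000, seed=0):
    """Empirical P(mean of a random draw >= mean score observed in the gene)."""
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = sum(
        mean(rng.sample(all_scores, k)) >= observed
        for _ in range(n_permutations)
    )
    # Add-one correction keeps the empirical p-value above zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up pool of functional impact scores: mostly low impact, a few high.
pool = [0.1] * 90 + [0.9] * 10
p_biased_gene = fm_bias_p_value([0.9, 0.9, 0.9, 0.9], pool)
p_neutral_gene = fm_bias_p_value([0.1, 0.1, 0.1, 0.1], pool)
print(p_biased_gene, p_neutral_gene)
```

A gene whose mutations are all high impact gets a small p-value, while a gene with only low-impact mutations does not, regardless of how many mutations it has.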
14. Banerji et al., Nature 2012, which analyzes 103 breast tumors
OncodriveFM
MutSig
TP53
CBFB
GATA3
MAP3K1
AKT1
PIK3CA
MLL
NOTCH2
PCDHA7
15. PIK3CA is a false negative of OncodriveFM in some Breast Cancer projects
[Figure: number of protein-affecting mutations in PIK3CA (y-axis, 0 to 80) by protein position (x-axis), with position 1047 marked (H1047L)]
PIK3CA is recurrently mutated at the same residue in breast tumours
These mutations are scored low by functional impact metrics
16. Signals of positive selection
Mutation clustering: OncodriveCLUST
Tamborero et al., Bioinformatics 2013
17. Signals of positive selection: OncodriveCLUST
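[Figure: OncodriveCLUST steps (I) to (VI) for two example genes, Gene A and Gene B. Mutations are plotted by amino acid position against a threshold (Th), clusters (C1, C2) are delimited, and each gene receives a clustering score:]
S_geneA = S_C1
S_geneB = S_C1 + S_C2
Background model obtained by calculating the clustering score per gene of the coding-silent mutations
[Step (VI): S_geneA and S_geneB are placed on the background score distribution, giving Z_A and Z_B]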
Tamborero et al., Bioinformatics 2013
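A toy version of the clustering signal: count mutations per amino acid position, keep positions that reach a threshold, merge nearby positions into clusters, and score the gene by the fraction of its mutations that fall inside clusters. The threshold, the maximum gap and the scoring rule are simplifying assumptions for illustration, not the published OncodriveCLUST procedure, and the comparison with the coding-silent background is omitted. All positions are made up.

```python
from collections import Counter

def find_clusters(positions, min_count=3, max_gap=5):
    """Group positions with at least min_count mutations into clusters."""
    counts = Counter(positions)
    seeds = sorted(p for p, c in counts.items() if c >= min_count)
    clusters = []
    for pos in seeds:
        # Extend the current cluster if this seed is close to its last position.
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters, counts

def gene_clustering_score(positions, min_count=3, max_gap=5):
    """Fraction of a gene's mutations that fall inside clusters."""
    if not positions:
        return 0.0
    clusters, counts = find_clusters(positions, min_count, max_gap)
    in_clusters = sum(counts[p] for cluster in clusters for p in cluster)
    return in_clusters / len(positions)

# Made-up amino acid positions of protein-affecting mutations.
gene_a = [100] * 5 + [30, 200]                             # one cluster
gene_b = [50] * 4 + [52] * 3 + [300] * 6 + [10, 150, 400]  # two clusters
scattered = [10, 60, 110, 160, 210, 260]                   # no cluster

clusters_b, _ = find_clusters(gene_b)
print(clusters_b, gene_clustering_score(gene_a), gene_clustering_score(gene_b))
```

Because the score depends on where mutations fall rather than on their functional impact score, a hotspot such as the PIK3CA residue discussed earlier would still stand out under this kind of signal.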
18. Banerji et al., Nature 2012, which analyzes 103 breast tumors
OncodriveFM
MutSig
TP53
CBFB
GATA3
MAP3K1
AKT1
PIK3CA
ERBB2
PRKCZ
NME5
AKR1C3
RSBN1L
OncodriveCLUST
MLL
NOTCH2
PCDHA7
19. IntOGen mutations pipeline
To interpret catalogs of cancer somatic mutations
Input data: list of tumor somatic mutations
Analysis Pipeline (powered by Wok)
✓ Identify consequences of mutations (Ensembl VEP)
✓ Assess functional impact of nsSNVs (SIFT, PPH2, MA and TransFIC)
✓ Compute frequency of mutations per gene and pathway
✓ Identify candidate driver genes (OncodriveFM and OncodriveCLUST)
✓ Identify pathways with FM bias (OncodriveFM)
Wok: Workflow Management System (Christian Perez-Llamas)
Browser (powered by Onexus)
Onexus: Web browser creation (Jordi Deu-Pons)
20. IntOGen mutations pipeline
To interpret catalogs of cancer somatic mutations
Current version: 31 projects, 13 cancer sites, 4623 tumours
Working version: 41 projects, 17 cancer sites, ~6300 tumours
Input data: list of tumor somatic mutations
Analysis Pipeline (powered by Wok)
✓ Identify consequences of mutations (Ensembl VEP)
✓ Assess functional impact of nsSNVs (SIFT, PPH2, MA and TransFIC)
✓ Compute frequency of mutations per gene and pathway
✓ Identify candidate driver genes (OncodriveFM and OncodriveCLUST)
✓ Identify pathways with FM bias (OncodriveFM)
Browser (powered by Onexus)
http://www.intogen.org/mutations
Gonzalez-Perez et al, Nature Methods 2013
21. Projects in current version of IntOGen
Site             Number of projects   Samples
Bladder          1                    98
Brain            3                    491
Breast           6                    1148
Colorectal       2                    229
Head and neck    2                    375
Hematopoietic    3                    395
Kidney           1                    417
Liver            1                    24
Lung             6                    664
Ovary            1                    316
Pancreas         3                    214
Stomach          1                    22
Uterus           1                    230
TOTAL            31                   4623
Gonzalez-Perez et al, Nature Methods 2013
22. Combining results across projects
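[Figure: within Cancer site A, OncodriveFM is applied to each project's genes x samples matrix of functional impact (no mutation, low, high) for project 1, project 2, project 3, project 4, ...; the resulting per-gene p-values (scale 0 to 1, with 0.05 marked) are then combined across projects]
Gonzalez-Perez et al, Nature Methods 2013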
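The "combine" step can be illustrated with Fisher's method, one standard way to merge independent p-values for the same gene from several projects. The slide does not say which combination method IntOGen uses, so treat this as an assumption made for illustration; the function name and the p-values are hypothetical.

```python
from math import exp, factorial, log

def fisher_combine(p_values):
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(p_values)
    statistic = -2.0 * sum(log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function with 2k degrees of freedom,
    # which has a closed form when the degrees of freedom are even.
    return exp(-half) * sum(half**i / factorial(i) for i in range(k))

# Hypothetical p-values for one gene in four projects of the same cancer site.
per_project = [0.04, 0.06, 0.10, 0.08]
combined = fisher_combine(per_project)
print(combined)
```

In this made-up case most projects are individually above 0.05, yet the combined evidence is clearly below it, which is the benefit of pooling projects from the same cancer site.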
23. Comprehensive view of cancer vulnerability across tumor types
http://www.intogen.org/mutations
Gonzalez-Perez et al, Nature Methods 2013
24. Comprehensive view of cancer vulnerability across tumor types
[Figure; colour scale: mutation frequency, 0.1 to 0.4]
http://www.intogen.org/mutations
39. Differences in relative importance of driver chromatin regulatory factors (CRFs) between cancer types
[Figure: driver genes in brain tumor projects. Labels: Glioblastoma TCGA, Glioblastoma JHU, Pediatric Brain DKFZ, Paediatric medulloblastoma. Axes: MA FIS score; Mutated CRFs / site-specific drivers ratio]
Genes shown: TP53, PTEN, EGFR, NF1, IDH1, RB1, PIK3R1, ATRX, KMT2C, CTNNB1, DDX3X, STAG2, MYH8, SMARCA4, PRDM9, LZTR1, KDM6A, RPL5, WDR90, BPTF, SETD2, EP300, ARID1A, KDM5C, ATF7IP, NCOR1, CHD4, PBRM1, PHC3, BAP1, MBD1, NSD1, CHD2, CHD3
Gonzalez-Perez et al, Genome Biology 2013
40. Pan-Cancer Project - The Cancer Genome Atlas
TCGA PanCancer Network, Nature Genetics 2013
41. TCGA pan-cancer project
12 cancer types - 3205 tumors
Project name   Tumor type                               Number of samples
BLCA           Bladder Urothelial Carcinoma             98
BRCA           Breast invasive carcinoma                762
COADREAD       Colon and Rectum adenocarcinoma          193
GBM            Glioblastoma multiforme                  290
HNSC           Head and Neck squamous cell carcinoma    301
KIRC           Kidney renal clear cell carcinoma        417
LAML           Acute Myeloid Leukemia                   196
LUAD           Lung adenocarcinoma                      228
LUSC           Lung squamous cell carcinoma             174
OV             Ovarian serous cystadenocarcinoma        316
UCEC           Uterine Corpus Endometrioid Carcinoma    230
TOTAL                                                   3205
TCGA PanCancer Network, Nature Genetics 2013
42. Complementary signals of positive selection
Recurrence: MuSiC-SMG (R), MutSigCV (M). Identify genes mutated more frequently than background mutation rate
FM bias: OncodriveFM (F). Identify genes with a bias towards high functional mutations (FM bias)
CLUST bias: OncodriveCLUST (C). Identify genes with a significant regional clustering of mutations
ACTIVE bias: ActiveDriver (A). Identify genes significantly enriched in mutations affecting phosphorylation-associated sites
43. Using complementary signals helps obtain a more comprehensive list of cancer drivers
[Figure: overlap of the genes detected by MuSiC-SMG (R), OncodriveFM (F), OncodriveCLUST (C) and ActiveDriver (A)]
Tamborero et al., Scientific Reports 2013
44. Genes exhibiting more than one signal are more likely true drivers
Tamborero et al., Scientific Reports 2013
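A minimal sketch of how calls from complementary methods might be tallied, so that genes flagged by more than one signal can be singled out. The gene names are placeholders and the "two or more signals" cut-off is an assumption for illustration; the actual rules used to build the high-confidence driver list are described in Tamborero et al. and are not reproduced here.

```python
from collections import Counter

def count_signals(calls_by_method):
    """Map each gene to the number of methods that flag it."""
    counts = Counter()
    for genes in calls_by_method.values():
        counts.update(set(genes))  # a method counts at most once per gene
    return counts

# Placeholder calls; real inputs would be the gene lists from each method.
calls = {
    "MuSiC-SMG": {"GENE1", "GENE2", "GENE3"},
    "OncodriveFM": {"GENE1", "GENE2", "GENE4"},
    "OncodriveCLUST": {"GENE1", "GENE5"},
    "ActiveDriver": {"GENE2", "GENE6"},
}

signal_counts = count_signals(calls)
multi_signal = sorted(g for g, n in signal_counts.items() if n >= 2)
print(multi_signal)
```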
46. 291 High-Confidence Cancer Drivers
Tamborero et al., Scientific Reports 2013
47. Most driver genes are mutated at low frequency
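[Figure: mutation frequency of driver genes (axis 0.1 to 0.4) by tumor type: KIRC, COADREAD, LUAD, LUSC, HNSC, LAML, GBM, BLCA, BRCA, OV, UCEC. Labelled genes: TP53, PIK3CA, PTEN, APC, SF3B1, HRAS, CDKN2C. Annotation: 8 / 3205 (0.002)]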
Tamborero et al., Scientific Reports 2013
48. Most drivers map to 5 cancer hallmarks
[Figure: panels per tumor type: BLCA, BRCA, COADREAD, LUAD, GBM, LUSC, HNSC, KIRC, OV, UCEC, LAML]
http://www.intogen.org/tcga
Tamborero et al., Scientific Reports 2013
49. Some drivers show clear specificity for one tumor type
Tamborero et al., Scientific Reports 2013
50. Some novel driver genes map to well-known cancer pathways
Novel cancer gene
Established cancer gene
51. 95% of tumors have PAMs in at least one driver
PANCANCER
Samples with at least one PAM in HCDs: 3038 (0.95)
Median (IQR) of PAMs in HCDs per sample: 4 (4)
Median (IQR) of PAMs in all genes per sample: 49 (63)
[Figure: proportion of samples (0 to 0.20) by number of PAMs in HCDs (0, 1, 2, ..., 10, 11-15, 16-20, 21-25, 26-30, >30)]
PAMs: Protein affecting mutations
Tamborero et al., Scientific Reports 2013
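The three summary figures on this slide (samples with at least one PAM, median and IQR) can be computed from a list of per-sample PAM counts as sketched below. The counts are toy values, the function name is hypothetical, and the quartile convention is that of Python's `statistics.quantiles` default, which may differ from the one used for the slide.

```python
from statistics import median, quantiles

def summarize_pams(pams_per_sample):
    """Summarize per-sample counts of protein-affecting mutations (PAMs)."""
    n = len(pams_per_sample)
    with_pam = sum(1 for count in pams_per_sample if count > 0)
    q1, _, q3 = quantiles(pams_per_sample, n=4)  # quartile cut points
    return {
        "samples_with_pam": with_pam,
        "proportion": with_pam / n,
        "median": median(pams_per_sample),
        "iqr": q3 - q1,
    }

# Toy data: number of PAMs in driver genes for eight samples.
toy_counts = [0, 2, 3, 4, 4, 5, 6, 9]
summary = summarize_pams(toy_counts)
print(summary)
```

The pan-cancer proportion on the slide follows the same definition: 3038 of 3205 samples is about 0.95.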
52. Median of 4 PAMs in drivers per sample with variability per cancer type
Tumor type   Samples with at least one PAM in HCDs   Median (IQR) of PAMs in HCDs per sample   Median (IQR) of PAMs in all genes per sample
LAML         165 (0.85)                              2 (3)                                     8 (7)
OV           312 (0.99)                              2 (2)                                     40 (276)
KIRC         393 (0.94)                              3 (3)                                     45 (24)
BRCA         710 (0.93)                              3 (2)                                     28 (27)
GBM          272 (0.94)                              4 (3)                                     51 (23)
COADREAD     193 (1.0)                               5 (2)                                     65 (47)
HNSC         299 (0.99)                              6 (5)                                     97 (79)
UCEC         228 (0.99)                              6 (9)                                     48 (153)
LUAD         221 (0.98)                              9 (8)                                     183 (248)
LUSC         172 (0.99)                              9 (7)                                     209 (123)
BLCA         98 (1.0)                                9.5 (7.5)                                 160 (157)
[Figure: proportion of samples (0 to 1.00) by number of PAMs in HCDs, one bar per tumor type]
PAMs: Protein affecting mutations
Tamborero et al., Scientific Reports 2013
53. Summary
• Cancer genomics projects aim to unravel the mechanisms of tumorigenesis to advance towards personalized cancer medicine
• To identify cancer driver genes we search for signals of positive selection in the pattern of somatic mutations
• IntOGen-mutations contains results of analysing more than 4500 tumours (6200 in new version) to identify cancer drivers across tumor types
• IntOGen-mutations can analyse newly sequenced tumor genomes to identify likely driver mutations
• 34 chromatin regulatory factors show signals of positive selection in the tumor somatic mutation pattern
• 291 high-confidence cancer driver genes detected in TCGA Pan-Cancer 12 by combining complementary signals of positive selection
54. Biomedical Genomics Lab
Michael Schroeder
David Tamborero
Carlota Rubio
Christian Perez-Llamas
Jordi Deu-Pons
Abel Gonzalez-Perez
Nuria Lopez-Bigas
@bbglab
@nlbigas
http://bg.upf.edu/blog