Cracking Cancer's Code
Max Salm, PhD
Bioinformatics & Biostatistics
Max.Salm@cancer.org.uk
The Code

♂

♀

• A code passed from cell to cell
• ~3.2 billion ‘letters’ in total
• Encodes about 22,000 proteins
Image Credit: National Human Genome Research Institute
‘Bugs’ in the Code
Mutations in the Genetic Code
GACCTGGCAGCCAGGAACGTACTGGT
• Vast majority have no consequence…
…but occasionally…
GACCTGGCAGCC----ACGTACTGGT
• Alteration causes a selective growth advantage, increasing the ratio of cell birth to cell death.

[Figure: the hallmarks of cancer. Sustain proliferative signalling; Evade growth suppressors; Avoid immune destruction; Enabling replicative immortality; Tumor-promoting inflammation; Activating invasion & metastasis; Promoting local blood supply; Genome instability & mutation; Resisting cell death; Deregulating cellular energetics]

Hanahan D & Weinberg R (2011) Hallmarks of Cancer: The Next Generation. Cell, Volume 144, Issue 5, Pages 646-674
Reading the Code: 1
[Chart: cost per genome (log scale, $100 to $100,000,000) and size of the Short Read Archive (Tb, 0 to 1400) by year, 2001 to 2014. Annotated milestones, in order: Draft Human Genome Project; ‘Massively Parallel’ sequencing; 1st tumour; >12,000 tumours.]

Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP) [27/01/2014]
Wang and Wheeler (2014) Genomic Sequencing for Cancer Diagnosis and Therapy Annu. Rev. Med. 65: 33-48
Reading the Code: 2

Challenge: Rate of data production has overtaken improvements
in long-term storage capacity.
Response: Novel Compression Algorithms
Stein (2010) The case for cloud computing in genome informatics Genome Biology, 11:207
Bonfield JK, Mahoney MV (2013) Compression of FASTQ and SAM Format Sequencing Data. PLoS ONE 8(3): e59190.
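
To give a feel for why sequence data compress well, here is a minimal sketch that packs each of the four DNA letters into 2 bits instead of one 8-bit character. This is not the method of the cited papers: real FASTQ/SAM compressors also handle quality scores, read names and ambiguous bases, all of which this sketch ignores. The function names `pack` and `unpack` are hypothetical.

```python
# Hypothetical helpers: a sketch of 2-bit packing, not the cited algorithms.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking reads bases in order.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = [BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0)]
    return "".join(bases[:length])


seq = "GACCTGGCAGCCAGGAACGTACTGGT"  # the 26 letters shown on the 'Bugs' slide
packed = pack(seq)
print(len(seq), "letters ->", len(packed), "bytes")
```

The original length has to be stored alongside the packed bytes, because the final byte may be only partly filled.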
Reading the Code: 3
‘I would say the Human Genome Project is probably more significant than splitting the atom or going to the moon.’ Francis Collins, CNN. (shown as three copies)
Shatter & Scan (× 300M)
[Fragments: probab, ing to, ignifi, y more, ly mor, obably, Genome, the Hu, ifican, Proje, itting, signif, ing to, oject, roject]
Identify Original
‘the Human Genome Project … probably more significant’
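
The shatter-and-scan idea can be sketched in a few lines: cut several copies of the quote at different places, shuffle the pieces, then rebuild the text by repeatedly joining the two pieces that overlap the most. This is a toy greedy assembler under generous assumptions (error-free fragments, no long repeats); real genomes violate both, and the names `shatter`, `overlap` and `assemble`, as well as the cut positions and fragment size, are hypothetical choices made for illustration.

```python
import random

QUOTE = ("I would say the Human Genome Project is probably more significant "
         "than splitting the atom or going to the moon.")


def shatter(text, starts=(0, 7, 13), size=20):
    """Cut one copy of `text` per entry in `starts` into pieces of `size` letters."""
    fragments = []
    for start in starts:
        if start:
            fragments.append(text[:start])  # leading stub before the first cut
        fragments += [text[i:i + size] for i in range(start, len(text), size)]
    return fragments


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments):
    """Greedy assembly: repeatedly merge the pair with the longest overlap."""
    unique = list(dict.fromkeys(fragments))
    # Drop fragments that sit wholly inside another fragment.
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(frags) > 1:
        n, a, b = max(
            ((overlap(a, b), a, b) for a in frags for b in frags if a is not b),
            key=lambda t: t[0],
        )
        if n == 0:
            break  # nothing overlaps; leave the pieces separate
        frags.remove(a)
        frags.remove(b)
        frags.append(a + b[n:])
    return frags


pieces = shatter(QUOTE)
random.Random(0).shuffle(pieces)  # fragment order carries no information
print(assemble(pieces))
```

Cutting each copy at different offsets is what makes the reconstruction possible: a break in one copy is bridged by an intact stretch in another.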
Placing the Code Snippets

ACTGGTGAAAA

CGTACTGGTGA
CAGGAACGTAC
CCTGGCAGCCA

TGGTGAAAACA

TCAAGATCACA

GTCAAGATCAC
TGTCAAGATCA

TTTGGGCTGGC
TTTTGGGCTGG

………GACCTGGCAGCCAGGAACGTACTGGTGAAAACACCGCAGCATGTCAACATCACAGATTTTGGGCTGGCCAAACT………

Reference genome

Candidate variant
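
A minimal sketch of the placement step, using the reads and the reference excerpt from this slide: each read is slid along the reference, placed where it disagrees least, and any disagreement seen in several reads is reported as a candidate variant. Production aligners use indexed, gap-aware methods instead of this brute-force scan; the function names and the `min_support` threshold are hypothetical, and positions are 0-based offsets into the excerpt, not genomic coordinates.

```python
from collections import Counter

REFERENCE = ("GACCTGGCAGCCAGGAACGTACTGGTGAAAACACCGCAGCATGTCAACATCACAG"
             "ATTTTGGGCTGGCCAAACT")
READS = [
    "ACTGGTGAAAA", "CGTACTGGTGA", "CAGGAACGTAC", "CCTGGCAGCCA", "TGGTGAAAACA",
    "TCAAGATCACA", "GTCAAGATCAC", "TGTCAAGATCA", "TTTGGGCTGGC", "TTTTGGGCTGG",
]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def place_read(read, reference):
    """Return (offset, mismatches) for the best ungapped placement of a read."""
    offsets = range(len(reference) - len(read) + 1)
    best = min(offsets, key=lambda i: hamming(read, reference[i:i + len(read)]))
    return best, hamming(read, reference[best:best + len(read)])


def candidate_variants(reads, reference, min_support=2):
    """List (position, ref_base, alt_base, n_reads) seen in >= min_support reads."""
    support = Counter()
    for read in reads:
        offset, _ = place_read(read, reference)
        for i, base in enumerate(read):
            ref_base = reference[offset + i]
            if base != ref_base:
                support[(offset + i, ref_base, base)] += 1
    return sorted(
        (pos, ref, alt, n) for (pos, ref, alt), n in support.items()
        if n >= min_support
    )


print(candidate_variants(READS, REFERENCE))
```

Requiring support from more than one read is a crude guard against calling a single sequencing error as a mutation.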
Placing the Code Snippets
• Encodes a protein change in the EGFR gene
• This particular coding change (‘L858R’) renders the tumour sensitive to targeted therapy (Afatinib)
• ‘Personalised medicine’

D Gonzalez de Castro, P A Clarke, B Al-Lazikani and P Workman (2013) Personalized Cancer Medicine: Molecular Diagnostics, Predictive Biomarkers, and Drug Resistance. Clinical Pharmacology & Therapeutics 93(3), 252–259
Identifying Bugs, confidently
• “What are we missing?”
The influence of
sample heterogeneity,
purity and read depth.
• ‘False’ mutations:
sequencing errors &
inaccurate alignment.
• Mutation calling is a
work in progress.
• Crowdsourcing a
solution.

Cibulskis et al (2013) Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnology 31, 213–219
Roberts et al (2013) A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics 2013;29:2223-2230
Caldas C (2012) Cancer sequencing unravels clonal evolution. Nature Biotechnology 30, 408–410
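
How purity and read depth affect "what are we missing?" can be illustrated with a simple binomial model. This is a sketch under strong assumptions: the mutation is clonal and heterozygous at a diploid locus, so the expected fraction of mutant reads is purity / 2, sequencing is error-free, and a call needs at least `min_alt_reads` supporting reads. The function name and the threshold are hypothetical, and the callers cited on this slide use far richer models.

```python
from math import comb


def detection_probability(depth, purity, min_alt_reads=3):
    """P(at least `min_alt_reads` of `depth` reads carry the mutation)."""
    vaf = purity / 2  # assumed: clonal, heterozygous, diploid locus
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_too_few


for purity in (1.0, 0.5, 0.2):
    for depth in (30, 100):
        print(purity, depth, round(detection_probability(depth, purity), 3))
```

Under this model, lowering purity shrinks the mutant read fraction, so a given depth finds fewer true mutations; subclonal mutations in a heterogeneous sample behave like a further drop in purity.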
Interpreting the Code
Distinguishing between mutations that confer a selective advantage and those
that are selectively neutral.
• Recurrent mutations
• Account for variable background
mutation rates

• Comparative genomics

Imielinski M et al. (2012) Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell. 2012 Sep 14;150(6):1107-20.
Wang L & Wheeler DA. (2014) Genomic sequencing for cancer diagnosis and therapy. Annu Rev Med. Jan 14;65:33-48.
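
One way to see why the background mutation rate matters when judging recurrence: ask how surprising it is to find a gene mutated in `n_mutated` of `n_samples` tumours if every base mutated independently at the background rate. This is a sketch, not the method of the cited studies; it assumes a single uniform rate per gene and independent samples, and all parameter names and example values are hypothetical.

```python
from math import comb


def recurrence_p_value(n_mutated, n_samples, gene_length, background_rate):
    """P(gene is hit in >= n_mutated samples by background mutation alone)."""
    # Chance that one tumour carries at least one background mutation in the gene.
    p_hit = 1 - (1 - background_rate) ** gene_length
    return sum(
        comb(n_samples, k) * p_hit**k * (1 - p_hit) ** (n_samples - k)
        for k in range(n_mutated, n_samples + 1)
    )


# Same observation, two assumed background rates (mutations per base per tumour).
for rate in (1e-6, 1e-4):
    print(rate, recurrence_p_value(10, 100, 1000, rate))
```

The same count of recurrent mutations is striking in a gene with a low background rate and unremarkable in a long or highly mutable one, which is why the rate has to be modelled per gene.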
Thank you for your attention
Particular thanks go to:
BABS, Prof. Swanton
& above all to the patients

Cracking cancers code feb 2014
