3. “With enough data and the ability to crunch it, virtually any challenge facing humanity today can be solved.”
Eric Schmidt et al., How Google Works, 2014
4. Prof Atul Butte (Genomic Medicine, Stanford) at TEDMED 2012
“Who needs the scientific method? Vast stores of available data (…) are simply waiting for the right questions.”
5. Chris Anderson, WIRED.com, 2008
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
‘Petabytes allow us to say: “Correlation is enough.” (…)
We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.’
14. [The ENCODE] data enabled us to assign biochemical functions for 80% of the genome.
Function = showed up in the data
15. Graur et al., Genome Biol Evol (2013)
“This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved (…) is under 10%.”
Function = evolutionarily conserved
17. Always the same science.
Always the same questions.
Big Data is a technical challenge, not a conceptual one.
18. Systems Genetics of Cancer
Genetic variation
• In people
• In tumours
• In clones
Phenotypic variation
• Tumour subtypes
• Aggressiveness
• Survival
27. Mixture model
1. How many clones are there in the sample?
Data
Parameters
• Nr of clones
• Size of clone
• Variability inside clone
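The mixture-model question on this slide can be illustrated with a minimal sketch. It is not the BitPhylogeny model and not necessarily the method used in the talk: it assumes a plain one-dimensional Gaussian mixture fitted by EM, with the number of components chosen by BIC. The data are simulated; the three "clone" means (0.1, 0.3, 0.5), the spread (0.02) and the helper names `fit_mixture` and `bic` are all hypothetical. In the fit, the number of components plays the role of the number of clones, the mixture weights the clone sizes, and the standard deviations the variability inside each clone.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture(xs, k, n_iter=100, min_sd=0.005):
    """Fit a k-component 1D Gaussian mixture by EM.

    Returns (weights, means, sds, log-likelihood).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Start the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [max(statistics.pstdev(xs) / k, min_sd)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update size, location and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            # Floor the spread so a component cannot collapse onto one point.
            sds[j] = max(math.sqrt(var), min_sd)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, means, sds)) or 1e-300)
        for x in xs
    )
    return weights, means, sds, loglik


def bic(loglik, k, n):
    """Bayesian information criterion; lower is better."""
    n_params = 3 * k - 1  # k means, k spreads, k - 1 free weights
    return n_params * math.log(n) - 2.0 * loglik


# Simulated data (an assumption for illustration): three clones of equal size.
rng = random.Random(0)
true_means = [0.1, 0.3, 0.5]
data = [rng.gauss(m, 0.02) for m in true_means for _ in range(200)]

fits = {k: fit_mixture(data, k) for k in range(1, 5)}
bics = {k: bic(fits[k][3], k, len(data)) for k in fits}
best_k = min(bics, key=bics.get)
print(best_k, [round(m, 3) for m in sorted(fits[best_k][1])])
```

BIC is only one way to answer "how many clones?". A nonparametric Bayesian prior over the number of components is another, and avoids refitting the model for every candidate value.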
28. Graphical Model behind BitPhylogeny
Phylogeny prior
Prior on local parameters
Likelihood
The BitPhylogeny model
29. FUTURE
• ICGC pan-cancer analysis:
2500 genomes => 2500 trees
• Characterize the 2500 trees
• Correlate trees with clinical data
• Infer oncogenetic progression models across the 2500 trees
37. FUTURE
• ICGC pan-cancer analysis: 2500 genomes
• Collect tissues for as many samples as possible
• Correlate tissue architecture with clinical data
• Correlate tissue architecture with evolution