2. Goals
• Provide an overview of the major scientific
events, trends and publications in
translational bioinformatics
• Create a “snapshot” of what seems to be
important in March, 2010 for the amusement
of future generations.
• Marvel at the progress made and the
opportunities ahead.
3. Process
1. Think about what has had early impact
2. Think about sources to trust
3. Solicit advice from colleagues
4. Surf online resources
5. Select papers to highlight in ~2 slides and
some to highlight in < 1 slide.
4. Caveats
• Considered 2009 to present
• Focused on human biology and clinical
implications: molecules, clinical data, informatics.
• Considered both data sources and informatics
methods (and combination)
• Tried to avoid simply following crowd mentality.
5. Final list
• ~70 semi-finalist papers
• 24 presented here (briefly!)
• This talk and semi-finalist bibliography will be
made available on the conference website.
6. Thanks!
• George Hripcsak
• Brian Athey
• Peter Tarczy-Hornoch
• Alain Laederach
• Soumya Raychaudhuri
• Yves Lussier
• Dan Masys
• Emidio Capriotti
• Andrea Califano
• Liping Wei
• Atul Butte
• Nick Tatonetti
• Joel Dudley
• Gill Omenn
8. “Geographic dependence, surveillance, and origins of
the 2009 Influenza A (H1N1) Virus” (Trifonov et al,
NEJM)
• Goal: understand the origin and recent history of
new strains from viral genome sequences.
• Method: Sequence analysis and comparison of eight
key influenza genes in current and historical
samples.
• Result: Evolutionary map of recombination events
leading to current H1N1 variant.
• Conclusion: Aggressive sampling of multiple species
may allow us to anticipate novel flu in the future.
12. “Exome sequencing identifies the cause of a
mendelian disorder” (Ng et al, Nat. Gen.)
• Goal: find the cause of Miller syndrome.
• Miller syndrome = facial and limb anomalies.
• Method: exon-only sequencing of 4 affected
individuals in three kindreds.
• Result: DHODH gene (enzyme for pyrimidine
synthesis) mutations in these and 3 other families.
15. “Analysis of genetic inheritance in a family quartet
by whole-genome sequencing” (Roach et al, Science
Express)
• Goal: understand relationship between rare disease
and corresponding genetic changes.
• Miller syndrome & ciliary dyskinesia = both recessive.
• Method: whole genome sequencing of parents and 2
affected sibs.
• Result: 4 genes identified with SNPs explaining
pattern of inheritance (CES1, DHODH, DNAH5,
KIAA0556)
17. “Whole-genome sequencing in a patient with
Charcot-Marie-Tooth Neuropathy” (Lupski et al,
NEJM)
• Goal: understand relationship between rare disease
and corresponding genetic changes.
• CMT neuropathy = recessive, demyelinating disease.
• Method: whole genome sequencing of (big!) family
(parents, 4 affected sibs, 4 unaffected sibs). Negative
for previous CMT common screens.
• Result: causative alleles in gene SH3TC2, het
20. “Autoimmune disease classification by inverse
association with SNP alleles” (Sirota et al, PLoS
Genetics)
• Goal: Compare genetic variation profiles across six
autoimmune diseases.
• MS, AS, ATD, RA, CD, T1D + 5 non-autoimmune
• Method: Cluster diseases based on allele
occurrences from GWAS studies.
• Result: RA/AS cluster separates from MS/ATD
cluster with somewhat “opposite” allele profile. May
yield information about disease-specific differences.
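The clustering idea on this slide can be sketched in a few lines. This is a toy illustration only: the disease abbreviations come from the slide, but the signed allele profiles are invented, and Pearson correlation is an assumed similarity measure, not necessarily the one used by Sirota et al.

```python
from math import sqrt

# Hypothetical signed allele profiles, one entry per SNP:
# +1 = allele associated with higher risk, -1 = lower risk, 0 = no association.
# The values are invented for illustration and are not taken from the paper.
PROFILES = {
    "RA":  [1, 1, -1, 0, 1, -1],
    "AS":  [1, 1, -1, 1, 0, -1],
    "MS":  [-1, -1, 1, 0, -1, 1],
    "ATD": [-1, 0, 1, -1, -1, 1],
}


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def closest_partner(name, profiles):
    """Return the other disease whose profile correlates best with `name`."""
    others = (d for d in profiles if d != name)
    return max(others, key=lambda d: pearson(profiles[name], profiles[d]))


for disease in PROFILES:
    print(disease, "->", closest_partner(disease, PROFILES))
```

A negative correlation between two profiles is the toy analogue of the “opposite” allele pattern: the same SNPs are involved, but with the risk direction flipped.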
22. “Identifying relationships among genomic disease
regions: predicting genes at pathogenic SNP
associations and rare deletions” (Raychaudhuri et al,
PLoS Genetics)
• Goal: Map associations to potential mechanisms
using literature mining.
• Method: Test associated disease regions with
medical literature, looking for connections =
pathways
• Result: Able to filter candidate mutations in Crohn’s
disease and schizophrenia, and map them to subset
of mutations for which there is a biological pathway
related to the disease.
25. “Rare variants create synthetic genome-wide
associations” (Dickson et al, PLoS Biology)
• Goal: understand the impact of rare variants on
common SNP association studies.
• Method: Simulation of effect of LD between rare
SNPs and common ones
• Result: Correlations are not just possible but inevitable,
so GWAS may work for wrong reason. F/U
sequencing is key.
• Many positive GWAS studies, especially with
differential results in geographically dispersed
populations, may be affected by this phenomenon.
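A minimal simulation sketch of a synthetic association, under stated assumptions. Every number here (allele frequencies, risks, sample size) is invented, individuals are treated as carrying a single haplotype, and this is not the simulation design of Dickson et al. The point is only that a common allele with no effect of its own can look associated with disease when a rare causal variant sits on its background.

```python
import random


def simulate_synthetic_association(n_people, seed=0):
    """Return disease rates for carriers and non-carriers of a common allele.

    All parameters are invented for illustration:
    - the common allele has frequency 0.4 and no effect of its own;
    - a rare causal variant occurs only on the common-allele background
      (5% of carriers), which puts the two in linkage disequilibrium;
    - disease risk is 5% at baseline and 50% with the rare variant.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n_people):
        has_common = rng.random() < 0.4
        has_rare = has_common and rng.random() < 0.05
        risk = 0.5 if has_rare else 0.05
        totals[has_common] += 1
        cases[has_common] += rng.random() < risk
    return cases[True] / totals[True], cases[False] / totals[False]


rate_with, rate_without = simulate_synthetic_association(100_000, seed=1)
print(f"disease rate with common allele:    {rate_with:.3f}")
print(f"disease rate without common allele: {rate_without:.3f}")
```

Carriers of the common allele show a higher disease rate even though only the rare variant changes risk, which is why follow-up sequencing around a GWAS hit matters.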
27. “In Silico functional profiling of human disease-
associated and polymorphic amino acid
substitutions” (Mort et al, Human Mutation)
• Goal: Understand how variation in proteins leads to
complex disease phenotypes.
• Method: Compare amino acid substitutions
associated with disease and neutral, looking for
differences in protein chemical features.
• Results: Associated UMLS disease areas with
different sets of predictive protein features
• Conclusion: The types of proteins used in different
disease areas are sensitive to different types of
mutations.
30. “Exploring the human genome with functional
maps” (Huttenhower et al, Genome Research)
• Goal: Systems-level understanding of genetic
contributions to human phenotypes.
• Method: Bayesian integration of 30K experiments on
25K genes. Creation of data-driven functional maps
weighted by reliability for individual functional
categories.
• Result: 200 context-specific interaction networks.
Experimentally validated 5 novel predictions for
genes involved in macroautophagy.
33. “Genome-wide identification of post-translational
modulators of transcription factor activity in human
B cells” (Wang et al, Nat. Biotech.)
• Goal: Understand TF regulation via proteins.
• Method: Mutual information analysis to identify
protein modulators of TF function on chosen
targets.
• Result: Able to detect molecules that transduce
signal from TF to target either as positive modulator
(create correlation) or negative modulator (destroy
correlation). Successful application to MYC to find
~50 significant modulators, experimentally verified.
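The modulator idea can be illustrated with a small conditional mutual information sketch. It assumes expression has already been discretized to 0/1, and the samples are invented; the published method works on continuous expression data with its own estimators and significance tests, so this is only a toy version of the principle.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


def modulation_score(samples):
    """MI(TF; target) when the modulator is high minus MI when it is low.

    `samples` is a list of (modulator, tf, target) tuples with 0/1 values.
    A positive score suggests the modulator creates TF-target correlation;
    a negative score suggests it destroys it.
    """
    high = [(tf, tg) for m, tf, tg in samples if m == 1]
    low = [(tf, tg) for m, tf, tg in samples if m == 0]
    mi_high = mutual_information(*zip(*high))
    mi_low = mutual_information(*zip(*low))
    return mi_high - mi_low


# Hypothetical samples: TF and target move together only when the modulator is on.
SAMPLES = [
    (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 1),
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
]
print(modulation_score(SAMPLES))
```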
36. “MiR-204 suppresses tumor invasion by regulating
networks of cell adhesion and extracellular matrix
remodeling” (Lee et al, PLoS Comp. Bio, in press)
• Goal: Identify microRNA regulators of cancer and
opportunities for new therapies
• Method: Integrate expression, genetics, and cancer
molecular phenotypes.
• Result: 18 validated targets of miR-204,
experimental evidence showing that miR-204
replacement reduces tumor aggressiveness.
• Conclusion: Integrated analysis of miRNA with
experimental validation yields new cancer leads.
38. “Drug discovery using chemical systems biology:
repositioning the safe medicine Comtan to treat
multidrug and extensively drug resistant
Tuberculosis” (Kinnings et al, PLoS Comp. Bio)
• Goal: Identify off-targets of major pharmaceuticals
to find new uses for old drugs.
• Method: Use protein structure to characterize
binding site of drug, and then look for cryptic similar
sites in other proteins, including TB proteome.
• Result: Comtan (for Parkinson’s) binds InhA in TB,
& inhibits TB growth--they also found evidence that
Parkinson’s patients improve with TB treatment!
40. “Generating genome-scale candidate gene lists for
pharmacogenomics” (Hansen et al, Clin. Pharm. &
Ther.)
• Goal: Identify genes likely to modulate drug
response.
• Method: Associate drugs with network
representation of genetic interactions, rank genes
based on likelihood of interacting with drugs.
• Result: AUC of 82% on independent test set. Novel
gene candidates for warfarin, gefitinib, carboplatin
and gemcitabine.
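For readers unfamiliar with the AUC figure quoted above, here is a minimal sketch of how such a number is computed for a ranked gene list. The scores are invented; only the definition (the fraction of positive/negative pairs ranked in the right order) is standard.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    item receives the higher score; ties count as half.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical ranking scores for genes that do / do not modulate a drug response.
known_pgx_genes = [0.9, 0.8, 0.4]
other_genes = [0.7, 0.3, 0.2]
print(round(auc(known_pgx_genes, other_genes), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.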
42. “Network-based elucidation of human disease
similarities reveals common functional modules
enriched for pluripotent drug targets” (Suthram et
al, PLoS Comp. Bio.)
• Goal: Create molecular relationships between
diseases, use this to find new drug opportunities.
• Method: Define 4600 co-expressed functional
modules, and cluster diseases using these.
• Result: A novel disease clustering, and functional
modules including known drug targets that
participate in many diseases.
45. “Predicting new molecular targets for known
drugs” (Keiser et al, Nature)
• Goal: Find new uses for old drugs
• Method: Represent drug targets by the company
they keep: the drugs that bind them. Compare the
list of drugs for similarity. Targets with similar lists
may have cross-reactivity. Find drugs that are most
similar with a new list. Careful statistics.
• Result: An off-target network that relates drugs to
new targets. 5 potent new associations, e.g. Prozac
as beta-blocker, Vadilex as serotonin blocker.
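A rough sketch of comparing two targets by the drugs that bind them. Ligands are represented here as sets of made-up fingerprint feature IDs, the 0.5 threshold is an arbitrary assumption, and the “careful statistics” step (turning the raw score into a significance estimate) is deliberately left out.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    union = features_a | features_b
    if not union:
        return 0.0
    return len(features_a & features_b) / len(union)


def raw_set_score(ligands_a, ligands_b, threshold=0.5):
    """Sum of pairwise ligand similarities at or above `threshold`.

    `ligands_a` and `ligands_b` are the ligand sets of two targets, each
    ligand being a set of feature IDs. The threshold is an assumed value.
    """
    score = 0.0
    for a in ligands_a:
        for b in ligands_b:
            similarity = tanimoto(a, b)
            if similarity >= threshold:
                score += similarity
    return score


# Hypothetical ligand fingerprints for two targets.
target_a_ligands = [{1, 2, 3, 4}, {1, 2, 3, 5}]
target_b_ligands = [{1, 2, 3, 6}, {7, 8, 9}]
print(raw_set_score(target_a_ligands, target_b_ligands))
```

A high raw score between the ligand set of a known target and that of a second target flags possible cross-reactivity that still needs statistical and experimental follow-up.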
48. “Ontology-driven indexing of public datasets for
translational bioinformatics” (Shah et al, BMC
Bioinf.)
• Goal: develop infrastructure for applying controlled
descriptors to datasets.
• Method: Annotate and index multiple biomedical
data resources with UMLS concepts, create index,
and federate these together.
• Result: Integration of multiple data sources with
controlled vocabulary allowing powerful searches
across data sets.
51. “A recent advance in the automatic indexing of the
biomedical literature” (Neveol et al, J. Biomed. Inf.)
• Goal: Move towards automated indexing of Medline
articles
• Method: Combine methods of NLP & machine
learning to assign heading/subheading pairs.
• Results: Best combination 48% precision, 30%
recall. Integrated into MTI tool for NLM curators.
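To make the precision and recall figures concrete, the sketch below scores a set of predicted heading/subheading pairs against a curator's set. The example pairs are invented, and the F1 value computed from the slide's rounded 48% and 30% is a derived illustration, not a number reported on the slide.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted heading/subheading pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical indexing of one article: (heading, subheading) pairs.
gold_pairs = {
    ("Neoplasms", "genetics"),
    ("MicroRNAs", "metabolism"),
    ("Cell Adhesion", "physiology"),
}
predicted_pairs = {
    ("Neoplasms", "genetics"),
    ("Neoplasms", "pathology"),
}

p, r = precision_recall(predicted_pairs, gold_pairs)
print(p, r, f1(p, r))
print(round(f1(0.48, 0.30), 3))  # combining the slide's rounded figures
```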
54. “Cloud computing: a new business paradigm for
biomedical informatics” (Rosenthal et al, J. Biomed.
Inf.)
• Goal: Examine fit of BMI to cloud computing.
• Method: Focus on specific component technologies
used by the field in different types of tasks.
• Result: Clouds require careful analysis and
attention to the migration path from current
infrastructure to future.
56. “Lowering industry firewalls: pre-competitive
informatics initiatives in drug discovery” (Barnes et
al, Nat. Rev. Drug. Disc.)
• There are substantial challenges facing
pharmaceutical industry (failed new drugs, slow
pipeline).
• Opportunity for pre-competitive collaboration and
engagement with public domain.
• Propose new areas for collaboration, and highlight
cultural shifts that will be needed.
57. PROPOSED INITIATIVES
• Disease knowledge: Curating gene-disease
associations, shared pathways, imaging repositories
• Target pharmacology: redefine druggability, catalog of
targets/phenotypes, share data on known molecules
• Drug safety: adverse event signatures, Pgx data (!),
ADME models
• Knowledge management: literature mining, patent
mining, data standards
• Pharmaceutical infrastructure: gene indices/
nomenclature, robust web service standards, data
storage cooperatives.
59. “An agenda for personalized medicine” (Ng et al,
Nature)
• Goal: Compare direct-to-consumer (DTC)
services.
• Method: Compare analyses from two DTC
companies for 13 diseases on 5 individuals.
• Result: Raw data very accurate. Interpretations vary
significantly. For 7 diseases, 50% or less of
predictions agree.
• Conclusion: Focus on high risk, strong effect, direct
measures. Focus on PGx. Monitor outcomes.
62. “Back to the future: why randomized controlled
trials cannot be the answer to pharmacogenomics
and personalized medicine” (Frueh,
Pharmacogenomics)
• Question: RCTs are the gold standard, shouldn’t
they be required for personalized medicine
interventions?
• Answer: No. Personalized medicine is, by definition,
not based on “averages”; better to use case-control,
retrospective and other study designs.
• Conclusion: Insistence on RCT level evidence will
unnecessarily hinder the roll out of personalized
medicine.
63. “Computing has changed biology--biology education
must catch up” (Pevzner et al, Science)
• Education Forum piece
• Computation is now essential to biology
• Undergraduate biology education has not changed
• New course proposed for all biology undergrads:
“Algorithmic, mathematical, and statistical concepts
in Biology”
64. “Distilling free-form natural laws from experimental
data” (Schmidt & Lipson, Science)
• Goal: Define algorithmically what makes a
correlation in observed data important and
insightful.
• Method: Propose a principle for identifying
nontriviality: candidate equations should predict
connections between dynamics of subcomponents
of the system.
• Result: Example in undergraduate physics,
recovered well-known physical laws (Hamiltonian,
Lagrange, Equation of Motion)
67. “A statistical dynamics approach to the study of human
health data: resolving population scale diurnal variation
in laboratory data” (Albers & Hripcsak, Physics
Letters A)
• Goal: Apply statistical physics and information theory
to clinical chemistry measurements.
• Method: 2.5 million data points over 20 years, look at
time delay mutual information. Focus on creatinine.
• Result: Creatinine is initially measured twice a day at
Columbia, and then every morning. Yesterday’s
measurement predicts today’s.
• Conclusion: Sophisticated dynamic modeling methods
(that physicists use) are applicable to biological systems.
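Time-delay mutual information can be sketched as the mutual information between a series and a lagged copy of itself. The sketch assumes the measurements have been discretized and sampled at regular intervals; the toy series below is invented and has a built-in period of four steps, so it says nothing about real creatinine data.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


def time_delay_mi(series, lag):
    """Mutual information between series[t] and series[t + lag]."""
    if not 0 < lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return mutual_information(series[:-lag], series[lag:])


# Hypothetical discretized lab values (0 = low, 1 = high), period of 4 steps.
toy_series = [0, 0, 1, 1] * 25

for lag in (1, 2, 3, 4):
    print(lag, round(time_delay_mi(toy_series, lag), 3))
```

Lags at which the information is high are the delays at which an earlier measurement is informative about a later one, which is how a statement like “yesterday's measurement predicts today's” shows up in this kind of analysis.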
69. 2008 Crystal ball...
Sequencing makes a comeback (watch out
microarrays....)
Translational science projects will create
astounding data sets (hopefully available) to
catalyze research
GWAS will continue to proliferate
Consumer-oriented genetics will create demand
for online resources for interpretation
Difficult decisions about when/how to bring
new molecular diagnostics to practice.
71. 2009 Crystal ball...
Focus on mechanism in interpreting genetic
associations
More sophisticated mechanisms to find signal in
GWAS, including data integration
Cellular dynamics of expression, metabolites,
proteins
Multiple human & cancer genome sequences
Consumer sequencing (vs. genotyping)
73. 2010 Crystal ball...
Clinical records will be linked to genomics to make
discoveries.
More emphasis on drugs and ancestry in DTC
companies
Whole genome sequencing for a cohort with a
common disease (cancer already here?)
Consumer sequencing (vs. genotyping)
Semantics in literature mining for knowledge discovery
Cloud computing will contribute to one biomedical
discovery.