Translational
Bioinformatics 2012:
The Year in Review
Russ B. Altman, MD, PhD
Stanford University
Goals
• Provide an overview of the scientific trends
and publications in translational bioinformatics
• Create a “snapshot” of what seems to be
important in March, 2012 for the amusement
of future generations.
• Marvel at the progress made and the
opportunities ahead.
Process
1. Follow literature through the year
2. Solicit nominations from colleagues
3. Search key journals
4. Stress out a bit.
5. Select papers to highlight in ~2-3 slides
Caveats
• Translational bioinformatics = informatics methods
that link biological entities (genes, proteins, small
molecules) to clinical entities (diseases, symptoms,
drugs)--or vice versa.
• Considered last ~14 months (to Jan 2011)
• Focused on human biology and clinical
implications: molecules, clinical data, informatics.
• NOTE: Amazing biological papers with
straightforward informatics generally not included
(genome sequencing for rare diseases, disease
analyses with high-throughput data).
Final list
• ~100 finalist papers (will make list available)
• 25 presented here (briefly!). 14 “shout outs”
• Apologies to many I missed. Mistakes are mine.
• This talk and semi-finalist bibliography will be made
available on the conference website and my blog on
rbaltman.wordpress.com
• TOPICS: systems medicine, finding & defining
phenotypes, biomarkers, genomic infrastructure,
drug adverse events & interactions, drug repurposing
Thanks!
• Bruce Aronow
• Atul Butte
• Phil Bourne
• Andrea Califano
• Lisa Cannon-Albright
• Josh Denny
• Joel Dudley
• Larry Fagan
• Guy Fernald
• Carol Friedman
• Yael Garten
• Mark Gerstein
• Maureen Hillenmeyer
• George Hripcsak
• Larry Hunter
• Peter Kang
• Rachel Karchin
• Konrad Karczewski
• Hiroaki Kitano
• Ron Kostoff
• Alain Laederach
• Jennifer Lahti
• Tianyun Liu
• Yves Lussier
• Dan Masys
• Alex Morgan
• Stephen Montgomery
• Peter O’Donnell
• Lucila Ohno-Machado
• Raul Rabadan
• Predrag Radivojac
• Soumya Raychaudhuri
• Neil Sarkar
• Nigam Shah
• Ted Shortliffe
• Mike Snyder
• Nick Tatonetti
• Peter Tarczy-Hornoch
• Olga Troyanskaya
• Alfonso Valencia
• Liping Wei
• Jeff Williamson
• Jonathan Wren
• Hong Yu
• Qunying Xie
“ISCB public policy statement on open access to
scientific and technical research literature” (Lathrop
et al, PLoS Comp Bio)
• Goal: Influence policy by supporting open access to
scientific literature (and block attempts by for-profit
publishers to roll back open access rules)
• Conclusion: (1) essential to have access for mining,
(2) existing models show success, (3) will enable
novel tools, (4) supplementary data should be freely
available, (5) cost recovery is necessary, (6) details
will matter but should not distract, (7) neutral on
funding policy, (8) cost is small compared to
alternative.
Systems medicine
“Three-dimensional reconstruction of protein
networks provides insight into human genetic
disease” (Wang et al, Nat. Biotech.)
• Goal: Understand molecular mechanisms underlying
human disease.
• Method: Create interactome of genes, mutations
and associated disorders, in context of protein-
protein interactions.
• Result: In-frame mutations occur at protein
interfaces & disease specificity depends on location
within an interface.
• Conclusion: Predict 292 genes for 694 diseases.
Nodes = proteins; edges = interactions; colored nodes = disease-associated proteins
“Protein networks as logic functions in development
and cancer” (Dutkowski et al, PLoS Comp Bio)
• Goal: Understand how protein modules combine
protein functions to create output signals.
• Method: Network-Guided Forests to identify
predictive modules and logic functions that connect
module to component genes.
• Result: Modules implement complex logic, not
simple linear models.
• Conclusion: Genetic effects of cancer genes are
not additive, but engage in nontrivial combinatorial
logic.
“Reverse engineering of TLX oncogenic
transcriptional networks identifies RUNX1 as a
tumor suppressor in T-ALL” (Gatta et al, Nat.
Medicine)
• Goal: Use transcriptional data to study the
pathogenesis of T-cell ALL, understand regulation
• Method: Network structure analysis of relationships
gathered from expression analysis.
• Result: TLX1 and TLX3 are key regulators. RUNX1 is a
tumor suppressor and shows high rates of loss-of-
function mutations in T-ALL subjects.
• Conclusion: Network analyses can identify key
cancer players.
“Computational modeling of pancreatic cancer
reveals kinetics of metastasis suggesting optimum
treatment strategies” (Haeno et al, Cell)
• Goal: Math model of pancreatic cancer progression
and impact of different drug dosing regimens.
• Method: Differential equation modeling of cell
growth, death and effects of drugs.
• Result: Therapies that reduce growth rate of cells
early look superior to upfront resection strategies.
• Conclusion: Math modeling of cancer progression
can yield insight into detailed risks/benefits of
different treatment strategies.
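To make the idea of differential-equation modeling concrete, here is a toy sketch, not the model from Haeno et al. It assumes a single cell population growing as dN/dt = (birth - death) * N, with a hypothetical drug that reduces the birth rate by a fixed fraction from a chosen start day. All parameter names and values are illustrative assumptions.

```python
def simulate_cells(days, n0, birth, death, drug_start=None, inhibition=0.0, dt=0.01):
    """Euler integration of dN/dt = (b(t) - death) * N.

    b(t) equals `birth` before `drug_start` and `birth * (1 - inhibition)`
    afterwards. All rates are per day; every value here is hypothetical.
    """
    n = float(n0)
    steps = int(round(days / dt))
    for i in range(steps):
        t = i * dt
        on_drug = drug_start is not None and t >= drug_start
        b = birth * (1.0 - inhibition) if on_drug else birth
        n += (b - death) * n * dt  # one explicit Euler step
    return n


# Compare no treatment, early treatment and late treatment (toy numbers).
untreated = simulate_cells(100, 1e6, birth=0.10, death=0.05)
early = simulate_cells(100, 1e6, birth=0.10, death=0.05, drug_start=10, inhibition=0.5)
late = simulate_cells(100, 1e6, birth=0.10, death=0.05, drug_start=60, inhibition=0.5)
print(f"untreated={untreated:.3g} early={early:.3g} late={late:.3g}")
```

Even in this toy setting, starting growth inhibition earlier leaves a smaller population at day 100, which is the kind of comparison such models are built to make.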
“Extracting a cellular hierarchy from high-
dimensional cytometry data with SPADE” (Qiu et al,
Nature Biotech)
• Goal: Understand cellular heterogeneity in single cell
measurements of hematopoietic cells.
• Method: Spanning-tree progression analysis of
density-normalized events (SPADE) to subcluster and
connect cell groupings.
• Result: Found hierarchies of related phenotypes
recapitulating hematopoiesis.
• Conclusion: Cytometry data allows separation,
characterization and definition of relationships
between single cells.
“Computational design of proteins targeting the
conserved stem region of influenza
hemagglutinin” (Fleishman et al, Science)
• Goal: Design proteins for diagnostic and therapeutic
purposes.
• Method: Use protein prediction algorithms (Rosetta-
derived) to design amino acid surfaces that bind
target proteins.
• Result: Developed two proteins that bind a
conserved patch on 1918 H1N1 pandemic virus.
• Conclusion: Designed proteins using knowledge-
based methods can achieve high binding specificity.
Finding & Defining
Phenotypes
“Using electronic patient records to discover
disease correlations and stratify patient
cohorts” (Roque et al, PLoS Comp Bio)
• Goal: Mine phenotype descriptions from EMR and
connect to genetic networks.
• Method: Extract free-text to classify patients and
disease co-occurrence. Use OMIM to map to
genetics.
• Result: Large set of disease correlations, associated
with genes.
• Conclusion: EMR can identify new phenotype
“syndromes” with genetic hooks.
“Enabling enrichment analysis with the human
disease ontology” (LePendu et al, J Biomed Inf)
• Goal: Enable enrichment analyses with ontologies
that do not have manually curated annotation sets.
• Method: Use GO annotations as filter to associate
diseases to genes in 44,000 PubMed abstracts.
• Result: Can associate disease terms with 30% of the
genome, and reproduce known associations of aging
genes.
• Conclusion: Extension of enrichment analysis to
other ontologies can use GO curation as a “seed.”
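As background on enrichment analysis in general, a common formulation is a one-sided hypergeometric test: given a gene list, how surprising is its overlap with the genes annotated to a term? The sketch below is a generic illustration under that assumption, not the pipeline from LePendu et al. The function name and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(genome_size, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    genome_size: number of genes in the background
    annotated:   background genes annotated with the term
    selected:    size of the gene list being tested
    overlap:     genes in the list that carry the annotation
    """
    upper = min(annotated, selected)
    total = comb(genome_size, selected)
    tail = sum(
        comb(annotated, k) * comb(genome_size - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 40 of 200 selected genes carry a term that annotates
# 1,000 of 20,000 background genes (10 would be expected by chance).
print(enrichment_p_value(20000, 1000, 200, 40))
```

Exact integer arithmetic with `math.comb` avoids the overflow and underflow problems of multiplying many small probabilities.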
“Detecting novel associations in large data
sets” (Reshef et al, Science)
• Goal: Find interesting (nonlinear) relationships
between pairs of variables in large data sets.
• Method: Maximal information coefficient captures
wide range of associations.
• Result: Applied to global health, gene expression,
baseball, microbiota in gut with good results.
• Conclusion: Useful tool for finding associations in
large data sets, e.g.
“Toward Precision Medicine: Building a Knowledge
Network for Biomedical Research and a New
Taxonomy of Disease” (National Academies Report)
• Goal: Explore the feasibility and need for a “new
taxonomy” (NT) of human health based on
molecular biology.
• Conclusion: (1) a NT will lead to better health
care, (2) the time is right, (3) NT should be
developed, (4) a knowledge network of disease
would enable NT, (5) new models for population-
based research will enable NT, (6) redirection of
resources could facilitate development.
Biomarkers
“Efficient replication of over 180 genetic
associations with self-reported medical data” (Tung
et al, PLoS ONE)
• Goal: Assess whether self-reported phenotypes are
adequate for discovery.
• Method: Attempt to replicate genetic associations
from 23andMe customers.
• Result: 180 (70%) of associations replicated.
• Conclusion: Self-reported phenotypes have lower
precision but still allow discovery.
“Rare de novo variants associated with autism
implicate a large functional network of genes
involved in formation and function of
synapses” (Gilman et al, Cell)
• Goal: Identify complex networks underlying
common human phenotypes.
• Method: Network based analysis of genetic
associations (NETBAG) to identify genes affected by
rare CNVs in autism.
• Result: Perturbed synaptogenesis is associated with
autism phenotype.
• Conclusion: Networks help with analysis of rare
variation.
Genomic infrastructure
“The mystery of missing heritability: genetic
interactions create phantom heritability” (Zuk et al,
PNAS)
• Goal: Understand why there is so much
unexplained variability in setting of GWAS.
• Method: Explore the role of genetic interactions
with quantitative modeling.
• Result: Assuming additive traits leads to
overestimates of heritability, and epistasis is
common.
• Conclusion: Not as much heritability is missing as is
commonly assumed.
phantom (π) vs. apparent (h2)
“Performance of mutation pathogenicity prediction
methods on missense variants” (Thusberg et al,
Human Mutation)
• Goal: Compare methods for predicting deleterious
variants in protein sequences.
• Method: 40,000 pathogenic and neutral variants
tested vs. 9 methods.
• Result: Matthews correlation coefficient (MCC) ranged from 0.19 to 0.65.
SNPs&GO and MutPred were best.
• Conclusion: General purpose predictors still with
limited capabilities.
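For readers unfamiliar with the metric above, the Matthews correlation coefficient is computed from the four cells of a binary confusion matrix and ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect prediction). A minimal sketch follows; the counts in the usage line are made up and are not data from Thusberg et al.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical predictor: 80 pathogenic variants called correctly, 20 missed,
# 70 neutral variants called correctly, 30 flagged as pathogenic in error.
print(round(matthews_cc(tp=80, tn=70, fp=30, fn=20), 3))
```

Unlike raw accuracy, this score stays informative when pathogenic and neutral variants are present in unequal numbers.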
“A probabilistic disease-gene finder for personal
genomes” (Yandell et al, Genome Res)
• Goal: Find disease-causing variants in whole genome
sequences.
• Method: Bayesian variant prioritization for coding
and non-coding variants combining several sequence
features = Variant Annotation, Analysis & Search
Tool (VAAST)
• Result: Demonstrate ability to detect key genes in
small cohorts, and common multigenic diseases.
• Conclusion: Information integration for finding rare
variants can be successful.
“Technical desiderata for the integration of genomic
data into electronic health records” (Masys et al, J.
Biomed Inf.)
• Goal: Understand how genomic data differs from
other health data in the medical record.
• Conclusion: (1) Maintain separation of primary data
and observations, (2) Support lossless compression,
(3) Link observations to lab methods, (4) Compactly
represent clinical actionability, (5) Support human
and machine-readable formats, (6) Anticipate
changes in our understanding of variation, (7)
Support both clinical care and discovery science.
“Genomics and privacy: implications of the new
reality of closed data for the field” (Greenbaum et
al, PLoS Comp Bio)
• Goal: Examine state of genomic privacy in context
of emerging privacy concerns.
• Conclusions: (1) Changing ability to interpret
genomes makes it a moving target, (2) Methods
needed to divide genome into segments for analysis,
(3) Modification of informed consent required, (4)
Cloud computing may help control access, (5)
Education challenges in analyzing personal genomes.
Drug Adverse Events &
Interactions
“Structure-based discovery of prescription drugs
that interact with the norepinephrine transporter,
NET” (Schlessinger et al, PNAS)
• Goal: Find new substrates for NET transporter,
among prescription drugs.
• Method: Model 3D structure, screen 6536 small
molecules.
• Result: 10/18 high scoring molecules inhibited NET.
• Conclusion: Virtual screening against a modeled
structure can provide valuable pharmacological info.
“Predicting adverse drug reactions using publicly
available PubChem BioAssay Data” (Pouliot et al,
Clin Pharm & Ther.)
• Goal: Develop method to predict adverse
reactions to drugs based on bioassay data.
• Method: Build regression models that relate
performance in bioassays to adverse events.
• Result: For 19 organ classes, 9 predictors
successfully predict on cross-validation.
• Conclusion: Bioassay data can be used to predict
ADRs and may shed light on mechanism.
“Predicting adverse drug events using
pharmacological network models” (Cami et al,
Science Trans Med)
• Goal: Create predictor for drug AEs based on
training data from known drug-AE relations.
• Method: Build individual regressions based on
network connectivity features, ATC codes, AE
codes, drug properties.
• Result: AUROC 87% (42% sens with 95% spec)
• Conclusion: Can use these network models to
predict AEs before drugs released.
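The two figures quoted in the result (an AUROC, and a sensitivity at a fixed specificity) are different summaries of the same ranked predictions. The sketch below shows how each can be computed from scores and 0/1 labels. It is a generic illustration with invented toy data, not the evaluation code from Cami et al.

```python
def auroc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, min_specificity):
    """Best sensitivity over all thresholds whose specificity is >= min_specificity.

    An example is predicted positive when its score is >= the threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0  # a threshold above every score gives sensitivity 0, specificity 1
    for threshold in sorted(set(scores)):
        specificity = sum(n < threshold for n in neg) / len(neg)
        sensitivity = sum(p >= threshold for p in pos) / len(pos)
        if specificity >= min_specificity:
            best = max(best, sensitivity)
    return best


# Invented scores for eight drug-AE pairs (1 = known association).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auroc(toy_scores, toy_labels))
print(sensitivity_at_specificity(toy_scores, toy_labels, 0.75))
```

A high AUROC can coexist with modest sensitivity at a strict specificity cutoff, which is why both numbers are worth reporting.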
“Detecting drug interactions from adverse-event
reports: interaction between paroxetine and
pravastatin increases blood glucose
levels” (Tatonetti et al, Clin Pharm & Ther.)
• Goal: Develop method for detecting latent signs of
drug-drug interactions.
• Method: Learn pattern for hyperglycemia on single
drugs, apply to pairs of drugs.
• Result: Paroxetine & Pravastatin with strong signal,
seen in 3 EMRs, validated in mouse model.
• Conclusion: Despite 0 reports in FDA-AERS, strong
latent signal for hyperglycemia for Pa & Pr.
Drug Repurposing
“Prediction of drug combinations by integrating
molecular and pharmacological data” (Zhao et al,
PLoS Comp Bio)
• Goal: Predict effective drug combinations with
molecular & pharmacological data.
• Method: Look at approved drug combinations for
specific patterns of features (targets, indications),
and use these to predict new combinations.
• Result: 69% of predictions have literature support.
• Conclusion: This approach can help look for drug
combinations that are likely to be effective.
“PREDICT: a method for inferring novel drug
indications with application to personalized
medicine” (Gottlieb et al, Mol Sys Biol)
• Goal: Find novel uses for existing drugs.
• Method: Use drug-drug and disease-disease
similarities to create new drug-disease pairs.
• Result: Validated by assessing overlap with drugs
currently in clinical trials. Also cross-validation 0.90.
• Conclusion: Disease-specific signatures can predict
new drugs with high cross-val.
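One simple way to turn drug-drug and disease-disease similarities into a score for a candidate pair is to look for the most similar known drug-disease association and take the geometric mean of the two similarities. The sketch below illustrates that idea under stated assumptions. The similarity tables, names and values are hypothetical, and it should not be read as the published PREDICT implementation.

```python
from math import sqrt


def make_similarity(table):
    """Build a symmetric similarity lookup from {frozenset({a, b}): value}."""
    def similarity(a, b):
        if a == b:
            return 1.0
        return table.get(frozenset((a, b)), 0.0)
    return similarity


def pair_score(drug, disease, known_pairs, drug_sim, disease_sim):
    """Score a candidate (drug, disease) pair by its closest known association."""
    best = 0.0
    for known_drug, known_disease in known_pairs:
        if (known_drug, known_disease) == (drug, disease):
            continue  # a pair must not support itself
        score = sqrt(drug_sim(drug, known_drug) * disease_sim(disease, known_disease))
        best = max(best, score)
    return best


# Hypothetical toy data.
drug_sim = make_similarity({frozenset(("drugA", "drugB")): 0.81})
disease_sim = make_similarity({frozenset(("diseaseX", "diseaseY")): 0.64})
known = [("drugA", "diseaseX")]

print(pair_score("drugB", "diseaseY", known, drug_sim, disease_sim))
print(pair_score("drugB", "diseaseX", known, drug_sim, disease_sim))
```

Because the geometric mean is zero whenever either similarity is zero, a candidate is supported only when both the drug and the disease resemble a known pair.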
“Discovery and preclinical validation of drug
indications using compendia of public gene
expression data” (Sirota et al, Science Trans Med)
• Goal: Predict novel drug uses
• Method: Compare molecular signatures (expression)
for drugs and diseases and find complements.
• Result: New indications for 164 drugs, some
experimentally validated.
• Conclusion: A computational method for suggesting
drug repurposing.
(Dudley et al in same issue showed validation of
anticonvulsant topiramate for IBD using this method).
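The idea of finding "complements" can be illustrated with a small sketch: a drug whose expression signature is strongly anti-correlated with a disease signature is a candidate for reversing it. Pearson correlation is used here as a stand-in, and the paper's own scoring may differ. Gene names, drug names and values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_complementary_drugs(disease_sig, drug_sigs):
    """Return (drug, correlation) pairs, most anti-correlated first.

    Signatures are dicts mapping gene -> log fold-change; only genes
    measured in both signatures are compared.
    """
    results = []
    for drug, sig in drug_sigs.items():
        genes = sorted(set(disease_sig) & set(sig))
        if len(genes) < 2:
            continue  # too few shared genes for a correlation
        r = pearson([disease_sig[g] for g in genes], [sig[g] for g in genes])
        results.append((drug, r))
    return sorted(results, key=lambda item: item[1])


# Invented signatures.
disease = {"G1": 2.0, "G2": 1.0, "G3": -1.5, "G4": -0.5}
drugs = {
    "drug_reverser": {"G1": -2.0, "G2": -1.0, "G3": 1.5, "G4": 0.5},
    "drug_mimic": {"G1": 2.0, "G2": 1.0, "G3": -1.5, "G4": -0.5},
    "drug_unmeasured": {"G9": 1.0},
}
ranking = rank_complementary_drugs(disease, drugs)
print(ranking)
```

Drugs at the top of the ranking oppose the disease signature; drugs at the bottom mimic it.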
Shout outs...
“A systematic survey of loss-of-function variants in human protein-coding genes.”
MacArthur et al, Science
“PASTE: patient-centered SMS text tagging in a medication management system.”
Stenner et al, JAMIA
“Discovering disease associations by integrating electronic clinical data and medical
literature” Holmes et al, PLoS ONE
“Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions:
using electronic medical records for genome- and phenome-wide studies.”
Denny et al, AJHG
“BioNOT: a searchable database of biomedical negated sentences.”
Agarwal et al, BMC Bioinformatics
“The impact of risk information exposure on women’s beliefs about direct-to-consumer
genetic testing for BRCA mutations” Gray et al, Clinical Genetics
“Mapping clinical phenotype data elements to standardized metadata repositories and
controlled terminologies; the eMERGE Network experience.” Pathak et al, JAMIA.
Shout outs...
“Evidence for hitchhiking of deleterious mutations within the human genome.”
Chun & Fay, PLoS Genetics
“The write position” Wren et al, EMBO Reports
“Phased whole-genome genetic risk in a family quartet using a major allele reference
sequence.” Dewey et al, PLoS Genetics
“A quantitative analysis of adverse events and ‘overwarning’ in drug labeling”
Duke et al, Arch. Internal Medicine
“Making a definitive diagnosis: successful clinical application of whole genome
sequencing in a child with intractable inflammatory bowel disease.”
Worthey et al, Genetics in Medicine
“Global analysis of disease-related DNA sequence variation in 10 healthy individuals:
implications for whole genome-based clinical diagnostics.”
Moore et al, Genetics in Medicine
“Enterotypes of the human gut microbiome.” Arumugam et al, Nature
2011 Crystal ball...
Consumer sequencing (vs. genotyping) will emerge
Cloud computing will contribute to major biomedical
discovery.
Informatics applications to stem cell science will
increase
Important discoveries from text mining
Population-based data mining will yield important
biomedical insights
Systems modeling will suggest useful polypharmacy
Immune genomics will emerge as powerful data
2012 Crystal ball...
Cloud computing will contribute to major biomedical
discovery.
Informatics applications to stem cell science will
increase
Immune genomics will emerge as powerful data
Flow cytometry informatics will grow
Molecular & expression data will combine for drug
repurposing
Exome sequencing will persist longer than expected
Progress in interpreting non-coding DNA variations
Thanks.
See you in 2013!
russ.altman@stanford.edu
