Amia tbi-14-final


Annual review of translational bioinformatics, 2014

Published in: Science

  1. 1. Translational Bioinformatics 2014: The Year in Review Russ B. Altman, MD, PhD Stanford University
  2. 2. Disclosures •Founder & Consultant, Personalis Inc (genome sequencing for clinical applications). •Funding support: NIH, NSF, Microsoft, Oracle, Lightspeed Ventures, PARSA Foundation. •I am a fan of informatics, genomics, medicine & clinical pharmacology.
  3. 3. Goals •Provide an overview of the scientific trends and publications in translational bioinformatics •Create a “snapshot” of what seems to be important in Spring, 2014 for the amusement of future generations. •Marvel at the progress made and the opportunities ahead.
  4. 4. Process 1. Follow literature through the year 2. Solicit nominations from colleagues 3. Search key journals and key topics on PubMed 4. Evaluate & ponder 5. Select papers to highlight in ~2-3 slides
  5. 5. Caveats •Translational bioinformatics = informatics methods that link biological entities (genes, proteins, small molecules) to clinical entities (diseases, symptoms, drugs)--or vice versa. •Considered last ~14 months (to this week) •Focused on human biology and clinical implications: molecules, clinical data, informatics. •NOTE: Amazing biological papers with straightforward informatics generally not included. •NOTE: Amazing informatics papers which don’t link clinical to molecular generally not included.
  6. 6. Final list •105 Semifinalists, 49 finalists •32 Presented here (briefly) + 10 “shout outs” •Apologies to those I misjudged. Mistakes are mine. •These slides and bibliography will be made available on •8 TOPICS: Controversies, Clinical genomics, Drugs, Genetic basis of disease, Emerging data sources, Mice, Scientific process, Odds & End.
  7. 7. Thanks! Conversations and recommendations Phil Bourne Josh Denny Joel Dudley Michel Dumontier Guy Fernald George Hripcsak Larry Hunter Konrad Karczewski Lang Li Yong Li Tianyun Liu Yves Lussier Dan Masys Hua Fan-Minogue Alex Morgan Sandy Napel Peter O’Donnell Lucila Ohno-Machado Chirag Patel Beth Percha Raul Rabadan Dan Roden Neil Sarkar Nigam Shah David States Jost Stuart Peter Tarczy-Hornoch Nick Tatonetti Laura Taylor Jessie Tenenbaum Olga Troyanskaya Piet van der Graaf Scott Waldman
  8. 8. Controversies
  9. 9. “Warning Letter. November 22, 2013” (Alberto Gutierrez, Director, Office of In Vitro Diagnostics & Radiological Health, US FDA to Anne Wojcicki, CEO, 23andMe) • Goal: Stop marketing a ‘device’ that is not cleared. • Method: Send letter, acknowledge 14 face-to-face meetings, cite laws & regulations. • Result: 23andMe suspending health advice on website, still providing raw data. • Conclusion: Do not mess with the FDA. FDA Document Number: GEN1300666
  10. 10. Nature, Vol 505, 16 Jan 2014. Robert Green & Nita Farahany.
  11. 11. “Why I read the network nonsense papers” (Lior Pachter, Prof. of Math, Berkeley) • Goal: Use untraditional channels (blog) to voice concern over potentially flawed science. • Method: Blog posts with detailed analysis of papers and concerns about correctness of conclusions, especially directed at a particular colleague. • Result: Entertaining/informative set of accusations and responses, serving as a reminder to do due diligence in literature review and technical content. • Conclusion: Do not mess with Lior Pachter.
  12. 12. Top of first of 38 pages of Blog + comments…
  13. 13. “Inconsistency in large pharmacogenomic studies” (Haibe-Kains et al, Nature) • Goal: Evaluate consistency of two major reports of cancer cell line drug sensitivity. • Method: Curate and compare results on same drugs, as possible. • Result: Correlation of drug sensitivity ranged from 0 to 0.6. • Conclusion: Do not mess with experimental data. PMID: 24284626
  14. 14. “Inconsistency in large pharmacogenomic studies” (Haibe-Kains et al, Nature) • Goal: Evaluate consistency of two major studies (CCLE & CGP) of cancer cell line drug sensitivity. • Method: Curate and compare results on same drugs, as possible. • Result: Correlation of drug sensitivity ranged from 0 to 0.6. • Conclusion: High variability in experimental measures of drug sensitivity indicates extreme caution in using these measures uncritically. 24284626
  16. 16. 24284626
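The cross-study comparison on the slides above can be sketched as a rank correlation over the cell lines that both studies measured for one drug. This is a minimal illustration, assuming a Spearman correlation (the slides only say "correlation"); the function names and the input layout (a dict from cell line to a sensitivity value) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: assumes Spearman rank correlation and a
# hypothetical input layout {cell_line: sensitivity_value} per study.

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cross_study_correlation(study_a, study_b):
    """Correlate one drug's sensitivity over cell lines present in both studies."""
    shared = sorted(set(study_a) & set(study_b))
    return spearman([study_a[c] for c in shared], [study_b[c] for c in shared])
```

Restricting to shared cell lines mirrors the "compare results on same drugs, as possible" step; a value near 0 for a drug would indicate that the two studies rank cell lines almost independently.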
  17. 17. Clinical Genomics
  18. 18. “A pharmacogenetic versus clinical algorithm for warfarin dosing” (Kimmel et al, NEJM) “A randomized trial of genotype-guided dosing of acenocoumarol and phenprocoumon” (Verhoef et al, NEJM) “A randomized trial of genotype-guided dosing of warfarin” (Pirmohamed et al, NEJM) •Goal: See if genetics improves warfarin dosing. •Method: Randomized trials vs. clinical algorithm OR standard of care. •Result: PGx beats standard of care, but not clinical algorithm. African-Americans seemed to do worse with PGx. •Conclusion: Study design matters, quality of execution matters, what SNPs are measured matters. 24251361
  19. 19. 24251361
  20. 20. “Clinically actionable genotypes among 10,000 patients with preemptive pharmacogenomic testing” (Van Driest et al, Clin Pharmacol Ther) • Goal: Estimate value of preemptive testing versus “reactive” testing for pharmacogenomics. • Method: Focus on five drug-gene interactions. • Result: 1+ actionable variant in 91% of patients (96% of AA). “Reactive” strategy would generate 15K tests. • Conclusion: Most patients have at least one PGx variant, point of care availability helps, less total testing with preemptive strategy. 242563661
  21. 21. 242563661
  22. 22. 242563661
  23. 23. “Genic intolerance to functional variance and the interpretation of personal genomes” (Petrovski et al, PLoS Genetics) • Goal: Figure out which mutations will most likely influence disease. • Method: Using 6503 exomes, create a scoring system for “intolerance” to mutations based on amount of observed genetic variation vs. expected. • Result: Mendelian disease genes very intolerant, striking variation within other classes. • Conclusion: May aid in identifying class-specific deleterious mutations. 23990802
  24. 24. 23990802 Intolerant Tolerant
  25. 25. 23990802
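The "observed vs. expected" idea behind the intolerance score can be sketched as a regression residual: fit the number of common functional variants per gene against the total number of variants, then score each gene by how far it falls below the fitted line. This is a simplified illustration under stated assumptions (an ordinary least-squares fit and raw, unstandardized residuals); the input names and the toy layout are hypothetical, and the slides do not specify the exact procedure.

```python
# Illustrative sketch only: a plain least-squares residual, assumed here
# as a stand-in for the "observed vs. expected" scoring on the slide.

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intolerance_scores(total_variants, common_functional):
    """Return {gene: observed - expected} common functional variant counts.

    Negative scores mean fewer functional variants than expected for the
    gene's overall variation, i.e. the gene looks intolerant.
    """
    genes = sorted(total_variants)
    x = [total_variants[g] for g in genes]
    y = [common_functional[g] for g in genes]
    slope, intercept = fit_line(x, y)
    return {
        g: common_functional[g] - (intercept + slope * total_variants[g])
        for g in genes
    }
```

Sorting genes by this score from most negative to most positive gives an intolerant-to-tolerant ordering; a production version would need to standardize the residuals before comparing genes.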
  26. 26. “A general framework for estimating the relative pathogenicity of human genetic variants” (Kircher et al, Nat Genetics) • Goal: Integrate diverse annotations into a single score for evaluating SNP probable impact on health. • Method: Combined Annotation-Dependent Depletion (C-Score) defined and computed for 8.6 billion SNPs using machine learning approach. • Result: C-score correlates with pathogenicity, disease severity, regulatory effects, allelic diversity. • Conclusion: CADD can prioritize functional, deleterious and pathogenic variants across many categories. 24487276
  27. 27. 24487276
  28. 28. “An informatics approach to analyzing the incidentalome” (Berg et al, Genet Med) Result: Categorized 2016 genes into bins based on clinical utility and validity, analyzed 80 genomes, created algorithm that selected variants worth pursuing. “Whole genome sequencing in support of wellness and health maintenance” (Patel et al, Genome Medicine) Result: Combine genetic and clinical markers to assess risk and make lifestyle recommendations. Shout Outs for Clinical Genomics 22995991 23806097
  29. 29. Drugs
  30. 30. “A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions” (Davis et al, Database) • Goal: Curate the relationship of 1200 drugs to potential toxicities in CV, neuro, renal, liver. • Method: In one year, 5 curators curated 88K articles and 254,173 interactions (!). • Result: 152,173 chemical-disease, 58,572 chemical-gene, 5,345 gene-disease and 38,083 chemical-phenotype. • Conclusion: Comprehensive manual curation of the literature is possible and useful. 24288140
  31. 31. 24288140
  32. 32. 24288140
  33. 33. “DGIdb: mining the druggable genome” (Griffith et al, Nature Methods) • Goal: Create central resource to associate mutated genes with their potential to be “drugged.” • Method: Mine existing gene-drug relationship resources, and bring into a single resource. • Result: 14,144 drug-gene interactions (2611 genes & 6307 drugs). 39 druggable gene categories. • Conclusion: DGIdb is a useful compendium of existing and potential drug targets. 24122041
  34. 34. 24122041 315 genes recurrently mutated in breast cancer
  35. 35. “Pathway-based screening strategy for multi target inhibitors of diverse proteins in metabolic pathways” (Hsu et al, PLoS Comp Bio) • Goal: Find ways to treat pathways and networks vs. single targets (to avoid resistance, ineffectiveness) • Method: Pathway-based screening using 3D structural information to find promiscuous inhibitors that hit multiple members of a pathway. • Result: Two inhibitors for pathways in H. pylori. • Conclusion: Shared small molecule binding properties within pathways may yield poly-active compounds. 23861662
  36. 36. 23861662
  37. 37. “Systematic identification of proteins that elicit drug side effects” (Kuhn et al, Mol Sys Biol) • Goal: Can we clarify the mechanism of action associated with drug side effects? • Method: Integrate drug-phenotype and drug-target relations to establish target-phenotype relations. • Result: 732 side effects with single protein associations, 137 of these with existing evidence. 1 novel association proven experimentally (HTR7 and hyperesthesia). • Conclusion: Large fraction of drug side effects are mediated predominantly by single proteins. 23632385
  38. 38. 23632385
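The integration step on the slide above (drug-target plus drug-phenotype relations yielding target-phenotype relations) can be sketched as a join through the shared drug. This is a minimal sketch that only counts supporting drugs per target/side-effect pair; the function names, the `min_drugs` threshold and the toy identifiers are hypothetical, and the statistical testing a real analysis needs is deliberately left out.

```python
# Illustrative sketch only: a co-occurrence join, with no significance test.
from collections import defaultdict


def target_side_effect_support(drug_targets, drug_side_effects):
    """Map (target, side_effect) -> set of drugs that have both."""
    support = defaultdict(set)
    for drug, targets in drug_targets.items():
        for target in targets:
            for effect in drug_side_effects.get(drug, ()):
                support[(target, effect)].add(drug)
    return support


def candidate_pairs(drug_targets, drug_side_effects, min_drugs=2):
    """Keep pairs supported by at least min_drugs distinct drugs."""
    support = target_side_effect_support(drug_targets, drug_side_effects)
    return {
        pair: sorted(drugs)
        for pair, drugs in support.items()
        if len(drugs) >= min_drugs
    }
```

Requiring several independent drugs per pair is one simple guard against pairs that appear only because a single drug happens to hit many targets.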
  39. 39. “Network-assisted prediction of potential drugs for addiction” (Sun et al, Biomed Res Intl) • Goal: Novel therapeutics are needed to battle addiction. • Method: Create a network of drugs and their associated genes, expand to include other drugs. • Result: Addictive drugs with similar actions cluster together. Predicted 94 non-addictive drugs that may modulate addictive response. • Conclusion: Network analysis provides candidate drugs for addiction treatment (or risk). 24689033
  40. 40. 24689033 Red = addictive drugs Green = drug targets
  41. 41. 24689033 Red = addictive drugs Yellow = nonaddictive drugs
  42. 42. “A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors” (Jahchan et al, Cancer Discovery) • Goal: Find novel treatments for small cell lung cancer (SCLC, neuroendocrine subtype). • Method: Query gene expression compendium to find drugs that oppose or synergize with SCLC. • Result: Tricyclic antidepressants consistently antagonize SCLC, induce SCLC apoptosis, activate stress pathways. • Conclusion: Expression data can suggest novel drug treatments for difficult diseases. 24078773
  43. 43. 24078773
  44. 44. 24078773
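The expression-based query described above (find drugs whose expression effect opposes the disease signature) can be illustrated with a simple sign-agreement score. This is a sketch under strong assumptions: signatures are reduced to +1 (up) or -1 (down) per gene, and the score is the mean product over shared genes. The function names and this scoring rule are hypothetical and are not the method used in the paper.

```python
# Illustrative sketch only: signatures are {gene: +1 or -1}; the scoring
# rule is an assumed simplification, not the published procedure.

def reversal_score(disease_signature, drug_signature):
    """Mean sign agreement over shared genes, from -1 (opposes) to +1 (mimics)."""
    shared = set(disease_signature) & set(drug_signature)
    if not shared:
        return 0.0
    total = sum(disease_signature[g] * drug_signature[g] for g in shared)
    return total / len(shared)


def rank_opposing_drugs(disease_signature, drug_signatures):
    """Return (drug, score) pairs sorted with the most opposing drug first."""
    scores = {
        drug: reversal_score(disease_signature, signature)
        for drug, signature in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

Drugs at the top of the ranking push expression in the opposite direction to the disease and are the repositioning candidates; drugs at the bottom mimic the disease signature.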
  45. 45. “Combinatorial therapy discovery using mixed integer linear programming” (Pang et al, Bioinformatics) Result: Combinatorial algorithm for maximizing coverage of targets, minimizing off-targets for drug combinations. “The druggable genome: evaluation of drug targets in clinical trials suggests major shifts in molecular class and indication” (Rask-Andersen et al, Ann Rev Pharm Toxicol) Result: Analyzed clinical trials to find 475 novel targets. “Identification and characterization of potential drug targets by subtractive genome analyses of methicillin resistant Staphylococcus aureus” (Uddin & Saeed, Comp Biol & Chem) Result: Find non-homologous & essential proteins in MRSA genome to define new drug targets. Shout Outs for Drugs 24463180 24016212 24361957
  46. 46. Genetic basis of disease
  47. 47. “A nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease risk” (Blair et al, Cell) • Goal: Understand genetic architecture of complex disease. • Method: Mine EMR of 110 million patients to associate Mendelian variation with complex disease. • Result: Each complex disorder linked to a unique set of Mendelian disorders. GWAS hits enriched in these, Mendelian variants contribute more to risk. • Conclusion: Complex diseases have comorbidity with Mendelian disorders, with deep genetic overlap. Mendelian genes are key for complex disease. 24074861
  48. 48. 24074861
  49. 49. 24074861
  50. 50. “Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association data” (Denny et al, Nature Biotech) • Goal: Replicate genetic associations using PheWAS. • Method: For each of 3144 SNPs, look for associations with 1358 EMR-defined phenotypes in 14K individuals. • Result: 51/77 associations replicated. 63 SNPs with pleiotropic associations. • Conclusion: EMR and PheWAS are a powerful tool for genetic discovery and replication. 24270849
  51. 51. 24270849 p = 0.05
  52. 52. “Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association” (Karczewski et al, PLoS Genetics) • Goal: Understand role of transcription factors (TFs) in disease. • Method: Integrate TF binding data with functional gene modules from 9K expression experiments to establish associations of TFs to modules. • Result: 30 TF-TF associations (14 known). 4K TF-disease relationships, including MEF2A + Crohn’s. • Conclusion: ChIP-Seq data + co-expression modules amplify signal of TF-TF and TF-disease relations. 24516403
  53. 53. 24516403
  54. 54. 24516403
  55. 55. “Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature” (Xu et al, Bioinformatics) • Goal: Catalog full set of disease manifestations. • Method: Extract connections between diseases and their manifestations using NLP. • Result: 119M sentences provide 121K Disease-Manifestation pairs, 99.2% of them previously not available in structured repository. • Conclusion: Automated characterization of disease will be useful for disease classification and ultimately treatment. 23828786
  56. 56. 23828786
  57. 57. “A common rejection module (CRM) for acute rejection across multiple organs identifies novel therapeutics for organ transplantation” (Khatri et al, J Exp Med) • Goal: Understand biology of acute rejection. • Method: Use expression data from 8 transplant data sets to find genes significantly and consistently overexpressed in rejected organs. • Result: Defined a module of 11 genes present in all rejection samples. Suggested sensitivity to atorvastatin and dasatinib, based on their targets. • Conclusion: This CRM is useful for both diagnosis & treatment of acute rejection in transplant. 24127489
  58. 58. 24127489 96 overexpressed genes linked by IPA tool.
  59. 59. “Network models of genome-wide association studies uncover the topological centrality of protein interactions in complex disease” (Lee et al, JAMIA) Result: Complex trait associated loci are more likely to be hub and bottleneck genes in protein-protein interaction networks. Shout Outs for Genetic Basis of Disease 23355459
  60. 60. Emerging Data Sources
  61. 61. “A network based method for analysis of lncRNA-disease associations and prediction of lncRNAs implicated in disease” (Yang et al, PLoS ONE) • Goal: Understand role of long non-coding RNAs (lncRNAs) in disease. • Method: Create network of lncRNA-disease associations from literature, and link to known disease genes. • Result: 295 lncRNAs associated with 801 genes in context of 214 diseases. Predict 768 new associations using shared links. Validated 3 of them. • Conclusion: lncRNAs have important role in regulating disease gene expression and thus disease. 24498199
  62. 62. 24498199 Nodes = disease Edge = shared RNA Nodes = RNA Edge = shared disease
  63. 63. “Lineage structure of the human antibody repertoire in response to influenza vaccination” (Jiang et al, Sci Trans Med) • Goal: Understand immune response to vaccines • Method: Sequence B-cell antibodies in 17 volunteers (young and old) after flu vaccine. • Result: Elderly subjects have decreased number of B-cell lineages, increased pre-vaccine diversity, decreased post-vaccine diversity. • Conclusion: Immune response evolves with age, and can be directly interrogated with NGS technology. 23390249
  64. 64. 23390249 Informatically defined lineages with influenza specificity in an elderly subject
  65. 65. “An integrated clinico-metabolomic model improves prediction of death in sepsis” (Langley et al, Sci Trans Med) • Goal: Understand predictors of death from sepsis. • Method: Combine metabolome and proteome of patients admitted with sepsis. • Result: Those who died from sepsis showed divergent profiles for fatty acid transport, β-oxidation, gluconeogenesis, citric acid cycle. Classifier created to predict survival. • Conclusion: Proteome/metabolome can predict outcomes in patients with sepsis. 23884467
  66. 66. 23884467
  67. 67. “Meta-analyses of studies of the human microbiota” (Lozupone et al, Genome Research) • Goal: Understand the ability to pool microbiome data across populations. • Method: Combine data from 12 studies to evaluate reproducibility. • Result: Different body sites give a consistently clear signal. Fecal samples dominated by local factors. Some unusual similarities suggest need for care. • Conclusion: Microbiome studies must select cases and controls carefully, and measure effect size with “out groups.” 23861384
  68. 68. 23861384
  69. 69. “PhenDisco: phenotype discovery system for the database of genotypes and phenotypes” (Doan et al, JAMIA) Result: It may be possible to search dbGaP! “Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis” (Doshi-Velez et al, Pediatrics) Result: Three distinct syndromes/trajectories seen in ASD. “Network-based analysis of vaccine-related associations reveals consistent knowledge with the vaccine ontology” (Zhang et al, J Biomed Sem) Result: Identified connections between different vaccines and genes important for vaccine response. Shout Outs for Emerging Data Sources 23989082 24323995 24209834
  70. 70. Mice: can’t live with ’em, can’t live without ’em…
  71. 71. “Knockouts model the 100 best-selling drugs--will they model the next 100?” (Zambrowicz & Sands, Nature) • Goal: Evaluate value of mouse knockouts for drug target discovery & validation. • Method: Retrospective evaluation for 100 best-selling drugs. • Result: Phenotypes correlate well with known drug efficacy. • Conclusion: Large-scale mouse knockout programs are a likely source of new targets and useful drugs. 12509758
  72. 72. 12509758
  73. 73. “Mouse model phenotypes provide information about human drug targets” (Hoehndorf et al, Bioinformatics) • Goal: Create automated methods for transferring data from model organisms (mice) to humans. • Method: Use metric of phenotypic similarity to map from mouse to human drug-relevant phenotypes. • Result: General method. Example mapping for diclofenac. • Conclusion: Semantic methods may be useful for automated mapping of mouse knockout phenotypes to relevant human disease phenotypes. 24158600
  74. 74. 24158600
  75. 75. “Genomic responses in mouse models poorly mimic human inflammatory disease” (Seok et al, PNAS) • Goal: Assess the utility of mouse models of acute inflammation. • Method: Assess gene expression changes in humans and mice for burn, trauma, endotoxemia. • Result: Mouse results don’t agree with humans. Mouse results don’t agree with mouse results. • Conclusion: Mouse models for human inflammatory diseases are not going to be useful. 23401516
  76. 76. 23401516
  77. 77. The scientific process
  78. 78. “Atypical combinations and scientific impact” (Uzzi et al, Science) • Goal: Understand why some papers have high scientific impact. • Method: Analyze frequency of co-citation between all pairs of papers. Define “conventionality” metric, and “tail” metric for out-of-discipline citations. • Result: High impact papers are both very conventional and feature unusual citations. Teams are 38% more likely than solo authors to do something novel. • Conclusion: Read and refer to papers outside your discipline. Write papers in groups. 24159044
  79. 79. 24159044
  80. 80. “Chapter 4: Protein Interactions and Disease” (Gonzalez & Kann, PLoS Comp Bio) • Goal: Disseminate knowledge about translational bioinformatics widely. • Method: Publish a textbook in an Open Source journal. • Result: “Translational Bioinformatics” edited by Kann, available at PLoS Comp Bio. 17 chapters + intro. • Conclusion: You can publish an open source textbook. Count citations to your chapter! 23300410
  81. 81. 23300410
  82. 82. “Quantifying long-term scientific impact” (Wang et al, Science) Result: Initial citation trajectory predicts lifetime trajectory. “A historic moment for open science: the Yale University open data access project and Medtronic” (Krumholz et al, Ann Intern Med) Result: Created a model for sharing industrial trial data for re-analysis. “Evidence of community structure in biomedical research grant collaborations” (Nagarajan et al, J Biomed Inf) Result: CTSAs have encouraged more team-science and more collaborative publications. Shout Outs for the Scientific Process 24092745 23778908 22981843
  83. 83. Odds & End
  84. 84. “A haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line” (Adey et al, Nature) • Goal: Understand the genomic features of the HeLa cell line genome. • Method: High quality, phased sequencing of the genome. • Result: Valuable map of genetic variations. Careful attention paid to sensitive release and data access involving NIH leadership & family. • Conclusion: HeLa cells continue to provide valuable information at genotype and phenotype levels. 23925245
  85. 85. 23925245
  86. 86. “A social network of hospital acquired infection built from electronic medical record data” (Cusumano-Towner et al, JAMIA) • Goal: Understand how infections spread in a hospital. • Method: Use EMR to create social network of patient contacts, and simulate infectious outbreaks. • Result: Simulations reflect staffing and patient flow practices. • Conclusion: EMR allowed creation of robust network, useful for simulation. 23467473
  87. 87. 23467473 Room sharing Provider sharing Probability of spread (influenza) between wards MRSA Simulation results: seed in the MRI suite.
  88. 88. “The hidden geometry of complex, network-driven contagion phenomena” (Brockmann & Helbing, Science) • Goal: Understand global spread of epidemics. • Method: Wave propagation models applied to “effective distance” between locations based on air traffic flow. • Result: Method can predict arrival times and correct for discontinuities in effective distance. • Conclusion: You are closer to SARS than you think. 24337289
  89. 89. 24337289
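The "effective distance" idea on the slides above can be sketched as a shortest-path computation in which heavily travelled links are short. This is a minimal sketch, assuming the link length is 1 - ln(P), where P is the fraction of traffic leaving one location that goes to the next, and that the effective distance between two locations is the shortest such path. The function names and the nested-dict traffic layout are hypothetical.

```python
# Illustrative sketch only: assumes link length 1 - ln(P) and a
# hypothetical layout flux[origin][destination] = passenger count.
import heapq
import math


def effective_lengths(flux):
    """Convert passenger counts into link lengths 1 - ln(P)."""
    lengths = {}
    for origin, outgoing in flux.items():
        total = sum(outgoing.values())
        lengths[origin] = {
            dest: 1.0 - math.log(count / total)
            for dest, count in outgoing.items()
            if count > 0
        }
    return lengths


def effective_distance(flux, source):
    """Dijkstra shortest paths from source over the effective link lengths."""
    lengths = effective_lengths(flux)
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, weight in lengths.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return best
```

Under these assumptions a two-hop route through a major hub can be effectively shorter than a rarely used direct link, which is the sense in which a location can be closer to an outbreak than geography suggests.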
  90. 90. “How do you feel? Your computer knows” (Geller, CACM) Result: Facial expression encodes emotions, and can be decoded by current algorithms. “Simulation of repetitive diagnostic blood loss and onset of iatrogenic anemia in critical care patients with a mathematical model” (Lyon et al, Comp in Biol & Med) Result: If you order too many blood tests, you can bleed your patient to death. This can be modeled with math. Shout Outs for Odds & End 23228481 DOI:10.1145/2555809
  91. 91. 2013 Crystal ball... • Increased focus on methods to untangle regulatory control of clinical phenotypes • Rare variant GWAS with exomes & genomes • Microbiome integrated with immunology & metabolomics, and disease risk. • Emphasis on non European-descent populations for discovery of disease associations • Mobile computing resources for genomics • Crowd-based discovery in translational bioinformatics
  92. 92. 2014 Crystal ball... • Emphasis on non European-descent populations for discovery of disease associations • Crowd-based discovery in translational bioinformatics • Methods to recommend treatment for cancer based on genome/transcriptome • Increase in “trained systems” (à la Watson) applications in translational bioinformatics • Repurposing with combinations of drugs (vs. one) • More cost-effectiveness evidence for genomics • Linking essential genes, drug targets, and drug response
  93. 93. Thanks.