Computational challenges in precision medicine and genomics



Genomics is mapping complex data about human biology and promises major medical advances. In particular, genomics is enabling precision medicine: the use of a patient's genome and physiological state to improve therapeutic efficacy and outcome. However, routine use of genomics data in medical research is in its infancy, mainly because of the challenges of working with "big data". These data are so large and complex that typical researchers cannot cope with them. Processing and interpreting them correctly requires an understanding of many aspects of experimental biology and medicine. Data size is also an issue: an individual researcher may need to handle tens of terabytes (genomes from a few hundred patients), which is challenging to download and store on a typical workstation. To effectively support precision medicine, scientists from a wide range of disciplines, including computer science, must develop algorithms that improve diagnostics and prognostics, genome interpretation, raw data processing and secure high-performance computing.

Published in: Science, Technology


  1. 1. COMPUTATIONAL CHALLENGES IN PRECISION MEDICINE AND GENOMICS GARY BADER WWW.BADERLAB.ORG GOOGLE WATERLOO, JUNE 9, 2014
  2. 2. PRECISION MEDICINE •  TRADITIONAL MEDICINE, WITH MORE DATA •  DIAGNOSIS: ASSIGNING PATIENTS TO GROUPS –  BIOLOGY, DISEASE PROGRESSION, TREATMENT RESPONSE •  PERSONALIZED, BUT NOT EVERYONE HAS A DIFFERENT DISEASE NATURE MEDICINE 19, 249 (2013) DOI:10.1038/NM0313-249
  3. 3. NATIONAL COMPREHENSIVE CANCER NETWORK (NCCN) Breast Cancer Noninvasive Invasive Lobular Carcinoma In Situ Ductal Carcinoma In Situ Lobular Carcinoma Ductal Carcinoma Inflammatory
  4. 4. IMPROVING PRECISION WITH GENOMICS •  BRCA1/BRCA2 MUTATIONS PREDICT RISK •  COMMERCIAL PROGNOSTIC TESTS BASED ON GENE SIGNATURES HTTP://THEBIGCANDME.BLOGSPOT.CA/
  5. 5. GENOMICS •  NEW TECHNOLOGY FOR READING/WRITING DNA •  MEASURE OUR GENETIC CODE AND SYSTEM STATE •  LOTS OF VARIABLES – WHOLE GENOME, TRANSCRIPT AND PROTEIN EXPRESSION, SPLICING, CHROMATIN STRUCTURE, MOLECULAR INTERACTION, TRANSCRIPTION FACTOR, METHYLATION, METABOLITE, PATIENT PHENOTYPE
  6. 6. 2  
  7. 7. HTTP://WWW.LHSC.ON.CA/   SOURCE CODE ON DISK LOAD TO ACTIVE MEMORY COMPILER RUNNING SOFTWARE ACTIVE MEMORY 4 LETTER CODE (DNA/RNA BASES) 20 LETTER CODE (AMINO ACIDS) MEEPQSDPSVEPPLSQETFSDLWKLLPEN…GATGGGATTGGGGTTTTCCCCTCCCAT…
  8. 8. A PROTEIN IS A MOLECULAR MACHINE
  9. 9. DNA SEQUENCING •  RECENT MASSIVE BREAKTHROUGH •  CURRENT TECH: – ~10 HUMAN GENOMES, 1TB DATA/6 DAY RUN ILLUMINA,  GEORGE  CHURCH  
  10. 10. FEB. 1, 2013: DR. LEE HOOD RECEIVES HIS NATIONAL MEDAL OF SCIENCE FROM PRESIDENT OBAMA AT WHITE HOUSE CEREMONY
  11. 11. MORE BREAKTHROUGHS COMING WWW.NANOPORETECH.COM 20-NODE INSTALLATION = COMPLETE HUMAN GENOME IN 15 MINUTES MINION = USB CONNECTION, MINIMAL SAMPLE PREPARATION, $1000 DEVICE + CONSUMABLES
  12. 12. WHERE DOES THE DATA COME FROM? BARODA,  INDIA   TORONTO,  CANADA   VERMONT,  USA   CAMBRIDGE,  UK   MOLECULAR  BIOLOGY  LABS  AROUND  THE  WORLD  
  13. 13. BGI,  >160  MACHINES     THE  FACTORY  
  14. 14. COMPUTING NEEDS: 1 HUMAN GENOME •  ~125 BASE READ LENGTH X MILLIONS •  >30X COVERAGE •  ALIGNMENT TO REFERENCE GENOME •  COMPUTE VARIANTS (MUTATIONS) •  ANNOTATE VARIANTS •  COMPUTE TIME: UP TO 2 DAYS/GENOME – OPTIMIZED 4 HOURS: 128G/2CPU/SSD, 3.1GHZ •  MEDICALLY IMPORTANT TO BE FAST
  15. 15. THE POWER OF GENOMICS IN MEDICINE •  7000 RARE MONOGENIC DISEASES – 50% HAVE A KNOWN GENE RESPONSIBLE – QUADRUPLED RATE OF IDENTIFICATION SINCE 2012 •  BRAIN DOPAMINE-SEROTONIN VESICULAR TRANSPORT DISEASE AND ITS TREATMENT – TWO YEARS FROM DISEASE DEFINITION TO GENE IDENTIFICATION TO TREATMENT NAT REV GENET. 2013 OCT;14(10):681-91 N ENGL J MED. 2013 FEB 7;368(6):543-50
  16. 16. NON-INVASIVE PRENATAL TEST HTTP://WWW.PANORAMATEST.COM/
  17. 17. CANCER GENOMICS •  GERM LINE VS. SOMATIC MUTATIONS •  AIM: IDENTIFY FREQUENT MUTATIONS IN CANCER •  >11,000 TUMOUR GENOMES, 9M MUTATIONS HUMAN COLORECTAL CARCINOMA HTTPS://DCC.ICGC.ORG/  
  18. 18. COMPUTING CHALLENGES •  EXPONENTIAL DATA GROWTH (>MOORE’S LAW) – BILLIONS OF GENOMES – SIZE: >100GB/HUMAN GENOME, 4GB PROCESSED, MBS (JUST MUTATIONS) •  HETEROGENEOUS, NOISY, COMPLEX DATA – DATA SCIENTISTS, DOMAIN EXPERTS
  19. 19. COMPUTING WILL TRANSFORM MEDICINE GOAL: IMPROVE PATIENT OUTCOME
  20. 20. COMPUTATIONAL BIOLOGY •  RESEARCH: USING COMPUTERS TO ANSWER BIOLOGICAL/BIOMEDICAL QUESTIONS •  EXPLORE, INTERPRET AND DISCOVER: SEARCH •  SPEED AND ACCURACY: ALGORITHMS •  PREDICTING FUNCTIONAL MUTATIONS, PATIENT CLASSIFICATION: MACHINE LEARNING •  PRIVACY: DIFFERENTIAL PRIVACY, ENCRYPTION •  USABLE APPLICATIONS: SOFTWARE ENGINEERING
  21. 21. MedSavant search engine for genetic variants WWW.MEDSAVANT.COM Developers: Marc Fiume, James Vlasblom, Ron Ammar, Orion Buske, Eric Smith, Andrew Brook, Misko Dzamba, Khushi Chachcha, Sergiu Dumitriu Scientific Advisors: Christian Marshall, Kym Boycott, Marta Girdea, Peter Ray, Gary Bader, Michael Brudno
  22. 22. WWW.MEDSAVANT.COM
  23. 23. GENOMIC READS GENOMIC VARIANTS MARC FIUME, MIKE BRUDNO
  24. 24. GENOMIC READS GENOMIC VARIANTS MARC FIUME, MIKE BRUDNO
  25. 25. GLIOBLASTOMA MULTIFORME (N=215) GOLDENBERG, BRUDNO NATURE METHODS, 2014 IDENTIFY DISEASE SUBTYPE SURVIVAL CLUSTERING SPEED DATA FUSION (NON-LINEAR, MESSAGE PASSING), UNSUPERVISED CLUSTERING
  26. 26. PREDICT TREATMENT RESPONSE •  SUPERVISED MACHINE LEARNING E.G. RHEUMATOID ARTHRITIS METHOTREXATE RESPONSE. FIGURE (Personal Medical Network): nodes labelled A, B and New; legend: Responder, Non-Responder, New patient (Predicted Non-Responder), Weakly similar, Highly similar, Response to treatment A, Similar e.g. SNP, smoking status. SHIRLEY HUI, RUTH ISSERLIN, HUSSAM KACA, TABITHA KUNG, KATHY SIMINOVITCH
  27. 27. EXPLAINING GENOMICS DATA •  SNAPSHOTS OF SYSTEM STATE –  E.G. CANCER VS. NORMAL •  EXPLAIN WHY STATES DIFFER –  E.G. REGULATOR PERTURBATION – CAUSAL MODELING – PRIOR KNOWLEDGE ABOUT MECHANISM: PATHWAYS WITT H ET AL. CANCER CELL. 2011 AUG 16;20(2):143-57
  28. 28. GENOME++ MOLECULAR, PHYSIOLOGICAL PHENOTYPE ENCODES EXPLAINS ENVIRONMENT CELL MECHANISM THE HUMAN BODY •  A WETWARE COMPUTING SYSTEM MODULATES
  29. 29. A PROTEIN IS A MOLECULAR MACHINE
  30. 30. 1 INTERACTION (EDGE)
  31. 31. HO ET AL. NATURE 415(6868) 2002
  32. 32. LOGIC CIRCUIT (PATHWAY) HTTP://DISCOVER.NCI.NIH.GOV/KOHNK/INTERACTION_MAPS.HTML
  33. 33. THE CELL
  34. 34. ALAIN VIEL, HARVARD UNIVERSITY, 2007
  35. 35. HTTP://WWW.ENDOSZKOP.COM/ ~40 TRILLION CELLS, +TRILLIONS OF MICROBES (PARALLEL PROCESSING) BIANCONI ET AL. ANN HUM BIOL. 2013 NOV-DEC;40(6):471
  36. 36. Microtubule Cytoskeleton Cell Projection & Cell Motility Cell Proliferation Glycosylation Adhesion Regulation of GTPase Kinase Activity/Regulation CNS Development Intellectual Disability Autism GTPase/Ras Signaling Regulation of cell proliferation Positive regulation of cell proliferation Tyrosine kinase Vasculature development Palate development Organ Morphogenesis Behavior Heart development RHO Ras Membrane Kinase regulation Cell Motility (stricter cluster) Centrosome Nucleolus Cell cycle Regulation of hormone levels Amino acid derivative / amine metabolism Synaptic vesicle maturation Reelin pathway LIS1 in neuronal migration and development Negative regulation of cell cycle cKIT pathway mTor pathway Zn finger domain Carboxyl esterase domain Ras signaling GTPase regulator Neuron migration Cell Motility (stricter cluster) Cell morphogenesis Cell projection organization CNS development Brain development Neurite development CNS neuron differentiation Axonogenesis Projection neuron axonogenesis Cerebral cortex cell migration SMC flexible hinge domain Urea and amine group metabolism MHC-I Zoom of CNS-Development ID ID ASD ASD Both 0% 12.5% Enriched in deletions FDR Known disease genes Enriched only in disease genes Node type (gene-set) Edge type (gene-set overlap) From disease genes to enriched gene-sets Between gene-sets enriched in deletions Between sets enriched in deletions and in disease genes or between disease sets only Pinto et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010 Jun 9.
  37. 37. Adhe Reelin pathway development Nega regula of cell Neuron migration Cell Motility (stricter cluster) Cell morphogenesis Cell projection organization CNS development Brain development Neurite development CNS neuron differentiation Axonogenesis Projection neuron axonogenesis Cerebral cortex cell migration Zoom of CNS-Development
  38. 38. PATHWAY ASSOCIATION TEST (DANIELE MERICO) •  IF WE HAVE AT LEAST ONE CNV AFFECTING AT LEAST ONE GENE IN A CERTAIN PATHWAY GI, THEN WE HAVE A PERTURBATION POTENTIAL IN THAT PATHWAY •  WE COUNT THE PRESENCE / ABSENCE OF SUCH PERTURBATION POTENTIAL IN PATIENTS. FIGURE: CNV-AFFECTED GENES IN PATHWAY GSi FOR PATIENT #1 (COUNT = 1), PATIENT #2 (COUNT = 1), PATIENT #3 (COUNT = 1), PATIENT #i (COUNT = 0). GENE-SET BY PATIENT TABLE:
        GENE-SET  PATIENT #1  PATIENT #2  PATIENT #3  …  PATIENT #i  …  PATIENT #n
        GS1       1           1           1           …  0           …  0
        GS2       0           0           1           …  1           …  0
        GS3       0           0           0           …  0           …  0
  39. 39. PATHWAY ASSOCIATION TEST DESCRIPTION: • THE SIGNIFICANCE OF A GENE-SET IS THEN ASSESSED USING FISHER'S EXACT TEST FOR ASSOCIATION • A SIGNIFICANT GENE-SET IS AFFECTED BY A MUTATION POTENTIAL MORE FREQUENTLY IN CASES THAN CONTROLS • THE FDR IS ESTIMATED BY SHUFFLING THE COLUMNS IN THE GENE-SET BY PATIENT COUNT TABLE. CONTINGENCY TABLE FOR GENE-SET GSi:
                    CASE       CONTROL
        GSi         13         1
        NOT IN GSi  1146 - 13  889 - 1
      GENE-SET BY PATIENT TABLE:
        GENE-SET  PATIENT #1  PATIENT #2  PATIENT #3  …  PATIENT #i  …  PATIENT #n
        GS1       1           1           1           …  0           …  0
        GS2       0           0           1           …  1           …  0
        GS3       0           0           0           …  0           …  0
  40. 40. BENEFITS OF SYSTEMS THINKING •  IMPROVES STATISTICAL POWER – FEWER TESTS •  MORE REPRODUCIBLE – E.G. GENE EXPRESSION SIGNATURES •  EASIER TO INTERPRET – FAMILIAR CONCEPTS E.G. CELL CYCLE •  IDENTIFIES MECHANISM – CAN EXPLAIN CAUSE VS. PARTS THINKING
  41. 41. DATABASES EXPERIMENTS, PREDICTIONS LITERATURE EXPERTS GENOME++ MOLECULAR, PHYSIOLOGICAL PHENOTYPE ENCODES EXPLAINS ENVIRONMENT CELL MECHANISM MODULATES
  42. 42. HTTP://PATHWAYCOMMONS.ORG
  43. 43. THE FACTOID PROJECT MAX FRANZ, IGOR RODCHENKOV, OZGUN BABUR, EMEK DEMIR, CHRIS SANDER HELPING AUTHORS DIGITIZE THEIR PUBLISHED KNOWLEDGE HTTP://FACTOID.BADERLAB.ORG/
  44. 44. NETWORK VISUALIZATION AND ANALYSIS UCSD, ISB, AGILENT, MSKCC, PASTEUR, UCSF HTTP://CYTOSCAPE.ORG PATHWAY COMPARISON LITERATURE MINING GENE ONTOLOGY ANALYSIS ACTIVE MODULES COMPLEX DETECTION NETWORK MOTIF SEARCH
  45. 45. CYTOSCAPE.JS: HTML5 – TOUCH CYTOSCAPE.GITHUB.COM/CYTOSCAPE.JS/ MAX FRANZ
  46. 46. GENE FUNCTION PREDICTION HTTP://WWW.GENEMANIA.ORG QUAID MORRIS (DONNELLY) RASHAD BADRAWI, OVI COMES, SYLVA DONALDSON, MAX FRANZ, CHRISTIAN LOPES, FARZANA KAZI, JASON MONTOJO, HAROLD RODRIGUEZ, KHALID ZUBERI •  GUILT-BY-ASSOCIATION PRINCIPLE •  BIOLOGICAL NETWORKS ARE COMBINED INTELLIGENTLY TO OPTIMIZE PREDICTION ACCURACY •  ALGORITHM IS FASTER AND MORE ACCURATE THAN ITS PEERS
  47. 47. SOCIAL CHALLENGES •  BIOETHICS AND DATA SHARING •  ENGAGING RESEARCHERS – CROWDSOURCING: TCGA PAN CANCER, DREAM •  ENCOURAGING RESEARCHERS TO EXPLORE UNCHARTED TERRITORY •  NEED FOR QUANTITATIVE THINKING IN BIOLOGY –  NEW PH.D. PROGRAM IN THE MOLECULAR GENETICS DEPARTMENT AT THE UNIVERSITY OF TORONTO NATURE. 2011 FEB 10;470(7333):163-5 WWW.NATURE.COM/TCGA/
  48. 48. EPENDYMOMA •  3RD MOST COMMON BRAIN TUMOUR IN CHILDREN •  INCURABLE IN UP TO 45% OF PATIENTS STEVE MACK, MICHAEL TAYLOR, RUTH ISSERLIN - CANCER CELL. 2011 AUG 16;20(2):143-57 GENE EXPRESSION PATIENT AGE OVERALL SURVIVAL
  49. 49. EPENDYMOMA GENOMIC ANALYSIS •  EPENDYMOMA BRAIN CANCER - MOST COMMON AND MORBID LOCATION FOR CHILDHOOD IS THE POSTERIOR FOSSA (PF = BRAINSTEM + CEREBELLUM) •  TWO SUBTYPES BY GENE EXPRESSION: PFA - YOUNG, DISMAL PROGNOSIS, PFB - OLDER, EXCELLENT PROGNOSIS. •  WHOLE GENOME SEQUENCING (47 SAMPLES) SHOWED ALMOST NO MUTATIONS, HOWEVER DNA METHYLATION ARRAYS SHOWED CLEAR CLUSTERING INTO PFA AND PFB (79 SAMPLES) •  PFA MORE TRANSCRIPTIONALLY SILENCED BY CPG METHYLATION STEVE MACK, MICHAEL TAYLOR, SCOTT ZUYDERDUYN NATURE, FEB. 2014
  50. 50. POLYCOMB REPRESSOR COMPLEX 2 – INHIBITED BY DZNEP AND GSK343 – KILLED PFA CELLS NO KNOWN TREATMENT, SO NOW GOING TO CLINICAL TRIAL, COMPASSIONATE USE IN ONE PATIENT
  51. 51. 2 MONTHS 3 MONTHS 3 CYCLES VIDAZA 9 YO WITH METASTATIC PF EPENDYMOMA TO LUNG TREATED WITH AZACYTIDINE TREATMENT OF METASTATIC PF EPENDYMOMA WITH VIDAZA MICHAEL TAYLOR
  52. 52. ACKNOWLEDGEMENTS BADER LAB DOMAIN INTERACTION TEAM SHOBHIT JAIN BRIAN LAW JÜRI REIMAND MOHAMED HELMY ANDREA UETRECHT MARINA OLHOVSKY CANCER GENOMICS FLORENCE CAVALLI DAVID SHIH ASHA ROSTAMIANFAR PRECISION MEDICINE RON AMMAR SHIRLEY HUI FUNDING HTTP://BADERLAB.ORG PATHWAY AND NETWORK ANALYSIS RUTH ISSERLIN IGOR RODCHENKOV SCOTT ZUYDERDUYN RUTH WONG VERONIQUE VOISIN SHAHEENA BASHIR KHALID ZUBERI CHRISTIAN LOPES JASON MONTOJO MAX FRANZ HAROLD RODRIGUEZ
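
The pathway association test on slides 38 and 39 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the patient and gene-set contents below are hypothetical toy data, the function names are invented for this sketch, and the test is assumed to be one-sided (enrichment in cases), which the slides do not specify. The 2x2 counts (13 of 1146 cases, 1 of 889 controls) are taken from slide 39. The FDR estimation by column shuffling is not shown.

```python
from fractions import Fraction
from math import comb


def perturbation_matrix(patient_genes, gene_sets):
    """Map each gene-set name to a 0/1 list, one entry per patient.

    An entry is 1 when at least one of the patient's CNV-affected genes
    belongs to the gene-set (the "perturbation potential" of slide 38).
    """
    return {
        name: [int(bool(genes & members)) for genes in patient_genes]
        for name, members in gene_sets.items()
    }


def fisher_one_sided(case_hits, control_hits, n_cases, n_controls):
    """One-sided Fisher's exact p-value (assumption: enrichment in cases).

    Returns the hypergeometric probability of seeing at least `case_hits`
    affected cases, given the total number of affected patients.
    """
    total = n_cases + n_controls
    hits = case_hits + control_hits
    upper = min(hits, n_cases)
    tail = sum(
        comb(hits, k) * comb(total - hits, n_cases - k)
        for k in range(case_hits, upper + 1)
    )
    return float(Fraction(tail, comb(total, n_cases)))


# Hypothetical toy data: CNV-affected genes per patient, and two gene-sets.
patient_genes = [{"RELN"}, {"PAFAH1B1", "KIT"}, {"KIT"}, set()]
gene_sets = {"GS1": {"RELN", "PAFAH1B1"}, "GS2": {"KIT"}}
matrix = perturbation_matrix(patient_genes, gene_sets)

# Counts from slide 39: 13 of 1146 cases and 1 of 889 controls hit GSi.
p_value = fisher_one_sided(13, 1, 1146, 889)
print(matrix)
print(f"one-sided p = {p_value:.2e}")
```

Exact integer arithmetic is used because the binomial coefficients involved are far too large for floating point. Where SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` provides the same kind of one-sided test.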

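
The sequencing and storage figures on slides 14 and 18 imply some simple arithmetic, sketched here in Python. The read length (125 bases), coverage (30X) and raw size (100 GB per genome) come from the slides, where the latter two are stated as lower bounds. The genome length of about 3.1 billion bases and the cohort size of 300 patients are assumptions added for illustration, as are the function names.

```python
GENOME_BASES = 3_100_000_000  # assumption: approximate human genome length
READ_LENGTH = 125             # slide 14: ~125 base read length
COVERAGE = 30                 # slide 14: >30X coverage (lower bound)
RAW_GB_PER_GENOME = 100       # slide 18: >100 GB per genome (lower bound)


def reads_needed(genome_bases, coverage, read_length):
    """Number of reads so that sequenced bases reach coverage x genome length."""
    # Ceiling division on integers, so a partial read counts as a whole read.
    return -(-genome_bases * coverage // read_length)


def cohort_storage_tb(n_patients, gb_per_genome):
    """Raw storage for a cohort in terabytes (1 TB taken as 1000 GB)."""
    return n_patients * gb_per_genome / 1000


print(reads_needed(GENOME_BASES, COVERAGE, READ_LENGTH))
print(cohort_storage_tb(300, RAW_GB_PER_GENOME))  # 300 patients is hypothetical
```

Under these assumptions a single genome needs several hundred million reads, and a cohort of a few hundred patients reaches tens of terabytes of raw data, in line with the estimate in the abstract.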