2014 07 ismb personalized medicine
Atul Butte presentation at ISMB 2014

Published in: Health & Medicine

  • 1. Big Data in Biomedicine: Translating 300 trillion points of data into new drugs and diagnostics
      Atul Butte, MD, PhD
      Chief, Division of Systems Medicine, Departments of Pediatrics, Genetics, and, by courtesy, Computer Science, Pathology, and Medicine
      Center for Pediatric Bioinformatics, LPCH, Stanford University
      abutte   @atulbutte   @ImmPortDB
  • 2. Disclosures
      • Scientific founder and advisory board membership: Genstruct, NuMedii, Personalis, Carmenta
      • Honoraria for talks: Lilly, Pfizer, Siemens, Bristol Myers Squibb, AstraZeneca, Roche, Genentech
      • Past or present consultancy: Lilly, Johnson and Johnson, Roche, NuMedii, Genstruct, Tercica, Ecoeos, Ansh Labs, Prevendia, Samsung, Assay Depot, Regeneron, Verinata, Geisinger, Covance
      • Corporate relationships: Northrop Grumman, Aptalis, Thomson Reuters
      • Speakers' bureau: None
      • Companies started by students: Carmenta, Serendipity, NuMedii, Stimulomics, NunaHealth, Praedicat, MyTime, Flipora
  • 3. Big Data in Biomedicine
  • 4. Nearly 1.4 million microarrays available. Doubles every 2-3 years.
      Butte AJ. Translational Bioinformatics: coming of age. JAMIA, 2008.
  • 5. 127 million substances x 740,000 assays: 1.2 billion points of data within a grid of 100 trillion cells; ~250 million active substances
  • 6. 5,178 compounds: 1,300 off-patent FDA-approved drugs; 700 bioactive tool compounds; 2,000+ screening hits (MLPCN and others)
      3,712 genes (shRNA + cDNA): targets/pathways of FDA-approved drugs (n=900); candidate disease genes (n=600); community nominations (n=500+)
      15 cell types: banked primary cell types; cancer cell lines; primary hTERT immortalized; patient-derived iPS cells; 5 community nominated
  • 7. Protein
  • 8. Protein: cancer markers; transplant rejection markers
  • 9. Preeclampsia: large cause of maternal and fetal death
      • Incidence
          • 5-8% of all pregnancies in the U.S. and worldwide
          • 4.1 million births in the U.S. in 2009
          • Up to 300K cases of preeclampsia annually in the U.S.
      • Mortality
          • Responsible for 18% of all maternal deaths in the U.S.
          • Maternal death in 56 out of every 100,000 live births in the U.S.
          • Neonatal death in 71 out of every 100,000 live births in the U.S.
      • Cost
          • $20 billion in direct costs in the U.S. annually
          • Average hospital stay of 3.5 days
      Credit: Linda Liu, Matt Cooper, Bruce Ling
  • 10. New markers for preeclampsia
      [Figure: marker concentration (ng/ml) by gestational age (weeks), control vs. preeclampsia. Panels: GA 23-34 weeks; GA > 34 weeks. Group sizes shown: Control N=16 and Preeclampsia N=15; Control N=16 and Preeclampsia N=17. p values shown: 3.49 × 10^-4, 1.79 × 10^-5, 1.92 × 10^-8.]
      March of Dimes Prematurity Research Center at Stanford University School of Medicine
      Credit: Linda Liu, Bruce Ling
  • 11. Sequencing Excitement
      • 454/Roche, Life Technologies
      • Helicos: $30k genome
      • Pacific Biosciences: sequence human genome in 15 minutes
      • Run times in minutes at a cost of hundreds of dollars
      • Complete Genomics: 80 genomes/day
      • Ion Torrent and Illumina: ~$1500 per genome
      • Oxford: USB stick
  • 12. Lancet, 375:1525, May 1, 2010.
  • 13. Credit: Euan Ashley, Russ Altman, Steve Quake, Lancet
  • 14. Curating a published study
      • Study published in 2008 in Inflammatory Bowel Disease
      • Crohn's Disease and Ulcerative Colitis
      • Investigated 9 loci in 700 Finnish IBD patients
      • We record 100+ items
          • GWAS, non-GWAS papers
          • Disease, Phenotype
          • Population, Gender
          • Alleles and Genotypes
          • p-value (and confidence)
          • Odds ratio (and confidence)
          • Technology, Study design
          • Genetic model
      • Mapped to UMLS concepts
      Credit: Rong Chen, Optra Systems
  • 16. Curating a second study
      • Study published in 2009 in Rheumatology
      • Ankylosing spondylitis
      • Investigated 8 SNPs in IL23R in 2000 UK case-control patients
      • Tables can be rotated
      • NLP is hard
  • 19. What are the alleles for rs1004819?
  • 20. Alleles for rs1004819 are C and T
      ~11% of records reported genotypes in the negative strand
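
The strand problem on slide 20 can be illustrated with a minimal sketch. It assumes that a record reported on the negative strand shows the complement of the reference alleles (for example, G and A instead of C and T). The function names and status labels are hypothetical and are not taken from VARIMED; the sketch handles single-base alleles only and raises a KeyError on anything else.

```python
# Hypothetical sketch: map alleles reported on either strand to reference alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the set of complementary bases for single-base alleles."""
    return {COMPLEMENT[a] for a in alleles}


def normalize_alleles(reported, reference):
    """Return (alleles, status) with status forward, flipped, ambiguous, or mismatch."""
    reported = {a.upper() for a in reported}
    reference = {a.upper() for a in reference}
    # A/T and C/G SNPs look identical on both strands, so the strand cannot be inferred.
    if complement(reference) == reference:
        status = "ambiguous" if reported <= reference else "mismatch"
        return reported, status
    if reported <= reference:
        return reported, "forward"
    flipped = complement(reported)
    if flipped <= reference:
        return flipped, "flipped"
    return reported, "mismatch"


# Assumed example: a paper reports G/A for a SNP whose reference alleles are C/T.
alleles, status = normalize_alleles({"G", "A"}, {"C", "T"})
print(sorted(alleles), status)
```

Returning a status instead of flipping silently keeps the ambiguous A/T and C/G cases visible for manual review.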
  • 21. VARIMED: Variants Informing Medicine
      • Number of papers curated: ~19,000
      • Number of records: ~1.6 million
      • Distinct SNPs: ~473,000
      • Diseases and phenotypes: ~7,400
      Chen R, Davydov EV, Sirota M, Butte AJ. PLoS One. 2010 October; 5(10): e13574.
      Credit: Rong Chen, Anil Patwardhan, Michael Clark, Optra Systems, Personalis
  • 22. Diseases and Traits
      • Risk factors are associated with an increased likelihood of developing a given disease
          • Smoking → chronic obstructive pulmonary disease
      • Risk factors are identified for diseases through large-scale epidemiological studies, which are resource intensive
      • GWAS have identified genetic variants for thousands of diseases and traits
      • If traits and diseases share the same associated genetic variants, could the trait be used to suggest risk factors for disease?
      Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
      Credit: Li Li
  • 23. Identify significant disease-trait genetic associations and clinically validate using EMR data
      [Workflow figure. Labels shown:]
      • Genetics module: VARIMED; Disease (n=201); Trait (n=249); gene counts > 3; Disease (n=69); Trait (n=85); TF-IDF weighting; cosine distance; random shuffling; q ≤ 0.01; Disease-Trait pairs (n=120); published findings (n=94); novel predictions (n=26); disease modules (n=8); trait modules (n=7)
      • Clinical validation: EMR cohort; p < 1e-8; risk factors (before dx); diagnostic tests (1st dx); complications (after dx)
      Credit: Li Li
  • 24. Assessing significance of disease-trait (D-T) pair
      • Each gene within an individual disease or trait is weighted by taking into account the frequency of the gene: Term Frequency–Inverse Document Frequency (TF-IDF)
          • $\text{tf-idf}(i,j) = \text{tf}(i,j) \times \text{idf}_i = \dfrac{n_{i,j}}{\sum_k n_{k,j}} \times \log\dfrac{D}{D_i}$, which adjusts the score of $\text{tf}(i,j)$ by taking into account the popularity level of gene $i$.
          • e.g., 154 D+T, 28 genes in Alzheimer's disease (AD) and 5 genes in ESR; CR1 was in common (so $D = 154$ and $D_i = 2$)
          • tf-idf (AD) = $1/28 \times \log_{10}(154/2) = 0.067$
          • tf-idf (ESR) = $1/5 \times \log_{10}(154/2) = 0.377$
      • The D-T distance score was calculated using cosine distance to evaluate similarity between all pairs.
          $\text{cosine similarity}(D,T) = \dfrac{D \cdot T}{\|D\|\,\|T\|} = \dfrac{\sum_{i=1}^{n} D_i \times T_i}{\sqrt{\sum_{i=1}^{n} (D_i)^2} \times \sqrt{\sum_{i=1}^{n} (T_i)^2}} = 0.9274524$
      • Genes were randomly sampled across all the traits and the D-T similarity was recalculated; this was repeated 1,000 times, and the q value was generated from the samplings.
      Li L, Ruau DJ, Patel CJ, Weber SC, Chen R, Tatonetti NP, Dudley JT, Butte AJ. Science Translational Medicine, 2014, 6(234).
      Credit: Li Li
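
The scoring on slide 24 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each gene counts once per disease or trait (so tf = 1/number of genes), uses a base-10 logarithm as in the worked example, and substitutes a simple empirical p-value for the q value, whose exact computation the slide does not spell out. All function names and the toy gene sets are hypothetical.

```python
import math
import random


def tf_idf_weight(n_genes, n_docs, doc_freq):
    """tf-idf of one gene, assuming tf = 1/n_genes and idf = log10(n_docs/doc_freq)."""
    return (1.0 / n_genes) * math.log10(n_docs / doc_freq)


def idf_table(gene_sets):
    """idf per gene, where a 'document' is one disease or trait gene set."""
    n_docs = len(gene_sets)
    doc_freq = {}
    for genes in gene_sets.values():
        for gene in genes:
            doc_freq[gene] = doc_freq.get(gene, 0) + 1
    return {gene: math.log10(n_docs / df) for gene, df in doc_freq.items()}


def weight_vector(genes, idf):
    """Sparse tf-idf vector for one gene set."""
    tf = 1.0 / len(genes)
    return {gene: tf * idf[gene] for gene in genes}


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def empirical_p(disease, trait, idf, n_iter=1000, seed=0):
    """Assumed null model: replace the trait genes with random genes of equal count."""
    rng = random.Random(seed)
    universe = sorted(idf)
    d_vec = weight_vector(disease, idf)
    observed = cosine_similarity(d_vec, weight_vector(trait, idf))
    hits = 0
    for _ in range(n_iter):
        shuffled = rng.sample(universe, len(trait))
        if cosine_similarity(d_vec, weight_vector(shuffled, idf)) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


# Worked example from the slide: CR1 in Alzheimer's disease (28 genes) and ESR (5 genes).
print(round(tf_idf_weight(28, 154, 2), 3), round(tf_idf_weight(5, 154, 2), 3))

# Toy gene sets (hypothetical) to exercise the similarity and the shuffling test.
gene_sets = {"D1": {"A", "B"}, "T1": {"A", "C"}, "T2": {"D", "E"}, "D2": {"F"}}
idf = idf_table(gene_sets)
similarity = cosine_similarity(
    weight_vector(gene_sets["D1"], idf), weight_vector(gene_sets["T1"], idf)
)
p_value = empirical_p(gene_sets["D1"], gene_sets["T1"], idf)
```

The sketch keeps the idf table fixed during shuffling for simplicity; recomputing it after each shuffle would be an equally plausible reading of the slide.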
  • 25. [Figure] Credit: Li Li
  • 26. [Figure] Credit: Li Li
  • 27. Categorizations for known D-T pairs and discovery of potential confounders in GWAS studies
      [Figure: trait (T), disease (D), and genetic variants arranged by timing of disease progression. Categories: Risk Factor, Diagnostic Test, Consequence. Counts shown: 38 pairs, 27 pairs, 28 pairs; 93 pairs.]
      Credit: Li Li
  • 28. Diagnostic tests where traits occur at the same time as disease onset
      • Antibody titer: Hepatitis B vaccine response. Png et al, Hum Mol Genet, 2011
      • Even though this GWAS did not explicitly include participants with the autoimmune diseases above, our approach inferred known relationships between diseases and traits based on their shared genetic architecture
      Credit: Li Li
  • 29. Significant genes shared between antibody titer and 16 autoimmune diseases
      Disease | Common Genes | Genes Shared | q-value
      Alopecia areata | 4 | BTNL2; C6orf10; RDBP; TNXB | <0.001
      Ankylosing spondylitis | 2 | BTNL2; LOC100507436 | 0.001
      Asthma | 4 | BTNL2; C6orf10; HLA-DPA1; NOTCH4 | <0.001
      Biliary liver cirrhosis | 3 | BTNL2; C6orf10; HLA-DPB1 | 0.003
      Chronic hepatitis B | 2 | HLA-DPA1; HLA-DPB1 | <0.001
      HIV infection | 7 | C6orf10; HLA-C; LOC100507436; NOTCH4; PRRC2A; RDBP; TNXB | <0.001
      Membranous nephropathy | 15 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; GPANK1; LY6G5B; LY6G6C; NOTCH4; PRRC2A; RDBP; RNF5; SLC44A4; TNXB; ZBTB12 | <0.001
      Multiple sclerosis | 7 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; NOTCH4; TNXB | <0.001
      Neonatal lupus | 3 | BAG6; C6orf10; ZBTB12 | <0.001
      Primary biliary cirrhosis | 3 | BTNL2; C6orf10; HLA-DPB1 | 0.005
      Rheumatoid arthritis | 20 | AGPAT1; BAG6; BTNL2; C6orf10; EHMT2; GPANK1; HLA-C; HLA-DPA1; HLA-DPB1; LOC100507436; LY6G5B; LY6G6C; LY6G6F; NOTCH4; PRRC2A; RDBP; RNF5; SLC44A4; TNXB; ZBTB12 | <0.001
      Systemic lupus erythematosus | 9 | BAG6; BTNL2; C6orf10; GPANK1; HLA-DPB1; NOTCH4; PRRC2A; TNXB; ZBTB12 | <0.001
      Systemic sclerosis | 3 | HLA-DPA1; HLA-DPB1; NOTCH4 | <0.001
      Type 1 diabetes | 5 | BAG6; BTNL2; C6orf10; HLA-C; HLA-DPB1 | 0.001
      Vitiligo | 6 | AGPAT1; BTNL2; NOTCH4; RNF5; SLC44A4; TNXB | <0.001
      Wegener's granulomatosis | 2 | HLA-DPA1; HLA-DPB1 | <0.001
      Credit: Li Li
  • 30. Risk factors where traits occur prior to the disease onset and may accompany disease
      Trait | Disease | Common Genes | Genes Shared | q-value
      Smoking | Chronic obstructive pulmonary disease | 3 | AGPHD1; CHRNA3; RAB4B | <0.001
      • Known clinical study: Smoking is the primary risk factor for COPD, although little was known about the pathogenesis linking smoking and COPD. Pauwels et al, 2001; Vestbo et al, 2012
      • In GWAS study: Six GWAS studies are related to COPD in VARIMED, and their COPD cohorts are all drawn from smoking patients. Cho et al, 2012; Pillai SG, 2010; Wang et al, 2010; Cho et al, 2010; Lambrechts et al, 2010; Pillai SG, 2009
      • As COPD occurs after smoking, the variants associated with COPD could be influenced by smoking, and the genetic variants for COPD could be unmasked if the smoking confounder is excluded in GWAS.
      [Figure: genetic variants, Smoking, COPD]
      Credit: Li Li
  • 31. Consequence where traits occur after the disease onset
      Disease: Alcohol dependence syndrome (ADS)
      Trait | Common Genes | Genes Shared | q-value
      Alanine aminotransferase levels | 1 | C12orf51 | 0.001
      Cholesterol levels | 3 | ALDH2; BRAP; C12orf51 | 0.001
      HDL cholesterol levels | 2 | C12orf51; OAS3 | <0.001
      • Known clinical study: The high HDL criterion was observed with triple frequency in the ADS group, a high cholesterol diet was associated with ADS patients, and ALT levels have been seen to increase with daily alcohol intake in patients who developed ADS. Kahl et al, 2010; Imhof et al, 2001; Gross GA, 1994
      • In GWAS study: 3 genes for cholesterol levels reported by Kato et al. and 2 genes for ALT and HDL-C reported by Young et al. could be biased by the effect of alcohol, as the authors did not adjust for alcohol intake or control for drinking habits on these genes in their GWAS studies. Kato et al, 2011; Kamatani et al, 2010
      • The GWAS to identify concrete genetic variants for these three clinical measurements should be performed in patients without ADS as a confounder
      [Figure: genetic variants, ADS, ALT, HDL-C]
      Credit: Li Li
  • 32. 27 novel pairs
      Trait | Disease | Common Genes | Genes Shared | q-value
      Mean corpuscular volume | Acute lymphoblastic leukemia | 1 | IKZF1 | 0.001
      Mean cell hemoglobin concentration | Alcohol dependence | 1 | ALDH2 | 0.005
      Platelet count | Alcohol dependence | 1 | C12orf51 | 0.007
      Lung function | Alopecia areata | 1 | AGER | 0.008
      Erythrocyte sedimentation rate | Alzheimer's disease | 1 | CR1 | 0.004
      Prostate-specific antigen levels | Basal cell carcinoma | 1 | CLPTM1L | 0.004
      Eye color | Chronic lymphocytic leukemia | 1 | IRF4 | 0.006
      Freckles | Chronic lymphocytic leukemia | 1 | IRF4 | 0.008
      Blood pressure | Esophageal cancer | 3 | ALDH2; C12orf51; PLCE1 | 0.009
      Factor VII coagulant activity | Esophageal cancer | 1 | ADH4 | 0.008
      Serum magnesium levels | Gastric cancer | 3 | MUC1; THBS3; TRIM46 | <0.001
      Prostate-specific antigen levels | Glioma | 1 | TERT | 0.005
      Alpha linolenic acid levels | Glucose intolerance | 1 | FADS1 | 0.01
      Alanine aminotransferase levels | Hypertension | 1 | C12orf51 | 0.003
      Serum transferrin levels | Hypertension | 1 | HFE | 0.005
      Smoking | Kawasaki disease | 1 | RAB4B | 0.003
      Prostate-specific antigen levels | Lung cancer | 2 | CLPTM1L; TERT | 0.001
      Homocysteine levels | Melanoma | 1 | C16orf55 | 0.01
      Protein C levels | Melanoma | 2 | NCOA6; PIGU | <0.001
      Transferrin receptor levels | Metabolic syndrome | 3 | APOA5; BUD13; ZNF259 | <0.001
      PR interval | Open-angle glaucoma | 1 | CAV1 | 0.002
      PR interval | Restless legs syndrome | 1 | MEIS1 | 0.003
      Bone mineral density | Sudden cardiac arrest | 1 | ESR1 | 0.006
      Acenocoumarol maintenance dosage | Systemic lupus erythematosus | 2 | ITGAM; ITGAX | 0.004
      Platelet count | Testicular cancer | 1 | BAK1 | 0.003
      Prostate-specific antigen levels | Testicular cancer | 2 | CLPTM1L; TERT | <0.001
      Alkaline phosphatase levels | Venous thromboembolism | 1 | ABO | 0.008
      Credit: Li Li
  • 33. Independent patient cohort validation: clinical data warehouses
      • STRIDE: a clinical data warehouse with ICD-9 diagnosis codes, CPT procedure codes, and lab results on over 1.7 million pediatric and adult patients at Stanford Hospital and Clinic; an independent cohort covering 1/1/2005 to 7/15/2012
      • Collaborations also with Columbia University and Mount Sinai School of Medicine to validate findings
      • Time frame for analysis: within one year before the 1st disease diagnosis or within one year after the 1st disease diagnosis
      [Figure: timeline around the 1st Dx with lab measurements in the 1 year before and the 1 year after; target disease (case) vs. non-target disease (control)]
      Credit: Li Li
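
The one-year analysis window on slide 33 can be sketched as a simple date filter. This is an illustration under stated assumptions, not the STRIDE query itself: the function name, the (date, value) record layout, and the lab values are hypothetical, one year is taken as 365 days, and a lab drawn on the day of the first diagnosis is counted as "after".

```python
from datetime import date, timedelta


def split_labs_by_window(labs, first_dx, window_days=365):
    """Split (date, value) lab records into those before and after the first diagnosis.

    Records outside the window on either side are dropped. A lab on the
    diagnosis date itself is placed in the 'after' group (an assumption).
    """
    window = timedelta(days=window_days)
    before = [(d, v) for d, v in labs if first_dx - window <= d < first_dx]
    after = [(d, v) for d, v in labs if first_dx <= d <= first_dx + window]
    return before, after


# Hypothetical serum magnesium results for one patient.
first_dx = date(2010, 6, 1)
labs = [
    (date(2009, 5, 1), 1.7),   # more than a year before: excluded
    (date(2010, 1, 15), 1.8),  # within one year before
    (date(2010, 6, 1), 2.0),   # on the diagnosis date: counted as after
    (date(2011, 3, 1), 2.1),   # within one year after
    (date(2011, 7, 1), 2.2),   # more than a year after: excluded
]
before, after = split_labs_by_window(labs, first_dx)
print([v for _, v in before], [v for _, v in after])
```

The same filter would be applied to the non-target (control) patients so that cases and controls are compared over matching windows.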
  • 34. Serum magnesium levels and gastric cancer
      Credit: Li Li
  • 36. Digital comparative effectiveness
      • Find precision subsets
      • If entry criteria are the same, outcome measures are the same, and the studies are comparable, one can perform a "meta-trial"
  • 37. Take Home Points
      • Personalized medicine ≥ DNA. Will include other clinical, molecular, and environment measures.
      • We need new investigators who can imagine basic questions to ask of these repositories of clinical and genomic measurements.
      • Bioinformatics is not just about building tools. We know our tools; we should use them first. Don't be afraid to test your ideas.
  • 38. Funded post-doctoral positions in Translational Bioinformatics
      Contact Atul Butte: abutte
  • 39. Collaborators
      • Jeff Wiser, Patrick Dunn, Mike Atassi / Northrop Grumman
      • Ashley Xia and Quan Chen / NIAID
      • Takashi Kadowaki, Momoko Horikoshi, Kazuo Hara, Hiroshi Ohtsu / U Tokyo
      • Kyoko Toda, Satoru Yamada, Junichiro Irie / Kitasato Univ and Hospital
      • Shiro Maeda / RIKEN
      • Alejandro Sweet-Cordero, Julien Sage / Pediatric Oncology
      • Mark Davis, C. Garrison Fathman / Immunology
      • Russ Altman, Steve Quake / Bioengineering
      • Euan Ashley, Joseph Wu, Tom Quertermous / Cardiology
      • Mike Snyder, Carlos Bustamante, Anne Brunet / Genetics
      • Jay Pasricha / Gastroenterology
      • Rob Tibshirani, Brad Efron / Statistics
      • Hannah Valantine, Kiran Khush / Cardiology
      • Ken Weinberg / Pediatric Stem Cell Therapeutics
      • Mark Musen, Nigam Shah / National Center for Biomedical Ontology
      • Minnie Sarwal / Nephrology
      • David Miklos / Oncology
  • 40. Support
      • Lucile Packard Foundation for Children's Health
      • NIH: NIAID, NLM, NIGMS, NCI, NIDDK, NHGRI, NIA, NHLBI, NCATS
      • March of Dimes
      • Hewlett Packard
      • Howard Hughes Medical Institute
      • California Institute for Regenerative Medicine
      • Luke Evnin and Deann Wright (Scleroderma Research Foundation)
      • Clayville Research Fund
      • PhRMA Foundation
      • Stanford Cancer Center, Bio-X, SPARK
      • Tarangini Deshpande
      • Alan Krensky, Harvey Cohen
      • Hugh O'Brodovich
      • Isaac Kohane
      Admin and Tech Staff
      • Susan Aptekar
      • Jen Cory
      • Boris Oskotsky