Friend UCSF 2012-07-27


Published on

Stephen Friend, July 27, 2012. University of California San Francisco, San Francisco, CA

Published in: Health & Medicine
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Friend UCSF 2012-07-27

  1. 1. Why not use data intensive science to build better models of diseases together? – beyond current rewards   Stephen  H  Friend  MD  PhD   Sage  Bionetworks  (non-­‐profit)   July  27,  2012   UCSF  
  2. 2. So  what  is  the  problem?        Most  approved  therapies  were  assumed  to  be   monotherapies  for  diseases  represen4ng  homogenous   popula4ons    Our  exis4ng  disease  models  o9en  assume  pathway   knowledge  sufficient  to  infer  correct  therapies  
  3. 3. Familiar but Incomplete
  4. 4. Reality: Overlapping Pathways
  5. 5. The value of appropriate representations/ maps
  6. 6. “Data Intensive” Science- Fourth Scientific Paradigm Equipment capable of generating massive amounts of data IT Interoperability Open Information System Host evolving computational models in a “Compute Space”
  8. 8. what will it take to understand disease?                    DNA    RNA  PROTEIN  (dark  maVer)    MOVING  BEYOND  ALTERED  COMPONENT  LISTS  
  9. 9. 2002 Can one build a “causal” model?
  10. 10. Preliminary Probabalistic Models- Rosetta /Schadt Networks facilitate direct identification of genes that are causal for disease Evolutionarily tolerated weak spots Gene symbol Gene name Variance of OFPM Mouse Source explained by gene model expression* Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12] Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple (UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg (Columbia University, NY) [11] C3ar1 Complement component 46% ko Purchased from Deltagen, CA 3a receptor 1 Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CANat Genet (2005) 205:370 factor beta receptor 2
  12. 12. List of Influential Papers in Network Modeling   50 network papers 
  13. 13. (Eric Schadt)
  14. 14. “Data Intensive” Science- Fourth Scientific Paradigm Score Card for Medical Sciences Equipment capable of generating massive amounts of data A- IT Interoperability D Open Information System D- Host evolving computational models in a “Compute Space F
  15. 15. We still consider much clinical research as if we were hunter gathers - not sharing .
  16. 16.  TENURE      FEUDAL  STATES      
  17. 17. Clinical/genomic data are accessible but minimally usableLittle incentive to annotate and curate data for other scientists to use
  18. 18. Mathematicalmodels of disease are not built to be reproduced orversioned by others
  19. 19. Lack of standard forms for future rights and consents
  20. 20. Lack of data standards..
  21. 21. Background:  Informaon  Commons  for  Biological  Funcons  
  22. 22. Sage Mission Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human diseaseBuilding Disease Maps Data RepositoryCommons Pilots Discovery Platform
  23. 23. Sage Bionetworks Collaborators  Pharma Partners   Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Johnson &Johnson  Foundations   Kauffman CHDI, Gates Foundation  Government   NIH, LSDF, NCI  Academic   Levy (Framingham)   Rosengren (Lund)   Krauss (CHORI)  Federation   Ideker, Califano, Nolan, Schadt 27
  24. 24. Biomedical Information Commons Better Models of Disease: INFORMATION COMMONS
  25. 25. Products/Approaches Technology Platform Governance Impactful Models Better Models of Disease: INFORMATION COMMONS Challenges
  26. 26. IT   Cons4tuencies   Individual   Pa4ents   Technology  PlaHorm     Biotech   Governance   ImpacHul  Models   BeVer  Models   Pa4ent   of  Disease:   Founda4ons   INFORMATION   COMMONS  Pharma   Challenges       Joint  Pa4ent/ Scien4st   Academic   Communi4es   Consor4a  
  27. 27. Ongoing  Sage  Bionetworks  Ini4a4ves     IT   Individual   Pharma   EU   Pa4ents   Roche PARTICIPATION            RNDP/FA/MEL   Takeda   Technology  PlaHorm     Communi2es  engaging    COMMONS  PLATFORM   Governance   WPP   ImpacHul  Models   Pa4ent   BeVer  Models  of   Founda4ons   Biotech   Disease:   INFORMATION   COMMONS  Common  Mind/   BrCA/Challenges  Mt.  Sinai  Neuro   Cell  Line     Challenges       Challenge   TCGA/Challenge   Joint  Pa4ent/ SB/Gates Academic   Scien4st   Discovery     Consor4a   ClearScience   Communi4es   Sage CCSB Network  
  28. 28. IT   Cons4tuencies   Individual   Pa4ents   Technology  PlaHorm     Biotech   Governance   ImpacHul  Models   BeVer  Models   Pa4ent   of  Disease:   Founda4ons   INFORMATION   COMMONS  Pharma   Challenges       Joint  Pa4ent/ Scien4st   Academic   Communi4es   Consor4a  
  29. 29. Impactful ModelsBreast Cancer: Co-expression Networks A) Miller 159 samples B) Christos 189 samplesNKI: N Engl J Med. 2002 Dec 19;347(25):1999.Wang: Lancet. 2005 Feb 19-25;365(9460):671.Miller: Breast Cancer Res. 2005;7(6):R953.Christos: J Natl Cancer Inst. 2006 15;98(4):262. C) NKI 295 samples E) Super modules Cell cycle Pre-mRNA ECM D) Wang 286 samples Blood vessel Immune response 33 Zhang B et al., Towards a global picture of breast cancer (manuscript).
  30. 30. CHRIS  GAITERI-­‐ALZHEIMER’S   What  is  this?  Bayesian  networks  enriched  in  inflammaon  genes    correlated  with  disease  severity  in  pre-­‐frontal  cortex  of  250  Alzheimer’s  paents.   What  does  it  mean?  Inflammaon    in  AD  is  an  interacve  mul-­‐pathway  system.    More  broadly,  network  structure  organizes  complex  disease  effects  into  coherent  sub-­‐systems  and  can  priorize  key  genes.   Are  you  joking?  Gene  validaon  shows  novel  key  drivers  increase  Abeta  uptake  and  decrease  neurite  length  through  an  ROS  burst.  (highly  relevant  to  AD  pathology)  
  31. 31. IMPACTFUL  MODELS   A  mul-­‐ssue  immune-­‐driven  theory  of  weight  loss   Hypothalamus   Lep4n   signaling   FaNy  acids   Macrophage/   inflamma4on   Liver   Adipose   M1  macrophage   Phagocytosis-­‐   Phagocytosis-­‐   induced  lipolysis   induced  lipolysis  
  32. 32. IT   Cons4tuencies   Individual   Pa4ents   Technology  PlaHorm     Biotech   Governance   ImpacHul  Models   BeVer  Models   Pa4ent   of  Disease:   Founda4ons   INFORMATION   COMMONS  Pharma   Challenges       Joint  Pa4ent/ Scien4st   Academic   Communi4es   Consor4a  
  33. 33. Two approaches to building common scientific and technical knowledge Every code change versioned Every issue trackedText summary of the completed project Every project the starting point for new workAssembled after the fact All evolving and accessible in real time Social Coding
  34. 34. Synapse is GitHub for Biomedical Data Every code change versioned Every issue trackedData and code versioned Every project the starting point for new workAnalysis history captured in real time All evolving and accessible in real timeWork anywhere, and share the results with anyone Social CodingSocial Science
  35. 35. Leveraging Existing TechnologiesAddama Taverna tranSMART
  36. 36. sage bionetworks synapse project Watch What I Do, Not What I Say
  37. 37. sage bionetworks synapse project Most of the People You Need to Work with Don’t Work with You
  38. 38. sage bionetworks synapse project My Other Computer is “The Cloud”
  39. 39. Data Analysis with SynapseRun Any ToolOn Any PlatformRecord in SynapseShare with Anyone
  40. 40. •  Automated  workflows  for  curaon,  QC,  and  sharing  of   1%/2* 53,6%(* !7"(%,2/"* large-­‐scale  datasets.  -./#"++0%(* (3&4"#* •  All  of  TCGA,  GEO,  and  user-­‐submiVed  data   processed  with  standard  normalizaon  methods.   1%/2* 53,6%(* !7"(%,2/"* •  Searchable  TCGA  data:  -./#"++0%(* (3&4"#* •  23  cancers   •  11  data  plahorms   •  Standardized  meta-­‐data  ontologies  -./#"++0%(* -./#"++0%(* !7"(%,2/"* !7"(%,2/"* 1%/2* 1%/2* (3&4"#* (3&4"#* 53,6%(* 53,6%(* !#"80)69"*&%8":* ;"("#6%(* !"#$%#&()"* ++"++&"(,*
  41. 41. 1%/2* 53,6%(* !7"(%,2/"* •  Comparison  of  many  modeling  approaches  applied  -./#"++0%(* (3&4"#* to  the  same  data.   •  Models  transparently  shared  and  reusable  through  -./#"++0%(* 1%/2* 53,6%(* !7"(%,2/"* Synapse.   (3&4"#* •  Displayed  is  comparison  of  6  modeling  approaches   to  predict  sensivity  to  130  drugs.   •  Extending  pipeline  to  evaluate  predicon  of  -./#"++0%(* -./#"++0%(* !7"(%,2/"* !7"(%,2/"* TCGA  phenotypes.   1%/2* 1%/2* (3&4"#* (3&4"#* •  Hosng  of  collaborave  compeons  to  compare   53,6%(* 53,6%(* models  from  many  groups.   1--&2-3$4567$ !#"80)69"*&%8":* *&+%,-./0$ ;"("#6%(* !"#$%#&()"* ++"++&"(,* !"#$%&()$
  43. 43. IT   Cons4tuencies   Individual   Pa4ents   Technology  PlaHorm     Biotech   Governance   ImpacHul  Models   BeVer  Models   Pa4ent   of  Disease:   Founda4ons   INFORMATION   COMMONS  Pharma   Challenges       Joint  Pa4ent/ Scien4st   Academic   Communi4es   Consor4a  
  44. 44. Clinical Trial Comparator Arm Partnership (CTCAP)  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development. Started Sept 2010
  45. 45. Shared clinical/genomic data sharing and analysis will maximize clinical impact and enable discovery•  Graphic  of  curated  to  qced  to  models  
  46. 46. Arch2POCM  Restructuring  the  Precompeve   Space  for  Drug  Discovery   How  to  potenally  De-­‐Risk       High-­‐Risk  Therapeuc  Areas  
  47. 47. The  Federaon  
  48. 48. How can we accelerate the pace of scientific discovery? 2008   2009   2010   2011   Ways to move beyond “traditional” collaborations? Intra-lab vs Inter-lab Communication Pfizer CTI/ Industrial PPPs Academic Unions
  49. 49. (Nolan  and  Haussler)  
  50. 50. sage federation:model of biological age Faster Aging Predicted  Age  (liver  expression)   Slower Aging Clinical Association -  Gender -  BMI -  Disease Age Differential Genotype Association Gene Pathway Expression Chronological  Age  (years)  
  51. 51. Reproducible  science==shareable  science   Sweave: combines programmatic analysis with narrativeDynamic generation of statistical reports using literate data analysis Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reportsusing literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 – Proceedings in Computational Statistics,pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
  52. 52. Portable  Legal  Consent   (Acvang  Paents)   John  Wilbanks  
  53. 53.  
  54. 54. IT   Cons4tuencies   Individual   Pa4ents   Technology  PlaHorm     Biotech   Governance   ImpacHul  Models   BeVer  Models   Pa4ent   of  Disease:   Founda4ons   INFORMATION   COMMONS  Pharma   Challenges       Joint  Pa4ent/ Scien4st   Academic   Communi4es   Consor4a  
  55. 55. What  is the problem?Our current models of disease biology are primitive and limit doctor’s understanding and ability to treat patientsCurrent incentives reward those whosilo information and work in closedsystems
  56. 56. The Solution: Competitions to crowd-source research in biology and other fields  Why competitions? •  Objective assessments •  Acceleration of progress •  Transparency •  Reproducibility •  Extensible, reusable models  Competitions in biomedical research •  CASP (protein structure) •  Fold it / EteRNA (protein / RNA structure) •  CAGI (genome annotation) •  Assemblethon / alignathon (genome assembly / alignment) •  SBV Improver (industrial methodology benchmarking) •  DREAM (co-organizer of Sage/DREAM competition)  Generic competition platforms •  Kaggle, Innocentive, MLComp
  57. 57. The Sage/DREAM breast cancer prognosis challengeGoal: Challenge to assess the accuracy of computational models designed topredict breast cancer survival using patient clinical and genomic dataWhy this is unique:  This Sage/DREAM Challenge is a pre-collated cohort: 2000 breast cancer samples from the Metabric cohort  Accessible to all: A cloud-based common compute architecture is being made available by Google to support the computational models needed to develop and test challenge models  New Rigor: •  Contestants will evaluate their models on a validation data set composed of newly generated data (provided by Dr. Anne-Lise Borreson Dale) •  Contestants must demonstrate their models can be reproduced by others  New incentives: leaderboard to energize participants, Science Translational Medicine publication for winning team  Breast cancer patients, funders and researchers can track this Challenge on BRIDGE, an open source online community being built by Sage and Ashoka Changemakers and affiliated with this Challenge
  58. 58. Sage/DREAM Challenge: Details and TimingPhase  1: July thru end-Sep 2012 Phase  2:  Oct 1 thru Nov 12, 2012  Training data: 2,000 breast cancer samples from METABRIC cohort   Evaluation of models in novel dataset. •  Gene expression •  Copy number   Validation data: ~500 fresh frozen •  Clinical covariates tumors from Norway group with: •  10 year survival •  Clinical covariates  Supporting data: Other Sage-curated •  10 year survival breast cancer datasets •  >1,000 samples from GEO   Gene expression and copy number data to be generated for model •  ~800 samples from TCGA evaluation •  ~500 additional samples from •  Sent to Cancer Research UK to Norway group generate data at same facility as •  Curated and available on METABRIC Synapse, Sage’s compute •  Models built on training data platform evaluated on newly generated data  Data released in phases on Synapse from now through end-September   Winners announced at November 12 DREAM conference  Will evaluate accuracy of models built on METABRIC data to predict survival in: •  Held out samples from METABRIC •  Other datasets
  59. 59. METABRIC  cohort:   1%/2* 53,6%(* !7"(%,2/"* 997  breast  cancer  samples  -./#"++0%(* (3&4"#* Clinical  covariates   1%/2* 53,6%(* !7"(%,2/"*-./#"++0%(* (3&4"#* Gene  expression   (Illumina  HT12v3)   Copy  number   (Affy  SNP  6.0)  -./#"++0%(* -./#"++0%(* !7"(%,2/"* !7"(%,2/"* 1%/2* 1%/2* (3&4"#* (3&4"#* 53,6%(* 53,6%(* 10  year  survival   !#"80)69"*&%8":* ;"("#6%(* !"#$%#&()"* ++"++&"(,* Loaded  through  Synapse  R  client  as   Bioconductor  objects.  
  60. 60. 1%/2* 53,6%(* !7"(%,2/"*-./#"++0%(* (3&4"#* 1%/2* 53,6%(* !7"(%,2/"*-./#"++0%(* (3&4"#*-./#"++0%(* -./#"++0%(* !7"(%,2/"* !7"(%,2/"* 1%/2* 1%/2* Custom  models  implement  train()  and   (3&4"#* (3&4"#* predict()  API.   53,6%(* 53,6%(* !#"80)69"*&%8":* ;"("#6%(* !"#$%#&()"* ++"++&"(,* Implementa)on  of  simple  clinical-­‐only  survival   model  used  as  baseline  predictor.  
  61. 61. Models  submiNed  and   Federa4on  modeling   evaluated  in  real-­‐4me   compe44on   leaderboard   >200  models  tested  within  3  months   9-.1+:2% 712<2:4=($) 5"8+,% &,%726(%*+,F) >#,%7+-#"34,#) >4<+<)71#G8#,%H"4#,) D+"6%I4+<) ?+C%D+"F2<4,) ?,"#+% 9+-"+:% *-.14,%9-4,,#$) >#,%J2F.2,) @+<4A+,2) 5"46%768+1) ;+,#$) B4.8+4% 9+""$%E2<+,) 784C2,4) !"#$%&#(#") D-(#.8% *+,-./%0-1(23.(4) >+,.+<) D+"4+,2% ?<:+"#/)
  62. 62. SummaryhVps://  -­‐  BCCOverview:0  Transparency,   Valida4on  in  novel  reproducibility   -./#"++0%(* 1%/2* (3&4"#* 53,6%(* !7"(%,2/"* dataset   1%/2* 53,6%(* !7"(%,2/"* -./#"++0%(* (3&4"#* -./#"++0%(* -./#"++0%(* !7"(%,2/"* !7"(%,2/"* 1%/2* 1%/2* (3&4"#* (3&4"#* 53,6%(* 53,6%(* !#"80)69"*&%8":* ;"("#6%(* !"#$%#&()"* ++"++&"(,*Publica4on  in  Science   Dona4on  of  Google-­‐Transla4onal  Medicine   scale  compute  space.   For  the  goal  of  promo4ng  democra4za4on  of  medicine…   Registra4on  star4ng  NOW…   sign  up  at:  
  63. 63. SUMMARY   These  new  data  intensive  models  of  disease     will  be  strikingly  powerful  They  will  not  arise  within  the  current  academic/industrial  loop   They  will  be  harder  and  more  expensive  than  we  can  accept   Cizenss  as  donors  of  data  insights  and  funds  will  be  crical   For  these  benefits  to  be  realized  -­‐  must  become  affordable   therefore  we  willl  need:   Compute  spaces   A  Commons   New  ways  of  being  rewarded   More  eyeballs  working  without  being  paid   More  willingness  to  share  ll  aoer  Clinical  Proof  of  Concept  
  64. 64. Alignments between the UCSF Strategic Plan and the matchstick pilots for the Information Commons being pursued by Sage Bionetworks •  Invest in infrastructure that enables UCSF to excel in basic, clinical and population science- SYNAPSE as a compute space -links with Michael Wiener/ OneMind •  Build a Bioinformatics initiative across all school by June 2014- Impactful Models / Challenges with Laura Esserman/ LauraVant’Veer •  Enhance existing data repositories and mining tools by June 2012 - Sage collaborations with Alex Pico, Geoff Manley •  Develop infrastructure to support new team-based, interdisciplinary learning models- PLC (The Federation?) •  Accelerate translation of groundbreaking science into therapies Arch2POCM