Stephen Friend Koo Foundation / Sun Yat-Sen Cancer Center 2012-03-12


Stephen Friend, Mar 12, 2012. Koo Foundation / Sun Yat-Sen Cancer Center, Taipei, Taiwan

Published in: Health & Medicine

  1. 1. Moving beyond linear investigations, both of the science and of how we work: integrating layers of omics data models and building compute spaces capable of enabling models to be evolved by teams of teams. Koo Foundation / Sun Yat-Sen Cancer Center, March 12, 2012. Stephen Friend MD PhD, Sage Bionetworks (Non-Profit Organization), Seattle / Beijing / Amsterdam
  2. 2. So what is the problem? Most approved therapies were assumed to be monotherapies for diseases representing homogeneous populations. Our existing disease models often assume pathway knowledge sufficient to infer correct therapies.
  3. 3. Familiar but Incomplete
  4. 4. Reality: Overlapping Pathways
  5. 5. Explosion of Biological Genomic & Clinical Information • Computational methods for integrating massive molecular and clinical datasets obtained across sizable populations into predictive disease models can recapitulate complex biological systems • Data should feed and refine a set of models that inform our understanding of disease causality as well as generate new mechanisms, targets, diagnostics and knowledge.
  6. 6. The value of appropriate representations/ maps
  7. 7. “Data Intensive” Science: Fourth Scientific Paradigm. Equipment capable of generating massive amounts of data; IT interoperability; open information system; host evolving computational models in a “Compute Space”
  9. 9. What will it take to understand disease? DNA, RNA, PROTEIN (dark matter). MOVING BEYOND ALTERED COMPONENT LISTS
  10. 10. 2002 Can one build a “causal” model?
  11. 11. Preliminary Probabilistic Models (Rosetta / Schadt). Networks facilitate direct identification of genes that are causal for disease. Evolutionarily tolerated weak spots. Nat Genet (2005) 205:370

     | Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source |
     |---|---|---|---|---|
     | Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics |
     | Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics |
     | Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12] |
     | Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics |
     | Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO |
     | Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13] |
     | Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11] |
     | C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA |
     | Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA |
  12. 12. Building Realistic, Predictive Models of Disease: Can this lead us from gene to drug?
  13. 13. Extensive Publications now Substantiating Scientific Approach: Probabilistic Causal Bionetwork Models
     • >60 publications from Rosetta Genetics Group (~30 scientists) over 5 years, including high-profile papers in PLoS, Nature and Nature Genetics
     • Metabolic Disease: "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003); "Variations in DNA elucidate molecular networks that cause disease." Nature. (2008); "Genetics of gene expression and its effect on disease." Nature. (2008); "Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009); plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.
     • CVD: "Identification of pathways for atherosclerosis." Circ Res. (2007); "Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008); plus 5 additional papers in Genome Res., Genomics, Mamm. Genome
     • Bone: "Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005); "..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
     • Methods: "An integrative genomics approach to infer causal associations ..." Nat Genet. (2005); "Increasing the power to detect causal associations…" PLoS Comput Biol. (2007); "Integrating large-scale functional genomic data ..." Nat Genet. (2008); plus 3 additional papers in PLoS Genet., BMC Genet.
  14. 14. List of Influential Papers in Network Modeling: 50 network papers
  15. 15. (Eric Schadt)
  16. 16. “Data Intensive” Science: Fourth Scientific Paradigm. Score Card for Medical Sciences. Equipment capable of generating massive amounts of data: A-; IT interoperability: D; open information system: D-; host evolving computational models in a “Compute Space”: F
  17. 17. We still consider much clinical research as if we were hunter-gatherers, not sharing.
  18. 18. TENURE FEUDAL STATES
  19. 19. Clinical/genomic data are accessible but minimally usable. Little incentive to annotate and curate data for other scientists to use.
  20. 20. Mathematical models of disease are not built to be reproduced or versioned by others
  21. 21. Lack of standard forms for future rights and consents
  22. 22. Sage Mission: Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease. Building Disease Maps; Data Repository; Commons Pilots; Discovery Platform
  23. 23. Sage Bionetworks Collaborators
     • Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche, Johnson & Johnson
     • Foundations: Kauffman, CHDI, Gates Foundation
     • Government: NIH, LSDF, NCI
     • Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
     • Federation: Ideker, Califano, Nolan, Schadt
  24. 24. Model of Breast Cancer: Co-expression. A) Miller, 159 samples; B) Christos, 189 samples; C) NKI, 295 samples; D) Wang, 286 samples; E) Super modules: Cell cycle, Pre-mRNA, ECM, Blood vessel, Immune response. Sources: NKI: N Engl J Med. 2002 Dec 19;347(25):1999. Wang: Lancet. 2005 Feb 19-25;365(9460):671. Miller: Breast Cancer Res. 2005;7(6):R953. Christos: J Natl Cancer Inst. 2006 15;98(4):262. Zhang B et al., Towards a global picture of breast cancer (manuscript).
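As a rough illustration of what "co-expression modules" means on the slide above, the sketch below groups genes whose expression profiles are strongly correlated across samples. It is a minimal sketch under stated assumptions: the slide does not say which method was used, so Pearson correlation with a hard threshold and connected-component grouping are stand-ins, and the gene names, values, and the `coexpression_modules` helper are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_modules(expr, threshold=0.9):
    """Group genes into modules: connected components of the graph whose
    edges join gene pairs with |r| >= threshold (an assumed rule)."""
    genes = sorted(expr)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(modules.values())


# Hypothetical expression values: one row per gene, one column per sample.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
    "geneD": [4.9, 1.2, 4.1, 1.8],
}
modules = coexpression_modules(toy_expr)
print(modules)
```

On real cohorts such as those listed on the slide, the threshold choice and the grouping rule matter a great deal; this toy version only shows the shape of the computation.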
  25. 25. CHRIS GAITERI: ALZHEIMER’S. What is this? Bayesian networks enriched in inflammation genes correlated with disease severity in pre-frontal cortex of 250 Alzheimer’s patients. What does it mean? Inflammation in AD is an interactive multi-pathway system. More broadly, network structure organizes complex disease effects into coherent sub-systems and can prioritize key genes. Are you joking? Gene validation shows novel key drivers increase Abeta uptake and decrease neurite length through an ROS burst (highly relevant to AD pathology).
  26. 26. PLATFORM: Sage Platform and Infrastructure Builders (Academic, Biotech and Industry IT Partners...). PILOTS = PROJECTS FOR COMMONS: Data Sharing Commons Pilots (Federation, CCSB, Inspire2Live....)
  27. 27. Why not share clinical/genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning)?
  28. 28. Leveraging Existing Technologies: Addama, Taverna, tranSMART
  29. 29. Sage Bionetworks Synapse project: Watch What I Do, Not What I Say
  30. 30. Sage Bionetworks Synapse project: Most of the People You Need to Work with Don’t Work with You
  31. 31. Sage Bionetworks Synapse project: My Other Computer is Cloudera, Amazon, Google
  32. 32. Sage Metagenomics Project: Processed Data (S3) • >10k genomic and expression standardized datasets indexed in SCR • Error detection, normalization in mG • Access raw or processed data via download or API in downstream analysis • Building towards open, continuous community curation
  33. 33. Sage Metagenomics using Amazon Simple Workflow Full case study at
  34. 34. Synapse Roadmap
     • Synapse Platform Functionality: Data Repository; Projects and security; R integration; Analysis provenance; Publishing figures; Search; Controlled Vocabularies; Governance of restricted data; Workflow templates; Social networking; User-customized dashboards; Wiki & collaboration tools; Integrated management of cloud resources; R Studio integration; Curation tool integration
     • Release phases: Internal Alpha; Public Beta Testing; Synapse 1.0; Synapse 1.5; Future
     • Timeline: Q1-2012, Q2-2012, Q3-2012, Q4-2012, Q1-2013, Q2-2013, Q3-2013, Q4-2013
     • Data / Analysis Capabilities: TCGA; METABRIC breast cancer challenge; Predictive modeling workflows; Automated processing of common genomics platforms; TBD: Integrations with other visualization and analysis packages; 40+ manually curated clinical studies; 8000+ GEO / Array Express datasets; Clinical, genomic, compound sensitivity; Bioconductor and custom R analysis
  35. 35. Four Pilots involving Sage Bionetworks: CTCAP; The Federation; Portable Legal Consent; Sage Congress Project
  36. 36. Clinical Trial Comparator Arm Partnership (CTCAP)
     • Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.
     • Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.
     • Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].
     • Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create a powerful new tool for drug development. Started Sept 2010.
  37. 37. Sharing clinical/genomic data and analysis will maximize clinical impact and enable discovery • Graphic of curated to QCed to models
  38. 38. The Federation
  39. 39. (Nolan and Haussler)
  40. 40. Sage Federation: model of biological age. [Figure: Predicted Age (liver expression) plotted against Chronological Age (years), with Faster Aging and Slower Aging regions. Age Differential is related to Clinical Association (Gender, BMI, Disease), Genotype Association, and Gene / Pathway Expression.]
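The biological-age slide above can be read as a regression of age on expression, with the gap between predicted and chronological age as the quantity of interest. The sketch below is a minimal illustration under stated assumptions: a single hypothetical per-sample "expression score" stands in for the liver expression data, ordinary least squares stands in for whatever model the Federation actually used, and the sign convention (predicted minus chronological, positive meaning "faster aging") is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def age_differential(score, chronological_age):
    """Predicted age minus chronological age for each sample.

    Positive values are read here as 'faster aging', negative as
    'slower aging' (an assumed convention, not taken from the slide).
    """
    slope, intercept = fit_line(score, chronological_age)
    predicted = [slope * s + intercept for s in score]
    return [p - a for p, a in zip(predicted, chronological_age)]


# Hypothetical data: one expression-derived score and one age per sample.
toy_score = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_age = [30.0, 45.0, 50.0, 55.0, 70.0]

diffs = age_differential(toy_score, toy_age)
for age, d in zip(toy_age, diffs):
    label = "faster" if d > 0 else "slower" if d < 0 else "as expected"
    print(f"chronological {age:.0f}: differential {d:+.1f} ({label})")
```

The per-sample differential is what one would then test for association with gender, BMI, disease, genotype, or pathway expression, as the figure labels suggest.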
  41. 41. Reproducible science == shareable science. Sweave: combines programmatic analysis with narrative. Dynamic generation of statistical reports using literate data analysis. Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
  42. 42. Federated Aging Project: Combining analysis + narrative = Sweave Vignette. [Diagram labels: Sage Lab, Califano Lab, Ideker Lab; R code + narrative; PDF (plots + text + code snippets); HTML; Data objects; Submitted Paper; Shared Data Repository; JIRA: Source code repository & wiki]
  43. 43. For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity. Can the approach make new discoveries? [Figure labels, with ranks as shown: CML lineage (#1, #2); EGFR mut (#1); ERBB2 expr (#1); BRAF mut (#1); HGF expr (#1); NRAS mut (#2); KRAS mut (#3); TP53 mut (#2); CDKN2A copy (#3); MDM2 expr (#1)]
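To make "the #1 predictive feature" concrete, the sketch below ranks candidate features for one compound by how strongly each tracks drug sensitivity across cell lines. It is a sketch under stated assumptions: the slide does not name the analysis method, so ranking by absolute Pearson correlation is a simple stand-in, the feature names are borrowed from the figure labels, and every number is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(features, sensitivity):
    """Feature names sorted by |correlation with sensitivity|, strongest first."""
    scores = {name: abs(pearson(values, sensitivity))
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical panel of six cell lines treated with one compound.
toy_sensitivity = [0.90, 0.80, 0.85, 0.20, 0.10, 0.15]
toy_features = {
    "EGFR_mut": [1, 1, 1, 0, 0, 0],  # mutation status (1 = mutant)
    "KRAS_mut": [0, 1, 0, 1, 0, 1],
    "TP53_mut": [1, 0, 1, 1, 0, 0],
}

ranked = rank_features(toy_features, toy_sensitivity)
print(ranked)
```

Checking whether the top-ranked feature matches a known stratifier, as the slide reports for 11 of 12 compounds, would then be a per-compound comparison against a curated list.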
  44. 44. Portable Legal Consent: Activating Patients; Enabling Sharing; Becoming Partners (John Wilbanks)
  45. 45.  
  46. 46. Sage Congress Project, April 20 2012: RealNames Parkinson’s Project; Fanconi’s Anemia; Revisiting Breast Cancer Prognosis (Responders Competitions: IBM-DREAM)
  47. 47. Networking Disease Model Building