Stephen Friend Molecular Imaging Program at Stanford (MIPS) 2011-08-15


Stephen Friend, Aug 15, 2011. Molecular Imaging Program at Stanford (MIPS), Palo Alto, CA

Published in: Health & Medicine


  1. 1. Use of Bionetworks to Build Maps of Diseases. Stephen Friend, MD, PhD. Sage Bionetworks (Non-Profit Organization), Seattle / Beijing / San Francisco. MIPS Seminar Series, August 15th, 2011
  2. 2. • Why consider the fourth paradigm: data-intensive science • Thinking beyond the narrative, beyond pathways • Advantages of an open innovation compute space • It is more about how than what
  3. 3. Treating Symptoms vs. Modifying Diseases: Alzheimer’s, Diabetes, Cancer, Obesity. Will it work for me?
  4. 4. Familiar but Incomplete
  5. 5. Reality: Overlapping Pathways
  7. 7. “Data Intensive Science”: the “Fourth Scientific Paradigm”. For building “Better Maps of Human Disease”:
      • Equipment capable of generating massive amounts of data
      • IT Interoperability
      • Open Information System
      • Evolving Models hosted in a Compute Space (Knowledge Expert)
  8. 8. It is now possible to carry out comprehensive monitoring of many traits at the population level. Monitor disease and molecular traits in populations. [Figure labels: Putative causal gene, Disease trait]
  9. 9. What will it take to understand disease? DNA, RNA, PROTEIN (dark matter). MOVING BEYOND ALTERED COMPONENT LISTS
  10. 10. 2002: Can one build a “causal” model?
  11. 11. How is genomic data used to understand biology?
      • “Standard” GWAS Approaches: identify causative DNA variation but provide NO mechanism
      • Profiling Approaches: genome-scale profiling provides correlates of disease. Many examples, BUT what is cause and effect?
      • “Integrated” Genetics Approaches: provide an unbiased view of molecular physiology as it relates to disease phenotypes; insights on mechanism; provide causal relationships and allow predictions
      [Figure labels: RNA amplification, Tumors, Microarray hybridization, Gene Index, trait]
  12. 12. Integration of Genotypic, Gene Expression & Trait Data
      • Causal Inference
      • “Global Coherent Datasets”: population based, 100s-1000s individuals
      • References: Schadt et al. Nature Genetics 37: 710 (2005); Millstein et al. BMC Genetics 10: 23 (2009); Chen et al. Nature 452:429 (2008); Zhu et al. Cytogenet Genome Res. 105:363 (2004); Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005); Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
  13. 13. Constructing Co-expression Networks
      Start with expression measures for the most variant genes across 100s ++ samples (brain sample expression). Note: NOT a gene expression heatmap.
      Establish a 2D correlation matrix for all gene pairs:
              1     2     3     4
         1    1    0.8   0.2  -0.8
         2   0.8    1    0.1  -0.6
         3   0.2   0.1    1   -0.1
         4  -0.8  -0.6  -0.1    1
      Define a threshold (e.g. >0.6 for an edge) to get the connection matrix:
              1  2  3  4
         1    1  1  0  1
         2    1  1  0  1
         3    0  0  1  0
         4    1  1  0  1
      Hierarchically cluster to get the clustered connection matrix (gene order 1, 2, 4, 3):
              1  2  4  3
         1    1  1  1  0
         2    1  1  1  0
         4    1  1  1  0
         3    0  0  0  1
      Identify modules. Network module: a set of genes for which many pairs interact (relative to the total number of pairs in that set).
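The thresholding step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used by the speaker: the names `CORR`, `adjacency` and `modules` are made up here, the edge rule is assumed to be |r| >= 0.6 (the slide's connection matrix links genes 2 and 4, whose correlation is -0.6), and modules are found as connected components rather than by the hierarchical clustering the slide names.

```python
# Correlation matrix for genes 1-4, copied from the slide.
CORR = [
    [1.0, 0.8, 0.2, -0.8],
    [0.8, 1.0, 0.1, -0.6],
    [0.2, 0.1, 1.0, -0.1],
    [-0.8, -0.6, -0.1, 1.0],
]


def adjacency(corr, threshold=0.6):
    """Connection matrix: 1 where |correlation| >= threshold (assumed rule)."""
    n = len(corr)
    return [
        [1 if abs(corr[i][j]) >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def modules(adj):
    """Group genes into connected components (a stand-in for clustering)."""
    n = len(adj)
    seen = set()
    result = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in range(n):
                if adj[node][neighbour] and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Report 1-based gene labels to match the slide.
        result.append(sorted(gene + 1 for gene in component))
    return result


ADJ = adjacency(CORR)
print(ADJ)           # connection matrix
print(modules(ADJ))  # gene modules
```

On the slide's four genes this groups genes 1, 2 and 4 into one module and leaves gene 3 on its own, which matches the clustered connection matrix. On real data with hundreds of samples, a proper clustering step would replace the connected-components shortcut.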
  14. 14. Preliminary Probabilistic Models (Rosetta / Schadt). Networks facilitate direct identification of genes that are causal for disease. Evolutionarily tolerated weak spots.
      Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
      Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
      Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
      Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
      Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
      Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
      Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
      Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
      C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
      Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
      Nat Genet (2005) 205:370
  15. 15. List of Influential Papers in Network Modeling: 50 network papers
  16. 16. (Eric Schadt)
  17. 17. Recognition that the benefits of bionetwork-based molecular models of diseases are powerful but that they require significant resources. Appreciation that it will require decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions.
  18. 18. Sage Mission: Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease. • Building Disease Maps • Data Repository • Commons Pilots • Discovery Platform
  19. 19. Sage Bionetworks Collaborators
      • Pharma Partners: Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Johnson & Johnson
      • Foundations: Kauffman, CHDI, Gates Foundation
      • Government: NIH, LSDF
      • Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
      • Federation: Ideker, Califano, Butte, Schadt
  20. 20. Engaging Communities of Interest
      • NEW MAPS: Disease Map and Tool Users (Scientists, Industry, Foundations, Regulators...)
      • PLATFORM: Sage Platform and Infrastructure Builders (Academic Biotech and Industry IT Partners...)
      • RULES AND GOVERNANCE: Data Sharing Barrier Breakers (Patient Advocates, Governance and Policy Makers, Funders...)
      • NEW TOOLS: Data Tool and Disease Map Generators (Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs…)
      • PILOTS = PROJECTS FOR COMMONS: Data Sharing Commons Pilots (Federation, CCSB, Inspire2Live....)
  21. 21. [Diagram: Platform, Commons, Research] Disease areas: Cancer, Neurological Disease, Metabolic Disease. Components: Building Disease Maps, Data Repository, Commons Pilots, Discovery Platform. Other labels: Curation/Annotation, CTCAP, Public Data, Merck Data, TCGA/ICGC, Outposts, Federation, CCSB, Inspire2Live, POC, Pfizer, Merck, Takeda, Astra Zeneca, CHDI, Gates, NIH, LSDF-WPP, LSDF, Hosting Data, Hosting Tools, Hosting Models, Bayesian Models, Co-expression Models, Tools & Methods, KDA/GSVA
  22. 22. Example 1: Breast Cancer. [Workflow labels: Coexpression Networks, Module combination, Partition BN, Bayesian Network, Survival Analysis] (Zhang B et al., manuscript)
  23. 23. Generation of Co-expression & Bayesian Networks from published Breast Cancer Studies. 4 Public Breast Cancer Datasets:
      • NKI (295 samples): van de Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19;347(25):1999-2009.
      • Wang (286 samples): Wang Y et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005 Feb 19-25;365(9460):671-9.
      • Miller (159 samples): Pawitan Y et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7(6):R953-64.
      • Christos (189 samples): Sotiriou C et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006 Feb 15;98(4):262-72.
  24. 24. Recovery of EGFR and Her2 oncoprotein downstream pathways by super modules
  25. 25. Comparison of Super-modules with EGFR and Her2 signaling and resistance pathways
  26. 26. Key Driver Analysis
      • Identify key regulators for a list of genes h and a network N
      • Check the enrichment of h in the downstream of each node in N
      • The nodes significantly enriched for h are the candidate drivers
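The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the slide does not say which enrichment statistic is used, so a one-sided hypergeometric test is assumed here, no multiple-testing correction is applied, and the network, gene names and function names (`downstream`, `enrichment_pvalue`, `key_drivers`) are all hypothetical. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def downstream(network, node):
    """Return every node reachable from `node` along directed edges."""
    seen = set()
    stack = list(network.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(network.get(current, []))
    seen.discard(node)  # ignore cycles that lead back to the start node
    return seen


def enrichment_pvalue(total, hits_total, drawn, hits_drawn):
    """Hypergeometric upper tail P(X >= hits_drawn) (assumed statistic)."""
    upper = min(drawn, hits_total)
    favourable = sum(
        comb(hits_total, k) * comb(total - hits_total, drawn - k)
        for k in range(hits_drawn, upper + 1)
    )
    return favourable / comb(total, drawn)


def key_drivers(network, gene_list, alpha=0.05):
    """Nodes whose downstream set is enriched for gene_list (h) in network (N)."""
    nodes = set(network)
    for children in network.values():
        nodes.update(children)
    hits = set(gene_list) & nodes
    candidates = []
    for node in sorted(nodes):
        reach = downstream(network, node)
        if not reach:
            continue  # leaf nodes regulate nothing
        p = enrichment_pvalue(len(nodes), len(hits), len(reach), len(reach & hits))
        if p < alpha:
            candidates.append((node, p))
    return sorted(candidates, key=lambda item: item[1])


# Hypothetical toy network: parent -> list of downstream children.
NETWORK = {"A": ["B", "C", "D"], "B": ["E"], "X": ["Y", "Z"]}
GENE_LIST = ["B", "C", "D", "E"]  # the gene list h
DRIVERS = key_drivers(NETWORK, GENE_LIST)
print(DRIVERS)
```

In this toy network only node A passes the 0.05 cut-off, because all four genes of h sit downstream of it; node B reaches a single gene of h, which is not enough to be significant.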
  27. 27. A) Cell Cycle (blue); B) Chromatin modification (black); C) Pre-mRNA Processing (brown); D) mRNA Processing (red). [Legend: Global driver; Global driver & RNAi validation]
  28. 28. Signaling between Super Modules (View Poster presented by Bin Zhang)
  29. 29. Example 2. The Sage Non-Responder Project in Cancer
      • Purpose: To identify Non-Responders to approved drug regimens so we can improve outcomes, spare patients unnecessary toxicities from treatments that have no benefit to them, and reduce healthcare costs
      • Leadership: Co-Chairs Stephen Friend, Todd Golub, Charles Sawyers & Rich Schilsky
      • Initial Studies: AML (at first relapse), Non-Small Cell Lung Cancer, Ovarian Cancer (at first relapse), Breast Cancer, Renal Cell, Multiple Myeloma
  30. 30. Model of Alzheimer’s Disease (Bin Zhang, Jun Zhu). [Figure: AD vs. normal panels; Cell cycle]
  31. 31. New Type II Diabetes Disease Models (Anders Rosengren)
      • Global expression data from 64 human islet donors
      • 340 genes in islet-specific open chromatin regions
      • Blue module: 3000 genes, associated with Type 2 diabetes, elevated HbA1c and reduced insulin secretion
      • 168 overlapping genes, which have: higher connectivity; markedly stronger association with Type 2 diabetes, elevated HbA1c and reduced insulin secretion; enrichment for beta-cell transcription factors and exocytotic proteins
  32. 32. New Type II Diabetes Disease Models (Anders Rosengren)
      • Search across 1300 datasets in MetaGEO at Sage for similar expression profiles. Top hit: islet dedifferentiation study where the 168 genes were upregulated in mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)
      • Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test
      • Identification of candidate key genes affecting beta-cell differentiation and chromatin
      Working hypothesis:
      • Normal beta-cell: open chromatin in islet-specific regions, high expression of beta-cell transcription factors, differentiated beta-cells and normal insulin secretion
      • Diabetic beta-cell: lower expression of beta-cell transcription factors affecting the identified module, dedifferentiation, reduced insulin secretion and hyperglycemia
      Next steps: Validation of hypothesis and suggested key genes in human islets
  33. 33. Clinical Trial Comparator Arm Partnership (CTCAP)
      • Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.
      • Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.
      • Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].
      • Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create a powerful new tool for drug development.
  34. 34. Examples: The Sage Federation
      • Founding Lab Groups: Seattle - Sage Bionetworks; New York - Columbia: Andrea Califano; Palo Alto - Stanford: Atul Butte; San Diego - UCSD: Trey Ideker; San Francisco - UCSF/Sage: Eric Schadt
      • Initial Projects: Aging, Diabetes, Warburg
      • Goals: Share all datasets, tools, models. Develop interoperability for human data
  35. 35. THE FEDERATION: Butte, Califano, Friend, Ideker, Schadt vs
  36. 36. Federation’s Genome-wide Network and Modeling Approach: Califano group at Columbia, Sage Bionetworks, Butte group at Stanford
  37. 37. Human Aging Project
      • Data: Brain A (n=363), Brain B (n=145), Brain C (n=400), Blood A (n=~1000), Blood B (n=~1000), Adipose (n=~700)
      • Transformations: Interactome, TF Activity Profile, Gene Set / Pathway Variation Analysis
      • Machine Learning: Elastic Net, Network Prior Model, Tree Classifiers
      • Output: Age Models
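The slide only names Elastic Net as one of the learners behind the age models, so the following is a generic sketch of that technique rather than the project's pipeline. It fits an elastic net by plain coordinate descent on a hypothetical toy data set (two centered expression features, centered age as the target). The function names, the data and the penalty values `lam` and `alpha` are all assumptions made for illustration, and the code assumes no feature column is entirely zero.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the L1 part of the update)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2).
    Assumes centered X and y (no intercept) and no all-zero column."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred = sum(X[i][k] * beta[k] for k in range(p))
                partial = y[i] - pred + X[i][j] * beta[j]
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return beta


# Hypothetical toy data: age depends on the first feature only.
EXPRESSION = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
AGE = [2, -2, 2, -2]

print(elastic_net(EXPRESSION, AGE, lam=0.0, alpha=0.5))   # no penalty
print(elastic_net(EXPRESSION, AGE, lam=0.5, alpha=0.5))   # shrunken weights
print(elastic_net(EXPRESSION, AGE, lam=10.0, alpha=0.5))  # heavy penalty
```

With no penalty the fit recovers the weight on the first feature; raising `lam` shrinks it, and a large enough penalty removes both features. That shrink-and-select behaviour is why elastic nets suit expression data, where genes far outnumber samples.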
  38. 38. Deriving Master Regulators from Transcription Factor Regulatory Networks: Glycolysis & Glycogenesis Metabolism Pathway
  39. 39. Inferring Prostate Cancer Regulatory Modules for Glycolysis & Glycogenesis Metabolism Pathway (Sage Bionetworks approach)
      • Prostate cancer global coherent data set (GSE21032): Taylor BS. et al (2010) Cancer Cell 18(1):11-22
      • Integrated Bayesian Approach: Zhu J. et al (2008) Nature Genetics 40(7): 854-61
      • Glycolysis and Glycogenesis Metabolism Gene Set (GGMSE): Duarte N. et al (2006) PNAS 107(6):1777-1782
      • Inferred Transcriptional Regulatory Network in Prostate Cancer
      • Cox Proportional-Hazards Regression model based on individual gene for recurrence free survival
      • Prostate Cancer Regulatory Modules for GGMSE and Other Metabolism Pathways
      • Metabolism pathways with regulatory modules enriched by poor prognosis genes for prostate cancer
  40. 40. Genes Associated with Poor Prognosis are disproportionately found among the networks regulating the “glycolysis” Genes
      • P-value < 0.005. Size of the node proportional to -log10 P value for recurrence free survival.
      • [Panels: Inferred regulatory module for GGMSE; Inferred regulatory module for Oxidative Phosphorylation and Sphingolipid Metabolism genes]
      • >5 fold enrichment of recurrence free prognostic genes with the Glycolysis BN module than random selection (p<1e-100)
  41. 41. Federated Aging Project: Combining analysis + narrative = Sweave Vignette. R code + narrative; PDF (plots + text + code snippets); HTML; Data objects. Sage Lab, Califano Lab, Ideker Lab. Submitted Paper. Shared Data Repository. JIRA: Source code repository & wiki
  42. 42. Why not share clinical/genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning)?
  43. 43. Synapse as a GitHub for building models of disease
  44. 44. Evolution of a Software Project
  45. 45. Biology Tools Support Collaboration
  46. 46. Potential Supporting Technologies: Addama, Taverna, tranSMART
  47. 47. Platform for Modeling SYNAPSE
  51. 51. Eight Projects Initiated in last year
  52. 52. “Group D” LEGAL STACK - ENABLING PATIENTS: John Wilbanks
  53. 53. Arch2POCM: Restructuring Drug Discovery
  54. 54. Absurdity of Current R&D Ecosystem
      • $200B per year in biomedical and drug discovery R&D
      • Handful of new medicines approved each year
      • Productivity in steady decline since 1950
      • 90% of novel drugs entering clinical trials fail
      • NIH and EU just started spending billions to duplicate process
      • Significant pharma revenues going off patent in next 5 years
      • >30,000 pharma employees fired in each of last four years
      • Number of R&D sites in Europe down from 29 to 16 since 2009
  55. 55. What is the problem?
      • Regulatory hurdles too high?
      • Low hanging fruit picked?
      • Payers unwilling to pay?
      • Genome has not delivered?
      • Valley of death?
      • Companies not large enough to execute on strategy?
      • Internal research costs too high?
      • Clinical trials in developed countries too expensive?
      In fact, all are true but none is the real problem
  56. 56. What is the problem?
      • The current system is designed as if every new program is destined to deliver an approved drug
      • Past 20 years prove this assumption wrong (again and again)
      • Why do promising early results rarely translate into approved drugs?
      • Bottom line: we have poor understanding of biology
      • Lack of early-data sharing within closed information systems dooms drug discovery to frequent avoidable failure
  57. 57. What is the problem? We need to rebuild the drug discovery process so that we better understand disease biology before testing proprietary compounds on sick patients
  58. 58. The solution: Arch2POCM
      1. Create an Archipelago of clinicians and scientists from public and private sectors to take projects from ideas to Proof of Clinical Mechanism (POCM)
      2. Arch2POCM is a collaborative, data-sharing network of scientists, whose drug discovery objective is to use robust compounds against new targets to disentangle the complexity of human biology, not to create a medicine
      3. Success?
         • A compound that provides proof of concept for a novel target, allowing companies to use this common information to compete, with dramatically increased chances of success
         • Culling targets with doomed mechanisms before multiple companies waste money exploring them, at $50M a pop
  59. 59. Why data sharing through to Phase IIb?
      • Most rapidly reveals limitations and opportunities associated with the target
      • Increases probability of success for internal proprietary programs
      • Scientific decisions are not influenced by market considerations or biased internal thinking
      • Target mechanism is only properly tested at Phase IIb
  60. 60. Why no IP on “Common Stream” compounds?
      • Allows multiple groups to test diverse indications without funds from Arch2POCM: crowdsourcing drug discovery
      • Broader and faster data dissemination
      • Far fewer legal agreements to negotiate
      • Generates “freedom to operate” on target because there are no patent thickets to wade through
      • Efficient way to access world’s top scientists and doctors without hassle
  61. 61. Existing Team Ready to Execute
  62. 62. First major milestones
      • 2013: First Compound in clinical trials
      • 2014: Go and No-Go Decisions from common stream of targets driving Proprietary Programs
      • 2014: Full complement of target programs activated
      • 2014: Core Clinical Programs joined by crowdsourced clinical trials
  63. 63. • Why consider the fourth paradigm: data-intensive science • Thinking beyond the narrative, beyond pathways • Advantages of an open innovation compute space • It is more about how than what
  64. 64. OPPORTUNITIES FOR MIPS COMMUNITY
      • Data sets, Tools and Models
      • Joining Synapse Communities
      • Joining Federation Projects
      • Joining Arch2POCM
      • Change reward structures for sharing data (patients and academics)