Stephen Friend National Heart Lung & Blood Institute 2011-07-19


Published on

Stephen Friend July 19, 2011. National Heart Lung & Blood Institute, Bethesda, MD

Published in: Health & Medicine
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Stephen Friend National Heart Lung & Blood Institute 2011-07-19

  1. 1. From Gene networks to bioinformatics networks Stephen Friend MD PhD Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ San Francisco NHLBI July 18th, 2011
  2. 2. why consider the fourth paradigm- data intensive science thinking beyond the narrative, beyond pathways advantages of an open innovation compute space it is more about how than what
  3. 3. COPD   Diabetes   Treating Symptoms v.s. Modifying DiseasesPulmonary  Fibrosis   Obesity     Will it work for me?
  4. 4. Familiar  but  Incomplete  
  5. 5. Reality: Overlapping Pathways
  7. 7. “Data Intensive Science”- “Fourth Scientific Paradigm”For building: “Better Maps of Human Disease” Equipment capable of generating massive amounts of data IT Interoperability Open Information System Evolving Models hosted in a Compute Space- Knowledge Expert
  8. 8. It is now possible to carry out comprehensive monitoring of many traits at the population levelMonitor disease and molecular traits in populations Putative causal gene Disease trait
  9. 9. what will it take to understand disease?                    DNA    RNA  PROTEIN  (dark  maKer)    MOVING  BEYOND  ALTERED  COMPONENT  LISTS  
  10. 10. 2002 Can one build a “causal” model?
  11. 11. How is genomic data used to understand biology? RNA amplification Tumors Microarray hybirdization Tumors Gene Index !Standard"GWAS Approaches Profiling Approaches Identifies Causative DNA Variation but Genome scale profiling provide correlates of disease provides NO mechanism   Many examples BUT what is cause and effect?   Provide unbiased view of molecular physiology as it relates to disease phenotypes trait   Insights on mechanism   Provide causal relationships and allows predictions Integrated" ! Genetics Approaches
  12. 12. Integration of Genotypic, Gene Expression & Trait Data Schadt et al. Nature Genetics 37: 710 (2005) Millstein et al. BMC Genetics 10: 23 (2009) Causal Inference “Global Coherent Datasets” •  population based •  100s-1000s individuals Chen et al. Nature 452:429 (2008) Zhu et al. Cytogenet Genome Res. 105:363 (2004) Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005) Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
  13. 13. Constructing Co-expression Networks Start with expression measures for genes most variant genes across 100s ++ samples 1 2 3 4 Note: NOT a gene expression heatmap 1 1 0.8 0.2 -0.8 Establish a 2D correlation matrix 2 for all gene pairsexpression 0.8 1 0.1 -0.6 3 0.2 0.1 1 -0.1 4 -0.8 -0.6 -0.1 1 Brain sample Correlation Matrix Define Threshold eg >0.6 for edge 1 2 4 3 1 2 3 4 1 1 1 4 1 1 1 0 1 1 0 1 2 2 1 1 1 0 1 1 0 1 1 1 1 0 Hierarchically 3 Identify modules 4 0 0 1 0 2 3 cluster 4 3 0 0 0 1 1 1 0 1 Network Module Clustered Connection Matrix Connection Matrix sets of genes for which many pairs interact (relative to the total number of pairs in that set)
  14. 14. Preliminary Probabalistic Models- Rosetta /Schadt Networks facilitate direct identification of genes that are causal for disease Evolutionarily tolerated weak spots Gene symbol Gene name Variance of OFPM Mouse Source explained by gene model expression* Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12] Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple (UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg (Columbia University, NY) [11] C3ar1 Complement component 46% ko Purchased from Deltagen, CA 3a receptor 1 Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CANat Genet (2005) 205:370 factor beta receptor 2
  15. 15. List of Influential Papers in Network Modeling   50 network papers 
  16. 16. (Eric Schadt)
  17. 17. Recognition that the benefits of bionetwork based molecularmodels of diseases are powerful but that they requiresignificant resourcesAppreciation that it will require decades of evolvingrepresentations as real complexity emerges and needs to beintegrated with therapeutic interventions
  18. 18. Sage Mission Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human diseaseBuilding Disease Maps Data RepositoryCommons Pilots Discovery Platform
  19. 19. Sage Bionetworks Collaborators  Pharma Partners   Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Johnson &Johnson  Foundations   Kauffman CHDI, Gates Foundation  Government   NIH, LSDF  Academic   Levy (Framingham)   Rosengren (Lund)   Krauss (CHORI)  Federation   Ideker, Califarno, Butte, Schadt 21
  20. 20. Engaging Communities of Interest NEW MAPS Disease Map and Tool Users- ( Scientists, Industry, Foundations, Regulators...) PLATFORM Sage Platform and Infrastructure Builders- ( Academic Biotech and Industry IT Partners...) RULES AND GOVERNANCE Data Sharing Barrier Breakers- (Patients Advocates, Governance M and Policy Makers,  Funders...) APS FORM NEW TOOLS PLAT NEW Data Tool and Disease Map Generators- (Global coherent data sets, Cytoscape, RULES GOVERN Clinical Trialists, Industrial Trialists, CROs…) PILOTS= PROJECTS FOR COMMONS Data Sharing Commons Pilots- (Federation, CCSB, Inspire2Live....)
  21. 21. Platform Commons Research Cancer Neurological Disease Metabolic DiseaseCuration/Annotation Building Data Disease Repository Maps CTCAP Public Data Pfizer Merck Data Outposts Merck TCGA/ICGC Federation Takeda CCSB Astra Zeneca CHDI Commons Gates NIH Pilots LSDF-WPP Inspire2Live Hosting Data POC Hosting Tools Bayesian Models Co-expression Models Hosting Models Discovery Tools & Platform Methods KDA/GSVA LSDF
  22. 22. Bin ZhangModel of Breast Cancer: Co-expression Xudong Dai Jun Zhu A) Miller 159 samples B) Christos 189 samplesNKI: N Engl J Med. 2002 Dec 19;347(25):1999.Wang: Lancet. 2005 Feb 19-25;365(9460):671.Miller: Breast Cancer Res. 2005;7(6):R953.Christos: J Natl Cancer Inst. 2006 15;98(4):262. C) NKI 295 samples E) Super modules Cell cycle Pre-mRNA ECM D) Wang 286 samples Blood vessel Immune response Zhang B et al., Towards a global picture of breast cancer (manuscript).
  23. 23. Bin ZhangModel of Alzheimer’s Disease Jun Zhu AD normal AD normal AD normal Cell cycle
  24. 24. AndersNew Type II Diabetes Disease Models Rosengren Global expression data 340 genes in islet-specific from 64 human islet donors open chromatin regions Blue module: 3000 genes Associated with Type 2 diabetes Elevated HbA1c Reduced insulin secretion 168 overlapping genes, which have •  Higher connectivity •  Markedly stronger association with •  Type 2 diabetes •  Elevated HbA1c •  Reduced insulin secretion •  Enrichment for beta-cell transcription factors and exocytotic proteins
  25. 25. New Type II Diabetes Disease Models Anders Rosengren•  Search across 1300 datasets in MetaGEO at Sage for similar expression profiles Top hit: Islet dedifferentiation study where the 168 genes were upregulated in mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)•  Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test•  Identification of candidate key genes affecting beta-cell differentiation and chromatinWorking hypothesis:Normal beta-cell: open chromatin in islet-specific regions,high expression of beta-cell transcription factors,differentiated beta-cells and normal insulin secretionDiabetic beta-cell: lower expression of beta-cell transcriptionfactors affecting the identified module, dedifferentiation,reduced insulin secretion and hyperglycemiaNext steps: Validation of hypothesis and suggested key genes in human islets
  26. 26. Liver Cytochrome P450 Regulatory Network Xia Yang Bin Zhang Models Jun Zhu Regulators of P450 networkYang et al. Systematic genetic and genomic analysis of cytochrome P450 enzyme activities in human liver. 2010. Genome Research 20:1020.
  27. 27. Clinical Trial Comparator Arm Partnership (CTCAP)  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.
  28. 28. Examples: The Sage Federation•  Founding Lab Groups –  Seattle- Sage Bionetworks –  New York- Columbia: Andrea Califano –  Palo Alto- Stanford: Atul Butte –  San Diego- UCSD: Trey Ideker –  San Francisco: UCSF/Sage: Eric Schadt•  Initial Projects –  Aging –  Diabetes –  Warburg•  Goals: Share all datasets, tools, models Develop interoperability for human data
  29. 29. Federation s Genome-wide Network and Modeling ApproachCalifano group at Columbia Sage Bionetworks Butte group at Stanford
  30. 30. Human Aging Project Data Transformations Machine LearningBrain A(n=363) Interactome Elastic NetBrain B(n=145)Brain C TF Activity Profile Age(n=400) Network Prior Model Models Blood A(n=~1000) Gene Set / Pathway Variation Analysis Blood B Tree Classifiers(n=~1000)Adipose(n=~700)
  31. 31. Deriving Master Regulators from Transcription FactorsRegulatory Networks Glycolysis & Glycogenesis Metabolism Pathway
  32. 32. THE FEDERATIONButte Califano Friend Ideker Schadt vs
  33. 33. … the world is becoming toofast, too complex, and too networked for any company to have all the answers inside Y. Benkler, The Wealth of Networks
  34. 34. Is the Industry managing itself into irrelevance? $130 billion of patented drug sales will face generics in the 2011-2016 decade (55% of 2009 US sales) Sales exposed to generics will double in 2012 (to $33 billion) 98% of big pharma sales come from products 5 years and older (avg patent life = 11 years) 6 big pharmas were lost in the last 10 years
  35. 35. Largest Attrition For Pioneer Targets is at Clinical POC (Ph II) Target ID/ Hit/Probe/ Clinical Toxicolog Phase I Phase Discovery Lead ID Candidate y/ IIa/IIb ID Pharmaco logyAttrition 50% 10% 30% 30% 90% This is killing drug discovery We can generate effective and safe molecules in animals, but they do not have sufficient efficacy and/or safety in the chosen patient group.
  36. 36. The current pharma model is redundant Target ID/ Hit/Probe/ Clinical Toxicolog Phase I Phase Discovery Lead ID Candidate y/ IIa/IIb Phase Target ID/ Hit/Probe/ Clinical ID Pharmaco Toxicolog Phase I Discovery Lead ID Candidate logy y/ IIa/IIb ID Pharmaco Target ID/ Hit/Probe/ Clinical logy Toxicolog Phase I Phase Discovery Lead ID Candidate y/ IIa/IIb ID Pharmaco Target ID/ Hit/Probe/ Clinical Toxicolog logy Phase I Phase Discovery Lead ID Candidate y/ IIa/IIb ID Pharmaco logy Target ID/ Hit/Probe/ Clinical Toxicolog Phase I Phase Discovery Lead ID Candidate y/ IIa/IIb ID Pharmaco Target ID/ Hit/Probe/ Clinical logy Toxicolog Phase I Phase Discovery Lead ID Candidate y/ IIa/IIb ID Pharmaco Target ID/ Hit/Probe/ Clinical Toxicolog logy Phase I Phase Discovery Lead ID Candidate y/ IIa/IIb ID Pharmaco logyAttrition 50% 10% 30% 30% 90% Negative POC information is not shared
  37. 37. Let s imagine…. •  A pool of dedicated, stable funding •  A process that attracts top scientists and clinicians •  A process in which regulators can fully collaborate to solve key scientific problems •  An engaged citizenry that promotes science and acknowledges risk •  Mechanisms to avoid bureaucratic and administrative barriers •  Sharing of knowledge to more rapidly achieve understanding of human biology •  A steady stream of targets whose links to disease have been validated in humans
  38. 38. Arch2POCM A globally distributed public private partnership (PPP) committed to: • Generate more clinically validated targets by sharing data • Deliver more new drugs for patients by using compounds to understand disease biology
  39. 39. Arch2POCM: what s in a name? Arch: as in archipelago and referring to the distributed network of academic labs, pharma partners and clinical sites that will contribute to Arch2POCM programs POCM: Proof Of Clinical Mechanism: demonstration in a Ph II setting that the mechanism of the selected disease target can be safely and usefully modulated.
  40. 40. Arch2POCM  Mission  To establish a pre-competitive stream of drug development data and POCM candidates that: 1.  Will focus on high risk/high opportunity targets 2.  Will inform the industry regarding those targets that are validated for clinical proof of concept mechanism (POCM) and those that are not 3.  Will drive down redundant efforts in discovery and early development 4.  Will lead to substantial cost avoidance (est. $12.5 B annuall (HOW DOES THIS COMPLEMENT NIH TRANSLATIONAL CENTER) PARTNERS/ WHO DOES WHAT/ NO IP /CROWDSOURCING April  16-­‐17,  2011   San  Francisco  
  41. 41. Federation Projects: Building a Compute Space Combining analysis + narrative =Sweave Vignette Sage Lab R code + PDF(plots + text + code snippets) narrative HTML Data objects Califano Lab Ideker Lab Submitted Paper Shared Data JIRA: Source code repository & wiki Repository
  42. 42. Reproducible science==shareable science Sweave: combines programmatic analysis with narrative Dynamic generation of statistical reports using literate data analysis Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reportsusing literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 – Proceedings in Computational Statistics,pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
  43. 43. Software Tools Support Collaboration
  44. 44. Biology Tools Support Collaboration
  45. 45. Potential Supporting TechnologiesAddama Taverna tranSMART
  46. 46. Platform for Modeling SYNAPSE  
  48. 48.  TENURE      FEUDAL  STATES      
  49. 49. Synapse  as  a  Github  for  building  models  of  disease  
  51. 51. Eight Projects Initiated in last year
  52. 52. !Group D LEGAL STACK-ENABLING PAIENTS: John Wilbanks
  53. 53. why consider the fourth paradigm- data intensive science thinking beyond the narrative, beyond pathways advantages of an open innovation compute space it is more about how than what
  54. 54. OPPORTUNITIES FOR LUNG COMMUNITYData sets, Tools and Models for Lung Biology/Pathophsiology Broad Institute cell line panels enriched in lung cancer Change reward structures for sharing data (patients and academics) Several Pharma partners interested in building models of respiratory disease- 2 public /3 Industry (Ron Crystal)