Stephen Friend Fanconi Anemia Research Fund 2012-01-21
Use of Bionetworks to Build Maps of Diseases
Moving Beyond
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle / Beijing / San Francisco
Fanconi Anemia Research Fund Annual Planning Meeting, January 21, 2012
We still approach much clinical research as if we were hunter-gatherers: not sharing.
• Why consider the fourth paradigm: data intensive science?
• Thinking beyond the narrative, beyond pathways
• Advantages of an open innovation compute space
• Where are the precompetitive goalposts?
• It is more about how than what
WHY NOT USE “DATA INTENSIVE” SCIENCE TO BUILD BETTER DISEASE MAPS?
“Data Intensive Science”: the “Fourth Scientific Paradigm”
For building “Better Maps of Human Disease”:
• Equipment capable of generating massive amounts of data
• IT Interoperability
• Open Information System
• Evolving Models hosted in a Compute Space (Knowledge Expert)
It is now possible to carry out comprehensive monitoring of many traits at the population level
Monitor disease and molecular traits in populations
[Figure labels: Putative causal gene; Disease trait]
What will it take to understand disease?
DNA, RNA, PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
Data integration via Bayesian Network
[Figure: yeast segregants (BY, RM) in logarithmic growth on synthetic complete medium; gene expression, genotypes, protein and metabolite data are combined with public databases (protein-protein interactions, transcription factor binding sites) into a Bayesian network. Courtesy of Dr. Jun Zhu]
Preliminary Probabilistic Models: Rosetta / Schadt
Networks facilitate direct identification of genes that are causal for disease
Evolutionarily tolerated weak spots

Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ)
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA)
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY)
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA

Nat Genet (2005) 205:370
Our ability to integrate compound data into our network analyses
[Figure: network signatures for db/db mouse (p~10E(-30)), db/db mouse (p~10E(-20), p~10E(-100)), and AVANDIA in db/db mouse; legend: up regulated, down regulated]
Extensive Publications now Substantiating Scientific Approach: Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics

Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.

CVD
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome

Bone
"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)

Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
List of Influential Papers in Network Modeling 50 network papers http://sagebase.org/research/resources.php
Recognition that the benefits of bionetwork-based molecular models of diseases are powerful but that they require significant resources
Appreciation that it will require decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions
Sage Mission
Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease
• Building Disease Maps
• Data Repository
• Commons Pilots
• Discovery Platform
Sagebase.org
Engaging Communities of Interest
• NEW MAPS: Disease Map and Tool Users (Scientists, Industry, Foundations, Regulators...)
• PLATFORM: Sage Platform and Infrastructure Builders (Academic Biotech and Industry IT Partners...)
• RULES AND GOVERNANCE: Data Sharing Barrier Breakers (Patient Advocates, Governance and Policy Makers, Funders...)
• NEW TOOLS: Data Tool and Disease Map Generators (Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs…)
• PILOTS = PROJECTS FOR COMMONS: Data Sharing Commons Pilots (Federation, CCSB, Inspire2Live....)
Example 1: Breast Cancer Coexpression Networks
[Figure: workflow panels: Module combination; Partition BN; Bayesian Network; Survival Analysis]
Zhang B et al., manuscript
Generation of Co-expression & Bayesian Networks from published Breast Cancer Studies
4 Public Breast Cancer Datasets

Dataset | Samples | Reference
NKI | 295 | van de Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19;347(25):1999-2009.
Wang | 286 | Wang Y et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005 Feb 19-25;365(9460):671-9.
Miller | 159 | Pawitan Y et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7(6):R953-64.
Christos | 189 | Sotiriou C et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006 Feb 15;98(4):262-72.
Recovery of EGFR and Her2 oncoprotein downstream pathways by super-modules
Comparison of Super-modules with EGFR and Her2 signaling and resistance pathways
Key Driver Analysis
• Identify key regulators for a list of genes h and a network N
• Check the enrichment of h in the downstream of each node in N
• The nodes significantly enriched for h are the candidate drivers
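The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the work presented: it assumes the network N is a directed graph stored as an adjacency dict, that "downstream" means every node reachable from the candidate, that enrichment is scored with a one-sided hypergeometric test, and that significance is judged with a Bonferroni-corrected cutoff. The function names (`downstream`, `hypergeom_upper_tail`, `key_drivers`) and the `alpha` parameter are hypothetical.

```python
from collections import deque
from math import comb


def downstream(network, node):
    """Return every node reachable from `node` (excluding `node` itself)."""
    seen, queue = set(), deque([node])
    while queue:
        for child in network.get(queue.popleft(), ()):
            if child != node and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when drawing N items from M items, n of which are hits."""
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, min(n, N) + 1)) / total


def key_drivers(network, gene_list, alpha=0.05):
    """Rank nodes whose downstream set is enriched for `gene_list` (h)."""
    nodes = set(network) | {c for cs in network.values() for c in cs}
    hits = set(gene_list) & nodes
    results = []
    for node in nodes:
        down = downstream(network, node)
        if not down:
            continue  # leaf nodes have nothing downstream to test
        overlap = len(down & hits)
        p = hypergeom_upper_tail(overlap, len(nodes), len(hits), len(down))
        results.append((node, overlap, len(down), p))
    # Assumption of this sketch: Bonferroni correction over tested nodes.
    cutoff = alpha / len(results) if results else alpha
    return sorted((r for r in results if r[3] <= cutoff), key=lambda r: r[3])
```

Each returned tuple is (node, number of h genes downstream, downstream size, p-value). In practice the downstream search is often limited to a fixed number of steps and a different multiple-testing correction may be preferred; both choices are left open here.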
[Figure panels: A) Cell Cycle (blue); B) Chromatin modification (black); C) Pre-mRNA Processing (brown); D) mRNA Processing (red). Legend: Global driver; Global driver & RNAi validation]
Signaling between Super Modules (View Poster presented by Bin Zhang)
Clinical Trial Comparator Arm Partnership (CTCAP)
Description: Collate, annotate, curate and host clinical trial data with genomic information from the comparator arms of industry- and foundation-sponsored clinical trials, building a site for sharing data and models to evolve better disease maps.
• Public-private partnership of leading pharmaceutical companies, clinical trial groups and researchers.
• Neutral conveners: Sage Bionetworks and Genetic Alliance (nonprofits).
• Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create a powerful new tool for drug development.
Example 3: The Sage Federation
• Founding Lab Groups
  – Seattle: Sage Bionetworks
  – New York: Columbia: Andrea Califano
  – Palo Alto: Stanford: Atul Butte
  – San Diego: UCSD: Trey Ideker
  – San Francisco: UCSF/Sage: Eric Schadt
  – NEW LABS: Gary Nolan (Stanford) / David Haussler (UCSC)
• Initial Projects
  – Aging
  – Diabetes
  – Warburg
• Goals: Share all datasets, tools, models; develop interoperability for human data
Federation's Genome-wide Network and Modeling Approach
Califano group at Columbia; Sage Bionetworks; Butte group at Stanford
Genes Associated with Poor Prognosis are disproportionately found among the networks regulating the glycolysis genes
• P-Value < 0.005. Size of the node is proportional to -log10 P value for recurrence free survival.
• Inferred regulatory module for GGMSE
• Inferred regulatory module for Oxidative Phosphorylation and Sphingolipid Metabolism genes
• >5 fold enrichment of recurrence free prognostic genes within the Glycolysis BN module compared with random selection (p<1e-100)
Why not share clinical/genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning)?
Synapse as a GitHub for building models of disease
Sage Bionetworks Synapse Project
• Watch What I Do, Not What I Say
• Reduce, Reuse, Recycle
• My Other Computer is Amazon
• Most of the People You Need to Work with Don’t Work with You
Arch2POCM
Restructuring the “Competitive” Phase of Drug Discovery
What is the described problem?
• Regulatory hurdles too high?
• Low hanging fruit picked?
• Payers unwilling to pay?
• Genome has not delivered?
• Valley of death?
• Companies not large enough to execute on strategy?
• Internal research costs too high?
• Clinical trials in developed countries too expensive?
In fact, all are true but none is the real problem
What is the real problem?
We need to rebuild the drug discovery process so that we better understand disease biology before testing proprietary compounds on sick patients
The solution: Arch2POCM
1. Create an Archipelago of clinicians and scientists from public and private sectors to take projects from ideas to Proof of Clinical Mechanism (POCM)
2. Arch2POCM is a collaborative, data-sharing network of scientists, whose drug discovery objective is to use robust compounds against new targets to disentangle the complexity of human biology, not to create a medicine
3. Success?
   • A compound that provides proof of concept for a novel target, allowing companies to use this common information to compete, with dramatically increased chances of success
   • Culling targets with doomed mechanisms before multiple companies waste money exploring them, at $50M a pop
Why data sharing through to Phase IIb?
• Most rapidly reveals limitations and opportunities associated with the target
• Increases probability of success for internal proprietary programs
• Scientific decisions are not influenced by market considerations or biased internal thinking
• Target mechanism is only properly tested at Phase IIb
Why no IP on “Common Stream” compounds?
• Allows multiple groups to test diverse indications without funds from Arch2POCM: crowdsourcing drug discovery
• Broader and faster data dissemination
• Far fewer legal agreements to negotiate
• Generates “freedom to operate” on target because there are no patent thickets to wade through
• Efficient way to access world’s top scientists and doctors without hassle
2012-13: Year of Learning from our Pilots (Jan 12 - Apr 13)
SAGE RESEARCH; SYNAPSE; FEDERATION; CONSENTS; CTCAP; CITIZEN LED PROJECTS; SCIENTIST LED PROJECTS; SAGE BIONETWORKS WEBSITES/COMMONS; Arch2POCM; CONGRESS; “MedXChange-Bridge”
• Actionable Disease Bionetwork Models
• Open Medical Information Systems
• Democratization of Science
• Networked Science Approaches
IMPACT ON PATIENTS
OPPORTUNITIES FOR FANCONI’S COMMUNITY
• Evolve sharing of data sets, tools and models
• Join Synapse communities
• Build your own “Federation Projects”
• Participate in Sage Commons Congress, April 20-21
• Join Arch2POCM
• Change reward structures for sharing data (patients and academics)