Stephen Friend Institute of Development, Aging and Cancer 2011-11-28
Use of Bionetworks to build maps of disease: Moving beyond the linear Integrating layers of omics data models and use of compute spaces Stephen Friend MD PhD Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ Amsterdam International Symposium for 70th Anniversary IDAC November 29, 2011
Alzheimer’s Diabetes Treating Symptoms v.s. Modifying DiseasesCancer Obesity Will it work for me? Biomarkers?
Why not use data intensive science to build models of disease? Current Reward StructuresOrganizational Structures and Tools Pilots
What is the problem?Most approved therapies assume indications would representhomogenous populationsOur existing disease models often assume pathwayknowledge sufficient to infer correct therapies
Personalized Medicine 101:Capturing Single bases pair mutations = ID of responders
“Data Intensive” Science- Fourth Scientific Paradigm Equipment capable of generating massive amounts of data IT Interoperability Open Information System Host evolving computational models in a “Compute Space”
WHY NOT USE “DATA INTENSIVE” SCIENCETO BUILD BETTER DISEASE MAPS?
what will it take to understand disease? DNA RNA PROTEIN (dark matter)MOVING BEYOND ALTERED COMPONENT LISTS
How is genomic data used to understand biology? RNA amplification Tumors Microarray hybirdization Tumors Gene Index Standard GWAS Approaches Profiling Approaches Identifies Causative DNA Variation but Genome scale profiling provide correlates of disease provides NO mechanism Many examples BUT what is cause and effect? Provide unbiased view of molecular physiology as it relates to disease phenotypes trait Insights on mechanism Provide causal relationships and allows predictions 19 Integrated Genetics Approaches
Gene Co-Expression Network Analysis Define a Gene Co-expression Similarity Define a Family of Adjacency Functions Determine the AF Parameters Define a Measure of Node Distance Identify Network Modules (Clustering) Relate the Network Concepts to External Gene or Sample Information 20 Zhang B, Horvath S. Stat Appl Genet Mol Biol 2005
Constructing Co-expression Networks Start with expression measures for genes most variant genes across 100s ++ samples 1 2 3 4 Note: NOT a gene expression heatmap 1 1 0.8 0.2 -0.8 Establish a 2D correlation matrix 2 for all gene pairsexpression 0.8 1 0.1 -0.6 3 0.2 0.1 1 -0.1 4 -0.8 -0.6 -0.1 1 Brain sample Correlation Matrix Define Threshold eg >0.6 for edge 1 2 4 3 1 2 3 4 1 1 1 4 1 1 1 0 1 1 0 1 2 2 1 1 1 0 1 1 0 1 1 1 1 0 Hierarchically 3 Identify modules 4 0 0 1 0 2 3 cluster 4 3 0 0 0 1 1 1 0 1 Network Module Clustered Connection Matrix Connection Matrix sets of genes for which many pairs interact (relative to the total number of pairs in that set)
Preliminary Probabalistic Models- Rosetta /Schadt Networks facilitate direct identification of genes that are causal for disease Evolutionarily tolerated weak spots Gene symbol Gene name Variance of OFPM Mouse Source explained by gene model expression* Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ)  Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple (UCLA)  Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg (Columbia University, NY)  C3ar1 Complement component 46% ko Purchased from Deltagen, CA 3a receptor 1 Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CANat Genet (2005) 205:370 factor beta receptor 2
Extensive Publications now Substantiating Scientific Approach Probabilistic Causal Bionetwork Models• >80 Publications from Rosetta Genetics Metabolic "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003) Disease "Variations in DNA elucidate molecular networks that cause disease." Nature. (2008) "Genetics of gene expression and its effect on disease." Nature. (2008) "Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009) ….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc CVD "Identification of pathways for atherosclerosis." Circ Res. (2007) "Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008) …… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome Bone "Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005) d ..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009) Methods "An integrative genomics approach to infer causal associations ... Nat Genet. (2005) "Increasing the power to detect causal associations… PLoS Comput Biol. (2007) "Integrating large-scale functional genomic data ..." Nat Genet. (2008) …… Plus 3 additional papers in PLoS Genet., BMC Genet.
List of Influential Papers in Network Modeling 50 network papers http://sagebase.org/research/resources.php
“Data Intensive” Science- Fourth Scientific Paradigm Score Card for Medical Sciences Equipment capable of generating massive amounts of data A- IT Interoperability D Open Information System D- Host evolving computational models in a “Compute Space F
.We still consider much clinical research as if We were hunter gathers - not sharing
Clinical/genomic data are accessible but minimally usableLittle incentive to annotate and curate data for other scientists to use
Mathematicalmodels of disease are not built to be reproduced orversioned by others
Assumption that genetic alterationsin human conditions should be owned
Lack of standard forms for future rights and consents
sharing as an adoption of common standards.. Clinical Genomics Privacy IP
Publication Bias- Where can we find the (negative) clinical data?
Sage Mission Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human diseaseBuilding Disease Maps Data RepositoryCommons Pilots Discovery Platform Sagebase.org
NEW MAPS Disease Map and Tool Users- ( Scientists, Industry, Foundations, Regulators...) PLATFORM Sage Platform and Infrastructure Builders- ( Academic Biotech and Industry IT Partners...) PILOTS= PROJECTS FOR COMMONS Data Sharing Commons Pilots- (Federation, CCSB, Inspire2Live....) NEW TOOLS ORM APS Data Tool and Disease Map Generators- (Global coherent data sets, Cytoscape,M F PLAT Clinical Trialists, Industrial Trialists, CROs…) NEW RULES GOVERN RULES AND GOVERNANCE Data Sharing Barrier Breakers- (Patients Advocates, Governance and Policy Makers, Funders...)
Alzheimer’s Disease• Cross-tissue coexpression networks for both normal and AD brains – prefrontal cortex, cerebellum, visual cortex• Differential network analysis on AD and normal networks• Integrate coexpression networks and Bayesian networks to identify key regulators for the modules associated with AD 38
Identification of Disease (AD) Pathways via Comparative Gene Network Analysis40,000 genes from three tissues Glutathione transferase Gain connectivity by 91 fold AD (PFC, CB, VC) nerve ensheathment Control Lose connectivity by 40% (PFC, CB, VC) Module Connectivity Change (AD/Normal) 39 Bayesian Subnetworks
Key RegulatorsGlutathioneTransferase NerveEnsheathment ExtracellularMatrix PECAM1: Platelet-endothelial cell adhesion molecule, a tyrosine phosphatase activator that plays a role in the platelet activation, increased expression correlates with MS, Crohn disease, chronic B-cell leukemia, rheumatoid arthritis, and ulcerative colitis ENPP2: Phosphodiesterase I alpha, a lysophospholipase that acts in chemotaxis, phosphatidic acid biosynthesis, regulates apoptosis and PKB signaling; aberrant expression is associated with Alzheimer type dementia, major depressive disorder, and various cancers SLC22A25: solute carrier family 22, member 25, Protein with high similarity to mouse Slc22a19, which is a renal steroid sulfate transporter that plays a role in the uptake of estrone sulfate, member of the sugar (and other) transporter family and the major facilitator superfamily Glutathione Transferase Module (Pink)• 983 probes from all three brain regions (9% from CB, 15% from PFC and 76% from VC) 40• Most predictive of Braak severity score
sage federation:model of biological age Faster Aging Predicted Age (liver expression) Slower Aging Clinical Association - Gender - BMI - Disease Age Differential Genotype Association Gene Pathway Expression Chronological Age (years)
Non-Responders Project To identify Non-Responders to approved Oncology drug regimens in order to improve outcomes, spare patients unnecessary toxicitiesfrom treatments that have no benefit to them, and reduce healthcare costs
The Non-Responder Cancer Project Leadership Team Stephen Friend, MD, PhD Todd Golub, MD Founding Director Cancer Biology President and Co-Founder of Program Broad Institute, Charles Dana Sage Bionetworks, Head of Investigator Dana-Farber Cancer Merck Oncology 01-08, Institute, Professor of Pediatrics Harvard Founder of Rosetta Medical School, Investigator, Howard Inpharmatics 97-01, co- Hughes Medical Institute Founder of the Seattle Project Richard Schilsky, MD Garry Nolan, PhD Chief, Hematology- Oncology, Deputy Professor, Baxter Laboratory of Stem Director, Comprehensive Cancer Cell Biology, Department of Microbiology Center, University of Chicago; Chair, and Immunology, Stanford University National Cancer Institute Board of Director, Proteomics Center at Stanford Scientific Advisors; past-President University ASCO, past Chairman CALGB clinical trials group 11
Why not share clinical /genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning