Stephen Friend Cytoscape Retreat 2011-05-20
Stephen Friend, May 19-21, 2011. Cytoscape Retreat, La Jolla, CA

  1. Use of Bionetworks to Build Maps of Disease. Stephen Friend MD PhD, Sage Bionetworks (Non-Profit Organization), Seattle / Beijing / San Francisco. 6th Annual S2S Symposium, San Diego, May 20th, 2011
  2. Why consider the fourth paradigm, data-intensive science: thinking beyond the narrative, beyond pathways; the advantages of an open innovation compute space; it is more about why than what
  3. Alzheimer's, Diabetes, Depression, Cancer. Treating Symptoms vs. Modifying Diseases. Will it work for me?
  4. The Current Pharma Model is Broken:
     • In 2010, the pharmaceutical industry spent ~$100B on R&D
     • Half of the 2010 R&D spend ($50B) covered pre-Phase III activities
     • Half of the pre-Phase III costs ($25B) went to program targets that at least one other pharmaceutical company was actively pursuing
     • Only 8% of pharma companies' small-molecule PCCs make it to Phase III
     • In 2010, only 21 new molecular entities were approved by the FDA
     (April 16-17, 2011, San Francisco)
  5. Familiar but Incomplete
  6. Personalized Medicine 101: Capturing single base-pair mutations = identification of responders
  7. Reality: Overlapping Pathways
  8. “Data Intensive Science”: the Fourth Scientific Paradigm. Equipment capable of generating massive amounts of data; IT interoperability; an open information system; evolving models hosted in a compute space; knowledge expert
  9. Why not use “data intensive” science to build better disease maps?
  10. “Data Intensive Science”, the “Fourth Scientific Paradigm”, for building “Better Maps of Human Disease”: equipment capable of generating massive amounts of data; IT interoperability; an open information system; evolving models hosted in a compute space; knowledge expert
  11. It is now possible to carry out comprehensive monitoring of many traits at the population level. Monitor disease and molecular traits in populations (diagram: putative causal gene, disease trait)
  12. One-Dimensional Technology Slices: Building an Altered Component List (diagram). Layers: non-coding RNA network, protein network, metabolite network, transcriptional network. Tissues and systems: brain, heart, GI tract, kidney, immune system, vasculature, all set within the environment
  13. 2002 Rosetta Integrative Genomics Experiment: generation, assembly, and integration of data to build models that predict clinical outcome. Merck & Co., Inc. 5-year program based at Rosetta, driven by Eric Schadt
     • Generate the data needed to build bionetworks
     • Assemble other available data useful for building networks
     • Integrate and build models
     • Test predictions
     • Develop treatments
     • Design predictive markers
  14. How is genomic data used to understand biology?
     • Standard GWAS approaches: identify causative DNA variation but provide NO mechanism
     • Profiling approaches (RNA amplification and microarray hybridization of tumors, gene index): genome-scale profiling provides correlates of disease; many examples, BUT what is cause and effect?
     • Integrated genetics approaches: provide an unbiased view of molecular physiology as it relates to disease phenotypes; insights on mechanism; provide causal relationships and allow predictions
  15. Integration of Genotypic, Gene Expression & Trait Data: causal inference from “Global Coherent Datasets” (population based; 100s-1000s of individuals). References: Schadt et al. Nature Genetics 37: 710 (2005); Millstein et al. BMC Genetics 10: 23 (2009); Chen et al. Nature 452: 429 (2008); Zhu et al. Cytogenet Genome Res. 105: 363 (2004); Zhang & Horvath. Stat. Appl. Genet. Mol. Biol. 4: article 17 (2005); Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
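Slide 15 cites likelihood-based causal inference from genotype, expression and trait data. As a rough sketch only (not the Causal Inference Test of Millstein et al., and not the exact procedure of Schadt et al.), comparing a causal (L → R → C), reactive (L → C → R) and independent (R ← L → C) model can be written with Gaussian linear regressions ranked by AIC. The function names, the additive 0/1/2 genotype coding and the simulated data are assumptions for illustration.

```python
import numpy as np


def regression_loglik(y, x):
    """Maximized Gaussian log-likelihood of y ~ a + b*x (ordinary least squares fit)."""
    n = len(y)
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    sigma2 = np.mean((y - design @ beta) ** 2)  # MLE of the residual variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)


def select_causal_model(locus, rna, trait):
    """Return the model with the lowest AIC plus all AIC values.

    P(L) is shared by all three models, so it is left out of the likelihoods.
    The independent model here ignores residual correlation between R and C.
    """
    loglik = {
        "causal": regression_loglik(rna, locus) + regression_loglik(trait, rna),
        "reactive": regression_loglik(trait, locus) + regression_loglik(rna, trait),
        "independent": regression_loglik(rna, locus) + regression_loglik(trait, locus),
    }
    n_params = 6  # two regressions per model, each with intercept, slope and variance
    aic = {name: 2 * n_params - 2 * ll for name, ll in loglik.items()}
    return min(aic, key=aic.get), aic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    locus = rng.integers(0, 3, size=500).astype(float)  # hypothetical additive genotype coding
    rna = locus + rng.normal(0.0, 0.5, size=500)        # simulated L -> R
    trait = rna + rng.normal(0.0, 0.5, size=500)        # simulated R -> C
    best, _ = select_causal_model(locus, rna, trait)
    print(best)
```

Because all three models have the same number of parameters in this sketch, AIC simply ranks them by likelihood; the published procedures also assess how confident each call is, which is omitted here.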
  16. Constructing Co-expression Networks
     • Start with expression measures for the most variant genes across 100s+ samples (note: NOT a gene expression heatmap)
     • Establish a 2D correlation matrix for all gene pairs. Example (brain sample, genes 1-4): row 1: 1, 0.8, 0.2, -0.8; row 2: 0.8, 1, 0.1, -0.6; row 3: 0.2, 0.1, 1, -0.1; row 4: -0.8, -0.6, -0.1, 1
     • Define a threshold for an edge (e.g. >0.6) to obtain a connection matrix
     • Hierarchically cluster the connection matrix to identify modules
     Network module: a set of genes for which many pairs interact (relative to the total number of pairs in that set)
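The thresholding step of slide 16 can be sketched on the slide's own 4-gene correlation matrix. Two assumptions are worth flagging: the edge rule uses the absolute correlation (so strong anti-correlation also links genes), and modules are taken as connected components of the thresholded network, a simpler stand-in for the hierarchical clustering the slide describes. For real data the correlation matrix would come from `np.corrcoef(expr)` with genes as rows.

```python
import numpy as np


def connection_matrix(corr, threshold=0.6):
    """Binary connection matrix: 1 where |correlation| exceeds the threshold (assumed rule)."""
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 1)
    return adj


def modules(adj):
    """Connected components of the thresholded network, used here as modules."""
    n = adj.shape[0]
    label = [-1] * n
    groups = []
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = len(groups)
        stack, members = [start], []
        while stack:  # depth-first walk over edges of the connection matrix
            i = stack.pop()
            members.append(int(i))
            for j in np.flatnonzero(adj[i]):
                if label[j] == -1:
                    label[j] = len(groups)
                    stack.append(j)
        groups.append(sorted(members))
    return groups


# Correlation matrix for genes 1-4 from the slide (0-based indices below).
slide_corr = np.array([
    [1.0, 0.8, 0.2, -0.8],
    [0.8, 1.0, 0.1, -0.6],
    [0.2, 0.1, 1.0, -0.1],
    [-0.8, -0.6, -0.1, 1.0],
])
print(modules(connection_matrix(slide_corr)))  # [[0, 1, 3], [2]]
```

With a signed rule (`corr > threshold`) genes 1 and 4 would no longer be linked, so the choice between signed and unsigned edges changes the modules even on this tiny example.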
  17. Gene Co-Expression Network Analysis (Zhang B, Horvath S. Stat Appl Genet Mol Biol 2005)
     • Define a gene co-expression similarity
     • Define a family of adjacency functions
     • Determine the adjacency function (AF) parameters
     • Define a measure of node distance
     • Identify network modules (clustering)
     • Relate the network concepts to external gene or sample information
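The adjacency and node-distance steps of slide 17 can be sketched, assuming the power adjacency function |cor|^β and the topological overlap measure (TOM) described by Zhang & Horvath (2005). The value β = 6 is only an illustrative default; the slide's "determine the AF parameters" step would normally choose β from a scale-free topology fit, which is not shown.

```python
import numpy as np


def soft_adjacency(expr, beta=6):
    """Power adjacency a_ij = |cor(x_i, x_j)|**beta; expr is genes x samples."""
    return np.abs(np.corrcoef(expr)) ** beta


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij = sum_u a_iu * a_uj."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)        # exclude self-connections from k and l
    k = a.sum(axis=1)               # connectivity of each gene
    shared = a @ a                  # terms with u == i or u == j vanish because the diagonal is 0
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom


def tom_dissimilarity(expr, beta=6):
    """Node distance 1 - TOM, usable as input to hierarchical clustering into modules."""
    return 1.0 - topological_overlap(soft_adjacency(expr, beta))
```

For two genes with adjacency 0.25 and no shared neighbours, TOM equals the adjacency (0.25); shared neighbours raise the overlap, which is what makes 1 - TOM a smoother distance than 1 - |cor|.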
  18. Preliminary Probabilistic Models (Rosetta / Schadt): networks facilitate direct identification of genes that are causal for disease ("evolutionarily tolerated weak spots"). Candidate genes, with variance of OFPM explained by gene expression*, mouse model (tg = transgenic, ko = knockout) and source:
     • Zfp90 (Zinc finger protein 90): 68%; tg; constructed using BAC transgenics
     • Gas7 (Growth arrest specific 7): 68%; tg; constructed using BAC transgenics
     • Gpx3 (Glutathione peroxidase 3): 61%; tg; provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry of New Jersey, NJ) [12]
     • Lactb (Lactamase beta): 52%; tg; constructed using BAC transgenics
     • Me1 (Malic enzyme 1): 52%; ko; naturally occurring KO
     • Gyk (Glycerol kinase): 46%; ko; provided by Dr. Katrina Dipple (UCLA) [13]
     • Lpl (Lipoprotein lipase): 46%; ko; provided by Dr. Ira Goldberg (Columbia University, NY) [11]
     • C3ar1 (Complement component 3a receptor 1): 46%; ko; purchased from Deltagen, CA
     • Tgfbr2 (Transforming growth factor beta receptor 2): 39%; ko; purchased from Deltagen, CA
     Nat Genet (2005) 205:370
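The slide does not say how "variance of OFPM explained by gene expression" was computed. Assuming it is the coefficient of determination of a simple linear regression of the trait on one gene's expression across animals, the quantity reads:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad \hat{y}_i = a + b\,x_i,
\qquad R^2 = \operatorname{cor}(x, y)^2
```

Here y_i is the OFPM of animal i and x_i the expression of the gene in that animal. Under this reading, Zfp90's 68% would correspond to a correlation of about 0.82 in absolute value.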
  19. Network Modeling of Cardiovascular Disease (Agilent Technologies, Stanford School of Medicine, Cytoscape)
     • Coronary heart disease
       – Inflammatory disease stemming from genetic and environmental factors
       – Number one killer in the U.S.: more deaths than the next 5 leading causes of death combined*
       – Involves a large number of processes
     • Multiple investigative approaches
       – Analysis of microarray data identifies statistically significant gene expression changes
       – Identifying discriminatory pathways/networks of gene interactions provides information for understanding complex processes and possible therapeutic targets
     • Systems-based method to analyze high-throughput data
       – Literature-based de novo network construction
       – Visualization for examining generated networks against experimental data
     • Method applied to studies in
       – Atherosclerosis
       – In-stent restenosis
       – ACE inhibitor usage
     King et al., Physiol Genomics. 2005
  20. The Evolution of Systems Biology (diagram): genomic; literature; protein-protein; complexes; transcriptional; signaling; molecular profiles; structure; model evolution; disease models; model topology; model dynamics; physiologic / pathologic phenotype; regulation
  21. Assembling Networks for Use in the Clinic: The Working Map (diagram)
     • Network evolutionary comparison / cross-species alignment to identify conserved modules
     • Network-based cancer diagnosis / prognosis
     • Projection of molecular profiles on protein networks to reveal active modules
     • Identification of networks associated with cancer progression
     • Integration of transcriptional interactions with causal or functional links
     • Moving from genome-wide association studies (GWAS) to network-wide pathway association (NWAS)
     • Alignment of physical and genetic networks
     • Pathway assembly via integration of networks
     • Network-based study of disease
  22. List of Influential Papers in Network Modeling: 50 network papers, http://sagebase.org/research/resources.php
  23. (Eric Schadt)
  24. Recognition that bionetwork-based molecular models of disease offer powerful benefits but require significant resources. Appreciation that it will take decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions
  25. Sage Mission: Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease. Building Disease Maps; Data Repository; Commons Pilots; Discovery Platform. Sagebase.org
  26. Sage Bionetworks Strategy: Integrate with Communities of Interest (diagram centered on Platform, New Maps and Repository)
     • Map Users: disease map and tool users (scientists, industry, foundations, regulators...)
     • Platform Builders: Sage platform and infrastructure builders (academic, biotech and industry IT partners...)
     • Barrier Breakers: data sharing barrier breakers (patient advocates, governance and policy makers, funders...)
     • Data Generators: data, tool and disease map generators (global coherent data sets, Cytoscape, clinical trialists, industrial trialists, CROs...)
     • Commons Pilots: data sharing commons pilots (Federation, CCSB, Inspire2Live...)
  27. Sage Bionetworks Functional Organization (chart)
     • Platform, Commons, Research (cancer, neurological disease, metabolic disease)
     • Activities: curation/annotation; building disease maps; data repository; CTCAP; public data; data outposts; hosting data, tools and models; discovery platform; POC
     • Tools & methods: Bayesian models, co-expression models, KDA/GSVA
     • Partners and projects: Pfizer, Merck, Takeda, Astra Zeneca, CHDI, Gates, NIH, LSDF, LSDF-WPP, TCGA/ICGC, Federation, CCSB, Inspire2Live, Commons Pilots
  28. Sage Bionetworks Collaborators
     • Pharma partners: Merck, Pfizer, Takeda, Astra Zeneca, Amgen
     • Foundations: CHDI, Gates Foundation
     • Government: NIH, LSDF
     • Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
     • Federation: Ideker, Califano, Butte, Schadt
  29. Integration of Multiple Networks for Pathway and Target Identification (Bin Zhang, Jun Zhu). Inputs: CNV data, gene expression, clinical traits → co-expression network and Bayesian network → integration of co-expression and Bayesian networks → key driver analysis
  30. Key Driver Analysis (Bin Zhang, Jun Zhu, Justin Guinney). http://sagebase.org/research/tools.php
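Slides 29-30 name Key Driver Analysis without spelling out the algorithm. A simplified sketch, assuming the common formulation in which a node counts as a key driver when the genes within a few directed steps downstream of it are enriched (hypergeometric test) for a disease signature. The graph, signature, depth and cutoff are hypothetical, and the published KDA includes further steps not reproduced here.

```python
from collections import deque
from math import comb


def downstream(graph, node, max_depth):
    """Nodes reachable from `node` within `max_depth` directed steps, excluding `node`."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(node)
    return seen


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K signature genes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def key_drivers(graph, signature, max_depth=1, alpha=0.05):
    """Nodes whose downstream neighborhood is enriched for the signature, sorted by p-value."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    N = len(nodes)
    K = len(signature & nodes)
    pvals = {}
    for node in nodes:
        hood = downstream(graph, node, max_depth)
        if not hood:
            continue  # nodes with no downstream genes cannot drive anything
        hits = len(hood & signature)
        p = hypergeom_sf(hits, N, K, len(hood))
        if p < alpha:
            pvals[node] = p
    return dict(sorted(pvals.items(), key=lambda kv: kv[1]))
```

In a toy network where hub "A" points at four signature genes out of twelve nodes, "A" is the only node flagged (p = 1/495). On real networks a multiple-testing correction across all candidate nodes would be needed before reporting drivers.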
  31. Gene Set Variation Analysis (GSVA) (Justin Guinney, Sonja Haenzelmann). Diagram panels: meta-pathways; pathway CNV; cross-tissue pathway clustering; pathways
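The slide only names GSVA. As a sketch of the method's outline as we understand the published description (Gaussian-kernel CDF per gene across samples, within-sample ranking with symmetric rank weights, a Kolmogorov-Smirnov-like random walk per gene set, and the max-positive-plus-max-negative deviation score), it might look as follows. The input layout (genes x samples array, gene sets as row indices) is hypothetical, genes are assumed non-constant, and this is not the Bioconductor GSVA package.

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF applied elementwise.
_norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0))))


def gsva_scores(expr, gene_sets, tau=1.0):
    """GSVA-style enrichment score per gene set and sample.

    expr: genes x samples array; gene_sets: dict name -> list of row indices.
    """
    p, n = expr.shape
    # 1. Kernel estimate of each gene's CDF across samples (bandwidth s_i / 4), then a logit.
    z = np.empty((p, n))
    for i in range(p):
        x = expr[i]
        h = x.std(ddof=1) / 4.0
        cdf = _norm_cdf((x[:, None] - x[None, :]) / h).mean(axis=1)
        z[i] = np.log(cdf / (1.0 - cdf))
    # 2. Within each sample, order genes from highest to lowest z and
    #    give the gene at rank l the symmetric weight |p/2 - l|.
    order = np.argsort(-z, axis=0)
    weight = np.abs(p / 2.0 - np.arange(1, p + 1))
    scores = {name: np.empty(n) for name in gene_sets}
    for j in range(n):
        ranked = order[:, j]
        for name, members in gene_sets.items():
            in_set = np.isin(ranked, members)
            # 3. Random walk: weighted hits in the set minus uniform misses outside it.
            hits = np.where(in_set, weight ** tau, 0.0)
            walk = np.cumsum(hits) / hits.sum() - np.cumsum(~in_set) / (p - in_set.sum())
            # 4. Largest positive deviation plus largest negative deviation.
            scores[name][j] = max(walk.max(), 0.0) + min(walk.min(), 0.0)
    return scores
```

A set whose genes are unusually high in a sample scores near +1 there and near -1 where they are unusually low; the package also offers options this sketch skips, such as a Poisson kernel for count data.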
  32. Model of Breast Cancer: Co-expression (Bin Zhang, Xudong Dai, Jun Zhu). Co-expression networks from four datasets: A) Miller, 159 samples; B) Christos, 189 samples; C) NKI, 295 samples; D) Wang, 286 samples; E) super modules: cell cycle, pre-mRNA, ECM, blood vessel, immune response. Datasets: NKI: N Engl J Med. 2002 Dec 19;347(25):1999. Wang: Lancet. 2005 Feb 19-25;365(9460):671. Miller: Breast Cancer Res. 2005;7(6):R953. Christos: J Natl Cancer Inst. 2006 15;98(4):262. Zhang B et al., Towards a global picture of breast cancer (manuscript).
  33. Model of Breast Cancer: Integration (Bin Zhang, Xudong Dai, Jun Zhu). Conserved super-modules (including chromatin and mRNA processing) are predictive of survival. Extract gene:gene relationships for selected super-modules from the breast cancer Bayesian network (BN) and define key drivers. Pathways & regulators: cell cycle (blue), chromatin modification (black), pre-mRNA processing (brown), mRNA processing (red); key drivers in yellow, key drivers validated in an siRNA screen in green. Zhang B et al., Key Driver Analysis in Gene Networks (manuscript).
  34. Model of Breast Cancer: Mining (Bin Zhang, Xudong Dai, Jun Zhu). Co-expression sub-networks predict survival; KDA identifies drivers: co-expression modules correlate with survival → map to Bayesian network → define key drivers
  35. Model of Alzheimer's Disease (Bin Zhang, Jun Zhu). Network panels comparing AD with normal; cell cycle module. http://sage.fhcrc.org/downloads/downloads.php
  36. Liver Cytochrome P450 Regulatory Network (Xia Yang, Bin Zhang, Jun Zhu). Models: http://sage.fhcrc.org/downloads/downloads.php. Regulators of the P450 network. Yang et al. Systematic genetic and genomic analysis of cytochrome P450 enzyme activities in human liver. Genome Research 20:1020 (2010).
  37. New Type II Diabetes Disease Models (Anders Rosengren)
     • Global expression data from 64 human islet donors: the blue module (3000 genes) is associated with type 2 diabetes, elevated HbA1c and reduced insulin secretion
     • 340 genes in islet-specific open chromatin regions
     • 168 overlapping genes, which have: higher connectivity; markedly stronger association with type 2 diabetes, elevated HbA1c and reduced insulin secretion; enrichment for beta-cell transcription factors and exocytotic proteins
  38. New Type II Diabetes Disease Models (Anders Rosengren)
     • Search across 1300 datasets in MetaGEO at Sage for similar expression profiles. Top hit: an islet dedifferentiation study in which the 168 genes were upregulated in mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)
     • Analyses of expression SNPs and clinical SNPs, as well as a Causal Inference Test
     • Identification of candidate key genes affecting beta-cell differentiation and chromatin
     Working hypothesis:
     • Normal beta-cell: open chromatin in islet-specific regions, high expression of beta-cell transcription factors, differentiated beta-cells and normal insulin secretion
     • Diabetic beta-cell: lower expression of beta-cell transcription factors affecting the identified module, dedifferentiation, reduced insulin secretion and hyperglycemia
     Next steps: validation of the hypothesis and the suggested key genes in human islets
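Slide 38 mentions searching ~1300 MetaGEO datasets for expression profiles similar to the 168-gene set, without saying how datasets were scored. A minimal sketch, assuming each dataset is summarized as a per-gene log fold change for one contrast and scored by the mean shift of the signature genes relative to all other genes; the data layout and the scoring rule are hypothetical.

```python
import numpy as np


def rank_datasets(signature, contrasts):
    """Rank datasets by how strongly the signature genes shift relative to the other genes.

    signature: set of gene IDs.
    contrasts: dict dataset_id -> dict gene_id -> log fold change (hypothetical layout).
    """
    scores = {}
    for dataset, lfc in contrasts.items():
        inside = [v for g, v in lfc.items() if g in signature]
        outside = [v for g, v in lfc.items() if g not in signature]
        if inside and outside:
            scores[dataset] = float(np.mean(inside) - np.mean(outside))
    # Largest absolute shift first; the sign says whether the signature goes up or down.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

A dataset where the signature rises in one condition and falls in the other (as in the islet dedifferentiation top hit) gets a large score whichever way the contrast is oriented; a real search would also need a null distribution to judge significance.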
  39. Validating Prostate Cancer Models (Brig Mecham, Xudong Dai, Pete Nelson, Rich Klingoffer)
     • Gene expression data on >1000 prostate cancer samples (GEO) for classification
     • Gene expression & CNV data on ~200 prostate cancers (Taylor et al)
     • Gene expression & CNV data on ~120 rapid autopsy metastases (Nelson)
     • Gene expression & CNV data on ~30 prostate xenografts (Nelson)
     • Integrated network analysis: CNV data, gene expression and clinical traits → co-expression and Bayesian networks → integration → key driver analysis
     • Key drivers matched to xenografts for validation with Presage Technology; siRNA screen data (Nelson)
  40. Systems Biology Approach to Pharmacogenomics (Lara Mangravite, Ron Krauss). Molecular simvastatin response and clinical simvastatin response feed an integrative genomic analysis. Ongoing: cellular validation of novel genes and SNPs involved in statin efficacy and cellular cholesterol homeostasis. Chart: clinical simvastatin response as percent change in LDLC (x-axis -100 to 20, -41% marked). Simon et al., Am J Cardiol 2006
  41. Clinical Trial Comparator Arm Partnership (CTCAP)
     • Description: collate, annotate, curate and host clinical trial data with genomic information from the comparator arms of industry- and foundation-sponsored clinical trials, building a site for sharing data and models to evolve better disease maps
     • Public-private partnership of leading pharmaceutical companies, clinical trial groups and researchers
     • Neutral conveners: Sage Bionetworks and Genetic Alliance [nonprofits]
     • Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create a powerful new tool for drug development
  42. CTCAP Workstreams (diagram)
     • Uncurated global coherent datasets (GCDs) come from collaborators and public databases (e.g. dbGAP)
     • Curation: a single common identifier to link data types; gender mismatches removed
     • QC: gene expression data corrected for batch effects, etc.
     • Curated & QC'd GCDs are stored in the Sage database, in public, collaboration and internal/private domains
     • Network models: co-expression network analysis, Bayesian network analysis, integrated network analysis
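Slide 42 lists two curation steps, removing gender mismatches and correcting gene expression for batch effects, without naming methods. A sketch under stated assumptions: sex is inferred by comparing XIST with RPS4Y1 expression (a common heuristic, not something the slide specifies), and batch correction is plain per-batch mean-centering, a much simpler stand-in for methods such as ComBat.

```python
import numpy as np


def sex_mismatches(reported_sex, xist, rps4y1):
    """Indices of samples whose expression-inferred sex disagrees with the annotation.

    Assumed rule: a sample is called female ("F") when XIST exceeds RPS4Y1, else male ("M").
    """
    inferred = np.where(np.asarray(xist) > np.asarray(rps4y1), "F", "M")
    return [i for i, (rep, inf) in enumerate(zip(reported_sex, inferred)) if rep != inf]


def center_by_batch(expr, batches):
    """Remove each batch's per-gene mean, then restore the overall per-gene mean.

    expr: genes x samples; batches: one batch label per sample.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    corrected = expr.copy()
    for b in np.unique(batches):
        cols = batches == b
        corrected[:, cols] -= expr[:, cols].mean(axis=1, keepdims=True)
    return corrected + expr.mean(axis=1, keepdims=True)
```

Mean-centering only removes additive shifts and can erase real biology when batches are confounded with phenotype, which is one reason dedicated batch-correction methods model design covariates explicitly.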
  43. Developing Predictive Models of Genotype-Specific Sensitivity to Perturbations (Margolin). Predictive model
  44. Examples: The Sage Federation
     • Founding lab groups
       – Seattle: Sage Bionetworks
       – New York: Columbia, Andrea Califano
       – Palo Alto: Stanford, Atul Butte
       – San Diego: UCSD, Trey Ideker
       – San Francisco: UCSF/Sage, Eric Schadt
     • Initial projects
       – Aging
       – Diabetes
       – Warburg
     • Goals: share all datasets, tools and models; develop interoperability for human data
  45. Human Aging Project (pipeline diagram)
     • Data: Brain A (n=363), Brain B (n=145), Brain C (n=400), Blood A (n=~1000), Blood B (n=~1000), Adipose (n=~700)
     • Transformations: interactome, TF activity profile, network prior, gene set / pathway variation analysis
     • Machine learning: elastic net, tree classifiers
     • Output: age models
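Slide 45 lists elastic net among the learners used to build age models. A minimal coordinate-descent sketch of that technique, assuming a glmnet-style objective and centered data (so no intercept is fitted); the values of λ and α and the simulated data are illustrative, not settings from the project.

```python
import numpy as np


def soft_threshold(value, gamma):
    """S(v, g) = sign(v) * max(|v| - g, 0)."""
    return np.sign(value) * max(abs(value) - gamma, 0.0)


def elastic_net(X, y, lam=0.01, alpha=0.5, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2).

    Assumes the columns of X and y are centered.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float) - X @ b
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]               # partial residual without feature j
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= X[:, j] * b[j]               # put the updated feature j back
    return b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X -= X.mean(axis=0)
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1]             # simulated "age" driven by two features
    print(np.round(elastic_net(X, y), 2))
```

The L1 part drives irrelevant coefficients to exactly zero while the L2 part keeps groups of correlated genes from being arbitrarily split, which is the usual argument for elastic net over plain lasso on expression data.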
  46. Preliminary Results: adipose age prediction with a multivariate logistic regression model predicting age in human adipose data; Master Regulator Analysis (MARINa) from Califano's lab
  47. The Federation's Genome-wide Network and Modeling Approach: Califano group at Columbia, Sage Bionetworks, Butte group at Stanford
  48. Deriving Master Regulators from Transcription Factor Regulatory Networks: Glycolysis & Glycogenesis Metabolism Pathway
  49. Genes associated with poor prognosis are disproportionately found among the networks regulating glycolysis genes (P-value < 0.005). Node size is proportional to the -log10 P value for recurrence-free survival. Inferred regulatory modules for GGMSE and for oxidative phosphorylation and sphingolipid metabolism. >5-fold enrichment of recurrence-free-survival prognostic genes within the glycolysis BN module compared with random selection (p<1e-100)
  50. The Federation: Butte, Califano, Friend, Ideker, Schadt vs.
  51. Sage Bionetworks: 22 publications in the last year
  52. Models, Pilots, Governance
  53. http://sagecongress.org
  54. (diagram) Compute Platform; Engaged Public; Governance; Models; Enabling Map Building; Sharing; Pilots; The Federation
  55. We still treat much clinical research as if we were hunter-gatherers, not sharing soon enough.
  56. Assumption that genetic alterations in human conditions should be owned
  57. Reproducible science == shareable science. Sweave: combines programmatic analysis with narrative; dynamic generation of statistical reports using literate data analysis. Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
  58. Federated Aging Project: combining analysis + narrative = Sweave vignette. Components: R code + narrative; PDF (plots + text + code snippets); HTML; data objects; Sage Lab, Califano Lab, Ideker Lab; submitted paper; shared data repository; JIRA: source code repository & wiki
  59. Evolution of a Software Project
  60. Evolution of a Biology Project
  61. Software Tools Support Collaboration
  62. Biology Tools Support Collaboration
  63. Potential Supporting Technologies: Addama, Taverna, tranSMART
  64. A Platform Node for Modeling
  65. INTEROPERABILITY
  66. TENURE, FEUDAL STATES
  67. IMPACT ON PATIENTS
  68. Why consider the fourth paradigm, data-intensive science: thinking beyond the narrative, beyond pathways; the advantages of an open innovation compute space; it is more about why than what