If the physicists do it, the software engineers do it,                 Why can’t we do it?:      Moving beyond linear inve...
So	  what	  is	  the	  problem?	  	  	  	  Most	  approved	  therapies	  were	  assumed	  to	  be	              monotherap...
Familiar but Incomplete
Reality: Overlapping Pathways
The value of appropriate representations/ maps
“Data Intensive” Science- Fourth Scientific Paradigm       Equipment capable of generating       massive amounts of data  ...
WHY	  NOT	  USE	  	        “DATA	  INTENSIVE”	  SCIENCE	  TO	  BUILD	  BETTER	  DISEASE	  MAPS?	  
what will it take to understand disease?	  	  	  	  	  	  	  	  	  	  DNA	  	  RNA	  PROTEIN	  (dark	  maHer)	  	  MOVING	...
2002 Can one build a “causal” model?
Preliminary Probabalistic Models- Rosetta /Schadt                                                                         ...
DIVERSE	  POWERFUL	  USE	  OF	  MODELS	  AND	  NETWORKS	  
List of Influential Papers in Network Modeling                                        50 network papers                  ...
(Eric Schadt)
“Data Intensive” Science- Fourth Scientific Paradigm       Score Card for Medical Sciences    Equipment capable of generat...
We still consider much clinical research as if we were hunter gathers - not sharing                          .
 TENURE   	     	  	  FEUDAL	  STATES	  	     	  
Clinical/genomic data are accessible but minimally usableLittle incentive to annotate and curate       data for other scie...
Mathematicalmodels of disease are not built to be   reproduced orversioned by others
Lack of standard forms for future rights and consents
Lack of data standards..
Sage Mission      Sage Bionetworks is a non-profit organization with a vision to   create a commons where integrative bion...
Sage Bionetworks Collaborators  Pharma Partners     Merck, Pfizer, Takeda, Astra Zeneca,      Amgen, Johnson &Johnson  ...
JUN ZHUModel of Breast Cancer: Co-expression                                                   A) Miller 159 samples      ...
CHRIS	  GAITERI-­‐ALZHEIMER’S	             What	  is	  this?	  Bayesian	  networks	  enriched	  in	  inflammaQon	  genes	  ...
ELIAS NETO                                                   Causal Model Selection Hypothesis Tests in Systems Genetics  ...
ELIAS NETO     Causal Model Selection Hypothesis Tests in Systems Genetics                          The Schadt et al. (200...
ZHI	  WANG	                    A	  mulQ-­‐Qssue	  immune-­‐driven	  theory	  of	  weight	  loss	                          ...
PLATFORM                                Sage Platform and Infrastructure Builders-                             ( Academic ...
Why not share clinical /genomic data and model building in the        ways currently used by the software industry        ...
Leveraging Existing TechnologiesAddama                                  Taverna               tranSMART
sage bionetworks synapse project                 Watch What I Do, Not What I Say
sage bionetworks synapse project                       Reduce, Reuse, Recycle
sage bionetworks synapse project           Most of the People You Need to Work with Don’t Work with You
sage bionetworks synapse project               My Other Computer is Cloudera Amazon Google
Sage Metagenomics Project                                               Processed Data                                    ...
Sage Metagenomics using Amazon Simple Workflow          Full case study at http://aws.amazon.com/swf/testimonials/swfsageb...
Amazon SWF and Synapse•  Maintains state of analysis     •  Hosts raw and processed data for•  Tracks step execution      ...
Synapse Roadmap•  Data Repository•  Projects and security                  Synapse Platform Functionality•  R integration ...
INTEROPERABILITYSYNAPSE	                              Genome Pattern                            CYTOSCAPE                 ...
Now	  accep4ng	                                                                                                 submission...
Five	  Pilots	  involving	  Sage	  Bionetworks	   CTCAP	   Arch2POCM	   The	  FederaQon	                                  ...
Clinical Trial Comparator Arm        Partnership (CTCAP)  Description: Collate, Annotate, Curate and Host Clinical Trial ...
Shared clinical/genomic data sharing and analysis will   maximize clinical impact and enable discovery•  Graphic	  of	  cu...
Arch2POCM	  Restructuring	  the	  PrecompeQQve	      Space	  for	  Drug	  Discovery	     How	  to	  potenQally	  De-­‐Risk...
Arch2POCM: scale and scope•  Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/   Immunology. One for Neuro...
The	  FederaQon	  
How can we accelerate the pace of scientific discovery?           2008	         2009	     2010	     2011	   Ways to move b...
(Nolan	  and	  Haussler)	  
sage federation:model of biological age                                                        Faster Aging        Predict...
Reproducible	  science==shareable	  science	            Sweave: combines programmatic analysis with narrativeDynamic gener...
Federated	  Aging	  Project	  :	  	         Combining	  analysis	  +	  narraQve	  	                                 =Sweav...
For 11/12 compounds, the #1 predictive feature in an unbiasedanalysis corresponds to the known stratifier of sensitivity  ...
Presentation outline1)	  Predic4ng	  drug	  response	      2)	  Future	  approaches:	      3)	  Standardized	  from	  canc...
1)      Data	  management	  APIs	  to	  load	  standaridzed	  objects,	  e.g.	             R	  ExpressionSets	  (MaD	  Fur...
Portable	  Legal	  Consent	      (AcQvaQng	  PaQents)	         John	  Wilbanks	  
weconsent.us	  
Sage	  Congress	  Project	              April	  20	  2012	   RealNames	  Parkinson’s	  Project	  RevisiQng	  Breast	  Canc...
Networking	  Disease	  Model	  Building	  
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Upcoming SlideShare
Loading in …5
×

Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

420 views
345 views

Published on

Stephen Friend, Feb 23, 2012. Complex Traits: Genomics and Computational Approaches, Breckenridge, CO

Published in: Health & Medicine
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
420
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
2
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

  1. 1. If the physicists do it, the software engineers do it, Why can’t we do it?: Moving beyond linear investigations Both of the science and of how we work Integrating layers of omics data models and building using compute spaces capable of enabling models to be evolved by teams of teams Stephen Friend MD PhD Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ Amsterdam February 23, 2012
  2. 2. So  what  is  the  problem?        Most  approved  therapies  were  assumed  to  be   monotherapies  for  diseases  represen4ng  homogenous   popula4ons    Our  exis4ng  disease  models  o9en  assume  pathway   knowledge  sufficient  to  infer  correct  therapies  
  3. 3. Familiar but Incomplete
  4. 4. Reality: Overlapping Pathways
  5. 5. The value of appropriate representations/ maps
  6. 6. “Data Intensive” Science- Fourth Scientific Paradigm Equipment capable of generating massive amounts of data IT Interoperability Open Information System Host evolving computational models in a “Compute Space”
  7. 7. WHY  NOT  USE     “DATA  INTENSIVE”  SCIENCE  TO  BUILD  BETTER  DISEASE  MAPS?  
  8. 8. what will it take to understand disease?                    DNA    RNA  PROTEIN  (dark  maHer)    MOVING  BEYOND  ALTERED  COMPONENT  LISTS  
  9. 9. 2002 Can one build a “causal” model?
  10. 10. Preliminary Probabalistic Models- Rosetta /Schadt Networks facilitate direct identification of genes that are causal for disease Evolutionarily tolerated weak spots Gene symbol Gene name Variance of OFPM Mouse Source explained by gene model expression* Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12] Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple (UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg (Columbia University, NY) [11] C3ar1 Complement component 46% ko Purchased from Deltagen, CA 3a receptor 1 Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CANat Genet (2005) 205:370 factor beta receptor 2
  11. 11. DIVERSE  POWERFUL  USE  OF  MODELS  AND  NETWORKS  
  12. 12. List of Influential Papers in Network Modeling   50 network papers   http://sagebase.org/research/resources.php
  13. 13. (Eric Schadt)
  14. 14. “Data Intensive” Science- Fourth Scientific Paradigm Score Card for Medical Sciences Equipment capable of generating massive amounts of data A- IT Interoperability D Open Information System D- Host evolving computational models in a “Compute Space F
  15. 15. We still consider much clinical research as if we were hunter gathers - not sharing .
  16. 16.  TENURE      FEUDAL  STATES      
  17. 17. Clinical/genomic data are accessible but minimally usableLittle incentive to annotate and curate data for other scientists to use
  18. 18. Mathematicalmodels of disease are not built to be reproduced orversioned by others
  19. 19. Lack of standard forms for future rights and consents
  20. 20. Lack of data standards..
  21. 21. Sage Mission Sage Bionetworks is a non-profit organization with a vision to create a commons where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human diseaseBuilding Disease Maps Data RepositoryCommons Pilots Discovery Platform Sagebase.org
  22. 22. Sage Bionetworks Collaborators  Pharma Partners   Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Johnson &Johnson  Foundations   Kauffman CHDI, Gates Foundation  Government   NIH, LSDF, NCI  Academic   Levy (Framingham)   Rosengren (Lund)   Krauss (CHORI)  Federation   Ideker, Califano, Nolan, Schadt 27
  23. 23. JUN ZHUModel of Breast Cancer: Co-expression A) Miller 159 samples B) Christos 189 samplesNKI: N Engl J Med. 2002 Dec 19;347(25):1999.Wang: Lancet. 2005 Feb 19-25;365(9460):671.Miller: Breast Cancer Res. 2005;7(6):R953.Christos: J Natl Cancer Inst. 2006 15;98(4):262. C) NKI 295 samples E) Super modules Cell cycle Pre-mRNA ECM D) Wang 286 samples Blood vessel Immune response 28 Zhang B et al., Towards a global picture of breast cancer (manuscript).
  24. 24. CHRIS  GAITERI-­‐ALZHEIMER’S   What  is  this?  Bayesian  networks  enriched  in  inflammaQon  genes    correlated  with  disease  severity  in  pre-­‐frontal  cortex  of  250  Alzheimer’s  paQents.   What  does  it  mean?  InflammaQon    in  AD  is  an  interacQve  mulQ-­‐pathway  system.    More  broadly,  network  structure  organizes  complex  disease  effects  into  coherent  sub-­‐systems  and  can  prioriQze  key  genes.   Are  you  joking?  Gene  validaQon  shows  novel  key  drivers  increase  Abeta  uptake  and  decrease  neurite  length  through  an  ROS  burst.  (highly  relevant  to  AD  pathology)  
  25. 25. ELIAS NETO Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto1, Aimee T. Broman2, Mark P. Keller2, Alan D. Attie2, Bin Zhang1, Jun Zhu1, Brian S. Yandell2 1 Sage Bionetworks, Seattle, WA USA; 2 University of Wisconsin-Madison, Madison, WI USA Abstract Vuong’s Model Selection Test Causal Model Selection Tests (CMST) Simulation StudyCurrent efforts in systems genetics have focused on the Vuongs test derives from the Kullback-Leibler Information In our applications we consider four models: M1, M2, M3 and We conducted a simulation study generating data from thedevelopment of statistical approaches aiming to disentangle Criterion (KLIC). M4. models oncausal relationships among molecular phenotypes in segregating the Figure below.populations. Model selection criterions, such as the AIC and Let h0(y | x) represent the true model. We derive intersection-union tests based on six separate VuongBIC, have been widely used for this purpose, in spite of being (Clarke) tests:unable to quantify the uncertainty associated with the model Consider the parametric family of conditional models: {f(y | x; f1 vs f2 , f1 vs f3 , f1 vs f4 , f2 vs f3 , f2 vs f4 , f3 vs f4selection call. Here we propose three novel hypothesis tests to φ): φ ϵ Ф}.perform model selection among models representing distinct We propose three distinct CMST tests: (1) parametric, (2) non- Then parametric, and (3) joint-parametric CMST tests.causal relationships. We focus on models composed of pairs ofphenotypes and use their common QTL to determine which KLIC(h0, f) = E0[log h0(y | x)] – E0[log f(y | x; φ)], The results are shown below:phenotype has a causal effect on the other, or whether thephenotypes are not causally related, and are only statistically where the expectation E0 is computed w.r.t h0(y, x), and φ* is the Parametric CMST:associated. Our hypothesis tests are fully analytical and avoid parameter value that minimizes KLIC(h0, f). H0: model M1 is not closer to the true model than M2, M3 or M4.the use of computationally expensive permutation or re-sampling Consider two models: f1 ≡ f1(y | x; φ1*) and f2 ≡ f2(y | x; φ2*). H1: model M1 is closer to the true model than M2, M3 and M4.strategies. They adapt and extend Vuongs (and Clarke’s) modelselection test to the comparison of four possibly misspecifiedmodels, handling the full range of possible causal relationships Model f1 is a better approximation of h0 than f2 if and only if H0: { E0[LR12] = 0 } { E0[LR13] = 0 } { E0[LR14] = 0 }among a pair of phenotypes. We evaluate the performance of our H1: { E0[LR12] > 0 } ∩ { E0[LR13] > 0 } ∩ { E0[LR14] > 0 }tests against the AIC, BIC and a published causality inference KLIC(h0, f1) < KLIC(h0, f2)  E0[log f1] > E0[log f2]. The rejection region and p-value for this IU-test are given by:test in simulation studies. Furthermore, we compare theprecision of the causal predictions made by the methods using Let LR12 = log f1 – log f2. Then we testbiologically validated causal relationships extracted from a min{z12 , z13 , z14} > cα , p1 = max{p12 , p13 , p14}.database of 247 knockout experiments in yeast. Overall, our H0: E0[LR12] = 0, H1: E0[LR12] > 0, H2: E0[LR12] < 0.model selection hypothesis tests achieve higher precision thanthe alternative methods at the expense of reduced statistical The quantity E0[LR12] is unknown, but the sample mean and Non-parametric CMST:power. variance of Analogous to the parametric CMST. Just replace Vuong’s by LR = log f – log f 2,i, f 1 ≡ f(y | x; φ 1), φ ≡ 12,i 1,i 1 Clarke’s tests. ML est. of φ1 Pairwise Causal Models converve a.s. to E0[LR12] and Var0[LR12] = σ12.12 . Joint parametric CMST:Given a pair of phenotypes, Y1 and Y2, that co-map to the samequantitative trait loci, Q, we consider the following models: Let LR = ∑ LR , then under H0 12 12,i Simple application of Vuong tests, overlooks the dependency among the test statistics. (n σ 12.12 )−1/2 LR 12 →d N(0, 1). Let S1 represent the sample covariance matrix of LR 12,i , Yeast Data Analysis If different models have different dimensions we consider LR 13,i and LR 14,i. We analyzed the yeast genetical genonics data set from Brem LR *12 = LR 12 – D12 Under regularity conditions we have that S1 converges a.s. to and Kruglyak (2005). Σ1. where D12 represents a difference of AIC or BIC penalties, and We evaluated the precision of the causal predictions made by adopt the test statistic the methods using validated causal relationships extracted It follows from the MCT and Slutsky’s theorem that when Z12 = (n σ 12.12 )−1/2 LR *12 . from a data-base of 247 knock-out experiments (Hughes ( E0[LR12] , E0[LR13] , E0[LR14] )T = ( 0 , 0 , 0 )T 2000, Zhu 2008). Clarke’s Model Selection Test we have that In total, 46 of the ko-genes showed significant eQTLs, and Conclusions Represents a non-parametric version of Vuong’s test. we tested a total of 4,928 ko-gene/putative target gene Z1 = n−1/2 diag(S1 )−1/2 LR 1 →d N3(0 , ρ1) relations.Advantages of the Causal Model Selection Tests: Vuong’s null: the mean log-likelihood ratio is 0. Clarke’s null: the median log-likelihood ratio is 0. where LR 1 = ( LR 12 , LR 13 , LR 14 )T and ρ1 = diag1- Fully analytical hypothesis tests that avoid the use of (S1)−1/2 Σ1 diag(S1)−1/2computationally expensive permutation or re-sampling Paired sign test on log-likelihood scores:techniques. We consider the hypotheses Scores: (LR 12,1 , LR 12,2 , LR 12,3 , LR 12,4 , LR 12,5 ,2- Achieve better controlled type I error rates. … , LR 12,n ) H0: min{ E0[LR12] , E0[LR13] , E0[LR14] } ≤ 0 Signs: ( + , − , + , + , − , … , H1: min{ E0[LR12] , E0[LR13] , E0[LR14] } > 03- Achieve higher precision rates. + ) and adopt the test statistic W1 = min{Z1}. The p-value is Let, T12 = {# of positive signs}. Then under Clarke’s null computed asMain disadvantage: lower statistical power. T12 ~ Binomial(n, 1/2). P(W1 ≥ w1) = P(Z12 ≥ w1 , Z13 ≥ w1 , Z14 ≥ w1).
  26. 26. ELIAS NETO Causal Model Selection Hypothesis Tests in Systems Genetics The Schadt et al. (2005) approach was based on a penalized likelihood model selection approach, were we simply select the model with the best score. The proposed hypothesis test allows us to attach a p-value to the selected model and, in this way, allows the quantification of the uncertainty associated with the model selection call. The proposed tests are fully analytical and avoid computationally expensive permutation and re- sampling techniques.
  27. 27. ZHI  WANG   A  mulQ-­‐Qssue  immune-­‐driven  theory  of  weight  loss   Hypothalamus   Lep4n   signaling   FaDy  acids   Macrophage/   inflamma4on   Liver   Adipose   M1  macrophage   Phagocytosis-­‐   Phagocytosis-­‐   induced  lipolysis   induced  lipolysis  
  28. 28. PLATFORM Sage Platform and Infrastructure Builders- ( Academic Biotech and Industry IT Partners...) PILOTS= PROJECTS FOR COMMONS Data Sharing Commons Pilots- (Federation, CCSB, Inspire2Live....) ORMM APS F PLAT NEW RULES GOVERN
  29. 29. Why not share clinical /genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning
  30. 30. Leveraging Existing TechnologiesAddama Taverna tranSMART
  31. 31. sage bionetworks synapse project Watch What I Do, Not What I Say
  32. 32. sage bionetworks synapse project Reduce, Reuse, Recycle
  33. 33. sage bionetworks synapse project Most of the People You Need to Work with Don’t Work with You
  34. 34. sage bionetworks synapse project My Other Computer is Cloudera Amazon Google
  35. 35. Sage Metagenomics Project Processed Data (S3)•  > 10k genomic and expression standardized datasets indexed in SCR•  Error detection, normalization in mG•  Access raw or processed data via download or API in downstream analysis•  Building towards open, continuous community curation
  36. 36. Sage Metagenomics using Amazon Simple Workflow Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/
  37. 37. Amazon SWF and Synapse•  Maintains state of analysis •  Hosts raw and processed data for•  Tracks step execution further reuse in public or private projects•  Logs workflow history •  Provides visibility into•  Dispatches work to Amazon or intermediate results and remote worker nodes algorithmic details•  Efficiently match job size to •  Allows programmatic access to hardware data; integration with R•  Provides error handling and •  Provides standard terminologies recovery for annotations •  Search across data sets
  38. 38. Synapse Roadmap•  Data Repository•  Projects and security Synapse Platform Functionality•  R integration •  Workflow templates•  Analysis provenance •  Social networking •  Publishing figures •  User-customized • Search •  Wiki & collaboration tools dashboards • Controlled Vocabularies •  Integrated management •  R Studio integration • Governance of restricted of cloud resources •  Curation tool integration data Internal Alpha Public Beta Testing Synapse 1.0 Synapse 1.5 Future Q1-2012 Q2-2012 Q3-2012 Q4-2012 Q1-2013 Q2-2013 Q3-2013 Q4-2013 • TCGA •  Predictive modeling •  TBD: Integrations with other •  METABRIC breast workflows visualization and analysis cancer challenge •  Automated processing of packages common genomics platforms•  40+ manually curated clinical studies•  8000 + GEO / Array Express datasets•  Clinical, genomic, compound sensitivity•  Bioconductor and custom R analysis Data / Analysis Capabilities
  39. 39. INTEROPERABILITYSYNAPSE   Genome Pattern CYTOSCAPE tranSMART I2B2 INTEROPERABILITY  
  40. 40. Now  accep4ng   submissions   Editor-­‐in-­‐Chief          Eric  Schadt  (USA)  Open  Network  Biology  is  an  open  access  journal  that  publishes  arQcles  relaQng  to  predicQve,   network-­‐based   models   of   living   systems   linked   to   the   corresponding  coherent   data   sets   upon   which   the   models   are   based.   In   addiQon   to   arQcles  describing   these   large   data   sets,   the   journal   also   welcomes   submissions   of  original   research,   sobware   and   methods,   along   with   reviews   and   commentary,  relevant  to  the  emerging  field  of  network  biology.  Submit  your  manuscript  and  benefit  from:   •   High  visibility  for  arQcles  through  unrestricted  online  access   •   Free  arQcle  redistribuQon  under  a  CreaQve  Commons  aHribuQon  license   •   No  limits  on  arQcle  length,  addiQonal  files,  colour  figures  or  movies   •   Rapid,  immediate  open  access  publicaQon  on  acceptance   •   An  integrated  repository  for  network  model  data  and  code   www.opennetworkbiology.com  
  41. 41. Five  Pilots  involving  Sage  Bionetworks   CTCAP   Arch2POCM   The  FederaQon   ORM S Portable  Legal  Consent   MAP F Sage  Congress  Project   PLAT NEW RULES GOVERN
  42. 42. Clinical Trial Comparator Arm Partnership (CTCAP)  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development. Started Sept 2010
  43. 43. Shared clinical/genomic data sharing and analysis will maximize clinical impact and enable discovery•  Graphic  of  curated  to  qced  to  models  
  44. 44. Arch2POCM  Restructuring  the  PrecompeQQve   Space  for  Drug  Discovery   How  to  potenQally  De-­‐Risk       High-­‐Risk  TherapeuQc  Areas  
  45. 45. Arch2POCM: scale and scope•  Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/ Immunology. One for Neuroscience/Schizophrenia/Autism. Both programs will have 8 drug discovery projects (targets) - ramped up over a period of 2 years –  It is envisioned that Arch2POCM’s funding partners will select targets that are judged as slightly too risky to be pursued at the top of pharma’s portfolio, but that have significant scientific potential that could benefit from Arch2POCM’s crowdsourcing effort•  These will be executed over a period of 5 years making a total of 16 drug discovery projects –  Projected pipeline attrition by Year 5 (assuming 12 targets loaded in early discovery) •  30% will enter Phase 1 •  20% will deliver Ph 2 POCM data 52
  46. 46. The  FederaQon  
  47. 47. How can we accelerate the pace of scientific discovery? 2008   2009   2010   2011   Ways to move beyond “traditional” collaborations? Intra-lab vs Inter-lab Communication Colrain/ Industrial PPPs Academic Unions
  48. 48. (Nolan  and  Haussler)  
  49. 49. sage federation:model of biological age Faster Aging Predicted  Age  (liver  expression)   Slower Aging Clinical Association -  Gender -  BMI -  Disease Age Differential Genotype Association Gene Pathway Expression Chronological  Age  (years)  
  50. 50. Reproducible  science==shareable  science   Sweave: combines programmatic analysis with narrativeDynamic generation of statistical reports using literate data analysis Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reportsusing literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 – Proceedings in Computational Statistics,pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
  51. 51. Federated  Aging  Project  :     Combining  analysis  +  narraQve     =Sweave Vignette Sage Lab R code + PDF(plots + text + code snippets) narrative HTML Data objectsCalifano Lab Ideker Lab Submitted Paper Shared  Data   JIRA:  Source  code  repository  &  wiki   Repository  
  52. 52. For 11/12 compounds, the #1 predictive feature in an unbiasedanalysis corresponds to the known stratifier of sensitivity #2  CML  lineage   CML lineage #1  EGFR  mut   EGFR mut #1  EGFR  mut   EGFR mut #1  CML  lineage   #1  EGFR  mut   CML linage EGFR mut #1  ERBB2  expr   ERBB2 expr Can  the  approach  make  new   mut   #1  BRAF   discoveries?   BRAF mut #1  HGF  expr   HGF expr #2  NRAS  mut   NRAS mut BRAF mut #1  BRAF  mut   #3  KRAS  mut   KRAS mut #2  NRAS  mut   NRAS mut BRAF mut #1  BRAF  mut   #3  KRAS  mut   KRAS mut #2  NRAS  mut   NRAS mut BRAF mut #1  BRAF  mut   #2  TP53  mut   TP53 mut #3  CDKN2A  copy   CDKN2A copy #1  MDM2  expr   MDM2 expr 59  
  53. 53. Presentation outline1)  Predic4ng  drug  response   2)  Future  approaches:   3)  Standardized  from  cancer  cell  lines   network-­‐based  predictors   workflows  for  data   and  mul4-­‐task  learning   management,   Cancer  cell  line   versioning  and   encyclopedia   method  comparison  Molecular characterization Network  /  pathway  (1,000 cell lines) prior  informa4on   Currently   mRNA   copy number   somatic mutations (36 cancer-related genes) In progress   targeted exon sequencing Vaske,  et  al.     epigenetics   microRNA TCGA  /ICGC     lncRNA Transfer  Molecular characterization learning  (50 tumor types)   phospho-tyrosine kinase   metabolitesViability screens (500 cell   genomicslines, 24 compounds)   transcriptomicsSmall molecule screen   epigenetics Predic4ve   Clinical data model   Vaske,  et  al.  
  54. 54. 1)  Data  management  APIs  to  load  standaridzed  objects,  e.g.   R  ExpressionSets  (MaD  Furia):            ccleFeatureData  <-­‐  getEnQty(ccleFeatureDataId)            ccleResponseData  <-­‐  getEnQty(ccleResponseDataId)   2)      tAutomated,  standardized  workflows  for  cura4on  and  QC  of   large-­‐scale  datasets  (-­‐  getEnQty(tcgaFeatureDataId)           cgaFeatureData  < Brig  Mecham).            tcgaResponseData  <-­‐  getEnQty(tcgaResponseDataId)   A.  TCGA:  Automated  cloud-­‐based  processing.   B. GEO  /  Array  Expression:  NormalizaQon  workflows,  curaQon   of  phenotype  using  standard  ontologies.   C. AddiQonal  studies  with  geneQc  and  phenotypic  data  in   Sage  repository  (e.g.  CCLE  and  Sanger  cell  line  datasets)   Observed Data!=! Systematic Variation! +! Random Variation! =! +! +! 3)  Pluggable  API  to  implement  predic4ve  modeling   algorithms.  Normalization: Remove the influence of adjustment variables on data...! A)  Support  for  all  commonly  used  machine  learning  methods  4)  Sta4s4cal  performance  assessment  ew  methods)   (for  automated  benchmarking  against  n across  models.   B)  Pluggable  custom  =! ethods  as  R  classes  implemenQng   m customTrain()  and  customPredict()  methods.   +!custom  model  1   be  arbitrarily  complex  (e.g.  pathway  and  other   A)  Can   custom  model  2   custom  model  N   priors)   5)  Output  of  candidate  biomarkers  and  feature   B)  Support  for  parallelizaQon  in  for  each  loops.   evalua4on  (e.g.  GSEA,  pathway  analysis)  custom  model  1   custom  model  2   custom  model  N   6)  Experimental  follow-­‐up  on  top  predic4ons  (TBD)        E.g.  for  cell  lines:  medium  throughput  suppressor  /  enhancer   screens  of  drug  sensiQvity  for  knockdown  /  overexpression  of   predicted  biomarkers.  
  55. 55. Portable  Legal  Consent   (AcQvaQng  PaQents)   John  Wilbanks  
  56. 56. weconsent.us  
  57. 57. Sage  Congress  Project   April  20  2012   RealNames  Parkinson’s  Project  RevisiQng  Breast  Cancer  Prognosis   Fanconi’s  Anemia   (Responders  CompeQQons-­‐  IBM-­‐DREAM)  
  58. 58. Networking  Disease  Model  Building  

×