Friend EORTC 2012-11-08


Published on

Stephen Friend Nov 8, 2012. 24th EORTC-NCI-AACR Symposium on
Molecular Targets and Cancer Therapeutics, Dublin, Ireland

Published in: Health & Medicine
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Friend EORTC 2012-11-08

  1. 1. Integrating Cancer Networks and the Value of Compute Spaces Stephen H Friend November 8, 2012 EORTC/NCI Dublin
  2. 2. Oncogenes only make good targets in particular molecularcontexts : EGFR story ERBB2 • EGFR Pathway commonly mutated/activated in Cancer EGFRi EGFR • 30% of all epithelial cancers BCR/ABL • Blocking Abs approved for treatment of metastatic colon cancer KRAS NRAS • Subsequently found that RASMUT tumors don’t respond – “Negative Predictive Biomarker” BRAF • However still EGFR+ / RASWT patients who don’t MEK1/2 respond? – need “Positive Predictive Biomarker” • And in Lung Cancer not clear that RASMUT status is Proliferation, Survival useful biomarker Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function
  3. 3. Reality: Overlapping Pathways
  4. 4. Preliminary Probabalistic Models- Rosetta Networks facilitate direct identification of genes that are causal for disease Evolutionarily tolerated weak spots Gene symbol Gene name Variance of OFPM Mouse Source explained by gene model expression* Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12] Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple (UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg (Columbia University, NY) [11] C3ar1 Complement component 46% ko Purchased from Deltagen, CA 3a receptor 1 Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CANat Genet (2005) 205:370 factor beta receptor 2
  5. 5. Extensive Publications now Substantiating Scientific Approach Probabilistic Causal Bionetwork Models >80 Publications from Rosetta Genetics/ Sage BionetworksMetabolic "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003) Disease "Variations in DNA elucidate molecular networks that cause disease." Nature. (2008) "Genetics of gene expression and its effect on disease." Nature. (2008) "Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009) ….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etcCVD "Identification of pathways for atherosclerosis." Circ Res. (2007) "Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008) …… Plus 5 additional papers in Genome Res., Genomics, Mamm.GenomeBone "Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005) d “..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)Methods "An integrative genomics approach to infer causal associations ...” Nat Genet. (2005) "Increasing the power to detect causal associations… “PLoS Comput Biol. (2007) "Integrating large-scale functional genomic data ..." Nat Genet. (2008) …… Plus 3 additional papers in PLoS Genet., BMC Genet.
  6. 6. Iterative Networked ApproachesTo Generating Analyzing and Supporting New Models Data Biological System Analysis Uncouple the automatic linkage between the data generators, analyzers, and validators
  7. 7. An Alternative Biomedicine Information CommonsCommons are resources that are owned in common or shared amongcommunities. -David Bollier
  8. 8. Sage BionetworksA non-profit organization with a vision to enable networked team approaches to building better models of disease BIOMEDICINE INFORMATION COMMONS INCUBATOR Technology Platform Governance Impactful Models Better Models of Disease: INFORMATION COMMONS Challenges
  9. 9. Sage Bionetworks Collaborators Pharma Partners  Merck, Pfizer, Takeda, Astra Zeneca, Amgen,Roche, Johnson &Johnson Foundations  Kauffman CHDI, Gates Foundation Government  NIH, LSDF, NCI Academic  Levy (Framingham)  Rosengren (Lund)  Krauss (CHORI) Federation  Ideker, Califano, Nolan, Schadt 11
  10. 10. IT/Data Constituencies Generators Technology Platform Individual Patients Biotech Governance Impactful Models Better Models of Disease:Pharma INFORMATION Patient COMMONS Foundations Challenges Joint Academic Patient/Scient Consortia ist Communities
  11. 11. Background: Information Commons for Biological Functions
  12. 12. Networked Approaches BioMedicine Information Commons Patients/ Citizens Data Generators CURATED DATA Data TOOLS/ Analysts METHODS RAW DATA ANALYZES/ MODELS Clinicians SYNAPSE Experimentalists
  13. 13. FOUR PILOTS IN THE SAGE BIONETWORKS COMMONS INCUBATOR• Provide a “compute space” for hosting and sharing models – (to complement data storage and tools provided by Sanger Broad…)- SYNAPSE)• Co-generate models of drivers for Cell Line/Clinical Sensitivity• Host Challenges and other approaches that will maximize most people providing and sharing their insights as quickly as possible – - BCCOverview:0• Engage citizens as partners in gathering information and insights and funds
  14. 14. Two approaches to building common scientific and technical knowledge Every code change versioned Every issue trackedText summary of the completed project Every project the starting point for new workAssembled after the fact All evolving and accessible in real time Social Coding
  15. 15. “Synapse is a compute platform for transparent, reproducible, andmodular collaborative research.”
  16. 16. Synapse is GitHub for Biomedical Data • Every code change versioned • Every issue tracked • Every project the starting point for new work• Data and code versioned • Social/Interactive Coding• Analysis history captured in real time• Work anywhere, and share the results with anyone• Social/Interactive Science
  17. 17. Currently at 16K+ datasets and ~1M models
  18. 18. Demo InteractionDownload Data from Web Programmatic Access to Data
  19. 19. Demo InteractionDownload Data from Web Programmatic Access to Data
  20. 20. Data Repository: with versions Points to specific version of repository
  21. 21. Pancancer collaborative subtype discovery
  22. 22. Download analysis and meta-analysisDownload another Cluster Download Evaluation and view moreResult stats • Perform Model averaging • Compare/contrast models • Find consensus clusters
  23. 23. Synapse infrastructure for sharing, searching, and analyzing TCGA data Copy* Muta6on* Phenotype* • Comparison of many modeling approaches appliedExpression* number* to the same data. • Models transparently shared and reusable through Copy*Expression* number* Muta6on* Phenotype* Synapse. • Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs. • Extending pipeline to evaluate prediction ofExpression* Expression* Phenotype* Phenotype* TCGA phenotypes. Copy* Copy* number* number* • Hosting of collaborative competitions to compare Muta6on* Muta6on* models from many groups. Accuracy$ 2)$ (R Predic. on$ Predic6ve* model* genera6on* Performance* assessment* 130$ drugs$
  24. 24. Synapse transparent, reproducible, versioned machinelearning infrastructure for method comparison Copy* 1) Automated, standardized workflows forExpression* number* Muta6on* Phenotype* curation, QC and hosting of large-scale datasets (Brig Mecham). 2) Programmatic APIs to load standaridzed Copy* Muta6on* Phenotype* objects, e.g. R ExpressionSets (Matt Furia):Expression* number* Load cell line feature and response data: > ccleFeatureData <- getEntity(ccleFeatureDataId) > ccleResponseData <- getEntity(ccleResponseDataId) Load TCGA feature and phenotype data (in same format as cell line data):Expression* Expression* 4) tcgaFeatureData <-<- getEntity(tcgaResponseDataId) > Statistical performance assessment across models. getEntity(tcgaFeatureDataId) Phenotype* Phenotype* > tcgaResponseData Copy* Copy* number* number* custom model 1 custom model 2 custom model N Muta6on* Muta6on* 3) Pluggable API to implement predictive modeling algorithms. User implements customTrain() and 5) Output of candidate biomarkers and feature Predic6ve* model* customPredict() functions. evaluation (e.g. GSEA, pathway analysis) genera6on* custom model 1 all commonly2used machinemodel N Support for custom model custom learning methods (for automated Performance* benchmarking against new methods) assessment*
  25. 25. Objective assessment of factors influencing modelperformance (>1 million predictions evaluated) Sanger CCLE Prediction accuracyCross validation prediction accuracy (R2) improved by… Not discretizing data Including expression data Elastic net regression 130 compounds In Sock Jang 24 compounds
  26. 26. Assessment of pathway enrichment of inferredpredictive feature sets KEGG REACTOME BIOCARTA SangerPathways CCLE Compounds
  27. 27. Data Analysis with SynapseRun Any ToolOn Any PlatformRecord in SynapseShare with Anyone
  28. 28. Why Stratifying Patients for Therapy Matters 43% 59% EGFR 60% mCRC patients are KRAS RASwt Chemotherapy Chemotherapy + Cetuximab BRAF 40% 36% MEK1/2 Metastatic Proliferation, Colorectal Cancer (mCRC) Survival 40% mCRC patients are responder RASmut non-responder But not all CRC patients that are RASwt respond to Cetuximab In other cancers for which it is efficacious RAS status appears not to predict response (e.g. lung) 30
  29. 29. RAS Model using primary tumor data to predict KRAS mutation status 290 CRC samples: • KRAS12 or KRAS13 (n=115) vs WT (n=175) • Penalized regression model using ElasticNet and gene expression data 1.0 0.8 Robust External True positive rate 0.6 Validation In CRC TCGA CRC Khambata−Ford 0.4 Gaedcke data sets 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate Model specific to CRC: does not generalized to other KRAS dependent cancersRAS signatures derived from CRC cohort can classify mutation status in CRC 31
  30. 30. RIS 0.0 0.2 0.4 0.6 0.8 1.0 kras.p.G12D kras.p.G12V kras.p.G13D kras.p.A146T kras.p.G12C kras.p.G12S kras.p.G12A kras.p.K117N kras.p.Q61L kras.p.A146V kras.p.E98X kras.p.G12R kras.p.G13C kras.p.Q22K kras.p.R68S braf.p.V600E braf.p.E228V braf.p.F247L braf.p.K205Q nras.p.Q61K nras.p.G12C nras.p.G12D nras.p.G13R nras.p.Q61L nras.p.E132K Putative novel activating KRAS mutations nras.p.G12A nras.p.Q61H nras.p.Q61R nras.p.R164C Exploring the RASness Model in TCGA Colorectal Carcinoma WT wt braf kras nras32
  31. 31. Can we predict response to RAS Pathway Drugs in CRC Cell lines?Correlate RASness Score with IC50 for drugs across 21 CRC cell lines from CCLE1 panel P value ERBB2 EGFR Note: KRAS BCR/ABL and/or BRAF mutation status KRAS NRAS NOT predictive of response to BRAF MEK inhibitor PD-0325901 MEK1/2 AZD6244 Proliferation, Survival RASness Model Translates to predict response to RAS pathway drugs in CRC cell lines 1. Barretina et al. 2012 Nature. 483:603: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. 33
  32. 32. RASness Model Predicts response to Cetuximab in patient and xenograftdata tumor, n=19 xeno early, n=26 p=0.023 (p>.5) p=0.017 (p=0.03) ● ● ● 0.4 0.6 0.8 ● ● ● ● ● 0.5 ● ● ●54 xenograft models ● ● ● ● ● RIS RIS ● ●115 expression arrays, xenograft and primary ● 0.3 ● ● ● ● ● ● ● ● ●tumor ● ● ● ● ● 0.2 ● ● ● ● 0.1Kras, braf, pik3ca, apc profiled ● ● ● ●Response to cetuximab, 5-FU, I-OHP, CPT-11 Non−response Response Non−response Responsemeasured xeno late, n=28 tumor + xeno, n=73 p=0.0034 (p>.5) p=1.4e−05 (p=0.13) ● 0.7 ● ● 0.8 ● ● ● ● ● ● ● ● ● ● ● 0.5 RIS ● ● ● RIS ● ● ● ● ● ● 0.4 ● ● ● ● ● ● ● ● ● ● ● ● kras ● 0.3 ● ● ● ● ● ● ● ● ● ● ● ● braf ● ● ● ● ● ● ● ● ● kras+braf ● ● ● ● ● ● ● ● ● ● 0.0 ● wt 0.1 ● Non−response Response Non−response Response RAS model predicts response to Cetuximab better than mutation status 34
  33. 33. Predictive models of cancer phenotypes Panel of tumor samplesMolecularcharacterization mRNA copy number somatic Predictive mutations model epigenetics proteomicsCancerphenotypes Drug sensitivity screens Clinical 15 prognosis
  34. 34. Developing predictive models of genotype-specificsensitivity to compound treatment Genetic Feature Matrix Expression, copy number, somatic mutations, etc. Maximize:  Predictive Features log Pr  | C,G  ~  C  G 2     1  1     2 2 (biomarkers) 2 Cancer samples with varying degrees of response to therapy Sensitive Refractory (e.g. EC50) 36
  35. 35. Novel predictions are functionally validatedPrediction ValidationAHR expression predicts sensitivity Functionally validated by AHR knockdownto MEK inhibitors in NRAS mutantcell lines Legend AHR shRNA Wei G.*, Margolin A.A.*, et al, Cancer Cell Control shRNABCL-xL expression predicts Functionally validated by :sensitivity to severalchemotherapeutics BCL-xL knockdown BCL-xL inhibitor drug synergy ! "# &# ) * $% ( +, - &$# ( &* "# . /% * 0 0 &1&"23# 4* /# . 4# 5&6 7/# 4* 86 ) 94) * : 2"&6 7/# 4* =><"* ?! @*BCL$xL&Expression& /, 5$, 5) *& ; <"*Doxorubicin*Triptolide*Eme3ne* Mouse models Clinical trialsActD*Flavopiridol*Anicomycin*Puromycin* 37
  36. 36. REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge
  37. 37. What is the problem?Our current models of disease biology are primitive and limit doctor’s understanding and ability to treat patientsCurrent incentives reward those whosilo information and work in closedsystems
  38. 38. The Solution: Competitions to crowd-source researchin biology and other fields Why competitions? • Objective assessments • Acceleration of progress • Transparency • Reproducibility • Extensible, reusable models Competitions in biomedical research • CASP (protein structure) • Fold it / EteRNA (protein / RNA structure) • CAGI (genome annotation) • Assemblethon / alignathon (genome assembly / alignment) • SBV Improver (industrial methodology benchmarking) • DREAM (co-organizer of Sage/DREAM competition) Generic competition platforms • Kaggle, Innocentive, MLComp
  39. 39. METABRIC Anglo-Canadian collaboration •Array-CGH •Expression arrays •Sequencing TP53 PIK3CA •Amplified DNA and cDNA banks •miRNA profiling Gene sequencing (ICGC)
  40. 40. Sage/DREAM Challenge: Details and TimingPhase 1: July thru end-Sep 2012 Phase 2: Oct 15 thru Nov 12, 2012 Training data: 2,000 breast cancer samples from METABRIC cohort  Evaluation of models in novel • Gene expression dataset. • Copy number • Clinical covariates  Validation data: ~500 fresh • 10 year survival frozen tumors from Norway Supporting data: Other Sage-curated group with: breast cancer datasets • Clinical covariates • >1,000 samples from GEO • ~800 samples from TCGA • 10 year survival • ~500 additional samples from Norway group • Curated and available on Synapse, Sage’s compute platform Data released in phases on Synapse from now through end-September Will evaluate accuracy of models built on METABRIC data to predict survival in: • Held out samples from METABRIC • Other datasets
  41. 41. Synapse transparent, reproducible, versioned machinelearning infrastructure for method comparison Copy* Muta6on* Phenotype*Expression* number* Copy* Muta6on* Phenotype*Expression* number*Expression* Expression* Phenotype* Phenotype* Copy* Copy* Custom models implement train() and number* number* predict() API. Muta6on* Muta6on* Predic6ve* model* genera6on* Performance* assessment* Implementation of simple clinical-only survival model used as baseline predictor.
  42. 42. Models submitted and Federation modeling evaluated in real-time competition leaderboard >200 models tested within 3 months Gustavo% Stolovi= ky)Erhan% In%Sock%Jang) Ben% Sauerwine) Bilal)Stephen%Friend) Marc% Vidal) Adam% Margolin) Andrea% Gaurav% Guinney) Ben% Justin% Logsdon) Califano) Yishai% Eric% Schadt) Pandey) Garry% Nolan) Shimoni) Trey% Ideker) Mukesh% Janusz% Dutkowski) Bansal) Mariano% Alvarez)
  43. 43. Sage-DREAM Breast Cancer Prognosis Challenge one month of building better disease models together breast cancer data154 participants; 27 countries 268 participants; 32 countries August 17 StatusChallenge Launch: July 17 290 models posted to Leaderboard
  44. 44. Summary of Breast Cancer Challenge #1 - BCCOverview:0Transparency, Validation in novelreproducibility Copy* Expression* number* Muta6on* Phenotype* dataset Copy* Muta6on* Phenotype* Expression* number* Expression* Expression* Phenotype* Phenotype* Copy* Copy* number* number* Muta6on* Muta6on* Predic6ve* model* genera6on* Performance* assessment*Publication in Science Donation of Google-Translational Medicine scale compute space. For the goal of promoting democratization of medicine… Registration starting NOW… sign up at:
  45. 45. FOUR PILOTS IN THE SAGE BIONETWORKS COMMONS INCUBATOR• Provide a “compute space” for hosting and sharing models – (to complement data storage and tools provided by Sanger Broad…)- SYNAPSE)• Co-generate models of drivers for Cell Line/Clinical Sensitivity• Host Challenges and other approaches that will maximize most people providing and sharing their insights as quickly as possible – - BCCOverview:0• Engage citizens as partners in gathering information and insights and funds
  46. 46. Networked Approaches BioMedicine Information Commons Patients/ Citizens Data Generators CURATED DATA Data TOOLS/ Analysts METHODS RAW DATA ANALYZES/ MODELS Clinicians SYNAPSE Experimentalists
  47. 47. Upon this gifted age, in its dark hour,Rains from the sky a meteoric showerOf Facts…they lie unquestioned,uncombined.Wisdom enough to leech us of our illIs daily spun; but there exists no loomTo weave it into fabric. - Edna St. Vincent Millay