tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehouse for Translational Medicine at Takeda Pharmaceuticals International
 


David Merberg, Takeda
We have used the tranSMART platform to construct a warehouse containing data from several
Takeda clinical trials, proprietary preclinical drug activity studies, 1600 Gene Expression
Omnibus studies, and data from TCGA, CCLE, and other sources. All gene expression data has
been globally normalized. We extended the tranSMART platform with a set of R function calls
to enable cross-study queries and analysis via the rich toolset available in R. The utility of the
data warehouse is exemplified by a study in which we built a predictive model for drug
sensitivities. The model was trained on gene expression and IC50 data from cell lines and was
found to correctly predict drug activity in oncology indications.

    Presentation Transcript

    • tranSMART: a data warehouse for Translational Medicine at Takeda Pharmaceuticals International Co.
      tranSMART Community Workshop, November 2013
      David Merberg, Bin Li, William Trepicchio
    • Outline
      • Takeda’s tranSMART instance
        – Goal
        – Data content
        – Enhancements
      • Case Studies
        – Models for predicting erlotinib and sorafenib efficacy
    • Takeda rationale for implementing tranSMART
      • To provide a large, well-organized, and integrated dataset consisting of MPI/Takeda proprietary data, outsourced data, and valuable public data
      • To provide an integrated environment for accessing clinical data and molecular profiling data
        – Low-dimensional data: age, sex, weight, previous treatments, survival, etc.
        – High-dimensional data: gene expression microarray, SNP, mutation, NGS
      • To provide tools that will enable Medical and Discovery scientists to use this data warehouse for biomarker identification, patient stratification, drug targeting, disease prediction, etc.
    • Public data currently in Takeda tranSMART
      • Gene Expression Omnibus (GEO)
        – Approximately 1600 studies
        – Approximately 200 key cancer studies manually curated; another ~150 cancer studies curated via text mining
        – Most GEO datasets are cancer studies, but there are also samples from cardiovascular disease, metabolic diseases, hematopoietic diseases, and many others
      • The Cancer Genome Atlas (TCGA)
        – Gene expression, SNP, and clinical data from close to 1000 patients (brain, lung, and ovarian cancer)
      • Large cell line panels
        – The CCLE dataset: ~1000 cell lines, screened for 24 SOC drugs
        – The Sanger dataset: ~1000 cell lines, screened on >100 SOC drugs
    • Proprietary data currently in Takeda tranSMART
      • Velcade Trials
        – Clinical observations
        – Gene expression results
        – Mutation data
      • Commissioned Studies
        – Oncopanel 240: cell line response to Takeda and SOC compounds
          • Drug response (IC50, EC50, cell cycle blocks, apoptosis induction, etc.)
          • Mutation status
          • Gene expression
        – Oncotest: xenograft response to Takeda and SOC compounds
          • Drug response (IC50)
          • Mutation status
          • Gene expression
          • SNP
    • OncoPanel 240 (Ricerca/Eurofins Panlabs)
      • 240 well-defined tumor cell lines representing diverse tumor types
      • Drug sensitivity screen results (IC50, EC50)
        – for 13 Standard of Care (SOC) anti-tumor compounds
        – for 8 Takeda compounds targeting diverse pathways
      • Baseline gene expression
      • Mutation data
    • Normalization of information in the data warehouse
      • Gene expression data
        – Globally normalized GEO gene expression data using frozen Robust Multiarray Analysis (fRMA)
          • Quantile-based normalization
          • Currently, only selected Affymetrix platforms are globally normalized
        – Enabled grouping gene expression results from different labs and different studies by disease
      • Clinical information
        – Curate clinical information to create consistent vocabulary
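The quantile-based normalization step named above can be sketched in base R. This is a minimal illustration of plain quantile normalization on a toy matrix, not fRMA itself (fRMA normalizes against a frozen reference distribution estimated from a large training set). The function name `quantile_normalize` and the toy data are assumptions made for illustration and are not part of the Takeda pipeline.

```r
# Quantile-normalize the columns (arrays) of a genes x arrays matrix.
# Simplification: ties are broken by order of appearance, not averaged.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) > 1)
  # Sort each column, then average across columns at each rank position.
  sorted <- apply(x, 2, sort)
  reference <- rowMeans(sorted)
  # Replace each value by the reference value at its within-column rank.
  ranks <- apply(x, 2, rank, ties.method = "first")
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy data: 4 hypothetical genes measured on 3 hypothetical arrays.
expr <- matrix(c(5, 2, 3, 4,
                 4, 1, 4, 2,
                 3, 4, 6, 8),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("array", 1:3)))
expr_qn <- quantile_normalize(expr)
```

After this step every array has the same set of values, so intensities from different arrays sit on one scale while each gene keeps its within-array rank.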
    • R interface
      • Enable direct access to tranSMART database tables
        – Eliminates some limitations of the web interface, e.g. the inability to perform multi-study queries and analyses
        – Provides a connection to the R environment, including diverse analysis packages
      • Sample functions
        – getDistinctConcepts: given a keyword/string, returns study codes for matching clinical concepts in the tranSMART database
        – getGEXData: given study codes, gets gene expression data from the tranSMART database
      > # Find clinical concepts matching a keyword, then collect their study codes
      > br_concepts <- transmart.getDistinctConcepts(,'Breast_Cancer')
      > study_list <- unique(br_concepts$STUDYCODE)
      > # Pull ITGB2 expression across all matching studies
      > ITGB2_GEP_BR2 <- transmart.getGEXData(study_list, gene.list='ITGB2', data.pivot=FALSE)
      > # Plot the pooled distribution of log intensities
      > hist(ITGB2_GEP_BR2$LOG_INTENSITY, breaks=50, xlim=c(5,12), main="All ITGB2 GEP", xlab="GEP")
    • Summary
      • A data warehouse with a large store of gene expression, SNP, and phenotypic data
        – Clinical samples and cell lines
        – Data normalized so that comparisons across studies are meaningful
        – Vocabulary standardized across studies
      • An R interface to facilitate cross-study analysis using a large collection of methods from statistics and machine learning
      • A “toolbox” for achieving key Translational Medicine goals
        – Bridging the gap between “omic” data generated in preclinical studies and clinical results
        – Predicting drug efficacy using clinical and pre-clinical information collected for different purposes
      • Case studies in using this toolbox follow . . .
    • Building and using a model to predict drug sensitivity
      [Figure: MLN7243 IC50 distribution on Ricerca panel (x-axis: cell lines; y-axis: IC50s)]
      Can we identify a relationship between baseline gene expression and drug sensitivity in cell lines, and then extrapolate from that relationship to use gene expression to predict drug efficacy in the clinic?
    • Building the predictive models
      [Figure: MLN7243 IC50 distribution on Ricerca panel (x-axis: cell lines; y-axis: IC50s); labels: Oncopanel 240 drug sensitivity, Oncopanel 240 expression data]
      • Normalize all Oncopanel 240 expression data
      • Remove low-intensity and low-variance genes (to get robust signal)
      • Correlation-based feature selection (gene expression vs IC50s)
      • Develop a methodology for deriving drug sensitivity models
        – Based on Partial Least Squares Regression (PLSR)
        – Captures consensus information from cancer cell line panel data
      • Use two SOC drugs as proof of concept for methodology
        – Predict erlotinib (inhibits EGFR) sensitivity
        – Predict sorafenib (inhibits VEGFR and PDGFR) sensitivity
        – Use PFS from BATTLE trial to evaluate performance of models
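The modeling steps listed above (filtering, correlation-based feature selection, PLSR) can be sketched in base R on simulated data. Everything here is an assumption made for illustration: the data are random, the thresholds and the choice of 20 genes are arbitrary, and the model is a hand-written one-component PLS regression rather than the authors' actual methodology, whose details the slides do not give.

```r
set.seed(1)
n_lines <- 40   # hypothetical number of cell lines
n_genes <- 200  # hypothetical number of genes

# Simulated log2 expression (cell lines x genes) and log2(IC50) values.
expr <- matrix(rnorm(n_lines * n_genes, mean = 8, sd = 1), nrow = n_lines,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
# By construction, the first five genes carry the drug-response signal.
log_ic50 <- as.vector(expr[, 1:5] %*% rep(0.5, 5)) + rnorm(n_lines, sd = 0.3)

# Step 1: drop low-intensity and low-variance genes (arbitrary thresholds).
keep <- colMeans(expr) > 6 & apply(expr, 2, var) > 0.5
expr_f <- expr[, keep, drop = FALSE]

# Step 2: correlation-based feature selection against log2(IC50).
gene_cor <- cor(expr_f, log_ic50)[, 1]
top_genes <- names(sort(abs(gene_cor), decreasing = TRUE))[1:20]

# Step 3: one-component partial least squares regression on the selected genes.
x <- scale(expr_f[, top_genes], center = TRUE, scale = FALSE)
y <- log_ic50 - mean(log_ic50)
w <- crossprod(x, y)                  # weights proportional to cov(gene, y)
w <- w / sqrt(sum(w^2))               # normalize to unit length
scores <- x %*% w                     # latent component per cell line
y_loading <- sum(scores * y) / sum(scores^2)
fitted_ic50 <- as.vector(scores * y_loading) + mean(log_ic50)
```

The fitted values are in-sample, so they say nothing about accuracy on new cell lines or patients; a real analysis would select genes and fit the model inside cross-validation and would likely use more than one component.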
    • Accuracy of the erlotinib sensitivity model
      [Figure: re-predicting Oncopanel 240 log2(IC50)]
      • Accuracy estimation
        – Upper boundary: 91%
        – Lower boundary: 77%
    • Signature genes in the erlotinib model reflect known drug mechanism
      • Signature genes over-represent pathways that contain an EGFR node
      • Signature genes are over-connected to EGFR
      • Also, EGFR ligand NRG1 is among the signature genes
    • Real data tests of the models
      • Test 1: The BATTLE clinical trial
        – 255 lung cancer (NSCLC) patients, 131 with gene expression profile data (GSE33072)
          • 25 patients in erlotinib arm
          • 39 patients in sorafenib arm
        – Are the predictions of the PLSR models consistent with the results of the BATTLE trial?
      • Test 2: Predicting drug sensitivity across indications
        – Use model to predict erlotinib and sorafenib sensitivity based on gene expression data from 484 Gene Expression Omnibus datasets in Takeda tranSMART instance
          • 11,331 samples grouped into 19 major oncology indications
          • Calculate percentage of predicted drug-sensitive tumors for each indication
          • Compare predictions to results of phase III clinical trials and FDA approvals
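The per-indication percentage described in Test 2 can be sketched as a grouped mean in base R. The data frame, its column names, and the cutoff for calling a sample "predicted sensitive" are all hypothetical; the slides do not state how the sensitivity call was made.

```r
# Hypothetical per-sample predictions; labels and values are made up.
predictions <- data.frame(
  indication = c("lung", "lung", "lung", "lung",
                 "kidney", "kidney", "liver", "liver"),
  predicted_log_ic50 = c(-1.2, 0.8, -0.5, 1.5, 1.1, 0.9, -0.7, 0.4)
)

# Assumed rule: a sample below the cutoff is called predicted sensitive.
sensitivity_cutoff <- 0
predictions$sensitive <- predictions$predicted_log_ic50 < sensitivity_cutoff

# Percentage of predicted-sensitive samples within each indication.
pct_sensitive <- 100 * tapply(predictions$sensitive,
                              predictions$indication, mean)
```

Each entry of `pct_sensitive` is then the kind of figure that could be set against phase III outcomes and approvals for that indication.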
    • Test 1 – The BATTLE Trial: Survival analysis of groups predicted to be drug sensitive/resistant by PLSR model
      [Figure: four panels (A) to (D) plotting proportion of cases against months from start of therapy. Panel titles: E_model pred E_PFS; S_model pred S_PFS; S_model pred E_PFS; E_model pred S_PFS. Reported statistics: P = 0.09, HR = 0.43; P = 0.006, HR = 0.32; P = 0.32, HR = 1.87; P = 0.54, HR = 1.32]
      E: Erlotinib; S: Sorafenib; red: predicted sensitive; green: predicted resistant
    • Test 2: Are predictions of erlotinib sensitivity, grouped by indication, consistent with clinical results?
      • Kidney cancer is predicted to be erlotinib insensitive; a phase III clinical trial failed
      • Lung cancer is predicted to be erlotinib sensitive; a phase III clinical trial succeeded (companion diagnostic available)
      • Potential new indication? Multiple head and neck cancer trials are ongoing
    • Test 2: Are predictions of sorafenib sensitivity, grouped by indication, consistent with clinical results?
      • Kidney and liver cancers are predicted to be sorafenib sensitive; sorafenib has been approved for kidney and liver cancers
      • Potential new indication?
    • Conclusions
      • Using tranSMART, we created a large data warehouse to provide computational support for biomarker identification, patient stratification, and other Translational Medicine goals.
      • Patient and cell line data can be grouped across studies by indication or other attributes to increase statistical power. Grouping is enabled by:
        – Global normalization of numeric data
        – Standardization of vocabulary
        – An R interface that provides direct access to database tables
      • Using erlotinib and sorafenib as case studies, we demonstrated that the data warehouse and the R interface enable us to predict patient stratification and drug efficacy in cancer indications.
    • Acknowledgements
      • Takeda: Andy Dorner, Gene Shin, Andrew Krueger, Seema Grover, Jike Cui (now at Sanofi)
      • Thomson Reuters: Elona Kolpakova-Hart
      • Recombinant by Deloitte: Jinlei Liu, Mike McDuffie, Hiaping Xia
    • Backup Slides
    • Model test 2: How well do the models predict the drug-indication efficacy profile?

      Cancer type                                 Lung Cancer   Liver Cancer   Kidney Cancer
      Successful Phase III trial / FDA approval   Erlotinib     Sorafenib      Sorafenib
      Number of samples                           329           85             218
      % tumors predicted erlotinib sensitive      15.81         0.00           0.46 *
      % tumors predicted sorafenib sensitive      0.61          31.76          24.77

      * Erlotinib failed to show efficacy for kidney cancer in a phase III trial