Reverse engineering of gene networks has focused on the identification of causal interaction topologies between genes. How can we decide between models that have very similar topologies, and how do we characterize the actual kinetics of these networks in a way that accurately reflects the causal relationships implied by the proposed topology?
Amyotrophic Lateral Sclerosis (ALS), also known as Lou Gehrig’s disease (in the US) or motor neurone disease (outside the US), is a fatal neurological disease that kills the nerve cells in the brain and spinal cord that control voluntary muscle movement. In the early stages of the disease, it is currently very difficult to determine whether a given patient will experience slow or fast disease progression, because specific and reliable predictors are lacking.
Challenge 1 - genomic characterization of 53 cell lines; GI50 concentrations for 31 compounds on 35 cell lines; 18 cell lines for which GI50 concentrations are not given
DLBCL - diffuse large B cell lymphoma cell lines
Covariate - a variable that may be of direct interest, or may be a confounding or interacting variable.
METABRIC - Molecular Taxonomy of Breast Cancer International Consortium
DREAM Challenge
Gustavo Stolovitzky, IBM Computational Biology Center
Andrea Califano, Columbia University
DREAM
• DREAM is a Dialogue for Reverse Engineering Assessments and Methods.
• The main objective is to catalyze the interaction between experiment and theory in the area of cellular network inference and quantitative model building in systems biology.
Challenges
• Network Topology and Parameter Inference Challenge
• Sage Bionetworks - DREAM Breast Cancer Prognosis Challenge
• The DREAM-Phil Bowen ALS Prediction Prize4Life
• NCI-DREAM Drug Sensitivity Prediction Challenge
Project/Challenges
• Network Topology and Parameter Inference Challenge
  • Develop and apply optimization methods, including the selection of the most informative experiments, to accurately estimate parameters and predict the outcomes of perturbations in systems biology models, given:
    • In Model 1, the complete structure of the model (including expressions for the kinetic rate laws) for a gene regulatory network composed of 9 genes. Protein and mRNA are explicitly modeled.
    • In Model 2, an incomplete structure of the model, with missing regulatory links, for a gene regulatory network composed of 11 genes. Here, participants also have to find the missing links. Only proteins are explicitly modeled.
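The parameter-estimation task can be illustrated with a deliberately tiny sketch in Python. Everything here is an assumption made for illustration: the one-gene model dm/dt = k_syn - k_deg * m, the names `simulate` and `grid_search`, the synthetic "observed" data, and the use of a brute-force grid search in place of a real optimization method. It is not the challenge's Model 1 or Model 2.

```python
# Hypothetical one-gene model (NOT a challenge model): dm/dt = k_syn - k_deg * m

def simulate(k_syn, k_deg, times, m0=0.0, dt=0.01):
    """Euler-integrate the model and return m sampled at each time in `times`."""
    samples, m, step = [], m0, 0
    for target in times:
        end_step = round(target / dt)  # integer step count avoids float drift in t
        while step < end_step:
            m += dt * (k_syn - k_deg * m)
            step += 1
        samples.append(m)
    return samples


def sum_squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def grid_search(observed, times, syn_grid, deg_grid):
    """Return (error, k_syn, k_deg) for the grid point that best fits the data."""
    best = None
    for k_syn in syn_grid:
        for k_deg in deg_grid:
            err = sum_squared_error(simulate(k_syn, k_deg, times), observed)
            if best is None or err < best[0]:
                best = (err, k_syn, k_deg)
    return best


# Synthetic, noise-free "measurements" generated from known parameters.
times = [1.0, 2.0, 4.0, 8.0]
observed = simulate(2.0, 0.5, times)

syn_grid = [0.5 * i for i in range(1, 9)]    # candidate synthesis rates 0.5 .. 4.0
deg_grid = [0.1 * i for i in range(1, 11)]   # candidate degradation rates 0.1 .. 1.0
err, k_syn_hat, k_deg_hat = grid_search(observed, times, syn_grid, deg_grid)
print(k_syn_hat, k_deg_hat)
```

With real, noisy data and many more parameters, a grid search would be replaced by a proper optimizer, and the choice of which time points or perturbations to measure becomes part of the problem.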
DREAM Phil Bowen ALS Prediction Prize4Life
• Goal
  • The challenge is to predict the progression of disease in ALS patients based on the patient’s current disease status
  • Specifically, develop an approach to predict a given patient’s disease status within a year’s time based on 3 months of data
• Data
  • Includes demographics, medical and family history data, functional measures, vital signs, and lab data (blood chemistry/hematology/urinalysis) collected at multiple times
• Disease progression will be calculated as the average change in the Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) over a year’s time from enrollment in a clinical trial
• At the end of the challenge, the prediction submitted (based on 3 months of data) will be compared against the actual ALSFRS slope experienced by the patient over a year
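One way to read "ALSFRS slope" is as the least-squares slope of the ALSFRS score against time. The Python sketch below works under that assumption. The function name `alsfrs_slope`, the unit (points per month), and the example visit data are all hypothetical; the slides do not say exactly how the organizers compute the slope.

```python
def alsfrs_slope(months, scores):
    """Ordinary least-squares slope of ALSFRS score versus time (points per month).

    Assumption: the slope is a straight-line fit through all visits; the
    challenge's exact definition may differ.
    """
    if len(months) != len(scores) or len(months) < 2:
        raise ValueError("need at least two (month, score) pairs of equal length")
    n = len(months)
    mean_t = sum(months) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("all visits are at the same time point")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(months, scores))
    return sxy / sxx


# Made-up patient: visits every 3 months, losing 3 points between visits.
months = [0, 3, 6, 9, 12]
scores = [40, 37, 34, 31, 28]
slope = alsfrs_slope(months, scores)
print(slope)  # -1.0
```

A submitted prediction could then be compared with this observed slope, for example by absolute difference.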
Output
• Improve disease prediction beyond the current capabilities by
  • Developing more accurate (sensitive and specific) methods of predicting progression
  • Identifying markers (variables) that would enable a determination of expected future disease progression earlier in the course of the disease
• Validation
  • Validate the model against a subset of patients that are part of neither the training set nor the final validation (test) set
  • Submit the actual code written in the R language (“Validation Code”); InnoCentive will run the code against the interim validation data set
NCI-DREAM Drug Sensitivity Prediction Challenge
• Use genomic information to build models capable of ranking the sensitivity of cancer cell lines to a set of small molecule compounds or their combinations
Sub Challenges
• Sub Challenge 1
  • Predict the sensitivity of breast cancer cell lines to previously untested compounds
  • Model capable of ranking the sensitivity of 18 breast cancer cell lines to 31 compounds
  • The challenge in this case is to link the drug effects to the underlying genetics of the 53 cell lines
• Sub Challenge 2
  • Predict compound combinations that have a synergistic effect in reducing viability of a DLBCL cell line
  • Predict the activity of pairs of compounds in the DLBCL LY3 cell line from expression profiles acquired after treatment of the cell line with each of 14 individual compounds
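For Sub Challenge 1, the final output is a ranking of cell lines per compound. The Python sketch below shows only that last step. It assumes a model has already produced predicted GI50 concentrations and that a lower GI50 means a more sensitive cell line. The function name `rank_cell_lines` and the compound and cell-line names are placeholders, not challenge data.

```python
def rank_cell_lines(predicted_gi50):
    """Map each compound to its cell lines ordered from most to least sensitive.

    `predicted_gi50` is {compound: {cell_line: predicted GI50 concentration}}.
    Assumption: lower GI50 = more sensitive. Ties are broken by cell-line name
    so that the ranking is deterministic.
    """
    return {
        compound: sorted(by_line, key=lambda line: (by_line[line], line))
        for compound, by_line in predicted_gi50.items()
    }


# Placeholder predictions for two compounds on three cell lines.
predictions = {
    "compound_A": {"line_1": 5.0, "line_2": 0.3, "line_3": 1.2},
    "compound_B": {"line_1": 0.8, "line_2": 0.8, "line_3": 9.0},
}
rankings = rank_cell_lines(predictions)
print(rankings["compound_A"])  # ['line_2', 'line_3', 'line_1']
```

If the data were supplied on a transformed scale such as -log10(GI50), the sort order would need to be reversed.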
Sage Bionetworks - DREAM Breast Cancer Prognosis Challenge
• Background
  • Molecular diagnostics for cancer therapeutic decision making are among the most promising applications of genomic technology
  • Molecular profiles have proved particularly powerful in adding prognostic information to standard clinical practice in breast cancer
• Trends emerging
  • Genes defining predictive signatures of the same phenotype often do not overlap across studies
  • Predictive signatures are not very robust
  • No consensus regarding the most accurate signatures or computational methods for inferring predictive signatures
  • No consensus regarding the added value of incorporating molecular data in addition to or instead of traditionally used clinical covariates
Goal/Challenge
• Goal
  • To assess the accuracy of computational models designed to predict breast cancer survival, based on clinical information about the patient’s tumor as well as genome-wide molecular profiling data, including gene expression and copy number profiles
• Challenge
  • Create a community-based effort to provide an unbiased assessment of models and methodologies for the prediction of breast cancer survival
  • A common dataset will be provided to all participants, with a validation dataset held out for model evaluation
  • A novel dataset will be generated at the end of the Challenge and used to provide a final, unbiased score for each model
DATA
• Training data set from the METABRIC cohort of 2000 breast cancer samples
  • Includes detailed clinical annotations
  • 10-year median survival time, gene expression and copy number data
• Additional breast cancer datasets curated by Sage Bionetworks
  • Can be used in model development
• Web-based platform called Synapse
  • Enables transparent, reproducible model building and analysis workflows, as well as sharing of data, tools and models with the challenge community
• Validation dataset
  • Derived from 300-500 fresh frozen primary tumors with the same clinical annotations and survival data as the METABRIC cohort
DATA
• Survival data
  • Survival data is loaded into R as a Surv object, as defined in the R survival package
  • This object is simply a 2-column matrix with sample names on the rows and the following columns:
    • time - time from diagnosis to last follow-up
    • status - whether the patient was alive at the last follow-up time
• Feature data
  • Gene expression data
    • Performed on the Illumina HT 12v3 platform
    • Loaded as a Bioconductor ExpressionSet object
    • Data normalized
  • Copy number data
    • Performed on the Affymetrix SNP 6.0 platform
    • Loaded as a Bioconductor ExpressionSet object
    • Data normalized
  • Clinical covariates
    • Loaded as a data.frame object with features
Submission
• Models built for this Challenge will be constructed using the R programming language and uploaded to a common platform (Synapse) provided by Sage Bionetworks
• Models will be uploaded as R objects implementing a function called customPredict() that returns a vector of survival predictions when given a set of feature data as input
• The customPredict() function will be run by a validation script for each submitted model, and the resulting predictions will be scored
• Phase 3 submissions must be accompanied by a write-up that includes a short description of the approach used in the final model
Scoring
• Challenge models will be scored by calculating the concordance index between the predicted survival and the true survival information in the validation dataset (accounting for the censor variable indicating whether the patient was alive at last follow-up)
• Final assessment of models and the determination of the best performer will be based on the concordance index of predictions on the test dataset in Phase 3 of the Challenge
• In addition, other scoring metrics will be considered depending on the suggestions of the community throughout the Challenge
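The concordance index can be sketched as follows. This is a plain Python illustration of the commonly used pairwise definition for right-censored data, not the challenge's actual scoring script (submissions themselves are in R). Two things are assumed: a higher prediction means higher risk (shorter expected survival), and `events[i]` is True when the death of patient i was observed, False when the patient was still alive at last follow-up. The function name and the example data are hypothetical.

```python
def concordance_index(times, events, risk):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    A pair (i, j) is comparable when patient i died (events[i] is True) strictly
    before patient j's last follow-up time. The pair is concordant when the
    earlier death also has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the "earlier death" of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; the index is undefined")
    return concordant / comparable


# Made-up cohort: follow-up times in years; the second patient is censored.
times = [2, 4, 6, 8]
events = [True, False, True, True]
risk = [0.9, 0.5, 0.1, 0.6]   # hypothetical model output, higher = riskier
c_index = concordance_index(times, events, risk)
print(c_index)  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ordering of the comparable pairs. In the example, three of the four comparable pairs are ordered correctly.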
Time Line
• The deadline for submitting models for the Breast Cancer Prognosis Challenge is 5PM EST October 15th
• Best performers will be announced at the DREAM 7 Conference taking place in San Francisco on November 12 to 16
• Final assessment of all models in newly generated data
  • An additional cohort of 350 breast cancer samples with archived fresh frozen tumor samples has been identified by Anne-Lise Borresen-Dale of Oslo University Hospital, and a generous donation has been made by the Avon Foundation to obtain gene expression and copy number data on these samples
  • Curation of the clinical records of this patient cohort, to harmonize them with the current METABRIC dataset, and generation of the genomic profiling data for these samples are currently in progress
  • The aim is to generate these data by the November 12 DREAM conference