SlideShare a Scribd company logo
1 of 1
Download to read offline
Shorter multimarker signatures: a new tool
to facilitate cancer diagnosis
M. P. Schneider, N. Jullian, M. Afshar, M. Guergova-Kuras
Ariana Pharmaceuticals SA, 28 rue du Docteur Finlay ,75015 Paris, France
Reduced prognostic signature in breast cancer
Aim
The aim of the study was to test a new tool based on
logical rules, KEM®Biomarker, to classify two
publically available datasets related to cancer and to
compare the results with alternative machine
learning methods applied on the same data sets.
Abstract 667
Conclusions
The prognostic signatures generated by
KEM®Biomarker had better or similar performances
to other classification methods in terms of sensitivity
and specificity while having significantly lower
number of features in the models.
Results of predictive models
Biomarker signature reduction
High-throughput technologies have led to an exponential growth of the
amount of available data and allowed to derive a new generation of cancer
biomarkers called multimarker signatures. They consist of a computational
algorithm combining multiple biomarkers and demonstrate better
diagnostic performance than single biomarkers. Machine learning methods
make these easy to obtain but their experimental and clinical validation is a
long and difficult process. This validation is strongly dependent on the
complexity of the signature and the number of features. The drive towards
personalized medicine in cancer requires computational methods capable
to generate robust signatures with the minimum number of components.
The performance of the logical model to predict breast
cancer survival at 10.1 years derived with KEM®Biomarker
(SEN=84% & SPE=88%) comprising only 13 features was
similar to the performance of the 70-gene MammaPrintTM
signature (SEN=85% & SPE=82%) (van’t Veer, 2002).
An algorithm with only 13 features were enough to predict
ovarian cancer with a balanced accuracy (BAC) of ~94%
using pre-selected features (P-value<0.01) with KEM®
Biomarker. Thus, a similar performance achieved by Zhou
et al. (2010) (97% BAC) using a customized functional
support vector machine based algorithm (fSVM) with
2,084 features in a leave one out cross-validation (CV)
strategy.
The logical model to predict survival at 10.1 years is comprised of 13 features organized
in 4 features logical rules.
Combining up to 2 “OR” and 2 “AND”
features in the logical rules, rules with a
balanced accuracy > 81% can be achieved
compared to the performance of individual
features (highest AUC=0.80). In the ROC
graph, the blue dots represents all the logical
rules generated, the red dots are the selected
10 rules by highest balanced accuracy and
the lines are the ROC curves of features
NM_003882, NM_006101, NM_003981.
Data &
discretization Model 10k-fold* CV LOO CV
SEN SPE BAC N SEN SPE BAC SEN SPE BAC
Breast cancer
Binary 84 88 86 13 74 77 76 70 81 76
Tiertile 78 85 82 13 69 80 74 69 77 73
Ovarian cancer
Median 89 100 97 13 87 100 93 89 100 94
Tiertile 93 100 97 14 88 100 94 93 100 97
SPE: specificity; SEN: sensitivity; BAC: balanced accuracy; N: number of features; CV: cross-
validation; LOO: Leave One Out ; *: mean of 10 repeated cross validation.
Material and Methods
Two publically data set were classified: the gene expression breast cancer data of the 70 genes comprising the MammaPrintTM signature of 148 patients (74 death and 74 alive) (van de
Vijver et al., 2002) and 20,000 mass spectrometry profiles from blood sera measured in 94 patients with ovarian cancer (44 patients) or benign conditions (50 patients) (Zhou et al.,
2010). Association rules were derived with logical operators “AND” and “OR” (Afshar et al., 2006). To generate logical rules variables were discretized and different discretization
strategies were considered.
http://www.arianapharma.com/
Breast cancer discretization strategy Ovarian cancer discretization strategy
For the breast cancer data, gene expression data was transformed into binary form, i.e.
positive logarithmic fold changes were classified as >0 (up_regulated) and negative and
missing ones as ≤0 (down_regulated). In the ovarian cancer data, features were
discretized as Low when the value of the feature was ≤ 0.33 * median and High when
the value of the feature was > 3*median. In both data sets, variables were also
discretized using tiertiles, i.e. based on their distribution, and only under and over-
expressed values were kept for logical rules generation. Models to predict breast cancer
survival at 10.1 years and ovarian cancer were evaluated using Majority Vote on 10 rules
selected by their highest balanced accuracy from all the logical rules generated.
Performance was evaluated as sensitivity and specificity and the method was validated
using repeated 10K-fold & leave one out CV. Feature selection was applied in the
ovarian cancer data before discretization as in Zhou et al. Only features with a P-value ≤
0.01 were retained for the biomarker analyses.
Color scale indicating the number of occurrences of a feature in a model
The validation of the method consisted of choosing the best rules in terms of BAC
among all the logical rules generated. The global performance of the two CV methods
was very similar in both data sets. We have tested two discretization strategies that
favors more or less extreme values (e.g. over and under expressed). Results show that
strategies do not modify the overall performance nor the ratio between SEN and SPE.
The method was also compared to the performances of four other machine learning
methods implemented by Vanneschi et al. (2011) on the same data set using a 70/30
cross-validation over 50 random rounds. KEM® Biomarker yielded the lowest average
number of incorrectly classified instances (30%) compared to the other machine
learning algorithms.
KEM® GP SVM-k3 MP RF
Classification
Error
30% 37.2% 41.6% 42.7% 40%
NM_003882 > 0 AND NM_003981 ≤ 0
NM_006101 ≤ 0
NM_007036 ≤ 0
Contig_38288_RC ≤ 0
Contig_63649_RC ≤ 0
Contig_24252_RC ≤ 0
NM_001282 > 0
Contig_38288_RC ≤ 0 NM_016448 ≤ 0
NM_006101 ≤ 0 Contig_63649_RC ≤ 0 AF201951 > 0
NM_001809 ≤ 0
NM_020188 ≤ 0
NM_003882 > 0
NM_020188 ≤ 0 NM_003981 ≤ 0 NM_000127 ≤ 0NM_016448 ≤ 0
OR
8 7 4 3 2 1
AND
AND ANDOR
AND OR AND
Afshar et al. Comprehensive Medical Chemistry II, 4: 767-774, 2006.
Vanneschi et al. BioData Mining, 4: 12, 2011
van de Vijver et al. N Engl J Med, 347(25): 1999-2009, 2002.
The results of Genetic Programming (GP), Support Vector Machine with polynomial
kernel 3.0 (SVM-k3), Multilayered Perceptrons (MP) and Random Forest (RF) were
taken from Vanneschi et al. (2011).
Survived Not survived
≤ 0 > 0
Binary
≤ -0.15 > 0.08
Tiertile TiertileMedian
> 0.78≤ 0.09
Ovarian cancer Healthy
> 0.33≤ 0.17
van’t Veer et al. Nature 415: 53-536, 2002
Zhou et al. Cancer Epidemiology, Biomarkers & Prevention, 19(9), 2010.
Model BAC: 86 % Model BAC: 84 %
CV BAC: 94 % CV BAC: 97 %
AND
Breast Cancer
Gene expression
Agilent platform
Ovarian Cancer
Metabolomics
Mass Spectrometry

More Related Content

What's hot

On Predicting and Analyzing Breast Cancer using Data Mining Approach
On Predicting and Analyzing Breast Cancer using Data Mining ApproachOn Predicting and Analyzing Breast Cancer using Data Mining Approach
On Predicting and Analyzing Breast Cancer using Data Mining ApproachMasud Rana Basunia
 
Breast cancer classification
Breast cancer classificationBreast cancer classification
Breast cancer classificationAshwan Abdulmunem
 
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...inventionjournals
 
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...ijmpict
 
A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...
A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...
A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...Tathagata Bandyopadhyay
 
ITC in metastatic melanoma
ITC in metastatic melanomaITC in metastatic melanoma
ITC in metastatic melanomaSheily Kamra
 
Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...
Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...
Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...Leonard Davis Institute of Health Economics
 
Segmentation and Classification of Skin Lesions Based on Texture Features
Segmentation and Classification of Skin Lesions Based on Texture FeaturesSegmentation and Classification of Skin Lesions Based on Texture Features
Segmentation and Classification of Skin Lesions Based on Texture FeaturesIJERA Editor
 
Statistical modeling in pharmaceutical research and development
Statistical modeling in pharmaceutical research and developmentStatistical modeling in pharmaceutical research and development
Statistical modeling in pharmaceutical research and developmentPV. Viji
 
Ensemble strategies for a medical diagnostic decision support system: A breas...
Ensemble strategies for a medical diagnostic decision support system: A breas...Ensemble strategies for a medical diagnostic decision support system: A breas...
Ensemble strategies for a medical diagnostic decision support system: A breas...dewisetiyana52
 
Computer simulation in pharmacokinetics and pharmacodynamics
Computer simulation in pharmacokinetics and pharmacodynamicsComputer simulation in pharmacokinetics and pharmacodynamics
Computer simulation in pharmacokinetics and pharmacodynamicsMOHAMMAD ASIM
 
Predictive Analysis of Breast Cancer Detection using Classification Algorithm
Predictive Analysis of Breast Cancer Detection using Classification AlgorithmPredictive Analysis of Breast Cancer Detection using Classification Algorithm
Predictive Analysis of Breast Cancer Detection using Classification AlgorithmSushanti Acharya
 
IRJET- Detection and Classification of Breast Cancer from Mammogram Image
IRJET-  	  Detection and Classification of Breast Cancer from Mammogram ImageIRJET-  	  Detection and Classification of Breast Cancer from Mammogram Image
IRJET- Detection and Classification of Breast Cancer from Mammogram ImageIRJET Journal
 
An Ensemble of Filters and Wrappers for Microarray Data Classification
An Ensemble of Filters and Wrappers for Microarray Data Classification An Ensemble of Filters and Wrappers for Microarray Data Classification
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
 

What's hot (16)

On Predicting and Analyzing Breast Cancer using Data Mining Approach
On Predicting and Analyzing Breast Cancer using Data Mining ApproachOn Predicting and Analyzing Breast Cancer using Data Mining Approach
On Predicting and Analyzing Breast Cancer using Data Mining Approach
 
Breast cancer classification
Breast cancer classificationBreast cancer classification
Breast cancer classification
 
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
 
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin...
 
A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...
A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...
A fast Algorithm for Automatic Segmentation of Pancreas Histological Images f...
 
RGUHS Rank Holders Names and Marks - RRMCH
RGUHS Rank Holders Names and Marks - RRMCHRGUHS Rank Holders Names and Marks - RRMCH
RGUHS Rank Holders Names and Marks - RRMCH
 
ITC in metastatic melanoma
ITC in metastatic melanomaITC in metastatic melanoma
ITC in metastatic melanoma
 
Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...
Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...
Policy Implications of Methods Used for Analyzing Intensive Care Costs of Acu...
 
Segmentation and Classification of Skin Lesions Based on Texture Features
Segmentation and Classification of Skin Lesions Based on Texture FeaturesSegmentation and Classification of Skin Lesions Based on Texture Features
Segmentation and Classification of Skin Lesions Based on Texture Features
 
Statistical modeling in pharmaceutical research and development
Statistical modeling in pharmaceutical research and developmentStatistical modeling in pharmaceutical research and development
Statistical modeling in pharmaceutical research and development
 
Ensemble strategies for a medical diagnostic decision support system: A breas...
Ensemble strategies for a medical diagnostic decision support system: A breas...Ensemble strategies for a medical diagnostic decision support system: A breas...
Ensemble strategies for a medical diagnostic decision support system: A breas...
 
Sample Presentation
Sample PresentationSample Presentation
Sample Presentation
 
Computer simulation in pharmacokinetics and pharmacodynamics
Computer simulation in pharmacokinetics and pharmacodynamicsComputer simulation in pharmacokinetics and pharmacodynamics
Computer simulation in pharmacokinetics and pharmacodynamics
 
Predictive Analysis of Breast Cancer Detection using Classification Algorithm
Predictive Analysis of Breast Cancer Detection using Classification AlgorithmPredictive Analysis of Breast Cancer Detection using Classification Algorithm
Predictive Analysis of Breast Cancer Detection using Classification Algorithm
 
IRJET- Detection and Classification of Breast Cancer from Mammogram Image
IRJET-  	  Detection and Classification of Breast Cancer from Mammogram ImageIRJET-  	  Detection and Classification of Breast Cancer from Mammogram Image
IRJET- Detection and Classification of Breast Cancer from Mammogram Image
 
An Ensemble of Filters and Wrappers for Microarray Data Classification
An Ensemble of Filters and Wrappers for Microarray Data Classification An Ensemble of Filters and Wrappers for Microarray Data Classification
An Ensemble of Filters and Wrappers for Microarray Data Classification
 

Viewers also liked

Cros escolar local 2016
Cros escolar local 2016Cros escolar local 2016
Cros escolar local 2016apares
 
Extracting source code of apk file
Extracting source code of apk fileExtracting source code of apk file
Extracting source code of apk fileDeepanshu Gajbhiye
 
BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012
BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012
BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012Marc Govers
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Daniel Valcarce
 
Unit 5 fiscal and monetary policy
Unit 5  fiscal and monetary policyUnit 5  fiscal and monetary policy
Unit 5 fiscal and monetary policyBethany Bryski
 
Concept based information retrieval using explicit
Concept based information retrieval using explicitConcept based information retrieval using explicit
Concept based information retrieval using explicitnadikari123
 
Production possibility frontier bethany.ppt
Production possibility frontier bethany.pptProduction possibility frontier bethany.ppt
Production possibility frontier bethany.pptBethany Bryski
 
Iona reyes programa ng pamahalaan sa pagpapaunlad ng bansa
Iona reyes programa ng pamahalaan sa pagpapaunlad ng bansaIona reyes programa ng pamahalaan sa pagpapaunlad ng bansa
Iona reyes programa ng pamahalaan sa pagpapaunlad ng bansaAlice Bernardo
 
Tugas ringkasan Global Value Chain
Tugas ringkasan Global Value ChainTugas ringkasan Global Value Chain
Tugas ringkasan Global Value ChainMegitta Ignacia
 
Tugas Pemasaran Global : MERGER
Tugas Pemasaran Global : MERGERTugas Pemasaran Global : MERGER
Tugas Pemasaran Global : MERGERMegitta Ignacia
 

Viewers also liked (15)

101 For Everyone
101 For Everyone101 For Everyone
101 For Everyone
 
Finario Overview
Finario OverviewFinario Overview
Finario Overview
 
Al ver mis horas de fiebre
Al ver mis horas de fiebreAl ver mis horas de fiebre
Al ver mis horas de fiebre
 
Cros escolar local 2016
Cros escolar local 2016Cros escolar local 2016
Cros escolar local 2016
 
Extracting source code of apk file
Extracting source code of apk fileExtracting source code of apk file
Extracting source code of apk file
 
BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012
BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012
BI Congres Het nut van een gegevensinfrastructuur Marc Govers 2012
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
Unit 5 fiscal and monetary policy
Unit 5  fiscal and monetary policyUnit 5  fiscal and monetary policy
Unit 5 fiscal and monetary policy
 
Labor markets
Labor marketsLabor markets
Labor markets
 
Concept based information retrieval using explicit
Concept based information retrieval using explicitConcept based information retrieval using explicit
Concept based information retrieval using explicit
 
Production possibility frontier bethany.ppt
Production possibility frontier bethany.pptProduction possibility frontier bethany.ppt
Production possibility frontier bethany.ppt
 
Iona reyes programa ng pamahalaan sa pagpapaunlad ng bansa
Iona reyes programa ng pamahalaan sa pagpapaunlad ng bansaIona reyes programa ng pamahalaan sa pagpapaunlad ng bansa
Iona reyes programa ng pamahalaan sa pagpapaunlad ng bansa
 
Math in ConTeXt
Math in ConTeXt Math in ConTeXt
Math in ConTeXt
 
Tugas ringkasan Global Value Chain
Tugas ringkasan Global Value ChainTugas ringkasan Global Value Chain
Tugas ringkasan Global Value Chain
 
Tugas Pemasaran Global : MERGER
Tugas Pemasaran Global : MERGERTugas Pemasaran Global : MERGER
Tugas Pemasaran Global : MERGER
 

Similar to Shorter Multi-marker Signatures: a new tool to facilitate cancer diagnosis

THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...IJDKP
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...IJDKP
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...IJDKP
 
CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Projectbutest
 
Diagnosis of Cancer using Fuzzy Rough Set Theory
Diagnosis of Cancer using Fuzzy Rough Set TheoryDiagnosis of Cancer using Fuzzy Rough Set Theory
Diagnosis of Cancer using Fuzzy Rough Set TheoryIRJET Journal
 
Srge most important publications 2020
Srge most important  publications 2020Srge most important  publications 2020
Srge most important publications 2020Aboul Ella Hassanien
 
Design of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer DiseasesDesign of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer DiseasesMohamed Loey
 
PSO-SVM hybrid system for melanoma detection from histo-pathological images
PSO-SVM hybrid system for melanoma detection from histo-pathological imagesPSO-SVM hybrid system for melanoma detection from histo-pathological images
PSO-SVM hybrid system for melanoma detection from histo-pathological imagesIJECEIAES
 
i.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approach
i.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approachi.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approach
i.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approachJonathan Josue Cid Galiot
 
POSTER_JIANYU_LIU
POSTER_JIANYU_LIUPOSTER_JIANYU_LIU
POSTER_JIANYU_LIUJianyu Liu
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...IJECEIAES
 
High Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant DetectionHigh Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant DetectionThermo Fisher Scientific
 
A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...
A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...
A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...CSCJournals
 
coad_machine_learning
coad_machine_learningcoad_machine_learning
coad_machine_learningFord Sleeman
 
A new model for large dataset dimensionality reduction based on teaching lear...
A new model for large dataset dimensionality reduction based on teaching lear...A new model for large dataset dimensionality reduction based on teaching lear...
A new model for large dataset dimensionality reduction based on teaching lear...TELKOMNIKA JOURNAL
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...Kamel Mansouri
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andAlexander Decker
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andAlexander Decker
 
Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
 

Similar to Shorter Multi-marker Signatures: a new tool to facilitate cancer diagnosis (20)

Comparison of breast cancer classification models on Wisconsin dataset
Comparison of breast cancer classification models on Wisconsin  datasetComparison of breast cancer classification models on Wisconsin  dataset
Comparison of breast cancer classification models on Wisconsin dataset
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
 
CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Project
 
Diagnosis of Cancer using Fuzzy Rough Set Theory
Diagnosis of Cancer using Fuzzy Rough Set TheoryDiagnosis of Cancer using Fuzzy Rough Set Theory
Diagnosis of Cancer using Fuzzy Rough Set Theory
 
Srge most important publications 2020
Srge most important  publications 2020Srge most important  publications 2020
Srge most important publications 2020
 
Design of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer DiseasesDesign of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer Diseases
 
PSO-SVM hybrid system for melanoma detection from histo-pathological images
PSO-SVM hybrid system for melanoma detection from histo-pathological imagesPSO-SVM hybrid system for melanoma detection from histo-pathological images
PSO-SVM hybrid system for melanoma detection from histo-pathological images
 
i.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approach
i.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approachi.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approach
i.a.Preoperative ovarian cancer diagnosis using neuro fuzzy approach
 
POSTER_JIANYU_LIU
POSTER_JIANYU_LIUPOSTER_JIANYU_LIU
POSTER_JIANYU_LIU
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...
 
High Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant DetectionHigh Sensitivity Sanger Sequencing for Minor Variant Detection
High Sensitivity Sanger Sequencing for Minor Variant Detection
 
A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...
A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...
A Novel Approach for Cancer Detection in MRI Mammogram Using Decision Tree In...
 
coad_machine_learning
coad_machine_learningcoad_machine_learning
coad_machine_learning
 
A new model for large dataset dimensionality reduction based on teaching lear...
A new model for large dataset dimensionality reduction based on teaching lear...A new model for large dataset dimensionality reduction based on teaching lear...
A new model for large dataset dimensionality reduction based on teaching lear...
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
 
Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...
 

Recently uploaded

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Recently uploaded (20)

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

Shorter Multi-marker Signatures: a new tool to facilitate cancer diagnosis

  • 1. Shorter multimarker signatures: a new tool to facilitate cancer diagnosis M. P. Schneider, N. Jullian, M. Afshar, M. Guergova-Kuras Ariana Pharmaceuticals SA, 28 rue du Docteur Finlay ,75015 Paris, France Reduced prognostic signature in breast cancer Aim The aim of the study was to test a new tool based on logical rules, KEM®Biomarker, to classify two publically available datasets related to cancer and to compare the results with alternative machine learning methods applied on the same data sets. Abstract 667 Conclusions The prognostic signatures generated by KEM®Biomarker had better or similar performances to other classification methods in terms of sensitivity and specificity while having significantly lower number of features in the models. Results of predictive models Biomarker signature reduction High-throughput technologies have led to an exponential growth of the amount of available data and allowed to derive a new generation of cancer biomarkers called multimarker signatures. They consist of a computational algorithm combining multiple biomarkers and demonstrate better diagnostic performance than single biomarkers. Machine learning methods make these easy to obtain but their experimental and clinical validation is a long and difficult process. This validation is strongly dependent on the complexity of the signature and the number of features. The drive towards personalized medicine in cancer requires computational methods capable to generate robust signatures with the minimum number of components. The performance of the logical model to predict breast cancer survival at 10.1 years derived with KEM®Biomarker (SEN=84% & SPE=88%) comprising only 13 features was similar to the performance of the 70-gene MammaPrintTM signature (SEN=85% & SPE=82%) (van’t Veer, 2002). An algorithm with only 13 features were enough to predict ovarian cancer with a balanced accuracy (BAC) of ~94% using pre-selected features (P-value<0.01) with KEM® Biomarker. Thus, a similar performance achieved by Zhou et al. (2010) (97% BAC) using a customized functional support vector machine based algorithm (fSVM) with 2,084 features in a leave one out cross-validation (CV) strategy. The logical model to predict survival at 10.1 years is comprised of 13 features organized in 4 features logical rules. Combining up to 2 “OR” and 2 “AND” features in the logical rules, rules with a balanced accuracy > 81% can be achieved compared to the performance of individual features (highest AUC=0.80). In the ROC graph, the blue dots represents all the logical rules generated, the red dots are the selected 10 rules by highest balanced accuracy and the lines are the ROC curves of features NM_003882, NM_006101, NM_003981. Data & discretization Model 10k-fold* CV LOO CV SEN SPE BAC N SEN SPE BAC SEN SPE BAC Breast cancer Binary 84 88 86 13 74 77 76 70 81 76 Tiertile 78 85 82 13 69 80 74 69 77 73 Ovarian cancer Median 89 100 97 13 87 100 93 89 100 94 Tiertile 93 100 97 14 88 100 94 93 100 97 SPE: specificity; SEN: sensitivity; BAC: balanced accuracy; N: number of features; CV: cross- validation; LOO: Leave One Out ; *: mean of 10 repeated cross validation. Material and Methods Two publically data set were classified: the gene expression breast cancer data of the 70 genes comprising the MammaPrintTM signature of 148 patients (74 death and 74 alive) (van de Vijver et al., 2002) and 20,000 mass spectrometry profiles from blood sera measured in 94 patients with ovarian cancer (44 patients) or benign conditions (50 patients) (Zhou et al., 2010). Association rules were derived with logical operators “AND” and “OR” (Afshar et al., 2006). To generate logical rules variables were discretized and different discretization strategies were considered. http://www.arianapharma.com/ Breast cancer discretization strategy Ovarian cancer discretization strategy For the breast cancer data, gene expression data was transformed into binary form, i.e. positive logarithmic fold changes were classified as >0 (up_regulated) and negative and missing ones as ≤0 (down_regulated). In the ovarian cancer data, features were discretized as Low when the value of the feature was ≤ 0.33 * median and High when the value of the feature was > 3*median. In both data sets, variables were also discretized using tiertiles, i.e. based on their distribution, and only under and over- expressed values were kept for logical rules generation. Models to predict breast cancer survival at 10.1 years and ovarian cancer were evaluated using Majority Vote on 10 rules selected by their highest balanced accuracy from all the logical rules generated. Performance was evaluated as sensitivity and specificity and the method was validated using repeated 10K-fold & leave one out CV. Feature selection was applied in the ovarian cancer data before discretization as in Zhou et al. Only features with a P-value ≤ 0.01 were retained for the biomarker analyses. Color scale indicating the number of occurrences of a feature in a model The validation of the method consisted of choosing the best rules in terms of BAC among all the logical rules generated. The global performance of the two CV methods was very similar in both data sets. We have tested two discretization strategies that favors more or less extreme values (e.g. over and under expressed). Results show that strategies do not modify the overall performance nor the ratio between SEN and SPE. The method was also compared to the performances of four other machine learning methods implemented by Vanneschi et al. (2011) on the same data set using a 70/30 cross-validation over 50 random rounds. KEM® Biomarker yielded the lowest average number of incorrectly classified instances (30%) compared to the other machine learning algorithms. KEM® GP SVM-k3 MP RF Classification Error 30% 37.2% 41.6% 42.7% 40% NM_003882 > 0 AND NM_003981 ≤ 0 NM_006101 ≤ 0 NM_007036 ≤ 0 Contig_38288_RC ≤ 0 Contig_63649_RC ≤ 0 Contig_24252_RC ≤ 0 NM_001282 > 0 Contig_38288_RC ≤ 0 NM_016448 ≤ 0 NM_006101 ≤ 0 Contig_63649_RC ≤ 0 AF201951 > 0 NM_001809 ≤ 0 NM_020188 ≤ 0 NM_003882 > 0 NM_020188 ≤ 0 NM_003981 ≤ 0 NM_000127 ≤ 0NM_016448 ≤ 0 OR 8 7 4 3 2 1 AND AND ANDOR AND OR AND Afshar et al. Comprehensive Medical Chemistry II, 4: 767-774, 2006. Vanneschi et al. BioData Mining, 4: 12, 2011 van de Vijver et al. N Engl J Med, 347(25): 1999-2009, 2002. The results of Genetic Programming (GP), Support Vector Machine with polynomial kernel 3.0 (SVM-k3), Multilayered Perceptrons (MP) and Random Forest (RF) were taken from Vanneschi et al. (2011). Survived Not survived ≤ 0 > 0 Binary ≤ -0.15 > 0.08 Tiertile TiertileMedian > 0.78≤ 0.09 Ovarian cancer Healthy > 0.33≤ 0.17 van’t Veer et al. Nature 415: 53-536, 2002 Zhou et al. Cancer Epidemiology, Biomarkers & Prevention, 19(9), 2010. Model BAC: 86 % Model BAC: 84 % CV BAC: 94 % CV BAC: 97 % AND Breast Cancer Gene expression Agilent platform Ovarian Cancer Metabolomics Mass Spectrometry