Gargi Mukherjee … Rutgers University, New Jersey
Kevin Raines … Stanford University, California
Srikanth Sastry … JNC, Bengaluru, India
Sebastian Doniach … Stanford University, California
Gyan Bhanot … Rutgers University, New Jersey
Michael Biehl … University of Groningen, The Netherlands
Predicting Recurrence in Clear Cell
Renal Cell Carcinoma
Analysis of TCGA data using Outlier Analysis and GMLVQ
WCCI 2016, Vancouver / BC 2 /15
overview
gene expression in tumor cells
specific example: clear cell Renal Cell Carcinomas (ccRCC)
• outlier analysis: identification of a panel of prognostic genes
with respect to recurrence
• risk score: prediction of individual recurrence risk
based on outlier status w.r.t. selected genes
• machine learning: analysis of extreme cases of low / high risk
distance based classification and relevance learning
(Generalized Matrix Relevance LVQ)
clinical data: recurrence free intervals

data
clear cell Renal Cell Carcinoma (ccRCC)
publicly available datasets:
The Cancer Genome Atlas (TCGA) cancergenome.nih.gov
also hosted at Broad Institute gdac.broadinstitute.org

data
clear cell renal cell carcinoma
TCGA data @ Broad Institute
mRNA-Seq expression data X, 20532 genes
normalized, log-transformed: Y = log(1+X)
65 normal samples
65 matched tumor samples (65 + 65 matched)
469 tumor samples in total
recurrence data: days after diagnosis
[figure: number of recurrences vs. days after diagnosis]

outlier analysis
randomized split of the 469 tumor samples:
380 training samples
89 test samples
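
A minimal sketch of such a randomized split, assuming the samples are addressed by identifiers. The names `randomized_split`, `tumor_ids` and the seed value are illustrative and do not come from the slides, which do not say how the split was drawn.

```python
import random

def randomized_split(sample_ids, n_train, seed=0):
    """Shuffle the sample ids reproducibly and cut them into training and test lists."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # the seed is an arbitrary illustrative choice
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# hypothetical identifiers standing in for the 469 tumor samples
tumor_ids = [f"tumor_{i:03d}" for i in range(469)]
train_ids, test_ids = randomized_split(tumor_ids, n_train=380)
print(len(train_ids), len(test_ids))  # 380 89
```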

outlier analysis (380 training samples)
per gene: determine mean μ and standard deviation σ of Y
for each gene: identify outlier samples
Y > μ + σ: “high outlier”
Y < μ − σ: “low outlier”
restrict the following analysis to genes with
≥ 20 high outlier samples
or ≥ 20 low outlier samples
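
The per-gene outlier rule and the gene filter above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the toy expression values are made up, and the population standard deviation is used because the slides do not say which estimator was applied.

```python
import math
import statistics

def log_transform(x_row):
    """Y = log(1 + X) for one gene across samples."""
    return [math.log1p(x) for x in x_row]

def outlier_status(y_row):
    """Per-sample status for one gene: +1 high outlier, -1 low outlier, 0 otherwise."""
    mu = statistics.mean(y_row)
    sigma = statistics.pstdev(y_row)  # assumption: population std
    return [1 if y > mu + sigma else -1 if y < mu - sigma else 0 for y in y_row]

def keep_gene(status, min_outliers=20):
    """Keep a gene with at least min_outliers high outliers or min_outliers low outliers."""
    return status.count(1) >= min_outliers or status.count(-1) >= min_outliers

# toy gene: 25 samples with elevated expression, 75 at baseline (made-up values)
y_row = [10.0] * 25 + [0.0] * 75
status = outlier_status(y_row)
print(status.count(1), status.count(-1), keep_gene(status))  # 25 0 True
```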

outlier analysis
construct two binary outlier matrices:
“1” for high-outlier samples, “0” else
“1” for low-outlier samples, “0” else
Kaplan-Meier (KM) analysis per gene:
test for significant association of outlier status of samples with recurrence
1546 “high-outlier genes” with KM log rank p < 0.001 (1546 genes × 380 samples)
1628 “low-outlier genes” with KM log rank p < 0.0005 (1628 genes × 380 samples)
→ PCA
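
For one gene, the association between outlier status and recurrence can be tested with a two-group log-rank test. The sketch below is a plain textbook implementation, not the authors' code; the function name, the encoding (event 1 = recurrence observed, 0 = censored; group 1 = outlier samples) and the toy data are assumptions.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test. Returns (chi-square statistic, p-value, 1 d.o.f.)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                               # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)   # at risk, group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)         # events at t
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)                               # events at t, group 1
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # survival function of the chi-square distribution with one degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# toy data (made up): outlier samples (group 1) recur early, the others late or not at all
times = [100, 150, 200, 250, 300, 900, 1000, 1100, 1200, 1300]
events = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
chi2, p = logrank_test(times, events, groups)
print(chi2, p)
```

Repeating this per gene and keeping the genes below the chosen p-value threshold gives the kind of selection described on the slide.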

outlier analysis
PCA reveals four clusters of genes (A, B, C, D)
[figure: PCA of high outlier genes (clusters of 71 and 1475 genes) and of low outlier genes (clusters of 226 and 1402 genes)]
genes in small clusters (B, D):
outlier status associated with late recurrence
genes in large clusters (A, C):
outlier status associated with early recurrence

recurrence risk score
top 20 genes (by KM p-value) from each cluster A, B, C, D
reference set of 80 genes
for each sample:
- determine outlier status with respect to the 80 genes (Y > μ + σ or Y < μ − σ)
- add up contributions per gene:
  −1 if the sample is an outlier w.r.t. a gene in A or C (early rec.)
   0 if the sample is not an outlier w.r.t. the gene
  +1 if the sample is an outlier w.r.t. a gene in B or D (late rec.)
recurrence risk score: −40 ≤ R ≤ +40
observe: median = 2 over the 380 training samples
crisp classification w.r.t. recurrence risk:
high risk (early recurrence) if R < 2
low risk (late recurrence) if R ≥ 2
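
The scoring rule can be written down in a few lines. In this sketch the gene names and the `status` dictionary are hypothetical; `status[g]` is assumed to be True when the sample is an outlier w.r.t. gene g in the direction (high or low) that defines the gene's cluster.

```python
def risk_score(status, early_genes, late_genes):
    """R = (number of outlier genes in B or D) - (number of outlier genes in A or C)."""
    r = 0
    for g in early_genes:  # clusters A and C: outlier status points to early recurrence
        r -= 1 if status.get(g, False) else 0
    for g in late_genes:   # clusters B and D: outlier status points to late recurrence
        r += 1 if status.get(g, False) else 0
    return r

def risk_class(r, threshold=2):
    """Crisp classification at the training-set median R = 2."""
    return "high risk" if r < threshold else "low risk"

# hypothetical gene names: 20 genes per cluster
early_genes = [f"A{i:02d}" for i in range(20)] + [f"C{i:02d}" for i in range(20)]
late_genes = [f"B{i:02d}" for i in range(20)] + [f"D{i:02d}" for i in range(20)]

# made-up sample: outlier w.r.t. five late-recurrence genes and two early-recurrence genes
sample = {g: True for g in late_genes[:5]}
sample.update({g: True for g in early_genes[:2]})
r = risk_score(sample, early_genes, late_genes)
print(r, risk_class(r))  # 3 low risk
```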

recurrence risk prediction
KM plots with respect to high / low risk groups:
training set (380 samples): log rank p < 1.e-16
test set (89 samples): log rank p < 1.e-4
• risk score R is predictive of the actual recurrence risk
• the 80 selected genes can serve as a prognostic panel
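
The curves in such a KM plot are Kaplan-Meier estimates of the recurrence-free fraction, computed separately for the high and the low risk group. A minimal sketch of the estimator for one group follows; the function name and the toy follow-up data are assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time.

    times: days to recurrence or to last follow-up; events: 1 = recurrence, 0 = censored.
    """
    surv = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1.0 - d / at_risk
        curve.append((t, surv))
    return curve

# made-up follow-up data for five samples of one risk group
curve = kaplan_meier([100, 200, 200, 400, 500], [1, 1, 0, 1, 0])
print(curve)  # recurrence-free fraction drops to about 0.8, 0.6, 0.3 at days 100, 200, 400
```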

extreme case analysis
[figure: number of recurrences over time, with the two extreme classes marked]
2 classes:
class 1, low risk: 107 samples, > 5 years (late or no recurrence)
class 2, high risk: 109 samples, ≤ 2 years (early)
all other samples: (undefined)
• 80-dim. feature vectors (gene expression)
• representation by one prototype vector per class
• adaptive distance measure with relevance matrix for comparison of samples and prototypes
• distance-based classification, e.g. Nearest Prototype Classifier (NPC)
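
A sketch of the class definition and of nearest prototype classification with a matrix distance. Assumptions: a year is taken as 365 days, `days` means days to recurrence or to last follow-up, and the distance is the quadratic form d(w, x) = (x − w)ᵀ Λ (x − w) that is standard in the GMLVQ literature; the slides do not spell out the formula. All names and the toy numbers are illustrative.

```python
def extreme_class(days, recurred, early=2 * 365, late=5 * 365):
    """2 = high risk (recurrence within 2 years), 1 = low risk (> 5 years), None = undefined."""
    if recurred and days <= early:
        return 2
    if days > late:
        return 1
    return None

def distance(x, w, lam):
    """Adaptive distance (x - w)^T Lambda (x - w) with relevance matrix lam."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    n = len(diff)
    return sum(diff[i] * lam[i][j] * diff[j] for i in range(n) for j in range(n))

def nearest_prototype(x, prototypes, lam):
    """Return the class label of the prototype closest to x."""
    return min(prototypes, key=lambda label: distance(x, prototypes[label], lam))

# toy 2-dim. example (made up): the relevance matrix decides which feature counts
prototypes = {1: [0.0, 0.0], 2: [1.0, 5.0]}
x = [0.9, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
first_only = [[1.0, 0.0], [0.0, 0.0]]  # second feature treated as irrelevant
print(nearest_prototype(x, prototypes, identity))    # 1
print(nearest_prototype(x, prototypes, first_only))  # 2
```

With the identity matrix the distance reduces to squared Euclidean distance; a learned relevance matrix reweights and couples the 80 gene expression features.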

GMLVQ classifier
Generalized Matrix Relevance Learning Vector Quantization (GMLVQ)
training of prototypes and relevance matrix
= minimization of an appropriate cost function
with respect to performance on labeled training set
[figure: two panels over the genes of clusters A, B, C, D, split into low expression | high expression; recovered labels: “components of”, “diagonal elements of Λ”]
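
The slides do not state the cost function. For orientation, the form commonly used in the GMLVQ literature is given below; it may differ in detail from what was used in this study. Here w_J is the closest prototype carrying the same class label as sample x_i, w_K the closest prototype with a different label, Φ a monotonically increasing function (for instance the identity), and Ω an auxiliary matrix that keeps Λ positive semi-definite. The symbols e_i, P and Ω are notation chosen for this note.

```latex
\begin{align}
  d_\Lambda(\mathbf{w}, \mathbf{x}) &= (\mathbf{x} - \mathbf{w})^\top \Lambda \, (\mathbf{x} - \mathbf{w}),
  \qquad \Lambda = \Omega^\top \Omega \\
  E &= \sum_{i=1}^{P} \Phi(e_i),
  \qquad
  e_i = \frac{d_\Lambda(\mathbf{w}_J, \mathbf{x}_i) - d_\Lambda(\mathbf{w}_K, \mathbf{x}_i)}
             {d_\Lambda(\mathbf{w}_J, \mathbf{x}_i) + d_\Lambda(\mathbf{w}_K, \mathbf{x}_i)}
\end{align}
```

Each term satisfies −1 ≤ e_i ≤ 1, and e_i < 0 means that sample x_i is classified correctly by the nearest prototype rule.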

GMLVQ classifier
ROC of GMLVQ classifier (Leave-One-Out of the 216 extreme samples)
KM plot w.r.t. all 469 samples
(L-1-O for 216 samples, plus 253 undefined)
log rank p < 1.e-7
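
A sketch of a Leave-One-Out evaluation followed by an AUC computation. To stay short and self-contained, class means with squared Euclidean distance stand in for the trained GMLVQ prototypes and relevance matrix, so this shows the validation scheme rather than the classifier itself. Labels 0 (low risk) and 1 (high risk), all function names and the toy data are assumptions.

```python
def sq_dist(x, w):
    return sum((a - b) ** 2 for a, b in zip(x, w))

def class_means(X, y):
    """Stand-in 'training': one prototype per class, placed at the class mean."""
    means = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def leave_one_out_scores(X, y):
    """Score each sample with a model fitted on all other samples.

    A higher score means the sample lies closer to the high risk prototype.
    """
    scores = []
    for i in range(len(X)):
        protos = class_means(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(sq_dist(X[i], protos[0]) - sq_dist(X[i], protos[1]))
    return scores

def roc_auc(scores, labels):
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, yi in zip(scores, labels) if yi == 1]
    neg = [s for s, yi in zip(scores, labels) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# toy 1-dim. data (made up), perfectly separable
X = [[0.0], [0.2], [0.4], [1.0], [1.2], [1.4]]
y = [0, 0, 0, 1, 1, 1]
auc = roc_auc(leave_one_out_scores(X, y), y)
print(auc)  # 1.0
```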

extreme case analysis (107+109 samples)
[figure: ROC curves of the GMLVQ classifier and the risk score classifier; recovered annotations: AUC = 0.84, R = 2]
the set of 80 genes is also diagnostic:
• GMLVQ separates normal from tumor cells (close to) perfectly
• PCA of corresponding gene expressions:
65 normal samples
105 low risk samples (late recurrence)
109 high risk samples (early recurrence)
gradient from normal to high risk:
diagnostics?

remarks and open questions
• GMLVQ suggests an even smaller panel of prognostic genes (12?)
identify a minimum panel for diagnostics and prognostics
• 80 genes do not necessarily reflect biological mechanisms
compare, e.g., with known pathways / modules of genes
• prospective studies required with respect to use as an assay
• can the performance be improved further?
study more sophisticated classifier systems
include further clinical information (diet, lifestyle, family history, …)
• more direct, multivariate identification of relevant genes?
e.g. PCA+GMLVQ and back-transform
easy-to-use GMLVQ classifier: www.cs.rug.nl/~biehl/gmlvq
