We introduce PICS (Pathway Informed Classification System) for classifying cancers based on tumor sample gene expression levels. The method clearly separates the samples of a pan-cancer dataset by tissue of origin and can also sub-classify individual cancer datasets into distinct survival classes. Gene expression values are collapsed into pathway scores that reveal which biological activities are most useful for clustering cancer cohorts into sub-types. Variants of the method allow it to be used on datasets with or without non-cancerous samples. Activity levels of all types of pathways, broadly grouped into metabolic, cellular processes and signaling, and immune system, are useful for separating the pan-cancer cohort. When clustering specific cancer types, certain pathway types become more valuable depending on the site being studied: signaling pathways dominate for lung cancer, signaling and metabolic pathways for pancreatic cancer, and immune system pathways for melanoma. This work suggests the utility of pathway-level genomic analysis and points toward using pathway classification to predict the efficacy and side effects of drugs and radiation.
PICS: Pathway Informed Classification System for cancer analysis using gene expression data
1. PICS: Pathway Informed Classification System for cancer analysis using gene expression data
David Craft and Michael Young
MGH Brown Bag
April 12, 2016
2. Scenario: a patient with an advanced-stage cancer who has failed traditional treatments is told something like the following:
We can offer you this new drug that was FDA approved 3 years ago. For your type of cancer, about 20% of patients have a durable response after 2 years, 30% have some shrinkage that is not durable, and 50% do not respond at all. Additionally, the following side effects….
How can we improve on our predictive capability?
3. Probability of eradication
Probability of toxicity 1
Probability of sensitivity to X Gy of radiation
…
“We do not have good models for predicting patient response to treatment”
4. Genomic characterization of cancers
Thousands of papers characterize the genomic nature of cancer.
Very little of this is in use on the front lines of clinical cancer care.
Some clinical mutation/drug examples:
vemurafenib for BRAF V600 mutations
erlotinib for EGFR mutations
crizotinib for ALK mutations
6. Microarrays can measure upwards of 20,000 gene expression levels.
In typical early-phase drug trials, patient cohorts are much smaller than this (30 - 100 patients).
Plenty of room for misleading correlations.
There has been some success with the “you have mutation x, therefore take this drug” approach, but …
[Figures: cartoon of the central dogma; a microarray measuring RNA levels]
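The "misleading correlations" point can be illustrated with simulated data. The sketch below is not from the slides: it assumes 50 patients, 20,000 genes, and an outcome that is pure noise, then looks for the gene that correlates best with that outcome. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_genes = 50, 20_000  # assumed sizes, chosen to match the slide's orders of magnitude

# Expression levels and treatment response are independent random noise.
expr = rng.normal(size=(n_patients, n_genes))
response = rng.normal(size=n_patients)

# Pearson correlation of every gene with the response, computed in one pass.
expr_z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
resp_z = (response - response.mean()) / response.std()
corr = expr_z.T @ resp_z / n_patients

max_abs_corr = np.abs(corr).max()
print(f"strongest |correlation| among {n_genes} noise genes: {max_abs_corr:.2f}")
```

Even though no gene has any real relationship to the outcome, the best-looking gene shows a sizeable correlation simply because so many genes are tested against so few patients.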
7. Although the central dogma is considered vastly over-simplified, RNA is still a useful “signal”
-James Shapiro, U. of Chicago
8. Data mining approach without any reference to biological systems.
A biological pathway is a well-defined biochemical process that occurs in living cells and organisms.
Thousands have been curated over the years.
From GENES to PATHWAYS
9. An example pathway from KEGG
KEGG = Kyoto Encyclopedia of Genes and Genomes
11. Pathway scores
Gene expression levels for 12 genes of the pyruvate metabolism pathway from 156 bladder patients in PRECOG. The first 12 columns are non-cancerous patient samples.
[Figure: PCA decomposition of expression levels of genes in a particular pathway; axes pca 1 and pca 2]
In this example, the score for a patient could be (pca1, pca2).
12. How to score a pathway?
1) PCA on gene expression levels
2) NTC ...
3) GED ...
[Figure: PCA decomposition, 3D → 2D; axes: expression level of gene 1, gene 2, gene 3]
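A minimal sketch of the PCA scoring option, assuming patients are rows and the genes of one pathway are columns (the slides show patients as columns, so a real matrix may need transposing). The function name `pathway_pca_scores` and the random stand-in data are illustrative, not part of PICS.

```python
import numpy as np

def pathway_pca_scores(expr, n_components=2):
    """Project each patient onto the leading principal components of one pathway.

    expr: array of shape (n_patients, n_genes_in_pathway).
    Returns an array of shape (n_patients, n_components).
    """
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in data with the dimensions of the pyruvate metabolism example.
rng = np.random.default_rng(0)
expr = rng.normal(size=(156, 12))
scores = pathway_pca_scores(expr)  # one (pca1, pca2) pair per patient
```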
13. How to score a pathway?
1) PCA on gene expression levels
2) NTC: Compute “distance” in gene expression space from a patient cancer sample to the mean of normal tissues in the dataset.
NTC = normal tissue centroid
[Figure: visual demo of the NTC method for a pathway with two genes; axes: gene expression for gene 1 and gene 2; labels: normal samples, cancer samples, X]
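A minimal sketch of the NTC score. The slides put "distance" in quotes and do not name a metric, so Euclidean distance is an assumption here; `ntc_scores` and the toy data are illustrative names and values.

```python
import numpy as np

def ntc_scores(expr, is_normal):
    """Distance from each cancer sample to the normal tissue centroid.

    expr: (n_samples, n_genes_in_pathway); is_normal: boolean mask over samples.
    Euclidean distance is an assumption; the slides do not specify the metric.
    """
    centroid = expr[is_normal].mean(axis=0)
    return np.linalg.norm(expr[~is_normal] - centroid, axis=1)

# Two-gene toy pathway: two normal samples and one cancer sample.
expr = np.array([[0.0, 0.0],
                 [2.0, 0.0],
                 [4.0, 4.0]])
is_normal = np.array([True, True, False])
scores = ntc_scores(expr, is_normal)  # centroid is (1, 0), so the distance is 5
```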
14. How to score a pathway?
1) PCA on gene expression levels
2) NTC: ...
3) GED: Gene expression deviation
Kolmogorov-Smirnov to test difference of two distributions. If a gene passes then:
$$d_{g,p} = \frac{x_{g,p} - \mu_g}{\sigma_g}$$
where
$d_{g,p}$ = gene deviation for gene g, patient p
$x_{g,p}$ = expression level of gene g, patient p
$\mu_g$ = mean expression level of gene g for normal samples
$\sigma_g$ = standard deviation of expression level of gene g for normal samples
Two scores per patient for each pathway: one which sums up the positive gene deviations [genes that have higher expression compared to normal samples] and one which sums up the negative gene deviations.
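A minimal sketch of the GED score under several assumptions: the deviation is the z-score against normal samples implied by the slide's labels, the Kolmogorov-Smirnov test compares normal and cancer expression for each gene, and a gene "passes" when the statistic exceeds the standard large-sample 5% threshold. The slides give no cutoff, so `alpha` and all function names are hypothetical.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return np.abs(cdf_a - cdf_b).max()

def ged_scores(expr, is_normal, alpha=0.05):
    """Positive and negative deviation sums per cancer sample for one pathway.

    expr: (n_samples, n_genes_in_pathway); is_normal: boolean mask over samples.
    """
    normal, cancer = expr[is_normal], expr[~is_normal]
    n, m = len(normal), len(cancer)
    # Large-sample KS rejection threshold (assumed; not given in the slides).
    critical = np.sqrt(-0.5 * np.log(alpha / 2)) * np.sqrt((n + m) / (n * m))
    passes = np.array([ks_statistic(normal[:, g], cancer[:, g]) > critical
                       for g in range(expr.shape[1])])
    mu = normal.mean(axis=0)
    sigma = normal.std(axis=0, ddof=1)
    dev = ((cancer - mu) / sigma)[:, passes]      # deviations of passing genes only
    positive = np.clip(dev, 0, None).sum(axis=1)  # sum of over-expression
    negative = np.clip(dev, None, 0).sum(axis=1)  # sum of under-expression
    return positive, negative, passes

# Simulated pathway: gene 0 raised in cancer, gene 1 lowered, gene 2 unchanged.
rng = np.random.default_rng(0)
is_normal = np.arange(52) < 12
expr = rng.normal(size=(52, 3))
expr[~is_normal, 0] += 5.0
expr[~is_normal, 1] -= 5.0
positive, negative, passes = ged_scores(expr, is_normal)
```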
15. How to decide if we should use a particular pathway (Silhouette score)
1) Use PCA to reduce dimensionality of pathway gene expression levels.
2) For the known groups “normal” and “cancer”, evaluate a silhouette score for each sample. [1 is perfect separation; lower values, down to -1, are worse separation.]
3) Average all the silhouette scores for an overall score. Take the pathway if that score is big enough.
[Figure: samples plotted on axes pca 1 and pca 2]
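The selection steps above can be sketched as follows, using the standard silhouette definition with Euclidean distances on the PCA-reduced points. The slides do not say what "big enough" means, so the threshold is left as a caller-supplied parameter; the value 0.5 in the usage lines is arbitrary, and the function names are illustrative.

```python
import numpy as np

def silhouette_samples(points, labels):
    """Per-sample silhouette: (b - a) / max(a, b), with Euclidean distances."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    groups = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # a singleton group gets score 0 by convention
        a = dist[i, same].sum() / (same.sum() - 1)  # mean distance within own group
        b = min(dist[i, labels == g].mean() for g in groups if g != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

def keep_pathway(points, labels, threshold):
    """Accept the pathway when the mean silhouette reaches the chosen threshold."""
    return silhouette_samples(points, labels).mean() >= threshold

# Toy PCA-reduced samples: two "normal" points and two "cancer" points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array(["normal", "normal", "cancer", "cancer"])
sample_scores = silhouette_samples(points, labels)
selected = keep_pathway(points, labels, threshold=0.5)  # 0.5 is an arbitrary example
```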
16. Proposed system
PICS: Pathway Informed Classification System
Module 1: dimension reduction techniques. Inputs: KEGG pathway database, patient gene expression levels. Output: pathway scores.
Module 2: clustering algorithms. Input: pathway scores for multiple patients and pathways. Output: pathway-based classification.
Module 3: prediction (machine learning). Inputs: pathway scores, proposed treatment. Output: improved probabilities of a list of possible side effects and tumor cure.
[Figure: axis "Overall survival time"; labels "Standard separation" and "Improved predictability with pathway-based quantitative learning"; markers 1, 2, 3, 4]
18. Clustering methods
K-means or K-medoids
Hierarchical clustering
Each row is a type of avocado (or a set of gene expression values of a tissue sample, i.e. a list of attributes).
[Figure axes: Attribute 1, Attribute 2]
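A minimal sketch of one of the clustering options named on this slide, plain K-means (Lloyd's algorithm), applied to rows of attributes. It is not the clustering code used in PICS; the function name, the seed, and the toy data are illustrative, and K-medoids and hierarchical clustering are not shown.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means: alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

# Each row is one sample described by two attributes (e.g. two pathway scores).
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(points, k=2)
```

The cluster numbering (0 or 1) depends on the random initialization; only the grouping of samples is meaningful.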