Pathway Ranking Tool
BioDiscovery, Inc. at Marina del Rey
Analyzing microarray data at the pathway level
instead of the individual gene level
Validation of statistical methods
2 data sets: Brain Tumor, Interferon-gamma.
Sources of annotation: BioCarta, KEGG, Gene Ontology (GO)
Project Overview, cont.
GeneSight is data analysis software featuring:
-Statistical significance testing
-Multiple Data Visualizations
-Automated gene annotation
-Complete result reports
-Pathway analysis (?)
Research and Development in GeneSight
Glioblastoma multiforme (GBM) is the most
malignant of the glial tumors, classified as
Many brain tumors are currently incurable.
Average survival time: 1 year
Biology of Brain Tumor
normal cell growth
genes: retard cell growth
Bad Genes Foment Trouble
Interferons are a class of cytokines that mediate
antiviral, antiproliferative, and antitumor activities.
IFN-γ is produced by T lymphocytes in
response to mitogens or antigens.
IFNs bind to their receptors and initiate the
JAK-STAT signaling cascade.
Biology of Interferon
Biology of Interferon, cont.
Grouping related genes together into pathways
Ex: p53 Signaling Pathway
Ex: Citrate cycle (TCA cycle)
Grouping genes into structured, controlled vocabularies (Gene Ontology):
-Biological Process. Ex: angiogenesis, apoptosis
-Molecular Function. Ex: DNA binding activity
-Cellular Component. Ex: nucleus, mitochondria
Traditional method of ranking genes
1. Mann-Whitney test: obtain a list of
probe sets that satisfy a given p-value threshold.
2. Cluster analysis: count how many of the
listed probe sets occur in each cluster (pathway).
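The two-step traditional method above can be sketched in plain Python. This is a minimal sketch that assumes a two-sided Mann-Whitney U test with the large-sample normal approximation (no tie or continuity correction), which is only rough for small replicate counts; the helper names and the dictionary layout of the data are hypothetical, not GeneSight's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return u, min(p, 1.0)


def flag_probe_sets(
    data: Dict[str, Tuple[Sequence[float], Sequence[float]]], alpha: float
) -> List[str]:
    """Step 1: keep probe sets whose Mann-Whitney p-value is below alpha."""
    return [name for name, (ctl, exp) in data.items() if mann_whitney(ctl, exp)[1] < alpha]
```

Step 2 then amounts to counting, for each pathway, how many of its probe sets appear in the flagged list. With only 4 vs. 4 samples the normal approximation cannot produce p-values near 0.001, which is one reason larger replicate counts matter.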
1. Original data: 12,625 genes.
Select genes with p-value < 0.001,
narrowing the list to 927 genes.
2. Cluster those 927 genes into pathways.
4 of the genes in the SODD/TNFR1 Signaling
Pathway satisfy p-value < 0.001.
Annotations Lists DG-Less_than_0.001
BioCarta Pathway SODD/TNFR1 Signaling Pathway p=0.012: CASP8,FADD,LTA,TNF (4 of 9)
BioCarta Pathway D4-GDI Signaling Pathway p=0.017: CASP1,CASP10,CASP8,JUN (4 of 10)
BioCarta Pathway TNFR1 Signaling Pathway p=0.021: CASP8,FADD,JUN,LMNB1,LTA,MADD,TNF (7 of 28)
BioCarta Pathway Cadmium induces DNA synthesis and proliferation in macrophages p=0.021: JUN,LTA,MAPK3,PRKCB1,TNF (5 of 16)
BioCarta Pathway Visceral Fat Deposits and the Metabolic Syndrome p=0.022: LPL,LTA,TNF (3 of 6)
BioCarta Pathway Fibrinolysis Pathway p=0.032: F13A1,F2R,SERPINE1 (3 of 7)
BioCarta Pathway EPO Signaling Pathway p=0.033: EPO,EPOR,GRB2,JUN,MAPK3 (5 of 18)
Mann-Whitney Test, Denovo Glioblastoma
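The table does not say how its per-pathway p-values were computed. A common choice for this kind of "k of n genes flagged" overlap is a one-sided hypergeometric (Fisher's exact) test, sketched below as an assumption. Plugging in the SODD/TNFR1 counts need not reproduce p=0.012, because the background set and counting unit (genes vs. probe sets) used to build the table are not stated.

```python
from math import comb


def hypergeom_upper_tail(k: int, pathway_size: int, n_flagged: int, n_total: int) -> float:
    """P(X >= k): chance that at least k of a pathway's genes are flagged
    when n_flagged of n_total genes are flagged at random."""
    top = min(pathway_size, n_flagged)
    hits = sum(
        comb(pathway_size, i) * comb(n_total - pathway_size, n_flagged - i)
        for i in range(k, top + 1)
    )
    # Exact integer arithmetic until this final division.
    return hits / comb(n_total, n_flagged)


# Counts from the SODD/TNFR1 row: 4 of 9 pathway genes flagged,
# 927 of 12,625 genes flagged overall.
p = hypergeom_upper_tail(4, 9, 927, 12_625)
print(f"{p:.4g}")
```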
How Affymetrix Microarray Chips Work
Best results: target genes hybridize
perfectly with the Perfect Match (PM) probe, and
not at all with the Mismatch (MM) probe.
PM: Perfect Match
MM: Mismatch
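One simple way to turn PM/MM pairs into a single probe-set value is to average PM − MM over the probe pairs, in the spirit of the "average difference" summary used by early Affymetrix software. This is a simplified sketch under that assumption; real Affymetrix algorithms add outlier handling and other corrections that are omitted here, and `average_difference` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def average_difference(pm: Sequence[float], mm: Sequence[float]) -> float:
    """Simplified probe-set signal: mean of (PM - MM) over probe pairs.

    Outlier handling and background corrections used by real
    Affymetrix software are intentionally left out.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))
```

A large positive value means the PM probes bound much more target than their MM controls, which is the "best results" case described above.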
Example of GeneSight PlotData
               Normal  Normal   Tumor   Tumor
Probe Set A       4.5     3.8    10.2    11.1
Probe Set B       2.3     2.7    13.5    13.6
Probe Set C       7.8     8.2     1.4     1.8
Probe Set A       3.5     4.2     8.9     9.6
Theoretical Tumor Expression Levels (Log Transformed)
Notice the column replicates (two Normal and two Tumor samples) and the probe set replicates (Probe Set A appears twice).
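To make the example table concrete, the sketch below computes a tumor-minus-normal mean difference for each row (the same kind of quantity as the "Calc. Mean diff." step of the permutation loop) and then averages the two Probe Set A replicates. The values are the theoretical ones from the table; the variable and function names are hypothetical. Because the data are log transformed, a difference of means is a log-scale fold change.

```python
from statistics import mean

# Theoretical log-scale values from the example table
# (columns: Normal, Normal, Tumor, Tumor). Probe Set A appears twice.
rows = [
    ("Probe Set A", [4.5, 3.8, 10.2, 11.1]),
    ("Probe Set B", [2.3, 2.7, 13.5, 13.6]),
    ("Probe Set C", [7.8, 8.2, 1.4, 1.8]),
    ("Probe Set A", [3.5, 4.2, 8.9, 9.6]),
]


def mean_diff(values, normal_idx=(0, 1), tumor_idx=(2, 3)):
    """Tumor mean minus normal mean (a log-scale fold change here)."""
    return mean(values[i] for i in tumor_idx) - mean(values[i] for i in normal_idx)


# Compute each row's difference, then average replicate probe sets by name.
by_name = {}
for name, values in rows:
    by_name.setdefault(name, []).append(mean_diff(values))
summary = {name: mean(diffs) for name, diffs in by_name.items()}
print(summary)
```

Probe Set B is strongly up in tumor, Probe Set C is down, and the two Probe Set A rows (differences of about 6.5 and 5.4) agree in direction, which is what the replicates are meant to show.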
Given Data Sets
Two data sets were provided: Brain Tumor and IFN-γ.
The Brain Tumor data set has 5+ tumor types;
however, only 2 tumor types were used
(Denovo Glioblastoma, Progressive
IFN-γ Data Set: the entire data set was used.
What and why?
Goal: write a prototype extension to GeneSight
that uses permutational statistics to build a
custom distribution for a given microarray data set.
Overall significance: the software provides a
list of (potentially) significant pathways that
enables researchers to focus their work.
What is permutational statistics?
E E C C   (original grouping: E = Experiment, C = Control)
1 2 3 4   (samples)
Choose different Control and Experiment groupings (permute), for example:
E C E C
1 2 3 4
By iterating through an adequate number of permutations, we can
estimate a p-value that indicates whether a pathway is likely to be significant.
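For a single gene the permutation idea can be shown exhaustively. The sketch below assumes a one-sided test on the difference of group means and enumerates every way of choosing the Experiment samples; the function names are hypothetical. Using Probe Set B from the example table with the Tumor samples (0-based indices 2 and 3) as E, only 6 labelings exist, so the smallest possible p-value is 1/6. With 8 + 8 replicates there are C(16, 8) = 12,870 labelings, enough for the 10,000 unique permutations the method calls for.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence, Tuple


def mean_diff(values: Sequence[float], exp_idx: Tuple[int, ...]) -> float:
    """Experiment-group mean minus control-group mean."""
    exp = [values[i] for i in exp_idx]
    ctl = [values[i] for i in range(len(values)) if i not in exp_idx]
    return mean(exp) - mean(ctl)


def permutation_pvalue(values: Sequence[float], exp_idx: Tuple[int, ...]) -> float:
    """One-sided p-value: share of E/C labelings whose statistic is at least
    as large as the observed one. Enumerates every labeling, so it is only
    practical for small designs."""
    observed = mean_diff(values, exp_idx)
    labelings = list(combinations(range(len(values)), len(exp_idx)))
    # Small tolerance guards against floating-point noise in ties.
    extreme = sum(mean_diff(values, lab) >= observed - 1e-12 for lab in labelings)
    return extreme / len(labelings)


p = permutation_pvalue([2.3, 2.7, 13.5, 13.6], (2, 3))
print(p)
```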
(In this context.)
There are two versions of the S. Metric
S. Metric I =
S. Metric II =
M = Number of genes flagged as
Total = Total number of genes in the
(Layman's) How Statistics Works
Data → Statistic (S. Metric I, II) → P-Value
After all permutations are done, calculate the p-value.
Take at least 10,000 unique permutations. A unique
permutation is determined by a Permute class.
For each condition
  For each permutation
    For each gene
      Calc. mean diff.
    For each pathway
      Store the statistic
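The nested loop above can be sketched in Python for a single condition; the outer "For each condition" loop would simply call `pathway_pvalues` once per condition. Because the S. Metric I and II formulas are not spelled out in this text, `pathway_stat` is a hypothetical placeholder (mean absolute mean difference of the pathway's genes), and `unique_labelings` is a hypothetical stand-in for the Permute class. All names here are assumptions, not GeneSight code.

```python
import itertools
import math
import random
from statistics import mean
from typing import Dict, FrozenSet, List, Sequence


def mean_diff(row: Sequence[float], exp_idx: FrozenSet[int]) -> float:
    """Experiment mean minus control mean for one gene."""
    exp = [row[i] for i in sorted(exp_idx)]
    ctl = [row[i] for i in range(len(row)) if i not in exp_idx]
    return mean(exp) - mean(ctl)


def pathway_stat(diffs: Dict[str, float], genes: Sequence[str]) -> float:
    """HYPOTHETICAL stand-in for S. Metric I/II: mean |mean diff| of the
    pathway's genes that are present in the data."""
    vals = [abs(diffs[g]) for g in genes if g in diffs]
    return mean(vals) if vals else 0.0


def unique_labelings(n_samples: int, observed: FrozenSet[int], n_perm: int,
                     rng: random.Random) -> List[FrozenSet[int]]:
    """Up to n_perm distinct E/C labelings, always including the observed one."""
    k = len(observed)
    if n_perm >= math.comb(n_samples, k):
        # Few enough labelings: enumerate them all.
        return [frozenset(c) for c in itertools.combinations(range(n_samples), k)]
    seen = {observed}
    while len(seen) < n_perm:
        seen.add(frozenset(rng.sample(range(n_samples), k)))
    return list(seen)


def pathway_pvalues(data: Dict[str, Sequence[float]], exp_idx: Sequence[int],
                    pathways: Dict[str, Sequence[str]], n_perm: int = 10_000,
                    seed: int = 0) -> Dict[str, float]:
    """Permutation p-value per pathway for one condition."""
    rng = random.Random(seed)
    n_samples = len(next(iter(data.values())))
    observed_lab = frozenset(exp_idx)

    def stats_for(labels: FrozenSet[int]) -> Dict[str, float]:
        diffs = {g: mean_diff(row, labels) for g, row in data.items()}  # each gene
        return {p: pathway_stat(diffs, genes) for p, genes in pathways.items()}  # each pathway

    observed = stats_for(observed_lab)
    exceed = dict.fromkeys(pathways, 0)
    labelings = unique_labelings(n_samples, observed_lab, n_perm, rng)
    for labels in labelings:  # each permutation
        perm = stats_for(labels)
        for p in pathways:
            if perm[p] >= observed[p] - 1e-12:  # tolerance for float ties
                exceed[p] += 1
    return {p: exceed[p] / len(labelings) for p in pathways}
```

Keeping the observed labeling in the permutation set guarantees every p-value is at least 1 / (number of labelings), so a pathway is never reported with p = 0 from a finite sample.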
Initial Significance Flagging
Computational Power (Memory, CPU)
Required number of replicates (8,8)
Validation of pathway analysis
                                              Classified as significant   Classified as insignificant
Linda's selection of significant pathways     True Positive               False Negative
Linda's selection of insignificant pathways   False Positive              True Negative
Problem: lack of insignificant pathways
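The validation table above can be turned into counts and rates with a short sketch, assuming Linda's selection is treated as ground truth and pathways in neither reference set are ignored; the function names are hypothetical. It also shows why a lack of insignificant pathways is a problem: with no reference insignificant pathways, FP + TN = 0, so specificity cannot be estimated at all.

```python
from typing import Dict, Optional, Set, Tuple


def confusion_counts(ref_sig: Set[str], ref_insig: Set[str],
                     predicted_sig: Set[str]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) using the reference selection as ground truth.
    Pathways in neither reference set are ignored."""
    tp = len(ref_sig & predicted_sig)
    fn = len(ref_sig - predicted_sig)
    fp = len(ref_insig & predicted_sig)
    tn = len(ref_insig - predicted_sig)
    return tp, fn, fp, tn


def rates(tp: int, fn: int, fp: int, tn: int) -> Dict[str, Optional[float]]:
    """Sensitivity and specificity; None when the denominator is zero."""
    def ratio(a: int, b: int) -> Optional[float]:
        return a / b if b else None
    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp)}


# No reference insignificant pathways: specificity comes back as None.
counts = confusion_counts({"a", "b", "c"}, set(), {"a", "b", "x"})
print(counts, rates(*counts))
```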
Validation of pathway analysis
Figure: Comparison of Prediction Methods (curves: Best algorithm vs. Random); x-axis: # of pathways in BioCarta sorted by p-value
Figure: IFNG - Molecular Function (GO); x-axis: number of terms in MF (GO) sorted by p-value
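The comparison figures themselves are not preserved. Assuming their curves count how many reference ("significant") pathways appear among the top-ranked pathways, such a comparison could be computed as below: walk down the list sorted by p-value and accumulate hits, and compare against the expected count under a random ordering, which grows linearly. The function names are hypothetical.

```python
from typing import List, Sequence, Set


def cumulative_hits(ranked: Sequence[str], reference: Set[str]) -> List[int]:
    """hits[i] = number of reference pathways among the top i+1 ranked ones."""
    hits, count = [], 0
    for name in ranked:
        count += name in reference  # bool adds as 0 or 1
        hits.append(count)
    return hits


def random_baseline(n_total: int, n_reference: int) -> List[float]:
    """Expected hits among the top i+1 under a random ordering."""
    return [(i + 1) * n_reference / n_total for i in range(n_total)]
```

A method that ranks well rises above the random line early, at the top of the list, where researchers would focus their work.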
Prediction of pathways to be significant in the
conditions of interest is subjective.
Assumption of similar biological states between
Denovo Glioblastoma and Progressive
Finish modifying the multivariate statistic for
use in the permutational method. This method
uses PCA and multivariate statistics.
Finish validating the data produced using the
Initial Results of Multivariate Stat.
Figure: x-axis: number of terms in BP (GO), sorted by p-value
It is not yet clear which is better: the S. metric or
traditional enrichment analysis.
Improvements can be made to the S. metric.
Dr. Bruce Hoff
Dr. Anton Petrov
SoCalBSI: Dr. Jamil Momand,
Dr. Sandra Sharp, Dr. Nancy Warter-Perez,
Dr. Wendie Johnston
National Science Foundation
National Institutes of Health