Pathway Ranking Tool
Dimitri Kosturos
Linda Tsai
SoCalBSI, 8/21/2003
Project Overview
 BioDiscovery, Inc. at Marina del Rey
 Analyzing microarray data on pathway level
instead of individual...
 Validation of statistical methods
 2 data sets: Brain Tumor, Interferon-gamma.
 Sources of annotation: BioCarta, KEGG,...
phenotype
microarray algorithm
pathway
Dimitri,
(Computer Scientist)
Linda
(biologist)
Project Flowchart
 GeneSight is a data analysis software package
 Features:
-Statistical significance testing
-Multiple Data Visualizations
-Automa...
 Glioblastoma multiforme (GBM) is the most
malignant of the glial tumors, classified as
grade IV.
 Many brain tumors are ...
 Proto-oncogenes: promote
normal cell growth
 Tumor suppressor
genes: retard cell growth
http://www.med.harvard.edu/publicatio...
 Interferon is a class of cytokines that mediate
antiviral, antiproliferative, antitumor activities,
etc.
 IFN gamma is p...
http://www.grt.kyushu-u.ac.jp/eny-doc/pathway/ifn_gamma.html
Biology of Interferon, cont.
 Grouping related genes together into pathways
(A) BioCarta
Ex: p53 Signaling Pathway
(B) KEGG
Ex: Citrate cycle (TCA cycl...
Traditional method of ranking gene
pathways
 Steps:
1. Mann-Whitney Test: obtain
list of probe sets that satisfy
a certai...
4 of the genes in SODD/TNFR1 Signaling
Pathway satisfy p-value<0.001
Annotations  Lists DG-Less_than_0.001
BioCarta Pathwa...
How Affy. Microarray Chips Work
http://www.ucl.ac.uk/oncology/MicroCore/HTML_resource/Norm_Affy1.htm
Best results: Genes h...
Example of GeneSight PlotData
Normal Normal Tumor Tumor
Probe Set A 4.5 3.8 10.2 11.1
Probe Set B 2.3 2.7 13.5 13.6
Probe ...
Given Data Sets
 Given two data sets: Brain Tumor, IFN-γ
 Brain Tumor Data Set has 5+ tumor
types; however, only 2 Tumor ...
What and why?
 Goal: write a prototype extension to GeneSight
that uses permutational statistics to develop a
custom dist...
What is permutational statistics?
E E C C
1 2 3 4
Choose different Control and Experiment groupings (permute).
E C E C
1 2...
Permutational Stats.
 There are two versions of the S. Metric
currently implemented.
S. Metric I =
S. Metric II =
M = Num...
(Layman's) How Statistics Works
Data Statistic P-Value
Permute Here
S. Metric I, II
After all permutations
are done, calcu...
Algorithm
 Take at least 10,000 unique permutations. A unique
permutation is determined by a Permute class.
For each cond...
Limitations
 Computational Power (Memory, CPU)
 Required number of replicates (8,8)
Output of result
Validation of pathway analysis
Method 1
Computer algorithm
classified as significant
pathways
Computer algorithm
classifie...
Validation of pathway analysis
Method 2
[Figure: Comparison of Prediction Methods. Curves: best algorithm, random, worst. x-axis: # of pathways in BioCarta sorted by p-value; y-axis: # of identified significant pathways.]
Result
Brain Tumor-BioCarta
[Figure: D. glioblastoma. x-axis: # of pathways in BioCarta sorted by p-value; y-axis: # of significant pathways identified. Series: SMI, SMII, DG0.001, DG0.01.]
Result
IFNG-Molecular Function (GO)
[Figure: IFNG-Molecular Function. x-axis: number of terms in MF (GO) sorted by p-value; y-axis: number of significant terms identified. Series: SMI, SMII, enrich0.01, enrich0.001.]
Biological Limitations
 Prediction of pathways to be significant in the
conditions of interest is subjective.
 Assumptio...
Future Direction
 Finish modifying the Multivariate Statistic for
use in the permutational method. This method
uses PCA a...
Initial Results of Multivariate Stat.
[Figure: IFNG-Biological Process (GO). x-axis: number of terms in BP (GO), sorted by p-value; y-axis: number of terms identified. Series: SMI, SMII, enrich0.01, enrich0.001, M. Perm.]
Conclusion
 It is not clear which is better: the S. Metric or
traditional Enrichment Analysis.
 Improvements can be made ...
Acknowledgements
 Dr. Bruce Hoff
 Dr. Anton Petrov
 SoCalBSI: Dr. Jamil Momand,
Dr. Sandra Sharp, Dr. Nancy Warter-Pere...
  • Problem with this validation: lack of insignificant pathways
  • dimitri_linda.ppt

    1. 1. Pathway Ranking Tool Dimitri Kosturos Linda Tsai SoCalBSI, 8/21/2003
    2. 2. Project Overview  BioDiscovery, Inc. at Marina del Rey  Analyzing microarray data on pathway level instead of individual gene level  Methods: -Enrichment Analysis -Permutational Statistics -S. Metric -Multivariate test Project Overview
    3. 3.  Validation of statistical methods  2 data sets: Brain Tumor, Interferon-gamma.  Sources of annotation: BioCarta, KEGG, Gene Ontology. Project Overview, cont.
    4. 4. phenotype microarray algorithm pathway Dimitri, (Computer Scientist) Linda (biologist) Project Flowchart
    5. 5.  GeneSight is a data analysis software package  Features: -Statistical significance testing -Multiple Data Visualizations -Automated gene annotation -Complete result reports -Pathway analysis (?) Research and Development in GeneSight
    6. 6.  Glioblastoma multiforme (GBM) is the most malignant of the glial tumors, classified as grade IV.  Many brain tumors are currently incurable.  Average survival time: 1 year Biology of Brain Tumor
    7. 7.  Proto-oncogenes: promote normal cell growth (their mutated forms, oncogenes, drive uncontrolled growth)  Tumor suppressor genes: retard cell growth http://www.med.harvard.edu/publications/On_The_Brain/Volume4/Number2/SP95Awry.html Bad Genes Foment Trouble
    8. 8.  Interferons are a class of cytokines that mediate antiviral, antiproliferative, antitumor activities, etc.  IFN gamma is produced by T lymphocytes in response to mitogens or to antigens.  IFNs bind to their receptors and initiate the JAK-STAT signaling cascade. Biology of Interferon
    9. 9. http://www.grt.kyushu-u.ac.jp/eny-doc/pathway/ifn_gamma.html Biology of Interferon, cont.
    10. 10.  Grouping related genes together into pathways (A) BioCarta Ex: p53 Signaling Pathway (B) KEGG Ex: Citrate cycle (TCA cycle)  Grouping genes into structured, controlled vocabularies (ontologies) Gene Ontology -Biological Process. Ex: angiogenesis, apoptosis -Molecular Function. Ex: DNA binding activity -Cellular Component. Ex: nucleus, mitochondria Gene Annotations
    11. 11. Traditional method of ranking gene pathways  Steps: 1. Mann-Whitney Test: obtain the list of probe sets that satisfy a certain p-value threshold. 2. Cluster analysis: see how many of the listed probe sets occur in each cluster (pathway).  Example: 1. Original data: 12,625 genes. Selecting genes with p-value < 0.001 narrows the list to 927 genes. 2. Group those 927 genes into clusters (pathways).
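Step 1 of the traditional method can be sketched as follows. This is a minimal illustration, not GeneSight's implementation: it computes an exact two-sided Mann-Whitney (rank-sum) p-value by enumerating group assignments, which is only practical for small replicate counts. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_exact(group_a, group_b):
    """Exact two-sided p-value based on the rank sum of group_a."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n = len(group_a), len(ranks)
    expected = n_a * (n + 1) / 2
    observed = abs(sum(ranks[:n_a]) - expected)
    extreme = total = 0
    # Enumerate every way of assigning n_a of the n ranks to group A.
    for idx in combinations(range(n), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-12:
            extreme += 1
    return extreme / total

def select_probe_sets(data, n_a, alpha):
    """Names of probe sets whose p-value is below alpha (first n_a columns = group A)."""
    return [name for name, row in data.items()
            if mann_whitney_exact(row[:n_a], row[n_a:]) < alpha]

# Toy data (invented): 4 replicates per group.
toy = {"A": [1, 2, 3, 4, 5, 6, 7, 8], "B": [1, 5, 2, 6, 3, 7, 4, 8]}
selected = select_probe_sets(toy, n_a=4, alpha=0.05)
```

With 4 replicates per group the smallest possible exact p-value is 2/70, so a cutoff such as 0.001 can only be met with more replicates.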
    12. 12. 4 of the genes in SODD/TNFR1 Signaling Pathway satisfy p-value<0.001. Annotations Lists: DG-Less_than_0.001 (Mann-Whitney Test, Denovo Glioblastoma, p<0.001)
        BioCarta Pathway | p | Genes | Count
        SODD/TNFR1 Signaling Pathway | 0.012 | CASP8, FADD, LTA, TNF | 4 of 9
        D4-GDI Signaling Pathway | 0.017 | CASP1, CASP10, CASP8, JUN | 4 of 10
        TNFR1 Signaling Pathway | 0.021 | CASP8, FADD, JUN, LMNB1, LTA, MADD, TNF | 7 of 28
        Cadmium induces DNA synthesis and proliferation in macrophages | 0.021 | JUN, LTA, MAPK3, PRKCB1, TNF | 5 of 16
        Visceral Fat Deposits and the Metabolic Syndrome | 0.022 | LPL, LTA, TNF | 3 of 6
        Fibrinolysis Pathway | 0.032 | F13A1, F2R, SERPINE1 | 3 of 7
        EPO Signaling Pathway | 0.033 | EPO, EPOR, GRB2, JUN, MAPK3 | 5 of 18
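The slides do not say how the per-pathway p-values in this list are computed. One common choice for this kind of enrichment analysis is a hypergeometric tail probability, sketched below as an assumption. The function name is hypothetical, and the example reuses the counts quoted on the slides (12,625 genes, 927 selected, 4 of 9 pathway genes selected).

```python
from math import comb

def enrichment_p_value(population, selected, pathway_size, hits):
    """P(X >= hits) when `selected` genes are drawn without replacement from
    `population` genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, selected)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, selected)

# Counts taken from the slides; the choice of test is an assumption.
p = enrichment_p_value(population=12625, selected=927, pathway_size=9, hits=4)
```

This sketch is not expected to reproduce the p=0.012 shown for the SODD/TNFR1 pathway, since the background gene set and the exact test used by the tool are not stated.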
    13. 13. How Affy. Microarray Chips Work http://www.ucl.ac.uk/oncology/MicroCore/HTML_resource/Norm_Affy1.htm Best results: Genes hybridize perfectly with Perfect Match, and not at all with Mismatch. PM: Perfect Match MM: Mismatch
    14. 14. Example of GeneSight PlotData. Theoretical Tumor Expression Levels (Log Transformed); columns are conditions, rows are genes.
                      Normal  Normal  Tumor  Tumor
        Probe Set A   4.5     3.8     10.2   11.1
        Probe Set B   2.3     2.7     13.5   13.6
        Probe Set C   7.8     8.2     1.4    1.8
        Probe Set A   3.5     4.2     8.9    9.6
        Notice column replicates, Probe Set replicates.
    15. 15. Given Data Sets  Given two data sets: Brain Tumor, IFN-γ  Brain Tumor Data Set has 5+ tumor types; however, only 2 tumor types were used (Denovo Glioblastoma, Progressive Glioblastoma)  IFN-γ Data Set: the entire data set was used.
    16. 16. What and why?  Goal: write a prototype extension to GeneSight that uses permutational statistics to develop a custom distribution for a given Microarray data set.  Overall significance: the software provides a list of (potentially) significant pathways that enables researchers to focus their work.
    17. 17. What is permutational statistics? E E C C 1 2 3 4 Choose different Control and Experiment groupings (permute). E C E C 1 2 3 4 By iterating through an adequate number of permutations, we can determine if a pathway is likely to be significant (p-value). (In this context.)
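The regrouping shown on this slide can be enumerated directly. The sketch below is illustrative only: the function name is hypothetical, and whether the deck's Permute class treats a grouping and its mirror image (E and C swapped) as distinct is not stated.

```python
from itertools import combinations
from math import comb

def unique_groupings(n_samples, n_experiment):
    """Yield each distinct E/C labeling, e.g. ('E', 'C', 'E', 'C')."""
    for experiment_idx in combinations(range(n_samples), n_experiment):
        chosen = set(experiment_idx)
        yield tuple("E" if i in chosen else "C" for i in range(n_samples))

# The 4-sample example from the slide: 2 Experiment, 2 Control.
groupings = list(unique_groupings(4, 2))

# Number of distinct groupings for 8 vs 8 and for 7 vs 7 replicates.
count_8_8 = comb(16, 8)
count_7_7 = comb(14, 7)
```

For 4 samples there are only 6 groupings. With 8 replicates per group there are 12,870, while 7 per group gives 3,432, which is fewer than 10,000.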
    18. 18. Permutational Stats.  There are two versions of the S. Metric currently implemented. S. Metric I = S. Metric II = M = Number of Genes flagged as significant Total = Total number of Genes in the Pathway
    19. 19. (Layman's) How Statistics Works Data Statistic P-Value Permute Here S. Metric I, II After all permutations are done, calculate the p-Value
    20. 20. Algorithm  Take at least 10,000 unique permutations. A unique permutation is determined by a Permute class.
        For each condition
            For each permutation
                For each gene
                    Calc. mean diff.
                    Calc. t-stat
                End For
                For each pathway store the statistic
            End For
        End For
        calcPvalue(stored statistic)
        Diagram labels: S. Metric, Initial Significance Flagging, pValue
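The loop structure on this slide can be sketched as below. The S. Metric formulas are not present in the transcript, so this sketch substitutes a hypothetical stand-in statistic: the fraction of pathway genes whose absolute t-statistic exceeds a cutoff. All names, the cutoff, and the toy data are assumptions, and the sketch enumerates every grouping instead of sampling 10,000.

```python
from itertools import combinations
from statistics import mean, variance

def t_stat(values, e_idx, c_idx):
    """Unequal-variance t-statistic between experiment and control samples."""
    e = [values[i] for i in e_idx]
    c = [values[i] for i in c_idx]
    se = (variance(e) / len(e) + variance(c) / len(c)) ** 0.5
    return (mean(e) - mean(c)) / se if se > 0 else 0.0

def pathway_stat(data, pathway, e_idx, n_samples, t_cut=2.0):
    """Stand-in statistic (NOT the S. Metric): fraction of genes with |t| > t_cut."""
    c_idx = [i for i in range(n_samples) if i not in e_idx]
    flagged = sum(abs(t_stat(data[g], e_idx, c_idx)) > t_cut for g in pathway)
    return flagged / len(pathway)

def permutation_p_value(data, pathway, e_idx, n_samples):
    """Share of groupings whose statistic is at least the observed one."""
    observed = pathway_stat(data, pathway, e_idx, n_samples)
    null = [pathway_stat(data, pathway, perm, n_samples)
            for perm in combinations(range(n_samples), len(e_idx))]
    return sum(s >= observed for s in null) / len(null)

# Invented expression values: samples 0-3 are Experiment, 4-7 are Control.
data = {
    "g1": [10, 11, 10.5, 11.5, 1, 2, 1.5, 2.5],
    "g2": [8, 9, 8.5, 9.5, 2, 3, 2.5, 3.5],
    "g3": [5, 5.1, 4.9, 5, 5.05, 4.95, 5, 5.1],
}
p_strong = permutation_p_value(data, ["g1", "g2"], (0, 1, 2, 3), 8)
p_flat = permutation_p_value(data, ["g3"], (0, 1, 2, 3), 8)
```

Because the enumeration includes the original grouping, the smallest possible p-value is 1 divided by the number of groupings.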
    21. 21. Limitations  Computational Power (Memory, CPU)  Required number of replicates (8,8)
    22. 22. Output of result
    23. 23. Validation of pathway analysis, Method 1
        (rows: Linda's selection; columns: computer algorithm's classification)
                                                      | Classified significant | Classified insignificant
        Linda's selection of significant pathways     | True Positive          | False Negative
        Linda's selection of insignificant pathways   | False Positive         | True Negative
        Problem: lack of insignificant pathways ????
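The Method 1 table and its stated problem can be illustrated with a small tally. This is a sketch with invented pathway names and labels, and the function names are hypothetical. It shows that when the expert list contains no insignificant pathways, the false-positive and true-negative cells stay empty and specificity cannot be computed.

```python
def confusion_counts(algorithm_calls, expert_calls):
    """Tally TP/FN/FP/TN, treating the expert's labels as the reference."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for pathway, expert_significant in expert_calls.items():
        algorithm_significant = algorithm_calls.get(pathway, False)
        if expert_significant:
            counts["TP" if algorithm_significant else "FN"] += 1
        else:
            counts["FP" if algorithm_significant else "TN"] += 1
    return counts

def rate(numerator, denominator):
    """Return the ratio, or None when the denominator is zero."""
    return numerator / denominator if denominator else None

# Invented labels: the expert only listed pathways judged significant.
expert = {"pathway_a": True, "pathway_b": True, "pathway_c": True}
algorithm = {"pathway_a": True, "pathway_b": False, "pathway_c": True}

counts = confusion_counts(algorithm, expert)
sensitivity = rate(counts["TP"], counts["TP"] + counts["FN"])
specificity = rate(counts["TN"], counts["TN"] + counts["FP"])
```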
    24. 24. Validation of pathway analysis, Method 2. [Figure: Comparison of Prediction Methods. Curves: best algorithm, random, worst. x-axis: # of pathways in BioCarta sorted by p-value; y-axis: # of identified significant pathways.]
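The Method 2 plot appears to track, for each rank in the p-value-sorted list, how many of the expert-selected pathways have been recovered so far. A minimal sketch of that curve, together with the best and worst reference curves, is given below. The function names and toy values are hypothetical.

```python
def cumulative_hits(p_values, expert_significant):
    """Running count of expert-selected pathways, walking pathways by ascending p-value."""
    ranked = sorted(p_values, key=p_values.get)
    curve, hits = [], 0
    for name in ranked:
        if name in expert_significant:
            hits += 1
        curve.append(hits)
    return curve

def best_curve(n_pathways, n_significant):
    """All expert-selected pathways ranked first."""
    return [min(i, n_significant) for i in range(1, n_pathways + 1)]

def worst_curve(n_pathways, n_significant):
    """All expert-selected pathways ranked last."""
    return [max(0, i - (n_pathways - n_significant))
            for i in range(1, n_pathways + 1)]

# Invented p-values for four pathways; the expert selected "a" and "d".
p_values = {"a": 0.01, "b": 0.2, "c": 0.03, "d": 0.5}
curve = cumulative_hits(p_values, {"a", "d"})
```

A method whose curve stays closer to the best curve recovers the expert-selected pathways at earlier ranks.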
    25. 25. Result: Brain Tumor-BioCarta. [Figure: D. glioblastoma. x-axis: # of pathways in BioCarta sorted by p-value; y-axis: # of significant pathways identified. Series: SMI, SMII, DG0.001, DG0.01.]
    26. 26. Result: IFNG-Molecular Function (GO). [Figure: IFNG-Molecular Function. x-axis: number of terms in MF (GO) sorted by p-value; y-axis: number of significant terms identified. Series: SMI, SMII, enrich0.01, enrich0.001.]
    27. 27. Biological Limitations  Prediction of pathways to be significant in the conditions of interest is subjective.  Assumption of similar biological states between Denovo Glioblastoma and Progressive Glioblastoma.
    28. 28. Future Direction  Finish modifying the Multivariate Statistic for use in the permutational method. This method uses PCA and Multivariate statistics.  Finish Validating the data produced using the Multivariate Statistic.
    29. 29. Initial Results of Multivariate Stat. [Figure: IFNG-Biological Process (GO). x-axis: number of terms in BP (GO), sorted by p-value; y-axis: number of terms identified. Series: SMI, SMII, enrich0.01, enrich0.001, M. Perm.]
    30. 30. Conclusion  It is not clear which is better: the S. Metric or traditional Enrichment Analysis.  Improvements can be made to the S. Metric.
    31. 31. Acknowledgements  Dr. Bruce Hoff  Dr. Anton Petrov  SoCalBSI: Dr. Jamil Momand, Dr. Sandra Sharp, Dr. Nancy Warter-Perez, Dr. Wendie Johnston  National Science Foundation  National Institutes of Health