Translational Genomics Research Institute | www.tgen.org
Cancer Pathway Analysis and
Personalized Medicine
Jeff Kiefer
Research Associate Investigator
Translational Genomics Research Institute
Outline
Big Cancer Data Resources and Secondary Data Tools
Pathway Analysis - Resources, Methods, and Tools
Personalized Medicine - ‘Interpretation Bottleneck’
Drug to Genomic Event Matching
Cancer Genome Data Repositories
https://www.ebi.ac.uk/arrayexpress/
http://www.ncbi.nlm.nih.gov/geo/
http://cancergenome.nih.gov/
https://icgc.org/
Cancer Genome Data Repositories and Data Portals
https://genome-cancer.ucsc.edu/
http://www.cbioportal.org/public-portal/
http://cancergenome.broadinstitute.org (TumorPortal)
https://dcc.icgc.org/
http://genomeportal.stanford.edu/pan-tcga
http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/
http://www.cbioportal.org/public-portal/
Pathway Analysis
Pathway analysis encompasses a number of different approaches
and methods applied to large-scale -omic data sets.

The goal is to discover meaningful biological knowledge from
large data sets, often summarized as a gene list.

Strictly, a ‘pathway’ is a step-wise signal transduction
cascade. However, the term is also used loosely to encompass
gene sets derived from signatures or from other biological
processes, such as those in the Gene Ontology.
Pathway Analysis
A good general review outlining techniques, resources, and issues
in pathway analysis:
(2012). PLOS Computational Biology, 8(2), e1002375. doi:10.1371/journal.pcbi.1002375.t001
Pathway Analysis
Threshold-based = Enrichment analysis performed on a gene list
derived from a statistical test.

Non-threshold-based = All of the data are used, with no significance
cutoff. First popularized by gene set enrichment analysis (GSEA).

‘De novo’ = Pathways or gene sets are derived from the primary
data.
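
Threshold-based enrichment is commonly scored with a hypergeometric (one-sided Fisher) test: given a thresholded gene list, how surprising is its overlap with a pathway gene set? The sketch below is a minimal illustration of that idea, not the method of any particular tool named in this deck. The function name and all numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe_size: number of genes that could have been selected
    set_size:      number of universe genes annotated to the pathway
    list_size:     number of genes in the thresholded gene list
    overlap:       number of list genes that belong to the pathway
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical example: 8 of 200 list genes fall in a 100-gene pathway,
# with a background of 20,000 genes.
p_value = hypergeom_enrichment_p(20000, 100, 200, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the p-values for many gene sets would also need a multiple-testing correction, which this sketch leaves out.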
Pathway Resources
http://www.reactome.org/
http://www.genome.jp/kegg/pathway.html
http://www.broadinstitute.org/gsea/msigdb/index.jsp
Commercial Resources
http://www.pathwaycommons.org/about/#main-container
Threshold-based Pathway Enrichment Tools
https://toppgene.cchmc.org/
http://amp.pharm.mssm.edu/Enrichr
http://www.ici.upmc.fr/cluego/
ToppGene: extensive pathway gene sets are available for
enrichment analysis
Easy-to-use web interface
Add a list of gene identifiers on which to perform enrichment analysis.
Results sorted based on significance.
Results
Gene Set/Pathway Categories
Different Result Outputs
http://www.ici.upmc.fr/cluego/
ClueGO integrates Gene Ontology (GO) terms as well as pathways and creates a functionally organized GO/pathway term network.
Cytoscape App
[Figure: ClueGO GO/pathway term network. Term groups shown include collagen formation and collagen biosynthesis and modifying enzymes; forebrain development; transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds; regulation of Wnt signaling pathway; regulation of cell differentiation and cell development; regulation of intrinsic apoptotic signaling pathway; p53 signaling pathway; regulation of ion transport; osteoblast differentiation; rheumatoid arthritis; and Toll-like receptor signaling pathway, each linked to its member genes.]
Non-Threshold Pathway Enrichment Tools
http://www.broadinstitute.org/gsea/index.jsp
GSEA
Can be accessed through a number of resources and methods:
• Java Desktop
• R-GSEA
• GenePattern
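
To make the non-threshold idea concrete, the sketch below computes a GSEA-style running-sum enrichment score over a full ranked gene list: the score steps up at each gene in the set (weighted by its ranking score) and steps down at every other gene. It is a simplified illustration that leaves out the permutation-based significance and normalization performed by the GSEA software. The function name, gene labels, and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Maximum deviation from zero of a weighted running sum.

    ranked_genes: genes ordered from most up- to most down-regulated
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of genes being tested
    weight:       exponent applied to |score| for genes in the set
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [
        abs(score) ** weight if hit else 0.0
        for score, hit in zip(scores, in_set)
    ]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running = 0.0
    best = 0.0
    for hit_weight, hit in zip(hit_weights, in_set):
        # Step up on a hit, step down by a constant on a miss.
        running += hit_weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list: G1 and G2 sit at the top of the ranking.
genes = ["G1", "G2", "G3", "G4"]
metric = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, metric, {"G1", "G2"}))
print(enrichment_score(genes, metric, {"G3", "G4"}))
```

A set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom gets a negative score, without any gene-level significance cutoff.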
GSEA Use Case
Anaplastic Thyroid Cancer vs Non-Tumor Thyroid
GSEA Visualization with Enrichment Map
(2010) PLoS ONE, 5(11), 1–12. doi:10.1371/journal.pone.0013984.t001
http://www.baderlab.org/Software/EnrichmentMap
Cytoscape App
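
An enrichment map draws gene sets as nodes and connects sets that share genes. The sketch below shows one way such edges could be derived, using the Jaccard and overlap coefficients. The gene set names, memberships, and the 0.5 threshold are illustrative assumptions and are not taken from the Enrichment Map software.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by the union of both sets (non-empty sets assumed)."""
    return len(a & b) / len(a | b)

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set (non-empty sets assumed)."""
    return len(a & b) / min(len(a), len(b))

def similarity_edges(gene_sets, threshold=0.5, metric=overlap_coefficient):
    """Return (set_1, set_2, score) for every pair scoring at or above threshold."""
    edges = []
    for name_1, name_2 in combinations(sorted(gene_sets), 2):
        score = metric(gene_sets[name_1], gene_sets[name_2])
        if score >= threshold:
            edges.append((name_1, name_2, score))
    return edges

# Illustrative gene sets only; memberships are not curated.
example_sets = {
    "wnt_regulation": {"DKK1", "SFRP2", "PRICKLE1", "APCDD1"},
    "canonical_wnt": {"DKK1", "SFRP2", "APCDD1"},
    "ion_transport": {"KCNH2", "SCN4B", "CACNB2"},
}
for edge in similarity_edges(example_sets):
    print(edge)
```

The overlap coefficient links a small set to a larger set that contains it, while the Jaccard coefficient penalizes the size difference, so the choice of metric changes how tightly redundant gene sets cluster.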
EDDY computes the discrepancy between probability distributions of
candidate network structures, based on the likelihood of each network
across classes of samples.
A methodology that can exploit complex gene interactions that differ
between two conditions, such as tumor vs. normal, and that might be
missed by traditional approaches based on differential gene
expression
Investigate differential dependencies between conditions
–  EDDY: Evaluation of Differential DependencY
–  Computes the differential dependency statistic (JS, the Jensen-Shannon
divergence) and its statistical significance (p-value, via permutation)
between conditions, based on the likelihoods of genetic networks
(a probability distribution)
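
The statistic and its permutation p-value can be sketched as follows, assuming JS denotes the Jensen-Shannon divergence between two discrete distributions. This is not the EDDY implementation: EDDY compares distributions over candidate network structures, whereas the stand-in here is simply the empirical distribution of joint gene states in each class. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two dict-based distributions."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(x):
        return sum(
            x[k] * math.log2(x[k] / mix[k]) for k in x if x[k] > 0
        )

    return 0.5 * kl(p) + 0.5 * kl(q)

def empirical_distribution(samples):
    """Stand-in for a distribution over dependency structures."""
    counts = Counter(samples)
    return {state: n / len(samples) for state, n in counts.items()}

def permutation_test(class_1, class_2, to_distribution, n_perm=200, seed=0):
    """Return (observed JS, p-value) by shuffling class labels."""
    rng = random.Random(seed)
    observed = js_divergence(to_distribution(class_1), to_distribution(class_2))
    pooled = list(class_1) + list(class_2)
    n_1 = len(class_1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = js_divergence(
            to_distribution(pooled[:n_1]), to_distribution(pooled[n_1:])
        )
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (exceed + 1) / (n_perm + 1)

# Toy data: two genes move together in class 1 and in opposition in class 2.
class_1 = [(0, 0)] * 10 + [(1, 1)] * 10
class_2 = [(0, 1)] * 10 + [(1, 0)] * 10
js, p_value = permutation_test(class_1, class_2, empirical_distribution)
print(js, p_value)
```

In the toy data each gene has the same marginal distribution in both classes, so a per-gene differential expression test would see nothing, while the dependency between the genes differs completely.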
[Figure: EDDY workflow. A gene set of interest (genes A, B, C) is taken from a gene set database such as MSigDB. The likelihoods of possible (or probable) dependency structures are computed for Class 1 and Class 2 and compared with JS, revealing Class 1 specific, Class 2 specific, and common dependencies.]
[Figure: likelihoods of possible (or probable) dependency structures among genes A, B, and C in Class 1 and Class 2, separated into Class 1 specific, Class 2 specific, and common dependencies.]
•  GSEA appears under-powered and also selects a disproportionate
number of gene sets for one subtype (mesenchymal).
•  GSCA appears to be overly sensitive, with a high false-positive rate.

Comparison of EDDY with other methods in application
to TCGA GBM gene expression data
Table 2 lists the number of statistically significant gene
sets identified with the three different methods for each
subtype. EDDY and GSEA produced different results,
as EDDY identified 10-22 gene sets for each subtype,
whereas GSEA identified 245 gene sets for mesenchymal
but just a few for other subtypes. Moreover, there is only …

The number of identified subtype-specific gene sets
(#): Overlap with EDDY gene sets
Table 2. The number of statistically significant gene sets for each
subtype
Method   Classical   Mesenchymal   Neural      Proneural
EDDY     13          10            22          22
GSEA     1 (0)       245 (1)       6 (0)       3 (0)
GSCA     1590 (11)   1432 (7)      1681 (21)   1563 (17)
The number of common cases with EDDY is indicated in the
parentheses.
Two Pathways Identified with EDDY Enriched in Proneural
Glioblastoma Phenotype
EDDY identified the G2 pathway and p53 pathway gene sets as having differential dependencies that are related to the enrichment of
p53 mutations in the proneural subtype. Heat maps show that the genes in these pathways are not differentially expressed,
so they would not be identified by GSEA.
PARADIGM
March 20, 2014 Vol507 Nature
MEMo
https://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Ciriello.pdf
Both methods employ multiple genomic data types to identify altered pathways.
Both have been employed in TCGA studies.
Personalized Medicine
‘Interpretation Bottleneck’
Drug Target Annotation
Personalized Medicine Pipeline
Good, B. M., Ainscough, B. J., McMichael, J. F., Su, A. I., & Griffith, O. L. (2014). Organizing knowledge
to enable personalization of medicine in cancer, 1–9. doi:10.1186/s13059-014-0438-7
Drug Target Matching for Personalized Medicine
Good, B. M., Ainscough, B. J., McMichael, J. F., Su, A. I., & Griffith, O. L. (2014). Organizing knowledge
to enable personalization of medicine in cancer, 1–9. doi:10.1186/s13059-014-0438-7
Framework for Clinical Mapping of Genomic
Aberrations to Drugs
Good, B. M., Ainscough, B. J., McMichael, J. F., Su, A. I., & Griffith, O. L. (2014). Organizing knowledge
to enable personalization of medicine in cancer, 1–9. doi:10.1186/s13059-014-0438-7
Drug Target Resources
A number of resources are available for mapping drugs to gene
targets.
Issues with available sources
• Different annotation schemes and data structures lead to misleading
results for the end user.
• Contextual information around the drug and target is often not annotated.
• Not all annotations are therapeutically actionable or appropriate.
Drug to Target Annotation
Information for linking drugs to genes should be based on
primary literature.

Curated information should be annotated with a controlled
vocabulary and arrayed in a structured format.

Rules need to capture explicit drug-target response
information but also be flexible enough to capture inferred
information that may not always be explicitly stated. This is
important for further research.
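
One way to picture "controlled vocabulary in a structured format" is a record type whose fields only accept predefined terms. The sketch below is an assumption about how such a record could look, not the annotation schema used by the author. The evidence terms mirror the edge interaction key shown later in this deck (yes_direct, yes_inferred, no_direct, no_inferred), and the aberration terms mirror its color key. The class names, field names, and placeholder values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class AberrationType(Enum):
    """Aberration terms; 'EXP' is assumed to mean expression."""
    CNV = "CNV"
    SNV = "SNV"
    EXP = "EXP"
    OTHER = "Other"

class Evidence(Enum):
    """Terms copied from the deck's edge key; their precise meaning is assumed."""
    YES_DIRECT = "yes_direct"
    YES_INFERRED = "yes_inferred"
    NO_DIRECT = "no_direct"
    NO_INFERRED = "no_inferred"

@dataclass(frozen=True)
class DrugTargetAnnotation:
    drug: str
    gene: str
    aberration: AberrationType
    evidence: Evidence
    reference: str  # pointer to the primary literature source

    @classmethod
    def from_raw(cls, drug, gene, aberration, evidence, reference):
        """Build a record from free text, rejecting terms outside the vocabulary."""
        if not reference:
            raise ValueError("an annotation needs a primary literature reference")
        return cls(
            drug=drug,
            gene=gene.upper(),
            aberration=AberrationType(aberration),  # ValueError if unknown
            evidence=Evidence(evidence),            # ValueError if unknown
            reference=reference,
        )

# Placeholder values only; not a real drug, gene, or citation.
record = DrugTargetAnnotation.from_raw(
    "DRUG_X", "gene_a", "SNV", "yes_direct", "REF_1"
)
print(record)
```

Keeping explicit and inferred relationships as separate vocabulary terms lets a downstream user filter for explicitly reported evidence while still retaining inferred links for further research.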
Example annotation workflow for
capturing drug to target information.
Visualization of Drug Target Network
Patient Specific Drug Target Network
Patient Genomic Information
[Figure: patient-specific drug target network with its legends.
Aberration Type Color Key: CNV, SNV, EXP, DRUG, Other
Edge Interaction Key: yes_direct, yes_inferred, no_direct, no_inferred
Aberration Type Color Key (second legend): DRUG, BIOMARKER, MODIFIER]
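
A patient-specific network of this kind can be thought of as the subset of curated drug-to-gene annotations whose gene and aberration type match an event found in the patient. The sketch below shows that matching step under stated assumptions: the annotation tuples, gene and drug names, and function names are all hypothetical placeholders, and the interaction labels are borrowed from the edge interaction key.

```python
from collections import defaultdict

# Hypothetical curated annotations: (drug, gene, aberration type, interaction).
ANNOTATIONS = [
    ("DRUG_X", "GENE_A", "SNV", "yes_direct"),
    ("DRUG_Y", "GENE_A", "SNV", "no_inferred"),
    ("DRUG_Z", "GENE_B", "CNV", "yes_inferred"),
    ("DRUG_W", "GENE_C", "EXP", "yes_direct"),
]

def build_patient_network(patient_events, annotations):
    """Keep annotation edges whose (gene, aberration type) occurs in the patient."""
    events = set(patient_events)
    return [
        (drug, gene, interaction)
        for drug, gene, aberration, interaction in annotations
        if (gene, aberration) in events
    ]

def drugs_by_interaction(edges):
    """Group matched (drug, gene) pairs by their interaction label."""
    grouped = defaultdict(list)
    for drug, gene, interaction in edges:
        grouped[interaction].append((drug, gene))
    return dict(grouped)

# Hypothetical patient genomic information: (gene, aberration type).
patient = [("GENE_A", "SNV"), ("GENE_B", "CNV")]
network = build_patient_network(patient, ANNOTATIONS)
print(drugs_by_interaction(network))
```

Matching on both gene and aberration type, rather than gene alone, keeps an annotation about one kind of event from being applied to a different kind of event in the same gene.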
Impact Areas for Text Mining
• Identify and extract interaction information for network and pathway
reconstruction.
• Aid in identifying and extracting genomic events linked to drug response to
better enable personalized medicine.
