Visual Exploration of Clinical and Genomic Data for Patient Stratification

Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014)

http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/

In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline.

http://stratomex.caleydo.org
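
The overlap-based retrieval of stratifications mentioned in the description can be illustrated with a small sketch. It ranks candidate patient sets by their Jaccard index against a displayed set. This is only an illustration under stated assumptions, not StratomeX's actual implementation (which is written in Java): the function names `jaccard_index` and `rank_by_overlap`, the patient IDs, and the gene labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard_index(a: Set[str], b: Set[str]) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_by_overlap(
    displayed: Set[str], candidates: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Score every candidate set against the displayed set, best match first."""
    scored = [
        (name, jaccard_index(displayed, members))
        for name, members in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one displayed cluster and three candidate patient sets.
displayed_cluster = {"p1", "p2", "p3", "p4"}
candidate_sets = {
    "GENE_A mutated": {"p1", "p2", "p3"},
    "GENE_B mutated": {"p4", "p5", "p6", "p7"},
    "GENE_C mutated": {"p8", "p9"},
}

ranking = rank_by_overlap(displayed_cluster, candidate_sets)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A ranked list of this kind is what a table view such as LineUp could present, so that the best-matching sets appear at the top.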

  1. 1. Visual Exploration of Clinical and Genomic Data for Patient Stratification NILS GEHLENBORG @nils_gehlenborg・http://www.gehlenborg.com Broad Institute of MIT and Harvard Cancer Program Harvard Medical School Center for Biomedical Informatics
  2. 2. Team Alexander Lex, Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA; Marc Streit, Johannes Kepler University, Linz, Austria; Christian Partl, Graz University of Technology, Graz, Austria; Sam Gratzl, Johannes Kepler University, Linz, Austria; Dieter Schmalstieg, Graz University of Technology, Graz, Austria; Hanspeter Pfister, Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA; Peter J Park, Harvard Medical School, Boston, MA, USA; Nils Gehlenborg, Harvard Medical School, Boston, MA, USA & Broad Institute, Cambridge, MA. Special thanks to the Broad Institute TCGA Genome Data Analysis Center Team, in particular Michael S Noble, Lynda Chin & Gaddy Getz
  3. 3. Funding Peter J Park: NIH/NCI The Cancer Genome Atlas; Nils Gehlenborg: NIH/NHGRI K99/R00 Pathway to Independence Award
  4. 4. ?
  5. 5. TCGA The Cancer Genome Atlas
  6. 6. 20+ cancer types × 500 patients
  7. 7. 10,000+ patients
  8. 8. mRNA expression microRNA expression DNA methylation protein expression copy number variants mutation calls clinical parameters
  9. 9. Stratome
  10. 10. Anthony92931 / Wikimedia Commons
  11. 11. Correlation with clusters based on other data types? Different outcomes? Mutations or copy number variants associated with clusters? Demographic differences?
  12. 12. Challenges How can we explore overlap of patient sets across stratifications? How can we compare properties of patient sets within a stratification? How can we discover “interesting” stratifications and pathways to consider? How can we handle terabytes of clinical and genomic data in visualization tools?
  13. 13. Problem 1: Comparing Patient Sets across Stratifications
  14. 14. Patients Stratifications
  15. 15. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  16. 16. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  17. 17. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  18. 18. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  19. 19. StratomeX (short for Stratome Explorer)
  20. 20. mRNA Copy Number Mutation del normal amp mut normal #1 #2 #3 #4
  21. 21. Select band
  22. 22. Select block
  23. 23. Compare clusterings: consensus NMF and hierarchical
  24. 24. Park columns
  25. 25. Compare clusterings: left cluster split
  26. 26. Compare clusterings: right cluster split
  27. 27. Compare clusterings: left cluster contained in right cluster
  28. 28. Problem 2: Comparing Patient Sets within Stratifications
  29. 29. Block Visualizations: Patient Properties Numerical Data Matrix Vector Matrix + (Pathway) Maps Categorical Data Scalar
  30. 30. Add KEGG glioma pathway and map mRNA transcript levels
  31. 31. Modify color mapping on the fly
  32. 32. View pathway detail (cluster 2)
  33. 33. Zoom into pathway detail (cluster 2): EGFR down-regulated
  34. 34. View pathway detail (cluster 3)
  35. 35. Zoom into pathway detail (cluster 3): EGFR up-regulated
  36. 36. Add copy number for EGFR
  37. 37. Add copy number for EGFR
  38. 38. Add survival stratified by TP53 mutation status
  39. 39. View detail of Kaplan-Meier plot based on TP53
  40. 40. ?
  41. 41. Knowledge-driven Exploration Data-driven Exploration
  42. 42. Problem 3: Finding “Interesting” Stratifications and Pathways
  43. 43. Is there a mutation that overlaps with this mRNA cluster? Is there a mutually exclusive mutation? Is there a CNV that affects survival? Is there a pathway that is enriched in this cluster? Query Stratifications Clinical Params Pathways
  44. 44. Guided Exploration Query Retrieve Visualize Stratifications Clinical Params Pathways
  45. 45. LineUp S Gratzl, A Lex, N Gehlenborg, H Pfister and M Streit, “LineUp: Visual Analysis of Multi-Attribute Rankings”, IEEE Transactions on Visualization and Computer Graphics 19:2277-2286 (2013)
  46. 46. Example: Clear Cell Renal Carcinoma (KIRC) Main TCGA Paper published in Nature in 2013. First goal here: Characterize mRNA clusters
  47. 47. View TCGA mRNA subtypes
  48. 48. Add MutSig q-values for mutations
  49. 49. Invert q-value mapping
  50. 50. Add filter to inverted q-value as cut-off
  51. 51. Query mutated genes
  52. 52. Queries Retrieve Stratifications: Sets with large overlap: Jaccard Index; Similar stratifications: Adjusted Rand Index; Survival: Log Rank Score (one vs rest). Retrieve Pathways: Gene Set Enrichment Score: original or PAGE (one vs rest)
  53. 53. Query mutated genes
  54. 54. Result of Jaccard Index query: preview PTEN
  55. 55. Query mutated genes
  56. 56. Query mutated genes
  57. 57. Query mutated genes with cluster m2
  58. 58. Result of Jaccard Index query: preview MTOR
  59. 59. Re-order columns
  60. 60. Add TCGA microRNA subtypes (direct insert mode)
  61. 61. Add TCGA microRNA subtypes (direct insert mode)
  62. 62. Observe large overlap between m1 and mi3
  63. 63. Observe large overlap between m3 and mi2
  64. 64. Query for copy number variation matching m3
  65. 65. Query only tumor suppressor genes (Vogelstein et al.)
  66. 66. Query only tumor suppressor genes (Vogelstein et al.)
  67. 67. Score only deletions
  68. 68. Score only deletions
  69. 69. Score only deletions
  70. 70. Score only deletions
  71. 71. Score only deletions
  72. 72. View CDKN2A copy number status and m3 and mi2 overlap
  73. 73. Add survival stratified by TCGA microRNA clusters
  74. 74. Find gene mutation that affects survival
  75. 75. Score only mutations
  76. 76. Score only mutations
  77. 77. Score only mutations
  78. 78. Score only mutations
  79. 79. View BAP1 mutation status and survival stratified by BAP1
  80. 80. View BAP1 mutation status and survival stratified by BAP1
  81. 81. View BAP1 mutation status and survival stratified by BAP1
  82. 82. Query for enriched pathway in TCGA mRNA cluster m4
  83. 83. Preview KEGG ribosome pathway overexpression in m4
  84. 84. Confirm selection
  85. 85. Change color mapping
  86. 86. View ribosome pathway detail for TCGA mRNA cluster m4
  87. 87. ?
  88. 88. Problem 4: Dealing with Terabytes of Cancer Genomics Data
  89. 89. TCGA Data Coordination Center Broad Institute Genome Data Analysis Center Standardized Data Sets Standardized Analyses Analysis Reports MSKCC cBio Portal TCGA Working Groups StratomeX ...
  90. 90. Standardized Data Sets Standardized Analyses Analysis Reports Data set versioning Format normalization Removal of redacted data . . . Mutation Analysis Copy Number Analysis Clustering Correlations Pathway Analysis . . .
  91. 91. 102 Standardized Data Sets Standardized Analyses Analysis Reports http://gdac.broadinstitute.org individual downloads and view reports firehose_get bulk download
  92. 92. 102 Standardized Data Sets Standardized Analyses Analysis Reports http://gdac.broadinstitute.org individual downloads and view reports firehose_get bulk download
  93. 93. Standardized Data Sets Standardized Analyses Analysis Reports + = one per Data Matrices Stratifications mRNA (array & sequencing) microRNA (array & sequencing) methylation reverse phase protein array clinical parameters clustering (CNMF & hierarchical) gene mutation status (binary) gene copy number status (5 class) Data Package tumor type
  94. 94. up to 24 data and result files from 18 Firehose archives up to 500 MB (190 MB compressed) Data Packages
  95. 95. Schroeder et al. Genome Medicine 2013, 5:9
  96. 96. Challenges How can we explore overlap of patient sets across stratifications? How can we compare properties of patient sets within a stratification? How can we discover “interesting” stratifications and pathways to consider? How can we handle terabytes of clinical and genomic data in visualization tools?
  97. 97. CALEYDO StratomeX is part of the Caleydo Visualization Framework Implemented in Java, uses OpenGL and Eclipse Rich Client Platform Binaries available for Linux, Windows, Mac OS X Requires Java 1.7 JRE or JDK (on Mac OS X) Open source licensed under BSD license Source code on GitHub
  98. 98. CALEYDO StratomeX http://stratomex.caleydo.org http://www.github.com/caleydo A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012) M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)
  99. 99. Plans: Where to go from here?
  100. 100. Domino S Gratzl, N Gehlenborg, A Lex, H Pfister and M Streit, “Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets”, IEEE Transactions on Visualization and Computer Graphics (2014)
  101. 101. INTEGRATION
  102. 102. INTEGRATION INTEGRATION
  103. 103. INTEGRATION Horizontal Integration across Data Types Biological Insight
  104. 104. Vertical Integration across Data Levels Confirmation & Troubleshooting INTEGRATION
  105. 105. Refinery Platform
  106. 106. Refinery Platform Data repository based on ISA-Tab for reproducible research; Workflow execution in Galaxy; Integrated visualization tools with access to provenance http://www.refinery-platform.org
  107. 107. CALEYDO StratomeX http://stratomex.caleydo.org http://www.github.com/caleydo A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012) M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)
  108. 108. Execute Logrank Test query Select displayed set Execute Jaccard Index query Select displayed stratification Execute Adjusted Rand Index query Open Query Wizard Open Query Wizard Select pathway Select displayed set Add other data Execute GSEA query Select displayed stratification Select displayed stratification Select clinical param. in LineUp view Manually Execute Logrank Test query Execute PAGE query Select stratification Select stratification Select stratification Select stratification Select stratification in LineUp view Select pathway Select pathway Select pathway Select clinical param. in LineUp view Add stratification Based on Logrank Test score (survival) Based on similarity to displayed stratification Based on overlap with displayed set Add pathway Stratify with displayed stratification Find based on differential expression in displayed set Stratify with displayed stratification Display unstratified Add pathway Based on Logrank Test score (survival) Add other data Add independent column Add dependent column Add independent column to existing one Manually Based on GSEA Based on PAGE Open Query Wizard Select clinical param. in LineUp view in LineUp view in LineUp view in LineUp view in LineUp view in LineUp view in LineUp view in LineUp view in LineUp view
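
The transcript names the Adjusted Rand Index as the score for retrieving stratifications similar to the displayed one. The sketch below computes that index for two stratifications, each given as a mapping from patient ID to cluster label. It uses the standard pair-counting formula and is not taken from the StratomeX source code; the function name `adjusted_rand_index`, the patient IDs, and the toy cluster labels are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict


def adjusted_rand_index(labels_a: Dict[str, str], labels_b: Dict[str, str]) -> float:
    """Adjusted Rand Index over the patients present in both stratifications."""
    shared = sorted(set(labels_a) & set(labels_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared patients")

    # Contingency counts: how many patients fall into each pair of clusters.
    pair_counts = Counter((labels_a[p], labels_b[p]) for p in shared)
    a_counts = Counter(labels_a[p] for p in shared)
    b_counts = Counter(labels_b[p] for p in shared)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2          # agreement of identical partitions
    if maximum == expected:
        # Both partitions are trivial and identical (one cluster, or all singletons).
        return 1.0
    return (sum_pairs - expected) / (maximum - expected)


# Hypothetical toy stratifications of six patients.
mrna = {"p1": "m1", "p2": "m1", "p3": "m1", "p4": "m2", "p5": "m2", "p6": "m2"}
mirna = {"p1": "mi1", "p2": "mi1", "p3": "mi2", "p4": "mi2", "p5": "mi2", "p6": "mi2"}

print(f"ARI(mRNA, microRNA) = {adjusted_rand_index(mrna, mirna):.3f}")
```

The index is 1.0 when two stratifications group patients identically, regardless of how the clusters are named, and is close to 0 when the agreement is what chance alone would produce.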