Visual Exploration of Clinical and Genomic Data for Patient Stratification

Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014)

http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/

In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline.

http://stratomex.caleydo.org

Published in: Science

  1. 1. Visual Exploration of Clinical and Genomic Data for Patient Stratification. Nils Gehlenborg (@nils_gehlenborg, http://www.gehlenborg.com). Broad Institute of MIT and Harvard, Cancer Program; Harvard Medical School, Center for Biomedical Informatics
  2. 2. Team: Alexander Lex (Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA); Marc Streit (Johannes Kepler University, Linz, Austria); Christian Partl (Graz University of Technology, Graz, Austria); Sam Gratzl (Johannes Kepler University, Linz, Austria); Dieter Schmalstieg (Graz University of Technology, Graz, Austria); Hanspeter Pfister (Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA); Peter J Park (Harvard Medical School, Boston, MA, USA); Nils Gehlenborg (Harvard Medical School, Boston, MA, USA & Broad Institute, Cambridge, MA). Special thanks to the Broad Institute TCGA Genome Data Analysis Center team, in particular Michael S Noble, Lynda Chin & Gaddy Getz
  3. 3. Funding: Peter J Park (NIH/NCI, The Cancer Genome Atlas); Nils Gehlenborg (NIH/NHGRI K99/R00 Pathway to Independence Award)
  4. 4. ?
  5. 5. TCGA The Cancer Genome Atlas
  6. 6. 20+ cancer types × 500 patients
  7. 7. 10,000+ patients
  8. 8. mRNA expression microRNA expression DNA methylation protein expression copy number variants mutation calls clinical parameters
  9. 9. Stratome
  10. 10. Anthony92931 / Wikimedia Commons
  11. 11. Correlation with clusters based on other data types? Different outcomes? Mutations or copy number variants associated with clusters? Demographic differences?
  12. 12. Challenges: How can we explore overlap of patient sets across stratifications? How can we compare properties of patient sets within a stratification? How can we discover “interesting” stratifications and pathways to consider? How can we handle terabytes of clinical and genomic data in visualization tools?
  13. 13. Problem 1: Comparing Patient Sets across Stratifications
  14. 14. Patients Stratifications
  15. 15. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  16. 16. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  17. 17. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  18. 18. mRNA Copy Number gene X Mutation gene Y del normal amp mut normal #1 #2 #3 #4
  19. 19. StratomeX (short for Stratome Explorer)
  20. 20. mRNA Copy Number Mutation del normal amp mut normal #1 #2 #3 #4
  21. 21. Select band
  22. 22. Select block
  23. 23. Compare clusterings: consensus NMF and hierarchical
  24. 24. Park columns
  25. 25. Compare clusterings: left cluster split
  26. 26. Compare clusterings: right cluster split
  27. 27. Compare clusterings: left cluster contained in right cluster
  28. 28. Problem 2: Comparing Patient Sets within Stratifications
  29. 29. Block Visualizations: Patient Properties Numerical Data Matrix Vector Matrix + (Pathway) Maps Categorical Data Scalar
  30. 30. Add KEGG glioma pathway and map mRNA transcript levels
  31. 31. Modify color mapping on the fly
  32. 32. View pathway detail (cluster 2)
  33. 33. Zoom into pathway detail (cluster 2): EGFR down-regulated
  34. 34. View pathway detail (cluster 3)
  35. 35. Zoom into pathway detail (cluster 3): EGFR up-regulated
  36. 36. Add copy number for EGFR
  37. 37. Add copy number for EGFR
  38. 38. Add survival stratified by TP53 mutation status
  39. 39. View detail of Kaplan-Meier plot based on TP53
  40. 40. ?
  41. 41. Knowledge-driven Exploration Data-driven Exploration
  42. 42. Problem 3: Finding “Interesting” Stratifications and Pathways
  43. 43. Is there a mutation that overlaps with this mRNA cluster? Is there a mutually exclusive mutation? Is there a CNV that affects survival? Is there a pathway that is enriched in this cluster? Query Stratifications Clinical Params Pathways
  44. 44. Guided Exploration Query Retrieve Visualize Stratifications Clinical Params Pathways
  45. 45. LineUp: S Gratzl, A Lex, N Gehlenborg, H Pfister and M Streit, “LineUp: Visual Analysis of Multi-Attribute Rankings”, IEEE Transactions on Visualization and Computer Graphics 19:2277-2286 (2013)
  46. 46. Example: Clear Cell Renal Carcinoma (KIRC). Main TCGA paper published in Nature in 2013. First goal here: characterize mRNA clusters
  47. 47. View TCGA mRNA subtypes
  48. 48. Add MutSig q-values for mutations
  49. 49. Invert q-value mapping
  50. 50. Add filter to inverted q-value as cut-off
  51. 51. Query mutated genes
  52. 52. Queries. Retrieve Stratifications: sets with large overlap (Jaccard Index); similar stratifications (Adjusted Rand Index); survival (Log Rank Score, one vs rest). Retrieve Pathways: Gene Set Enrichment Score (original or PAGE, one vs rest)
  53. 53. Query mutated genes
  54. 54. Result of Jaccard Index query: preview PTEN
  55. 55. Query mutated genes
  56. 56. Query mutated genes
  57. 57. Query mutated genes with cluster m2
  58. 58. Result of Jaccard Index query: preview MTOR
  59. 59. Re-order columns
  60. 60. Add TCGA microRNA subtypes (direct insert mode)
  61. 61. Add TCGA microRNA subtypes (direct insert mode)
  62. 62. Observe large overlap between m1 and mi3
  63. 63. Observe large overlap between m3 and mi2
  64. 64. Query for copy number variation matching m3
  65. 65. Query only tumor suppressor genes (Vogelstein et al.)
  66. 66. Query only tumor suppressor genes (Vogelstein et al.)
  67. 67. Score only deletions
  68. 68. Score only deletions
  69. 69. Score only deletions
  70. 70. Score only deletions
  71. 71. Score only deletions
  72. 72. View CDKN2A copy number status and m3 and mi2 overlap
  73. 73. Add survival stratified by TCGA microRNA clusters
  74. 74. Find gene mutation that affects survival
  75. 75. Score only mutations
  76. 76. Score only mutations
  77. 77. Score only mutations
  78. 78. Score only mutations
  79. 79. View BAP1 mutation status and survival stratified by BAP1
  80. 80. View BAP1 mutation status and survival stratified by BAP1
  81. 81. View BAP1 mutation status and survival stratified by BAP1
  82. 82. Query for enriched pathway in TCGA mRNA cluster m4
  83. 83. Preview KEGG ribosome pathway overexpression in m4
  84. 84. Confirm selection
  85. 85. Change color mapping
  86. 86. View ribosome pathway detail for TCGA mRNA cluster m4
  87. 87. ?
  88. 88. Problem 4: Dealing with Terabytes of Cancer Genomics Data
  89. 89. TCGA Data Coordination Center Broad Institute Genome Data Analysis Center Standardized Data Sets Standardized Analyses Analysis Reports MSKCC cBio Portal TCGA Working Groups StratomeX ...
  90. 90. Standardized Data Sets: data set versioning, format normalization, removal of redacted data, ... Standardized Analyses: mutation analysis, copy number analysis, clustering, correlations, pathway analysis, ... Analysis Reports
  91. 91. 102 Standardized Data Sets Standardized Analyses Analysis Reports http://gdac.broadinstitute.org individual downloads and view reports firehose_get bulk download
  92. 92. 102 Standardized Data Sets Standardized Analyses Analysis Reports http://gdac.broadinstitute.org individual downloads and view reports firehose_get bulk download
  93. 93. Standardized Data Sets, Standardized Analyses, and Analysis Reports, combined into one Data Package per tumor type. Data Matrices: mRNA (array & sequencing), microRNA (array & sequencing), methylation, reverse phase protein array, clinical parameters. Stratifications: clustering (CNMF & hierarchical), gene mutation status (binary), gene copy number status (5 class)
  94. 94. Data Packages: up to 24 data and result files from 18 Firehose archives; up to 500 MB (190 MB compressed)
  95. 95. Schroeder et al. Genome Medicine 2013, 5:9
  96. 96. Challenges: How can we explore overlap of patient sets across stratifications? How can we compare properties of patient sets within a stratification? How can we discover “interesting” stratifications and pathways to consider? How can we handle terabytes of clinical and genomic data in visualization tools?
  97. 97. CALEYDO: StratomeX is part of the Caleydo Visualization Framework. Implemented in Java; uses OpenGL and the Eclipse Rich Client Platform. Binaries available for Linux, Windows, and Mac OS X. Requires Java 1.7 JRE or JDK (on Mac OS X). Open source, licensed under the BSD license. Source code on GitHub
  98. 98. CALEYDO StratomeX: http://stratomex.caleydo.org, http://www.github.com/caleydo. References: A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012); M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)
  99. 99. Plans: Where to go from here?
  100. 100. Domino: S Gratzl, N Gehlenborg, A Lex, H Pfister and M Streit, “Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets”, IEEE Transactions on Visualization and Computer Graphics (2014)
  101. 101. INTEGRATION
  102. 102. INTEGRATION INTEGRATION
  103. 103. INTEGRATION Horizontal Integration across Data Types Biological Insight
  104. 104. Vertical Integration across Data Levels Confirmation & Troubleshooting INTEGRATION
  105. 105. Refinery Platform
  106. 106. Refinery Platform: data repository based on ISA-Tab for reproducible research; workflow execution in Galaxy; integrated visualization tools with access to provenance. http://www.refinery-platform.org
  107. 107. CALEYDO StratomeX: http://stratomex.caleydo.org, http://www.github.com/caleydo. References: A Lex, M Streit, H-J Schulz, C Partl, D Schmalstieg, PJ Park, N Gehlenborg, “StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization”, Computer Graphics Forum (EuroVis '12), 31:1175-1184 (2012); M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, PJ Park, N Gehlenborg, “Guided Visual Exploration of Genomic Stratifications in Cancer”, Nature Methods 11:884–885 (2014)
  108. 108. [Flowchart of the query wizard; node labels only] Open Query Wizard. Add stratification: manually; based on overlap with displayed set; based on similarity to displayed stratification; based on Logrank Test score (survival). Add pathway: manually; based on GSEA; based on PAGE; find based on differential expression in displayed set. Add other data: display unstratified; stratify with displayed stratification. Column options: add independent column; add dependent column; add independent column to existing one. Steps: select displayed set; select displayed stratification; select stratification, pathway, or clinical param. in LineUp view; execute Jaccard Index, Adjusted Rand Index, Logrank Test, GSEA, or PAGE query
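
The two stratification queries named on slide 52 (Jaccard Index for sets with large overlap, Adjusted Rand Index for similar stratifications) can be illustrated with a small sketch. This is not StratomeX code (the tool itself is implemented in Java); it is a minimal Python illustration of the two scores under stated assumptions. All function names, patient IDs, and gene names below are hypothetical.

```python
from collections import Counter
from math import comb


def jaccard_index(set_a, set_b):
    """Overlap of two patient sets: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_overlap(target_set, candidate_sets):
    """Rank named candidate sets by Jaccard index with the target set, best first."""
    scores = {name: jaccard_index(target_set, members)
              for name, members in candidate_sets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two stratifications of the same patients.

    Each argument maps patient ID -> group label; only patients present in
    both stratifications are compared.
    """
    patients = sorted(set(labels_a) & set(labels_b))
    total_pairs = comb(len(patients), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two shared patients: nothing to disagree on

    # Contingency counts: joint groups, and the groups of each stratification.
    joint = Counter((labels_a[p], labels_b[p]) for p in patients)
    rows = Counter(labels_a[p] for p in patients)
    cols = Counter(labels_b[p] for p in patients)

    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both stratifications have one group
    return (sum_joint - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical data: one mRNA cluster and three mutated-gene patient sets.
    cluster_m2 = {"P01", "P02", "P03", "P04"}
    mutated = {
        "GENE_A": {"P01", "P02", "P03"},
        "GENE_B": {"P04", "P05", "P06"},
        "GENE_C": {"P07"},
    }
    for gene, score in rank_by_overlap(cluster_m2, mutated):
        print(f"{gene}: {score:.2f}")

    # Two hypothetical stratifications that group the patients identically.
    mrna = {"P01": "m1", "P02": "m1", "P03": "m2", "P04": "m2"}
    mirna = {"P01": "mi3", "P02": "mi3", "P03": "mi2", "P04": "mi2"}
    print(adjusted_rand_index(mrna, mirna))
```

The Jaccard score compares one displayed patient set against each candidate set, which fits a ranking query such as "query mutated genes with cluster m2". The Adjusted Rand Index compares two whole stratifications and does not depend on how the groups are labeled.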
