Nils Gehlenborg, PhD
Department of Biomedical Informatics
Harvard Medical School
@ngehlenborg
+ = ?
My Time at EMBL-EBI
Visualization and Exploration of
Transcriptomics Data
REPORT
experiment
DATA
INSIGHT HYPOTHESIS
interpretation
HYPOTHESIS
hypothesis
generation
EXPLORATION
EXPLANATION
“Storytelling”
“Pattern Discovery”
HYPOTHESIS-DRIVEN DISCOVERY
REPORT
experiment
DATA
INSIGHT HYPOTHESIS
interpretation
DATA
hypothesis
generation
EXPLORATION
EXPLANATION
“Storytelling”
“Pattern Discovery”
DATA-DRIVEN DISCOVERY
Database
Database Data Set
?
Shoe salesperson - an expert who finds the shoes you are looking for
Display rack - a means to display the content of the shoe boxes
?
Space
Maps
Multi-Resolution Glyphs
Gehlenborg and Brazma, BMC Bioinformatics, 2009
Space
Maps
Real World Example
low high
Human Gene Expression Map
SNAP-25
Level 1: 1 node
Level 2: 4 nodes
Level 3: 15 nodes
Level 4: 371 nodes
Level 5: 5,372 nodes
Gehlenborg and Brazma, BMC Bioinformatics, 2009
Space
Maps
Arrangement of Glyphs
Grid Projection
NeRV
Knowledge-driven
Data-driven
Data-driven
Gehlenborg and Brazma, BMC Bioinformatics, 2009
REX Topic Model
Blei et al. (2003); Caldas, Gehlenborg et al. (2009)
[Diagram: infer a topic model.]
comparison (C): list of gene sets w/ counts (GS), with a distribution over topics (T)
topics (T): distributions over gene sets (GS)
(for text documents) document (D): list of words (W)
example topic “Car”: driver, engine, driving, tires, mileage
Caldas, Gehlenborg, et al., Bioinformatics, 2009
Retrieval of
Relevant
Experiments
REX
Caldas, Gehlenborg, et al., Bioinformatics, 2009
Retrieval of
Relevant
Experiments
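The retrieval idea behind the topic model can be illustrated with a minimal sketch. It assumes a model has already been inferred, so that each comparison has a distribution over topics and each topic a distribution over gene sets. All names and numbers are hypothetical toy values, and ranking by query likelihood is only one plausible relevance score, not necessarily the one used in REX.

```python
import math

# Hypothetical, already-inferred model: each topic is a distribution over gene sets.
topics = {
    "T1": {"GS_a": 0.7, "GS_b": 0.2, "GS_c": 0.1},
    "T2": {"GS_a": 0.1, "GS_b": 0.2, "GS_c": 0.7},
}

# Hypothetical comparisons, each described by a distribution over topics.
comparisons = {
    "C1": {"T1": 0.9, "T2": 0.1},
    "C2": {"T1": 0.5, "T2": 0.5},
    "C3": {"T1": 0.1, "T2": 0.9},
}


def gene_set_probability(topic_mix, gene_set):
    """Probability of a gene set under a comparison's mixture of topics."""
    return sum(weight * topics[t][gene_set] for t, weight in topic_mix.items())


def query_log_likelihood(topic_mix, query_counts):
    """Log-likelihood of the query's gene set counts under one comparison."""
    return sum(
        count * math.log(gene_set_probability(topic_mix, gene_set))
        for gene_set, count in query_counts.items()
    )


def rank_comparisons(query_counts):
    """Return comparison names, most relevant (highest likelihood) first."""
    scored = [
        (query_log_likelihood(mix, query_counts), name)
        for name, mix in comparisons.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# A query is itself a list of gene sets with counts.
query = {"GS_a": 5, "GS_b": 1}
ranking = rank_comparisons(query)
print(ranking)
```

The query is dominated by GS_a, so comparisons that put most of their weight on the topic favoring GS_a rank first.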
Moving on to
Harvard Medical School
The Cancer Genome Atlas
10,000+ patients
20+ tumor types
microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
mRNA expression
Tumor Subtypes
mRNA expression clustering: C1, C2, C3, C4
patient survival time: LONGER, TYPICAL, SHORTER
mutation status of gene Y: WILDTYPE, MUT
StratomeX
PROBLEM 1
Visualize overlap of patient sets across two or more stratifications.
PROBLEM 2
Visualize characteristics of patient sets within a stratification of interest.
PROBLEM 3
Identify relevant stratifications, pathways, and clinical variables.
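As a rough illustration of Problem 1, the overlap between two stratifications can be computed as a table of shared patient counts. This is a minimal sketch with hypothetical patient IDs and group labels. It shows only the counting step that an overlap visualization would build on, not how StratomeX itself is implemented.

```python
from collections import Counter

# Hypothetical toy data: patient -> group, one dict per stratification.
mrna_cluster = {"p1": "C1", "p2": "C1", "p3": "C2", "p4": "C2", "p5": "C3"}
gene_y_status = {
    "p1": "MUT",
    "p2": "MUT",
    "p3": "WILDTYPE",
    "p4": "MUT",
    "p5": "WILDTYPE",
}


def overlap_counts(strat_a, strat_b):
    """Count patients shared by each pair of groups across two stratifications."""
    shared_patients = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared_patients)


overlaps = overlap_counts(mrna_cluster, gene_y_status)
for (cluster, status), n_patients in sorted(overlaps.items()):
    print(f"{cluster} / {status}: {n_patients} patient(s)")
```

Only patients present in both stratifications are counted, which matters when data types cover different subsets of a cohort.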
Query
Stratifications
Clinical Params
Pathways
Guided Exploration
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, P Park, N Gehlenborg, Nature Methods, 2014
Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
Query
Rank
Visualize
Stratifications
Clinical Params
Pathways
Guided Exploration
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, P Park, N Gehlenborg, Nature Methods, 2014
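The query and rank steps can be sketched for a question such as "Is there a mutation that overlaps with this mRNA cluster?". The sketch below assumes the Jaccard index as the overlap score. The patient sets and gene names are hypothetical, and other questions (survival, pathway enrichment) would need different scores.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Query: the patients in the mRNA cluster of interest (hypothetical IDs).
query_cluster = {"p1", "p2", "p3", "p4"}

# Candidates: patients carrying a mutation in each gene (hypothetical).
mutated_patients = {
    "gene_A": {"p1", "p2", "p3"},
    "gene_B": {"p4", "p5", "p6", "p7"},
    "gene_C": {"p1", "p2", "p3", "p4", "p5"},
}


def rank_by_overlap(query, candidates):
    """Score every candidate set against the query and sort best-first."""
    scores = {name: jaccard(query, members) for name, members in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_overlap(query_cluster, mutated_patients)
for gene, score in ranked:
    print(f"{gene}: {score:.2f}")
```

The top-ranked candidates are the ones worth visualizing next to the query cluster.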
Dekker et al., Nature, 2017
Genome Interaction Data
PROBLEM 1
Visualize a matrix of 3 million x 3 million cells across multiple scales
PROBLEM 2
Support comparison of many different conditions
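One common way to approach Problem 1 is to precompute the matrix at a series of coarser resolutions and load only the level that matches the current zoom. The sketch below assumes each level halves the resolution by summing 2x2 blocks, and assumes a tile width of 256 bins. Both are illustrative choices rather than a description of any particular tool.

```python
def coarsening_steps(n_bins, tile_size=256):
    """Number of 2x coarsening steps until one axis fits in a single tile."""
    steps = 0
    while n_bins > tile_size:
        n_bins = (n_bins + 1) // 2  # halve, rounding up
        steps += 1
    return steps


def coarsen(matrix):
    """Halve the resolution by summing each 2x2 block.

    Blocks on an odd edge simply contain fewer cells.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0] * ((n_cols + 1) // 2) for _ in range((n_rows + 1) // 2)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            out[i // 2][j // 2] += value
    return out


# Toy 4 x 4 matrix standing in for a much larger interaction matrix.
full_resolution = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
half_resolution = coarsen(full_resolution)
overview = coarsen(half_resolution)

steps_for_3_million = coarsening_steps(3_000_000)
print(half_resolution, overview, steps_for_3_million)
```

Under these assumptions, an axis of 3 million bins needs 14 halvings before it fits in one 256-bin tile, so the number of levels grows only logarithmically with matrix size.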
Kerpedjiev et al., bioRxiv, 2018; http://higlass.io/app/?config=TKXaqsSIRvGEcw2dAUQvxg
Kerpedjiev et al., bioRxiv, 2018; Schwarzer et al., Nature, 2017
http://higlass.io/app/?config=dyE970c4TH21onnRvT1PmQ
Lekschas et al., Transactions on Visualization and Computer Graphics, 2018
Beyond the Lab
EDUCATION
Nature Methods Points of View;
courses and workshops
OPEN SCIENCE
Preprints for 4D Nucleome
Consortium
COMMUNITY BUILDING
VIZBI, BioVis, and other meetings
A Tale of Great Perseverance
Thank You!
EMBL Alumni Association
Kay Nieselt
Alvis Brazma
Nicholas Luscombe
Gos Micklem
Lars Steinmetz
Peter J Park
Isaac Kohane
Alexander Lex
Marc Streit
Hanspeter Pfister
Peter Kerpedjiev
Fritz Lekschas
Sabrina Nusrat
Jennifer Marx
Scott Ouellette
Chuck McCallum
Theresa Harbig
Danielle Nguyen
Jake Conway
Undina Gisladottir
Maureen
Ruby and Clara
EMBL John Kendrew Award Lecture 2018
