Visual Analytics in Omics - why, what, how?

  1. Visual Analytics in omics - why, what, how?
      Prof Jan Aerts
      STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium
      Data Visualization Lab
      jan.aerts@esat.kuleuven.be
      jan@datavislab.org
      creativecommons.org/licenses/by-nc/3.0/
  2. • What problem are we trying to solve?
      • What is Visual Analytics and how can it help?
      • How do we actually do this?
      • Some examples
      • Challenges
  3. A. What’s the problem?
  4. hypothesis-driven -> data-driven
      Scientific Research Paradigms (Jim Gray, Microsoft)
      • 1st (1,000s of years ago): empirical
      • 2nd (100s of years ago): theoretical
      • 3rd (last few decades): computational
      • 4th (today): data exploration
      I have a hypothesis -> need to generate data to (dis)prove it.
      I have data -> need to find hypotheses that I can test.
  5. What does this mean?
      • immense re-use of existing datasets
      • biologically interesting signals may be too poorly understood to be analyzed in an automated fashion
      • much of the initial analysis is exploratory in nature => what’s my hypothesis? => searching for unknown unknowns
      • automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)
  6. For domain expert: what’s my hypothesis?
      (figure: Martin Krzywinski)
  7. For developer and domain expert: opening the black box
      (diagram labels: input; filter 1; filter 2; filter 3; output A; output B; output C)
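The black-box diagram on slide 7 shows data passing through a chain of filters before producing outputs. A minimal sketch of what "opening" that box could mean in code, assuming each filter is a plain function: the pipeline keeps every intermediate result, so a domain expert can see how many records each step removed. The filter names, thresholds and toy variant records are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Filter = Callable[[List[Record]], List[Record]]


def run_transparent(data: List[Record], steps: List[Tuple[str, Filter]]):
    """Apply filters in order, keeping every intermediate result for inspection."""
    trace = [("input", data)]
    current = data
    for name, step in steps:
        current = step(current)
        trace.append((name, current))
    return current, trace


# Hypothetical filters and thresholds on toy variant calls
def min_quality(rows: List[Record]) -> List[Record]:
    return [r for r in rows if r["qual"] >= 30]


def min_depth(rows: List[Record]) -> List[Record]:
    return [r for r in rows if r["depth"] >= 10]


def rare_only(rows: List[Record]) -> List[Record]:
    return [r for r in rows if r["freq"] < 0.01]


variants = [
    {"qual": 50, "depth": 20, "freq": 0.001},
    {"qual": 20, "depth": 40, "freq": 0.002},
    {"qual": 60, "depth": 5, "freq": 0.0005},
    {"qual": 45, "depth": 15, "freq": 0.2},
]

result, trace = run_transparent(
    variants,
    [("filter 1: quality", min_quality),
     ("filter 2: depth", min_depth),
     ("filter 3: rarity", rare_only)],
)
for name, rows in trace:
    print(f"{name}: n={len(rows)}")
```

The trace (4, 3, 2, 1 records here) is exactly what a visual front end could draw, for example as a funnel, instead of only showing the final output.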
  8. B. What is Visual Analytics and how can it help?
  9. Our research interest: visual design + interaction design + backend
  10. What is visualization?
      (figure labels: visualization of simulations; in situ visualization; of real-world structures)
  11. What is visualization? (T. Munzner)
  12. What is visualization?
      cognition <=> perception
      cognitive task => perceptual task
      (T. Munzner)
  13. Why do we visualize data?
      • record information
         • blueprints, photographs, seismographs, ...
      • analyze data to support reasoning
         • develop & assess hypotheses
         • discover errors in data
         • expand memory
         • find patterns (see Snow’s cholera map)
      • communicate information
         • share & persuade
         • collaborate & revise
  14. Sedlmair et al. IEEE Transactions on Visualization and Computer Graphics (2012)
  15. The strength of visualization
  16. pictorial superiority effect
      (figure labels: “information” 72hr; “informa” 65%; “i” 10%)
  17. Stevens’ psychophysical law = proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
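Stevens’ law is usually written as a power function, psi = k * I^a, where I is the stimulus magnitude, psi the perceived magnitude, k a scaling constant and a an exponent that depends on the sensory channel. The sketch below assumes the commonly cited approximate exponents of about 1.0 for length and 0.7 for area (published values vary between studies) to show why a doubled area does not look doubled.

```python
def perceived(magnitude: float, exponent: float, k: float = 1.0) -> float:
    """Stevens' power law: perceived magnitude psi = k * I ** a."""
    return k * magnitude ** exponent


# Approximate exponents (assumption: commonly cited values, which vary between studies)
EXPONENTS = {"length": 1.0, "area": 0.7}

for channel, a in EXPONENTS.items():
    # k cancels out in the ratio, so only the exponent matters
    ratio = perceived(2.0, a) / perceived(1.0, a)
    print(f"{channel}: doubling the stimulus is perceived as x{ratio:.2f}")
```

With these exponents a doubled length reads as twice as large, while a doubled area reads as only about 1.6 times as large (2^0.7 ≈ 1.62), one reason why length and position are preferred over area for quantitative values.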
  18. Accuracy of quantitative perceptual tasks
      how much (quantitative)
      what/where (qualitative)
      Mackinlay
  19. Accuracy of quantitative perceptual tasks
      how much (quantitative)
      what/where (qualitative)
      Mackinlay
  20. Accuracy of quantitative perceptual tasks
      how much (quantitative)
      what/where (qualitative)
      “power of the plane”
      Mackinlay
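A channel ranking like the one on slides 18-20 can drive encoding choices automatically, in the spirit of Mackinlay’s presentation-design work: give the most important data field the most accurate channel still available. This is a minimal greedy sketch; the channel order is the quantitative ranking as it is commonly reproduced from Mackinlay (1986) and is an assumption, not a transcription of the slide, and the field names are hypothetical.

```python
# Ranking for quantitative data as commonly reproduced from Mackinlay (1986);
# an assumption here, not a transcription of the slide. Truncated after color hue.
QUANTITATIVE_RANKING = [
    "position", "length", "angle", "slope", "area",
    "volume", "density", "color saturation", "color hue",
]


def assign_channels(fields, taken=(), ranking=QUANTITATIVE_RANKING):
    """Give each field, most important first, the most accurate channel still free."""
    free = [channel for channel in ranking if channel not in taken]
    if len(fields) > len(free):
        raise ValueError("more fields than free channels")
    return dict(zip(fields, free))


# Hypothetical example: position is already used for the genomic coordinate
print(assign_channels(["fold_change", "read_depth"], taken={"position"}))
```

A real design tool would also account for data type (ordinal and nominal data have different rankings) and for channels such as position that can be used more than once (x and y).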
  21. Pre-attentive vision = ability of the low-level human visual system to rapidly identify certain basic visual properties
      • some features “pop out”
      • used for:
         • target detection
         • boundary detection
         • counting/estimation
         • ...
      • the visual system takes over => all cognitive power is available for interpreting the figure, rather than part of it being needed to process the figure
  22. (image only)
  23. (image only)
  24. Limitations of pre-attentive vision
      1. Combining pre-attentive features does not always work => one would need to resort to “serial search” (most channel pairs; all channel triplets), e.g. is there a red square in this picture?
      2. Speed depends on which channel is used (use one that is good for categorical data)
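The red-square question on slide 24 is a conjunction search. A small sketch with a hypothetical display of colored shapes shows the underlying problem: when the target shares its color with some distractors and its shape with others, neither feature on its own isolates it, so the viewer has to fall back on checking items one by one.

```python
# Hypothetical display: one red square among red circles and blue squares
display = [("red", "circle")] * 6 + [("blue", "square")] * 6 + [("red", "square")]
target = ("red", "square")

# A single pre-attentive feature (color OR shape) flags many candidates ...
same_color = sum(1 for color, _ in display if color == target[0])
same_shape = sum(1 for _, shape in display if shape == target[1])
# ... only the conjunction of both features singles out the target
same_both = sum(1 for item in display if item == target)

print(same_color, same_shape, same_both)
```

Here color alone and shape alone each leave 7 candidates, and only the combination leaves 1; making the target unique on one channel (e.g. the only red item) would restore pop-out.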
  25. Gestalt laws - interplay between parts and the whole
  26. Gestalt laws - interplay between parts and the whole
      • simplicity
      • familiarity
      • proximity
      • symmetry
      • similarity
      • connectedness
      • good continuation
      • common fate
  27. Bret Victor - Ladder of Abstraction
  28. For domain expert: what’s my hypothesis?
      (figure: Martin Krzywinski)
  29. (figure: Martin Krzywinski)
  30. (figure: Martin Krzywinski)
  31. For developer and domain expert: opening the black box
      (diagram labels: input; filter 1; filter 2; filter 3; output A; output B; output C)
  32. (figure labels: B; A; C)
  33. (figure labels: B; A; C)
  34. (figure labels: B; A; C)
  35. C. How do we actually do this?
  36. Talking to domain experts
  37. Data visualization framework
  38. Card sorting
  39. Tools of the trade
  40. Processing - http://processing.org
      • Java
  41. D3 - http://d3js.org/
      • JavaScript
  42. Vega - https://github.com/trifacta/vega/wiki
      • HTML + JSON
  43. D. Examples
      • Data exploration
      • Data filtering
      • User-guided analysis
  44. Data exploration
      HiTSee - Bertini E et al. IEEE Symposium on Biological Data Visualization (2011)
  45. Aracari - Bartlett C et al. BMC Bioinformatics (2012)
      Ryo Sakai
  46. Reveal - Jäger G et al. Bioinformatics (2012)
  47. Meander - Pavlopoulos et al. Nucl Acids Res (2013)
      Georgios Pavlopoulos
  48. ParCoord: Endeavour gene prioritization - Boogaerts T et al. IEEE International Conference on Bioinformatics & Bioengineering (2012)
      Thomas Boogaerts
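ParCoord draws each gene as a polyline across parallel axes, one axis per score. A minimal sketch of the core layout step, assuming every axis is min-max scaled independently and the axes are evenly spaced (the slide does not say how ParCoord scales its axes): each row becomes the list of (x, y) vertices of its polyline. Gene names and scores are hypothetical.

```python
from typing import Dict, List, Tuple


def parallel_coordinates(rows: Dict[str, List[float]],
                         width: float = 1.0) -> Dict[str, List[Tuple[float, float]]]:
    """Map each row to polyline vertices (x, y); each axis is min-max scaled to [0, 1]."""
    n_axes = len(next(iter(rows.values())))
    columns = list(zip(*rows.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    step = width / (n_axes - 1) if n_axes > 1 else 0.0  # evenly spaced axes
    polylines = {}
    for name, values in rows.items():
        points = []
        for i, value in enumerate(values):
            span = highs[i] - lows[i]
            y = (value - lows[i]) / span if span else 0.5  # constant axis: draw in the middle
            points.append((i * step, y))
        polylines[name] = points
    return polylines


# Hypothetical per-gene scores, one list entry per axis
scores = {
    "geneA": [0.9, 12.0, 3.0],
    "geneB": [0.1, 4.0, 5.0],
    "geneC": [0.5, 8.0, 4.0],
}
for gene, line in parallel_coordinates(scores).items():
    print(gene, [(round(x, 2), round(y, 2)) for x, y in line])
```

Per-axis scaling keeps scores with very different units comparable; the price is that absolute magnitudes are lost, which a tool would typically restore with axis tick labels.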
  49. Sequence logo
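In a sequence logo (Schneider & Stephens, 1990) the height of each column is its information content, and each letter’s height is its frequency times that value. For DNA, R = log2(4) - H with H = -sum(p_b * log2(p_b)). The sketch below computes these heights for a hypothetical alignment and deliberately omits the small-sample correction term that the original method includes.

```python
import math
from collections import Counter
from typing import Dict


def logo_column(bases: str, alphabet: str = "ACGT") -> Dict[str, float]:
    """Letter heights for one alignment column (small-sample correction omitted)."""
    counts = Counter(bases)
    n = len(bases)
    freqs = {b: counts[b] / n for b in alphabet if counts[b]}
    entropy = -sum(p * math.log2(p) for p in freqs.values())  # H in bits
    info = math.log2(len(alphabet)) - entropy                  # R = log2(4) - H
    return {b: p * info for b, p in freqs.items()}             # letter height = p * R


alignment = ["ACGT", "ACGA", "ACTT", "AGGT"]  # hypothetical aligned sequences
for column in zip(*alignment):
    heights = logo_column("".join(column))
    print({b: round(h, 2) for b, h in heights.items()})
```

A fully conserved column reaches the maximum of 2 bits, while a 3:1 column drops to about 1.19 bits, which is why conserved positions stand out visually.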
  50. Seagull
  51. (figure labels: subgroup; similarity; difference)
  52. Data filtering (visual parameter setting)
      TrioVis - Sakai R et al. Bioinformatics (2013)
      Ryo Sakai
  53. User-guided analysis
      Spark - Nielsen et al. Genome Research (2012)
      (figure labels: clustering; regions of interest; data samples; chromatin modification; DNA methylation; RNA-Seq)
  54. BaobabView: decision trees - van den Elzen S & van Wijk J. IEEE Conference on Visual Analytics Science and Technology (2011)
  55. E. Challenges
  56. Many challenges remain
      • scalability (data processing + perception), uncertainty, “interestingness”, interaction, evaluation
      • infrastructure & architecture
      • fast imprecise answers with progressive refinement
      • incremental re-computation
      • steering computation towards data regions of interest
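"Fast imprecise answers with progressive refinement" can be illustrated with an aggregate computed over growing random samples: every round returns an estimate plus a rough standard error, so a view can be drawn immediately and refined as more data is processed. A minimal sketch, assuming the data fit in memory and that a shuffled prefix is an acceptable sample; the doubling schedule and the toy signal are arbitrary choices.

```python
import random
import statistics
from typing import Iterator, List, Tuple


def progressive_mean(data: List[float], first: int = 100,
                     seed: int = 0) -> Iterator[Tuple[int, float, float]]:
    """Yield (sample size, mean estimate, approx. standard error), doubling the sample each round."""
    rng = random.Random(seed)
    order = data[:]
    rng.shuffle(order)  # a shuffled prefix acts as a random sample without replacement
    size = first
    while True:
        size = min(size, len(order))
        sample = order[:size]
        sem = statistics.stdev(sample) / size ** 0.5 if size > 1 else float("inf")
        yield size, statistics.fmean(sample), sem
        if size == len(order):
            return  # final round used all data: the answer is exact
        size *= 2


values = [float(i % 1000) for i in range(20_000)]  # hypothetical signal
for n, estimate, sem in progressive_mean(values):
    print(f"n={n:>6}  mean~{estimate:8.2f}  +/- {sem:.2f}")
```

An interactive front end would consume this generator between frames, redrawing with the latest estimate and its error bar instead of waiting for the full pass.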
  57. Computational scalability
      • speed
         • preprocessing big data: MapReduce = batch processing
         • interactivity: max 0.3 sec lag!
      • size
         • multiple data resolutions => data size increase
         • not all resolutions are necessary for all data regions: steer computation to regions of interest
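The size cost of "multiple data resolutions" can be quantified. If each coarser level averages adjacent pairs of bins from the level below, the levels hold roughly n, n/2, n/4, ... values, so the whole pyramid stays at about 2n: storage roughly doubles, while every zoom level can be served from a precomputed level. A minimal sketch, assuming mean-of-pairs binning on a 1-D track such as coverage along a chromosome; real tools may keep other summaries (e.g. min/max) per bin.

```python
from typing import List


def build_pyramid(signal: List[float]) -> List[List[float]]:
    """Level 0 is the raw signal; each next level averages adjacent pairs."""
    levels = [signal]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        coarser = [
            (prev[i] + prev[i + 1]) / 2 if i + 1 < len(prev) else prev[i]
            for i in range(0, len(prev), 2)
        ]
        levels.append(coarser)
    return levels


def level_for(levels: List[List[float]], max_points: int) -> List[float]:
    """Pick the finest level that fits in max_points (e.g. available screen pixels)."""
    for level in levels:
        if len(level) <= max_points:
            return level
    return levels[-1]


coverage = [float(i % 7) for i in range(1024)]  # hypothetical coverage track
pyramid = build_pyramid(coverage)
total = sum(len(level) for level in pyramid)
print(len(pyramid), total, total / len(coverage))
print(len(level_for(pyramid, 300)))
```

For 1,024 raw values the 11 levels hold 2,047 values in total, and a 300-pixel view is served from the 256-bin level without touching the raw data.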
  58. • Options:
         • distribute visualization calculations over a cluster
         • distribute with Scala/Spark or another “real-time” MapReduce paradigm
         • functional programming paradigm?
         • lazy evaluation and smart preprocessing: only calculate what’s needed => generic framework
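"Lazy evaluation ... only calculate what’s needed" can be sketched with on-demand tiles: a tile summary is computed only when a view actually requests it, then cached so that panning back and forth does not recompute it. This sketch uses Python’s `functools.lru_cache`; the tile size, the (min, max) summary and the stand-in signal are hypothetical, and in a real system the cache would sit in front of a distributed backend rather than an in-memory list.

```python
from functools import lru_cache
from typing import List, Tuple

TILE_SIZE = 1000  # hypothetical number of positions per tile
SIGNAL = [float((i * 37) % 101) for i in range(100_000)]  # stand-in for a large track
computed: List[int] = []  # records which tiles were actually computed


@lru_cache(maxsize=256)
def tile_summary(tile: int) -> Tuple[float, float]:
    """Compute (min, max) for one tile; runs at most once per tile while it stays cached."""
    computed.append(tile)
    chunk = SIGNAL[tile * TILE_SIZE:(tile + 1) * TILE_SIZE]
    return min(chunk), max(chunk)


def view(start: int, end: int) -> List[Tuple[float, float]]:
    """Return summaries only for the tiles overlapping [start, end)."""
    first, last = start // TILE_SIZE, (end - 1) // TILE_SIZE
    return [tile_summary(t) for t in range(first, last + 1)]


view(0, 5_000)      # computes tiles 0-4
view(2_500, 7_500)  # reuses tiles 2-4, computes only 5-7
print(computed)
```

Of the 100 tiles in the track, only the 8 that were viewed are ever computed; the rest cost nothing until someone navigates there.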
  59. Perceptual scalability
      • “overview first, then zoom and filter, details on demand”: breaks down with very big datasets
      • “analyze first, show results, then zoom and filter, details on demand” => need to identify regions of interest and “interestingness features”
      • identify higher-level structure in data (e.g. clustering, dimensionality reduction) -> use these to guide the user
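"Analyze first, show results" means ranking candidate regions before anything is drawn, so the first view starts from likely regions of interest rather than an unreadable overview. A minimal sketch, assuming window variance as a stand-in "interestingness" score; the slide does not prescribe a measure, and a real system might use clustering or model-based scores instead. The track is hypothetical.

```python
import statistics
from typing import List, Tuple


def rank_windows(signal: List[float], window: int, top: int) -> List[Tuple[int, float]]:
    """Score non-overlapping windows by variance and return the top (start, score) pairs."""
    scores = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        scores.append((start, statistics.pvariance(chunk)))
    scores.sort(key=lambda pair: pair[1], reverse=True)  # most "interesting" first
    return scores[:top]


# Hypothetical track: flat background with two noisy stretches
track = [1.0] * 300
track[100:150] = [1.0, 9.0] * 25
track[220:240] = [1.0, 4.0] * 10
print(rank_windows(track, window=50, top=2))
```

The two windows containing the noisy stretches (starting at 100 and 200) come out on top; these would be offered to the user first, with zoom and filter still available for the rest.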
  60. Thank you
      • Georgios Pavlopoulos
      • Ryo Sakai
      • Thomas Boogaerts
      • Toni Verbeiren
      • Data Visualization Lab (datavislab.org)
      • Erik Duval
      • Andrew Vande Moere
