1. Data Visualization - An introduction. Prof Jan Aerts, Biodata Visualization and Analysis, ESAT/SCD, University of Leuven, Belgium. twitter: @jandot, Google+: +Jan Aerts, jan.email@example.com, biovizanlab.wordpress.com, http://saaientist.blogspot.com
2. 1. What is data visualization?
3. “A good sketch is better than a long speech” (Napoleon)
4. “A good sketch is better than a long speech” (Napoleon). Shows: size of the army, geographical coordinates, direction that the army was traveling, location of the army with respect to certain dates, temperature along the path of the retreat
5. John Snow - cholera map
6. Shape of Songs: “Like a Prayer” (Madonna) Martin Wattenberg
10. Why do we visualize data?
• record information
  • blueprints, photographs, seismographs, ...
• analyze data to support reasoning
  • develop & assess hypotheses
  • discover errors in data
  • expand memory
  • find patterns (see Snow’s cholera map)
• communicate information
  • share & persuade
  • collaborate & revise
19. Anscombe’s quartet
• μX = 9.0
• μY = 7.5
• σX = 3.317
• σY = 2.03
• Y = 3 + 0.5X
• R² = 0.67
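The point of the quartet is that four very different datasets share these summary statistics. A minimal sketch in Python (standard library only) that recomputes them; the data values are the commonly published ones for Anscombe's quartet and are not listed on the slide, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Commonly published values of Anscombe's quartet (assumption: not on the slide).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean, sample standard deviation, least-squares line and R^2."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "sd_x": stdev(xs),
        "sd_y": stdev(ys),
        "slope": slope,
        "intercept": my - slope * mx,
        "r2": sxy ** 2 / (sxx * syy),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to about two decimals; only a plot reveals how different the datasets are.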
20. A concrete example: hive plots
21. same network (Martin Krzywinski)
22. different networks! (Martin Krzywinski)
23. 3D, anyone?
24. 3D, anyone?
• occlusion
• interaction complexity
• perspective distortion
• text legibility
25. Functions in the Linux operating system: “function A calls function B”. Gene interaction data: “gene A regulates gene B”
26. regulator, workhorse, manager
27. 3. Why specifically learn about dataviz?
28. Isn’t it all just about using common sense?
29.
• huge space of design alternatives => many tradeoffs
• many possibilities known to be ineffective
  • avoid random walk through parameter space
  • avoid some of our past mistakes
  • extensive experimentation has already been done
• guidelines continue to evolve
  • we reflect on lessons learned in design studies
  • iterative refinement usually wise
30. 4. Stages of data visualization
31. How do we get from data to visualization? We need to understand:
• properties of the data
• properties of the image
• the rules mapping data to image
32. 4.1. Properties of the data
33. S. S. Stevens, “On the Theory of Scales of Measurement” (1946)
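Stevens' paper is the usual source for the four levels of measurement (nominal, ordinal, interval, ratio); the slide itself only cites the paper. A minimal sketch of how each level adds a meaningful comparison on top of the previous one. The names `Scale`, `OPERATIONS` and `allowed_operations` are hypothetical, and the example attributes in the comments are illustrative.

```python
from enum import IntEnum


class Scale(IntEnum):
    NOMINAL = 1   # categories only, e.g. a gene name
    ORDINAL = 2   # adds order, e.g. low / medium / high
    INTERVAL = 3  # adds meaningful differences, e.g. temperature in Celsius
    RATIO = 4     # adds a true zero, so ratios make sense, e.g. a count


# Lowest scale level at which each comparison becomes meaningful.
OPERATIONS = {
    "equality": Scale.NOMINAL,
    "ordering": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}


def allowed_operations(scale):
    """List the comparisons that are meaningful for data on this scale."""
    return [op for op, minimum in OPERATIONS.items() if scale >= minimum]


for scale in Scale:
    print(scale.name, allowed_operations(scale))
```

Knowing the scale of an attribute narrows down which visual encodings can represent it faithfully.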
34. 4.2. Properties of the image - perception
35. Semiology of graphics
• Jacques Bertin, Gauthier-Villars 1967, EHESS 1998
• semiology = study of signs and sign processes, likeness, analogy, metaphor, symbolism, signification, and communication (Wikipedia)
• visual encoding:
  • what - points, lines, areas (plus patterns, trees/networks, grids)
  • where - positional: XY (1D, 2D, 3D)
  • how - retinal: Z (size, lightness, texture, colour, orientation, shape)
  • when - temporal: animation
36. “marks” - geometric primitives; “channels” - control appearance of marks (figure labels: H, V, S)
37. Gestalt laws - interplay between parts and the whole (Kurt Koffka)
• series of principles
Election results Florida:
• black = Bush
• white = Gore
38. Gestalt - Principle of Simplicity: every pattern is perceived as the simplest possible structure.
39. Gestalt - Principle of Proximity: things that are close to each other are seen as belonging together (=> clusters).
40. Gestalt - Principle of Similarity: things that are similar in some way are perceived as belonging together.
41. Gestalt - Principle of Closure: you will try to complete a pattern.
42. Gestalt - Principle of Connectedness: things that are connected are perceived as belonging together. This encoding is stronger than similarity, shape, colour, and size.
43. Gestalt - Principle of Good Continuation: objects that are arranged in a straight or smooth line tend to be seen as a unit.
44. Gestalt - Principle of Common Fate: objects that move in the same direction tend to be seen as a unit.
45. Gestalt - Principle of Familiarity
46. Gestalt - Principle of Symmetry: symmetrical areas tend to be seen as figures against asymmetrical backgrounds.
47. Context affects perceptual tasks
48. Pre-attentive vision = ability of low-level human visual system to rapidly identify certain basic visual properties
• some features “pop out”
• used for:
  • target detection
  • boundary detection
  • counting/estimation
  • ...
• visual system takes over => all cognitive power available for interpreting the figure, rather than needing part of it for processing the figure
49. Really fast; see http://www.csc.ncsu.edu/faculty/healey/PP/
50. Limitations of preattentive vision
1. Combining pre-attentive features does not always work => would need to resort to “serial search” (most channel pairs; all channel triplets), e.g. is there a red square in this picture?
2. Speed depends on which channel is used (use one that is good for categorical data; see further under “accuracy”)
51. 4.3. Mapping data to image: visual encoding
52. Language of graphics
• graphics = sign system:
  • each mark (point, line, area) represents a data element
  • choose visual variables to encode relationships between data elements
    • difference, similarity, order, proportion
    • only position supports all relationships (see later)
  • huge range of alternatives for data with many attributes
  • find images that express & effectively convey the information
53. Which encoding should I use?
• From huge list of possibilities, you have to choose the best one.
• Principle of Consistency
  • properties of the representation should match properties of the data (e.g. pie chart: area vs radius)
• Principle of Importance Ordering
  • encode the most important piece of information in the most “effective” way (i.e. spatial position)
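The "area vs radius" remark can be made concrete. If a value is mapped directly to the radius of a circle, the drawn area grows with the square of the value, so the representation no longer matches the data. A minimal sketch with hypothetical function names and a hypothetical `scale` parameter:

```python
import math


def radius_for_area_encoding(value, scale=1.0):
    """Radius such that the circle's AREA is proportional to the value."""
    return math.sqrt(scale * value / math.pi)


def area_when_value_is_radius(value, scale=1.0):
    """Area that results when the value is (wrongly) mapped to the RADIUS."""
    return math.pi * (scale * value) ** 2


for value in (1, 2, 4):
    consistent_area = math.pi * radius_for_area_encoding(value) ** 2
    print(value, round(consistent_area, 2), round(area_when_value_is_radius(value), 2))
```

With the area encoding, doubling the value doubles the ink; with the radius encoding, doubling the value quadruples it.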
54. Stevens’ psychophysical law = proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
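The law is usually written as a power function: perceived intensity = k * (stimulus magnitude)^a, where the exponent a depends on the kind of stimulus. A minimal sketch; the function name is hypothetical and the exponents used below are illustrative placeholders, not measured values from the slides.

```python
def perceived_magnitude(stimulus, exponent, k=1.0):
    """Power-law form: perceived = k * stimulus ** exponent."""
    return k * stimulus ** exponent


# Illustrative exponents (assumptions): 1.0 means perception tracks the
# stimulus linearly, below 1.0 means increases are underestimated.
for label, exponent in (("linear channel", 1.0), ("compressive channel", 0.7)):
    ratio = perceived_magnitude(2.0, exponent) / perceived_magnitude(1.0, exponent)
    print(f"{label}: doubling the stimulus is perceived as x{ratio:.2f}")
```

A channel with an exponent close to 1 is a safer choice for quantitative data, because perceived differences stay proportional to the differences in the data.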
55. Accuracy of quantitative perceptual tasks: how much (quantitative), what/where (qualitative) (Mackinlay)
56. Accuracy of quantitative perceptual tasks: how much (quantitative), what/where (qualitative) (Mackinlay)
57. Accuracy of quantitative perceptual tasks: how much (quantitative), what/where (qualitative) (Mackinlay); “power of the plane”
58. Accuracy of quantitative perceptual tasks: how much (quantitative), what/where (qualitative); grouping: see Gestalt laws (Mackinlay)
60. COLOUR ... is tricky, and often used wrong
61. Colour space
• = mathematical model to talk about colour
• RGB (red-green-blue)
  • most common, but less useful
• HSV (hue-saturation-value)
  • more useful
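One reason HSV tends to be more useful for design is that hue, saturation and value can be varied independently. A minimal sketch using Python's standard `colorsys` module; the function `sequential_ramp` and the hue value 0.6 are hypothetical choices for illustration.

```python
import colorsys

# Pure red expressed in both colour spaces (all components in the range 0..1).
red_hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(red_hsv)


def sequential_ramp(hue, steps):
    """Keep hue and value fixed and increase saturation step by step."""
    saturations = [(i + 1) / steps for i in range(steps)]
    return [colorsys.hsv_to_rgb(hue, s, 1.0) for s in saturations]


ramp = sequential_ramp(hue=0.6, steps=5)
for r, g, b in ramp:
    # Print as 8-bit hex codes, the form most plotting tools accept.
    print("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
```

Building the same ramp directly in RGB would mean changing all three components at once, which is harder to reason about.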
62. colorbrewer2.org
• in R: please use RColorBrewer!
63. Context affects colour perception
64. Context affects colour perception
65. Dangers of Depth (3D)
• We do NOT see in 3D; we see in 2.05D.
• occlusion
• interaction complexity
• perspective distortion
66. 3D example
67. Lie factor: “lie factor” = (size of effect shown in graphic) / (size of effect in data)
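The lie factor is the ratio of the effect shown in the graphic to the effect in the data. A minimal sketch, assuming each "size of effect" is measured as a relative change between two values; the function names and the numbers are illustrative, not taken from the slides.

```python
def relative_change(before, after):
    """Size of an effect expressed as a fraction of the starting value."""
    return (after - before) / before


def lie_factor(graphic_before, graphic_after, data_before, data_after):
    """Effect shown in the graphic divided by the effect in the data."""
    return relative_change(graphic_before, graphic_after) / relative_change(
        data_before, data_after
    )


# Illustrative case: the data grows by 50% (10 -> 15) while the bar drawn
# for it grows by 150% (2 cm -> 5 cm).
factor = lie_factor(graphic_before=2.0, graphic_after=5.0,
                    data_before=10.0, data_after=15.0)
print(round(factor, 2))
```

A value near 1 means the graphic is faithful; here the graphic overstates the change by about a factor of three.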
68. 3D scatter plots are better as series of 2D projections
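Splitting a 3D scatter plot into 2D projections amounts to taking every pair of axes. A minimal sketch that prepares the coordinates for each panel and leaves the drawing to any plotting library; the function name, axis names and sample points are hypothetical.

```python
from itertools import combinations


def pairwise_projections(points, axis_names=("x", "y", "z")):
    """Map each pair of axes to the 2D coordinates of every point."""
    projections = {}
    for i, j in combinations(range(len(axis_names)), 2):
        key = (axis_names[i], axis_names[j])
        projections[key] = [(p[i], p[j]) for p in points]
    return projections


points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
panels = pairwise_projections(points)
for axes, coords in panels.items():
    print(axes, coords)
```

Three axes give three panels (x-y, x-z, y-z), none of which suffers from occlusion or perspective distortion.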
69. Dynamic data
• animation is good sometimes, but often not:
  • we can only follow 3-4 visual cues simultaneously
  • change in “mental map”
• change blindness (e.g. http://nivea.psycho.univ-paris5.fr/CBMovies/BarnTrackFlickerMovie.gif)
71. 5. Interaction
72. Overview, zoom and filter, details on demand (Shneiderman’s Information Seeking Mantra)
73. Operations on the data
• sorting
• filtering
• browsing/exploring
• comparison
• characterizing trends & distributions
• finding anomalies & outliers
• ...
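Several of these operations are simple to express on the data side, before anything is drawn. A minimal sketch of sorting, filtering and outlier finding; the records, field names and thresholds are hypothetical, and "more than two standard deviations from the mean" is only one possible outlier rule.

```python
from statistics import mean, stdev

# Hypothetical records: one expression value per gene.
records = [
    {"gene": "g1", "expr": 5.1}, {"gene": "g2", "expr": 4.9},
    {"gene": "g3", "expr": 5.3}, {"gene": "g4", "expr": 5.0},
    {"gene": "g5", "expr": 12.0}, {"gene": "g6", "expr": 4.8},
    {"gene": "g7", "expr": 5.2}, {"gene": "g8", "expr": 5.0},
]

# Sorting: highest expression first.
by_expr = sorted(records, key=lambda r: r["expr"], reverse=True)

# Filtering: keep only records above a chosen threshold.
above_five = [r["gene"] for r in records if r["expr"] > 5.0]

# Finding outliers: values far from the mean, relative to the spread.
values = [r["expr"] for r in records]
centre, spread = mean(values), stdev(values)
outliers = [r["gene"] for r in records if abs(r["expr"] - centre) > 2 * spread]

print(by_expr[0]["gene"], above_five, outliers)
```

In an interactive visualization the thresholds would typically be driven by a slider or a brush rather than fixed in code.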
74. Techniques to support these operations
• re-orderable matrices
• brushing
• linked views
• overview & detail
• focus & context
• ...
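Brushing and linked views are often built on one shared selection that every view listens to. A minimal sketch of that idea, independent of any plotting library; all class, method and variable names are hypothetical.

```python
class SharedSelection:
    """Holds the currently brushed row ids and notifies subscribed views."""

    def __init__(self):
        self.selected = set()
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def brush(self, row_ids):
        self.selected = set(row_ids)
        for callback in self._listeners:
            callback(self.selected)


class View:
    """Stand-in for a plot or table that highlights the selected rows."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = set()
        selection.subscribe(self._on_selection)

    def _on_selection(self, selected):
        self.highlighted = set(selected)


# Hypothetical data: row id -> x value shown in the scatter plot.
x_values = {0: 0.5, 1: 1.5, 2: 2.5, 3: 3.5}

selection = SharedSelection()
scatter = View("scatter", selection)
table = View("table", selection)

# Brushing the x range 1.0..3.0 in one view highlights the same rows in both.
selection.brush(row for row, x in x_values.items() if 1.0 <= x <= 3.0)
print(scatter.highlighted, table.highlighted)
```

Because the views only share row ids, each one is free to show the same items with a completely different encoding.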