Once you know what you want to say, effective visual communication is achieved by displaying information in a way that enables people to clearly see an accurate representation of your message and understand what they see. To do this, you must understand a few things about how people see (visual perception) and how people think (cognition).
A common problem with tables and graphs is the excessive presence of visual content that doesn’t represent actual data. Whenever quantitative information is presented, the data itself should stand out clearly, without distraction. This involves eliminating anything that doesn’t represent data, except for visual devices that support the data in a necessary way (for example, axes in a graph), in which case they should be displayed in muted fashion so as to not distract from the data itself.
Because differences in visual properties, such as color, are used to communicate actual differences in the information itself, visual differences should never be used arbitrarily. When people notice visual differences, they try to discern the meaning of those differences. Don’t confuse people and waste their time by including visual differences that are meaningless.
It is usually best to encode the third variable using distinct colours, rather than any of the other available methods, such as different line or fill patterns. Just be careful to use colours that are still distinct, even when photocopied.
Keeping the quantitative scale consistent makes it easy to compare the charts.
Whenever you narrow the scale, make it obvious to your audience that you have done so, so that they won’t mistake big visual differences between lines or points on the graph for big differences in their values, which might not be the case. Points aren’t as visually prominent as bars and consequently don’t emphasize individual values quite as forcefully, but points are a fine substitute for bars when you need to narrow the quantitative scale.
The more directly you can label data, the better. For instance in a line graph with multiple lines, if you can label the lines directly (for example, at the ends of the lines), the graph will be much easier to read. In a bar graph with multiple sets of bars, you usually need a legend, but you can make it much easier to read by arranging the labels to match the arrangement of the bars, rather than the more usual way on the right. Notice also that the legend doesn’t need a border around it - it simply isn’t necessary.
Even on quantitative scales, only major tick marks are necessary, with rare exceptions. When the quantitative scale corresponds to the Y axis, it can be placed on the left side, right side, or on both sides of the graph. When it corresponds to the X axis, it can be placed on the top, bottom, or both. It is usually sufficient to place the quantitative scale in one place, but if the graph is so large that some values are positioned too far from the scale to adequately determine their values, placing the scale on both the left and the right, or the top and the bottom, will solve the problem. When it only needs to appear in one place, the best choice of position depends on which values you want to emphasize or make easier to read. Placing the scale nearest to those values will accomplish this. Avoid placing the scale on the right side of the graph, however, unless really necessary to serve this purpose, because the scale so rarely appears only on the right that this might momentarily disorient those who use the graph. If the quantitative scale ranges between positive and negative values, the axis line should be positioned at zero, but the labels should be placed elsewhere so they won’t interfere with the data. For instance, when the quantitative scale is on the X axis, it is usually best to place the text labels just below the plot area of the graph.
Grid lines in graphs are mostly a vestige of the old days when graphs had to be drawn by hand on grid paper. Today, with computer-generated graphs, grid lines are only useful when one of the following conditions exists:
• Values cannot be interpreted with the necessary degree of accuracy
• Subsets of points in multiple related scatter plots must be compared
Bear in mind that it is not the purpose of a graph to communicate data with a high degree of quantitative accuracy, which is handled better by a table. Graphs display patterns and relationships. If a bit more accuracy than can be easily discerned is necessary, however, you may include grid lines, but when you do, you should subdue them visually, making them just barely visible enough to do the job. When you are using multiple related scatter plots and wish to make it easy for folks to compare the same subset of values in two or more graphs, a subtle matrix of vertical and horizontal grid lines neatly divides the graphs into sections, making it easy to isolate particular ranges of values.
Data Visualisation
Harvinder Atwal

Agenda
Warm-Up 5 mins
Data Visualisation: Why it matters 5 mins
The Rules 10 mins
Seven Common Quantitative Relationships 5 mins
The best means to encode quantitative data in charts 5 mins
Step by Step Guide 10 mins
Test 5 mins
Even the best information is useless, if its story is poorly told
Most presentations of quantitative business data are poorly designed, painfully so, often to the point of misinformation.
Anyone can start drawing charts in Excel and use PowerPoint, but hardly anyone is trained to do so effectively.
The effective display of quantitative information involves two fundamental challenges:
– Selecting the right medium of display (for example, a table or a graph, and the appropriate kind of either)
– Designing the individual visual components of the selected medium to display the information and its message as clearly as possible
Bad Data Visualisation can have tragic consequences
In Jan 1986 NASA had to decide whether to launch the Challenger shuttle in a “100-year cold”.
Morton Thiokol engineers produced a chart and recommended that shuttles not be flown below 53F because of potential damage to the O-Rings in the booster rockets.
Morton Thiokol managers accepted the recommendation and passed it on to NASA.
NASA asks for the recommendation to be reconsidered.
Morton Thiokol managers agree to the flight.
The engineers at Morton Thiokol came up with this chart
Looking at the O-Ring damage over the previous 24 shuttle missions, the data was presented in chronological order showing the location and extent of the damage sustained to the left and right boosters and the temperature at launch time.

The Morton Thiokol engineers failed to convince their management and NASA, with fatal consequences

Would this chart have been more convincing?
If instead we remove all the extraneous data and do a simple plot of temperature vs damage then the pattern becomes much clearer.
ALWAYS damage below 66F
Never damage above 76F

WTF!?
How many hours of valuable management time have been wasted trying to understand a badly drawn chart?
How many £billions have been wasted on incorrect decisions because someone has misinterpreted a chart message?
To communicate effectively visually you need to understand visual perception and cognition.
Present your message in a way that takes advantage of the strengths of visual perception while avoiding its weaknesses, matching the human thought process.
You can develop a simple set of skills (graphicacy) based on this knowledge and on clear-cut principles about what works and what doesn’t.
Research Finding: Communication is most effective when you say neither more nor less than what is relevant to your message.
Principle #1: Display neither more nor less than what is relevant to your message.
Tufte’s data-ink ratio is the single most important concept in data visualisation
Data-ink ratio = data-ink / total ink used to print the graphic
= proportion of a graphic’s ink devoted to the non-redundant display of data-information
= 1.0 − proportion of a graphic that can be erased without loss of data-information.
(The Visual Display of Quantitative Information, Edward R. Tufte, Graphics Press, Cheshire CT, 1983, p.93)
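
As a rough illustration of the two equivalent forms of the ratio, the sketch below works through the arithmetic for a cluttered chart. The ink amounts, the element names, and the choice of which elements count as data-ink are all hypothetical assumptions made for the example; they are not measurements from the slides.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that displays non-redundant data (0.0 to 1.0)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


# Hypothetical ink estimates for a cluttered bar chart (arbitrary units).
ink = {"bars": 30, "axis labels": 10, "grid lines": 20, "3-D effect": 25, "borders": 15}
data_elements = {"bars"}  # assumption: only the bars carry data here

total = sum(ink.values())
data = sum(amount for name, amount in ink.items() if name in data_elements)

ratio = data_ink_ratio(data, total)
erasable = 1.0 - ratio  # share that could be erased without losing data
print(f"data-ink ratio: {ratio:.2f}, erasable share: {erasable:.2f}")
```

Under these assumed numbers most of the ink could be erased, which is the point of the de-junking exercise on the following slides.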
Eliminate all redundant visual information!
You wouldn’t write a document like this using multiple fonts, gratuitous formatting, redundant excessive highlighting, variable colours, difficult to read italics, pointless underlining, desperate shadows in multiple sizes.
● Yet every day you see the graphical equivalent as people try to make their charts “interesting” instead of useful!
How many items of redundant visual information can you see in this chart?
[Figure: 3-D bar chart “Sales and Appointments by Region” (Volume by Region; series: Appointments, Sales). Annotated clutter: grey background, underlining, border, legend key, vertical lines, excessive tick marks, 3-D effect, data labels, border on legend, border on bars, floor, highlighting for no reason.]

Less is more; the same chart de-junked…
[Figure: the same “Sales and Appointments by Region” bar chart with the clutter removed (Volumes from 0 to 200; regions: Wales and West, London And South East, Scotland and North, Midlands).]
Research Finding: People perceive visual differences in an information display as differences in meaning.
Principle #2: Do not include visual differences in a graph that do not correspond to actual differences in the data.

What is the meaning of the different colours that appear on the bars? The answer is “nothing.” Don’t confuse people and waste their time by including visual differences that are meaningless.

Research Finding: The visual properties that work best for representing quantitative values are the length or 2-D location of objects.
Principle #3: Use the lengths or 2-D locations of objects to encode quantitative values in graphs unless they have already been used for other variables.
Bar B is actually only 10% bigger than A, not 100%
[Figure: bar chart of A and B on a quantitative scale narrowed to run from 470 to 560.]
Research Finding: People perceive differences in the lengths or 2-D locations of objects fairly accurately and interpret them as differences in the actual values that they represent.
Principle #4: Differences in the visual properties that represent values (that is, differences in their lengths or 2-D locations) should accurately correspond to the actual differences in the values they represent.

Research Finding: People perceive things that appear connected as wholes and things that appear disconnected as discrete.
Principle #5: Do not visually connect values that are discrete, thereby suggesting a relationship that does not exist in the data.

The regions are discrete, so values that measure something going on in these regions should be displayed as discrete. Connecting discrete items with a line is misleading. Doing so forms a pattern of upwards and downwards slopes that are utterly meaningless.

Research Finding: People pay most attention to and consider most important those parts of a visual display that are most salient.
Principle #6: Make the information that is most important to your message more visually salient in a graph than information that is less important.

Some information is more important to your message than other information. You can communicate this fact in a graph by making those items that are most important more visually dominant (salient). It is your job to direct people’s eyes to the most important parts of the display, so they adequately focus on them.

Research Finding: Short-term memory is limited to about four chunks of information at a time.
Principle #7: Augment people’s short-term memory by combining multiple facts into a single visual pattern that can be stored as a chunk of memory and by presenting all the information they need to compare within eye span.

By presenting quantitative information visually as patterns, more information can be simultaneously stored in short-term memory. Each of the two lines in this line graph combines 12 different sales figures, one per month, into a single pattern of upward and downward sloping line segments. When encoded in a visual pattern such as this, these 12 numbers can be stored together as a single chunk of information in short-term memory.
Seven common quantitative relationships in graphs and how to display them
Meaningful quantitative information always involves relationships. With rare exceptions in business graphs, these relationships always boil down to one or more of the seven relationships described on the following slides.

Time Series
Expresses the rise and fall of values through time.
– Use lines to emphasize overall pattern.
– Use bars to emphasize individual values.
– Use points connected by lines to slightly emphasize individual values while still highlighting the overall pattern.
– Always place time on the horizontal axis.

Ranking
Expresses values in order by size.
Use bars only (horizontal or vertical).
– To highlight high values, sort in descending order.
– To highlight low values, sort in ascending order.

Part-to-Whole
Expresses the portion of each part relative to the whole.
– Use bars only (horizontal or vertical).
– Use stacked bars only when you must display measures of the whole as well as the parts.
Deviation
Expresses how and the degree to which one or more things differ from another.
– Use lines to emphasize the overall pattern only when displaying deviation and time-series relationships together.
– Use points connected by lines to slightly emphasize individual data points while also highlighting the overall pattern when displaying deviation and time-series relationships together.
– Use bars to emphasize individual values, but limit to vertical bars when a time-series relationship is included.
– Always include a reference line to compare the measures of deviation against.
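
A deviation graph plots the difference from a reference rather than the raw values, so the data usually needs one small transformation first. The following is a minimal sketch, assuming the reference is a budget figure per period and that deviation is expressed as a percentage; the function name and the numbers are hypothetical.

```python
def percent_deviations(actuals, references):
    """Percentage by which each actual value differs from its reference value."""
    return [(actual - ref) * 100 / ref for actual, ref in zip(actuals, references)]


# Hypothetical quarterly figures.
budget = [100, 100, 100, 100]
actual = [90, 105, 100, 112]

deviation = percent_deviations(actual, budget)
reference_line = 0.0  # the line every deviation is compared against
print(deviation)
```

Plotting `deviation` against a reference line at zero makes over- and under-performance visible at a glance.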
Distribution
Expresses a range of values as well as the shape of the distribution across that range.
Single distribution:
– Use vertical bars to emphasize individual values.
– Use lines to emphasize the overall shape.
Multiple distributions:
– Use vertical or horizontal bars (a.k.a. range bars or boxes) to encode the full range from the low value to the high value, or some meaningful portion of the range (for example, 90% of the values).
– Use points or lines together to encode measures of centre (for example, the median).
Correlation
Expresses how two paired sets of values vary in relation to one another.
– Use points and a trend line in the form of a scatter plot.
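
The slide does not say how the trend line is fitted. One common choice, assumed here purely for illustration, is an ordinary least-squares straight line. The sketch below computes its slope and intercept from paired values; the sample data is hypothetical.

```python
def trend_line(xs, ys):
    """Least-squares straight line through paired values: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired values, e.g. appointments (x) and sales (y) per region.
appointments = [97, 112, 150, 183]
sales = [75, 85, 100, 91]

slope, intercept = trend_line(appointments, sales)
print(f"sales = {slope:.2f} * appointments + {intercept:.1f}")
```

The points themselves stay on the scatter plot; the fitted line is drawn over them to summarise the direction and strength of the relationship.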
Nominal Comparison
Simply expresses the comparative sizes of multiple related but discrete values in no particular order.
– Use bars only (horizontal or vertical).
Four types of objects work best for encoding quantitative values in graphs: points, lines, bars, and boxes.

Points and Lines
Points are the smallest of the objects that are used to encode values in graphs. They can take the shape of dots, squares, triangles, Xs, dashes, and other simple objects. They have two primary strengths: (1) they can be used to encode quantitative values along two quantitative scales simultaneously, as in a scatter plot, and (2) they can be used in place of bars when the quantitative scale does not begin at zero. Unlike lines, points emphasize individual values, rather than the shape of those values as they move up and down.
Lines connect the individual values in a series, emphasizing the shape of the data as it moves from value to value. As such, they are superb for showing the shape of data as it moves and changes through time. Trends, patterns, and exceptions stand out clearly. You should only use lines to encode data along an interval scale.
Do not use lines for Nominal or Ordinal scales!
[Figure: two line graphs of Sales, both marked “Wrong”: one across regions on a nominal scale (Wales and West, London And South East, Scotland and North) and one across product tiers (Extra-Value, Standard, Branded, Finest).]
In nominal and ordinal scales, the individual items are not related closely enough to be linked with lines, so you should use bars or points instead. Lines suggest change from one item to the next, but change isn’t happening if the items aren’t closely related as sequential subdivisions of a continuous range of values. For instance, it is appropriate to use lines to display change from one day to the next or from one price range to the next, but not from one community bank to the next.
Use lines only for Interval scales
With interval scales, you are not forced in all cases to use lines; you can use bars and points as well. If you want to emphasize the overall shape of the data or changes from one item to the next, lines work best.
If, however, you want to emphasize individual items, such as individual months, or to support discrete comparisons of multiple values at the same location along the interval scale, such as revenues and expenses for individual months, then bars or points work best.
[Figure: two graphs of Sales by quarter (Q1 to Q4) on an interval scale, marked “Right”.]
Bars encode data in a way that emphasizes individual values powerfully
This ability is due in part to the fact that bars encode quantitative values in two ways:
(1) the 2-D position of the bar’s endpoint in relation to the quantitative scale, and
(2) the length of the bar.
You probably recognize that these two characteristics correspond precisely to the two visual attributes that can be used to encode data in graphs. When you want to draw focus to individual values or to support the comparison of individual values to one another (see figure 19), bars are an ideal choice. They don’t, however, do as well as lines in revealing the overall shape of the data. Bars may be oriented vertically or horizontally.
[Figure: bar chart of Budget and Actual for Rewards and Exchange on a scale from 0 to 100.]
Whenever you use bars, your quantitative scale must include zero
The lengths of the bars encode their values, but won’t do so accurately if the bars don’t begin at zero. Notice what happens when you narrow the quantitative scale and use bars below. Actual sales appear to be half of planned sales, but in fact they are 90% of the plan.
When you would normally use bars, but wish to narrow the quantitative scale to show differences between the values in greater detail, you should switch from bars to points, because points encode values merely as 2-D location in relation to the quantitative scale, which eliminates the need to begin the scale at zero.
[Figure: bar chart of Budget and Actual for Rewards and Exchange on a scale narrowed to 70 to 100; chart of A and B on a scale narrowed to 470 to 560.]
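
The distortion described above is simple arithmetic: a bar's drawn length is its value minus the point where the scale starts. The sketch below reproduces the "appears to be half, is actually 90%" effect. The plan and actual values and the baseline of 80 are hypothetical numbers chosen to match that description, not figures read from the chart.

```python
def bar_length_ratio(value, reference, baseline=0.0):
    """Ratio of two drawn bar lengths when the quantitative scale starts at `baseline`."""
    return (value - baseline) / (reference - baseline)


plan, actual = 100.0, 90.0  # hypothetical values: actual is 90% of plan

true_ratio = bar_length_ratio(actual, plan)                     # scale starts at zero
apparent_ratio = bar_length_ratio(actual, plan, baseline=80.0)  # scale narrowed to start at 80

print(f"true: {true_ratio:.0%}, apparent: {apparent_ratio:.0%}")
```

Points avoid the problem because the reader judges their position against the scale rather than a length measured from the baseline.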
Boxes
Boxes are a lot like bars, except that both ends encode quantitative values. When bars are used in this way, they are sometimes called range bars. They are used to encode a range of values, usually from the highest to the lowest, rather than a single value.
In the 1970s John Tukey invented a method of using rectangles (bars with or without fill colors) in combination with individual data points (often a short line) and thin bars to encode several facts about a distribution of values, including the median (middle value), middle 50%, etc.
He called his invention a box plot (a.k.a. box-and-whisker plot).
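
The facts a box plot encodes can be computed before anything is drawn. This is a minimal sketch using Python's standard `statistics` module; the function name and sample data are hypothetical, and the quartile method is the module's default, which is an assumption rather than something the slide specifies (different tools compute quartiles slightly differently).

```python
import statistics


def box_summary(values):
    """Low, lower quartile, median, upper quartile, and high of a set of values."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {"low": min(values), "q1": q1, "median": median, "q3": q3, "high": max(values)}


# Hypothetical data.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The box spans `q1` to `q3` (the middle 50%), the short line marks the median, and the thin bars extend towards the low and high values.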
Step 1: Determine your message
Determine your message. Don’t just turn your data into a chart! Think about what your data means, what you want to communicate and, most importantly, your audience’s needs.
Will the data be used to look up and compare individual values, or will the data need to be precise? If so, you should display it in a table.
Is the message contained in the shape of the data: in trends, patterns, exceptions, or comparisons that involve more than a few values? If so, you should display it in a graph.
Or, do both.
Step 2: Determine the best means to encode the values
What am I trying to represent?
Nominal comparison.
– Bars (horizontal or vertical).
– Points (if the quantitative scale does not include zero).
Time Series.
– Lines to emphasize the overall shape of the data.
– Bars to emphasize and support comparisons between individual values.
– Points connected by lines to slightly emphasize individual values while still highlighting the overall shape of the data.
Ranking.
– Bars (horizontal or vertical).
– Points (if the quantitative scale does not include zero).
Part-to-Whole.
– Bars (horizontal or vertical).
– Use stacked bars only when you must display measures of the whole as well as the parts.
Note: Pie charts are commonly used to display part-to-whole relationships, but they don’t work nearly as well as bar graphs because it is much harder to compare the sizes of slices than the length of bars.
Deviation.
– Lines to emphasize the overall shape of the data (only when displaying deviation and time-series relationships together).
– Points connected by lines to slightly emphasize individual data points while also highlighting the overall shape (only when displaying deviation and time-series relationships together).
Frequency Distribution.
– Bars (vertical only) to emphasize individual values. This kind of graph is called a histogram.
– Lines to emphasize the overall shape of the data. This kind of graph is called a frequency polygon.
Correlation.
– Points and a trend line in the form of a scatter plot.
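
The guidance on this slide is effectively a lookup table, which can be handy to keep next to charting code. The sketch below transcribes it into a dictionary; the structure, key spellings, and function name are hypothetical conveniences, and the entries are abbreviated from the slide text.

```python
# Relationship -> recommended encodings, abbreviated from the slide above.
ENCODINGS = {
    "nominal comparison": ["bars", "points (if the scale does not include zero)"],
    "time series": ["lines", "bars", "points connected by lines"],
    "ranking": ["bars", "points (if the scale does not include zero)"],
    "part-to-whole": ["bars", "stacked bars (only to show the whole as well as the parts)"],
    "deviation": ["lines (with a time series)", "points connected by lines (with a time series)"],
    "frequency distribution": ["vertical bars (histogram)", "lines (frequency polygon)"],
    "correlation": ["points with a trend line (scatter plot)"],
}


def recommended_encodings(relationship):
    """Look up the encodings suggested for a named quantitative relationship."""
    key = relationship.strip().lower()
    if key not in ENCODINGS:
        raise KeyError(f"unknown relationship: {relationship!r}")
    return ENCODINGS[key]


print(recommended_encodings("Time Series"))
```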
Step 3: Determine where to display each variable – One Variable
Place the categorical variable on the X axis if your graph will include ONE categorical variable and any one of the following is true:
• The categorical scale is an interval scale
• You are using lines to encode the data
• You are using bars to encode the data and the labels are not long or many
If you are using bars, place the categorical variable on the Y axis when either of these two conditions exists:
• The text labels associated with the bars are long
• There are many bars.
[Figure: a horizontal bar chart with product names (Beef, Fresh pork, Lamb, Bacon, Sausage, Beef fillet jnt, Beef sirloin joint, and so on) on the Y axis “is better than” a vertical bar chart with the same labels crowded along the X axis.]
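
The rules above can be sketched as a small decision function. The slide does not define "long" or "many", so the thresholds below (12 characters, 8 bars) are arbitrary assumptions, as are the function and parameter names.

```python
def categorical_axis(labels, uses_lines=False, interval_scale=False,
                     max_label_chars=12, max_bars=8):
    """Return 'x' or 'y': the axis on which to place a single categorical variable."""
    if uses_lines or interval_scale:
        return "x"
    long_labels = any(len(label) > max_label_chars for label in labels)
    many_bars = len(labels) > max_bars
    return "y" if (long_labels or many_bars) else "x"


print(categorical_axis(["Q1", "Q2", "Q3", "Q4"], interval_scale=True))
print(categorical_axis(["Beef sirloin joint", "Fresh pork mince", "Lamb"]))
```

The first call keeps quarters along the X axis; the second moves the long product names to the Y axis, where horizontal bars leave room for the text.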
Step 3: Determine where to display each variable – Two or three variables
If the graph involves two or three variables, you must decide which to display along the axes and which to encode using distinct versions of another visual attribute, such as colour.
With a line graph, place the variable that is most important to your message along the X axis.
With a bar graph, encode the variable whose items you want to make it easiest to compare using a method other than association with an axis. Notice how much easier it is to compare appointments and sales than the regions, because they are positioned next to one another.
[Figure: bar chart of Appointments and Sales by region (Wales and West, London And South East, Scotland and North, Midlands) on a scale from 0 to 200.]
Step 3: Determine where to display each variable - the problem of the fourth variable
This solution involves a series of small graphs, arranged in the same way as a graph with three variables, all arranged together in a way that can be seen simultaneously. Each graph is alike, including consistent scales, differing only in that each features a different item of a categorical variable. Each graph varies according to a fourth variable, which is sales channel (e.g. product). Using small multiples to support an additional variable is a powerful technique. Graphs can be arranged horizontally, vertically, or even in a matrix of columns and rows. If you need to display one more variable than you can fit into a single graph, select this approach.
[Figure: three small-multiple horizontal bar charts (Face-Value, Rewards, Big Exchange) by region, each on the same scale from 0 to 200.]
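
Small multiples only compare fairly when every panel uses the same quantitative scale, so the range should be computed once across all panels rather than per panel. A minimal sketch follows, with hypothetical panel values and a hypothetical helper name.

```python
def shared_scale(panels, start_at_zero=True):
    """One (low, high) range that fits the values of every panel."""
    all_values = [value for values in panels.values() for value in values]
    low = 0 if start_at_zero else min(all_values)
    return low, max(all_values)


# Hypothetical values per panel (one panel per sales channel).
panels = {
    "Face-Value": [120, 85, 150, 97],
    "Rewards": [60, 45, 75, 40],
    "Big Exchange": [183, 112, 100, 91],
}

scale = shared_scale(panels)
print(scale)  # apply this same range to every panel's quantitative axis
```

`start_at_zero` defaults to true because the panels in the figure use bars, which need a zero-based scale.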
Step 4: Determine the best design for the remaining objects - Scale
It’s now time to make a series of design decisions that remain, including the scales and text. These decisions are concerned with the placement and visual appearance of items.
If the graph will be used for analysis purposes that require seeing the differences between values in as much detail as possible, narrowing the scale can be useful. Generally, you should adjust the scale so that it extends a little below the lowest data value and a little above the highest.
If you are using bars to encode the data, but your message could be better communicated by narrowing the scale, remember to switch from bars to points!
[Figure: line graph of Sales by quarter (Q1 to Q4) on a scale from 800 to 1800; chart of A and B on a scale from 470 to 560.]
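
"A little below the lowest value and a little above the highest" can be sketched as padding the data range by a small fraction. The 5% default below is an arbitrary assumption, not a figure from the slide, and the function name is hypothetical.

```python
def narrowed_scale(values, padding=0.05):
    """Axis range extending slightly beyond the lowest and highest data values."""
    low, high = min(values), max(values)
    margin = (high - low) * padding
    return low - margin, high + margin


# Hypothetical values for A and B.
low, high = narrowed_scale([500, 550], padding=0.1)
print(low, high)
```

Because the resulting scale no longer includes zero, this range is only suitable for points or lines, not bars.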
Step 4: Determine the best design for the remaining objects - Legend
If a legend is required and you are using lines, label the lines directly.
If you are using bars, place the legend above the plot area with the labels arranged side-by-side in the same order as the bars.
[Figure: line graph of Sales by quarter (Q1 to Q4) with the lines labelled directly (London and South East, Wales and West); bar graph of Budget and Actual for Rewards and Exchange with the legend above the plot area.]
Step 4: Determine the best design for the remaining objects – Tick Marks and Scales
Tick marks are only necessary on quantitative scales, for they serve no real purpose on categorical scales. Between 5 and 10 tick marks usually does the job; too many clutter the graph and too few fail to give the level of detail needed to interpret the values.
If the graph can be read with the scale in only one place (left, right, top, bottom), place it nearest the data you want to emphasise or make easiest to read.
If the graph is so large it cannot be read with only one scale, place it in both positions (top and bottom, left and right).
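
One way to land between 5 and 10 tick marks is to try "round" step sizes (1, 2, or 5 times a power of ten) until the count fits. This is only a sketch of that idea; the 1-2-5 convention and the function name are assumptions, not something the slide prescribes.

```python
import math


def tick_step(low, high, min_ticks=5, max_ticks=10):
    """Pick a 1/2/5 x 10^k step that gives between min_ticks and max_ticks tick marks."""
    span = high - low
    exponent = math.floor(math.log10(span / max_ticks))
    for power in (exponent, exponent + 1):
        for multiplier in (1, 2, 5):
            step = multiplier * 10 ** power
            # Count the multiples of `step` that fall inside [low, high].
            ticks = math.floor(high / step) - math.ceil(low / step) + 1
            if min_ticks <= ticks <= max_ticks:
                return step
    return span / min_ticks  # fallback: evenly divide the range


step = tick_step(800, 1800)
print(step)
```

For a scale running from 800 to 1800 this picks a step that puts ticks at 800, 1000, and so on up to 1800.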
Step 4: Determine the best design for the remaining objects – Gridlines
Unless they are necessary to understand your message or divide a scatter plot into sections, leave them off, and when used, subdue them visually. Bear in mind graphs display patterns and relationships. If you want to communicate data with a high degree of quantitative accuracy, use a table.
[Figure: line graph of Sales by quarter (Q1 to Q4) on a scale from 800 to 1800.]
Step 4: Determine the best design for the remaining objects – Descriptive Text
Although the primary message of a graph is carried in the picture it provides, text is always required to some degree to clarify the meaning of that picture. Some text is often needed, including:
– A descriptive title
– Axis titles (unless the nature of the scale and its unit of measure are already clear)
Numbers in the form of text along quantitative scales are always necessary and legends often are. It is often useful to include one or more notes to describe what is going on in the graph, what ought to be examined in particular, or how to read the graph, whenever these bits of important information are not otherwise obvious.
[Figure: line graph “Widget Sales by Region and Calendar Quarter (2007)” with directly labelled lines (London and South East, Wales and West) and the note “Widget sales in London and South East have been ahead of Wales and West with the exception of Q3”.]
Step 5: Determine if particular data should be featured, and if so, how
The final major stage in the process involves highlighting particular data if some data is more important than the rest. Whatever the reason, you have a number of possible ways to make selected data stand out. One of the best and simplest ways is to encode those items using bright or dark colours, which will stand out clearly if you’ve used soft colours for everything else. Other methods include:
– When bars are used, place borders only around those bars that should be highlighted.
– When lines are used, make the lines that must stand out thicker.
– When points are used, make the featured points larger or include fill colour in them alone.
[Figure: bar chart of Sales for A and B and line graphs of Sales by quarter (Q1 to Q4) with selected items highlighted.]
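
The highlighting advice can be sketched as a style table built before plotting: soft defaults for everything, a bright colour and a thicker line for the featured series only. The colour codes, line widths, and function name below are arbitrary assumptions; the resulting dictionary could be passed to whichever charting tool is in use.

```python
def series_styles(series_names, featured, soft="#b0b0b0", bright="#d62728"):
    """Soft colour and thin line for most series; bright and thick for featured ones."""
    return {
        name: {
            "color": bright if name in featured else soft,
            "linewidth": 3.0 if name in featured else 1.5,
        }
        for name in series_names
    }


styles = series_styles(
    ["London and South East", "Wales and West", "Midlands"],
    featured={"London and South East"},
)
print(styles["London and South East"], styles["Midlands"])
```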
Remember to follow this process for graph selection and design in order to communicate your information in the most effective manner
– Determine your message and identify your data
– Determine if a table, graph, or combination of both is needed to communicate your message
– Determine the best means to encode the values
– Determine where to display each variable
– Determine the best design for the remaining objects
– Determine if particular data should be featured, and if so, how
Summary
Whenever you create a graph, you have a choice to make: to communicate or not. That’s what it all comes down to. If you have something important to say, then say it clearly and accurately. These guidelines are designed to help you do just that.
Which graph makes it easier to determine whether Mid-Cap US stocks or Small-Cap US stocks have a greater share? (A or B)
Which of these line graphs is easier to read? (A or B)
Which of these tables is easier to read? (A or B)
Which graph makes it easier to focus on the pattern of change through time, instead of the individual values? (A or B)
Only one of these graphs accurately encodes the values. The other skews the values in a misleading manner. Which graph presents the data accurately? (A or B)
Which map makes it easier to find all of the counties with positive growth rates? (A or B)
Which graph makes it easier to determine R&D’s travel expense? (A or B)
In which graph are the labels easier to read? (A or B)