Transcript - Data Visualisation - Design and Principles
[Unclear] words are denoted in brackets
Webinar: Data Visualisation – Design and Principles
22 March 2018
Video & slides available from ANDS website
START OF TRANSCRIPT
Gerry Ryder: So good afternoon, everyone, and welcome to the webinar today. My name
is Gerry Ryder and it's my pleasure today to host this webinar about data
visualisation. It's my pleasure to introduce Martin Schweitzer. Martin's
currently working with ANDS as a data technologist. He has a background in
computer science and a particular interest in data visualisation, data science
and user interface design. He has a varied professional background which
includes photography, working on large IT systems, lecturing, as well as
running workshops and training courses.
Martin is currently seconded to ANDS from the Bureau of Meteorology
where he's largely responsible for the climate record of Australia. Today
Martin is presenting for us the first in a series of two webinars focused on
data visualisation. This first webinar will focus on visualisation design and
principles while the second will focus on tools and techniques. So having
covered off on those introductions it's my pleasure now to hand over to
Martin for our presentation today. Thank you, Martin.
Martin Schweitzer: Thanks very much, Gerry, and hello, everybody. I'll just jump straight in. So
when asked to present a series on visualisation the first question, I guess,
that everybody will have asked is what is the visualisation. I wanted it to be
slightly broader than just presenting graphic data so my definition of the
visualisation is that it's a visual explanation. It's anything that helps us
understand something by looking at it. A typical example is something that
should be familiar to most people, a map of the underground. One of the
things that makes this a good visualisation is it helps show the relationships
between the different objects inside this and how people in this case
understand how to get from one point to another.
If you're trying to imagine looking at a text description of how to get, for
example, from Edgware Road to Blackfriars, it would be particularly
complex, particularly if, for example, somebody told you that Tottenham
Court Road is closed. One of the things that make this visualisation famous
was that the designer discovered that when you're underground it's really
only just the relationships that mattered. The actual exact geographical
location is a lot less interesting, and that can be seen in this visualisation. So
we'll just have a look at this. This shows the actual place on the map and
then it morphs to what it looks like on the underground map.
It just cycles through what the locations really look like and the underground
map. So once again a beautiful visualisation of how the underground map
actually maps to the real locations in London. Yet another example, often
for people who may be a bit 3D challenged would be familiar - many people
would be familiar with this IKEA visualisation that shows us the correct way
to construct a bookcase. Why are visualisations important? Why don't we
just have text description? We have a lot of descriptive statistics. Well, one
of my favourite examples and something that really made my hair stand on
end the first time I saw it was this thing called Anscombe's quartet.
Many people may be familiar with this. It's a famous example. What we
have here are four data sets: one, two, three and four in Roman numerals.
Each one is a series of X and Y values. Just looking at them it's very hard to
read much into them, but we can look at their summary statistics and - sorry
- for example, they all have the same average value for the X. They all have
the same average value for the Y. The sample variance of both the X and Y is
the same in all four of them. The correlation between the X and Y is almost
identical in all four of them. The linear regression is exactly the same. So a
statistician may be tempted to just say, well, these numbers are pretty much
the same.
However, as soon as we look at the visualisation - in other words see the
values plotted - we see something quite different. So just one example of
how seeing a visualisation is very different to looking at the raw data.
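The claim about matching summary statistics can be checked directly. The following is a minimal sketch, not part of the webinar's own material: it uses the published Anscombe (1973) values and computes each statistic by hand with the standard library. The function name `summarise` is illustrative. The four sets agree on the mean and variance of X exactly, and on the remaining statistics to about two decimal places.

```python
from math import sqrt

# Anscombe's quartet (Anscombe, 1973). Sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Mean, sample variance, correlation and least-squares line for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarise(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Printing the rounded statistics gives four near-identical rows, which is exactly why the plots come as a surprise.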
Another example that I've taken is - we'll just go to some text and we will
have a look at a file. This is the contents of a file. As you can see, it's
probably not easy to interpret what's in the file. Most files when they're
stored on a disc are just bunches of numbers. If I told you that these
numbers represent RGB values arranged according to an XY grid, once again
it may not be obvious what the numbers represent.
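The idea of reading a run of numbers as RGB values on an XY grid can be sketched as follows. This is an illustrative example, not the file used in the presentation: the sample values, the grid width and the function names `to_pixel_grid` and `to_ppm` are all assumptions. It groups a flat list into rows of (R, G, B) pixels and writes them out as a plain-text PPM image, a simple format that many image viewers can open.

```python
def to_pixel_grid(values, width):
    """Group a flat list of numbers into rows of (R, G, B) pixels."""
    if len(values) % (3 * width) != 0:
        raise ValueError("values do not fill a whole number of rows")
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]


def to_ppm(grid):
    """Render the pixel grid as a plain-text PPM (P3) image."""
    height, width = len(grid), len(grid[0])
    lines = ["P3", f"{width} {height}", "255"]
    for row in grid:
        lines.append(" ".join(str(channel) for pixel in row for channel in pixel))
    return "\n".join(lines) + "\n"


# Hypothetical file contents: twelve numbers that mean little on their own.
raw = [255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255]

grid = to_pixel_grid(raw, width=2)   # a 2 x 2 image: red, green / blue, white
ppm_text = to_ppm(grid)
print(ppm_text)
```

Saving `ppm_text` to a file with a `.ppm` extension turns the same twelve numbers into four coloured squares.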
However, if I do this and present them as an image - excuse me - suddenly
we see, okay, we have an image. So as numbers the numbers meant - or as
data the numbers meant absolutely nothing. However, as soon as we
visualise it as an image it will make sense. So as Gerry mentioned, I've been
interested in visualisation for a very long time. In fact, over 20 years. One
of the first books I came across was by Edward Tufte and was one of the
seminal works. At that time I think it wasn't really realised that it would
become a seminal work. He wrote a book called The Visual Display of
In it he says, excellence in statistical graphics consists of complex ideas
communicated with clarity, precision and efficiency. For the rest of the
presentation I'm going to try and expand some of these ideas. So he came
up with a few principles. The first one is graphical displays should show the
data. We'll go through these principles first and then we'll look at examples.
It should induce the viewer to think about the substance rather than about
methodology, avoid distorting what the data has to say, present many
numbers in a small space, make large data sets coherent, and reveal the
data at several levels of detail from a broad overview to the fine structure.
In this book he's got many fine examples; however, I've tried to find more
modern examples and I've taken some of the examples from the work that I
do. So the first one, show the data. What we're looking at here is a rainfall
map of Australia and the government instituted a plan where they said they
would give farmers concessionary loans if they were in a region that had
suffered a one in 10 year rainfall deficiency or one in 20 year rainfall
deficiency. So the map we're seeing here is a map where users can typically
zoom in and out, but what we've done is to show only those areas where -
that are affected or covered by this concessionary loan.
So I guess one of the things is we could have shown a typical rainfall map,
but ideally make this as simple as possible and show only the data, so the pink
and red areas are the areas that had been affected by either a one in 10 or
one in 20 year rainfall deficiency. Next, induce the viewer to think about the
substance rather than about the methodology. So what we're looking at
here is in Kyoto, Japan cherry blossoms are a big thing. In Kyoto they've
been recording the peak of the cherry blossom season since the year 800.
So they have over 1000 years of data. What somebody's done is to plot all
What we see is that for about a century they pretty much peak between 10
and 20 April. However, since the early twentieth century they start
blossoming earlier and earlier and a lot of people would say, well, this is a
signal of climate change. However, what we wanted to show about this
graph is that the person has plotted the actual data points using a little
image of cherry blossoms which is quite cute. But they also noted in an
article they wrote about it that initially they had plotted it with a cherry
blossom with six petals until somebody pointed out that cherry blossoms
only have five petals.
The point about that is if people are thinking about how many petals the
cherry blossoms have rather than about what the graph is saying maybe
they should have thought more about the substance than the methodology.
But nonetheless, I think with any of these rules often it's a good thing to
break a rule now and again because in this case, for example, I certainly
remembered this graph long after I'd seen it because I remembered the
issue with the cherry blossoms. The next one was avoid distorting the data
and here we're going to do something exciting and that's do it live. So what
I've done is we're now seeing what's known as Jupyter Notebook.
I imagine a lot of people would be familiar with Jupyter Notebook. Jupyter
Notebook allows us to run Python code and in the next webinar the whole
webinar will be based around looking at our work in Jupyter Notebook;
however, this is a small demo that I've got in this presentation. What we're
looking at here is storage levels in the dams that are around Melbourne. So
the first graph I'll pull up I'll just - so this is fantastic at work. What we see in
this graph is it looks like the Thomson, Cardinia and Upper Yarra dams are
really low and all the rest of them are almost full. So we may worry a bit
However, when we look at this graph we see that we started - the base of it
was 60 per cent full. So Cardinia, for example, is - well, let's take Thomson.
It's actually almost 65 per cent full so it's really not that bad. When we look
at the graph plotted against - starting at zero we note as well it doesn't look
that bad. We may also look at this and say, well, the other dams are all over
80 per cent so we've got nothing to worry about. However, not all these
dams are the same size, so looking at only the percentage can be a bit
misleading. So let's run this one.
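Both distortions can be put into numbers. The sketch below is not the notebook from the demo. The 65 and 85 per cent levels echo the figures mentioned above, while the capacities are made-up values in arbitrary units and the function names are illustrative. It shows how far a truncated axis exaggerates the difference between two bars, and why percentages alone hide the effect of dam size.

```python
def drawn_ratio(a_pct, b_pct, axis_start_pct):
    """How many times taller bar A is drawn than bar B for a given axis start."""
    return (a_pct - axis_start_pct) / (b_pct - axis_start_pct)


def stored(capacity, pct_full):
    """Volume of water currently held."""
    return capacity * pct_full / 100


def empty_space(capacity, pct_full):
    """Volume still needed to fill the dam."""
    return capacity - stored(capacity, pct_full)


# Axis starting at 60 per cent versus axis starting at zero.
truncated = drawn_ratio(85, 65, axis_start_pct=60)   # bar looks 5 times taller
honest = drawn_ratio(85, 65, axis_start_pct=0)       # bar is about 1.3 times taller

# Hypothetical capacities (arbitrary units): one large dam and three small ones.
LARGE_DAM = (1000, 65)
SMALL_DAMS = [(100, 90), (60, 85), (40, 95)]

gap_in_large = empty_space(*LARGE_DAM)
water_in_small = sum(stored(capacity, pct) for capacity, pct in SMALL_DAMS)

print(f"drawn ratio with truncated axis: {truncated:.1f}")
print(f"drawn ratio with zero-based axis: {honest:.2f}")
print(f"gap in large dam: {gap_in_large:.0f}, water in small dams: {water_in_small:.0f}")
```

With these assumed capacities the small dams hold less water in total than the empty space in the large one, even though each small dam is at a higher percentage.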
What we see here is that the amount of space in the Thomson Dam, there's
probably not enough water in all of these smaller dams to even fill that gap
that's in the Thomson Dam. So that's what we mean when we say avoid
distorting the data. Try and make sure that we're telling a story with
integrity. The next principle was to present many numbers in a small space.
The map that we're looking at here is Australian rainfall deciles. So this is
that - the areas that are in this bright red have received the least rainfall this
December, they're in the lowest one per cent of December rainfalls.
These tiny dark blue patches are in the highest one per cent of rainfall that -
this record goes back to 1910 so they take every year from 1910. We say
present many numbers in a small space. So what we're looking at here is a
grid, and they're roughly 640 by 800 grid cells. So each one is calculated and
for each one there's 117 years of data. So what we're looking at is almost 36
million data points; however, we've condensed those 36 million data points
into one, well, simple map. So I think this is a fantastic example of
presenting many numbers in a small space. Sometimes, as I said, we want
to break the rules and get something where we break the rules.
This was the recent tropical cyclone. We've got a visualisation that shows
the current position of the cyclone. This arguably is just one data point;
however, it's a really important data point, particularly if you're living in the
north of Western Australia and you want to know how close the cyclone is
or whether it's got a chance. Also we can - by clicking on that one point we
see a far more detailed image which then takes us into seeing the data at
different levels. The next one was around making large data sets coherent.
This is something that at the bureau we're very interested in. How do you
communicate things like probability?
When people hear almost certainly do they think that an event is more
probable or less probable than if they hear highly likely or if they hear very
good chance? So what they've done here is taken all these terms and
presented them using a technique known as KDE on one graph. So we can
very easily compare that, for example, if somebody says, chances are slight,
that people think that there's actually slightly more chance of an event
happening than if we, for example, say, it's highly unlikely, or if we say,
there's almost no chance. So that covers off on Tufte.
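For readers unfamiliar with KDE (kernel density estimation), here is a minimal sketch of the technique. It is an assumption-laden illustration, not the code behind the graph shown: the survey answers are invented, the bandwidth of 5 is an arbitrary choice, and the function names are hypothetical. Each answer contributes a small Gaussian bump, and the bumps are summed into one smooth curve per phrase.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2 * pi))

    def density(x):
        # One Gaussian bump per sample, normalised so the total area is 1.
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical survey answers: the probability (per cent) people assign to a phrase.
RESPONSES = {
    "almost certainly": [90, 95, 95, 98, 99, 85, 97],
    "highly unlikely": [5, 10, 2, 8, 5, 15, 3],
}

STEP = 0.5
GRID = [-50 + STEP * i for i in range(401)]   # -50 to 150, wide enough for the tails

curves = {}
for phrase, answers in RESPONSES.items():
    density = gaussian_kde(answers, bandwidth=5.0)
    curves[phrase] = [density(x) for x in GRID]


def peak_location(phrase):
    """Grid value at which the estimated density for a phrase is highest."""
    ys = curves[phrase]
    return GRID[ys.index(max(ys))]


for phrase in RESPONSES:
    print(phrase, "peaks near", peak_location(phrase))
```

Drawing every curve in `curves` against the same axis is what makes the phrases directly comparable.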
The next few slides are some of my ideas and some of my experience in
developing visualisations and some things that I feel are important. One of
the most important things in any visualisation is that you actually have
something interesting to talk about the data. Whenever I see somebody
saying, we've got this data, it looks pretty boring. Can we just create a
visualisation, well, that's when the hairs on my neck prickle a bit. So this is a
famous video. It started off as a TED Talk by the Swedish Hans Rosling.
Martin Schweitzer: Okay. I think people get the idea. Now, one of the things that strikes me
about that video is talking about inequality, et cetera, and gave this TED
Talk. At a similar time, Thomas Piketty, who was famous for his book on
capitalism, also gave a TED Talk. I watched both talks. Both were equally
impressive. I thought Piketty's was the more impressive. However,
Rosling's - the one you've just seen - got 10 times as many views roughly as
Piketty's, and I think the real reason it got so many views was because it had
such a story here. It had such remarkable visualisation and graphics.
So it certainly says that it's important. Obviously Rosling is a very - or was a
very impressive storyteller and was just a very impressive presenter and so
did it really well. Of course, not all of us have his talents; however, we can
all do good or great visualisations. So here's a simpler graphic and this one
shows the trend in maximum temperatures from 1970 to 2016. So
wherever the graph is red the average maximum temperature has been
increasing and wherever the graph is blue the maximum temperature has
been decreasing over the years. I think this one tells quite an alarming
story.
Here's another visualisation and this one I've got three slides which show a
progression of how we're trying to convey something. So in the first slide
the person has just taken the data and they've put it - this is rainfall data.
They've started at 1900 and showed how much rainfall up to the year 2010.
Now, there are two large influences on rainfall. One is the ENSO which is -
often we hear that in a La Niña system or an El Niño system. The other one
is what is marked as IOD which is Indian Ocean Dipole. Once again, these
can be either positive or negative.
So we've got two, four, six, seven different colours in the graph showing that
when this rainfall fell what kind of system we were in. However, this doesn't
really tell a good story. If we look at it having been rearranged we see that
the blue lines on the right when - all the years where we had a lot of rainfall
all tended to be where we had a La Niña and a negative Indian Ocean
Dipole. The red and brown on the left were during generally El Niño years.
However, we can improve this as well because we've got seven different
things. We have to keep looking at the colours, move forwards and
backwards.
So here's a graph where what we've done is we've plotted the IOD along the
bottom going from negative to positive. We've plotted the ENSO along the
left-hand side. So these numbers in the top right we can see had a strong
ENSO signal, strong La Niña, and a positive IOD, while these numbers to the
left had a - sorry. These are the La Niña and the negative IOD. We can see
as it gets stronger how it affects the rainfall. Here's another graph which
also tells quite an alarming story. This is the water supply in Cape Town and
in 2013/14 we can see they typically get their rainfall in winter.
So around about - from October onwards the dam levels start falling.
Because for about the last five years they haven't been - there hasn't been
good rain, they've continually been falling each year progressively. That's
2013, 2014, 2015, 2016, up till this year which is 2017/18. We see when I
pulled up this graph it was between January and February and we were over
there and they were projecting that around April/May/June Cape Town
could run out of water. There were a few projections. One is if people use
600 megalitres a day of water, one with 500. One is if they were using 600
megalitres and they've started up desal plants so what would happen.
All of them show pretty dire consequences. A visualisation like this really
does tell a story. So the next principle is keep your graph as simple as
possible. I've made a very quick 3D graph. I've just made a fictitious one,
which is how many people attended at morning teas and maybe the person
that attends the most morning teas at the end of the year gets a prize and
the person who has attended the least gets a wooden spoon. So this was
my first graph and I felt, well, this can always be improved. Whenever I see
a 3D graph if it's not displaying 3D data I'm a little bit disturbed. So I
modified it so we've now got a 2D graph.
However, the numbers are in the [box]. We probably don't need those grids
and as many of them. We certainly don't need our dotted and solid line
grids, so I cleaned that up a bit. So there's a simpler graph. However, when
looking at that graph - and often I see graphs like this - the first question I
ask is, what do those colours mean? Why are there different colours? Well,
in this case the colours mean absolutely nothing, so I've got rid of the
colours. The next thing is getting back to this idea maybe of telling a story.
What am I trying to say? Well, really what I'm trying to do is find out who
attended the most and least morning teas.
So maybe by improving the graph, well, I've now put the least - I've ordered
them from least to most and now it's quite obvious who's attended the least
and who's attended the most. So is there anything else we can do to make
this presentation simpler or to remove any unnecessary data et cetera? This
is a trick question, but of course there is. Well, in this particular case I think
we can just remove the graph altogether. I don't think that that
visualisation has given us any more information than simply looking at a
table of numbers. The table remains ordered. I get exactly that same
So it's probably important to ask that question occasionally. Do we really
need a graph for this data, or do we really need a visualisation for this data?
I think Antoine de Saint-Exupéry said it best when he said, perfection is
achieved not when there's nothing more to add but when there's nothing
left to take away. However - this was a however - Einstein was apparently
famous for saying, make it as simple as possible but no simpler. So here's
another example of a visualisation. This is called a skew-T log-P graph, and
this is used by meteorologists every single day. Temperature is on these
diagonals. The pressure is going along this way.
The reason it's called log-P is because at the bottom you see the gap
between 100 - 900 and 1000 is much smaller than the gap between 200 and
300. So even the scale appears to be changing. There are two different
colour lines. Each of those lines has a meaning. The red line is what was
recorded today and the blue yesterday. In case - well, I imagine most
people aren't familiar with these graphs. So what this is actually plotting is
at a lot of locations around the world they send up weather balloons or
sondes. So this is plotting the temperature as the balloon is moving up
through the atmosphere. So we can see that it's getting cooler, et cetera.
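Going back to the log-P axis, the uneven spacing follows from plotting the logarithm of pressure rather than pressure itself. A minimal sketch, assuming a base-10 logarithm (any base gives the same proportions) and an illustrative function name:

```python
from math import log10


def log_axis_gap(p_low, p_high):
    """Distance between two pressure levels on a logarithmic axis."""
    return log10(p_high) - log10(p_low)


gap_900_1000 = log_axis_gap(900, 1000)   # about 0.046
gap_200_300 = log_axis_gap(200, 300)     # about 0.176

print(f"900 to 1000: {gap_900_1000:.3f}")
print(f"200 to 300:  {gap_200_300:.3f}")
print(f"ratio: {gap_200_300 / gap_900_1000:.1f}")
```

Both intervals cover 100 units of pressure, yet the upper one takes nearly four times as much room on the axis.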
The second line is the dew point. So we can see, for example, if the dew
point crosses the temperature we're going to get precipitation or rainfall
and so on. On the right-hand side we've got another particularly interesting
thing being visualised here and these are called wind barbs. The direction of
the wind barb shows the direction of the wind. So these ones pointing
upwards show northerly wind. The number of feathers shows the speed of
the wind. So the short ones are five knots. The long one is 10 knots. A long
and a short is 15 knots and so on. I won't go too much into this. But the fact
is that for a meteorologist, this is a really important graph.
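The feather encoding on a wind barb is additive. Under the standard convention a short feather is 5 knots, a long feather is 10 knots and a filled pennant is 50 knots, so a long plus a short reads as 15 knots. A minimal sketch of that decomposition follows. The function name is illustrative, and rounding to the nearest 5 knots with halves rounded up is an assumption about how a chart might treat in-between speeds.

```python
def barb_elements(speed_knots):
    """Split a non-negative wind speed into pennants (50), long (10) and short (5) feathers."""
    rounded = 5 * int(speed_knots / 5 + 0.5)   # nearest 5 knots, halves rounded up
    pennants, rest = divmod(rounded, 50)
    long_feathers, rest = divmod(rest, 10)
    short_feathers = rest // 5
    return {"pennants": pennants, "long": long_feathers, "short": short_feathers}


for speed in (5, 10, 15, 65):
    print(speed, "knots ->", barb_elements(speed))
```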
It's not as simple as a bar chart or a line graph, et cetera, but it's serving its
purpose. That's the most important thing. A visualisation has to be fit for
purpose. The next thing we'll look at is colour. I'm not going to go into
colour in a lot of detail, the main reason being because you can spend hours
talking about colour to really understand it thoroughly. I've got a few
suggestions, but the most important one, I think, is for colour if it's
important please try and find somebody who's an expert. There are lots of
different factors to consider, things like colour blindness, common
conventions, cultural differences and so on.
This is just a very simple example. These are from images of blood travelling
through an artery. This one - well, basically they showed these different
images to a lot of doctors and asked which one they preferred. Most
doctors came up with this A. However, when they asked people to diagnose
the issues with these things they were - then I think the best one was F or G
where they were able to identify the most issues or see the most problems
with a patient. So even though they thought that this one was the easiest
one to read - the colourful one - admittedly they were used to those colours,
et cetera. It's not always the case.
The reason I'm saying this is it really does say that colour can be a tricky
issue and that really it does need some expertise and, in this case, it was
actually through some research. In the slides there's a reference to this
paper that talks about this. It's quite an interesting paper. Just on the topic
of colour, here are some examples from the bureau once again. This one is
showing rainfall. It's using a gradated scale, so darker means more rainfall.
They've used the colour blue which makes sense because the more
saturated blue tends to show areas that have more saturation in terms of
rainfall.
This map is not showing how much rainfall but it's showing how variable the
rainfall is, in other words, how much it differs from year to year. So it
wouldn't have made sense to use blue here because some areas can be very
dry but at the same time have a lot of variability of - or have very little
variability. Areas that may be very wet may have a very small variability
because they're wet all year round, just as areas that are wet all year around
have low variability. So this one is showing - that's chosen a different colour
for this one. This one is showing how much rainfall in this case fell in the
week of 23 January. This is using a scale that people who are looking at this
type of map are familiar with.
The white areas have had no rainfall or not been able to record it, and these
dark colours are the areas of the highest rainfall. Once again we see that
this scale is not linear, so there's a colour for between one and five
millimetres. There's a different colour for between 300 and 400 millimetres.
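A non-linear colour scale like this is usually a lookup against a list of class boundaries. The sketch below illustrates the idea with invented boundaries. The map's real legend may differ, and the function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical class boundaries in millimetres; the real legend may differ.
BOUNDARIES_MM = [1, 5, 10, 15, 25, 50, 100, 150, 200, 300, 400]


def colour_class(rainfall_mm):
    """Legend class index: 0 is below the first boundary, 11 is above the last."""
    return bisect_right(BOUNDARIES_MM, rainfall_mm)


def class_width(index):
    """Width in mm of a class that has both a lower and an upper boundary."""
    return BOUNDARIES_MM[index] - BOUNDARIES_MM[index - 1]


low = colour_class(3)      # falls in the 1 to 5 mm class
high = colour_class(350)   # falls in the 300 to 400 mm class

print(f"3 mm   -> class {low}, width {class_width(low)} mm")
print(f"350 mm -> class {high}, width {class_width(high)} mm")
```

Each class gets one colour, so a 4 mm band and a 100 mm band take up the same space in the legend.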
I think it's useful when looking at visualisations also to see examples of
maybe things that we can try and avoid. This is always the part of this
presentation that I feel uneasy about, but I think it's just worth having a look
at an example so we'll have a quick look at this one. So what this is talking
about is average household debt in America by this person who is a financial
It's how much debt you have. It's an infographic. So the first thing I looked
at when I saw this is we've got some sort of thing that looks like a
visualisation, and I tried to work out what it's telling us. I looked at it and I
thought, well, why are some people green and some people - is it the green
ones have less debt? No. All different sizes. They've - I realised that it
probably doesn't mean anything. It's just decoration, so we can move on.
So the next thing is the total owed by the average. We see credit cards are
16,000, mortgages are almost 10 times that amount, but the mortgages
aren't actually 10 times as long in the visualisation. 28,000 is a lot longer
So there's clearly no clear scale - well, I should just say there's no clear scale
on this. Once again, we've got different colours but, once again, they seem
just for decoration. The other thing is I couldn't understand why any type of
debt is 134,000 while mortgages are 176,000. So it wasn't quite clear what
any type of debt meant. Also credit cards and auto loans were lumped
together with mortgages which are more of an asset and some people
differentiate between things like mortgages, which they classify as good
debt and things like auto loans which are classified as bad debt.
The next one is how much does debt cost you. This is probably one of the
better ones, but there's no - given that she's used comparative scales in the
previous ones I was surprised that there wasn't any comparative scale. I
think one thing I did notice here was that this figure from memory didn't
really add up. This was an interesting one, medical debt on the rise. There
were a few issues with this but one of the things we notice is if that's 63 per
cent then that one is about 37 per cent and yet that 37 per cent segment
actually looks a bit bigger than the 42 per cent segment.
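As a quick check on slice sizes, each slice of a pie chart should span an angle proportional to its share of the whole. A minimal sketch, with an illustrative function name:

```python
def slice_angle(percent):
    """Angle in degrees that a slice of the given percentage should span."""
    return 360 * percent / 100


print(f"42 per cent -> {slice_angle(42):.1f} degrees")
print(f"50 per cent -> {slice_angle(50):.1f} degrees")
```

A 42 per cent slice spans a little over 151 degrees, so it should stop visibly short of the 180 degree halfway line.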
Considering that halfway across would be 50 per cent I don't think that that
42 per cent is accurately reflected in the pie chart. I won't go into the
colours that have been chosen or talk much more about pie charts. A lot of
people have very strong opinions about how useful pie charts are. We now
come to debt broken down by age. In this one it actually looks as though
the colours may be meaningful because there are two red bars, two orange
bars, and two green bars, but once again it just seems that the colours were
arbitrarily chosen. That's all I'll say about that, but - except to say I do think
- have a look at examples and always look critically.
Look critically at your own work at things that can be improved. But also
when looking at other things think about, okay, is this a good visualisation?
Is it a bad one? When you see something that looks good what makes it
look good? When you see something that looks okay maybe think, how
could it be improved? What could this person have done to make the story
clearer? So what are some techniques that you can use when doing a
visualisation that will make it better for the people looking at it? One of the
first ones I talk about is natural mappings. What we're looking at here is
what's called a wind rose. What this is showing is wind in eight - not
quadrants but eight sectors - and how windy it is.
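The binning behind a wind rose can be sketched in a few lines. This is an illustration under stated assumptions: the observations are invented, directions follow the meteorological convention of degrees clockwise from north for the direction the wind blows from, and the function names are hypothetical. A full wind rose would also split each sector by wind speed.

```python
from collections import Counter

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector(direction_deg):
    """Eight-point compass sector for a direction in degrees clockwise from north."""
    # Shift by half a sector (22.5 degrees) so that north covers 337.5 to 22.5.
    return SECTORS[int(((direction_deg % 360) + 22.5) // 45) % 8]


def rose_counts(directions):
    """Number of observations falling in each sector."""
    return Counter(sector(d) for d in directions)


# Hypothetical observations of the direction the wind is blowing from.
observations = [350, 10, 5, 20, 180, 200, 355, 90]
counts = rose_counts(observations)

for name in SECTORS:
    print(f"{name:>2}: {'#' * counts[name]}")
```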
So this is Melbourne Airport that we're seeing here and we see that most of
the winds at Melbourne Airport are northerly. These are the averages taken
over a particular period. As we go out in this telescope it shows us stronger
and stronger winds. So, for example, we hardly ever have, let's say, gale
force winds in this south-westerly direction. There's very few easterlies at
Melbourne Airport. But the natural mapping is if it's facing upwards then
we can see straightaway it's a northerly wind. We've seen this graph before,
but the important thing is to highlight relevant information.
So if all five of these lines were the same colour it wouldn’t be quite clear
what the story's telling us, but it - given that this one is highlighted and the
others are muted we can see straightaway it - our focus shifts to this one.
The next thing, make comparisons clear. So what this is comparing is arctic
ice. This is going back to 1879 and it's comparing the - as we're progressing
into the present. One of the things we see is it seems pretty clear that
there's less and less arctic ice as we're coming into the present. By
overlaying those plots one on top of the other it makes it a lot clearer.
Going back to this graph we see once again by plotting all these different
attributes on the same set of vertical axes it makes those comparisons much
clearer. So, for example, when we're comparing highly likely to very good
chance we can see quite clearly how they compare. The next thing is in this
case it's probably exaggerated but make the scale clear. This is showing the
stations in Australia that record - it's showing basically the largest difference
between two days - so between the maximum temperature on day 1 and
day 2. So at these stations there was a 25 degree or 27 degree difference.
So one day the maximum temperature was 10 degrees and the next day 37
degrees, for example.
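The quantity being mapped, the largest change in maximum temperature between two successive days, is simple to compute for one station. A minimal sketch, using made-up readings and an illustrative function name:

```python
def largest_day_to_day_change(daily_max):
    """Largest absolute difference between consecutive daily maximum temperatures."""
    if len(daily_max) < 2:
        raise ValueError("need at least two days of data")
    return max(abs(later - earlier) for earlier, later in zip(daily_max, daily_max[1:]))


# Hypothetical daily maxima in degrees Celsius for one station.
readings = [10, 37, 30, 28]
print(largest_day_to_day_change(readings))
```

Here the jump from 10 to 37 degrees gives a largest change of 27 degrees.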
As we went further north there's less difference between successive days in
temperature in terms of their records. Yet another visualisation, this time of
space, and we've got a very different scale here. It's probably hard to read
on the slide, but that distance there is 100 million light years across. So a
light year is pretty big. 100 million light years is 100 million times as big.
Finally, colour should add meaning and not detract. We come back to this
slide, which is how much Australia has - or the warming trend in Australia
since 1970. Here clearly colour is enhancing the meaning of what we're
trying to say here.
Use conventions. If we look at this time series of temperature, at first look it
may seem that temperature is actually declining. This is just a dummy slide I
created for this presentation. What I've done here is these temperatures
are actually - if we look carefully at these numbers we see the numbers are
actually decreasing as we go from left to right. Normally when we read from
left to right we expect time to increase - in other words, get either closer to
the present or further into the future. By turning it around we've defied
that convention and then obviously made this a whole lot harder to read.
There's a lot of ways to display different dimensions, and I'll just - sorry. I'll
just go there and I'll just skip this for the moment. We'll go back to it if
we've got a bit of time. So here's another slide showing how we can plot
dimensions very differently. In this graph or in this visualisation what we've
done is this is rainfall in Africa but across a range of latitudes going
from 30 south to 30 north. So the Y axis is latitude. The X axis is the month
of the year. The actual colours depict the rainfall during those months. So
what we see here is in the southern latitudes we get rainfall around
As we go north of 20 degrees north it's very dry and around about 10
degrees north they get mostly a winter rainfall. This way of plotting data is
known as a Hovmöller plot. These are called Chernoff faces and what this
does is allows us to plot multidimensional data by using faces. So Chernoff
said people, their brains are hardwired to really recognise faces quickly. So
what we can do is we've got about seven or eight different attributes we can
change. We can change the smile on their mouth. We can change the
length of their nose, the distance between their eyes, the amount by which
eyebrows are raised and so on.
So we've taken a dummy data set here comparing different universities,
different people across the universities, and then we've said, okay, we'll use,
for example, the eye colour to show how - where they are, [of data for
sharing] and maybe the length of the nose to show awareness of data
licensing, et cetera. So basically, it's a novel way of displaying data with a
high number of dimensions. As I keep saying, it's always good to break the
rules. Some people may be familiar with this image. It's called pale blue
dot. If you're not familiar it's a visualisation - well, I guess any image can be.
But what it's showing is over there there's a pale blue dot. This photograph
was taken by Voyager 1 from out of space - well, from space. That pale blue
dot there, almost single pixel down there, is Earth. So often we're told to
make the data we're displaying significant and obvious. In this case the
strength of this visualisation comes from how insignificant that tiny little dot
on that photograph is, how insignificant this huge planet that we live on is.
Da Vinci has said, simplicity is the ultimate sophistication.
I've got a few things in my slides. I'll just go back to the slide that I was
trying to find earlier. Which one did we - for some reason - what I'll do is I'll
rewind that. So - okay. So what we're going to see is how Australia's
temperatures changed for the 12 months ending December 1910. I'll just
maximise this. This is an animation. The colour shows the year and we see
as we're coming more and more to the present the colours spiralling
outward, representing warming. So I guess what makes this visualisation
effective is not only the animation but also the fact that we were able to
draw a line which shows about 100 years of data which typically would have
been a very long line but in this case by wrapping it around the inner circle
we were able to show it all in one compact way.
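Editor's sketch (not part of the talk): the wrapping described above can be expressed as a conversion to polar coordinates, with the month setting the angle and the value setting the distance from the centre. The function name, the `base_radius` parameter and the assumption that the input is a list of monthly temperature anomalies starting in January are all hypothetical. The talk does not describe how the animation was actually built.

```python
import math


def spiral_points(monthly_values, base_radius=1.0):
    """Place a monthly series around a circle.

    Returns one (x, y, year_index) tuple per value. The angle comes from
    the month of the year, so each year makes one full turn. The radius
    grows with the value, so warmer months sit further from the centre.
    Assumes base_radius is larger than the most negative value, so that
    every radius stays positive.
    """
    points = []
    for i, value in enumerate(monthly_values):
        angle = 2 * math.pi * (i % 12) / 12  # one full turn per 12 months
        radius = base_radius + value
        year_index = i // 12                 # can drive the line colour
        points.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       year_index))
    return points
```

Drawing the points in order, one year at a time and coloured by `year_index`, would give an animation of the kind shown, with about 100 years of data fitting into a single compact figure.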
So finally, all visualisations are wrong. What do I mean? There's a famous
quote from George Box, the statistician, that said, all models are wrong. He
said, all models are wrong. The only question of interest is is the model
illuminating and useful. I've changed that to, all visualisations are wrong.
The question is is the visualisation illuminating, useful, and does it have
integrity? Thank you.
Gerry Ryder: Thank you so much, Martin, for that really valuable presentation that I'm
sure has given us all a lot of ideas and some things to look forward to in the
next webinar where we'll actually see some of the tools that you've used to
create these examples. We do have time for questions if we have anyone in
the audience that would like to ask Martin a question about anything he's
presented on today. Please do put it into the question pod and I'd happily
relay that and put Martin on the spot. So we've got a number of people
thanking you, Martin, for a really interesting talk. We have got one
question, Martin, from [Mark Mackay] who's asked if you could suggest any
textbooks or papers that he could share with students.
Martin Schweitzer: Yes, I do, quite a few. I've actually put them in the slides. So at the end of
the slides there's some references. I believe the slides are going to be made
Gerry Ryder: Yes. That's correct. We'll have both the slides up as well as the recording
up. So you can have a look at the slides separately to the recording.
Gerry Ryder: Another question, Martin, can you provide the name of the visualisation
with the faces. Somebody's obviously liked that one.
Martin Schweitzer: Chernoff faces, C-H-E-R-N-O, either V or F-F.
Gerry Ryder: So perhaps we might put them - Susannah, we might be able to pop that in
the question box for people to see, C-H-E-R-N-O-V or F-F. Someone's -
Richard's asked, Martin, you've used Jupyter Notebooks. He's pre-empting
the next webinar. What sort of other technologies do you normally use to
build visualisations? Another related question is about open source software
for visualisations. So I know we'll cover that in the next webinar, but
perhaps a teaser today, Martin.
Martin Schweitzer: So definitely Jupyter Notebooks and Python. So the next webinar will focus
largely on Python. I also do a lot of work with web front-ends and
visualisation tools but probably if one - if you don't mind a steep learning
curve and want to be able to do absolutely everything, D3.js is the go-to one
and it's open source.
Gerry Ryder: Thank you, Martin. Someone wants you to - [Jacinta] wants you to look in a
crystal ball and asks, what do you see is the future direction of data
Martin Schweitzer: Wow. I think the - what's happening is we're getting to things with higher
and higher resolution. We're going to more dimensions so we've got the
two dimensional static flat work. We move to three dimensional animation
with the web. One of the things that's becoming popular is virtual reality, so
people can put on some glasses and maybe see storms being - the data for
the storm being visualised but in their own surroundings. So what does it
feel if a rain - and that actually gets us on to the next one which is
So I can look around at Monash University or let's say I could go down to St
Kilda Beach and see what it's going to look like maybe in 100 years with the
sea level rising two feet or 10 feet or something like that. So both exciting
Gerry Ryder: As technology changes tend to be. We do have a couple more minutes if
there are any other final questions for Martin. So Lisa is interested in the
relationship between storytelling and data and the idea of integrity and
worries about collecting data to suit a story and there being a lack of rigour
and accountability. I guess that's a comment more than a question, but you
might like to respond to that, Martin.
Martin Schweitzer: I think it's a - integrity is always in the mind of the beholder, so that you
can't - data cannot have integrity. The people using and presenting the data
need to have integrity. They need to present the data with integrity. I
would say any tool that can be used for good can also be used for evil. So,
yes, people can create visualisations that try and push an agenda or push a
point, et cetera. Hopefully by being more critical of visualisations we can
actually see those ones where somebody is trying to push something which
That's why I also push for integrity in data that as soon as we show a
visualisation that, let's say, only shows 30 years of data where maybe, let's
say, temperatures have been decreasing immediately it puts a cloud over
everything that person is saying because why have they picked that one 30
year period where the temperature was dropping? So I think in the long run
it pays to be as honest as one can about data.
Gerry Ryder: A final question today, thanks, Martin. Is there a common standard for
colour coding for general use in data visualisation?
Martin Schweitzer: A very simple and short answer, no, absolutely not. However, there is a
website called ColorBrewer - actually, it's called ColorBrewer 2, so colour is
spelt the American way, and brewer like somebody who brews. I would
recommend anybody looking for a good set of colours to go there first.
[There are tools] for visualisation. [We'll] actually use the - so it was written
by a researcher called - her last name is Brewer and she's done a lot of
research into colour and how to use it well.
Gerry Ryder: Great. I'd like to thank now Martin for his presentation today and also
acknowledge Susannah who's been quietly sitting in the background
responding to your questions and making sure the webinar runs smoothly.
So thank you all today and have a great afternoon.
END OF TRANSCRIPT