Transcript - Data Visualisation - Tools and Techniques

Youtube: https://youtu.be/9HR3p6MmwU0

Published in: Education

  1. 1. [Unclear] words are denoted in brackets. Webinar: Data Visualisation Part 2 – Tools and Techniques, 12 April 2018. Video & slides available from ANDS website. START OF TRANSCRIPT Gerry Ryder: Good afternoon everyone. My name is Gerry Ryder and it's my pleasure to host this webinar today. Now, on to our speaker today - for those that weren't with us for the previous webinar in this series, our speaker is Martin Schweitzer, who is a data technologist with ANDS in our Melbourne office. Martin has a background in computer science and a particular interest in visualisation, data science and user interface design. He has a varied professional background, which includes photography, working on large IT systems, lecturing, as well as running workshops and training courses. Martin is currently seconded to ANDS from the Bureau of Meteorology, where he is largely responsible for the climate record of Australia. Today Martin is presenting the second in the series of two webinars on data visualisation and today's focus will be on tools and techniques. So without any further ado, I'll hand over to you, Martin. Martin Schweitzer: Thanks Gerry, and thanks Susannah, who's behind the controls. I hope everybody can see my screen.
  2. 2. Today we're going to look at creating visualisations and pretty much everything you see is going to be live. I'm using a tool called Jupyter Notebook. You don't have to be familiar with this tool to follow along. Also, I'll be using Python for my examples, but if you don't know Python, once again, that shouldn't be a problem, because most of the tools and techniques that I will be showing you will be available in other languages, for example, R and languages like that. I'll be going through a number of libraries, showing generally the strengths of each library, where they can be used, how they can be used, and as we progress, we'll move from more static to more web-based type environments. Jupyter Notebook runs in a web browser, and so what you see here is my web browser, and I'll just maximise it now so that you know it's a web browser. What it allows us to do is to type in Python code and then execute it immediately. This is great for anybody doing research, because a lot of the work is in experimenting. You try something, you adjust a few parameters, and so on. The first two lines I've got are just to set up our environment. Often we get error messages showing, so this will just hide them. Some of the libraries we'll be exploring today - the first one is Matplotlib. We'll be looking at Pandas, one called Seaborn, two web-based ones, called Bokeh and Plotly, and the last one is one that's used for mapping, called Basemap. As I go into each one, we'll talk about them in detail. Now, if anybody missed the first talk, that's not a problem, because I'll explain things as I go along, but a number of these examples are showing how we created the plots that you may have seen in the first talk. Also, one of the things I said in the first talk was, when doing any sort of visualisation, it's important to have some sort of story or some sort of reason, or something we're trying to say with that visualisation. 
The first visualisation I'm going to share is actually based on a problem that I came across just while browsing the web. I'll explain the
  3. 3. problem. Basically, we have a room - there are 50 people in the room, and each person starts with $100. Each time the clock ticks, each person takes a random number - picks a card between one and 50, and let's say their card says 26 - they'll give $1 to person 26, if they've got some money in their hand. If they've got no money at that point in time, they don't give anything. The question was, after a few thousand ticks, let's say a thousand, how much money - how will the money be distributed? Will it be fairly evenly distributed among the people? Will some people have a lot of money and some people a little - and so on. I found this quite interesting. I wrote a small simulation. The first part of this code is the simulation, so all that is simulating what happens. I'm now using the library called Matplotlib. Matplotlib is - as much as you can say there's a standard plotting library in Python - the standard plotting library. In Matplotlib, plotting the results that I got requires one line. So to run this, I just press - I'm going to press control return, and if everything works, as I hope it will, we get a plot. So this one line of code gave me a plot. It realised that there were 50 elements and that the values inside those elements were between zero and 350. However, this plot doesn't really give us a picture of what's happened. So what I'm going to do is sort it so that the people with the least money come up first and the people with the most money come last. As I said, this is an interactive environment, so all I have to do is make that change and press shift enter again. It runs it a thousand times. Now I get a very different plot. Here we can see that a lot of people have below $50, most of them have below $100, and then very few people have between $250 and $350. One of the things you will note - I haven't done this yet. 
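The notebook code itself is not reproduced in the transcript, so the following is only a minimal sketch of the simulation as described above. It assumes people act one after another within a tick and that drawing your own number is allowed; the function names `simulate` and `plot_sorted` are illustrative, not taken from the talk.

```python
import random


def simulate(n_people=50, start=100, ticks=1000, seed=None):
    """Return each person's money after `ticks` rounds of random $1 gifts."""
    rng = random.Random(seed)
    money = [start] * n_people
    for _ in range(ticks):
        for giver in range(n_people):
            if money[giver] > 0:                    # no money in hand, no gift
                receiver = rng.randrange(n_people)  # the "card" drawn this tick
                money[giver] -= 1
                money[receiver] += 1
    return money


def plot_sorted(money, path=None):
    """Plot balances from poorest to richest; optionally save to `path`."""
    # Imported here so that simulate() needs only the standard library.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(sorted(money), "o")
    ax.set_xlabel("person (poorest to richest)")
    ax.set_ylabel("dollars")
    if path is not None:
        # Matplotlib infers the format from the extension, e.g. "money.svg".
        fig.savefig(path)
    return fig
```

Calling `plot_sorted(simulate(), "money.svg")` would draw the sorted plot and write a vector file of the kind discussed next in the talk.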
I'll just change this once again, and now what I'm going to do is to save the plot as an SVG - SVG is scalable vector graphics - and I'll run the plot again. It will save it, and I can now open it in a web page. So we'll just open that web page here, and what we see is we've got a nice plot. The
  4. 4. Page 4 of 20 other thing about it that's special - about scalable vector graphics - is that it's scalable. If I make it bigger - because it's a vector, we don't get any artefacts. It just scales. It gets bigger, it gets smaller. We don't lose any quality. The next thing is we can look at this and say, well, this sort of looks exponential. How well does it fit an exponential curve? Once again, Matplotlib allows us to do this. We're not going to rerun the simulation. We'll just use the data from the last simulation and when we run it, what we see is we get a nice little - so what I've done is I said, added a line - a polynomial of third degree that fits those points. I've just noticed that I haven't reset something from the last time I ran it, so I'm quickly going to restart this. I'm going to have to rerun the simulation, unfortunately. When we do this again - okay, which is what I expected, an orange line. Initially what we see is we've got this thin orange line, which nicely fits the points. By changing the plot we can set things like the line width - equals five - run it again, and we get a thick orange line, which is much easier to see, but it's crossing over the points. So the next thing we can do is just say, alpha - which is the transparency - so equals 0.5, which basically says, make it 50 per cent transparent. Now we get a nice, thick, transparent line running through those points. So once again Matplotlib gives us the ability to customise this line. If we wanted Xs instead of circles, we can change the plot of the points to Xs - and I'll run again, and we get Xs. So one of the characteristics that we often look for when looking for a library is this idea that simple things should be simple, complex things should be possible. In other words, we don't want a very long learning curve, or have to do a lot of work to get a simple graph, but if we do want something special, we want to be able to do it. 
We don't want a tool that's really simple to use but as soon as we want something a little bit out of the ordinary, suddenly the tool stops working.
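As a rough sketch of the curve fitting and line customisation described above: the talk mentions a third-degree polynomial, a line width of five, an alpha of 0.5 and Xs for the points. Using NumPy's `polyfit` for the fit is an assumption (the transcript does not say how the fit was computed), and `fit_cubic` and `plot_fit` are hypothetical names.

```python
import numpy as np


def fit_cubic(values):
    """Fit a cubic to (rank, value) points; return x, sorted y and fitted y."""
    y = np.sort(np.asarray(values, dtype=float))
    x = np.arange(len(y))
    coeffs = np.polyfit(x, y, 3)   # least-squares fit, highest power first
    return x, y, np.polyval(coeffs, x)


def plot_fit(values):
    """Draw the points as crosses with a thick, half-transparent fitted line."""
    import matplotlib.pyplot as plt  # only needed for drawing

    x, y, fitted = fit_cubic(values)
    fig, ax = plt.subplots()
    ax.plot(x, y, "x")
    ax.plot(x, fitted, color="orange", linewidth=5, alpha=0.5)
    return fig
```

The simple case stays simple (`ax.plot(x, y)`), while `linewidth`, `alpha` and the marker string cover the customisations mentioned in the talk.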
  5. 5. Page 5 of 20 We'll look at a few more examples of Matplotlib. This one actually comes from the documentation. The easiest thing is to show it. It plots a polar graph. It's using different colours, and we've just - in this case - generated some random numbers and made them the size of the circle and generated a random number for where, around the circle, we've plotted it, and then depending on the angle we're plotting the numbers in a different colour. Once again, the code that's doing the plotting are those two lines. Once we've generated the data, all we need are two lines of code and it gives us a really nice polar plot. Here's another example, once again, also from the documentation. What we're doing with this one is we're going to display it interactively in Jupyter Notebook, but we're also going to save it. This time we're going to save it as a PDF. So let's run this, and there we get four histograms. It's the same data each time, but what it's demonstrating to us is different ways in which we can create histograms, so we can have stacked, we can have unfilled, we can have bars with legends, et cetera. If we have a look at the PDF that we've generated, there we go - and once again, because it's a PDF, it's scalable, and as we scale we don't lose any quality. It just - it's all done with vectors and gives us really nice output. Unfortunately, I seem to have closed my - oh, there we go. Where that's useful is if we're doing any kind of publication, it's really nice to be able to save our output either as SVG or PDF and include that in a publication. This is one more, showing the range of plots we can do. This one is called a hexbin plot. 
So it's plotting with hexagons, and what we're doing is it’s a cross between a scatter plot, which plots X values and Y values, on a graph, where we've got - like we may have the X values may be how many figs somebody has eaten, and the Y values may be something like the weight, and we want to plot those two against each other, but what this is also doing is plotting it against how frequently those values occur. One of the nice things is it's really easy to do log scales. What we see here is a nice graph showing us that that white
  6. 6. area in the centre are values that occurred very often, and as we move out, values occur less and less often. The following one is we've - actually, what we're going to do is quickly look at a comic. Some people may be familiar with the web comic called XKCD. The author, Randall Munroe, is a very funny person, but with quite a scientific bent, and also quite strong computer skills. This one is called Stove Ownership, and it shows his health before he realised he could cook bacon whenever he wanted, and afterwards. The thing about this graph is that it's hand-drawn. While sometimes we want graphs that look very polished, very professional, there's often a perception when people see a graph like that that the figures are very accurate, and this isn't always the case. So what people did was to create a style, using Matplotlib, that would recreate the look and feel of XKCD. This is quite a lot of code because there's quite a lot in it. In fact, they've taken two of his comics and I'll quickly plot that, using Matplotlib, and what we see is a computer-generated reproduction of Randall Munroe's graph. That's one of his - this is a histogram, done in very much the same style, which copies another one of his comics. So, taking the style, I will replot my simulation - you'll remember the results from my simulation. So if we run that again, we see we get this thing, and once again - now that it doesn't look slick and professional, we see really that this is very much a simulation, that these figures aren't accurate, and so on. What this does do for us is it does give us an idea of the flexibility of Matplotlib. I'll quickly restart the kernel before going into our next library. The next library is called Pandas. Pandas is a very useful library for anybody who's working with spreadsheets, who's working with CSV files, who's working with data that's coming from an API across the web, and it also has its own plotting routines built in. 
In this code here, the first line, which actually goes over three lines, I'm reading a file which is dam storage levels. It's a CSV file. Anybody
  7. 7. who watched the last presentation will recall that I showed some examples. You'll see the same examples again today. The first line reads the file, the second line plots it, using Pandas plotting, and the third line just adds a legend to it, or sets the label on the file at [send full]. So we'll run this code, and there we get - these are Melbourne dams, and this is showing that the Thomson is about 68 per cent full, and things like Tarago are 95 per cent full. So what we've done is in one line we've read that CSV file, we've told it what we want to call the columns. One of the columns is called name and one of the columns is called Pfull, for percentage full. When we plot it, because Pandas knows about this thing called DS, all we have to say is, I want to plot the name against the percentage full and I want a bar chart. I've also said, I want to plot from the value 60 to 100. If I leave out those values, but plot the same graph, it will plot from zero to 100 by default. On this one - part of the thing was to show that even though we've got the same figures in the same graph, it looks different when we start our scale from zero. Once again, the Thomson is about 68 per cent full and Tarago is almost 100 per cent full. The point which is the take-home point here is that to create that plot took two lines of code. The third thing that we showed last time was that what's really interesting, though, is the gap in volume of these different dams. That gives us a much better picture of what's happening. So when we run that what we see is because the Thomson dam is a really big dam, it's got over 200,000 gigalitres of water deficit. So even though these dams on the right are almost full, altogether they don't even make up that deficit in the dam. So that's Matplotlib and its strength is that it pretty much comes standard with Python. It's flexible, and so on. 
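The dam example above can be sketched roughly as follows. The percentages are the ones quoted in the talk; the capacities are made-up placeholders (the CSV file is not part of the transcript), and the column names `name`, `pfull` and `capacity` and the function names are assumptions for illustration only.

```python
def deficit(capacity, pct_full):
    """Volume still missing from a dam, in the same units as `capacity`."""
    return capacity * (100 - pct_full) / 100


# Percentages are from the talk; capacities are hypothetical placeholders.
DAMS = [
    {"name": "Thomson", "pfull": 68, "capacity": 1000},
    {"name": "Tarago", "pfull": 95, "capacity": 40},
]


def plot_dams(rows, zoomed=True):
    """Bar chart of per cent full, on a 60-100 or a 0-100 scale."""
    import pandas as pd  # imported lazily; plotting also needs Matplotlib

    df = pd.DataFrame(rows)
    ylim = (60, 100) if zoomed else (0, 100)
    ax = df.plot(x="name", y="pfull", kind="bar", ylim=ylim)
    ax.set_ylabel("per cent full")
    return ax
```

With these placeholder capacities, the larger dam is missing 320 units at 68 per cent full while the smaller one is missing only 2 at 95 per cent, which is the kind of contrast the third chart is meant to bring out.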
However, its simplicity often comes at the cost that it's not the best publication-ready graphing tool. You can get very nice publication-ready graphs by
  8. 8. Page 8 of 20 doing a bit of work, but what some people have done is to do that work to make it easier for people to create better graphs. One of those libraries is called Seaborn. Seaborn basically sits on top of Matplotlib and simply adds some nice styles. We'll replot that same plot, this time using Seaborn. All that we're doing here is importing it and just saying - and initialising it. So we've just added those two lines. Everything else is exactly as the last example. We run this, and we see a totally - well, similar graph, but different styling. What Seaborn has done is to make it quite easy to change the styling. I'll say set the style to white, run it again, and we'll see we'll get a nice clean graph, and for example, in the next example what we will do is we will set the style, but we want a white grid and a muted palette. We will run that one and we get that white grid with a muted palette. The next one is just one of the - well, one thing that Seaborn does, which a few packages are starting to do, is that it actually includes its own datasets when you install the package, which is really great for when you're learning, because one of the worst things is you pick up a package, you try and learn it, but the first thing you've got to do is find some data to plot and so on. One of the data sets that's Seaborn comes with is this one called Flights and I really enjoy heatmaps, so this is just an example of a really simple heatmap, using Seaborn and some of its inbuilt data - or some of the data that's provided. What we see over here is these going down the bottom are years. Across the y-axis are months. So round about 1960, July, there were lots of flights, and in the earlier years I guess there were fewer flights. Also during winter there are fewer flights than in summer. Once again, done with Seaborn and done really with two, three - two lines of text - two lines of code, which are those lines. Another dataset that comes with Seaborn is called Tips. 
It's basically how much people will tip at restaurants. So the first thing we'll do is to load the dataset and have a look at the first 10 rows of this dataset. What we see is we've got a few columns. The first one is the amount
  9. 9. of the total bill - how much tip - what tip was left, the sex of the person serving, whether or not they were a smoker, what day of the week it was, whether it was lunch or dinner, and the size of the party. We're going to use that dataset and have a look at a few Seaborn graphs. The first one we're going to look at is a box plot. What we've done is we've said we want the hue to be whether they're a smoker or not, so the purplish colour means they were smokers, the greenish colour means they weren't. On the left-hand side we've got the size of the total bill and across the bottom is days. So it does seem that on Sundays maybe people tip more, and it would look like on Sundays maybe for some reason, whatever, smokers tip more than non-smokers. Another plot that is often used in similar ways to the box plot but carries a bit more information - encodes a bit more information - is what's known as a violin plot, and these once again are quite easy in Seaborn. In this case what we've done is we've used a different hue for male and female. Basically you read this pretty much the same as the box plot. There's the median. There's one - the top quartile, the bottom quartile, et cetera. Some of the information is very much the same. On Sundays people seem to tip the most, and we can see they've been split this time into male and female. Those people who were at the last one will remember I demonstrated something called Anscombe's Quartet. It's four datasets, each with the same means and linear regression lines, but each dataset looks very different. Here's a very simple example of it being done in Seaborn. We'll just have a quick look at that. We see it was quite easy. In this case, we're sharing the y-axis. Across the bottom we're sharing the x-axis of the two plots, and all of this was done in a very, very compact way, using Seaborn. The next thing we're going to look at is plotting data on maps. 
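To make the Anscombe's Quartet claim above concrete, here is a small standard-library check that the four datasets share (almost exactly) the same mean and regression line. The values are Anscombe's published 1973 data; the helper names `mean` and `linear_fit` are just for this sketch and are not from the talk.

```python
def mean(values):
    return sum(values) / len(values)


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by datasets I to III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# name -> (mean of y, slope, intercept)
SUMMARY = {name: (mean(ys), *linear_fit(xs, ys)) for name, (xs, ys) in QUARTET.items()}

for name, (mean_y, slope, intercept) in SUMMARY.items():
    print(f"{name:>3}: mean y = {mean_y:.2f}, fit y = {slope:.2f} x + {intercept:.2f}")
```

The summaries agree to two decimal places, which is why plotting the four panels is the only way to see how different the datasets are.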
This goes back to a lot of what I do in my substantive job at the Bureau. The library that we use for a lot of our mapping is - once again, it's a
  10. 10. Page 10 of 20 standard with Python. It's called Basemap. The first one, we're not actually plotting anything, we're just simply drawing a map, so what we should see now. It takes a little bit the first time we run it, but we've plotted a map of the world in a few lines of code. That's pretty much from there to there. The story - what we're really interested in at this stage is Australia, and this projection isn't as useful as what we're going to look at now, which is a sort of [MICATA], so we'll just change some of the parameters and this should give us a map of Australia, which is great. It looks a bit like the ones I draw by hand. What I'm going to demonstrate now is some more visualisation, but it goes back to a problem I was given, oh, about a year or two ago. We have about 112 reference stations around Australia. These are stations with very high-quality data that have a long record - about 50 years or longer. These are very important in - as reference stations, to see what's happening with the climate of Australia. One of the outcomes of this - the reference station set is called Acorn, and we do a publication where we publish the names of each station. One of the things we also publish is for each station which are the closest three stations to that station. I wrote some code that worked out what the closest three stations were, to each station. This was the file I was given - once again, I'm using Pandas to read it. So we've got, for example, Halls Creek, we've got the latitude, longitude, the altitude and the date it was opened. As you can see, these all have a very long record. The first thing is I plot these, so using Matplotlib - the first parts we've seen. That draws the map. This line, after having read the file, plots the data on a map, so we'll just quickly plot those stations. The black dots of course are the stations, and there are 112 of them around Australia. 
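A hedged sketch of the station map described above, using Basemap calls that exist in `mpl_toolkits.basemap`. The bounding box for Australia is approximate, the choice of a Mercator projection is a guess at the unclear word in the transcript, and `split_coords` and `draw_station_map` are hypothetical helper names. The station row in the usage note is a placeholder, not real ACORN data.

```python
def split_coords(rows):
    """Turn a list of station dicts into parallel latitude and longitude lists."""
    lats = [row["lat"] for row in rows]
    lons = [row["lon"] for row in rows]
    return lats, lons


def draw_station_map(rows):
    """Draw a map of Australia with one black dot per station."""
    # Imported lazily: Basemap is a separate install on top of Matplotlib.
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

    m = Basemap(projection="merc", resolution="l",
                llcrnrlat=-45, urcrnrlat=-9,     # approximate latitude range
                llcrnrlon=110, urcrnrlon=156)    # approximate longitude range
    m.drawcoastlines()
    lats, lons = split_coords(rows)
    x, y = m(lons, lats)                         # project lon/lat to map coordinates
    m.scatter(x, y, s=12, c="k", zorder=5)
    plt.show()
    return m
```

For example, `draw_station_map([{"name": "Example station", "lat": -18.2, "lon": 127.7}])` would draw a single dot in the north-west of the map.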
The question I was asked is - after saying, okay, here's a list - for each station these are the closest three stations to that particular station.
  11. 11. Being scientists, they always ask interesting questions. They said, by going to one of the closest three stations from each station, is it possible to get from any station to any other station? Now, it may seem that the obvious answer is yes, but the thing is because - if I'm sitting here, these may be my three closest stations, but that does not mean that where I'm sitting - which is around Meekatharra - is going to be one of the closest three stations to this station, because this station's three closest neighbours are maybe these three stations. The first thing I did - because I'm very visual - was to try and visualise it. What I did was to go back to a very old package which is about 30 years old. I first used it probably more than 20 years ago, called Graphviz. Python includes bindings for Graphviz. We can think of this as each station is a node and we've got lines connecting it to the three closest stations. What I've done is to do something that will visualise that. So we'll just run this code and it creates a PDF. What we see in this PDF is that - I've simply used the station numbers to save space - we can see the layout of all the stations and - move across here - one of the things that we see, for example, is that station 7045, even though it's got three stations that are closest to it, there's no station for which 7045 is the closest station. We can see it in a few other places as well. I think over here, we've only got one line going from 85096 to 91293. If anybody wants to guess, this part is in fact the stations that are in Tasmania. If we go back to our graph - our map - we can see how these are all close together - that station is close to that one, but these are all closer to each other than the main one. So basically, that graph helped us visualise, and yes, it turns out, after writing some code, that there is no single path. 
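The question above amounts to asking whether a directed "three nearest neighbours" graph lets you reach every node from every other node. The speaker's own code is not in the transcript, so this is a minimal standard-library sketch under stated assumptions: great-circle (haversine) distance, and eight invented stations in two well-separated clusters standing in for the mainland and Tasmania. All station ids and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def nearest_neighbours(stations, k=3):
    """Map each station id to the ids of its k closest other stations."""
    graph = {}
    for sid, pos in stations.items():
        others = sorted((o for o in stations if o != sid),
                        key=lambda o: haversine_km(pos, stations[o]))
        graph[sid] = others[:k]
    return graph


def reachable_from(graph, start):
    """All stations that can be visited by following neighbour links from start."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def fully_connected(graph):
    """True if every station can reach every other station."""
    return all(len(reachable_from(graph, s)) == len(graph) for s in graph)


# Hypothetical coordinates: one "mainland" cluster and one "island" cluster.
STATIONS = {
    "m1": (-30.0, 140.0), "m2": (-30.5, 140.5), "m3": (-31.0, 140.0), "m4": (-30.5, 139.5),
    "t1": (-42.0, 146.0), "t2": (-42.3, 146.5), "t3": (-42.6, 146.0), "t4": (-42.3, 145.5),
}
GRAPH = nearest_neighbours(STATIONS)
print(fully_connected(GRAPH))
```

In this toy layout every station's three nearest neighbours lie inside its own cluster, so no link ever crosses the gap, which mirrors the Tasmanian group in the Graphviz picture.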
The next question - once again, these people being scientists - is, where would we have to add stations so that the closest - so that there's always a way that we can get to another station by visiting one of the closest three?
  12. 12. Page 12 of 20 I came up with a new visualisation, and it's called a Voronoi plot. I'll run this code. What a Voronoi plot does, is it's not easy to show here, so I'll show it in a web page that I did. On this page you see the Acorn sat stations and you see all these polygons. What these polygons are - every point inside this polygon, for example, is closer to this station than to any other station that's not inside the polygon. So any point inside this polygon - this point, for example, is closer to there than it is to any of the surrounding stations. So basically, it divides the territory up into areas. In a way it's saying, okay, well, the temperature there, we could argue, is mostly influenced by this station, so if we've got a temperature here, and want to check it for accuracy, or whatever, we're more likely to look here, than one of these other stations. What does this have to do with where do we build a site? Well, if we consider this line, any point along this line is the point that's the furthest point between this station and this station, and any point on this line is the point that's the furthest one between that station and that station. Therefore, if we were going to - ah - so any point on one of these edges here, these where these lines meet, is the point that's the furthest from all the adjoining stations. So this point is furthest from that one, that one and that one - and obviously further than any other. So what it comes down to is if we're going to build a new station, we want it on one of these points. On one of these vertices. So it's just another example of how we can use visualisations to solve some real problems. For the moment, that's all we're going to do with maps, and we may return to it soon. The next library we're going to quickly have a look at is called Bokeh. Bokeh is the first library - it works with Python, but its output is targeted generally at web pages. 
Once again, you'll remember Anscombe's Quartet from a previous slide. We'll do it in Bokeh. It's given us a really nice graph of Anscombe's Quartet. If one sees some
  13. 13. of the original drawings of it, for example, in Tufte's book, this is very close to the original, so it was very easy to - well, it required some work to make it similar, but we could - it was flexible enough that we could. I'll quickly show another one, which is another famous machine learning data set, which is Irises. This one is plotting. So what we're plotting is the petal width of different species against the petal length. We see that some species are down here, some species - the green ones - are up here, and some over here. The thing about Bokeh is it allows us some interactivity, so we can do things like zoom, you can also pan, and if we put the output on a web page, the web page can have these same tools. There's a wheel zoom. We can go back to what it looked like initially. So that's Bokeh. Here's one that also came from that last one, called Joyplots. The thing about this is we're plotting a whole lot of variables against a common set of axes. I'll just for the moment skip over Plotly, because I want to look at a few tools that are useful in web development - so we're leaving Python for a moment. The first one is one that I wrote a few years ago. This is using Google Maps and I'm putting some data on it. These are the ACORN-SAT stations once again. When we click on one, we get a graph of the climatology, the average monthly temperature, so let's go to Melbourne. We're now in April, so the average maximum temperature for Melbourne is normally 21 degrees. This is the average rainfall for Melbourne - around 50 millimetres. We can also get a time series and we can zoom in on the time series. This graph and the time series were done using a tool called Highcharts. Highcharts is available free for non-commercial use, but it does require a licence for any kind of commercial use, and government use is also considered to be commercial. Having said that, if you are
  14. 14. Page 14 of 20 doing web pages and you are looking for a plotting package, it's worth considering Highcharts. The next example is another mapping library. This one is Leaflet. In this case - this is something I did for work. What we're plotting here is - this data is coming from NetCDF files. Some people will be familiar and have used NetCDF - and the data's coming straight out of these NetCDF files. The main purpose of this slide though is to show this library Leaflet, which basically allows us to put data on top of maps. In this case it's gridded data, but we can also put - here we've got some GeoJSON. We could also be putting shape files and other things. There's things like utility boundaries, which you can overlay on the maps. So it basically allows us to overlay data on top of maps. The third example I'll show is one called OpenLayers, and this was one of the more complex visualisations I did. Basically what this one is demonstrating is east coast lows, off the eastern seaboard of Australia, and all of this was overlaid on this map using - the map was done OpenLayers. I think that's all I'll talk about maps. I think finally what we'll do is look at one more library and one more example. The library we're going to look at is called Vega. Once again, it's another simulation. I came across this thing called Parrondo's paradox. For me it was quite mind-blowing, so I just had to do a visualisation to make sure that I understood it and that it worked. Basically - I'll try and explain it quickly - you've got three games you can play. Each of them involves a coin being spun. In game one the coin is more likely to land on tails. So each time in game one you bet on heads - in other words, it's a losing strategy. So that's game one. In game two, we occasionally choose coin one - oh, sorry - we've got coin two, which most often lands on heads, but we don't choose coin two all the time. We just - sorry, we don't choose heads all the time. 
Sometimes we choose heads, sometimes we choose tails. Most of the
  15. 15. Page 15 of 20 times we choose heads, but two out of three times we choose tails, and it can be shown, once again, that that's a losing strategy. In game three what we do is we randomly decide to either play game one or game two. So if game one, we definitely lose and game two we definitely lose, we would think that choosing game one and game two we should also lose, if we just choose randomly between whether to play game one or game two. In this one I've used this library called Vega, and I think the first thing I'll do is just run - so I play this game 10,000 times. I play game A and plot the results. I play game B 10,000 times, plot the result. Then I do P3, which is where I randomly choose between game one and game two and plot the results. We run the simulation. P1 is when I play game one and we can see I started off with zero dollars - end up with minus $100. When I played game two, which was also a losing strategy, I did actually quite badly. I ended up with minus $250, but when I alternated randomly between the two games, I landed up in the black with plus $150. This site or this Python notebook will be included after the talk. You're welcome to have a look at this and find the mathematical explanation why it works, or you can also just Google Parrondo's Paradox. So, what have we found out? Well, I guess one of the questions is if I want to do visualisations, what's a good tool? In brief, Matplotlib is a good one to start with. Easy things are easy. Flexible things are possible. It can do dozens of different visualisations. It's very good for static plots - in other words, if you're going to publish your results in a book, or whatever, and it also integrates well with Python's maths and science toolkits. If you're familiar with Python, it understands things like NumPy and SciPy, and they're all tightly integrated. Seaborn makes it easier to do, let's say publication-ready plots with Matplotlib.
  16. 16. Bokeh has very nice output. It targets web pages. It's got a slightly easier learning curve than Matplotlib, and it looks good out of the box. Plotly, one of the things is it's based on a commercial package and there are both commercial and non-commercial versions of it available. It leverages D3 for graphics - D3 is a fantastic JavaScript graphics library that unfortunately this talk didn't give us time for - and because of that, the interaction is more extensive than Bokeh, and also the range of things. One thing I didn't mention about pdvega - or Vega - is that it's got an interesting way of working in that it defines a language for defining a graph and it displays it, but when you create a graph with Vega, that graph includes all the data that was used to create the graph, so if you're interested in making your publications and your data available - so it's one thing to see a graph in a paper and say, okay, well, how do I reproduce this graph? It's another thing to say, okay, this is the graph, and this is all the data that created this graph. So it's really worth considering if it's important to you to publish the data with the graphs. Basemap is based on Matplotlib. It sits on top of Matplotlib. It can be a bit clunky, but it does the job. Cartopy is still, I don't think, 100 per cent production-ready, but it improves on Basemap - makes it easier to use and has some great features. Then I'll quickly go through Leaflet - its advantages are that it's lightweight, it's quick to learn and use, and it supports many formats - most particularly WMS and GeoJSON. OpenLayers is more fully featured than Leaflet. It used to have a steeper learning curve than Leaflet, but modern versions are actually much easier - they've made the learning curve less steep.
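The point made above about Vega, that the chart definition carries its own data, can be illustrated with a small declarative spec. This sketch uses the Vega-Lite JSON format (an assumption: the talk used pdvega, whose generated specs are not shown in the transcript), with the two dam percentages quoted earlier in the talk as inline data.

```python
import json

# A Vega-Lite bar chart whose data is embedded directly in the specification.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "description": "Dam storage, per cent full (values quoted in the talk)",
    "data": {"values": [
        {"name": "Thomson", "pfull": 68},
        {"name": "Tarago", "pfull": 95},
    ]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "name", "type": "nominal"},
        "y": {"field": "pfull", "type": "quantitative"},
    },
}

# One JSON document is both the chart and the data behind it.
spec_json = json.dumps(spec, indent=2)

# A reader of the published spec can recover the underlying values directly.
recovered = json.loads(spec_json)["data"]["values"]
print(recovered)
```

Because the values travel with the chart, anyone who receives `spec_json` can both render the graph and reuse the numbers.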
  17. 17. Page 17 of 20 I didn't get the chance to demonstrate Cesium, but it can utilise the built-in 3D capabilities of browsers, and it just works out of the box. You can install it and immediately you've got a map up and running. I installed it recently, just to try it out, and about an hour later decided to download some earthquake data from the United States Geological Survey, and within about 15 minutes I was displaying that data on my map. So it makes it really easy. What are my recommendations? If you work with Python and you're not interested in learning a lot of programming and getting deeply into it, but you do need to work with data and you're doing research, I recommend you learn Pandas - use Pandas for static plots and use Vega for the web. Thanks very much. That… Gerry Ryder: Well, thank you so much, Martin, for such an informative and practical presentation and, bravely, with so many live demos, which we rarely see. So thank you for that. Now, we do have time for questions, so if people have questions or comments, please put them into the question pod. Now's your chance, with Martin online, to ask any specific questions about packages or just some of the things that you've seen today. So please do ask away. We have got time for a few questions. Martin, we do have a question from Marlon. What's your opinion on tools like Tableau - or Tableau - T-A-B-L-E-A-U? Two people have asked about that one. Martin Schweitzer: Tableau. So Tableau is what's known as a BI tool, or business intelligence tool. It's used in the Bureau. It's a commercial tool. I think I'm correct in saying that it's only commercial. There may be demo versions available. From everything I understand, it does what it's designed to do extremely well. It's very good at building dashboards. I think it often assumes the idea that there's going to be a data warehouse available
  18. 18. Page 18 of 20 - or at least a data mart. I know that with previous versions there were some issues with creating websites that were being presented to the public. This was because it wasn't WCAG compliant - WCAG is the Web Content Accessibility Guidelines, and for government work, websites need to be WCAG compliant. It had some mapping features, but the maps only allowed single layers, which would have made something like what I demonstrated with the rainfall maps very tricky, because we had sometimes up to five or six different layers on those maps. So I guess I want to neither recommend nor dismiss any packages, but from everything I understand, and I'm not a regular Tableau user, it works well for its design purpose. One of the areas where I know people have really enjoyed using it is where they've wanted management-type dashboards on their desktop to be able to monitor whatever it was that they were monitoring. Gerry Ryder: Thanks, Martin. John has popped into the question pod that there is a free public version of Tableau, Tableau - if I can get my mouth around that. So if people are interested they could go and check that out for themselves. Colin's asked, Martin, why Python, and not Ruby? He has also asked if MATLAB or R make the grade. Martin Schweitzer: Okay, the reason for Python and not Ruby is that I know Python and I don't know Ruby particularly well. When Ruby came out, I started learning it and then other things got in the way. I don't think there's any good reason why not Ruby, but I can't talk with authority on how many - I think one of the things is that with data science, Python and R really seem to have taken a lot of that mind share. Between Python and R, it's six of one and half a dozen of the other. There are a lot of people using R. There are a lot of domains where people really love R. Bioinformatics, I know, is one where it's very common. Every - and as I said in the beginning, most of these visualisations
  19. 19. Page 19 of 20 and that are available in almost any language that people look at, or any popular language. When people come up with a library like Plotly, other people will create bindings for different languages. Gerry Ryder: Thanks, Martin. Another question - do you use other mapping tools like ArcMap? That's another question from Marlon. Martin Schweitzer: Well, at the Bureau, Esri products are very popular. I personally don't use ArcMap, probably just because of the nature of the work that I'm doing, and probably because of the current tool chain that we've got. I do use an open source product called QGIS occasionally, but even that I don't use often. Most of my work of this type is done using things like JavaScript, and so I just use the JavaScript libraries that are available. Gerry Ryder: Okay, and a question from Susan, who's interested in an online tutorial for beginners in data visualisation. So apart from recordings of your own webinars, Martin, is there anything that you could recommend to Susan? That might be one to take on notice. Martin Schweitzer: I think it is, and I'll definitely have a look, but there are a lot of MOOCs, so you might go to places like Udacity or edX. Lately I've been noticing, particularly with the current flavour of the month being data science, that a lot of these places are offering courses. But yeah, certainly I'll have a look, and maybe we'll put a beginner's guide to visualisation in one of our snippets or something. Gerry Ryder: Okay. Thank you, Martin. That's probably a nice segue to plug our updated web page. Now, Martin's kindly spent some time updating the content of our web page on the ANDS website. I'm just showing you the link here. So a lot of the tools and the libraries that Martin's spoken about in the webinars are available and described there, so please go and have a look at that. 
Also, of course, these webinar recordings will be made available. We have one last question, Martin, from Sophie: do you recommend Codecademy?
  20. 20. Page 20 of 20 Martin Schweitzer: I haven't used Codecademy. I've got an account, I know, because I keep getting emails from them, but there's a lot of good stuff available, so I think it's pretty much a matter of trying to find something that suits you. Gerry Ryder: So that's great timing for the end of our webinar today. Thank you all for coming along, and a big thanks to Martin for two fantastic webinars and presentations and for making all the materials available through the presentations and through our updated web page. We look forward to seeing you at one of our future webinars, and in the meantime, have a great afternoon. Thank you very much. END OF TRANSCRIPT
