17. The data should be used in a storyful way
Standardise data acquisition and analysis
Generate charts in a reproducible way
Use interactive charts with data inside
Supporting Reuse
Editor's Notes
Telling stories with data about courses relates to learning analytics.
Telling stories with data within courses relates to using data in some way as part of the course experience.
I’m not going to talk about the first of those – data about courses – rather I’ll be focusing on the use of data within courses.
This in turn might work in one of two ways – we can use data about progress through the course to influence the learner’s experience of the course, either directly, through displaying content that is in some way influenced by prior progress, or more indirectly, for example by showing a student a dashboard relating to their progress through the course.
Or we might use data as subject matter for the course material itself.
Again, I’m not going to focus on the use of data generated as part of the learning analytics process – instead I’ll be concentrating on how we can use subject related data as part of the course material.
In a story map, we make use of a map to help us illustrate or tell a story. As different locations are mentioned in the story, we highlight them on a map.
“This example, The Russia Left Behind, from the New York Times, tells the story of a 12 hour drive from St. Petersburg to Moscow. Primarily a textual narrative, with rich photography and video clips to illustrate the story, an animated map legend traces out the route as you read through the story of the journey. Once again, the animated journey line gives you a sense of moving through the landscape as you scroll through the story.” - See more at: http://schoolofdata.org/2014/08/25/seven-ways-to-create-a-storymap/
There are many other kinds of storymap, with software libraries readily available to generate them, that allow you to achieve similar, if slightly less polished, effects. Typically, they use Google Maps or OpenStreetMap to display the map, on which markers or lines are placed. Various styles of story map are possible – a fixed window with an image or text and image carousel that highlights a new location as each image is brought in to view is a popular one.
The data requirements for this sort of display are minimal – just the locations you want to map, and a point in the text from which you want to trigger the display, or movement towards, that particular location.
For carousel style storymaps, no more than four columns of data in a spreadsheet will do the trick – a required one, containing locations, and up to three optional ones: one containing a label to display above the marker; one containing a link to an image to show; and one containing any descriptive text to display when the marker is selected.
As to how to know where to place the marker on a map from a place name – an operation that typically requires latitude and longitude data for the location – computers are quite capable of helping with that geocoding operation for you!
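To make the data requirement concrete, here is a minimal sketch of the four-column storymap spreadsheet as a pandas DataFrame, with a hand-keyed geocoding step. The column names, image paths and text are illustrative, not taken from any particular storymap library; a real workflow might call an online geocoding service (for example, the geopy library’s Nominatim geocoder) rather than a lookup table, but the lookup keeps the sketch runnable offline.

```python
import pandas as pd

# The four-column storymap spreadsheet: one required column (location)
# and three optional ones (label, image, text). Values are illustrative.
stops = pd.DataFrame({
    "location": ["St. Petersburg", "Novgorod", "Tver", "Moscow"],
    "label":    ["Start of the drive", "Old trading town",
                 "On the M10", "Journey's end"],
    "image":    ["img/stpete.jpg", "img/novgorod.jpg",
                 "img/tver.jpg", "img/moscow.jpg"],
    "text":     ["Setting off...", "A stop along the way...",
                 "Nearly there...", "Arrival."],
})

# Geocoding: turn each place name into latitude/longitude. A hand-keyed
# lookup stands in for an online geocoding service here.
coords = {
    "St. Petersburg": (59.94, 30.31),
    "Novgorod": (58.52, 31.27),
    "Tver": (56.86, 35.91),
    "Moscow": (55.76, 37.62),
}
stops["lat"] = stops["location"].map(lambda place: coords[place][0])
stops["lon"] = stops["location"].map(lambda place: coords[place][1])
print(stops[["location", "lat", "lon"]])
```

A carousel-style storymap library would then consume this table directly, placing one marker per row.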
Many of you will be familiar with the public engagement work of OU honorary graduate, Professor Hans Rosling, seen here in BBC2’s Don’t Panic programme that we co-produced a year or so ago.
Hans Rosling’s data narration approach demonstrates how we can tell a story through the use of animated data, in which data points corresponding to measurements taken at particular points in time are replayed using a time slider.
As part of a set of OpenLearn materials I’m drafting around several short videos commissioned from Hans Rosling following Don’t Panic – materials that might also be referred to from OU courses currently in production – I’ve started looking at using browser based interactive charts to support the materials.
Whilst Rosling’s Gapminder foundation does publish the Gapminder/motion chart tool for browser or desktop use, it does require either Flash or Java support – which means it doesn’t work on a tablet.
And whilst there is lots of data “built in” to the Gapminder tool – as well as an option to use your own data with it – the workflow for getting your own data into Gapminder, in the format that Gapminder expects, is a bespoke workflow for that tool.
Here’s another chart, this one based on the iScatter charting library developed by Michel Wermelinger from my department, Computing and Communications, and colleagues in LTS.
This chart carries with it several sets of time series data, from the World Bank’s Development Indicators database. The data is in a wide format – that is, we have one column for time, a column that identifies a grouping – in this case, country – and separate columns for each indicator.
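A small sketch of that wide shape, built in pandas; the indicator names echo World Bank Development Indicators, but the values are made up for illustration.

```python
import pandas as pd

# Wide format: one column for time, one identifying the grouping
# (country), and one column per indicator. Values are invented.
wide = pd.DataFrame({
    "year":            [2010, 2010, 2011, 2011],
    "country":         ["GB", "FR", "GB", "FR"],
    "gdp_per_capita":  [38000, 40000, 39000, 41000],
    "life_expectancy": [80.4, 81.7, 80.8, 82.0],
})
# Each row is one (year, country) observation; each indicator column can
# be bound to an x- or y-axis by a chart such as iScatter.
print(wide)
```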
As well as selecting which data column is plotted against each axis, we can also set the axis scale type – linear, logarithmic, and so on.
The interesting thing about both these charts is that they can be constructed from a simple combination of a data file – as long as the data has the correct shape – and a configuration file that includes things such as the chart title, what values appear on which axis to start with, what the axis scale types are, and so on.
It’s also interesting because it means we can generate the charts from a workflow that sources the data, cleans it if required, reshapes it to the form the chart expects, and then saves it as a data-and-configuration bundle that the chart can then display. It’s then down to the chart to provide the interactivity, within the context of the chart, that the user can avail themselves of.
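That workflow can be sketched in a few lines of pandas. The configuration keys and long-format starting shape here are illustrative assumptions, not any particular charting library’s actual format:

```python
import json
import pandas as pd

# 1. Source: long-format data, as it might arrive from a statistics API.
#    Values are invented for illustration.
long_df = pd.DataFrame({
    "year":      [2010, 2010, 2011, 2011],
    "country":   ["GB", "GB", "FR", "FR"],
    "indicator": ["gdp", "life_exp", "gdp", "life_exp"],
    "value":     [38000, 80.4, 41000, 82.0],
})

# 2. Reshape to the wide form the chart expects: one column per indicator.
wide = long_df.pivot_table(index=["year", "country"],
                           columns="indicator",
                           values="value").reset_index()

# 3. Bundle: serialise the data alongside a small configuration file.
config = {"title": "GDP vs life expectancy",
          "x": "gdp", "y": "life_exp",
          "xscale": "log", "yscale": "linear"}
csv_text = wide.to_csv(index=False)
config_text = json.dumps(config)
```

A chart engine that accepts this data-and-configuration bundle can then render and operate the chart without any further code from the author.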
So how might we do that?
A workflow I’ve started exploring recently – originally stemming from the requirement to find an environment for working with data in a programmatic way for the new computing course in data, TM351 – is a technology known as IPython notebooks, and a programming library for the Python language called pandas.
An IPython notebook is a computational notebook – you can write text in it, and you can also write programming code in it, then execute that code and display the output from the computation.
IPython notebooks are properly beautiful – and they’re gaining a lot of interest from researchers because they provide a powerful tool for generating literate computer programs – ones that are self-explaining in a human readable way – and reproducible research: the notebook is ideally self-contained, telling you how to get the data, process it, analyse it and visualise it.
This example shows how I can use a ‘remote data’ service built into pandas to retrieve a data set from the World Bank Indicator Data website. The data comes directly from the World Bank and is made available in a form known as a DataFrame that I can immediately start working with. I might also save the data so that I have a local copy of it, and then work from that instead. But I’d also know how I got that local copy of the data in the first place…
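A sketch of that step. The actual remote call lived in pandas’ remote data access module at the time (and now in the separate pandas-datareader package, e.g. its World Bank `download` function); to keep this sketch runnable offline, a locally built DataFrame stands in for the fetched result, mimicking the (country, year) indexed shape such a download returns, and the values are invented.

```python
import io
import pandas as pd

# Stand-in for a World Bank indicator download: a (country, year)
# MultiIndex with one column per requested indicator code.
# Values are invented for illustration.
fetched = pd.DataFrame(
    {"NY.GDP.PCAP.KD": [38000.0, 39000.0, 40000.0, 41000.0]},
    index=pd.MultiIndex.from_product(
        [["United Kingdom", "France"], [2010, 2011]],
        names=["country", "year"]),
)

# Keep a local copy to work from -- the code above still records how
# the data was obtained in the first place.
buffer = io.StringIO()
fetched.to_csv(buffer)
buffer.seek(0)
local_copy = pd.read_csv(buffer, index_col=["country", "year"])
print(local_copy)
```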
Any charts that are generated are embedded in the notebook as the result of running a computation. The chart doesn’t initially exist as an image – the image is generated by running the code. The diagram is written, not drawn. We can delete the image and we have lost nothing – we can simply rerun the code and regenerate the image.
From a reuse point of view, this is important for two reasons. Firstly, I can tweak the chart and regenerate it. My chart definition could include a title or styling information that changes the look of the chart, for example, putting it into a chart style used by a particular publication, such as the Economist, or this nice white theme. Notice the two extra lines I added to my original chart definition – one to add the title, the other to change the styling.
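A minimal sketch of that tweak-and-regenerate step in matplotlib. The `ggplot` stylesheet ships with matplotlib and stands in here for a publication theme; the data is invented.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"year": [2010, 2011, 2012],
                   "value": [3.1, 3.4, 3.9]})

plt.style.use("ggplot")              # extra line 1: restyle the chart
ax = df.plot(x="year", y="value", legend=False)
ax.set_title("Indicator over time")  # extra line 2: add a title
buf = io.BytesIO()
ax.figure.savefig(buf, format="png") # regenerate the image from the data
```

Delete the saved image and nothing is lost: rerunning these lines regenerates it, restyled or retitled as required.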
In fact, there is a library available that can make a good attempt at generating an interactive chart from this one – a chart that pops up values when you hover over points, for example – simply by adding one more line of code that takes the chart object and works out what it needs to do to generate what we might term an interactive web chart.
Secondly, I can take the code that generated this chart and - if I have a data set that has a similar shape to the one I use to generate this chart – generate a set of charts for another data set.
The code doesn’t have to change – just the data object I pass to it.
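A sketch of that reuse point: one plotting function, two same-shaped datasets. Nothing in the function changes – only the DataFrame passed to it. The column names and data are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs anywhere
import pandas as pd

def indicator_chart(df, title):
    """Plot any DataFrame that has 'year' and 'value' columns."""
    ax = df.plot(x="year", y="value", legend=False)
    ax.set_title(title)
    return ax

# Two datasets with the same shape; values invented for illustration.
uk = pd.DataFrame({"year": [2010, 2011], "value": [1.2, 1.5]})
fr = pd.DataFrame({"year": [2010, 2011], "value": [0.9, 1.1]})

ax1 = indicator_chart(uk, "UK indicator")
ax2 = indicator_chart(fr, "France indicator")  # same code, new data
```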
This is a similar principle to the use of the motion chart and iScatter chart demonstrated previously – if the data’s in the right form, the chart engine will display and operate the chart for you.
Another question that arises is how we might go about finding data stories that we want to make use of in our courses.
One approach is to find stories – or story types – that other people have found and made use of. The ONS – the Office for National Statistics – is a good source for such stories. They have recently started enriching their regular statements with video summaries of them. This recent one summarises the latest migration figures into and out of the UK, and can be found on YouTube.
One nice thing about this video is that the narrator talks over the construction of the chart. You might just be able to see how the blue line stops mid-way through the chart – in the actual video the lines grow in an animated way. This helps give the viewer the impression of how the corresponding indicator evolved over time. A voiceover narration further explains both the construction, and the statistical interpretation, of the chart.
A conversation with data can be built up around a series of queries made over one or more datasets. Each question asked of the dataset can be used to generate a summary data table or data visualisation. We have already seen how charts can be written – in a very real sense, the sentences we use to construct a chart ask questions of the data – ask it to present itself to us in a particular way that our pattern-recognising perceptual system can then help us interpret. Through trying to interpret the result, additional questions are likely to arise, or be prompted by the process of asking the previous question.
IPython notebooks provide an ideal environment for engaging in a data conversation. All steps can be recorded, questions can be slightly revised, and human readable text can be added to either provide an interpretation of one result, or the setting up of another question.
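A sketch of what such a conversation looks like in code: each question is a short query over the same DataFrame, and each answer prompts the next question. The dataset is invented for illustration.

```python
import pandas as pd

# A small invented indicator dataset to hold the conversation over.
df = pd.DataFrame({
    "country": ["GB", "GB", "FR", "FR", "DE", "DE"],
    "year":    [2010, 2011, 2010, 2011, 2010, 2011],
    "value":   [3.0, 3.5, 2.0, 2.6, 4.0, 3.8],
})

# Q1: what is the average value for each country?
by_country = df.groupby("country")["value"].mean()

# Q2 (prompted by the answer to Q1): which country changed most
# between the two years?
change = df.pivot(index="country", columns="year", values="value")
change["delta"] = change[2011] - change[2010]
biggest_mover = change["delta"].abs().idxmax()

print(by_country)
print(biggest_mover)  # → FR
```

In a notebook, each question sits in its own cell, with human readable text between the cells interpreting one result and setting up the next.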
Notebooks can be presented with all output cells cleared, and the reader can then play each cell as they read the document, generating the result of each data query as they do so. This may increase engagement with the data and encourage readers to either refine and reask a particular question in a slightly different way, or even ask their own questions.
It is both the availability of the data and an environment in which questions can be asked of it in an interactive, reproducible and, if not self-explanatory, at least transparent way that support direct reuse – in use – of the data.
In all the examples described, the datasets themselves are quite small. They might even be tiny – think of a bar chart comparing just two columns, to show their rank order and relative size. Two data points.
But what makes the data reusable? Four things, I think, are key:
the data should be used in a storyful way: that is, data should be interpreted and, where possible, the interpretation should be constructed in a sequence similar to that by which the process it measures might have generated it; this supports a retelling form of reuse – we remember stories and can retell them to others or to ourselves;
standardise data acquisition and analysis: if we can get a robust workflow, we can build tooling and support around it, as well as reusing things we have learned or used before;
generate charts in a reproducible way: when we construct charts, do so in a way that means they can be easily maintained, and regenerated from the original dataset or from an improved or more recent version of it;
use interactive charts with data inside: this supports reuse of the data in a direct way by the learner.