17. The data should be used in a storyful way
Standardise data acquisition and analysis
Generate charts in a reproducible way
Use interactive charts with data inside
Supporting Reuse
Editor's Notes
Telling stories with data about courses relates to learning analytics.
Telling stories with data within courses relates to using data in some way as part of the course experience.
I’m not going to talk about the first of those – data about courses – rather I’ll be focusing on the use of data within courses.
This in turn might work in one of two ways – we can use data about progress through the course to influence the learner’s experience of the course, either directly, through displaying content that is in some way influenced by prior progress, or more indirectly, for example by showing a student a dashboard relating to their progress through the course.
Or we might use data as subject matter for the course material itself.
Again, I’m not going to focus on the use of data generated as part of the learning analytics process – instead I’ll be concentrating on how we can use subject related data as part of the course material.
In a story map, we make use of a map to help us illustrate or tell a story. As different locations are mentioned in the story, we highlight them on a map.
“This example, The Russia Left Behind, from the New York Times, tells the story of a 12 hour drive from St. Petersburg to Moscow. Primarily a textual narrative, with rich photography and video clips to illustrate the story, an animated map legend traces out the route as you read through the story of the journey. Once again, the animated journey line gives you a sense of moving through the landscape as you scroll through the story.” - See more at: http://schoolofdata.org/2014/08/25/seven-ways-to-create-a-storymap/
There are many other kinds of storymap, with software libraries readily available to generate them, that allow you to achieve similar, if slightly less polished, effects. Typically, they use Google Maps or OpenStreetMap to display the map, on which markers or lines are placed. Various styles of story map are possible – a fixed window with an image or text and image carousel that highlights a new location as each image is brought in to view is a popular one.
The data requirements for this sort of display are minimal – just the locations you want to map, and a point in the text from which you want to trigger the display, or movement towards, that particular location.
For carousel style storymaps, no more than four columns of data in a spreadsheet will do the trick – a required one, containing locations, and up to three optional ones: one containing a label to display above the marker; one containing a link to an image to show; and one containing any descriptive text to display when the marker is selected.
As to how to know where to place the marker on a map from a place name – an operation that typically requires latitude and longitude data for the location – computers are quite capable of helping with that geocoding operation for you!
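To make the data requirement concrete, here is a minimal sketch of the four-column storymap spreadsheet as a pandas DataFrame, with a hand-keyed geocoding step. The column names, image paths and text are illustrative, not taken from any particular storymap library; a real workflow might call an online geocoding service (for example, the geopy library’s Nominatim geocoder) rather than a lookup table, but the lookup keeps the sketch runnable offline.

```python
import pandas as pd

# The four-column storymap spreadsheet: one required column (location)
# and three optional ones (label, image, text). Values are illustrative.
stops = pd.DataFrame({
    "location": ["St. Petersburg", "Novgorod", "Tver", "Moscow"],
    "label":    ["Start of the drive", "Old trading town",
                 "On the M10", "Journey's end"],
    "image":    ["img/stpete.jpg", "img/novgorod.jpg",
                 "img/tver.jpg", "img/moscow.jpg"],
    "text":     ["Setting off...", "A stop along the way...",
                 "Nearly there...", "Arrival."],
})

# Geocoding: turn each place name into latitude/longitude. A hand-keyed
# lookup stands in for an online geocoding service here.
coords = {
    "St. Petersburg": (59.94, 30.31),
    "Novgorod": (58.52, 31.27),
    "Tver": (56.86, 35.91),
    "Moscow": (55.76, 37.62),
}
stops["lat"] = stops["location"].map(lambda place: coords[place][0])
stops["lon"] = stops["location"].map(lambda place: coords[place][1])
print(stops[["location", "lat", "lon"]])
```

A carousel-style storymap library would then consume this table directly, placing one marker per row.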
Many of you will be familiar with the public engagement work of OU honorary graduate, Professor Hans Rosling, seen here in BBC2’s Don’t Panic programme that we co-produced a year or so ago.
Hans Rosling’s data narration approach demonstrates how we can tell a story through the use of animated data, in which data points corresponding to measurements taken at particular points in time are replayed using a time slider.
As part of a set of OpenLearn materials I’m drafting around several short videos commissioned from Hans Rosling following Don’t Panic – materials that might also be referred to from OU courses currently in production – I’ve started looking at using browser based interactive charts to support the materials.
Whilst Rosling’s Gapminder foundation does publish the Gapminder/motion chart tool for browser or desktop use, it does require either Flash or Java support – which means it doesn’t work on a tablet.
And whilst there is lots of data “built in” to the Gapminder tool – as well as an option to use your own data with it – the workflow for getting your own data into Gapminder, in the format that Gapminder expects, is a bespoke workflow for that tool.
Here’s another chart, this one based on the iScatter charting library developed by Michel Wermelinger from my department, Computing and Communications, and colleagues in LTS.
This chart carries with it several sets of time series data, from the World Bank’s Development Indicators database. The data is in a wide format – that is, we have one column for time, a column that identifies a grouping – in this case, country – and separate columns for each indicator.
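A small sketch of that wide shape, built in pandas; the indicator names echo World Bank Development Indicators, but the values are made up for illustration.

```python
import pandas as pd

# Wide format: one column for time, one identifying the grouping
# (country), and one column per indicator. Values are invented.
wide = pd.DataFrame({
    "year":            [2010, 2010, 2011, 2011],
    "country":         ["GB", "FR", "GB", "FR"],
    "gdp_per_capita":  [38000, 40000, 39000, 41000],
    "life_expectancy": [80.4, 81.7, 80.8, 82.0],
})
# Each row is one (year, country) observation; each indicator column can
# be bound to an x- or y-axis by a chart such as iScatter.
print(wide)
```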
As well as selecting which data column is plotted against each axis, we can also set the axis scale type – linear, logarithmic, and so on.
The interesting thing about both these charts is that they can be constructed from a simple combination of a data file – as long as the data has the correct shape – and a configuration file that includes things such as the chart title, what values appear on which axis to start with, what the axis scale types are, and so on.
It’s also interesting because it means we can generate the charts from a workflow that sources the data, cleans it if required, reshapes it to the form the chart expects, and then saves it as a data-and-configuration bundle that the chart can then display. It’s then down to the chart to provide the interactivity, within the context of the chart, that the user can avail themselves of.
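That workflow can be sketched in a few lines of pandas. The configuration keys and long-format starting shape here are illustrative assumptions, not any particular charting library’s actual format:

```python
import json
import pandas as pd

# 1. Source: long-format data, as it might arrive from a statistics API.
#    Values are invented for illustration.
long_df = pd.DataFrame({
    "year":      [2010, 2010, 2011, 2011],
    "country":   ["GB", "GB", "FR", "FR"],
    "indicator": ["gdp", "life_exp", "gdp", "life_exp"],
    "value":     [38000, 80.4, 41000, 82.0],
})

# 2. Reshape to the wide form the chart expects: one column per indicator.
wide = long_df.pivot_table(index=["year", "country"],
                           columns="indicator",
                           values="value").reset_index()

# 3. Bundle: serialise the data alongside a small configuration file.
config = {"title": "GDP vs life expectancy",
          "x": "gdp", "y": "life_exp",
          "xscale": "log", "yscale": "linear"}
csv_text = wide.to_csv(index=False)
config_text = json.dumps(config)
```

A chart engine that accepts this data-and-configuration bundle can then render and operate the chart without any further code from the author.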
So how might we do that?
A workflow I’ve started exploring recently – originally stemming from the requirement to find an environment for working with data in a programmatic way for the new computing course in data, TM351 – is a technology known as IPython notebooks, and a programming library for the Python language called pandas.
An IPython notebook is a computational notebook – you can write text in it, and you can also write programming code in it, then execute that code and display the output from the computation.
IPython notebooks are properly beautiful – and they’re gaining a lot of interest from researchers because they provide a powerful tool for generating literate computer programs – ones that are self-explaining in a human readable way – and reproducible research: the notebook is ideally self-contained, telling you how to get the data, process it, analyse it and visualise it.
This example shows how I can use a ‘remote data’ service built into pandas to retrieve a data set from the World Bank Indicator Data website. The data comes directly from the World Bank and is made available in a form known as a DataFrame that I can immediately start working with. I might also save the data so that I have a local copy of it, and then work from that instead. But I’d also know how I got that local copy of the data in the first place…
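A sketch of that step. The actual remote call lived in pandas’ remote data access module at the time (and now in the separate pandas-datareader package, e.g. its World Bank `download` function); to keep this sketch runnable offline, a locally built DataFrame stands in for the fetched result, mimicking the (country, year) indexed shape such a download returns, and the values are invented.

```python
import io
import pandas as pd

# Stand-in for a World Bank indicator download: a (country, year)
# MultiIndex with one column per requested indicator code.
# Values are invented for illustration.
fetched = pd.DataFrame(
    {"NY.GDP.PCAP.KD": [38000.0, 39000.0, 40000.0, 41000.0]},
    index=pd.MultiIndex.from_product(
        [["United Kingdom", "France"], [2010, 2011]],
        names=["country", "year"]),
)

# Keep a local copy to work from -- the code above still records how
# the data was obtained in the first place.
buffer = io.StringIO()
fetched.to_csv(buffer)
buffer.seek(0)
local_copy = pd.read_csv(buffer, index_col=["country", "year"])
print(local_copy)
```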
Any charts that are generated are embedded in the notebook as the result of running a computation. The chart doesn’t initially exist as an image – the image is generated by running the code. The diagram is written, not drawn. We can delete the image and we have lost nothing – we can simply rerun the code and regenerate the image.
From a reuse point of view, this is important for two reasons. Firstly, I can tweak the chart and regenerate it. My chart definition could include a title or styling information that changes the look of the chart, for example, putting it into a chart style used by a particular publication, such as the Economist, or this nice white theme. Notice the two extra lines I added to my original chart definition – one to add the title, the other to change the styling.
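A minimal sketch of that tweak-and-regenerate step in matplotlib. The `ggplot` stylesheet ships with matplotlib and stands in here for a publication theme; the data is invented.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"year": [2010, 2011, 2012],
                   "value": [3.1, 3.4, 3.9]})

plt.style.use("ggplot")              # extra line 1: restyle the chart
ax = df.plot(x="year", y="value", legend=False)
ax.set_title("Indicator over time")  # extra line 2: add a title
buf = io.BytesIO()
ax.figure.savefig(buf, format="png") # regenerate the image from the data
```

Delete the saved image and nothing is lost: rerunning these lines regenerates it, restyled or retitled as required.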
In fact, there is a library available that can make a good attempt at generating an interactive chart from this one – a chart that pops up values when you hover over points, for example – simply by adding one more line of code that takes the chart object and works out what it needs to do to generate what we might term an interactive web chart.
Secondly, I can take the code that generated this chart and - if I have a data set that has a similar shape to the one I use to generate this chart – generate a set of charts for another data set.
The code doesn’t have to change – just the data object I pass to it.
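A sketch of that reuse point: one plotting function, two same-shaped datasets. Nothing in the function changes – only the DataFrame passed to it. The column names and data are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs anywhere
import pandas as pd

def indicator_chart(df, title):
    """Plot any DataFrame that has 'year' and 'value' columns."""
    ax = df.plot(x="year", y="value", legend=False)
    ax.set_title(title)
    return ax

# Two datasets with the same shape; values invented for illustration.
uk = pd.DataFrame({"year": [2010, 2011], "value": [1.2, 1.5]})
fr = pd.DataFrame({"year": [2010, 2011], "value": [0.9, 1.1]})

ax1 = indicator_chart(uk, "UK indicator")
ax2 = indicator_chart(fr, "France indicator")  # same code, new data
```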
This is a similar principle to the use of the motion chart and iScatter chart demonstrated previously – if the data’s in the right form, the chart engine will display and operate the chart for you.
Another question that arises is how we might go about finding data stories that we want to make use of in our courses.
One approach is to find stories – or story types – that other people have found and made use of. The ONS – the Office for National Statistics – is a good source for such stories. They have recently started enriching their regular statements with video summaries of them. This recent one summarises the latest migration figures into and out of the UK, and can be found on YouTube.
One nice thing about this video is that the narrator talks over the construction of the chart. You might just be able to see how the blue line stops mid-way through the chart – in the actual video the lines grow in an animated way. This helps give the viewer the impression of how the corresponding indicator evolved over time. A voiceover narration further explains both the construction, and the statistical interpretation, of the chart.
A conversation with data can be built up around a series of queries made over one or more datasets. Each question asked of the dataset can be used to generate a summary data table or data visualisation. We have already seen how charts can be written – in a very real sense, the sentences we use to construct a chart ask questions of the data – ask it to present itself to us in a particular way that our pattern-recognising perceptual system can then help us interpret. Through trying to interpret the result, additional questions are likely to arise, or be prompted by the process of asking the previous question.
IPython notebooks provide an ideal environment for engaging in a data conversation. All steps can be recorded, questions can be slightly revised, and human readable text can be added to either provide an interpretation of one result, or the setting up of another question.
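A sketch of what such a conversation looks like in code: each question is a short query over the same DataFrame, and each answer prompts the next question. The dataset is invented for illustration.

```python
import pandas as pd

# A small invented indicator dataset to hold the conversation over.
df = pd.DataFrame({
    "country": ["GB", "GB", "FR", "FR", "DE", "DE"],
    "year":    [2010, 2011, 2010, 2011, 2010, 2011],
    "value":   [3.0, 3.5, 2.0, 2.6, 4.0, 3.8],
})

# Q1: what is the average value for each country?
by_country = df.groupby("country")["value"].mean()

# Q2 (prompted by the answer to Q1): which country changed most
# between the two years?
change = df.pivot(index="country", columns="year", values="value")
change["delta"] = change[2011] - change[2010]
biggest_mover = change["delta"].abs().idxmax()

print(by_country)
print(biggest_mover)  # → FR
```

In a notebook, each question sits in its own cell, with human readable text between the cells interpreting one result and setting up the next.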
Notebooks can be presented with all output cells cleared, and the reader can then play each cell as they read the document, generating the result of each data query as they do so. This may increase engagement with the data and encourage readers to either refine and reask a particular question in a slightly different way, or even ask their own questions.
It is both the availability of the data and an environment in which questions can be asked of it in an interactive, reproducible and, if not self-explanatory, at least transparent way that support direct reuse – in use – of the data.
In all the examples described, the datasets themselves are quite small. They might even be tiny – think of a bar chart comparing just two columns, to show their rank order and relative size. Two data points.
But what makes the data reusable? Four things, I think, are key:
the data should be used in a storyful way: that is, data should be interpreted and, where possible, the interpretation should be constructed in a sequence similar to that by which the process it measures might have generated it; this supports a retelling form of reuse – we remember stories and can retell them to others or to ourselves;
standardise data acquisition and analysis: if we can get a robust workflow, we can build tooling and support around it, as well as reusing things we have learned or used before;
generate charts in a reproducible way: when we construct charts, do so in a way that means they can be easily maintained, and regenerated from the original dataset or from an improved or more recent version of it;
use interactive charts with data inside: this supports reuse of the data in a direct way by the learner.