Data storytelling is about connecting these three dots: form, data and story. In which order should they be connected?
The most natural way to proceed is to start from a given form, then add data, then find something to comment on. The story becomes a byproduct of the visualization.
For instance, I choose an Excel chart type, put all the data I can find into it, and try to find something to say. Or I want to do a Gapminder/Trendalyzer visualization, so I load a data file into the tool. I get a functioning visualization, and I can try to comment on what is going on.
Another approach is to start with the story, then find data that support the story, and, finally, decide on the form.
My approach is about starting with the story.
This explains my workflow.
Here are some examples of material that I cannot work with.
Commentary on new data. If the unemployment rate goes from 9.6% to 9.7%, that in itself doesn’t make a story. I can certainly make a chart about it, but I can’t really offer insight.
Anecdotes, or factoids which may be interesting to hear but have no general implication. For example, although most people believe that the USA is the country most affected by obesity, as two-thirds of its adult population is overweight or obese, there are micro-countries where this rate is over 90%.
Technical information or jargon. I try not to talk about our more technical indicators unless I can clearly explain how relevant they are.
Self-promotion. The story comes before the author.
Here are a few examples of stories. The idea here is that we start from a theme, gather all the literature we have on it, which can be quite extensive, then « crunch » that into one or several short messages. When writing the messages we try to make them relate to the reader: we avoid large absolute numbers in favor of per-capita ratios or growth rates, for instance. We never assume the reader knows what is « much » or what is « little ».
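As a rough illustration of turning a large absolute number into something a reader can relate to, here is a minimal Python sketch. The function names and all figures are hypothetical, chosen only to show the arithmetic; they are not OECD data and do not reproduce any calculation behind the messages in this talk.

```python
def per_person_per_day(annual_total, population, days_per_year=365):
    """Convert an annual total into an amount per person per day."""
    return annual_total / population / days_per_year


def growth_rate(old_value, new_value):
    """Return the relative change from old_value to new_value as a fraction."""
    return (new_value - old_value) / old_value


# Hypothetical figures, for illustration only.
annual_total = 73e9   # e.g. a yearly cost, in some currency unit
population = 1e9      # number of people sharing that cost

daily_share = per_person_per_day(annual_total, population)
change = growth_rate(200, 250)

print(f"{daily_share:.2f} per person per day")
print(f"{change:.0%} growth")
```

A reader has no way to judge whether 73 billion is « much » or « little », but a small daily amount per person, or a percentage change, can be compared with everyday experience.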
I am very frugal when it comes to data; I’ll explain why later.
This chart makes its point with only 20 data points, and it could have been fewer. It is adapted from existing published charts which are busier (e.g. showing alternative scenarios, or breaking down the emissions by fuel type or region…). This additional data doesn’t help the user get the point of the chart (ideally, pre-attentively).
This chart appears at http://blog.oecdfactblog.org/?p=281
In this example I give more data, for a couple of reasons. First, if I can give data for all regions of the world, it reinforces the credibility of the data I give for China. Second, if I can get people to explore the economic history of China, they can also look at the history of another region without detracting from the main story.
The URL for this chart is http://blog.oecdfactblog.org/?p=173
For this last step, there are several principles I follow for all of my charts.
For my work I am most definitely in push mode (the gentleman in purple). I only have a very tiny regular audience (subscribers, people who come regularly to the site…). Most of the traffic comes from people I have been able to « hook » in a fraction of a second with a very simplified chart, such as what can be seen
I have 1 second, probably less, to get the attention of viewers on the main OECD web site (http://oecd.org). The idea is to make this chart on the side of the article seem interesting enough to get a click. For this to happen the chart, the title etc. must be very simple, and should be understandable with no effort.
I cannot afford to confuse users. This is another reason why I only use as much data as I need. If the « right » data can be manipulated too much by an uninformed user, it can be misinterpreted. I want my chart to answer the one question I designed it for. I only give more data if it serves a purpose.
I won’t omit data that disprove my point. If there are facts that render my chart invalid, I won’t run it. That being said, I won’t feature data that doesn’t support my point well enough.
Here we are completely in the « overview ». We may dip into « zoom and filter » and « details on demand » but the focus is on exposition, not exploration.
The example I showed comes from http://blog.oecdfactblog.org/?p=139. On this chart all a user can do is highlight the years (and get the exact values) or choose to show more countries, which don’t appear by default. It’s interesting to provide some interaction on the web, but for this example more is not needed.
I always include the data of my charts in my posts, so I can let users create their own visualizations from this data and reproduce the chart if they want to.
Typical articles get about 10,000 views. I don’t think it’s a great measure of success. However, especially with the data included, a post can be replicated elsewhere, with or without a link to the source and with or without credit. This happens « a lot » (which I cannot measure accurately). What this means is that an idea can find a new home, and travel to new audiences.
Transcript of "Visweek tswd jerome cukier"
TELLING STORIES WITH DATA
Jerome Cukier, OECD
Hi, I’m Jerome Cukier.
I work at OECD, an international agency which creates a lot of data (economics, society, environment…).
I publish a blog where I tell stories with data.
What is not a story
What is a story?
Simple. Interesting. Relevant.
What is a story?
A theme, a message:
Climate change: We can stop global warming today for 11 cents per person per day.
Improving health: A baby born today will live 6 hours more than a baby born yesterday.
China of 200 years ago was much, much more powerful.
Getting data, selecting data (only what I need to make my point).
(This is a schema of the model OECD
uses to forecast greenhouse gas