TAU
The data units
We’re not in Kansas anymore
1. The journey to data journalism
2. Magicians and unicorns
3. Behind the curtain
https://www.bbc.co.uk/news/uk-england-40273193
https://www.bbc.co.uk/sport/football/44981103
https://www.bbc.co.uk/news/uk-56420186 https://www.bbc.co.uk/news/uk-59594712
● Training
● Journalism
● Events
● Spreadsheets
● Learn to use Flourish
● Google Sheets
● Sourcing data
● The investigative story method
● How to clean datasets
● Building your own dataset
● How to plan an investigation
● Using statistics
● How to map your data
● Finding stories in financial data
● Merging datasets
● Finding stories in procurement data
● Advanced internet search
● Using the Office for National Statistics
● Find the right data for your beat
● Using Freedom of Information laws
500+
Journalists trained during the past two years
1,700+
Stories generated across partner titles
(Numbers)
Bisiani et al (2023) The Data Journalism Workforce: Demographics, Skills, Work Practices, and Challenges in the Aftermath of the COVID-19 Pandemic
https://github.com/bbc-data-unit
BBC England data unit
IDEA
DATA
STORY
Start with an idea — or with data
Case study: university artwork
https://github.com/BBC-Data-Unit/university-artwork
https://datajournalism.com/read/longreads/data-journalism-ideas
Case study: homelessness
https://github.com/search?q=topic%3Ahomelessness+org%3ABBC-Data-Unit+fork%3Atrue&type=repositories
https://github.com/BBC-Data-Unit/homelessness-real-figure
https://datajournalism.com/read/longreads/data-journalism-ideas
https://github.com/BBC-Data-Unit/dead-whales
Case studies: context to news stories, liveblogs, ‘in numbers’
https://github.com/BBC-Data-Unit/dead-whales / https://github.com/BBC-Data-Unit/blue-plaques
https://datajournalism.com/read/longreads/data-journalism-ideas
Story source | Story type | Timescale/angle
Data release (public) | Diary story | 1 day max*. Angles: change, scale, ranking
Data release (press release) | Data story: check quality of data | 1 day max. Angles: change, scale, ranking
National data story (or data story in other locality) | Local angle data story: same data but local angle | 1 day max. Angles: change, scale, ranking
Event-based news story (something happens) | Analysis/backgrounder putting event into context | 1-3 days. Angles: scale, change, exploratory
Event-based news story (something happens) | FOI story | 5-12 weeks. Angles: scale, change, ranking, variation
Idea/tip-off | Data request/FOI story | Timescale and angles vary
Idea/tip-off | Investigation | Timescale and angles vary
*Plan ahead by checking previous releases
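The three angles that recur in the table (change, scale, ranking) each map onto a simple calculation. The sketch below is illustrative only: the function names and the figures in the usage note are hypothetical, not taken from any BBC data unit story.

```python
def percent_change(previous, latest):
    """Change angle: percentage change between two periods."""
    if previous == 0:
        raise ValueError("Percentage change is undefined when the starting value is 0")
    return (latest - previous) / previous * 100


def rate_per_100k(count, population):
    """Scale angle: put a raw count in proportion to population."""
    return count * 100_000 / population


def rank_by_change(figures):
    """Ranking angle: order areas by percentage change, biggest rise first.

    `figures` maps an area name to a (previous, latest) pair.
    """
    changes = {area: percent_change(prev, latest)
               for area, (prev, latest) in figures.items()}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)
```

For example, `rank_by_change({"Area A": (200, 150), "Area B": (100, 130)})` puts Area B (up 30%) ahead of Area A (down 25%). Ranking on rates rather than raw counts is usually the fairer comparison between areas of different sizes.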
Case study: library cuts
Data locked in PDFs
https://www.parliament.uk/business/publications/written-questions-answers-statements/written-question/Commons/2016-02-11/27175/
https://datajournalism.com/read/longreads/data-journalism-ideas
Case study: unsolved crime
DATA.POLICE.UK: >2,400 CSVs, 37 million rows
https://github.com/paulbradshaw/commandline/blob/master/movingfiles.md
https://cloud.google.com/bigquery/what-is-bigquery
https://github.com/BBC-Data-Unit/unsolved-crime
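When a release is split across thousands of CSVs, one common first step is to stack them into a single file that a tool such as BigQuery can load. The following is a minimal sketch of that step, not the unit's actual workflow: the function name `combine_csvs` is hypothetical, and it assumes every file shares the same header row.

```python
import csv
from pathlib import Path


def combine_csvs(folder, out_path):
    """Stack every CSV under `folder` into one file with a single header.

    Adds a `source_file` column so each row can be traced back to its file.
    Returns the number of data rows written.
    Assumes all files share identical columns (an assumption, not a given).
    """
    out_path = Path(out_path)
    # List the inputs before creating the output, and never read the output itself.
    files = sorted(p for p in Path(folder).rglob("*.csv")
                   if p.resolve() != out_path.resolve())
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in files:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header + ["source_file"])
                elif file_header != header:
                    raise ValueError(f"Unexpected columns in {path}")
                for row in reader:
                    writer.writerow(row + [path.name])
                    rows_written += 1
    return rows_written
```

Reading row by row keeps memory use flat, which matters at tens of millions of rows. Comparing the returned row count with the total expected from the source files is a cheap check that nothing was dropped.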
Case studies: pandemic bankruptcies, police misconduct and data-driven paths
https://github.com/BBC-Data-Unit/police_misconduct
https://www.policeconduct.gov.uk/news/gmp-officer-arrested-following-allegations-he-abused-his-position
https://onlinejournalismblog.com/2020/08/11/here-are-the-7-types-of-stories-most-often-found-in-data/
https://github.com/BBC-Data-Unit/football-finances
https://onlinejournalismblog.com/2020/01/29/how-to-plan-a-journalism-project-that-needs-data-entry/
https://github.com/BBC-Data-Unit/social-media-ab
se https://www.bbc.co.uk/news/uk-63330885
● Source: 3m tweets collected using Twint + Docker
● Quantify: keyword frequency, n-grams, topic modelling. Tried APIs to classify likelihood of toxicity, plus manual checks
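Keyword and n-gram frequency, the first steps in the "Quantify" bullet, can be sketched with the standard library alone. This is an illustration of the technique rather than the code used for the story: the function name `ngram_counts` and the simple tokenising rule are assumptions.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2):
    """Count runs of `n` consecutive words across a collection of texts.

    Tokenising is deliberately crude (lowercase letters, digits and
    apostrophes); real tweets would need handling for mentions, URLs and emoji.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Zipping the token list against shifted copies of itself yields n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts
```

Calling `ngram_counts(tweets, n=1)` gives plain keyword frequency, and `.most_common(20)` on the result lists the most frequent phrases to check manually.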
https://pudding.cool/2020/07/gendered-descriptions/
https://pudding.cool/2018/07/women-in-parliament/
https://github.com/BBC-Data-Unit/child-speech
Counter-hypothesis checking
https://github.com/BBC-Data-Unit/academytransparency
https://github.com/BBC-Data-Unit/lockdown-gambling
“Proxy data [is] data which acts as a proxy for the thing you are looking for. Air pollution data, for example, can be a proxy for transport activity; energy consumption data can be a proxy for economic activity; waste collection data is a proxy for people moving away or working elsewhere. A spike in people dying at home can raise questions about what that indicates. Social media chatter and search trends are regularly used as proxies for behaviour, too.”
https://datajournalism.com/read/longreads/brainstorm-covid-19-data-story-ideas
Bisiani et al (2023) The Data Journalism Workforce: Demographics, Skills, Work Practices, and Challenges in the Aftermath of the COVID-19 Pandemic
https://public.flourish.studio/visualisation/3425413/
https://public.flourish.studio/visualisation/6264286/
What next?
Parameterisation: auto-generated websites to help partner reporters report stories in their area
Generative AI-augmented code: writing regex, Python, etc. using ChatGPT et al.
Flourish charts: moving away from internal BBC charts tool
Counter-hypotheses: more systematic checking
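Parameterisation, as listed above, means writing one template and filling it with each area's figures. A minimal sketch of the idea follows, assuming a hypothetical row layout (`area`, `count`, `population`) and hypothetical wording; it is not the unit's actual site generator.

```python
from string import Template

# Hypothetical wording; a real template would be written per story.
SUMMARY = Template(
    "$area recorded $count cases, a rate of $rate per 100,000 people, "
    "$comparison the national rate of $national."
)


def local_summary(row, national_rate):
    """Fill the template with one area's figures and a national comparison."""
    rate = row["count"] * 100_000 / row["population"]
    if rate > national_rate:
        comparison = "above"
    elif rate < national_rate:
        comparison = "below"
    else:
        comparison = "in line with"
    return SUMMARY.substitute(
        area=row["area"],
        count=f"{row['count']:,}",
        rate=f"{rate:.1f}",
        comparison=comparison,
        national=f"{national_rate:.1f}",
    )
```

Looping this over every row in a dataset produces one localised paragraph per area, which a reporter can then check against the source figures before publishing.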
https://journalistsresource.org/home/confirmation-bias-strategies-to-avoid-it/
https://onlinejournalismblog.com/2017/08/07/10-principles-for-data-journalism-in-its-second-decade/
Questions?
@PaulBradshaw, Birmingham City University
Online Journalism Blog, BBC Shared data unit
https://datajournalism.com/read/longreads/data-journalism-ideas

Working on data stories: different approaches