Data Journalism
Having Conversations with Data
Tony Hirst
Computing and Communications
The Open University
What is journalism?
[sensemaking]
Inverted Pyramid
What is data?
[a particular type of source]
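One way to see how a speech can become data, as the editor's notes suggest, is to split it into words and count each unique word. The following is a minimal sketch only: the sample text and the `word_counts` helper are illustrative assumptions, not material from the talk.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical snippet of a speech, made up for illustration.
speech = "Data is a source. A source answers questions, and data answers questions too."
counts = word_counts(speech)

# The most frequent words are the ones a tag cloud would draw largest.
for word, n in counts.most_common(3):
    print(word, n)
```

The resulting counts are exactly the table a tag cloud or word frequency diagram would be drawn from.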
Data is just another type of source…
What is data journalism?
Data journalism is investigative journalism that uses data as one of the sources
(We don’t have to produce a map or a chart in the final piece)
The practice of data journalism
find stories
tell stories
find stories
Anscombe’s Quartet
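To see why the quartet makes the case for charts, it may help to compute the usual summary statistics for each group. The sketch below uses the published Anscombe values, reshaped into the three-column (group, x, y) form described in the editor's notes; the helper names `mean` and `pearson` are my own, and this is an illustration rather than the code used for the slides.

```python
from math import sqrt

# Anscombe's Quartet (Anscombe, 1973): four groups of 11 (x, y) points.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Reshape to long form: one row per point, as (group, x, y).
rows = [(g, x, y) for g, (xs, ys) in quartet.items() for x, y in zip(xs, ys)]

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Summarise each group as (mean x, mean y, correlation), rounded to 2 places.
summary = {}
for g in quartet:
    xs = [x for grp, x, _ in rows if grp == g]
    ys = [y for grp, _, y in rows if grp == g]
    summary[g] = (mean(xs), round(mean(ys), 2), round(pearson(xs, ys), 2))
    print(g, summary[g])
```

The four summaries come out the same to two decimal places, yet the four scatterplots look nothing alike, which is the point of the dataset.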
THE BLOCKERS
Reading
Charts
Blocker
“Data points are just words, but when connected with a squiggly line they tell a story”
Christopher Brown, “Making Sense of Squiggly Lines”, 2011, ISBN 978-0-9832593-1-2
Lincoln Journalism Research Day - Data Journalism
“Conversations with data”
ouseful.info - A Wrangling Example With OpenRefine: Making “Oven Ready Data”
THE BLOCKERS
Skills
Blocker
Tidying data
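As a rough illustration of the kind of tidying involved, the sketch below strips percentage signs, thousands separators and stray spaces from cells so they can be treated as numbers. The rows and the `to_number` helper are hypothetical, invented for this example; the walkthrough cited in the notes uses OpenRefine rather than Python.

```python
def to_number(cell):
    """Turn a messy cell such as ' 1,234 ' or '45.2%' into a float; None if not numeric."""
    cleaned = cell.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None

# Hypothetical rows in the style of an election results table.
raw_rows = [
    {"ward": "Ward A", "votes": "1,234", "share": "45.2%"},
    {"ward": "Ward B", "votes": " 987 ", "share": "38.0%"},
    {"ward": "Ward C", "votes": "n/a", "share": ""},
]

# Keep the ward label as text and convert the numeric columns.
tidy_rows = [
    {"ward": r["ward"], "votes": to_number(r["votes"]), "share": to_number(r["share"])}
    for r in raw_rows
]
for row in tidy_rows:
    print(row)
```

Cells that cannot be read as numbers become `None` rather than being guessed at, so gaps in the source stay visible.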
Know Thy Tools…
Data Distributions
Outliers
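A small sketch of both ideas: binning values to see their distribution, and flagging outliers. The figures are invented for illustration, and the 1.5 × IQR cut-off (Tukey's rule of thumb) is one common convention I have assumed here, not a method named in the talk.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical claim totals in pounds, invented for illustration.
totals = [100, 120, 130, 150, 160, 170, 180, 900]

# Distribution: count how many totals fall in each 100-wide bin (a bin is a range).
bin_width = 100
histogram = Counter((t // bin_width) * bin_width for t in totals)

# Outliers: flag values more than 1.5 x IQR beyond the quartiles.
q1, _, q3 = quantiles(totals, n=4)
iqr = q3 - q1
outliers = [t for t in totals if t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr]

for start in sorted(histogram):
    print(f"{start}-{start + bin_width - 1}: {histogram[start]}")
print("outliers:", outliers)
```

A flagged value is a prompt for a further question, not a story in itself: it may be an error in the data as easily as something newsworthy.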
Inverted Pyramid
Data can confirm what we think we know
Data can surprise us and force us to rethink what we think we know
Trends and (anti)correlations...
Data makes most sense when contextualised
THE BLOCKERS
Skills
Blocker
[statistics]
(the art of looking at one number in the context of other numbers)
Asking Questions of Data
Data Distributions
Outliers
We need to teach people how to ask questions
We need to teach people how to read charts
SQL was developed as a navigational aid
intitle:yarl
filetype:pdf
inurl:http://www.justice.gov.uk/downloads/publications/inspectorate-reports/hmipris/immigration-removal-centre-inspections/
SELECT * FROM searchindex
WHERE title LIKE '%Yarl%'
AND filetype = 'pdf'
AND url LIKE 'http:.. /%'
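To make the parallel between search operators and SQL concrete, here is a minimal runnable sketch using Python's built-in `sqlite3` module. The `searchindex` table comes from the slide, but its columns, the rows and the `example.org` URLs are hypothetical stand-ins; no real search engine exposes its index this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searchindex (title TEXT, filetype TEXT, url TEXT)")

# Hypothetical rows; example.org stands in for the real site.
conn.executemany(
    "INSERT INTO searchindex VALUES (?, ?, ?)",
    [
        ("Yarl's Wood inspection report", "pdf", "http://example.org/reports/yarls-wood.pdf"),
        ("Yarl's Wood press notice", "html", "http://example.org/reports/yarls-wood.html"),
        ("Another inspection report", "pdf", "http://example.org/reports/other.pdf"),
        ("Yarl's Wood elsewhere", "pdf", "http://example.com/yarls-wood.pdf"),
    ],
)

# intitle: -> title LIKE, filetype: -> filetype =, inurl: -> url LIKE
hits = conn.execute(
    "SELECT title, url FROM searchindex "
    "WHERE title LIKE ? AND filetype = ? AND url LIKE ?",
    ("%Yarl%", "pdf", "http://example.org/reports/%"),
).fetchall()

for title, url in hits:
    print(title, url)
```

Each search operator narrows the result set in the same way as one `AND` clause does: only the row that satisfies all three conditions is returned.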
Reproducibility
tell stories
…but that’s another story…
tony.hirst@open.ac.uk
@psychemedia
blog.ouseful.info
Any questions?

Lincoln Journalism Research Day - Data Journalism

Editor's Notes

  1. Taking this opportunity to explore some of the issues associated with whatever this thing called “data journalism” is…
  2. I’m not a journalist, and don’t have any form of journalism training. But I do have an interest in ICT, and from that have an interest in “communication”. Let’s start with an easy(?!) question - what is journalism? One way of answering that question is to list some of the functions, or attributes, associated with it – informing, educating, holding to account, watchdog function, campaigning, contextualising for a particular audience.
  3. Sensemaking seems to me to be an important part of it… In part contextualisation, in part identifying the bits that make the difference, the bits that make it important, the bits that make it news that people need to know… …and often with a particular audience in mind.
  4. Critical judgement.
  5. Second question: what is data? National statistics, sports results, polls, financial figures, health data, school league tables, etc etc. Is a book data? Or a speech? What if I split a speech up into separate words, count the occurrence of each unique word and then display the result as a “tag cloud”, or word frequency diagram?
  6. One way of thinking about data is that it is a particular sort of source, or a source that can respond to a particular style of questioning in a particular way. Another take on this is that many “data sources” are experts on a particular topic, experts that know a lot of a very particular class of facts.
  7. One way of thinking about data is that it is a particular sort of source, or a source that can respond to a particular style of questioning in a particular way. Another take on this is that many “data sources” are experts on a particular topic, experts that know a lot of a very particular class of facts.
  8. So what is data journalism? If I was to ask you, the members of a school of journalism, “is this or that news article ‘journalism’?”, I imagine one response might be, “well… it’s the output of a journalistic process.” But if I point at a map with some markers on it and ask, “is this map ‘data journalism’?”, you might answer: yes. Or at least, that’s what many of the early job ads for data journalists implied.
  9. Sports journalism has sport as the topical contextual frame for some journalistic activity; political journalism has politics as the topical contextual frame; investigative journalism has a particular process as the contextual frame, a process that may be applied to particular topic areas. So for data journalism, does “data” relate to the topic or the process? Where we focus on data outputs, the implication is that the “topic” of data is the focus of the framing. But I think we need to reframe to consider the procedural role.
  10. So as a starting point, let’s frame the idea that data journalism is a process related epithet that implies one of the key sources in a journalistic activity is “data”.
  11. By focusing on this notion of data journalism as relating to process, we can then start to explore with a little bit more criticality what the practice of data journalism might involve that identifies it as such. That is, how is practice influenced by the fact that it must engage with “data as a source”?
  12. The inverted pyramid gives us one way of considering the data journalistic process, or at least identifying some of the steps involved in a data investigation. But there are many other ways of conceptualising the process – for example, finding stories and telling stories…
  13. When it comes to finding stories, do we: want to find stories in a dataset we are provided with, or use data to help draw out a story lead we have already been tipped off to?
  14. Anscombe’s Quartet is a toy dataset that first appeared in a 1973 paper by statistician Francis Anscombe. His paper – Graphs in Statistical Analysis – was based around the claim that “graphs are essential to good statistical analysis”.
  15. But this is where we start to hit some stumbling blocks.
  16. And a big stumbling block is one that is often denied in higher education, which is the provision of skills, as compared to “higher level conceptual or academic understanding”. There is an old saw that we become better writers through reading more. But how much time do you invest in reading charts? Really reading them? I came across this beautifully titled book a few weeks ago - “Making Sense of Squiggly Lines”. The blurb on the back summarises the situation well: “Data points are just words, but when connected with a squiggly line they tell a story”.
  17. In an ideal world, the process would be simple: have data, get story.
  18. But it’s not that simple. It’s more likely that we need to engage with the dataset to try to tease the stories out of it, or facts and relationships from it that we can use to support the claims we make in a narration of some sort of story that is at least supported by the data, or contextualises it in a narrative way that is hopefully “truthy”.
  19. One of the ways I like to work with data is to have a conversation with it – asking questions of it and then further questions based on the responses I get.
  20. Sometimes it looks at first as if we have data in a form where we might be able to do something with it – then we realise it needs cleaning and reshaping. For example, in this case we have percentage signs contaminating numbers, data organised in separate sections – but how do we get a “well behaved” view over data from all the wards – and different sorts of data: votes polled per candidate versus the size of the electorate in a particular ward for example. Walkthrough: http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/
  21. But this is where we start to hit some stumbling blocks.
  22. And a big stumbling block is one that is often denied in higher education, which is the provision of skills, as compared to “higher level conceptual or academic understanding”.
  23. Tidying data – or cleaning data – or more colloquially, “wrangling data” – refers to the process we need to engage in to turn a dataset we have found into one that is useable. Many published datasets are horrible. Really horrible. They don’t work as we might want or expect them to in the applications we tend to have to hand.
  24. Take producing data visualisations, for example: have data, produce visualisation. No. That’s like saying: have two hours of rambling conversation with source, have 200 word story with strong quotes. No. Just: no. It doesn’t work like that. Yes, there are powerful charting tools available BUT they require the data to be clean and tidy and to be in the right shape for the tool. But it typically isn’t.
  25. We have to wrangle it. Now wrangling is a technical job, and arguably a job for technicians – higher apprentices of the journalistic world – not graduate journalists. But I think our journalists are going to have to learn the equivalent of some machining in the mechanical world.
  26. Just by the by, I didn’t draw those block diagrams, I wrote them.
  27. I “wrote” these charts – you can see how at the top. That code – applied to a suitably shaped version of a dataset known as Anscombe’s Quartet. The data has been reshaped to 3 column format: a column for the x values, that are plotted on the horizontal x-axes; a column for the y values, that form the vertical y-axes; and a column for the groups, which specify which panel, or facet, each point should be plotted in. The code defines the construction of those charts. Exactly. There is no magic. At least, no other magic.
  28. One of the first datasets I played with was MPs’ expenses data. Here are a couple of ways I started to chat with it – imagine talking to someone who knows about *all* the expenses claims put in by every MP over a parliamentary session… (The charts were created using an online interactive tool developed by IBM called Many Eyes.) The bar chart is ordered, for a particular expenses area, by total amount for each individual MP. The block histogram shows how many MPs made a total claim in a particular expenses area of a particular binned value. (A ‘bin’ is a range.)
  29. Critical judgement – it applies to data too...
  30. One of the things to mention about data mapping and visualisation techniques is that they often tell us things we already (think we) know; in that sense, they are not news. But they may also tell us things we know in new, visually appealing ways. And by making use of such ‘confirmatory’ visualisations and displays we can build confidence within an audience that they know how to interpret these sorts of representation.
  31. As the audience becomes comfortable reading the charts and making sense of data, when there is something new or surprising in the data, the surprise manifests itself in the reading of the data or chart. For journalists working with data, developing a sense of familiarity with how to interpret and read data when it is just confirming what you already know helps to refine your senses for spotting things that are odd, noteworthy, or newsworthy. Taking a little bit of time each day to read charts as if they were stories, and to look behind the data to find original sources, such as polls or news releases containing data, and then compare the original release with the way it is reported (paying particular attention to the points that are highlighted, and how the data is contextualised), will help you develop some of the skills you will need if you want to be able to identify, develop and treat some of the stories that your specialist source that is data can provide you with, if only you ask…
  32. A scatterplot is another very powerful sort of chart – we can plot two sorts of value against each other to see if there are any groups, or trends. Some scatterplot tools allow you to size or colour nodes according to further dimensions. Colouring nodes by group (if sensible groups exist) can also help you see whether particular groups are clustered or group together in particular areas of the chart.
  33. Maps can be used to pull out different sorts of relationships – for example, plotting markers in the centre of each MP’s ward coloured by the total value of travel expenses claim in a particular area, we can easily see whether or not an MP is claiming an amount significantly different to MPs in neighbouring wards. In this case – travel expenses – we might expect (at first glance at least) a homophilitic effect – folk a similar distance away from Westminster should presumably make similar sorts of travel claim? At second glance, we might then start to refine our questioning – does ward size (in terms of geographical area) or rurality have an effect? Does an MP travel to and from home more than neighbours (or perhaps claim more in terms of accommodation in London?)
  34. Sometimes we need to provide quite a lot of explanation when it comes to making sense of even a simple data visualisation – “what am I supposed to be looking at?”
  35. The other way of using data is to tell stories. But what does that even mean…?
  36. The other way of using data is to tell stories. But what does that even mean…?
  37. In passing, it’s worth mentioning that one thing statistics does is help provide context. Is this number a big number in the greater scheme of things? Is this thing likely to happen by chance or is there a meaningful causal relationship between this thing and another thing? The chart in the corner is a reminder about how surprising probabilities can be. The chart shows the probability (y-axis) that two people share a birthday (the number of people is given on the x-axis). The chart shows that if there are 23 or more people in a room, there is more than a 50/50 chance that two of them will share a birthday (that is, share the same birth day and month, though not necessarily same birth year). How many people are in the room? If it’s more than 23 – I bet that at least two people share a birthday (at least in terms of day and month).
  38. One of the first datasets I played with was MPs’ expenses data. Here are a couple of ways I started to chat with it – imagine talking to someone who knows about *all* the expenses claims put in by every MP over a parliamentary session… (The charts were created using an online interactive tool developed by IBM called Many Eyes.) The bar chart is ordered, for a particular expenses area, by total amount for each individual MP. The block histogram shows how many MPs made a total claim in a particular expenses area of a particular binned value. (A ‘bin’ is a range.)
  39. The other way of using data is to tell stories. But what does that even mean…?
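
The birthday chart described in note 37 can be reproduced with a few lines of arithmetic. This is a minimal sketch under the usual simplifying assumptions, which are mine rather than the talk's: 365 equally likely birthdays and no leap years. The function name `shared_birthday_probability` is illustrative.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday (day and month)."""
    # Work out the chance that everyone's birthday is different,
    # then take the complement.
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

# Tabulate the curve around the 50/50 point.
for n in (10, 22, 23, 30, 50):
    print(n, round(shared_birthday_probability(n), 3))
```

With 22 people the probability is still just under a half; the 23rd person tips it over, which matches the threshold quoted in the note.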