Big Data & Journalism 
MAC201 
twitter/rob_jewitt 
robert.jewitt@sunderland.ac.uk 
1
2009 #iranelection 
Image: Gilad Lotan, ReTweet Revolution 
2
Anatomy of a tweet 
3
Overview 
 Intro 
 Database Journalism and Computer Assisted Reporting 
 Data Today : Visualisations and Interactivity 
 How To Be A Data Journalist 
 Ethics? 
4
Recent hype 
 Data Journalism 
 Meta Journalism 
 Visualisation 
 Infographics 
 Mash Ups 
5
Adam Westbrook 
 “I think data-driven journalism is one of the big potential 
growth areas in the future of journalism. A lot of the forward-thinking 
discussion about the future of news focuses on the 
‘glamorous’ possibilities, like video journalism and 
interactivity, but I often see data journalism being ignored. 
 In fact, I believe it is journalism in its truest essence: 
uncovering and mining through information the public do 
not have enough time to do themselves, interrogating it, 
and making sense of it before sharing it with the audience. If 
more journalists did this (rather than relying on ‘data’ from 
press releases) we would be a far more enlightened public. 
6 
Adam Westbrook 
 My message to the next generation of journalists - or any 
journalist looking for a new niche or direction - would be 
to learn the skills and tools of data interrogation. It’s not 
glamorous, but it’s a skill not many journalists have, and 
one which will give one an edge in the market.” 
7 
Brian Storm 
 One of our big goals in the storytelling process is to 
humanize the statistics. It’s hard for people to care about 
numbers, especially large numbers. How do you get your 
head around the death of 800,000 people in the 
Rwandan genocide? I think if you meet the individuals - 
see and hear the stories of the survivors - you can gain a 
better insight into the tragedy. 
8 
“Data-driven journalism is the future” 
 “[Journalism’s] going to be about poring over data 
and equipping yourself with the tools to analyse it 
and picking out what's interesting. And keeping it in 
perspective, helping people out by really seeing 
where it all fits together, and what's going on in the 
country.” 
 Sir Tim Berners-Lee, inventor of the Web, 2010 
9
Origins 
 1950s 
 Database Journalism 
 Computer Assisted Reporting (CAR) 
 Very expensive 
10
The Indianapolis Star 
11 
Capital Journal circa 1961
New York Times News Room 
12
CBS: 1952, Walter Cronkite 
 Presidential election battle 
 Eisenhower vs Stevenson 
 Remington Rand UNIVAC 
 Early vote returns analysis 
 Predicted a landslide victory 
 Contrary to popular opinion 
13
Philip Meyer, Precision Journalism 
 1969: a journalist must make use of databases 
and surveys 
 2002: “a journalist has to be a database 
manager” 
14
Other notable examples 
 Clarence Jones, The Miami Herald, 1969 
 Criminal justice systems 
 David Burnham, The New York Times, 1972 
 Police crime rates 
 Elliot Jaspin, The Providence Journal, 1986 
 School bus drivers and criminal records 
 Bill Dedman, The Atlanta Journal-Constitution, 1988 
 Pulitzer Prize for The Color of Money 
15
Not Database – Just Data? 
16
17
18
19 
Since 2004
20
Adrian Holovaty (2005) 
 Chicago Transit Authority map + Firefox plug-in + Google Maps = real-time updates 
 Chicago Police Department + Google Maps = real-time police reports 
21
Adrian Holovaty (2006) 
 Now working for the Washington Post 
 A fundamental way newspaper sites need to change 
 Most material collected by journalists is: 
 "structured information: the type of information that can be 
sliced-and-diced, in an automated fashion, by computers” 
22
Adrian Holovaty (2006) 
Traditional journalism = ✗ 
 Articles as the finished product 
Data journalism = ✓ 
 Continually maintained and improved 
Radical overhaul needed 
- Employing data 
- Making data available 
- Storing data 
- Coding data 
23 
Maps Everywhere! 
24
25
26
Maps Everywhere! 
 2007 – Holovaty was awarded a $1.1 million Knight Foundation grant for EveryBlock 
 2010 – SR2 Blog won Guardian.co.uk’s ‘most inspirational site’ accolade 
27
Interactivity 
 Transport for London API 
 Icelandic ash cloud and plane tracking 
 Al Jazeera’s coverage of War on Gaza using Ushahidi 
 Guardian’s Twitter map of the Middle East 
 BBC Interactive on the Spending Review 
32
Bella Hurrell, Specials Editor with BBC 
News Online (2011) 
 Proximity of “journalists, designers and developers all 
working together, sitting alongside each other” 
33
Bella Hurrell, Specials Editor with BBC 
News Online (2011) 
 “We have found that proximity really important to the 
success of projects. Although we have done this for a 
while, increasingly other organisations are reorganising 
along these lines after coming to realise the benefits of 
breaking down silos and co-locating people with different 
skillsets can produce more innovative solutions at a 
faster pace.” 
34
Bella Hurrell, Specials Editor with BBC 
News Online (2011) 
 “As data visualisation has come into the zeitgeist, and we 
have started using it more regularly in our story-telling, 
journalists and designers on the specials team have 
become much more proficient at using basic 
spreadsheet applications like Excel or Google Docs” 
35
Paul Bradshaw 
36
Paul Bradshaw 
38 
 “It represents the convergence of a number of fields 
which are significant in their own right - from investigative 
research and statistics to design and programming. The 
idea of combining those skills to tell important stories is 
powerful - but also intimidating. Who can do all that?” 
 “The reality is that almost no one is doing all of that, but 
there are enough different parts of the puzzle for people 
to easily get involved in, and go from there”
39
Dealing with Data (Bradshaw, 2010) 
4 crucial aspects 
40 
1. Finding data 
2. Interrogating data 
3. Visualizing data 
4. Mashing data
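As a rough illustration of the "interrogating" and "mashing" steps listed here, the sketch below counts records in one dataset and joins them to a second dataset on a shared key. It is only a minimal sketch: the ward names, offences, population figures, and the helper names `count_by` and `mash` are all invented for illustration and do not come from Bradshaw or from any real dataset.

```python
from collections import defaultdict

# Hypothetical sample data, invented purely for illustration.
crime_reports = [
    {"ward": "A", "offence": "burglary"},
    {"ward": "A", "offence": "theft"},
    {"ward": "B", "offence": "burglary"},
]
ward_populations = {"A": 2000, "B": 500}


def count_by(rows, key):
    """Interrogate: count how many rows share each value of `key`."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


def mash(counts, populations):
    """Mash: join the counts to population figures on the ward name
    and express each count as a rate per 1,000 residents."""
    return {
        ward: round(1000 * n / populations[ward], 1)
        for ward, n in counts.items()
        if ward in populations  # skip wards missing from the second dataset
    }


counts = count_by(crime_reports, "ward")
rates = mash(counts, ward_populations)

# Rank wards by rate, highest first, ready for a chart or a map.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

In this made-up sample, ward B has fewer reports than ward A but a higher rate once population is taken into account, which is the kind of shift in the story that combining two datasets can reveal.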
Data visualisation vs data journalism 
45
New Tools of the Trade? 
Analysis 
 Excel or Calc 
 sort your data 
 Google Refine 
 clean your dirty data 
 Yahoo Pipes 
 Composition mash-up tool 
 ScraperWiki 
 transforms info from webpages 
into data 
 R 
 Process and manipulate data 
Visualisation 
 Google Fusion Tables 
 visualise data on maps, timelines, 
etc 
 Tableau Public 
 Visualise and share 
 IBM’s Many Eyes 
 data visualisation tool 
 Processing 
 create images & interactives 
 Wordle 
 generate word clouds from bulky 
text 
47
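To give a feel for what "sort your data", "clean your dirty data" and "generate word clouds" involve, here is a small sketch using only the Python standard library. It is not how Excel, Google Refine or Wordle work internally; the CSV contents, the names in it, and the helpers `clean_row`, `total_by_name` and `word_counts` are hypothetical and exist only for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical "dirty" expenses data: inconsistent spacing and
# capitalisation in names, thousands separators in amounts.
RAW = """name,claim
 Alice Smith ,"1,200"
alice smith,300
BOB JONES,950
"""


def clean_row(row):
    """Clean: normalise the name and turn the claim into an integer."""
    name = " ".join(row["name"].split()).title()
    amount = int(row["claim"].replace(",", ""))
    return name, amount


def total_by_name(raw_csv):
    """Sort: total the claims per person, largest total first."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        name, amount = clean_row(row)
        totals[name] += amount
    return totals.most_common()


def word_counts(text, top=3):
    """Count the most frequent longer words, the raw material of a word cloud."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return Counter(w for w in words if len(w) > 3).most_common(top)


print(total_by_name(RAW))
print(word_counts("Data journalism is data. Journalism needs data!", top=2))
```

Without the cleaning step, the two spellings of the same name would be counted as two different people, which is a typical way dirty data distorts a ranking.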
Free tools… 
48
Summary 
 Is this journalism? 
 Journalism educators doing students a disservice? 
 Journalists replaced by programmers? 
 Wikileaks: no journalists required? 
51
Links and further reading 
 Simon Rogers (2013) Facts are Sacred, London: Faber & Faber 
 http://www.delicious.com/rob_jewitt/med312+datajournalism 
 http://www.delicious.com/smfrogers 
52
53 
Images 
 Knight Foundation, 2008, Sir Tim Berners-Lee talking about 
the Web at the Newseum 
 Bill on Capitol Hill, 2007, The Rim and the Slot 
 Marion Doss, 2008, Capital Journalism News Room 16 
October 1961 
 Igorschwarzmann, 2010, NYT News Room 
 Mkandlez, 2009, The Billion Pound O Gram 
 BitBoy, 2006, The Elephant in the Room 
 Ravages, 2008, Links
Issues 
 To what extent is the traditional craft of storytelling being 
challenged by the emergence of big data? 
 What kind of problems are manifest by the deluge of 
large data sets (eg MPs expenses, Wikileaks Iraq war logs, 
US cables, etc)? 
 Can the use or release of big data sets have ethical 
implications? 
54
Wikileaks 
55

Mac201 data journalism lecture


Editor's Notes

  • #5 In last week’s lecture and workshops I talked to you about the dangers of inappropriate use of data sourced from social media platforms making its way into the mainstream news agenda, and about how lives were put at risk as a result of the naivety of web users driving news stories, and the people at the centre of them, even further up the news agenda. This week I want to talk to you about how a relatively new form of journalism has been coming to prominence and what this might mean for journalists of the future. I want to talk to you about data journalism.