Big Data & Journalism
MAC281
twitter/rob_jewitt
robert.jewitt@sunderland.ac.uk
1
2009 #iranelection
Image: Gilad Lotan, ReTweet Revolution
2
Anatomy of a tweet
3
Overview
- Intro
- Database Journalism and Computer Assisted Reporting
- Data Today: Visualisations and Interactivity
- How To Be A Data Journalist
- Ethics?
4
Recent hype
- Data Journalism
- Meta Journalism
- Visualisation
- Infographics
- Mash-ups
5
Adam Westbrook
“I think data-driven journalism is one of the big potential growth areas in the future of journalism. A lot of the forward-thinking discussion about the future of news focuses on the ‘glamorous’ possibilities, like video journalism and interactivity, but I often see data journalism being ignored.
In fact, I believe it is journalism in its truest essence: uncovering and mining through information the public do not have enough time to do themselves, interrogating it, and making sense of it before sharing it with the audience. If more journalists did this (rather than relying on ‘data’ from press releases) we would be a far more enlightened public.
6
Adam Westbrook
My message to the next generation of journalists - or any journalist looking for a new niche or direction - would be to learn the skills and tools of data interrogation. It’s not glamorous, but it’s a skill not many journalists have, and one which will give one an edge in the market.”
7
Brian Storm
“One of our big goals in the storytelling process is to humanize the statistics. It’s hard for people to care about numbers, especially large numbers. How do you get your head around the death of 800,000 people in the Rwandan genocide? I think if you meet the individuals - see and hear the stories of the survivors - you can gain a better insight into the tragedy.”
8
“Data-driven journalism is the future”
“[Journalism’s] going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what's going on in the country.”
Sir Tim Berners-Lee, inventor of the Web, 2010
9
Origins
- 1950s
- Database Journalism
- Computer Assisted Reporting (CAR)
- Very expensive
10
11
The Indianapolis Star
Capital Journal circa 1961
New York Times News Room
12
CBS: 1952, Walter Cronkite
- Presidential election battle
- Eisenhower vs Stevenson
- Remington Rand UNIVAC
- Analysed early vote returns
- Predicted a landslide victory for Eisenhower
- Contrary to the pundits and polls, which expected a close race
13
Philip Meyer, Precision Journalism
- 1969: a journalist must make use of databases and surveys
- 2002: “a journalist has to be a database manager”
14
Other notable examples
- Clarence Jones, The Miami Herald, 1969
  - The criminal justice system
- David Burnham, The New York Times, 1972
  - Police crime rates
- Elliot Jaspin, The Providence Journal, 1986
  - School bus drivers and criminal records
- Bill Dedman, The Atlanta Journal, 1988
  - Pulitzer Prize for The Color of Money
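Meyer's “database manager” point, and Jaspin's matching of school bus drivers against criminal records, come down to what a database calls a join. The sketch below is a minimal illustration using Python's built-in sqlite3 module; the tables (`drivers`, `convictions`), their columns and every row are invented, and matching on name plus date of birth is an assumption for illustration, not a description of how Jaspin actually worked.

```python
import sqlite3

# In-memory database; tables, columns and rows are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drivers (name TEXT, dob TEXT, district TEXT);
CREATE TABLE convictions (name TEXT, dob TEXT, offence TEXT);
""")
conn.executemany("INSERT INTO drivers VALUES (?, ?, ?)", [
    ("Ann Smith", "1950-02-01", "North"),
    ("Bob Jones", "1948-07-12", "South"),
    ("Cal Brown", "1955-11-30", "East"),
])
conn.executemany("INSERT INTO convictions VALUES (?, ?, ?)", [
    ("Bob Jones", "1948-07-12", "drink driving"),
    ("Bob Jones", "1961-03-03", "theft"),  # same name, different person
    ("Dee White", "1952-05-05", "fraud"),
])

# Cross-reference: drivers whose name AND date of birth match a conviction record.
query = """
SELECT d.name, d.district, c.offence
FROM drivers AS d
JOIN convictions AS c ON d.name = c.name AND d.dob = c.dob
ORDER BY d.name
"""
matches = conn.execute(query).fetchall()
for row in matches:
    print(row)
```

The second “Bob Jones” conviction shows why matching on name alone would generate false leads; any match still has to be checked by old-fashioned reporting.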
15
Not Database – Just Data?
16
17
18
19
Since 2004
20
Adrian Holovaty (2005)
- Chicago Transit Authority map + Firefox plug-in + Google Maps = real-time updates
- Chicago Police Department + Google Maps = real-time police reports
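A mash-up like this boils down to converting records that carry coordinates into a format a map can draw. A minimal sketch, assuming invented report rows and using GeoJSON (RFC 7946) purely as a neutral, standard format; Holovaty's 2005 projects predate GeoJSON and worked with the Google Maps API directly, so this is not his method.

```python
import json

# Hypothetical police report rows: (report id, offence, latitude, longitude).
reports = [
    ("R1", "burglary", 41.8781, -87.6298),
    ("R2", "robbery", 41.8827, -87.6233),
]

def to_geojson(rows):
    """Build a GeoJSON FeatureCollection from (id, offence, lat, lon) rows."""
    features = []
    for report_id, offence, lat, lon in rows:
        features.append({
            "type": "Feature",
            # GeoJSON orders positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": report_id, "offence": offence},
        })
    return {"type": "FeatureCollection", "features": features}

collection = to_geojson(reports)
print(json.dumps(collection, indent=2))
```

Most web mapping libraries can load a FeatureCollection like this as a layer, which is what keeps the “data + map” formula cheap to reuse.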
21
Adrian Holovaty (2006)
- Now working for the Washington Post
- Essay: “A fundamental way newspaper sites need to change”
- Most material collected by journalists is:
  - “structured information: the type of information that can be sliced-and-diced, in an automated fashion, by computers”
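To make “sliced-and-diced” concrete: once facts are stored as fields rather than buried in prose, a few lines of code can count them along any dimension. A minimal sketch using only Python's standard library; the CSV columns and rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical structured records, as a reporter might log them in a spreadsheet.
RAW = """date,ward,offence
2006-01-03,1,burglary
2006-01-03,2,robbery
2006-01-04,1,burglary
2006-01-05,3,theft
2006-01-05,1,theft
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

def slice_by(records, field):
    """Count records by the value of one field, most common first."""
    return Counter(r[field] for r in records).most_common()

by_ward = slice_by(rows, "ward")
by_offence = slice_by(rows, "offence")
# Narrow the same data another way: burglaries in ward 1 only.
ward1_burglaries = [r for r in rows if r["ward"] == "1" and r["offence"] == "burglary"]
print(by_ward)
print(by_offence)
print(len(ward1_burglaries))
```

The same five rows answer three different questions without anyone rewriting a story, which is the contrast Holovaty draws with articles as a finished product.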
22
Adrian Holovaty (2006)
Traditional journalism
- Articles as the finished product
Data journalism
- Continually maintained and improved
23
Radical overhaul needed
- Employing data
- Making data available
- Storing data
- Coding data
Maps Everywhere!
24
25
26
Maps Everywhere!
- 2007 – Holovaty won $1.1 million from the Knight Foundation for EveryBlock
- 2010 – SR2 Blog won Guardian.co.uk’s ‘most inspirational site’ accolade
27
28
29
30
31
Interactivity
- Transport for London API
- Icelandic ash cloud and plane tracking
- Al Jazeera’s coverage of the War on Gaza using Ushahidi
- The Guardian’s Twitter map of the Middle East
- BBC interactive on the Spending Review
32
Bella Hurrell, Specials Editor with BBC
News Online (2011)
Proximity of “journalists, designers and developers all working together, sitting alongside each other”
33
Bella Hurrell, Specials Editor with BBC
News Online (2011)
“We have found that proximity [is] really important to the success of projects. Although we have done this for a while, increasingly other organisations are reorganising along these lines after coming to realise the benefits of breaking down silos and co-locating people with different skillsets can produce more innovative solutions at a faster pace.”
34
Bella Hurrell, Specials Editor with BBC
News Online (2011)
“As data visualisation has come into the zeitgeist, and we have started using it more regularly in our story-telling, journalists and designers on the specials team have become much more proficient at using basic spreadsheet applications like Excel or Google Docs”
35
Paul Bradshaw
36
Paul Bradshaw
37
“It represents the convergence of a number of fields which are significant in their own right - from investigative research and statistics to design and programming. The idea of combining those skills to tell important stories is powerful - but also intimidating. Who can do all that?”
Paul Bradshaw
38
“The reality is that almost no one is doing all of that, but there are enough different parts of the puzzle for people to easily get involved in, and go from there”
39
Dealing with Data (Bradshaw, 2010)
4 crucial aspects
40
1. Finding data
2. Interrogating data
3. Visualising data
4. Mashing data
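A toy walk-through of the four steps, under loudly stated assumptions: the two datasets are typed in by hand (standing in for “finding”), all figures are invented, and the area names and helper functions (`mash`, `rate_per_1000`, `text_chart`) are hypothetical. It illustrates a standard point in interrogating data: raw counts and rates can tell different stories.

```python
# Hypothetical figures for illustration only.
# "Found" dataset 1: recorded incidents per area.
incidents = {"Northside": 120, "Riverside": 45, "Old Town": 60}
# "Found" dataset 2: resident population per area, from a different source.
population = {"Northside": 60000, "Riverside": 9000, "Old Town": 20000}

def mash(a, b):
    """Mash: join two dicts on the areas they share, dropping unmatched areas."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys()}

def rate_per_1000(joined):
    """Interrogate: convert raw counts into incidents per 1,000 residents."""
    return {area: 1000 * count / pop for area, (count, pop) in joined.items()}

def text_chart(values, scale=2):
    """Visualise: a crude text bar chart, largest value first."""
    lines = []
    for area, v in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{area:<10} {'#' * round(v * scale)} {v:.1f}")
    return "\n".join(lines)

rates = rate_per_1000(mash(incidents, population))
print(text_chart(rates))
```

Northside has the most incidents but the lowest rate per 1,000 residents, exactly the kind of thing interrogation should catch before a figure reaches a headline.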
41
42
43
44
Data visualisation vs data journalism
45
46
New Tools of the Trade?
Analysis
- Excel or Calc: sort your data
- Google Refine: clean your dirty data
- Yahoo Pipes: mash-up composition tool
- ScraperWiki: transforms information from web pages into data
- R: process and manipulate data
Visualisation
- Google Fusion Tables: visualise data on maps, timelines, etc.
- Tableau Public: visualise and share
- IBM’s Many Eyes: data visualisation tool
- Processing: create images and interactives
- Wordle: generate word clouds from bulky text
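One thing Google Refine (since renamed OpenRefine) is known for is clustering variant spellings of the same name. Its “fingerprint” key-collision method normalises each value and groups values that end up with the same key. The sketch below is a simplified Python approximation of that idea, not Refine's own code; the supplier names are invented.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified approximation of OpenRefine's 'fingerprint' key (an assumption, not its code)."""
    value = value.strip().lower()
    # Fold accented characters to their base letter (é -> e).
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Drop punctuation, then sort the unique whitespace-separated tokens.
    value = re.sub(r"[^\w\s]", "", value)
    tokens = sorted(set(value.split()))
    return " ".join(tokens)

def cluster(values):
    """Group raw spellings that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(set(vs)) > 1}

# Hypothetical messy column from a spreadsheet of council spending.
suppliers = ["Acme Ltd.", "acme ltd", "Ltd Acme", "Café Nero", "Cafe Nero", "Bolt & Co"]
print(cluster(suppliers))
```

As in Refine, the output only proposes groups; deciding whether “Acme Ltd.” and “Ltd Acme” really are the same company remains an editorial judgement.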
47
Free tools…
48
Free tools…
49
50
Summary
- Is this journalism?
- Journalism educators doing students a disservice?
- Journalists replaced by programmers?
- Wikileaks: no journalists required?
51
Links and further reading
- Simon Rogers (2013) Facts are Sacred, London: Faber & Faber
- http://www.delicious.com/rob_jewitt/med312+datajournalism
- http://www.delicious.com/smfrogers
52
53
Images
- Knight Foundation, 2008, Sir Tim Berners-Lee talking about the Web at the Newseum
- Bill on Capitol Hill, 2007, The Rim and the Slot
- Marion Doss, 2008, Capital Journalism News Room 16 October 1961
- Igorschwarzmann, 2010, NYT News Room
- Mkandlez, 2009, The Billion Pound O Gram
- BitBoy, 2006, The Elephant in the Room
- Ravages, 2008, Links
Issues
- To what extent is the traditional craft of storytelling being challenged by the emergence of big data?
- What kinds of problems does the deluge of large data sets create (e.g. MPs’ expenses, the Wikileaks Iraq War logs, the US embassy cables)?
- Can the use or release of big data sets have ethical implications?
54
Wikileaks
55

Mac281 big data & journalism lecture 2014
