Data Journalism 101

Data Journalism 101 workshop, presented by AP data journalist Serdar Tumgoren on April 29, 2014 to Bay Area journalists. Organized by the Society of Professional Journalists - Northern California chapter.


Transcript of "Data Journalism 101"

  1. Data Journalism 101
  2. What is data journalism?
  3. “Wrangling, vetting and visualizing data to bring forth news stories in the public interest that we never would have found otherwise.” - Garance Burke, AP data journalist
  4. “A data journalist is anyone ...who can fluently work with this primary source [data]. It’s the same as a traditional reporter, who should know how to hunt down human sources and interview them.” - Me (I know, so lame to quote yourself)
  5. “Data journalism is a form of reporting that makes use of structured data (e.g. spreadsheets, databases) as a key component of researching and telling stories.” - Chad Skelton, data journalist at Vancouver Sun and journalism instructor
  6. “Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it.” - Paul Bradshaw, Data Journalism Handbook
  7. DJ in the wild
  8. Data can be used to... ● Fact-check official narratives ● Identify trends and patterns ● Rank things ● Analyze relationships ● Find questions to explore ● Automate breaking news alerts
  9. Step-by-Step Guide on How To Become a Journicorn
  10. Step 1: Master the Basics In no particular order: Excel, MySQL, Postgres, SPSS, R, JavaScript, Linux, Python, Ruby, QGIS, pdftk, ArcGIS, Ruby on Rails, Django, Backbone, Node, Hadoop, Mongo, C, Algol, Hypercard, Can, You, Tell, I’m, Just, Making, Shit, Up, Now?
  11. Don’t try to be a Journicorn. (Hint: They don’t exist.)
  12. Be a journalist who uses data. Data is just another source.
  13. Start with a Question, then Data ● Are housing prices going up? ● Do reports of falling crime bear out across the entire city? ● Are developers helping to finance campaigns of politicians who approved their projects? ● Are public employee salaries on the rise?
  14. Data sources ● Public agencies (local, county, state, federal) ● Data.gov sites ● Social networking sites (often APIs) ● Nonprofits/industry experts ● Academic institutions ● Manually gathered
  15. Databases of Databases ● Paid ○ Accurint ($) ○ Nexis ($) ● Free ○ BRB ○ Online Searches ○ Libraries
  16. Not everything is on the web. A whole world of data may never see the light of day on gov websites. How do you find it? ● Government forms provide clues ● Gov employees ● Software contracts and manuals
  17. Useful datasets ● Building permits ● Campaign finance ● Corporate records ● Election ● Inspections ● Planning & Zoning ● Land records ● Etc. Etc.
  18. Open Records Laws ● Know and understand your rights ● Try to negotiate first ● Seek expert advice (CalAware, CFAC) ● Don’t go fishing; craft targeted requests ● Follow through on requests
  19. FOIA Resources ● RCFP Open Gov Guide ● RCFP Letter Generator ● FOIA Machine ● Experts: CalAware and CFAC
  20. So I’ve found data. Now what?
  21. Understand the Data. ● What is the origin of the data? ● What do the fields mean? ● What rules surround the data? ● Seek expert advice and sanity checks.
  22. Wrangle the Data. ● What format is the source data? ● How do I convert the data for my tool of choice? ● Explore the data. Is it dirty? ● What cleanups are needed to answer my question?
  23. Sort, Filter, Sum, etc. ● Spreadsheets can take you far. ● Aggregate functions in SQL. ● Patterns and outliers in stats programs.
  24. Add tools as needed. Tools are abundant, free and paid. Knowledge is abundant, freely shared*. (*see IRE-L/NICAR-L)
  25. Keep reporting. Most often data is a starting point or supplement. Check conclusions in the real world and circle back to refine and qualify data analyses.
  26. If you’re a visual person... ...confounded by the last few bits (like me)...
  27. The Data Journalism Process ● Talk to people ● “What data do I need to answer my question?” ● Get The Data ● Clean The Data ● Check The Data ● Interview The Data ● Interview People ● Display The Data ● Tell The Story
  28. Quick Hit Data Wrangling
  29. Story idea is the key. Most stats were already available and supported or confirmed by reporting. But we wanted county breakdowns for 2013 (most recent full year of granular data). So...
  30. Data wrangling ain’t pretty. We got (dirty) data for 2013. ● copy/paste -> Excel = Fail ● pdftk -> CSV -> Excel = Fail ● pdftk -> CSV -> python -> Excel = Success
  31. Check the data. A few strategies to ensure accuracy: ● Manually calculate a sample of subtotals, compare to calculated results. ● Compare totals to summary stats from third party. ● Have someone else check your work.
  32. Keep a Data Diary ● Document data sources ● Document field descriptions, quirks, etc. ● Document data cleaning process ● Document analysis
  33. Remember. Journicorns don’t exist.
  34. The Data Padawan ● See data as another source. ● Find and master tools, as needed. ● Write stories. ● Keep learning. ● Rinse and repeat. ● The end.
  35. Join the Community If you do nothing else, sign up for IRE-L and NICAR-L. Also, shameless plug for PythonJournos.
  36. Ping me. Serdar Tumgoren @zstumgoren zstumgoren@gmail.com http://www.slideshare.net/serdartumgoren
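
Slide 23 mentions aggregate functions in SQL without showing one. As a rough illustration only (not from the talk), here is a minimal sketch using Python's built-in sqlite3 module. The permits table, its columns, and every row in it are made up for the example.

```python
import sqlite3

# Hypothetical building-permit rows (city, year, declared value).
# Invented for illustration; not data from the workshop.
ROWS = [
    ("Oakland", 2013, 500),
    ("Oakland", 2013, 1500),
    ("Berkeley", 2013, 700),
    ("Berkeley", 2012, 9000),
    ("Fremont", 2013, 300),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (city TEXT, year INTEGER, value INTEGER)")
conn.executemany("INSERT INTO permits VALUES (?, ?, ?)", ROWS)

# Filter (WHERE), sum and count (aggregates), then sort (ORDER BY):
# the same "Sort, Filter, Sum" moves a spreadsheet offers.
QUERY = """
SELECT city, COUNT(*) AS permits, SUM(value) AS total, MAX(value) AS largest
FROM permits
WHERE year = 2013
GROUP BY city
ORDER BY total DESC
"""
summary = conn.execute(QUERY).fetchall()
conn.close()

for city, permits, total, largest in summary:
    print(city, permits, total, largest)
```

The same query works unchanged against a SQLite file on disk, which can be handy once a dataset outgrows a spreadsheet.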
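
The "CSV -> python -> Excel" step on slide 30 is not spelled out in the deck, so the following is only a guess at what such a cleanup might look like, assuming the usual debris of a PDF extract: repeated header rows, blank lines, stray whitespace and thousands separators. The sample text, column names and function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical dirty extract, invented for illustration.
RAW = """County,Category,Count
 Alameda ,A,"1,204"
Alameda,B,310

County,Category,Count
Marin,A, 87
marin,B,n/a
"""

def clean_rows(text):
    """Split messy CSV text into (clean rows, rejected rows)."""
    clean, rejects = [], []
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # blank line
        if cells[0].lower() == "county":
            continue  # header repeated on each page of the source
        if len(cells) != 3:
            rejects.append(row)
            continue
        county, category, count = cells
        digits = count.replace(",", "")
        if not digits.isdigit():
            rejects.append(row)  # keep for the data diary; do not guess
            continue
        clean.append((county.title(), category, int(digits)))
    return clean, rejects

def totals_by_county(rows):
    """Sum counts per county."""
    totals = defaultdict(int)
    for county, _category, count in rows:
        totals[county] += count
    return dict(totals)

def to_csv(rows):
    """Serialize clean rows as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["county", "category", "count"])
    writer.writerows(rows)
    return out.getvalue()

rows, rejects = clean_rows(RAW)
totals = totals_by_county(rows)
print(to_csv(rows))
print(totals)
print("rejected:", rejects)
```

Rejected rows are returned rather than silently dropped, so they can be reviewed by hand and noted in the data diary from slide 32.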
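
Two of the accuracy checks on slide 31 (spot-checking a sample of subtotals by hand, and comparing totals to a third party's summary stats) can be partly scripted. This is a minimal sketch under assumed inputs: both dictionaries and both function names are hypothetical, and it does not replace having someone else check your work.

```python
import random

def compare_totals(computed, reference, tolerance=0):
    """Return {key: (ours, theirs)} for every mismatch or one-sided key."""
    problems = {}
    for key in sorted(set(computed) | set(reference)):
        ours = computed.get(key)
        theirs = reference.get(key)
        if ours is None or theirs is None or abs(ours - theirs) > tolerance:
            problems[key] = (ours, theirs)
    return problems

def sample_for_manual_check(keys, k, seed=0):
    """Pick up to k keys to recompute by hand; seeded so the pick is repeatable."""
    rng = random.Random(seed)
    pool = sorted(keys)
    return sorted(rng.sample(pool, min(k, len(pool))))

# Hypothetical numbers: our calculated county totals vs. a published summary.
computed = {"Alameda": 1514, "Marin": 87, "Napa": 42}
published = {"Alameda": 1514, "Marin": 90, "Sonoma": 12}

problems = compare_totals(computed, published)
to_check = sample_for_manual_check(computed, k=2)

print(problems)
print("recompute by hand:", to_check)
```

Anything that lands in `problems` is a question to take back to the agency or the source data, not something to quietly adjust.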