Big Ugly Datasets For Thumb-Fingered Journalists

Presentation by Nick Judd. Audio is here: http://ow.ly/2RMQG

Transcript

  1. Big, Ugly Datasets for Thumb-Fingered Journalists
     @nclarkjudd, thumb-fingered journalist
  2. We’re swimming in data
     Open Graph
     Social Media Data Mining
     Government Data
  3. It’s not getting easier to use
     … With exceptions, like TimeFlow
  4. This is where we come in
     There’s an increasing need for journalists at all levels to be equipped to acquire and analyze big, ugly datasets.
     Without the resources of a New York Times or Washington Post, how do you do that?
  5. What are you doing with data?
     Exploring: Looking for patterns, following hunches, finding context and background. You are looking to be surprised.
     Deducing: Proving a hypothesis, pulling specific records. You are looking for something in particular.
  6. Know the right questions to ask
     When you’re picking a dataset to use, understand its:
     Provenance
     Sampling
     Method
     Quality
     Completeness
  7. Data Workflow
     Understand your needs
     Acquire your data (Download, FOIL, Sources)
     Clean your data
     Load it into a Relational Database Management System (RDBMS)
     Analyze what you’ve got
     Output relevant segments for visualization
  8. Cleaning Your Data
     Use a script or a robust text editor like vi
     It’s difficult. It takes a while. It gets done.
  9. Load your data
  10. Fail and Iterate
      Again: It probably won’t work the first time.
      It’s difficult. It takes a while. It gets done.
  11. Analyze
      Check your script. Did I write my query correctly?
      Write queries multiple ways. Do the numbers add up the same when the RDBMS makes sums and when I do them?
      Use checksums: Can I compare results from a segment of this data with previously published and vetted results? Are they the same?
      Consult experts. Ask: Does this mean what I think it means? Do these results make sense?
      Output smaller segments of your data to another tool such as Socrata or ManyEyes in order to generate graphs, tables, and visualizations.
  12. Share
      Photo: Britta Bohllinger / Flickr
      SPJ.org
      IRE.org
      HacksHackers.com
  13. Resources
      http://dev.mysql.com/doc/refman/5.1/en/
      http://github.com/FlowingMedia/TimeFlow/wiki
      http://www.lagmonster.org/docs/vi.html
      http://www.socrata.com/
      http://www.data.gov
  14. Assignment
      You are an investigative team that does freelance work around the country and are working up a pitch for your next project.
      Pick a subject matter you want to investigate.
      Identify a dataset or datasets that will help you formulate your story. For this exercise, only pick one available on the Web already, e.g. through Data.gov.
      Plan:
      What do you need to clean these data?
      The schema you’ll make to house the dataset(s)
      What are you doing with this data? Are you using it for exploratory or deductive reasoning?
      What will your queries look like? Will you join multiple databases together? If so, how are you sure the results will be relevant?
      How will you express the results of your inquiry?
      What questions won’t the data answer that you want to address in your project? Who will you turn to as you start looking for these answers?
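
The clean, load, and analyze steps from the Data Workflow slide, together with the "write queries multiple ways" check from the Analyze slide, can be sketched in a few lines. This is only an illustration under stated assumptions, not the presenter's method: the CSV contents, the `spending` table, and every function name are hypothetical, and Python's built-in `sqlite3` stands in for a MySQL server of the kind the Resources slide links to.

```python
import csv
import io
import sqlite3

# Hypothetical raw export (not from the presentation): stray whitespace,
# inconsistent capitalization, thousands separators, and one missing amount.
RAW_CSV = """agency,year,amount
 Parks Dept ,2009,"1,200"
parks dept,2010,800
Sanitation,2009,
Sanitation,2010,"2,500"
"""


def clean_rows(text):
    """Yield (agency, year, amount) tuples; skip rows with no amount."""
    for row in csv.DictReader(io.StringIO(text)):
        amount = row["amount"].replace(",", "").strip()
        if not amount:
            continue  # incomplete record; a real project would log these
        yield (row["agency"].strip().title(), int(row["year"]), int(amount))


def load(rows):
    """Load cleaned rows into an in-memory SQLite table (the RDBMS stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spending (agency TEXT, year INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def totals_by_sql(conn):
    """Let the database compute the per-agency sums."""
    cur = conn.execute("SELECT agency, SUM(amount) FROM spending GROUP BY agency")
    return dict(cur.fetchall())


def totals_by_hand(conn):
    """Compute the same sums outside the database, as a cross-check."""
    totals = {}
    for agency, amount in conn.execute("SELECT agency, amount FROM spending"):
        totals[agency] = totals.get(agency, 0) + amount
    return totals


rows = list(clean_rows(RAW_CSV))
conn = load(rows)
sql_totals = totals_by_sql(conn)
hand_totals = totals_by_hand(conn)

print(sql_totals == hand_totals)   # True
print(sorted(sql_totals.items()))  # [('Parks Dept', 2000), ('Sanitation', 2500)]
```

If the two totals ever disagree, the query or the cleaning step is the first place to look. The same comparison can be made against a previously published figure for a segment of the data, when one exists.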
