Big Ugly Datasets For Thumb-Fingered Journalists
Presentation by Nick Judd. Audio is here:

  • 1. Big, Ugly Datasets for Thumb-Fingered Journalists
    @nclarkjudd, thumb-fingered journalist
  • 2. We’re swimming in data
    Open Graph
    Social Media Data Mining
    Government Data
  • 3. It’s not getting easier to use
    … With exceptions, like TimeFlow
  • 4. This is where we come in
    There’s an increasing need for journalists at all levels to be equipped to acquire and analyze big, ugly datasets
    Without the resources of a New York Times or Washington Post, how do you do that?
  • 5. What are you doing with data?
    Exploring: Looking for patterns, following hunches, finding context and background — looking to be surprised
    Deducing: Proving a hypothesis, pulling specific records — looking for something in particular
  • 6. Know the right questions to ask
    When you’re picking a dataset to use, understand its:
  • 7. Data Workflow
    Understand your needs
    Acquire your data (Download, FOIL, Sources)
    Clean your data
    Load it into a Relational Database Management System (RDBMS)
    Analyze what you’ve got
    Output relevant segments for visualization
  • 8. Cleaning Your Data
    Use a script or a robust text editor like vi
    It’s difficult. It takes a while. It gets done.
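As one illustration of the "use a script" option on the cleaning slide, here is a minimal sketch in Python. The sample data, the column names, and the specific rules (collapse stray whitespace, drop blank lines, strip thousands separators from whole numbers) are assumptions made for the example, not steps taken from the presentation.

```python
import csv
import io


def clean_cell(cell):
    """Collapse runs of whitespace; strip commas from whole numbers."""
    text = " ".join(cell.split())
    digits = text.replace(",", "")
    if digits.isdigit():
        return digits
    return text


def clean_rows(lines):
    """Yield a normalized header, then cleaned data rows."""
    reader = csv.reader(lines)
    # Normalize header names to lowercase_with_underscores
    header = [name.strip().lower().replace(" ", "_") for name in next(reader)]
    yield header
    for row in reader:
        # Skip lines that are empty or contain only whitespace
        if not any(cell.strip() for cell in row):
            continue
        yield [clean_cell(cell) for cell in row]


# Hypothetical messy input, invented for this sketch
RAW = """Agency Name , Amount
 Parks  Dept ,"1,200"

Sanitation,300
"""

rows = list(clean_rows(io.StringIO(RAW)))
for row in rows:
    print(row)
```

A real dataset will need its own rules. The point is that a script records each cleaning decision, so the cleaning can be rerun and checked.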
  • 9. Load your data
  • 10. Fail and Iterate
    Again: It probably won’t work the first time.
    It’s difficult. It takes a while. It gets done.
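The load-fail-iterate loop can be sketched with SQLite, which ships with Python. The choice of SQLite, the `spending` table, and the sample rows are assumptions for illustration; the presentation does not name a specific RDBMS here. Rows that fail are collected rather than aborting the load, so you can fix the cleaning step and try again.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert (agency, amount) rows; return the rows that could not be loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spending "
        "(agency TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    rejected = []
    for row in rows:
        try:
            agency, amount = row  # raises ValueError if a column is missing
            conn.execute(
                "INSERT INTO spending VALUES (?, ?)", (agency, int(amount))
            )
        except (ValueError, sqlite3.Error) as exc:
            # Keep the bad row and the reason for the next cleaning pass
            rejected.append((row, str(exc)))
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
# Hypothetical rows: one good, one with a non-numeric amount, one too short
sample = [["Parks Dept", "1200"], ["Sanitation", "n/a"], ["Libraries"]]
rejected = load_rows(conn, sample)
loaded = conn.execute("SELECT COUNT(*) FROM spending").fetchone()[0]
print(loaded, "loaded;", len(rejected), "rejected")
```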
  • 11. Analyze
    Check your script. Did I write my query correctly?
    Write queries multiple ways. Do the totals match when the RDBMS computes the sums and when I compute them myself?
    Use checksums: Can I compare results from a segment of this data with previously published and vetted results? Are they the same?
    Consult experts: Ask — Does this mean what I think it means? Do these results make sense?
    Output smaller segments of your data to another tool such as Socrata or ManyEyes in order to generate graphs, tables, and visualizations
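A sketch of two of the checks above: computing the same total in more than one way, then writing a small segment out for a visualization tool. The table, the figures, and the use of CSV as the hand-off format are assumptions for the example.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (agency TEXT, amount INTEGER)")
# Hypothetical records, invented for this sketch
conn.executemany(
    "INSERT INTO spending VALUES (?, ?)",
    [("Parks", 1200), ("Parks", 300), ("Sanitation", 500)],
)

# Way 1: let the database compute the grand total
sql_total = conn.execute("SELECT SUM(amount) FROM spending").fetchone()[0]

# Way 2: pull the raw values and add them up outside the database
python_total = sum(
    amount for (amount,) in conn.execute("SELECT amount FROM spending")
)

# Way 3: per-agency subtotals should add back up to the grand total
by_agency = dict(
    conn.execute("SELECT agency, SUM(amount) FROM spending GROUP BY agency")
)
grouped_total = sum(by_agency.values())

if not (sql_total == python_total == grouped_total):
    raise SystemExit("Totals disagree; recheck the queries")

# Write the small per-agency segment as CSV text
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["agency", "total"])
writer.writerows(sorted(by_agency.items()))
print(buffer.getvalue())
```

If the three totals ever disagree, the likely causes are a query mistake or rows the grouping handles differently (for example, NULL values), which is worth finding before anything is published.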
  • 12. Share
    Photo: Britta Bohllinger / Flickr
  • Resources
  • 15. Assignment
    You are an investigative team that does freelance work around the country, and you are working up a pitch for your next project.
    Pick a subject matter you want to investigate
    Identify a dataset or datasets that will help you formulate your story. For this exercise, pick only one that is already available on the Web, e.g. through
    What do you need to clean these data?
    The schema you’ll make to house the dataset(s)
    What are you doing with this data — are you using it for exploratory or deductive reasoning?
    What will your queries look like? Will you join multiple databases together? If so, how are you sure the results will be relevant?
    How will you express the results of your inquiry?
    What questions won’t the data answer that you want to address in your project? Who will you turn to as you start looking for these answers?
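For the question about joining datasets and trusting the results, one possible check is to count how many records actually match and to list the ones that do not. The sketch below uses SQLite with two hypothetical tables, `contracts` and `donors`, whose names and contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contracts (vendor TEXT, amount INTEGER);
    CREATE TABLE donors (vendor TEXT, donated INTEGER);
    INSERT INTO contracts VALUES
        ('Acme Corp', 900), ('Bolt LLC', 400), ('acme corp', 100);
    INSERT INTO donors VALUES ('Acme Corp', 50);
    """
)

# How many contract rows find a partner in the donors table?
matched = conn.execute(
    "SELECT COUNT(*) FROM contracts c JOIN donors d ON c.vendor = d.vendor"
).fetchone()[0]

# Which contract rows find no partner? A LEFT JOIN keeps them,
# with NULL in the donor columns.
unmatched = [
    row[0]
    for row in conn.execute(
        "SELECT c.vendor FROM contracts c "
        "LEFT JOIN donors d ON c.vendor = d.vendor "
        "WHERE d.vendor IS NULL ORDER BY c.vendor"
    )
]

print("matched:", matched)
print("unmatched:", unmatched)
```

In this made-up data, 'acme corp' fails to match 'Acme Corp' because SQLite compares text case-sensitively by default. Reviewing the unmatched list is one way to catch keys that need more cleaning before the join can be relied on.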