Big Ugly Datasets For Thumb-Fingered Journalists

Presentation by Nick Judd. Audio is here: http://ow.ly/2RMQG


Big Ugly Datasets For Thumb-Fingered Journalists Presentation Transcript

  • 1. Big, Ugly Datasets for Thumb-Fingered Journalists
    @nclarkjudd, thumb-fingered journalist
  • 2. We’re swimming in data
    Open Graph
    Social Media Data Mining
    Government Data
  • 3. It’s not getting easier to use
    … With exceptions, like TimeFlow
  • 4. This is where we come in
    There’s an increasing need for journalists at all levels to be equipped to acquire and analyze big, ugly datasets
    Without the resources of a New York Times or Washington Post, how do you do that?
  • 5. What are you doing with data?
    Exploring: Looking for patterns, following hunches, finding context and background — looking to be surprised
    Deducing: Proving a hypothesis, pulling specific records — looking for something in particular
  • 6. Know the right questions to ask
    When you’re picking a dataset to use, understand its:
    Provenance
    Sampling
    Method
    Quality
    Completeness
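One rough way to start on the quality and completeness questions is to count blank cells per column before doing anything else. The sketch below uses Python's standard csv module; the sample data and column names are invented for illustration and do not come from the presentation.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded file; column names are made up.
SAMPLE = """agency,year,amount
Parks,2009,1200
Parks,2010,
Transit,,950
Transit,2010,1100
"""

def profile_completeness(text):
    """Return (row_count, Counter of blank cells per column)."""
    reader = csv.DictReader(io.StringIO(text))
    blanks = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            # DictReader fills short rows with None, so treat that as blank too.
            if value is None or value.strip() == "":
                blanks[column] += 1
    return rows, blanks

rows, blanks = profile_completeness(SAMPLE)
for column in ("agency", "year", "amount"):
    print(f"{column}: {blanks[column]} of {rows} rows blank")
```

A column with many blanks is not necessarily unusable, but it is a prompt to ask the data's keeper how and why values go missing.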
  • 7. Data Workflow
    Understand your needs
    Acquire your data (Download, FOIL, Sources)
    Clean your data
    Load it into a Relational Database Management System (RDBMS)
    Analyze what you’ve got
    Output relevant segments for visualization
  • 8. Cleaning Your Data
    Use a script or a robust text editor like vi
    It’s difficult. It takes a while. It gets done.
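As an illustration of the "use a script" option, here is a minimal cleaning sketch in Python. The messy input, the column layout, and the clean_rows helper are all hypothetical; real datasets will need their own rules, and this only shows the general shape of such a script.

```python
import csv
import io

# Hypothetical messy export: stray whitespace, mixed case, "$" and commas in numbers.
RAW = """Agency , Amount
 parks dept ,"$1,200.50"
TRANSIT,  950
,
Parks Dept,n/a
"""

def clean_rows(text):
    """Yield (agency, amount) tuples; amount is None when it cannot be parsed."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # drop rows that are entirely blank
        cells += [""] * (2 - len(cells))  # pad short rows to two cells
        agency = " ".join(cells[0].split()).title()
        raw_amount = cells[1].replace("$", "").replace(",", "")
        try:
            amount = float(raw_amount)
        except ValueError:
            amount = None  # keep the row, flag the value for manual review
        yield agency, amount

cleaned = list(clean_rows(RAW))
for agency, amount in cleaned:
    print(agency, amount)
```

Keeping unparseable values as None, rather than silently dropping the row, leaves a trail you can review by hand.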
  • 9. Load your data
  • 10. Fail and Iterate
    Again: It probably won’t work the first time.
    It’s difficult. It takes a while. It gets done.
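The load-fail-iterate loop can be sketched as follows. This assumes Python's built-in sqlite3 module as a lightweight stand-in for a full RDBMS such as MySQL; the table, columns, and rows are invented. The idea is to let the database's constraints reject bad rows and to record why, so the next cleaning pass has something to work from.

```python
import sqlite3

# Hypothetical cleaned rows; the third one is bad on purpose (missing year).
ROWS = [
    ("Parks Dept", 2009, 1200.5),
    ("Transit", 2010, 950.0),
    ("Transit", None, 1100.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE spending (
        agency TEXT NOT NULL,
        year   INTEGER NOT NULL,
        amount REAL NOT NULL
    )
    """
)

loaded, rejected = 0, []
for row in ROWS:
    try:
        conn.execute("INSERT INTO spending VALUES (?, ?, ?)", row)
        loaded += 1
    except sqlite3.IntegrityError as error:
        # Record the row and the reason so it can be fixed and retried.
        rejected.append((row, str(error)))
conn.commit()

print(f"loaded {loaded}, rejected {len(rejected)}")
for row, reason in rejected:
    print("rejected:", row, "because", reason)
```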
  • 11. Analyze
    Check your script. Did I write my query correctly?
    Write queries multiple ways. Do the totals match when the RDBMS computes the sums and when I add them up myself?
    Use checksums: Can I compare results from a segment of this data with previously published and vetted results? Are they the same?
    Consult experts: Ask — Does this mean what I think it means? Do these results make sense?
    Output smaller segments of your data to another tool such as Socrata or ManyEyes in order to generate graphs, tables, and visualizations
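The checks above might look something like this in practice. Again sqlite3 stands in for the RDBMS, and the table, the figures, and the "published" total are all made up for the example. The sketch computes totals two ways, compares one segment against an outside figure, and writes a small summary as CSV of the kind a charting tool could import.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (agency TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO spending VALUES (?, ?, ?)",
    [("Parks", 2009, 1200.0), ("Parks", 2010, 800.0), ("Transit", 2010, 950.0)],
)

# Way 1: let the database do the arithmetic.
db_totals = dict(
    conn.execute("SELECT agency, SUM(amount) FROM spending GROUP BY agency")
)

# Way 2: pull the raw rows and add them up by hand.
manual_totals = {}
for agency, amount in conn.execute("SELECT agency, amount FROM spending"):
    manual_totals[agency] = manual_totals.get(agency, 0.0) + amount

# Compare with a small tolerance, since floating-point sums can differ slightly.
totals_match = db_totals.keys() == manual_totals.keys() and all(
    abs(db_totals[key] - manual_totals[key]) < 0.01 for key in db_totals
)

# Compare one segment against a previously published figure (invented here).
PUBLISHED_PARKS_TOTAL = 2000.0
matches_published = abs(db_totals["Parks"] - PUBLISHED_PARKS_TOTAL) < 0.01

# Write a small segment as CSV for a visualization tool.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["agency", "total"])
writer.writerows(sorted(db_totals.items()))

print("totals match:", totals_match)
print("matches published figure:", matches_published)
print(out.getvalue())
```

If either comparison fails, the likeliest culprits are the query itself or rows that were lost or duplicated during cleaning and loading.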
  • 12. Share
    Photo: Britta Bohllinger / Flickr
    SPJ.org
    IRE.org
    HacksHackers.com
  • Resources
    http://dev.mysql.com/doc/refman/5.1/en/
    http://github.com/FlowingMedia/TimeFlow/wiki
    http://www.lagmonster.org/docs/vi.html
    http://www.socrata.com/
    http://www.data.gov
  • 15. Assignment
    You are an investigative team that does freelance work around the country and are working up a pitch for your next project.
    Pick a subject matter you want to investigate
    Identify a dataset or datasets that will help you formulate your story. For this exercise, pick only ones that are already available on the Web, e.g. through Data.gov.
    Plan:
    What will you need to clean this data?
    What schema will you make to house the dataset(s)?
    What are you doing with this data — are you using it for exploratory or deductive reasoning?
    What will your queries look like? Will you join multiple databases together? If so, how will you make sure the results are relevant?
    How will you express the results of your inquiry?
    What questions won’t the data answer that you want to address in your project? Who will you turn to as you start looking for these answers?
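For the schema and join questions in the plan, one possible sanity check is to count how many rows fail to find a partner when two tables are joined. The sketch below uses sqlite3 with two invented tables (contracts and vendors); none of the names or values come from the presentation. A LEFT JOIN keeps the unmatched rows visible instead of silently dropping them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contracts (vendor_id INTEGER, amount REAL);
    CREATE TABLE vendors   (vendor_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO contracts VALUES (1, 500.0), (2, 750.0), (9, 100.0);
    INSERT INTO vendors   VALUES (1, 'Acme'), (2, 'Bolt');
    """
)

# LEFT JOIN keeps every contract, even when no vendor record matches it.
joined = conn.execute(
    """
    SELECT v.name, c.amount
    FROM contracts AS c
    LEFT JOIN vendors AS v ON v.vendor_id = c.vendor_id
    ORDER BY c.vendor_id
    """
).fetchall()

# Rows with no vendor name are the ones the join could not match.
unmatched = [row for row in joined if row[0] is None]

print(f"{len(joined)} contracts, {len(unmatched)} without a matching vendor")
```

A large share of unmatched rows suggests the two datasets do not share a reliable key, which is worth knowing before drawing conclusions from the joined result.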