Combining data with Google Refine

Presentation at the Global Investigative Journalism Conference, Kiev, Ukraine, 15 October 2011

Transcript

  • 1. Get yourself ready • Google ‘Google Refine download’: http://code.google.com/p/google-refine/wiki/Downloads • Download and install Google Refine • Open it up - it should open in a browser at http://127.0.0.1:3333/ (Saturday, 15 October 2011)
  • 2. Google Refine: Combining data • OnlineJournalismBlog.com • Twitter.com/PaulBradshaw
  • 3. In a nutshell... cell.cross("GPdata2008", "Practice Code").cells["Total Listsize"].value[0] • Using GREL to combine datasets • Using APIs to grab geographical data • Using Reconcile services to grab company data
  • 4. GREL: Google Refine Expression Language
  • 5. cell.cross("GPdata2008", "Practice Code").cells["Total Listsize"].value[0]
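The expression on slide 5 looks up the current cell's value in the "Practice Code" column of the GPdata2008 project and brings back "Total Listsize" from the first matching row. A minimal Python sketch of that lookup follows; the sample rows and the helper names build_index and cross_first are invented for illustration and are not part of Google Refine.

```python
# Sketch of the lookup that cell.cross(...) performs: find the rows in another
# dataset whose key column equals the current value, then take one column
# from the first match. All sample data here is invented.

def build_index(rows, key_column):
    """Map each key value to the list of rows that carry it."""
    index = {}
    for row in rows:
        index.setdefault(row[key_column], []).append(row)
    return index


def cross_first(value, index, wanted_column):
    """Return wanted_column from the first matching row, or None if no row matches."""
    matches = index.get(value, [])
    return matches[0][wanted_column] if matches else None


# Stand-in for the "GPdata2008" project.
gp_data_2008 = [
    {"Practice Code": "A001", "Total Listsize": 5200},
    {"Practice Code": "B002", "Total Listsize": 8100},
]

# Stand-in for the project you are working in.
current_project = [{"Practice Code": "B002"}, {"Practice Code": "Z999"}]

index = build_index(gp_data_2008, "Practice Code")
for row in current_project:
    row["Total Listsize"] = cross_first(row["Practice Code"], index, "Total Listsize")
```

Rows with no match get None in this sketch, so it is worth checking for unmatched codes after combining.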
  • 6. Using APIs: Getting contextual data
  • 7. What’s an API again? Ask it a question, it gives you an answer: “For each of these codes, give me the region.” “For each of these names, tell me their political party.”
  • 8. Useful APIs • Geo: UK Postcodes, Google Maps • Social: Twitter, Facebook, Flickr • Politics: They Work For You, Data.gov.uk • News: Guardian, NYT, USA Today, NPR • Health, business, etc. Search for specific ones
  • 9. API keys • Sometimes needed - apply through the site • Use it in the request as a password
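As a rough illustration of slide 9, many APIs expect the key as one more parameter in the request URL. The sketch below assumes a made-up endpoint (api.example.org), a made-up parameter name (key) and a placeholder key; real services document their own names, and some want the key in a header instead.

```python
from urllib.parse import urlencode


def build_request_url(base_url, params, api_key, key_param="key"):
    """Append the query parameters plus the API key to base_url.

    key_param is an assumption: check the API's documentation for the
    parameter name it actually expects.
    """
    query = dict(params)          # copy so the caller's dict is not modified
    query[key_param] = api_key
    return base_url + "?" + urlencode(query)


# Hypothetical endpoint and placeholder key, for illustration only.
url = build_request_url(
    "https://api.example.org/lookup",
    {"postcode": "B42 2SU"},
    "YOUR-API-KEY",
)
```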
  • 10. API limits • Can prevent you getting data for all your records • Try multiple APIs or split your data into multiple sheets - or buy a licence
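Splitting the data into multiple sheets amounts to cutting the record list into batches that each stay under the API's limit. A small sketch, assuming a hypothetical limit of 2,500 requests per batch (the real figure depends on the service):

```python
def split_into_batches(records, batch_size):
    """Cut records into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# 6,000 invented record IDs and an assumed limit of 2,500 per batch.
record_ids = list(range(6000))
batches = split_into_batches(record_ids, 2500)
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be saved as its own sheet and processed on a separate day or against a different API.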
  • 12. Get data from an API • www.chasedavis.com/refine.html
  • 13. Reconciling: An easier way to get data
  • 14. OpenCorporates.com • http://vimeo.com/17924204
  • 15. Walkthrough: Reconciliation with Open Corporates • Click on arrow at top of column • Select Reconcile > Start Reconciling... • Click on Add Standard Service... • http://opencorporates.com/reconcile • And start...
  • 16. Walkthrough: Reconciliation with Open Corporates • Click ‘Search for Match’ and select • Click double tick icon to bulk reconcile • Reconcile > Action > Match each cell to its best candidate
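To give a feel for what "match each cell to its best candidate" means, the sketch below scores each candidate name against the cell's text and keeps the highest scorer if it clears a threshold. It uses Python's difflib as a stand-in; this is not how OpenCorporates or Google Refine actually score matches, and the 0.8 threshold and the company names are invented.

```python
from difflib import SequenceMatcher


def best_candidate(name, candidates, threshold=0.8):
    """Return the candidate most similar to name, or None if none is close enough."""
    def score(candidate):
        # Case-insensitive similarity between 0.0 (nothing shared) and 1.0 (identical).
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()

    if not candidates:
        return None
    best = max(candidates, key=score)
    return best if score(best) >= threshold else None


# Invented company names, for illustration only.
match = best_candidate("Acme Ltd", ["ACME LTD", "Apex Holdings"])
no_match = best_candidate("Acme Ltd", ["Zebra Plc"])
```

Cells that come back as None are the ones worth checking by hand with "Search for Match".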
  • 17. Freebase
  • 18. Freebase and namespaces
  • 19. Search for match
  • 20. Walkthrough: Using Google Refine and APIs
  • 22. Escaping values for URLs "http://maps.googleapis.com/maps/api/geocode/json?sensor=false&address=" + escape(value, "url")
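The same idea outside Refine: characters such as spaces and commas have to be escaped before an address can go into a URL. A rough Python equivalent, assuming quote_plus is an acceptable stand-in for escape(value, "url"); the base URL is copied from the slide and may no longer work unchanged with the current Google service.

```python
from urllib.parse import quote_plus

# Base URL as shown on the slide (2011); treat it as illustrative.
GEOCODE_BASE = "http://maps.googleapis.com/maps/api/geocode/json?sensor=false&address="


def geocode_url(address):
    """Build the request URL with the address escaped for use in a query string."""
    return GEOCODE_BASE + quote_plus(address)


example_url = geocode_url("Kiev, Ukraine")
```

Here the comma becomes %2C and the space becomes +, so the address arrives as a single parameter value.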
  • 23. JSON explained {category : value} {category {nested category : value {nested category 2 : value } {category 2 : value}
  • 24. JSON explained {name : citytown} {geo {latitude : 42 {longitude : 2 } {administrative : citytown council}
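The slides use a simplified notation. Written as well-formed JSON, the citytown example from slide 24 might look like the text below (the exact layout is an assumption), and Python's json module can read it into nested dictionaries:

```python
import json

# The slide's example rewritten as valid JSON: keys and text values are quoted,
# pairs are separated by commas, and the nested "geo" object has its own braces.
record_text = """
{
  "name": "citytown",
  "geo": {"latitude": 42, "longitude": 2},
  "administrative": "citytown council"
}
"""

record = json.loads(record_text)

# Nested values are reached one level at a time.
latitude = record["geo"]["latitude"]
longitude = record["geo"]["longitude"]
```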
  • 25. Walkthrough: Using Google Refine to pull out data • > Create new column based on this one... • GREL: value.parseJson().item1.part2[1]
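The GREL expression on slide 25 parses the cell's text as JSON, steps into item1, then part2, and takes the element at index 1 (the second item, since counting starts at 0). The same path in Python, with an invented sample value; item1 and part2 are the placeholder names from the slide, not fields of any real API:

```python
import json


def extract_second_part(cell_text):
    """Mirror value.parseJson().item1.part2[1] for one cell's JSON text."""
    data = json.loads(cell_text)
    return data["item1"]["part2"][1]


# Invented sample cell value using the slide's placeholder names.
sample_cell = '{"item1": {"part2": ["first", "second", "third"]}}'
extracted = extract_second_part(sample_cell)
```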
  • 26. Links • Delicious.com/paulb/kiev11 • Delicious.com/paulb/googlerefine • OnlineJournalismBlog.com/tag/google-refine