A Quick Tour of OpenRefine

A quick feature tour of OpenRefine, using an example open spending data dataset from Birmingham City Council.

Published in: Technology

  1. Wrangling Data with OpenRefine. Tony Hirst, Computing and Communications, The Open University, @psychemedia
  2. “It’s … a great joy to learn a technique, because as soon as you learn it, you start thinking in it. When I learn a new technique my imaginative possibilities have expanded.” Grayson Perry, Playing to the Gallery
  3. Real data is often dirty and messy
  4. Loading, Checking, Exploring, Cleaning, Reshaping, Annotating, Saving
  5. This is a hands-on workshop, so fire up your laptops…
  6. Loading – import source
  7. Loading – filetype
  8. Loading – encoding
  9. Exploring – text facets
  10. Exploring – text facets
  11. Exploring – text facets
  12. Saving – customisation
  13. Cleaning – text facets
  14. Checking – no blanks
  15. Cleaning – tidying columns
  16. Cleaning – tidying numbers
  17. Cleaning – tidying numbers
  18. Cleaning – tidying numbers: value.replace('£', '').replace(',', '')
  19. Cleaning – tidying numbers
  20. Exploring – number facets
  21. Exploring – filtering number ranges
  22. Exploring – sorting columns
  23. Cleaning – making dates
  24. Cleaning – making dates: value.toDate('d/M/y')
  25. Cleaning – making dates: http://bit.ly/javadateformat
  26. Exploring – filtering date ranges
  27. Cleaning – whitespace
  28. Cleaning – whitespace
  29. Cleaning – whitespace
  30. Exploring – filter and facet
  31. Cleaning – “ish-match”
  32. Cleaning – cluster / make alike
  33. Cleaning – good practice
  34. We need a small dataset for the next example…
  35. Annotating – reconciliation
  36. Annotating – reconciliation: https://opencorporates.com/reconcile
  37. Annotating – reconciliation
  38. Cleaning – normalisation: value.replace(/ LTD.?/, ' LIMITED')
  39. Annotating – reconciled data
  40. Annotating – reconciled data: cell.recon.match.id
  41. Annotating – reconciled data: cell.recon.match.name
  42. Annotating – reconciled data
  43. https://api.opencorporates.com/companies/gb/00102498
  44. Annotating – URL based data
  45. Annotating – URL based data: 'https://api.opencorporates.com'+value+'?sparse=true'
  46. Annotating – URL based data
  47. JSON['results']
  48. JSON['results']['company']
  49. JSON['results']['company']['registered_address_in_full']
  50. Annotating – parsed JSON data: value.parseJson()['results']['company']['registered_address_in_full']
  51. Annotating – parsed JSON data: split(value, ',')
  52. Annotating – parsed JSON data: split(value, ',')[-1], then split(value, ',')[-1].strip()
  53. Saving – annotated data
  54. Loading, Checking, Exploring, Cleaning, Reshaping, Annotating, Saving
  55. Reuse – exporting your action list
  56. Tutorials and walkthroughs: http://schoolofdata.org/handbook/recipes/cleaning-data-with-refine/ and http://blog.ouseful.info/category/syndication/openrefine (any questions: @psychemedia)
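
For readers who want to try the cleaning expressions from slides 18, 24, 27-29 and 38 outside OpenRefine, here is a minimal Python sketch of roughly equivalent steps. It is an analogue, not OpenRefine's implementation: the function names are hypothetical, the date parser assumes four-digit years, the whitespace step assumes both trimming and collapsing were intended, and the "LTD" pattern escapes the dot (unlike the slide) so that only a literal full stop is consumed.

```python
import re
from datetime import datetime


def tidy_number(value: str) -> float:
    """Analogue of value.replace('£', '').replace(',', ''), then convert to a number."""
    return float(value.replace("£", "").replace(",", ""))


def make_date(value: str) -> datetime:
    """Rough analogue of value.toDate('d/M/y'); assumes day/month/four-digit year."""
    return datetime.strptime(value.strip(), "%d/%m/%Y")


def tidy_whitespace(value: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def normalise_ltd(value: str) -> str:
    """Analogue of value.replace(/ LTD.?/, ' LIMITED'), with the dot escaped
    (an assumption about the intent) so only an optional '.' is replaced."""
    return re.sub(r" LTD\.?", " LIMITED", value)
```

Each function takes one cell value and returns the cleaned value, which mirrors how a transform expression is applied cell by cell down a column.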

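The URL-building and JSON-parsing steps from slides 43 to 52 can be sketched in Python in the same spirit. This is an illustration under stated assumptions: the function names are hypothetical, the sample payload is invented to match the key path shown on the slides (it is not a real API response), and no network request is made.

```python
import json

API_ROOT = "https://api.opencorporates.com"


def company_url(recon_id: str) -> str:
    """Analogue of 'https://api.opencorporates.com'+value+'?sparse=true',
    where value is a reconciled id such as '/companies/gb/00102498'."""
    return API_ROOT + recon_id + "?sparse=true"


def registered_address(payload: str) -> str:
    """Analogue of parsing the JSON and following
    ['results']['company']['registered_address_in_full']."""
    return json.loads(payload)["results"]["company"]["registered_address_in_full"]


def last_address_part(address: str) -> str:
    """Analogue of split(value, ',')[-1].strip(): the last comma-separated piece."""
    return address.split(",")[-1].strip()


# Hypothetical payload, shaped only to match the key path on the slides.
SAMPLE = json.dumps(
    {"results": {"company": {
        "registered_address_in_full": "1 Example Street, Exampleton, EX1 1AA"}}}
)
```

With the sample above, `last_address_part(registered_address(SAMPLE))` picks out the final element of the address, which is the same idea as taking index -1 of the split and stripping it on slide 52.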