20130206  open refine
10 presentation of OpenRefine (formerly Google Refine) for the Toronto Data Science Group.


Published in: Technology
Transcript

  • 1. We are surrounded by data (2013-02-06, Toronto Data Science Group)
  • 2. We are surrounded by MESSY data
      - Multiple standards and formats: structured vs. unstructured, field naming and format vary ...
      - Human error (misspellings, errors, etc.)
      - Non-normalized inputs (free-text entries, the "other" option)
      - Incomplete data (laziness) ...
  • 3. Lack of: time, skills » software
  • 4. OpenRefine, the:
      - Swiss army knife for data manipulation!
      - glue step between your IT systems
  • 5. What's OpenRefine (formerly Google Refine, formerly Gridworks)
      - A cross-platform web application that runs locally
      - A community-based project hosted on GitHub
      - Which has two distributions and multiple extensions
      - Something between a spreadsheet and SQL
  • 6. Three use cases
      1. Data cleaning
      2. ETL (Extract, Transform, Load) prototyping
      3. Data extension (reconciliation & linked data)
  • 7. #1 Data Cleaning
      - Graphical interface
      - Cluster similar records
      - Facet option
      - Supports three languages: GREL, Jython, Clojure + regex
  • 8. Facet example
  • 9. Cluster example
  • 10. #2 ETL Prototyping (Extract, Transform, Load)
      Extract & Load, supports:
      - tabular (csv, xls)
      - hierarchical (xml, json)
      Transform:
      - Understand your data
      - Test the transformations that need to be done
      - Undo / Redo
      - Export transformations in JSON format
      - Automate using the Python or Ruby extension
  • 11. History and JSON export
  • 12. #3 Extend your Data (reconciliation & linked data)
      - Cross between OpenRefine projects (vlookup)
      - Fetch URL and call web services (API)
      Reconcile against:
      - RDF file & local SPARQL endpoints
      - Online databases
  • 13. Reconciliation example
  • 14. Thanks! Martin Magdinier, OpenRefine, martin.magdinier@gmail.com, http://openrefine.org, @magdmartin, @OpenRefine
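
The facet and cluster slides (7 to 9) show screenshots only, so here is a rough Python sketch of the two ideas. It is not OpenRefine's code: `fingerprint`, `text_facet`, and `cluster` are hypothetical helper names, the city list is made-up sample data, and the keying function is a simplified take on the "fingerprint" key-collision approach (trim, lowercase, strip punctuation, fold accents, sort and de-duplicate tokens).

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical spellings collide."""
    # Fold accented characters to their closest ASCII form.
    folded = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and drop ASCII punctuation.
    cleaned = folded.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def text_facet(values):
    """Count how often each distinct raw value occurs (a text facet)."""
    return Counter(values)


def cluster(values):
    """Group distinct raw values that share the same fingerprint key."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    # Only keys with more than one distinct spelling are worth reviewing.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical messy column, for illustration only.
cities = [
    "Toronto", "toronto ", "Toronto, ON", "ON Toronto",
    "Montréal", "Montreal", "Ottawa",
]
facet = text_facet(cities)
clusters = cluster(cities)
```

The facet exposes every distinct spelling with its count, and the clusters suggest which spellings could be merged into one value. In OpenRefine itself both steps are done from the graphical interface, and a person confirms each merge.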
