20130206 open refine


10 presentation of OpenRefine (formerly Google Refine) for the Toronto Data Science Group.

Published in: Technology

  1. We are surrounded by data (2013-02-06, Toronto Data Science Group)
  2. We are surrounded by MESSY data
     - Multiple standards and formats: structured vs. unstructured, field naming and format vary, ...
     - Human error (misspellings, errors, etc.)
     - Non-normalized inputs (free-text entries, the "other" option)
     - Incomplete data (laziness)
     - ...
  3. Lack of:
     - Time
     - Skills
     - Software
  4. OpenRefine, the:
     - Swiss army knife for data manipulation!
     - Glue step between your IT systems
  5. What is OpenRefine (formerly Google Refine, formerly Gridworks)?
     - A cross-platform web application that runs locally
     - A community-based project hosted on GitHub
     - Which has two distributions and multiple extensions
     - Something between a spreadsheet and SQL
  6. Three use cases
     1. Data cleaning
     2. ETL (Extract, Transform, Load) prototyping
     3. Data extension (reconciliation & linked data)
  7. #1 Data Cleaning
     - Graphical interface
     - Cluster similar records
     - Facet option
     - Supports three languages: GREL, Jython, Clojure (+ regex)
  8. Facet example
  9. Cluster example
  10. #2 ETL Prototyping (Extract, Transform, Load)
     - Extract & Load. Supports:
       - tabular (csv, xls)
       - hierarchical (xml, json)
     - Transform:
       - Understand your data
       - Test the transformations that need to be done
       - Undo / Redo
       - Export transformations in JSON format
       - Automate using the Python or Ruby extension
  11. History and JSON export
  12. #3 Extend your Data (reconciliation & linked data)
     - Cross between OpenRefine projects (vlookup)
     - Fetch URL and call web services (API)
     - Reconcile against:
       - RDF file & local SPARQL endpoints
       - Online databases
  13. Reconciliation example
  14. Thanks!
     - Martin Magdinier, OpenRefine
     - martin.magdinier@gmail.com
     - http://openrefine.org
     - @magdmartin, @OpenRefine
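
The facet slide shows a screenshot only. As a rough illustration of the idea, a text facet can be thought of as a count of each distinct value in a column, which you then click to filter rows. The sketch below is plain Python, not OpenRefine code; the function names and the sample rows are made up for the example.

```python
from collections import Counter


def text_facet(rows, column):
    """Count how often each distinct value appears in one column."""
    return Counter(row[column] for row in rows)


def filter_by_facet(rows, column, selected):
    """Keep only the rows whose value matches the selected facet choice."""
    return [row for row in rows if row[column] == selected]


# Hypothetical sample data with an inconsistent spelling.
rows = [
    {"city": "Toronto"},
    {"city": "toronto"},
    {"city": "Toronto"},
    {"city": "Montreal"},
]

facet = text_facet(rows, "city")
for value, count in facet.most_common():
    print(value, count)
```

Listing the values with their counts is what makes stray variants such as "toronto" easy to spot.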
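
The cluster slide is also a screenshot. One common way to group similar records is key collision: compute a normalized key for each value and group the values that share a key. The sketch below is a simplified fingerprint-style key in plain Python; it is an approximation for illustration, not OpenRefine's own implementation, and the sample values are invented.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: trim, lowercase, strip accents and punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; return groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample values.
values = ["Toronto, ON", "toronto on", "ON Toronto", "Montréal"]
clusters = cluster(values)
print(clusters)
```

Each resulting group is a candidate for merging into a single spelling, which is the decision the clustering dialog leaves to the user.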
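
The ETL slides mention exporting the transformation history as JSON. The sketch below shows how such an export could be inspected from Python. The JSON document is a hand-written illustration of what an exported history might look like; the field names and operation ids are assumptions for this example and should be checked against a real export.

```python
import json

# Illustrative history export (assumed structure, not copied from a real project).
history_json = """
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column city using expression value.trim()",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "city",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/column-rename",
    "description": "Rename column city to City",
    "oldColumnName": "city",
    "newColumnName": "City"
  }
]
"""


def summarize(history_text):
    """Return (operation id, description) pairs in the order they were applied."""
    operations = json.loads(history_text)
    return [(step["op"], step.get("description", "")) for step in operations]


summary = summarize(history_json)
for number, (op, description) in enumerate(summary, start=1):
    print(number, op, "-", description)
```

Because the history is plain JSON, it can be kept under version control and reapplied to a new batch of data with the same structure.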
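
The "cross between OpenRefine projects (vlookup)" item on the data extension slide amounts to looking up a value in another table through a shared key. Below is a minimal plain-Python sketch of that lookup; the function names, column names, and sample rows are hypothetical, and this is not how OpenRefine implements it internally.

```python
def build_index(rows, key_column):
    """Index rows by key; the first row seen for a key wins, as in a VLOOKUP."""
    index = {}
    for row in rows:
        index.setdefault(row[key_column], row)
    return index


def cross(rows, key_column, other_rows, other_key, wanted_column, default=None):
    """Copy wanted_column from other_rows into rows wherever the keys match."""
    index = build_index(other_rows, other_key)
    extended_rows = []
    for row in rows:
        match = index.get(row[key_column])
        extended = dict(row)  # leave the input rows untouched
        extended[wanted_column] = match[wanted_column] if match else default
        extended_rows.append(extended)
    return extended_rows


# Hypothetical sample tables standing in for two projects.
orders = [{"order": 1, "customer_id": "c1"}, {"order": 2, "customer_id": "c9"}]
customers = [{"id": "c1", "name": "Acme"}, {"id": "c2", "name": "Globex"}]

extended_orders = cross(orders, "customer_id", customers, "id", "name")
print(extended_orders)
```

Rows without a match keep a default value instead of being dropped, which makes the unmatched keys easy to find with a facet afterwards.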