20130206 open refine

Presentation of OpenRefine (formerly Google Refine) for the Toronto Data Science Group.

Published in: Technology
Transcript of "20130206 open refine"

  1. We are surrounded by data (2013-02-06, Toronto Data Science Group)
  2. We are surrounded by MESSY data
     - Multiple standards and formats
       - Structured vs. unstructured
       - Field naming and format vary
       - ...
     - Human error (misspellings, errors, etc.)
     - Non-normalized inputs (free-text entries, the "other" option)
     - Incomplete data (laziness)
     - ...
  3. Lack of: Time, Skills » Software
  4. OpenRefine, the
     - Swiss army knife for data manipulation!
     - glue step between your IT systems
  5. What's OpenRefine? (formerly Google Refine, formerly Gridworks)
     - A cross-platform web application that runs locally
     - A community-based project hosted on GitHub
     - Which has two distributions and multiple extensions
     - Something between a spreadsheet and SQL
  6. Three use cases
     1. Data cleaning
     2. ETL (Extract, Transform, Load) prototyping
     3. Data extension (reconciliation & linked data)
  7. #1 Data Cleaning
     - Graphical interface
     - Cluster similar records
     - Facet option
     - Supports three languages: GREL, Jython, Clojure (+ regex)
  8. Facet example
  9. Cluster example
  10. #2 ETL Prototyping (Extract, Transform, Load)
      Extract & Load
      - Supports tabular (csv, xls) and hierarchical (xml, json) formats
      Transform
      - Understand your data
      - Test the transformations that need to be done
      - Undo / Redo
      - Export transformations in JSON format
      - Automate using the Python or Ruby extension
  11. History and JSON export
  12. #3 Extend your Data (reconciliation & linked data)
      - Cross between OpenRefine projects (vlookup)
      - Fetch URLs and call web services (API)
      - Reconcile against:
        - RDF files & local SPARQL endpoints
        - Online databases
  13. Reconciliation example
  14. Thanks!
      Martin Magdinier, OpenRefine
      martin.magdinier@gmail.com
      http://openrefine.org
      @magdmartin @OpenRefine
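
The facet and cluster ideas from the data cleaning slide (#1) can be illustrated outside the tool. The Python sketch below is not OpenRefine's code. It assumes that a text facet amounts to counting distinct values, and that clustering can be approximated by a key-collision "fingerprint" (lowercase, strip accents and punctuation, sort and deduplicate tokens). The function names text_facet, fingerprint and cluster are hypothetical and chosen only for this example.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def text_facet(values):
    """Count each distinct value, most frequent first (a rough text facet)."""
    return Counter(values).most_common()


def fingerprint(value):
    """Build a normalized key so that near-duplicate strings collide.

    Assumption: this mirrors the general idea of key-collision clustering,
    not the exact rules of any particular tool.
    """
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and remove punctuation, keeping letters, digits and spaces.
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.lower())
    # Sort and deduplicate tokens so word order and repeats do not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group distinct values that share a fingerprint; keep groups of 2+."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For instance, calling cluster(["Toronto, ON", "toronto ON", "ON Toronto.", "Ottawa"]) groups the three Toronto spellings together and leaves "Ottawa" out, which is the kind of suggestion a reviewer would then accept or reject by hand.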