20130206  open refine

10 presentation of OpenRefine (formerly Google Refine) for the Toronto Data Science Group.



Usage Rights

CC Attribution License

Presentation Transcript

  • We are surrounded by data (2013-02-06, Toronto Data Science Group)
  • We are surrounded by MESSY data
    - Multiple standards and formats: structured vs. unstructured, field naming and format vary ...
    - Human error (misspellings, errors, etc.)
    - Non-normalized inputs (free-text entries, the "other" option)
    - Incomplete data (laziness) ...
  • Lack of: Time, Skills » Software
  • OpenRefine, the:
    - Swiss army knife for data manipulation!
    - glue step between your IT systems
  • What's OpenRefine (formerly Google Refine, formerly Gridworks)
    - A cross-platform web application that runs locally
    - A community-based project hosted on GitHub
    - Which has two distributions and multiple extensions
    - Something between a spreadsheet and SQL
  • Three use cases
    1. Data cleaning
    2. ETL (Extract Transform Load) prototyping
    3. Data extension (reconciliation & linked data)
  • #1 Data Cleaning
    - Graphical interface
    - Cluster similar records
    - Facet option
    - Supports three languages: GREL, Jython, Clojure + regex
  • Facet example
  • Cluster example
  • #2 ETL Prototyping (Extract - Transform - Load)
    Extract & Load
    Support:
    - tabular (csv, xls)
    - hierarchical (xml, json)
    Transform
    - Understand your data
    - Test the transformations that need to be done
    - Undo / Redo
    - Export transformations in JSON format
    - Automate using the Python or Ruby extension
  • History and JSON export
  • #3 Extend your Data (reconciliation & linked data)
    - Cross between OpenRefine projects (vlookup)
    - Fetch URL and call web services (API)
    Reconcile against
    - RDF file & local SPARQL endpoints
    - Online databases
  • Reconciliation example
  • Thanks!
    Martin Magdinier, OpenRefine
    martin.magdinier@gmail.com
    http://openrefine.org
    @magdmartin @OpenRefine
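
The "Facet option" on the data cleaning slide can be pictured as a count of the distinct values in a column, which is what makes near-duplicates visible at a glance. The sketch below is plain Python rather than OpenRefine itself; the function name `text_facet` and the sample rows are made up for illustration.

```python
from collections import Counter

def text_facet(rows, column):
    """Count how often each distinct value appears in one column (hypothetical helper)."""
    counts = Counter(row.get(column, "") for row in rows)
    # Most frequent values first; ties are broken by plain string order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Illustrative data, not taken from the presentation
rows = [
    {"city": "Toronto"},
    {"city": "toronto"},
    {"city": "Montreal"},
    {"city": "Toronto"},
]

facet = text_facet(rows, "city")
for value, count in facet:
    print(f"{value}: {count}")
```

Seeing "Toronto" and "toronto" listed as separate facet values is the kind of inconsistency the cleaning step then resolves.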
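
"Cluster similar records" on the data cleaning slide can be illustrated with a key-collision approach: normalize each value into a key, then group the values that share one. The code below is a simplified sketch in the spirit of a fingerprint key, not OpenRefine's actual implementation; the function names and sample values are assumptions for illustration.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key: ASCII-folded, lowercased, punctuation-free, sorted unique tokens."""
    # Fold accented characters to their closest ASCII form
    text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    text = text.strip().lower()
    # Drop punctuation, keep word characters and whitespace
    text = re.sub(r"[^\w\s]", "", text)
    # Sort and de-duplicate tokens so word order does not matter
    return " ".join(sorted(set(text.split())))

def cluster(values):
    """Group values whose keys collide; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Illustrative values, not taken from the presentation
names = [
    "Toronto Data Science",
    "toronto data science ",
    "Science, Data Toronto",
    "Montréal",
    "Montreal",
    "Ottawa",
]

clusters = cluster(names)
for group in clusters:
    print(group)
```

Each printed group is a candidate for merging into one canonical value; "Ottawa" has no variants, so it is not reported.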
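
The "Export transformations in JSON format" point on the ETL slide means the steps recorded in a project's history can be saved and reviewed outside the tool. The sketch below reads such an export and lists its steps. The sample JSON is invented for illustration, and the field names (`op`, `description`, and so on) are an assumption about what an exported history looks like, so check them against a real export before relying on them.

```python
import json

# Made-up sample of an exported history; field names are assumed, not confirmed by the slides
exported = """
[
  {"op": "core/text-transform", "columnName": "city",
   "expression": "value.trim()", "description": "Trim whitespace in column city"},
  {"op": "core/column-rename", "oldColumnName": "city",
   "newColumnName": "City", "description": "Rename column city to City"}
]
"""

def summarize(history_json):
    """Return (operation, description) pairs for each recorded step."""
    operations = json.loads(history_json)
    return [(step.get("op", "?"), step.get("description", "")) for step in operations]

steps = summarize(exported)
for number, (op, description) in enumerate(steps, start=1):
    print(f"{number}. {op}: {description}")
```

A listing like this is a convenient way to review a prototyped transformation before automating it.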