Data Analytics, Martin Magdinier, Go Open Data 2015
1. Data Analytics & Visualization Cool Tools: OpenRefine
Martin Magdinier @magdmartin
1
01/05/2015
2. Data Analytics & Visualization Cool Tools: OpenRefine
About Me
● Contributor since 2011
● Committer since 2012
● http://openrefine.org
● @OpenRefine
● Toronto OpenRefine Meetup Organizer
● Next Sessions: May 21 and June 18
● Founder 2014
● OpenRefine Hosting
● http://refinepro.com
● @RefinePro
3. Data Analytics & Visualization Cool Tools: OpenRefine
80% of the time in data analysis
is spent on
cleaning, transforming and integrating data
4. Data Analytics & Visualization Cool Tools: OpenRefine
Cleaning
● Duplicate values
● Typos
● Multi-value cells
● Data in the wrong field
● Missing / partial values
● Encoding errors
● Wrong format (text, number, date ...)
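Several of the cleaning problems above (typos, inconsistent case, duplicate values) can be caught by grouping values that share a normalized key. The sketch below is written in the spirit of the "fingerprint" key-collision clustering that OpenRefine offers. It is not OpenRefine's own code, and the sample names are made up for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical values collide."""
    # Trim surrounding whitespace and ignore case.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove repeated tokens, and sort them.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint and keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample values, not taken from the workshop data set.
names = ["Toronto Hydro", "toronto hydro ", "Hydro, Toronto", "City of Toronto"]
clusters = cluster(names)
print(clusters)
```

Each cluster is a candidate for merging into one canonical value. A person should still review the groups, since a shared key does not guarantee that two values mean the same thing.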
Integration &
Transformation
● Flat to relational data set
● Schema alignment
● Transpose
● Join data sets
● Enrichment from other sources
● ...
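Two of the operations listed above can be sketched in plain Python: splitting a multi-value cell into one row per value (a step toward a relational layout) and joining rows against a second data set to enrich them. The column names, ward codes, and records are hypothetical and only illustrate the idea. They do not come from the workshop data.

```python
def split_multi_value(rows, column, sep=";"):
    """Return one row per value found in a multi-value cell."""
    result = []
    for row in rows:
        for part in row[column].split(sep):
            new_row = dict(row)          # copy so the input rows stay untouched
            new_row[column] = part.strip()
            result.append(new_row)
    return result


def join(rows, lookup, key):
    """Left join: add the lookup columns to every row with a matching key."""
    return [{**row, **lookup.get(row[key], {})} for row in rows]


# Hypothetical sample data.
permits = [
    {"permit": "P-1", "ward": "W1", "work": "roof; deck"},
    {"permit": "P-2", "ward": "W2", "work": "garage"},
]
wards = {"W1": {"ward_name": "North"}, "W2": {"ward_name": "South"}}

flat = split_multi_value(permits, "work")
enriched = join(flat, wards, "ward")
for row in enriched:
    print(row)
```

Rows whose key is missing from the lookup pass through unchanged, which keeps unmatched records visible for later review.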
5. Data Analytics & Visualization Cool Tools: OpenRefine
Bridging The Skill Gap
Spreadsheet
Basic Knowledge of Scripting (Python, R, command line ...)
ETL Engineer
Data Science
Data Visualization / Interpretation
Understand The Data (Business Skills)
Know How To Transform Data (Technical Skills)
6. Data Analytics & Visualization Cool Tools: OpenRefine
A Lean Data Model
● Discovery / Wrangling: in-application feedback (personal usage)
● Profiling / Preparation: ad hoc usage, reporting, migration
● Quality / Transformation: industrialization, integration
● Cycle: Build - Do, Measure - Check, Learn - Think, Plan - Act
7. Data Analytics & Visualization Cool Tools: OpenRefine
History
● 2009: Freebase Gridworks release
● 2010: Gridworks becomes Google Refine
● 2010: Google Refine 2.0 release
● 2011: Google Refine 2.5 release
● 2012: Google Refine becomes OpenRefine
● 2013: OpenRefine 2.6-beta release
8. Data Analytics & Visualization Cool Tools: OpenRefine
OpenRefine Ecosystem
9. Data Analytics & Visualization Cool Tools: OpenRefine
Getting Started
Setting Up Refine
● Download OpenRefine 2.6-beta: http://openrefine.org/download.html
● Unzip the file
● Start Refine
– Win: Double-click refine.exe
– Linux: ./refine
– Mac: Use RefinePro
● Register on RefinePro Beta: http://app.refinepro.com
● Use Chrome or Firefox
● Create an instance and click Access Instance (can take up to 5 minutes the first time)
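After starting Refine locally, a quick way to confirm it is up is to request its web interface. The sketch below assumes the default local address `http://127.0.0.1:3333/`, which the slides do not state. Adjust it if your instance is configured differently. The helper name `is_refine_running` is made up for this example.

```python
import urllib.request

# Assumed default address of a locally started instance (not given in the slides).
DEFAULT_URL = "http://127.0.0.1:3333/"


def is_refine_running(url=DEFAULT_URL, timeout=2.0):
    """Return True if the web interface answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeout, or HTTP error: treat as not running.
        return False


if __name__ == "__main__":
    if is_refine_running():
        print("Refine is reachable; open it in Chrome or Firefox.")
    else:
        print("Refine is not reachable yet; check that it has started.")
```

The check only tells you that something answers at that address. It does not verify the Refine version.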
10. Data Analytics & Visualization Cool Tools: OpenRefine
Workshop / Demo: 2014 Toronto Cleared Building Permits
● Presentation page: http://ow.ly/Js8GD
● Download Data: http://ow.ly/Js8Ho