2013-06-26
Knowledge Workers Toronto
Methods
1
We are surrounded by data
2013-06-26
Knowledge Workers Toronto
Methods
2
... by MESSY data
- Multiple standards and formats
Structured vs unstructur...
2013-06-26
Knowledge Workers Toronto
Methods
3
2013-06-26
Knowledge Workers Toronto
Methods
4
OpenRefine the
- Swiss army knife for data manipulation!
- glue step betwee...
2013-06-26
Knowledge Workers Toronto
Methods
5
What's OpenRefine
(former Google Refine, former Gridworks)
- A Cross platfo...
2013-06-26
Knowledge Workers Toronto
Methods
6
Four use cases
1. Data Profiling *
2. Data Cleaning *
3. Data extension *
4...
2013-06-26
Knowledge Workers Toronto
Methods
7
File 1: Data Profiling &
Cleaning
City Subject Thesaurus XML file with 5431...
2013-06-26
Knowledge Workers Toronto
Methods
8
File 2: The Economist Best
City Contest 2012
Prepare Data for an applicatio...
2013-06-26
Knowledge Workers Toronto
Methods
9
OpenRefine
http://openrefine.org
@OpenRefine
Martin Magdinier
martin.magdin...
2013-06-26
Knowledge Workers Toronto
Methods
10
DESCRIPTOR The preferred term
FAC Facet
SC Subject category of the term
SN...

20130626 OpenRefine Introduction

An introduction to data profiling and cleaning with OpenRefine, presented on June 26, 2013 at the Toronto Knowledge Workers Methods Group. This presentation contains links to the datasets used during the demo.

Published in: Technology, Education

  1. We are surrounded by data
  2. ... by MESSY data
     - Multiple standards and formats
       - Structured vs unstructured
       - Field naming and format vary
       - ...
     - Human error (misspellings, errors, etc.)
     - Non-normalized inputs (free-text entries)
     - Incomplete data (laziness)
     - ...
  3. (no text on this slide)
  4. OpenRefine, the
     - Swiss army knife for data manipulation!
     - glue step between your IT systems
  5. What's OpenRefine? (formerly Google Refine, and before that Gridworks)
     - A cross-platform web application that runs locally
     - A community-based project hosted on GitHub
     - With two distributions and multiple extensions
     - Something between a spreadsheet and SQL
  6. Four use cases
     1. Data profiling *
     2. Data cleaning *
     3. Data extension *
     4. ETL (Extract, Transform, Load) prototyping
  7. File 1: Data Profiling & Cleaning
     City Subject Thesaurus: XML file with 5431 concepts.
     What we will do:
     - Explore the file
     - Fix inconsistencies
     - Transpose / merge fields
  8. File 2: The Economist Best City Contest 2012
     Prepare data for an application to the Economist Intelligence Unit's (EIU) Best City Contest 2012, using G. Hofstede's Values Survey Module 2008.
     What we will do:
     - Clean duplicates
     - Create new data
     - Add data from a different project
     - Use project history
  9. OpenRefine
     http://openrefine.org
     @OpenRefine
     Martin Magdinier
     martin.magdinier@gmail.com
     @magdmartin
     Thanks!
  10. City Subject Thesaurus Legend
      - DESCRIPTOR: The preferred term
      - FAC: Facet
      - SC: Subject category of the term
      - SN: Scope note
      - SRC: Source of term
      - UF: Used for
      - BT: Broader term
      - NT: Narrower term
      - RT: Related term
      - STA: Term status
      - INP: Input date
      - APP: Approval date
      - UPD: Modified date
      - TNR: Term number
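
The profiling and cleaning steps planned for File 1 can also be sketched outside OpenRefine. The snippet below is a minimal illustration, not the demo itself. It counts the distinct values of one field, roughly what a text facet shows, and then groups spelling variants under a normalized key, an approach similar in spirit to OpenRefine's key-collision clustering. The sample records, the helper names (`facet_counts`, `fingerprint`, `cluster_variants`) and the choice of the `SC` field are assumptions made for the example; the real thesaurus file is not reproduced here.

```python
import string
import unicodedata
from collections import Counter, defaultdict

# Hypothetical sample rows keyed by two legend codes (DESCRIPTOR, SC).
# These values are invented for illustration, not taken from the real file.
records = [
    {"DESCRIPTOR": "Parks", "SC": "Recreation"},
    {"DESCRIPTOR": "Arenas", "SC": "recreation "},
    {"DESCRIPTOR": "Road repair", "SC": "Public Works"},
    {"DESCRIPTOR": "Snow removal", "SC": "Works, Public"},
    {"DESCRIPTOR": "Libraries", "SC": "Culture"},
]


def facet_counts(rows, field):
    """Count how often each raw value occurs in a field (a simple profile)."""
    return Counter(row.get(field, "") for row in rows)


def fingerprint(value):
    """Normalize a value so that trivial variants share the same key."""
    # Fold accents to ASCII, trim surrounding whitespace, and lowercase.
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii").strip().lower()
    # Drop punctuation, then sort the unique tokens.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster_variants(rows, field):
    """Group distinct raw values whose fingerprints collide."""
    groups = defaultdict(set)
    for row in rows:
        raw = row.get(field, "")
        groups[fingerprint(raw)].add(raw)
    # Keep only keys with more than one spelling: these need review.
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


if __name__ == "__main__":
    print(facet_counts(records, "SC"))
    print(cluster_variants(records, "SC"))
```

The facet-style count makes inconsistencies visible, and the clusters suggest which values could be merged. Deciding which spelling to keep remains a manual step.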
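
The steps listed for File 2 (clean duplicates, create new data, add data from a different project) can be sketched in the same way. This is a rough illustration under stated assumptions, not the workflow shown in the demo. Every row, column name and number below is a placeholder rather than a value from the EIU or Hofstede datasets, and the helper functions are hypothetical. In OpenRefine itself, a lookup into another project is commonly written with the GREL `cross` function.

```python
# Hypothetical rows standing in for two projects. All names and numbers
# are placeholders, not values from the EIU or Hofstede datasets.
cities = [
    {"city": "City A", "country": "Country X", "population": 1000, "area_km2": 50},
    {"city": "city a ", "country": "Country X", "population": 1000, "area_km2": 50},
    {"city": "City B", "country": "Country Y", "population": 600, "area_km2": 20},
]
country_scores = [
    {"country": "Country X", "score": 10},
    {"country": "Country Y", "score": 20},
]


def drop_duplicates(rows, key_field):
    """Keep the first row for each key, ignoring case and outer whitespace."""
    seen = set()
    unique = []
    for row in rows:
        key = row[key_field].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def add_density(rows):
    """Create a new column derived from two existing ones."""
    return [{**row, "density": row["population"] / row["area_km2"]} for row in rows]


def add_from_other_project(rows, other_rows, key_field, value_field):
    """Look up a value in another table by a shared key (None if absent)."""
    lookup = {other[key_field]: other[value_field] for other in other_rows}
    return [{**row, value_field: lookup.get(row[key_field])} for row in rows]


def prepare(rows, other_rows):
    """Run the three steps in order: deduplicate, derive, enrich."""
    rows = drop_duplicates(rows, "city")
    rows = add_density(rows)
    return add_from_other_project(rows, other_rows, "country", "score")


if __name__ == "__main__":
    for row in prepare(cities, country_scores):
        print(row)
```

Each step returns new rows instead of modifying its input, which loosely mirrors how OpenRefine's project history records every operation as a separate, undoable step.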