20130626 OpenRefine Introduction

An introduction to data profiling and cleaning with OpenRefine, given on June 26, 2013, at the Toronto Knowledge Workers Methods Group. This presentation contains links to the datasets used during the demo.

Usage Rights: © All Rights Reserved

Presentation Transcript

  • 2013-06-26 Knowledge Workers Toronto Methods, slide 1: We are surrounded by data
  • Slide 2: ... by MESSY data
      - Multiple standards and formats: structured vs. unstructured; field naming and formats vary ...
      - Human error (misspellings, errors, etc.)
      - Non-normalized inputs (free-text entries)
      - Incomplete data (laziness)
      - ...
  • Slide 3: (no text in transcript)
  • Slide 4: OpenRefine
      - the Swiss Army knife for data manipulation!
      - the glue step between your IT systems
  • Slide 5: What is OpenRefine? (formerly Google Refine, formerly Gridworks)
      - A cross-platform web application that runs locally
      - A community-based project hosted on GitHub
      - With two distributions and multiple extensions
      - Something between a spreadsheet and SQL
  • Slide 6: Four use cases
      1. Data profiling *
      2. Data cleaning *
      3. Data extension *
      4. ETL (Extract, Transform, Load) prototyping
  • Slide 7: File 1: Data Profiling & Cleaning
      City Subject Thesaurus: an XML file with 5,431 concepts.
      What we will do:
      - Explore the file
      - Fix inconsistencies
      - Transpose / merge fields
  • Slide 8: File 2: The Economist Best City Contest 2012
      Prepare data for an application to the Economist Intelligence Unit's (EIU) Best City Contest 2012, using G. Hofstede's Values Survey Module 2008.
      What we will do:
      - Clean duplicates
      - Create new data
      - Add data from a different project
      - Use the project history
  • Slide 9: OpenRefine: http://openrefine.org, @OpenRefine
      Martin Magdinier: martin.magdinier@gmail.com, @magdmartin
      Thanks!
  • Slide 10: City Subject Thesaurus Legend
      DESCRIPTOR  The preferred term
      FAC         Facet
      SC          Subject category of the term
      SN          Scope note
      SRC         Source of term
      UF          Used for
      BT          Broader term
      NT          Narrower term
      RT          Related term
      STA         Term status
      INP         Input date
      APP         Approval date
      UPD         Modified date
      TNR         Term number
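Slide 2 lists misspellings and non-normalized free-text entries as typical sources of messy data. OpenRefine's clustering feature addresses this with "key collision" methods, the simplest being a "fingerprint" key. The Python sketch below only approximates that idea and is not OpenRefine's own code. It lowercases the value, strips accents and punctuation, then sorts and de-duplicates the tokens, so that variants such as "TORONTO" and "toronto " produce the same key. The function names and sample values are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key (illustrative, not OpenRefine's code)."""
    # Decompose accented characters (NFKD) and drop the non-ASCII marks.
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase, and remove punctuation.
    cleaned = re.sub(r"[^a-z0-9\s]", "", ascii_text.strip().lower())
    # Split into tokens, de-duplicate, sort, and re-join with single spaces.
    return " ".join(sorted(set(cleaned.split())))


def key_collision_clusters(values):
    """Group raw values whose fingerprints collide; keep only groups of 2+."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}
```

For example, `key_collision_clusters(["Toronto", "toronto ", "TORONTO", "Montréal", "Montreal", "Ottawa"])` groups the three Toronto spellings under the key `"toronto"` and both Montreal spellings under `"montreal"`, and leaves "Ottawa" out. Because token order is ignored, "Smith, John" and "john smith" also collide. In OpenRefine you would then choose one value per cluster and merge the rest into it.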
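Data profiling (slide 6) and "Explore the file" (slide 7) are usually done in OpenRefine with facets. A text facet lists every distinct value in a column together with its count, which makes inconsistent spellings and empty cells easy to spot. The following is a minimal Python sketch of that counting logic. It assumes rows are plain dictionaries, and the `STA` sample values are hypothetical.

```python
from collections import Counter


def text_facet(rows, column):
    """Count distinct values in one column, similar in spirit to a text facet."""
    counts = Counter()
    for row in rows:
        value = row.get(column)
        # Missing and whitespace-only cells are reported together as "(blank)".
        text = "" if value is None else str(value).strip()
        counts[text or "(blank)"] += 1
    # Most frequent values first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

With rows whose `STA` (term status) values are "Approved", "approved", "" and "Approved", the facet returns `[("Approved", 2), ("(blank)", 1), ("approved", 1)]`. This exposes both a casing inconsistency and a missing status, which are the kinds of problems the demo then fixes.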
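Slide 7's "Transpose / merge fields" step and the slide 10 legend fit together. A thesaurus concept can carry several values for the same field (several UF or NT terms, for instance). After import, these can appear as continuation rows whose DESCRIPTOR cell is blank, and OpenRefine's "join multi-valued cells" operation merges such rows back into a single cell. The Python sketch below imitates that merge and then renames the legend codes to readable headers. It assumes the imported columns are named with the codes shown on slide 10, which is a guess about the file's layout. The `|` separator and the sample rows are hypothetical.

```python
# Field codes from the City Subject Thesaurus legend (slide 10).
LEGEND = {
    "DESCRIPTOR": "Preferred term", "FAC": "Facet", "SC": "Subject category",
    "SN": "Scope note", "SRC": "Source of term", "UF": "Used for",
    "BT": "Broader term", "NT": "Narrower term", "RT": "Related term",
    "STA": "Term status", "INP": "Input date", "APP": "Approval date",
    "UPD": "Modified date", "TNR": "Term number",
}


def join_multivalued(rows, key="DESCRIPTOR", sep="|"):
    """Merge continuation rows (blank key) into the record above them."""
    records = []
    for row in rows:
        if row.get(key):
            # A non-blank key starts a new record; keep only non-empty cells.
            records.append({col: [val] for col, val in row.items() if val})
        elif records:
            # A blank key continues the previous record.
            for col, val in row.items():
                if val:
                    records[-1].setdefault(col, []).append(val)
        # Continuation rows that appear before any record are ignored.
    # Join each list into one cell and rename codes using the legend.
    return [
        {LEGEND.get(col, col): sep.join(vals) for col, vals in rec.items()}
        for rec in records
    ]
```

Given a "Parks" row with UF "Green spaces" followed by a blank-DESCRIPTOR row with UF "Public gardens", the result holds one "Parks" record whose "Used for" cell is `"Green spaces|Public gardens"`.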
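Slide 8 covers two steps: cleaning duplicates, and adding data from a different project. In OpenRefine, the second step is typically done with the GREL `cross()` function, which looks up rows in another project that share a key value. The Python sketch below shows the same two ideas outside OpenRefine. Rows are de-duplicated on a trimmed, lowercased key, and columns are then copied from a second table through an exact-key left join that takes the first match. The function names, column names and sample rows are hypothetical.

```python
def drop_duplicates(rows, key):
    """Keep the first row for each trimmed, lowercased key value."""
    seen = set()
    unique = []
    for row in rows:
        k = row[key].strip().lower()
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


def add_from_other_project(rows, other_rows, key, columns):
    """Left-join selected columns from another table on an exact key match."""
    index = {}
    for other in other_rows:
        index.setdefault(other[key], other)  # keep the first matching row
    merged = []
    for row in rows:
        match = index.get(row[key], {})
        # Columns with no match are filled with None rather than dropped.
        merged.append({**row, **{col: match.get(col) for col in columns}})
    return merged
```

For example, de-duplicating `[{"City": "Toronto"}, {"City": "toronto "}, {"City": "Vancouver"}]` on `"City"` keeps "Toronto" and "Vancouver". Joining the result with a table that maps each city to its province then adds a `"Province"` column. Because the join uses exact matching, de-duplicating (or clustering) first matters: a stray "toronto " would otherwise find no match.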