
Iterative data discovery and transformation with OpenRefine


Published in: Technology

  1. Iterative data discovery and transformation with OpenRefine. Martin Magdinier - @magdmartin. OpenRefine - @OpenRefine, http://openrefine.org
  2. 80% of data analysis time is spent on the process of cleaning, transformation and integration
  3. Data Quality & Integration Is Time Consuming
     • Duplicate values & typos
     • Multi-value cells
     • Data in the wrong field
     • Missing / partial values
     • Encoding errors
     • Change format (text, number, date)
     • Flat to relational data set
     • Schema alignment
     • Transpose rows and columns
     • Join data sets
     • Enrichment from other sources (MDM, API calls)
  4. OpenRefine Bridges The Skill Gap. Diagram labels: DBA; ETL; Data Science; Spreadsheet User; Data Visualization / Interpretation; Data Preparation; Understand The Data (Business Skills); Know How To Transform Data (Technical Skills); User Base
  5. • SaaS and on-premise solution for extra compute power, collaboration and lightweight ETL
     • On-demand training
     • Custom development
     • Free & open source
     • Community developed for 5 years
     • Available on local machine only
     • 5,000+ monthly downloads
     • Strong user base in Open Data, Library, Semantic Web and Bio Science
     Diagram labels: Semantic Web; Library; Bio Science; Open Data
  6. Agile Data Process. Diagram labels: Data Engineer; Scale & Automate Processes; Data Quality; Manage Master Data
  7. Agile Data Process. Diagram labels: Data Engineer; Scale & Automate Processes; Data Quality; Manage Master Data; Data Scientist; Develop Machine Learning & Data Analysis Model
  8. Agile Data Process. Diagram labels: Data Engineer; IT; Support; Governance; Access To Data; Scale & Automate Processes; Data Quality; Manage Master Data; Data Scientist; Discovery; Data Wrangling; Profiling; Preparation; Quality; Integration; Business Analyst; Develop Machine Learning & Data Analysis Model; Sense Making; Data Exploration; Reporting; Analysis; Scale; Real-Time; Lightweight ETL; Migration
  9. Agile Data Process. Diagram labels: Business Analyst; Data Engineer; IT; Support; Governance; Access To Data; Scale & Automate Processes; Data Quality; Manage Master Data; Data Scientist; Discovery; Data Wrangling; Profiling; Preparation; Quality; Integration; Develop Machine Learning & Data Analysis Model; ETL Tools
  10. Demo: 2014 Toronto Cleared Building Permits (http://ow.ly/Js8GD)
      Data Discovery
      1. What Permit Types are issued?
      2. Explore Previous Usage, Application Date & Dwelling Units Created
      Data Preparation
      1. Geocode with the Google Maps API
      2. Map construction with more than 10 new Dwelling Units Created
  11. Iterative data discovery and transformation with OpenRefine. Martin Magdinier - @magdmartin. OpenRefine - @OpenRefine, http://openrefine.org
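
The "Duplicate value & Typos" item on slide 3 and the "Permit Type" discovery question on slide 10 can be illustrated with a small Python sketch. It is loosely modelled on two OpenRefine features, the text facet and fingerprint key-collision clustering, but it is a simplified approximation and not OpenRefine's own code. The function names and the sample permit-type values are hypothetical and were invented for illustration; they are not taken from the Toronto data set.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Normalize a string so that near-duplicate spellings share one key."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation, then sort the unique whitespace-separated tokens.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group raw values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(vs) for key, vs in groups.items() if len(vs) > 1}


def text_facet(values):
    """Count occurrences of each distinct value, most common first."""
    return Counter(values).most_common()


# Hypothetical permit-type values, invented for illustration only.
permit_types = [
    "Building Additions", "building additions ", "Additions, Building",
    "New Houses", "New Houses", "Plumbing",
]

if __name__ == "__main__":
    print(text_facet(permit_types))  # value counts, as a text facet would show
    print(cluster(permit_types))     # spelling variants that likely mean the same thing
```

The facet answers "which values occur, and how often", while the clusters point at spellings that probably should be merged into one value before any counting is trusted.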
