Exploratory Data Analysis


Talk given by me at Gnunify 2014 on Exploratory Data Analysis

Published in: Technology

Exploratory Data Analysis

  1. Exploratory Data Analysis, by Aditya Laghate (Twitter: @thinrhino)
  2. Who am I?
       • A pseudo geek
       • Freelance software consultant
       • Wildlife photographer
  3. Agenda
       • Data gathering
       • Data cleaning
       • Usage of classic Unix tools
       • Data analysis
  4. Data Gathering
       • Public data websites
           o data.gov.in
           o databank.worldbank.org
       • Social websites
           o facebook.com
           o twitter.com
       • Blogs, websites, etc. via scraping
  5. Data cleaning
       • E.g. OpenRefine
           o OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase
           o openrefine.org
  6. Classic Unix Tools
       • sed / awk
       • Shell scripts
       • GNU parallel
           o Examples:
           o cat rands20M.txt | awk '{s+=$1} END {print s}'
           o cat rands20M.txt | parallel --pipe awk '{s+=$1} END {print s}' | awk '{s+=$1} END {print s}'
           o wc -l bigfile.txt
           o cat bigfile.txt | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
  7. Data Analysis
  8. Questions: @thinrhino, me@adityalaghate.in
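
The GNU parallel examples on the Classic Unix Tools slide rely on one idea: a sum can be computed per chunk and the partial sums added up afterwards. The sketch below shows that idea without GNU parallel, so it runs on a machine that only has the basic tools. It is an illustration, not the talk's own script: the input file is a small hypothetical stand-in for rands20M.txt, the chunk size of 250 lines is arbitrary, and `seq`, `split` and `mktemp -d` are assumed to be available.

```shell
#!/bin/sh
# Hypothetical stand-in for rands20M.txt: the integers 1..1000, one per line.
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/nums.txt"

# Direct sum in a single awk process (the first example on the slide).
direct=$(awk '{s+=$1} END {print s}' "$workdir/nums.txt")

# Split the file into chunks of 250 lines, sum each chunk on its own,
# then add up the partial sums with a final awk. The chunks are processed
# one after another here; `parallel --pipe` would run them concurrently.
split -l 250 "$workdir/nums.txt" "$workdir/chunk."
chunked=$(for f in "$workdir"/chunk.*; do
    awk '{s+=$1} END {print s}' "$f"
done | awk '{s+=$1} END {print s}')

echo "direct=$direct chunked=$chunked"
rm -rf "$workdir"
```

Both totals come out as 500500, the sum of 1 to 1000. The same two-stage shape applies to the `wc -l` example: each chunk reports its own line count and the final awk adds the counts together.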

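The Data cleaning slide names OpenRefine; for small jobs the sed and awk tools from the agenda can handle simple clean-up as well. The following is a minimal sketch of that alternative, not of anything OpenRefine does internally. The input values are made up for illustration, and the cleaning rules (trim whitespace, drop blank lines, lowercase, de-duplicate) are assumptions about what "messy" means here.

```shell
#!/bin/sh
# Hypothetical messy input: city names with stray whitespace,
# inconsistent case and a blank line.
messy=$(printf '  Pune \npune\n\nMUMBAI\n Mumbai  \n')

# Step 1 (sed): trim leading and trailing whitespace.
# Step 2 (awk): drop empty lines and lowercase what remains.
# Step 3 (sort -u): collapse duplicates.
cleaned=$(printf '%s\n' "$messy" |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    awk 'NF { print tolower($0) }' |
    sort -u)

printf '%s\n' "$cleaned"
```

Five input lines reduce to two distinct values, mumbai and pune. Each stage is a separate filter, so a rule can be added or removed without touching the others.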