Exploratory Data Analysis
Aditya Laghate

Twitter: @thinrhino

Who am I?
• A pseudo geek
• Freelance software consultant
• Wildlife photographer

Agenda
• Data gathering
• Data cleaning
• Usage of classic Unix tools
• Data analysis

Data Gathering
• Public data websites
o data.gov.in
o databank.worldbank.org

• Social websites
o facebook.com
o twitter.c...
Data cleaning
• Eg: openrefine
o OpenRefine (ex-Google Refine) is a powerful tool for working with messy
data, cleaning it...
Classic Unix Tools
• sed / awk
• Shell scripts
• GNU parallel
o Examples:
o cat rands20M.txt | awk '{s+=$1} END {print s}'
...
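
The idea behind pairing awk with GNU parallel is that a sum can be split up: add each chunk of the input separately, then add the partial sums. Below is a minimal sketch of that idea, not the exact commands from the talk. The function names `sum_serial` and `sum_chunked` are hypothetical, and `split` plus a plain loop stands in for `parallel --pipe` so the sketch runs without GNU parallel installed (the chunks are processed one after another here, so it shows the arithmetic, not the speed-up). It assumes a non-empty input file with one number per line.

```shell
# Hypothetical helper: sum the first column of file $1 in a single pass.
sum_serial() {
    awk '{s+=$1} END {print s+0}' "$1"
}

# Hypothetical helper: sum file $1 in chunks of $2 lines, then add up
# the partial sums. split(1) and a loop stand in for `parallel --pipe`.
sum_chunked() {
    tmpdir=$(mktemp -d)
    # Cut the input into pieces of at most $2 lines each.
    split -l "$2" "$1" "$tmpdir/chunk."
    # One partial sum per chunk, then a final awk adds the partials.
    for f in "$tmpdir"/chunk.*; do
        awk '{s+=$1} END {print s+0}' "$f"
    done | awk '{s+=$1} END {print s+0}'
    rm -rf "$tmpdir"
}
```

For example, `sum_chunked rands20M.txt 1000000` should print the same total as `sum_serial rands20M.txt`, since addition does not depend on how the lines are grouped.
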
Data Analysis

Questions
@thinrhino
me@adityalaghate.in

Talk given by me at Gnunify 2014 on Exploratory Data Analysis

Transcript of "Exploratory Data Analysis"

  1. Exploratory Data Analysis
     Aditya Laghate
     Twitter: @thinrhino
  2. Who am I?
     • A pseudo geek
     • Freelance software consultant
     • Wildlife photographer
  3. Agenda
     • Data gathering
     • Data cleaning
     • Usage of classic Unix tools
     • Data analysis
  4. Data Gathering
     • Public data websites
       o data.gov.in
       o databank.worldbank.org
     • Social websites
       o facebook.com
       o twitter.com
     • Blogs / websites / etc. via scraping
  5. Data cleaning
     • Eg: openrefine
       o OpenRefine (ex-Google Refine) is a powerful tool for working with messy data, cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase
       o openrefine.org
  6. Classic Unix Tools
     • sed / awk
     • Shell scripts
     • GNU parallel
       o Examples:
       o cat rands20M.txt | awk '{s+=$1} END {print s}'
       o cat rands20M.txt | parallel --pipe awk '{s+=$1}END{print s}' | awk '{s+=$1} END {print s}'
       o wc -l bigfile.txt
       o cat bigfile.txt | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
  7. Data Analysis
  8. Questions
     @thinrhino
     me@adityalaghate.in
