Exploratory Data Analysis


Talk given by me at Gnunify 2014 on Exploratory Data Analysis

Uploaded as a Microsoft PowerPoint presentation.

Usage rights: CC Attribution License

Exploratory Data Analysis Presentation Transcript

  • 1. Exploratory Data Analysis, Aditya Laghate (Twitter: @thinrhino)
  • 2. Who am I?
      o A pseudo geek
      o Freelance software consultant
      o Wildlife photographer
  • 3. Agenda
      o Data gathering
      o Data cleaning
      o Usage of classic Unix tools
      o Data analysis
  • 4. Data Gathering
      o Public data websites: data.gov.in, databank.worldbank.org
      o Social websites: facebook.com, twitter.com
      o Blogs / websites / etc., via scraping
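A minimal sketch of the scraping idea from the slide above, using only classic Unix tools. The file `page.html` and its contents are illustrative stand-ins for any page you might have fetched with `curl` or `wget`; the `grep`/`sed` pipeline pulls out the link targets.

```shell
# Illustrative saved page (in practice: curl -o page.html <url>)
cat > page.html <<'EOF'
<html><body>
<a href="http://data.gov.in">India Open Data</a>
<a href="http://databank.worldbank.org">World Bank DataBank</a>
</body></html>
EOF

# Extract the href targets, one per line
grep -o 'href="[^"]*"' page.html | sed 's/href="//; s/"$//'
```

For anything beyond quick one-offs, a proper HTML parser is more robust than regex extraction, but this pattern works well for ad-hoc data gathering.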
  • 5. Data Cleaning
      o Example: OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.
      o openrefine.org
  • 6. Classic Unix Tools
      o sed / awk
      o Shell scripts
      o GNU parallel
      o Examples:
        cat rands20M.txt | awk '{s+=$1} END {print s}'
        cat rands20M.txt | parallel --pipe awk '{s+=$1} END {print s}' | awk '{s+=$1} END {print s}'
        wc -l bigfile.txt
        cat bigfile.txt | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
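The `parallel --pipe` examples above are a two-stage reduction: sum each chunk of the input independently, then sum the partial sums. As a runnable sketch (using a small generated file in place of `rands20M.txt`, and plain `split` plus a loop to make the chunking explicit; GNU parallel does the chunking and fan-out for you):

```shell
# Small stand-in for rands20M.txt
seq 1 1000 > rands.txt

# Serial sum of the whole file
awk '{s+=$1} END {print s}' rands.txt          # prints 500500

# Two-stage (map-reduce) sum: split into 250-line chunks,
# sum each chunk, then sum the partial sums.
split -l 250 rands.txt chunk_
for f in chunk_*; do awk '{s+=$1} END {print s}' "$f"; done |
  awk '{s+=$1} END {print s}'                  # prints 500500 again
```

The chunked version gives the same answer because addition is associative; `parallel --pipe` exploits exactly this to spread the chunk sums across CPU cores.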
  • 7. Data Analysis
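The slide itself carries no detail, but in the spirit of the talk's Unix-tools theme, here is a hypothetical one-pass summary (count, min, max, mean) of a numeric column with awk; `values.txt` is an illustrative input file:

```shell
# Illustrative data: the integers 1..100, one per line
seq 1 100 > values.txt

# One pass over the data: track min, max, and running sum,
# then report count and mean at the end.
awk 'NR==1 {min=$1; max=$1}
     {s+=$1; if ($1<min) min=$1; if ($1>max) max=$1}
     END {printf "n=%d min=%d max=%d mean=%.2f\n", NR, min, max, s/NR}' values.txt
# → n=100 min=1 max=100 mean=50.50
```

Because awk streams line by line, this works unchanged on files far too large to load into memory, which is often the first step of exploratory analysis before reaching for heavier tooling.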
  • 8. Questions? @thinrhino, me@adityalaghate.in