Data Stories. Jeremy Hinegardner, Scottish Ruby Conference 2011
The slides that accompany my 2011 Scottish Ruby Conference talk 'Data Stories'.
  • Data Stories

    1. Data Stories. Jeremy Hinegardner, Scottish Ruby Conference 2011
    2. Data Analysis Anyone?
    3. Other People’s Data?
    4. Your Own Data?
    5. Weblog data, database transactions, Postgres performance metrics
    6. Public Data?
    7. UN Statistics website, data.gov.uk, data.gov, Scottish Home Survey, The Guardian, IMDB
    8. All 3?
    9. Collective Intellect. Private customer data: internal chat, email lists. Our internal data: processing metrics, queue sizes, database queries, performance metrics. Public data: blogs, boards, tweets.
    10. I have the __DATA__! What now?
    11. Scrub it. Get out your Cleaning Supplies... Ruby
    12. Majority of Your Time Will be Spent Cleaning The Data.
    13. Interesting Data Cleaning Problems?
    14. I want to hear about them.
    15. Cleaning IMDB
    16. A whole bunch of .gz files.
    17. Each with its own slightly different format.
    18. Extra junk around the data.
    19. ISO-8859 -> UTF-8
    20. Dates...
    21. Black and White, Black and white, Black & White
    22. Country Inconsistencies
    23. Why are we doing this?
    24. To Learn Something New!
    25. Supercrunchers
    26. Outliers
    27. Freakonomics
    28. Superfreakonomics
    29. Science of Fear
    30. OK Cupid
    31. The Guardian
    32. Investigation Time.
    33. Internet Movie Database: Title, Running Times by Country, Year made, Country of Origin, Actresses/Actors, Language, Release Dates by Country, Production Companies, Colour, Genre
    34. Ruby + SQLite + R + iTerm
    35. Thanks! Jeremy Hinegardner, jeremy@hinegardner.org, @copiousfreetime
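
The cleaning problems named on slides 16 to 21 (gzipped files, ISO-8859 to UTF-8, and the "Black and White" spelling variants) could be approached along the following lines in Ruby. This is only a minimal sketch, not the code used in the talk: the helper names `read_latin1_gz` and `normalize_colour` are hypothetical, the sample rows are made up, and the tab-separated layout is an assumption rather than the real IMDB file format.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: decompress a gzip stream and transcode its
# ISO-8859-1 bytes to UTF-8.
def read_latin1_gz(io)
  raw = Zlib::GzipReader.new(io).read
  raw.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Hypothetical helper: collapse spelling variants such as
# "Black and White", "Black and white" and "Black & White" into one value.
def normalize_colour(value)
  value.strip.gsub(/\s*&\s*/, " and ").squeeze(" ").downcase
end

# Made-up sample rows (title, tab, colour), stored as ISO-8859-1 and
# gzipped in memory so the sketch runs without any files on disk.
sample = "Caf\u00E9 Noir\tBlack & White\nSe\u00F1or Example\tBlack and white\n"
gz_io = StringIO.new(Zlib.gzip(sample.encode(Encoding::ISO_8859_1)))

rows = read_latin1_gz(gz_io).each_line.map do |line|
  title, colour = line.chomp.split("\t")
  [title, normalize_colour(colour)]
end

rows.each { |title, colour| puts "#{title}: #{colour}" }
```

For a file on disk, `Zlib::GzipReader.open(path)` could stand in for the in-memory `StringIO`. Real list files would also need the "extra junk" around the data stripped before splitting, which this sketch does not attempt.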