SlideShare a Scribd company logo
1 of 35
Data Stories
Jeremy Hinegardner
Scottish Ruby Conference 2011
Data Analysis Anyone?
Other People’s Data?
Your Own Data?
Weblog data

Database transactions

postgres performance metrics
Public Data?
UN Statistics website

data.gov.uk

data.gov

Scottish Home Survey

The Guardian

IMDB
All 3?
Collective Intellect
private customer data

  internal chat, email lists

our internal data

  processing metrics, queue sizes, database
  queries, performance metrics.

public data

  blogs, boards, tweets
I have the __DATA__!

What now?
Scrub it.
Get out your Cleaning
Supplies... Ruby
Majority of Your Time
Will be Spent Cleaning
The Data.
Interesting Data
Cleaning Problems?
I want to hear about
them.
Cleaning IMDB
A whole bunch of .gz
files.
Each with its own
slightly different format.
Extra junk around the
data.
ISO-8859 -> UTF8
Dates...
Black and White
Black and white
Black & White
Country Inconsistencies
Why are we doing this?
To Learn Something
New!
Supercrunchers
Outliers
Freakonomics
Superfreakonomics
Science of Fear
OK Cupid
The Guardian
Investigation Time.
Internet Movie Database
Title              Running Times by
                   Country
Year made
                   Country of Origin
Actresses/Actors
                   Language
Release Dates by
Country            Production
                   Companies
Colour
                   Genre
Ruby + SQLite + R + iTerm
Thanks!



Jeremy Hinegardner
jeremy@hinegardner.org
@copiousfreetime

More Related Content

Similar to Data Stories

Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)SERVICE DESIGN DAYS
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Fredrik Olsson
 
Getting comfortable with Data
Getting comfortable with DataGetting comfortable with Data
Getting comfortable with DataRitvvij Parrikh
 
Digital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collideDigital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collideISSA LA
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data lokku
 
(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of Identity(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of IdentityBayCHI
 
Knowledge Integration in Practice
Knowledge Integration in PracticeKnowledge Integration in Practice
Knowledge Integration in PracticePeter Mika
 
Our Adventure with MongoDB
Our Adventure with MongoDBOur Adventure with MongoDB
Our Adventure with MongoDBEthan Gunderson
 
What's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfWhat's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfSteven Jong
 
Big Data to Analytics
Big Data to AnalyticsBig Data to Analytics
Big Data to AnalyticsMilind Zodge
 
A fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainA fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainChristian Martorella
 
Web Evolution Nova Spivack Twine
Web Evolution   Nova Spivack   TwineWeb Evolution   Nova Spivack   Twine
Web Evolution Nova Spivack TwineNova Spivack
 
Semantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsSemantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsPeter Mika
 
From Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The UglyFrom Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The UglyAngela Hey
 
movie_notebook.pdf
movie_notebook.pdfmovie_notebook.pdf
movie_notebook.pdfpinstechwork
 
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Linuxmalaysia Malaysia
 

Similar to Data Stories (20)

Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
 
Getting comfortable with Data
Getting comfortable with DataGetting comfortable with Data
Getting comfortable with Data
 
Digital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collideDigital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collide
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data
 
Big data
Big dataBig data
Big data
 
(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of Identity(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of Identity
 
Knowledge Integration in Practice
Knowledge Integration in PracticeKnowledge Integration in Practice
Knowledge Integration in Practice
 
Our Adventure with MongoDB
Our Adventure with MongoDBOur Adventure with MongoDB
Our Adventure with MongoDB
 
What's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfWhat's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdf
 
Big Data to Analytics
Big Data to AnalyticsBig Data to Analytics
Big Data to Analytics
 
Mongo chicago
Mongo chicagoMongo chicago
Mongo chicago
 
A fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainA fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP Spain
 
Web Evolution Nova Spivack Twine
Web Evolution   Nova Spivack   TwineWeb Evolution   Nova Spivack   Twine
Web Evolution Nova Spivack Twine
 
Semantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsSemantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistants
 
From Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The UglyFrom Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The Ugly
 
movie_notebook.pdf
movie_notebook.pdfmovie_notebook.pdf
movie_notebook.pdf
 
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
 
Big data tutorial_part4
Big data tutorial_part4Big data tutorial_part4
Big data tutorial_part4
 

More from Jeremy Hinegardner

More from Jeremy Hinegardner (7)

Creative Photography for Geeks
Creative Photography for GeeksCreative Photography for Geeks
Creative Photography for Geeks
 
Gemology
GemologyGemology
Gemology
 
Extending JRuby
Extending JRubyExtending JRuby
Extending JRuby
 
FFI -- creating cross engine rubygems
FFI -- creating cross engine rubygemsFFI -- creating cross engine rubygems
FFI -- creating cross engine rubygems
 
Playing Nice with Others
Playing Nice with OthersPlaying Nice with Others
Playing Nice with Others
 
Crate - ruby based standalone executables
Crate - ruby based standalone executablesCrate - ruby based standalone executables
Crate - ruby based standalone executables
 
FFI - building cross engine ruby extensions
FFI - building cross engine ruby extensionsFFI - building cross engine ruby extensions
FFI - building cross engine ruby extensions
 

Recently uploaded

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 

Recently uploaded (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 

Data Stories

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n
  35. \n