SlideShare a Scribd company logo
Data Stories
Jeremy Hinegardner
Scottish Ruby Conference 2011
Data Analysis Anyone?
Other People’s Data?
Your Own Data?
Weblog data

Database transactions

postgres performance metrics
Public Data?
UN Statistics website

data.gov.uk

data.gov

Scottish Home Survey

The Guardian

IMDB
All 3?
Collective Intellect
private customer data

  internal chat, email lists

our internal data

  processing metrics, queue sizes, database
  queries, performance metrics.

public data

  blogs, boards, tweets
I have the __DATA__!

What now?
Scrub it.
Get out your Cleaning
Supplies... Ruby
Majority of Your Time
Will be Spent Cleaning
The Data.
Interesting Data
Cleaning Problems?
I want to hear about
them.
Cleaning IMDB
A whole bunch of .gz
files.
Each with its own
slightly different format.
Extra junk around the
data.
ISO-8859 -> UTF8
Dates...
Black and White
Black and white
Black & White
Country Inconsistencies
Why are we doing this?
To Learn Something
New!
Supercrunchers
Outliers
Freakonomics
Superfreakonomics
Science of Fear
OK Cupid
The Guardian
Investigation Time.
Internet Movie Database
Title              Running Times by
                   Country
Year made
                   Country of Origin
Actresses/Actors
                   Language
Release Dates by
Country            Production
                   Companies
Colour
                   Genre
Ruby + SQLite + R + iTerm
Thanks!



Jeremy Hinegardner
jeremy@hinegardner.org
@copiousfreetime

More Related Content

Similar to Data Stories

Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
SERVICE DESIGN DAYS
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...
Fredrik Olsson
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
Christian Martorella
 
Getting comfortable with Data
Getting comfortable with DataGetting comfortable with Data
Getting comfortable with Data
Ritvvij Parrikh
 
Digital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collideDigital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collide
ISSA LA
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data
lokku
 
Big data
Big dataBig data
Big data
raghav125
 
(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of Identity(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of Identity
BayCHI
 
Knowledge Integration in Practice
Knowledge Integration in PracticeKnowledge Integration in Practice
Knowledge Integration in Practice
Peter Mika
 
Our Adventure with MongoDB
Our Adventure with MongoDBOur Adventure with MongoDB
Our Adventure with MongoDB
Ethan Gunderson
 
What's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfWhat's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdf
Steven Jong
 
Big Data to Analytics
Big Data to AnalyticsBig Data to Analytics
Big Data to Analytics
Milind Zodge
 
A fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainA fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP Spain
Christian Martorella
 
Web Evolution Nova Spivack Twine
Web Evolution   Nova Spivack   TwineWeb Evolution   Nova Spivack   Twine
Web Evolution Nova Spivack Twine
Nova Spivack
 
Semantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsSemantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistants
Peter Mika
 
From Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The UglyFrom Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The Ugly
Angela Hey
 
movie_notebook.pdf
movie_notebook.pdfmovie_notebook.pdf
movie_notebook.pdf
pinstechwork
 
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Linuxmalaysia Malaysia
 
Big data tutorial_part4
Big data tutorial_part4Big data tutorial_part4
Big data tutorial_part4
Aravindharamanan S
 

Similar to Data Stories (20)

Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
Service Design Days 2017 - Keynote Jon Rogers (University of Dundee)
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
 
Getting comfortable with Data
Getting comfortable with DataGetting comfortable with Data
Getting comfortable with Data
 
Digital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collideDigital forensics track schroader-rob when forensics collide
Digital forensics track schroader-rob when forensics collide
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data
 
Big data
Big dataBig data
Big data
 
(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of Identity(Ab)using Identifiers: Indiscernibility of Identity
(Ab)using Identifiers: Indiscernibility of Identity
 
Knowledge Integration in Practice
Knowledge Integration in PracticeKnowledge Integration in Practice
Knowledge Integration in Practice
 
Our Adventure with MongoDB
Our Adventure with MongoDBOur Adventure with MongoDB
Our Adventure with MongoDB
 
What's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdfWhat's the Big Deal About Big Data?.pdf
What's the Big Deal About Big Data?.pdf
 
Big Data to Analytics
Big Data to AnalyticsBig Data to Analytics
Big Data to Analytics
 
Mongo chicago
Mongo chicagoMongo chicago
Mongo chicago
 
A fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainA fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP Spain
 
Web Evolution Nova Spivack Twine
Web Evolution   Nova Spivack   TwineWeb Evolution   Nova Spivack   Twine
Web Evolution Nova Spivack Twine
 
Semantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsSemantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistants
 
From Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The UglyFrom Telephones to Tablets: The Good, The Bad and The Ugly
From Telephones to Tablets: The Good, The Bad and The Ugly
 
movie_notebook.pdf
movie_notebook.pdfmovie_notebook.pdf
movie_notebook.pdf
 
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
 
Big data tutorial_part4
Big data tutorial_part4Big data tutorial_part4
Big data tutorial_part4
 

More from Jeremy Hinegardner

Creative Photography for Geeks
Creative Photography for GeeksCreative Photography for Geeks
Creative Photography for Geeks
Jeremy Hinegardner
 
Gemology
GemologyGemology
Extending JRuby
Extending JRubyExtending JRuby
Extending JRuby
Jeremy Hinegardner
 
FFI -- creating cross engine rubygems
FFI -- creating cross engine rubygemsFFI -- creating cross engine rubygems
FFI -- creating cross engine rubygems
Jeremy Hinegardner
 
Playing Nice with Others
Playing Nice with OthersPlaying Nice with Others
Playing Nice with Others
Jeremy Hinegardner
 
Crate - ruby based standalone executables
Crate - ruby based standalone executablesCrate - ruby based standalone executables
Crate - ruby based standalone executables
Jeremy Hinegardner
 
FFI - building cross engine ruby extensions
FFI - building cross engine ruby extensionsFFI - building cross engine ruby extensions
FFI - building cross engine ruby extensions
Jeremy Hinegardner
 

More from Jeremy Hinegardner (7)

Creative Photography for Geeks
Creative Photography for GeeksCreative Photography for Geeks
Creative Photography for Geeks
 
Gemology
GemologyGemology
Gemology
 
Extending JRuby
Extending JRubyExtending JRuby
Extending JRuby
 
FFI -- creating cross engine rubygems
FFI -- creating cross engine rubygemsFFI -- creating cross engine rubygems
FFI -- creating cross engine rubygems
 
Playing Nice with Others
Playing Nice with OthersPlaying Nice with Others
Playing Nice with Others
 
Crate - ruby based standalone executables
Crate - ruby based standalone executablesCrate - ruby based standalone executables
Crate - ruby based standalone executables
 
FFI - building cross engine ruby extensions
FFI - building cross engine ruby extensionsFFI - building cross engine ruby extensions
FFI - building cross engine ruby extensions
 

Recently uploaded

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 

Recently uploaded (20)

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 

Data Stories

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n
  35. \n