Fairy tale from the land of data

A fairy tale about falling into a trap of wrong interpretation of results. Shows the importance of building models and understanding them.

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Fairy tales in the land of data, or: do I know what I’m doing? By @przemur, http://about.me/przemek.maciolek
    • A story
    • http://yamao.deviantart.com/art/Cleric-comm-343786321 https://www.flickr.com/photos/jsjgeology/8359854092/
    • Suspense
    • “The hammers from the new provider are no good, sayr.”
    • What would you do?
    • New hammers since this month
    • install.packages('ggplot2')
      library(ggplot2)
      setwd("/Users/pmm/Desktop/hammer")
      all <- read.csv(file="all.csv")
      # Workforce size and total production over time
      qplot(all$month_sequence, all$dwarfs) + geom_smooth()
      qplot(all$month_sequence, all$production) + geom_smooth()
      # Average production per dwarf over time
      all$prod_per_dwarf <- all$production / all$dwarfs
      qplot(all$month_sequence, all$prod_per_dwarf) + geom_smooth()
    • Number of dwarfs working in the mine. The hammers from the new provider started being distributed to the new miners.
    • Total production of gold
    • Per-dwarf average production
    • Who sees any problem?
    • Let’s look at the production of each dwarf, relative to the time each one applied… Two groups: dwarfs using the OLD hammer design and dwarfs using the NEW hammer design
    • new <- read.csv(file="new_relative.csv")
      old <- read.csv(file="old_relative.csv")
      qplot(new$relative_month, new$production)
      ggplot(new, aes(x=relative_month, y=production)) +
        geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.2)
      # This will look much better!
      old$type <- 'old'
      new$type <- 'new'
      old_and_new <- rbind(old, new)
      ggplot(old_and_new, aes(x=relative_month, y=production, color=type)) +
        geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.2)
    • Scatterplot showing relative production done using old and new hammers
    • What now?
    • ggplot(old_and_new, aes(x=relative_month, y=production, color=type)) +
        geom_point(shape=19, position=position_jitter(width=.5, height=0), alpha=.1) +
        geom_smooth(method=lm)
      The new hammers wear out much faster!
    • How much did the dwarfs lose?
    • old_m <- lm(production ~ relative_month, old)
      new$possible_production <- predict(old_m, new)
      sum(new$possible_production) - sum(new$production)
      (sum(new$possible_production) - sum(new$production)) / sum(new$production)
      0.5%
      Now, taking into account the price of a hammer, one can select the optimal strategy… but that’s another story…
    • Lessons learned…?
      • Don’t trust the data blindly; ask questions
      • Try to understand the underlying rules of the system
      • Don’t be shy about trying various models
      • If using R, go for ggplot2
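
The loss estimate from the "How much did the dwarfs lose?" slide can be tried without the original CSV files. The following is only a minimal sketch in base R on made-up numbers: the two data frames, their sizes, and the decline rates (1 unit per month for old hammers, 3 for new ones) are assumptions for illustration, not the data behind the slides. It follows the same idea as the slide: fit a linear model on old-hammer production, predict what the new-hammer dwarfs could have produced, and compare.

```r
# Made-up data (assumption): production per dwarf by months since the
# hammer was issued, for two dwarfs in each group.
old <- data.frame(relative_month = rep(0:5, times = 2))
old$production <- 100 - 1 * old$relative_month  # old hammers: slow decline

new <- data.frame(relative_month = rep(0:5, times = 2))
new$production <- 100 - 3 * new$relative_month  # new hammers: faster decline

# Baseline model fitted on old-hammer data only.
old_m <- lm(production ~ relative_month, data = old)

# Counterfactual: what the new-hammer dwarfs would have produced
# if their hammers had aged like the old design.
new$possible_production <- predict(old_m, newdata = new)

# Absolute loss, and loss relative to actual production
# (the same ratio as on the slide).
lost <- sum(new$possible_production) - sum(new$production)
lost_share <- lost / sum(new$production)

print(lost)
print(lost_share)
```

With these invented numbers the gap is 60 units, roughly 5.4% of actual production. The figure says nothing about the real mine; it only shows the mechanics of the calculation. Note that the ratio divides by actual production, as on the slide; dividing by the predicted production instead would answer a slightly different question (the share of potential output that was lost).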