Microsoft NERD Talk - R and Tableau - 2-4-2013

This presentation is from a talk I gave at Microsoft NERD for the Boston Predictive Analytics Meetup group.

Published in: Sports

  1. TABLEAU AND R: Beauty and the Beast (Tanya Cashorali, @tanyacash21)
  2. R – THE WORKHORSE
  3. TABLEAU – MAKES BEAUTIFUL THINGS HAPPEN
  4. BUT SO CAN R
  5. TOGETHER THEY ARE UNSTOPPABLE
  6. SERIOUSLY THOUGH, WHAT IS R?
      - Open source statistical programming environment
      - 4,211 community-contributed packages on CRAN as of 1/31/2013 (http://cran.r-project.org/)
      - Interpreted; runs in a terminal or a GUI (RStudio)
  7. WHAT IS TABLEAU?
      - Data visualization software for interactive business intelligence
      - Spun out of Stanford University in 2003; co-founder Pat Hanrahan was a founding employee of Pixar Animation Studios
      - Drag-and-drop interface
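  8. R AND TABLEAU
      [Diagram: R (data munge, model) feeds Tableau (dashboards) by two routes. Route 1: insert into a database using the RODBC package; Tableau reads the database over a live connection through various drivers. Route 2: write to .csv.]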
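The database route in the diagram could look roughly like the sketch below. It assumes the RODBC package is installed and that an ODBC data source is already configured; the helper name `push_to_database`, the DSN `nfl_dw`, and the table name are hypothetical placeholders, not details from the talk.

```r
# Hypothetical helper: push a data frame into a database table over ODBC.
# Tableau Desktop could then connect to that table with a live connection.
push_to_database <- function(df, dsn, table_name) {
  channel <- RODBC::odbcConnect(dsn)          # open the ODBC connection
  on.exit(RODBC::odbcClose(channel))          # always close it on exit
  RODBC::sqlSave(channel, df, tablename = table_name, rownames = FALSE)
}

# Example call (not run here; it needs a configured data source):
# push_to_database(pbp2012, dsn = "nfl_dw", table_name = "pbp2012")
```

Wrapping the calls in a function keeps the connection handling in one place; the CSV route needs none of this setup.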
  9. START WITH THE R WORKHORSE
      Read data into R:
          pbp2012 <- read.csv(file="2012_nfl_pbp_data_reg_season.csv", header=TRUE)
      View the data:
          str(pbp2012)
  10. START WITH THE R WORKHORSE (CONT’D)
      Conduct pre-processing or "data munging":
          is.na(pbp2012$down); as.numeric(pbp2012$ydline)
      Slice and dice:
          subset(pbp2012, qtr == 1)
      Write to CSV for consumption by Tableau Public:
          write.csv(pbp2012, file="pbp2012.csv", row.names=FALSE)
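To see the munging steps end to end without the NFL files, here is a minimal sketch on a made-up data frame. The column names follow the slides, but the values and the file name `toy_pbp.csv` are invented for illustration.

```r
# Toy stand-in for the play-by-play data; values are invented.
toy <- data.frame(
  qtr    = c(1, 1, 2, 3),
  down   = c(1, NA, 3, 2),
  ydline = c("20", "35", "50", "8"),
  stringsAsFactors = FALSE
)

# Pre-process: count missing downs and convert the yard line to numeric.
n_missing_down <- sum(is.na(toy$down))
toy$ydline <- as.numeric(toy$ydline)

# Slice and dice: keep only first-quarter plays.
q1 <- subset(toy, qtr == 1)

# Write a CSV that Tableau Public can open, then read it back as a check.
out_file <- file.path(tempdir(), "toy_pbp.csv")
write.csv(toy, file = out_file, row.names = FALSE)
round_trip <- read.csv(out_file, header = TRUE)
```

Note that `is.na()` and `as.numeric()` return new values rather than modifying the data frame, so the results have to be assigned somewhere to have any effect.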
  11. R NO HUDDLE EXAMPLE
          ## read in the data
          seasons <- c(2002:2011)
          pbp <- read.csv("2012_nfl_pbp_data_reg_season.csv", header=TRUE, stringsAsFactors=FALSE)
          n1 <- read.csv("2002_nfl_pbp_data.csv", header=TRUE, stringsAsFactors=FALSE)
          ## keep only the 2012 columns that also exist in the earlier seasons
          pbp <- pbp[, colnames(pbp) %in% colnames(n1)]
          for (season in seasons) {
              n1 <- read.csv(paste(season, "_nfl_pbp_data.csv", sep=""), header=TRUE, stringsAsFactors=FALSE)
              pbp <- rbind(pbp, n1)
          }
          ## grab the no huddle plays
          nh <- pbp[grep("Huddle", pbp$description),]
          ## count the no-huddle plays each team ran
          nh.by.team <- table(nh$off)
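The column-matching and stacking step can be tried on small invented data frames. The sketch below is one possible generalization, not the talk's code: it intersects column names across any number of seasons before binding, and uses a case-insensitive match for the no-huddle flag. All data values are made up.

```r
# Invented stand-ins for two season files with slightly different columns.
s2011 <- data.frame(off = c("NE", "BAL"),
                    description = c("(No Huddle) pass short", "run left"),
                    stringsAsFactors = FALSE)
s2012 <- data.frame(off = c("NE", "DEN"),
                    description = c("run right", "(No Huddle, Shotgun) pass"),
                    season = c(2012, 2012),
                    stringsAsFactors = FALSE)

# Keep only the columns every season shares, then stack the seasons.
seasons_list <- list(s2011, s2012)
shared <- Reduce(intersect, lapply(seasons_list, colnames))
pbp_toy <- do.call(rbind, lapply(seasons_list, function(d) d[, shared, drop = FALSE]))

# Flag no-huddle plays; ignore.case guards against "No huddle" vs "No Huddle".
nh_toy <- pbp_toy[grepl("huddle", pbp_toy$description, ignore.case = TRUE), ]
nh_counts <- table(nh_toy$off)
```

Selecting columns by name also avoids a pitfall of negative indexing with `which()`: when no columns need dropping, `-which(...)` is an empty index and selects nothing.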
  12. R NO HUDDLE EXAMPLE (CONT’D)
          library(ggplot2)
          ## ggplot needs a data frame; converting the table yields columns Var1 and Freq
          ggplot(as.data.frame(nh.by.team), aes(x=reorder(Var1, -Freq), y=Freq)) +
              geom_bar(stat="identity") +
              labs(x="Team", y="Number of Plays", title="Number of No Huddle Plays Run by Team 2002-2012") +
              theme(axis.text.x = element_text(angle = 50, hjust = 1))
  13. R NO HUDDLE EXAMPLE (CONT’D)
          ## table by offensive team and quarter
          nh.df <- data.frame(table(nh$off, nh$qtr))[-1,]
          colnames(nh.df) <- c("Team", "Quarter", "Number")
          ## plot number of no huddle plays by team by quarter
          ## (stat="identity" because Number is already a count)
          ggplot(nh.df, aes(x=reorder(Team, Number), y=Number, fill=Quarter)) +
              geom_bar(stat="identity") +
              labs(x="Team", y="Number", title="Number of No Huddle Plays in the NFL by Team by Quarter") +
              theme(axis.text.x = element_text(angle = 50, hjust = 1))
  14. TABLEAU-IFIED
          ## write tab-delimited file for Tableau
          write.table(nh.by.team, file="noHuddles.txt", sep="\t", row.names=FALSE)
      http://sportsdataviz.com/percentage-no-huddle-plays-by-nfl-team-by-season-2002-2012/
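The linked dashboard reports percentages, while the slide code exports raw counts. One way to compute a per-team percentage before exporting is sketched below; the `no_huddle` flag column, the output file name, and all values are assumptions made for illustration.

```r
# Invented play-level data: one row per play, with a no-huddle flag.
plays_toy <- data.frame(
  off       = c("NE", "NE", "NE", "NE", "BAL", "BAL"),
  no_huddle = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Percentage of each team's plays that were no-huddle.
pct <- aggregate(no_huddle ~ off, data = plays_toy,
                 FUN = function(x) 100 * mean(x))
colnames(pct) <- c("Team", "PctNoHuddle")

# Tab-delimited output: the separator is the escape "\t", not the letter "t".
out_file <- file.path(tempdir(), "noHuddlePct.txt")
write.table(pct, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)
first_line <- readLines(out_file, n = 1)
```

Tableau can also compute the percentage itself from raw counts; doing it in R simply keeps the exported file smaller.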
  15. IS THE RAVENS OFFENSE PREDICTABLE?
          ## Read in the data generated by play_parser.py
          plays <- read.csv("plays.csv", header=TRUE, stringsAsFactors=FALSE)
          ## extract Baltimore offensive plays
          plays <- plays[grep("BAL", plays$gameid),]
          plays <- subset(plays, def != "BAL")
          ## 1,625 offensive BAL plays in the 2012 regular season
          nrow(plays)
          ## classify the other play types that are not passes or runs
          plays$type <- as.character(plays$type)
          plays[grep("PENALTY", plays$desc),]$type <- "Penalty"
          plays[grep("kick", plays$desc),]$type <- "Kick"
          plays[grep("punt", plays$desc),]$type <- "Punt"
          plays[grep("field goal", plays$desc),]$type <- "FG"
          ## create a binned variable yardsToGo
          plays$yardsToGo <- "0"
          plays[plays$ydline >= 80,]$yardsToGo <- ">= 80"
          plays[plays$ydline >= 50 & plays$ydline < 80,]$yardsToGo <- "50 <= yardsToGo < 80"
          plays[plays$ydline >= 30 & plays$ydline < 50,]$yardsToGo <- "30 <= yardsToGo < 50"
          plays[plays$ydline >= 10 & plays$ydline < 30,]$yardsToGo <- "10 <= yardsToGo < 30"
          plays[plays$ydline < 10,]$yardsToGo <- "< 10"
          ## write out comma-separated file for Tableau
          write.csv(plays, file="BALplays2012regSeason.csv", row.names=FALSE)
      http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/
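The yardsToGo binning can also be written with base R's `cut()`, which assigns every bin in one call. This is an alternative sketch rather than the talk's code; the helper name `bin_yards` is hypothetical, and the labels are copied from the slide.

```r
# Bin yard-line values with cut(); the breaks mirror the slide's five ranges.
bin_yards <- function(ydline) {
  cut(ydline,
      breaks = c(-Inf, 10, 30, 50, 80, Inf),
      labels = c("< 10", "10 <= yardsToGo < 30", "30 <= yardsToGo < 50",
                 "50 <= yardsToGo < 80", ">= 80"),
      right = FALSE)   # intervals are closed on the left: [10, 30), [30, 50), ...
}

# Boundary values land in the higher bin, matching the >= comparisons.
example_bins <- as.character(bin_yards(c(5, 10, 29, 30, 50, 79, 80, 99)))
```

Unlike the row-by-row assignments, `cut()` leaves missing yard lines as `NA` instead of failing on them.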
  16. IS THE RAVENS OFFENSE PREDICTABLE? (CONT’D)
      For each play during the Super Bowl, set the scenario and predicted either run or pass, whichever had the higher percentage in that scenario.
      http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/
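The prediction rule can be sketched in a few lines of R. In the talk the scenarios were set in the Tableau dashboard; the version below is only an assumed equivalent, using a single scenario variable (`down`) and invented data.

```r
# Invented history of plays; 'down' stands in for the full scenario.
history <- data.frame(
  down = c(1, 1, 1, 2, 2, 2, 3, 3),
  type = c("Run", "Run", "Pass", "Pass", "Pass", "Run", "Pass", "Pass"),
  stringsAsFactors = FALSE
)

# Share of each play type within each scenario (rows sum to 1).
shares <- prop.table(table(history$down, history$type), margin = 1)

# The prediction for a scenario is the play type with the highest share.
predicted <- colnames(shares)[apply(shares, 1, which.max)]
names(predicted) <- rownames(shares)

# Score the rule against new (also invented) plays.
game <- data.frame(down = c(1, 2, 3, 1),
                   type = c("Run", "Run", "Pass", "Pass"),
                   stringsAsFactors = FALSE)
game$predicted <- predicted[as.character(game$down)]
accuracy <- mean(game$predicted == game$type)
```

A real version would key the lookup on more variables, such as down together with the yardsToGo bin, which quickly thins out the number of plays behind each percentage.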
  17. RESULTS AND CONSIDERATIONS
      - Predicted plays correctly 60.3% of the time
      - Missing variables (defensive and offensive formations, crowd noise, weather, injured players, power outage, etc.)
      - Change in Ravens’ offensive coordinator in week 15
      - Lack of data
  18. SUMMARY
      - Initial analysis in R
          - Explore the data
          - Pre-process
          - Write to file for consumption by Tableau Public, or to a database for Tableau Desktop
      - Create interactive dashboards in Tableau in minutes that can be shared via a web interface (free = publicly available; paid = private, internally hosted Tableau Server)
  19. REFERENCES
      - NFL Play by Play Data (2002 – 2012): http://www.advancednflstats.com/2010/04/play-by-play-data.html
      - Python parser for NFL PBP Data: http://www.10flow.com/
      - Tableau Public: http://www.tableausoftware.com/public/
      - R: http://cran.r-project.org/
      - SportsDataViz: http://www.sportsdataviz.com/
  20. APPENDIX
  21. TABLEAU DESKTOP FEATURE COMPARISON
      Feature                                Public Edition       Personal Edition     Professional Edition
      Operating System                       Windows application  Windows application  Windows application
      Saves to the Tableau Public Website?   Only                 Option               Option
      Opens Data in Files?                   Yes                  Yes                  Yes
      Opens Data in Databases?               No                   No                   Yes
      Save Work Locally?                     No                   Yes                  Yes
      Export Results Locally?                No                   Yes                  Yes
      Data Limitation?                       100,000 rows         Unlimited            Unlimited
      Publish to Tableau Server?             No                   No                   Yes
      Cost                                   Free                 $999                 $1,999
