Oscon 2013 Jesse Anderson

Jesse Anderson's OSCON 2013 talk

  1. Doing Data Science on the NFL Play by Play Dataset. Jesse Anderson | Curriculum Developer and Instructor. July 2013 v2
  2. Plays • Advanced NFL Stats released all play-by-play data since the 2002 season • 2,898 total games • 471,392 plays
  3. Full Play Entry: 20121119_CHI@SF,3,17,48,SF,CHI,3,2,76,20,0,(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC,0,3,0,27,7,2012
  4. Play Description: (2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
  5. There's A Chart for That
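  6. There's A Custom MapReduce Behind That

         import java.io.IOException;
         import java.util.regex.Matcher;
         import org.apache.hadoop.io.LongWritable;
         import org.apache.hadoop.io.Text;
         import org.apache.hadoop.mapreduce.Mapper;

         // PassWritable and the incompletePass Pattern are defined outside this excerpt.
         public class IncompletesMapper extends Mapper<LongWritable, Text, Text, PassWritable> {
             @Override
             public void map(LongWritable key, Text value, Context context)
                     throws IOException, InterruptedException {
                 String line = value.toString();
                 // Cheap substring check before running the regex.
                 if (line.contains("incomplete")) {
                     Matcher matcher = incompletePass.matcher(line);
                     if (matcher.find()) {
                         // Key joins groups 1 and 2; value carries 1 and the integer in group 3.
                         context.write(new Text(matcher.group(1) + "-" + matcher.group(2)),
                                 new PassWritable(1, Integer.parseInt(matcher.group(3))));
                     }
                 }
             }
         }
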
  7. The Hive Story: Enter the Query
  8. Queryable Data: Give me every run play by New Orleans in the 2010 season
  9. From the Data: Fourth Downs. 15% of 4th down plays weren't kicks
  10. Play by Play Pieces: (2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
  11. From the Data: Sacks. QB sacks and scrambles double on 3rd downs
  12. Hive • Abstraction on top of MapReduce • Allows queries using a SQL-like language
  13. Hive Query: Give me every run by New Orleans in the 2010 season:

         SELECT * FROM playbyplay
         WHERE playtype = "RUN" and year = 2010 and game like "%NO%";

  14. From the Data: Yards to Go. With 1 yard to go, 65% of plays are runs
  15. Lost in Data: Algorithm Alone
  16. Data Janitorial
  17. From the Data: Number of Plays By Yard Line (Direction of Offense)
  18. Stadium
  19. Figuring Out Stadium: 20121119_CHI@SF breaks down into Date Played (20121119), Away Team (CHI), Home Team (SF)
  20. From the Data: Stadium Attendance. Stadiums with the smallest capacities average the best scores: 20.55-17.79
  21. Stadium Data
       • Stadium: The capacity of the stadium
       • Expanded Capacity: The expanded capacity of the stadium
       • Location: The location of the stadium
       • Playing Surface: The type of grass, etc. that the stadium has
       • Is Artificial: Is the playing surface artificial
       • Team: The name of the team that plays at the stadium
       • Roof Type: The type of roof in the stadium (None, Retractable, Dome)
       • Elevation: The elevation of the stadium
  22. From the Data: Stadium Elevation. There is a 1% increase in passes at Mile High versus sea level stadiums
  23. Weather: 1,015 games had weather
  24. From the Data: Fumble. Games with weather have a fumble 93% of the time compared to 56% without
  25. Weather Data
       • STATION: Station identifier
       • STATION NAME: Station location name
       • READING DATE: Date of reading
       • PRCP: Precipitation
       • AWND: Average daily wind speed
       • WV20: Fog, ice fog, or freezing fog (may include heavy fog)
       • TMAX: Maximum temperature
       • TMIN: Minimum temperature
  26. From the Data: Home Field Advantage. Baltimore has the biggest weather advantage: 22-14
  27. Arrests
  28. Arrest Data
       • Season: Player Arrested in (February to February)
       • Team: Team person played on
       • Player: Name of player Arrested
       • Player Arrested: Was a player in the play arrested that season
       • Offense Player Arrested: Offense had player arrested in season
       • Defense Player Arrested: Defense had player arrested in season
       • Home Team Player Arrested: Home Team had player arrested in season
       • Away Team Player Arrested: Away Team had player arrested in season
  29. Arrest = Win? From 2002 to 2012, each team had many arrests. Whenever there are arrests either in the home team, away team or both, the home team WINS. From a low in 2002 of 56% to a HIGH OF ...
  32. The Low Downs • /me - http://www.jesse-anderson.com • @jessetanderson • Code - https://github.com/eljefe6a/nfldata • *I am not in any way affiliated with the NFL or any Team
  34. From the Data: Weather. Wind had the most effect on games. At calm winds: 41% pass and 37% run. At >30 MPH: 34% pass and 46% run
  35. From the Data: Field Goals. Weather only increases misses by 1%. 14% of field goals are missed; 21% are missed at 30-39 MPH average winds
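
The "Play Description" and "Play by Play Pieces" slides show a free-text description that has to be broken into pieces before it can be queried. The sketch below is one possible way to do that for a single description shape, a completed pass like the C.Kaepernick example. The class name `PassPlay`, its field names, and the regular expression are all assumptions made for illustration; none of them come from the talk's code, and other play types (runs, kicks, sacks, incompletes) would need their own patterns.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser (not from the talk's code) for one description shape:
// a completed pass such as the C.Kaepernick example on the slides.
final class PassPlay {
    // Assumed pattern; descriptions of any other shape simply will not match.
    private static final Pattern COMPLETED_PASS = Pattern.compile(
            "^\\((\\d{1,2}):(\\d{2})\\)\\s+(\\S+) pass (short|deep) (left|middle|right)"
            + " to (\\S+) to (\\w+ \\d+) for (-?\\d+) yards?");

    final int clockSeconds;   // clock shown in parentheses, converted to seconds
    final String passer;
    final String depth;       // "short" or "deep"
    final String direction;   // "left", "middle" or "right"
    final String receiver;
    final String endSpot;     // where the play ended, e.g. "SF 25"
    final int yardsGained;

    private PassPlay(Matcher m) {
        this.clockSeconds = Integer.parseInt(m.group(1)) * 60 + Integer.parseInt(m.group(2));
        this.passer = m.group(3);
        this.depth = m.group(4);
        this.direction = m.group(5);
        this.receiver = m.group(6);
        this.endSpot = m.group(7);
        this.yardsGained = Integer.parseInt(m.group(8));
    }

    // Returns the parsed pieces, or an empty Optional when the text is not
    // a completed pass in the assumed shape.
    static Optional<PassPlay> parse(String description) {
        Matcher m = COMPLETED_PASS.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new PassPlay(m));
    }
}
```

Calling `PassPlay.parse(...)` on the slide's example description yields a passer of `C.Kaepernick`, a receiver of `M.Crabtree`, and one yard gained. Returning an `Optional` rather than throwing keeps unmatched plays easy to skip, much as the mapper on the MapReduce slide only writes output when its matcher finds something.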
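
To make the predicate in the "Hive Query" slide concrete, here is a rough in-memory Java equivalent of its WHERE clause. The column names `game`, `playtype`, and `year` come from the query itself; the class `PlayRow` and the method name are hypothetical, and a real table would of course be filtered by Hive rather than by a Java stream.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row type holding only the three columns the slide's query uses.
final class PlayRow {
    final String game;      // game identifier, e.g. "20121119_CHI@SF"
    final String playtype;  // e.g. "RUN"
    final int year;

    PlayRow(String game, String playtype, int year) {
        this.game = game;
        this.playtype = playtype;
        this.year = year;
    }

    // Mirrors: WHERE playtype = "RUN" and year = 2010 and game like "%NO%"
    static List<PlayRow> newOrleansRuns2010(List<PlayRow> rows) {
        return rows.stream()
                .filter(r -> r.playtype.equals("RUN"))
                .filter(r -> r.year == 2010)
                // LIKE "%NO%" is a substring match on the game identifier.
                .filter(r -> r.game.contains("NO"))
                .collect(Collectors.toList());
    }
}
```

One thing the sketch makes visible: the substring test matches any game whose identifier contains `NO`, whether New Orleans is home or away, so on its own it does not say which team ran the ball. Restricting to New Orleans on offense would presumably need an additional condition on an offense column, whose name is not shown on the slide.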
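
The "Figuring Out Stadium" slide reads a game identifier such as 20121119_CHI@SF as date played, away team, and home team, and the home team is what ties a play to a stadium. A minimal sketch of that split follows, assuming every identifier has the form `<yyyyMMdd>_<AWAY>@<HOME>`. The class name `GameId` and its field names are hypothetical and not taken from the talk's code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not from the talk's code): splits a game identifier
// of the assumed form <yyyyMMdd>_<AWAY>@<HOME> into its three parts.
final class GameId {
    final LocalDate datePlayed;
    final String awayTeam;
    final String homeTeam;

    private GameId(LocalDate datePlayed, String awayTeam, String homeTeam) {
        this.datePlayed = datePlayed;
        this.awayTeam = awayTeam;
        this.homeTeam = homeTeam;
    }

    static GameId parse(String id) {
        int underscore = id.indexOf('_');
        int at = id.indexOf('@');
        // Both separators must be present, with '_' before '@'.
        if (underscore < 0 || at < underscore) {
            throw new IllegalArgumentException("Unrecognized game id: " + id);
        }
        // BASIC_ISO_DATE reads dates written as yyyyMMdd, e.g. 20121119.
        LocalDate date = LocalDate.parse(id.substring(0, underscore),
                DateTimeFormatter.BASIC_ISO_DATE);
        return new GameId(date, id.substring(underscore + 1, at), id.substring(at + 1));
    }
}
```

With `GameId.parse("20121119_CHI@SF")`, `homeTeam` is `SF`, which could then serve as the lookup key into the stadium data (the Team field on the "Stadium Data" slide), and `datePlayed` could be matched against READING DATE in the weather data.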
