Doing Data Science on the NFL Play by Play Dataset
Jesse Anderson | Curriculum Developer and Instructor
July 2013 v2
Plays
• Advanced NFL Stats released all play-by-play data since the 2002 season
• 2,898 total games
• 471,392 plays
Full Play Entry
20121119_CHI@SF,3,17,48,SF,CHI,3,2,76,20,0,(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC,0,3,0,27,7 ,2012
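The entry is a single comma-separated line, so a first pass at reading it is a split on commas. The sketch below is a hypothetical helper, not the project's parser: the slide does not name the columns, so only fields whose meaning can be cross-checked against this row are labeled. In particular, reading fields 1 to 3 as the quarter and the minutes and seconds left in the game is an inference: 17:48 left in the game, minus the 15 minutes of the 4th quarter, gives the 2:48 shown in the description.

```java
import java.util.Arrays;

// Hypothetical helper for one play-by-play CSV line. Column meanings are
// assumptions inferred from the sample row, not a documented schema.
final class PlayEntry {
    final String gameId;
    final int quarter;
    final int secondsLeftInGame; // assumed: columns 2 and 3 are minutes and seconds left
    final String offense;
    final String defense;
    final int down;
    final int yardsToGo;
    final String description;
    final int season;

    PlayEntry(String csvLine) {
        // Naive split: correct only while the description itself contains no commas.
        String[] f = Arrays.stream(csvLine.split(",", -1))
                .map(String::trim)
                .toArray(String[]::new);
        gameId = f[0];
        quarter = Integer.parseInt(f[1]);
        secondsLeftInGame = Integer.parseInt(f[2]) * 60 + Integer.parseInt(f[3]);
        offense = f[4];
        defense = f[5];
        down = Integer.parseInt(f[6]);
        yardsToGo = Integer.parseInt(f[7]);
        description = f[11];
        season = Integer.parseInt(f[f.length - 1]);
    }

    // Converts game time remaining into the quarter clock (ignores overtime).
    String quarterClock() {
        int left = secondsLeftInGame - (4 - quarter) * 15 * 60;
        return String.format("%d:%02d", left / 60, left % 60);
    }

    public static void main(String[] args) {
        PlayEntry p = new PlayEntry("20121119_CHI@SF,3,17,48,SF,CHI,3,2,76,20,0,"
                + "(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard"
                + " (C.Tillman). Caught at SF 25. 0-yds YAC,0,3,0,27,7 ,2012");
        System.out.println(p.offense + " ball, Q" + p.quarter + " " + p.quarterClock()
                + ", down " + p.down + " and " + p.yardsToGo + ", season " + p.season);
    }
}
```

Running `main` prints `SF ball, Q3 2:48, down 3 and 2, season 2012`.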
Play Description
(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
There's A Chart for That
There's A Custom MapReduce Behind That
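import java.io.IOException;
import java.util.regex.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class IncompletesMapper extends Mapper<LongWritable, Text, Text, PassWritable> {
  // Hypothetical regex (not shown on the slide): 1 = passer, 2 = receiver, 3 = trailing season field.
  private static final Pattern incompletePass = Pattern.compile("([A-Z]\\.[\\w'-]+) pass incomplete .*?to ([A-Z]\\.[\\w'-]+).*,(\\d{4})");
  @Override
  public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();
    if (line.contains("incomplete")) {
      Matcher matcher = incompletePass.matcher(line);
      if (matcher.find()) {
        // Key is "passer-receiver"; PassWritable is the project's custom Writable (not shown).
        context.write(new Text(matcher.group(1) + "-" + matcher.group(2)),
            new PassWritable(1, Integer.parseInt(matcher.group(3))));
      }
    }
  }
}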
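To show what Hadoop does around a mapper like this, here is a plain-Java sketch (no Hadoop dependency) of the same flow: a map step emits a passer-receiver key with a count of 1 for each incomplete pass, a shuffle step groups the values by key, and a reduce step sums them. The class name, the regex, and the sample lines are hypothetical; in a real job the framework performs the grouping and sorting between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In-memory sketch of map -> shuffle -> reduce; not the project's Hadoop job.
final class IncompletesLocal {
    // Hypothetical pattern: group 1 = passer, group 2 = intended receiver.
    static final Pattern INCOMPLETE =
            Pattern.compile("([A-Z]\\.[\\w'-]+) pass incomplete .*?to ([A-Z]\\.[\\w'-]+)");

    static Map<String, Integer> countIncompletes(List<String> lines) {
        // Map: emit ("passer-receiver", 1) for each incomplete pass.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            Matcher m = INCOMPLETE.matcher(line);
            if (m.find()) {
                emitted.add(Map.entry(m.group(1) + "-" + m.group(2), 1));
            }
        }
        // Shuffle: group values by key; a TreeMap keeps keys sorted, as Hadoop's sort does.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Reduce: sum the values for each key.
        Map<String, Integer> totals = new TreeMap<>();
        grouped.forEach((k, values) ->
                totals.put(k, values.stream().mapToInt(Integer::intValue).sum()));
        return totals;
    }
}
```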
The Hive Story
Enter the Query
Queryable Data
Give me every run play by New Orleans in the 2010 season
From the Data: Fourth Downs
15% of 4th-down plays weren't kicks
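A figure like this comes from classifying each 4th-down play by its description and dividing. The sketch assumes that punts and field-goal attempts can be recognized by the words "punts" and "field goal" in the description; that keyword rule, the class name, and any sample descriptions are illustrative assumptions, not the rule used to produce the slide.

```java
import java.util.List;

// Hypothetical classifier for 4th-down plays.
final class FourthDowns {
    // Assumed keyword rule for kicks; fakes and blocked kicks are not handled specially.
    static boolean isKick(String description) {
        String d = description.toLowerCase();
        return d.contains(" punts ") || d.contains("field goal");
    }

    // Percentage of 4th-down plays that were not kicks (0 for an empty list).
    static double nonKickPercent(List<String> fourthDownDescriptions) {
        if (fourthDownDescriptions.isEmpty()) {
            return 0.0;
        }
        long nonKicks = fourthDownDescriptions.stream().filter(d -> !isKick(d)).count();
        return 100.0 * nonKicks / fourthDownDescriptions.size();
    }
}
```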
Play by Play Pieces
(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
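Breaking a description into pieces is mostly pattern matching. The sketch below uses Java named groups and handles only the completed-pass shape shown above (clock, passer, depth, direction, receiver, yards, tackler, yards after catch). The class, regex, and group names are hypothetical, and a real parser would need many more cases for runs, penalties, laterals, and plays ending at midfield.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for completed-pass descriptions only.
final class PassDescription {
    private static final Pattern COMPLETE_PASS = Pattern.compile(
            "\\((?<clock>\\d+:\\d{2})\\) (?<passer>\\S+) pass (?<depth>short|deep) "
            + "(?<side>left|middle|right) to (?<receiver>\\S+) to \\S+ \\d+ "
            + "for (?<yards>-?\\d+) yards? \\((?<tackler>\\S+)\\)\\..*?(?<yac>-?\\d+)-yds YAC");

    private static final String[] PIECES =
            {"clock", "passer", "depth", "side", "receiver", "yards", "tackler", "yac"};

    // Returns the named pieces in order, or empty if the text is not this exact shape.
    static Optional<Map<String, String>> parse(String description) {
        Matcher m = COMPLETE_PASS.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        Map<String, String> pieces = new LinkedHashMap<>();
        for (String name : PIECES) {
            pieces.put(name, m.group(name));
        }
        return Optional.of(pieces);
    }
}
```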
From the Data: Sacks
QB sacks and scrambles double on 3rd downs
Hive
• Abstraction on top of MapReduce
• Allows queries using a SQL-like language
Hive Query
Give me every run by New Orleans in the 2010 season:
SELECT * FROM playbyplay
WHERE playtype = "RUN"
  AND year = 2010
  AND game LIKE "%NO%";
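The `game LIKE "%NO%"` filter matches every game New Orleans played in, because the game id contains both team codes, so as written the query returns runs by both teams in those games. The sketch below restates the query as a Java predicate and adds a filter on the team with the ball, which answers "runs by New Orleans" more directly. The `Play` fields mirror the query's column names plus an `offense` field that is an assumption: the play entry appears to carry the offensive team, but the Hive column name is not shown.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of the slide's Hive query.
final class RunQuery {
    // Row shape: game, playtype, and year as in the query; offense is assumed.
    static final class Play {
        final String game;
        final String playtype;
        final int year;
        final String offense;

        Play(String game, String playtype, int year, String offense) {
            this.game = game;
            this.playtype = playtype;
            this.year = year;
            this.offense = offense;
        }
    }

    // Same predicate as the slide: any run in a game whose id contains "NO".
    static List<Play> runsInSaintsGames(List<Play> plays, int year) {
        return plays.stream()
                .filter(p -> p.playtype.equals("RUN") && p.year == year && p.game.contains("NO"))
                .collect(Collectors.toList());
    }

    // Narrower predicate: only runs where New Orleans had the ball.
    static List<Play> saintsRuns(List<Play> plays, int year) {
        return runsInSaintsGames(plays, year).stream()
                .filter(p -> p.offense.equals("NO"))
                .collect(Collectors.toList());
    }
}
```

In HiveQL the same narrowing would be one more condition on whatever column holds the offensive team.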
From the Data: Yards to Go
With 1 yard to go, 65% of plays are runs
Lost in Data
Algorithm Alone
Data Janitorial
From the Data: Number of Plays By Yard Line
[Chart: number of plays at each yard line, with the direction of offense marked]
Stadium
Figuring Out Stadium
20121119_CHI@SF
• 20121119: Date played
• CHI: Away team
• SF: Home team
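The game id encodes the date and both teams as date_AWAY@HOME, so the home team, and from it the stadium, can be derived by string splitting. A minimal sketch; the class name is hypothetical, and games played away from the home team's usual stadium would still need separate handling.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical parser for game ids shaped like "20121119_CHI@SF" (yyyyMMdd_AWAY@HOME).
final class GameId {
    final LocalDate date;
    final String awayTeam;
    final String homeTeam;

    private GameId(LocalDate date, String awayTeam, String homeTeam) {
        this.date = date;
        this.awayTeam = awayTeam;
        this.homeTeam = homeTeam;
    }

    static GameId parse(String id) {
        int underscore = id.indexOf('_');
        int at = id.indexOf('@', underscore + 1);
        if (underscore < 0 || at < 0) {
            throw new IllegalArgumentException("Unexpected game id: " + id);
        }
        // BASIC_ISO_DATE reads the compact yyyyMMdd form.
        LocalDate date = LocalDate.parse(id.substring(0, underscore), DateTimeFormatter.BASIC_ISO_DATE);
        return new GameId(date, id.substring(underscore + 1, at), id.substring(at + 1));
    }
}
```

The stadium for a play is then a lookup keyed on `homeTeam`.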
From the Data: Stadium Attendance
Stadiums with the smallest capacities average the best scores: 20.55-17.79
Stadium Data
• Stadium: The capacity of the stadium
• Expanded Capacity: The expanded capacity of the stadium
• Location: The location of the stadium
• Playing Surface: The type of playing surface (grass, etc.)
• Is Artificial: Whether the playing surface is artificial
• Team: The name of the team that plays at the stadium
• Roof Type: The type of roof on the stadium (None, Retractable, Dome)
• Elevation: The elevation of the stadium
From the Data: Stadium Elevation
There is a 1% increase in passes at Mile High versus sea-level stadiums
Weather
1,015 games had weather
From the Data: Fumble
Games with weather have at least one fumble 93% of the time, compared to 56% of games without
Weather Data
• STATION: Station identifier
• STATION NAME: Station location name
• READING DATE: Date of reading
• PRCP: Precipitation
• AWND: Average daily wind speed
• WV20: Fog, ice fog, or freezing fog (may include heavy fog)
• TMAX: Maximum temperature
• TMIN: Minimum temperature
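Attaching weather to a game means matching it to a reading from a station near the home stadium on the game date. The sketch below indexes readings by (station, date) and assumes a hand-built map from home team to station identifier; that mapping, the class names, and the choice of fields are hypothetical, values are kept in whatever units the source delivers, and a game simply has no reading when the station reported nothing that day.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical hash join of games to daily weather readings by (station, date).
final class WeatherJoin {
    // A subset of the weather columns listed above.
    static final class Reading {
        final String station; // STATION
        final LocalDate date; // READING DATE
        final double prcp;    // PRCP
        final double awnd;    // AWND

        Reading(String station, LocalDate date, double prcp, double awnd) {
            this.station = station;
            this.date = date;
            this.prcp = prcp;
            this.awnd = awnd;
        }
    }

    private final Map<String, String> stationByHomeTeam; // assumed, hand-built mapping
    private final Map<String, Reading> readingsByKey = new HashMap<>();

    WeatherJoin(Map<String, String> stationByHomeTeam, Iterable<Reading> readings) {
        this.stationByHomeTeam = stationByHomeTeam;
        for (Reading r : readings) {
            readingsByKey.put(key(r.station, r.date), r); // index once for constant-time lookups
        }
    }

    private static String key(String station, LocalDate date) {
        return station + "|" + date;
    }

    // The reading for the home team's station on the game date, if one exists.
    Optional<Reading> forGame(String homeTeam, LocalDate gameDate) {
        String station = stationByHomeTeam.get(homeTeam);
        if (station == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(readingsByKey.get(key(station, gameDate)));
    }
}
```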
From the Data: Home Field Advantage
Baltimore has the biggest weather advantage: 22-14
Arrests
Arrest Data
• Season: Season in which the player was arrested (February to February)
• Team: Team the person played on
• Player: Name of the arrested player
• Player Arrested: Whether a player in the play was arrested that season
• Offense Player Arrested: Whether the offense had a player arrested that season
• Defense Player Arrested: Whether the defense had a player arrested that season
• Home Team Player Arrested: Whether the home team had a player arrested that season
• Away Team Player Arrested: Whether the away team had a player arrested that season
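Because the arrest season runs February to February, an arrest date has to be mapped to a season before it can be matched to games. A minimal sketch, assuming the boundary is February 1 (the slide gives only the month) and that January arrests count toward the previous season; the class, method names, and key format are hypothetical. Game seasons should come from the play data itself, since early-February games belong to the previous season.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.Set;

// Hypothetical helpers for the arrest flags.
final class ArrestSeason {
    // Assumed boundary: a season runs Feb 1 to Jan 31, so January belongs to the prior season.
    static int seasonOf(LocalDate arrestDate) {
        return arrestDate.getMonth() == Month.JANUARY
                ? arrestDate.getYear() - 1
                : arrestDate.getYear();
    }

    // Key for a (team, season) pair.
    static String key(String team, int season) {
        return team + "|" + season;
    }

    // True if the team had a player arrested that season; calling it for the
    // home and away teams gives the Home/Away Team Player Arrested flags.
    static boolean hadArrest(Set<String> arrestKeys, String team, int season) {
        return arrestKeys.contains(key(team, season));
    }
}
```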
Whenever there are arrests on the home team, the away team, or both, the home team …
From 2002 to 2012, each team had many arrests.
[Chart: wins, from a low of 56% in 2002 to a high of …]
Arrest = Win?
The Low Downs
• /me - http://www.jesse-anderson.com
• @jessetanderson
• Code - https://github.com/eljefe6a/nfldata
*I am not in any way affiliated with the NFL or any team
From the Data: Weather
Wind had the most effect on games
In calm winds: 41% pass, 37% run
In winds over 30 MPH: 34% pass, 46% run
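Numbers like these come from bucketing plays by the game's average wind speed and computing each play type's share within the bucket. A sketch under stated assumptions: the 10-MPH buckets and the play-type labels are illustrative, since the slide does not define "calm", and shares are of all plays in a bucket, which is why pass and run need not sum to 100%. The class and field names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregation of play-type shares by wind bucket.
final class WindShares {
    static final class Play {
        final double windMph; // average wind for the game
        final String type;    // e.g. "PASS", "RUN", "PUNT" (labels are assumptions)

        Play(double windMph, String type) {
            this.windMph = windMph;
            this.type = type;
        }
    }

    // Lower bound of a 10-MPH bucket: 0-9.9 -> 0, 10-19.9 -> 10, 30-39.9 -> 30.
    static int bucket(double windMph) {
        return (int) Math.floor(windMph / 10.0) * 10;
    }

    // Percentage of plays of the given type within each wind bucket.
    static Map<Integer, Double> sharePercent(Iterable<Play> plays, String type) {
        Map<Integer, int[]> counts = new TreeMap<>(); // bucket -> {matching, total}
        for (Play p : plays) {
            int[] c = counts.computeIfAbsent(bucket(p.windMph), b -> new int[2]);
            if (p.type.equals(type)) {
                c[0]++;
            }
            c[1]++;
        }
        Map<Integer, Double> shares = new TreeMap<>();
        counts.forEach((b, c) -> shares.put(b, 100.0 * c[0] / c[1]));
        return shares;
    }
}
```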
From the Data: Field Goals
Weather only increases misses by 1%
14% of field goals are missed
21% of field goals are missed in 30-39 MPH average winds


OSCON 2013, Jesse Anderson

  • 1. 1 Headline Goes Here Speaker Name or Subhead Goes Here DO NOT USE PUBLICLY PRIOR TO 10/23/12 Doing Data Science on the NFL Play by Play Dataset Jesse Anderson | Curriculum Developer and Instructor July 2013 v2
  • 2. Plays 2 • Advanced NFL stats released all Play by Play since 2002 season • 2,898 total games • 471,392 plays
  • 3. Full Play Entry 3 20121119_CHI@SF,3,1 7,48,SF,CHI,3,2,76,20, 0,(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC,0,3,0,27,7 ,2012
  • 4. Play Description 4 (2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
  • 5. There's A Chart for That 5
  • 6. There's A Custom MapReduce Behind That 6 public class IncompletesMapper extends Mapper<LongWritable, Text, Text, PassWritable> { @Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); if (line.contains("incomplete")) { Matcher matcher = incompletePass.matcher(line); if (matcher.find()) { context.write(new Text(matcher.group(1) + "-" + matcher.group(2)), new PassWritable(1,Integer.parseInt(matcher.group(3))));
  • 8. Queryable Data 8 Give me every run play by New Orleans in the 2010 season
  • 9. From the Data: Fourth Downs 9 15% of 4th down plays weren't kicks
  • 10. Play by Play Pieces 10 (2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
  • 11. From the Data: Sacks 11 QB sacks and scrambles double on 3rd downs
  • 12. Hive • Abstraction on top of MapReduce • Allows queries using a SQL-like language 12
  • 13. Hive Query 13 Give me every run by New Orleans in the 2010 season: SELECT * FROM playbyplay WHERE playtype = "RUN" and year = 2010 and game like "%NO%";
  • 14. From the Data: Yards to Go 14 With 1 yard to go, 65% of plays are runs
  • 17. From the Data: Number of Plays By Yard Line 17 Direction of Offense
  • 19. Figuring Out Stadium 19 20121119_CHI@SF Date Played Away Team Home Team
  • 20. From the Data: Stadium Attendance 20 Stadiums with the smallest capacities average the best scores 20.55-17.79
  • 21. Stadium Data 21 Stadium The capacity of the stadium Expanded Capacity The expanded capacity of the stadium Location The location of the stadium Playing Surface The type of grass, etc that the stadium has Is Artificial Is the playing surface artificial Team The name of the team that plays at the stadium Roof Type The type of roof in the stadium (None, Retractable, Dome) Elevation The elevation of the stadium
  • 22. From the Data: Stadium Elevation 22 There is a 1% increase in passes at Mile High versus sea level stadiums
  • 24. From the Data: Fumble 24 Games with weather have a fumble 93% of the time compared to 56% without
  • 25. Weather Data 25 STATION Station identifier STATION NAME Station location name READING DATE Date of reading PRCP Precipitation AWND Average daily wind speed WV20 Fog, ice fog, or freezing fog (may include heavy fog) TMAX Maximum temperature TMIN Minimum temperature
  • 26. From the Data: Home Field Advantage 26 Baltimore has the biggest weather advantage 22-14
  • 28. Arrest Data 28 Season Player Arrested in (February to February) Team Team person played on Player Name of player Arrested Player Arrested Was a player in the play arrested that season Offense Player Arrested Offense had player arrested in season Defense Player Arrested Defense had player arrested in season Home Team Player Arrested Home Team had player arrested in season Away Team Player Arrested Away Team had player arrested in season
  • 29. Arrest = Win? Whenever there are arrests on the home team, the away team, or both, the home team wins. From 2002 to 2012, each team had many arrests, from a low of 56% in 2002 to a high of …
  • 32. The Low Downs. Me: http://www.jesse-anderson.com, @jessetanderson. Code: https://github.com/eljefe6a/nfldata. *I am not in any way affiliated with the NFL or any team.
  • 34. From the Data: Weather. Wind had the most effect on games. In calm winds, 41% of plays are passes and 37% are runs; in winds above 30 MPH, 34% are passes and 46% are runs.
  • 35. From the Data: Field Goals. Weather only increases misses by 1%. 14% of field goals are missed; in 30-39 MPH average winds, 21% are missed.
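
The slide 13 query filters with game like "%NO%", which matches the game identifier rather than the team with the ball, so it returns runs by both teams in every 2010 New Orleans game. A minimal Java sketch of the filter the question actually asks for, assuming each play has already been parsed into a record that carries the offense team, the play type, and the season (the Play record, the PlayFilter class, and their field names are hypothetical, not taken from the nfldata code):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical parsed play: only the fields this filter needs.
record Play(String game, String offense, String playType, int season) {}

final class PlayFilter {
    // Keeps run plays where the given team had the ball in the given season.
    // Filtering on offense (not on the game id) excludes the opponent's runs.
    static List<Play> runsBy(List<Play> plays, String team, int season) {
        return plays.stream()
                .filter(p -> p.playType().equals("RUN"))
                .filter(p -> p.season() == season)
                .filter(p -> p.offense().equals(team))
                .collect(Collectors.toList());
    }
}
```

With a game id such as 20100909_MIN@NO, a run by MIN passes the game like "%NO%" test but is dropped here because its offense is MIN.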

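Slide 19 decodes the game identifier into the date played, the away team, and the home team. A minimal Java sketch of that decoding, assuming every identifier has the shape yyyyMMdd_AWAY@HOME seen in 20121119_CHI@SF (the GameId class and its parse method are hypothetical):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: splits a game id such as "20121119_CHI@SF" into its parts.
// Assumes the shape yyyyMMdd_AWAY@HOME shown on the slide.
final class GameId {
    final LocalDate datePlayed;
    final String awayTeam;
    final String homeTeam;

    private GameId(LocalDate datePlayed, String awayTeam, String homeTeam) {
        this.datePlayed = datePlayed;
        this.awayTeam = awayTeam;
        this.homeTeam = homeTeam;
    }

    static GameId parse(String id) {
        int underscore = id.indexOf('_');
        int at = id.indexOf('@', underscore + 1);
        // The date part must be exactly eight digits (yyyyMMdd) before the "_".
        if (underscore != 8 || at < 0) {
            throw new IllegalArgumentException("Unexpected game id: " + id);
        }
        // BASIC_ISO_DATE parses compact dates such as 20121119.
        LocalDate date = LocalDate.parse(id.substring(0, underscore), DateTimeFormatter.BASIC_ISO_DATE);
        String away = id.substring(underscore + 1, at);
        String home = id.substring(at + 1);
        return new GameId(date, away, home);
    }
}
```

The home team is the part that points to a stadium, and the date is what later lines a game up with a weather reading.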
Editor's Notes

  1. Extract value and insight. http://www.flickr.com/photos/billlublin/3972999678/sizes/o/
  2. http://www.flickr.com/photos/nathaninsandiego/5159833527/sizes/o/
  3. Unstructured data. Human generated. http://www.flickr.com/photos/nathaninsandiego/5159833527/sizes/o/
  4. Incomplete passes to a receiver, averaged over the seasons they played together: A.Luck to R.Wayne, G.Frerotte to C.Chambers, J.Freeman to V.Jackson, T.Brady to R.Moss, A.Luck to D.Avery.
  5. This breakup creates 96 different queryable columns. http://www.flickr.com/photos/modenadude/6150263821/sizes/o/in/photostream/
  6. 1st downs are 52% runs and 42% passes. 2nd downs are 45% runs and 49% passes. 3rd downs are 26% runs and 66% passes. http://www.flickr.com/photos/crackerbunny/3215652008/sizes/l/
  7. The data is easy for humans to parse but hard for computers; this is a natural language processing problem. While breaking down the data, we need to know what questions we want to answer. Look back at my commits to see what I've added. http://www.flickr.com/photos/nathaninsandiego/5159833527/sizes/o/
  8. http://www.flickr.com/photos/modenadude/6150820962/sizes/o/
  9. This breakup creates 96 different queryable columns. Limited to data about plays. http://www.flickr.com/photos/modenadude/6150263821/sizes/o/in/photostream/
  10. With 1 yard to go, 65% of plays are runs. X and 24 has the highest chance of a sack, at 4.6%. X and 21 has the highest chance of a QB scramble, at 1.7%. X and 10 is about even between pass and run, both in the high 40s. http://www.flickr.com/photos/crackerbunny/3215652008/sizes/l/
  11. 6% of plays lack weather data. Hours spent diagnosing missing or bad data. Hours spent downloading data. http://www.flickr.com/photos/37611179@N00/2295452969/in/photolist-4uQNck-5SRuWS-5WYBDL-677pYM-7cscT7-7vyC7G-7XRk46-84U1Ft-ayVaRS-7ReJrS-dpXi1U-8cTwQ1-7Pq9iE-bEo82F-98LeR5-9Ue2aF-b3vtrz-7YWv62
  12. Plays by yard line: 100-81: 9%; 80: 3%; 79-50: 41%; 49-21: 28%; 20-0: 18%. http://en.wikipedia.org/wiki/File:Acre_over_football_field.svg http://www.flickr.com/photos/10792703@N07/5753103429/in/photolist-9Loadr-cFWwK5-7EF4kv-d8HppU-aWhuw6-8HBrik-9X7RqK-9XaR7f-e81wbX-89PW2o-8u8GKc-dCM1x1-9bbf31-8Mco3M-ck72kf-bmuLcL-dPUGbG-8HEzxY-bSMizz-92FLxy-7LCu9g-8qcDik-81ASaj-81ASas-81ASam-81ASad-dqfGpZ-9X81MM-ck73Q3-dgnu17-dgnsVy-dgntA5-dgnrba-a85BMW-aBZgcM-beiJi2-boaW1F-7CbZ6C-a9FcCw-8nEGtU-8JwV5X-dAgFZu-doXFTj
  13. Georgia Dome. http://www.flickr.com/photos/ucumari/481430551/sizes/o/
  14. The date of the game is important later on.
  15. http://www.flickr.com/photos/aneebaba/5154335641/sizes/o/
  16. http://www.flickr.com/photos/aneebaba/5154335641/sizes/o/
  17. http://www.flickr.com/photos/zruda/1807289958/in/photolist-3KGQkG-44bNJx-4js8Cg-4pQ1bg-4sNLUK-4wBzkz-4wFGmh-559J6y-5nxQVm-5qnF14-5r9AyS-5r9AJq-5KGLMR-5KGQNx-5W2oxe-5W2oKZ-5W6Gt9-5W6Gvs-6k6HX8-6k6J2B-6k6Jcn-6kaUuC-6kaUQE-6wffW7-7chpaN-dFfSAs-8RsNT8-9Pzgh1-9PwrNF-812vNy-a6s3Ec-8NFpHL-bpjMZq-bpjRu1-bnv3gS-8qemwV-dFfSuG-aKju4r-9gin1L/ http://www.flickr.com/photos/17251027@N00/2190657211/in/photolist-4kzG4V-4qfDjD-5e3UP6-5k4eSa-5m73Pf-5mR3nR-5nSv8u-5qnF14-5rGWN8-5rM4m3-5rM58f-5rMcT7-5rMdB3-5rMeko-5rMeZs-5rMhBN-5rMEqb-5rNvKb-5vrbfb-5zUrSt-5C3LQs-5CcaoK-5Cgq7N-5Cgtko-643317-6433ym-649s84-6EBd5T-6LwGEX-6XnJXg-6Y6D6D-71kkp7-741GVR-741H1z-741H5r-741Hcg-741HfM-741Hja-741Hoa-741HyT-741HBx-741HF6-741HJn-741HMR-741J5p-741J9r-741JdM-741Jiz-741JnM-741Jtv-741JxP http://www.flickr.com/photos/kevharb/3124008816/
  18. http://www.flickr.com/photos/keithallison/2310794054/sizes/o/
  19. There is no direct key between a stadium and a weather station. The average score for games with weather is 21-18, and without weather it is 21-19.
  20. Miami has the worst: 14-18. Pittsburgh has the biggest non-weather advantage: 24-14. http://www.flickr.com/photos/37611179@N00/2295452969/in/photolist-4uQNck-5SRuWS-5WYBDL-677pYM-7cscT7-7vyC7G-7XRk46-84U1Ft-ayVaRS-7ReJrS-dpXi1U-8cTwQ1-7Pq9iE-bEo82F-98LeR5-9Ue2aF-b3vtrz-7YWv62
  21. Used by permission of Lego Police Force https://www.facebook.com/LegoPD
  22. 2008 was the peak, with 29 of 32 teams having an arrest. Commissioner Goodell implemented a personal conduct policy in 2007 for the 2008 season. http://www.thebiglead.com/index.php/2013/07/01/nfl-offseason-arrests-are-up-61-since-roger-goodell-implemented-personal-conduct-policy-in-2007/
  23. Weather is not as big an issue. Arrests are not a big issue. We need to use data to make decisions.
  24. Learn more in the screencast. Use the QuickStart VM.
  25. http://www.flickr.com/photos/paolo_rosa/5062025369/sizes/o/
  26. http://www.flickr.com/photos/billlublin/3973002210/sizes/o/
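
Note 4 lists passer and receiver pairs built from incomplete passes. A minimal Java sketch of extracting such a pair from a play description and counting it, assuming incomplete passes are written like "T.Brady pass incomplete deep left to R.Moss" with the direction sometimes omitted. The pattern, the IncompletePairs class, and its methods are hypothetical; this is not the incompletePass pattern used by the deck's IncompletesMapper, whose definition the slides do not show.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncompletePairs {
    // Assumed description format: "<Passer> pass incomplete [short|deep left|middle|right] to <Receiver>".
    private static final Pattern INCOMPLETE = Pattern.compile(
            "([A-Z]\\.[A-Za-z'-]+) pass incomplete (?:(?:short|deep) (?:left|middle|right) )?to ([A-Z]\\.[A-Za-z'-]+)");

    // Returns "Passer-Receiver" for an incomplete pass, or empty for any other play.
    static Optional<String> pair(String description) {
        Matcher m = INCOMPLETE.matcher(description);
        return m.find() ? Optional.of(m.group(1) + "-" + m.group(2)) : Optional.empty();
    }

    // Counts incomplete passes per passer-receiver pair.
    static Map<String, Integer> count(List<String> descriptions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String d : descriptions) {
            pair(d).ifPresent(p -> counts.merge(p, 1, Integer::sum));
        }
        return counts;
    }
}
```

The sketch only counts; dividing each pair's count by the number of seasons the two players spent together would give the per-season average that note 4 describes.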