How smart is Football Data Analytics today?
Dr. Stefan Kühn
data2day - Karlsruhe
29.09.2015
Topic
Why Football Data Analytics?
• It’s about Football
• There is a lot of data out there
• There is a lot of ignorance out there
• Three examples
• Corners
• Marginal goals
• Substitutions
• Alternatives
Infos
Why Football is an interesting Use Case
• 209 FIFA federations - worldwide
• Most popular sport - 3.3-3.5 billion fans
• Monetary facts - revenue (Deloitte Money League)
• Real Madrid 2013/14: 549.5 Million € (Position 1)
• Bayern Munich 2013/14: 487.5 Million € (Position 3)
• Everton 2013/14: 144.1 Million € (Position 20)
• Social Media facts (Deloitte Money League)
• Facebook: FC Barcelona - 81.4 Million Likes
• Twitter: Real Madrid - 14.4 Million Followers
Some Stats
Why Football is a Data Use Case
• 306 Bundesliga matches per season
• 2000+ recorded events per match
• 512 Bundesliga players
• Live Statistics (Opta, Prozone etc.):
• Shots, Passes, Assists
• Tackles, Blocks, intercepted Passes
• Saves and other actions of Goalkeepers
• Fouls and Foul types
• Position Data including time stamps
• 1.8 Million Amateur matches (Germany)
Some Remarks
Is there anything left to do?
• Big companies like SAP are involved
• Players are tracked in training and matches (and sometimes at home as well)
• Physiological data, nutrition data, training plans
★ BUT:
Big data is not about the data.
(Gary King, Harvard University, 2013)
It’s about Analytics.
Some Remarks
Where is the ignorance?
• "The Numbers Game: Why Everything You Know About Football Is Wrong"
• Book by Chris Anderson (former Cornell University professor) and David Sally (economics and behavioral game theory)
• "Is it easier to score as a sub?"
• Blog post by Dan Altman, founder of North Yard Analytics
Ignorance - Part 1
Corners
Long corners versus short corners
Claim: Long corners are overrated, short corners are better, see e.g. Barca.
Corners
Some useful stats
• Average number of goals per team per match: 1.3
• Average number of corners per team per match: 5
• Long corners account for ~8.5% of all goals
• Silly question: The average team scores once every ten games from a penalty. Should it give up on penalties as well?
• Lack of relevant context
• How efficient are the alternatives?
• How efficient is the average possession?
Corners
Average Possession
• Average number of possessions per team per match: 200
• Average number of goals per team per match: 1.3
• Expectation value per possession: 0.0065
• Normalized per match (200 possessions):
• All possessions are corners: 4.4 goals
• Half of the possessions are corners: 2.85 goals
• 10% of the possessions are corners: 1.46 goals
• The efficiency of long corners is more than three times as high as the efficiency of the average possession.
• Still unknown:
• How efficient are the alternatives?
• Are there any negative counter effects?
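The arithmetic on this slide can be retraced in a few lines. This is a minimal sketch, assuming that all five corners per match are long corners and that non-corner possessions score at the overall average rate. The constant and function names are illustrative and do not come from the talk.

```python
# Figures taken from the slides (per team, per match).
GOALS_PER_MATCH = 1.3
CORNERS_PER_MATCH = 5
POSSESSIONS_PER_MATCH = 200
LONG_CORNER_GOAL_SHARE = 0.085  # "~8.5% of all goals"

# Expected goals from one average possession.
goals_per_possession = GOALS_PER_MATCH / POSSESSIONS_PER_MATCH

# Expected goals from one corner.
# Assumption: every one of the 5 corners per match is a long corner.
goals_per_corner = LONG_CORNER_GOAL_SHARE * GOALS_PER_MATCH / CORNERS_PER_MATCH


def expected_goals(corner_fraction: float) -> float:
    """Expected goals per match if this fraction of possessions were corners.

    Assumption: all remaining possessions score at the overall average rate.
    """
    corners = corner_fraction * POSSESSIONS_PER_MATCH
    others = POSSESSIONS_PER_MATCH - corners
    return corners * goals_per_corner + others * goals_per_possession


# How much more efficient a corner is than an average possession.
efficiency_ratio = goals_per_corner / goals_per_possession

if __name__ == "__main__":
    print(f"goals per possession: {goals_per_possession:.4f}")
    print(f"goals per corner:     {goals_per_corner:.4f}")
    print(f"efficiency ratio:     {efficiency_ratio:.2f}")
    for fraction in (1.0, 0.5, 0.1, 0.05):
        print(f"{fraction:>5.0%} corners -> {expected_goals(fraction):.2f} goals")
```

With these inputs the all-corners and half-corners cases reproduce the slide's 4.4 and 2.85 up to rounding. The 10% case evaluates to about 1.61, whereas the slide's 1.46 appears to correspond to 10 corners per match, i.e. 5% of possessions.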
Ignorance - Part 2
Marginal Goals
Claim: Some goals count more than others, and players should be rated accordingly.
Marginal goals
Why they should have bought Darren Bent
What do you think?
Marginal goals
Why they should have bought a book on hypothesis testing
• How many second goals could have been scored without the first goal?
• Do the samples for matches with one (own) goal, two goals etc. differ, and if yes (it’s a definite yes: selection bias), how?
• Is a team more likely to score more against weaker teams and less against stronger teams?
• And of course: The events considered here are not statistically independent.
What they should have done
• Compute marginal goals per sample group (e.g. a fixed number of own goals). Within such a group, the first goal cannot be worth fewer marginal points than the second goal, and so on, which is the only reasonable result.
• Do not compare apples and oranges. (In some sense, this is Simpson’s paradox.)
• Or: Hire the best striker for first goals and the best striker for second goals.
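Grouping matches by the number of own goals, as suggested above, also makes it easy to check whether the groups are comparable at all. The following is a minimal sketch, assuming matches are given as `(goals_for, goals_against)` pairs and scored with the usual 3-1-0 points scheme. The function names and the toy results are hypothetical, and this is not the computation from the book or the talk.

```python
from collections import defaultdict
from statistics import mean


def points(goals_for: int, goals_against: int) -> int:
    """League points under the usual 3-1-0 scheme."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def summarize_by_own_goals(matches):
    """Group (goals_for, goals_against) pairs by goals_for.

    Returns {own_goals: (n_matches, mean_goals_against, mean_points)}.
    """
    groups = defaultdict(list)
    for goals_for, goals_against in matches:
        groups[goals_for].append(goals_against)
    return {
        own: (
            len(conceded),
            mean(conceded),
            mean(points(own, c) for c in conceded),
        )
        for own, conceded in sorted(groups.items())
    }


# Hypothetical toy results, made up purely to exercise the function.
toy_matches = [(0, 1), (0, 0), (1, 1), (1, 0), (1, 2), (2, 0), (2, 1)]
summary = summarize_by_own_goals(toy_matches)

if __name__ == "__main__":
    for own_goals, (n, conceded, pts) in summary.items():
        print(f"{own_goals} own goals: n={n}, "
              f"mean conceded={conceded:.2f}, mean points={pts:.2f}")
```

If the mean goals conceded differ between groups, subtracting the mean points of one group from the next compares different kinds of matches rather than the value of one extra goal.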
Ignorance - Part 3
Substitutions and Scoring
Claim: Subs score more than expected
• This is the first correct claim!
• But the effect is still weak and the reason(s) unknown
• Do opponents score more as well?
• Corrections needed
• 36% of subs are forwards
• Individual orders
• Tactical changes
• Lots of other things
Substitutions and Scoring
Only forwards, controlled for time on the field
• Claim: Fatigue is the cause of this effect!
Substitutions and Scoring
A closer look
Estimates for the mean for first and second half
• Analysis: No control for fatigue possible, only control for time spent on the field.
From minute 60 on, the share of subs starts to rise. Effect on number of goals?
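Controlling for time on the field usually means comparing scoring rates per minute played instead of raw goal counts. The sketch below pools goals and minutes for starters and substitutes and reports goals per 90 minutes. The record layout, the names and the toy numbers are assumptions for illustration only and are not the analysis from the blog post.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    """One forward's appearance in one match (hypothetical record layout)."""
    came_on_as_sub: bool
    minutes_played: float
    goals: int


def goals_per_90(appearances) -> float:
    """Pooled scoring rate: total goals per 90 minutes on the field."""
    minutes = sum(a.minutes_played for a in appearances)
    if minutes <= 0:
        raise ValueError("no minutes played in this group")
    goals = sum(a.goals for a in appearances)
    return 90.0 * goals / minutes


def rate_by_role(appearances):
    """Goals per 90 minutes, split into starters and substitutes."""
    starters = [a for a in appearances if not a.came_on_as_sub]
    subs = [a for a in appearances if a.came_on_as_sub]
    return {"starters": goals_per_90(starters), "subs": goals_per_90(subs)}


# Made-up numbers for illustration only, not real match data.
toy = [
    Appearance(came_on_as_sub=False, minutes_played=90, goals=1),
    Appearance(came_on_as_sub=False, minutes_played=60, goals=0),
    Appearance(came_on_as_sub=True, minutes_played=30, goals=1),
    Appearance(came_on_as_sub=True, minutes_played=15, goals=0),
]
rates = rate_by_role(toy)

if __name__ == "__main__":
    for role, rate in rates.items():
        print(f"{role}: {rate:.2f} goals per 90 minutes")
```

This normalizes exposure only. A higher rate for substitutes says nothing by itself about fatigue, since minutes on the field are measured and fatigue is not.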
Substitutions and Scoring
Detected reason: Fatigue, subs are fitter
• What do you think when looking at this graph?
Summary
What are the commonalities in all cases?
• "New" spectacular insights
• Preconceptions
• Confirmation Bias
• Lack of reflection
• Challenging own results?
• Alternative explanations?
• Do not mix up a variable and your interpretation of this variable (fatigue vs. time on field)
• BUT: Data and Tools have been good!
Alternatives
What keeps Football Data Analytics from being smart?
Requirements
+ Scientific Method!
Reality
Tools Data
Money
???
+ Severe Time Constraint
+ Results must impress
What keeps Data Analytics from being smart?
Requirements
+ Scientific Method!
Reality
Tools Data
Money
???
+ Severe Time Constraint
+ Results must impress
Thanks a lot!
And enjoy the game :-)
www.codecentric.de
blog.codecentric.de
stefan.kuehn@codecentric.de
