Predicting Football
Gabriele Pinto
Collegio Carlo Alberto
MADAS (Master in Data Science for Complex Economic Systems)
1/35
OUTLINE
Introduction
Why and What?
The analysis
2/35
OUTLINE
Introduction: Why and What?
3/35
WHAT IS FOOTBALL? Is it just a game?
4/35
How is data entering football?
• Acquisitions…
• Sensors to track match and player performance…
• And BIG BANKS trying to make predictions on the World Cup…
5/35
The holy grail
6/35
7/35
The leaders of the prediction market: the bookmakers
ESTIMATES OF MARKET SIZE
• WORLD: US$700 billion to 1 trillion a year*
• Of which, 70% is related to football*
• IN ITALY: an increase of 300% over ten years, with total bets collected of €10 billion per year, for a net revenue of €1.3 billion**.
• IN THE USA: sports betting to become fully legal in 2019.
• * Source: "Football betting - the global gambling industry worth billions", BBC, October 2013
• ** Source: AGIMEG-SOLE24_ORE
8/35
What bookmakers offer…
The implied probability is 1/odd. In this example, 1/1.39 ≈ 72% chance for Manchester United to win.
The lowest odd is the bookmaker's favourite result (the bookmaker's prediction)!
9/35
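A minimal Python sketch of the slide's arithmetic, assuming decimal (European) odds. Only the 1.39 home odd comes from the slide; the draw and away odds are hypothetical. The raw values 1/odd sum to more than 1 because of the bookmaker's margin (the overround); dividing by that sum is one common way to obtain probabilities that sum to 1, not necessarily what was done in this project.

```python
def implied_probabilities(odds: dict[str, float], normalise: bool = False) -> dict[str, float]:
    """Return 1/odd for each outcome ("1", "X", "2").

    With normalise=True each value is divided by the sum of all values,
    an assumed (but common) way of removing the bookmaker's margin.
    """
    raw = {outcome: 1.0 / odd for outcome, odd in odds.items()}
    if not normalise:
        return raw
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}


def bookmaker_favourite(odds: dict[str, float]) -> str:
    """The outcome with the lowest odd is the bookmaker's prediction."""
    return min(odds, key=odds.get)


odds = {"1": 1.39, "X": 4.75, "2": 8.50}  # "X" and "2" odds are hypothetical
raw = implied_probabilities(odds)
print(round(raw["1"], 3))           # 0.719
print(round(sum(raw.values()), 3))  # 1.048 (the sum exceeds 1: the overround)
print(bookmaker_favourite(odds))    # 1
```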
Sports betting in academia
10/35
OUTLINE
The analysis
11/35
WORKFLOW
• Data: description and source
• An overview of the dataset
• Feature engineering
• Pre-processing data
• Train/test set
• Selection of the models
• Training of the models
• Inspecting results and interpretation
• Test of the models
12/35
An overview of the dataset
"The European Soccer Database"
Downloadable from Kaggle, or by scraping football-data.co.uk
• Seasons from 2008 to 2016
• Top leagues of 11 European countries
• 25,979 matches
✓ Results (goals scored by the home and away teams)
✓ Betting odds for the two teams from 10 top odds providers
| Home team | Away team | Home team goals | Away team goals | Stage | Season | League |
| Juventus | Milan | 3 | 2 | 2 | 2008/2009 | Serie A |
| Lazio | Sassuolo | 1 | 0 | 3 | 2008/2009 | Serie A |
| Real Madrid | Valencia | 1 | 1 | 5 | 2010/2011 | Premier League |
| … | … | … | … | … | … | … |
13/35
Pre-processing data
'Result' variable:
• 1 if the home team wins
• X if the match is a draw
• 2 if the away team wins
• We face a CLASSIFICATION task: we classify each match as result "1", "X" or "2".
• We treat these results as unordered factors…
15/35
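A small sketch of how the 'Result' label could be derived from the two goal columns, using the rows of the dataset overview as examples. The labels stay strings ("1", "X", "2") rather than integers, matching their treatment as unordered factors; the function name is illustrative, not taken from the project code.

```python
def match_result(home_goals: int, away_goals: int) -> str:
    """Map a final score to the unordered class label "1", "X" or "2"."""
    if home_goals > away_goals:
        return "1"  # home team wins
    if home_goals == away_goals:
        return "X"  # draw
    return "2"      # away team wins


# Scores from the dataset overview: Juventus-Milan 3-2, Lazio-Sassuolo 1-0,
# Real Madrid-Valencia 1-1.
print([match_result(h, a) for h, a in [(3, 2), (1, 0), (1, 1)]])  # ['1', '1', 'X']
```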
Feature engineering
Why do we need to create features?
• TEMPORAL:
  • We cannot use information about the match itself to predict its result, because the prediction must be made before the match starts!
• WHICH FEATURES:
  • Without feature engineering, the only features available are the results of the previous match, team IDs and league IDs… and this might not be enough.
Thus we create…
• Points in the league (win = 3 points, draw = 1 point, loss = 0 points)
• Goals of the teams
• Dummies (season, league, stage…)
• ELO points
• All these features are calculated by team, league and season, cumulatively up to the match day!
16/35
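A minimal sketch of a "points so far" feature that respects the temporal constraint: each match reads the cumulative totals before its own result is added. The `Match` record is hypothetical (field names only loosely follow the dataset columns), and it assumes each team plays at most once per stage, so matches sharing a stage cannot leak into each other.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record, for illustration only.
    season: str
    stage: int  # matchday within the season
    home: str
    away: str
    home_goals: int
    away_goals: int


def points_so_far(matches):
    """For each match (in input order) return (home_points, away_points):
    the league points each team had earned in that season BEFORE the match."""
    # Process matches chronologically: season first, then stage.
    order = sorted(range(len(matches)), key=lambda i: (matches[i].season, matches[i].stage))
    totals = defaultdict(int)  # (season, team) -> cumulative points
    features = [None] * len(matches)
    for i in order:
        m = matches[i]
        # Read the features first, so the match's own result never leaks in.
        features[i] = (totals[(m.season, m.home)], totals[(m.season, m.away)])
        # Then update the totals: win = 3, draw = 1, loss = 0.
        if m.home_goals > m.away_goals:
            totals[(m.season, m.home)] += 3
        elif m.home_goals == m.away_goals:
            totals[(m.season, m.home)] += 1
            totals[(m.season, m.away)] += 1
        else:
            totals[(m.season, m.away)] += 3
    return features


matches = [
    Match("2008/2009", 1, "Juventus", "Milan", 3, 2),
    Match("2008/2009", 2, "Milan", "Lazio", 1, 1),
    Match("2008/2009", 3, "Lazio", "Juventus", 0, 2),
]
print(points_so_far(matches))  # [(0, 0), (0, 0), (1, 3)]
```

The same read-then-update loop extends to goals scored and conceded so far.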
Feature engineering: the final dataset
| Home team | Away team | H_Points so far | A_Points so far | Goals in season (home) | Goals in season (away) | Elo_points_home | Elo_points_away | … |
| Juve | Milan | 68 | 65 | 23 | 15 | Serie A | 2 | … |
| Lazio | Sassuolo | 58 | 18 | 3 | 3 | Serie A | 3 | … |
| Real Madrid | Valencia | 15 | 23 | 5 | 2010/2011 | Premier League | 5 | … |
| … | … | … | … | … | … | … | … | … |
Full set of 17 potential predictors…
Dummies: "season", "country_id", "league_id", "stage", "home_team_api_id", "away_team_api_id"
Team specific: "pointsofar_home", "pointsofar_away", "home_H_points_sofar", "away_A_points_sofar", "goalsofar_home_out", "goalsofar_home_in", "goalsofar_home_diff", "goalsofar_away_out", "goalsofar_away_in", "home_H_goal_out_sofar", "away_A_goal_out_sofar", "home_H_goal_in_sofar", "away_H_goal_in_sofar", "goalsofar_away_diff"
ELO POINTS: "home_H_points_elo_sofar", "away_A_points_elo_sofar"
17/35
Feature engineering: ELO ratings
ELO POINTS
• Named after Arpad Emrick Elo (1903-1992), a Hungarian-born professor of physics.
• First applied to calculate the relative strength of chess players.
• The intuition: in the normal setting we assign 3 points for a win in every match, without taking into account the strength of the opponent. The Elo rating weights the points by the strength of the opponent: if you are a low-ranking team and you beat a high-ranking team, you are assigned more than 3 points. The multiplier reflects the gap in ranking (points) between the two teams, expressed as a ratio:
$$\text{Elo points}_{A\ \text{vs}\ B} = \frac{\text{Normal points}_{A\ \text{vs}\ B}}{\text{Ranking}_A / \text{Ranking}_B}$$
18/35
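A direct Python transcription of the slide's formula, as a sketch. It assumes "ranking" means the league points a team has accumulated so far, and the fallback for a zero ranking (e.g. on the first matchday) is an assumption added to avoid dividing by zero. Note that this is the simplified weighting described on the slide, not the classic chess Elo update with an expected score and a K-factor.

```python
def elo_weighted_points(normal_points, ranking_a, ranking_b):
    """Points for team A against team B, weighted as on the slide:
    normal_points / (ranking_a / ranking_b) == normal_points * ranking_b / ranking_a.
    A low-ranked team beating a high-ranked one earns more than 3 points."""
    if ranking_a <= 0 or ranking_b <= 0:
        # Assumption: rankings can be 0 at the start of a season;
        # fall back to the unweighted points instead of dividing by zero.
        return float(normal_points)
    return normal_points * ranking_b / ranking_a


print(elo_weighted_points(3, 20, 60))  # 9.0 (underdog wins)
print(elo_weighted_points(3, 60, 20))  # 1.0 (favourite wins)
print(elo_weighted_points(0, 20, 60))  # 0.0 (a loss stays at 0)
```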
Training of the models
Which model to use for classification?
• Linear Discriminant Analysis (LDA)
• Random Forest + Boosting (RF)
• Support Vector Machine (SVM)
Although we could in principle use LOGIT for multinomial classification, we prefer other methods since there are more than two classes (1, X, 2), but we will come back to LOGIT later…
Before looking at the results… LET'S GO BACK TO THE DATA AGAIN
20/35
Training of the models
What is the benchmark for our task?
• If we always predict class 1 (home win), we are right in 46% of the test matches. Our benchmark for prediction accuracy is therefore 0.46!
• Bookmakers never predict "X" (draw) as the favourite, but predict 1 in most cases (72.5%). This will be important for the analysis of our results later!
21/35
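The "always predict class 1" benchmark is a majority-class baseline. A minimal sketch on toy labels (not the real test set, where the slide reports 0.46):

```python
from collections import Counter


def majority_baseline(results):
    """Return (most frequent class, accuracy of always predicting it)."""
    if not results:
        raise ValueError("results must not be empty")
    label, count = Counter(results).most_common(1)[0]
    return label, count / len(results)


toy_results = ["1", "1", "X", "2", "1"]  # hypothetical labels
print(majority_baseline(toy_results))    # ('1', 0.6)
```

In a strict setup the majority class would be chosen on the training set and its accuracy measured on the test set.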
Training of the models
What is the benchmark for our task?
If we take the bookmakers' favourite as the prediction, we reach an accuracy of 0.53 (53%) on the whole set.
22/35
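A sketch of the bookmaker benchmark: pick the lowest-odd outcome for each match and count how often it occurs. The odds and results below are hypothetical; with 10 providers in the dataset, which provider (or average) to use is a choice this sketch leaves open. Because the favourite is practically never "X", every draw counts as a miss.

```python
def favourite_accuracy(matches):
    """matches: iterable of (odds, result) pairs, where odds maps "1", "X", "2"
    to decimal odds and result is the actual outcome.
    Returns the share of matches in which the lowest-odd outcome occurred."""
    hits = total = 0
    for odds, result in matches:
        pick = min(odds, key=odds.get)  # ties: first key in insertion order
        hits += pick == result
        total += 1
    return hits / total if total else float("nan")


sample = [  # hypothetical odds and results
    ({"1": 1.39, "X": 4.75, "2": 8.50}, "1"),
    ({"1": 2.10, "X": 3.20, "2": 3.60}, "X"),
    ({"1": 3.80, "X": 3.40, "2": 2.00}, "2"),
    ({"1": 1.80, "X": 3.50, "2": 4.50}, "2"),
]
print(favourite_accuracy(sample))  # 0.5
```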
Training of the models
LDA: what happens when we add variables
We can reach the prediction accuracy of the bookmakers with a relatively simple procedure… BUT this is not FAIR. WHY? WE ARE TRAINING AND TESTING ON THE FULL SET.
23/35
Selection of training and test set
How to choose the TRAINING and TEST sets?
Problem: we have a time constraint to respect! We cannot look into the future, i.e. we cannot use future observations to train a model that predicts past observations.
TIME (total period: 8 years): TRAINING → TEST (2 weeks)
24/35
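A minimal sketch of a time-respecting split, assuming each match is stored as a `(match_date, row)` tuple and that the test set is the 2-week window right after a cutoff date. Moving the cutoff forward one window at a time (with an expanding training set) is an assumed way to repeat the evaluation over the 8-year period; the helper names are illustrative.

```python
from datetime import date, timedelta


def temporal_split(matches, cutoff, test_days=14):
    """Train on everything strictly before `cutoff`; test on the
    `test_days` that follow, so no future match is used for training."""
    end = cutoff + timedelta(days=test_days)
    train = [m for m in matches if m[0] < cutoff]
    test = [m for m in matches if cutoff <= m[0] < end]
    return train, test


def rolling_splits(matches, first_cutoff, last_cutoff, test_days=14):
    """Yield (cutoff, train, test), moving the cutoff forward one test
    window at a time (expanding training window)."""
    cutoff = first_cutoff
    while cutoff <= last_cutoff:
        train, test = temporal_split(matches, cutoff, test_days)
        yield cutoff, train, test
        cutoff += timedelta(days=test_days)


matches = [
    (date(2016, 1, 1), "match A"),
    (date(2016, 1, 10), "match B"),
    (date(2016, 1, 20), "match C"),
    (date(2016, 2, 10), "match D"),
]
train, test = temporal_split(matches, cutoff=date(2016, 1, 15))
print(len(train), len(test))  # 2 1
```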
Training of the models
LDA: TEST/TRAIN*
Apparently we are not as good as the bookmakers… so we try another model.
25/35
Training of
the models
SVM (Support Vector Machines)
NO BIG CHANGES!
26/35
Training of
the models
Random Forest
Size of trees…
27/35
Training of the models
Apply boosting to Random Forest
Changing the boosting parameters does not have a big effect…
28/35
Training of the models
Recap of the results of all models
How can we explain these results?
* The 'boosted' RF is not included, as it was computed on only one test size.
29/35
Test of the models
CONFUSION MATRIX
Does the problem have to do with the X's?
31/35
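A stdlib-only sketch of the confusion matrix and per-class recall, which makes the draw problem visible: a model that never predicts "X" has zero recall on draws. The actual and predicted labels are toy values, not the project's results.

```python
from collections import Counter

LABELS = ("1", "X", "2")


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def recall_per_class(matrix, labels=LABELS):
    """Share of each actual class that was predicted correctly."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else float("nan")
    return recall


# Toy predictions from a model that never predicts a draw.
actual = ["1", "1", "X", "X", "2", "2", "1", "X"]
predicted = ["1", "1", "1", "1", "2", "1", "1", "2"]
m = confusion_matrix(actual, predicted)
print(m)                    # [[3, 0, 0], [2, 0, 1], [1, 0, 1]]
print(recall_per_class(m))  # {'1': 1.0, 'X': 0.0, '2': 0.5}
```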
Test of the models
The problem of the "X"
We are not able to predict the X's (draws)…
32/35
Test of the
models
How can we detect the X's?
33/35
Test of the
models
How can we detect the X's?
34/35
Test of the models
Potential evolution of the project
• Look for more and more "refined" predictors (e.g. player stats, team stats, newspapers…)
• Other prediction problems (e.g. number of goals, strategies…)
34/35
Test of the models
Conclusions
1. The issue seems to lie in the difficulty of finding good predictors for draws… ultimately, predicting football matches is very difficult…
2. …but reaching the bookmakers' performance seems "achievable".
Running of the project
| Running time of the script | Data preparation | Modelling and results | Presentation | Total |
| 45 minutes | 3 days | 2 days | 1 day | 6 days |
35/35
Bookmakers' probability assigned to the Icelandic football team reaching the quarter-finals and winning against England = 0.002

More Related Content

What's hot

Raster scan system & random scan system
Raster scan system & random scan systemRaster scan system & random scan system
Raster scan system & random scan systemshalinikarunakaran1
 
Image Processing using Matlab ( using a built in Highboost filtering,averagin...
Image Processing using Matlab ( using a built in Highboost filtering,averagin...Image Processing using Matlab ( using a built in Highboost filtering,averagin...
Image Processing using Matlab ( using a built in Highboost filtering,averagin...Majd Khaleel
 
Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)cairo university
 
Cohen sutherland line clipping
Cohen sutherland line clippingCohen sutherland line clipping
Cohen sutherland line clippingMani Kanth
 
Lab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid ShaikhLab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid Shaikhkhalidsheikh24
 
COM2304: Digital Image Fundamentals - I
COM2304: Digital Image Fundamentals - I COM2304: Digital Image Fundamentals - I
COM2304: Digital Image Fundamentals - I Hemantha Kulathilake
 
Multimodal Learning with Severely Missing Modality.pptx
Multimodal Learning with Severely Missing Modality.pptxMultimodal Learning with Severely Missing Modality.pptx
Multimodal Learning with Severely Missing Modality.pptxSangmin Woo
 
Midpoint circle algo
Midpoint circle algoMidpoint circle algo
Midpoint circle algoMohd Arif
 
Summer training matlab
Summer training matlab Summer training matlab
Summer training matlab Arshit Rai
 
Probabilistic modeling in deep learning
Probabilistic modeling in deep learningProbabilistic modeling in deep learning
Probabilistic modeling in deep learningDenis Dus
 
Connected component labeling algorithm
Connected component labeling algorithmConnected component labeling algorithm
Connected component labeling algorithmManas Mantri
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsKen Kuroki
 

What's hot (20)

BRESENHAM’S LINE DRAWING ALGORITHM
BRESENHAM’S  LINE DRAWING ALGORITHMBRESENHAM’S  LINE DRAWING ALGORITHM
BRESENHAM’S LINE DRAWING ALGORITHM
 
Blurred image recognization system
Blurred image recognization systemBlurred image recognization system
Blurred image recognization system
 
Raster scan system & random scan system
Raster scan system & random scan systemRaster scan system & random scan system
Raster scan system & random scan system
 
Image Processing using Matlab ( using a built in Highboost filtering,averagin...
Image Processing using Matlab ( using a built in Highboost filtering,averagin...Image Processing using Matlab ( using a built in Highboost filtering,averagin...
Image Processing using Matlab ( using a built in Highboost filtering,averagin...
 
Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)
 
Siamese networks
Siamese networksSiamese networks
Siamese networks
 
Perception
PerceptionPerception
Perception
 
Cohen sutherland line clipping
Cohen sutherland line clippingCohen sutherland line clipping
Cohen sutherland line clipping
 
Shape Features
 Shape Features  Shape Features
Shape Features
 
AlexNet
AlexNetAlexNet
AlexNet
 
Lab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid ShaikhLab manual of Digital image processing using python by khalid Shaikh
Lab manual of Digital image processing using python by khalid Shaikh
 
COM2304: Digital Image Fundamentals - I
COM2304: Digital Image Fundamentals - I COM2304: Digital Image Fundamentals - I
COM2304: Digital Image Fundamentals - I
 
Image Sensing & Acquisition
Image Sensing & AcquisitionImage Sensing & Acquisition
Image Sensing & Acquisition
 
Multimodal Learning with Severely Missing Modality.pptx
Multimodal Learning with Severely Missing Modality.pptxMultimodal Learning with Severely Missing Modality.pptx
Multimodal Learning with Severely Missing Modality.pptx
 
Midpoint circle algo
Midpoint circle algoMidpoint circle algo
Midpoint circle algo
 
Summer training matlab
Summer training matlab Summer training matlab
Summer training matlab
 
Probabilistic modeling in deep learning
Probabilistic modeling in deep learningProbabilistic modeling in deep learning
Probabilistic modeling in deep learning
 
Mc culloch pitts neuron
Mc culloch pitts neuronMc culloch pitts neuron
Mc culloch pitts neuron
 
Connected component labeling algorithm
Connected component labeling algorithmConnected component labeling algorithm
Connected component labeling algorithm
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and Physics
 

Similar to Predicting Football Results Using Machine Learning Models

Demo_NextMatch
Demo_NextMatchDemo_NextMatch
Demo_NextMatchYuwei Liu
 
NBA playoff prediction Model.pptx
NBA playoff prediction Model.pptxNBA playoff prediction Model.pptx
NBA playoff prediction Model.pptxrishikeshravi30
 
Predicting the NBA MVP
Predicting the NBA MVPPredicting the NBA MVP
Predicting the NBA MVPThinkful
 
Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...Simplilearn
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Treesananth
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in AgricultureAman Vasisht
 
Penggambaran Data dengan Grafik
Penggambaran Data dengan GrafikPenggambaran Data dengan Grafik
Penggambaran Data dengan Grafikanom0164
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à ZAlexia Audevart
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...PATHALAMRAJESH
 
Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0PMILebanonChapter
 
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
User Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayUser Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayAhmed Hassan
 
Players Movements and Team Performance
Players Movements and Team PerformancePlayers Movements and Team Performance
Players Movements and Team PerformanceUniversity of Salerno
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Machine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparisonMachine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparisonAlain Chabrier
 
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...PAPIs.io
 
Ratios And Proportions Notes
Ratios And Proportions NotesRatios And Proportions Notes
Ratios And Proportions NotesJeremy Shortess
 
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E..."Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...Yelp Engineering
 

Similar to Predicting Football Results Using Machine Learning Models (20)

Demo_NextMatch
Demo_NextMatchDemo_NextMatch
Demo_NextMatch
 
NBA playoff prediction Model.pptx
NBA playoff prediction Model.pptxNBA playoff prediction Model.pptx
NBA playoff prediction Model.pptx
 
Predicting the NBA MVP
Predicting the NBA MVPPredicting the NBA MVP
Predicting the NBA MVP
 
Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
 
07 learning
07 learning07 learning
07 learning
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 
Penggambaran Data dengan Grafik
Penggambaran Data dengan GrafikPenggambaran Data dengan Grafik
Penggambaran Data dengan Grafik
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à Z
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
 
Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0
 
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
 
User Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayUser Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-Play
 
Players Movements and Team Performance
Players Movements and Team PerformancePlayers Movements and Team Performance
Players Movements and Team Performance
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Machine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparisonMachine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparison
 
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
 
Ratios And Proportions Notes
Ratios And Proportions NotesRatios And Proportions Notes
Ratios And Proportions Notes
 
Machine learning meetup
Machine learning meetupMachine learning meetup
Machine learning meetup
 
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E..."Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
 

Recently uploaded

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Recently uploaded (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Predicting Football Results Using Machine Learning Models

  • 1. Predicting Football Gabriele Pinto Collegio Carlo Alberto MADAS (Master in Data Science for Complex Economic Systems) 1/35
  • 2. OUTLINE Introduction Why and What ? The analysis 2/35
  • 4. WHAT IS FOOTBALL? Is it just a game? 4/35
  • 5. How data is entering into football ? • Acquisitions… • Sensors to track match and players performance… • And BIG BANKS trying to make prediction on world cup…. 5/35
  • 8. The leader of the markets for predictions: the bookmakers ESTIMATES OF MARKET SIZE • WORLD: 700bn-1trillion US$ a year • Of which, 70% related to football • IN ITALY: increase of 300% over ten years, total bets collected of 10 billion € per year. For a net revenue of 1,3 billion €**. • IN USA: sports betting to become fully legal in 2019. • *Source: “Football betting - the global gambling industry worth billions”, BBC, October 2013, link • ** Source: AGIMEG-SOLE24_ORE, link 8/35
  • 9. What bookmakers offer…. Implied probability is 1/odd……in this example 1/1,39 = 71% chance for Manchester United to win The lowest odd is the bookmaker’s favourite result! (the bookmaker prediction) !! 9/35
  • 10. Sports betting in the Academia 10/35
  • 12. WORKFLOW • Data – description • Source An overview of the dataset Feature enginnering Pre- processing data Train/Test set Selection of the models Training of the models Inspecting results and interpretation Test of the models 12/35
  • 13. An overview of the dataset “The European soccer database” Downloadable from Kaggle or scraping on football-data.co.uk • Seasons from 2008 to 2016 • 11 European top countries Leagues • 25,979 matches ✓Results (Goal scored by home and away team) ✓Betting odds for the two teams from 10 top odds providers Home team Away team Home Team Goal Away Team Goal Stage Season League Juventus Milan 3 2 2 2008/2009 Serie A Lazio Sassuolo 1 0 3 2008/2009 Serie A Real Madrid Valencia 1 1 5 2010/2011 Premier League …. … … … … … … 13/35
  • 15. Pre-processing data: the ‘Result’ variable is 1 if the home team wins, X if it is a draw and 2 if the away team wins. • This is a CLASSIFICATION task: we classify each match as result “1”, “X” or “2”. • We treat these results as unordered factors (categories with no order)…
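The labelling rule fits in a small function. This is a minimal sketch with a hypothetical function name; the labels are kept as strings so that a model treats them as unordered categories, mirroring the “unordered factors” choice above, rather than as numbers with an implied order.

```python
RESULT_CLASSES = ("1", "X", "2")  # unordered categories, kept as strings


def match_result(home_goals: int, away_goals: int) -> str:
    """Label a match "1" (home win), "X" (draw) or "2" (away win)."""
    if home_goals > away_goals:
        return "1"
    if home_goals == away_goals:
        return "X"
    return "2"


# Sample rows from slide 13: (home goals, away goals).
samples = {"Juventus-Milan": (3, 2), "Lazio-Sassuolo": (1, 0), "Real Madrid-Valencia": (1, 1)}
labels = {match: match_result(h, a) for match, (h, a) in samples.items()}
```

Applied to the sample rows, Juventus 3-2 Milan and Lazio 1-0 Sassuolo are labelled "1", while Real Madrid 1-1 Valencia is labelled "X".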
  • 16. Feature engineering • TEMPORAL: we cannot use information about the match itself to predict its result, because the prediction must be made before the match starts! • WHICH FEATURES: without feature engineering, the only features available are the results of previous matches, team IDs and league IDs, and this might not be enough. That is why we need to create features. Thus we create: • Points in the league (win = 3 points, draw = 1 point, loss = 0 points) • Goals of the teams • Dummies (season, league, stage…) • Elo points • All these features are calculated by team, league and season, cumulatively up to the match day!
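The temporal constraint can be enforced by reading each team's running totals before updating them with the current match. A minimal sketch, assuming matches are plain dicts already sorted by date and that a team plays at most once per day; the input keys (`home_team`, `home_goals`, …) and output names are hypothetical, loosely mirroring `pointsofar_home`/`pointsofar_away` from slide 17. Totals are keyed by (season, team), so they restart every season.

```python
from collections import defaultdict


def add_sofar_features(matches):
    """Return copies of `matches` (dicts sorted by date) with each team's
    points and goals scored so far in the season, from earlier matches only."""
    points = defaultdict(int)  # (season, team) -> league points so far
    goals = defaultdict(int)   # (season, team) -> goals scored so far
    rows = []
    for m in matches:
        home = (m["season"], m["home_team"])
        away = (m["season"], m["away_team"])
        row = dict(m)
        # Read the running totals BEFORE this match updates them (no leakage).
        row["home_points_sofar"] = points[home]
        row["away_points_sofar"] = points[away]
        row["home_goals_sofar"] = goals[home]
        row["away_goals_sofar"] = goals[away]
        rows.append(row)
        # Now update with this match: win = 3, draw = 1, loss = 0.
        hg, ag = m["home_goals"], m["away_goals"]
        if hg > ag:
            points[home] += 3
        elif hg == ag:
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3
        goals[home] += hg
        goals[away] += ag
    return rows
```

The same read-then-update pattern extends to goals conceded, goal difference or Elo points: whatever is stored on a row must have been computed from strictly earlier matches.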
  • 17. Feature engineering: the final dataset. Columns: Home team, Away team, H_Points so far, A_Points so far, Goals in season (home), Goals in season (away), Elo_points_home, Elo_points_away, … Sample rows: Juve vs Milan, Lazio vs Sassuolo, Real Madrid vs Valencia, … Full set of 17 potential predictors, grouped into dummies, team-specific features and Elo points: "season", "country_id", "league_id", "stage", "home_team_api_id", "away_team_api_id", "pointsofar_home", "pointsofar_away", "home_H_points_sofar", "away_A_points_sofar", "goalsofar_home_out", "goalsofar_home_in", "goalsofar_home_diff", "goalsofar_away_out", "goalsofar_away_in", "home_H_goal_out_sofar", "away_A_goal_out_sofar", "home_H_goal_in_sofar", "away_H_goal_in_sofar", "goalsofar_away_diff", "home_H_points_elo_sofar", "away_A_points_elo_sofar"
  • 18. Feature engineering: Elo ratings. ELO POINTS • Named after Arpad Emmerich Elo (1903-1992), Hungarian-born professor of physics. • Originally devised to calculate the relative strength of chess players. • The intuition: normally a win is worth 3 points, whatever the strength of the rival. The Elo rating weights the points by the strength of the rival: if you are a low-ranking team and you win against a «high-ranking» team, you are assigned more than 3 points. The «multiplier» reflects the gap in ranking (points) between the two teams: $\text{Elo points}_{A\ \text{vs}\ B} = \dfrac{\text{Normal points}_{A\ \text{vs}\ B}}{\text{Ranking}_a / \text{Ranking}_b}$
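The slide's formula can be sketched as follows, under the reading that a ranking is a positive strength score such as league points (higher means stronger); that interpretation and the function name are assumptions. It is simpler than the classic chess Elo system, which instead updates ratings through an expected score $1/(1 + 10^{(R_b - R_a)/400})$.

```python
def elo_points(normal_points: float, ranking_a: float, ranking_b: float) -> float:
    """Elo-weighted points for team A after playing team B (slide 18 formula).

    normal_points: 3 for a win, 1 for a draw, 0 for a loss.
    ranking_a, ranking_b: positive strength scores (assumed: league points),
    where higher means stronger.
    """
    if ranking_a <= 0 or ranking_b <= 0:
        raise ValueError("rankings must be positive")
    # normal_points / (ranking_a / ranking_b), written as one product to avoid
    # an extra rounding step: wins against stronger rivals are worth more.
    return normal_points * ranking_b / ranking_a
```

For example, a team with ranking 20 beating a rival with ranking 60 earns `elo_points(3, 20, 60) == 9.0`, while the stronger team winning the reverse fixture earns `1.0`. A ranking of 0 (e.g. at the start of a season) needs some smoothing, which the slide does not specify.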
  • 20. Training of the models: which model to use for classification? • Linear Discriminant Analysis (LDA) • Random Forest + Boosting (RF) • Support Vector Machine (SVM). Although we could in principle use a multinomial LOGIT, since there are more than two classes (1, X, 2) we prefer other methods, but we will come back to LOGIT later… Before looking at the results, LET’S GO BACK TO THE DATA.
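A sketch of how the three model families could be compared with scikit-learn. It runs on synthetic data from `make_classification` (17 features, 3 classes standing in for 1, X and 2), not on the match dataset, and uses a plain random split only for brevity; real match data needs a time-ordered split. `GradientBoostingClassifier` is swapped in as the boosting step: it boosts shallow trees rather than boosting a random forest. All hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the match features: 3 classes play the role of "1", "X", "2".
X, y = make_classification(n_samples=600, n_features=17, n_informative=6,
                           n_classes=3, random_state=0)
# Random split for the demo only; matches must be split by date instead.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "Boosting": GradientBoostingClassifier(random_state=0),
    # SVMs are sensitive to feature scale, so standardise first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
# fit() returns the fitted estimator; score() is the test-set accuracy.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Keeping every model behind the same `fit`/`score` interface makes it easy to swap in a multinomial logit later (e.g. `LogisticRegression`) and compare it on the same split.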
  • 21. Training of the models: what is the benchmark for our task? • If we always predict class 1, we are right in 46% of the test cases, so our benchmark prediction accuracy is 0.46! • Bookmakers never make “X” (draw) the favourite, and they favour 1 in most cases (72.5%). This will be important for the analysis of our results later!
  • 22. Training of the models: what is the benchmark for our task? If we take the bookmakers’ favourite as our prediction, we get an accuracy of 0.53 (53%) on the whole set.
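Both benchmarks are easy to compute. A minimal sketch with hypothetical helper names and toy data: `results` holds the actual outcomes and `odds` one (home, draw, away) decimal-odds tuple per match, with the favourite taken as the outcome with the lowest odds.

```python
from collections import Counter

LABELS = ("1", "X", "2")


def majority_baseline(results):
    """Accuracy of always predicting the most frequent class."""
    label, count = Counter(results).most_common(1)[0]
    return label, count / len(results)


def favourite_accuracy(results, odds):
    """Share of matches won by the bookmaker's favourite (lowest odds)."""
    hits = 0
    for result, triple in zip(results, odds):
        favourite = LABELS[triple.index(min(triple))]
        hits += favourite == result
    return hits / len(results)


# Toy data, not from the dataset.
results = ["1", "1", "X", "2", "1"]
odds = [(1.5, 4.0, 6.0), (2.1, 3.3, 3.4), (1.8, 3.5, 4.5), (3.0, 3.4, 2.2), (2.5, 3.1, 2.8)]
baseline = majority_baseline(results)
fav_acc = favourite_accuracy(results, odds)
```

On the real data these are the quantities behind the 0.46 and 0.53 figures above; any model has to beat the first and, ideally, approach the second.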
  • 23. Training of the models: LDA, what happens when we add variables. We can reach the bookmakers’ prediction accuracy with a relatively simple procedure… BUT this is not FAIR. WHY? WE ARE TRAINING AND TESTING ON THE FULL SET, so the accuracy is measured on matches the model has already seen.
  • 24. Selection of training and test set: how to choose the TRAINING and TEST sets? Problem: we have a time constraint to respect! We cannot look into the future, i.e. we cannot use future observations to train a model that will predict past observations. Timeline (total period: 8 years): TRAINING on earlier matches, then a TEST window of 2 weeks.
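One way to respect the time constraint is an expanding window: train on every match before a cut-off date, test on the following two weeks, then move the cut-off forward. A minimal sketch assuming a date-sorted list of `datetime.date` values; the function name and the choice of an expanding (rather than fixed-length) training window are assumptions.

```python
from datetime import date, timedelta


def rolling_time_splits(dates, first_test_start, window=timedelta(weeks=2)):
    """Yield (train_idx, test_idx) pairs in chronological order.

    Training uses every match strictly before the window; testing uses the
    matches inside [start, start + window). The window then moves forward.
    """
    last = max(dates)
    start = first_test_start
    while start <= last:
        stop = start + window
        train = [i for i, d in enumerate(dates) if d < start]
        test = [i for i, d in enumerate(dates) if start <= d < stop]
        if train and test:
            yield train, test
        start = stop


# Toy example: one match per week from 1 August 2015.
match_dates = [date(2015, 8, 1) + timedelta(weeks=k) for k in range(8)]
splits = list(rolling_time_splits(match_dates, date(2015, 8, 29)))
```

Every test match is later than every training match in its split, which is exactly the "no looking into the future" rule; scikit-learn's `TimeSeriesSplit` offers a similar idea based on row counts rather than calendar windows.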
  • 25. Training of the models: LDA, TEST/TRAIN*. Apparently we are not as good as the bookmakers, so we try another model.
  • 26. Training of the models: SVM (Support Vector Machines). NO BIG CHANGES!
  • 27. Training of the models: Random Forest, size of trees…
  • 28. Training of the models: applying boosting to Random Forest. Changing the boosting parameters does not have a big effect…
  • 29. Training of the models: recap of the results of all models*. How can we explain these results? *I do not include the ‘boosted’ RF, as it is calculated on only one test size.
  • 31. Test of the models: CONFUSION MATRIX. Does the problem have to do with the X’s?
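The confusion matrix and per-class recall make the X problem visible. A minimal sketch with hypothetical names and toy predictions from a model that never predicts a draw: the X column of the matrix stays at zero and the recall for X is 0.

```python
LABELS = ("1", "X", "2")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def recall_per_class(matrix, labels=LABELS):
    """Share of the actual matches of each class that were predicted correctly."""
    recall = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        recall[label] = matrix[i][i] / total if total else float("nan")
    return recall


# Toy example: the model only ever answers "1" or "2".
y_true = ["1", "X", "2", "X", "1", "2"]
y_pred = ["1", "1", "2", "1", "1", "1"]
cm = confusion_matrix(y_true, y_pred)
rec = recall_per_class(cm)
```

Overall accuracy can look acceptable while one class is never predicted, which is why the per-class view matters here.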
  • 32. Test of the models: the problem of the “X”. We are not able to predict X’s…
  • 33. Test of the models: how to detect the ‘X’?
  • 34. Test of the models: how to detect the ‘X’?
  • 35. Test of the models: potential evolution of the project • Look for more and more “refined” predictors (e.g. player stats, team stats, newspapers…) • Other prediction problems (e.g. number of goals, strategies…)
  • 36. Test of the models: Conclusions. 1. The issue seems to be the difficulty of finding good predictors for draws… in the end, predicting football matches is very difficult… 2. …but reaching the bookmakers’ performance seems “achievable”. Running of the project:
      Running time of the script | 45 minutes
      Data preparation | 3 days
      Modelling and results | 2 days
      Presentation | 1 day
      Total | 6 days
  • 37. Bookmakers’ probability assigned to the Icelandic football team of reaching the quarter-finals by beating England = 0.002