The Task My approach Conclusions
Competition for the International Conference of Information and Knowledge
Management (CIKM) hosted by Sportsbet
September 16th 2015
My Entry to the Sportsbet Competition
Simone Romano
simone.romano@unimelb.edu.au
@ialuronico
The Task
Task description
The challenges
My approach
How to Build a Model for Predictions
Evaluation of Prediction Error
Conclusions
Summary
What I would have done if I had more time
Task description
Sportsbet competition: predict the outcome of every match in the 2015
AFL season as the probability that Team1 beats Team2.
E.g. Hawthorn (The Hawks) beats Adelaide (The Crows) on the 18th of
September with probability 0.75 (75%) [1]
Two phases:
The Leaderboard Phase: predict the outcome of each regular-season
match in the 2015 AFL season.
(match results are already known)
The Finals Phase: predict the outcome of each match in the 2015 AFL
Finals Series.
(match results are known after the AFL Grand Final)
[1] Implied by the odds for Hawthorn on Monday the 14th of September on
http://www.sportsbet.com.au/betting/australian-rules/afl
I focused on the Leaderboard Phase because its match results are already
known, so the performance of my predictions can be evaluated.
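The 0.75 in the Hawthorn example is described as implied by bookmaker odds. A minimal sketch of one common way to turn decimal odds into probabilities follows. The function name and the odds are hypothetical (the slides do not quote the actual odds), and rescaling so the two probabilities sum to 1 is an assumption about how the bookmaker margin is removed.

```python
from typing import Tuple


def implied_probabilities(odds_team1: float, odds_team2: float) -> Tuple[float, float]:
    """Convert two decimal odds into win probabilities that sum to 1."""
    raw1 = 1.0 / odds_team1
    raw2 = 1.0 / odds_team2
    total = raw1 + raw2  # greater than 1 when the bookmaker keeps a margin
    return raw1 / total, raw2 / total


# Hypothetical odds, not the ones quoted on the Sportsbet page.
p_team1, p_team2 = implied_probabilities(1.30, 3.50)
print(round(p_team1, 3), round(p_team2, 3))
```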
Data provided
The following datasets were provided:
Teams: names of the teams that took part in AFL matches between 2000
and 2015.
Players: names of the players who played in at least one match
between 2000 and 2015.
Seasons: description, results, and statistics of regular-season (non-finals)
matches. E.g. it contains:
  which team is home or away
  venue: venue of the match
  margin: winning margin
Match stats: statistics recorded for each player in every match (including
finals) between 2000 and 2015. E.g. it contains:
  number of kicks performed
  number of goals
Finals: information about the finals matches between 2000
and 2014.
Unplayed: remaining (unplayed) regular-season matches in the 2015
season (dataset released at the end of July 2015).
The Challenges
Target: We want to predict the outcome of matches in the 2015 season using
the data available.
Challenges
Take into account the time constraints: when predicting the outcome of a
match we can only use information about past matches
Obtain low prediction error
Solution
Build an automated prediction model that incorporates information on
matches played between 2000 and 2014. Given 2 teams, Team1 and Team2,
the model predicts the probability for Team1 to win versus Team2.
We wish our model to have low prediction error
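The time constraint above can be sketched as a filter that keeps only matches played strictly before the one being predicted. The record layout and the function name are assumptions for illustration; the rows are the Hawthorn-Adelaide results shown later in the slides.

```python
from typing import Dict, List, Union

Match = Dict[str, Union[int, str]]


def matches_before(history: List[Match], season: int, round_no: int) -> List[Match]:
    """Keep only matches played strictly before (season, round_no)."""
    return [m for m in history if (m["season"], m["round"]) < (season, round_no)]


history: List[Match] = [
    {"season": 2011, "round": 1, "winner": "Adelaide"},
    {"season": 2012, "round": 3, "winner": "Hawthorn"},
    {"season": 2013, "round": 6, "winner": "Hawthorn"},
    {"season": 2014, "round": 17, "winner": "Hawthorn"},
    {"season": 2015, "round": 12, "winner": "Hawthorn"},  # the match being predicted
]

# Features for the 2015 R12 match may only use the four earlier matches.
usable = matches_before(history, 2015, 12)
print(len(usable))  # 4
```

Comparing (season, round) tuples keeps the check in one place, so the target match itself can never leak into its own features.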
Evaluation of Prediction Error
Because we already know the results of the 2015 regular-season matches, we can
compute the logloss error of our predictions. Logloss is also the error used to
score the entries to the competition.
Useful facts about logloss error
logloss = 0: the model outputs only 100% and 0% probabilities, and every
team given 100% wins while every team given 0% loses.
logloss = LARGE: for even just one match, the model predicts a win with
100% probability but the team actually loses the game (the penalty for
that match is unbounded).
logloss = 0.693: all predictions are set to 50% (0.693 = ln 2).
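The three facts above can be checked with a short sketch of the standard binary logloss, the mean of -ln(p) when Team1 wins and -ln(1 - p) when it loses. The exact scoring formula used by the competition is assumed to be this standard one, which the 0.693 = ln 2 value supports. The outcomes below are made up for illustration.

```python
import math
from typing import Sequence


def logloss(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean negative log-likelihood of binary outcomes (1 = Team1 won)."""
    total = 0.0
    for p, y in zip(probabilities, outcomes):
        # math.log(0) raises ValueError: a 100% prediction that turns out
        # wrong has an unbounded penalty.
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(outcomes)


outcomes = [1, 0, 1, 1]  # illustrative results, not competition data
print(logloss([0.5, 0.5, 0.5, 0.5], outcomes))  # ln 2, about 0.693
print(logloss([0.9, 0.1, 0.9, 0.9], outcomes))  # lower: confident and correct
```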
We have to keep in mind that:
Extreme probabilities (e.g. 100% or 0%) should be avoided, because a
single wrong prediction can increase the logloss a lot
Simply being conservative (predicting 50% everywhere) already gives 0.693
This is not an easy task, and some competitors performed really badly.
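One way to act on the first point is to clip predictions away from 0 and 1 before submitting. The sketch below is an illustration, not something the slides say was done: the clipping bound of 0.05 and the ten-match scenario are hypothetical.

```python
import math
from typing import Sequence


def clip(p: float, eps: float = 0.05) -> float:
    """Keep a probability inside [eps, 1 - eps]; eps is a hypothetical choice."""
    return min(max(p, eps), 1.0 - eps)


def logloss(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean negative log-likelihood of binary outcomes (1 = Team1 won)."""
    losses = [-math.log(p) if y == 1 else -math.log(1.0 - p)
              for p, y in zip(probabilities, outcomes)]
    return sum(losses) / len(losses)


# Ten matches predicted at 99.9%; nine are right and one is wrong.
outcomes = [1] * 9 + [0]
overconfident = [0.999] * 10
clipped = [clip(p) for p in overconfident]

print(logloss(overconfident, outcomes))  # dominated by the single miss
print(logloss(clipped, outcomes))        # smaller once predictions are clipped
```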
How to Build a Model for Predictions
Position on the Leaderboard
In two days I managed to finish about halfway up the leaderboard with
logloss = 0.640: position 28 out of 52. The smallest error on the leaderboard
is 0.524.
My Approach
We can build a simple model based on matches between 2000 and 2014 and
the knowledge of:
The teams that are playing
Which team is home and which one is away
Example: Hawthorn (The Hawks) vs Adelaide (The Crows)
Season  Round  Home team  Winner
2011    R01    Adelaide   Adelaide
2012    R03    Hawthorn   Hawthorn
2013    R06    Adelaide   Hawthorn
2014    R17    Adelaide   Hawthorn
2015    R12    Adelaide   ?
Hawthorn won 3 of the 4 previous matches, so we could say that Hawthorn is
going to win with probability 3/4 = 75%. Indeed, Hawthorn won.
The model learns from the results of past matches and outputs this probability
according to this rationale.
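The rationale in the example can be written as a head-to-head win rate. This is a minimal sketch of the counting argument only, not the trained model; the record layout and function name are assumptions.

```python
from typing import Dict, List, Union

Match = Dict[str, Union[int, str]]

# Hawthorn vs Adelaide results taken from the table above.
past: List[Match] = [
    {"season": 2011, "round": "R01", "home": "Adelaide", "winner": "Adelaide"},
    {"season": 2012, "round": "R03", "home": "Hawthorn", "winner": "Hawthorn"},
    {"season": 2013, "round": "R06", "home": "Adelaide", "winner": "Hawthorn"},
    {"season": 2014, "round": "R17", "home": "Adelaide", "winner": "Hawthorn"},
]


def head_to_head_rate(matches: List[Match], team: str) -> float:
    """Fraction of the given past matches that `team` won."""
    wins = sum(1 for m in matches if m["winner"] == team)
    return wins / len(matches)


print(head_to_head_rate(past, "Hawthorn"))  # 0.75
```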
Adding Features
Feature: measurable information about past matches that we can use to predict
the outcome of a match in 2015.
For example, can the “winner margin” in past games help our predictions?
Season  Round  Home team  Winner    Winner margin
2011    R01    Adelaide   Adelaide  20
2012    R03    Hawthorn   Hawthorn  56
2013    R06    Adelaide   Hawthorn  11
2014    R17    Adelaide   Hawthorn  12
2015    R12    Adelaide   ?         ?
We can only use statistics about the margin of previous matches to predict the
probability of Hawthorn winning in 2015. With margins signed from Hawthorn's
point of view (positive when Hawthorn won, negative when it lost):
Mean margin of previous matches (Hawthorn-Adelaide) ⇒ 14.75
Maximum margin of previous matches (Hawthorn-Adelaide) ⇒ 56
Minimum margin of previous matches (Hawthorn-Adelaide) ⇒ -20
But which one is a good predictor?
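The three margin statistics can be reproduced from the table with a few lines. The sketch assumes, as the numbers imply, that a margin counts as negative when the team of interest lost; the function name is hypothetical.

```python
from typing import List, Tuple

# (winner, winner margin) for the four previous Hawthorn-Adelaide matches.
results: List[Tuple[str, int]] = [
    ("Adelaide", 20),
    ("Hawthorn", 56),
    ("Hawthorn", 11),
    ("Hawthorn", 12),
]


def signed_margins(rows: List[Tuple[str, int]], team: str) -> List[int]:
    """Margins from `team`'s point of view: positive for wins, negative for losses."""
    return [margin if winner == team else -margin for winner, margin in rows]


margins = signed_margins(results, "Hawthorn")
mean_margin = sum(margins) / len(margins)

print(mean_margin)   # 14.75
print(max(margins))  # 56
print(min(margins))  # -20
```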
Is Mean Margin a good predictor of winning?
Distribution of the Mean Margin computed on previous games, for matches
2000-2014: games won in red, games lost in blue.
Mean Margin is a good predictor if the two distributions are well separated.
[Figure: histogram of Mean Margin in Previous Games (x-axis, -200 to 200) against Frequency (y-axis, 0 to 100), with series Lose and Win]
Insights
If a team has a Mean Margin of more than 100, it is likely to win
If a team has a Mean Margin of less than -90, it is likely to lose
Min Margin as predictor of winning
[Figure: histogram of Min Margin in Previous Games (x-axis, -200 to 200) against Frequency (y-axis, 0 to 100), with series Lose and Win]
Insights
If a team has been defeated in the past by as many as 150 points it is
likely to lose
Max Margin as predictor of winning
[Figure: histogram of Max Margin in Previous Games (x-axis, -200 to 200) against Frequency (y-axis, 0 to 100), with series Lose and Win]
Insights
If a team has won in the past by as many as 150 points it is likely to win
Other Features
Similarly to the margin of the final score between two teams, we can compute
the margin for other statistics:
Number of Kicks
Number of Inside 50
Number of Disposals
Number of Clearances
Rank of Attributes based on Prediction Errors (Best at the top)
Score [2]  Name
0.0449     Mean Margin Inside 50
0.0408     Mean Margin Score
0.0361     Max Margin Score
0.0325     Mean Margin Disposals
[2] According to Information Gain
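Information Gain, the score used in the ranking above, measures how much knowing a feature reduces the entropy of the win/lose label. The sketch below computes it for a numeric feature split at a single threshold. The threshold split and the toy data are assumptions; the slides do not say how the numeric features were discretised.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[str],
                     threshold: float) -> float:
    """Entropy reduction obtained by splitting `values` at `threshold`."""
    left: List[str] = [y for x, y in zip(values, labels) if x < threshold]
    right: List[str] = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical mean margins and outcomes, for illustration only.
mean_margins = [-30, -12, -5, 4, 18, 40]
results = ["lose", "lose", "win", "lose", "win", "win"]
print(information_gain(mean_margins, results, threshold=0))
```

A feature whose split separates wins from losses perfectly scores the full label entropy, and a feature unrelated to the outcome scores close to 0.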
Evaluation of Prediction Error
I evaluated the model on the prediction of outcomes for 2015 matches:
logloss = 0.682 without statistics (just knowing the teams that are
playing)
logloss = 0.640 with statistics
This is obtained with a black-box model (Random Forest) which is accurate
but difficult to interpret.
Can we get a simpler model?
Interestingly, the simplest model obtained automatically from this data is:
(Mean-Margin ≥ -0.25 AND location = home) ⇒ win with probability 63.8%
else win with probability 36.8%
However, this shows high error: logloss = 0.689 (It does not take into account
the actual teams that are playing)
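The simple rule above can be written directly as a function. The thresholds and probabilities come from the slide. The function name is hypothetical, and reading "location = home" as "the team whose win probability is being predicted plays at home" is an assumption.

```python
def simple_rule(mean_margin: float, is_home: bool) -> float:
    """Win probability from the one-rule model quoted on the slide."""
    if mean_margin >= -0.25 and is_home:
        return 0.638
    return 0.368


print(simple_rule(14.75, True))   # 0.638
print(simple_rule(14.75, False))  # 0.368
```

Because the rule only ever outputs two probabilities, both fairly close to 50%, it stays near the 0.693 baseline, consistent with the reported logloss of 0.689.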
Remark about Data on Previous matches
We have to be careful about taking into account matches played too long ago.
Indeed, the best prediction (according to our features) is obtained only with
matches from 2014:
[Figure: Error in Prediction. x-axis: Least Recent Matches (2000 to 2014); y-axis: logloss]
This is probably because the 2014 teams are very similar to the 2015 teams.
It would be interesting to see which top players moved between teams in the
past years.
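Restricting the history to recent seasons amounts to recomputing each feature on a shorter window. The sketch below does this for the Hawthorn-Adelaide mean margin from the earlier example. The function name is hypothetical, and the sketch only shows how the feature changes with the window, not the resulting logloss.

```python
from typing import List, Optional, Tuple

# (season, margin from Hawthorn's point of view) for Hawthorn vs Adelaide.
history: List[Tuple[int, int]] = [(2011, -20), (2012, 56), (2013, 11), (2014, 12)]


def mean_margin_since(rows: List[Tuple[int, int]], first_season: int) -> Optional[float]:
    """Mean margin using only matches from `first_season` onwards."""
    recent = [margin for season, margin in rows if season >= first_season]
    return sum(recent) / len(recent) if recent else None


for first_season in (2011, 2013, 2014):
    print(first_season, mean_margin_since(history, first_season))
# 2011 14.75
# 2013 11.5
# 2014 12.0
```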
Summary
It is possible to predict the outcome of future matches with reasonable accuracy
with 2 days of work:
Using features obtained from the score margin and from margins based on the
number of inside 50s and the number of disposals
Combining these features using a model (Random Forest, logloss = 0.640),
while still getting insights from each feature individually
Knowing that data about recent matches is more helpful
Accepting a slightly larger error in exchange for model simplicity
Technicalities
I performed feature engineering in Python and predictions with WEKA.
What I would have done if I had more time
There are a number of things that could improve my model but that I did not
have time to try:
Predict the outcome of a match on round X in 2015 based on matches
played in previous rounds in 2015
Use many other statistics: e.g. handballs, tackles
Use data about previously played finals
Introduce player level features: rank all the players based on goals and
count the number of top players a team is going to employ during the
match
Team strategy features (difficult to encode)
Use Sportsbet and other companies’ odds (not fair for my entry but it
would be fair in real practice)
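The player-level idea in the list above (rank players by goals, then count how many top players a team fields) could look like the sketch below. This was not part of the entry; the player names, goal counts, and function names are all hypothetical placeholders.

```python
from typing import Dict, List, Set


def top_scorers(goals_by_player: Dict[str, int], k: int) -> Set[str]:
    """The k players with the most goals."""
    ranked = sorted(goals_by_player, key=lambda name: goals_by_player[name], reverse=True)
    return set(ranked[:k])


def count_top_players(lineup: List[str], top: Set[str]) -> int:
    """Number of players in a team's lineup that belong to the top set."""
    return sum(1 for player in lineup if player in top)


# Hypothetical goal totals and lineup, for illustration only.
goals = {"player_a": 60, "player_b": 12, "player_c": 45, "player_d": 3}
top = top_scorers(goals, k=2)
print(count_top_players(["player_a", "player_b", "player_d"], top))  # 1
```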
Other interesting things other than predicting match outcomes...
It would be interesting to analyze data and see:
if there are players that are correlated with winning/losing games
characteristics of Brownlow Medal winners
probabilities of winning after losing the first/second/third quarters
identifying the ’turning points’ in important matches (which players are
involved in changing the outcome of a match?)
Thank you.
Questions?
Simone Romano
simone.romano@unimelb.edu.au
@ialuronico
Simone Romano
My Entry to the Sportsbet Competition

More Related Content

Similar to My Entry to the Sportsbet/CIKM competition

[Scorebook final]
[Scorebook final][Scorebook final]
[Scorebook final]Freelancer
 
Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...
Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...
Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...AnalyticsConf
 
SFA 2.0 R3 Gamification Function Design
SFA 2.0 R3 Gamification Function DesignSFA 2.0 R3 Gamification Function Design
SFA 2.0 R3 Gamification Function DesignTony (Mingliang) Ye
 
Designing Test Cases (Test Driven Development)
Designing Test Cases (Test Driven Development)Designing Test Cases (Test Driven Development)
Designing Test Cases (Test Driven Development)Marco Beelen
 
Forecasting seo-v1-251010
Forecasting seo-v1-251010Forecasting seo-v1-251010
Forecasting seo-v1-251010Neil Walker
 
La liga 2013 2014 analysis
La liga 2013 2014 analysisLa liga 2013 2014 analysis
La liga 2013 2014 analysisRitu Sarkar
 
L & l awards programs 2014 (1)
L & l awards programs 2014 (1)L & l awards programs 2014 (1)
L & l awards programs 2014 (1)dezimaree
 
Andy Pick: Statistics Presentation
Andy Pick: Statistics PresentationAndy Pick: Statistics Presentation
Andy Pick: Statistics PresentationMilesBuesst
 
Tennis ComStat Presentation 2017
Tennis ComStat Presentation 2017Tennis ComStat Presentation 2017
Tennis ComStat Presentation 2017Tennis ComStat
 
Sports performance modelling in 100 ball
Sports performance modelling in 100 ball  Sports performance modelling in 100 ball
Sports performance modelling in 100 ball Devansh Chawla
 

Similar to My Entry to the Sportsbet/CIKM competition (12)

The Data Behind Football
The Data Behind FootballThe Data Behind Football
The Data Behind Football
 
[Scorebook final]
[Scorebook final][Scorebook final]
[Scorebook final]
 
Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...
Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...
Alex Kornilov: Building Big Data Company in Sports-Betting Industry - BETEGY ...
 
SFA 2.0 R3 Gamification Function Design
SFA 2.0 R3 Gamification Function DesignSFA 2.0 R3 Gamification Function Design
SFA 2.0 R3 Gamification Function Design
 
Designing Test Cases (Test Driven Development)
Designing Test Cases (Test Driven Development)Designing Test Cases (Test Driven Development)
Designing Test Cases (Test Driven Development)
 
Forecasting seo-v1-251010
Forecasting seo-v1-251010Forecasting seo-v1-251010
Forecasting seo-v1-251010
 
La liga 2013 2014 analysis
La liga 2013 2014 analysisLa liga 2013 2014 analysis
La liga 2013 2014 analysis
 
Dynamic League Point System
Dynamic League Point SystemDynamic League Point System
Dynamic League Point System
 
L & l awards programs 2014 (1)
L & l awards programs 2014 (1)L & l awards programs 2014 (1)
L & l awards programs 2014 (1)
 
Andy Pick: Statistics Presentation
Andy Pick: Statistics PresentationAndy Pick: Statistics Presentation
Andy Pick: Statistics Presentation
 
Tennis ComStat Presentation 2017
Tennis ComStat Presentation 2017Tennis ComStat Presentation 2017
Tennis ComStat Presentation 2017
 
Sports performance modelling in 100 ball
Sports performance modelling in 100 ball  Sports performance modelling in 100 ball
Sports performance modelling in 100 ball
 

More from Simone Romano

Startups and you 2021
Startups and you 2021Startups and you 2021
Startups and you 2021Simone Romano
 
A Framework to Adjust Dependency Measure Estimates for Chance
A Framework to Adjust Dependency Measure Estimates for Chance      A Framework to Adjust Dependency Measure Estimates for Chance
A Framework to Adjust Dependency Measure Estimates for Chance Simone Romano
 
Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning
Enhancing Diagnostics for Invasive Aspergillosis using Machine LearningEnhancing Diagnostics for Invasive Aspergillosis using Machine Learning
Enhancing Diagnostics for Invasive Aspergillosis using Machine LearningSimone Romano
 
Predicting the Response to Hepatitis C Therapy
Predicting the Response to Hepatitis C TherapyPredicting the Response to Hepatitis C Therapy
Predicting the Response to Hepatitis C TherapySimone Romano
 
PhD Completion Seminar
PhD Completion Seminar PhD Completion Seminar
PhD Completion Seminar Simone Romano
 

More from Simone Romano (6)

Startups and you 2021
Startups and you 2021Startups and you 2021
Startups and you 2021
 
Startups and You
Startups and YouStartups and You
Startups and You
 
A Framework to Adjust Dependency Measure Estimates for Chance
A Framework to Adjust Dependency Measure Estimates for Chance      A Framework to Adjust Dependency Measure Estimates for Chance
A Framework to Adjust Dependency Measure Estimates for Chance
 
Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning
Enhancing Diagnostics for Invasive Aspergillosis using Machine LearningEnhancing Diagnostics for Invasive Aspergillosis using Machine Learning
Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning
 
Predicting the Response to Hepatitis C Therapy
Predicting the Response to Hepatitis C TherapyPredicting the Response to Hepatitis C Therapy
Predicting the Response to Hepatitis C Therapy
 
PhD Completion Seminar
PhD Completion Seminar PhD Completion Seminar
PhD Completion Seminar
 

Recently uploaded

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONrouseeyyy
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 

Recently uploaded (20)

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 

My Entry to the Sportsbet/CIKM competition

  • 1. The Task My approach Conclusions Competition for the International Conference of Information and Knowledge Management (CIKM) hosted by Sportsbet September 16th 2015 My Entry to the Sportsbet Competition Simone Romano simone.romano@unimelb.edu.au @ialuronico Simone Romano My Entry to the Sportsbet Competition
  • 2. The Task My approach Conclusions The Task Task description The challenges My approach How to Build a Model for Predictions Evaluation of Prediction Error Conclusions Summary What I would have done if I had more time Simone Romano My Entry to the Sportsbet Competition
  • 3. The Task My approach Conclusions Task description Task description Sportbets competition: predict the outcomes of every match in the 2015 AFL season showing the probability that Team1 wins versus Team2. E.g. Hawthorn (The Hawks) wins vs Adelaide (The Crows) on the 18th of September with probability 0.75 (75%)1 Two phases: The Leaderboard Phase prediction of the outcome of each regular-season match in the 2015 AFL season. (match results are already known) The Finals Phase prediction of the outcome of each match in the 2015 AFL Finals Series. (match results are known after AFL Grand Final) 1 Implied by the odds for Hawthorn on Monday the 14th of September on http://www.sportsbet.com.au/betting/australian-rules/afl Simone Romano My Entry to the Sportsbet Competition
  • 4. The Task My approach Conclusions Task description Task description Sportbets competition: predict the outcomes of every match in the 2015 AFL season showing the probability that Team1 wins versus Team2. E.g. Hawthorn (The Hawks) wins vs Adelaide (The Crows) on the 18th of September with probability 0.75 (75%)1 Two phases: The Leaderboard Phase prediction of the outcome of each regular-season match in the 2015 AFL season. (match results are already known) The Finals Phase prediction of the outcome of each match in the 2015 AFL Finals Series. (match results are known after AFL Grand Final) I focused on the Lederboard Phase in order to evaluate the performance of my predictions because we know the match results 1 Implied by the odds for Hawthorn on Monday the 14th of September on http://www.sportsbet.com.au/betting/australian-rules/afl Simone Romano My Entry to the Sportsbet Competition
  • 5. The Task My approach Conclusions Task description Data provided The following datasets were provided: Teams Name of teams which took part in AFL matches between 2000 and 2015. Players Name of players that have played in at least one match between 2000 and 2015. Seasons Description, results, and statistics of regular-season (non-finals) matches. E.g. it contains: which team is home or away venue: venue of the match. margin: winning margin Match stats Statistics recorded for a single player for every match (including finals) between 2000 and 2015. E.g. it contains: number of kicks performed number of goals Finals Contains information about the final matches between 2000 and 2014 Simone Romano My Entry to the Sportsbet Competition
  • 6. The Task My approach Conclusions Task description Data provided The following datasets were provided: Teams Name of teams which took part in AFL matches between 2000 and 2015. Players Name of players that have played in at least one match between 2000 and 2015. Seasons Description, results, and statistics of regular-season (non-finals) matches. E.g. it contains: which team is home or away venue: venue of the match. margin: winning margin Match stats Statistics recorded for a single player for every match (including finals) between 2000 and 2015. E.g. it contains: number of kicks performed number of goals Finals Contains information about the final matches between 2000 and 2014 Unplayed Remaining (unplayed) regular-season matches in the 2015 season. (Dataset release: end of July 2015) Simone Romano My Entry to the Sportsbet Competition
  • 7. The Challenges
      Target: predict the outcome of matches in the 2015 season using the data available.
      Challenges:
      • Respect the time constraints: when predicting the outcome of a match we can only use information about past matches.
      • Obtain low prediction error.
      Solution: build an automated prediction model that incorporates information on matches played between 2000 and 2014. Given two teams, Team1 and Team2, the model predicts the probability that Team1 wins versus Team2. We want the model to have low prediction error.
  • 8. The challenges: Evaluation of Prediction Error
      Because we already know the results of the 2015 matches, we can compute the logloss error of our predictions. Logloss error is used to score the entries to the competition.
      Useful facts about logloss error:
      • logloss = 0: the model generates only 100% and 0% probabilities, and a team always wins when the model says 100% probability of winning and always loses when the model says 0%.
      • logloss = LARGE: for even just one match, the model predicts that a team wins with 100% probability but the team actually loses the game.
      • logloss = 0.693: all predictions are set to 50%.
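The three logloss facts on this slide can be checked numerically. The sketch below uses the standard mean binary log loss with natural logarithms, which is consistent with the 0.693 figure for all-50% predictions. The clipping constant `eps` and the four toy outcomes are assumptions made for illustration, not details taken from the competition rules.

```python
import math

def log_loss(outcomes, probabilities, eps=1e-15):
    """Mean binary log loss.

    outcomes: 1 if Team1 won, 0 otherwise.
    probabilities: predicted P(Team1 wins) for each match.
    eps: clipping constant (an assumption) so that log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(outcomes, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(outcomes)

# Hypothetical toy season of four matches.
outcomes = [1, 0, 1, 1]

all_fifty = log_loss(outcomes, [0.5] * 4)                  # every prediction at 50%
perfect = log_loss(outcomes, [1.0, 0.0, 1.0, 1.0])         # confident and always right
one_confident_miss = log_loss(outcomes, [1.0, 1.0, 1.0, 1.0])  # one 100% call that loses
```

With the clipping in place the single confident miss gives a large but finite score; without clipping it would be infinite, which is why extreme probabilities are risky.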
  • 9. The challenges
      We have to keep in mind that:
      • Extreme probabilities (e.g. 100% or 0%) should be avoided, because a single error can increase the logloss a lot.
      • Just by being conservative (predicting 50% for every match) we can obtain 0.693.
      This is not an easy task, and some competitors performed really badly.
  • 11. How to Build a Model for Predictions: Position on the Leaderboard
      In two days I managed to finish about halfway up the Leaderboard with logloss = 0.640: position 28 out of 52. The smallest error on the leaderboard is 0.524.
  • 12. How to Build a Model for Predictions: My Approach
      We can build a simple model based on matches between 2000 and 2014 and the knowledge of:
      • The teams that are playing
      • Which team is home and which one is away
      Example: Hawthorn (The Hawks) vs Adelaide (The Crows)
        Season  Round  Team      Home  Winner
        2011    R01    Adelaide  home  Adelaide
        2012    R03    Hawthorn  home  Hawthorn
        2013    R06    Adelaide  home  Hawthorn
        2014    R17    Adelaide  home  Hawthorn
        2015    R12    Adelaide  home  ?
      We could say that Hawthorn is going to win with probability 3/4 = 75%. Indeed, Hawthorn won. The model learns from the results of past matches and outputs this probability according to the same rationale.
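The head-to-head rationale on this slide amounts to counting past wins. A minimal sketch follows, using the four past results from the table. The function name `head_to_head_probability` is hypothetical, and this is only the counting intuition, not the Random Forest model used for the actual entry.

```python
# Past Hawthorn vs Adelaide results from the table: (season, round, winner).
past_matches = [
    (2011, "R01", "Adelaide"),
    (2012, "R03", "Hawthorn"),
    (2013, "R06", "Hawthorn"),
    (2014, "R17", "Hawthorn"),
]

def head_to_head_probability(matches, team):
    """Fraction of past head-to-head matches won by `team`."""
    wins = sum(1 for _, _, winner in matches if winner == team)
    return wins / len(matches)

p_hawthorn = head_to_head_probability(past_matches, "Hawthorn")
print(f"P(Hawthorn wins) = {p_hawthorn:.2f}")
```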
  • 13. How to Build a Model for Predictions: Adding Features
      Feature: measurable information about matches that we can use to predict the outcome of a match in 2015. For example, can the “winner margin” in past games help our predictions?
        Season  Round  Team      Home  Winner    Winner margin
        2011    R01    Adelaide  home  Adelaide  20
        2012    R03    Hawthorn  home  Hawthorn  56
        2013    R06    Adelaide  home  Hawthorn  11
        2014    R17    Adelaide  home  Hawthorn  12
        2015    R12    Adelaide  home  ?         ?
      We can only use statistics about the margin of previous events to predict the probability of Hawthorn winning in 2015. Margins are taken from Hawthorn's point of view, so the 2011 loss counts as -20:
      • Mean margin of previous events (Hawthorn-Adelaide) ⇒ 14.75
      • Maximum margin of previous events (Hawthorn-Adelaide) ⇒ 56
      • Minimum margin of previous events (Hawthorn-Adelaide) ⇒ -20
      But which one is a good predictor...
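The three margin statistics can be reproduced from the table by signing each margin from Hawthorn's point of view and using only seasons before the one being predicted. This is a sketch of the idea; the function name `margin_features` and the `before_season` parameter are hypothetical, not names from the original feature-engineering code.

```python
from statistics import mean

# (season, winner, winning margin) from the table above.
matches = [
    (2011, "Adelaide", 20),
    (2012, "Hawthorn", 56),
    (2013, "Hawthorn", 11),
    (2014, "Hawthorn", 12),
]

def margin_features(matches, team, before_season):
    """Mean, max and min signed margin for `team`, using past seasons only."""
    signed = [
        margin if winner == team else -margin   # a loss counts as a negative margin
        for season, winner, margin in matches
        if season < before_season               # time constraint: no future information
    ]
    return {"mean": mean(signed), "max": max(signed), "min": min(signed)}

features = margin_features(matches, "Hawthorn", before_season=2015)
print(features)
```

Filtering on `before_season` is what enforces the time constraint from the challenges slide: a feature for a given match never sees results from that season or later.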
  • 14. How to Build a Model for Predictions: Is Mean Margin a good predictor of winning?
      Distribution of games won (red) and games lost (blue) according to the Mean Margin computed on previous games, for matches 2000-2014. Mean Margin is a good predictor if these two distributions are well separated.
      [Figure: histogram; x-axis: Mean Margin in Previous Games; y-axis: Frequency; series: Lose, Win]
      Insights:
      • If a team has a Mean Margin of more than 100, it is likely to win.
      • If a team has a Mean Margin of less than -90, it is likely to lose.
  • 15. How to Build a Model for Predictions: Min Margin as predictor of winning
      [Figure: histogram; x-axis: Min Margin in Previous Games; y-axis: Frequency; series: Lose, Win]
      Insights:
      • If a team has been defeated in the past by as many as 150 points, it is likely to lose.
  • 16. How to Build a Model for Predictions: Max Margin as predictor of winning
      [Figure: histogram; x-axis: Max Margin in Previous Games; y-axis: Frequency; series: Lose, Win]
      Insights:
      • If a team has won in the past by as many as 150 points, it is likely to win.
  • 17. How to Build a Model for Predictions: Other Features
      Similarly to the margin of the final score between two teams, we can compute the margin for other statistics:
      • Number of Kicks
      • Number of Inside 50
      • Number of Disposals
      • Number of Clearances
      Rank of Attributes based on Prediction Errors (best at the top):
        Score [2]  Name
        0.0449     Mean Margin Inside 50
        0.0408     Mean Margin Score
        0.0361     Max Margin Score
        0.0325     Mean Margin Disposals
      [2] According to Information Gain
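The ranking above scores each attribute by Information Gain. As a rough illustration of what that score measures, the sketch below computes the information gain of a numeric feature split at a single fixed threshold. The threshold, the toy values, and the function names are all assumptions; the tool that produced the table may discretize numeric attributes differently, so this will not reproduce the listed scores.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting `values` at `threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder

# Hypothetical toy data: mean margin in previous games and whether the team won (1) or lost (0).
mean_margin = [-30.0, -12.0, -5.0, 4.0, 15.0, 40.0]
won = [0, 0, 1, 0, 1, 1]

gain = information_gain(mean_margin, won, threshold=0.0)
print(f"information gain = {gain:.4f}")
```

A feature that separates wins from losses perfectly would score 1 bit here; scores close to 0, like those in the table, indicate that each feature on its own is only weakly informative.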
  • 18. Evaluation of Prediction Error
      I evaluated the model on the prediction of outcomes for 2015 matches:
      • logloss = 0.682 without statistics (just knowing the teams that are playing)
      • logloss = 0.640 with statistics
      This is obtained with a black-box model (Random Forest), which is accurate but difficult to interpret. Can we get a simpler model? Interestingly, the simplest model obtained automatically from this data is:
        (Mean-Margin ≥ -0.25 AND location = home) ⇒ win with probability 63.8%, else win with probability 36.8%
      However, this shows high error: logloss = 0.689 (it does not take into account the actual teams that are playing).
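The one-rule model on this slide translates directly into a function. The sketch below encodes it as stated; the function and argument names are hypothetical, and the per-match losses at the end are computed only to show why such a cautious rule stays close to the 0.693 baseline.

```python
import math

def rule_probability(mean_margin, location):
    """Win probability from the single rule quoted on the slide."""
    if mean_margin >= -0.25 and location == "home":
        return 0.638
    return 0.368

# Per-match log loss when the rule fires (p = 0.638) and the team wins or loses.
loss_if_win = -math.log(rule_probability(10.0, "home"))
loss_if_lose = -math.log(1.0 - rule_probability(10.0, "home"))
print(f"loss if win = {loss_if_win:.3f}, loss if lose = {loss_if_lose:.3f}")
```

Because the rule never strays far from 50%, a wrong call costs about one unit of logloss rather than a very large amount, but a correct call gains little over the all-50% baseline.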
  • 19. Evaluation of Prediction Error: Remark about Data on Previous Matches
      We have to be careful about taking into account matches played too long ago. Indeed, the best prediction (according to our features) is obtained using only matches from 2014.
      [Figure: Error in Prediction; x-axis: Least Recent Matches; y-axis: logloss]
      This is probably because the 2014 teams are very similar to the 2015 teams. It would be interesting to see which top players moved between teams in past years.
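The experiment behind this remark varies the earliest season included in the history and compares the resulting error. A minimal sketch of that loop follows, using only the Hawthorn vs Adelaide results from slide 12 and the known 2015 outcome (Hawthorn won). The add-one smoothing and the function name are assumptions, and a single pairing is far too little data to reproduce the trend in the figure; the point is only the windowing mechanics.

```python
import math

# Head-to-head results from the slide-12 table: (season, winner).
history = [(2011, "Adelaide"), (2012, "Hawthorn"), (2013, "Hawthorn"), (2014, "Hawthorn")]

def smoothed_win_probability(history, team, earliest_season):
    """Win rate using only matches from `earliest_season` onwards."""
    recent = [winner for season, winner in history if season >= earliest_season]
    wins = sum(1 for winner in recent if winner == team)
    # Add-one smoothing (an assumption) keeps the estimate away from 0% and 100%.
    return (wins + 1) / (len(recent) + 2)

# Log loss on the single 2015 match, which Hawthorn won, for each window start.
loss_by_start = {
    start: -math.log(smoothed_win_probability(history, "Hawthorn", start))
    for start in range(2011, 2015)
}
best_start = min(loss_by_start, key=loss_by_start.get)
print(loss_by_start, best_start)
```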
  • 21. Conclusions: Summary
      It is possible to predict the outcome of future matches with reasonable accuracy in 2 days of work:
      • Using features obtained from the score margin and from margins based on the number of inside 50 and the number of disposals
      • Combining these features using a model (Random Forest, logloss = 0.640), while still getting insights from each feature individually
      • Knowing that data about recent matches is more helpful
      • Accepting that a small increase in error can be traded for model simplicity
      Technicalities: I performed feature engineering in Python and predictions with WEKA.
  • 22. What I would have done if I had more time
      There are a number of things that could improve my model but that I did not have the chance to try because of time:
      • Predict the outcome of a match in round X of 2015 based on matches played in previous rounds of 2015
      • Use many other statistics, e.g. handballs, tackles
      • Use data about previously played finals
      • Introduce player-level features: rank all the players based on goals and count the number of top players a team is going to field during the match
      • Team strategy features (difficult to encode)
      • Use Sportsbet's and other companies' odds (not fair for my entry, but it would be fair in real practice)
  • 23. Other interesting things other than predicting match outcomes...
      It would be interesting to analyze the data and see:
      • whether there are players that are correlated with winning/losing games
      • the characteristics of Brownlow Medal winners
      • the probabilities of winning after losing the first/second/third quarters
      • the 'turning points' in important matches (which players are involved in changing the outcome of a match?)
  • 24. Thank you. Questions?
      Simone Romano, simone.romano@unimelb.edu.au, @ialuronico