IPL Data Analysis
Kaushal Sanadhya
Indraprastha Institute of Information Technology, Delhi
Okhla Industrial Estate, Phase III
New Delhi, India
Email: kaushal19133@iiitd.ac.in
Abstract—Cricket is one of the most celebrated games in
the world. With the introduction of data science and machine
learning techniques to cricket, forecasting the score of a match
has become one of the most challenging problems. In the shortest
format, T20, score forecasting and the analysis of other
statistics matter even more, because any single moment can be
enough to take the game away from the opposition. Our work
develops score predictions using several machine learning models,
including RandomForestRegressor, linear regression, and Radius
Nearest Neighbors.
Index Terms—score forecasting, modeling Indian Premier
League data, regression, machine learning
I. INTRODUCTION
Several factors can strongly affect predictions, such as the
current score, wickets in hand, weather conditions, the dew
factor, and pitch condition. We use a data set of 179,079 records
containing the data for every single ball in IPL matches from
2009 to 2019. The significant contributions of this project are
as follows:
a) Feature construction: We created new attributes
[balls remaining, current score, wickets remaining] that capture
the critical information in the data set (deliveries.csv) much
more efficiently than the original attributes.
b) Final score prediction: Predicting the eventual score
in the first innings.
II. FEATURE CONSTRUCTION
The existing features in deliveries.csv, such as over, ball,
is_super_over, wide_runs, and non_striker, are not good enough
to make confident, reliable predictions of a final score.
Therefore, new features are created as follows:
• balls remaining: number of balls remaining in the first
innings of the match.
• current score: current score of the team.
• wickets remaining: wickets in hand for the team.
• final score: final score of that team in that match. This is
the target variable that we are trying to forecast.
When these newly created features are used to predict the
final score, we obtain high R square values for the different
machine learning models used in the project.
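The feature construction described above can be sketched with pandas as follows. This is a minimal illustration, not the author's code: the column names match_id, inning, total_runs, wide_runs, noball_runs and player_dismissed are assumed to be present in deliveries.csv, rows are assumed to be in bowling order, and the tiny demo frame is made up for the example.

```python
import pandas as pd

INNINGS_BALLS = 120  # 20 overs x 6 legal balls in a T20 innings


def build_features(deliveries: pd.DataFrame) -> pd.DataFrame:
    """Derive per-ball features for the first innings of every match.

    Assumes rows are already in bowling order within each match.
    """
    first = deliveries[deliveries["inning"] == 1].copy()
    match = first["match_id"]

    # Running team total after each delivery.
    first["current_score"] = first.groupby("match_id")["total_runs"].cumsum()

    # A non-null player_dismissed marks a wicket on that delivery.
    wickets_lost = first["player_dismissed"].notna().astype(int).groupby(match).cumsum()
    first["wickets_remaining"] = 10 - wickets_lost

    # Wides and no-balls are re-bowled, so only legal balls use up the innings.
    legal = ((first["wide_runs"] == 0) & (first["noball_runs"] == 0)).astype(int)
    first["balls_remaining"] = INNINGS_BALLS - legal.groupby(match).cumsum()

    # Target: the team's total at the end of the innings.
    first["final_score"] = first.groupby("match_id")["total_runs"].transform("sum")

    return first[["match_id", "balls_remaining", "current_score",
                  "wickets_remaining", "final_score"]]


# Made-up deliveries: a single, four runs, a wicket, then a wide.
demo = pd.DataFrame({
    "match_id": [1, 1, 1, 1],
    "inning": [1, 1, 1, 1],
    "total_runs": [1, 4, 0, 1],
    "wide_runs": [0, 0, 0, 1],
    "noball_runs": [0, 0, 0, 0],
    "player_dismissed": [None, None, "Batter A", None],
})
features = build_features(demo)
print(features)
```

Counting only legal deliveries is one possible design choice; the text does not say how wides and no-balls were treated when balls remaining was computed.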
III. PREDICTING FIRST INNINGS SCORES
Score prediction for the first innings is a typical multiple
regression problem: several input features are used to forecast
a single numeric output, the final score. A data set of roughly
10,300 records is used to train our models.
A. Training Data
Match data for the following teams is used to train our
machine learning models:
• Chennai Super Kings
• Sunrisers Hyderabad
• Mumbai Indians
The training set contains 10,316 records.
B. Test Data
Match data for Kolkata Knight Riders is used for testing.
In each run, ten Kolkata matches are selected at random, and
the actual final score of each match is compared with the score
predicted by our machine learning models.
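The team-based split described in this section could look like the sketch below. The column names batting_team, match_id and final_score, the exact team-name spellings, and the helper names split_by_team and sample_matches are assumptions made for illustration; the demo frame is invented.

```python
import pandas as pd

# Team-name strings are assumed spellings; adjust them to match the data.
TRAIN_TEAMS = ["Chennai Super Kings", "Sunrisers Hyderabad", "Mumbai Indians"]
TEST_TEAM = "Kolkata Knight Riders"


def split_by_team(features: pd.DataFrame):
    """Return (train, test) frames split on the batting team."""
    train = features[features["batting_team"].isin(TRAIN_TEAMS)]
    test = features[features["batting_team"] == TEST_TEAM]
    return train, test


def sample_matches(test: pd.DataFrame, n_matches: int = 10, seed: int = 0) -> pd.DataFrame:
    """Pick up to n_matches random test matches, one row per match."""
    per_match = test.drop_duplicates("match_id")[["match_id", "final_score"]]
    return per_match.sample(n=min(n_matches, len(per_match)), random_state=seed)


# Invented rows, only to exercise the two helpers.
demo = pd.DataFrame({
    "match_id": [1, 1, 2, 3],
    "batting_team": ["Chennai Super Kings", "Chennai Super Kings",
                     "Kolkata Knight Riders", "Delhi Daredevils"],
    "final_score": [180, 180, 165, 150],
})
train, test = split_by_team(demo)
sampled = sample_matches(test)
print(len(train), len(test), sampled["final_score"].tolist())
```

Splitting by team rather than at random means the test set contains no deliveries from the teams seen in training, which is the setup the text describes.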
IV. REGRESSION MODELS USED
A. Multivariate Linear Regression
Most cricket problems involve more than two variables.
Therefore, multivariate linear regression is used to fit a
hyperplane in our multidimensional feature space. With this
model we can also read off the impact of each feature on the
predicted score from the well-known linear regression equation:
Y = a + b*X1 + c*X2 + d*X3
For our analysis we obtain the following equation:
Final Score = 27.915 + 0.989121 * current score
              + 1.183421 * balls remaining - 3.576307 * wickets
Actual Score    Predicted Score
150             154
119             114
204             210
130             131
155             152
160             169
222             232
163             160
223             231
148             149
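As a worked example, the fitted equation above can be wrapped in a small function. The function name and the match state passed to it are hypothetical. The text does not say whether "wickets" counts wickets in hand or wickets lost, so the value is passed through unchanged.

```python
def predict_final_score(current_score: float, balls_remaining: int, wickets: int) -> float:
    """Apply the fitted linear equation reported in the text."""
    return (27.915
            + 0.989121 * current_score
            + 1.183421 * balls_remaining
            - 3.576307 * wickets)


# Hypothetical match state: 100 runs scored, 30 balls left, wickets = 6.
estimate = predict_final_score(100, 30, 6)
print(round(estimate, 2))  # 140.87
```

Each coefficient is the change in the forecast for a one-unit change in that feature with the other two held fixed, which is what makes the linear model easy to interpret.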
1) Performance Evaluation: Mean Absolute Error and Root
Mean Squared Error are the two metrics used to evaluate
performance. The following values are achieved:
Mean Absolute Error: 5.13
Root Mean Squared Error: 6.02
These values are moderately high: in the sample above, the
actual and predicted scores differ by up to 10 runs. Performance
can be improved by using more advanced regression models such as
AdaBoost or Random Forest Regression.
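To make the two metrics concrete, the sketch below recomputes them from the ten actual/predicted pairs listed for the linear model. On these ten rows alone the result is an MAE of 5.00 and an RMSE of about 5.85, close to but not identical to the reported 5.13 and 6.02, which may have been computed on a different or larger sample.

```python
import math

# The ten actual/predicted pairs listed for the linear model.
actual = [150, 119, 204, 130, 155, 160, 222, 163, 223, 148]
predicted = [154, 114, 210, 131, 152, 169, 232, 160, 231, 149]

errors = [p - a for a, p in zip(actual, predicted)]

# MAE: average size of the error, ignoring its sign.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: squaring before averaging penalizes large misses more heavily.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE  = {mae:.2f}")
print(f"RMSE = {rmse:.2f}")
```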
B. Random Forest Regression
Random Forest Regression is an ensemble technique that trains
multiple prediction models (decision trees) and combines their
results to give more accurate predictions.
Actual Score    Predicted Score
150             150
119             121
204             201
130             132
155             155
160             160
222             220
163             163
223             221
148             148
1) Importance Associated With Various Features: The feature
importance values indicate how much each feature contributes to
the model's predictions.
Feature            Importance
current score      0.47
balls remaining    0.30
wickets            0.23
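Feature importances of this kind can be read from a fitted scikit-learn RandomForestRegressor through its feature_importances_ attribute. The sketch below uses synthetic data with a made-up target, so its numbers will not match the table above; the hyperparameters are also assumptions, since the text does not report them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic match states; none of this is IPL data.
rng = np.random.default_rng(0)
n = 300
current_score = rng.integers(0, 200, size=n)
balls_remaining = rng.integers(0, 120, size=n)
wickets = rng.integers(1, 11, size=n)

# Made-up target, only so the example has something to fit.
final_score = (current_score + 1.2 * balls_remaining + 3.0 * wickets
               + rng.normal(0, 5, size=n))

X = np.column_stack([current_score, balls_remaining, wickets])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, final_score)

# Importances are normalized so that they sum to 1 across features.
names = ["current score", "balls remaining", "wickets"]
importances = model.feature_importances_
for name, importance in zip(names, importances):
    print(f"{name:16s} {importance:.2f}")
```

Because the importances are normalized, they are best read as relative shares rather than absolute effect sizes.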
2) Performance Evaluation: The values of Mean Absolute Error
and Root Mean Squared Error are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.69
These values are far better than those achieved with
Multivariate Linear Regression, which highlights the power of
ensemble regression models. The same can be seen in the
difference between the predicted and actual scores above, which
is at most 3 runs.
C. Radius Neighbors Regression
This regression model builds on the concept of K nearest
neighbors. Where K nearest neighbors regression uses a fixed
number K of closest training points, Radius Neighbors Regression
uses all neighbors that lie within a specified distance (here,
the Manhattan distance) of the query point.
Actual Score    Predicted Score
150             150
119             121
204             204
130             132
155             157
160             163
222             221
163             164
223             222
148             149
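The radius-based prediction can be illustrated with scikit-learn's RadiusNeighborsRegressor, where p=1 selects the Manhattan distance. The three training rows, the query point and the radius of 5 are invented for this sketch; the text does not state which radius was used.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor

# Columns: current score, balls remaining, wickets (invented values).
X_train = np.array([
    [100, 30, 6],
    [102, 28, 6],
    [150, 10, 3],
])
y_train = np.array([160, 164, 170])

# p=1 selects the Manhattan distance for the default Minkowski metric.
model = RadiusNeighborsRegressor(radius=5.0, p=1)
model.fit(X_train, y_train)

# The first two rows are at Manhattan distance 2 from the query, the
# third at distance 71, so only the first two fall inside the radius.
query = np.array([[101, 29, 6]])
prediction = model.predict(query)[0]
print(prediction)  # 162.0
```

With uniform weights the forecast is the plain average of the targets inside the radius, here (160 + 164) / 2. A query with no training point inside the radius has no neighbors to average, so the radius has to be chosen with the density of the data in mind.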
1) Performance Evaluation: The values of Mean Absolute Error
and Root Mean Squared Error for the model are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.79
These values are better than those of the multivariate linear
regression model and comparable to the Random Forest ensemble
error values.
D. Comparison of R square Values
The R square values of the regression models are shown in the
graph below.
The high R square values for Random Forest and Radius Nearest
Neighbors show that these regression models explain a larger
share of the variability of the response data around its mean
than the Multivariate Linear Regression model does.
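R square is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below applies that definition to the ten sample rows listed for each model. It covers only those rows, so its values need not match the graph, but it reproduces the ordering described above.

```python
# Actual scores and per-model predictions from the ten sample rows.
actual = [150, 119, 204, 130, 155, 160, 222, 163, 223, 148]
predictions = {
    "Multivariate Linear Regression": [154, 114, 210, 131, 152, 169, 232, 160, 231, 149],
    "Random Forest": [150, 121, 201, 132, 155, 160, 220, 163, 221, 148],
    "Radius Neighbors": [150, 121, 204, 132, 157, 163, 221, 164, 222, 149],
}


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


scores = {name: r_squared(actual, pred) for name, pred in predictions.items()}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```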
E. Comparison of Mean Absolute Error
The Mean Absolute Error (MAE) is the average, over all matches,
of the absolute difference between the actual and predicted
scores. The mathematical formula is given below:
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \]
where MAE stands for Mean Absolute Error, n is the number of
matches, y_i is the actual score, and \hat{y}_i is the predicted
score of match i.
This difference is 6-7 runs for Multivariate Linear Regression
and 1-5 runs for Random Forest and Radius Nearest Neighbors.
The graph depicting these error values for all three models
is shown below:
F. Comparison of Root Mean Squared Error
The Root Mean Squared Error (RMSE) is the square root of the
mean of the squared errors. The mathematical formula is given
below:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]
where n is the number of matches, y_i is the actual score, and
\hat{y}_i is the predicted score of match i.
The following graph compares these Root Mean Squared Error
values.
The Root Mean Squared Error and Mean Absolute Error graphs
are similar to each other. Both graphs show that Random Forest
and Radius Nearest Neighbors regression perform better than the
Multivariate Linear Regression model.

More Related Content

What's hot

AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONIRJET Journal
 
The Rise & Fall of Indian Football Presentation
The Rise & Fall of Indian Football PresentationThe Rise & Fall of Indian Football Presentation
The Rise & Fall of Indian Football PresentationShekhar Ibhrampurkar
 
Kingfisher Airlines Launch
Kingfisher Airlines LaunchKingfisher Airlines Launch
Kingfisher Airlines LaunchShahzad Khan
 
This is a SQL IPL auction strategy based project help to choose best player a...
This is a SQL IPL auction strategy based project help to choose best player a...This is a SQL IPL auction strategy based project help to choose best player a...
This is a SQL IPL auction strategy based project help to choose best player a...sudhanshuwalia3
 
Project Report on IPL - Indian Premier League
Project Report on IPL - Indian Premier LeagueProject Report on IPL - Indian Premier League
Project Report on IPL - Indian Premier LeagueKaustubh Barve
 
Business strategies of football clubs
Business strategies of football clubsBusiness strategies of football clubs
Business strategies of football clubsKirankumar Dash
 
Marketing plan - x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...
Marketing plan -  x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...Marketing plan -  x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...
Marketing plan - x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...Sandeep Vadnere
 
Future of mobility | Mahindra War Room 2013 | North Zone Winners
Future of mobility | Mahindra War Room 2013 | North Zone WinnersFuture of mobility | Mahindra War Room 2013 | North Zone Winners
Future of mobility | Mahindra War Room 2013 | North Zone WinnersTarun Gupta
 
Ppt india world cup cricket wins
Ppt  india world cup cricket winsPpt  india world cup cricket wins
Ppt india world cup cricket winsAbhishek Abhi
 
Commercialization Of Sports
Commercialization Of SportsCommercialization Of Sports
Commercialization Of SportsAbhra Ghosh
 
Cricket match outcome prediction using machine learning
Cricket match outcome prediction using machine learningCricket match outcome prediction using machine learning
Cricket match outcome prediction using machine learningdataalcott
 
A Mall Case Study Machine Learning
A Mall Case Study Machine LearningA Mall Case Study Machine Learning
A Mall Case Study Machine LearningYogesh Dhandharia
 
Electric vehicle scenario in india
Electric vehicle scenario in indiaElectric vehicle scenario in india
Electric vehicle scenario in indiaDeepak Sakthivel
 

What's hot (20)

IT in sports
IT in sportsIT in sports
IT in sports
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
The Rise & Fall of Indian Football Presentation
The Rise & Fall of Indian Football PresentationThe Rise & Fall of Indian Football Presentation
The Rise & Fall of Indian Football Presentation
 
Kingfisher Airlines Launch
Kingfisher Airlines LaunchKingfisher Airlines Launch
Kingfisher Airlines Launch
 
Ipl.ppt
Ipl.pptIpl.ppt
Ipl.ppt
 
FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT
 
This is a SQL IPL auction strategy based project help to choose best player a...
This is a SQL IPL auction strategy based project help to choose best player a...This is a SQL IPL auction strategy based project help to choose best player a...
This is a SQL IPL auction strategy based project help to choose best player a...
 
IPL 2014
IPL 2014IPL 2014
IPL 2014
 
Project Report on IPL - Indian Premier League
Project Report on IPL - Indian Premier LeagueProject Report on IPL - Indian Premier League
Project Report on IPL - Indian Premier League
 
Business strategies of football clubs
Business strategies of football clubsBusiness strategies of football clubs
Business strategies of football clubs
 
IPL: INDIAN PREMIER LEAGUE
IPL: INDIAN PREMIER LEAGUEIPL: INDIAN PREMIER LEAGUE
IPL: INDIAN PREMIER LEAGUE
 
Marketing plan - x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...
Marketing plan -  x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...Marketing plan -  x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...
Marketing plan - x-league - football / soccer (Pre #HeroISL, #IndianSuperLea...
 
Football
FootballFootball
Football
 
Future of mobility | Mahindra War Room 2013 | North Zone Winners
Future of mobility | Mahindra War Room 2013 | North Zone WinnersFuture of mobility | Mahindra War Room 2013 | North Zone Winners
Future of mobility | Mahindra War Room 2013 | North Zone Winners
 
Ppt india world cup cricket wins
Ppt  india world cup cricket winsPpt  india world cup cricket wins
Ppt india world cup cricket wins
 
Commercialization Of Sports
Commercialization Of SportsCommercialization Of Sports
Commercialization Of Sports
 
Cricket match outcome prediction using machine learning
Cricket match outcome prediction using machine learningCricket match outcome prediction using machine learning
Cricket match outcome prediction using machine learning
 
A Mall Case Study Machine Learning
A Mall Case Study Machine LearningA Mall Case Study Machine Learning
A Mall Case Study Machine Learning
 
Maths and sports
Maths and sportsMaths and sports
Maths and sports
 
Electric vehicle scenario in india
Electric vehicle scenario in indiaElectric vehicle scenario in india
Electric vehicle scenario in india
 

Similar to IPL Data Analysis using Data Science

Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...IRJET Journal
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Simplilearn
 
Stock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised LearningStock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised LearningSharvil Katariya
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning PredictionIRJET Journal
 
Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)Eric Choi
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...PATHALAMRAJESH
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningIRJET Journal
 
Cricket score and winner predictor
Cricket score and winner predictorCricket score and winner predictor
Cricket score and winner predictorKeyaShukla3
 
La liga 2013 2014 analysis
La liga 2013 2014 analysisLa liga 2013 2014 analysis
La liga 2013 2014 analysisRitu Sarkar
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET Journal
 
Predicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesPredicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesIJCSIS Research Publications
 
User Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayUser Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayAhmed Hassan
 
Machine Learning Foundations Project Presentation
Machine Learning Foundations Project PresentationMachine Learning Foundations Project Presentation
Machine Learning Foundations Project PresentationAmit J Bhattacharyya
 
House Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachHouse Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachYusuf Uzun
 
Chap13 intro to multiple regression
Chap13 intro to multiple regressionChap13 intro to multiple regression
Chap13 intro to multiple regressionUni Azza Aunillah
 
BIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNINGBIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNINGIRJET Journal
 
Database Modeling presentation
Database Modeling  presentationDatabase Modeling  presentation
Database Modeling presentationBhavishya Tyagi
 

Similar to IPL Data Analysis using Data Science (20)

IRJET-V8I11270.pdf
IRJET-V8I11270.pdfIRJET-V8I11270.pdf
IRJET-V8I11270.pdf
 
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Stock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised LearningStock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised Learning
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning Prediction
 
Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
 
Cricket score and winner predictor
Cricket score and winner predictorCricket score and winner predictor
Cricket score and winner predictor
 
Cricket 2
Cricket 2Cricket 2
Cricket 2
 
La liga 2013 2014 analysis
La liga 2013 2014 analysisLa liga 2013 2014 analysis
La liga 2013 2014 analysis
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
 
Predicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesPredicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining Techniques
 
User Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayUser Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-Play
 
Machine Learning Foundations Project Presentation
Machine Learning Foundations Project PresentationMachine Learning Foundations Project Presentation
Machine Learning Foundations Project Presentation
 
House Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachHouse Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN Approach
 
Chap13 intro to multiple regression
Chap13 intro to multiple regressionChap13 intro to multiple regression
Chap13 intro to multiple regression
 
SECh910
SECh910SECh910
SECh910
 
BIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNINGBIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNING
 
Database Modeling presentation
Database Modeling  presentationDatabase Modeling  presentation
Database Modeling presentation
 

Recently uploaded

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

Recently uploaded (20)

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

IPL Data Analysis using Data Science

  • 1. IPL Data Analysis Kaushal Sanadhya Indraprashta Institute of Information Technology,Delhi Okhla Industrial Estate,Phase III New Delhi,India emailid : kaushal19133@iiitd.ac.in Abstract—Cricket is one of the most celebrated games in the world. With the introduction of Data science and machine learning techniques in the world of cricket, forecasting the score of the match has been established as one of the most challenging problems.Especially in the shortest format T20, score forecasting and analyzing other statistics become more important as every moment is sufficient enough to take the game away from oppo- sition team. Our work develops some crucial predictions using various machine learning models like RandomForestRegressor, Linear regressor , Radius Nearest Neighbors etc. Index Terms—score forecasting,modeling Indian Premier League data,Regression,machine learning I. INTRODUCTION There can be several factors that strongly affect predictions like the current score, wickets in hand, weather conditions, dew factor, pitch condition, etc. We have used a data set of 1,79,079 records consisting of the data for every single ball in IPL matches from the year 2009 to 2019. Significant contributions from this project are as follows: a) Feature construction: We have created new attributes [balls remaining, current score, wickets in hand] that can capture the critical information in the dataset(deliveries.csv) much more efficiently than the original attributes.: b) Final score prediction : predicting the eventual score in the first innings. : II. FEATURE CONSTRUCTION The existing features (Deliveries.csv) like over, ball,is super over,wide runs,non-stricker, etc. are not good enough to make confident, reliable predictions for a final score, Therefore new features score are created as follows: • balls remaining: number of balls remaining in the first innings of the match • current score: current score of the team. • wickets remaining: wickets in hand for the team. 
• final score: final score of that team in that match this is the target variable which we are trying to forecast. When these newly created features are used to predict the final score, we obtained some handsome value of R square for different machine learning models used in the project III. PREDICTING FIRST INNINGS SCORES Score prediction for the first innings is a typical multiple regression problem since the output is the forecasted score. Data set of size 10,315 records is used to train our model. A. Training Data Match data for the following teams is used to train our machine learning models: • Chennai Super Kings • Hyderabad Sun Risers • Mumbai Indians Training data size is 10,316 records are used . B. Test Data Match data for Kolkata Knight Riders is used for Testing purposes. To make predictions for the final score of the Kolkata team, every time the final score for ten random matches is selected, which is further compared with the predicted score by our machine learning models. IV. REGRESSION MODELS USED A. Multivariate Linear Regression Most of the cricket problems that are encountered will have more than two variables. Therefore Multivariate Linear regression is used to fit the line in our multi dimensional space. Using this regression model we can even draw the impact of each feature on the predicted score using the below well know equation for linear regression: Y = a + b*X1 + c*X2 + d*X3 For our analysis we will get the below equation: Final Score = 27.915 + 0.989121 * current score + 1.183421 * balls remaining - 3.576307 * wickets Actual Score Predicted Score 150 154 119 114 204 210 130 131 155 152 160 169 222 232 163 160 223 231 148 149
1) Performance Evaluation: Mean Absolute Error and Root Mean Squared Error are the two metrics used to evaluate performance. The following values are achieved:
Mean Absolute Error: 5.13
Root Mean Squared Error: 6.02
These values are moderately high; the actual and predicted scores differ by up to 10 runs. Performance can be improved by using more advanced regression models such as AdaBoost or Random Forest regression.

B. Random Forest Regression
Random Forest regression is an ensemble technique that makes use of multiple prediction models. It combines the results of these models to give more accurate predictions.

Actual Score    Predicted Score
150             150
119             121
204             201
130             132
155             155
160             160
222             220
163             163
223             221
148             148

1) Importance Associated With Various Features: By looking at the feature importance values, we can estimate the contribution made by each feature.

Feature            Importance
current score      0.47
balls remaining    0.30
wickets            0.23

2) Performance Evaluation: The values of Mean Absolute Error and Root Mean Squared Error are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.69
These values are far better than those achieved with multivariate linear regression, which highlights the power of ensemble regression models. This is also visible in the difference between the predicted and actual scores, which is at most 3 runs.

C. Radius Neighbors Regression
This regression model is based on the concept of K nearest neighbors. Whereas K nearest neighbors regression uses a fixed number of neighbors, Radius Neighbors Regression uses all neighbors within a specified distance (here, Manhattan distance).

Actual Score    Predicted Score
150             150
119             121
204             204
130             132
155             157
160             163
222             221
163             164
223             222
148             149

1) Performance Evaluation: The values of Mean Absolute Error and Root Mean Squared Error for the model are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.79
These values are better than those of the multivariate linear regression model and comparable to the Random Forest error values.

D. Comparison of R square Values
The R square values of the regression models are shown in the graph below.
The high R square values for Random Forest and Radius Nearest Neighbors indicate that these models explain more of the variability of the response data around its mean than the Multivariate Linear Regression model does.

E. Comparison of Mean Absolute Error
The Mean Absolute Error is the mean, over the test matches, of the absolute difference between the actual and predicted scores. The mathematical formula is given below:
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \]
where MAE stands for Mean Absolute Error, y_i is the actual score, \hat{y}_i is the predicted score, and n is the number of matches.
This error is 6-7 runs for Multivariate Linear Regression and 1-5 runs for Random Forest and Radius Nearest Neighbors. The graph depicting these error values for all three models is shown below.

F. Comparison of Root Mean Squared Error
The square root of the mean of the squared errors is called the Root Mean Squared Error (RMSE). The mathematical formula is given below:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]
The following graph compares the Root Mean Squared Error values.
The Root Mean Squared Error and Mean Absolute Error graphs are similar to each other. Both show that Random Forest and Radius Nearest Neighbors regression perform better than the Multivariate Linear Regression model.

REFERENCES
[1] https://www.espncricinfo.com/series/ /id/8048/season/2019/indian-premier-league
[2] Kaggle Data Set, https://www.kaggle.com/manasgarg/ipl
[3] Regression Model Implementation, https://towardsdatascience.com/
[4] Scikit-Learn Documentation, https://scikit-learn.org/stable/documentation.html
[5] Indian Premier League Official, https://www.iplt20.com/
[6] Cricket Analytics Visualized, https://cricketsavant.wordpress.com/
[7] Madan Gopal Jhawar and Vikram Pudi, "Predicting the Outcome of ODI Cricket Matches: A Team Composition Based Approach," IIIT-H