IPL Data Analysis
Kaushal Sanadhya
Indraprastha Institute of Information Technology, Delhi
Okhla Industrial Estate, Phase III
New Delhi, India
Email: kaushal19133@iiitd.ac.in
Abstract—Cricket is one of the most celebrated games in
the world. With the introduction of data science and machine
learning techniques to cricket, forecasting the score of a
match has become one of the most challenging problems.
In the shortest format, T20, score forecasting and the analysis
of other statistics matter even more, because any moment can
be enough to take the game away from the opposition. Our
work develops some crucial predictions using machine learning
models such as RandomForestRegressor, linear regression, and
Radius Nearest Neighbors regression.
Index Terms—score forecasting,modeling Indian Premier
League data, regression, machine learning
I. INTRODUCTION
Several factors can strongly affect predictions, such as
the current score, wickets in hand, weather conditions,
dew factor, and pitch condition. We have used a data set
of 1,79,079 records, one for every ball bowled in IPL
matches from 2009 to 2019. The significant contributions
of this project are as follows:
a) Feature construction: We have created new attributes
[balls remaining, current score, wickets in hand] that
capture the critical information in the dataset (deliveries.csv)
much more efficiently than the original attributes.
b) Final score prediction: Predicting the eventual score
in the first innings.
II. FEATURE CONSTRUCTION
The existing features in deliveries.csv, such as over,
ball, is super over, wide runs, and non-striker, are not
good enough to make confident, reliable predictions of
the final score. Therefore, new features are created as
follows:
• balls remaining: number of balls remaining in the first
innings of the match.
• current score: current score of the team.
• wickets remaining: wickets in hand for the team.
• final score: final score of that team in that match. This is
the target variable that we are trying to forecast.
When these newly created features are used to predict the
final score, we obtain high values of R square for the
different machine learning models used in the project.
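The feature construction described in this section can be sketched as follows. This is a minimal illustration rather than the author's code: the key names (match_id, inning, total_runs, wide_runs, noball_runs, player_dismissed) are assumed column names for the ball-by-ball file, the helper `delivery` and the sample rows are hypothetical, and the sketch assumes a full 20-over innings in which wides and no-balls do not use up a ball.

```python
from collections import defaultdict

BALLS_PER_INNINGS = 120    # assumption: 20 overs of 6 legal balls
WICKETS_PER_INNINGS = 10


def build_features(deliveries):
    """Return one feature row per first-innings delivery.

    `deliveries` is an iterable of dicts in bowling order. The key names
    are assumed, not confirmed by the paper.
    """
    by_match = defaultdict(list)
    for d in deliveries:
        if d["inning"] == 1:          # keep first innings only
            by_match[d["match_id"]].append(d)

    rows = []
    for match_id, balls in by_match.items():
        final_score = sum(b["total_runs"] for b in balls)   # target variable
        score = legal_balls = wickets_lost = 0
        for b in balls:
            score += b["total_runs"]
            # Assumption: wides and no-balls do not use up a ball.
            if b["wide_runs"] == 0 and b["noball_runs"] == 0:
                legal_balls += 1
            # Assumption: None or "" means no wicket fell on this ball.
            if b["player_dismissed"]:
                wickets_lost += 1
            rows.append({
                "match_id": match_id,
                "balls_remaining": BALLS_PER_INNINGS - legal_balls,
                "current_score": score,
                "wickets_remaining": WICKETS_PER_INNINGS - wickets_lost,
                "final_score": final_score,
            })
    return rows


def delivery(match_id, inning, total_runs, wide_runs=0, noball_runs=0,
             player_dismissed=None):
    """Hypothetical helper that builds one ball-by-ball record."""
    return {
        "match_id": match_id, "inning": inning, "total_runs": total_runs,
        "wide_runs": wide_runs, "noball_runs": noball_runs,
        "player_dismissed": player_dismissed,
    }


sample = [
    delivery(1, 1, 1),
    delivery(1, 1, 1, wide_runs=1),                 # wide: ball is re-bowled
    delivery(1, 1, 4),
    delivery(1, 1, 0, player_dismissed="A Batter"),
    delivery(1, 2, 6),                              # second innings, ignored
]
rows = build_features(sample)
for row in rows:
    print(row)
```

Every row carries the same final score for its match, so each delivery becomes one training example that pairs the match state at that ball with the eventual total.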
III. PREDICTING FIRST INNINGS SCORES
Score prediction for the first innings is a typical multiple
regression problem: several input features are used to forecast
a numeric output, the score.
A data set of 10,315 records is used to train our model.
A. Training Data
Match data for the following teams is used to train our
machine learning models:
• Chennai Super Kings
• Sunrisers Hyderabad
• Mumbai Indians
The training set contains 10,316 records.
B. Test Data
Match data for Kolkata Knight Riders is used for testing.
To evaluate predictions of the Kolkata team's final score,
ten matches are selected at random on each run, and their
actual final scores are compared with the scores predicted
by our machine learning models.
IV. REGRESSION MODELS USED
A. Multivariate Linear Regression
Most cricket problems involve more than two variables.
Therefore, multivariate linear regression is used to fit a
hyperplane in our multidimensional feature space. With this
regression model we can also read off the impact of each
feature on the predicted score from the well-known equation
for linear regression:
Y = a + b*X1 + c*X2 + d*X3
For our analysis we obtain the following equation:
Final Score = 27.915 + 0.989121 * current score +
1.183421 * balls remaining - 3.576307 * wickets
Actual Score    Predicted Score
150             154
119             114
204             210
130             131
155             152
160             169
222             232
163             160
223             231
148             149
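The fitted equation in this subsection can be evaluated directly. The sketch below only plugs numbers into the coefficients reported in the paper; the function name and the example match state are hypothetical, and the paper does not say whether "wickets" counts wickets fallen or wickets in hand, so the argument is passed through unchanged.

```python
# Coefficients copied from the fitted equation reported in the paper.
INTERCEPT = 27.915
COEF_CURRENT_SCORE = 0.989121
COEF_BALLS_REMAINING = 1.183421
COEF_WICKETS = -3.576307


def predict_final_score(current_score, balls_remaining, wickets):
    """Evaluate the reported linear model for one match state."""
    return (INTERCEPT
            + COEF_CURRENT_SCORE * current_score
            + COEF_BALLS_REMAINING * balls_remaining
            + COEF_WICKETS * wickets)


# Hypothetical match state: 100 runs scored, 30 balls left, wickets = 4.
example = predict_final_score(current_score=100, balls_remaining=30, wickets=4)
print(round(example, 2))  # 148.02
```

Read this way, each run already scored adds about 0.99 to the forecast, each remaining ball adds about 1.18 runs (roughly 7.1 runs per over), and each unit of the wickets feature lowers the forecast by about 3.58 runs.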
1) Performance Evaluation: Mean Absolute Error and Root
Mean Squared Error are the two metrics used to evaluate
performance. The following values are achieved:
Mean Absolute Error: 5.13
Root Mean Squared Error: 6.02
These values are moderately high; the actual and predicted
scores differ by up to 10 runs. Performance can be improved
by using more advanced regression models such as AdaBoost
or Random Forest regression.
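As an illustration of how the two metrics are computed, the sketch below applies them to the ten actual and predicted scores listed for the linear model. The function names are our own. The result need not equal the reported 5.13 and 6.02, since the paper draws ten random matches on each run and the reported figures may come from a different sample.

```python
import math

# Ten sampled matches from the linear regression results table.
actual = [150, 119, 204, 130, 155, 160, 222, 163, 223, 148]
predicted = [154, 114, 210, 131, 152, 169, 232, 160, 231, 149]


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE  = {mae:.2f}")   # MAE  = 5.00
print(f"RMSE = {rmse:.2f}")  # RMSE = 5.85
```

RMSE is never smaller than MAE because squaring weights large misses, such as the 10-run error on the 222-run innings, more heavily.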
B. Random Forest Regression
Random Forest regression is an ensemble technique that
trains multiple prediction models (decision trees) and combines
their results to give more accurate predictions.
Actual Score    Predicted Score
150             150
119             121
204             201
130             132
155             155
160             160
222             220
163             163
223             221
148             148
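The ensemble idea behind Random Forest can be illustrated with a much simpler relative: bagging of one-split regression trees (stumps). The sketch below is a toy, not the RandomForestRegressor used in the paper. A real random forest grows deeper trees and also samples a random subset of features at each split. All data rows and function names here are hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a one-split regression tree.

    Tries every (feature, threshold) pair, keeps the split with the lowest
    squared error, and predicts the mean target on each side of it.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:                      # no valid split: predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]   # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)


# Hypothetical rows: (balls remaining, current score, wickets remaining).
X = [(90, 40, 10), (60, 85, 8), (60, 70, 6),
     (30, 120, 7), (30, 100, 4), (12, 150, 5)]
y = [172, 168, 149, 165, 140, 171]   # hypothetical final scores

predict = fit_bagged_stumps(X, y)
estimate = predict((40, 110, 7))
print(round(estimate, 1))
```

Averaging many models trained on different resamples reduces the variance of any single tree, which is the property the paper credits for the lower error.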
1) Importance Associated With Various Features: The
feature importance values indicate how much each feature
contributes to the model's predictions.
Feature            Importance
current score      0.47
balls remaining    0.30
wickets            0.23
2) Performance Evaluation: The values of Mean Absolute
Error and Root Mean Squared Error are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.69
These values are far better than those achieved using
multiple linear regression, which highlights the power of
ensemble regression models. This is also visible in the
difference between predicted and actual scores, which is
at most 3 runs.
C. Radius Neighbors Regression
This regression model is based on the concept of k-nearest
neighbors. Where k-nearest neighbors regression uses a fixed
number of neighbors, Radius Neighbors regression uses all
neighbors within a specified distance of the query point
(here measured with the Manhattan distance).
Actual Score    Predicted Score
150             150
119             121
204             204
130             132
155             157
160             163
222             221
163             164
223             222
148             149
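The radius-based lookup described in this subsection can be sketched in a few lines. This is a toy version under stated assumptions, not the library implementation behind the reported results: the training rows, the query, and the radius of 6 are hypothetical, and neighbors are weighted equally.

```python
def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def radius_neighbors_predict(train_X, train_y, query, radius):
    """Average the targets of all training points within `radius` of `query`.

    Returns None when no training point lies inside the radius.
    """
    neighbors = [y for x, y in zip(train_X, train_y)
                 if manhattan(x, query) <= radius]
    if not neighbors:
        return None
    return sum(neighbors) / len(neighbors)


# Hypothetical rows: (balls remaining, current score, wickets remaining).
train_X = [(60, 80, 8), (58, 84, 8), (30, 120, 6), (62, 78, 7)]
train_y = [165, 170, 158, 160]   # hypothetical final scores

# Distances from the query are 1, 5, 71, and 6, so three rows qualify.
prediction = radius_neighbors_predict(train_X, train_y, (60, 81, 8), radius=6.0)
print(prediction)  # 165.0
```

Unlike k-nearest neighbors, the number of rows that vote changes from query to query, and a query in a sparse region may have no neighbors at all, which is why the sketch returns None in that case.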
1) Performance Evaluation: The values of Mean Absolute
Error and Root Mean Squared Error for the model are as
follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.79
These values are better than those of the multiple linear
regression model and comparable to the Random Forest error
values.
D. Comparison of R square Values
The R square values of the regression models are shown in
the graph below.
The high R square values for Random Forest and Radius
Nearest Neighbors indicate that these regression models
explain the variability of the response data around its mean
more effectively than the Multivariate Linear Regression
model.
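To make the R square comparison concrete, the sketch below computes R square = 1 - SS_res / SS_tot on the ten sampled matches listed in the results tables for the linear and Random Forest models. These are illustrative values for those samples only and are not the figures plotted in the paper's graph. The function name is our own.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Ten sampled matches from the results tables.
actual = [150, 119, 204, 130, 155, 160, 222, 163, 223, 148]
linear_pred = [154, 114, 210, 131, 152, 169, 232, 160, 231, 149]
forest_pred = [150, 121, 201, 132, 155, 160, 220, 163, 221, 148]

r2_linear = r_squared(actual, linear_pred)
r2_forest = r_squared(actual, forest_pred)
print(f"Linear regression: {r2_linear:.4f}")
print(f"Random Forest:     {r2_forest:.4f}")
```

A value closer to 1 means the residual error is small relative to the spread of the actual scores around their mean.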
E. Comparison of Mean Absolute Error
Mean Absolute Error is the mean, over all matches, of the
absolute difference between the actual and predicted scores.
The mathematical formula is given below:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where MAE stands for Mean Absolute Error, $n$ is the number
of matches, $y_i$ is the actual score, and $\hat{y}_i$ is the
predicted score for match $i$.
This difference is 6-7 runs for Multivariate Linear
Regression and 1-5 runs for Random Forest and Radius Nearest
Neighbor.
The graph depicting these error values for all three models
is shown below:
F. Comparison of Root Mean Squared Error
The square root of the mean of the squared errors is called
the Root Mean Squared Error (RMSE). The mathematical formula
is given below:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
where $n$ is the number of matches, $y_i$ is the actual score,
and $\hat{y}_i$ is the predicted score for match $i$.
The following graph compares these Root Mean Squared Error
values.
The Root Mean Squared Error and Mean Absolute Error graphs
are similar to each other. Both graphs show that Random
Forest and Radius Nearest Neighbor regression perform
better than the Multivariate Linear Regression model.

More Related Content

What's hot

Fifa World Cup Presentation #1
Fifa World Cup Presentation #1Fifa World Cup Presentation #1
Fifa World Cup Presentation #1guest590e9f
 
Indian Premier League Big Data Case Study
Indian Premier League Big Data Case StudyIndian Premier League Big Data Case Study
Indian Premier League Big Data Case Study
Vrushabh Chauhan
 
Cricket predictor
Cricket predictorCricket predictor
Cricket predictor
Rajat Mittal
 
IPL's Opening Week Receives Over 186K Mentions
IPL's Opening Week Receives Over 186K MentionsIPL's Opening Week Receives Over 186K Mentions
IPL's Opening Week Receives Over 186K Mentions
Simplify360
 
DISEASE PREDICTION SYSTEM USING DATA MINING
DISEASE PREDICTION SYSTEM USING  DATA MININGDISEASE PREDICTION SYSTEM USING  DATA MINING
DISEASE PREDICTION SYSTEM USING DATA MINING
shivaniyadav112
 
Cricket from sports to corporate business
Cricket from sports  to corporate businessCricket from sports  to corporate business
Cricket from sports to corporate businesskhelani123
 
Credit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning AlgorithmsCredit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning Algorithms
ankit panigrahy
 
Make my trip
Make my trip Make my trip
Make my trip
Nishant Rao Boddeda
 
Activate Technology & Media Outlook 2020
Activate Technology & Media Outlook 2020Activate Technology & Media Outlook 2020
Activate Technology & Media Outlook 2020
Activate
 
INDIAN PREMIER LEAGUE(IPL)
INDIAN PREMIER LEAGUE(IPL)INDIAN PREMIER LEAGUE(IPL)
INDIAN PREMIER LEAGUE(IPL)
Irfan Tanwari
 
How CleverTap helped Dream11 Drive Exceptional User Growth
How CleverTap helped Dream11 Drive Exceptional User GrowthHow CleverTap helped Dream11 Drive Exceptional User Growth
How CleverTap helped Dream11 Drive Exceptional User Growth
CleverTap
 
BSC CSIT Final Year Internship Experience Report on SEO
BSC CSIT Final Year Internship Experience Report on SEOBSC CSIT Final Year Internship Experience Report on SEO
BSC CSIT Final Year Internship Experience Report on SEO
Sirish Paudel
 
Famous Personality - Cristiano Ronaldo
Famous Personality - Cristiano RonaldoFamous Personality - Cristiano Ronaldo
Famous Personality - Cristiano Ronaldo
Muhd Aizuddin Ali
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
Scaleway
 
Ipl is it true to sports or business
Ipl   is it true to sports or businessIpl   is it true to sports or business
Ipl is it true to sports or business
AbhikSengupta7
 
Make My Trip - Successful Journey!!!!
Make My Trip - Successful Journey!!!!Make My Trip - Successful Journey!!!!
Make My Trip - Successful Journey!!!!
Sushil Rai
 
Entertainment & Advertising | Riding the Digital Wave
Entertainment & Advertising | Riding the Digital WaveEntertainment & Advertising | Riding the Digital Wave
Entertainment & Advertising | Riding the Digital Wave
RedSeer
 
Swiggy presentation
Swiggy presentationSwiggy presentation
Swiggy presentation
Avinashkumar1627
 
7 Ways Sports Teams Win With Sports Analytics
7 Ways Sports Teams Win With Sports Analytics7 Ways Sports Teams Win With Sports Analytics
7 Ways Sports Teams Win With Sports Analytics
Tableau Software
 

What's hot (20)

Fifa World Cup Presentation #1
Fifa World Cup Presentation #1Fifa World Cup Presentation #1
Fifa World Cup Presentation #1
 
Indian Premier League Big Data Case Study
Indian Premier League Big Data Case StudyIndian Premier League Big Data Case Study
Indian Premier League Big Data Case Study
 
Cricket predictor
Cricket predictorCricket predictor
Cricket predictor
 
IPL's Opening Week Receives Over 186K Mentions
IPL's Opening Week Receives Over 186K MentionsIPL's Opening Week Receives Over 186K Mentions
IPL's Opening Week Receives Over 186K Mentions
 
DISEASE PREDICTION SYSTEM USING DATA MINING
DISEASE PREDICTION SYSTEM USING  DATA MININGDISEASE PREDICTION SYSTEM USING  DATA MINING
DISEASE PREDICTION SYSTEM USING DATA MINING
 
Cricket from sports to corporate business
Cricket from sports  to corporate businessCricket from sports  to corporate business
Cricket from sports to corporate business
 
Credit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning AlgorithmsCredit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning Algorithms
 
Make my trip
Make my trip Make my trip
Make my trip
 
Activate Technology & Media Outlook 2020
Activate Technology & Media Outlook 2020Activate Technology & Media Outlook 2020
Activate Technology & Media Outlook 2020
 
INDIAN PREMIER LEAGUE(IPL)
INDIAN PREMIER LEAGUE(IPL)INDIAN PREMIER LEAGUE(IPL)
INDIAN PREMIER LEAGUE(IPL)
 
How CleverTap helped Dream11 Drive Exceptional User Growth
How CleverTap helped Dream11 Drive Exceptional User GrowthHow CleverTap helped Dream11 Drive Exceptional User Growth
How CleverTap helped Dream11 Drive Exceptional User Growth
 
BSC CSIT Final Year Internship Experience Report on SEO
BSC CSIT Final Year Internship Experience Report on SEOBSC CSIT Final Year Internship Experience Report on SEO
BSC CSIT Final Year Internship Experience Report on SEO
 
Famous Personality - Cristiano Ronaldo
Famous Personality - Cristiano RonaldoFamous Personality - Cristiano Ronaldo
Famous Personality - Cristiano Ronaldo
 
Fifa world-cup
Fifa world-cupFifa world-cup
Fifa world-cup
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
 
Ipl is it true to sports or business
Ipl   is it true to sports or businessIpl   is it true to sports or business
Ipl is it true to sports or business
 
Make My Trip - Successful Journey!!!!
Make My Trip - Successful Journey!!!!Make My Trip - Successful Journey!!!!
Make My Trip - Successful Journey!!!!
 
Entertainment & Advertising | Riding the Digital Wave
Entertainment & Advertising | Riding the Digital WaveEntertainment & Advertising | Riding the Digital Wave
Entertainment & Advertising | Riding the Digital Wave
 
Swiggy presentation
Swiggy presentationSwiggy presentation
Swiggy presentation
 
7 Ways Sports Teams Win With Sports Analytics
7 Ways Sports Teams Win With Sports Analytics7 Ways Sports Teams Win With Sports Analytics
7 Ways Sports Teams Win With Sports Analytics
 

Similar to IPL Data Analysis using Data Science

IRJET-V8I11270.pdf
IRJET-V8I11270.pdfIRJET-V8I11270.pdf
IRJET-V8I11270.pdf
ShubhamSharma2566
 
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
IRJET Journal
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Simplilearn
 
Stock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised LearningStock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised Learning
Sharvil Katariya
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
IRJET Journal
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning Prediction
IRJET Journal
 
Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)Eric Choi
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
PATHALAMRAJESH
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
IRJET Journal
 
Cricket score and winner predictor
Cricket score and winner predictorCricket score and winner predictor
Cricket score and winner predictor
KeyaShukla3
 
Cricket 2
Cricket 2Cricket 2
La liga 2013 2014 analysis
La liga 2013 2014 analysisLa liga 2013 2014 analysis
La liga 2013 2014 analysis
Ritu Sarkar
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET Journal
 
Predicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesPredicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining Techniques
IJCSIS Research Publications
 
User Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayUser Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayAhmed Hassan
 
Machine Learning Foundations Project Presentation
Machine Learning Foundations Project PresentationMachine Learning Foundations Project Presentation
Machine Learning Foundations Project Presentation
Amit J Bhattacharyya
 
House Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachHouse Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachYusuf Uzun
 
Chap13 intro to multiple regression
Chap13 intro to multiple regressionChap13 intro to multiple regression
Chap13 intro to multiple regression
Uni Azza Aunillah
 
BIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNINGBIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNING
IRJET Journal
 

Similar to IPL Data Analysis using Data Science (20)

IRJET-V8I11270.pdf
IRJET-V8I11270.pdfIRJET-V8I11270.pdf
IRJET-V8I11270.pdf
 
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Stock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised LearningStock Price Trend Forecasting using Supervised Learning
Stock Price Trend Forecasting using Supervised Learning
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning Prediction
 
Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)Data Analytics Project_Eun Seuk Choi (Eric)
Data Analytics Project_Eun Seuk Choi (Eric)
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
 
Cricket score and winner predictor
Cricket score and winner predictorCricket score and winner predictor
Cricket score and winner predictor
 
Cricket 2
Cricket 2Cricket 2
Cricket 2
 
La liga 2013 2014 analysis
La liga 2013 2014 analysisLa liga 2013 2014 analysis
La liga 2013 2014 analysis
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
 
Predicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining TechniquesPredicting Football Match Results with Data Mining Techniques
Predicting Football Match Results with Data Mining Techniques
 
User Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-PlayUser Payment Prediction in Free-to-Play
User Payment Prediction in Free-to-Play
 
Machine Learning Foundations Project Presentation
Machine Learning Foundations Project PresentationMachine Learning Foundations Project Presentation
Machine Learning Foundations Project Presentation
 
House Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachHouse Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN Approach
 
Chap13 intro to multiple regression
Chap13 intro to multiple regressionChap13 intro to multiple regression
Chap13 intro to multiple regression
 
SECh910
SECh910SECh910
SECh910
 
BIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNINGBIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNING
 

Recently uploaded

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 

Recently uploaded (20)

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

IPL Data Analysis using Data Science

  • 1. IPL Data Analysis Kaushal Sanadhya Indraprashta Institute of Information Technology,Delhi Okhla Industrial Estate,Phase III New Delhi,India emailid : kaushal19133@iiitd.ac.in Abstract—Cricket is one of the most celebrated games in the world. With the introduction of Data science and machine learning techniques in the world of cricket, forecasting the score of the match has been established as one of the most challenging problems.Especially in the shortest format T20, score forecasting and analyzing other statistics become more important as every moment is sufficient enough to take the game away from oppo- sition team. Our work develops some crucial predictions using various machine learning models like RandomForestRegressor, Linear regressor , Radius Nearest Neighbors etc. Index Terms—score forecasting,modeling Indian Premier League data,Regression,machine learning I. INTRODUCTION There can be several factors that strongly affect predictions like the current score, wickets in hand, weather conditions, dew factor, pitch condition, etc. We have used a data set of 1,79,079 records consisting of the data for every single ball in IPL matches from the year 2009 to 2019. Significant contributions from this project are as follows: a) Feature construction: We have created new attributes [balls remaining, current score, wickets in hand] that can capture the critical information in the dataset(deliveries.csv) much more efficiently than the original attributes.: b) Final score prediction : predicting the eventual score in the first innings. : II. FEATURE CONSTRUCTION The existing features (Deliveries.csv) like over, ball,is super over,wide runs,non-stricker, etc. are not good enough to make confident, reliable predictions for a final score, Therefore new features score are created as follows: • balls remaining: number of balls remaining in the first innings of the match • current score: current score of the team. • wickets remaining: wickets in hand for the team. 
• final score: final score of that team in that match this is the target variable which we are trying to forecast. When these newly created features are used to predict the final score, we obtained some handsome value of R square for different machine learning models used in the project III. PREDICTING FIRST INNINGS SCORES Score prediction for the first innings is a typical multiple regression problem since the output is the forecasted score. Data set of size 10,315 records is used to train our model. A. Training Data Match data for the following teams is used to train our machine learning models: • Chennai Super Kings • Hyderabad Sun Risers • Mumbai Indians Training data size is 10,316 records are used . B. Test Data Match data for Kolkata Knight Riders is used for Testing purposes. To make predictions for the final score of the Kolkata team, every time the final score for ten random matches is selected, which is further compared with the predicted score by our machine learning models. IV. REGRESSION MODELS USED A. Multivariate Linear Regression Most of the cricket problems that are encountered will have more than two variables. Therefore Multivariate Linear regression is used to fit the line in our multi dimensional space. Using this regression model we can even draw the impact of each feature on the predicted score using the below well know equation for linear regression: Y = a + b*X1 + c*X2 + d*X3 For our analysis we will get the below equation: Final Score = 27.915 + 0.989121 * current score + 1.183421 * balls remaining - 3.576307 * wickets Actual Score Predicted Score 150 154 119 114 204 210 130 131 155 152 160 169 222 232 163 160 223 231 148 149
1) Performance Evaluation: Mean Absolute Error and Root Mean Squared Error are the two metrics used to evaluate performance. The following values are achieved:
Mean Absolute Error: 5.13
Root Mean Squared Error: 6.02
These errors are moderately high: the actual and predicted scores differ by as much as 10 runs. Performance can be improved by using more advanced regression models such as AdaBoost or Random Forest Regression.

B. Random Forest Regression
Random Forest Regression is an ensemble technique that makes use of multiple prediction models and combines their results to give more accurate predictions.

Actual Score    Predicted Score
150             150
119             121
204             201
130             132
155             155
160             160
222             220
163             163
223             221
148             148

1) Importance Associated With Various Features: The feature importance values indicate how much each feature contributes to the prediction.

Feature            Importance
current score      0.47
balls remaining    0.30
wickets            0.23

2) Performance Evaluation: The Mean Absolute Error and Root Mean Squared Error are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.69
These values are far better than those achieved with multivariate linear regression, which highlights the power of ensemble regression models. The improvement is also visible in the difference between the predicted and actual scores, which is at most 3 runs.

C. Radius Neighbors Regression
This regression model is based on the concept of K nearest neighbors. Instead of using a fixed number K of neighbors, Radius Neighbors Regression uses the neighbors that lie within a specific distance (here, Manhattan distance).

Actual Score    Predicted Score
150             150
119             121
204             204
130             132
155             157
160             163
222             221
163             164
223             222
148             149

1) Performance Evaluation: The Mean Absolute Error and Root Mean Squared Error for the model are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.79
These values are better than those of the multivariate linear regression model and comparable to the Random Forest error values.

D. Comparison of R square Values
The R square values of the regression models are shown in the graph below.
[Figure: R square values of the three regression models]
The high R square values for Random Forest and Radius Nearest Neighbors indicate that these models explain more of the variability of the response data around its mean than the Multivariate Linear Regression model.

E. Comparison of Mean Absolute Error
The Mean Absolute Error is the mean, over all matches, of the absolute difference between the actual and the predicted score. For n matches with actual scores y_i and predicted scores \hat{y}_i, the mathematical formula is given below:
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \]
Where MAE stands for Mean Absolute Error.
This difference is 6-7 runs for Multivariate Linear Regression and 1-5 runs for Random Forest and Radius Nearest Neighbor. The graph depicting these error values for all three models is shown below:
[Figure: Mean Absolute Error of the three regression models]

F. Comparison of Root Mean Squared Error
The Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors. For n matches with actual scores y_i and predicted scores \hat{y}_i, the mathematical formula is given below:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \]
The following graph compares the Root Mean Squared Error values.
[Figure: Root Mean Squared Error of the three regression models]
The Root Mean Squared Error and Mean Absolute Error graphs are similar to each other. Both show that Random Forest and Radius Nearest Neighbor Regression perform better than the Multivariate Linear Regression model.