CRICKET MATCH WIN PREDICTOR USING LOGISTIC REGRESSION
Under the supervision of:
Mrs. Teressa Longjam
Team Members:
V. Aravind Reddy
V. Yaswanth Reddy
K. Praveen
B. Satyanarayana
Department of Computer Science and Engineering
Contents
● Abstract
● Introduction
● Flow chart
● Outline of the Project and Software tools
● Data and Features
● Sigmoid Function
● Intuition
● Logistic Regression
● Exploratory Data Analysis (Part 1, Part 2)
● Model Fitting
● Performance Metrics
● References
● Conclusion and Future Work
Abstract
This project aims to find the features that most accurately predict the
probability of a team winning or losing. It also shows how the stochastic
gradient descent optimization technique is used to update the weights and
obtain the best linear combination of features. The model is fitted with a
scikit-learn pipeline, in which a ColumnTransformer processes the columns of
different data types in a single step.
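The weight update described above can be sketched from scratch as follows. This is an illustrative NumPy implementation of per-sample SGD on the logistic (log) loss, run on toy data; the helper name `sgd_logistic`, the learning rate, and the number of epochs are assumptions, not the project's settings. Note that scikit-learn's `LogisticRegression` uses the `lbfgs` solver by default; in recent scikit-learn versions, `SGDClassifier(loss="log_loss")` is the estimator that fits logistic regression with SGD.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=200, seed=0):
    """Fit logistic-regression weights with plain per-sample SGD (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # visit samples in a random order
            p = sigmoid(X[i] @ w + b)          # predicted P(y = 1 | x_i)
            error = p - y[i]                   # gradient of the log-loss w.r.t. the linear score
            w -= lr * error * X[i]             # step against the gradient
            b -= lr * error
    return w, b

# Toy data: one feature; the label is 1 when the feature is positive.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = sgd_logistic(X, y)
probs = sigmoid(X @ w + b)
print(np.round(probs, 2))
```

Each update moves the weights in proportion to the prediction error `p - y`, so well-classified samples barely change the model while misclassified ones push it harder.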
Introduction:
● We identify the important features by merging the two data frames,
performing exploratory data analysis on the result, and fitting a logistic
regression model to obtain the winning probability of each team.
● Given the first-innings score and the current state of the second innings,
the model predicts the winning probability of both teams.
❖ Objectives of the Project
❖ Flowchart
❖ Outline of the Project and Software Tools:
● This project shows how exploratory data analysis can be used to derive
important features, and how a suitable machine learning algorithm can then be
used to build an application that predicts the winning probability of a cricket team.
● Feature (variable) importance indicates how much each feature contributes to the
model's prediction, i.e., how useful a specific variable is for the current model
and its predictions.
● The data consists of two CSV files collected from Kaggle: one describing the
matches and one containing ball-by-ball data.
● The project is developed in the Google Colab environment using the NumPy,
pandas, and scikit-learn Python libraries.
Data
● The shapes of the two data frames, i.e., the matches data frame and the
deliveries data frame, are as follows.
● The matches data frame contains 756 matches described by 18 features.
● The deliveries data frame contains about 1.8 lakh (roughly 180,000) deliveries
described by 21 features.
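A minimal sketch of how the two files might be loaded in Colab and their shapes printed. The file names `matches.csv` and `deliveries.csv` and the helper `load_ipl_data` are assumptions; the guard simply skips the printout when the files are not in the working directory.

```python
import os
import pandas as pd

def load_ipl_data(matches_path="matches.csv", deliveries_path="deliveries.csv"):
    """Read the matches and ball-by-ball CSV files (file names are assumptions)."""
    matches = pd.read_csv(matches_path)
    deliveries = pd.read_csv(deliveries_path)
    return matches, deliveries

# Only run when both files are present in the working directory (e.g. uploaded to Colab).
if os.path.exists("matches.csv") and os.path.exists("deliveries.csv"):
    matches, deliveries = load_ipl_data()
    print(matches.shape)     # (rows, columns); the slides report 756 rows and 18 features
    print(deliveries.shape)  # the slides report about 1.8 lakh rows and 21 features
```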
Features
● Here we show the features of the matches data frame.
Features of the deliveries data frame
● Here we show the features of the deliveries data frame.
Sigmoid function
Logistic Regression
● Logistic regression is a classifier that can be applied in binary or
multi-class classification settings.
● Logistic regression is a discriminative classifier: it models the probability
of a class given the features directly.
● It obtains the probability of a sample belonging to a specific class by
applying the sigmoid (logistic) function to a linear combination of the features.
● The weight vector of this linear combination is learned during model training.
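To make the last two bullets concrete, the sketch below computes a class probability as the sigmoid of a weighted sum of features. The weights, bias, and feature values are hypothetical numbers chosen for easy arithmetic: z = 1.5*1.0 - 2.0*0.5 + 0.25 = 0.75, and sigmoid(0.75) ≈ 0.679.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: squashes a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """P(class = 1 | features) = sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical learned weights for two made-up, already-scaled features.
weights = [1.5, -2.0]
bias = 0.25
p = predict_proba([1.0, 0.5], weights, bias)  # z = 1.5 - 1.0 + 0.25 = 0.75
print(round(p, 3))  # 0.679
```

A score of exactly 0 maps to a probability of 0.5, which is why 0.5 is the natural decision threshold for a binary logistic model.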
Exploratory Data Analysis (Part 1)
● The main goal of the EDA is to extract, from the two given data frames, a single
data frame containing the important features.
● First, in the deliveries data frame, we sum the runs of every match per innings
(grouping by match_id and inning).
● The target for the chasing team is then the first-innings total plus 1.
● Since both data frames share the match_id column, we merge them on that id.
● After merging, we exclude matches decided by the Duckworth-Lewis (D/L) method,
matches abandoned due to rain, and rows with missing data.
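The Part 1 steps can be sketched in pandas as below. The column names (`id`, `match_id`, `inning`, `total_runs`, `dl_applied`, `city`, `winner`) are assumed to follow the Kaggle IPL files, where the matches file calls its identifier `id`; treating a missing `winner` as a no-result match and dropping rows with a missing `city` are likewise assumptions about how the missing data was handled. The function name `build_match_table` is hypothetical.

```python
import pandas as pd

def build_match_table(matches: pd.DataFrame, deliveries: pd.DataFrame) -> pd.DataFrame:
    """Attach each match's chase target, then drop D/L-affected and incomplete matches.

    Column names follow the Kaggle IPL files and are assumptions here.
    """
    # Total runs scored in every innings of every match.
    innings_totals = (
        deliveries.groupby(["match_id", "inning"])["total_runs"].sum().reset_index()
    )
    first = innings_totals[innings_totals["inning"] == 1].copy()
    first["target"] = first["total_runs"] + 1  # runs the chasing team needs to win
    # The matches file identifies a match by "id"; deliveries uses "match_id".
    merged = matches.merge(first[["match_id", "target"]], left_on="id", right_on="match_id")
    # Drop matches decided by Duckworth-Lewis, then no-result games and rows with gaps.
    merged = merged[merged["dl_applied"] == 0]
    return merged.dropna(subset=["city", "winner"])
```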
Exploratory Data Analysis (Part 2)
● In Part 2 we construct the cumulative score after every ball of the second innings.
● From this cumulative score we compute the runs left, the required run rate, and
other features that are useful for predicting the probability.
● We then compute cur_run_rate, req_run_rate, balls_left, and wickets_left using the
formulae below. The balls_left formula assumes that over and current_ball both start
at 1, so the first ball of the innings leaves 126 - (1*6 + 1) = 119 balls.
● runs_left = target - current_score
● wickets_left = 10 - wickets fallen so far
● balls_left = 126 - (over*6 + current_ball)
● cur_run_rate = (current_score*6) / (120 - balls_left)
● req_run_rate = (runs_left*6) / balls_left
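The Part 2 features can be computed per ball roughly as follows. This sketch assumes a second-innings deliveries frame that already carries the chase `target` (for example from the Part 1 merge) and uses assumed column names (`match_id`, `over`, `ball`, `total_runs`, `player_dismissed`); the function name `add_chase_features` is hypothetical. If the data numbers re-bowled wides and no-balls as extra balls in the over, ball values above 6 occur, so `balls_left` is approximate and can reach 0, which makes `req_run_rate` infinite; such rows would need filtering before training.

```python
import pandas as pd

def add_chase_features(second: pd.DataFrame) -> pd.DataFrame:
    """Add per-ball chase features to second-innings deliveries (assumed layout)."""
    df = second.copy()
    # Running score within each match, ball by ball.
    df["current_score"] = df.groupby("match_id")["total_runs"].cumsum()
    df["runs_left"] = df["target"] - df["current_score"]
    # 'over' and 'ball' are assumed 1-indexed, so over 1 / ball 1 leaves 119 balls.
    df["balls_left"] = 126 - (df["over"] * 6 + df["ball"])
    # A non-missing 'player_dismissed' marks a wicket; count wickets cumulatively per match.
    fallen = df["player_dismissed"].notna().astype(int)
    df["wickets_left"] = 10 - fallen.groupby(df["match_id"]).cumsum()
    balls_bowled = 120 - df["balls_left"]
    df["cur_run_rate"] = df["current_score"] * 6 / balls_bowled
    # Division by zero here yields inf in pandas rather than raising an error.
    df["req_run_rate"] = df["runs_left"] * 6 / df["balls_left"]
    return df
```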
Features derived through EDA
● Batting team
● Bowling team
● City
● Runs left
● Balls left
● Wickets
● Total runs
● Required run rate
● Current run rate
● Result (the label to be predicted)
Model Fitting
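A minimal sketch of the model-fitting step described in the abstract: a scikit-learn Pipeline whose ColumnTransformer one-hot encodes the categorical columns and scales the numeric ones before logistic regression. The column names and the six made-up training rows are assumptions for illustration; in practice the pipeline would be fitted on a training split (for example via `sklearn.model_selection.train_test_split`) and evaluated on held-out rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names are assumptions modelled on the features listed on the EDA slides.
CATEGORICAL = ["batting_team", "bowling_team", "city"]
NUMERIC = ["runs_left", "balls_left", "wickets_left", "target", "cur_run_rate", "req_run_rate"]

def build_pipeline() -> Pipeline:
    """One-hot encode categorical columns, scale numeric ones, then fit logistic regression."""
    preprocess = ColumnTransformer([
        ("categorical", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("numeric", StandardScaler(), NUMERIC),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])

# Made-up rows for illustration only; result = 1 means the batting (chasing) team won.
X = pd.DataFrame({
    "batting_team": ["A", "B", "A", "B", "A", "B"],
    "bowling_team": ["B", "A", "B", "A", "B", "A"],
    "city": ["X", "Y", "X", "Y", "Y", "X"],
    "runs_left": [10, 90, 20, 80, 5, 100],
    "balls_left": [60, 20, 50, 30, 40, 10],
    "wickets_left": [8, 2, 7, 3, 9, 1],
    "target": [150, 160, 155, 170, 140, 180],
    "cur_run_rate": [14.0, 4.2, 11.6, 6.0, 10.1, 4.4],
    "req_run_rate": [1.0, 27.0, 2.4, 16.0, 0.75, 60.0],
})
y = [1, 0, 1, 0, 1, 0]

pipe = build_pipeline().fit(X, y)
# Column 1 of predict_proba is P(result = 1), i.e. the batting team's win probability.
p_batting = pipe.predict_proba(X.iloc[[0]])[0, 1]
p_bowling = 1.0 - p_batting  # the two teams' probabilities sum to 1
print(f"batting team: {p_batting:.2f}, bowling team: {p_bowling:.2f}")
```

Putting the encoder and scaler inside the pipeline means the same preprocessing is applied automatically at prediction time, for example when a front end supplies a single match situation.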
Performance Analysis
● To evaluate the model, we used accuracy_score from the scikit-learn library as
the metric.
● accuracy_score = (number of correct predictions) / (total number of samples).
● The model's accuracy score was 86%.
● This means that, once each predicted probability is converted into a win/loss
prediction (the more probable outcome), the model predicts the correct result for
86% of the evaluated samples. Accuracy measures correct class labels, not whether
the probability values themselves are correct.
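The sketch below, with made-up probabilities and labels, shows what accuracy measures: each probability is first turned into a win/loss prediction with a 0.5 threshold, and accuracy counts how many of those predictions match the true results. The manual formula and scikit-learn's `accuracy_score` give the same value.

```python
from sklearn.metrics import accuracy_score

# Hypothetical predicted win probabilities for the batting team, and the true results.
probs = [0.91, 0.35, 0.62, 0.48, 0.10]
y_true = [1, 0, 0, 0, 0]

# A probability becomes a win/loss prediction by thresholding at 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # [1, 0, 1, 0, 0]

# accuracy = correct predictions / total samples
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual, accuracy_score(y_true, y_pred))  # 0.8 0.8
```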
Overfitting and Underfitting
● Bias: the error introduced by the simplifying assumptions a model makes to make
the target function easier to learn. In practice it shows up as the error rate on
the training data: a high training error indicates high bias, and a low training
error indicates low bias.
● Variance: here measured as the difference between the error rate on the training
data and on the test data. A large difference indicates high variance, and a small
difference indicates low variance. We usually want low variance so that the model
generalizes well.
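As a rough illustration of these definitions, the hypothetical helper below reads bias from the training error and variance from the gap between test and training error. The 0.2 and 0.1 thresholds are arbitrary values for the example, not standard constants.

```python
def diagnose(train_acc: float, test_acc: float,
             high_error: float = 0.2, high_gap: float = 0.1) -> str:
    """Rough bias/variance reading from train and test accuracy (illustrative thresholds)."""
    train_err = 1.0 - train_acc
    gap = (1.0 - test_acc) - train_err  # variance proxy: test error minus training error
    if train_err > high_error:
        return "high bias (underfitting)"
    if gap > high_gap:
        return "high variance (overfitting)"
    return "reasonable fit"

print(diagnose(0.70, 0.68))  # high bias (underfitting)
print(diagnose(0.99, 0.75))  # high variance (overfitting)
print(diagnose(0.90, 0.87))  # reasonable fit
```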
Underfitting
● A statistical model or machine learning algorithm underfits when it cannot
capture the underlying trend of the data, i.e., it performs poorly on the training
data as well as on the test data (just like trying to fit into undersized pants).
● Reasons for underfitting:
1. High bias and low variance.
2. The training data set is too small.
3. The model is too simple.
4. The training data is not cleaned and contains noise.
Techniques to reduce underfitting
● Increase model complexity
● Increase the number of features using feature engineering.
● Remove noise from the data.
● Increase the number of epochs or the duration of training.
Overfitting and underfitting (illustration)
Overfitting
● A statistical model is said to overfit when it fits the training data well but
does not make accurate predictions on the test data.
● When a model is trained for too long or is too flexible for the amount of data
available, it starts learning the noise and inaccurate entries in the data set.
● As a result, the model does not categorize new data correctly, because it has
learned too many details and too much noise.
● Reasons for overfitting:
● High variance and low bias, a model that is too complex, and a training data set
that is too small.
Techniques for reducing overfitting
● Increase the training data
● Reduce the model complexity
● Early stopping during the training phase.
● Using Regularization
● Other techniques, such as cross-validation and feature selection.
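Regularization, listed above, is built into scikit-learn's `LogisticRegression`: the `C` parameter is the inverse of the regularization strength (an L2 penalty by default). The sketch below fits the model on synthetic data with one informative feature and four noise features and shows that smaller `C` values shrink the learned weights; the data and the `C` values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first column is informative; the other four columns are pure noise.
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

norms = {}
for C in (100.0, 1.0, 0.01):
    # C is the inverse of the L2 penalty strength: smaller C means stronger regularization.
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C:g}: weight norm = {norms[C]:.3f}")
```

Shrinking the weights limits how strongly the model can react to the noise columns, which is how regularization trades a little bias for lower variance.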
Conclusion and Future Work
● This project shows how important feature extraction is, and how a machine
learning model built on the extracted features can power a useful application.
● In the future, the project could be extended to predict the win probability from
the first innings itself, using datasets of previous matches.
● To accept custom input and predict the result, we are designing a front end in
which the user can enter the values of all derived features to obtain the
probability.
References
● Ananda Bandulasiri, “Predicting the Winner in One Day International Cricket,”
Journal of Mathematical Sciences & Mathematics Education.
● Tejinder Singh, Vishal Singla, and Parteek Bhatia, “Score and Winning Prediction
in Cricket through Data Mining,” 8 October 2015.
THANK YOU
