This project uses logistic regression to build a cricket match win predictor. It analyzes match and ball-by-ball data to extract important features, performs exploratory data analysis to derive additional predictive features, and fits a logistic regression model to predict the winning probability of teams based on the game situation. The model achieves an accuracy of 86% on the test data. Future work includes predicting the winner based only on the first innings and adding a user interface to allow custom predictions.
CRICKET MATCH WIN PREDICTOR USING LOGISTIC REGRESSION.pptx
1. CRICKET MATCH WIN PREDICTOR USING LOGISTIC REGRESSION
Under the Supervision of:
Mrs. Teressa Longjam
Team Members:
V. Aravind Reddy
V. Yaswanth Reddy
K. Praveen
B. Satyanarayana
Department of Computer Science and Engineering
2. Contents
● Abstract
● Introduction
● Flow chart
● Outline of the Project and Software tools
● Data and Features
● Sigmoid Function
● Intuition
● Logistic Regression
● Exploratory Data Analysis (Part 1, Part 2)
● Model Fitting
● Performance Metrics
● References
● Conclusion and Future Work
3. Abstract
This project aims to find the features that most accurately predict the probability of a team winning or losing. It also shows how stochastic gradient descent is used to update the weights and learn the best linear combination of features. We fit the model with a scikit-learn Pipeline, using a ColumnTransformer inside the pipeline so that columns of different data types are processed in one step.
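Below is a minimal sketch of the kind of pipeline the abstract describes, not the project's actual code. The column names follow the feature list later in the deck and are assumptions; SGDClassifier with log loss is logistic regression trained with stochastic gradient descent, the optimizer named above.

```python
# Sketch only: column names and file name are assumptions for illustration.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

categorical = ["batting_team", "bowling_team", "city"]              # assumed
numeric = ["runs_left", "balls_left", "wickets",
           "total_runs", "cur_run_rate", "req_run_rate"]            # assumed

preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",   # numeric columns are passed through unchanged
)

# Logistic regression trained by stochastic gradient descent.
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("clf", SGDClassifier(loss="log_loss", max_iter=1000)),
])

# df = pd.read_csv("final_features.csv")                 # hypothetical file
# model.fit(df[categorical + numeric], df["result"])
# win_prob = model.predict_proba(df[categorical + numeric])[:, 1]
```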
4. Introduction:
● We find important features by merging the two data frames, performing Exploratory Data Analysis on the result, and fitting a Logistic Regression model to the data to obtain the winning probability of either team.
● Given the first-innings score and the current situation of the second innings, the model predicts the winning probability of both teams.
❖ Objectives of the Project
6. ❖ Outline of the Project and Software Tools:
● This project shows how Exploratory Data Analysis can be used to derive important features, and how a suitable machine learning algorithm can then be fitted to build an application that predicts the winning probability of a cricket team.
● Feature (variable) importance indicates how much each feature contributes to the model's prediction; it measures how useful a specific variable is to the current model.
● The data consists of two CSV data frames collected from Kaggle, describing the matches and the ball-by-ball deliveries respectively.
● For software, we use the Google Colab environment together with the NumPy, pandas, and scikit-learn Python libraries.
7. Data
● The code below shows the shapes of the two data frames, i.e. the matches dataframe and the deliveries dataframe.
● The matches dataframe contains 756 matches described by 18 features.
● The deliveries dataframe contains almost 1.8 lakh (about 180,000) ball-by-ball records described by 21 features.
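The code screenshot on the original slide is not preserved in this text; the following is a hedged sketch of the shape check, assuming the two Kaggle files are named matches.csv and deliveries.csv.

```python
# Sketch of the shape check; file names are assumptions about the Kaggle data.
import pandas as pd

matches = pd.read_csv("matches.csv")        # assumed file name
deliveries = pd.read_csv("deliveries.csv")  # assumed file name

print(matches.shape)      # roughly (756, 18): 756 matches, 18 features
print(deliveries.shape)   # roughly (179000, 21): ball-by-ball records
```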
12. Logistic Regression
● Logistic Regression is a classifier that can be applied in single-label or multi-label classification setups.
● Logistic Regression is a discriminative classifier.
● It obtains the probability of a sample belonging to a specific class by computing the sigmoid (a.k.a. logistic function) of a linear combination of the features.
● The weight vector for the linear combination is learnt during model training.
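A small illustrative example (not from the project) of how the sigmoid turns a linear combination of features into a class probability; the weights and feature values below are made up.

```python
# Illustrative only: sigmoid of a linear combination gives a probability.
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.5, 0.3])   # learnt weight vector (illustrative)
b = -0.1                         # bias term (illustrative)
x = np.array([1.2, 0.7, 2.0])    # feature values of one sample

z = np.dot(w, x) + b             # linear combination of features
print(sigmoid(z))                # probability of the positive class
```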
13. Exploratory Data Analysis (Part 1)
● The main goal of the EDA is to extract a single data frame with important features from the two given data frames.
● In the deliveries data frame, we first group the runs by match_id and inning, so that each match's runs are summed per innings.
● The target is then calculated as the number of runs in the first innings + 1.
● Since the two data frames share the match_id column, we merge them on that id.
● After merging, we discard matches decided by the DL method, matches abandoned due to rain, and rows with missing data points.
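A hedged sketch of these grouping and merging steps. The column names (id, match_id, inning, total_runs, dl_applied, winner) follow the public Kaggle IPL files and are assumptions; the project's exact code may differ.

```python
# Sketch of EDA Part 1: first-innings totals, target, merge, and filtering.
import pandas as pd

matches = pd.read_csv("matches.csv")
deliveries = pd.read_csv("deliveries.csv")

# Runs scored in the first innings of each match, then target = runs + 1.
first_innings = (
    deliveries[deliveries["inning"] == 1]
    .groupby("match_id")["total_runs"].sum()
    .reset_index()
    .rename(columns={"total_runs": "first_innings_score"})
)
first_innings["target"] = first_innings["first_innings_score"] + 1

# Merge on the shared match id, then drop DL / abandoned / incomplete matches.
merged = matches.merge(first_innings, left_on="id", right_on="match_id")
merged = merged[merged["dl_applied"] == 0].dropna(subset=["winner"])
```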
14. Exploratory Data Analysis (Part 2)
● In Part 2 of the analysis, we construct the cumulative score after every ball.
● From this cumulative score we can calculate the runs remaining, the required run rate, and other important features that are useful in predicting the probability.
● We then compute features such as cur_run_rate, req_run_rate, balls_left, and wickets_left using the formulae given below.
● balls_left = 126 - (over*6 + current_ball)
● cur_run_rate = (current_score*6) / (120 - balls_left)
● req_run_rate = (runs_left*6) / balls_left
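Continuing the same sketch, the formulas above can be applied to the second-innings deliveries; again, the column names (inning, over, ball, total_runs) are assumptions based on the public Kaggle files.

```python
# Sketch of EDA Part 2: cumulative score and derived rate features.
second = deliveries[deliveries["inning"] == 2].merge(first_innings, on="match_id")

# Cumulative score after every ball of the chase.
second["current_score"] = second.groupby("match_id")["total_runs"].cumsum()
second["runs_left"] = second["target"] - second["current_score"]

# Formulas exactly as stated on the slide.
second["balls_left"] = 126 - (second["over"] * 6 + second["ball"])
second["cur_run_rate"] = (second["current_score"] * 6) / (120 - second["balls_left"])
second["req_run_rate"] = (second["runs_left"] * 6) / second["balls_left"]
```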
15. Features presently derived through the EDA
● Batting-team
● Bowling-team
● City
● Runs-left
● Balls-left
● Wickets
● Total_runs
● Required_run_rate
● Cur_run_rate
● result
17. Performance Analysis
● To evaluate the performance of the model, we use accuracy_score as the metric from the sklearn library.
● accuracy_score = (total number of correct predictions) / (total number of samples)
● The accuracy score of the model was 86%.
● In other words, the model predicts the correct match outcome in 86% of the test cases.
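A minimal sketch of this evaluation step, assuming `model` is the pipeline from the abstract sketch and `X`, `y` are the feature and label data built in the EDA.

```python
# Sketch of train/test split, fitting, and accuracy evaluation.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))   # fraction of correctly predicted outcomes
```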
18. Overfitting and Underfitting
● Bias: the assumptions a model makes to make the function easier to learn. In practice it shows up as the error rate on the training data: a high training error indicates high bias, and a low training error indicates low bias.
● Variance: the difference between the error rate on the training data and on the testing data. If the difference is large we call it high variance; if it is small we call it low variance. We usually want low variance so that the model generalizes well.
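An illustrative way to inspect these two quantities, reusing the split and `model` from the evaluation sketch (all names assumed): the training error hints at bias, and the train/test gap hints at variance.

```python
# Illustrative bias/variance check on the assumed train/test split.
from sklearn.metrics import accuracy_score

train_err = 1 - accuracy_score(y_train, model.predict(X_train))
test_err = 1 - accuracy_score(y_test, model.predict(X_test))

print(f"training error (bias indicator):     {train_err:.3f}")
print(f"train/test gap (variance indicator): {test_err - train_err:.3f}")
```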
19. Underfitting
● A statistical model or machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data, i.e. it performs poorly even on the training data and therefore also on the testing data (like trying to fit into undersized pants).
● Reasons for underfitting:
1. High bias and low variance.
2. The training dataset is too small.
3. The model is too simple.
4. The training data is not cleaned and contains noise.
20. Techniques to reduce underfitting
● Increase model complexity.
● Increase the number of features using feature engineering.
● Remove noise from the data.
● Increase the number of epochs, or train for longer, to get better results.
22. Overfitting
● A statistical model is said to be overfitted when it fits the training data so closely that it does not make accurate predictions on test data.
● When the model is trained on too much detail, it starts learning the noise and inaccurate entries in the data set.
● The model then fails to categorize new data correctly, because it has captured too many details and too much noise.
● Reasons for overfitting:
● High variance and low bias, a model that is too complex, and an insufficient amount of training data.
23. Techniques for reducing overfitting
● Increase the amount of training data.
● Reduce the model complexity.
● Stop training early (early stopping).
● Use regularization (a sketch of tuning the regularization strength follows below).
● Other techniques for reducing overfitting.
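For the regularization point, a hedged sketch of tuning the regularization strength: in SGDClassifier the `alpha` parameter controls the L2 penalty, and a small grid search over candidate values is one common way to pick it. The values shown are illustrative, and the feature matrix name is hypothetical.

```python
# Sketch of tuning regularization strength with a grid search.
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"alpha": [1e-5, 1e-4, 1e-3, 1e-2]}   # candidate penalty strengths
search = GridSearchCV(
    SGDClassifier(loss="log_loss", max_iter=1000),
    param_grid,
    scoring="accuracy",
    cv=5,
)
# search.fit(X_train_encoded, y_train)   # hypothetical pre-encoded features
# print(search.best_params_)
```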
24. Conclusion and Future Work
● This project shows how important feature extraction is, and how a machine learning model fitted on the extracted features can be used to build useful applications.
● In the future, the project can be extended to predict the win probability from the first innings alone.
● Datasets of previous matches can be used to make this first-innings prediction.
● To allow custom input and prediction, we are designing a front end where the values of the derived features can be entered to obtain the probability.
25. References
● Ananda Bandulasiri, “Predicting the Winner in One Day International Cricket,” Journal of Mathematical Sciences & Mathematics Education.
● Tejinder Singh, Vishal Singla and Parteek Bhatia, “Score and Winning Prediction in Cricket through Data Mining,” 8 October 2015.