IPL Data Analysis
Kaushal Sanadhya
Indraprastha Institute of Information Technology, Delhi
Okhla Industrial Estate, Phase III
New Delhi, India
Email: kaushal19133@iiitd.ac.in
Abstract: Cricket is one of the most celebrated games in the world. With the introduction of data science and machine learning techniques into cricket, forecasting the score of a match has become one of the most challenging problems. In the shortest format, T20, score forecasting and the analysis of other statistics matter even more, because a single moment can be enough to take the game away from the opposition. Our work develops several key predictions using machine learning models such as RandomForestRegressor, multivariate linear regression and Radius Nearest Neighbors regression.
Index Terms: score forecasting, modeling Indian Premier League data, regression, machine learning
I. INTRODUCTION
Several factors can strongly affect such predictions, including the current score, wickets in hand, weather conditions, dew and pitch condition. We have used a data set of 179,079 records containing the data for every single ball bowled in IPL matches from 2009 to 2019. The significant contributions of this project are as follows:
a) Feature construction: We have created new attributes (balls remaining, current score, wickets in hand) that capture the critical information in the dataset (deliveries.csv) much more efficiently than the original attributes.
b) Final score prediction: predicting the eventual score in the first innings.
II. FEATURE CONSTRUCTION
The existing features in deliveries.csv, such as over, ball, is_super_over, wide_runs and non_striker, are not sufficient for confident, reliable predictions of the final score. Therefore, new features are created as follows:
• balls remaining: number of balls remaining in the first
innings of the match
• current score: current score of the team.
• wickets remaining: wickets in hand for the team.
• final score: final score of that team in that match; this is the target variable we are trying to forecast.
When these newly created features are used to predict the final score, the different machine learning models used in the project achieve high R-squared values.
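The feature construction described above can be sketched with pandas. This is a minimal sketch, not the author's code: it assumes the Kaggle deliveries.csv column names (match_id, inning, over, ball, wide_runs, noball_runs, total_runs, player_dismissed), counts only legal deliveries (no wides or no-balls) against the 120-ball quota, and treats every non-empty player_dismissed entry as a wicket. The function name build_features is hypothetical.

```python
import pandas as pd


def build_features(deliveries: pd.DataFrame) -> pd.DataFrame:
    """Derive per-ball features for first-innings score prediction.

    Assumes Kaggle-style columns: match_id, inning, over, ball, wide_runs,
    noball_runs, total_runs and player_dismissed (empty when no wicket fell).
    """
    # Keep first-innings deliveries only, in the order they were bowled.
    df = deliveries[deliveries["inning"] == 1].sort_values(["match_id", "over", "ball"]).copy()
    key = df["match_id"]

    # Only legal deliveries (no wide, no no-ball) use up the 120-ball quota.
    legal = ((df["wide_runs"] == 0) & (df["noball_runs"] == 0)).astype(int)
    df["balls_remaining"] = 120 - legal.groupby(key).cumsum()

    # Running total of runs after each delivery.
    df["current_score"] = df["total_runs"].groupby(key).cumsum()

    # Wickets in hand: 10 minus the dismissals so far.
    wicket = df["player_dismissed"].notna().astype(int)
    df["wickets_remaining"] = 10 - wicket.groupby(key).cumsum()

    # Target: the innings total, repeated on every ball of that innings.
    df["final_score"] = df["total_runs"].groupby(key).transform("sum")

    return df[["match_id", "balls_remaining", "current_score",
               "wickets_remaining", "final_score"]].reset_index(drop=True)
```

Usage would look like `features = build_features(pd.read_csv("deliveries.csv"))`, which yields one row per first-innings delivery with the three inputs and the target.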
III. PREDICTING FIRST INNINGS SCORES
Score prediction for the first innings is a typical multiple regression problem, since the output (the forecasted score) is a continuous quantity predicted from several input features. A data set of 10,315 records is used to train our models.
A. Training Data
Match data for the following teams is used to train our
machine learning models:
• Chennai Super Kings
• Sunrisers Hyderabad
• Mumbai Indians
The training data contains 10,316 records.
B. Test Data
Match data for Kolkata Knight Riders is used for testing. To evaluate predictions of the Kolkata team's final score, ten matches are selected at random and their actual final scores are compared with the scores predicted by our machine learning models.
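A minimal sketch of this team-based split, assuming a per-ball DataFrame with batting_team and match_id columns. The function name team_split, the team-name spellings and the fixed random seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import pandas as pd

# Team names as they are assumed to appear in a batting_team column.
TRAIN_TEAMS = ["Chennai Super Kings", "Sunrisers Hyderabad", "Mumbai Indians"]
TEST_TEAM = "Kolkata Knight Riders"


def team_split(df: pd.DataFrame, train_teams, test_team, n_test_matches=10, seed=0):
    """Return (train, test): train rows batted by train_teams, test rows from
    n_test_matches randomly chosen matches batted by test_team."""
    train = df[df["batting_team"].isin(train_teams)]
    test_pool = df[df["batting_team"] == test_team]

    # Sample whole matches, not individual balls, so each test match stays intact.
    rng = np.random.default_rng(seed)
    matches = test_pool["match_id"].unique()
    chosen = rng.choice(matches, size=min(n_test_matches, len(matches)), replace=False)
    test = test_pool[test_pool["match_id"].isin(chosen)]
    return train, test
```

Sampling at the match level keeps every ball of a chosen test match together, so its final score can be compared with the prediction as a single unit.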
IV. REGRESSION MODELS USED
A. Multivariate Linear Regression
Most cricket problems involve more than two variables. Therefore, multivariate linear regression is used to fit a hyperplane in our multidimensional feature space. With this model we can also read off the impact of each feature on the predicted score, using the well-known linear regression equation:
Y = a + b*X1 + c*X2 + d*X3
For our analysis, the fitted equation is:
Final Score = 27.915 + 0.989121 * current score + 1.183421 * balls remaining - 3.576307 * wickets
Actual Score Predicted Score
150 154
119 114
204 210
130 131
155 152
160 169
222 232
163 160
223 231
148 149
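The coefficients a, b, c and d of the fitted equation correspond to the intercept_ and coef_ attributes of scikit-learn's LinearRegression. As a sketch only: since the paper's training rows are not reproduced here, the example fits synthetic, noise-free rows generated from the reported equation, so the fit simply recovers its coefficients. The column order is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed column order: current score, balls remaining, wickets.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 200, size=50),   # current score
    rng.integers(0, 121, size=50),   # balls remaining
    rng.integers(0, 11, size=50),    # wickets
]).astype(float)

# Synthetic, noise-free targets generated from the reported equation.
y = 27.915 + 0.989121 * X[:, 0] + 1.183421 * X[:, 1] - 3.576307 * X[:, 2]

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # a, then [b, c, d]
```

On real training data the same two attributes would hold the fitted a and the vector [b, c, d].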
1) Performance Evaluation: Mean absolute error and root mean squared error are the two metrics used to evaluate performance. The following values are achieved:
Mean Absolute Error: 5.13
Root Mean Squared Error: 6.02
These values are moderately high: the actual and predicted scores differ by up to 10 runs. Performance could be improved by using more advanced regression models such as AdaBoost or random forest regression.
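Both metrics can be recomputed from the ten-match table above. This sketch uses the rounded scores as printed, so it gives MAE = 5.00 and RMSE = 5.85 rather than the reported 5.13 and 6.02; the gap is presumably due to the unrounded predictions the author evaluated.

```python
import numpy as np

# Actual and predicted first-innings scores from the multivariate linear regression table.
actual = np.array([150, 119, 204, 130, 155, 160, 222, 163, 223, 148], dtype=float)
predicted = np.array([154, 114, 210, 131, 152, 169, 232, 160, 231, 149], dtype=float)

errors = actual - predicted
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")  # MAE = 5.00, RMSE = 5.85
```

Because RMSE squares each error before averaging, the two largest misses (9 and 10 runs) pull it above the MAE.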
B. Random Forest Regression
Random forest regression is an ensemble technique that makes use of multiple prediction models (decision trees) and averages their predictions to give more accurate results.
Actual Score Predicted Score
150 150
119 121
204 201
130 132
155 155
160 160
222 220
163 163
223 221
148 148
1) Importance Associated With Various Features: The feature importance values indicate how much each feature contributes to the model's predictions.
Feature            Importance
Current score      0.47
Balls remaining    0.30
Wickets            0.23
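A minimal scikit-learn sketch of both ideas in this subsection: the forest's prediction is the average of its decision trees' predictions, and feature_importances_ gives impurity-based importances normalized to sum to 1 (as the three values in the table do). The data is synthetic and the feature names are assumptions, so the printed importances will not reproduce the table.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = ["current_score", "balls_remaining", "wickets"]  # assumed column order

# Synthetic stand-in data: the paper's training rows are not reproduced here.
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.uniform(0, 200, 300),   # current score
    rng.uniform(0, 120, 300),   # balls remaining
    rng.integers(0, 11, 300),   # wickets
])
y = 30 + X[:, 0] + 1.2 * X[:, 1] - 3.5 * X[:, 2] + rng.normal(0, 3, 300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Ensemble averaging: the forest's output is the mean of its trees' outputs.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
print(np.allclose(per_tree.mean(axis=0), forest.predict(X[:5])))  # True

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = forest.feature_importances_
for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name:16s} {score:.2f}")
```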
2) Performance Evaluation: The mean absolute error and root mean squared error are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.69
These values are far better than those achieved with multiple linear regression, which highlights the power of ensemble regression models. The same can be seen from the differences between the predicted and actual scores, which are at most 3 runs.
C. Radius Neighbors Regression
This regression model is based on the concept of K nearest neighbors. Like K nearest neighbors regression, Radius Neighbors Regression predicts the average target of neighboring training points, but instead of a fixed number K it uses all training points within a specified distance (here, the Manhattan distance).
Actual Score Predicted Score
150 150
119 121
204 204
130 132
155 157
160 163
222 221
163 164
223 222
148 149
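A minimal sketch of radius-based neighbor regression with the Manhattan (L1) distance, written once from scratch and once with scikit-learn's RadiusNeighborsRegressor for comparison. The helper radius_predict, the three training rows and the radius of 5.0 are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor


def radius_predict(X_train, y_train, x_query, radius):
    """Average the targets of all training rows within `radius` of x_query (Manhattan/L1)."""
    dist = np.abs(X_train - x_query).sum(axis=1)
    inside = dist <= radius
    if not inside.any():
        return float("nan")  # no neighbours in range
    return float(y_train[inside].mean())


# Hypothetical rows: [current score, balls remaining, wickets] -> final score.
X = np.array([[100, 30, 7], [102, 30, 7], [60, 60, 9]], dtype=float)
y = np.array([180.0, 184.0, 150.0])
query = np.array([[101, 30, 7]], dtype=float)

model = RadiusNeighborsRegressor(radius=5.0, metric="manhattan").fit(X, y)
print(model.predict(query)[0], radius_predict(X, y, query[0], radius=5.0))  # 182.0 182.0
```

Because the Manhattan distance adds raw differences, one run of score counts the same as one ball or one wicket; scaling the features first (for example with StandardScaler) is a design choice worth considering.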
1) Performance Evaluation: The mean absolute error and root mean squared error of the model are as follows:
Mean Absolute Error: 1.36
Root Mean Squared Error: 1.79
These values are better than those of the multiple regression model and comparable to the Random Forest ensemble's error values.
D. Comparison of R-squared Values
The R-squared values of the regression models are shown in the graph below.
The higher R-squared values for Random Forest and Radius Nearest Neighbors indicate that these models explain a larger share of the variability of the response data around its mean than the multivariate linear regression model does.
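As an illustration of what R-squared measures, the sketch below computes it by hand as 1 - SS_res / SS_tot and checks the result against scikit-learn's r2_score. It uses only the ten-match multivariate linear regression table, so it does not reproduce the values in the graph; the helper name r_squared is hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot: share of variance around the mean that the model explains."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)       # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Scores from the multivariate linear regression table.
actual = [150, 119, 204, 130, 155, 160, 222, 163, 223, 148]
predicted = [154, 114, 210, 131, 152, 169, 232, 160, 231, 149]
print(round(r_squared(actual, predicted), 3), round(r2_score(actual, predicted), 3))  # 0.972 0.972
```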
E. Comparison of Mean Absolute Error
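The mean absolute error is the average, over the test matches, of the absolute difference between the actual and predicted scores. The mathematical formula is:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$
where MAE stands for mean absolute error, $y_i$ and $\hat{y}_i$ are the actual and predicted scores for match $i$, and $n$ is the number of test matches.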
This difference is 6-7 runs for multivariate linear regression and 1-5 runs for Random Forest and Radius Nearest Neighbor regression.
The graph depicting these error values for all three models
is shown below:
F. Comparison of Root Mean Squared Error
The square root of the mean of the squared errors is called the root mean squared error (RMSE). The mathematical formula is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
The following graph compares these root mean squared error values.
The root mean squared error and mean absolute error graphs are similar. Both graphs show that Random Forest and Radius Nearest Neighbor regression perform better than the multivariate linear regression model.