2. STATISTICAL ANALYSIS OF THE PERFORMANCE OF
ROYAL CHALLENGERS BANGALORE IN THE INDIAN
PREMIER LEAGUE WRITTEN BY RYANSTON RODRIGUES
3. KEYWORDS
• IPL – INDIAN PREMIER LEAGUE
• RCB – ROYAL CHALLENGERS BANGALORE
• BCCI – BOARD OF CONTROL FOR CRICKET IN INDIA
• NR – NO RESULT
• 1 – WIN
• 0 – LOSS
• T20 – TWENTY20 CRICKET MATCH FORMAT
4. ABSTRACT
• IPL IS THE PROFESSIONAL T20 CRICKET LEAGUE IN INDIA
• IPL IS FAMOUS AND IS THE MOST ATTENDED T20 LEAGUE IN THE WORLD
• EVERY YEAR 8 TEAMS PLAY IN THE IPL, REPRESENTING 8 STATES OR CITIES
• IN THE IPL EVERY TEAM PLAYS 14 LEAGUE MATCHES, AND A TEAM THAT QUALIFIES FOR
THE PLAYOFFS PLAYS AT MOST 3 MORE MATCHES, DEPENDING ON THE POINTS TABLE
• 13 SEASONS OF IPL HAVE BEEN CONDUCTED SO FAR
• RCB IS THE FRANCHISE CRICKET TEAM BASED IN BANGALORE, KARNATAKA, THAT
PLAYS IN THE IPL
• VIRAT KOHLI HAS BEEN THE CAPTAIN OF RCB SINCE 2014
• RCB IS ONE OF THE MOST POPULAR TEAMS IN THE IPL BUT HAS NEVER WON A SINGLE
EDITION OUT OF 13; RCB FINISHED RUNNER-UP 3 TIMES (IN 2009, 2011 AND 2016)
• RCB HOLDS BOTH THE HIGHEST AND THE LOWEST SCORE IN A SINGLE IPL MATCH
• THE AIM OF THIS RESEARCH IS TO UNDERSTAND THE POOR PERFORMANCE OF RCB IN
THE IPL USING STATISTICAL AND MATHEMATICAL TECHNIQUES SUCH AS MATRICES AND
GRAPHS, AND TO BUILD A MODEL THAT PREDICTS THE OUTCOME OF A GIVEN MATCH FOR
THE TEAM USING LOGISTIC REGRESSION IN R
5. INTRODUCTION
• THE JOURNEY OF RCB IN THE IPL HAS BEEN A MIXED BAG
• WHILE THEY HAVE A FAIR SHARE OF WINS AND LOSSES AGAINST ALL
THE OTHER TEAMS, THE TEAM HAS PERFORMED POORLY OVER THE YEARS
• THEY HAVE FAILED TO WIN A SINGLE EDITION OF THE IPL
• RCB HAVE LOST MATCHES THAT THEY COULD HAVE WON EASILY
• RCB HAVE ALSO WON MATCHES THAT WERE VERY DIFFICULT TO WIN
• THEY HAVE NOT BEEN ABLE TO DOMINATE ANY ONE EDITION COMPLETELY,
WHICH IS ONE OF THE REASONS THEY HAVE NEVER WON AN EDITION OF
THE IPL
• THIS IS WHERE LOGISTIC REGRESSION COMES INTO PLAY
• IN THIS RESEARCH, WE LOOK AT THE PERFORMANCE
OF RCB IN THE IPL AND TRY TO FIND THE REASONS FOR THE POOR
RESULTS IN THE TOURNAMENT IN RECENT YEARS
6. LOGISTIC REGRESSION
• LOGISTIC REGRESSION IS A STATISTICAL
MODEL IN WHICH THE RESPONSE (DEPENDENT)
VARIABLE HAS ONLY TWO OUTCOMES,
SUCCESS AND FAILURE
• THESE ARE USUALLY DENOTED BY 1 AND 0
• 0 DENOTES FAILURE AND 1 DENOTES
SUCCESS
• THE RESPONSE RECORDS WHICH OUTCOME
OCCURRED, SUCH AS A WIN OR A LOSS (IN THIS
SCENARIO)
• SINCE WE HAVE TWO POSSIBLE OUTCOMES, THIS
MODEL IS TERMED A BINARY LOGISTIC
REGRESSION MODEL
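As a minimal sketch of how a binary logistic regression model turns predictor values into a win/loss prediction, the Python snippet below applies the logistic function to a linear combination of two predictors. It is written in Python for illustration and is not the R code used in this research. The coefficient values, the two example predictors and the 0.5 cut-off are assumptions chosen only to show the mechanics.

```python
import math


def logistic(z):
    """Map a linear predictor z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def win_probability(intercept, coefs, features):
    """P(result = 1) for one match under a binary logistic regression model."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return logistic(z)


# Hypothetical coefficients (NOT fitted values from the research):
# a positive weight on wickets taken, a negative weight on wickets lost.
p = win_probability(0.0, [0.5, -0.5], [7, 4])

# Assumed cut-off of 0.5: predict a win (1) when the probability reaches it.
label = 1 if p >= 0.5 else 0
print(round(p, 4), label)
```

The model itself only produces a probability; the 0/1 label comes from comparing that probability with a chosen cut-off.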
7. DATA SET
• THE DATA SET AND R CODE USED FOR THE RESEARCH ARE PROVIDED
AT THE END OF THE PAPER.
• IT CONTAINS 191 OBSERVATIONS, EACH CORRESPONDING TO A MATCH PLAYED
BY RCB
• THE VARIABLES ARE Runs, Innings, Wickets.Taken, Wickets.Lost AND
Results.
• NOTE THAT THIS DATA SET DOES NOT INCLUDE DATA FOR THOSE
MATCHES WHICH DID NOT HAVE A RESULT (MATCH ABANDONED
DUE TO RAIN, ETC.)
• WE ANALYSE THE PERFORMANCE OF RCB IN TWO PARTS
• 1) ANALYSIS USING GRAPHS
• 2) BUILDING A LOGISTIC REGRESSION MODEL
8. PART 1: ANALYSIS USING GRAPHS
• From our given dataset, we consider the Results
column, which shows the matches in which RCB
won or lost (0 for loss and 1 for win).
• For our convenience, we have omitted the results of
those matches which ended in NR (no result). The
total wins and losses are as follows:
      0 (loss)    1 (win)
      100         91
• The corresponding loss and win proportions are as follows:
      0 (loss)    1 (win)
      0.5235602   0.4764398
• RCB has a greater loss% (52.4%) than win% (47.6%),
which indicates that RCB has underperformed in the
IPL.
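The counts and proportions above can be reproduced with a short tabulation. The sketch below is in Python rather than the R code used for the research, and the `results` list is rebuilt from the totals quoted on this slide (100 losses, 91 wins) instead of being read from the actual data set.

```python
from collections import Counter


def outcome_table(results):
    """Return (counts, proportions) for a list of 0/1 match results."""
    counts = Counter(results)
    total = sum(counts.values())
    ordered_counts = {k: counts[k] for k in sorted(counts)}
    proportions = {k: counts[k] / total for k in sorted(counts)}
    return ordered_counts, proportions


# Stand-in for the Results column: 100 losses (0) and 91 wins (1).
results = [0] * 100 + [1] * 91

counts, proportions = outcome_table(results)
print(counts)
print({k: round(v, 7) for k, v in proportions.items()})
```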
9. RCB PERFORMANCE IN TAKING AND LOSING
WICKETS
• THE FOLLOWING GRAPHS SHOW VARIOUS
CRICKET STATISTICS FOR RCB, WITH THE HELP OF
WHICH WE SHALL ANALYZE THEIR PERFORMANCE
OVER THE YEARS. FOR CONVENIENCE, WE PAY
CLOSE ATTENTION TO THE PREVIOUS TWO
SEASONS SINCE THE CORE TEAM WAS MORE OR
LESS THE SAME FOR THESE TWO SEASONS.
• THE ABOVE GRAPHS SHOW THE RUNS SCORED
BY RCB IN BOTH INNINGS. FROM THE GIVEN DATA,
THE AVERAGE FIRST AND SECOND INNINGS
TOTALS BY RCB ARE 166.67 AND 145.74
RESPECTIVELY, WHICH SHOWS THAT RCB
SCORE MORE RUNS IN THE 1ST INNINGS
THAN IN THE 2ND INNINGS OVER THE YEARS.
HENCE, WE CAN CONCLUDE THAT RCB ARE
BETTER AT SETTING A TARGET THAN AT
CHASING ONE.
10. RCB PERFORMANCE IN TAKING AND LOSING WICKETS
THE AVERAGE NUMBER OF WICKETS
TAKEN BY RCB IS 5.774869 AND THAT OF
WICKETS LOST IS 5.706806, BOTH OF
WHICH ARE APPROXIMATELY 6.
THIS MEANS THAT RCB TAKE ABOUT AS
MANY WICKETS AS THEY LOSE.
FROM A BOWLING PERSPECTIVE, TAKING
6 WICKETS PER MATCH IS QUITE GOOD.
LOSING 6 WICKETS PER MATCH IS A
SIGN OF BATSMEN STRUGGLING ON
THE PITCH.
THIS SCENARIO IS SUPPORTED BY THE
11. BATSMEN’S RUNS SCORED AND BALLS FACED
• FROM THE RCB RUNS SCORED GRAPH, WE CAN SAY THAT
KOHLI AND DE VILLIERS MADE THE MAJOR CONTRIBUTION TO THE
TEAM’S BATTING
• THE YOUNG OPENER (DEVDUTT PADIKKAL) HAD A VERY GOOD OUTING
IN HIS DEBUT YEAR, SCORING 473 RUNS.
• THE REST OF THE BATSMEN DID NOT SCORE ANY
CONSIDERABLE AMOUNT OF RUNS
• THIS SHOWS THAT THE TEAM IS TOTALLY DEPENDENT ON ITS
UPPER-ORDER BATSMEN.
• IF THE UPPER-ORDER BATSMEN FAIL TO SCORE RUNS, THEN
THE REST OF THE TEAM HAS A HARD TIME SCORING RUNS
• FROM THE RCB BALLS FACED GRAPH, WE CAN SEE THAT ONLY 4
BATSMEN (UPPER ORDER) HAVE FACED MORE THAN 200
BALLS THIS YEAR.
• THE REST OF THE BATSMEN, WHO USUALLY BAT IN THE LOWER
ORDER, FACED FEW BALLS THIS YEAR.
12. BOWLERS WICKETS TAKEN AND BALLS
BOWLED
THE 1ST GRAPH INDICATES THAT YUZVENDRA
CHAHAL HAS BEEN THE MOST CONSISTENT BOWLER
FOR RCB, WITH AN AVERAGE OF 17 WICKETS PER SEASON
IN 7 YEARS OF PLAYING FOR THE TEAM.
WE CAN CONCLUDE THAT ONE OF THE REASONS
RCB STRUGGLE IN THEIR BOWLING DEPARTMENT IS
THAT THEY ARE HEAVILY DEPENDENT ON A SINGLE
BOWLER (YUZI) FOR PICKING UP THE BULK OF
THE WICKETS.
FROM THE 2ND GRAPH IT IS EVIDENT THAT CHAHAL
BOWLED THE MOST BALLS THIS YEAR,
FOLLOWED BY WASHINGTON SUNDAR AND NAVDEEP
SAINI. ONE KEY POINT IS THAT THE LATTER TWO
HAVEN’T PICKED UP MANY WICKETS THIS YEAR.
THIS AGAIN CEMENTS THE FACT THAT THE TEAM WAS
13. BATTING AND BOWLING STRIKE RATE
CHRIS MORRIS HAD THE HIGHEST STRIKE RATE, BUT HE DIDN’T FACE
MANY BALLS, SO A HIGH STRIKE RATE DOES NOT IMPLY THAT THE
PLAYER CONTRIBUTED SIGNIFICANTLY WITH THE BAT.
KOHLI AND DE VILLIERS SCORED MORE RUNS AND FACED MANY
BALLS BUT STILL HAD LOWER STRIKE RATES. THIS IS
WHY BATTING STRIKE RATE CAN SOMETIMES BE MISLEADING.
IN T20, THE LOWER THE BOWLING STRIKE RATE, THE MORE CONSISTENT
THE BOWLER IS SAID TO BE. CHAHAL, MORRIS, SIRAJ AND DUBE HAD
LOWER BOWLING STRIKE RATES.
CHAHAL PICKED THE MOST WICKETS AND BOWLED THE MOST BALLS.
DUBE HAS A LOWER BOWLING STRIKE RATE THAN CHAHAL, BUT
CHAHAL BOWLED MANY MORE BALLS AND PICKED MANY MORE WICKETS
THAN DUBE,
SO HERE TOO THE BOWLING STRIKE RATE CAN BE MISLEADING.
IN THIS WAY, THE ABOVE GRAPHS HELP US TO ANALYZE THE MAIN
REASONS FOR THE TEAM’S LACK OF SUCCESS. THE
OVER-DEPENDENCE OF THE TEAM ON ONLY A COUPLE OF BATSMEN AND
BOWLERS HAS LED TO THE TEAM’S DOWNFALL IN THE TOURNAMENT.
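To make the strike-rate comparison on this slide concrete, the Python sketch below uses the standard definitions: batting strike rate is runs per 100 balls faced, and bowling strike rate is balls bowled per wicket taken. The player figures are hypothetical numbers chosen for illustration and are not taken from the RCB data set.

```python
def batting_strike_rate(runs, balls_faced):
    """Runs scored per 100 balls faced (higher is faster scoring)."""
    return 100.0 * runs / balls_faced


def bowling_strike_rate(balls_bowled, wickets):
    """Balls bowled per wicket taken (lower is better)."""
    return balls_bowled / wickets


# Hypothetical batsmen: a short cameo versus a full season of batting.
cameo = batting_strike_rate(runs=30, balls_faced=15)
anchor = batting_strike_rate(runs=450, balls_faced=360)

# Hypothetical bowlers: a part-timer versus a front-line bowler.
part_timer = bowling_strike_rate(balls_bowled=24, wickets=2)
front_line = bowling_strike_rate(balls_bowled=336, wickets=21)

print(cameo, anchor)           # the cameo has the higher batting strike rate
print(part_timer, front_line)  # the part-timer has the lower bowling strike rate
```

In both pairs the player with the better strike rate contributed far less in total (30 runs against 450, 2 wickets against 21), which is why a strike rate should be read together with balls faced or balls bowled.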
14. PART 2: BUILDING A LOGISTIC REGRESSION
MODEL
IN THE SECOND PART OF THE
ANALYSIS, WE FIT A LOGISTIC
REGRESSION MODEL TO THE GIVEN
DATA. OUR AIM IS TO BUILD A MODEL
THAT PREDICTS THE OUTCOME OF A
GIVEN MATCH BASED ON VARIABLES
SUCH AS WICKETS TAKEN AND
WICKETS LOST.
USING THE BACKWARD ELIMINATION
TECHNIQUE, WE CONCLUDE THAT
THE TWO MOST STATISTICALLY
SIGNIFICANT VARIABLES ARE
WICKETS.TAKEN AND WICKETS.LOST.
WE DIVIDE THE DATA SET INTO TWO
PARTS, A TRAINING SET AND A TESTING
SET.
WE BUILD THE MODEL USING THE
TRAINING SET AND USE THE TESTING
SET FOR PREDICTING NEW VALUES. BY
USING THE SUMMARY() FUNCTION, WE
GET THE INFORMATION ABOUT THE
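The split-and-fit procedure described on this slide can be sketched as follows. This is a Python stand-in under several assumptions, not the R code from the research: the 191 matches are synthetic (generated by the hypothetical helper `make_synthetic_matches`), the 75/25 split fraction is assumed because the slides do not state one, and the coefficients are estimated by plain gradient descent instead of the fitting routine R uses. Backward elimination is not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_matches(n=191, seed=0):
    """Hypothetical data: rows of (wickets_taken, wickets_lost, result)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        taken = rng.randint(0, 10)
        lost = rng.randint(0, 10)
        p_win = sigmoid(0.6 * (taken - lost))  # assumed relationship
        rows.append((taken, lost, 1 if rng.random() < p_win else 0))
    return rows


def train_test_split(rows, train_fraction=0.75, seed=1):
    """Shuffle a copy of the rows and cut it into training and testing sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_logistic(rows, learning_rate=0.05, epochs=2000):
    """Estimate [intercept, b_taken, b_lost] by gradient descent on the log-loss."""
    beta = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for taken, lost, y in rows:
            x = (1.0, taken, lost)
            error = sigmoid(sum(b * xi for b, xi in zip(beta, x))) - y
            for j in range(3):
                grad[j] += error * x[j]
        beta = [b - learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


matches = make_synthetic_matches()
train, test = train_test_split(matches)
intercept, b_taken, b_lost = fit_logistic(train)
print(len(train), len(test))
print(b_taken > 0, b_lost < 0)
```

On data built this way the fitted coefficient on wickets taken comes out positive and the one on wickets lost negative, which matches the direction one would expect from the two predictors.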
15. PREDICTION OF
RESULTS
WE USE THE PREDICT() FUNCTION TO
PREDICT THE RESULTS AND CREATE A
CONFUSION MATRIX FOR CHECKING OUR
PREDICTIONS. THE CONFUSION MATRICES
ARE GIVEN IN THE FIGURES,
WHERE CM1 AND CM2 ARE THE
CONFUSION MATRICES FOR THE TRAINING
AND TESTING SETS RESPECTIVELY. THE
MODEL HAS AN ACCURACY OF 86% AND
79% ON THE TRAINING AND TESTING SETS
RESPECTIVELY, WHICH IS A PRETTY GOOD
RESULT CONSIDERING OUR DATA SET
HAS ONLY 191 OBSERVATIONS.
WITH RCB PLAYING MORE MATCHES IN
THE FUTURE, THE DATA SET CAN BE
ENLARGED AND THE ACCURACY MAY
IMPROVE ACCORDINGLY.
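As an illustration of how a confusion matrix and an accuracy figure are obtained from predicted probabilities, the Python sketch below classifies a handful of matches and tabulates the outcome. The actual results, the predicted probabilities and the 0.5 cut-off are hypothetical; they are not CM1 or CM2 from the research.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted win probabilities into 0/1 predictions."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_matrix(actual, predicted):
    """2x2 counts with rows = actual class and columns = predicted class."""
    cm = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm


def accuracy(cm):
    """Share of matches on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in cm)
    return (cm[0][0] + cm[1][1]) / total


# Hypothetical results and predicted probabilities for 8 matches.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

predicted = classify(probabilities)
cm = confusion_matrix(actual, predicted)
print(cm)
print(accuracy(cm))
```

Computing the matrix separately for the training and testing sets gives the two accuracy figures that are compared on this slide.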
16. CONCLUSION
• Even though RCB remains a popular team among IPL fans, the analysis shows
why the team has consistently been underperforming.
• With only a few key players performing for the team, its success rate is
low.
• This explanation of their poor success rate is supported by the analysis using
graphs of various statistics. Logistic regression helps us to build a model with
decent accuracy for predicting the outcome of a given match played by the team.
17. REFERENCES
• Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey
Vining (2012), Introduction to Linear Regression Analysis, Fifth
Edition, Wiley
• G. Sudhamathy and G. Raja Meenakshi (2020), Prediction on IPL Data
Using Machine Learning Techniques in R Package, ISSN: 2229-6956
(Online)
• Indian Premier League – Wikipedia
• Indian Premier League Official Website www.iplt20.com
• Royal Challengers Bangalore – Wikipedia
• LINK FOR DATASET AND R CODE:
https://github.com/Ryanston/RCB-Research