The document discusses analyzing the "bogey phenomenon" in sports, where certain teams or players consistently perform better than expected against a specific opponent. It proposes combining the Wald-Wolfowitz Runs Test with an upset result identification method that uses betting odds. An example application to men's tennis finds evidence that Ferrer was the bogey player of Lopez from 2005 to 2009, based on a statistically significant number of unexpected losses by Lopez according to the betting odds. Further work is suggested to improve the methodology.
The Bogey Phenomenon in Sport
Rory Bunker
Behavior Signal Processing Laboratory (Sports Behavior Group)
Graduate School of Informatics
Nagoya University, Japan
IX Mathsport International Conference
11 - 13 July 2022
Introduction
• Loosely speaking, bogey teams (in team sports) or bogey players (in
individual sports) tend to beat a particular opponent (the non-bogey)
despite seemingly being weaker ‘on paper’.
• Whether bogey teams/players exist has been the subject of
much debate — particularly among sports fans and media.
• The concept has been briefly mentioned in fields including
education [Bruce’14] and sociology [Chiweshe’18], [Poulton’04].
• However, the topic has received very little attention in the
sports science and sports statistics literature.
• In this study, a method is proposed that combines:
• The Wald-Wolfowitz Runs Test (WWRT), a non-parametric
test for randomness in a two-valued sequence
• An unexpected (upset) result identification method
that uses betting odds and actual results as its key inputs
Examples
• In football, Manchester United is considered the bogey team of
Newcastle United.
• Watford has not beaten Manchester City since 1989 (although
note that this does not imply that Manchester City is their bogey
team, since Watford may have been expected to lose on each
occasion).
• Some bogey relationships are considered to exist only at specific
competitions:
– Portugal being the bogey team of England at football World Cups
– England and Argentina being the respective bogey teams of
Australia and France at Rugby World Cups
• In this study, the focus is on bogey players in tennis, which has
received less media attention than football but is more
straightforward to analyse, since a match has only two possible
outcomes (win or loss).
Related areas of research
• Related areas of research include the hot-hand effect and
streaks in sports.
• Streaks can include:
– Individual player actions: e.g., successful 3-pointers in
basketball
– Team-level outcomes: e.g., a team’s run of
consecutive wins at their home venue
• Positive recency: the tendency to predict that future outcomes
will be the same as previous outcomes [Ayton & Fischer’04]; see
[Bar-Eli+’06] for a review.
Definition
Defining a bogey player
A player who performs consistently better than would be
expected against a specified opposition (the non-bogey
player) over a certain period, given factors including the
difference in rankings, recent form, court surface, etc.
• This definition implies that the bogey phenomenon exists
between pairs of players, similar to pairwise comparisons (e.g.,
the Bradley-Terry model).
• It also suggests that there is a temporal element to the
phenomenon — it may exist for some period of time but not for
other periods.
• Incorporating betting odds into the method (next slide) means that
the factors listed in the above definition (ranking difference, recent
form, court surface, etc.) do not have to be included separately.
Benefit of using betting odds
• Betting odds incorporate many different factors including
strength, venue, form, and — in the case of team sports — player
availability, etc.
• The average odds, taken across multiple bookmaking companies,
should be more reliable than the odds of any single bookmaker (the
dataset in this study includes average odds from oddsportal.com).
[Figure: venue, form, strength, … all feed into the betting odds]
• So, betting odds can be used as a single variable in the method
rather than needing to include these different variables separately.
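The sketch below shows one way to turn averaged odds into that single probability variable. It rests on three assumptions:

- The odds are decimal odds.
- Each player's price is averaged across bookmakers.
- The inverse odds are normalised so that they sum to 1. This is one common way of removing the bookmaker margin. The slides do not say which method, if any, is used.

The function names and the prices are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def average_odds(bookmaker_odds: Sequence[float]) -> float:
    """Average decimal odds for one player across several bookmakers."""
    if not bookmaker_odds:
        raise ValueError("need at least one bookmaker price")
    return mean(bookmaker_odds)


def implied_probabilities(odds_a: float, odds_b: float) -> Tuple[float, float]:
    """Convert decimal odds for a two-outcome match into win probabilities.

    The raw inverses 1/odds sum to more than 1 because of the bookmaker
    margin (overround); dividing by their sum is an assumed normalisation.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    total = raw_a + raw_b
    return raw_a / total, raw_b / total


# Hypothetical prices for players A and B from three bookmakers.
odds_a = average_odds([1.50, 1.55, 1.45])
odds_b = average_odds([2.60, 2.50, 2.70])
p_a, p_b = implied_probabilities(odds_a, odds_b)
print(f"P(A wins) = {p_a:.3f}, P(B wins) = {p_b:.3f}")
```

Comparing these probabilities is equivalent to comparing the raw decimal odds: the player with the lower odds is the favourite.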
Materials & Methods
Dataset
• Publicly available professional men’s tennis data from
tennis-data.co.uk are used (the same dataset as used by [Angelini+22])
• The dataset contains 38,868 matches played between 4 July 2005 and
22 November 2020, including:
– All ATP Tour matches from Masters events, the ATP Finals and Grand
Slams
– Average bookmaker odds (from oddsportal.com)
• The final number of matches was 33,976 after passing the dataset
through the clean() function in the welo R package [Angelini+22].
[Figure: Data (38,868 matches) → clean() → Data_Clean (33,976 matches)]
Materials & Methods
Wald-Wolfowitz Runs Test (WWRT)
• The WWRT has been used by several researchers in sports statistics,
e.g., in basketball [Arkes & Martinez’11], [Koehler & Conley’03],
[Vergin’00].
• It is a non-parametric test for randomness in a two-valued sequence.
• The test counts runs, i.e., maximal successions of identical symbols
that are preceded and followed by a different symbol (or by the start
or end of the sequence). Players with:
– many lengthy streaks => fewer runs
– many alternating wins & losses => many runs
• The WWRT:
– uses the number of runs to summarize the distribution of streaks
of different lengths, and
– compares it with the distribution that would be expected if
successive outcomes were independent.
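The contrast between streaky and alternating records can be illustrated with a short Python sketch. The win/loss strings are hypothetical, and `count_runs` is an illustrative helper, not part of the author's code.

```python
from itertools import groupby


def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _key, _block in groupby(sequence))


# Two hypothetical win/loss records with the same counts (4 wins, 4 losses).
streaky = "WWWWLLLL"      # lengthy streaks
alternating = "WLWLWLWL"  # alternating wins and losses

print(count_runs(streaky))      # 2
print(count_runs(alternating))  # 8
```

Both records contain the same numbers of wins and losses, so only the ordering, which is exactly what the runs count captures, distinguishes them.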
Materials & Methods
Wald-Wolfowitz Runs Test (WWRT) — Example
Suppose there is a two-valued sequence:
+ + + - - - + + -
Let:
n = # of positive values in the sequence,
m = # of negative values in the sequence, and
R = # of runs in the sequence
In this case, we have n = 5, m = 4, R = 4.
H0 is that each element in the sequence is independently drawn
from the same distribution.
[Runs: Run 1 = + + +, Run 2 = - - -, Run 3 = + +, Run 4 = -]
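The counts in this example can be checked with a few lines of Python (an illustrative sketch, not the repository's code):

```python
from itertools import groupby

# The example sequence from this slide, one symbol per element.
sequence = "+ + + - - - + + -".split()

n = sequence.count("+")                # number of positive values
m = sequence.count("-")                # number of negative values
R = sum(1 for _ in groupby(sequence))  # number of runs

print(n, m, R)  # 5 4 4
```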
Materials & Methods
WWRT — Expected number of runs & Z-test
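The Z-statistic is calculated as

$$Z = \frac{R - \mu_R}{\sigma_R},$$

where the expected number of runs and the variance of R are,
respectively,

$$\mu_R = \frac{2nm}{n + m} + 1, \qquad \sigma_R^2 = \frac{2nm\,(2nm - n - m)}{(n + m)^2\,(n + m - 1)}.$$

(The p-values can easily be calculated using statistical software.)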
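The Z-test on this slide can be sketched in Python with the standard normal-approximation formulas μ_R = 2nm/(n + m) + 1 and σ_R² = 2nm(2nm − n − m)/((n + m)²(n + m − 1)). This is an illustrative sketch, not the repository's implementation. Two choices are assumptions: the one-tailed p-value is taken in the direction of the observed Z, and None is returned where the statistic would involve a division by zero. Applied to the HR sequence from the Ferrer D. vs. Lopez F. example output, it reproduces the reported values.

```python
import math
from itertools import groupby


def runs_test(sequence):
    """Two-valued Wald-Wolfowitz runs test (normal approximation).

    mu_R    = 2nm / (n + m) + 1
    sigma^2 = 2nm(2nm - n - m) / ((n + m)^2 (n + m - 1))
    Returns (R, z, one_tailed_p, two_tailed_p), or None if Z is undefined.
    """
    values = sorted(set(sequence))
    if len(values) != 2:
        return None  # fewer than two symbols: sigma_R would be zero
    n = sequence.count(values[0])
    m = sequence.count(values[1])
    R = sum(1 for _ in groupby(sequence))  # number of runs

    mu = 2 * n * m / (n + m) + 1
    var = 2 * n * m * (2 * n * m - n - m) / ((n + m) ** 2 * (n + m - 1))
    if var <= 0:
        return None  # e.g. n = m = 1 also gives zero variance

    z = (R - mu) / math.sqrt(var)
    lower_tail = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    one_tailed = min(lower_tail, 1 - lower_tail)  # tail in the direction of z
    return R, z, one_tailed, 2 * one_tailed


# HR sequence from the Ferrer D. vs. Lopez F. example output (oldest match first).
hr = list("UUNUUUNNNNNNNNUU")
R, z, p_one, p_two = runs_test(hr)
print(R, round(z, 4), round(p_one, 4), round(p_two, 4))  # 5 -2.0397 0.0207 0.0414
```

Returning None for single-valued sequences mirrors the "division by zero" case reported for Murray vs. Federer.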
Materials & Methods
Unexpected (upset) Result Identification
Two players, A & B, have played each other T times in the past.
Let O denote the betting odds and S the number of sets won.
1) For each match t ∈ T, construct a historical result set, HR, which
consists of upset results (‘Us’) and non-upset results (‘Ns’).
2) For each match t ∈ T, construct an upset result type set, UR, which
consists of the types of upset results: upset wins (UWs) and upset
losses (ULs).
Note that an upset occurs when, for example, A was expected to win
based on the betting odds-implied probabilities, but B won the match.
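Because the exact identification rule is not recoverable from this slide, the following Python sketch is only a hypothetical illustration of labelling matches from odds (O) and sets won (S). It assumes decimal odds, treats the player with the lower odds as the expected winner, counts the player with more sets as the winner, records UW/UL from player A's perspective, and treats equal odds as having no expected winner. `classify_match` and the match data are invented for illustration.

```python
def classify_match(odds_a, odds_b, sets_a, sets_b):
    """Hypothetical helper: label one A-vs-B match for the HR and UR sets.

    Assumptions (not taken from the presentation): lower decimal odds mark
    the expected winner, more sets won marks the actual winner, and equal
    odds mean no upset is possible. Returns (hr_label, ur_label), where
    ur_label is None for non-upsets.
    """
    if sets_a == sets_b:
        raise ValueError("sets won cannot be equal for a completed match")
    a_won = sets_a > sets_b
    if odds_a < odds_b and not a_won:
        return "U", "UL"  # A was expected to win, but B won: upset loss for A
    if odds_b < odds_a and a_won:
        return "U", "UW"  # B was expected to win, but A won: upset win for A
    return "N", None      # expected result, or no expected winner


# Hypothetical head-to-head history: (O_A, O_B, S_A, S_B) per match, oldest first.
matches = [(1.40, 3.00, 1, 2), (2.20, 1.65, 2, 0), (1.30, 3.50, 2, 1)]
labels = [classify_match(*match) for match in matches]
HR = [hr for hr, _ in labels]
UR = [ur for _, ur in labels if ur is not None]
print(HR)  # ['U', 'U', 'N']
print(UR)  # ['UL', 'UW']
```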
Materials & Methods
Approach Flow
• The WWRT is first applied to the historical result set (HR).
• If the result is statistically significant (and the number of runs is
greater than 1), the WWRT is then applied to the upset result set (UR).
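The two-step flow can be sketched in Python as follows. The sketch assumes a one-tailed p-value from the two-valued runs test (normal approximation) and a 5% significance level. `runs_test_p` and `bogey_flow` are hypothetical names, not functions from the author's repository. With the Ferrer D. vs. Lopez F. sequences from the example output, step 1 is significant, but the UR set contains only ULs, so the runs test on UR is undefined.

```python
import math
from itertools import groupby


def runs_test_p(sequence):
    """One-tailed p-value of the two-valued runs test, or None if undefined."""
    values = sorted(set(sequence))
    if len(values) != 2:
        return None  # single-valued sequence: zero variance of R
    n, m = sequence.count(values[0]), sequence.count(values[1])
    runs = sum(1 for _ in groupby(sequence))
    mu = 2 * n * m / (n + m) + 1
    var = 2 * n * m * (2 * n * m - n - m) / ((n + m) ** 2 * (n + m - 1))
    if var <= 0:
        return None
    z = (runs - mu) / math.sqrt(var)
    lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    return min(lower, 1 - lower)


def bogey_flow(hr, ur, alpha=0.05):
    """Step 1: test HR. Step 2: test UR only if step 1 is significant and runs > 1."""
    runs_hr = sum(1 for _ in groupby(hr))
    p_hr = runs_test_p(hr)
    if p_hr is None or p_hr >= alpha or runs_hr <= 1:
        return {"p_HR": p_hr, "p_UR": None, "checked_UR": False}
    return {"p_HR": p_hr, "p_UR": runs_test_p(ur), "checked_UR": True}


result = bogey_flow(list("UUNUUUNNNNNNNNUU"), ["UL"] * 7)
print(result["checked_UR"], result["p_UR"])  # True None
```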
Andy Murray vs Roger Federer
• A Sydney Morning Herald article from 2013 suggested that Roger
Federer was the bogey player of Andy Murray, but specifically at
Grand Slam tournaments.
• In each of these examples, the publication date of the article is
important, because it is used to subset the original dataset.
https://www.smh.com.au/sport/tennis/its-murray-v-djokovic-20130125-2dcuv.html
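Subsetting the dataset for one of these case studies might look like the following Python sketch. The field names (Date, Series, Winner, Loser) are assumed to mirror tennis-data.co.uk's columns. Keeping matches on or before the publication date is also an assumption, because the slide only says that the date is used to subset the data. `head_to_head` and the placeholder records are hypothetical and are not real results.

```python
from datetime import date


def head_to_head(matches, player_a, player_b, cutoff, series=None):
    """Hypothetical helper: A-vs-B matches played on or before `cutoff`.

    `matches` is a list of dicts with Date, Series, Winner and Loser keys.
    If `series` is given (e.g. "Grand Slam"), only that series is kept.
    """
    pair = {player_a, player_b}
    selected = [
        row for row in matches
        if {row["Winner"], row["Loser"]} == pair
        and row["Date"] <= cutoff
        and (series is None or row["Series"] == series)
    ]
    # Chronological order matters, because the runs test depends on ordering.
    return sorted(selected, key=lambda row: row["Date"])


# Placeholder records (not real results).
matches = [
    {"Date": date(2012, 1, 20), "Series": "Grand Slam", "Winner": "Player A", "Loser": "Player B"},
    {"Date": date(2011, 5, 2), "Series": "Masters 1000", "Winner": "Player B", "Loser": "Player A"},
    {"Date": date(2013, 3, 1), "Series": "Grand Slam", "Winner": "Player B", "Loser": "Player A"},
    {"Date": date(2012, 6, 4), "Series": "Grand Slam", "Winner": "Player A", "Loser": "Player C"},
]
subset = head_to_head(matches, "Player A", "Player B", cutoff=date(2013, 1, 25), series="Grand Slam")
print([row["Date"].isoformat() for row in subset])  # ['2012-01-20']
```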
Andy Murray vs Novak Djokovic
• A forum comment on menstennisforums.com from May
2011 suggested that Andy Murray was the bogey player
of Novak Djokovic.
https://www.menstennisforums.com/threads/will-nadal-be-able-to-reach-the-rg-final.182597/?u=36369
Kei Nishikori vs. Jo-Wilfried Tsonga
• A Herald Sun article from 2017 suggested that Kei Nishikori
was the bogey player of Jo-Wilfried Tsonga.
https://www.heraldsun.com.au/sport/tennis/jowilfried-tsonga-hopes-to-avoid-australian-open-bogey-kei-nishikori/news-story/2dfaab3f641494447405ae65fc7e7592
Results
Murray vs. Federer
• The WWRT could not be run in this case (division by zero).
• However, it is clear that the bogey phenomenon does not exist
between Federer and Murray at Grand Slams.
Results
Murray vs. Djokovic
• Applying the WWRT to the HR set gave a one-tailed
p-value of 0.148 > 0.05.
• Therefore, there is no statistically significant evidence of a bogey
effect between Murray & Djokovic during the period considered.
Results
Nishikori vs. Tsonga
• Applying the WWRT to the HR set gave a p-value of 0.032 <
0.05, so we proceed to the next step.
• The UR set has three upset wins for Nishikori and two upset
wins for Tsonga.
• Running the WWRT on the UR set gives a p-value of 0.063 > 0.05.
Next Steps
Avenues for further work
• Use the k-category extension of the WWRT (with k = 3).
– This will enable the set consisting of upset wins, upset
losses and non-upset results to be analyzed in a single test
rather than in two steps.
• Investigate approaches other than the WWRT:
– Autocorrelation tests
– Entropy [Zhang+13], which was used to analyze winning
streaks in NHL ice hockey [Steeger+21].
• Apply the method to other sports and compare whether the
existence of the bogey phenomenon differs across sports.
• Use correction procedures, e.g., Bonferroni-Holm, if a larger
number of statistical tests is performed.
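For the correction step, a minimal Python sketch of Holm's step-down procedure (Bonferroni-Holm) is shown below. The p-values are hypothetical and do not come from this study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure; returns reject decisions in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are too
    return reject


# Hypothetical p-values from four player-pair tests.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Holm's procedure controls the family-wise error rate like the plain Bonferroni correction, but it is uniformly less conservative.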
Code
GitHub Repository
https://github.com/rorybunker/bogey-phenomenon-sport
• The version of the method discussed in this presentation is
implemented in Python in bogey_identification_tennis.py.
• The dataset is also available in CSV format.
• A new version, bogey_identification_tennis_v2.py, is being
developed; it uses a k = 3 category runs test, which can be
applied to 3-valued sequences.
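A k-category runs test can be sketched in Python using one standard form of the normal approximation: E[R] = (N(N + 1) − Σnᵢ²)/N and Var(R) = [Σnᵢ²(Σnᵢ² + N(N + 1)) − 2NΣnᵢ³ − N³]/(N²(N − 1)). For k = 2 these reduce to the two-valued formulas. Whether bogey_identification_tennis_v2.py uses exactly this form is not stated, so treat the sketch as an assumption. The three-valued sequence is hypothetical.

```python
import math
from collections import Counter
from itertools import groupby


def k_category_runs_test(sequence):
    """Runs test for a sequence with k >= 2 categories (normal approximation).

    E[R]   = (N(N+1) - S2) / N
    Var[R] = (S2 (S2 + N(N+1)) - 2 N S3 - N^3) / (N^2 (N-1))
    where N is the length and S2, S3 are the sums of squared and cubed
    category counts. Returns (R, z), or None if the test is undefined.
    """
    counts = Counter(sequence).values()
    N = len(sequence)
    if len(counts) < 2:
        return None  # a single category gives zero variance
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    R = sum(1 for _ in groupby(sequence))  # number of runs
    mean = (N * (N + 1) - s2) / N
    var = (s2 * (s2 + N * (N + 1)) - 2 * N * s3 - N ** 3) / (N ** 2 * (N - 1))
    if var <= 0:
        return None
    return R, (R - mean) / math.sqrt(var)


# Hypothetical 3-valued sequence of upset wins, upset losses and non-upsets.
print(k_category_runs_test(["N", "UW", "N", "N", "UL", "UL", "N", "UW"]))  # (6, 0.0)
```

With k = 2, the same function gives Z ≈ −2.0397 for the HR sequence in the Ferrer D. vs. Lopez F. example output, matching the two-valued test.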
Example Output
Ferrer D. vs. Lopez F.
==== STEP 1 RESULTS ====
Historical results set (HR):
Date Result
2005-08-27 U
2006-10-12 U
2007-10-05 N
2007-10-17 U
2008-03-06 U
2008-10-15 U
2009-04-14 N
2011-04-13 N
2011-10-15 N
2012-04-27 N
2013-05-31 N
2014-02-27 N
2015-10-04 N
2016-05-28 N
2016-10-11 U
2017-06-01 U
Wald-Wolfowitz Runs Test
Number of runs: 5
Number of Ns: 9; Number of Us: 7
Z value: -2.039650254375284
One tailed P value: 0.020692586443467945; Two tailed P value: 0.04138517288693589
==== STEP 2 RESULTS ====
Upset results set (UR):
Date Result
2005-08-27 UL
2006-10-12 UL
2007-10-05 UL
2007-10-17 UL
2008-03-06 UL
2008-10-15 UL
2009-04-14 UL
Number of UWs: 0
Number of ULs: 7
0.0% of upset results were UWs
100.0% of upset results were ULs
0.0% of matches were UWs
43.75% of matches were ULs
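The STEP 2 summary figures are simple proportions, as the hypothetical Python re-creation below shows (it is not the repository's code). It uses the 7 ULs from the UR set and the 16 matches in the HR set of the Ferrer D. vs. Lopez F. example.

```python
from collections import Counter


def upset_summary(ur, total_matches):
    """Hypothetical re-creation of the STEP 2 summary percentages."""
    counts = Counter(ur)
    n_uw, n_ul = counts["UW"], counts["UL"]
    n_upsets = n_uw + n_ul
    return {
        "UWs": n_uw,
        "ULs": n_ul,
        "% of upsets that were UWs": 100 * n_uw / n_upsets if n_upsets else 0.0,
        "% of upsets that were ULs": 100 * n_ul / n_upsets if n_upsets else 0.0,
        "% of matches that were UWs": 100 * n_uw / total_matches,
        "% of matches that were ULs": 100 * n_ul / total_matches,
    }


# Ferrer D. vs. Lopez F.: 7 upset losses out of 16 head-to-head matches.
summary = upset_summary(["UL"] * 7, total_matches=16)
print(summary["% of matches that were ULs"])  # 43.75
```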