1
The Bogey Phenomenon in Sport
Rory Bunker
Behavior Signal Processing Laboratory (Sports Behavior Group)
Graduate School of Informatics
Nagoya University, Japan
IX Mathsport International Conference
11 - 13 July 2022
2
Introduction
• Loosely speaking, bogey teams (in team sports) or bogey players (in
individual sports) tend to beat a particular opposition (the
non-bogey) despite seemingly being weaker ‘on paper’.
• Whether bogey teams/players exist has been the subject of
much debate — particularly among sports fans and media.
• The concept has been briefly mentioned in fields including
education [Bruce ’14] and sociology [Chiweshe ’18] [Poulton ’04].
• However, the topic has received very little attention in the
sports science and sports statistics literature.
• In this study, a method is proposed that combines:
– The Wald-Wolfowitz Runs Test (WWRT), a non-parametric
test for randomness in a two-valued sequence
– An unexpected (upset) result identification method
that uses betting odds and actual results as its key inputs
3
Examples
• In football, Manchester United is considered the bogey team of
Newcastle United.
• Watford has not beaten Manchester City since 1989 (although
note that this does not imply that Manchester City is their bogey
team, since Watford may have been expected to lose on each
occasion).
• Some bogey teams are considered to exist only at specific competitions:
– Portugal is considered the bogey team of England at football
World Cups
– England and Argentina are the respective bogey teams of
Australia and France at Rugby World Cups
• In this study, the focus is on bogey players in tennis, which has
had less media attention than football but is more
straightforward to analyse (a match has only two possible outcomes).
4
Related areas of research
• Related research includes studies of the hot-hand
effect and streaks in sports.
• Streaks can include:
– Individual player actions: e.g., successful 3-pointers in
basketball
– Team-level outcomes: e.g., a team’s run of
consecutive wins at their home venue
• Positive recency: the tendency to predict that future
outcomes will be the same as previous outcomes
[Ayton & Fischer ’04]; see [Bar-Eli+ ’06] for a review.
5
Definition
Defining a bogey player
A player who performs consistently better than would be
expected against a specified opposition (the non-bogey
player) over a certain period, given factors including the
difference in rankings, recent form, court surface, etc.
• This definition implies that the bogey phenomenon exists
between pairs of players, similar to pairwise comparisons (e.g.,
the Bradley-Terry model).
• It also suggests that there is a temporal element to the
phenomenon — it may exist for some period of time but not for
other periods.
• Incorporating betting odds into the method (next slide) means the
factors listed in the above definition (difference in rankings, recent
form, court surface, etc.) do not have to be included separately.
6
Benefit of using betting odds
• Betting odds incorporate many different factors including
strength, venue, form, and — in the case of team sports — player
availability, etc.
• Average odds, taken across multiple bookmaking companies,
are likely to be more reliable than the odds of any single bookmaker
(the dataset in this study includes average odds from oddsportal.com).
[Figure: venue, form, strength, and other factors all feed into the betting odds.]
• So, betting odds can be used as a single variable in the method
rather than needing to include these different variables separately.
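To make the "single variable" idea concrete, the sketch below converts a pair of odds into implied win probabilities. It assumes decimal odds; the normalisation that removes the bookmaker margin is a common convention rather than something the slides specify, and the function name `implied_probabilities` and the example odds are hypothetical.

```python
def implied_probabilities(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Convert decimal odds for players A and B into probabilities that sum to 1.

    Assumption: decimal odds, so the raw implied probability is 1 / odds.
    """
    if odds_a <= 1.0 or odds_b <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    total = raw_a + raw_b  # exceeds 1 by the bookmaker margin (overround)
    return raw_a / total, raw_b / total


# Hypothetical average odds for a single match
p_a, p_b = implied_probabilities(1.50, 2.60)
print(f"P(A wins) = {p_a:.3f}, P(B wins) = {p_b:.3f}")
```

The player with the lower odds receives the higher implied probability, which is all that a favourite/underdog comparison needs.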
7
Materials & Methods
Dataset
• Publicly available data on professional men’s tennis from
tennis-data.co.uk is used, the same source used by [Angelini+22].
• The dataset contains 38,868 matches from 4 July 2005 to 22
November 2020 and includes:
– All ATP Tour matches from Masters, ATP Finals and Grand
Slam tournaments (men’s ATP)
– Average bookmaker odds (oddsportal.com)
• The final number of matches was 33,976 after passing the dataset
through the clean() function in the welo R package [Angelini+22].
[Figure: Data (38,868 matches) is passed through clean() to give Data_Clean (33,976 matches).]
8
Materials & Methods
Wald-Wolfowitz Runs Test (WWRT)
• The WWRT has been used by several researchers in sports statistics,
e.g., in basketball [Arkes & Martinez ’11], [Koehler & Conley ’03],
[Vergin ’00].
• It is a non-parametric test for randomness in a two-valued sequence.
• The test considers runs: a run is a succession of identical symbols
that is followed or preceded by a different symbol. Players with:
– many lengthy streaks => few runs
– many alternating wins & losses => many runs
• The WWRT:
– takes the observed number of runs, which reflects the distribution
of streaks of different lengths
– compares it to the distribution that would be expected if
successive outcomes were independent.
9
Materials & Methods
Wald-Wolfowitz Runs Test (WWRT) — Example
Suppose there is a two-valued sequence:
+ + + - - - + + -
Let:
n = number of positive values in the sequence,
m = number of negative values in the sequence, and
R = number of runs in the sequence.
In this case, we have n = 5, m = 4, R = 4:
Run 1: + + +   Run 2: - - -   Run 3: + +   Run 4: -
H0 is that each element in the sequence is independently drawn
from the same distribution.
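The counts in this example can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `count_runs` is hypothetical and is not taken from the presentation's code.

```python
from itertools import groupby


def count_runs(sequence):
    """Return (n, m, R): counts of '+' and '-' symbols and the number of runs."""
    n = sum(1 for s in sequence if s == "+")
    m = sum(1 for s in sequence if s == "-")
    # groupby yields one group per maximal block of equal neighbours, i.e. one per run
    runs = sum(1 for _ in groupby(sequence))
    return n, m, runs


n, m, R = count_runs("+++---++-")
print(n, m, R)  # 5 4 4
```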
10
Materials & Methods
WWRT — Expected number of runs & Z-test
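The Z-statistic is calculated as
$$Z = \frac{R - \mathbb{E}[R]}{\sqrt{\operatorname{Var}[R]}}$$
(the p-values can be easily calculated using statistical software),
where the expected number of runs and the variance of R under H0 are
$$\mathbb{E}[R] = \frac{2nm}{n+m} + 1, \qquad \operatorname{Var}[R] = \frac{2nm\,(2nm - n - m)}{(n+m)^2\,(n+m-1)},$$
respectively.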
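As a rough check on the Z-test, the sketch below implements the standard normal-approximation form of the runs test, with E[R] = 2nm/(n+m) + 1 and Var[R] = 2nm(2nm - n - m) / ((n+m)^2 (n+m-1)), and applies it to the Ferrer vs. Lopez HR sequence from the appendix. The function name `runs_test` is hypothetical, and taking the one-tailed p-value in the lower tail (fewer runs than expected) is an assumption, chosen because it matches the appendix figures.

```python
import math
from itertools import groupby


def runs_test(sequence):
    """Wald-Wolfowitz runs test on a two-valued sequence (normal approximation).

    Returns (R, Z, one_tailed_p, two_tailed_p). Raises ValueError if the
    sequence does not contain exactly two symbols or if Var[R] is zero.
    """
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n = sum(1 for s in sequence if s == symbols[0])
    m = len(sequence) - n
    runs = sum(1 for _ in groupby(sequence))
    expected = 2 * n * m / (n + m) + 1
    variance = 2 * n * m * (2 * n * m - n - m) / ((n + m) ** 2 * (n + m - 1))
    if variance == 0:
        raise ValueError("variance of R is zero; Z is undefined")
    z = (runs - expected) / math.sqrt(variance)
    lower = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return runs, z, lower, 2 * min(lower, 1 - lower)


# HR sequence for Ferrer vs. Lopez, taken from the appendix output
hr_sequence = "UUNUUU" + "N" * 8 + "UU"
R, z, p_one, p_two = runs_test(hr_sequence)
print(R, z, p_one, p_two)
```

For this sequence the sketch gives 5 runs and Z of about -2.04, in line with the values reported in the appendix.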
11
Materials & Methods
Unexpected (upset) Result Identification
Two players, A & B, have played each other T times in the past.
Let O = odds and S = sets won.
1) Construct a historical result set, HR, which consists of upset
results (‘U’) and non-upset results (‘N’), for t ∈ T.
2) Construct an upset result type set, UR, which consists of the
types of upset results, upset wins (UWs) & upset losses (ULs), for t ∈ T.
[Equations for HR and UR are not preserved in this transcript. Annotation on
the equations: A was expected to win based on the betting odds-implied
probabilities, but B won the match.]
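Because the slide's equations are not available here, the following is only a sketch of how the two sets could be built, under stated assumptions: odds are decimal, the player with the lower odds is the expected winner, the player with more sets won is the actual winner, and UW/UL are labelled from player A's perspective. The `Match` class, the `classify` function, and the example matches are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Match:
    odds_a: float  # decimal odds for player A (lower odds = favourite, by assumption)
    odds_b: float  # decimal odds for player B
    sets_a: int    # sets won by player A
    sets_b: int    # sets won by player B


def classify(match: Match) -> Tuple[str, Optional[str]]:
    """Return (HR label, UR label) for one match, from player A's perspective."""
    a_favoured = match.odds_a < match.odds_b  # equal odds count as B favoured here
    a_won = match.sets_a > match.sets_b
    if a_favoured == a_won:
        return "N", None  # the expected winner won: not an upset
    return "U", ("UW" if a_won else "UL")


# Hypothetical matches between A and B, in chronological order
matches = [Match(1.40, 2.90, 2, 0), Match(1.40, 2.90, 1, 2), Match(3.10, 1.35, 2, 1)]
labels = [classify(m) for m in matches]
hr = [h for h, _ in labels]
ur = [u for _, u in labels if u is not None]
print(hr, ur)  # ['N', 'U', 'U'] ['UL', 'UW']
```

In this sketch UR contains one entry per upset in HR, so it is never longer than HR.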
12
Materials & Methods
Approach Flow
• The WWRT is first applied to the historical result set (HR).
• Then, if the result is statistically significant (and the number of
runs is greater than 1), we check the upset result set (UR) by applying
the WWRT to UR.
13
Andy Murray vs. Roger Federer
• A Sydney Morning Herald article from 2013 suggested that Roger
Federer was the bogey player of Andy Murray, but specifically at
Grand Slam tournaments.
• In each of these examples, the date that the article was
published is important and is used to subset the original dataset.
https://www.smh.com.au/sport/tennis/its-murray-v-djokovic-20130125-2dcuv.html
14
Andy Murray vs. Novak Djokovic
• A forum comment on menstennisforums.com from May
2011 suggested that Andy Murray was the bogey player
of Novak Djokovic.
https://www.menstennisforums.com/threads/will-nadal-be-able-to-reach-the-rg-final.182597/?u=36369
15
Kei Nishikori vs. Jo-Wilfried Tsonga
• A Herald Sun article from 2017 suggested that Kei Nishikori
was the bogey player of Jo-Wilfried Tsonga.
https://www.heraldsun.com.au/sport/tennis/jowilfried-tsonga-hopes-to-avoid-australian-open-bogey-kei-nishikori/news-story/2dfaab3f641494447405ae65fc7e7592
16
Results
Murray vs. Federer
• The WWRT could not be run in this case (division by zero).
• However, it is clear that the bogey phenomenon does not exist
between Federer and Murray at Grand Slams.
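One way such a division by zero can arise, offered as an illustration rather than as a description of this particular data subset, is when every match in the sequence carries the same label. Then m = 0 and there is a single run (R = 1), so for n ≥ 2 the standard runs-test moments give:

```latex
\mathbb{E}[R] = \frac{2n \cdot 0}{n + 0} + 1 = 1,
\qquad
\operatorname{Var}[R] = \frac{2n \cdot 0\,(2n \cdot 0 - n - 0)}{(n + 0)^2\,(n + 0 - 1)} = 0,
\qquad
Z = \frac{R - \mathbb{E}[R]}{\sqrt{\operatorname{Var}[R]}} = \frac{1 - 1}{0}.
```

The Z-statistic is therefore undefined, which is consistent with requiring more than one run before the test is applied.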
17
Results
Murray vs. Djokovic
• When the WWRT was applied to the HR set, a one-tailed
p-value of 0.148 > 0.05 was obtained.
• Therefore, no statistically significant evidence of a bogey effect
between Murray & Djokovic was found for the period considered.
18
Results
Nishikori vs. Tsonga
• Applying the WWRT to the HR set resulted in a p-value of 0.032 <
0.05, so we proceed to the next step (applying the WWRT to the UR set).
• The UR set has three upset wins for Nishikori and two upset
wins for Tsonga.
• Running the WWRT again, this time on UR, gives a p-value of 0.063.
19
Next Steps
Avenues for further work
• Use the k-category extension of the WWRT (with k = 3).
– This will enable the set consisting of upset wins, upset
losses and non-upset results to be analysed at the same
time rather than in two steps.
• Investigate other approaches apart from the WWRT:
– Autocorrelation tests
– Entropy [Zhang+13], which was used to analyse winning
streaks in NHL ice hockey [Steeger+21].
• Apply the method to other sports and compare whether the existence
of the bogey phenomenon differs across sports.
• Apply multiple-testing correction procedures, e.g., Bonferroni-Holm,
if a larger number of statistical tests is performed.
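For the last point, a Bonferroni-Holm (step-down) adjustment is simple to apply once several player pairs have been tested. The sketch below is a plain-Python illustration; the function name `holm_adjust` and the p-values are hypothetical and are not results from this study.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (k - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from runs tests on four different player pairs
adjusted = holm_adjust([0.010, 0.040, 0.030, 0.005])
print(adjusted)
```

A pair would then be flagged only if its adjusted p-value falls below the chosen significance level.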
20
Code
GitHub Repository
https://github.com/rorybunker/bogey-phenomenon-sport
• The version of the method discussed in this presentation is
implemented in Python in bogey_identification_tennis.py.
• The dataset is also available in CSV format.
• A new version, bogey_identification_tennis_v2.py, is being
created. It uses a k = 3 category runs test (which can be applied
to 3-valued sequences).
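A k-category runs test can be sketched with the usual mean and variance of the total number of runs under random ordering of the observed symbols. Whether bogey_identification_tennis_v2.py uses this exact formulation is not stated in the slides, so the function name `k_category_runs_test` and the example sequence below are hypothetical. With k = 2 the moments reduce to the two-valued WWRT formulas.

```python
import math
from collections import Counter
from itertools import groupby


def k_category_runs_test(sequence):
    """Runs test for a sequence with k >= 2 symbol types (normal approximation).

    Returns (R, Z), where R is the observed number of runs.
    """
    counts = list(Counter(sequence).values())
    total = sum(counts)
    if total < 2:
        raise ValueError("sequence must contain at least two elements")
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    runs = sum(1 for _ in groupby(sequence))
    expected = (total * (total + 1) - s2) / total
    variance = (s2 * (s2 + total * (total + 1)) - 2 * total * s3 - total ** 3) / (
        total ** 2 * (total - 1)
    )
    if variance <= 0:
        raise ValueError("variance of R is not positive; Z is undefined")
    return runs, (runs - expected) / math.sqrt(variance)


# Hypothetical 3-valued sequence: N = non-upset, W = upset win, L = upset loss
R3, z3 = k_category_runs_test("NNWWNLLLNNWN")
print(R3, z3)
```

Treating upset wins, upset losses and non-upsets as three symbols lets the whole history be tested in one pass instead of two.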
21
Thank You
Acknowledgements:
KAKENHI 20H04075, JST Presto JPMJPR20CA
22
Appendix
23
Example Output
Ferrer D. vs. Lopez F.
==== STEP 1 RESULTS ====
Historical results set (HR):
Date Result
2005-08-27 U
2006-10-12 U
2007-10-05 N
2007-10-17 U
2008-03-06 U
2008-10-15 U
2009-04-14 N
2011-04-13 N
2011-10-15 N
2012-04-27 N
2013-05-31 N
2014-02-27 N
2015-10-04 N
2016-05-28 N
2016-10-11 U
2017-06-01 U
Wald-Wolfowitz Runs Test
Number of runs: 5
Number of Ns: 9; Number of Us: 7
Z value: -2.039650254375284
One tailed P value: 0.020692586443467945; Two tailed P value: 0.04138517288693589
==== STEP 2 RESULTS ====
Upset results set (UR):
Date Result
2005-08-27 UL
2006-10-12 UL
2007-10-05 UL
2007-10-17 UL
2008-03-06 UL
2008-10-15 UL
2009-04-14 UL
Number of UWs: 0
Number of ULs: 7
0.0% of upset results were UWs
100.0% of upset results were ULs
0.0% of matches were UWs
43.75% of matches were ULs
