This document investigates statistical models for predicting association football scores. It notes that previous studies found the negative binomial distribution fitted scores better than the Poisson distribution. This paper, however, re-examines the Poisson model by incorporating parameters that represent each team's attacking and defensive strengths. It tests a hierarchy of models and finds that an independent Poisson model describes football scores reasonably accurately, though a bivariate Poisson model with a correlation of 0.2 between the two scores improves the fit slightly.
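As a rough illustration of the independent Poisson idea, the sketch below assumes each team's expected goals are the product of its own attack parameter and the opponent's defence parameter. The parameter values, the function names, and the 15-goal truncation are hypothetical choices made for the example, not figures or notation from the paper.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def score_probability(home_goals, away_goals, home_mean, away_mean):
    """Joint probability of a scoreline when the two scores are independent."""
    return poisson_pmf(home_goals, home_mean) * poisson_pmf(away_goals, away_mean)

def outcome_probabilities(home_mean, away_mean, max_goals=15):
    """Sum scoreline probabilities into (home win, draw, away win)."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = score_probability(h, a, home_mean, away_mean)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away

# Hypothetical strengths (illustrative values only, not estimates from the paper).
home_attack, home_defence = 1.4, 0.9
away_attack, away_defence = 1.1, 1.2

# Assumed form: expected goals = own attack strength * opponent defence weakness.
home_mean = home_attack * away_defence
away_mean = away_attack * home_defence

home_win, draw, away_win = outcome_probabilities(home_mean, away_mean)
print(f"home {home_win:.3f}, draw {draw:.3f}, away {away_win:.3f}")
```

A bivariate Poisson model would replace the simple product in `score_probability` with a joint distribution that allows the two scores to be correlated; that extension is not shown here.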
This document summarizes the key findings of a research paper on the frequency of convergent games under best-response dynamics. The paper shows that:
1) The frequency of randomly generated games with a unique pure strategy Nash equilibrium goes to zero as the number of players or strategies increases.
2) Convergent games with fewer pure strategy Nash equilibria are more common than those with more equilibria.
3) For 2-player games with fewer than 10 strategies, games with a unique equilibrium are most common, but games with multiple equilibria are more likely when there are more than 10 strategies.
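To make the counting of pure strategy Nash equilibria concrete, here is a minimal sketch that enumerates them in randomly generated 2-player games. It assumes payoffs are drawn independently and uniformly at random, which may differ from the paper's sampling scheme, and it does not simulate best-response dynamics itself. All function names are hypothetical.

```python
import random

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) cells where neither player gains by deviating alone.

    payoff_a[r][c] is the row player's payoff, payoff_b[r][c] the column player's.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            row_best = all(payoff_a[r][c] >= payoff_a[k][c] for k in range(n_rows))
            col_best = all(payoff_b[r][c] >= payoff_b[r][k] for k in range(n_cols))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria

def random_game(n_strategies, rng):
    """Draw two payoff matrices with i.i.d. uniform entries (an assumption)."""
    def matrix():
        return [[rng.random() for _ in range(n_strategies)] for _ in range(n_strategies)]
    return matrix(), matrix()

def equilibrium_count_frequencies(n_strategies, n_games, seed=0):
    """Estimate how often a random game has 0, 1, 2, ... pure equilibria."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_games):
        a, b = random_game(n_strategies, rng)
        k = len(pure_nash_equilibria(a, b))
        counts[k] = counts.get(k, 0) + 1
    return {k: v / n_games for k, v in sorted(counts.items())}

frequencies = equilibrium_count_frequencies(n_strategies=3, n_games=2000)
print(frequencies)
```

Rerunning `equilibrium_count_frequencies` with larger `n_strategies` gives a rough feel for how the share of games with exactly one equilibrium changes as games grow.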
This document analyzes NBA player statistics and salaries from the 2013-2014 season to determine if players were paid what their statistics indicated they deserved. It introduces the concepts of win shares and player efficiency rating to measure player performance, and describes how topological data analysis techniques like simplicial complexes, persistent homology, barcodes and persistence diagrams are used to visualize relationships in the statistical data. Key results include a graph comparing player statistics to average salary to identify overpaid and underpaid players, as well as analysis of homology dimensions to understand cluster structure.
This document provides an introduction to game theory, including:
- Game theory mathematically determines optimal strategies given conditions to maximize outcomes.
- It has roots in ancient texts and was modernized in 1944. Famous examples include the Prisoner's Dilemma.
- Games involve players, strategies, and payoffs. Equilibria like Nash equilibria predict likely outcomes.
- Games can be simultaneous or sequential, affecting likely equilibria. Strategies can be pure or mixed.
Coalition proofness in a class of games with strategic substitutes (Ira Tobing)
This document summarizes a research article that examines coalition-proofness and Pareto properties of Nash equilibria in games with strategic substitutes and increasing/decreasing externalities. Specifically:
1) It considers a class of "σ-interactive games" with these properties and proves several results about the relationships between different equilibrium concepts in these games.
2) It proves that in games with strategic substitutes and increasing externalities, the sets of Nash equilibria, coalition-proof Nash equilibria under strong Pareto dominance, and Nash equilibria not strongly Pareto dominated coincide.
3) It also proves several results about the relationships between different equilibrium concepts in games with strategic substitutes and decreasing
The document provides an overview of game theory including:
- A brief history noting its development in the 20th century and key publications.
- Definitions of basic game theory terms like players, strategies (pure vs mixed), and payoff matrices.
- An outline of the typical process used in game theory including finding pure strategies like saddle points or using dominance rules, and determining mixed strategies.
- Examples are given to demonstrate finding the saddle point and using dominance rules to reduce a payoff matrix.
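The saddle-point step mentioned above can be sketched as follows. The example assumes a two-person zero-sum game whose matrix holds payoffs to the row player; the matrix values and the function name are made up for illustration.

```python
def find_saddle_points(payoff):
    """Return (row, col, value) for every saddle point of a zero-sum payoff matrix.

    Entries are payoffs to the row player, who maximises; the column player
    minimises. A saddle point is an entry that is the minimum of its row and
    the maximum of its column.
    """
    row_minima = [min(row) for row in payoff]
    col_maxima = [max(col) for col in zip(*payoff)]
    return [
        (r, c, value)
        for r, row in enumerate(payoff)
        for c, value in enumerate(row)
        if value == row_minima[r] and value == col_maxima[c]
    ]

# Illustrative payoff matrix (hypothetical values).
game = [
    [3, 1, 4],
    [5, 2, 6],
    [0, -1, 7],
]
print(find_saddle_points(game))
```

When the function returns an empty list, the game has no solution in pure strategies, and dominance rules or mixed strategies are the next step.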
Game theory is the study of strategic decision making (Manoj Ghorpade)
Game theory is the study of strategic decision making between intelligent rational decision makers. It originated in economics and is used in fields like political science and psychology. Game theory analyzes interactions with both cooperative and non-cooperative games. Modern game theory began with John von Neumann's work on mixed strategy equilibria in two-person zero-sum games. Von Neumann and Oskar Morgenstern's 1944 book Theory of Games and Economic Behavior was influential in establishing game theory. Game theory has been widely applied, including in biology since the 1970s, and has helped explain behaviors in economics, politics and other fields.
Game Theory - Quantitative Analysis for Decision Making (Ishita Bose)
WHAT IS GAME THEORY?
HISTORY OF GAME THEORY
APPLICATIONS OF GAME THEORY
KEY ELEMENTS OF A GAME
TYPES OF GAME
NASH EQUILIBRIUM (NE)
PURE STRATEGIES AND MIXED STRATEGIES
2-PLAYERS ZERO-SUM GAMES
PRISONER’S DILEMMA
1) The study analyzed runs scored and allowed by MLB teams in each inning of the 2015 season to determine which innings best predict overall wins.
2) Regression analysis found that runs in the 1st, 2nd, 5th, and 7th innings were the best predictors of wins, explaining around 75% of the variation in wins.
3) The 1st and 2nd innings are important because teams face the opposing lineup, and early leads influence momentum. The 5th and 7th innings involve relief pitching changes that can impact scoring.
This paper proposes using Naive Bayes models to predict the probability of a goal being scored in the NHL based on shot attributes, player information, and game situation. The paper develops two Naive Bayes models: an Empirical Naive Bayes model using empirical densities directly from the data without parametric assumptions, and a Parametric Naive Bayes model fitting known distributions to the data. Both models are compared to a logistic regression baseline using cross-validation. The Empirical Naive Bayes model has the lowest error rates overall, while the Parametric Naive Bayes and logistic regression models perform similarly. The models can help identify favorable and unfavorable shots and locations to inform player placement strategy.
Sports Analytics - Goaltender Performance (Jason Mei)
- Goaltenders are more likely to allow a goal in the first 5 shots after allowing a previous goal, showing a cumulative 1.87% increased chance of letting in a goal. This suggests goaltenders' confidence is temporarily impacted after letting in a goal.
- However, individual goaltenders show variable patterns in performance after goals, with impacts differing based on each player's mental response to in-game events.
- While goals tend to be "clustered" one right after another for the team overall, the effect is not uniform for all goaltenders and seems to depend more on the individual's psychological state.
This document is a thesis submitted by Steve Cultrera to Central Connecticut State University analyzing the impact of weather on runs scored in baseball games at Fenway Park in Boston over 40 years. It reviews previous literature that found variables like hits, walks, and stolen bases explain over 95% of runs scored, but none looked at weather impacts. The thesis aims to determine if weather variables like temperature, wind, and pressure can explain additional variance in runs. It describes the dataset created by combining baseball game data from Fenway Park with weather data from a nearby airport. Exploratory analysis, clustering, and predictive modeling techniques are used to analyze the data and relationships between weather and runs scored.
This document provides an overview of one-way analysis of variance (ANOVA). It begins by explaining the basic concepts and settings for ANOVA, including comparing population means across three or more groups. It then covers the hypotheses, ideas, assumptions, and calculations involved in one-way ANOVA. These include splitting total variability into parts between and within groups, computing an F-statistic to test if population means are equal, and potentially performing multiple comparisons between pairs of groups if the F-test is significant. Worked examples are provided to illustrate key ANOVA concepts and calculations.
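The split of total variability into between-group and within-group parts can be sketched in a few lines. The data below are invented purely for illustration and the function name is hypothetical; the sketch computes only the F-statistic and its degrees of freedom, not a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up samples for three groups.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(samples)
print(f_stat, df_between, df_within)
```

For these samples the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26 / 2) / (6 / 6) = 13 on 2 and 6 degrees of freedom. The statistic would then be compared with a critical value from the F distribution.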
Supervised sequential pattern mining for identifying important patterns of pl... (Rory Bunker)
This document describes a study that uses supervised sequential pattern mining (SPM) to analyze sequence data from rugby matches and identify important patterns of play. The study converts match event logs into labeled sequences of play events, then applies a supervised SPM method to identify patterns that discriminate between scoring/not scoring or conceding/not conceding. The supervised SPM method identifies more sophisticated and relevant patterns compared to unsupervised SPM methods. Key patterns indicated line breaks and lineouts were important for scoring, and maintaining possession and finding touch on kick restarts were important for preventing scores. The study concludes supervised SPM is useful for performance analysis in sports.
Stability criterion of periodic oscillations in a (4) (Alexander Decker)
1) The authors establish that the distribution of the harmonic mean of group variances is a generalized beta distribution through simulation.
2) They show that the generalized beta distribution can be approximated by a chi-square distribution.
3) This means that the harmonic mean of group variances is approximately chi-square distributed, though the degrees of freedom need not be an integer. Using the harmonic mean in place of the pooled variance allows hypothesis testing when group variances are unequal.
The Problem of the Chinese Basketball Association Competing for the Championship (Dr. Amarjeet Singh)
This document establishes mathematical models to analyze historical score data from Chinese Basketball Association (CBA) games over four years in order to predict team win probabilities and rankings. It calculates win probabilities for each team in the regular season and playoffs based on scoring averages and variances. A time series model is used to predict probabilities for the next year. A fuzzy comprehensive evaluation model analyzes stability, performance level, and average scores to qualitatively assess the overall strength of each team. The models provide rankings of teams for the regular season, playoffs, and an overall "average" strength level for the CBA league.
This paper investigates whether professional baseball players follow optimal strategies as predicted by game theory's Minimax theorem, using Major League Baseball playoff season data. The authors find that baseball players' strategies are predictable based on their previous actions, indicating they do not fully optimize. Higher salaries are found to decrease players' incentives to pursue optimal strategies and bring lower performance, while more experience leads to strategies more aligned with Minimax.
Correlation: Bivariate Data and Scatter Plot (DenzelMontuya1)
This document discusses bivariate data and scatter plots. Bivariate data involves collecting values of two variables from the same population. A scatter plot can show the relationship between two quantitative variables by plotting their ordered pairs. The direction and strength of the pattern in a scatter plot indicates whether the variables have a positive, negative, or no correlation. The correlation coefficient (r) measures the strength of the linear relationship between variables on a scale from -1 to 1.
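A minimal sketch of the correlation coefficient for paired observations is shown below, using the standard Pearson formula. The sample values and the function name are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
print(round(r, 3))
```

Here the sum of products of deviations is 8 and both sums of squares are 10, so r = 8 / 10 = 0.8, a fairly strong positive linear relationship.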
Analysis of variance (ANOVA) is a statistical technique used to compare the means of three or more groups. It compares the variance between groups with the variance within groups to determine if the population means are significantly different. The key assumptions of ANOVA are independence, normality, and homogeneity of variances. A one-way ANOVA involves one independent variable with multiple levels or groups, and compares the group means to the overall mean to calculate an F-ratio statistic. If the F-ratio exceeds a critical value, then the null hypothesis that the group means are equal can be rejected.
A combination of ball events and positional data is needed to understand the players’ and team’s performance. Thus, several indicators such as player-player and player-ball dyadic coordination, intra- and inter-team synchronization, pattern-forming dynamics, time required to regain ball possession, ball possession percentage, and number of passes and their length have been used to characterize individual and collective performance using AI algorithms.
I provide a (very) brief introduction to game theory. I have developed these notes to provide quick access to some of the basics of game theory, mainly as an aid for students in courses in which I assumed familiarity with game theory but did not require it as a prerequisite.
This document discusses predicting the outcomes of National Hockey League (NHL) games using machine learning models. It aims to improve upon the results of a previous study by the University of Ottawa that achieved 60% accuracy. The document uses the same dataset from the Ottawa study, containing statistics from 517 NHL games. It builds machine learning models using decision trees, neural networks, and a proprietary software package to predict game outcomes. The models are built using different combinations of the dataset's categorical and continuous variables. The best performing models achieve accuracies between 57% and 62%, showing an improvement over the previous study.
Yujie Zi Econ 123CW Research Paper - NBA Defensive Teams (Yujie Zi)
The study examines the impact of a team's defensive or offensive focus on regular season win percentage in the NBA. Regression analysis is conducted using offensive and defensive statistics from 30 NBA teams over the 2005-2006 and 2007-2008 seasons. The regression finds that having a "defensive mindset", as defined by a team's defensive efficiency ranking higher than its offensive efficiency ranking, has an insignificant and negative effect on regular season win percentage. This suggests that a team's focus on defense or offense does not statistically impact regular season performance. Future research could study the impact of defenses in close games or overcoming large deficits.
1) The document describes a probabilistic graphical model for simulating basketball matches. It builds on a previous model by including the possibility of dribbling and distinguishing between open and contested shots.
2) Key aspects of the model include probabilities for shot attempts, drives to the basket, shot efficiency, and defensive impact. These are calculated based on player tendencies and abilities as well as the offensive and defensive lineups.
3) The model represents events in a possession as vertices in a graph and the progression between events as edges with weighted probabilities. This allows full simulations of games to be run using the model.
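The graph idea in point 3 can be sketched as a random walk over events with weighted edges. Everything below is a simplified assumption: the event names, the fixed transition probabilities, and the function names are hypothetical, whereas the model described above derives its probabilities from player tendencies and lineups.

```python
import random

# Hypothetical possession graph: each event maps to (next_event, probability) edges.
TRANSITIONS = {
    "start": [("pass", 0.5), ("dribble", 0.3), ("open_shot", 0.1), ("contested_shot", 0.1)],
    "pass": [("pass", 0.3), ("dribble", 0.2), ("open_shot", 0.25),
             ("contested_shot", 0.15), ("turnover", 0.1)],
    "dribble": [("pass", 0.3), ("open_shot", 0.2), ("contested_shot", 0.35), ("turnover", 0.15)],
    "open_shot": [("score", 0.5), ("miss", 0.5)],
    "contested_shot": [("score", 0.35), ("miss", 0.65)],
}
TERMINAL = {"score", "miss", "turnover"}

def simulate_possession(rng):
    """Walk the graph from 'start' until a terminal event; return the event path."""
    event = "start"
    path = [event]
    while event not in TERMINAL:
        next_events, weights = zip(*TRANSITIONS[event])
        event = rng.choices(next_events, weights=weights)[0]
        path.append(event)
    return path

def estimate_scoring_rate(n_possessions, seed=0):
    """Fraction of simulated possessions that end in a score."""
    rng = random.Random(seed)
    scores = sum(simulate_possession(rng)[-1] == "score" for _ in range(n_possessions))
    return scores / n_possessions

print(simulate_possession(random.Random(1)))
print(estimate_scoring_rate(2000))
```

Chaining many possessions for both teams would give a full game simulation; making the edge weights depend on who is on the court is what the described model adds beyond this sketch.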
A Hybrid Constraint Programming And Enumeration Approach For Solving NHL Play... (Shannon Green)
This document proposes a hybrid constraint programming and enumeration approach to solve National Hockey League (NHL) playoff qualification and elimination problems. The approach uses constraint programming, enumeration, network flows and decomposition to efficiently determine the minimum points needed to guarantee or possibly earn a playoff spot. It was experimentally tested on NHL data from 2005-06 and 2006-07 seasons, providing earlier qualification results than newspapers. The approach can identify critical "must-win" games that significantly impact playoff chances.
This document summarizes a statistical model to predict results of Euro 2016 qualifiers using multivariate regression. The model examines individual player performance data from club matches to predict national team match outcomes. It finds the model can correctly predict match results 62.1% of the time and goal scores in 33.6% of matches. Factor analysis is used to compress defending and attacking player stats into defending and attacking factors for each team.
1. After watching the attached video by Dan Pink on .docx (jeremylockett77)
1. After watching the attached video by Dan Pink on the inherent weaknesses of extrinsic motivators, present two salient applications to your role as a leader in athletics. Dan Pink: The puzzle of motivation Ted.com
2. One of the very real truisms about leadership is that it can be lonely at the top and quite stressful. Please describe two specific ways you as a leader manage stress in your life.
BIBLIOGRAPHY
Annala, C. N., & Winfree, J. (2011). Salary distribution and team performance in Major League Baseball. Sport Management Review, 14(2), 167-175.
Breunig, R., Garrett-Rumba, B., Jardin, M., & Rocaboy, Y. (2014). Wage dispersion and team performance: a theoretical model and evidence from baseball. Applied Economics, 46(3), 271-281.
Devi R. (2016). Data.world. Baseball Stats. Retrieved September 25, 2019 from https://data.world/deviramanan2016/baseball-stats
Lee, S., & Harris, J. (2012). Managing excellence in USA Major League Soccer: an analysis of the relationship between player performance and salary. Managing Leisure, 17(2-3), 106-123.
Scully, G. W. (1974). Pay and performance in major league baseball. The American Economic Review, 64(6), 915-930.
Sommers, P. M., & Quinton, N. (1982). Pay and performance in major league baseball: The case of the first family of free agents. The Journal of Human Resources, 17(3), 426-436.
Tao, Y. L., Chuang, H. L., & Lin, E. S. (2016). Compensation and performance in Major League Baseball: Evidence from salary dispersion and team performance. International Review of Economics & Finance, 43, 151-159.
Wiseman, F., & Chatterjee, S. (2003). Team payroll and team performance in major league baseball: 1985–2002. Economics Bulletin, 1(2), 1-10.
PAY AND PERFORMANCE IN MAJOR LEAGUE BASEBALL
RODERICK HOOKS
9-16-2019
Purpose statement and model
This study examines whether there is a relationship between a team's pay and its performance. Performance is the dependent variable, measured by a team's wins in the 2010 Major League Baseball season (Tao et al., 2016). This is a suitable dependent variable since a team's wins can be influenced by many factors and final results are the main target of every team (Scully, 1974). The primary independent variable is payroll, which is the total pay of the team (Wiseman & Chatterjee, 2003). Payroll is suitable for determining whether there is a relationship between pay and performance because a higher payroll is expected to bring higher performance, since many of a team's challenges can be solved by financial stability (Sommers & Quinton, 1982).
The general form of the model will be:
Wins = b0 + b1 × Payroll + b2 × Attendance + Error
Definitions of variables
The variables used in this study are wins, payroll and attendance. Wins is the dependent variable, measuring the number of games the team wins. I ...
This document provides guidance on how to analyze a soccer match by looking at various elements of team structure, tactics, tendencies, and key players on both sides of the ball. It outlines things to observe such as formations, roles of midfielders and defenders, attacking and passing patterns, set pieces, pressing strategies, and how a team's approach may change based on the score, time remaining, or other in-game factors. The level of detail in the analysis can help high-level coaches better understand the opponent and make appropriate adjustments to their own tactics.
Similar to 1982 maher modelling association football scores
This document summarizes a statistical model to predict results of Euro 2016 qualifiers using multivariate regression. The model examines individual player performance data from club matches to predict national team match outcomes. It finds the model can correctly predict match results 62.1% of the time and goal scores in 33.6% of matches. Factor analysis is used to compress defending and attacking player stats into defending and attacking factors for each team.
1. After watching the attached video by Dan Pink on .docxjeremylockett77
1. After watching the attached video by Dan Pink on the inherent weaknesses of extrinsic motivators, present two salient applications to your role as a leader in athletics. Dan Pink: The puzzle of motivation Ted.com
2. One of the very real truisms about leadership is that it can be lonely at the top and quite stressful. Please describe two specific ways you as a leader manage stress in your life.
BIBLIOGRAPHY
Annala, C. N., & Winfree, J. (2011). Salary distribution and team performance in Major League Baseball. Sport Management Review, 14(2), 167-175.
Breunig, R., Garrett-Rumba, B., Jardin, M., & Rocaboy, Y. (2014). Wage dispersion and team performance: a theoretical model and evidence from baseball. Applied Economics, 46(3), 271-281.
Devi R. (2016). Data.world. Baseball Stats. Retrieved September 25, 2019 from https://data.world/deviramanan2016/baseball-stats
Lee, S., & Harris, J. (2012). Managing excellence in USA Major League Soccer: an analysis of the relationship between player performance and salary. Managing Leisure, 17(2-3), 106- 123.
Scully, G. W. (1974). Pay and performance in major league baseball. The American Economic Review, 64(6), 915-930.
Sommers, P. M., & Quinton, N. (1982). Pay and performance in major league baseball: The case of the first family of free agents. The Journal of Human Resources, 17(3), 426-436.
Tao, Y. L., Chuang, H. L., & Lin, E. S. (2016). Compensation and performance in Major League Baseball: Evidence from salary dispersion and team performance. International Review of Economics & Finance, 43, 151-159.
Wiseman, F., & Chatterjee, S. (2003). Team payroll and team performance in major league baseball: 1985–2002. Economics Bulletin, 1(2), 1-10.
Running Head: PAY AND PERFORMANCE IN MAJOR LEAGUE BASEBALL 1
PAY AND PERFORMANCE IN MAJOR LEAGUE BASEBALL 5
PAY AND PERFORMANCE IN MAJOR LEAGUE BASEBALL
RODERICK HOOKS
9-16-2019
Purpose statement and model
This study will try to examine whether there is a relationship between the payment and performance of a team. Performance is the dependent variable measured by wins of a team in the 2010 Major League Baseball (Tao Y. et al, 2016). This is the suitable dependent variable since the wins for a team can be influenced by many factors and the final results are the main target of every team (Scully G., 1974). The primary independent variable is payroll which the totals pay of the team (Wiseman F. & Chatterjee S., 2003). This is suitable in determining whether there is relationship between pay and performance due to the fact that a higher anticipates higher performance since many challenges for the team can be solved by financial stability (Sommers P. & Quinton N., 1982).
The general form of the model will be;
Wins = b0 + b1Payroll + b2Attendance + Error (
Definitions of variables
The variables used in this study are wins, payroll and attendance. Win is the dependent variable measuring the number of games the team wins. I ...
Modelling association football scores
by M. J. MAHER*
Abstract Previous authors have rejected the Poisson model for association football scores in
favour of the Negative Binomial. This paper, however, investigates the Poisson
model further. Parameters representing the teams’ inherent attacking and defensive strengths are
incorporated and the most appropriate model is found from a hierarchy of models. Observed and
expected frequencies of scores are compared and goodness-of-fit tests show that although there
are some small systematic differences, an independent Poisson model gives a reasonably accurate
description of football scores. Improvements can be achieved by the use of a bivariate Poisson
model with a correlation between scores of 0.2.
Key Words: Poisson goals distribution, iterative maximum likelihood.
1 Introduction
MORONEY (1951) demonstrated that the number of goals scored by a team in a football
match was not well fitted by a Poisson distribution but that if a “modified Poisson” (the
Negative Binomial, in fact) was used, the fit was much better. REEP, POLLARD and
BENJAMIN (1971) confirmed this, using data from the English Football League First
Division for four seasons, and then proceeded to apply the Negative Binomial distribution
to other ball games. The implication of this result is that the same Negative Binomial
distribution applied to the number of goals scored by a team, regardless of the
quality of that team or the quality of the opposition. In fact in an earlier paper, REEP and
BENJAMIN (1968) remarked that “chance does dominate the game”. HILL (1974) was
unconvinced by this and showed that football experts were able, before the season started,
to predict with some success the final league table positions. Therefore, certainly over a
whole season, skill rather than chance dominates the game. Most people who watch football
would probably agree that, whilst in a single match chance plays a considerable role
(missed scoring opportunities, dubious offside decisions and shots hitting the crossbar
can obviously drastically affect the result), over several matches luck plays much less
of a part. Teams are not identical; each one has its own inherent quality, and surely, then,
we should expect that when a good team is playing a weak team, the good team will have a
high probability of winning and scoring several goals. By using data from the whole or just
a part of the season, these inherent qualities of the teams in a league can be inferred by,
for example, maximum likelihood estimation (as in THOMPSON (1975)) or by linear model
methodology (as in HARVILLE (1977) and LEEFLANG and VAN PRAAG (1971)).
2 The Model
There are good reasons for thinking that the number of goals scored by a team in a
match is likely to be a Poisson variable: possession is an important aspect of football,
and each time a team has the ball it has the opportunity to attack and score. The probability
$p$ that an attack will result in a goal is, of course, small, but the number of times a
team has possession during a match is very large. If $p$ is constant and attacks are
independent, the number of goals will be Binomial and in these circumstances the Poisson
approximation will apply very well. The mean of this Poisson will vary according to the
quality of the team and so if one were to consider the distribution of goals scored by all
teams, one would have a Poisson distribution with variable mean, and hence something
like the Negative Binomial observed by MORONEY (1951) and REEP, POLLARD and
BENJAMIN (1971) could arise.
* Department of Probability and Statistics, Sheffield University, Sheffield S3 7RH, England.
Statistica Neerlandica 36 (1982), nr. 3.
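Both steps of this argument can be checked numerically. The sketch below is illustrative only: the 150 attacks per match, the scoring probability of 0.01 and the two team means are hypothetical values chosen for the example, not figures from the paper. It first measures how closely a Binomial distribution is matched by a Poisson with the same mean, and then computes the variance-to-mean ratio of an equal mixture of two Poisson distributions, which comes out above 1, as it would for a Negative Binomial.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k goals from n independent attacks, each scoring with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """Poisson probability of k goals for the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

# Hypothetical match: 150 attacks, each with a 1% chance of producing a goal.
n_attacks, p_goal = 150, 0.01
mean_goals = n_attacks * p_goal
max_gap = max(
    abs(binomial_pmf(k, n_attacks, p_goal) - poisson_pmf(k, mean_goals))
    for k in range(11)
)

# Hypothetical league: half the teams average 0.8 goals, half average 2.2.
team_means = [0.8, 2.2]
mix_mean = sum(team_means) / len(team_means)
spread = sum((m - mix_mean) ** 2 for m in team_means) / len(team_means)
mix_variance = mix_mean + spread  # E[Var(goals | team)] + Var(E[goals | team])
dispersion = mix_variance / mix_mean

print(f"largest Binomial-Poisson gap over 0..10 goals: {max_gap:.4f}")
print(f"variance-to-mean ratio of the mixture: {dispersion:.3f}")
```

A single Poisson has a variance-to-mean ratio of exactly 1, so any spread in team means pushes the pooled distribution towards the overdispersion reported in the earlier studies.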
Therefore, in this paper, at least for the present, an independent Poisson model for
scores will be adopted. In particular, if team $i$ is playing at home against team $j$ and the
observed score is $(x_{ij}, y_{ij})$, we shall assume that $X_{ij}$ is Poisson with mean
$\alpha_i \beta_j$, that $Y_{ij}$ is also Poisson with mean $\gamma_i \delta_j$, and that $X_{ij}$
and $Y_{ij}$ are independent. Then we can think of $\alpha_i$ as representing the strength of
team $i$’s attack when playing at home, $\beta_j$ the weakness of team $j$’s defence when playing
away, $\gamma_i$ the weakness of team $i$’s defence at home and $\delta_j$ the strength of team
$j$’s attack away. In a league with 22 teams there are 88 such parameters (and 924 observations
on the scores); however if all the $\alpha$’s are multiplied by a factor $k$ and all the
$\beta$’s divided by $k$, all the $\alpha_i \beta_j$ products are unaffected and, therefore, in
order to produce a unique set of parameters the constraint
$$\sum_i \alpha_i = \sum_i \beta_i$$
may be imposed. In the same way the constraint
$$\sum_i \gamma_i = \sum_i \delta_i$$
may be imposed and so only 86 independent parameters need to be specified. Since the
$X$ and $Y$ are assumed to be independent of each other (representing separate “games” at
the two ends of the pitch), the estimation of the $\alpha$ and $\beta$ will be entirely from the
$x$ and the estimation of the $\gamma$ and $\delta$ by means of the $y$ alone.
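For the home teams’ scores, therefore, the log likelihood function is:
$$\log L(\alpha, \beta) = \sum_i \sum_{j \ne i} \left( -\alpha_i \beta_j + x_{ij} \log(\alpha_i \beta_j) - \log(x_{ij}!) \right)$$
Therefore,
$$\frac{\partial \log L}{\partial \alpha_i} = \sum_{j \ne i} \left( -\beta_j + \frac{x_{ij}}{\alpha_i} \right)$$
and so the maximum likelihood estimates $\hat{\alpha}_i$, $\hat{\beta}_j$ satisfy:
$$\hat{\alpha}_i = \frac{\sum_{j \ne i} x_{ij}}{\sum_{j \ne i} \hat{\beta}_j} \qquad \text{and} \qquad \hat{\beta}_j = \frac{\sum_{i \ne j} x_{ij}}{\sum_{i \ne j} \hat{\alpha}_i}$$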
An iterative technique, such as NEWTON-RAPHSON, enables these MLEs to be determined.
One simpler scheme which works well is to use the $\hat{\alpha}$’s to estimate the
$\hat{\beta}$’s and then to use the $\hat{\beta}$’s to estimate the $\hat{\alpha}$’s, and so on
alternately. Good initial estimates can be gained by regarding the denominator terms in the
expressions above as summations over all teams: that is,
$$\hat{\alpha}_i = \frac{\sum_{j \ne i} x_{ij}}{\sqrt{S_x}} \qquad \text{and} \qquad \hat{\beta}_j = \frac{\sum_{i \ne j} x_{ij}}{\sqrt{S_x}}, \qquad \text{where } S_x = \sum_i \sum_{j \ne i} x_{ij}.$$
In a similar way, using the $y_{ij}$, $\hat{\gamma}$ and $\hat{\delta}$ may be found.
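The alternating scheme can be sketched in a few lines. This is a minimal illustration rather than the author's program: the function name `fit_attack_defence`, the fixed iteration count and the 4-team goals matrix are all assumptions made for the example. Each pass re-estimates every $\hat{\beta}_j$ as team $j$'s goals conceded away divided by the sum of the other teams' $\hat{\alpha}$, then every $\hat{\alpha}_i$ as team $i$'s home goals divided by the sum of the other teams' $\hat{\beta}$, starting from the row and column totals divided by $\sqrt{S_x}$.

```python
import math

def fit_attack_defence(goals, iterations=500):
    """Alternating ML estimates for X_ij ~ Poisson(alpha_i * beta_j).

    goals[i][j] is the number of goals scored by home team i against
    visiting team j; diagonal entries are ignored.
    """
    n = len(goals)
    home_scored = [sum(goals[i][j] for j in range(n) if j != i) for i in range(n)]
    away_conceded = [sum(goals[i][j] for i in range(n) if i != j) for j in range(n)]
    total = sum(home_scored)

    # Initial estimates: treat each denominator as a sum over all teams.
    alpha = [r / math.sqrt(total) for r in home_scored]
    beta = [c / math.sqrt(total) for c in away_conceded]

    for _ in range(iterations):
        beta = [away_conceded[j] / sum(alpha[i] for i in range(n) if i != j)
                for j in range(n)]
        alpha = [home_scored[i] / sum(beta[j] for j in range(n) if j != i)
                 for i in range(n)]

    # Rescale so that sum(alpha) == sum(beta); the products alpha_i * beta_j are unchanged.
    scale = math.sqrt(sum(beta) / sum(alpha))
    return [a * scale for a in alpha], [b / scale for b in beta]

# Hypothetical home-goals matrix for a 4-team league (row = home team).
goals = [
    [0, 2, 1, 3],
    [1, 0, 0, 2],
    [0, 1, 0, 1],
    [2, 2, 1, 0],
]
alpha, beta = fit_attack_defence(goals)
print([round(a, 3) for a in alpha])
print([round(b, 3) for b in beta])
```

A fixed number of passes keeps the sketch short; a convergence test on the change in the estimates would be the more usual stopping rule.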
3 Results
Data were obtained, in a convenient matrix form, from the Rothmans Football Yearbook
(1973, 1974, 1975). This gave 12 separate leagues (the four English Football
League Divisions for each of three seasons) for analysis. The MLEs of the four types of
parameter $\alpha$, $\beta$, $\gamma$ and $\delta$ are shown in Table 1 for just one data set:
Division 1 in the season 1971-1972.
Table 1. Maximum likelihood estimates of the parameters for Division 1 1971-1972.
Team                       α (home attack)   β (away defence)   γ (home defence)   δ (away attack)
Arsenal                    1.36              1.03               0.64               1.06
Chelsea                    1.55              1.18               0.97               0.83
Coventry City              1.05              1.66               1.12               0.84
Crystal Palace             0.99              1.28               1.49               0.65
Derby County               1.62              0.89               0.50               1.24
Everton                    1.06              1.17               0.81               0.44
Huddersfield Town          0.46              1.37               1.06               0.74
Ipswich Town               0.72              1.27               0.93               0.98
Leeds United               2.02              0.82               0.49               0.91
Leicester City             0.69              1.31               0.54               1.10
Liverpool                  1.78              0.54               0.78               0.78
Manchester City            1.82              1.17               0.75               1.40
Manchester United          1.49              1.35               1.31               1.49
Newcastle United           1.14              1.29               0.88               0.93
Nottingham Forest          0.98              1.96               1.43               1.10
Sheffield United           1.49              1.31               1.28               1.09
Southampton                1.21              1.98               1.38               1.05
Stoke City                 0.99              1.17               1.20               0.64
Tottenham Hotspur          1.71              1.12               0.63               0.87
West Bromwich Albion       0.84              1.16               1.13               0.99
West Ham United            1.18              1.22               0.92               0.78
Wolverhampton Wanderers    1.34              1.30               1.15               1.48
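To illustrate how these estimates translate into match predictions under the independent Poisson model, the sketch below takes the Table 1 values for Arsenal at home to Chelsea. The choice of fixture, the variable names and the truncation at 15 goals per side are assumptions made for the example. The home mean is $\alpha_{\text{Arsenal}} \beta_{\text{Chelsea}}$ and the away mean is $\gamma_{\text{Arsenal}} \delta_{\text{Chelsea}}$.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k goals for the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

# Table 1 estimates: Arsenal (home side) and Chelsea (visitors).
alpha_home, gamma_home = 1.36, 0.64   # Arsenal: home attack, home defence
beta_away, delta_away = 1.18, 0.83    # Chelsea: away defence, away attack

mean_home = alpha_home * beta_away    # expected Arsenal goals
mean_away = gamma_home * delta_away   # expected Chelsea goals

max_goals = 15  # truncation point; the neglected tail is negligible for these means
p_home_win = p_draw = p_away_win = 0.0
for x in range(max_goals + 1):
    for y in range(max_goals + 1):
        p = poisson_pmf(x, mean_home) * poisson_pmf(y, mean_away)  # independence
        if x > y:
            p_home_win += p
        elif x == y:
            p_draw += p
        else:
            p_away_win += p

print(f"means: home {mean_home:.3f}, away {mean_away:.3f}")
print(f"home win {p_home_win:.3f}, draw {p_draw:.3f}, away win {p_away_win:.3f}")
```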
The question arises of whether all these parameters are necessary for an adequate
description of the scores. Intuitively it seems that there must be real differences between
teams, but are these differences more apparent in the attacks or defences, and is it really
necessary to have separate parameters for the quality of a team’s attack at home and
away? Consideration of such questions leads to a possible hierarchy of models which
could be tested. At the bottom is model 0 in which $\alpha_i = \alpha$, $\beta_i = \beta$,
$\gamma_i = \gamma$ and $\delta_i = \delta$ $\forall i$; that is, all teams are identical in all
respects. At the top is model 4, previously described, in which all four types of parameter
are allowed to take different values for the different teams. The hierarchy is shown in
Table 2. In this the notation is designed to show whether a set of parameters (such as the
$\beta$) are free to take different values for the different teams (shown as $\beta_i$) or
whether the same value applies to all teams (shown as $\beta$).
Table 2. Hierarchy of models, with changes in the value of twice the maximised log likelihood
shown for Division 1 1971-1972
[Diagram not recovered. Levels of the hierarchy, top to bottom: Model 4; Models 3C, 3D; Model 2; Models 1A, 1B; Model 0.]
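In model 0 there are four parameters but in order to have a unique set of parameter
estimates, the constraints $\alpha = \beta$ and $\gamma = \delta$ are imposed (or, equivalently,
$\alpha = \beta$, $\gamma = k\beta$ and $\delta = k\alpha$), giving just two independent
parameters. Details of the constraints imposed in the other models are as follows:
Model 1A: $\delta_i = \alpha_i$, $\beta_i = \beta$, $\gamma_i = \gamma$ $\forall i$; $\sum_i \alpha_i = n\beta$. Therefore, there are $n + 1$ independent parameters (where $n$ is the number of teams in the league).
Model 1B: $\gamma_i = \beta_i$, $\alpha_i = \alpha$, $\delta_i = \delta$ $\forall i$; $\sum_i \alpha_i = \sum_i \beta_i$. Again, there are $n + 1$ independent parameters.
Model 2: $\delta_i = k\alpha_i$, $\gamma_i = k\beta_i$ $\forall i$; $\sum_i \alpha_i = \sum_i \beta_i$. There are $2n$ independent parameters.
Model 3C: $\delta_i = \alpha_i$ $\forall i$; $\sum_i \alpha_i = \sum_i \beta_i$. $3n - 1$ independent parameters.
Model 3D: $\gamma_i = \beta_i$ $\forall i$; $\sum_i \alpha_i = \sum_i \beta_i$. Again, $3n - 1$ independent parameters.
Model 4: $\sum_i \alpha_i = \sum_i \beta_i$ and $\sum_i \gamma_i = \sum_i \delta_i$. Therefore, there are $4n - 2$ independent parameters.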
It can be seen, therefore, that moving up one level in the hierarchy of models leads to
the introduction of $(n - 1)$ further parameters. Under the null hypothesis that these
extra parameters are unnecessary, $2 \log_e \lambda$ will be asymptotically $\chi^2_{n-1}$
distributed by the usual likelihood ratio test, where $\log_e \lambda$ is the increase in the
log likelihood in moving from one model to the other.
For Division 1 in the season 1971-1972 the changes in the value of the maximised log
likelihood when moving from one model to another are shown in Table 2 ($n = 22$;
$\chi^2_{.05}(21) = 32.7$ and $\chi^2_{.01}(21) = 38.9$).
This table shows that when inequality of the $\alpha_i$ is allowed (moving from model 0 to
model 1A or moving from model 1B to model 2), a highly significant increase in the log
likelihood results. Similarly, when inequality of the $\beta_i$ is allowed (model 0 to 1B, or 1A
to 2), again the log likelihood increases very significantly. When the $\delta_i$ are freed from
being proportional to the $\alpha_i$ (model 2 to 3D or 3C to 4), a marginally significant increase
in the log likelihood is obtained. However, when the $\gamma_i$ are freed from their linking with
the $\beta_i$, no significant increase results. It should be noticed that the order in which the
freeing of these parameters occurred had virtually no effect on the increase in the log
likelihood due to each one; this was true for all the twelve data sets. Therefore, it is
possible to associate an increase in the log likelihood with each of the four types of
parameters, and, in parallel with the ideas of linear models in which factors are introduced into
the model one at a time, the “inclusion of $\gamma$”, for example, means the freeing of the
$\gamma_i$ from their linking with the $\beta_i$. Table 3 shows the increases in log likelihood
due to the inclusion of each of the four types of parameter, for each of the twelve data sets.
There are 22 teams in divisions 1 and 2 and 24 teams in divisions 3 and 4. The numbers of
degrees of freedom, therefore, in the asymptotic $\chi^2$ distribution for $2 \log_e \lambda$ are
21 and 23 respectively.
Overall, then, it can be seen that the parameters $\alpha$ and $\beta$ should certainly be
included in the model but the parameters $\gamma$ and $\delta$ need not be included. (Not only
can the null hypotheses not be rejected in these latter cases, but they appear perfectly
consistent with the data.) This means that a single parameter $\alpha_i$ can be used to describe
the quality of team $i$’s attack, and the parameter $\beta_i$ to describe the weakness of the
team’s defence, whether the team is playing at home or away. So although home ground advantage
is a highly significant factor, it applies with equal effect to all teams, and each team’s
inherent scoring power is diminished by a constant factor when playing away.
Table 3. Increase in log likelihood due to inclusion of each of the four types of parameter in the
model
season      division   α        β        γ       δ
1971-1972   1          37.7**   35.7**   8.6     17.7*
            2          23.4**   32.4**   8.7     6.1
            3          40.6**   28.1**   11.5    19.7*
            4          29.4**   34.2**   12.5    11.0
1972-1973   1          24.8**   7.1      8.8     8.2
            2          23.2**   17.9*    3.9     13.2
            3          27.8**   18.4*    12.3    12.1
            4          26.3**   30.2**   8.9     15.0
1973-1974   1          12.5     19.5**   14.8    10.6
            2          19.8**   20.1**   15.1    10.2
            3          23.4**   39.9**   13.5    14.9
            4          31.0**   28.4**   8.5     13.0
* indicates a significant increase at the 5% level
** indicates a significant increase at the 1% level
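The significance markers follow from doubling each increase in log likelihood and comparing it with the $\chi^2$ critical values quoted in the text. The sketch below repeats that arithmetic for the Division 1, 1971-1972 row, using the 21-degree-of-freedom critical values 32.7 and 38.9 given earlier. The dictionary layout and the function name `stars` are assumptions made for the example.

```python
# Increases in log likelihood for Division 1, 1971-1972 (first row of Table 3).
increases = {"alpha": 37.7, "beta": 35.7, "gamma": 8.6, "delta": 17.7}

# Critical values of chi-squared with 21 degrees of freedom, as quoted in the text.
CRITICAL_5_PERCENT = 32.7
CRITICAL_1_PERCENT = 38.9

def stars(increase):
    """Return the significance marker for a given increase in log likelihood."""
    statistic = 2 * increase  # likelihood ratio statistic, 2 log(lambda)
    if statistic > CRITICAL_1_PERCENT:
        return "**"
    if statistic > CRITICAL_5_PERCENT:
        return "*"
    return ""

markers = {name: stars(value) for name, value in increases.items()}
for name, value in increases.items():
    print(f"{name}: 2 log lambda = {2 * value:.1f} {markers[name]}")
```

For divisions 3 and 4, which have 24 teams, the comparison would use critical values for 23 degrees of freedom instead.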
In the light of the results above, then, Model 2 was adopted as being the most appropriate,
and further analyses were made of its adequacy as a description of the mechanism
underlying football scores.
4 Goodness-of-fit-tests
For a match between team $i$ and team $j$ the MLEs from model 2 may be used to estimate
$\mu_{ij}$ and $\lambda_{ij}$, the means of $X_{ij}$ and $Y_{ij}$. Since $X_{ij}$ and $Y_{ij}$ are
assumed to be Poisson and independent, the probabilities that $X_{ij} = x$ and $Y_{ij} = y$ may be
easily calculated. By repeating this for all pairs of $i$ and $j$, the expected score
distributions may be found and compared with the observed score distributions. For Division 1
in the season 1971-1972, for example, these observed and expected frequencies are shown in
Table 4.
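The accumulation over all pairs can be sketched as follows. The three-team parameter values, the function name `expected_frequencies` and the grouping of scores into 0, 1, 2, 3 and 4 or more are assumptions for illustration. The means follow model 2, with $\mu_{ij} = \alpha_i \beta_j$ for the home side and $\lambda_{ij} = k^2 \alpha_j \beta_i$ for the visitors.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k goals for the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def expected_frequencies(alpha, beta, k2, top=4):
    """Expected numbers of matches with 0, 1, ..., top-1 and >= top goals.

    Returns one list for home scores and one for away scores, summed over
    every ordered pair (home team i, away team j).
    """
    n = len(alpha)
    home = [0.0] * (top + 1)
    away = [0.0] * (top + 1)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            mu = alpha[i] * beta[j]          # mean home goals
            lam = k2 * alpha[j] * beta[i]    # mean away goals
            for counts, mean in ((home, mu), (away, lam)):
                below = 0.0
                for goals in range(top):
                    p = poisson_pmf(goals, mean)
                    counts[goals] += p
                    below += p
                counts[top] += 1.0 - below   # probability of `top` or more goals
    return home, away

# Hypothetical fitted values for a 3-team league.
alpha = [1.4, 1.0, 0.8]
beta = [0.9, 1.1, 1.2]
k2 = 0.5
home_expected, away_expected = expected_frequencies(alpha, beta, k2)
print([round(v, 2) for v in home_expected])
print([round(v, 2) for v in away_expected])
```

Each list sums to the number of matches, here 6, because every match contributes a total probability of 1.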
Table 4. Observed and expected frequencies of home and away scores for Division 1 1971-1972
                 home              away
no. of goals     obs.    exp.      obs.    exp.
0                117     111.2     184     189.3
1                127     144.6     157     159.5
2                115     106.1     88      75.9
3                66      58.0      30      26.9
≥4               37      42.1      3       10.5
                 χ² = 4.90         χ² = 7.79
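The two statistics can be recomputed from the tabulated frequencies with the usual formula $\sum (O - E)^2 / E$. The sketch below is a check on the arithmetic, not the author's calculation. It takes the home observed count for four or more goals as 37 and the away expected count for two goals as 75.9, the readings under which every column totals the 462 matches played. The function name `chi_square` is an assumption.

```python
def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Division 1, 1971-1972: cells are 0, 1, 2, 3 and >= 4 goals.
home_observed = [117, 127, 115, 66, 37]
home_expected = [111.2, 144.6, 106.1, 58.0, 42.1]
away_observed = [184, 157, 88, 30, 3]
away_expected = [189.3, 159.5, 75.9, 26.9, 10.5]

home_statistic = chi_square(home_observed, home_expected)
away_statistic = chi_square(away_observed, away_expected)
print(f"home: {home_statistic:.2f}, away: {away_statistic:.2f}")
```

Because the tabulated expected frequencies are rounded to one decimal place, the recomputed values agree with the printed 4.90 and 7.79 only approximately.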
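For the fitted model 2 the MLEs of the parameters are:
$$\hat{\alpha}_i = \frac{\sum_{j \ne i} (x_{ij} + y_{ji})}{(1 + \hat{k}^2) \sum_{j \ne i} \hat{\beta}_j}, \qquad \hat{\beta}_j = \frac{\sum_{i \ne j} (x_{ij} + y_{ji})}{(1 + \hat{k}^2) \sum_{i \ne j} \hat{\alpha}_i} \qquad \forall i, j$$
and
$$\hat{k}^2 = \frac{\sum_i \sum_{j \ne i} y_{ij}}{\sum_i \sum_{j \ne i} x_{ij}}.$$
It follows from this that
$$\sum_i \sum_{j \ne i} \hat{\alpha}_i \hat{\beta}_j = \sum_i \sum_{j \ne i} x_{ij}$$
and that
$$\hat{k}^2 \sum_i \sum_{j \ne i} \hat{\alpha}_j \hat{\beta}_i = \sum_i \sum_{j \ne i} y_{ij},$$
which means that the sum of the means of the fitted Poisson distributions is equal to the
observed numbers of goals scored. The estimation of the parameters gives rise, therefore,
to one linear constraint on the expected frequencies in each of the two $\chi^2$
goodness-of-fit tests in Table 4. The resulting statistics will be approximately $\chi^2_3$
distributed under the hypotheses that home and away teams’ scores are Poisson distributed. This
was repeated for each of the other eleven data sets, and the resulting $\chi^2$ statistics are
listed in Table 5.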
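The property that the fitted means add up to the observed goal totals can be verified on a small example. The sketch below fits model 2 by the same alternating idea used for the full model, with $\hat{k}^2$ taken as the ratio of total away goals to total home goals. The function name `fit_model2`, the iteration count and the 4-team score matrices are assumptions made for illustration.

```python
def fit_model2(x, y, iterations=500):
    """Fit X_ij ~ Poisson(alpha_i beta_j), Y_ij ~ Poisson(k^2 alpha_j beta_i).

    x[i][j] and y[i][j] are the home and away goals when team i hosts team j.
    """
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    k2 = sum(y[i][j] for i, j in pairs) / sum(x[i][j] for i, j in pairs)

    # Goals scored by each team (home and away) and conceded by each team.
    scored = [sum(x[i][j] + y[j][i] for j in range(n) if j != i) for i in range(n)]
    conceded = [sum(x[i][j] + y[j][i] for i in range(n) if i != j) for j in range(n)]

    alpha = [1.0] * n
    beta = [1.0] * n
    for _ in range(iterations):
        beta = [conceded[j] / ((1 + k2) * sum(alpha[i] for i in range(n) if i != j))
                for j in range(n)]
        alpha = [scored[i] / ((1 + k2) * sum(beta[j] for j in range(n) if j != i))
                 for i in range(n)]
    return alpha, beta, k2

# Hypothetical results for a 4-team league (row = home team, column = visitor).
x = [[0, 2, 1, 3], [1, 0, 0, 2], [0, 1, 0, 1], [2, 2, 1, 0]]   # home goals
y = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 2], [0, 1, 0, 0]]   # away goals

alpha, beta, k2 = fit_model2(x, y)
n = len(x)
fitted_home = sum(alpha[i] * beta[j] for i in range(n) for j in range(n) if i != j)
fitted_away = sum(k2 * alpha[j] * beta[i] for i in range(n) for j in range(n) if i != j)
print(f"k^2 = {k2:.3f}, fitted home total = {fitted_home:.3f}, fitted away total = {fitted_away:.3f}")
```

In this toy data set the observed totals are 16 home goals and 8 away goals, and the fitted totals reproduce them.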
Table 5. Values of the χ² statistic for home and away teams’ scores for the independent Poisson
model
                       χ² values
season      division   homes     aways
1971-1972   1          4.90      7.79
            2          5.71      1.08
            3          10.05*    8.96*
            4          4.62      1.07
1972-1973   1          6.08      13.41**
            2          3.44      9.77*
            3          4.94      4.31
            4          0.78      3.22
1973-1974   1          7.91*     1.33
            2          1.97      1.12
            3          0.89      5.28
            4          3.61      1.92
critical values: 5% level 7.81; 1% level 11.3
* indicates a significant value at the 5% level
** indicates a significant value at the 1% level
The cases where the model would be rejected are shown by an asterisk. For the home
teams’ scores there are two such cases and for the away teams’ scores there are three.
Overall, then, the Poisson model may be regarded as acceptable, although with some
slight doubt. If the observed and expected frequencies are compared for each of the
twelve data sets, some small but systematic differences can be seen. The overall
observed and expected proportions are:
HOME SCORES
no. of goals   0       1       2       3       ≥4
observed       0.217   0.321   0.254   0.130   0.078
expected       0.230   0.318   0.238   0.128   0.086
AWAY SCORES
no. of goals   0       1       2       3       ≥4
observed       0.388   0.371   0.177   0.051   0.014
expected       0.406   0.352   0.166   0.056   0.020
The model underestimates the number of occasions on which one and two goals are
scored, and overestimates the number of times that 0 or ≥4 goals are scored. This effect
can be seen in each of the twelve data sets. The differences are small and over just one
season do not seriously inflate the $\chi^2$ value, but if the observed and expected
frequencies for all twelve data sets are added the values of the $\chi^2$ statistics (16.2 and
28.8 for home and away scores respectively) would lead to clear rejection of the model. The
distribution of the number of goals scored by a team in a match is very close to a Poisson
distribution, then, but is slightly “narrower”. This might seem to conflict with MORONEY’S
(1951) and REEP and BENJAMIN’S (1968) conclusion which was that a distribution
which was wider (in terms of the variance to mean ratio) than the Poisson was required;
the Negative Binomial was their fitted distribution. However, in both these other works
a single distribution was fitted to scores from all matches, whereas here each match has
a different fitted Poisson distribution.
5 A bivariate Poisson model
There is no shortage of possible explanations, of course, for the small discrepancy between
the independent Poisson model and the data; in fact it is perhaps fairer to say that
it is surprising that such a simple model comes so close to explaining the data so fully! A
match does not consist of two independent games at opposite ends of the pitch; to the
teams concerned, the result is all important, and so, for example, if a team is losing with
ten minutes left to play, it must take more defensive risks in order to try to score.
Therefore, an examination of the distribution of the difference between the teams’ scores,
$Z_{ij} = X_{ij} - Y_{ij}$, might be revealing. Table 6 shows the observed and estimated
frequencies for $Z$ under model 2 for Division 1 in 1971-1972.
Table 6. Observed and estimated frequencies for Z, the difference in the teams’ scores, for
Division 1 in 1971-1972, for (i) the independent Poisson model and (ii) the bivariate
Poisson with ρ = 0.2
Z                    ≤-3    -2     -1     0       +1      +2     +3     +4     ≥5
observed             8      26     72     129     105     69     31     16     6
estimated (ρ=0)      14.4   30.3   69.8   113.0   104.9   68.7   35.8   15.8   9.3
estimated (ρ=0.2)    9.9    25.3   68.0   126.2   111.7   67.7   32.6   13.4   7.1
In this it can be seen that the number of drawn matches ($Z = 0$) is a little underestimated.
This is a systematic feature noted in all twelve data sets. The $\chi^2$ goodness-of-fit
statistics are shown in Table 7; four of the twelve are significant at the 5% level, whilst
several others approach this. Only one of the twelve has a value of the $\chi^2$ statistic which
is less than the expected value of 7. (The number of degrees of freedom is reduced to 7
because of the linear constraint on the expected frequencies resulting from the estimation
of the $\alpha$’s and $\beta$’s.) This suggests that there may be some correlation between the
$X_{ij}$ and $Y_{ij}$. A bivariate Poisson model was tried; in this the marginal distributions are
still Poisson with means $\mu_{ij}$ ($= \alpha_i \beta_j$) and $\lambda_{ij}$
($= k^2 \alpha_j \beta_i$) but there is a correlation of $\rho$ between the scores. One way of
thinking of such a bivariate Poisson distribution is that
$X_{ij} = U_{ij} + W_{ij}$ and $Y_{ij} = V_{ij} + W_{ij}$ where $U_{ij}$, $V_{ij}$ and $W_{ij}$ are
independent Poisson with means $(\mu_{ij} - \eta_{ij})$, $(\lambda_{ij} - \eta_{ij})$ and
$\eta_{ij}$ respectively, with $\eta_{ij}$ ($= \rho \sqrt{\mu_{ij} \lambda_{ij}}$) being the
covariance between $X_{ij}$ and $Y_{ij}$.
A range of values of $\rho$ was tried and the most appropriate seemed to be around 0.2. In
computing the expected frequencies for $Z$, the values of the $\hat{\alpha}$’s, $\hat{\beta}$’s and
$\hat{k}^2$ used were those found from the fitting of the independent Poisson model. The terms in
the Poisson bivariate probability function can be calculated by the following recursive
relationship:
$$P_{00} = \exp(-\mu - \lambda + \eta)$$
$$x P_{xy} = (\mu - \eta) P_{x-1,y} + \eta P_{x-1,y-1}$$
$$y P_{xy} = (\lambda - \eta) P_{x,y-1} + \eta P_{x-1,y-1}$$
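A minimal sketch of this recursion is given below. The means of 1.5 and 1.0, the truncation at 12 goals and the function name `bivariate_poisson_table` are assumptions chosen for illustration. The starting value is the probability that all three independent Poisson components are zero, $\exp(-(\mu - \eta) - (\lambda - \eta) - \eta)$; the first column is filled from the second relation (with the $P_{x-1,y-1}$ term absent) and every other cell from the first.

```python
import math

def bivariate_poisson_table(mu, lam, rho, max_goals=12):
    """Joint probabilities P[x][y] for scores 0..max_goals under the bivariate Poisson.

    mu and lam are the marginal means; eta = rho * sqrt(mu * lam) is the covariance
    and must not exceed either mean.
    """
    eta = rho * math.sqrt(mu * lam)
    size = max_goals + 1
    p = [[0.0] * size for _ in range(size)]
    p[0][0] = math.exp(-(mu - eta) - (lam - eta) - eta)
    for x in range(size):
        for y in range(size):
            if x == 0 and y == 0:
                continue
            if x == 0:
                # y P_0y = (lam - eta) P_0,y-1 (no diagonal term when x = 0)
                p[0][y] = (lam - eta) * p[0][y - 1] / y
            else:
                diagonal = p[x - 1][y - 1] if y > 0 else 0.0
                # x P_xy = (mu - eta) P_x-1,y + eta P_x-1,y-1
                p[x][y] = ((mu - eta) * p[x - 1][y] + eta * diagonal) / x
    return p

# Hypothetical match means, compared with and without correlation.
mu, lam = 1.5, 1.0
table = bivariate_poisson_table(mu, lam, rho=0.2)
independent = bivariate_poisson_table(mu, lam, rho=0.0)

size = len(table)
total = sum(sum(row) for row in table)
mean_x = sum(x * table[x][y] for x in range(size) for y in range(size))
mean_y = sum(y * table[x][y] for x in range(size) for y in range(size))
covariance = sum(x * y * table[x][y] for x in range(size) for y in range(size)) - mean_x * mean_y
draw_correlated = sum(table[g][g] for g in range(size))
draw_independent = sum(independent[g][g] for g in range(size))
print(f"total {total:.6f}, covariance {covariance:.4f}")
print(f"P(draw): independent {draw_independent:.3f}, correlated {draw_correlated:.3f}")
```

With these illustrative means the positive correlation raises the probability of a draw, in line with the direction of the discrepancy noted for drawn matches.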
Table 7. Values of the χ² goodness-of-fit statistic for Z, the difference between the teams’ scores,
for (i) the independent model and (ii) the bivariate Poisson model with ρ = 0.2
                       χ² values
season      division   independent model   bivariate model
1971-1972   1          9.67                1.86
            2          16.42*              6.50
            3          10.87               3.94
            4          12.99               5.75
1972-1973   1          15.51*              4.77
            2          13.70               11.98
            3          4.79                2.50
            4          15.30*              8.27
1973-1974   1          16.47*              9.08
            2          9.76                12.29
            3          13.53               8.00
            4          10.36               5.40
critical values
5% level               14.1                12.6
1% level               18.5                16.8
The results of fitting this bivariate Poisson model are shown in Tables 6 and 7, where it
can be seen that the introduction of the extra parameter $\rho$ has led to a considerable
improvement in the fit. The $\chi^2$ statistics in Table 7 are not only non-significant but are
fairly representative values from a $\chi^2_6$ distribution. (It has been assumed that the
fitting of the extra parameter $\rho$ will be roughly equivalent to the imposition of another
linear constraint on the expected frequencies, although in fact the same value of $\rho$ has been
applied to all the twelve data sets.) A bivariate Poisson model with correlation of about
0.2, therefore, would seem to give a very adequate fit to the differences in scores.
6 Summary
Previous work on the distribution of scores in football matches has rejected the Poisson
model in favour of the Negative Binomial. That work, however, has not allowed for the
different qualities of the teams in a league. The first model investigated here assumes
that the home team’s and away team’s scores in any one match are independent Poisson
variables with means $\alpha_i \beta_j$ and $\gamma_i \delta_j$, where the parameters $\alpha$,
$\beta$, $\gamma$ and $\delta$ represent the qualities of the teams’ attacks and defences, in
home and away matches. Maximum likelihood estimation of these parameters shows that only the
$\alpha$ and $\beta$ are needed, showing that the relative strength of teams’ attacks is the
same whether playing at home or away; the same applies to the defences.
When this model is applied to each of the twelve data sets and observed and expected
score distributions are compared by means of a $\chi^2$ test, nineteen out of the twenty-four
cases give a non-significant result at the 5% level. Overall, then, the independent Poisson
model gives a reasonably good fit to the data. The deviations from this model are
small but consistent in each of the data sets, there being slightly fewer occasions
observed than expected on which no goals or a large number of goals are scored. When the
differences in scores are investigated, however, the lack of fit of the model is rather more
serious and suggests that the independence assumption is not totally valid. A bivariate
Poisson model was then used to model this dependence between scores and this
improved the fit considerably for the differences in scores. The correlation coefficient
between home teams’ and away teams’ scores is estimated to be approximately 0.2.
Acknowledgement
The author would like to thank an anonymous referee for his very helpful comments on
an earlier version of this paper.
References
HARVILLE, D. (1977), The use of linear-model methodology to rate high school or college football teams, J. Amer. Statist. Ass. 72, No. 358, pp. 278-289.
HILL, I. D. (1974), Association football and statistical inference, Appl. Statist. 23, No. 2, pp. 203-208.
LEEFLANG, P. S. H. and B. M. S. VAN PRAAG (1971), A procedure to estimate relative powers in binary contacts and an application to Dutch football league results, Statist. Neerlandica 25, No. 1, pp. 63-84.
MORONEY, M. (1951), Facts from figures, London, Pelican.
REEP, C. and B. BENJAMIN (1968), Skill and chance in association football, J. R. Statist. Soc. A, 131, pp. 581-585.
REEP, C., R. POLLARD and B. BENJAMIN (1971), Skill and chance in ball games, J. R. Statist. Soc. A, 134, pp. 623-629.
THOMPSON, M. (1975), On any given Sunday: fair competitor orderings with maximum likelihood methods, J. Amer. Statist. Ass. 70, No. 351, pp. 536-541.
Received October 1981, Revised December 1981.