Final Research Paper

Greg Martino
12-5-2015
Econometrics
Final Research Paper
Paying to Win: Payroll and Success in Major League Baseball
I. Introduction
One of the most controversial topics surrounding major league baseball is just how
much a team’s spending means to its success. Year after year, teams make splashes in free
agency with outlandish contracts for big name players, only to see mixed results on the field.
This paper intends to focus on the question: Do MLB teams that spend more win more
games? On the surface spending more money on players should equate to a higher win total,
seeing that the highest paid players often have the largest impact on the outcomes of games.
However, an ambiguous result would not be surprising, seeing that there are several teams
near the bottom of the league in payroll that are able to find success. This paper will also
serve to determine the potential effect payroll and other major statistics may have on a
team’s success, and whether these results are consistent with prior findings in other studies.
Throughout the paper I will report on past research, evaluate the validity and meaning of
results, and reflect on the theoretical implications of various statistics, ultimately concluding
whether there is a relationship between payroll and success in major league baseball.

II. Literature Review
There are several studies that have been done in the past concerning how various
factors surrounding a team, including payroll, affect success. One such study performed by
Wen-Jhan Jane, an economics professor at Shih Hsin University, seeks to find whether team
payroll and player salary impact team performance. Jane theorizes that there should be a
positive relationship between these factors and a team’s wins. From a player perspective,
paying players more signals that they are more talented, and from a team perspective, a high
payroll should mean that a team has several highly paid players, equating to more team
success. In her model, Jane runs a regression with team performance as the dependent
variable and payroll/salary as independent, observing MLB teams from 1998-2007.
Ultimately it is concluded that while there is a positive relationship between payroll and
team performance, it is a very weak relationship (Jane, 2010). She cites within her results
that there are in fact many instances where a team with a low payroll experiences a good
amount of team success. Along with being theoretically consistent, her model is consistent
with results from similar studies that have been done.
A second instance of prior research was completed in 2012 by Andrew Somberg and
Paul Sommers. In their study they observe whether payroll affects a team’s probability of
making the playoffs. In theory there is a positive relationship between the two, as a high
payroll should lead to more wins, resulting in a team successfully making the playoffs. Their
model observed MLB teams from 1998-2011, with payroll being the independent variable
and the likelihood of making the playoffs the dependent. After running the regression, it was
concluded that there is in fact a positive relationship, with a 1% increase in payroll
associating with a 0.3% increase in a team’s probability of making the playoffs (Somberg and
Sommers, 2012). The results of this paper line up with what Jane found; there is a positive
relationship between payroll and success, although the relationship is fairly weak.
Branching off of these two ideas, I began to wonder whether there were other factors
besides payroll that could be directly tied to team success. In 2009, Henry Demmink did a
study on the potential effect that the number of bases a team attempts to steal has on
winning. Theory says there should be a positive relationship, since attempting to steal a base
(when successful) puts a team in better position to score runs and more runs would usually
equate to more wins. For the model, Demmink ran a regression with the number of stolen
base attempts as his independent variable and a dependent variable of team wins, focusing on
these values for MLB teams from 1991-2004. He found that a one standard deviation increase
in the number of stolen base attempts associated with an increase of 3.65 wins per season
(Demmink, 2009). This result was consistent with the theory that more stolen base attempts
would lead to more wins.
This paper will attempt to address all three cases outlined above. While the first two
concern payroll and success, the main purpose of my model, the third study involving stolen
base attempts will also be a launching point. While the focus of the paper will be to find if
there is a relationship between payroll and wins, various team statistics like stolen bases and
runs will also be incorporated into the model. It is my hope that my results will remain
consistent with the previous findings, but a difference in time period and/or variables
utilized could make for a different conclusion.
III. Theoretical Model
The model for this study will be partially based off of previous models that involved
similar variables, although it will account for more than just the primary independent
variable. Levels of my dependent variable, team wins, are believed to be directly influenced
by several team traits and statistics. Payroll is the main independent variable I will observe,
but several other statistics that I believe to be impactful will also be included in the empirical
model. In theory, having better offensive or defensive statistics should also play a large role
in a team’s eventual success, or lack thereof. Is payroll the root of these statistics? If teams are
spending their money wisely then a team with highly paid players (and a high payroll)
should have better offensive/defensive statistics, leading to more wins. In addition to
determining how all of these variables affect wins, the model will serve to determine
whether the theoretical estimations for each variable are consistent with the effect payroll
ultimately has on success. It would be an intriguing result if some of the team statistics have
an effect different than their theory and are not related to payroll.
IV. Empirical Model, Variable and Data Descriptions

The study examined in this paper is a panel-data study of all MLB teams over the last five
seasons (2011-2015). This should provide a large enough sample that realistic coefficients can
be found, as well as some variability among statistics. The majority of statistical data was
derived from baseball-reference.com, a popular site home to lots of statistical data across the
major sports. Payroll and salary data was retrieved from stevetheump.com and spotrac.com,
with both sites containing important information on team payrolls and player yearly salaries.
The base empirical model for this study is:
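\[
\begin{aligned}
\text{Wins} ={}& \beta_0 + \beta_1\,\text{Payrollreal} + \beta_2\,\text{TeamERA} + \beta_3\,\text{BatAvg} + \beta_4\,\text{SB} + \beta_5\,\text{Runs} + \beta_6\,\text{Errors} \\
&+ \beta_7\,\text{KPitch} + \beta_8\,\text{League} + \beta_9\,\text{HighPaid} + \beta_{10}\,(\text{BatAvg} \times \text{League}) + \beta_{11}\,(\text{Runs} \times \text{League}) + e
\end{aligned}
\]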
A new model was also formed in order to account for any fixed effects. Some people who
are familiar with baseball may believe that certain teams with track records of consistent success,
like the New York Yankees or Boston Red Sox, will have a high total of wins regardless of how
much they spend. The new model, below, accounts for any of these special effects so as to look
at all teams on an even playing field.
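\[
\begin{aligned}
\text{Wins} ={}& \beta_0 + \beta_1\,\text{Payrollreal} + \beta_2\,\text{TeamERA} + \beta_3\,\text{BatAvg} + \beta_4\,\text{SB} + \beta_5\,\text{Runs} + \beta_6\,\text{Errors} \\
&+ \beta_7\,\text{KPitch} + \beta_8\,\text{League} + \beta_9\,\text{HighPaid} + \beta_{10}\,(\text{BatAvg} \times \text{League}) + \beta_{11}\,(\text{Runs} \times \text{League}) \\
&+ \sum_{j=2}^{30} \gamma_j\,\text{Team}_j + e
\end{aligned}
\]
Here the Team_j are indicator variables for each team, with one team omitted as the base category (the i.teamonly term under Stata's xi: prefix).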
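One way to see what the team indicators in the second model do is the within transformation: subtracting each team's own average from its observations removes anything about a team that does not change over time. The sketch below is illustrative only. The team labels and win totals are made-up values, not the study's data, and `demean_by_team` is a hypothetical helper name.

```python
from collections import defaultdict

def demean_by_team(rows, key):
    """Subtract each team's mean of `key` from that team's observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["team"]] += row[key]
        counts[row["team"]] += 1
    means = {team: totals[team] / counts[team] for team in totals}
    return [row[key] - means[row["team"]] for row in rows]

# Made-up illustration data (NOT the study's data).
rows = [
    {"team": "A", "wins": 95}, {"team": "A", "wins": 85},
    {"team": "B", "wins": 70}, {"team": "B", "wins": 60},
]
within_wins = demean_by_team(rows, "wins")
print(within_wins)  # [5.0, -5.0, 5.0, -5.0]
```

After demeaning, team A (a consistently strong team in this toy example) and team B look identical, which is the sense in which fixed effects put all teams on an even playing field.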
The model contains several independent variables that all may play a significant role in
the amount of success a team experiences. They are defined as:

Payrollreal- The amount of money each MLB team spent on their players (converted to real
dollars) from 2011-2015
TeamERA- The earned run average of a team from 2011-2015, i.e. earned runs allowed per nine innings
BatAvg- The batting average of a team from 2011-2015, the proportion of at-bats in which a team
gets a hit
SB- Stolen bases by a team from 2011-2015, measured per game
Runs- Runs scored by a team from 2011-2015, measured per game
Errors- Errors committed by a team from 2011-2015, measured per game
KPitch- Strikeouts by a team while pitching from 2011-2015, measured per game
League- A dummy variable based on whether a team is in the National or American League. 0 if
National, 1 if American
HighPaid- Number of players who made at least $8 million in a season from 2011-2015
BatAvg*League- A dummy interaction to account for the impact that using a designated hitter
(American League teams only) may have on a team’s batting average, 2011-2015
Runs*League- a dummy interaction to account for the impact that a designated hitter may have
on the number of runs a team scores, 2011-2015
There are three instances of functional form in the model. The dummy variable League is
included because of the different rules that exist across the National and American League. In the
American League a team may use a designated hitter (DH) in place of the pitcher while batting.
This player only bats, and does not field, but gives the team an upgrade over the pitcher, who
usually has a very low batting average. In the National League teams are not allowed to do this.
The fact alone that teams in the American League are able to use a position player in place of the
pitcher when batting may give them a slight advantage when it comes to winning games. Note:
When a National League team plays an American League team, the designated hitter rule is
determined by the home team- if the AL team is home a DH is used.
This then leads to having two similar dummy interactions. The first, BatAvg*League,
reflects the fact that American League teams have a DH and likely have a higher batting average
because of it. Runs*League works in a similar way, in that a team that utilizes a DH may score
a few more runs, and this may impact win totals over an entire season. There are no
quadratic terms in the model, since it doesn’t appear that any one variable has a nonlinear
relationship with wins.
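As a minimal sketch of how the two interaction terms are built, each one is simply the statistic multiplied by the League dummy, so it equals the statistic itself for an American League team and zero for a National League team. The team-season values below are hypothetical and `add_interactions` is an illustrative helper, not part of the original analysis.

```python
def add_interactions(obs):
    """Return a copy of one team-season record with the two interaction terms."""
    out = dict(obs)
    out["BatAvgLeague"] = obs["BatAvg"] * obs["League"]
    out["RunsLeague"] = obs["Runs"] * obs["League"]
    return out

# Hypothetical team-seasons (illustrative values only).
al_team = add_interactions({"BatAvg": 0.260, "Runs": 4.5, "League": 1})
nl_team = add_interactions({"BatAvg": 0.260, "Runs": 4.5, "League": 0})
print(al_team["BatAvgLeague"], al_team["RunsLeague"])  # 0.26 4.5
print(nl_team["BatAvgLeague"], nl_team["RunsLeague"])  # 0.0 0.0
```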
All independent variables listed above have their own theoretical estimations for the
effect they should have on a team’s wins. That is to say, what should happen to wins if X
increases. Their relationship theories are listed below:
Payrollreal- positive, more spending = more wins
TeamERA- negative, giving up more runs = more losses
BatAvg- positive, more hits = more runs = more wins
SB- positive, more stolen bases = more runs = more wins
Runs- positive, more runs = more wins
Errors- negative, more errors = more runs for opponent = more losses
KPitch- positive, more strikeouts = fewer runs for opponent = more wins
League- positive (if AL), AL has a designated hitter = more hits = more runs = more wins
HighPaid- positive, but potentially ambiguous, more high paid players = more wins
BatAvg*League- positive, being in the AL means that teams have a designated hitter (instead of
a pitcher batting) = higher batting average = more wins
Runs*League- positive, having a designated hitter should = more runs = more wins
Within the data retrieved, there is a fairly wide range of values for team payroll, as the
lowest over the last five years was only $22,500,000 and the highest was over ten times greater,
at $273,000,000. This makes for huge variability in the Payrollreal variable, which should lend
itself well to determining the relationship between it and wins. Also worth noting is the mean for
Wins. An MLB season is 162 games for every team, so the average record for the entire league
should be 81-81. However, there was one year that two teams were unable to play a game
towards the end of the season due to a rainout. For that season they only played 161 games,
which is why the mean for Wins in the model is slightly less than 81. In addition, while TeamERA
has a mean of 3.90, the Runs mean is higher, at 4.22. The difference in these is attributed to the
fact that some runs a team scores are because of the opposing team committing an error. When a
run scores due to an error it is tracked as an unearned run, and does not factor into a team’s ERA
(earned run average). Lastly, it would seem that the mean for League should be 0.5. However,
the Houston Astros switched leagues in 2013, to make both leagues equal with 15 teams. Prior to
this the NL had 16 teams while the AL only had 14, thus making the mean for League slightly
less than 0.5. A full table of these descriptive statistics can be found at the end of the document,
labeled Table 1.
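The two means discussed above can be checked with simple counting. The sketch below assumes 30 teams, five seasons, and a 162-game schedule. The season in which the AL first has 15 teams and the number of unplayed games are passed in as parameters, and the function names are illustrative.

```python
SEASONS = range(2011, 2016)  # 2011-2015 inclusive
TEAMS = 30

def league_mean(first_15_team_al_season):
    """Mean of League if the AL has 14 teams before the given season, 15 from it on."""
    al_team_seasons = sum(15 if year >= first_15_team_al_season else 14
                          for year in SEASONS)
    return al_team_seasons / (TEAMS * len(SEASONS))

def wins_mean(unplayed_games):
    """Mean wins per team-season when some scheduled games are never played."""
    games_per_season = TEAMS * 162 // 2   # every game has exactly one winner
    total_wins = games_per_season * len(SEASONS) - unplayed_games
    return total_wins / (TEAMS * len(SEASONS))

print(round(league_mean(2013), 4))  # 0.4867
print(round(wins_mean(1), 4))       # 80.9933
```

Both values match the descriptive statistics table: 73 of the 150 team-seasons are American League, and a single unplayed game lowers the average win total just below 81.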
V. Discussion of Final Results

After running the second empirical model, which accounted for any fixed effects among
teams, many intriguing results were arrived at. The full table of regressions results can be found
at the end of the document, labeled as Table 3.
The coefficient on the primary independent variable, Payrollreal, implies that a one million
dollar increase in a team’s payroll (in real dollars) associates with a decrease in wins per season of
0.0155, holding all other independent variables constant. This variable is not
statistically significant: its p-value of 0.404 is well above the 0.10 threshold required for
significance at the 10% level. Furthermore, this result for Payrollreal is
not theoretically consistent since it was slightly negative. This relationship also opposes the
values that were found from previous studies done on similar topics, where payroll had a slightly
positive relationship with a team’s success.
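The per-million-dollar figure is a rescaling of the per-dollar estimate in the regression table, and the lack of significance can be seen from the ratio of the estimate to its standard error. The sketch below uses the reported estimate and standard error. The p-value line uses the standard normal as an approximation to the t distribution with 109 degrees of freedom, which is an assumption made here for convenience.

```python
import math

coef_per_dollar = -1.55e-8   # Payrollreal estimate from the regression table
se_per_dollar = 1.85e-8      # its standard error

effect_per_million = coef_per_dollar * 1_000_000
t_stat = coef_per_dollar / se_per_dollar

# Two-sided p-value from the standard normal, used here as an approximation
# to the t distribution with 109 degrees of freedom (assumption).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(round(effect_per_million, 4))  # -0.0155
print(round(t_stat, 3))              # -0.838
```

The approximate p-value comes out near 0.40, in line with the reported 0.404.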
The rest of the results are as follows:
TeamERA- a one run increase in a team’s earned run average (i.e. increasing from 3.50 to 4.50)
associates with a decrease in wins per season of 17.628, holding all else constant. This result was
statistically significant, with a p-value of 0.000. This is an important result because of the
magnitude of the coefficient. Giving up one more earned run per nine innings associates with nearly
18 fewer wins per season, which can easily be the difference between making the playoffs and
finishing near .500 (same number of wins and losses) for a season. The fact that
this variable is statistically significant makes the result even stronger. This result was also theoretically
consistent, since it had a negative relationship with Wins.
BatAvg- a 0.100 increase in a team’s batting average (e.g. .250 to .350) associates with
a 0.287 increase in wins per season, holding all else constant. This variable was not statistically
significant, with an extremely large p-value of 0.975. This result was theoretically consistent,
since it had a positive relationship with Wins.
SB- an increase of one stolen base per game associates with a decrease in wins of 4.094, holding
all else constant. This result is statistically significant at the 10% significance level since it has a
p-value of 0.090. This result was not theoretically consistent, since it had a negative relationship
with Wins.
Runs- an increase of one run per game associates with an increase in wins of 15.784, holding all
else constant. This result is statistically significant, with a p-value of 0.000. Like TeamERA, this
variable has a strong relationship with the number of wins a team has per season. This result was
theoretically consistent, since it had a positive relationship with Wins.
Errors- an increase of one error per game associates with a decrease of 6.706 wins per season,
holding all else constant. Errors was not statistically significant, since it had a p-value of 0.140.
This result was theoretically consistent, since it had a negative relationship with Wins.
KPitch- an increase of one strikeout per game while pitching associates with a
decrease in wins per season of 0.799, holding all else constant. This variable was not statistically
significant because of a p-value of 0.306. This result was not theoretically consistent, since it had
a negative relationship with Wins.
League- playing in the American League, as opposed to the National League, associates with an increase in wins
of 23.70 per season, holding all else constant. The calculation was done for β8 + β10*BatAvg +
β11*Runs using the coefficient values from the second empirical model and the mean values.
The League variable is statistically significant with a p-value of 0.043. This result was
theoretically consistent, since it had a positive relationship with Wins.
HighPaid- an increase in high paid players on a team by one player associates with an increase
in wins per season of 0.454, holding all else constant. This variable was statistically significant
with a p-value of 0.088. This result was theoretically consistent, since it had a positive
relationship with Wins.
BatAvg*League- an increase in batting average of 0.100 for a team in the
American League associates with a decrease in wins per season of 10.193, holding all else
constant. This was found with the equation β10*League, evaluated at the mean of League. This variable is not statistically
significant on its own, since it has a p-value of 0.116. However, a test of joint significance with
BatAvg and League shows that it is jointly significant with those variables, with a p-value of
0.0807. This result was not theoretically consistent, since it had a negative relationship with
Wins.
Runs*League- an increase of one run per game for a team in the American League, associates
with an increase in wins per season of 0.169, holding all else constant. This is not statistically
significant on its own, with a large p-value of 0.914, but a test of joint significance shows it is
jointly significant with Runs and League with a p-value of 0.000. This result was theoretically
consistent, since it had a positive relationship with Wins.
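The League and interaction calculations above can be reproduced from the reported estimates and the means in the descriptive statistics table. The sketch below is a worked check under the assumption that those rounded table values are the right inputs. It also shows the conditional slopes, which are a standard alternative way to read an interaction model and are not taken from the original text.

```python
# Estimates from the regression table.
b_league, b_bat_x_league, b_runs_x_league = 50.466, -209.447, 0.347
b_bat, b_runs = 2.874, 15.784

# Sample means from the descriptive statistics table.
mean_bat, mean_runs, mean_league = 0.2536, 4.2179, 0.4867

# Effect of League = 1 versus 0 at the sample means of BatAvg and Runs.
league_effect = b_league + b_bat_x_league * mean_bat + b_runs_x_league * mean_runs

# Slopes conditional on league (standard reading of an interaction model).
runs_slope_nl = b_runs
runs_slope_al = b_runs + b_runs_x_league
bat_slope_al_per_100_points = (b_bat + b_bat_x_league) * 0.100

# Interaction coefficients scaled by the mean of League, as in the text.
runs_x_league_scaled = b_runs_x_league * mean_league
bat_x_league_scaled = b_bat_x_league * mean_league * 0.100

print(round(league_effect, 2))         # -1.19
print(round(runs_slope_al, 3))         # 16.131
print(round(runs_x_league_scaled, 3))  # 0.169
print(round(bat_x_league_scaled, 2))   # -10.19
```

The scaled interaction values agree with the 0.169 and roughly 10.19 reported above. With these inputs, however, β8 + β10*BatAvg + β11*Runs evaluates to about -1.19 rather than 23.70, so the reported League figure presumably rests on different inputs than the rounded table values used here.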
The adjusted R-squared for this regression is 0.8858. This means that roughly 88.58% of the variance in
Wins is explained by the independent variables, after adjusting for the number of regressors. This is a notable result
because an adjusted R-squared close to one means that the relationship between the independent
variables and the dependent variable is strong. Adjusted R-squared takes the degrees of freedom
into account, which for this study is 109. Adjusted R-squared increases only if an added variable is
worthwhile to include, so the high number suggests that many of the variables included in the
empirical model were important to the study.
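The 109 degrees of freedom and the link between adjusted and unadjusted R-squared can be sketched as follows. The count of 29 team indicators is an assumption (30 teams with one omitted as the base category), and the unadjusted R-squared is backed out from the reported adjusted value rather than taken from the regression output.

```python
n_obs = 150          # team-seasons
n_regressors = 11    # Payrollreal ... Runs*League
n_team_dummies = 29  # assumption: 30 teams, one omitted as the base category

residual_df = n_obs - n_regressors - n_team_dummies - 1  # minus the intercept
adj_r2 = 0.8858

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / residual_df, solved for R^2.
r2 = 1 - (1 - adj_r2) * residual_df / (n_obs - 1)

print(residual_df)   # 109
print(round(r2, 4))  # 0.9165
```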
When looking at previous studies done using similar variables, the results vary in
consistency. The most significant difference was in Payrollreal. Prior studies had found a
positive relationship between team payroll and wins, even though it was very weak. This
regression found a slightly negative relationship, and the variable was not statistically significant.
The contrast in answers could be due to the difference in timeline in terms of the teams observed.
Previous models looked at teams from a different time period, and it’s possible that spending
habits and other variables have a different impact now than they did then.
The stolen bases result is also somewhat different than the one found previously by
Demmink. In this regression it was found to have a negative relationship with wins, while
Demmink found that more stolen base attempts led to more success. While the variables are
slightly different, more stolen base attempts would generally lead to more stolen bases over a
season. Once again, this difference in relationship is likely due to the change in time period.
Going beyond this, the way baseball is played has also changed since the early 2000s.
Back then, teams relied more on what is known amongst baseball circles as “small ball”. This
tactic is used by teams who lack power hitters, instead relying on a high batting average and lots
of stolen bases to generate runs. In the past several years, teams have begun to shift to rely much
more heavily on home runs, rather than “small ball”. If teams know they have power hitters who
will drive in runs, they may be more reluctant to even attempt stolen bases, thus reducing the
total number of steals for a season. This, along with the simple answer of a different timeline, could
be why stolen bases had a different relationship with wins than what Demmink had previously
determined.
While it doesn’t seem any variables were omitted, which would have caused omitted
variable bias, there were some variables included that were found to not be statistically
significant. The inclusion of these irrelevant variables can lead to “mushy results”. It can cause the
standard errors to increase, t-values to decrease, and p-values to increase for the other
relevant variables, making them appear less significant than they actually are.
For this model, no problems with heteroscedasticity or serial
correlation were found. Multicollinearity was checked by running a correlation matrix and a test of
variance inflation factors. The only very large values belong to League and its two interaction
terms, which are collinear by construction. A test for heteroscedasticity failed to reject the null
hypothesis of constant variance in the residuals. As stated and explained above, the second
empirical model accounted for any fixed effects that may be present in major league baseball
teams. However, no fixed effects were found to be present in any team, a result that bodes well
for the regression, but may certainly be surprising to many fans of baseball.
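The variance inflation factors in the appendix can be read with the usual rule of thumb that values above 10 signal strong collinearity. The sketch below uses the VIF values as reported. The threshold of 10 is a common convention rather than a formal test, and the auxiliary R-squared follows from the definition VIF = 1 / (1 - R²).

```python
# VIF values as reported in the appendix.
vif = {
    "BatAvgLeague": 1312.99, "League": 650.70, "RunsLeague": 318.69,
    "Runs": 7.32, "BatAvg": 6.48, "HighPaid": 3.74, "Payrollreal": 3.53,
    "Kpitch": 1.64, "TeamERA": 1.62, "Errors": 1.15, "SB": 1.10,
}

THRESHOLD = 10  # common rule of thumb, not a formal test

# VIF_j = 1 / (1 - R_j^2), so R_j^2 = 1 - 1 / VIF_j.
aux_r2 = {name: 1 - 1 / value for name, value in vif.items()}
flagged = sorted(name for name, value in vif.items() if value > THRESHOLD)
mean_vif = sum(vif.values()) / len(vif)

print(flagged)             # ['BatAvgLeague', 'League', 'RunsLeague']
print(round(mean_vif, 2))  # 209.91
```

Only League and its two interactions exceed the threshold, which is expected because each interaction is the League dummy multiplied by a variable that varies little across teams.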
VI. Conclusion
My empirical model and results lead me to conclude that while a team’s payroll may
have a slightly negative relationship with wins, the statistical insignificance means that there may
not be much of a relationship at all, and that payroll has a negligible effect on success. The
results from TeamERA and Runs are more useful and interesting ultimately. While it makes
sense that scoring more or giving up more runs has a determinable effect on team wins, these are
important results that line up both in theory and in actuality.

I believe that the results of this study could be useful to some MLB teams, especially
those who attempt to fix their team by spending big in free agency. Simply having big contract
players does not mean a team will translate this perceived talent onto the field. There are many
variables that go into team success, and thinking that increasing payroll will automatically result
in a trip to the playoffs is both unrealistic and unlikely. Strong pitching, in the form of a low team
ERA, is strongly tied to team success, and defense, in the form of few errors committed, points in
the same direction even though the Errors estimate was not statistically significant. If a team wishes to improve its
record from one year to the next, these are two areas that I would personally recommend they
focus on.

VII. Descriptive Table Results
Variable Name   N     Mean        Std. Dev.   Min.        Max.
Wins            150   80.9933     11.02316    51          102
Payrollreal     150   1.09*10^8   4.41*10^7   2.25*10^7   2.73*10^8
TeamERA         150   3.9015      .4517       2.94        5.22
BatAvg          150   0.2536      .0113       .226        .283
SB              150   0.5966      .1741       .2160       1.0494
Runs            150   4.2179      .4144       3.1667      5.5
Errors          150   0.5986      .0899       .3333       .8272
Kpitch          150   7.5119      .6365       5.8025      8.9506
League          150   0.4867      .5015       0           1
HighPaid        150   3.6         2.6873      0           11
BatAvg*League   150   0.1243      .1283       0           .283
Runs*League     150   2.1211      2.2061      0           5.5

VIII. Regression Model Results
Variable Name       Parameter Estimate   (Standard Error)
Payrollreal         -1.55*10^-8          (1.85*10^-8)
TeamERA             -17.628              (1.170)***
BatAvg              2.874                (91.222)
SB                  -4.094               (2.395)*
Runs                15.784               (2.441)***
Errors              -6.706               (4.508)
KPitch              -0.799               (.778)
League              50.466               (24.686)**
HighPaid            0.454                (.264)*
BatAvg*League       -209.447             (132.320)
Runs*League         .347                 (3.212)
Adjusted R-Square   .8858
Sample Size         150
Standard error in parentheses. *p<.10, **p<.05, ***p<.01

IX. Bibliography
Demmink, H. (2009). Value of stealing bases in Major League Baseball. Public Choice, 142(3-4),
497-505. Retrieved October 22, 2015, from EconLit.
Jane, W. (2010). Raising salary or redistributing it: A panel analysis of Major League Baseball.
Economics Letters, 107(2), 297-299. Retrieved October 22, 2015, from EconLit.
MLB. (n.d.). Retrieved November 19, 2015, from http://www.spotrac.com/mlb/
MLB Teams and Baseball Encyclopedia | Baseball-Reference.com. (n.d.). Retrieved November
19, 2015, from http://www.baseball-reference.com/teams/
Somberg, A., & Sommers, P. (2012). Payrolls and Playoff Probabilities in Major League
Baseball. Atlantic Economic Journal, 40(3), 347-348. Retrieved October 22, 2015, from EconLit.

X. Appendix
Correlation Matrix
. corr Payrollreal TeamERA BatAvg SB Runs Errors Kpitch League HighPaid BatAvgLeague RunsLeague
(obs=150)
              Payr~eal  TeamERA   BatAvg       SB     Runs   Errors   Kpitch   League HighPaid BatAvg~e RunsLe~e
Payrollreal     1.0000
TeamERA        -0.0924   1.0000
BatAvg          0.2893   0.1624   1.0000
SB             -0.1763   0.0395   0.0240   1.0000
Runs            0.2506   0.0593   0.7483   0.0509   1.0000
Errors         -0.1727   0.2358  -0.0158   0.1730  -0.0999   1.0000
Kpitch          0.3070  -0.5496  -0.0436  -0.1687   0.0171  -0.2106   1.0000
League          0.0544   0.1984   0.1523   0.0247   0.3310  -0.0883  -0.1179   1.0000
HighPaid        0.8294  -0.1478   0.3931  -0.0791   0.3667  -0.1485   0.2828   0.0359   1.0000
BatAvgLeague    0.0699   0.1998   0.1990   0.0274   0.3639  -0.0935  -0.1142   0.9979   0.0565   1.0000
RunsLeague      0.0857   0.1887   0.2226   0.0302   0.4259  -0.1022  -0.1067   0.9908   0.0795   0.9950   1.0000

VIF
Variable           VIF      1/VIF
BatAvgLeague   1312.99   0.000762
League          650.70   0.001537
RunsLeague      318.69   0.003138
Runs              7.32   0.136629
BatAvg            6.48   0.154228
HighPaid          3.74   0.267524
Payrollreal       3.53   0.283426
Kpitch            1.64   0.609638
TeamERA           1.62   0.616870
Errors            1.15   0.867265
SB                1.10   0.906605
Mean VIF        209.91

Heteroscedasticity Test
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of Wins
chi2(1) = 0.88
Prob > chi2 = 0.3479
