A Further Examination of the Determinants of NHL
Goalies’ Salaries
Matthew J Rosenstein: S1223256 / B029618
April 3, 2016
I acknowledge that this work is my own and would like to thank Dr. Colin Roberts for
his continued support throughout this entire process. Any mistakes in this work are also my
own.
Abstract
This dissertation attempts to extend and improve on the current literature on wage
determination for NHL goalies. The primary measure of a goalie’s production, save
percentage, has a coefficient of variation of 0.012, while goalies’ wages have a
coefficient of variation of 0.74; there is therefore a large variation in wages that is not explained by
differences in skill. This paper argues that variables related to goalies’ popularity have
significant explanatory power in determining wages. Using OLS methods, with inference based
on the Student’s t distribution, this paper shows that after considering factors related
to a player’s popularity, considerably more of the variation in goalies’ wages is accounted for.
Contents
1 Introduction
2 Literature Review
2.1 Wage Discrimination
2.2 Wage Determination
3 The Effect of the 2004–2005 NHL Lockout
3.1 On Players
3.2 On Owners
4 The Model
4.1 Monopsonistic Characteristics
5 Variables to Consider
5.1 Measuring MPL
5.2 Measuring MR
5.3 Other Potential Variables
6 Deviation From Original Paper
6.1 Restricted vs Unrestricted
6.2 Free Agency
6.2.1 Sample Selection Bias
7 Data
7.1 The Sample
7.2 Estimation Process
7.2.1 Model Specification
7.2.2 No Perfect Colinearity
7.2.3 Zero Conditional Mean
7.2.4 Homoscedasticity
7.2.5 Normality of Error
7.3 Regression Results
7.3.1 Looking Back
7.3.2 Looking Forward
8 Conclusion
8.1 Future Research
A Stata do File
1 Introduction
Professional sports provide an ideal setting for studying labor economics, since relevant
information such as workers’ names, popularity and production measures is common knowledge.
Because information is nearly perfect, economists frequently use professional
sports markets to study topics such as discrimination, monopsony power,
inefficiencies and salary determination for players. This paper attempts to estimate the
relationship between salary and measures of marginal revenue product of labor (MRPL) for NHL goalies.
The MRPL of a goalie is the additional revenue generated by adding said goalie to your
team. Goalies’ MRPL is heterogeneous and depends not only on their own characteristics,
but also on those of their team.
The current consensus in the literature is that NHL goalies’ wages are primarily a func-
tion of their on-ice performance statistics. However, due to the rapid growth in both the
profitability and popularity of the NHL, following the locked-out season of 2004–2005, the
functional relationship between wage and measures of production may have changed. It is
the opinion of the author that in the post-lock-out era of the NHL, wages are based not only
on on-ice performance statistics relating to winning, but also on off-ice measures that relate
to a player’s popularity. Using OLS methods, this paper will test this hypothesis by regressing
the natural log of wages on both measures of performance and measures of popularity
in order to determine whether including measures of popularity improves the goodness of fit
of the regressions. The results indicate that once a goalie’s popularity has been accounted
for, a much larger share of the variation in wages can be explained than before.
The remainder of this paper will be presented as follows. Section 2 will consider the
literature on the NHL, and section 3 will discuss the impact of the NHL lock-out and how this
affected the traditional relationship between wage and MRPL. Section 4 will provide a simple
theoretical model for wage determination in the NHL and section 5 will discuss all of the
potential variables relating to a goalie’s wage. Due to the fact that this paper is an extension
of work completed by Berri and Brook (2010), section 6 will discuss the deviations from
their research in this paper. Section 7 will present the empirical estimates and results of the
paper by describing the data set, discussing the OLS assumptions required for the estimates
to be BLUE, and finally by analyzing the results of the regressions provided. Section 8 will
summarize the findings of this paper and suggest potential areas of future research.
2 Literature Review
Because the National Hockey League (NHL) is the smallest of the four major
sports leagues in North America, both in terms of size and revenues, there are far fewer
studies on hockey than there are on baseball, basketball and football, and the literature
on goalies is even more limited. With that said, there are still a significant number of
papers on hockey, which are mainly divided into two categories: wage discrimination and
wage determination. Due to the lack of literature on the specific subject of NHL skaters
and goalies, this paper will also draw from literature on the National Basketball Association
(NBA) and Major League Baseball (MLB) in order to produce the best estimates possible.
2.1 Wage Discrimination
The majority of the literature before the 2004–2005 lock-out was primarily focused on wage
discrimination in the NHL. The main determinants of discrimination seem to be the reserve
clause and a person’s country of birth. Studies on wage discrimination, as a response to the
reserve clause, began in the 1970s when Scully demonstrated that the reserve clause in MLB,
similar to restricted free agency in the NHL, allowed teams to exploit players due to their
monopsony power[2]. Following Scully’s model of discrimination in MLB and applying this
to the NHL, David Richardson[3] tested the existence of wage discrimination using data from
the 1990s and found no evidence of wage discrimination due to the reserve clause. Following
the locked-out season of 2004–2005, Lambrinos and Ashman (2007)[4] furthered this claim
by testing for wage discrimination, as a result of the reserve clause, against forwards and
defensemen and found that arbitrated salaries (salaries for restricted free agents that were
determined by an arbiter in an arbitration hearing) were not statistically different from
negotiated salaries.
Among the first to apply the theory of wage discrimination regarding geographical loca-
tion to hockey were Jones and Walsh (1988) [7], who used data from the 1987–1988 NHL
season to conclude that there was no wage discrimination against French-Canadian players.
They note, however, that due to limited data, their findings may lack external validity in
later time periods. Around the same time, using data from later NHL seasons, Lavoie, Gre-
nier and Coulombe (1987)[5] and Grenier and Lavoie (1988)[6] found statistically significant
evidence of hiring discrimination against French-Canadians, specifically defensemen. They
argued that factors such as the language barrier made these players less valuable to American
and Anglo-Canadian teams; consequently, these players often accepted lower wages for the
same level of productivity. Later work by Jones and Walsh (1999)[8] and McLean and Veall
(1992)[9] contradicted these findings by showing that there was no statistically significant
evidence of wage discrimination based on birth location. More recent work, such as that of
Lebo (2006)[10], found evidence that Europeans, not counting defensemen, were paid more
than Canadians, and that American forwards were also paid more than Canadians.
Considering an entirely different sample of players, Reader and Sommers (2009)[11] tested and
found evidence that Russians were the highest paid goalies in the NHL, while Americans
were the lowest.
2.2 Wage Determination
Economists have been using professional sports to study labor markets because of the large
number of performance statistics available, which enables them to calculate the marginal revenue
product of labor of athletes much more efficiently than for workers in other professions.
While this concept will be developed further in this essay, the existing literature on wage
determination will be discussed in order to establish a foundation for this work. The majority
of the literature, however, only applies to forwards and defensemen due to the different set
of performance statistics for skaters (forwards/defensemen) and goalies.
Prior to the locked-out season of 2004–2005, most of the literature considered team and
teammate effects on compensation. Economists such as Idson and Kahane[12][13] were able
to show that team effects and franchise location were statistically significant when looking
at the determinants of players’ wages. Kahane[13] claimed that differences in wages due
to franchise location could partially be explained by differences in team revenues; however,
goalies were not considered in these papers due to the different nature of assessing a goalie’s
on-ice productivity.
Following the locked-out season of 2004–2005, researchers began assessing goalies as well
as skaters. One of the first to look into the determinants of NHL goalies’ wages was Watterson
(2009)[14]. He considered variables such as: games played (GP), games started (GS), wins
(W), losses (L), ties (T), overtime losses (OT), goals against (GA), goals against average
(GAA), saves (S), save percentage (SV%) and shutouts (SO). Out of the variables considered,
he found that GP, GS, W, GA, and SO were statistically significant in determining goalies’
wages. Berri and Brook (2010)[1] tested and found that SV%, L.SV%¹, sq. age, TOI,
and L.TOI were statistically significant determinants of wage. Following the work of Berri
and Brook, Fuller (2012)[15] considered variables such as GP, W, L, OT, GAA, SV%, SO,
shootout wins (SOW), shootout losses (SOL), and shootout save percentage (SOSV%); he
only found games played to be statistically significant. At the same time, Pantano (2012)[16]
investigated and found that height was also a statistically significant determinant of wages.
¹ L.SV% is the lag of save percentage.
3 The Effect of the 2004–2005 NHL Lockout
3.1 On Players
According to Paul Staudohar[17], the main reasons for the 2004–2005 NHL lock-out were:
“higher player fines for misbehavior, reducing the schedule of games, minimum
salaries, playoff bonuses for players, free agency, operation of the salary
arbitration process, and revenue sharing” [17]
Most important for the league (though not for the players) was implementing a salary cap. While some
of these issues are not necessarily relevant to the determinants of goalies’ wages, certain
outcomes of the lock-out do have an impact. Resolutions such as reducing the size of all
goaltenders’ equipment and limiting the area behind the net where a goalie can go, can
potentially have a significant impact on the number of goals a goalie will let in during any
given game. Combining these changes with the introduction of a salary cap as well as
a general 24% cut in wages means that the salary-performance relationship that existed
prior to the 2004–2005 lock-out may no longer exist. For this reason, in order to avoid
generating biased results due to changes in the NHL collective bargaining agreement, this
paper will only use data following the 2004–2005 locked-out season.
3.2 On Owners
Prior to the 2004–2005 locked-out season, the average value of an NHL team was $163,000,000
and the majority of teams were operating at a loss[20]. After the locked-out season, the
average value of an NHL team rose to $413,000,000 and a majority of teams were now
operating at a profit. Due to the lack of profitability for teams prior to the lock-out, it
makes sense that the majority of papers (e.g. Vincent and Eastman[19], Watterson[14],
Richardson[3] and Berri and Brook[1]) assume that firms, in this case teams, have the sole
goal of maximizing wins².
Considering the change in profitability of NHL teams after the lock-out, it is fair to
assume that while wins could have been the sole factor in determining owners’ (firms’)
profits prior to the lock-out, this cannot be considered to be the case in the period following
the lock-out. Another reason that supports this theory is the introduction of the salary cap
² Berri and Brook state that “The ultimate objective in hockey is to win the Stanley
Cup” [1]. The assumption is also evident from the fact that each of these papers models
players’ wages as a function of traditional performance metrics only, without considering factors
such as popularity or jersey sales.
mentioned above. The introduction of the salary cap not only created a more competitive
league, but also forced teams to reconsider how they pay players. Following this, rich teams
could not simply pay large sums of money to acquire the best players; instead, teams needed
a way to evaluate players and pay them according to their performance. Having considered
these reasons, this paper will develop a standard model of firm profit maximization in order
to account for all of a player’s expected marginal revenue product of labor, not just that
relating to wins, in an attempt to produce a more effective model of wage determinants for
NHL goalies.
4 The Model
Neoclassical economic theory states that in a competitive goods market, where prices are
given, a firm maximizes profits by maximizing revenues relative to the cost of production.
The fixed prices assumption implies that prices adjust sufficiently slowly to accommodate
changes in demand, making prices effectively given. We can mathematically derive a goalie’s
wage from the fact that firms maximize profits:
\[ \Pi = pF(L, Z) - wL \tag{1} \]
therefore,
\[ \frac{\partial \Pi}{\partial L} = 0 \implies w = p \, \frac{\partial F(L, Z)}{\partial L} = MRPL \tag{2} \]
where output, F, is a function of both labor (L) and exogenous measures of profitability
independent of the goalie on the team (Z). In terms of ice hockey, Z can be thought of as
the population of the city in which a team is based, where larger cities have more people to sell
merchandise to regardless of who they hire as their goalie. If,
\[ \frac{\partial^2 F}{\partial L \, \partial Z} > 0 \tag{3} \]
then L and Z are complements, implying, in the context of the previous example, that as
cities grow, the profitability of goalies will grow; and if,
\[ \frac{\partial^2 F}{\partial L \, \partial Z} < 0 \tag{4} \]
then L and Z are substitutes, implying that the exogenous characteristics reduce the prof-
itability of goalies. For simplicity, it is assumed that L and Z are complements.
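The intermediate step between the profit function in Equation 1 and the wage condition in Equation 2 can be written out explicitly. The sketch below assumes, as in the text, that the price p and the wage w are taken as given by the team, and it uses the standard definition of the marginal product of labor, MPL = ∂F/∂L:

```latex
\begin{align*}
\frac{\partial \Pi}{\partial L}
  &= p \, \frac{\partial F(L, Z)}{\partial L} - w = 0 \\
\Rightarrow \quad w
  &= p \, \frac{\partial F(L, Z)}{\partial L}
   = p \cdot MPL
   = MRPL
\end{align*}
```

The wage term enters with a negative sign because labor cost wL is subtracted from revenue, so setting the derivative to zero equates the wage with the value of the marginal product.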
Equation 2 can be rewritten as:
\[ w = MRPL = MPL \cdot P = MPL \cdot MR = F(X) \tag{5} \]
where wages are a function of characteristics X, representing all the available indicators of
a goalie’s quality and exogenous characteristics relating to their profitability.
Following the work of Peck (2012), a goalie’s marginal product of labor includes all
available performance indicators, which will be introduced later. Although Peck’s paper
refers specifically to forwards and defensemen, he states that “For the team owner the MPL
includes performance indicators like goals, assists, career games (experience), etc.”[21] In
this paper, the performance measures will be adjusted to reflect goalies and not skaters. In
relation to marginal revenue, Peck states that
“Owners also factor in the additional revenues likely to be generated by signing
a particular player as well. This is the marginal revenue factor. Usually, this
manifests itself in the sale of official licensed gear with that player’s name on
it, like jerseys [... Therefore] teams will pay a player more if they believe that by
hiring him, they not only will have better success on the ice, but they can also
sell more licensed merchandise.”[21]
This consideration is important because it was the first time that factors other than per-
formance measures were considered when determining players’ wages in the literature. The
current work will extend Peck’s assumptions regarding the impact of marginal revenue on the
salaries of NHL goalies.
In practice, when a team wants to sign a goalie to a contract (assign a wage), it has
no way of knowing at the time what the goalie’s MRPL will be. This is because,
in the NHL and all other sports leagues, a player’s contract (wage) is agreed before
the player produces anything under it. For this reason we will define wage as:
\[ w = E(MRPL) = E[F(X)] \tag{6} \]
where X can be thought of as all the observed characteristics that affect the marginal revenue
product of labor. Although there are unobserved characteristics that affect MRPL, for the
analysis to be internally valid it has to be assumed that these characteristics are uncorrelated
with the observable characteristics in X.
4.1 Monopsonistic Characteristics
It is important to understand that in a completely competitive market (for labor and goods)
both wages and prices are fixed. What that means to employers is that they will continue
to hire workers at the given wage, until the extra revenue generated by the last worker is
equal to his wage (MR = MC). In the context of the NHL, while the market for goods is
competitive, the market for labor is not. In the NHL, there is a strict limit on the number of goalies
allowed on any given NHL roster, which limits the number of goalies teams can hire while also
giving them monopsony power over the goalies, as they are the ‘only’ source of revenue for
these goalies³. Therefore it can be thought that, instead of hiring goalies at a single wage
until the MR of the last goalie equals his MC, teams just hire the maximum number of
allowed goalies, paying each of them a wage equal to their E(MRPL), which is lower than
what it would be in a competitive labor market.
5 Variables to Consider
In order to determine the relationship between wage and expected MRPL we must first define
what factors are associated with a goalie’s MPL and MR.
5.1 Measuring MPL
While there are many measures of productivity to evaluate the MPL of skaters, such as goals,
assists, points, plus-minus, penalty minutes, shots, hits, etc., there are only a maximum of
six variables possible that relate to a goalie’s productivity. This is due to the fact that while
skaters have many responsibilities on the ice, the goalie only has one task: to stop the puck.
These variables include: GS, SOG, GA, GAA, SV% and wins.
Wins and games started are fairly self explanatory; shots on goal refers to the number of
shots a goalie has faced in any given season; and goals against is the number of goals a goalie
has let in in any given season. The difference between shots faced and goals against is the
number of saves a goalie made. It is important to note that GAA is not simply the average
number of goals a goalie allows per game, but more specifically refers to the average number
of goals they allow per 60 minutes. This is significant because if a
goalie has a really bad start and gets pulled, his poor performance during the short time he was
on the ice is scaled up to a full 60 minutes. For example, if
a goalie lets in three goals in the first period (20 minutes) and then gets pulled, his GAA will be 9 instead
of 3 for that game. Since the mean GAA of our sample is 2.65 with a standard deviation of
0.36, a GAA of 9 is much more reflective of a very poor performance than a GAA of 3.
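The per-60-minute scaling described above can be illustrated with a minimal sketch. The function name and its arguments are illustrative assumptions rather than anything defined in this paper; the only inputs taken from the text are the three goals, the 20-minute first period and the 60-minute game.

```python
def goals_against_average(goals_against: int, minutes_played: float) -> float:
    """Goals allowed per 60 minutes of ice time (hypothetical helper)."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return goals_against * 60.0 / minutes_played


# Three goals allowed in one 20-minute period before being pulled.
pulled_goalie_gaa = goals_against_average(3, 20)

# The same three goals spread over a full 60-minute game.
full_game_gaa = goals_against_average(3, 60)

print(pulled_goalie_gaa, full_game_gaa)
```

Dividing by minutes rather than by games is what keeps a short, poor outing from looking like an ordinary three-goal game.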
A goalie’s save percentage is the number of saves a goalie has made in a season divided
by the shots they faced in that same season:
\[ \text{Save\%}_i = \frac{\text{Saves}_i}{\text{ShotsFaced}_i} \tag{7} \]
³ There is no comparable league that pays similarly to the NHL in terms of goalie compensation.
Squared save percentage will also be considered to account for a possible nonlinear relation-
ship between save percentage and wage.
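As a small illustration of how save percentage and its square could be constructed as regressors, the sketch below derives saves as shots faced minus goals against, as described earlier in this section. The function name and the season totals (1,800 shots, 150 goals) are hypothetical and are not drawn from the paper’s data.

```python
def save_percentage(shots_faced: int, goals_against: int) -> float:
    """Saves divided by shots faced, with saves = shots faced - goals against."""
    if shots_faced <= 0:
        raise ValueError("shots_faced must be positive")
    saves = shots_faced - goals_against
    return saves / shots_faced


# Hypothetical season totals, for illustration only.
sv = save_percentage(shots_faced=1800, goals_against=150)

# Squared term, allowing the wage relationship to be nonlinear in save percentage.
sv_squared = sv ** 2

print(round(sv, 3), round(sv_squared, 3))
```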
Following the work of Pantano (2012), height can also be considered as related to a
goalie’s MPL. This is an interesting variable to consider due to the effects of the 2004–2005
lock-out. In order to improve the overall experience for fans, the NHL made a
number of rule changes to try to increase scoring. The most significant of these was that
there was an 11% reduction in the overall size of goalies’ equipment[24]. Where before a
goalie could wear any size pad they wanted, they now had to reduce the size of their pads
relative to their size. While the aim of this policy was to increase scoring, it also gave larger,
taller goalies an advantage over their shorter counterparts because they could wear larger
pads.
Neoclassical economic theory also suggests that age and experience will be statistically
significant factors of a goalie’s MPL; their squares are included to allow for possible
nonlinear relationships.
5.2 Measuring MR
Following the work of Peck (2012), a player’s MR can be estimated based on the number of
All-Star Game appearances a goalie has made in his career up to the point of his contract
expiration. Peck states,
“The All-Star variable has this unique characteristic because fans directly select
players to perform in the All-Star game through a voting ballot, making this
variable appropriately related to fan preference”[21].
While this paper, following Peck (2012), will consider All-Star Games as
a variable, it is important to note that NHL All-Stars are, for the most part, not determined
by fans. Prior to 2016, of the 42 players that comprised the All-Star Game, the fans selected
only 6 [22]. In 2016, of the 44 players that comprise the All-Star Game, fans selected only 4
[23].
To correct for the problem in measuring popularity by using All-Star Games, average
monthly Google searches will be used as a proxy for popularity, to account for a player’s MR.
In an ideal world, I would have exact data about revenues from player-specific merchandise;
however, this information is not publicly available. Because exact data on average
monthly Google searches was not available for each player, the data was instead estimated
using Google Adwords and Google Trends. Google Adwords provides an actual number of
Google searches for the past 24 months; Google Trends indexes the number of searches per
month from 2005 to the present on a scale from 0 to 100. Using both of these tools, it is
possible to estimate the average number of monthly Google searches for a player in the last
year of his contract.
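The text does not spell out how the two tools are combined, so the following is only one plausible sketch under a stated assumption: the months covered by both tools are used to convert the 0 to 100 Trends index into absolute searches, and that conversion is then applied to the months of the contract year. The function name, the month keys and all numbers are hypothetical.

```python
from statistics import mean


def estimate_monthly_searches(trends_index, adwords_counts, target_months):
    """Estimate average monthly searches over target_months.

    trends_index:   dict of month -> Trends index (0 to 100)
    adwords_counts: dict of month -> absolute searches (recent months only)
    Assumption: searches are proportional to the Trends index.
    """
    overlap = [m for m in adwords_counts if trends_index.get(m, 0) > 0]
    if not overlap:
        raise ValueError("no overlapping months with a positive Trends index")
    # Searches per index point, calibrated on the overlapping months.
    scale = sum(adwords_counts[m] for m in overlap) / sum(trends_index[m] for m in overlap)
    return mean(trends_index[m] * scale for m in target_months)


# Hypothetical inputs, for illustration only.
trends = {"2009-01": 40, "2009-02": 60, "2015-01": 20, "2015-02": 30}
adwords = {"2015-01": 1000, "2015-02": 1500}

estimate = estimate_monthly_searches(trends, adwords, ["2009-01", "2009-02"])
print(estimate)
```

A single scale factor is the simplest choice; it treats the Trends index as proportional to true search volume, which is itself an assumption of this sketch.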
It is important to note that while this method can be considered a strong proxy for fan
popularity, this variable has its drawbacks. Many people share the same name, so results
for very common names, like Mike Smith, could be biased. To correct for this when using
Google Adwords, a keyword most reflective of the player was used. For example, the
average monthly searches for the general name Mike Smith would include anyone who
typed keywords such as: Mike Smith, Michael Smith, Smith Michael,
Michael Smith blog. I instead used searches for Smith Goalie, which include anyone who typed in the
following: Mike Smith goalie, Michael Smith NHL goalie, smith goalie.
Mike Smith goalie had 43,000 average monthly Google searches, while Michael Smith NHL
goalie only had 1,010, which is clearly more reflective of the NHL goalie’s popularity. In
addition to this it is also important to note that the majority of goalies in our sample have
fairly unique names and therefore would not suffer from this ‘Mike Smith’ bias. Considering
all of the above, it is reasonable to assume that it is still valid to use Google searches as a
strong proxy for popularity.
Another variable that can be considered as a factor of MR is capacity. This is due to
the simple fact that, in theory, the higher the percentage of seats a team can fill, the greater
their ability to pay players higher wages. However, it is possible that the salary cap makes
this variable obsolete as all teams now have, essentially, the same ability to pay all their
players. The introduction of revenue sharing could also render capacity obsolete as it further
increases small-market teams’ ability to pay players fair wages regardless of the number of
seats they fill per game.
5.3 Other Potential Variables
Another potentially significant factor that has not been previously considered in the literature
is the bargaining power of goalies. While in competitive labor markets there is easy labor
mobility, meaning that if one firm refuses to pay a worker a wage equal to their MRPL,
another firm will, the NHL can be seen as a type of monopsonist employer. This is because
there are only 60 ‘jobs’ available for goalies to play in the NHL and outside of this there is
nowhere they can go to receive an even slightly comparable wage⁴. This is suggestive of the
⁴ Besides the NHL there are also the American Hockey League, leagues in European
countries, and the KHL in Russia. None of these leagues can come close to matching the compensation
that NHL teams pay goalies.
fact that NHL teams could exploit monopsony power over goalies in situations where there
are many free agent goalies at one time because there are only a limited number of roster
spots; consequently, the goalies would need to accept a wage less than their MRPL in order
to guarantee a spot on the team. For this reason I will consider bargaining power as a factor
determining wages when a team is looking to sign a goalie to a new contract. This measure
will be proxied by the variable: total number of NHL-caliber free agent goalies. It will be
defined as the number of goalies whose contracts expired in the relevant off-season and who
played at least one minute in the NHL.
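The definition of this proxy can be sketched as a simple count per off-season. The record layout, the goalie labels and the numbers below are hypothetical, and the sketch assumes that “played at least one minute in the NHL” is measured by a single minutes total per goalie, which the text does not specify.

```python
from collections import Counter

# Hypothetical records: (goalie, off-season of contract expiry, NHL minutes played).
contracts = [
    ("Goalie A", 2010, 3200.0),
    ("Goalie B", 2010, 0.0),     # no NHL minutes, so not counted
    ("Goalie C", 2010, 45.5),
    ("Goalie D", 2011, 1800.0),
]


def free_agent_goalie_counts(records, min_minutes=1.0):
    """Number of NHL-caliber free agent goalies in each off-season."""
    return Counter(
        off_season
        for _, off_season, minutes in records
        if minutes >= min_minutes
    )


counts = free_agent_goalie_counts(contracts)
print(counts[2010], counts[2011])
```

Each goalie in the sample would then be assigned the count for the off-season in which his own contract expired.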
6 Deviation From Original Paper
While this paper follows the methods of Berri and Brook (2010), in an attempt at estimating
a relationship between wage and MRPL, two major deviations from their sample should be
noted.
6.1 Restricted vs Unrestricted
First, following the work of Richardson [3] and Lambrinos and Ashman [4], restricted free
agents will be included in the data set as there is no evidence in the literature that their
wages are determined in a different manner from those of unrestricted free agents. A binary variable
for restricted free agent, RFA, will be used as a control in case such differences are found to exist
within this data set.
6.2 Free Agency
Second, and most importantly, this paper provides a strict definition of a free agent, but
first it is crucial to discuss the way NHL contracts work and how NHL free agency works.
Firstly, there is no rule in the collective bargaining agreement that mandates an NHL team
or player to publicly reveal the contents of a player’s contract. Despite this, the NHLPA has
been releasing players’ salaries since before 1990⁵. Further, players and their agents have
been releasing players’ contract details as well as their salaries in an effort to strengthen
their bargaining power in contract negotiations. So while not all NHL players’ salaries and
contracts are public, many of them are. Secondly, in the NHL free agency system any player
will become a free agent on July 1st of the year his contract expires. If a player about to become
⁵ While an exact date is not known for when the NHLPA began releasing salaries, it is not relevant for
this paper.
a free agent chooses to re-sign with his current team, or is traded and signs with the new
team before July 1st, that player will no longer be considered a free agent.
In Berri and Brook’s paper, they define a free agent as someone whose contract would
have expired in the relevant o↵-season, regardless of whether they signed a new contract
before the start of free agency on July 1st. While the author agrees with this definition, it
requires private information which the author cannot obtain. In an email on file, Stacey
Brook stated that he collected free agency data from Hockey News (magazine) through
reports on upcoming free agents produced before the season was over. He suggested that for
my data I use the website Spotrac. Upon further examination of the website, however, it
became clear that the information they provide is inconsistent and not always correct⁶. In
an email on file, Julie Young, the director of communications for the NHL, stated
that the “only” information the NHL releases publicly is the yearly July 1st free agent lists,
and that she could not provide a list of players who would have become free agents had
they not been traded or re-signed prior to free agency.
Given all the above, this paper will use the traditional definition of a free agent: any
player who, as of July 1st, has seen his previous contract expire and has not yet signed with a new
team.
6.2.1 Sample Selection Bias
As discussed above, given the public information available to the author at the time of
writing, only free agents in any given off-season are known, as opposed to knowing all the
players whose contracts would have expired. The issue this causes is that any goalie who
would have become a free agent, but re-signed before free agency, does not become part of
our sample. What is included in our sample is all the NHL goalies whose teams, on the
expiration of the goalie’s contract, waited for free agency to see if they could get another
goalie⁷. Thus, the sample comprises goalies whose teams do not want them anymore and so
they become free agents. Since our sample only includes goalies who played at least 1,000
minutes in all relevant seasons⁸, it is effectively made up of those whose recent production
was not high enough to guarantee them a spot on their old team, but not poor enough
to prevent other teams from giving them a chance. These sample selection issues may lead
⁶ When comparing information on Spotrac with that provided by the NHL, discrepancies were found
regarding the terms of the contract upon expiration, i.e. whether the player became a UFA or RFA upon
contract expiration. Therefore, Spotrac cannot be seen as a reliable source.
⁷ This can be assumed because if a team was sure their current goalie was better than all the available
upcoming goalies, they would just extend the goalie’s contract prior to free agency.
⁸ The importance of this restriction will be discussed later.
to problems when attempting to model E(MRPL) because they could potentially lead to the
estimation of an inaccurate relationship between production and wage. Specific problems
caused by this sample selection bias will be discussed later using the relevant assumptions
necessary to run OLS regressions with estimates that can be considered BLUE⁹.
7 Data
7.1 The Sample
Considering that the data set is made up of three years of data for all goalies (their last
contract season, as well as the seasons before and after the final contract year), it can be
classified as an independently pooled cross-sectional data set. The data set is made up of
performance and production related statistics for 44 goalies that became free agents following
the 2005-2006 season up to the start of free agency following the 2013-2014 season. Following
the work of both Jenkins (1996)[25] and Berri and Brook (2010)[1], only free agents are
included in our sample because,
“Including players in the midst of a long-term contract results in measurement
error in a salary regression.”[1]
Free agents were determined from the relevant lists provided to the author by the NHL.
Following the work of Berri and Brook, only goalies who played 1,000 minutes in each of the
two seasons before their contract expired, as well as 1,000 minutes in the year they signed
their new contract, are included. This prevents goalies who played fewer than 1,000 minutes
with extreme (very bad or very good) results from biasing the data.
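The minutes-played restriction described above can be sketched as a simple filter. This is only an illustration: the goalie names, the minutes, and the field names (`two_seasons_ago`, `last_season`, `new_contract`) are hypothetical and are not taken from the data set used in this paper.

```python
# Hypothetical records: minutes played (TOI) in the two seasons before the
# contract expired and in the first season of the new contract.
goalies = [
    {"name": "Goalie A", "toi": {"two_seasons_ago": 2450, "last_season": 3100, "new_contract": 2800}},
    {"name": "Goalie B", "toi": {"two_seasons_ago": 640, "last_season": 1900, "new_contract": 2100}},
    {"name": "Goalie C", "toi": {"two_seasons_ago": 1500, "last_season": 1000, "new_contract": 1250}},
]

MIN_MINUTES = 1000  # threshold stated in the text


def meets_minutes_rule(goalie, threshold=MIN_MINUTES):
    """True when the goalie played at least `threshold` minutes in every relevant season."""
    return all(minutes >= threshold for minutes in goalie["toi"].values())


# Keep only goalies who satisfy the restriction in all three seasons.
sample = [g["name"] for g in goalies if meets_minutes_rule(g)]
print(sample)
```

Goalie B is dropped because one of the three seasons falls below the threshold, which is how the restriction removes small-sample seasons with extreme results.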
Salary data was collected from 'HockeyZonePlus' for the seasons prior to 2011–2012 and from
'Spotrac' for all data after the 2011–2012 season¹⁰. Taking into account the previous
literature on labor economics and the NHL, this paper will consider the log of wages instead
of their absolute value in order to better examine the relationship between wage and E(MRPL).
All production (performance) related statistics, including SV%, GS, age, experience, and
height, as well as their lags and leads, were collected from the NHL's website. Population
data was collected from the government websites of both the United States¹¹ and Canada¹².
Google searches, as discussed above, were collected by using both Google Trends and Google
Adwords. Information on both All-Star Games played and arena capacity was collected from
Wikipedia. Data on average attendance per game was collected from 'HockeyDB'. Capacity
is calculated as the average percentage of seats filled per game during the season. Table 1
presents descriptive statistics for all of the relevant variables.

⁹ BLUE (best linear unbiased estimator) refers to the coefficient estimates being unbiased
and the most efficient among linear unbiased estimators.
¹⁰ While Spotrac was found to be an unreliable source of contract information regarding RFA
and UFA status, it was found to have reliable data on yearly salaries.
¹¹ http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk
¹² http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/demo05a-eng.htm
Table 1: Descriptive Statistics

Variable                       Mean        SD          Median      Minimum   Maximum     n
Log Salary                     14.430      0.640       14.301      13.122    15.607      44
Save %                         0.909       0.011       0.909       0.883     0.930       44
Squared Save %                 0.826       0.021       0.826       0.780     0.865       44
Experience                     6.773       3.917       5.5         1         18          44
Height (in)                    73.423      2.036       72.8        68.9      78          44
Age                            31.023      5.151       31          23        42          44
Games Started                  37.977      14.161      37          19        72          44
TOI (min)                      2,283.045   805.6107    2,103       1,108     4,305       44
Capacity                       0.921       0.084       0.950       0.725     1           44
No. All-Star Games (ASG)       0.659       1.642       0           0         9           44
Avg. Monthly Google Searches   14,272.98   21,937.20   5,806.50    0         88,283.21   44
No. Free Agent Goalies         28.023      2.205       29          24        31          44
Population                     2,469,391   2,909,200   1,235,150   165,521   8,175,133   44
Following neoclassical economic theory, both age and experience are expected to have
positive effects on salary, while squared age is thought to have a negative effect because
after a certain age goalies are not able to perform as well as they did in their prime.
On the other hand, if a nonlinear relationship exists between wage and experience, then I
would expect squared experience to have a positive effect on wages. In addition, following
the work of Peck (2012), height is expected to have a positive effect on wages due to the
post-lockout benefits of being a taller goaltender.
One would expect the variable games started to have a positive correlation with salary,
if the act of starting a game is valued highly enough to generate greater payment. In a
case where two goalies have an equal save percentage, say 90%, but one started 60 games and
the other only 20, the former should arguably be compensated more than the latter. For the
same reasons as mentioned for games started, TOI is expected to have a positive relationship
with salary. In order to avoid potential problems of collinearity between GS and TOI (they
have a correlation of 0.5083), only TOI will be considered, as it is a more inclusive measure
of ice time and games started.
As mentioned above, while it is reasonable to expect capacity to have a positive effect
on goalies' wages, the introduction of revenue sharing and the salary cap could dampen the
effectiveness of capacity as an explanatory variable.
Acting as proxies for fan popularity, which can be represented by merchandise revenues
per player among other things, the number of All-Star Games as well as the number of Google
searches are expected to have a positive effect on a goalie's wage. Population is also expected
to have a positive impact on a goalie's wage because in larger cities more merchandise can
be sold to fans, therefore increasing a goalie's marginal revenue.
As the main measure of a goalie's production, save percentage is expected to be positively
related to wage. This is because as a goalie produces more wins for his team, he will be
compensated for the increased revenue generated from winning. As discussed earlier, and
shown in section 7.2.3, the sample selection bias present in this paper will negatively
bias the estimated impact of save percentage on wages. With regard to squared save
percentage, the expected sign is unknown. If teams believe that an above average save
percentage will generate them exponentially more wins, then the coefficient on squared save
percentage will be positive; if, on the other hand, they believe that after a certain point
increasing a goalie's save percentage by 1% will generate them less than 1% more wins, then
there will be a negative relationship between squared save percentage and wage. Due to the
lack of variation in goalies' save percentage, the author predicts the former; however, when
considering the inconsistent nature of goalies' performance, as will be discussed later, the
latter seems more likely, as teams would not be willing to pay an exponentially high wage to
a goalie that cannot guarantee above average future production.
7.2 Estimation Process
In order for to show that our OLS estimates of the coe cients are not only valid but also
BLUE (best linear unbiased estimator) we must first make assumptions about our data set.
After mathmatically showing how these assumptions will make our estimations unbiased, a
theoretical explanation will be presented concerning the validity of these assumptions
7.2.1 Model Specification
The first OLS assumption is that the model can be written linearly, where y is a function of
observed and unobserved characteristics:

$$ y = X\beta + u \tag{8} $$

where y, wages, is an observed (n x 1) vector; X is an [n x (k + 1)] observed matrix of
different factors of a goalie's MRPL for the relevant years; $\beta$ is a [(k + 1) x 1] vector of
coefficients; and u is an (n x 1) vector of unobserved characteristics (the error term). In
other words, this assumption states that the relationship between the dependent variable and
the independent variables is linear in parameters. It is important to note that variables
like squared age, as well as squared save percentage, are still considered to be linear even
though they model a non-linear relationship between the dependent and independent variables.
This is because they are introduced into the regression as independent variables separate
from their non-squared counterparts, giving a linear relationship between wage and
save% / age as well as another linear relationship between wage and Sq. save% / Sq. Age.
While there is a possibility that a non-linear relationship exists between wages and MRPL,
all of the previous literature on wage determination has only considered wages as a linear
function of MRPL.
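The point that a squared term keeps the model linear in parameters can be illustrated with a short sketch. The ages, coefficients and noise level below are invented for illustration and are not estimates from this paper; NumPy's least-squares routine stands in for the OLS estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ages and an invented quadratic wage profile (illustration only).
age = rng.uniform(23, 42, size=200)
true_beta = np.array([10.0, 0.30, -0.005])        # intercept, age, squared age

# Age and squared age enter as two separate columns of X, so y = X beta + u
# is still linear in beta even though the age profile itself is curved.
X = np.column_stack([np.ones_like(age), age, age ** 2])
y = X @ true_beta + rng.normal(0.0, 0.1, size=age.size)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Turning point of the fitted profile: -b_age / (2 * b_age_squared).
peak_age = -beta_hat[1] / (2.0 * beta_hat[2])
print(beta_hat, peak_age)
```

With these invented coefficients the true turning point is at age 30, and the fitted turning point should land close to it.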
7.2.2 No Perfect Collinearity
The second assumption for unbiased OLS estimates is that there is no perfect collinearity
between regressors. Formally, this means that the matrix X has rank (k + 1): no regressor
(variable that makes up MRPL) can be written as an exact linear combination of the others.
This assumption is vital because if it does not hold, X'X cannot be inverted and the
estimates of $\beta$ are not uniquely defined. A high, but not perfect, correlation between
regressors does not violate the assumption, but it inflates the variance of the affected
estimates and so makes them imprecise.
In order to test for collinearity between regressors one can check the variance inflation
factor (vif) of each regressor. The vif of independent variable i is calculated as follows:

$$ vif_i = \frac{1}{1 - R_i^2} \tag{9} $$

where $R_i^2$ is the R-squared obtained from regressing variable i on all of the other
regressors. Vif measures how 'inflated' the variance of a coefficient is due to its
correlation with other regressors. It is bounded below at 1, meaning that a regressor with
vif = 1 is not correlated with any other regressor, and it has no upper bound. Traditionally,
a vif > 10 implies collinearity between regressors, meaning the variance of the estimate of
the relevant variable is being increased due to the presence of another variable in the
regression. Below is a table reporting the results of our collinearity test for the
regression modeling w = E(MRPL)¹³:
Table 2: Results of VIF Test

Independent Variables        VIF
Age                          323.19
Sq. Age                      286.37
Save% Last Season            1.15
Save% 2 Seasons Ago          1.50
RFA                          5.10
Time on Ice Last Season      1.38
Capacity                     1.37
Google Searches              1.25
No. Free Agent Goalies       1.38

Here, the variables with a high vif, age and squared age, are expected to be highly
correlated with one another as they are directly related. For these variables, the increased
variance is accepted because including both allows for a non-linear relationship between wage
and age. Considering that the rest of the variables all have a vif of roughly 5 or lower,
well below the threshold of 10, this paper takes the second OLS assumption as being true. In
addition to the variables discussed above, Games Started was found to have a very high
correlation, 0.9884, with time on ice; it was therefore omitted to avoid inflating the
variance of the coefficient on TOI.

¹³ Specifically, this refers to Regression 3 of Table 3, labeled 'Sig.'.
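As a rough illustration of equation 9, the vif of each regressor can be computed by regressing it on the remaining regressors. The sketch below uses randomly generated, hypothetical values for age and time on ice (not the paper's data), and the helper name `vif` is an assumption made here for illustration.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (regressors only, no constant)."""
    n, k = X.shape
    factors = []
    for i in range(k):
        target = X[:, i]
        # Regress regressor i on a constant and all of the other regressors.
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r_squared = 1.0 - (resid @ resid) / (centered @ centered)
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)


rng = np.random.default_rng(1)
age = rng.uniform(23, 42, size=44)          # hypothetical ages
toi = rng.uniform(1100, 4300, size=44)      # hypothetical minutes played
X = np.column_stack([age, age ** 2, toi])

print(vif(X))   # order: age, squared age, time on ice
```

Because squared age is almost a linear function of age over this range, the first two factors come out far above 10, while the factor for the unrelated variable stays close to 1, mirroring the pattern in Table 2.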
7.2.3 Zero Conditional Mean
The third OLS assumption is that the error term has a zero conditional mean:

$$ E(u|X) = 0 \tag{10} $$

The expected value of the error term should be zero given any values of X. This assumption
is implied to be true if it is assumed that the data is collected from a random sample.
Unfortunately, as discussed earlier, given the information publicly available to the author
at the time of writing, it was impossible to have a random sample. In fact, the data suffers
from sample selection bias. Without this assumption it is not possible for this paper to
present estimators that are BLUE, but since the bias affecting the data is known, it is
possible to understand how the estimates are being affected by this bias, thus allowing this
paper to proceed with running OLS regressions. It is crucial to note that without proving
the validity of this assumption, statements made about the causal relationships between our
independent regressors and our dependent variable need to be made with caution, as the
sample lacks internal validity.
In order to understand how this bias affects our estimates, a theoretical framework must
first be developed. Rearranging equation 8, it is possible to state:

$$ u = y - X\beta \tag{11} $$
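Knowing that OLS uses the estimates of $\beta$ that minimize the sum of squared residuals,
we can write:

$$ SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta}) \tag{12} $$

Differentiating the equation above with respect to $\hat{\beta}$ and setting the derivative
equal to zero gives:

$$ -2X'(y - X\hat{\beta}) = 0 \tag{13} $$

Rearranging for $\hat{\beta}$ then gives the estimate of beta:

$$ \hat{\beta} = (X'X)^{-1}X'y \tag{14} $$

Substituting $(X\beta + u)$ in for y in equation 14 yields:

$$ \hat{\beta} = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u \tag{15} $$

Taking expectations conditional on X yields:

$$ E(\hat{\beta}|X) = E[(X'X)^{-1}X'X\beta \,|\, X] + E[(X'X)^{-1}X'u \,|\, X] \tag{16} $$

Because $(X'X)^{-1}X'X$ is an identity matrix, $E(\beta|X) = \beta$, and $(X'X)^{-1}X'$ is
fixed once we condition on X, equation 16 can be re-written as:

$$ E(\hat{\beta}|X) = \beta + (X'X)^{-1}X'E(u|X) \tag{17} $$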
In normal circumstances, where E(u|X) equals zero, the expected value of $\hat{\beta}$
would equal $\beta$. Here, because the zero conditional mean assumption has been violated,
our estimates of beta will not equal the true $\beta$ of the population in expectation, but
will instead be biased. Specifically, the effect of the bias can be seen through the
correlation between the unobserved variable and X as well as the correlation between the
unobserved variable and y. Relating this to the sample of this paper, there is some
unobserved variable, call it z, that is negatively correlated with save percentage and
positively correlated with wage. Due to the presence of this unobserved omitted variable,
the estimates of $\hat{\beta}$ on save percentage will be negatively biased. In the opinion
of the author, one candidate for the unobserved omitted variable is a factor related to or
reflective of the inconsistent nature of goalies.
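The direction of this bias can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the data-generating process, the coefficient values and the variable names are invented, and the only feature borrowed from the text is that the omitted variable z is negatively correlated with the regressor and positively related to the outcome.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Invented data-generating process (illustration only).
z = rng.normal(size=n)                       # unobserved factor
x = -0.5 * z + rng.normal(size=n)            # regressor, negatively correlated with z
u = rng.normal(scale=0.5, size=n)
true_beta = 2.0
y = 1.0 + true_beta * x + 1.5 * z + u        # z raises the outcome


def ols(X, y):
    """OLS estimate (X'X)^{-1} X'y from equation 14, computed with a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)


const = np.ones(n)
beta_short = ols(np.column_stack([const, x]), y)[1]       # z omitted
beta_long = ols(np.column_stack([const, x, z]), y)[1]     # z included

print(beta_short, beta_long)
```

In this setup the coefficient from the regression that omits z falls well below the true value of 2, while the regression that includes z recovers it, which is the sense in which the estimate on save percentage is described as negatively biased.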
7.2.4 Homoscedasticity
The fourth and final assumption needed for our statistical estimates to be BLUE is that the
error term is homoscedastic and is not serially correlated. This can be written as:

$$ Var(u|X) = \sigma^2 I_n \tag{18} $$

where $I_n$ is an (n x n) identity matrix. Since the data set is made up of independently
pooled cross-sectional data, the assumption that the error terms are not serially correlated
automatically holds.
Regarding heteroscedasticity, a simple Breusch-Pagan / Cook-Weisberg test can be performed
on the final regression of Table 3, labeled 'Non-Linear Sig.'. In Stata, this test is
performed by testing whether t = 0 in the following specification:

$$ Var(\epsilon) = \sigma^2 e^{zt} \tag{19} $$

where z is the vector of fitted values of y. If t = 0, then the variance of $\epsilon$
would equal:

$$ Var(\epsilon) = \sigma^2 I_n \tag{20} $$
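The idea behind the test can be sketched as follows. Note the assumptions: this is the studentized (Koenker) variant of the Breusch-Pagan test, which regresses squared residuals on the fitted values and uses n times the auxiliary R-squared as the statistic; Stata's own routine may differ in its details, and the function name and simulated data are hypothetical.

```python
import math

import numpy as np


def breusch_pagan_fitted(X, y):
    """Studentized Breusch-Pagan test with the fitted values as variance regressor.

    X must contain a constant column. Returns (LM statistic, p value); under the
    null of constant variance, LM is asymptotically chi-squared with 1 degree of freedom.
    """
    n = len(y)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    fitted = X @ beta
    resid_sq = (y - fitted) ** 2

    # Auxiliary regression of squared residuals on a constant and the fitted values.
    Z = np.column_stack([np.ones(n), fitted])
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ resid_sq)
    aux_resid = resid_sq - Z @ gamma
    r_squared = 1.0 - (aux_resid @ aux_resid) / np.sum((resid_sq - resid_sq.mean()) ** 2)

    lm = n * r_squared
    p_value = math.erfc(math.sqrt(lm / 2.0))   # chi-squared(1) survival function
    return lm, p_value


# Simulated example in which the error spread grows with x (heteroscedastic by design).
rng = np.random.default_rng(7)
x = rng.uniform(1.0, 5.0, size=1000)
y = 1.0 + 2.0 * x + x * rng.normal(size=1000)
lm, p_value = breusch_pagan_fitted(np.column_stack([np.ones_like(x), x]), y)
print(lm, p_value)
```

A small p value leads to rejecting the null hypothesis of constant variance, while a large p value means the null cannot be rejected.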
The p value from running the Breusch-Pagan / Cook-Weisberg test for heteroscedasticity
is equal to 0.7639. Taken at face value, a p value of this size means that the null
hypothesis of a constant error variance cannot be rejected. With only 44 observations,
however, heteroscedasticity cannot be ruled out, and Stata offers a simple safeguard which
allows the user to run a 'robust' regression. This solution uses the 'Huber and White'
robust standard errors, also referred to as the sandwich estimator. This way of calculating
standard errors slightly adjusts the usual calculation to allow for
a non-constant variance of y. Starting with equation 14, we can proceed with deriving the
sandwich estimator:

$$ \hat{\beta} = (X'X)^{-1}X'y \tag{21} $$

Therefore, the variance of $\hat{\beta}$ can be calculated as:

$$ Var(\hat{\beta}) = Var[(X'X)^{-1}X'y] = (X'X)^{-1} Var(X'y) (X'X)^{-1} \tag{22} $$

Without running a robust regression, Stata assumes that Var(y) is constant and so it
simplifies the above equation to:

$$ Var(\hat{\beta}) = (X'X)^{-1}X' Var(y) X (X'X)^{-1} = \sigma_y^2 (X'X)^{-1} \tag{23} $$

Because Var(y) may not be constant (heteroscedastic), adjustments need to be made to the
original formula. Because y is an (n x 1) column vector and X' is a [(k + 1) x n] matrix,
X'y is a [(k + 1) x 1] column vector, where its first element $X_1'y$ can be defined as:

$$ X_1'y = x_{11}y_1 + x_{21}y_2 + ... + x_{n1}y_n \tag{24} $$

where, assuming all $y_j$ are independent, the variance of $X_1'y$ is equal to:

$$ Var(X_1'y) = x_{11}^2 Var(y_1) + x_{21}^2 Var(y_2) + ... + x_{n1}^2 Var(y_n) \tag{25} $$

As X is an [n x (k + 1)] observed matrix whose rows are the different goalies and whose
columns represent different measures of MRPL, $Var(X_1'y)$ can be thought of as the variance
term associated with save percentage (although any measure of MRPL would have sufficed),
allowing for heteroscedasticity. Combining $Var(X_1'y)$ with the corresponding terms for all
the other columns of X, and estimating each $Var(y_j)$ with the squared residual
$\hat{\epsilon}_j^2$, you obtain:

$$ Var(X'y) = \sum_{j=1}^{n} \hat{\epsilon}_j^2 x_j' x_j \tag{26} $$

where $x_j$ is the j-th row of X. Substituting this new term back into equation 22 gives the
formula for the sandwich estimator:

$$ Var(\hat{\beta}) = (X'X)^{-1} \left( \sum_{j=1}^{n} \hat{\epsilon}_j^2 x_j' x_j \right) (X'X)^{-1} \tag{27} $$

Here, Stata allows for a non-constant variance of the error term when calculating the
variance of the estimates, therefore making the standard errors robust to heteroscedasticity.
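Equations 23 and 27 can be compared numerically with a short sketch. The data below are simulated and hypothetical, and the function name is an assumption; the code implements the plain form of equation 27, whereas statistical packages typically also apply a small-sample degrees-of-freedom scaling that is omitted here.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """Return OLS coefficients, classical standard errors and sandwich standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical variance (equation 23): sigma^2 (X'X)^{-1}.
    sigma2 = (resid @ resid) / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))

    # Sandwich variance (equation 27): (X'X)^{-1} [sum_j e_j^2 x_j' x_j] (X'X)^{-1}.
    meat = (X * resid[:, None] ** 2).T @ X
    se_robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classical, se_robust


# Simulated data whose error variance depends on x (heteroscedastic by design).
rng = np.random.default_rng(3)
x = rng.normal(size=2000)
y = 1.0 + 2.0 * x + x * rng.normal(size=2000)
X = np.column_stack([np.ones_like(x), x])

beta, se_classical, se_robust = ols_with_robust_se(X, y)
print(beta, se_classical, se_robust)
```

In this example the classical formula understates the uncertainty of the slope, while the sandwich standard error is noticeably larger; the coefficient estimates themselves are identical under both approaches.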
7.2.5 Normality of Error
The final assumption we must make, which allows for the use of the t distribution, is that
the errors follow a normal distribution. Mathematically, this can be written as
$u \sim N(0, \sigma^2 I_n)$. This assumption is crucial because without it the t distribution
cannot be used to test for any statistically significant relationship between y and X.
It can be tested simply by comparing the distribution of the error term, u, to that of
a normal distribution. A normal distribution with mean 0 and variance $\sigma^2$ has a
skewness equal to 0 and a kurtosis equal to 3. Comparing this to our distribution of u¹⁴:
it has a mean of 0.00000000203, skewness of -0.20, and kurtosis of 2.71. While u does not
follow a perfect normal distribution, as is evident from its negative skewness and a
kurtosis slightly smaller than 3, it can be approximated as normal in order to move forward
using the Student t distribution.
It is important to note that multiple attempts were made to correct for this non-normal
distribution of the error term by transforming both the independent and dependent variables
in various ways to achieve a normal distribution of the residuals. It was found that the
original specification provided in this paper produced residuals that can be considered
approximately normally distributed. For that reason we will approximate the distribution of
the error term, as above, as normally distributed in order to be able to define the
significance of the independent variables.
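The skewness and kurtosis comparison can be reproduced with the usual moment-based formulas. The residual values below are hypothetical and chosen only to keep the sketch short; they are not the residuals of any regression in this paper.

```python
def skewness_and_kurtosis(values):
    """Moment-based sample skewness and (non-excess) kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical residuals, symmetric around zero.
residuals = [-0.9, -0.4, -0.1, 0.0, 0.1, 0.4, 0.9]
skew, kurt = skewness_and_kurtosis(residuals)
print(skew, kurt)
```

A normal distribution has skewness 0 and kurtosis 3 under these definitions, so values such as the -0.20 and 2.71 reported above are read as modest departures from normality.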
¹⁴ For R1 of Table 4.
7.3 Regression Results
7.3.1 Looking Back
The OLS assumptions, with some corrections, seem to be valid. In the following section I will
proceed with analyzing the effects of MRPL on wage. Due to the differences between the
work of Berri and Brook (2010) and this paper, both in explanatory variables and data sets,
this paper will re-run the specifications presented by Berri and Brook using the sample of
this paper in order to provide an accurate comparison between the two. First, the paper will
present four different ways of estimating wage before a contract is signed. This is
representative of the estimates a team has to make of a player's MRPL before they sign the
contract, not knowing what his production will be. The first set of regressions will be
representative of the equation:

$$ w = E(MRPL) \tag{28} $$

Empirically, the first regression (BB) can be written as:

$$ \widehat{\ln(wage)} = \hat{\beta}_0 + \hat{\beta}_1 SaveLast + \hat{\beta}_2 Save2 + \hat{\beta}_3 Age + \hat{\beta}_4 Sq.Age + \hat{\beta}_5 TOI + \hat{\beta}_6 TOI2 + \hat{\beta}_7 POP + \epsilon \tag{29} $$
Table 3 presents the empirical estimates of equation 29 in four different forms. The
first regression is modeled on the specifications of Berri and Brook (2010), which, as noted
above, only considered factors associated with MPL and not MR. The second regression
considers all potentially relevant variables as discussed in Section 5, except for those
that were intentionally omitted (i.e. Games Started and experience). The third regression
re-runs the specification of regression 2, only considering statistically significant
variables in order to prevent non-significant variables from distorting the coefficient
estimates. The fourth regression tests for possible non-linear relationships between wages
and E(MRPL), using the significant variables from the second regression.
Table 3: Salary Regressions. Dependent variable: log of salary. OLS methods used.
Heteroscedasticity-robust standard errors used. p values presented in parentheses below
coefficient estimates.

Independent Variables        BB               All Var.          Sig.              Non-Linear Sig.
Save% Last Season            19.91***         6.95              5.44              -125.56
                             (0.030)          (0.461)           (0.558)           (0.885)
Save% 2 Seasons Ago          8.22             9.85              11.6*             1076***
                             (0.310)          (0.194)           (0.112)           (0.045)
Sq. Save% Last Season        -                -                 -                 71.52
                             (-)              (-)               (-)               (0.881)
Sq. Save% 2 Seasons Ago      -                -                 -                 -585.77***
                             (-)              (-)               (-)               (0.047)
Age                          0.04             0.31*             0.215             0.204
                             (0.851)          (0.142)           (0.252)           (0.258)
Sq. Age                      -0.93×10^-3      -0.005*           -0.003            -0.003
                             (0.759)          (0.140)           (0.226)           (0.245)
Time on Ice Last Season      0.35×10^-3***    0.254×10^-3***    0.254×10^-3***    0.254×10^-3***
                             (0.001)          (0.006)           (0.003)           (0.007)
Time on Ice 2 Seasons Ago    -0.11×10^-3      -                 -                 -
                             (0.306)          (-)               (-)               (-)
Population                   1.63×10^-8       4.78×10^-9        -                 -
                             (0.562)          (0.841)           (-)               (-)
Google Searches              -                9.88×10^-6***     1.04×10^-5***     1.00×10^-5***
                             (-)              (0.00)            (0.00)            (0.00)
No. Free Agent Goalies       -                -0.07*            -0.07**           -0.07**
                             (-)              (0.115)           (0.065)           (0.083)
RFA                          -                0.49**            0.38              0.43*
                             (-)              (0.094)           (0.203)           (0.119)
Capacity                     -                -0.07             -                 -
                             (-)              (0.945)           (-)               (-)
Height                       -                0.04              -                 -
                             (-)              (0.316)           (-)               (-)
Constant                     -11.99           -7.97             -3.46             -426.80
                             (0.250)          (0.396)           (0.670)           (0.320)
Adj. R Squared               0.37             0.53              0.56              0.56
Observations                 44               44                44                44

*** 95% Confidence
** 90% Confidence
* 85% Confidence

Looking at the first regression alone (BB), there are a number of important findings to
note. In their paper, Berri and Brook found a statistically significant (stat-sig.)
relationship between wage and save% last year, save% 2 seasons ago, squared age, time on
ice, and time on ice 2 seasons ago, as well as a stat-sig. constant. Using the new sample in
this paper, only save% last season and time on ice last season were found to be stat-sig.
This is important because it shows a clear difference in the relationship between wage and
E(MRPL) for the two samples, showing that results from the sample used in this paper may not
generalize to (be externally valid for) the sample used by Berri and Brook, as suspected
earlier. While it is problematic to have a sample that is not externally valid, conclusions
about improving the estimated relationship between wage and E(MRPL) can still be made;
however, they are potentially invalid when considering new samples. With that said, all of
the estimates are of the expected sign except for time on ice 2 seasons ago, which is
negative. For the same reasons that TOI is expected to be positively correlated with wage,
as discussed in section 7.1, so is time on ice 2 seasons ago, though the lack of statistical
significance of this variable implies that it can be disregarded.
Considering the second regression (All Var.), both save% last season and save% 2 seasons ago
are of the expected sign, yet not stat-sig. The lack of significance of the estimates of save
percentage could be explained by the fact that E(u|X) ≠ 0, where the unobserved omitted
variable is negatively biasing the effect of save% on wage. The lack of significance could also
be explained by the fact that there is an omitted variable correlated with wage and save%, as
will be explained later; in this case the omitted factor was that I failed to model a non-linear
relationship between wage and save%. In the same regression, both age and squared age are
of the expected sign and are stat-sig, implying that goalies improve from their rookie season
through their 'prime' and then become less productive after their 'prime'. However, when
attempting to calculate the age at which a goalie in our sample hits his prime, it was found
that in our sample wages decrease as goalies get older. Taking the age coefficients of the
BB regression, the profile 0.04Age − 0.93×10^-3 Age² peaks at 0.04 / (2 × 0.93×10^-3) ≈ 21.5
years, which is below the minimum age of 23 in our sample, so the implied effect of age on
wage declines over the whole observed age range. This can again be explained by the sample
selection bias present in this paper, which meant that a lot of the goalies in our sample
were past their prime at the time of entering free agency. This furthers the point that a
team with a productive goalie in his prime would not allow him to enter free agency at the
risk of not being able to get as good a goalie.
Population and height are of the expected sign, whereas capacity is not; none of the three
is stat-sig, so they were dropped from the regression. Accounting for a player's MR, Google
searches appears to be a significant determinant of wages as it is of the expected sign and
is stat-sig with 95% confidence.
Testing for differences between UFAs and RFAs, there is a stat-sig relationship between
RFA and wage, showing that restricted free agents will earn more money than their
unrestricted counterparts. This relationship can be explained by the fact that when a player
is an RFA, there are specific rules that teams must follow; for example, they are not allowed
to offer a player less than 90% of his original salary. When players are unrestricted free
agents, teams may offer and pay them any salary both parties agree to, including a pay cut of
more than 10%. Considering the work of Richardson, Lambrinos and Ashman, though, a more
plausible explanation is that the restricted free agent goalies were better than their
non-restricted counterparts. This is understandable, as teams who have the rights to a goalie
(referring to RFAs) would be more willing to see what other goalies are available in free
agency, knowing they could always re-sign their old goalie. This is contrary to teams whose
goalies become UFAs after their contract, because then the team would only re-sign the goalie
if they were sure he had a greater MRPL than all other free agents. Therefore it can be
thought that the RFAs in our sample are of better quality than the UFAs, which is why they
command a higher wage.
Regarding a player's bargaining power, the sign on number of free agent goalies is negative
and stat-sig, demonstrating that, ceteris paribus, as the number of free agent goalies
increases, a goalie's individual wage will decrease.
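Because the dependent variable is the log of salary, the size of these effects is easier to read once a coefficient is converted into a percentage change in wage. The sketch below applies the standard semi-log conversion to two coefficients from the 'Sig.' column of Table 3; the function name and the choice of a 10,000-search increase are illustrative assumptions.

```python
import math


def pct_change_in_wage(coef, delta):
    """Percentage change in wage implied by a change `delta` in a regressor
    when the dependent variable is ln(wage)."""
    return (math.exp(coef * delta) - 1.0) * 100.0


# Coefficients from the 'Sig.' column of Table 3.
google = pct_change_in_wage(1.04e-5, 10_000)   # 10,000 more monthly Google searches
free_agents = pct_change_in_wage(-0.07, 1)     # one additional free agent goalie

print(round(google, 1), round(free_agents, 1))
```

Read this way, the small absolute value of the Google searches coefficient still translates into a sizeable wage difference once realistic differences in search volume are considered.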
When comparing the regression specified by Berri and Brook (BB) with the second regression
of this paper (All Var.), the differences become quite clear. Including variables relating
to a goalie's MR as well as their bargaining power significantly increases the overall
goodness of fit of the regression. With the adjusted R-squared increasing from 0.37 to 0.53,
the inclusion of variables relating to a goalie's MR and bargaining power explains over 40%
more of the differences in wages than excluding them. By comparing the results of the
last two regressions of Table 3 (Sig. and Non-Linear Sig.), it is possible to interpret the
relationship between wage and the determinants of wage. By modeling for non-linearity, the
results become a lot clearer than before. Instead of there being a linear relationship
between save% and wage, it appears that a non-linear relationship between these variables
provides the strongest explanation. Thus, when modeling for non-linearity, the estimates of
the effect of save% on wage become more significant than before. Considering the stat-sig
coefficients on save% 2 seasons ago, which was positive, and squared save% 2 seasons ago,
which was negative, it appears that there are diminishing returns on wage to save percentage.
This is very plausible, as increasing the save percentage from 91% to 93% would likely
have a much larger impact on wage than an increase from 93% to 96%. This relationship
can be explained by the fact that ice hockey is a team sport, meaning that in order to win a
game you not only need a goalie to prevent goals, but you also need forwards to score goals.
Thus, if you assume that an average NHL team takes 30 shots per game, then with a save
percentage of less than 91% a goalie will concede more than 2 goals; with a save percentage
between 91% and 95%, a goalie will concede 2 goals; and with a save percentage above 96%
a goalie will concede 1 goal or less. The mean save percentage in our sample was 90.9%, with
a minimum of 88.3% and a maximum of 93%; so with an average of 30 shots against a goalie
with the mean save percentage, a team will score roughly 3 times per game (30 × 0.091 ≈ 2.7).
In terms of goals conceded, increasing one's save percentage from 91% to 93% would decrease
the goals allowed from 3 to 2; increasing save percentage from 93% to 95% would not change
the goals allowed; and increasing the save percentage above 95% would decrease the goals
allowed from 2 to 1 or even 0. Since the average team will score 3 times per game, as stated
above, teams will pay goalies more money for decreasing their goals conceded from 3 to 2
than from 2 to 1, because in the first case the difference is between winning or not
winning, while in the second case there is no effect on whether a team will win. The
marginal return on wage of conceding 2 goals versus 3 is greater than that of conceding
1 goal versus 2. Empirically, the point of diminishing returns, which is calculated by
determining the maximum of the parabola 1076(Save2%) − 585.77(Save2%)², as implied by the
regression 'Non-Linear Sig.', is at 91.84%.
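The turning point quoted above follows from the vertex formula for a parabola, which can be checked with the coefficients reported in Table 3. The function name below is an assumption made for illustration.

```python
def vertex_of_parabola(linear_coef, squared_coef):
    """Value of x at which linear_coef * x + squared_coef * x**2 reaches its turning point."""
    return -linear_coef / (2.0 * squared_coef)


# Coefficients on save% 2 seasons ago and its square ('Non-Linear Sig.' column of Table 3).
turning_point = vertex_of_parabola(1076.0, -585.77)
print(f"{turning_point:.4f}")
```

Because the coefficient on the squared term is negative, this turning point is a maximum: the estimated contribution of save percentage to log wage rises up to this value and falls beyond it.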
Due to the lack of specification robustness, the small sample size, and potential sample
selection biases, rather than considering each regression as a way of modeling
w = E(MRPL), it is better to consider what has been learned about how teams determine a
goalie's E(MRPL). Through the regressions presented it is clear that variables relating to a
player's MR, as well as their bargaining power and contract status, are all significant
determinants of wage. Although this sample lacks external validity for the reasons presented
above, it appears correct to claim that, in addition to the performance related statistics
considered in the literature, measures of popularity as well as bargaining power are
statistically significant determinants of wage, and not considering them would result in a
serious omitted variable bias.
7.3.2 Looking Forward
While the variables presented above relating to a player's MR and MPL do a good job of
explaining goalies' wages, Berri and Brook (2010) point out that:
“Although salaries are often a function of past performance, the salary decision
is a statement about the future. Teams are not paying [for] what a goalie did
last year, but what they hope that goalie will do after he signs the contract.”[1]
Consequently, this section will consider how good teams are at estimating a goalie's MRPL.
This section will consider the condition:

$$ E(MRPL) = MRPL \tag{30} $$

Empirically, the first regression (BB) can be written as:

$$ \widehat{\ln(wage)} = \hat{\beta}_0 + \hat{\beta}_1 Save_{new} + \hat{\beta}_2 Age + \hat{\beta}_3 Sq.Age + \hat{\beta}_4 TOI + \hat{\beta}_5 POP + \epsilon \tag{31} $$

This will be done by using the same methodology as in the last section. Here, three
regressions will be presented: the first will run the specification of Berri and Brook on
the new sample of this paper, the second will consider variables related to a goalie's MR
that were not previously considered, and the third will re-run the specification of
regression 2 while also allowing for possible non-linear relationships between E(MRPL) and
MRPL.
Table 4: Salary Regression. Dependent variable: log of salary. OLS methods used.
Heteroscedasticity-robust standard errors used. p values presented in parentheses below
coefficient estimates.

Independent Variables         BB                R1                R2
Save% Current Season          -0.706            -5.93             -877.11
                              (0.931)           (0.402)           (0.179)
Sq. Save% Current Season      -                 -                 479.44
                              (-)               (-)               (0.183)
Age                           -0.082            -                 -
                              (0.645)           (-)               (-)
Sq. Age                       0.001             -                 -
                              (0.648)           (-)               (-)
Time on Ice Current Season    0.382×10^-3***    0.353×10^-3***    0.353×10^-3***
                              (0.004)           (0.003)           (0.003)
Population                    3.52×10^-8*       -                 -
                              (0.141)           (-)               (-)
Google Searches               -                 1.38×10^-5***     1.35×10^-5***
                              (-)               (0.00)            (0.00)
Constant                      15.42***          18.82***          414.51
                              (0.037)           (0.004)           (0.162)
Adj. R Squared                0.166             0.404             0.405
Observations                  44                44                44

*** 95% Confidence
** 90% Confidence
* 85% Confidence

Analyzing the results from the first regression of Table 4 (BB), there are a few important
points to note. In Berri and Brook's paper, save percentage, age, squared age and population
were all of the expected sign but not stat-sig; using the data set of this paper, all of
these variables, with the exception of population, were neither of the expected sign nor
stat-sig. In their paper, Berri and Brook attribute this lack of relationship between
measures of production and expected production to the inconsistent nature of goalies. The
fact that the sample in this paper found an inverse relationship between production and
expected production implies that the goalies in this sample were more inconsistent than
those in the sample of Berri and Brook. This is very plausible because our sample only
considers goalies let go from teams into free agency and not all goalies.
Before the inconsistency of goalies is discussed, it is first important to consider the final two regressions of table 4. In the second regression (R1), a negative relationship is found between save% and wage. Taken literally, this would imply that the better a goalie performs, the less he will be paid. This interpretation makes no sense in the context of ice hockey, and since the coefficient is not statistically different from zero, it can simply be said that current wage is unrelated to current save%. Another important difference between the first and second regressions of table 4 is the exclusion of the age and population variables. This is because table 4 represents the equation E(MRPL) = MRPL. When taking expectations of MRPL, factors such as age, population, height, capacity, etc., are important because they are indicators of what future production may be. Especially considering the inconsistent nature of goalies, which will be shown later, it is important to consider indicators other than past production when estimating future production. However, when determining actual MRPL to see how close it is to what was expected, variables such as age and population are not actual indicators of production and were therefore excluded from the final two regressions of table 4.
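The third regression of table 4 (R2) adds squared save%, so its implied marginal effect of save% on log wage is no longer constant. The worked equation below is only a sketch: it takes the R2 point estimates at face value, assumes save% is measured as a fraction between 0 and 1, and uses the symbols w (wage), s (save%) and s* (turning point), which are introduced here for illustration. Neither coefficient is stat-sig (p-values of 0.179 and 0.183), so the turning point should be read as indicative at best.

```latex
\[
\ln(w) = \beta_0 + \beta_1 s + \beta_2 s^2 + \dots
\qquad\Longrightarrow\qquad
\frac{\partial \ln(w)}{\partial s} = \beta_1 + 2\beta_2 s
\]
\[
\beta_1 = -877.11,\quad \beta_2 = 479.44
\qquad\Longrightarrow\qquad
s^{*} = -\frac{\beta_1}{2\beta_2} = \frac{877.11}{958.88} \approx 0.915
\]
```

Because the coefficient on the squared term is positive, these estimates would imply that log wage falls with save% below roughly 0.915 and rises with it above that level.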
Another important feature of the last two regressions in table 4 is that although there is no stat-sig relationship between production and wage, there is still a strong stat-sig relationship between MR and wage. Although the coefficient on Google searches is not particularly large in absolute value, its strong statistical significance, together with the increase in the adjusted R-squared from 0.166 to 0.404 (roughly 16% to 40%), shows that its inclusion as a determinant of MR is crucial. Given the inconsistent nature of goalies’ production, excluding measures related to a goalie’s MR substantially reduces the explanatory power of our regressions.
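Since the dependent variable is the log of salary, a small coefficient can still imply a sizeable wage effect once it is scaled by a realistic change in the regressor. The Python sketch below illustrates this for the Google searches coefficient of regression R1. It assumes the coefficient is 1.38×10^-5 per search, and the change of 10,000 searches is a hypothetical figure chosen only for illustration; the function name is likewise not from the original work.

```python
import math


def pct_wage_change(coef: float, delta_x: float) -> float:
    """Exact percentage change in wage implied by a log-wage coefficient.

    For ln(wage) = ... + coef * x, raising x by delta_x multiplies the
    wage by exp(coef * delta_x).
    """
    return (math.exp(coef * delta_x) - 1.0) * 100.0


# Assumed value of the Google searches coefficient in regression R1.
GOOGLE_COEF = 1.38e-5

# Hypothetical change: 10,000 additional searches.
effect = pct_wage_change(GOOGLE_COEF, 10_000)
print(f"Implied wage change: {effect:.1f}%")
```

Under these assumptions the implied wage difference is roughly 15%, which is why the coefficient matters economically despite its small absolute value.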
The inconsistent nature of goalies can be shown through a regression of current save% on save% last season, as well as through a regression of save% last season on save% 2 seasons ago. In the first regression, the R-squared is 0.0059 and the coefficient on save% last season is not stat-sig, implying that a goalie’s production last season explains only 0.59% of the variation in their production this season. In the second regression, the R-squared is 0.12 and the coefficient on save% 2 seasons ago is stat-sig, implying that a goalie’s production 2 years ago explains 12% of the variation in their production last year. The difference in significance between the two estimates of lagged production, together with the very low measures-of-fit for both regressions, strongly implies that a goalie’s production in one season is not a good predictor of their production in future seasons.
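The consistency check described above is a simple regression of one season’s save% on the previous season’s, with the R-squared read as the share of variation explained. A minimal Python sketch of that calculation follows. The five data points are made up for illustration and are not the thesis sample, and the function name is an assumption.

```python
from statistics import mean


def simple_ols_r2(x, y):
    """Return (slope, intercept, r_squared) for y = a + b*x fitted by OLS."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up save percentages for five hypothetical goalies.
save_last = [0.905, 0.912, 0.918, 0.921, 0.927]
save_now = [0.919, 0.908, 0.915, 0.911, 0.916]

slope, intercept, r2 = simple_ols_r2(save_last, save_now)
print(f"slope={slope:.3f}, R^2={r2:.3f}")
```

An R-squared close to zero, as reported for the first regression, means that last season’s save% carries almost no information about this season’s.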
8 Conclusion
The purpose of this paper was to extend the current literature on wage determination for NHL goalies. Berri and Brook found the coefficient of variation (CoV) of a goalie’s wage to be 0.74 and the CoV of a goalie’s save% to be 0.011¹⁵. This paper found a CoV of wage equal to 0.65 and a CoV of save% equal to 0.012. Both of these results show that the differences in goalies’ wages are significantly greater than the differences in their production. Specifically, this paper claimed that in the post lock-out period of the NHL, including variables relating to a goalie’s marginal revenue will significantly increase the measures-of-fit for the regressions and therefore explain more of the variation in goalies’ wages. Due to the problems discussed above with the internal and external validity of the sample, rather than considering each regression individually, an overall analysis of all the specifications will be made, which still refers only to the goalies within the sample of this paper. Considering both methods of modeling wages and E(MRPL), both before a contract is signed and after, it is clear that a player’s popularity is a significant determinant of wage. Due to the lack of predictability of a goalie’s production, their popularity becomes even more significant: when teams are signing goalies to a contract, they can be a lot more certain of the revenues generated by a player’s popularity than of those generated from the wins the goalie will potentially produce.

¹⁵ All CoV calculations were done by hand, by dividing a variable’s standard deviation by its mean.
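The coefficient of variation used in this comparison is simply a variable’s standard deviation divided by its mean. The Python sketch below shows the calculation on made-up numbers, not the thesis sample. It assumes the sample standard deviation (n - 1 in the denominator), which is what Stata’s summarize command reports; the source does not state which version was used by hand.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Made-up values for five hypothetical goalies, for illustration only.
save_pct = [0.905, 0.912, 0.918, 0.921, 0.927]
salary = [600_000, 900_000, 1_500_000, 3_000_000, 6_000_000]

cov_save = coefficient_of_variation(save_pct)
cov_salary = coefficient_of_variation(salary)
print(f"CoV save% = {cov_save:.3f}, CoV salary = {cov_salary:.2f}")
```

Because the CoV is unit-free, it allows the spread of salaries (in dollars) to be compared directly with the spread of save percentages.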
8.1 Future Research
Due to problems regarding the sample in this paper, which affected both the internal and external validity of our regressions, it is quite difficult to draw concrete conclusions about the effects that introducing these new variables has on modeling wages, although certain conclusions can be drawn. Moving forward, the next step in analyzing the determinants of goalies’ wages would be to obtain full contract information for all NHL goalies so that the sample is unbiased. If the zero conditional mean assumption were no longer violated, it is the opinion of this author that a stronger, more significant relationship would be found between production and wage.
This paper also agrees with the conclusion of Berri and Brook (2010)[1], who stated:
“Decision makers in hockey have correctly identified save percentage as the appropriate measure of performance. However, it does not appear that decision makers understand the inconsistency of this measure.”
Therefore, if future papers also have information on how NHL teams deal with the inconsistent nature of a goalie’s performance when estimating their future production, there would be potential for a much stronger understanding of how teams determine a goalie’s salary, though this type of information is more likely to be obtained through conversations with NHL general managers than through data analysis.
While save percentage was used as the primary measure of production in this paper, due to the problems in considering wins and goals against average, future research may consider using adjusted save percentage in order to control for the effect of the defense. Schuckers (2011)[26] argued that even a goalie’s save percentage is dependent on the defense in front of them; therefore, by calculating a defense-independent save percentage, also referred to as adjusted save percentage, one could develop a more accurate estimate of the relative production capabilities of goalies across the NHL. By using this metric, one would expect to find a more significant relationship between production and wage. Unfortunately, at the time of writing this paper, the only website that calculated adjusted save percentage for all goalies (war-on-ice) was unreliable¹⁶.
As demonstrated in this paper, due to the inconsistent nature of goalies, teams do a poor job of estimating goalies’ future performance. Coupled with the fact that goalies are all very similar in their production abilities, shown by a CoV of save% equal to 0.012, teams are essentially ‘throwing darts at a board’ when estimating a goalie’s future production. Contract determination could therefore be considered a bargaining solution between teams and goalies that is based less on production than on other factors. The main purpose of this paper was to consider the effect of popularity on wages, and it therefore only controlled for the number of free agent goalies so as not to bias the effect of production and popularity on wages; future work could undertake a much more in-depth analysis of the bargaining process between teams and goalies. It may even be possible to argue that, compared to other sports such as basketball and baseball, as well as other positions in ice hockey, goalies’ salary decisions are more reflective of a bargaining agreement than of a linear relationship between factors of MRPL and wage, due to the limited information available to evaluate goalies’ production coupled with the inconsistency of the measure.
¹⁶ In addition to having data on adjusted save percentage, war-on-ice also had standard measures of performance like save percentage, goals against average, time on ice, etc. When comparing these standard measures of performance with the ones reported on NHL.com, many inconsistencies were found, and therefore war-on-ice could not be trusted as a reliable source of data.
References
[1] Berri, D., Brook, S. (2010). On the Evaluation of the ’Most Important’ Position in Professional Sports. Journal of Sports Economics, 11(2), 157-171. Retrieved from http://dx.doi.org/10.1177/1527002510363097
[2] Scully, G. W. (1974). Pay and Performance in Major League Baseball. The American Economic Review, 64(6), 915–930. Retrieved from http://www.jstor.org/stable/1815242
[3] Richardson, D. H. (2000). Pay, Performance, and Competitive Balance in the National Hockey League. Eastern Economic Journal, 26(4), 393–417. Retrieved from http://www.jstor.org/stable/40326440
[4] Lambrinos, J., Ashman, T. D. (2007). Salary Determination in the National Hockey League: Is Arbitration Efficient? Journal of Sports Economics, 8(2), 192-201. Retrieved from http://jse.sagepub.com/content/8/2/192.refs
[5] Lavoie, M., Grenier, G., Coulombe, S. (1987). Discrimination and Performance Differentials in the National Hockey League. Canadian Public Policy / Analyse De Politiques, 13(4), 407–422. Retrieved from http://doi.org/10.2307/3550883
[6] Grenier, G., Lavoie, M. (1988). Francophones in the National Hockey League: Test of Entry and Salary Discrimination. Mimeo, University of Ottawa.
[7] Jones, J. C. H., Walsh, W. D. (1988). Salary Determination in the National Hockey League: The Effects of Skills, Franchise Characteristics, and Discrimination. Industrial and Labor Relations Review, 41(4), 592–604. Retrieved from http://doi.org/10.2307/2523593
[8] Jones, J., Nadeau, S., Walsh, W. (1999). Ethnicity, productivity and salary: player compensation and discrimination in the National Hockey League. Applied Economics, 31(5), 593-608. Retrieved from http://dx.doi.org/10.1080/000368499324048
[9] McLean, R., Veall, M. (1992). Performance and Salary Differentials in the National Hockey League. Canadian Public Policy / Analyse De Politiques, 18(4), 470. Retrieved from http://dx.doi.org/10.2307/3551660
[10] Lebo, A. (2006). Wage Discrimination in the National Hockey League (Bachelor of Arts). Acadia University. Retrieved from http://economics.acadiau.ca/
[11] Raeder, D., Sommers, P. (2009). Are Russians the Highest-Paid Goalies in the NHL? International Atlantic Economics Society, 16(1), 132-133. Retrieved from http://dx.doi.org/10.1007/s11294-009-9239-2
[12] Idson, T., Kahane, L. (2000). Team effects on compensation: an application to salary determination in the National Hockey League. Economic Inquiry, 38(2), 345-357. Retrieved from http://dx.doi.org/10.1111/j.1465-7295.2000.tb00022.x
[13] Kahane, L. (2001). Team and player effects on NHL player salaries: a hierarchical linear model approach. Applied Economics Letters, 8(9), 629-632. Retrieved from http://dx.doi.org/10.1080/13504850010028607
[14] Watterson, S. (2009). Position Value and Wage Determinants in the NHL. Clemson University.
[15] Fullard, J. (2012). Investigating Player Salaries and Performance in the National Hockey League. Brock University. Retrieved from http://dr.library.brocku.ca/handle/10464/3997
[16] Pantano, J. (2012). Is Bigger Better? An Examination of the Effects of Size on Performance and Compensation of NHL Goaltenders. College of the Holy Cross.
[17] Staudohar, P. (2005). Hockey Lockout of 2004-05. The Monthly Labor Review, 128(12), 23-29.
[18] Perloff, J. (2013). Microeconomics with calculus. Boston, Mass.: Pearson. Ch. 15
[19] Vincent, C., Eastman, B. (2009). Determinants of Pay in the NHL: A Quantile Regression Approach. Journal of Sports Economics, 10(3), 256-277. Retrieved from http://dx.doi.org/10.1177/1527002508327519
[20] Bhandari, N. (2014). The 2004-05 Lockout: Where is the NHL Ten Years Later? TheRichest. Retrieved from http://www.therichest.com/sports/hockey-sports/the-2004-05-nhl-lockout-where-is-the-nhl-ten-years-later/?view=all
[21] Peck, K. (2012). Salary Determination in the National Hockey League: Restricted, Unrestricted, Forwards, and Defensemen (Honors Theses). Western Michigan University. Retrieved from http://scholarworks.wmich.edu/honors_theses
[22] Kimelman, A. (2010). All-Star Game to feature new Fantasy Draft. NHL.com. Retrieved from https://www.nhl.com/news/all-star-game-to-feature-new-fantasy-draft/c-543059
[23] NHL (2016). New format for Honda NHL All-Star Game announced. NHL.com. Retrieved from https://www.nhl.com/news/new-format-for-honda-nhl-all-star-game-announced/c-788532
[24] NHL (2005). NHL Enacts Rule Changes. NHL.com. Retrieved from http://www.nhl.com/ice/page.htm?id=26394
[25] Jenkins, J. (1996). A Reexamination of Salary Discrimination in Professional Basketball. Social Science Quarterly, 77(3), 594-608.
[26] Schuckers, M. (2011). DIGR: A Defense Independent Rating of NHL Goaltenders using Spatially Smoothed Save Percentage Maps. In MIT Sloan Sports Analytics Conference. Canton, NY.
[27] Daccord, B. (1998). Hockey goaltending: Skills for Ice and In-Line Hockey. Champaign, Ill.: Human Kinetics. p6
A Stata do File
** Summary Statistics
summarize ln_nwage
summarize ln_nwage, d
summarize savenew
summarize savenew, d
summarize savenew2
summarize savenew2, d
summarize exp
summarize exp, d
summarize height
summarize height, d
summarize age
summarize age, d
summarize gs_new
summarize gs_new, d
summarize toinew
summarize toinew, d
summarize capacity
summarize capacity, d
summarize asgbeforencontract
summarize asgbeforencontract, d
summarize avgmonthlygooglesearcheslas
summarize avgmonthlygooglesearcheslas, d
summarize numberofnhlcalibergoaliesin
summarize numberofnhlcalibergoaliesin, d
summarize pop
summarize pop, d
* Variable names needed to be changed to fit LaTeX *
rename avgmonthlygooglesearcheslas google
rename numberofnhlcalibergoaliesin bargain
* Proving Assumptions *
* No perfect collinearity: variance inflation factors
reg ln_nwage savelast save2 age age2 toilast google bargain rfa
estat vif
* Homoscedasticity: Breusch-Pagan test
reg ln_nwage savelast savelast2 save2 save2_2 toilast age age2 google bargain rfa
hettest
* Normality of error: residuals from a robust regression
* (google is used here because the variable was renamed above)
reg ln_nwage savenew toinew google, r
predict res, r
summarize res
summarize res, d
* Table 3 regressions in order of appearance in the table (adjusted R-squared calculated manually) *
reg ln_nwage savelast save2 age age2 toilast toi2 pop, r
reg ln_nwage savelast save2 age age2 toilast pop google bargain rfa capacity height, r
reg ln_nwage savelast save2 age age2 toilast google bargain rfa, r
reg ln_nwage savelast savelast2 save2 save2_2 age age2 toilast google bargain rfa, r
* Table 4 regressions in order of appearance in the table (adjusted R-squared calculated manually) *
reg ln_nwage savenew age age2 toinew pop, r
reg ln_nwage savenew toinew google, r
reg ln_nwage savenew savenew2 toinew google, r
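The comments in the do-file note that the adjusted R-squared was calculated manually. A minimal Python sketch of the standard adjustment is given below; the function name and the unadjusted R-squared of 0.45 are hypothetical, while the 44 observations and 3 regressors correspond to the second table 4 specification (savenew, toinew, google).

```python
def adjusted_r_squared(r2: float, n_obs: int, n_regressors: int) -> float:
    """Adjusted R-squared for a model with an intercept and n_regressors slopes."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Hypothetical unadjusted R-squared of 0.45 with 44 observations
# and 3 regressors.
adj = adjusted_r_squared(0.45, 44, 3)
print(f"Adjusted R-squared: {adj:.3f}")
```

The adjustment penalizes each additional regressor, which matters in a sample of only 44 observations.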
  • 1. A Further Examination of the Determinants of NHL Goalies’ Salaries Matthew J Rosenstein: S1223256 / B029618 April 3, 2016 I acknowledge that this work is my own and would like to thank Dr. Colin Roberts for his continued support throughout this entire process. Any mistakes in this work are also my own. Abstract This dissertation attempts to extend and improve on the current literature on wage determination for NHL goalies. With the primary measure of a goalies’ production, save percentage, having a coe cient of variation of 0.012, and a coe cient of variation of a goalies wages equal to 0.74; there is a large variation of wages that is not explained by di↵erences in skill. This paper argues that variables related to goalies’ popularity, have significant explanatory power in determining wages. Using OLS methods distributed under the student-t distribution, this paper shows that after considering factors related to a players popularity a lot more of the variation in goalies wages is accounted for. 1
  • 2. Contents 1 Introduction 4 2 Literature Review 5 2.1 Wage Discrimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Wage Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 The e↵ect of the 2004 - 2005 NHL Lockout 7 3.1 On Players . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 On Owners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 4 The Model 8 4.1 Monopsonistic Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5 Variables to Consider 10 5.1 Measuring MPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.2 Measuring MR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5.3 Other Potential Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 6 Deviation From Original Paper 13 6.1 Restricted vs Unrestricted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 6.2 Free Agency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 6.2.1 Sample Selection Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 7 Data 15 7.1 The Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 7.2 Estimation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7.2.1 Model Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7.2.2 No Perfect Colinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 7.2.3 Zero Conditional Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.2.4 Homoscedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 7.2.5 Normality of Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 7.3 Regression Results . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . 23 7.3.1 Looking Back . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 7.3.2 Looking Forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 8 Conclusion 29 8.1 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 A Stata do File 35 3
  • 3. 1 Introduction Professional sports provide an ideal setting for studying labor economics, since relevant infor- mation such as workers name, popularity and production measures is common knowledge. Due to the nature of having near perfect information, economists frequently use profes- sional sports markets as a way of studying topics such as discrimination, monopsony power, ine ciencies and salary determination for players. This paper attempts to estimate the rela- tionship between salary and measures of marginal revenue product of labor for NHL goalies. The MRPL of a goalie is the additional revenue generated by adding said goalie to your team. Goalies’ MRPL is heterogeneous and depends not only on their own characteristics, but also on those of their team. The current consensus in the literature is that NHL goalies’ wages are primarily a func- tion of their on-ice performance statistics. However, due to the rapid growth in both the profitability and popularity of the NHL, following the locked-out season of 2004–2005, the functional relationship between wage and measures of production may have changed. It is the opinion of the author that in the post lock-out era of the NHL, wages are not only based on on-ice performance statistics relating to winning, but also on o↵-ice measures that relate to a players’ popularity. Using OLS methods, this paper will test this hypothesis by regress- ing the natural log of wages on both measures of performance and measures of popularity in order to determine whether including measures of popularity improves the measure-of-fit for the regressions. The results indicate that once a goalie’s popularity has been accounted for, a much larger variation in their wages can be explained than before. The remainder of this paper will be presented as follows. Section 2 will consider the literature on the NHL, section 3 will discuss the impact of the NHL lock-out and how this a↵ected the traditional relationship between wage and MRPL. 
Section 4 will provide a simple theoretical model for wage determination in the NHL and section 5 will discuss all of the potential variables relating to a goalie’s wage. Due to the fact that this paper is an extension of work completed by Berri and Brook (2010), section 6 will discuss the deviations from their research in this paper. Section 7 will present the empirical estimates and results of the paper by describing the data set,discussing the OLS assumptions required for the estimates to be BLUE, and finally by analyzing the results of the regressions provided. Section 8 will summarize the findings of this paper and suggest potential areas of future research. 4
  • 4. 2 Literature Review Due to the fact that the National Hockey League (NHL) is the smallest of the four major sports leagues in North America, both in terms of size and revenues, there are far fewer studies on hockey then there are on baseball, basketball and football, and the literature on goalies is even more limited. With that said, there are still a significant number of papers on hockey; which are mainly divided into two categories: wage discrimination and wage determination. Due to the lack of literature on the specific subject of NHL skaters and goalies, this paper will also draw from literature on the national basketball association (NBA) and major league baseball (MLB) in order to produce the best estimates possible. 2.1 Wage Discrimination The majority of the literature before the 2004–2005 lock-out was primarily focused on wage discrimination in the NHL.The main determinants of discrimination seem to be the reserve clause and a person’s country of birth. Studies on wage discrimination, as a response to the reserve clause, began in the 1970’s when Scully demonstrated that the reserve clause in MLB, similar to restricted free agency in the NHL, allowed teams to exploit players due to their monopsony power[2]. Following Scully’s model of discrimination in MLB and applying this to the NHL, David Richardson[3] tested the existence of wage discrimination using data from the 1990s and found no evidence of wage discrimination due to the reserve clause. Following the locked-out season of 2004–2005, Lambrinos and Ashman (2007)[4] furthered this claim by testing for wage discrimination, as a result of the reserve clause, against forwards and defensemen and found that arbitrated salaries (salaries for restricted free agents that were determined by an arbiter in an arbitration hearing) were not statistically di↵erent from negotiated salaries. 
Among the first to apply the theory of wage discrimination regarding geographical loca- tion to hockey were Jones and Walsh (1988) [7], who used data from the 1987–1988 NHL season to conclude that there was no wage discrimination against French-Canadian players. They note, however, that due to limited data, their findings may lack external validity in later time periods. Around the same time, using data from later NHL seasons, Lavoie, Gre- nier and Coulombe (1987)[5] and Grenier and Lavoie (1988)[6] found statistically significant evidence of hiring discrimination against French-Canadians, specifically defensemen. They argued that factors such the language barrier made these players less valuable to American and Anglo-Canadian teams; consequently, these players often accepted lower wages for the same level of productivity. Later work by Jones and Wash (1999)[8] and McLean and Veall (1992)[9] contradicted these findings by showing that there was no statistically significant 5
  • 5. evidence of wage discrimination based on birth location. More recent work, such as that of Lebo (2006)[10], found evidence that Europeans, not counting defensemen, were paid more then Canadians, and that American forwards were also paid more than Canadians. Con- sidering an entirely di↵erent sample of players, Reader and Sommers (2009)[11] tested and found evidence that Russians were the highest paid goalies in the NHL, while Americans were the lowest. 2.2 Wage Determination Economists have been using professional sports to study labor markets due to the large number of performance statistics available; enabling them to calculate the marginal revenue product of labor of athletes much more e ciently than for workers in other professions. While this concept will be developed further in this essay, the existing literature on wage determination will be discussed in order to establish a foundation for this work. The majority of the literature, however, only applies to forwards and defensemen due to the di↵erent set of performance statistics for skaters (forwards/defensemen) and goalies. Prior to the locked-out season of 2004–2005, most of the literature considered team and teammate e↵ects on compensation. Economists such as Idson and Kahane[12][13] were able to show how team e↵ects and franchise location were statistically significant when looking at the determinants of players’ wages. Kahane[13] claimed that di↵erences in wages due to franchise location could partially be explained by di↵erences in team revenues; however, goalies were not considered in these papers due to the di↵erent nature of assessing a goalie’s on-ice productivity. Following the locked-out season of 2004–2005, researchers began assessing goalies as well as skaters. One of the first to look into the determinants of NHL goalies’ wages was Watterson (2009)[14]. 
He considered variables such as: games played (GP), games started (GS), wins (W), losses (L), ties (T), overtime losses (OT), goals against (GA), goals against average (GAA), saves (S), save percentage (SV%) and shutouts (SO). Out of the variables considered, he found that GP, GS, W, GA, and SO were statistically significant in determining goalies’ wages. Berri and Brook (2010)[1] tested and found that SV%, L.SV%1 , sq. age, TOI, and L.TOI were statistically significant determinants of wage. Following the work of Berri and Brook, Fuller (2012)[15] considered variables such as GP, W, L, OT, GAA, SV%, SO, shootout wins (SOW), shootout losses (SOL), and shootout save percentage (SOSV%); he only found games played to be statistically significant. At the same time, Pantano (2012)[16] investigated and found that height was also a statistically significant determinant of wages. 1 L.SV is the lag of save percentage. 6
  • 6.
3 The effect of the 2004–2005 NHL Lockout

3.1 On Players

According to Paul Staudohar[17], the main reasons for the 2004–2005 NHL lock-out were: “higher player fines for misbehavior, reducing the schedule of games, minimum salaries, playoff bonuses for players, free agency, operation of the salary arbitration process, and revenue sharing” [17]; the most important issue for the league (though not for the players) was implementing a salary cap. While some of these issues are not necessarily relevant to the determinants of goalies’ wages, certain outcomes of the lock-out do have an impact. Resolutions such as reducing the size of all goaltenders’ equipment and limiting the area behind the net where a goalie can go can potentially have a significant impact on the number of goals a goalie lets in during any given game. Combining these changes with the introduction of a salary cap and a general 24% cut in wages means that the salary-performance relationship that existed prior to the 2004–2005 lock-out may no longer exist. For this reason, in order to avoid generating biased results due to changes in the NHL collective bargaining agreement, this paper will only use data following the 2004–2005 locked-out season.

3.2 On Owners

Prior to the 2004–2005 locked-out season, the average value of an NHL team was $163,000,000 and the majority of teams were operating at a loss[20]. After the locked-out season, the average value of an NHL team rose to $413,000,000 and a majority of teams were operating at a profit. Given the lack of profitability for teams prior to the lock-out, it makes sense that the majority of papers (e.g. Vincent and Eastman[19], Watterson[14], Richardson[3] and Berri and Brook[1]) assume that firms, in this case teams, have the sole goal of maximizing wins².
Considering the change in profitability of NHL teams after the lock-out, it is fair to assume that while wins could have been the sole factor in determining owners’ (firms’) profits prior to the lock-out, this is no longer the case in the period following the lock-out. Another reason that supports this theory is the introduction of the salary cap

² In Berri and Brook’s paper they state that “The ultimate objective in hockey is to win the Stanley Cup” [1]. The assumption is also evident from the fact that in each of these papers the authors model players’ wages as a function of traditional performance metrics only, without considering factors such as popularity or jersey sales.
  • 7. mentioned above. The introduction of the salary cap not only created a more competitive league, but also forced teams to reconsider how they pay players. Rich teams could no longer simply pay large sums of money to acquire the best players; instead, teams needed a way to evaluate players and pay them according to their performance. Having considered these reasons, this paper will develop a standard model of firm profit maximization in order to account for all of a player’s expected marginal revenue product of labor, not just that relating to wins, in an attempt to produce a more effective model of wage determinants for NHL goalies.

4 The Model

Neoclassical economic theory states that in a competitive goods market, where prices are given, a firm maximizes profits by maximizing revenues relative to the cost of production. The fixed prices assumption implies that prices adjust sufficiently slowly to accommodate changes in demand, making prices effectively given. We can mathematically derive a goalie’s wage from the fact that firms maximize profits:

$$\Pi = pF(L, Z) - wL \tag{1}$$

therefore,

$$\frac{\partial \Pi}{\partial L} = 0 \implies w = p\,\frac{\partial F(L, Z)}{\partial L} = \text{MRP}_L \tag{2}$$

where output, F, is a function of both labor (L) and exogenous measures of profitability independent of the goalie on the team (Z). In terms of ice hockey, Z can be thought of as the population of the city in which a team is based, where larger cities have more people to sell merchandise to regardless of who is hired as the goalie. If

$$\frac{\partial^2 F}{\partial L\,\partial Z} > 0 \tag{3}$$

then L and Z are complements, implying, in the context of the previous example, that as cities grow, a goalie’s profitability will grow; and if

$$\frac{\partial^2 F}{\partial L\,\partial Z} < 0 \tag{4}$$

then L and Z are substitutes, implying that the exogenous characteristics reduce the profitability of goalies. For simplicity, it is assumed that L and Z are complements. Equation 2 can be rewritten as:

$$w = \text{MRP}_L = \text{MP}_L \cdot P = \text{MP}_L \cdot \text{MR} = F(X) \tag{5}$$
  • 8. where wages are a function of characteristics X, representing all the available indicators of a goalie’s quality and the exogenous characteristics relating to his profitability. Following the work of Peck (2012), a goalie’s marginal product of labor includes all available performance indicators, which will be introduced later. Although Peck’s paper refers specifically to forwards and defensemen, he states that “For the team owner the MPL includes performance indicators like goals, assists, career games (experience), etc.”[21] In this paper, the performance measures will be adjusted to reflect goalies rather than skaters. In relation to marginal revenue, Peck states that “Owners also factor in the additional revenues likely to be generated by signing a particular player as well. This is the marginal revenue factor. Usually, this manifests itself in the sale of official licensed gear with that player’s name on it, like jerseys[...Therefore] teams will pay a player more if they believe that by hiring him, they not only will have better success on the ice, but they can also sell more licensed merchandise.”[21] This consideration is important because it was the first time in the literature that factors other than performance measures were considered when determining players’ wages. The current work will extend Peck’s assumptions regarding the impact of marginal revenue to the salaries of NHL goalies.

In practice, when a team wants to sign a goalie to a contract (assign a wage), it has no way of knowing at that time what the goalie’s MRPL will be. This is because in the NHL, as in all other sports leagues, a player’s contract (wage) is determined before he plays for the team under it. For this reason we will define wage as:

$$w = E(\text{MRP}_L) = E[F(X)] \tag{6}$$

where X can be thought of as all the observed characteristics that affect the marginal revenue product of labor.
Although there are unobserved characteristics that affect MRPL, for the analysis to be internally valid it has to be assumed that these characteristics are uncorrelated with the observable characteristics in X.

4.1 Monopsonistic Characteristics

It is important to understand that in a completely competitive market (for labor and goods) both wages and prices are fixed. For employers, this means that they will continue to hire workers at the given wage until the extra revenue generated by the last worker is equal to his wage (MR = MC). In the context of the NHL, while the market for goods is
  • 9. competitive, the market for labor is not. In the NHL, there is a strict limit on the number of goalies allowed on any given roster, which limits the number of goalies teams can hire while also giving them monopsony power over the goalies, as they are the ‘only’ source of revenue for these goalies³. Therefore, instead of hiring goalies at a single wage until the MR of the last goalie equals his MC, teams can be thought of as hiring the maximum number of allowed goalies and paying each of them a wage equal to his E(MRPL), which is lower than it would be in a competitive labor market.

5 Variables to Consider

In order to determine the relationship between wage and expected MRPL, we must first define what factors are associated with a goalie’s MPL and MR.

5.1 Measuring MPL

While there are many measures of productivity with which to evaluate the MPL of skaters, such as goals, assists, points, plus-minus, penalty minutes, shots, hits, etc., there are at most six variables that relate to a goalie’s productivity. This is because, while skaters have many responsibilities on the ice, the goalie has only one task: to stop the puck. These variables are GS, shots on goal (SOG), GA, GAA, SV% and wins. Wins and games started are fairly self-explanatory; shots on goal refers to the number of shots a goalie has faced in any given season; and goals against is the number of goals a goalie has let in during any given season. The difference between shots faced and goals against is the number of saves a goalie made. It is important to note that GAA is not simply the average number of goals a goalie allows per game, but more specifically the average number of goals he allows per 60 minutes. This is significant because if a goalie has a really bad start and gets pulled, his poor performance during the short time he was on the ice is scaled up to a full 60 minutes rather than being understated by the limited time he played.
For example, if a goalie lets in three goals in the first period and then gets pulled, his GAA will be 9 instead of 3 for that game. Since the mean GAA of our sample is 2.65 with a standard deviation of 0.36, a GAA of 9 is much more reflective of a very poor performance than a GAA of 3. A goalie’s save percentage is the number of saves he has made in a season divided by the shots he faced in that same season:

$$\text{Save\%}_i = \frac{\text{Saves}_i}{\text{ShotsFaced}_i} \tag{7}$$

³ As there is no comparable league that pays similarly to the NHL in terms of goalie compensation.
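As a rough illustration of the two rate statistics above, the sketch below computes GAA on a per-60-minute basis and save percentage as in equation 7. The function names and the assumption of a 20-minute period are illustrative and are not taken from the original analysis; the 2.65 mean is the sample figure quoted in the text.

```python
def goals_against_average(goals_against, minutes_played):
    """Goals allowed per 60 minutes of ice time (GAA)."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return goals_against * 60.0 / minutes_played


def save_percentage(saves, shots_faced):
    """Share of shots faced that were saved (equation 7)."""
    if shots_faced <= 0:
        raise ValueError("shots_faced must be positive")
    return saves / shots_faced


# Three goals allowed in one period (assumed to last 20 minutes), then pulled.
pulled_gaa = goals_against_average(3, 20)
# The same three goals spread over a full 60-minute game.
full_game_gaa = goals_against_average(3, 60)

SAMPLE_MEAN_GAA = 2.65  # sample mean quoted in the text
# Distance of each single-game GAA from the sample mean.
print(pulled_gaa - SAMPLE_MEAN_GAA, full_game_gaa - SAMPLE_MEAN_GAA)
```

Scaling by minutes rather than by games is what separates the pulled goalie from one who allowed the same three goals over a full game.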
  • 10. Squared save percentage will also be considered to account for a possible nonlinear relationship between save percentage and wage.

Following the work of Pantano (2012), height can also be considered as related to a goalie’s MPL. This is an interesting variable to consider because of the effects of the 2004–2005 lock-out. In order to improve the overall experience for fans, the NHL made a number of rule changes intended to increase scoring. The most significant of these was an 11% reduction in the overall size of goalies’ equipment[24]. Where before a goalie could wear pads of any size he wanted, he now had to wear pads sized relative to his body. While the aim of this policy was to increase scoring, it also gave larger, taller goalies an advantage over their shorter counterparts because they could wear larger pads. Neoclassical economic theory also suggests that age and experience, as well as their squares (included to model possible nonlinear relationships), will be statistically significant factors of a goalie’s MPL.

5.2 Measuring MR

Following the work of Peck (2012), a player’s MR can be estimated based on the number of All-Star Game appearances a goalie has made in his career up to the point of his contract expiration. Peck states, “The All-Star variable has this unique characteristic because fans directly select players to perform in the All-Star game through a voting ballot, making this variable appropriately related to fan preference”[21]. While this paper, following the analysis of Peck (2012), will consider All-Star Games as a variable, it is important to note that NHL All-Stars are, for the most part, not determined by fans. Prior to 2016, of the 42 players that comprised the All-Star Game, the fans selected only 6 [22]. In 2016, of the 44 players that comprise the All-Star Game, fans selected only 4 [23].
To correct for the problem of measuring popularity with All-Star Games, average monthly Google searches will be used as a proxy for popularity, to account for a player’s MR. In an ideal world, I would have exact data about revenues from player-specific merchandise; however, this information is not publicly available. Because exact data on average monthly Google searches was not available for each player, the data was instead estimated using Google Adwords and Google Trends. Google Adwords provides an actual number of Google searches for the past 24 months; Google Trends indexes the number of searches per
  • 11. month from 2005 to the present on a scale from 0 to 100. Using both of these tools, it is possible to estimate the average number of monthly Google searches for a player in the last year of his contract. It is important to note that while this method can be considered a strong proxy for fan popularity, the variable has its drawbacks. Many people share the same name, so results for very common names, like Mike Smith, could be biased. To correct for this when using Google Adwords, a keyword most reflective of the player was used. For example, instead of using the average monthly searches for the general name Mike Smith, which would include anyone who typed keywords such as Mike Smith, Michael Smith, Smith Michael, or Michael Smith blog, I used searches for Smith Goalie, which would include anyone who typed Mike Smith goalie, Michael Smith NHL goalie, or smith goalie. Mike Smith goalie had 43,000 average monthly Google searches, while Michael Smith NHL goalie had only 1,010, which is clearly more reflective of the NHL goalie’s popularity. In addition, it is important to note that the majority of goalies in our sample have fairly unique names and therefore would not suffer from this ‘Mike Smith’ bias. Considering all of the above, it is reasonable to assume that it is still valid to use Google searches as a strong proxy for popularity.

Another variable that can be considered as a factor of MR is capacity. In theory, the higher the percentage of seats a team can fill, the greater its ability to pay players higher wages. However, it is possible that the salary cap makes this variable obsolete, as all teams now have essentially the same ability to pay their players. The introduction of revenue sharing could also render capacity obsolete, as it further increases small-market teams’ ability to pay players fair wages regardless of the number of seats they fill per game.
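The text does not spell out how the Adwords counts and the Trends index were combined into the Google search estimate described earlier in this subsection. One plausible approach, sketched below purely as an assumption, is to use the months where both sources overlap to convert Trends index points into search counts, and then apply that conversion to the index values from the last year of the contract. All function names and numbers here are hypothetical.

```python
from statistics import mean


def searches_per_index_point(adwords_counts, trends_index):
    """Searches represented by one Trends index point, from overlapping months."""
    if len(adwords_counts) != len(trends_index):
        raise ValueError("both series must cover the same months")
    total_index = sum(trends_index)
    if total_index == 0:
        raise ValueError("Trends index is zero over the overlap period")
    return sum(adwords_counts) / total_index


def estimate_avg_monthly_searches(contract_year_index, scale):
    """Average estimated monthly searches over the contract-year months."""
    return mean(index * scale for index in contract_year_index)


# Hypothetical overlap months: actual counts (Adwords) and index values (Trends).
overlap_counts = [1000, 1200, 800]
overlap_index = [50, 60, 40]
scale = searches_per_index_point(overlap_counts, overlap_index)

# Hypothetical Trends index values from the final contract year.
estimate = estimate_avg_monthly_searches([10, 20, 30], scale)
print(scale, estimate)
```

A single conversion factor assumes the Trends index is proportional to search volume over the whole period, which is itself an approximation.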
5.3 Other Potential Variables

Another potentially significant factor that has not previously been considered in the literature is the bargaining power of goalies. In competitive labor markets labor is mobile, meaning that if one firm refuses to pay a worker a wage equal to his MRPL, another firm will; the NHL, by contrast, can be seen as a type of monopsonist employer. This is because there are only 60 ‘jobs’ available for goalies in the NHL, and outside of it there is nowhere they can go to receive an even slightly comparable wage⁴. This is suggestive of the

⁴ Besides the NHL there is also the American Hockey League, leagues in all of the European countries, and the KHL in Russia. None of these leagues can even come close to matching the compensation that NHL teams pay goalies.
  • 12. fact that NHL teams could exploit monopsony power over goalies in situations where there are many free agent goalies at one time, because there are only a limited number of roster spots; consequently, the goalies would need to accept a wage less than their MRPL in order to guarantee a spot on a team. For this reason I will consider bargaining power as a factor determining wages when a team is looking to sign a goalie to a new contract. This measure will be proxied by the variable: total number of NHL-caliber free agent goalies. It is defined as the number of goalies whose contracts expired in the relevant off-season and who played at least one minute in the NHL.

6 Deviation From Original Paper

While this paper follows the methods of Berri and Brook (2010) in attempting to estimate a relationship between wage and MRPL, two major deviations from their sample should be noted.

6.1 Restricted vs Unrestricted

First, following the work of Richardson [3] and Lambrinos and Ashman [4], restricted free agents will be included in the data set, as there is no evidence in the literature that their wages are determined in a different manner from those of unrestricted free agents. A binary variable for restricted free agent, RFA, will be used as a control in case such differences exist within this data set.

6.2 Free Agency

Second, and most importantly, this paper provides a strict definition of a free agent, but first it is crucial to discuss how NHL contracts and NHL free agency work. Firstly, there is no rule in the collective bargaining agreement that requires an NHL team or player to publicly reveal the contents of a player’s contract. Despite this, the NHLPA has been releasing players’ salaries since before 1990⁵. Further, players and their agents have been releasing players’ contract details as well as their salaries in an effort to strengthen their bargaining power in contract negotiations.
So while not all NHL players’ salaries and contracts are public, many of them are. Secondly, in the NHL free agency system any player becomes a free agent on July 1st of the year his contract expires. If a player about to become

⁵ While the exact date on which the NHLPA began releasing salaries is not known, it is not relevant for this paper.
  • 13. a free agent chooses to re-sign with his current team, or is traded and signs with the new team before July 1st, that player will no longer be considered a free agent. In their paper, Berri and Brook define a free agent as someone whose contract would have expired in the relevant off-season, regardless of whether he signed a new contract before the start of free agency on July 1st. While the author agrees with this definition, it requires private information which the author cannot obtain. In an email on file, Stacey Brook stated that he collected free agency data from Hockey News (magazine), through reports on upcoming free agents produced before the season was over. He suggested that for my data I use the website Spotrac. Upon further examination of the website, however, it became clear that the information it provides is inconsistent and not always correct⁶. In an email on file, Julie Young, the director of communications for the NHL, stated that the “only” information the NHL releases publicly is the yearly July 1st free agent lists, and that she could not provide a list of players who would have become free agents had they not been traded or re-signed prior to free agency. Given all the above, this paper will use the traditional definition of a free agent: any player who, as of July 1st, has an expired previous contract and has not yet signed with a new team.

6.2.1 Sample Selection Bias

As discussed above, given the public information available to the author at the time of writing, only the free agents in any given off-season are known, as opposed to all the players whose contracts would have expired. The issue this causes is that any goalie who would have become a free agent, but re-signed before free agency, does not become part of our sample. What is included in our sample is all the NHL goalies whose teams, on the expiration of the goalie’s contract, waited for free agency to see if they could get another goalie⁷.
Thus, the sample comprises goalies whose teams do not want them anymore and who therefore become free agents. Since our sample only includes goalies who played at least 1,000 minutes in all relevant seasons⁸, it is effectively made up of those whose recent production was not high enough to guarantee them a spot on their old team, but not so poor as to prevent other teams from giving them a chance. These sample selection issues may lead

⁶ When comparing information on Spotrac with that provided by the NHL, discrepancies were found regarding the terms of the contract upon expiration, i.e. whether the player became a UFA or RFA upon contract expiration. Therefore, Spotrac cannot be seen as a reliable source.
⁷ This can be assumed because if a team was sure its current goalie was better than all the available upcoming goalies, it would simply extend the goalie’s contract prior to free agency.
⁸ The importance of this restriction will be discussed later.
  • 14. to problems when attempting to model E(MRPL) because they could potentially lead to the estimation of an inaccurate relationship between production and wage. Specific problems caused by this sample selection bias will be discussed later, using the assumptions necessary to run OLS regressions with estimates of β that can be considered BLUE⁹.

7 Data

7.1 The Sample

Because the data set is made up of three years of data for all goalies (their last contract season, as well as the seasons before and after the final contract year), it can be classified as an independently pooled cross-section data set. The data set is made up of performance and production related statistics for 44 goalies who became free agents between the end of the 2005–2006 season and the start of free agency following the 2013–2014 season. Following the work of both Jenkins (1996)[25] and Berri and Brook (2010)[1], only free agents will be included in our sample because, “Including players in the midst of a long-term contract results in measurement error in a salary regression.”[1] Free agents were determined from the relevant lists provided to the author by the NHL. Following the work of Berri and Brook, only goalies who played 1,000 minutes in each of the two seasons before their contract expired, as well as 1,000 minutes in the year they signed their new contract, are included, in order to prevent goalies who played fewer than 1,000 minutes with extreme (very bad or very good) results from biasing the data. Salary data was collected from ‘HockeyZonePlus’ for the seasons prior to 2011–2012 and from ‘Spotrac’ for all data after the 2011–2012 season¹⁰. Taking into account the previous literature on labor economics and the NHL, this paper will consider the log of wages instead of their absolute value in order to better examine the relationship between wage and E(MRPL).
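The inclusion rule and the log transformation described above can be sketched as follows. The records and field names are hypothetical, and the use of the natural logarithm is an assumption (the text says only “the log of wages”); this is an illustration, not the procedure actually used to build the data set.

```python
import math

MIN_MINUTES = 1000  # threshold quoted in the text


def qualifies(minutes_by_season):
    """True if the goalie reached the threshold in every relevant season."""
    return all(minutes >= MIN_MINUTES for minutes in minutes_by_season)


# Hypothetical records: minutes in the two seasons before the contract expired
# and in the first season of the new contract, plus the new salary in dollars.
goalies = [
    {"name": "Goalie A", "minutes": (2100, 1850, 2400), "salary": 1_500_000},
    {"name": "Goalie B", "minutes": (950, 2000, 2200), "salary": 900_000},
]

# Keep qualifying goalies and store the (assumed natural) log of salary.
sample = [
    {"name": g["name"], "log_salary": math.log(g["salary"])}
    for g in goalies
    if qualifies(g["minutes"])
]
print(sample)
```

Goalie B is dropped because one of the three seasons falls below the threshold, even though the other two do not.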
All production (performance) related statistics, including SV%, GS, age, experience, and height, as well as their lags and leads, were collected from the NHL’s website. Population data was collected from the government websites of both the United States¹¹ and Canada¹². Google searches, as discussed above, were collected by using both Google Trends and Google

⁹ BLUE refers to the coefficient estimates being the best (most efficient) linear unbiased estimator.
¹⁰ While Spotrac was found to be an unreliable source of contract information regarding RFA and UFA status, it was found to have reliable data on yearly salaries.
¹¹ http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk
¹² http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/demo05a-eng.htm
  • 15. Adwords. Information on both All-Star Games played and arena capacity was collected from Wikipedia. Data on average attendance per game was collected from ‘HockeyDB’. Capacity is calculated as the average percentage of seats filled per game during the season. Table 1 presents descriptive statistics for all of the relevant variables.

Table 1: Descriptive Statistics

Variable                        Mean         SD           Median      Minimum    Maximum      n
Log Salary                      14.430       0.640        14.301      13.122     15.607       44
Save %                          0.909        0.011        0.909       0.883      0.930        44
Squared Save %                  0.826        0.021        0.826       0.780      0.865        44
Experience                      6.773        3.917        5.5         1          18           44
Height                          73.423       2.036        72.8        68.9       78           44
Age                             31.023       5.151        31          23         42           44
Games Started                   37.977       14.161       37          19         72           44
TOI                             2,283.045    805.6107     2,103       1,108      4,305        44
Capacity                        0.921        0.084        0.950       0.725      1            44
No. ASG                         0.659        1.642        0           0          9            44
Avg. Monthly Google Searches    14,272.98    21,937.20    5,806.50    0          88,283.21    44
No. Free Agent Goalies          28.023       2.205        29          24         31           44
Population                      2,469,391    2,909,200    1,235,150   165,521    8,175,133    44

Following neoclassical economic theory, both age and experience are expected to have positive effects on salary, while squared age is expected to have a negative effect because after a certain age goalies are not able to perform as well as they did in their prime. On the other hand, if a nonlinear relationship exists between wage and experience, then I would expect squared experience to have a positive effect on wages. In addition, following the work of Pantano (2012), height is expected to have a positive effect on wages due to the post-lock-out benefits of being a taller goaltender.

One would expect the variable games started to have a positive correlation with salary, if the act of starting a game is valued highly enough to generate greater payment. In a case where two goalies have an equal save percentage, say 90%, but one started 60 games and the other only 20, the former should arguably be compensated more than the latter.
For the same reasons as mentioned for games started, TOI is expected to have a positive relationship with salary. In order to avoid potential problems of collinearity between GS and TOI (they have a correlation of 0.5083), only TOI will be considered, as it is a
  • 16. more inclusive measure of ice time and games started.

As mentioned above, while it is reasonable to expect capacity to have a positive effect on goalies’ wages, the introduction of revenue sharing and the salary cap could dampen the effectiveness of capacity as an explanatory variable. Acting as proxies for fan popularity, which can be represented by merchandise revenues per player among other things, the number of All-Star Games and the number of Google searches are expected to have a positive effect on a goalie’s wage. Population is also expected to have a positive impact on a goalie’s wage because in larger cities more merchandise can be sold to fans, thereby increasing a goalie’s marginal revenue.

As the main measure of a goalie’s production, save percentage is expected to be positively related to wage. This is because as a goalie produces more wins for his team, he will be compensated for the increased revenue generated from winning. As discussed earlier, and shown in section 7.2.3, the sample selection bias present in this paper will negatively affect the estimated impact of save percentage on wages. With regard to squared save percentage, the expected sign is unknown. If teams believe that an above-average save percentage will generate exponentially more wins, then the coefficient on squared save percentage will be positive; if, on the other hand, they believe that after a certain point increasing a goalie’s save percentage by 1% will generate less than 1% more wins, then there will be a negative relationship between squared save percentage and wages. Due to the lack of variation in goalies’ save percentage, the author predicts the former; however, when considering the inconsistent nature of goalies’ performance, as will be discussed later, the latter seems more likely, as teams would not be willing to pay an exponentially higher wage to a goalie who cannot guarantee above-average future production.
7.2 Estimation Process

In order to show that our OLS estimates of the coefficients are not only valid but also BLUE (best linear unbiased estimator), we must first make assumptions about our data set. After showing mathematically how these assumptions make our estimates unbiased, a theoretical explanation will be presented concerning the validity of these assumptions.

7.2.1 Model Specification

The first OLS assumption is that the model can be written linearly, where y is a function of observed and unobserved characteristics:

$$y = X\beta + u \tag{8}$$
  • 17. where y, wages, is an observed (n x 1) vector; X is an observed [n x (k + 1)] matrix of different factors of a goalie’s MRPL for the relevant years; and u is an (n x 1) vector of unobserved characteristics (the error term). In other words, this assumption states that the relationship between the dependent variable and the independent variables is linear in parameters. It is important to note that variables like squared age and squared save percentage are still considered to be linear, even though they model a non-linear relationship between the dependent and independent variables. This is because they are introduced into the regression as independent variables separate from their non-squared counterparts, giving a linear relationship between wage and save% / age as well as another linear relationship between wage and Sq. save% / Sq. age. While there is a possibility that a non-linear relationship exists between wages and MRPL, all of the previous literature on wage determination has only considered wages as a linear function of MRPL.

7.2.2 No Perfect Collinearity

The second assumption for unbiased OLS estimates is that there is no perfect collinearity between regressors. Empirically this means that matrix X has rank (k + 1): none of the regressors (the variables that make up MRPL) can be written as an exact linear combination of the others. This assumption is vital because if it does not hold, β cannot be uniquely estimated; and even when it holds, a high degree of correlation between regressors inflates the variance of the estimates of β. In order to test for collinearity between regressors, one must check the variance inflation factor (VIF) of the regression. The VIF for each independent variable i is calculated as:

$$\text{VIF}_i = \frac{1}{1 - R_i^2} \tag{9}$$

where $R_i^2$ is the R-squared from regressing variable i on all of the other regressors. The VIF measures how ‘inflated’ the variance of a coefficient is due to its correlation with other regressors.
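To make equation 9 concrete, the sketch below computes the VIF in the simplest case of two regressors, where the R-squared from regressing one on the other equals their squared Pearson correlation. It uses a hypothetical set of ages (every integer from 23 to 42, the range reported in Table 1) rather than the actual sample, so the result will not match the paper’s figures; it only illustrates why a variable and its square tend to have a very large VIF.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_from_r_squared(r_squared):
    """Variance inflation factor for a given auxiliary R-squared (equation 9)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Hypothetical ages: one goalie at each age in the range reported in Table 1.
ages = list(range(23, 43))
ages_squared = [a ** 2 for a in ages]

# With only two regressors, the auxiliary R-squared is the squared correlation.
r_squared = pearson_r(ages, ages_squared) ** 2
age_vif = vif_from_r_squared(r_squared)
print(r_squared, age_vif)
```

With more than two regressors, the auxiliary R-squared has to come from a full regression of each variable on all the others, which is what a statistical package’s VIF routine does.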
The VIF is bounded below by 1, where a regressor with VIF = 1 is not correlated with any other regressor, and it has no upper bound. Traditionally, a VIF > 10 implies collinearity between regressors, meaning the variance of the estimate for the relevant variable is being increased by the presence of another variable in the regression. Below is a table reporting the results of our collinearity test for the regression modeling w = E(MRPL)¹³. The variables with a high VIF, age and age squared, are expected to be highly correlated with one another as they are directly related. For these variables, increased variance is not a problem because including both allows for a non-linear relationship between wage and age. Considering that the rest of the variables all have a VIF well below 10 (the largest being 5.10, for RFA), this paper can take the

¹³ Specifically, this refers to Regression 3 of Table 3, labeled ‘Significant’.
  • 18. second OLS assumption as being true. In addition to the variables discussed above, Games Started was found to have a very high correlation, 0.9884, with time on ice; it was therefore omitted to avoid collinearity problems with the coefficient on TOI.

Table 2: Results of VIF Test

Independent Variable        VIF
Age                         323.19
Sq. Age                     286.37
Save% Last Season           1.15
Save% 2 Seasons Ago         1.50
RFA                         5.10
Time on Ice Last Season     1.38
Capacity                    1.37
Google Searches             1.25
No. Free Agent Goalies      1.38

7.2.3 Zero Conditional Mean

The third OLS assumption is that the error term has a zero conditional mean:

$$E(u \mid X) = 0 \tag{10}$$

The expected value of the error term should be zero given any values of X. This assumption is implied to be true if the data is collected from a random sample. Unfortunately, as discussed earlier, given the information publicly available to the author at the time of writing, it was impossible to obtain a random sample. In fact, the data suffers from sample selection bias. Without this assumption it is not possible for this paper to present estimators that are BLUE, but since the bias affecting the data is known, it is possible to understand how the estimates are being affected by it, thus allowing this paper to proceed with running OLS regressions. It is crucial to note that, without proving the validity of this assumption, statements about the causal relationships between our independent regressors and our dependent variable need to be made with caution, as the sample lacks internal validity. In order to understand how this bias affects our estimates, a theoretical framework must first be developed. Using equation 8, it is possible to state:

$$u = y - X\beta \tag{11}$$
  • 19. Knowing that OLS uses the estimates of β that minimize the sum of squared residuals, we can write:

$$SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta}) \tag{12}$$

Differentiating the equation above with respect to $\hat{\beta}$ and setting the derivative equal to zero gives:

$$-2X'(y - X\hat{\beta}) = 0 \tag{13}$$

Rearranging for $\hat{\beta}$ then gives the estimate of beta:

$$\hat{\beta} = (X'X)^{-1}X'y \tag{14}$$

Substituting $(X\beta + u)$ for y in equation 14 yields:

$$\hat{\beta} = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u \tag{15}$$

Taking expectations conditional on X yields:

$$E(\hat{\beta} \mid X) = E[(X'X)^{-1}X'X\beta \mid X] + (X'X)^{-1}X'E(u \mid X) \tag{16}$$

Because $(X'X)^{-1}X'X$ is an identity matrix and $E(\beta \mid X) = \beta$, equation 16 can be rewritten as:

$$E(\hat{\beta} \mid X) = \beta + (X'X)^{-1}X'E(u \mid X) \tag{17}$$

In normal circumstances, where $E(u \mid X)$ equals zero, the expected value of $\hat{\beta}$ would equal β. Here, because the zero conditional mean assumption has been violated, our estimates of beta will not be equal in expectation to the true β of the population, but will instead be biased. Specifically, the effect of the bias can be seen through the correlation between the unobserved variable and X, as well as the correlation between the unobserved variable and y. Relating this to the sample of this paper, there is some unobserved variable, call it z, that is negatively correlated with save percentage and positively correlated with wage. Due to the presence of this unobserved omitted variable, the estimate of $\hat{\beta}$ on save percentage will be negatively biased. In the author’s opinion, one candidate for the unobserved omitted variable is a variable related to, or reflective of, the inconsistent nature of goalies’ performance.

7.2.4 Homoscedasticity

The fourth and final assumption needed for our statistical estimates to be BLUE is that the error term is homoskedastic and not serially correlated. This can be written as:

$$Var(u \mid X) = \sigma^2 I_n \tag{18}$$
where $I_n$ is an $(n \times n)$ identity matrix. Since the data set is made up of independently pooled cross-sectional data, the assumption that the error terms are not serially correlated automatically holds.

Regarding heteroscedasticity, a simple Breusch-Pagan / Cook-Weisberg test can be performed on the final regression of Table 3, labeled 'Non-Linear Sig'. In Stata, this test is performed by testing whether $t = 0$ in the following regression:

$$Var(\epsilon) = \sigma^2 e^{zt} \tag{19}$$

where z is equal to all of the fitted values of y; if $t = 0$, then the variance of $\epsilon$ would equal:

$$Var(\epsilon) = \sigma^2 I_n \tag{20}$$

The p-value from running the Breusch-Pagan / Cook-Weisberg test for heteroskedasticity is equal to 0.7639. Since the null hypothesis of this test is constant variance ($t = 0$), a p-value of this size does not by itself reject homoskedasticity. Because heteroskedasticity can nevertheless be problematic if present, this paper takes the precaution Stata offers, which allows the user to run a 'robust' regression to account for this issue. This solution is called the 'Huber and White' robust standard errors, and is also referred to as the sandwich estimator. This way of calculating standard errors slightly adjusts the usual calculation to allow for a non-constant variance of y. Starting with equation 14, we can proceed with deriving the sandwich estimator:

$$\hat{\beta} = (X'X)^{-1}X'y \tag{21}$$

Therefore, the variance of $\hat{\beta}$ can be calculated as:

$$Var(\hat{\beta}) = Var[(X'X)^{-1}X'y] = (X'X)^{-1}\,Var(X'y)\,(X'X)^{-1} \tag{22}$$

Without running a robust regression, Stata assumes that Var(y) is constant and so simplifies the above equation to:

$$Var(\hat{\beta}) = (X'X)^{-1}X'\,Var(y)\,X(X'X)^{-1} = \sigma_y^2 (X'X)^{-1} \tag{23}$$

If Var(y) is not constant (heteroscedastic), adjustments need to be made to the original formula. Because y is an $(n \times 1)$ column vector and $X'$ is a $[(k+1) \times n]$ matrix, $X'y$ is a $[(k+1) \times 1]$ column vector, whose first element $X_1'y$ can be defined as:

$$X_1'y = x_{11}y_1 + x_{21}y_2 + \dots + x_{n1}y_n \tag{24}$$

Assuming all $y_j$ are independent, the variance of $X_1'y$ is equal to:

$$Var(X_1'y) = x_{11}^2\,Var(y_1) + x_{21}^2\,Var(y_2) + \dots + x_{n1}^2\,Var(y_n) \tag{25}$$
As X is an $[n \times (k+1)]$ observed matrix whose rows are the different goalies and whose columns represent different measures of MRPL, $Var(X_1'y)$ can be thought of as the variance of save percentage (although any measure of MRPL would have sufficed), allowing for heteroscedasticity. Combining $Var(X_1'y)$ with the variance of all the other columns of X, you obtain:

$$Var(X'y) = \sum_{j=1}^{n} \hat{\epsilon}_j^2\, x_j' x_j \tag{26}$$

Substituting this new term back into equation 22 gives the formula for the sandwich estimator:

$$Var(\hat{\beta}) = (X'X)^{-1}\left(\sum_{j=1}^{n} \hat{\epsilon}_j^2\, x_j' x_j\right)(X'X)^{-1} \tag{27}$$

Here, Stata allows for a non-constant variance of the error term when calculating the variance of the estimates, therefore making the standard errors robust to heteroscedasticity.

7.2.5 Normality of Error

The final assumption we must make, which will allow for the use of the t distribution, is that the errors follow a normal distribution. Mathematically, this can be written as $u \sim N(0, \sigma^2 I_n)$. This assumption is crucial because without it there is no way to test for any statistically significant relationship between y and X. It can be tested simply by comparing the distribution of the error term, u, to that of a normal distribution. A normal distribution with a mean of 0 and variance equal to $\sigma^2$ should have a skewness equal to 0 and a kurtosis equal to 3. Comparing this to our distribution of u (see footnote 14): it has a mean of 0.00000000203, a skewness of -0.20, and a kurtosis of 2.71. While u does not follow a perfect normal distribution, as is evident from its negative skewness and its slightly smaller kurtosis than normal, it can be approximated as normal in order to move forward using the Student t distribution.
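As a rough illustration of the moment comparison described above, the following Python sketch computes the mean, skewness and kurtosis of a list of residuals using the usual moment-based definitions. The function name `moments` and the example residuals are hypothetical and are not the paper's data; the paper's own figures come from Stata, and it is only assumed here that Stata's detailed summary uses comparable definitions.

```python
def moments(residuals):
    """Return (mean, skewness, kurtosis) from population central moments."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Second, third and fourth central moments.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2     # 3 for a normal distribution
    return mean, skewness, kurtosis


# Hypothetical residuals, symmetric around zero (illustrative only).
example_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
mean, skew, kurt = moments(example_residuals)
print(mean, skew, kurt)
```

A skewness near 0 and a kurtosis near 3 would be read as approximately normal; the small symmetric example here has zero skewness but a kurtosis well below 3, which is expected for so few evenly spaced points.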
It is important to note that multiple attempts were made to correct for this non-normal distribution of the error term by transforming both the independent and dependent variables in various ways to achieve a normal distribution of the residuals. It was found that the original specification provided in this paper produced residuals that can be considered approximately normally distributed. For that reason we will approximate the distribution of the error term, as above, as normally distributed in order to be able to assess the significance of the independent variables.

Footnote 14: For R1 of Table 4.
7.3 Regression Results

7.3.1 Looking Back

The OLS assumptions, with some corrections, seem to be valid. In the following section I will proceed with analyzing the effects of MRPL on wage. Due to the differences between the work of Berri and Brook (2010) and this paper, both in explanatory variables and data sets, this paper will re-run the specifications presented by Berri and Brook on the sample of this paper in order to provide an accurate comparison between the two.

First, the paper will present four different ways of estimating wage before a contract is signed. This is representative of the estimates a team has to make of a player's MRPL before they sign the contract, not knowing what his production will be. The first set of regressions will be representative of the equation:

$$w = E(MRPL) \tag{28}$$

Empirically, the first regression (BB) can be written as:

$$\widehat{\ln(wage)} = \hat{\beta}_0 + \hat{\beta}_1 SaveLast + \hat{\beta}_2 Save2 + \hat{\beta}_3 Age + \hat{\beta}_4 Sq.Age + \hat{\beta}_5 TOI + \hat{\beta}_6 TOI2 + \hat{\beta}_7 POP + \epsilon \tag{29}$$

Table 3 presents the empirical estimates of equation 29 in four different forms. The first regression is modeled on the specifications of Berri and Brook (2010), which, as noted above, only considered factors associated with MPL and not MR. The second regression considers all potentially relevant variables as discussed in Section 5, except where they were intentionally omitted (i.e. Games Started and experience). The third regression re-runs the specification of regression 2, considering only statistically significant variables in order to prevent non-significant variables from biasing the beta estimates. The fourth regression tests for possible non-linear relationships between wages and E(MRPL), using the significant variables from the second regression.

Looking at the first regression alone (BB), there are a number of important findings to note. In their paper, Berri and Brook found a statistically significant (stat-sig) relationship between wage and save% last year, save% 2 seasons ago, squared age, time on ice, and time on ice 2 seasons ago, as well as a stat-sig constant. Using the new sample in this paper, only save% last season and time on ice last season were found to be stat-sig. This is important to understand because it shows a clear difference in the relationship between wage and E(MRPL) for the two samples, showing that the sample used in this paper may not be externally valid with respect to the specifications run by Berri and Brook, as suspected earlier. While it is problematic to have a sample that is not externally valid, conclusions about improving the estimated relationship between wage and E(MRPL) can
still be made; however, they are potentially invalid when considering new samples.

Table 3: Salary Regressions. Dependent variable: log of salary. OLS methods used. Heteroscedasticity-robust standard errors used. p-values presented in parentheses beside coefficient estimates.

Independent Variable        BB                       All Var.                 Sig.                     Non-Linear Sig.
Save% Last Season           19.91*** (0.030)         6.95 (0.461)             5.44 (0.558)             -125.56 (0.885)
Save% 2 Seasons Ago         8.22 (0.310)             9.85 (0.194)             11.6* (0.112)            1076*** (0.045)
Sq. Save% Last Season       -                        -                        -                        71.52 (0.881)
Sq. Save% 2 Seasons Ago     -                        -                        -                        -585.77*** (0.047)
Age                         0.04 (0.851)             0.31* (0.142)            0.215 (0.252)            0.204 (0.258)
Sq. Age                     -0.93×10^-3 (0.759)      -0.005* (0.140)          -0.003 (0.226)           -0.003 (0.245)
Time on Ice Last Season     0.35×10^-3*** (0.001)    0.254×10^-3*** (0.006)   0.254×10^-3*** (0.003)   0.254×10^-3*** (0.007)
Time on Ice 2 Seasons Ago   -0.11×10^-3 (0.306)      -                        -                        -
Population                  1.63×10^-8 (0.562)       4.78×10^-9 (0.841)       -                        -
Google Searches             -                        9.88×10^-6*** (0.00)     1.04×10^-5*** (0.00)     1.00×10^-5*** (0.00)
No. Free Agent Goalies      -                        -0.07* (0.115)           -0.07** (0.065)          -0.07** (0.083)
RFA                         -                        0.49** (0.094)           0.38 (0.203)             0.43* (0.119)
Capacity                    -                        -0.07 (0.945)            -                        -
Height                      -                        0.04 (0.316)             -                        -
Constant                    -11.99 (0.250)           -7.97 (0.396)            -3.46 (0.670)            -426.80 (0.320)
Adj. R Squared              0.37                     0.53                     0.56                     0.56
Observations                44                       44                       44                       44

*** 95% Confidence   ** 90% Confidence   * 85% Confidence

With that said, the signs of all of the estimates are as expected except for time on ice 2
seasons ago, which is negative. For the same reasons that TOI is expected to be positively correlated with wage, as discussed in section 7.1, so is time on ice 2 seasons ago; the lack of statistical significance of this variable, however, implies that it can be disregarded.

Considering the second regression (AllVar), both save% last season and save% 2 seasons ago are of the expected sign, yet not stat-sig. The lack of significance of the estimates of save percentage could be explained by the fact that $E(u \mid X) \neq 0$, where the unobserved omitted variable is negatively biasing the effect of save% on wage. The lack of significance could also be explained by an omitted variable correlated with wage and save%, as will be explained later; in this case the omitted factor was that I failed to model a non-linear relationship between wage and save%. In the same regression, both age and squared age are of the expected sign and are stat-sig, implying that goalies improve from their rookie season through their 'prime' and then become less productive after their 'prime'. However, when attempting to calculate the age at which a goalie in our sample hits their prime, it was found that in our sample, as goalies get older, their wages decrease. This is known through the fact that any age greater than 0 would yield a negative value for $0.04\,Age - 0.93\,Age^2$. This can again be explained by the sample selection bias present in this paper, which meant that many of the goalies in our sample were past their prime at the time of entering free agency. This furthers the point that a team with a productive goalie in their prime would not allow them to enter free agency, at the risk of not being able to get as good a goalie. Variables such as population, capacity and height are all of the expected sign, but not stat-sig, so they were dropped from the regression.
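The age at which a quadratic age profile peaks can be sketched as follows. This is a minimal illustration, assuming the age and squared-age coefficients exactly as they appear in the BB and All Var. columns of Table 3 (including the 10^-3 scaling printed for the BB squared-age coefficient); the function name `turning_point` is hypothetical. If the squared-age coefficient is instead taken unscaled, as in the expression in the text, the implied peak moves very close to zero.

```python
def turning_point(linear_coef, squared_coef):
    """Age at which linear_coef*age + squared_coef*age**2 is maximized.

    Only meaningful for a concave profile (squared_coef < 0).
    """
    if squared_coef >= 0:
        raise ValueError("profile has no interior maximum")
    return -linear_coef / (2 * squared_coef)


# Coefficients as printed in Table 3 (assumed scaling: x10^-3 for BB Sq. Age).
bb_peak = turning_point(0.04, -0.93e-3)
allvar_peak = turning_point(0.31, -0.005)
print(round(bb_peak, 1), round(allvar_peak, 1))
```

Under these assumed coefficients the BB profile would peak at roughly age 21.5 and the All Var. profile at roughly age 31, so whether wages fall with age across the whole sample depends on which scaling and which column are used.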
Accounting for a player's MR, Google searches appears to be a significant determinant of wages, as it is of the expected sign and is stat-sig with 95% confidence.

Testing for differences between UFAs and RFAs, there is a stat-sig relationship between RFA and wage, showing that restricted free agents will earn more money than their unrestricted counterparts. This relationship can be explained by the fact that when a player is an RFA, there are specific rules that teams must follow; for example, they are not allowed to offer a player less than 90% of his original salary. When players are unrestricted free agents, teams may offer and pay them any salary both parties agree to, including a pay cut of more than 10%. Considering the work of Richardson, and of Lambrinos and Ashman, however, a more plausible explanation is that the restricted free agent goalies were better than their unrestricted counterparts. This is understandable, as teams who have the rights to a goalie (referring to RFAs) would be more willing to see what other goalies are available in free agency, knowing they could always re-sign their old goalie. This is in contrast to teams whose goalies become UFAs after their contract, because then the team would only re-sign the goalie if they are sure he has a greater MRPL than all other free agents. Therefore it can be thought that the RFAs
in our sample are of better quality than the UFAs, which is why they command a higher wage.

Regarding a player's bargaining power, the sign on the number of free agent goalies is negative and stat-sig, demonstrating that, ceteris paribus, as the number of free agent goalies increases, a goalie's individual wage will decrease.

When comparing the regression run on Berri and Brook's specification (BB) with the AllVar regression of this paper, the differences become quite clear. Including variables relating to a goalie's MR as well as their bargaining power significantly increases the overall level-of-fit of the regression. With the adjusted r-squared increasing from 0.37 to 0.53, the inclusion of variables relating to a goalie's MR and bargaining power explains roughly 1.4 times as much of the differences in wages as not including them.

By comparing the results of the last two regressions of Table 3 (Sig. and Non-Linear Sig.), it is possible to interpret the relationship between wage and the determinants of wage. By modeling for non-linearity, the results become a lot clearer than before. Instead of there being a linear relationship between save% and wage, it appears that a non-linear relationship between these variables provides the strongest explanation. Thus, when modeling for non-linearity, the estimates of the effect of save% on wage become more significant than before. Considering the stat-sig coefficients of save% 2 seasons ago, which was positive, and squared save% 2 seasons ago, which was negative, it appears that there are diminishing returns on wage to save percentage. This is very plausible, as increasing the save percentage from 91% to 93% would likely have a much larger impact on wage than an increase from 93% to 96%. This relationship can be explained by the fact that ice hockey is a team sport, meaning that in order to win a game you not only need a goalie to prevent goals, but you also need forwards to score goals.
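The diminishing-returns pattern implied by a positive linear and a negative squared coefficient can be sketched numerically. The snippet below is an illustration only: it takes the two 'Non-Linear Sig.' coefficients on save% 2 seasons ago from Table 3, assumes save percentage is measured as a fraction (for example 0.91), and uses hypothetical helper names `marginal_effect` and `peak_save_pct`.

```python
B_SAVE2 = 1076.0       # coefficient on Save% 2 Seasons Ago (Table 3)
B_SAVE2_SQ = -585.77   # coefficient on Sq. Save% 2 Seasons Ago (Table 3)


def marginal_effect(save_pct):
    """Derivative of B_SAVE2*s + B_SAVE2_SQ*s**2 with respect to s."""
    return B_SAVE2 + 2 * B_SAVE2_SQ * save_pct


def peak_save_pct():
    """Save% (as a fraction) at which the fitted quadratic is maximized."""
    return -B_SAVE2 / (2 * B_SAVE2_SQ)


# The marginal effect shrinks as save% rises, which is what
# "diminishing returns" means for this functional form.
effects = {s: marginal_effect(s) for s in (0.89, 0.91, 0.93)}
peak = peak_save_pct()
print(effects, round(peak, 4))
```

With these coefficients the marginal effect falls steadily as save percentage increases and changes sign near a save percentage of about 0.918, so the fitted curve is steepest for the weakest goalies in the sample.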
Thus, if you assume that an average NHL team takes 30 shots per game, then with a save percentage of less than 91% a goalie will concede more than 2 goals; with a save percentage between 91% and 95%, a goalie will concede 2 goals; and with a save percentage above 96% a goalie will concede 1 goal or less. The mean save percentage in our sample was 90.9%, with a minimum of 88.3% and a maximum of 93%; so with an average of 30 shots, a team will score 3 times per game. In terms of goals conceded, increasing one's save percentage from 91% to 93% would decrease the goals allowed from 3 to 2; increasing save percentage from 93% to 95% would not change the goals allowed; and increasing the save percentage above 95% would decrease the goals allowed from 2 to 1 or even 0. Since the average team will score 3 times per game, as stated above, teams will pay goalies more money for decreasing their goals conceded from 3 to 2 than from 2 to 1, because in the first case the difference is between winning and not winning, while in the second case there is no effect on whether a team will win. The marginal return on wage of conceding 2 goals versus 3 is greater than
that of conceding 1 goal versus 2. Empirically, the point of diminishing returns, which is calculated by determining the maximum of the parabola $1076\,(Save2\%) - 585.77\,(Save2\%)^2$, as implied by the regression 'Non-Linear Sig', is at 91.84%.

Due to the lack of specification robustness, the small sample size, and potential sample selection biases, rather than considering each regression as a way of modeling $w = E(MRPL)$, it is better to consider what has been learned about how teams determine a goalie's E(MRPL). Through the regressions presented it is clear that variables relating to a player's MR, as well as their bargaining power and contract status, are all significant determinants of wage. Although this sample lacks external validity for the reasons presented above, it appears correct to claim that, in addition to the performance-related statistics considered in the literature, measures of popularity as well as bargaining power are statistically significant determinants of wage, and not considering them would result in a serious omitted variable bias.

7.3.2 Looking Forward

While the variables presented above relating to a player's MR and MPL do a good job of explaining goalies' wages, Berri and Brook (2010) point out that:

"Although salaries are often a function of past performance, the salary decision is a statement about the future. Teams are not paying [for] what a goalie did last year, but what they hope that goalie will do after he signs the contract." [1]

Consequently, this section will consider how good teams are at estimating a goalie's MRPL. This section will consider the condition:

$$E(MRPL) = MRPL \tag{30}$$

Empirically, the first regression (BB) can be written as:

$$\widehat{\ln(wage)} = \hat{\beta}_0 + \hat{\beta}_1 Savenew + \hat{\beta}_2 Age + \hat{\beta}_3 Sq.Age + \hat{\beta}_4 TOI + \hat{\beta}_5 POP + \epsilon \tag{31}$$

This will be done by using the same methodology as in the last section.
Here, three regressions will be presented: the first will run the specification of Berri and Brook on the new sample of this paper, the second will consider variables related to a goalie's MR that were not previously considered, and the third will re-run the specification of regression 2 while also allowing for possible non-linear relationships between E(MRPL) and MRPL.

Analyzing the results from the first regression of Table 4 (BB), there are a few important points to note. Comparing these results to the ones obtained in Berri and Brook's paper
(where save percentage, age, squared age and population were all of the expected sign but not stat-sig), all of these variables, with the exception of population, were not of the expected sign and not stat-sig when using the data set of this paper. In their paper, Berri and Brook attribute this lack of relationship between measures of production and expected production to the inconsistent nature of goalies. The fact that the sample in this paper found an inverse relationship between production and expected production implies that the goalies in this sample were more inconsistent than those in the sample of Berri and Brook. This is very plausible, because our sample only considers goalies let go from teams into free agency and not all goalies.

Table 4: Salary Regression. Dependent variable: log of salary. OLS methods used. Heteroscedasticity-robust standard errors used. p-values presented in parentheses beside coefficient estimates.

Independent Variable        BB                       R1                       R2
Save% Current Season        -0.706 (0.931)           -5.93 (0.402)            -877.11 (0.179)
Sq. Save% Current Season    -                        -                        479.44 (0.183)
Age                         -0.082 (0.645)           -                        -
Sq. Age                     0.001 (0.648)            -                        -
Time on Ice Current Season  0.382×10^-3*** (0.004)   0.353×10^-3*** (0.003)   0.353×10^-3*** (0.003)
Population                  3.52×10^-8* (0.141)      -                        -
Google Searches             -                        1.38×10^-5*** (0.00)     1.35×10^-5*** (0.00)
Constant                    15.42*** (0.037)         18.82*** (0.004)         414.51 (0.162)
Adj. R Squared              0.166                    0.404                    0.405
Observations                44                       44                       44

*** 95% Confidence   ** 90% Confidence   * 85% Confidence

Before the inconsistency of goalies is discussed, it is first important to consider the final two regressions of Table 4. Considering the second regression (R1), a negative relationship is found between save% and wage. Taken literally, this would imply that the better a goalie performs, the lower he will be paid.
This explanation has no bearing in the context of ice hockey, and since the coefficient is not statistically different from zero, it can simply be said
that current wage is unrelated to current save%.

Another important difference between the first and second regressions of Table 4 is the exclusion of the age and population variables. This is because Table 4 represents the equation E(MRPL) = MRPL. When taking expectations of MRPL, factors such as age, population, height, capacity, etc., are important because they represent indicators of what future production may be. Especially considering the inconsistent nature of goalies, which will be shown later, when estimating future production it is important to consider indicators other than past production. With that said, when determining actual MRPL to see how close it is to what was expected, variables such as age and population are not actual indicators of production and were therefore excluded from the final two regressions of Table 4.

Another important feature of the last two regressions in Table 4 is that although there is no stat-sig relationship between production and wage, there is still a strong stat-sig relationship between MR and wage. Although the coefficient on Google searches is not particularly large in absolute value, its strong statistical significance, as well as the increased measure-of-fit (from 16% to 40%), shows that its inclusion as a determinant of MR is crucial. Coupled with the inconsistent nature of goalies' production, excluding measures related to a goalie's MR significantly decreases the estimation ability of our regressions.

The inconsistent nature of goalies can be shown through a regression of current save% on save% last season, as well as through a regression of save% last season on save% 2 seasons ago. Regarding the first regression, the r-squared is equal to 0.0059 and the coefficient on save% last season is not stat-sig, implying that a goalie's production last season explains 0.59% of their production this season.
In the second regression, the r-squared is equal to 0.12 and the coefficient on save% 2 seasons ago is stat-sig, implying that a goalie's production 2 years ago explains 12% of their production last year. The difference in significance between the two estimates of lagged production, together with the very low measures-of-fit for both regressions, strongly implies that a goalie's production in one season is not a good predictor of what their production will be in future seasons.

8 Conclusion

The purpose of this paper was to extend the current literature on wage determination for NHL goalies. In their paper, Berri and Brook found the coefficient of variation (CoV) of a goalie's wage to be 0.74, and the CoV of a goalie's save% to be 0.011 (all CoV calculations were done by hand, by dividing a variable's standard deviation by its mean). This paper found a CoV of wage equal to 0.65, and a CoV of save% equal to 0.012. Both of these results show
that the differences in goalies' wages are significantly greater than their differences in production. Specifically, this paper claimed that in the post-lockout period of the NHL, including variables relating to a goalie's marginal revenue will significantly increase the measures-of-fit for the regressions and therefore explain more of the variation in goalies' wages. Due to the problems discussed above with the internal and external validity of the sample, rather than considering each regression individually, an overall analysis of all the specifications will be made, which will still only refer to the goalies within the sample of this paper.

Considering both methods of modeling wages and E(MRPL), both before a contract is signed and after, it is clear that a player's popularity is a significant determinant of wage. Due to the lack of predictability of a goalie's production, their popularity becomes even more significant: when teams are signing goalies to a contract, they can be a lot more certain of the revenues that will be generated by a player's popularity than of those generated from wins the goalie will potentially produce.

8.1 Future Research

Due to problems regarding the sample in this paper, which affected both the internal and external validity of our regressions, it is quite difficult to draw concrete conclusions about the effects that introducing these new variables has on modeling wages, although certain conclusions can be drawn. Moving forward, the next best step in analyzing the determinants of goalies' wages would be to first obtain full contract information for all NHL goalies, so that there is an unbiased sample. Without the zero conditional mean assumption being violated, it is the opinion of this author that a stronger, more significant relationship would be found between production and wage.
This paper also agrees with the conclusions of Berri and Brook (2010), which stated:

"Decision makers in hockey have correctly identified save percentage as the appropriate measure of performance. However, it does not appear that decision makers understand the inconsistency of this measure." [1]

Therefore, if future papers also have information on how NHL teams deal with the inconsistent nature of a goalie's performance when estimating their future production, there would be potential for a much stronger understanding of how teams determine a goalie's salary. This type of information, however, is more likely to be obtained through conversations with NHL general managers than through data analysis.

While save percentage was used in this paper as the primary measure of production, due to the problems in considering wins and goals against average, future research may consider
using adjusted save percentage in order to control for the effect of the defense. Schuckers (2011) [26] argued that even a goalie's save percentage is dependent on the defense in front of them, and therefore by calculating a defense-independent save percentage, also referred to as adjusted save percentage, one could develop a more accurate estimate of the relative production capabilities of goalies across the NHL. By using this metric, one should find a more significant relationship between production and wage. Unfortunately, at the time of writing this paper, the only website that calculated adjusted save percentage for all goalies (war-on-ice) was ineffective (see footnote 16).

As demonstrated in this paper, due to the inconsistent nature of goalies, teams do a poor job of estimating goalies' future performance. Coupled with the fact that goalies are all very similar in their production abilities, as shown by a CoV of save% equal to 0.012, teams are essentially 'throwing darts at a board' when estimating a goalie's future production. With that said, contract determination could be considered a bargaining solution between teams and goalies that is not based as much on production as it is on other factors. While the main purpose of this paper was to consider the effect of popularity on wages, and it therefore only controlled for the number of free agent goalies so as not to bias the effect of production and popularity on wages, future work could undertake a much more in-depth analysis of the bargaining process between teams and goalies. It may even be possible to argue that, compared to other sports such as basketball and baseball, as well as other positions in ice hockey, goalies' salary decisions are more reflective of a bargaining agreement than of a linear relationship between factors of MRPL and wage, due to the limited information available to evaluate goalies' production coupled with the inconsistency of the measure.
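For readers who want to reproduce a coefficient of variation such as the ones quoted above, the calculation (standard deviation divided by mean) can be sketched in a few lines of Python. The save percentages and wages below are hypothetical placeholders, not the paper's data, and the use of the sample standard deviation is an assumption, since the paper only states that the calculation was done by hand.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical figures, for illustration only.
hypothetical_save_pcts = [0.900, 0.910, 0.920]
hypothetical_wages = [1.0, 2.0, 6.0]  # e.g. millions of dollars

cov_save = coefficient_of_variation(hypothetical_save_pcts)
cov_wage = coefficient_of_variation(hypothetical_wages)
print(round(cov_save, 4), round(cov_wage, 4))
```

Even in this toy example the wages vary far more, relative to their mean, than the save percentages do, which is the pattern the paper's CoV comparison is meant to highlight.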
Footnote 16: In addition to having data on adjusted save percentage, war-on-ice also had standard measures of performance like save percentage, goals against average, time on ice, etc. When comparing these standard measures of performance with the ones reported on NHL.com, many inconsistencies were found, and therefore war-on-ice could not be trusted as a reliable source of data.
References

[1] Berri, D., Brook, S. (2010). On the Evaluation of the 'Most Important' Position in Professional Sports. Journal of Sports Economics, 11(2), 157-171. Retrieved from http://dx.doi.org/10.1177/1527002510363097
[2] Scully, G. W. (1974). Pay and Performance in Major League Baseball. The American Economic Review, 64(6), 915-930. Retrieved from http://www.jstor.org/stable/1815242
[3] Richardson, D. H. (2000). Pay, Performance, and Competitive Balance in the National Hockey League. Eastern Economic Journal, 26(4), 393-417. Retrieved from http://www.jstor.org/stable/40326440
[4] Lambrinos, J., Ashman, T. D. (2007). Salary Determination in the National Hockey League: Is Arbitration Efficient? Journal of Sports Economics, 8(2), 192-201. Retrieved from http://jse.sagepub.com/content/8/2/192.refs
[5] Lavoie, M., Grenier, G., Coulombe, S. (1987). Discrimination and Performance Differentials in the National Hockey League. Canadian Public Policy / Analyse De Politiques, 13(4), 407-422. Retrieved from http://doi.org/10.2307/3550883
[6] Grenier, G., Lavoie, M. (1988). Francophones in the National Hockey League: Test of Entry and Salary Discrimination. Mimeo, University of Ottawa.
[7] Jones, J. C. H., Walsh, W. D. (1988). Salary Determination in the National Hockey League: The Effects of Skills, Franchise Characteristics, and Discrimination. Industrial and Labor Relations Review, 41(4), 592-604. Retrieved from http://doi.org/10.2307/2523593
[8] Jones, J., Nadeau, S., Walsh, W. (1999). Ethnicity, productivity and salary: player compensation and discrimination in the National Hockey League. Applied Economics, 31(5), 593-608. Retrieved from http://dx.doi.org/10.1080/000368499324048
[9] McLean, R., Veall, M. (1992). Performance and Salary Differentials in the National Hockey League. Canadian Public Policy / Analyse De Politiques, 18(4), 470. Retrieved from http://dx.doi.org/10.2307/3551660
[10] Lebo, A. (2006). Wage Discrimination in the National Hockey League (Bachelor of Arts). Acadia University. Retrieved from http://economics.acadiau.ca/
[11] Raeder, D., Sommers, P. (2009). Are Russians the Highest-Paid Goalies in the NHL? International Atlantic Economics Society, 16(1), 132-133. Retrieved from http://dx.doi.org/10.1007/s11294-009-9239-2
[12] Idson, T., Kahane, L. (2000). Team effects on compensation: an application to salary determination in the National Hockey League. Economic Inquiry, 38(2), 345-357. Retrieved from http://dx.doi.org/10.1111/j.1465-7295.2000.tb00022.x
[13] Kahane, L. (2001). Team and player effects on NHL player salaries: a hierarchical linear model approach. Applied Economics Letters, 8(9), 629-632. Retrieved from http://dx.doi.org/10.1080/13504850010028607
[14] Watterson, S. (2009). Position Value and Wage Determinants in the NHL. Clemson University.
[15] Fullard, J. (2012). Investigating Player Salaries and Performance in the National Hockey League. Brock University. Retrieved from http://dr.library.brocku.ca/handle/10464/3997
[16] Pantano, J. (2012). Is Bigger Better? An Examination of the Effects of Size on Performance and Compensation of NHL Goaltenders. College of the Holy Cross.
[17] Staudohar, P. (2005). Hockey Lockout of 2004-05. The Monthly Labor Review, 128(12), 23-29.
[18] Perloff, J. (2013). Microeconomics with Calculus. Boston, Mass.: Pearson. Ch. 15.
[19] Vincent, C., Eastman, B. (2009). Determinants of Pay in the NHL: A Quantile Regression Approach. Journal of Sports Economics, 10(3), 256-277. Retrieved from http://dx.doi.org/10.1177/1527002508327519
[20] Bhandari, N. (2014). The 2004-05 Lockout: Where is the NHL Ten Years Later? TheRichest. Retrieved from http://www.therichest.com/sports/hockey-sports/the-2004-05-nhl-lockout-where-is-the-nhl-ten-years-later/?view=all
[21] Peck, K. (2012). Salary Determination in the National Hockey League: Restricted, Unrestricted, Forwards, and Defensemen (Honors Theses). Western Michigan University. Retrieved from http://scholarworks.wmich.edu/honors_theses
[22] Kimelman, A. (2010). All-Star Game to feature new Fantasy Draft. NHL.com. Retrieved from https://www.nhl.com/news/all-star-game-to-feature-new-fantasy-draft/c-543059
[23] NHL (2016). New format for Honda NHL All-Star Game announced. NHL.com. Retrieved from https://www.nhl.com/news/new-format-for-honda-nhl-all-star-game-announced/c-788532
[24] NHL (2005). NHL Enacts Rule Changes. NHL.com. Retrieved from http://www.nhl.com/ice/page.htm?id=26394
[25] Jenkins, J. (1996). A Reexamination of Salary Discrimination in Professional Basketball. Social Science Quarterly, 77(3), 594-608.
[26] Schuckers, M. (2011). DIGR: A Defense Independent Rating of NHL Goaltenders using Spatially Smoothed Save Percentage Maps. In MIT Sloan Sports Analytics Conference. Canton, NY.
[27] Daccord, B. (1998). Hockey Goaltending: Skills for Ice and In-Line Hockey. Champaign, Ill.: Human Kinetics. p. 6.
A Stata do File

    ** Summary Statistics
    summarize ln_nwage
    summarize ln_nwage, d
    summarize savenew
    summarize savenew, d
    summarize savenew2
    summarize savenew2, d
    summarize exp
    summarize exp, d
    summarize height
    summarize height, d
    summarize age
    summarize age, d
    summarize gs_new
    summarize gs_new, d
    summarize toinew
    summarize toinew, d
    summarize capacity
    summarize capacity, d
    summarize asgbeforencontract
    summarize asgbeforencontract, d
    summarize avgmonthlygooglesearcheslas
    summarize avgmonthlygooglesearcheslas, d
    summarize numberofnhlcalibergoaliesin
    summarize numberofnhlcalibergoaliesin, d
    summarize pop
    summarize pop, d

    * Variable names needed to be changed to fit latex *
    rename avgmonthlygooglesearcheslas google
    rename numberofnhlcalibergoaliesin bargain

    * Proving Assumptions *
    * Multicollinearity: variance inflation factors
    reg ln_nwage savelast save2 age age2 toilast google bargain rfa
    estat vif
    * Heteroskedasticity: Breusch-Pagan / Cook-Weisberg test
    reg ln_nwage savelast savelast2 save2 save2_2 toilast age age2 google bargain rfa
    hettest
    * Normality of residuals (the Google variable is called google after the rename above)
    reg ln_nwage savenew toinew google, r
    predict res, r
    summarize res
    summarize res, d

    * Table 3 regressions in order of appearance in the table (adjusted R-squared calculated manually) *
    reg ln_nwage savelast save2 age age2 toilast toi2 pop, r
    reg ln_nwage savelast save2 age age2 toilast pop google bargain rfa capacity height, r
    reg ln_nwage savelast save2 age age2 toilast google bargain rfa, r
    reg ln_nwage savelast savelast2 save2 save2_2 age age2 toilast google bargain rfa, r

    * Table 4 regressions in order of appearance in the table (adjusted R-squared calculated manually) *
    reg ln_nwage savenew age age2 toinew pop, r
    reg ln_nwage savenew toinew google, r
    reg ln_nwage savenew savenew2 toinew google, r