1. Jason Mei
Performance Consistency and Scoring Randomness in Professional
Hockey
Momentum, or “streakiness” as it is sometimes called, is an often-discussed phenomenon in the
world of sports, whether it be a basketball player whose shooting is on fire or a goalie who is hot
and making an abnormally high number of saves. This paper seeks evidence to determine
whether such a thing actually exists and, if so, what the potential drivers of the observed phenomenon are.
For the primary analysis, the performance of hockey goaltenders and their ability to make saves
has been used. The decision to use this data stems from the larger sample size and frequency
of events that can be collected in a game, as well as its availability. Goalies are certain to face a number of
shots (and, correspondingly, make saves) in every game they play. In the 2015-2016 NHL
season, an average of approximately 30 shots land on goal per game. This cannot be said of players
shooting the puck, who often go whole games without a shot on goal. It is widely believed that a player’s
confidence and mental state have a significant effect on performance; it is common practice for a
goaltender to be replaced in the middle of a game after a poor start or a stretch of letting in multiple
goals. Conversely, goalies who are “hot” are often played more frequently in succession. Given the
number of situations and shots that a professional goaltender will face in an 82-game NHL season,
improbable events such as long stretches of consecutive saves, as well as “grouping” of goals in a
short span, are statistically inevitable. We will try to determine whether these occurrences are the result of
external drivers or simply chance events that are bound to occur in a large enough
sample. The base hypothesis tests the question “Does having a goal scored against them
affect subsequent performance for goaltenders?” Some previous analysis has examined the
variance of a player’s performance from game to game. We aim here to establish whether the
level of performance fluctuates during a game based on the events that occur within it.
The Data and Basic Terminology
The data was taken from the Passing Project data release #2 for the 2015-2016 season, courtesy of
Hockey-Graphs. The data set is the result of the collective efforts and countless hours of a number of
individuals tracking information over the course of many games. We used as many games as
were available for all teams. The data set contains information on each shot taken in the tracked NHL
games, whether it resulted in a goal, and whether it was on goal. The date, time, shooter, and
goaltender are also given. The first step was to isolate and identify individual
goaltenders. The data was then filtered to eliminate goaltenders who played only a single game, as well
as all shots that were not on goal.
Quality and skill among goaltenders are often measured by two key statistics: goals against per game
and save %. We use the latter as the primary metric for our analysis as it best normalizes for
variations between games and skill levels of teams. The metric is calculated by dividing the
number of saves made by a goaltender by the number of shots against them, or equivalently
(Total Shots - Goals Against) / Total Shots. A higher save % indicates that a goaltender is more
skilled or performing better in a given situation.
Data points:        10,695
Individual games:   265
# of goaltenders:   69
Range of games:     October 2015 to February 2016
Average save %:     91.4%
Figure 1. Summary of data utilized in analysis
The resulting data set was cleaned and sorted so that, for every goaltender who has played a minimum
number of games this season, we can see each shot on goal and whether it resulted in a goal.
The data was then grouped by goaltender, listing the shots taken against each in chronological
order over the season. A column entitled “Shots since last goal” was created in Excel; it counts each
successive shot taken after a goal and resets once another goal is scored on the goaltender.
Each shot against the goaltender is thus assigned a number. For illustration, if 10 shots have been
taken on a goaltender since the last goal was scored, those shots are labeled 1 through 10. If a goal is
scored, the counter resets and the next shot is assigned a 1, increasing from there.
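The counter described above can be sketched in a few lines of Python. This is only an illustration of the labeling rule, not the Excel formula actually used; the function name and the boolean input format are assumptions, as is the reading that the shot which results in a goal receives the running count before the counter resets.

```python
def shots_since_last_goal(outcomes):
    """Label each shot with its position since the previous goal.

    outcomes: booleans in chronological order for one goaltender,
    True if the shot was a goal, False if it was saved (assumed format).
    """
    labels = []
    counter = 0
    for is_goal in outcomes:
        counter += 1            # this shot is the next one since the last goal
        labels.append(counter)
        if is_goal:
            counter = 0         # a goal resets the counter for the next shot
    return labels


# Hypothetical sequence: save, save, goal, save, goal, goal, save
example = [False, False, True, False, True, True, False]
print(shots_since_last_goal(example))  # [1, 2, 3, 1, 2, 1, 1]
```

Because the counter is never reset between games here, the labels carry over from one game to the next, as in the primary analysis.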
Statistical Foundation and Bernoulli Distributions
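Named after the Swiss mathematician Jacob Bernoulli, a Bernoulli trial is a random event with exactly
two outcomes, here a goal or a save. When trials are independent and each has the same probability of
success, the number of trials until the first success follows the distribution below (formally the
geometric distribution, referred to throughout this paper as the Bernoulli distribution):

$$\Pr(\text{first success on trial } n) = (1 - P)^{n-1} P$$

where P is the probability of the event occurring on any single trial (in our case, a goal being
scored) and n is the trial on which the first success occurs.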
The distribution takes the shape of a curve that asymptotically approaches 0 as n increases, and its
cumulative sum approaches 1 as n approaches infinity. The shape of the curve varies with the probability
of the event (and hence with save %), as demonstrated in Figure 2.
Figure 2: Example Bernoulli probability distributions for various save %’s
As such, the probability of the first goal coming on the first shot (n=1) is equal to P, the probability
of it coming on the second shot is (1-P)P, and so on. Note that since this is a probability distribution
over when the first goal occurs, the probability for the nth shot decreases with n to account for the
reduced probability of reaching that shot without a goal. That is, the probability asymptotically
approaches zero as n increases, reflecting the growing likelihood that a goal has already been scored
on one of the previous shot attempts.
The key underlying assumption of the distribution is that the probability of the event is the same on
every trial; that is, there is no difference in scoring probability between the first shot and the 20th
shot against a goaltender. This allows us to compare the Bernoulli distribution with data collected from
the 2015-2016 NHL season. Any deviation from this random distribution would indicate that factors which
are not statistically random are driving the distribution of goals.
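As a rough illustration of the random baseline, the sketch below evaluates (1 - P)^(n-1) P for the three save percentages plotted in Figure 2. The function name is hypothetical and the 32-shot cutoff simply mirrors the range of that figure.

```python
def first_goal_pmf(p_goal, n_max):
    """Probability that the first goal comes on shot n, for n = 1..n_max.

    Assumes every shot has the same, independent goal probability p_goal.
    """
    return [(1 - p_goal) ** (n - 1) * p_goal for n in range(1, n_max + 1)]


for save_pct in (0.85, 0.90, 0.95):
    pmf = first_goal_pmf(1 - save_pct, 32)
    print(f"save % {save_pct:.2f}: n=1 {pmf[0]:.4f}, n=2 {pmf[1]:.4f}, "
          f"sum over 32 shots {sum(pmf):.4f}")
```

The partial sums stay below 1 for any finite cutoff and approach 1 as more shots are included, which is the cumulative property noted above.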
Analysis
Using the refined data set, a conditional analysis was performed by summarizing the “Shots
since last goal” column with a count of the occurrences of each number. Dividing each count by the total
number of shots in our data set gives the probability of the nth shot being taken
(i.e. 2 shots between goals, 10 shots, 20 shots, etc.). This newly created distribution can then be
compared to a Bernoulli distribution using the event probability of our data set.
Our first analysis is a high-level overview of our entire data set. We take a count of each of the shot
numbers and compare the result to the random distribution implied by the measured goal scoring
probability of 0.086395 over our data set (an implied average save % of 0.914 overall). The bars in
subsequent charts show the variance between our measured statistics and the calculated random distribution.
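The comparison can be sketched as follows. The text does not fully specify the normalization, so this follows a literal reading: the count of each “Shots since last goal” value divided by the total number of shots, set against (1 - P)^(n-1) P. The function name and the sample labels are hypothetical, and “variance” is taken to mean the observed share minus the random share.

```python
from collections import Counter


def compare_to_random(labels, p_goal):
    """Return rows of (n, observed share, random share, observed - random).

    labels: "Shots since last goal" values for every shot (must be non-empty).
    p_goal: per-shot goal probability used for the random baseline.
    """
    counts = Counter(labels)
    total = len(labels)
    rows = []
    for n in range(1, max(counts) + 1):
        observed = counts.get(n, 0) / total
        expected = (1 - p_goal) ** (n - 1) * p_goal
        rows.append((n, observed, expected, observed - expected))
    return rows


# Hypothetical labels for illustration only, not taken from the data set
labels = [1, 2, 3, 1, 2, 1, 1, 2, 3, 4]
p_goal = 0.086395  # measured goal probability reported in the text
for n, obs, exp, diff in compare_to_random(labels, p_goal):
    print(f"n={n}: observed {obs:.3f}, random {exp:.3f}, variance {diff:+.3f}")
```

The same function can be applied per goaltender by passing only that player's labels and individual goal probability.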
[Figure 2 chart: Probability of goal on nth shot for various save %. Series: 85%, 90%, 95%. X-axis: shot number n (1 to 32). Y-axis: probability (0% to 16%).]
Figure 3: Actual goal distribution vs random distribution for entire data set.
We can see that, at first glance, our measured data closely follows that of a random distribution. This
leads to the preliminary conclusion that, taken as a whole, NHL goaltenders show no significant
deviations in making saves in relation to previous events.
However, there is a noticeable deviation and pattern within our data, particularly in roughly the first
five shots. The probability of a goal being scored is noticeably higher than random in the first five shots
following a goal, after which it falls below random before the two values begin to converge.
This indicates that, as a whole, a goaltender is more likely to let in another goal during the first five
shots following a goal. However, once five consecutive saves have been made, they are more likely to
make a save on subsequent shots and thus less likely to let a goal in.
Analyzing our data further, we look at the performance of individual players to determine whether this
is a universal trend or one limited to certain players who are overrepresented or skewing our
data. A similar analysis is performed on the shots taken against individual goaltenders, using each
player’s individual save % from the data set and the shots taken against them. We investigate the
goaltenders who had the most shots taken against them in our sample, as well as the top- and
bottom-ranked players in terms of save %. Finally, the deviations of these players are
averaged and displayed to show the combined result for the goaltenders with the most playing time in the
NHL.
[Figure 3 chart: Goaltender Save % vs. Random Distribution. X-axis: Shots Between Goals (1 to 73). Y-axes: Event Probability, Variance. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution, Cumulative Variance.]
Figure 4: Analysis for Ben Bishop. Sample size = 432, Average save % =90.8%
Figure 5: Analysis for Braden Holtby. Sample size = 437, Average save % =92.2%
[Figure 4 chart: Ben Bishop. X-axis: 1 to 44. Y-axes: Probability of goal, Variance vs Random. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
[Figure 5 chart: Braden Holtby. X-axis: 1 to 53. Y-axes: Probability of goal, Variance vs Random. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
Figure 6: Analysis for Corey Crawford. Sample size = 773, Average save % =93.0%
Figure 7: Analysis for Sergei Bobrovsky. Sample size = 193, Average save % = 86.5%
[Figure 6 chart: Corey Crawford. X-axis: 1 to 73. Y-axes: Probability of goal, Variance vs Random. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
[Figure 7 chart: Sergei Bobrovsky. X-axis: 1 to 28. Y-axes: Probability of goal, Variance vs Random. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
Figure 8: Analysis for Tuukka Rask. Sample size = 326, Average save % = 89.6%
Figure 9: Analysis for Cory Schneider. Sample size = 884, Average save % = 92.9%
[Figure 8 chart: Tuukka Rask. X-axis: 1 to 40. Y-axes: Probability of goal, Variance vs Random. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
[Figure 9 chart: Cory Schneider. X-axis: 1 to 51. Y-axes: Probability of goal, Variance vs Random. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
Figure 10: Analysis for James Reimer. Sample size = 298, Average save % = 90.9%
Figure 11: Average Variance of Analyzed Players
[Figure 10 chart: James Reimer. X-axis: 1 to 34. Y-axes: Probability of goal, Variance vs Random. Series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
[Figure 11 chart: Average Variance. X-axis: 1 to 51. Y-axis: -0.4% to 0.3%.]
Result Analysis
Our analysis of the league as a whole shows that, on average, there is a noticeable effect on the
performance of goaltenders after a goal is scored against them. This is primarily evident within the first
5 shots after letting a goal in, where there is a cumulative 1.87% increased chance of a goal
being let in. This can be theorized to be a result of the impact on a goaltender’s confidence of letting
a goal past them. Making subsequent saves in a game allows them to regain this “confidence” and
regress to a performance more representative of their mean.
However, looking at players individually, we see that this pattern varies broadly.
This is an expected outcome, as each player reacts to and internalizes events in a different manner.
Goaltenders such as Cory Schneider and Ben Bishop are less likely to make a save after letting
a goal in. Conversely, players such as Braden Holtby and Corey Crawford actually perform better
after having a goal scored against them. This can perhaps be attributed to increased
focus and effort following the failure to make a save. Though our model does not identify any drivers,
we can make assumptions as to what drives the results of our analysis.
Understanding these variances in each individual’s performance can lead to better decision making in
game-time situations. Goaltenders who demonstrate increased performance after having a goal scored
against them should potentially not be pulled and replaced after a poor start, as they show no statistical
evidence of decreased performance. Conversely, certain players are more prone to decreased levels of
performance, and the effects should be considered by coaching staff.
Ultimately, though variance from a random distribution is shown, suggesting that goals and saves are
not purely randomly distributed, the absolute amount of variation is relatively small. For
the aggregate average of our data sample, as well as the vast majority of individual players, the actual
distribution of goals and saves still closely resembles the random Bernoulli distribution. For many
professional goaltenders, the effect of preceding outcomes on performance is
insignificant and likely not large enough to be actionable.
Assumptions and Considerations
For the primary analysis, “shots since last goal” carries over from game to game in our data set. For
example, if a goaltender made 10 consecutive saves at the end of a game, the counter would continue
from 10 in the next game. This relies on the assumption that any “momentum” is carried on from game to
game, or that the effect is averaged out in our data set. We must also consider that not all games
played are present in our data set, so the games contained within the data are not all consecutive.
Additionally, our analysis does not consider the fact that goaltenders
may not play consecutive games for a number of reasons, including injuries, rest days, or two-goalie
teams. To investigate this, another analysis was done using the same methodology, with
the sole difference being that the start of a game resets the shot counter. This takes
the underlying assumption that the beginning of a game is a consistent starting point for a player, who
can only improve or regress based on subsequent events. However, this also results in an arbitrary
cutoff at the end of a game, as the count ends because time runs out on the game clock
rather than because of an event such as a goal, imposing an upper bound on the number of consecutive
shots. The results of this analysis are shown in the Appendix and were not considered further because of
the constraints mentioned. If possible, an even larger data set spanning an entire season would be
preferable. Previous studies have shown that game-to-game variation in performance does exist among
certain goaltenders; however, performance averages out over the long term and is relatively consistent.
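The per-game variant of the counter can be sketched as below. It is a minimal illustration under assumed inputs: each shot is a hypothetical (game_id, is_goal) pair in chronological order for one goaltender, and the function name is not from the original workbook.

```python
def shots_since_last_goal_per_game(shots):
    """Label shots since the last goal, restarting the count at each new game.

    shots: chronological (game_id, is_goal) pairs for one goaltender
    (assumed format; game_id can be any comparable identifier).
    """
    labels = []
    counter = 0
    current_game = object()  # sentinel that matches no real game id
    for game_id, is_goal in shots:
        if game_id != current_game:
            counter = 0          # a new game restarts the counter
            current_game = game_id
        counter += 1
        labels.append(counter)
        if is_goal:
            counter = 0          # a goal also restarts the counter
    return labels


# Hypothetical shots across two games
shots = [("g1", False), ("g1", False), ("g2", False), ("g2", True), ("g2", False)]
print(shots_since_last_goal_per_game(shots))  # [1, 2, 1, 2, 1]
```

In this version the streak of two saves at the end of game "g1" is cut off by the final whistle rather than by a goal, which is the arbitrary end point discussed above.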
It is also interesting to note that, on average, goaltender performance changes as a game progresses
regardless of prior events. Figure 12 shows the average save % of all goaltenders by period within our
data. This indicates that there are already drivers that affect performance as a game progresses, as
average save % differs from period to period.
Figure 12: Average Save % per period
With any analysis, there is always the possibility of other unaccounted-for drivers and causal variables
that also affect a player’s performance in relation to time and prior events. The above example is only
one potential driver. However, given the size of our sample, the effects of this should be accounted
for and averaged out in our analysis. Finally, in any statistical analysis, the accuracy of insights and
conclusions is only as good as one’s data. Hockey-Graphs has readily acknowledged
that imperfections in the data exist and that it is not 100% accurate. For our case, the shot
attempt and goal data are believed to be of sufficient accuracy and consistency for the analysis performed.
Conclusion
While trends are shown in our analysis of the NHL as a whole, it is important to realize that each
individual reacts in a different and unique way to various situations. We are not able to identify
the drivers behind any variations from a random distribution of goals and saves.
[Figure 12 chart: Save % by Period. X-axis: Period 1, Period 2, Period 3. Y-axis: save % (90.0% to 92.5%).]