Jason Mei
Performance Consistency and Scoring Randomness in Professional
Hockey
Momentum, or “streakiness” as it is sometimes called, is an often-discussed phenomenon in the world of
sports, whether it is a basketball player whose shooting is on fire or a goalie who is hot and making an
abnormally high number of saves. This paper seeks evidence to determine whether such a thing actually
exists and what the potential drivers, if any, of the observed phenomenon are.
For the primary analysis, we use the performance of hockey goaltenders and their ability to make saves.
We chose this data because of its availability and because of the larger sample size and frequency of
events that can be collected in a game. Goalies are certain to face a number of shots (and make a
corresponding number of saves) in every game they play. In the 2015-2016 NHL season, an average of
approximately 30 shots land on goal in each game. The same cannot be said of skaters shooting the puck,
who often go whole games without a shot on goal. It is widely believed that a player’s confidence and
mental state have a significant effect on performance; it is a widespread practice for a goaltender to be
replaced in the middle of a game after a poor start or a stretch of letting in multiple goals. Conversely,
goalies who are “hot” are often played more frequently in succession. Given the number of situations and
shots that a professional goaltender will face in an 82-game NHL season, improbable events such as long
stretches of consecutive saves or “groupings” of goals in a short span are statistically inevitable. We
will try to determine whether these occurrences are the result of external drivers or simply the expected
outcome of chance in a large enough sample. The base hypothesis tests the question “Does having a goal
scored against them affect goaltenders’ subsequent performance?” Some previous analysis has examined the
game-to-game variance of a player’s performance in this respect. Here we aim to establish whether the
level of performance fluctuates during a game based on the events that occur within it.
The Data and Basic Terminology
The data was taken from the passing project data release #2 for the 2015-2016 season, courtesy of
Hockey-Graphs. The data set is the result of the collective effort and countless hours of a number of
individuals tracking information over the course of many games. We used as many games as were available
for all teams. The data set contains information on each shot taken in the tracked games, whether it
resulted in a goal, and whether it was on goal. The date, time, shooter, and goaltender are also given
for each shot. The first step was to isolate and identify individual goaltenders. The data was then
filtered to eliminate goaltenders who played only a single game, as well as all shots that were not on
goal.
Quality and skill among goaltenders are often measured by two key statistics: goals against per game and
save %. We use the latter as the primary metric for our analysis because it best normalizes for
variations between games and the skill levels of teams. The metric is calculated by dividing the number
of saves made by a goaltender by the number of shots against them, or equivalently
(Total Shots - Goals Against) / Total Shots. A higher save % indicates that a goaltender is more skilled
or is performing better in a given situation.
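As a minimal sketch of the calculation above, assuming shots on goal and goals against are available as plain integers (the function name `save_percentage` and the example numbers are illustrative and do not come from the data set):

```python
def save_percentage(shots_on_goal: int, goals_against: int) -> float:
    """Return save % as a fraction: (total shots - goals against) / total shots."""
    if shots_on_goal <= 0:
        raise ValueError("shots_on_goal must be positive")
    if not 0 <= goals_against <= shots_on_goal:
        raise ValueError("goals_against must be between 0 and shots_on_goal")
    return (shots_on_goal - goals_against) / shots_on_goal


# Hypothetical game: 30 shots on goal, 3 goals against.
print(f"{save_percentage(30, 3):.3f}")  # 0.900
```

Because the metric is a ratio, it is comparable between a goaltender who faced 20 shots and one who faced 40, which is the normalization property the text relies on.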
Data points: 10,695
Individual games: 265
Number of goaltenders: 69
Range of games: October 2015 to February 2016
Average save %: 91.4%
Figure 1. Summary of data utilized in analysis
The resulting data set was cleaned and sorted so that, for every goaltender who had played a minimum
number of games this season, we can see each shot on goal and whether it resulted in a goal. The data was
then sorted to isolate each individual goaltender and list the shots taken against them in chronological
order over the season. A column entitled “Shots since last goal” was created in Excel to count each
successive shot taken after a goal and to reset the counter once another goal is scored on the
goaltender. Each shot taken against the goaltender is thus assigned a corresponding number. For example,
if 10 shots landed on a goaltender since the last goal was scored, those shots would be labeled 1 through
10. Once a goal is scored, the counter resets and the next shot is assigned a 1, increasing from there.
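The counter described above can be sketched in a few lines of Python. This is only an illustration of one reading of the spreadsheet logic: it assumes the shots for a single goaltender are already in chronological order and encoded as booleans, and that the goal itself keeps its running label before the counter resets. The function name `shots_since_last_goal` is hypothetical.

```python
def shots_since_last_goal(outcomes):
    """Label each shot with its position in the run since the previous goal.

    `outcomes` is a chronological list of booleans for one goaltender:
    True if the shot was a goal, False if it was saved.
    Assumption: the goal itself keeps its label; the next shot restarts at 1.
    """
    labels = []
    counter = 0
    for is_goal in outcomes:
        counter += 1
        labels.append(counter)
        if is_goal:
            counter = 0  # reset so the next shot is labeled 1
    return labels


# Hypothetical sequence: save, save, goal, save, goal, save
shots = [False, False, True, False, True, False]
print(shots_since_last_goal(shots))  # [1, 2, 3, 1, 2, 1]
```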
Statistical Foundation and Bernoulli Distributions
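Named after the Swiss mathematician Jacob Bernoulli, a Bernoulli distribution represents the probability
distribution of a random binary event. The number of independent Bernoulli trials needed to obtain the
first success follows what is formally called the geometric distribution; this paper refers to it as the
Bernoulli (random) distribution throughout. It takes the form:
$$\Pr(\text{first success on trial } n) = (1 - P)^{n-1} P$$
where P is the probability of the event occurring on any single trial (in our case, a goal being scored
on a shot) and n is the trial number.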
The distribution takes the shape of a curve that asymptotically approaches 0 as n approaches infinity,
and its probabilities sum to 1 over all n. The shape of the curve varies with the probability of the
event (equal to 1 - save %), as demonstrated in Figure 2.
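As a short check that the probabilities (1 - P)^(n-1) P sum to 1, the geometric series can be summed directly. This worked step is added for illustration and holds for 0 < P ≤ 1:

```latex
\[
\sum_{n=1}^{\infty} (1 - P)^{n-1} P
  = P \cdot \frac{1}{1 - (1 - P)}
  = P \cdot \frac{1}{P}
  = 1
\]
```

For example, with P = 0.1 the first three terms are 0.1, 0.09, and 0.081, which already sum to 0.271.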
Figure 2: Example Bernoulli probability distributions for various save %’s
As such, the probability that the first goal comes on the first shot (n=1) is equal to P, the probability
that it comes on the second shot is (1-P)P, and so on. Note that because this is a probability
distribution over the shot on which the goal arrives, the probability of scoring on the nth shot
decreases with n to account for the reduced probability of reaching that shot without a goal. That is,
the probability asymptotically approaches zero as n increases, reflecting the growing likelihood that a
goal has already been scored on one of the previous shot attempts.
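The values behind curves like those in Figure 2 can be reproduced with a small function implementing (1 - P)^(n-1) P. This is a sketch for illustration; the function name `goal_on_nth_shot` is hypothetical, and the three save percentages are the ones named in the figure.

```python
def goal_on_nth_shot(p_goal: float, n: int) -> float:
    """Probability that the first goal arrives on shot n: (1 - P)^(n - 1) * P."""
    if not 0 < p_goal <= 1:
        raise ValueError("p_goal must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1 - p_goal) ** (n - 1) * p_goal


for save_pct in (0.85, 0.90, 0.95):
    p = 1 - save_pct  # goal probability is the complement of save %
    first_five = [round(goal_on_nth_shot(p, n), 4) for n in range(1, 6)]
    # The partial sum over many shots approaches 1, as a distribution should.
    partial_sum = sum(goal_on_nth_shot(p, n) for n in range(1, 501))
    print(f"save % {save_pct:.2f}: first five {first_five}, sum {partial_sum:.6f}")
```

A lower save % gives a curve that starts higher and decays faster, because a goal is more likely to have arrived early.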
The key underlying assumption of the distribution is that the probability of the event is equal on every
trial; that is, there is no difference in scoring probability between the first shot and the 20th shot
against a goaltender. This allows us to compare the Bernoulli distribution with data collected from the
2015-2016 NHL season. Any systematic variance from this random distribution would indicate that factors
which are not statistically random are driving deviations in the distribution of goals.
Analysis
Using the refined data set, a conditional analysis was performed by summarizing the “Shots since last
goal” column with a count of the occurrences of each number. Dividing each count by the total number of
shots in our data set gives the probability of the nth shot since the last goal being taken (e.g., 2
shots between goals, 10 shots, 20 shots, etc.). This newly created distribution can then be compared to a
Bernoulli distribution that uses the event probability of our data set.
Our first analysis is a high-level overview of our entire data set. We take a count of each of the shots
taken and compare it against the random distribution implied by the measured goal-scoring probability of
0.086395 over our data set (an implied average save % of 0.914 overall). The bars in the subsequent
charts show the variance between our measured statistics and the calculated random distribution.
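The comparison described above can be sketched as follows. The sketch assumes the “Shots since last goal” labels are available as a list of integers. Under the random model, a shot carries label n exactly when the n-1 preceding shots were saves and the shot before those was a goal, which has probability (1 - P)^(n-1) P, so the share of shots with each label is directly comparable to the random distribution. The function names and sample labels are hypothetical; only the value 0.086395 comes from the text.

```python
from collections import Counter


def label_distribution(labels):
    """Share of all shots carrying each "shots since last goal" label."""
    total = len(labels)
    counts = Counter(labels)
    return {n: counts[n] / total for n in sorted(counts)}


def random_baseline(p_goal, n):
    """Random-model probability for label n: (1 - P)^(n - 1) * P."""
    return (1 - p_goal) ** (n - 1) * p_goal


def variance_vs_random(labels, p_goal):
    """Observed share minus random baseline for each label (the chart bars)."""
    observed = label_distribution(labels)
    return {n: share - random_baseline(p_goal, n) for n, share in observed.items()}


# Hypothetical labels; a real run would use the full column of labels.
labels = [1, 2, 3, 1, 2, 1, 2, 3, 4, 1]
p_goal = 0.086395  # measured goal probability reported in the text
for n, diff in variance_vs_random(labels, p_goal).items():
    print(n, f"{diff:+.4f}")
```

A positive value at a given n means more shots carried that label than the random model predicts; a negative value means fewer.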
[Figure 2 chart: “Probability of goal on nth shot for various save %”. Horizontal axis: shot number n, 1 to 32. Vertical axis: probability, 0% to 16%. Series: 85%, 90%, 95%.]
Figure 3: Actual goal distribution vs random distribution for entire data set.
At first glance, our measured data closely follows a random distribution. This leads to the preliminary
conclusion that, taken as a whole, NHL goaltenders show no significant deviations in making saves in
relation to previous events.
However, there is a noticeable deviation and pattern within our data, particularly in roughly the first
five shots. The probability of a goal being scored is noticeably higher in the first five shots following
a goal, after which it dips below the random distribution before the two values begin to converge. This
indicates that, on the whole, a goaltender is more likely to let in another goal during the first five
shots following a goal. However, once five consecutive saves have been made, they are more likely to make
a save on subsequent shots and thus less likely to let a goal in.
Analyzing our data further, we look at the performance of individual players to determine whether this is
a universal trend or one attributable only to certain players who are overrepresented in or skewing our
data. A similar analysis is performed on the shots taken against individual goaltenders, using each
player’s individual save % from the data set and the shots taken against them. We investigate the
goaltenders who had the most shots taken against them in our sample, as well as the top- and
bottom-ranked players in terms of save %. Finally, the deviations of these players are averaged and
displayed to show the combined result for the goaltenders with the most playing time in the NHL.
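A per-goaltender version of the comparison might look like the sketch below. It assumes each record is a (goaltender, is_goal) pair in chronological order and uses each goaltender's own goal probability as P. All names and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def run_labels(outcomes):
    """Label each shot 1, 2, 3, ... since the previous goal (goal keeps its label)."""
    labels, counter = [], 0
    for is_goal in outcomes:
        counter += 1
        labels.append(counter)
        if is_goal:
            counter = 0
    return labels


def per_goaltender_variance(shots, max_n=5):
    """For each goaltender, observed share of label n minus (1 - P)^(n - 1) * P,
    using that goaltender's own goal probability P, for n = 1..max_n."""
    by_goalie = defaultdict(list)
    for goalie, is_goal in shots:
        by_goalie[goalie].append(is_goal)

    result = {}
    for goalie, outcomes in by_goalie.items():
        p_goal = sum(outcomes) / len(outcomes)
        counts = Counter(run_labels(outcomes))
        result[goalie] = [
            counts[n] / len(outcomes) - (1 - p_goal) ** (n - 1) * p_goal
            for n in range(1, max_n + 1)
        ]
    return result


# Hypothetical records for two goaltenders, "A" and "B".
shots = [("A", False), ("B", True), ("A", True),
         ("A", False), ("B", False), ("A", False)]
for goalie, diffs in per_goaltender_variance(shots, max_n=3).items():
    print(goalie, [f"{d:+.3f}" for d in diffs])
```

Averaging the per-goaltender lists position by position would give a combined view comparable to the averaged deviation the text describes.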
[Figure 3 chart: “Goaltender Save % vs. Random Distribution”. Horizontal axis: shots between goals, 1 to 73. Vertical axes: event probability and variance. Series: variance, goal probability distribution, Bernoulli (random) distribution, cumulative variance.]
Figure 4: Analysis for Ben Bishop. Sample size = 432, average save % = 90.8%
[Figure 4 chart: “Ben Bishop”. Horizontal axis: 1 to 44. Vertical axes: probability of goal and variance vs. random. Series: variance, goal probability distribution, Bernoulli (random) distribution.]
Figure 5: Analysis for Braden Holtby. Sample size = 437, average save % = 92.2%
[Figure 5 chart: “Braden Holtby”. Horizontal axis: 1 to 53. Vertical axes: probability of goal and variance vs. random. Series: variance, goal probability distribution, Bernoulli (random) distribution.]
Figure 6: Analysis for Corey Crawford. Sample size = 773, average save % = 93.0%
[Figure 6 chart: “Corey Crawford”. Horizontal axis: 1 to 73. Vertical axes: probability of goal and variance vs. random. Series: variance, goal probability distribution, Bernoulli (random) distribution.]
Figure 7: Analysis for Sergei Bobrovsky. Sample size = 193, average save % = 86.5%
[Figure 7 chart: “Sergei Bobrovsky”. Horizontal axis: 1 to 28. Vertical axes: probability of goal and variance vs. random. Series: variance, goal probability distribution, Bernoulli (random) distribution.]
Figure 8: Analysis for Tuukka Rask. Sample size = 326, average save % = 89.6%
[Figure 8 chart: “Tuukka Rask”. Horizontal axis: 1 to 40. Vertical axes: probability of goal and variance vs. random. Series: variance, goal probability distribution, Bernoulli (random) distribution.]
Figure 9: Analysis for Cory Schneider. Sample size = 884, average save % = 92.9%
[Figure 9 chart: “Cory Schneider”. Horizontal axis: 1 to 51. Vertical axes: probability of goal and variance vs. random. Series: variance, goal probability distribution, Bernoulli (random) distribution.]
Figure 10: Analysis for James Reimer. Sample size = 298, average save % = 90.9%
[Figure 10 chart: “James Reimer”. Horizontal axis: 1 to 34. Vertical axes: probability of goal and variance vs. random. Series: variance, goal probability distribution, Bernoulli (random) distribution.]
Figure 11: Average Variance of Analyzed Players
[Figure 11 chart: “Average Variance”. Horizontal axis: 1 to 51. Vertical axis: -0.4% to 0.3%.]
Result Analysis
Our analysis of the league as a whole shows that, on average, there is a noticeable effect on the
performance of goaltenders after a goal is scored against them. This is primarily evident within the
first 5 shots after letting a goal in, where there is a cumulative 1.87% increased chance of a goal being
let in. This may be a result of the impact on a goaltender’s confidence of letting a goal past them.
Making subsequent saves in a game allows them to regain this “confidence” and regress to a performance
more representative of their mean.
However, looking at players individually, we see that this pattern varies broadly. This is an expected
outcome, as each player reacts to and internalizes events in a different manner. Goaltenders such as Cory
Schneider and Ben Bishop are less likely to make a save after letting a goal in. Conversely, players such
as Braden Holtby and Corey Crawford actually perform better after having a goal scored against them. This
can perhaps be attributed to increased focus and effort following the failure to make a save. Our model
does not identify any drivers, so explanations such as these remain assumptions about the causes of the
results.
Understanding these variances in each individual’s performance can lead to better decision making in
game-time situations. Goaltenders who demonstrate increased performance after having a goal scored
against them should potentially not be pulled and replaced after a poor start, as they show no
statistical evidence of decreased performance. Conversely, certain players are more prone to decreased
levels of performance, and coaching staff should take these effects into account.
Ultimately, although variance from a random distribution is shown, suggesting that goals and saves are
not entirely randomly distributed, the absolute amount of variation is relatively small. For the
aggregate average of our data sample, as well as for the vast majority of individual players, the actual
distribution of goals and saves still closely resembles the random Bernoulli distribution. For many
professional goaltenders, the effect of preceding outcomes on performance is insignificant and likely not
large enough to act upon.
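One way to gauge how much deviation chance alone produces is to simulate purely random shots and apply the same counting. The sketch below is an illustration under stated assumptions, not part of the original analysis: it reuses the sample size (10,695) and goal probability (0.086395) reported in the text, while the function name and seed are arbitrary.

```python
import random
from collections import Counter


def simulate_label_shares(n_shots, p_goal, seed):
    """Simulate independent shots with a fixed goal probability and return
    the share of shots carrying each "shots since last goal" label."""
    rng = random.Random(seed)
    counter = 0
    counts = Counter()
    for _ in range(n_shots):
        counter += 1
        counts[counter] += 1
        if rng.random() < p_goal:
            counter = 0  # a goal resets the run
    return {n: counts[n] / n_shots for n in sorted(counts)}


p_goal = 0.086395   # goal probability reported in the text
shares = simulate_label_shares(10_695, p_goal, seed=1)
for n in range(1, 6):
    baseline = (1 - p_goal) ** (n - 1) * p_goal
    print(n, f"{shares.get(n, 0.0) - baseline:+.4%}")
```

Running this with several seeds shows the size of deviation that arises with no momentum effect at all, which gives a yardstick for judging the deviations observed in the real data.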
Assumptions and Considerations
For the primary analysis, “shots in between goals” carries over from game to game in our data set. For
example, if a goaltender made 10 consecutive saves at the end of a game, the counter would continue from
10 in the next game. This relies on the assumption that any “momentum” either carries over from game to
game or has its effect averaged out in our data set. We must also consider that not all games played are
present in our data set, so not all games contained within the data are consecutive. Additionally, our
analysis does not consider the fact that goaltenders may not play consecutive games for a number of
reasons, including injuries, rest days, or two-goalie teams. To investigate this, another analysis was
done using the same methodology, with the sole difference being that the start of each game resets the
shot counter. This takes as its underlying assumption that the beginning of a game is a consistent
starting point for a player, who can only improve or regress based on subsequent events. However, this
also results in an arbitrary cutoff at the end of a game, because the counter is reset by time running
out on the game clock rather than by an event such as a goal, which imposes an upper bound on the number
of consecutive shots. The results of this analysis are shown in the Appendix and were not considered
further because of the constraints mentioned. If possible, an even larger data set spanning an entire
season would be preferable. Previous studies have shown that game-to-game variation in performance does
exist among certain goaltenders; over the long term, however, performance averages out and is relatively
consistent.
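The per-game variant of the counter can be sketched as below. It assumes each record for a goaltender is a (game_id, is_goal) pair in chronological order; the function name and the game identifiers are hypothetical.

```python
def shots_since_last_goal_per_game(shots):
    """Label shots for one goaltender, resetting the counter after each goal
    and also at the start of every new game.

    `shots` is a chronological list of (game_id, is_goal) pairs.
    """
    labels = []
    counter = 0
    previous_game = None
    for game_id, is_goal in shots:
        if game_id != previous_game:
            counter = 0  # a new game starts a fresh run
            previous_game = game_id
        counter += 1
        labels.append(counter)
        if is_goal:
            counter = 0  # a goal also resets the run
    return labels


# Hypothetical records: two saves in game "g1", then save, goal, save in "g2".
shots = [("g1", False), ("g1", False), ("g2", False), ("g2", True), ("g2", False)]
print(shots_since_last_goal_per_game(shots))  # [1, 2, 1, 2, 1]
```

The first shot of game "g2" is labeled 1 rather than 3, which is exactly the truncation at the end of a game that the text identifies as a limitation of this variant.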
It is also interesting to note that, as a game progresses, goaltenders’ average performance changes
regardless of prior events. Figure 12 shows the average save % of all goaltenders in our data, grouped by
period. This indicates that there are already drivers that affect performance over the course of a game.
Figure 12: Average Save % per period
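A grouping like the one behind Figure 12 could be computed as sketched below. This assumes a period number can be derived for each shot on goal, which the text does not confirm; the function name and sample records are hypothetical.

```python
from collections import defaultdict


def save_pct_by_period(shots):
    """Average save % per period from (period, is_goal) pairs for shots on goal."""
    totals = defaultdict(int)
    goals = defaultdict(int)
    for period, is_goal in shots:
        totals[period] += 1
        if is_goal:
            goals[period] += 1
    return {p: (totals[p] - goals[p]) / totals[p] for p in sorted(totals)}


# Hypothetical records, not taken from the data set.
shots = [(1, False), (1, False), (1, True), (1, False),
         (2, False), (2, True),
         (3, False), (3, False), (3, False), (3, False)]
for period, pct in save_pct_by_period(shots).items():
    print(f"Period {period}: {pct:.1%}")
```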
With any analysis, there is always the possibility of other unaccounted-for drivers and causal variables
that also affect a player’s performance in relation to time and prior events. The above example is only
one potential driver. However, given the size of our sample, such effects should be averaged out in our
analysis. Finally, in any statistical analysis, the accuracy of insights and conclusions is only as good
as one’s data. Hockey-Graphs has readily acknowledged that imperfections in the data exist and that it is
not 100% accurate. We believe that, for our purposes, the shot attempt and goal data are of sufficient
accuracy and consistency for the analysis performed.
Conclusion
While trends are shown in our analysis of the NHL as a whole, it is important to realize that each
individual reacts in a different and unique way to various situations. We are not able to identify the
drivers behind any variations from a random distribution of goals and saves.
[Figure 12 chart: “Save % by Period”. Horizontal axis: Period 1, Period 2, Period 3. Vertical axis: save %, 90.0% to 92.5%.]

More Related Content

Viewers also liked

Vienna FinTech meetup #1
Vienna FinTech meetup #1Vienna FinTech meetup #1
Vienna FinTech meetup #1Patrick Pöschl
 
Robotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingen
Robotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingenRobotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingen
Robotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingenericdemeester
 
How to double_your_income_in_12_months_tip_3
How to double_your_income_in_12_months_tip_3How to double_your_income_in_12_months_tip_3
How to double_your_income_in_12_months_tip_3Shonda Miles
 
Bill of rights #justmyreport
Bill of rights #justmyreportBill of rights #justmyreport
Bill of rights #justmyreportbenjie villacote
 
SSOW Asia Prospectus 2016
SSOW Asia Prospectus 2016SSOW Asia Prospectus 2016
SSOW Asia Prospectus 2016Tom Winter
 
connection_brochure (1)
connection_brochure (1)connection_brochure (1)
connection_brochure (1)Anand Patel
 
cardinal health Q3 2008 Earnings Release
cardinal health Q3 2008 Earnings Releasecardinal health Q3 2008 Earnings Release
cardinal health Q3 2008 Earnings Releasefinance2
 
morgan stanley Earnings Archive 1st
morgan stanley Earnings Archive 1st morgan stanley Earnings Archive 1st
morgan stanley Earnings Archive 1st finance2
 
cardinal health Q1 2008 Earnings Presentation
cardinal health Q1 2008 Earnings Presentationcardinal health Q1 2008 Earnings Presentation
cardinal health Q1 2008 Earnings Presentationfinance2
 
High Value Products Capabilities August 2009
High Value Products Capabilities   August 2009High Value Products Capabilities   August 2009
High Value Products Capabilities August 2009michaelbro8
 

Viewers also liked (14)

Vienna FinTech meetup #1
Vienna FinTech meetup #1Vienna FinTech meetup #1
Vienna FinTech meetup #1
 
SVME Profile 2015
SVME Profile 2015SVME Profile 2015
SVME Profile 2015
 
CV K.Fokianou
CV K.FokianouCV K.Fokianou
CV K.Fokianou
 
RICH_MEDIA_WEBSITE_DESIGN_S08_V2
RICH_MEDIA_WEBSITE_DESIGN_S08_V2RICH_MEDIA_WEBSITE_DESIGN_S08_V2
RICH_MEDIA_WEBSITE_DESIGN_S08_V2
 
RTF344M_S08_SYLLABUS
RTF344M_S08_SYLLABUSRTF344M_S08_SYLLABUS
RTF344M_S08_SYLLABUS
 
Robotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingen
Robotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingenRobotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingen
Robotprogrammatie: enkele lessen uit de praktijk, trends en uitdagingen
 
How to double_your_income_in_12_months_tip_3
How to double_your_income_in_12_months_tip_3How to double_your_income_in_12_months_tip_3
How to double_your_income_in_12_months_tip_3
 
Bill of rights #justmyreport
Bill of rights #justmyreportBill of rights #justmyreport
Bill of rights #justmyreport
 
SSOW Asia Prospectus 2016
SSOW Asia Prospectus 2016SSOW Asia Prospectus 2016
SSOW Asia Prospectus 2016
 
connection_brochure (1)
connection_brochure (1)connection_brochure (1)
connection_brochure (1)
 
cardinal health Q3 2008 Earnings Release
cardinal health Q3 2008 Earnings Releasecardinal health Q3 2008 Earnings Release
cardinal health Q3 2008 Earnings Release
 
morgan stanley Earnings Archive 1st
morgan stanley Earnings Archive 1st morgan stanley Earnings Archive 1st
morgan stanley Earnings Archive 1st
 
cardinal health Q1 2008 Earnings Presentation
cardinal health Q1 2008 Earnings Presentationcardinal health Q1 2008 Earnings Presentation
cardinal health Q1 2008 Earnings Presentation
 
High Value Products Capabilities August 2009
High Value Products Capabilities   August 2009High Value Products Capabilities   August 2009
High Value Products Capabilities August 2009
 

Similar to Sports Aanalytics - Goaltender Performance

Senior Project Research Paper
Senior Project Research PaperSenior Project Research Paper
Senior Project Research Papercrissy498
 
Yujie Zi Econ 123CW Research Paper - NBA Defensive Teams
Yujie Zi Econ 123CW Research Paper - NBA Defensive TeamsYujie Zi Econ 123CW Research Paper - NBA Defensive Teams
Yujie Zi Econ 123CW Research Paper - NBA Defensive TeamsYujie Zi
 
Bank Shots to Bankroll Final
Bank Shots to Bankroll FinalBank Shots to Bankroll Final
Bank Shots to Bankroll FinalJoseph DeLay
 
Tactical Report Match Analysis
Tactical Report Match AnalysisTactical Report Match Analysis
Tactical Report Match AnalysisBrian VanDongen
 
The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...
The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...
The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...Sloan Sports Conference
 
Predicting Salary for MLB Players
Predicting Salary for MLB PlayersPredicting Salary for MLB Players
Predicting Salary for MLB PlayersRobert-Ian Greene
 
Pressure Index in Cricket
Pressure Index in CricketPressure Index in Cricket
Pressure Index in CricketIOSR Journals
 
Beyond Moneyball: Data Science for Baseball in 2019
Beyond Moneyball: Data Science for Baseball in 2019Beyond Moneyball: Data Science for Baseball in 2019
Beyond Moneyball: Data Science for Baseball in 2019Christopher Conlan
 
Identifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining ApproachIdentifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining ApproachJoelDabady
 
Measuring Team Chemistry in MLB
Measuring Team Chemistry in MLBMeasuring Team Chemistry in MLB
Measuring Team Chemistry in MLBDavid Kelly
 
OTDK angol
OTDK angolOTDK angol
OTDK angolM J
 
WageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletesWageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletesGeorge Ulloa
 
MathematicsResearch
MathematicsResearchMathematicsResearch
MathematicsResearchJohn Crain
 
Joe Kruger Report. OPTIMA
Joe Kruger Report. OPTIMAJoe Kruger Report. OPTIMA
Joe Kruger Report. OPTIMAJoe Kruger
 
Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...
Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...
Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...Steve Cultrera
 

Similar to Sports Aanalytics - Goaltender Performance (20)

LAX IMPACT! White Paper
LAX IMPACT! White PaperLAX IMPACT! White Paper
LAX IMPACT! White Paper
 
Senior Project Research Paper
Senior Project Research PaperSenior Project Research Paper
Senior Project Research Paper
 
Yujie Zi Econ 123CW Research Paper - NBA Defensive Teams
Yujie Zi Econ 123CW Research Paper - NBA Defensive TeamsYujie Zi Econ 123CW Research Paper - NBA Defensive Teams
Yujie Zi Econ 123CW Research Paper - NBA Defensive Teams
 
Bank Shots to Bankroll Final
Bank Shots to Bankroll FinalBank Shots to Bankroll Final
Bank Shots to Bankroll Final
 
Tactical Report Match Analysis
Tactical Report Match AnalysisTactical Report Match Analysis
Tactical Report Match Analysis
 
The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...
The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...
The Importance of Being Open: What Player Tracking Data Can Say About NBA Fie...
 
Predicting Salary for MLB Players
Predicting Salary for MLB PlayersPredicting Salary for MLB Players
Predicting Salary for MLB Players
 
Pressure Index in Cricket
Pressure Index in CricketPressure Index in Cricket
Pressure Index in Cricket
 
Beyond Moneyball: Data Science for Baseball in 2019
Beyond Moneyball: Data Science for Baseball in 2019Beyond Moneyball: Data Science for Baseball in 2019
Beyond Moneyball: Data Science for Baseball in 2019
 
Identifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining ApproachIdentifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining Approach
 
Measuring Team Chemistry in MLB
Measuring Team Chemistry in MLBMeasuring Team Chemistry in MLB
Measuring Team Chemistry in MLB
 
OTDK angol
OTDK angolOTDK angol
OTDK angol
 
B036307011
B036307011B036307011
B036307011
 
WageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletesWageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletes
 
Cricket predictor
Cricket predictorCricket predictor
Cricket predictor
 
MathematicsResearch
MathematicsResearchMathematicsResearch
MathematicsResearch
 
Joe Kruger Report
Joe Kruger ReportJoe Kruger Report
Joe Kruger Report
 
Joe Kruger Report. OPTIMA
Joe Kruger Report. OPTIMAJoe Kruger Report. OPTIMA
Joe Kruger Report. OPTIMA
 
Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...
Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...
Analysis_of_the_Impact_of_Weather_on_Runs_Scored_in_Baseball_Games_at_Fenway_...
 
Lineup Efficiency
Lineup EfficiencyLineup Efficiency
Lineup Efficiency
 

Sports Aanalytics - Goaltender Performance

  • 1. Jason Mei Performance Consistency and Scoring Randomness in Professional Hockey Momentum or “streakiness” as it is sometimes referred to is an often talked about phenomena in the world of sports. Whether it be a players whose shooting is on fire in basketball, or a goalie whose hot and making an abnormally high number of saves. This paper aims to seek out evidence to determine whether such a thing actually exists and what the potential drivers if any of observed phenomena. For the primary analysis, the performance of goaltenders and their ability to make saves in the sport of hockey has been used. The decision to use this data stems from the increased sample size and frequency of data that can be collected in a game, as well as its availability. Goalies are certain to face a number of shots (and conversely make saves) in every game they play in. In a standard NHL game in the 2015-2016 season, an average of approximately 30 shots land on goal in every game. This cannot be said of players shooting the puck who often go whole games without a shot on goal. It is widely believed that a players confidence and mental state has a significant effect on performance, it is a widespread practice for a goaltender to be replaced in the middle of a game after a poor start or stretch of letting multiple goals in. Alternatively, goalies who are “hot” are often played more frequently in succession. Given the number of situations and shots that a professional goaltender will face in a 80 game NHL season, the occurrence of improbable events such as stretches of consecutive saves as well as “grouping” of goals in short span are statistically inevitable. We will try and determine if these occurrences are the result of external drivers or simply an inevitable statistically probabilistic event occurring in a large enough sample size. 
The base hypothesis will be to test the assertion “Does having a goal scored against them effect subsequent performance for goaltenders?” Some analysis has been done previously determining variance of a player’s performance game to game with respect to this. We aim here to establish if the level of performance fluctuates during a game based on the events that occur within them. The Data and Basic Terminology Data utilized was taken from the passing project data release #2 for the 2015-2016 courtesy of Hockey- Graphs. The data set is the result of collective efforts and countless man hours from a number of individuals tracking information over the course of many games. Data utilized was as many games as they had available for all teams. The data set contains information on each shot taken in the NHL, whether it resulted in a goal and whether it was on goal. Additional data on the date, time, shooter, and goaltender are all given. The first thing done with the data was to properly isolate and identify individual goaltenders. The data was then filtered to eliminate goaltenders who only played a single game as well as all shots that were not on goal. Quality and skill among goaltenders is often measured by two key statistics, Goals Against per game, and Save %. We choose to use the latter as the primary metric for our analysis as it best normalizes for variations between games and skill levels of teams. The metric is calculated by simply dividing the amount of saves made by a goaltender over the number of shots against them, or alternatively (Total shots-Goals against)/Total Shots. A higher save % number would indicate that a goaltender is more skilled or performing better in a given situation.
  • 2. Jason Mei Data Points 10,695 Individual Games 265 # of goaltenders 69 Range of games October 2015-Feb 2016 Average Save % 91.4% Figure 1. Summary of data utilized in analysis The resulting data set was cleaned and sorted to allow us to see each shot on goal and whether that shot resulted in a goal for every goaltender who has played a minimum amount of games this season. The data was then sorted to isolate each individual goaltender and list the sequential shots taken against them for the season (in order as the season progresses). A column was created entitled “Shots since last goal” excel was set up to count each progressive shot taken after each goal is scored and resets the counter once a goal is scored on the goaltender. A corresponding number is assigned to each shot taken against the goaltender. Illustratively, if 10 shots were landed on a goaltender since the last goal was scored, the shots would be labeled 1 through 10. If a goal is scored, the counter resets and the next shot taken would be assigned a 1 subsequently increasing. Statistical Foundation and Bernoulli Distributions Penned after Swiss scientist and mathematician Jacob Bernoulli a Bernoulli distribution represents the probability distribution of a random binary event happening. The equation takes the form: 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑜𝑓 𝑡𝑟𝑖𝑎𝑙𝑠 𝑢𝑛𝑡𝑖𝑙 𝑓𝑖𝑟𝑠𝑡 𝑠𝑢𝑐𝑐𝑒𝑠𝑠 = (1 − 𝑃) 𝑛−1 𝑃 Where P is the probability of an event occurring on the nth trial: in our case having a goal scored The distribution, takes the shape of a curve that asymptotically approaches 0 and has a cumulative sum of 1 as n approaches infinity. The shape of the curve varies with the probability of the event (save %) as demonstrated in figure 2.
Figure 2: Example Bernoulli probability distributions for various save %’s

As such, the probability of a goal being scored on the first shot (n = 1) is equal to P, the probability of the first goal coming on the second shot is (1 - P)P, and so on. Note that since this is a probability distribution over when the first goal occurs, the probability for the nth shot decreases to account for the reduced likelihood of reaching that shot without a goal. That is, the probability asymptotically approaches zero as n increases, reflecting the growing likelihood that a goal has already been scored on a previous shot.

The key underlying assumption of the distribution is that the probability of a goal is the same on every shot; that is, there is no difference in scoring probability between the first shot and the 20th shot against a goaltender. This allows us to compare the Bernoulli distribution with data collected from the 2015-2016 NHL season. Any deviation from this random distribution would indicate that factors which are not statistically random are driving the distribution of goals.

Analysis

Using the refined data set, a conditional analysis was performed summarizing the data in the “Shots since last goal” column with a count of the occurrences of each number. Dividing these counts by the total number of shots in our data set, we calculate the probability of the nth shot being taken (i.e., 2 shots between goals, 10 shots, 20 shots, etc.). This newly created distribution can then be compared to a Bernoulli distribution using the event probability of our data set.

Our first analysis is a high-level overview of the entire data set. We take a count of each of the shots taken and compare it to the random distribution implied by the measured goal-scoring probability of 0.086395 over our data set (an implied average save % of 0.914 overall). The bars in subsequent charts show the variance between our measured statistics and the calculated random distribution.
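The counting and comparison steps described above can be sketched in Python. This is one reading of the method, not the original Excel workbook: the function names are hypothetical, shots are assumed to be encoded as True for a goal and False for a save, and the shot on which a goal is scored is assumed to keep its running number before the counter resets.

```python
from collections import Counter


def shots_since_last_goal(outcomes):
    """Label each shot 1, 2, 3, ... and restart at 1 on the shot after a goal.

    `outcomes` lists one goaltender's shots in chronological order,
    True for a goal and False for a save (an assumed encoding).
    """
    labels = []
    count = 0
    for is_goal in outcomes:
        count += 1
        labels.append(count)
        if is_goal:
            count = 0  # the next shot will be labeled 1
    return labels


def empirical_distribution(labels):
    """Share of all shots carrying each counter value."""
    counts = Counter(labels)
    total = len(labels)
    return {n: counts[n] / total for n in sorted(counts)}


def deviation_from_random(outcomes):
    """Observed share minus (1 - P)^(n-1) * P, with P measured from the data."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    p_goal = sum(outcomes) / len(outcomes)
    observed = empirical_distribution(shots_since_last_goal(outcomes))
    return {n: share - (1.0 - p_goal) ** (n - 1) * p_goal
            for n, share in observed.items()}


# Tiny made-up sequence: save, save, goal, save, goal, goal, save.
example = [False, False, True, False, True, True, False]
print(shots_since_last_goal(example))  # prints [1, 2, 3, 1, 2, 1, 1]
print(deviation_from_random(example))
```

Positive values mark counter values that occur more often than the random model predicts, and negative values mark those that occur less often.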
[Figure 2 chart: "Probability of goal on nth shot for various save %"; x-axis: shot number n (1 to 32); y-axis: probability (0% to 16%); series: 85%, 90%, 95%.]
Figure 3: Actual goal distribution vs. random distribution for entire data set. [Chart "Goaltender Save % vs. Random Distribution"; axes: Shots Between Goals, Event Probability, Variance; series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution, Cumulative Variance.]

At first glance, our measured data closely follows a random distribution. This leads to the preliminary conclusion that, taken as a whole, NHL goaltenders show no significant deviation in making saves in relation to previous events. However, there is a noticeable pattern of deviation within our data, particularly in roughly the first five shots. The probability of a goal being scored is noticeably higher in the first five shots following a goal. After that, the probability falls below the random distribution before the two values begin to converge. This indicates that, as a whole, a goaltender is more likely to let another goal in during the first five shots following a goal. However, once five consecutive saves have been made, they are more likely to make a save on subsequent shots and thus less likely to let a goal in.

Further analyzing our data, we look at the performance of individual players to determine whether this is a universal trend or one confined to certain players who are overrepresented in or skewing our data. A similar analysis is performed on the shots taken against individual goaltenders, using each player’s individual save % from the data set and the shots taken against them. We investigate the goaltenders who had the most shots taken against them in our sample, as well as the top- and bottom-ranked players in terms of save %. Finally, the deviations of these players are averaged and displayed to show the combined result for the goaltenders with the most playing time in the NHL.
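Figure 3 plots a "Cumulative Variance" series alongside the per-shot variance. A minimal Python sketch of how such a running total could be computed follows, assuming it is the running sum of the differences between the measured probability and the random model. The function name `cumulative_deviation` and the measured values in the example are invented for illustration.

```python
from itertools import accumulate


def cumulative_deviation(observed, p_goal):
    """Running sum of (observed - random model) by shot number.

    observed[i] is the measured probability for shot number n = i + 1;
    the random model for that shot is (1 - P)^(n-1) * P.
    """
    deviations = [obs - (1.0 - p_goal) ** i * p_goal
                  for i, obs in enumerate(observed)]
    return list(accumulate(deviations))


# Made-up measured probabilities for shots 1 to 3 and a goal probability of 0.1.
running = cumulative_deviation([0.11, 0.095, 0.08], 0.1)
print([round(value, 4) for value in running])
```

Reading the running total at n = 5 would give the cumulative excess chance of a goal over the first five shots after a goal.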
Figure 4: Analysis for Ben Bishop. Sample size = 432, average save % = 90.8%
Figure 5: Analysis for Braden Holtby. Sample size = 437, average save % = 92.2%
[Charts for Figures 4 and 5; axes: Probability of goal, Variance vs. Random; series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
Figure 6: Analysis for Corey Crawford. Sample size = 773, average save % = 93.0%
Figure 7: Analysis for Sergei Bobrovsky. Sample size = 193, average save % = 86.5%
[Charts for Figures 6 and 7; axes: Probability of goal, Variance vs. Random; series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
Figure 8: Analysis for Tuukka Rask. Sample size = 326, average save % = 89.6%
Figure 9: Analysis for Cory Schneider. Sample size = 884, average save % = 92.9%
[Charts for Figures 8 and 9; axes: Probability of goal, Variance vs. Random; series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
Figure 10: Analysis for James Reimer. Sample size = 298, average save % = 90.9%. [Chart; axes: Probability of goal, Variance vs. Random; series: Variance, Goal Probability Distribution, Bernoulli (Random) Distribution.]
Figure 11: Average Variance of Analyzed Players. [Chart: average variance by shot number.]

Result Analysis

Our analysis of the league as a whole shows that, on average, there is a noticeable effect on the performance of goaltenders after a goal is scored against them. This is primarily evident within the first 5 shots after letting a goal in, where there is a cumulative 1.87% increase in the chance of a goal being let in. This can be theorized to result from the impact on a goaltender’s confidence of letting a goal past them. Making subsequent saves in a game allows them to regain this “confidence” and regress to a performance more representative of their mean.

However, looking at players individually, we see that this pattern varies broadly. This is an expected outcome, as each player reacts to and internalizes events in a different manner. Goaltenders such as Cory Schneider and Ben Bishop are less likely to make a save after letting a goal in. Conversely, players such as Braden Holtby and Corey Crawford actually perform better after having a goal scored against them. This can perhaps be attributed to increased focus and effort following the failure to make a save. Though our model does not identify any drivers, we can make assumptions as to what drives the results of our analysis.

Understanding these variances in each individual’s performance can lead to better decision making in game-time situations. Goaltenders who demonstrate increased performance after having a goal scored against them should potentially not be pulled and replaced after a poor start, as they show no statistical evidence of decreased performance. Alternatively, certain players are more prone to decreased levels of performance, and these effects should be considered by coaching staff.

Ultimately, though the data deviates from a random distribution, suggesting that goals and saves are not purely randomly distributed, the absolute amount of variation is relatively insignificant. For the aggregate of our data sample, as well as the vast majority of individual players, the actual distribution of goals and saves still closely resembles the random Bernoulli distribution. For many professional goaltenders, the effect of preceding outcomes on performance is insignificant and likely not large enough to act upon.
Assumptions and Considerations

For the primary analysis, the “Shots since last goal” count carries over from game to game in our data set. For example, if a goaltender made 10 consecutive saves at the end of a game, the counter would continue from 10 in the next game. This relies on the assumption that any “momentum” either carries over from game to game or is averaged out in our data set. We must also consider that not all games played are present in our data set; thus not all games contained within the data are consecutive. Additionally, our analysis does not account for the fact that goaltenders may not play consecutive games for a number of reasons, including injuries, rest days, or two-goalie teams.

To investigate this, another analysis was done using the same methodology, with the sole difference being that the start of a game resets the shot counter. This takes as its underlying assumption that the beginning of a game is a consistent starting point for a player, who can only improve or regress based on subsequent events. However, this also results in an arbitrary final count at the end of a game, since the counter is reset by time running out on the game clock rather than by an event such as a goal, which imposes an upper bound on the number of consecutive shots. The results of this analysis are shown in the Appendix and were not considered because of the constraints mentioned.

If possible, an even larger data set spanning an entire season would be preferable. Previous studies have shown that game-to-game variation in performance does exist among certain goaltenders; however, performance averages out over the long term and is relatively consistent.

It is also interesting to note that average goaltender performance changes as a game progresses, regardless of prior events. Figure 12 shows the average save % of all goaltenders by period within our data. This indicates that there are already drivers that affect performance as a game progresses.

Figure 12: Average Save % per period. [Chart "Save % by Period": save % (axis from 90.0% to 92.5%) for Period 1, Period 2, and Period 3.]

With any analysis, there is always the possibility of other unaccounted-for drivers and causal variables that also affect a player’s performance in relation to time and prior events. The above example is only one potential driver. However, given the size of our sample, the effects of such drivers should be accounted for and averaged out in our analysis.

Finally, in any statistical analysis, the accuracy of insights and conclusions is only as good as one’s data. Hockey-Graphs has readily acknowledged that imperfections exist in the data and that it is not 100% accurate. We believe that, for our purposes, the shot attempt and goal data are of sufficient accuracy and consistency for the analysis performed.

Conclusion

While trends are shown in our analysis of the NHL as a whole, it is important to realize that each individual reacts in a different and unique way to various situations. We are not able to identify the drivers behind any deviations from a random distribution of goals and saves.