A Probabilistic Graphical Model for
Simulating Basketball Matches
John Crain and Liza Spencer
May 8, 2016
Abstract
Existing frameworks for simulating a basketball match while still
considering the unique identities of individual players have not included
every action that can occur during a game, notably excluding the
possibility of a player dribbling. We extend the model originally developed
by Oh et al. [5] to include the actions associated with dribbling a
basketball, and we modify other aspects of their model so that it covers
all events that could occur in a possession and all realistic outcomes.
1 Introduction
Graph theory is a branch of mathematics that studies combinatorial
structures called graphs, which are configurations of vertices and edges
[1]. While many branches of mathematics involve calculation and meticulous
measurement, graph theory began with puzzles intended to test the solver's
cleverness [2]. Having attracted the interest of many mathematicians, it
has since grown into a field rich in depth and theoretical results [2].
Graph theory began with the Swiss mathematician Leonhard Euler, who in 1735
answered an old puzzle called the Königsberg bridge problem, which asks for
a path that crosses each of seven bridges exactly once [3]. Euler showed
that no such path exists, which was the first result ever proven in graph
theory. Graph theory has many applications, including chemistry, computer
science, the social sciences, and model building [3]. Our research focuses
on this last application, and more specifically on how such a model can be
used to predict the outcome and progression of a basketball game.
The use of mathematical techniques to analyze sports, while fairly new, has
become a very popular basis for decision-making among front-office
executives. Predicting the outcome of a sporting event has intrigued many
for years, but predictions for basketball have been limited by a lack of
statistics measuring many of the actions that occur during a game. While
statistics such as points, rebounds, and assists have been tracked for many
years, data concerning the progression of a match has been uncommon.
Because of this, many conventional approaches have focused on win-loss
predictions only, placing no value on how the match might progress from
start to finish.
However, more information on individual players' behaviors outside of these
traditionally documented stats is now available because of the new player
tracking system used by the NBA. This system uses cameras to track the
movement of every player in a game 25 times per second, and from that
information it provides data on the number of touches a player receives and
how he moves while on the court, among many other things [4]. With these
resources, more detailed models examining the progression of a basketball
game, sometimes referred to as "microsimulation" [5], are possible. Shirley
[6] and Strumbelj and Vracar [7] used a Markov model of the progression of
a basketball game to make such a microsimulation possible. However, they
ignore the individual identities of the players on each team. Since one
basketball team has many different 5-player combinations that it can use,
the progression of a game depends greatly on the offensive and defensive
lineups in the game at a given point.
The paper that served as the primary motivation for this research used a
probabilistic graphical model to simulate how a basketball game progresses
[5]. The authors account for the unique identities of different players by
including in their model how the ball moves between players in each
offensive possession and the actions that may happen before or after
passing. Relationships between each player and possible actions are
illustrated by a graph, where each vertex represents a player or an action
and each edge represents the movement from one action to the next.
Probability functions were found for each action, and using the player
tracking data described earlier, the model provided a way to simulate a
game using any combination of 5-player offensive and defensive lineups.
While this model accounted for some player interaction, it completely
ignored the potential for a player to dribble the ball. Cervone et al. [8]
performed a micro-analysis of basketball games, and while their intent was
not to simulate a game, they quantitatively evaluated every decision made
by a player in the course of a possession, and the possibility of a player
dribbling the ball played a large role in that evaluation. Therefore, a
model used for microsimulation of basketball games should include this very
important aspect of a typical possession.
Our model builds upon the one first introduced by Oh et al. [5] by
including the possibility of dribbling to the basket (called a "drive" in
our model), distinguishing between open and contested shots, including the
possibility that a shot is blocked, and modifying the effect that defense
has on many actions. Our complete model diagramming a possession is shown
in Figure 1.
Figure 1: Graphical Model showing possible flow of events in every possession
2 Method
Our model uses player tracking data and team tracking data, which we found
on NBA.com [4], and other statistics such as shot tracking data and lineup
data, found on NBAsavant.com [10]. We model each possession using the
graphical model shown in Figure 1, where the vertices represent every event
in a game and the edges between vertices represent the movements from one
event to the next. The probability functions shown below determine the
weight on each edge of the graph, giving the likelihood that the edge is
travelled.
Table 1: Summary of Notation

| Notation | Meaning | Section |
|---|---|---|
| $L_o$ | Offensive team lineup | 2.2, 2.3, 2.7, 2.10 |
| $L_d$ | Defensive team lineup | 2.2, 2.3, 2.5, 2.6, 2.8, 2.10 |
| $\beta_i$ | Player i's propensity to take a non-driving shot | 2.2 |
| $\tilde{\beta}_i$ | Player i's propensity to take a non-driving shot given the defensive lineup | 2.2 |
| $\gamma_i$ | Player i's ability to deter a non-driving shot attempt | 2.2 |
| $b_i$ | Player i's propensity to drive to the basket | 2.3 |
| $\tilde{b}_i$ | Player i's propensity to drive given the defensive lineup | 2.3 |
| $g_i$ | Player i's ability to deter an attempt to drive to the basket | 2.3 |
| $\alpha_{ij}$ | Tendency of player i to pass to player j | 2.7 |
| $\theta_{id}$ | Non-driving shooting ability of player i at basis d | 2.5 |
| $\phi_{id}$ | Defensive ability of player i to reduce non-driving shot accuracy at basis d | 2.5 |
| $c_i$ | Driving shot accuracy of player i | 2.6 |
| $f_i$ | Defensive ability of player i to reduce driving shot accuracy | 2.6 |
| $r_{ij}$ | Tendency of player i to pass to player j after a drive | 2.7 |
| $v_i$ | Player i's propensity to have a shot blocked | 2.8 |
| $z_i$ | Player i's propensity to block a shot | 2.8 |
| $w_{ij}$ | Weight that determines how much player i is affected by the defense of player j | 2.2, 2.3, 2.5, 2.9 |
| $q_{ij}$ | Weight giving likelihood that player j is defending player i at the end of a drive | 2.6, 2.8 |
| $\psi_{id}$ | Player i's ability to draw a shooting foul at basis d | 2.9 |
| $\zeta_{id}$ | Player i's foul proneness at basis d | 2.9 |
| $\rho^d_i$ | Defensive rebound grabbing ability of player i | 2.10 |
| $\rho^o_i$ | Offensive rebound grabbing ability of player i | 2.10 |
| $\tau_a$ | Average number of offensive possessions of team a every 48 minutes | 2.12 |

In the next few sections, we explain how all of these parameters are used to
determine the weight assigned to each edge of the graphical model.
2.1 Start of a Possession
We model the start of a possession just like Oh et al. [5] did, as a multinomial
distribution between the players on the court at any given time. Our model
considers the start of a possession to be whenever there is a change in the
team that has the ball, such as a defensive rebound after a missed shot, an
inbound pass after a made shot, or a steal. When a possession begins as an
inbound pass, we look at historical backcourt touch data to see which player
is most likely to receive the inbound pass and thus, begin the possession.
Methods to find rebound and steal probabilities are discussed in sections
2.10 and 2.11 respectively. In the case of a steal or rebound, the player that
begins the possession is whichever player got the rebound or steal.
2.2 Non-Driving Shot Frequency
As in Oh et al. [5], we model the probability of a field goal attempt (the
term "field goal" is interchangeable with the term "shot" in basketball) on
a given touch as a Bernoulli distribution. Shots taken on drives to the
basket are excluded here and are discussed in later sections. The
probability is

$$p(S^n_i = 1 \mid L_o, L_d, \beta, \gamma) = \sigma\left(\tilde{\beta}_i + \left(\tilde{\beta}_i - \frac{1}{4}\sum_{k \in L_o,\, k \neq i} \tilde{\beta}_k\right)\right),$$

where

$$\tilde{\beta}_i = \beta_i - \sum_{j \in L_d} w_{ij}\gamma_j,$$

and $\sigma(x) = e^x/(1 + e^x)$ is included to ensure that the output values
are between 0 and 1. $S^n_i$ is an indicator for whether player i attempts
a non-driving shot given a touch. $L_o$ and $L_d$ represent the lineups of
the offensive and defensive teams respectively. $\beta_i$ is a parameter
which determines the likelihood of player i taking a shot, and $\gamma_j$
is the defensive ability of player j to deter a shot attempt. The weight
$w_{ij}$ determines the likelihood that player j is defending player i on
player i's initial touch. We determine that weight according to both
similarity of offensive and defensive abilities and similarity of position
groupings (guard or forward) for player i and defensive player j (see
footnote 2). For example, if player i is the best offensive guard, we give
more weight to the best defensive guard on the opposing team than to the
worst defensive guard, but we give more weight to the worst defensive guard
than to the best defensive forward.
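
As a rough illustration of how this shot-frequency probability could be evaluated, the Python sketch below computes a defense-adjusted propensity and the teammate offset for one player. The function names, the dictionary layout, and every numeric value are hypothetical choices made for this example; none of them come from the paper's data.

```python
import math


def sigmoid(x):
    """Logistic function e^x / (1 + e^x); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adjusted_propensity(propensity, weights, deterrence):
    """Offensive propensity minus the weighted defensive deterrence.

    weights[j] is the matchup weight w_ij for defender j (assumed to sum
    to 1) and deterrence[j] is that defender's ability to deter the action.
    """
    return propensity - sum(weights[j] * deterrence[j] for j in weights)


def action_probability(player, adjusted):
    """Probability that `player` takes the action on a given touch.

    `adjusted` maps each offensive player on the court to his adjusted
    propensity; the offset compares the player with his teammates' mean.
    """
    teammates = [p for p in adjusted if p != player]
    mean_others = sum(adjusted[p] for p in teammates) / len(teammates)
    offset = adjusted[player] - mean_others
    return sigmoid(adjusted[player] + offset)


# Hypothetical adjusted propensities for a five-player lineup.
lineup = {"A": 0.5, "B": 0.4, "C": 0.2, "D": -0.3, "E": -0.7}
p_shot_a = action_probability("A", lineup)
```

Because the offset compares a player with the mean of his four teammates, the same helper could be reused for any action modeled with this functional form.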
Note that our method of determining $w_{ij}$ differs from that of Oh et al.
[5], who used only the similarity of the exact positions of player i and
defensive player j. The justification for this is that defensive matchups
are determined by more than just positional similarity. For example, if the
point guard on the offensive team is the best offensive guard and the point
guard on the defensive team is the worst defensive guard, then the
offensive point guard will likely not be defended by the defensive point
guard. Rather, he will likely be defended by a different defensive guard
with higher defensive ability, even though that defender may not be the
defensive team's point guard.

Footnote 2: There are five positions on a basketball team: point guard,
shooting guard, small forward, power forward, and center. The point guard,
shooting guard, and small forward can be grouped more generally as "guards"
while the power forward and center can be grouped together as "forwards."

2.3 Drive Frequency
We model the probability of a player driving the ball on a given touch as a
function of the offensive and defensive lineups, the offensive player's
tendency to drive, and the defender's ability to deter an attempt to drive.
It is modeled as a Bernoulli distribution with probability

$$p(D_i = 1 \mid L_o, L_d, b, g) = \sigma\left(\tilde{b}_i + \left(\tilde{b}_i - \frac{1}{4}\sum_{k \in L_o,\, k \neq i} \tilde{b}_k\right)\right),$$

where

$$\tilde{b}_i = b_i - \sum_{j \in L_d} w_{ij} g_j.$$

The variable $D_i$ is an indicator for whether player i drives on a given
touch. The parameter $b_i$ describes how likely player i is to drive to the
basket, and $g_j$ is player j's defensive ability to deter an attempt to
drive to the basket. Therefore, $\tilde{b}_i$ is player i's propensity to
drive while accounting for the current 5-player defensive lineup. As
described in the previous section, $w_{ij}$ is a weight determining the
likelihood that on player i's initial touch he is being defended by any
given defensive player $j \in L_d$. Note that $\sum_{j \in L_d} w_{ij} = 1$,
so $\sum_{j \in L_d} w_{ij} g_j$ is a weighted average of the defensive
lineup's ability to deter an attempt to drive, with more weight given to
the defensive players who are most likely to be guarding offensive player
i. Thus, if an offensive player is being defended by a poor defender, he is
more likely to drive to the basket, and vice versa.

We also have an offset term,
$\tilde{b}_i - \frac{1}{4}\sum_{k \in L_o,\, k \neq i} \tilde{b}_k$, similar
to the one first introduced by Oh et al. in their original function for
shot frequency [5]. We include this in the model because a player's
decision to drive depends not only on his own tendency to drive and his
defender, but also on his teammates' propensities. This term is positive
(or negative) if the propensity of player i to drive is greater (or less)
than the average propensity of his teammates to drive. For example, if
LeBron James and Kyrie Irving are on the court and Kyrie Irving is replaced
by a taller, slower forward, LeBron James is likely to drive to the basket
more.
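
To make the effect of the offset term concrete, the worked example below uses purely hypothetical adjusted propensities (they are not estimates for any real player). Player i has adjusted propensity 0.8, and one teammate with adjusted propensity 0.6 is replaced by a teammate with adjusted propensity -0.6, while the other three teammates stay at 0.0, -0.2, and -0.4.

```latex
\begin{align*}
\text{before: } & \tfrac{1}{4}\sum_{k \neq i}\tilde{b}_k
    = \tfrac{1}{4}(0.6 + 0.0 - 0.2 - 0.4) = 0, \\
  & p(D_i = 1) = \sigma\bigl(0.8 + (0.8 - 0)\bigr) = \sigma(1.6) \approx 0.832, \\
\text{after: } & \tfrac{1}{4}\sum_{k \neq i}\tilde{b}_k
    = \tfrac{1}{4}(-0.6 + 0.0 - 0.2 - 0.4) = -0.3, \\
  & p(D_i = 1) = \sigma\bigl(0.8 + (0.8 + 0.3)\bigr) = \sigma(1.9) \approx 0.870.
\end{align*}
```

Under these assumed values, swapping in a teammate who is less inclined to drive raises player i's drive probability on a touch from about 0.83 to about 0.87, which matches the direction of the example in the text.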
2.4 Result of a Drive
Given that a player drives to the basket, he has two main options for what
to do next: he can either shoot the ball or pass it back out to a teammate.
We calculate the average probability of a shot attempt per drive for each
player from historical data and, because of data constraints, use it
independently of the current lineup. A drive can also end in a turnover or
a drawn shooting foul; we use historical data to determine the likelihood
of these outcomes as well.
2.5 Non-Driving Shot Efficiency
We model non-driving shot efficiency with an approach very similar to that
of Oh et al. [5]. Given that a player attempts a non-driving field goal, we
model the shot efficiency (the probability that the shot goes in) as a
function of the defensive lineup, the shot location, the shooting ability
of the offensive player, and the ability of the defense to reduce shot
efficiency. It is modeled as a Bernoulli distribution with probability

$$p(Y_i = 1 \mid L_d, d, \theta, \phi) = \sigma\left(\theta_{id} - \sum_{j \in L_d} w_{ij}\phi_{jd}\right).$$

The variable $Y_i$ is an indicator for whether player i made the shot, d
represents the "basis," or location, where the shot was taken,
$\theta_{id}$ is player i's shooting ability at basis d, $\phi_{jd}$ is the
defensive ability of player j to reduce player i's shot accuracy at basis
d, and $w_{ij}$ is the weight discussed in previous sections. Note that our
model differs from that of Oh et al. [5] in that we include a weighted
average of the entire defense's ability to reduce shot accuracy at basis d
rather than just the ability of the defender at the time of the shot. We do
this because, in practice, it is difficult to determine who the defender
would be in an actual simulation.

In Figure 1, note that our model distinguishes between open and contested
shots, unlike the one proposed by Oh et al. [5]. In their model, the
defensive ability of the closest defensive player j always reduces shot
accuracy, no matter the distance between player j and the shooter. However,
if the closest defender is far enough away from the shooter, his defensive
ability will likely have no impact on the shooter's accuracy. This is our
justification for distinguishing between open and contested shots. We
define an "open shot" as one where the closest defender is five or more
feet away from the shooter. To model whether a shot is open, we calculate
the percentage of total non-driving shots taken by player i that are open
and use this as the probability that a non-driving shot taken by player i
is open. (The probability that a non-driving shot attempt is contested is
then $1 - p(\text{open})$.) Given that a shot is open, we set the defense's
ability to reduce shot accuracy equal to zero, as it would have no impact
on player i's ability to make the shot, and player i's non-driving shot
efficiency becomes a function only of his own shooting ability at basis d.

We sample the shot location from six bases: the restricted area, the rest
of the paint (see footnote 4), two-point shots outside the paint,
three-pointers from the center of the court, three-pointers from the left
corner, and three-pointers from the right corner. These were the bases
originally used by Franks et al. [9], except that Franks et al. combined
corner three-point shots into one basis. After examining players' shooting
data and seeing that their shooting abilities differed greatly depending on
which corner they shot from, we determined that shots from the left and
right corners were intrinsically different shots and separated them into
two bases. Therefore, when player i takes a non-driving shot, we sample the
shot location from historical data on the percentage of non-driving shots
that player i takes from each of the six bases.

To summarize, when a player decides to take a non-driving shot, we first
sample whether the shot is open or contested, then we sample the shot
location from the six bases, and then the non-driving shot efficiency model
gives the success probability of the shot.
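
The three sampling steps for a non-driving shot can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the basis labels, function names, argument layout, and all numbers in the usage lines are hypothetical, and the logistic form of the make probability follows the model in this section.

```python
import math
import random

# Hypothetical labels for the six bases described in this section.
BASES = ["restricted_area", "paint", "two_outside_paint",
         "center_three", "left_corner_three", "right_corner_three"]


def sigmoid(x):
    """Logistic function e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_probability(theta, is_open, weights, phi):
    """Success probability of a non-driving shot at one basis.

    theta is the shooter's ability at the basis; weights[j] and phi[j] are
    the matchup weight and shot-reducing ability of defender j at that
    basis. An open shot sets the defensive term to zero.
    """
    defense = 0.0 if is_open else sum(weights[j] * phi[j] for j in weights)
    return sigmoid(theta - defense)


def sample_non_driving_shot(rng, p_open, location_shares, theta_by_basis,
                            weights, phi_by_basis):
    """Sample open/contested, then the basis, then whether the shot is made."""
    is_open = rng.random() < p_open
    basis = rng.choices(BASES, weights=[location_shares[b] for b in BASES])[0]
    p_make = make_probability(theta_by_basis[basis], is_open, weights,
                              phi_by_basis[basis])
    made = rng.random() < p_make
    return is_open, basis, made


# Usage with made-up inputs (illustrative values only).
rng = random.Random(0)
shares = dict(zip(BASES, [0.30, 0.10, 0.25, 0.20, 0.08, 0.07]))
theta = dict.fromkeys(BASES, 0.1)
w = {"d1": 0.7, "d2": 0.3}
phi = {b: {"d1": 0.3, "d2": 0.1} for b in BASES}
result = sample_non_driving_shot(rng, 0.25, shares, theta, w, phi)
```

Seeding the generator only makes the illustration repeatable; a real simulation would draw fresh random numbers on every shot.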
2.6 Driving Shot Efficiency
Our model for driving shot efficiency is similar to our model for
non-driving shot efficiency. Given that a player has driven to the basket
and decides to shoot rather than pass, we model driving shot efficiency
(the probability that a shot taken on a drive is made) as a function of the
defensive lineup, the shooting ability of the offensive player, and the
defensive player's ability to reduce driving shot accuracy. It is modeled
as a Bernoulli distribution with probability

$$p(X_i = 1 \mid L_d, c, f) = \sigma\left(c_i - \sum_{j \in L_d} q_{ij} f_j\right).$$

Here $X_i$ is an indicator for whether the shot is made, $c_i$ is the
driving shot accuracy of player i, and $f_j$ is the defensive ability of
player j to reduce driving shot accuracy. As in previous functions where
the defender's ability plays a role, we use the weighted average of the
defense. However, the weight $q_{ij}$ used here determines the likelihood
that player j is defending player i at the end of a drive, which differs
from the weight $w_{ij}$ used in previous functions. The reason for this
change is that on a drive, players move closer to the rim and often draw a
different defender than the one who was guarding them when they began their
drive. For example, if Stephen Curry drives past his defender to the
basket, it is likely that a larger defender will step over to help, and so
the ability of that help defender will play a larger role in Curry's
ability to make a driving shot than the defensive ability of the player who
was initially guarding him.

Footnote 4: The paint is the rectangular area surrounding the basket.

As with non-driving shot efficiency, we also model whether the driving shot
is open by calculating the percentage of total driving shots taken by
player i that are open and using this as the probability that a driving
shot taken by player i is open. When the shot is open, the term
representing the defense's ability to reduce driving shot accuracy,
$\sum_{j \in L_d} q_{ij} f_j$, is set to zero. Note that our model for
driving shot efficiency does not include the basis d that was included for
non-driving shot efficiency. The justification is that all driving shots
are taken close to the rim and are therefore in roughly the same location.
2.7 Pass Network
We model the passes between players just as Oh et al. [5] did: as a network
with edge weights parameterized by $\alpha_{ij}$ ($i \neq j$). Given that
player i chooses to pass, the probability that he passes to player j is

$$p(i \to j \mid \alpha, L_o) = \frac{\alpha_{ij}}{\sum_{k \neq i,\, k \in L_o} \alpha_{ik}}.$$

Thus the probability that player i passes to player j depends on all five
offensive players. We use the Expectation Maximization (EM) algorithm on
data of the total number of passes between players (excluding passes after
drives) and the total number of possessions each lineup had in every game
to determine the $\alpha$ matrix.

We use this same framework to model passes between players after a drive.
That is, given that player i drives to the basket and decides to pass, the
probability that he passes to player j is given by

$$p(i \to j \mid r, L_o) = \frac{r_{ij}}{\sum_{k \neq i,\, k \in L_o} r_{ik}}.$$

Here, we use the EM algorithm on data of the total number of passes between
players after drives and the total number of possessions each lineup had in
every game to determine the matrix of $r_{ij}$ values. We separate the two
networks because we believe that the criteria for selecting a player to
pass to after a drive differ from the criteria for selecting a player to
pass to before a drive. For example, after an offensive player drives to
the basket and causes the rest of the defense to shrink towards the basket,
he is more likely to pass to a great shooter who may not have been as open
before the drive.
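
The normalization used by both pass networks can be sketched in a few lines of Python. The function name, the dictionary keyed by (passer, receiver) pairs, and the tendency values are hypothetical; the sketch also assumes the passer has at least one teammate with a positive tendency.

```python
def pass_probabilities(passer, lineup, tendency):
    """Normalize the passer's tendencies over his teammates on the court.

    tendency[(i, j)] is the tendency of player i to pass to player j
    (alpha_ij before a drive, r_ij after a drive).
    """
    teammates = [p for p in lineup if p != passer]
    total = sum(tendency[(passer, p)] for p in teammates)
    return {p: tendency[(passer, p)] / total for p in teammates}


# Hypothetical tendencies for a point guard passing to four teammates.
lineup = ["PG", "SG", "SF", "PF", "C"]
alpha = {("PG", "SG"): 3.0, ("PG", "SF"): 2.0,
         ("PG", "PF"): 1.0, ("PG", "C"): 2.0}
probs = pass_probabilities("PG", lineup, alpha)
# probs == {"SG": 0.375, "SF": 0.25, "PF": 0.125, "C": 0.25}
```

Because the denominator only sums over players currently on the court, substituting one teammate changes every pass probability for the passer, which is the lineup dependence described above.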
2.8 Blocked Shots
Given that a player has taken a contested shot (the defender is within five
feet of the shooter) and is not fouled, we model the probability of the
shot being blocked as a function of the defensive lineup, the offensive
player's propensity to have his shot blocked, and the ability of the
defense to block shots. It is modeled as a Bernoulli distribution with
probability

$$p(B_i = 1 \mid L_d, v, z) = \frac{1}{2}\left(v_i + \sum_{j \in L_d} q_{ij} z_j\right).$$

Here, $B_i$ is an indicator for whether the shot is blocked, $v_i$ is
player i's propensity to have his shot blocked, $z_j$ represents defensive
player j's propensity to block shots, and $q_{ij}$ is a weight determining
the likelihood that player j is defending player i when player i takes a
shot. Because the defense's ability affects the probability of the shot
being blocked and different defenders could be guarding the offensive
player when he attempts the shot, we use a weighted average of the entire
defense's ability to block shots, giving more weight to the ability of
players who typically play closer to the rim, where shots are more likely
to be blocked. A description of why this weight is used instead of $w_{ij}$
is given in Section 2.6.

We assume that if player i takes a shot that is contested by player j, then
the probability of that shot being blocked by player j is somewhere between
the historical probability of player i having his shot blocked and the
historical probability of player j blocking a shot that he is contesting.
For example, if a player who has his shot blocked 10% of the time takes a
shot that is contested by a player who blocks 14% of the shots he defends,
we can expect that the probability that the shot is blocked is between 10%
and 14%. For the model, we assume that the probability of the shot being
blocked is the average of $v_i$ and the weighted average of the defense's
probability of blocking the shot, $\sum_{j \in L_d} q_{ij} z_j$.
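
A minimal Python sketch of this averaging rule is shown below. The function and variable names are hypothetical; the first call reproduces the 10% and 14% example from the text by treating the contesting player as the only defender with nonzero weight.

```python
def block_probability(v_shooter, weights, z):
    """Average of the shooter's blocked rate and the defense's block rate.

    weights[j] is the weight q_ij for defender j (assumed to sum to 1) and
    z[j] is defender j's propensity to block a shot.
    """
    defense = sum(weights[j] * z[j] for j in weights)
    return 0.5 * (v_shooter + defense)


# Example from the text: a shooter blocked 10% of the time is contested by
# a defender who blocks 14% of the shots he defends.
p_single = block_probability(0.10, {"defender": 1.0}, {"defender": 0.14})

# Hypothetical two-defender case with weights 0.75 and 0.25.
p_mixed = block_probability(0.10, {"big": 0.75, "guard": 0.25},
                            {"big": 0.16, "guard": 0.04})
```

With a single defender the result is the midpoint of the two historical rates, so it always lies between them, as the paragraph above requires.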
2.9 Shooting Foul and Free Throw
Our model for shooting fouls is similar to that of Oh et al. [5]: it is a
function of the shooter's ability to draw a foul and the defender's foul
proneness. Unlike our motivating paper, however, we do not take into
consideration the basis of the shot on the court. We make this choice
because we were not able to find data on the likelihood of a player being
fouled at each of the six bases that we consider in other functions, and
because we do not believe it has a large impact on the outcome of the game.
We represent the event that player i is fouled by player j while shooting
by SF(i, j), with probability

$$p(SF(i, j) = 1 \mid \psi, \zeta) = \sigma\left(\psi_i + \sum_{j \in L_d} w_{ij}\zeta_j\right).$$

Here $\psi_i$ is player i's ability to draw a shooting foul, $\zeta_j$ is
defensive player j's foul proneness, and $w_{ij}$ is the weight that
determines how much player i is affected by the defense of player j, as in
previous functions. Thus, if player j is very likely to foul the player he
is guarding, the likelihood of a shooting foul increases. However, if
player j is very likely to foul the player he is guarding but he is not
guarding the shooter, the probability of a shooting foul does not change
much because of the weight.

We use historical data for each player to model the probability of a free
throw success because other players on the court, on offense and defense,
do not affect the outcome of a free throw.
2.10 Rebound
We model rebounding as a competition between all the players on the court,
just as Oh et al. [5] did. Because the effort required to rebound
offensively differs from the effort required to rebound defensively, we
model the two separately and call player i's offensive and defensive
rebounding abilities $\rho^o_i$ and $\rho^d_i$ respectively. Considering
both the offensive and defensive players on the court, the probabilities
that player i grabs a defensive or an offensive rebound, indicated by
$DR_i$ and $OR_i$, are given by

$$p(DR_i = 1 \mid L_d, L_o) = \frac{\exp(\rho^d_i)}{\sum_{j \in L_d} \exp(\rho^d_j) + \sum_{k \in L_o} \exp(\rho^o_k)},$$

$$p(OR_i = 1 \mid L_d, L_o) = \frac{\exp(\rho^o_i)}{\sum_{j \in L_d} \exp(\rho^d_j) + \sum_{k \in L_o} \exp(\rho^o_k)}.$$

Because the model is a competition between the players on the court, the
lineups of both teams are vital. We are also able to model the rebounding
ability of any possible lineup on a given team.
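
The rebound competition can be illustrated with the short Python sketch below, which evaluates both formulas over all ten players at once. The function name, the dictionary inputs, and the ability values are hypothetical and chosen only for illustration.

```python
import math


def rebound_probabilities(rho_def, rho_off):
    """Return per-player rebound probabilities for both teams.

    rho_def maps each defensive player to his defensive rebounding ability;
    rho_off maps each offensive player to his offensive rebounding ability.
    All ten players share one normalizing denominator.
    """
    denom = (sum(math.exp(r) for r in rho_def.values())
             + sum(math.exp(r) for r in rho_off.values()))
    p_def = {p: math.exp(r) / denom for p, r in rho_def.items()}
    p_off = {p: math.exp(r) / denom for p, r in rho_off.items()}
    return p_def, p_off


# Hypothetical abilities: defenders usually hold the positional advantage.
defense = {"d1": 1.2, "d2": 0.9, "d3": 0.4, "d4": 0.1, "d5": 0.0}
offense = {"o1": 0.3, "o2": 0.0, "o3": -0.5, "o4": -0.8, "o5": -1.0}
p_def, p_off = rebound_probabilities(defense, offense)
team_def_share = sum(p_def.values())  # chance the defense secures the ball
```

Summing a team's individual probabilities gives that lineup's overall chance of securing the rebound, which is one way to compare the rebounding ability of different lineups.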
2.11 Turnover
We model the likelihood of an individual turning the ball over to the other
team as Oh et al. [5] did. We consider two types of turnovers: steals, and
any other event that results in an inbound pass. We use historical data for
each player to determine the probability of a turnover per touch, and we
also take into consideration the average steal rate of each player. Given a
stolen ball, we assign the steal to a defensive player with probability
proportional to his average steal rate relative to the average steal rates
of his teammates on the court. On any other type of turnover, the next
possession begins with an inbound pass.
2.12 Number of Possessions
We model each team's total number of possessions in a simpler way than Oh
et al. [5]. In a game between teams a and b, the two teams will have
roughly the same number of possessions because they alternate possessions
throughout the game. $\tau_a$ represents the average number of offensive
possessions team a has per 48 minutes (a regulation game is 48 minutes
long). When teams a and b play at different paces, we assume that the
number of possessions each team has in the game is somewhere between
$\tau_a$ and $\tau_b$. If we assume that it is the average of the two, then
each team gets $(\tau_a + \tau_b)/2$ possessions per game, so each game has
a total of $2 \cdot (\tau_a + \tau_b)/2 = \tau_a + \tau_b$ possessions. For
example, if team a averages 104 possessions per 48 minutes and team b
averages 98 possessions per 48 minutes, we take the average of 104 and 98
and say that a match between these two teams will consist of 101
possessions for each team, for a total of $2 \cdot 101 = 202$ possessions
in the game.
3 Data Collection
We collected roughly 75 raw statistics from the player tracking data on
NBA.com [4] and from NBAsavant.com [10] that were necessary to learn the
parameters used in the probability functions. Some parameters need no
further manipulation because they are simply equal to a raw statistic. For
example, the probability that player i makes a free throw is his free-throw
percentage. However, parameters such as the percentage of a player's
non-driving field goal attempts at each of the six bases require multiple
statistics and multiple calculations. To calculate how frequently a player
takes a non-driving shot in each basis, we needed his total shot attempts
in each basis and his total driving shot attempts in each basis. We
subtracted total driving shots in each basis from total shots in each basis
to obtain the total non-driving shots in each basis, and divided that
number by the total shots in each basis to obtain the percentage of
non-driving shots in each basis. After collecting all of these raw
statistics and manipulating them into the measures needed for the
parameters, we input them into the probability functions. From there, we
were able to assign a weight to each edge of the graphical model based on
each team's unique combination of 5 players on the court.
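
The shot-count manipulation described above can be sketched in Python as follows. The function name, basis labels, and shot counts are hypothetical. The last quantity, `location_shares`, is an added assumption: it divides by the player's total non-driving attempts across all bases so that the six values sum to one, which is the form needed to sample a shot location in Section 2.5.

```python
def non_driving_shot_stats(total_by_basis, driving_by_basis):
    """Derive non-driving shot measures from raw per-basis shot counts."""
    # Non-driving attempts = all attempts minus attempts taken on drives.
    non_driving = {b: total_by_basis[b] - driving_by_basis.get(b, 0)
                   for b in total_by_basis}
    # Fraction of the shots in each basis that were non-driving shots.
    fraction_non_driving = {
        b: (non_driving[b] / total_by_basis[b]) if total_by_basis[b] else 0.0
        for b in total_by_basis
    }
    # Assumption: share of all non-driving attempts taken from each basis.
    total_non_driving = sum(non_driving.values())
    if total_non_driving == 0:
        raise ValueError("player has no non-driving shot attempts")
    location_shares = {b: non_driving[b] / total_non_driving
                       for b in non_driving}
    return non_driving, fraction_non_driving, location_shares


# Made-up season totals for one player (illustrative only).
totals = {"restricted_area": 200, "paint": 100, "two_outside_paint": 150,
          "center_three": 100, "left_corner_three": 25,
          "right_corner_three": 25}
driving = {"restricted_area": 120, "paint": 30}
counts, fractions, shares = non_driving_shot_stats(totals, driving)
```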
4 Simulation
Once the data is collected and the parameters mentioned above are
calculated, the general framework for using our model to conduct a
simulation is fairly straightforward. At any vertex in the model, let X be
the set of all possible events that could occur next, with elements
$x_1, x_2, \ldots, x_n$, where n is the number of possible events. We
divide the interval [0, 1] into n consecutive subintervals, where
subinterval $s_i$ has length $p(x_i)$, the probability of event $x_i$
occurring. These probabilities should sum to 1; if the sum is not exactly
1, then each $p(x_i)$ is scaled so that $\sum_{i=1}^{n} p(x_i) = 1$. We
then generate a random number U drawn uniformly from (0, 1). Then
$U \in s_i$ for some $1 \le i \le n$, and the event $x_i$ that corresponds
to that subinterval is the event that occurs next in our simulation.

Using the data we collected on the players for the Cleveland Cavaliers and
Golden State Warriors, we simulated what might happen in a hypothetical
game scenario where LeBron James has just taken a shot from the restricted
area (the area of the court within the semi-circle that lies directly under
the goal) against the Golden State Warriors, who currently have Steph
Curry, Klay Thompson, Andre Iguodala, Harrison Barnes, and Draymond Green
on the court. Once he shoots the ball, the simulation framework just
discussed can be used with the model to determine the series of events that
leads to either a made or a missed shot.

First, the simulation determines whether the shot is contested or open. We
had previously calculated that, given the players on the court,
p(open) = 0.13 and p(contested) = 0.87. In other words, there is a 13%
chance that the shot is open and an 87% chance that it is contested. Then
we have

$$X = \begin{cases} \text{open} & \text{if } 0 \le U < 0.13 \\ \text{contested} & \text{if } 0.13 \le U < 1.0 \end{cases}$$

and the generated U = 0.4387, so the simulation determines that the shot is
contested.

Next, the simulation determines whether the shot is blocked. Using the data
with the probability functions, we determine that p(blocked) = 0.0501 and
p(not blocked) = 0.9499. Then we have

$$X = \begin{cases} \text{blocked} & \text{if } 0 \le U < 0.0501 \\ \text{not blocked} & \text{if } 0.0501 \le U < 1.0 \end{cases}$$

and the generated U = 0.4898, so the simulation determines that the shot is
not blocked.

Finally, the simulation determines whether the shot is made or missed. We
use the historical data along with the probability functions to determine
that p(made) = 0.6764 and p(missed) = 0.3236. Then we have

$$X = \begin{cases} \text{made} & \text{if } 0 \le U < 0.6764 \\ \text{missed} & \text{if } 0.6764 \le U < 1.0 \end{cases}$$

and the generated U = 0.4456, so the simulation determines that the shot is
made. Two points would then be awarded to the Cleveland Cavaliers, and the
Warriors would become the team on offense and start their offensive
possession.
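
The interval-based selection of the next event can be sketched in Python as below. The function name and the dictionary representation of the events are hypothetical choices for this illustration; the probabilities and the three values of U are the ones reported in the example above.

```python
def next_event(probabilities, u):
    """Return the event whose subinterval of [0, 1) contains u.

    probabilities maps each possible event to its probability; values are
    rescaled so that they sum to 1 before the subintervals are laid out in
    the dictionary's insertion order.
    """
    if not probabilities:
        raise ValueError("at least one event is required")
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    upper = 0.0
    for event, p in probabilities.items():
        upper += p / total  # right endpoint of this event's subinterval
        if u < upper:
            return event
    # Floating-point rounding can leave the last endpoint just below 1.
    return list(probabilities)[-1]


# The three steps of the worked example, with the reported values of U.
steps = [
    ({"open": 0.13, "contested": 0.87}, 0.4387),
    ({"blocked": 0.0501, "not blocked": 0.9499}, 0.4898),
    ({"made": 0.6764, "missed": 0.3236}, 0.4456),
]
outcomes = [next_event(probs, u) for probs, u in steps]
# outcomes == ["contested", "not blocked", "made"]
```

In an actual simulation, each u would be a fresh draw from a uniform random number generator, for example Python's `random.random()`, rather than a fixed value.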

We can continue this process for each event in a possession, and then
repeat it for the estimated number of possessions to produce both a winner
and statistics for one game. We can then simulate each game multiple times
to obtain averages for the statistics and assign the overall win to the
team that wins more of the simulated games.
5 Future Research
While our model does provide a solid theoretical framework for performing a
simulation, there are several improvements that could make our model both
more accurate and complete. For one, some defensive parameters included
in our probability functions that, while quantifiable, were di cult to find.
Within the past few weeks, NBA.com released new defensive statistics that
have been tracked by individual teams, but haven’t been publicly available
until now. Incorporating this data would provide for more accurate simula-
tions.
Another improvement that could be made is to factor in a time and
score component. This model assumes that players' tendencies are constant
throughout the entirety of a game; in reality, this is not necessarily
the case. Players' behaviors can change depending on both the time left in
the game and the score of the game. For example, in a game where the score
is close near the end, a player like LeBron James would tend to shoot more
than he usually would, and our model does not account for this.
Although it wouldn't affect the outcome of the game, future researchers
should keep track of assists for statistical reasons. Keeping track of which
players assist, and when, would only make the players' statistics more accurate.
There is obviously no perfect way to simulate or predict a basketball
game, but we believe that these improvements would make for a more
complete, more specific tool for teams and coaches to use.
References
[1] Gross, Jonathan L., and Jay Yellen. Graph theory and its applications.
CRC press, 2005.
[2] Biggs, Norman, E. Keith Lloyd, and Robin J. Wilson. Graph Theory
1736-1936. Oxford, England: Oxford UP, 1976. Print.
[3] “Graph Theory.” Encyclopedia Britannica Online. Encyclopedia Britan-
nica. Web.
[4] “NBA.com, Official Site of the National Basketball Association.”
NBA.com. Web. 13 Jan. 2016. http://www.NBA.com/.
[5] Oh, M., Keshri, S., and Iyengar, G. (2015). Graphical Model for Basketball
Match Simulation. In MIT Sloan Sports Analytics Conference.
[6] Shirley, K. (2007). A Markov model for basketball. Poster presentation at
New England Symposium for Statistics in Sports, Boston, MA, Septem-
ber 2007.
[7] Strumbelj, E. and Vracar, P. (2012). Simulating a basketball match with
a homogeneous Markov model and forecasting the outcome. International
Journal of Forecasting 28 (2012), 532-542.
[8] Cervone, D., D’Amour, A., Bornn, L., and Goldsberry, K. “POINTWISE:
predicting points and valuing decisions in real time with NBA Optical
Tracking Data.” Proceedings MIT Sloan Sports Analytics (2014).
[9] Franks, A., Miller, A., Bornn, L., and Goldsberry, K. (2014). Character-
izing the Spatial Structure of Defensive Skill in Professional Basketball.
arXiv:1405.0231
[10] Your Source For Advanced NBA Analytics. Web. 14 Jan. 2016.
http://www.NBAsavant.com/.
  • 1. A Probabilistic Graphical Model for Simulating Basketball Matches John Crain and Liza Spencer May 8, 2016 Abstract Framework for simulating a basketball match while still consider- ing the unique identities of individual players has not fully included all possible actions that can occur in the progression of a game, no- tably excluding the possibility of a player dribbling. We attempt to extend the model originally developed by Oh et al. [5] to include the actions associated with dribbling a basketball, while also modifying some other aspects of their model to include all possible events that could occur in a possession and all realistic outcomes. 1 Introduction Graph theory is a branch of mathematics focusing on combinatorial struc- tures called graphs, which are configurations of vertices and edges [1]. While many branches of mathematics involve problems using calculations and metic- ulous measurement, graph theory started with puzzles, with the intent to test the cleverness [2]. Sparking the interest of many mathematicians, graph the- ory has become astoundingly abundant in depth and theoretical findings [2]. The idea of graph theory began with a Swiss mathematician, Leonhard Eu- ler, in 1735 when Euler answered an old puzzle called the Konigsberg bridge problem-a problem that involved finding a path that crosses over all seven bridges only once [3]. Euler claimed that there was no such path, which was the first theory ever proven in graph theory. There are many applications of graph theory, including chemistry, computer science, social sciences, and creating models [3]. Using graph theory to create a model is the application 1
  • 2. that our research focuses on, and more specifically, how that model can be used to predict the outcome and progression of a basketball game. The use of mathematical techniques to analyze sports is an endeavor that, while fairly new, has become a very popular method of decision-making for front-o ce executives in sports. Predicting the outcome of a sporting event has intrigued many for years, but predictions have been inadequate due to a lack of statistics to measure many actions that occur throughout the course of a basketball game. While statistics such as points, rebounds, and as- sists have been tracked for many years, data concerning the progression of a match has been uncommon. Because of this, many conventional approaches have focused on win-loss predictions only, placing no value on how the match might progress from start to finish. However, more information on individual players’ behaviors outside of these traditionally documented stats is now available because of the new player tracking systems used by the NBA. This system uses cameras to track the movement of every player in a game 25 times per second, and with that information provides data on the number of touches a player receives and how they move while on the court, among many other things [4]. With these resources, more detailed models examining the progression of a basketball game, sometimes referred to as a ”microsimulation” [5], are possible. Shirley [6] and Strumbelj and Vracar [7] attempted to use a Markov model to model the progression of a basketball game so that a more detailed ”microsimu- lation” of a game would be possible. However, they ignore the individual identities of the players on each team, and since one basketball team has many di↵erent 5-player combinations that it can use, the progression of a game depends greatly on the o↵ensive and defensive lineups in the game at a given point. 
The paper that served as the primary motivation for this research used a probabilistic graphical model to simulate how a basketball game progresses [5]. The authors account for the unique identities of di↵erent players by in- cluding in their model how the ball moves between players in each o↵ensive possession and actions that may happen before or after passing. Relation- ships between each player and possible actions are illustrated by a graph, where each vertex represents a player or an action and each edge represents the movement from one action to the next. Probability functions were found 2
  • 3. for each action, and using the player tracking data described earlier, the model provided a way to simulate a game using any combination of 5-player o↵ensive and defensive lineups. While this model accounted for some player interaction, it completely ignored the potential for a player to dribble the ball. Cervone et al. [8] performed a micro-analysis of basketball games and while their intent was not to simulate a game, they still quantitatively eval- uated every decision made by a player in the course of a possession, and a player’s possibility of dribbling the ball played a large role in that. Therefore, a model that can be used for microsimulation of basketball games should in- clude this very important aspect of a typical possession. Our model builds upon the one first introduced by Oh et al. [5] by includ- ing the possibility of dribbling to the basket (called ”drive” in our model), distinguishing between open and contested shots, including the possibility that a shot is blocked, and by modifying the e↵ect that defense has on many actions. Our complete model diagramming a possession is shown in Figure 1. Figure 1: Graphical Model showing possible flow of events in every possession 2 Method Our model uses player tracking data and team tracking data, which we found on NBA.com [4] and other statistics such as shot tracking data and lineup data, found on NBAsavant.com [10]. We model each possession using the 3
  • 4. graphical model shown in Figure 1, where the vertices represent every event in a game, and the movements from one event to the next are represented by the edges between vertices. The probability functions shown below determine the weights on any edge of the graph, showing the likelihood that each edge is travelled. Table 1: Summary of Notation Notation Meaning Section Lo O↵ensive team lineup 2.2, 2.3, 2.7, 2.10 Ld Defensive team lineup 2.2, 2.3, 2.5, 2.6, 2.8, 2.10 i Player i’s propensity to take a non-driving shot 2.2 ˜i Player i’s propensity to take a non-driving shot given the defensive lineup 2.2 i Player i’s ability to deter a non-driving shot at- tempt 2.2 bi Player i’s propensity to drive to the basket 2.3 ˜bi Player i’s propensity to drive given the defensive lineup 2.3 gi Player i’s ability to deter an attempt to drive to the basket 2.3 ↵ij Tendency of player i to pass to player j 2.7 ✓id Non-driving shooting ability of player i at basis d 2.5 id Defensive ability of player i to reduce non-driving shot accuracy at basis d 2.5 ci Driving shot accuracy of player i 2.6 fi Defensive ability of player i to reduce driving shot accuracy 2.6 rij Tendency of player i to pass to player j after a drive 2.7 vi Player i’s propensity to have a shot blocked 2.8 zi Player i’s propensity to block a shot 2.8 wij Weight that determines how much player i is af- fected by the defense of player j 2.2, 2.3, 2.5, 2.9 qij Weight giving likelihood that player j is defending player i at the end of a drive 2.6, 2.8 4
  • 5. id Player i’s ability to draw a shooting foul at basis d 2.9 ⇣id Player i’s foul proneness at basis d 2.9 ⇢d i Defensive rebound grabbing ability of player i 2.10 ⇢o i O↵ensive rebound grabbing ability of player i 2.10 ⌧a Average number of o↵ensive possessions of team a every 48 minutes 2.12 In the next few sections, we explain how all of these parameters are used to determine the weight assigned to each edge of the graphical model. 2.1 Start of a Possession We model the start of a possession just like Oh et al. [5] did, as a multinomial distribution between the players on the court at any given time. Our model considers the start of a possession to be whenever there is a change in the team that has the ball, such as a defensive rebound after a missed shot, an inbound pass after a made shot, or a steal. When a possession begins as an inbound pass, we look at historical backcourt touch data to see which player is most likely to receive the inbound pass and thus, begin the possession. Methods to find rebound and steal probabilities are discussed in sections 2.10 and 2.11 respectively. In the case of a steal or rebound, the player that begins the possession is whichever player got the rebound or steal. 2.2 Non-Driving Shot Frequency Similar to how Oh et al. [5] did, we model the probability of a field goal attempt 1 for a given touch, excluding shots taken on drives to the basket, which will be discussed in later sections, as a Bernoulli distribution with probability p(Sn i = 1 |Lo, Ld, , ) = ˜i + ˜i 1 4 X k2Lo,k6=i ˜k !! , where 1 The term ”field goal” is interchangeable with the term ”shot” in basketball. 5
  • 6. ˜i = i X j2Ld wij j with (x) = ex /(1 + ex ) included to ensure that the output values are between 0 and 1. Sn i is an indicator for whether player i attempts a non- driving shot given a touch. Lo and Ld represent the lineups of the o↵ensive and defensive teams respectively. i is a parameter which determines the likelihood of player i taking a shot and j is the defensive ability of player j to deter a shot attempt. The weight wij determines the likelihood that player j is defending player i on player i’s initial touch. We determine that weight according to both similarity of o↵ensive and defensive abilities and similarity of position groupings (guard or forward) for player i and defensive player j 2 . For example, if player i is the best o↵ensive guard, we give more weight to the best defensive guard on the opposing team than to the worst defensive guard, but we give more weight to the worst defensive guard than to the best defensive forward. Note that our method of determining wij is di↵erent than Oh et al. [5], who only used similarity of exact positions of player i and defensive player j. The justification for this is that defensive matchups are determined by more than just positional similarity. For example, if the point guard on the o↵ensive team is the best o↵ensive guard and the point guard on the defensive team is the worst defensive guard, then the o↵ensive point guard will likely not be defended by the defensive point guard. Rather, he will likely be defended by a di↵erent defensive guard with higher defensive ability, even though that defender may not be the defensive team’s point guard. 2.3 Drive Frequency We model the probability of a player driving the ball on a given touch as a function of the o↵ensive and defensive lineups, the o↵ensive player’s tendency to drive, and the defender’s ability to deter an attempt to drive. 
It is modeled 2 There are five positions on a basketball team: point guard, shooting guard, small forward, power forward, and center. The point guard, shooting guard, and small forward can be grouped more generally as ”guards” while the power forward and center can be grouped together as ”forwards.” 6
  • 7. as a Bernoulli distribution with probability p(Di = 1 |Lo, Ld, b, g) = ˜bi + ˜bi 1 4 X k2Lo,k6=i ˜bk !!! where ˜bi = bi X j2Ld wijgj. The parameter Di is an indicator for whether player i drives on a given touch. The parameter bi is a parameter describing how likely player i is to drive to the basket and gj is player j’s defensive ability to deter an attempt to drive to the basket. Therefore, ˜bi is player i’s propensity to drive while accounting for the current 5-player defensive lineup. As described in the previous section, wij is a weight determining the likelihood that on player i’s initial touch he is being defended by any given defensive player j 2 Ld. Note that P j2Ld wij = 1. Then P j2Ld wijgi is a weighted average of the defensive lineup’s ability to deter an attempt to drive, with more weight given to the ability of the defensive players that are most likely to be guarding o↵ensive player i. Thus, if an o↵ensive player is being defended by a poor defender, he is more likely to drive to the basket and vice versa. We also have an o↵set term (˜bi 1 4 P k2Lo,k6=i ˜bk) similar to the one first introduced by Oh et al. in their original function for shot frequency [5]. We include this in the model because a player’s decision to drive depends on not only his own tendency to drive and his defender, but also his teammates’ propensities. This term is positive (or negative) if the propensity of player i to drive is more (or less) than the average propensity of his teammates to drive. For example, if Lebron James and Kyrie Irving are on the court and Kyrie Irving is replaced by a taller, slower forward, Lebron James is likely to drive to the basket more. 2.4 Result of a Drive Given that a player drives to the basket, he has two options for what to do next. He can either shoot the ball or pass it back out to a teammate. 
We calculate the average probability of a shot attempt per drive for each player from historical data and use that independent of current lineup due to data constraints. There are other results of a drive, which include a turnover and 7
  • 8. drawing a shooting foul. In order to determine the likelihood of these actions happening, we use historical data. 2.5 Non-Driving Shot E ciency We model non-driving shot e ciency with an approach very similar to Oh et al. [5]. Given that a player attempts a non-driving field goal, we model the shot e ciency (the probability that the shot goes in) as a function of the defensive lineup, the shot location, the shooting ability of the o↵ensive player, and the ability of the defensive player to reduce shot e ciency. It is modeled as a Bernoulli distribution with probability p(Yi = 1 |Ld, d, ✓, ) = ✓id X j2Ld wij jd !! . The parameter Yi is an indicator for whether player i made the shot, d represents the ”basis,” or location, where the shot was taken, ✓id is player i’s shooting ability at basis d, and jd is the defensive ability of player j to reduce player i’s shot accuracy at basis d, and wij is the weight that has been discussed in previous sections. Note that our model di↵ers from Oh et al. [5] in that we include a weighted average of the entire defense’s ability to reduce shot accuracy at basis d rather than just the ability of the defender at the time of the shot. We do this because practically, it is di cult to determine who the defender would be in an actual simulation. In Figure 1, note that our model distinguishes between an open and contested shot, which is di↵erent than the one proposed by Oh et al. [5]. In their model, the defensive ability of the closest defensive player j always reduces shot accuracy, no matter the distance between player j and the shooter. However, if the closest defensive player j is far enough away from the shooter, his defensive ability will likely have no impact on the shooter’s accuracy, even if he is the closest defender. This is our justification for distinguishing between an open and contested shot. We define an ”open shot” as one where the closest defender is five or more feet away from the shooter. 
To model whether a shot is open, we calculate the percentage of total non-driving shots taken by player i that are open and use this as the probability that a non-driving shot taken by player i is open 3 . Given that 3 The probability that a non-driving shot attempt is contested would then just be equal to (1 probability of open shot). 8
  • 9. a shot is open, we simply set the defense’s ability to reduce shot accuracy equal to zero, as it would have no impact on player i’s ability to make the shot, and player i’s non-driving shot e ciency would become only a function of his own shooting ability at basis d. We sample the shot location from 6 bases: the restricted area, the rest of the paint,4 two point shots outside the paint, three pointers from the center of the court, three pointers from the left corner, and three pointers from the right corner. These were the bases originally used by Franks et al. [9] with the exception that Franks et al. combined corner three point shots into one basis. After examining players’ shooting data and seeing that their shooting abilities were impacted greatly depending on which corner they shot from (left or right), we determined that shots from left and right corner were intrinsically di↵erent shots and separated them into two bases. Therefore, when player i takes a non-driving shot, we sample the shot location from historical data on the percentage of non-driving shots that player i takes from each of the six bases. To summarize, when a player decides to take a non-driving shot, we first sample whether the shot was open or contested, then we sample the shot location from the six bases, and then the non-driving shot e ciency model gives the success probability of the shot. 2.6 Driving Shot E ciency Our model for driving shot e ciency is similar to our model for non-driving shot e ciency. Given that a player has driven to the basket and decides to shoot rather than pass, we model driving shot e ciency (the probability that a shot taken on a drive is made) as a function of the defensive lineup, the shooting ability of the o↵ensive player, and the defensive player’s ability to reduce driving shot accuracy. It is modeled as a Bernoulli distribution with probability p(Xi = 1 |Ld, c, f) = ci X j2Ld qijfj !! . 
Here Xi is an indicator for whether the shot is made, ci is the driving shot accuracy of player i, and fj is the defensive ability of player j to reduce driving shot accuracy. As in previous functions where the defender’s ability 4 The paint is defined as the area inside the rectangle, surrounding the basket. 9
  • 10. plays a role, we use the weighted average of the defense. However, the weight qij used here determines the likelihood that player j is defending player i at the end of a drive, which is di↵erent than the weight wij used in previous functions. The reason for this change is that on a drive, players move closer to the rim and often draw a di↵erent defender than the one who was guarding them when they began their drive. For example, if Stephen Curry drives past his defender to the basket, it is likely that a larger defender will step over to help defend Curry, and so the ability of the defender that stepped over to help defend Curry will play a larger role on Curry’s ability to make a driving shot than the defensive ability of the player that was initially guarding Curry. As with non-driving shot e ciency, we will also model whether the driving shot is open by calculating the percentage of total driving shots taken by player i that are open and use this as the probability that a driving shot taken by player i is open. In this case, the term representing the defense’s ability to reduce driving shot accuracy ( P j2Ld qijfj) would be set to zero. Note that our model for driving shot e ciency does not include the basis d that was included for non-driving shot e ciency. The justification for this is that all driving shots will be taken close to the rim and are therefore in roughly the same location. 2.7 Pass Network We model the passes between players just as Oh et al. [5] did: a network with edge weight parameterized by ↵ij(i 6= j). Then given that player i chooses to pass, the probability that he passes to player j is p(i ! j |↵, Lo) = ↵ij ⌃k6=i,k2Lo ↵ik . Then the probability that player i passes to player j depends on all five o↵ensive players. We use the Expectation Maximization (EM) algorithm on data of total number of passes between players (excluding passes after drives) and total number of possessions each lineup had in every game to determine the ↵ matrix. 
We use this same framework to model passes between players after a drive. That is, given that player $i$ drives to the basket and decides to pass, the probability that he passes to player $j$ is given by

$$p(i \to j \mid r, L_o) = \frac{r_{ij}}{\sum_{k \neq i,\, k \in L_o} r_{ik}}.$$
Here, we use the EM algorithm on data of the total number of passes between players after drives and the total number of possessions each lineup had in every game to determine the $r_{ij}$ matrix. We separate the two networks because we believe that the criteria for selecting a player to pass to after choosing to drive are different from the criteria for selecting a player to pass to before a drive. For example, after an offensive player drives to the basket and causes the rest of the defense to shrink towards the basket, he is more likely to pass to a great shooter who may not have been as open before the drive.

2.8 Blocked Shots

Given that a player has taken a contested shot (the defender is within five feet of the shooter) and is not fouled, we model the probability of the shot being blocked as a function of the defensive lineup, the offensive player's propensity to have his shot blocked, and the ability of the defense to block shots. It is modeled as a Bernoulli distribution with probability

$$p(B_i = 1 \mid L_D, v, z) = \frac{1}{2}\left(v_i + \sum_{j \in L_D} q_{ij} z_j\right).$$

Here, $B_i$ is an indicator for whether the shot is blocked, $v_i$ is player $i$'s propensity to have his shot blocked, $z_j$ represents defensive player $j$'s propensity to block shots, and $q_{ij}$ is a weight determining the likelihood that player $j$ is defending player $i$ when player $i$ takes a shot. Because the defense's ability affects the probability of the shot being blocked and different defenders could be guarding the offensive player when he attempts the shot, we use a weighted average of the entire defense's ability to block shots, giving more weight to the ability of players who typically play closer to the rim, where shots are more likely to be blocked. An explanation of why this weight is used instead of $w_{ij}$ is given in Section 2.6.
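The block probability can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary layout, and the check that the weights $q_{ij}$ sum to 1 over the defense (implied by the phrase "weighted average", but not stated explicitly) are assumptions, and the numbers are hypothetical.

```python
import random


def block_probability(v_i, q_i, z):
    """Average of the shooter's rate v_i and the weighted defensive rate.

    v_i : shooter i's propensity to have his shot blocked
    q_i : {defender j: weight q_ij} for the defensive lineup L_D
    z   : {defender j: propensity z_j to block a contested shot}
    """
    # Assumption of this sketch: the weights form a proper weighted average.
    if abs(sum(q_i.values()) - 1.0) > 1e-9:
        raise ValueError("weights q_ij are assumed to sum to 1 over the defense")
    defense = sum(q_i[j] * z[j] for j in q_i)
    return 0.5 * (v_i + defense)


def shot_is_blocked(p_block, rng):
    """Draw the Bernoulli indicator B_i with success probability p_block."""
    return rng.random() < p_block


# Hypothetical case: all of the weight falls on a single defender.
p = block_probability(0.10, {"D1": 1.0}, {"D1": 0.14})
print(round(p, 4))  # 0.12
print(shot_is_blocked(p, random.Random(0)))
```

With a shooter blocked 10% of the time and a lone defender who blocks 14% of the shots he contests, the sketch returns 12%, the midpoint of the two historical rates.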
We assume that if player $i$ takes a shot that is contested by player $j$, then the probability of that shot being blocked by player $j$ is somewhere between the historical probability of player $i$ having his shot blocked and the historical probability of player $j$ blocking a shot that he is contesting. For example, if a player who has his shot blocked 10% of the time takes a shot that is contested by a player who blocks 14% of the shots he defends, we can expect the probability that the shot is blocked to be between 10% and 14%. For the model, we assume that the probability of the shot being blocked is the average of
$v_i$ and the weighted average of the defense's probability of blocking the shot ($\sum_{j \in L_D} q_{ij} z_j$).

2.9 Shooting Foul and Free Throw

Our model for shooting fouls is similar to that of Oh et al. [5]: it is a function of the shooter's ability to draw a foul and the defender's foul proneness. However, unlike our motivating paper, we do not take into consideration the basis of the shot on the court. We make this choice because we were not able to find data that represent the likelihood of a player being fouled at each of the 6 bases that we consider in other functions, nor do we feel that this has a large impact on the outcome of the game. We represent the likelihood that player $i$ is fouled by player $j$ while shooting by $SF(i, j)$, given by

$$p(SF(i, j) = 1 \mid \phi, \zeta) = \left(\phi_i + \sum_{j \in L_D} w_{ij} \zeta_j\right).$$

Here $\phi_i$ is player $i$'s ability to draw a shooting foul, $\zeta_j$ is defensive player $j$'s foul proneness, and $w_{ij}$ is the weight that determines how much player $i$ is affected by the defense of player $j$, as in previous functions. Thus, if player $j$ is very likely to foul the player he is guarding, the likelihood of a shooting foul increases. However, if player $j$ is very likely to foul the player he is guarding but he is not guarding the shooter, the probability of a shooting foul does not change much, because of the weight. We use each player's historical data to model the probability of a free throw success because other players on the court, on offense and defense, do not affect the outcome of a free throw.

2.10 Rebound

We model a player's ability to get a rebound as a competition among all the players on the court, just as Oh et al. [5] did. Because the effort required to rebound offensively differs from the effort required to rebound defensively, we model the two separately and call player $i$'s offensive and defensive rebound abilities $\rho^o_i$ and $\rho^d_i$, respectively.
Considering both the offensive and defensive players on the court, the probabilities that player $i$ will grab a defensive or an offensive rebound are represented by $DR_i$ and $OR_i$ and given by
$$p(DR_i = 1 \mid L_d, L_o) = \frac{\exp(\rho^d_i)}{\sum_{j \in L_d} \exp(\rho^d_j) + \sum_{k \in L_o} \exp(\rho^o_k)}$$

$$p(OR_i = 1 \mid L_d, L_o) = \frac{\exp(\rho^o_i)}{\sum_{j \in L_d} \exp(\rho^d_j) + \sum_{k \in L_o} \exp(\rho^o_k)}$$

Because our model is represented as a competition between the players on the court, the lineups of both teams are vital. We are also able to model the rebounding ability of possible lineups on any given team.

2.11 Turnover

We model the likelihood of an individual turning the ball over to the other team as Oh et al. [5] did. There are two different types of turnovers that we consider: those in which the ball is stolen, and any other event that results in an inbound pass. We use each player's historical data to determine the probability of a turnover per touch, and we also take into consideration the average steal rate of each player. Given a stolen ball, we assign the steal to a defender with probability proportional to his average steal rate relative to the average steal rates of his teammates on the court. On any other type of turnover, the ensuing possession begins with an inbound pass.

2.12 Number of Possessions

We model each team's total number of possessions in a simpler way than Oh et al. [5]. In a game between teams $a$ and $b$, the number of possessions of both teams will be roughly the same because the teams alternate possessions throughout the game. Let $\tau_a$ represent the average number of offensive possessions team $a$ has. In a match between teams $a$ and $b$ where the two teams play at different paces, we assume that the number of possessions that each team will have in the game is somewhere between $\tau_a$ and $\tau_b$. If we assume that it is the average of the two, then each team gets $\frac{\tau_a + \tau_b}{2}$ possessions per game, meaning that each game has a total of $\frac{\tau_a + \tau_b}{2} \times 2 = \tau_a + \tau_b$ possessions. For example, if team $a$ averages 104 possessions per 48 minutes (a regulation game is 48 minutes long) and team $b$ averages 98 possessions per 48 minutes, we take the average of 104 and 98 and say that a match between these two teams
will consist of 101 possessions for each team, which means there would be a total of $2 \times 101 = 202$ possessions in the game.

3 Data Collection

We collected roughly 75 raw statistics from the player tracking data on NBA.com [4] and NBAsavant.com [10] that were necessary to learn the parameters used in the probability functions. Some parameters need no further manipulation, as they are simply equal to a raw statistic. For example, the probability that player $i$ will make a free throw is just his free-throw percentage. However, parameters such as the percentage of a player's non-driving field goal attempts at each of the 6 bases require multiple statistics and multiple calculations. To calculate how frequently a player shoots a non-driving shot in each basis, we needed his total shot attempts in each basis and total driving shot attempts in each basis. We then subtracted total driving shots in each basis from total shots in each basis to obtain the total non-driving shots in each basis, and divided that number by the total shots in each basis to obtain the percentage of non-driving shots in each basis. After collecting all of these raw statistics and manipulating them so that they could be used as the necessary measures for the parameters, we input them into the probability functions. From there, we were able to assign a weight to each edge of the graphical model based on each team's unique combination of 5 players on the court.

4 Simulation

Once the data is collected and the parameters mentioned above are calculated, the general framework for using our model to conduct a simulation is fairly straightforward. At any vertex in the model, let $X$ be the set of all possible events that could occur next. Then $x_1, x_2, \ldots, x_n$ are the elements of $X$, where $n$ is the number of possible events that could occur next.
We divide the closed interval $[0, 1]$ into $n$ subintervals, where each subinterval $s_i$ has length $p(x_i)$, the probability of event $x_i$ occurring. Note that these probabilities should sum to 1; if the sum is not exactly 1, then each $p(x_i)$ is scaled so that $\sum_{i=1}^{n} p(x_i) = 1$. We then
generate a random $U \in (0, 1)$. Then $U \in s_i$ for some $1 \le i \le n$, and the event $x_i$ that corresponds to that subinterval $s_i$ is the event that occurs next according to our simulation.

Using the data we collected on the players for the Cleveland Cavaliers and Golden State Warriors, we simulated what might happen in a hypothetical game scenario where LeBron James has just taken a shot from the restricted area (the area of the court directly under the goal; more specifically, the area within the semicircle that lies under the goal) against the Golden State Warriors, who currently have Steph Curry, Klay Thompson, Andre Iguodala, Harrison Barnes, and Draymond Green on the court. Once he shoots the ball, the simulation framework just discussed can be used with the model to determine the series of events that lead to either a made or missed shot.

First, the simulation determines whether the shot is contested or open. We had previously calculated that, given the players on the court, $p(\text{open}) = 0.13$ and $p(\text{contested}) = 0.87$. In other words, there is a 13% chance that the shot is open and an 87% chance that it is contested. Then we have

$$X = \begin{cases} \text{open} & \text{if } 0 \le U < 0.13 \\ \text{contested} & \text{if } 0.13 \le U < 1.0 \end{cases}$$

and the generated $U = 0.4387$, so the simulation determines that the shot is contested.

Next, the simulation determines whether the shot is blocked or not. Using the data with the probability functions, we determine that $p(\text{blocked}) = 0.0501$ and that $p(\text{not blocked}) = 0.9499$. Then we have

$$X = \begin{cases} \text{blocked} & \text{if } 0 \le U < 0.0501 \\ \text{not blocked} & \text{if } 0.0501 \le U < 1.0 \end{cases}$$

and the generated $U = 0.4898$, so the simulation determines that the shot is not blocked.

Finally, the simulation determines whether the shot taken is made or missed. We use the historical data along with the probability functions to determine that $p(\text{made}) = 0.6764$ and that $p(\text{missed}) = 0.3236$. Then we have

$$X = \begin{cases} \text{made} & \text{if } 0 \le U < 0.6764 \\ \text{missed} & \text{if } 0.6764 \le U < 1.0 \end{cases}$$
and the generated $U = 0.4456$, so the simulation determines that the shot is made. Two points would then be awarded to the Cleveland Cavaliers, and the Warriors would become the team on offense and start their offensive possession. We can continue this process for each event in a possession, and then repeat it for an estimated number of possessions to provide both a winner and statistics for one game. We can then simulate each game multiple times to obtain averages for statistics and assign a win to the team with the higher number of wins.

5 Future Research

While our model does provide a solid theoretical framework for performing a simulation, there are several improvements that could make our model both more accurate and more complete. For one, data for some defensive parameters included in our probability functions, while quantifiable, were difficult to find. Within the past few weeks, NBA.com released new defensive statistics that have been tracked by individual teams but have not been publicly available until now. Incorporating these data would provide for more accurate simulations.

Another improvement that could be made is to factor in a time and score component. This model assumes that players' tendencies are constant throughout the entirety of a game; in reality, this is not necessarily the case. Players' behaviors can change depending on both the time left in the game and the score of the game. For example, in a game where the score is close near the end, a player like LeBron James would tend to shoot more than he usually would, and our model does not account for this.

Although it would not affect the outcome of the game, future researchers should keep track of assists for statistical reasons. Keeping track of which players assist, and when, would make the players' simulated statistics more accurate.
There is obviously no perfect way to simulate or predict a basketball game, but we believe that these improvements would make for a more complete, more specific tool for teams and coaches to use.
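The event-sampling step described in Section 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `pick_event` and `simulate_step` and the list-of-pairs layout are hypothetical, while the probabilities and the three values of $U$ are the ones from the worked example in Section 4.

```python
import random


def pick_event(events, u):
    """Map a number u in [0, 1) to an event by partitioning the unit interval.

    events : list of (name, probability) pairs for the possible next events
    u      : a draw from the uniform distribution on [0, 1)
    """
    total = sum(p for _, p in events)
    if total <= 0:
        raise ValueError("event probabilities must sum to a positive value")
    upper = 0.0
    for name, p in events:
        # Rescale so the subinterval lengths sum to 1, as in Section 4.
        upper += p / total
        if u < upper:
            return name
    # Guard against floating-point rounding when u is very close to 1.
    return events[-1][0]


def simulate_step(events, rng):
    """Draw U with the given random generator and return the selected event."""
    return pick_event(events, rng.random())


# Worked example from Section 4, using the reported values of U.
print(pick_event([("open", 0.13), ("contested", 0.87)], 0.4387))
print(pick_event([("blocked", 0.0501), ("not blocked", 0.9499)], 0.4898))
print(pick_event([("made", 0.6764), ("missed", 0.3236)], 0.4456))
```

Passing a seeded generator such as `random.Random(42)` to `simulate_step` makes a simulated possession reproducible, which is convenient when the same game is simulated many times to average statistics.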
References

[1] Gross, Jonathan L., and Jay Yellen. Graph Theory and Its Applications. CRC Press, 2005.
[2] Biggs, Norman, E. Keith Lloyd, and Robin J. Wilson. Graph Theory 1736-1936. Oxford, England: Oxford UP, 1976. Print.
[3] "Graph Theory." Encyclopedia Britannica Online. Encyclopedia Britannica. Web.
[4] "NBA.com, Official Site of the National Basketball Association." NBA.com. Web. 13 Jan. 2016. http://www.NBA.com/.
[5] Oh, M., Keshri, S., and Iyengar, G. (2015). Graphical Model for Basketball Match Simulation. In MIT Sloan Sports Analytics Conference.
[6] Shirley, K. (2007). A Markov model for basketball. Poster presentation at New England Symposium for Statistics in Sports, Boston, MA, September 2007.
[7] Strumbelj, E. and Vracar, P. (2012). Simulating a basketball match with a homogeneous Markov model and forecasting the outcome. International Journal of Forecasting 28 (2012) 532-542.
[8] Cervone, D., D'Amour, A., Bornn, L., and Goldsberry, K. "POINTWISE: predicting points and valuing decisions in real time with NBA Optical Tracking Data." Proceedings MIT Sloan Sports Analytics (2014).
[9] Franks, A., Miller, A., Bornn, L., and Goldsberry, K. (2014). Characterizing the Spatial Structure of Defensive Skill in Professional Basketball. arXiv:1405.0231
[10] Your Source For Advanced NBA Analytics. Web. 14 Jan. 2016. http://www.NBAsavant.com/.