School of Computing & Information
Complex Predictions
• Prediction models give us a prediction under the assumption that we know the inputs to the model with certainty
• However, in many cases we are not certain about the inputs
– Measurement errors
– Output from other models
– Inputs to the model are realized in the future
– …
Complex Predictions
• In other cases we do not even have a closed-form solution for the quantity we want to predict
– E.g., the probability of a team winning a sports league
• In all these cases point estimates will not be able to account for the randomness and uncertainty associated with the phenomenon being described
Monte Carlo Simulations
• Monte Carlo simulations embrace the randomness and uncertainty expressed through probabilities for the inputs
– Uncertainty propagation mechanism
• A Monte Carlo simulation – in its simplest form – is an iterative process of sampling from a probability distribution
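A minimal sketch of this idea: the input to a model is known only as a distribution, and repeated sampling propagates that uncertainty to the output. The squared-output model, the Gaussian input, and the sample size here are all hypothetical choices for illustration.

```python
import random
import statistics

def monte_carlo(f, sample_input, n=10_000):
    """Propagate input uncertainty through f by repeated sampling."""
    return [f(sample_input()) for _ in range(n)]

# Hypothetical example: the model output depends on an input we only
# know as a distribution (mean 5, sd 2) rather than a fixed number.
random.seed(42)
outputs = monte_carlo(lambda x: x ** 2, lambda: random.gauss(5, 2))

# The spread of the outputs reflects the uncertainty in the input;
# the mean should be close to E[X^2] = 5^2 + 2^2 = 29.
print(statistics.mean(outputs), statistics.stdev(outputs))
```

Instead of a single point estimate, we now have a whole distribution of outputs to summarize however we like.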
Discrete Event Simulation
• Many times the operation of a system can be abstracted to a series of discrete events taking place with some probability
– Special case of Monte Carlo simulations
• Each event changes the state of the system
• The dependencies of these events can be so complicated that a formal treatment might not be possible
– However, these discrete events can be simulated several times in order to obtain an estimate of the state probabilities
Discrete Event Simulation
• A sports tournament is a series of matchups
• The outcome of each matchup is associated with a given probability
• The “state of the league” (e.g., who gets into the playoffs, who gets a bye week in the playoffs, who gets the championship, etc.) could be estimated probabilistically by combining the probabilities of individual matchups
– Tedious
– Solution: discrete event simulation (DES)
Discrete Event Simulation
• The smallest unit of a DES is the event
• In the case of a tournament this event corresponds to a matchup
– The building block for the DES is to simulate a single event
• The single-event simulation resembles a coin flip
– A biased coin
Discrete Event Simulation
Spurs rating: +6.4
Warriors rating: +11.2
Home edge (Spurs at home): 3
Projected point differential: 3 + 6.4 - 11.2 = -1.8
Projected Spurs win probability: 43%
[Figure: number line from 0 to 1 with a mark at 0.43]
Discrete Event Simulation
Same setup as before: the Spurs’ projected win probability, 0.43, marks a point on the interval from 0 to 1.
Imagine that you draw a line intersecting this interval vertically with your eyes closed: if it lands in [0, 0.43] the Spurs win, otherwise the Warriors win.
Why with closed eyes? So that the point you hit is uniformly random.
If you repeat this process 1,000 times, how many times do you expect the Spurs to win?
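The closed-eyes experiment can be sketched in a few lines of Python; the 0.43 win probability is the one from the slide, while the function name is ours.

```python
import random

def flip_biased_coin(p_win, rng=random):
    """Simulate one matchup: returns True with probability p_win."""
    return rng.random() < p_win

# The Spurs' projected win probability from the example.
random.seed(0)
spurs_wins = sum(flip_biased_coin(0.43) for _ in range(1_000))
print(spurs_wins)  # roughly 430 of the 1,000 simulated games
```

The long-run fraction of wins converges to the event probability, which is exactly why repeated simulation recovers the probabilities we cannot compute in closed form.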
Discrete Event Simulation
[Flowchart: the simulation loop]
1. Simulate the upcoming matchups
2. Update standings, playoff pairings, etc.
3. Tournament not over? Go back to step 1
4. Tournament over? Store the result
5. Number of simulations not reached? Start a new tournament simulation from step 1
6. Otherwise, compute the final probabilities
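The loop in the flowchart might look like the sketch below, here for a hypothetical three-team round-robin; the team names and matchup probabilities are made up for illustration.

```python
import random
from collections import Counter

def simulate_tournament(teams, win_prob, rng):
    """One inner-loop pass: play every matchup once, update the
    standings, and return the champion (most wins, random tie-break)."""
    wins = Counter()
    for i, first in enumerate(teams):
        for second in teams[i + 1:]:
            winner = first if rng.random() < win_prob[(first, second)] else second
            wins[winner] += 1
    return max(teams, key=lambda t: (wins[t], rng.random()))

# Hypothetical probabilities that the first listed team wins.
teams = ["A", "B", "C"]
win_prob = {("A", "B"): 0.60, ("A", "C"): 0.55, ("B", "C"): 0.50}

rng = random.Random(1)
n = 10_000
titles = Counter(simulate_tournament(teams, win_prob, rng) for _ in range(n))
final_probabilities = {t: titles[t] / n for t in teams}
print(final_probabilities)
```

The outer `Counter` over repeated tournaments is the "store result / number of simulations not reached" part of the flowchart.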
Discrete Event Simulations
• How can we obtain the probability of each event (i.e., of each matchup)?
• We can use team ratings
– Team ratings provide an estimate of the final point difference in a matchup
– How do we translate this to a probability?
Discrete Event Simulations
• Hal Stern, in his seminal work “On the Probability of Winning a Football Game,” showed that the difference between the final point margin of a game and the point spread follows a normal distribution with mean 0 and standard deviation 13.86
– Stern’s study was focused on Vegas’ point spread
– If one uses Vegas point spreads, then the probability that the favorite (by p points) wins can simply be calculated as Φ(p / 13.86), where Φ is the cumulative distribution function of the standard normal distribution
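Under Stern's result this is a one-liner; the sketch below uses the standard library's `statistics.NormalDist` for Φ, and the function name is ours.

```python
from statistics import NormalDist

STERN_SD = 13.86  # standard deviation reported by Stern

def favorite_win_prob(spread, sd=STERN_SD):
    """P(favorite by `spread` points wins) = Phi(spread / sd)."""
    return NormalDist().cdf(spread / sd)

print(favorite_win_prob(0))  # a pick'em game: exactly 0.5
print(favorite_win_prob(7))  # a touchdown favorite
```

A favorite by 0 points wins exactly half the time, and the probability grows smoothly toward 1 as the spread increases.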
Discrete Event Simulations
• We do not need to obtain Vegas’ spreads
– We can use our own regression-based rating method
– Will the difference between the point margin and our prediction follow N(0, 13.86)?
• Most probably not
• But we can examine this on past data
– Most probably it will still be a normal distribution, but with a different variance
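Checking this on past data could look like the sketch below; the (prediction, actual margin) pairs are fabricated for illustration, and with real data one would also inspect whether the residuals look roughly normal before reusing Stern's formula with the estimated standard deviation.

```python
import statistics

# Hypothetical past games: (our predicted margin, actual final margin).
past_games = [(3.0, 7), (-1.5, -4), (6.0, 2), (0.5, 3), (-4.0, -10),
              (2.5, 1), (7.0, 12), (-3.0, 1), (1.0, -2), (5.5, 9)]

# Residual = actual margin minus our prediction for that game.
residuals = [actual - predicted for predicted, actual in past_games]

# If the residual mean is near 0, the estimated standard deviation can
# replace Stern's 13.86 in the win-probability formula.
print(statistics.mean(residuals), statistics.stdev(residuals))
```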
Monte Carlo Simulations
• Sometimes we need to model a sequence of discrete events that are probabilistic in nature
– Perhaps the best example is modeling the winner of a sports competition
• What we need to know is (i) the probability of each event and (ii) the sequence of events
Final Four Simulation
• Let’s consider a simple discrete event simulation case – the Olympic Games Final Four
• The discrete events are the outcomes of each single game
– These probabilities can be obtained through specific predictive models
• We also know how teams are going to match up in the future
Final Four Simulation
• For each semifinal we flip a biased coin to decide the winner
– The bias is based on the pre-computed probabilities for each game
• Based on the outcomes of our simulated semifinals we simulate the corresponding final
• We repeat the process several times and keep track of how many times each team won
Final Four Simulation
• An unbiased coin has a 50-50 chance of landing heads or tails
– How do we simulate a coin biased toward heads with probability π?
• If we sample from a uniform distribution between 0 and 1, the probability of getting a number in the interval [0, π] is exactly π
– So: draw u uniformly from [0, 1]; the outcome is heads if u falls in the segment of length π, and tails otherwise (length 1 − π)
Final Four Simulation
Team       Gold   Silver  Bronze
Australia  0.21   0.21    0.29
France     0.22   0.32    0.18
Slovenia   0.18   0.28    0.29
USA        0.39   0.19    0.24
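The Final Four procedure can be sketched as follows. The semifinal pairings and all pairwise win probabilities below are made-up numbers standing in for a real predictive model; they are not the inputs behind the table above.

```python
import random
from collections import Counter

# Hypothetical semifinals: (team_a, team_b, P(team_a wins)).
SEMIS = [("USA", "Australia", 0.65), ("France", "Slovenia", 0.55)]
# Hypothetical P(first team beats second team) in each possible final.
FINAL_PROB = {
    ("USA", "France"): 0.60, ("USA", "Slovenia"): 0.62,
    ("Australia", "France"): 0.45, ("Australia", "Slovenia"): 0.48,
}

def play(team_a, team_b, p_a, rng):
    """Flip a biased coin: team_a wins with probability p_a."""
    return team_a if rng.random() < p_a else team_b

def simulate_final_four(rng):
    finalists = [play(a, b, p, rng) for a, b, p in SEMIS]
    p = FINAL_PROB[(finalists[0], finalists[1])]
    return play(finalists[0], finalists[1], p, rng)

rng = random.Random(7)
n = 20_000
gold = Counter(simulate_final_four(rng) for _ in range(n))
for team, count in gold.most_common():
    print(team, round(count / n, 3))
```

Dividing each team's championship count by the number of simulations gives estimates like the gold column of the table.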
Bootstrap
• Monte Carlo and DES are based on random
sampling of known distributions for the
parameters/variables of the system
– The simulations simply allow us to propagate this
uncertainty to the output
• What if we want to identify the distribution of a
sample estimate but we only have a sample of
observations?
– E.g., assume that we are interested in the average
points scored by the Celtics per game
Bootstrap
• We have a sample from the first 50 games of the season
• We could make an assumption about the distribution of the data and use Maximum Likelihood Estimates and the corresponding standard errors
– A normal distribution is an assumption someone could readily make, since it has been the case in many other situations (possibly unjustified here)
• A better option is to use the bootstrap
Bootstrap
• Estimate properties of an estimator through
resampling with replacement
– Assumption: observed data is a random sample of the
original population
• Typically we have only one sample – of n points –
observed for our variable of interest
– We can obtain a sample estimate (e.g., for the mean)
but we cannot estimate the distribution of this
estimator
Bootstrap Illustration
• Points scored by the Celtics during the first 50 games
– What is the average number of points scored?
• We can obtain a sample estimate (98.4 points)
• However, we do not know the distribution of this estimator
– Resampling with replacement will allow us to learn more about the estimator
Bootstrap Illustration
Original sample → resample with replacement:
X1 = {100, 110, 114, 110, 98} (first bootstrap sample)
…
XB = {98, 101, 95, 79, 100} (B-th bootstrap sample)
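A sketch of the basic procedure, with a made-up points sample (shorter than the 50 games of the example, to keep it readable):

```python
import random
import statistics

# Hypothetical points scored by the Celtics in their first 10 games.
points = [100, 110, 114, 98, 101, 95, 79, 108, 122, 104]

def bootstrap_means(data, n_boot, rng):
    """Resample with replacement, n points each time, and collect
    the mean of each bootstrap sample."""
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_boot)]

rng = random.Random(0)
means = bootstrap_means(points, 2_000, rng)

# The spread of the bootstrap means approximates the standard error of
# the sample mean, with no distributional assumption required.
print(statistics.mean(means), statistics.stdev(means))
```

The collected `means` are an empirical distribution of the estimator, which can be plugged directly into a Monte Carlo simulation if needed.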
Bootstrap Illustration
• Through the bootstrap we can identify the distribution of an estimator and use it for our simulations if needed
• For multidimensional data with correlations, block bootstrap can be used
Block Bootstrap
• Data might exhibit correlations
– Time series
– Spatial data
– Clustered data
– …
• Block bootstrap attempts to replicate the correlation structure in the bootstrapped samples
– Instead of resampling single data points, blocks of data are resampled
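A minimal moving-block bootstrap sketch for a time series; the block length and the data are arbitrary choices for illustration.

```python
import random

def moving_block_bootstrap(series, block_len, rng):
    """Build one bootstrap sample by concatenating randomly chosen
    overlapping blocks of consecutive points, so that short-range
    correlation inside each block is preserved."""
    n = len(series)
    starts = list(range(n - block_len + 1))
    sample = []
    while len(sample) < n:
        s = rng.choice(starts)
        sample.extend(series[s:s + block_len])
    return sample[:n]  # trim to the original length

rng = random.Random(3)
series = [98, 104, 101, 110, 114, 108, 95, 99, 103, 107]
resampled = moving_block_bootstrap(series, block_len=3, rng=rng)
print(resampled)
```

Repeating this B times and computing the statistic on each resampled series works exactly like the plain bootstrap, but without destroying the within-block dependence.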
Biased Bootstrap
• Sampling in the bootstrap is uniform
– That is, every observation has the same probability of being picked
• Sometimes – depending on the application – we might want to resample the observations in a non-uniform way
Biased Bootstrap Example
• The points-scored-per-game sample includes games against opponents of variable strength
• However, 100 points scored against the top defensive team is not the same as 100 points scored against the bottom defensive team
• If we want to estimate the distribution of the average number of points the Celtics score against teams similar to their next opponent, we should use a biased bootstrap
Biased Bootstrap Example
• Let’s assume that the Celtics’ next opponent has a defensive rating of -4 points (i.e., they allow 4 points less than an average defense)
• How can we use this information to get an estimate for the average points to be scored by the Celtics?
• Biased bootstrap based on the defensive rating
Biased Bootstrap Example
• Performances against teams with a defensive rating similar to our next opponent’s will be sampled more aggressively
• Obviously, one can use more than one variable to calculate the bias term
– E.g., for simulating future matchups one might need to control for both offensive and defensive ratings, home-vs-away games, etc.
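One possible way to implement the biased resampling is `random.choices` with distance-based weights; the game data, the weighting function, and the `scale` parameter below are all illustrative choices, not a prescribed scheme.

```python
import random
import statistics

# Hypothetical (points scored, opponent defensive rating) pairs.
games = [(100, -4.2), (112, 2.0), (95, -5.1), (118, 4.5),
         (104, -1.0), (108, 0.5), (92, -6.0), (115, 3.2)]

def biased_bootstrap_means(games, target_rating, n_boot, rng, scale=2.0):
    """Weight each game by how close the opponent's defensive rating is
    to the next opponent's rating, then resample with those weights."""
    pts = [p for p, _ in games]
    weights = [1.0 / (1.0 + abs(r - target_rating) / scale) for _, r in games]
    n = len(games)
    return [statistics.mean(rng.choices(pts, weights=weights, k=n))
            for _ in range(n_boot)]

rng = random.Random(0)
means = biased_bootstrap_means(games, target_rating=-4.0, n_boot=2_000, rng=rng)

# Games against strong defenses dominate the resampling, so the
# estimate sits below the unweighted average of all games.
print(statistics.mean(means))
```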
Why does bootstrap work?
• The bootstrap almost looks like magic!
• The way traditional inferential statistics works is that we have a population and we randomly sample a set of points to infer the statistic of interest
• Ideally we would take several samples from the population and for each sample calculate the statistic of interest
– Estimate the variability of the statistic
Why does bootstrap work?
• Getting several samples from the population is not practical/realistic
• Solution 1 (inferential statistics): make assumptions about the shape of the population
• Solution 2 (bootstrap statistics): use the information in the (single) population sample that you have
– The sample that we have is a (smaller) population itself, with approximately the same shape as the original population
Why does bootstrap work?
• In this case resampling with replacement simulates the generation of multiple samples from the original population
– Replacing the sampled data points retains the shape of the original population
• The sample we have is the best information – and in fact the only information – we have about the population, and the bootstrap takes maximum advantage of it
Bootstrap and Sample Size
• The only assumption that the bootstrap method makes is that the sample is representative of the population
– Therefore, if we have a very small sample (e.g., 4 points) then the bootstrap method itself will be limited
– It can still be applied, but the corresponding estimates might be far from the true mean
• The same, though, will be true for the sample mean obtained from the small sample as well
Bootstrap and Sample Size
• There is no restriction on the size of the bootstrap samples
– Typically we choose them to have the same size as the original sample
• The number of bootstrap samples to obtain is, in some sense, similar to the number of Monte Carlo simulations
– Rule of thumb: the more the better