1. Predicting March Madness Using Probabilities
Liana Valentino
College of Charleston
Introduction
Numerous models exist for predicting a bracket for the NCAA March Madness tournament.
Basketball analysts disagree about which statistics are important and how much weight each
statistic deserves; this disagreement leaves room for a variety of models. Instead of focusing on
one model, the current research uses several rating methods with different weights, together
with the probabilities of teams advancing, to create a bracket. This allows a bracket to be
created from a combination of many models instead of a single method.
Method
Ratings: I combined several rating methods to create 36 different brackets, then chose a
final bracket based on the predictions of the individual models. To create the brackets, I
started with uniform (unweighted) rating methods, then added various weights. The rating
methods used in this study were Massey and Colley. Using two different methods gives a
wider range of possible brackets. The two methods differ as follows:
• Massey incorporates the scores of the games, so a larger point differential produces a
larger increase or decrease in rating.
• Colley uses only wins and losses and ignores the scores of the games.
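The difference between the two methods can be made concrete with a minimal sketch, assuming the standard textbook formulations (the poster does not give its equations). Massey solves M r = p, where M_ii is the number of games team i played, M_ij is minus the number of games between teams i and j, and p_i is team i's total point differential; because M is singular, one row is replaced by ones. Colley solves C r = b with C = 2I + M and b_i = 1 + (wins_i - losses_i)/2, so scores never enter. The function names, the game-tuple layout, and the three-team season are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def massey(teams, games):
    """Uniform Massey: M r = p, where p holds each team's total point differential."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for winner, loser, w_pts, l_pts in games:
        i, j = idx[winner], idx[loser]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += w_pts - l_pts
        p[j] -= w_pts - l_pts
    # M is singular, so replace its last row with ones (ratings then sum to zero).
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


def colley(teams, games):
    """Uniform Colley: C r = b with C = 2I + M and b_i = 1 + (wins_i - losses_i) / 2."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser, _, _ in games:  # scores are ignored
        i, j = idx[winner], idx[loser]
        C[i][i] += 1
        C[j][j] += 1
        C[i][j] -= 1
        C[j][i] -= 1
        b[i] += 0.5
        b[j] -= 0.5
    return dict(zip(teams, solve(C, b)))


# Hypothetical three-team season: (winner, loser, winner points, loser points).
teams = ["A", "B", "C"]
games = [("A", "B", 70, 60), ("B", "C", 65, 60), ("A", "C", 80, 70)]
print(massey(teams, games))  # A ≈ 6.67, B ≈ -1.67, C ≈ -5
print(colley(teams, games))  # A ≈ 0.7, B ≈ 0.5, C ≈ 0.3
```

Both methods rank A above B above C here, but only Massey's gaps reflect the 10-, 5-, and 10-point margins.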
These methods were then modified with different weights that incorporate different aspects of
the game, producing different sets of ratings. Each set of ratings is used to fill out a bracket:
in every game, the team with the higher rating advances to the next round. In this study,
multiple sets of ratings are generated, and the probability that each team reaches a
particular round is then used to create the final bracket.
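Filling out one bracket from one set of ratings can be sketched as below. This is a minimal sketch, assuming the first-round teams are listed in bracket order; the function name, the tie rule (ties go to the first-listed team), and the four-team example are hypothetical.

```python
def fill_bracket(first_round, ratings):
    """Fill a single-elimination bracket by always advancing the higher-rated team.

    first_round lists the teams in bracket order (positions 0 and 1 meet,
    2 and 3 meet, ...); its length must be a power of two. Ties go to the
    first-listed team, an assumption. Returns every round's list of teams.
    """
    rounds = [list(first_round)]
    current = list(first_round)
    while len(current) > 1:
        current = [a if ratings[a] >= ratings[b] else b
                   for a, b in zip(current[0::2], current[1::2])]
        rounds.append(current)
    return rounds


# Hypothetical four-team bracket and ratings.
ratings = {"A": 0.9, "B": 0.4, "C": 0.6, "D": 0.7}
print(fill_bracket(["A", "B", "C", "D"], ratings))
# [['A', 'B', 'C', 'D'], ['A', 'D'], ['A']]
```

Running this once per set of ratings yields the individual brackets that the probabilities are later counted from.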
Weights: To use various methods as opposed to a single method to create a bracket,
different weights are added to the original rating methods. The four weights incorporated
into this study are:
1. Location of the win. A win on the road is weighted differently than a win at home.
2. Margin of victory. Massey already incorporates point differential, so winning by many
points improves a team's rating. This weight instead counts close games more heavily than
blowouts.
3. When the game was played. Games played at different points in the season are weighted
differently.
4. Winning streak. This looks at how many games a team has won in a row. If a team breaks
an opponent's winning streak, that game is weighted more heavily.
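One way a weight enters a rating method is sketched below for Colley: a game of weight w adds w wherever the uniform method adds 1. The poster does not give its weight functions, so `location_weight` and `recency_weight`, their numeric values, and combining them by multiplication are all hypothetical; the margin-of-victory and winning-streak weights would enter the same way, and Massey can be weighted analogously.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def location_weight(site):
    """Hypothetical: a road win counts more than a neutral-site win, a home win less."""
    return {"road": 1.4, "neutral": 1.0, "home": 0.6}[site]


def recency_weight(day, season_days):
    """Hypothetical: weight rises linearly from 0.5 on day 0 to 1.0 on the last day."""
    return 0.5 + 0.5 * day / season_days


def weighted_colley(teams, games, season_days):
    """Colley ratings in which each game counts with weight w instead of 1.

    games: (winner, loser, site, day) tuples, where site is the winner's location.
    """
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser, site, day in games:
        w = location_weight(site) * recency_weight(day, season_days)
        i, j = idx[winner], idx[loser]
        C[i][i] += w
        C[j][j] += w
        C[i][j] -= w
        C[j][i] -= w
        b[i] += w / 2
        b[j] -= w / 2
    return dict(zip(teams, solve(C, b)))


teams = ["A", "B", "C"]
neutral = [("A", "B", "neutral", 100), ("B", "C", "neutral", 100), ("A", "C", "neutral", 100)]
road_win = [("A", "B", "road", 100), ("B", "C", "neutral", 100), ("A", "C", "neutral", 100)]
print(weighted_colley(teams, neutral, 100))   # every weight is 1: A ≈ 0.7, B ≈ 0.5, C ≈ 0.3
print(weighted_colley(teams, road_win, 100))  # A's road win lifts A to about 0.72
```

With every weight equal to 1 the sketch reduces to uniform Colley, which is why the uniform methods serve as the starting point.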
Probabilities: The bracket is created from the probabilities of teams advancing to each round.
These are calculated by going through the 36 brackets and counting how many times each
team reaches each round. I also calculated how likely a team is to reach a specific round
given that it reached the previous round. For example, if a team reaches the Elite 8 in five
brackets and reaches the Final Four in those same five brackets, then given that the team
reaches the Elite 8, the probability of it reaching the Final Four is 100%. Using this data, the
bracket is created by assuming that the teams with the highest probabilities in each round will
be the ones to advance.
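The counting described above can be sketched as follows. This is a minimal sketch under a hypothetical encoding: each bracket maps a team to the index of the furthest round it reaches. The usage rebuilds Wisconsin's counts implied by Figure 1 (Elite 8 in all 36 brackets, Final 4 in 22, championship game and title in 1). `pick_winner` is one reading of "the teams with the highest probabilities advance", not necessarily the poster's exact rule.

```python
ROUNDS = ["3rd Round", "Sweet 16", "Elite 8", "Final 4", "Championship", "Winner"]


def round_probabilities(brackets):
    """Overall and previous-round (conditional) probabilities, in percent.

    brackets: one dict per bracket mapping a team to the index in ROUNDS of the
    furthest round it reaches (-1 if it never reaches the 3rd Round).
    Returns (overall, conditional), each mapping team -> list of percentages.
    """
    n = len(brackets)
    teams = set().union(*brackets)
    overall, conditional = {}, {}
    for team in teams:
        # counts[r] = number of brackets in which the team reaches round r.
        counts = [sum(1 for b in brackets if b.get(team, -1) >= r)
                  for r in range(len(ROUNDS))]
        overall[team] = [100 * c / n for c in counts]
        # The first column has no earlier round in the table, so it matches overall.
        cond = [overall[team][0]]
        for r in range(1, len(ROUNDS)):
            cond.append(100 * counts[r] / counts[r - 1] if counts[r - 1] else 0.0)
        conditional[team] = cond
    return overall, conditional


def pick_winner(team_a, team_b, overall, next_round):
    """Advance whichever team reaches next_round in more of the brackets."""
    return team_a if overall[team_a][next_round] >= overall[team_b][next_round] else team_b


# Wisconsin's counts implied by Figure 1 (36 brackets in total).
brackets = [{"Wisconsin": 5}] + [{"Wisconsin": 3}] * 21 + [{"Wisconsin": 2}] * 14
overall, conditional = round_probabilities(brackets)
print([round(p, 2) for p in overall["Wisconsin"]])      # matches Figure 1
print([round(p, 2) for p in conditional["Wisconsin"]])  # matches Figure 2
```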
Results
Because Uniform Massey is the standard rating method that performs
best on average, it is used for comparison. Figure 3 compares the
prediction accuracy of the probabilities method in the current study
with that of Uniform Massey over the previous 15 years (2001 to
2015). Prediction accuracy is the percentage of tournament games
that a method predicted correctly. Visual inspection shows no clear
difference between the two methods: in some years the
probabilities method was more accurate than Uniform Massey, and
in other years the pattern was reversed.
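Prediction accuracy as defined above can be sketched by keying picks by bracket slot, so a later-round pick counts only if the predicted team actually won that slot. Assuming the 63 games from the round of 64 onward (First Four excluded), the 2015 accuracies reported later, 66.7% and 69.8%, correspond to 42 and 44 correct games. The slot ids and team labels below are hypothetical.

```python
def prediction_accuracy(predicted, actual):
    """Percent of games whose winner was predicted correctly.

    Both arguments map a bracket slot (hypothetical game id) to that game's winner,
    so a later-round pick is correct only if the predicted team actually won that slot.
    """
    correct = sum(1 for slot, winner in actual.items() if predicted.get(slot) == winner)
    return 100 * correct / len(actual)


# 63 slots from the round of 64 onward; a bracket that gets 42 of them right.
actual = {slot: "W" for slot in range(63)}
predicted = {slot: ("W" if slot < 42 else "X") for slot in range(63)}
print(round(prediction_accuracy(predicted, actual), 1))  # 66.7
```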
Team       3rd Round  Sweet 16  Elite 8  Final 4  Championship  Winner
Kentucky   100        100       100      100      75            75
Wisconsin  100        100       100      61.11    2.78          2.78
Villanova  100        100       100      97.22    69.44         2.78
Duke       100        100       83.33    72.22    25.00         0
Figure 1: 2015 Number 1 Seed Overall Probabilities (% of the 36 brackets)
Figure 3: Prediction Accuracies by Year
A bracket is created from data like that in Figure 1, which shows
the probabilities of the 2015 number one seeds reaching each
round. For example:
• Wisconsin reaches the Final Four 61% of the time.
• Duke reaches the Final Four 72% of the time.
Figure 1 also shows that, among the number one seeds, Wisconsin
has the smallest probability of reaching the championship game.
Figure 2 provides a different analysis: the probability that a team
reaches a round given that it reached the previous round. Wisconsin
wins the tournament only 3% of the time, but Figure 2 shows that
whenever Wisconsin does reach the championship game, it wins the
tournament (100%). Figure 2 also shows that the only round in which
Kentucky ever loses is the Final Four.
Team       3rd Round  Sweet 16  Elite 8  Final 4  Championship  Winner
Kentucky   100        100       100      100      75.00         100
Wisconsin  100        100       100      61.11    4.55          100
Villanova  100        100       100      97.22    71.43         4
Duke       100        100       83.33    86.67    34.62         0
Figure 2: 2015 Number 1 Seed Previous Round Probabilities (%)
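Each Figure 2 entry can be recovered from Figure 1: a team that reaches a round must have reached the previous one, so the conditional probability is the ratio of the two overall probabilities. For Duke's Final Four entry, using the bracket counts of 26 and 30 (out of 36) implied by the Figure 1 percentages:

```latex
P(\text{Final Four} \mid \text{Elite 8})
  = \frac{P(\text{Final Four})}{P(\text{Elite 8})}
  = \frac{72.22\%}{83.33\%}
  = \frac{26/36}{30/36}
  = \frac{26}{30}
  \approx 86.67\%
```

The same ratio gives Villanova's Winner entry: 2.78% / 69.44% = (1/36) / (25/36) = 1/25 = 4%.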
[Figure 3 plot: "Prediction Accuracy", accuracy (%) by year from 2001 to 2015 for Probabilities and Uniform Massey]
[Figure 4 plot: "Probability Accuracy", accuracy (%) by round (3rd Round, Sweet 16, Elite 8, Final Four, Championship, Winner) for probability ranges >90%, 80-90%, 70-80%, 60-70%, and 50-60%]
Figure 4: Probability Prediction Accuracy by Round
Rather than measuring the accuracy of the probabilities method as a
whole, Figure 4 shows how ranges of probabilities compare with
each other in each round. For example, a probability greater than
90% is the most accurate range in the 3rd round, but a probability
between 50% and 60% performs better in the Elite 8 and Final Four.
For each round and probability range, accuracy is the number of
teams in that range that actually reached the round divided by the
number of teams in that range predicted to reach it, pooled over
2001 to 2015.
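The per-range accuracy can be sketched as below, assuming each record pairs a team's predicted probability of reaching a round with whether it actually did. The poster does not say how boundary values (for example, exactly 90%) are assigned to ranges, so the binning in `probability_range` is an assumption, as is skipping teams below 50%, which Figure 4 does not show. The records are hypothetical.

```python
from collections import defaultdict


def probability_range(p):
    """Assign a probability (percent) to a Figure 4 range; boundary handling is assumed."""
    if p > 90:
        return ">90"
    for low in (80, 70, 60, 50):
        if p >= low:
            return f"{low}-{low + 10}"
    return None  # below 50%: not counted (assumption)


def range_accuracy(records):
    """records: (probability_percent, reached) pairs for one round, pooled over years.

    Returns, per range, the percent of teams in that range that actually reached the round.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for p, reached in records:
        label = probability_range(p)
        if label is None:
            continue
        totals[label] += 1
        hits[label] += int(reached)
    return {label: 100 * hits[label] / totals[label] for label in totals}


# Hypothetical Elite 8 records.
records = [(95, True), (95, True), (92, False), (85, True), (55, False), (40, True)]
print(range_accuracy(records))
```

Because each range is scored separately, a narrow range with few teams can come out more or less accurate than its probability suggests, which is one way the non-monotone pattern in Figure 4 can arise.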
Discussion
Over the 15 years of data on which the probabilities method was tested, it has an average prediction accuracy of 65.5%. This approach
was used to incorporate differing opinions about which statistics are important. Although its average prediction accuracy is the same as
that of Uniform Massey, Massey produces a higher ESPN score on average. Because ESPN scoring awards more points for correct picks in
later rounds, this implies that Massey predicts the later rounds more accurately than the probabilities method. In general, the two methods
perform similarly, both round by round and on average. For the 2015 tournament, the probabilities method predicted 66.7% of the games
correctly, while Massey predicted 69.8% correctly.
One of the best aspects of the probabilities method is that its output is easy to explain and understand. The probabilities displayed in
Figure 2 represent very different information than those in Figure 1, but they are still useful when creating a bracket: they tell you how
likely a team is to win its next game given that it won its previous game. The output in Figure 4 is interesting because it shows that, over
15 years, a probability greater than 90% of reaching a round is not always the most accurate range. It also shows that teams with a
probability between 80% and 90% of reaching the Elite 8 are actually the least likely to reach it.
Overall, the purpose of this study was to create a bracket using the results of multiple rating methods as opposed to one. Over a 15-year
span, the method appears to be about as accurate as existing methods such as Uniform Massey. Many more factors could be included in
the study in the future to produce different and potentially more accurate results.
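The inference that equal accuracy but a higher ESPN score means better later-round picks follows from how bracket scoring weights the rounds. A minimal sketch, assuming ESPN's standard Tournament Challenge points of 10, 20, 40, 80, 160, and 320 per correct pick from the round of 64 to the championship game (the poster does not state the scoring it used); the two pick profiles are hypothetical:

```python
# Assumed ESPN Tournament Challenge points per correct pick, round of 64 to title game.
ESPN_POINTS = [10, 20, 40, 80, 160, 320]
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # 63 games in total


def espn_score(correct_per_round):
    """Points earned, given the number of correct picks in each round."""
    return sum(points * c for points, c in zip(ESPN_POINTS, correct_per_round))


def accuracy(correct_per_round):
    """Percent of all 63 games predicted correctly, regardless of round."""
    return 100 * sum(correct_per_round) / sum(GAMES_PER_ROUND)


# Two hypothetical brackets with 42 correct picks each.
early = [26, 9, 4, 2, 1, 0]  # stronger in the first two rounds
late = [24, 10, 4, 2, 1, 1]  # picks the champion correctly
print(accuracy(early), accuracy(late))      # same accuracy, about 66.67%
print(espn_score(early), espn_score(late))  # 920 1240
```

A single correct champion pick is worth as much as 32 first-round picks under this scoring, so two methods with identical overall accuracy can differ sharply in ESPN score.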
Acknowledgements
Dr. Amy Langville, John Sussingham, Drew Passarello, Stephen Gorman, and Thad Sulek, College of Charleston.
Dr. Tim Chartier, Davidson College.