Predicting March Madness Using Probabilities
Liana Valentino
College of Charleston
Introduction
Numerous predictive models exist for filling out a bracket for the NCAA March
Madness tournament. Basketball analysts disagree about which statistics are
important and how much weight each statistic should carry; this disagreement
leaves room for a variety of different models. Instead of focusing on
one model, the current research combines several rating methods with different weights and
uses the probabilities of teams advancing to create a bracket. This allows a bracket to be
built from a combination of many models rather than from a single method.
Method
Ratings: I incorporated several different methods to create 36 different brackets, then
decided on a final bracket based on the predictions of the individual models. To create the
brackets, I started with uniform rating methods, then added various weights. The rating
methods used for this study were Massey and Colley. Using two different methods gives
a wider range of possibilities. Briefly, the calculations differ as follows:
•  Massey incorporates the scores of the games, so a larger point differential
produces a larger increase or decrease in rating.
•  Colley uses only wins and losses and ignores the scores of the games.
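To make the difference between the two rating methods concrete, here is a minimal sketch of the standard uniform Massey and Colley systems in plain Python. The function names, the `(winner, loser, margin)` game format, and the three-team schedule are illustrative assumptions, not the code or data used in this study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def build_systems(teams, games):
    """Return the games-played matrix, win-minus-loss totals, and point differentials."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    played = [[0.0] * n for _ in range(n)]
    wins_minus_losses = [0.0] * n
    point_diff = [0.0] * n
    for winner, loser, margin in games:
        w, l = idx[winner], idx[loser]
        played[w][w] += 1  # diagonal: total games played
        played[l][l] += 1
        played[w][l] -= 1  # off-diagonal: minus the head-to-head game count
        played[l][w] -= 1
        wins_minus_losses[w] += 1
        wins_minus_losses[l] -= 1
        point_diff[w] += margin
        point_diff[l] -= margin
    return played, wins_minus_losses, point_diff


def colley(teams, games):
    """Colley ratings: depend only on wins and losses."""
    played, wl, _ = build_systems(teams, games)
    c = [[v + (2.0 if i == j else 0.0) for j, v in enumerate(row)]
         for i, row in enumerate(played)]
    b = [1.0 + d / 2.0 for d in wl]
    return dict(zip(teams, solve(c, b)))


def massey(teams, games):
    """Massey ratings: depend on point differentials."""
    played, _, pd = build_systems(teams, games)
    # The Massey matrix is singular, so the last equation is replaced
    # by the constraint that the ratings sum to zero.
    played[-1] = [1.0] * len(teams)
    pd[-1] = 0.0
    return dict(zip(teams, solve(played, pd)))


# Hypothetical schedule: every team is 1-1, but C won its game by 30 points.
teams = ["A", "B", "C"]
games = [("A", "B", 1), ("B", "C", 1), ("C", "A", 30)]
colley_ratings = colley(teams, games)
massey_ratings = massey(teams, games)
```

On this toy schedule Colley cannot separate the three teams because each is 1-1, while Massey rates C highest because of its 30-point win.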
These methods were then modified by different weights to capture different aspects of
the game, producing different sets of ratings. Each set of ratings is then used to fill out a bracket:
in every game, the team with the higher rating moves on to the next round. In this study,
multiple sets of ratings are generated, then the probability that each team makes it to a
particular round is used to create the final bracket.
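The bracket-filling rule can be sketched as follows. The function name, the four-team field, and the tie-break (the first-listed team advances on equal ratings) are assumptions made for illustration.

```python
def fill_bracket(field, ratings):
    """Advance the higher-rated team in every game.

    field: teams in bracket order; its length must be a power of two.
    Returns a list of rounds, each a list of the teams that reach that round.
    """
    rounds = [list(field)]
    current = list(field)
    while len(current) > 1:
        # Pair neighbours (1st vs 2nd, 3rd vs 4th, ...) and keep the higher rating.
        # Assumption: on a tie, the first-listed team advances.
        current = [a if ratings[a] >= ratings[b] else b
                   for a, b in zip(current[0::2], current[1::2])]
        rounds.append(current)
    return rounds


# Hypothetical ratings for a four-team field.
example_ratings = {"A": 0.9, "B": 0.4, "C": 0.5, "D": 0.7}
example_rounds = fill_bracket(["A", "B", "C", "D"], example_ratings)
```

Running this once per set of ratings would yield one bracket per rating method.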
Weights: To use various methods as opposed to a single method to create a bracket,
different weights are added to the original rating methods. The four weights incorporated
into this study are:
1.  Location of the win. A win on the road is weighted differently than a win
at home.
2.  Margin of victory. Massey incorporates point differential in the sense that winning by a lot
of points improves a team's rating. With this weight, close games count more than
blowout games.
3.  When the game was played. Games played at different points in the season are weighted
differently.
4.  Winning streak. This looks at how many games a team has won in a row. If a team beats an
opponent that is on a winning streak, ending that streak, the game is weighted more.
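One way the four weights could be combined into a single per-game weight is sketched below. Every numeric value, the multiplicative form, and the directions chosen for location (road wins count more) and timing (later games count more) are assumptions for illustration; the study does not state these details.

```python
from dataclasses import dataclass


@dataclass
class Game:
    winner: str
    loser: str
    margin: int           # winner's points minus loser's points
    winner_at_home: bool
    day: int              # day of the season, 0 = opening day
    loser_streak: int     # loser's winning streak coming into the game


def game_weight(game, season_days, road_bonus=1.5, close_margin=5,
                close_bonus=1.25, late_bonus=1.0, streak_bonus=0.1):
    """Hypothetical combined weight; a uniform method gives every game 1.0."""
    weight = 1.0
    if not game.winner_at_home:
        weight *= road_bonus                      # 1. location of the win
    if game.margin <= close_margin:
        weight *= close_bonus                     # 2. close games count more
    weight *= 1.0 + late_bonus * game.day / season_days   # 3. time of season
    weight *= 1.0 + streak_bonus * game.loser_streak      # 4. streak broken
    return weight


road_upset = Game("A", "B", margin=3, winner_at_home=False, day=50, loser_streak=4)
home_blowout = Game("C", "D", margin=20, winner_at_home=True, day=0, loser_streak=0)
road_upset_weight = game_weight(road_upset, season_days=100)
home_blowout_weight = game_weight(home_blowout, season_days=100)
```

In a weighted rating method, each game would then contribute its weight, rather than 1, to the rating system; switching individual factors on and off is one way to obtain many different sets of ratings from the same two base methods.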
Probabilities: The bracket is created from the probabilities of teams advancing to each
round. These are calculated by going through the 36 brackets and counting how many times each
team makes it to each round. I also calculated how likely a team is to make it to a specific
round given how often it made it to the previous round. For example, if a team makes
it to both the Elite 8 and the Final Four five times, then given that the team makes it to the Elite 8, the
probability of it making it to the Final Four is 100%. Using this data, the bracket is
created by assuming that the teams with the highest probabilities in each round will be the
ones to progress.
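The counting step can be sketched as below. The data layout (one dictionary per bracket, mapping a round name to the set of teams reaching it) and the function names are assumptions; the toy data reproduces the example above of a team that reaches both the Elite 8 and the Final Four in five of 36 brackets.

```python
from collections import Counter

ROUNDS = ["3rd Round", "Sweet 16", "Elite 8", "Final 4", "Champ", "Winner"]


def reach_counts(brackets):
    """Count, for each round, how many brackets have each team reaching it."""
    counts = {rnd: Counter() for rnd in ROUNDS}
    for bracket in brackets:
        for rnd in ROUNDS:
            counts[rnd].update(bracket.get(rnd, ()))
    return counts


def overall_probability(counts, team, rnd, n_brackets):
    """Percentage of all brackets in which the team reaches the round."""
    return 100.0 * counts[rnd][team] / n_brackets


def previous_round_probability(counts, team, rnd, n_brackets):
    """Percentage of brackets reaching the round, among those reaching the prior round."""
    i = ROUNDS.index(rnd)
    # For the first listed round there is no earlier round, so condition on all brackets.
    reached_prev = n_brackets if i == 0 else counts[ROUNDS[i - 1]][team]
    if reached_prev == 0:
        return None  # undefined: the team never reached the previous round
    return 100.0 * counts[rnd][team] / reached_prev


# Toy data: team "X" reaches the Elite 8 and the Final 4 in 5 of 36 brackets.
toy_brackets = [{"Elite 8": {"X"}, "Final 4": {"X"}}] * 5 + [{}] * 31
toy_counts = reach_counts(toy_brackets)
x_overall = overall_probability(toy_counts, "X", "Final 4", len(toy_brackets))
x_given_elite8 = previous_round_probability(toy_counts, "X", "Final 4", len(toy_brackets))
```

Here the overall probability is 5 out of 36, while the probability given an Elite 8 appearance is 100%, which is the distinction the two kinds of probability are meant to capture.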
Results
Since Uniform Massey is the standard rating system that performs
the best on average, that method is used for comparison. In Figure
3, the prediction accuracy of the probabilities calculated in the
current study is compared to Massey's over the previous 15
years. Prediction accuracy is measured by the percentage of
tournament games the method predicted correctly. From a visual
inspection, there are no clear differences between the two
models; in some years the probabilities calculated in the current
study were more accurate than Massey's predictions, and in other
years the pattern was reversed.
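The accuracy measure can be written in a few lines. The game identifiers and picks below are hypothetical. The two arithmetic checks at the end assume a 63-game main bracket (excluding play-in games); under that assumption, the 2015 accuracies reported in the Discussion are consistent with 42 and 44 correct picks.

```python
def prediction_accuracy(predicted, actual):
    """Percentage of games whose predicted winner matches the actual winner.

    Both arguments map a game identifier to the winning team.
    """
    correct = sum(1 for game, winner in actual.items()
                  if predicted.get(game) == winner)
    return 100.0 * correct / len(actual)


# Hypothetical three-game example: two of three picks are correct.
actual = {"G1": "A", "G2": "D", "G3": "A"}
predicted = {"G1": "A", "G2": "C", "G3": "A"}
toy_accuracy = prediction_accuracy(predicted, actual)

# Assumption: 63 games in the main bracket.
probabilities_2015 = round(100.0 * 42 / 63, 1)
massey_2015 = round(100.0 * 44 / 63, 1)
```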
Team        3rd Round   Sweet 16   Elite 8   Final 4   Champ   Winner
Kentucky    100         100        100       100       75      75
Wisconsin   100         100        100       61.11     2.78    2.78
Villanova   100         100        100       97.22     69.44   2.78
Duke        100         100        83.33     72.22     25.00   0
Figure 1: 2015 Number 1 Seed Overall Probabilities (%)
Figure 3: Prediction Accuracies by Year
A bracket would be created using data like that
displayed in Figure 1, which shows the probabilities of the 2015 number one
seeds progressing to each round. For example,
•  Wisconsin makes it to the Final Four 61% of the time.
•  Duke makes it 72% of the time.
This also shows that, of the number one seeds, Wisconsin has
the smallest probability of making it to the championship game.
Figure 2 provides a different analysis, giving the probability that a team
makes it to a round given that it made it to the previous round. Wisconsin
is shown winning the tournament 3% of the time, but Figure 2 tells
us that if Wisconsin does make it to the championship game, it wins the
tournament 100% of the time. Figure 2 also shows that the only
round Kentucky would lose in is the Final Four.
Team        3rd Round   Sweet 16   Elite 8   Final 4   Champ   Winner
Kentucky    100         100        100       100       75.00   100
Wisconsin   100         100        100       61.11     4.55    100
Villanova   100         100        100       97.22     71.43   4
Duke        100         100        83.33     86.67     34.62   0
Figure 2: 2015 Number 1 Seed Previous Round Probabilities (%)
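The two tables are linked by simple division: each entry in Figure 2 is the Figure 1 entry for that round divided by the Figure 1 entry for the previous round. The sketch below recomputes Figure 2 from the rounded Figure 1 percentages; the function name is an assumption, and treating a team with a 0% previous-round entry as 0 is a convention chosen here.

```python
# Overall probabilities (%) copied from Figure 1, in round order:
# 3rd Round, Sweet 16, Elite 8, Final 4, Champ, Winner.
FIGURE_1 = {
    "Kentucky":  [100, 100, 100, 100, 75, 75],
    "Wisconsin": [100, 100, 100, 61.11, 2.78, 2.78],
    "Villanova": [100, 100, 100, 97.22, 69.44, 2.78],
    "Duke":      [100, 100, 83.33, 72.22, 25.00, 0],
}


def previous_round_probabilities(overall):
    """Convert overall probabilities into probabilities given the previous round."""
    result = [overall[0]]  # the first round has no earlier round to condition on
    for prev, cur in zip(overall, overall[1:]):
        # Convention: report 0 when the previous-round probability is 0.
        result.append(round(100.0 * cur / prev, 2) if prev else 0.0)
    return result


figure_2 = {team: previous_round_probabilities(row)
            for team, row in FIGURE_1.items()}
```

For Wisconsin, for example, 2.78 / 61.11 gives about 4.55% for the championship game, and 2.78 / 2.78 gives 100% for winning it.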
[Figure 3 chart, "Prediction Accuracy": accuracy (%) by year, 2001 to 2015, with two series: Probabilities and Uniform Massey.]
[Figure 4 chart, "Probability Accuracy": accuracy (%) by round (3rd Round, Sweet 16, Elite 8, Final Four, Championship, Winner), with one series per probability range: >90, 80-90, 70-80, 60-70, 50-60.]
Figure 4: Probability Prediction Accuracy by Round
Rather than looking at the accuracy of the probability method as a whole,
Figure 4 displays how ranges of probabilities perform in
comparison to each other in each round. For example, the graph
shows that a probability greater than 90% is the most
accurate in the 3rd round, but a probability between 50%
and 60% performs better in the Elite 8 and Final Four. This
accuracy is calculated by comparing the number of teams predicted
to make it to that round with the respective probability to
the number of times they actually did from 2001 to 2015.
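The per-range accuracy can be sketched as a binning step. The range labels follow the Figure 4 legend, but the boundary handling (lower bound inclusive, ">90" strictly greater than 90, probabilities below 50 ignored) and the sample records are assumptions.

```python
from collections import Counter


def probability_range(p):
    """Map a probability (%) to a Figure 4 legend label, or None if below 50."""
    if p > 90:
        return ">90"
    for lo in (80, 70, 60, 50):
        if p >= lo:
            return f"{lo}-{lo + 10}"
    return None


def accuracy_by_range(records):
    """records: (probability %, made_it) pairs for one round, pooled over years.

    Returns the percentage of teams in each range that actually reached the round.
    """
    totals, hits = Counter(), Counter()
    for p, made_it in records:
        label = probability_range(p)
        if label is None:
            continue
        totals[label] += 1
        hits[label] += bool(made_it)
    return {label: 100.0 * hits[label] / totals[label] for label in totals}


# Hypothetical records for a single round.
sample = [(95, True), (100, True), (92, False), (85, False), (55, True), (40, True)]
sample_accuracy = accuracy_by_range(sample)
```

Repeating this for each round would give one group of bars per round, as in Figure 4.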
Discussion
Over the 15 years of data that the probabilities method was tested on, it had an average prediction accuracy of 65.5%. This approach
was used in order to incorporate different opinions about which statistics are important. Although the average prediction accuracy is
the same as Uniform Massey's, Massey produces a higher ESPN score on average, implying that Massey predicts more accurately in
the later rounds than the probabilities method. In general, the methods perform rather similarly across the rounds and on average. For the 2015
tournament, the probabilities method predicted 66.7% of the games correctly while Massey predicted 69.8% correctly. One of the best aspects of
the probabilities method is that it produces an output that is easy to explain and understand. The probabilities displayed in Figure 2
represent very different information from those in Figure 1, but are still useful when creating a bracket. They tell you how likely a team is to win its next
game given that it won the previous game. The output in Figure 4 is interesting because it shows that over 15 years, a
probability greater than 90% of making it to a round is not always the most accurate. It also shows that if a team has a
probability between 80% and 90% of making it to the Elite 8, it is actually least likely to make it. Overall, the purpose of this study
was to create a bracket using the results of multiple rating methods as opposed to one. Over a 15-year span, the method appears to be
as accurate as existing methods. Many more factors could be added to the study to produce different and
more accurate results in the future.
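To illustrate why two methods with the same prediction accuracy can receive different ESPN scores, the sketch below assumes the commonly used ESPN bracket scoring of 10, 20, 40, 80, 160, and 320 points per correct pick in successive rounds. The scoring table is an assumption (it is not described in this study), and the two sets of per-round correct picks are hypothetical.

```python
# Assumed points per correct pick, from the first full round to the title game.
ESPN_POINTS = [10, 20, 40, 80, 160, 320]
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def espn_score(correct_by_round):
    """Total score given the number of correct picks in each round."""
    return sum(points * correct
               for points, correct in zip(ESPN_POINTS, correct_by_round))


def overall_accuracy(correct_by_round):
    """Percentage of all games predicted correctly."""
    return 100.0 * sum(correct_by_round) / sum(GAMES_PER_ROUND)


# Hypothetical brackets: both get 42 games right, in different rounds.
late_round_bracket = [24, 10, 5, 2, 1, 0]
early_round_bracket = [26, 11, 4, 1, 0, 0]
late_score = espn_score(late_round_bracket)
early_score = espn_score(early_round_bracket)
```

Both hypothetical brackets have the same overall accuracy, but the one with more correct picks in later rounds scores higher, which matches the interpretation given above for Massey's higher average ESPN score.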
Acknowledgements
Dr. Amy Langville, John Sussingham, Drew Passarello, Stephen Gorman, and Thad Sulek, College of Charleston.
Dr. Tim Chartier, Davidson College.
