Logistic Regression and Markov Chain approach to
NCAA Basketball seeding
Michael Hankin
University of Southern California
mhankin@usc.edu

April 22, 2013

Michael Hankin (USC)

LRMC

April 22, 2013

Overview

1. Background
   Logistic Regression
   Markov Chain

Overview of Logistic Regression

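Basic idea of Logistic Regression: Given explanatory variables X and a
binary response variable Y, we wish to determine P(Y = 1 | X). Logistic
regression allows us to estimate this by modeling
\[
Y \sim \mathrm{Bernoulli}\left(\sigma(w^T X)\right),
\qquad \text{where } \sigma(w^T X) = \frac{1}{1 + e^{w^T X}}.
\]
Note the positive exponent: with this convention σ is decreasing in its
argument, so a negative coefficient means a larger covariate raises the
probability.
If we model P(i beats j on j's homecourt | i beat j by x on i's homecourt)
as σ(α + βx), we obtain the following likelihood, where x_g is the margin in
game g and w_g ∈ {0, 1} indicates whether the modeled event occurred:
\[
L(\alpha, \beta) = \prod_{g:\,\text{games}}
\left( \frac{1}{1 + e^{\alpha + \beta x_g}} \right)^{w_g}
\left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right)^{1 - w_g}
\]
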
We then find parameters that maximize the likelihood.

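\begin{align*}
\ell = \log L(\alpha, \beta)
&= \sum_{g:\,\text{games}} \left[ w_g \log \frac{1}{1 + e^{\alpha + \beta x_g}}
   + (1 - w_g) \log\left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) \right] \\
&= \sum_{g:\,\text{games}} \left[ -w_g \log\left(1 + e^{\alpha + \beta x_g}\right)
   + (1 - w_g)\left( \alpha + \beta x_g - \log\left(1 + e^{\alpha + \beta x_g}\right) \right) \right] \\
&= \sum_{g:\,\text{games}} \left[ (1 - w_g)(\alpha + \beta x_g)
   - \log\left(1 + e^{\alpha + \beta x_g}\right) \right]
\end{align*}
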
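The simplification above can be checked numerically. The following is a minimal sketch, assuming the slides' sign convention σ(z) = 1/(1 + e^z); the function names and the toy data are illustrative and do not come from the slides.

```python
import math

def sigma(z):
    """Slide convention: sigma(z) = 1 / (1 + e^z), which is decreasing in z."""
    return 1.0 / (1.0 + math.exp(z))

def log_lik_direct(alpha, beta, x, w):
    """Sum over games of w*log(p) + (1-w)*log(1-p), with p = sigma(alpha + beta*x)."""
    total = 0.0
    for xg, wg in zip(x, w):
        p = sigma(alpha + beta * xg)
        total += wg * math.log(p) + (1 - wg) * math.log(1 - p)
    return total

def log_lik_simplified(alpha, beta, x, w):
    """Last line of the derivation: (1-w)*(alpha+beta*x) - log(1 + e^(alpha+beta*x))."""
    total = 0.0
    for xg, wg in zip(x, w):
        z = alpha + beta * xg
        total += (1 - wg) * z - math.log1p(math.exp(z))
    return total

# Hypothetical toy data: home-game margins x_g and 0/1 outcomes w_g.
x = [3, 10, 1, 15, 7, 2]
w = [0, 1, 0, 1, 1, 1]

direct = log_lik_direct(0.7, -0.05, x, w)
simplified = log_lik_simplified(0.7, -0.05, x, w)
print(direct, simplified)  # the two forms should agree up to rounding
```
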
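\begin{align}
\frac{\partial \ell}{\partial \alpha}
&= \sum_{g:\,\text{games}} \left[ (1 - w_g) - \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}} \right] \tag{1} \\
&= \sum_{g:\,\text{games}} \left[ (1 - w_g) - \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) \right] \tag{2} \\
&= \sum_{g:\,\text{games}} \left[ \frac{1}{1 + e^{\alpha + \beta x_g}} - w_g \right] \tag{3} \\
\frac{\partial \ell}{\partial \beta}
&= \sum_{g:\,\text{games}} \left[ (1 - w_g)\, x_g - \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\, x_g \right] \tag{4} \\
&= \sum_{g:\,\text{games}} \left[ (1 - w_g)\, x_g - \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) x_g \right] \tag{5} \\
&= \sum_{g:\,\text{games}} \left[ \frac{1}{1 + e^{\alpha + \beta x_g}} - w_g \right] x_g \tag{6}
\end{align}
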
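A quick way to gain confidence in equations (3) and (6) is to compare them with a finite-difference approximation of the log-likelihood. This is a minimal sketch under the slides' sign convention; the helper names, step size and toy data are assumptions made for illustration.

```python
import math

def log_lik(alpha, beta, x, w):
    """Simplified log-likelihood: sum of (1-w)*(a+b*x) - log(1 + e^(a+b*x))."""
    return sum((1 - wg) * (alpha + beta * xg) - math.log1p(math.exp(alpha + beta * xg))
               for xg, wg in zip(x, w))

def gradient(alpha, beta, x, w):
    """Equations (3) and (6): sums of (p_g - w_g) and (p_g - w_g) * x_g."""
    d_alpha = d_beta = 0.0
    for xg, wg in zip(x, w):
        p = 1.0 / (1.0 + math.exp(alpha + beta * xg))  # slide convention
        d_alpha += p - wg
        d_beta += (p - wg) * xg
    return d_alpha, d_beta

def numeric_gradient(alpha, beta, x, w, eps=1e-6):
    """Central finite differences of log_lik in each parameter."""
    d_alpha = (log_lik(alpha + eps, beta, x, w) - log_lik(alpha - eps, beta, x, w)) / (2 * eps)
    d_beta = (log_lik(alpha, beta + eps, x, w) - log_lik(alpha, beta - eps, x, w)) / (2 * eps)
    return d_alpha, d_beta

# Hypothetical toy data: margins x_g and 0/1 outcomes w_g.
x = [3, 10, 1, 15, 7, 2]
w = [0, 1, 0, 1, 1, 1]

analytic = gradient(0.7, -0.05, x, w)
numeric = numeric_gradient(0.7, -0.05, x, w)
print(analytic, numeric)  # the two pairs should agree to several decimals
```
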
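\begin{align}
\frac{\partial^2 \ell}{\partial \alpha^2}
&= \sum_{g:\,\text{games}} - \frac{1}{1 + e^{\alpha + \beta x_g}} \cdot \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}} \tag{7} \\
&= \sum_{g:\,\text{games}} - \frac{1}{1 + e^{\alpha + \beta x_g}} \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) \tag{8} \\
\frac{\partial^2 \ell}{\partial \alpha\, \partial \beta} = \frac{\partial^2 \ell}{\partial \beta\, \partial \alpha}
&= \sum_{g:\,\text{games}} - \frac{1}{1 + e^{\alpha + \beta x_g}} \cdot \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\, x_g \tag{9} \\
&= \sum_{g:\,\text{games}} - \frac{1}{1 + e^{\alpha + \beta x_g}} \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) x_g \tag{10} \\
\frac{\partial^2 \ell}{\partial \beta^2}
&= \sum_{g:\,\text{games}} - \frac{1}{1 + e^{\alpha + \beta x_g}} \cdot \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\, x_g^2 \tag{11} \\
&= \sum_{g:\,\text{games}} - \frac{1}{1 + e^{\alpha + \beta x_g}} \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) x_g^2 \tag{12}
\end{align}
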
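Equations (8), (10) and (12) can be collected into a single matrix. Writing p_g = 1/(1 + e^{α + βx_g}) as shorthand (a symbol introduced here, not in the slides), the second derivatives form a negative semidefinite matrix, so the log-likelihood is concave:

```latex
\[
\nabla^2 \ell(\alpha, \beta)
= - \sum_{g:\,\text{games}} p_g (1 - p_g)
\begin{pmatrix} 1 & x_g \\ x_g & x_g^2 \end{pmatrix},
\qquad
v^T\, \nabla^2 \ell\; v = - \sum_{g:\,\text{games}} p_g (1 - p_g)\, (v_1 + v_2 x_g)^2 \le 0 .
\]
```
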
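Want α, β s.t. ∇ℓ(α, β) = 0. For α′, β′ let Δα = α − α′, Δβ = β − β′. By
Taylor we have:
\begin{align*}
0 = \nabla \ell(\alpha, \beta)
  = \nabla \ell(\alpha' + \Delta\alpha,\; \beta' + \Delta\beta)
  &\approx \nabla \ell(\alpha', \beta')
   + \nabla^2 \ell(\alpha', \beta') \begin{pmatrix} \Delta\alpha \\ \Delta\beta \end{pmatrix} \\
0 &= \nabla \ell(\alpha', \beta')
   + \nabla^2 \ell(\alpha', \beta') \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
   - \nabla^2 \ell(\alpha', \beta') \begin{pmatrix} \alpha' \\ \beta' \end{pmatrix}
\end{align*}
Newton to the rescue: Successive updates of the following form should
converge to the optimal values.
\[
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
= \begin{pmatrix} \alpha' \\ \beta' \end{pmatrix}
- \left[ \nabla^2 \ell(\alpha', \beta') \right]^{-1} \nabla \ell(\alpha', \beta')
\]
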
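The Newton update can be sketched in a few lines for this two-parameter problem. The code below is an illustration under stated assumptions, not the author's implementation: it uses the slides' sign convention, solves the 2x2 system by Cramer's rule, starts from (0, 0), and runs on hypothetical toy data.

```python
import math

def grad_and_hess(alpha, beta, x, w):
    """Gradient (eqs. 3, 6) and Hessian entries (eqs. 8, 10, 12) of the log-likelihood."""
    ga = gb = haa = hab = hbb = 0.0
    for xg, wg in zip(x, w):
        p = 1.0 / (1.0 + math.exp(alpha + beta * xg))  # slide convention
        ga += p - wg
        gb += (p - wg) * xg
        v = p * (1.0 - p)
        haa -= v
        hab -= v * xg
        hbb -= v * xg * xg
    return (ga, gb), (haa, hab, hbb)

def newton_fit(x, w, alpha=0.0, beta=0.0, tol=1e-10, max_iter=50):
    """Repeat (alpha, beta) <- (alpha, beta) - H^{-1} g until the step is tiny."""
    for _ in range(max_iter):
        (ga, gb), (haa, hab, hbb) = grad_and_hess(alpha, beta, x, w)
        det = haa * hbb - hab * hab
        # H^{-1} g for a symmetric 2x2 matrix, written out by Cramer's rule.
        step_a = (hbb * ga - hab * gb) / det
        step_b = (haa * gb - hab * ga) / det
        alpha, beta = alpha - step_a, beta - step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return alpha, beta

# Hypothetical toy data: home margins x_g and 0/1 outcomes w_g (not separable).
x = [1, 2, 3, 5, 6, 8, 10, 12, 15, 20]
w = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]

alpha_hat, beta_hat = newton_fit(x, w)
(ga, gb), _ = grad_and_hess(alpha_hat, beta_hat, x, w)
print(alpha_hat, beta_hat, ga, gb)  # gradient should be close to zero at the fit
```

Plain Newton steps have no safeguard against overshooting; on perfectly separable data the estimates diverge, which is one motivation for adding a penalty to the objective.
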
Use of Logistic Regression in LRMC

Victory/Defeat margin: We have now found $r_x^H$, the probability that if
team i beats team j by x at i's home court, team i will beat team j at j's
home court. Assuming homecourt advantage is additive, the superiority
probability $s_x^H$, the probability that team i would beat team j on a neutral
court given that team i beat team j by x on team i's home court, equals $r_{x+h}^H$.
This gives
\[
h = -\frac{\alpha_r}{2\beta_r}
\qquad \text{and} \qquad
s_x^H = \sigma\left( \frac{\alpha_r}{2} + \beta_r x \right).
\]

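One way to recover these two formulas, sketched under the additive assumption: if playing at home is worth h points, a home margin of x corresponds to a margin of x − 2h on the road, so a home win by exactly 2h marks an even road matchup and $r_{2h}^H = 1/2$. Since σ(z) = 1/2 exactly when z = 0:

```latex
\[
r^H_{2h} = \sigma(\alpha_r + 2\beta_r h) = \tfrac{1}{2}
\;\Longrightarrow\; \alpha_r + 2\beta_r h = 0
\;\Longrightarrow\; h = -\frac{\alpha_r}{2\beta_r},
\]
\[
s^H_x = r^H_{x+h}
      = \sigma\bigl(\alpha_r + \beta_r (x + h)\bigr)
      = \sigma\Bigl(\alpha_r + \beta_r x - \tfrac{\alpha_r}{2}\Bigr)
      = \sigma\Bigl(\tfrac{\alpha_r}{2} + \beta_r x\Bigr).
\]
```
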
Alternative assumptions: Because each game has finite length (equal
except for overtime), a reasonable estimator for a team's skill is the
proportion of time they control the ball. Going further, the proportion of
time a team controls the ball can be estimated by their score divided by the
sum of both teams' scores. This suggests two alternatives to the additive
model: multiplicative homecourt advantage (based on the score ratio) and
log-multiplicative homecourt advantage (based on the log of the score ratio).
Reduce overfitting: By penalizing large parameter values (parameters of
zero would imply that future games are independent of past games) we can
reduce overfitting by choosing nonnegative λ_α, λ_β and minimizing
\[
-\ell + \lambda_\alpha \alpha^2 + \lambda_\beta \beta^2 .
\]
In my regularized examples I placed larger penalties on the α's, operating
under the hypothesis that there is no homecourt advantage.

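The penalized objective can be minimized with the same Newton iteration, since the penalty only adds 2λ times the parameter to the gradient and 2λ to the Hessian diagonal. The sketch below is illustrative only: the penalty weights, the fixed iteration count and the toy data are assumptions, not values from the slides.

```python
import math

def penalized_newton(x, w, lam_a, lam_b, iters=50):
    """Minimize -loglik + lam_a*alpha^2 + lam_b*beta^2 by Newton's method."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xg, wg in zip(x, w):
            p = 1.0 / (1.0 + math.exp(a + b * xg))  # slide convention
            v = p * (1.0 - p)
            # Gradient and Hessian of the negative log-likelihood.
            ga += wg - p
            gb += (wg - p) * xg
            haa += v
            hab += v * xg
            hbb += v * xg * xg
        # Contributions of the quadratic penalty.
        ga += 2.0 * lam_a * a
        gb += 2.0 * lam_b * b
        haa += 2.0 * lam_a
        hbb += 2.0 * lam_b
        det = haa * hbb - hab * hab
        a -= (hbb * ga - hab * gb) / det
        b -= (haa * gb - hab * ga) / det
    return a, b

# Hypothetical toy data: home margins x_g and 0/1 outcomes w_g.
x = [1, 2, 3, 5, 6, 8, 10, 12, 15, 20]
w = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]

a_plain, b_plain = penalized_newton(x, w, 0.0, 0.0)
a_pen, b_pen = penalized_newton(x, w, 100.0, 0.0)  # heavy penalty on alpha only
print(a_plain, b_plain)
print(a_pen, b_pen)  # alpha is pulled toward zero, i.e. toward no homecourt advantage
```
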
Logistic Regression "Goodness of Fit"

Assumptions for test: Because the number of observations is much
larger than the number of "buckets" (for classical LRMC, the mean and
median observations per score differential were approximately 32.9 and 17,
respectively), the CLT allows us to normalize the residuals by assuming
that each observation is Bernoulli:
\[
r_i = \frac{y_i - \hat{y}_i}{\sqrt{\hat{y}_i (1 - \hat{y}_i)}},
\qquad \text{and thus} \qquad
\sum_i r_i^2 \sim \chi^2_{n-2} .
\]

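A literal sketch of the statistic as written on the slide follows. The slide does not spell out whether y_i is an individual 0/1 outcome or a per-bucket proportion, so the inputs are simply taken as given; the function name and the numbers are hypothetical. Turning the statistic into a p-value needs a χ² distribution function, which the Python standard library lacks (for example, `scipy.stats.chi2.sf(stat, dof)` would do it if SciPy is available).

```python
import math

def pearson_statistic(y, y_hat):
    """Sum of squared residuals r_i = (y_i - yhat_i) / sqrt(yhat_i * (1 - yhat_i))."""
    total = 0.0
    for yi, pi in zip(y, y_hat):
        r = (yi - pi) / math.sqrt(pi * (1.0 - pi))
        total += r * r
    return total

# Hypothetical observed values and fitted probabilities.
y = [1, 0, 1, 1]
y_hat = [0.5, 0.5, 0.8, 0.2]

stat = pearson_statistic(y, y_hat)
dof = len(y) - 2  # n minus the two fitted parameters (alpha, beta)
print(stat, dof)
```
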
Chi Squared p-values for logistic regressions

Model                   2011       2012       2013
additive                0.511777   0.552131   0.569139
additive (reg)          0.500654   0.534811   0.550568
multiplicative          0.495586   0.537728   0.522612
multiplicative (reg)    0.027208   0.001498   0.001819
log mult                0.499545   0.558072   0.593485
log mult (reg)          0.424898   0.440884   0.483908

Table: χ² p-values

2010-2011 Logistic Regressions
Numbers in legends are estimated homecourt advantages.

2011-2012 Logistic Regressions
Numbers in legends are estimated homecourt advantages.

2012-2013 Logistic Regressions
Numbers in legends are estimated homecourt advantages.

Parameter estimates for 2012-2013

Additive parameters:
(α_r, β_r) = (0.68503617299539032, -0.056212447269008876).

Variance:
         α                  β
α     1.94257829e-03    -6.11459051e-05
β    -6.11459051e-05     1.20313009e-05

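Plugging these estimates into the additive formulas h = −α_r/(2β_r) and s_x^H = σ(α_r/2 + β_r x) gives a concrete homecourt advantage. This is a small worked sketch; the function name `s_neutral` is hypothetical, and σ follows the slides' convention 1/(1 + e^z).

```python
import math

# Additive-model estimates for 2012-2013, as reported above.
alpha_r = 0.68503617299539032
beta_r = -0.056212447269008876

# Homecourt advantage implied by the additive assumption.
h = -alpha_r / (2 * beta_r)

def s_neutral(x):
    """Neutral-court win probability given a home win by x points."""
    return 1.0 / (1.0 + math.exp(alpha_r / 2 + beta_r * x))

print(round(h, 2))    # roughly 6.09 points
print(s_neutral(h))   # a home win by exactly h points maps to an even matchup
```
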
Overview of Markov Chains

Stochastic process with finite states: A finite-state Markov chain is a
stochastic process in which the probability of being in state X at time t
depends only on the state at time t−1.
Steady state: Given some basic conditions, there exists a probability
distribution across the states such that, if a Markov chain is run for a long
time, we can expect the state at any given time to be "Multinoulli" with
the steady-state distribution.

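The steady state can be illustrated on a tiny chain by repeatedly pushing a distribution through the transition matrix until it stops changing (power iteration). This is a minimal sketch; the three-state matrix is made up for illustration, and its self-loops and connectivity are what make the iteration settle.

```python
def stationary(P, iters=10_000, tol=1e-13):
    """Iterate pi <- pi P from the uniform distribution until it stabilizes."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical row-stochastic transition matrix: P[i][j] = P(next = j | current = i).
P = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]

pi = stationary(P)
print(pi)  # long-run fraction of time spent in each state
```
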
Use of Markov Chains in LRMC

LRMC states: In LRMC we create a state for each team, indicating that
we think that team is the best team.
Transition probabilities: Given some probability distribution based on
each team's regular season record, we either jump to another team or stay
put at each "step".
Expected time per state: Eventually a steady-state distribution emerges,
representing the fraction of time we expect to be in each state. In this
case, because the transition matrix is sparse and small enough for my
laptop to handle, we just find its eigenvector corresponding to
eigenvalue 1 and normalize it in the L1 norm.

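In symbols, with T the transition matrix (rows summing to one) and π the steady-state vector, both names assumed here for illustration, the condition being solved is:

```latex
\[
\pi^T T = \pi^T, \qquad \pi_i \ge 0, \qquad \sum_i \pi_i = 1 .
\]
```

That is, π is a left eigenvector of T for eigenvalue 1, rescaled so that its entries sum to one.
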
Transition Probabilities

Naive approach: To motivate the more complex LRMC approach we
start simple. Take
p = P(team i is better than team j | team i beat team j), w_ij = the
number of times i beat j, l_ij = the number of times j beat i, and N_i = total
number of games played by i (required to normalize transition
probabilities). Then we define the transition probability
\[
t_{ij} = \frac{1}{N_i} \left( w_{ij} (1 - p) + l_{ij}\, p \right).
\]
Better approach: Obviously we can do better by considering the victory
margin and game location:
\[
t_{ij} = \frac{1}{N_i} \left[ \sum_{g:\, i \text{ at } j} r^H_{x(g)}
       + \sum_{g:\, j \text{ at } i} \left( 1 - r^H_{x(g)} \right) \right],
\qquad
t_{ii} = 1 - \sum_{j \neq i} t_{ij} .
\]

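The naive transition matrix can be assembled directly from a list of results. The sketch below is illustrative: the team names, the game list and the value p = 0.8 are hypothetical, and only the naive formula is implemented (the better approach would replace the constants 1 − p and p with the fitted probabilities for each game).

```python
def naive_transition_matrix(teams, games, p=0.8):
    """Build t_ij = (w_ij*(1-p) + l_ij*p) / N_i from (winner, loser) pairs."""
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    T = [[0.0] * n for _ in range(n)]
    played = [0] * n
    for winner, loser in games:
        wi, li = idx[winner], idx[loser]
        played[wi] += 1
        played[li] += 1
        T[wi][li] += 1.0 - p  # from the winner's state, move to the loser w.p. 1-p
        T[li][wi] += p        # from the loser's state, move to the winner w.p. p
    for i in range(n):
        if played[i]:
            for j in range(n):
                if j != i:
                    T[i][j] /= played[i]
        # Remaining probability mass stays put: t_ii = 1 - sum_{j != i} t_ij.
        T[i][i] = 1.0 - sum(T[i][j] for j in range(n) if j != i)
    return T

# Hypothetical season: each pair is (winner, loser).
teams = ["A", "B", "C"]
games = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]

T = naive_transition_matrix(teams, games, p=0.8)
for row in T:
    print(row)  # each row sums to one
```
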
2013 Top 10 projected teams

     Top teams     Top teamsL               TopProb     TopProbL
0    Miami (FL)    Nevada-Las Vegas         0.006619    0.003262
1    Michigan      Notre Dame               0.006619    0.003262
2    Wisconsin     Virginia Commonwealth    0.006670    0.003262
3    Ohio State    James Madison            0.006788    0.003262
4    Syracuse      Louisville               0.006991    0.003262
5    Kansas        North Carolina A&T       0.007234    0.003262
6    Gonzaga       North Carolina State     0.007625    0.003262
7    Indiana       New Mexico               0.008241    0.003361
8    Louisville    Syracuse                 0.008352    0.003361
9    Florida       Memphis                  0.008582    0.003361

Solitary and comparative accuracy

Proportion of tournament matchups predicted correctly:

                  2012-2013         2011-2012         2010-2011
Additive          0.630769230769    0.716417910448    0.615384615385
Multiplicative    0.569230769231    0.641791044776    0.615384615385
Log Mult          0.630769230769    0.686567164179    0.630769230769

2012-2013 Linear Regression for Playoff probability difference vs victory margin

References

Paul Kvam and Joel S. Sokol (2006). A Logistic Regression/Markov Chain Model for NCAA Basketball. Naval Research Logistics.

RogueWave Logistic Regression Documentation. http://www.roguewave.com/portals/0/products/legacy-hpp/docs/anaug/3-3.html

The End


More Related Content

Viewers also liked

Generalized Logistic Regression - by example (Anthony Kilili)
Generalized Logistic Regression - by example (Anthony Kilili)Generalized Logistic Regression - by example (Anthony Kilili)
Generalized Logistic Regression - by example (Anthony Kilili)Anthony Kilili
 
Ethics Midterm Presentation NCAA
Ethics Midterm Presentation NCAAEthics Midterm Presentation NCAA
Ethics Midterm Presentation NCAAJawanza Robinson
 
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...Aakash Bagchi
 
Intro to Logistic Regression
Intro to Logistic RegressionIntro to Logistic Regression
Intro to Logistic RegressionJay Victoria
 
Logistic Regression: Behind the Scenes
Logistic Regression: Behind the ScenesLogistic Regression: Behind the Scenes
Logistic Regression: Behind the ScenesChris White
 
From logistic regression to linear chain CRF
From logistic regression to linear chain CRFFrom logistic regression to linear chain CRF
From logistic regression to linear chain CRFDarren Yow-Bang Wang
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regressionA M
 
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating HyperplaneESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating HyperplaneShinichi Tamura
 
Logistic regression for ordered dependant variable with more than 2 levels
Logistic regression for ordered dependant variable with more than 2 levelsLogistic regression for ordered dependant variable with more than 2 levels
Logistic regression for ordered dependant variable with more than 2 levelsArup Guha
 
Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)MikeBlyth
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkDB Tsai
 
How to launch an aws ec2 instance
How to launch an aws ec2 instanceHow to launch an aws ec2 instance
How to launch an aws ec2 instanceAndrea Cirillo
 
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseLogistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseMichael Lieberman
 

Viewers also liked (19)

Generalized Logistic Regression - by example (Anthony Kilili)
Generalized Logistic Regression - by example (Anthony Kilili)Generalized Logistic Regression - by example (Anthony Kilili)
Generalized Logistic Regression - by example (Anthony Kilili)
 
Ethics Midterm Presentation NCAA
Ethics Midterm Presentation NCAAEthics Midterm Presentation NCAA
Ethics Midterm Presentation NCAA
 
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
Mode Choice analysis for work trips using Multinomial Logit model for Windsor...
 
Intro to Logistic Regression
Intro to Logistic RegressionIntro to Logistic Regression
Intro to Logistic Regression
 
Logistic Regression: Behind the Scenes
Logistic Regression: Behind the ScenesLogistic Regression: Behind the Scenes
Logistic Regression: Behind the Scenes
 
From logistic regression to linear chain CRF
From logistic regression to linear chain CRFFrom logistic regression to linear chain CRF
From logistic regression to linear chain CRF
 
Choice Models
Choice ModelsChoice Models
Choice Models
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regression
 
Binary Logistic Regression Example
Binary Logistic Regression ExampleBinary Logistic Regression Example
Binary Logistic Regression Example
 
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating HyperplaneESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
ESL 4.4.3-4.5: Logistic Reression (contd.) and Separating Hyperplane
 
Logistic regression for ordered dependant variable with more than 2 levels
Logistic regression for ordered dependant variable with more than 2 levelsLogistic regression for ordered dependant variable with more than 2 levels
Logistic regression for ordered dependant variable with more than 2 levels
 
Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Probit and logit model
Probit and logit modelProbit and logit model
Probit and logit model
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Multilevel Binary Logistic Regression
Multilevel Binary Logistic RegressionMultilevel Binary Logistic Regression
Multilevel Binary Logistic Regression
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
How to launch an aws ec2 instance
How to launch an aws ec2 instanceHow to launch an aws ec2 instance
How to launch an aws ec2 instance
 
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseLogistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
 

Similar to Logistic Regression/Markov Chain presentation

SAT based planning for multiagent systems
SAT based planning for multiagent systemsSAT based planning for multiagent systems
SAT based planning for multiagent systemsRavi Kuril
 
Multi nomial pdf
Multi nomial pdfMulti nomial pdf
Multi nomial pdfsabbir11
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEKai-Wen Zhao
 
Estimation and Prediction of Complex Systems: Progress in Weather and Climate
Estimation and Prediction of Complex Systems: Progress in Weather and ClimateEstimation and Prediction of Complex Systems: Progress in Weather and Climate
Estimation and Prediction of Complex Systems: Progress in Weather and Climatemodons
 
Relationship between some machine learning concepts
Relationship between some machine learning conceptsRelationship between some machine learning concepts
Relationship between some machine learning conceptsZoya Bylinskii
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1arogozhnikov
 
Efficient Hill Climber for Constrained Pseudo-Boolean Optimization Problems
Efficient Hill Climber for Constrained Pseudo-Boolean Optimization ProblemsEfficient Hill Climber for Constrained Pseudo-Boolean Optimization Problems
Efficient Hill Climber for Constrained Pseudo-Boolean Optimization Problemsjfrchicanog
 
DissertationSlides169
DissertationSlides169DissertationSlides169
DissertationSlides169Ryan White
 
A new generalized lindley distribution
A new generalized lindley distributionA new generalized lindley distribution
A new generalized lindley distributionAlexander Decker
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
MM framework for RL
MM framework for RLMM framework for RL
MM framework for RLSung Yub Kim
 
Matrix Completion Presentation
Matrix Completion PresentationMatrix Completion Presentation
Matrix Completion PresentationMichael Hankin
 
Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesKim Hammar
 
Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...
Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...
Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...Daniel Katz
 
Mathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdfMathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdfJulio Banks
 
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...jfrchicanog
 

Similar to Logistic Regression/Markov Chain presentation (20)

SAT based planning for multiagent systems
SAT based planning for multiagent systemsSAT based planning for multiagent systems
SAT based planning for multiagent systems
 
Multi nomial pdf
Multi nomial pdfMulti nomial pdf
Multi nomial pdf
 
k*NN2016
k*NN2016k*NN2016
k*NN2016
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
 
Estimation and Prediction of Complex Systems: Progress in Weather and Climate
Estimation and Prediction of Complex Systems: Progress in Weather and ClimateEstimation and Prediction of Complex Systems: Progress in Weather and Climate
Estimation and Prediction of Complex Systems: Progress in Weather and Climate
 
Relationship between some machine learning concepts
Relationship between some machine learning conceptsRelationship between some machine learning concepts
Relationship between some machine learning concepts
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1
 
Efficient Hill Climber for Constrained Pseudo-Boolean Optimization Problems
Efficient Hill Climber for Constrained Pseudo-Boolean Optimization ProblemsEfficient Hill Climber for Constrained Pseudo-Boolean Optimization Problems
Efficient Hill Climber for Constrained Pseudo-Boolean Optimization Problems
 
DissertationSlides169
DissertationSlides169DissertationSlides169
DissertationSlides169
 
A new generalized lindley distribution
A new generalized lindley distributionA new generalized lindley distribution
A new generalized lindley distribution
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
MM framework for RL
MM framework for RLMM framework for RL
MM framework for RL
 
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Matrix Completion Presentation
Matrix Completion PresentationMatrix Completion Presentation
Matrix Completion Presentation
 
Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...
Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...
Quantitative Methods for Lawyers - Class #15 - Chi Square Distribution and Ch...
 
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
 
Mathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdfMathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdf
 
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
 

Recently uploaded

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 

Recently uploaded (20)

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 

Logistic Regression/Markov Chain presentation

  • 1. Logistic Regression and Markov Chain approach to NCAA Basketball seeding Michael Hankin University of Southern California mhankin@usc.edu April 22, 2013 Michael Hankin (USC) LRMC April 22, 2013 1 / 22
  • 2. Overview 1 Background Logistic Regression Markov Chain Michael Hankin (USC) LRMC April 22, 2013 2 / 22
  • 3. Overview of Logistic Regression Basic idea of Logistic Regression: Given explanatory variables X and binary response variable Y we wish to determine P(Y = 1 | X ). Logistic regression allows us to estimate this by modeling 1 Y ∼ Bernoulli σ(w T X ) where σ(w T X ) = 1+e w T X If we model P(i beats j on j’s homecourt | i beat j by x on i’s homecourt) as σ(α + βx) we obtain the following likelihood: L(α, β) = g :games Michael Hankin (USC) 1 1 + e α+βxg LRMC wg 1− 1 1 + e α+βxg 1−wg April 22, 2013 3 / 22
  • 4. We then find parameters that maximize the likelihood. = log L(α, β) = wg log g :games +(1 − wg ) log 1 − 1 1 + e α+βxg 1 1 + e α+βxg −wg log 1 + e α+βxg +(1−wg ) α + βxg − log 1 + e α+βxg = g :games (1 − wg ) (α + βxg ) − log 1 + e α+βxg = g :games Michael Hankin (USC) LRMC April 22, 2013 4 / 22
  • 5. ∂ e α+βxg = (1 − wg ) − ∂α g :games 1 + e α+βxg (1 − wg ) − 1 − = g :games = g :games (1) 1 1 + e α+βxg (2) 1 − wg 1 + e α+βxg (3) e α+βxg ∂ = xg (1 − wg )xg − ∂β g :games 1 + e α+βxg (1 − wg )xg − 1 − = g :games = g :games Michael Hankin (USC) 1 − wg 1 + e α+βxg LRMC 1 1 + e α+βxg (4) xg xg (5) (6) April 22, 2013 5 / 22
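  • 6. Second derivatives of the log-likelihood:
      $\frac{\partial^2 \ell}{\partial \alpha^2} = \sum_{g \in \text{games}} -\frac{1}{1 + e^{\alpha + \beta x_g}} \cdot \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}$  (7)
      $= \sum_{g \in \text{games}} -\frac{1}{1 + e^{\alpha + \beta x_g}} \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right)$  (8)
      $\frac{\partial^2 \ell}{\partial \alpha \partial \beta} = \frac{\partial^2 \ell}{\partial \beta \partial \alpha} = \sum_{g \in \text{games}} -\frac{1}{1 + e^{\alpha + \beta x_g}} \cdot \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}} \, x_g$  (9)
      $= \sum_{g \in \text{games}} -\frac{1}{1 + e^{\alpha + \beta x_g}} \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) x_g$  (10)
      $\frac{\partial^2 \ell}{\partial \beta^2} = \sum_{g \in \text{games}} -\frac{1}{1 + e^{\alpha + \beta x_g}} \cdot \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}} \, x_g^2$  (11)
      $= \sum_{g \in \text{games}} -\frac{1}{1 + e^{\alpha + \beta x_g}} \left( 1 - \frac{1}{1 + e^{\alpha + \beta x_g}} \right) x_g^2$  (12)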
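  • 7. Want $\alpha, \beta$ s.t. $\nabla \ell(\alpha, \beta) = 0$. For $\alpha', \beta'$ let $\Delta\alpha = \alpha - \alpha'$, $\Delta\beta = \beta - \beta'$. By Taylor we have:
      $0 = \nabla \ell(\alpha, \beta) = \nabla \ell(\alpha' + \Delta\alpha, \beta' + \Delta\beta) \approx \nabla \ell(\alpha', \beta') + \nabla^2 \ell(\alpha', \beta') \begin{pmatrix} \Delta\alpha \\ \Delta\beta \end{pmatrix}$
      $0 = \nabla \ell(\alpha', \beta') + \nabla^2 \ell(\alpha', \beta') \left[ \begin{pmatrix} \alpha \\ \beta \end{pmatrix} - \begin{pmatrix} \alpha' \\ \beta' \end{pmatrix} \right]$
    Newton to the rescue: successive updates of the following form should converge to the optimal values.
      $\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha' \\ \beta' \end{pmatrix} - \left[ \nabla^2 \ell(\alpha', \beta') \right]^{-1} \nabla \ell(\alpha', \beta')$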
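The Newton update on slide 7 can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's code: the function name `fit_newton`, the starting point (0, 0), the iteration cap, and the stopping tolerance are all assumptions. It keeps the slides' convention σ(z) = 1/(1 + e^z) and uses the gradient from equations (3) and (6) and the second derivatives from equations (8), (10), and (12).

```python
import math

def fit_newton(xs, ws, iters=50, tol=1e-10):
    """Maximize l(a, b) = sum_g (1 - w_g)(a + b x_g) - log(1 + exp(a + b x_g)).

    xs: victory margins x_g; ws: 0/1 outcomes w_g.
    Hypothetical helper; start point and tolerance are assumptions.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, w in zip(xs, ws):
            s = 1.0 / (1.0 + math.exp(a + b * x))  # slide convention for sigma
            ga += s - w                  # equation (3)
            gb += (s - w) * x            # equation (6)
            c = -s * (1.0 - s)
            haa += c                     # equation (8)
            hab += c * x                 # equation (10)
            hbb += c * x * x             # equation (12)
        det = haa * hbb - hab * hab
        # Newton step: (da, db) = H^{-1} g for the 2x2 Hessian H
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a - da, b - db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b
```

The 2x2 inverse is written out by hand so the sketch needs no linear-algebra library. It assumes the data are not perfectly separable; otherwise the Hessian becomes singular and the iteration does not settle.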
  • 8. Use of Logistic Regression in LRMC. Victory/Defeat margin: We have now found $r_x^H$, the probability that if team i beats team j by x at i's home court, team i will beat team j at j's home court. Assuming homecourt advantage is additive, the superiority probability $s_x^H$, the probability that team i would beat team j on a neutral court given that team i beat team j by x on team i's home court, equals $r_{x+h}^H$. This gives $h = -\frac{\alpha_r}{2\beta_r}$ and $s_x^H = \sigma\left( \frac{\alpha_r}{2} + \beta_r x \right)$.
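As a worked check of the two formulas on slide 8, the sketch below computes h and the neutral-court probability from a fitted pair (α_r, β_r). The function names are hypothetical, and σ follows the slides' convention σ(z) = 1/(1 + e^z). The sample values are the additive 2012-2013 estimates reported on slide 15.

```python
import math

def sigma(z):
    # Slide convention: sigma(z) = 1 / (1 + e^z)
    return 1.0 / (1.0 + math.exp(z))

def home_advantage(alpha_r, beta_r):
    # h = -alpha_r / (2 beta_r): the margin at which r^H_{2h} = 1/2
    return -alpha_r / (2.0 * beta_r)

def neutral_win_prob(x, alpha_r, beta_r):
    # s^H_x = r^H_{x+h} = sigma(alpha_r / 2 + beta_r x)
    return sigma(alpha_r / 2.0 + beta_r * x)

# Additive 2012-2013 estimates from slide 15
ALPHA_R = 0.68503617299539032
BETA_R = -0.056212447269008876
h = home_advantage(ALPHA_R, BETA_R)
```

Substituting x + h into σ(α_r + β_r x) cancels half of α_r, which is where the α_r/2 term comes from; a home win by exactly h points therefore maps to a neutral-court probability of 1/2.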
  • 9. Alternative assumptions: Because each game has finite length (equal except for overtime), a reasonable estimator for a team's skill is the proportion of time they control the ball. Going further, the proportion of time a team controls the ball can be estimated by their score divided by the sum of both teams' scores. This suggests two alternatives to the additive model: multiplicative homecourt advantage (look at score ratio) and log multiplicative (log of score ratio). Reduce overfitting: By penalizing large parameter values (that is, shrinking toward the assumption that future games are independent of past games) we can reduce overfitting by choosing nonnegative $\lambda_\alpha, \lambda_\beta$ and minimizing $-\ell + \lambda_\alpha \alpha^2 + \lambda_\beta \beta^2$. In my regularized examples I placed larger penalties on the α's, operating under the hypothesis that there is no homecourt advantage.
  • 10. Logistic Regression "Goodness of Fit". Assumptions for test: Because the number of observations is much larger than the number of "buckets" (for classical LRMC, mean and median observations per score differential were approximately 32.9 and 17, respectively), the CLT allows us to normalize the residuals by assuming that each observation is Bernoulli: $r_i = \frac{y_i - \hat{y}_i}{\sqrt{\hat{y}_i (1 - \hat{y}_i)}}$ and thus $\sum_i r_i^2 \sim \chi^2_{n-2}$.
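Taking the residual formula on slide 10 literally, the test statistic can be sketched as below. The function name `pearson_chi2` is hypothetical, and the sketch does not model how observations are grouped into buckets, since the slide does not spell that out. It returns the statistic and the n − 2 degrees of freedom; turning these into a p-value needs a chi-squared survival function, which is left out so the block stays within the standard library.

```python
def pearson_chi2(y_obs, y_hat):
    """Sum of squared normalized residuals and its degrees of freedom.

    y_obs: observed values y_i; y_hat: fitted probabilities, strictly in (0, 1).
    r_i = (y_i - yhat_i) / sqrt(yhat_i (1 - yhat_i)), so r_i^2 has no square root.
    """
    stat = sum((y - p) ** 2 / (p * (1.0 - p)) for y, p in zip(y_obs, y_hat))
    dof = len(y_obs) - 2  # two fitted parameters: alpha and beta
    return stat, dof
```

A small statistic relative to the degrees of freedom means the fitted probabilities track the observations well.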
  • 11. Chi Squared p-values for logistic regressions
      Table: χ² p-values
                              2011        2012        2013
      additive                0.511777    0.552131    0.569139
      additive (reg)          0.500654    0.534811    0.550568
      multiplicative          0.495586    0.537728    0.522612
      multiplicative (reg)    0.027208    0.001498    0.001819
      log mult                0.499545    0.558072    0.593485
      log mult (reg)          0.424898    0.440884    0.483908
  • 12. 2010-2011 Logistic Regressions. Numbers in legends are estimated homecourt advantages.
  • 13. 2011-2012 Logistic Regressions. Numbers in legends are estimated homecourt advantages.
  • 14. 2012-2013 Logistic Regressions. Numbers in legends are estimated homecourt advantages.
  • 15. Parameter estimates for 2012-2013. Additive parameters: $\alpha_r, \beta_r$ = 0.68503617299539032, -0.056212447269008876.
      Variance:
                α                  β
      α     1.94257829e-03    -6.11459051e-05
      β    -6.11459051e-05     1.20313009e-05
  • 16. Overview of Markov Chains. Stochastic process with finite states: A finite-state Markov chain is a stochastic process in which the probability of being in state X at time t depends only on the state at time t-1. Steady state: Given some basic conditions, there exists a probability distribution across the states such that, if a Markov chain is run for a long time, we can expect the state at any given time to be "Multinoulli" with the steady state distribution.
  • 17. Use of Markov Chains in LRMC. LRMC states: In LRMC we create a state for each team, indicating that we think that team is the best team. Transition probabilities: Given some probability distribution based on each team's regular season record, at each "step" we either jump to another team or stay put. Expected time per state: Eventually a steady state distribution emerges, representing the proportion of time we expect to spend in each state. In this case, because the transition matrix is sparse and small enough for my laptop to handle, we just find its eigenvector corresponding to eigenvalue 1 and normalize in $L^1$.
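The steady state described on slide 17 can be approximated without an eigen-solver. The sketch below swaps in power iteration, which is a different technique from the direct eigenvector computation the slide mentions but converges to the same L1-normalized vector when the chain is irreducible and aperiodic. The function name `steady_state`, the uniform starting vector, and the tolerance are assumptions.

```python
def steady_state(T, iters=10_000, tol=1e-12):
    """Approximate pi with pi = pi T for a row-stochastic matrix T.

    T is a list of rows, T[i][j] = P(next state j | current state i).
    Power iteration stands in for an eigen-solver here.
    """
    n = len(T)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]  # keep the vector normalized in L1
        if max(abs(u - v) for u, v in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi
```

In the LRMC setting, each entry of the returned vector is the long-run share of time spent in the state "this team is the best team", so sorting teams by that entry gives a ranking.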
  • 18. Transition Probabilities. Naive approach: To motivate the more complex LRMC approach we start simple. Take $p = P(\text{team } i \text{ is better than team } j \mid \text{team } i \text{ beat team } j)$, $w_{ij}$ = the number of times i beat j, $l_{ij}$ = the number of times j beat i, and $N_i$ = total number of games played by i (required to normalize transition probabilities). Then we define the transition probability $t_{ij} = \frac{1}{N_i} \left( w_{ij} (1 - p) + l_{ij} \, p \right)$. Better approach: Obviously we can do better by considering the victory margin and game location: $t_{ij} = \frac{1}{N_i} \left[ \sum_{g:\, i \text{ at } j} r^H_{x(g)} + \sum_{g:\, j \text{ at } i} \left( 1 - r^H_{x(g)} \right) \right]$, $t_{ii} = 1 - \sum_{j \ne i} t_{ij}$.
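The naive transition probabilities on slide 18 can be built directly from a table of head-to-head results. In this sketch the function name `naive_transition_matrix` and the input layout (`wins[i][j]` = number of times team i beat team j) are assumptions, and the diagonal reuses the slide's $t_{ii} = 1 - \sum_{j \ne i} t_{ij}$, which the slide states for the better approach.

```python
def naive_transition_matrix(wins, p):
    """Row-stochastic matrix with t_ij = (w_ij (1 - p) + l_ij p) / N_i.

    wins[i][j]: number of times team i beat team j (hypothetical input layout).
    p: P(team i is better than team j | team i beat team j).
    """
    n = len(wins)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # N_i: total games played by team i
        games_i = sum(wins[i][j] + wins[j][i] for j in range(n) if j != i)
        for j in range(n):
            if j != i and games_i > 0:
                w_ij, l_ij = wins[i][j], wins[j][i]
                T[i][j] = (w_ij * (1.0 - p) + l_ij * p) / games_i
        # Remaining probability mass stays on team i
        T[i][i] = 1.0 - sum(T[i][j] for j in range(n) if j != i)
    return T
```

A team with no recorded games keeps all of its probability on itself. Each loss to j moves weight p toward j and each win over j moves only 1 − p, so with p above 1/2 the chain drifts toward teams that win.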
  • 19. 2013 Top 10 projected teams
           Top teams     Top teamsL               TopProb     TopProbL
      0    Miami (FL)    Nevada-Las Vegas         0.006619    0.003262
      1    Michigan      Notre Dame               0.006619    0.003262
      2    Wisconsin     Virginia Commonwealth    0.006670    0.003262
      3    Ohio State    James Madison            0.006788    0.003262
      4    Syracuse      Louisville               0.006991    0.003262
      5    Kansas        North Carolina A&T       0.007234    0.003262
      6    Gonzaga       North Carolina State     0.007625    0.003262
      7    Indiana       New Mexico               0.008241    0.003361
      8    Louisville    Syracuse                 0.008352    0.003361
      9    Florida       Memphis                  0.008582    0.003361
  • 20. Solitary and comparative accuracy. Proportion of Tournament matchups predicted correctly:
                        2012-2013         2011-2012         2010-2011
      Additive          0.630769230769    0.716417910448    0.615384615385
      Multiplicative    0.569230769231    0.641791044776    0.615384615385
      Log Mult          0.630769230769    0.686567164179    0.630769230769
  • 21. 2012-2013 Linear Regression for Playoff probability difference vs victory margin
  • 22. References
      Paul Kvam and Joel S. Sokol (2006). A Logistic Regression/Markov Chain Model for NCAA Basketball. Naval Research Logistics.
      RogueWave Logistic Regression Documentation. http://www.roguewave.com/portals/0/products/legacy-hpp/docs/anaug/3-3.html
  • 23. The End