Amy Langville, Professor of Mathematics, The College of Charleston in South Carolina & Tim Chartier, Chief Researcher, Tresata at MLconf ATL 2016

Learning to Play Sports: Sports analytics is an active and growing field. With large datasets from biometric devices and player-tracking equipment, sports teams can benefit from techniques in data analytics and machine learning. This talk will discuss work in the areas of March Madness and game-to-game analysis. With the emergence of algorithms to study such dynamics as player performance and fan engagement, the collection of data also becomes paramount. Professional sports organizations have access to premium technology. This talk will also discuss how such work can be transferred to the college and secondary levels. Machine learning allows cutting-edge technology to play from the bench.

Published in: Technology

  1. 1. Learning to Play Sports ML in sports analytics Dr. Tim Chartier Tresata Davidson College tichartier@davidson.edu Dr. Amy Langville College of Charleston Dept. of Math LangvilleA@cofc.edu @timchartier
  2. 2. Outline of talk
  3. 3. Play from bench data availability general interest (a.k.a. cool factor) domain knowledge
  4. 4. Application 1: Ranking
  5. 5. Apply here Picture credit: http://orlandonest.files.wordpress.com/2011/03/2011-march- madness-bracket.gif
  6. 6. How do we do? • ESPN Tournament Challenge: > 4 million brackets! • 1st round correct choice = 10 points • nth round correct choice = 2*(previous round)
  7. 7. finding ideal weight 4 prediction
  8. 8. Method 1: crowd source • 2009 – best bracket – 97% • 2010 – best bracket – 99% • 2014 – national media led to thousands of brackets on: marchmathness.davidson.edu
  9. 9. Method 2: learn sports • vary parameter weights to optimize ESPN score or prediction rate • subtlety: not all seasons are equally predictive
  10. 10. Method 3: mad web 10 years, 50,000 games
  11. 11. Application 2: Cats Stats Analytics for college teams to support coaching.
  12. 12. sports analytics keys • coachable • consumable • understandable (informed opinion)
  13. 13. impact: coaching “It kind of blew us away…it really opened our eyes...” – Matt McKillop, NYT
  14. 14. impact: off-season
      Player          Poss.  TO%    OR%    EFG%   2P%    3P%
      Brian Sullivan  77     14.3%  20.0%  65.6%  67.4%  40.0%
      without         56     23.2%  20.8%  55.3%  42.9%  47.1%
  15. 15. Application 3: Lotsa data
  16. 16. missile tech 25 frames/sec
  17. 17. Filtered for Warriors regular season
  18. 18. data we have
  19. 19. SportVU-like data
  20. 20. MasseyRatings.com
      column 1 = date of game, measured in days since 1/1/0000
      column 2 = date in YYYYMMDD format
      column 3 = team 1 index
      column 4 = team 1 home field (1 = home, -1 = away, 0 = neutral)
      column 5 = team 1 score
      column 6 = team 2 index
      column 7 = team 2 home field (1 = home, -1 = away, 0 = neutral)
      column 8 = team 2 score
  21. 21. Tresata Data For network analysis, Tresata added: • seed • coach’s Madness history • kenpom.com statistics • every season game (and added game stats) What can we learn from about 50,000 games?
  22. 22. Data needed • ESPN bracket challenge scores for past years • injuries for every game • score with 2 min or 4 minutes left • learn from Vegas odds • biometric data
  23. 23. media ?’s • If we remove a team and it strongly affects the reranking, what can we learn about such a team for March Madness? • How can Buddy Hield light up March Madness? • Compare Jack Gibbs to Stephen Curry in college play.
  24. 24. New Work How rankable is this dataset?
  25. 25. Rankability Data Apps Amazon products Netflix movies Financial networks Teams
  26. 26. Intuitive Ideas one extreme Dominance graph (very rankable) Random graph (less rankable) other extreme
  27. 27. Inconsistency Uparcs in a rank-ordered graph 5 uparcs
  28. 28. Inconsistency Uparcs in a rank-ordered graph Minimum Violations Ranking 3 uparcs
  29. 29. Inconsistency BUT this measure of rankability is tied to the ranking. March Madness 2008 sorted by Massey rating uparcs = 27.2% March Madness 2014 sorted by Massey rating uparcs = 26.9%
  30. 30. Goal k-cycles Create a rankability measure that is independent of ranking. 2-cycles: 1-2-1 2-1-2 5-cycles: 1-2-3-4-5-1 2-3-4-5-1-2 3-4-5-1-2-3 4-5-1-2-3-4 5-1-2-3-4-5
  31. 31. Goal k-cycles Create a rankability measure that is independent of ranking. 2-cycles: 1-2-1 2-1-2 5-cycles: 1-2-3-4-5-1 2-3-4-5-1-2 3-4-5-1-2-3 4-5-1-2-3-4 5-1-2-3-4-5 4-paths: 1-2-1-2-1 2-1-2-1-2
  32. 32. Goal k-cycles Create a rankability measure that is independent of ranking. 2-cycles: 1-2-1 2-1-2 5-cycles: 1-2-3-4-5-1 2-3-4-5-1-2 3-4-5-1-2-3 4-5-1-2-3-4 5-1-2-3-4-5 4-paths: 1-2-1-2-1 2-1-2-1-2
  33. 33. Future Work If a dataset is not very rankable, which edges should we add to the graph to improve its rankability?
  34. 34. learn to play sports data questions applications
  35. 35. questions? Picture credit: http://www.trendir.com/ultra-modern/
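
Slide 6 states the ESPN Tournament Challenge scoring rule: a correct first-round pick is worth 10 points, and a correct pick in each later round is worth twice the previous round. A minimal Python sketch of that rule follows. The function names and the "correct picks per round" input format are assumptions made for illustration and do not come from the talk; the perfect-bracket example assumes the usual 64-team bracket with 32, 16, 8, 4, 2 and 1 games per round.

```python
def points_per_pick(round_number: int, first_round_points: int = 10) -> int:
    """Points for one correct pick in a given round (rounds are 1-indexed).

    Round 1 is worth `first_round_points`; each later round doubles the
    previous round's value, as described on slide 6.
    """
    if round_number < 1:
        raise ValueError("round_number must be >= 1")
    return first_round_points * 2 ** (round_number - 1)


def bracket_score(correct_picks_by_round: list[int]) -> int:
    """Total score from the number of correct picks in each round, round 1 first."""
    return sum(
        count * points_per_pick(round_index)
        for round_index, count in enumerate(correct_picks_by_round, start=1)
    )


# Assumed 64-team bracket: every game in every round picked correctly.
PERFECT_BRACKET = [32, 16, 8, 4, 2, 1]

if __name__ == "__main__":
    print(bracket_score(PERFECT_BRACKET))
```

Because every round is worth the same total under this rule (32 x 10 = 16 x 20 = ... = 1 x 320), a missed champion costs as much as a fully missed first round, which is one reason the weighting of predictions matters.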
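
Slide 20 lists the eight columns of the MasseyRatings.com game data. The sketch below reads one such record into a small typed structure. It assumes the fields are comma-separated integers in the slide's column order; the `Game` class, the `parse_game` function and the sample row are hypothetical and made up purely for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Game:
    day_number: int    # column 1: days since 1/1/0000
    played_on: date    # column 2: date given as YYYYMMDD
    team1: int         # column 3: team 1 index
    team1_site: int    # column 4: 1 = home, -1 = away, 0 = neutral
    team1_score: int   # column 5
    team2: int         # column 6: team 2 index
    team2_site: int    # column 7: 1 = home, -1 = away, 0 = neutral
    team2_score: int   # column 8


def parse_game(line: str) -> Game:
    """Parse one record, assuming eight comma-separated integer fields."""
    fields = [int(field) for field in line.split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    day_number, yyyymmdd, team1, site1, score1, team2, site2, score2 = fields
    played_on = datetime.strptime(str(yyyymmdd), "%Y%m%d").date()
    return Game(day_number, played_on, team1, site1, score1, team2, site2, score2)


if __name__ == "__main__":
    # Made-up row, not real data.
    sample = "735294, 20130301, 12, 1, 71, 45, -1, 64"
    game = parse_game(sample)
    winner = game.team1 if game.team1_score > game.team2_score else game.team2
    print(game.played_on, "winner index:", winner)
```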
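
Slides 27 to 29 measure inconsistency by counting uparcs in a rank-ordered graph, and note that the count depends on the ranking used. The sketch below illustrates that dependence. It assumes that each game is an arc from winner to loser and that an uparc is a game whose winner sits below its loser in the ranking; the slides' figures are not preserved, so that reading is an assumption. The brute-force search stands in for a real minimum violations ranking solver and is only workable for a handful of teams. The games and team names are made up.

```python
from itertools import permutations


def count_uparcs(games, ranking):
    """Count games whose winner is ranked below the loser.

    games: list of (winner, loser) pairs; ranking: teams ordered best first.
    """
    position = {team: index for index, team in enumerate(ranking)}
    return sum(1 for winner, loser in games if position[winner] > position[loser])


def uparc_fraction(games, ranking):
    """Share of games that are uparcs under the given ranking."""
    return count_uparcs(games, ranking) / len(games) if games else 0.0


def minimum_violations_ranking(games, teams):
    """Brute force: try every ordering and keep one with the fewest uparcs."""
    return min(permutations(teams), key=lambda order: count_uparcs(games, order))


if __name__ == "__main__":
    # Hypothetical results: A beat B, B beat C, C beat A, A beat C.
    games = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
    print(uparc_fraction(games, ["A", "B", "C"]))  # one ranking
    print(uparc_fraction(games, ["C", "A", "B"]))  # same games, another ranking
    print(minimum_violations_ranking(games, ["A", "B", "C"]))
```

The same four games give different uparc fractions under the two rankings, which is the limitation slide 29 points out and the motivation for a measure that does not depend on any ranking.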
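
Slides 30 to 32 list k-cycles by starting vertex (for example 1-2-1 and 2-1-2 for the 2-cycles). Counted that way, the number of closed walks of length k in a directed graph equals the trace of the k-th power of its adjacency matrix. The sketch below checks this on a five-vertex graph reconstructed from the listed cycles; the slide's figure is not preserved, so the edge list is an assumption. This only illustrates the counting and is not the authors' rankability measure.

```python
def adjacency_from_edges(edges, vertex_count):
    """Build a 0/1 adjacency matrix from 1-indexed directed edges (u, v)."""
    matrix = [[0] * vertex_count for _ in range(vertex_count)]
    for u, v in edges:
        matrix[u - 1][v - 1] = 1
    return matrix


def mat_mul(a, b):
    """Multiply two square integer matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def closed_walk_count(adjacency, k):
    """Number of closed walks of length k, counted per starting vertex.

    This is trace(A^k): entry (i, i) of A^k counts walks of length k
    that start and end at vertex i.
    """
    n = len(adjacency)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        power = mat_mul(power, adjacency)
    return sum(power[i][i] for i in range(n))


# Assumed graph: a 2-cycle on vertices 1 and 2 plus the 5-cycle 1-2-3-4-5-1.
EDGES = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 5), (5, 1)]
ADJACENCY = adjacency_from_edges(EDGES, 5)

if __name__ == "__main__":
    for k in (2, 4, 5):
        print(k, closed_walk_count(ADJACENCY, k))
```

On this assumed graph the counts for k = 2, 4 and 5 agree with the number of sequences the slides list under 2-cycles, 4-paths and 5-cycles.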
