
Bayes in competition


  1. Bayes in Competition. Tim Salimans
  2. Who am I? Statistical consultant at Algoritmica. PhD candidate in Econometrics at EUR. Top 10 Kaggler.
  3. What is Kaggle? No, that's a gaggle.
  4. What is Kaggle? Platform for predictive modelling and analytics competitions. Company provides data and defines the modelling problem. Participants build models on part of the data. Predictions are evaluated on another part of the data.
  5. What is Kaggle? Public competitions. Private competitions. Kaggle in Class.
  6. My experience with Kaggle. Public competitions: Deloitte/FIDE Chess Rating Challenge; Don't Overfit!; Observing Dark Worlds. Private competition: Allstate Customer Retention Prediction.
  7. Kaggle in Class
  8. My experience with Kaggle. Currently working on the Heritage Health Prize. Predict which patients go to the hospital. $3,000,000 grand prize. $500,000 consolation prize.
  9. What is Bayes? No, that's not Rev. Thomas Bayes.
  10. What is Bayes? Simple recipe for reasoning under uncertainty. Quantify what you know before getting data: P(X) ("prior"). Build a model for your data: P(Y|X) ("model"). Apply Bayes' rule: P(X|Y) = P(Y|X)P(X)/P(Y) ("posterior").
  11. Monty Hall problem
  12. Monty Hall problem
  13. Monty Hall problem. Should you switch? CONTROVERSY!
  14. Monty Hall problem. X is the number of the door with a car. Prior P(X): all doors are equally likely to have the car. P(door 1 has car) = 1/3; P(door 2 has car) = 1/3; P(door 3 has car) = 1/3.
  15. Monty Hall problem. X is the number of the door with a car. Y is the observation of the goat. Model P(Y|X): host knows which door has the goat; host never opens your chosen door; host always opens a door with a goat. P(door 3 is opened | door 1 has car) = 1/2; P(door 3 is opened | door 2 has car) = 1; P(door 3 is opened | door 3 has car) = 0.
  16. Monty Hall problem. Posterior P(X|Y): multiply P(X)*P(Y|X), then rescale by dividing by P(Y) = 1/2, i.e. *2. Highest is for door 2: (1/3 * 1)*2 = 2/3.
  17. Monty Hall problem. Switching or not depends on your model! Bayesian analysis makes this clear.
  18. Observing Dark Worlds competition. Organized by University of Edinburgh. Sponsored by Winton Capital. 80% of mass in the universe is dark matter. Dark: it does not emit or absorb light. We see its effect through gravity. Find location of dark matter based on the effects of its gravity.
  19. Observing Dark Worlds competition
  20. Observing Dark Worlds competition. X is location of dark matter. Y is distorted image of galaxies in the sky. Prior P(X): dark matter distributed uniformly across the sky.
  21. Observing Dark Worlds competition
  22. Observing Dark Worlds competition. Posterior P(X|Y): computation a bit more difficult. We can get draws from P(X|Y) using MCMC. Use samples (points) to approximate P(X|Y).
  23. Observing Dark Worlds competition. Minimize the distance between dark matter and our prediction. Expected distance = average distance over samples from P(X|Y). Prediction: choose the point that minimizes the expected distance.
  24. Observing Dark Worlds competition. Sounds pretty smart? Half-way down the leaderboard!
  25. Observing Dark Worlds competition. Leaderboard only based on 30 cases. Final score determined on 90 other cases.
  26. Observing Dark Worlds competition. Great modelling competition. Bayes dominated: runner-up used very similar method. Academic paper summarizing the results is being written.
  27. Deloitte/FIDE chess rating challenge. 10 years of chess match results. 2 years withheld, these should be predicted. A beats B, B beats C, what is the probability C will beat A? Sponsored by world chess federation FIDE and Deloitte Australia.
  28. Deloitte/FIDE chess rating challenge. FIDE currently uses the Elo system. Every player is assigned a skill. Expected result is a function of the skill difference. Points are awarded based on this skill difference.
  29. Deloitte/FIDE chess rating challenge. FIDE currently uses the Elo system.
  30. Deloitte/FIDE chess rating challenge. Problems with the Elo system. It's not Bayesian! This means uncertainty is not correctly incorporated. It does not look back in time. It does not properly discount past results. There is also information in the pairings.
  31. Deloitte/FIDE chess rating challenge. TrueSkill. A Bayesian version of Elo. Developed by Microsoft. Used to rate Halo players.
  32. Deloitte/FIDE chess rating challenge. My tweaked version of TrueSkill. Prior P(X): skill level distribution has the Gaussian bell shape.
  33. Deloitte/FIDE chess rating challenge. My tweaked version of TrueSkill. Model P(Y|X): basics the same as Elo; discounts past results; pairings are also part of Y.
  34. Deloitte/FIDE chess rating challenge. My tweaked version of TrueSkill. Posterior P(X|Y): Bayes automatically makes us look back in time; uncertainty is properly accounted for; computation is very difficult!
  35. TrueSkill posterior approximation. [Figure: diagram with nodes s1, s2, p1, p2, their difference d, and the outcome "Pink wins".]
  36. Deloitte/FIDE chess rating challenge. First try. That's pretty easy!
  37. Deloitte/FIDE chess rating challenge. 2 weeks later. Looks like I'm getting some competition.
  38. Deloitte/FIDE chess rating challenge. Again 2 weeks later. Damn it!
  39. Deloitte/FIDE chess rating challenge. 1 week later. Order is restored!
  40. Deloitte/FIDE chess rating challenge. 1 day later. That didn't last long.
  41. Deloitte/FIDE chess rating challenge. By this time I had to go to a conference in St. Louis…
  42. Deloitte/FIDE chess rating challenge. Last-ditch effort in the early morning before the conference… Back to first place!
  43. Deloitte/FIDE chess rating challenge. But of course the public leaderboard is no guarantee… Victory!
  44. Deloitte/FIDE chess rating challenge. It turns out I had beaten the inventors of TrueSkill, who invited me for an internship at Microsoft Research, Cambridge.
  45. Deloitte/FIDE chess rating challenge. Met my rival Jason 'PlanetThanet' from the competition. Jason went on to win many competitions and is currently ranked no. 2 of all Kagglers. He also led the Dark Worlds competition for a long time.
  46. Making connections through Kaggle. These are just a few examples of the connections I have made through Kaggle: job offers; interesting people; consulting opportunities; invitations to talk to great people like you!
  47. Conclusions. Kaggle competitions are great fun. Bayesian analysis provides a strong competitive edge. Kaggle is a great way to market yourself and to make new connections.
  48. Questions? My blog: TimSalimans.com; Algoritmica: Algoritmica.nl; E-mail: timsalimans@hotmail.com
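
The Monty Hall calculation on slides 14 to 16 can be reproduced in a few lines. The sketch below is an illustration rather than anything shown in the talk: it assumes the contestant picked door 1 (which the likelihoods on slide 15 imply), and the "forgetful host" model is a hypothetical alternative added only to illustrate slide 17's point that the answer depends on the model.

```python
from fractions import Fraction

# Prior P(X): the car is equally likely to be behind each door (slide 14).
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Model P(Y|X) from slide 15: probability that the host opens door 3,
# given the car's location, assuming the contestant picked door 1.
informed_host = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Hypothetical alternative (not from the slides): the host opens door 2 or 3
# at random and happens to reveal a goat behind door 3.
forgetful_host = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}


def posterior(prior, likelihood):
    """Bayes' rule: P(X|Y) = P(Y|X) P(X) / P(Y)."""
    unnormalized = {x: prior[x] * likelihood[x] for x in prior}
    evidence = sum(unnormalized.values())  # P(Y), the rescaling constant
    return {x: p / evidence for x, p in unnormalized.items()}


informed_post = posterior(prior, informed_host)
forgetful_post = posterior(prior, forgetful_host)

for door in prior:
    print(f"door {door}: informed host {informed_post[door]}, "
          f"forgetful host {forgetful_post[door]}")
```

Under the slide's model, switching to door 2 wins with probability 2/3; under the hypothetical forgetful-host model, doors 1 and 2 end up equally likely, so switching neither helps nor hurts.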

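
The prediction rule on slides 22 and 23 (average the distance over posterior samples, then pick the point with the smallest average) might look roughly like the sketch below. It is not the competition entry: the samples are synthetic stand-ins for MCMC draws, plain Euclidean distance is assumed, and the search is restricted to the sampled points themselves as a simplification. The names `expected_distance` and `best_prediction` are made up for this example.

```python
import math
import random


def expected_distance(candidate, samples):
    """Average Euclidean distance from a candidate point to the samples."""
    return sum(math.dist(candidate, s) for s in samples) / len(samples)


def best_prediction(samples, candidates=None):
    """Return the candidate with the smallest expected distance.

    By default the candidates are the samples themselves (an assumption
    made here to keep the search simple).
    """
    if candidates is None:
        candidates = samples
    return min(candidates, key=lambda c: expected_distance(c, samples))


# Synthetic stand-in for posterior draws of a dark matter location (x, y):
# most of the mass near (1000, 2000), a smaller mode near (3000, 500).
rng = random.Random(0)
samples = [(rng.gauss(1000, 50), rng.gauss(2000, 50)) for _ in range(300)]
samples += [(rng.gauss(3000, 50), rng.gauss(500, 50)) for _ in range(100)]

prediction = best_prediction(samples)
print("prediction:", prediction)
print("expected distance:", expected_distance(prediction, samples))
```

With a two-mode posterior like this one, the posterior mean would fall between the modes where few samples lie, which is one reason to minimize expected distance directly instead.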