# Bayes in competition

1. Bayes in Competition, Tim Salimans
2. Who am I? Statistical consultant at Algoritmica; PhD candidate in Econometrics at EUR; top 10 Kaggler
3. What is Kaggle? No, that's a gaggle.
4. What is Kaggle? A platform for predictive modelling and analytics competitions: the company provides the data and defines the modelling problem; participants build models on part of the data; predictions are evaluated on another part of the data.
5. What is Kaggle? Public competitions; private competitions; Kaggle In Class.
6. My experience with Kaggle. Public competitions: Deloitte/FIDE Chess Rating Challenge; Don't Overfit!; Observing Dark Worlds. Private competition: Allstate Customer Retention Prediction.
7. Kaggle in Class
8. My experience with Kaggle. Currently working on the Heritage Health Prize: predict which patients go to the hospital. \$3,000,000 grand prize; \$500,000 consolation prize.
9. What is Bayes? No, that's not Rev. Thomas Bayes.
10. What is Bayes? A simple recipe for reasoning under uncertainty: quantify what you know before getting data, P(X) (the "prior"); build a model for your data, P(Y|X) (the "model"); apply Bayes' rule, P(X|Y) = P(Y|X)P(X)/P(Y) (the "posterior"), where P(Y) is the sum of P(Y|X)P(X) over all values of X.
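
The three-step recipe fits in a few lines of Python. This is a minimal sketch, assuming a finite set of hypotheses stored in dicts; the function name `bayes_update` is hypothetical. The example plugs in the Monty Hall prior and model from slides 14 and 15 (you picked door 1, the host opened door 3), using exact fractions.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior P(X|Y) from a prior P(X) and a likelihood P(Y|X).

    Both arguments are dicts keyed by the possible values x of X.
    """
    # Multiply: P(Y|x) * P(x) for every hypothesis x
    unnormalized = {x: prior[x] * likelihood[x] for x in prior}
    # P(Y) = sum over x of P(Y|x) * P(x)
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("the observation is impossible under this model")
    # Rescale so the posterior sums to one
    return {x: p / evidence for x, p in unnormalized.items()}


# Monty Hall: you picked door 1 and the host opened door 3
prior = {door: Fraction(1, 3) for door in (1, 2, 3)}
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}
posterior = bayes_update(prior, likelihood)
print(posterior[2])  # 2/3
```
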
11. Monty Hall problem
12. Monty Hall problem
13. Monty Hall problem: should you switch? CONTROVERSY!
14. Monty Hall problem. X is the number of the door with the car. Prior P(X): all doors are equally likely to have the car, so P(door 1 has car) = P(door 2 has car) = P(door 3 has car) = 1/3.
15. Monty Hall problem. X is the number of the door with the car; Y is the observation of the goat (you picked door 1 and the host opened door 3). Model P(Y|X): the host knows which door has the car, never opens your chosen door, and always opens a door with a goat. P(door 3 is opened | door 1 has car) = 1/2; P(door 3 is opened | door 2 has car) = 1; P(door 3 is opened | door 3 has car) = 0.
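16. Monty Hall problem. Posterior P(X|Y): multiply P(X)·P(Y|X), then rescale so the probabilities sum to one (here ×2, because P(Y) = 1/2). Door 1 gets (1/3 × 1/2) × 2 = 1/3, door 3 gets 0, and the highest is for door 2: (1/3 × 1) × 2 = 2/3.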
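
The 2/3 answer can also be checked by brute force. A minimal sketch, assuming the host behaviour from slide 15 and a fixed random seed chosen only for reproducibility: simulate many games, keep only those where you chose door 1 and the host opened door 3, and count how often the car is behind door 2. `simulate_monty_hall` is a hypothetical helper name.

```python
import random


def simulate_monty_hall(n_trials, seed=0):
    """Estimate P(car behind door 2 | you chose door 1, host opened door 3)."""
    rng = random.Random(seed)
    kept = 0
    car_behind_2 = 0
    for _ in range(n_trials):
        car = rng.randint(1, 3)  # prior: each door equally likely
        choice = 1
        # Host never opens your door and always reveals a goat;
        # if both remaining doors hide goats, he picks one at random.
        options = [d for d in (1, 2, 3) if d != choice and d != car]
        opened = rng.choice(options)
        if opened != 3:
            continue  # condition on the observation from the slides
        kept += 1
        if car == 2:
            car_behind_2 += 1
    return car_behind_2 / kept


print(simulate_monty_hall(100_000))  # should be close to 2/3
```
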
17. Monty Hall problem. Whether you should switch depends on your model! Bayesian analysis makes this explicit.
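
To see how the answer depends on the model, here is a sketch comparing two likelihoods for the same observation. The first is the host from slide 15; the second is an assumed "ignorant host" who opens door 2 or 3 at random and happens to reveal a goat. Same prior, same observation, different P(Y|X):

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Normalize P(Y|x) * P(x) over all hypotheses x."""
    joint = {x: prior[x] * likelihood[x] for x in prior}
    total = sum(joint.values())
    return {x: p / total for x, p in joint.items()}


prior = {door: Fraction(1, 3) for door in (1, 2, 3)}

# Model from the slides: the host knows where the car is and always reveals
# a goat. Entries are P(host opens door 3 | car behind door x), you chose door 1.
informed_host = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Assumed alternative: the host opens door 2 or 3 at random. Entries are
# P(door 3 is opened and shows a goat | car behind door x).
random_host = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}

print(posterior(prior, informed_host)[2])  # 2/3: switching helps
print(posterior(prior, random_host)[2])    # 1/2: switching doesn't matter
```

Under the ignorant-host model, seeing a goat behind door 3 only rules out door 3, so doors 1 and 2 end up equally likely and switching no longer helps.
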
18. Observing Dark Worlds competition. Organized by the University of Edinburgh and sponsored by Winton Capital. 80% of the mass in the universe is dark matter. Dark: it does not emit or absorb light; we see its effect through gravity. Goal: find the location of dark matter based on the effects of its gravity.
19. Observing Dark Worlds competition
20. Observing Dark Worlds competition. X is the location of the dark matter; Y is the distorted image of galaxies in the sky. Prior P(X): dark matter is distributed uniformly across the sky.
21. Observing Dark Worlds competition
22. Observing Dark Worlds competition. Posterior P(X|Y): the computation is a bit more difficult. We can get draws from P(X|Y) using MCMC (Markov chain Monte Carlo) and use these samples (points) to approximate P(X|Y).
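
A random-walk Metropolis sampler is one of the simplest MCMC methods for drawing from P(X|Y). The sketch below is illustrative only: the sky is assumed to be rescaled to the unit square, the uniform prior from slide 20 becomes a constant log-density inside it, and `log_likelihood` is a hypothetical toy Gaussian bump standing in for the real model of galaxy distortions, which the slides do not spell out.

```python
import math
import random

# Assumption: sky coordinates rescaled to the unit square [0, 1] x [0, 1]
SKY_MIN, SKY_MAX = 0.0, 1.0


def log_prior(x, y):
    """Uniform prior over the sky: constant inside, impossible outside."""
    inside = SKY_MIN <= x <= SKY_MAX and SKY_MIN <= y <= SKY_MAX
    return 0.0 if inside else -math.inf


def log_likelihood(x, y):
    """Hypothetical stand-in for log P(Y | X = (x, y)).

    A real model would score how well dark matter at (x, y) explains the
    observed galaxy distortions; this toy version is a Gaussian bump.
    """
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2) / (2 * 0.05 ** 2)


def metropolis(n_samples, step=0.05, seed=0):
    """Random-walk Metropolis draws from P(X|Y), proportional to P(Y|X) P(X)."""
    rng = random.Random(seed)
    x, y = 0.5, 0.5  # start in the middle of the sky
    current = log_prior(x, y) + log_likelihood(x, y)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal around the current point
        xp, yp = x + rng.gauss(0.0, step), y + rng.gauss(0.0, step)
        proposed = log_prior(xp, yp) + log_likelihood(xp, yp)
        # Accept with probability min(1, posterior ratio); -inf is never accepted
        if rng.random() < math.exp(min(0.0, proposed - current)):
            x, y, current = xp, yp, proposed
        samples.append((x, y))
    return samples


samples = metropolis(20_000)[2_000:]  # drop the first draws as burn-in
```

The retained points play the role of the "samples" that approximate P(X|Y); in practice the step size, burn-in and chain length need tuning for the real likelihood.
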
23. Observing Dark Worlds competition. Objective: minimize the distance between the dark matter and our prediction. Expected distance = average distance over the samples from P(X|Y). Prediction: choose the point that minimizes the expected distance.
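
When the loss is Euclidean distance, the point minimizing the average distance to the posterior samples is their geometric median. The slides do not say how the minimization was done; one standard option, sketched here under that assumption, is Weiszfeld's iteratively reweighted averaging. Samples are (x, y) tuples, for example the output of an MCMC run; the helper names are hypothetical.

```python
import math


def expected_distance(point, samples):
    """Monte Carlo estimate of E[distance(X, point)] with X ~ P(X|Y)."""
    px, py = point
    return sum(math.hypot(x - px, y - py) for x, y in samples) / len(samples)


def geometric_median(samples, n_iter=200, eps=1e-12):
    """Point minimizing the average Euclidean distance to the samples.

    Weiszfeld's algorithm: repeatedly take a weighted mean of the samples,
    weighting each one by 1 / (its distance to the current estimate).
    """
    px = sum(x for x, _ in samples) / len(samples)  # start at the sample mean
    py = sum(y for _, y in samples) / len(samples)
    for _ in range(n_iter):
        num_x = num_y = denom = 0.0
        for x, y in samples:
            w = 1.0 / max(math.hypot(x - px, y - py), eps)  # avoid dividing by 0
            num_x += w * x
            num_y += w * y
            denom += w
        px, py = num_x / denom, num_y / denom
    return px, py


# Three samples agree, one is an outlier: the mean (2.5, 0) is pulled away,
# while the geometric median stays near the cluster at (0, 0).
samples = [(0.0, 0.0)] * 3 + [(10.0, 0.0)]
best = geometric_median(samples)
print(best, expected_distance(best, samples))
```

Using the sample mean instead would minimize expected squared distance, which is a different loss from the one on this slide.
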
24. Observing Dark Worlds competition. Sounds pretty smart? Half-way down the leaderboard!
25. Observing Dark Worlds competition. The public leaderboard was based on only 30 cases; the final score was determined on 90 other cases.
26. Observing Dark Worlds competition. A great modelling competition. Bayes dominated: the runner-up used a very similar method. An academic paper summarizing the results is being written.
27. Deloitte/FIDE Chess Rating Challenge. 10 years of chess match results, with 2 years withheld that have to be predicted. If A beats B and B beats C, what is the probability that C will beat A? Sponsored by the world chess federation FIDE and Deloitte Australia.
28. Deloitte/FIDE Chess Rating Challenge. FIDE currently uses the Elo system: every player is assigned a skill rating; the expected result is a function of the skill difference; rating points are gained or lost according to how the actual result compares with the result expected from this skill difference.
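
A minimal sketch of the Elo update, using the widely quoted logistic form of the expected score on a 400-point scale. The K-factor of 20 is an assumed value, and FIDE's own regulations use lookup tables, so exact numbers may differ slightly; the function names are hypothetical.

```python
def elo_expected(r_a, r_b):
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0).

    Logistic Elo curve: a 400-point rating gap means 10-to-1 odds.
    """
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a, r_b, score_a, k=20):
    """Ratings after one game. k = 20 is an assumed K-factor.

    A gains exactly what B loses: k times (actual minus expected score).
    """
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta


print(elo_expected(1900, 1500))           # about 0.91
print(elo_update(1500, 1500, score_a=1))  # (1510.0, 1490.0)
```

Once ratings r_A, r_B, r_C have been estimated from results such as "A beats B, B beats C", this model's expected score for C against A is `elo_expected(r_C, r_A)`, a single number with no attached uncertainty.
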
29. Deloitte/FIDE Chess Rating Challenge. FIDE currently uses the Elo system.
30. Deloitte/FIDE Chess Rating Challenge. Problems with the Elo system: it's not Bayesian! This means uncertainty is not correctly incorporated. It does not look back in time. It does not properly discount past results. There is also information in the pairings, which Elo ignores.
31. Deloitte/FIDE Chess Rating Challenge. TrueSkill: a Bayesian version of Elo, developed by Microsoft and used to rate Halo players.
32. Deloitte/FIDE Chess Rating Challenge. My tweaked version of TrueSkill. Prior P(X): the skill-level distribution has the Gaussian bell shape.
33. Deloitte/FIDE Chess Rating Challenge. My tweaked version of TrueSkill. Model P(Y|X): basics the same as Elo; discounts past results; pairings are also part of Y.
34. Deloitte/FIDE Chess Rating Challenge. My tweaked version of TrueSkill. Posterior P(X|Y): Bayes automatically makes us look back in time; uncertainty is properly accounted for; computation is very difficult!
35. TrueSkill posterior approximation. [Figure: factor graph with skills s1 and s2, performances p1 and p2, their difference d, and the observation "Pink wins".]
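
For a single two-player game, the factor graph on slide 35 (skills s1, s2, performances p1, p2, difference d) collapses into closed-form Gaussian updates. The sketch below is the standard TrueSkill update for a win without draws, assuming TrueSkill's usual defaults (prior mean 25, standard deviation 25/3, performance noise beta = 25/6) and ignoring skill dynamics between games. It is not the author's tweaked version, which also discounts past results and models pairings; `trueskill_win` is a hypothetical name.

```python
import math


def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def trueskill_win(winner, loser, beta=25 / 6):
    """Update Gaussian skill beliefs (mu, sigma) after `winner` beats `loser`.

    Performances are p = s + N(0, beta^2) noise; the observation is
    d = p_winner - p_loser > 0. No draws and no skill drift between games.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * beta ** 2 + sig_w ** 2 + sig_l ** 2)  # std. dev. of d
    t = (mu_w - mu_l) / c
    # Moments of a Gaussian truncated to d > 0. Fine for moderate t; extreme
    # upsets (very negative t) would need a log-space version of this ratio.
    v = norm_pdf(t) / norm_cdf(t)
    w = v * (v + t)
    new_winner = (mu_w + sig_w ** 2 / c * v,
                  sig_w * math.sqrt(1 - sig_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sig_l ** 2 / c * v,
                 sig_l * math.sqrt(1 - sig_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


# Two new players with TrueSkill's default prior N(25, (25/3)^2)
newcomer = (25.0, 25 / 3)
print(trueskill_win(newcomer, newcomer))
```

For two fresh players, the winner's mean rises, the loser's falls by the same amount, and both standard deviations shrink: the uncertainty that a single Elo number lacks is tracked explicitly.
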
36. Deloitte/FIDE Chess Rating Challenge. First try: that's pretty easy!
37. Deloitte/FIDE Chess Rating Challenge. 2 weeks later: looks like I'm getting some competition.
38. Deloitte/FIDE Chess Rating Challenge. Again 2 weeks later: damn it!
39. Deloitte/FIDE Chess Rating Challenge. 1 week later: order is restored!
40. Deloitte/FIDE Chess Rating Challenge. 1 day later: that didn't last long.
41. Deloitte/FIDE Chess Rating Challenge. By this time I had to go to a conference in St. Louis…
42. Deloitte/FIDE Chess Rating Challenge. Last-ditch effort in the early morning before the conference… back to first place!
43. Deloitte/FIDE Chess Rating Challenge. But of course the public leaderboard is no guarantee… Victory!
44. Deloitte/FIDE Chess Rating Challenge. It turns out I had beaten the inventors of TrueSkill, who invited me for an internship at Microsoft Research, Cambridge.
45. Deloitte/FIDE Chess Rating Challenge. Met my rival Jason 'PlanetThanet' from the competition. Jason went on to win many competitions and is currently ranked no. 2 of all Kagglers. He also led the Dark Worlds competition for a long time.
46. Making connections through Kaggle. These are just a few examples of the connections I have made through Kaggle: job offers, interesting people, consulting opportunities, invitations to talk to great people like you!
47. Conclusions: Kaggle competitions are great fun; Bayesian analysis provides a strong competitive edge; Kaggle is a great way to market yourself and to make new connections.
48. Questions? My blog: TimSalimans.com; Algoritmica: Algoritmica.nl; e-mail: timsalimans@hotmail.com