- 1. Bayes in Competition, Tim Salimans
- 2. Who am I? Statistical consultant at Algoritmica. PhD candidate in Econometrics at EUR. Top 10 Kaggler.
- 3. What is Kaggle? No, that's a gaggle.
- 4. What is Kaggle? A platform for predictive modelling and analytics competitions. The company provides data and defines the modelling problem. Participants build models on part of the data. Predictions are evaluated on another part of the data.
- 5. What is Kaggle? Public competitions, private competitions, and Kaggle In Class.
- 6. My experience with Kaggle: public competitions (Deloitte/FIDE Chess Rating Challenge, Don't Overfit!, Observing Dark Worlds) and a private competition (Allstate Customer Retention Prediction).
- 7. Kaggle in Class
- 8. My experience with Kaggle: currently working on the Heritage Health Prize. Predict which patients go to the hospital. $3,000,000 grand prize; $500,000 consolation prize.
- 9. What is Bayes? No, that's not Rev. Thomas Bayes.
- 10. What is Bayes? A simple recipe for reasoning under uncertainty: quantify what you know before getting data, P(X) ("prior"); build a model for your data, P(Y|X) ("model"); apply Bayes' rule, P(X|Y) = P(Y|X)P(X)/P(Y) ("posterior").
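For a variable X with finitely many values, the three-step recipe fits in a few lines. This is a minimal Python sketch, not code from the talk; the function name `posterior` and the fair-versus-biased coin in the usage lines are illustrative assumptions.

```python
def posterior(prior, likelihood):
    """Discrete Bayes' rule: P(X|Y) = P(Y|X) P(X) / P(Y).

    prior:      dict mapping each value x to P(X = x)
    likelihood: dict mapping each value x to P(Y = observed data | X = x)
    """
    # Unnormalized posterior: P(Y|X=x) * P(X=x) for every x
    joint = {x: likelihood[x] * p for x, p in prior.items()}
    # P(Y) is the sum of P(Y|X=x) P(X=x) over all x
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the observed data has zero probability under this model")
    return {x: j / evidence for x, j in joint.items()}


# Hypothetical example: is a coin fair or biased towards heads, after seeing one head?
prior = {"fair": 0.5, "biased": 0.5}
likelihood_of_heads = {"fair": 0.5, "biased": 0.9}
print(posterior(prior, likelihood_of_heads))  # 'fair' ≈ 0.357, 'biased' ≈ 0.643
```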
- 11. Monty Hall problem
- 12. Monty Hall problem
- 13. Monty Hall problem: should you switch? CONTROVERSY!
- 14. Monty Hall problem: X is the number of the door with the car. Prior P(X): all doors are equally likely to have the car. P(door 1 has car) = 1/3, P(door 2 has car) = 1/3, P(door 3 has car) = 1/3.
- 15. Monty Hall problem: X is the number of the door with the car; Y is the observation that the host opens door 3 to reveal a goat (you picked door 1). Model P(Y|X): the host knows which door has the car, never opens your chosen door, and always opens a door with a goat. P(door 3 is opened | door 1 has car) = ½; P(door 3 is opened | door 2 has car) = 1; P(door 3 is opened | door 3 has car) = 0.
- 16. Monty Hall problem: Posterior P(X|Y): multiply P(X)·P(Y|X), then rescale by 1/P(Y) = 2 so the probabilities sum to one. Door 1: (1/3 · ½)·2 = 1/3; door 2: (1/3 · 1)·2 = 2/3; door 3: 0. The highest is door 2, so you should switch.
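The prior-times-likelihood calculation on the Monty Hall slides can be checked with exact fractions. A small Python sketch, assuming (as the slide probabilities imply) that you picked door 1 and the host opened door 3:

```python
from fractions import Fraction

# You pick door 1; the host then opens door 3 and reveals a goat.
doors = [1, 2, 3]

# Prior P(X): the car is equally likely to be behind any door
prior = {d: Fraction(1, 3) for d in doors}

# Model P(Y|X): probability that the host opens door 3, given where the car is.
# The host never opens your door (1) and never opens the door with the car.
likelihood = {
    1: Fraction(1, 2),  # car behind your door: host opens door 2 or 3 at random
    2: Fraction(1),     # car behind door 2: host must open door 3
    3: Fraction(0),     # car behind door 3: host cannot open it
}

# Multiply prior by likelihood, then rescale so the posterior sums to one
unnormalized = {d: prior[d] * likelihood[d] for d in doors}
evidence = sum(unnormalized.values())  # P(Y) = 1/2, so rescaling multiplies by 2
posterior = {d: p / evidence for d, p in unnormalized.items()}

for d in doors:
    # prints 1/3, 2/3 and 0 for doors 1, 2 and 3
    print(f"P(door {d} has car | host opened door 3) = {posterior[d]}")
```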
- 17. Monty Hall problem: switching or not depends on your model! Bayesian analysis makes this clear.
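To see how the answer depends on the model, the sketch below keeps the same prior and observation but swaps in a different P(Y|X). The second model is an assumption added for illustration and does not appear on the slides: a host who opens door 2 or 3 at random without knowing where the car is, and happens to reveal a goat behind door 3.

```python
from fractions import Fraction


def monty_posterior(likelihood):
    """Posterior over the car's location after you pick door 1 and a goat is shown behind door 3.

    likelihood maps each door x to P(host opens door 3 and shows a goat | car behind x).
    """
    prior = Fraction(1, 3)  # uniform prior over doors 1, 2, 3
    unnormalized = {d: prior * lik for d, lik in likelihood.items()}
    evidence = sum(unnormalized.values())
    return {d: p / evidence for d, p in unnormalized.items()}


# Model from the slides: the host knows where the car is and always shows a goat
informed_host = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Assumed alternative model: the host picks door 2 or 3 at random, unaware of the car;
# if the car were behind door 3 he would have revealed it, so that case gets probability 0
random_host = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}

print(monty_posterior(informed_host))  # door 2 has probability 2/3: switch
print(monty_posterior(random_host))    # doors 1 and 2 both have 1/2: switching does not help
```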
- 18. Observing Dark Worlds competition: organized by the University of Edinburgh and sponsored by Winton Capital. 80% of the mass in the universe is dark matter. Dark: it does not emit or absorb light; we see its effect through gravity. Goal: find the location of dark matter based on the effects of its gravity.
- 19. Observing Dark Worlds competition
- 20. Observing Dark Worlds competition: X is the location of the dark matter; Y is the distorted image of galaxies in the sky. Prior P(X): dark matter is distributed uniformly across the sky.
- 21. Observing Dark Worlds competition
- 22. Observing Dark Worlds competition: Posterior P(X|Y): the computation is a bit more difficult. We can get draws from P(X|Y) using MCMC and use the samples (points) to approximate P(X|Y).
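The slides do not say which MCMC method was used, so the following is only a generic sketch: a random-walk Metropolis sampler for a 2-D location with a uniform prior over a square sky. The real gravitational-lensing likelihood is not given in the talk; `toy_log_likelihood`, the unit-square sky and the step size are hypothetical stand-ins.

```python
import math
import random


def metropolis(log_likelihood, start, n_samples, step=0.1, bounds=(0.0, 1.0), seed=0):
    """Random-walk Metropolis sampler for a 2-D location X = (x, y).

    The prior is uniform over the square [lo, hi]^2, so inside the square the
    log-posterior equals the log-likelihood up to a constant, and outside it is -inf.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def log_post(point):
        x, y = point
        if not (lo <= x <= hi and lo <= y <= hi):
            return -math.inf  # zero prior probability outside the sky
        return log_likelihood(point)

    current = start
    current_lp = log_post(current)
    samples = []
    for _ in range(n_samples):
        # Propose a move with a symmetric Gaussian random walk
        proposal = (current[0] + rng.gauss(0, step), current[1] + rng.gauss(0, step))
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, P(proposal|Y) / P(current|Y))
        if proposal_lp >= current_lp or rng.random() < math.exp(proposal_lp - current_lp):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def toy_log_likelihood(point):
    # Hypothetical stand-in for the real model: the data favour locations near (0.3, 0.7)
    x, y = point
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2) / (2 * 0.05 ** 2)


# Discard the first 2,000 draws as burn-in; the rest approximate P(X|Y)
samples = metropolis(toy_log_likelihood, start=(0.5, 0.5), n_samples=20000)[2000:]
```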
- 23. Observing Dark Worlds competition: minimize the distance between the dark matter and our prediction. Expected distance = average distance over samples from P(X|Y). Prediction: choose the point that minimizes the expected distance.
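When the loss is plain Euclidean distance, the point that minimizes the average distance to the posterior samples is their geometric median. One standard way to find it is Weiszfeld's algorithm, an iteratively reweighted mean. The Python sketch below uses that technique; it is an assumption that this matches the speaker's optimizer, and the competition's actual scoring metric is not reproduced here.

```python
import math


def expected_distance(point, samples):
    """Monte Carlo estimate of E[distance(point, X) | Y] from posterior samples of X."""
    return sum(math.hypot(px - point[0], py - point[1]) for px, py in samples) / len(samples)


def min_expected_distance_point(samples, n_iter=200, eps=1e-12):
    """Point minimizing the average Euclidean distance to the samples (the geometric
    median), found with Weiszfeld's iteratively reweighted mean."""
    # Start from the sample mean
    x = sum(p[0] for p in samples) / len(samples)
    y = sum(p[1] for p in samples) / len(samples)
    for _ in range(n_iter):
        num_x = num_y = denom = 0.0
        for px, py in samples:
            # Weight each sample by its inverse distance (guarded against division by zero)
            w = 1.0 / max(math.hypot(px - x, py - y), eps)
            num_x += w * px
            num_y += w * py
            denom += w
        x, y = num_x / denom, num_y / denom
    return x, y


# Illustrative samples: one far-away point pulls the mean much more than the median
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
prediction = min_expected_distance_point(samples)  # approaches (0.5, 0.5)
```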
- 24. Observing Dark Worlds competition: sounds pretty smart? Half-way down the leaderboard!
- 25. Observing Dark Worlds competition: the leaderboard is based on only 30 cases; the final score is determined on 90 other cases.
- 26. Observing Dark Worlds competition: a great modelling competition. Bayes dominated: the runner-up used a very similar method. An academic paper summarizing the results is being written.
- 27. Deloitte/FIDE chess rating challenge: 10 years of chess match results; 2 years withheld, which should be predicted. A beats B and B beats C: what is the probability that C will beat A? Sponsored by the world chess federation FIDE and Deloitte Australia.
- 28. Deloitte/FIDE chess rating challenge: FIDE currently uses the Elo system. Every player is assigned a skill; the expected result is a function of the skill difference; points are awarded based on this skill difference.
- 29. Deloitte/FIDE chess rating challenge: FIDE currently uses the Elo system.
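The Elo update described above can be sketched as follows, using the standard logistic expected-score formula with the 400-point scale. The K-factor of 20 is an arbitrary illustrative default, not FIDE's exact rule, and the function names are hypothetical.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B: a logistic function of the rating difference."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=20):
    """New ratings after one game; score_a is 1 (A wins), 0.5 (draw) or 0 (A loses).

    k is the K-factor (20 is an illustrative choice). Points change hands in
    proportion to how surprising the result was given the rating difference.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# The question from the challenge description, answered the Elo way:
# if A is rated 200 above B and B is rated 200 above C, how likely is C to beat A?
p_c_beats_a = elo_expected(1400, 1800)  # 1/11, about 0.09
```

Note that Elo keeps a single number per player: nothing in this update records how uncertain a rating is, which is the first problem listed on the next slide.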
- 30. Deloitte/FIDE chess rating challenge: problems with the Elo system. It's not Bayesian! This means uncertainty is not correctly incorporated. It does not look back in time. It does not properly discount past results. There is also information in the pairings.
- 31. Deloitte/FIDE chess rating challenge: TrueSkill, a Bayesian version of Elo. Developed by Microsoft; used to rate Halo players.
- 32. Deloitte/FIDE chess rating challenge: my tweaked version of TrueSkill. Prior P(X): the skill level distribution has the Gaussian bell shape.
- 33. Deloitte/FIDE chess rating challenge: my tweaked version of TrueSkill. Model P(Y|X): basics the same as Elo; discounts past results; pairings are also part of Y.
- 34. Deloitte/FIDE chess rating challenge: my tweaked version of TrueSkill. Posterior P(X|Y): Bayes automatically makes us look back in time; uncertainty is properly accounted for; computation is very difficult!
- 35. TrueSkill posterior approximation (figure: skills s1 and s2 generate performances p1 and p2, whose difference d decides the game; "Pink wins")
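For a single game between two players, the published TrueSkill approximation has a closed form: the Gaussian posterior over skills is obtained by moment matching after truncating d = p1 − p2 to d > 0. The Python sketch below implements that original two-player, no-draw update with the usual defaults (prior mean 25, prior standard deviation 25/3, performance noise β = 25/6). It does not include the speaker's tweaks (discounting past results, using the pairings, looking back in time), and the function names are hypothetical.

```python
import math


def _pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)


def _cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


def trueskill_update(winner, loser, beta=25 / 6):
    """One-game TrueSkill update (no draws) for Gaussian skills given as (mu, sigma).

    Performance p_i ~ N(s_i, beta^2); observing a win means d = p_w - p_l > 0.
    The truncated posterior is approximated by a Gaussian (moment matching).
    Extremely lopsided upsets can underflow _cdf(t) to 0 in this simple version.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # mean shift caused by truncating d to d > 0
    w = v * (v + t)        # relative variance reduction, always between 0 and 1
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


# Two new players with the default prior N(25, (25/3)^2); the first one wins
new_w, new_l = trueskill_update((25.0, 25 / 3), (25.0, 25 / 3))
```

Unlike Elo, each player carries a standard deviation, so a result moves an uncertain newcomer's rating more than a well-established one's, and both uncertainties shrink after every game.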
- 36. Deloitte/FIDE chess rating challenge: first try. That's pretty easy!
- 37. Deloitte/FIDE chess rating challenge: 2 weeks later. Looks like I'm getting some competition.
- 38. Deloitte/FIDE chess rating challenge: again 2 weeks later. Damn it!
- 39. Deloitte/FIDE chess rating challenge: 1 week later. Order is restored!
- 40. Deloitte/FIDE chess rating challenge: 1 day later. That didn't last long.
- 41. Deloitte/FIDE chess rating challenge: by this time I had to go to a conference in St. Louis…
- 42. Deloitte/FIDE chess rating challenge: a last-ditch effort in the early morning before the conference… Back to first place!
- 43. Deloitte/FIDE chess rating challenge: but of course the public leaderboard is no guarantee… Victory!
- 44. Deloitte/FIDE chess rating challenge: it turns out I had beaten the inventors of TrueSkill, who invited me for an internship at Microsoft Research, Cambridge.
- 45. Deloitte/FIDE chess rating challenge: met my rival Jason "PlanetThanet" from the competition. Jason went on to win many competitions and is currently ranked no. 2 of all Kagglers. He also led the Dark Worlds competition for a long time.
- 46. Making connections through Kaggle. These are just a few examples of the connections I have made through Kaggle: job offers, interesting people, consulting opportunities, and invitations to talk to great people like you!
- 47. Conclusions: Kaggle competitions are great fun. Bayesian analysis provides a strong competitive edge. Kaggle is a great way to market yourself and to make new connections.
- 48. Questions? My blog: TimSalimans.com. Algoritmica: Algoritmica.nl. E-mail: timsalimans@hotmail.com
