
Conductrics bandit basicsemetrics1016


  1. 1. Bandit Basics – A Different Take on Online Optimization. Conductrics twitter: @mgershoff
  2. 2. Who is this guy? Matt Gershoff, CEO: Conductrics. Many years in database marketing (New York and Paris) and a bit of web analytics. www.conductrics.com, twitter: @mgershoff, email: matt@conductrics.com
  3. 3. Speak Up
  4. 4. What Are We Going to Hear? • Optimization Basics • Multi-Armed Bandit: It's a Problem, Not a Method • Some Methods: AB Testing, Epsilon Greedy, Upper Confidence Bound (UCB) • Some Results
  5. 5. Choices, Targeting, Learning, Optimization
  6. 6. OPTIMIZATION: If THIS Then THAT. ITTT brings together: 1. Decision Rules 2. Predictive Analytics 3. Choice Optimization
  7. 7. OPTIMIZATION: Find and apply the rule with the most value, out of many candidate "If THIS Then THAT" rules.
  8. 8. OPTIMIZATION: THIS = variables whose values are given to you (inputs, e.g. Facebook, High Spend, Urban GEO, Home Page, App Use). THAT = variables whose values you control (outputs, e.g. Offer A, Offer B, Offer C, ..., Offer Y, Offer Z). [Diagram: a predictive model takes input features F1, F2, ..., Fm and produces Value_i for the options.]
  9. 9. But … 1. We don't have data on 'THAT'. 2. Need to collect – sample THAT. 3. How to sample efficiently? (Offer A ? Offer B ? Offer C ? ... Offer Y ? Offer Z ?)
  10. 10. Where? Marketing Applications: • Websites • Mobile • Social Media Campaigns • Banner Ads. Pharma: Clinical Trials
  11. 11. What is a Multi-Armed Bandit? One-Armed Bandit -> Slot Machine. The problem: How to pick between slot machines (A or B) so that you walk out with the most $$$ from the casino at the end of the night?
  12. 12. Objective: Pick so as to get as much return/profit as you can over time. Technical term: Minimize Regret
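To make "regret" concrete, here is a minimal sketch, assuming we somehow knew each machine's true average payout (in practice we do not, which is the whole problem). The function name and the payout numbers are hypothetical, chosen only for illustration.

```python
def cumulative_regret(true_means, choices):
    """Expected reward given up versus always playing the best arm.

    true_means: dict of arm -> true average payout (assumed known here).
    choices:    sequence of arms that were actually played.
    """
    best = max(true_means.values())
    return sum(best - true_means[arm] for arm in choices)


# Hypothetical payout rates for slot machines A and B.
true_means = {"A": 0.10, "B": 0.05}
# Each play of B gives up 0.05 in expectation; plays of A give up nothing.
print(cumulative_regret(true_means, ["A", "B", "B", "A"]))
```

A selection method that minimizes regret is one that keeps this sum as small as possible while it is still learning which arm is best.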
  13. 13. Sequential Selection … but how to pick, A or B? Need to sample, but do it efficiently.
  14. 14. Explore – Collect Data (A or B) • Data collection is costly – an investment • Be efficient – balance the potential value of collecting new data with exploiting what you currently know.
  15. 15. Multi-Armed Bandits: “Bandit problems embody in essential form a conflict evident in all human action: choosing actions which yield immediate reward vs. choosing actions … whose benefit will come only later.”* - Peter Whittle. *Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June, 2010.
  16. 16. Exploration / Exploitation: 1) Explore/Learn – Try out different actions to learn how they perform over time. This is a data collection task. 2) Exploit/Earn – Take advantage of what you have learned to get the highest payoff, using your current best guess.
  17. 17. Not A New Problem: 1933 – first work on competing options. 1940 – WWII problem the Allies attempt to tackle. 1953 – Bellman formulates it as a Dynamic Programming problem. Source: http://www.lancs.ac.uk/~winterh/GRhist.html
  18. 18. Testing • Explore First – All actions have an equal chance of selection (uniform random). Use hypothesis testing to select a ‘Winner’. • Then Exploit – Keep only the ‘Winner’ for selection.
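As a rough sketch of the "select a Winner" step, the following assumes two actions, binary conversions, a fixed sample size, and a pooled two-proportion z-test at roughly the 95% level. The slides do not say which test is used, so the test choice, the function names, and the counts in the usage line are all assumptions.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates (pooled)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_err


def pick_winner(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Return 'A' or 'B' if the difference is significant, else None."""
    z = two_proportion_z(conv_a, n_a, conv_b, n_b)
    if abs(z) < z_crit:
        return None  # no significant winner at this sample size
    return "A" if z > 0 else "B"


# Hypothetical counts after the uniform-random explore phase.
print(pick_winner(conv_a=120, n_a=1000, conv_b=80, n_b=1000))
```

After the explore phase ends, all traffic would go to whichever action `pick_winner` returns.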
  19. 19. Learn First: Data Collection/Sample (Explore/Learn) comes first, then Apply Learning (Exploit/Earn). [Diagram: the two phases laid out in sequence along a Time axis.]
  20. 20. P-Values: A Digression. P-Values: • NOT the probability that the Null is True, P(Null=True | DATA) • Rather: P(DATA (or more extreme) | Null=True) • Not a great tool for deciding when to stop sampling. See: http://andrewgelman.com/2010/09/noooooooooooooo_1/ and http://www.stat.duke.edu/~berger/papers/02-01.pdf
  21. 21. A Couple Other Methods: 1. Epsilon Greedy – Nice and Simple 2. Upper Confidence Bounds (UCB) – Adapts to Uncertainty
  22. 22. 1) Epsilon-Greedy
  23. 23. Greedy: What do you mean by ‘Greedy’? Make whatever choice seems best at the moment.
  24. 24. Epsilon Greedy: What do you mean by ‘Epsilon Greedy’? • Explore – randomly select an action ε percent of the time (say 20%) • Exploit – Play greedy (pick the current best) 1 - ε of the time (say 80%)
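The explore/exploit rule above can be sketched in a few lines. This is a minimal illustration, not Conductrics' implementation: the function names, the running-mean update, and the dictionary layout are assumptions; the 20% default mirrors the "say 20%" on the slide.

```python
import random


def epsilon_greedy(estimates, epsilon=0.2, rng=random):
    """Pick an action: explore with probability epsilon, otherwise exploit.

    estimates: dict of action -> current estimated value.
    """
    if rng.random() < epsilon:
        return rng.choice(list(estimates))      # explore: uniform random
    return max(estimates, key=estimates.get)    # exploit: current best


def update(estimates, counts, action, reward):
    """Fold one observed reward into the running mean for that action."""
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]


# Estimated values as on the later "Action / Value" slide.
estimates = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
print(epsilon_greedy(estimates))  # usually "A", sometimes a random action
```

Because the random branch can also land on the current best, the best action is actually served slightly more than 1 - ε of the time.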
  25. 25. Epsilon Greedy: each User goes to either Explore/Learn (20%): Select Randomly, like AB Testing; or Exploit/Earn (80%): Select Current Best (Be Greedy).
  26. 26. Epsilon Greedy: 20% Random, 80% Select Best. Action / Value: A $5.00; B $4.00; C $3.00; D $2.00; E $1.00
  27. 27. Continuous Sampling: Explore/Learn and Exploit/Earn both continue over Time.
  28. 28. Epsilon Greedy: – Super simple/low cost to implement – Tends to be surprisingly effective – Less affected by ‘Seasonality’ – Not optimal (hard to pick the best ε) – Doesn’t use a measure of variance – Should/How to decrease exploration over time?
  29. 29. Upper Confidence Bound: Basic Idea: 1) Calculate both the mean and a measure of uncertainty (variance) for each action. 2) Make Greedy selections based on mean + uncertainty bonus.
  30. 30. Confidence Interval Review: Confidence Interval = mean +/- z*Std. [Figure: interval running from Mean - 2*Std to Mean + 2*Std.]
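A small sketch of the interval calculation, assuming "Std" on the slide means the standard error of the mean (sample standard deviation divided by the square root of the sample size) and taking z = 2 as in the figure. The function name and sample values are hypothetical.

```python
import math
import statistics


def confidence_interval(samples, z=2.0):
    """Return (lower, upper) = mean -/+ z * standard error of the mean."""
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - z * std_err, mean + z * std_err


# Hypothetical rewards observed for one action.
few = [4, 6, 4, 6]
many = [4, 6] * 8
print(confidence_interval(few))
print(confidence_interval(many))  # same mean, narrower interval
```

The interval shrinks as more data arrives, which is what lets the upper end serve as an uncertainty bonus.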
  31. 31. Upper Confidence: Score each option using the upper portion of the interval as a Bonus: Mean + Bonus
  32. 32. Upper Confidence Bound: 1) Use the upper portion of the CI as a ‘Bonus’ 2) Make Greedy Selections. [Figure: actions A, B, C with intervals on an Estimated Reward axis from $0 to $10; Select A.]
  33. 33. Upper Confidence Bound: 1) Selecting Action ‘A’ reduces its uncertainty bonus (because there is more data) 2) Action ‘C’ now has the highest score. [Figure: actions A, B, C with intervals on an Estimated Reward axis from $0 to $10; Select C.]
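One common concrete form of "mean + bonus" is UCB1, sketched here. Note the swap: UCB1's bonus is based on play counts, sqrt(2 ln t / n), rather than on a variance estimate as the slides describe, and it assumes rewards scaled to [0, 1]. The slides do not say which variant Conductrics uses; the function name and numbers are illustrative.

```python
import math


def ucb1_select(means, counts):
    """Greedy pick on mean + bonus; any untried action is played first.

    means:  dict of action -> average observed reward (assumed in [0, 1]).
    counts: dict of action -> number of times the action was played.
    """
    for action, n in counts.items():
        if n == 0:
            return action
    total = sum(counts.values())

    def score(action):
        bonus = math.sqrt(2 * math.log(total) / counts[action])
        return means[action] + bonus

    return max(means, key=score)


# B has a slightly lower mean but far less data, so its bonus is larger.
print(ucb1_select({"A": 0.50, "B": 0.45}, {"A": 1000, "B": 10}))
```

Each time an action is selected its count grows and its bonus shrinks, so a less-sampled action can overtake it, which is the A-then-C sequence shown on the two slides above.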
  34. 34. Upper Confidence Bound • Like A/B Test – uses a variance measure • Unlike A/B Test – no hypothesis test • Automatically balances Exploration with Exploitation
  35. 35. Case Study:
        Treatment   Conversion Rate   Served
        V2V3        9.9%              14,893
        V2V2        9.7%               9,720
        V2V1        8.0%               2,441
        V1V3        3.3%               2,090
        V2V3        2.6%               1,849
        V2V2        2.0%               1,817
        V1V1        1.8%               1,926
        V3V1        1.8%               1,821
        V1V2        1.5%               1,873
  36. 36. Case Study: Test Method / Conversion Rate: Adaptive 7%; Non Adaptive 4.5%
  37. 37. AB Testing vs. Bandit: Option A -> Option B -> Option C ->
  38. 38. Why Should I Care? • More Efficient Learning • Automation • Changing World
  39. 39. Questions?
  40. 40. Thank You! Matt Gershoff p) 646-384-5151 e) matt@conductrics.com t) @mgershoff
