# Conductrics Bandit Basics – eMetrics 1016


1. Bandit Basics – A Different Take on Online Optimization. Conductrics, twitter: @mgershoff
2. Who is this guy? Matt Gershoff, CEO: Conductrics. Many years in database marketing (New York and Paris) and a bit of web analytics. www.conductrics.com | twitter: @mgershoff | email: matt@conductrics.com
3. Speak Up
4. What Are We Going to Hear? • Optimization basics • Multi-armed bandit – it's a problem, not a method • Some methods: A/B testing, epsilon-greedy, Upper Confidence Bound (UCB) • Some results
5. Choices, Targeting, Learning, Optimization
6. OPTIMIZATION. If THIS Then THAT. ITTT brings together: 1. Decision rules 2. Predictive analytics 3. Choice optimization
7. OPTIMIZATION. Find and apply the rule with the most value. (Grid of candidate "If THIS Then THAT" rules.)
8. OPTIMIZATION. THIS – variables whose values are given to you (e.g. Facebook, high spend, urban GEO, home page, app use), fed into a predictive model as inputs F1…Fm. THAT – variables whose values you control (Offer A, Offer B, Offer C, … Offer Y, Offer Z), scored as outputs, each with a value.
9. But … 1. We don't have data on 'THAT' 2. Need to collect – sample THAT 3. How to sample efficiently? (Offer A? Offer B? Offer C? … Offer Y? Offer Z?)
10. Where? Marketing applications: websites, mobile, social media campaigns, banner ads. Pharma: clinical trials.
11. What is a Multi-Armed Bandit? One-armed bandit → slot machine. The problem: how do you pick between slot machines (A or B) so that you walk out of the casino with the most \$\$\$ at the end of the night?
12. Objective. Pick so as to get the most return/profit you can over time. Technical term: minimize regret.
13. Sequential Selection. … but how to pick (A or B)? Need to sample, but do it efficiently.
14. Explore – Collect Data (A or B). • Data collection is costly – an investment. • Be efficient – balance the potential value of collecting new data with exploiting what you currently know.
15. Multi-Armed Bandits. "Bandit problems embody in essential form a conflict evident in all human action: choosing actions which yield immediate reward vs. choosing actions … whose benefit will come only later."* – Peter Whittle. *Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June 2010.
16. Exploration vs. Exploitation. 1) Explore/Learn – try out different actions to learn how they perform over time – this is a data collection task. 2) Exploit/Earn – take advantage of what you have learned to get the highest payoff – your current best guess.
17. Not a New Problem. 1933 – first work on competing options. 1940 – a WWII problem the Allies attempted to tackle. 1953 – Bellman formulates it as a dynamic programming problem. Source: http://www.lancs.ac.uk/~winterh/GRhist.html
18. Testing. • Explore first – all actions have an equal chance of selection (uniform random); use hypothesis testing to select a 'winner'. • Then exploit – keep only the 'winner' for selection.
19. Learn First. Data collection/sample, then apply learning: an Explore/Learn phase followed by an Exploit/Earn phase over time.
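The explore-then-exploit flow above can be sketched with a two-proportion z-test to pick the 'winner' (a minimal illustration, not Conductrics' implementation; the function name, the counts, and the 1.96 cutoff are hypothetical):

```python
import math

def z_test_winner(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-proportion z-test on conversion counts.
    Returns "A" or "B" if the difference is significant, else None."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)        # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    if abs(z) >= z_crit:
        return "A" if z > 0 else "B"                # exploit: keep the winner
    return None                                     # no winner yet: keep exploring
```

With 120/1000 vs. 80/1000 conversions, `z_test_winner(120, 1000, 80, 1000)` declares "A"; with 100/1000 vs. 95/1000 it returns `None` and the test keeps running.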
20. P-Values: A Digression. P-values are: • NOT the probability that the null is true, P(Null=True | DATA) • P(DATA (or more extreme) | Null=True) • Not a great tool for deciding when to stop sampling. See: http://andrewgelman.com/2010/09/noooooooooooooo_1/ and http://www.stat.duke.edu/~berger/papers/02-01.pdf
21. A Couple of Other Methods. 1. Epsilon-greedy – nice and simple. 2. Upper Confidence Bound (UCB) – adapts to uncertainty.
22. 1) Epsilon-Greedy
23. Greedy. What do you mean by 'greedy'? Make whatever choice seems best at the moment.
24. Epsilon-Greedy. What do you mean by 'epsilon-greedy'? • Explore – randomly select an action ε percent of the time (say 20%). • Exploit – play greedy (pick the current best) 1 − ε of the time (say 80%).
25. Epsilon-Greedy. For each user: Explore/Learn (20%) – select randomly, like A/B testing; Exploit/Earn (80%) – select the current best (be greedy).
26. Epsilon-Greedy. 20% random, 80% select best action.

   | Action | Value |
   | --- | --- |
   | A | \$5.00 |
   | B | \$4.00 |
   | C | \$3.00 |
   | D | \$2.00 |
   | E | \$1.00 |
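The 20/80 selection rule above fits in a few lines (an illustrative sketch, not Conductrics' code; the function name is made up, and the action values are the ones from the slide):

```python
import random

def epsilon_greedy(values, epsilon=0.2):
    """With probability epsilon explore (uniform random pick);
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.choice(list(values))   # explore/learn: 20% of the time
    return max(values, key=values.get)       # exploit/earn: 80% of the time

# Estimated action values from the slide
values = {"A": 5.00, "B": 4.00, "C": 3.00, "D": 2.00, "E": 1.00}
picks = [epsilon_greedy(values, epsilon=0.2) for _ in range(1000)]
# "A" dominates: chosen on every exploit step plus its share of exploration
```

Note that exploration still spends a fifth of the traffic on the four weaker actions, which is the cost of continuing to learn.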
27. Continuous Sampling. Explore/Learn and Exploit/Earn are interleaved over time rather than run as separate phases.
28. Epsilon-Greedy. – Super simple / low cost to implement – Tends to be surprisingly effective – Less affected by 'seasonality' – Not optimal (hard to pick the best ε) – Doesn't use a measure of variance – Should/how to decrease exploration over time?
29. Upper Confidence Bound. Basic idea: 1) Calculate both the mean and a measure of uncertainty (variance) for each action. 2) Make greedy selections based on mean + uncertainty bonus.
30. Confidence Interval Review. Confidence interval = mean ± z·Std (e.g. the interval from mean − 2·Std to mean + 2·Std).
31. Upper Confidence. Score each option using the upper portion of the interval as a bonus: Mean + Bonus.
32. Upper Confidence Bound. 1) Use the upper portion of the CI as a 'bonus'. 2) Make greedy selections. (Chart: estimated rewards for A, B, C on a \$0–\$10 scale; A has the highest upper bound, so select A.)
33. Upper Confidence Bound. 1) Selecting action 'A' reduces its uncertainty bonus (because there is more data on it). 2) Action 'C' now has the highest score. (Chart: select C.)
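The mean-plus-bonus scoring on these slides can be sketched with the standard UCB1 bonus, sqrt(2 ln N / nₐ), as the uncertainty term (an assumption – the slides don't say which bonus formula is used, and all names and counts here are hypothetical):

```python
import math

def ucb1_select(counts, rewards):
    """UCB1: score each action by mean reward + sqrt(2 ln N / n_a)
    and greedily pick the highest score."""
    total = sum(counts.values())
    for action, n in counts.items():
        if n == 0:
            return action  # play every untried action once before scoring

    def score(action):
        mean = rewards[action] / counts[action]                   # estimated reward
        bonus = math.sqrt(2 * math.log(total) / counts[action])   # uncertainty bonus
        return mean + bonus

    return max(counts, key=score)

# A is well explored (small bonus); C's mean is lower but its bonus is large
counts = {"A": 100, "B": 10, "C": 10}
rewards = {"A": 50.0, "B": 1.0, "C": 4.0}
```

As on the slide, the well-sampled action's bonus has shrunk, so `ucb1_select(counts, rewards)` picks "C" here even though A's mean reward is higher.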
34. Upper Confidence Bound. • Like an A/B test – uses a variance measure. • Unlike an A/B test – no hypothesis test. • Automatically balances exploration with exploitation.
35. Case Study:

   | Treatment | Conversion Rate | Served |
   | --- | --- | --- |
   | V2V3 | 9.9% | 14,893 |
   | V2V2 | 9.7% | 9,720 |
   | V2V1 | 8.0% | 2,441 |
   | V1V3 | 3.3% | 2,090 |
   | V2V3 | 2.6% | 1,849 |
   | V2V2 | 2.0% | 1,817 |
   | V1V1 | 1.8% | 1,926 |
   | V3V1 | 1.8% | 1,821 |
   | V1V2 | 1.5% | 1,873 |