METAGAMING:
Bandits with simple regret and small
budget
Chen-Wei Chou, Ping-Chiang Chou,
Chang-Shing Lee, David Lupien St-Pierre,
Olivier Teytaud, Mei-Hui Wang, Li-Wen Wu
and Shi-Jim Yen
Outline:
- what is a bandit problem ?
- what is a strategic bandit problem ?
- is a strategic bandit different from a bandit ?
- algorithms
- results
What is a bandit problem ?
A finite number of time steps
A (finite) number of options,
each of them equipped with an (unknown) probability distribution
At each time step (exploration: here we collect information, so we explore):
- you choose one option
- you get a reward, distributed according to its probability distribution
At the end (recommendation: here we use the information for the final choice, so we take no risk):
- you choose one option (you cannot change it anymore...)
- your reward is the expected reward associated with this option
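The protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the arms are Bernoulli with hypothetical means passed in `means`, exploration is plain round-robin, and the final recommendation is the empirically best arm. The function name `run_bandit` and its parameters are not from the slides.

```python
import random

def run_bandit(means, budget, seed=0):
    """Explore uniformly for `budget` steps, then recommend one arm.

    `means` holds hypothetical Bernoulli success probabilities; the player
    never reads them directly, they only drive the simulated rewards.
    Returns (recommended arm index, simple regret of that recommendation).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k

    # Exploration: at each time step choose one option and observe a reward.
    for t in range(budget):
        arm = t % k  # round-robin, i.e. uniform exploration
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Recommendation: a single final choice, here the empirically best arm.
    averages = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
    recommended = max(range(k), key=lambda i: averages[i])

    # The final reward is the expected reward of the recommended arm,
    # so the simple regret is the gap to the best expected reward.
    simple_regret = max(means) - means[recommended]
    return recommended, simple_regret
```

Rewards collected during exploration do not enter the returned regret; only the quality of the final choice matters.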
Which kind of bandit ?
- in the bandit literature, options are
also termed “arms”
- here the criterion is the expected reward
of the option chosen at the end (simple regret)
(sometimes it is the sum
of the rewards during exploration: cumulative regret)
- we presented here stochastic bandits
(a probability distribution
per option) ==> next slide is different
And an adversarial bandit ?
A finite number of time steps
A (finite) number of options for player 1,
and a finite number of options for player 2.
An unknown probability distribution for each pair of options
At each time step:
- you choose one option for P1 and one option for P2
- you get a reward, distributed according to the
corresponding probability distribution
At the end:
- you choose one **probabilistic** option (a mixed strategy) for P1
(you cannot change it anymore...)
- your reward is the expected reward associated with this option,
for the worst choice by P2
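The final criterion of the adversarial case can be written down directly. The sketch below assumes the expected rewards are known and stored in a matrix `payoff[i][j]` (P1 plays `i`, P2 plays `j`); in the real problem they are unknown and must be estimated. The function name `worst_case_value` is hypothetical.

```python
def worst_case_value(payoff, mixed):
    """Expected reward of P1's mixed strategy against P2's worst reply.

    payoff[i][j]: expected reward for P1 when P1 plays i and P2 plays j.
    mixed[i]: probability that P1 plays option i (should sum to 1).
    """
    n_rows = len(payoff)
    n_cols = len(payoff[0])
    # P2 picks the column that minimizes P1's expected reward.
    return min(
        sum(mixed[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
```

With the matching-pennies matrix `[[1, 0], [0, 1]]`, any deterministic choice can be fully exploited by P2, whereas the uniform mixture cannot, which is why the recommendation is probabilistic.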
What is meta-gaming ?
What is “strategic choice” ?
Strategic choices:
- decisions once and for all, at a high level
- ≠ tactical level
Meta-gaming: choice at a strategic level, in games:
- choosing cards, in card games
- choosing handicap positioning, in Go
==> once and for all, at the beginning of the game
Example of stochastic bandit
(i.e. 1P strategic choice)
Game of Go handicap bandit problem, at each time step:
- you choose one handicap positioning
- then you simulate one game from this position
==> only one player has a strategic choice
==> stochastic bandit
Example of adversarial bandit
(i.e. 2P strategic choice)
Urban Rivals bandit problem, at each time step:
- you choose
- one set of cards for you (P1)
- one set of cards for P2
- then you simulate one Urban Rivals game from this position
==> two players have a strategic choice
==> adversarial bandit
Is a strategic bandit problem
different from
a classical bandit problem ?
No difference in nature
Just a much
smaller budget
Algorithms
Reminder:
- two algorithms needed:
- one for choosing during N exploration steps
- one for choosing during 1 recommendation step
- two settings
- one-player case
- two-player case
Algorithms for exploration
Uniform: test all options uniformly
Bernstein races:
- sample uniformly among non-discarded options,
- discard options with statistical tests
Successive reject:
- sample uniformly among non-discarded options,
- periodically discard the worst option
UCB: choose the option with the best average result + a bonus
for weakly sampled options
Adaptive-UCB-E: a variant of UCB aimed at removing
hyper-parameters
EXP3: empirically best option + random perturbation
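As an illustration of the "average + bonus" idea, here is a minimal UCB exploration loop. The bonus `c * sqrt(log(t + 1) / n_i)` is a common UCB1-style choice and is an assumption here, not necessarily the exact formula used in the paper; `pull`, `c` and the function name `ucb_explore` are likewise hypothetical.

```python
import math

def ucb_explore(pull, k, budget, c=1.0):
    """Spend `budget` simulations on `k` options with a UCB rule.

    pull(arm) must return one sampled reward for that arm.
    Returns (counts, sums): how often each arm was played and its total reward.
    """
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        if t < k:
            arm = t  # play every option once before using the index
        else:
            # Best average result plus a bonus that is large
            # for weakly sampled options.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + c * math.sqrt(math.log(t + 1) / counts[i]),
            )
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts, sums
```

The returned statistics are what a recommendation rule consumes once the budget is spent.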
Algorithms for recommendation
Empirically Best Arm: choose the empirically best option
Most Played Arm: choose the most simulated option
Successive reject: the only non-discarded option
UCB: choose the option with the best average result + a bonus
for weakly sampled options.
LCB: choose the option with the best average result minus a penalty (malus) for
weakly sampled options.
Empirical distribution of play: an option has its
frequency (during exploration) as probability (for
recommendation)
TEXP3: idem, but discard low-probability options
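Several of these recommendation rules fit in a few lines each. The sketch below takes the per-option play counts and reward sums gathered during exploration. The LCB penalty form and the TEXP3 cutoff `threshold` are assumptions chosen for illustration, since the slides do not give the exact formulas, and all function names are hypothetical.

```python
import math

def recommend_eba(counts, sums):
    """Empirically Best Arm: highest average reward."""
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] if counts[i] else float("-inf"),
    )

def recommend_mpa(counts):
    """Most Played Arm: the option simulated most often."""
    return max(range(len(counts)), key=lambda i: counts[i])

def recommend_lcb(counts, sums, c=1.0):
    """LCB: average reward minus a penalty for weakly sampled options."""
    total = sum(counts)

    def score(i):
        if counts[i] == 0:
            return float("-inf")
        return sums[i] / counts[i] - c * math.sqrt(math.log(total) / counts[i])

    return max(range(len(counts)), key=score)

def recommend_edp(counts):
    """Empirical distribution of play: play frequencies as probabilities."""
    total = sum(counts)
    return [n / total for n in counts]

def recommend_texp3(counts, threshold):
    """Truncated variant: drop low-probability options, then renormalize."""
    probs = recommend_edp(counts)
    kept = [p if p >= threshold else 0.0 for p in probs]
    z = sum(kept)
    if z == 0.0:  # threshold removed everything: fall back to untruncated
        return probs
    return [p / z for p in kept]
```

For example, with `counts = [1, 9, 90]` and `sums = [1.0, 4.5, 54.0]`, the first option has the best average but was played only once: Empirically Best Arm picks it, while Most Played Arm and LCB prefer the heavily sampled third option.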
Experimental results
Big boring tables of results
are in the paper.
Only a sample of the clearest
results is shown here.
One player case
Killall Go stones positioning
Uncertainty should carry a malus in the recommendation.
EXP3 is for the 2-player case.
Experimental results: TEXP3
outperforms EXP3 by far
2-player case, game =
Urban Rivals (free online card game)
Do you know killall-Go ?
Black has stones in advance (e.g. 8 in 13x13).
If white makes life, white wins.
If black kills everything, black wins.
Black chooses the stones'
positioning
(strategic decisions).
Left: human is Black and chooses E3 C4.
Right: computer is Black and chooses D3 D5.
White won both.
Human said that the computer choice D3 D5 is good.
Killall Go, H8 (left) H9 (right)
Left: Human Pro Player (5P) as black has 8 handicap stones.
White (computer) makes life and wins.
Right: Human Pro Player (5P) as black has 9 handicap stones
and kills everything and wins.
CONCLUSIONS
1 player case:
UCB for exploration,
LCB or MPA for recommendation
2 player case:
TEXP3 performs best.
Killall-Go
Win against pro with H2 in 7x7 Killall-Go as white.
Loss against pro with H2 in 7x7 Killall-Go as black.
13x13: Computer won as white with H8, lost with H9.
13x13: Computer lost as black with H8 and with H9.
Further work:
Structured bandit: some options are close to each other.
Batoo: Go with strategic choice for both players; nice test case.
Industry: choosing investments for power grid simulations – in progress.

Choosing between several options in uncertain environments
