Transcript of "Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and some applications"
1.
SOME TOOLS FOR ARTIFICIAL INTELLIGENCE
Olivier Teytaud --- olivier.teytaud@gmail.com
NUTN, Tainan, 2011
2.
Tao (Inria, Cnrs, Lri, Paris-Sud)
People:
 - Permanent staff: 11
 - ~15 Ph.D. students
In Université Paris-Sud:
 - Largest campus in France
 - Faculty of sciences: mathematics, computer science, physics, chemistry, biology, earth and space sciences ==> 12000 students
Inria affiliation:
 - Around 50 years old
 - Devoted to research in comp. science
3.
Tao (Inria, Cnrs, Lri, Paris-Sud)
 - Reservoir computing
 - Optimal decision making under uncertainty
 - Optimization
 - Autonomic computing
 - Machine learning
4.
Communication not always so easy:
 - Many of you speak Chinese + Taiwanese. So English = third language.
 - I am French. English = second language.
 - I work mainly on mathematical aspects of computer science, more than on computer science itself.
Difficulties might also be an enrichment.
Feel free to interrupt me as much as useful.
6.
Vita in a nutshell:
1) First research: mathematical logic.
2) I had fun, but I wanted to be “directly” useful. I switched to Statistics.
3) I had fun, but I wanted to be “more directly” useful. Switched to Operational Research, in industry.
   - Many applications.
   - My favorite: electricity generation.
4) Now (40 dangerously approaching), Artificial Intelligence:
   - Mathematics.
   - Challenges (in particular games).
   - Applications.
8.
Vita in a nutshell (note on Operational Research):
Operational Research goes back to military applications around World War II: the UK resisted Hitler thanks to optimized radars. Now essentially civil applications.
10.
Outline of what I'll discuss:
1) Some concepts:
   - simplified problems
   - toolboxes for these problems
2) Principle:
   - reducing real problems to groups of artificial problems
   - small problems might be considered as artificial and useless when considered alone.
   - but when you solve a clearly stated small problem, usually you can find an application for this solution.
   - we will see applications as well.
   ==> For the moment let's see “big” applications
3) I'll also show some works on which contributors are welcome.
13.
ELECTRICITY GENERATION
The case of France
Data:
 - climate model (stochastic)
 - model of electricity demand (stochastic)
 - model of power plants
Each day we receive:
 - electricity consumption
 - weather information
 - info on faults
Each day, we decide how to distribute the production among the power plants (also: schedule long-term investments).
14.
Data:
 - climate model (stochastic)
 - model of electricity demand (stochastic)
 - model of power plants (PP): nuclear PP (NPP), thermal PP (TPP), hydroelectric PP (HPP)...
Each day we receive:
 - electricity consumption
 - weather information
 - info on faults
Each day, we decide how to distribute the production among the power plants.
[Diagram with boxes: DATA (climate, electric plants, system economy), PROGRAM, STRATEGY, Daily information, Decisions]
15.
One of the most important industrial problems you can imagine: how to produce energy ?
France has specific elements:
 - heavily nuclearized (most nuclearized country in the world)
   - often cooled by rivers (do not work in case of droughts ==> hard to predict)
   - we must schedule maintenance
   - we must take long-term decisions (building new NPP ? Removing ?)
 - also hydroelectricity:
   - should we use water now ?
   - should we keep it for winter (in France, high consumption is in winter)
[Diagram with boxes: DATA (climate, electric plants, system economy), PROGRAM, STRATEGY, Daily information, Decisions]
16.
Problem 1: Taiwan is very different from France :-)
 - Almost no nuclear power plant ? Cooled by sea ?
 - Electrically connected to other countries ? (France might be connected to Africa)
 - Sun sufficient for massive photo-voltaic units ?
 - Wind much stronger than in France: can it be used ?
 - Other questions ?
 - Electricity consumption dominated by air conditioning ?
 - Maybe electric cars in the future ?
 - Climate maybe more regular ? Problem easier than in France ?
==> I don't know
==> I'd like to work on it (energy is an important concern, in Taiwan as well: lack of independence ?)
==> Need Chinese-reading persons
==> Other (Taiwan-independent) concern: tackling partial observation in energy generation problems
17.
GAME OF GO (with Nutn)
GOOD NEWS: we had a lot of progress with **generic** algorithms (algorithms which can be used for many things).
The revolution in Go which occurred in 2007-2009 is a major breakthrough in Artificial Intelligence. We'll see that in detail.
I am a little bit tired of the game of Go, because I have no recent progress, and recent progress in the community comes from Go expertise, which is only useful for Go...
18.
Problem 2: Solving unsolved situations in Go
Now computers are much stronger than in the past. However, they still misunderstand some trivial situations (in particular, liberty races).
You have an idea ? Tell me :-)
We have a solver in France (not for playing Go; aimed at provably solving), that we would like to test on various situations. We do not play Go. If you are 5 kyu or better, you can contribute.
19.
URBAN RIVALS
17 million registered users. Important company.
20.
URBAN RIVALS
 - Choose 4 cards, your opponent chooses 4 cards
 - Each player gets 12 “Pilz” (i.e. strength points)
 - Each player gets health points.
 - Each turn:
   - each player chooses a card
   - each player uses pilz (each used pilz is lost forever, but it gives strength)
   - read cards, apply rules
==> no more health points ? ==> you're dead.
21.
Urban Rivals
==> Partial information because you don't observe your opponent's decisions
==> There are “off the shelf” algorithms and programs for full information games, but not for partial information games.
==> We used a (provable) combination of MCTS and EXP3
==> Immediately human level performance
==> suggests that maths can help
==> still possible works:
   - automatic choice of cards ?
   - reducing comp. cost ?
22.
POKEMONS 皮卡丘
Second most lucrative videogame.
Meta-gaming: choosing your deck.
23.
POKEMONS: Problem 3
Second most lucrative videogame.
Meta-gaming: choosing your deck.
In-gaming: playing with your set of cards.
24.
Problem 4: Solving MineSweeper.
Find an optimal move ?
Looks like a trivial boring problem. It is certainly not.
Many papers with the same approach (so-called CSP technique).
We could outperform these algorithms thanks to a probabilistic approach.
But my approach only works on small boards (or at a huge computational cost) ==> we want to extend.
Quite similar to electricity generation (yes, I believe in this).
25.
Game applications can be considered as childish. Shouldn't we focus on more important things ?
However:
 - If you have a breakthrough in an important game, people will trust you. Doors will be opened when you propose new algorithms for real-world applications.
 - Testing ideas on a nuclear power plant is more dangerous than testing ideas on a game of Go.
 - It's easier to compare approaches in games than in electricity generation.
26.
INTRODUCTION IS OVER.
NOW TECHNICAL STUFF.
REMARKS, QUESTIONS ?
27.
TODAY, GAMES.
1) HOW TO SOLVE THEM
2) C IMPLEMENTATION
28.
ONE FUNDAMENTAL TOOL: ZERMELO
Consider the following game:
 - there are 5 sticks;
 - in turn, each player removes 1 or 2 sticks;
 - the player who removes the last stick loses.
Example:
 Player I: IIIII
 Player II: III
 Player I: I ==> loses!
How should I play ?
29.
ONE FUNDAMENTAL TOOL: ZERMELO
Zermelo proposed a solution (for full-information games).
 - Born in 1871.
 - 1900-1905: major contributions in logic.
 - 1913: major contribution to games.
 - 1931: Optimized navigation (from games to applications).
 - Resigned in 1935 (he did not like Hitler).
 - Died in 1953.
31.
ZERMELO: I HAVE THE OPTIMAL STRATEGY!
[Figure: game tree of the 5-stick game. Each node shows the number of remaining sticks and is labeled WIN! or LOSS!]
32.
ZERMELO: not limited to win/loss games. Can work on games with continuous rewards.
New rule: if the game contains 4, reward is multiplied by 2.
[Figure: game tree of the 5-stick game with numeric labels. Nodes are colored YELLOW or BLUE: for one color, LABEL = MAXIMUM OF CHILDREN's LABELS; for the other, LABEL = MINIMUM OF CHILDREN's LABELS]
33.
ZERMELO: C CODE

#include <values.h>   // MAXDOUBLE (DBL_MAX from <float.h> is the portable equivalent)

struct gameState
{
    int *descriptionOfState;
    int numberOfLegalMoves;
    int *legalMoves;
    int turn;    // 1 if player 1 plays, -1 otherwise
    int result;  // final reward, if numberOfLegalMoves=0
};

// RULES = the rules of the game, to be written for each game
struct gameState next(struct gameState s, int move) { RULES };

double zermeloValue(struct gameState s)
{
    int i; double value;
    double maxValue=-MAXDOUBLE;
    if (s.numberOfLegalMoves==0) return(s.turn * s.result);
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        // value of the child, seen from the player who plays in s
        value=s.turn*zermeloValue(next(s,s.legalMoves[i]));
        if (value>maxValue) maxValue=value;
    }
    return s.turn*maxValue; //we return value for player 1
}
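As a concrete case, the recursion above can be specialized to the 5-stick game. The sketch below is only an illustration: the function name sticksValue and the +1/-1 encoding (value for the player about to move) are assumptions, not taken from the slides.

```c
#include <assert.h>

/* Value of the stick game for the player about to move:
   +1 if that player can force a win, -1 otherwise.
   Rules from the slides: remove 1 or 2 sticks per turn;
   the player who removes the last stick loses. */
int sticksValue(int sticks)
{
    int take, best = -1;
    assert(sticks >= 0);
    /* No stick left: the opponent just removed the last one and lost. */
    if (sticks == 0) return 1;
    for (take = 1; take <= 2 && take <= sticks; take++) {
        /* The opponent moves next, so our value is minus theirs. */
        int v = -sticksValue(sticks - take);
        if (v > best) best = v;
    }
    return best;
}
```

Under the rules as stated, sticksValue(5) evaluates to 1: the player who starts can force a win by removing one stick and leaving 4.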
35.
Last week: Zermelo algorithm.
What is Zermelo ?
 = Simplest algorithm for solving 1-player or 2-player games.
 = Recursive algorithm
 = Conveniently (but slowly) implemented with “struct”
This week
 = a bit more on Zermelo algorithm
 = C development: “static” random variables
Future weeks
 - Still some C implementation (or other languages ? as you wish)
 - Still some (not always easy) algorithms
 - Models of applications
I hope I can convince you that operational research / artificial intelligence are useful and fun.
36.
Zermelo again.
What does the “zermeloValue()” function return ?
===> The reward in case of perfect play.
===> A perfect strategy.
===> Gods can run Zermelo algorithms: perfect play.
==> humans have no time for this.
==> Can we design a new version in case it is too slow ?
37.
Let's see a pseudo-code, instead of a code.
double zermeloValue(struct gameState s)
{
    if (s is end of game) then return score.
    else
    {
        if (player 1 plays) then return max(zermeloValue(children))
        else return min(zermeloValue(children))
    }
}
39.
ZERMELO: C CODE FOR THE DEPTH
double zermeloValue(struct gameState s)
{
    static int depth=0;   // current depth of the recursion, shared by all calls
    int i; double value;
    double maxValue=-MAXDOUBLE;
    if (s.numberOfLegalMoves==0) return(s.turn * s.result);
    depth++;
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        value=s.turn*zermeloValue(next(s,s.legalMoves[i]));
        if (value>maxValue) maxValue=value;
    }
    depth--;
    return s.turn*maxValue; //we return value for player 1
}
40.
Sometimes it is too slow. Then, what can I do ?
43.
We will not go below this depth.
But, what should zermeloFunction return ?
44.
double zermeloValue(struct gameState s)
{
    static int depth=0;
    int i; double value;
    double maxValue=-MAXDOUBLE;
    if (s.numberOfLegalMoves==0) return(s.turn * s.result);
    if (depth>5) return drand48();   // Should we return a random number ?
    depth++;
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        value=s.turn*zermeloValue(next(s,s.legalMoves[i]));
        if (value>maxValue) maxValue=value;
    }
    depth--;
    return s.turn*maxValue; //we return value for player 1
}
45.
double zermeloValue(struct gameState s)
{
    static int depth=0;
    int i; double value;
    double maxValue=-MAXDOUBLE;
    if (s.numberOfLegalMoves==0) return(s.turn * s.result);
    if (depth>5) return heuristicValue(s);   // A function written by some expert of the game.
    depth++;
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        value=s.turn*zermeloValue(next(s,s.legalMoves[i]));
        if (value>maxValue) maxValue=value;
    }
    depth--;
    return s.turn*maxValue; //we return value for player 1
}
46.
SHANNON and games
This idea is a main contribution by Shannon (for European chess).
Shannon 1916-2001
Noble prize (not Nobel!)
Works in:
 - Logic
 - Games (also: artificial mouse for mazes)
 - Financial analysis
50.
ALPHA-BETA
PRINCIPLE OF ALPHA-BETA:
In zermeloFunction, considering an opponent node, if I know:
 - THAT AT PREVIOUS DEPTH, I CAN REACH SCORE ALPHA=6,
 - THAT IN CURRENT STATE MY OPPONENT CAN ENSURE SCORE BETA<6,
I CAN STOP STUDYING THIS BRANCH.
==> THIS IS AN “ALPHA-CUTOFF”
==> OTHER PLAYER: “BETA-CUTOFF” (just exchange players)
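The cutoff rule above can be sketched on a tiny explicit tree. Everything in this example is an assumption made for illustration: the leaf rewards, the names alphaBeta, solveToyTree and visitedLeaves, and the choice of a complete binary tree of depth 3. It is a minimal "fail-hard" variant, not necessarily the version used in the course.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical toy tree: complete binary tree of depth 3, given by its
   8 leaf rewards (from player 1's point of view). */
double leaves[8] = {3, 5, 6, 9, 1, 2, 0, 7};
int visitedLeaves = 0;   /* counts evaluated leaves, to show the pruning */

/* node: index of the node within its level; depth: levels left below it.
   alpha: score player 1 can already ensure elsewhere;
   beta: score the opponent can already ensure elsewhere. */
double alphaBeta(int node, int depth, double alpha, double beta, int maximizing)
{
    int c;
    assert(depth >= 0 && depth <= 3);
    if (depth == 0) { visitedLeaves++; return leaves[node]; }
    for (c = 0; c < 2; c++) {
        double v = alphaBeta(2 * node + c, depth - 1, alpha, beta, !maximizing);
        if (maximizing && v > alpha) alpha = v;
        if (!maximizing && v < beta) beta = v;
        if (alpha >= beta) break;   /* cutoff: this branch cannot matter */
    }
    return maximizing ? alpha : beta;
}

/* Player 1 (the maximizer) plays at the root. */
double solveToyTree(void)
{
    visitedLeaves = 0;
    return alphaBeta(0, 3, -DBL_MAX, DBL_MAX, 1);
}
```

On this toy tree, solveToyTree() returns 5, the same value as plain minimax, while evaluating only 5 of the 8 leaves.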
52.
EXAMPLE OF GAME (we can discuss why it is a good game)
 - Randomly generate a 4x4 matrix with 0 and 1 (K=4).
     0011
     1001
     0111
     1000
 - Player one removes top part or bottom part
     0111
     1000
 - Player two removes left part or right part
     01
     10
 - Player one removes top part or bottom part
     01
 - Player two removes left part or right part
     0
==> Player one wins if 1, player two wins if 0!
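This game is small enough to be solved exactly by the max/min recursion. The sketch below is one possible encoding, with hypothetical names (halvingValue, slideMatrix); it assumes K is a power of two and that player one always cuts rows while player two always cuts columns.

```c
#include <assert.h>

#define K 4

/* The matrix shown on the slide. */
int slideMatrix[K][K] = {
    {0, 0, 1, 1},
    {1, 0, 0, 1},
    {0, 1, 1, 1},
    {1, 0, 0, 0}
};

/* Returns 1 if player one can force the last remaining cell to be 1,
   0 otherwise. The remaining block starts at (r0, c0) and has the given
   number of rows and columns. playerOne is 1 when player one moves. */
int halvingValue(int m[K][K], int r0, int rows, int c0, int cols, int playerOne)
{
    int a, b;
    assert(rows >= 1 && cols >= 1);
    if (rows == 1 && cols == 1) return m[r0][c0];
    if (playerOne) {
        /* keep the top half or the bottom half: player one maximizes */
        a = halvingValue(m, r0, rows / 2, c0, cols, 0);
        b = halvingValue(m, r0 + rows / 2, rows / 2, c0, cols, 0);
        return a > b ? a : b;
    }
    /* keep the left half or the right half: player two minimizes */
    a = halvingValue(m, r0, rows, c0, cols / 2, 1);
    b = halvingValue(m, r0, rows, c0 + cols / 2, cols / 2, 1);
    return a < b ? a : b;
}
```

For the matrix of the slide, halvingValue(slideMatrix, 0, K, 0, K, 1) gives 0: with perfect play, player two wins.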
53.
POSSIBLE HOME WORK
1) ZERMELO: can you implement it on a simple game ?
2) MINIMAX: can you add a heuristic function ? Which heuristic function ?
   Experiments: plot a graph:
     X(depth) = computation time of minimax (divided by Zermelo's computation time)
     Y(depth) = win rate against Zermelo
3) ALPHA-BETA: can you modify it ==> alpha-beta pruning ?
   Plot a graph for various sizes:
     X = number of visited nodes
     Y = average winning rate of alpha-beta vs minimax
   Or
     X = depth
     Y = average winning rate of a-b vs a-b with depth -1
57.
APPLICATION OF ZERMELO
WE HAVE SEEN THE 5-STICKS GAME. CAN WE FIND A REALLY USEFUL APPLICATION ?
I have:
 - water
 - plants (which need water during the summer's heat wave)
Actions = giving water to plants, or not.
58.
APPLICATION OF ZERMELO
I have:
 - water
 - plants (which need water during the summer's heat wave)
Each day, I choose an action.
State = { date + water level in stock + water level in plants }
Reward = quality / quantity of production.
60.
IMPORTANT REMARK:
 - Maybe this does not look serious.
 - But heat waves are a serious problem.
 - Here the problem is simplified, but the concepts for the real application are the same.
 - Applying this just requires a computer and data/models about plants/water resources.
==> if you can apply Zermelo variants correctly, you can help build a better world.
61.
However, the “nextState” function is randomized ==> we need a Zermelo for this case
62.
This is Zermelo, adapted to stochastic games.
References:
 - Massé
 - Bellman
s.turn == 0: action is randomly chosen.
double zermeloValue(struct gameState s)
{
    int i; double value;
    static int depth=0;
    if (s.turn==0)   // chance node: average over the equally likely outcomes
    {
        value=0;
        for (i=0;i<s.numberOfLegalMoves;i++)
            value+=zermeloValue(next(s,s.legalMoves[i]));
        return value/s.numberOfLegalMoves;
    }
    double maxValue=-MAXDOUBLE;
    if (s.numberOfLegalMoves==0) return(s.turn * s.result);
    if (depth>5) return heuristicValue(s);
    depth++;
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        value=s.turn*zermeloValue(next(s,s.legalMoves[i]));
        if (value>maxValue) maxValue=value;
    }
    depth--;
    return s.turn*maxValue; //we return value for player 1
}
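To make the watering example concrete, here is a minimal sketch of the same idea (maximize over my actions, average over random events) on a toy model. All modelling choices are hypothetical and only serve the illustration: a horizon of 3 days, rain with probability 1/2 each day, one unit of water used by the plant per day, and a reward of 1 for each day on which the plant has water.

```c
#include <assert.h>

#define DAYS 3   /* hypothetical horizon */

/* Expected total production under optimal decisions.
   day: current day; stock: units of water I can still give;
   moisture: units of water currently available to the plant. */
double waterValue(int day, int stock, int moisture)
{
    double best = -1.0;   /* lower than any reachable value */
    int give;
    assert(stock >= 0 && moisture >= 0);
    if (day == DAYS) return 0.0;
    /* Decision node: give one unit of water (if any is left) or not. */
    for (give = 0; give <= 1 && give <= stock; give++) {
        /* Chance node: no rain or rain, each with probability 1/2. */
        double total = 0.0;
        int rain;
        for (rain = 0; rain <= 1; rain++) {
            int m = moisture + give + rain;
            double reward = (m > 0) ? 1.0 : 0.0;     /* the plant grows if it has water */
            int nextMoisture = (m > 0) ? m - 1 : 0;  /* the plant uses one unit per day */
            total += 0.5 * (reward + waterValue(day + 1, stock - give, nextMoisture));
        }
        if (total > best) best = total;
    }
    return best;
}
```

In this toy model, waterValue(0, 3, 0) is 3 (enough water for every day), while waterValue(0, 0, 0) is 1.5 (no stock, so production depends on rain only).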
63.
ONE MORE TOOL: MATRIX GAMES
The problem: Solving Matrix Games.
A solution: EXP3.
64.
What is a (0-sum) Matrix Game ?
Example:
        1 0 0
    M = 0 1 1
        1 0 1
 - You choose (privately) a row (i is 1, 2 or 3).
 - At the same time, I choose (privately) a column (j = 1, 2 or 3).
 - My reward: M(i,j)
 - Your reward: -M(i,j)
I want a 1, you want a 0.
Given M, how should I play ?
65.
What is a (0-sum) Matrix Game ?
Example: rock-paper-scissors
                  Rock  Paper  Scissors
        Rock        0    -1       1
    M = Paper       1     0      -1
        Scissors   -1     1       0
 - You choose (privately) a row (i is 1, 2 or 3).
 - At the same time, I choose (privately) a column (j = 1, 2 or 3).
 - My reward: M(i,j)
 - Your reward: -M(i,j)
I want a 1, you want a -1.
Given M, how should I play ?
66.
Given M, how should I play ?
Nash (diagnosed with paranoid schizophrenia) got a Nobel prize for his work around that.
Principle of a Nash equilibrium:
 - pure strategy = “fixed” strategy (e.g. “play scissors”)
 - mixed strategy = randomized strategy (e.g. “play scissors with probability ½ and play rock with probability ½”)
 - choose the mixed strategy such that “the worst possible score against any opponent strategy is maximum”
   ==> “Nash” strategy
==> EXP3: algorithm for finding Nash strategies.
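The "worst possible score" of a mixed strategy can be computed directly, because the worst case is always reached by some pure reply of the opponent. The sketch below does this for rock-paper-scissors. The names (worstCase, rps, uniform, rockOrScissors) are hypothetical, and M[i][j] is read here as the reward of the player who picks row i, which is an assumption (it is the reading under which paper beats rock in the matrix as printed).

```c
#include <assert.h>

#define K 3

/* Rock-paper-scissors, rows and columns in the order rock, paper, scissors. */
double rps[K][K] = {
    { 0, -1,  1},
    { 1,  0, -1},
    {-1,  1,  0}
};

double uniform[K]        = {1.0 / 3, 1.0 / 3, 1.0 / 3};
double rockOrScissors[K] = {0.5, 0.0, 0.5};   /* mixed strategy quoted on the slide */

/* Worst expected reward of the mixed strategy p (over rows),
   taken over all pure replies (columns) of the opponent. */
double worstCase(double M[K][K], const double p[K])
{
    double worst = 0.0;
    int i, j;
    for (j = 0; j < K; j++) {
        double expected = 0.0;
        for (i = 0; i < K; i++) {
            assert(p[i] >= 0.0);
            expected += p[i] * M[i][j];
        }
        if (j == 0 || expected < worst) worst = expected;
    }
    return worst;
}
```

Here worstCase(rps, rockOrScissors) is -0.5 (the opponent answers with rock), whereas the uniform strategy has worst case 0. No strategy can guarantee more than 0 in this symmetric game, so the uniform strategy is the Nash strategy.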
67.
IMPORTANT FACTS ON GAMES:
 - Turn-based, full-information games: solvers exist:
   - Too slow for chess, Go.
   - Ok for 8x8 checkers.
   ==> Zermelo
   ==> variants: Minimax, Alpha-beta, play many games reasonably well
 - Matrix games:
   - Nash strategies = worst-case optimal
   - Nash strategies = randomized strategies
68.
A BETTER EXAMPLE ? POKEMON.
Each player chooses 2 pokemons among the 3 possible ones (real life: 3 or 4 among hundreds).
69.
A BETTER EXAMPLE ? POKEMON.
Three possibilities:
70.
A BETTER EXAMPLE ? POKEMON.
Three possibilities (the same as choosing a row in a 3x3 matrix game).
[Figure: 3x3 table, with the choices of Player 1 and Player 2 on the two axes]
Check who wins (by some full-observation game-solver).
71.
A BETTER EXAMPLE ? POKEMON.
Three possibilities (the same as choosing a row in a 3x3 matrix game).
Winner for each pair of choices (Player 1 and Player 2 on the two axes):
    P1 P2 P2
    P2 P1 P1
    P1 P2 P1
72.
A BETTER EXAMPLE ? POKEMON.
Three possibilities (the same as choosing a row in a 3x3 matrix game).
The same table as a matrix (Player 1 and Player 2 on the two axes):
    1 0 0
    0 1 1
    1 0 1
73.
EXP3 principle for Nash equilibrium of KxK matrix M:
 - choose a number N of iterations
 - S1 = null vector
 - S2 = null vector
 - at each iteration t=1, ..., t=N:
   {
     - compute p1 as a function of S1 // we will see how
     - compute p2 as a function of S2 // we will see how
     - randomly draw i according to probability distribution p1
     - randomly draw j according to probability distribution p2
     - define r=M(i,j) in the matrix
     - S1(i) += r / p1(i)
     - S2(j) += (1-r) / p2(j)
     - Player1Nash(i) += (1/N);
     - Player2Nash(j) += (1/N);
   }
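The transcript does not include the slide that defines p1 and p2. For reference, a form commonly used for EXP3 is the following mixture of exponential weights and uniform exploration. The parameters \eta > 0 (learning rate) and \gamma in (0,1] (exploration rate) are assumptions here, and the formula actually used in the course may differ.

```latex
p_1(i) = \frac{\gamma}{K} + (1-\gamma)\,
         \frac{\exp\big(\eta\, S_1(i)\big)}{\sum_{k=1}^{K} \exp\big(\eta\, S_1(k)\big)},
\qquad
p_2(j) = \frac{\gamma}{K} + (1-\gamma)\,
         \frac{\exp\big(\eta\, S_2(j)\big)}{\sum_{k=1}^{K} \exp\big(\eta\, S_2(k)\big)}
```

With this choice, every p1(i) and p2(j) is at least \gamma / K, so the divisions r / p1(i) and (1-r) / p2(j) in the loop are always well defined.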
76.
Q&A: (my questions, and also yours)
Q: Who cares about matrix games ?
A: Useful for many things. Unfortunately, it's usually a building block inside more complex algorithms. We will see examples, but later.
Q: Is a Nash strategy optimal ?
A: It depends for what... It is optimal in a worst case sense (i.e. against a very strong opponent). Not necessarily very good against a weak opponent.