win rate first search


Discussion of Monte-Carlo search methods for computer games, especially shogi (Japanese chess), presenting "win rate first search."

Transcript

  • 1. From Monte-Carlo to win rate first search for “Dobutsu Shogi” 2010/05/22 IHARA Takehiro
  • 2. Abstract • On an algorithm for computer shogi (Japanese chess) • Contents – Exhibition of Dobutsu Shogi – Min-max method (conventional) – Monte-Carlo method (conventional) – Win rate first search (presented)
  • 3. Dobutsu Shogi • These slides discuss computer game algorithms using Dobutsu Shogi • Dobutsu Shogi: a miniature shogi • Shogi: Japanese chess • Dobutsu: animal • Normal shogi is too large for testing new methods
  • 4. Rules of Dobutsu Shogi 1 • Five kinds of pieces • The initial position is as shown in the figure • You win if you capture the opponent's lion • You win if your lion reaches the opposite end • A chick promotes to a chicken
  • 5. Rules of Dobutsu Shogi 2 • All pieces move one step at a time • Chicken: vertical, horizontal and forward-diagonal • Chick: forward • Lion: any of the 8 surrounding squares • Elephant: diagonal • Giraffe: vertical and horizontal • You can reuse (drop) the pieces that you took
  • 6. Copyright of Dobutsu Shogi • I do not know who holds the copyright – FUJITA Maiko (illustration) – KITAO Madoka (rule design) – LPSA (to which the two designers belonged) – GENTOSHA Education (toy seller)
  • 7. Illustrations in these slides • Because of this complex copyright situation, these slides use illustrations from the website below instead of FUJITA's • “SOZAIYA JUN” • (http://park18.wakwak.com/~osyare/)
  • 8. Exhibition: initial position • Black: win rate first search (presented) • White: min-max method, search depth 9, evaluation function composed of piece values only (conventional)
  • 9. Exhibition 1st move: Black advanced giraffe
  • 10. Exhibition 2nd move: White advanced giraffe
  • 11. Exhibition 3rd move: Black took chick with chick
  • 12. Exhibition 4th move: White took chick with elephant
  • 13. Exhibition 5th move: Black advanced elephant
  • 14. Exhibition 6th move: White dropped chick for defense
  • 15. Exhibition 7th move: Black moved giraffe backward
  • 16. Exhibition 8th move: White advanced giraffe
  • 17. Exhibition 9th move: Black dropped chick for defense
  • 18. Exhibition 10th move: White took elephant with giraffe
  • 19. Exhibition 11th move: Black took giraffe with lion
  • 20. Exhibition 12th move: White dropped elephant. This paired-elephant formation is strong
  • 21. Exhibition 13th move: Black's lion escaped
  • 22. Exhibition 14th move: White advanced lion
  • 23. Exhibition 15th move: Black dropped giraffe, giving check
  • 24. Exhibition 16th move: White's lion escaped
  • 25. Exhibition 17th move: Black advanced giraffe, forcing White to choose between taking the giraffe and moving the elephant to safety
  • 26. Exhibition 18th move: White took giraffe with elephant
  • 27. Exhibition 19th move: Black took elephant with lion
  • 28. Exhibition 20th move: White dropped giraffe
  • 29. Exhibition 21st move: Black dropped elephant behind lion
  • 30. Exhibition 22nd move: White moved elephant backward
  • 31. Exhibition 23rd move: Black advanced elephant
  • 32. Exhibition 24th move: White gave check with giraffe
  • 33. Exhibition 25th move: Black took giraffe with elephant
  • 34. Exhibition 26th move: White took elephant with chick. If White had taken with elephant, White would have been mated
  • 35. Exhibition 27th move: Black's lion escaped
  • 36. Exhibition 28th move: White dropped elephant
  • 37. Exhibition 29th move: Black gave check with giraffe
  • 38. Exhibition 30th move: White took giraffe with elephant
  • 39. Exhibition 31st move: Black took chick with lion, and White resigned. The expected continuation: White drops giraffe beside the lion, Black's giraffe takes elephant with check, White's lion takes it, Black's chick advances, White's lion moves backward, Black drops chick, checkmate
  • 40. Min-max method • A conventional method • Currently the most successful method for shogi • Explained with a tree structure from the next slide
  • 41. Min-max • Example: depth 3 • [Figure: game tree with the present board position at the root, followed by the board positions after 1, 2 and 3 moves]
  • 42. Min-max • Suppose the scores after 3 moves are known: -8, 23 | 5, -9 | 3, 10 | -3, -4 (grouped by parent node)
  • 43. Min-max • The score after 2 moves is the maximum of the children's scores: 23, 5 | 10, -3
  • 44. Min-max • The score after 1 move is the minimum of the children's scores: 5, -3
  • 45. Min-max • Select the move having the maximum score: 5
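
The worked example on slides 41 to 45 can be reproduced with a few lines of Python. This is only a minimal sketch: the nested-list tree encoding and the names `minimax`, `tree` and `move_scores` are assumptions made for illustration, and the pairing of the leaf scores follows the maxima shown on slide 43.

```python
from typing import Union

Tree = Union[int, list]  # a leaf score, or a list of child subtrees


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the backed-up score of `node`."""
    if isinstance(node, int):
        return node  # leaf: the score is already known
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


# Leaf scores after 3 moves, grouped by parent as in the slide example.
tree = [
    [[-8, 23], [5, -9]],   # subtree reached by the first candidate move
    [[3, 10], [-3, -4]],   # subtree reached by the second candidate move
]

# After our first move the opponent chooses (minimum); after the
# opponent's move we choose again (maximum).
move_scores = [minimax(child, maximizing=False) for child in tree]
best_move = max(range(len(tree)), key=lambda i: move_scores[i])
print(move_scores, best_move)  # [5, -3] 0
```

The first move is selected because its backed-up score, 5, is the larger of the two.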
  • 46. Min-max method • In theory, you can select the move that has the maximum score after N moves • In theory, if we could obtain the scores at the end of the game, we would always win the game • In practice, the computational cost is too large to calculate all moves
  • 47. Min-max method • Many methods for reducing the computational cost have been proposed, but they are not covered in these slides (reducing the number of searched nodes is called pruning)
  • 48. Conclusion of min-max method • It uses tree structure • Scores after N moves are needed • Pruning is needed
  • 49. Monte-Carlo method • I do not know the history of the Monte-Carlo method, but it has been successful for computer Go (more precisely, Monte-Carlo tree search has been successful) • It is said to be still difficult to apply to computer shogi (or other chess-like games)
  • 50. Outline of Monte-Carlo • Repeat random moves • Then the game finishes and the winner is revealed • Playing a game to its end by random moves is called a playout • [Figure: a first move, followed by random moves (the playout), down to the end of the game]
  • 51. Outline of Monte-Carlo • Repeat playouts • Obtain the win rate of each first move • Win rate = (number of wins) / (number of playouts) • Finally, select the move having the highest win rate
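
The outline on slides 50 and 51 might be sketched as follows. Since the slides give no code, the game here is a toy chosen only for brevity (take 1 to 3 stones, whoever takes the last stone wins), and the function names and the number of playouts are assumptions.

```python
import random


def legal_moves(stones: int) -> list[int]:
    """Toy game (an assumption): take 1 to 3 stones; taking the last stone wins."""
    return [n for n in (1, 2, 3) if n <= stones]


def playout(stones: int, player: int, rng: random.Random) -> int:
    """Play random moves to the end of the game and return the winner (0 or 1).

    `player` is the side to move; `stones` must be at least 1.
    """
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player  # this player took the last stone
        player = 1 - player


def monte_carlo_move(stones: int, playouts_per_move: int, rng: random.Random):
    """Estimate the win rate of each first move of player 0; return the best."""
    win_rate = {}
    for move in legal_moves(stones):
        remaining = stones - move
        wins = 0
        for _ in range(playouts_per_move):
            # The first move may end the game at once; otherwise player 1 moves.
            if remaining == 0 or playout(remaining, 1, rng) == 0:
                wins += 1
        win_rate[move] = wins / playouts_per_move  # wins / playouts
    best = max(win_rate, key=win_rate.get)
    return best, win_rate


best, rates = monte_carlo_move(3, 200, random.Random(0))
print(best, rates)
```

With 3 stones left, taking all 3 wins at once, so that move has win rate 1 and is selected.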
  • 52. Outline of Monte-Carlo • That is the whole outline • For Go, this method became stronger by combining it with a tree structure, giving Monte-Carlo tree search (not covered in these slides) • Another improvement is that playouts use moves chosen with knowledge of Go instead of simple random moves
  • 53. Example of knowledge of Go • Observe 3x3 squares • Set a low probability of placing a black stone at the center of the upper figure • Set a high probability of placing a black stone at the center of the lower figure
  • 54. Monte-Carlo for shogi • The simple Monte-Carlo method does not work for shogi (too many bad moves appear) • The cause must be that, in shogi, only a few of the legal moves are good • I do not want to use knowledge of shogi, whether obtained by machine learning or set manually
  • 55. Why Monte-Carlo for shogi • It can determine the move from the result at the end of the game, which seems beautiful • No evaluation function is needed, no preset knowledge is needed
  • 56. Discussion: Monte-Carlo using a tree • Simple random moves lead to equal win rates for green and red • The truth is that green wins and red loses • This shows the importance of the tree structure
  • 57. Discussion: Monte-Carlo using a tree • Suppose you obtain the win rates after 3 moves: 0.1, 0.3 | 0.7, 0.8 | 0.2, 0.6 | 0.9, 0.4 • Obtain the win rates of green and red from these 3-moves-ahead rates by playouts
  • 58. Discussion: Monte-Carlo using a tree • Ideally the rates are equal to those of the min-max method • After 1 move: 0.3, 0.6 • After 2 moves: 0.3, 0.8 | 0.6, 0.9 • After 3 moves: 0.1, 0.3 | 0.7, 0.8 | 0.2, 0.6 | 0.9, 0.4
  • 59. Discussion: Monte-Carlo using a tree • Q: How do you calculate the parent node's 0.6 from the children nodes' 0.2 and 0.6? • A: Ignore 0.2
  • 60. Discussion: Monte-Carlo using a tree • Q: How do you ignore 0.2? • A1: Always search the node with the maximum win rate (0.6) • A2: Sometimes search a node at random
  • 61. Discussion: Monte-Carlo using a tree • Search the node that has the maximum win rate • Win rates after 3 moves: 0.1, 0.3 | 0.7, 0.8 | 0.2, 0.6 | 0.9, 0.4 • This tactic finds the best path
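
The point of slides 59 and 60 can be checked with a small simulation. It is a sketch under assumptions not stated on the slides: each child is modelled as a coin with a fixed true win rate, unvisited children are treated as having win rate 1, and the function name, trial count and random-move probability are all hypothetical.

```python
import random


def parent_win_rate(child_p, trials, epsilon, rng):
    """Observed win rate at a parent whose children have true win rates `child_p`.

    Each trial picks one child and records a win with that child's probability.
    With probability `epsilon` the child is picked at random; otherwise the
    child with the highest learned win rate is picked (unvisited counts as 1.0).
    """
    wins = [0] * len(child_p)
    visits = [0] * len(child_p)
    parent_wins = 0
    for _ in range(trials):
        if rng.random() < epsilon:
            i = rng.randrange(len(child_p))
        else:
            rates = [w / v if v else 1.0 for w, v in zip(wins, visits)]
            i = rates.index(max(rates))
        won = rng.random() < child_p[i]
        visits[i] += 1
        wins[i] += won
        parent_wins += won
    return parent_wins / trials


rng = random.Random(0)
uniform = parent_win_rate([0.2, 0.6], 20000, epsilon=1.0, rng=rng)   # purely random search
greedy = parent_win_rate([0.2, 0.6], 20000, epsilon=0.05, rng=rng)   # mostly maximum win rate
print(round(uniform, 2), round(greedy, 2))
```

Purely random search makes the parent's rate drift toward the average of the children (about 0.4), while searching the maximum-win-rate child almost always pulls it close to 0.6, which is the sense in which 0.2 is "ignored".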
  • 62. Win rate first search • Remember the win rate of each searched node • Almost always search the node that has the maximum win rate • Sometimes search randomly (ideally this would not be needed) • Then this algorithm finds the best move
  • 63. Additional explanation • Update the win rates at every playout • Keep each win rate as a numerator and a denominator • Add a constant to both the numerator and the denominator when the playout is won • Add the constant to the denominator only when the playout is lost
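
Slides 62 and 63 could be sketched roughly as below. This is not the author's implementation: the toy game (take 1 to 3 stones, last stone wins), the node key `(stones_left, mover)`, and the values of `C` and `EPSILON` are assumptions for illustration. Unsearched nodes are given win rate 1 here, as a simple default.

```python
import random

C = 1.0        # constant added to numerator/denominator (value is an assumption)
EPSILON = 0.1  # probability of a random move in a playout (value is an assumption)


def legal_moves(stones):
    """Toy game (an assumption): take 1 to 3 stones; taking the last stone wins."""
    return [n for n in (1, 2, 3) if n <= stones]


def win_rate(table, node):
    """Learned win rate of a node; nodes not searched yet count as 1."""
    num, den = table.get(node, (0.0, 0.0))
    return num / den if den else 1.0


def playout(table, stones, rng):
    """Run one playout with player 0 to move, then update every visited node."""
    player, path = 0, []
    while stones > 0:
        # A node is (stones left after the move, player who made the move).
        children = [(stones - m, player) for m in legal_moves(stones)]
        if rng.random() < EPSILON:
            node = rng.choice(children)  # sometimes search randomly
        else:
            node = max(children, key=lambda n: win_rate(table, n))  # win rate first
        path.append(node)
        stones, player = node[0], 1 - player
    winner = 1 - player  # whoever made the last move took the last stone
    for node in path:
        num, den = table.get(node, (0.0, 0.0))
        # Won playout: add C to both parts; lost playout: denominator only.
        table[node] = (num + C if node[1] == winner else num, den + C)


def best_move(stones, playouts, rng):
    """Learn win rates by repeated playouts and return the best first move."""
    table = {}
    for _ in range(playouts):
        playout(table, stones, rng)
    return max(legal_moves(stones), key=lambda m: win_rate(table, (stones - m, 0)))


print(best_move(5, 2000, random.Random(0)))
```

In this toy game the winning move from 5 stones is to take 1 (leaving 4), and the sketch should settle on it once the opponent's replies have also been learned.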
  • 64. Problems of the presented method • How to set win rates for nodes that have not been searched is discussed from the next slide • Many other issues must be hiding, though I have not identified them
  • 65. Unreached node • The problem is a node that has not been searched and so has no win rate • [Figure: sibling nodes with win rates 0.4, 0.6, 0.3 and one unreached node]
  • 66. Another win rate • Up to this slide, no knowledge of shogi has appeared and only the graph has been used • This win rate uses knowledge of shogi • The win rate is calculated by kind of move • For example, taking a piece, promotion, etc.
  • 67. Another win rate • Calculate the win rate from these factors – Piece position before and after the move – Kinds of the moving piece and the taken piece – Whether the position is controlled or not • A win rate table for all combinations of these factors is prepared • These win rates are learned by playouts; their values are not preset
  • 68. Another smaller win rate • Another, smaller win rate table is prepared – Kinds of the moving piece and the taken piece – Whether the position is controlled or not • Since it is small, it learns fast • It is used when the larger table's win rate has not been learned yet • If none of the three kinds of win rate has been learned, let the win rate be 1
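
The fallback order described on slides 66 to 68 might look like the following sketch. The `Move` fields, the `RateTable` class and the key functions are hypothetical; in particular, reading "controlled" as a property of the destination square is an assumption, since the slides do not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """Hypothetical move features, loosely following the factors on slide 67."""
    from_sq: int
    to_sq: int
    piece: str
    captured: str      # "" when nothing is taken
    controlled: bool   # assumed: whether the destination square is controlled


class RateTable:
    """Numerator/denominator win rates keyed by any hashable key."""

    def __init__(self):
        self.counts = {}

    def update(self, key, won, c=1.0):
        num, den = self.counts.get(key, (0.0, 0.0))
        self.counts[key] = (num + c if won else num, den + c)

    def get(self, key):
        num, den = self.counts.get(key, (0.0, 0.0))
        return num / den if den else None  # None means "not learned yet"


def large_key(m):
    return (m.from_sq, m.to_sq, m.piece, m.captured, m.controlled)


def small_key(m):
    return (m.piece, m.captured, m.controlled)


def estimated_win_rate(node_key, move, nodes, large, small):
    """Node win rate, else the larger table, else the smaller table, else 1."""
    candidates = (
        nodes.get(node_key),
        large.get(large_key(move)),
        small.get(small_key(move)),
    )
    for rate in candidates:
        if rate is not None:
            return rate
    return 1.0


nodes, large, small = RateTable(), RateTable(), RateTable()
move = Move(from_sq=7, to_sq=4, piece="chick", captured="chick", controlled=False)
print(estimated_win_rate("n1", move, nodes, large, small))  # 1.0, nothing learned yet
```

Because the smaller table has fewer keys, each key is updated more often, which is why it can supply an estimate before the larger table or the node itself has one.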
  • 69. Conclusion of presented method • Win rates of all searched nodes are remembered and learned by playouts • During a playout, select the node that has the highest win rate (“win rate first search”) • Sometimes select a node randomly • If a node's win rate has not been learned, the other win rates are used
  • 70. Conditions of the simulation games • Win rate first search vs. simple min-max method (evaluation function composed of piece values only) • If a game continues to 80 moves, it is regarded as a draw (special rule for this simulation)
  • 71. Result of simulation 1 • Win-loss record of the presented method over 100 games (some games were draws) • Depth of the min-max method is 6 • The more playouts there are, the stronger the method is
      Number of playouts          | 10000 | 30000 | 100000
      Presented method as black   | 22-76 | 44-52 | 48-49
      Presented method as white   | 16-81 | 30-68 | 61-35
  • 72. Result of simulation 2 • Win-loss record of the presented method over 100 games (some games were draws) • 100000 playouts for the presented method • Its strength is almost the same as depth-6 min-max
      Depth of min-max            | 4     | 5     | 6     | 7     | 8     | 9
      Presented method as black   | 94-6  | 77-20 | 48-49 | 37-61 | 24-73 | 14-85
      Presented method as white   | 78-21 | 78-20 | 61-35 | 38-57 | 40-52 | 20-74
  • 73. Impressions of a human viewer • The presented method frequently plays bad moves • Although it is a variation of the Monte-Carlo method, it can find mating sequences • It is good at finding a narrow path • A difference in the number of playouts shows clearly as a difference in strength
  • 74. Conclusion and future issues • Conclusion – Playouts by win rate first – Select moves without preset knowledge – Select moves by the results of playouts • Future – Someone can apply it to Go or other chess-like games – I will return to research on speech signal processing
