Published on

Presented at the “โค้ดชิวๆ : หัดเขียน A.I. แบบ AlphaGo กันแบบชิวๆ” event (“Chill Coding: Writing an AlphaGo-style A.I.”) on 25 Apr 2016 at Launchpad.

Published in: Science
  1. หัดเขียน A.I. แบบ AlphaGo กันชิวๆ (Writing an AlphaGo-style A.I., the Chill Way). Kan Ouivirach. http://www.tgdaily.com/web/134986-machine-learning-ai-and-digital-intelligences-effect-on-business
  2. Kan Ouivirach, Research & Development Engineer
  3. http://www.bigdata-madesimple.com/ Machine Learning: “Given example pairs (x, y), induce a function f such that y = f(x) for the given pairs and generalizes well for unseen x.” –Peter Norvig (2014)
  4. Dog / Cat: Generalization
  5. Main Types of Learning • Supervised Learning • Unsupervised Learning • Reinforcement Learning
  6. “ไขความลับ อัลฟ่าโกะ การเรียนรู้แบบเชิงลึก และอนาคตของปัญญาประดิษฐ์” (“Unlocking the Secrets of AlphaGo: Deep Learning and the Future of Artificial Intelligence”), ดร. สรรพฤทธิ์ มฤคทัต (Dr. Sanparith Marukatat), 22 Mar 2016
  7. https://deepmind.com/alpha-go.html
  8. Google DeepMind: Ground-breaking AlphaGo masters the game of Go https://www.youtube.com/watch?v=SUbqykXVx0A
  9. We’ll do this today!
  10. Minimax Decisions and Reinforcement Learning http://www.123rf.com/photo_8104943_hand-drawn-tic-tac-toe-game-format.html
  11. Minimax Decisions
  12. MAX(X), MIN(O), MAX(X) plies • +1 if X wins • −1 if X loses
  13.–19. [The Tic Tac Toe game tree expanded step by step, labelling each terminal position +1 (X wins) or −1 (X loses)]
  20. function MINIMAX-DECISION(state) returns an action
        return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))
      function MAX-VALUE(state) returns a utility value
        if TERMINAL-TEST(state) then return UTILITY(state)
        v ← −∞
        for each a in ACTIONS(state) do v ← MAX(v, MIN-VALUE(RESULT(state, a)))
        return v
      function MIN-VALUE(state) returns a utility value
        if TERMINAL-TEST(state) then return UTILITY(state)
        v ← +∞
        for each a in ACTIONS(state) do v ← MIN(v, MAX-VALUE(RESULT(state, a)))
        return v
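The pseudocode above can be sketched in a few lines of Python. The game tree here is a tiny hand-made example (its state names, actions, and utilities are invented for illustration); the linked tictactoe_minimax.py applies the same idea to the actual board.

```python
# Minimax over a toy game tree: internal nodes map action -> child state,
# leaves are utilities for MAX. This tree is invented for illustration.
TREE = {
    "root": {"a1": "b1", "a2": "b2"},
    "b1": {"a3": "c1", "a4": "c2"},
    "b2": {"a5": "c3", "a6": "c4"},
    "c1": 1, "c2": 1, "c3": -1, "c4": 1,
}

def is_terminal(state):
    return not isinstance(TREE[state], dict)

def max_value(state):
    if is_terminal(state):
        return TREE[state]                  # UTILITY(state)
    return max(min_value(s) for s in TREE[state].values())

def min_value(state):
    if is_terminal(state):
        return TREE[state]
    return min(max_value(s) for s in TREE[state].values())

def minimax_decision(state):
    # Pick the action whose worst-case (MIN) outcome is best for MAX.
    return max(TREE[state], key=lambda a: min_value(TREE[state][a]))

print(minimax_decision("root"))  # -> a1 (guarantees +1 even if MIN plays best)
```

Note the mutual recursion: MAX assumes MIN replies optimally at every level, so the chosen action is the one with the best guaranteed outcome.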
  21. https://github.com/zkan/tictactoe/blob/master/tictactoe_minimax.py
  22. Alpha-Beta Pruning
  23. Root: [−∞, +∞] (the α, β window), with alternating max and min levels
  24. First leaf 3 examined; min node and root both still [−∞, +∞]
  25. Min node narrows to [−∞, 3]; root still [−∞, +∞]
  26. Leaves 3, 12: min node stays [−∞, 3]
  27. Leaves 3, 12, 8: first min node resolves to 3; root window becomes [3, +∞]
  28. Second min node sees leaf 2: its window is [3, 2] (α = 3, β = 2), already empty
  29. Leaves 3, 12, 8, 2 examined, then Prune! MIN can already force ≤ 2 < 3 at the second min node, so its remaining children cannot matter
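The walkthrough above can be checked in code. The sketch below is a minimal reimplementation (not taken from the slides' repo) of the same two-ply search: leaves 3, 12, 8 sit under the first min node, leaf 2 triggers the cutoff at the second, and `visited` records which leaves were actually evaluated.

```python
import math

# Two-ply tree from the walkthrough: a max root over three min nodes,
# each min node over leaf utilities. Leaves 4 and 6 should get pruned.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []  # leaves actually evaluated

def min_node(leaves, alpha):
    """Evaluate a min node; stop as soon as v <= alpha (MAX won't come here)."""
    v = math.inf
    for leaf in leaves:
        visited.append(leaf)
        v = min(v, leaf)
        if v <= alpha:       # cutoff: this node can't beat MAX's best so far
            return v
    return v

def alphabeta_root(tree):
    alpha = -math.inf        # best value MAX can guarantee so far
    for leaves in tree:
        alpha = max(alpha, min_node(leaves, alpha))
    return alpha

print(alphabeta_root(TREE))  # -> 3
print(visited)               # [3, 12, 8, 2, 14, 5, 2]: 4 and 6 never looked at
```

After the first min node resolves to 3, the single leaf 2 is enough to abandon the second min node, exactly as on slide 29.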
  30. http://researchers.lille.inria.fr/~munos/ Reinforcement Learning
  31. Reinforcement Learning (RL): y = f(x) with z. Given x and z, find a function f that generates y.
  32. Agent-Environment Interaction in RL: the Agent takes action At in state St with reward Rt; the Environment returns the next state St+1 and reward Rt+1
  33. Policy • The learning agent's way of behaving at a given time • A mapping from perceived states of the environment to actions to be taken when in those states • A simple lookup table
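For a toy problem the lookup-table form of a policy really is just a dict from perceived states to actions; the state and action names below are hypothetical placeholders, not from the slides.

```python
# A policy as the simple lookup table described above:
# perceived state -> action. Names are hypothetical placeholders.
policy = {
    "opponent_threatens_line": "block",
    "center_free": "take_center",
    "otherwise": "take_corner",
}

def act(state):
    # Fall back to a default action for unlisted states.
    return policy.get(state, policy["otherwise"])

print(act("center_free"))  # -> take_center
```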
  34. [The same agent-environment loop] “I’m gonna find my optimal policy!”
  35. y = f(x) with z, where x = State, y = Action, z = Reward, and f = the Policy
  36. Google DeepMind's Deep Q-learning playing Atari Breakout https://www.youtube.com/watch?v=V1eYniJ0Rnk
  37. Flappy Bird Hack using Reinforcement Learning http://sarvagyavaish.github.io/FlappyBirdRL/
  38. http://quotes.lifehack.org/quote/albert-einstein/learning-is-experience-everything-else-is-just/
  39. Mystery Game https://github.com/guiferviz/rl_udacity
  40. Exploration & Exploitation http://mariashriver.com/blog/2012/10/balancing-happiness-and-heartache-in-alzheimers-kerry-lonergan-luksic/
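One common way to balance exploration against exploitation is ε-greedy action selection (not on the slides; added here as a sketch): with probability ε take a random action, otherwise take the best-known one. The Q values below are made up.

```python
import random

def epsilon_greedy(q_row, epsilon=0.1):
    """q_row: {action: estimated value} for the current state.

    With probability epsilon, explore a random action;
    otherwise exploit the action with the highest estimate.
    """
    if random.random() < epsilon:
        return random.choice(list(q_row))   # explore
    return max(q_row, key=q_row.get)        # exploit

q_row = {"left": 0.0, "right": 2.5, "up": 1.0}
print(epsilon_greedy(q_row, epsilon=0.0))  # epsilon = 0 always exploits -> right
```

A small ε keeps the agent occasionally trying moves it currently believes are worse, which is what lets it discover better policies at all.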
  41. Q-Learning: Q(s, a) ← Q(s, a) + α(R(s) + γ max_{a′} Q(s′, a′) − Q(s, a)), where α is the learning rate, γ the discount factor, Q a table of action values indexed by state and action, s and a the previous state and action, s′ the new state, a′ the actions available there, and R(s) the reward in state s
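The update rule can be written out directly. The function below is a generic sketch: the state/action types are left open, and the defaults α = 1, γ = 0.8 are the values used in the worked example that follows, not part of the formula itself.

```python
from collections import defaultdict

q = defaultdict(float)  # Q-table: (state, action) -> value, defaults to 0

def q_update(s, a, reward, s_next, next_actions, alpha=1.0, gamma=0.8):
    """Q(s,a) <- Q(s,a) + alpha * (R(s) + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q[(s, a)]

# With alpha = 1 the old value cancels and Q(s,a) becomes R + gamma * max:
print(q_update(2, 5, reward=100, s_next=5, next_actions=[2, 5]))  # -> 100.0
```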
  42. Q-Learning through a Simple Example: states 1, 2, 3, 4, 5, where 5 is the Goal!
  43. [The same five-state graph, with the possible moves between states drawn in]
  44. Define a reward function. Every move pays 0 except moves into the goal state 5, which pay 100; impossible moves are −1:
      R (rows = current state, columns = next state) =
             1    2    3    4    5
        1 [ −1   −1    0   −1   −1 ]
        2 [  0   −1   −1    0  100 ]
        3 [  0   −1   −1    0   −1 ]
        4 [ −1    0    0   −1   −1 ]
        5 [ −1    0   −1   −1  100 ]
  45. Q(s, a) ← Q(s, a) + α(R(s) + γ max_{a′} Q(s′, a′) − Q(s, a)). Suppose γ = 0.8 and α = 1. Let’s start here, with Q the 5×5 all-zero matrix and R as defined on the previous slide.
  46. When choosing to go from 2 to 5: Q(2, 5) = R(2, 5) + 0.8 × max(Q(5, 2)) − Q(2, 5) = 100 + 0.8 × 0 − 0 = 100, so entry (2, 5) of Q becomes 100
  47. When choosing to go from 4 to 2: Q(4, 2) = R(4, 2) + 0.8 × max(Q(2, 1), Q(2, 4), Q(2, 5)) − Q(4, 2) = 0 + 0.8 × 100 − 0 = 80, so entry (4, 2) becomes 80
  48. When choosing to go from 3 to 4: Q(3, 4) = R(3, 4) + 0.8 × max(Q(4, 2), Q(4, 3)) − Q(3, 4) = 0 + 0.8 × 80 − 0 = 64, so entry (3, 4) becomes 64
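The three hand-computed updates above drop out of a few lines of code. The adjacency list below is read off the R matrix on slide 44; with α = 1 and γ = 0.8 the values 100, 80, and 64 come straight out.

```python
GAMMA = 0.8  # discount factor; alpha = 1, so each update replaces Q outright

# Legal moves read off the R matrix (rows = from-state, -1 = no edge);
# entering the goal state 5 pays reward 100, every other move pays 0.
moves = {1: [3], 2: [1, 4, 5], 3: [1, 4], 4: [2, 3], 5: [2, 5]}
R = {(s, s2): (100 if s2 == 5 else 0) for s in moves for s2 in moves[s]}
Q = {edge: 0.0 for edge in R}  # start from the all-zero Q-table

def update(s, s2):
    """One Q-learning step for taking the move s -> s2 (alpha = 1)."""
    Q[(s, s2)] = R[(s, s2)] + GAMMA * max(Q[(s2, s3)] for s3 in moves[s2])

for s, s2 in [(2, 5), (4, 2), (3, 4)]:   # same order as slides 46-48
    update(s, s2)
    print(f"Q({s},{s2}) = {Q[(s, s2)]:g}")
# prints Q(2,5) = 100, Q(4,2) = 80, Q(3,4) = 64
```

Run long enough over random episodes, the table converges so that following the largest Q entry from any state leads to the goal.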
  49. Back to our Tic Tac Toe! http://www.123rf.com/photo_8104943_hand-drawn-tic-tac-toe-game-format.html State Space • Board. Actions • Move. Rewards • +1 if A.I. wins • −1 if A.I. loses • 0.5 if the game’s a draw
  50. https://github.com/zkan/tictactoe/blob/master/tictactoe_rl.py
  51. Supervised learning and unsupervised learning?
  52. BetaGo http://maxpumperla.github.io/betago/
