A brief overview of
Reinforcement Learning
applied to games
Thomas Paula
August 16, 2018 - #10 Porto Alegre Machine Learning Meetup
Who am I?
2RL applied to games
Thomas Paula
● Machine Learning Engineer and Researcher @HP
● MSc in Computer Science
● POA Machine Learning Meetup
● @tsp_thomas
● tsp.thomas@gmail.com
Why study games?
● Simple rules and deep concepts
● Some have been studied for hundreds or
thousands of years
● Encapsulate real-world issues
● Games are fun :)
Source: David Silver, 2015
Agenda
● Introduction
○ Artificial Intelligence
○ Challenge for AI: beat humans in chess
● Reinforcement Learning
● Deep Reinforcement Learning
● Closing thoughts
Introduction
Artificial Intelligence
Source: Deep Learning (Goodfellow, Bengio, Courville)
Artificial Intelligence
● “The effort to automate intellectual tasks
normally performed by humans”
● Born in the 1950s, when people started trying
to make computers think
● Many believed human-level artificial
intelligence could be achieved with a large enough
hand-crafted set of rules
● 1950s to 1980s: Symbolic AI
Why is (was) chess challenging for computers?
Programming a Computer for Playing Chess
● Seminal paper by Claude Shannon in 1950
● Number of possible games (game-tree complexity): ~10^120
○ Estimated number of atoms in the observable
universe: 10^78 to 10^82
● Pure brute force: impossible even for modern
computers
Why is (was) chess challenging for computers?
● Let’s take tic-tac-toe as an example
[Figure: Game Tree of a partially played tic-tac-toe position]
Source: https://materiaalit.github.io/intro-to-ai-17/part2/
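To make the game-tree idea concrete, here is a small brute-force sketch (an illustration, not taken from the slides) that expands the complete tic-tac-toe tree. It counts every distinct complete game and runs plain minimax to find the value of the empty board. The helper names (`winner`, `count_games`, `minimax`) and the string board encoding are assumptions made for this example.

```python
from functools import lru_cache

# Hypothetical board encoding: a 9-character string, cells 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]


def play(board, i, player):
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def count_games(board, player):
    """Number of distinct move sequences (complete games) below this node."""
    if winner(board) or not moves(board):
        return 1  # leaf of the game tree
    nxt = "O" if player == "X" else "X"
    return sum(count_games(play(board, i, player), nxt) for i in moves(board))


@lru_cache(maxsize=None)
def minimax(board, player):
    """Value under perfect play: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if not moves(board):
        return 0
    nxt = "O" if player == "X" else "X"
    values = [minimax(play(board, i, player), nxt) for i in moves(board)]
    return max(values) if player == "X" else min(values)


empty = " " * 9
print(count_games(empty, "X"))  # 255168 complete games
print(minimax(empty, "X"))      # 0: perfect play ends in a draw
```

Memoising on the board keeps this fast because many move orders reach the same position. That shortcut does nothing for chess, whose tree is astronomically larger.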
What about chess?
IBM Deep Blue
● Chess-playing computer developed by IBM
● Won first game against Garry Kasparov on 10 February 1996
● Approach based on Symbolic AI
○ Alpha-beta pruning search algorithm
○ Deep Blue executed it in parallel
● Deep Blue won the six-game rematch in 1997, although
Kasparov accused IBM of cheating
● Results
○ Deep Blue was retired
○ Today, alpha-beta engines such as Stockfish play far above human level
Source: https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)
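The following is a textbook alpha-beta search over a small hand-built tree. Nested lists are internal nodes and numbers are leaf scores. This is only a sketch of the pruning idea under those assumptions, not Deep Blue's actual implementation. The `visited` list exists only to show which leaves the search evaluates.

```python
import math


def alphabeta(node, alpha, beta, maximizing, visited):
    """Minimax value of `node` with alpha-beta pruning.

    A node is either a number (leaf score) or a list of child nodes.
    `visited` collects the leaves that were actually evaluated.
    """
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # the MIN player above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:  # the MAX player above already has a better option
            break
    return value


# Hypothetical depth-3 binary tree: MAX at the root, then MIN, then MAX over leaves.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
visited = []
value = alphabeta(tree, -math.inf, math.inf, True, visited)
print(value, visited)  # 5 [3, 5, 6, 1, 2]: only 5 of the 8 leaves are evaluated
```

Plain minimax would reach the same value of 5 but evaluate all 8 leaves. The savings from pruning grow with depth, especially when good moves are searched first.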
Go
Number of possible positions: ~10^170!
Average branching factor: ~250!
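One rough way to compare search difficulty is the size of a tree with branching factor b and depth d, which is about b^d. The sketch below uses the approximate values quoted in the AlphaGo paper (b ≈ 35, d ≈ 80 for chess; b ≈ 250, d ≈ 150 for Go). These are order-of-magnitude estimates, not exact game statistics, and the `GAMES` table is an illustrative assumption.

```python
import math


def log10_tree_size(branching, depth):
    """log10 of b**d, the approximate number of move sequences in a search tree."""
    return depth * math.log10(branching)


# Hypothetical table of rough (b, d) averages, not exact game statistics.
GAMES = {"chess": (35, 80), "go": (250, 150)}

for name, (b, d) in GAMES.items():
    print(f"{name}: b^d ~ 10^{log10_tree_size(b, d):.1f}")
# chess: b^d ~ 10^123.5
# go: b^d ~ 10^359.7
```

Working in log10 avoids building enormous integers. Both results dwarf the estimated 10^78 to 10^82 atoms in the observable universe.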
How can we solve Go?
Reinforcement Learning
to the rescue!
Reinforcement Learning
What is Reinforcement Learning?
● Trial and error (no supervisor)
● Feedback is delayed, not instantaneous
● Time matters (data is not i.i.d.)
● Actions affect next states
Source: Richard Sutton, 2017
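Delayed feedback is usually handled by optimising the discounted return G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + ..., where r_t is the reward that follows the action at step t. This lets a reward that arrives late still be credited to earlier actions. The minimal sketch below computes this return. The discount γ = 0.9 and the reward sequence are arbitrary illustrative choices.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, G_1, ...] where G_t = sum_k gamma**k * rewards[t + k].

    rewards[t] is the reward received after the action at step t.
    Computed backwards in O(n) using G_t = r_t + gamma * G_{t+1}.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Hypothetical 4-step episode where only the last step is rewarded (e.g. a win).
print(discounted_returns([0, 0, 0, 1], gamma=0.9))  # ≈ [0.729, 0.81, 0.9, 1.0]
```

Earlier steps receive a smaller but non-zero share of the final reward. A γ closer to 1 makes the agent care more about long-term outcomes.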
Comparison to Supervised/Unsupervised Learning
Supervised Learning
● Set of labeled examples provided by “external supervisor”
● Not adequate on its own for learning from interaction
○ It is generally impractical to obtain correct, representative examples of all situations
Unsupervised Learning
● Usually tries to learn structure/data representation
● Does not match RL exactly: RL tries to maximize a reward signal, not to find hidden structure
Source: David Silver, 2015
Reinforcement Learning Agent
● Policy: the agent's behavior function, mapping states to actions
● Value function: how good each state and/or action is (a prediction of future reward)
● Model: the agent's representation of the environment
Markov Decision Process (MDP)
In general
● Mathematical framework for modelling decision
making
● States, actions, and rewards
Relationship with RL
● Formally describe an environment for RL, where
the environment is fully-observable
● Almost all RL problems can be formalized as
MDPs
Source: David Silver, 2015
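An MDP can be written down directly as states, actions, transition probabilities and rewards. The sketch below encodes a hypothetical two-state MDP as plain dictionaries and solves it with value iteration. It repeatedly applies the Bellman optimality update V(s) ← max_a Σ_{s'} P(s'|s,a)·[R(s,a,s') + γ·V(s')] until the values stop changing. The state names, actions, rewards and γ = 0.9 are all invented for illustration.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """Solve a finite MDP with value iteration.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    Returns the optimal state values V and a greedy policy.
    """
    V = {s: 0.0 for s in transitions}

    def q(s, a):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])

    while True:
        delta = 0.0
        for s in transitions:
            best = max(q(s, a) for a in transitions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(transitions[s], key=lambda a: q(s, a)) for s in transitions}
    return V, policy


# Hypothetical 2-state MDP: "go" from A reaches B only 80% of the time.
mdp = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "go":   [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)],
          "go":   [(1.0, "A", 0.0)]},
}
V, policy = value_iteration(mdp)
print(V)       # V["B"] ≈ 20.0, V["A"] ≈ 18.54
print(policy)  # {'A': 'go', 'B': 'stay'}
```

Value iteration needs the full model (the `transitions` table). Model-free methods such as Q-learning learn from sampled interaction instead.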
RL simple example (1)
[Figure: grid-world environment with a +1 cell and a -1 cell, next to a possible policy]
RL simple example (2) - Q-learning
Source: https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
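The linked demo runs tabular Q-learning on a grid world. Below is a self-contained sketch of the same update rule, Q(s,a) ← Q(s,a) + α·[r + γ·max_a' Q(s',a') - Q(s,a)]. It runs on a hypothetical 1x5 corridor rather than the demo's grid: entering the right end gives +1 and entering the left end gives -1. The layout, rewards, α, γ, ε, seed and episode count are all illustrative assumptions.

```python
import random

N_STATES = 5                    # hypothetical corridor: cells 0..4
TERMINALS = {0: -1.0, 4: 1.0}   # reward for entering each terminal cell
ACTIONS = (-1, 1)               # move left, move right
START = 2


def step(state, action):
    """Corridor dynamics: returns (next_state, reward, done)."""
    nxt = state + action
    if nxt in TERMINALS:
        return nxt, TERMINALS[nxt], True
    return nxt, 0.0, False


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = START, False
        while not done:
            if rng.random() < epsilon:          # explore
                a = rng.choice(ACTIONS)
            else:                               # exploit, breaking ties randomly
                best = max(Q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            s2, r, done = step(s, a)
            # Q-learning target: bootstrap from the best next action unless terminal.
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q


Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, N_STATES - 1)}
print(greedy)  # should map every non-terminal cell to 1 (move right)
```

The table `Q` has one entry per (state, action) pair. That is why this approach breaks down when states are raw game screens.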
Examples of RL success in games (before deep learning)
● Backgammon: TD-Gammon (1992)
● Scrabble: Maven (2000s)
What about Atari games?
How can complex games be represented in an RL setting?
Can we use Deep Learning to capture information from raw pixels?
Deep Reinforcement Learning
Deep Q-Learning (DQN)
● Q-Learning is a tabular method
○ What if it’s the first time we’re visiting a state?
● Can we use a neural network as our Q-function?
○ Yes!
○ However, RL is known to be unstable or even to diverge when a nonlinear function
approximator (e.g. a neural network) is used to represent Q
● DQN introduces clever techniques to address that!
Source: Human-level control through deep reinforcement learning, 2015
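At the heart of DQN is a regression target built from a separate target network: y = r + γ·max_a' Q_target(s', a'), or just y = r at the end of an episode. The online network is trained to reduce (y - Q(s, a))². The NumPy sketch below computes those targets and a plain squared loss for one minibatch. The hard-coded arrays stand in for network outputs and are purely illustrative. The DQN paper additionally clips the error term, which is omitted here.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """TD targets y = r + gamma * max_a' Q_target(s', a'), or y = r if terminal.

    rewards:       shape (batch,)
    next_q_target: shape (batch, n_actions), target-network Q-values for s'
    dones:         shape (batch,), 1.0 where s' is terminal
    """
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def dqn_loss(q_online, actions, targets):
    """Mean squared TD error on the Q-values of the actions actually taken."""
    q_taken = q_online[np.arange(len(actions)), actions]
    return np.mean((targets - q_taken) ** 2)


# Hypothetical minibatch: 3 transitions, 2 actions, values invented for illustration.
rewards = np.array([1.0, 0.0, -1.0])
dones = np.array([0.0, 0.0, 1.0])
next_q_target = np.array([[0.5, 2.0], [1.0, 0.0], [3.0, 3.0]])
q_online = np.array([[1.0, 0.0], [0.2, 0.9], [0.0, -0.5]])
actions = np.array([0, 1, 1])

y = dqn_targets(rewards, next_q_target, dones, gamma=0.9)
print(y)                                # ≈ [2.8, 0.9, -1.0]
print(dqn_loss(q_online, actions, y))   # ≈ 1.163
```

Only the Q-value of the action actually taken receives a gradient. The `(1 - dones)` mask stops bootstrapping past the end of an episode.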
DQN - Overview
Source: Human-level control through deep reinforcement learning, 2015
DQN - Overview
Source: Resource Management with Deep Reinforcement Learning, 2016
DQN - Breakout
Source: https://www.youtube.com/watch?v=TmPfTpjtdgg
DQN
Source: Human-level control through deep reinforcement learning, 2015
DQN
● Single architecture can successfully learn control policies in a range of different
environments
● Combines deep network architectures and reinforcement learning using
○ Experience replay
○ Target network: makes the algorithm more stable
● Limitations
○ Games that demand more temporally extended strategies are still a great
challenge
Source: Human-level control through deep reinforcement learning, 2015
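Experience replay stores past transitions and trains on random minibatches drawn from them. This breaks the strong correlation between consecutive frames. The sketch below shows a minimal buffer of that kind. The class names, capacity, seed and tuple layout are assumptions for illustration, and the target-network step is only described in a comment.

```python
import random
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    """Hypothetical layout of one stored experience."""
    state: object
    action: int
    reward: float
    next_state: object
    done: bool


class ReplayBuffer:
    """Fixed-size FIFO memory of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates the minibatch.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=1000, seed=0)
for t in range(1500):
    buffer.push(t, t % 4, 0.0, t + 1, False)

print(len(buffer))  # 1000: only the most recent transitions are kept
batch = buffer.sample(32)
# In DQN, targets for `batch` would be computed with the target network, whose
# weights are copied from the online network only every C steps.
```

Each transition can be reused in many updates, which makes learning more data-efficient.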
Go
Number of possible positions: ~10^170!
Average branching factor: ~250!
AlphaGo
[Figure: Policy Network and Value Network]
AlphaGo - Training Pipeline (simplified)
Source: Mastering the game of Go with deep neural networks and tree search, 2016
AlphaGo - Monte Carlo Tree Search (MCTS)
Source: Mastering the game of Go with deep neural networks and tree search, 2016
AlphaGo - Results
● Played against Lee Sedol in
March 2016
● Lee has won 18 world titles
● AlphaGo won the match 4-1
AlphaGo Documentary (Netflix)
AlphaZero (as per David Silver’s NIPS talk)
No human data
● Learns solely through self-play reinforcement learning,
starting from random play
No human features
● Takes only the raw board as input
Single neural network
● Policy and value networks are combined into one
Simplified search
● No Monte Carlo rollouts; uses the neural network to
evaluate positions
Source: 2017 NIPS Keynote by DeepMind's David Silver
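In AlphaGo Zero and AlphaZero, each search simulation walks down the tree by choosing the action that maximises Q(s,a) + U(s,a). Here U(s,a) = c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)), which mixes the network's prior P with visit counts N. Leaf positions are then scored by the network's value output instead of a rollout. The sketch below covers only this selection rule. The `puct_select` name, the c_puct value and the node statistics are illustrative assumptions.

```python
import math


def puct_select(stats, c_puct=1.5):
    """Pick the action maximising Q + U at one search-tree node.

    stats maps action -> dict with prior P (from the policy head),
    visit count N and mean action value Q (from value-head evaluations).
    """
    total_visits = sum(s["N"] for s in stats.values())

    def score(action):
        s = stats[action]
        u = c_puct * s["P"] * math.sqrt(total_visits) / (1 + s["N"])
        return s["Q"] + u

    return max(stats, key=score)


# Hypothetical node with three legal moves.
node = {
    "a": {"P": 0.6, "N": 10, "Q": 0.10},
    "b": {"P": 0.3, "N": 1, "Q": 0.05},
    "c": {"P": 0.1, "N": 0, "Q": 0.0},
}
print(puct_select(node))  # 'b': reasonable prior and rarely visited so far
```

A high prior draws early visits, but the U term shrinks as N grows, so the search gradually shifts toward moves whose evaluated value Q is high.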
AlphaZero (as per David Silver’s NIPS talk)
Source: 2017 NIPS Keynote by DeepMind's David Silver
Dota 2
● Real-time strategy (RTS) game
○ More precisely, a subgenre called multiplayer
online battle arena (MOBA)
● Two teams of five players, where each player
controls a hero
● Main goal is to destroy the opposing team's “base”
● Lots of challenges for RL
Dota 2 - Challenges for RL
● Long time horizons
○ 30 fps for an average of 45 minutes
● Partially observed state
○ Only part of the map is visible
○ Needs to make inferences from incomplete data
● High-dimensional, continuous action space
○ Discretized into 170,000 possible actions per hero
○ ~1,000 valid actions on average at each tick
● High-dimensional, continuous observation space
○ State: ~20,000 numbers
Source: https://blog.openai.com/openai-five/
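Only a small subset of the discretized actions is valid at any tick. A policy over a large discrete action space is therefore commonly restricted by masking invalid actions before the softmax. The NumPy sketch below shows that generic technique on a toy 6-action space. It is an illustrative assumption, not a description of OpenAI Five's actual action heads.

```python
import numpy as np


def masked_softmax(logits, valid):
    """Probability distribution over actions that is zero on invalid actions.

    logits: shape (n_actions,) raw policy scores
    valid:  boolean array of the same shape, True where the action is legal
    """
    masked = np.where(valid, logits, -np.inf)
    masked = masked - masked.max()   # numerical stability (max over valid actions)
    exp = np.exp(masked)             # exp(-inf) == 0 for invalid actions
    return exp / exp.sum()


rng = np.random.default_rng(0)
logits = rng.normal(size=6)          # hypothetical policy output
valid = np.array([True, False, True, False, False, True])
probs = masked_softmax(logits, valid)
action = rng.choice(len(probs), p=probs)  # always a valid action
print(probs.round(3), action)
```

Because invalid actions get exactly zero probability, the agent never wastes samples on them. The policy only has to learn to rank the actions that are legal at that moment.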
Dota 2 - OpenAI Five
● Each hero is controlled by a network with a single-layer, 1024-unit
LSTM
● Extracts game state with Valve’s Bot API
● Learns entirely from self-play
● Uses Proximal Policy Optimization
(PPO) for training
Source: https://blog.openai.com/openai-five/
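PPO updates the policy with a clipped surrogate objective: L = mean(min(ρ_t·A_t, clip(ρ_t, 1 - ε, 1 + ε)·A_t)). Here ρ_t is the ratio of new to old action probabilities and A_t is an advantage estimate. Clipping removes the incentive to move the policy too far in a single update. The NumPy sketch below evaluates only that objective for a small batch. ε = 0.2 and the numbers are illustrative. The sketch omits the value and entropy terms, advantage estimation, and OpenAI Five's large-scale training setup.

```python
import numpy as np


def ppo_clip_objective(new_logp, old_logp, advantages, epsilon=0.2):
    """Clipped surrogate objective (to be maximised) from the PPO paper.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates A_t for the same samples.
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Pessimistic bound: keep the smaller of the two terms per sample.
    return np.mean(np.minimum(unclipped, clipped))


# Hypothetical batch of three samples.
old_logp = np.log(np.array([0.5, 0.5, 0.5]))
new_logp = np.log(np.array([0.9, 0.5, 0.2]))  # ratios 1.8, 1.0, 0.4
advantages = np.array([1.0, 2.0, -1.0])
print(ppo_clip_objective(new_logp, old_logp, advantages))  # ≈ 0.8
```

In the first sample, the gain from raising a good action's probability is capped at a ratio of 1.2. In the third, the penalty for lowering a bad action's probability is held at the clipped value rather than shrinking further.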
Dota 2 - OpenAI Five
Source: https://blog.openai.com/openai-five/
● Simplified version of the game (limited hero pool, some game mechanics removed)
● Played against a team of 99.95th-percentile Dota players
○ Four of them had played professionally
● 3 games
○ OpenAI Five won the 1st and 2nd games
○ 3rd: the audience was asked to choose the heroes
■ The AI predicted a 2.9% chance of winning
OpenAI Five -> Dexterity
● Robot hand that can manipulate physical objects
● Uses the same RL algorithm as OpenAI Five
Other examples
● StarCraft II
● Battlefield
Closing Thoughts
Take home message
● Reinforcement Learning is a hot topic
● The combination of RL and Deep Learning is producing great results
● Games are a great proxy for developing solutions for real-world problems
○ Many challenges are far from being solved
● What about an RL agent that plays against you and adapts to
your way of playing?
Thank you!
August 16, 2018 - #10 Porto Alegre Machine Learning Meetup
Thomas Paula
● @tsp_thomas
● tsp.thomas@gmail.com

More Related Content

What's hot

Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
SlideTeam
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
Kuppusamy P
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
Khaled Saleh
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
MeetupDataScienceRoma
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Wagston Staehler
 
machine learning
machine learningmachine learning
machine learning
soundaryasarya
 
Reinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | EdurekaReinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | Edureka
Edureka!
 
Machine learning
Machine learningMachine learning
Machine learning
InfoFarm
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
Melaku Eneayehu
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
pauldix
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
Shahan Ali Memon
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
Jie-Han Chen
 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
Nikolay Pavlov
 
AI: AI & Searching
AI: AI & SearchingAI: AI & Searching
AI: AI & Searching
DataminingTools Inc
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
Koundinya Desiraju
 
Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference Learning
Seung Jae Lee
 
Reinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsReinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed Bandits
Seung Jae Lee
 
Generative models
Generative modelsGenerative models
Generative models
Birger Moell
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
Si Haem
 
Reinforcement learning slides
Reinforcement learning slidesReinforcement learning slides
Reinforcement learning slides
OmranHakami
 

What's hot (20)

Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
machine learning
machine learningmachine learning
machine learning
 
Reinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | EdurekaReinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | Edureka
 
Machine learning
Machine learningMachine learning
Machine learning
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
 
AI: AI & Searching
AI: AI & SearchingAI: AI & Searching
AI: AI & Searching
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference Learning
 
Reinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsReinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed Bandits
 
Generative models
Generative modelsGenerative models
Generative models
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Reinforcement learning slides
Reinforcement learning slidesReinforcement learning slides
Reinforcement learning slides
 

Similar to A brief overview of Reinforcement Learning applied to games

Artificial Intelligence: Facebook loses to Google in race to solve the ancien...
Artificial Intelligence: Facebook loses to Google in race to solve the ancien...Artificial Intelligence: Facebook loses to Google in race to solve the ancien...
Artificial Intelligence: Facebook loses to Google in race to solve the ancien...
Claire Rioualen
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
Richard Abbuhl
 
Memory-based Reinforcement Learning
Memory-based Reinforcement LearningMemory-based Reinforcement Learning
Memory-based Reinforcement Learning
Hung Le
 
GDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning WorkshopGDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning Workshop
ssuser540861
 
Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018
Two Sigma
 
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree SearchAlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
Karel Ha
 
The Role of Evolutionary Computation in Game AI
The Role of Evolutionary Computation in Game AIThe Role of Evolutionary Computation in Game AI
The Role of Evolutionary Computation in Game AI
Mike Preuss
 
From alpha go to alpha zero TLP innova 2018
From alpha go to alpha zero  TLP innova 2018From alpha go to alpha zero  TLP innova 2018
From alpha go to alpha zero TLP innova 2018
Juantomás García Molina
 
Rogue like-ness-ness! tgc 2018 presentation
Rogue like-ness-ness! tgc 2018 presentationRogue like-ness-ness! tgc 2018 presentation
Rogue like-ness-ness! tgc 2018 presentation
Aidin Zolghadr
 
Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement LearningMulti-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning
Seolhokim
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
Tim Riser
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
Tobias Pfeiffer
 
Memory for Lean Reinforcement Learning.pdf
Memory for Lean Reinforcement Learning.pdfMemory for Lean Reinforcement Learning.pdf
Memory for Lean Reinforcement Learning.pdf
Hung Le
 
earning by s/doing/h4ck1ng/ - Our experience learning application security th...
earning by s/doing/h4ck1ng/ - Our experience learning application security th...earning by s/doing/h4ck1ng/ - Our experience learning application security th...
earning by s/doing/h4ck1ng/ - Our experience learning application security th...
NECST Lab @ Politecnico di Milano
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
Tobias Pfeiffer
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: Presentation
Karel Ha
 
Understanding AlphaGo
Understanding AlphaGoUnderstanding AlphaGo
Understanding AlphaGo
Amit Mandelbaum
 
GenAi LLMs Zero to Hero: Mastering GenAI
GenAi LLMs Zero to Hero: Mastering GenAIGenAi LLMs Zero to Hero: Mastering GenAI
GenAi LLMs Zero to Hero: Mastering GenAI
ShakeelAhmed286165
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
Chetan Khatri
 
weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...
weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...
weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...
Bill Liu
 

Similar to A brief overview of Reinforcement Learning applied to games (20)

Artificial Intelligence: Facebook loses to Google in race to solve the ancien...
Artificial Intelligence: Facebook loses to Google in race to solve the ancien...Artificial Intelligence: Facebook loses to Google in race to solve the ancien...
Artificial Intelligence: Facebook loses to Google in race to solve the ancien...
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
 
Memory-based Reinforcement Learning
Memory-based Reinforcement LearningMemory-based Reinforcement Learning
Memory-based Reinforcement Learning
 
GDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning WorkshopGDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning Workshop
 
Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018
 
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree SearchAlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
 
The Role of Evolutionary Computation in Game AI
The Role of Evolutionary Computation in Game AIThe Role of Evolutionary Computation in Game AI
The Role of Evolutionary Computation in Game AI
 
From alpha go to alpha zero TLP innova 2018
From alpha go to alpha zero  TLP innova 2018From alpha go to alpha zero  TLP innova 2018
From alpha go to alpha zero TLP innova 2018
 
Rogue like-ness-ness! tgc 2018 presentation
Rogue like-ness-ness! tgc 2018 presentationRogue like-ness-ness! tgc 2018 presentation
Rogue like-ness-ness! tgc 2018 presentation
 
Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement LearningMulti-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
 
Memory for Lean Reinforcement Learning.pdf
Memory for Lean Reinforcement Learning.pdfMemory for Lean Reinforcement Learning.pdf
Memory for Lean Reinforcement Learning.pdf
 
earning by s/doing/h4ck1ng/ - Our experience learning application security th...
earning by s/doing/h4ck1ng/ - Our experience learning application security th...earning by s/doing/h4ck1ng/ - Our experience learning application security th...
earning by s/doing/h4ck1ng/ - Our experience learning application security th...
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: Presentation
 
Understanding AlphaGo
Understanding AlphaGoUnderstanding AlphaGo
Understanding AlphaGo
 
GenAi LLMs Zero to Hero: Mastering GenAI
GenAi LLMs Zero to Hero: Mastering GenAIGenAi LLMs Zero to Hero: Mastering GenAI
GenAi LLMs Zero to Hero: Mastering GenAI
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
 
weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...
weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...
weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...
 

Recently uploaded

GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 

Recently uploaded (20)

GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf

A brief overview of Reinforcement Learning applied to games

  • 1. A brief overview of Reinforcement Learning applied to games Thomas Paula August 16, 2018 - #10 Porto Alegre Machine Learning Meetup
  • 2. Who am I? 2RL applied to games Thomas Paula ● Machine Learning Engineer and Researcher @HP ● MSc in Computer Science ● POA Machine Learning Meetup ● @tsp_thomas ● tsp.thomas@gmail.com
  • 3. Why study games? ● Simple rules and deep concepts ● Some of them have been studied for hundreds or thousands of years ● They encapsulate real-world issues ● Games are fun :) Source: David Silver, 2015
  • 4. Agenda ● Introduction ○ Artificial Intelligence ○ Challenge for AI: beating humans at chess ● Reinforcement Learning ● Deep Reinforcement Learning ● Closing thoughts
  • 6. Artificial Intelligence Source: Deep Learning (Goodfellow, Bengio, Courville)
  • 7. Artificial Intelligence ● “The effort to automate intellectual tasks normally performed by humans” ● Born in the 1950s: people trying to make computers think ● People used to believe that human-level artificial intelligence could be achieved with a hand-crafted set of rules ● 1950s to 1980s: Symbolic AI
  • 8. Why is (was) chess challenging for computers? ● “Programming a Computer for Playing Chess”: seminal paper by Claude Shannon, 1950 ● Number of possible games (game-tree complexity): ~10^120 ○ Estimated number of atoms in the known universe: 10^78 to 10^82 ● Pure brute force: impossible even for modern computers
  • 9. Why is (was) chess challenging for computers? ● Let’s take tic-tac-toe as an example [Figure: game tree expanded from a partially played tic-tac-toe board] Source: https://materiaalit.github.io/intro-to-ai-17/part2/
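For tic-tac-toe, the game tree on slide 9 can be enumerated outright. The minimal Python sketch below is not from the original slides; the board encoding (a list of nine `"X"`, `"O"`, or `" "` cells) and the function names are assumptions for illustration. It walks the full tree and counts every finished game, which comes to 255,168. Chess's ~10^120 possible games is what makes the same brute-force walk impossible there.

```python
# Every way to complete a line on a 3x3 board (cells indexed 0..8, row by row).
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that player owns a full line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board=None, player="X"):
    """Count the leaves (finished games) of the game tree below `board`."""
    if board is None:
        board = [" "] * 9
    if winner(board) is not None or " " not in board:
        return 1  # terminal node: one complete game
    total = 0
    for i in range(9):
        if board[i] == " ":
            board[i] = player  # make the move
            total += count_games(board, "O" if player == "X" else "X")
            board[i] = " "     # undo it before trying the next one
    return total


print(count_games())  # 255168
```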
  • 11. IBM Deep Blue ● Chess-playing computer developed by IBM ● Won the first game of a match against Garry Kasparov on 10 February 1996 ● Approach based on Symbolic AI ○ Alpha-beta pruning search algorithm ○ Deep Blue executed it in parallel ● Deep Blue won the six-game rematch in May 1997, after which Kasparov accused IBM of cheating ● Results ○ Deep Blue was retired ○ Stockfish Source: https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)
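Alpha-beta pruning, the core of Deep Blue's search, can be illustrated on a much smaller game tree. The sketch below is plain minimax with alpha-beta cutoffs on tic-tac-toe, assuming a nine-cell list encoding of the board; Deep Blue's actual search ran on custom parallel hardware with a hand-tuned chess evaluation function, none of which is modeled here. Scores are from X's point of view, so the value 0 for the empty board means perfect play ends in a draw.

```python
import math

LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def alphabeta(board, player, alpha=-math.inf, beta=math.inf):
    """Minimax value of `board` (+1 X wins, -1 O wins, 0 draw) with alpha-beta pruning.

    X maximizes, O minimizes. alpha/beta are the best values the maximizer and
    minimizer are already guaranteed elsewhere in the tree.
    """
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0
    if player == "X":
        best = -math.inf
        for i in range(9):
            if board[i] == " ":
                board[i] = "X"
                best = max(best, alphabeta(board, "O", alpha, beta))
                board[i] = " "
                alpha = max(alpha, best)
                if alpha >= beta:
                    break  # O would never let the game reach this branch: prune
        return best
    best = math.inf
    for i in range(9):
        if board[i] == " ":
            board[i] = "O"
            best = min(best, alphabeta(board, "X", alpha, beta))
            board[i] = " "
            beta = min(beta, best)
            if alpha >= beta:
                break  # X already has a better option elsewhere: prune
    return best


print(alphabeta([" "] * 9, "X"))  # 0 (a draw)
```

A branch is abandoned as soon as `alpha >= beta`, that is, once the opponent is known to have a better option elsewhere, so large parts of the tree are never visited while the returned value stays identical to full minimax.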
  • 12. Go ● Number of possible positions: ~10^170! ● Branching factor: average is 250!
  • 13. How can we solve Go? Reinforcement Learning to the rescue!
  • 15. What is Reinforcement Learning? ● Trial and error (no supervisor) ● Feedback is delayed, not instantaneous ● Time matters (data is not i.i.d.) ● Actions affect subsequent states Source: Richard Sutton, 2017
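The "feedback is delayed" point can be made concrete with the discounted return, computed backwards as G_t = r_{t+1} + gamma * G_{t+1}. In the minimal sketch below (illustrative only; the reward list and gamma are assumptions), a game pays off only at its final move, yet every earlier step still receives a share of the credit through discounting.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, G_1, ...] where G_t = rewards[t] + gamma * G_{t+1}.

    rewards[t] is the reward received after the action taken at step t.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):  # work backwards from the end of the episode
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# A three-move game where only the final move is rewarded (+1 for a win).
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))
```

Earlier moves receive geometrically smaller credit (roughly 0.81, 0.9, 1.0 here), which is why a smaller gamma makes an agent more short-sighted.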
  • 16. Comparison to Supervised/Unsupervised Learning ● Supervised Learning ○ Learns from a set of labeled examples provided by an “external supervisor” ○ Not applicable to learning from interaction ■ It is generally impractical to obtain examples of all situations ● Unsupervised Learning ○ Usually tries to learn structure or a data representation ○ Does not exactly match RL: RL aims to maximize a reward signal Source: David Silver, 2015
  • 17. Reinforcement Learning Agent ● Policy: the agent’s behavior function, which maps states to actions ● Value function: how good each state and/or action is ● Model: the agent’s representation of the environment
  • 18. Markov Decision Process (MDP) ● In general ○ Mathematical framework for modelling decision making ○ States, actions, transition probabilities, rewards, and a discount factor ● Relationship with RL ○ Formally describes the environment for RL when the environment is fully observable ○ Almost all RL problems can be formalized as MDPs Source: David Silver, 2015
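When an MDP is fully known, optimal values can be computed by repeatedly applying the Bellman optimality update V(s) <- max_a sum_{s'} P(s'|s,a) [R + gamma * V(s')]. The Python sketch below is value iteration, a planning method rather than learning, applied to a made-up four-state corridor (state 0 is terminal with reward -1 for entering it, state 3 is terminal with +1). RL methods such as Q-learning, on the following slides, estimate the same kind of values without being given the transition probabilities.

```python
def value_iteration(states, actions, transitions, gamma=0.9, theta=1e-8):
    """Compute optimal state values and a greedy policy for a known MDP.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    States missing from `transitions` are treated as terminal (value 0).
    """
    V = {s: 0.0 for s in states}

    def q(s, a):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])

    while True:
        delta = 0.0
        for s in transitions:                     # sweep non-terminal states
            best = max(q(s, a) for a in actions)  # Bellman optimality update
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                         # stop once values settle
            break
    policy = {s: max(actions, key=lambda a: q(s, a)) for s in transitions}
    return V, policy


def corridor():
    """Hypothetical 4-state corridor: 0 and 3 are terminal, moves are deterministic."""
    states = [0, 1, 2, 3]
    actions = ["left", "right"]

    def outcome(s, a):
        s2 = s - 1 if a == "left" else s + 1
        reward = 1.0 if s2 == 3 else (-1.0 if s2 == 0 else 0.0)
        return [(1.0, s2, reward)]

    transitions = {s: {a: outcome(s, a) for a in actions} for s in (1, 2)}
    return states, actions, transitions


V, policy = value_iteration(*corridor())
print(V, policy)
```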
  • 19. RL simple example (1) [Figure: environment with a +1 cell and a -1 cell, and a possible policy]
  • 20. RL simple example (2): Q-learning Source: https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
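The linked REINFORCEjs demo runs temporal-difference learning on a 2-D grid world. The sketch below keeps only the tabular Q-learning update, Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)], on a hypothetical one-dimensional corridor (states 0 to 4, reward -1 for entering state 0, +1 for entering state 4, every episode starting in state 2) so it fits in a few lines. The hyperparameters are illustrative choices, not values taken from the demo.

```python
import random

ACTIONS = ("left", "right")
N_STATES = 5  # states 0..4; 0 and 4 are terminal
START = 2


def step(state, action):
    """Hypothetical corridor: -1 for entering state 0, +1 for entering state 4."""
    nxt = state - 1 if action == "left" else state + 1
    if nxt == 0:
        return nxt, -1.0, True
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def greedy(q, state, rng):
    """Best action under q, breaking ties at random."""
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0, max_steps=100):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy: explore with probability epsilon.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best action in the next state
            # (terminal states have no future value).
            target = reward if done else reward + gamma * max(q[nxt].values())
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q = q_learning()
print({s: max(ACTIONS, key=q[s].get) for s in range(1, N_STATES - 1)})
```

Unlike value iteration, the agent never reads the transition rules directly: it only observes `(state, action, reward, next_state)` samples produced by `step`.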
  • 21. Examples of RL success in games (prior to deep learning) ● Backgammon: TD-Gammon (1992) ● Scrabble: Maven (2000s)
  • 22. What about Atari games? ● How can complex games be represented in an RL setting? ● Can we use Deep Learning to capture information from raw pixels?
  • 24. Deep Q-Network (DQN) ● In its basic form, Q-learning is a tabular method ○ What if it’s the first time we’re visiting a state? ● Can we use a neural network as our Q-function? ○ Yes! ○ However, RL is known to be unstable, or even to diverge, when a nonlinear function approximator (e.g., a neural network) is used ● DQN introduces techniques to address this! Source: Human-level control through deep reinforcement learning, 2015
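A table has no entry for a state it has never seen, whereas a function approximator generalizes through shared features. The sketch below swaps the neural network for the simplest approximator, a linear model Q(s, a) = w_a . phi(s) with one weight vector per action, trained with the semi-gradient Q-learning update. The `LinearQ` class and its feature vectors are hypothetical, and this is not DQN itself; it only shows why an unseen state can still receive a non-zero estimate when it shares features with states already visited.

```python
from typing import Dict, List, Sequence


class LinearQ:
    """Hypothetical linear Q-function: Q(s, a) = w_a . phi(s)."""

    def __init__(self, n_features: int, actions: List[str], alpha: float = 0.1, gamma: float = 0.99):
        self.w: Dict[str, List[float]] = {a: [0.0] * n_features for a in actions}
        self.alpha = alpha
        self.gamma = gamma

    def q(self, phi: Sequence[float], action: str) -> float:
        return sum(wi * xi for wi, xi in zip(self.w[action], phi))

    def update(self, phi, action, reward, phi_next, done) -> float:
        """One semi-gradient Q-learning step; returns the TD error."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q(phi_next, a) for a in self.w)
        td_error = target - self.q(phi, action)
        # The gradient of w_a . phi with respect to w_a is phi; the target is
        # treated as a constant (hence "semi"-gradient).
        self.w[action] = [wi + self.alpha * td_error * xi for wi, xi in zip(self.w[action], phi)]
        return td_error


agent = LinearQ(n_features=2, actions=["left", "right"], alpha=0.1)
agent.update([1.0, 2.0], "right", reward=1.0, phi_next=[0.0, 0.0], done=True)
print(agent.q([1.0, 0.0], "right"))  # 0.1: a never-visited state still gets an estimate
```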
  • 25. DQN - Overview Source: Human-level control through deep reinforcement learning, 2015
  • 26. DQN - Overview Source: Resource Management with Deep Reinforcement Learning, 2016
  • 27. DQN - Breakout Source: https://www.youtube.com/watch?v=TmPfTpjtdgg
  • 28. DQN Source: Human-level control through deep reinforcement learning, 2015
  • 29. DQN ● A single architecture can successfully learn control policies in a range of different environments ● Combines deep network architectures with reinforcement learning ○ Experience replay ○ Target network: makes the algorithm more stable ● Limitations ○ Games that demand more temporally extended strategies are still a great challenge Source: Human-level control through deep reinforcement learning, 2015
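The two stabilizing ingredients on this slide can be sketched independently of any network. `ReplayBuffer` stores transitions and samples them uniformly at random, breaking the correlation between consecutive frames; `dqn_targets` computes y = r + gamma * max_a' Q_target(s', a') using a separate, periodically refreshed copy of the parameters kept by `TargetSync`. All three names, the capacity, and the sync period are hypothetical; in the Nature paper these pieces wrap a convolutional Q-network trained on Atari frames.

```python
import copy
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-capacity memory of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)  # oldest transitions fall off first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random minibatches break the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def dqn_targets(batch, target_q, gamma=0.99):
    """y = r if the episode ended, else r + gamma * max_a' Q_target(s', a').

    `target_q(state)` returns a list of action values computed with the frozen
    target parameters, not the parameters currently being trained.
    """
    return [t.reward if t.done else t.reward + gamma * max(target_q(t.next_state))
            for t in batch]


class TargetSync:
    """Frozen copy of the online parameters, refreshed every `period` steps."""

    def __init__(self, online_params, period):
        self.period = period
        self.steps = 0
        self.params = copy.deepcopy(online_params)

    def step(self, online_params):
        self.steps += 1
        if self.steps % self.period == 0:
            self.params = copy.deepcopy(online_params)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.push(i, 0, 1.0, i + 1, done=(i == 4))
batch = buffer.sample(2)
print(len(buffer), dqn_targets(batch, target_q=lambda s: [0.0, 2.0], gamma=0.5))
```

Keeping the target parameters fixed between syncs means the regression target does not shift with every gradient step, which is the stabilizing effect the slide refers to.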
  • 30. Go ● Number of possible positions: ~10^170! ● Branching factor: average is 250!
  • 31. AlphaGo [Figure: policy network and value network]
  • 32. AlphaGo - Training Pipeline (simplified) Source: Mastering the game of Go with deep neural networks and tree search, 2016
  • 33. AlphaGo - Monte Carlo Tree Search (MCTS) Source: Mastering the game of Go with deep neural networks and tree search, 2016
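AlphaGo's search builds on Monte Carlo Tree Search, which repeats four phases: selection, expansion, simulation (rollout), and backpropagation. The sketch below is plain UCT (the UCB1 rule applied to trees) with purely random rollouts on a toy game, single-pile Nim, where each player removes 1 to 3 stones and whoever takes the last stone wins. The game, the exploration constant, and all names are illustrative assumptions; AlphaGo additionally uses its policy network to bias selection and its value network to evaluate leaves, which this sketch does not do.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones  # stones left; the player to move is implicit
        self.parent = parent
        self.move = move      # stones taken to reach this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0       # wins for the player who made `move`


def uct_select(node, c=1.4):
    """UCB1: average win rate plus an exploration bonus for rarely tried children."""
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def rollout(stones, rng):
    """Random playout; 1.0 if the player to move at `stones` wins, else 0.0."""
    if stones == 0:
        return 0.0  # the previous player took the last stone
    mover_is_us = True
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1.0 if mover_is_us else 0.0
        mover_is_us = not mover_is_us


def mcts(stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: result for the player to move at `node`.
        result = rollout(node.stones, rng)
        # 4. Backpropagation: flip perspective at every level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 - result  # credit the player who moved into `node`
            result = 1.0 - result
            node = node.parent
    # The most-visited move is the usual final choice.
    return max(root.children, key=lambda ch: ch.visits).move


print(mcts(5))
```

With enough iterations the chosen move matches Nim's known strategy of leaving the opponent a multiple of four stones (take 1 from 5, 2 from 6, 3 from 7), even though the search is never told that rule.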
  • 34. AlphaGo - Results ● Played against Lee Sedol in March 2016 ● Lee has won 18 world titles ● AlphaGo won the match 4-1 AlphaGo Documentary (Netflix)
  • 35. AlphaZero (as per David Silver’s NIPS talk) ● No human data ○ Learns solely through self-play reinforcement learning, starting from random play ● No human features ○ Takes only the raw board as input ● Single neural network ○ Policy and value networks are combined into one ● Simplified search ○ No Monte Carlo rollouts; the neural network evaluates positions instead Source: 2017 NIPS Keynote by DeepMind's David Silver
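The "single neural network" point can be written down compactly. In the AlphaGo Zero and AlphaZero papers, one network f_θ maps a position s to move probabilities p and a scalar value v, and both heads are trained jointly on self-play data with the loss below, where z is the final game outcome, π is the move distribution produced by MCTS, and c weights an L2 penalty on the parameters θ. This is a summary of the published objective; the papers give the exact training details.

```latex
(\mathbf{p}, v) = f_\theta(s), \qquad
l = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c \,\lVert \theta \rVert^2
```

The first term fits the value head to game outcomes, the second pushes the policy head toward the search's improved move distribution, so the search acts as the "teacher" for the network it relies on.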
  • 36. AlphaZero (as per David Silver’s NIPS talk) Source: 2017 NIPS Keynote by DeepMind's David Silver
  • 38. Dota 2 ● Real-time strategy (RTS) game ○ More precisely, a subgenre called multiplayer online battle arena (MOBA) ● Two teams of five players, where each player controls a hero ● The main goal is to destroy the opposing team’s base ● Lots of challenges for RL
  • 39. Dota 2 - Challenges for RL ● Long time horizons ○ 30 fps for 45 minutes ● Partially observed state ○ Only part of the map is visible ○ The agent needs to make inferences from incomplete data ● High-dimensional, continuous action space ○ Discretized into 170,000 possible actions ○ ~1,000 valid actions at any given moment ● High-dimensional, continuous observation space ○ State: 20,000 numbers Source: https://blog.openai.com/openai-five/
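The "long time horizons" bullet is easy to quantify. The sketch below multiplies out 30 fps over 45 minutes and, assuming for illustration that the agent acts on every 4th frame, converts discount factors into a reward "half-life": the number of decisions k after which gamma^k falls to 0.5. The gamma values are illustrative rather than taken from the slide; the point is that a common choice such as gamma = 0.99 only looks about ten seconds ahead in a 45-minute game.

```python
import math

FPS = 30         # frames per second (from the slide)
MINUTES = 45     # game length (from the slide)
FRAME_SKIP = 4   # assumption: the agent acts on every 4th frame

frames = FPS * 60 * MINUTES              # 81,000 frames per game
decisions = frames // FRAME_SKIP         # 20,250 decisions per game
decisions_per_second = FPS / FRAME_SKIP  # 7.5


def half_life_steps(gamma):
    """Number of decisions k after which gamma**k has decayed to 0.5."""
    return math.log(0.5) / math.log(gamma)


for gamma in (0.99, 0.998, 0.9997):  # illustrative discount factors
    k = half_life_steps(gamma)
    print(f"gamma={gamma}: half-life ~{k:.0f} decisions ~{k / decisions_per_second:.0f} s")
```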
  • 40. Dota 2 - OpenAI Five ● Each hero is controlled by a 1024-unit LSTM ● Reads the game state through Valve’s Bot API ● Learns entirely from self-play ● Uses Proximal Policy Optimization (PPO) for training Source: https://blog.openai.com/openai-five/
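PPO limits how far each update can move the policy by clipping the probability ratio r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) inside its surrogate objective, L = mean_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)]. The sketch below evaluates that objective for a batch, assuming log-probabilities and advantage estimates A_t have already been computed. It is the textbook clipped objective from the PPO paper, not OpenAI Five's large-scale training system, and eps = 0.2 is a common default rather than a value from the slides.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log pi(a_t | s_t) under the current and the data-collecting
    policy; advantages: estimates A_t. A training loop would minimize its negative.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                  # r_t
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)    # clip(r_t, 1-eps, 1+eps)
        # Taking the minimum removes any incentive to push r_t outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Ratio 1.5 with a positive advantage is capped at 1.2; ratio 0.5 with a
# negative advantage is pessimistically scored as 0.8 * A.
print(ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))
```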
  • 41. Dota 2 - OpenAI Five ● Simplified version of the game (not all heroes available, some tactics removed) ● Played against a team of 99.95th-percentile Dota players ○ Four of them have played professionally ● 3 games ○ OpenAI Five won the 1st and 2nd ○ 3rd: the audience was asked to choose the heroes ■ The AI predicted a 2.9% chance of winning Source: https://blog.openai.com/openai-five/
  • 42. OpenAI Five -> Dexterity ● Robot hand that can manipulate physical objects ● Uses the same RL algorithm as OpenAI Five
  • 43. Other examples ● StarCraft II ● Battlefield
  • 45. Take-home message ● Reinforcement Learning is a hot topic ● The combination of RL and Deep Learning is producing great results ● Games are a great proxy for developing solutions to real-world problems ○ Many challenges are still far from being solved ● What about an RL agent that plays against you and adapts to your way of playing?
  • 46. Thank you! August 16, 2018 - #10 Porto Alegre Machine Learning Meetup Thomas Paula ● @tsp_thomas ● tsp.thomas@gmail.com