SlideShare a Scribd company logo
Playing the Snake Game with
Deep Reinforcement Learning
Chuyang Liu
https://github.com/chuyangliu/snake
Contents
01
02
03
04
Background
My Works
Algorithms
Experiments
05 Summary
Background
2015 2016 2017
DQN Optimizations
DDQN, Prioritized, Duel
Human-Level Control through Deep
Reinforcement Learning (DeepMind)
Deep Q-Network (DQN)
Snake AI
(Hawstein, Tobin Ehlis,
Christopher Lockhart, …)
Graph Theory, GP, RL, …
Snake AI (Chuyang Liu)
Path Search, Hamilton Cycle
Benchmark: Atari 2600 BFS, Longest Path
My Works
?
Designed global/local state vector
Discussed absolute/relative move direction
Feature Extraction
Implemented Deep Q-Network
(Experience Replay & Fixed Target-Net)
DQN
Compared the performance of
the three optimizations for DQN
DQN Optimizations
Designed a better reward
policy improving performance
Environment
𝜽 ← 𝜽 + 𝜼
𝝏
𝝏𝜽
𝒓 + 𝜸𝐦𝐚𝐱
𝒂′
𝑸 𝒕𝒂𝒓𝒈𝒆𝒕 𝒔′
, 𝒂′
; 𝜽′
− 𝑸 𝒆𝒗𝒂𝒍(𝒔, 𝒂; 𝜽)
𝟐
Policy
𝑎 ← 𝜋 𝑠
Eval-NetTarget-Net
Q-Target Q-Eval
Loss
state
action
state’reward
Environment
𝑟, 𝑠′
← 𝑒𝑛𝑣 𝑠, 𝑎
Algorithm: Computation Graph
assign
Environment
𝑟, 𝑠′
← 𝑒𝑛𝑣 𝑠, 𝑎
Algorithm: Environment
FOOD +1.0
DEAD -0.5 (game over)
ELSE -0.005
𝜽 ← 𝜽 + 𝜼
𝝏
𝝏𝜽
𝒓 + 𝜸𝐦𝐚𝐱
𝒂′
𝑸 𝒕𝒂𝒓𝒈𝒆𝒕 𝒔′
, 𝒂′
; 𝜽′
− 𝑸 𝒆𝒗𝒂𝒍(𝒔, 𝒂; 𝜽)
𝟐
Policy
𝑎 ← 𝜋 𝑠
state
action
state’reward
𝜽 ← 𝜽 + 𝜼
𝝏
𝝏𝜽
𝒓 + 𝜸𝐦𝐚𝐱
𝒂′
𝑸 𝒕𝒂𝒓𝒈𝒆𝒕 𝒔′
, 𝒂′
; 𝜽′
− 𝑸 𝒆𝒗𝒂𝒍(𝒔, 𝒂; 𝜽)
𝟐
Policy
𝑎 ← 𝜋 𝑠
Eval-NetTarget-Net
Q-Target Q-Eval
Loss
state
action
state’reward
Environment
𝑟, 𝑠′
← 𝑒𝑛𝑣 𝑠, 𝑎
Algorithm: Computation Graph
assign
Eval-NetTarget-Net
Algorithm: Network & State Vector
assign
Target-Net
Algorithm: Network & State Vector
global (8x8x4)
local (1x3)
Conv3x3x32
Conv3x3x64
Conv2x2x128
Conv2x2x256
FC1024
FC512
Advantage1x3 Value1x1
Q1x3
FC512
Eval-Net
Same structure
but different weights and bias
assign
Experiments
Experiments
Summary
Double DQN
Prioritized Experience Replay
Dueling Network Structure
DQN
Feature Extraction
Environment Feedback
State & Action
Human experience
Pixels input
Network Inputs
Deep Neural Network
Deep Q-Learning from
Demonstrations (2018)
Training Efficiency
Thanks
Chuyang Liu
https://github.com/chuyangliu/snake

More Related Content

What's hot

Software Engineering an Introduction
Software Engineering an IntroductionSoftware Engineering an Introduction
Software Engineering an Introduction
Ajit Nayak
 
Data Structures Chapter-2
Data Structures Chapter-2Data Structures Chapter-2
Data Structures Chapter-2
priyavanimurugarajan
 
Planning Agent
Planning AgentPlanning Agent
Planning Agent
Shiwani Gupta
 
Np cooks theorem
Np cooks theoremNp cooks theorem
Np cooks theorem
Narayana Galla
 
POST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMPOST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEM
Rajendran
 
Forward and Backward chaining in AI
Forward and Backward chaining in AIForward and Backward chaining in AI
Forward and Backward chaining in AI
Megha Sharma
 
Hill climbing algorithm
Hill climbing algorithmHill climbing algorithm
Hill climbing algorithm
Dr. C.V. Suresh Babu
 
Lisp
LispLisp
Data Structures Chapter-4
Data Structures Chapter-4Data Structures Chapter-4
Data Structures Chapter-4
priyavanimurugarajan
 
AI Lecture 4 (informed search and exploration)
AI Lecture 4 (informed search and exploration)AI Lecture 4 (informed search and exploration)
AI Lecture 4 (informed search and exploration)
Tajim Md. Niamat Ullah Akhund
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
Krish_ver2
 
Heap sort
Heap sort Heap sort
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
Dr. C.V. Suresh Babu
 
Uninformed search /Blind search in AI
Uninformed search /Blind search in AIUninformed search /Blind search in AI
Uninformed search /Blind search in AI
Kirti Verma
 
ML - Multiple Linear Regression
ML - Multiple Linear RegressionML - Multiple Linear Regression
ML - Multiple Linear Regression
Andrew Ferlitsch
 
Introduction to Data Structure
Introduction to Data Structure Introduction to Data Structure
Introduction to Data Structure
Prof Ansari
 
Planning
PlanningPlanning
Planning
ahmad bassiouny
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real World
ahmad bassiouny
 
Closure properties
Closure propertiesClosure properties
Closure properties
Dr. ABHISHEK K PANDEY
 
Theory of computation / Post’s Correspondence Problems (PCP)
Theory of computation / Post’s Correspondence Problems (PCP)Theory of computation / Post’s Correspondence Problems (PCP)
Theory of computation / Post’s Correspondence Problems (PCP)
Technical Advisor at Iraqi Government
 

What's hot (20)

Software Engineering an Introduction
Software Engineering an IntroductionSoftware Engineering an Introduction
Software Engineering an Introduction
 
Data Structures Chapter-2
Data Structures Chapter-2Data Structures Chapter-2
Data Structures Chapter-2
 
Planning Agent
Planning AgentPlanning Agent
Planning Agent
 
Np cooks theorem
Np cooks theoremNp cooks theorem
Np cooks theorem
 
POST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMPOST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEM
 
Forward and Backward chaining in AI
Forward and Backward chaining in AIForward and Backward chaining in AI
Forward and Backward chaining in AI
 
Hill climbing algorithm
Hill climbing algorithmHill climbing algorithm
Hill climbing algorithm
 
Lisp
LispLisp
Lisp
 
Data Structures Chapter-4
Data Structures Chapter-4Data Structures Chapter-4
Data Structures Chapter-4
 
AI Lecture 4 (informed search and exploration)
AI Lecture 4 (informed search and exploration)AI Lecture 4 (informed search and exploration)
AI Lecture 4 (informed search and exploration)
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Heap sort
Heap sort Heap sort
Heap sort
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
Uninformed search /Blind search in AI
Uninformed search /Blind search in AIUninformed search /Blind search in AI
Uninformed search /Blind search in AI
 
ML - Multiple Linear Regression
ML - Multiple Linear RegressionML - Multiple Linear Regression
ML - Multiple Linear Regression
 
Introduction to Data Structure
Introduction to Data Structure Introduction to Data Structure
Introduction to Data Structure
 
Planning
PlanningPlanning
Planning
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real World
 
Closure properties
Closure propertiesClosure properties
Closure properties
 
Theory of computation / Post’s Correspondence Problems (PCP)
Theory of computation / Post’s Correspondence Problems (PCP)Theory of computation / Post’s Correspondence Problems (PCP)
Theory of computation / Post’s Correspondence Problems (PCP)
 

Similar to Playing the Snake Game with Deep Reinforcement Learning (by Chuyang Liu)

Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
Khaled Saleh
 
Lec3 dqn
Lec3 dqnLec3 dqn
Lec3 dqn
Ronald Teo
 
Chakrabarti alpha go analysis
Chakrabarti alpha go analysisChakrabarti alpha go analysis
Chakrabarti alpha go analysis
Dave Selinger
 
Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...
Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...
Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...
Wee Hyong Tok
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Wee Hyong Tok
 
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
YeChan(Paul) Kim
 
Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningDueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning
Yoonho Lee
 
ddpg seminar
ddpg seminarddpg seminar
ddpg seminar
민재 정
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
Vijay Srinivas Agneeswaran, Ph.D
 
deep reinforcement learning with double q learning
deep reinforcement learning with double q learningdeep reinforcement learning with double q learning
deep reinforcement learning with double q learning
SeungHyeok Baek
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
Tim Riser
 
Self-Tuning and Managing Services
Self-Tuning and Managing ServicesSelf-Tuning and Managing Services
Self-Tuning and Managing Services
Reza Rahimi
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
Preferred Networks
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
Shunta Saito
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
Ben Ball
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
Julian Hyde
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle Introduction
Matthias Zimmermann
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
Amazon Web Services
 
RLTopics_2021_Lect1.pdf
RLTopics_2021_Lect1.pdfRLTopics_2021_Lect1.pdf
RLTopics_2021_Lect1.pdf
NaveenKumarSingh57
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
재연 윤
 

Similar to Playing the Snake Game with Deep Reinforcement Learning (by Chuyang Liu) (20)

Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
 
Lec3 dqn
Lec3 dqnLec3 dqn
Lec3 dqn
 
Chakrabarti alpha go analysis
Chakrabarti alpha go analysisChakrabarti alpha go analysis
Chakrabarti alpha go analysis
 
Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...
Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...
Bootstrap Custom Image Classification using Transfer Learning by Danielle Dea...
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
 
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
 
Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningDueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning
 
ddpg seminar
ddpg seminarddpg seminar
ddpg seminar
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
 
deep reinforcement learning with double q learning
deep reinforcement learning with double q learningdeep reinforcement learning with double q learning
deep reinforcement learning with double q learning
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
 
Self-Tuning and Managing Services
Self-Tuning and Managing ServicesSelf-Tuning and Managing Services
Self-Tuning and Managing Services
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle Introduction
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
 
RLTopics_2021_Lect1.pdf
RLTopics_2021_Lect1.pdfRLTopics_2021_Lect1.pdf
RLTopics_2021_Lect1.pdf
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
 

Recently uploaded

Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 

Recently uploaded (20)

Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 

Playing the Snake Game with Deep Reinforcement Learning (by Chuyang Liu)

  • 1. Playing the Snake Game with Deep Reinforcement Learning Chuyang Liu https://github.com/chuyangliu/snake
  • 3. Background 2015 2016 2017 DQN Optimizations DDQN, Prioritized, Duel Human-Level Control through Deep Reinforcement Learning (DeepMind) Deep Q-Network (DQN) Snake AI (Hawstein, Tobin Ehlis, Christopher Lockhart, …) Graph Theory, GP, RL, … Snake AI (Chuyang Liu) Path Search, Hamilton Cycle Benchmark: Atari 2600 BFS, Longest Path
  • 4. My Works ? Designed global/local state vector Discussed absolute/relative move direction Feature Extraction Implemented Deep Q-Network (Experience Replay & Fixed Target-Net) DQN Compared the performance of the three optimizations for DQN DQN Optimizations Designed a better reward policy improving performance Environment
  • 5. 𝜽 ← 𝜽 + 𝜼 𝝏 𝝏𝜽 𝒓 + 𝜸𝐦𝐚𝐱 𝒂′ 𝑸 𝒕𝒂𝒓𝒈𝒆𝒕 𝒔′ , 𝒂′ ; 𝜽′ − 𝑸 𝒆𝒗𝒂𝒍(𝒔, 𝒂; 𝜽) 𝟐 Policy 𝑎 ← 𝜋 𝑠 Eval-NetTarget-Net Q-Target Q-Eval Loss state action state’reward Environment 𝑟, 𝑠′ ← 𝑒𝑛𝑣 𝑠, 𝑎 Algorithm: Computation Graph assign
  • 6. Environment 𝑟, 𝑠′ ← 𝑒𝑛𝑣 𝑠, 𝑎 Algorithm: Environment FOOD +1.0 DEAD -0.5 (game over) ELSE -0.005 𝜽 ← 𝜽 + 𝜼 𝝏 𝝏𝜽 𝒓 + 𝜸𝐦𝐚𝐱 𝒂′ 𝑸 𝒕𝒂𝒓𝒈𝒆𝒕 𝒔′ , 𝒂′ ; 𝜽′ − 𝑸 𝒆𝒗𝒂𝒍(𝒔, 𝒂; 𝜽) 𝟐 Policy 𝑎 ← 𝜋 𝑠 state action state’reward
  • 7. 𝜽 ← 𝜽 + 𝜼 𝝏 𝝏𝜽 𝒓 + 𝜸𝐦𝐚𝐱 𝒂′ 𝑸 𝒕𝒂𝒓𝒈𝒆𝒕 𝒔′ , 𝒂′ ; 𝜽′ − 𝑸 𝒆𝒗𝒂𝒍(𝒔, 𝒂; 𝜽) 𝟐 Policy 𝑎 ← 𝜋 𝑠 Eval-NetTarget-Net Q-Target Q-Eval Loss state action state’reward Environment 𝑟, 𝑠′ ← 𝑒𝑛𝑣 𝑠, 𝑎 Algorithm: Computation Graph assign
  • 9. Target-Net Algorithm: Network & State Vector global (8x8x4) local (1x3) Conv3x3x32 Conv3x3x64 Conv2x2x128 Conv2x2x256 FC1024 FC512 Advantage1x3 Value1x1 Q1x3 FC512 Eval-Net Same structure but different weights and bias assign
  • 12. Summary Double DQN Prioritized Experience Replay Dueling Network Structure DQN Feature Extraction Environment Feedback State & Action Human experience Pixels input Network Inputs Deep Neural Network Deep Q-Learning from Demonstrations (2018) Training Efficiency