>>> Building Intelligent Agents using Deep Reinforcement Learning
@aliostad
Ali Kheyrollahi, ASOS
/// Do you has teh codez?
> Slides will be published - check @aliostad
> github page:
https://github.com/aliostad/hexagon-rl
: @aliostad
email: the same @gmail.com
http://byterot.blogspot.com
Ali Kheyrollahi,
Solutions Architect at ASOS
/// Take-aways
> mini-history of Reinforcement Learning (RL)
> Basics
> Representations and models
> Putting it all together in Hexagon
/// Deep Learning
/// Deep Learning basics
> Bunch of techniques to overcome 80s problems:
- Overfitting: DropOut Layers
- Curse of Dimensionality: MOAR data!
- Better training and optimisation techniques
- GPUs and parallel computing to speed-up training
> Multi-layer neural network described back in 1950s
> Type of layers, # of units and activation function
/// supervised learning
/// unsupervised learning
Clustering
GAN
word2vec
king + woman - man = queen
/// 2013 - Atari
“We apply our method to seven Atari 2600
games from the Arcade Learning
Environment, with no adjustment of the
architecture or learning algorithm. We find
that it outperforms all previous approaches
on six of the games and surpasses a human
expert on three of them.”
> DeepMind
/// 2015 - Go
> DeepMind
Live reactions to move 37 (AlphaGo vs Lee Sedol, March 2016)
/// 2016 - Doom
/// 2017 - Dota2
> OpenAI
/// Late 2017 - Chess
> DeepMind
Grandmaster Daniel King on
AlphaZero’s game 10 against Stockfish
https://www.youtube.com/watch?v=Lfkam_oLLM8
/// 1992
Gerald Tesauro - IBM
> TD-Gammon
> Using Temporal Difference Learning (TD-Lambda)
> Neural Networks
> Training using Self-Play
> Value Function
/// harry klopf
Harry Klopf
Marvin Minsky
Alan Turing
/// research grant
Rich Sutton, Andrew Barto
> “Goal seeking components for Adaptive Intelligence” 1977
> Cybernetics Center for Systems Neuroscience at the University of Massachusetts Amherst
> “Synthesis of Nonlinear Control Surfaces by a Layered Associative Network” 1981
/// progress
1982
1998
/// reinforcement learning
> Neuroscience + Psychology + Control Theory
> Learning with a Critic
Diagram: the AGENT sends an Action to the ENVIRONMENT; the ENVIRONMENT returns an Observation (state) and a Reward to the AGENT
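The loop on this slide can be sketched in a few lines of plain Python. This is only an illustration: the `Corridor` environment, its reward values and the `run_episode` helper are hypothetical names invented here, not part of the talk or of any library.

```python
class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # observation (state)

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 10.0 if done else -1.0  # assumed reward scheme
        return self.pos, reward, done


def run_episode(env, policy):
    """Agent/environment loop: observe, act, collect reward, repeat."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total


# An agent that always moves right needs four steps to reach the goal.
total = run_episode(Corridor(), lambda state: +1)
```

The agent only ever sees what `reset` and `step` return, which is the same contract the Gym-style libraries later in the deck rely on.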
/// markov decision process
Markov Decision Process - Wikipedia
/// Value Function v(s)
v(s1) = v(s0) - R
/// Temporal Difference (TD)
if the error is zero => reward = v(s) - γ v(s’)
where γ is the discount factor
Predictive Reward Signal of Dopamine Neurons - Wolfram Schultz 1998
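A minimal sketch of the TD idea, assuming the simplest variant, TD(0), on a made-up three-state chain. The function name `td0`, the learning rate `alpha` and the episode format are illustrative choices, not taken from the talk.

```python
def td0(episodes, n_states, alpha=0.1, gamma=0.9):
    """Tabular TD(0): nudge v(s) towards reward + gamma * v(s')."""
    v = [0.0] * n_states
    for episode in episodes:
        for s, reward, s_next, done in episode:
            # a terminal state has no future value to bootstrap from
            target = reward + (0.0 if done else gamma * v[s_next])
            td_error = target - v[s]
            v[s] += alpha * td_error
    return v


# Hypothetical chain 0 -> 1 -> 2 (terminal): reward 0 on the first step, 1 on the second.
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
values = td0([episode] * 2000, n_states=3)
```

Once the values settle, the TD error is zero and the relation on the slide holds: for the first step the reward 0 equals v(0) - γ v(1).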
/// Monte-Carlo Tree Search (MCTS)
In MCTS, γ is 1!
/// Q-Learning
> A form of TD Learning
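> Uses a Q function that estimates the expected return of each action in a state; the policy turns these values into a probability distribution for actions to be drawn from
Action:      R    L    U    D    F    N
Probability: 0.1  0.2  0.5  0.1  0.0  0.1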
Explore vs Exploit
(Greediness)
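To make the explore/exploit trade-off concrete, here is a hedged sketch of tabular Q-learning with an epsilon-greedy policy. The corridor task, the hyperparameters and the function names are assumptions made for this example only.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q-values for a corridor whose last cell is the goal (reward 1)."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = min(max(s + moves[a], 0), goal)
            reward = 1.0 if s_next == goal else 0.0
            best_next = 0.0 if s_next == goal else max(q[s_next])
            # Q-learning update: the TD error uses the best next action
            q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
            s = s_next
    return q


q = q_learning()
greedy = [max(range(2), key=row.__getitem__) for row in q[:-1]]
```

With `epsilon=0` the agent would never try moving right and would not find the goal; the random actions are what let value flow back from the goal to the start.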
/// Deep Q Network (DQN)
> Proposed by Atari paper (DeepMind) in 2013
> Uses a Deep Network to map a state to a Q-value per action and uses the Q-Learning error to train it
> Double Q-learning variant (DeepMind 2015)
> Duelling Networks variant (DeepMind 2015)
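The difference between the plain DQN target and the Double Q-learning variant is easiest to see on the training target itself. The sketch below uses plain lists in place of network outputs; the function names and the example numbers are invented for illustration.

```python
def dqn_target(reward, done, q_next_target, gamma=0.99):
    """DQN target: bootstrap from the largest Q-value of the target network."""
    if done:
        return reward
    return reward + gamma * max(q_next_target)


def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    return reward + gamma * q_next_target[best]


# Hypothetical Q-values for the next state from the two networks.
q_online = [1.0, 2.0, 0.5]
q_target = [1.5, 1.0, 3.0]
vanilla = dqn_target(1.0, False, q_target, gamma=0.9)
double = double_dqn_target(1.0, False, q_online, q_target, gamma=0.9)
```

Here the plain target bootstraps from 3.0 while the double variant bootstraps from 1.0, which shows how decoupling selection from evaluation damps over-optimistic estimates.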
/// meet lunar-lander!
> State: (8,)
> Action: (4,)
> Rewards:
- leg touchdown: +10
- crash: -100
- rest: +100
- solve: 200
- main engine: -0.3
Part of OpenAI’s gym
/// keras-rl
> based on OpenAI Gym’s agent/environment interface
> Supports DQN (and its variants), CEM, SARSA and DDPG algorithms
> Upcoming: ACER, A2C/A3C, PPO, etc.
> Works with any Keras model as long as the input/output shapes match. “Bring Your Own Models”
/// DQN in keras-rl - 1
/// DQN in keras-rl - 2
Diagram: INPUT [0.8, 0.9,…-0.3] -> FLATTEN -> DENSE -> DENSE -> DENSE -> OUTPUT [0, 0, 1, 0]
/// lunar-lander with DQN
/// hexagon
> Mainly a coding challenge (playhexagon.com)
> Danske Bank (Vidas)
> A round-based strategy game for 2 or more players: each starts with one cell and tries to gradually occupy the board, or to hold more cells when time runs out.
/// hexagon - start
/// hexagon - expansion
Transferring 70 resources
from seed cell to the
adjacent neutral cell
/// hexagon - increments
Maroon also transfers 70
resources from seed cell to
its adjacent neutral cell.
All occupied cells get +1
resource unless they have
100 or more resources.
+1
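The per-round increment rule can be written as a one-line function. This is a sketch of the rule as stated on the slide; the dictionary layout and the name `increment` are assumptions made here.

```python
def increment(resources, cap=100):
    """Give every occupied cell +1 resource unless it already holds `cap` or more.

    `resources` maps a (hypothetical) cell id to the resources of an occupied cell.
    """
    return {cell: (r if r >= cap else r + 1) for cell, r in resources.items()}


after_round = increment({"seed": 30, "full": 100, "almost": 99})
```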
/// hexagon - attack
Transferring 40 resources
from the cell having 58 to
the adjacent enemy cell
having 16 results in own
having 18 and the attacked
cell 40-16=24.
/// hexagon - boost
Transferring 50 resources from
the cell having 100 to the friendly
cell having 4 results in own
having 50 and the boosted cell
4+50=54.
This helps the cell to protect
against neighbouring enemy
cells having 20 and 25
resources.
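The attack and boost arithmetic from the two worked examples can be captured in one small function. This is a sketch, not the game's actual code: the name `transfer`, the validation, and the handling of an attack that does not exceed the defender (the defender simply keeps the difference) are assumptions, since the slides only show a successful attack.

```python
def transfer(source, target, amount, same_owner):
    """Return (source_after, target_after, captured) for one transfer.

    source/target are the resource counts of the two cells before the move.
    """
    if not 0 < amount <= source:
        raise ValueError("amount must be between 1 and the source cell's resources")
    if same_owner:
        # boost: resources are simply added to the friendly cell
        return source - amount, target + amount, False
    if amount > target:
        # successful attack: the cell changes hands with the surplus
        return source - amount, amount - target, True
    # assumption: a failed attack only weakens the defender
    return source - amount, target - amount, False


attack = transfer(58, 16, 40, same_owner=False)
boost = transfer(100, 4, 50, same_owner=True)
```

The two calls reproduce the numbers on the slides: the attacker is left with 18 and takes the cell with 24, and the boosted cell ends up with 54.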
/// hexagon - neighbourhood
/// hexagon - gameplay
/// hexagon - strategies
attack all the things! defend…build a wall
flooding
/// hexagon - what to do?
> Attack? From which
cell to which cell?
> Re-inforcements?
> How many resources?
/// hexagon - heuristics
import math

# Excerpt from inside the cell class; safeMin/safeMax are helpers defined elsewhere (not shown on the slide)
# how suitable is a cell for attacking
self.attackPotential = (
    self.resources
    * math.sqrt(max(self.resources - safeMin([n.resources for n in self.nonOwns]), 1))
    / math.log(sum([n.resources for n in self.enemies], 1) + 1, 5)
)
# how suitable is a cell for receiving boost
self.boostFactor = (
    math.sqrt(sum((n.resources for n in self.enemies), 1))
    * safeMax([n.resources for n in self.enemies], 1)
    / (self.resources + 1)
)

def getGivingBoostSuitability(self):
    # how suitable is a cell for giving boost; a full cell (100) gets a 1.7x bonus
    return ((self.depth + 1) * math.sqrt(self.resources + 1)
            * (1.7 if self.resources == 100 else 1))
/// hexagon - heuristics
/// hexagon - heuristics??
> Score functions are arbitrary: they do not necessarily represent the underlying mechanics of the game
> No easy way to learn the parameters, and testing all combinations is impossible
> When it does not work, it is hard to know which parameter to tune.
Got to be a better way…
self.attackPotential = (
    self.resources
    * math.sqrt(max(self.resources - safeMin([n.resources for n in self.nonOwns]), 1))
    / math.log(sum([n.resources for n in self.enemies], 1) + 1, 5)
)
/// hexagon - representation
/// hexagon - cell representation
> Own cells are represented by a positive integer (the resource count), enemy cells by a negative integer, neutral cells by zero
> Feature extraction: for every cell extract
- sum/max/min friendly cells
- sum/max/min enemy cells
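A sketch of the signed cell encoding and the sum/max/min features. It assumes the features are computed over a cell's neighbours, which the slide does not state explicitly; `encode_cell` and `neighbour_features` are hypothetical names, not functions from the hexagon-rl repo.

```python
def encode_cell(owner, resources, me):
    """Signed encoding: positive for own cells, negative for enemy cells, 0 for neutral."""
    if owner is None:
        return 0
    return resources if owner == me else -resources


def neighbour_features(neighbours):
    """Return (sum, max, min) of friendly resources followed by (sum, max, min) of enemy resources.

    `neighbours` is a list of signed cell values (assumption: the cell's adjacent cells).
    """
    friendly = [v for v in neighbours if v > 0]
    enemy = [-v for v in neighbours if v < 0]

    def stats(values):
        return (sum(values), max(values), min(values)) if values else (0, 0, 0)

    return stats(friendly) + stats(enemy)


features = neighbour_features([10, -1, 0, 25, -43])
```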
/// hexagon - board representation
> Flattened: array of cells
> 2D representation so that we can use Convolutional
Neural Network. Hexagon => Grid
10 -1 0 0 25 -43 -12 3 0 -9
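One common way to turn a hexagonal board into a grid a convolutional network can consume is to shift axial coordinates into a square array and pad the unused corners. The sketch below assumes axial coordinates (q, r) centred on the board; the repo may well use a different mapping, and the function names are invented here.

```python
def hex_to_grid(cells, radius, fill=0):
    """Map axial hex coordinates (q, r) onto a (2*radius+1) x (2*radius+1) grid.

    `cells` maps (q, r) to a signed cell value; grid corners outside the hexagon keep `fill`.
    """
    size = 2 * radius + 1
    grid = [[fill] * size for _ in range(size)]
    for (q, r), value in cells.items():
        if max(abs(q), abs(r), abs(q + r)) > radius:
            raise ValueError(f"cell {(q, r)} is outside a board of radius {radius}")
        grid[r + radius][q + radius] = value
    return grid


def flatten(grid):
    """Flattened representation: the grid rows laid end to end."""
    return [value for row in grid for value in row]


grid = hex_to_grid({(0, 0): 10, (1, 0): -1, (0, -1): 25}, radius=1)
flat = flatten(grid)
```

The same grid serves both representations on the slide: `grid` feeds a Conv2D model and `flat` feeds a dense one.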
/// hexagon - model
> Pure RL models: DQN, PPO, etc
> AlphaZero: Monte Carlo Tree Search (MCTS) + RL
models
/// hexagon - decision tree
Hierarchy of Models and Game Rules
> Centaur: replacing parts of the heuristic-based man-made agent with machine-learning
- selecting attacker or boosting cell
- choosing attack/boost resources
/// hexagon - repo
> DQN
> AlphaZero (Monte Carlo Tree Search)
> DDPG
> PPO
/// hexagon - alphazero
> Cell representation: 1, -1 and 0 for friendly,
enemy and neutral cells.
> Board representation: Grid mapping of Hexagon
> Action representation: flattened board with 1 for
cells that can attack or boost.
> Deep Learning Model: choice of flat or Conv2D
> Resource quantization: actions include resource
proportions
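The action mask and resource quantization could look roughly like this. The encoding is an assumption made for illustration (every own cell may act, and each cell gets `levels` actions covering equal resource proportions); it is not taken from the hexagon-rl code, and both function names are hypothetical.

```python
def valid_actions(flat_board, levels=4):
    """Mask over the flattened board: 1 where a cell can attack or boost.

    Assumption: any own cell (value > 0) can act, with `levels` actions per cell.
    """
    mask = []
    for value in flat_board:
        mask.extend([1 if value > 0 else 0] * levels)
    return mask


def decode_action(index, levels=4):
    """Split an action index into (cell index, proportion of resources to send)."""
    cell, level = divmod(index, levels)
    return cell, (level + 1) / levels


mask = valid_actions([1, -1, 0], levels=2)
cell, proportion = decode_action(5, levels=4)
```

With four levels, each cell offers the proportions 0.25, 0.5, 0.75 and 1.0, so the action space stays discrete and small enough for an AlphaZero-style policy head.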
/// hexagon - alphazero
Training
> default: python hexagon_alphazero train --radius 4
> model: python hexagon_alphazero train -m [f|cm|cam]
Testing
> default: python hexagon_alphazero test -p fm -q a
> quantization: python hexagon_alphazero -p fmz -a a -z 4
> rounds: python hexagon_alphazero -p cmz -q a -x 200
/// CODE & DEMO
Automatic real-time road marking recognition
Hexagon Game: Winter Picture
Researchgate: Convolution Picture
Perceptron Video
AlphaGo vs Lee Sedol: Move 37
AI playing FPS
Hexagon Official Site
hexagon-rl github page

Autonomous agents with deep reinforcement learning - Oredev 2018

  • 1. >>> Building Intelligent Agents using Deep Reinforcement Learning @aliostad Ali Kheyrollahi, ASOS
  • 2. @aliostad /// Do you has teh codez? > Slides will be published - check @aliostad > github page: https://github.com/aliostad/hexagon-rl
  • 3. @aliostad : @aliostad email: the same @gmail.com http://byterot.blogspot.com Ali Kheyrollahi, Solutions Architect at ASOS
  • 5. @aliostad /// Take-aways > mini-history of Reinforcement Learning (RL) > Basics > Representations and models > Putting it all together in Hexagon
  • 7. @aliostad /// Deep Learning basics > Bunch of techniques to overcome 80s problems: - Overfitting: DropOut Layers - Curse of Dimensionality: MOAR data! - Better training and optimisation techniques - GPUs and parallel computing to speed-up training > Multi-layer neural network described back in 1950s > Type of layers, # of units and activation function
  • 10. @aliostad /// 2013 - Atari “We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.” > DeepMind
  • 11. @aliostad /// 2015 - Go > DeepMind > Live reactions to move 37
  • 13. @aliostad /// 2017 - Dota2 > OpenAI
  • 14. @aliostad /// Late 2017 - Chess > DeepMind Grandmaster Daniel King on AlphaZero’s game 10 against Stockfish https://www.youtube.com/watch?v=Lfkam_oLLM8
  • 15. @aliostad /// 1992 Gerald Tesauro - IBM > TD-Gammon > Using Temporal Difference Learning TD-Lambda > Neural Networks > Training using Self-Play > Value Function
  • 16. @aliostad /// harry klopf Harry Klopf Marvin Minsky Alan Turing
  • 17. @aliostad /// research grant Rich Sutton Andrew Barto > “Goal seeking components for Adaptive Intelligence” 1977 > Cybernetics Center for Systems Neuroscience in University of Massachusetts Amherst > “Synthesis of Nonlinear Control Surfaces by a Layered Associative Network” 1981
  • 19. @aliostad /// reinforcement learning > Neuroscience + Psychology + Control Theory > Learning with a Critic > [Diagram: the AGENT sends an Action to the ENVIRONMENT; the ENVIRONMENT returns an Observation (state) and a Reward]
  • 20. @aliostad /// markov decision process Markov Decision Process - Wikipedia
  • 21. @aliostad /// Value Function v(s) v(s1) = v(s0) - R
  • 22. @aliostad /// Temporal Difference (TD) > If the error is zero, then reward = v(s) - γv(s’), where γ is the discount factor > “Predictive Reward Signal of Dopamine Neurons”, Wolfram Schultz, 1998
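The zero-error condition on the TD slide can be read off a one-step TD update. What follows is a minimal sketch, not code from the talk: the function names, the dictionary used as a value table, and the learning rate `alpha` are assumptions made for illustration.

```python
def td_error(reward, v_s, v_next, gamma=0.99):
    """TD error: delta = reward + gamma * v(s') - v(s)."""
    return reward + gamma * v_next - v_s


def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update to a dict of state values and return the TD error.

    `values` maps state -> estimated value; unseen states default to 0.0.
    `alpha` (learning rate) is an illustrative parameter, not from the slides.
    """
    v_s = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    delta = td_error(reward, v_s, v_next, gamma)
    values[state] = v_s + alpha * delta
    return delta
```

When `td_error` returns zero, the reward equals v(s) - γv(s’), which is the relation stated on the slide; with γ = 1 it reduces to the v(s1) = v(s0) - R form on the Value Function slide.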
  • 23. @aliostad /// Monte-Carlo Tree Search (MCTS) In MCTS, γ is 1!
  • 24. @aliostad /// Q-Learning > A form of TD Learning > Uses a Q function which returns a probability distribution from which actions are drawn > Example: R = 0.1, L = 0.2, U = 0.5, D = 0.1, F = 0.0, N = 0.1 > Explore vs Exploit (Greediness)
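As a rough illustration of the explore-vs-exploit trade-off, here is a tabular Q-learning sketch. It is an assumption-laden example rather than the talk's code: it stores one value estimate per (state, action) pair and uses an epsilon-greedy rule, which is one common way to balance exploration and greediness. The action labels are the ones on the slide; `alpha`, `gamma`, and `epsilon` are illustrative parameters.

```python
import random
from collections import defaultdict

ACTIONS = ["R", "L", "U", "D", "F", "N"]  # action labels as shown on the slide


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) towards reward + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# The Q table defaults every unseen (state, action) pair to 0.0.
q_table = defaultdict(float)
```

Setting `epsilon` to 0 makes the agent fully greedy; raising it trades immediate reward for exploration.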
  • 25. @aliostad /// Deep Q Network (DQN) > Proposed by Atari paper (DeepMind) in 2013 > Uses a Deep Network to map state to action and uses Q-Learning error to train > Double Q-learning variant (DeepMind 2015) > Duelling Networks variant (DeepMind 2015)
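The difference between the DQN training target and the Double Q-learning variant can be sketched in a few lines. This is a simplified illustration under stated assumptions: Q-values are passed in as plain lists, a separate target network is assumed (as in the commonly used form of DQN), and the function names are hypothetical.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN target: bootstrap from the largest target-network Q-value of the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    return reward + gamma * next_q_target[best]
```

Decoupling action selection from action evaluation is what reduces the over-estimation that the plain `max` tends to introduce.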
  • 26. @aliostad /// meet lunar-lander! > State: (8,) > Action: (4,) > Rewards: - leg touchdown: +10 - crash: -100 - rest: +100 - solve: 200 - main engine: -0.3 Part of OpenAI’s gym
  • 27. @aliostad /// keras-rl > Based on OpenAI’s agent/environment interface > Supports DQN (and its variants), CEM, SARSA and DDPG algorithms > Upcoming algorithms: ACER, A2C/A3C, PPO, etc. > Uses any Keras model as long as the input/output shapes match. “Bring Your Own Models”
  • 28. @aliostad /// DQN in keras-rl - 1
  • 29. @aliostad /// DQN in keras-rl - 2 > [Diagram: INPUT [0.8, 0.9,…-0.3] -> FLATTEN -> DENSE -> DENSE -> DENSE -> OUTPUT [0, 0, 1, 0]]
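To make the shapes concrete, the sketch below runs a forward pass through a small dense network in plain Python: an 8-value state goes in and one score per action (4 for lunar-lander) comes out, and the greedy choice is shown as a one-hot vector like the one in the diagram. It does not use keras-rl; the hidden layer size of 16, the random weights, and all function names are assumptions for illustration only.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained, illustrative)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(state_size=8, hidden=16, n_actions=4, seed=0):
    """Three hidden dense layers followed by a linear output layer."""
    rng = random.Random(seed)
    sizes = [state_size, hidden, hidden, hidden, n_actions]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def dense(x, weights, biases, relu=True):
    """Dense layer: weighted sum plus bias, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def q_values(network, state):
    """Map a flat state vector to one score per action."""
    x = list(state)
    for i, (weights, biases) in enumerate(network):
        x = dense(x, weights, biases, relu=(i < len(network) - 1))
    return x


def greedy_one_hot(q):
    """One-hot vector marking the highest-scoring action."""
    best = max(range(len(q)), key=lambda i: q[i])
    return [1 if i == best else 0 for i in range(len(q))]
```

In keras-rl the model only has to respect the same contract: the input shape matches the observation and the output has one unit per action.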
  • 31. @aliostad /// hexagon > Mainly a coding challenge (playhexagon.com) > Danske Bank (Vidas) > A round-based strategy game for 2 or more players: each player starts with one cell and tries to gradually occupy the board, or to hold more cells than the others when time runs out.
  • 33. @aliostad /// hexagon - expansion Transferring 70 resources from seed cell to the adjacent neutral cell
  • 34. @aliostad /// hexagon - increments Maroon also transfers 70 resources from its seed cell to the adjacent neutral cell. All occupied cells get +1 resource unless they have 100 or more resources.
  • 35. @aliostad /// hexagon - attack Transferring 40 resources from the cell having 58 to the adjacent enemy cell having 16 leaves the attacking cell with 58-40=18 and the attacked cell with 40-16=24.
  • 36. @aliostad /// hexagon - boost Transferring 50 resources from the cell having 100 to a friendly cell having 4 leaves the giving cell with 100-50=50 and the boosted cell with 4+50=54. This helps the cell to protect against neighbouring enemy cells having 20 and 25 resources.
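The increment, attack and boost arithmetic from the slides above can be captured in a small sketch. The function names are hypothetical, and two behaviours are assumptions because the slides only show successful moves: a cell may send any amount up to everything it holds, and an attack that does not exceed the defender's resources simply reduces the defender without capturing the cell.

```python
MAX_AUTO_RESOURCES = 100  # cells with 100 or more resources get no +1 increment


def increment(resources):
    """Per-round +1 for an occupied cell, unless it already has 100 or more."""
    return resources if resources >= MAX_AUTO_RESOURCES else resources + 1


def attack(source, target, amount):
    """Return (source_left, target_left, captured) after sending `amount` at an enemy cell."""
    if not 0 < amount <= source:  # assumption: any positive amount up to the full stock
        raise ValueError("amount must be positive and no more than the source holds")
    source_left = source - amount
    if amount > target:
        return source_left, amount - target, True
    # Assumption: a failed attack only wears the defender down.
    return source_left, target - amount, False


def boost(source, target, amount):
    """Return (source_left, target_left) after moving `amount` to a friendly cell."""
    if not 0 < amount <= source:
        raise ValueError("amount must be positive and no more than the source holds")
    return source - amount, target + amount
```

With the numbers from the slides, `attack(58, 16, 40)` gives 18 and 24, and `boost(100, 4, 50)` gives 50 and 54. Expansion into a neutral cell behaves like an attack on a cell with 0 resources.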
  • 37. @aliostad /// hexagon - neighbourhood
  • 39. @aliostad /// hexagon - strategies attack all the things! defend…build a wall flooding
  • 40. @aliostad /// hexagon - what to do? > Attack? From which cell to which cell? > Re-inforcements? > How many resources?
  • 41. @aliostad /// hexagon - heuristics
        self.attackPotential = self.resources * math.sqrt(
            max(self.resources - safeMin([n.resources for n in self.nonOwns]), 1)) / math.log(
            sum([n.resources for n in self.enemies], 1) + 1, 5)

        # how suitable is a cell for receiving boost
        self.boostFactor = math.sqrt(sum((n.resources for n in self.enemies), 1)) * safeMax(
            [n.resources for n in self.enemies], 1) / (self.resources + 1)

        def getGivingBoostSuitability(self):
            return (self.depth + 1) * math.sqrt(self.resources + 1) * (1.7 if self.resources == 100 else 1)
  • 43. @aliostad /// hexagon - heuristics?? > Score functions are arbitrary: they do not necessarily represent the underlying mechanics of the game > No easy way to learn the parameters, and testing all combinations is impossible > When it does not work, it is hard to know which parameter to tune. Got to be a better way…
        self.attackPotential = self.resources * math.sqrt(
            max(self.resources - safeMin([n.resources for n in self.nonOwns]), 1)) / math.log(
            sum([n.resources for n in self.enemies], 1) + 1, 5)
  • 45. @aliostad /// hexagon - cell representation > Own cells represented by a positive integer (for resources). Enemy cells by a negative integer. Neutral by zero > Feature extraction: for every cell extract: sum/max/min of friendly cells; sum/max/min of enemy cells
  • 46. /// hexagon - board representation > Flattened: array of cells > 2D representation so that we can use a Convolutional Neural Network. Hexagon => Grid > [Figure: example cell values 10, -1, 0, 0, 25, -43, -12, 3, 0, -9]
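The signed-integer cell encoding and the sum/max/min features can be sketched as follows. This is an illustrative reading, not the repository's code: it assumes the features are taken over a cell's neighbours, and the function name and feature order are hypothetical.

```python
def cell_features(cell, neighbours):
    """Build a feature vector for one cell from signed resource values.

    Positive values are own cells, negative values are enemy cells and zero is neutral.
    Returns [cell, friendly sum, max, min, enemy sum, max, min], using magnitudes
    for the enemy statistics and zeros when a group is empty.
    """
    friendly = [n for n in neighbours if n > 0]
    enemy = [-n for n in neighbours if n < 0]

    def stats(values):
        return (sum(values), max(values), min(values)) if values else (0, 0, 0)

    return [cell, *stats(friendly), *stats(enemy)]
```

For example, a cell holding 10 with neighbours `[25, 3, -43, -12, 0, -9]` yields friendly statistics 28/25/3 and enemy statistics 64/43/9.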
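One way to turn a hexagonal board into a grid that a convolutional network can consume is to index cells by axial coordinates and shift them into a square array. The sketch below assumes that scheme; the mapping used in hexagon-rl may differ, and the function name and the zero padding for off-board corners are illustrative choices.

```python
def hex_to_grid(cells, radius):
    """Place a hexagonal board into a (2 * radius + 1) square grid.

    `cells` maps axial coordinates (q, r) to signed resources. Grid positions that
    fall outside the hexagon (two opposite corners of the square) stay at 0.
    """
    size = 2 * radius + 1
    grid = [[0] * size for _ in range(size)]
    for (q, r), value in cells.items():
        if max(abs(q), abs(r), abs(q + r)) > radius:
            raise ValueError(f"cell {(q, r)} lies outside radius {radius}")
        grid[r + radius][q + radius] = value
    return grid
```

The flattened representation mentioned on the same slide is then just the rows of this grid concatenated.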
  • 47. @aliostad /// hexagon - model > Pure RL models: DQN, PPO, etc > AlphaZero: Monte Carlo Tree Search (MCTS) + RL models
  • 48. @aliostad /// hexagon - decision tree > Hierarchy of Models and Game Rules > Centaur: replacing parts of the heuristic-based man-made agent with machine learning: selecting the attacking or boosting cell; choosing attack/boost resources
  • 49. @aliostad /// hexagon - repo > DQN > AlphaZero (Monte Carlo Tree Search) > DDPG > PPO
  • 50. @aliostad /// hexagon - alphazero > Cell representation: 1, -1 and 0 for friendly, enemy and neutral cells. > Board representation: Grid mapping of Hexagon > Action representation: flattened board with 1 for cells that can attack or boost. > Deep Learning Model: choice of flat or Conv2D > Resource quantization: actions include resource proportions
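The cell, action and resource representations listed above might look like the following sketch. Everything beyond the slide text is an assumption: which cells count as able to act (here, own cells holding at least `min_resources`), how quantization levels map to proportions (here, level k of n sends k/n of the resources, rounded down), and all function names.

```python
def encode_board(board):
    """Map signed resources to 1 (friendly), -1 (enemy) or 0 (neutral)."""
    return [(v > 0) - (v < 0) for v in board]


def action_mask(board, min_resources=2):
    """Flattened mask with 1 for own cells assumed able to attack or boost."""
    return [1 if v >= min_resources else 0 for v in board]


def quantize(resources, level, levels=4):
    """Resources to send for quantization level 1..levels (level / levels of the stock)."""
    if not 1 <= level <= levels:
        raise ValueError("level must be between 1 and levels")
    return resources * level // levels
```

With 4 levels, a cell holding 100 resources can send 25, 50, 75 or 100, which keeps the action space discrete while still letting the model choose how much to commit.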
  • 51. @aliostad /// hexagon - alphazero Training: > default: python hexagon_alphazero train --radius 4 > model: python hexagon_alphazero train -m [f|cm|cam] Testing: > default: python hexagon_alphazero test -p fm -q a > quantization: python hexagon_alphazero -p fmz -a a -z 4 > rounds: python hexagon_alphazero -p cmz -q a -x 200
  • 52. @aliostad /// CODE & DEMO
  • 53. @aliostad Automatic real-time road marking recognition; Hexagon Game: Winter Picture; Researchgate: Convolution Picture; Perceptron Video; AlphaGo vs Lee Sedol: Move 37; AI playing FPS; Hexagon Official Site; hexagon-rl github page