HIERARCHICAL DECISION MAKING
USING SPATIO-TEMPORAL ABSTRACTION
REPRESENTATION LEARNING FOR STATE APPROXIMATION IN PLATFORM GAMES
by S.K.Ramnandan
Deep Learning Group, IIT-Madras
How to emulate intelligent behaviour?
Spatial abstraction - by ignoring irrelevant sensory input
Group sets of primitive states in MDP into abstract states
Temporal abstraction - by ignoring fine-grained details of actions
Extended actions directly take agent from one abstract state to another
Identify useful skills
Motivation for spatial abstraction:
Find regions of state space that are well-connected - abstract states
Idea from conformation dynamics - metastability:
Particles stay in the same region of state space for long periods of time without external stimulus
Behaviour under random walks
Identified using a spectral clustering algorithm - PCCA+
PCCA+
Construct the Laplacian of the transition matrix corresponding to a random walk on the underlying MDP
The spectrum of the Laplacian encodes the properties of the underlying graph
Vertices of a simplex which lie in the transformed basis are the abstract states
States are assigned to abstract states based on their membership to clusters after projection
Advantages:
Degree of membership of states to each abstract state
Connectivity information between abstract states
Automatically estimates the number of abstract states
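A minimal sketch of the spectral step and the simplex-based soft clustering, in the spirit of PCCA+. This is an illustrative simplification (a farthest-point heuristic for the simplex vertices), not Weber's full PCCA+ optimization; the function and variable names are ours:

```python
import numpy as np

def metastable_memberships(T, k):
    """Soft-partition the states of a random walk with row-stochastic
    transition matrix T into k metastable abstract states.
    Illustrative simplification of PCCA+."""
    # Eigenvectors for the k largest eigenvalues of T span the slowly
    # mixing subspace (equivalently, the smallest eigenvalues of the
    # Laplacian L = I - T).
    w, V = np.linalg.eig(T)
    X = V[:, np.argsort(-w.real)[:k]].real       # (n_states, k) spectral coords

    # Farthest-point heuristic: pick k well-separated states to act as
    # the vertices of the simplex in spectral space.
    verts = [int(np.argmax(np.linalg.norm(X - X.mean(0), axis=1)))]
    while len(verts) < k:
        d = np.linalg.norm(X[:, None] - X[verts], axis=2).min(axis=1)
        verts.append(int(np.argmax(d)))

    # Memberships: write each state's spectral coordinates as an affine
    # combination of the vertex coordinates, then clip and renormalize.
    A = np.vstack([X[verts].T, np.ones(k)])
    B = np.vstack([X.T, np.ones(X.shape[0])])
    chi, *_ = np.linalg.lstsq(A, B, rcond=None)
    chi = np.clip(chi.T, 0.0, None)
    return chi / chi.sum(axis=1, keepdims=True)  # (n_states, k) memberships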
TEMPORAL ABSTRACTION: OPTIONS
Use the partition of the state space into abstract states, along with the membership function returned by PCCA+, to compose options for free (see the sketch below)
Thus, the structural information obtained is used to define behavioural policies for the subtasks, independent of the task being solved
Hence these skills may work even for platform games where rewards are hugely delayed
[Figure: option policy to go from abstract state 1 to abstract state 2 in the 3-room domain]
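To make "options for free" concrete, a hypothetical sketch of an option policy that drives the agent toward a target abstract state j by greedily climbing the PCCA+ membership. Here `chi` is the membership matrix from the sketch above, and `T_a` (a per-action transition model) is an assumption for illustration, not something the slides provide:

```python
import numpy as np

def option_action(s, j, chi, T_a):
    """Pick the action whose expected next-state membership in the
    target abstract state j is highest (hypothetical sketch)."""
    # T_a[a] is an (n_states, n_states) transition matrix for action a;
    # T_a[a][s] @ chi[:, j] is E[chi_j(s')] after taking a in state s.
    return max(range(len(T_a)), key=lambda a: float(T_a[a][s] @ chi[:, j]))
```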
ONLINE AGENT FOR PLATFORM GAMES
No access to a model of the MDP
Have to estimate the transition matrix from sampled trajectories (sketched below)
The underlying policy while sampling cannot be random, since exploration of the MDP depends heavily on a near-optimal policy
Pipeline: Trajectories → Featurization → Dimensionality Reduction → Clustering → Fitting Markov State Model → PCCA+
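Since no model is available, the transition matrix is estimated empirically. A minimal sketch, assuming trajectories are sequences of already-discretized state indices:

```python
import numpy as np

def estimate_transition_matrix(trajectories, n_states):
    """Empirical row-stochastic transition matrix from sampled
    trajectories (each a sequence of integer state indices)."""
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        for s, s_next in zip(traj[:-1], traj[1:]):
            counts[s, s_next] += 1.0
    counts += 1e-9                    # keep unvisited rows valid
    return counts / counts.sum(axis=1, keepdims=True)
```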
MARIO DOMAIN
12 possible primitive actions
Rewards for achieving ‘side’ goals, such as gathering coins and killing monsters
FEATURIZATION
22 x 16 tiled grid with 25 possible values per tile
Exponential state space - 25^352 possible states
Higher-level state representation than pixel space
DIMENSIONALITY REDUCTION
After featurization, dimensionality of state vector = 240
Curse of dimensionality, local feature relevance problem
For 10,000 trajectories, time taken to cluster & fit a Markov state model (MSM):

State dimension    1-D       3-D        240-D
Time taken         15 min    307 min    ?

Reduced-dimension representation learning:
Deep Q-Network
Autoencoder (denoising)
Stacked denoising autoencoder
DQN
RL presents challenges from a deep learning perspective
No direct association between inputs and targets - RL algorithms must be able to learn from a scalar reward signal that is frequently sparse, noisy and delayed
Correlated data - in RL, one encounters sequences of highly correlated data
Non-stationary training distribution - problematic for deep learning methods that assume a fixed underlying distribution
A neural network trained on the TD error acts as a non-linear function approximator for action-values
Experience replay mechanism - randomly samples previous transitions (s, a, r, s') from a replay pool
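A minimal sketch of the replay mechanism; uniform sampling from the pool is what breaks the temporal correlation in the training data (the capacity is our assumption, the slides give no value):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool of (s, a, r, s') transitions."""
    def __init__(self, capacity=100_000):
        self.pool = deque(maxlen=capacity)   # oldest transitions drop out

    def add(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sampling decorrelates consecutive transitions.
        return random.sample(self.pool, batch_size)
```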
[DQN architecture: 84x84x4 input → 16 8x8 filters → 32 4x4 filters → fully connected hidden layer → fully connected output layer]
• Deriving an approximate state representation
• Compress the last hidden layer to simulate the encoder of an autoencoder
• Summarize the state by the values of the neurons in the last hidden layer (sketched below)
• In the case of Mario, where the input is not in pixel space, convolutional layers are replaced with fully connected layers
Note: contractive nature of the reduced-dimension representation as the number of training epochs increases
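A hypothetical PyTorch sketch of the Mario variant: fully connected layers replace the convolutions, and the trained network body doubles as an encoder whose last-hidden-layer activations summarize the state. The 240-d input and 12 actions come from the slides; the layer widths are our assumptions:

```python
import torch
import torch.nn as nn

class MarioDQN(nn.Module):
    def __init__(self, state_dim=240, hidden_dim=32, n_actions=12):
        super().__init__()
        self.body = nn.Sequential(                  # plays the role of the encoder
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, hidden_dim), nn.ReLU(),  # last hidden layer
        )
        self.head = nn.Linear(hidden_dim, n_actions)  # Q-values

    def forward(self, x):
        return self.head(self.body(x))

    def encode(self, x):
        # Approximate state = activations of the last hidden layer.
        with torch.no_grad():
            return self.body(x)
```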
AUTOENCODER
Cross-entropy reconstruction error for binary inputs
Can the same loss be used directly for ordinal-valued inputs?
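For reference, the standard cross-entropy reconstruction error for a binary input x and reconstruction x̂ over d dimensions (a textbook formula, not specific to this work):

```latex
L(x, \hat{x}) = -\sum_{k=1}^{d} \big[\, x_k \log \hat{x}_k + (1 - x_k) \log (1 - \hat{x}_k) \,\big]
```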
AUTOENCODER (DENOISING)
• Is the representation learnt by the autoencoder useful enough?
• Further constraints need to be applied to try to separate useful information from noise
• This will naturally translate to a non-zero reconstruction error
• Two implicit underlying ideas:
• A higher-level representation should be rather stable and robust under corruptions of the input
• Performing the denoising task well requires extracting features that capture useful structure in the input distribution
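A minimal PyTorch sketch of the denoising autoencoder, using masking noise at the 25% level reported in the experiments (the layer widths are our assumptions; the loss is computed against the clean input):

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim=240, latent_dim=3, noise=0.25):
        super().__init__()
        self.noise = noise
        self.encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, input_dim), nn.Sigmoid())

    def forward(self, x):
        # Corrupt the input with masking noise, then reconstruct; train
        # against the CLEAN x, e.g. nn.BCELoss()(self(x), x).
        mask = (torch.rand_like(x) > self.noise).float()
        return self.decoder(self.encoder(x * mask))
```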
VISUALIZATION OF REDUCED DIMENSION
[Panels: DQN 1-d, DQN 2-d, DQN 3-d; Auto 1-d, Auto 2-d, Auto 3-d]
VISUALIZATION OF REDUCED DIMENSION
[Panels: Auto 1-d, Auto 2-d, Auto 3-d at 0% noise; dAuto 1-d, dAuto 2-d, dAuto 3-d at 25% noise]
RECONSTRUCTION ERROR

Reduced dimension    Auto       dAuto (25% noise)
h-1                  200.559    177.456
h-2                  168.765    158.984
h-3                  158.751    151.514
h-5                  156.246    139.845

[Plot: training cost curves for Auto vs dAuto - the fall in training cost is smoother for the denoising autoencoder]
END-TO-END TESTING RESULTS FOR STATE APPROXIMATION
• Average % increase in return per episode: 15.3%
• Average % decrease in time spent per episode: 4.39%
END-TO-END TESTING RESULTS FOR STATE APPROXIMATION
Observations:
Performance improves when approximating the state using the denoising variant of the autoencoder, for the same latent representation size
Tradeoff when increasing the dimensionality of the approximated state:
Increase in end-to-end performance
Significant increase in time taken for clustering & fitting a Markov state model
MONTEZUMA’S REVENGE
• Much higher emphasis on representation learning than in Mario
• DeepMind’s DQN reports its worst performance on this game - 0% relative to a human test player
• After training the DQN, we have a 256-dimensional real-valued feature vector output by the last fully connected hidden layer
• It has been observed that the magnitudes of the output values themselves do not matter in an image recognition task
• Hence we can binarize the values and obtain a 256-bit binary feature vector representing a state (sketched below)
• Perform further state approximation using the denoising autoencoder
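A minimal sketch of the binarization step (the threshold is our assumption; the slides do not specify one):

```python
import numpy as np

def binarize_state(h, threshold=0.0):
    """Turn the 256 real-valued last-hidden-layer activations into a
    packed 256-bit binary signature of the state."""
    bits = (h > threshold).astype(np.uint8)   # one bit per neuron
    return np.packbits(bits)                  # 32 bytes = 256 bits
```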
