2022/10/30 BESC2022: Parameter Tuning Method for Multi-agent Simulation using Reinforcement Learning

Parameter Tuning Method for
Multi-agent Simulation using
Reinforcement Learning
Masanori HIRANO, Kiyoshi IZUMI
School of Engineering, The University of Tokyo
research@mhirano.jp
https://mhirano.jp/

©M.HIRANO & Izumi Lab.
What Is Multi-agent Simulation?
• MAS: Multi-agent Simulation
• Simulates social phenomena by aggregating the behavior of individual agents
• Reveals the emergent phenomena of complex systems caused by agents’ complex interactions
• Useful as a tool for understanding social phenomena
• Artificial Market Simulation
• MAS for financial markets
• Agents’ interactions are necessary to replicate "stylized facts" (well-known phenomena in financial markets) [Lux+, 1999]
• Many models are available
Lux, T., & Marchesi, M. (1999). Scaling and criticality in a stochastic multi-agent model of a financial market. Nature,
397(6719), 498–500. https://doi.org/10.1038/17290
10/30/22
BESC2022
2

MAS
• Usually,
1. A human builds a simulation model
2. The model parameters are tuned to replicate the actual phenomena
3. Evaluation & Analysis
• Parameter tuning for MAS is difficult because …
• MAS usually has many parameters
• The response to parameter changes is not always continuous
• MAS is mainly used for complex systems, which exhibit chaotic phenomena
[Figure: workflow of Modeling → Parameter Tuning → Evaluation & Analysis]

Approach: Deep Reinforcement Learning
• Social simulation & Bayesian optimization seem incompatible.
• Bayesian optimization (Optuna): continuous estimation
• Social simulation using MAS: each trial shows different movements, and chaotic phenomena or phase transitions occur frequently
• Recently developed deep reinforcement learning is a possible solution.
• Deep reinforcement learning can handle high-dimensional parameter spaces.
• → We try to use reinforcement learning for MAS parameter tuning.

Model Outline

DDPG (Deep Deterministic Policy Gradient)
• Actor-critic based
• Continuous action space
• https://arxiv.org/abs/1509.02971
[Figure: Actor: State → NN → Action. Critic: State and Action → Concatenate → NN]

Architectures used in DDPG
• General architectures widely used in reinforcement learning
• Replay buffer
• Not prioritized
• Soft target update
• Exploration noise
• Ornstein–Uhlenbeck (OU) process
• In practice, Gaussian noise is sufficient, but we employed OU noise following the original paper
$dr_t = -\theta (r_t - \mu)\,dt + \sigma\,dW_t$
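A minimal sketch of how Ornstein–Uhlenbeck exploration noise might be sampled, assuming an Euler-Maruyama discretization of the process. The class name `OUNoise` and the default values of `theta`, `sigma`, and `dt` are illustrative assumptions, not settings taken from the slides.

```python
import math
import random


class OUNoise:
    """Euler-Maruyama discretization of an Ornstein-Uhlenbeck process (illustrative)."""

    def __init__(self, theta=0.15, mu=0.0, sigma=0.2, dt=1.0, seed=None):
        self.theta = theta  # mean-reversion speed (assumed value)
        self.mu = mu        # long-run mean of the noise
        self.sigma = sigma  # noise scale (assumed value)
        self.dt = dt        # time-step size
        self.rng = random.Random(seed)
        self.r = mu         # current noise state r_t

    def sample(self):
        # The Wiener increment dW_t is Gaussian with variance dt.
        dw = self.rng.gauss(0.0, math.sqrt(self.dt))
        # r_{t+dt} = r_t - theta * (r_t - mu) * dt + sigma * dW_t
        self.r += -self.theta * (self.r - self.mu) * self.dt + self.sigma * dw
        return self.r


noise = OUNoise(seed=0)
samples = [noise.sample() for _ in range(5)]  # temporally correlated noise values
```

Each sample would be added to the actor's output before the simulation trial. Unlike independent Gaussian noise, consecutive samples are correlated because each one starts from the previous state.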

Simulation surrogate by the critic
• The critic works as a surrogate for the simulation
• Actor-critic-based RL is therefore important: it is what provides this surrogate

Proposed method to realize our idea
• DDPG4MASPT (DDPG for MAS Parameter Tuning)
1. Customized DDPG for our task
2. Action Converter (AC)
3. Redundant Full Neural Network Actor (FNNA)
4. Seed Fixer (SF)
• → According to the results, all of them are required to
realize our idea!

Customized DDPG
• For the $i$-th iteration, the parameter set for the trial is:
$P_i = A()$
• Then, the surrogate error is minimized through the critic:
$\min_C \mathrm{MSE}(o_i, C(P_i))$
• According to the surrogate, the actor is updated as:
$\max_A C(A())$
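The three steps above can be sketched as a toy loop. This is not the authors' implementation: it assumes a single tunable parameter in [-1, 1], a hypothetical deterministic `simulate` function standing in for a seed-fixed MAS run that returns a reward-like score, a quadratic critic instead of a neural network, and a stateless scalar actor. All step sizes and the warm-up with random trials are illustrative choices.

```python
import random


def simulate(p):
    """Hypothetical stand-in for one seed-fixed MAS run: score o, best at p = 0.5."""
    return -(p - 0.5) ** 2


def features(p):
    return (1.0, p, p * p)


def critic(w, p):
    """C(P): quadratic surrogate of the simulation score."""
    return sum(wi * fi for wi, fi in zip(w, features(p)))


def clip(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))


def tune(iterations=4000, seed=0):
    rng = random.Random(seed)
    actor = -0.5         # stateless actor A(): one tunable parameter
    w = [0.0, 0.0, 0.0]  # critic weights
    # Warm-up: random trials so the critic sees the whole parameter range.
    buffer = []
    for _ in range(64):
        p = rng.uniform(-1.0, 1.0)
        buffer.append((p, simulate(p)))
    for _ in range(iterations):
        # P_i = A() plus Gaussian exploration noise.
        p = clip(actor + rng.gauss(0.0, 0.2))
        buffer.append((p, simulate(p)))
        # Critic step: reduce MSE(o_i, C(P_i)) on a replay minibatch.
        batch = rng.sample(buffer, 32)
        grad = [0.0, 0.0, 0.0]
        for pb, ob in batch:
            err = critic(w, pb) - ob
            for k, fk in enumerate(features(pb)):
                grad[k] += 2.0 * err * fk / len(batch)
        w = [wk - 0.05 * gk for wk, gk in zip(w, grad)]
        # Actor step: gradient ascent on C(A()); dC/dp = w1 + 2 * w2 * p.
        actor = clip(actor + 0.05 * (w[1] + 2.0 * w[2] * actor))
    return actor, w


best_p, critic_w = tune()
```

In this toy setting the actor should drift toward the parameter value that the critic scores highest, which is how the critic plays the role of a differentiable surrogate for the simulation.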

Action Converter (AC)
• Converts the action space
• DDPG usually supports continuous action spaces
• For a non-negativity restriction, a mapping $(-\infty, \infty) \to (0, \infty)$ seems good for learning.
• The action squashing used in Soft Actor-Critic (SAC) is similar to this, but not identical.
• In our experiments, we employed the mapping $f(x) = \ln(1 + \exp(x))$
• (Other mappings could be used, but were not tested)
• Because $\frac{df(x)}{dx} = \frac{e^x}{1 + e^x} = 1 - \frac{1}{1 + e^x}$, the effect of this mapping is not significant in the region $x \gg 1$
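A small sketch of the softplus mapping and its derivative, written in a numerically stable form. The function names `softplus` and `softplus_grad` and the example inputs are illustrative, not taken from the slides.

```python
import math


def softplus(x):
    """f(x) = ln(1 + exp(x)), rearranged to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x):
    """df/dx = e^x / (1 + e^x) = 1 - 1 / (1 + e^x), the logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


raw_actions = [-3.0, 0.0, 2.0, 20.0]           # unbounded actor outputs
params = [softplus(a) for a in raw_actions]    # positive simulation parameters
```

For large inputs the derivative approaches 1 and the mapping is close to the identity, so the conversion mainly reshapes values near and below zero.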

Redundant Full Neural Network Actor
• Use a redundant neural network for the actor
[Figure: two actor designs that output the parameters $P_{1,i}, P_{2,i}, P_{3,i}, \dots$: "Minimum requirements" and "Redundant Full Neural Network"]

Seed Fixer
• In social simulations, the effect of the random seed is usually much larger than the effect of the parameters.
• When the variance among simulation trials is large, it is hard to obtain precise gradients.
• Fix the seed and test
[Figure: blueprint of simulation outcomes when the seed is different versus when the parameters are slightly different (when no phase transition is involved)]
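The idea of fixing the seed can be illustrated with a toy example. The `simulate` function below is a hypothetical random walk, not the artificial market model; it only shows that, with the seed fixed, the change in outcome reflects the change in the parameter rather than random variation.

```python
import random


def simulate(param, seed, steps=100):
    """Hypothetical toy simulation: a random walk whose drift is the parameter."""
    rng = random.Random(seed)  # all randomness comes from this seeded generator
    x = 0.0
    for _ in range(steps):
        x += param + rng.gauss(0.0, 1.0)
    return x


eps = 0.01
# Same seed: the outcome difference reflects only the parameter change.
fixed_seed_diff = simulate(0.10 + eps, seed=7) - simulate(0.10, seed=7)
# Different seeds: the difference also contains the random variation between runs.
free_seed_diff = simulate(0.10 + eps, seed=8) - simulate(0.10, seed=7)
```

With the same seed, `fixed_seed_diff` is approximately `steps * eps`, so a finite difference gives a usable gradient estimate. With different seeds, the same difference is mixed with run-to-run noise.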

Simulation model to evaluate our idea
• Artificial market simulation
• Stylized Trader Agents [Chiarella et al. 02]
• Agents calculate…
• Logarithmic return prediction for the bid/ask price
$r = \frac{1}{w_F + w_C + w_N} \left( w_F \cdot F + w_C \cdot C + w_N \cdot N \right)$
• Fundamentals
$F = \frac{1}{\text{mean reversion time}} \ln \frac{\text{current market price}}{\text{current fundamental price}}$
• Chartist (trend)
$C$ = average logarithmic return in the past
• Noise 𝑁 ~ 𝑁 0, 𝜎:
• Predicted return + margin ⇒ decide the order price
• We tuned $w_F$ and $w_C$.
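A short sketch of how an agent's predicted return could be assembled from the three components. The function names, the price window used for the chartist term, and the example numbers are assumptions for illustration. The fundamental term F and the noise term N are passed in as precomputed values.

```python
import math


def chartist_signal(prices):
    """C: average logarithmic return over the given past price window."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(log_returns) / len(log_returns)


def predicted_return(w_f, w_c, w_n, f, c, n):
    """r = (w_F * F + w_C * C + w_N * N) / (w_F + w_C + w_N)."""
    return (w_f * f + w_c * c + w_n * n) / (w_f + w_c + w_n)


c = chartist_signal([100.0, 101.0, 102.0])  # hypothetical recent prices
r = predicted_return(w_f=1.0, w_c=1.0, w_n=0.0, f=0.02, c=c, n=0.0)
```

Because the weights are normalized by their sum, only their ratios matter, which is one reason tuning two of them is a meaningful test task.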

Experiments & Results
• The objective function is the MSE of skewness and kurtosis against their targets
• Target: Skewness = 0.0, Kurtosis = 6.0
• Only $w_F$ and $w_C$ were tuned
• Comparative models:
• Optuna (Bayesian optimization)
• Models missing one or two components
• Our proposed model showed the best performance!
(However, the difference was not statistically significant.)
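The objective might be computed along the following lines. This is a sketch under assumptions: the slides do not say whether kurtosis is the excess or the non-excess (Pearson) convention, so the non-excess form is assumed here, and the function names are illustrative.

```python
def central_moment(xs, k):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Non-excess (Pearson) kurtosis: 3.0 for a normal distribution (assumed convention)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def objective(returns, target_skew=0.0, target_kurt=6.0):
    """MSE between the simulated (skewness, kurtosis) and the targets."""
    errors = (skewness(returns) - target_skew, kurtosis(returns) - target_kurt)
    return sum(e * e for e in errors) / len(errors)


returns = [-1.0, 1.0, -1.0, 1.0]  # toy data: skewness 0, kurtosis 1
loss = objective(returns)         # ((0 - 0)^2 + (1 - 6)^2) / 2 = 12.5
```

In the tuning loop, `returns` would be the return series produced by one simulation trial, and a lower `loss` means the trial is closer to the target statistics.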

Discussion & Future work
• Our proposed method works well, and the three additional components (AC, FNNA, SF) were necessary.
• The critic works well as a surrogate for the simulation
• The task used in this study is very simple: only 2 parameters are tuned.
• Bayesian optimization is limited in its support for high-dimensional tuning (it treats a sample as a POINT)
• Gradient-based methods, such as DDPG, can support high-dimensional tuning. (A sample with a gradient can be considered a surface)
• According to some previous works, DDPG can solve high-dimensional tasks. Therefore, it seems a good fit for high-dimensional parameter tuning tasks. → This should be tested.
• For future work, RL models with high exploration capability should be tested
