14. Counting the eigenvalues in a specified interval
• Find the number m of eigenvalues in the interval [a, b]

Definitions and relations:

A = U \Lambda U^T, \quad A^k = U \Lambda^k U^T, \quad U^T U = I

tr(h(A)) = tr(U h(\Lambda) U^T) = tr\Big( \sum_{\lambda_i \in [a,b]} u_i u_i^T \Big) = m

where h(t) = 1 if t \in [a, b] and 0 otherwise, \Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_n), and U = [u_1, u_2, \ldots, u_n]. Since h(\lambda_i) is 1 exactly for the eigenvalues in [a, b], the trace of h(A) equals the count m.
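Below is a minimal NumPy check of the identity tr(h(A)) = m on a small random symmetric matrix; the matrix, interval, and sizes are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                            # real symmetric test matrix

a, b = -1.0, 1.0                             # interval [a, b]
lam, U = np.linalg.eigh(A)                   # A = U diag(lam) U^T

h = ((a <= lam) & (lam <= b)).astype(float)  # h(lambda_i): 1 inside [a, b], else 0
hA = (U * h) @ U.T                           # h(A) = U h(Lambda) U^T

print(np.trace(hA))                          # tr(h(A))
print(int(h.sum()))                          # m, the direct count; the two agree
```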
15. Approximating the rectangular function
• Approximate the rectangular function h(A)
  – polynomial approximation*1 (e.g., Chebyshev polynomials)
  – approximation by contour quadrature (a rational-function approximation)*2 ← the approach treated here

[Figure: graph of the rectangular function h(\lambda), equal to 1 on [a, b] and 0 elsewhere]

*1 E. Di Napoli, E. Polizzi, Y. Saad, arXiv:1308.4275v2, 2014
*2 Y. Futamura, H. Tadano, and T. Sakurai, JSIAM Letters, Vol. 2, pp. 127–130, 2010
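As a quick illustration of the polynomial route, here is a generic least-squares Chebyshev fit of h on [-1, 1]; this is only a sketch, not the specific scheme of reference *1. The eigendecomposition at the end is used purely to check the count (in practice tr(p(A)) would be estimated from matrix-vector products), and the Gibbs oscillations near a and b are the known weakness of a plain fit.

```python
import numpy as np
from numpy.polynomial import chebyshev as Ch

a, b = -0.2, 0.4                                     # target interval
t = np.cos(np.pi * (np.arange(2000) + 0.5) / 2000)   # Chebyshev nodes in (-1, 1)
h = ((a <= t) & (t <= b)).astype(float)
coef = Ch.chebfit(t, h, 80)                          # degree-80 least-squares fit

rng = np.random.default_rng(1)
n = 300
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
A /= 1.05 * np.abs(np.linalg.eigvalsh(A)).max()      # scale spectrum into [-1, 1]

lam = np.linalg.eigvalsh(A)
print(Ch.chebval(lam, coef).sum())                   # tr(p(A)) = sum_i p(lambda_i)
print(((a <= lam) & (lam <= b)).sum())               # exact count for comparison
```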
16. Representation by contour integration

h(A) = \frac{1}{2\pi i} \oint_\Gamma (zI - A)^{-1} \, dz

where \Gamma is a closed contour in the complex plane that crosses the real axis at a and b, and hence encloses exactly the eigenvalues lying in [a, b].

\frac{1}{2\pi i} \oint_\Gamma (zI - A)^{-1} dz
= \frac{1}{2\pi i} \oint_\Gamma U (zI - \Lambda)^{-1} U^T dz
= \sum_{i=1}^{n} \Big( \frac{1}{2\pi i} \oint_\Gamma \frac{1}{z - \lambda_i} dz \Big) u_i u_i^T
= \sum_{\lambda_i \in [a,b]} u_i u_i^T

Therefore tr(h(A)) = m.

[Figure: the complex plane (Re/Im axes) with the eigenvalues on the real axis and the contour \Gamma crossing it at a and b]
17. Numerical quadrature
• Approximate the contour integral by numerical quadrature
  – trapezoidal rule
  – Gauss–Legendre rule
  – etc.
• This can be viewed as a rational-function approximation of h(A)

\frac{1}{2\pi i} \oint_\Gamma (zI - A)^{-1} dz \approx \sum_{j=1}^{N} w_j (z_j I - A)^{-1}

where the w_j are the quadrature weights and the z_j the quadrature points.
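Combining slides 16 and 17, the sketch below counts eigenvalues with the trapezoidal rule on a circle through a and b. The dense inverse and exact trace are for illustration only (the talk's setting replaces them with the stochastic trace estimator and linear solves), and accuracy degrades when eigenvalues lie close to \Gamma.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

a, b = -1.0, 1.0
c, R = (a + b) / 2, (b - a) / 2              # circle crossing the real axis at a, b
N = 128                                      # number of quadrature points
theta = 2 * np.pi * (np.arange(N) + 0.5) / N
z = c + R * np.exp(1j * theta)               # quadrature points z_j on the circle
w = R * np.exp(1j * theta) / N               # trapezoidal weights w_j

I = np.eye(n)
count = sum(wj * np.trace(np.linalg.inv(zj * I - A)) for wj, zj in zip(w, z))
print(count.real)                            # approximate eigenvalue count in [a, b]
lam = np.linalg.eigvalsh(A)
print(((a <= lam) & (lam <= b)).sum())       # exact count
```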
18. Monte Carlo computation of the matrix trace
• For large matrices, explicitly forming a matrix polynomial or an inverse is impractical

→ use Monte Carlo-type methods for computing the trace of the matrix

M. F. Hutchinson, Commun. Statist. Simula., 19 (1990), pp. 433–450
Z. Bai, M. Fahey, and G. Golub, J. Comput. Appl. Math., Vol. 74, Issues 1–2, pp. 71–89, 1996
H. Avron and S. Toledo, Journal of the ACM, 58, p. 8, 2011
F. Roosta-Khorasani, U. Ascher, Foundations of Computational Mathematics, pp. 1–26, 2014
19. Problem setting
• Polynomial approximation case:
  – a (real-coefficient) polynomial of a real symmetric matrix is again a real symmetric matrix
• Contour quadrature (rational approximation) case:
  – if the quadrature points are taken in complex-conjugate pairs, the problem also reduces to the real symmetric case
  ※ w(zI - A)^{-1} + \bar{w}(\bar{z}I - A)^{-1} is a real symmetric matrix
• Problem setting: approximate the trace of a real symmetric matrix A

[Figure: quadrature points placed symmetrically about the real axis (Re/Im plane)]
20. Monte Carlo computation of the matrix trace

tr(A) \approx \frac{1}{s} \sum_{i=1}^{s} v_i^T A v_i

• the v_i are random vectors
• they are constructed so that the expectation of v_i^T A v_i equals tr(A)
• even when A is not given explicitly (e.g., it is the value of a matrix-valued function), the method is applicable as long as products of A with vectors can be computed
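A minimal sketch of this estimator, written against a matvec callback since A may only be available through matrix-vector products; Rademacher probes (±1 entries) are one common choice. The names are ours, not from the talk.

```python
import numpy as np

def mc_trace(matvec, n, s, rng):
    """Estimate tr(A) as (1/s) * sum_i v_i^T A v_i with random probe vectors."""
    total = 0.0
    for _ in range(s):
        v = rng.choice([-1.0, 1.0], size=n)  # E[v v^T] = I, so E[v^T A v] = tr(A)
        total += v @ matvec(v)
    return total / s

rng = np.random.default_rng(3)
n = 500
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
print(mc_trace(lambda v: A @ v, n, s=100, rng=rng))  # estimate
print(np.trace(A))                                   # exact value
```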
21. Types of trace estimators
• Hutchinson estimator
• Gaussian estimator
• Normalized Rayleigh-quotient estimator
• Unit vector estimator

H. Avron and S. Toledo, Journal of the ACM, 58, p. 8, 2011
F. Roosta-Khorasani, U. Ascher, Foundations of Computational Mathematics, pp. 1–26, 2014
22. Hutchinson estimator
• uses vectors whose entries are -1 or 1 with equal probability
• used in the original proposal of the Monte Carlo trace approximation*1 and widely adopted since*2
• the more diagonally dominant the matrix, the relatively smaller the variance:

Var(v^T A v) = 2\Big( \|A\|_F^2 - \sum_{i=1}^{n} a_{ii}^2 \Big)

• if A is diagonal, a single vector already gives the exact value

*1 M. F. Hutchinson, Commun. Statist. Simula., 19 (1990), pp. 433–450
*2 H. Avron and S. Toledo, Journal of the ACM, 58, p. 8, 2011
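A quick empirical check of the variance formula on a random symmetric matrix; the matrix size and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

samples = np.array([(v := rng.choice([-1.0, 1.0], size=n)) @ A @ v
                    for _ in range(10000)])
print(samples.var())                               # empirical Var(v^T A v)
print(2 * ((A**2).sum() - (np.diag(A)**2).sum()))  # 2(||A||_F^2 - sum_i a_ii^2)
```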
25. Unit vector estimator

tr(A) \approx \frac{n}{s} \sum_{i=1}^{s} v_i^T A v_i

• sample i \in \{1, 2, \ldots, n\} uniformly at random and use the unit vector e_i
• the variance is determined solely by the spread of the diagonal entries:

Var(n v^T A v) = n \sum_{i=1}^{n} A_{ii}^2 - tr(A)^2

• if all diagonal entries are equal, a single vector gives the exact value
• there are also approaches that first "shuffle" the matrix, e.g., with a discrete Fourier transform
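A minimal sketch: with v = e_i the quadratic form e_i^T A e_i is just the diagonal entry a_ii, so the estimator is n times the mean of uniformly sampled diagonal entries. Names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

s = 200
idx = rng.integers(0, n, size=s)      # i drawn uniformly from {0, ..., n-1}
print(n * A[idx, idx].mean())         # (n/s) * sum_i e_i^T A e_i
print(np.trace(A))                    # exact value
```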
36. Numerical experiment ex.3 (1/3)
• Experiment on a matrix arising in density functional theory calculations†
• Standard eigenvalue problem
• Matrix from the computation of a Si 510-atom system
  – matrix dimension: 175,616
  – the 1,020 smallest eigenpairs are required
  – the matrix is available only as a matrix-vector product routine

†J. R. Chelikowsky, N. Troullier, K. Wu, Y. Saad, Phys. Rev. B 50, 11355–11364, 1994
†J.-I. Iwata, D. Takahashi, A. Oshiyama, T. Boku, K. Shiraishi, S. Okada, and K. Yamada, J. Comput. Phys. 229, 2339–2363, 2010
38. ex.3 numerical results (1/2)

[Figure: estimated vs. exact eigenvalue counts per contour (x-axis: index of the circle, 0–100; y-axis: eigenvalue count, 0–60; legend: Estimation, Exact)]
39. ex.3 numerical results (2/2)

[Figure: accumulated eigenvalue counts over the circles (x-axis: index of the circle, 10–100; y-axis: eigenvalue count, 0–1200; legend: Estimation, Exact)]
40. References on eigenvalue-distribution estimation by Monte Carlo computation
• Paper treating nonsymmetric generalized eigenvalue problems:
  Y. Futamura, H. Tadano, and T. Sakurai, "Parallel stochastic estimation method of eigenvalue distribution", JSIAM Letters, Vol. 2, pp. 127–130, 2010
• Paper treating nonlinear eigenvalue problems:
  Y. Maeda, Y. Futamura, and T. Sakurai, "Stochastic estimation of eigenvalue density for nonlinear eigenvalue problem on the complex plane", JSIAM Letters, pp. 61–64, 2011
44. The (unpreconditioned) COCG algorithm

An algorithm for solving the linear system Cx = b with a complex symmetric matrix C:

Set initial guess x_0
Set p_0 = r_0 = b - Cx_0
for k = 0, 1, ..., until convergence do
  q_k = Cp_k
  \alpha_k = \dfrac{r_k^T r_k}{p_k^T q_k}
  x_{k+1} = x_k + \alpha_k p_k
  r_{k+1} = r_k - \alpha_k q_k
  \beta_k = \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}
  p_{k+1} = r_{k+1} + \beta_k p_k
end for
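A direct NumPy transcription of the iteration above, as a sketch. The unconjugated transposes (r^T r rather than r^H r) are what distinguish COCG on a complex symmetric C from ordinary CG; the test matrix zI - A, with A real symmetric and z complex, is the kind the contour method produces.

```python
import numpy as np

def cocg(C, b, tol=1e-10, maxiter=1000):
    """Solve C x = b for complex symmetric C (C = C^T, not Hermitian)."""
    x = np.zeros_like(b, dtype=complex)
    r = b - C @ x                       # r_0 = b - C x_0
    p = r.copy()
    for _ in range(maxiter):
        q = C @ p                       # q_k = C p_k
        alpha = (r @ r) / (p @ q)       # unconjugated inner products
        x = x + alpha * p
        r_new = r - alpha * q
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

rng = np.random.default_rng(6)
n = 100
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
C = (0.5 + 1.0j) * np.eye(n) - A        # complex symmetric: zI - A
b = rng.standard_normal(n).astype(complex)
x = cocg(C, b)
print(np.linalg.norm(C @ x - b))        # residual of the returned solution
```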
47. Shifted Krylov subspace methods
• Solvers that exploit the shift invariance of Krylov subspaces
  – Shifted CG*1
  – Shifted COCG*2
  – Shifted COCR*3
  – etc.

*1 B. Jegerlehner, arXiv:hep-lat/9612014, 1996
*2 S. Yamamoto, T. Sogabe, T. Hoshi, S.-L. Zhang, and T. Fujiwara, J. Phys. Soc. Jpn., Vol. 77, No. 11, 114713, pp. 1–8, 2008
*3 T. Sogabe and S.-L. Zhang, East Asia J. on Appl. Math., 1, pp. 97–107, 2011
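The property these methods exploit is that a scalar shift leaves the Krylov subspace unchanged, K_k(A, b) = K_k(A + \sigma I, b), since (A + \sigma I)^j b is a linear combination of b, Ab, ..., A^j b. A small numerical check (matrix, dimension, and shift are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, sigma = 50, 8, 2.5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
b = rng.standard_normal(n)

def krylov_basis(M, b, k):
    """Orthonormal basis of K_k(M, b) by Gram-Schmidt on b, Mb, M^2 b, ..."""
    V = [b / np.linalg.norm(b)]
    for _ in range(k - 1):
        w = M @ V[-1]
        for v in V:                      # orthogonalize against previous vectors
            w -= (v @ w) * v
        V.append(w / np.linalg.norm(w))
    return np.column_stack(V)

V1 = krylov_basis(A, b, k)
V2 = krylov_basis(A + sigma * np.eye(n), b, k)
# same subspace: projecting one basis onto the other loses nothing
print(np.linalg.norm(V2 - V1 @ (V1.T @ V2)))   # ~ 0
```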
49. The shifted COCG algorithm

(Seed system Cx = b; shifted systems (C + \sigma_j I)x^j = b, j = 1, 2, ..., N. The COCG iteration on the seed supplies \alpha_k, \beta_k, and r_{k+1}.)

Set initial guess x_0
Set x_0^j = x_0 = 0, \alpha_{-1} = \pi_{-1}^j = \pi_0^j = 1 (\beta_{-1} = 0)
Set p_0^j = p_0 = r_0 = b
for k = 0, 1, ..., until convergence do
  {COCG iteration}
  for j = 1, 2, ..., N do
    \pi_{k+1}^j = \dfrac{\alpha_{k-1}\,\pi_k^j\,\pi_{k-1}^j}{\alpha_k \beta_{k-1}(\pi_{k-1}^j - \pi_k^j) + \alpha_{k-1}\,\pi_{k-1}^j (1 + \sigma_j \alpha_k)}
    \alpha_k^j = \dfrac{\pi_{k+1}^j}{\pi_k^j}\,\alpha_k
    x_{k+1}^j = x_k^j + \alpha_k^j p_k^j
    \beta_k^j = \Big(\dfrac{\pi_{k+1}^j}{\pi_k^j}\Big)^2 \beta_k
    p_{k+1}^j = \pi_{k+1}^j r_{k+1} + \beta_k^j p_k^j
  end for
end for
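A sketch of the algorithm above, assuming the seed/shifted setup just stated; the seed COCG supplies \alpha_k, \beta_k, r_{k+1}, so each extra shift costs only scalar work plus two vector updates and no additional matvecs. The stopping test monitors only the seed residual for brevity, and the test problem is illustrative; variable names follow the slide.

```python
import numpy as np

def shifted_cocg(C, b, sigmas, tol=1e-10, maxiter=1000):
    """Solve (C + sigma_j I) x^j = b for all shifts, seed system C x = b."""
    n, N = len(b), len(sigmas)
    r = b.astype(complex).copy()               # r_0 = b  (x_0 = 0)
    p = r.copy()
    xs = np.zeros((N, n), dtype=complex)       # shifted iterates x^j
    ps = np.tile(r, (N, 1))                    # p^j_0 = p_0 = r_0 = b
    pi_old = np.ones(N, dtype=complex)         # pi^j_{k-1}
    pi = np.ones(N, dtype=complex)             # pi^j_k
    alpha_old, beta_old = 1.0, 0.0             # alpha_{-1} = 1, beta_{-1} = 0
    for _ in range(maxiter):
        q = C @ p                              # one matvec, shared by all shifts
        alpha = (r @ r) / (p @ q)
        r_new = r - alpha * q
        beta = (r_new @ r_new) / (r @ r)
        for j, sigma in enumerate(sigmas):     # scalar recurrences per shift
            pi_new = (alpha_old * pi[j] * pi_old[j]) / (
                alpha * beta_old * (pi_old[j] - pi[j])
                + alpha_old * pi_old[j] * (1 + sigma * alpha))
            alpha_j = (pi_new / pi[j]) * alpha
            beta_j = (pi_new / pi[j]) ** 2 * beta
            xs[j] += alpha_j * ps[j]           # x^j_{k+1} = x^j_k + alpha^j_k p^j_k
            ps[j] = pi_new * r_new + beta_j * ps[j]
            pi_old[j], pi[j] = pi[j], pi_new
        p = r_new + beta * p
        alpha_old, beta_old = alpha, beta
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            break
        r = r_new
    return xs

rng = np.random.default_rng(7)
n = 80
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
C = 1.0j * np.eye(n) - A                       # seed matrix, complex symmetric
b = rng.standard_normal(n)
for sigma, xj in zip([0.5, 1.0, 2.0], shifted_cocg(C, b, [0.5, 1.0, 2.0])):
    print(np.linalg.norm((C + sigma * np.eye(n)) @ xj - b))
```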
58. The shifted COCG algorithm (recap)

Set initial guess x_0
Set x_0^j = x_0 = 0, \alpha_{-1} = \pi_{-1}^j = \pi_0^j = 1 (\beta_{-1} = 0)
Set p_0^j = p_0 = r_0 = b
for k = 0, 1, ..., until convergence do
  {COCG iteration}
  for j = 1, 2, ..., N do
    \pi_{k+1}^j = \dfrac{\alpha_{k-1}\,\pi_k^j\,\pi_{k-1}^j}{\alpha_k \beta_{k-1}(\pi_{k-1}^j - \pi_k^j) + \alpha_{k-1}\,\pi_{k-1}^j (1 + \sigma_j \alpha_k)}
    \alpha_k^j = \dfrac{\pi_{k+1}^j}{\pi_k^j}\,\alpha_k
    x_{k+1}^j = x_k^j + \alpha_k^j p_k^j
    \beta_k^j = \Big(\dfrac{\pi_{k+1}^j}{\pi_k^j}\Big)^2 \beta_k
    p_{k+1}^j = \pi_{k+1}^j r_{k+1} + \beta_k^j p_k^j
  end for
end for
59. Scalar recurrences

The vector recurrences:

x_{k+1}^j = x_k^j + \alpha_k^j p_k^j
p_{k+1}^j = \pi_{k+1}^j r_{k+1} + \beta_k^j p_k^j

Taking the inner product with b, and using b^T r_k = 0 (k = 1, 2, ...):

b^T x_{k+1}^j = b^T x_k^j + \alpha_k^j \, b^T p_k^j
b^T p_{k+1}^j = \beta_k^j \, b^T p_k^j

The vector recurrences reduce to scalar recurrences
→ a drastic reduction in computation and memory requirements
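Carrying the reduction to its end: if only the quadratic form b^T (C + \sigma_j I)^{-1} b is needed, as in the trace estimator, even the per-shift vectors x^j, p^j can be dropped. A sketch under the slide's assumptions (x_0 = 0, r_0 = b, so b^T r_k = r_0^T r_k = 0 for k ≥ 1 by the formal orthogonality of the COCG residuals); the seed iteration still updates its own vectors, but each shift carries only scalars.

```python
import numpy as np

def shifted_cocg_quadform(C, b, sigmas, tol=1e-10, maxiter=1000):
    """Approximate b^T (C + sigma_j I)^{-1} b for each shift sigma_j."""
    r = b.astype(complex).copy()               # r_0 = b  (x_0 = 0)
    p = r.copy()
    N = len(sigmas)
    u = np.zeros(N, dtype=complex)             # u^j = b^T x^j
    w = np.full(N, r @ r, dtype=complex)       # w^j = b^T p^j  (p^j_0 = b)
    pi_old = np.ones(N, dtype=complex)
    pi = np.ones(N, dtype=complex)
    alpha_old, beta_old = 1.0, 0.0             # alpha_{-1} = 1, beta_{-1} = 0
    for _ in range(maxiter):
        q = C @ p
        alpha = (r @ r) / (p @ q)
        r_new = r - alpha * q
        beta = (r_new @ r_new) / (r @ r)
        for j, sigma in enumerate(sigmas):     # scalar recurrences only
            pi_new = (alpha_old * pi[j] * pi_old[j]) / (
                alpha * beta_old * (pi_old[j] - pi[j])
                + alpha_old * pi_old[j] * (1 + sigma * alpha))
            u[j] += (pi_new / pi[j]) * alpha * w[j]   # b^T x^j += alpha^j b^T p^j
            w[j] *= (pi_new / pi[j]) ** 2 * beta      # b^T p^j  = beta^j  b^T p^j
            pi_old[j], pi[j] = pi[j], pi_new
        p = r_new + beta * p
        alpha_old, beta_old = alpha, beta
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            break
        r = r_new
    return u

rng = np.random.default_rng(7)
n = 80
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
C = 1.0j * np.eye(n) - A                       # seed matrix, complex symmetric
b = rng.standard_normal(n)
sigmas = [0.5, 1.0, 2.0]
for sigma, uj in zip(sigmas, shifted_cocg_quadform(C, b, sigmas)):
    exact = b @ np.linalg.solve(C + sigma * np.eye(n), b)
    print(abs(uj - exact))                     # small for converged shifts
```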