Copyright © GREE, Inc. All Rights Reserved.
Hyperparameter Optimization for Machine Learning Models

About the presenter
• Yoshihiko Ozaki
• Engineer, GREE, Inc.
  • Web game development -> machine learning
• Research specialist (特定集中研究専門員), AIST
  • Black-box optimization
  • Derivative-free optimization
  • Hyperparameter optimization

Introduction
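
Hyperparameters in machine learning
Hyperparameters are the tunable parameters of a model, or of the methods used to train it, that affect performance.
[Figures: effect of the regularization term for ln λ = −18 and ln λ = 0 (Bishop, 2006); the Adam optimizer (Kingma and Ba 2015)]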

The number of hyperparameters grows as models become more complex
Fine-grained tuning is no longer manageable by hand or with simple methods.
[Figure: layer diagrams of VGG-19, a 34-layer plain network, and a 34-layer residual network. Residual Network (He et al. 2016)]
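
The rise of hyperparameter optimization research
It has become an indispensable tool for practical applications such as deep learning.
Automating hyperparameter tuning is a challenging optimization problem:
• The search space is vast
• Function evaluations are expensive
• The objective function is noisy
• The variables are of diverse types
Research has developed mainly around Bayesian optimization and related methods (Hutter et al. 2015).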
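
Formulating the hyperparameter optimization problem
The standard view is black-box optimization that minimizes a performance metric (a loss function):
$$\text{Minimize } f(\lambda) \quad \text{subject to } \lambda \in \Lambda.$$
The objective function is not given explicitly as a formula; all we can observe are noisy objective values:
$$f_\epsilon(\lambda) = f(\lambda) + \epsilon, \quad \epsilon \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma_n^2)$$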
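
Black-box optimization
Pros and cons
Pros
• Only objective function values are required
• Extremely general: independent of the model and the loss function
Cons
• The nature of the objective function is unknown
• Gradient information is unavailable (which makes efficient optimization methods hard to design)
• Derivative-free optimization methods are required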

Formulating the hyperparameter optimization problem
It is common to optimize a quantity such as the k-fold cross-validation loss directly:
$$f_\epsilon(\lambda) = \frac{1}{k} \sum_{i=1}^{k} L(A_\lambda, D^i_{\text{train}}, D^i_{\text{valid}})$$
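
The k-fold objective can be sketched in a few lines. This is only an illustration under stated assumptions: the learning algorithm $A_\lambda$ is taken to be a toy 1-D ridge regression with penalty λ, the loss $L$ is mean squared error, and the data and the function name `kfold_cv_loss` are hypothetical, not from the slides.

```python
import random

def kfold_cv_loss(lam, xs, ys, k=5):
    """Mean validation MSE of a 1-D ridge regression over k folds (toy A_lambda)."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # fold i holds indices i, i+k, ...
    total = 0.0
    for valid_idx in folds:
        valid = set(valid_idx)
        train_idx = [j for j in range(n) if j not in valid]
        # Train on D_train: closed-form ridge fit w = sum(x*y) / (sum(x*x) + lam)
        sxy = sum(xs[j] * ys[j] for j in train_idx)
        sxx = sum(xs[j] * xs[j] for j in train_idx)
        w = sxy / (sxx + lam)
        # Evaluate L on D_valid: mean squared error on the held-out fold
        total += sum((ys[j] - w * xs[j]) ** 2 for j in valid_idx) / len(valid_idx)
    return total / k

# Hypothetical data: y = 2x plus Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

loss_small = kfold_cv_loss(0.01, xs, ys)   # weak regularization
loss_large = kfold_cv_loss(100.0, xs, ys)  # heavy regularization, underfits
```

An optimizer only ever sees the returned number, which is what makes the problem black-box.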

Classification of hyperparameters
Continuous parameters are the easiest to handle; conditional parameters are the hardest.

Optimization methods
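
Requirements for hyperparameter optimization methods (Falkner et al. 2018a)
• Strong Anytime Performance
  • Good performance is obtained even under tight constraints
• Strong Final Performance
  • A very good configuration is obtained under loose constraints
• Effective Use of Parallel Resources
  • The method parallelizes efficiently
• Scalability
  • A very large number of parameters can be handled without problems
• Robustness & Flexibility
  • The method is robust and flexible with respect to observation noise in the objective values and to highly sensitive parameters
Satisfying all of these is difficult, so in practice you have to prioritize according to your goal.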
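
Classification of methods
Dodge et al. (2017)
Methods that choose $\lambda_k$ based on $\{(\lambda_i, f(\lambda_i))\}_{i=1}^{k-1}$
• Bayesian optimization, etc.
• Optimize efficiently by exploiting objective function values
• Tend to keep the number of evaluations low
Methods that choose $\lambda_k$ based on $\{\lambda_i\}_{i=1}^{k-1}$
• Grid search, random search, etc.
• No dependence on objective function values, so evaluations can run in parallel as far as resources allow
• A good fit for cloud computing resources, where billing by CPU time is the norm
• Tend to keep wall-clock time low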

Grid search
Of the 88 NIPS 2014 papers that mentioned hyperparameter tuning, 84 used grid search (Simm 2015).
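
Grid search
Pros and cons
Pros
• Easy to parallelize and scalable with respect to computational resources
Cons
• Extremely vulnerable to low effective dimensionality (discussed later)
• Not scalable: the cost is exponential in the number of parameters
• Poor at finding local or global optima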
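
Design of Experiments (DOE)
Iteratively sample a narrower range centered on the best point (Staelin 2002).
[Figure: black = 2-level DOE, white = 3-level DOE]
[Figure: black = first iteration of the 2-level DOE, white = second iteration, assuming the lower-left black point was the best]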

Random search
Along with grid search, the simplest method.
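
Random search
Pros and cons
Pros
• Easy to parallelize and scalable with respect to computational resources
• Scalable with respect to the number of parameters
• Robust to low effective dimensionality (discussed later)
Cons
• Poor at finding local or global optima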

Low effective dimensionality
Only a few parameters matter for model performance, which makes grid search inefficient; moreover, which parameters matter differs from dataset to dataset (Bergstra et al. 2012).
[Figure: grid layout vs. random layout, with axes "Important parameter" and "Unimportant parameter"]
$$f(\lambda_1, \lambda_2) = g(\lambda_1) + h(\lambda_2) \approx g(\lambda_1)$$
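
A small sketch of why this hurts grid search, assuming a hypothetical 2-D search space on the unit square where only the first coordinate matters: with the same budget of 9 evaluations, a 3×3 grid tries only 3 distinct values of the important parameter, while random search tries 9.

```python
import itertools
import random

# 3 x 3 grid over the unit square: 9 evaluations in total
grid_axis = [0.0, 0.5, 1.0]
grid = list(itertools.product(grid_axis, grid_axis))

# Random search with the same budget of 9 evaluations
rng = random.Random(0)
rand = [(rng.random(), rng.random()) for _ in range(len(grid))]

# Number of distinct values tried for the important parameter (first coordinate)
distinct_grid = len({p[0] for p in grid})
distinct_rand = len({p[0] for p in rand})
```

Since $f \approx g(\lambda_1)$, the grid effectively spends 9 evaluations to learn about 3 points of $g$.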

Identifying important hyperparameters
Recent research trends
• Hutter et al. (2014)
  • Identify important hyperparameters with a functional ANOVA approach
• Fawcett and Hoos (2016)
  • Ablation analysis: find which parameters contribute most to the performance difference between two configurations
• Biedenkapp et al. (2017)
  • Speed up ablation analysis by using a surrogate
• van Rijn and Hutter (2017a, b)
  • Use functional ANOVA to analyze hyperparameter importance across datasets at large scale
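
Low-discrepancy sequences
Using a Sobol sequence or Latin Hypercube Sampling (LHS) instead of uniform random sampling has been proposed, and computational experiments suggest that the Sobol sequence is promising (Bergstra et al. 2012). Dodge et al. (2017) propose using k-DPP.
[Figure: point sets generated by Uniform, Sobol, and LHS sampling]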

Nelder-Mead method (Nelder and Mead 1965)
Optimizes by iteratively transforming a simplex; it is the default method of R's optim function.
[Figure: simplices in 1, 2, and 3 dimensions]

Nelder-Mead method (Nelder and Mead 1965)
The vertices are kept ordered so that $f(\lambda^0) \le f(\lambda^1) \le f(\lambda^2)$ (the figures use $n = 2$, so $\lambda^n = \lambda^2$ is the worst vertex).
[Figure: simplex $\lambda^0, \lambda^1, \lambda^2$ with the candidate points $\lambda^{ic}$, $\lambda^c$, $\lambda^{oc}$, $\lambda^r$, $\lambda^e$, and the shrunken vertices $\lambda^{1s}$, $\lambda^{2s}$]
• Reflect: $\lambda^r = \lambda^c + \delta^r(\lambda^c - \lambda^n)$ where $\lambda^c = \sum_{i=0}^{n-1} \lambda^i / n$
• Expand: $\lambda^e = \lambda^c + \delta^e(\lambda^c - \lambda^n)$
• Outside contract: $\lambda^{oc} = \lambda^c + \delta^{oc}(\lambda^c - \lambda^n)$
• Inside contract: $\lambda^{ic} = \lambda^c + \delta^{ic}(\lambda^c - \lambda^n)$
• Shrink: $\{\lambda^0 + \gamma^s(\lambda^i - \lambda^0) : i = 0, \ldots, n\}$

Nelder-Mead method (Nelder and Mead 1965)
Walk-through of the iterations shown in the figures:
• Order the vertices: $f(\lambda^0) \le f(\lambda^1) \le f(\lambda^2)$
• Reflect: compute $\lambda^r$
• If $f(\lambda^r) < f(\lambda^0)$: Expand, computing $\lambda^e$; $f(\lambda^r)$ and $f(\lambda^e)$ are compared and $\lambda^2$ is replaced
• If $f(\lambda^1) \le f(\lambda^r) < f(\lambda^2)$: Outside contract, computing $\lambda^{oc}$; if $f(\lambda^{oc}) \le f(\lambda^2)$, $\lambda^2$ is replaced by $\lambda^{oc}$
• If $f(\lambda^r) \ge f(\lambda^2)$: Inside contract, computing $\lambda^{ic}$
[Figure: Reflect, Contract, and Shrink applied to the simplex $\lambda^0, \lambda^1, \lambda^2$]

Nelder-Mead method (Nelder and Mead 1965)
[Figure: Nelder-Mead method on the McCormick benchmark function]

Nelder-Mead method (Nelder and Mead 1965)
Pros and cons
Pros
• Good at finding local optima
Cons
• Can only be partially parallelized
• May get trapped in a poor local optimum
For convergence properties, failure cases, and improved variants, see Conn et al. (2009); Audet and Hare (2017).
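
Nelder-Mead method (Nelder and Mead 1965)
Choice of coefficients
$$0 < \gamma^s < 1, \quad -1 < \delta^{ic} < 0 < \delta^{oc} < \delta^r < \delta^e$$
• Standard choice
$$\gamma^s = \tfrac{1}{2}, \quad \delta^{ic} = -\tfrac{1}{2}, \quad \delta^{oc} = \tfrac{1}{2}, \quad \delta^r = 1 \text{ and } \delta^e = 2$$
• Adaptive coefficients (Gao and Han 2012)
$$\gamma^s = 1 - \frac{1}{n}, \quad \delta^{ic} = -\frac{3}{4} + \frac{1}{2n}, \quad \delta^{oc} = \frac{3}{4} - \frac{1}{2n}, \quad \delta^r = 1, \quad \delta^e = 1 + \frac{2}{n} \quad \text{where } n \ge 2$$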
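
The five operations and the standard coefficients can be put together as follows. This is a minimal sketch, not the presenter's implementation: the acceptance tests in the expand and contract steps follow one common textbook variant (an assumption, since the slides lose some comparison operators), the fixed iteration count replaces a proper stopping rule, and the starting simplex is hypothetical. The McCormick function is the benchmark named in the slides.

```python
import math

def mccormick(p):
    """McCormick benchmark function."""
    x, y = p
    return math.sin(x + y) + (x - y) ** 2 - 1.5 * x + 2.5 * y + 1.0

def nelder_mead(f, simplex, iters=300,
                d_r=1.0, d_e=2.0, d_oc=0.5, d_ic=-0.5, g_s=0.5):
    pts = [list(p) for p in simplex]
    n = len(pts) - 1                      # n + 1 vertices in n dimensions
    for _ in range(iters):
        pts.sort(key=f)                   # f(l^0) <= ... <= f(l^n)
        best, worst = pts[0], pts[n]
        # Centroid of all vertices except the worst one
        cen = [sum(p[j] for p in pts[:n]) / n for j in range(n)]

        def along(delta):
            # l^c + delta * (l^c - l^n), shared by reflect/expand/contract
            return [cen[j] + delta * (cen[j] - worst[j]) for j in range(n)]

        refl = along(d_r)
        f_r = f(refl)
        if f_r < f(best):                 # Expand
            expd = along(d_e)
            pts[n] = expd if f(expd) < f_r else refl
        elif f_r < f(pts[n - 1]):         # Plain reflection is good enough
            pts[n] = refl
        elif f_r < f(worst):              # Outside contract
            oc = along(d_oc)
            pts[n] = oc if f(oc) <= f_r else refl
        else:                             # Inside contract, else shrink
            ic = along(d_ic)
            if f(ic) < f(worst):
                pts[n] = ic
            else:
                pts = [[best[j] + g_s * (p[j] - best[j]) for j in range(n)]
                       for p in pts]
    return min(pts, key=f)

# Hypothetical starting simplex near the origin
best_point = nelder_mead(mccormick, [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)])
best_value = mccormick(best_point)
```

Swapping in the adaptive coefficients of Gao and Han (2012) only changes the five keyword arguments.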

Bayesian optimization
Currently the hyperparameter optimization method attracting the most attention (the example on this slide is a maximization problem).
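
Bayesian optimization
• Sequential Model-based Optimization (SMBO)
  • Umbrella term for methods that alternate between function evaluations and updates of a surrogate (a model of the objective function)
  • Includes Bayesian optimization and trust-region methods (Ghanbari and Scheinberg 2017)
• Bayesian optimization
  • Umbrella term for SMBO methods that build the surrogate in a Bayesian manner
  • Models $P(f_\epsilon(\lambda) \mid \lambda)$
• Types of surrogate
  • Gaussian process (GP)
    • The most standard choice; a well-known implementation is Spearmint (Snoek et al. 2012)
  • Random forest
    • SMAC (Hutter et al. 2011)
  • Tree Parzen Estimator (TPE) (Bergstra et al. 2011)
    • Implemented in Hyperopt
    • Models $P(\lambda \mid f_\epsilon(\lambda))$ and $P(f_\epsilon(\lambda))$
  • DNN (Snoek et al. 2015)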

Bayesian optimization
Methods based on Gaussian process regression
• Gaussian distribution
  • A distribution over scalars and vectors
• Gaussian process
  • A distribution over functions
[Figure: samples from a Gaussian process (Bishop, 2006)]
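
Bayesian optimization
Methods based on Gaussian process regression
• Assume that the objective function follows a GP characterized by a mean function $m$ and a covariance function $k$:
$$f_\epsilon(\lambda) \sim \mathcal{GP}(m(\lambda), k(\lambda, \lambda'))$$
• The standard choice for the prior mean function is $m(\lambda) = 0$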
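
Bayesian optimization
Covariance functions (kernels)
• The kernel characterizes the shape of the model
• It can be thought of as an abstraction of how close two points are
• With a suitable kernel, categorical and conditional parameters can also be handled
[Figure: Exponentiated Quadratic and Matérn 5/2 kernels, from "Kernels / Covariance functions" (PyMC3)]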
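
Bayesian optimization
Choosing the covariance function (kernel) (Snoek et al. 2012)
• ARD squared exponential kernel
$$k_{se}(\lambda, \lambda') = \theta_0 \exp\left(-\tfrac{1}{2} r^2(\lambda, \lambda')\right), \quad r^2(\lambda, \lambda') = \sum_{d=1}^{D} (\lambda_d - \lambda'_d)^2 / \theta_d^2$$
• ARD Matérn 5/2 kernel
$$k_{52}(\lambda, \lambda') = \theta_0 \left(1 + \sqrt{5 r^2(\lambda, \lambda')} + \tfrac{5}{3} r^2(\lambda, \lambda')\right) \exp\left(-\sqrt{5 r^2(\lambda, \lambda')}\right)$$
• The kernel hyperparameters are determined dynamically from the data
  • Empirical Bayes (Bishop 2006)
  • Markov Chain Monte Carlo (MCMC) (Snoek et al. 2012)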
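
The two ARD kernels translate directly into code. This is a sketch under the notation of the slide: `theta0` is the amplitude $\theta_0$ and `length_scales` holds the per-dimension $\theta_d$; the function names and the example inputs are hypothetical.

```python
import math

def r2(a, b, length_scales):
    """Scaled squared distance: sum_d (a_d - b_d)^2 / theta_d^2."""
    return sum((x - y) ** 2 / t ** 2 for x, y, t in zip(a, b, length_scales))

def k_se(a, b, theta0, length_scales):
    """ARD squared exponential kernel."""
    return theta0 * math.exp(-0.5 * r2(a, b, length_scales))

def k_52(a, b, theta0, length_scales):
    """ARD Matern 5/2 kernel."""
    s = math.sqrt(5.0 * r2(a, b, length_scales))
    return theta0 * (1.0 + s + s * s / 3.0) * math.exp(-s)  # s*s/3 == (5/3) r^2

origin = (0.0, 0.0)
# A long length scale in dimension 2 makes differences there matter less (ARD)
along_dim1 = k_se(origin, (1.0, 0.0), 1.0, (1.0, 10.0))
along_dim2 = k_se(origin, (0.0, 1.0), 1.0, (1.0, 10.0))
```

The per-dimension length scales are what lets the GP discount unimportant hyperparameters.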

Bayesian optimization
Effect of the kernel hyperparameters, PRML Chapter 6 (Bishop 2006)
$$k(\lambda, \lambda') = \theta_0 \exp\left(-\frac{\theta_1}{2} \|\lambda - \lambda'\|^2\right) + \theta_2 + \theta_3 \lambda^\top \lambda'$$
[Figure: GP samples for $(\theta_0, \theta_1, \theta_2, \theta_3)$ = (1.00, 4.00, 0.00, 0.00), (9.00, 4.00, 0.00, 0.00), (1.00, 64.00, 0.00, 0.00), (1.00, 0.25, 0.00, 0.00), (1.00, 4.00, 10.00, 0.00), (1.00, 4.00, 0.00, 5.00)]
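
Bayesian optimization
Once $m$ and $k$ are fixed, the function value at an unobserved point can be predicted from past observations.
$$P(f_\epsilon(\lambda_{t+1}) \mid \lambda_1, \lambda_2, \ldots, \lambda_{t+1}) = \mathcal{N}(\mu_t(\lambda_{t+1}), \sigma_t^2(\lambda_{t+1}) + \sigma_n^2),$$
$$\mu_t(\lambda_{t+1}) = \mathbf{k}^\top [K + \sigma_n^2 I]^{-1} [f(\lambda_1)\ f(\lambda_2)\ \cdots\ f(\lambda_t)]^\top,$$
$$\sigma_t^2(\lambda_{t+1}) = k(\lambda_{t+1}, \lambda_{t+1}) - \mathbf{k}^\top [K + \sigma_n^2 I]^{-1} \mathbf{k}$$
where
$$\mathbf{k} = [k(\lambda_{t+1}, \lambda_1)\ k(\lambda_{t+1}, \lambda_2)\ \cdots\ k(\lambda_{t+1}, \lambda_t)]^\top, \quad K = \begin{bmatrix} k(\lambda_1, \lambda_1) & \cdots & k(\lambda_1, \lambda_t) \\ \vdots & \ddots & \vdots \\ k(\lambda_t, \lambda_1) & \cdots & k(\lambda_t, \lambda_t) \end{bmatrix}.$$
This follows from properties of the Gaussian distribution and the Schur complement formula (Rasmussen and Williams 2005; Bishop 2006).
No sensible prediction is possible without data, so initialize by collecting data first, for example with random search.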
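
The predictive mean and variance can be computed with nothing more than a linear solver. The sketch below assumes 1-D inputs, a zero prior mean, the kernel $k(\lambda, \lambda') = \exp(-\tfrac{1}{2}(\lambda - \lambda')^2)$, and a small hypothetical noise variance; the helper names and the observations are illustrative only.

```python
import math

def kernel(a, b):
    return math.exp(-0.5 * (a - b) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(lams, fs, lam_new, noise_var=1e-6):
    """Return (mu_t, sigma_t^2) at lam_new given observations (lams, fs)."""
    # K + sigma_n^2 I
    K = [[kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(lams)] for i, a in enumerate(lams)]
    k_vec = [kernel(lam_new, a) for a in lams]
    alpha = solve(K, fs)      # [K + sigma_n^2 I]^-1 f
    beta = solve(K, k_vec)    # [K + sigma_n^2 I]^-1 k
    mean = sum(k * a for k, a in zip(k_vec, alpha))
    var = kernel(lam_new, lam_new) - sum(k * b for k, b in zip(k_vec, beta))
    return mean, var

# Hypothetical observations
lams, fs = [0.0, 1.0, 2.0], [1.0, 0.5, 2.0]
mean_at_obs, var_at_obs = gp_posterior(lams, fs, 1.0)
mean_far, var_far = gp_posterior(lams, fs, 10.0)
```

At an observed point the prediction reproduces the data with almost no variance; far from the data it falls back to the prior (mean 0, variance $k(\lambda, \lambda) = 1$).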

Bayesian optimization
The variance is small near observed points and grows away from them (the prediction becomes uncertain).
[Figure: Brochu et al. (2010)]
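
Bayesian optimization
Choosing the next point to evaluate
• The next point to evaluate is the one that maximizes an index called the acquisition function
• The acquisition function governs the trade-off between exploration and exploitation
  • Evaluate points where the surrogate variance is large (exploration)
  • Evaluate points where the surrogate mean is small (exploitation)
• Example: GP-Upper Confidence Bound (GP-UCB) (Srinivas 2012)
$$a_{UCB}(\lambda) = -\mu(\lambda) + \xi\sigma(\lambda)$$
  The mean enters as $-\mu(\lambda)$ because the problem being solved is loss minimization.
• Many others exist, such as Probability of Improvement (PI), Expected Improvement (EI), and Predictive Entropy Search (PES), and the choice strongly affects search performance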
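
A tiny sketch of how the GP-UCB score trades off the two terms. The candidate points and the surrogate means and standard deviations below are made-up numbers for illustration, not outputs of a fitted GP.

```python
def ucb(mu, sigma, xi=2.0):
    """GP-UCB score for a minimization problem: -mu + xi * sigma."""
    return -mu + xi * sigma

# Hypothetical surrogate predictions on five candidate points
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
mu =         [0.30, 0.20, 0.25, 0.40, 0.50]
sigma =      [0.01, 0.02, 0.20, 0.05, 0.30]

def argmax_candidate(xi):
    scores = [ucb(m, s, xi) for m, s in zip(mu, sigma)]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

next_balanced = argmax_candidate(xi=2.0)  # exploration and exploitation
next_greedy = argmax_candidate(xi=0.0)    # pure exploitation: lowest mean
```

With `xi=0` the lowest predicted loss wins; with `xi=2` a point with a slightly worse mean but much larger uncertainty is chosen instead.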
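
Bayesian optimization
Pros and cons
Pros
• Global search that accounts for the exploration-exploitation trade-off
• Search that accounts for observation noise
Cons
• Sensitive to the choice of covariance function and acquisition function
• Optimizing the acquisition function is itself a non-convex global optimization problem
• With Gaussian process regression, the cost is cubic in the number of observations
• Hard to parallelize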

Reducing the computational cost of the surrogate
Recent research trends
• Bottleneck of Gaussian process regression: $[K + \sigma_n^2 I]^{-1}$
  • Approximate computation (Quiñonero-Candela et al. 2007; Titsias 2009)
• Surrogates with relatively low computational cost
  • Random forest (Hutter et al. 2011)
  • DNN (Snoek et al. 2015)

Parallelizing Bayesian optimization
Recent research trends
• Shah and Ghahramani (2015)
  • Parallel Predictive Entropy Search
• Gonzalez et al. (2016)
  • Local Penalization
• Kathuria et al. (2016)
  • DPP sampling
• Kandasamy et al. (2018)
  • Asynchronous parallel Thompson sampling
• And many more
  • Bergstra et al. (2011); Snoek et al. (2012); Contal et al. (2013); Desautels et al. (2014); Daxberger and Low (2017); Wang et al. (2017, 2018a); Rubin (2018)

Bayesian optimization
(Shown again) The example on this slide is a maximization problem.

Other methods
Main methods with reported applications
• CMA-ES
  • Watanabe and Le Roux (2014); Loshchilov and Hutter (2016)
• Particle Swarm Optimization (PSO)
  • Meissner et al. (2006); Lin et al. (2009); Lorenzo et al. (2017); Ye (2017)
• Genetic Algorithm (GA)
  • Leung et al. (2003); Young et al. (2015)
• Differential Evolution (DE)
  • Fu et al. (2016a,b)
• Reinforcement learning
  • Hansen (2016); Bello et al. (2017); Dong et al. (2018)
• Gradient-based methods (note: not black-box optimization; continuous parameters only)
  • Maclaurin et al. (2015); Luketina et al. (2016); Pedregosa (2016); Franceschi (2017a,b,c, 2018a,b)

Auxiliary techniques
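
Early stopping
Predict the learning curve as a function of the number of epochs and stop training runs that are unlikely to reach good performance.
• Domhan et al. (2015)
  • Model the learning curve as a weighted linear combination of 11 basis functions
$$f_{comb} = \sum_{i=1}^{k} w_i f_i(\lambda \mid \theta_i) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \quad \sum_{i=1}^{k} w_i = 1, \quad \forall w_i,\ w_i \ge 0$$
• Use a Bayesian network (Klein et al. 2016)
• Leverage past data (Chandrashekaran and Lane 2017)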
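
Increasing Image Sizes (IIS) (Hinz et al. 2018)
Start hyperparameter optimization with low-resolution images and gradually increase the resolution.
• After optimizing hyperparameters at different resolutions, the important parameters were analyzed with functional ANOVA
  • Many important parameters and their values are the same regardless of resolution (e.g., learning rate, batch size)
  • Those affected by resolution include the number of convolutional layers immediately followed by max-pooling (because pooling reduces the resolution) -> suitable initial values at higher resolution are inferred from the low-resolution case
• Optimizing with 750 evaluations at 32×32, 500 at 64×64, and 250 at 128×128 loses no accuracy and finishes sooner than 1500 evaluations at 128×128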
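
Hyperband (Li et al. 2016)
Adaptively allocate resources (e.g., training time, number of training examples).
• Successive Halving (Jamieson and Talwalkar 2015)
  • Evaluate multiple candidate hyperparameter configurations
  • Discard the lower-ranked candidates, reallocate more resources to the top-ranked ones, and continue evaluating
• Open issue
  • With $n$ candidates and a total resource of $B$, the right trade-off between $n$ and $B/n$ is non-obvious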
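
Successive Halving itself is short enough to sketch. The version below assumes a halving factor `eta`, a starting budget of 1, and a deterministic toy `evaluate(config, budget)` whose loss shrinks as the budget grows; all of these, including the candidate learning rates, are hypothetical stand-ins for real training runs.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the candidates each round, multiplying the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Rank the surviving candidates by loss at the current budget
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[:max(1, len(survivors) // eta)]
        budget *= eta  # the remaining candidates get more resource next round
    return survivors[0]

def toy_loss(learning_rate, budget):
    # Hypothetical loss: best at learning_rate = 0.1, improving with budget
    return (learning_rate - 0.1) ** 2 + 1.0 / budget

candidates = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
winner = successive_halving(candidates, toy_loss)
```

Here 8 candidates are cut to 4, 2, and 1 while the per-candidate budget goes 1, 2, 4; choosing the initial number of candidates for a fixed total budget is exactly the $n$ versus $B/n$ question raised above.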

Hyperband (Li et al. 2016)
Proposed method: try several trade-offs between $n$ and $B/n$, in the manner of a grid search.
It can be combined with random search or Bayesian optimization (Bertrand et al. 2017; Falkner et al. 2018; Wang et al. 2018).
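
Meta-learning and warm starts
Recent research trends
• Hypothesis: hyperparameter optimization results for similar datasets are similar
  • e.g., retraining a model because more training data has become available
• Meta-features
  • Hand-crafted
    • Simple features (e.g., number of samples, number of dimensions, number of classes)
    • Features based on statistics or information theory (e.g., skewness of a distribution)
    • Landmark features (performance of simple machine learning models such as decision trees)
  • Deep learning (Kim et al. 2017a,b)
• Warm start: initialize the method with the hyperparameter optimization results of similar datasets
  • PSO (Gomes et al. 2012)
  • GA (Reif et al. 2012)
  • Bayesian optimization (Bardenet et al. 2013; Yogatama and Mann 2014; Feurer et al. 2014, 2015, 2018; Kim et al. 2017a,b)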

Handling noise
Recent research trends
• Sampling (Arnold and Beyer 2006)
  • Evaluate a configuration n times and take the mean
• Threshold Selection Equipped with Re-evaluation (Markon et al. 2001; Beielstein and Markon 2002; Jin and Branke 2005; Goh and Tan 2007; Gießen and Kötzing 2016)
  • Apply sampling when the objective value improves on the best value by at least a threshold
• Value Suppression (Wang et al. 2018b)
  • When the best-k configurations have not been updated for a certain period, apply sampling to them and correct their function values

Computational experiments

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
The following hyperparameters are optimized with five methods.
Dataset: MNIST (LeCun and Cortes, 2010)
Networks: LeNet (LeCun et al. 1998), Batch-Normalized Maxout Network in Network (Chang and Chen 2015)
Task: character recognition (10-class classification)

LeNet (LeCun et al. 1998)
Name | Description | Range
x1 | Learning rate (= 0.1^x1) | [1, 4]
x2 | Momentum (= 1 − 0.1^x2) | [0.5, 2]
x3 | L2 weight decay | [0.001, 0.01]
x4* | FC1 units | [256, 1024]
Integer parameters are marked with *.

Batch-Normalized Maxout Network in Network (Chang and Chen 2015)
Name | Description | Range
x1 | Learning rate (= 0.1^x1) | [0.5, 2]
x2 | Momentum (= 1 − 0.1^x2) | [0.5, 2]
x3 | L2 weight decay | [0.001, 0.01]
x4 | Dropout 1 | [0.4, 0.6]
x5 | Dropout 2 | [0.4, 0.6]
x6 | Conv 1 initialization deviation | [0.01, 0.05]
x7 | Conv 2 initialization deviation | [0.01, 0.05]
x8 | Conv 3 initialization deviation | [0.01, 0.05]
x9 | MMLP 1-1 initialization deviation | [0.01, 0.05]
x10 | MMLP 1-2 initialization deviation | [0.01, 0.05]
x11 | MMLP 2-1 initialization deviation | [0.01, 0.05]
x12 | MMLP 2-2 initialization deviation | [0.01, 0.05]
x13 | MMLP 3-1 initialization deviation | [0.01, 0.05]
x14 | MMLP 3-2 initialization deviation | [0.01, 0.05]
MMLP: Maxout Multi Layer Perceptron

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Character recognition (LeNet): results
[Figure: Mean loss of all executions for each method per iteration (LeNet)]

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Character recognition (LeNet): results
Method | mean loss | min loss
Random search | 0.005411 (±0.001413) | 0.002781
Bayesian optimization | 0.004217 (±0.002242) | 0.000089
CMA-ES | 0.000926 (±0.001420) | 0.000047
Coordinate-search method | 0.000052 (±0.000094) | 0.000002
Nelder-Mead method | 0.000029 (±0.000029) | 0.000004

Method | mean accuracy (%) | accuracy with min loss (%)
Random search | 98.98 (±0.08) | 99.06
Bayesian optimization | 99.07 (±0.02) | 99.25
CMA-ES | 99.20 (±0.08) | 99.30
Coordinate-search method | 99.26 (±0.05) | 99.35
Nelder-Mead method | 99.24 (±0.04) | 99.28

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Character recognition (Batch-Normalized Maxout Network in Network): results
[Figure: Mean loss of all executions for each method per iteration (Batch-Normalized Maxout Network in Network)]

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Character recognition (Batch-Normalized Maxout Network in Network): results
Method | mean loss | min loss
Random search | 0.045438 (±0.002142) | 0.042694
Bayesian optimization | 0.045636 (±0.001197) | 0.044447
CMA-ES | 0.045248 (±0.002537) | 0.042250
Coordinate-search method | 0.045131 (±0.001088) | 0.043639
Nelder-Mead method | 0.044549 (±0.001079) | 0.043238

Method | mean accuracy (%) | accuracy with min loss (%)
Random search | 99.56 (±0.02) | 99.58
Bayesian optimization | 99.47 (±0.05) | 99.59
CMA-ES | 99.49 (±0.14) | 99.59
Coordinate-search method | 99.48 (±0.04) | 99.53
Nelder-Mead method | 99.53 (±0.00) | 99.54

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Dataset: Adience benchmark (Eran et al. 2014)
Network: Gil and Tal (2015)
Tasks:
(1) Gender estimation (2-class classification)
(2) Age group estimation (8-class classification)
Name | Description | Range
x1 | Learning rate (= 0.1^x1) | [1, 4]
x2 | Momentum (= 1 − 0.1^x2) | [0.5, 2]
x3 | L2 weight decay | [0.001, 0.01]
x4 | Dropout 1 | [0.4, 0.6]
x5 | Dropout 2 | [0.4, 0.6]
x6* | FC 1 units | [512, 1024]
x7* | FC 2 units | [256, 512]
x8 | Conv 1 initialization deviation | [0.01, 0.05]
x9 | Conv 2 initialization deviation | [0.01, 0.05]
x10 | Conv 3 initialization deviation | [0.01, 0.05]
x11 | FC 1 initialization deviation | [0.001, 0.01]
x12 | FC 2 initialization deviation | [0.001, 0.01]
x13 | FC 3 initialization deviation | [0.001, 0.01]
x14 | Conv 1 bias | [0, 1]
x15 | Conv 2 bias | [0, 1]
x16 | Conv 3 bias | [0, 1]
x17 | FC 1 bias | [0, 1]
x18 | FC 2 bias | [0, 1]
x19* | Normalization 1 localsize (= 2x19 + 3) | [0, 2]
x20* | Normalization 2 localsize (= 2x20 + 3) | [0, 2]
x21 | Normalization 1 alpha | [0.0001, 0.0002]
x22 | Normalization 2 alpha | [0.0001, 0.0002]
x23 | Normalization 1 beta | [0.5, 0.95]
x24 | Normalization 2 beta | [0.5, 0.95]
Integer parameters are marked with *.

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Gender estimation: results
[Figure: Mean loss of all executions for each method per iteration (gender classification CNN)]

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Gender estimation: results
Method | mean loss | min loss
Random search | 0.001732 (±0.000540) | 0.000984
Bayesian optimization | 0.00183 (±0.000547) | 0.001097
CMA-ES | 0.001804 (±0.000480) | 0.001249
Coordinate-search method | 0.002240 (±0.001448) | 0.000378
Nelder-Mead method | 0.000395 (±0.000129) | 0.000245

Method | mean accuracy (%) | accuracy with min loss (%)
Random search | 87.93 (±0.24) | 88.21
Bayesian optimization | 88.07 (±0.27) | 87.85
CMA-ES | 88.20 (±0.38) | 88.55
Coordinate-search method | 87.04 (±0.52) | 87.72
Nelder-Mead method | 88.38 (±0.47) | 88.83

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Age group estimation: results
[Figure: Mean loss of all executions for each method per iteration (age classification CNN)]

Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Age group estimation: results
Method | mean loss | min loss
Random search | 0.035694 (±0.006958) | 0.026563
Bayesian optimization | 0.024792 (±0.003076) | 0.020466
CMA-ES | 0.031244 (±0.010834) | 0.016952
Coordinate-search method | 0.032244 (±0.006109) | 0.024637
Nelder-Mead method | 0.015492 (±0.002276) | 0.013556

Method | mean accuracy (%) | accuracy with min loss (%)
Random search | 57.18 (±0.96) | 57.90
Bayesian optimization | 56.28 (±1.68) | 57.19
CMA-ES | 57.17 (±0.80) | 58.19
Coordinate-search method | 55.06 (±2.31) | 56.98
Nelder-Mead method | 56.72 (±0.50) | 57.42
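
Hyperparameter optimization of CNNs (Ozaki et al. 2017)
Why did the local search methods produce good results?
Hypothesis: does the objective function have many good-quality local optima? -> The results support this (NM converged to different local optima, yet still performed well).
[Figure: Parallel coordinates plot of the optimized hyperparameters of the gender classification CNN]
• Replication study by Olof (2018)
  • NM does work well for CNNs; for RNNs the results are mixed
  • On average, TPE was the best for both CNNs and RNNs (among Bayesian optimization methods, the GP-based one performed poorly)
  • The best result over the whole experiment was found by NM, for both CNNs and RNNs
  • Points out that a property of the loss function common to CNNs does not hold for RNNs
• In the experiments of Snoek et al. (2012), GP-based Bayesian optimization was reported to outperform TPE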
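
Computational experiments
Various issues
• Essentially every paper concludes that its own proposed method is the best
  • Assume that the proposed method has been carefully tuned
• Reproducibility
  • Implementation of the methods (source code release), randomness, and tuning
• Not enough computational resources at hand
  • Tabular datasets that record model evaluation results (Klein et al. 2018)
• Inconsistent experimental settings
  • HPOLib (Eggensperger et al. 2013)
• How to compare methods
  • Criteria (e.g., accuracy, AUC) and ranking methods (Dewancker et al. 2016)
• Overfitting to the validation data
  • In practice, split the dataset into training / validation / test, and check that the performance after tuning does not deviate too much on the test set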

Conclusion
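
Conclusion
Topics expected to heat up from here
• Moving away from grid search
  • Use other methods, starting with random search
  • Weigh the pros and cons according to the situation
  • Refer to papers whose experimental settings are close to your own
• Research topics
  • Optimization methods
  • Related methods (e.g., identifying important parameters, learning-curve prediction)
  • Ensuring reproducibility and building benchmarks
  • Applications (AutoML, e.g., the CASH problem, model architecture search)
CASH: Combined Algorithm Selection and Hyperparameter Optimization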

Appendix

Coordinate search method
Search using the maximal positive basis $D_\oplus$ (Conn et al., 2009; Audet and Hare, 2017)
$$D_\oplus = \{\pm e_i : i = 1, 2, \ldots, n\}$$

Coordinate search method
• Initialization: $\lambda^0 \in \Lambda\ (\subset \mathbb{R}^n)$, $\delta^0 \in \mathbb{R}$ with $\delta^0 > 0$, $\epsilon \in [0, \infty)$
• Poll: $P^k = \{\lambda^k + \delta^k d : d \in D_\oplus\}$; look for $\lambda \in P^k$ with $f(\lambda) < f(\lambda^k)$
• If such a $\lambda$ is found: $\lambda^{k+1} = \lambda$, $\delta^{k+1} = \delta^k$
• Otherwise: $\lambda^{k+1} = \lambda^k$, $\delta^{k+1} = \delta^k / 2$
• Termination: stop when $\delta^{k+1} \le \epsilon$
[Figure: iterates $\lambda^0, \lambda^1, \lambda^2, \lambda^3 = \lambda^4 = \lambda^5, \lambda^6$ of the search]
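
The steps above fit in one short function. This sketch assumes opportunistic polling in a fixed order (+e_1, −e_1, +e_2, ...), an unconstrained search space, and a hypothetical starting point and step size; it is run on the McCormick benchmark function used in the slides.

```python
import math

def mccormick(p):
    """McCormick benchmark function."""
    x, y = p
    return math.sin(x + y) + (x - y) ** 2 - 1.5 * x + 2.5 * y + 1.0

def coordinate_search(f, lam0, delta0=1.0, eps=1e-6, max_iter=10000):
    lam, delta = list(lam0), delta0
    f_lam = f(lam)
    for _ in range(max_iter):
        if delta <= eps:                  # termination: step size small enough
            break
        improved = False
        for i in range(len(lam)):         # poll along +e_i and -e_i
            for sign in (1.0, -1.0):
                cand = lam[:]
                cand[i] += sign * delta
                f_c = f(cand)
                if f_c < f_lam:           # opportunistic: take the first improvement
                    lam, f_lam, improved = cand, f_c, True
                    break
            if improved:
                break
        if not improved:                  # unsuccessful poll: halve the step size
            delta /= 2.0
    return lam, f_lam

lam_star, f_star = coordinate_search(mccormick, [0.0, 0.0])
```

From the origin the search settles in the local minimum near (−0.547, −1.547). Each poll costs up to 2n evaluations, which is the dimensionality cost noted in the pros and cons.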

Coordinate search method
[Figure: coordinate search on the McCormick benchmark function]
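
Coordinate search method
Pros and cons
Pros
• Good at finding local optima
Cons
• Can only be partially parallelized
• Scales poorly with the number of dimensions, because it searches iteratively along the coordinate axes
• Does not search globally, so it risks getting trapped in a poor local optimum
For convergence properties, failure cases, and improved variants, see Conn et al. (2009); Audet and Hare (2017).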
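
Coordinate search method
Normalizing the search space
• If the scales of the hyperparameters differ too much, the search becomes inefficient
• Prevent this by normalizing the search space to the unit hypercube beforehand
• In practice, when a point is an invalid value, return a suitably large loss value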
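
A minimal sketch of both ideas: the optimizer works in the unit hypercube, and a wrapper maps points back to the real ranges and returns a large penalty for invalid points. The bounds are borrowed from the x1 to x3 rows of the LeNet table in this deck; the penalty value, the function names, and the toy loss are hypothetical.

```python
# Ranges of x1, x2, x3 (learning-rate exponent, momentum exponent, L2 weight decay)
BOUNDS = [(1.0, 4.0), (0.5, 2.0), (0.001, 0.01)]

def denormalize(u, bounds=BOUNDS):
    """Map a point of the unit hypercube to the original ranges."""
    return [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, bounds)]

def guarded(f, penalty=1e10):
    """Wrap f so that points outside the unit hypercube get a large loss."""
    def wrapped(u):
        if any(ui < 0.0 or ui > 1.0 for ui in u):
            return penalty
        return f(denormalize(u))
    return wrapped

# Toy stand-in for an expensive training run
toy_loss = guarded(lambda x: sum(v * v for v in x))

inside = toy_loss([0.0, 0.0, 0.0])    # evaluated at (1.0, 0.5, 0.001)
outside = toy_loss([1.5, 0.0, 0.0])   # invalid point: penalty
```

With this wrapper a single step size δ means the same relative move in every coordinate.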
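
Coordinate search method
Initialization strategies
• How to choose the initial point
  • Initialize at the center of the search range
  • Run a few iterations of random search and initialize at the best point found
• An effective remedy for getting trapped in a poor local optimum
  • Multi-start from different initial points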
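
Coordinate search method
Polling strategies (Audet and Hare 2017)
• Opportunistic polling
  • Accept a candidate as soon as an improving one is found
  • Fixed order
  • Completely random order
  • Start from the direction that produced the most recent improvement
• Complete polling (does not scale)
  • Evaluate every candidate at each iteration and choose the best value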
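
Bayesian optimization
Kernel for handling categorical parameters
• Weighted Hamming distance kernel (Hutter et al. 2011)
$$k_{mixed}(\lambda, \lambda') = \exp(r_{cont} + r_{cat}),$$
$$r_{cont}(\lambda, \lambda') = \sum_{l \in \Lambda_{cont}} \left(-\theta_l (\lambda_l - \lambda'_l)^2\right),$$
$$r_{cat}(\lambda, \lambda') = \sum_{l \in \Lambda_{cat}} -\theta_l \left(1 - \delta(\lambda_l, \lambda'_l)\right),$$
where $\delta$ is the Kronecker delta function.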
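
In code, the categorical part simply adds a penalty $\theta_l$ for every categorical parameter on which the two configurations disagree. The sketch below assumes configurations are tuples, with hypothetical index lists saying which positions are continuous and which are categorical.

```python
import math

def mixed_kernel(a, b, cont_idx, cat_idx, theta):
    """Weighted Hamming distance kernel: exp(r_cont + r_cat)."""
    r_cont = sum(-theta[l] * (a[l] - b[l]) ** 2 for l in cont_idx)
    # 1 - Kronecker delta: 0 when the categories match, 1 otherwise
    r_cat = sum(-theta[l] * (0.0 if a[l] == b[l] else 1.0) for l in cat_idx)
    return math.exp(r_cont + r_cat)

# Hypothetical configurations: (learning rate, optimizer name)
theta = {0: 1.0, 1: 0.5}
same = mixed_kernel((0.1, "adam"), (0.1, "adam"), [0], [1], theta)
diff_cat = mixed_kernel((0.1, "adam"), (0.1, "sgd"), [0], [1], theta)
```

Identical configurations give 1, and a single categorical mismatch scales the similarity by $\exp(-\theta_l)$.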

Bayesian optimization
Kernels for handling conditional parameters
• Conditional kernel (Lévesque et al. 2017)
$$k_c(\lambda, \lambda') = \begin{cases} k(\lambda, \lambda') & \text{if } \lambda_c = \lambda'_c\ \ \forall c \in C \\ 0 & \text{otherwise} \end{cases}$$
where $C$ is the set of indices of active conditional hyperparameters.
• Another kernel for conditional parameters (Swersky et al. 2014)
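
Bayesian optimization
A concrete Gaussian process regression calculation
With the kernel $k(\lambda, \lambda') = \exp\left(-\tfrac{1}{2} \|\lambda - \lambda'\|^2\right)$:
$$\mu_1(\lambda_2) = k(\lambda_2, \lambda_1) f(\lambda_1)$$
$$\begin{aligned}
\mu_2(\lambda_3) &= \begin{bmatrix} k(\lambda_3, \lambda_1) & k(\lambda_3, \lambda_2) \end{bmatrix} \begin{bmatrix} 1 & k(\lambda_1, \lambda_2) \\ k(\lambda_2, \lambda_1) & 1 \end{bmatrix}^{-1} \begin{bmatrix} f(\lambda_1) \\ f(\lambda_2) \end{bmatrix} \\
&= \frac{1}{1 - k(\lambda_1, \lambda_2)^2} \begin{bmatrix} k(\lambda_3, \lambda_1) & k(\lambda_3, \lambda_2) \end{bmatrix} \begin{bmatrix} 1 & -k(\lambda_1, \lambda_2) \\ -k(\lambda_2, \lambda_1) & 1 \end{bmatrix} \begin{bmatrix} f(\lambda_1) \\ f(\lambda_2) \end{bmatrix} \\
&= \frac{1}{1 - k(\lambda_1, \lambda_2)^2} \begin{bmatrix} k(\lambda_3, \lambda_1) - k(\lambda_2, \lambda_1) k(\lambda_3, \lambda_2) & k(\lambda_3, \lambda_2) - k(\lambda_2, \lambda_1) k(\lambda_3, \lambda_1) \end{bmatrix} \begin{bmatrix} f(\lambda_1) \\ f(\lambda_2) \end{bmatrix} \\
&= \frac{1}{1 - k(\lambda_1, \lambda_2)^2} \Big\{ \big(k(\lambda_3, \lambda_1) - k(\lambda_2, \lambda_1) k(\lambda_3, \lambda_2)\big) f(\lambda_1) + \big(k(\lambda_3, \lambda_2) - k(\lambda_2, \lambda_1) k(\lambda_3, \lambda_1)\big) f(\lambda_2) \Big\}
\end{aligned}$$
[Figure: points $\lambda_1, \lambda_2, \lambda_3$ annotated with the kernel values $k(\lambda_3, \lambda_1)$, $k(\lambda_2, \lambda_1)$, $k(\lambda_3, \lambda_2)$ and the function values $f(\lambda_1)$, $f(\lambda_3)$]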
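
Bayesian optimization
Supplementary notes on acquisition functions
• Probability of Improvement (PI) (Kushner 1964)
$$a_{PI} = P(f(\lambda) \le f(\lambda^*) - \xi) = \Phi\left(\frac{f(\lambda^*) - \xi - \mu(\lambda)}{\sigma(\lambda)}\right)$$
  where $\Phi$ is the standard normal cumulative distribution function.
• Expected Improvement (EI) (Mockus et al. 1978)
  • Takes the amount of improvement into account; widely used
• Predictive Entropy Search (PES) (Hernández-Lobato et al. 2014)
  • Maximizes the information gained
[Figure: visualization of PI (Brochu et al. 2010). Note: the figure is for a maximization problem, so it differs slightly from the formula above]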
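
PI for a minimization problem needs only the standard normal CDF, which can be built from `math.erf`. In this sketch `mu` and `sigma` stand for the surrogate's predictive mean and standard deviation at a candidate point and `f_best` for $f(\lambda^*)$; the numbers are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_improvement(mu, sigma, f_best, xi=0.01):
    """P(f(lambda) <= f_best - xi) under a Gaussian prediction N(mu, sigma^2)."""
    return norm_cdf((f_best - xi - mu) / sigma)

f_best = 0.30
pi_at_threshold = prob_improvement(mu=0.29, sigma=0.05, f_best=f_best)  # mu == f_best - xi
pi_promising = prob_improvement(mu=0.20, sigma=0.05, f_best=f_best)
pi_unpromising = prob_improvement(mu=0.40, sigma=0.05, f_best=f_best)
```

PI only measures the chance of beating the incumbent by at least ξ, not by how much, which is the gap EI closes.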

Bayesian optimization
Methods for maximizing the acquisition function
• Maximizing the acquisition function is itself a non-convex global optimization problem
• Optimization methods
  • Brochu (2010)
    • DIRECT (Jones et al. 1993)
  • Bergstra (2011)
    • Estimation of Distribution (EDA) (Larrañaga and Lozano 2011)
    • Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen 2006)
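
Connections between Bayesian optimization and multi-armed bandits
Recent research trends
• Multi-armed bandits
  • Sequentially search for the best of several candidates
  • The problem of maximizing the cumulative reward from slot machines
• Hyperparameter optimization can be viewed as a continuous / infinite-armed bandit or as best-arm identification
  • Bayesian optimization considers the average case
  • Bandits typically consider worst-case regret minimization
• Related work
  • Srinivas et al. (2010, 2012); Bull (2011); Kandasamy et al. (2015, 2017), among others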

References
Christopher M. Bishop. Pattern recognition and machine learning. Information science and statistics. Springer, New York, 2006. ISBN 978-0-387-31073-2.

Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs], December 2014. URL http://arxiv.org/abs/1412.6980. arXiv:1412.6980.

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition.
2016.

Frank Hutter, Jörg Lücke, and Lars Schmidt-Thieme. Beyond Manual Tuning of Hyperparameters. KI - Künstliche Intelligenz, 29(4):329–337,
November 2015. ISSN 0933-1875, 1610-1987. doi: 10.1007/s13218-015-0381-0. URL http://link.springer.com/10.1007/s13218-015-0381-0.

Stefan Falkner, Aaron Klein, and Frank Hutter. Practical hyperparameter optimization for deep learning, 2018a. URL https://openreview.net/forum?id=HJMudFkDf.

Jesse Dodge, Kevin Jamieson, and Noah A. Smith. Open Loop Hyperparameter Optimization and Determinantal Point Processes. arXiv:1706.01566
[cs, stat], June 2017. URL http://arxiv.org/abs/1706.01566. arXiv: 1706.01566.

Jaak Simm. Survey of hyperparameter optimization in NIPS2014, 2015. URL https://github.com/jaak-s/nips2014-survey.

Carl Staelin. Parameter selection for support vector machines. 2002. URL http://www.hpl.hp.com/techreports/2002/HPL-2002-354R1.html.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13:281–305, February 2012. ISSN
1532-4435. URL http://dl.acm.org/citation.cfm?id=2188385.2188395.

Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. An efficient approach for assessing hyperparameter importance. In Proceedings of the 31st
International Conference on International Conference on Machine Learning - Volume 32, ICML’14, pages I—754–I—762. JMLR.org, 2014. URL
http://dl.acm.org/citation.cfm?id=3044805.3044891.
Chris Fawcett and Holger H. Hoos. Analysing differences between algorithm configurations through ablation. Journal of Heuristics, 22(4):431–458,
Aug 2016. ISSN 1572-9397. doi:10.1007/s10732-014-9275-9. URL https://doi.org/10.1007/s10732-014-9275-9.

Andre Biedenkapp, Marius Lindauer, Katharina Eggensperger, Frank Hutter, Chris Fawcett, and Holger Hoos. Efficient parameter importance analysis via ablation with surrogates, 2017. URL https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14750.

Jan N van Rijn and Frank Hutter. An empirical study of hyperparameter importance across datasets. In AutoML@PKDD/ECML, 2017a.

Jan N van Rijn and Frank Hutter. Hyperparameter importance across datasets. arXiv preprint arXiv:1710.04725, 2017b.

J. A. Nelder and R. Mead. A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313, January 1965. ISSN 0010-4620,
1460-2067. doi: 10.1093/comjnl/7.4.308. URL https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/7.4.308.

Andrew R. Conn, Katya Scheinberg, and Luis N. Vicente. Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics, January 2009. ISBN 978-0-89871-668-9 978-0-89871-876-8. doi: 10.1137/1.9780898718768. URL http://epubs.siam.org/doi/book/10.1137/1.9780898718768.

Charles Audet and Warren Hare. Derivative-Free and Blackbox Optimization. Springer Series in Operations Research and Financial Engineering. Springer International Publishing, Cham, 2017. ISBN 978-3-319-68912-8 978-3-319-68913-5. doi: 10.1007/978-3-319-68913-5. URL http://link.springer.com/10.1007/978-3-319-68913-5.

Fuchang Gao and Lixing Han. Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Computational Optimization and Applications, 51(1):259–277, January 2012. ISSN 0926-6003, 1573-2894. doi: 10.1007/s10589-010-9329-3. URL http://link.springer.com/10.1007/s10589-010-9329-3.

Hiva Ghanbari and Katya Scheinberg. Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm. arXiv:1703.06925 [cs], March 2017. URL http://arxiv.org/abs/1703.06925. arXiv: 1703.06925.

Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural
information processing systems, pages 2951–2959, 2012.
Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential Model-Based Optimization for General Algorithm Configuration. In Carlos A. Coello Coello, editor, Learning and
Intelligent Optimization, pages 507–523, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-25566-3.

James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyperparameter optimization. In Proceedings of the 24th International Conference on Neural
Information Processing Systems, NIPS’11, pages 2546–2554, USA, 2011. Curran Associates Inc. ISBN 978-1-61839-599-3. URL http://dl.acm.org/citation.cfm?id=2986459.2986743.

Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat Prabhat, and Ryan P. Adams. Scalable bayesian optimization using deep neural networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, pages 2171–2180. JMLR.org, 2015. URL http://dl.acm.org/citation.cfm?id=3045118.3045349.

Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005. ISBN 026218253X.

Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical
Reinforcement Learning. arXiv:1012.2599 [cs], December 2010. URL http://arxiv.org/abs/1012.2599. arXiv: 1012.2599.

Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE
Transactions on Information Theory, 58:3250–3265, 2012.

J. Quiñonero-Candela, C. E. Rasmussen, and C. K. I. Williams. Approximation Methods for Gaussian Process Regression, pages 203–223. Neural Information Processing. MIT Press,
Cambridge, MA, USA, September 2007.

Michalis Titsias. Variational learning of inducing variables in sparse Gaussian processes. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International
Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pages 567–574, Hilton Clearwater Beach Resort, Clearwater Beach,
Florida USA, 16–18 Apr 2009. PMLR. URL http://proceedings.mlr.press/v5/titsias09a.html.

Amar Shah and Zoubin Ghahramani. Parallel predictive entropy search for batch global optimization of expensive objective functions. In Proceedings of the 28th International
Conference on Neural Information Processing Systems - Volume 2, NIPS’15, pages 3330–3338, Cambridge, MA, USA, 2015. MIT Press.
URL http://dl.acm.org/citation.cfm?id=2969442.2969611.

Javier Gonzalez, Zhenwen Dai, Philipp Hennig, and Neil Lawrence. Batch bayesian optimization via local penalization. In Arthur Gretton and Christian C. Robert, editors, Proceedings
of the 19th International Conference on Artificial Intelligence and Statistics, volume 51 of Proceedings of Machine Learning Research, pages 648–657, Cadiz, Spain, 09–11 May 2016.
PMLR. URL http://proceedings.mlr.press/v51/gonzalez16a.html.

Tarun Kathuria, Amit Deshpande, and Pushmeet Kohli. Batched Gaussian Process Bandit Optimization via Determinantal Point Processes. arXiv:1611.04088 [cs],
November 2016. URL http://arxiv.org/abs/1611.04088. arXiv: 1611.04088.

Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, and Barnabas Poczos. Parallelised Bayesian optimisation via Thompson sampling. In Amos Storkey
and Fernando Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of
Machine Learning Research, pages 133–142, Playa Blanca, Lanzarote, Canary Islands, 09–11 Apr 2018. PMLR. URL http://proceedings.mlr.press/v84/kandasamy18a.html.

Emile Contal, David Buffoni, Alexandre Robicquet, and Nicolas Vayatis. Parallel gaussian process optimization with upper confidence bound and pure exploration. In
Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Volume 8188, ECML PKDD 2013, pages 225–240, New York,
NY, USA, 2013. Springer-Verlag New York, Inc. ISBN 978-3-642-40987-5. doi: 10.1007/978-3-642-40988-2_15. URL http://dx.doi.org/10.1007/978-3-642-40988-2_15.

Thomas Desautels, Andreas Krause, and Joel W. Burdick. Parallelizing Exploration-Exploitation Tradeoffs in Gaussian Process Bandit Optimization. Journal of Machine
Learning Research, 15:4053–4103, 2014. URL http://jmlr.org/papers/v15/desautels14a.html.

Erik A. Daxberger and Bryan Kian Hsiang Low. Distributed batch Gaussian process optimization. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th
International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 951–960, International Convention Centre, Sydney,
Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/daxberger17a.html.

Zi Wang, Chengtao Li, Stefanie Jegelka, and Pushmeet Kohli. Batched high-dimensional Bayesian optimization via structural kernel learning. In Doina Precup and Yee
Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3656–3664,
International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/wang17h.html.

Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale Bayesian optimization in high-dimensional spaces. In Amos Storkey and
Fernando Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine
Learning Research, pages 745–754, Playa Blanca, Lanzarote, Canary Islands, 09–11 Apr 2018b. PMLR. URL http://proceedings.mlr.press/v84/wang18c.html.

Ran Rubin. New Heuristics for Parallel and Scalable Bayesian Optimization. arXiv:1807.00373 [cs, stat], July 2018. URL http://arxiv.org/abs/1807.00373.
arXiv: 1807.00373.

Shinji Watanabe and Jonathan Le Roux. Black box optimization for automatic speech recognition. 2014.

Ilya Loshchilov and Frank Hutter. CMA-ES for Hyperparameter Optimization of Deep Neural Networks. 2016.

Michael Meissner, Michael Schmuker, and Gisbert Schneider. Optimized Particle Swarm Optimization (OPSO) and its application to artificial neural network
training. BMC Bioinformatics, 7(1):125, March 2006. ISSN 1471-2105. doi: 10.1186/1471-2105-7-125. URL https://doi.org/10.1186/1471-2105-7-125.

Shih-Wei Lin, Shih-Chieh Chen, Wen-Jie Wu, and Chih-Hsien Chen. Parameter determination and feature selection for back-propagation network by
particle swarm optimization. Knowledge and Information Systems, 21(2):249–266, November 2009. ISSN 0219-3116. doi: 10.1007/s10115-009-0242-y.
URL https://doi.org/10.1007/s10115-009-0242-y.

Pablo Ribalta Lorenzo, Jakub Nalepa, Luciano Sanchez Ramos, and José Ranilla Pastor. Hyper-parameter selection in deep neural networks using parallel
particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1864–1871. ACM, 2017.

Fei Ye. Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and
high-dimensional data. PLOS ONE, 12(12):1–36, 2017. doi: 10.1371/journal.pone.0188746. URL https://doi.org/10.1371/journal.pone.0188746.

F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam. Tuning of the structure and parameters of a neural network using an improved genetic algorithm.
IEEE Transactions on Neural Networks, 14(1):79–88, February 2003. doi: 10.1109/tnn.2002.804317. URL http://dx.doi.org/10.1109/tnn.2002.804317.

Steven R Young, Derek C Rose, Thomas P Karnowski, Seung-Hwan Lim, and Robert M Patton. Optimizing deep learning hyper-parameters through an
evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, page 4. ACM, 2015.

Wei Fu, Tim Menzies, and Xipeng Shen. Tuning for software analytics: Is it really necessary? Information and Software Technology, 76:135–146, 2016a.
ISSN 0950-5849. doi: 10.1016/j.infsof.2016.04.017. URL http://www.sciencedirect.com/science/article/pii/S0950584916300738.

Wei Fu, Vivek Nair, and Tim Menzies. Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors? arXiv:1609.02613 [cs, stat],
September 2016b. URL http://arxiv.org/abs/1609.02613. arXiv: 1609.02613.

Samantha Hansen. Using deep q-learning to control optimization hyperparameters. arXiv preprint arXiv:1602.04062, 2016.

Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc V Le. Neural optimizer search with reinforcement learning. In International Conference on Machine
Learning, pages 459–468, 2017.

Xingping Dong, Jianbing Shen, Wenguan Wang, Yu Liu, Ling Shao, and Fatih Porikli. Hyperparameter optimization for tracking with continuous deep q-learning. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pages 518–527, 2018.

Dougal Maclaurin, David Duvenaud, and Ryan P. Adams. Gradient-based hyperparameter optimization through reversible learning. In Proceedings of the 32nd International
Conference on Machine Learning - Volume 37, ICML’15, pages 2113–2122. JMLR.org, 2015. URL http://dl.acm.org/citation.cfm?id=3045118.3045343.

Jelena Luketina, Mathias Berglund, Klaus Greff, and Tapani Raiko. Scalable gradient-based tuning of continuous regularization hyperparameters. In Proceedings of the 33rd
International Conference on Machine Learning - Volume 48, ICML’16, pages 2952–2960. JMLR.org, 2016.
URL http://dl.acm.org/citation.cfm?id=3045390.3045701.

Fabian Pedregosa. Hyperparameter optimization with approximate gradient. In Proceedings of the 33rd International Conference on Machine Learning -
Volume 48, ICML’16, pages 737–746. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045469.

Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. On hyperparameter optimization in learning systems. In Proceedings of the 5th International Conference
on Learning Representations (Workshop Track), 2017a.

Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. A Bridge Between Hyperparameter Optimization and Learning-to-learn. arXiv:1712.06283 [cs, stat],
December 2017b. URL http://arxiv.org/abs/1712.06283. arXiv: 1712.06283.

Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization. In Doina Precup and Yee Whye Teh,
editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1165–1173, International
Convention Centre, Sydney, Australia, 06–11 Aug 2017c. PMLR. URL http://proceedings.mlr.press/v70/franceschi17a.html.

Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. In Jennifer Dy
and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1563–1572,
Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018a. PMLR. URL http://proceedings.mlr.press/v80/franceschi18a.html.

Luca Franceschi, Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, and Paolo Frasconi. Far-HO: A bilevel programming package for hyperparameter optimization and
meta-learning. CoRR, abs/1806.04941, 2018b. URL http://arxiv.org/abs/1806.04941.

Tobias Domhan, Jost Tobias Springenberg, and Frank Hutter. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In
Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pages 3460–3468. AAAI Press, 2015. ISBN 978-1-57735-738-4.
URL http://dl.acm.org/citation.cfm?id=2832581.2832731.

Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, and Frank Hutter. Learning curve prediction with bayesian neural networks. 2016.

Akshay Chandrashekaran and Ian R. Lane. Speeding up Hyper-parameter Optimization by Extrapolation of Learning Curves Using Previous Builds.
In Michelangelo Ceci, Jaakko Hollmén, Ljupčo Todorovski, Celine Vens, and Sašo Džeroski, editors, Machine Learning and Knowledge Discovery in
Databases, pages 477–492, Cham, 2017. Springer International Publishing. ISBN 978-3-319-71249-9.

Tobias Hinz, Nicolás Navarro-Guerrero, Sven Magg, and Stefan Wermter. Speeding up the hyperparameter optimization of deep convolutional neural
networks. International Journal of Computational Intelligence and Applications, page 1850008, 2018.

Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A Novel Bandit-Based Approach to
Hyperparameter Optimization. Journal of Machine Learning Research, 18(185):1–52, 2018. URL http://jmlr.org/papers/v18/16-558.html.

Hadrien Bertrand, Roberto Ardon, Matthieu Perrot, and Isabelle Bloch. Hyperparameter optimization of deep neural networks: Combining
Hyperband with Bayesian model selection. 2017.

Stefan Falkner, Aaron Klein, and Frank Hutter. BOHB: Robust and efficient hyperparameter optimization at scale. In International Conference on
Machine Learning, pages 1436–1445, 2018b.

Jiazhuo Wang, Jason Xu, and Xuejun Wang. Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep
Learning. arXiv:1801.01596 [cs], January 2018a. URL http://arxiv.org/abs/1801.01596. arXiv: 1801.01596.

Jungtaek Kim, Saehoon Kim, and Seungjin Choi. Learning to Warm-Start Bayesian Hyperparameter Optimization. ArXiv e-prints, October 2017.

Jungtaek Kim, Saehoon Kim, and Seungjin Choi. Learning to transfer initializations for Bayesian hyperparameter optimization. arXiv preprint
arXiv:1710.06219, 2017.

T Gomes, P Miranda, R Prudêncio, C Soares, and A Carvalho. Combining meta-learning and optimization algorithms for parameter selection. In 5th
Planning to Learn Workshop (WS28) at ECAI 2012, page 6. 2012.

Matthias Reif, Faisal Shafait, and Andreas Dengel. Meta-learning for evolutionary parameter optimization of classifiers. Machine Learning,
87(3):357–380, 2012.

Rémi Bardenet, Mátyás Brendel, Balázs Kégl, and Michele Sebag. Collaborative hyperparameter tuning. In International Conference on Machine
Learning, pages 199–207, 2013.

Dani Yogatama and Gideon Mann. Efficient transfer learning method for automatic hyperparameter tuning. In Artificial Intelligence and Statistics,
pages 1077–1085, 2014.

Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize Bayesian optimization of hyperparameters. In
Proceedings of the 2014 International Conference on Meta-learning and Algorithm Selection - Volume 1201, pages 3–10. 2014.

Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Initializing bayesian hyperparameter optimization via meta-learning. In AAAI, pages
1128–1135, 2015.

Matthias Feurer, Benjamin Letham, and Eytan Bakshy. Scalable meta-learning for bayesian optimization. arXiv preprint arXiv:1802.02219, 2018.

Dirk V Arnold and H-G Beyer. A general noise model and its effects on evolution strategy performance. IEEE Transactions on Evolutionary
Computation, 10(4):380–391, 2006.

Sandor Markon, Dirk V Arnold, Thomas Bäck, Thomas Beielstein, and H-G Beyer. Thresholding - a selection operator for noisy ES. In Proceedings of the
2001 Congress on Evolutionary Computation, volume 1, pages 465–472. IEEE, 2001.

Thomas Beielstein and Sandor Markon. Threshold selection, hypothesis tests, and DOE methods. In Proceedings of the 2002 Congress on
Evolutionary Computation (CEC’02), volume 1, pages 777–782. IEEE, 2002.

Yaochu Jin and Jürgen Branke. Evolutionary optimization in uncertain environments - a survey. IEEE Transactions on Evolutionary Computation,
9(3):303–317, 2005.

Chi Keong Goh and Kay Chen Tan. An investigation on noisy environments in evolutionary multiobjective optimization. IEEE Transactions on
Evolutionary Computation, 11(3):354–381, 2007.

Christian Gießen and Timo Kötzing. Robustness of populations in stochastic environments. Algorithmica, 75(3):462–489, 2016.

Hong Wang, Hong Qian, and Yang Yu. Noisy derivative-free optimization with value suppression. 2018b.

Yoshihiko Ozaki, Masaki Yano, and Masaki Onishi. Effective hyperparameter optimization using Nelder-Mead method in deep learning. IPSJ
Transactions on Computer Vision and Applications, 9(1), December 2017. ISSN 1882-6695. doi: 10.1186/s41074-017-0030-7.
URL https://ipsjcva.springeropen.com/articles/10.1186/s41074-017-0030-7.

Y. LeCun and C. Cortes. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

J. R. Chang and Y. S. Chen. Batch-Normalized Maxout Network in Network. In Proceedings of the 33rd International Conference on Machine Learning,
2015. URL https://arxiv.org/abs/1511.02583.

Eran Eidinger, Roee Enbar, and Tal Hassner. Age and gender estimation of unfiltered faces. IEEE Transactions on Information Forensics and Security, 9(12):2170–2179, 2014.

Gil Levi and Tal Hassner. Age and gender classification using convolutional neural networks. In Computer Vision and Pattern Recognition Workshops (CVPRW),
2015. URL http://ieeexplore.ieee.org/document/7301352.

Olof Skogby Steinholtz. A comparative study of black-box optimization algorithms for tuning of hyper-parameters in deep neural networks, 2018.

Aaron Klein, Eric Christiansen, Kevin Murphy, and Frank Hutter. Towards reproducible neural architecture and hyperparameter search. 2018.

Katharina Eggensperger, Matthias Feurer, Frank Hutter, James Bergstra, Jasper Snoek, Holger Hoos, and Kevin Leyton-Brown. Towards an empirical
foundation for assessing bayesian optimization of hyperparameters. In NIPS workshop on Bayesian Optimization in Theory and Practice, volume 10,
page 3, 2013.

Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, and George Ke. A strategy for ranking optimization methods using
multiple criteria. In Workshop on Automatic Machine Learning, pages 11–20, 2016.

Julien-Charles Lévesque, Audrey Durand, Christian Gagné, and Robert Sabourin. Bayesian optimization for conditional hyperparameter spaces. In
Proc. of the International Joint Conference on Neural Networks (IJCNN). IEEE, May 2017.

Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, and Michael A Osborne. Raiders of the lost architecture: Kernels for bayesian
optimization in conditional parameter spaces. arXiv preprint arXiv:1409.4011, 2014a.

Harold J. Kushner. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise. Journal of Basic
Engineering, 86(1):97+, 1964. ISSN 00219223. doi: 10.1115/1.3653121. URL http://dx.doi.org/10.1115/1.3653121.

Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of bayesian methods for seeking the extremum. Towards Global Optimization,
1978.

José Miguel Hernández-Lobato, Matthew W. Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of
black-box functions. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS’14, pages 918–926,
Cambridge, MA, USA, 2014. MIT Press. URL http://dl.acm.org/citation.cfm?id=2968826.2968929.

D. R. Jones, C. D. Perttunen, and B. E. Stuckman. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and
Applications, 79(1):157–181, October 1993. ISSN 1573-2878. doi: 10.1007/BF00941892. URL https://doi.org/10.1007/BF00941892.

Pedro Larrañaga and Jose A. Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers,
Norwell, MA, USA, 2001. ISBN 0792374665.

Nikolaus Hansen. The CMA Evolution Strategy: A Comparing Review. In Jose A. Lozano, Pedro Larrañaga, Iñaki Inza, and Endika Bengoetxea,
editors, Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms, pages 75–102. Springer Berlin Heidelberg,
Berlin, Heidelberg, 2006. ISBN 978-3-540-32494-2. doi: 10.1007/3-540-32494-1_4. URL https://doi.org/10.1007/3-540-32494-1_4.

Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and
experimental design. In Proceedings of the 27th International Conference on Machine Learning, ICML’10, pages 1015–1022,
USA, 2010. Omnipress. ISBN 978-1-60558-907-7. URL http://dl.acm.org/citation.cfm?id=3104322.3104451.

Adam D. Bull. Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res., 12:2879–2904, November 2011. ISSN 1532-4435.
URL http://dl.acm.org/citation.cfm?id=1953048.2078198.

Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional Bayesian optimisation and bandits via additive models. In
Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML’15, pages 295–304.
JMLR.org, 2015. URL http://dl.acm.org/citation.cfm?id=3045118.3045151.

Kirthevasan Kandasamy. Tuning hyper-parameters without grad students: Scaling up bandit optimisation. 2017.
 
Designing Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoDesigning Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.Proto
 
A Fusion of Machine Learning and Graph Analysis for Free-Form Data Entry Clus...
A Fusion of Machine Learning and Graph Analysis for Free-Form Data Entry Clus...A Fusion of Machine Learning and Graph Analysis for Free-Form Data Entry Clus...
A Fusion of Machine Learning and Graph Analysis for Free-Form Data Entry Clus...
 


Machine Learning Hyperparameter Optimization

  • 1. Copyright © GREE, Inc. All Rights Reserved. Hyperparameter Optimization for Machine Learning Models
  • 2. About the speaker
      • Yoshihiko Ozaki
      • Engineer, GREE, Inc.
      • Web game development -> machine learning
      • Specified Concentrated Research Specialist, National Institute of Advanced Industrial Science and Technology (AIST)
      • Black-box optimization
      • Derivative-free optimization
      • Hyperparameter optimization
  • 3. Introduction
  • 5. Hyperparameters in machine learning: tunable parameters, belonging to the model itself or to the methods used in training, that affect performance. [Figures: effect of the regularization term for ln λ = −18 and ln λ = 0, axes x and t (Bishop, 2006); Adam optimizer (Kingma and Ba 2015)]
  • 6. As models become more complex, the number of hyperparameters grows as well. Fine-grained tuning is more than manual work or simple methods can handle. [Figure: layer diagrams of VGG-19, a 34-layer plain network, and a 34-layer residual network; Residual Network (He et al. 2016)]
  • 7. The rise of hyperparameter optimization research: it has developed into an indispensable tool for the practical use of deep learning and related methods. Automating hyperparameter tuning is challenging as an optimization problem:
      • The search space is vast
      • Function evaluations are expensive
      • The objective function is noisy
      • Variable types are diverse
    Research has advanced mainly around Bayesian optimization (Hutter et al. 2015).
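  • 8. Formulating the hyperparameter optimization problem: the standard view is black-box optimization that minimizes a performance measure (loss function): minimize $f(\lambda)$ subject to $\lambda \in \Lambda$. The objective function is not given explicitly as a formula; all we can observe are objective values accompanied by noise: $f_\epsilon(\lambda) = f(\lambda) + \epsilon$, $\epsilon \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma_n^2)$
  • 9. Black-box optimization: pros and cons
      Pros
      • Only objective function values are needed
      • Extremely general, independent of the model and the loss function
      Cons
      • The nature of the objective function is unknown
      • Gradient information is unavailable (which makes efficient optimization methods hard to design)
      • Derivative-free optimization methods are required
  • 10. Formulating the hyperparameter optimization problem: it is common to take a quantity such as the k-fold cross-validation loss directly as the optimization target: $f_\epsilon(\lambda) = \frac{1}{k} \sum_{i=1}^{k} L(A_\lambda, D^i_{\text{train}}, D^i_{\text{valid}})$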
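The k-fold cross-validation objective on slide 10 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's code: the helper names (`k_fold_cv_loss`, `fit_ridge_1d`, `mse`, `objective`) and the toy one-dimensional ridge model are hypothetical, chosen only so that the hyperparameter λ (here the L2 penalty `lam`) has a visible effect.

```python
import random


def k_fold_cv_loss(train_fn, loss_fn, data, k, seed=0):
    """Mean validation loss over k folds: (1/k) * sum_i L(A, D_train^i, D_valid^i)."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        valid = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train_fn(train)          # the learning algorithm A_lambda
        total += loss_fn(model, valid)   # loss on the held-out fold
    return total / k


def fit_ridge_1d(train, lam):
    """Toy learner (assumption): y = w * x fitted with an L2 penalty lam."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def mse(w, valid):
    return sum((y - w * x) ** 2 for x, y in valid) / len(valid)


def objective(lam, data, k=5):
    """Black-box objective f(lambda): only its value is observed."""
    return k_fold_cv_loss(lambda tr: fit_ridge_1d(tr, lam), mse, data, k)


rng = random.Random(42)
data = [(i / 20, 2.0 * (i / 20) + rng.gauss(0.0, 0.1)) for i in range(1, 41)]
loss_small = objective(0.01, data)   # light regularization
loss_large = objective(100.0, data)  # heavy regularization, underfits
print(loss_small, loss_large)
```

From the optimizer's point of view `objective` is opaque: it takes a hyperparameter value and returns a number, which is exactly the black-box setting of slide 8.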
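  • 11. Classification of hyperparameters: continuous parameters are the easiest to handle, conditional parameters the hardest.
  • 12. Optimization methods
  • 13. Requirements that a hyperparameter optimization method should satisfy (Falkner et al. 2018a)
      • Strong Anytime Performance: good performance is obtained under tight constraints
      • Strong Final Performance: a very good configuration is obtained under loose constraints
      • Effective Use of Parallel Resources: the method parallelizes efficiently
      • Scalability: a very large number of parameters can be handled without trouble
      • Robustness & Flexibility: the method is robust and flexible with respect to observation noise in the objective value and to very sensitive parameters
    Satisfying all of these is difficult, so in practice one has to choose among them according to the goal.
  • 14. Classification of methods, Dodge et al. (2017)
      Methods that choose $\lambda_k$ from $\{(\lambda_i, f(\lambda_i))\}_{i=1}^{k-1}$
      • Bayesian optimization and similar methods
      • Exploit objective values to optimize efficiently
      • Tend to keep the number of evaluations low
      Methods that choose $\lambda_k$ from $\{\lambda_i\}_{i=1}^{k-1}$
      • Grid search, random search and similar methods
      • No dependence on objective values, so evaluations can run in parallel as far as resources allow
      • A good match for cloud computing resources, where billing by CPU time is the norm
      • Tend to keep wall-clock time low
  • 15. Grid search: of the 88 NIPS 2014 papers that mentioned hyperparameter tuning, 84 used it (Simm 2015).
  • 16. Grid search: pros and cons
      Pros
      • Easy to parallelize and scalable with respect to computing resources
      Cons
      • Markedly vulnerable to low effective dimensionality (discussed later)
      • Not scalable: the computational cost is exponential in the number of parameters
      • Poor at finding local or global optima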
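A minimal grid search sketch makes the exponential cost on slide 16 concrete: the number of evaluations is the product of the number of levels per parameter. The function name `grid_search`, the toy loss, and the parameter names `lr` and `momentum` are assumptions for illustration.

```python
import itertools


def grid_search(objective, grid):
    """Evaluate every combination of the listed values and keep the best."""
    names = list(grid)
    best_config, best_value = None, float("inf")
    n_evals = 0
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        value = objective(config)
        n_evals += 1
        if value < best_value:
            best_config, best_value = config, value
    return best_config, best_value, n_evals


def toy_loss(config):
    # Hypothetical loss with its minimum at lr = 0.1, momentum = 0.9.
    return (config["lr"] - 0.1) ** 2 + (config["momentum"] - 0.9) ** 2


grid = {"lr": [0.001, 0.01, 0.1, 1.0], "momentum": [0.5, 0.9, 0.99]}
best_config, best_value, n_evals = grid_search(toy_loss, grid)
print(best_config, best_value, n_evals)  # 4 * 3 = 12 evaluations
```

Every evaluation is independent of the others, which is why the loop parallelizes trivially; adding a third parameter with m levels multiplies `n_evals` by m.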
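  • 17. Design of Experiments (DOE): iteratively sample a narrower range centered on the best point (Staelin 2002). [Figure, first panel: black = 2-level DOE, white = 3-level DOE. Second panel: black = first iteration of a 2-level DOE, white = second iteration assuming the lower-left black point was the best]
  • 18. Random search: alongside grid search, the simplest method.
  • 19. Random search: pros and cons
      Pros
      • Easy to parallelize and scalable with respect to computing resources
      • Scalable with respect to the number of parameters
      • Robust to low effective dimensionality (discussed later)
      Cons
      • Poor at finding local or global optima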
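Random search as summarized on slides 18 and 19 can be sketched as follows. The uniform sampling ranges, the function name `random_search`, and the toy loss are assumptions; in practice some parameters (a learning rate, for example) are often sampled on a log scale instead.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample each parameter uniformly from its (low, high) range."""
    rng = random.Random(seed)
    best_config, best_value = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        value = objective(config)
        if value < best_value:
            best_config, best_value = config, value
    return best_config, best_value


def toy_loss(config):
    # Hypothetical loss with its minimum at lr = 0.1, momentum = 0.9.
    return (config["lr"] - 0.1) ** 2 + (config["momentum"] - 0.9) ** 2


space = {"lr": (0.0, 1.0), "momentum": (0.0, 1.0)}
config_20, value_20 = random_search(toy_loss, space, n_trials=20, seed=1)
config_200, value_200 = random_search(toy_loss, space, n_trials=200, seed=1)
print(value_20, value_200)
```

The budget `n_trials` is chosen freely and does not grow with the number of parameters, which is the sense in which the method scales with dimension.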
  • 20. Low Effective Dimensionality: only a small number of parameters matter for model performance, which makes grid search inefficient, and which parameters matter differs from dataset to dataset (Bergstra et al. 2012). $f(\lambda_1, \lambda_2) = g(\lambda_1) + h(\lambda_2) \approx g(\lambda_1)$ [Figure: two sampling layouts, each plotted over an important parameter and an unimportant parameter]
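The argument on slide 20 can be checked numerically: with nine evaluations in two dimensions, a 3 x 3 grid tries only three distinct values of the important parameter, while nine random points try nine. The helper name `distinct_values` and the choice of axis 0 as the important parameter are assumptions for this sketch.

```python
import itertools
import random


def distinct_values(points, axis, ndigits=12):
    """Number of different values tried along one coordinate."""
    return len({round(p[axis], ndigits) for p in points})


levels = [0.0, 0.5, 1.0]
grid_points = list(itertools.product(levels, levels))  # 9 points on a 3 x 3 grid

rng = random.Random(0)
random_points = [(rng.random(), rng.random()) for _ in range(9)]  # 9 random points

grid_distinct = distinct_values(grid_points, axis=0)
random_distinct = distinct_values(random_points, axis=0)
print(grid_distinct, random_distinct)
```

If $f(\lambda_1, \lambda_2) \approx g(\lambda_1)$, the grid effectively spends nine evaluations to learn $g$ at three points, whereas the random design learns it at nine.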
  • 21. Identifying important hyperparameters: recent research trends
      • Hutter et al. (2014): identify important hyperparameters with a functional ANOVA approach
      • Fawcett and Hoos (2016): ablation analysis, which finds the parameters that contribute most to the performance difference between two configurations
      • Biedenkapp et al. (2017): speed up ablation analysis by using a surrogate
      • van Rijn and Hutter (2017a, b): large-scale analysis of hyperparameter importance across datasets using functional ANOVA
  • 22. Low Discrepancy Sequence: Bergstra et al. (2012) proposed using a Sobol sequence or Latin Hypercube Sampling (LHS) in place of uniform random sampling and found the Sobol sequence promising in computational experiments; Dodge et al. (2017) proposed using k-DPP. [Figure panels: Uniform, Sobol, LHS]
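Of the designs named on slide 22, Latin Hypercube Sampling is the simplest to write down: each axis is cut into n equal strata and every stratum receives exactly one point. The sketch below is a plain-Python illustration on the unit cube; the function name `latin_hypercube` is an assumption, and a Sobol sequence would normally come from a numerical library rather than be hand-written.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims, one per stratum on every axis."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One point inside each of the n equal-width strata, then shuffle
        # so that strata are paired at random across dimensions.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


samples = latin_hypercube(8, 2)
for point in samples:
    print(point)
```

To use the points as hyperparameter candidates, each coordinate is rescaled from [0, 1) to the range of the corresponding parameter.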
  • 23. Nelder-Mead method (Nelder and Mead 1965): optimizes by iteratively transforming a simplex; it is the default method of R's optim function. [Figure: simplices in one, two and three dimensions]
  • 24. Nelder-Mead method (Nelder and Mead 1965): the vertices are ordered so that $f(\lambda^0) \le f(\lambda^1) \le f(\lambda^2)$. [Figure: simplex $\lambda^0, \lambda^1, \lambda^2$ with centroid $\lambda^c$ and candidate points $\lambda^r, \lambda^e, \lambda^{oc}, \lambda^{ic}$]
  • 25. Nelder-Mead method. Reflect: $\lambda^r = \lambda^c + \delta^r (\lambda^c - \lambda^n)$ where $\lambda^c = \sum_{i=0}^{n-1} \lambda^i / n$
  • 26. Nelder-Mead method. Expand: $\lambda^e = \lambda^c + \delta^e (\lambda^c - \lambda^n)$
  • 27. Nelder-Mead method. Outside contract: $\lambda^{oc} = \lambda^c + \delta^{oc} (\lambda^c - \lambda^n)$
  • 28. Nelder-Mead method. Inside contract: $\lambda^{ic} = \lambda^c + \delta^{ic} (\lambda^c - \lambda^n)$
  • 29. Nelder-Mead method. Shrink: $\{\lambda^0 + \gamma^s (\lambda^i - \lambda^0) : i = 0, \dots, n\}$ [Figure: shrunken vertices $\lambda^{1s}, \lambda^{2s}$]
  • 30. Nelder-Mead method, worked example: $f(\lambda^0) \le f(\lambda^1) \le f(\lambda^2)$ [Figure: initial simplex $\lambda^0, \lambda^1, \lambda^2$]
  • 31. Nelder-Mead method, worked example: Reflect [Figure: $\lambda^r$ added to the simplex $\lambda^0, \lambda^1, \lambda^2$]
  • 32. Nelder-Mead method, worked example: $f(\lambda^r) < f(\lambda^0)$, so Expand [Figure: $\lambda^r$ and $\lambda^e$]
  • 33. Nelder-Mead method, worked example: $f(\lambda^r)$ and $f(\lambda^e)$ are compared [Figure: simplex $\lambda^0, \lambda^1, \lambda^e$ shown with the previous $\lambda^2$]
  • 34. Nelder-Mead method, worked example [Figure: simplex $\lambda^0, \lambda^1, \lambda^2$ with reflected point $\lambda^r$]
  • 35. Nelder-Mead method, worked example: $f(\lambda^1) \le f(\lambda^r) < f(\lambda^2)$, so Outside contract [Figure: $\lambda^{oc}$]
  • 36. Nelder-Mead method, worked example: $f(\lambda^{oc}) \le f(\lambda^2)$ [Figure: $\lambda^{oc}$ replaces $\lambda^2$]
  • 37. Nelder-Mead method, worked example [Figure: simplex $\lambda^0, \lambda^1, \lambda^2$ with $\lambda^r$ and $\lambda^e$]
  • 38. Nelder-Mead method, worked example [Figure: simplex $\lambda^0, \lambda^1, \lambda^2$]
  • 39. Nelder-Mead method, worked example [Figure: simplex $\lambda^0, \lambda^1, \lambda^2$]
  • 40. Nelder-Mead method, worked example: $f(\lambda^r) \ge f(\lambda^2)$, so Inside contract [Figure: $\lambda^r$ and $\lambda^{ic}$]
  • 41. Nelder-Mead method, worked example [Figure: Reflect, Contract and Shrink applied to the simplex $\lambda^0, \lambda^1, \lambda^2$]
  • 42. Nelder-Mead method, worked example [Figure: Reflect, Contract and Shrink applied to the simplex $\lambda^0, \lambda^1, \lambda^2$]
  • 43. Nelder-Mead method, worked example [Figure: simplex $\lambda^0, \lambda^1, \lambda^2$]
  • 44. Nelder-Mead method (Nelder and Mead 1965) [Figure: McCormick benchmark function]
  • 45. Nelder-Mead method: pros and cons
      Pros
      • Good at finding local optima
      Cons
      • Only partially parallelizable
      • Can get trapped in poor local optima
    For convergence properties, failure cases and improved variants, see Conn et al. (2009); Audet and Hare (2017).
  • 46. Nelder-Mead method: choice of coefficients, subject to $0 < \gamma^s < 1$ and $-1 < \delta^{ic} < 0 < \delta^{oc} < \delta^r < \delta^e$
      • Standard choice: $\gamma^s = \frac{1}{2}$, $\delta^{ic} = -\frac{1}{2}$, $\delta^{oc} = \frac{1}{2}$, $\delta^r = 1$ and $\delta^e = 2$
      • Adaptive coefficients (Gao and Han 2012): $\gamma^s = 1 - \frac{1}{n}$, $\delta^{ic} = -\frac{3}{4} + \frac{1}{2n}$, $\delta^{oc} = \frac{3}{4} - \frac{1}{2n}$, $\delta^r = 1$, $\delta^e = 1 + \frac{2}{n}$ where $n \ge 2$
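The reflect, expand, contract and shrink steps of slides 24 to 46 fit into one short routine. The sketch below uses the standard coefficients from slide 46 and the formula $\lambda^c + \delta (\lambda^c - \lambda^n)$ for every trial point. It is an illustration rather than the speaker's implementation: the function name `nelder_mead`, the fixed iteration count, and the exact acceptance tests for the contraction steps (a common textbook variant) are assumptions.

```python
import math


def nelder_mead(f, simplex, n_iter=200,
                delta_r=1.0, delta_e=2.0, delta_oc=0.5, delta_ic=-0.5, gamma_s=0.5):
    """Minimize f from a list of n+1 starting vertices (each a list of n floats)."""
    n = len(simplex) - 1
    pts = [[list(p), f(p)] for p in simplex]

    for _ in range(n_iter):
        pts.sort(key=lambda pv: pv[1])  # f(l^0) <= f(l^1) <= ... <= f(l^n)
        worst = pts[-1][0]
        # Centroid of the n best vertices (every vertex except the worst).
        centroid = [sum(pv[0][d] for pv in pts[:-1]) / n for d in range(n)]

        def move(delta):
            # Trial point l^c + delta * (l^c - l^n).
            return [c + delta * (c - w) for c, w in zip(centroid, worst)]

        f_best, f_second_worst, f_worst = pts[0][1], pts[-2][1], pts[-1][1]
        refl = move(delta_r)
        f_refl = f(refl)

        if f_refl < f_best:                       # reflection is the new best: try expanding
            expd = move(delta_e)
            f_expd = f(expd)
            pts[-1] = [expd, f_expd] if f_expd < f_refl else [refl, f_refl]
        elif f_refl < f_second_worst:             # plain reflection
            pts[-1] = [refl, f_refl]
        else:
            if f_refl < f_worst:                  # outside contraction
                cand = move(delta_oc)
                f_cand = f(cand)
                accept = f_cand <= f_refl
            else:                                 # inside contraction
                cand = move(delta_ic)
                f_cand = f(cand)
                accept = f_cand < f_worst
            if accept:
                pts[-1] = [cand, f_cand]
            else:                                 # shrink every vertex towards the best one
                best = pts[0][0]
                for pv in pts[1:]:
                    pv[0] = [b + gamma_s * (x - b) for b, x in zip(best, pv[0])]
                    pv[1] = f(pv[0])

    pts.sort(key=lambda pv: pv[1])
    return pts[0][0], pts[0][1]


def mccormick(p):
    """McCormick benchmark function (slide 44)."""
    x, y = p
    return math.sin(x + y) + (x - y) ** 2 - 1.5 * x + 2.5 * y + 1.0


def shifted_bowl(p):
    """Convex test function with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


start = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bowl_point, bowl_value = nelder_mead(shifted_bowl, start)
mc_point, mc_value = nelder_mead(mccormick, start)
print(bowl_point, bowl_value)
print(mc_point, mc_value)
```

Because the method is local, the point reached on a multimodal function such as McCormick depends on the starting simplex. Only the single trial evaluations inside one iteration and the shrink step offer parallelism, which matches the "partially parallelizable" remark on slide 45.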
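  • 47. Bayesian optimization: currently the hyperparameter optimization method attracting the most attention (the example shown is a maximization problem).
  • 48. Bayesian optimization
      • Sequential Model-based Optimization (SMBO)
          • General term for methods that iteratively alternate function evaluation with updating a surrogate (a model of the objective function)
          • Includes Bayesian optimization and trust-region methods (Ghanbari and Scheinberg 2017)
      • Bayesian optimization
          • General term for SMBO methods that build the surrogate in a Bayesian manner
          • Models $P(f_\epsilon(\lambda) \mid \lambda)$
      • Types of surrogate
          • Gaussian process (GP): the most standard choice; a well-known implementation is Spearmint (Snoek et al. 2012)
          • Random forest: SMAC (Hutter et al. 2011)
          • Tree-structured Parzen Estimator (TPE) (Bergstra et al. 2011): implemented in Hyperopt; models $P(\lambda \mid f_\epsilon(\lambda))$ and $P(f_\epsilon(\lambda))$
          • DNN (Snoek et al. 2015)
  • 49. Bayesian optimization: the approach based on Gaussian process regression
      • Gaussian distribution: a distribution over scalars and vectors
      • Gaussian process: a distribution over functions
    [Figure: samples from a Gaussian process (Bishop, 2006)]
  • 50. Bayesian optimization: the approach based on Gaussian process regression
      • Assume that the objective function follows a GP characterized by a mean function $m$ and a covariance function $k$: $f_\epsilon(\lambda) \sim \mathcal{GP}(m(\lambda), k(\lambda, \lambda'))$
      • The standard choice of prior mean function is $m(\lambda) = 0$
  • 51. Bayesian optimization: covariance function (kernel)
      • The kernel characterizes the shape of the model
      • It can be thought of as an abstraction of the closeness between two points
      • With a suitable kernel, categorical and conditional parameters can also be handled
    [Figure: Exponentiated Quadratic and Matérn 5/2 kernels; Kernels / Covariance functions (PyMC3)]
  • 52. Bayesian optimization: choice of covariance function (kernel) (Snoek et al. 2012)
      • ARD squared exponential kernel: $k_{se}(\lambda, \lambda') = \theta_0 \exp\left(-\frac{1}{2} r^2(\lambda, \lambda')\right)$, $r^2(\lambda, \lambda') = \sum_{d=1}^{D} (\lambda_d - \lambda'_d)^2 / \theta_d^2$
      • ARD Matérn 5/2 kernel: $k_{52}(\lambda, \lambda') = \theta_0 \left(1 + \sqrt{5 r^2(\lambda, \lambda')} + \frac{5}{3} r^2(\lambda, \lambda')\right) \exp\left(-\sqrt{5 r^2(\lambda, \lambda')}\right)$
      • The kernel hyperparameters are determined dynamically from the data
          • Empirical Bayes (Bishop 2006)
          • Markov Chain Monte Carlo (MCMC) (Snoek et al. 2012)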
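The two ARD kernels on slide 52 translate directly into code. In this sketch `theta0` is the amplitude $\theta_0$ and `lengthscales` holds $\theta_1, \dots, \theta_D$; the function names are assumptions.

```python
import math


def scaled_sq_dist(a, b, lengthscales):
    """r^2(a, b) = sum_d (a_d - b_d)^2 / theta_d^2."""
    return sum(((x - y) / l) ** 2 for x, y, l in zip(a, b, lengthscales))


def ard_squared_exponential(a, b, theta0, lengthscales):
    return theta0 * math.exp(-0.5 * scaled_sq_dist(a, b, lengthscales))


def ard_matern52(a, b, theta0, lengthscales):
    r2 = scaled_sq_dist(a, b, lengthscales)
    root = math.sqrt(5.0 * r2)
    return theta0 * (1.0 + root + 5.0 / 3.0 * r2) * math.exp(-root)


origin, unit = [0.0, 0.0], [1.0, 0.0]
k_se = ard_squared_exponential(origin, unit, theta0=1.0, lengthscales=[1.0, 1.0])
k_52 = ard_matern52(origin, unit, theta0=1.0, lengthscales=[1.0, 1.0])
# A very large length scale on the second axis makes that parameter irrelevant.
k_irrelevant = ard_matern52(origin, [0.0, 5.0], theta0=1.0, lengthscales=[1.0, 1e6])
print(k_se, k_52, k_irrelevant)
```

The last line shows why ARD (automatic relevance determination) fits the low effective dimensionality seen earlier: a parameter whose fitted length scale is large barely changes the covariance.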
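  • 53. Bayesian optimization: PRML Chapter 6, effect of the kernel hyperparameters (Bishop 2006). $k(\lambda, \lambda') = \theta_0 \exp\left(-\frac{\theta_1}{2} \lVert \lambda - \lambda' \rVert^2\right) + \theta_2 + \theta_3 \lambda^\top \lambda'$ [Figure: GP prior samples for $(\theta_0, \theta_1, \theta_2, \theta_3)$ = (1.00, 4.00, 0.00, 0.00), (9.00, 4.00, 0.00, 0.00), (1.00, 64.00, 0.00, 0.00), (1.00, 0.25, 0.00, 0.00), (1.00, 4.00, 10.00, 0.00) and (1.00, 4.00, 0.00, 5.00)]
  • 54. Bayesian optimization: once $m$ and $k$ are fixed, the function value at an unobserved point can be predicted from past observations. The result follows from properties of the Gaussian distribution and the Schur complement formula (Rasmussen and Williams 2005; Bishop 2006):
      $P(f_\epsilon(\lambda^{t+1}) \mid \lambda^1, \lambda^2, \dots, \lambda^{t+1}) = \mathcal{N}(\mu_t(\lambda^{t+1}), \sigma_t^2(\lambda^{t+1}) + \sigma_n^2)$,
      $\mu_t(\lambda^{t+1}) = \mathbf{k}^\top [K + \sigma_n^2 I]^{-1} [f(\lambda^1)\ f(\lambda^2)\ \cdots\ f(\lambda^t)]^\top$,
      $\sigma_t^2(\lambda^{t+1}) = k(\lambda^{t+1}, \lambda^{t+1}) - \mathbf{k}^\top [K + \sigma_n^2 I]^{-1} \mathbf{k}$,
      where $\mathbf{k} = [k(\lambda^{t+1}, \lambda^1)\ k(\lambda^{t+1}, \lambda^2)\ \cdots\ k(\lambda^{t+1}, \lambda^t)]^\top$ and $K = \begin{bmatrix} k(\lambda^1, \lambda^1) & \cdots & k(\lambda^1, \lambda^t) \\ \vdots & \ddots & \vdots \\ k(\lambda^t, \lambda^1) & \cdots & k(\lambda^t, \lambda^t) \end{bmatrix}$.
    Without data no meaningful prediction is possible, so the model is initialized with data collected by random search or a similar method.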
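The predictive mean and variance on slide 54 can be evaluated with nothing more than a small linear solver. The sketch below is one-dimensional, uses a zero prior mean and a squared exponential kernel, and solves $[K + \sigma_n^2 I] x = b$ by Gaussian elimination instead of forming the inverse. The names `se_kernel`, `solve` and `gp_posterior`, the default noise variance, and the toy data are assumptions.

```python
import math


def se_kernel(a, b, theta0=1.0, lengthscale=1.0):
    return theta0 * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise_var=1e-6, kernel=se_kernel):
    """Return (mu_t, sigma_t^2) at x_new; add noise_var for a noisy observation."""
    K = [[kernel(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_vec = [kernel(x_new, a) for a in xs]
    alpha = solve(K, ys)     # [K + sigma_n^2 I]^{-1} [f(l^1) ... f(l^t)]^T
    beta = solve(K, k_vec)   # [K + sigma_n^2 I]^{-1} k
    mean = sum(k * a for k, a in zip(k_vec, alpha))
    var = kernel(x_new, x_new) - sum(k * b for k, b in zip(k_vec, beta))
    return mean, max(var, 0.0)


xs, ys = [0.0, 1.0, 2.0], [1.0, 0.0, 1.0]
near_mean, near_var = gp_posterior(xs, ys, 1.0)   # at an observed point
far_mean, far_var = gp_posterior(xs, ys, 10.0)    # far from all observations
print(near_mean, near_var)
print(far_mean, far_var)
```

At an observed point the variance collapses towards the noise level, and far from the data the prediction falls back to the prior (mean 0, variance $\theta_0$). The two linear solves are the cubic-cost step in the number of observations.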
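  • 55. Bayesian optimization: the variance is small near observed points and grows with distance from them (the prediction becomes uncertain). [Figure from Brochu et al. (2010)]
  • 56. Bayesian optimization: how to choose the next point to evaluate
      • The next point is the one that maximizes a criterion called the acquisition function
      • The acquisition function governs the trade-off between exploration and exploitation
          • Evaluate points where the surrogate variance is large (exploration)
          • Evaluate points where the surrogate mean is small (exploitation)
      • Example: GP-Upper Confidence Bound (GP-UCB) (Srinivas 2012): $a_{UCB}(\lambda) = -\mu(\lambda) + \xi \sigma(\lambda)$. The mean enters as $-\mu(\lambda)$ because the problem being solved is loss minimization.
      • Many alternatives exist, such as Probability of Improvement (PI), Expected Improvement (EI) and Predictive Entropy Search (PES), and the choice strongly affects search performance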
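The GP-UCB rule $a_{UCB}(\lambda) = -\mu(\lambda) + \xi \sigma(\lambda)$ from slide 56 is easy to try on a handful of candidates. In this sketch the surrogate means and standard deviations are made-up numbers, and maximizing over a finite candidate list stands in for a proper optimizer of the acquisition function; the function names are assumptions.

```python
def ucb_acquisition(mean, std, xi=2.0):
    """a_UCB = -mu + xi * sigma (minus sign because the loss is minimized)."""
    return -mean + xi * std


def select_next(candidates, means, stds, xi=2.0):
    """Return the candidate with the largest acquisition value and that value."""
    scores = [ucb_acquisition(m, s, xi) for m, s in zip(means, stds)]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


candidates = [0.1, 0.5, 0.9]
means = [0.30, 0.25, 0.40]   # hypothetical surrogate means (predicted loss)
stds = [0.01, 0.02, 0.20]    # hypothetical surrogate standard deviations

exploit_choice, _ = select_next(candidates, means, stds, xi=0.0)
explore_choice, _ = select_next(candidates, means, stds, xi=2.0)
print(exploit_choice, explore_choice)
```

With $\xi = 0$ the rule picks the lowest predicted loss (pure exploitation); with $\xi = 2$ the uncertain candidate wins even though its predicted loss is the highest.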
  • 57. Bayesian optimization: pros and cons
      Pros
      • Global search that takes the exploration-exploitation trade-off into account
      • Search that takes observation noise into account
      Cons
      • Sensitive to the covariance function and the acquisition function
      • Optimizing the acquisition function is itself a non-convex global optimization problem
      • With Gaussian process regression, the computational cost is cubic in the number of observations
      • Hard to parallelize
  • 58. Reducing the computational cost of the surrogate: recent research trends
      • Bottleneck of Gaussian process regression: $[K + \sigma_n^2 I]^{-1}$
      • Approximate computation (Quiñonero-Candela et al. 2007; Titsias 2009)
      • Surrogates with relatively low computational cost
          • Random forest (Hutter et al. 2011)
          • DNN (Snoek et al. 2015)
  • 59. Parallelizing Bayesian optimization: recent research trends
      • Shah and Ghahramani (2015): Parallel Predictive Entropy Search
      • Gonzalez et al. (2016): Local Penalization
      • Kathuria et al. (2016): DPP sampling
      • Kandasamy et al. (2018): asynchronous parallel Thompson sampling
      • Many others: Bergstra et al. (2011); Snoek et al. (2012); Contal et al. (2013); Desautels et al. (2014); Daxberger and Low (2017); Wang et al. (2017, 2018a); Rubin (2018)
  • 60. Bayesian optimization (shown again): the example is a maximization problem.
  • 61. Other methods: the main ones with reported applications
      • CMA-ES: Watanabe and Le Roux (2014); Loshchilov and Hutter (2016)
      • Particle Swarm Optimization (PSO): Meissner et al. (2006); Lin et al. (2009); Lorenzo et al. (2017); Ye (2017)
      • Genetic Algorithm (GA): Leung et al. (2003); Young et al. (2015)
      • Differential Evolution (DE): Fu et al. (2016a,b)
      • Reinforcement learning: Hansen (2016); Bello et al. (2017); Dong et al. (2018)
      • Gradient-based methods (note: not black-box optimization; continuous parameters only): Maclaurin et al. (2015); Luketina et al. (2016); Pedregosa (2016); Franceschi (2017a,b,c, 2018a,b)
  • 62. Auxiliary techniques
  • 63. Early termination: predict the learning curve as a function of the number of epochs and stop training runs that have no prospect of reaching good performance
      • Domhan et al. (2015): model the learning curve as a weighted linear combination of 11 basis functions: $f_{comb} = \sum_{i=1}^{k} w_i f_i(\lambda \mid \theta_i) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$, $\sum_{i=1}^{k} w_i = 1$, $\forall w_i,\ w_i \ge 0$
      • Use Bayesian neural networks (Klein et al. 2016)
      • Make use of past data (Chandrashekaran and Lane 2017)
  • 64. Increasing Image Sizes (IIS) (Hinz et al. 2018): start hyperparameter optimization on low-resolution images and raise the resolution step by step
      • After optimizing hyperparameters at different resolutions, the important parameters are analyzed with functional ANOVA
      • Many important parameters and their values are the same regardless of resolution (e.g. learning rate, batch size)
      • Parameters affected by resolution include the number of convolutional layers directly followed by max-pooling (because pooling reduces the resolution) -> suitable initial values at a higher resolution are inferred from the lower-resolution case
      • Running 750 evaluations at 32×32, 500 at 64×64 and 250 at 128×128 loses no accuracy and finishes sooner than 1500 evaluations at 128×128
  • 65. Hyperband (Li et al. 2016): adaptively allocate resources (e.g. training time, amount of training data)
      • Successive Halving (Jamieson and Talwalkar 2015)
          • Evaluate several candidate hyperparameter configurations
          • Discard the lower-ranked candidates, reallocate more resources to the top candidates, and continue evaluating
      • Open issue
          • With n candidates and total resources B, the right trade-off between n and B/n is not obvious
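The Successive Halving loop described on slide 65 can be sketched as follows. This is a simplified variant under stated assumptions: the per-candidate resource is multiplied by `eta` each round instead of being derived from a fixed total budget B, and `evaluate(config, resource)` is a hypothetical callback that returns a loss after training with the given resource.

```python
def successive_halving(configs, evaluate, min_resource=1, eta=2):
    """Repeatedly keep the best 1/eta of the candidates and give them eta times more resource."""
    survivors = list(configs)
    resource = min_resource
    history = []  # (number of candidates, resource per candidate) for each round
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, resource))
        history.append((len(survivors), resource))
        keep = max(1, len(survivors) // eta)
        survivors = ranked[:keep]   # discard the lower-ranked candidates
        resource *= eta             # reallocate more resource to the rest
    return survivors[0], history


def toy_loss(config, resource):
    # Hypothetical loss: better near 0.3, and lower with more training resource.
    return (config - 0.3) ** 2 + 1.0 / resource


configs = [i / 16 for i in range(16)]
winner, history = successive_halving(configs, toy_loss)
print(winner, history)
```

Starting with many candidates and little resource each is one end of the n versus B/n trade-off; starting with few candidates and much resource each is the other, and Hyperband runs several such settings.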
  • 66. Hyperband (Li et al. 2016)
      • Proposed method: try several trade-offs between n and B/n, in the manner of a grid search
      • Can be combined with random search or Bayesian optimization (Bertrand et al. 2017; Falkner et al. 2018; Wang et al. 2018)
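To make "trying several trade-offs between n and B/n" concrete, the sketch below enumerates Hyperband-style brackets: each bracket is one Successive Halving run that starts with a different number of candidates n and per-candidate resource r. The rounding used here (ceiling for n, floor inside a bracket) is one common reading of the algorithm; published implementations differ in these details, so the intermediate brackets should be taken as illustrative.

```python
def hyperband_brackets(max_resource, eta=3):
    """Return, per bracket, the list of (num_configs, resource_per_config) rounds."""
    # Largest s with eta**s <= max_resource, computed with integers only.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        total = (s_max + 1) * eta ** s
        n = (total + s) // (s + 1)          # integer ceil(total / (s + 1))
        r = max_resource / eta ** s         # initial resource per configuration
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets

for rounds in hyperband_brackets(81, eta=3):
    print(rounds)
```

The first bracket is the most aggressive (many candidates, little resource each); the last one is plain evaluation of a few candidates at full resource.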
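  • 67. Meta-learning and warm starts: recent research trends
      • Hypothesis: hyperparameter optimization results for similar datasets are similar
          • e.g. retraining a model because more training data has become available
      • Meta-features
          • Hand-crafted
              • Simple features (e.g. number of samples, number of dimensions, number of classes)
              • Features based on statistics or information theory (e.g. skewness of a distribution)
              • Landmark features (performance of simple machine learning models such as decision trees)
          • Deep learning (Kim et al. 2017a,b)
      • Warm-start a method by initializing it with hyperparameter optimization results from similar datasets
          • PSO (Gomes et al. 2012)
          • GA (Reif et al. 2012)
          • Bayesian optimization (Bardenet et al. 2013; Yogatama and Mann 2014; Feurer et al. 2014, 2015, 2018; Kim et al. 2017a,b)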
  • 68. Handling noise: recent research trends
      • Sampling (Arnold and Beyer 2006)
          • Evaluate a configuration n times and take the mean
      • Threshold Selection Equipped with Re-evaluation (Markon et al. 2001; Beielstein and Markon 2002; Jin and Branke 2005; Goh and Tan 2007; Gießen and Kötzing 2016)
          • Apply sampling when the objective value improves on the best value by at least a threshold
      • Value Suppression (Wang et al. 2018b)
          • When the best-k configurations have not been updated for a certain period, re-sample the best-k configurations and correct their function values
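The first two ideas can be sketched as small wrappers around a noisy objective. This is one possible reading of the slide, not code from the cited papers; the function names and the exact acceptance rule (re-evaluate only if a single evaluation beats the best value by at least `threshold`) are assumptions.

```python
import random
import statistics

def averaged_objective(objective, n_samples):
    """Sampling: evaluate a configuration n_samples times and return the mean."""
    def wrapped(config):
        return statistics.mean(objective(config) for _ in range(n_samples))
    return wrapped

def evaluate_with_threshold(objective, config, best_value, threshold, n_samples):
    """Threshold selection with re-evaluation (minimization).
    Returns (estimated value, whether the configuration was re-sampled)."""
    first = objective(config)
    if best_value - first < threshold:
        return first, False   # not a large enough improvement: no extra evaluations
    values = [first] + [objective(config) for _ in range(n_samples - 1)]
    return statistics.mean(values), True

rng = random.Random(0)
noisy = lambda c: (c - 1.0) ** 2 + rng.gauss(0.0, 0.1)
smooth = averaged_objective(noisy, 400)
```

Averaging n evaluations reduces the standard deviation of the noise by a factor of sqrt(n) at n times the cost, which is why the threshold variant spends the extra evaluations only on promising configurations.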
  • 69. Computational experiments
  • 70. Hyperparameter optimization of CNNs (Ozaki et al. 2017)
      The hyperparameters below are optimized with five methods.
      Dataset: MNIST (LeCun and Cortes, 2010)
      Networks: LeNet (LeCun et al. 1998); Batch-Normalized Maxout Network in Network (Chang and Chen 2015)
      Task: character recognition (10-class classification)

      LeNet (integer parameters are marked with ∗):
      | Name | Description | Range |
      |------|-------------|-------|
      | x1 | Learning rate (= 0.1^x1) | [1, 4] |
      | x2 | Momentum (= 1 − 0.1^x2) | [0.5, 2] |
      | x3 | L2 weight decay | [0.001, 0.01] |
      | x4∗ | FC1 units | [256, 1024] |

      Batch-Normalized Maxout Network in Network (MMLP: Maxout Multi Layer Perceptron):
      | Name | Description | Range |
      |------|-------------|-------|
      | x1 | Learning rate (= 0.1^x1) | [0.5, 2] |
      | x2 | Momentum (= 1 − 0.1^x2) | [0.5, 2] |
      | x3 | L2 weight decay | [0.001, 0.01] |
      | x4 | Dropout 1 | [0.4, 0.6] |
      | x5 | Dropout 2 | [0.4, 0.6] |
      | x6 | Conv 1 initialization deviation | [0.01, 0.05] |
      | x7 | Conv 2 initialization deviation | [0.01, 0.05] |
      | x8 | Conv 3 initialization deviation | [0.01, 0.05] |
      | x9 | MMLP 1-1 initialization deviation | [0.01, 0.05] |
      | x10 | MMLP 1-2 initialization deviation | [0.01, 0.05] |
      | x11 | MMLP 2-1 initialization deviation | [0.01, 0.05] |
      | x12 | MMLP 2-2 initialization deviation | [0.01, 0.05] |
      | x13 | MMLP 3-1 initialization deviation | [0.01, 0.05] |
      | x14 | MMLP 3-2 initialization deviation | [0.01, 0.05] |
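The LeNet table searches over transformed variables rather than raw values: the learning rate is 0.1 raised to x1 and the momentum is 1 minus 0.1 raised to x2, so both are explored on a logarithmic scale. A small decoding sketch follows; the function name, the dictionary keys, and the rounding of the integer parameter are assumptions made for illustration.

```python
def decode_lenet(x):
    """Map a search-space point (x1, x2, x3, x4) to actual LeNet hyperparameters."""
    x1, x2, x3, x4 = x
    return {
        "learning_rate": 0.1 ** x1,      # x1 in [1, 4]  -> 1e-1 ... 1e-4
        "momentum": 1.0 - 0.1 ** x2,     # x2 in [0.5, 2] -> about 0.68 ... 0.99
        "weight_decay": x3,              # used as-is
        "fc1_units": int(round(x4)),     # integer parameter: rounded (assumption)
    }

params = decode_lenet([2.0, 1.0, 0.005, 511.6])
```

For the point above this gives a learning rate of 0.01, a momentum of 0.9, and 512 units in FC1.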
  • 71. Hyperparameter optimization of CNNs (Ozaki et al. 2017): character recognition (LeNet) results
      [Figure: Mean loss of all executions for each method per iteration (LeNet)]
  • 72. Hyperparameter optimization of CNNs (Ozaki et al. 2017): character recognition (LeNet) results
      | Method | Mean loss | Min loss |
      |--------|-----------|----------|
      | Random search | 0.005411 (±0.001413) | 0.002781 |
      | Bayesian optimization | 0.004217 (±0.002242) | 0.000089 |
      | CMA-ES | 0.000926 (±0.001420) | 0.000047 |
      | Coordinate-search method | 0.000052 (±0.000094) | 0.000002 |
      | Nelder-Mead method | 0.000029 (±0.000029) | 0.000004 |

      | Method | Mean accuracy (%) | Accuracy with min loss (%) |
      |--------|-------------------|----------------------------|
      | Random search | 98.98 (±0.08) | 99.06 |
      | Bayesian optimization | 99.07 (±0.02) | 99.25 |
      | CMA-ES | 99.20 (±0.08) | 99.30 |
      | Coordinate-search method | 99.26 (±0.05) | 99.35 |
      | Nelder-Mead method | 99.24 (±0.04) | 99.28 |
  • 73. Hyperparameter optimization of CNNs (Ozaki et al. 2017): character recognition (Batch-Normalized Maxout Network in Network) results
      [Figure: Mean loss of all executions for each method per iteration (Batch-Normalized Maxout Network in Network)]
  • 74. Hyperparameter optimization of CNNs (Ozaki et al. 2017): character recognition (Batch-Normalized Maxout Network in Network) results
      | Method | Mean loss | Min loss |
      |--------|-----------|----------|
      | Random search | 0.045438 (±0.002142) | 0.042694 |
      | Bayesian optimization | 0.045636 (±0.001197) | 0.044447 |
      | CMA-ES | 0.045248 (±0.002537) | 0.042250 |
      | Coordinate-search method | 0.045131 (±0.001088) | 0.043639 |
      | Nelder-Mead method | 0.044549 (±0.001079) | 0.043238 |

      | Method | Mean accuracy (%) | Accuracy with min loss (%) |
      |--------|-------------------|----------------------------|
      | Random search | 99.56 (±0.02) | 99.58 |
      | Bayesian optimization | 99.47 (±0.05) | 99.59 |
      | CMA-ES | 99.49 (±0.14) | 99.59 |
      | Coordinate-search method | 99.48 (±0.04) | 99.53 |
      | Nelder-Mead method | 99.53 (±0.00) | 99.54 |
  • 75. Hyperparameter optimization of CNNs (Ozaki et al. 2017)
      Dataset: Adience benchmark (Eran et al. 2014)
      Network: Gil and Tal (2015)
      Tasks: (1) gender estimation (2-class classification); (2) age-group estimation (8-class classification)
      Integer parameters are marked with ∗.
      | Name | Description | Range |
      |------|-------------|-------|
      | x1 | Learning rate (= 0.1^x1) | [1, 4] |
      | x2 | Momentum (= 1 − 0.1^x2) | [0.5, 2] |
      | x3 | L2 weight decay | [0.001, 0.01] |
      | x4 | Dropout 1 | [0.4, 0.6] |
      | x5 | Dropout 2 | [0.4, 0.6] |
      | x6∗ | FC 1 units | [512, 1024] |
      | x7∗ | FC 2 units | [256, 512] |
      | x8 | Conv 1 initialization deviation | [0.01, 0.05] |
      | x9 | Conv 2 initialization deviation | [0.01, 0.05] |
      | x10 | Conv 3 initialization deviation | [0.01, 0.05] |
      | x11 | FC 1 initialization deviation | [0.001, 0.01] |
      | x12 | FC 2 initialization deviation | [0.001, 0.01] |
      | x13 | FC 3 initialization deviation | [0.001, 0.01] |
      | x14 | Conv 1 bias | [0, 1] |
      | x15 | Conv 2 bias | [0, 1] |
      | x16 | Conv 3 bias | [0, 1] |
      | x17 | FC 1 bias | [0, 1] |
      | x18 | FC 2 bias | [0, 1] |
      | x19∗ | Normalization 1 localsize (= 2·x19 + 3) | [0, 2] |
      | x20∗ | Normalization 2 localsize (= 2·x20 + 3) | [0, 2] |
      | x21 | Normalization 1 alpha | [0.0001, 0.0002] |
      | x22 | Normalization 2 alpha | [0.0001, 0.0002] |
      | x23 | Normalization 1 beta | [0.5, 0.95] |
      | x24 | Normalization 2 beta | [0.5, 0.95] |
  • 76. Hyperparameter optimization of CNNs (Ozaki et al. 2017): gender estimation results
      [Figure: Mean loss of all executions for each method per iteration (gender classification CNN)]
  • 77. Hyperparameter optimization of CNNs (Ozaki et al. 2017): gender estimation results
      | Method | Mean loss | Min loss |
      |--------|-----------|----------|
      | Random search | 0.001732 (±0.000540) | 0.000984 |
      | Bayesian optimization | 0.00183 (±0.000547) | 0.001097 |
      | CMA-ES | 0.001804 (±0.000480) | 0.001249 |
      | Coordinate-search method | 0.002240 (±0.001448) | 0.000378 |
      | Nelder-Mead method | 0.000395 (±0.000129) | 0.000245 |

      | Method | Mean accuracy (%) | Accuracy with min loss (%) |
      |--------|-------------------|----------------------------|
      | Random search | 87.93 (±0.24) | 88.21 |
      | Bayesian optimization | 88.07 (±0.27) | 87.85 |
      | CMA-ES | 88.20 (±0.38) | 88.55 |
      | Coordinate-search method | 87.04 (±0.52) | 87.72 |
      | Nelder-Mead method | 88.38 (±0.47) | 88.83 |
  • 78. Hyperparameter optimization of CNNs (Ozaki et al. 2017): age-group estimation results
      [Figure: Mean loss of all executions for each method per iteration (age classification CNN)]
  • 79. Hyperparameter optimization of CNNs (Ozaki et al. 2017): age-group estimation results
      | Method | Mean loss | Min loss |
      |--------|-----------|----------|
      | Random search | 0.035694 (±0.006958) | 0.026563 |
      | Bayesian optimization | 0.024792 (±0.003076) | 0.020466 |
      | CMA-ES | 0.031244 (±0.010834) | 0.016952 |
      | Coordinate-search method | 0.032244 (±0.006109) | 0.024637 |
      | Nelder-Mead method | 0.015492 (±0.002276) | 0.013556 |

      | Method | Mean accuracy (%) | Accuracy with min loss (%) |
      |--------|-------------------|----------------------------|
      | Random search | 57.18 (±0.96) | 57.90 |
      | Bayesian optimization | 56.28 (±1.68) | 57.19 |
      | CMA-ES | 57.17 (±0.80) | 58.19 |
      | Coordinate-search method | 55.06 (±2.31) | 56.98 |
      | Nelder-Mead method | 56.72 (±0.50) | 57.42 |
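  • 80. Hyperparameter optimization of CNNs (Ozaki et al. 2017)
      Why did the local search methods give good results?
      Hypothesis: does the objective function have many good-quality local optima?
      -> The results support this (Nelder-Mead (NM) converged to different local optima, yet performance was good in each case)
      [Figure: Parallel coordinates plot of the optimized hyperparameters of the gender classification CNN]
      • Replication study by Olof (2018)
          • NM does work well for CNNs; for RNNs the results are less convincing
          • On average, TPE was best for both CNNs and RNNs (among Bayesian optimization methods, the GP-based variant performed very poorly)
          • The best result found over all experiments came from NM, for both CNNs and RNNs
          • Points out that a property of the loss function shared by CNNs does not hold for RNNs
      • Snoek et al. (2012) report that, in their experiments, GP-based Bayesian optimization outperformed TPE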
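  • 81. Computational experiments: open issues
      • Essentially every paper concludes that its proposed method is the best
          • Assume that the proposed method has been tuned carefully
      • Reproducibility
          • Implementation of the method (public source code), randomness, and tuning
      • Insufficient computing resources at hand
          • Tabular datasets that record model evaluation results (Klein et al. 2018)
      • Experimental settings vary from paper to paper
          • HPOLib (Eggensperger et al. 2013)
      • How to compare methods
          • Criteria (e.g. accuracy, AUC) and ranking procedures (Dewancker et al. 2016)
      • Overfitting to the validation data
          • In practice, split the dataset into training / validation / test sets and check that the tuned performance does not diverge too much on the test set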
  • 82. Conclusion
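  • 83. Conclusion
      • Move beyond grid search
          • Use other methods, starting with random search
          • Weigh the advantages and drawbacks according to the situation
          • Consult papers whose experimental setting is close to your own
      • Research topics (topics expected to become hot)
          • Optimization methods
          • Related methods (e.g. identifying important parameters, learning curve prediction)
          • Ensuring reproducibility and building benchmarks
          • Applications (AutoML, e.g. the CASH problem, model architecture search)
      CASH: Combined Algorithm Selection and Hyperparameter Optimization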
  • 84. Appendix
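  • 85. Coordinate search method
      Search that uses the maximal positive basis (Conn et al., 2009; Audet and Hare, 2017):
      $D_\oplus = \{\pm e_i : i = 1, 2, \ldots, n\}$
  • 86. Coordinate search method
      Inputs: $\lambda^0 \in \Lambda\ (\subset \mathbb{R}^n)$, $\delta^0 \in \mathbb{R}$ with $\delta^0 > 0$, $\epsilon \in [0, \infty)$
      [Figure: initial point $\lambda^0$]
  • 87. Coordinate search method
      Poll set: $P^k = \{\lambda^k + \delta^k d : d \in D_\oplus\}$; look for $\lambda \in P^k$ with $f(\lambda) < f(\lambda^k)$
      [Figure: $\lambda^0$ and an improving point $\lambda$]
  • 88. Coordinate search method
      Improving point found: $\lambda^{k+1} = \lambda$, $\delta^{k+1} = \delta^k$
      [Figure: $\lambda^0, \lambda^1$]
  • 89. Coordinate search method
      Poll set: $P^k = \{\lambda^k + \delta^k d : d \in D_\oplus\}$; look for $\lambda \in P^k$ with $f(\lambda) < f(\lambda^k)$
      [Figure: $\lambda^0, \lambda^1$]
  • 90. Coordinate search method
      Improving point found: $\lambda^{k+1} = \lambda$, $\delta^{k+1} = \delta^k$
      [Figure: $\lambda^0, \lambda^1, \lambda^2$]
  • 91. Coordinate search method
      Poll set: $P^k = \{\lambda^k + \delta^k d : d \in D_\oplus\}$; look for $\lambda \in P^k$ with $f(\lambda) < f(\lambda^k)$
      [Figure: $\lambda^0, \lambda^1, \lambda^2, \lambda^3$]
  • 92. Coordinate search method
      No improving point: $\lambda^{k+1} = \lambda^k$, $\delta^{k+1} = \delta^k / 2$
      [Figure: $\lambda^0, \lambda^1, \lambda^2, \lambda^3 = \lambda^4$]
  • 93. Coordinate search method
      No improving point: $\lambda^{k+1} = \lambda^k$, $\delta^{k+1} = \delta^k / 2$
      [Figure: $\lambda^0, \lambda^1, \lambda^2, \lambda^3 = \lambda^4 = \lambda^5$]
  • 94. Coordinate search method
      Stop when $\delta^{k+1} \le \epsilon$
      [Figure: $\lambda^0, \lambda^1, \lambda^2, \lambda^3 = \lambda^4 = \lambda^5, \lambda^6$]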
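The walkthrough on the preceding slides can be put together as a short routine. This is a minimal sketch under a few assumptions: the poll accepts the first improving point (opportunistic polling in a fixed order), the function and argument names are hypothetical, and `max_evals` is an extra safety limit not shown on the slides.

```python
def coordinate_search(f, x0, delta0=0.25, eps=1e-3, max_evals=10_000):
    """Minimize f by polling +/- delta along each coordinate axis."""
    x, fx = list(x0), f(list(x0))
    delta, evals = delta0, 1
    while delta > eps and evals < max_evals:
        improved = False
        # Poll set P^k = {x + delta * d : d in {+e_i, -e_i}}.
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                cand = list(x)
                cand[i] += sign * delta
                fc = f(cand)
                evals += 1
                if fc < fx:              # success: move, keep the step size
                    x, fx, improved = cand, fc, True
                    break
            if improved:
                break
        if not improved:                 # failure: stay, halve the step size
            delta /= 2.0
    return x, fx

# Toy quadratic with its minimum at (1.0, -0.5).
x_best, f_best = coordinate_search(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2, [0.0, 0.0])
```

On this toy problem the iterates walk along the first axis to 1.0, then along the second axis to -0.5, after which every poll fails and the step size is halved until it drops below `eps`.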
  • 95. Coordinate search method
      [Figure: McCormick benchmark function]
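  • 96. Coordinate search method: pros and cons
      • Able to find local optima
      • Only partially parallelizable
      • Searches iteratively along the coordinate axes, so it scales poorly with the number of dimensions
      • Performs no global search, so it risks getting trapped in a poor local optimum
      For convergence properties, failure cases, and improved variants, see Conn et al. (2009) and Audet and Hare (2017).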
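  • 97. Coordinate search method: normalizing the search space
      • If the scales of the hyperparameters differ too much, the search becomes inefficient
      • Prevent this by normalizing the search space to the unit hypercube beforehand
      • In practice, when a point corresponds to an invalid value, return a suitably large loss value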
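Normalization and the large-loss fallback can be combined in a thin wrapper around the objective. The sketch below is an illustration with assumed names; the penalty value `1e10` is an arbitrary choice, and "invalid" is interpreted here as "outside the unit hypercube".

```python
def make_normalized_objective(objective, bounds, penalty=1e10):
    """Wrap an objective so that the optimizer works on the unit hypercube."""
    def wrapped(u):
        # Points outside [0, 1]^n are invalid: return a large loss instead.
        if any(not 0.0 <= ui <= 1.0 for ui in u):
            return penalty
        # Rescale each coordinate from [0, 1] to its original range [lo, hi].
        x = [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, bounds)]
        return objective(x)
    return wrapped

# Two parameters on very different scales, e.g. an exponent and a unit count.
bounds = [(1.0, 4.0), (256.0, 1024.0)]
g = make_normalized_objective(lambda x: x[0] + x[1], bounds)
```

With this wrapper a single step size means the same relative move in every coordinate, regardless of the original ranges.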
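  • 98. Coordinate search method: initialization strategies
      • Choosing the initial point
          • Initialize at the center of the search range
          • Run a few iterations of random search and initialize at the best point found
      • An effective remedy for getting trapped in poor local optima
          • Multi-start from different initial points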
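  • 99. Coordinate search method: polling strategies (Audet and Hare 2017)
      • Opportunistic polling
          • Accept a point as soon as an improvement is found
          • Polling order: fixed order; completely random; or starting from the direction that improved last time
      • Complete polling (does not scale)
          • Evaluate all candidates at every iteration and select the best value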
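  • 100. Bayesian optimization: a kernel for handling categorical parameters
      • Weighted Hamming distance kernel (Hutter et al. 2011)
        $k_{\mathrm{mixed}}(\lambda, \lambda') = \exp(r_{\mathrm{cont}} + r_{\mathrm{cat}})$,
        $r_{\mathrm{cont}}(\lambda, \lambda') = \sum_{l \in \Lambda_{\mathrm{cont}}} \left(-\theta_l (\lambda_l - \lambda'_l)^2\right)$,
        $r_{\mathrm{cat}}(\lambda, \lambda') = \sum_{l \in \Lambda_{\mathrm{cat}}} -\theta_l \left(1 - \delta(\lambda_l, \lambda'_l)\right)$,
        where $\delta$ is the Kronecker delta function.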
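The kernel above translates almost directly into code. In this sketch the argument layout (index lists for continuous and categorical dimensions, one weight `theta[l]` per dimension) is an assumption made for the example.

```python
import math

def k_mixed(a, b, cont_idx, cat_idx, theta):
    """Weighted Hamming distance kernel for mixed continuous/categorical inputs."""
    # Continuous part: negative weighted squared differences.
    r_cont = sum(-theta[l] * (a[l] - b[l]) ** 2 for l in cont_idx)
    # Categorical part: 1 - Kronecker delta is 1 when the categories differ.
    r_cat = sum(-theta[l] * (0.0 if a[l] == b[l] else 1.0) for l in cat_idx)
    return math.exp(r_cont + r_cat)

a = [0.1, "relu"]
b = [0.3, "tanh"]
value = k_mixed(a, b, cont_idx=[0], cat_idx=[1], theta=[2.0, 0.5])
```

Identical configurations give a kernel value of 1, and every mismatching category multiplies the value by exp(-theta_l).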
  • 101. Bayesian optimization: kernels for handling conditional parameters
      • Conditional kernel (Lévesque et al. 2017)
        $k_c(\lambda, \lambda') = \begin{cases} k(\lambda, \lambda') & \text{if } \lambda_c = \lambda'_c \ \forall c \in C \\ 0 & \text{otherwise} \end{cases}$
        where $C$ is the set of indices of active conditional hyperparameters.
      • Another kernel for conditional parameters (Swersky et al. 2014)
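  • 102. Bayesian optimization: a concrete Gaussian process regression calculation
      Kernel: $k(\lambda, \lambda') = \exp\left(-\tfrac{1}{2} \lVert \lambda - \lambda' \rVert^2\right)$
      $$\mu_1(\lambda^2) = k(\lambda^2, \lambda^1) f(\lambda^1)$$
      $$\begin{aligned}
      \mu_2(\lambda^3)
      &= \begin{pmatrix} k(\lambda^3, \lambda^1) & k(\lambda^3, \lambda^2) \end{pmatrix}
         \begin{pmatrix} 1 & k(\lambda^1, \lambda^2) \\ k(\lambda^2, \lambda^1) & 1 \end{pmatrix}^{-1}
         \begin{pmatrix} f(\lambda^1) \\ f(\lambda^2) \end{pmatrix} \\
      &= \frac{1}{1 - k(\lambda^1, \lambda^2)^2}
         \begin{pmatrix} k(\lambda^3, \lambda^1) & k(\lambda^3, \lambda^2) \end{pmatrix}
         \begin{pmatrix} 1 & -k(\lambda^1, \lambda^2) \\ -k(\lambda^2, \lambda^1) & 1 \end{pmatrix}
         \begin{pmatrix} f(\lambda^1) \\ f(\lambda^2) \end{pmatrix} \\
      &= \frac{1}{1 - k(\lambda^1, \lambda^2)^2}
         \begin{pmatrix} k(\lambda^3, \lambda^1) - k(\lambda^2, \lambda^1) k(\lambda^3, \lambda^2) & k(\lambda^3, \lambda^2) - k(\lambda^2, \lambda^1) k(\lambda^3, \lambda^1) \end{pmatrix}
         \begin{pmatrix} f(\lambda^1) \\ f(\lambda^2) \end{pmatrix} \\
      &= \frac{\left(k(\lambda^3, \lambda^1) - k(\lambda^2, \lambda^1) k(\lambda^3, \lambda^2)\right) f(\lambda^1) + \left(k(\lambda^3, \lambda^2) - k(\lambda^2, \lambda^1) k(\lambda^3, \lambda^1)\right) f(\lambda^2)}{1 - k(\lambda^1, \lambda^2)^2}
      \end{aligned}$$
      [Figure: points $\lambda^1, \lambda^2, \lambda^3$ with the kernel values $k(\lambda^3, \lambda^1)$, $k(\lambda^2, \lambda^1)$, $k(\lambda^3, \lambda^2)$ and the function values $f(\lambda^1)$, $f(\lambda^3)$]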
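The closed-form posterior mean for two observations can be checked numerically. The sketch below assumes the setting implied by the slide (zero prior mean, no observation noise, the squared-exponential kernel with unit length scale, and two distinct observed points); the function names are hypothetical.

```python
import math

def rbf(a, b):
    """k(a, b) = exp(-||a - b||^2 / 2), so k(a, a) = 1."""
    return math.exp(-0.5 * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def posterior_mean_one_point(x, x1, f1):
    # mu_1(x) = k(x, x1) f(x1)
    return rbf(x, x1) * f1

def posterior_mean_two_points(x, x1, x2, f1, f2):
    # Closed form obtained by inverting the 2x2 kernel matrix by hand.
    k12 = rbf(x1, x2)
    kx1, kx2 = rbf(x, x1), rbf(x, x2)
    return ((kx1 - k12 * kx2) * f1 + (kx2 - k12 * kx1) * f2) / (1.0 - k12 ** 2)

x1, x2, f1, f2 = [0.0], [1.0], 1.0, -0.5
mid = posterior_mean_two_points([0.5], x1, x2, f1, f2)
```

A useful sanity check is that a noise-free Gaussian process interpolates its data: evaluating the posterior mean at an observed point returns the observed value.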
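  • 103. Bayesian optimization: more on acquisition functions
      • Probability of Improvement (PI) (Kushner 1964)
        $a_{\mathrm{PI}} = P\left(f(\lambda) \le f(\lambda^*) - \xi\right) = \Phi\left(\frac{f(\lambda^*) - \xi - \mu(\lambda)}{\sigma(\lambda)}\right)$
        Symbols: $\lambda^*$ (best configuration so far), $\Phi$ (standard normal cumulative distribution function), $\xi$ (required margin of improvement)
      • Expected Improvement (EI) (Mockus et al. 1978)
          • Takes the amount of improvement into account; widely used
      • Predictive Entropy Search (PES) (Hernández-Lobato et al. 2014)
          • Maximizes information gain
      [Figure: visualization of PI (Brochu et al. 2010). Note: the figure shows a maximization problem, so it differs slightly from the formula above.]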
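For a minimization problem, PI can be computed from the posterior mean and standard deviation with nothing more than the error function. This is a minimal sketch; the function names and the default margin `xi=0.01` are assumptions, and the zero-variance branch is an added guard not discussed on the slide.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    """P(f(x) <= f_best - xi) when f(x) ~ N(mu, sigma^2) (minimization)."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is either certain or impossible.
        return 1.0 if mu <= f_best - xi else 0.0
    return normal_cdf((f_best - xi - mu) / sigma)

# A candidate predicted two standard deviations below the current best.
pi = probability_of_improvement(mu=0.3, sigma=0.1, f_best=0.5, xi=0.0)
```

PI ignores how large the improvement is, which is the gap that Expected Improvement addresses by weighting each outcome by its improvement.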
  • 104. Bayesian optimization: methods for maximizing the acquisition function
      • Maximizing the acquisition function is itself a non-convex global optimization problem
      • Optimization methods
          • Brochu et al. (2010)
              • DIRECT (Jones et al. 1993)
          • Bergstra et al. (2011)
              • Estimation of Distribution (EDA) (Larrañaga and Lozano 2011)
              • Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen 2006)
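  • 105. Connections between Bayesian optimization and multi-armed bandits: recent research trends
      • Multi-armed bandits
          • Sequentially search for the best of several candidates
          • The problem of maximizing the cumulative reward from slot machines
      • Hyperparameter optimization can be viewed as a continuous / infinite-armed bandit problem or as best-arm identification
      • Bayesian optimization considers the average case
      • Bandit methods usually consider worst-case regret minimization
      • Related work
          • Srinivas et al. (2010, 2012); Bull (2011); Kandasamy et al. (2015, 2017), among others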
  • 106. References
  • 107. References
      Christopher M. Bishop. Pattern recognition and machine learning. Information science and statistics. Springer, New York, 2006. ISBN 978-0-387-31073-2.
      Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs], December 2014. URL http://arxiv.org/abs/1412.6980. arXiv: 1412.6980.
      He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
      Frank Hutter, Jörg Lücke, and Lars Schmidt-Thieme. Beyond Manual Tuning of Hyperparameters. KI - Künstliche Intelligenz, 29(4):329–337, November 2015. ISSN 0933-1875, 1610-1987. doi: 10.1007/s13218-015-0381-0. URL http://link.springer.com/10.1007/s13218-015-0381-0.
      Stefan Falkner, Aaron Klein, and Frank Hutter. Practical hyperparameter optimization for deep learning, 2018a. URL https://openreview.net/forum?id=HJMudFkDf.
      Jesse Dodge, Kevin Jamieson, and Noah A. Smith. Open Loop Hyperparameter Optimization and Determinantal Point Processes. arXiv:1706.01566 [cs, stat], June 2017. URL http://arxiv.org/abs/1706.01566. arXiv: 1706.01566.
      Jaak Simm. Survey of hyperparameter optimization in NIPS2014, 2015. URL https://github.com/jaak-s/nips2014-survey.
      Carl Staelin. Parameter selection for support vector machines. 2002. URL http://www.hpl.hp.com/techreports/2002/HPL-2002-354R1.html.
      James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13:281–305, February 2012. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=2188385.2188395.
      Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. An efficient approach for assessing hyperparameter importance. In Proceedings of the 31st International Conference on Machine Learning - Volume 32, ICML'14, pages I-754–I-762. JMLR.org, 2014. URL http://dl.acm.org/citation.cfm?id=3044805.3044891.
  • 108. References
      Chris Fawcett and Holger H. Hoos. Analysing differences between algorithm configurations through ablation. Journal of Heuristics, 22(4):431–458, Aug 2016. ISSN 1572-9397. doi: 10.1007/s10732-014-9275-9. URL https://doi.org/10.1007/s10732-014-9275-9.
      Andre Biedenkapp, Marius Lindauer, Katharina Eggensperger, Frank Hutter, Chris Fawcett, and Holger Hoos. Efficient parameter importance analysis via ablation with surrogates, 2017. URL https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14750.
      Jan N van Rijn and Frank Hutter. An empirical study of hyperparameter importance across datasets. In AutoML@PKDD/ECML, 2017a.
      Jan N van Rijn and Frank Hutter. Hyperparameter importance across datasets. arXiv preprint arXiv:1710.04725, 2017b.
      J. A. Nelder and R. Mead. A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313, January 1965. ISSN 0010-4620, 1460-2067. doi: 10.1093/comjnl/7.4.308. URL https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/7.4.308.
      Andrew R. Conn, Katya Scheinberg, and Luis N. Vicente. Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics, January 2009. ISBN 978-0-89871-668-9 978-0-89871-876-8. doi: 10.1137/1.9780898718768. URL http://epubs.siam.org/doi/book/10.1137/1.9780898718768.
      Charles Audet and Warren Hare. Derivative-Free and Blackbox Optimization. Springer Series in Operations Research and Financial Engineering. Springer International Publishing, Cham, 2017. ISBN 978-3-319-68912-8 978-3-319-68913-5. doi: 10.1007/978-3-319-68913-5. URL http://link.springer.com/10.1007/978-3-319-68913-5.
      Fuchang Gao and Lixing Han. Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Computational Optimization and Applications, 51(1):259–277, January 2012. ISSN 0926-6003, 1573-2894. doi: 10.1007/s10589-010-9329-3. URL http://link.springer.com/10.1007/s10589-010-9329-3.
      Hiva Ghanbari and Katya Scheinberg. Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm. arXiv:1703.06925 [cs], March 2017. URL http://arxiv.org/abs/1703.06925. arXiv: 1703.06925.
      Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pages 2951–2959, 2012.
  • 109. References
      Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential Model-Based Optimization for General Algorithm Configuration. In Carlos A. Coello Coello, editor, Learning and Intelligent Optimization, pages 507–523, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-25566-3.
      James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyperparameter optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS'11, pages 2546–2554, USA, 2011. Curran Associates Inc. ISBN 978-1-61839-599-3. URL http://dl.acm.org/citation.cfm?id=2986459.2986743.
      Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat Prabhat, and Ryan P. Adams. Scalable bayesian optimization using deep neural networks. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML'15, pages 2171–2180. JMLR.org, 2015. URL http://dl.acm.org/citation.cfm?id=3045118.3045349.
      Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005. ISBN 026218253X.
      Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. arXiv:1012.2599 [cs], December 2010. URL http://arxiv.org/abs/1012.2599. arXiv: 1012.2599.
      Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58:3250–3265, 2012.
      J. Quiñonero-Candela, CE. Rasmussen, and CKI. Williams. Approximation Methods for Gaussian Process Regression, pages 203–223. Neural Information Processing. MIT Press, Cambridge, MA, USA, September 2007.
      Michalis Titsias. Variational learning of inducing variables in sparse gaussian processes. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pages 567–574, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 16–18 Apr 2009. PMLR. URL http://proceedings.mlr.press/v5/titsias09a.html.
      Amar Shah and Zoubin Ghahramani. Parallel predictive entropy search for batch global optimization of expensive objective functions. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS'15, pages 3330–3338, Cambridge, MA, USA, 2015. MIT Press. URL http://dl.acm.org/citation.cfm?id=2969442.2969611.
      Javier Gonzalez, Zhenwen Dai, Philipp Hennig, and Neil Lawrence. Batch bayesian optimization via local penalization. In Arthur Gretton and Christian C. Robert, editors, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, volume 51 of Proceedings of Machine Learning Research, pages 648–657, Cadiz, Spain, 09–11 May 2016. PMLR. URL http://proceedings.mlr.press/v51/gonzalez16a.html.
  • 110. References
      Tarun Kathuria, Amit Deshpande, and Pushmeet Kohli. Batched Gaussian Process Bandit Optimization via Determinantal Point Processes. arXiv:1611.04088 [cs], November 2016. URL http://arxiv.org/abs/1611.04088. arXiv: 1611.04088.
      Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, and Barnabas Poczos. Parallelised bayesian optimisation via thompson sampling. In Amos Storkey and Fernando Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 133–142, Playa Blanca, Lanzarote, Canary Islands, 09–11 Apr 2018. PMLR. URL http://proceedings.mlr.press/v84/kandasamy18a.html.
      Emile Contal, David Buffoni, Alexandre Robicquet, and Nicolas Vayatis. Parallel gaussian process optimization with upper confidence bound and pure exploration. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Volume 8188, ECML PKDD 2013, pages 225–240, New York, NY, USA, 2013. Springer-Verlag New York, Inc. ISBN 978-3-642-40987-5. doi: 10.1007/978-3-642-40988-2_15. URL http://dx.doi.org/10.1007/978-3-642-40988-2_15.
      Thomas Desautels, Andreas Krause, and Joel W. Burdick. Parallelizing Exploration-Exploitation Tradeoffs in Gaussian Process Bandit Optimization. Journal of Machine Learning Research, 15:4053–4103, 2014. URL http://jmlr.org/papers/v15/desautels14a.html.
      Erik A. Daxberger and Bryan Kian Hsiang Low. Distributed batch Gaussian process optimization. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 951–960, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/daxberger17a.html.
      Zi Wang, Chengtao Li, Stefanie Jegelka, and Pushmeet Kohli. Batched high-dimensional Bayesian optimization via structural kernel learning. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3656–3664, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/wang17h.html.
      Zi Wang, Clement Gehring, Pushmeet Kohli, and Stefanie Jegelka. Batched large-scale bayesian optimization in high-dimensional spaces. In Amos Storkey and Fernando Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 745–754, Playa Blanca, Lanzarote, Canary Islands, 09–11 Apr 2018b. PMLR. URL http://proceedings.mlr.press/v84/wang18c.html.
      Ran Rubin. New Heuristics for Parallel and Scalable Bayesian Optimization. arXiv:1807.00373 [cs, stat], July 2018. URL http://arxiv.org/abs/1807.00373. arXiv: 1807.00373.
      Watanabe, Shinji, and Jonathan Le Roux. Black box optimization for automatic speech recognition. 2014.
      Loshchilov, Ilya, and Frank Hutter. CMA-ES for Hyperparameter Optimization of Deep Neural Networks. 2016.
  • 111. References
      Michael Meissner, Michael Schmuker, and Gisbert Schneider. Optimized Particle Swarm Optimization (OPSO) and its application to artificial neural network training. BMC Bioinformatics, 7(1):125, March 2006. ISSN 1471-2105. doi: 10.1186/1471-2105-7-125. URL https://doi.org/10.1186/1471-2105-7-125.
      Shih-Wei Lin, Shih-Chieh Chen, Wen-Jie Wu, and Chih-Hsien Chen. Parameter determination and feature selection for back-propagation network by particle swarm optimization. Knowledge and Information Systems, 21(2):249–266, November 2009. ISSN 0219-3116. doi: 10.1007/s10115-009-0242-y. URL https://doi.org/10.1007/s10115-009-0242-y.
      Pablo Ribalta Lorenzo, Jakub Nalepa, Luciano Sanchez Ramos, and José Ranilla Pastor. Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1864–1871. ACM, 2017.
      Fei Ye. Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data. PLOS ONE, 12(12):1–36, 2017. doi: 10.1371/journal.pone.0188746. URL https://doi.org/10.1371/journal.pone.0188746.
      F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam. Tuning of the structure and parameters of a neural network using an improved genetic algorithm. Neural Networks, IEEE Transactions on, 14(1):79–88, February 2003. doi: 10.1109/tnn.2002.804317. URL http://dx.doi.org/10.1109/tnn.2002.804317.
      Steven R Young, Derek C Rose, Thomas P Karnowski, Seung-Hwan Lim, and Robert M Patton. Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, page 4. ACM, 2015.
      Wei Fu, Tim Menzies, and Xipeng Shen. Tuning for software analytics: Is it really necessary? Information and Software Technology, 76:135–146, 2016a. ISSN 0950-5849. doi: https://doi.org/10.1016/j.infsof.2016.04.017. URL http://www.sciencedirect.com/science/article/pii/S0950584916300738.
      Wei Fu, Vivek Nair, and Tim Menzies. Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors? arXiv:1609.02613 [cs, stat], September 2016b. URL http://arxiv.org/abs/1609.02613. arXiv: 1609.02613.
      Samantha Hansen. Using deep q-learning to control optimization hyperparameters. arXiv preprint arXiv:1602.04062, 2016.
      Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc V Le. Neural optimizer search with reinforcement learning. In International Conference on Machine Learning, pages 459–468, 2017.
  • 112. References
      Xingping Dong, Jianbing Shen, Wenguan Wang, Yu Liu, Ling Shao, and Fatih Porikli. Hyperparameter optimization for tracking with continuous deep q-learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 518–527, 2018.
      Dougal Maclaurin, David Duvenaud, and Ryan P. Adams. Gradient-based hyperparameter optimization through reversible learning. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML'15, pages 2113–2122. JMLR.org, 2015. URL http://dl.acm.org/citation.cfm?id=3045118.3045343.
      Jelena Luketina, Mathias Berglund, Klaus Greff, and Tapani Raiko. Scalable gradient-based tuning of continuous regularization hyperparameters. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pages 2952–2960. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045701.
      Fabian Pedregosa. Hyperparameter optimization with approximate gradient. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pages 737–746. JMLR.org, 2016. URL http://dl.acm.org/citation.cfm?id=3045390.3045469.
      Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. On hyperparameter optimization in learning systems. In Proceedings of the 5th International Conference on Learning Representations (Workshop Track), 2017a.
      Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. A Bridge Between Hyperparameter Optimization and Learning-to-learn. arXiv:1712.06283 [cs, stat], December 2017b. URL http://arxiv.org/abs/1712.06283. arXiv: 1712.06283.
      Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1165–1173, International Convention Centre, Sydney, Australia, 06–11 Aug 2017c. PMLR. URL http://proceedings.mlr.press/v70/franceschi17a.html.
      Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1563–1572, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018a. PMLR. URL http://proceedings.mlr.press/v80/franceschi18a.html.
      Luca Franceschi, Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, and Paolo Frasconi. Far-ho: A bilevel programming package for hyperparameter optimization and meta-learning. CoRR, abs/1806.04941, 2018b. URL http://arxiv.org/abs/1806.04941.
      Tobias Domhan, Jost Tobias Springenberg, and Frank Hutter. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 3460–3468. AAAI Press, 2015. ISBN 978-1-57735-738-4. URL http://dl.acm.org/citation.cfm?id=2832581.2832731.
  • 113. References
      Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, and Frank Hutter. Learning curve prediction with bayesian neural networks. 2016.
      Akshay Chandrashekaran and Ian R. Lane. Speeding up Hyper-parameter Optimization by Extrapolation of Learning Curves Using Previous Builds. In Michelangelo Ceci, Jaakko Hollmén, Ljupčo Todorovski, Celine Vens, and Sašo Džeroski, editors, Machine Learning and Knowledge Discovery in Databases, pages 477–492, Cham, 2017. Springer International Publishing. ISBN 978-3-319-71249-9.
      Tobias Hinz, Nicolás Navarro-Guerrero, Sven Magg, and Stefan Wermter. Speeding up the hyperparameter optimization of deep convolutional neural networks. International Journal of Computational Intelligence and Applications, page 1850008, 2018.
      Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. Journal of Machine Learning Research, 18(185):1–52, 2018. URL http://jmlr.org/papers/v18/16-558.html.
      Hadrien Bertrand, Roberto Ardon, Matthieu Perrot, and Isabelle Bloch. Hyperparameter optimization of deep neural networks: Combining hyperband with bayesian model selection. 2017.
      Stefan Falkner, Aaron Klein, and Frank Hutter. Bohb: Robust and efficient hyperparameter optimization at scale. In International Conference on Machine Learning, pages 1436–1445, 2018b.
      Jiazhuo Wang, Jason Xu, and Xuejun Wang. Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning. arXiv:1801.01596 [cs], January 2018a. URL http://arxiv.org/abs/1801.01596. arXiv: 1801.01596.
      Jungtaek Kim, Saehoon Kim, and Seungjin Choi. Learning to Warm-Start Bayesian Hyperparameter Optimization. ArXiv e-prints, October 2017.
      Jungtaek Kim, Saehoon Kim, and Seungjin Choi. Learning to transfer initializations for bayesian hyperparameter optimization. arXiv preprint arXiv:1710.06219, 2017.
      T Gomes, P Miranda, R Prudêncio, C Soares, and A Carvalho. Combining meta-learning and optimization algorithms for parameter selection. In 5th PLANNING TO LEARN WORKSHOP WS28 AT ECAI 2012, page 6. 2012.
  • 114. References
      Matthias Reif, Faisal Shafait, and Andreas Dengel. Meta-learning for evolutionary parameter optimization of classifiers. Machine learning, 87(3):357–380, 2012.
      Rémi Bardenet, Mátyás Brendel, Balázs Kégl, and Michele Sebag. Collaborative hyperparameter tuning. In International Conference on Machine Learning, pages 199–207, 2013.
      Dani Yogatama and Gideon Mann. Efficient transfer learning method for automatic hyperparameter tuning. In Artificial Intelligence and Statistics, pages 1077–1085, 2014.
      Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize bayesian optimization of hyperparameters. In Proceedings of the 2014 International Conference on Meta-learning and Algorithm Selection-Volume 1201, pages 3–10. 2014.
      Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Initializing bayesian hyperparameter optimization via meta-learning. In AAAI, pages 1128–1135, 2015.
      Matthias Feurer, Benjamin Letham, and Eytan Bakshy. Scalable meta-learning for bayesian optimization. arXiv preprint arXiv:1802.02219, 2018.
      Dirk V Arnold and H-G Beyer. A general noise model and its effects on evolution strategy performance. IEEE Transactions on Evolutionary Computation, 10(4):380–391, 2006.
      Sandor Markon, Dirk V Arnold, Thomas Back, Thomas Beielstein, and H-G Beyer. Thresholding-a selection operator for noisy es. In Evolutionary Computation, 2001. Proceedings of the 2001 Congress on, volume 1, pages 465–472. IEEE, 2001.
      Thomas Beielstein and Sandor Markon. Threshold selection, hypothesis tests, and doe methods. In Evolutionary Computation, 2002. CEC'02. Proceedings of the 2002 Congress on, volume 1, pages 777–782. IEEE, 2002.
      Yaochu Jin and Jürgen Branke. Evolutionary optimization in uncertain environments-a survey. IEEE Transactions on evolutionary computation, 9(3):303–317, 2005.
  • 115. References
Chi Keong Goh and Kay Chen Tan. An investigation on noisy environments in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 11(3):354–381, 2007.
Christian Gießen and Timo Kötzing. Robustness of populations in stochastic environments. Algorithmica, 75(3):462–489, 2016.
Hong Wang, Hong Qian, and Yang Yu. Noisy derivative-free optimization with value suppression. 2018b.
Yoshihiko Ozaki, Masaki Yano, and Masaki Onishi. Effective hyperparameter optimization using Nelder-Mead method in deep learning. IPSJ Transactions on Computer Vision and Applications, 9(1), December 2017. ISSN 1882-6695. doi: 10.1186/s41074-017-0030-7. URL https://ipsjcva.springeropen.com/articles/10.1186/s41074-017-0030-7.
LeCun Y, Cortes C. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/. 2010.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324, 1998.
Chang JR, Chen YS. Batch-Normalized Maxout Network in Network. In: Proceedings of the 33rd International Conference on Machine Learning. 2015. https://arxiv.org/abs/1511.02583.
Eidinger E, Enbar R, Hassner T. Age and gender estimation of unfiltered faces. IEEE Trans Inf Forensic Secur 9(12):2170–2179, 2014.
Levi G, Hassner T. Age and gender classification using convolutional neural networks. Computer Vision and Pattern Recognition Workshops (CVPRW). 2015. http://ieeexplore.ieee.org/document/7301352.
Olof Skogby Steinholtz. A comparative study of black-box optimization algorithms for tuning of hyper-parameters in deep neural networks, 2018.
  • 116. References
Aaron Klein, Eric Christiansen, Kevin Murphy, and Frank Hutter. Towards reproducible neural architecture and hyperparameter search. 2018.
Katharina Eggensperger, Matthias Feurer, Frank Hutter, James Bergstra, Jasper Snoek, Holger Hoos, and Kevin Leyton-Brown. Towards an empirical foundation for assessing Bayesian optimization of hyperparameters. In NIPS Workshop on Bayesian Optimization in Theory and Practice, volume 10, page 3, 2013.
Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, and George Ke. A strategy for ranking optimization methods using multiple criteria. In Workshop on Automatic Machine Learning, pages 11–20, 2016.
Julien-Charles Lévesque, Audrey Durand, Christian Gagné, and Robert Sabourin. Bayesian optimization for conditional hyperparameter spaces. In Proc. of the International Joint Conference on Neural Networks (IJCNN). IEEE, May 2017.
Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, and Michael A Osborne. Raiders of the lost architecture: Kernels for Bayesian optimization in conditional parameter spaces. arXiv preprint arXiv:1409.4011, 2014a.
Harold J. Kushner. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise. Journal of Basic Engineering, 86(1):97+, 1964. ISSN 00219223. doi: 10.1115/1.3653121. URL http://dx.doi.org/10.1115/1.3653121.
Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 1978.
José Miguel Hernández-Lobato, Matthew W. Hoffman, and Zoubin Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS'14, pages 918–926, Cambridge, MA, USA, 2014. MIT Press. URL http://dl.acm.org/citation.cfm?id=2968826.2968929.
D. R. Jones, C. D. Perttunen, and B. E. Stuckman. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79(1):157–181, October 1993. ISSN 1573-2878. doi: 10.1007/BF00941892. URL https://doi.org/10.1007/BF00941892.
Pedro Larrañaga and Jose A. Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Norwell, MA, USA, 2001. ISBN 0792374665.
  • 117. References
Nikolaus Hansen. The CMA Evolution Strategy: A Comparing Review. In Jose A. Lozano, Pedro Larrañaga, Iñaki Inza, and Endika Bengoetxea, editors, Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms, pages 75–102. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006. ISBN 978-3-540-32494-2. doi: 10.1007/3-540-32494-1_4. URL https://doi.org/10.1007/3-540-32494-1_4.
Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, ICML'10, pages 1015–1022, USA, 2010. Omnipress. ISBN 978-1-60558-907-7. URL http://dl.acm.org/citation.cfm?id=3104322.3104451.
Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58:3250–3265, 2012.
Adam D. Bull. Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res., 12:2879–2904, November 2011. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1953048.2078198.
Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional Bayesian optimisation and bandits via additive models. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML'15, pages 295–304. JMLR.org, 2015. URL http://dl.acm.org/citation.cfm?id=3045118.3045151.
Kirthevasan Kandasamy. Tuning hyper-parameters without grad students: Scaling up bandit optimisation. 2017.