Method 2: Ghost Batch Normalization (GBN)
• Split the large batch into small sub-batches and apply BN to each one separately (a code sketch follows the figure below)
[Figure: Ghost Batch Normalization — the large batch is split into small batches, BN is applied to each small batch independently, and the normalized outputs are concatenated back together]
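To make the idea concrete, here is a minimal PyTorch sketch (the GhostBatchNorm class name and the virtual_batch_size argument are illustrative, not from the slides): the large batch is chunked into virtual small batches, and the same BatchNorm layer normalizes each chunk with its own mini-batch statistics.

import torch
import torch.nn as nn

class GhostBatchNorm(nn.Module):
    # Applies BatchNorm separately to "ghost" sub-batches of a large batch.
    def __init__(self, num_features, virtual_batch_size=32):
        super().__init__()
        self.virtual_batch_size = virtual_batch_size
        self.bn = nn.BatchNorm1d(num_features)

    def forward(self, x):
        if self.training:
            # Split the large batch into small virtual batches and normalize
            # each one with its own statistics.
            n_chunks = max(1, x.size(0) // self.virtual_batch_size)
            chunks = x.chunk(n_chunks, dim=0)
            return torch.cat([self.bn(c) for c in chunks], dim=0)
        # At evaluation time, use the accumulated running statistics as usual.
        return self.bn(x)

# Example: a batch of 256 samples is normalized as 8 ghost batches of 32.
gbn = GhostBatchNorm(num_features=128, virtual_batch_size=32)
out = gbn(torch.randn(256, 128))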
Method 3: Increase the number of training iterations
• Increasing the batch size reduces the number of parameter updates per epoch (a worked sketch follows the figure below)
[Figure: with the same amount of data, small batches give 4 updates per epoch while large batches give only 2]
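As a worked sketch of the update-count arithmetic (the dataset size, batch sizes, and epoch counts are illustrative, not from the slides): with a fixed dataset, a larger batch means fewer updates per epoch, so matching the small-batch update budget requires proportionally more epochs.

dataset_size = 1024          # illustrative dataset size

def updates_per_epoch(batch_size):
    # Number of parameter updates in one pass over the data.
    return dataset_size // batch_size

print(updates_per_epoch(128))   # 8 updates per epoch with a small batch
print(updates_per_epoch(512))   # 2 updates per epoch with a large batch

# Training longer keeps the total number of updates constant:
base_epochs, base_batch, large_batch = 30, 128, 512
scaled_epochs = base_epochs * large_batch // base_batch
print(scaled_epochs)            # 120 epochs match the small-batch update budget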
References
• (Hoffer et al., 2017): Train longer, generalize better: closing the generalization gap in large batch training of neural networks.
• (Wilson et al., 2017): The Marginal Value of Adaptive Gradient Methods in Machine Learning.
• (Dinh et al., 2017): Sharp Minima Can Generalize For Deep Nets.
• (Keskar et al., 2017): On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.
• (Chaudhari et al., 2017): Entropy-SGD: Biasing Gradient Descent into Wide Valleys.
• (Hochreiter & Schmidhuber, 1997): Flat Minima.