Ensemble Normalization for Stable Training
Seoung-Ho Choi* and Ji Woong Choi
1. Department of Electronic Information Engineering, Hansung University
2. Department of Trade, Hansung University
Abstract
Normalization studies have been conducted to solve the vanishing or exploding gradient problem for stable training, since this phenomenon leads to poor model training. Existing normalization methods use a single normalization. However, a single normalization is limited in how well it can stabilize learning by preventing internal covariate shift in the model weights. To improve on this, we propose a new normalization method: an ensemble that combines two normalization methods. We first analyze the existing normalization methods through an ablation study. To verify the proposed method, we experimented with two models and two datasets on a semantic segmentation task, and with two models and one dataset on a generation task. As a result, we confirmed that the new normalization method is effective for stable training and improves semantic segmentation and generation performance.
1. Introduction
Research on normalization is needed to correct internal covariate shift during learning. Excessive internal covariate shift can result in vanishing or exploding gradients. However, conventional single normalization can steer learning in an unoptimized direction. To guide the learning of different features in the right direction, we propose ensemble normalization, which learns different features through two normalizations.
2. Related Works
Existing normalization studies include batch normalization (BN), instance normalization (IN), and group normalization (GN). BN normalizes samples in mini-batch units using the mean and variance of each mini-batch, and also adds learnable scaling and shifting factors to the layers. However, BN does not perform well at small batch sizes. IN normalizes across each channel in each sample, and it performs well on style transfer regardless of the batch size. GN is similar to IN but normalizes over groups of channels for each sample, and it performed as well as BN on ImageNet.
3. Ensemble Normalization
Figure 1. Process of the normalization methods: a) the existing normalization method, b) the ensemble normalization method.
Ensemble normalization adds the output of batch normalization to the output of an existing normalization and divides the result by two. This calculation sequence is shown in Figure 1. The input is a group of feature maps produced by a convolutional layer. The output is a group of feature maps normalized by a normalization layer, which in a) is one of the conventional methods, while in b) the normalization layer consists of two methods. To understand the existing methods clearly, we conducted an ablation study measuring the impact of each existing normalization method before verifying the proposed method. In the semantic segmentation task, we analyzed each normalization during training with focal loss, hinge loss, and cross-entropy loss, and the experimental results report the average over these three losses. The experiments use the VOC and ATR datasets with FCN and U-Net, averaging the three loss values over five repeated runs. In the generation task, we analyzed each normalization using the GAN and LSGAN models and the MNIST dataset.
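As a concrete illustration of this computation, y = (BN(x) + Norm(x)) / 2, here is a minimal PyTorch sketch of an ensemble normalization layer. The class name EnsembleNorm, the constructor arguments, and the group count are our own illustrative choices rather than details given in the paper; choosing the second normalization as BN, IN, or GN yields the BBN, BIN, and BGN variants discussed below.

import torch
import torch.nn as nn

class EnsembleNorm(nn.Module):
    """Averages batch normalization with a second normalization:
    y = (BN(x) + Norm2(x)) / 2."""
    def __init__(self, num_channels, second="in", num_groups=4):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels)
        if second == "bn":     # BBN variant
            self.norm2 = nn.BatchNorm2d(num_channels)
        elif second == "in":   # BIN variant
            self.norm2 = nn.InstanceNorm2d(num_channels, affine=True)
        elif second == "gn":   # BGN variant
            self.norm2 = nn.GroupNorm(num_groups, num_channels)
        else:
            raise ValueError(f"unknown second normalization: {second}")

    def forward(self, x):
        # Elementwise average of the two normalized feature maps
        return (self.bn(x) + self.norm2(x)) / 2

# Usage: replace a conventional normalization layer after a convolution.
layer = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), EnsembleNorm(32, second="gn"))
y = layer(torch.randn(8, 3, 64, 64))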
4. Experimental Results
Table 1. Quantitative comparison of two segmentation models (FCN and U-Net) using the VOC dataset and two generation models (GAN and LSGAN) using the MNIST dataset.
Table 1 shows the experimental results on the two datasets for the existing methods and the proposed method; overall performance is improved. This is because the two normalizations learn complementarily, capturing characteristics that cannot be captured by one existing normalization method alone. With the addition of instance normalization, batch normalization, and group normalization, the loss changed by +2.3%, +0.4%, and -4.2% for the FCN model, and by -42.9%, -20.3%, and -54.2% for the U-Net model, respectively.
The overall structural similarity (SSIM) values also improved in the generative models. Through this experiment, we confirmed that the proposed method works well for the generative models.
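For reference, SSIM between a generated digit and a real one can be computed with scikit-image as in the sketch below. This is a generic illustration of the metric, not the paper's evaluation code, and the images here are random stand-ins for MNIST-sized samples.

import numpy as np
from skimage.metrics import structural_similarity

# Two 28x28 grayscale MNIST-sized images in [0, 1] (random stand-ins here).
rng = np.random.default_rng(0)
real = rng.random((28, 28))
generated = rng.random((28, 28))

# data_range must be given for float images; 1.0 matches the [0, 1] scale.
score = structural_similarity(real, generated, data_range=1.0)
print(f"SSIM: {score:.4f}")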
Figure 2. Four feature maps from the second layer of the model, each with BN, IN, and GN applied: a) existing normalizations (panels: FCN, FCNIN, FCNBN, FCNGN), b) the proposed normalization.
As shown in Figure 2, when the conventional normalizations are compared with the proposed method, the feature maps of the existing normalizations show little difference from one another, whereas the feature maps of the proposed method, BIN (BN combined with IN) and BGN (BN combined with GN), are visibly distinct from each other. This means that many different features have been learned, which leads to performance gains. In the case of BBN (BN combined with BN), on the other hand, three of the four feature maps show little difference. This means that rich features are not learned, and as shown in Table 2, BBN performs relatively poorly.
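Feature maps like those in Figure 2 can be pulled from a network with a forward hook; the toy model below is purely illustrative (the paper does not specify the FCN layer indices), and "second layer" here simply means the second module of the Sequential.

import torch
import torch.nn as nn

# Hypothetical toy model standing in for the segmentation network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 16, 3, padding=1), nn.ReLU()
)

feature_maps = {}
def save_output(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()  # store the layer's activations
    return hook

model[1].register_forward_hook(save_output("layer2"))
model(torch.randn(1, 3, 64, 64))
# First four channels correspond to four maps as visualized in Figure 2.
four_maps = feature_maps["layer2"][0, :4]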
Figure 3. Four generated images each from GAN and LSGAN with conventional normalization methods or ensemble normalization methods (panels: GAN, GANBN, GANIN, GANGN, GANBBN, GANBIN, GANBGN; LSGAN, LSGANBN, LSGANIN, LSGANGN, LSGANBBN, LSGANBIN, LSGANBGN).
As shown in Figure 3, applying the proposed normalization method produces clearer images than applying the conventional normalization methods.
5. Conclusion
We propose a new normalization method, ensemble normalization. First, the impact of each existing normalization method was analyzed through ablation studies. We then verified the ensemble normalization method proposed in this paper. As a result, we confirmed that the proposed method is effective for stable training on semantic segmentation and generation tasks.