Ensemble Normalization for Stable Training
Seoung-Ho Choi* and Ji Woong Choi
1. Department of Electronic Information Engineering, Hansung University
2. Department of Trade, Hansung University
Abstract
Normalization studies have been conducted to solve the vanishing or exploding gradient problem for stable training, since this phenomenon leads to poor model training. Existing normalization methods use a single normalization. However, a single normalization is limited in how well it can stabilize learning by preventing internal covariate shift in the model weights. To improve on this, we propose a new normalization method: an ensemble that combines two normalization methods. We first analyze the existing normalization methods through an ablation study. To verify the proposed method, we experimented with two models and two datasets on a semantic segmentation task, and with two models and one dataset on a generation task. As a result, we confirmed that the new normalization method is effective for stable training and improves semantic segmentation and generation performance.
1. Introduction
Research on normalization is needed to correct internal covariate shift during learning. Excessive internal covariate shift can result in vanishing or exploding gradients. However, conventional single normalization can steer learning in an unoptimized direction. To guide the learning of different features in the right direction, we propose ensemble normalization, which learns different features through two normalizations.
2. Related Works
Existing normalization studies include batch normalization (BN), instance normalization (IN), and group normalization (GN). BN normalizes samples in mini-batch units using the mean and variance of each mini-batch, and also adds learnable scaling and shifting factors to the layers. However, BN does not perform well at small batch sizes. IN normalizes across each channel in each sample, and it performs well on style transfer regardless of the batch size. GN is similar to IN but normalizes over groups of channels for each sample, and it performed as well as BN on ImageNet.
3. Ensemble Normalization
Figure 1. Process of the normalization methods: a) the existing normalization method, b) the ensemble normalization method.
Ensemble normalization adds the output of batch normalization to the output of an existing normalization and divides the result by two. This calculation sequence is shown in Figure 1. The input is a group of feature maps produced by a convolutional layer. The output is a group of feature maps normalized by a normalization layer, which in a) is one of the conventional methods, while in b) the normalization layer consists of two methods. To understand the existing methods clearly, we conducted an ablation study measuring the impact of each existing normalization method before verifying the proposed method. In the semantic segmentation task, we analyzed each normalization during training with focal loss, hinge loss, and cross-entropy loss, and the experimental results report the average over these three losses. The experiments use the VOC and ATR datasets with FCN and U-Net, averaging the three loss values over five repeated runs. In the generation task, we analyzed each normalization using the GAN and LSGAN models and the MNIST dataset.
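As a concrete illustration of this computation, y = (BN(x) + Norm(x)) / 2, here is a minimal PyTorch sketch of an ensemble normalization layer. The class name EnsembleNorm, the constructor arguments, and the group count are our own illustrative choices rather than details given in the paper; choosing the second normalization as BN, IN, or GN yields the BBN, BIN, and BGN variants discussed below.

import torch
import torch.nn as nn

class EnsembleNorm(nn.Module):
    """Averages batch normalization with a second normalization:
    y = (BN(x) + Norm2(x)) / 2."""
    def __init__(self, num_channels, second="in", num_groups=4):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels)
        if second == "bn":     # BBN variant
            self.norm2 = nn.BatchNorm2d(num_channels)
        elif second == "in":   # BIN variant
            self.norm2 = nn.InstanceNorm2d(num_channels, affine=True)
        elif second == "gn":   # BGN variant
            self.norm2 = nn.GroupNorm(num_groups, num_channels)
        else:
            raise ValueError(f"unknown second normalization: {second}")

    def forward(self, x):
        # Elementwise average of the two normalized feature maps
        return (self.bn(x) + self.norm2(x)) / 2

# Usage: replace a conventional normalization layer after a convolution.
layer = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), EnsembleNorm(32, second="gn"))
y = layer(torch.randn(8, 3, 64, 64))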
4. Experimental Results
Table 1. Quantitative comparison of two segmentation models (FCN and U-Net) using the VOC dataset and two generation models (GAN and LSGAN) using the MNIST dataset.
Table 1 shows the experimental results on the two datasets for the existing methods and the proposed method; overall performance is improved. This is because the two normalizations learn complementarily, capturing characteristics that cannot be captured by one existing normalization method alone. With the addition of instance normalization, batch normalization, and group normalization, the loss changed by +2.3%, +0.4%, and -4.2% for the FCN model, and by -42.9%, -20.3%, and -54.2% for the U-Net model, respectively.
The overall structural similarity (SSIM) values also improved in the generative models. Through this experiment, we confirmed that the proposed method works well for the generative models.
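For reference, SSIM between a generated digit and a real one can be computed with scikit-image as in the sketch below. This is a generic illustration of the metric, not the paper's evaluation code, and the images here are random stand-ins for MNIST-sized samples.

import numpy as np
from skimage.metrics import structural_similarity

# Two 28x28 grayscale MNIST-sized images in [0, 1] (random stand-ins here).
rng = np.random.default_rng(0)
real = rng.random((28, 28))
generated = rng.random((28, 28))

# data_range must be given for float images; 1.0 matches the [0, 1] scale.
score = structural_similarity(real, generated, data_range=1.0)
print(f"SSIM: {score:.4f}")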
Figure 2. Four feature maps from the second layer of the model, each with BN, IN, and GN applied: a) existing normalizations (panels: FCN, FCNIN, FCNBN, FCNGN), b) the proposed normalization.
As shown in Figure 2, when the conventional normalizations are compared with the proposed method, the feature maps of the existing normalizations show little difference from one another, whereas the feature maps of the proposed method, BIN (BN combined with IN) and BGN (BN combined with GN), are visibly distinct from each other. This means that many different features have been learned, which leads to performance gains. In the case of BBN (BN combined with BN), on the other hand, three of the four feature maps show little difference. This means that rich features are not learned, and as shown in Table 2, BBN performs relatively poorly.
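Feature maps like those in Figure 2 can be pulled from a network with a forward hook; the toy model below is purely illustrative (the paper does not specify the FCN layer indices), and "second layer" here simply means the second module of the Sequential.

import torch
import torch.nn as nn

# Hypothetical toy model standing in for the segmentation network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 16, 3, padding=1), nn.ReLU()
)

feature_maps = {}
def save_output(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()  # store the layer's activations
    return hook

model[1].register_forward_hook(save_output("layer2"))
model(torch.randn(1, 3, 64, 64))
# First four channels correspond to four maps as visualized in Figure 2.
four_maps = feature_maps["layer2"][0, :4]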
Figure 3. Four generated images each from GAN and LSGAN with conventional normalization methods or ensemble normalization methods (panels: GAN, GANBN, GANIN, GANGN, GANBBN, GANBIN, GANBGN; LSGAN, LSGANBN, LSGANIN, LSGANGN, LSGANBBN, LSGANBIN, LSGANBGN).
As shown in Figure 3, applying the proposed normalization method produces clearer images than applying the conventional normalization methods.
5. Conclusion
We propose a new normalization method, ensemble normalization. First, the impact of each existing normalization method was analyzed through ablation studies. We then verified the ensemble normalization method proposed in this paper. As a result, we confirmed that the proposed method is effective for stable training on semantic segmentation and generation tasks.