For a seminar at DASH-Lab, SKKU, I presented the paper "T-GD: Transferable GAN-generated Images Detection Framework" (ICML 2020).
For more detail, see this link: https://arxiv.org/abs/2008.04115
Data-driven AI Security HCI (DASH) Lab
Minha Kim
Sungkyunkwan University
July 23, 2020
Hyeonseong Jeon, Youngoh Bang, Junyaup Kim, Simon S. Woo
Background
• High-resolution images produced by the latest GANs are hardly distinguishable from real images
• Generation has become feasible through few-shot or single-shot learning
This paper proposes a novel regularization method with self-training for transfer learning, combining and transforming regularization, augmentation, and self-training.
Limitation
● Relying on metadata information
Previous methods cannot provide an optimal solution for transfer learning.
Transfer learning framework in more detail
Calculate the L2-SP using the weights of the pre-trained teacher model
L2-SP (L2-Starting Point)
w′: pre-trained weights of the conv layers
Eq. 1: L2-norm regularization, Ω(w) = (α/2)·||w||²
Eq. 2: L2-SP regularization, Ω(w) = (α/2)·||w − w′||²
L2-SP differs in that the starting point from a well pre-trained source dataset guides the learning process by referring to the information of the pre-trained source dataset.
Self-training for L2-SP
Eq. 3: binary cross-entropy loss
Eq. 4: the self-training weight γ, inversely proportional to the teacher's loss
Eq. 5: final loss function = binary cross-entropy + L2-norm + γ · L2-SP (self-training term)
Baselines
General transfer
● It is common practice to freeze some weights of the pre-trained model from the source dataset and fine-tune the model with weight decay on the target dataset.
ForensicTransfer
● ForensicTransfer introduced an autoencoder for GAN-image detection.
● Although ForensicTransfer showed promise for model transferability, its performance remains mediocre.
Cozzolino, D., Thies, J., Rössler, A., Riess, C., Nießner, M., and Verdoliva, L. ForensicTransfer: Weakly-supervised domain adaptation for forgery detection. arXiv preprint arXiv:1812.02510, 2018.
Performance Results
Even though T-GD performed transfer learning, it maintains high performance even on datasets of different types.
ResNext vs. EfficientNet
When comparing T-GD across different base models, EfficientNet shows better performance in T-GD even though it has fewer parameters.
Therefore, T-GD performance is not directly related to the number of parameters.
Non-face GAN-image detection
T-GD is effective not only for GAN-generated face detection, but also for non-face tasks.
Self-training & data augmentation effects
Intra-class CutMix, JPEG compression, Gaussian blur, and random horizontal flip are used to avoid over-fitting in transfer learning.
The AUROC with self-training and augmentation is higher than the AUROC without them.
Contribution
● Maintains high performance and overcomes catastrophic forgetting during transfer learning
● Effectively detects state-of-the-art GAN images with a small amount of data, without any metadata information
(Hello, I'm glad to be here with you today.)
(Let me start off by briefly introducing myself first.
My name is Minha Kim.
I am transferring from Hanyang University to the Department of Software at Sungkyunkwan University.)
The second paper I'm going to talk about is T-GD: Transferable GAN-generated Images Detection Framework.
It was published at ICML in 2020.
Recent advancements in Generative Adversarial Networks (GANs) enable the generation of realistic images, which has now become feasible through few-shot or single-shot learning.
Even high-resolution images produced by the latest GANs are hardly distinguishable from real images by human inspection.
While many studies on transfer learning have already shown impressive performance, they have not been applied to GAN-image detection.
Now I will talk about the limitations of previous works.
First, some methods rely on detection using metadata such as GAN-model information.
Second, data augmentation methods such as JPEG compression and Gaussian blur are not fully explained in a generalized way.
Third, they show relatively weak results for transfer learning within GAN-image detection.
That is, they cannot provide an optimal solution for transfer learning.
-------------------------------
What is metadata information?
Metadata is information about the data itself.
For example, if an image was generated by PGGAN, the metadata shows that it was created by PGGAN.
Let me explain the framework of T-GD.
T-GD consists of a teacher classifier and a student classifier.
The teacher classifier is a pre-trained classifier, and the student classifier is the one we train using L2-SP and self-training.
Let me explain the transfer learning framework in more detail.
First, calculate the L2-SP using the weights of the pre-trained teacher model for self-training.
Then, apply binary cross-entropy, and apply self-training as suggested in this paper.
Through this self-training, we train on the target data while automatically adjusting the regularization strength.
--------------------------------
Stochastic Depth?
Let me briefly explain L2-SP for self-training.
The weights of the pre-trained model from the source dataset are used as the SPAR (starting point as the reference).
We use L2-SP for transfer learning, which regularizes the weight variation of the target model by referring to the weights pre-trained on the source dataset.
L2-SP differs in that the starting point from a well pre-trained source dataset guides the learning process by referring to the information of the pre-trained source dataset.
This method requires neither freezing the weights of the pre-trained model nor using weight decay.
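The difference between plain L2 and L2-SP can be sketched in a few lines of Python (a minimal sketch with hypothetical helper names; weights are flattened to plain lists, and alpha is an assumed regularization strength):

```python
# L2 pulls the weights toward zero; L2-SP pulls them toward the pre-trained
# starting point w0, so the source knowledge anchors the target training.

def l2_penalty(w, alpha=0.01):
    # (alpha / 2) * ||w||^2
    return 0.5 * alpha * sum(wi * wi for wi in w)

def l2_sp_penalty(w, w0, alpha=0.01):
    # (alpha / 2) * ||w - w0||^2 -- zero when the student has not moved
    # away from the pre-trained weights at all.
    return 0.5 * alpha * sum((wi - w0i) ** 2 for wi, w0i in zip(w, w0))
```

Note that the L2-SP penalty vanishes exactly at the starting point, which is what lets it guide rather than shrink the weights.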
----------------------------------------------
Why use L2-SP?
This method requires neither freezing the weights of the pre-trained model nor using weight decay.
This regularization can lead to better optimization by preventing over-fitting when learning from scratch.
Why regularize?
Regularization can lead to better optimization by preventing over-fitting when learning from scratch.
1. The loss function uses binary cross-entropy, as shown in Eq. 3. The input data x̃ᵢ (noised) comes from the target dataset with noise injection. F_w′ denotes the pre-trained model from the source dataset.
2. For stable self-training, the gamma value is obtained by feeding the target data into the teacher model, as in Eq. 4. When you put the target data into the teacher model, the loss and gamma values are inversely proportional. This means that if the teacher model considers the target data unfamiliar, it lowers the gamma value so that the student can learn more from the target. The negative of this loss is transformed by the sigmoid function to give γ in Eq. 4.
3. The final loss function, as shown in Eq. 5, is composed of a cross-entropy term and an L2-SP term for the self-training of the student model.
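The gamma rule and the final loss can be sketched as follows (a minimal sketch, not the paper's implementation; `teacher_loss`, `bce`, `l2_norm`, and `l2_sp` are assumed to be already-computed scalar loss terms, and `s` is the hyperparameter from the paper):

```python
import math

def self_training_gamma(teacher_loss, s=1.0):
    # Eq. 4 sketch: gamma = sigmoid(-s * teacher_loss).
    # A large (unfamiliar) teacher loss gives a small gamma, weakening the
    # pull toward the pre-trained weights so the student learns more freely.
    return 1.0 / (1.0 + math.exp(s * teacher_loss))

def final_loss(bce, l2_norm, l2_sp, teacher_loss, s=1.0):
    # Eq. 5 sketch: cross-entropy + L2-norm + gamma-weighted L2-SP term.
    return bce + l2_norm + self_training_gamma(teacher_loss, s) * l2_sp
```

The inverse relationship is easy to check: gamma is 0.5 at zero teacher loss and decreases monotonically as the loss grows.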
----------------------------------------------
*Advantage of self-training
The proposed method has the advantage of preventing both excessive and insufficient regularization.
---------------------------------------------
ỹ = target data
y = source data
w′ = weights of the teacher model
ŷ = the model's prediction
F_w′ denotes the pre-trained model from the source dataset
-----------------------
What are σ and s?
σ denotes the sigmoid function.
s is a hyperparameter taking values from 0.1 to 2.0.
Next is the paper's proposed intra-class CutMix.
This paper introduces a novel augmentation method to solve the over-fitting problem by transforming CutMix.
On the left side is the original CutMix, i.e., inter-class CutMix. Inter-class CutMix replaces the chosen patch with another image's patch in the same location.
On the right side is the paper's proposed intra-class CutMix.
To put it simply, a patch of GAN image A1 is cropped at a random size and random location, and then replaced with the patch of GAN image A2 at the same location and size.
This paper found that inter-class CutMix for binary classification causes highly unstable training.
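The patch-swap step can be sketched as below (a hypothetical minimal version operating on 2D lists standing in for grayscale images; the real method works on image tensors, where both images come from the same class):

```python
import random

def intra_class_cutmix(img_a, img_b, rng=random):
    # Pick a random patch size and location, then paste the patch from img_b
    # (same class as img_a) into the same location of a copy of img_a.
    h, w = len(img_a), len(img_a[0])
    ph, pw = rng.randint(1, h), rng.randint(1, w)          # random patch size
    y, x = rng.randint(0, h - ph), rng.randint(0, w - pw)  # random location
    mixed = [row[:] for row in img_a]                      # keep img_a intact
    for i in range(y, y + ph):
        for j in range(x, x + pw):
            mixed[i][j] = img_b[i][j]
    return mixed
```

Because both images share one class, the label needs no mixing, unlike the inter-class CutMix label interpolation.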
----------------------------------
What is the difference between the two methods?
Before I explain the performance comparison results of T-GD, I will briefly explain the two transfer methods introduced as baselines.
First is the general transfer learning method.
It is common practice to freeze some weights of the pre-trained model from the source dataset and fine-tune the model with weight decay on the target dataset.
Second is ForensicTransfer.
ForensicTransfer introduced an autoencoder for GAN-image detection.
They apply an autoencoder and detect GAN images through the reconstruction error. This learning method has the advantage of lower data usage when the model is well trained.
Although ForensicTransfer showed promise for model transferability, its performance remains mediocre.
In this paper, GN+WS is applied instead of BN to quickly achieve high accuracy even with small batches.
The disadvantage of BN is that model performance depends on a large batch size, because normalization is performed in mini-batch units.
Techniques such as Group Normalization (GN) have been proposed to address this, but in typical large-batch training situations GN does not perform as well as BN.
Weight Standardization (WS) completely eliminates the mini-batch dependency and, as shown in the WS paper, achieves better performance than BN in large-batch situations when combined with GN.
-----------------
What is Weight Standardization?
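The core of Weight Standardization is simple to sketch (a minimal version, not the paper's code; each row of `w` stands in for one output channel's flattened conv weights, standardized to zero mean and unit variance before the forward pass):

```python
import math

def weight_standardize(w, eps=1e-5):
    # Standardize each output channel's weights: subtract the mean and divide
    # by the standard deviation (eps avoids division by zero).
    out = []
    for row in w:
        n = len(row)
        mean = sum(row) / n
        var = sum((v - mean) ** 2 for v in row) / n
        std = math.sqrt(var + eps)
        out.append([(v - mean) / std for v in row])
    return out
```

Because the statistics are computed over the weights rather than the activations, no mini-batch is involved at all, which is why WS is batch-size independent.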
This table shows the performance results of the baseline methods and T-GD using EfficientNet and ResNext.
The evaluation metric is AUROC (%).
The dataset column indicates the pre-trained model from a source dataset, and the dataset row indicates the target test set for transfer learning.
The zero-shot category represents the performance of a pre-trained model without any additional training, and the transfer learning category represents each pre-trained model transferred from the source to the target dataset.
As you can see, even though T-GD performed transfer learning, it maintains high performance even on datasets of different types.
-------------------------------
What is AUROC?
AUROC is an abbreviation for the Area Under the ROC Curve: the area under the ROC curve, which plots TPR against FPR.
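AUROC can also be computed directly from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting half (a minimal sketch, equivalent to the area under the TPR/FPR curve):

```python
def auroc(scores, labels):
    # Split the scores by label, then count pairwise "wins" of positives
    # over negatives; the normalized count equals the area under the ROC curve.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect detector that ranks every fake above every real image gets 1.0; a random one gets about 0.5.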
Comparing T-GD across different base models, the results show subtle differences.
Although the number of parameters in ResNext is greater than that of EfficientNet, EfficientNet shows better performance in T-GD even though it has fewer parameters.
Although the number of parameters affects classification performance, the performance of EfficientNet was superior to that of ResNext in T-GD transfer tasks.
Therefore, T-GD performance is not directly related to the number of parameters.
------------------
EfficientNet has 3 million weights, and ResNext has 20 million weights.
ResNet is a representative model that adjusts its size through depth scaling.
Non-face GAN-image detection.
T-GD is effective not only for GAN-generated face detection, but also for non-face tasks.
The paper experimented with transfer learning from non-face GAN images as the source (PGGAN images from LSUN-bedroom and LSUN-bird) to face GAN images as the target.
This achieved stable AUROC on both detection tasks, as shown in the table.
I explained why self-training is used for L2-SP.
The following table shows the self-training and data augmentation effects.
As you can see, the AUROC of the self-trained method is higher than that of the method without self-training.
To compare the performance of this model with and without self-training, all other settings are kept the same.
As for the data augmentation effect: to avoid over-fitting in transfer learning, this paper utilized the following data augmentation methods: intra-class CutMix, JPEG compression, Gaussian blur, and random horizontal flip.
Despite the small reduction in the target AUROC, the drastic increase in the source AUROC implies that over-fitting can be avoided through these augmentation methods in transfer learning, while preventing catastrophic forgetting. (Explain only up to this point.)
+
For the target dataset with augmentation, the AUROC of T-GD dropped from 99.38% to 98.13% (1.25%), but the source dataset achieved a 10.04% higher AUROC than without augmentation (from 85.04% to 95.08%).
------------------------------------------------------------
If the target AUROC drops with augmentation, isn't that worse performance?
The target AUROC alone does drop, but it is a small decrease compared to the source AUROC's increase of about 10 percent. Applying augmentation is relatively more effective overall.
--------------------------
What is catastrophic forgetting?
A neural network shows excellent performance on a single task, but learning different kinds of tasks significantly reduces the performance on previously learned tasks. This is called catastrophic forgetting.
This shows the validation loss in transfer learning for CutMix vs. intra-class CutMix.
The validation loss for intra-class CutMix (yellow and red lines) is considerably lower and more stable than that for CutMix (green and blue lines).
-------------------------------------------
1. This paper presents T-GD, a method to maintain high performance on both the source and target datasets for GAN-image detection during transfer learning.
It proposes novel regularization and augmentation techniques, L2-SP self-training and intra-class CutMix, building upon well-known CNN backbone models.
In addition, the target AUROC is increased while preserving the existing source AUROC using self-training and intra-class CutMix, thus preventing catastrophic forgetting.
2. T-GD achieves high performance on the source dataset by overcoming catastrophic forgetting, and effectively detects state-of-the-art GAN images with only a small volume of data, without any metadata information.
------------------------------------------
Strengths and weaknesses of this paper?
Advantages:
1. It enables transfer learning without any metadata.
2. It overcomes catastrophic forgetting.
Weakness:
If there were comparison results for active CutMix, CutOut, MixUp, etc. in addition to general CutMix, the results of intra-class CutMix could have been better substantiated.