This document summarizes a research paper on generalizing deepfake detection using knowledge distillation and representation learning. The proposed FReTAL framework uses knowledge distillation to transfer knowledge from a teacher model to a student model, preventing catastrophic forgetting when adapting to new domains without source data. It also applies representation learning to leverage similar features between domains. Experiments show FReTAL outperforms baselines on deepfake benchmark datasets, achieving up to 86.97% accuracy on low-quality deepfake detection.
Generalizing Deepfake Detection using Knowledge Distillation and Representation Learning
1. FReTAL: Generalizing Deepfake Detection using Knowledge
Distillation and Representation Learning
Minha Kim
kimminha@g.skku.edu
Workshop on Media Forensics
CVPR 2021 - Virtual
Data-driven AI Security HCI (DASH) Lab
Sungkyunkwan University, South Korea
Simon S. Woo
swoo@g.skku.edu
Shahroz Tariq
shahroz@g.skku.edu
3. Transfer Learning
Introduction
[Figure: a model pretrained on Dataset A reaches 95% accuracy; after transfer learning on Dataset B (Face2Face), accuracy on Dataset A drops to 87% while Dataset B reaches 98%.]
However, this can lead to "catastrophic forgetting":
accuracy on the original domain (Dataset A) drops from 95% to 87%,
while the new domain (Dataset B) is detected with 98% accuracy.
4. Problem Statements
1. Catastrophic forgetting may occur due to training with data from different domains.
2. Pretrained model requires a few source data to prevent the catastrophic forgetting during
domain adaptation. However, sometimes source data may not be available.
3. It is difficult to classify low-quality (LQ) deepfake data compared to high-quality (HQ) data.
[Image: HQ vs. LQ face crops.]
5. Contributions
1. We propose a novel domain adaptation framework, "Feature Representation Transfer
Adaptation Learning" (FReTAL), based on knowledge distillation and representation
learning, that can prevent catastrophic forgetting without access to the source
domain data.
2. We show that leveraging knowledge distillation and representation learning can enhance
adaptability across different deepfake domains.
3. We demonstrate that our method outperforms baseline approaches on deepfake
benchmark datasets with up to 86.97% accuracy on low-quality deepfake detection.
6. Our Approach - Knowledge Distillation (KD)
KD is a method to compress the knowledge of a large model into a smaller model.
The main idea is that the student model can mimic the knowledge of the teacher model.
[Diagram: a non-trainable teacher model (pre-trained on source data) guides, via the KD loss, a student model to be trained on the target data.]
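The KD loss sketched on this slide can be pictured as a temperature-softened KL divergence between the teacher's and student's output distributions. A minimal sketch in plain Python, assuming Hinton-style softening with a temperature `T` and `T^2` scaling; the function names and defaults here are illustrative, not the paper's exact implementation:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a higher T softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence from the softened student distribution to the
    softened (fixed) teacher distribution, scaled by T**2."""
    p = softmax(teacher_logits, T)   # teacher: non-trainable reference
    q = softmax(student_logits, T)   # student: to be trained on target data
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl
```

When the student exactly reproduces the teacher's logits, the loss is zero; any deviation yields a positive penalty that pulls the student back toward the teacher's behavior on the source domain.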
7. Our Approach - Feature-based Representation Learning
Representation learning is the process of learning representations of input data.
It is a training method that shapes the hidden layers of a network by applying conditions to the learned intermediate features.
[Diagram: default representation vs. semantic representation.]
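One way to read "applying conditions to the learned intermediate features" is as a penalty that pulls the student's intermediate features toward the teacher's. A minimal sketch, assuming the features have already been reduced to matched lists of scalars; the name `feature_square_loss` is an illustrative stand-in, not the paper's identifier:

```python
def feature_square_loss(teacher_feats, student_feats):
    """Mean squared difference between matched teacher and student
    feature scalars; zero when the student reproduces the teacher."""
    assert len(teacher_feats) == len(student_feats) and teacher_feats
    return sum((t - s) ** 2
               for t, s in zip(teacher_feats, student_feats)) / len(teacher_feats)
```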
13. Conclusion
1. We found that similar features exist between the source and target datasets, which help in
domain adaptation through our representation learning framework.
2. We demonstrate that applying our FReTAL loss, even without using the source dataset, can
significantly reduce catastrophic forgetting.
3. We achieve higher performance than other methods on both the source and target datasets.
10sec
Hello everyone, my name is Minha Kim, and I'm going to present the paper 'FReTAL: Generalizing Deepfake Detection using Knowledge Distillation and Representation Learning'.
4sec
For example, we train a model with the A dataset.
23sec
And then, we proceed with transfer learning using the B dataset. During transfer learning, the knowledge the model learned on the A dataset can gradually be forgotten because of the target domain. In other words, catastrophic forgetting occurs, in which the model forgets its knowledge of the source domain.
As we proceed with transfer learning using the B dataset from a different domain, the knowledge of the model, which had been pretrained and optimized for the A dataset, becomes more optimized for the B dataset than for A. We call this catastrophic forgetting, in which the model forgets its knowledge of the source domain.
(v2)18sec
There are three difficulties to overcome: one, catastrophic forgetting caused by cross-domain training; two, restricted access to source data due to privacy and availability issues; and lastly, low-resolution environments that degrade the model's performance.
----------------------------------
(v1)32sec
Our problem statements are: first, catastrophic forgetting is caused by training across two different domains.
Second, to maintain existing knowledge during transfer learning, the source data the model learned from is usually required.
However, this may raise privacy concerns, and most source domain data is not available.
Third, it is difficult to classify low-resolution data compared to high-resolution data.
36sec
So, our contributions are as follows.
First, our approach can prevent catastrophic forgetting without access to the source domain data.
Second, We show that leveraging knowledge distillation and representation learning can enhance adaptability across different deepfake domains.
Third, We demonstrate that our method outperforms baseline approaches on deepfake benchmark datasets with up to 86.97% accuracy on low-quality deepfakes
(v2) 25sec
To prevent catastrophic forgetting, our approach mainly focuses on knowledge distillation and representation learning.
KD is a method to compress the knowledge of a large model into a smaller model, with the notion that the student model can mimic the teacher model. We take advantage of this and use two identical networks as the teacher and student models during transfer learning.
---------------------------------
(v1) 32sec
To prevent catastrophic forgetting, our approach mainly focuses on knowledge distillation and representation learning.
KD is a method to compress the knowledge of a large model into a smaller model. The main idea is that the student model can mimic the knowledge of the teacher model. We take advantage of the student's ability to mimic the teacher's knowledge and use two identical networks as the teacher and student models for transfer learning.
27sec
Representation learning shapes the hidden layers of a network by applying conditions to the learned intermediate features.
It is the process of learning representations of input data, usually by transforming it or extracting features from it, making a task like classification easier to perform.
----------------------------------------------------------------
100sec
This is the architecture of our model, including KD and representation learning.
We assume that similar features must exist between different types of deepfakes. Therefore, a teacher model trained on the source domain can help the student learn the target domain with fewer target data samples.
We divide the storage into five parts, because we have to fine-tune very carefully to prevent domain shifting. We store feature information in the appropriate storage slot according to the softmax value of the output, where the input is the average scalar value of the feature vector.
All feature scalar values stored in this storage are fixed for the teacher model, because the teacher is always fixed during training.
In the same way, we store feature information in the storage of the student model.
Note that, unlike the teacher model, the feature information in the student model's storage changes at each iteration as training progresses.
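The five-way storage described above can be pictured as binning each stored feature scalar by the model's softmax confidence. A minimal sketch, assuming five equal-width bins over [0, 1]; the equal-width layout is an assumption for illustration, not taken from the paper:

```python
def storage_bin(softmax_score, n_bins=5):
    """Map a softmax confidence in [0, 1] to one of n_bins storage slots,
    clamping a score of exactly 1.0 into the last bin."""
    assert 0.0 <= softmax_score <= 1.0
    return min(int(softmax_score * n_bins), n_bins - 1)
```

During training, the teacher's bins stay frozen, while the student refills its bins each iteration and is penalized for drifting away from the teacher's stored features.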
During transfer learning, we calculate the difference between the student and teacher models using our feature-based square loss.
We also calculate a KD loss between the teacher and student models, and a cross-entropy loss just for the student model.
Finally, our FReTAL method can be written with these three loss terms.
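The three loss terms mentioned above can be combined into one scalar objective. A minimal sketch in plain Python, assuming Hinton-style KD with temperature `T` and hypothetical weights `lam_kd` and `lam_f`; the weighting and function names are assumptions for illustration, not the paper's exact formulation:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fretal_loss(student_logits, teacher_logits, label,
                student_feats, teacher_feats,
                T=2.0, lam_kd=1.0, lam_f=1.0):
    """Cross-entropy (student only) + weighted KD term
    + weighted feature-based square loss."""
    ce = -math.log(softmax(student_logits)[label])
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kd = (T ** 2) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    feat = sum((t - s) ** 2
               for t, s in zip(teacher_feats, student_feats)) / len(teacher_feats)
    return ce + lam_kd * kd + lam_f * feat
```

With identical teacher and student outputs and features, the KD and feature terms vanish and only the cross-entropy on the target label remains.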
18sec
For the experiments, we used the FaceForensics++ dataset and extracted frames from the original and manipulated videos.
First, we divide the number of videos as shown in the following table, and then extract 80 images per video.
10sec
We explored several baselines (fine-tuning, T-GD, and KD methods) based on Xception for comparison in the transfer learning phase.
14sec
We evaluate all datasets with four baselines using four domain adaptation methods. FReTAL demonstrates the best and most consistent performance. The best results are highlighted in red.
6sec
In addition, our approach shows the best performance on all low-quality datasets.
25sec
To sum up, we find that similar features exist between the source and target datasets, which helps in domain adaptation.
Moreover, we demonstrate that applying the KD loss, even without using the source dataset, can reduce catastrophic forgetting.
Finally, we achieve higher performance than other methods on both the source and target datasets.