1. MASKED RELATION LEARNING FOR DEEPFAKE DETECTION
Team – 20
1. Shaik Neha Sulthana (Y20CS165)
2. Shaik Majida (Y20CS163)
3. Pavaluri Poojitha (Y20CS138)
Guide : Mr. M. Naveen
2. ABSTRACT
DeepFake detection aims to differentiate falsified faces from
real ones. The proposed approach aims to improve DeepFake detection
by modeling the relationships between different parts of the face,
using a graph-like structure to represent the face and its
regions. We use a technique called "masked modeling" to reduce the
amount of redundant information: some of the relationships between
facial regions are masked, or ignored, so that the model focuses
on the most informative ones.
3. INTRODUCTION
DeepFake videos are computer-generated videos that make it appear
as though someone is doing or saying something that they did not
actually do or say. DeepFake detection aims to differentiate falsified
faces from real ones.
Most methods treat it as a binary classification problem and focus
only on regional differences and local artifacts in forged faces,
omitting the relationships between local regions. The proposed
approach aims to improve DeepFake detection by modeling the
relationships between different parts of the face, using a
graph-like structure to represent the face and its regions.
However, too much relational information can make the method less
effective, so we use a technique called "masked modeling" to reduce
the amount of redundant information. Some of the relationships
between facial regions are masked, or ignored, so that the model
focuses on the most informative ones.
4. EXISTING TECHNIQUES
Previous techniques fall into three areas: DeepFake detection,
relation learning, and masked graph modeling.
DeepFake Detection
Image forensic patterns and physiological signals.
Deep Learning with Convolutional Neural Networks (CNNs).
Frequency domain features.
Temporal artifacts.
Relation Learning
Handcrafted Features
Deep Learning with Convolutional Neural Networks (CNNs)
Adversarial Learning
Feature Fusion
6. DEEPFAKE DETECTION
Image forensic patterns and physiological signals:
Authors: Lugstein, S. Baier, Y. Li
These early DeepFake detectors attempt to expose fake faces through
image forensic patterns and physiological signals.
Limitation: They fail to detect realistic face forgeries.
Deep Learning with Convolutional Neural Networks (CNNs):
Authors: A. Rossler, S. Ren, M. Tan
Deep learning approaches are effective at learning discriminative
characteristics of specific face manipulation algorithms.
Limitation: These methods are not robust to manipulations that they
have not seen during training, and their performance suffers in new
and challenging scenarios.
7. DEEPFAKE DETECTION(CONTD)
Frequency domain features:
Authors: Q. Gu, Y. Rao, L. M. Binh, J. Li
These help classifiers capture fine-grained clues of face forgery.
Limitation: They mainly focus on local features and do not explore the
relationships between facial regions.
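As a rough illustration of a frequency domain feature, the sketch below computes the share of spectral energy above a cutoff frequency for a small grayscale patch. The function names, the `cutoff` parameter, and the choice of this particular feature are assumptions for illustration, not the features used in the cited works.

```python
import cmath

def dft2(img):
    """Naive 2D discrete Fourier transform of a small grayscale patch."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            out[u][v] = s
    return out

def high_freq_ratio(img, cutoff=1):
    """Fraction of non-DC spectral energy above `cutoff` (hypothetical feature)."""
    spec = dft2(img)
    h, w = len(img), len(img[0])
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness)
            # Fold indices so that u and h - u count as the same frequency.
            fu, fv = min(u, h - u), min(v, w - v)
            energy = abs(spec[u][v]) ** 2
            total += energy
            if max(fu, fv) > cutoff:
                high += energy
    return high / total if total > 1e-12 else 0.0

# A checkerboard is dominated by high frequencies, a smooth ramp is not.
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
ramp = [[x for x in range(4)] for _ in range(4)]
print(high_freq_ratio(checker), high_freq_ratio(ramp))
```

A classifier could take such a ratio, or the full spectrum, as an extra input alongside pixel features.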
Temporal artifacts:
Authors: S. Lyu, M. Pantic, Z. Sun
These include discontinuities in eye blinking, lip motion, and
facial landmarks, which serve as effective clues for DeepFake
detection.
Limitation: They are limited in detecting more sophisticated
DeepFake manipulations.
8. RELATION LEARNING
Handcrafted Features:
Authors: J. Li et al., A. Goel, B. Fernando, F. Keller
Handcrafted features such as local binary patterns (LBP) and
histograms of oriented gradients (HOG) were used in the past for
face recognition and image classification.
Limitation: These features are limited in their ability to capture
complex patterns and relationships within the data.
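To make the idea of a handcrafted feature concrete, here is a minimal sketch of the basic LBP code for one 3x3 patch: each neighbor is compared with the center pixel and the results are packed into 8 bits. The clockwise neighbor order starting at the top-left is an assumed convention; implementations differ on this.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the center pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code

patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # 241
```

A histogram of these codes over an image region is the usual LBP descriptor. The rule is fixed by hand, which is why such features struggle with patterns they were not designed for.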
Deep Learning with Convolutional Neural Networks:
Authors: Z. Zha, L. Yu, Y. Li, and Y. Zhang
CNNs have been widely adopted in recent years for various visual
tasks, including DeepFake detection.
Limitation: CNNs are limited in their ability to model non-Euclidean
structured data such as 3D shapes or point clouds. Additionally, CNNs
require a large amount of labeled data for training and are
computationally expensive.
9. RELATION LEARNING(CONTD)
Adversarial Learning:
Authors: Y. Rao, J. Ni, and H. Xie
Adversarial learning is a technique where a model is trained to
distinguish between real and fake samples, while another model is
trained to generate realistic fake samples.
Limitation: It is vulnerable to adversarial attacks and requires careful
tuning of hyperparameters.
Feature Fusion:
Authors: S. Chen, T. Yao
Feature fusion techniques aim to combine multiple modalities of
information, such as RGB and frequency domains, to improve
DeepFake detection.
Limitation: Feature fusion is computationally expensive and requires
careful selection of fusion methods.
10. MASKED GRAPH MODELING
Masked Language Modeling (MLM):
Authors: A. Wettig, J. Devlin
This method involves masking partial tokens to improve the
performance of language models. Recent studies indicate that MLM is
also effective for computer vision tasks.
Limitation: This method is limited to language and computer vision
tasks and may not be applicable to other domains.
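The masking step of MLM can be sketched as follows. This is a simplified version under stated assumptions: the 15% ratio follows the common BERT setting, every selected token is replaced by a `[MASK]` placeholder, and the function name and `seed` parameter are hypothetical.

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens and return the prediction targets."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token so that there is always something to predict.
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = corrupted[pos]   # what the model must recover
        corrupted[pos] = mask_token     # what the model actually sees
    return corrupted, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = mask_tokens(tokens)
print(corrupted, targets)
```

The model is trained to predict `targets` from `corrupted`, so it has to use the surrounding context.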
Masked Autoencoders (MAEs):
Authors: K. He, X. Chen, S. Xie
MAEs are the pioneering methods that learn visual representation by
reconstructing masked image patches. The rationale behind masked
modeling is information redundancy.
Limitation: This method is limited to learning visual representation
and may not be suitable for other tasks.
11. MASKED GRAPH MODELING(CONTD)
Masked Graph Modeling:
Authors: F. Manessi and A. Rozza
This method involves masking vertices and edges of a graph during
training to improve performance. Most works on masked graph
modeling adopt self-supervised learning to predict the masked vertices
and edges of a graph.
Limitation: This method may require a large amount of training data to
achieve good performance and may be computationally expensive.
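A minimal sketch of the masking step for a graph, assuming random selection of vertices and edges at fixed ratios. The function name, the ratios, and the edge-list representation are illustrative choices, not taken from the cited works.

```python
import random

def mask_graph(num_nodes, edges, node_ratio=0.2, edge_ratio=0.2, seed=0):
    """Hide a random subset of vertices and edges; they become prediction targets."""
    rng = random.Random(seed)
    masked_nodes = set(rng.sample(range(num_nodes), round(num_nodes * node_ratio)))
    masked_edges = set(rng.sample(sorted(edges), round(len(edges) * edge_ratio)))
    # The model only sees the edges that were not masked.
    visible_edges = [e for e in edges if e not in masked_edges]
    return masked_nodes, masked_edges, visible_edges

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
masked_nodes, masked_edges, visible_edges = mask_graph(5, edges, 0.4, 0.4)
print(masked_nodes, masked_edges, visible_edges)
```

In a self-supervised setup, the network is then trained to predict the features of `masked_nodes` and the presence of `masked_edges` from what remains visible.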
Graph Autoencoders (GAEs):
Authors: A. Salehi, G. Cui, J. Zhou
GAEs are used to learn the structure of a graph by reconstructing it
from its latent representation.
Limitation: This method may not be suitable for tasks that require
more complex relationships between nodes in a graph.
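The reconstruction step of a GAE can be sketched with the common inner-product decoder: the predicted edge probability between two nodes is the sigmoid of the dot product of their latent vectors. The encoder is omitted here, and the latent vectors `z` are made-up values for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_adjacency(z):
    """Inner-product decoder: edge probability = sigmoid(z_i . z_j)."""
    n = len(z)
    return [[sigmoid(sum(a * b for a, b in zip(z[i], z[j]))) for j in range(n)]
            for i in range(n)]

def reconstruction_loss(adj, adj_hat, eps=1e-12):
    """Mean binary cross-entropy between true and reconstructed adjacency."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = min(max(adj_hat[i][j], eps), 1.0 - eps)
            total -= adj[i][j] * math.log(p) + (1 - adj[i][j]) * math.log(1.0 - p)
    return total / (n * n)

# Hypothetical latent vectors: nodes 0 and 1 are similar, node 2 is not.
z = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
adj = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # adjacency with self-loops
adj_hat = decode_adjacency(z)
print(reconstruction_loss(adj, adj_hat))
```

Training would adjust the encoder so that this loss decreases. Because the decoder depends only on pairwise dot products, it is limited in the relationships it can express.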
12. MASKED GRAPH MODELING(CONTD)
Pretext Tasks:
Authors: W. Jin et al.
Pretext tasks are self-supervised learning tasks used to train neural
networks. They yield only limited performance improvements and may
not generalize well to other tasks.
Limitation: This method may require a large amount of training data
to achieve good performance and may be computationally expensive.
13. PROPOSED TECHNIQUE – MASKED RELATION LEARNING
The proposed technique consists of two main components:
SpatioTemporal Attention (STA) module
Masked Relation Learner (MRL).
SpatioTemporal Attention
SpatioTemporal Attention is a technique used in machine learning
and computer vision to selectively focus on specific regions or
frames of a video sequence. It involves allocating more
computational resources to the relevant parts of the video and
ignoring the irrelevant parts. In the context of video analysis,
spatio-temporal attention is used to recognize and classify actions,
gestures, or events in a video sequence.
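The focusing behavior described above can be sketched as generic soft attention over all (frame, region) pairs: a softmax turns relevance scores into weights, and the weights pool the region features. This is an illustration only, not the STA module itself; the scores would normally be learned, and the function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, scores):
    """Pool features over frames and regions with softmax attention.

    features: T frames, each a list of R region feature vectors
    scores:   T x R relevance scores (assumed given; learned in practice)
    """
    flat_scores = [s for frame in scores for s in frame]
    flat_feats = [f for frame in features for f in frame]
    weights = softmax(flat_scores)
    dim = len(flat_feats[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, flat_feats)) for d in range(dim)]
    return weights, pooled

# Two frames, two regions each, 2-dimensional features (made-up values).
features = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
scores = [[0.0, 3.0], [0.0, 0.0]]
weights, pooled = attend(features, scores)
print(weights, pooled)
```

With these scores, the second region of the first frame dominates the pooled feature, while the other regions contribute little.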
14. SPATIOTEMPORAL ATTENTION(CONTD)
How SpatioTemporal Attention is used in masked relation
learning for deepfake detection
In the context of deepfake detection, spatio-temporal attention is
used to identify subtle changes in the visual appearance of the
face or the body that are indicative of manipulation. By focusing
on the most relevant regions of the image or video, the model
can better capture the underlying relationships between these
regions, which can help to distinguish between real and fake
content.
Overall, spatio-temporal attention is a powerful tool for
improving the accuracy of deep learning models in detecting
deepfakes, as it allows the model to selectively focus on the most
relevant information in the video while ignoring irrelevant
distractions.
15. SPATIOTEMPORAL ATTENTION(CONTD)
Advantages of SpatioTemporal Attention
Robustness: SpatioTemporal Attention can make the model more
robust to occlusions and distortions in the video, by allowing it
to focus on the most informative regions of the image or video
and ignore the irrelevant parts.
Selective focusing: SpatioTemporal Attention allows the model
to selectively focus on specific regions or frames of a video
sequence, which can improve the accuracy of the model's
predictions.
Computational efficiency: SpatioTemporal Attention can help to
reduce the computational cost of processing large video datasets,
by allowing the model to focus on the most informative parts of
the video and avoid processing irrelevant parts.
16. MASKED RELATION LEARNER
Masked relation learning is a technique used in deep learning to
learn the relationships between different parts of an image or video,
while being robust to occlusions and manipulations. The technique
involves masking different parts of the input and forcing the model
to learn the relationships between the remaining unmasked parts.
Role of MRL in DeepFake detection:
In the context of deepfake detection, masked relation learning
can help to identify subtle changes in the visual appearance of
the face or the body that are indicative of manipulation.
The technique involves dividing the input into different regions,
such as the eyes, nose, and mouth, and then masking some of
these regions while leaving others unmasked. The model is then
trained to predict the relationships between the unmasked
regions, such as the relationship between the movement of the
eyes and the mouth.
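The idea can be sketched on a small relation graph over facial regions. In this sketch, relations are cosine similarities between region features, a random half of them is masked, and the features are then propagated over the remaining relations. All of these choices (similarity measure, mask ratio, propagation rule, region names, feature values) are assumptions for illustration, not the exact MRL design.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relation_graph(features):
    """Dense relation matrix: one similarity score per pair of regions."""
    n = len(features)
    return [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]

def mask_relations(relations, mask_ratio=0.5, seed=0):
    """Zero out a random subset of pairwise relations, keeping symmetry."""
    n = len(relations)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dropped = random.Random(seed).sample(pairs, int(len(pairs) * mask_ratio))
    masked = [row[:] for row in relations]
    for i, j in dropped:
        masked[i][j] = masked[j][i] = 0.0
    return masked, dropped

def propagate(features, relations):
    """Update each region with a relation-weighted average of all regions."""
    dim = len(features[0])
    updated = []
    for row in relations:
        total = sum(abs(w) for w in row) or 1.0
        updated.append([sum(w * features[j][d] for j, w in enumerate(row)) / total
                        for d in range(dim)])
    return updated

regions = ["left_eye", "right_eye", "nose", "mouth"]  # hypothetical regions
features = [[1.0, 0.2], [0.9, 0.3], [0.1, 1.0], [0.4, 0.8]]  # made-up features
relations = relation_graph(features)
masked, dropped = mask_relations(relations)
updated = propagate(features, masked)
print([(regions[i], regions[j]) for i, j in dropped])
```

Each training step would draw a different mask, so the model cannot rely on any single relation and is pushed toward the informative ones.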
17. MASKED RELATION LEARNER(CONTD)
Overall, masked relation learning is a powerful technique for
learning the relationships between different parts of an image or
video, and it has many practical applications in fields such as
computer vision, robotics, and natural language processing.
Advantages of MRL:
Robustness: Masked relation learning can help the model to be
more robust to manipulations and distortions in the input, by
focusing on the relationships between the unmasked parts and
ignoring the masked parts.
Scalability: Masked relation learning can help to reduce the
computational cost of processing large datasets, by focusing only
on the most informative parts of the input.
18. MASKED RELATION LEARNER(CONTD)
Transferability: Masked relation learning can be applied to
different tasks and domains, making it a versatile technique for
machine learning and computer vision.
Improved performance: By learning the relationships between
different parts of the input, masked relation learning can help to
improve the performance of deep learning models, particularly in
tasks such as object detection, segmentation, and action
recognition.
19. DATASETS
FaceForensics++ (FF++): a standardized dataset for DeepFake
detection. It consists of 1,000 pristine videos and 4,000 fake videos.
Four manipulation techniques are used to generate the fake videos:
DeepFakes (DF), Face2Face (F2F), FaceSwap (FS), and
NeuralTextures (NT). To simulate the setting of social networks,
FF++ has high-quality (HQ) and low-quality (LQ) copies created by
light compression and heavy compression, respectively.
21. DATASETS(CONTD)
Celeb-DF: a large-scale DeepFake dataset. It contains 590 real
videos and 5,639 fake videos of celebrities. The face forgeries are
produced with an undisclosed, improved synthesis algorithm.
These realistic forgeries make DeepFake detection difficult.
22. DATASETS(CONTD)
DeepFake Detection Challenge (DFDC) [46]: a public faceswap
video dataset. It contains 1,131 real videos and 4,119 fake videos.
Six advanced faceswap algorithms are used to craft the fake videos.
The real videos are filmed in a variety of real-world scenes. Many
distractors such as dark lighting, extreme poses, and occlusion make
forgery detection challenging.
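The video counts quoted for the three datasets imply that real videos are the minority class in each. The sketch below computes the real-video share and simple inverse-frequency class weights. Using such weights during training is an assumption for illustration, not something the slides prescribe.

```python
# (real videos, fake videos) as quoted in the dataset descriptions
DATASETS = {
    "FF++": (1000, 4000),
    "Celeb-DF": (590, 5639),
    "DFDC": (1131, 4119),
}

def real_fraction(real, fake):
    """Share of real videos in the dataset."""
    return real / (real + fake)

def class_weights(real, fake):
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = real + fake
    return {"real": total / (2 * real), "fake": total / (2 * fake)}

for name, (real, fake) in DATASETS.items():
    print(name, round(real_fraction(real, fake), 3), class_weights(real, fake))
```

For FF++ this gives a real share of 0.2 and weights of 2.5 (real) and 0.625 (fake).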