2. What are deepfakes?
● The phenomenon gained its name from a user of the
platform Reddit, who went by the name “deepfakes”
(deep learning + fakes).
● This person shared the first deepfakes by placing
unknowing celebrities into adult video clips. This
triggered widespread interest in the Reddit
community and led to an explosion of fake content.
● The first targets of deepfakes were famous people,
including actors (e.g., Emma Watson and Scarlett
Johansson), singers (e.g., Katy Perry) and politicians
(e.g., President Obama)
3. How do deepfakes work?
● Deepfakes are commonly created using a specific deep network architecture
known as autoencoder.
● Autoencoders are trained to recognize key characteristics of an input image to
subsequently recreate it as their output. In this process, the network performs
heavy data compression.
● Autoencoders consist of three subparts:
- an encoder (recognizing key features of an input face)
- a latent space (representing the face as a compressed version)
- a decoder (reconstructing the input image from the compressed representation)
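The three subparts above can be sketched as a minimal (untrained) autoencoder. The layer sizes, linear weights, and variable names here are illustrative assumptions, not a real deepfake model:

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM = 64 * 64   # a flattened 64x64 grayscale face (assumption)
LATENT_DIM = 128      # the compressed latent space

# Random weights stand in for a trained encoder and decoder.
W_enc = rng.normal(scale=0.01, size=(LATENT_DIM, INPUT_DIM))
W_dec = rng.normal(scale=0.01, size=(INPUT_DIM, LATENT_DIM))

def encode(x):
    """Encoder: compress the input face into the latent space."""
    return np.tanh(W_enc @ x)

def decode(z):
    """Decoder: reconstruct a face from its latent representation."""
    return W_dec @ z

x = rng.random(INPUT_DIM)   # stand-in for a face image
z = encode(x)               # latent code: heavy data compression
x_hat = decode(z)           # reconstruction (meaningless while untrained)

print(z.shape, x_hat.shape)
```

Note the heavy compression the slide mentions: the latent code is 32 times smaller than the input here.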
4. What is an autoencoder and how is it used?
● An autoencoder is a type of neural network used for unsupervised learning. It
consists of an encoder and a decoder, and its primary purpose is to learn an
efficient representation (latent space) of the input data. The encoder compresses
the input data into a lower-dimensional space, and the decoder reconstructs the
input data from this compressed representation.
In the context of creating deepfakes, which involves superimposing one person's face onto another person's body in realistic-looking images or videos, using two separate autoencoders is not effective. Each autoencoder, trained independently on a different person, would learn features and representations specific to that individual. These representations are likely to be incompatible with each other, making it difficult to combine them seamlessly to generate deepfakes.
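The incompatibility can be illustrated with a toy sketch: two independently initialized encoders (standing in, as an assumption, for independently trained ones) map the very same face to different latent codes, so one person's latent means nothing to the other person's decoder:

```python
import numpy as np

rng = np.random.default_rng(1)
D, L = 16, 4   # illustrative input and latent sizes

# "Person A" autoencoder, built independently...
Wa_enc = rng.normal(size=(L, D))
Wa_dec = rng.normal(size=(D, L))
# ...and "person B" autoencoder, built independently too.
Wb_enc = rng.normal(size=(L, D))
Wb_dec = rng.normal(size=(D, L))

x = rng.random(D)            # the same input face for both encoders
z_a = Wa_enc @ x             # A's latent code for this face
z_b = Wb_enc @ x             # B's latent code for the same face

# The two latent spaces disagree about the same input, so feeding
# z_a into B's decoder produces an unrelated output, not a face swap.
latents_differ = not np.allclose(z_a, z_b)
print(latents_differ)
```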
5. The trick
● Training Individual Autoencoders:
Train an autoencoder for each person separately. Each autoencoder
consists of an encoder and a decoder.
The encoder in each autoencoder is responsible for compressing the facial
features of the respective person into a latent space.
● Sharing Encoder Architecture:
Design the encoder part of both autoencoders to have a similar
architecture. This could involve using the same neural network structure
or ensuring that the dimensions of the latent space are compatible.
● Creating Latent Space Representation:
Use the encoder from the first person's autoencoder to encode an image
of that person's face. This results in a compressed latent space
representation.
● Generating Fake Image:
Take the latent space representation obtained from the first person's
encoder and input it into the decoder of the second person's autoencoder.
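The four steps above can be sketched in a few lines: one shared encoder, two person-specific decoders, and a swap that routes person A's latent code into person B's decoder. All weights are random placeholders for trained networks (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
D, L = 16, 4   # illustrative input and latent sizes

W_shared_enc = rng.normal(size=(L, D))   # one encoder shared by both people
W_dec_a = rng.normal(size=(D, L))        # decoder trained on person A's faces
W_dec_b = rng.normal(size=(D, L))        # decoder trained on person B's faces

def encode(x):
    # Shared encoder: every face lands in the same latent space.
    return W_shared_enc @ x

face_a = rng.random(D)                   # stand-in for an image of person A
z = encode(face_a)                       # latent representation of A's face

reconstruction_a = W_dec_a @ z           # normal use: A's decoder rebuilds A
fake = W_dec_b @ z                       # the trick: B's decoder renders the
                                         # latent code as person B's face

print(z.shape, fake.shape)
```

Because both decoders read from the same latent space, the swap is dimensionally and semantically consistent, which is exactly what the two independent autoencoders of the previous slide lacked.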
6. The trick
● The shared-latent space assumption: the two heterogeneous images x₁ and x₂ can be mapped into the same latent representation z by a coupled VAE (comparability), while each latent representation can be reconstructed into its original image (completeness).
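In symbols, with E₁, E₂ for the encoders and G₁, G₂ for the decoders (notation assumed here, not taken from the slide), the shared-latent assumption reads:

```latex
% Comparability: both images map to the same latent code.
z = E_1(x_1) = E_2(x_2)
% Completeness: each decoder recovers its own image from z.
x_1 \approx G_1(z), \qquad x_2 \approx G_2(z)
```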
7. Popular example we all have seen
● Popular Indian actress Rashmika Mandanna’s deepfake, which went viral on social media not too long ago.
8. Takeaway
● Deepfakes can be used in positive and negative ways to manipulate content for
media, entertainment, marketing, and education.
● Deepfakes are not magic but are produced using AI techniques that can
generate highly believable fake content.