Video Face Manipulation
Detection Through
Ensemble of CNNs
Paper Tutorial: Chris Chien
N. Bonettini, E. D. Cannas, S. Mandelli, L. Bondi, P. Bestagini and S. Tubaro, "Video Face Manipulation
Detection Through Ensemble of CNNs," 2020 25th International Conference on Pattern Recognition (ICPR), 2021,
pp. 5012-5019, doi: 10.1109/ICPR48806.2021.9412711.
Paper Summary
● Motivation: tackle the detection of modern face manipulation and run the
detection in a more efficient way.
● Methodology:
○ Ensemble of CNNs (EfficientNet models) trained with attention layers and
siamese training.
● Future Work:
○ Embedding of temporal info
○ Voting schemes of the ensemble models
Related Work
Fake Face Detection Algorithms
The previous networks/methods designed for fake face detection:
● MesoNet: a relatively shallow CNN detecting fake faces
● XceptionNet
● LSTM: extract a series of frame-based features
● Warping traces models
● Eye blinking analysis
● Semantic analysis of the frames
● Inconsistent lighting effects
Solution
Dataset
● FF++:
○ Data Variety: Face2Face, FaceSwap, DeepFakes,
Neural Textures techniques.
○ Data Volume: Each method is applied to 1,000
pristine videos from YouTube, each with at least
280 frames.
○ Data Format: Videos are compressed using the
H.264 codec.
○ Data Split: 720 videos for training, 140 for validation,
and 140 for testing.
● DFDC
○ Data Variety: Different DeepFake techniques.
Includes diversity in terms of gender,
skin tone, and age.
○ Data Volume: 119,000 videos, each with
roughly 300 frames. Unbalanced dataset:
100,000 are fakes.
○ Data Split: 35 folders for training, 5 folders for
validation, and the last 10 folders for testing.
Data Preprocessing
● Select 32 frames from each video for the training set: this number helps limit overfitting,
and using more frames does not improve model performance.
● Extract faces from each frame using the BlazeFace extractor, which is faster than the MTCNN
detector. If a frame contains more than one face, keep only the one with the highest confidence score.
● Data augmentation: downscaling, horizontal flipping, random brightness and contrast, hue and saturation,
noise addition, and finally JPEG compression (Albumentations library).
Pipeline: 32 frames → extract faces using BlazeFace → run data augmentation
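The frame-selection and face-selection steps above can be sketched as follows. The source does not say how the 32 frames are chosen, so evenly spaced sampling is an assumption. `pick_frame_indices` and `pick_best_face` are hypothetical helper names, and the detections are stand-in (confidence, box) tuples rather than real BlazeFace output.

```python
def pick_frame_indices(total_frames, num_frames=32):
    """Return up to `num_frames` evenly spaced frame indices (assumed strategy)."""
    if total_frames <= 0:
        return []
    if total_frames <= num_frames:
        # Short video: use every frame.
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]


def pick_best_face(detections):
    """Keep only the detection with the highest confidence score.

    `detections` is a list of (confidence, box) tuples; returns None if empty.
    """
    if not detections:
        return None
    return max(detections, key=lambda det: det[0])


# A 300-frame video (roughly the DFDC length quoted above) and two fake detections.
indices = pick_frame_indices(300)
best = pick_best_face([(0.71, (10, 10, 50, 50)), (0.98, (80, 20, 140, 90))])
```

Evenly spaced sampling spreads the selected faces over the whole clip; random sampling would be an equally plausible reading of the slide.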
Model Training
Solution = Ensemble CNN + Attention Mechanism + Siamese Paradigm
Keep computational complexity at bay:
● Analyze 4,000 videos in less than 9 hours using at most a single NVIDIA
P100 GPU.
● The trained model must occupy less than 1 GB of disk space.
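As a quick worked example, the time budget above translates into a per-video allowance. This is only arithmetic on the two numbers quoted in the slide; the variable names are illustrative.

```python
videos = 4000  # videos to analyze, from the slide
hours = 9      # upper bound on total run time, from the slide

# Average time available per video, including decoding and face extraction.
seconds_per_video = hours * 3600 / videos
print(f"{seconds_per_video:.1f} s per video")
```

So the whole pipeline has roughly 8 seconds per video on a single GPU.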
Ensembling Process
● Why use an ensemble?
Train classifiers that capture high-level semantic information that complements one
another.
● How?
1. Model Architecture: EfficientNet (reasoning: a good trade-off between
model size, run time (FLOPS cost), and accuracy).
2. Attention Mechanism: makes the network explainable → shows which part of the
frame is manipulated.
3. Network Training Strategy: Siamese training.
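One simple way to combine the trained models is to average their scores. The sketch below assumes each model outputs one fake-probability per frame and that a plain mean is taken over frames and then over models. The function name, the example scores, and the 0.5 threshold are illustrative and not taken from the source.

```python
def ensemble_video_score(per_model_frame_scores):
    """Average frame scores within each model, then average across models."""
    if not per_model_frame_scores:
        raise ValueError("need at least one model")
    model_means = []
    for frame_scores in per_model_frame_scores:
        if not frame_scores:
            raise ValueError("each model needs at least one frame score")
        model_means.append(sum(frame_scores) / len(frame_scores))
    return sum(model_means) / len(model_means)


scores = [
    [0.9, 0.8, 0.7],  # hypothetical model 1, three frames
    [0.6, 0.7, 0.8],  # hypothetical model 2, same three frames
]
video_score = ensemble_video_score(scores)
is_fake = video_score > 0.5  # illustrative decision threshold
```

Other combination rules, such as the voting schemes listed under future work, would replace the final mean.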
What Is EfficientNet?
● The design of EfficientNet relies on architecture scaling of a baseline
CNN.
● The scaling balances three dimensions: network width, network depth, and
image resolution.
Image Source: EfficientNet: Rethinking
Model Scaling for Convolutional Neural
Networks
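The balancing of depth, width, and resolution can be illustrated with the compound scaling rule from the EfficientNet paper: depth scales as α^φ, width as β^φ, and resolution as γ^φ, with α·β²·γ² ≈ 2 so that FLOPS grow roughly as 2^φ. The constants below are the ones that paper reports for its baseline network. The function name is hypothetical, and the published B1 to B7 variants use rounded settings, so this sketch shows the rule rather than the exact B4 configuration.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPS grow linearly with depth and quadratically with width and resolution.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{f:.2f}")
```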
What Is Attention Layer?
Inputs: query (Q), key (K), and value (V) matrices
Attention(Q, K, V) = softmax((Q*K^T)/(dK)^0.5) * V
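The formula above can be traced on a toy example. This is a minimal pure-Python sketch of generic scaled dot-product attention, where each query row is scored against every key row, the scores are divided by the square root of the key dimension, and a softmax turns them into weights over the value rows. The matrices are made-up inputs. The attention block inside EfficientNetB4Att, which the explainability slide describes as ending in a sigmoid layer, may be built differently.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for matrices given as lists of rows."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
```

Each output row is a convex combination of the rows of V, pulled toward the value whose key best matches the query.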
What Is Siamese Training? - Pros and Cons
It compares the similarity for a pair of input images.
Pros:
● It is designed for scalable systems (one-shot learning) in use cases
such as facial recognition, place registration, and signature verification.
Cons:
● Training can take longer, because the number of image pairs to learn from grows quadratically.
What Is Siamese Training? - Siamese Model Architecture
Diagram: a real image and a fake image pass through Network 1 and Network 2, which share
network weights. The resulting real-image embedding and fake-image embedding feed the loss function.
What Is Siamese Training? - Siamese Model Loss Functions
Inputs: a real image (anchor, A), a second real image (positive, P), and a fake image (negative, F).
Option 1. Triplet Loss Function =
max(0, D(A, P) - D(A, F) + Margin)
● D(A, P): anchor and positive should get closer.
● D(A, F): anchor and negative should get far from each other.
Option 2. Contrastive Loss Function, for a pair of images A and B with label Y
(0 = same class, 1 = different class) =
(1-Y)*0.5*D(A, B)^2 +
(Y)*0.5*{max(0, Margin - D(A, B))}^2
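Both loss options can be checked on toy embeddings. The sketch below uses the common textbook forms: the triplet loss is clamped at zero with max(0, ·), and the contrastive loss uses squared distances. Euclidean distance for D, the default margin of 1.0, and the example vectors are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def contrastive_loss(a, b, y, margin=1.0):
    """y = 0 pulls a similar pair together; y = 1 pushes a dissimilar pair apart."""
    d = euclidean(a, b)
    return (1 - y) * 0.5 * d ** 2 + y * 0.5 * max(0.0, margin - d) ** 2


anchor, real, fake = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]
good = triplet_loss(anchor, real, fake)     # real is closer than fake: no penalty
bad = triplet_loss(anchor, fake, real)      # roles swapped: penalized
pull = contrastive_loss(anchor, real, y=0)  # similar pair that is still far apart
```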
EfficientNetB4 with the Attention Layer
Standard EfficientNetB4:
● Model Size: 19 million parameters
● Model Operations: 4.2 billion FLOPS
● Model Performance: 83.8% top-1 accuracy on ImageNet
Figure: EfficientNetB4Att architecture
Siamese Network Training
source: paper content
Other Training Info
● Hyperparameters: models are trained with the Adam optimizer, with
β1 = 0.9, β2 = 0.999, ε = 10^-8, and an initial learning rate of 10^-5.
● HW for Training: Intel Xeon E5-2687W-v4 and an NVIDIA Titan V.
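To show where each of these hyperparameters enters, here is a single-parameter sketch of the standard Adam update rule, written in plain Python instead of a deep learning framework. The starting parameter value and gradient are made up, and `adam_step` is a hypothetical helper name.

```python
import math

BETA1, BETA2, EPS, LR = 0.9, 0.999, 1e-8, 1e-5  # values quoted on the slide


def adam_step(param, grad, m, v, t):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = BETA1 * m + (1 - BETA1) * grad       # running mean of gradients
    v = BETA2 * v + (1 - BETA2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - BETA1 ** t)             # bias correction for the zero start
    v_hat = v / (1 - BETA2 ** t)
    param = param - LR * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to 1 in magnitude, so the parameter moves by about one learning rate (10^-5) regardless of the gradient scale.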
Evaluation Results
EfficientNetB4Att Explainability
Select the output of the Sigmoid layer in the attention block, which is a 2D map
of size 28 × 28. Then upscale it to the input face size (224 × 224) and
superimpose it on the input face.
source: paper content
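The upscale-and-superimpose step can be sketched in plain Python. The interpolation method and blending rule are not stated in the slide, so nearest-neighbor repetition (224 / 28 = 8 copies per cell) and a 50/50 alpha blend are assumptions. The attention map and the grayscale face below are synthetic placeholders.

```python
def upscale_nearest(attn, factor=8):
    """Repeat every cell `factor` times along both axes (nearest-neighbor)."""
    out = []
    for row in attn:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def overlay(face, heat, alpha=0.5):
    """Blend a grayscale face with the heat map, pixel by pixel."""
    return [
        [(1 - alpha) * f + alpha * h for f, h in zip(face_row, heat_row)]
        for face_row, heat_row in zip(face, heat)
    ]


# Synthetic 28 x 28 map with values in [0, 1], as a sigmoid output would have.
attn = [[(r + c) / 54 for c in range(28)] for r in range(28)]
heat = upscale_nearest(attn)               # 224 x 224
face = [[0.5] * 224 for _ in range(224)]   # placeholder mid-gray face crop
blended = overlay(face, heat)
```

Bright regions in the blended image then mark the areas the attention block weights most strongly.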
