Deepfake Detection: The Importance of Training Data Preprocessing and Practical Considerations

Talk given at the AI4Media Workshop on GANs for Media Content Generation, October 1st 2020, https://ai4media.eu/events/gan-media-generation-workshop-oct-2020/

  1. DeepFake Detection: The Importance of Training Data Preprocessing and Practical Considerations
     Dr. Symeon (Akis) Papadopoulos – @sympap
     MeVer Team @ Information Technologies Institute (ITI) / Centre for Research & Technology Hellas (CERTH)
     Joint work with Polychronis Charitidis, George Kordopatis-Zilos and Yiannis Kompatsiaris
     AI4Media Workshop on GANs for Media Content Generation, Oct 1, 2020
     Media Verification (MeVer)
  2. DeepFakes
     • Content generated by AI that seems authentic to the human eye
     • Most common form: generation and manipulation of the human face
     Source: https://en.wikipedia.org/wiki/Deepfake
     Source: https://www.youtube.com/watch?v=iHv6Q9ychnA
     Source: Media Forensics and DeepFakes: an overview
  3. Manipulation types
     Facial manipulations can be categorised into four main groups:
     • Entire face synthesis
     • Attribute manipulation
     • Identity swap
     • Expression swap
     Source: DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection (Tolosana et al., 2020)
     Tolosana, R., et al. (2020). Deepfakes and beyond: A survey of face manipulation and fake detection. arXiv preprint arXiv:2001.00179.
     Verdoliva, L. (2020). Media forensics and deepfakes: an overview. arXiv preprint arXiv:2001.06564.
     Mirsky, Y., & Lee, W. (2020). The Creation and Detection of Deepfakes: A Survey. arXiv preprint arXiv:2004.11138.
  4. WeVerify Project
     • WeVerify aims to detect disinformation in social media and to expose misleading and fabricated content
     • Partners: Univ. Sheffield, OntoText, ATC, DW, AFP, EU DisinfoLab, CERTH
     • A key outcome is a platform for collaborative content verification, tracking, and debunking
     • Currently, we are developing a deepfake detection service for the WeVerify platform
     • Participation in the DeepFake Detection Challenge
     https://weverify.eu/
  5. DeepFake Detection Challenge
     • Goal: detect videos with facial or voice manipulations
     • 2,114 teams participated in the challenge
     • Log Loss error evaluation on public and private validation sets (metric sketched below)
       • Public evaluation contained videos with transformations similar to those in the training set
       • Private evaluation contained organic videos and videos with unknown transformations from the Internet
     • Our final standings:
       • public leaderboard: 49th (top 3%) with 0.295 Log Loss error
       • private leaderboard: 115th (top 5%) with 0.515 Log Loss error
     Source: https://www.kaggle.com/c/deepfake-detection-challenge
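For reference, a minimal sketch of the Log Loss (binary cross-entropy) metric used to rank submissions; the example values are illustrative, not challenge data.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy (Log Loss) over video-level predictions.
    y_true: 0 (REAL) or 1 (FAKE) per video; y_pred: predicted P(FAKE)."""
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true, dtype=float)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

# Illustrative: confident wrong answers are penalised heavily.
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))     # ~0.28
print(log_loss([1, 0, 1], [0.99, 0.01, 0.01]))  # ~1.54
```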
  6. DeepFake Detection Challenge – dataset
     • Dataset of more than 110k videos
     • Approx. 20k REAL; the rest are FAKE
     • FAKE videos generated from the REAL ones
     • Models used:
       • DeepFake AutoEncoder (DFAE)
       • Morphable Mask faceswap (MM/NN)
       • Neural Talking Heads (NTH)
       • FSGAN
       • StyleGAN
     Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., & Ferrer, C. C. (2020). The DeepFake Detection Challenge Dataset. arXiv preprint arXiv:2006.07397.
  7. Dataset preprocessing – Issues
     • Face dataset quality depends on face extraction accuracy (Dlib, MTCNN, facenet-pytorch, BlazeFace)
     • Generally, all face extraction libraries generate a number of false positive detections
     • Manual tuning can improve the quality of the generated dataset
     [Pipeline diagram: Video corpus → Frame extraction → Face extraction → Deep learning model]
  8. Noisy data creeping into the training set
     • Extracting faces at 1 fps from the Kaggle DeepFake Detection Challenge videos using the PyTorch implementation of MTCNN face detection (see the sketch below)
     • Observation: false detections are far fewer than true detections in a video
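A minimal sketch of the frame and face extraction step, assuming OpenCV for 1 fps frame sampling and the facenet-pytorch MTCNN detector; the paths, the 20% margin and the helper name are illustrative.

```python
# Sketch: sample frames at ~1 fps and detect faces with facenet-pytorch MTCNN.
import cv2
from PIL import Image
from facenet_pytorch import MTCNN

detector = MTCNN(keep_all=True, device='cuda')  # keep all faces per frame

def extract_faces(video_path, margin=0.2):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    faces, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % int(round(fps)) == 0:            # ~1 frame per second
            img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            boxes, probs = detector.detect(img)
            if boxes is not None:
                for (x1, y1, x2, y2), p in zip(boxes, probs):
                    w, h = x2 - x1, y2 - y1             # enlarge box by the margin
                    crop = img.crop((int(x1 - margin * w), int(y1 - margin * h),
                                     int(x2 + margin * w), int(y2 + margin * h)))
                    faces.append((frame_idx, crop, p))  # may include false positives
        frame_idx += 1
    cap.release()
    return faces
```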
  9. Our “noise” filtering approach
     • Compute face embeddings for each detected face in the video
     • Similarity calculation between all face embeddings in a video → similarity graph construction
     • Nodes represent faces; two faces are connected if their similarity is greater than 0.8 (solid lines)
     • Drop components smaller than N/2 (e.g. component 2), where N is the number of frames that contain face detections (true or false)
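A minimal sketch of this filtering step, assuming facenet-pytorch's InceptionResnetV1 for the embeddings and networkx for connected components; the 0.8 threshold follows the slide, while the function names and data layout are illustrative.

```python
# Sketch: build a similarity graph over face embeddings and keep only
# components with at least N/2 faces, where N is the number of frames
# that contain detections.
import networkx as nx
import torch
from facenet_pytorch import InceptionResnetV1

embedder = InceptionResnetV1(pretrained='vggface2').eval()

def filter_faces(face_tensors, frame_ids, threshold=0.8):
    """face_tensors: (M, 3, 160, 160) aligned face crops from one video;
    frame_ids: frame index of each crop. Returns indices of kept faces."""
    with torch.no_grad():
        emb = embedder(face_tensors)                    # (M, 512) embeddings
    emb = torch.nn.functional.normalize(emb, dim=1).numpy()
    sims = emb @ emb.T                                  # cosine similarities

    g = nx.Graph()
    g.add_nodes_from(range(len(emb)))
    for i in range(len(emb)):
        for j in range(i + 1, len(emb)):
            if sims[i, j] > threshold:                  # connect similar faces
                g.add_edge(i, j)

    n_frames = len(set(frame_ids))                      # frames with detections
    keep = []
    for comp in nx.connected_components(g):
        if len(comp) >= n_frames / 2:                   # drop small components
            keep.extend(sorted(comp))
    return keep
```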
  10. Advantages
      • Simple and fast procedure
      • No need for manual tuning of the face extraction settings
      • Clusters of distinct faces in cases of multiple persons in the video
      • This information can be utilized in various ways (e.g. predictions per face)
      [Figure: faces extracted from multiple video frames, grouped into Component 1 and Component 2]
  11. Experiments
      • We trained multiple DeepFake detection models on the DFDC dataset with and without (baseline) our proposed approach
      • Three evaluation datasets: a) Celeb-DF, b) FaceForensics++, c) DFDC subset
      • For evaluation we examined two aggregation approaches:
        • avg: prediction is the average of all face predictions
        • face: prediction is the maximum among the per-face average predictions
      • Results for the EfficientNet-B4 model in terms of Log Loss error:

      Pre-processing   Celeb-DF (avg / face)   FaceForensics++ (avg / face)   DFDC (avg / face)
      baseline         0.510 / 0.511           0.563 / 0.563                  0.213 / 0.198
      proposed         0.458 / 0.456           0.497 / 0.496                  0.195 / 0.173
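A minimal sketch of the two aggregation strategies (avg and face), assuming per-face fake probabilities and the face clusters produced by the filtering step; the data layout and example numbers are illustrative.

```python
# Sketch of the two video-level aggregations used in the evaluation:
# "avg"  - mean over all face predictions in the video
# "face" - mean per face cluster (person), then take the maximum
import numpy as np

def aggregate_avg(preds):
    """preds: list of per-face fake probabilities for one video."""
    return float(np.mean(preds))

def aggregate_face(preds, components):
    """components: cluster id (person) for each prediction, as produced
    by the similarity-graph filtering step."""
    preds, components = np.asarray(preds), np.asarray(components)
    per_face = [preds[components == c].mean() for c in np.unique(components)]
    return float(max(per_face))

# Illustrative: one manipulated person among two in the same video.
preds      = [0.9, 0.8, 0.1, 0.2]
components = [0,   0,   1,   1  ]
print(aggregate_avg(preds))               # 0.5
print(aggregate_face(preds, components))  # 0.85
```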
  12. Our DFDC approach – details
      • Applied the proposed preprocessing approach to clean the generated face dataset
      • Face augmentation:
        • Horizontal & vertical flip, random crop, rotation, image compression, Gaussian & motion blurring, brightness, saturation & contrast transformations
      • Trained three different models: a) EfficientNet-B3, b) EfficientNet-B4, c) I3D*
      • Models trained at face level:
        • I3D trained with 10 consecutive face images, exploiting temporal information
        • EfficientNet models trained on single face images
      • Per model (see the sketch below):
        • Added two dense layers with dropout after the backbone architecture, with 256 and 1 units
        • Used the sigmoid activation for the last layer
      * ignoring the optical flow stream
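A minimal sketch of the described model head, assuming the timm library for the EfficientNet-B4 backbone; the dropout rate and the activation between the two dense layers are illustrative assumptions.

```python
# Sketch: EfficientNet-B4 backbone with the head described on the slide
# (two dense layers of 256 and 1 units with dropout, sigmoid output).
import timm
import torch.nn as nn

class DeepFakeClassifier(nn.Module):
    def __init__(self, backbone='tf_efficientnet_b4', dropout=0.5):
        super().__init__()
        # num_classes=0 makes timm return pooled backbone features
        self.backbone = timm.create_model(backbone, pretrained=True, num_classes=0)
        feat_dim = self.backbone.num_features      # 1792 for EfficientNet-B4
        self.head = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(256, 1),
            nn.Sigmoid(),                           # P(FAKE) per face crop
        )

    def forward(self, x):                           # x: (B, 3, H, W) face crops
        return self.head(self.backbone(x))
```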
  13. Our DFDC approach – inference
      [Pipeline diagram: pre-processing → model inference → post-processing]
  14. Lessons from other DFDC teams
      • Most approaches ensemble multiple EfficientNet architectures (B3–B7), some of them trained with different seeds
      • ResNeXt was another architecture used by top-performing solutions, combined with 3D architectures such as I3D, 3D ResNet34, MC3 & R(2+1)D
      • Several approaches increased the margin of the detected facial bounding box to further improve results
        • We used an additional margin of 20%, but other works proposed a higher proportion
      • To improve generalization (see the sketch below):
        • Domain-specific augmentations: a) half-face removal, horizontally or vertically, b) landmark (eyes, nose, or mouth) removal
        • Mixup augmentation
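A minimal sketch of two of the generalization tricks listed above (half-face removal and mixup); the probabilities, the beta parameter and the function names are illustrative and not taken from any specific team's code.

```python
# Sketch of two generalization tricks: half-face removal and mixup.
import numpy as np
import torch

def remove_half_face(img):
    """img: HxWxC uint8 face crop. Blacks out a random half of the face."""
    img = img.copy()
    h, w = img.shape[:2]
    side = np.random.randint(4)
    if side == 0:   img[:, :w // 2] = 0   # left half
    elif side == 1: img[:, w // 2:] = 0   # right half
    elif side == 2: img[:h // 2, :] = 0   # top half
    else:           img[h // 2:, :] = 0   # bottom half
    return img

def mixup(x, y, alpha=0.2):
    """x: (B, C, H, W) face tensors, y: (B,) float labels in [0, 1].
    Returns convex combinations of random pairs of samples and labels."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```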
  15. Practical challenges
      • Limited generalization
        • This observation applies to most submissions: the winning team scored 0.20336 on the public validation set but only 0.42798 on the private one (Log Loss)
      • Overfitting
        • The best submission on the public leaderboard scored 0.19207, but its private evaluation error was 0.57468, leading to the 904th position!
      • Broad problem scope
        • The term DeepFake may refer to every possible manipulation and generation
        • Constantly increasing number of manipulation and generation techniques
        • A detector is only trained with a subset of these manipulations
  16. DeepFake Detection in the Wild
      • Videos in the wild usually contain multiple scenes
        • Only a subset of these scenes may contain DeepFakes
        • The detection process might be slow for multi-shot videos (even short ones)
      • Low-quality videos
        • Low-quality faces tend to fool classifiers
      • Small detected and fast-moving faces
        • Usually lead to noisy predictions
      • Changes in the environment
        • Moving obstacles in front of the faces
        • Changes in lighting
  17. DeepFake Detection Service @ WeVerify
      https://www.youtube.com/watch?v=cVljNVV5VPw&ab_channel=TheFakening
  18. More details at TTO 2020
      Charitidis, P., Kordopatis-Zilos, G., Papadopoulos, S., & Kompatsiaris, Y. (2020). Investigating the impact of preprocessing and prediction aggregation on the DeepFake detection task. Proceedings of the Conference for Truth and Trust Online (TTO) [to appear], https://arxiv.org/abs/2006.07084
      https://truthandtrustonline.com/
  19. Thank you!
      Dr. Symeon Papadopoulos
      papadop@iti.gr
      @sympap
      Media Verification (MeVer): https://mever.iti.gr/
      @meverteam
      https://ai4media.eu/
      https://weverify.eu/