1. Detecting image splicing in the wild (Web)
Markos Zampoglou, Symeon Papadopoulos, Yiannis Kompatsiaris
Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
WeMuV2015 workshop, ICME, June 29, 2015, Turin, Italy
3. Blind image splicing detection
• Assume the splice differs in some aspect from the rest of the image
– Capture invisible “traces”: DCT coefficient distribution, PRNU, CFA interpolation patterns…
• But these traces degrade with subsequent image alterations
• Social media journalism establishes a different paradigm from typical image forensics
– We don’t have the luxury of demanding to see the originals
5. Images in the wild
• Twitter:
– Images larger than 2048×1024 are scaled down
– Large PNG files (> 3 MB) are converted to JPEG
– JPEG files are resaved at quality 75
• Facebook:
– Images larger than 2048×2048 are scaled down
– Large PNG files are converted to JPEG
– JPEG files are resaved at varying quality (~70–90)
• Both platforms also erase metadata from images
6. Existing image splicing datasets
Name | Format | Masks | #images
Columbia [1] | BMP grayscale | No | 933/912
Columbia Unc. [2] | Uncompressed TIFF | Yes | 183/180
CASIA TIDE v2.0 [3] | Uncompressed TIFF, JPEG, BMP | No | 7491/5123
VIPP Synthetic [4] | JPEG | Yes | 4800/4800
VIPP Realistic [4] | JPEG | Manual | 63/68
[1] http://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm
[2] http://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp/
[3] http://forensics.idealtest.org:8080/indexopt_v2.php
[4] http://clem.dii.unisi.it/~vipp/index.php/imagerepository/129-a-framework-for-decision-fusion-in-image-forensics-based-on-dempster-shafer-theory-of-evidence
7. Issues with existing datasets
• Ground-truth masks: only Columbia Uncompressed and VIPP offer binary masks
• Quality of splices: only CASIA and VIPP Realistic contain realistic forgeries
• Image format: only VIPP and CASIA offer JPEG images
– At least 87% of the images in the Common Crawl corpus (http://commoncrawl.org/) are JPEGs
– Of the 13,577 forged images collected in our investigation, ~95% were in JPEG format
• Neatness: all datasets contain first-level forgeries with no further alterations
8. Collecting a dataset of Web forgeries
• Aim: build an evaluation framework with the web-based case in mind
– Evaluate existing and future algorithms against the real-world, web-based application scenario
– Assess the status of the web: how many versions of each forgery exist, and how close they remain to the original
• Methodology: identify verified forgeries and exhaustively download as many instances as possible for analysis
9. The Wild Web Dataset (1/5)
• Identified 82 cases of confirmed forgeries
10. The Wild Web Dataset (2/5)
• Collected all detectable instances of each case
• Removed exact file duplicates (see the sketch below)
• 13,577 images in total
• Identified and removed heavily altered variants of each case
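A minimal sketch of the exact-duplicate removal step, assuming deduplication by cryptographic file hash (the slides do not specify the mechanism):

```python
# Remove exact file duplicates by hashing raw file bytes.
# MD5 is an assumption; any cryptographic hash works for exact matching.
import hashlib
from pathlib import Path

def remove_exact_duplicates(paths):
    seen, unique = set(), []
    for p in map(Path, paths):
        digest = hashlib.md5(p.read_bytes()).hexdigest()
        if digest not in seen:  # keep only the first file with each digest
            seen.add(digest)
            unique.append(p)
    return unique
```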
11. The Wild Web Dataset (3/5)
• By removing crops and post-splices, we were left with 9,751 images
• Variants within cases were separated, and the source images were gathered where possible
12. The Wild Web Dataset (4/5)
• Designed ground-truth binary masks for each sub-case, corresponding to each possible forgery step (for complex forgeries)
13. The Wild Web Dataset (5/5)
• The final dataset by the numbers:
– 82 cases of forgeries
– 92 forgery variants
– 101 unique masks
– 13,577 images total
– 9,751 images resembling the original forgery
• For each of the 82 cases, a match on any mask of any variant should be considered an overall success (sketched below)
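A minimal sketch of this case-level success rule; the nested data layout, `detector`, and `match_fn` are hypothetical names used only for illustration:

```python
# A case counts as an overall success if any collected image matches any
# ground-truth mask of any of the case's forgery variants.
def case_detected(case, detector, match_fn):
    for variant in case["variants"]:          # 92 variants across 82 cases
        for mask in variant["masks"]:         # 101 unique masks in total
            for image in variant["images"]:
                if match_fn(detector(image), mask):
                    return True               # one match suffices
    return False
```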
14. Experimental evaluations
• Emulated real-world conditions: we applied the minimum typical transformations (JPEG resave & rescaling) to the datasets compatible with the task:
– Columbia Uncompressed
– VIPP Synthetic
– VIPP Realistic
• Two transformation sets were applied (see the sketch below):
– Set 1: JPEG recompression at quality 75
– Set 2: rescaling to 75% size, followed by JPEG recompression at quality 75
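A minimal sketch of the two transformation sets, using Pillow as an assumed tool (the slides do not name the software used):

```python
# Emulate the minimum typical Web transformations on a dataset image.
from PIL import Image

def set1_jpeg_resave(src_path, dst_path, quality=75):
    """Set 1: JPEG recompression at quality 75."""
    img = Image.open(src_path).convert("RGB")
    img.save(dst_path, "JPEG", quality=quality)

def set2_rescale_then_resave(src_path, dst_path, scale=0.75, quality=75):
    """Set 2: rescale to 75% size, then JPEG recompression at quality 75."""
    img = Image.open(src_path).convert("RGB")
    new_size = (int(img.width * scale), int(img.height * scale))
    img = img.resize(new_size, Image.LANCZOS)  # resampling filter is an assumption
    img.save(dst_path, "JPEG", quality=quality)
```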
15. Reconsidering evaluation protocols (1/3)
• Forgery localization algorithms typically produce a value map
• Ground truth takes the form of a binary mask signifying the tampered area
• Past approaches compare the values under the mask to those over the rest of the image (see the sketch below):
– Kolmogorov–Smirnov (KS) statistic (Farid, 2009)
– Median value (Fontani et al, 2013)
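A minimal sketch of these two mask-based scores, assuming the output map and ground-truth mask are same-sized NumPy arrays:

```python
# Compare output-map values under the tampered-area mask to the rest of the image.
import numpy as np
from scipy.stats import ks_2samp

def mask_scores(value_map, mask):
    inside = value_map[mask > 0]    # values under the mask
    outside = value_map[mask == 0]  # values over the rest of the image
    ks_stat, _ = ks_2samp(inside, outside)  # Kolmogorov-Smirnov statistic
    return {
        "ks": float(ks_stat),
        "median_inside": float(np.median(inside)),
        "median_outside": float(np.median(outside)),
    }
```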
17. Reconsidering evaluation protocols (3/3)
• This would be considered a good detection under typical methodologies:
– Median under mask: ~0.93
– Median outside mask: ~0.02
– KS statistic: ~0.41
• Any human evaluator would disagree
18. Proposed evaluation protocol (1/2)
1. Take the output value map
2. Binarize it according to some method-appropriate threshold
– e.g. 0.5 for probabilistic methods
3. Compare the binary map A to the ground-truth mask M:
$E(A, M) = \frac{|A \cap M|^2}{|A| \times |M|}$
4. Values of E above an experimental threshold (0.65) suggest a strong match (see the sketch below)
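A minimal sketch of steps 1–4, assuming the value map and ground-truth mask are NumPy arrays with values in [0, 1]:

```python
import numpy as np

def evaluate(value_map, mask, bin_threshold=0.5, match_threshold=0.65):
    A = value_map >= bin_threshold         # step 2: binarize the output map
    M = mask > 0                           # ground-truth tampered region
    if A.sum() == 0 or M.sum() == 0:
        return 0.0, False                  # empty map or mask: no match
    inter = np.logical_and(A, M).sum()     # |A intersect M|
    e = inter ** 2 / (A.sum() * M.sum())   # step 3: E(A, M)
    return float(e), e >= match_threshold  # step 4: strong match if E >= 0.65
```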
19. Proposed evaluation protocol (2/2)
• Adapt to mimic a human evaluator’s perspective (see the sketch below):
1. Apply multiple morphological processing operations
2. Try multiple (method-appropriate) thresholds
3. Keep the best-fitting result (a deliberate bias towards success)
• For non-spliced images (true negative/false positive detection), apply the same methodology and declare a success for a blank binary map
– Main disadvantage: binary outcome, with no parameters to tweak for ROC curve generation
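A minimal sketch of the best-fit search; the particular morphological operations and threshold grid are assumptions, since the slides leave both method-appropriate:

```python
# Try multiple thresholds and morphological variants of the binary map and
# keep the highest E(A, M) score, deliberately biasing towards success.
import numpy as np
from scipy import ndimage

def best_fit_e(value_map, mask, thresholds=(0.3, 0.5, 0.7)):
    M = mask > 0
    variants = [
        lambda b: b,                                      # untouched map
        lambda b: ndimage.binary_opening(b),              # remove speckle
        lambda b: ndimage.binary_closing(b),              # fill small holes
        lambda b: ndimage.binary_dilation(b, iterations=3),
    ]
    best = 0.0
    for t in thresholds:
        for op in variants:
            A = op(value_map >= t)
            if A.sum() and M.sum():
                inter = np.logical_and(A, M).sum()
                best = max(best, float(inter ** 2 / (A.sum() * M.sum())))
    return best
```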
20. Evaluations
• Evaluated seven algorithms:
– Double JPEG quantization (Lin et al, 2009), (Bianchi et al, 2011), (Bianchi et al, 2012a)
– Non-aligned double JPEG quantization (Bianchi et al, 2012b)
– CFA artifacts (Ferrara et al, 2012)
– High-frequency discrete wavelet (DW) noise (Mahdian et al, 2009)
– JPEG ghosts (Farid, 2009) – sketched below
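Of the seven, JPEG ghosts lend themselves to the shortest illustration. A simplified sketch of the core idea from (Farid, 2009), omitting the block averaging and normalization of the full method:

```python
# Resave the image at a range of JPEG qualities; a spliced region that was
# originally saved at a lower quality than its host shows a local minimum
# (a "ghost") in the difference map at that quality.
import io
import numpy as np
from PIL import Image

def jpeg_ghost_maps(img, qualities=range(40, 100, 5)):
    img = img.convert("RGB")
    orig = np.asarray(img, dtype=np.float64)
    maps = {}
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, "JPEG", quality=q)  # resave at candidate quality q
        buf.seek(0)
        resaved = np.asarray(Image.open(buf), dtype=np.float64)
        maps[q] = ((orig - resaved) ** 2).mean(axis=2)  # per-pixel squared difference
    return maps  # inspect the maps for regions showing a local minimum
```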
23. Evaluation results: Emulated datasets (4/4)
• Methods behave generally as expected
– CFA patterns are destroyed by the first JPEG compression
• (Mahdian et al, 2009) is not particularly effective, but shows little vulnerability to alterations
• DQ methods show some degree of robustness, to recompression only
• Rescaling is extremely disruptive, as expected
24. Evaluation results: Wild Web dataset (1/2)
• 36 out of 82 cases were successfully detected by at least one method
– Not a single image gave good results for the other 46 cases, for any algorithm
Method | Detections | Unique
(Lin et al, 2009) | 13 | 4
(Bianchi et al, 2011) | 12 | 1
(Ferrara et al, 2012) | 1 | 0
(Bianchi et al, 2012a) | 8 | 1
(Bianchi et al, 2012b) | 5 | 2
(Mahdian et al, 2009) | 15 | 6
(Farid, 2009) | 29 | 10
25. Evaluation results: Wild Web dataset (2/2)
• The noise-based method of (Mahdian et al, 2009) proved disproportionately successful
– We should not forget how prone to false positives it is
• JPEG ghosts are very robust, if we can manage the amount of output they produce
• Even in the cases where successful detection occurred, only a few images were correctly detected:
– 1,386 images in the entire dataset (~14.3%)
– Excluding the three easiest classes, only 333 out of 8,580 images were detected (~3.9%)
30. Conclusions
• On the web, very few images retain traces detectable with today’s state-of-the-art forensic approaches
• It is difficult to estimate the relative age of each instance of a viral image
• DQ-based methods give results with the highest confidence, but are not particularly robust
• JPEG ghosts demonstrate significantly higher robustness than other methods, but produce large amounts of noisy output
• DW high-frequency noise also appears to give good results, but seems extremely prone to false positives
31. Future steps
• For the web journalism case, robustness ought to be a central consideration in future algorithm evaluations
• The Wild Web dataset is freely distributed for research purposes
– Due to copyright considerations, this is currently only feasible through direct contact
– The dataset should be maintained to incorporate new cases of forgeries as they emerge
• Advance the state of the art by focusing on more robust traces of splicing
• Following the life-cycle of images on the web can help locate their earliest versions and build an account of the alterations that have taken place (Kennedy & Chang, 2008)
• The question remains: to what extent is the task feasible? When can we be certain that all traces have been lost?
32. References
• Bianchi, Tiziano, Alessia De Rosa, and Alessandro Piva. "Improved DCT coefficient analysis for forgery localization in JPEG images." In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 2444-2447. IEEE, 2011.
• Bianchi, Tiziano, and Alessandro Piva. "Image forgery localization via block-grained analysis of JPEG artifacts." IEEE Transactions on Information Forensics and Security 7, no. 3 (2012): 1003-1017. [2012a]
• Bianchi, Tiziano, and Alessandro Piva. "Detection of nonaligned double JPEG compression based on integer periodicity maps." IEEE Transactions on Information Forensics and Security 7, no. 2 (2012): 842-848. [2012b]
• Farid, Hany. "Exposing digital forgeries from JPEG ghosts." IEEE Transactions on Information Forensics and Security 4, no. 1 (2009): 154-160.
• Ferrara, Pasquale, Tiziano Bianchi, Alessia De Rosa, and Alessandro Piva. "Image forgery localization via fine-grained analysis of CFA artifacts." IEEE Transactions on Information Forensics and Security 7, no. 5 (2012): 1566-1577.
• Fontani, Marco, Tiziano Bianchi, Alessia De Rosa, Alessandro Piva, and Mauro Barni. "A framework for decision fusion in image forensics based on Dempster–Shafer theory of evidence." IEEE Transactions on Information Forensics and Security 8, no. 4 (2013): 593-607.
• Kennedy, Lyndon, and Shih-Fu Chang. "Internet image archaeology: automatically tracing the manipulation history of photographs on the web." In Proceedings of the 16th ACM International Conference on Multimedia, pp. 349-358. ACM, 2008.
• Lin, Zhouchen, Junfeng He, Xiaoou Tang, and Chi-Keung Tang. "Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis." Pattern Recognition 42, no. 11 (2009): 2492-2501.
• Mahdian, Babak, and Stanislav Saic. "Using noise inconsistencies for blind image forensics." Image and Vision Computing 27, no. 10 (2009): 1497-1503.