Detecting image splicing in the wild Web

Detecting image splicing in the wild (Web)
Markos Zampoglou, Symeon Papadopoulos, Yiannis Kompatsiaris
Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
WeMuV2015 workshop, ICME, June 29, 2015, Turin, Italy
A new journalistic paradigm
#2
…and its pitfalls
Blind image splicing detection
• Assume the splice differs in some aspect from the
rest of the image
– Capture invisible “traces”: DCT coefficient distribution,
PRNU, CFA interpolation patterns…
• But traces degrade with each subsequent image alteration
• Social media journalism establishes a different
paradigm from typical image forensics
– We don’t have the luxury of demanding to see the
originals
#3
Image tampering lifecycle
#4
Images in the wild
#5
• Twitter:
– Images larger than 2048×1024 are scaled down
– Large PNG files (> 3MB) converted to JPEG
– JPEG files resaved at quality 75
• Facebook:
– Images larger than 2048 × 2048 are scaled down
– Large PNG files converted to JPEG
– JPEG files resaved at varying quality (~70-90)
• Both media platforms also erase metadata from
images
Existing image splicing datasets
#6
Name                  Format                  Masks    #images
Columbia [1]          BMP grayscale           No       933/912
Columbia Unc. [2]     TIFF Unc.               Yes      183/180
CASIA TIDE v2.0 [3]   TIFF Unc., JPEG, BMP    No       7491/5123
VIPP Synthetic [4]    JPEG                    Yes      4800/4800
VIPP Realistic [4]    JPEG                    Manual   63/68
[1] http://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm
[2] http://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp/
[3] http://forensics.idealtest.org:8080/indexopt_v2.php
[4] http://clem.dii.unisi.it/~vipp/index.php/imagerepository/129-a-framework-for-decision-fusion-in-image-forensics-based-on-dempster-shafer-theory-of-evidence
Issues with existing datasets
#7
• Ground-truth masks: only Columbia Uncompressed and
VIPP offer binary masks
• Quality of splices: only CASIA and VIPP Realistic contain
realistic forgeries
• Image format: Only VIPP and CASIA offer JPEG images
– At least 87% of the images in the Common Crawl corpus
(http://commoncrawl.org/) are JPEG
– Out of 13,577 forged images collected in our investigations,
~95% were in JPEG format
• Neatness: All datasets contain first-level forgeries with
no further alterations
Collecting a dataset of Web forgeries
• Aim: build an evaluation framework with the web-
based case in mind
– Evaluate existing and future algorithms against the real-
world, web-based application scenario
– Assess the status of the web: how many versions of each
forgery, how close to the original
• Methodology: identify verified forgeries, and
exhaustively download as many instances as possible
for analysis
#8
The Wild Web Dataset (1/5)
• Identified 82 cases of confirmed forgeries
#9
The Wild Web Dataset (2/5)
• Collected all detectable instances of each case
• Removed exact file duplicates
• 13,577 images in total
• Identified and removed heavily altered variants of
each case
#10
The Wild Web Dataset (3/5)
• By removing crops and post-splices, we were left
with 9,751 images
• Variants within cases were separated, and the
sources were gathered where possible
#11
The Wild Web Dataset (4/5)
• Designed ground-truth binary masks for each sub-
case corresponding to each possible forgery step (for
complex forgeries)
#12
The Wild Web Dataset (5/5)
#13
• The final dataset by the numbers:
– 82 cases of forgeries
– 92 forgery variants
– 101 unique masks
– 13,577 images total
– 9,751 images resembling the original forgery
• For each of the 82 cases, a match on any mask of any
variant should be considered an overall success
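The case-level success rule above can be sketched in a few lines of Python. This is only an illustration: the `scores_by_variant` layout and the function name are hypothetical, and the default threshold of 0.65 is the experimental value quoted later in the deck for a strong match.

```python
def case_detected(scores_by_variant, threshold=0.65):
    """Return True if any mask of any variant exceeds the match threshold.

    scores_by_variant: dict mapping a variant name to a list of match
    scores, one per ground-truth mask of that variant (hypothetical layout).
    """
    return any(
        score > threshold
        for mask_scores in scores_by_variant.values()
        for score in mask_scores
    )
```

A case with two variants and three masks counts as detected as soon as a single one of the three scores passes the threshold.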
Experimental evaluations
#14
• Emulated real-world conditions: we applied the
minimum typical transformations (JPEG resave &
rescaling) to the datasets compatible with the task:
– Columbia Uncompressed
– VIPP Synthetic
– VIPP Realistic
• Each dataset was transformed into two sets:
– Set 1: JPEG recompression at Quality 75
– Set 2: rescale to 75% size followed by JPEG recompression at
Quality 75
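The two transformations can be reproduced roughly as follows. This is a minimal sketch that assumes the Pillow library is available. The function names are hypothetical, and the bicubic resampling filter is an assumption, since the slides do not say which interpolation was used.

```python
import io

from PIL import Image  # Pillow, assumed to be installed


def resave_jpeg(img, quality=75):
    """Re-encode an image as JPEG in memory and decode it again."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    out = Image.open(buf)
    out.load()  # force decoding while the buffer is still referenced
    return out


def make_set1(img):
    """Set 1: JPEG recompression at quality 75."""
    return resave_jpeg(img, quality=75)


def make_set2(img, scale=0.75):
    """Set 2: rescale to 75% size, then JPEG recompression at quality 75."""
    new_size = (
        max(1, round(img.width * scale)),
        max(1, round(img.height * scale)),
    )
    # Bicubic resampling is an assumption made for this sketch.
    resized = img.resize(new_size, Image.BICUBIC)
    return resave_jpeg(resized, quality=75)
```

Working in memory keeps the original files untouched, so the same source image can feed both sets.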
Reconsidering evaluation protocols (1/3)
#15
• Forgery localization algorithms typically produce a
value map
• Ground truth takes the form of a binary mask
signifying the tampered area
• Past approaches compare values under the mask to
the rest of the image:
– Kolmogorov-Smirnov (KS) statistic (Farid, 2009)
– Median value (Fontani et al, 2013)
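Both comparisons can be sketched in plain Python. The code below uses the generic textbook definitions (the two-sample KS statistic as the largest gap between empirical CDFs, and a pair of medians). It is not taken from the cited papers, and the function names and the list-of-lists image layout are assumptions.

```python
from bisect import bisect_right
from statistics import median


def split_by_mask(value_map, mask):
    """Split a 2-D value map into (inside, outside) lists using a binary mask."""
    inside, outside = [], []
    for value_row, mask_row in zip(value_map, mask):
        for value, flag in zip(value_row, mask_row):
            (inside if flag else outside).append(value)
    return inside, outside


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def median_pair(value_map, mask):
    """Median of the values under the mask and of the values outside it."""
    inside, outside = split_by_mask(value_map, mask)
    return median(inside), median(outside)
```

Both measures summarise the two value distributions only. Neither looks at where the high values fall in the image.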
Reconsidering evaluation protocols (2/3)
#16
• A recompressed image from VIPP Realistic, analyzed
using (Lin et al, 2009)
• This would be considered a good detection under
typical methodologies
– Median under mask: ~0.93
– Median outside mask: ~0.02
– K-S Statistic: ~0.41
• Any human evaluator would disagree
#17
Reconsidering evaluation protocols (3/3)
Proposed evaluation protocol (1/2)
#18
1. Take the output value map
2. Binarize according to some method-appropriate
threshold
– e.g. 0.5 for probabilistic methods
3. Compare the binary map to the ground truth mask:
$$E(A, M) = \frac{|A \cap M|^2}{|A| \times |M|}$$
4. Values above an experimental threshold (0.65)
suggest a strong match
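The steps above can be sketched as follows, reading A as the binarized output map and M as the ground-truth mask, with the measure taken as the squared overlap size divided by the product of the two region sizes. The function names and the list-of-lists layout are assumptions, and returning zero for an empty map or mask is a convention chosen for this sketch.

```python
def binarize(value_map, threshold=0.5):
    """Turn a 2-D value map into a binary map (1 where value > threshold)."""
    return [[1 if v > threshold else 0 for v in row] for row in value_map]


def match_score(binary_map, mask):
    """Squared overlap divided by the product of the two region sizes."""
    size_a = sum(map(sum, binary_map))
    size_m = sum(map(sum, mask))
    if size_a == 0 or size_m == 0:
        return 0.0  # convention for this sketch: an empty region scores zero
    overlap = sum(
        a * m
        for row_a, row_m in zip(binary_map, mask)
        for a, m in zip(row_a, row_m)
    )
    return overlap ** 2 / (size_a * size_m)


def is_strong_match(value_map, mask, threshold=0.5, min_score=0.65):
    """Apply the binarize-then-compare steps with the 0.65 cut-off."""
    return match_score(binarize(value_map, threshold), mask) > min_score
```

The score is 1 only when the binary map and the mask coincide. A map that covers the whole image is penalised through the size of A in the denominator.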
Proposed evaluation protocol (2/2)
#19
• Adapt to mimic a human’s perspective:
1. Apply multiple morphological processing operations
2. Try multiple (method-appropriate) thresholds
3. Keep the best-fitting result (bias towards success)
• For non-spliced images (true negative/false positive
detection), apply the same methodology and declare
a success for a blank binary map
– Main disadvantage: binary outcome, no parameters to
tweak for ROC curve generation.
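The best-of search described above might look like the sketch below for spliced images. Several details are assumptions: the threshold grid, the 3x3 structuring element, and the choice of opening and closing as the morphological operations. The non-spliced case is not covered here.

```python
def binarize(value_map, threshold):
    """1 where the value exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in value_map]


def match_score(binary_map, mask):
    """Squared overlap divided by the product of the two region sizes."""
    size_a = sum(map(sum, binary_map))
    size_m = sum(map(sum, mask))
    if size_a == 0 or size_m == 0:
        return 0.0
    overlap = sum(
        a * m for ra, rm in zip(binary_map, mask) for a, m in zip(ra, rm)
    )
    return overlap ** 2 / (size_a * size_m)


def _window(binary_map, r, c):
    """Values in the 3x3 window around (r, c), clipped at the image border."""
    rows, cols = len(binary_map), len(binary_map[0])
    return [
        binary_map[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]


def erode(binary_map):
    """A pixel survives only if its whole (clipped) window is set."""
    return [
        [min(_window(binary_map, r, c)) for c in range(len(row))]
        for r, row in enumerate(binary_map)
    ]


def dilate(binary_map):
    """A pixel is set if any pixel in its (clipped) window is set."""
    return [
        [max(_window(binary_map, r, c)) for c in range(len(row))]
        for r, row in enumerate(binary_map)
    ]


# Illustrative set of operations; the slides do not list the ones used.
MORPH_OPS = {
    "none": lambda b: b,
    "opening": lambda b: dilate(erode(b)),
    "closing": lambda b: erode(dilate(b)),
}


def best_match(value_map, mask, thresholds=(0.3, 0.5, 0.7)):
    """Keep the best score over all threshold and morphology combinations."""
    return max(
        match_score(op(binarize(value_map, t)), mask)
        for t in thresholds
        for op in MORPH_OPS.values()
    )
```

Taking the maximum over all combinations is what biases the protocol towards success: an isolated false-positive pixel that lowers the raw score can be removed by the opening step.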
Evaluations
#20
• Evaluated seven algorithms:
– Double JPEG quantization (Lin et al, 2009), (Bianchi et al,
2011), (Bianchi et al, 2012a)
– Non-Aligned double JPEG quantization (Bianchi et al,
2012b)
– CFA artifacts (Ferrara et al, 2012)
– High-frequency DW noise (Mahdian et al, 2009)
– JPEG ghosts (Farid, 2009)
Evaluation results: Emulated datasets (1/2)
#21
• Comparing median values:
Dataset           Set      (Lin et al, 2009)       (Bianchi et al, 2011)   (Ferrara et al, 2012)   (Bianchi et al, 2012b)  (Bianchi et al, 2012b)  (Mahdian et al, 2009)
Columbia Uncomp.  Orig.    -                       -                       0.89 (0.05)             -                       -                       0.39 (0.04)
Columbia Uncomp.  JPEG     -                       -                       0.05 (0.05)             -                       -                       0.09 (0.05)
Columbia Uncomp.  Resized  -                       -                       0.03 (0.04)             -                       -                       0.11 (0.05)
VIPP Synthetic    Orig.    0.47 (0.05)             0.51 (0.05)             0.15 (0.05)             0.57 (0.01)             0.28 (0.05)             0.13 (0.05)
VIPP Synthetic    JPEG     0.30 (0.04)             0.43 (0.04)             0.16 (0.05)             0.39 (0.05)             0.16 (0.05)             0.10 (0.05)
VIPP Synthetic    Resized  0.05 (0.05)             0.05 (0.05)             0.05 (0.04)             0.05 (0.05)             0.05 (0.05)             0.06 (0.05)
VIPP Realistic    Orig.    0.54 (0.04)             0.58 (0.04)             0.04 (0.04)             0.70 (0.04)             0.28 (0.04)             0.20 (0.04)
VIPP Realistic    JPEG     0.32 (0.04)             0.36 (0.04)             0.04 (0.04)             0.51 (0.04)             0.17 (0.04)             0.20 (0.04)
VIPP Realistic    Resized  0.13 (0.04)             0.12 (0.06)             0.03 (0.04)             0.23 (0.04)             0.17 (0.04)             0.18 (0.04)
Evaluation results: Emulated datasets (2/2)
#22
• Proposed evaluation framework:
Dataset           Set      (Lin et al, 2009)       (Bianchi et al, 2011)   (Ferrara et al, 2012)   (Bianchi et al, 2012b)  (Bianchi et al, 2012b)  (Mahdian et al, 2009)
Columbia Uncomp.  Orig.    -                       -                       0.66 (0.16)             -                       -                       0.12 (0.57)
Columbia Uncomp.  JPEG     -                       -                       0.00 (0.20)             -                       -                       0.02 (0.86)
Columbia Uncomp.  Resized  -                       -                       0.00 (0.24)             -                       -                       0.04 (0.79)
VIPP Synthetic    Orig.    0.44 (0.27)             0.52 (0.00)             0.01 (0.23)             0.58 (0.09)             0.04 (0.25)             0.04 (0.74)
VIPP Synthetic    JPEG     0.26 (0.30)             0.30 (0.10)             0.01 (0.28)             0.23 (0.27)             0.01 (0.29)             0.04 (0.74)
VIPP Synthetic    Resized  0.00 (0.23)             0.00 (0.00)             0.00 (0.23)             0.00 (0.15)             0.00 (0.29)             0.00 (0.84)
VIPP Realistic    Orig.    0.41 (0.46)             0.38 (0.09)             0.09 (0.22)             0.23 (0.30)             0.03 (0.39)             0.04 (0.90)
VIPP Realistic    JPEG     0.13 (0.44)             0.17 (0.29)             0.00 (0.25)             0.14 (0.46)             0.01 (0.43)             0.02 (0.90)
VIPP Realistic    Resized  0.00 (0.47)             0.00 (0.00)             0.00 (0.28)             0.03 (0.25)             0.01 (0.47)             0.01 (0.47)
Evaluation results: Emulated datasets (4/4)
#23
• Methods behave generally as expected
– CFA patterns destroyed by the first JPEG compression
• (Mahdian et al, 2009) is not particularly effective, but
shows little vulnerability to alterations
• DQ methods show some degree of robustness to
recompression only
• Rescaling is extremely disruptive, as expected
Evaluation results: Wild Web dataset (1/2)
#24
• 36 out of 82 cases were successfully detected by at
least one method
– Not a single image gave good results for the other 46
cases, for any algorithm
Method      (Lin et al, 2009)       (Bianchi et al, 2011)   (Ferrara et al, 2012)   (Bianchi et al, 2012b)  (Bianchi et al, 2012b)  (Mahdian et al, 2009)   (Farid, 2009)
Detections  13                      12                      1                       8                       5                       15                      29
Unique      4                       1                       0                       1                       2                       6                       10
Evaluation results: Wild Web dataset (2/2)
#25
• The noise-based method of (Mahdian et al, 2009)
proved disproportionately successful
– However, we should not forget how prone to false positives it is
• JPEG Ghosts are very robust, if we can manage the
amount of output they produce
• Even in the cases where successful detection
occurred, only a few images were correctly detected
– 1386 images in the entire dataset (~ 14.3%)
– Excluding the three easiest classes, only 333 out of 8580
images were detected (~ 3.9%)
Forgery detection in the Wild (1/4)
#26
Forgery detection in the Wild (2/4)
#27
Forgery detection in the Wild (3/4)
#28
Forgery detection in the Wild (4/4)
#29
Conclusions
• On the web, very few images retain traces that are detectable
with today’s state-of-the-art forensic approaches
• It is difficult to estimate the relative age of each instance of a
viral image
• DQ-based methods give results with the highest confidence,
but are not particularly robust
• JPEG Ghosts demonstrate significantly higher robustness than
other methods, but produce large amounts of noisy output
• DW high-frequency noise also appears to give good results, but
seems extremely prone to false positives
#30
Future steps
• For the web journalism case, robustness ought to be a central
consideration for future algorithm evaluations
• The Wild Web dataset is freely distributed for research purposes
– Due to copyright considerations, this is currently only feasible through direct contact
– The dataset should be maintained to incorporate new cases of forgeries, as they
come out
• Advance the state-of-the-art by focusing on more robust traces of splicing
• Following the life-cycle of images on the web can help locate their earliest
versions and build an account of the alterations that have taken place
(Kennedy & Chang, 2008)
• The question remains: to what extent is the task feasible? When can we be
certain that all traces have been lost?
#31
References
#32
• Bianchi, Tiziano, Alessia De Rosa, and Alessandro Piva. "Improved DCT coefficient analysis for forgery localization in JPEG images." In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 2444-2447. IEEE, 2011.
• Bianchi, Tiziano, and Alessandro Piva. "Image forgery localization via block-grained analysis of JPEG artifacts." Information Forensics and Security, IEEE Transactions on 7, no. 3 (2012): 1003-1017.
• Ferrara, Pasquale, Tiziano Bianchi, Alessia De Rosa, and Alessandro Piva. "Image forgery localization via fine-grained analysis of CFA artifacts." Information Forensics and Security, IEEE Transactions on 7, no. 5 (2012): 1566-1577.
• Farid, Hany. "Exposing digital forgeries from JPEG ghosts." Information Forensics and Security, IEEE Transactions on 4, no. 1 (2009): 154-160.
• Fontani, Marco, Tiziano Bianchi, Alessia De Rosa, Alessandro Piva, and Mauro Barni. "A framework for decision fusion in image forensics based on Dempster–Shafer theory of evidence." Information Forensics and Security, IEEE Transactions on 8, no. 4 (2013): 593-607.
• Kennedy, Lyndon, and Shih-Fu Chang. "Internet image archaeology: automatically tracing the manipulation history of photographs on the web." In Proceedings of the 16th ACM international conference on Multimedia, pp. 349-358. ACM, 2008.
• Lin, Zhouchen, Junfeng He, Xiaoou Tang, and Chi-Keung Tang. "Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis." Pattern Recognition 42, no. 11 (2009): 2492-2501.
• Mahdian, Babak, and Stanislav Saic. "Using noise inconsistencies for blind image forensics." Image and Vision Computing 27, no. 10 (2009): 1497-1503.
Thank you!
• Slides:
http://www.slideshare.net/sympapadopoulos/detecting-image-splicing-in-the-wild-web
• Get in touch:
@markzampoglou / markzampoglou@iti.gr
@sympapadopoulos / papadop@iti.gr
#33
1 of 33

Recommended

Naveen 9911103606 major ppt by
Naveen 9911103606 major pptNaveen 9911103606 major ppt
Naveen 9911103606 major pptNaveen Rajgariya
422 views16 slides
Digital image forgery detection by
Digital image forgery detectionDigital image forgery detection
Digital image forgery detectionAB Rizvi
10.7K views25 slides
Blind detection of image manipulation @ PoliMi by
Blind detection of image manipulation @ PoliMiBlind detection of image manipulation @ PoliMi
Blind detection of image manipulation @ PoliMiGiorgio Sironi
1.7K views16 slides
20120140502012 by
2012014050201220120140502012
20120140502012IAEME Publication
1.5K views8 slides
Region duplication forgery detection in digital images by
Region duplication forgery detection  in digital imagesRegion duplication forgery detection  in digital images
Region duplication forgery detection in digital imagesRupesh Ambatwad
299 views17 slides
Image forgery and security by
Image forgery and securityImage forgery and security
Image forgery and securityأحلام انصارى
13.5K views15 slides

More Related Content

What's hot

A proposed accelerated image copy-move forgery detection-vcip2014 by
A proposed accelerated image copy-move forgery detection-vcip2014A proposed accelerated image copy-move forgery detection-vcip2014
A proposed accelerated image copy-move forgery detection-vcip2014SondosFadl
1.8K views33 slides
Copy-Rotate-Move Forgery Detection Based on Spatial Domain by
Copy-Rotate-Move Forgery Detection Based on Spatial DomainCopy-Rotate-Move Forgery Detection Based on Spatial Domain
Copy-Rotate-Move Forgery Detection Based on Spatial DomainSondosFadl
1.5K views40 slides
FAN search for image copy-move forgery-amalta 2014 by
 FAN search for image copy-move forgery-amalta 2014 FAN search for image copy-move forgery-amalta 2014
FAN search for image copy-move forgery-amalta 2014SondosFadl
569 views31 slides
Digital Image Forgery by
Digital Image ForgeryDigital Image Forgery
Digital Image ForgeryMohamed Talaat
2.5K views12 slides
Visual Quality for both Images and Display of Systems by Visual Enhancement u... by
Visual Quality for both Images and Display of Systems by Visual Enhancement u...Visual Quality for both Images and Display of Systems by Visual Enhancement u...
Visual Quality for both Images and Display of Systems by Visual Enhancement u...IJMER
422 views6 slides
Digital Image Processing: Image Segmentation by
Digital Image Processing: Image SegmentationDigital Image Processing: Image Segmentation
Digital Image Processing: Image SegmentationMostafa G. M. Mostafa
48.3K views58 slides

What's hot(20)

A proposed accelerated image copy-move forgery detection-vcip2014 by SondosFadl
A proposed accelerated image copy-move forgery detection-vcip2014A proposed accelerated image copy-move forgery detection-vcip2014
A proposed accelerated image copy-move forgery detection-vcip2014
SondosFadl1.8K views
Copy-Rotate-Move Forgery Detection Based on Spatial Domain by SondosFadl
Copy-Rotate-Move Forgery Detection Based on Spatial DomainCopy-Rotate-Move Forgery Detection Based on Spatial Domain
Copy-Rotate-Move Forgery Detection Based on Spatial Domain
SondosFadl1.5K views
FAN search for image copy-move forgery-amalta 2014 by SondosFadl
 FAN search for image copy-move forgery-amalta 2014 FAN search for image copy-move forgery-amalta 2014
FAN search for image copy-move forgery-amalta 2014
SondosFadl569 views
Visual Quality for both Images and Display of Systems by Visual Enhancement u... by IJMER
Visual Quality for both Images and Display of Systems by Visual Enhancement u...Visual Quality for both Images and Display of Systems by Visual Enhancement u...
Visual Quality for both Images and Display of Systems by Visual Enhancement u...
IJMER422 views
Basics of image processing using MATLAB by Mohsin Siddique
Basics of image processing using MATLABBasics of image processing using MATLAB
Basics of image processing using MATLAB
Mohsin Siddique1.4K views
New microsoft power point presentation by Azad Singh
New microsoft power point presentationNew microsoft power point presentation
New microsoft power point presentation
Azad Singh241 views
Statistical Feature based Blind Classifier for JPEG Image Splice Detection by rahulmonikasharma
Statistical Feature based Blind Classifier for JPEG Image Splice DetectionStatistical Feature based Blind Classifier for JPEG Image Splice Detection
Statistical Feature based Blind Classifier for JPEG Image Splice Detection
Denoising Process Based on Arbitrarily Shaped Windows by CSCJournals
Denoising Process Based on Arbitrarily Shaped WindowsDenoising Process Based on Arbitrarily Shaped Windows
Denoising Process Based on Arbitrarily Shaped Windows
CSCJournals153 views
Image Enhancement by Image Fusion for Crime Investigation by CSCJournals
Image Enhancement by Image Fusion for Crime InvestigationImage Enhancement by Image Fusion for Crime Investigation
Image Enhancement by Image Fusion for Crime Investigation
CSCJournals354 views
Removal of Gaussian noise on the image edges using the Prewitt operator and t... by IOSR Journals
Removal of Gaussian noise on the image edges using the Prewitt operator and t...Removal of Gaussian noise on the image edges using the Prewitt operator and t...
Removal of Gaussian noise on the image edges using the Prewitt operator and t...
IOSR Journals305 views
Feature isolation and extraction of satellite images for remote sensing appli... by IAEME Publication
Feature isolation and extraction of satellite images for remote sensing appli...Feature isolation and extraction of satellite images for remote sensing appli...
Feature isolation and extraction of satellite images for remote sensing appli...
IAEME Publication354 views
An Efficient Approach of Segmentation and Blind Deconvolution in Image Restor... by iosrjce
An Efficient Approach of Segmentation and Blind Deconvolution in Image Restor...An Efficient Approach of Segmentation and Blind Deconvolution in Image Restor...
An Efficient Approach of Segmentation and Blind Deconvolution in Image Restor...
iosrjce203 views
SEMANTIC IMAGE RETRIEVAL USING MULTIPLE FEATURES by cscpconf
SEMANTIC IMAGE RETRIEVAL USING MULTIPLE FEATURESSEMANTIC IMAGE RETRIEVAL USING MULTIPLE FEATURES
SEMANTIC IMAGE RETRIEVAL USING MULTIPLE FEATURES
cscpconf45 views
IJCER (www.ijceronline.com) International Journal of computational Engineeri... by ijceronline
 IJCER (www.ijceronline.com) International Journal of computational Engineeri... IJCER (www.ijceronline.com) International Journal of computational Engineeri...
IJCER (www.ijceronline.com) International Journal of computational Engineeri...
ijceronline270 views
A Biometric Approach to Encrypt a File with the Help of Session Key by Sougata Das
A Biometric Approach to Encrypt a File with the Help of Session KeyA Biometric Approach to Encrypt a File with the Help of Session Key
A Biometric Approach to Encrypt a File with the Help of Session Key
Sougata Das757 views
Object Shape Representation by Kernel Density Feature Points Estimator by cscpconf
Object Shape Representation by Kernel Density Feature Points Estimator Object Shape Representation by Kernel Density Feature Points Estimator
Object Shape Representation by Kernel Density Feature Points Estimator
cscpconf43 views

Similar to Detecting image splicing in the wild Web

slide-171212080528.pptx by
slide-171212080528.pptxslide-171212080528.pptx
slide-171212080528.pptxSharanrajK22MMT1003
4 views41 slides
Real Time Object Dectection using machine learning by
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learningpratik pratyay
750 views41 slides
From ensembles to computer networks by
From ensembles to computer networksFrom ensembles to computer networks
From ensembles to computer networksCSIRO
33 views56 slides
Automated Security Surveillance System in Real Time World by
Automated Security Surveillance System in Real Time WorldAutomated Security Surveillance System in Real Time World
Automated Security Surveillance System in Real Time WorldIRJET Journal
41 views4 slides
Knowledge-based Fusion for Image Tampering Localization by
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationSymeon Papadopoulos
133 views24 slides
Aiai2020 paper27-forensics-fusion-201021113929 by
Aiai2020 paper27-forensics-fusion-201021113929Aiai2020 paper27-forensics-fusion-201021113929
Aiai2020 paper27-forensics-fusion-201021113929Weverify
219 views24 slides

Similar to Detecting image splicing in the wild Web(20)

Real Time Object Dectection using machine learning by pratik pratyay
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
pratik pratyay750 views
From ensembles to computer networks by CSIRO
From ensembles to computer networksFrom ensembles to computer networks
From ensembles to computer networks
CSIRO33 views
Automated Security Surveillance System in Real Time World by IRJET Journal
Automated Security Surveillance System in Real Time WorldAutomated Security Surveillance System in Real Time World
Automated Security Surveillance System in Real Time World
IRJET Journal41 views
Knowledge-based Fusion for Image Tampering Localization by Symeon Papadopoulos
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
Aiai2020 paper27-forensics-fusion-201021113929 by Weverify
Aiai2020 paper27-forensics-fusion-201021113929Aiai2020 paper27-forensics-fusion-201021113929
Aiai2020 paper27-forensics-fusion-201021113929
Weverify219 views
face recognition system using LBP by Marwan H. Noman
face recognition system using LBPface recognition system using LBP
face recognition system using LBP
Marwan H. Noman11.4K views
Introduction to computer vision with Convoluted Neural Networks by MarcinJedyk
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural Networks
MarcinJedyk60 views
Introduction talk to Computer Vision by Chen Sagiv
Introduction talk to Computer Vision Introduction talk to Computer Vision
Introduction talk to Computer Vision
Chen Sagiv406 views
Introduction to computer vision by Marcin Jedyk
Introduction to computer visionIntroduction to computer vision
Introduction to computer vision
Marcin Jedyk56 views
Coin recognition using matlab by slmnsvn
Coin recognition using matlabCoin recognition using matlab
Coin recognition using matlab
slmnsvn3.2K views
IRJET- Exploring Image Super Resolution Techniques by IRJET Journal
IRJET- Exploring Image Super Resolution TechniquesIRJET- Exploring Image Super Resolution Techniques
IRJET- Exploring Image Super Resolution Techniques
IRJET Journal20 views
Comparison of Matrix Completion Algorithms for Background Initialization in V... by ActiveEon
Comparison of Matrix Completion Algorithms for Background Initialization in V...Comparison of Matrix Completion Algorithms for Background Initialization in V...
Comparison of Matrix Completion Algorithms for Background Initialization in V...
ActiveEon551 views
rsec2a-2016-jheaton-morning by Jeff Heaton
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
Jeff Heaton496 views
Computer vision for transportation by Wanjin Yu
Computer vision for transportationComputer vision for transportation
Computer vision for transportation
Wanjin Yu633 views
International Journal of Engineering and Science Invention (IJESI) by inventionjournals
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals180 views
IRJET- Criminal Recognization in CCTV Surveillance Video by IRJET Journal
IRJET-  	  Criminal Recognization in CCTV Surveillance VideoIRJET-  	  Criminal Recognization in CCTV Surveillance Video
IRJET- Criminal Recognization in CCTV Surveillance Video
IRJET Journal49 views

More from Symeon Papadopoulos

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno... by
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...Symeon Papadopoulos
856 views29 slides
Deepfakes: An Emerging Internet Threat and their Detection by
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionSymeon Papadopoulos
1.5K views50 slides
Deepfake Detection: The Importance of Training Data Preprocessing and Practic... by
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Symeon Papadopoulos
168 views19 slides
COVID-19 Infodemic vs Contact Tracing by
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingSymeon Papadopoulos
205 views11 slides
Similarity-based retrieval of multimedia content by
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSymeon Papadopoulos
814 views61 slides
Twitter-based Sensing of City-level Air Quality by
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualitySymeon Papadopoulos
708 views27 slides

More from Symeon Papadopoulos(20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno... by Symeon Papadopoulos
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
Deepfakes: An Emerging Internet Threat and their Detection by Symeon Papadopoulos
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
Symeon Papadopoulos1.5K views
Deepfake Detection: The Importance of Training Data Preprocessing and Practic... by Symeon Papadopoulos
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Similarity-based retrieval of multimedia content by Symeon Papadopoulos
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia content
Aggregating and Analyzing the Context of Social Media Content by Symeon Papadopoulos
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
Symeon Papadopoulos5.9K views
Learning to detect Misleading Content on Twitter by Symeon Papadopoulos
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers by Symeon Papadopoulos
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Placing Images with Refined Language Models and Similarity Search with PCA-re... by Symeon Papadopoulos
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Perceived versus Actual Predictability of Personal Information in Social Netw... by Symeon Papadopoulos
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
Web and Social Media Image Forensics for News Professionals by Symeon Papadopoulos
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
Symeon Papadopoulos1.2K views
Predicting News Popularity by Mining Online Discussions by Symeon Papadopoulos
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online Discussions
Symeon Papadopoulos1.2K views

Recently uploaded

The details of description: Techniques, tips, and tangents on alternative tex... by
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...BookNet Canada
127 views24 slides
Ransomware is Knocking your Door_Final.pdf by
Ransomware is Knocking your Door_Final.pdfRansomware is Knocking your Door_Final.pdf
Ransomware is Knocking your Door_Final.pdfSecurity Bootcamp
55 views46 slides
Vertical User Stories by
Vertical User StoriesVertical User Stories
Detecting image splicing in the wild Web

  • 1. Detecting image splicing in the wild (Web)
      Markos Zampoglou, Symeon Papadopoulos, Yiannis Kompatsiaris
      Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
      WeMuV2015 workshop, ICME, June 29, 2015, Turin, Italy
  • 2. A new journalistic paradigm …and its pitfalls
  • 3. Blind image splicing detection
      • Assume the splice differs in some aspect from the rest of the image
          – Capture invisible “traces”: DCT coefficient distribution, PRNU, CFA interpolation patterns…
      • But traces degrade with each subsequent image alteration
      • Social media journalism establishes a different paradigm from typical image forensics
          – We don’t have the luxury of demanding to see the originals
  • 5. Images in the wild
      • Twitter:
          – Images larger than 2048×1024 are scaled down
          – Large PNG files (> 3MB) converted to JPEG
          – JPEG files resaved at quality 75
      • Facebook:
          – Images larger than 2048×2048 are scaled down
          – Large PNG files converted to JPEG
          – JPEG files resaved at varying quality (~70-90)
      • Both media platforms also erase metadata from images
  • 6. Existing image splicing datasets

      | Name                | Format               | Masks  | #images   |
      |---------------------|----------------------|--------|-----------|
      | Columbia [1]        | BMP grayscale        | No     | 933/912   |
      | Columbia Unc. [2]   | TIFF Unc.            | Yes    | 183/180   |
      | CASIA TIDE v2.0 [3] | TIFF Unc., JPEG, BMP | No     | 7491/5123 |
      | VIPP Synthetic [4]  | JPEG                 | Yes    | 4800/4800 |
      | VIPP Realistic [4]  | JPEG                 | Manual | 63/68     |

      [1] http://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm
      [2] http://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp/
      [3] http://forensics.idealtest.org:8080/indexopt_v2.php
      [4] http://clem.dii.unisi.it/~vipp/index.php/imagerepository/129-a-framework-for-decision-fusion-in-image-forensics-based-on-dempster-shafer-theory-of-evidence
  • 7. Issues with existing datasets
      • Ground-truth masks: only Columbia Uncompressed and VIPP offer binary masks
      • Quality of splices: only CASIA and VIPP Realistic contain realistic forgeries
      • Image format: only VIPP and CASIA offer JPEG images
          – At least 87% of the Common Crawl corpus (http://commoncrawl.org/) images are JPEG
          – Out of 13,577 forged images collected in our investigations, ~95% were in JPEG format
      • Neatness: all datasets contain first-level forgeries with no further alterations
  • 8. Collecting a dataset of Web forgeries
      • Aim: build an evaluation framework with the web-based case in mind
          – Evaluate existing and future algorithms against the real-world, web-based application scenario
          – Assess the status of the web: how many versions of each forgery, how close to the original
      • Methodology: identify verified forgeries, and exhaustively download as many instances as possible for analysis
  • 9. The Wild Web Dataset (1/5)
      • Identified 82 cases of confirmed forgeries
  • 10. The Wild Web Dataset (2/5)
      • Collected all detectable instances of each case
      • Removed exact file duplicates
      • 13,577 images in total
      • Identified and removed heavily altered variants of each case
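The "removed exact file duplicates" step can be sketched as content hashing. This is only an illustration, not the authors' tooling: the file names, the byte strings and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def remove_exact_duplicates(files):
    """Keep the first file seen for each distinct byte content.

    `files` maps a (hypothetical) file name to the raw bytes of that file.
    Only byte-identical copies are dropped; a resaved or rescaled copy of
    the same picture has different bytes and is therefore kept.
    """
    seen_digests = set()
    unique = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            unique[name] = data
    return unique


# Hypothetical downloads for one forgery case.
downloads = {
    "case01_a.jpg": b"\xff\xd8 first copy",
    "case01_b.jpg": b"\xff\xd8 first copy",    # byte-identical to case01_a.jpg
    "case01_c.jpg": b"\xff\xd8 resaved copy",  # same picture, different bytes
}
kept = remove_exact_duplicates(downloads)
print(sorted(kept))
```

Hashing only catches exact copies, which is why the slide lists the removal of heavily altered variants as a separate step.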
  • 11. The Wild Web Dataset (3/5)
      • By removing crops and post-splices, we were left with 9,751 images
      • Variants within cases were separated, and the sources were gathered where possible
  • 12. The Wild Web Dataset (4/5)
      • Designed ground-truth binary masks for each sub-case corresponding to each possible forgery step (for complex forgeries)
  • 13. The Wild Web Dataset (5/5)
      • The final dataset by the numbers:
          – 82 cases of forgeries
          – 92 forgery variants
          – 101 unique masks
          – 13,577 images total
          – 9,751 images resembling the original forgery
      • For each of the 82 cases, a match on any mask of any variant should be considered an overall success
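The case-level success rule ("a match on any mask of any variant") amounts to a nested any(). The sketch below assumes a hypothetical layout in which each case stores, per variant and per mask, the match scores obtained for its images. The 0.65 cut-off is the experimental threshold quoted on the "Proposed evaluation protocol" slide, and the variant and mask names are made up.

```python
def case_detected(scores_by_variant, threshold=0.65):
    """Return True if any image matches any mask of any variant.

    `scores_by_variant` is a hypothetical structure:
        {variant_name: {mask_name: [match score per image, ...]}}
    """
    return any(
        score > threshold
        for masks in scores_by_variant.values()
        for scores in masks.values()
        for score in scores
    )


# One case with two variants; only one image matches one mask of variant_b.
case_hit = {
    "variant_a": {"mask_1": [0.10, 0.22], "mask_2": [0.05, 0.00]},
    "variant_b": {"mask_1": [0.31, 0.72]},
}
case_miss = {
    "variant_a": {"mask_1": [0.10, 0.22]},
}
print(case_detected(case_hit), case_detected(case_miss))
```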
  • 14. Experimental evaluations
      • Emulated real-world conditions: we applied the minimum typical transformations (JPEG resave and rescaling) to the datasets compatible with the task:
          – Columbia Uncompressed
          – VIPP Synthetic
          – VIPP Realistic
      • Transformed versions:
          – Set 1: JPEG recompression at quality 75
          – Set 2: rescaling to 75% size, followed by JPEG recompression at quality 75
  • 15. Reconsidering evaluation protocols (1/3)
      • Forgery localization algorithms typically produce a value map
      • Ground truth takes the form of a binary mask signifying the tampered area
      • Past approaches compare values under the mask to the rest of the image:
          – Kolmogorov-Smirnov (KS) statistic (Farid, 2009)
          – Median value (Fontani et al, 2013)
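Both past protocols compare the values under the mask with those outside it. A minimal sketch follows, assuming the value map and mask are same-sized nested lists; the toy data and helper names are illustrative, and the KS statistic is the textbook two-sample form, not necessarily the exact variant used in the cited papers.

```python
from bisect import bisect_right
from statistics import median


def split_by_mask(value_map, mask):
    """Separate map values into those under the mask and those outside it."""
    inside, outside = [], []
    for value_row, mask_row in zip(value_map, mask):
        for value, masked in zip(value_row, mask_row):
            (inside if masked else outside).append(value)
    return inside, outside


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Toy 4x4 value map; the tampered area is the top-left 2x2 block.
value_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.9, 0.7, 0.2, 0.1],
    [0.1, 0.0, 0.1, 0.2],
    [0.0, 0.1, 0.0, 0.1],
]
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
inside, outside = split_by_mask(value_map, mask)
median_inside = median(inside)
median_outside = median(outside)
ks = ks_statistic(inside, outside)
print(median_inside, median_outside, ks)
```

Both numbers summarize distributions only, so they say nothing about whether the high values form a region shaped like the mask.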
  • 16. Reconsidering evaluation protocols (2/3)
      • A recompressed image from VIPP Realistic, analyzed using (Lin et al, 2009)
  • 17. Reconsidering evaluation protocols (3/3)
      • This would be considered a good detection under typical methodologies
          – Median under mask: ~0.93
          – Median outside mask: ~0.02
          – K-S statistic: ~0.41
      • Any human evaluator would disagree
  • 18. Proposed evaluation protocol (1/2)
      1. Take the output value map
      2. Binarize according to some method-appropriate threshold
          – e.g. 0.5 for probabilistic methods
      3. Compare the binary map $A$ to the ground truth mask $M$:
          $$E(A, M) = \frac{|A \cap M|^2}{|A| \times |M|}$$
      4. Values above an experimental threshold (0.65) suggest a strong match
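A minimal sketch of these four steps, reading the slide's formula as E(A, M) = |A ∩ M|² / (|A| × |M|), where |·| counts pixels. The nested-list representation, the strict `>` comparison at the threshold and the toy maps are assumptions made for illustration.

```python
def binarize(value_map, threshold=0.5):
    """Step 2: turn a value map into a binary map."""
    return [[value > threshold for value in row] for row in value_map]


def match_score(binary_map, mask):
    """Step 3: E(A, M) = |A ∩ M|^2 / (|A| * |M|)."""
    area_a = sum(bool(cell) for row in binary_map for cell in row)
    area_m = sum(bool(cell) for row in mask for cell in row)
    overlap = sum(
        1
        for map_row, mask_row in zip(binary_map, mask)
        for a, m in zip(map_row, mask_row)
        if a and m
    )
    if area_a == 0 or area_m == 0:
        return 0.0  # nothing detected, or nothing to detect
    return overlap ** 2 / (area_a * area_m)


def strong_match(value_map, mask, threshold=0.5, cutoff=0.65):
    """Step 4: scores above the experimental cutoff suggest a strong match."""
    return match_score(binarize(value_map, threshold), mask) > cutoff


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Detects three of the four tampered pixels and nothing else.
tight = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
# Detects all four tampered pixels plus four untouched ones.
sloppy = [
    [0.9, 0.9, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
tight_score = match_score(binarize(tight), mask)
sloppy_score = match_score(binarize(sloppy), mask)
print(tight_score, sloppy_score)
```

The score penalizes both missed tampered pixels and spurious detections, so the second map falls below the cutoff even though it covers the whole mask.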
  • 19. Proposed evaluation protocol (2/2)
      • Adapt to mimic a human’s perspective:
          1. Apply multiple morphological processing operations
          2. Try multiple (method-appropriate) thresholds
          3. Keep the best-fitting result (bias towards success)
      • For non-spliced images (true negative/false positive detection), apply the same methodology and declare a success for a blank binary map
          – Main disadvantage: binary outcome, no parameters to tweak for ROC curve generation
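The adaptive protocol can be sketched as a search over thresholds and morphological operations that keeps the best score. Everything specific here is an assumption: the 3×3 neighbourhood, the choice of opening and closing, the threshold list, and the reading that a non-spliced image succeeds when at least one candidate map is blank. The match score is taken to be |A ∩ M|² / (|A| × |M|).

```python
def binarize(value_map, threshold):
    return [[value > threshold for value in row] for row in value_map]


def _neighbours(grid, r, c):
    """In-bounds cells of the 3x3 window centred on (r, c)."""
    rows, cols = len(grid), len(grid[0])
    return [grid[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]


def erode(grid):
    return [[all(_neighbours(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def dilate(grid):
    return [[any(_neighbours(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def identity(grid):
    return grid


def opening(grid):
    return dilate(erode(grid))  # removes small isolated detections


def closing(grid):
    return erode(dilate(grid))  # fills small holes


def match_score(binary_map, mask):
    area_a = sum(bool(x) for row in binary_map for x in row)
    area_m = sum(bool(x) for row in mask for x in row)
    overlap = sum(1 for ar, mr in zip(binary_map, mask)
                  for a, m in zip(ar, mr) if a and m)
    if area_a == 0 or area_m == 0:
        return 0.0
    return overlap ** 2 / (area_a * area_m)


def candidates(value_map, thresholds, operations):
    for threshold in thresholds:
        binary = binarize(value_map, threshold)
        for operation in operations:
            yield operation(binary)


def best_match(value_map, mask, thresholds=(0.3, 0.5, 0.7),
               operations=(identity, opening, closing)):
    """Spliced image: keep the best-fitting candidate (bias towards success)."""
    return max(match_score(c, mask)
               for c in candidates(value_map, thresholds, operations))


def true_negative(value_map, thresholds=(0.3, 0.5, 0.7),
                  operations=(identity, opening, closing)):
    """Non-spliced image: success if some candidate map is blank."""
    return any(not any(any(row) for row in c)
               for c in candidates(value_map, thresholds, operations))


# 5x5 toy map: a 3x3 detected block plus one isolated noisy pixel.
value_map = [[0.1] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        value_map[r][c] = 0.9
value_map[0][4] = 0.95
mask = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]
flat_map = [[0.1] * 5 for _ in range(5)]

raw_score = best_match(value_map, mask, operations=(identity,))
best_score = best_match(value_map, mask)
print(raw_score, best_score, true_negative(flat_map))
```

In this toy case the opening removes the isolated pixel, so the best score rises above what thresholding alone achieves.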
  • 20. Evaluations
      • Evaluated seven algorithms:
          – Double JPEG quantization (Lin et al, 2009), (Bianchi et al, 2011), (Bianchi et al, 2012a)
          – Non-aligned double JPEG quantization (Bianchi et al, 2012b)
          – CFA artifacts (Ferrara et al, 2012)
          – High-frequency DW noise (Mahdian et al, 2009)
          – JPEG ghosts (Farid, 2009)
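  • 21. Evaluation results: Emulated datasets (1/2)
      • Comparing median values:

      | Dataset          | Version | (Lin et al, 2009) | (Bianchi et al, 2011) | (Ferrara et al, 2012) | (Bianchi et al, 2012b) | (Bianchi et al, 2012b) | (Mahdian et al, 2009) |
      |------------------|---------|-------------------|-----------------------|-----------------------|------------------------|------------------------|-----------------------|
      | Columbia Uncomp. | Orig.   | -                 | -                     | 0.89 (0.05)           | -                      | -                      | 0.39 (0.04)           |
      | Columbia Uncomp. | JPEG    | -                 | -                     | 0.05 (0.05)           | -                      | -                      | 0.09 (0.05)           |
      | Columbia Uncomp. | Resized | -                 | -                     | 0.03 (0.04)           | -                      | -                      | 0.11 (0.05)           |
      | VIPP Synthetic   | Orig.   | 0.47 (0.05)       | 0.51 (0.05)           | 0.15 (0.05)           | 0.57 (0.01)            | 0.28 (0.05)            | 0.13 (0.05)           |
      | VIPP Synthetic   | JPEG    | 0.30 (0.04)       | 0.43 (0.04)           | 0.16 (0.05)           | 0.39 (0.05)            | 0.16 (0.05)            | 0.10 (0.05)           |
      | VIPP Synthetic   | Resized | 0.05 (0.05)       | 0.05 (0.05)           | 0.05 (0.04)           | 0.05 (0.05)            | 0.05 (0.05)            | 0.06 (0.05)           |
      | VIPP Realistic   | Orig.   | 0.54 (0.04)       | 0.58 (0.04)           | 0.04 (0.04)           | 0.70 (0.04)            | 0.28 (0.04)            | 0.20 (0.04)           |
      | VIPP Realistic   | JPEG    | 0.32 (0.04)       | 0.36 (0.04)           | 0.04 (0.04)           | 0.51 (0.04)            | 0.17 (0.04)            | 0.20 (0.04)           |
      | VIPP Realistic   | Resized | 0.13 (0.04)       | 0.12 (0.06)           | 0.03 (0.04)           | 0.23 (0.04)            | 0.17 (0.04)            | 0.18 (0.04)           |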
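  • 22. Evaluation results: Emulated datasets (2/2)
      • Proposed evaluation framework:

      | Dataset          | Version | (Lin et al, 2009) | (Bianchi et al, 2011) | (Ferrara et al, 2012) | (Bianchi et al, 2012b) | (Bianchi et al, 2012b) | (Mahdian et al, 2009) |
      |------------------|---------|-------------------|-----------------------|-----------------------|------------------------|------------------------|-----------------------|
      | Columbia Uncomp. | Orig.   | -                 | -                     | 0.66 (0.16)           | -                      | -                      | 0.12 (0.57)           |
      | Columbia Uncomp. | JPEG    | -                 | -                     | 0.00 (0.20)           | -                      | -                      | 0.02 (0.86)           |
      | Columbia Uncomp. | Resized | -                 | -                     | 0.00 (0.24)           | -                      | -                      | 0.04 (0.79)           |
      | VIPP Synthetic   | Orig.   | 0.44 (0.27)       | 0.52 (0.00)           | 0.01 (0.23)           | 0.58 (0.09)            | 0.04 (0.25)            | 0.04 (0.74)           |
      | VIPP Synthetic   | JPEG    | 0.26 (0.30)       | 0.30 (0.10)           | 0.01 (0.28)           | 0.23 (0.27)            | 0.01 (0.29)            | 0.04 (0.74)           |
      | VIPP Synthetic   | Resized | 0.00 (0.23)       | 0.00 (0.00)           | 0.00 (0.23)           | 0.00 (0.15)            | 0.00 (0.29)            | 0.00 (0.84)           |
      | VIPP Realistic   | Orig.   | 0.41 (0.46)       | 0.38 (0.09)           | 0.09 (0.22)           | 0.23 (0.30)            | 0.03 (0.39)            | 0.04 (0.90)           |
      | VIPP Realistic   | JPEG    | 0.13 (0.44)       | 0.17 (0.29)           | 0.00 (0.25)           | 0.14 (0.46)            | 0.01 (0.43)            | 0.02 (0.90)           |
      | VIPP Realistic   | Resized | 0.00 (0.47)       | 0.00 (0.00)           | 0.00 (0.28)           | 0.03 (0.25)            | 0.01 (0.47)            | 0.01 (0.47)           |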
  • 23. Evaluation results: Emulated datasets (4/4)
      • Methods behave generally as expected
          – CFA patterns destroyed by the first JPEG compression
      • (Mahdian et al, 2009) is not particularly effective, but shows little vulnerability to alterations
      • DQ methods show some degree of robustness to recompression only
      • Rescaling is extremely disruptive, as expected
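  • 24. Evaluation results: Wild Web dataset (1/2)
      • 36 out of 82 cases were successfully detected by at least one method
          – Not a single image gave good results for the other 46 cases, for any algorithm

      |            | (Lin et al, 2009) | (Bianchi et al, 2011) | (Ferrara et al, 2012) | (Bianchi et al, 2012b) | (Bianchi et al, 2012b) | (Mahdian et al, 2009) | (Farid, 2009) |
      |------------|-------------------|-----------------------|-----------------------|------------------------|------------------------|-----------------------|---------------|
      | Detections | 13                | 12                    | 1                     | 8                      | 5                      | 15                | 29            |
      | Unique     | 4                 | 1                     | 0                     | 1                      | 2                      | 6                 | 10            |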
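The "Detections" and "Unique" rows can be derived from per-method sets of detected cases. The sketch below assumes that "Unique" means cases detected by that method and by no other, which the slide does not state explicitly. The method names and case ids are made up and are not the dataset's actual results.

```python
def detection_summary(detected_cases):
    """Count detections per method, and cases no other method detected.

    `detected_cases` maps a method name to the set of case ids it detected.
    Returns (per-method summary, number of cases detected by any method).
    """
    summary = {}
    for method, cases in detected_cases.items():
        others = set().union(
            *(c for m, c in detected_cases.items() if m != method)
        )
        summary[method] = {
            "detections": len(cases),
            "unique": len(cases - others),
        }
    detected_by_any = len(set().union(*detected_cases.values()))
    return summary, detected_by_any


# Hypothetical results for three methods over five cases.
detected = {
    "method_a": {1, 2, 3},
    "method_b": {2, 3},
    "method_c": {3, 4, 5},
}
summary, detected_by_any = detection_summary(detected)
print(summary, detected_by_any)
```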
  • 25. Evaluation results: Wild Web dataset (2/2)
      • The noise-based method of (Mahdian et al, 2009) proved disproportionately successful
          – We should not forget how prone to false positives it is
      • JPEG Ghosts are very robust, provided we can manage the amount of output they produce
      • Even in the cases where successful detection occurred, only a few images were correctly detected
          – 1386 images in the entire dataset (~14.3%)
          – Excluding the three easiest classes, only 333 out of 8580 images were detected (~3.9%)
  • 26. Forgery detection in the Wild (1/4)
  • 27. Forgery detection in the Wild (2/4)
  • 28. Forgery detection in the Wild (3/4)
  • 29. Forgery detection in the Wild (4/4)
  • 30. Conclusions
      • On the web, very few images retain traces that are detectable with today’s state-of-the-art forensic approaches
      • It is difficult to estimate the relative age of each instance of a viral image
      • DQ-based methods give results with the highest confidence, but are not particularly robust
      • JPEG Ghosts demonstrate significantly higher robustness than other methods, but produce large amounts of noisy output
      • DW high-frequency noise also appears to give good results, but seems extremely prone to false positives
  • 31. Future steps
      • For the web journalism case, robustness ought to be a central consideration for future algorithm evaluations
      • The Wild Web dataset is freely distributed for research purposes
          – Due to copyright considerations, this is currently only feasible through direct contact
          – The dataset should be maintained to incorporate new cases of forgeries, as they come out
      • Advance the state-of-the-art by focusing on more robust traces of splicing
      • Following the life-cycle of images on the web can help locate their earliest versions and build an account of the alterations that have taken place (Kennedy & Chang, 2008)
      • The question remains: to what extent is the task feasible? When can we be certain that all traces have been lost?
  • 32. References
      • Bianchi, Tiziano, Alessia De Rosa, and Alessandro Piva. "Improved DCT coefficient analysis for forgery localization in JPEG images." In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 2444-2447. IEEE, 2011.
      • Bianchi, Tiziano, and Alessandro Piva. "Image forgery localization via block-grained analysis of JPEG artifacts." Information Forensics and Security, IEEE Transactions on 7, no. 3 (2012): 1003-1017.
      • Ferrara, Pasquale, Tiziano Bianchi, Alessia De Rosa, and Alessandro Piva. "Image forgery localization via fine-grained analysis of CFA artifacts." Information Forensics and Security, IEEE Transactions on 7, no. 5 (2012): 1566-1577.
      • Farid, Hany. "Exposing digital forgeries from JPEG ghosts." Information Forensics and Security, IEEE Transactions on 4, no. 1 (2009): 154-160.
      • Fontani, Marco, Tiziano Bianchi, Alessia De Rosa, Alessandro Piva, and Mauro Barni. "A framework for decision fusion in image forensics based on Dempster–Shafer theory of evidence." Information Forensics and Security, IEEE Transactions on 8, no. 4 (2013): 593-607.
      • Kennedy, Lyndon, and Shih-Fu Chang. "Internet image archaeology: automatically tracing the manipulation history of photographs on the web." In Proceedings of the 16th ACM international conference on Multimedia, pp. 349-358. ACM, 2008.
      • Lin, Zhouchen, Junfeng He, Xiaoou Tang, and Chi-Keung Tang. "Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis." Pattern Recognition 42, no. 11 (2009): 2492-2501.
      • Mahdian, Babak, and Stanislav Saic. "Using noise inconsistencies for blind image forensics." Image and Vision Computing 27, no. 10 (2009): 1497-1503.
  • 33. Thank you!
      • Slides: http://www.slideshare.net/sympapadopoulos/detecting-image-splicing-in-the-wild-web
      • Get in touch:
          – @markzampoglou / markzampoglou@iti.gr
          – @sympapadopoulos / papadop@iti.gr