Similarity-based retrieval of
multimedia content
Dr. Symeon Papadopoulos
Senior Researcher, CERTH-ITI
Monday Jan 28, 2019 @ Media AUTh
Our lab
Multimedia Knowledge and
Social Media Analytics Laboratory
• Part of Information Technologies Institute (ITI) -
Centre for Research and Technology Hellas (CERTH)
• 60+ researchers (20+ post-docs)
• key areas: multimedia, social media, computer vision,
data mining, machine learning
• applications: media, security, culture, environment
• involved in 60+ projects and published 600+ papers
https://mklab.iti.gr/
Related projects
InVID (2016-2018): https://www.invid-project.eu/
WeVerify (2018-2021): https://weverify.eu/
https://www.smartinsights.com/internet-marketing-statistics/happens-online-60-seconds/
500 hours of video per min =
720,000 hours per day >
82 years of video per day!
Pope Francis
Pope Benedict
2007: iPhone release
2008: Android release
2010: iPad release
http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
Detecting disinformation
Claim:
Hurricane Irma, Sep 2017
Fact:
Hurricane Dolores, May 2016
A shark thriving in hurricanes
https://www.snopes.com/photos/animals/puertorico.asp
Memes
Similarity-based media search
Two main problems
• How to compute similarity between two
items (in accordance with my needs)?
• How to search (using the above similarity
function) in very large collections in
reasonable time?
visual similarity
an overview of approaches
What is similar?
• Variety of definitions and understandings regarding what
can be considered to be similar
• Near-duplicate videos: definition by Wu et al. (2007)
• photometric variations: gamma, contrast, brightness, etc.
• editing operations: resize, shift, crop, flip
• insertion of patterns: caption, logo, subtitles, sliding captions, etc.
• re-encoding: video format, compression
• video modifications: frame rate, frame insertion, deletion, swap
X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In
Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
Hashing
• Cryptographic or checksum hashing: MD5, SHA1
• Input: bitstream (not just images or videos)
• Output: hash code 128-bit (MD5), 160-bit (SHA1), etc.
• Property: minor changes in input can lead to completely
different hash codes
https://jenssegers.com/61/perceptual-image-hashes
Example
EA6BF04059B4CB0D889296F1788B321B
8435D4A072804237308F9566508C963C
http://onlinemd5.com/
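The property on the hashing slide (a minor change in the input gives a completely different hash code) can be checked with Python's standard `hashlib` module. This is a minimal sketch: the two byte strings are made-up stand-ins for two nearly identical image files, and `differing_bits` is a helper introduced here for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the 128-bit MD5 digest of `data` as 32 hex characters."""
    return hashlib.md5(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical inputs: two byte strings that differ only in the last byte.
original = b"pixel data of an image"
modified = b"pixel data of an imagf"

h1, h2 = md5_hex(original), md5_hex(modified)
print(h1)
print(h2)
print(differing_bits(h1, h2), "of 128 bits differ")
```

Because the digests of near-identical inputs are unrelated, such hashes can only detect exact copies, which is what motivates perceptual hashing.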
Perceptual hashing
• Generate a fingerprint that can be used to compare
images using the Hamming Distance
• Instance: Average Hashing (aHash)
• Reduce size → 8x8 pixels
• Reduce colour → RGB to grayscale
• Calculate average colour → among 64 grayscale values
• Compute hash → for each pixel, binary value depending
on whether it is higher or lower than average
→ 64-bit signature
aHash: example
11001001011010010011110000011000
00001000000000000000011100111111
https://jenssegers.com/61/perceptual-image-hashes
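The aHash steps can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the image is assumed to be a list of rows of (r, g, b) tuples at least 8 pixels wide and high, resizing is done by simple block averaging, grayscale is the plain mean of the three channels, and the toy images are made up.

```python
def shrink(gray, size=8):
    """Block-average a 2D grayscale grid down to size x size values."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(image, size=8):
    """Return the aHash of an RGB image as a (size * size)-bit integer."""
    # Colour is reduced before resizing here; with block averaging and a
    # linear grayscale formula both orders agree up to floating-point rounding.
    gray = [[sum(px) / 3.0 for px in row] for row in image]
    values = [v for row in shrink(gray, size) for v in row]
    mean = sum(values) / len(values)
    bits = 0
    for v in values:
        bits = (bits << 1) | (1 if v > mean else 0)  # 1 if brighter than average
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# Toy 16x16 image: left half black, right half white.
image = [[(0, 0, 0)] * 8 + [(255, 255, 255)] * 8 for _ in range(16)]
# Variant with a brighter dark half, and a horizontally flipped variant.
brighter = [[(40, 40, 40)] * 8 + [(255, 255, 255)] * 8 for _ in range(16)]
flipped = [list(reversed(row)) for row in image]

h = average_hash(image)
print(f"{h:016x}")
print(hamming(h, average_hash(brighter)))  # brightness change keeps the hash
print(hamming(h, average_hash(flipped)))   # flip inverts every bit
```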
dHash and pHash
• dHash: Difference Hash
• same steps as aHash
• hash is generated based on whether left pixel is brighter
than the right one
• fewer false positives compared to aHash
• pHash: Perceptual Hash
• more complicated algorithm
• resize to 32x32
• DCT on luma (brightness) component
• top left 8x8 → hash by comparing to median value
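The dHash rule (one bit per comparison of a pixel with its right-hand neighbour) can be sketched as below; pHash is not covered because it additionally needs a DCT. As an assumption borrowed from common implementations rather than from the slide, the input is an already downscaled grayscale grid of 8 rows by 9 columns, so that each row yields 8 comparisons and the hash has 64 bits.

```python
def difference_hash(small):
    """Return the dHash of an 8x9 grayscale grid as a 64-bit integer.

    Each bit is 1 when a pixel is brighter than its right-hand neighbour.
    """
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


# Toy grid: every row fades from bright (90) on the left to dark (10) on the right.
fading = [list(range(90, 0, -10)) for _ in range(8)]
# The same grid with a uniform brightness offset, and a mirrored version.
brighter = [[v + 50 for v in row] for row in fading]
mirrored = [list(reversed(row)) for row in fading]

print(f"{difference_hash(fading):016x}")
print(difference_hash(fading) == difference_hash(brighter))
print(f"{difference_hash(mirrored):016x}")
```

Since only the sign of neighbouring differences is kept, a uniform brightness change leaves the hash untouched, whereas mirroring the gradient flips every bit.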
pHash examples
Hamming distance = 0
Hamming distance = 24
Hamming distance = 29
Hamming distance = 27
https://www.phash.org/demo/ (select DCT hash)
Pixel-based similarity doesn’t
match perception
All three variations of the first image are equidistant
from it in terms of L2 pixel distance!
http://cs231n.github.io/classification/
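A toy calculation illustrates why L2 pixel distance can disagree with perception. The six-value "image" and both variants are made up for this sketch: one variant is a uniform brightness shift, the other adds an alternating pattern, yet both lie at exactly the same L2 distance from the original.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two flattened images of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


original = [100, 120, 140, 160, 180, 200]
# Variant 1: every pixel is 10 levels brighter (looks almost the same).
brighter = [v + 10 for v in original]
# Variant 2: pixels alternate between +10 and -10 (adds a visible pattern).
patterned = [v + (10 if i % 2 == 0 else -10) for i, v in enumerate(original)]

print(l2_distance(original, brighter))
print(l2_distance(original, patterned))
```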
Global descriptors
• A single vector that attempts to capture the main
visual properties of an image, e.g. distribution of
colour, spatial layout of brightness, textures, etc.
• Popular choices include:
• GIST – spatial envelope (Oliva & Torralba, 2001)
• Color: Dominant Color, Scalable Color, Color Structure,
Color Layout Descriptor (MPEG-7, 2001)
• Texture: Texture Browsing, Homogeneous Texture, Edge
Histogram (MPEG-7, 2001)
A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation
of the spatial envelope. IJCV, 42(3):145–175, 2001
Text of ISO/IEC 15 938-3 Multimedia Content Description Interface—Part 3: Visual.
Final Committee Draft, ISO/IEC/JTC1/SC29/ WG11, Doc. N4062, Mar. 2001
GIST-based near-duplicate search
Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., & Schmid, C. (2009, July). Evaluation of gist descriptors for web-
scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval (p. 19). ACM.
Local descriptors
• Basic scheme:
• Detect a set of features (i.e. interest points) in an image
• Extract one descriptor around each feature
• Plenty of options for both parts, e.g.:
• Feature detectors: Canny, Sobel, Harris, FAST, Laplacian
of Gaussian (LoG), Difference of Gaussians (DoG),
Determinant of Hessian (DoH), MSER
• Feature descriptors: SIFT, GLOH, SURF, ORB
• Much higher accuracy at the cost of increased
complexity
Scale-Invariant Feature Transform (SIFT)
Set of descriptors
A single descriptor
(16 histograms of 8 bins → 128 dims)
http://faculty.ucmerced.edu/mhyang/project/iccv13_exemplar/ICCV13_exemplarCut/vlfeat-0.9.14/doc/overview/sift.html
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer
vision, 60(2), 91-110.
Example: SIFT matching
https://www.cc.gatech.edu/~hays/compvision/proj2/
Bag of Visual Words (BoVW)
https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
extract a set of local features from each image
• a representative sample of features is selected
• features are clustered
• cluster centroids (or medoids) are considered to be the visual codebook
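Codebook construction and the resulting bag-of-visual-words histogram can be sketched with a small k-means loop in plain Python. Everything concrete here is an assumption for illustration: the descriptors are 2-D toy points rather than 128-D SIFT vectors, the initial centroids are picked at evenly spaced positions in the sample, and the function names are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(descriptor, codebook):
    """Index of the visual word (centroid) closest to a descriptor."""
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def build_codebook(sample, k, iterations=10):
    """Cluster a sample of descriptors with k-means; centroids are the codebook."""
    centroids = [sample[i * len(sample) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in sample:
            clusters[nearest(d, centroids)].append(d)
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members
            else centroids[i]  # keep the old centroid if a cluster is empty
            for i, members in enumerate(clusters)
        ]
    return centroids


def bovw_histogram(descriptors, codebook):
    """Count how often each visual word occurs among an image's descriptors."""
    return Counter(nearest(d, codebook) for d in descriptors)


# Toy sample: two well-separated groups of 2-D "descriptors".
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = build_codebook(sample, k=2)
print(codebook)
print(bovw_histogram([(0.5, 0.5), (9, 9), (10, 9)], codebook))
```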
Indexing and Querying
• tf-idf weighting of visual words
$w_{td} = n_{td} \cdot \log\left(|D_b| / n_t\right)$
• Inverted file indexing structure for fast search
• Retrieve candidates with at least one common
visual word
• Rank candidates, e.g. based on cosine similarity
of their tf-idf representations
$sim(q, p) = \dfrac{\mathbf{w}_q \cdot \mathbf{w}_p}{\lVert \mathbf{w}_q \rVert \, \lVert \mathbf{w}_p \rVert}$
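The indexing and querying steps (tf-idf weighting, inverted file, candidate retrieval, cosine ranking) fit in a short plain-Python sketch. The three toy documents, their visual word ids and all function names are made up for illustration, and the collection size is simply the number of indexed items.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """docs maps an item id to its list of visual word ids."""
    doc_freq = Counter(w for words in docs.values() for w in set(words))
    idf = {w: math.log(len(docs) / n_t) for w, n_t in doc_freq.items()}
    # tf-idf vector per item: term count times inverse document frequency.
    vectors = {d: {w: n * idf[w] for w, n in Counter(words).items()}
               for d, words in docs.items()}
    # Inverted file: visual word -> set of items containing it.
    inverted = defaultdict(set)
    for d, words in docs.items():
        for w in words:
            inverted[w].add(d)
    return idf, vectors, inverted


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[w] for w, x in u.items() if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query_words, idf, vectors, inverted):
    """Rank the items sharing at least one visual word with the query."""
    q_vec = {w: n * idf[w] for w, n in Counter(query_words).items() if w in idf}
    candidates = set().union(*(inverted.get(w, set()) for w in query_words))
    ranked = [(d, cosine(q_vec, vectors[d])) for d in candidates]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


docs = {"a": [1, 1, 2], "b": [2, 3], "c": [4, 4, 5]}
idf, vectors, inverted = build_index(docs)
results = search([1, 2], idf, vectors, inverted)
print(results)
```

Item "c" shares no visual word with the query, so it is never scored; this is the saving that the inverted file provides on large collections.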
BoVW Discussion
• BoVW is a sparse representation: each image is
associated with few visual words (compared to the
whole vocabulary)
• Convenient for indexing and look-up
• Completely misses spatial layout → extensions
• Performance depends on:
• size of vocabulary
• dataset where vocabulary was learned
Neural network features
https://www.pnas.org/content/116/4/1074 (artist Lucy Reading-Ikkanda)
Popular CNN architectures
VGGNet (2014)
GoogLeNet (2014)
https://cs.stanford.edu/people/karpathy/cnnembed/
video search
towards building a reverse video
search engine
From Image to Video Similarity
• A video can be considered as a richer
representation compared to images:
• set of images (frames)
• frames and motion
• frames and motion and audio
• For efficiency purposes, we typically simplify or
discard part of the information:
• frames → descriptors → average
• frames → visual words → bag of frame-words
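Both simplifications can be sketched in a few lines. The frame descriptors and visual word ids below are made-up toy values, and the function names are hypothetical; in practice the descriptors would come from a feature extractor and the word ids from a codebook.

```python
from collections import Counter


def average_descriptor(frame_descriptors):
    """frames -> descriptors -> average: one vector for the whole video."""
    n = len(frame_descriptors)
    return [sum(column) / n for column in zip(*frame_descriptors)]


def bag_of_frame_words(frame_words):
    """frames -> visual words -> bag: count of each visual word in the video."""
    return Counter(frame_words)


# Toy video with three frames, each described by a 2-D vector.
descriptors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# The same three frames after each was mapped to one visual word id.
words = [3, 3, 7]

print(average_descriptor(descriptors))
print(bag_of_frame_words(words))
```

Both representations discard frame order, motion and audio, which is the price paid for a compact video-level signature.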
Video search architecture
Video indexing calls
/index (HTTP GET request)
Add the provided video to the video index
• url: the URL of the video that is going to be indexed
• async: flag for asynchronous processing
/youtube (HTTP GET request)
Query YouTube API with either a video ID or a provided text query
and add the retrieved videos to the video index
• video_id: video ID to query YouTube API
• text: provided text to query YouTube API
• max: maximum number of videos to be added to the video index
/delete (HTTP DELETE request)
Delete the provided video from the video index
• url: the URL of the video that is going to be deleted
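The indexing calls can be expressed as request descriptions built with the standard library. This sketch only constructs the method and URL and sends nothing; the base address, the example video URL and the `"true"` value for the `async` flag are assumptions, since the slides give only the endpoint and parameter names.

```python
from urllib.parse import urlencode

# Hypothetical address of a running video search service.
BASE_URL = "http://localhost:8080"


def build_call(method, endpoint, params):
    """Return (HTTP method, full URL) for one of the service endpoints."""
    return method, f"{BASE_URL}/{endpoint}?{urlencode(params)}"


# Parameters are passed as a dict because `async` is a reserved word in Python.
index_call = build_call(
    "GET", "index", {"url": "https://example.com/v.mp4", "async": "true"})
youtube_call = build_call(
    "GET", "youtube", {"text": "boston marathon", "max": 10})
delete_call = build_call(
    "DELETE", "delete", {"url": "https://example.com/v.mp4"})

for method, url in (index_call, youtube_call, delete_call):
    print(method, url)
```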
Video search calls
/search (HTTP GET request)
Video-level search: retrieve relevant videos by calculating the
similarity between entire videos
• url: URL of the query video
• t_sim: similarity threshold
• t_rank: rank threshold
/partial (HTTP GET request)
Shot-level search: retrieve relevant video segments from the indexed
videos in the database
• url: URL of the query video
• v_sim: video similarity threshold
• s_sim: shot similarity threshold
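One plausible reading of the two video-level thresholds is sketched below. This is an assumption, not the documented behaviour of the service: `t_sim` is taken as a minimum similarity score and `t_rank` as the maximum number of results returned, and the scored list is made up.

```python
def filter_results(scored, t_sim, t_rank):
    """Keep results with similarity >= t_sim, best first, at most t_rank items.

    `scored` is a list of (video_id, similarity) pairs. The interpretation of
    both thresholds is an assumption made for this sketch.
    """
    kept = [(vid, sim) for vid, sim in scored if sim >= t_sim]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:t_rank]


# Hypothetical similarity scores between a query video and indexed videos.
scored = [("v1", 0.42), ("v2", 0.91), ("v3", 0.77), ("v4", 0.80)]
print(filter_results(scored, t_sim=0.75, t_rank=2))
```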
Combining CNNs and BoVW
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by
aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
An improved setup
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by
aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
Learning similarity
Before training
After training
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with
Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE
Support for partial duplicate search
FIVR-200K
a dataset for evaluating NDVR
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018).
FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
FIVR-200K
• A video dataset to help research on the problem of
Fine-grained Incident Video Retrieval
• Duplicate Scene Videos (DSVs)
• Complementary Scene Videos (CSVs)
• Incident Scene Videos (ISVs)
• 225,960 videos around 4,687 news events from Jan
1st 2013 to Dec 31st 2017
Wikipedia: current events
https://en.wikipedia.org/wiki/Portal:Current_events
Dataset statistics
[Figures: number of events, number of videos, video category, video duration]
Example videos
Boston Marathon bombing
query near-duplicate
complementary view same incident
Las Vegas shootings
query near-duplicate
complementary view same incident
Our Video Search Tool
http://ndd.iti.gr/video_search/
Ideas
• Pick one video around one event between 2013
and 2017 and try to find similar versions of it
• Pick one of the clusters-events in the Browse
section and try to find some important videos that
cover the event
• Given an event of interest, identify in which sources
it is covered (language, country, type of channel)
• Add videos from a newer event and use them to
perform new searches
Source code
https://github.com/MKLab-ITI/intermediate-cnn-features
https://github.com/MKLab-ITI/ndvr-dml
Papers
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y.
(2017, January). Near-duplicate video retrieval by aggregating
intermediate CNN layers. In International Conference on Multimedia
Modeling (pp. 251-263). Springer
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y.
(2017, October). Near-Duplicate Video Retrieval with Deep Metric
Learning. In 2017 IEEE International Conference on Computer Vision
Workshop (ICCVW), (pp. 347-356). IEEE
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I.
(2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint
arXiv:1809.04094
Acknowledgements
• Giorgos Kordopatis-Zilos / near-duplicate video
retrieval, back-end development, FIVR-200K
collection and annotation
• Lazaros Apostolidis / web front-end development
• Polichronis Charitidis / FIVR-200K annotation
Thank you for your attention!
Akis Papadopoulos papadop@iti.gr
@sympap

Introduction to Open Source RAG and RAG EvaluationZilliz
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...Sri Ambati
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 

Similarity-based retrieval of multimedia content

  • 1. Similarity-based retrieval of multimedia content Dr. Symeon Papadopoulos Senior Researcher, CERTH-ITI Monday Jan 28, 2019 @ Media AUTh
  • 2. Our lab Multimedia Knowledge and Social Media Analytics Laboratory • Part of Information Technologies Institute (ITI) - Centre for Research and Technology Hellas (CERTH) • 60+ researchers (20+ post-docs) • key areas: multimedia, social media, computer vision, data mining, machine learning • applications: media, security, culture, environment • involved in 60+ projects and published 600+ papers https://mklab.iti.gr/
  • 5. 500 hours of video per min = 720,000 hours per day > 82 years of video per day!
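The arithmetic on this slide can be checked in a few lines. This is only a sanity check of the quoted figures, assuming a 365-day year; the variable names are illustrative.

```python
# Upload rate quoted on the slide.
HOURS_UPLOADED_PER_MINUTE = 500

# 60 minutes per hour, 24 hours per day.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * 60 * 24

# Convert hours of footage into years of continuous playback (365-day year assumed).
years_of_video_per_day = hours_per_day / (24 * 365)

print(hours_per_day)                     # 720000
print(round(years_of_video_per_day, 1))  # 82.2
```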
  • 6. Pope Francis Pope Benedict 2007: iPhone release 2008: Android release 2010: iPad release http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
  • 9. Detecting disinformation Claim: Hurricane Irma, Sep 2017 Fact: Hurricane Dolores, May 2016
  • 10. A shark thriving in hurricanes https://www.snopes.com/photos/animals/puertorico.asp
  • 12. Memes
  • 13. Similarity-based media search Two main problems •How to compute similarity between two items (in accordance with my needs)? •How to search (using above similarity function) in very large collections in reasonable time?
  • 15. What is similar? • Variety of definitions and understandings regarding what can be considered to be similar • Near-duplicate videos: definition by Wu et al. (2007) • photometric variations: gamma, contrast, brightness, etc. • editing operations: resize, shift, crop, flip • insertion of patterns: caption, logo, subtitles, sliding captions, etc. • re-encoding: video format, compression • video modifications: frame rate, frame insertion, deletion, swap X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
  • 16. Hashing • Cryptographic or checksum hashing: MD5, SHA1 • Input: bitstream (not just images or videos) • Output: hash code 128-bit (MD5), 160-bit (SHA1), etc. • Property: minor changes in input can lead to completely different hash codes https://jenssegers.com/61/perceptual-image-hashes
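The property on this slide (a minor change in the input yields a completely different hash code) can be observed with Python's standard `hashlib` module. The byte string below is a made-up stand-in for a media file, and `bit_difference` is a helper introduced only for this illustration.

```python
import hashlib

def bit_difference(digest_a: bytes, digest_b: bytes) -> int:
    """Count the bits that differ between two equal-length digests."""
    return sum(bin(a ^ b).count("1") for a, b in zip(digest_a, digest_b))

# Stand-in for the bitstream of a media file (any bytes would do).
original = b"frame-0001:" + bytes(range(256))

# Flip a single bit in the last byte.
modified = bytearray(original)
modified[-1] ^= 1
modified = bytes(modified)

md5_a, md5_b = hashlib.md5(original).digest(), hashlib.md5(modified).digest()
sha1_a, sha1_b = hashlib.sha1(original).digest(), hashlib.sha1(modified).digest()

print(len(md5_a) * 8, len(sha1_a) * 8)  # digest sizes in bits: 128 160
print(bit_difference(md5_a, md5_b))     # typically around half of the 128 bits
print(bit_difference(sha1_a, sha1_b))   # typically around half of the 160 bits
```

Because the two digests are unrelated, such hashes can detect exact copies but say nothing about how similar two files are.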
  • 18. Perceptual hashing • Generate a fingerprint that can be used to compare images using the Hamming distance • Instance: Average Hashing (aHash) • Reduce size → 8x8 pixels • Reduce colour → RGB to grayscale • Calculate average colour → among 64 grayscale values • Compute hash → for each pixel, binary value depending on whether it is higher or lower than average → 64-bit signature
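The aHash steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the image is assumed to be a list of rows of `(r, g, b)` tuples whose height and width are multiples of 8, the resize is done by simple block averaging, and the grayscale weights are the common BT.601 luma weights (the slide does not specify either choice).

```python
def shrink(rgb_image, size=8):
    """Reduce size: average equal blocks down to size x size pixels."""
    block_h = len(rgb_image) // size
    block_w = len(rgb_image[0]) // size
    small = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [rgb_image[y][x]
                     for y in range(i * block_h, (i + 1) * block_h)
                     for x in range(j * block_w, (j + 1) * block_w)]
            row.append(tuple(sum(px[c] for px in block) / len(block) for c in range(3)))
        small.append(row)
    return small

def to_grayscale(rgb_image):
    """Reduce colour: RGB to grayscale (BT.601 weights, an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]

def average_hash(rgb_image):
    """Return the 64-bit aHash signature as an integer."""
    pixels = [p for row in to_grayscale(shrink(rgb_image)) for p in row]
    mean = sum(pixels) / len(pixels)        # average of the 64 grayscale values
    bits = 0
    for p in pixels:                        # one bit per pixel: above average or not
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two signatures."""
    return bin(hash_a ^ hash_b).count("1")

# Toy 16x16 image: horizontal gradient from dark (left) to bright (right).
gradient = [[(x * 16, x * 16, x * 16) for x in range(16)] for _ in range(16)]
brighter = [[(r + 10, g + 10, b + 10) for r, g, b in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]

print(f"{average_hash(gradient):016x}")                                   # 0f0f0f0f0f0f0f0f
print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(gradient), average_hash(mirrored)))  # 64
```

A uniform brightness shift leaves the signature unchanged, whereas mirroring the gradient flips every bit, which shows both the robustness and the limits of this fingerprint.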
  • 20. dHash and pHash • dHash: Difference Hash • same steps as aHash • hash is generated based on whether left pixel is brighter than the right one • fewer false positives compared to aHash • pHash: Perceptual Hash • more complicated algorithm • resize to 32x32 • DCT on luma (brightness) component • top left 8x8 → hash by comparing to median value
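The dHash comparison rule can be sketched as follows. The input is assumed to be an already resized grayscale grid of 8 rows by 9 columns, a common convention that yields 8 left/right comparisons per row and therefore 64 bits; the slide does not state the grid size, so treat it as an assumption. pHash is not covered here because it additionally needs a DCT.

```python
def difference_hash(gray_rows):
    """Return a dHash signature: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when the left pixel is brighter than its right neighbour.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

# Toy 8x9 grids: brightness falling or rising from left to right.
falling = [[9 - x for x in range(9)] for _ in range(8)]
rising = [[x for x in range(9)] for _ in range(8)]

print(difference_hash(falling) == (1 << 64) - 1)  # True: every left pixel is brighter
print(difference_hash(rising))                    # 0: no left pixel is brighter
```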
  • 21. pHash examples Hamming distance = 0 Hamming distance = 24 Hamming distance = 29 Hamming distance = 27 https://www.phash.org/demo/ (select DCT hash)
  • 22. Pixel-based similarity doesn’t match perception All three variations of the first image are equidistant from it in terms of L2 pixel distance! http://cs231n.github.io/classification/
  • 23. Global descriptors • A single vector that attempts to capture the main visual properties of an image, e.g. distribution of colour, spatial layout of brightness, textures, etc. • Popular choices include: • GIST – spatial envelope (Oliva & Torralba, 2001) • Color: Dominant Color, Scalable Color, Color Structure, Color Layout Descriptor (MPEG-7, 2001) • Texture: Texture Browsing, Homogeneous Texture, Edge Histogram (MPEG-7, 2001) A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42(3):145–175, 2001 Text of ISO/IEC 15938-3 Multimedia Content Description Interface - Part 3: Visual. Final Committee Draft, ISO/IEC/JTC1/SC29/WG11, Doc. N4062, Mar. 2001
  • 24. GIST-based near-duplicate search Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., & Schmid, C. (2009, July). Evaluation of gist descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval (p. 19). ACM.
  • 25. Local descriptors • Basic scheme: • Detect a set of features (i.e. interest points) in an image • Extract one descriptor around each feature • Plenty of options for both parts, e.g.: • Feature detectors: Canny, Sobel, Harris, FAST, Laplacian of Gaussian (LoG), Difference of Gaussians (DoG), Determinant of Hessian (DoH), MSER • Feature descriptors: SIFT, GLOH, SURF, ORB • Much higher accuracy at the cost of increased complexity
  • 26. Scale-Invariant Feature Transform (SIFT) Set of descriptors A single descriptor (16 histograms of 8 bins → 128 dims) http://faculty.ucmerced.edu/mhyang/project/iccv13_exemplar/ICCV13_exemplarCut/vlfeat-0.9.14/doc/overview/sift.html Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2), 91-110.
  • 28. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
  • 29. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb extract a set of local features from each image
  • 30. Bag of Visual Words (BoVW) • a representative sample of features selected • features are clustered • cluster centroids (or medoids) are considered to be the visual codebook https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
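The codebook construction described on this slide can be sketched with a small k-means in plain Python, followed by the usual next step of quantizing an image's local features into a visual-word histogram. The 2-D toy descriptors, the naive "first k samples" initialisation and all function names are assumptions made for illustration; real systems cluster high-dimensional descriptors such as SIFT with far larger vocabularies.

```python
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, vector):
    """Index of the centroid (visual word) closest to the vector."""
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], vector))

def kmeans(samples, k, iterations=10):
    """Cluster sampled features; the centroids form the visual codebook."""
    centroids = [list(s) for s in samples[:k]]   # naive deterministic initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in samples:
            clusters[nearest(centroids, s)].append(s)
        for i, members in enumerate(clusters):
            if members:                            # keep the old centroid if a cluster empties
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def bovw_histogram(codebook, descriptors):
    """Count how many of an image's descriptors fall on each visual word."""
    counts = Counter(nearest(codebook, d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(codebook))]

# Representative sample of (toy, 2-D) local features drawn from many images.
sample = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
codebook = kmeans(sample, k=2)

# Local features extracted from one image, mapped to visual words.
image_descriptors = [(0.5, 0.5), (9, 9), (10, 12)]
print(bovw_histogram(codebook, image_descriptors))  # [1, 2]
```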
  • 31. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
  • 32. Indexing and Querying • tf-idf weighting of visual words: $w_{td} = n_{td} \cdot \log(D_b / n_t)$ • Inverted file indexing structure for fast search • Retrieve candidates with at least one common visual word • Rank candidates, e.g. based on cosine similarity of their tf-idf representations: $sim(q, p) = \frac{\mathbf{w}_q \cdot \mathbf{w}_p}{\lVert \mathbf{w}_q \rVert \, \lVert \mathbf{w}_p \rVert}$
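The indexing and querying steps on this slide can be sketched as below. The reading of the symbols is an assumption: the term frequency is taken as the count of a visual word in an image, the numerator of the logarithm as the number of indexed images, and the denominator as the number of images containing the word. The data and function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def document_frequencies(images):
    """Number of images in which each visual word occurs."""
    return Counter(w for words in images.values() for w in set(words))

def tfidf(words, doc_freq, n_images):
    """Sparse tf-idf vector {word: weight}; words unseen at indexing time are ignored."""
    counts = Counter(w for w in words if w in doc_freq)
    return {w: n * math.log(n_images / doc_freq[w]) for w, n in counts.items()}

def build_inverted_index(images):
    """Inverted file: visual word -> set of images containing it."""
    index = defaultdict(set)
    for image_id, words in images.items():
        for w in words:
            index[w].add(image_id)
    return index

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def search(query_words, images):
    doc_freq = document_frequencies(images)
    index = build_inverted_index(images)
    query_vec = tfidf(query_words, doc_freq, len(images))
    # Candidates share at least one visual word with the query.
    candidates = set()
    for w in query_vec:
        candidates |= index[w]
    scored = [(cosine(query_vec, tfidf(images[c], doc_freq, len(images))), c) for c in candidates]
    return sorted(scored, reverse=True)

# Toy collection: image id -> visual word ids assigned to its local features.
images = {"a": [1, 1, 2], "b": [2, 3], "c": [4, 4, 5]}
ranked = search([1, 2], images)
print([image_id for _, image_id in ranked])  # ['a', 'b']
```

Image `c` shares no visual word with the query, so the inverted file never touches it; this is what keeps look-up fast for sparse representations.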
  • 33. BoVW Discussion • BoVW is a sparse representation: each image is associated with few visual words (compared to the whole vocabulary) • Convenient for indexing and look-up • Completely misses spatial layout → extensions • Performance depends on: • size of vocabulary • dataset where vocabulary was learned
  • 35. Popular CNN architectures VGGNet (2014) GoogLeNet (2014)
  • 37. video search towards building a reverse video search engine
  • 38. From Image to Video Similarity • A video can be considered as a richer representation compared to images: • set of images (frames) • frames and motion • frames and motion and audio • For efficiency purposes, we typically simplify or discard part of the information: • frames → descriptors → average • frames → visual words → bag of frame-words
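The two simplifications listed on this slide can be sketched as follows. The frame descriptors and frame-level visual words are toy values, and the function names are illustrative; in practice the descriptors would come from a global descriptor or a CNN and the words from a codebook.

```python
from collections import Counter

def average_descriptor(frame_descriptors):
    """frames -> descriptors -> average: one vector for the whole video."""
    n_frames = len(frame_descriptors)
    return [sum(column) / n_frames for column in zip(*frame_descriptors)]

def bag_of_frame_words(frame_words):
    """frames -> visual words -> bag: count how often each frame-word occurs."""
    return Counter(frame_words)

# Toy video with two frames, each described by a 2-D descriptor.
video_descriptors = [[1.0, 0.0], [3.0, 2.0]]
print(average_descriptor(video_descriptors))  # [2.0, 1.0]

# Toy video with three frames, each quantized to a single visual word id.
video_words = [3, 3, 7]
print(bag_of_frame_words(video_words))        # Counter({3: 2, 7: 1})
```

Both representations discard frame order, which is part of the information given up for efficiency.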
  • 40. Video indexing calls
      /index (HTTP GET request): add the provided video to the video index
        • url: the URL of the video to be indexed
        • async: flag for asynchronous processing
      /youtube (HTTP GET request): query the YouTube API with either a video ID or a text query and add the retrieved videos to the video index
        • video_id: video ID to query the YouTube API with
        • text: text to query the YouTube API with
        • max: maximum number of videos to be added to the video index
      /delete (HTTP DELETE request): delete the provided video from the video index
        • url: the URL of the video to be deleted
  • 41. Video search calls
      /search (HTTP GET request): video-level search, retrieves relevant videos by calculating the similarity between entire videos
        • url: URL of the query video
        • t_sim: similarity threshold
        • t_rank: rank threshold
      /partial (HTTP GET request): shot-level search, retrieves relevant video segments from the indexed videos in the database
        • url: URL of the query video
        • v_sim: video similarity threshold
        • s_sim: shot similarity threshold
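A client for the indexing and search calls listed above might build its requests along these lines. Only the paths, HTTP methods and parameter names come from the slides; the base URL, the example video URL, the parameter values (including the `"true"` encoding of the `async` flag) and the choice to pass parameters in the query string are assumptions. The sketch only constructs the requests and does not send them.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"        # hypothetical service address
VIDEO_URL = "https://example.com/v.mp4"   # hypothetical video to index and query

def build_request(path, params, method="GET"):
    """Build (but do not send) a request with parameters in the query string."""
    url = f"{BASE_URL}{path}?{urlencode(params)}"
    return urllib.request.Request(url, method=method)

# Indexing calls.
index_req = build_request("/index", {"url": VIDEO_URL, "async": "true"})
youtube_req = build_request("/youtube", {"text": "hurricane", "max": 10})
delete_req = build_request("/delete", {"url": VIDEO_URL}, method="DELETE")

# Search calls; the threshold values are placeholders, not recommended settings.
search_req = build_request("/search", {"url": VIDEO_URL, "t_sim": 0.5, "t_rank": 20})
partial_req = build_request("/partial", {"url": VIDEO_URL, "v_sim": 0.5, "s_sim": 0.8})

for req in (index_req, youtube_req, delete_req, search_req, partial_req):
    print(req.get_method(), req.full_url)

# Sending a request would then be a matter of, for example:
#   with urllib.request.urlopen(search_req) as response:
#       body = response.read()
```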
  • 42. Combining CNNs and BoVW Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
  • 43. An improved setup Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
  • 44. Learning similarity Before training After training Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE
  • 45. Support for partial duplicate search
  • 46. FIVR-200K a dataset for evaluating NDVR Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
  • 47. FIVR-200K • A video dataset to help research on the problem of Fine-grained Incident Video Retrieval • Duplicate Scene Videos (DSVs) • Complementary Scene Videos (CSVs) • Incident Scene Videos (ISVs) • 225,960 videos around 4,687 news events from Jan 1st 2013 to Dec 31st 2017
  • 49. Dataset statistics Number of events Number of videos
  • 54. Boston Marathon bombing query near-duplicate complementary view same incident
  • 55. Las Vegas shootings query near-duplicate complementary view same incident
  • 56. Our Video Search Tool http://ndd.iti.gr/video_search/
  • 57. Ideas • Pick one video around one event between 2013 and 2017 and try to find similar versions of it • Pick one of the clusters-events in the Browse section and try to find some important videos that cover the event • Given an event of interest, identify in which sources it is covered (language, country, type of channel) • Add videos from a newer event and use them to perform new searches
  • 59. Papers • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
  • 60. Acknowledgements • Giorgos Kordopatis-Zilos / near-duplicate video retrieval, back-end development, FIVR-200K collection and annotation • Lazaros Apostolidis / web front-end development • Polichronis Charitidis / FIVR-200K annotation
  • 61. Thank you for your attention! Akis Papadopoulos papadop@iti.gr @sympap