SlideShare a Scribd company logo
ADBIS 2013 Conference
Genoa, Italy, Sep 1-4, 2013
Compact and Distinctive Visual Vocabularies for
Efficient Multimedia Data Indexing
Dimitris Kastrinakis1
, Symeon Papadopoulos2
, Athena Vakali1
1
Aristotle University of Thessaloniki, Department of Informatics (AUTH)
2
Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI)
#2
Overview
• Problem formulation
• Related work
• Proposed method
• Evaluation
• Conclusions
#3
Motivation
• Multimedia collections are ever-growing
– Personal photo collections can easily reach several
thousands of photos
– Professional photo archives are typically in the range of
hundreds of thousands to many millions of photos
– Online photos are many billions
– It is estimated that a total of 3.5 trillion photos have been
captured by people so far
• Need for effective and efficient search!
– Prevalent paradigm: Content-based image search
http://blog.1000memories.com/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox
#4
Problem Formulation (1)
Content-based image search (CBIR)
• Also known as “example-based search”, “similarity-
based search”
• Given an indexed collection of images and a query
image (typically not part of the collection) fetch N
images of the collection that are most similar to the
query image
– similarity is typically computed on the basis of Euclidean
distance between feature vectors extracted from the
visual content of images
#5
Problem Formulation (2)
• Recent CBIR systems make use of local descriptors (e.g. SIFT,
SURF) extracted from images, according to the BoW retrieval
paradigm.
– Each image is represented as a set (“bag”) of visual words
• Visual words are the result of a learning process (clustering +
quantization) on a large set of images
– An inverted index structure is used to speed-up retrieval
• Such systems achieve good search accuracy, but:
– For this to happen, the number of visual words in the vocabulary need
to be very large (~105
-106
)
– In case of very large vocabularies, we face two computational
problems: (a) creating the vocabularies (offline), and (b) quantizing
the local descriptors of a new image to the vocabulary words (online)
#6
Overview of BoW indexing
Feature extraction
(e.g. SIFT, SURF)
Feature clustering
+ quantization
Visual
vocabulary
VOCABULARY LEARNING
Feature extraction
(e.g. SIFT, SURF)
Feature-to-
vocabulary
mapping
TRAINING COLLECTION
COLLECTION TO INDEX
QUERY IMAGE
Feature extraction
(e.g. SIFT, SURF)
Feature-to-
vocabulary
mapping
Collection
index
INDEXING
RETRIEVAL
#7
Our contribution
• Propose the concept of Composite Visual Words
(CVW) that reduces the need for many visual words
– CVW are permutations of plain visual words
– Using CVW instead of plain visual words makes it possible
to achieve similar search accuracy levels having much
fewer visual words in the vocabulary
• Experimentally validate our approach in two
standard datasets (Oxford and Paris buildings)
– With a vocabulary of 200 visual words and the use of CVW
we manage to match the retrieval performance of
approaches that use two-three orders of magnitude more
visual words.
#8
Our contribution
Feature extraction
(e.g. SIFT, SURF)
Feature clustering
+ quantization
Visual
vocabulary
VOCABULARY LEARNING
Feature extraction
(e.g. SIFT, SURF)
Feature-to-
vocabulary
mapping
TRAINING COLLECTION
COLLECTION TO INDEX
QUERY IMAGE
Feature extraction
(e.g. SIFT, SURF)
Feature-to-
vocabulary
mapping
Collection
index
INDEXING
RETRIEVAL
CVW
#9
Notation
#10
Related Work
• First use of BoW approach for image similarity search (Sivic &
Zisserman 2003)
• Approaches to speed up original method:
– Hierarchical vocabulary tree (Nister & Stewenius, 2007)
– Approximate k-means clustering (Philbin et al., 2007)
• More expressive (and expensive!) representations:
– Soft assignment of descriptors to words (Philbin et al., 2008) 
requires more index space and time
– Visual phrases (Yuan et al., 2007)  does not take into account order
of words, expensive mining phase
– Words + phrases (Zhang et al., 2009)  high accuracy but
computationally intensive learning and still large vocabulary
– Vocabulary that maintains spatial layout of words (Zhang et al., 2010),
bundled features (Wu et al., 2009)  complex vocabulary generation
process + not high coverage of feature space
#11
Proposed Approach (1)
• Instead of indexing using plain visual words, we index
using permutations of visual words based on their
relative distance from the image to be indexed
PLAIN VISUAL WORDS COMPOSITE VISUAL WORDS
#12
Proposed Approach (2)
• Important distinction:
– Original visual vocabulary (V): Number of visual words
that are the result of the clustering and quantization
process on the training collection.
– Effective visual vocabulary (V’): Number of composite
visual words that can be used for indexing.
Maximum theoretical size:
– For instance, if |V|=100 and B = 3 (the maximum length of
a permutation), then |V|’max = 970,200
This is the main way to increase the distinctive capability
of the vocabulary without increasing complexity.
#13
Proposed Approach (3)
• Caveat: If the maximum effective vocabulary size increases a
lot (e.g. several millions), the retrieval performance might be
harmed since several CVWs will appear very sparsely.
• For this, we employ a thresholding strategy to make sure that
the resulting CVWs are high-quality.
where d(u,f) is the Euclidean distance between local feature f
and word u, while max d(u’,f) is the maximum distance
between feature f and any word of V. Parameter α controls
the “strictness” of the threshold (larger α means stricter
thresholding).
#14
Algorithmic description of approach
#15
Evaluation
• Datasets: Oxford and Paris buildings
• Implementation:
– Local descriptors: SIFT, 2000 features/image (Oxford),
1000 features/image (Paris)
– Inverted index: Apache Solr
• Evaluation Measure:
mean Average Precision (mAP)
#16
Results (1)
OXFORD PARIS
#17
Results (2)
EXAMPLE RESULTS (|V|=200)
Comparison to SoA
#18
Conclusions
SUMMARY
• Approach to increase distinctive capability of visual
vocabulary without harming efficiency.
• Validation in standard datasets demonstrates
significant reduction in vocabulary size while
maintaining state-of-the-art retrieval accuracy.
FUTURE WORK
• Tests in larger datasets (e.g. using millions of images
as distractors)
• Alternative thresholding or CVW filtering strategies.
#19
References (1)
• Classic BoW indexing
Sivic, J., Zisserman, A.: Video Google: A text retrieval
approach to object matching in videos. Ninth IEEE
International Conference on Computer Vision, ICCV (2003)
• Efficient BoW indexing
Nister, D., Stewenius, H.: Scalable recognition with a
vocabulary tree. IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, Vol. 2. (2006)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object
retrieval with large vocabularies and fast spatial matching.
IEEE Conference on Computer Vision and Pattern
Recognition, CVPR’07 (2007)
#20
References (2)
• Richer BoW-based representations
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in
quantization: Improving particular object retrieval in large scale image
databases. IEEE Conference on Computer Vision and Pattern
Recognition, CVPR (2008)
Wu, Z., Ke, Q., Isard, M., Sun, J.: Bundling features for large scale partial-
duplicate web image search. IEEE Conference on Computer Vision and
Pattern Recognition, CVPR (2009)
Yuan, J., Wu, Y., Yang, M.: Discovery of collocation patterns: from visual
words to visual phrases. In IEEE Conference on Computer Vision and
Pattern Recognition, CVPR’07, 1–8, IEEE (2007)
Zhang, S., Tian, Q., Hua, G., Huang, Q., Li, S.: Descriptive visual words and
visual phrases for image applications. In Proceedings of the 17th ACM
international conference on Multimedia, 75–84, ACM (2009)
Zhang, S., Huang, Q., Hua, G., Jiang, S., Gao, W., Tian, Q: Building
contextual visual vocabulary for large-scale image applications. In
Proceedings of the international conference on Multimedia, 501–510,
ACM (2010)
#21
Questions
Further contact:
papadop@iti.gr
Acknowledgement:
www.socialsensor.eu

More Related Content

Similar to Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Indexing

Fusion of demands in review of bag of-visual words
Fusion of demands in review of bag of-visual wordsFusion of demands in review of bag of-visual words
Fusion of demands in review of bag of-visual words
eSAT Publishing House
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics Interoperability
Open Cyber University of Korea
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
IJDKP
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
IJDKP
 
F010433136
F010433136F010433136
F010433136
IOSR Journals
 
Evaluating the User Experience of Virtual Learning Environments Using Biometr...
Evaluating the User Experience of Virtual Learning Environments Using Biometr...Evaluating the User Experience of Virtual Learning Environments Using Biometr...
Evaluating the User Experience of Virtual Learning Environments Using Biometr...
Renée Schulz
 
Video + Language
Video + LanguageVideo + Language
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...researchinventy
 
Research Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and ScienceResearch Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and Scienceresearchinventy
 
Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?
klschoef
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
Goergen Institute for Data Science
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
Goergen Institute for Data Science
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalIAEME Publication
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalIAEME Publication
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
Jonathon Hare
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
Jonathon Hare
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
Jonathon Hare
 

Similar to Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Indexing (20)

Fusion of demands in review of bag of-visual words
Fusion of demands in review of bag of-visual wordsFusion of demands in review of bag of-visual words
Fusion of demands in review of bag of-visual words
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics Interoperability
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
 
F010433136
F010433136F010433136
F010433136
 
50120130404055
5012013040405550120130404055
50120130404055
 
Evaluating the User Experience of Virtual Learning Environments Using Biometr...
Evaluating the User Experience of Virtual Learning Environments Using Biometr...Evaluating the User Experience of Virtual Learning Environments Using Biometr...
Evaluating the User Experience of Virtual Learning Environments Using Biometr...
 
Video + Language 2019
Video + Language 2019Video + Language 2019
Video + Language 2019
 
Video + Language
Video + LanguageVideo + Language
Video + Language
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...
 
Research Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and ScienceResearch Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and Science
 
Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search: Where is the User in the Age of Deep Learning?
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrieval
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrieval
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
 

More from Symeon Papadopoulos

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
Symeon Papadopoulos
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
Symeon Papadopoulos
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
Symeon Papadopoulos
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Symeon Papadopoulos
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
Symeon Papadopoulos
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air Quality
Symeon Papadopoulos
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
Symeon Papadopoulos
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
Symeon Papadopoulos
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering Detection
Symeon Papadopoulos
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
Symeon Papadopoulos
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Symeon Papadopoulos
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016
Symeon Papadopoulos
 
Multimedia Privacy
Multimedia PrivacyMultimedia Privacy
Multimedia Privacy
Symeon Papadopoulos
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Symeon Papadopoulos
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging Performance
Symeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
Symeon Papadopoulos
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
Symeon Papadopoulos
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online Discussions
Symeon Papadopoulos
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
Symeon Papadopoulos
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
Symeon Papadopoulos
 

More from Symeon Papadopoulos (20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air Quality
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering Detection
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016
 
Multimedia Privacy
Multimedia PrivacyMultimedia Privacy
Multimedia Privacy
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging Performance
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online Discussions
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 

Recently uploaded

PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 

Recently uploaded (20)

PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 

Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Indexing

  • 1. ADBIS 2013 Conference Genoa, Italy, Sep 1-4, 2013 Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Indexing Dimitris Kastrinakis1 , Symeon Papadopoulos2 , Athena Vakali1 1 Aristotle University of Thessaloniki, Department of Informatics (AUTH) 2 Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI)
  • 2. #2 Overview • Problem formulation • Related work • Proposed method • Evaluation • Conclusions
  • 3. #3 Motivation • Multimedia collections are ever-growing – Personal photo collections can easily reach several thousands of photos – Professional photo archives are typically in the range of hundreds of thousands to many millions of photos – Online photos are many billions – It is estimated that a total of 3.5 trillion photos have been captured by people so far • Need for effective and efficient search! – Prevalent paradigm: Content-based image search http://blog.1000memories.com/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox
  • 4. #4 Problem Formulation (1) Content-based image search (CBIR) • Also known as “example-based search”, “similarity- based search” • Given an indexed collection of images and a query image (typically not part of the collection) fetch N images of the collection that are most similar to the query image – similarity is typically computed on the basis of Euclidean distance between feature vectors extracted from the visual content of images
  • 5. #5 Problem Formulation (2) • Recent CBIR systems make use of local descriptors (e.g. SIFT, SURF) extracted from images, according to the BoW retrieval paradigm. – Each image is represented as a set (“bag”) of visual words • Visual words are the result of a learning process (clustering + quantization) on a large set of images – An inverted index structure is used to speed-up retrieval • Such systems achieve good search accuracy, but: – For this to happen, the number of visual words in the vocabulary need to be very large (~105 -106 ) – In case of very large vocabularies, we face two computational problems: (a) creating the vocabularies (offline), and (b) quantizing the local descriptors of a new image to the vocabulary words (online)
  • 6. #6 Overview of BoW indexing Feature extraction (e.g. SIFT, SURF) Feature clustering + quantization Visual vocabulary VOCABULARY LEARNING Feature extraction (e.g. SIFT, SURF) Feature-to- vocabulary mapping TRAINING COLLECTION COLLECTION TO INDEX QUERY IMAGE Feature extraction (e.g. SIFT, SURF) Feature-to- vocabulary mapping Collection index INDEXING RETRIEVAL
  • 7. #7 Our contribution • Propose the concept of Composite Visual Words (CVW) that reduces the need for many visual words – CVW are permutations of plain visual words – Using CVW instead of plain visual words makes it possible to achieve similar search accuracy levels having much fewer visual words in the vocabulary • Experimentally validate our approach in two standard datasets (Oxford and Paris buildings) – With a vocabulary of 200 visual words and the use of CVW we manage to match the retrieval performance of approaches that use two-three orders of magnitude more visual words.
  • 8. #8 Our contribution Feature extraction (e.g. SIFT, SURF) Feature clustering + quantization Visual vocabulary VOCABULARY LEARNING Feature extraction (e.g. SIFT, SURF) Feature-to- vocabulary mapping TRAINING COLLECTION COLLECTION TO INDEX QUERY IMAGE Feature extraction (e.g. SIFT, SURF) Feature-to- vocabulary mapping Collection index INDEXING RETRIEVAL CVW
  • 10. #10 Related Work • First use of BoW approach for image similarity search (Sivic & Zisserman 2003) • Approaches to speed up original method: – Hierarchical vocabulary tree (Nister & Stewenius, 2007) – Approximate k-means clustering (Philbin et al., 2007) • More expressive (and expensive!) representations: – Soft assignment of descriptors to words (Philbin et al., 2008)  requires more index space and time – Visual phrases (Yuan et al., 2007)  does not take into account order of words, expensive mining phase – Words + phrases (Zhang et al., 2009)  high accuracy but computationally intensive learning and still large vocabulary – Vocabulary that maintains spatial layout of words (Zhang et al., 2010), bundled features (Wu et al., 2009)  complex vocabulary generation process + not high coverage of feature space
  • 11. #11 Proposed Approach (1) • Instead of indexing using plain visual words, we index using permutations of visual words based on their relative distance from the image to be indexed PLAIN VISUAL WORDS COMPOSITE VISUAL WORDS
  • 12. #12 Proposed Approach (2) • Important distinction: – Original visual vocabulary (V): Number of visual words that are the result of the clustering and quantization process on the training collection. – Effective visual vocabulary (V’): Number of composite visual words that can be used for indexing. Maximum theoretical size: – For instance, if |V|=100 and B = 3 (the maximum length of a permutation), then |V|’max = 970,200 This is the main way to increase the distinctive capability of the vocabulary without increasing complexity.
  • 13. #13 Proposed Approach (3) • Caveat: If the maximum effective vocabulary size increases a lot (e.g. several millions), the retrieval performance might be harmed since several CVWs will appear very sparsely. • For this, we employ a thresholding strategy to make sure that the resulting CVWs are high-quality. where d(u,f) is the Euclidean distance between local feature f and word u, while max d(u’,f) is the maximum distance between feature f and any word of V. Parameter α controls the “strictness” of the threshold (larger α means stricter thresholding).
  • 15. #15 Evaluation • Datasets: Oxford and Paris buildings • Implementation: – Local descriptors: SIFT, 2000 features/image (Oxford), 1000 features/image (Paris) – Inverted index: Apache Solr • Evaluation Measure: mean Average Precision (mAP)
  • 17. #17 Results (2) EXAMPLE RESULTS (|V|=200) Comparison to SoA
  • 18. #18 Conclusions SUMMARY • Approach to increase distinctive capability of visual vocabulary without harming efficiency. • Validation in standard datasets demonstrates significant reduction in vocabulary size while maintaining state-of-the-art retrieval accuracy. FUTURE WORK • Tests in larger datasets (e.g. using millions of images as distractors) • Alternative thresholding or CVW filtering strategies.
  • 19. #19 References (1) • Classic BoW indexing Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. Ninth IEEE International Conference on Computer Vision, ICCV (2003) • Efficient BoW indexing Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2. (2006) Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. IEEE Conference on Computer Vision and Pattern Recognition, CVPR’07 (2007)
  • 20. #20 References (2) • Richer BoW-based representations Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in quantization: Improving particular object retrieval in large scale image databases. IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008) Wu, Z., Ke, Q., Isard, M., Sun, J.: Bundling features for large scale partial- duplicate web image search. IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) Yuan, J., Wu, Y., Yang, M.: Discovery of collocation patterns: from visual words to visual phrases. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR’07, 1–8, IEEE (2007) Zhang, S., Tian, Q., Hua, G., Huang, Q., Li, S.: Descriptive visual words and visual phrases for image applications. In Proceedings of the 17th ACM international conference on Multimedia, 75–84, ACM (2009) Zhang, S., Huang, Q., Hua, G., Jiang, S., Gao, W., Tian, Q: Building contextual visual vocabulary for large-scale image applications. In Proceedings of the international conference on Multimedia, 501–510, ACM (2010)