1. MUCKE Project
• Iftene, A., Sirițeanu, A., Petic, M. How to Do
Diversification in an Image Retrieval System
• Laic, A., Iftene, A. Automatic Image Annotation
• Gherasim, L. M., Iftene, A. Extracting Background
Knowledge about World from Text.
ConsILR, September 18-19, 2014, Craiova
2. Content
MUCKE Team
The core
The data
Text processing
Image processing
Diversification
Problem
Demo
Automatic Image Annotation
3. MUCKE Team
Bilkent University, Turkey
“Al. I. Cuza” University, Iasi, Romania
Vienna University of Technology, Austria
Alternative Energies and Atomic Energy Commission (CEA), France
IMCS-50, 2014
5. The core
[Diagram. Input: raw multimedia and multilingual data. Components: Text Processing, Concept similarity, Image Processing, User credibility, Semantic Resources. Output: image retrieval framework.]
6. The data
Existing collections
A survey was carried out and published online
ImageNet – 14 million annotated images
MediaEval – 3.2 million images
MIRFLICKR – 1 million annotated images
Wikipedia (DBpedia)
ClueWeb09/12
New data
Aim: 100 million annotated images
Crawling ongoing
7. The data
Distributed crawling and replicated
storage
8. Text Processing
Entity recognition
Disambiguation
Anaphora resolution
Combined with IR methods
Latent semantic retrieval
Explicit semantic retrieval
Components for:
English, French, German, Romanian
9. Image Processing
Parsimonious image description
Large scale concept detection
Detector generalization
Across different datasets
Assess the use and utility of
different local image descriptors
their combination with other properties (e.g. color)
for optimal low-level image description
Adapted models for specialized tasks
Face / landmark recognition
11. Diversification – Problem definition
Search results diversification is an optimization
problem: select a subset S of k items out of the n
available ones such that both the diversity and the
relevance of the items in S are maximized [1].
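As an illustration of the problem definition, here is a minimal sketch of a greedy selection in the style of MMR [3]. It is not the method used in MUCKE; the function name, the `trade_off` parameter, the toy image ids, their relevance scores and the tag-based Jaccard similarity are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def greedy_diversify(
    candidates: List[str],
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Pick up to k items, trading relevance against redundancy (MMR-style)."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item: str) -> float:
            # Redundancy = similarity to the closest item already selected.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1.0 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: four images with relevance scores and tag sets.
relevance = {
    "tower_day_1": 0.9,
    "tower_day_2": 0.85,
    "tower_night": 0.7,
    "tower_aerial": 0.6,
}
tags = {
    "tower_day_1": {"tower", "day", "front"},
    "tower_day_2": {"tower", "day", "front"},
    "tower_night": {"tower", "night", "front"},
    "tower_aerial": {"tower", "day", "aerial"},
}


def jaccard(a: str, b: str) -> float:
    """Tag overlap, used here as a stand-in for image similarity."""
    return len(tags[a] & tags[b]) / len(tags[a] | tags[b])


result = greedy_diversify(list(relevance), relevance, jaccard, k=3)
print(result)  # ['tower_day_1', 'tower_night', 'tower_aerial']
```

The near-duplicate `tower_day_2` is skipped even though it has the second-highest relevance, because it adds nothing new to the selected set. Raising `trade_off` toward 1.0 shifts the balance back to pure relevance ranking.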
12. Diversification – Proposed solution
Exploitation of semantic structures in order to
provide diverse and relevant results
Hierarchical structure of YAGO Concepts [6]:
13. Performed steps
Deciding which terms of a query should be
used to query the YAGO ontology.
Ranking and grouping the results retrieved
from the YAGO ontology.
Choosing which YAGO entities to use when
crawling the Flickr database.
Ranking the results so that the result set is
both relevant and diverse.
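The last step can be sketched as follows, assuming the earlier steps have already grouped candidate images by concept. This is only one simple way to mix relevance and diversity (a round-robin over groups); the slides do not say how the final ranking is computed. The function name, the group labels, the image ids and the scores are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Scored = Tuple[str, float]  # (image id, relevance score)


def interleave_groups(groups: Dict[str, List[Scored]], k: int) -> List[str]:
    """Round-robin over concept groups, taking the best remaining image of each."""
    # Sort the images inside each group by descending relevance.
    ranked = [
        sorted(images, key=lambda pair: pair[1], reverse=True)
        for images in groups.values()
        if images
    ]
    # Visit groups in order of their best image, so relevance still matters.
    ranked.sort(key=lambda group: group[0][1], reverse=True)

    result: List[str] = []
    for tier in zip_longest(*ranked):  # tier = the i-th best image of every group
        for entry in tier:
            if entry is not None:  # shorter groups are padded with None
                result.append(entry[0])
    return result[:k]


# Hypothetical groups for an ambiguous query such as "jaguar".
groups = {
    "animal": [("cat_1", 0.9), ("cat_2", 0.8)],
    "car": [("car_1", 0.85)],
    "aircraft": [("plane_1", 0.4), ("plane_2", 0.5)],
}

top = interleave_groups(groups, k=4)
print(top)  # ['cat_1', 'car_1', 'plane_2', 'cat_2']
```

Every concept is represented before any concept contributes a second image, which keeps the first results diverse while the per-group sorting keeps them relevant.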
17. Conclusions
Diversification can improve the quality of
search results.
More work is needed to achieve good results
in all possible scenarios.
We need a large collection of annotated
images.
We need efficient algorithms that compute
the distance between images.
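To make the last point concrete, here is a minimal sketch of one very simple image distance: one minus the intersection of normalized color histograms. It is an illustrative baseline only, not an algorithm from the project; the function names, the bin count and the toy pixel lists are assumptions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel**3 bins."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = sum(hist) or 1.0  # avoid dividing by zero for an empty image
    return [count / total for count in hist]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """1 - histogram intersection: 0.0 for identical, 1.0 for disjoint histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical four-pixel "images".
red = color_histogram([(250, 10, 10)] * 4)
mostly_red = color_histogram([(250, 10, 10)] * 3 + [(10, 10, 250)])
blue = color_histogram([(10, 10, 250)] * 4)

print(histogram_distance(red, red))         # 0.0
print(histogram_distance(red, mostly_red))  # 0.25
print(histogram_distance(red, blue))        # 1.0
```

A distance like this is cheap to compute but ignores shape and layout, which is one reason combinations of local descriptors and color are of interest.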
18. Thank you
MUCKE
Multimedia and User Credibility Knowledge Extraction
http://thor.info.uaic.ro/~mucke/
19. Bibliography
[1] Drosou, M., Pitoura, E., Search Results Diversification. In SIGMOD, pages 41-47, 2010.
[2] Gollapudi, S., Sharma, A., An Axiomatic Approach for Result Diversification. In WWW, pages 381-390, 2009.
[3] Carbonell, J. G., Goldstein, J., The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335-336, 1998.
[4] Clarke, C. L. A., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I., Novelty and diversity in information retrieval evaluation. In SIGIR, pages 659-666, 2008.
[5] Zheng, W., Wang, X., Fang, H., Cheng, H., Coverage-based search result diversification. In Journal Information Retrieval, pages 433-457, 2012.
[6] YAGO2s: A High-Quality Knowledge Base. [Online] Available at http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/ [Last Accessed 27 June 2014].
[7] Cilibrasi, R., Vitanyi, P. M. B., The Google Similarity Distance. In IEEE TKDE, Vol. 19, Issue 3, pages 370-383, 2007.
[8] Kelleher, M., [Online] Available at http://www.smartinsights.com/email-marketing/behavioural-email-marketing/which-top-5-strategies-drive-relevance-in-email-marketing/ [Last Accessed 1 July 2014].