https://imatge.upc.edu/web/publications/demonstration-open-source-framework-qualitative-evaluation-cbir-systems
Evaluating image retrieval systems in a quantitative way, for example by computing measures like mean average precision, allows for objective comparisons with a ground-truth. However, in cases where ground-truth is not available, the only alternative is to collect feedback from a user. Thus, qualitative assessments become important to better understand how the system works. Visualizing the results could be, in some scenarios, the only way to evaluate the results obtained and also the only opportunity to identify that a system is failing. This necessitates developing a User Interface (UI) for a Content Based Image Retrieval (CBIR) system that allows visualization of results and improvement via capturing user relevance feedback. A well-designed UI facilitates understanding of the performance of the system, both in cases where it works well and perhaps more importantly those which highlight the need for improvement. Our open-source system implements three components to facilitate researchers to quickly develop these capabilities for their retrieval engine. We present: a web-based user interface to visualize retrieval results and collect user annotations; a server that simplifies connection with any underlying CBIR system; and a server that manages the search engine data.
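As a sketch of how the connector server can decouple the web UI from any underlying engine, consider a minimal JSON contract between the two; the field names and the dummy engine below are illustrative assumptions, not the framework's actual API:

```python
import json

# Hypothetical message format between the web UI and the retrieval server;
# the real framework's API may differ (all names here are illustrative).
def handle_query(request_json, search_engine):
    """Decode a UI query, delegate to any CBIR backend, return ranked ids."""
    request = json.loads(request_json)
    ranking = search_engine(request["query_image"])   # backend is pluggable
    return json.dumps({"query": request["query_image"],
                       "ranking": ranking[:request.get("top_k", 10)]})

# A stub engine standing in for the real CBIR system:
def dummy_engine(query_id):
    return ["img_07", "img_03", "img_12"]

response = handle_query('{"query_image": "img_01", "top_k": 2}', dummy_engine)
```

Because the server only forwards the query and relays the ranking, any retrieval engine exposing this kind of function can be plugged in without touching the UI.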
Movie Recommendation Engine using Artificial Intelligence - Harivamshi D
My academic major project: movie recommendation using artificial intelligence. We also developed a website, named Movie Engine, for recommending movies.
l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items - Kumar Dlk
We develop a novel framework, named l-injection, to address the sparsity problem of recommender systems by carefully injecting low values into a selected set of unrated user-item pairs in the user-item matrix.
The Online Exams System fulfils the requirement of institutes to conduct exams online. They do not have to go to a software developer to have a separate site built for conducting exams online; they just have to register on the site and enter the exam details and the list of students who can appear for the exam.
Predicting user engagement with direct displays (DD) is of paramount importance to commercial search engines, as well as to search performance evaluation. However, understanding within-content engagement on a web page is not a trivial task mainly because of two reasons: (1) engagement is subjective and different users may exhibit different behavioural patterns; (2) existing proxies of user engagement (e.g., clicks, dwell time) suffer from certain caveats, such as the well-known position bias, and are not as effective in discriminating between useful and non-useful components. In this paper, we conduct a crowdsourcing study and examine how users engage with a prominent web search engine component such as the knowledge module (KM) display. To this end, we collect and analyse more than 115k mouse cursor positions from 300 users, who perform a series of search tasks. Furthermore, we engineer a large number of meta-features which we use to predict different proxies of user engagement, including attention and usefulness. In our experiments, we demonstrate that our approach is able to predict more accurately different levels of user engagement and outperform existing baselines.
Tutorial for Machine Learning 101 (an all-day tutorial at Strata + Hadoop World, New York City, 2015)
The course is designed to introduce machine learning via real applications, like building a recommender system and image analysis using deep learning.
In this talk we cover deployment of machine learning models.
Performance Management of IT Service Processes Using a Mashup-based Approach - Carlos Raniery
Performance Management of IT Service Processes Using a Mashup-based Approach - Thesis presentation
Hypothesis: The employment of mashups enhances the performance of human-centered ITSM processes
Performance evaluation of a multi-core system using Systems development meth... - Yoshifumi Sakamoto
I propose to apply the Systems development method utilizing Reverse modeling and Model-based Simulation (SRMS) in order to solve the issues of derivational development. Derivational development is a method for developing derived products based on products that have already been released. Many organizations developing embedded systems have been applying derivational development to improve development efficiency, time-to-market, and product quality.
However, applying derivational development to large-scale embedded systems in the traditional way, to meet the performance requirements of higher functionality, carries high risk. In addition, as the dependability requirements of embedded systems are extremely high, the dependability of the development process and the safety of the product must be proven by evidence. Therefore, continuing derivational development over many years may significantly undermine the evidence of dependability of embedded systems. We apply SRMS in order to solve these issues.
In this presentation, I propose a methodology to solve the issues of derivational development by applying SRMS to an SoC-equipped Multi Function Peripheral/Printer. Performance evaluation and energy consumption estimation of the SoC are performed in the early stages of SoC development, estimating the performance and the energy consumption by model-based simulation at a high level of abstraction. This is because, at the early stages of SoC development, modeling at a high level of abstraction is required for both the system architecture of the SoC and the behavior of the system.
The proposed method is applied to an actual embedded system, an MFP (Multi Function Peripheral/Printer). The proposed SRMS method is promising for solving the issues of derivational development in embedded system development.
Collaborative spaces are widely used by diverse organizations and for diverse purposes. Despite the fact that technological solutions exist, there is a lack of methodological support for developing such environments. In this paper we illustrate how the FlowiXML methodology can be used to develop collaborative spaces using a real-life case study. The benefits of the resulting system are evaluated and the results are discussed.
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
Similar to User Interface for an Image Retrieval Engine System (20)
Machine translation and computer vision have greatly benefited from the advances in deep learning. A large and diverse amount of textual and visual data have been used to train neural networks whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses.
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence of RNNs, replacing it with an attention mechanism between the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
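The mechanism can be sketched in a few lines of numpy: scaled dot-product attention, here applied as self-attention, where queries, keys and values all come from the same token sequence (the token values are toy numbers; cross-attention would take Q from one sequence and K, V from another):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # similarity of each query to each key
    # Numerically stable softmax over the keys:
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V             # each output is a convex mix of values

# Self-attention: Q, K and V all derive from the same tokens.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = attention(tokens, tokens, tokens)
```

Each output row is a weighted average of all token values, which is exactly how attention lets every position look at every other one without recurrence.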
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io) recorded at CMU in collaboration with UPC, BSC, Gallaudet University and Facebook.
https://imatge.upc.edu/web/publications/sign-language-translation-and-production-multimedia-and-multimodal-challenges-all
https://imatge-upc.github.io/synthref/
Integrating computer vision with natural language processing has achieved significant progress over the last years owing to the continuous evolution of deep learning. A novel vision and language task, which is tackled in the present Master thesis, is referring video object segmentation, in which a language query defines which instance to segment from a video sequence. One of the biggest challenges for this task is the lack of relatively large annotated datasets, since a tremendous amount of time and human effort is required for annotation. Moreover, existing datasets suffer from poor-quality annotations, in the sense that approximately one out of ten language expressions fails to uniquely describe the target object.

The purpose of the present Master thesis is to address these challenges by proposing a novel method for generating synthetic referring expressions for an image (video frame). This method produces synthetic referring expressions by using only the ground-truth annotations of the objects as well as their attributes, which are detected by a state-of-the-art object detection deep neural network. One of the advantages of the proposed method is that its formulation allows its application to any object detection or segmentation dataset.

By using the proposed method, the first large-scale dataset with synthetic referring expressions for video object segmentation is created, based on an existing large benchmark dataset for video instance segmentation. A statistical analysis and comparison of the created synthetic dataset with existing ones is also provided in the present Master thesis.

The experiments conducted on three different datasets used for referring video object segmentation prove the efficiency of the generated synthetic data. More specifically, the obtained results demonstrate that pre-training a deep neural network with the proposed synthetic dataset improves the ability of the network to generalize across different datasets, without any additional annotation cost.
Master MATT thesis defense by Juan José Nieto
Advised by Víctor Campos and Xavier Giro-i-Nieto.
27th May 2021.
Pre-training Reinforcement Learning (RL) agents in a task-agnostic manner has shown promising results. However, previous works still struggle to learn and discover meaningful skills in high-dimensional state-spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent representation by making use of variational or contrastive techniques. We demonstrate that both allow learning a set of basic navigation skills by maximizing an information theoretic objective. We assess our method in Minecraft 3D maps with different complexities. Our results show that representations and conditioned policies learned from pixels are enough for toy examples, but do not scale to realistic and complex maps. We also explore alternative rewards and input observations to overcome these limitations.
https://imatge.upc.edu/web/publications/discovery-and-learning-navigation-goals-pixels-minecraft
Peter Muschick MSc thesis
Universitat Politècnica de Catalunya, 2020
Sign language recognition and translation has been an active research field in recent years, with most approaches using deep neural networks to extract information from sign language data. This work investigates the mostly disregarded approach of using human keypoint estimation from image and video data with OpenPose, in combination with a transformer network architecture. Firstly, it was shown that it is possible to recognize individual signs (4.5% word error rate (WER)). Continuous sign language recognition, though, was more error-prone (77.3% WER), and sign language translation was not possible using the proposed methods, which might be due to the low accuracy of human keypoint estimation by OpenPose and the accompanying loss of information, or to insufficient capacity of the used transformer model. Results may improve with datasets containing higher repetition rates of individual signs, or by focusing more precisely on keypoint extraction of the hands.
https://github.com/telecombcn-dl/lectures-all/
These slides review techniques for interpreting the behavior of deep neural networks. The talk reviews basic techniques such as the display of filters and tensors, as well as more advanced ones that try to interpret which part of the input data is responsible for the predictions, or generate data that maximizes the activation of certain neurons.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/dlai-2020/
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g. robotics, autonomous driving) or decision making (e.g. resource optimization in wireless communication networks). It also advances the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
Giro-i-Nieto, X. One Perceptron to Rule Them All: Language, Vision, Audio and Speech. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7-8).
Tutorial page:
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will firstly review the basic neural architectures to encode and decode vision, text and audio, and will then review the models that have successfully translated information across modalities.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and presents the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from curriculum learning strategies for better and faster training of deep neural networks. This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse schedule sampling is a better option than a classic forward one. They also show that progressively skipping frames during training is beneficial, but only when training with the ground-truth masks instead of the predicted ones.
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. During the last year, multiple solutions have been proposed to alleviate this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
More from Universitat Politècnica de Catalunya (20)
Adjusting OpenMP PageRank: SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate pagerank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement pagerank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
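The flavour of these mode comparisons can be sketched in Python (standing in for the report's actual primitives): a plain loop reduction plays the role of the sequential mode, while a vectorised library reduction stands in for the OpenMP/CUDA parallel modes.

```python
import numpy as np

def sum_sequential(x):
    """Plain loop reduction, analogous to the sequential mode."""
    total = 0.0
    for v in x:
        total += v
    return total

def sum_vectorized(x):
    """Library reduction, standing in for the parallel modes."""
    return float(np.sum(x))

# float64 storage keeps both reductions exact for this input; the report's
# float vs bfloat16 comparison is about trading this accuracy for memory.
x = np.arange(1_000_000, dtype=np.float64)
```

Both reductions return the same value here; the experiments above measure how their run times (and, for low-precision storage types, their accuracy) diverge at scale.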
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. those with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance, since the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
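One of the techniques above, skipping computation on already-converged vertices, can be sketched as follows. This is a plain power-iteration sketch, not the STICD implementation: once a vertex's rank stops changing it is frozen and never recomputed, even if its neighbours keep changing, which is exactly the approximation the heuristic accepts.

```python
import numpy as np

def pagerank_skip(adj, d=0.85, tol=1e-10):
    """adj[u] lists the out-neighbours of u; returns one rank per vertex."""
    n = len(adj)
    rank = np.full(n, 1.0 / n)
    active = set(range(n))              # vertices still being updated
    while active:
        new_rank = rank.copy()
        # Contribution of every vertex to its out-neighbours:
        contrib = np.zeros(n)
        for u, outs in enumerate(adj):
            if outs:
                share = rank[u] / len(outs)
                for v in outs:
                    contrib[v] += share
        for v in list(active):
            updated = (1 - d) / n + d * contrib[v]
            if abs(updated - rank[v]) < tol:
                active.discard(v)       # converged: skip from now on
            new_rank[v] = updated
        rank = new_rank
    return rank

# Tiny strongly connected example (a 3-cycle): ranks must be uniform.
ranks = pagerank_skip([[1], [2], [0]])
```

On the 3-cycle every vertex converges in the first sweep, so the active set empties immediately; on larger graphs the set shrinks gradually and the per-iteration work shrinks with it.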
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
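As a minimal illustration of the path from basic retrieval to aggregate queries described above, here is a self-contained sketch using Python's stdlib sqlite3 module; the sales table and its values are invented for the example.

```python
import sqlite3

# In-memory database with a toy table (names and data are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 100.0), ("north", 250.0), ("south", 80.0)])

# Foundations: retrieval with filtering.
rows = conn.execute(
    "SELECT amount FROM sales WHERE region = 'north'").fetchall()

# Advanced: aggregation per group, largest total first.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region "
    "ORDER BY SUM(amount) DESC").fetchall()
```

The same `SELECT`/`WHERE`/`GROUP BY` building blocks scale from these toy rows to the real-world analyses the guide covers.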
User Interface for an Image Retrieval Engine System
1. UI for an image retrieval engine system
Paula Gomez Duran
A project carried out in the Insight Centre for Data Analytics, at DCU
Kevin McGuinness, Eva Mohedano, Xavier Giró-i-Nieto
3. Why is a UI useful for CBIR?
● The importance of visualizing the results
● Ability to capture the user's intent
4. Contributions to the project
- Development of the UI
- Incorporation of different modes of interaction into the system
- Quantitative and qualitative evaluation
5. Contributions to the project
- Development of the UI
- Incorporation of different modes of interaction into the system
- Quantitative and qualitative evaluation
7. ReactJS | NodeJS | Python
DEVELOPING UI
ReactJS:
● JavaScript framework
● Scalability, speed and simplicity
● Fast, thanks to its virtual DOM
NodeJS:
● Fast and scalable network apps
● Single-threaded, using non-blocking I/O calls
● Capable of handling a huge number of simultaneous connections with high throughput
● NOT able to handle CPU-intensive operations
Python:
● Focuses on code readability
● Large standard library
● SLOW at processing requests and responses
12. CBIR system
● All images are analysed and their descriptors stored
● The analysed query is compared against all stored images (cosine similarity)
** Mohedano, Eva, Kevin McGuinness, Noel E. O'Connor, Amaia Salvador, Ferran Marques, and Xavier Giro-i-Nieto. "Bags of local convolutional features for scalable instance search." ACM ICMR, 2016.
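The ranking step this slide describes can be sketched as follows; the descriptors here are toy vectors, not the actual bag-of-local-convolutional-features descriptors of the cited paper.

```python
import numpy as np

def rank_by_cosine(query, database):
    """Rank stored image descriptors by cosine similarity to the query.

    `query` is a 1-D feature vector; `database` maps image ids to vectors
    (toy values for illustration).
    """
    q = query / np.linalg.norm(query)
    scores = {img: float(vec @ q / np.linalg.norm(vec))
              for img, vec in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

db = {"img_a": np.array([1.0, 0.0]),
      "img_b": np.array([0.7, 0.7]),
      "img_c": np.array([0.0, 1.0])}
ranking = rank_by_cosine(np.array([1.0, 0.1]), db)
```

Because cosine similarity normalizes both vectors, only the direction of the descriptors matters, which is why it is the usual choice for comparing image features of varying magnitude.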
13. INPUTS OF THE SYSTEM
● URL
● Image from file
● System examples
15. Contributions to the project
- Development of the UI
- Incorporation of different modes of interaction into the system
- Quantitative and qualitative evaluation
16. Functionalities of the system
INCORPORATING DIFFERENT MODES OF INTERACTION INTO THE SYSTEM
● Explorer mode
● Query expansion mode
● Annotation mode
17. Explorer mode:
- motivation → get to know the datasets and explore the system
- functioning → once the first query is received and the ranking of similar images is computed, any other image appearing below can be selected as the new query to search the dataset
18. Query expansion mode:
- motivation → get to know how the algorithm works in the system.
- functioning → the descriptors of the multiple selected images are averaged, providing a richer representation.
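The averaging step in query expansion can be sketched in a few lines (a minimal illustration, assuming descriptors are equal-length vectors; `expand_query` is a hypothetical name, not the project's API):

```python
def expand_query(descriptors):
    # element-wise average of several image descriptors -> one richer query vector
    n = len(descriptors)
    dim = len(descriptors[0])
    return [sum(d[i] for d in descriptors) / n for i in range(dim)]

# descriptors of the two images the user selected
selected = [[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]]
print(expand_query(selected))  # -> [0.9, 0.1, 0.1]
```

The averaged vector is then submitted to the same cosine-similarity search as an ordinary query, so the rest of the pipeline is unchanged.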
19. Annotation mode:
- motivation → improve the accuracy of the automatic system through user interaction.
- functioning → annotate images and submit them to the system so it can train an SVM.
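The deck's system trains an SVM on the annotations; as a self-contained stand-in with no external libraries, the sketch below uses a simple perceptron to show the same annotate → train → re-rank loop (all names are illustrative, and the perceptron is an explicitly swapped-in substitute for the SVM):

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    # samples: annotated descriptor vectors; labels: +1 relevant, -1 not relevant
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * pred <= 0:  # misclassified -> nudge the decision boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def rerank(index, w, b):
    # score every image in the dataset with the trained classifier, best first
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
    return sorted(index, key=lambda img: score(index[img]), reverse=True)

# the user annotates two images: the first as relevant, the second as not
w, b = train_perceptron([[1.0, 0.0], [0.0, 1.0]], [1, -1])
index = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
print(rerank(index, w, b))  # "a" is ranked before "b"
```

The key point the slide makes survives the substitution: a model trained on a handful of annotations can re-score the whole dataset, so the user never has to label everything.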
20. Contributions to the project
- Development of the UI
- Incorporate to the system different modes of interaction
- Quantitative and qualitative evaluation
21. Feedback of users
QUANTITATIVE AND QUALITATIVE EVALUATION
● UI is intuitive
● UI is robust and consistent
● UI is fully featured
● Users understand the purpose of the UI
● The modes are understandable with the existing explanations
● Explorer mode is useful thanks to the 'clickable' function
● Query expansion mode is useful for experimenting without affecting the system's accuracy
● Annotation mode is useful for improving the accuracy of a trained model
[Questionnaire results chart: per-statement agreement ("Agree" or "Strongly agree") ranging from 70% to 100%]
22. Query expansion mode:
● Low Average Precision → expansion can improve the ranking
● High Average Precision → expansion just adds noise
23. Annotation mode:
● Possibility to give only negative feedback
● Possibility to train a model to improve the system by annotating just some images of the dataset
25. Conclusions
● UI for an image retrieval system.
● User feedback from the questionnaire was positive.
● The UI works with 3 commonly used CBIR benchmarks:
❖ Oxford, Paris and Instre
● An annotation tool has been developed.
● Quantitative and qualitative evaluation have been carried out.
● Block structure → can be adapted for other retrieval algorithms.
26. FUTURE WORK
● Include a 'Crop' mode on query images to specify the region of interest.
● Unify the structure of all datasets.
● Include a mechanism to measure the time spent per query image.
● Include the ability to search across all photos in the three datasets.
Editor's Notes
A tool for scientists to improve their systems
As I just said, the goal was to develop a UI for a Content Based Image Retrieval search engine. These systems emerged as a research field to address the shortcomings of text-based systems. VISUAL SEARCH
The aim of these systems is to structure datasets based on their content rather than the metadata associated with them.
They work through algorithms that summarize the content of an image into a numerical vector called an 'image representation'.
It is really important to visualize the results. Most systems base their evaluation on quantitative assessment, for example computing measures such as mean Average Precision to perform objective comparisons against a ground truth. However, in a real-life system there is no ground truth, just the user, so qualitative assessment becomes even more important: good and bad examples can be shown to better understand how the system works. Without a ground truth, visualizing the data can be the only way of testing the results, and perhaps the only opportunity to realize that a system is failing and correct it.
The ability to capture the user's intent is also very important, because the automatic system's notion of similarity may not match what the user considers similar. For this reason it is essential to develop tools that let the user record whether they perceive the results as correct or not.
relevance feedback
My contributions to the project can be summarized in 3 areas.
The first is the development of the user interface,
the second is incorporating different modes of user interaction into the system,
and the last is the quantitative and qualitative evaluation.
The development of the UI is very important because it gives us the tool to visualize the results obtained and, eventually, to improve them by collecting the user's feedback.
To build the system we first needed a structure able to support all of its requirements. We thought about the following:
developing a web application, which would be the whole client side, and connecting it to the image retrieval system through a server.
Choosing the tools for a project is always a difficult task; however, taking the project's different needs into account at the design stage makes it easier.
Three languages are present in the state of the art of this project: ReactJS, NodeJS and Python.
// Explain the three of them
Given that the server should exchange data with the client quickly and with consistent data structures, we decided to build the front end (client) in ReactJS and the server in NodeJS, due to its strength in processing request-response cycles.
However, we also needed a server able to support the mathematical operations required to calculate the rankings for a query. The solution was to build a second server, this one in Python, incorporating all the image retrieval code and maintaining a connection with the NodeJS server. This connection is made with zerorpc, a library that allows the two programming languages to communicate.
The three datasets used are:
** these are very popular datasets in the scientific community working on CBIR.
- Oxford Buildings contains 5,063 images and Paris Buildings contains 6,412 images; both collected their images from Flickr, a photo-sharing community.
- On the other hand, the Instre dataset is larger: it contains 28,543 images collected from multiple sources, such as search engines like Google, social networks like Facebook, and photo-sharing communities like Flickr.
The aim of the project is to develop a UI capable of computing the list of similar images given a query. When a query is introduced into the system, a page is displayed showing the most similar images at the top of the ranking and the least similar ones at the bottom.
I want to briefly explain how the CBIR system works. The system I am using is from the state of the art; it was developed at DCU and UPC and was awarded at a conference in New York in 2016.
In the CBIR system, all images are first analysed and stored. Analysing an image follows this procedure:
first, the image is fed into a pre-trained CNN and decomposed into local deep-learning descriptors. These descriptors are then encoded by a Bag-of-Words model using a specific vocabulary. The resulting descriptor vector is what gets stored for each image.
When a query is received, it is analysed in the same way to obtain its descriptor vector. This vector is compared with the stored vectors of all images by cosine similarity, and the ranking is generated by sorting the similarity scores in descending order.
When two images are really similar, the cosine similarity score is close to 1.
Cosine similarity computed → similarity scores sorted in descending order → ranking computed
A web UI is designed so that different alternatives are provided when choosing an image to enter into the system. Thus, the interface will have the option of using some of the already existing images provided as an example by the system, as well as an option of experimenting with a user's own image, either uploading it from a file or providing its URL to the system.
In the second contribution, the aim was to incorporate different modes of user interaction into the system. Each mode serves a different purpose and is explained in the following slides.
The explorer mode's motivation is to navigate through the system and get to know the different datasets. When a query is received and its ranking is computed, any of the images that appear below, as in this figure, is clickable and becomes the new query from which to start a new search.
The query expansion mode's motivation is to check how the deep-learning algorithm works through qualitative evaluation. This mode works by selecting multiple images: their descriptors are averaged and the search and ranking are computed again. A richer representation is provided to the system, allowing it to improve. It is not very clear in this image, but it can be better appreciated in the demo at the end of the presentation.
relevance feedback
The annotation mode's motivation is to improve the system's accuracy through user interaction: annotate successful and failed images in the ranking and submit the annotations to the system. The idea is to train a model to classify what is relevant and what is not; the ranking is then updated.
Support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
The last contribution is the quantitative and qualitative evaluation of the system. It is done first through the results of a questionnaire, and then through a comparison between the query expansion and annotation modes.
An experiment was carried out by asking 10 people to use the system for about 10 minutes and then answer these questions. The questionnaire used a Likert scale where users choose among: SD, D, N, A or SA.
This slide shows a summary of the results obtained in the questionnaire.
The statistics concluded that the UI is intuitive, robust, consistent and built with a useful purpose. The aim of its different modes of operation is understood by users, although some hover info icons were suggested in the comments.
Users also agreed that each mode has its own utility: it is very useful both for giving direct feedback to the system and for experimenting with hypothetical situations without affecting it.
Last but not least, the explorer mode with its 'clickable' images was the most liked feature.
In QE mode, the following two conclusions were reached.
First, expanding a query with low Average Precision improves the ranking both quantitatively and in the visualization of the results, i.e. qualitatively.
Second, as the quantitative and qualitative results also show, selecting an image with high AP does not improve accuracy but only introduces noise.
For this reason, if an already accurate ranking is to be improved, it is much better to use the annotation mode, thanks to the possibility of specifically annotating the negative images that do not belong in the ranking.
Training an SVM is also useful: with thousands of images we do not want to annotate the full dataset. Instead of annotating everything, we can annotate just some of the images and then train a model (in this case an SVM) that predicts the classification for all images.
The final conclusions are summarized in these points:
We built a UI for an image retrieval system, and the feedback collected from users indicated that the UI is intuitive, robust, consistent and built with a useful purpose.
The UI works with 3 commonly used CBIR benchmarks that allow internal and external query searches: the Oxford, Paris and Instre datasets.
An annotation tool to collect users' intent was developed to improve the image retrieval system.
Quantitative and qualitative evaluations were carried out and different scenarios were studied in order to improve the system.
The system is designed so that the structure can be reused for other search engines by changing just this module. For this reason, experimenting with another retrieval system would be possible, which is why a possible open-source contribution has been discussed.