This document describes tools for image retrieval in large multimedia databases using the Hierarchical Cellular Tree (HCT) indexing technique. It discusses modifications made to the original HCT, including an upper-bound approximation of the covering radius. Experimental results on a database of 216,317 images show that the HCT can be built efficiently and that retrievals complete in under a second using the Preemptive Cell Search technique, with high recall. Tools implementing HCT indexing and querying were developed, along with a server/client architecture.
Flexible Design for Simple Digital Library Tools and Services (Lighton Phiri)
I gave a talk today at this year's (2013) South African Institute for Computer Scientists and Information Technologists (SAICSIT) conference, held in East London, South Africa. The pre-print is here [1], and the actual publisher copy is here [2]. I used the same bastardised version [3] of the Torino Beamer theme.
[1] http://goo.gl/PmHRVo
[2] http://goo.gl/ipQlgw
[3] http://blog.barisione.org/2007-09/torino-a-pretty-theme-for-latex-beamer
TMS workshop on machine learning in materials science: Intro to deep learning... (BrianDeCost)
This presentation is intended as a high-level introduction to deep learning and its applications in materials science. The intended audience is materials scientists and engineers.
Disclaimers: the second half of this presentation is intended as a broad overview of deep learning applications in materials science; due to time limitations it is not intended to be comprehensive. As a review of the field, this necessarily includes work that is not my own. If my own name is not included explicitly in the reference at the bottom of a slide, I was not involved in that work.
Any mention of commercial products in this presentation is for information only; it does not imply recommendation or endorsement by NIST.
Science platforms are made up of (at least) four planks: data formats, services, tools and conventions. I focus here on formats and conventions, specifically the HDF5 format, already used in many disciplines, and the Climate-Forecast and HDF-EOS Conventions. Many science disciplines have already agreed on HDF as the preferred format for storing and sharing data. It is well established in high performance computing and supports arbitrary grouping and annotation. Community conventions are critical for useful data on top of the format. The Climate-Forecast (CF) conventions were created for relatively simple gridded data types while the HDF-EOS conventions originally considered more complex data (swaths). Making simple conventions more complex makes adoption more difficult. Community input and the need for stable data processing systems must be balanced in governance of conventions.
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution (NextMove Software)
Presented by Roger Sayle at the 11th International Conference on Chemical Structures (ICCS) 2018.
Exponential database growth will always be a technical challenge, but evolutionary strategies can hold off the inevitable in the short term, while revolutionary strategies promise a longer-term solution.
Deep Learning algorithms are gaining momentum as main components in a large number of fields, from computer vision and robotics to finance and biotechnology. At the same time, the use of Field Programmable Gate Arrays (FPGAs) for data-intensive applications is increasingly widespread thanks to the possibility of customizing hardware accelerators and achieving high-performance implementations with low energy consumption. Moreover, FPGAs have proven to be a viable alternative to GPUs in embedded systems applications, where their reconfigurability makes the system more robust, capable of handling system failures and of respecting the constraints of embedded devices. In this work, we present a framework to help implement Deep Learning algorithms by exploiting the PYNQ platform. In particular, we optimized the creation of the communication interface, the failure tolerance, and the on-chip memory usage.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
Fingerprints are used to identify humans and to solve crimes. They are used to authenticate persons in order to allow them access to their financial and personal resources, or to identify them in big databases. This requires a fast search engine to reduce the time consumed in searching big fingerprint databases; therefore, choosing the search engine is an important issue for reducing search time. This paper investigates existing search engine methods and presents the advantages of the AVL tree method over other methods. The paper investigates search speed and the time consumed to retrieve a fingerprint image. Experiments show that the AVL tree is the best searching algorithm.
A Novel Approach for Clustering Big Data based on MapReduce (IJECEIAES)
Clustering is one of the most important applications of data mining. It has attracted the attention of researchers in statistics and machine learning. It is used in many applications like information retrieval, image processing and social network analytics. It helps the user to understand the similarity and dissimilarity between objects, and cluster analysis makes users understand complex and large data sets more clearly. Different types of clustering algorithms have been analyzed by various researchers. Kmeans is the most popular partitioning-based algorithm, as it provides good results because of accurate calculation on numerical data; however, Kmeans gives good results for numerical data only. Big data is a combination of numerical and categorical data, and the Kprototype algorithm is used to deal with both: it combines the distances calculated from numeric and categorical data. With the growth of data due to social networking websites, business transactions, scientific calculation etc., there is a vast collection of structured, semi-structured and unstructured data, so Kprototype needs to be optimized so that these varieties of data can be analyzed efficiently. In this work, the Kprototype algorithm is implemented on MapReduce. Experiments have proved that Kprototype implemented on MapReduce gives better performance gain on multiple nodes as compared to a single node. CPU execution time and speedup are used as evaluation metrics for comparison. An intelligent splitter is proposed that splits mixed big data into numerical and categorical data. Comparison with traditional algorithms proves that the proposed algorithm works better for large-scale data.
Presenter: Amaia Salvador
Related papers:
E. Mohedano, Salvador, A., McGuinness, K., Giró-i-Nieto, X., O'Connor, N., and Marqués, F., “Bags of Local Convolutional Features for Scalable Instance Search”, in ACM International Conference on Multimedia Retrieval (ICMR), New York City, NY; USA. 2016
A. Salvador, Giró-i-Nieto, X., Marqués, F., and Satoh, S. 'ichi, “Faster R-CNN Features for Instance Search”, in CVPR Workshop Deep Vision, Las Vegas, NV, USA. 2016.
Abstract:
Image representations derived from pre-trained Convolutional Neural Networks (CNNs) have become the new state of the art in computer vision tasks such as instance retrieval. This work proposes a simple pipeline for encoding the local activations of a convolutional layer of a pre-trained CNN using the well-known bag of words aggregation scheme (BoW). Assigning each local array of activations in a convolutional layer to a visual word produces an assignment map, a compact representation that relates regions of an image with a visual word. We use the assignment map for fast spatial reranking, obtaining object localizations that are used for query expansion. We further investigate the potential of using convolutional features from an object detection network such as Faster R-CNN, which allows to obtain image- and region- wise features in a single forward pass. We demonstrate the suitability of such representations for image retrieval on the Oxford Buildings 5k, Paris Buildings 6k and a subset of TRECVid Instance Search 2013, achieving competitive results. This talk will review the two publications related to this work, which have been recently accepted at ICMR 2016 and DeepVision CVPRW 2016.
Barcelona, 3 May 2016.
https://imatge.upc.edu/web/publications/region-oriented-convolutional-networks-object-retrieval
BSc thesis by Eduard Fontdevila advised by Amaia Salvador and Xavier Giró-i-Nieto.
EET UPC, June 2015.
More details: https://imatge.upc.edu/web/publications/rapid-serial-visual-presentation-relevance-feedback-image-retrieval-eeg-signals
Author: Sergi Porta
Advisors: Eva Mohedano & Noel O'Connor (DCU) / Amaia Salvador & Xavier Giró-i-Nieto (UPC)
This thesis explores the potential of relevance feedback for image retrieval using EEG signals for human-computer interaction. This project aims at studying the optimal parameters of a rapid serial visual presentation (RSVP) of frames from a video database when the user is searching for an object instance. The simulations reported in this thesis assess the trade-off between using a small or a large number of images in each RSVP round that captures the user feedback. While short RSVP rounds allow the system to quickly learn the user intention, RSVP rounds must also be long enough to let users generate the P300 EEG signals which are triggered by relevant images. This work also addresses the problem of how to distribute potential relevant and non-relevant images in an RSVP round to maximize the probability of displaying each relevant frame at least 1 second apart from another relevant frame, as this configuration generates a cleaner P300 EEG signal. The presented simulations are based on a realistic set-up for video retrieval with a subset of 1,000 frames from the TRECVID 2014 Instance Search task.
Creating new classes of objects with deep generative neural nets (Akin Osman Kazakci)
How can a machine search for novelty - if all it knows is known objects with known values?
This presentation clarifies this problem, pointing out current paradoxes underlying both machine learning and computational creativity research. A series of experiments based on deep generative neural networks illustrates the exploration of new values in a knowledge-driven fashion. The key idea is to transfer value from one domain to the next through the generation of out-of-distribution objects. We demonstrate the idea by training a net on a set of digits that then generates letters, without being asked to.
https://imatge.upc.edu/web/publications/livre-video-extension-lire-content-based-image-retrieval-system
This project explores the expansion of the Lucene Image Retrieval Engine (LIRE), an open-source Content-Based Image Retrieval (CBIR) system, for video retrieval on large-scale video datasets. The fast-growing need to store huge amounts of video on servers requires efficient, scalable search and indexing engines capable of assisting users in their management and retrieval. In our tool, queries are formulated by visual examples, allowing users to find the videos and the moments in time where the query image matches. The video dataset used in this scenario comprises over 1,000 hours of different news broadcast channels. This thesis presents an extension and adaptation of LIRE and its plugin for Solr, an open-source enterprise search platform from the Apache Lucene project, for video retrieval based on visual features, as well as a web interface for users on different devices.
Abu-El-Haija, Sami, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. "Youtube-8m: A large-scale video classification benchmark." arXiv preprint arXiv:1609.08675 (2016).
The recent emergence of machine learning and deep learning methods for medical image analysis has enabled the development of intelligent medical imaging-based diagnosis systems that can assist physicians in making better decisions about a patient’s health. In particular, skin imaging is a field where these new methods can be applied with a high rate of success.
This thesis focuses on the problem of automatic skin lesion detection, particularly on melanoma detection, by applying semantic segmentation and classification from dermoscopic images using a deep learning based approach. For the first problem, a U-Net convolutional neural network architecture is applied for an accurate extraction of the lesion region. For the second problem, the current model performs a binary classification (benign versus malignant) that can be used for early melanoma detection. The model is general enough to be extended to multi-class skin lesion classification. The proposed solution is built around the VGG-Net ConvNet architecture and uses the transfer learning paradigm. Finally, this work performs a comparative evaluation of classification alone (using the entire image) against a combination of the two approaches (segmentation followed by classification) in order to assess which of them achieves better classification results.
Thomas Heinis is a post-doctoral researcher in the database group at EPFL. His research focuses on scalable data management algorithms for large-scale scientific applications. Thomas is a part of the "Human Brain Project" and currently works with neuroscientists to develop the data management infrastructure necessary for scaling up brain simulations. Prior to joining EPFL, Thomas completed his Ph.D. in the Systems Group at ETH Zurich, where he pursued research in workflow execution systems as well as data provenance.
Seminar presentation about:
Automatic Image Annotation (AIA) structure: shallow and deep,
pros and cons of different features and classification methods in AIA, and
useful information about databases, toolboxes and authors
On the Support of a Similarity-Enabled Relational Database Management System ... (Universidade de São Paulo)
Crowdsourcing solutions can be helpful to extract information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations. Some of them also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on crisis management supported by data. Nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a similarity-enabled methodology together with a supporting architecture named Data-Centric Crisis Management (DCCM), which employs our methods over an RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing the decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make this possible, similarity-based operations were implemented within a popular, open-source RDBMS. Results using real data from Flickr show that the proposed methodology over DCCM is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. Finally, given its accuracy and efficiency, we expect our work to provide a framework for further developments on crisis management solutions.
How to Scale from Workstation through Cloud to HPC in Cryo-EM Processing (inside-BigData.com)
In this video from the GPU Technology Conference, Lance Wilson from Monash University presents: How to Scale from Workstation through Cloud to HPC in Cryo-EM Processing.
"Learn how high-resolution imaging is revolutionizing science and dramatically changing how we process, analyze, and visualize at this new scale. We will show the journey a researcher can take to produce images capable of winning a Nobel prize. We'll review the last two years of development in single-particle cryo-electron microscopy processing, with a focus on accelerated software, and discuss benchmarks and best practices for common software packages in this domain. Our talk will include videos and images of atomic resolution molecules and viruses that demonstrate our success in high-resolution imaging."
Watch the video: https://wp.me/p3RLHQ-kcW
Learn more: https://www.monash.edu/researchinfrastructure/cryo-em
and
https://www.nvidia.com/en-us/gtc/home/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Applying ‘best fit’ frameworks to systematic review data extraction (Andrea Miller-Nesbitt)
Presented at the 7th International Conference on Qualitative and Quantitative Methods in Libraries, Paris, France. Application of the 'best fit' framework synthesis methodology to systematic review data extraction.
Keynote presentation at GlobusWorld 2021. Highlights product updates and roadmap, as well as user success stories in research data management. Presented by Ian Foster, Rachana Ananthakrishnan, Kyle Chard and Vas Vasiliadis.
Ivan Sahumbaiev, "Deep Learning approaches meet 3D data" (Fwdays)
During this talk, I'll discuss how 3D data can be processed with Deep Learning models. The main focus will be on Point Clouds.
Session agenda:
What 3D data is and how it is represented
Overview of libraries to visualize and process
How to collect. Cameras. Calibration
The current state of the art for point cloud processing with Deep Learning models.
classification problem. Models to use
segmentation problem. Models to use
datasets. Losses and training routine
Point clouds correspondences
spectral methods to generate correspondences
Limitations.
A Proposed Method to Develop Shared Papers for Researchers at Conference (iosrjce)
In conferences, the topics of interest for papers cover a variety of subjects. If a researcher wants to write a shared research paper on a specific subject with another researcher who is interested in the same subject and wants to participate in the same conference, a problem arises, especially when the number of topics of interest and researchers becomes large. The aim of this paper is to solve this problem by finding a suitable representation of researcher topic-of-interest information that can be easily represented, and then finding shared researchers with the same topics of interest. Two proposed system algorithms are implemented to find shared researchers in a conference, giving an easy and efficient implementation.
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
Machine translation and computer vision have greatly benefited from the advances in deep learning. A large and diverse amount of textual and visual data have been used to train neural networks whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses.
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence in RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
Machine translation and computer vision have greatly benefited from the advances in deep learning. The large and diverse amount of textual and visual data has been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io) recorded at CMU in collaboration with UPC, BSC, Gallaudet University and Facebook.
https://imatge.upc.edu/web/publications/sign-language-translation-and-production-multimedia-and-multimodal-challenges-all
https://imatge-upc.github.io/synthref/
Integrating computer vision with natural language processing has achieved significant progress over the last years owing to the continuous evolution of deep learning. A novel vision and language task, which is tackled in the present Master thesis, is referring video object segmentation, in which a language query defines which instance to segment from a video sequence. One of the biggest challenges for this task is the lack of relatively large annotated datasets, since a tremendous amount of time and human effort is required for annotation. Moreover, existing datasets suffer from poor-quality annotations, in the sense that approximately one out of ten language expressions fails to uniquely describe the target object.
The purpose of the present Master thesis is to address these challenges by proposing a novel method for generating synthetic referring expressions for an image (video frame). This method produces synthetic referring expressions by using only the ground-truth annotations of the objects as well as their attributes, which are detected by a state-of-the-art object detection deep neural network. One of the advantages of the proposed method is that its formulation allows its application to any object detection or segmentation dataset.
By using the proposed method, the first large-scale dataset with synthetic referring expressions for video object segmentation is created, based on an existing large benchmark dataset for video instance segmentation. A statistical analysis and comparison of the created synthetic dataset with existing ones is also provided in the present Master thesis.
The conducted experiments on three different datasets used for referring video object segmentation prove the efficiency of the generated synthetic data. More specifically, the obtained results demonstrate that pre-training a deep neural network with the proposed synthetic dataset improves its ability to generalize across different datasets, without any additional annotation cost.
Master MATT thesis defense by Juan José Nieto
Advised by Víctor Campos and Xavier Giro-i-Nieto.
27th May 2021.
Pre-training Reinforcement Learning (RL) agents in a task-agnostic manner has shown promising results. However, previous works still struggle to learn and discover meaningful skills in high-dimensional state-spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent representation by making use of variational or contrastive techniques. We demonstrate that both allow learning a set of basic navigation skills by maximizing an information theoretic objective. We assess our method in Minecraft 3D maps with different complexities. Our results show that representations and conditioned policies learned from pixels are enough for toy examples, but do not scale to realistic and complex maps. We also explore alternative rewards and input observations to overcome these limitations.
https://imatge.upc.edu/web/publications/discovery-and-learning-navigation-goals-pixels-minecraft
Peter Muschick MSc thesis
Universitat Politècnica de Catalunya, 2020
Sign language recognition and translation has been an active research field in recent years, with most approaches using deep neural networks to extract information from sign language data. This work investigates the mostly disregarded approach of using human keypoint estimation from image and video data with OpenPose, in combination with a transformer network architecture. Firstly, it was shown that it is possible to recognize individual signs (4.5% word error rate (WER)). Continuous sign language recognition, though, was more error-prone (77.3% WER), and sign language translation was not possible using the proposed methods, which might be due to low accuracy scores of human keypoint estimation by OpenPose and the accompanying loss of information, or insufficient capacity of the used transformer model. Results may improve with the use of datasets containing higher repetition rates of individual signs, or by focusing more precisely on keypoint extraction of hands.
https://github.com/telecombcn-dl/lectures-all/
These slides review techniques for interpreting the behavior of deep neural networks. The talk reviews basic techniques such as the display of filters and tensors, as well as more advanced ones that try to interpret which part of the input data is responsible for the predictions, or generate data that maximizes the activation of certain neurons.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/dlai-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g. robotics, autonomous driving) or decision making (e.g. resource optimization in wireless communication networks). It also advances the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
Giro-i-Nieto, X. One Perceptron to Rule Them All: Language, Vision, Audio and Speech. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7-8).
Tutorial page:
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will first review the basic neural architectures to encode and decode vision, text and audio, and later review those models that have successfully translated information across modalities.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and presents the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from the curriculum learning strategies for better and faster training of deep neural networks. This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse schedule sampling is a better option than a classic forward one. Also, that a progressive skipping of frames during training is beneficial, but only when training with the ground truth masks instead of the predicted ones.
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. During the last year, multiple solutions have been proposed to leverage this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
Tools for Image Retrieval in Large Multimedia Databases
1. Tools for Image Retrieval in Large Multimedia Databases
by Carles Ventura Royo
Directors: Verónica Vilaplana, Xavier Giró
Tutor: Ferran Marqués
Barcelona, September 2011
2. Index
Identifying the problem
State of the art: indexing techniques
State of the art: Hierarchical Cellular Tree (HCT)
Modifications to the original HCT
Experimental results
Implemented tools
Conclusions and future work
5. Identifying the problem (III)
K nearest neighbor (KNN) problem
Solution: sequential scan
Drawback: computational time for large databases (10 s for 200,000 elements)
Approximate K nearest neighbor problem
Indexing techniques
6. Requirements of the solution
Dynamic approach
Multimedia databases are not static
Insertions and deletions
High-dimensional feature spaces
The “curse of dimensionality” problem
MPEG-7 visual descriptors are high-dimensional feature vectors
7. Index
Identifying the problem
State of the art: indexing techniques
State of the art: Hierarchical Cellular Tree (HCT)
Modifications to the original HCT
Experimental results
Implemented tools
Conclusions and future work
8. Indexing techniques (I)
Hierarchical data structures
Spatial Access Methods (SAMs)
K-d tree, R-tree, R*-tree, TV-tree, etc.
Drawbacks:
Items have to be represented in an N-dimensional feature space
Dissimilarity measure based on an Lp metric
SAMs do not scale up well to high-dimensional spaces
9. Indexing techniques (II)
Hierarchical data structures
Metric Access Methods (MAMs)
VP-tree, MVP-tree, GNAT, M-tree, etc.
More general approach than SAMs
Assuming only a similarity distance function
MAMs scale up well to high dimensional spaces
Drawbacks:
Static MAMs do not support dynamic changes
Dependence on pre-fixed parameters
10. Indexing techniques (III)
Locality Sensitive Hashing (LSH)
It uses hash functions
Nearby data points are hashed into the same bucket with a high probability
Faraway points are hashed into the same bucket with a low probability
Drawback:
It does not solve the K nearest neighbor problem, but the ε-near neighbor problem
11. Index
Identifying the problem
State of the art: indexing techniques
State of the art: Hierarchical Cellular Tree (HCT)
Modifications to the original HCT
Experimental results
Implemented tools
Conclusions and future work
12. Solution adopted
Hierarchical Cellular Tree (HCT)
MAM-based indexing scheme
Hierarchical structure
Self-organized tree
Incremental construction in a bottom-up fashion
Unbalanced tree
No dependence on a maximum capacity
Preemptive cell search algorithm for insertion
Dynamic approach
[KG07] S. Kiranyaz and M. Gabbouj, Hierarchical Cellular Tree: An efficient indexing scheme for content-based retrieval on multimedia databases.
15. HCT: Level Structure
Representatives for each cell from the lower level
Responsible for maximizing the compactness of its cells
Compactness threshold
16. HCT Operations (I)
Item insertion
Find the most suitable cell
Most Similar Nucleus vs. Preemptive Cell Search
17. HCT Operations (II)
Item insertion
Find the most suitable cell
Most Similar Nucleus vs. Preemptive Cell Search
C2: $d(O, O_2^N) - r(O_2^N) < d_{min}$
C3: $d(O, O_3^N) - r(O_3^N) > d_{min}$
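The two conditions above are the triangle-inequality pruning rule at the heart of Preemptive Cell Search: since every element e of a cell with nucleus O^N and covering radius r satisfies d(O, e) >= d(O, O^N) - r, a cell whose lower bound already exceeds the current best distance dmin can be skipped. Below is a minimal Python sketch of that test; the (nucleus, covering_radius, children) cell layout and the `dist` function are illustrative assumptions, not the thesis code.

```python
def prune_cells(query, cells, dist, d_min):
    """Keep only cells that may contain an element closer than d_min.

    By the triangle inequality, every element e in a cell with nucleus n and
    covering radius r satisfies d(query, e) >= d(query, n) - r. A cell with
    d(query, n) - r >= d_min can therefore be discarded (condition C3), while
    one with d(query, n) - r < d_min must still be visited (condition C2).
    """
    return [cell for cell in cells if dist(query, cell[0]) - cell[1] < d_min]
```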
18. HCT Operations (III)
Item insertion
Find the most suitable cell
Append the element
Generic post-processing check
Mitosis operation
Nucleus change
19. HCT Operations (IV)
Item removal
Cell search algorithm not required
Remove the element
Generic post-processing check
Mitosis operation
Nucleus change
20. HCT: Retrieval scheme
Progressive Query (PQ)
Periodical subqueries over database subsets
Query Path formation
Based on Most Similar Nucleus
Ranking aggregation
21. Index
Identifying the problem
State of the art: indexing techniques
State of the art: Hierarchical Cellular Tree (HCT)
Modifications to the original HCT
Experimental results
Implemented tools
Conclusions and future work
22. Modifications to the HCT (I)
Covering radius
The original definition gives an approximation by defect (an underestimate)
Considering all the elements belonging to the subtree: high computational cost
Approximation by excess (an overestimate)
24. Modifications to the HCT (III)
Covering radius
$r_C(C_9) = \max\{\, d(E, B) + r_C(C_1),\; r_C(C_2),\; d(E, H) + r_C(C_3) \,\}$
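Read as a recursion, the formula above bounds the distance from the nucleus to anything inside a member's subtree by the distance to that member plus the member's own covering radius. A hedged sketch of this computation follows, assuming each upper-level cell stores (element, child_radius) pairs including the nucleus itself (whose distance term is zero); `dist` stands for the dissimilarity function and the layout is illustrative, not the actual implementation.

```python
def covering_radius_upper_bound(nucleus, members, dist):
    """Upper-bound the covering radius without descending into subtrees.

    members: list of (element, child_radius) pairs, nucleus included.
    d(nucleus, e) + r_child(e) bounds the distance from the nucleus to any
    item in e's subtree, so the maximum over all members can never be an
    underestimate; it reproduces the r_C(C9) example on the slide.
    """
    return max(dist(nucleus, e) + r_child for e, r_child in members)
```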
25. Modifications to the HCT (III)
HCT construction
Preemptive Cell Search over all the levels
A method for updating the covering radius
To reduce the searching time
It can be performed after the HCT construction or periodically
26. Modifications to the HCT (IV)
Searching techniques
PQ fails to solve the KNN problem efficiently
New searching techniques:
Most Similar Nucleus
Preemptive Cell Search
Hybrid
Number of cells to be considered:
Minimum number of cells
Cells hosting 2·K elements
Cellular structure is not kept
27. Index
Identifying the problem
State of the art: indexing techniques
State of the art: Hierarchical Cellular Tree (HCT)
Modifications to the original HCT
Experimental results
Implemented tools
Conclusions and future work
28. Experimental results
CCMA image database of 216,317 elements
HCT building evaluation
Construction time
Retrieval system evaluation
Retrieval time
Elements retrieved
33. Retrieval system evaluation (II)
Evaluation with respect to exhaustive search
Mean Competitive Recall
Elements in common
Mean Normalized Aggregate Goodness
Kendall distance
Number of exchanges needed in a bubble sort
Query set of 1,082 images
36. Index
Identifying the problem
State of the art: indexing techniques
State of the art: Hierarchical Cellular Tree (HCT)
Modifications to the original HCT
Experimental results
Implemented tools
Conclusions and future work
37. Implemented tools (I)
database_indexing tool
Tool for indexing an image database
The HCT is stored on disk
hct_query tool
Tool for carrying out a search over an indexed database
The HCT is read from disk and loaded into main memory
39. Index
Identifying the problem
State of the art: indexing techniques
State of the art: Hierarchical Cellular Tree (HCT)
Modifications to the original HCT
Experimental results
Implemented tools
Conclusions and future work
40. Conclusions
Hierarchical Cellular Tree implementation
To improve the retrieval times
Generic implementation for any kind of data
Modifications proposed
HCT evaluation
Measures extracted from the literature
The Preemptive Cell Search technique gives the best performance
It is essential not to use an underestimated value for the covering radius
41. Future work
Very large databases
Not using only main memory
Region-based CBIR system
Each image can be represented by a set of regions
Browser application based on HCT
Take advantage of the hierarchical structure
Alternative way to retrieve elements
45. A tool for image DB indexing
MPEG-7 visual descriptors
Indexing a database
Inserting new elements into an indexed database
46. A tool for image retrieval
It requires an indexed database
Image query:
Image file
XML file with the visual descriptors
TXT file with several image queries
Interactive mode
Performing a search
Good morning. I am going to defend my master's final project dissertation, titled Tools for Image Retrieval in Large Multimedia Databases, which has been directed by Verónica Vilaplana and Xavi Giró and tutored by Ferran Marqués.
First of all, we're going to identify the problem to be solved.
When we are looking for images in an image database, there are basically two ways of telling the retrieval system what we're looking for: through words, or by giving it an image, which is called the query.
Our retrieval system is based on the latter: we give a query image to the system in order to retrieve the K most similar elements from a given database. This problem is known as the K nearest neighbor (KNN) problem. In order to compute how similar two images are, our retrieval system represents the images by visual descriptors defined in MPEG-7, which are compared using dissimilarity functions also defined in this standard.
The way to obtain these most similar elements is to perform an exhaustive search over the whole database. However, this sequential scan is not feasible for a large database. To get an idea, an exhaustive search performed on a database of about 200,000 elements takes 10 seconds, which is not acceptable for the user.
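For reference, the exhaustive baseline can be written in a few lines; this toy sketch, assuming precomputed descriptors and a `dist` dissimilarity function (names are illustrative, not the actual tool), makes the linear cost explicit: every one of the 200,000 descriptors is compared against the query.

```python
import heapq

def knn_exhaustive(query, database, dist, k):
    """Sequential scan: compare the query against every database element."""
    return heapq.nsmallest(k, database, key=lambda item: dist(query, item))
```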
Therefore, CBIR systems need to incorporate indexing techniques in order to scale up well over large databases and achieve retrieval times acceptable to the user. On the other hand, the exact K nearest neighbors may not be retrieved, and we may obtain an approximate solution to the KNN problem.
In particular, we need an indexing technique which fulfills the following requirements:
A dynamic approach. Since multimedia databases are not static, we want an index structure which allows insertions and deletions of the indexed elements without having to reindex the whole database from scratch.
Some indexing techniques suffer from the so-called “curse of dimensionality” problem when they are used in high-dimensional feature spaces and, therefore, become more inefficient than even an exhaustive search. Since we work with the MPEG-7 visual descriptors, which are high-dimensional feature vectors, we need an indexing technique which is appropriate for high-dimensional feature spaces.
Now that the problem to be solved is clear, we're going to analyze which indexing techniques are available in the literature.
The most traditional indexing techniques are mostly organized in a hierarchical tree structure. These indexing techniques can be grouped mainly into two categories: spatial access methods and metric access methods.
The applicability of SAMs is limited by the fact that the items have to be represented by points in an N-dimensional feature space and the dissimilarity measure between two points has to be based on a distance function in an Lp metric, such as the Euclidean distance. Furthermore, the main disadvantage of SAMs is that they do not scale up well to high-dimensional spaces.
The metric access methods offer a more general approach than SAMs and do not suffer from the “curse of dimensionality” problem, so they scale up well to high-dimensional spaces. However, some existing MAMs present several drawbacks. For instance, the static MAMs do not support dynamic changes such as new insertions or deletions. And even though some MAM-based indexing techniques provide a dynamic approach, their dependence on pre-fixed parameters can lead to significantly varying performances.
Another very popular indexing technique is Locality Sensitive Hashing (LSH), which is not based on a hierarchical tree structure. Its main disadvantage is that LSH, by its nature, does not solve the KNN problem but the ε-near neighbor problem, which consists of gathering the elements within a predefined distance ε of the query point.
After analyzing several indexing techniques from the literature, we have decided to implement the Hierarchical Cellular Tree, an indexing technique designed specifically to provide an effective solution for indexing large multimedia databases.
The HCT is a MAM-based indexing technique with a hierarchical structure consisting of one or more levels, where each level in turn holds one or more cells. It is a self-organized tree, which basically means that the operations (item insertion, removal, etc.) are not externally controlled but carried out according to internal rules. The HCT is built dynamically in a bottom-up fashion and is an unbalanced tree optimized for achieving highly focused cells. The HCT does not depend on a maximum capacity and therefore places no limit on cell size, as long as the cell maintains a definite compactness. In order to perform an optimum search during insertion, the HCT includes a preemptive cell search algorithm. Finally, the HCT is characterized by its totally dynamic approach.
Now, we’re going to detail the two main components of the HCT: the cell structure and the level structure. A cell is a basic container structure where similar database elements are stored. The distance between each pair of elements is assigned to the corresponding edge of a connected, undirected graph. From this graph we obtain the Minimum Spanning Tree (MST), the subgraph that connects all the vertices with the minimum cumulative weight. The cell nucleus is the element which represents the cell on the upper level. It is essential to promote the best item for this representation, since these elements are used during the top-down search for item insertion and for query requests. Therefore, it was proposed to choose the element having the maximum number of branches in the MST. The covering radius is then defined as the distance from the nucleus to the furthest element in the cell.
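A minimal sketch of nucleus selection as just described, assuming a generic `dist` function and in-memory items; the tie-breaking rule and all data structures here are illustrative:

```python
def minimum_spanning_tree(items, dist):
    """Prim's algorithm over the complete graph of a cell's items.

    Returns the MST as a list of (weight, i, j) edges over item indices.
    """
    edges = []
    # best[j] = (weight, i): cheapest known connection of item j to the tree
    best = {j: (dist(items[0], items[j]), 0) for j in range(1, len(items))}
    while best:
        j = min(best, key=lambda v: best[v][0])
        w, i = best.pop(j)
        edges.append((w, i, j))
        for v in best:
            d = dist(items[j], items[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return edges

def pick_nucleus(items, dist):
    """Nucleus = the item with the most MST branches (highest MST degree);
    ties are broken arbitrarily in this sketch."""
    degree = [0] * len(items)
    for _, i, j in minimum_spanning_tree(items, dist):
        degree[i] += 1
        degree[j] += 1
    nucleus = max(range(len(items)), key=degree.__getitem__)
    # Covering radius: distance from the nucleus to the furthest cell item.
    radius = max(dist(items[nucleus], x) for x in items)
    return nucleus, radius
```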
Another cell parameter is the cell compactness, which quantifies how focused or compact the clustering of the items within the cell is. It is computed as a function of the following cell parameters: the mean and standard deviation of the MST branch weights, the covering radius, the maximum MST branch weight and the number of elements. The cell compactness plays a key role in deciding whether or not to perform mitosis within the cell at any instant, and so does the maturity size. The maturity size is a prefixed parameter: when a cell holds a number of elements greater than or equal to this value, we consider it mature. Therefore, a cell can undergo mitosis only if it is mature and not compact enough. When mitosis is granted, the MST is again used to decide how to split the cell, by breaking the MST branch with the largest weight.
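Building on the MST helper from the previous sketch, the split step of mitosis might look as follows; the compactness formula itself is not reproduced here, only the split by the heaviest branch:

```python
def mitosis(items, dist):
    """Split a cell in two by removing the heaviest MST branch.

    The two connected components left behind become the child cells.
    The compactness test that decides *whether* to split is omitted.
    """
    edges = sorted(minimum_spanning_tree(items, dist))  # lightest..heaviest
    kept = edges[:-1]                                   # drop heaviest branch
    # Union-find over the remaining branches yields the two components.
    parent = list(range(len(items)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for idx, item in enumerate(items):
        groups.setdefault(find(idx), []).append(item)
    return list(groups.values())  # two child cells
```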
As mentioned before, the HCT has a hierarchical structure formed by one or more levels. The elements belonging to a particular level are the representatives of the cells from the level below, except for the ground level, which contains the entire database. Each level is responsible for maximizing the compactness of its cells. For this purpose, each level maintains its own compactness threshold, computed by applying the median operator over the compactness values of its mature cells. This parameter is used to evaluate whether a cell is compact enough.
Now, we’re going to detail the two external operations that can be carried out on an HCT: item insertion and item removal.
Whenever an item insertion is performed, the first thing to do is to find the most suitable ground-level cell to which the element must be appended. In this figure, element C is the closest nucleus on the ground level. In order to perform this operation efficiently, a search algorithm called Preemptive Cell Search was proposed instead of the traditional Most Similar Nucleus. The MS-Nucleus search assumes that the closest nucleus yields the best subtree during the descent. In the figure, the element being inserted is first compared to the nucleus objects of the upper level. Since the nucleus of cell C1 is the closest one, the two other cells would be discarded, and element C would therefore be wrongly discarded.
On the other hand, the Preemptive Cell Search algorithm performs a preemptive analysis on the upper level to find all possible nucleus objects which might yield the closest objects on the lower level. In this figure, since d(O,ON3) - r(ON3) > dmin, cell C3 cannot yield the closest nucleus and is therefore discarded. On the other hand, since d(O,ON2) - r(ON2) < dmin, cell C2 is not discarded. Therefore, element C will be considered and the element being inserted will be appended to the most suitable cell.
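The discard rule just described can be sketched directly; `cells` and `dist` are illustrative names:

```python
def preemptive_cell_search(query, cells, dist):
    """Keep every upper-level cell that might yield the closest element.

    `cells` is a list of (nucleus, covering_radius) pairs.  A cell is
    discarded only when even its nearest possible subtree element,
    bounded below by d(query, nucleus) - covering_radius, lies beyond
    the closest nucleus distance d_min.
    """
    d = [dist(query, nucleus) for nucleus, _ in cells]
    d_min = min(d)
    return [cell for cell, dn in zip(cells, d) if dn - cell[1] <= d_min]
```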
Once the most suitable cell has been found, the element is appended. Then, the cell becomes subject to a generic post-processing check. First, the cell is examined for a mitosis operation. In case mitosis is not performed, it is necessary to check if the nucleus has changed due to the insertion.
The item removal is an operation that does not require any cell search algorithm. The element to be removed is taken out of the cell and the cell becomes subject to a generic post-processing check as in an item insertion operation.
In the original article, a retrieval scheme called Progressive Query was proposed. This searching technique consists of performing periodic sub-queries over subsets of database items. The order in which the comparisons are made is given by the Query Path, which is formed based on the Most Similar Nucleus searching technique. The rankings obtained over each subset are aggregated after each sub-query.
So far, we have presented the Hierarchical Cellular Tree, the solution adopted for indexing large multimedia databases, and the HCT has been implemented on our development platform. Since we have detected some weak points, we are now going to present some proposed modifications to the original HCT.
The covering radius was originally defined as the distance from the nucleus to the furthest element in the cell. However, we consider that we should take into account not only the elements belonging to the corresponding cell, but also all the elements belonging to its subtree. The original HCT therefore works with a poor approximation that underestimates the true radius. Since computing the exact covering radius can have a high computational cost, we have proposed an approximation by excess, which guarantees that no subtree will be wrongly discarded and, therefore, that the most suitable cell will always be found.
For instance, in order to compute the covering radius for cell C9, all the elements in cells C1, C2 and C3 should be considered, instead of only the elements belonging to that cell. Using the approximation by excess that we have proposed, the covering radius for cell C9 is computed as follows:
The covering radius of cell C9 is computed as the maximum of the following values (generalized in the sketch after this list):
Covering radius of cell C1 plus the distance from element B (its nucleus) to element E, the nucleus of cell C9
Covering radius of cell C2, whose nucleus is E itself
Covering radius of cell C3 plus the distance from element H (its nucleus) to element E
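Generalizing the example, the proposed approximation by excess follows from the triangle inequality and can be sketched as:

```python
def covering_radius_upper_bound(parent_nucleus, children, dist):
    """Overestimate of a parent cell's covering radius.

    `children` is a list of (child_nucleus, child_radius) pairs.  By the
    triangle inequality, every element in a child's subtree lies within
    d(parent_nucleus, child_nucleus) + child_radius of the parent
    nucleus, so the maximum over the children never underestimates the
    true radius and no subtree can be wrongly discarded.
    """
    return max(dist(parent_nucleus, n) + r for n, r in children)

# For cell C9: covering_radius_upper_bound(E, [(B, r1), (E, r2), (H, r3)], d)
# reduces to max(d(E,B) + r1, r2, d(E,H) + r3), matching the list above.
```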
Originally, it was proposed to adopt a hybrid approach to build the HCT, using the Preemptive Cell Search algorithm only in a certain number of the uppermost levels and the Most Similar Nucleus technique in the lowest ones. Since the Most Similar Nucleus is used, the element being inserted may not be appended to the most suitable cell. That is why we have decided to build the HCT using the Preemptive Cell Search algorithm over all the levels.
Furthermore, we have implemented a method for updating the covering radius to its actual value in order to reduce the searching time of both the insertion operations and query requests. In this way, we will visit only the cells which are strictly necessary. This method can be applied after the HCT construction or periodically.
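A rough sketch of what such an update might look like, assuming hypothetical `nucleus`, `children`, `items` and `radius` attributes on a cell object (the actual implementation's data layout is not shown in this talk):

```python
def update_covering_radius(cell, dist):
    """Recompute a cell's covering radius to its exact value.

    Walks the entire subtree, which is why the update is applied only
    after construction or periodically (e.g. every few thousand
    insertions) rather than on every operation.
    """
    def farthest(node):
        # `items` holds the node's own elements; `children` its sub-cells.
        dmax = max((dist(cell.nucleus, x) for x in node.items), default=0.0)
        for child in node.children:
            dmax = max(dmax, farthest(child))
        return dmax
    cell.radius = farthest(cell)
    return cell.radius
```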
As we will see in the results, the Progressive Query fails to solve the KNN problem efficiently. That is why we have decided to implement new searching techniques: we have adapted the cell search algorithms used for insertion into searching techniques for retrieval. In particular, we are interested in the Preemptive Cell Search algorithm, since it takes advantage of the covering radius and guarantees that the most suitable cell is retrieved.
Since the cell whose nucleus is the nearest element to the query may not contain the most similar elements, we have proposed considering a minimum number of cells, and making that number depend on the number of expected results. In particular, the cells considered must host at least twice the number of elements expected. These elements are then sorted individually, without keeping the cellular structure.
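A minimal sketch of this retrieval rule, with the top-down descent omitted and illustrative data structures:

```python
def hct_knn(query, ground_cells, k, dist):
    """Approximate KNN over the ground level of the HCT.

    `ground_cells` is a list of (nucleus, elements) pairs; the top-down
    descent that selects them is omitted.  Cells are visited in order
    of nucleus distance until at least 2*k candidates are gathered,
    and the candidates are then ranked individually.
    """
    ordered = sorted(ground_cells, key=lambda c: dist(query, c[0]))
    candidates = []
    for _, elements in ordered:
        candidates.extend(elements)
        if len(candidates) >= 2 * k:
            break
    return sorted(candidates, key=lambda e: dist(query, e))[:k]
```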
Once the original HCT and some proposed modifications have been presented, we’re going to proceed to their evaluation.
We have selected a database of approximately 200,000 elements provided by the Corporació Catalana de Mitjans Audiovisuals. Although the main objective of these experiments is to evaluate how fast the retrieval system is and how good the obtained results are depending on the searching technique and the HCT used, we also want to analyze the time required to build the HCT.
First of all, we are going to analyze the building time. The time required for building the HCT when the original covering radius is used is by far the best one. However, as we will see in the retrieval system evaluation, this approach gives the worst performance. This is because the original covering radius is an underestimate and, therefore, fewer cells than strictly necessary are visited during the insertion operations. Although the building time required by the proposed covering radius is worse, this approach gives a much better performance. Furthermore, this time can be reduced by applying the update method periodically, without affecting performance: we have reduced the building time by 23% by applying the update method every 5,000 insertions.
Now, we’re going to evaluate the retrieval system. We want to compare the approximate ranking, obtained by using a searching technique over the HCT, with the exact ranking given by the exhaustive search. Here we have an example of a search in which an image of an anchorman has been used as the query. All the images shown are the results of an exhaustive search over the whole image database. The images marked with a green rectangle have also been retrieved by the Preemptive Cell Search algorithm, whereas the ones marked with a red rectangle are missing from the approximate ranking.
Here we have an example in which three images have not been retrieved by the Preemptive Cell Search technique.
And another one in which all the images have been retrieved.
In order to compare the approximate ranking, obtained by using a searching technique over the HCT, with the exact ranking given by the exhaustive search, we have selected three different measures from the literature:
The Mean Competitive Recall, which is defined as the number of elements in common between the approximate and the exact rankings.
The Mean Normalized Aggregate Goodness, which is 1 only when the k nearest elements are retrieved and 0 only when the k farthest elements are retrieved, the worst case.
The Kendall distance, which turns out to be equal to the number of exchanges needed in a bubble sort (see the sketch just below).
These measures have been computed over a query set of 1082 images.
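As an aside, the bubble-sort characterization of the Kendall distance can be sketched as follows, assuming both rankings contain the same items (how partially overlapping rankings are handled in the actual evaluation is not detailed here):

```python
def kendall_distance(approx_ranking, exact_ranking):
    """Kendall distance between two rankings of the same items,
    counted as the number of exchanges bubble sort needs."""
    pos = {item: i for i, item in enumerate(exact_ranking)}
    seq = [pos[item] for item in approx_ranking]
    exchanges = 0
    for i in range(len(seq)):
        for j in range(len(seq) - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                exchanges += 1
    return exchanges

# kendall_distance(["b", "a", "c"], ["a", "b", "c"]) == 1
```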
Now, we want to present the results obtained by the different searching techniques when the number of results expected by the user is 40. First of all, we’re going to analyze the Preemptive Cell Search algorithm depending on the covering radius used and on whether the update method for the covering radius has been applied. The retrieval system performs by far the worst when the original covering radius is used without the update method, since there are only 12.35 out of 40 elements in common between the rankings on average. Furthermore, only 49.26% of the query images have been retrieved. However, the quality of the results for the original covering radius improves remarkably when the proposed update method is used. This shows how important it is not to use an underestimated value for the covering radius, and it is also why the performance of the retrieval system is so good when the proposed covering radius is used, regardless of whether the update method is applied. Thus, in a retrieval time of about 1 second, 99% of the image queries are retrieved and there are about 28 out of 40 elements in common between the rankings on average. These results are also corroborated by the other two quality measures.
Here we have a table which allows us to compare the implemented searching techniques. The retrieval time increases with the number of levels over which the Preemptive Cell Search algorithm is applied; on the other hand, the greater that number of levels, the better the quality measures. Since we consider a retrieval time of about 1 second absolutely acceptable for the user, we can conclude that the Preemptive Cell Search technique gives the best performance. Finally, we want to compare it to the Progressive Query, the retrieval scheme originally proposed. If we consider a larger number of elements instead of only 80 to obtain the approximate ranking, the Most Similar Nucleus technique becomes equivalent to the first sub-queries performed by the Progressive Query. From the results shown in the table, we can see that even when we consider 20,000 elements, which means a retrieval time of 1.37 seconds, the quality of the results is worse than when the Preemptive Cell Search algorithm is used: in that time interval, there are only 12.33 out of 40 elements in common between the rankings on average and only 31.61% of the query images have been retrieved.
Finally, we’re going to present some tools which have been implemented to make it easier for the user to use the HCT.
We have implemented a tool called database_indexing, which is used for indexing an image database. The resulting HCT is stored on disk. Then, in order to perform searches over this HCT, the user should use the hct_query tool.
Furthermore, the image retrieval system based on the HCT has also been implemented over a server/client architecture. On the one hand, we have the hct_building tool, a daemon program which loads an indexed database and runs constantly, waiting for new query requests. On the other hand, there is the query_request tool, which is used to send the query requests. Since we also need a means of communication between these tools, we have selected a messaging system called KSC.
Finally, let’s draw the conclusions and present the future work lines.
As we have seen, in this project we have implemented an indexing technique called Hierarchical Cellular Tree in order to improve the retrieval times of the CBIR system over large databases. Although the HCT has been chosen for a particular scenario, our implementation can index any type of data; the HCT is therefore useful for indexing a generic database as long as there is a function to measure the dissimilarity between any pair of elements. Furthermore, some modifications have been proposed which have improved the quality of the retrieved elements. In order to evaluate its performance, we have used established measures from the literature. Finally, we conclude that the Preemptive Cell Search gives the best performance when it is used for retrieval. Furthermore, it is essential not to use an underestimated value for the covering radius, either by building the HCT with the proposed covering radius or by applying the update method over an HCT built with the original covering radius.
This project opens the door to new lines of future work. The main one consists of adapting the HCT implementation to work with very large databases that cannot be indexed using main memory alone. The implementation of this indexing technique will also make it possible to improve the retrieval time of any region-based CBIR system: for instance, if each image is represented by a set of 100 regions, a small image database of 1,000 images results in a large database of 100,000 elements. Finally, another future line of work is the implementation of a browser for an image database, as proposed in the original article, since the hierarchical structure of the HCT is quite appropriate for giving the user an overview of what lies under the current level. The browser could thus also be used as an alternative way to retrieve elements from an image database.
Here we have an illustrative example of HCT construction, where the elements are represented by colored circles and the similarity between two elements is given by how similar in color they are. First, all the elements are inserted in the unique cell of the HCT, which is created in the ground level, until it becomes mature and no longer compact enough. In this example, this happens when the yellow ball labeled 1 is inserted.
As a result, the cell is split into two new cells and a new top level is created. The nucleus of each child is computed (the yellow ball labeled 1 and the red ball labeled f) and inserted in the new top cell. This process continues until all the elements of the database are inserted.
Although the HCT has been implemented to index any element type, we have implemented a tool for indexing image database elements that are represented by MPEG-7 visual descriptors and compared using the dissimilarity measures also defined in MPEG-7. The user is asked to precompute these descriptors for each image before using this tool. We specify the elements to be indexed through a TXT file listing the paths of the XML files containing the visual descriptors, along with the directory where the HCT will be stored on disk. We also specify the visual descriptor we are interested in and some HCT parameters, such as the maturity size.
Furthermore, this tool also allows the user to insert new elements into a previously indexed database by giving the path of the HCT stored on disk.
Another tool, which allows the user to carry out a search on an indexed database, has also been implemented. The image query is specified by giving either its filename or the XML file including the visual descriptors. This tool also allows the user to perform several retrievals by giving a TXT file including the different image queries. We have also implemented an interactive mode in which the user is asked whether they want to perform a new retrieval, without waiting for the HCT to be loaded again. Here we have an example of use of this tool.
The image retrieval system based on the HCT has also been implemented over a server/client architecture. On the one hand, we have the hct_building tool, a daemon program which loads an indexed database and runs constantly, waiting for new query requests. On the other hand, there is the query_request tool, which is used to send the query requests. Since we also need a means of communication between these tools, we have selected a messaging system called KSC. The purpose of the KSC is to control the registration of the clients, to find out which message types the clients subscribe to, and to forward the messages to the subscribed clients. We have defined a message type named SearchKSCMessage, which defines the parameters of the search, and a message type named SearchCompletedKSCMessage, which signals that the searching process has completed properly. The query_request tool subscribes as a writer of SearchKSCMessage and as a reader of SearchCompletedKSCMessage, whereas the hct_building tool subscribes as a reader of SearchKSCMessage and as a writer of SearchCompletedKSCMessage.
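Since the actual KSC API is not shown in this talk, the following sketch only models the described message flow with a hypothetical in-process publish/subscribe broker; all names except the two message types are illustrative:

```python
from collections import defaultdict

class Broker:
    """Hypothetical in-process stand-in for the KSC messaging system:
    it registers readers per message type and forwards published
    messages to them."""
    def __init__(self):
        self.readers = defaultdict(list)
    def subscribe(self, msg_type, handler):
        self.readers[msg_type].append(handler)
    def publish(self, msg_type, payload):
        for handler in self.readers[msg_type]:
            handler(payload)

broker = Broker()

# hct_building side: reader of SearchKSCMessage, writer of
# SearchCompletedKSCMessage.
def handle_search(params):
    results = ["img_042", "img_117"]  # placeholder for the real HCT search
    broker.publish("SearchCompletedKSCMessage", results)

broker.subscribe("SearchKSCMessage", handle_search)

# query_request side: writer of SearchKSCMessage, reader of
# SearchCompletedKSCMessage.
broker.subscribe("SearchCompletedKSCMessage", print)
broker.publish("SearchKSCMessage", {"query": "anchorman.jpg", "k": 40})
```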
Now, we are going to analyze the same experiment using the Most Similar Nucleus technique instead. For this technique, the covering radius is not used during the search, so it does not matter whether the update method is applied. Analyzing the results, we conclude that the Most Similar Nucleus technique performs poorly, since the query image is retrieved in only 5% of the query requests and there are only 1.35 out of 40 elements in common between the rankings on average. These results are even worse when the HCT was built with the original covering radius.
If we consider a larger number of elements, the Most Similar Nucleus technique becomes equivalent to the first sub-queries performed by the Progressive Query. From the results shown in the table, we can see that even when we consider 20,000 elements, which means a retrieval time of 1.37 seconds, the quality of the results is worse than when the Preemptive Cell Search algorithm is used: in that time interval, there are only 12.33 out of 40 elements in common between the rankings on average.
Finally, we consider the Hybrid technique, which combines the two searching techniques analyzed before. The results shown in the table were obtained by applying the Preemptive Cell Search algorithm over the 8 uppermost levels and the Most Similar Nucleus technique in the lowest ones. The Hybrid technique’s retrieval time is always shorter than that of the Preemptive Cell Search and longer than that of the Most Similar Nucleus. Conversely, the quality of the elements retrieved by the Hybrid technique is always better than with the Most Similar Nucleus and worse than with the Preemptive Cell Search.