https://imatge.upc.edu/web/publications/livre-video-extension-lire-content-based-image-retrieval-system
This project explores the extension of the Lucene Image Retrieval Engine (LIRE), an open-source Content-Based Image Retrieval (CBIR) system, to video retrieval on large-scale video datasets. The fast-growing need to store huge amounts of video on servers requires efficient, scalable search and indexing engines capable of assisting users in their management and retrieval. In our tool, queries are formulated by visual example, allowing users to find both the videos and the moments in time at which the query image matches. The video dataset used in this scenario comprises over 1,000 hours of footage from different news broadcast channels. This thesis presents an extension and adaptation of LIRE and its plugin for Solr, an open-source enterprise search platform from the Apache Lucene project, for video retrieval based on visual features, as well as a web interface usable from different devices.
This is a presentation of Fuzzy Hash Map (FHM). FHM is an extension of the regular Java HashMap data structure that allows efficient fuzzy string key search. Customizable algorithms and settings bring flexibility to this new data structure, making it adaptable to each specific use case. A fuzzy string search performance comparison between Fuzzy Hash Map and the regular HashMap is presented for both accuracy and time consumption. The results show very good performance for Fuzzy Hash Map compared to the regular HashMap.
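To make the idea concrete, the following is a minimal Java sketch of a fuzzy-keyed map: exact lookups fall back to a Levenshtein-distance scan over the stored keys. The class and method names are illustrative only and are not the FHM library's API; a real FHM implementation would presumably avoid the linear scan used here.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a fuzzy-keyed map (illustrative, not the FHM API): exact lookups
// fall back to a Levenshtein-distance scan over the stored keys.
public class FuzzyKeyMap<V> {
    private final Map<String, V> delegate = new HashMap<>();
    private final int maxDistance;

    public FuzzyKeyMap(int maxDistance) { this.maxDistance = maxDistance; }

    public void put(String key, V value) { delegate.put(key, value); }

    // Exact match first; otherwise return the value of the closest key within maxDistance.
    public V fuzzyGet(String key) {
        V exact = delegate.get(key);
        if (exact != null) return exact;
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : delegate.keySet()) {
            int d = levenshtein(key, candidate);
            if (d < bestDistance) { bestDistance = d; best = candidate; }
        }
        return best == null ? null : delegate.get(best);
    }

    // Classic dynamic-programming edit distance between two strings.
    private static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}

For example, after put("retrieval", v), a call to fuzzyGet("retreival") with maxDistance 2 would still return v.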
More details: https://imatge.upc.edu/web/publications/rapid-serial-visual-presentation-relevance-feedback-image-retrieval-eeg-signals
Author: Sergi Porta
Advisors: Eva Mohedano & Noel O'Connor (DCU) / Amaia Salvador & Xavier Giró-i-Nieto (UPC)
This thesis explores the potential of relevance feedback for image retrieval using EEG signals for human-computer interaction. The project studies the optimal parameters of a rapid serial visual presentation (RSVP) of frames from a video database when the user is searching for an object instance. The simulations reported in this thesis assess the trade-off between using a small or a large number of images in each RSVP round that captures the user feedback. While short RSVP rounds allow the system to quickly learn the user's intention, RSVP rounds must also be long enough to let users generate the P300 EEG signals that are triggered by relevant images. This work also addresses the problem of how to distribute potentially relevant and non-relevant images within an RSVP round so as to maximize the probability that each relevant frame is displayed at least one second apart from any other relevant frame, since this configuration generates a cleaner P300 EEG signal. The presented simulations are based on a realistic set-up for video retrieval with a subset of 1,000 frames from the TRECVID 2014 Instance Search task.
Presenter: Amaia Salvador
Related papers:
E. Mohedano, Salvador, A., McGuinness, K., Giró-i-Nieto, X., O'Connor, N., and Marqués, F., “Bags of Local Convolutional Features for Scalable Instance Search”, in ACM International Conference on Multimedia Retrieval (ICMR), New York City, NY; USA. 2016
A. Salvador, Giró-i-Nieto, X., Marqués, F., and Satoh, S. 'ichi, “Faster R-CNN Features for Instance Search”, in CVPR Workshop Deep Vision, Las Vegas, NV, USA. 2016.
Abstract:
Image representations derived from pre-trained Convolutional Neural Networks (CNNs) have become the new state of the art in computer vision tasks such as instance retrieval. This work proposes a simple pipeline for encoding the local activations of a convolutional layer of a pre-trained CNN using the well-known bag of words aggregation scheme (BoW). Assigning each local array of activations in a convolutional layer to a visual word produces an assignment map, a compact representation that relates regions of an image with a visual word. We use the assignment map for fast spatial reranking, obtaining object localizations that are used for query expansion. We further investigate the potential of using convolutional features from an object detection network such as Faster R-CNN, which makes it possible to obtain image- and region-wise features in a single forward pass. We demonstrate the suitability of such representations for image retrieval on the Oxford Buildings 5k, Paris Buildings 6k and a subset of TRECVid Instance Search 2013, achieving competitive results. This talk will review the two publications related to this work, which have been recently accepted at ICMR 2016 and DeepVision CVPRW 2016.
Barcelona, 3 May 2016.
https://imatge.upc.edu/web/publications/region-oriented-convolutional-networks-object-retrieval
BSc thesis by Eduard Fontdevila advised by Amaia Salvador and Xavier Giró-i-Nieto.
EET UPC, June 2015.
Quality, Relevance and Importance in Information Retrieval with Fuzzy Semanti... (tmra)
We propose a framework for ranking information based on quality, relevance and importance, and argue that a socio-semantic contextual approach that extends topicality can lead to increased value of information retrieval systems. We use Topic Maps to implement our framework, and discuss procedures for calculating the resource ranking. A fuzzy neural network approach is envisioned to complement the process of manual metadata creation.
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro... (Mateus S. H. Cruz)
Presentation given at the SWIM seminar (University of Tsukuba) about the paper "Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Strong Privacy Guarantee"*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/10/24/summary-inverted-index-based-multi-keyword-public-key-searchable-encryption-with-strong-privacy-guarantee/
*Wang et al.: "Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Strong Privacy Guarantee". INFOCOM 2015.
Application Architecture Summit - Monitoring the Dynamic Cloud (New Relic)
How do you apply modern application architecture to your digital business? Hear from New Relic's Sr. Director of Strategic Architecture, Lee Atchison, at the Application Architecture Summit. Learn more here: https://newrelic.com/partner/aws
Fuzzy logic is often heralded as a technique for handling problems with large amounts of vagueness or uncertainty. Since its inception in 1965 it has grown from an obscure mathematical idea to a technique used in a wide variety of applications from cooking rice to controlling diesel engines on an ocean liner.
This talk will give a layman's introduction to the topic and explore some of the real world applications in control and human decision making. Examples might include household appliances, control of large industrial plant, and health monitoring systems for the elderly. We will look at where the field might be going over the next ten years, highlighting areas where DMU's specialist expertise drives the way.
How can you deal with fuzzy logic? Fuzzy logic is a form of many-valued logic; it deals with reasoning that is approximate rather than fixed and exact. In contrast with traditional logic theory, where binary sets have two-valued logic (true or false), fuzzy logic variables may have a truth value that ranges in degree between 0 and 1.
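As a small illustration of truth degrees in [0, 1], the following Java sketch evaluates a triangular membership function and combines degrees with the common Zadeh operators (min for AND, max for OR, 1 - x for NOT). The linguistic terms and thresholds are invented for the example.

// Fuzzy truth values lie in [0, 1]; Zadeh operators: min for AND, max for OR, 1 - x for NOT.
public class FuzzyLogicDemo {
    // Triangular membership function rising from 'low' to 'peak' and falling back to 'high'.
    static double triangular(double x, double low, double peak, double high) {
        if (x <= low || x >= high) return 0.0;
        return x <= peak ? (x - low) / (peak - low) : (high - x) / (high - peak);
    }

    static double and(double a, double b) { return Math.min(a, b); }
    static double or(double a, double b)  { return Math.max(a, b); }
    static double not(double a)           { return 1.0 - a; }

    public static void main(String[] args) {
        double temperature = 27.0;                                  // e.g. room temperature in Celsius
        double warm = triangular(temperature, 20.0, 25.0, 30.0);    // degree to which 27 C is "warm" (0.6)
        double hot  = triangular(temperature, 25.0, 35.0, 45.0);    // degree to which 27 C is "hot" (0.2)
        System.out.printf("warm=%.2f hot=%.2f AND=%.2f OR=%.2f NOT(hot)=%.2f%n",
                warm, hot, and(warm, hot), or(warm, hot), not(hot));
    }
}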
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud (Mateus S. H. Cruz)
Presentation given at the SWIM seminar (University of Tsukuba) about the paper "Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud"*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/08/19/summary-privacy-preserving-multi-keyword-fuzzy-search-over-encrypted-data-in-the-cloud/
*Wang et al.: "Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud". INFOCOM 2014.
No Compromise - Better, Stronger, Faster Java in the Cloud (All Things Open)
Presented at All Things Open 2022
Presented by Jarek Gawor & Harry L. Hoots, III
Title: No Compromise - Better, Stronger, Faster Java in the Cloud
Abstract: Innovation in the cloud-era is about driving efficiencies, agility, and greater opportunities to deploy workloads to the cloud of your choice. Join us as we explore critical challenges faced by organizations in their move to cloud-native architectures along with the innovation in Java standards, including MicroProfile and Jakarta EE, and emerging technologies that help them build and deploy their applications on any cloud, faster and with better performance. Throughout, we showcase Open Liberty, the open-source, cloud-optimized runtime, that is delivering on the promise of this innovation to enable rapid delivery of highly scalable and performant applications, without compromise.
Laying the Foundation for Ionic Platform Insights on Spark (Ionic Security)
The Ionic Analytics team shares insights about the system they built using Spark and Databricks to enable low cost, flexible reporting and lay a foundation for advanced analytics.
These slides were originally presented at the Databricks Data+ML Workshop entitled "Unify Data Pipelines with Machine Learning" on Tuesday September 11 2018 in Atlanta, GA.
CloudHealth: A Model-Driven Approach to Watch the Health of Cloud Services (Anas Shatnawi)
The goal of this project is to develop a monitoring approach to watch the health of cloud services. We studied the state of the art in monitoring approaches and identified a list of challenges that need to be addressed in recent cloud environments. Such challenges include the automated deployment of probes to collect Key Performance Indicators (KPIs), the mapping of low-level KPIs to high-level monitoring goals, the attachment of probes to already running services, and the generation of dynamic dashboards, among others.
We adapted existing theories, methods, and tools such as Elasticsearch, Kibana, Beats, Logstash, the Simple Network Management Protocol and Ansible in order to address the identified challenges.
The CNCF ecosystem is large, diverse and continues to grow. CNCF would like to ensure cross-project interoperability and cross-cloud deployments of all cloud native technologies and show the daily status of builds and deployments on a status dashboard. Cross Cloud CI addresses this need.
Presented at STPCon 2016. With the extensive amount of testing performed nightly on large software projects, test and verification teams often experience lengthy wait times for the availability of test results of the latest build. As we strive to identify and resolve issues as fast as possible, alternative methods of test execution have to be found. Learn how to use Jenkins to launch tests in parallel across a number of Virtual Machines, monitor execution health, and process results. Learn about various Jenkins plugins and how they contributed to the solution. Learn how to trigger downstream jobs, even if they are on separate Jenkins instances.
UVM BASED REUSABLE VERIFICATION IP FOR WISHBONE COMPLIANT SPI MASTER CORE (VLSICS Design)
The System on Chip design industry relies heavily on functional verification to ensure that designs are bug-free. As design engineers come up with increasingly dense chips with much functionality, the functional verification field has advanced to provide modern verification techniques. In this paper, we present the verification of a Wishbone-compliant Serial Peripheral Interface (SPI) Master core using a SystemVerilog-based standard verification methodology, the Universal Verification Methodology (UVM). The UVM factory pattern with parameterized classes is used in order to develop a robust and reusable verification IP. SPI is a full-duplex communication protocol used to interface components, most commonly in embedded systems. We have verified an SPI Master IP core design that is Wishbone compliant and compatible with the SPI protocol and bus, and we report the results of our verification. We used QuestaSim for simulation and waveform analysis, and Cadence Integrated Metrics Center for coverage analysis. We also propose interesting future directions for this work in developing reliable systems.
GOTOpia 2020: "The Past, Present, and Future of Cloud Native API Gateways" (Daniel Bryant)
Many engineers are confused about how a cloud-native API gateway relates to Kubernetes Ingress or a Service load balancer. This talk will unravel this confusion.
An API gateway is at the core of how APIs are managed, secured and presented within any web-based system. Although the technology has been in use for many years, it has not always kept pace with recent developments within the cloud-native space.
Join expert Daniel Bryant in uncovering the evolution of API gateways over the past ten years and how the original problems they solved have shifted in relation to cloud-native technologies and workflows.
It also covers current challenges of using an API gateway within Kubernetes: scaling the developer workflow and supporting multiple architecture styles and protocols.
In this talk, you'll learn:
How the evolution of API gateways looks
Strategies for exposing Kubernetes services and APIs at the edge of your system
A brief guide to the (potential) future of cloud-native API gateways
Implementing AI: Running AI at the Edge: ClickCV – Providing high-performance... (KTN)
The Implementing AI: Running AI at the Edge webinar, hosted by KTN and eFutures, is the second event of the Implementing AI webinar series.
To make products more intelligent, more responsive and to reduce the data generated, it is advantageous to run AI on the product itself, as opposed to in the cloud.
The focus of this webinar was the opportunities and challenges of moving the AI processing to “the Edge”. The webinar had four presentations from experts covering overviews of the opportunity, implementation techniques and case studies.
Find out more: https://ktn-uk.co.uk/news/just-launched-implementing-ai-webinar-series
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD) (Saimunur Rahman)
This presentation was prepared for the ViPr Reading Group at Multimedia University, Cyberjaya. The goal of this presentation was to make the lab members aware of recent advancements in action recognition.
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, such as the scarcity of video resources, limitations in hand pose estimation, or 3D spatial grounding from poses.
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence of RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
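For reference, the scaled dot-product attention at the heart of the transformer can be written as below. This is the standard formulation from Vaswani et al. (2017), added here for context rather than taken from the slides; self-attention builds Q, K and V from the same token sequence, while cross-attention takes Q from the output side and K, V from the input side.

\[
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]

where Q, K and V are the query, key and value matrices and d_k is the key dimension.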
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, such as the scarcity of video resources, limitations in hand pose estimation, or 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io) recorded at CMU in collaboration with UPC, BSC, Gallaudet University and Facebook.
https://imatge.upc.edu/web/publications/sign-language-translation-and-production-multimedia-and-multimodal-challenges-all
https://imatge-upc.github.io/synthref/
Integrating computer vision with natural language processing has achieved significant progress over the last years owing to the continuous evolution of deep learning. A novel vision and language task, which is tackled in the present Master thesis, is referring video object segmentation, in which a language query defines which instance to segment from a video sequence. One of the biggest challenges for this task is the lack of relatively large annotated datasets, since a tremendous amount of time and human effort is required for annotation. Moreover, existing datasets suffer from poor-quality annotations, in the sense that approximately one out of ten language expressions fails to uniquely describe the target object.

The purpose of the present Master thesis is to address these challenges by proposing a novel method for generating synthetic referring expressions for an image (video frame). This method produces synthetic referring expressions by using only the ground-truth annotations of the objects as well as their attributes, which are detected by a state-of-the-art object detection deep neural network. One of the advantages of the proposed method is that its formulation allows its application to any object detection or segmentation dataset.

By using the proposed method, the first large-scale dataset with synthetic referring expressions for video object segmentation is created, based on an existing large benchmark dataset for video instance segmentation. A statistical analysis and comparison of the created synthetic dataset with existing ones is also provided in the present Master thesis.

The conducted experiments on three different datasets used for referring video object segmentation prove the efficiency of the generated synthetic data. More specifically, the obtained results demonstrate that pre-training a deep neural network with the proposed synthetic dataset improves its ability to generalize across different datasets, without any additional annotation cost.
Master MATT thesis defense by Juan José Nieto
Advised by Víctor Campos and Xavier Giro-i-Nieto.
27th May 2021.
Pre-training Reinforcement Learning (RL) agents in a task-agnostic manner has shown promising results. However, previous works still struggle to learn and discover meaningful skills in high-dimensional state-spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent representation by making use of variational or contrastive techniques. We demonstrate that both allow learning a set of basic navigation skills by maximizing an information theoretic objective. We assess our method in Minecraft 3D maps with different complexities. Our results show that representations and conditioned policies learned from pixels are enough for toy examples, but do not scale to realistic and complex maps. We also explore alternative rewards and input observations to overcome these limitations.
https://imatge.upc.edu/web/publications/discovery-and-learning-navigation-goals-pixels-minecraft
Peter Muschick MSc thesis
Universitat Politècnica de Catalunya, 2020
Sign language recognition and translation has been an active research field in recent years, with most approaches using deep neural networks to extract information from sign language data. This work investigates the mostly disregarded approach of using human keypoint estimation from image and video data with OpenPose in combination with a transformer network architecture. Firstly, it was shown that it is possible to recognize individual signs (4.5% word error rate (WER)). Continuous sign language recognition, though, was more error-prone (77.3% WER), and sign language translation was not possible using the proposed methods, which might be due to the low accuracy of the human keypoint estimation by OpenPose and the accompanying loss of information, or to insufficient capacity of the transformer model used. Results may improve with datasets containing higher repetition rates of individual signs or by focusing more precisely on keypoint extraction of the hands.
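For reference, the word error rate quoted above is the standard edit-distance-based measure (a general definition, not specific to this thesis):

\[
\mathrm{WER} = \frac{S + D + I}{N}
\]

where S, D and I are the numbers of substitutions, deletions and insertions needed to turn the recognized sequence into the reference, and N is the number of words in the reference.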
https://github.com/telecombcn-dl/lectures-all/
These slides review techniques for interpreting the behavior of deep neural networks. The talk reviews basic techniques such as the display of filters and tensors, as well as more advanced ones that try to interpret which part of the input data is responsible for the predictions, or generate data that maximizes the activation of certain neurons.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/dlai-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g., robotics, autonomous driving) or decision making (e.g., resource optimization in wireless communication networks). It also advances the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
Giro-i-Nieto, X. One Perceptron to Rule Them All: Language, Vision, Audio and Speech. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7-8).
Tutorial page:
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
Deep neural networks have boosted the convergence of multimedia data analytics into a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will first review the basic neural architectures to encode and decode vision, text and audio, and later review those models that have successfully translated information across modalities.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and presents the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020, held online and organized by the University of Gdansk (Poland) between 30 August and 2 September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from curriculum learning strategies for better and faster training of deep neural networks. This work explores different scheduled sampling and frame-skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse scheduled sampling is a better option than a classic forward one, and that progressively skipping frames during training is beneficial, but only when training with the ground truth masks instead of the predicted ones.
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. In recent years, multiple solutions have been proposed to alleviate this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
More from Universitat Politècnica de Catalunya (20)
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already got working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Monitoring Java Application Security with JDK Tools and JFR Events
LIvRE: A Video Extension to the LIRE Content-Based Image Retrieval System
1. LIVRE: A VIDEO EXTENSION TO THE LIRE CONTENT-BASED IMAGE RETRIEVAL SYSTEM
Final Degree Project Dissertation
Telecommunications Engineering
Gabriel de Oliveira
Supervisors:
Assoc. Prof. Mathias Lux
Assoc. Prof. Xavier Giró
2. Outline of the Thesis
1. Introduction
i. Motivation
ii. Overview and previous work
2. Proposed solution: The LIvRE system
i. Parsing
ii. Indexing
iii. Retrieval
3. Validation
i. Dataset
- Stanford I2V Newscasts dataset
ii. Experiments
- Quantitative evaluation
- Qualitative evaluation - The thinking-aloud test.
4. Conclusions and Further Work
March – October 2015
Slide 2
3. Motivation
Goal: To develop an all-in-one open source system for CBVR.
• Server side requirements:
• Fast
• Scalable
• Flexible
• Automated
• User interface requirements:
• Fast
• OS and device independent
• Mobile
Slide 3
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
4. Overview and previous work
Slide 6
• Open source CBIR Library in Java
• Apache Lucene core
• Solr plugin
• Supports parsing, indexing and retrieval
• Global and local descriptors
• Web-based interface
[1] Mathias Lux. LIRE: Open source image retrieval in Java. In Proceedings of the 21st ACM International Conference on Multimedia, pages 843-846. ACM, 2013.
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
5. Developed solution: The LIvRE CBVR system
Slide 7
CBVR system - concept and requirements.
database
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
6. Developed solution: The LIvRE CBVR system
Slide 7
User side / Server side
LIvRE CBVR system architecture.
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
7. Block 1: Parsing
System Architecture - Parsing
Slide 8
1. Find videos in any given folder structure.
2. Extract keyframes from those videos.
3. Parse extracted keyframes with selected image descriptors.
- Color Layout, Edge Histogram, JCD and PHOG
4. Generate XML Documents with the Feature Vectors.
Tools are provided to:
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
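To illustrate what the parsing block above does, here is a minimal Java sketch that walks a folder tree, extracts keyframes at 1 fps by invoking ffmpeg, and leaves a placeholder where the LIRE descriptors (Color Layout, Edge Histogram, JCD, PHOG) would be computed and written out as XML documents. The paths, the keyframe rate and the descriptor hook are illustrative assumptions, not the actual LIvRE code; ffmpeg is assumed to be available on the PATH.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the parsing block: find videos, extract keyframes at 1 fps with ffmpeg,
// then (placeholder) compute the image descriptors and serialize them as XML.
public class ParseVideos {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path videoRoot = Paths.get("dataset/videos");
        Path keyframeRoot = Paths.get("dataset/keyframes");

        List<Path> videos;
        try (Stream<Path> files = Files.walk(videoRoot)) {
            videos = files.filter(p -> p.toString().endsWith(".mp4")).collect(Collectors.toList());
        }

        for (Path video : videos) {
            Path outDir = keyframeRoot.resolve(video.getFileName() + "_frames");
            Files.createDirectories(outDir);
            // ffmpeg -i <video> -vf fps=1 <outDir>/frame_%06d.jpg  -> one keyframe per second
            new ProcessBuilder("ffmpeg", "-i", video.toString(), "-vf", "fps=1",
                    outDir.resolve("frame_%06d.jpg").toString())
                    .inheritIO().start().waitFor();
            // TODO: run the chosen LIRE global descriptors over each extracted keyframe
            // and write the resulting feature vectors into an XML document for indexing.
        }
    }
}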
8. Block 2: Indexing
Fig. System Architecture
Slide 8
1. Find XML Documents containing the Feature Vectors
(generated from Parsing Block).
2. Upload XML documents to Solr.
3. Commit changes in Solr core.
Tools are provided to:
Fig. System Architecture - Indexing
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
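A minimal sketch of the indexing block above, assuming a local Solr core named "lire" with the LireSolr plugin installed: each XML feature document produced by the parsing stage is POSTed to Solr's update handler, and a commit makes the new documents searchable. The core name and file locations are illustrative.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the indexing block: upload the XML feature documents to Solr, then commit.
public class IndexFeatures {
    static final String UPDATE_URL = "http://localhost:8983/solr/lire/update";

    public static void main(String[] args) throws Exception {
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(Paths.get("dataset/features"), "*.xml")) {
            for (Path xml : docs) {
                post(UPDATE_URL, Files.readAllBytes(xml));                  // upload one document
            }
        }
        post(UPDATE_URL, "<commit/>".getBytes(StandardCharsets.UTF_8));     // commit the changes
    }

    static void post(String url, byte[] body) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) { out.write(body); }
        if (conn.getResponseCode() != 200) {
            throw new IllegalStateException("Solr returned HTTP " + conn.getResponseCode());
        }
        conn.disconnect();
    }
}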
9. Block 3: Retrieval
System Architecture - Retrieval
Slide 8
1. Image search field.
2. Settings.
User web-based interface input:
Web-based user interface input as
displayed on small screen devices.
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
10. Block 3: Retrieval
System Architecture - Retrieval
Slide 8
1. Image search field.
2. Settings.
User web-based interface input:
Web-based user interface input as
displayed on small screen devices.
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
11. Block 3: Retrieval
Slide 8
1. Candidate videos displayed using HTML5.
2. Thumbnails with other similar frames.
3. Time refinement.
4. Video information.
User web-based results presentation:
System Architecture - Retrieval / Retrieval results presentation for small screen devices
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
12. Block 3: Retrieval
Slide 8
1. Candidate videos displayed using HTML5.
2. Thumbnails with other similar frames.
3. Time refinement and ranking.
4. Video information.
User web-based results presentation:
Fig. System Architecture - Retrieval / Fig. Retrieval results presentation for small screen devices
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
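Behind the web interface shown in the retrieval slides, retrieval boils down to a query-by-example request against the LireSolr request handler. The sketch below issues such a request over plain HTTP; the handler path (/lireq) and the parameter names (field, url, rows) follow the LireSolr plugin documentation as commonly described, so they should be checked against the installed plugin version, and the query image URL is purely illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of a query-by-example request: ask LireSolr for the keyframes most similar
// to a query image, using the Color Layout hashes field as an example.
public class QueryByExample {
    public static void main(String[] args) throws Exception {
        String queryImage = "http://example.org/query.jpg";   // illustrative query image
        String request = "http://localhost:8983/solr/lire/lireq"
                + "?field=cl_ha"                               // example feature field
                + "&rows=10"
                + "&url=" + URLEncoder.encode(queryImage, StandardCharsets.UTF_8.name());

        HttpURLConnection conn = (HttpURLConnection) new URL(request).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            // The response lists matching keyframe documents; the web interface groups them
            // by video and maps keyframe numbers back to timestamps (1 keyframe per second).
            in.lines().forEach(System.out::println);
        }
    }
}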
13. LIvRE CBVR system demo
Introduction · Overview · LIvRE CBVR system · Dataset · Experiments · Conclusions
Slide 9
14. Validation
Stanford I2V Dataset
Freely available data set.
Large (~1TB Video)
• 23,443 video clips
• Average video duration: 2.65 min.
• Keyframes @1fps: 3,808,760
• Video hours: 1,035h
Ground-truth
• 78 queries
Some query images and video frames
from the Stanford I2V dataset.
Slide 14
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
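A quick consistency check of the dataset figures above (my own arithmetic, not from the slides): the number of clips times the average duration matches the reported total hours, which at one keyframe per second is in line with the roughly 3.8M extracted keyframes.

\[
23{,}443 \times 2.65\ \text{min} \approx 62{,}000\ \text{min} \approx 1{,}035\ \text{h},
\qquad
1{,}035\ \text{h} \times 3{,}600\ \tfrac{\text{s}}{\text{h}} \approx 3.7\ \text{M keyframes at 1 fps}.
\]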
15. Validation
Experiments
LIvRE CBVR system tested with 2
different evaluation methods:
Slide 16
1
2
Quantitative evaluation
Qualitative evaluation
(Thinking-aloud Test)
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
16. Quantitative study:
• Use ground-truth provided with the dataset for:
• Scene Retrieval evaluation (finding the right video).
• Time Refinement evaluation (finding the right moment of time at the right video).
Qualitative study:
• Web-based user interface.
• Thinking-aloud Test (offline).
• Participants are expert and non-expert users.
• 4 Non-expert users.
• 2 Expert users.
Slide 17
1
2
Validation
Experiments
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
18. Quantitative study
Slide 18
1st Stage: Scene Retrieval
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
19. Quantitative study
Slide 18
2nd Stage: Temporal Refinement
Temporal Refinement results for 100k candidates
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
20. Qualitative study
Thinking-aloud Test
• Volunteer participants perform specific tasks with the web-based
user interface.
• LIvRE CBVR system is running locally (offline) on the machine.
• Participants speak their thoughts out loud.
• Sessions are recorded and evaluated.
Slide 18
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
21. Qualitative study
Thinking-aloud Test
Slide 19
Sample input query frames / Screenshots from Thinking-aloud test 1
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
Timing results for 50K candidates (in milliseconds)
22. Conclusions and Future Work
- A new CBVR system, LIvRE, was developed as an extension of LIRE.
- LIvRE is now a branch of the LIRE Solr project.
Slide 28
Future work:
• Local image descriptors.
• Integration of sound descriptors.
• Simplified set-up and deployment.
• Demo paper at ICMR 2016.
• Add Video annotation tool.
• Integration with computer vision / deep learning projects.
Introduction · Overview · LIvRE CBVR system · Validation · Conclusions
23. Thank you for your attention
Do you have any question?
8 October 2015
LIvRE: A Video Extension to the LIRE
Content-Based Image Retrieval System.
Gabriel de Oliveira
Editor's Notes
Intro to what is a CBVR system.
http://es.slideshare.net/dermotte/lire-27544341?related=2
On this slide: Explain LIRE, the Lucene core, the LireSolr plugin, the descriptors' flexibility, and the web-based interface for image retrieval.
In this slide: Explain the concepts and requirements for a CBVR system
3 blocks: Parsing, Indexing and Retrieval (Querying). Server side: from the video dataset to the search engine. User side: the interface.
Given a video dataset, the first block, Parsing, performs the first step by taking this video dataset as input and outputting documents containing all the image features from the keyframes of each one of the videos.
Given a running, set-up deployment of the Apache Solr search engine with the LireSolr plugin installed and configured, as well as the XML documents obtained during the parsing stage containing the image features of the keyframes, the user is given a tool to perform the following actions automatically:
Given a previously indexed video dataset on a deployment of the Apache Solr search engine, with the LireSolr plugin installed and configured, the user must be given a web-based interface to perform the following actions:
In addition, the web-based interface should be independent of the device, OS, and web browser. It should also be scalable and modular so as to be usable on any screen size.
k is the rank in the sequence of retrieved documents, n is the number of retrieved documents, P(k) is the precision at cut-off k in the list, and r(k) is the change in recall from items k-1 to k. Although Average Precision assesses the quality of the returned ranked list of results and is useful in applications where a list of potential results is shown to the user, we also measure Precision at 1 (P@1), since it is important in cases where the best result is directly returned to the user (for example, in the case where the system would start playing the best clip match without further interaction with the user).
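With the symbols defined above, Average Precision over the ranked list can be written as (standard definition, added for reference):

\[
\mathrm{AP} = \sum_{k=1}^{n} P(k)\,\Delta r(k)
\]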
The Jaccard index is computed as the ratio between the intersection of the retrieved and ground-truth sequences and their union.
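In formula form (standard definition, where R and G denote the retrieved and ground-truth sequences):

\[
J(R, G) = \frac{|R \cap G|}{|R \cup G|}
\]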