Comparing Incremental Learning Strategies for Convolutional Neural Networks
Vincenzo Lomonaco
In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform remarkably well in many computer vision tasks such as object recognition and object detection, thanks to their ability to extract meaningful, high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data arrive in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Deep Learning for Computer Vision: A comparison between Convolutional Neural...
Vincenzo Lomonaco
This document describes a study comparing Convolutional Neural Networks (CNNs) and Hierarchical Temporal Memories (HTMs) on object recognition tasks. The study implements a CNN using Theano, creates a new benchmark of image sequences from the NORB dataset, and evaluates the performance of CNNs and HTMs on the original NORB dataset and new image sequences. The results show that while CNNs achieve higher accuracy on the original NORB data, HTMs are more competitive on the image sequences and can achieve comparable performance using less training data. The study suggests that bio-inspired approaches like HTM can help advance deep learning research.
Naver: Learning to Rank Question-Answer Pairs Using HRDE-LTC
NAVER Engineering
The automatic question answering (QA) task has long been considered a primary objective of artificial intelligence.
Among the QA sub-systems, we focused on the answer-ranking component. In particular, we investigated a novel neural network architecture with an additional data clustering module to improve performance when ranking answer candidates that are longer than a single sentence. This work can be used not only for the QA ranking task, but also to evaluate the relevance of the next utterance given a dialogue generated by a dialogue model.
In this talk, I'll present our research results (NAACL 2018) as well as potential use cases (e.g., fake news detection). Finally, I'll conclude by discussing some issues with previous research and introducing recent approaches in academia.
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task
multimediaeval
In this paper we describe TUM's approach for the MediaEval “Emotion in Music” task. The goal of this task is to automatically estimate the emotions expressed by music (in terms of Arousal and Valence) in a time-continuous fashion. Our system consists of Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for dynamic Arousal and Valence regression. We used two different sets of acoustic and psychoacoustic features that have previously proven effective for emotion prediction in music and speech. The best model yielded an average Pearson's correlation coefficient of 0.354 (Arousal) and 0.198 (Valence), and an average Root Mean Squared Error of 0.102 (Arousal) and 0.079 (Valence).
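For reference, the two evaluation metrics reported above can be computed as follows (a minimal sketch; the function names are ours, not the authors'):

```python
import numpy as np

def pearson_r(pred, target):
    """Pearson correlation coefficient between two 1-D series."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    pc = pred - pred.mean()
    tc = target - target.mean()
    return float((pc * tc).sum() / np.sqrt((pc ** 2).sum() * (tc ** 2).sum()))

def rmse(pred, target):
    """Root Mean Squared Error."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Toy check: a linear transform of the target is perfectly correlated with it.
y_true = np.array([0.1, 0.3, 0.2, 0.5])
y_pred = 2 * y_true + 0.05
print(pearson_r(y_pred, y_true))  # 1.0 (up to floating point)
print(rmse(y_true, y_true))       # 0.0
```

Note that Pearson's r rewards tracking the shape of the annotation curve, while RMSE penalizes absolute deviation, which is why both are reported per dimension.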
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_7.pdf
This document summarizes a research paper that analyzes deep networks using kernel methods. It hypothesizes that (1) representations in higher layers of deep networks are simpler and more accurate, and (2) the network architecture controls how quickly representations are formed. The researchers used kernel principal component analysis to measure representation simplicity and accuracy at each layer of deep networks trained on MNIST and CIFAR. Their experiments found support for both hypotheses and that convolutional and pretrained networks form representations more systematically than standard multilayer perceptrons.
○ Overview
Many researchers currently obtain networks with high recognition rates by designing deeper and wider networks. As network size grows, the number of parameters and computations increases, and pruning-based compression algorithms have been proposed to address this problem. However, because such methods cannot change the network architecture itself, they cannot overcome the limitations that stem from the architecture.
Network recasting is a method that changes the network architecture itself in order to overcome architecture-induced limitations. With network recasting, the blocks that make up a network can be converted into blocks of a different type. Each block can be converted using block-wise recasting, and by applying this method sequentially, the structure of the entire network can be changed. Sequential recasting better preserves inference accuracy and also alleviates the vanishing gradient problem regardless of network architecture. Applying network recasting within the same network architecture reduces parameters and computation, while converting to a different kind of network architecture can accelerate the network. In the latter case, because the network architecture itself is changed, speedups beyond the original architectural limits can be obtained.
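As a rough illustration of block-wise recasting, a new block can be trained to reproduce a trained block's output activations on the same inputs. The sketch below uses two linear maps purely for concreteness; all names, shapes, and the training loop are our assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "teacher" block: a fixed linear map standing in for a trained block.
W_teacher = rng.normal(size=(8, 8))
teacher = lambda x: x @ W_teacher

# Block-wise recasting: train a new "student" block (here another linear map,
# but in general a block of a different type) to match the teacher block's
# output activations on the same inputs.
W_student = np.zeros((8, 8))
X = rng.normal(size=(256, 8))            # activations entering the block
lr = 0.1
for _ in range(1000):
    err = X @ W_student - teacher(X)     # mismatch with the teacher's output
    W_student -= lr * (X.T @ err) / len(X)

print(np.abs(X @ W_student - teacher(X)).max() < 1e-6)  # True
```

Repeating this block by block, in sequence, is what lets the whole architecture change while each step only has to match local activations.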
Learning Convolutional Neural Networks for Graphs
Mathias Niepert
This document discusses a method called Patchy for applying convolutional neural networks to graph-structured data. Patchy selects node sequences from graphs using centrality measures and assembles neighborhoods around the nodes. The neighborhoods are normalized and used as receptive fields for a convolutional architecture. Experiments on graph classification benchmarks show Patchy can outperform graph kernels in terms of efficiency and effectiveness while also supporting visualization of learned edge filters. Potential limitations include increased risk of overfitting on small datasets compared to graph kernels.
AI&BigData Lab 2016. Alexander Baev: Transfer learning - why, how, and where.
GeeksLab Odessa
4.6.16 AI&BigData Lab
Upcoming events: goo.gl/I2gJ4H
We will talk about one of the basic practical techniques for training neural networks: pre-training, fine-tuning, and transfer learning. In which cases to apply them, which models to use, where to get them, and how to adapt them.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
This document summarizes a technical seminar on using convolutional neural networks for P300 detection in brain-computer interfaces. The seminar covers an introduction to brain-computer interfaces and the P300 signal, describes existing P300 detection systems and the convolutional neural network approach, and presents the network architecture, learning process, evaluation results on two datasets showing improved detection rates over other methods, and conclusions. The seminar demonstrates that the convolutional neural network approach outperforms existing methods for P300 detection, especially with a limited number of electrodes or training epochs.
Convolutional neural networks (CNNs) are better suited than traditional neural networks for processing image data due to properties of images. CNNs apply filters with local receptive fields and shared weights across the input, allowing them to detect features regardless of position. A CNN architecture consists of convolutional layers that apply filters, and pooling layers for downsampling. This reduces parameters and allows the network to learn representations of the input with minimal feature engineering.
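The two layer types described above can be sketched directly: a convolution slides one shared set of weights over the whole input, and pooling downsamples the resulting feature map. This is a toy single-channel illustration with names of our choosing:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution with a single shared kernel (weight sharing:
    the same local receptive field is applied at every position)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling for downsampling the feature map."""
    h, w = fmap.shape
    fmap = fmap[:h - h % size, :w - w % size]
    return fmap.reshape(h // size, size, w // size, size).max(axis=(1, 3))

edge_filter = np.array([[1., 0., -1.]] * 3)   # same weights at every position
image = np.random.default_rng(0).random((8, 8))
features = max_pool(conv2d(image, edge_filter))
print(features.shape)  # (3, 3): 8x8 -> 6x6 after conv -> 3x3 after pooling
```

Because the 3x3 filter has only 9 parameters no matter how large the image is, this is the parameter reduction the paragraph refers to.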
1) iMTFA is an incremental approach to few-shot instance segmentation that allows adding new classes without retraining.
2) It extends the MTFA baseline by training an instance feature extractor to generate discriminative embeddings for each instance, with the average embedding used as the class representative.
3) At inference, it predicts classes based on the cosine distance between ROI embeddings and stored class representatives, using class-agnostic box regression and mask prediction.
4) Experiments on COCO, VOC2007 and VOC2012 show iMTFA outperforms SOTA few-shot object detection and instance segmentation methods while enabling incremental class addition.
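The cosine-similarity classification and incremental class addition described in points 2 and 3 can be sketched in a few lines. The embeddings, dimensions, and function names below are illustrative assumptions, not iMTFA's actual code:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical stored class representatives: the mean embedding per class.
class_reps = l2_normalize(np.array([[1.0, 0.1, 0.0],
                                    [0.0, 1.0, 0.2]]))

def classify(roi_embedding):
    """Pick the class whose representative is closest in cosine distance
    (i.e. has the highest cosine similarity) to the ROI embedding."""
    e = l2_normalize(np.asarray(roi_embedding, float))
    return int(np.argmax(class_reps @ e))

def add_class(new_instance_embeddings):
    """Incremental class addition: average the new class's instance
    embeddings into one representative -- no retraining needed."""
    global class_reps
    rep = l2_normalize(np.mean(new_instance_embeddings, axis=0))
    class_reps = np.vstack([class_reps, rep])

print(classify([0.9, 0.2, 0.0]))  # 0
add_class(np.array([[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]))
print(classify([0.0, 0.0, 1.0]))  # 2: the newly added class
```

Since adding a class only appends a row of averaged embeddings, the detector's learned weights never change, which is what makes the approach incremental.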
This document proposes a method called Factor Transfer for compressing complex networks via knowledge transfer from a teacher network to a student network. It introduces paraphrasing and translating modules to extract factors from the teacher and student networks and minimize their difference, unlike existing methods that directly compare outputs. Experiments on image classification datasets CIFAR-10, CIFAR-100 and ImageNet, as well as object detection, show the proposed method helps increase student network accuracy compared to directly transferring knowledge or attention from the teacher.
This document summarizes research using neuroevolution techniques like HyperNEAT to train deep learning networks on image classification tasks. It describes using HyperNEAT both to directly train networks to classify MNIST handwritten digits, and to act as a feature extractor by evolving the first layers of a network and then training subsequent layers with backpropagation. The experiments compare different HyperNEAT architectures - traditional ANNs versus convolutional networks - and evaluate their performance on classifying MNIST test images both with and without the additional backpropagation training of later layers.
Focal Loss for Dense Object Detection proposes a novel focal loss function to address the extreme foreground-background class imbalance encountered in training dense object detectors. The focal loss focuses training on hard examples and prevents easy negatives from overwhelming the detector. RetinaNet, a simple dense detector designed with a ResNet-FPN backbone and focal loss, achieves state-of-the-art accuracy while running faster than existing two-stage detectors. Extensive experiments demonstrate the focal loss enables training highly accurate dense detectors on datasets with vast numbers of background examples like COCO.
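The focal loss itself is compact. A minimal sketch of the binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with the paper's best-performing defaults alpha = 0.25 and gamma = 2 (the function name is ours):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for binary labels y in {0, 1} and predicted foreground
    probabilities p. The (1 - p_t)^gamma factor down-weights examples the
    model already classifies confidently."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    p_t = np.where(y == 1, p, 1 - p)            # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# A confidently-correct background example (p_t = 0.99) contributes almost
# nothing, while a misclassified foreground example still incurs a large loss.
easy = focal_loss(np.array([0.01]), np.array([0]))  # confident background
hard = focal_loss(np.array([0.01]), np.array([1]))  # missed foreground
print(float(easy[0]) < 1e-4, float(hard[0]) > 1.0)  # True True
```

With gamma = 0 this reduces to alpha-weighted cross-entropy; increasing gamma is what keeps the vast number of easy negatives from dominating the gradient.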
Modern Convolutional Neural Network techniques for image segmentation
Gioele Ciaparrone
Recently, Convolutional Neural Networks have been successfully applied to image segmentation tasks. Here we present some of the most recent techniques that have increased accuracy on such tasks. First we describe the Inception architecture and its evolution, which made it possible to increase the width and depth of the network without increasing the computational burden. We then show how to adapt classification networks into fully convolutional networks, able to perform pixel-wise classification for segmentation tasks. We finally introduce the hypercolumn technique to further improve the state of the art on various fine-grained localization tasks.
Offline Character Recognition Using Monte Carlo Method and Neural Network
ijaia
Human-machine interfaces are constantly improving thanks to the increasing development of computer tools. Handwritten character recognition has various significant applications such as form scanning, verification, validation, and cheque reading. Because of the importance of these applications, intensive research is ongoing in the field of off-line handwritten character recognition. The challenge in recognizing handwriting lies in human nature: each person has a unique style in terms of font, contours, etc. This paper presents a novel approach to identifying offline characters, which we call the character divider approach, applicable after the pre-processing stage. We also devise an innovative approach for feature extraction known as vector contour, and discuss the pros and cons, including limitations, of our approach.
A comprehensive tutorial on Convolutional Neural Networks (CNNs) which covers the motivation behind CNNs and deep learning in general, followed by a description of the various components involved in a typical CNN layer. It explains the theory behind the different variants used in practice and also gives a big picture of the whole network by putting everything together.
Next, there's a discussion of the various state-of-the-art frameworks being used to implement CNNs to tackle real-world classification and regression problems.
Finally, the implementation of CNNs is demonstrated by implementing the paper 'Age and Gender Classification Using Convolutional Neural Networks' by Levi and Hassner (2015).
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Part B: ML in/for Wireless LANs.
Contents: Basics of ML, Applications of ML for Wireless Networks, and Techniques to train ML models in Wireless Networks.
The document discusses relational knowledge distillation (RKD), a technique for transferring knowledge from a teacher model to a student model. It begins by providing background on knowledge distillation and recent approaches. It then introduces RKD, which transfers relational information between examples in the teacher's embedding space, such as distances and angles, rather than just individual example outputs. The document describes experiments applying RKD to metric learning, image classification, and few-shot learning, finding it improves student model performance over other distillation methods. It concludes that RKD effectively leverages relational information to transfer knowledge between models.
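The distance-wise variant of RKD can be sketched as follows. This is a simplified illustration: the paper applies a Huber loss to mean-normalized pairwise distances, while this sketch uses plain squared error, and all names are ours:

```python
import numpy as np

def pairwise_dists(emb):
    """Euclidean distances between all pairs of embeddings (upper triangle)."""
    diff = emb[:, None, :] - emb[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(emb), k=1)
    return d[iu]

def rkd_distance_loss(teacher_emb, student_emb):
    """Distance-wise RKD: penalize mismatch between the two models'
    mean-normalized pairwise-distance structures, so the student copies
    the *relations* between examples rather than the raw outputs."""
    dt = pairwise_dists(teacher_emb); dt /= dt.mean()  # scale-invariant
    ds = pairwise_dists(student_emb); ds /= ds.mean()
    return float(np.mean((dt - ds) ** 2))

t = np.random.default_rng(0).normal(size=(6, 16))
print(rkd_distance_loss(t, t))               # 0.0: identical relational structure
print(rkd_distance_loss(t, 3.0 * t) < 1e-9)  # True: invariant to uniform scaling
```

The scale invariance is the point: a student with a smaller embedding space can still match the teacher's relational geometry exactly.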
HardNet: Convolutional Network for Local Image Description
Dmytro Mishkin
We introduce a novel loss for learning local feature descriptors, inspired by Lowe's matching criterion for SIFT. We show that the proposed loss, which maximizes the distance between the closest positive and closest negative patch in the batch, is better than complex regularization methods; it works well for both shallow and deep convolutional network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor with the same dimensionality as SIFT (128) that shows state-of-the-art performance in wide-baseline stereo, patch verification and instance retrieval benchmarks. It is fast: computing a descriptor takes about 1 millisecond on a low-end GPU.
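A minimal sketch of the described hardest-in-batch mining: for each matching pair of unit-normalized descriptors, find the closest non-matching descriptor in the batch and require it to be at least a margin further away than the positive. Names and the toy inputs are ours; the actual HardNet implementation differs in details:

```python
import numpy as np

def hardnet_loss(anchors, positives, margin=1.0):
    """Triplet margin loss with hardest-in-batch negative mining: for each
    matching pair (a_i, p_i), mine the closest non-matching descriptor in
    the batch and push it at least `margin` further away than the positive."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    d = np.sqrt(np.maximum(2 - 2 * a @ p.T, 0))   # L2 dists of unit vectors
    n = len(a)
    pos = np.diag(d)                     # distances of the matching pairs
    off = d + np.eye(n) * 1e6            # mask out the matching pairs
    hardest_neg = np.minimum(off.min(axis=1), off.min(axis=0))
    return float(np.mean(np.maximum(margin + pos - hardest_neg, 0)))

# Orthogonal descriptors with perfect matches: positives at distance 0,
# nearest negatives at sqrt(2), so the margin is already satisfied.
print(hardnet_loss(np.eye(3), np.eye(3)))  # 0.0
```

Mining only the single hardest negative per pair is what replaces the complex regularizers the abstract mentions.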
InVID verification application presentation at SMVW16
InVID Project
Presentation of the InVID verification application at Social Media Verification Workshop (SMVW16) that was organized by the REVEAL project and took place in Athens, Greece, on September 16th, 2016.
Rolf Fricke from Condat AG, a member of the InVID consortium, presented the InVID applications and standards for UGC verification at the Video Day of the IPTC Autumn Meeting 2016 that was held at the DPA Headquarters in Berlin, on 25 October 2016.
Presentation of the InVID system architecture and multimodal analytics dashboard at Social Media Verification Workshop (SMVW16) that was organized by the REVEAL project and took place in Athens, Greece, on September 16th, 2016.
The InVID project overview presentation at Social Media Verification Workshop (SMVW16) that was organized by the REVEAL project and took place in Athens, Greece, on September 16th, 2016.
NS-CUK Seminar: J.H.Lee, Review on "Task Relation-aware Continual User Repres...
ssuser4b1f48
1) The document proposes a new continual user representation learning method called TERACON that learns from a continuous stream of tasks while retaining knowledge from previous tasks and capturing relationships between tasks.
2) TERACON uses task embeddings to generate relation-aware task-specific masks that maintain learning ability and facilitate capturing task relationships.
3) It prevents "catastrophic forgetting" using a knowledge retention module with pseudo-labeling on past tasks.
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017MLconf
This document discusses deep reinforcement learning and concept network reinforcement learning. It begins with an introduction to reinforcement learning concepts like Markov decision processes and value-based methods. It then describes Concept-Network Reinforcement Learning which decomposes complex tasks into high-level concepts or actions. This allows composing existing solutions to sub-problems without retraining. The document provides examples of using concept networks for lunar lander and robot pick-and-place tasks. It concludes by discussing how concept networks can improve sample efficiency, especially for sparse reward problems.
This document discusses using deep reinforcement learning and deep learning techniques for agent-based models. It discusses using deep learning to approximate policy and value functions, using imitation learning to learn from expert demonstrations, and using Q-learning and model-based reinforcement learning to optimize agent behavior. Micro-emulations use deep learning to model individual agent behavior, while macro-emulations aim to emulate the overall system behavior. Open problems include using reinforcement learning to find optimal policies given an agent-based model simulator.
The document summarizes the Trajectory Transformer model, which frames reinforcement learning as a single sequence modeling problem that can be solved using a Transformer architecture. It describes how the model unifies components like the critic, actor, and dynamics model. The Trajectory Transformer directly models state, action, and reward sequences. It can be used for tasks like imitation learning, goal-reaching, and offline RL by applying techniques like beam search while conditioning on goals or rewards. Experiments show it achieves good performance on imitation learning, goal-reaching, and offline RL benchmarks.
Combinatorial optimization and deep reinforcement learning민재 정
The document discusses using deep learning approaches for solving combinatorial optimization problems like task allocation. It reviews different reinforcement learning methods that have been applied to problems like the vehicle routing problem using pointer networks, transformers, and graph neural networks. Future work opportunities are identified in applying these deep learning techniques to multi-vehicle routing problems and using them to solve specific task allocation scenarios.
Machine learning techniques can be applied in formal verification in several ways:
1) To enhance current formal verification tools by automating tasks like debugging, specification mining, and theorem proving.
2) To enable the development of new formal verification tools by applying machine learning to problems like SAT solving, model checking, and property checking.
3) Specific applications include using machine learning for debugging and root cause identification, learning specifications from runtime traces, aiding theorem proving by selecting heuristics, and tuning SAT solver parameters and selection.
Reinforcement Learning and Artificial Neural NetsPierre de Lacaze
The document provides an overview of reinforcement learning and artificial neural networks. It discusses key concepts in reinforcement learning including Markov decision processes, the Q-learning algorithm, temporal difference learning, and challenges in reinforcement learning like exploration vs exploitation. It also covers basics of artificial neural networks like linear and sigmoid units, backpropagation for training multi-layer networks, and applications of neural networks to problems like image recognition.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Ryo Takahashi
This document proposes an approach for quantizing neural networks to integer values only in order to enable efficient inference on common hardware like CPUs. It involves: (1) Quantizing weights and activations to unsigned 8-bit integers during both training and inference, while keeping biases in 32-bit. (2) Performing "quantization-aware training" where the model is trained with quantization simulated to help handle outlier values. Experiments on MobileNets for ImageNet classification and COCO object detection showed up to 50% faster inference with minimal accuracy loss using this integer-only quantization approach.
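The affine uint8 scheme summarized above can be illustrated with a minimal quantize/dequantize round trip; this is a generic sketch of scale/zero-point integer quantization, not the paper's exact implementation, and all names are illustrative:

```python
import numpy as np

def quantize_uint8(x, xmin, xmax):
    """Affine quantization of real values in [xmin, xmax] to uint8.

    Returns (q, scale, zero_point) so that x ~= scale * (q - zero_point).
    Illustrative sketch only.
    """
    scale = (xmax - xmin) / 255.0
    zero_point = int(round(-xmin / scale))   # uint8 code representing 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate real values."""
    return scale * (q.astype(np.int32) - zero_point)
```

The round-trip error is bounded by half a quantization step (scale / 2), which is what makes 8-bit inference viable when the training already simulates this rounding.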
Slides for the paper titled "Structured pruning of LSTMs via Eigenanalysis and Geometric Median for Mobile Multimedia and Deep Learning Applications", by N. Gkalelis and V. Mezaris, presented at the 22nd IEEE Int. Symposium on Multimedia (ISM), Dec. 2020.
Implementing a neural network potential for exascale molecular dynamicsPFHub PFHub
The document discusses implementing a neural network potential for molecular dynamics simulations using the CabanaMD framework. Key points include:
- A neural network potential was implemented in CabanaMD using Kokkos/Cabana constructs, offering significant on-node scalability and the first GPU implementation of a neural network potential, showing up to 10x speedup over CPU.
- Data layout changes provided an additional 10% performance gain for the GPU implementation.
- For a nickel system, the GPU implementation achieved over 1 million atomsteps per second, vastly outperforming the water system.
- Future work includes exploring hierarchical parallelism and MPI scaling as well as applying machine learning techniques to other computational materials problems like phase
Learning to Learn by Gradient Descent by Gradient DescentKaty Lee
This document discusses learning to learn by training a neural network (LSTM) to be an optimizer that learns update rules rather than using hand-designed update rules. The optimizer takes gradients as input and outputs updates to the parameters of the optimizee network. The optimizer is trained end-to-end using its trajectory optimization objective. Experiments show the learned optimizer can generalize to different network architectures but not different activation functions. The conclusion suggests emailing authors if confused by typos.
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...inside-BigData.com
In this deck from PASC18, Robert Searles from the University of Delaware presents: Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures.
"Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages and programming models among other components in order to migrate large scale applications and explore parallelism on these machines. Although directive-based programming models allow programmers to worry less about programming and more about science, expressing complex parallel patterns in these models can be a daunting task especially when the goal is to match the performance that the hardware platforms can offer. One such pattern is wavefront. This paper extensively studies a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling.
We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP and OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation. Our experimental platform includes SummitDev, an ORNL representative architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an abstract parallelism model that is architecture independent, with a goal of creating software abstractions that can be used by applications employing the wavefront sweep motif."
Watch the video: https://wp.me/p3RLHQ-iPU
Read the Full Paper: https://doi.org/10.1145/3218176.3218228
and
https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Analysis of Educational Robotics activities using a machine learning approachLorenzo Cesaretti
These slides present preliminary results from applying machine learning techniques to the analysis of Educational Robotics activities. An experimentation with 197 secondary school students from Italy was conducted, updating Lego Mindstorms EV3 programming blocks in order to record log files containing the coding sequences designed by the students (within team work) during the resolution of a preliminary Robotics exercise. We utilised four machine learning techniques (logistic regression, support vector machine, K-nearest neighbors and random forests) to predict the students' performance, comparing a supervised approach (using twelve indicators extracted from the log files as input for the algorithms) and a mixed approach (applying a k-means algorithm to calculate the machine learning features). The results highlighted that SVM with the mixed approach outperformed the other techniques, and that three learning styles predominantly emerged from the data mining analysis.
This document describes a new deep learning model called Convolutional eXtreme Gradient Boosting (ConvXGB) that combines a Convolutional Neural Network (CNN) and XGBoost for classification problems. ConvXGB consists of stacked convolutional layers for feature learning, followed by XGBoost in the last layer for class prediction. Experiments on image and general datasets showed ConvXGB achieved slightly better accuracy than CNN and XGBoost alone, and was sometimes significantly better.
Quantifying Overheads in Charm++ and HPX using Task BenchPatrick Diehl
Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core architectures with light-weight threads, asynchronous executions, and smart scheduling. In this paper, we present the comparison of the AMT systems Charm++ and HPX with the mainstream MPI, OpenMP, and MPI+OpenMP libraries using the Task Bench benchmarks. Charm++ is a parallel programming language based on C++, supporting stackless tasks as well as light-weight threads executed asynchronously, along with an adaptive runtime system. HPX is a C++ library for concurrency and parallelism, exposing a C++ standards-conforming API. First, we analyze the commonalities, differences, and advantageous scenarios of Charm++ and HPX in detail. Further, to investigate the potential overheads introduced by the tasking systems of Charm++ and HPX, we utilize an existing parameterized benchmark, Task Bench, wherein 15 different programming systems were implemented, e.g., MPI, OpenMP, MPI + OpenMP, and extend Task Bench by adding HPX implementations. We quantify the overheads of Charm++, HPX, and the mainstream libraries in different scenarios where a single task and multiple tasks are assigned to each core, respectively. We also investigate each system's scalability and the ability to hide the communication latency.
Transfer Learning for Improving Model Predictions in Robotic SystemsPooyan Jamshidi
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
Optimization as a model for few shot learningKaty Lee
paper presentation of "Optimization as a model for few shot learning" at ICLR 2017 by Sachin Ravi and Hugo Larochelle
highly related to "learning to learn by gradient descent by gradient descent"
Similar to ELLA LC algorithm presentation in ICIP 2016
The document discusses the In Video Veritas project, an EU Horizon 2020 project focused on verification of social media video content for news organizations. It provides an overview of the project, including extracting keyframes from deepfake videos, searching for original videos, finding matching images, side-by-side comparisons, issues with neural network generated videos, and inconsistencies in lighting. The In Video Veritas project is developing techniques to verify social media videos and is funded by the EU Horizon 2020 Programme.
Presentation of the InVID technologies for image forensics analysisInVID Project
Presentation of the challenges related to the detection of manipulation or tampering of digital images, and description of the functionalities offered by the REVEAL Media Verification Assistant that is integrated in the InVID Verification Plugin.
Presentation of the InVID tool for video fragmentation and keyframe-based rev...InVID Project
Presentation of the developed web application that allows the user to fragment a video into visually coherent parts, extract a number of representative keyframes, and apply a keyframe-based reverse search at the video-fragment level.
Presentation of the InVID project and verification technologiesInVID Project
Presentation of the InVID's motivation, goals and overall concept, and brief description of the project's integrated technologies for newsworthy content collection and verification.
Presentation of the European InVID project by Denis Teyssou (Medialab AFP) at the 6th Rencontres InfoCom of the IUT de Toulouse (Université Paul Sabatier), dedicated to fake news.
A state of the art in journalism about fake image and video detectionInVID Project
Presentation given by the Innovation Manager of InVID (Mr. Denis Teyssou from AFP) at the IEEE Workshop on Information Forensics and Security 2017, on the state of the art in journalism about fake image and video detection.
Presentation of the InVID tool for social media verificationInVID Project
Presentation of the InVID tool for social media verification through contextual analysis, at the Media Informatics Lab meeting on detection and verification of socially shared videos.
Presentation of the InVID tool for video fragmentation and reverse keyframe s...InVID Project
Presentation of the InVID web application for video fragmentation and keyframe reverse search, at the Media Informatics Lab meeting on detection and verification of socially shared videos.
Presentation of the InVID Verification PluginInVID Project
Presentation of the verification functionalities of the InVID plugin for fake news video debunking, at the Media Informatics Lab meeting on detection and verification of socially shared videos.
Presentation of the InVID tools for image forensics analysisInVID Project
Presentation of the InVID tools for image forensics analysis, at the Media Informatics Lab meeting on detection and verification of socially shared videos.
Presentation of the InVID project's motivation, goals, overall concept and integrated tools for newsworthy media collection and verification, at the Media Informatics Lab meeting on detection and verification of socially shared videos.
This newsletter provides an update on the progress of the InVID project, which develops tools to help verify social media videos for news organizations. It summarizes the latest technologies developed, including story detection from Twitter streams, video fragmentation and annotation, near-duplicate video detection, logo detection, video context analysis, and video rights management. Prototypes of the Visual Analytics Dashboard, Verification Plugin, Verification Application, and Mobile Application are introduced. Recent dissemination activities promoting InVID and media coverage of the project are also outlined.
The InVID Plug-in: Web Video Verification on the BrowserInVID Project
Presentation of the paper "The InVID Plug-in: Web Video Verification on the Browser" at the 1st Int. Workshop on Multimedia Verification (MuVer) that was hosted at the ACM Multimedia Conference, October 23-27, 2017, Mountain View, CA, USA.
Video Retrieval for Multimedia Verification of Breaking News on Social NetworksInVID Project
This slideset presents an approach to automatically detecting breaking news events from social media streams, using event detection to collect near-real-time relevant video documents from social networks regarding that breaking news. A visual analytics dashboard provides access to the results of the content processing pipeline, offering a rich interactive interface to explore emerging stories and select video material around those stories for verification.
The InVID Verification Plugin at #IFRAExpo #DCXExpoInVID Project
The document discusses the InVID verification plugin, a tool developed through an EU-funded project to help journalists verify social media video content. The plugin allows users to extract keyframes from videos and run reverse image searches on those frames to discover the original source and context of the video. Two examples are given where the plugin helped debunk misinformation: identifying that a video used in a political campaign was not actually shot in France, and discovering that a photo was not actually taken in Gaza as claimed. The document encourages journalists to use the InVID verification plugin to help verify social media videos.
The InVID Verification Plugin at DisinfoLabInVID Project
Presentation of the developed InVID Verification Plugin at the Brussels DisinfoLab event. Demonstration of its use and functionality on recent fake videos published on various social networks.
InVID Project Presentation 3rd release March 2016InVID Project
This is the 3rd release of the InVID overall project presentation. This presentation provides information about:
- the motivation behind the project, which stems from the growing use of User Generated Content (UGC) by media organizations and the need to verify this content before publication
- a set of use cases and examples stressing the need for building the InVID technologies
- the project's objectives and expected outcomes
- the overall InVID concept and approach
- the project consortium and its funding agency
- how to contact us
This MS Word-generated PowerPoint presentation covers the major details of the micronucleus test: its significance and the assays used to conduct it. The test is used to detect micronucleus formation inside the cells of nearly every multicellular organism; micronuclei form during chromosome separation at anaphase.
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...AbdullaAlAsif1
The pygmy halfbeak, Dermogenys colletei, is known for its viviparous nature and presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study delves into the examination of fecundity and the Gonadosomatic Index (GSI) in the pygmy halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that the pygmy halfbeak, D. colletei, may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study contributes to a better understanding of viviparous fish in Borneo and to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
ESPP presentation to EU Waste Water Network, 4th June 2024: “EU policies driving nutrient removal and recycling and the revised UWWTD (Urban Waste Water Treatment Directive)”
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is mitigated, at least in part.
ELLA LC algorithm presentation in ICIP 2016
Slide 1
ONLINE MULTI-TASK LEARNING FOR SEMANTIC CONCEPT DETECTION IN VIDEO
Foteini Markatopoulou 1,2, Vasileios Mezaris 1, and Ioannis Patras 2
1 Information Technologies Institute / Centre for Research and Technology Hellas
2 Queen Mary University of London
Slide 5: Motivation for going beyond the typical solution
• Typical concept detection: train one supervised classifier separately for each concept; a single-task learning (STL) process
• However, concepts do not appear in isolation from each other
[Figure: example keyframes illustrating label relations and task relations, with co-occurring concepts such as sky, sun, sea, trees, human, car, road and outdoors]
Slide 6: Literature review
• Multi-concept learning (MCL): exploit concept relations
  • Stacking-based approaches (Smith et al. 2003), (Markatopoulou et al. 2014)
  • Inner learning approaches (Qi et al. 2007)
• Multi-task learning (MTL): exploit task relations (learn many tasks together)
  • Assuming all tasks are related, e.g., using regularization (Argyriou et al. 2007)
  • Some tasks may be unrelated, e.g., CMTL (Zhou et al. 2011), AMTL (Sun et al. 2015), GO-MTL (Kumar et al. 2012)
  • Online MTL for lifelong learning, e.g., ELLA (Eaton & Ruvolo 2013)
Slide 7: Our approach
• Proposed method: ELLA_LC
  • ELLA_LC stands for Efficient Lifelong Learning Algorithm with Label Constraint
  • It jointly considers task and label relations
• ELLA_LC is based on ELLA (Eaton & Ruvolo 2013)
  • ELLA is the online version of GO-MTL: Learning Task Grouping and Overlap in Multi-Task Learning (Kumar et al. 2012)
Slide 9: Background - The GO-MTL algorithm
• Each task's model is a d-element weight vector expressed as a linear combination of a shared latent basis, w^(t) = L s^(t); each linear combination s^(t) is assumed to be sparse in the latent basis
• Objective function, where the loss function is provided by the base learner (e.g., LSVM, LR):

\min_{L, S} \sum_{t=1}^{T} \left[ \frac{1}{n_t} \sum_{i=1}^{n_t} \mathcal{L}\!\left( f\big(x_i^{(t)}; L s^{(t)}\big),\, y_i^{(t)} \right) + \mu \left\lVert s^{(t)} \right\rVert_1 \right] + \lambda \left\lVert L \right\rVert_F^2

• The l1 terms ensure sparsity of S (the matrix that concatenates the weight vectors s^(t) from all the tasks); the last term regularizes the latent basis L
Slide 10: Background - The GO-MTL algorithm (cont.)
• Iterative (alternating) optimization with respect to L and S:
  • Fix L, update S: solve for the sparse code s^(t) of each task
  • Fix S, update L: update the shared latent basis
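The alternating optimization described above can be illustrated with a toy NumPy version that factors a matrix of task weight vectors as W ≈ L S, alternating a proximal-gradient (ISTA) step for the sparse codes S with a closed-form ridge update for the basis L. The squared-error surrogate, the function names, and the default hyperparameters are assumptions for illustration, not the paper's formulation:

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def gomtl_fit(W, k=2, mu=0.1, lam=0.1, iters=200):
    """Toy alternating minimization in the GO-MTL spirit.

    W: (d, T) matrix whose columns are per-task weight vectors.
    Returns L (d, k) latent basis and S (k, T) sparse codes, W ~= L @ S.
    """
    d, T = W.shape
    rng = np.random.default_rng(0)
    L = rng.standard_normal((d, k))
    S = np.zeros((k, T))
    for _ in range(iters):
        # Fix L, update S: one ISTA step with a step size safe for this L.
        step = 1.0 / (np.linalg.norm(L, 2) ** 2 + mu)
        S = soft_threshold(S - step * (L.T @ (L @ S - W)), step * mu)
        # Fix S, update L: closed-form ridge-regularized least squares.
        L = W @ S.T @ np.linalg.inv(S @ S.T + lam * np.eye(k))
    return L, S
```

Each alternation decreases the surrogate objective, so the factorization settles close to W while the l1 penalty keeps the codes sparse.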
Slide 11: Background - The ELLA algorithm
• ELLA is the online version of GO-MTL (useful in lifelong learning scenarios); it averages the model losses across tasks
• First inefficiency: the objective depends explicitly on all of the previous training data (through the inner summation)
  • Solution: approximate the objective using the second-order Taylor expansion of the per-task loss around the single-task model w^(t)
• Second inefficiency: in order to evaluate a single candidate L, an optimization problem must be solved to recompute the value of each of the s^(t)'s
  • Solution: compute each s^(t) only when training data for task t are available, and do not update it when new tasks arrive
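The second-order approximation mentioned above can be written out explicitly. The formulation below follows Ruvolo & Eaton's ELLA paper; its notation (theta and D) is supplied here as an assumption, since the slide's own equation did not survive the text export:

```latex
\hat{g}^{(t)}(L) \;=\; \min_{s^{(t)}} \left\{ \mu \left\lVert s^{(t)} \right\rVert_1
  \;+\; \left\lVert \theta^{(t)} - L\, s^{(t)} \right\rVert_{D^{(t)}}^{2} \right\},
\qquad \lVert v \rVert_{A}^{2} \equiv v^{\top} A\, v
```

where theta^(t) is the single-task model (the w^(t) of the slides) and D^(t) is one half of the Hessian of the task-t loss evaluated at theta^(t). This removes the explicit dependence on the task's raw training data.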
Slide 12: ELLA_LC objective function
• An extra term is added to ELLA's objective function that considers concept correlations, weighted by the φ-correlation coefficient between the concepts of tasks t and t' (Eq. 1)
• Contributions:
  1. We add a new label-based constraint that considers concept correlations
  2. We solve the objective function of ELLA using quadratic programming instead of solving the Lasso problem
  3. We use linear SVMs as base learners instead of logistic regression
Slide 13: ELLA_LC label constraint
• The constraint uses the correlation between each concept (e.g., sun) and all the other concepts (e.g., sky, sea, outdoors, indoors)
• Positive correlation (e.g., task t: sun and task t1: sky): force the task parameters to be similar, so the linear classifiers return similar scores
• Negative correlation (e.g., task t: sun and task t2: indoors): force the task parameters to be opposite, so the linear classifiers return opposite scores
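For two binary concept-annotation vectors, the φ-correlation coefficient used by the label constraint reduces to the Pearson correlation computed from the 2x2 contingency table. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def phi_coefficient(y1, y2):
    """Phi correlation between two binary label vectors (0/1 per keyframe)."""
    y1, y2 = np.asarray(y1, bool), np.asarray(y2, bool)
    n11 = np.sum(y1 & y2)    # both concepts present
    n10 = np.sum(y1 & ~y2)   # only the first present
    n01 = np.sum(~y1 & y2)   # only the second present
    n00 = np.sum(~y1 & ~y2)  # both absent
    denom = np.sqrt(float((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0
```

Co-occurring concepts (e.g., sun and sky) give φ near +1, mutually exclusive ones (e.g., sun and indoors) give φ near -1, and the sign determines whether the constraint pulls the task parameters together or apart.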
Slide 14: ELLA_LC solution
For each new task t (with d-element feature vectors):
1. Learn the single-task model with the base learner (e.g., SVM)
2. Compute the φ-correlation coefficient of the concept learned in task t with all the previously learned concepts t'
3. Update s^(t) using quadratic programming
Slide 15: Experimental setup - Compared methods
Dataset: TRECVID SIN 2013
• 800 and 200 hours of internet archive videos for training and testing, respectively
• One keyframe per video shot
• Evaluated concepts: 38; evaluation measure: MXinfAP
We experimented with 8 different feature sets:
• The output of 4 different pre-trained ImageNet DCNNs (CaffeNet, ConvNet, GoogLeNet-1k, GoogLeNet-5k)
• The output of 4 networks fine-tuned on the TRECVID SIN dataset
Compared methods:
• STL using: a) LR, b) LSVM, c) kernel SVM with a radial kernel (KSVM)
• The label powerset (LP) multi-label learning algorithm, which models only label relations (Markatopoulou et al. 2014)
• AMTL (Sun et al. 2015) and CMTL (Zhou et al. 2011), two batch MTL methods
• ELLA (Eaton & Ruvolo 2013), an online MTL method (the one we extend in this study)
Slide 16: Experimental results
• Results of our experiments in terms of MXinfAP
• ELLA_QP: an intermediate version of the proposed ELLA_LC that uses quadratic programming but not the label constraint of ELLA_LC
• Statistical significance relative to the best-performing method was assessed with the paired t-test (at 5% significance level); the absence of * indicates statistical significance
Slide 17: Experimental results
• Change in XinfAP for each task between the iteration in which the task was first learned and the last iteration (when all tasks had been learned), divided by the position of the task in the task sequence
• Reverse transfer occurred, indicated by a positive change in accuracy for a task, mainly for the tasks that were learned early
• As the pool of tasks grows, early tasks receive new knowledge from many more tasks, which explains why the benefit is bigger for them
Slide 18: Conclusions
• Proposed ELLA_LC: an online MTL method for video concept detection
• Learning the relations between many task models (one per concept), in combination with the concept correlations captured from the ground-truth annotation, outperforms other state-of-the-art single-task and multi-task learning approaches
• The proposed ELLA_QP and ELLA_LC perform better than the STL alternatives, both when LR and when LSVM is used as the base learner
• The proposed ELLA_QP and ELLA_LC perform better than the MTL ELLA algorithm (the one they extend), both when LR and when LSVM is used as the base learner
• Using more elaborate keyframe representations as input (e.g., combining many DCNNs instead of using a single DCNN) improves the accuracy of the proposed ELLA_QP and ELLA_LC
• Fine-tuning improves the retrieval accuracy of ELLA_QP and ELLA_LC
Slide 19
Thank you for your attention!
Questions?
More information and contact:
Dr. Vasileios Mezaris
bmezaris@iti.gr
http://www.iti.gr/~bmezaris