Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which until recently were addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Session 10 in module 3 from the Master in Computer Vision by UPC, UAB, UOC & UPF.
This lecture provides an overview of state-of-the-art applications of convolutional neural networks to problems in video processing: semantic recognition, optical flow estimation and object tracking.
https://mcv-m6-video.github.io/deepvideo-2018/
Overview of deep learning solutions for video processing. Part of a series of slides covering topics like action recognition, action detection, object tracking, object detection, scene segmentation, language and learning from videos.
Prepared for the Master in Computer Vision Barcelona:
http://pagines.uab.cat/mcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
https://mcv-m6-video.github.io/deepvideo-2019/
These slides provide an overview of how deep neural networks can be used to solve an object tracking task.
https://telecombcn-dl.github.io/2017-dlcv/
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNNs) will be presented and analyzed in detail to understand the potential of these state-of-the-art tools for time-series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
https://telecombcn-dl.github.io/2019-dlcv/
https://imatge-upc.github.io/activitynet-2016-cvprw/
This thesis explores different approaches using Convolutional and Recurrent Neural Networks to classify and temporally localize activities in videos, and proposes an implementation to achieve this. As a first step, features are extracted from video frames using a state-of-the-art 3D Convolutional Neural Network. These features are fed into a recurrent neural network that solves the activity classification and temporal localization tasks in a simple and flexible way. Different architectures and configurations were tested in order to achieve the best performance on the provided video dataset. In addition, different kinds of post-processing over the trained network's output were studied to achieve better results on the temporal localization of activities in the videos. The results produced by the neural network developed in this thesis were submitted to the ActivityNet Challenge 2016 at CVPR, achieving competitive results with a simple and flexible architecture.
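The pipeline described in the abstract (3D-CNN clip features fed into a recurrent network, followed by post-processing for temporal localization) can be sketched in a few lines of NumPy. This is an illustrative toy with randomly initialised, untrained weights; the dimensions, class count and threshold are assumptions, not the thesis configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 10 clips of 16 frames, 4096-D C3D-style feature
# per clip, 3 activity classes plus background (class 0).
n_clips, feat_dim, hidden, n_classes = 10, 4096, 64, 4

clip_features = rng.normal(size=(n_clips, feat_dim))  # stand-in for 3D-CNN output

# Randomly initialised vanilla-RNN weights (untrained, for illustration only).
W_xh = rng.normal(scale=0.01, size=(feat_dim, hidden))
W_hh = rng.normal(scale=0.01, size=(hidden, hidden))
W_hy = rng.normal(scale=0.01, size=(hidden, n_classes))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Run the RNN over the clip sequence: one class distribution per clip.
h = np.zeros(hidden)
clip_probs = []
for x in clip_features:
    h = np.tanh(x @ W_xh + h @ W_hh)
    clip_probs.append(softmax(h @ W_hy))
clip_probs = np.stack(clip_probs)          # shape (n_clips, n_classes)

# Post-processing: keep clips where some non-background class exceeds a
# threshold, then merge consecutive active clips into temporal segments.
activity = clip_probs[:, 1:].max(axis=1) > 0.25
segments, start = [], None
for i, on in enumerate(activity):
    if on and start is None:
        start = i
    elif not on and start is not None:
        segments.append((start, i - 1))
        start = None
if start is not None:
    segments.append((start, n_clips - 1))

print(clip_probs.shape)
```

A trained model would replace the random weights, but the data flow (features per clip, per-clip class scores, segment merging) is the same.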
https://mcv-m6-video.github.io/deepvideo-2020/
Self-supervised audiovisual learning exploits the synchronization between pixels and audio recorded in video files. This lecture reviews the state of the art in deep neural networks trained with this approach, which does not require any manual annotation from humans.
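The self-supervised setup can be illustrated with toy embeddings: synchronization alone labels a (video, audio) pair from the same instant as positive and a time-shifted pair as negative, with no human annotation. Everything below (dimensions, noise model, the cosine scorer) is a made-up stand-in for real network outputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for per-second embeddings of one video: in a real system
# these would come from a visual CNN and an audio network.
T, d = 8, 32
video_emb = rng.normal(size=(T, d))
audio_emb = video_emb + 0.1 * rng.normal(size=(T, d))  # roughly synchronized

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Free supervision from synchronization: (video_t, audio_t) is a positive
# pair, (video_t, audio_s) with s != t is a negative pair.
pairs, labels = [], []
for t in range(T):
    s = (t + T // 2) % T                   # a deliberately misaligned time
    pairs.append(cosine(video_emb[t], audio_emb[t])); labels.append(1)
    pairs.append(cosine(video_emb[t], audio_emb[s])); labels.append(0)

# A trained network is optimized so positives score higher than negatives;
# this toy construction already has that property by design.
pos = np.mean([p for p, y in zip(pairs, labels) if y == 1])
neg = np.mean([p for p, y in zip(pairs, labels) if y == 0])
print(pos > neg)
```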
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
These slides summarize the main trends in deep neural networks for video encoding, including single-frame models, spatiotemporal convolutions, long-term sequence modeling with RNNs, and their combination with optical flow.
Master in Computer Vision Barcelona, 2019
Lecture by Xavier Giro-i-Nieto (UPC) at the Master in Computer Vision Barcelona (March 30, 2016).
http://pagines.uab.cat/mcv/
This lecture provides an overview of computer vision analysis of images at a global scale using deep learning techniques. The session is structured in two blocks: the first addressing end-to-end learning, and the second focusing on applications that use off-the-shelf features.
Please submit your feedback as comments on the GDrive source slides:
https://docs.google.com/presentation/d/1ms9Fczkep__9pMCjxtVr41OINMklcHWc74kwANj7KKI/edit?usp=sharing
https://imatge-upc.github.io/unsupervised-2017-cvprw/
Lin, Xunyu, Victor Campos, Xavier Giro-i-Nieto, Jordi Torres, and Cristian Canton Ferrer. "Disentangling Motion, Foreground and Background Features in Videos." CVPR Workshops 2017. (extended abstract)
This paper introduces an unsupervised framework to extract semantically rich features for video representation. Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional feature encoder for blocks of 16 frames, which is trained for reconstruction tasks over the first and last frames of the sequence. A preliminary supervised experiment was conducted to verify the feasibility of the proposed method by training the model with a fraction of videos from the UCF-101 dataset, taking as ground truth the bounding boxes around the activity regions. Qualitative results indicate that the network can successfully segment foreground and background in videos as well as update the foreground appearance based on disentangled motion features. The benefits of these learned features are shown in a discriminative classification task, where initializing the network with the proposed pretraining method outperforms both random initialization and autoencoder pretraining.
Applying Deep Learning Vision Technology to Low-Cost/Low-Power Embedded Systems (Jenny Midwinter)
Slides from the Ottawa Machine Learning Meetup on January 16, 2016.
Pierre Paulin, Director of R&D at Synopsys (Embedded Vision Subsystems), will be making a presentation on:
“Applying Deep Learning Vision Technology to Low-Cost, Low-Power Embedded Systems: An Industrial Perspective”
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
https://imatge-upc.github.io/unsupervised-2017-cvprw/
Lin, Xunyu, Victor Campos, Xavier Giro-i-Nieto, Jordi Torres, and Cristian Canton Ferrer. "Disentangling Motion, Foreground and Background Features in Videos." CVPR Workshops 2017. (extended abstract)
This paper introduces an unsupervised framework to extract semantically rich features for video representation. Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional feature encoder for blocks of 16 frames, which is trained for reconstruction tasks over the first and last frames of the sequence. A preliminary supervised experiment was conducted to verify the feasibility of proposed method by training the model with a fraction of videos from the UCF-101 dataset taking as ground truth the bounding boxes around the activity regions. Qualitative results indicate that the network can successfully segment foreground and background in videos as well as update the foreground appearance based on disentangled motion features. The benefits of these learned features are shown in a discriminative classification task, where initializing the network with the proposed pretraining method outperforms both random initialization and autoencoder pretraining.
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
https://mcv-m6-video.github.io/deepvideo-2019/
Overview of deep learning solutions for video processing. Part of a series of slides covering topics like action recognition, action detection, object tracking, object detection, scene segmentation, language and learning from videos.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Applying Deep Learning Vision Technology to low-cost/power Embedded Systems (Jenny Midwinter)
Slides from Ottawa Machine Learning Meetup from January 16, 2016.
Pierre Paulin, Director of R&D at Synopsys (Embedded Vision Subsystems), will be making a presentation on:
“Applying Deep Learning Vision Technology to Low-Cost, Low-Power Embedded Systems: An Industrial Perspective”
Slides by Alberto Montes about the original CVPR 2016 paper:
Zheng Shou and Dongang Wang and Shih-Fu Chang, "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs", CVPR 2016.
Abstract:
We address temporal action localization in untrimmed long videos. This is important because videos in real applications are usually unconstrained and contain multiple action instances plus video content of background scenes or other activities. To address this challenging issue, we exploit the effectiveness of deep networks in temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns a one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes the learned classification network to localize each action instance. We propose a novel loss function for the localization network to explicitly consider temporal overlap and achieve high temporal localization accuracy. In the end, only the proposal network and the localization network are used during prediction. On two large-scale benchmarks, our approach achieves significantly superior performance compared with other state-of-the-art systems: mAP increases from 1.7% to 7.4% on MEXaction2 and from 15.0% to 19.0% on THUMOS 2014.
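The localization loss above explicitly considers temporal overlap, which is measured as an Intersection-over-Union between 1D segments. A minimal sketch of that quantity (the function name is illustrative, not from the paper's code):

```python
def temporal_iou(seg_a, seg_b):
    """Temporal Intersection-over-Union between two [start, end] segments
    (in frames or seconds). Returns a value in [0, 1]."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, a candidate segment [2, 6] against a ground-truth action [4, 8] overlaps by 2 units out of a 6-unit union, giving an IoU of 1/3.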
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft... (Dataconomy Media)
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Software Engineer - Machine Learning Team at Source {d}
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2016: http://bit.ly/1WMJAqS
About the Author:
Currently Vadim is a Senior Machine Learning Engineer at source{d}, where he works on deep neural networks that aim to understand all of the world's developers through their code. Vadim is one of the creators of the distributed deep learning platform Veles (https://velesnet.ml), built while working at Samsung. Afterwards Vadim was responsible for the machine learning efforts to fight email spam at Mail.Ru. In the past Vadim was also a visiting associate professor at Moscow Institute of Physics and Technology, teaching about new technologies and conducting ACM-like internal coding competitions. Vadim is also a big fan of GitHub (vmarkovtsev) and HackerRank (markhor), and likes to write technical articles for a number of websites.
Comparing Incremental Learning Strategies for Convolutional Neural Networks (Vincenzo Lomonaco)
In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Slides by Amaia Salvador at the UPC Computer Vision Reading Group.
Source document on GDocs with clickable links:
https://docs.google.com/presentation/d/1jDTyKTNfZBfMl8OHANZJaYxsXTqGCHMVeMeBe5o1EL0/edit?usp=sharing
Based on the original work:
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
https://imatge.upc.edu/web/people/xavier-giro
These slides provide an overview of our research group at UPC, which has been applying deep learning to computer vision since 2014. We are one of the pioneering research groups in Europe and, despite the youth of most of our members, we have already contributed to the community with a diverse range of publications and software at top scientific venues.
Deep Learning for X-ray Image to Text Generation (ijtsrd)
Motivated by the recent success of supervised and weakly supervised common object discovery, in this work we move one step further to tackle common object discovery in a fully unsupervised way. Object co-localization aims at simultaneously localizing objects of the same class across a group of images. Traditional object localization/detection usually trains specific object detectors which require bounding box annotations of object instances, or at least image-level labels to indicate the presence/absence of objects in an image. Given a collection of images without any annotations, our proposed fully unsupervised method simultaneously discovers images that contain common objects and localizes the common objects in the corresponding images. It has long been envisioned that machines will one day understand the visual world at a human level of intelligence. We can now build very deep convolutional neural networks (CNNs) and achieve an impressively low error rate for tasks like large-scale image classification. However, in tasks like image classification, the content of an image is usually simple, containing a predominant object to be classified. The situation can be much more challenging when we want computers to understand complex scenes, and image captioning is one such task. The common approach to train a model to predict the category of a given X-ray image is to first annotate each X-ray image in a training set with a label from the predefined set of categories. Through such fully supervised training, the computer learns how to classify an X-ray image and convert it into text. Mahima Chaddha | Sneha Kashid | Snehal Bhosale | Prof. Radha Deoghare, "Deep Learning for X-ray Image to Text Generation", Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23168.pdf
Paper URL: https://www.ijtsrd.com/engineering/information-technology/23168/deep-learning-for-x-ray-image-to-text-generation/mahima-chaddha
Real Time Object Detection with Audio Feedback using Yolo v3 (ijtsrd)
In this paper, we propose a system that combines real time object detection using the YOLOv3 algorithm with audio feedback to assist visually impaired individuals in locating and identifying objects in their surroundings. The YOLOv3 algorithm is a state of the art object detection algorithm that has been used in numerous studies for various applications. Audio feedback has also been studied in previous research as a useful tool for assisting visually impaired individuals. Our proposed system builds on the effectiveness of both these technologies to provide a valuable tool for improving the independence and quality of life of visually impaired individuals. We present the architecture of our proposed system, which includes a YOLOv3 model for object detection and a text to speech engine for providing audio feedback. We also present the results of our experiments, which demonstrate the effectiveness of our system in detecting and identifying objects in real time. Our proposed system can be used in various settings, such as indoor and outdoor environments, and can assist visually impaired individuals in various activities such as navigation and object identification. Dr. K. Nagi Reddy | K. Sreeja | M. Sreenivasulu Reddy | K. Sireesha | M. Triveni "Real Time Object Detection with Audio Feedback using Yolo_v3" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-2, April 2023, URL: https://www.ijtsrd.com/papers/ijtsrd55158.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/55158/real-time-object-detection-with-audio-feedback-using-yolov3/dr-k-nagi-reddy
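A YOLOv3-style detector emits many overlapping candidate boxes per object, which are usually pruned with non-maximum suppression before any downstream step such as the text-to-speech feedback. A minimal NumPy sketch of that pruning step (illustrative, not the authors' implementation):

```python
import numpy as np

def iou(a, b):
    """IoU between two boxes given as [x1, y1, x2, y2]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep highest-scoring boxes,
    drop any remaining box overlapping a kept one above `thresh`."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while len(order):
        i = order[0]
        keep.append(int(i))
        order = order[1:][[iou(boxes[i], boxes[j]) < thresh for j in order[1:]]]
    return keep
```

The 0.5 threshold is a conventional default; real pipelines tune it per class.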
Multimedia Information Retrieval: Bytes and pixels meet the challenges of hum... (maranlar)
Within computer science, "Multimedia" is a field of research that investigates how computers can support people in communication, information finding, and knowledge/opinion building. Multimedia content is defined broadly. It includes not only video, but also images accompanied by text and other information (for example, a geo-location). It can be professionally produced, or generated by users for online sharing. Computer scientists historically have a “love-hate” relationship with multimedia. They “love” it because of the richness of the data sources and the wealth of available data, which leads to interesting problems to tackle with machine learning. They “hate” it because multimedia is a diffuse and moving target: the interpretation of multimedia differs from person to person, and changes over time in the course of its use as a communication medium. This talk gives a view onto ongoing research in the area of multimedia information retrieval algorithms, which help people find multimedia. We look at a series of topics that reveal how pattern recognition, text processing, and crowdsourcing tools are used in multimedia research, and discuss both their limitations and their potential.
Deep learning and a new generation of AI - Simone Scardapane (Data Driven Innovation)
Deep learning represents a new family of data-driven techniques that open new horizons in what machines can be programmed to do. In just a few years we have seen cars that drive themselves, robots that learn to move, Go champions defeated, and much more. What are the technical, social and scientific challenges of the near future? And, above all, are these technologies within everyone's reach? In this talk we will give a (very brief) overview of these questions and their possible answers.
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. During the last year, multiple solutions have been proposed to alleviate this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
International Perspectives: Visualization in Science and Education (Liz Dorland)
Overview of the international and interdisciplinary Gordon Research Conference on Visualization in Science and Education and info on key cognitive science and learning sciences researchers. History of the conference, NSF workshop, and research on learning with visualizations.
https://mcv-m6-video.github.io/deepvideo-2020/
Self-supervised techniques define surrogate tasks to train machine learning algorithms without the need of human generated labels. This lecture reviews the state of the art in the field of computer vision, including the baseline techniques based on visual feature learning from ImageNet data.
Deep neural networks have revolutionized the data analytics scene by improving results in several and diverse benchmarks with the same recipe: learning feature representations from data. These achievements have raised the interest across multiple scientific fields, especially in those where large amounts of data and computation are available. This change of paradigm in data analytics has several ethical and economic implications that are driving large investments, political debates and sounding press coverage under the generic label of artificial intelligence (AI). This talk will present the fundamentals of deep learning through the classic example of image classification, and point at how the same principle has been adopted for several tasks. Finally, some of the forthcoming potentials and risks for AI will be pointed out.
This lecture reviews methods that allow interpreting the outcomes of a deep convolutional neural network. It presents some of the techniques proposed in the literature.
Media REVEALr: A social multimedia monitoring and intelligence system for Web... (Symeon Papadopoulos)
Presentation of Media REVEALr, a framework for mining social and Web multimedia with the goal of supporting verification. Presented at PAISI workshop, co-located with PA-KDD 2015, Ho Chi Minh City, Vietnam
https://mcv-m6-video.github.io/deepvideo-2018/
Overview of deep learning solutions for video processing. Part of a series of slides covering topics like action recognition, action detection, object tracking, object detection, scene segmentation, language and learning from videos.
Prepared for the Master in Computer Vision Barcelona:
http://pagines.uab.cat/mcv/
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
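The VAE training procedure mentioned above hinges on the reparameterization trick and a closed-form KL term against a standard normal prior. A minimal NumPy sketch (function names are illustrative, not any particular library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, 1),
    so the stochastic node stays differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

The KL term vanishes exactly when the encoder outputs the prior (mu = 0, log_var = 0) and grows as the posterior drifts away from it.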
Machine translation and computer vision have greatly benefited from the advances in deep learning. A large and diverse amount of textual and visual data have been used to train neural networks whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses.
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence in RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
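The self-attention mechanism described above can be sketched as scaled dot-product attention over a single head, without masking (names and shapes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape
    (tokens, d_model). Every token attends to every token, including itself."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (tokens, tokens) attention matrix
    return weights @ V, weights
```

Each row of the attention matrix is a probability distribution over the tokens, so each output token is a convex combination of the value vectors.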
Machine translation and computer vision have greatly benefited from the advances in deep learning. The large and diverse amount of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io) recorded at CMU in collaboration with UPC, BSC, Gallaudet University and Facebook.
https://imatge.upc.edu/web/publications/sign-language-translation-and-production-multimedia-and-multimodal-challenges-all
https://imatge-upc.github.io/synthref/
Integrating computer vision with natural language processing has achieved significant progress over the last years owing to the continuous evolution of deep learning. A novel vision and language task, which is tackled in the present Master thesis, is referring video object segmentation, in which a language query defines which instance to segment from a video sequence. One of the biggest challenges for this task is the lack of relatively large annotated datasets, since a tremendous amount of time and human effort is required for annotation. Moreover, existing datasets suffer from poor quality annotations in the sense that approximately one out of ten language expressions fails to uniquely describe the target object.
The purpose of the present Master thesis is to address these challenges by proposing a novel method for generating synthetic referring expressions for an image (video frame). This method produces synthetic referring expressions by using only the ground-truth annotations of the objects as well as their attributes, which are detected by a state-of-the-art object detection deep neural network. One of the advantages of the proposed method is that its formulation allows its application to any object detection or segmentation dataset.
By using the proposed method, the first large-scale dataset with synthetic referring expressions for video object segmentation is created, based on an existing large benchmark dataset for video instance segmentation. A statistical analysis and comparison of the created synthetic dataset with existing ones is also provided in the present Master thesis.
The conducted experiments on three different datasets used for referring video object segmentation prove the efficiency of the generated synthetic data. More specifically, the obtained results demonstrate that by pre-training a deep neural network with the proposed synthetic dataset one can improve the ability of the network to generalize across different datasets, without any additional annotation cost.
Master MATT thesis defense by Juan José Nieto
Advised by Víctor Campos and Xavier Giro-i-Nieto.
27th May 2021.
Pre-training Reinforcement Learning (RL) agents in a task-agnostic manner has shown promising results. However, previous works still struggle to learn and discover meaningful skills in high-dimensional state-spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent representation by making use of variational or contrastive techniques. We demonstrate that both allow learning a set of basic navigation skills by maximizing an information theoretic objective. We assess our method in Minecraft 3D maps with different complexities. Our results show that representations and conditioned policies learned from pixels are enough for toy examples, but do not scale to realistic and complex maps. We also explore alternative rewards and input observations to overcome these limitations.
https://imatge.upc.edu/web/publications/discovery-and-learning-navigation-goals-pixels-minecraft
Peter Muschick MSc thesis
Universitat Politècnica de Catalunya, 2020
Sign language recognition and translation has been an active research field in recent years, with most approaches using deep neural networks to extract information from sign language data. This work investigates the mostly disregarded approach of using human keypoint estimation from image and video data with OpenPose in combination with a transformer network architecture. Firstly, it was shown that it is possible to recognize individual signs (4.5% word error rate (WER)). Continuous sign language recognition, though, was more error prone (77.3% WER), and sign language translation was not possible using the proposed methods, which might be due to low accuracy scores of human keypoint estimation by OpenPose and the accompanying loss of information, or to insufficient capacity of the transformer model used. Results may improve with the use of datasets containing higher repetition rates of individual signs or by focusing more precisely on keypoint extraction of hands.
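The word error rates reported above are computed from the word-level edit distance between a reference transcript and the hypothesis. A self-contained sketch of the standard dynamic-programming computation:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed as a Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[-1][-1] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why figures like 77.3% still leave room to degrade.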
https://github.com/telecombcn-dl/lectures-all/
These slides review techniques for interpreting the behavior of deep neural networks. The talk reviews basic techniques such as the display of filters and tensors, as well as more advanced ones that try to interpret which part of the input data is responsible for the predictions, or generate data that maximizes the activation of certain neurons.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/dlai-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g. robotics, autonomous driving) or decision making (e.g. resource optimization in wireless communication networks). It also advances in the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention on multimedia applications (vision, language and speech).
Giro-i-Nieto, X. One Perceptron to Rule Them All: Language, Vision, Audio and Speech. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7-8).
Tutorial page:
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will firstly review the basic neural architectures to encode and decode vision, text and audio, and later review those models that have successfully translated information across modalities.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and presents the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
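Segmentation quality in these subtasks is commonly scored with mean Intersection-over-Union over classes. A minimal NumPy sketch of one common variant of the metric (skipping classes absent from the ground truth; names are illustrative):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union between a predicted and a ground-truth
    label map, averaged over classes present in the ground truth."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        if t.sum() == 0:  # class absent from ground truth: skip it
            continue
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

Benchmark suites differ on details (e.g. whether absent classes count as 0 or are skipped), so treat this as one reasonable convention rather than the definition used by any specific dataset.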
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from curriculum learning strategies for better and faster training of deep neural networks. This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse schedule sampling is a better option than a classic forward one, and that a progressive skipping of frames during training is beneficial, but only when training with the ground truth masks instead of the predicted ones.
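Schedule sampling governs the probability of feeding the network the ground-truth mask instead of its own prediction at each training step. Assuming linear schedules purely for illustration (the work may use a different decay), the forward and inverse variants compared above could be sketched as:

```python
def forward_schedule(epoch, num_epochs):
    """Classic schedule sampling: start on ground truth, shift towards
    the model's own predictions as training progresses."""
    return 1.0 - epoch / (num_epochs - 1)  # probability of using ground truth

def inverse_schedule(epoch, num_epochs):
    """Inverse schedule (the variant found beneficial here): start on
    predictions and move towards ground truth."""
    return epoch / (num_epochs - 1)
```

At each step, the trainer would draw a uniform random number and feed the ground-truth mask whenever it falls below the scheduled probability.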
Benet Oriol, Jordi Luque, Ferran Diego, Xavier Giro-i-Nieto
Telefonica Research / Universitat Politecnica de Catalunya (UPC)
CVPR 2020 Workshop on Egocentric Perception, Interaction and Computing
In this work, we propose an effective approach for training unique embedding representations by combining three simultaneous modalities: image, spoken narratives and textual narratives. The proposed methodology departs from a baseline system that spawns an embedding space trained with only spoken narratives and image cues. Our experiments on the EPIC-Kitchens and Places Audio Caption datasets show that introducing the human-generated textual transcriptions of the spoken narratives helps the training procedure yield better embedding representations. The triad of speech, image and words allows for a better estimate of the point embedding and shows an improvement in performance on tasks like image and speech retrieval, even when the third modality, text, is not present in the task.
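Joint embedding spaces of this kind are often trained with a ranking objective such as the triplet loss, which pulls matching cross-modal pairs together and pushes mismatched pairs apart. A hedged NumPy sketch (the paper's exact objective may differ):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on embedding distances: zero once the matching pair is
    closer than the mismatched pair by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to the matching item
    d_neg = np.linalg.norm(anchor - negative)  # distance to a mismatched item
    return max(0.0, d_pos - d_neg + margin)
```

In a speech-image-text setting, the anchor could be a speech embedding, the positive the embedding of the corresponding image or transcript, and the negative an embedding from a different sample in the batch.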
These slides provide an overview of the most popular approaches to date to solve the task of object detection with deep neural networks. It reviews both two-stage approaches such as R-CNN, Fast R-CNN and Faster R-CNN, and one-stage approaches such as YOLO and SSD. It also contains pointers to relevant datasets (Pascal, COCO, ILSVRC, OpenImages) and the definition of the Average Precision (AP) metric.
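The Average Precision metric mentioned above summarizes a ranked list of detections: detections are sorted by confidence, and precision is accumulated at each true-positive rank. One common non-interpolated variant can be sketched as follows (Pascal VOC and COCO use interpolated or multi-threshold refinements of this idea):

```python
def average_precision(is_true_positive, num_ground_truth):
    """AP as the mean of precision values at each true-positive rank,
    given detections already sorted by descending confidence."""
    precisions = []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)  # precision at this recall point
    return sum(precisions) / num_ground_truth
```

For example, with two ground-truth objects and detections scored [TP, FP, TP], the precisions at the two hits are 1/1 and 2/3, giving AP = 5/6.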
Full program:
https://www.talent.upc.edu/ing/estudis/formacio/curs/310400/postgraduate-course-artificial-intelligence-deep-learning/
This lecture provides an introduction to recurrent neural networks, which include a layer whose hidden state is aware of its values in a previous time-step.
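The recurrence described above can be sketched as a minimal Elman-style RNN step in NumPy (untrained, randomly shaped weights; names are illustrative):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: the new hidden state mixes the current input
    with the hidden state from the previous time-step."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def run_rnn(xs, h0, W_xh, W_hh, b_h):
    """Unroll the recurrence over a sequence, returning every hidden state."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states
```

The key point of the lecture, visible here, is that the same weights (W_xh, W_hh) are shared across all time-steps, and each hidden state depends on the entire history through h_prev.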
These slides were used in the Master in Computer Vision Barcelona 2019/2020, in the Module 6 dedicated to Video Analysis.
http://pagines.uab.cat/mcv/
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Leading Change strategies and insights for effective change management pdf 1.pdf
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
1. Xavier Giró i Nieto, "Deep learning for vision: Objects". Master in Multimedia, La Salle URL (May 2016)
@DocXavi
Deep Learning for Computer Vision
Object Analytics
5 May 2016
Xavier Giró-i-Nieto
Master en Creació Multimedia
One lecture organized in three parts
Deep ConvNets for Recognition for... Images (global), Objects (local), Video (2D+T)
One lecture organized in four parts
Local analysis for... Proposals, Detection, Recognition, Segmentation
Proposals: Hand-crafted
Slides credit: Marc Bolaños
Hand-crafted proposals used to be based on bottom-up grouping: Selective Search (SS) and Multiscale Combinatorial Grouping (MCG).
[SS] Uijlings, Jasper RR, Koen EA van de Sande, Theo Gevers, and Arnold WM Smeulders. "Selective search for object recognition." International Journal of Computer Vision 104, no. 2 (2013): 154-171.
[MCG] Arbeláez, Pablo, Jordi Pont-Tuset, Jonathan Barron, Ferran Marques, and Jitendra Malik. "Multiscale combinatorial grouping." CVPR 2014.
Proposals: DeepBox
Kuo, Weicheng, Bharath Hariharan, and Jitendra Malik. "DeepBox: Learning objectness with convolutional networks." ICCV 2015. [software]
DeepBox proposes a very simple method:
1) Use a state-of-the-art bottom-up method (Edge Boxes) to generate initial object proposals.
2) Re-rank them (and possibly discard some) with DeepBox.
Proposals: DeepBox: Architecture
On PASCAL VOC, the heavier AlexNet architecture reaches AUC = 0.75 (IoU = 0.5) and AUC = 0.62 (IoU = 0.7), while the lighter DeepBox architecture reaches AUC = 0.74 (IoU = 0.5) and AUC = 0.60 (IoU = 0.7): only a small drop for a much lighter network.
Proposals: DeepBox: Training
1) Initialize the layers with AlexNet weights.
2) Train on sliding windows:
- Negative samples: windows extracted by raster scanning.
- Positive samples: generated per instance by perturbing the ground-truth bounding boxes.
3) Train on hard negatives, using bottom-up proposals from Edge Boxes:
- If the overlap with the ground truth is <= 0.3, the proposal is a negative sample.
- If the overlap with the ground truth is >= 0.7, the proposal is a positive sample.
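The overlap criterion above can be made concrete with a small Intersection-over-Union helper. A minimal sketch; the function names `iou` and `label_proposal` and the (x1, y1, x2, y2) box format are illustrative, not from the paper:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_proposal(proposal, gt_box):
    """Label a bottom-up proposal against one ground-truth box,
    following the 0.3 / 0.7 thresholds above."""
    overlap = iou(proposal, gt_box)
    if overlap <= 0.3:
        return "negative"
    if overlap >= 0.7:
        return "positive"
    return "ignored"  # proposals in between are not used for training
```

Proposals with intermediate overlap fall into neither set, which keeps the positive and negative classes well separated during training.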
Proposals: DeepBox: Results
With a rather simple approach, ConvNets obtain much better results than previous techniques for object proposals (DeepBox vs. Edge Boxes).
DeepBox improves detection not only of known classes but also of unknown ones, making it suitable for object discovery.
Detection: Objects
Slide credit: Amaia Salvador
Moving from hand-crafted features (DPM with HOG features [1]) to deep features (R-CNN [2], SPPnet [3]) brought a jump of about 60% in detection performance.
Girshick, Ross, Forrest Iandola, Trevor Darrell, and Jitendra Malik. "Deformable Part Models are Convolutional Neural Networks." CVPR 2015.
ConvNets (CNNs) actually learn detectors similar to those learned by Deformable Part-based Models (DPMs).
Detection: Objects: R-CNN
Girshick, R., Donahue, J., Darrell, T., & Malik, J. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
Slide credit: Joost van de Weijer
Detection: Objects: Fast R-CNN
Girshick, Ross. "Fast R-CNN." ICCV 2015.
The RoI pooling layer is the same idea as SPP [3], but at a single scale.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015.
Detection: Objects: Fast R-CNN
RoI pooling on the CONV5 feature map: each h x w region is divided into an H' x W' grid of bins of size (h / H') x (w / W'), and max pooling is applied within each bin to obtain a fixed-size output.
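The bin arithmetic can be sketched in a few lines of NumPy. This is an illustrative toy implementation, not the layer used in the paper; the floor/ceil split of bin boundaries is one simple choice:

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool an h x w region of a conv feature map into a fixed
    out_h x out_w grid (the H' x W' grid of RoI pooling).

    feature_map: (C, H, W) array, e.g. the CONV5 output
    roi: (y1, x1, y2, x2) region in feature-map coordinates
    """
    y1, x1, y2, x2 = roi
    region = feature_map[:, y1:y2, x1:x2]
    c, h, w = region.shape
    out = np.zeros((c, out_h, out_w), dtype=feature_map.dtype)
    # Each bin covers roughly (h / out_h) x (w / out_w) cells
    for i in range(out_h):
        for j in range(out_w):
            ys = int(np.floor(i * h / out_h))
            ye = int(np.ceil((i + 1) * h / out_h))
            xs = int(np.floor(j * w / out_w))
            xe = int(np.ceil((j + 1) * w / out_w))
            out[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out
```

Whatever the size of the input region, the output is always C x H' x W', which is what lets the fully-connected layers that follow accept proposals of arbitrary shape.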
Detection: Objects: Fast R-CNN
Backbone networks: AlexNet [4], VGG16 [5], VGG_1024 [6].
The network is trained with a multi-task loss that combines classification and bounding-box regression.
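The multi-task loss adds a log loss over the object classes to a smooth L1 loss over the four box-regression targets, with the box term switched off for background regions. A minimal sketch; the function names and this simple NumPy formulation are illustrative:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 (Huber-like) loss used for box regression in Fast R-CNN."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def multitask_loss(cls_probs, true_class, pred_deltas, true_deltas, lam=1.0):
    """L = L_cls + lambda * [u >= 1] * L_loc.

    The background class (u = 0) contributes no localization loss."""
    l_cls = -np.log(cls_probs[true_class])              # log loss over K+1 classes
    l_loc = smooth_l1(pred_deltas - true_deltas).sum()  # over (tx, ty, tw, th)
    return l_cls + (lam * l_loc if true_class >= 1 else 0.0)
```

The smooth L1 term is less sensitive to outliers than an L2 loss, which keeps gradients bounded when a regression target is far off early in training.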
Detection: Objects: Faster R-CNN
Ren, S., He, K., Girshick, R. and Sun, J. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems, 2015 (pp. 91-99). [Python code] [Matlab code]
Object proposal computation (Selective Search, CPMC, MCG) is the bottleneck in current state-of-the-art object detection systems.
Selective Search: Van de Sande, K. E., Uijlings, J. R., Gevers, T., & Smeulders, A. W. "Segmentation as selective search for object recognition." ICCV 2011 (pp. 1879-1886).
CPMC: Carreira, J., & Sminchisescu, C. "Constrained parametric min-cuts for automatic object segmentation." CVPR 2010 (pp. 3241-3248).
MCG: Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. "Multiscale combinatorial grouping." CVPR 2014 (pp. 328-335).
Detection: Objects: Faster R-CNN
Faster R-CNN replaces the usage of external object proposals with a Region Proposal Network (RPN). The shared conv layers feed both the RPN, which outputs proposals, and the RoI pooling and FC layers, which output class scores and probabilities.
At each position, the RPN predicts objectness scores (object/no object) and bounding-box regression offsets for k anchors. In practice, k = 9 (3 different scales and 3 aspect ratios).
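The k = 9 anchors per location can be generated as follows. This is a sketch: the base size, the particular scale values, and the area-preserving width/height formula are assumptions matching the usual 3 scales x 3 aspect ratios setup:

```python
def make_anchors(cx, cy, base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes centred at (cx, cy).

    For a given scale, every anchor keeps roughly the same area while
    varying its aspect ratio (h / w)."""
    anchors = []
    for scale in scales:
        side = base_size * scale
        area = side * side
        for ratio in ratios:
            w = (area / ratio) ** 0.5
            h = w * ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Sliding this anchor set over every position of the conv feature map is what gives the RPN dense coverage of object locations, scales, and shapes without any external proposal algorithm.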
Detection: Objects: Faster R-CNN
The detection branch that runs on top of the RPN proposals is Fast R-CNN.
A 4-step training scheme shares the convolutional features between the RPN and Fast R-CNN:
Step 1: Train the RPN, initialized with an ImageNet pre-trained model (fine-tuned).
Step 2: Train Fast R-CNN with the RPN proposals learned in step 1, again starting from ImageNet weights (fine-tuned).
Step 3: Use the model trained in step 2 to initialize the RPN and train it again, keeping the shared convolutional layers fixed.
Step 4: Fine-tune the FC layers of Fast R-CNN, using the same shared convolutional layers as in step 3 (kept fixed).
Detection: Objects: Faster R-CNN
Detection accuracy (PASCAL VOC) and timing in ms (PASCAL VOC).
Detection: Objects: Reinforcement Learning
Caicedo, Juan C., and Svetlana Lazebnik. "Active object localization with deep reinforcement learning." ICCV 2015. [Slides by Míriam Bellver]
The object is localized based on visual features from AlexNet FC6.
Detection: Objects: Reinforcement Learning
Set of actions A:
- Transformation actions, which transform the current bounding box.
- A trigger action, which terminates the sequence of the current search and marks the region with an inhibition-of-return (IoR) mask.
Set of states S: s = (o, h), where o is a feature vector from a pre-trained CNN (fc6, 4096 dimensions) and h is a binary vector with the history of taken actions (90 dimensions).
Detection: Objects: Reinforcement Learning
Reward function R: measured with respect to the ground-truth bounding box.
Reward function R for the trigger action: a reward of 3 is given when the final IoU with the ground truth reaches the minimum of 0.6. The reward function also considers the number of steps as a cost.
Policy function: given the current state S, which should be the next action A? This is learned with Q-learning.
The action-value function is estimated with a neural network that:
- has as many output units as actions;
- incorporates a replay memory to collect experiences;
- is category-specific (one Q-network per class).
Policy of the agent: select the action A with the maximum estimated value under the learnt action-value function.
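The agent's decision rule can be sketched as epsilon-greedy selection over the Q-network outputs, together with the standard one-step Q-learning target. An illustrative sketch: the function names and the epsilon and gamma values are assumptions, not the paper's exact hyperparameters:

```python
import random

def select_action(q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy policy over the Q-network outputs.

    q_values: one estimated action value per action (the network has as
    many output units as actions). With probability epsilon a random
    action is explored; otherwise the greedy action is exploited."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def q_target(reward, next_q_values, done, gamma=0.9):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:  # e.g. after the trigger action ends the search
        return reward
    return reward + gamma * max(next_q_values)
```

During training, (state, action, reward, next state) tuples are drawn from the replay memory and the network is regressed toward `q_target`, which decorrelates consecutive experiences.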
Datasets for training and testing: PASCAL VOC.
Two modes of evaluation: 1) All Attended Regions (AAR); 2) Terminal Regions (TR).
Best performance is obtained with few region proposals.
Detection: Faces: DDFD
Farfade, Sachin Sudhakar, Mohammad Saberian, and Li-Jia Li. "Multi-view Face Detection Using Deep Convolutional Neural Networks." ICMR 2015. [software]
Detection: Faces: DDFD: Train
Dataset
● Source: Annotated Facial Landmarks in the Wild, by TU Graz.
● 25k annotated faces in images downloaded from Flickr.
● 380k manually annotated facial landmarks.
Sampling
● Randomly sample sub-windows (blocks):
○ positive examples if the Intersection over Union (IoU) with an annotated face is larger than 50%;
○ negative samples otherwise.
● Total samples: 200K positive and 20M negative.
Detection: Faces: DDFD: Test
Test images are rescaled up/down 3 times per octave to find faces at different sizes.
A sliding window of 227x227 is run over the test image.
Source: James Hays, "Object Category Detection: Sliding Windows" (Brown University, 2011)
Fully-connected layers are converted to convolutional layers, which allows processing images of any size.
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR 2015.
● This makes it possible to:
○ Efficiently run the convnet on images of any size.
○ Obtain a heat map of the face detector.
● Non-Maximum Suppression (NMS) is applied to avoid overlapped detections.
Source: Adrian Rosebrock, "Non-Maximum Suppression for Object Detection in Python" (Pyimagesearch, 2014)
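Greedy NMS keeps the highest-scoring detection and discards the detections that overlap it too much. A compact pure-Python sketch; the 0.5 IoU threshold is a typical value, not necessarily the one used in the paper:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop every remaining box whose IoU with it exceeds iou_thresh."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

Without this step, a single face typically fires many overlapping sliding-window detections, all of which would be counted as separate hits.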
Detection: Faces: DDFD: Results
Precision vs Recall curves:
- DPM corresponds to Deformable Part-based Models.
- The OpenCV face detector is an implementation of Viola & Jones.
- IMPORTANT: DPM and HeadHunter need extra information about pose or facial landmarks during training.
Faces: Recognition: FaceNet
Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "FaceNet: A Unified Embedding for Face Recognition and Clustering." CVPR 2015. (Extended summary slides by Xavier Giró on the ReadCV seminar.)
FaceNet maps faces into a Euclidean space where distances correspond to face similarity.
The embedding is learned end to end (distance metric learning)...
Weinberger, Kilian Q., and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." The Journal of Machine Learning Research 10 (2009): 207-244.
...by means of well-chosen triplets, using curriculum learning.
Bengio, Yoshua, Jérôme Louradour, Ronan Collobert, and Jason Weston. "Curriculum learning." ICML 2009, pp. 41-48.
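The triplet objective pulls an anchor face closer to a positive (same identity) than to a negative (different identity) by a margin. A minimal per-triplet sketch; the 0.2 margin follows the paper's formulation, but this NumPy version is illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss on embedding vectors: require the
    anchor-positive squared distance to be smaller than the
    anchor-negative squared distance by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

The curriculum aspect lies in which triplets are fed to this loss: triplets that already satisfy the margin contribute zero and teach nothing, so training focuses on hard (but not pathologically hard) examples.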
Faces: Recognition: FaceNet
Architecture 1 (NN1): ZF.
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." ECCV 2014, pp. 818-833. (Slides by Xavier Giró-i-Nieto)
Architecture 2 (NN2): GoogLeNet.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going Deeper With Convolutions." CVPR 2015. (Slides by Elisa Sayrol)
Faces: Recognition: FaceNet: Test
LFW: 99.63% (new record)
YouTube Faces DB: 95.12%
Faces: Recognition: FaceNet: Software
Software implementation: OpenFace.
Faces: Recognition: VGG Face
Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. "Deep face recognition." Proceedings of the British Machine Vision Conference 1, no. 3 (2015): 6. [software]
Objects: Recognition: Retrieval
E. Mohedano, A. Salvador, K. McGuinness, X. Giró-i-Nieto, N. O'Connor, and F. Marqués, "Bags of Local Convolutional Features for Scalable Instance Search", ICMR 2016
Visual query "A dog" against an image database: the expected outcome is any image of a dog.
Visual query "This dog": the expected outcome is only images of that particular dog.
Instance Retrieval (Instance: Object, Building, Person, Place…)
Each image is described by a set of local feature vectors v_1 = (v_11, …, v_1n), …, v_k = (v_k1, …, v_kn).
INVERTED FILE
word | image IDs
1    | 1, 12, …
2    | 1, 30, 102
3    | 10, 12
4    | 2, 3
6    | 10
...
Local hand-crafted features (e.g. SIFT) in an N-dimensional feature space are quantized into a Bag of Visual Words: a high-dimensional, highly sparse representation.
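The quantize-then-index pipeline above can be sketched in a few lines; toy 2-D "descriptors" and a 3-word codebook stand in for SIFT features and a real vocabulary of thousands of words:

```python
import numpy as np

# toy 2-D descriptors and a 3-word codebook (real systems use SIFT + large vocabularies)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

def quantize(descriptors, codebook):
    # hard-assign each local descriptor to its nearest visual word (codebook row)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def build_inverted_file(database, codebook):
    # inverted file: visual word -> set of image ids whose descriptors hit it
    inv = {}
    for img_id, descs in database.items():
        for w in set(quantize(descs, codebook).tolist()):
            inv.setdefault(w, set()).add(img_id)
    return inv

database = {
    1: np.array([[0.1, 0.1], [0.9, 1.0]]),  # hits words 0 and 1
    2: np.array([[0.0, 0.9]]),              # hits word 2
}
inv = build_inverted_file(database, codebook)
```

At query time, only the inverted-file rows of the query's visual words need to be visited, which is what makes the sparse representation scalable.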
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In
Advances in neural information processing systems (pp. 1097-1105).
Convolutional Neural Networks
Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In ECCV 2014
Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: an astounding baseline for recognition. In
DeepVision CVPRW 2014
Convolutional Neural Networks: FC layers as global feature representation
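With an FC-layer activation as a global descriptor, retrieval reduces to nearest-neighbour search over normalized vectors. A minimal sketch with made-up 2-D feature vectors standing in for 4096-D fc activations:

```python
import numpy as np

def rank_by_cosine(query, db_feats):
    # query: (d,) fc activation of the query image;
    # db_feats: (n, d) fc activations of the database images
    q = query / np.linalg.norm(query)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity to each database image
    return np.argsort(-sims)      # database indices, best match first

# toy descriptors standing in for fc7 features of three database images
feats = np.array([[1.0, 0.0],
                  [0.7, 0.7],
                  [0.0, 1.0]])
order = rank_by_cosine(np.array([1.0, 0.1]), feats)
```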
Babenko, A., & Lempitsky, V. (2015). Aggregating local deep features for image retrieval. ICCV 2015
Tolias, G., Sicre, R., & Jégou, H. (2016). Particular object retrieval with integral max-pooling of CNN activations. ICLR 2016
Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv
preprint arXiv:1512.04065.
Convolutional Neural Networks: sum/max-pooled conv features as global representation
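Spatially pooling a conv layer's activation maps gives one value per channel, i.e. a compact global descriptor. A minimal sketch of the idea (the tiny 2-channel map is a stand-in for real conv activations):

```python
import numpy as np

def pooled_descriptor(conv_maps, mode="sum"):
    # conv_maps: (channels, h, w) activations of the last conv layer;
    # sum- or max-pool over the spatial dimensions -> one value per channel
    if mode == "sum":
        v = conv_maps.sum(axis=(1, 2))
    else:
        v = conv_maps.max(axis=(1, 2))
    return v / np.linalg.norm(v)  # L2-normalize before comparing images

# toy activation volume: 2 channels of 2x3 spatial maps
maps = np.arange(12, dtype=float).reshape(2, 2, 3)
desc = pooled_descriptor(maps, mode="max")
```

The cited papers refine this basic pooling with channel/spatial weighting (CroW) or region-wise max-pooling (R-MAC).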
Ng, J., Yang, F., & Davis, L. (2015). Exploiting local features from deep networks for image retrieval. In DeepVision CVPRW 2015
Convolutional Neural Networks: conv features encoded with VLAD as global representation
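VLAD aggregates local features by summing, per centroid, the residuals of the descriptors assigned to it. A minimal encoder over toy descriptors (real systems use many more centroids and often add extra normalizations):

```python
import numpy as np

def vlad(descriptors, centroids):
    # VLAD: for each centroid, sum the residuals (descriptor - centroid) of
    # the descriptors assigned to it; concatenate and L2-normalize
    k, d = centroids.shape
    assign = np.linalg.norm(
        descriptors[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    v = np.zeros((k, d))
    for i, c in enumerate(assign):
        v[c] += descriptors[i] - centroids[c]
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# toy local conv features and a 2-centroid codebook
cents = np.array([[0.0, 0.0], [1.0, 1.0]])
descs = np.array([[0.2, 0.0], [1.0, 0.8]])
code = vlad(descs, cents)
```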
Pipeline (figure): the input image (336x256) is passed through VGG16 up to conv5_1, yielding 42x32 local features that are quantized against 25K centroids into a sparse 25K-D vector.
Query representation (figure): retrieval can operate as a Global Search (GS) over the full image or as a Local Search (LS).
One lecture organized in four parts: local analysis for Detection, Recognition, Segmentation, and Proposals.
Objects: Segmentation
Slide credit: Eduard Fontdevila
Semantic segmentation: assign a category label to all pixels in an image
Objects: Segmentation: Farabet
Farabet, Clement, Camille Couprie, Laurent Najman, and Yann LeCun. "Learning hierarchical features
for scene labeling." TPAMI 2013
Pyramid of three spatial scales.
The same parameters are shared across the three convnets: θ_1 = θ_2 = θ_3 (filter weights H_l and biases b_l).
Non-linearity: tanh
Pooling: max
Upsampling and concatenation.
Pixel-wise soft-max classifier
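The pixel-wise classification step can be sketched as follows: take the (num_classes, h, w) score maps from the convnet, apply a softmax over the class axis, and label each pixel with the argmax (the toy 2-class logits here stand in for real network output):

```python
import numpy as np

def pixelwise_labels(logits):
    # logits: (num_classes, h, w) class-score maps from the convnet;
    # softmax over the class axis, then argmax -> one label per pixel
    e = np.exp(logits - logits.max(axis=0, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=0, keepdims=True)
    return probs.argmax(axis=0), probs

# toy score maps: 2 classes over a 2x2 image
logits = np.array([[[2.0, 0.0], [0.0, 1.0]],
                   [[0.0, 3.0], [1.0, 0.0]]])
labels, probs = pixelwise_labels(logits)
```

This independent per-pixel decision is exactly what causes the lack of spatial consistency discussed on the next slide.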
Problem: No spatial consistency among labels
3 explored solutions:
1) Superpixels
2) Conditional Random Fields
3) Parameter-free multilevel parsing
Solution 1: Superpixels, with prediction by a 2-layer network.
Solution 2: Superpixels + CRF, with prediction by a 2-layer network.
Solution 3: Multi-level parsing
Problem with Solutions 1 & 2: the observation level is fixed. A Binary Partition Tree (BPT) [Garrido, Salembier] provides a hierarchy of observation levels.
Solution 3: Multi-level parsing
Contribution: automatically discover the best observation level (the optimal cover) for each pixel in the image.
C2 will be labelled with the class of C5. For each pixel (leaf) i, the optimal component C_i is the one along the path between the leaf and the root with minimal cost S.
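The optimal-cover rule can be sketched on a toy tree; node ids and costs here are made up, and `parent` encodes the hierarchy as child → parent pointers:

```python
# Each tree node is a segment with a purity cost S; a pixel (leaf) is
# labelled by the cheapest node on its leaf-to-root path.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}   # node 0 is the root
cost = {0: 0.9, 1: 0.4, 2: 0.7, 3: 0.6, 4: 0.5, 5: 0.2}

def optimal_component(leaf):
    # walk from the leaf up to the root, keeping the node with minimal cost S
    best = node = leaf
    while node in parent:
        node = parent[node]
        if cost[node] < cost[best]:
            best = node
    return best
```

For leaf 3 the path costs are 0.6 → 0.4 → 0.9, so the mid-level node 1 is chosen; for leaf 5 the leaf itself is cheapest.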
Objects: Segmentation: SDS
Hariharan, Arbelaez, Girshick, Malik, Simultaneous Detection and Segmentation (ECCV 2014)
● Interest in obtaining segments, not just bounding boxes
● Multiscale Combinatorial Grouping (MCG) to generate object candidates
○ Normalized cuts algorithm
○ Hierarchical segmenter
○ Grouping strategy to combine multiscale regions
Descriptor A (figure): a bounding-box CNN extracts one feature vector from the box and a second one from the box with the background masked out with the mean image; the two vectors are concatenated. *Since the CNN is finetuned to classify bboxes (with background), extracting features from the region foreground is suboptimal.
Descriptor B (figure): a bounding-box CNN and a region CNN each produce a feature vector, and the two are concatenated.
● Training: the two networks are trained in isolation
● Testing: results are combined
Descriptor C (figure): the same two-network architecture, but:
● Training: as a whole (using segmentation overlap)
● Testing: results are combined (using the output of the penultimate layer)
The output of the penultimate fully connected layer feeds an SVM classifier.
● Results on pixel IU (Jaccard index) to evaluate semantic segmentation:
○ Convert the output of the final system (C+ref) into a pixel-level category labeling (using the pasting scheme of Carreira et al.)
Thank you!
https://imatge.upc.edu/web/people/xavier-giro
https://twitter.com/DocXavi
https://www.facebook.com/ProfessorXavi
xavier.giro@upc.edu
Xavier Giró-i-Nieto