https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients, and difficulty in capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates, shortening the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of baseline RNN models.
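The skip-update idea can be sketched in a few lines of NumPy. This is an illustrative toy with random weights and a fixed skip pattern, not the learned binary gate of the actual Skip RNN model:

```python
import numpy as np

def skip_rnn_step(h, x, W_h, W_x, b, update):
    """One step of a skip-style RNN: when `update` is False the hidden
    state is copied forward unchanged, so the step adds nothing to the
    effective computational graph."""
    if not update:
        return h
    return np.tanh(W_h @ h + W_x @ x + b)  # ordinary RNN update

rng = np.random.default_rng(0)
H, X, T = 4, 3, 10
W_h = 0.1 * rng.normal(size=(H, H))
W_x = 0.1 * rng.normal(size=(H, X))
b = np.zeros(H)

h = np.zeros(H)
n_updates = 0
for t in range(T):
    x = rng.normal(size=X)
    update = (t % 2 == 0)   # fixed pattern here; the real model learns this gate
    h = skip_rnn_step(h, x, W_h, W_x, b, update)
    n_updates += update     # only half the steps perform a full state update
```

In the actual model the gate is a learned function of the hidden state, trained end to end and optionally penalized by a budget term on the number of updates.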
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNNs) will be presented and analyzed in detail to understand the potential of these state-of-the-art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis, or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
An introduction to Deep Learning (DL) concepts, starting with a simple yet complete neural network (no frameworks), followed by aspects of deep neural networks, such as back propagation, activation functions, CNNs, and the AUT theorem. Next, a quick introduction to TensorFlow and Tensorboard, and then some code samples with Scala and TensorFlow.
A fast-paced introduction to Deep Learning (DL) concepts, such as neural networks, back propagation, activation functions, CNNs, RNNs (if time permits), and the CLT/AUT/fixed-point theorems, along with a basic code sample in TensorFlow.
During this session you will learn how to manually create a basic neural network that acts as a classifier, and also the segue from linear regression to a neural network.
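The session's idea can be sketched from scratch: a one-hidden-layer classifier trained with plain gradient descent, no frameworks. All sizes, the learning rate, and the toy data below are illustrative; dropping the hidden layer collapses the model back to logistic regression, which is the segue from linear models:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy binary classification: label 1 when the point lies above the line y = x
X = rng.normal(size=(200, 2))
y = (X[:, 1] > X[:, 0]).astype(float).reshape(-1, 1)

W1, b1 = 0.5 * rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = 0.5 * rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(1000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # backward pass: mean binary cross-entropy gradients
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    # gradient descent updates
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

acc = float(((p > 0.5) == y).mean())
```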
You'll also learn about GANs (Generative Adversarial Networks) for static images as well as voice and, in the former case, their potential impact on self-driving cars.
Anomaly detection using a deep one-class classifier (홍배 김)
- Introduces a variety of anomaly detection methods
- Shows how Support Vector Data Description (SVDD) can simplify the shape of a cluster to make cluster modeling easier
- Presents a method for handling ambiguous points near the boundary
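The hypersphere intuition behind SVDD can be sketched with a crude centroid-plus-radius stand-in. The real SVDD solves a constrained optimization, often in a kernel or learned feature space; everything below is an illustrative simplification:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data clustered around the origin
X_train = rng.normal(size=(500, 2))

# SVDD-style description: a small sphere that contains most of the data
center = X_train.mean(axis=0)
dists = np.linalg.norm(X_train - center, axis=1)
radius = np.quantile(dists, 0.95)   # soft boundary: tolerate 5% slack

def is_anomaly(x):
    """Points falling outside the fitted sphere are flagged as anomalies."""
    return bool(np.linalg.norm(x - center) > radius)
```

Points near the boundary are exactly the ambiguous cases the talk discusses; a margin band around `radius` is one simple way to defer judgment on them.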
An introduction to Deep Learning concepts, with a simple yet complete neural network, CNNs, followed by rudimentary concepts of Keras and TensorFlow, and some simple code fragments.
This presentation is Part 2 of my September Lisp NYC presentation on Reinforcement Learning and Artificial Neural Nets. We will continue from where we left off by covering Convolutional Neural Nets (CNN) and Recurrent Neural Nets (RNN) in depth.
Time permitting I also plan on having a few slides on each of the following topics:
1. Generative Adversarial Networks (GANs)
2. Differentiable Neural Computers (DNCs)
3. Deep Reinforcement Learning (DRL)
Some code examples will be provided in Clojure.
After a very brief recap of Part 1 (ANNs & RL), we will jump right into CNNs and their appropriateness for image recognition. We will start by covering the convolution operator, then explain feature maps and pooling operations, and finally walk through the LeNet-5 architecture. The MNIST data will be used to illustrate a fully functioning CNN.
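The convolution and pooling operators described above can be written directly in NumPy. This loop-based sketch favors clarity over speed (real frameworks vectorize it, and, like most of them, it computes cross-correlation rather than a flipped-kernel convolution):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution producing one feature map."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling, shrinking each spatial dimension."""
    H, W = fmap.shape
    return fmap[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1.0, -1.0]])   # tiny horizontal edge detector
fmap = conv2d(img, edge)         # 6x5 feature map
pooled = max_pool(fmap)          # 3x2 after 2x2 pooling
```

Stacking several such kernels gives the multiple feature maps of a convolutional layer; LeNet-5 alternates these conv/pool stages before its fully connected layers.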
Next we cover Recurrent Neural Nets in depth and describe how they have been used in Natural Language Processing. We will explain why gated networks such as LSTMs are used in practice.
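A single LSTM step makes the gating explicit: the forget, input, and output gates control what is kept, written, and read, which is what lets gradients survive long spans. The weight layout and sizes below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step with all four gate pre-activations stacked in W."""
    H = h.size
    z = W @ np.concatenate([x, h]) + b
    f = sigmoid(z[:H])          # forget gate: what to keep of the old cell
    i = sigmoid(z[H:2 * H])     # input gate: what to write
    o = sigmoid(z[2 * H:3 * H]) # output gate: what to expose
    g = np.tanh(z[3 * H:])      # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(1)
X_dim, H_dim = 3, 4
W = 0.1 * rng.normal(size=(4 * H_dim, X_dim + H_dim))
b = np.zeros(4 * H_dim)

h, c = np.zeros(H_dim), np.zeros(H_dim)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=X_dim), h, c, W, b)
```

The additive `f * c + i * g` update is the key design choice: unlike a plain RNN's repeated matrix multiplication, it gives gradients a nearly linear path through time.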
Please note that some exposure or familiarity with Gradient Descent and Backpropagation will be assumed. These are covered in the first part of the talk for which both video and slides are available online.
A lot of material will be drawn from the Deep Learning book by Goodfellow, Bengio, and Courville, as well as Michael Nielsen's online book Neural Networks and Deep Learning and several other online resources.
Bio
Pierre de Lacaze has over 20 years industry experience with AI and Lisp based technologies. He holds a Bachelor of Science in Applied Mathematics and a Master’s Degree in Computer Science.
https://www.linkedin.com/in/pierre-de-lacaze-b11026b/
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often face challenges like slow inference, vanishing gradients and difficulty in capturing long term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model which extends existing RNN models by learning to skip state updates and shortens the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data anlytics tools.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
An introduction to Deep Learning (DL) concepts, starting with a simple yet complete neural network (no frameworks), followed by aspects of deep neural networks, such as back propagation, activation functions, CNNs, and the AUT theorem. Next, a quick introduction to TensorFlow and Tensorboard, and then some code samples with Scala and TensorFlow.
A fast-paced introduction to Deep Learning (DL) concepts, such as neural networks, back propagation, activation functions, CNNs, RNNs (if time permits), and the CLT/AUT/fixed-point theorems, along with a basic code sample in TensorFlow.
During this session you will learn how to manually create a basic neural network that acts as a classifier, and also the segue from linear regression to a neural network.
You'll also learn about GANs (Generative Adversarial Networks) for static images as well as voice, and, in the former case, their potential impact on self-driving cars.
Anomaly detection using a deep one-class classifier, by 홍배 김
- Introduces various methods for anomaly detection
- Shows how Support Vector Data Description (SVDD) can simplify the shape of a cluster to make cluster modeling easier
- Introduces a method for handling ambiguous points near the cluster boundary
An introduction to Deep Learning concepts, with a simple yet complete neural network, CNNs, followed by rudimentary concepts of Keras and TensorFlow, and some simple code fragments.
This presentation is Part 2 of my September Lisp NYC presentation on Reinforcement Learning and Artificial Neural Nets. We will continue from where we left off by covering Convolutional Neural Nets (CNN) and Recurrent Neural Nets (RNN) in depth.
Time permitting I also plan on having a few slides on each of the following topics:
1. Generative Adversarial Networks (GANs)
2. Differentiable Neural Computers (DNCs)
3. Deep Reinforcement Learning (DRL)
Some code examples will be provided in Clojure.
After a very brief recap of Part 1 (ANN & RL), we will jump right into CNN and their appropriateness for image recognition. We will start by covering the convolution operator. We will then explain feature maps and pooling operations and then explain the LeNet 5 architecture. The MNIST data will be used to illustrate a fully functioning CNN.
Next we cover Recurrent Neural Nets in depth and describe how they have been used in Natural Language Processing. We will explain why gated networks and LSTM are used in practice.
Please note that some exposure or familiarity with Gradient Descent and Backpropagation will be assumed. These are covered in the first part of the talk for which both video and slides are available online.
A lot of material will be drawn from the new Deep Learning book by Goodfellow & Bengio, as well as Michael Nielsen's online book on Neural Networks and Deep Learning, and several other online resources.
Bio
Pierre de Lacaze has over 20 years industry experience with AI and Lisp based technologies. He holds a Bachelor of Science in Applied Mathematics and a Master’s Degree in Computer Science.
https://www.linkedin.com/in/pierre-de-lacaze-b11026b/
A comprehensive tutorial on Convolutional Neural Networks (CNN) which talks about the motivation behind CNNs and Deep Learning in general, followed by a description of the various components involved in a typical CNN layer. It explains the theory involved with the different variants used in practice and also, gives a big picture of the whole network by putting everything together.
Next, there's a discussion of the various state-of-the-art frameworks being used to implement CNNs to tackle real-world classification and regression problems.
Finally, the implementation of CNNs is demonstrated by implementing the paper 'Age and Gender Classification Using Convolutional Neural Networks' by Levi and Hassner (2015).
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
Artificial Intelligence, Machine Learning and Deep Learning, by Sujit Pal
Slides for talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group at San Francisco. My part of the talk is covered from slides 19-34.
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
Machine translation and computer vision have greatly benefited from the advances in deep learning. A large and diverse amount of textual and visual data have been used to train neural networks whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses.
The transformer is the neural architecture that has received the most attention in the early 2020's. It removed the recurrency of RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
Machine translation and computer vision have greatly benefited from the advances in deep learning. The large and diverse amount of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io) recorded at CMU in collaboration with UPC, BSC, Gallaudet University and Facebook.
https://imatge.upc.edu/web/publications/sign-language-translation-and-production-multimedia-and-multimodal-challenges-all
https://imatge-upc.github.io/synthref/
Integrating computer vision with natural language processing has achieved significant progress over the last years owing to the continuous evolution of deep learning. A novel vision and language task, which is tackled in the present Master thesis, is referring video object segmentation, in which a language query defines which instance to segment from a video sequence. One of the biggest challenges for this task is the lack of relatively large annotated datasets, since a tremendous amount of time and human effort is required for annotation. Moreover, existing datasets suffer from poor quality annotations in the sense that approximately one out of ten language expressions fails to uniquely describe the target object.
The purpose of the present Master thesis is to address these challenges by proposing a novel method for generating synthetic referring expressions for an image (video frame). This method produces synthetic referring expressions by using only the ground-truth annotations of the objects as well as their attributes, which are detected by a state-of-the-art object detection deep neural network. One of the advantages of the proposed method is that its formulation allows its application to any object detection or segmentation dataset.
By using the proposed method, the first large-scale dataset with synthetic referring expressions for video object segmentation is created, based on an existing large benchmark dataset for video instance segmentation. A statistical analysis and comparison of the created synthetic dataset with existing ones is also provided in the present Master thesis.
The conducted experiments on three different datasets used for referring video object segmentation prove the efficiency of the generated synthetic data. More specifically, the obtained results demonstrate that pre-training a deep neural network with the proposed synthetic dataset improves the ability of the network to generalize across different datasets, without any additional annotation cost.
Master MATT thesis defense by Juan José Nieto
Advised by Víctor Campos and Xavier Giro-i-Nieto.
27th May 2021.
Pre-training Reinforcement Learning (RL) agents in a task-agnostic manner has shown promising results. However, previous works still struggle to learn and discover meaningful skills in high-dimensional state-spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent representation by making use of variational or contrastive techniques. We demonstrate that both allow learning a set of basic navigation skills by maximizing an information theoretic objective. We assess our method in Minecraft 3D maps with different complexities. Our results show that representations and conditioned policies learned from pixels are enough for toy examples, but do not scale to realistic and complex maps. We also explore alternative rewards and input observations to overcome these limitations.
https://imatge.upc.edu/web/publications/discovery-and-learning-navigation-goals-pixels-minecraft
Peter Muschick MSc thesis
Universitat Politècnica de Catalunya, 2020
Sign language recognition and translation has been an active research field in recent years, with most approaches using deep neural networks to extract information from sign language data. This work investigates the mostly disregarded approach of using human keypoint estimation from image and video data with OpenPose, in combination with a transformer network architecture. Firstly, it was shown that it is possible to recognize individual signs (4.5% word error rate (WER)). Continuous sign language recognition, though, was more error prone (77.3% WER), and sign language translation was not possible using the proposed methods, which might be due to low accuracy scores of human keypoint estimation by OpenPose and the accompanying loss of information, or insufficient capacity of the used transformer model. Results may improve with the use of datasets containing higher repetition rates of individual signs, or by focusing more precisely on keypoint extraction of hands.
https://github.com/telecombcn-dl/lectures-all/
These slides review techniques for interpreting the behavior of deep neural networks. The talk reviews basic techniques such as the display of filters and tensors, as well as more advanced ones that try to interpret which part of the input data is responsible for the predictions, or generate data that maximizes the activation of certain neurons.
https://telecombcn-dl.github.io/dlai-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g. robotics, autonomous driving) or decision making (e.g. resource optimization in wireless communication networks). It also advances the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
Giro-i-Nieto, X. One Perceptron to Rule Them All: Language, Vision, Audio and Speech. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7-8).
Tutorial page:
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will first review the basic neural architectures to encode and decode vision, text and audio, and later review those models that have successfully translated information across modalities.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and presents the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from curriculum learning strategies for better and faster training of deep neural networks. This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, inverse schedule sampling is a better option than the classic forward one, and that progressive skipping of frames during training is beneficial, but only when training with the ground truth masks instead of the predicted ones.
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. In recent years, multiple solutions have been proposed to address this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples from the field of multimedia.
Deep neural networks have revolutionized the data analytics scene by improving results in several and diverse benchmarks with the same recipe: learning feature representations from data. These achievements have raised interest across multiple scientific fields, especially in those where large amounts of data and computation are available. This change of paradigm in data analytics has several ethical and economic implications that are driving large investments, political debates and resounding press coverage under the generic label of artificial intelligence (AI). This talk will present the fundamentals of deep learning through the classic example of image classification, and point at how the same principle has been adopted for several tasks. Finally, some of the forthcoming potentials and risks for AI will be pointed out.
More from Universitat Politècnica de Catalunya (20)
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps reduce duplicate computations and thus can also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
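The convergence-skipping idea above can be sketched in plain Python. This is a toy illustration under assumed details (pull-style updates over in-links, a per-vertex convergence threshold); dangling nodes are not handled, and the function name and tolerances are made up for the example.

```python
import numpy as np

def pagerank_skip_converged(adj, d=0.85, tol=1e-10, skip_tol=1e-12, max_iter=100):
    """Power-iteration PageRank that skips recomputing vertices whose
    rank has already converged. adj[u] lists the vertices u links to.
    Assumes no dangling nodes (every vertex has at least one out-link)."""
    n = len(adj)
    r = np.full(n, 1.0 / n)
    converged = np.zeros(n, dtype=bool)
    # Build in-link lists and out-degrees for pull-style computation.
    inlinks = [[] for _ in range(n)]
    outdeg = np.zeros(n)
    for u, outs in enumerate(adj):
        outdeg[u] = len(outs)
        for v in outs:
            inlinks[v].append(u)
    for _ in range(max_iter):
        r_new = r.copy()
        for v in range(n):
            if converged[v]:
                continue  # skip work for already-converged vertices
            r_new[v] = (1 - d) / n + d * sum(r[u] / outdeg[u] for u in inlinks[v])
        delta = np.abs(r_new - r)
        converged |= delta < skip_tol
        r = r_new
        if delta.sum() < tol:
            break
    return r
```

On a symmetric 3-cycle the ranks stay uniform, so every vertex is flagged converged after the first sweep and the loop exits early.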
Adjusting primitives for graphs: SHORT REPORT / NOTES, by Subhajit Sahu
Compressed Sparse Row (CSR) is an adjacency-list based graph representation used by graph algorithms like PageRank.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificial Intelligence)
1. [course site]
Verónica Vilaplana (veronica.vilaplana@upc.edu)
Associate Professor, Universitat Politècnica de Catalunya (Technical University of Catalonia)
Convolutional Neural Networks
Day 5 Lecture 1
#DLUPC
4. Neural networks for visual data
• Example: image recognition. Given some input image, identify which object it contains (e.g. "sunflower", Caltech101 dataset).
• With an image size of 150x112, each neuron is connected to 16800 inputs.
5. Neural networks for visual data
• We can design neural networks that are specifically adapted for such problems:
  • must deal with very high-dimensional inputs: 150 x 112 pixels = 16800 inputs, or 3 x 16800 for RGB pixels
  • can exploit the 2D topology of pixels (or 3D for video data)
  • can build in invariance to certain variations we can expect: translations, illumination, etc.
• Convolutional networks are a specialized kind of neural network for processing data that has a known, grid-like topology. They leverage these ideas:
  • local connectivity
  • parameter sharing
  • pooling / subsampling of hidden units
Slide credit: H. Larochelle
6. Convolutional neural networks: local connectivity
• First idea: use a local connectivity of hidden units
  • each hidden unit is connected only to a subregion (patch) of the input image: its receptive field
  • it is connected to all channels: 1 for a greyscale image, 3 (R, G, B) for a color image
• Solves the following problems:
  • a fully connected hidden layer would have an unmanageable number of parameters
  • computing the linear activations of the hidden units would be very expensive
7. Convolutional neural networks: parameter sharing
• Second idea: share the matrix of parameters across certain units
  • units organized into the same "feature map" share parameters
  • hidden units within a feature map cover different positions in the image
• Solves the following problems:
  • reduces even further the number of parameters
  • extracts the same features at every position (features are "equivariant")
Wij is the matrix connecting the ith input channel with the jth feature map.
8. Convolutional neural networks: parameter sharing
• Each feature map forms a 2D grid of features
• It can be computed with a discrete convolution of a kernel matrix kij, which is the hidden weights matrix Wij with its rows and columns flipped:
  y_j = f( Σ_i k_ij ∗ x_i )
  where x_i is the ith channel of the input and y_j is the jth feature map of the hidden layer.
9. Convolutional neural networks
• Convolution acts as feature extraction (applying a filterbank), but the filters are learned.
10. Convolutional neural networks: pooling and subsampling
• Third idea: pool hidden units in the same neighborhood
  • pooling is performed in non-overlapping neighborhoods (subsampling)
  • an alternative to "max" pooling is "average" pooling
  • pooling reduces dimensionality and provides invariance to small local changes
11. Convolutional Neural Networks
• Convolutional neural networks alternate between convolutional layers (followed by a nonlinearity) and pooling layers (basic architecture)
• For recognition: the output layer is a regular, fully connected layer with softmax nonlinearity
  • the output provides an estimate of the conditional probability of each class
• The network is trained by stochastic gradient descent (and variants); backpropagation is used as in a fully connected network to train the weights of the filters
12. Convolutional Neural Networks
• 'Classic' architecture for object recognition (BoW) vs. CNN: learning hierarchical representations with increasing levels of abstraction
• End-to-end training: joint optimization of features and classifier
• Feature visualization of a convolutional net trained on ImageNet (Zeiler and Fergus, 2013)
13. Example: LeNet-5
• LeCun et al., 1998: MNIST digit classification problem
  • handwritten digits, 60,000 training examples, 10,000 test samples, 10 classes, 28x28 grayscale images
• Conv filters were 5x5, applied at stride 1; sigmoid or tanh nonlinearity
• Subsampling (average pooling) layers were 2x2, applied at stride 2
• Fully connected layers at the end, i.e. the architecture is [CONV-POOL-CONV-POOL-CONV-FC]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE 86(11): 2278–2324, 1998.
15. Convolutional Neural Networks
• A regular 3-layer neural network vs. a ConvNet with 3 layers
• In ConvNets inputs are 'images' (the architecture is constrained)
• A ConvNet arranges neurons in three dimensions (width, height, depth)
• Every layer transforms a 3D input volume to a 3D output volume of neuron activations
Slide credit: Stanford cs231_2017
17. Convolutional layer
• Convolution on a volume: 32x32x3 input (width x height x depth), 5x5x3 filter
• Filters always extend the full depth of the input volume
• Convolve the filter with the input, i.e. slide it over the input spatially, computing dot products
18. Convolutional layer
• Convolution on a volume: 32x32x3 input, 5x5x3 filter w
• Convolve (slide) over all spatial locations to produce an activation map (or feature map)
• Each output is 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the input, i.e. a 5x5x3 = 75-dimensional dot product plus a bias: wᵀx + b
19. Convolutional layer
• Consider a second, green filter: convolving (sliding) it over all spatial locations produces a second activation map
20. Convolutional layer
• If we have 6 5x5x3 filters, we get 6 separate activation maps (or feature maps); we stack the maps up to get a 'new volume' of size 28x28x6
• So applying a filterbank to an input (3D matrix) yields a cube-like output: a 3D matrix in which each slice is the output of the convolution with one filter
21. Convolutional layer
• A ConvNet is a sequence of convolutional layers, interspersed with activation functions and pooling layers (and a small number of fully connected layers)
• We add more layers of filters: we apply filters (convolutions) to the output volume of the previous layer, and the result of each convolution is a slice in the new volume
22. Example: filters and activation maps
• Example CNN trained for image recognition on the CIFAR dataset
• The network learns features that activate when they see some specific type of feature at some spatial position in the input
• Stacking the activation maps for all filters along the depth dimension forms the full output volume
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
23. Convolution
• 7x7 input (spatially), 3x3 filter: 5x5 output (stride 1)
• 7x7 input (spatially), 3x3 filter applied with stride 2: 3x3 output
• Hyperparameters: filter spatial extent F, stride S, padding P
• Stride is the number of pixels by which we slide the filter matrix over the input matrix; a larger stride will produce smaller feature maps
24. Convolution
• Output size: (N - F) / stride + 1
  • e.g. N=7, F=3: stride 1: (7-3)/1 + 1 = 5; stride 2: (7-3)/2 + 1 = 3; stride 3: (7-3)/3 + 1 = 2.33, not applied
• Common: zero-padding in the border. E.g. N=7, F=3, stride 1, padded with a 1-pixel border: output size 7x7
• In general it is common to see CONV layers with stride 1, filters FxF, and zero-padding with P = (F-1)/2, which preserves the size spatially
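The output-size rule above is easy to encode. A small helper (hypothetical name, not from the course) that mirrors the (N - F + 2P)/S + 1 arithmetic, including the "not applied" case when the stride does not fit:

```python
def conv_output_size(N, F, S=1, P=0):
    """Spatial output size of a convolution: (N - F + 2P) / S + 1.
    Raises when the filter placement does not divide evenly (the
    'not applied' case on the slide)."""
    num = N - F + 2 * P
    if num % S != 0:
        raise ValueError("filter does not fit: stride leaves a remainder")
    return num // S + 1
```

For instance, N=7, F=3 gives 5 at stride 1 and 3 at stride 2, while stride 3 raises; with a 1-pixel border (P=1) the 7x7 size is preserved.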
25. 1x1 convolutions
• Example: 1x1 conv with 32 filters; each filter has size 1x1x64 and performs a 64-dimensional dot product
• 1x1 convolution layers are used to reduce dimensionality (the number of feature maps)
26. Example: size, parameters
• Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2
• Output volume size: (N + 2P - F) / S + 1 = (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10
• Number of parameters in this layer: each filter has 5x5x3 + 1 = 76 params (+1 for the bias) -> 76x10 = 760 parameters
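The parameter count above generalizes to any conv layer. A one-line helper (hypothetical name) for the (F·F·D1 + 1)·K arithmetic:

```python
def conv_layer_params(F, D1, K):
    """Learnable parameters in a conv layer: each of the K filters has
    F*F*D1 weights plus one bias."""
    return (F * F * D1 + 1) * K
```

The slide's example, 10 filters of 5x5 over depth 3, gives (5·5·3 + 1)·10 = 760.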
27. Summary: conv layer
To summarize, the Conv Layer:
• Accepts a volume of size W1 x H1 x D1
• Requires four hyperparameters:
  • number of filters K
  • their spatial extent F
  • the stride S
  • the amount of zero padding P
• Produces a volume of size W2 x H2 x D2, where
  • W2 = (W1 - F + 2P) / S + 1
  • H2 = (H1 - F + 2P) / S + 1
  • D2 = K
• With parameter sharing, it introduces F·F·D1 weights per filter, for a total of (F·F·D1)·K weights and K biases
• In the output volume, the d-th depth slice (of size W2 x H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offsetting by the d-th bias
Common settings: K = powers of 2 (32, 64, 128, 256); F=3, S=1, P=1; F=5, S=1, P=2; F=5, S=2, P = whatever fits; F=1, S=1, P=0
28. Activation functions
• Desirable properties: mostly smooth, continuous, differentiable, fairly linear
• Sigmoid: σ(x) = 1 / (1 + e^(-x))
• tanh: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
• ReLU: max(0, x)
• Leaky ReLU: max(0.1x, x)
• Maxout: max(w1ᵀx + b1, w2ᵀx + b2)
• ELU
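The activation functions listed above are one-liners in NumPy. A minimal sketch (Maxout is omitted since it needs learned weights; the ELU scale a=1 is an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                      # squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)              # max(0, x)

def leaky_relu(x):
    return np.maximum(0.1 * x, x)          # max(0.1x, x)

def elu(x, a=1.0):
    return np.where(x > 0, x, a * (np.exp(x) - 1))  # smooth for x < 0
```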
30. Pooling layer
• Makes the representations smaller and more manageable for later layers
• Useful to get invariance to small local changes
• Operates over each activation map independently
31. Pooling layer
• Max pooling: on a single depth slice, e.g. max pool with 2x2 filters and stride 2
• Other pooling functions: average pooling or L2-norm pooling
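Max pooling on a single depth slice, as illustrated above (2x2 filters, stride 2), can be sketched as a toy NumPy function (hypothetical name):

```python
import numpy as np

def max_pool(x, F=2, S=2):
    """Max pooling over a single depth slice (an H x W array):
    take the maximum of each F x F window, moving with stride S."""
    H = (x.shape[0] - F) // S + 1
    W = (x.shape[1] - F) // S + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = x[i * S:i * S + F, j * S:j * S + F].max()
    return out
```

A 4x4 slice pools down to 2x2, keeping the largest value of each non-overlapping 2x2 window.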
33. Summary: pooling layer
To summarize, the pooling layer:
• Accepts a volume of size W1 x H1 x D1
• Requires two hyperparameters:
  • the spatial extent F
  • the stride S
• Produces a volume of size W2 x H2 x D2, where:
  • W2 = (W1 − F) / S + 1
  • H2 = (H1 − F) / S + 1
  • D2 = D1
• Introduces zero parameters, since it computes a fixed function of the input
Common settings: F=2, S=2; F=3, S=2
S. Credit: Stanford cs231_2017
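A direct NumPy sketch of the layer summarized above (depth slices pooled independently, no parameters; the function name is chosen here):

```python
import numpy as np

def max_pool(x, F=2, S=2):
    """Max pooling: (W1, H1, D1) -> (W2, H2, D1) with
    W2 = (W1 - F)/S + 1, operating on each depth slice independently."""
    W1, H1, D1 = x.shape
    W2, H2 = (W1 - F) // S + 1, (H1 - F) // S + 1
    out = np.empty((W2, H2, D1))
    for i in range(W2):
        for j in range(H2):
            # max over the F x F window, separately per depth slice
            out[i, j] = x[i*S:i*S+F, j*S:j*S+F].max(axis=(0, 1))
    return out

x = np.random.randn(4, 4, 3)
assert max_pool(x).shape == (2, 2, 3)   # common setting F=2, S=2
```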
34. Fully connected layer
• At the end it is common to add one or more fully (or densely) connected layers.
• Every neuron in the previous layer is connected to every neuron in the next layer (as in regular neural networks). The activation is computed as a matrix multiplication plus a bias.
• At the output, a softmax activation is used for classification.
(Figure: a dense layer with 4 possible outputs; connections and weights not shown.)
35. Fully connected layers and convolutional layers
• A convolutional layer can be implemented as a fully connected layer
• The weight matrix is a large matrix that is mostly zero except for certain blocks (due to local connectivity)
(Figure: input image I, 4x4, vectorized to 16x1; 3x3 kernel h; weight matrix C, 16x4; output image O, 4x1, later reshaped to 2x2.)
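The figure's construction can be reproduced in a few lines: a valid 3x3 convolution over a 4x4 image equals a matrix-vector product with a mostly-zero weight matrix. (A sketch; the matrix here is built as 4x16 acting on vec(I), i.e. the transpose of the 16x4 orientation in the figure, and convolution is written as cross-correlation, the usual CNN convention.)

```python
import numpy as np

I = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 input image
h = np.random.randn(3, 3)                      # 3x3 kernel

# Direct valid convolution: 2x2 output
direct = np.array([[(I[r:r+3, c:c+3] * h).sum() for c in range(2)]
                   for r in range(2)])

# Same operation as a matrix product: one row per output pixel,
# with the 9 kernel weights scattered into the right columns
C = np.zeros((4, 16))
for r in range(2):
    for c in range(2):
        for i in range(3):
            for j in range(3):
                C[r * 2 + c, (r + i) * 4 + (c + j)] = h[i, j]

as_matmul = (C @ I.reshape(16)).reshape(2, 2)
assert np.allclose(direct, as_matmul)
```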
37. Fully connected layers and convolutional layers
• Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region
• Example: a fully connected layer with K=4096 neurons and input volume 32x32x512 can be expressed as a convolutional layer with F=32 (kernel size), P=0, S=1, K=4096 (number of filters). The filter size is exactly the size of the input volume, so the output is 1x1x4096.
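The equivalence in the example can be verified numerically on a scaled-down version (4x4x3 input, K=2 instead of 32x32x512, K=4096; a sketch with names chosen here):

```python
import numpy as np

x = np.random.randn(4, 4, 3)           # tiny input volume (W x H x D)
W = np.random.randn(2, 4 * 4 * 3)      # FC weight matrix, K x (W*H*D)

# Fully connected: flatten and multiply
fc_out = W @ x.reshape(-1)             # shape (2,)

# Same weights viewed as 2 conv filters of spatial extent F = 4 = input
# size: each filter covers the whole volume, giving a 1 x 1 x K output
conv_out = np.array([(W[k].reshape(4, 4, 3) * x).sum() for k in range(2)])
assert np.allclose(fc_out, conv_out)
```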
38. Batch normalization layer
• As learning progresses, the distribution of the layer inputs changes due to parameter updates (internal covariate shift)
• This can result in most inputs being in the non-linear regime of the activation function, slowing down learning
• Batch normalization is a technique to reduce this effect
• Explicitly force the layer activations to have zero mean and unit variance w.r.t. running batch estimates:
  x̂(k) = (x(k) − E[x(k)]) / sqrt(Var[x(k)])
• Adds a learnable scale and bias term to allow the network to still use the nonlinearity:
  y(k) = γ(k) x̂(k) + β(k)
Typical placement: FC / Conv → Batch norm → ReLU → FC / Conv → Batch norm → ReLU
Ioffe and Szegedy, 2015. "Batch normalization: accelerating deep network training by reducing internal covariate shift"
39. Upsampling layers: recovering spatial shape
• Motivation: semantic segmentation. Make predictions for all pixels at once
• Problem: convolutions at the original image resolution would be very expensive
S. Credit: Stanford cs231_2017
40. Upsampling layers: recovering spatial shape
• Motivation: semantic segmentation. Make predictions for all pixels at once
• Design a network as a sequence of convolutional layers with downsampling and upsampling
• Other applications: super-resolution, flow estimation, generative modeling
S. Credit: Stanford cs231_2017
41. Learnable upsampling
• Recall: 3x3 convolution, stride 1, pad 1
S. Credit: Stanford cs231_2017
42. Learnable upsampling
• Recall: 3x3 convolution, stride 2, pad 1
• The filter moves 2 pixels in the input for every one pixel in the output: the stride gives the ratio between movement in input and output
S. Credit: Stanford cs231_2017
43. Learnable upsampling: transposed convolution
• 3x3 transposed convolution, stride 2, pad 1
• Sum where the outputs overlap
• The filter moves 2 pixels in the output for every one pixel in the input: the stride gives the ratio between movement in input and output
S. Credit: Stanford cs231_2017
44. Learnable upsampling: transposed convolution
• 1D example
• Other names: transposed convolution, backward strided convolution, fractionally strided convolution, upconvolution, "deconvolution"
More info:
Dumoulin et al., A guide to convolution arithmetic for deep learning, 2016. https://github.com/vdumoulin/conv_arithmetic
Shi, Is the deconvolution layer the same as a convolutional layer?, 2016
S. Credit: Stanford cs231_2017
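The 1D case is compact enough to write out: each input element scatters a scaled copy of the filter into the output, and overlapping contributions are summed, as described on the previous slide (a sketch with names chosen here; no padding/cropping):

```python
import numpy as np

def transposed_conv1d(x, w, stride=2):
    """1D transposed convolution: input element x[i] adds x[i] * w into
    the output starting at position i * stride; overlaps are summed."""
    out = np.zeros((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        out[i * stride : i * stride + len(w)] += xi * w
    return out

x = np.array([1.0, 2.0])
w = np.array([1.0, 1.0, 1.0])
transposed_conv1d(x, w, stride=2)
# -> [1., 1., 3., 2., 2.]  (the overlap at index 2 sums 1 + 2)
```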
45. Learnable upsampling
It is always possible to emulate a transposed convolution with a direct convolution (with fractional stride).
More info: Dumoulin et al., A guide to convolution arithmetic for deep learning, 2016. https://github.com/vdumoulin/conv_arithmetic
46. Learnable upsampling
1D example: a 1D convolution with stride 2 (input size 8, filter size 4, output size 5), a 1D transposed convolution with stride 2, and a 1D subpixel convolution with stride 1/2.
Transposed convolution vs. fractionally strided convolution: the two operators can achieve the same result if the filters are learned.
Shi, Is the deconvolution layer the same as a convolutional layer?, 2016
47. Dilated convolutions
• Systematically aggregate multi-scale contextual information without losing resolution
• Usual convolution: (F ∗ k)(p) = Σ_t F(p − t) k(t)
• Dilated convolution: (F ∗_l k)(p) = Σ_t F(p − l·t) k(t)
• Stacking layers with exponentially increasing dilation, F_{i+1} = F_i ∗_{2^i} k_i for i = 0, 1, 2, ..., n − 2: the receptive field grows exponentially as you add more layers, while the number of parameters increases only linearly
Yu and Koltun, Multi-scale context aggregation by dilated convolutions, 2016
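The dilated-convolution formula above can be implemented directly in 1D (a sketch with names chosen here; valid positions only, no padding). Note that dilation l stretches the receptive field to l·(len(k) − 1) + 1 while the number of weights stays len(k):

```python
import numpy as np

def dilated_conv1d(F, k, l):
    """(F *_l k)(p) = sum_t F(p - l*t) k(t), evaluated at valid p."""
    T = len(k)
    span = l * (T - 1)   # receptive field is span + 1 samples wide
    return np.array([sum(F[p - l * t] * k[t] for t in range(T))
                     for p in range(span, len(F))])

F = np.arange(8.0)
k = np.array([1.0, 1.0, 1.0])
dilated_conv1d(F, k, l=1)   # l=1 is the usual convolution (field of 3)
dilated_conv1d(F, k, l=2)   # l=2 skips every other sample (field of 5)
```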
50. AlexNet (2012)
• Similar framework to LeNet:
  • 8 parameter layers (5 convolutional, 3 fully connected)
  • Max pooling, ReLU nonlinearities
  • 650,000 units, 60 million parameters
  • Trained on two GPUs for a week
  • Dropout regularization
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
51. AlexNet (2012)
Full AlexNet architecture: 8 layers
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
Details/retrospectives:
- first use of ReLU
- used Norm layers (not common anymore)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD momentum 0.9
- learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay 5e-4
ILSVRC 2012 winner; 7-CNN ensemble: 18.2% -> 15.4%
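The spatial sizes listed above all follow from the conv-layer formula (N − F + 2P)/S + 1 from the earlier summary slide; a quick check:

```python
def conv_out(N, F, S, P):
    """Spatial output size of a conv or pooling layer."""
    return (N - F + 2 * P) // S + 1

# Checking the listed AlexNet shapes against the formula:
assert conv_out(227, F=11, S=4, P=0) == 55   # CONV1: 227 -> 55
assert conv_out(55, F=3, S=2, P=0) == 27     # POOL1: 55 -> 27
assert conv_out(27, F=5, S=1, P=2) == 27     # CONV2 keeps 27
assert conv_out(27, F=3, S=2, P=0) == 13     # POOL2: 27 -> 13
assert conv_out(13, F=3, S=1, P=1) == 13     # CONV3-5 keep 13
assert conv_out(13, F=3, S=2, P=0) == 6      # POOL3: 13 -> 6
```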
52. AlexNet (2012)
• Visualization of the 96 11x11 filters learned by the first layer
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
53. VGGNet-16 (2014)
• A sequence of deeper nets trained progressively
• Large receptive fields replaced by stacks of 3x3 convolutions
• Only 3x3 CONV with stride 1, pad 1 and 2x2 MAX POOL with stride 2
• Shows that depth is a critical component for good performance
• TOTAL memory: 24M * 4 bytes ≈ 93 MB per image (forward only; roughly 2x for backward). Most memory is in the early CONV layers
• TOTAL params: 138M parameters. Most parameters are in the late FC layers
K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
54. GoogLeNet (2014)
• Inception module that dramatically reduced the number of parameters
• Uses average pooling instead of FC layers
• Compared to AlexNet: 12x fewer parameters (5M vs 60M), 2x more compute, 6.67% error (vs 16.4%)
• 22 layers; 9 inception modules: a network in a network…
(Diagram legend: convolution, pooling, softmax, other)
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
55. GoogLeNet (2014)
• The Inception module
• Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps
• Uses 1x1 convolutions for dimensionality reduction before expensive convolutions
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
56. GoogLeNet (2014)
• Auxiliary classifiers: two softmax classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time
• …and no fully connected layers needed!
(Diagram legend: convolution, pooling, softmax, other)
C. Szegedy et al., Going deeper with convolutions, CVPR 2015
57. Inception v2, v3, v4
• Regularize training with batch normalization, reducing the importance of the auxiliary classifiers
• More variants of inception modules with aggressive factorization of filters
C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016
58. ResNet (2015)
ILSVRC 2015 winner (3.6% top-5 error)
MSRA at the ILSVRC & COCO 2015 competitions:
- ImageNet Classification: "ultra deep", 152 layers
- ImageNet Detection: 16% better than 2nd
- ImageNet Localization: 27% better than 2nd
- COCO Detection: 11% better than 2nd
- COCO Segmentation: 12% better than 2nd
2-3 weeks of training on an 8-GPU machine; at run time, faster than a VGGNet (even though it has 8x more layers). Spatial dimension only 56x56!
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
59. ResNet (2015)
Plain net training details:
- Batch normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + momentum (0.9)
- Learning rate 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
Residual net:
- Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
- Introduces skip or shortcut connections (existing before in the literature)
- Makes it easy for net layers to represent the identity mapping
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
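The residual reformulation above is tiny in code: the stacked layers model a residual F(x) and the block outputs F(x) + x, so the identity mapping only requires driving F toward zero (a minimal sketch with names chosen here; real blocks use conv + batch norm layers):

```python
import numpy as np

def residual_block(x, layers):
    """Output F(x) + x, where F is the composition of `layers`.
    The shortcut connection carries x past the stacked layers."""
    fx = x
    for layer in layers:
        fx = layer(fx)
    return fx + x          # skip / shortcut connection

relu = lambda z: np.maximum(0.0, z)
x = np.random.randn(8)
# With zero weights, F(x) = 0 and the block is exactly the identity:
zero_layer = lambda z: relu(0.0 * z)
assert np.allclose(residual_block(x, [zero_layer]), x)
```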
60. ResNet (2015)
• Deeper residual module (bottleneck)
• Directly performing 3x3 convolutions with 256 feature maps at input and output: 256 x 256 x 3 x 3 ≈ 600K operations
• Using 1x1 convolutions to reduce 256 to 64 feature maps, followed by 3x3 convolutions, followed by 1x1 convolutions to expand back to 256 maps:
  256 x 64 x 1 x 1 ≈ 16K
  64 x 64 x 3 x 3 ≈ 36K
  64 x 256 x 1 x 1 ≈ 16K
  Total: ≈ 70K
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
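The operation counts above are just the products written out; computing them exactly shows the roughly 8x saving of the bottleneck design:

```python
# Multiply counts per spatial position for the two designs above
direct = 256 * 256 * 3 * 3                  # plain 3x3 conv on 256 maps
bottleneck = (256 * 64 * 1 * 1 +            # 1x1 reduce 256 -> 64  (~16K)
              64 * 64 * 3 * 3 +             # 3x3 conv on 64 maps   (~36K)
              64 * 256 * 1 * 1)             # 1x1 expand 64 -> 256  (~16K)
print(direct, bottleneck)                   # 589824 (~600K) vs 69632 (~70K)
```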
61. Deeper models
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016
62. ILSVRC 2012-2015 summary
Team | Year | Place | Error (top-5) | External data
SuperVision – Toronto (AlexNet, 7 layers) | 2012 | - | 16.4% | no
SuperVision | 2012 | 1st | 15.3% | ImageNet 22k
Clarifai – NYU (7 layers) | 2013 | - | 11.7% | no
Clarifai | 2013 | 1st | 11.2% | ImageNet 22k
VGG – Oxford (16 layers) | 2014 | 2nd | 7.32% | no
GoogLeNet (19 layers) | 2014 | 1st | 6.67% | no
ResNet (152 layers) | 2015 | 1st | 3.57% |
Human expert* | | | 5.1% |
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
63. Summary
• Convolutional neural networks are a specialized kind of neural network for processing data that has a known, grid-like topology
• CNNs leverage these ideas:
  • local connectivity
  • parameter sharing
  • pooling / subsampling of hidden units
• Layers: convolutional, non-linear activation, pooling, upsampling, batch normalization
• Architectures for object recognition in images