This document provides an overview of deep learning techniques including:
- Greedy layer-wise training of deep architectures using techniques like deep belief networks, stacked denoising auto-encoders, and stacked predictive sparse coding.
- Unsupervised pre-training of these networks provides better initialization than random initialization, allowing deep networks to be trained effectively.
- Applications of deep learning discussed include vision, audio, and language processing.
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision - zukun
The document summarizes deep learning methods for computer vision, specifically discussing restricted Boltzmann machines (RBMs), deep belief networks (DBNs), and their training algorithms. RBMs and DBNs are unsupervised learning models that can learn feature representations from unlabeled data. DBNs are deep neural networks composed of multiple RBMs trained in a greedy layer-wise fashion. The training algorithms, like contrastive divergence, maximize the likelihood of the data to learn the model parameters.
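The contrastive-divergence update mentioned above can be sketched in a few lines of numpy. This is a minimal CD-1 illustration on a toy binary RBM, not code from the slides; the layer sizes, learning rate, and toy data are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM: 6 visible units, 4 hidden units.
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch of binary vectors."""
    # Positive phase: hidden probabilities given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Approximate gradient of the data log-likelihood.
    n = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return ((v0 - v1_prob) ** 2).mean()  # reconstruction error

# Train on a repeated binary pattern; reconstruction error should fall.
data = np.tile(np.array([1., 0., 1., 0., 1., 0.]), (20, 1))
errors = [cd1_step(data, W, b_vis, b_hid) for _ in range(200)]
print(errors[0], errors[-1])
```

Note that CD-1 only approximates the likelihood gradient; the reconstruction error shown here is a convenient progress indicator, not the quantity being optimized.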
This document provides an overview of autoencoders and their use in unsupervised learning for deep neural networks. It discusses the history and development of neural networks, including early work in the 1940s-1980s and more recent advances in deep learning. It then explains how autoencoders work by setting the target values equal to the inputs, describes variants like denoising autoencoders, and how stacking autoencoders can create deep architectures for tasks like document retrieval, facial recognition, and signal denoising.
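The core idea above, setting the target values equal to the inputs, fits in a short numpy sketch. The 8-3-8 architecture, tanh activation, and learning rate below are arbitrary choices for illustration, not taken from the document.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny autoencoder: 8 -> 3 -> 8, trained so the target equals the input.
n_in, n_code = 8, 3
W_enc = 0.1 * rng.standard_normal((n_in, n_code))
W_dec = 0.1 * rng.standard_normal((n_code, n_in))

def train_step(x, lr=0.05):
    global W_enc, W_dec
    code = np.tanh(x @ W_enc)   # encoder: compress to the bottleneck
    x_hat = code @ W_dec        # linear decoder: reconstruct
    err = x_hat - x             # the target IS the input
    # Backpropagate the squared reconstruction error.
    grad_dec = code.T @ err / len(x)
    grad_code = err @ W_dec.T * (1 - code ** 2)
    grad_enc = x.T @ grad_code / len(x)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
    return (err ** 2).mean()

x = rng.standard_normal((32, n_in))
losses = [train_step(x) for _ in range(500)]
print(losses[0], losses[-1])  # reconstruction loss decreases
```

Because the bottleneck (3 units) is narrower than the input (8 units), the loss cannot reach zero; the network is forced to learn a compressed representation.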
This document provides an overview and literature review of unsupervised feature learning techniques. It begins with background on machine learning and the challenges of feature engineering. It then discusses unsupervised feature learning as a framework to learn representations from unlabeled data. The document specifically examines sparse autoencoders, PCA, whitening, and self-taught learning. It provides details on the mathematical concepts and implementations of these algorithms, including applying them to learn features from images. The goal is to use unsupervised learning to extract features that can enhance supervised models without requiring labeled training data.
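As a quick illustration of the whitening step examined in the document, here is a minimal PCA-whitening sketch in numpy; the toy data and the epsilon value are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated 2-D data: whitening should give identity covariance.
X = rng.standard_normal((1000, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
X = X - X.mean(axis=0)

# PCA whitening: rotate onto the eigenbasis, rescale by 1/sqrt(eigenvalue).
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)
eps = 1e-5  # small constant to avoid dividing by near-zero eigenvalues
X_white = X @ eigvecs / np.sqrt(eigvals + eps)

cov_white = X_white.T @ X_white / len(X_white)
print(np.round(cov_white, 3))  # approximately the identity matrix
```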
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14 - Daniel Lewis
Piotr Mirowski (of Microsoft Bing London) presented Review of Auto-Encoders to the Computational Intelligence Unconference 2014, with our Deep Learning stream. These are his slides. Original link here: https://piotrmirowski.files.wordpress.com/2014/08/piotrmirowski_ciunconf_2014_reviewautoencoders.pptx
He also has a Matlab-based tutorial on auto-encoders available here:
https://github.com/piotrmirowski/Tutorial_AutoEncoders/
(1) The document discusses using autoencoders for image classification. Autoencoders are neural networks trained to encode inputs so they can be reconstructed, learning useful features in the process. (2) Stacked autoencoders and convolutional autoencoders are evaluated on the MNIST handwritten digit dataset. Greedy layerwise training is used to construct deep pretrained networks. (3) Visualization of hidden unit activations shows the features learned by the autoencoders. The main difference between autoencoders and convolutional networks is that convolutional networks have more hardwired topological constraints due to the convolutional and pooling operations.
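The corruption step that distinguishes a denoising autoencoder can be sketched very simply. This masking-noise variant (with an arbitrary drop probability of 0.3) zeroes out random inputs; the network is then trained to reconstruct the clean version.

```python
import numpy as np

rng = np.random.default_rng(6)

# Denoising-autoencoder corruption: zero out a random fraction of each
# input; the reconstruction target remains the *uncorrupted* input.
def mask_corrupt(x, drop_prob=0.3):
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

x = np.ones((4, 10))
x_noisy = mask_corrupt(x)
print(x_noisy.mean())  # roughly 0.7 of the inputs survive
```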
The document introduces autoencoders, which are neural networks that compress an input into a lower-dimensional code and then reconstruct the output from that code. It discusses that autoencoders can be trained using an unsupervised pre-training method called restricted Boltzmann machines to minimize the reconstruction error. Autoencoders can be used for dimensionality reduction, document retrieval by compressing documents into codes, and data visualization by compressing high-dimensional data points into 2D for plotting with different categories colored separately.
Deep Style: Using Variational Auto-encoders for Image Generation - TJ Torres
This document summarizes a presentation about using variational autoencoders for image generation. It discusses using unsupervised deep learning techniques like autoencoders to learn feature representations from image data without labels. Specifically, it covers variational autoencoders, which regularize the training of standard autoencoders by modeling the latent space as a probability distribution rather than a single point. The presentation outlines building and training a simple variational autoencoder model using the Chainer deep learning framework in Python.
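The trick that makes a variational autoencoder trainable, modeling the latent space as a distribution and sampling it via reparameterization, can be sketched without any framework. (The presentation itself uses Chainer; this numpy sketch is only illustrative, and the shapes are invented.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Reparameterization trick: sample z = mu + sigma * eps with eps ~ N(0, I),
# so gradients can flow through mu and log_var during training.
def sample_latent(mu, log_var):
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Closed-form KL divergence between N(mu, sigma^2) and the N(0, I) prior:
# the regularizer that distinguishes a VAE from a plain autoencoder.
def kl_to_standard_normal(mu, log_var):
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))
z = sample_latent(mu, log_var)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape)  # (4, 2)
print(kl)       # zeros: the posterior already equals the prior
```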
Icml2012 learning hierarchies of invariant features - zukun
This document discusses learning hierarchies of invariant features using convolutional neural networks. It describes how convolutional networks build hierarchical representations through multiple stacked layers that each apply normalization, filtering, non-linearity, and pooling operations to learn increasingly complex features. This architecture is inspired by the hierarchical organization of the mammalian visual cortex. The document outlines applications of convolutional networks in areas like computer vision, speech recognition, and natural language processing where they have achieved state-of-the-art performance by learning hierarchical representations from data.
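One such stage (filtering, non-linearity, pooling) can be sketched in plain numpy. The 2x2 edge filter and toy image below are invented for illustration; real convnets learn their filters from data.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D convolution (cross-correlation, as in convnets)."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

# One convnet stage: filtering -> non-linearity -> pooling.
img = np.zeros((8, 8))
img[:, 4] = 1.0                                  # a vertical edge
edge_filter = np.array([[-1., 1.], [-1., 1.]])   # responds to vertical edges
fmap = np.maximum(conv2d_valid(img, edge_filter), 0.0)  # ReLU
pooled = max_pool(fmap)
print(pooled.shape)  # (3, 3)
```

Pooling makes the response locally translation-invariant: shifting the edge by one pixel changes the feature map but can leave the pooled output unchanged.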
Deep learning and neural networks are inspired by biological neurons. Artificial neural networks (ANN) can have multiple layers and learn through backpropagation. Deep neural networks with multiple hidden layers did not work well until recent developments in unsupervised pre-training of layers. Experiments on MNIST digit recognition and NORB object recognition datasets showed deep belief networks and deep Boltzmann machines outperform other models. Deep learning is now widely used for applications like computer vision, natural language processing, and information retrieval.
This document discusses neural networks and deep learning. It provides an agenda that covers what neural networks are, how to train a neural network, unsupervised feature learning, building a handwritten digits classifier, and tips and tricks. It describes how neural networks are inspired by the human brain and are best suited for human-like tasks such as speech and object recognition. It also outlines the processes of feedforwarding, backpropagation, autoencoders, and stacked autoencoders. Recommended links for further learning are also included.
P03 neural networks cvpr2012 deep learning methods for vision - zukun
This document provides an overview of neural networks for computer vision tasks. It discusses using neural networks to build an object recognition system from raw pixels to labels in an end-to-end manner with no distinction between feature extraction and classification. The key ideas are to learn features from data, use differentiable functions to efficiently compute and train features, and use a "deep" architecture of simpler non-linear modules. Building complex functions from simple building blocks like logistic regression allows constructing highly non-linear systems for tasks like vision.
Deep Learning & NLP: Graphs to the Rescue! - Roelof Pieters
This document provides an overview of deep learning and natural language processing techniques. It begins with a history of machine learning and how deep learning advanced beyond early neural networks using methods like backpropagation. Deep learning methods like convolutional neural networks and word embeddings are discussed in the context of natural language processing tasks. Finally, the document proposes some graph-based approaches to combining deep learning with NLP, such as encoding language structures in graphs or using finite state graphs trained with genetic algorithms.
Transfer learning aims to improve learning in a target domain by leveraging knowledge from a related source domain. It is useful when the target domain has limited labeled data. There are several approaches, including instance-based approaches that reweight or resample source instances, and feature-based approaches that learn a transformation to align features across domains. Spectral feature alignment is one technique that builds a graph of correlations between pivot features shared across domains and domain-specific features, then applies spectral clustering to derive new shared features.
The document discusses deep learning and learning hierarchical representations. It makes three key points:
1. Deep learning involves learning multiple levels of representations or features from raw input in a hierarchical manner, unlike traditional machine learning which uses engineered features.
2. Learning hierarchical representations is important because natural data lies on low-dimensional manifolds and disentangling the factors of variation can lead to more robust features.
3. Architectures for deep learning involve multiple levels of non-linear feature transformations followed by pooling to build increasingly abstract representations at each level. This allows the representations to become more invariant and disentangled.
This document provides an overview of artificial neural networks (ANN). It discusses the origin of ANNs from biological neural networks. It describes different ANN architectures like multilayer perceptrons and different learning methods like backpropagation. It also outlines some challenging problems that ANNs can help with, such as pattern recognition, clustering, and optimization. The summary states that while the paper gives a good overview of ANNs, more development is needed to show ANNs are better than other methods for most problems.
Deep learning (also known as deep structured learning or hierarchical learning) is the application of artificial neural networks (ANNs) with more than one hidden layer to learning tasks. Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised, or unsupervised.
The document discusses deep learning in computer vision. It provides an overview of research areas in computer vision including 3D reconstruction, shape analysis, and optical flow. It then discusses how deep learning approaches can learn representations from raw data through methods like convolutional neural networks and restricted Boltzmann machines. Deep learning has achieved state-of-the-art results in applications such as handwritten digit recognition, ImageNet classification, learning optical flow, and generating image captions. Convolutional neural networks have been particularly successful due to properties of shared local weights and pooling layers.
This document summarizes a technical seminar on using convolutional neural networks for P300 detection in brain-computer interfaces. The seminar covers an introduction to brain-computer interfaces and the P300 signal, describes existing P300 detection systems and the convolutional neural network approach, and presents the network architecture, learning process, evaluation results on two datasets showing improved detection rates over other methods, and conclusions. The seminar demonstrates that the convolutional neural network approach outperforms existing methods for P300 detection, especially with a limited number of electrodes or training epochs.
Deep learning is a machine learning technique that uses neural networks with multiple hidden layers between the input and output layers to model high-level abstractions in data. It can perform complex pattern recognition and feature extraction through multiple transformations of the input data. Deep learning techniques like deep neural networks, convolutional neural networks, and deep belief networks have achieved significant performance improvements in areas like computer vision, speech recognition, and natural language processing compared to traditional machine learning methods.
Recurrent Neural Networks have been shown to be very powerful models because they can propagate context over several time steps. This makes them effective for several problems in Natural Language Processing, such as language modelling, tagging problems, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU), and we discuss Bidirectional RNNs with an example. RNN architectures can be considered deep learning systems in which the number of time steps plays the role of network depth. It is also possible to build an RNN with multiple hidden layers, each with recurrent connections from the previous time steps, representing abstraction both in time and in space.
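The vanishing-gradient problem discussed in the presentation can be demonstrated numerically: backpropagating through a plain tanh RNN multiplies the gradient by roughly the same factor at every time step, so its norm shrinks geometrically. The hidden size, sequence length, and weight scale below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Plain RNN forward pass over T steps with random inputs.
n_hidden, T = 16, 50
W_hh = 0.1 * rng.standard_normal((n_hidden, n_hidden))

h = np.zeros(n_hidden)
states = []
for t in range(T):
    h = np.tanh(W_hh @ h + rng.standard_normal(n_hidden))
    states.append(h)

# Backpropagate a unit gradient from the last step to the first:
# each step applies the chain rule through tanh and W_hh.
grad = np.ones(n_hidden)
norms = []
for h in reversed(states):
    grad = W_hh.T @ (grad * (1 - h ** 2))
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the gradient norm decays over 50 steps
```

With larger recurrent weights the same recursion explodes instead of vanishing; gating architectures like LSTM and GRU were designed precisely to keep this product of Jacobians well behaved.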
A comprehensive tutorial on Convolutional Neural Networks (CNNs) that covers the motivation behind CNNs and deep learning in general, followed by a description of the various components involved in a typical CNN layer. It explains the theory behind the different variants used in practice and gives a big picture of the whole network by putting everything together.
Next, there's a discussion of the various state-of-the-art frameworks being used to implement CNNs to tackle real-world classification and regression problems.
Finally, the implementation of CNNs is demonstrated by implementing the paper 'Age and Gender Classification Using Convolutional Neural Networks' by Levi and Hassner (2015).
Article overview: Unsupervised Learning of Visual Structure Using Predictive ... - Ilya Kuzovkin
This set of slides goes over a recent article that tries to tie together the ideas of predictive coding and deep learning. The main point of the article is that a generative system trained on sequential data to predict future samples learns a more "useful" representation than a usual autoencoder. The result resonates with the view that the brain probably relies on predictive mechanisms.
Predictive coding: inhibition in the retina - Jérémie Kalfon
A presentation of the paper by M. V. Srinivasan, S. B. Laughlin, and A. Dubs, "Predictive Coding: A Fresh View of Inhibition in the Retina."
Stable URL:
http://links.jstor.org/sici?sici=0080-4649%2819821122%29216%3A1205%3C427%3APCAFVO%3E2.0.CO%3B2-P
Proceedings of the Royal Society of London. Series B, Biological Sciences is currently published by The Royal Society.
DSRLab seminar: Introduction to deep learning - Poo Kuan Hoong
Deep learning is a subfield of machine learning that has shown tremendous progress in the past 10 years. The success can be attributed to large datasets, cheap computation such as GPUs, and improved machine learning models. Deep learning primarily uses neural networks, interconnected nodes that can perform complex tasks like object recognition. Key deep learning models include Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). CNNs are commonly used for computer vision tasks, while RNNs are well suited for sequential data like text or time series. Deep learning provides benefits like automatic feature learning and robustness, but it also has weaknesses.
This document provides an overview of deep learning in neural networks. It defines deep learning as using artificial neural networks with multiple levels that learn higher-level concepts from lower-level ones. It describes how deep learning networks have many layers that build improved feature spaces, with earlier layers learning simple features that are combined in later layers. Deep learning networks are categorized as unsupervised or supervised, or hybrids. Common deep learning architectures like deep neural networks, deep belief networks, convolutional neural networks, and deep Boltzmann machines are also described. The document explains why GPUs are useful for deep learning due to their throughput-oriented design that speeds up model training.
This document provides an overview of deep learning including:
- Deep learning uses multiple layers of nonlinear processing units for feature extraction and transformation from input data.
- Deep learning architectures like deep neural networks have been applied to fields including computer vision, speech recognition, and natural language processing.
- Training deep networks involves learning features from raw data in an unsupervised manner before fine-tuning in a supervised way using labeled data.
- Popular deep learning models covered include convolutional neural networks, recurrent neural networks, autoencoders, and generative adversarial networks.
- Deep learning has achieved success in applications such as image recognition, generation and style transfer, as well as natural language processing, audio processing, and medical domains.
Deep learning is receiving phenomenal attention due to breakthrough results in several AI tasks and significant research investment by top technology companies like Google, Facebook, Microsoft, and IBM. For someone who has not been introduced to this technology, it may be daunting to learn several concepts such as feature learning, Restricted Boltzmann Machines, and autoencoders all at once before applying them to their own AI applications. This presentation is the first of several in a series intended for practitioners.
This document provides an overview of convolutional neural networks (ConvNets). It begins by briefly introducing deep learning and explaining that ConvNets are a supervised deep learning method. It then discusses how ConvNets learn feature representations directly from data in a hierarchical manner using successive layers that apply filters to local regions of the input. The document provides examples of filters and feature maps and explains how techniques like pooling and multiple filters allow ConvNets to capture different features and build translation invariance. It concludes by discussing how ConvNets can be used for tasks like object detection and examples of popular ConvNet libraries.
MDEC Data Matters Series: Machine Learning and Deep Learning, A Primer - Poo Kuan Hoong
The document provides an overview of machine learning and deep learning. It discusses the history and development of neural networks, including deep belief networks, convolutional neural networks, and recurrent neural networks. Applications of deep learning in areas like computer vision, natural language processing, and robotics are also covered. Finally, popular platforms, frameworks and libraries for developing deep learning models are presented, along with examples of pre-trained models that are available.
Deep Learning Sample Class (Jon Lederman)Jon Lederman
Deep learning uses neural networks that can learn their own features from data. The document discusses the history and limitations of early neural networks like perceptrons that used hand-engineered features. Modern deep learning overcomes these limitations by using hierarchical neural networks that can learn increasingly complex features from raw data through backpropagation and gradient descent. Deep learning networks represent features using tensors, or multidimensional arrays, that are learned from data through training examples.
This document provides an overview of convolutional neural networks (CNNs) and describes a research study that used a two-dimensional heterogeneous CNN (2D-hetero CNN) for mobile health analytics. The study developed a 2D-hetero CNN model to assess fall risk using motion sensor data from 5 sensor locations on participants. The model extracts low-level local features using convolutional layers and integrates them into high-level global features to classify fall risk. The 2D-hetero CNN was evaluated against feature-based approaches and other CNN architectures and performed ablation analysis.
This document summarizes a research paper on convolutional restricted Boltzmann machines (CRBMs) for feature learning. The paper proposes using CRBMs to learn hierarchical local feature detectors in an unsupervised and generative manner. CRBMs extend regular restricted Boltzmann machines to incorporate spatial locality. The learned features are evaluated on handwritten digit and human detection tasks, achieving results comparable to state-of-the-art. The paper contributes an approach to generative feature learning using CRBMs that can capture spatial relationships in images.
ResNet (short for Residual Network) is a deep neural network architecture that has achieved significant advancements in image recognition tasks. It was introduced by Kaiming He et al. in 2015.
The key innovation of ResNet is the use of residual connections, or skip connections, that enable the network to learn residual mappings instead of directly learning the desired underlying mappings. This addresses the problem of vanishing gradients that commonly occurs in very deep neural networks.
In a ResNet, the input data flows through a series of residual blocks. Each residual block consists of several convolutional layers followed by batch normalization and rectified linear unit (ReLU) activations. The original input to a residual block is passed through the block and added to the output of the block, creating a shortcut connection. This addition operation allows the network to learn residual mappings by computing the difference between the input and the output.
By using residual connections, the gradients can propagate more effectively through the network, enabling the training of deeper models. This enables the construction of extremely deep ResNet architectures with hundreds of layers, such as ResNet-101 or ResNet-152, while still maintaining good performance.
ResNet has become a widely adopted architecture in various computer vision tasks, including image classification, object detection, and image segmentation. Its ability to train very deep networks effectively has made it a fundamental building block in the field of deep learning.
Deep learning systems are susceptible to adversarial manipulation through techniques like generating adversarial samples and substitute models. By making small, targeted perturbations to inputs, an attacker can cause misclassifications or reduce a model's confidence without affecting human perception of the inputs. This is possible due to blind spots in how models learn representations that are different from human concepts. Defending against such attacks requires training models with adversarial techniques to make them more robust.
Deep learning techniques like convolutional neural networks (CNNs) and deep neural networks have achieved human-level performance on certain tasks. Pioneers in the field include Geoffrey Hinton, who co-invented backpropagation, Yann LeCun who developed CNNs for image recognition, and Andrew Ng who helped apply these techniques at companies like Baidu and Coursera. Deep learning is now widely used for applications such as image recognition, speech recognition, and distinguishing objects like dogs from cats, often outperforming previous machine learning methods.
This document provides information about a development deep learning architecture event organized by Pantech Solutions and The Institution of Electronics and Telecommunication. The event agenda includes general talks on AI, deep learning libraries, deep learning algorithms like ANN, RNN and CNN, and demonstrations of character recognition and emotion recognition. Details are provided about the organizers Pantech Solutions and IETE, as well as deep learning topics like neural networks, activation functions, common deep learning libraries, algorithms, applications, and the event agenda.
This document provides a summary of topics covered in a deep neural networks tutorial, including:
- A brief introduction to artificial intelligence, machine learning, and artificial neural networks.
- An overview of common deep neural network architectures like convolutional neural networks, recurrent neural networks, autoencoders, and their applications in areas like computer vision and natural language processing.
- Advanced techniques for training deep neural networks like greedy layer-wise training, regularization methods like dropout, and unsupervised pre-training.
- Applications of deep learning beyond traditional discriminative models, including image synthesis, style transfer, and generative adversarial networks.
ResNet, short for "Residual Network," is a type of deep neural network architecture that was introduced by Microsoft researchers in 2015. ResNet is designed to address the problem of vanishing gradients, which can occur in deep neural networks that are many layers deep.
The main innovation in ResNet is the use of residual connections, also known as skip connections. These connections allow information from earlier layers of the network to bypass some of the later layers and be directly fed into the later layers. This helps to ensure that the gradient signal from the output can propagate back through the network during training, which can help to prevent the vanishing gradient problem.
ResNet has been shown to be very effective at image recognition and other computer vision tasks. It has achieved state-of-the-art performance on a number of benchmark datasets, such as ImageNet. Since its introduction, many variations and improvements to the original ResNet architecture have been proposed, including ResNeXt, Wide ResNet, and Residual Attention Network (RANet).
Similar to 2010 deep learning and unsupervised feature learning (20)
AI for Legal Research with applications, toolsmahaffeycheryld
AI applications in legal research include rapid document analysis, case law review, and statute interpretation. AI-powered tools can sift through vast legal databases to find relevant precedents and citations, enhancing research accuracy and speed. They assist in legal writing by drafting and proofreading documents. Predictive analytics help foresee case outcomes based on historical data, aiding in strategic decision-making. AI also automates routine tasks like contract review and due diligence, freeing up lawyers to focus on complex legal issues. These applications make legal research more efficient, cost-effective, and accessible.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Rainfall intensity duration frequency curve statistical analysis and modeling...bijceesjournal
Using data from 41 years in Patna’ India’ the study’s goal is to analyze the trends of how often it rains on a weekly, seasonal, and annual basis (1981−2020). First, utilizing the intensity-duration-frequency (IDF) curve and the relationship by statistically analyzing rainfall’ the historical rainfall data set for Patna’ India’ during a 41 year period (1981−2020), was evaluated for its quality. Changes in the hydrologic cycle as a result of increased greenhouse gas emissions are expected to induce variations in the intensity, length, and frequency of precipitation events. One strategy to lessen vulnerability is to quantify probable changes and adapt to them. Techniques such as log-normal, normal, and Gumbel are used (EV-I). Distributions were created with durations of 1, 2, 3, 6, and 24 h and return times of 2, 5, 10, 25, and 100 years. There were also mathematical correlations discovered between rainfall and recurrence interval.
Findings: Based on findings, the Gumbel approach produced the highest intensity values, whereas the other approaches produced values that were close to each other. The data indicates that 461.9 mm of rain fell during the monsoon season’s 301st week. However, it was found that the 29th week had the greatest average rainfall, 92.6 mm. With 952.6 mm on average, the monsoon season saw the highest rainfall. Calculations revealed that the yearly rainfall averaged 1171.1 mm. Using Weibull’s method, the study was subsequently expanded to examine rainfall distribution at different recurrence intervals of 2, 5, 10, and 25 years. Rainfall and recurrence interval mathematical correlations were also developed. Further regression analysis revealed that short wave irrigation, wind direction, wind speed, pressure, relative humidity, and temperature all had a substantial influence on rainfall.
Originality and value: The results of the rainfall IDF curves can provide useful information to policymakers in making appropriate decisions in managing and minimizing floods in the study area.
Rainfall intensity duration frequency curve statistical analysis and modeling...
2010 deep learning and unsupervised feature learning
1. 1
NIPS 2010 Workshop on
Deep Learning and Unsupervised Feature Learning
Tutorial on Deep Learning and Applications
Honglak Lee
University of Michigan
Co-organizers: Yoshua Bengio, Geoff Hinton, Yann LeCun,
Andrew Ng, and Marc’Aurelio Ranzato
* Includes slide material sourced from the co-organizers
2. 2
Outline
• Deep learning
– Greedy layer-wise training (for supervised learning)
– Deep belief nets
– Stacked denoising auto-encoders
– Stacked predictive sparse coding
– Deep Boltzmann machines
• Applications
– Vision
– Audio
– Language
4. 4
Motivation: why go deep?
• Deep Architectures can be representationally efficient
– Fewer computational units for same function
• Deep Representations might allow for a hierarchy of
representations
– Allows non-local generalization
– Comprehensibility
• Multiple levels of latent variables allow combinatorial
sharing of statistical strength
• Deep architectures work well (vision, audio, NLP, etc.)!
5. 5
Different Levels of Abstraction
• Hierarchical Learning
– Natural progression from low
level to high level structure as
seen in natural complexity
– Easier to monitor what is being
learnt and to guide the machine
to better subspaces
– A good lower level
representation can be used for
many distinct tasks
6. 6
Generalizable Learning
• Shared Low Level
Representations
– Multi-Task Learning
– Unsupervised Training
(Figure: raw input feeding a shared intermediate representation that branches into task 1, task 2, and task 3 outputs; and low-level features partially shared across high-level features feeding task 1 output y1 … task N output yN)
• Partial Feature Sharing
– Mixed Mode Learning
– Composition of
Functions
7. 7
A Neural Network
• Forward Propagation :
– Sum inputs, produce activation, feed-forward
8. 8
A Neural Network
• Training : Back Propagation of Error
– Calculate total error at the top
– Calculate contributions to error at each step going
backwards
(Figure: network diagram with target values t1 and t2 at the output layer)
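The two slides above can be sketched end-to-end. This is a minimal illustration, not code from the tutorial: a two-layer sigmoid network trained with squared error on a toy XOR task; the layer sizes, learning rate, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy task: XOR, with targets T playing the role of t1, t2 on the slide
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights
W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights

def mse():
    return float(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - T) ** 2))

loss_before = mse()
lr = 0.5
for _ in range(3000):
    # Forward propagation: sum inputs, produce activations, feed forward
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)
    # Back propagation: total error at the top, then contributions
    # to the error at each step going backwards
    dY = (Y - T) * Y * (1 - Y)       # output-layer delta (squared error)
    dH = (dY @ W2.T) * H * (1 - H)   # hidden-layer delta
    W2 -= lr * H.T @ dY
    W1 -= lr * X.T @ dH
loss_after = mse()
```

Full-batch gradient descent on this toy problem steadily reduces the training error, which is all the sketch is meant to show.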
9. 9
Deep Neural Networks
• Simple to construct
– Sigmoid nonlinearity for hidden layers
– Softmax for the output layer
• But, backpropagation does not
work well (if randomly initialized)
– Deep networks trained with
backpropagation (without
unsupervised pretraining) perform
worse than shallow networks
(Bengio et al., NIPS 2007)
10. 10
Problems with Back Propagation
• Gradient is progressively getting more dilute
– Below top few layers, correction signal is minimal
• Gets stuck in local minima
– Especially since they start out far from ‘good’
regions (i.e., random initialization)
• In usual settings, we can use only labeled data
– Almost all data is unlabeled!
– The brain can learn from unlabeled data
11. 12
Deep Network Training (that actually works)
• Use unsupervised learning (greedy layer-wise
training)
– Allows abstraction to develop naturally from one layer
to another
– Help the network initialize with good parameters
• Perform supervised top-down training as final step
– Refine the features (intermediate layers) so that they
become more relevant for the task
13. 14
Deep Belief Networks (DBNs)
• Probabilistic generative model
• Deep architecture – multiple layers
• Unsupervised pre-training provides a good
initialization of the network
– maximizing the lower-bound of the log-likelihood
of the data
• Supervised fine-tuning
– Generative: up-down algorithm
– Discriminative: backpropagation
Hinton et al., 2006
15. 17
DBN Greedy training
• First step:
– Construct an RBM with
an input layer v and a
hidden layer h
– Train the RBM
Hinton et al., 2006
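The first step can be sketched with the standard RBM training rule, one-step contrastive divergence (CD-1). This is an assumption-laden sketch, not code from the tutorial: binary units, toy data, and made-up sizes and learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid = 6, 3
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b = np.zeros(n_vis)   # visible biases
c = np.zeros(n_hid)   # hidden biases

# Toy binary data: two repeated patterns
V = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]] * 10, dtype=float)

def recon_mse():
    # Mean-field reconstruction error, just to monitor training
    return float(np.mean((V - sigmoid(sigmoid(V @ W + c) @ W.T + b)) ** 2))

err_before = recon_mse()
lr = 0.1
for _ in range(500):
    # Positive phase: hidden probabilities and a sample given the data
    ph = sigmoid(V @ W + c)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction (CD-1)
    pv = sigmoid(h @ W.T + b)
    ph2 = sigmoid(pv @ W + c)
    # Update: data statistics minus reconstruction statistics
    W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
    b += lr * (V - pv).mean(axis=0)
    c += lr * (ph - ph2).mean(axis=0)
err_after = recon_mse()
```

The same trained RBM is what the next slides stack on: its hidden samples become the visible data for the RBM above.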
16. 18
DBN Greedy training
• Second step:
– Stack another hidden layer on top of the RBM
to form a new RBM
– Fix W1, sample h1 from Q(h1|v) as input.
Train W2 as an RBM.
Hinton et al., 2006
17. 19
DBN Greedy training
• Third step:
– Continue to stack layers on top of the network
(adding W3 and h3); train the new top layer as in
the previous step, with its input sampled from Q(h2|h1)
• And so on…
Hinton et al., 2006
18. 20
Why greedy training works
• RBM specifies P(v,h) from
P(v|h) and P(h|v)
– Implicitly defines P(v) and
P(h)
• Key idea of stacking
– Keep P(v|h) from 1st RBM
– Replace P(h) by the
distribution generated by
2nd level RBM
Hinton et al., 2006
19. 21
Why greedy training works
• Easy approximate inference
– P(hk+1|hk) approximated from the
associated RBM
– Approximation because P(hk+1)
differs between RBM and DBN
• Training:
– Variational bound justifies greedy
layerwise training of RBMs
(Figure: stacked RBMs with weights W1, W2, W3 and top hidden layer h3; Q(h1|v) and Q(h2|h1) are the per-RBM approximations, and the prior over the upper layers is trained by the second-layer RBM)
Hinton et al., 2006
21. 23
Denoising Auto-Encoder
• Corrupt the input (e.g. set 25% of inputs to 0)
• Reconstruct the uncorrupted input
• Use uncorrupted encoding as input to next level
(Figure: raw input → corrupted input → hidden code (representation) → reconstruction, trained to minimize KL(reconstruction | raw input))
(Vincent et al, 2008)
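The recipe above can be sketched as follows. Assumptions not in the slide: tied encoder/decoder weights, sigmoid units, a squared-error reconstruction loss in place of the KL criterion, and invented sizes and hyperparameters; the 25% masking noise matches the example above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = (rng.random((100, 20)) < 0.3).astype(float)   # toy binary "raw input"
W = 0.1 * rng.normal(size=(20, 8))                # tied weights
b_h, b_v = np.zeros(8), np.zeros(20)

def clean_mse():
    # Reconstruction error of the *uncorrupted* input, to monitor training
    R = sigmoid(sigmoid(X @ W + b_h) @ W.T + b_v)
    return float(np.mean((R - X) ** 2))

err_before = clean_mse()
lr = 0.5
for _ in range(300):
    mask = rng.random(X.shape) > 0.25     # corrupt: set ~25% of inputs to 0
    Xc = X * mask
    H = sigmoid(Xc @ W + b_h)             # encode the corrupted input
    R = sigmoid(H @ W.T + b_v)            # reconstruct the uncorrupted input
    dR = (R - X) * R * (1 - R)
    dH = (dR @ W) * H * (1 - H)
    W -= lr * (Xc.T @ dH + dR.T @ H) / len(X)
    b_v -= lr * dR.mean(axis=0)
    b_h -= lr * dH.mean(axis=0)
err_after = clean_mse()

# The uncorrupted encoding is what feeds the next level
features = sigmoid(X @ W + b_h)
```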
22. 24
Denoising Auto-Encoder
• Learns a vector field towards
higher probability regions
• Minimizes variational lower
bound on a generative model
• Corresponds to regularized
score matching on an RBM
(Vincent et al, 2008)
23. 25
Stacked (Denoising) Auto-Encoders
• Greedy layer-wise learning
– Start with the lowest level and stack upwards
– Train each layer of auto-encoder on the intermediate code
(features) from the layer below
– Top layer can have a different output (e.g., softmax non-
linearity) to provide an output for classification
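The layer-wise recipe can be sketched with a plain tied-weight auto-encoder as the per-layer learner. Denoising and the softmax output layer are omitted, and the layer sizes (16 → 12 → 8 → 4) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, iters=200, lr=0.5):
    """Train one tied-weight auto-encoder layer; return (W, b_h)."""
    W = 0.1 * rng.normal(size=(X.shape[1], n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(X.shape[1])
    for _ in range(iters):
        H = sigmoid(X @ W + b_h)
        R = sigmoid(H @ W.T + b_v)
        dR = (R - X) * R * (1 - R)
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (X.T @ dH + dR.T @ H) / len(X)
        b_v -= lr * dR.mean(axis=0)
        b_h -= lr * dH.mean(axis=0)
    return W, b_h

X = (rng.random((50, 16)) < 0.4).astype(float)   # toy data

# Start with the lowest level and stack upwards, training each layer
# on the code (features) produced by the layer below
layers, inp = [], X
for n_hidden in (12, 8, 4):
    W, b_h = train_autoencoder(inp, n_hidden)
    layers.append((W, b_h))
    inp = sigmoid(inp @ W + b_h)

top_features = inp   # would feed a softmax output layer for classification
```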
26. 32
Outline
• Deep learning
– Greedy layer-wise training (for supervised learning)
– Deep belief nets
– Stacked denoising auto-encoders
– Stacked predictive sparse coding
– Deep Boltzmann machines
• Applications
– Vision
– Audio
– Language
27. 33
Predictive Sparse Coding
• Recall the objective function for sparse coding:
• Modify by adding a penalty for prediction error:
– Approximate the sparse code with an encoder
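Under common notation (x the input, D the dictionary, z the sparse code, F(x; W) a trainable feed-forward encoder; the symbols and the penalty weights λ and α are assumptions, not taken from the slides), the two objectives read:

```latex
% Sparse coding: reconstruct input x from dictionary D with sparse code z
\min_{D,\,z}\; \|x - D z\|_2^2 + \lambda \|z\|_1

% PSD: add a penalty tying z to a fast feed-forward encoder F(x; W),
% so that at test time F(x; W) approximates the sparse code directly
\min_{D,\,z,\,W}\; \|x - D z\|_2^2 + \lambda \|z\|_1 + \alpha \|z - F(x; W)\|_2^2
```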
29. 35
Using PSD to Train a Hierarchy of Features
• Phase 1: train first layer using PSD
30. 36
Using PSD to Train a Hierarchy of Features
• Phase 1: train first layer using PSD
• Phase 2: use encoder+absolute value as feature extractor
31. 37
Using PSD to Train a Hierarchy of Features
• Phase 1: train first layer using PSD
• Phase 2: use encoder+absolute value as feature extractor
• Phase 3: train the second layer using PSD
32. 38
Using PSD to Train a Hierarchy of Features
• Phase 1: train first layer using PSD
• Phase 2: use encoder+absolute value as feature extractor
• Phase 3: train the second layer using PSD
• Phase 4: use encoder + absolute value as 2nd feature extractor
33. 39
Using PSD to Train a Hierarchy of Features
• Phase 1: train first layer using PSD
• Phase 2: use encoder+absolute value as feature extractor
• Phase 3: train the second layer using PSD
• Phase 4: use encoder + absolute value as 2nd feature extractor
• Phase 5: train a supervised classifier on top
• Phase 6: (optional): train the entire system with supervised
back-propagation
35. 41
Deep Boltzmann Machines
Slide credit: R. Salakhutdinov
Undirected connections between
all layers
(no connections between the
nodes in the same layer)
Salakhutdinov & Hinton, 2009
36. 42
DBMs vs. DBNs
• In a multi-layer model, the undirected connections
between the layers make it a complete Boltzmann machine.
Salakhutdinov & Hinton, 2009
42. 48
Why Greedy Layer-Wise Training Works
• Regularization Hypothesis
– Pre-training is “constraining” parameters in a
region relevant to unsupervised dataset
– Better generalization
(Representations that better describe unlabeled data are more
discriminative for labeled data)
• Optimization Hypothesis
– Unsupervised training initializes lower-level
parameters near better local minima than
random initialization does
(Bengio 2009, Erhan et al. 2009)
46. 60
Nonlinearities and pooling
• Details of feature processing stage for PSD
Convolution or filtering → Rectification →
Max-pooling → Local contrast normalization
(Jarret et al., 2009)
47. 61
Convolutional DBNs
(Lee et al, 2009; Desjardins and Bengio, 2008; Norouzi et al., 2009)
Convolutional RBM: Generative
training of convolutional structures
(with probabilistic max-pooling)
48. 62
Spatial Pyramid Structure
• Descriptor Layer: detect and locate
features, extract corresponding
descriptors (e.g. SIFT)
• Code Layer: code the descriptors
– Vector Quantization (VQ): each code has
only one non-zero element
– Soft-VQ: small group of elements can be
non-zero
• SPM layer: pool codes across
subregions and average/normalize into
a histogram
(Yang et al., 2009)
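The descriptor/code/SPM stages can be sketched for a single subregion. The codebook and descriptors below are random stand-ins for k-means codewords and SIFT descriptors; only the VQ variant of the code layer is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

codebook = rng.normal(size=(8, 128))      # 8 codewords (e.g. from k-means)
descriptors = rng.normal(size=(50, 128))  # descriptors from one subregion

# Code layer, VQ variant: each code has exactly one non-zero element,
# at the index of the nearest codeword
d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
nearest = d2.argmin(axis=1)
codes = np.zeros((len(descriptors), len(codebook)))
codes[np.arange(len(descriptors)), nearest] = 1.0

# SPM layer, for this one subregion: average the codes into a histogram
hist = codes.mean(axis=0)
```

Soft-VQ would instead spread each descriptor's weight over a small group of nearby codewords before pooling.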
49. 63
Improving the coding step
• Classifiers using these features need
nonlinear kernels
– Increases computational complexity
• Modify the Coding step to produce
feature representations that linear
classifiers can use effectively
– Sparse coding
– Local Coordinate coding
(Yang et al., 2009)
50. 64
Experimental results
• Competitive performance to other state-of-
the-art methods using a single type of
features on object recognition benchmarks
• E.g.: Caltech 101 (30 examples per class)
– Using pixel representation: ~65% accuracy (Jarret
et al., 2009; Lee et al., 2009; and many others)
– Using SIFT representation: 73~75% accuracy (Yang
et al., 2009; Jarret et al., 2009, Boureau et al.,
2010, and many others)
52. 66
Convolutional DBN for audio
(Figure: convolutional DBN applied to a spectrogram — a frequency × time input with detection nodes and max-pooling nodes)
(Lee et al., 2009)
58. 72
Phone recognition results
Method PER
Stochastic Segmental Models 36.0%
Conditional Random Field 34.8%
Large-Margin GMM 33.0%
CD-HMM 27.3%
Augmented conditional Random Fields 26.6%
Recurrent Neural Nets 26.1%
Bayesian Triphone HMM 25.6%
Monophone HTMs 24.8%
Heterogeneous Classifiers 24.4%
Deep Belief Networks (DBNs) 23.0%
Triphone HMMs discriminatively trained w/ BMMI 22.7%
Deep Belief Networks with mcRBM feature extraction 20.5%
(Dahl et al., 2010)
60. 74
Language modeling
• Language Models
– Estimating the probability of the next word w
given a sequence of words
• Baseline approach in NLP
– N-gram models (with smoothing):
• Deep Learning approach
– Bengio et al. (2000, 2003): via Neural network
– Mnih and Hinton (2007): via RBMs
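The n-gram baseline can be illustrated with a bigram model and add-one (Laplace) smoothing, a simple stand-in for the smoothing schemes used in practice; the corpus is a toy.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(corpus))
V = len(vocab)

# Count histories (every token that has a successor) and bigrams
unigrams = Counter(corpus[:-1])
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(word, prev):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

p = p_next("cat", "the")   # = (2 + 1) / (3 + 6)
```

Smoothing matters because unseen bigrams would otherwise get probability zero, which kills the product over a sentence.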
61. 75
Other NLP tasks
• Part-Of-Speech Tagging (POS)
– mark up the words in a text (corpus) as corresponding
to a particular tag
• E.g. Noun, adverb, ...
• Chunking
– Also called shallow parsing
– In the view of the phrase: labeling phrases as syntactic
constituents
• E.g. NP (noun phrase), VP (verb phrase), …
– In the view of the word: labeling each word with its
syntactic role in a phrase
• E.g. B-NP (beginning of NP), I-VP (inside VP), …
(Collobert and Weston, 2009)
62. 76
Other NLP tasks
• Named Entity Recognition (NER)
– In the view of thought group: Given a stream of
text, determine which items in the text map to
proper names
– E.g., labeling “atomic elements” into “PERSON”,
“COMPANY”, “LOCATION”
• Semantic Role Labeling (SRL)
– In the view of sentence: giving a semantic role to a
syntactic constituent of a sentence
– E.g. [John]ARG0 [ate]REL [the apple]ARG1 (Proposition Bank)
• An Annotated Corpus of Semantic Roles (Palmer et al.)
(Collobert and Weston, 2009)
63. 77
A unified architecture for NLP
• Main idea: a unified architecture for NLP
– Deep Neural Network
– Trained jointly with different tasks (feature sharing
and multi-task learning)
– Language model is trained in an unsupervised
fashion
• Show the generality of the architecture
• Improve SRL performance
(Collobert and Weston, 2009)
64. 78
General Deep Architecture for NLP
Basic features (e.g., word,
capitalization, relative position)
Embedding by lookup table
Convolution (i.e., how relevant is each
word to its context?)
Max pooling
Supervised learning
(Collobert and Weston, 2009)
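The lookup-table, convolution, and max-pooling stages above can be sketched as follows. The vocabulary, embedding size, filter width, and feature count are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = {"john": 0, "ate": 1, "the": 2, "apple": 3}
E = rng.normal(size=(len(vocab), 5))   # embedding lookup table, d = 5

sentence = ["john", "ate", "the", "apple"]
X = E[[vocab[w] for w in sentence]]    # lookup: (4 words, 5 dims)

# Convolution over 3-word windows: relates each word to its context
W = rng.normal(size=(3 * 5, 6))
windows = np.stack([X[i:i + 3].ravel() for i in range(len(sentence) - 2)])
C = np.tanh(windows @ W)               # (2 windows, 6 features)

# Max pooling over time yields a fixed-size sentence representation,
# ready for the supervised layers on top
sent_vec = C.max(axis=0)
```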
66. 80
Summary
• Training deep architectures
– Unsupervised pre-training helps training deep
networks
– Deep belief nets, Stacked denoising auto-
encoders, Stacked predictive sparse coding, Deep
Boltzmann machines
• Deep learning algorithms and unsupervised
feature learning algorithms show promising
results in many applications
– vision, audio, natural language processing, etc.
68. 82
References
• B. Olshausen, D. Field. Emergence of Simple-Cell Receptive Field
Properties by Learning a Sparse Code for Natural Images. Nature, 1996.
• H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms.
NIPS, 2007.
• R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng. Self-taught learning:
Transfer learning from unlabeled data. ICML, 2007.
• H. Lee, R. Raina, A. Teichman, and A. Y. Ng. Exponential Family Sparse
Coding with Application to Self-taught Learning. IJCAI, 2009.
• J. Yang, K. Yu, Y. Gong, and T. Huang. Linear Spatial Pyramid Matching
Using Sparse Coding for Image Classification. CVPR, 2009.
• Y. Bengio. Learning Deep Architectures for AI, Foundations and Trends in
Machine Learning, 2009.
• Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise
training of deep networks. NIPS, 2007.
• P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol. Extracting and
composing robust features with denoising autoencoders. ICML, 2008.
69. 83
References
• H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief net model for visual
area V2. NIPS, 2008.
• Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard,
and L. D. Jackel. Backpropagation applied to handwritten zip code
recognition. Neural Computation, 1:541–551, 1989.
• H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief
networks for scalable unsupervised learning of hierarchical
representations. ICML, 2009.
• H. Lee, Y. Largman, P. Pham, and A. Y. Ng. Unsupervised feature learning
for audio classification using convolutional deep belief networks. NIPS,
2009.
• A. R. Mohamed, G. Dahl, and G. E. Hinton, Deep belief networks for phone
recognition. NIPS 2009 workshop on deep learning for speech recognition.
• G. Dahl, M. Ranzato, A. Mohamed, G. Hinton, Phone Recognition with the
Mean-Covariance Restricted Boltzmann Machine, NIPS 2010
• M. Ranzato, A. Krizhevsky, G. E. Hinton, Factored 3-Way Restricted
Boltzmann Machines for Modeling Natural Images. AISTATS, 2010.
70. 84
References
• M. Ranzato, G. E. Hinton. Modeling Pixel Means and Covariances Using
Factorized Third-Order Boltzmann Machines. CVPR, 2010.
• G. Taylor, G. E. Hinton, and S. Roweis. Modeling Human Motion Using
Binary Latent Variables. NIPS, 2007.
• G. Taylor and G. E. Hinton. Factored Conditional Restricted Boltzmann
Machines for Modeling Motion Style. ICML, 2009.
• G. Taylor, R. Fergus, Y. LeCun and C. Bregler. Convolutional Learning of
Spatio-temporal Features. ECCV, 2010.
• K. Kavukcuoglu, M. Ranzato, R. Fergus, and Y. LeCun, Learning Invariant
Features through Topographic Filter Maps. CVPR, 2009.
• K. Kavukcuoglu, M. Ranzato, and Y. LeCun, Fast Inference in Sparse Coding
Algorithms with Applications to Object Recognition. CBLL-TR-2008-12-01,
2008.
• K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, What is the Best
Multi-Stage Architecture for Object Recognition? ICML, 2009.
• R. Salakhutdinov and I. Murray. On the Quantitative Analysis of Deep
Belief Networks. ICML, 2008.
71. 85
References
• R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. AISTATS,
2009.
• K. Yu, T. Zhang, and Y. Gong. Nonlinear Learning using Local Coordinate
Coding, NIPS, 2009.
• J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Learning Locality-
constrained Linear Coding for Image Classification. CVPR, 2010.
• H. Larochelle, Y. Bengio, J. Louradour and P. Lamblin, Exploring Strategies
for Training Deep Neural Networks, JMLR, 2009.
• D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent and S. Bengio,
Why Does Unsupervised Pre-training Help Deep Learning? JMLR, 2010.
• J. Yang, K. Yu, and T. Huang. Supervised Translation-Invariant Sparse
Coding. CVPR, 2010.
• Y. Boureau, F. Bach, Y. LeCun and J. Ponce: Learning Mid-Level Features for
Recognition. CVPR, 2010.
• I. J. Goodfellow, Q. V. Le, A. M. Saxe, H. Lee, and A. Y. Ng. Measuring
invariances in deep networks. NIPS, 2009.
72. 86
References
• Y. Boureau, J. Ponce, Y. LeCun, A theoretical analysis of feature pooling in
vision algorithms. ICML, 2010.
• R. Collobert and J. Weston. A unified architecture for natural language
processing: Deep neural networks with multitask learning. ICML, 2009.
73. 87
References
(1) links to code:
• sparse coding
– http://www.eecs.umich.edu/~honglak/softwares/nips06-sparsecoding.htm
• DBN
– http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
• DBM
– http://web.mit.edu/~rsalakhu/www/DBM.html
• convnet
– http://cs.nyu.edu/~koray/publis/code/eblearn.tar.gz
– http://cs.nyu.edu/~koray/publis/code/randomc101.tar.gz
(2) link to general website on deeplearning:
– http://deeplearning.net