Deep learning has revolutionized the landscape of machine learning in particular and of artificial intelligence in general. Deep neural network models (networks with a large number of layers) have enabled important advances in a variety of learning, perception, and data analysis tasks, ranging from image classification to speech recognition.
The talk will present, in general terms, the foundations of these models and several application cases in representation learning, computer vision, and text analysis, among others. It will review the theoretical and technological advances that have made it possible to tackle these complex problems, and discuss the technological and scientific experience gained in research projects carried out in Colombia.
Presenter: Fabio Gonzalez. Full Professor in the Department of Systems and Industrial Engineering at the Universidad Nacional de Colombia, where he leads the machine learning, perception and automatic discovery laboratory (MindLab). His research focuses on machine learning, information retrieval, and computer vision, with applications in fields as diverse as medical image analysis, automatic text analysis, and learning from multimodal information, among others.
This dissertation examines using neural networks to predict financial time series, specifically the S&P Mib Index of the Milan stock exchange. The document provides background on neural networks, including their history and development from simple linear models to modern multi-layer models. It describes supervised neural networks and their components like activation functions and weights. The dissertation then details training neural network weights using methods like backpropagation and techniques to prevent overfitting. Finally, it applies these concepts in a case study using neural networks to forecast changes in the S&P Mib Index.
With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Thanks to Deep Learning, Artificial Intelligence is now getting smart. Deep Learning models attempt to mimic the activity of the neocortex; the activity of these layers of neurons is understood to be what enables a brain to "think". These models learn to recognize patterns in digital representations of data in a way broadly similar to how humans do. In this survey report, we introduce the most important concepts of Deep Learning along with the state-of-the-art models that are now widely adopted in commercial products.
1. Self-organizing maps (SOMs) are an unsupervised learning algorithm that transforms high-dimensional data into lower dimensions for visualization while preserving its topological properties.
2. The SOM network has an input layer fully connected to an output layer arranged in a grid, with each node containing a weight vector of the same dimension as inputs.
3. During training, the best matching unit (BMU) and its neighbors on the grid have their weight vectors adjusted to better match the input based on their distance from the BMU, with learning rates decreasing over time.
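The training loop in points 1-3 can be sketched in a few lines of plain Python. This is an illustrative toy, not any particular library's implementation; the function and parameter names are our own, and the linear decay schedules are just one common choice:

```python
import math
import random

def train_som(data, grid_w, grid_h, dim, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Toy SOM training: move the BMU and its grid neighbours toward each
    input, with the learning rate and neighbourhood radius shrinking over time."""
    rng = random.Random(seed)
    sigma0 = sigma0 or max(grid_w, grid_h) / 2.0
    # one weight vector per grid node, same dimension as the inputs
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(grid_w) for y in range(grid_h)}
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for v in data:
            lr = lr0 * (1.0 - t / steps)                 # learning rate decays
            sigma = sigma0 * (1.0 - t / steps) + 1e-9    # radius decays
            # best matching unit: node whose weight vector is closest to v
            bmu = min(weights,
                      key=lambda n: sum((w - x) ** 2 for w, x in zip(weights[n], v)))
            for n, w in weights.items():
                d2 = (n[0] - bmu[0]) ** 2 + (n[1] - bmu[1]) ** 2  # grid distance
                h = math.exp(-d2 / (2 * sigma * sigma))            # neighbourhood
                weights[n] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, v)]
            t += 1
    return weights
```

After training on inputs drawn from two well-separated clusters, nearby grid nodes end up with similar weight vectors, which is the topology-preserving behaviour described above.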
The document provides biographical information about Professor Kunihiko Fukushima, a pioneer in the field of neural networks. It describes his invention of the Neocognitron, a hierarchical neural network for deformation invariant pattern recognition. The Neocognitron is able to recognize patterns that have been distorted through partial shifts, rotations, or other transformations. The document also discusses Fukushima's research interests in modeling neural networks to understand visual processing and active vision in the brain.
The increase in the computational power of embedded devices and the latency demands of novel applications have brought a paradigm shift in how and where computation is performed. AI inference is slowly moving from the Cloud to end-devices with limited resources, reducing bandwidth and latency by using compression, distillation of large networks, or quantization methods. While this approach has worked well for regular artificial neural networks, time-centric recurrent networks like Long Short-Term Memory (LSTM) remain too complex to be transferred to embedded devices without extreme simplifications. To solve this issue, the Reservoir Computing paradigm proposes sparse, untrained, non-linear networks, the reservoirs, that can embed temporal relations without some of the hindrances of recurrent neural network training, and with a lower memory occupation. Echo State Networks (ESNs) and Liquid State Machines are the most notable examples. In this scenario, we propose a methodology for ESN design and training based on Bayesian optimization. Our Bayesian learning process efficiently searches for hyper-parameters that maximize a fitness function. At the same time, it respects soft memory and time boundaries, measured empirically on the target device (whether embedded or not) and subject to the user's constraints. Preliminary results show that the system is able to optimize the ESN hyper-parameters under stringent time and memory constraints while obtaining comparable prediction accuracy.
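As a rough illustration of the reservoir idea (not of the paper's Bayesian-optimization methodology), a minimal echo state network keeps a fixed random sparse reservoir and only ever trains a linear readout on the collected states. The names below are ours, and the row-sum scaling is a crude upper bound on the spectral radius, not the eigenvalue-based scaling usually used:

```python
import math
import random

def make_esn(n_in, n_res, density=0.2, scale=0.9, seed=0):
    """Build a fixed random input matrix and a sparse random reservoir."""
    rng = random.Random(seed)
    W_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    W = [[rng.uniform(-1, 1) if rng.random() < density else 0.0
          for _ in range(n_res)] for _ in range(n_res)]
    # crude spectral scaling: the max absolute row sum bounds the spectral
    # radius, so dividing by it keeps the scaled radius below `scale` < 1
    bound = max(sum(abs(w) for w in row) for row in W) or 1.0
    W = [[scale * w / bound for w in row] for row in W]
    return W_in, W

def run_reservoir(W_in, W, inputs, leak=0.3):
    """Collect states x(t+1) = (1 - a) x(t) + a tanh(W_in u + W x(t))."""
    n_res = len(W)
    x = [0.0] * n_res
    states = []
    for u in inputs:
        pre = [sum(wi * ui for wi, ui in zip(W_in[i], u)) +
               sum(W[i][j] * x[j] for j in range(n_res)) for i in range(n_res)]
        x = [(1 - leak) * xi + leak * math.tanh(p) for xi, p in zip(x, pre)]
        states.append(list(x))
    return states
```

The collected states would then be regressed against the targets (e.g. by ridge regression), which is what makes ESN training so much cheaper than backpropagation through time.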
[251] Implementing deep learning using cuDNN - NAVER D2
This document provides an overview of deep learning and implementation on GPU using cuDNN. It begins with a brief history of neural networks and an introduction to common deep learning models like convolutional neural networks. It then discusses implementing deep learning models using cuDNN, including initialization, forward and backward passes for layers like convolution, pooling and fully connected. It covers optimization issues like initialization and speeding up training. Finally, it introduces VUNO-Net, the company's deep learning framework, and discusses its performance, applications and visualization.
(2017/06) Practical points of deep learning for medical imaging - Kyuhwan Jung
This document provides an overview of deep learning and its applications in medical imaging. It discusses key topics such as the definition of artificial intelligence, a brief history of neural networks and machine learning, and how deep learning is driving breakthroughs in tasks like visual and speech recognition. The document also addresses challenges in medical data analysis using deep learning, such as how to handle limited data or annotations. It provides examples of techniques used to address these challenges, such as data augmentation, transfer learning, and weakly supervised learning.
- Geoffrey Hinton gives a tutorial on deep belief nets and how to learn multi-layer generative models of unlabeled data by learning one layer of features at a time using restricted Boltzmann machines (RBMs).
- RBMs make it possible to efficiently learn deep generative models one layer at a time by approximating the intractable posterior distribution over hidden units given visible data.
- Layer-by-layer unsupervised pre-training of features followed by discriminative fine-tuning improves classification performance on benchmark datasets like MNIST compared to backpropagation alone.
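The layer-wise learning described above rests on contrastive divergence. A toy CD-1 update for a binary RBM might look as follows; this is a sketch with our own names and conventions, not Hinton's reference code. Stacking then means treating each trained RBM's hidden activations as the "visible" data for the next RBM:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cd1_step(W, a, b, v0, lr=0.1, rng=random):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    W[j][i]: weight from visible i to hidden j; a: visible bias; b: hidden bias."""
    nh, nv = len(W), len(W[0])
    # up: hidden probabilities given the data, then a binary sample
    ph0 = [sigmoid(b[j] + sum(W[j][i] * v0[i] for i in range(nv))) for j in range(nh)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # down: reconstruct the visibles, then hidden probs of the reconstruction
    pv1 = [sigmoid(a[i] + sum(W[j][i] * h0[j] for j in range(nh))) for i in range(nv)]
    ph1 = [sigmoid(b[j] + sum(W[j][i] * pv1[i] for i in range(nv))) for j in range(nh)]
    # gradient approximation: <v h>_data - <v h>_reconstruction
    for j in range(nh):
        for i in range(nv):
            W[j][i] += lr * (ph0[j] * v0[i] - ph1[j] * pv1[i])
    for i in range(nv):
        a[i] += lr * (v0[i] - pv1[i])
    for j in range(nh):
        b[j] += lr * (ph0[j] - ph1[j])
    return W, a, b
```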
This document provides a critical review of recurrent neural networks for sequence learning. It begins with an abstract summarizing the paper. It then discusses why recurrent neural networks are well-suited for modeling sequential data compared to other models like feedforward neural networks and Markov models. Specifically, it notes that RNNs can capture long-range temporal dependencies, unlike models with a finite context window. It also explains that RNNs can represent a vast number of states using real-valued activations, unlike discrete state Markov models.
[PR12] Inception and Xception - Jaejun Yoo
This document discusses Inception and Xception models for computer vision tasks. It describes the Inception architecture, which uses 1x1, 3x3 and 5x5 convolutional filters arranged in parallel to capture correlations at different scales more efficiently. It also describes the Xception model, which entirely separates cross-channel correlations and spatial correlations using depthwise separable convolutions. The document compares different approaches for reducing computational costs like pooling and strided convolutions.
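A quick back-of-the-envelope count shows why depthwise separable convolutions are so much cheaper than standard ones: a standard k x k convolution couples spatial and cross-channel mixing in one weight tensor, while the separable form pays for them independently. The helper names below are ours:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1x1 pointwise
    convolution that mixes across channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 256, 256)        # 589824 weights
sep = separable_params(3, 256, 256)   # 67840 weights, roughly 8.7x fewer
```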
(DL reading group) Matching Networks for One Shot Learning - Masahiro Suzuki
1. Matching Networks is a neural network architecture proposed by DeepMind for one-shot learning.
2. The network learns to classify novel examples by comparing them to a small support set of examples, using an attention mechanism to focus on the most relevant support examples.
3. The network is trained using a meta-learning approach, where it learns to learn from small support sets to classify novel examples from classes not seen during training.
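The attention-based classification in point 2 can be sketched with cosine similarities and a softmax over the support set. This is a simplified stand-in: in the actual paper the query and support items are first mapped through learned embedding networks, which we omit here, and the function names are ours:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0 if either is all zeros)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def attend_classify(query, support, labels):
    """Softmax attention over similarities to the support set; class scores
    are attention-weighted sums over the support labels."""
    sims = [cosine(query, s) for s in support]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]   # numerically stable softmax
    z = sum(exps)
    atts = [e / z for e in exps]
    scores = {}
    for a, y in zip(atts, labels):
        scores[y] = scores.get(y, 0.0) + a
    return max(scores, key=scores.get)
```

With a two-example support set, a query close to one support vector inherits that example's label, which is the one-shot behaviour described above.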
Lifelong Learning for Dynamically Expandable Networks - NAVER Engineering
Presenter: Jaehong Yoon (PhD student, KAIST)
Date: July 2018
We propose a novel deep network architecture for lifelong learning, which we refer to as the Dynamically Expandable Network (DEN), that can dynamically decide its network capacity as it trains on a sequence of tasks, learning a compact, overlapping knowledge-sharing structure among tasks. DEN is efficiently trained in an online manner by performing selective retraining, dynamically expands network capacity upon the arrival of each task with only the necessary number of units, and effectively prevents semantic drift by splitting/duplicating units and timestamping them. We validate DEN on multiple public datasets under lifelong learning scenarios, on which it not only significantly outperforms existing lifelong learning methods for deep networks, but also achieves the same level of performance as its batch counterparts with substantially fewer parameters. Further, the network obtained by fine-tuning on all tasks achieves significantly better performance than the batch models, which shows that DEN can be used to estimate the optimal network structure even when all tasks are available from the start.
The document discusses clustering and k-means clustering algorithms. It provides examples of scenarios where clustering can be used, such as placing cell phone towers or opening new offices. It then defines clustering as organizing data into groups where objects within each group are similar to each other and dissimilar to objects in other groups. The document proceeds to explain k-means clustering, including the process of initializing cluster centers, assigning data points to the closest center, recomputing the centers, and iterating until centers converge. It provides a use case of using k-means to determine locations for new schools.
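The iteration described above (initialize centres, assign points to the nearest centre, recompute centres, repeat until convergence) can be sketched in plain Python; this is a toy implementation with our own names:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: init centres from the data, assign each point to its
    nearest centre, recompute each centre as its cluster mean, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # empty clusters keep their old centre
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
               for i, cl in enumerate(clusters)]
        if new == centers:        # centres stopped moving: converged
            break
        centers = new
    return centers
```

In the school-location use case, the points would be student addresses and the converged centres the candidate school sites.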
Robust Feature Learning with Deep Neural Networks
http://snu-primo.hosted.exlibrisgroup.com/primo_library/libweb/action/display.do?tabs=viewOnlineTab&doc=82SNU_INST21557911060002591
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with question bank - Asst. Prof. M. Gokilavani
UNIT I INTRODUCTION
Neural Networks - Application Scope of Neural Networks - Artificial Neural Network: An Introduction - Evolution of Neural Networks - Basic Models of Artificial Neural Network - Important Terminologies of ANNs - Supervised Learning Network.
This document describes a new method for detecting community structure in complex networks based on node similarity. The method works as follows:
1. It calculates the similarity between all node pairs using a local node similarity metric.
2. It treats each node as its own community initially. Then it iteratively incorporates the community of the current node with the communities containing its most similar nodes.
3. It selects the most similar uncovered node as the next current node, and repeats the process until all nodes have been incorporated into communities.
The method requires only local network information and has a computational complexity of O(nk) for a network with n nodes and average degree k. It is evaluated on real and computer-generated networks, demonstrating its effectiveness.
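The three steps above can be sketched as follows, using Jaccard similarity of neighbourhoods as one possible local node-similarity metric (the paper's exact metric may differ, and the names are ours):

```python
def jaccard(adj, u, v):
    """Local similarity: shared neighbours over the combined neighbourhood."""
    inter = len(adj[u] & adj[v])
    union = len(adj[u] | adj[v])
    return inter / union if union else 0.0

def similarity_communities(adj):
    """Sketch of the described procedure: start with singleton communities,
    merge the current node's community with those of its most similar nodes,
    then move on to the most similar uncovered node until all are covered."""
    comm = {v: {v} for v in adj}          # step 2: each node is its own community
    covered = set()
    current = next(iter(adj))
    while len(covered) < len(adj):
        covered.add(current)
        sims = {v: jaccard(adj, current, v) for v in adj if v != current}
        if sims:
            best = max(sims.values())
            if best > 0:
                for v, s in sims.items():
                    if s == best:          # merge with the most similar nodes
                        merged = comm[current] | comm[v]
                        for n in merged:
                            comm[n] = merged
        uncovered = {v: s for v, s in sims.items() if v not in covered}
        if not uncovered:
            break
        current = max(uncovered, key=uncovered.get)  # step 3: next current node
    return {frozenset(c) for c in comm.values()}
```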
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef... - cscpconf
This paper presents an ensemble of neocognitron neural network base classifiers to enhance the accuracy of the system, along with experimental results. The method requires less computational preprocessing than other ensemble techniques because it skips a separate feature extraction step before feeding the data into the base classifiers; this follows from the basic nature of the neocognitron, a multilayer feed-forward neural network. Each base classifier in the ensemble assigns a class label to each pattern, and these labels are combined to give the final class label for that pattern. The purpose of the paper is not only to exemplify the learning behaviour of the neocognitron as a base classifier, but also to propose a better way to combine neural-network-based ensemble classifiers.
The document discusses neural networks and machine learning. It provides information on:
- Feed-forward operation of neural networks using input, hidden, and output layers
- Backpropagation to propagate errors backwards and adjust weights
- Perceptron learning algorithm for supervised learning using a single neuron
- Unsupervised learning to discover patterns without labels using methods like self-organizing maps
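The perceptron learning algorithm from the list above can be sketched for a single neuron; on each misclassified sample the weights are nudged toward (or away from) the input. A toy implementation, with our own names:

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule for a single neuron with labels +1 / -1."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                           # misclassified: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:        # separable data: the rule has converged
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

On linearly separable data such as the logical AND, the rule converges in a handful of epochs; a multi-layer network trained with backpropagation is needed for non-separable problems.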
Deep Belief Nets (DBNs) are stacks of Restricted Boltzmann Machines (RBMs) that form a deep neural network architecture. RBMs are energy-based models that can be trained layer-by-layer to learn hierarchical representations of data. This presentation discusses how RBMs are used to learn the weights of DBNs in a greedy, unsupervised manner by treating the hidden units of one RBM as the visible data for the next RBM. Fine-tuning of the entire DBN can then be done with backpropagation. The paper demonstrates state-of-the-art performance of DBNs on MNIST handwritten digit recognition.
This document summarizes an experiment comparing two algorithms, Resilient Backpropagation (RPROP) and gradient descent, for training a convolutional neural network to classify images from a UCI dataset. RPROP achieved higher accuracy (92% for happy vs. sad expressions, 89% for images with and without sunglasses) and faster execution times than gradient descent (85% and 84% accuracy, respectively, with longer training times). The authors conclude that RPROP is the more effective training algorithm for CNNs based on these results.
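RPROP's key idea is to adapt a per-weight step size from the sign of the gradient alone, ignoring its magnitude. One update can be sketched as follows, using the commonly cited constants; this is an illustrative version (the RPROP- variant), not the authors' code:

```python
def rprop_step(grads, prev_grads, steps, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One RPROP update: grow a weight's step while its gradient keeps the
    same sign, shrink it (and skip the update) when the sign flips.
    Returns the weight deltas, the new step sizes, and the gradients to
    remember for the next call."""
    deltas, new_steps, new_prev = [], [], []
    for g, pg, s in zip(grads, prev_grads, steps):
        if g * pg > 0:                      # same sign: accelerate
            s = min(s * eta_plus, step_max)
        elif g * pg < 0:                    # sign flip: overshot a minimum
            s = max(s * eta_minus, step_min)
            g = 0.0                         # skip this update entirely
        delta = -s if g > 0 else s if g < 0 else 0.0   # move against the slope
        deltas.append(delta)
        new_steps.append(s)
        new_prev.append(g)
    return deltas, new_steps, new_prev
```

Because only gradient signs matter, RPROP is insensitive to the vanishing gradient magnitudes that slow plain gradient descent, which is consistent with the speedups reported above.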
What is meant by deep learning?
Deep learning is a subset of machine learning that is essentially a neural network with three or more layers. These neural networks attempt to simulate the behavior of the human brain, albeit far from matching its ability, allowing it to "learn" from large amounts of data.
A Time Series ANN Approach for Weather Forecasting - ijctcm
Weather forecasting is among the most challenging problems worldwide, both because of its practical value in meteorology and because it is a typical unbiased time-series forecasting problem in scientific research. Many methods have been proposed by various scientists; the motive behind this research is to predict more accurately. This paper addresses the problem using an artificial neural network (ANN), simulated in MATLAB, to predict two important weather parameters: maximum and minimum temperature. The model was trained on 60 years of real data (1901-1960) and tested over the following 40 years. The results, based on the mean squared error (MSE), confirm that this multilayer-perceptron-based model has the potential for successful application to weather forecasting.
Professor Maria Petrou gave a lecture on "A Classification Framework for Software Component Models" in the Distinguished Lecturer Series - Leon The Mathematician.
More Information available at:
http://dls.csd.auth.gr
This document discusses probabilistic models for inference using Hidden Markov Models (HMMs) and Bayesian networks. It provides references on HMMs, Bayesian probability, and temporal models. It explains that probabilistic models are needed to handle uncertain knowledge and probabilistic reasoning, unlike logic-based models. The document outlines contents on learning and inference in HMMs and Bayesian networks. It discusses uncertainty, Bayesian probability, generative models, inference in Bayesian networks, and the use of temporal models like HMMs. Mathematical formulations of inference in HMMs are also presented.
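A core HMM inference task, computing the likelihood of an observation sequence by summing over all hidden state paths, can be sketched with the forward algorithm. A toy dictionary-based version (names and the example parameters below are ours):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observations) under an HMM, computed by
    dynamic programming over alpha[t][s] = P(obs[:t+1], state_t = s)."""
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: emit_p[s][o] * sum(prev[r] * trans_p[r][s] for r in states)
                      for s in states})
    return sum(alpha[-1].values())
```

This runs in O(T * N^2) time for T observations and N states, versus the O(N^T) cost of enumerating every hidden path.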
The slides include an introduction to Long Short-Term Memory (LSTM), a novel approach to dealing with vanishing gradients in deep neural networks. They are made for students, and anyone out there who would love to learn about recurrent artificial neural networks, specifically the LSTM architecture.
Reference material has been attached for further reading.
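The gating mechanism that lets LSTMs sidestep vanishing gradients can be sketched for a single scalar cell; the additive cell-state update is the path along which gradients flow without repeatedly shrinking. This is a toy single-unit version with our own parameter names:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, p):
    """One LSTM step for scalar input x, hidden state h, and cell state c.
    p holds scalar weights: w* for the input, u* for the recurrent hidden
    state, b* for biases, for the forget/input/output gates and candidate."""
    f = sigmoid(p['wf'] * x + p['uf'] * h + p['bf'])    # forget gate
    i = sigmoid(p['wi'] * x + p['ui'] * h + p['bi'])    # input gate
    o = sigmoid(p['wo'] * x + p['uo'] * h + p['bo'])    # output gate
    g = math.tanh(p['wg'] * x + p['ug'] * h + p['bg'])  # candidate value
    c = f * c + i * g            # cell state: additive, gradient-friendly path
    h = o * math.tanh(c)         # hidden state exposed to the next layer
    return h, c
```

With all parameters at zero, every gate outputs 0.5 and the candidate is 0, so each step simply halves the cell state; real LSTMs learn these gate weights from data.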
Invited talk at Tsinghua University on "Applications of Deep Neural Networks". As the technical lead of the deep learning task force at NIO USA Inc., I was invited to give this colloquium talk on general applications of deep neural networks.
This document provides an overview of autoencoders and their use in unsupervised learning for deep neural networks. It discusses the history and development of neural networks, including early work in the 1940s-1980s and more recent advances in deep learning. It then explains how autoencoders work by setting the target values equal to the inputs, describes variants like denoising autoencoders, and how stacking autoencoders can create deep architectures for tasks like document retrieval, facial recognition, and signal denoising.
Reinforcement learning (RL) approaches deal with finding an optimal reward-based policy to act in an environment (talk in English).
However, what has led to their widespread use is their combination with deep neural networks (DNNs), i.e., deep reinforcement learning (Deep RL). Recent successes in not only learning to play games but also surpassing humans at them, together with academia-industry research collaborations on manipulation of objects, locomotion skills, smart grids, and more, have demonstrated their value on a wide variety of challenging tasks.
With applications spanning games, robotics, dialogue, healthcare, marketing, energy, and many more domains, Deep RL might just be the power that drives the next generation of artificial intelligence (AI) agents!
This document discusses the application of machine learning in healthcare. It provides an overview of machine learning and data science concepts and methodologies like the CRISP-DM process. It also discusses challenges with non-communicable diseases and opportunities for applying machine learning to areas like precision medicine, disease diagnosis, and clinical trials optimization using diverse healthcare data sources. Machine learning can help address issues like reducing healthcare costs and improving outcomes for conditions like diabetes and cardiovascular disease.
- Geoffrey Hinton gives a tutorial on deep belief nets and how to learn multi-layer generative models of unlabeled data by learning one layer of features at a time using restricted Boltzmann machines (RBMs).
- RBMs make it possible to efficiently learn deep generative models one layer at a time by approximating the intractable posterior distribution over hidden units given visible data.
- Layer-by-layer unsupervised pre-training of features followed by discriminative fine-tuning improves classification performance on benchmark datasets like MNIST compared to backpropagation alone.
This document provides a critical review of recurrent neural networks for sequence learning. It begins with an abstract summarizing the paper. It then discusses why recurrent neural networks are well-suited for modeling sequential data compared to other models like feedforward neural networks and Markov models. Specifically, it notes that RNNs can capture long-range temporal dependencies, unlike models with a finite context window. It also explains that RNNs can represent a vast number of states using real-valued activations, unlike discrete state Markov models.
[PR12] Inception and Xception - Jaejun YooJaeJun Yoo
This document discusses Inception and Xception models for computer vision tasks. It describes the Inception architecture, which uses 1x1, 3x3 and 5x5 convolutional filters arranged in parallel to capture correlations at different scales more efficiently. It also describes the Xception model, which entirely separates cross-channel correlations and spatial correlations using depthwise separable convolutions. The document compares different approaches for reducing computational costs like pooling and strided convolutions.
(DL輪読)Matching Networks for One Shot LearningMasahiro Suzuki
1. Matching Networks is a neural network architecture proposed by DeepMind for one-shot learning.
2. The network learns to classify novel examples by comparing them to a small support set of examples, using an attention mechanism to focus on the most relevant support examples.
3. The network is trained using a meta-learning approach, where it learns to learn from small support sets to classify novel examples from classes not seen during training.
Lifelong Learning for Dynamically Expandable NetworksNAVER Engineering
발표자: 윤재홍(KAIST 박사과정)
발표일: 2018.7.
We propose a novel deep network architecture for lifelong learning which we refer to as Dynamically Expandable Network (DEN), that can dynamically decide its network capacity as it trains on a sequence of tasks, to learn a compact overlapping knowledge sharing structure among tasks. DEN is efficiently trained in an online manner by performing selective retraining, dynamically expands network capacity upon arrival of each task with only the necessary number of units, and effectively prevents semantic drift by splitting/duplicating units and timestamping them. We validate DEN on multiple public datasets under lifelong learning scenarios, on which it not only significantly outperforms existing lifelong learning methods for deep networks, but also achieves the same level of performance as the batch counterparts with substantially fewer number of parameters. Further, the obtained network fine-tuned on all tasks obtained significantly better performance over the batch models, which shows that it can be used to estimate the optimal network structure even when all tasks are available in the first place.
The document discusses clustering and k-means clustering algorithms. It provides examples of scenarios where clustering can be used, such as placing cell phone towers or opening new offices. It then defines clustering as organizing data into groups where objects within each group are similar to each other and dissimilar to objects in other groups. The document proceeds to explain k-means clustering, including the process of initializing cluster centers, assigning data points to the closest center, recomputing the centers, and iterating until centers converge. It provides a use case of using k-means to determine locations for new schools.
Robust Feature Learning with Deep Neural Networks
http://snu-primo.hosted.exlibrisgroup.com/primo_library/libweb/action/display.do?tabs=viewOnlineTab&doc=82SNU_INST21557911060002591
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
UNIT I INTRODUCTION
Neural Networks-Application Scope of Neural Networks-Artificial Neural Network: An IntroductionEvolution of Neural Networks-Basic Models of Artificial Neural Network- Important Terminologies of
ANNs-Supervised Learning Network.
This document describes a new method for detecting community structure in complex networks based on node similarity. The method works as follows:
1. It calculates the similarity between all node pairs using a local node similarity metric.
2. It treats each node as its own community initially. Then it iteratively incorporates the community of the current node with the communities containing its most similar nodes.
3. It selects the most similar uncovered node as the next current node, and repeats the process until all nodes have been incorporated into communities.
The method requires only local network information and has a computational complexity of O(nk) for a network with n nodes and average degree k. It is evaluated on real and computer-generated networks, demonstrating
Employing Neocognitron Neural Network Base Ensemble Classifiers To Enhance Ef...cscpconf
This paper presents an ensemble of neo-cognitron neural network base classifiers to enhance
the accuracy of the system, along the experimental results. The method offers lesser
computational preprocessing in comparison to other ensemble techniques as it ex-preempts
feature extraction process before feeding the data into base classifiers. This is achieved by the
basic nature of neo-cognitron, it is a multilayer feed-forward neural network. Ensemble of such
base classifiers gives class labels for each pattern that in turn is combined to give the final class
label for that pattern. The purpose of this paper is not only to exemplify learning behaviour of
neo-cognitron as base classifiers, but also to purport better fashion to combine neural network
based ensemble classifiers.
The document discusses neural networks and machine learning. It provides information on:
- Feed forward training of neural networks using input, hidden, and output layers
- Backpropagation to propagate errors backwards and adjust weights
- Perceptron learning algorithm for supervised learning using a single neuron
- Unsupervised learning to discover patterns without labels using methods like self-organizing maps
Deep Belief Nets (DBNs) are stacks of Restricted Boltzmann Machines (RBMs) that form a deep neural network architecture. RBMs are energy-based models that can be trained layer-by-layer to learn hierarchical representations of data. This presentation discusses how RBMs are used to learn the weights of DBNs in a greedy, unsupervised manner by treating the hidden units of one RBM as the visible data for the next RBM. Fine-tuning of the entire DBN can then be done with backpropagation. The paper demonstrates state-of-the-art performance of DBNs on MNIST handwritten digit recognition.
This document summarizes an experiment comparing two algorithms, Resilient Backpropagation (RPROP) and Gradient Descent, for training a Convolutional Neural Network to classify images from the UCI dataset. RPROP achieved higher accuracy (92% for happy vs sad expressions, 89% for images with and without sunglasses) and faster execution times than Gradient Descent (85% and 84% accuracy, respectively, with longer times). The authors conclude RPROP is a more effective training algorithm for CNNs based on these results.
What is meant by deep learning?
Deep learning is a subset of machine learning that is essentially a neural network with three or more layers. These neural networks attempt to simulate the behavior of the human brain, albeit far from matching its ability, allowing them to “learn” from large amounts of data.
A Time Series ANN Approach for Weather Forecasting
ijctcm
Weather forecasting is one of the most challenging problems in the world. Part of its difficulty lies in the experimental nature of meteorological measurements, but it is also a typical unbiased time-series forecasting problem in scientific research. Many methods have been proposed by various scientists, all aiming at more accurate prediction. This paper contributes to that goal using an artificial neural network (ANN), simulated in MATLAB, to predict two important weather parameters: maximum and minimum temperature. The model was trained on 60 years of real data (1901-1960) and tested over the following 40 years to forecast maximum and minimum temperature. The results, based on the mean squared error (MSE) function, confirm that this multilayer-perceptron-based model has the potential to be applied successfully to weather forecasting.
Professor Maria Petrou gave a lecture on "A Classification Framework for Software Component Models" in the Distinguished Lecturer Series - Leon The Mathematician.
More Information available at:
http://dls.csd.auth.gr
This document discusses probabilistic models for inference using Hidden Markov Models (HMM) and Bayesian networks. It provides references on HMM, Bayesian probability, and temporal models. It explains that probabilistic models are needed to handle uncertain knowledge and probabilistic reasoning, unlike logic-based models. The document outlines contents on learning and inference in HMM and Bayesian networks. It discusses uncertainty, Bayesian probability, generative models, inferences in Bayesian networks, and using temporal models like HMM. Mathematical representations of inference in HMM are also presented.
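The kind of HMM inference described here can be illustrated with the forward algorithm, which computes the likelihood of an observation sequence by summing over all hidden state paths. The two-state model and its probabilities below are invented for illustration:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns P(o_1, ..., o_T) under an HMM.

    pi: (S,) initial state distribution
    A:  (S, S) transitions, A[i, j] = P(s_t = j | s_{t-1} = i)
    B:  (S, O) emissions,   B[i, k] = P(o = k | s = i)
    """
    alpha = pi * B[:, obs[0]]          # base case
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # recursive step
    return alpha.sum()

# Hypothetical two-state model with two observation symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # state 0 mostly emits symbol 0
              [0.2, 0.8]])  # state 1 mostly emits symbol 1
p = forward(pi, A, B, [0, 0, 1])
```

The recursion runs in O(T·S²) time, whereas naive enumeration of all state paths is exponential in T.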
The slides include an introduction to Long Short-Term Memory (LSTM), a novel approach to dealing with vanishing gradients in deep neural networks. They are made for students and anyone who would love to learn about recurrent artificial neural networks, specifically the LSTM architecture.
Reference material has been attached for further reading.
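A minimal sketch of one LSTM time step shows the gating that mitigates vanishing gradients: the cell state is updated additively through the forget gate rather than repeatedly squashed. Parameter shapes and the gate ordering below follow one common convention and are assumptions of this sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,).
    Stacked gate order assumed: input, forget, output, candidate."""
    z = W @ x + U @ h_prev + b
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell state
    c = f * c_prev + i * g     # additive update: gradients flow through f
    h = o * np.tanh(c)         # exposed hidden state
    return h, c

rng = np.random.default_rng(1)
D, H = 3, 4
W = 0.1 * rng.standard_normal((4 * H, D))
U = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):  # unroll over a short random sequence
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
```

Because the cell state is carried forward by elementwise multiplication with the forget gate instead of a full weight matrix, gradients along the cell path avoid the repeated shrinking that plagues plain recurrent networks.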
Invited talk at Tsinghua University on "Applications of Deep Neural Networks". As the technical lead of the deep learning task force at NIO USA Inc., I was invited to give this colloquium talk on general applications of deep neural networks.
This document provides an overview of autoencoders and their use in unsupervised learning for deep neural networks. It discusses the history and development of neural networks, including early work in the 1940s-1980s and more recent advances in deep learning. It then explains how autoencoders work by setting the target values equal to the inputs, describes variants like denoising autoencoders, and how stacking autoencoders can create deep architectures for tasks like document retrieval, facial recognition, and signal denoising.
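The core idea, setting the target equal to the clean input while corrupting what the encoder sees, can be sketched as a tiny tied-weight denoising autoencoder trained with plain gradient descent. This is a toy illustration on invented binary data, not any particular system from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D, H, lr = 8, 4, 0.3
W = 0.1 * rng.standard_normal((D, H))   # tied weights: decoder uses W.T
b_h, b_v = np.zeros(H), np.zeros(D)

X = (rng.random((200, D)) < 0.3).astype(float)  # toy binary data

def reconstruct(Xin):
    Hid = sigmoid(Xin @ W + b_h)            # encode
    return Hid, sigmoid(Hid @ W.T + b_v)    # decode with tied weights

_, R0 = reconstruct(X)
loss_before = ((R0 - X) ** 2).mean()

for _ in range(500):
    X_noisy = X * (rng.random(X.shape) > 0.2)   # corrupt: randomly drop inputs
    Hid, R = reconstruct(X_noisy)
    err = R - X                                 # target is the *clean* input
    d_v = err * R * (1 - R)                     # decoder pre-activation gradient
    d_h = (d_v @ W) * Hid * (1 - Hid)           # encoder pre-activation gradient
    W -= lr * (d_v.T @ Hid + X_noisy.T @ d_h) / len(X)
    b_v -= lr * d_v.mean(axis=0)
    b_h -= lr * d_h.mean(axis=0)

_, R1 = reconstruct(X)
loss_after = ((R1 - X) ** 2).mean()
```

Stacking then proceeds as with RBMs: the hidden code of one trained autoencoder becomes the input to the next, building the deep architectures mentioned above.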
Reinforcement Learning (RL) approaches deal with finding an optimal reward-based policy for acting in an environment (talk in English).
However, what has led to their widespread use is their combination with deep neural networks (DNNs), i.e., deep reinforcement learning (deep RL). Recent successes, not only in learning to play games but in surpassing humans at them, along with academia-industry research collaborations on manipulation of objects, locomotion skills, smart grids, etc., have demonstrated their value on a wide variety of challenging tasks.
With applications spanning games, robotics, dialogue, healthcare, marketing, energy, and many more domains, deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
This document discusses the application of machine learning in healthcare. It provides an overview of machine learning and data science concepts and methodologies like the CRISP-DM process. It also discusses challenges with non-communicable diseases and opportunities for applying machine learning to areas like precision medicine, disease diagnosis, and clinical trials optimization using diverse healthcare data sources. Machine learning can help address issues like reducing healthcare costs and improving outcomes for conditions like diabetes and cardiovascular disease.
Whose Balance Sheet is this? Neural Networks for Banks’ Pattern Recognition
Big Data Colombia
This document discusses using neural networks to perform pattern recognition on banks' balance sheets. It proposes representing each balance sheet as a 27x1 pixel image and training a neural network to identify which bank each balance sheet belongs to. This could help detect important changes in banks' financial accounts over time and classify banks by risk level. The document reviews related literature on using neural networks for financial data analysis and pattern recognition. It argues that working with raw balance sheet data, rather than selected financial ratios, may provide more useful information for classification. The goal is to determine if neural networks can accurately recognize the owners of balance sheets presented as images.
Analysis of your own Facebook friends’ data structure through graphs
Big Data Colombia
This document outlines steps to analyze a person's social network structure through visualizing their Facebook friend connections and relationships:
1. It recommends using the Lost Circles Chrome extension to scrape a user's Facebook friend list and export it to a JSON file.
2. The JSON file can then be converted to a graph data file format (GDF) using a Python script for analysis in Gephi network visualization software.
3. Gephi can be used to analyze and visualize the network based on metrics like betweenness centrality, degree distribution, and modularity to understand the network structure and relationships.
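Step 2 above can be sketched as a small Python converter. The JSON layout assumed here (a `nodes` list and an `edges` list with `source`/`target` keys) is hypothetical; a real Lost Circles export may use different keys, so adapt accordingly:

```python
import json
import os
import tempfile

def json_to_gdf(json_path, gdf_path):
    """Convert a scraped friend graph (assumed JSON layout, see lead-in)
    into Gephi's GDF format: a node table followed by an edge table."""
    with open(json_path) as f:
        graph = json.load(f)
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    for n in graph["nodes"]:
        lines.append(f"{n['id']},{n['name']}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    for e in graph["edges"]:
        lines.append(f"{e['source']},{e['target']}")
    with open(gdf_path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Demo with a two-node toy graph
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "friends.json")
dst = os.path.join(tmp, "friends.gdf")
with open(src, "w") as f:
    json.dump({"nodes": [{"id": "a", "name": "Ana"}, {"id": "b", "name": "Ben"}],
               "edges": [{"source": "a", "target": "b"}]}, f)
json_to_gdf(src, dst)
gdf_lines = open(dst).read().splitlines()
```

The resulting `.gdf` file can be opened directly in Gephi for the centrality and modularity analysis described in step 3.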
This document summarizes the conclusions and decisions made by the space dealership Saturno after analyzing traffic and sales data for its products (Atlantis, Icarus, and Destiny) across different channels and times. They identified that the most sought-after customers, visitors, and products varied over the course of the week, so they will adjust displays, points of sale, messaging, and salesperson training. They also found that the most-searched-for ship was not the best seller, and that Destiny's target market did not match its actual visitors.
This talk asks what role Big Data plays in smart cities and in building the city of the future. Thanks to developments in fields such as data science, the Internet of Things, and urban analytics, new ways of understanding urban dynamics and environments are emerging.
"Naturally Intelligent Environments" are a vision of a future city as a living, complex organism that adapts, transforms, and reinvents itself; this process is a constant search for new, more sustainable ways of coexisting with other systems.
We are at a fascinating moment in healthcare. Today it is possible to make very timely clinical diagnoses and generate predictions in real time, which opens up opportunities that will impact society very positively. One of these is precision medicine, which seeks to exploit insights into biological conditions, environment, and habits to preventively improve individual health.
The moment has arrived: predictions of the future are happening now, and in Colombia the first steps are already being taken!
Helping Travelers Using 500 Million Hotel Reviews a Month
Big Data Colombia
This document discusses how TrustYou processes large amounts of hotel review data to provide summaries to travelers. It crawls over 30 million reviews daily across 25 languages. Natural language processing and machine learning techniques are used to analyze the text and provide recommendations. Workflows are managed through Luigi and tasks include crawling, text processing, modeling word embeddings, and powering a sample application. Hadoop and Python are used extensively to handle the large scale processing.
This document describes the evolution of IPython and Jupyter, from their beginnings as an interactive Python shell to becoming a multi-language platform for interactive computing and document publishing. It explains how Jupyter's generic REPL protocol makes it possible to execute code in multiple languages, and how tools such as JupyterHub, nbviewer, and notebooks have driven its adoption in education, research, and scientific communication.
A study reported in the Harvard Business Review identifies three strategies for fully exploiting Big Data and analytics capabilities in an organization: 1) identify, combine, and manage multiple data sources; 2) build advanced analytical models to predict and optimize results; 3) transform the organization's capabilities so that the data used, and the analysis of that data, lead to better decisions. The cloud computing model supports each of these capabilities.
https://www.youtube.com/watch?v=eXtWRkfMisM
This talk will present introductory machine learning concepts using kaggle.com (the world's largest data science portal). The talk is divided into:
1. Introduction to kaggle.com
2. Machine learning competitions
3. Kaggle.com as a hiring/job-search site
4. How to compete and get good results in ML competitions
5. Practical examples from past competitions
1. Easy Solutions is a leading global provider of electronic fraud prevention for financial institutions and enterprise customers, protecting over 75 million users and monitoring over 22 billion online connections in the last 12 months.
2. Alejandro Correa Bahnsen is a data scientist at Easy Solutions who has over 8 years of experience in data science and works on fraud detection and prevention.
3. Fraud analytics uses machine learning and artificial intelligence techniques to analyze customer transaction data and detect patterns that can predict fraudulent transactions from legitimate ones.
Performing data analysis when large amounts of information must be cross-referenced, processed, and cleaned is a difficult and costly challenge. Apache Spark is a framework for processing large amounts of information.
An introduction to data warehouses: what they are and what they are for. Methodologies for designing and building a data warehouse, ETL processes, and technology integration.
The world of Big Data and data science is highly technical, but understanding its central ideas does not require superpowers. We will explain what this fascinating technological trend consists of, along with its main concepts, tools, and possibilities.
This document discusses how Big Data can help in health, finance, and relationships. In health, analytic algorithms can identify patterns in patient data that help detect diseases such as Alzheimer's earlier. In finance, analyzing large amounts of data helps prevent fraud, comply with regulations, and offer personalized products. In relationships, sites like eHarmony use compatibility systems that match people based on 150 questions about personality and values.
This document presents an introduction to the concepts of business analytics and Big Data. It explains how large volumes of data (Big Data) are changing the challenges companies face and how to adapt to them. It proposes an action plan for applying analytic techniques to areas such as sales, finance, operations, and human resources in order to extract added value from data and transform the business. Finally, it shows a practical Big Data case study.
Analysis insights about a Flyball dog competition team's performance
roli9797
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
Learn SQL from basic queries to advanced queries
manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
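The progression from basic filtering to grouped aggregation in the highlights above can be sketched with Python's built-in sqlite3 module; the table and its contents are invented for illustration:

```python
import sqlite3

# In-memory database for practicing the basic-to-advanced progression
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders (customer, amount) VALUES
  ('ana', 10.0), ('ana', 25.0), ('ben', 5.0), ('ben', 40.0), ('cai', 15.0);
""")

# Basic: retrieval with filtering and aggregation
total = con.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = 'ana'"
).fetchone()[0]

# More advanced: GROUP BY with HAVING to surface a trend
# (customers whose total spend exceeds 20)
big_spenders = con.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING total > 20
    ORDER BY total DESC
""").fetchall()
```

The same queries run unchanged against any SQLite file, making this a convenient zero-setup way to practice before moving to a production database.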
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Global Situational Awareness of A.I. and where it's headed
vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
4. Fabio A. González Universidad Nacional de Colombia
Rosenblatt’s Perceptron
(1957)
• Input: 20x20 photocells array
• Weights implemented with
potentiometers
• Weight updating performed by
electric motors
5. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
6. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
7. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
8. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
9. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
10. Fabio A. González Universidad Nacional de Colombia
Backpropagation
Source: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
11. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
12. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
Colombian local NN history
13. Fabio A. González Universidad Nacional de Colombia
Colombian local neural
networks history
14. Fabio A. González Universidad Nacional de Colombia
Colombian local neural
networks history
UNNeuro (1993, UN)
15. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
16. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
17. Fabio A. González Universidad Nacional de Colombia
Neural networks time line
1943 1957 1969 1986 1995 2007 2012 2016
19. Fabio A. González Universidad Nacional de Colombia
Deep learning boom
20. Fabio A. González Universidad Nacional de Colombia
Deep learning boom
21. Fabio A. González Universidad Nacional de Colombia
Deep learning boom
22. Fabio A. González Universidad Nacional de Colombia
Deep learning boom
23. Fabio A. González Universidad Nacional de Colombia
Deep learning boom
24. Fabio A. González Universidad Nacional de Colombia
Deep learning boom
25. Fabio A. González Universidad Nacional de Colombia
Deep learning boom
26. Fabio A. González Universidad Nacional de Colombia
Deep learning time line
1943 1957 1969 1986 1995 2007 2012 2016
27. Fabio A. González Universidad Nacional de Colombia
Deep learning time line
1943 1957 1969 1986 1995 2007 2012 2016
LETTER Communicated by Yann Le Cun
A Fast Learning Algorithm for Deep Belief Nets
Geoffrey E. Hinton
hinton@cs.toronto.edu
Simon Osindero
osindero@cs.toronto.edu
Department of Computer Science, University of Toronto, Toronto, Canada M5S 3G4
Yee-Whye Teh
tehyw@comp.nus.edu.sg
Department of Computer Science, National University of Singapore, Singapore 117543
We show how to use “complementary priors” to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
1 Introduction
Learning is difficult in densely connected, directed belief nets that have many hidden layers because it is difficult to infer the conditional distribution of the hidden activities when given a data vector. Variational methods use simple approximations to the true conditional distribution, but the approximations may be poor, especially at the deepest hidden layer, where the prior assumes independence. Also, variational learning still requires all of the parameters to be learned together and this makes the learning time scale poorly as the number of parameters increases.
We describe a model in which the top two hidden layers form an undirected associative memory (see Figure 1) and the remaining hidden layers
Neural Computation 18, 1527–1554 (2006) © 2006 Massachusetts Institute of Technology
28. Fabio A. González Universidad Nacional de Colombia
Deep learning time line
1943 1957 1969 1986 1995 2007 2012 2016
29. Fabio A. González Universidad Nacional de Colombia
Deep learning time line
1943 1957 1969 1986 1995 2007 2012 2016
LETTER Communicated by Yann Le Cun
A Fast Learning Algorithm for Deep Belief Nets
Geoffrey E. Hinton (hinton@cs.toronto.edu)
Simon Osindero (osindero@cs.toronto.edu)
Department of Computer Science, University of Toronto, Toronto, Canada M5S 3G4
Yee-Whye Teh (tehyw@comp.nus.edu.sg)
Department of Computer Science, National University of Singapore, Singapore 117543
We show how to use “complementary priors” to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
1 Introduction
Learning is difficult in densely connected, directed belief nets that have many hidden layers because it is difficult to infer the conditional distribution of the hidden activities when given a data vector. Variational methods use simple approximations to the true conditional distribution, but the approximations may be poor, especially at the deepest hidden layer, where the prior assumes independence. Also, variational learning still requires all of the parameters to be learned together, and this makes the learning time scale poorly as the number of parameters increases.
We describe a model in which the top two hidden layers form an undirected associative memory (see Figure 1) and the remaining hidden layers
Neural Computation 18, 1527–1554 (2006) © 2006 Massachusetts Institute of Technology
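The greedy, layer-at-a-time learning the abstract describes trains each layer as a restricted Boltzmann machine, typically with contrastive divergence. As an illustration only (the function names, toy data, and hyperparameters below are assumptions, not from the paper), one CD-1 update for a binary RBM can be sketched in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0: batch of visible vectors, shape (n, n_vis)
    W:  weights, shape (n_vis, n_hid); b: visible bias; c: hidden bias.
    """
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step: reconstruct visibles, re-infer hiddens.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Gradient approximation: <v h>_data - <v h>_model.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy usage: learn to reconstruct a repeated binary pattern.
v = np.tile([1.0, 0.0, 1.0, 0.0, 1.0, 0.0], (32, 1))
W = 0.01 * rng.standard_normal((6, 4))
b = np.zeros(6)
c = np.zeros(4)
for _ in range(200):
    W, b, c = cd1_step(v, W, b, c)
```

Stacking such layers (training one RBM on the data, then another on its hidden activities) is the greedy initialization the letter describes; the contrastive wake-sleep fine-tuning stage is omitted here.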
36. Fabio A. González Universidad Nacional de Colombia
A deep learning model won the ILSVRC 2012 challenge
(ImageNet: 1.2 million images, 1,000 concepts)
38. Fabio A. González Universidad Nacional de Colombia
Deep learning recipe
• Data (size)
• HPC
• Algorithms
• Tricks
• Feature learning
51. Fabio A. González Universidad Nacional de Colombia
Algorithms
• Backpropagation
• Backpropagation through time
• Online learning (stochastic gradient descent)
• Softmax (hierarchical)
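To make the first and third bullets concrete, here is a hedged numpy sketch of backpropagation with online (stochastic) gradient descent for a one-hidden-layer network on XOR; the architecture, names, and hyperparameters are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny network: 2 inputs -> 8 hidden (tanh) -> 1 sigmoid output,
# trained on XOR with online (stochastic) gradient descent.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])
W1 = rng.standard_normal((2, 8))
b1 = np.zeros(8)
W2 = rng.standard_normal(8)
b2 = 0.0
lr = 0.4

for epoch in range(5000):
    for i in rng.permutation(4):            # one example at a time = online SGD
        h = np.tanh(X[i] @ W1 + b1)                 # forward pass, hidden layer
        p = 1 / (1 + np.exp(-(h @ W2 + b2)))        # forward pass, output
        # Backward pass (chain rule): for cross-entropy loss with a
        # sigmoid output, dL/dz at the output is simply (p - y).
        dout = p - y[i]
        dh = dout * W2 * (1 - h**2)                 # backprop through tanh
        W2 -= lr * dout * h
        b2 -= lr * dout
        W1 -= lr * np.outer(X[i], dh)
        b1 -= lr * dh
```

Visiting examples one at a time (rather than averaging the gradient over the whole dataset) is what the slide calls online learning; the noisy updates are cheap and often escape shallow minima.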
54. Fabio A. González Universidad Nacional de Colombia
Tricks
• DL is mainly an engineering problem
• DL networks are hard to train
• Several tricks, the product of years of experience:
• Layer-wise training
• RELU units
• Dropout
• Adaptive learning rates
• Initialization
• Preprocessing
• Gradient norm clipping
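Three of these tricks are small enough to show directly. The sketch below (illustrative numpy, not from the slides) implements RELU units, inverted dropout, and gradient norm clipping:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # RELU: identity for positive inputs, zero otherwise; avoids the
    # vanishing gradients of saturating sigmoid/tanh units.
    return np.maximum(0.0, x)

def dropout(h, p=0.5, train=True):
    # Randomly zero activations during training, scaling the survivors
    # so the expected value matches test time ("inverted dropout").
    if not train:
        return h
    mask = (rng.random(h.shape) >= p)
    return h * mask / (1.0 - p)

def clip_grad_norm(g, max_norm=1.0):
    # Rescale the gradient vector if its L2 norm exceeds max_norm,
    # preventing the exploding-gradient updates that destabilize training.
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g

h = relu(np.array([-2.0, 0.5, 3.0]))        # -> [0. , 0.5, 3. ]
g = clip_grad_norm(np.array([3.0, 4.0]))    # norm 5 -> rescaled to norm 1
```

Each is a one-line change to a training loop, which is why they spread quickly once their effect on trainability was understood.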
55. Fabio A. González Universidad Nacional de Colombia
Applications
• Computer vision:
• Image: annotation, detection, segmentation, captioning
• Video: object tracking, action recognition, segmentation
• Speech recognition and synthesis
• Text: language modeling, word/text representation, text classification, translation
• Biomedical image analysis
57. Fabio A. González Universidad Nacional de Colombia
Feature learning for cancer
diagnosisArtificial Intelligence in Medicine 64 (2015) 131–145
Contents lists available at ScienceDirect
Artificial Intelligence in Medicine
journal homepage: www.elsevier.com/locate/aiim
An unsupervised feature learning framework for basal cell carcinoma
image analysis
John Arevaloa
, Angel Cruz-Roaa
, Viviana Ariasb
, Eduardo Romeroc
, Fabio A. Gonzáleza,∗
a
Machine Learning, Perception and Discovery Lab, Systems and Computer Engineering Department, Universidad Nacional de Colombia, Faculty of
Engineering, Cra 30 No 45 03-Ciudad Universitaria, Building 453 Office 114, Bogotá DC, Colombia
b
Pathology Department, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria, Bogotá DC, Colombia
c
Computer Imaging & Medical Applications Laboratory, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria,
Bogotá DC, Colombia
a r t i c l e i n f o
Article history:
Received 1 October 2014
Received in revised form 9 April 2015
Accepted 15 April 2015
Keywords:
Digital pathology
Representation learning
Unsupervised feature learning
Basal cell carcinoma
a b s t r a c t
Objective: The paper addresses the problem of automatic detection of basal cell carcinoma (BCC) in histopathology images. In particular, it proposes a framework to both learn the image representation in an unsupervised way and visualize discriminative features supported by the learned model.
Materials and methods: This paper presents an integrated unsupervised feature learning (UFL) framework for histopathology image analysis that comprises three main stages: (1) local (patch) representation learning using different strategies (sparse autoencoders, reconstruct independent component analysis and topographic independent component analysis (TICA)), (2) global (image) representation learning using a bag-of-features representation or a convolutional neural network, and (3) a visual interpretation layer to highlight the most discriminant regions detected by the model. The integrated unsupervised feature learning framework was exhaustively evaluated on a histopathology image dataset for BCC diagnosis.
Results: The experimental evaluation produced a classification performance of 98.1%, in terms of the area under the receiver-operating-characteristic curve, with the proposed framework outperforming the state-of-the-art discrete cosine transform patch-based representation by 7%.
Conclusions: The proposed UFL-representation-based approach outperforms state-of-the-art methods for BCC detection. Thanks to its visual interpretation layer, the method is able to highlight discriminative regions.
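Stage (1) of the framework, local patch representation learning, can be illustrated with a minimal tied-weight sparse autoencoder. This is a simplified sketch with hypothetical hyperparameters (`lr`, `rho`, `beta`), not the paper's implementation:

```python
import numpy as np

def autoencoder_step(X, W, lr=0.01, rho=0.05, beta=0.1):
    # One gradient step of a tied-weight sparse autoencoder on a patch
    # matrix X (n_patches x n_pixels). The sparsity term pushes mean
    # hidden activations toward the target rho.
    H = 1.0 / (1.0 + np.exp(-X @ W))   # encoder: sigmoid hidden units
    R = H @ W.T                         # decoder: tied weights, linear output
    err = R - X                         # reconstruction error
    rho_hat = H.mean(axis=0)            # mean activation per hidden unit
    # backprop through encoder plus a simplified sparsity penalty gradient
    dH = err @ W + beta * (rho_hat - rho)
    dW = X.T @ (dH * H * (1 - H)) + err.T @ H
    return W - lr * dW / len(X)
```

Iterating this step on a matrix of image patches drives the reconstruction error down while keeping hidden units sparse; the learned columns of `W` play the role of the local features.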
58. Fabio A. González Universidad Nacional de Colombia
Feature learning for cancer
diagnosis
Fig. 2. Convolutional auto-encoder neural network architecture for histopathology image representation learning, automatic cancer detection and visually interpretable prediction results analogous to a digital stain identifying image regions that are most relevant for diagnostic decisions.
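The convolutional building block underlying an architecture like the one in Fig. 2 can be sketched as a plain 2D sliding-window filter (technically a cross-correlation, which is what deep learning libraries compute as "convolution"). Purely illustrative:

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Slide the kernel over the image with no padding ("valid" mode) and
    # take the dot product at each position; one feature map per kernel.
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out
```

Stacking such filtered maps, nonlinearities and an upsampling decoder gives a convolutional auto-encoder of the general kind the figure describes.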
59. Fabio A. González Universidad Nacional de Colombia
Feature learning for cancer
diagnosis
[Figure: local features learned with different unsupervised feature learning methods. Left: sparse autoencoders learn features visually related to structures in the histopathology dataset (i.e. nuclei shapes). Center: reconstruct independent component analysis. Right: topographic independent component analysis, highlighting translational (blue), color (red), scale (yellow) and rotational (green) invariances.]
[Figure: discriminant map of learned features. Red: features related to the positive (basal cell carcinoma) class; blue: features related to the negative (healthy tissue) class.]
60. Fabio A. González Universidad Nacional de Colombia
Feature learning for cancer
diagnosis
[Table 2: outputs produced by the system for different cancer and non-cancer input images. Rows show, top to bottom: the real image class, the class predicted by the model, the probability associated with the prediction, and the digital-stained image (red stain indicates cancer regions, blue stain indicates healthy regions).]
61. Fabio A. González Universidad Nacional de Colombia
Feature learning for cancer
diagnosis
62. Fabio A. González Universidad Nacional de Colombia
Deep learning for cancer
diagnosis
Cascaded Ensemble of Convolutional Neural Networks and
Handcrafted Features for Mitosis Detection
Haibo Wang*¹, Angel Cruz-Roa*², Ajay Basavanhally¹, Hannah Gilmore¹, Natalie Shih³, Mike Feldman³, John Tomaszewski⁴, Fabio González², and Anant Madabhushi¹
¹ Case Western Reserve University, USA
² Universidad Nacional de Colombia, Colombia
³ University of Pennsylvania, USA
⁴ University at Buffalo School of Medicine and Biomedical Sciences, USA
ABSTRACT
Breast cancer (BCa) grading plays an important role in predicting disease aggressiveness and patient outcome. A key
component of BCa grade is mitotic count, which involves quantifying the number of cells in the process of dividing (i.e.
undergoing mitosis) at a specific point in time. Currently mitosis counting is done manually by a pathologist looking at
multiple high power fields on a glass slide under a microscope, an extremely laborious and time consuming process. The
development of computerized systems for automated detection of mitotic nuclei, while highly desirable, is confounded
by the highly variable shape and appearance of mitoses. Existing methods use either handcrafted features that capture
certain morphological, statistical or textural attributes of mitoses or features learned with convolutional neural networks
(CNN). While handcrafted features are inspired by the domain and the particular application, the data-driven CNN models
63. Fabio A. González Universidad Nacional de Colombia
Deep learning for cancer
diagnosis
optimization. Including the time needed to extract handcrafted features (6.5 hours in pure MATLAB implementation), the
training stage for HC+CNN was completed in less than 18 hours.
Method    TP  FP  FN  Precision  Recall  F-measure
HC+CNN    65  12  35  0.84       0.65    0.7345
HC        64  22  36  0.74       0.64    0.6864
CNN       53  32  47  0.63       0.53    0.5730
IDSIA13   70   9  30  0.89       0.70    0.7821
IPAL4     74  32  26  0.70       0.74    0.7184
SUTECH    72  31  28  0.70       0.72    0.7094
NEC12     59  20  41  0.75       0.59    0.6592
Table 2: Evaluation results for mitosis detection using HC+CNN and comparative methods on the ICPR12 dataset (Aperio scanner).
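The Precision, Recall and F-measure columns above follow the standard detection metrics, which a small helper reproduces from the TP/FP/FN counts:

```python
def detection_metrics(tp, fp, fn):
    # Precision: fraction of detections that are true mitoses.
    # Recall: fraction of true mitoses that were detected.
    # F-measure: harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For the HC+CNN row (TP=65, FP=12, FN=35) this yields precision ≈ 0.84, recall = 0.65 and F-measure ≈ 0.734, matching the table.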
Figure 3: Mitoses identified by HC+CNN as TP (green circles), FN (yellow circles), and FP (red circles) on the ICPR12 dataset. The TP examples have distinctive intensity, shape and texture, while the FN examples are less distinctive in intensity and shape. The FP examples are visually more similar to mitotic figures than the FNs are.
3.4 Results on AMIDA13 Dataset
On the AMIDA13 dataset, the F-measure of our approach (CCIPD/MINDLAB) is 0.319, which ranks 6th among 14 submissions.
64. Fabio A. González Universidad Nacional de Colombia
Deep learning for cancer
diagnosis
[Figure 1 diagram: HPF → segmentation → handcrafted features and CNN feature learning → classifiers 1–3 → probabilistic fusion.]
Figure 1: Workflow of our methodology. Blue-ratio thresholding [14] is first applied to segment mitosis candidates. On each segmented blob, handcrafted features are extracted and classified via a Random Forests classifier. Meanwhile, on each segmented 80 × 80 patch, convolutional neural networks (CNN) [8] are trained with a fully connected regression model as part of the classification layer. For those candidates that are difficult to classify (ambiguous result from the CNN), we train a second-stage Random Forests classifier on the basis of combining CNN-derived and handcrafted features. The final decision is obtained via a consensus of the predictions of the three classifiers.
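The consensus step can be illustrated with a simple weighted average of the three classifiers' mitosis probabilities; the paper's exact fusion rule may differ, so treat this as a sketch:

```python
import numpy as np

def probabilistic_fusion(probs, weights=None):
    # Combine per-classifier probabilities for one candidate into a single
    # score by weighted averaging; uniform weights by default.
    probs = np.asarray(probs, dtype=float)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)
    return float(np.dot(weights, probs))
```

With uniform weights, `probabilistic_fusion([0.9, 0.6, 0.3])` is simply the mean, 0.6; non-uniform weights let a better-calibrated classifier dominate the consensus.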
65. Fabio A. González Universidad Nacional de Colombia
Deep learning for cancer
diagnosis
66. Fabio A. González Universidad Nacional de Colombia
Efficient DL over whole slide
pathology images
~2000x2000 pixels
71. Fabio A. González Universidad Nacional de Colombia
Exudate detection in eye
fundus images
75. Fabio A. González Universidad Nacional de Colombia
RNN for book genre
classification
Dataset construction: get top tags from users → annotated dataset

No. | Class           | Tags
1   | science_fiction | sci-fi, science-fiction
2   | comedy          | comedies, comedy, humor
... | ...             | ...
9   | religion        | christian, religion, christianity, ...
76. Fabio A. González Universidad Nacional de Colombia
RNN for book genre
classification
RNN Architecture
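A minimal Elman-style RNN classifier of the general kind used for this task can be sketched in NumPy; all matrices, sizes and names here are illustrative placeholders, not the actual model:

```python
import numpy as np

def rnn_classify(tokens, E, Wx, Wh, Wo):
    # Embed each token, update the recurrent hidden state across the
    # sequence, then softmax over genre classes at the final step.
    h = np.zeros(Wh.shape[0])
    for t in tokens:                    # tokens: list of vocabulary indices
        h = np.tanh(E[t] @ Wx + h @ Wh)
    logits = h @ Wo
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()                  # probabilities over the 9 genres
```

The output is a proper probability distribution over the genre classes; training would fit `E`, `Wx`, `Wh` and `Wo` by backpropagation through time.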
77. Fabio A. González Universidad Nacional de Colombia
Deep CNN-RNN for object
tracking in video
79. Fabio A. González Universidad Nacional de Colombia
Deep CNN-RNN for object
tracking in video
Number of sequences in different
public datasets:
• VOT2013: 16
• VOT2014: 25
• VOT2015: 60
• OOT: 50
• ALOV++: 315
85. Fabio A. González Universidad Nacional de Colombia
The Team
Alexis Carrillo
Andrés Esteban Paez
Angel Cruz
Andrés Castillo
Andrés Jaque
Andrés Rosso
Camilo Pino
Claudia Becerra
Fabián Paez
Felipe Baquero
Fredy Díaz
Gustavo Bula
Germán Sosa
Hugo Castellanos
Ingrid Suárez
John Arévalo
Jorge Vanegas
Jorge Camargo
Jorge Mario Carrasco
Joseph Alejandro Gallego
José David Bermeo
Juan Carlos Caicedo
Juan Sebastián Otálora
Katherine Rozo
Lady Viviana Beltrán
Lina Rosales
Luis Alejandro Riveros
Miguel Chitiva
Óscar Paruma
Óscar Perdomo
Raúl Ramos
Roger Guzmán
Santiago Pérez
Sergio Jiménez
Susana Sánchez
Sebastián Sierra