Deep Learning: Neural Networks Reborn
Fabio A. González
Universidad Nacional de Colombia
Some history
https://www.youtube.com/watch?v=cNxadbrN_aI

Rosenblatt's Perceptron (1957)
• Input: 20x20 photocell array
• Weights implemented with potentiometers
• Weight updating performed by electric motors
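As a modern software illustration of the learning rule Rosenblatt's machine implemented in hardware (the toy data below is made up; only the 20x20 input size comes from the slide), here is a minimal perceptron sketch:

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Rosenblatt's rule: w <- w + lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])          # one weight per photocell
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            y_hat = 1.0 if x @ w + b > 0 else 0.0
            w += lr * (target - y_hat) * x
            b += lr * (target - y_hat)
    return w, b

# Toy data: 400 binary inputs, matching the 20x20 photocell array.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 400)).astype(float)
y = X[:, 0]                           # a linearly separable toy target
w, b = perceptron_train(X, y)
```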
Neural networks time line
1943  1957  1969  1986  1995  2007  2012  2016
Backpropagation
Source: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
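The linked page is an interactive tutorial; as a sketch of the core idea (not the tutorial's own code), here is one backpropagation step for a one-hidden-layer network with a squared-error loss, with illustrative sizes:

```python
import numpy as np

def forward_backward(x, t, W1, W2):
    """One backprop step for a 2-layer net with sigmoid hidden units."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))   # hidden activations
    y = W2 @ h                            # linear output
    err = y - t                           # dL/dy for L = 0.5*||y - t||^2
    dW2 = np.outer(err, h)                # output-layer gradient
    dh = W2.T @ err * h * (1.0 - h)       # error propagated backwards
    dW1 = np.outer(dh, x)                 # hidden-layer gradient
    return dW1, dW2

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
x, t = rng.normal(size=3), rng.normal(size=2)
dW1, dW2 = forward_backward(x, t, W1, W2)
W1 -= 0.1 * dW1                           # gradient-descent update
W2 -= 0.1 * dW2
```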
Colombian local neural networks history
UNNeuro (1993, UN)
Deep Learning

Deep learning boom
Deep learning time line
1943  1957  1969  1986  1995  2007  2012  2016
Milestone highlighted on these slides: Hinton, G.E., Osindero, S., & Teh, Y.-W. "A Fast Learning Algorithm for Deep Belief Nets". Neural Computation 18, 1527–1554 (2006).
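The 2006 milestone cited above trains deep belief nets greedily, one layer at a time, using contrastive divergence. A much-simplified sketch of one CD-1 update for a binary restricted Boltzmann machine (biases omitted for brevity; sizes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, lr=0.1):
    """One CD-1 update for a binary RBM (weight matrix only)."""
    ph0 = sigmoid(v0 @ W)                     # P(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0  # sample hidden units
    pv1 = sigmoid(h0 @ W.T)                   # reconstruct visible units
    ph1 = sigmoid(pv1 @ W)                    # hidden probs on reconstruction
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    return W

W = 0.01 * rng.normal(size=(64, 16))          # 64 visible, 16 hidden units
v = (rng.random(64) > 0.5) * 1.0              # one binary training vector
W = cd1_step(v, W)
```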
Deep learning model won ILSVRC 2012 challenge
1.2 million images
1,000 concepts
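The winning 2012 model (AlexNet) was a convolutional network. At its core is a 2-D convolution that slides learned filters over the image; a minimal single-channel sketch (no padding, stride 1, purely illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one filter."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)    # a vertical-edge detector
rng = np.random.default_rng(6)
feature_map = conv2d(rng.normal(size=(8, 8)), edge_filter)
```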
Deep learning recipe
• Data
• HPC
• Algorithms
• Tricks
• Feature learning
• Size
Feature learning
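One classic way to learn features rather than hand-engineer them is an autoencoder: a network trained to reconstruct its input, whose hidden layer then serves as a learned representation. A minimal NumPy sketch (sizes and data are made up; this is one generic approach, not necessarily the one on the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(256, 64))            # toy data: 256 samples, 64 features

n_hidden = 16
We = rng.normal(scale=0.1, size=(n_hidden, 64))   # encoder weights
Wd = rng.normal(scale=0.1, size=(64, n_hidden))   # decoder weights

lr = 0.01
for _ in range(200):
    H = np.tanh(X @ We.T)                 # encode: learned features
    X_hat = H @ Wd.T                      # decode: reconstruction
    err = X_hat - X                       # reconstruction error
    dWd = err.T @ H / len(X)
    dH = err @ Wd * (1.0 - H**2)          # backprop through tanh
    dWe = dH.T @ X / len(X)
    We -= lr * dWe
    Wd -= lr * dWd

features = np.tanh(X @ We.T)  # representation for a downstream classifier
```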
Deep → Bigger
Revolution of Depth: ImageNet classification top-5 error (%)
ILSVRC'10              shallow      28.2
ILSVRC'11              shallow      25.8
ILSVRC'12  AlexNet     8 layers     16.4
ILSVRC'13              8 layers     11.7
ILSVRC'14  VGG         19 layers     7.3
ILSVRC'14  GoogleNet   22 layers     6.7
ILSVRC'15  ResNet      152 layers    3.57
Source: Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". arXiv 2015.
Data…
• Images annotated with WordNet concepts
• Concepts: 21,841
• Images: 14,197,122
• Bounding box annotations: 1,034,908
• Crowdsourcing
HPC
Algorithms
• Backpropagation
• Backpropagation through time
• Online learning (stochastic gradient descent; sketched below)
• Softmax (hierarchical)
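Of these, stochastic gradient descent is the workhorse: it updates parameters on small random mini-batches instead of the full dataset. A generic sketch, with a least-squares gradient plugged in as a placeholder loss:

```python
import numpy as np

def sgd(grad_fn, w, X, y, lr=0.01, epochs=10, batch=32):
    """Mini-batch stochastic gradient descent over (X, y)."""
    n = len(X)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            w -= lr * grad_fn(w, X[b], y[b])
    return w

# Example: least-squares regression, grad = X^T (Xw - y) / m.
grad_fn = lambda w, Xb, yb: Xb.T @ (Xb @ w - yb) / len(Xb)
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.01 * rng.normal(size=500)
w = sgd(grad_fn, np.zeros(10), X, y, epochs=50)
```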
Tricks
• DL is mainly an engineering problem
• DL networks are hard to train
• Several tricks, the product of years of experience (a few are sketched after this list):
  • Layer-wise training
  • ReLU units
  • Dropout
  • Adaptive learning rates
  • Initialization
  • Preprocessing
  • Gradient norm clipping
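Several of these tricks fit in a few lines each. The sketch below shows generic versions of four of them (ReLU units, inverted dropout, He-style initialization, and gradient norm clipping); it is illustrative, not the exact recipe from the talk:

```python
import numpy as np

rng = np.random.default_rng(4)

def he_init(fan_in, fan_out):
    """Initialization scaled to keep ReLU activations well-conditioned."""
    return rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

def relu(z):
    return np.maximum(0.0, z)

def dropout(h, p=0.5, training=True):
    """Inverted dropout: scale at train time so inference needs no change."""
    if not training:
        return h
    mask = rng.random(h.shape) > p
    return h * mask / (1.0 - p)

def clip_gradient(g, max_norm=5.0):
    """Rescale the gradient if its norm exceeds max_norm."""
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g

W = he_init(64, 32)
h = dropout(relu(W @ rng.normal(size=64)))
```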
Applications
• Computer vision:
  • Image: annotation, detection, segmentation, captioning
  • Video: object tracking, action recognition, segmentation
• Speech recognition and synthesis
• Text: language modeling, word/text representation, text classification, translation
• Biomedical image analysis
Deep Learning at MindLab
Feature learning for cancer diagnosis
Arevalo, J., Cruz-Roa, A., Arias, V., Romero, E., & González, F.A. "An unsupervised feature learning framework for basal cell carcinoma image analysis". Artificial Intelligence in Medicine 64 (2015) 131–145.
• Unsupervised feature learning for histopathology images: local (patch) representations via sparse autoencoders, reconstruct ICA, or topographic ICA; a global (image) representation via bag-of-features or a convolutional network; and a visual interpretation layer that highlights the most discriminative regions
• A convolutional auto-encoder yields visually interpretable predictions analogous to a digital stain, identifying the image regions most relevant for diagnostic decisions
• 98.1% area under the ROC curve for basal cell carcinoma (BCC) detection, outperforming the state-of-the-art discrete cosine transform patch-based representation by 7%
Deep learning for cancer diagnosis
Cascaded Ensemble of Convolutional Neural Networks…
Efficient DL over whole slide pathology images
~2000x2000 pixels
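Whole-slide images at this resolution are far larger than a typical CNN input, so a common strategy is to tile the slide into patches and run a patch-level model; the extracted text does not say which method the slides use, so the sketch below is just that generic tiling step:

```python
import numpy as np

def extract_patches(slide, patch=256, stride=256):
    """Tile a large image into non-overlapping patches for a patch-level CNN."""
    h, w = slide.shape[:2]
    patches, coords = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(slide[y:y + patch, x:x + patch])
            coords.append((y, x))
    return np.stack(patches), coords

rng = np.random.default_rng(7)
slide = rng.random((2000, 2000))          # ~2000x2000 pixels, as on the slide
patches, coords = extract_patches(slide)  # 49 patches of 256x256
```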
Exudate detection in eye fundus images
RNN for book genre classification
Dataset construction: annotated dataset…
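The slide details are truncated in the extraction, but the model class is a recurrent network over token sequences. A minimal sketch of the recurrence such a classifier builds on (the embedding table, sizes, and 5-genre output are all hypothetical):

```python
import numpy as np

def rnn_classify(tokens, E, Wx, Wh, Wo):
    """Run a plain RNN over a token sequence and classify its final state."""
    h = np.zeros(Wh.shape[0])
    for t in tokens:
        h = np.tanh(Wx @ E[t] + Wh @ h)   # update hidden state per token
    logits = Wo @ h
    return np.exp(logits) / np.sum(np.exp(logits))  # softmax over genres

rng = np.random.default_rng(8)
E = rng.normal(size=(1000, 32))           # embeddings for a 1000-word vocabulary
Wx = rng.normal(scale=0.1, size=(64, 32))
Wh = rng.normal(scale=0.1, size=(64, 64))
Wo = rng.normal(scale=0.1, size=(5, 64))  # e.g. 5 genres (illustrative)
probs = rnn_classify([3, 17, 256, 999], E, Wx, Wh, Wo)
```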
Deep CNN-RNN for object tracking in video
Number of sequences in different…
The Team
Alexis Carrillo
Andrés Esteban Paez
Angel Cruz
Andrés Castillo...
Thank you!
fagonzalezo@unal.edu.co
http://mindlaboratory.org
Deep learning has revolutionized the landscape of machine learning in particular, and of artificial intelligence in general. Deep neural network models (with a large number of layers) have enabled major advances in a variety of learning, perception, and data analysis tasks, ranging from image classification to speech recognition.

The talk presents, in general terms, the foundations of these models and several application cases in representation learning, computer vision, and text analysis, among others. It reviews the theoretical and technological advances that have made it possible to tackle these complex problems, and discusses the technological and scientific experience from research projects carried out in Colombia.

Presenter: Fabio González. Full Professor in the Department of Systems and Industrial Engineering at Universidad Nacional de Colombia, where he leads the Machine Learning, Perception and Discovery Lab (MindLab). His research focuses on machine learning, information retrieval, and computer vision, with applications in fields as diverse as medical image analysis, automatic text analysis, and learning from multimodal information.

  59. 59. Fabio A. González Universidad Nacional de Colombia Feature learning for cancer diagnosis Artificial Intelligence in Medicine journal homepage: www.elsevier.com/locate/aiim An unsupervised feature learning framework for basal cell carcinoma image analysis John Arevaloa , Angel Cruz-Roaa , Viviana Ariasb , Eduardo Romeroc , Fabio A. Gonzáleza,∗ a Machine Learning, Perception and Discovery Lab, Systems and Computer Engineering Department, Universidad Nacional de Colombia, Faculty of Engineering, Cra 30 No 45 03-Ciudad Universitaria, Building 453 Office 114, Bogotá DC, Colombia b Pathology Department, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria, Bogotá DC, Colombia c Computer Imaging & Medical Applications Laboratory, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria, Bogotá DC, Colombia a r t i c l e i n f o Article history: a b s t r a c t Objective: The paper addresses the problem of automatic detection of basal cell carcinoma (BCC) . Local learned features with different unsupervised feature learning methods. Left: sparse autoencoders learned some features that are visually re ies of the histopathology dataset (i.e. nuclei shapes) which are highlighted by red squares. Center: reconstruct independent component analysis. R onent analysis highlighting some translational (blue), color (red), scale (yellow) and rotational (green) invariances. (For interpretation of the referen e legend, the reader is referred to the web version of this article.) . Discriminant map of learned features. Enclosed with red path are related with positive (basal cell carcinoma) class and enclosed by blue path are rela thy tissue) class. Left: softmax weights mapped back to topographic organization. Center: features learned with topographic component analysis. Rig minative features for positive (top) and negative (bottom) classes. (For interpretation of the references to color in this figure legend, the reader is re on of this article.) size is not very significant in terms of time for feature extrac- process. Excluding SAE second layer, any feature extraction el proposed here takes less than 2 s to extract features from 4.9. Overall BCC classification results Table 4 summarizes the systematic evaluation per
  60. 60. Fabio A. González Universidad Nacional de Colombia Feature learning for cancer diagnosis Artificial Intelligence in Medicine journal homepage: www.elsevier.com/locate/aiim An unsupervised feature learning framework for basal cell carcinoma image analysis John Arevaloa , Angel Cruz-Roaa , Viviana Ariasb , Eduardo Romeroc , Fabio A. Gonzáleza,∗ a Machine Learning, Perception and Discovery Lab, Systems and Computer Engineering Department, Universidad Nacional de Colombia, Faculty of Engineering, Cra 30 No 45 03-Ciudad Universitaria, Building 453 Office 114, Bogotá DC, Colombia b Pathology Department, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria, Bogotá DC, Colombia c Computer Imaging & Medical Applications Laboratory, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria, Bogotá DC, Colombia a r t i c l e i n f o Article history: a b s t r a c t Objective: The paper addresses the problem of automatic detection of basal cell carcinoma (BCC) 142 J. Arevalo et al. / Artificial Intelligence in Medicine 64 (2015) 131–145 Table 2 Outputs produced by the system for different cancer and non-cancer input images. The table rows show from top to bottom: the real image class predicted by the model, the probability associated to the prediction, and the digital stained image (red stain indicates cancer regions, blue stain in
  61. 61. Fabio A. González Universidad Nacional de Colombia Feature learning for cancer diagnosis Artificial Intelligence in Medicine journal homepage: www.elsevier.com/locate/aiim An unsupervised feature learning framework for basal cell carcinoma image analysis John Arevaloa , Angel Cruz-Roaa , Viviana Ariasb , Eduardo Romeroc , Fabio A. Gonzáleza,∗ a Machine Learning, Perception and Discovery Lab, Systems and Computer Engineering Department, Universidad Nacional de Colombia, Faculty of Engineering, Cra 30 No 45 03-Ciudad Universitaria, Building 453 Office 114, Bogotá DC, Colombia b Pathology Department, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria, Bogotá DC, Colombia c Computer Imaging & Medical Applications Laboratory, Universidad Nacional de Colombia, Faculty of Medicine, Cra 30 No 45 03-Ciudad Universitaria, Bogotá DC, Colombia a r t i c l e i n f o Article history: a b s t r a c t Objective: The paper addresses the problem of automatic detection of basal cell carcinoma (BCC)
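Stage (1) of the framework learns patch-level features without labels. As one plausible illustration (not the authors' implementation; patch size, hidden size, and sparsity weight are assumptions), a sparse autoencoder can be sketched in a few lines of PyTorch:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, n_input=8 * 8 * 3, n_hidden=400):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_input, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_input)

    def forward(self, x):
        h = self.encoder(x)        # learned patch representation
        return self.decoder(h), h

model = SparseAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

patches = torch.rand(256, 8 * 8 * 3)   # stand-in for unlabeled image patches
recon, h = model(patches)
# reconstruction loss plus an L1 penalty that encourages sparse activations
loss = nn.functional.mse_loss(recon, patches) + 1e-3 * h.abs().mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After training, the encoder activations `h` serve as the local features that the later stages aggregate into an image-level representation.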
  62. 62. Fabio A. González Universidad Nacional de Colombia Deep learning for cancer diagnosis — H. Wang*, A. Cruz-Roa*, A. Basavanhally, H. Gilmore, N. Shih, M. Feldman, J. Tomaszewski, F. González, A. Madabhushi, "Cascaded Ensemble of Convolutional Neural Networks and Handcrafted Features for Mitosis Detection" (Case Western Reserve University; Universidad Nacional de Colombia; University of Pennsylvania; University at Buffalo School of Medicine and Biomedical Sciences). Abstract: Breast cancer (BCa) grading plays an important role in predicting disease aggressiveness and patient outcome. A key component of BCa grade is the mitotic count, which quantifies the number of cells in the process of dividing (i.e., undergoing mitosis) at a specific point in time. Mitosis counting is currently done manually by a pathologist examining multiple high-power fields on a glass slide under a microscope, an extremely laborious and time-consuming process. Automated detection of mitotic nuclei, while highly desirable, is confounded by the highly variable shape and appearance of mitoses. Existing methods use either handcrafted features that capture morphological, statistical, or textural attributes of mitoses, or features learned with convolutional neural networks (CNN). While handcrafted features are inspired by the domain and the particular application, the data-driven CNN models…
  63. 63. Fabio A. González Universidad Nacional de Colombia Deep learning for cancer diagnosis — Results. Including the time needed to extract handcrafted features (6.5 hours in a pure MATLAB implementation), the HC+CNN training stage completed in under 18 hours. Evaluation results for mitosis detection on the ICPR12 dataset (Aperio scanner):

  Method    TP  FP  FN  Precision  Recall  F-measure
  HC+CNN    65  12  35  0.84       0.65    0.7345
  HC        64  22  36  0.74       0.64    0.6864
  CNN       53  32  47  0.63       0.53    0.5730
  IDSIA13   70   9  30  0.89       0.70    0.7821
  IPAL4     74  32  26  0.70       0.74    0.7184
  SUTECH    72  31  28  0.70       0.72    0.7094
  NEC12     59  20  41  0.75       0.59    0.6592

  [Figure 3: mitoses identified by HC+CNN as TP (green circles), FN (yellow circles), and FP (red circles) on ICPR12. The TP examples have distinctive intensity, shape, and texture; the FN examples are less distinctive in intensity and shape; the FP examples look more like mitotic figures than the FNs do.] On the AMIDA13 dataset, the approach (CCIPD/MINDLAB) reaches an F-measure of 0.319, ranking 6th among 14 submissions.
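The table's precision, recall, and F-measure columns follow directly from the TP/FP/FN counts. A small Python check against the HC+CNN row (TP=65, FP=12, FN=35):

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp)                            # fraction of detections that are real mitoses
    recall = tp / (tp + fn)                               # fraction of real mitoses that are detected
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure

print(prf(65, 12, 35))  # -> (0.8441..., 0.65, 0.7344...), matching the table up to rounding
```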
  64. 64. Fabio A. González Universidad Nacional de Colombia Deep learning for cancer diagnosis — Workflow [Figure 1: HPF → segmentation → handcrafted features / feature learning → classifiers 1–3 → probabilistic fusion]. Blue-ratio thresholding is first applied to segment mitosis candidates. On each segmented blob, handcrafted features are extracted and classified with a Random Forests classifier. Meanwhile, on each segmented 80 × 80 patch, a convolutional neural network (CNN) is trained with a fully connected regression model as part of the classification layer. For candidates that are difficult to classify (ambiguous CNN output), a second-stage Random Forests classifier is trained on the combination of CNN-derived and handcrafted features. The final decision is obtained via a consensus of the three classifiers' predictions.
  65. 65. Fabio A. González Universidad Nacional de Colombia Deep learning for cancer diagnosis [workflow diagram and detection examples from the same paper]
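The cascade idea — accept confident CNN decisions, and route only ambiguous candidates to a second-stage Random Forest over combined features — can be sketched as follows. This is an illustration of the concept, not the paper's code; the thresholds, feature dimensions, and synthetic data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
cnn_prob = rng.random(n)              # stand-in for CNN mitosis probabilities per candidate
handcrafted = rng.random((n, 15))     # stand-in morphology/texture/intensity features
labels = rng.integers(0, 2, n)        # stand-in ground truth

# Stage 1: keep the CNN's decision where it is confident
confident = (cnn_prob < 0.2) | (cnn_prob > 0.8)
decisions = (cnn_prob > 0.5).astype(int)

# Stage 2: Random Forest on CNN-derived plus handcrafted features
features = np.column_stack([cnn_prob, handcrafted])
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(features, labels)              # in practice, fit on a labeled training split
decisions[~confident] = rf.predict(features[~confident])
```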
  66. 66. Fabio A. González Universidad Nacional de Colombia Efficient DL over whole slide pathology images ~2000x2000 pixels
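A common way to make deep learning tractable on images this large (and larger) is to tile the slide into patches, classify each patch with a CNN, and reassemble the outputs into a probability map. The sketch below shows that pattern; the patch size, stride, and toy model are illustrative assumptions, not the slides' method.

```python
import torch
import torch.nn as nn

patch, stride = 224, 224
cnn = nn.Sequential(                    # stand-in patch classifier
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1), nn.Sigmoid(),
)

image = torch.rand(3, 2048, 2048)       # stand-in slide region
h = (image.shape[1] - patch) // stride + 1
w = (image.shape[2] - patch) // stride + 1
prob_map = torch.zeros(h, w)            # one cancer probability per tile
with torch.no_grad():
    for i in range(h):
        for j in range(w):
            tile = image[:, i*stride:i*stride+patch, j*stride:j*stride+patch]
            prob_map[i, j] = cnn(tile.unsqueeze(0)).item()
```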
  71. 71. Fabio A. González Universidad Nacional de Colombia Exudate detection in eye fundus images
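Lesion detectors of this kind typically end with a post-processing step that turns a per-pixel probability map into discrete candidate exudates. A plausible sketch with SciPy follows; the probability map is synthetic and the threshold and minimum-size values are assumptions, not values from the slides.

```python
import numpy as np
from scipy import ndimage

prob_map = np.random.rand(512, 512)        # stand-in per-pixel exudate probabilities
mask = prob_map > 0.9                      # keep only high-confidence pixels
labeled, n = ndimage.label(mask)           # group adjacent pixels into candidate lesions
sizes = ndimage.sum(mask, labeled, range(1, n + 1))
keep = np.isin(labeled, np.flatnonzero(sizes >= 20) + 1)  # drop tiny specks
print(f"{int(keep.sum())} pixels in {int((sizes >= 20).sum())} candidate exudates")
```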
  75. 75. Fabio A. González Universidad Nacional de Colombia RNN for book genre classification — Dataset construction: an annotated dataset built from the top tags assigned by users.

  No.  Class            Tags
  1    science_fiction  sci-fi, science-fiction
  2    comedy           comedies, comedy, humor
  ...  ...              ...
  9    religion         christian, religion, christianity, ...

  RNN Architecture
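The slides do not detail the RNN architecture, but a minimal text classifier of this kind — an embedding layer, an LSTM over the token sequence, and a softmax over the 9 genre classes — can be sketched as follows. Vocabulary size and layer dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GenreRNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden=256, n_classes=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len) integer ids
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.out(h_n[-1])                # genre logits from the final hidden state

model = GenreRNN()
tokens = torch.randint(0, 20000, (4, 200))      # stand-in batch of tokenized book texts
logits = model(tokens)                          # shape (4, 9); argmax gives the genre
```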
  77. 77. Fabio A. González Universidad Nacional de Colombia Deep CNN-RNN for object tracking in video
  79. 79. Fabio A. González Universidad Nacional de Colombia Deep CNN-RNN for object tracking in video — Number of sequences in different public datasets: • VOT2013: 16 • VOT2014: 25 • VOT2015: 60 • OOT: 50 • ALOV++: 315
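The slides give no architectural detail, but one plausible CNN-RNN tracker shape — a CNN that encodes each frame, an LSTM that carries state across frames, and a head that regresses the target's bounding box per frame — can be sketched as follows. All layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class CNNRNNTracker(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-frame feature extractor
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),      # -> 64*4*4 = 1024 features
        )
        self.rnn = nn.LSTM(1024, hidden, batch_first=True)  # temporal state across frames
        self.box = nn.Linear(hidden, 4)                 # bounding box (x, y, w, h)

    def forward(self, frames):                          # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.box(out)                            # one box per frame

tracker = CNNRNNTracker()
video = torch.rand(2, 8, 3, 128, 128)                   # stand-in clips: 2 videos x 8 frames
boxes = tracker(video)                                  # shape (2, 8, 4)
```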
  85. 85. Fabio A. González Universidad Nacional de Colombia The Team Alexis Carrillo Andrés Esteban Paez Angel Cruz Andrés Castillo Andrés Jaque Andrés Rosso Camilo Pino Claudia Becerra Fabián Paez Felipe Baquero Fredy Díaz Gustavo Bula Germán Sosa Hugo Castellanos Ingrid Suárez John Arévalo Jorge Vanegas Jorge Camargo Jorge Mario Carrasco Joseph Alejandro Gallego José David Bermeo Juan Carlos Caicedo Juan Sebastián Otálora Katherine Rozo Lady Viviana Beltrán Lina Rosales Luis Alejandro Riveros Miguel Chitiva Óscar Paruma Óscar Perdomo Raúl Ramos Roger Guzmán Santiago Pérez Sergio Jiménez Susana Sánchez Sebastián Sierra
  86. 86. Thank you! fagonzalezo@unal.edu.co http://mindlaboratory.org
