Cinvestav Guadalajara
Neural Networks and Deep Learning
Andres Mendez Vazquez, PhD
April 2, 2019
1 Overview
Neural Networks are computational models inspired by the structure of biological neural networks. They are widely used to estimate or approximate functions that depend on a large number of inputs and are generally unknown. Like other learning methods, Neural Networks have been used to solve a wide variety of tasks, from computer vision to speech recognition. In recent years (circa 2009 to the present), major improvements have been made in recurrent neural networks, deep feed-forward neural networks, and hardware implementations of neural networks. These new paradigms have taken Neural Networks into remarkable new fields of application, making it necessary for any practitioner of intelligent systems to be proficient in them.
2 Course Objectives
This is a theoretical and practical 65-hour course that introduces students to the concepts of neural networks. The
emphasis is on the various neural network architectures and their use for data analysis. Students will develop an
understanding of the process of training neural networks and its issues. We will look at applications in:
1. Image Recognition
2. Time series
3. Reinforcement learning
4. Basic Natural Language Processing
5. Other related applications
3 Prerequisites
Linear algebra, machine learning, probability and analysis of algorithms.
4 Class Grading
Grades for the class will be assigned as follows:

Task               Percentage
Midterm I          15%
Midterm II         15%
Final              15%
Homework (4)       20%
Project            35%
5 Course Topics
5.1 Traditional Neural Networks
1. Introduction [1, 2, 3]
(a) Definition
(b) Models of a Neuron
(c) Neural networks as directed graphs
(d) Knowledge representation
2. Learning Process [1, 2, 3]
(a) Error-correction learning
(b) Memory-based learning
(c) Hebbian learning
(d) Competitive learning
3. Single Layer Perceptrons [1, 2, 3] (see the code sketch at the end of this subsection)
(a) Introduction
(b) Basic Theory
(c) Perceptron model
(d) Perceptron convergence theorem
4. Multilayer Perceptron [1, 2, 3]
(a) Introduction
(b) XOR problem
(c) Architecture
(d) Back-propagation algorithm
(e) Virtues and limitations of back-propagation
5. Hopfield Networks [2, 4]
(a) Energy Based Models
(b) Training Strategies
i. Hebbian Learning
ii. Storkey Learning Rule
6. Radial-Basis Function Networks [1, 2, 5]
(a) Introduction
(b) Cover’s theorem
(c) Regularization theory
(d) Generalized Radial-Basis function networks
(e) Comparison with Multilayer Perceptrons
(f) Learning Strategies
i. Recursive Least-Square Estimation
ii. Forward Selection Strategies
7. Self-Organizing Maps [1]
(a) Introduction
(b) Two Basic Feature-Mapping Models
(c) Self-Organizing Map
(d) Properties of the Feature Map
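As a companion to topic 3 above (the perceptron model, error-correction learning, and the convergence theorem), the following is a minimal NumPy sketch of the perceptron learning rule on a small, linearly separable toy problem. The data set, learning rate, and epoch count are illustrative assumptions and are not part of the course material.

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    """Single-layer perceptron with targets y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified sample
                w += lr * yi * xi               # error-correction update
                b += lr * yi
    return w, b

# Linearly separable toy data: the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # should print [-1. -1. -1.  1.] once training has converged

Because the data are linearly separable, the perceptron convergence theorem guarantees that the error-correction updates stop after a finite number of corrections.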
5.2 Deep Learning
1. Introduction [6]
2. Deep Feed Forward Networks [6] (see the code sketch at the end of this subsection)
(a) The XOR problem
(b) Gradient-Based Learning
(c) Architecture Design
(d) Back-Propagation and Other Differentiation Algorithms
(e) The Vanishing Gradient Problem
3. Optimization for Training Deep Models [7, 8, 9, 10, 11, 12, 6, 13, 14, 15]
(a) Basic Algorithms
i. Stochastic Gradient Descent
ii. Comparison of Gradient Descent with Stochastic Gradient Descent
iii. Automatic Differentiation
(b) Batch Normalization
(c) Parameter Initialization Strategies
(d) Algorithms with Adaptive Learning Rates
(e) Approximate Second-Order Methods
(f) Sampling and Monte Carlo Methods
4. Regularization for Deep Learning [6, 16, 17, 18, 19, 1]
(a) Norm Penalties as Constrained Optimization
(b) Regularization and Under-Constrained Problems
(c) Noise Robustness
(d) Early Stopping
(e) Sparse Representations
(f) Adversarial Training
5. Convolutional Networks [20, 6, 21]
(a) The Convolution Operation
(b) Convolution and Pooling as an Infinitely Strong Prior
(c) Variants of the Basic Convolution Function
(d) Efficient Convolution Algorithms
6. Recurrent and Recursive Nets [22, 18, 6, 1]
(a) Introduction
(b) Recurrent Neural Networks
(c) Deep Recurrent Networks
(d) Recursive Neural Networks
(e) The Long Short-Term Memory
7. Auto-encoders [6, 23, 24, 25, 26]
(a) Purpose
(b) Structure
(c) Variations
i. Denoising autoencoder
ii. Sparse autoencoders
iii. Undercomplete autoencoders
iv. Variational autoencoder
8. Deep Boltzmann Machines [6, 27, 28, 29]
(a) Introduction
(b) Restricted Boltzmann Machines
(c) Deep Boltzmann Machines
(d) Boltzmann Machines for Real-Valued Data
References
[1] S. Haykin, Neural Networks and Learning Machines. Prentice Hall, 2009.
[2] R. Rojas, Neural networks: a systematic introduction. Springer Science & Business Media, 2013.
[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Second Edition. Springer Series in Statistics, Springer New York, 2009.
[4] H. Wang and B. Raj, “A survey: Time travel in deep learning space: An introduction to deep learning models
and how deep learning models evolved from the initial ideas,” arXiv preprint arXiv:1510.04781, 2015.
[5] L. Zhang and P. N. Suganthan, “A survey of randomized algorithms for training neural networks,” Information
Sciences, vol. 364, pp. 146–155, 2016.
[6] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[7] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010,
pp. 177–186, Springer, 2010.
[8] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimiza-
tion,” Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2121–2159, 2011.
[9] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[10] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, “On optimization methods for deep learn-
ing,” in Proceedings of the 28th International Conference on International Conference on Machine Learning,
pp. 265–272, Omnipress, 2011.
[11] J. A. Snyman, Practical mathematical optimization. Springer, 2018.
[12] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning,” SIAM Review,
vol. 60, no. 2, pp. 223–311, 2018.
[13] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, “Automatic differentiation in machine learning:
a survey,” Journal of Machine Learning Research, vol. 18, no. 153, pp. 1–43, 2018.
[14] A. Griewank and A. Walther, Evaluating derivatives: principles and techniques of algorithmic differentiation,
vol. 105. SIAM, 2008.
[15] A. Wibisono, A. C. Wilson, and M. I. Jordan, “A variational perspective on accelerated methods in optimization,”
Proceedings of the National Academy of Sciences, vol. 113, no. 47, pp. E7351–E7358, 2016.
[16] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking
generalization,” arXiv preprint arXiv:1611.03530, 2016.
[17] M. D. Zeiler and R. Fergus, “Stochastic pooling for regularization of deep convolutional neural networks,” arXiv
preprint arXiv:1301.3557, 2013.
[18] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” arXiv preprint
arXiv:1409.2329, 2014.
[19] M. J. Kochenderfer and T. A. Wheeler, Algorithms for Optimization. MIT Press, 2019.
[20] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation
applied to handwritten zip code recognition,” Neural computation, vol. 1, no. 4, pp. 541–551, 1989.
[21] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., “Gradient-based learning applied to document recognition,”
Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[22] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language
model,” in Eleventh Annual Conference of the International Speech Communication Association, 2010.
[23] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
[24] Y. Bengio et al., “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2,
no. 1, pp. 1–127, 2009.
[25] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, “Autoencoding beyond pixels using a learned
similarity metric,” in Proceedings of The 33rd International Conference on Machine Learning (M. F. Balcan and
K. Q. Weinberger, eds.), vol. 48 of Proceedings of Machine Learning Research, (New York, New York, USA),
pp. 1558–1566, PMLR, 20–22 Jun 2016.
[26] A. Makhzani and B. Frey, “K-sparse autoencoders,” arXiv preprint arXiv:1312.5663, 2013.
[27] E. Aarts and J. Korst, “Simulated annealing and Boltzmann machines,” 1988.
[28] H. Larochelle, M. Mandel, R. Pascanu, and Y. Bengio, “Learning algorithms for the classification restricted
Boltzmann machine,” Journal of Machine Learning Research, vol. 13, no. Mar, pp. 643–669, 2012.
[29] R. Salakhutdinov and G. Hinton, “Deep Boltzmann machines,” in Artificial Intelligence and Statistics, pp. 448–
455, 2009.