
Let’s learn deep

Introductory talk on deep learning, focused on building a broader understanding of the various terms used in the context of deep learning and neural networks.

Includes many examples of deep learning techniques applied to problems from the text-analysis domain.

  1. 1. LET’S LEARN DEEP SHUBHANSHU MISHRA @THESHUBHANSHU
  2. 2. Some Interesting Results Image Source: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/ Distributed representations of words and phrases and their compositionality T Mikolov, I Sutskever, K Chen, GS Corrado, J Dean - Advances in neural information processing systems, 2013
  3. 3. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems 26 3111–3119 (2013). Deep learning Y LeCun, Y Bengio, G Hinton - Nature, 2015
  4. 4. http://www.socher.org/uploads/Main/MultipleVectorWordEmbedding.png
  5. 5. Zou, Will Y., et al. "Bilingual Word Embeddings for Phrase-Based Machine Translation." EMNLP. 2013.
  6. 6. Paraphrase Detection Socher, Richard, et al. "Dynamic pooling and unfolding recursive autoencoders for paraphrase detection." Advances in Neural Information Processing Systems. 2011.
  7. 7. Socher, Richard; Perelygin, Alex; Y. Wu, Jean; Chuang, Jason; D. Manning, Christopher; Y. Ng, Andrew; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). EMNLP 2013. http://cs.stanford.edu/people/karpathy/deepimagesent/
  8. 8. Why Neural Networks? - The perceptron algorithm can ALWAYS learn to classify linearly separable samples. - BUT, how do we tackle non-linearity? Enter NEURAL NETWORKS - Add a non-linear transform to the data - ANNs with one hidden layer can approximate any continuous function [1,2] - Can be trained through BACKPROPAGATION (see the perceptron and small-network sketches after the slide list) http://cs231n.github.io/neural-networks-1/ [1] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems 2.4 (1989): 303-314. [2] http://neuralnetworksanddeeplearning.com/chap4.html
  9. 9. A simple Neural Network http://ufldl.stanford.edu/wiki/images/thumb/9/99/Network331.png/400px-Network331.png Y ≈ f(W, X); loss = H(f(W, X), Y); log loss = −y · log(f(W, X)); hinge loss = max(0, 1 − y · f(W, X)). Train it through back-propagation: W_t = W_{t−1} − l · ∂loss(W)/∂W (see the NumPy version of this update after the slide list)
  10. 10. Types of ANN: Vanilla Feed Forward NN https://class.coursera.org/neuralnets-2012-001/lecture Hinton, Geoffrey E. "Learning distributed representations of concepts."Proceedings of the eighth annual conference of the cognitive science society. Vol. 1. 1986.
  11. 11. https://class.coursera.org/neuralnets-2012-001/lecture
  12. 12. https://class.coursera.org/neuralnets-2012-001/lecture
  13. 13. Collobert, Ronan, et al. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011): 2493-2537. Example of multi-task learning with a NN. Task 1 and Task 2 are two tasks trained with the window approach architecture presented in Figure 1. Lookup tables as well as the first hidden layer are shared. The last layer is task specific. The principle is the same with more than two tasks.
  14. 14. AI Question Answering Example tasks: counting, compound coreference, factoid Q/A with supporting facts, reasoning about agents' motivations. Weston J, Bordes A, Chopra S, Mikolov T. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv preprint, 2015. Bordes A, Usunier N, Chopra S, Weston J. Large-scale Simple Question Answering with Memory Networks. arXiv preprint, 2015. Weston J, Chopra S, Bordes A. Memory Networks. International Conference on Learning Representations, 2015. http://arxiv.org/abs/1410.3916. 20 tasks in total; a system should solve all of them with no task-specific engineering. Memory Networks are used to solve these tasks; accuracy of ~42% beats the older benchmarks. http://www.thespermwhale.com/jaseweston/babi/abordes-ICLR.pdf
  15. 15. Types of ANN: Recurrent NN http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-shorttermdepdencies.png Learn sequential structures like sequences of characters, words, audio signals, etc. (see the minimal RNN step in NumPy after the slide list)
  16. 16. Types of ANN: Recursive NN http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/img/Bottou-Atree.png From Machine Learning to Machine Reasoning, Léon Bottou. Learn over arbitrary structures like parse trees.
  17. 17. Types of ANN: Convolutional Neural Nets http://colah.github.io/posts/2014-07-Conv-Nets-Modular/img/Conv-9-Conv2Max2Conv2.png Learn similar features in different parts of the input by sharing the same filter weights everywhere. Used heavily on image data because the same feature can appear in different parts of an image. (see the 1-D convolution sketch after the slide list)
  18. 18. Types of ANN: Auto Encoders From Machine Learning to Machine Reasoning, Léon Bottou http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/img/Bottou-unfold.png Learn to reconstruct the input (see the tiny autoencoder sketch after the slide list)
  19. 19. Types of ANN: RBMs and DBNs RBM: Restricted Boltzmann Machine DBN: Deep Belief Networks Generative graphical model Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. "Restricted Boltzmann machines for collaborative filtering." Proceedings of the 24th international conference on Machine learning. ACM, 2007.
  20. 20. What is Deep About Deep Learning? 1. Deep Belief Networks 2. RBMs, Auto Encoders 3. Convolutional Neural Networks 4. Stacked Auto Encoders Deeper NNs are helpful because some functions can be represented with a number of parameters that grows only polynomially, whereas a shallower network may need exponentially many parameters to represent the same functions. Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.
  21. 21. What is Deep Learning? Like a Lego building exercise: stack various model layers and propagate the error from the output of the architecture back to each layer. Solves the issue of feature selection. Captures non-linear relationships between features. Much easier to train a model on large data than to hand-craft features.
  22. 22. When Deep Learning? LARGE DATA LARGE COMPUTATIONAL RESOURCES USEFUL QUESTIONS
  23. 23. Why were Deep ANNs in the shadows? There were major challenges in training ANNs: ◦ Need large amounts of data to train (for better function approximation) ◦ Many weights to train (standard image-classification models have millions or billions of weights) ◦ Vanishing and exploding gradient problems (for deeper neural networks)
  24. 24. What changed? Algorithms for training ANNs: ◦ Stochastic Gradient Descent (with momentum) ◦ RMSProp ◦ Adam, AdaDelta Fixes for the vanishing and exploding gradient problems: ◦ LSTM, GRU units (for vanishing gradients) ◦ Gradient clipping (for exploding gradients) Methods to prevent overfitting: ◦ Regularization ◦ Dropout ◦ Adversarial networks Computation resources: ◦ GPU computing ◦ HPC, MPI Larger datasets: ◦ ImageNet (for image classification) ◦ Google Billion Words Corpus (for learning word vectors) Methods to gain sparsity: ◦ Dropout ◦ ReLU, MaxOut activations (momentum, gradient clipping, and dropout are sketched in NumPy after the slide list)
  25. 25. Machine Learning to Neural Networks MACHINE LEARNING METHODS Deterministic Models ◦ Linear Regression ◦ Logistic Regression ◦ SVM ◦ CRF Generative Models ◦ HMM ◦ LDA ◦ Collaborative Filtering Unsupervised ◦ K-means ◦ Hierarchical Clustering NEURAL NETWORK METHODS Deterministic Models ◦ ANN with squared-error loss ◦ ANN with softmax layer and log loss ◦ ANN with hinge loss ◦ RNN with prediction at the end Generative Models ◦ RNN generating sequences ◦ RBMs Unsupervised ◦ Auto Encoders ◦ RBMs ◦ Deep Belief Networks
  26. 26. A LITTLE MATH (OPTIONAL)
  27. 27. Loss Functions & Optimization RMSProp, AdaGrad, and AdaDelta are used in high-performance networks. The idea is: for some f(W, X), minimize the loss between y and f(W, X). This is done using a loss function; a major one is log-loss. (see the RMSProp-style update sketched after the slide list)
  28. 28. Open Questions Autoencoders for text data AI Question Answering Sarcasm Sentiment analysis
  29. 29. Collaborate SEMEVAL 2016 is coming up and there are tasks like ◦ Sentiment analysis ◦ Question Answering ◦ http://alt.qcri.org/semeval2016/task4/
  30. 30. the didbend first water. bond warmerial in roid. the lagents to dutter sprantess i harkian, arow ... with enkyber fanter-indoug tood cool... the summer small winding skates the moutled day marked gly searl. doupy of it your sold all ic house bat she - etther of thouder fol my old stars gream trains ond cat out the song"saur and shide of gres dewill a now centher mother of at, the creaking passs cool sunsing sapcingatale dowthing aland sun caking in. do a back-end stliagh in in ithicn like into where so to the touther pate patin on' gal on the aloopme saterfleoss the sound i lean I and he had begetter by His husband, brought unto a hundred cruelings, shrouded me, pierced Arjuna, on thy foe, proud directions and urged by Satyaki in the heart as the filled hill with his flying poison. Unto thy host, called Earth, recognise him, by means of her abode, 'Thou shalt conquer thy car is in all kinds of righteousness. Whatever I is filled with respect. In thee enjoyment will iniunto that Kshatriya enjoys verily to that as to him that I have now take for me of Kuru's race.'" SECTION LXXXVIII "Drona said, 'Renounced still, thou art my great science and foreholder, thou wilt, O best of men, go now, may be said to be Pandu. Persons of fooly acts also may injury With regions of entirety? Thou art the deteriory
  31. 31. THANK YOU =) MANY OF THE RESOURCES USED CAN BE FOUND AT: HTTP://SHUBHANSHU.COM/DEEPLEARNING.HTML
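The sketches below expand on a few of the slides above; they are illustrative code written for this writeup, not material from the talk. First, slide 8's claim that the perceptron can always learn to classify linearly separable samples, as a minimal NumPy perceptron; the toy data, number of epochs, and initialization are assumptions made here.

```python
import numpy as np

# Toy linearly separable data: points with x1 + x2 > 1 get label +1, others -1.
X = np.array([[0.0, 0.2], [0.3, 0.1], [0.9, 0.8], [0.7, 1.0]])
y = np.array([-1, -1, 1, 1])

w = np.zeros(2)
b = 0.0

# Classic perceptron learning rule: update the weights only on mistakes.
for epoch in range(20):
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the boundary)
            w += yi * xi
            b += yi

print(w, b)
print(np.sign(X @ w + b))  # should reproduce y once the data are separated
```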
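Slide 9's equations turned into code: a minimal sketch assuming f(W, X) is a logistic model sigmoid(X·W), trained with log loss and the update W_t = W_{t−1} − l · ∂loss/∂W. The synthetic data, sizes, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = (sigmoid(X @ true_w) > 0.5).astype(float)  # synthetic labels

W = np.zeros(3)
l = 0.1                                        # learning rate

for step in range(200):
    p = sigmoid(X @ W)                         # f(W, X)
    # log loss (cross-entropy), averaged over the samples
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad = X.T @ (p - y) / len(y)              # d loss / d W
    W = W - l * grad                           # W_t = W_{t-1} - l * d loss / d W

print(loss, W)
```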
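Slide 15's recurrent network reduced to its core recurrence h_t = tanh(W_xh·x_t + W_hh·h_{t−1}): a forward-pass-only sketch over a toy character sequence. The vocabulary, hidden size, and random weights are made up for illustration; a real model would also be trained with backpropagation through time.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = "abcd"
hidden = 8

W_xh = rng.normal(scale=0.1, size=(hidden, len(vocab)))
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))
W_hy = rng.normal(scale=0.1, size=(len(vocab), hidden))

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[vocab.index(ch)] = 1.0
    return v

h = np.zeros(hidden)
for ch in "abca":                       # feed the sequence one character at a time
    x = one_hot(ch)
    h = np.tanh(W_xh @ x + W_hh @ h)    # hidden state carries the sequence history
    logits = W_hy @ h                   # scores for the next character
    probs = np.exp(logits) / np.exp(logits).sum()
    print(ch, "->", vocab[int(np.argmax(probs))])
```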
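Slide 17's weight sharing, shown in one dimension: the same 3-element filter slides over the whole input, followed by ReLU and max pooling. The filter and input values are illustrative; the point is that one set of weights detects the same pattern wherever it appears.

```python
import numpy as np

signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
kernel = np.array([-1.0, 2.0, -1.0])   # one shared filter, reused at every position

# "Valid" 1-D convolution (cross-correlation, as in most deep-learning libraries)
conv = np.array([np.dot(signal[i:i + 3], kernel) for i in range(len(signal) - 2)])
relu = np.maximum(conv, 0.0)

# Max pooling with window 2 and stride 2
pooled = relu[: len(relu) // 2 * 2].reshape(-1, 2).max(axis=1)

print(conv)    # the same "bump detector" fires at both bumps in the signal
print(pooled)
```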
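Slide 18's autoencoder in miniature: encode the input into a 2-dimensional code, decode it back, and reduce the reconstruction error with the same gradient-descent update as in the slide 9 sketch. The data, layer sizes, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 6))                 # 50 inputs with 6 features

W_enc = rng.normal(scale=0.1, size=(6, 2))   # encoder: compress 6 -> 2
W_dec = rng.normal(scale=0.1, size=(2, 6))   # decoder: reconstruct 2 -> 6
l = 0.05

for step in range(500):
    code = np.tanh(X @ W_enc)                # encode
    recon = code @ W_dec                     # decode
    err = recon - X
    loss = (err ** 2).mean()                 # reconstruction (squared) error

    # Backpropagate the reconstruction error to both layers
    grad_dec = code.T @ err / len(X)
    grad_code = err @ W_dec.T * (1 - code ** 2)   # tanh derivative
    grad_enc = X.T @ grad_code / len(X)
    W_dec -= l * grad_dec
    W_enc -= l * grad_enc

print(loss)   # reconstruction error after training
```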
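Slide 24's training tricks in a few lines each: SGD with momentum, gradient clipping by norm, and an (inverted) dropout mask. These are stand-alone snippets on a dummy gradient and dummy activations, meant only to show the update rules, not a full training loop.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(4, 4))
grad = rng.normal(size=(4, 4)) * 10     # pretend gradient coming from backprop

# --- SGD with momentum ---
velocity = np.zeros_like(W)
lr, mu = 0.01, 0.9
velocity = mu * velocity - lr * grad    # accumulate a running update direction
W = W + velocity

# --- Gradient clipping (for exploding gradients) ---
max_norm = 5.0
norm = np.linalg.norm(grad)
if norm > max_norm:
    grad = grad * (max_norm / norm)     # rescale so the norm is at most max_norm

# --- Dropout (randomly zero activations during training) ---
activations = rng.normal(size=(4, 8))
keep_prob = 0.5
mask = (rng.random(activations.shape) < keep_prob) / keep_prob  # inverted dropout
activations_dropped = activations * mask

print(np.linalg.norm(grad), activations_dropped.mean())
```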
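Slide 27's optimizers, illustrated with an RMSProp-style update on the same logistic / log-loss setup as the slide 9 sketch. The decay rate, learning rate, and epsilon are common illustrative defaults, not values from the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(float)   # synthetic labels

W = np.zeros(3)
lr, decay, eps = 0.01, 0.9, 1e-8
cache = np.zeros_like(W)                    # running average of squared gradients

for step in range(300):
    p = sigmoid(X @ W)
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad = X.T @ (p - y) / len(y)

    # RMSProp: scale each weight's step by the recent magnitude of its gradient
    cache = decay * cache + (1 - decay) * grad ** 2
    W = W - lr * grad / (np.sqrt(cache) + eps)

print(loss, W)
```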
