Unsupervised Feature Learning



  1. Unsupervised Feature Learning: A Literature Review. By: Amgad Muhammad & Mohamed EL Fadly
  2. Outline • Background • Problem Definition • Unsupervised Feature Learning • Our Work • Sparse Autoencoder • Preprocessing: PCA and Whitening • Self-Taught Learning and Unsupervised Feature Learning • References
  3. Background • Machine learning is one of the cornerstone fields of Artificial Intelligence: machines learn to act autonomously and to react to new situations without being explicitly programmed for them. • Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. • Learning algorithms fall into several categories, among them [1]: 1) supervised learning and 2) unsupervised learning.
  4. Problem Definition • The tasks of supervised learning can be summarized as: • Regression • Classification • The first step in training a machine with supervised learning is collecting the labeled data set, which in most cases is a difficult and expensive process. • The alternative approach is to measure and use everything, which leads to other problems, e.g., noisy data [2].
  5. Unsupervised Feature Learning • The unsupervised feature learning approach learns higher-level representations of unlabeled data by detecting patterns with various algorithms, e.g., sparse coding algorithms [3]. • It is a self-taught learning framework that transfers knowledge from unlabeled data, which is much easier to obtain, and uses it as a preprocessing step to enhance supervised inductive models. • The framework was developed to tackle these issues of supervised learning and to increase its accuracy across domains of interest such as vision, sound, and text [4].
  6. Our Work • We present some methods for unsupervised feature learning and deep learning, each of which automatically learns a good representation of the input from unlabeled data. • We concentrate on the following algorithms, detailed in the following slides: • Sparse Autoencoder • PCA and Whitening • Self-Taught Learning • We also focus on applying these algorithms to learn features from images.
  7. Sparse Autoencoders
  8. Sparse Autoencoder. Figure: Autoencoder [6]
  9. Neural Network. Before going further into the details of the algorithm, we briefly review neural networks, starting with the simplest possible network: one that comprises a single "neuron." We use the following diagram to denote a single neuron [5]. Figure: Single Neuron [8]
  10. Neural Network
  11. Sigmoid Activation Function. Figure: Sigmoid Function [8]
  12. Tanh Activation Function. Figure: Tanh Function [8]
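The single neuron and the two activation functions on slides 9 to 12 can be sketched with NumPy as below. This is a minimal illustration rather than the authors' code; the weights, bias, and inputs are hypothetical values chosen only to exercise the functions. The derivatives are included because backpropagation needs them: for the sigmoid f'(z) = f(z)(1 − f(z)), and for tanh f'(z) = 1 − f(z)².

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + exp(-z)); output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # f'(z) = f(z) * (1 - f(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh(z):
    # f(z) = (e^z - e^-z) / (e^z + e^-z); output lies in (-1, 1)
    return np.tanh(z)

def tanh_grad(z):
    # f'(z) = 1 - f(z)^2
    return 1.0 - np.tanh(z) ** 2

def neuron(x, w, b, activation=sigmoid):
    # A single neuron outputs h_{W,b}(x) = f(w^T x + b)
    return activation(np.dot(w, x) + b)

x = np.array([1.0, 2.0, 3.0])     # hypothetical inputs x1, x2, x3
w = np.array([0.5, -0.25, 0.1])   # hypothetical weights
b = -0.2                          # hypothetical bias (intercept) term
print(neuron(x, w, b))            # sigmoid neuron
print(neuron(x, w, b, tanh))      # tanh neuron
```

Both activations squash the same weighted sum; tanh simply maps it to (−1, 1) instead of (0, 1).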
  13. Neural Network Model • A neural network is built by hooking together many simple "neurons," so that the output of one neuron can be the input of another, as in the small network shown here. • The circles labeled "+1" are called bias units and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer. The middle layer of nodes is called the hidden layer, because its values are not observed in the training set [8]. Figure: Small Neural Network [8]
  14. Neural Network Model
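Forward propagation through a small network like the one on slide 13 can be sketched as follows. We assume 3 input units, 3 hidden units, 1 output unit, sigmoid activations, and small random weights; these sizes and the initialization are illustrative assumptions, not values from the slides. Each bias vector plays the role of the "+1" bias unit feeding the next layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Layer 2 (hidden): a2 = f(W1 x + b1)
    a2 = sigmoid(W1 @ x + b1)
    # Layer 3 (output): h_{W,b}(x) = f(W2 a2 + b2)
    a3 = sigmoid(W2 @ a2 + b2)
    return a3, a2

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 3))  # weights from 3 inputs to 3 hidden units
b1 = np.zeros(3)                         # hidden-layer biases
W2 = rng.normal(scale=0.1, size=(1, 3))  # weights from 3 hidden units to 1 output
b2 = np.zeros(1)                         # output bias
h, hidden = forward(np.array([1.0, 0.5, -1.0]), W1, b1, W2, b2)
```

The returned hidden activations are the values the slide describes as not observed in the training set.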
  15. Autoencoders and Sparsity
  16. Autoencoders and Sparsity Algorithm
  17. Autoencoders and Sparsity Algorithm (cont’d)
  18. Autoencoders and Sparsity Algorithm (cont’d). Figure: KL Function [6]
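The KL function on slide 18 is the Kullback–Leibler divergence between two Bernoulli variables with means ρ (the target sparsity) and ρ̂_j (the average activation of hidden unit j over the training set). A minimal NumPy sketch of the resulting sparsity penalty follows; the values ρ = 0.01 and β = 3 are illustrative assumptions, not settings reported on the slides.

```python
import numpy as np

def kl_divergence(rho, rho_hat):
    # KL(rho || rho_hat) for Bernoulli means; zero when rho_hat == rho
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparsity_penalty(a2, rho=0.01, beta=3.0):
    # a2: (hidden_units, m) hidden activations over m training examples
    rho_hat = a2.mean(axis=1)  # average activation of each hidden unit
    return beta * np.sum(kl_divergence(rho, rho_hat))

# The penalty grows as the average activation moves away from rho
for r in (0.01, 0.1, 0.5):
    print(r, kl_divergence(0.01, r))
```

Because the divergence is zero only when ρ̂_j = ρ, adding it to the cost pushes each hidden unit to be active only rarely.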
  19. Autoencoders and Sparsity Algorithm (cont’d)
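For reference, the sparse autoencoder objective and its backpropagation terms can be written as follows, following the formulation in the CS294A notes [6]. Here m is the number of training examples, λ the weight decay parameter, β the weight of the sparsity penalty, s_2 and s_3 the numbers of hidden and output units, f the activation function, and KL(ρ‖ρ̂_j) the Bernoulli Kullback–Leibler divergence between the target sparsity ρ and the average activation ρ̂_j of hidden unit j. The deltas and activations are computed per training example x⁽ⁱ⁾, with a⁽¹⁾ = x; since the network reconstructs its input, the target output is x itself.

```latex
\begin{aligned}
J(W,b) &= \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\bigl\lVert h_{W,b}(x^{(i)}) - x^{(i)}\bigr\rVert^{2}
          + \frac{\lambda}{2}\sum_{l=1}^{2}\sum_{i}\sum_{j}\bigl(W^{(l)}_{ji}\bigr)^{2} \\
\hat{\rho}_j &= \frac{1}{m}\sum_{i=1}^{m} a^{(2)}_j\bigl(x^{(i)}\bigr) \\
J_{\mathrm{sparse}}(W,b) &= J(W,b) + \beta\sum_{j=1}^{s_2}\mathrm{KL}\bigl(\rho\,\Vert\,\hat{\rho}_j\bigr) \\
\delta^{(3)} &= -\bigl(x - a^{(3)}\bigr)\odot f'\bigl(z^{(3)}\bigr) \\
\delta^{(2)}_i &= \Bigl(\sum_{j=1}^{s_3} W^{(2)}_{ji}\,\delta^{(3)}_j
          + \beta\Bigl(-\frac{\rho}{\hat{\rho}_i} + \frac{1-\rho}{1-\hat{\rho}_i}\Bigr)\Bigr) f'\bigl(z^{(2)}_i\bigr) \\
\nabla_{W^{(l)}} J_{\mathrm{sparse}} &= \frac{1}{m}\sum_{i=1}^{m}\delta^{(l+1)}\bigl(a^{(l)}\bigr)^{T} + \lambda W^{(l)},
\qquad
\nabla_{b^{(l)}} J_{\mathrm{sparse}} = \frac{1}{m}\sum_{i=1}^{m}\delta^{(l+1)}
\end{aligned}
```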
  20. Autoencoder Implementation • We implemented a sparse autoencoder, trained on 8×8 image patches with the L-BFGS optimization algorithm. Step 1: Generate the training set. The first step is to generate a training set: a random sample of 200 patches from the dataset.
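Step 1 can be sketched as follows. We assume the images are available as a NumPy array of shape (num_images, height, width); the function names are hypothetical. The normalization (removing each patch's mean, truncating to ±3 standard deviations, and rescaling to [0.1, 0.9] so the targets fit the sigmoid's output range) is one common choice, similar to the UFLDL exercise [8]; treat it as an assumption rather than the procedure used on these slides.

```python
import numpy as np

def sample_patches(images, num_patches=200, patch_size=8, seed=0):
    # images: (num_images, height, width) array (hypothetical input format)
    rng = np.random.default_rng(seed)
    n, h, w = images.shape
    patches = np.empty((patch_size * patch_size, num_patches))
    for i in range(num_patches):
        k = rng.integers(n)                       # random image
        r = rng.integers(h - patch_size + 1)      # random top-left row
        c = rng.integers(w - patch_size + 1)      # random top-left column
        # Flatten each 8x8 patch into one 64-dimensional column
        patches[:, i] = images[k, r:r + patch_size, c:c + patch_size].ravel()
    return patches

def normalize_patches(patches):
    # Remove the mean intensity of each patch (assumed preprocessing)
    patches = patches - patches.mean(axis=0)
    # Truncate to +/- 3 standard deviations, then scale to [-1, 1]
    limit = 3 * patches.std()
    patches = np.clip(patches, -limit, limit) / limit
    # Rescale from [-1, 1] to [0.1, 0.9]
    return (patches + 1) * 0.4 + 0.1
```

Each column of the result is one 64-dimensional training example, matching the 64 input units of the network.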
  21. Autoencoder Implementation. Step 2: Sparse autoencoder objective. Compute the sparse autoencoder cost function J_sparse(W,b) and its derivatives with respect to the different parameters. Step 3: Train the sparse autoencoder. After computing J_sparse and its derivatives, we minimize J_sparse with respect to its parameters using the L-BFGS algorithm, thereby training the sparse autoencoder. Our network has 64 input units, 25 hidden units, and 64 output units.
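Steps 2 and 3 can be sketched in NumPy as follows, assuming sigmoid activations and all parameters packed into one vector (the packing order, the initialization scheme, and all function names are our assumptions). The cost function returns J_sparse and its gradient computed by backpropagation; because sign and indexing mistakes in the gradient are easy to make, the sketch also includes a central-difference gradient check, which is only practical on a small network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(visible, hidden, seed=0):
    # Small uniform random weights (assumed scheme); biases start at zero
    rng = np.random.default_rng(seed)
    r = np.sqrt(6.0 / (visible + hidden + 1))
    W1 = rng.uniform(-r, r, (hidden, visible))
    W2 = rng.uniform(-r, r, (visible, hidden))
    return np.concatenate([W1.ravel(), W2.ravel(), np.zeros(hidden), np.zeros(visible)])

def unpack(theta, visible, hidden):
    # Inverse of the packing order used in init_params: W1, W2, b1, b2
    n = hidden * visible
    W1 = theta[:n].reshape(hidden, visible)
    W2 = theta[n:2 * n].reshape(visible, hidden)
    b1 = theta[2 * n:2 * n + hidden]
    b2 = theta[2 * n + hidden:]
    return W1, W2, b1, b2

def sparse_autoencoder_cost(theta, visible, hidden, lam, rho, beta, data):
    # data: (visible, m) matrix whose columns are training patches
    W1, W2, b1, b2 = unpack(theta, visible, hidden)
    m = data.shape[1]
    # Forward pass; the reconstruction target is the input itself
    a2 = sigmoid(W1 @ data + b1[:, None])
    a3 = sigmoid(W2 @ a2 + b2[:, None])
    rho_hat = a2.mean(axis=1)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = (0.5 / m) * np.sum((a3 - data) ** 2) \
        + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + beta * kl
    # Backpropagation; the sparsity term is added to the hidden-layer deltas
    d3 = (a3 - data) * a3 * (1 - a3)
    sparse_term = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparse_term[:, None]) * a2 * (1 - a2)
    grad = np.concatenate([
        (d2 @ data.T / m + lam * W1).ravel(),
        (d3 @ a2.T / m + lam * W2).ravel(),
        d2.mean(axis=1),
        d3.mean(axis=1),
    ])
    return cost, grad

def numerical_gradient(f, theta, eps=1e-4):
    # Central differences, one parameter at a time
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad
```

If SciPy is installed, one way to run Step 3 is `scipy.optimize.minimize(sparse_autoencoder_cost, init_params(64, 25), args=(64, 25, lam, rho, beta, patches), jac=True, method="L-BFGS-B")`, with lam, rho, and beta chosen by the user. SciPy's L-BFGS-B routine is our substitution; the slides do not name the L-BFGS implementation they used.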
  22. Autoencoder Implementation Results. After training, the sparse autoencoder successfully learned a set of edge detectors. Setup: • CPU: Intel Core i7 quad-core processor, 2.7 GHz • RAM: 6 GB • Training set: 200 image patches of 8×8 pixels • Network: 64 input units, 25 hidden units, and 64 output units
  23. Autoencoder Implementation Results • Training time: 39 seconds • Expected time [1]: less than a minute
  24. Principal Component Analysis (PCA)
  25. Principal Component Analysis (PCA) • PCA is a dimensionality reduction technique that re-expresses the data along uncorrelated directions of variation, so that highly correlated variables can be collapsed into fewer components without sacrificing much detail [7].
  26. PCA – Example • Consider the 2D data example. • The data has already been preprocessed by mean normalization. • We want to find the principal directions of variation. Figure: 2D data example [8]
  27. PCA – Example (cont’d). Figure: 2D data example with principal directions u1 and u2 [8]
  28. PCA – Math
  29. PCA – Math. Figure: 2D data example [8]
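The computation behind the PCA slides can be sketched as follows, assuming the data is stored as an (n, m) NumPy array of m mean-normalized examples. The covariance matrix is (1/m) Σᵢ x⁽ⁱ⁾x⁽ⁱ⁾ᵀ; its eigenvectors u1, u2, … are obtained here via SVD, which for a symmetric positive semidefinite matrix returns them sorted by decreasing eigenvalue; and the data is rotated into that basis with x_rot = Uᵀx. The correlated 2D data below is synthetic, standing in for the slides' example.

```python
import numpy as np

def pca_basis(x):
    # x: (n, m) matrix of m mean-normalized examples
    m = x.shape[1]
    sigma = (x @ x.T) / m            # covariance matrix
    U, S, _ = np.linalg.svd(sigma)   # columns of U are u1, u2, ...; S holds eigenvalues
    return U, S

rng = np.random.default_rng(0)
# Synthetic correlated 2D data (hypothetical), then mean normalization
x = np.array([[1.0, 0.0], [0.8, 0.4]]) @ rng.normal(size=(2, 500))
x = x - x.mean(axis=1, keepdims=True)
U, S = pca_basis(x)
x_rot = U.T @ x                      # coordinates along u1 and u2
print("u1 =", U[:, 0], "u2 =", U[:, 1])
```

In the rotated coordinates the two features are uncorrelated, and their variances are the eigenvalues in S.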
  30. PCA – Dimensionality Reduction
  31. PCA – Dimensionality Reduction
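Reducing the data to k dimensions keeps only the top k components, x̃ = U_kᵀx, and an approximate reconstruction is x̂ = U_k x̃. A common rule of thumb, used in the UFLDL tutorial [8], is to pick the smallest k that retains 99% of the variance, i.e. the smallest k with (λ₁ + … + λ_k)/(λ₁ + … + λ_n) ≥ 0.99. The sketch below assumes the same (n, m) mean-normalized layout; the 3D example data is synthetic.

```python
import numpy as np

def pca_reduce(x, k):
    # x: (n, m) mean-normalized data; keep the top k principal components
    m = x.shape[1]
    U, S, _ = np.linalg.svd(x @ x.T / m)
    x_tilde = U[:, :k].T @ x     # reduced k-dimensional representation
    x_hat = U[:, :k] @ x_tilde   # approximate reconstruction in the original space
    return x_tilde, x_hat, S

def choose_k(S, retain=0.99):
    # Smallest k whose top-k eigenvalues account for at least `retain` of the variance
    ratio = np.cumsum(S) / np.sum(S)
    return int(np.searchsorted(ratio, retain) + 1)

rng = np.random.default_rng(0)
# Synthetic 3D data whose third coordinate carries almost no variance
x = rng.normal(size=(3, 1000)) * np.array([[5.0], [2.0], [0.01]])
x = x - x.mean(axis=1, keepdims=True)
_, _, S = pca_reduce(x, 3)
k = choose_k(S)
x_tilde, x_hat, _ = pca_reduce(x, k)
```

Dropping the nearly constant third direction loses almost nothing, which is the point of keeping only the high-variance components.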
  32. Whitening
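Whitening builds on the PCA rotation. PCA whitening divides each rotated coordinate by the square root of its eigenvalue, x_PCAwhite,i = x_rot,i / √(λᵢ + ε), so the features become uncorrelated with unit variance. ZCA whitening then rotates the result back, x_ZCAwhite = U x_PCAwhite. The small constant ε regularizes components with near-zero eigenvalues; the value 1e-5 below, like the synthetic data, is an illustrative assumption.

```python
import numpy as np

def whiten(x, epsilon=1e-5):
    # x: (n, m) mean-normalized data
    m = x.shape[1]
    U, S, _ = np.linalg.svd(x @ x.T / m)
    x_rot = U.T @ x
    # PCA whitening: rescale each component to (roughly) unit variance
    x_pca_white = x_rot / np.sqrt(S + epsilon)[:, None]
    # ZCA whitening: rotate back to the original axes
    x_zca_white = U @ x_pca_white
    return x_pca_white, x_zca_white

rng = np.random.default_rng(0)
# Synthetic correlated 2D data (hypothetical), then mean normalization
x = np.array([[2.0, 0.0], [1.5, 0.5]]) @ rng.normal(size=(2, 1000))
x = x - x.mean(axis=1, keepdims=True)
x_pca_white, x_zca_white = whiten(x)
```

Both outputs have (approximately) identity covariance; ZCA keeps the data as close as possible to the original axes, which is why it is often preferred for image patches.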
  33. Self-Taught Learning
  34. Self-Taught Learning and Unsupervised Feature Learning • Given an unlabeled data set, we can train a sparse autoencoder to extract features that give a better, condensed representation of the data. Figure: Neural Network [8]
  35. Self-Taught Learning and Unsupervised Feature Learning • Once training is done, the network can represent any input by the activations of its hidden layer, which serve as better features than the raw input [8]. Figure: Input layer of Neural Network [8]
  36. Self-Taught Learning and Unsupervised Feature Learning. Figure: Input layer of Neural Network [8]
  37. Self-Taught Learning and Unsupervised Feature Learning. Figure: Input layer of Neural Network [8]
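Extracting features with the trained network amounts to running only its first layer: each input x is replaced by the hidden activations a⁽²⁾ = f(W⁽¹⁾x + b⁽¹⁾). A minimal sketch follows, assuming sigmoid activations and trained parameters W1 and b1 (random placeholders here). The sizes are hypothetical: 784 inputs for 28×28 images and 196 hidden units.

```python
import numpy as np

def feedforward_autoencoder(W1, b1, data):
    # data: (visible, m); returns the (hidden, m) hidden-layer activations,
    # which serve as the learned features. The decoder (W2, b2) is not needed.
    return 1.0 / (1.0 + np.exp(-(W1 @ data + b1[:, None])))

rng = np.random.default_rng(0)
visible, hidden = 784, 196                            # hypothetical sizes
W1 = rng.normal(scale=0.01, size=(hidden, visible))   # placeholder for trained weights
b1 = np.zeros(hidden)                                 # placeholder for trained biases
images = rng.random((visible, 10))                    # 10 placeholder images as columns
features = feedforward_autoencoder(W1, b1, images)
```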
  38. Self-Taught Learning Application • We used the self-taught learning paradigm, combining the sparse autoencoder with a softmax classifier, to build a classifier for handwritten digits. • The goal is to distinguish between the digits 0 to 4. We use the digits 5 to 9 as our "unlabeled" dataset, and a labeled dataset of the digits 0 to 4 to train the softmax classifier.
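The data split described on slide 38 can be sketched as follows, assuming the MNIST labels are already loaded into an integer NumPy array (loading the files is omitted, and the function name is hypothetical). Digits 5 to 9 become the unlabeled set, and the digits 0 to 4 are split into equal-sized supervised training and testing sets, matching the equal set sizes reported on slide 40.

```python
import numpy as np

def split_self_taught(labels):
    # labels: (m,) array of MNIST digit labels
    unlabeled_idx = np.flatnonzero(labels >= 5)   # digits 5-9: labels discarded
    labeled_idx = np.flatnonzero(labels <= 4)     # digits 0-4: kept for supervision
    half = labeled_idx.size // 2
    train_idx = labeled_idx[:half]                # supervised training set
    test_idx = labeled_idx[half:]                 # supervised testing set
    return unlabeled_idx, train_idx, test_idx
```

Only the unlabeled indices are used to train the sparse autoencoder; the labels of those examples are never consulted.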
  39. Self-Taught Learning Implementation. Step 1: Generate the input and test data sets. We used the datasets from the MNIST Handwritten Digit Database for this project. Step 2: Train the sparse autoencoder. We used the unlabeled data (the digits 5 to 9) to train a sparse autoencoder; after training, the learned features visualize as pen strokes, like the image shown to the right. Step 3: Extract features. Once the sparse autoencoder is trained, we use it to extract features from the handwritten digit images. Step 4: Train and test the softmax regression model. We train a softmax classifier (multiclass logistic regression) on the training-set features and labels, and finally compute its predictions and accuracy.
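Step 4 can be sketched as softmax regression trained on the extracted features. This is an illustrative stand-in: it uses plain batch gradient descent rather than L-BFGS, and the weight decay, learning rate, iteration count, and synthetic three-class data are all assumptions. In the real pipeline, X would hold the hidden-layer features of the labeled digits (one column per example) and y their labels 0 to 4.

```python
import numpy as np

def softmax_cost_grad(theta, X, y, lam):
    # theta: (k, n) weights; X: (n, m) features; y: (m,) labels in 0..k-1
    m = X.shape[1]
    scores = theta @ X
    scores = scores - scores.max(axis=0)   # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=0)             # class probabilities, one column per example
    onehot = np.zeros_like(probs)
    onehot[y, np.arange(m)] = 1.0
    cost = -np.sum(onehot * np.log(probs)) / m + 0.5 * lam * np.sum(theta ** 2)
    grad = -(onehot - probs) @ X.T / m + lam * theta
    return cost, grad

def train_softmax(X, y, num_classes, lam=1e-4, lr=0.1, iters=1000):
    # Batch gradient descent (a simple substitute for L-BFGS)
    theta = np.zeros((num_classes, X.shape[0]))
    for _ in range(iters):
        _, grad = softmax_cost_grad(theta, X, y, lam)
        theta -= lr * grad
    return theta

def predict(theta, X):
    # The predicted class is the one with the highest score
    return np.argmax(theta @ X, axis=0)

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 50)
X = (centers[y] + rng.normal(scale=0.5, size=(150, 2))).T
X = np.vstack([X, np.ones(150)])           # constant row acts as an intercept
theta = train_softmax(X, y, num_classes=3)
accuracy = np.mean(predict(theta, X) == y)
```

Accuracy is then simply the fraction of test examples whose predicted class matches the true label.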
  40. Self-Taught Learning Setup • Environment: Intel Core i7 quad-core processor, 2.7 GHz; 6 GB RAM • Training set: 60,000 examples from the MNIST database • Unlabeled set: 29,404 examples • Supervised training set: 15,298 examples • Supervised testing set: 15,298 examples
  41. Self-Taught Learning Results. After training is complete, the learned features visualize as pen strokes, as in the image below.
  42. Self-Taught Learning Analysis. We compared our application's outputs with those of the Stanford course tutorial [8]: • Training time: 16 minutes (our classifier) vs. 25 minutes (tutorial's classifier) • Classifier accuracy: 98.208916% (our classifier) vs. 98% (tutorial's classifier)
  43. Future Work. We expect that parallelizing our code, or running the training on a GPU, would boost performance and reduce the time needed to train the classifier.
  44. References
     [1] Taiwo Oladipupo Ayodele. New Advances in Machine Learning. InTech, 2010.
     [2] S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas. Supervised machine learning: A review of classification techniques. 31:249-268, 2007.
     [3] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Ng. Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems, pages 801-808, 2006.
     [4] Bruno A. Olshausen et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607-609, 1996.
     [5] Simon O. Haykin. “Multilayer Perceptron,” in Neural Networks and Learning Machines, 3rd ed., Prentice Hall, 2009.
     [6] Andrew Ng. CS294A Lecture notes, “Sparse autoencoder,” Stanford University, Jan. 11, 2011. Available: http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf. [Accessed Dec. 10, 2013].
     [7] Aapo Hyvärinen, Jarmo Hurri, and Patrik O. Hoyer. “Principal components and whitening,” in Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, vol. 39, Springer-Verlag, 2009, pp. 97-137.
     [8] Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, and Caroline Suen. “UFLDL Tutorial,” April 7, 2013. [Online]. Available: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial. [Accessed Dec. 10, 2013].
  45. Thank You!