Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intelligence)

415 views

Published on

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.

Published in: Data & Analytics
  • Be the first to comment

Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intelligence)

  1. 1. [course site] Xavier Giro-i-Nieto xavier.giro@upc.edu Associate Professor Universitat Politecnica de Catalunya Technical University of Catalonia Unsupervised Learning #DLUPC
  2. 2. 2 Acknowledegments Kevin McGuinness, “Unsupervised Learning” Deep Learning for Computer Vision. [Slides 2016] [Slides 2017]
  3. 3. 3 Outline 1. Motivation 2. Unsupervised Learning 3. Predictive Learning 4. Self-supervised Learning
  4. 4. 4 Outline 1. Motivation 2. Unsupervised Learning 3. Predictive Learning 4. Self-supervised Learning
  5. 5. 5 Motivation
  6. 6. 6 Motivation
  7. 7. 7 Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016 [video] [code]
  8. 8. Yann Lecun’s Black Forest cake 8 Motivation
  9. 9. 1. = ƒ( ) ƒ( ) = ƒ( ) 9 Predict label y corresponding to observation x Estimate the distribution of observation x Predict action y based on observation x, to maximize a future reward z Motivation
  10. 10. 1. = ƒ( ) ƒ( ) = ƒ( ) 10 Motivation
  11. 11. Unsupervised Learning 11 Why Unsupervised Learning? ● It is the nature of how intelligent beings percept the world. ● It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion. ● Vast amounts of unlabelled data.
  12. 12. 12 Outline 1. Motivation 2. Unsupervised Learning 3. Predictive Learning 4. Self-supervised Learning
  13. 13. Max margin decision boundary 13 How data distribution P(x) influences decisions (1D) Slide credit: Kevin McGuinness (DLCV UPC 2017)
  14. 14. Semi supervised decision boundary 14 How data distribution P(x) influences decisions (1D) Slide credit: Kevin McGuinness (DLCV UPC 2017)
  15. 15. 15 How data distribution P(x) influences decisions (2D) Slide credit: Kevin McGuinness (DLCV UPC 2017)
  16. 16. 16 How data distribution P(x) influences decisions (2D) Slide credit: Kevin McGuinness (DLCV UPC 2017)
  17. 17. How P(x) is valuable for naive Bayesian classifier ● ● ● 17 X: Data Y: Labels Slide credit: Kevin McGuinness (DLCV UPC 2017)
  18. 18. x1 x2 Not linearly separable in this 2D space :( 18 How clustering is valueble for linear classifiers Slide credit: Kevin McGuinness (DLCV UPC 2017)
  19. 19. x1 x2 Cluster 1 Cluster 2 Cluster 3 Cluster 4 1 2 3 4 1 2 3 4 4D BoW representation Separable with a linear classifiers in a 4D space ! 19 How clustering is valueble for linear classifiers Slide credit: Kevin McGuinness (DLCV UPC 2017)
  20. 20. Example from Kevin McGuinness Juyter notebook. 20 How clustering is valueble for linear classifiers
  21. 21. 21 Assumptions for unsupervised learning Slide credit: Kevin McGuinness (DLCV UPC 2017) ● ○ ● ○ ● ○
  22. 22. 22 Assumptions for unsupervised learning Slide credit: Kevin McGuinness (DLCV UPC 2017) ● ○ ○ ● ○ ○ ○ ○ ● ○ ● ○ ○ ○ ○ ● ○ ○ ○ ○
  23. 23. 23 Assumptions for unsupervised learning Slide credit: Kevin McGuinness (DLCV UPC 2017) ● ○ ○ ● ○ ○ ○ ○ ● ○ ● ○ ○ ○ ○ ● ○ ○ ○ ○
  24. 24. 24 Slide credit: Kevin McGuinness (DLCV UPC 2017) ● ● ○ ● ○ The manifold hypothesis
  25. 25. 25 The manifold hypothesis Slide credit: Kevin McGuinness (DLCV UPC 2017) x1 x2 Linear manifold wT x + b x1 x2 Non-linear manifold
  26. 26. 26 The manifold hypothesis: Energy-based models ● ● ● ● ● LeCun et al, A Tutorial on Energy-Based Learning, Predicting Structured Data, 2006 http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf
  27. 27. 27 Autoencoder (AE) F. Van Veen, “The Neural Network Zoo” (2016)
  28. 28. 28 Autoencoder (AE) “Deep Learning Tutorial”, Dept. Computer Science, Stanford Autoencoders: ● Predict at the output the same input data. ● Do not need labels:
  29. 29. 29 Autoencoder (AE) “Deep Learning Tutorial”, Dept. Computer Science, Stanford Dimensionality reduction: ● Use hidden layer as a feature extractor of any desired size.
  30. 30. 30 Autoencoder (AE) Encoder W1 Decoder W2 hdata reconstruction Loss (reconstruction error) Latent variables (representation/features) Figure: Kevin McGuinness (DLCV UPC 2017) Pretraining: 1. Initialize a NN solving an autoencoding problem.
  31. 31. 31 Autoencoder (AE) Figure: Kevin McGuinness (DLCV UPC 2017) Encoder W1 hdata Classifier WC Latent variables (representation/features) prediction y Loss (cross entropy) Pretraining: 1. Initialize a NN solving an autoencoding problem. 2. Train for final task with “few” labels.
  32. 32. 32 Autoencoder (AE) Deep Learning TV, “Autoencoders - Ep. 10”
  33. 33. 33F. Van Veen, “The Neural Network Zoo” (2016) Restricted Boltzmann Machine (RBM)
  34. 34. 34 Restricted Boltzmann Machine (RBM) Figure: Geoffrey Hinton (2013) Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. "Restricted Boltzmann machines for collaborative filtering." Proceedings of the 24th international conference on Machine learning. ACM, 2007. ● Shallow two-layer net. ● Restricted = No two nodes in a layer share a connection ● ● Bipartite graph. ● Bidirectional graph ○ Shared weights. ○ Different biases.
  35. 35. 35 Restricted Boltzmann Machine (RBM) DeepLearning4j, “A Beginner’s Tutorial for Restricted Boltzmann Machines”. Forward pass
  36. 36. 36 Restricted Boltzmann Machine (RBM) DeepLearning4j, “A Beginner’s Tutorial for Restricted Boltzmann Machines”. Forward pass
  37. 37. 37 Restricted Boltzmann Machine (RBM) DeepLearning4j, “A Beginner’s Tutorial for Restricted Boltzmann Machines”. Forward pass
  38. 38. 38 Restricted Boltzmann Machine (RBM) DeepLearning4j, “A Beginner’s Tutorial for Restricted Boltzmann Machines”. Backward pass
  39. 39. 39 Restricted Boltzmann Machine (RBM) DeepLearning4j, “A Beginner’s Tutorial for Restricted Boltzmann Machines”. Backward pass
  40. 40. 40 Restricted Boltzmann Machine (RBM) DeepLearning4j, “A Beginner’s Tutorial for Restricted Boltzmann Machines”. Backward pass The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence.
  41. 41. 41 Restricted Boltzmann Machine (RBM) Figure: Geoffrey Hinton (2013) Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. "Restricted Boltzmann machines for collaborative filtering." Proceedings of the 24th international conference on Machine learning. ACM, 2007.
  42. 42. 42 Restricted Boltzmann Machine (RBM) DeepLearning4j, “A Beginner’s Tutorial for Restricted Boltzmann Machines”. RBMs are a specific type of autoencoder. Unsupervised learning
  43. 43. 43 Restricted Boltzmann Machine (RBM) Deep Learning TV, “Restricted Boltzmann Machines”.
  44. 44. 44 Deep Belief Networks (DBN) F. Van Veen, “The Neural Network Zoo” (2016)
  45. 45. 45 Deep Belief Networks (DBN) Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural computation 18, no. 7 (2006): 1527-1554. ● Architecture like an MLP. ● Training as a stack of RBMs.
  46. 46. 46 Deep Belief Networks (DBN) ● Architecture like an MLP. ● Training as a stack of RBMs. Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural computation 18, no. 7 (2006): 1527-1554.
  47. 47. 47 Deep Belief Networks (DBN) Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural computation 18, no. 7 (2006): 1527-1554. ● Architecture like an MLP. ● Training as a stack of RBMs.
  48. 48. 48 Deep Belief Networks (DBN) Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural computation 18, no. 7 (2006): 1527-1554. ● Architecture like an MLP. ● Training as a stack of RBMs.
  49. 49. 49 Deep Belief Networks (DBN) Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural computation 18, no. 7 (2006): 1527-1554. ● Architecture like an MLP. ● Training as a stack of RBMs… ● ...so they do not need labels: Unsupervised learning
  50. 50. 50 Deep Belief Networks (DBN) Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural computation 18, no. 7 (2006): 1527-1554. After the DBN is trained, it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance. Supervised learning Softmax
  51. 51. 51 Deep Belief Networks (DBN) Deep Learning TV, “Deep Belief Nets”
  52. 52. 52 Deep Belief Networks (DBN) Geoffrey Hinton, "Introduction to Deep Learning & Deep Belief Nets” (2012) Geoorey Hinton, “Tutorial on Deep Belief Networks”. NIPS 2007.
  53. 53. 53 Deep Belief Networks (DBN) Geoffrey Hinton
  54. 54. 54 Outline 1. Motivation 2. Unsupervised Learning 3. Predictive Learning 4. Self-supervised Learning
  55. 55. Acknowledgments 55 Junting Pan Xunyu Lin
  56. 56. Unsupervised Feature Learning 56 Slide credit: Yann LeCun
  57. 57. Unsupervised Feature Learning 57 Slide credit: Yann LeCun
  58. 58. Slide credit: Junting Pan
  59. 59. Frame Reconstruction & Prediction 59 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." In ICML 2015. [Github] Unsupervised feature learning (no labels) for...
  60. 60. Frame Reconstruction & Prediction 60 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." In ICML 2015. [Github] Unsupervised feature learning (no labels) for... ...frame prediction.
  61. 61. Frame Reconstruction & Prediction 61 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." In ICML 2015. [Github] Unsupervised feature learning (no labels) for... ...frame prediction.
  62. 62. 62 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." In ICML 2015. [Github] Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data). Frame Reconstruction & Prediction
  63. 63. Frame Prediction 63 Ranzato, MarcAurelio, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, and Sumit Chopra. "Video (language) modeling: a baseline for generative models of natural videos." arXiv preprint arXiv:1412.6604 (2014).
  64. 64. 64 Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code] Video frame prediction with a ConvNet. Frame Prediction
  65. 65. 65 Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code] The blurry predictions from MSE are improved with multi-scale architecture, adversarial learning and an image gradient difference loss function. Frame Prediction
  66. 66. 66 Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code] The blurry predictions from MSE (l1) are improved with multi-scale architecture, adversarial training and an image gradient difference loss (GDL) function. Frame Prediction
  67. 67. 67 Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code] Frame Prediction
  68. 68. 68Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016. Frame Prediction
  69. 69. 69 Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016. Frame Prediction
  70. 70. 70 Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016 [video]
  71. 71. 71 Frame Prediction Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016 [video] Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE).
  72. 72. 72 Frame Prediction Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016 [video] Encodes image as feature maps, and motion as and cross-convolutional kernels.
  73. 73. 73 Outline 1. Motivation 2. Unsupervised Learning 3. Predictive Learning 4. Self-supervised Learning
  74. 74. First steps in video feature learning 74 Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011
  75. 75. Temporal Weak Labels 75 Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015. Assumption: adjacent video frames contain semantically similar information. Autoencoder trained with regularizations by slowliness and sparisty.
  76. 76. 76 Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video] ● ● Temporal Weak Labels
  77. 77. 77 Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video] Temporal Weak Labels
  78. 78. 78 (Slides by Xunyu Lin): Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code] Temporal order of frames is exploited as the supervisory signal for learning. Temporal Weak Labels
  79. 79. 79 (Slides by Xunyu Lin): Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code] Take temporal order as the supervisory signals for learning Shuffled sequences Binary classification In order Not in order Temporal Weak Labels
  80. 80. 80 Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." CVPR 2017 Temporal Weak Labels Train a network to detect which of the video sequences contains frames wrong order.
  81. 81. 81 Spatio-Temporal Weak Labels X. Lin, Campos, V., Giró-i-Nieto, X., Torres, J., and Canton-Ferrer, C., “Disentangling Motion, Foreground and Background Features in Videos”, in CVPR 2017 Workshop Brave New Motion Representations kernel dec C3D Foreground Motion First Foreground Background Fg Dec Bg Dec Fg Dec Reconstruction of foreground in last frame Reconstruction of foreground in first frame Reconstruction of background in first frame uNLC Mask Block gradients Last foreground Kernels share weights
  82. 82. 82 Spatio-Temporal Weak Labels Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching objects move." CVPR 2017
  83. 83. 83 Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016 [video] [code] Spatio-Temporal Weak Labels
  84. 84. 84 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." NIPS 2016.
  85. 85. 85 Audio Features from Visual weak labels Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." NIPS 2016. Object & Scenes recognition in videos by analysing the audio track (only).
  86. 86. 86 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." NIPS 2016. Visualization of the 1D filters over raw audio in conv1. Audio Features from Visual weak labels
  87. 87. 87 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." NIPS 2016. Visualization of the 1D filters over raw audio in conv1. Audio Features from Visual weak labels
  88. 88. 88 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." NIPS 2016. Visualization of the 1D filters over raw audio in conv1. Audio Features from Visual weak labels
  89. 89. 89 Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7): Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." NIPS 2016. Audio Features from Visual weak labels
  90. 90. 90 Audio & Visual features from alignement 90Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017. Audio and visual features learned by assessing alignement.
  91. 91. 91 L. Chen, S. Srivastava, Z. Duan and C. Xu. Deep Cross-Modal Audio-Visual Generation. ACM International Conference on Multimedia Thematic Workshops, 2017. Audio & Visual features from alignement
  92. 92. 92Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017
  93. 93. 93Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017 Video to Speech Representations CNN (VGG) Frame from a silent video Audio feature Post-hoc synthesis
  94. 94. 94Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?." BMVC 2017.
  95. 95. 95Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?." BMVC 2017. Speech to Video Synthesis (mouth)
  96. 96. 96 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017
  97. 97. 97 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017 Speech to Video Synthesis (pose & emotion)
  98. 98. 98 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017
  99. 99. 99 Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017. Speech to Video Synthesis (mouth)
  100. 100. 100 Outline 1. Motivation 2. Unsupervised Learning 3. Predictive Learning 4. Self-supervised Learning

×