
Deep Learning & NLP: Graphs to the Rescue!

Lecture 21 October 2014



  1. 1. Deep Learning & NLP Graphs to the Rescue! (or not yet…) Roelof Pieters, KTH/CSC, Graph Technologies R&D roelof@kth.se www.csc.kth.se/~roelof/ Twitter: @graphific Stockholm, Sics, October 21 2014
  2. 2. Definitions Machine Learning Improving some task T based on experience E with respect to performance measure P. - T. Mitchell (1997) Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task (or tasks drawn from a population of similar tasks) more effectively the next time. - H. Simon (1983) 2
  3. 3. Definitions Representation learning Attempts to automatically learn good features or representations Deep learning Attempt to learn multiple levels of representation of increasing complexity/abstraction 3
  4. 4. Overview 1. From Machine Learning to Deep Learning 2. Natural Language Processing 3. Graph-Based Approaches to DL+NLP 4
  5. 5. 1. from Machine Learning to Deep Learning 5
  6. 6. Perceptron 6
  7. 7. Perceptron 6 • Rosenblatt 1957
  8. 8. Perceptron • Rosenblatt 1957 • Minsky & Papert 1969 6
  9. 9. Perceptron • Rosenblatt 1957 • Minsky & Papert 1969 The world believed Minsky & Papert… 6
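To make the historical starting point concrete, here is a minimal NumPy sketch of a Rosenblatt-style perceptron learning rule (the toy data, learning rate, and variable names are illustrative assumptions, not taken from the slides): bump the weights whenever a sample is misclassified.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Classic perceptron rule: update weights whenever a sample is misclassified."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                    # yi in {-1, +1}
            if yi * (np.dot(w, xi) + b) <= 0:       # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data (AND-like problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # -> [-1. -1. -1.  1.]
```

This is exactly the single linear unit Minsky & Papert analysed: it converges only when the classes are linearly separable.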
  10. 10. 2nd gen Perceptron • Quest to make it non-linear • no result… 7 Until finally… • Rumelhart, Hinton & Williams, 1986 • Multi-Layered Perceptrons (MLP) !!! • Backpropagation (Bryson & Ho 1969) (Rumelhart, Hinton & Williams, 1986)
  11. 11. • Forward Propagation : • Sum inputs, produce activation, feed-forward 8
  12. 12. • Back Propagation of Error • Calculate total error at the top • Calculate contributions to error at each step going backwards 9
  13. 13. Phase 1: Propagation Each propagation involves the following steps: 1. Forward propagation of a training pattern's input through the neural network in order to generate the propagation's output activations. 2. Backward propagation of the propagation's output activations through the neural network using the training pattern target in order to generate the deltas of all output and hidden neurons. Phase 2: Weight update For each weight synapse, follow these steps: 1. Multiply its output delta and input activation to get the gradient of the weight. 2. Subtract a ratio (percentage) of the gradient from the weight. 10
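The two phases above, sketched in NumPy for a tiny one-hidden-layer network. Sigmoid units, squared error, and all sizes are illustrative assumptions made only to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny 2-2-1 MLP trained with the two phases described above (illustrative sizes).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # input -> hidden
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)   # hidden -> output
x, t = np.array([0.5, -0.3]), np.array([1.0])   # one training pattern and its target
lr = 0.5                                        # the "ratio of the gradient" subtracted

for _ in range(1000):
    # Phase 1a: forward propagation (sum inputs, produce activations, feed forward)
    h = sigmoid(x @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # Phase 1b: backward propagation of the error to get the deltas
    delta_o = (o - t) * o * (1 - o)             # output-layer delta
    delta_h = (delta_o @ W2.T) * h * (1 - h)    # hidden-layer deltas
    # Phase 2: weight update = input activation * output delta, scaled by lr
    W2 -= lr * np.outer(h, delta_o); b2 -= lr * delta_o
    W1 -= lr * np.outer(x, delta_h); b1 -= lr * delta_h

print(o.item())   # approaches the target 1.0
```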
  14. 14. Perceptron Network: SVM • Vapnik et al. 1992; 1995. 11 • Cortes & Vapnik 1995 Source: Cortes & Vapnik 1995
  15. 15. Perceptron Network: SVM • Vapnik et al. 1992; 1995. Kernel SVM 11 • Cortes & Vapnik 1995 Source: Cortes & Vapnik 1995
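For contrast with the kernel trick mentioned above, a small scikit-learn sketch (an illustrative assumption, not a tool used in the talk) comparing a linear and an RBF-kernel SVM on XOR-like data that no single perceptron can separate.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable, so a kernel machine is needed (illustrative).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

linear = SVC(kernel="linear").fit(X, y)
rbf    = SVC(kernel="rbf", gamma=10.0).fit(X, y)

print(linear.score(X, y))   # the linear machine cannot fit XOR
print(rbf.score(X, y))      # the RBF kernel carves a non-linear boundary (1.0 here)
```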
  16. 16. “2006” 12
  17. 17. “2006” • Faster machines (GPU’s!) 12
  18. 18. “2006” • Faster machines (GPU’s!) • More data 12
  19. 19. “2006” • Faster machines (GPU’s!) • More data • New methods for unsupervised pre-training 12
  20. 20. “2006” • New methods for unsupervised pre-training • Stacked RBM’s (Deep Belief Networks [DBN’s] ) • Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554. • Hinton, G. E. and Salakhutdinov, R. R, Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp. 504 - 507, 28 July 2006. 13 • (Stacked) Autoencoders • Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks, Advances in Neural Information Processing Systems 19
  21. 21. Pretraining: Stacked RBM’s • Iterative pre-training construction of Deep Belief Network (DBN) (Hinton et al., 2006) from: Larochelle et al. (2007). An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. 14
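A rough sketch of the greedy layer-wise idea using scikit-learn's BernoulliRBM (layer sizes, hyperparameters, and the toy data are assumptions for illustration): each RBM is trained on the hidden activations of the one below it. In a real DBN the stacked weights would then initialise a deep network that is fine-tuned with backpropagation.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = (rng.random((500, 64)) > 0.5).astype(float)   # toy binary "data"

layer_sizes = [32, 16]
stack, layer_input = [], X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=10, random_state=0)
    layer_input = rbm.fit_transform(layer_input)   # hidden units feed the next layer
    stack.append(rbm)

print(layer_input.shape)   # (500, 16): top-level representation of the stack
```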
  22. 22. Pretraining: Stacked Denoising Auto-encoder • Stacking Auto-Encoders from: Bengio ICML 2009 15
  23. 23. Pretraining: Stacked Denoising Auto-encoder 16 • (Vincent et al, 2008) • Good vs Corrupted context from: Vincent et al 2010
  24. 24. Pretraining: Stacked Denoising Auto-encoder 16 • (Vincent et al, 2008) • Good vs Corrupted context Raw input from: Vincent et al 2010
  25. 25. Pretraining: Stacked Denoising Auto-encoder 16 • (Vincent et al, 2008) • Good vs Corrupted context Corrupted input Raw input from: Vincent et al 2010
  26. 26. Pretraining: Stacked Denoising Auto-encoder 16 • (Vincent et al, 2008) • Good vs Corrupted context Hidden code (representation) Corrupted input Raw input from: Vincent et al 2010
  27. 27. Pretraining: Stacked Denoising Auto-encoder Corrupted input Raw input reconstruction 16 • (Vincent et al, 2008) • Good vs Corrupted context Hidden code (representation) from: Vincent et al 2010
  28. 28. Pretraining: Stacked Denoising Auto-encoder KL(reconstruction | raw input) Corrupted input Raw input reconstruction 16 • (Vincent et al, 2008) • Good vs Corrupted context Hidden code (representation) from: Vincent et al 2010
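A compact NumPy sketch of the denoising auto-encoder data flow shown above: corrupt the raw input, encode it to a hidden code, reconstruct, and penalise the reconstruction against the uncorrupted input. Tied weights and a cross-entropy reconstruction loss are common choices and are assumptions here, as are all sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = (rng.random((200, 20)) > 0.7).astype(float)   # toy binary inputs

n_hidden, lr, noise = 8, 0.1, 0.3
W  = rng.normal(scale=0.1, size=(20, n_hidden))   # tied weights: W encodes, W.T decodes
bh, bv = np.zeros(n_hidden), np.zeros(20)

for _ in range(200):
    for x in X:
        x_tilde = x * (rng.random(20) > noise)    # corrupt: randomly zero out inputs
        h = sigmoid(x_tilde @ W + bh)             # hidden code (representation)
        z = sigmoid(h @ W.T + bv)                 # reconstruction of the *raw* input
        dz = z - x                                # cross-entropy error vs. uncorrupted x
        dh = (dz @ W) * h * (1 - h)
        W  -= lr * (np.outer(x_tilde, dh) + np.outer(dz, h))
        bv -= lr * dz
        bh -= lr * dh

h = sigmoid(X @ W + bh); Z = sigmoid(h @ W.T + bv)
print(np.mean((Z - X) ** 2))   # reconstruction error on clean inputs after training
```

Stacking simply repeats this: the hidden codes of one trained auto-encoder become the inputs of the next.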
  29. 29. 17
  30. 30. Convolutional Neural Networks (CNNs) • Fukushima 1980; LeCun et al. 1998; Behnke 2003; Simard et al. 2003… • Hinton et al. 2006; Bengio et al. 2007; Ranzato et al. 2007 • Sparse connectivity: 18 • MaxPooling • Shared weights: (Figures from http://deeplearning.net/tutorial/lenet.html)
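The three CNN ingredients named above, sketched in plain NumPy with illustrative sizes: one small kernel slid over the whole image (sparse connectivity plus shared weights), followed by non-overlapping max pooling.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Shared weights + sparse connectivity: the same small kernel slides over the image."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response in each patch."""
    H, W = fmap.shape
    return fmap[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))

image  = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])    # a vertical-edge detector, applied everywhere
fmap = conv2d_valid(image, kernel)    # (6, 6) feature map
print(maxpool(fmap).shape)            # (3, 3) after 2x2 max pooling
```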
  31. 31. Pretraining • Why does Pretraining work so well? (Erhan et al. 2010) • Better Generalisation (figure legend: without unsupervised pretraining vs. with unsupervised pretraining) Figures from Erhan et al. 2010 19
  32. 32. Pretraining Figures from Erhan et al. 2010 20
  33. 33. “I’ve worked all my life in Machine Learning, and I’ve never seen one algorithm knock over benchmarks like Deep Learning” –Andrew Ng 21
  34. 34. The (god)fathers of DL 22
  35. 35. The (god)fathers of DL 22
  36. 36. The (god)fathers of DL 22
  37. 37. DL: (Every)where ? 23
  38. 38. DL: (Every)where ? • Language Modeling (2012, Mikolov et al) 23
  39. 39. DL: (Every)where ? • Language Modeling (2012, Mikolov et al) • Image Recognition (Krizhevsky won 2012 ImageNet competition) 23
  40. 40. DL: (Every)where ? • Language Modeling (2012, Mikolov et al) • Image Recognition (Krizhevsky won 2012 ImageNet competition) • Sentiment Classification (2011, Socher et al) 23
  41. 41. DL: (Every)where ? • Language Modeling (2012, Mikolov et al) • Image Recognition (Krizhevsky won 2012 ImageNet competition) • Sentiment Classification (2011, Socher et al) • Speech Recognition (2010, Dahl et al) 23
  42. 42. DL: (Every)where ? • Language Modeling (2012, Mikolov et al) • Image Recognition (Krizhevsky won 2012 ImageNet competition) • Sentiment Classification (2011, Socher et al) • Speech Recognition (2010, Dahl et al) • MNIST hand-written digit recognition (Ciresan et al, 2010) 23
  43. 43. 24
  44. 44. So: Why Deep? Deep Architectures can be representationally efficient • Fewer computational units for the same function Deep Representations might allow for a hierarchy of representations • Allows non-local generalisation • Comprehensibility Multiple levels of latent variables allow combinatorial sharing of statistical strength 25
  45. 45. So: Why Deep? Generalizing better to new tasks & domains Can learn good intermediate representations shared across tasks Distributed representations Unsupervised Learning Multiple levels of representation 26
  46. 46. Diff Levels of Abstraction • Hierarchical Learning • Natural progression from low level to high level structure as seen in natural complexity • Easier to monitor what is being learnt and to guide the machine to better subspaces • A good lower level representation can be used for many distinct tasks 27
  47. 47. Generalizable Learning • Shared Low Level Representations • Multi-Task Learning • Unsupervised Training 28
  48. 48. Generalizable Learning • Shared Low Level Representations • Multi-Task Learning • Unsupervised Training 28 • Partial Feature Sharing • Mixed Mode Learning • Composition of Functions
  49. 49. No More Handcrafted Features ! 29
  50. 50. 2. Natural Language Processing 30
  51. 51. DL + NLP • Language Modeling • Bengio et al. (2000, 2003): via Neural network • Mnih and Hinton (2007): via RBMs • POS, Chunking, NER, SRL • Collobert and Weston 2008 • Socher et al 2011; Socher 2014 31
  52. 52. Language Modeling • Word Embeddings (Bengio et al, 2001; Bengio et al, 2003) based on idea of distributed representations for symbols (Hinton 1986) • Neural Word embeddings (Turian et al 2010; Collobert et al. 2011) 32
  53. 53. Word Embeddings • Collobert & Weston 2008; Collobert et al. 2011 • similar to word vector learning, but uses a Softmax/Maxent classifier instead of a single scalar score; word embeddings come from a lookup table. From Collobert et al. 2011 33
  54. 54. Word Embeddings • Collobert & Weston 2008; Collobert et al. 2011 • similar to word vector learning, but uses a Softmax/Maxent classifier instead of a single scalar score Figure from Socher et al. Tutorial ACL 2012. 34
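A minimal sketch of the window approach described here: word vectors come from a lookup table, the window is concatenated, and a softmax layer scores the tags. The vocabulary, tag set, and (untrained) weights below are all illustrative assumptions; in the real system the lookup table itself is learned by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
tags  = ["DET", "NOUN", "VERB", "ADP"]

emb_dim, window, hidden = 10, 3, 16
E  = rng.normal(scale=0.1, size=(len(vocab), emb_dim))        # lookup table (learned in practice)
W1 = rng.normal(scale=0.1, size=(window * emb_dim, hidden))
W2 = rng.normal(scale=0.1, size=(hidden, len(tags)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag_scores(window_words):
    """Window approach: look up each word's vector, concatenate, classify the centre word."""
    x = np.concatenate([E[vocab[w]] for w in window_words])   # lookup + concatenate
    h = np.tanh(x @ W1)
    return softmax(h @ W2)                                    # distribution over tags

print(dict(zip(tags, tag_scores(["the", "cat", "sat"]).round(3))))
```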
  55. 55. Figure from Socher et al. Tutorial ACL 2012. 35
  56. 56. • window approach • sentence approach source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips 36
  57. 57. • Multi-task learning 37 source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips
  58. 58. 38 General Deep Architecture for NLP source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips
  59. 59. 38 General Deep Architecture for NLP Basic features source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips
  60. 60. 38 General Deep Architecture for NLP Basic features Embeddings source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips
  61. 61. 38 General Deep Architecture for NLP Basic features Embeddings Convolution source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips
  62. 62. 38 General Deep Architecture for NLP Basic features Embeddings Convolution Max pooling source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips
  63. 63. 38 General Deep Architecture for NLP Basic features Embeddings Convolution Max pooling “Supervised” learning source: Collobert & Weston, Deep Learning for Natural Language Processing. 2009 Nips
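And the sentence approach from the same architecture, sketched in NumPy: a shared filter bank is convolved over every word window, then max-pooled over time, so sentences of any length yield a fixed-size feature vector for the supervised layers on top. All sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim, n_filters, width = 10, 6, 3          # illustrative sizes
W_conv = rng.normal(scale=0.1, size=(n_filters, width * emb_dim))

def sentence_features(word_vectors):
    """Sentence approach: convolve over every word window, then max over time,
    giving a fixed-size vector regardless of sentence length."""
    T = len(word_vectors)
    windows = [np.concatenate(word_vectors[t:t + width]) for t in range(T - width + 1)]
    conv = np.tanh(np.stack(windows) @ W_conv.T)      # (T - width + 1, n_filters)
    return conv.max(axis=0)                           # max over time -> (n_filters,)

short_sentence = [rng.normal(size=emb_dim) for _ in range(5)]
long_sentence  = [rng.normal(size=emb_dim) for _ in range(12)]
print(sentence_features(short_sentence).shape, sentence_features(long_sentence).shape)
# both (6,): the pooled features feed the "supervised" layers on top
```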
  64. 64. Word Embeddings • Unsupervised Word Representations (Turian et al 2010) • evaluates Brown clusters, C&W (Collobert and Weston 2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words -> Brown clusters win out by a small margin on both NER and chunking. • more info: http://metaoptimize.com/projects/wordreprs/ 39
  65. 65. 40 t-SNE visualizations of word embeddings. Left: Number Region; Right: Jobs Region. From Turian et al. 2010
  66. 66. http://metaoptimize.com/projects/wordreprs/ 41
  67. 67. Word Embeddings • Collobert & Weston 2008; Collobert et al. 2011 • Propose a unified neural network architecture for many NLP tasks: • part-of-speech tagging, chunking, named entity recognition, and semantic role labeling • no hand-made input features • learns internal representations on the basis of vast amounts of mostly unlabeled training data. 42
  68. 68. Word Embeddings • Recurrent Neural Network (Mikolov et al. 2010; Mikolov et al. 2013a) W("woman") − W("man") ≃ W("aunt") − W("uncle") W("woman") − W("man") ≃ W("queen") − W("king") Figures from Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations 43
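The regularities above are plain vector arithmetic plus a nearest-neighbour lookup. A small sketch of that mechanic; the embedding table below is random and purely illustrative, so the analogy only comes out right for vectors actually trained on large corpora (as in Mikolov et al. 2013a).

```python
import numpy as np

# Hypothetical embedding table, just to show the arithmetic.
rng = np.random.default_rng(0)
words = ["man", "woman", "king", "queen", "uncle", "aunt"]
W = {w: rng.normal(size=50) for w in words}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    """Return the word whose vector is closest to W[b] - W[a] + W[c]."""
    target = W[b] - W[a] + W[c]
    candidates = [w for w in words if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(W[w], target))

print(analogy("man", "woman", "king"))   # with trained vectors this returns "queen"
```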
  69. 69. • Mikolov et al. 2013b Figures from Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013b). Efficient Estimation of Word Representations in Vector Space 44
  70. 70. Word Embeddings • Recursive (Tensor) Network (Socher et al. 2011; Socher 2014) 45
  71. 71. Vector Space Model 46
  72. 72. 47
  73. 73. 48
  74. 74. 49
  75. 75. 50
  76. 76. 51
  77. 77. 52
  78. 78. 53
  79. 79. 3. Graph-Based Approaches to DL+NLP • A) NLP “naturally encoded” • B) Genetic Finite State Machine • C) Neural net within Graph 54
  80. 80. Graph-Based NLP • Graphs have a “natural affinity” with NLP [ feel free to quote me on that ;) ] • relation-oriented • index-free adjacency 55
  81. 81. What's in a Graph? Figure from Buerli & Obispo (2012). 56
  82. 82. What's in a Graph? • Graph Databases: Neo4j, OrientDB, InfoGrid, Titan, FlockDB, ArangoDB, InfiniteGraph, AllegroGraph, DEX, GraphBase, and HyperGraphDB • Distributed graph processing toolkits (based on MapReduce, HDFS, and custom BSP engines): Bagel, Hama, Giraph, PEGASUS, Faunus, Flink • in-memory graph packages designed for massive shared-memory machines (NetworkX, Gephi, MTGL, Boost, uRika, and STINGER) 57
  83. 83. A. NLP "naturally encoded" • i.e. graph-based opinion summarization (Ganesan et al. 2010; Ganesan 2013) 58 • Captures: • Redundancies • Gapped Subsequences • Collapsible Structures From Ganesan 2013 Natural Affinity, Say what?
  84. 84. Summarization Graph 59 From Ganesan 2013
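A toy NetworkX sketch of the idea (not Ganesan's implementation): words become nodes, word-order edges accumulate weight across redundant opinions, and heavily weighted paths become candidate summary phrases. The reviews and threshold are illustrative.

```python
import networkx as nx

reviews = [
    "the battery life is very good",
    "battery life is good",
    "very good battery life",
]

G = nx.DiGraph()
for review in reviews:
    tokens = review.split()
    for a, b in zip(tokens, tokens[1:]):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1              # redundancy accumulates on shared edges
        else:
            G.add_edge(a, b, weight=1)

# Heavily weighted edges mark the redundant, summary-worthy paths
heavy = [(a, b, d["weight"]) for a, b, d in G.edges(data=True) if d["weight"] > 1]
print(heavy)   # e.g. [('battery', 'life', 3), ('life', 'is', 2), ('very', 'good', 2)]
```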
  85. 85. Natural Affinity? • Demo time! 60
  86. 86. B. Finite State Graph • Bastani 2014a; 2014b; 2014c • Probabilistic feature hierarchy • Grammatical inference by genetic algorithms more info: https://github.com/kbastani/graphify 61 Figure from Bastani 2014a
  87. 87. Finite State Graph 62 • Bastani 2014 • training phase: all figures from Bastani 2014b
  88. 88. Finite State Graph 62 • Bastani 2014 • training phase: all figures from Bastani 2014b
  89. 89. Finite State Graph 62 • Bastani 2014 • training phase: all figures from Bastani 2014b
  90. 90. Finite State Graph 62 • Bastani 2014 • training phase: all figures from Bastani 2014b
  91. 91. • sentiment analysis • error: 0.3 Figure from Bastani 2014c 63
  92. 92. Conceptual Hierarchical Graph • Demo time! 64
  93. 93. C. Factor Graph • Factor graph in which the factors themselves contain a deep neural net. • Factor graph: • bipartite graph representing the factorization of a function (Kschischang et al. 2001; Frey 2002) • can combine Bayesian networks (BNs) and Markov random fields (MRFs). Figure from Frey 2002 65
  94. 94. Factor Graph • Factor graph with “deep factors” (Mirowski & LeCun 2009) • Dynamic Time Series modeling 66
  95. 95. Energy-Based Graph • LeCun et al. 1998, handwriting recognition system • “Graph Transformer Networks” • Instead of normalised HMM, energy based factor graph (without normalization) • LeCun et al. 2006. • Energy-Based Learning 67
  96. 96. And finally… and Finally… What you’ve all been waiting for… 68
  97. 97. And finally… and Finally… What you’ve all been waiting for… Which Net is currently the Biggest ? 68
  98. 98. And finally… and Finally… What you’ve all been waiting for… Which Net is currently the Biggest ? 68 the Deepest
  99. 99. And finally… and Finally… What you’ve all been waiting for… Which Net is currently the Biggest ? 68 the Deepest The most Bad-ass ?
  100. 100. Winners of: Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) 19 September 2014 source: Szegedy et al. Going deeper with convolutions (GoogLeNet ), ILSVRC2014, 19 Sep 2014 69
  101. 101. Winners of: Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) 19 September 2014 GoogLeNet Convolution Pooling Softmax Other source: Szegedy et al. Going deeper with convolutions (GoogLeNet ), ILSVRC2014, 19 Sep 2014 69
  102. 102. Winners of: Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) 19 September 2014 GoogLeNet (figure legend: Convolution, Pooling, Softmax, Other) source: Szegedy et al. Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014 69
  103. 103. Inception 256 480 480 512 512 512 832 832 1024 Width of inception modules ranges from 256 filters (in early modules) to 1024 in top inception modules. Can remove fully connected layers on top completely Number of parameters is reduced to 5 million Computational cost is increased by less than 2X compared to Krizhevsky's network. (<1.5Bn operations/evaluation) source: Szegedy et al. Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014 70
  104. 104. 71 Classification results on ImageNet 2012

    Team        | Year | Place | Error (top-5) | Uses external data
    SuperVision | 2012 | -     | 16.4%         | no
    SuperVision | 2012 | 1st   | 15.3%         | ImageNet 22k
    Clarifai    | 2013 | -     | 11.7%         | no
    Clarifai    | 2013 | 1st   | 11.2%         | ImageNet 22k
    MSRA        | 2014 | 3rd   | 7.35%         | no
    VGG         | 2014 | 2nd   | 7.32%         | no
    GoogLeNet   | 2014 | 1st   | 6.67%         | no

    Detection results

    Team            | Year | Place | mAP   | External data                          | Ensemble | Contextual model | Approach
    UvA-Euvision    | 2013 | 1st   | 22.6% | none                                   | ?        | yes              | Fisher vectors
    Deep Insight    | 2014 | 3rd   | 40.5% | ILSVRC12 Classification + Localization | 3 models | yes              | ConvNet
    CUHK DeepID-Net | 2014 | 2nd   | 40.7% | ILSVRC12 Classification + Localization | ?        | no               | ConvNet
    GoogLeNet       | 2014 | 1st   | 43.9% | ILSVRC12 Classification                | 6 models | no               | ConvNet

    source: Szegedy et al. Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014
  105. 105. Wanna Play? • cuda-convnet2 (Alex Krizhevsky, Toronto) (c++/ CUDA, optimized for GTX 580) https://code.google.com/p/cuda-convnet2/ • Caffe (Berkeley) (Cuda/OpenCL, Theano, Python) http://caffe.berkeleyvision.org/ • OverFeat (NYU) http://cilvr.nyu.edu/doku.php?id=code:start 72
  106. 106. Wanna Play? • Theano - CPU/GPU symbolic expression compiler in python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/ • Pylearn2 - Pylearn2 is a library designed to make machine learning research easy. http://deeplearning.net/software/pylearn2/ • Torch - provides a Matlab-like environment for state-of-the-art machine learning algorithms in lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/ • more info: http://deeplearning.net/software_links/ (slide partially stolen from: J. Sullivan, Convolutional Neural Networks & Computer Vision, Machine Learning meetup at Spotify, Stockholm, June 9 2014) 73
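For a taste of the first library above, a minimal Theano-style sketch (a hedged illustration using the 2014-era API, not from the slides): declare symbolic variables, build an expression graph, and compile it into a callable function that can run on CPU or GPU depending on configuration.

```python
import numpy as np
import theano
import theano.tensor as T

x = T.dvector("x")                                        # symbolic input vector
W = theano.shared(np.array([[1.0, 2.0], [3.0, 4.0]]), name="W")

y = T.tanh(T.dot(x, W))          # symbolic expression, nothing computed yet
g = T.grad(y.sum(), W)           # symbolic gradient, derived automatically

f = theano.function(inputs=[x], outputs=[y, g])           # compile the graph
out, grad = f([0.5, -1.0])
print(out, grad.shape)           # activations and dCost/dW for this toy expression
```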
  107. 107. Fin. Questions / Discussion … ? 74
  108. 108. Bibliography: Definitions • Mitchell, T. M. (1997). Machine Learning (1st ed.). New York, NY, USA: McGraw-Hill, Inc. • Simon, H.A. (1983). Why should machines learn? in: Machine Learning: An Artificial Intelligence Approach, (R. Michalski, J. Carbonell, T. Mitchell, eds) Tioga Press, 25-38. 75
  109. 109. Bibliography: History • Rosenblatt, Frank (1957), The Perceptron--a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory. • Minsky & Papert (1969), Perceptrons: an introduction to computational geometry. • Bryson, A.E.; W.F. Denham; S.E. Dreyfus (1963) Optimal programming problems with inequality constraints. I: Necessary conditions for extremal solutions. AIAA J. 1, 11 2544-2550. • Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533–536. • Boser, B. E., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144– 152. ACM Press. • Cortes, C. and Vapnik, V. (1995), Support-vector network. Machine Learning, 20:273–297. • Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007). An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. In Proceedings of the 24th International Conference on Machine Learning (pp. 473–480). New York, NY, USA: ACM. • Vincent, P., Larochelle, H., & Lajoie, I. (2010), Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408. 76
  110. 110. Bibliography: History - CNN’s • Fukushima, Kunihiko (1980). "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position". Biological Cybernetics 36 (4): 193–202. doi:10.1007/BF00344251. PMID 7370364. Retrieved 16 November 2013. • LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE 86 (11): 2278–2324. • S. Behnke. Hierarchical Neural Networks for Image Interpretation, volume 2766 of Lecture Notes in Computer Science. Springer, 2003. • Simard, Patrice, David Steinkraus, and John C. Platt. "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis." In ICDAR, vol. 3, pp. 958-962. 2003. • Hinton, GE; Osindero, S; Teh, YW (Jul 2006). "A fast learning algorithm for deep belief nets.". Neural computation 18 (7): 1527–54. • Bengio, Yoshua; Lamblin, Pascal; Popovici, Dan; Larochelle, Hugo (2007). "Greedy Layer-Wise Training of Deep Networks". Advances in Neural Information Processing Systems: 153–160. • Ranzato, MarcAurelio; Poultney, Christopher; Chopra, Sumit; LeCun, Yann (2007). "Efficient Learning of Sparse Representations with an Energy-Based Model". Advances in Neural Information Processing Systems. 77
  111. 111. Bibliography: DL • Bengio, Y., Ducharme, R., & Vincent, P. (2001). A Neural Probabilistic Language Model. In T. K. Leen & T. G. Dietterich (Eds.), Advances in Neural Information Processing Systems 13 (NIPS’00). MIT Press. • Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A Neural Probabilistic Language Model. The Journal of Machine Learning Research, 3, 1137–1155. • Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks, Advances in Neural Information Processing Systems 19 • Hinton, G. E. (1986). Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society (Vol. 1, p. 12). • Hinton, G. E. and Salakhutdinov, R. R, (2006) Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp. 504 - 507, 28 July 2006. • Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554. • Erhan, D., Bengio, Y., & Courville, A. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 625–660. 78
  112. 112. Bibliography: DL • Vincent, P., Larochelle, H., Bengio, Y. and Manzagol, P. A. (2008) Extracting and composing robust features with denoising autoencoders. In ICML. • Vincent, P., Larochelle, H., & Lajoie, I. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408. • Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1). • Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2012) Imagenet classification with deep convolutional neural networks. In NIPS. • Socher, Richard, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP). • Dahl, G. E., Ranzato, M. A., Mohamed, A. and Hinton, G. E. (2010) Phone recognition with the mean-covariance restricted Boltzmann machine. In NIPS. • Ciresan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition. CoRR. • Szegedy et al. (2014) Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014 79
  113. 113. Bibliography: NLP • Turian, J., Ratinov, L., & Bengio, Y. (2010). Word Representations: A Simple and General Method for Semi-supervised Learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 384–394). Stroudsburg, PA, USA: Association for Computational Linguistics. • Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. Proceedings of the 25th International Conference …. • Collobert, R., Weston, J., & Bottou, L. (2011). Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493-2537. • Collobert & Weston, Deep Learning for Natural Language Processing (2009) Nips Tutorial • Mikolov, T., Yih, W., & Zweig, G. (2013a). Linguistic Regularities in Continuous Space Word Representations. HLT-NAACL, (June), 746–751. • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013b). Efficient Estimation of Word Representations in Vector Space, 1–12. Computation and Language. 80
  114. 114. Bibliography: NLP • Bengio, Y. and Bengio, S. (2000) Modeling high-dimensional discrete data with multi-layer neural networks. In Proceedings of NIPS 12 • Mnih, A. and Hinton, G. E. (2007) Three New Graphical Models for Statistical Language Modelling. International Conference on Machine Learning, Corvallis, Oregon. • Socher, R., Bengio, Y., & Manning, C. (2012). Deep Learning for NLP (without Magic). Tutorial Abstracts of ACL 2012. • Socher, R. (2014). Recursive Deep Learning for Natural Language Processing and Computer Vision. Dissertation. 81
  115. 115. Bibliography: Graph-Based Approaches • Frey, B. (2002). Extending factor graphs so as to unify directed and undirected graphical models. Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence 19 (UAI 03), Morgan Kaufmann, CA, Acapulco, Mexico, 257–264. • Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519. • Mirowski, P., & LeCun, Y. (2009). Dynamic factor graphs for time series modeling. Machine Learning and Knowledge Discovery. • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998. • LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M. A., & Huang, F. J. (2006). A Tutorial on Energy-Based Learning. 82
  116. 116. Bibliography: Graph-Based Approaches • Buerli, M., & Obispo, C. (2012). The current state of graph databases. Department of Computer Science, Cal Poly San Luis Obispo • Ganesan, K., Zhai, C., & Han, J. (2010). Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions. Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), (August), 340–348. • Ganesan, K. (2013). Opinion Driven Decision Support System. PhD Dissertation, University of Illinois. • Bastani, K. 2014a, Hierarchical Pattern Recognition, Blog: Meaning Of, June 17, 2014 • Bastani, K. 2014b, Using a Graph Database for Deep Learning Text Classification, Blog: Meaning Of, August 26, 2014 • Bastani, K. 2014c, Deep Learning Sentiment Analysis for Movie Reviews using Neo4j, Blog: Meaning Of, September 15, 2014 83
