Reference

Bengio, Y., Louradour, J., Collobert, R., Weston, J. Curriculum Learning. ICML, 2009.
Bengio, Y. Evolving Culture vs Local Minima. arXiv, 2012.
Bengio, Y. Deep Learning of Representations: Looking Forward. arXiv, 2013.
Duchi, J., Hazan, E., Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. COLT, 2010.
Gulcehre, C., Bengio, Y. Knowledge Matters: Importance of Prior Information for Optimization. arXiv, 2013.
Hinton, G. E. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv, 2012.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE, 1998.
Schaul, T., Zhang, S., LeCun, Y. No More Pesky Learning Rates. ICML, 2013a.
Schaul, T., LeCun, Y. Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients. ICLR, 2013b.
Tang, Y. Deep Learning using Support Vector Machines. arXiv, 2013.
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A. Extracting and Composing Robust Features with Denoising Autoencoders. ICML, 2008.
Vinyals, O., Jia, Y., Deng, L., Darrell, T. Learning with Recursive Perceptual Representations. NIPS, 2012.
Wang, S. I., Manning, C. D. Fast dropout training. ICML, 2013.
Appendix: Reference

Elman, J. Finding structure in time. Cognitive Science, 1990.
Jordan, M. Serial order: A parallel distributed processing approach. Tech. Rep., 1986.
Mesnil, G., He, X., Deng, L., Bengio, Y. Investigation of Recurrent-Neural-Network Architectures and Learning Methods for Spoken Language Understanding. INTERSPEECH, 2013.
Socher, R., Lin, C. C.-Y., Ng, A. Y., Manning, C. D. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML, 2011.
Sutskever, I., Martens, J., Hinton, G. Generating Text with Recurrent Neural Networks. ICML, 2011.