
Talk to be given June 8, 2018, at UC Berkeley / NERSC.
Empirical results, obtained using the machinery of Random Matrix Theory (RMT), are presented that are aimed at clarifying and resolving some of the puzzling and seemingly contradictory aspects of deep neural networks (DNNs). We apply RMT to several well-known pretrained models: LeNet5, AlexNet, and Inception V3, as well as two small toy models. We show that the DNN training process itself implicitly implements a form of self-regularization associated with entropy collapse / the information bottleneck. We find that the self-regularization in small models like LeNet5 resembles the familiar Tikhonov regularization, whereas large, modern deep networks display a new kind of heavy-tailed self-regularization. We characterize self-regularization using RMT by identifying a taxonomy of the 5+1 phases of training. Then, with our toy models, we show that even in the absence of any explicit regularization mechanism, the DNN training process itself leads to more and more capacity-controlled models. Importantly, this phenomenon is strongly affected by the many knobs that are used to optimize DNN training. In particular, we can induce heavy-tailed self-regularization by adjusting the batch size in training, thereby exploiting the generalization gap phenomenon unique to DNNs. We argue that this heavy-tailed self-regularization has practical implications for designing better DNNs and deep theoretical implications for understanding the complex DNN energy landscape / optimization problem.
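The core RMT diagnostic described in the abstract can be illustrated with a minimal NumPy sketch (this is not the authors' code): compute the empirical spectral density (ESD) of a layer weight matrix and compare it to the Marchenko-Pastur bulk edge expected for a random Gaussian matrix. Here a Pareto-entry matrix is used purely as a stand-in assumption for a heavy-tailed trained layer; its largest eigenvalues escape well beyond the bulk edge, which is the kind of signature associated with heavy-tailed self-regularization.

```python
import numpy as np

def esd(W):
    """Empirical spectral density of a layer: eigenvalues of the
    correlation matrix X = W^T W / N, for a weight matrix W of shape (N, M)."""
    N, M = W.shape
    return np.linalg.eigvalsh(W.T @ W / N)

rng = np.random.default_rng(0)
N, M = 1000, 500
q = M / N

# Gaussian-initialized layer (unit-variance entries): the ESD follows the
# Marchenko-Pastur law, with bulk edge lambda_+ = sigma^2 (1 + sqrt(q))^2.
W_init = rng.normal(0.0, 1.0, size=(N, M))
mp_edge = (1 + np.sqrt(q)) ** 2  # sigma = 1
evals_init = esd(W_init)

# Heavy-tailed matrix (symmetrized Pareto entries, normalized to unit
# variance) as a stand-in for a trained layer: eigenvalue outliers appear
# far beyond the Marchenko-Pastur bulk edge.
W_ht = rng.pareto(2.5, size=(N, M)) - rng.pareto(2.5, size=(N, M))
W_ht /= W_ht.std()
evals_ht = esd(W_ht)

print("MP bulk edge:        ", mp_edge)
print("Gaussian max eigval: ", evals_init.max())  # close to the bulk edge
print("Heavy-tailed max:    ", evals_ht.max())    # typically well beyond it
```

In practice the same computation would be run on each layer's actual weight matrix (e.g. extracted from a pretrained LeNet5 or AlexNet), and the tail of the ESD histogram, rather than a single eigenvalue, would be examined.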