
- 1. WHAT WOULD SHANNON DO? BAYESIAN COMPRESSION FOR DL. Karen Ullrich, University of Amsterdam. Deep Learning and Reinforcement Learning Summer School, Montreal 2017. (Karen Ullrich, Jun 2017)
- 2. 01 MOTIVATION: Motivation
- 3. 01 MOTIVATION: Motivation. Energy: 1 Wh costs 0.0225 cent, so running a Titan X for 1 h costs 5.625 cent. Scale: Facebook has 1.86 billion active users; VGG takes ~147 ms per 16 predictions; making one prediction for every user costs 20 k€.
- 4. 01 MOTIVATION: Motivation - Summary. Mobile devices have limited hardware; predictions have energy costs; transmitting models costs bandwidth; speeding up inference enables real-time processing; there is a relation to privacy.
- 5. 02 RECENT WORK: Practical view on compression, A: Sparsity learning. Structured pruning: remove entire structures (rows/units), so the kept weights stay dense and $\mathrm{CR} = \frac{|w|}{|w_{\neq 0}|}$. (Unstructured) pruning: store the sparse matrix, e.g. in CSR form with index arrays such as IC = [0 1 3 0 1 2] and IR = [3, 0, 2, 3, …, 1]; since each surviving weight also needs roughly one stored index, $\mathrm{CR} = \frac{|w|.\mathrm{size}}{|w_{\neq 0}|.\mathrm{size}} \approx \frac{|w|}{2|w_{\neq 0}|}$.
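The two compression ratios on this slide can be sketched in a few lines. The matrix shape, the 90% pruning rate, and the magnitude threshold below are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Hypothetical sketch of magnitude pruning and its compression ratio (CR).
# Structured pruning keeps a dense block, so CR = |w| / |w_nonzero|;
# unstructured pruning stores roughly one index per surviving weight,
# hence CR ~ |w| / (2 |w_nonzero|).
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))

threshold = np.quantile(np.abs(w), 0.9)   # prune the smallest 90% by magnitude
mask = np.abs(w) >= threshold
nonzero = int(mask.sum())

cr_structured = w.size / nonzero          # dense storage of kept weights only
cr_unstructured = w.size / (2 * nonzero)  # values + indices in sparse storage
print(nonzero, round(cr_structured, 1), round(cr_unstructured, 1))
```

The factor of two is a rough model of sparse-format overhead; real formats such as CSR amortize part of the index cost over rows.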
- 6. 02 RECENT WORK: Practical view on compression, B: Bit per weight reduction. Precision quantisation: CR = 32/10 ≈ 3; pro: fast inference; con: the savings are not too big. Set quantisation by clustering (codebook, e.g. 1: -3.5667, 2: 0.4545, 3: 0.8735, 4: 0.33546): CR = 32/4 = 8; pro: extremely compressible with e.g. further Huffman encoding; con: inference?
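Set quantisation by clustering can be sketched with a plain 1-D k-means over the weights. The codebook size K = 16, the initialization, and the iteration count below are illustrative assumptions, not from the slides:

```python
import numpy as np

# Sketch of set quantisation: cluster weights into K centroids, store
# per-weight indices plus a tiny codebook. K = 16 means 4-bit indices,
# so CR = 32/4 = 8 before any further entropy coding (e.g. Huffman).
rng = np.random.default_rng(1)
w = rng.normal(size=10_000).astype(np.float32)

K = 16
centroids = np.quantile(w, np.linspace(0.03, 0.97, K))  # spread init over weights
for _ in range(20):                                     # plain 1-D k-means
    idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
    for k in range(K):
        if np.any(idx == k):
            centroids[k] = w[idx == k].mean()

w_quant = centroids[idx]                 # decoded ("shared") weights
bits_per_weight = np.log2(K)             # 4-bit index per weight
print(bits_per_weight, 32 / bits_per_weight)
```

The "inference?" caveat on the slide is visible here: the decoded matrix `w_quant` is still dense, so clustering alone saves storage, not compute.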
- 7. 02 RECENT WORK: Practical view on compression, Summary - Properties. Set quantisation, bit quantisation, unstructured pruning: highest compression, but flop and energy savings are moderate. Structured pruning: lowest expected compression, BUT will save a considerable amount of flops and thus energy.
- 8. 02 RECENT WORK: Practical view on compression, Summary - Applications. Set quantisation, bit quantisation, unstructured pruning: a "ZIP" format for neural nets, transmitting via limited channels, saving millions of nets efficiently. Structured pruning: inference at scale, real-time predictions, hardware-limited devices.
- 9. 03 MDL PRINCIPLE + VARIATIONAL LEARNING: Variational lower bound. $\log p(\mathcal{D}) \geq \mathcal{L}(q(w)) = \mathbb{E}_{q(w)}\!\left[\log \frac{p(\mathcal{D},w)}{q(w)}\right] = \mathbb{E}_{q(w)}[\log p(\mathcal{D}\mid w)] - \mathrm{KL}(q(w)\,\|\,p(w))$. [Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.]
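As a reminder of where this bound comes from (the standard derivation, not spelled out on the slide), it follows from Jensen's inequality applied to the log marginal likelihood:

```latex
\log p(\mathcal{D})
  = \log \int p(\mathcal{D}, w)\, dw
  = \log \mathbb{E}_{q(w)}\!\left[\frac{p(\mathcal{D}, w)}{q(w)}\right]
  \geq \mathbb{E}_{q(w)}\!\left[\log \frac{p(\mathcal{D}, w)}{q(w)}\right]
  = \mathbb{E}_{q(w)}[\log p(\mathcal{D} \mid w)] - \mathrm{KL}\big(q(w)\,\|\,p(w)\big)
```

The last step splits $p(\mathcal{D}, w) = p(\mathcal{D} \mid w)\, p(w)$ inside the logarithm.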
- 10. 03 MDL PRINCIPLE + VARIATIONAL LEARNING: MDL principle and Variational Learning. The best model is the one that compresses the data best. There are two costs: one for transmitting the model and one for reporting the data misfit. (Jorma Rissanen, 1978)
- 11. 03 MDL PRINCIPLE + VARIATIONAL LEARNING: Variational lower bound, annotated with the two MDL costs. $\log p(\mathcal{D}) \geq \mathcal{L}(q(w)) = \underbrace{\mathbb{E}_{q(w)}[\log p(\mathcal{D}\mid w)]}_{\text{transmitting data misfit}} - \underbrace{\mathrm{KL}(q(w)\,\|\,p(w))}_{\text{transmitting the model}}$.
- 12. 03 MDL PRINCIPLE + VARIATIONAL LEARNING: Variational lower bound (repeated). $\log p(\mathcal{D}) \geq \mathcal{L}(q(w)) = \mathbb{E}_{q(w)}[\log p(\mathcal{D}\mid w)] - \mathrm{KL}(q(w)\,\|\,p(w))$.
- 13. 02 RECENT WORK: Practical view on compression, Summary - Properties (recap). Set quantisation, bit quantisation, unstructured pruning: highest compression, moderate flop and energy savings. Structured pruning: lowest expected compression, but considerable flop and energy savings.
- 14. 04 CONTRIBUTED WORK: Soft weight-sharing for NN compression (Karen Ullrich, Edward Meeds & Max Welling, ICLR 2017). Solution: train a neural network with a Gaussian mixture model prior over the weights, with a point-mass posterior $q(w) = \prod_i q(w_i) = \prod_i \delta(w_i - \mu_i)$. Pruning: set one component to zero with a high mixing proportion. [Nowlan, Steven J., and Geoffrey E. Hinton. "Simplifying neural networks by soft weight-sharing." Neural Computation 4.4 (1992): 473-493.]
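The mixture-prior penalty that gets added to the task loss can be sketched as follows. The mixing proportions, means, and variances below are illustrative assumptions (in the actual method they are learned alongside the weights):

```python
import numpy as np

# Sketch of the soft weight-sharing penalty: the prior over weights is a
# Gaussian mixture, and training adds its negative log-likelihood to the
# task loss. Pinning one component at zero with a high mixing proportion
# pulls most weights onto the zero spike, i.e. encourages pruning.
def gmm_neg_log_prior(w, pi, mu, sigma):
    """-log p(w) under a 1-D Gaussian mixture, summed over all weights."""
    w = w.reshape(-1, 1)                               # (N, 1)
    log_comp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * sigma ** 2)
                - 0.5 * (w - mu) ** 2 / sigma ** 2)    # (N, K)
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_p.sum()

pi = np.array([0.9, 0.05, 0.05])     # zero component gets a high proportion
mu = np.array([0.0, -0.3, 0.4])      # component means (learned in practice)
sigma = np.array([0.05, 0.1, 0.1])

w_sparse = np.zeros(100)             # weights collapsed onto the zero spike
w_spread = np.random.default_rng(2).normal(0, 1, 100)
print(gmm_neg_log_prior(w_sparse, pi, mu, sigma) <
      gmm_neg_log_prior(w_spread, pi, mu, sigma))      # sparse weights are cheaper
```

After training, each weight is snapped to the mean of its most responsible component, which is exactly the set quantisation of slide 6.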
- 15. 04 CONTRIBUTED WORK: Soft weight-sharing for NN compression (results slide).
- 16. 02 RECENT WORK: Practical view on compression, Summary - Properties (recap). Set quantisation, bit quantisation, unstructured pruning: highest compression, moderate flop and energy savings. Structured pruning: lowest expected compression, but considerable flop and energy savings.
- 17. 04 CONTRIBUTED WORK: Bayesian Compression for Deep Learning (Christos Louizos, Karen Ullrich & Max Welling, under submission, NIPS 2017). Idea: use dropout to learn the architecture; the variational version of dropout learns the dropout rate. Solution: learn a dropout rate for each weight structure; when a structure has a high dropout rate, we can safely ignore it. The uncertainty in the leftover weights determines the bit precision. [Kingma, Diederik P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." NIPS 2015. Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. "Variational Dropout Sparsifies Deep Neural Networks." arXiv:1701.05369 (2017).]
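The pruning decision described here can be sketched with per-unit scales. The numbers and the signal-to-noise cutoff below are hypothetical stand-ins for the paper's learned dropout rates and threshold:

```python
import numpy as np

# Illustrative sketch (assumed values, not the paper's code): each output
# unit i gets a scale z_i ~ N(mu_z[i], alpha[i]). A high "dropout rate"
# (large alpha relative to mu_z^2, i.e. low signal-to-noise ratio) marks
# the whole unit as prunable, so its entire weight row can be dropped.
mu_z = np.array([1.0, 0.9, 0.02, 1.1, 0.01, 0.8, 0.03, 1.2])
alpha = np.array([0.01, 0.02, 0.5, 0.01, 0.6, 0.02, 0.4, 0.01])

snr = mu_z ** 2 / alpha   # signal-to-noise ratio per unit
keep = snr > 1.0          # hypothetical cutoff playing the threshold's role
print(keep)               # units 2, 4, 6 are pruned
```

Because whole units are removed, this is structured pruning in the sense of slides 7 and 8: the surviving network is a smaller dense network, so the flop savings are real.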
- 18. Bayesian Compression for Deep Learning. $q(w\mid z) = \prod_i q(w_i\mid z_i) = \prod_i \mathcal{N}(w_i \mid z_i \mu_i,\, z_i^2 \sigma_i^2)$ and $q(z) = \prod_i q(z_i) = \prod_i \mathcal{N}(z_i \mid \mu^z_i,\, \alpha_i)$. Annotations: push weights to zero for high dropout rates; force high dropout rates.
- 19. Bayesian Compression for Deep Learning. $p(w) = \int p(z)\, p(w\mid z)\, dz$, with $q(w\mid z) = \prod_i \mathcal{N}(w_i \mid z_i \mu_i,\, z_i^2 \sigma_i^2)$ and $q(z) = \prod_i \mathcal{N}(z_i \mid \mu^z_i,\, \alpha_i)$.
- 20. Bayesian Compression for Deep Learning (results slide).
- 21. Bayesian Compression for Deep Learning (results slide).
- 22. Bayesian Compression for Deep Learning (results slide).
- 23. 05 WARNING: Don't be too enthusiastic! These algorithms are merely proposals; little can be realised with common frameworks today. Missing pieces: architecture pruning; sparse matrix support (partially available in big frameworks); reduced bit precision (NVIDIA is starting); clustering.
- 24. Thank you for your attention. Any questions? KARENULLRICH.INFO / _KARRRN_
