SlideShare a Scribd company logo
><
WHAT WOULD SHANNON DO?
BAYESIAN COMPRESSION FOR DL
KAREN ULLRICH

UNIVERSITY OF AMSTERDAM

DEEP LEARNING AND REINFORCEMENT LEARNING 

SUMMER SCHOOL MONTREAL 2017
1KAREN ULLRICH, JUN 2017
><
Motivation
2KAREN ULLRICH, JUN 201701 MOTIVATION
><
Motivation
3
- 1 Wh costs 0.0225 cent
- running a Titan X for 1h: 5.625 cent
KAREN ULLRICH, JUN 201701 MOTIVATION
- facebook has 1.86 billion active users
- VGG takes ~147ms/16 predictions
- making one prediction for all users costs
20 k€
><
Motivation - Summary
4
- mobile devices have limited hardware
- energy costs for predictions
KAREN ULLRICH, JUN 201701 MOTIVATION
- bandwidth transmitting models
- speeding up inference for real time
processing
- relation to privacy
><02 RECENT WORK
Practical view on compression
A : Sparsity learning
• Structured Pruning:

• CR:
5KAREN ULLRICH, MAR 2017
A = [ ]
IC = [0 1 3 0 1 2]
IR = [3, 0, 2, 3, …, 1]
|w|
|w6=0|
• (Unstructured) Pruning 

• CR:
.size = |w|
|w6=0|.size =
⇡ |w|
2|w6=0|
><02 RECENT WORK
Practical view on compression
B : Bit per weight reduction
• precision quantisation
6KAREN ULLRICH, MAR 2017
• Set quantisation by clustering
1 -3.5667
2 0.4545
3 0.8735
4 0.33546
• CR: 32/10 = 3

• PRO: fast inference

• CON: savings is not too big
?
• CR: 32/4 = 8

• PRO: extreme compressible
with e.g. further Hoffman
encoding 

• CON: inference?
><02 RECENT WORK
Practical view on compression
Summary - Properties
7KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
Structured
pruning
Set quantisation Bit quantisation
Unstructured
pruning
- highest compression
- flop and energy savings
moderate
Structured
pruning
- lowest expected compression
- BUT will save considerable
amount of flops and thus energy
><02 RECENT WORK
Practical view on compression
Summary - Applications
8KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
-“ZIP”-format for NN
- transmitting via limited channels
- save millions of nets efficiently
Structured
pruning
- inference at scale
- real time predictions
- hardware limited devices
>< 9
Variational lower bound
KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
log p(D) L(q(w), w)) = Eq(w)[log p(D,w)
q(w) ]
= Eq(w)[log p(D|w)]] KL(q(w)||p(w))
Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks
simple by minimizing the description length of the weights." Proceedings of
the sixth annual conference on Computational learning theory. ACM, 1993.
><
MDL principle and Variational Learning
10
The best model is the one that compresses the data best.
There are two costs, one for transmitting a model and one
for reporting the data misfit.
KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
Jorma Rissanen, 1978
>< 11
Variational lower bound
KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
transmitting
the model
transmitting
data misfit
log p(D) L(q(w), w)) = Eq(w)[log p(D,w)
q(w) ]
= Eq(w)[log p(D|w)]] KL(q(w)||p(w))
Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks
simple by minimizing the description length of the weights." Proceedings of
the sixth annual conference on Computational learning theory. ACM, 1993.
><
Variational lower bound
12KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING
log p(D) L(q(w), w)) = Eq(w)[log p(D,w)
q(w) ]
= Eq(w)[log p(D|w)]] KL(q(w)||p(w))
Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks
simple by minimizing the description length of the weights." Proceedings of
the sixth annual conference on Computational learning theory. ACM, 1993.
><02 RECENT WORK
Practical view on compression
Summary - Properties
13KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
- highest compression
- flop and energy savings
moderate
Structured
pruning
- lowest expected compression
- BUT will save considerable
amount of flops and thus energy
>< 14
• Solution: train a neural
network with gaussian
mixture model prior
Soft weight-sharing for NN compression
KAREN ULLRICH, EDWARD MEED & MAX WELLING

ICLR 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
q(w) =
Y
q(wi) = (wi|µi)
• Pruning by setting one
component to zero with
high mixing proportion
Nowlan, Steven J., and Geoffrey E. Hinton. "Simplifying neural networks by soft weight-sharing." Neural computation 4.4 (1992): 473-493.
>< 15
Soft weight-sharing for NN compression
KAREN ULLRICH, EDWARD MEED & MAX WELLING

ICLR 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><02 RECENT WORK
Practical view on compression
Summary - Properties
16KAREN ULLRICH, MAR 2017
Set quantisation Bit quantisation
Unstructured
pruning
- highest compression
- flop and energy savings
moderate
Structured
pruning
- lowest expected compression
- BUT will save considerable
amount of flops and thus energy
><
Bayesian Compression for Deep Learning
17
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
• Idea: use dropout to learn architecture 

• the variational version of dropout learns the dropout rate 

• Solution: Learn dropout rate for each weight structure, when
weights have a high dropout rate we can safely ignore them 

• uncertainty in left over weights to compute bit precision
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
Kingma, Diederik P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." NIPS. 2015.
Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. "Variational Dropout Sparsifies Deep Neural Networks." arXiv preprint arXiv:1701.05369 (2017).
><
Bayesian Compression for Deep Learning
18
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
push to zero for high dropout rates
force high dropout rates
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
q(w|z) =
Y
q(wi|zi) = N(wi|ziµi, z2
i
2
i )
q(z) =
Y
q(zi) = N(zi|µz
i , ↵i)
><
Bayesian Compression for Deep Learning
19
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
p(w) =
Z
p(z)p(w|z)dz
q(w|z) =
Y
q(wi|zi) = N(wi|ziµi, z2
i
2
i )
q(z) =
Y
q(zi) = N(zi|µz
i , ↵i)
><
Bayesian Compression for Deep Learning
20
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><
Bayesian Compression for Deep Learning
21
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><
Bayesian Compression for Deep Learning
22
CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING

UNDER SUBMISSION NIPS 2017
KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
><
Warning: Don’t be too enthusiastic!
23
These algorithms are merely proposals, little can be realised by common
frameworks today.
• Architecture pruning
• Sparse matrix support (partially in big frameworks)
• Reduced bit precision (NVIDIA is starting)
• Clustering
KAREN ULLRICH, JUN 201705 WARNING
>< 24KAREN ULLRICH, JUN 2017
Thank you for your attention.
Any questions?
KARENULLRICH.INFO

_KARRRN_

More Related Content

Similar to What Would Shannon Do?

K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
idescitation
 
Small Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignSmall Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their Design
Forrest Iandola
 
k means clustering-based data compression
k means clustering-based data compressionk means clustering-based data compression
k means clustering-based data compression
mohammed alrekabe
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
IRJET Journal
 
Paper Reviews on Visual Attention
Paper Reviews on Visual AttentionPaper Reviews on Visual Attention
Paper Reviews on Visual Attention
민기 정
 
Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)
Mayur Sevak
 
Density based methods
Density based methodsDensity based methods
Density based methods
SVijaylakshmi
 
CNN
CNNCNN
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
Mostafa G. M. Mostafa
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
csandit
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent Advances
Dmytro Mishkin
 
Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition
danielemarinazzo
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
Krish_ver2
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM France Lab
 
A hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering schemeA hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering scheme
redpel dot com
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_Slide
Kang-Ho Lee
 
Green computing
Green computingGreen computing
Green computing
Ketan Hulaji
 
West denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart gridsWest denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart grids
moseifu
 
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al..."Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
Edge AI and Vision Alliance
 

Similar to What Would Shannon Do? (20)

K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
 
Small Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignSmall Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their Design
 
k means clustering-based data compression
k means clustering-based data compressionk means clustering-based data compression
k means clustering-based data compression
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
 
Paper Reviews on Visual Attention
Paper Reviews on Visual AttentionPaper Reviews on Visual Attention
Paper Reviews on Visual Attention
 
Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)Compressed sensing applications in image processing &amp; communication (1)
Compressed sensing applications in image processing &amp; communication (1)
 
Density based methods
Density based methodsDensity based methods
Density based methods
 
CNN
CNNCNN
CNN
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent Advances
 
Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition Multiscale Granger Causality and Information Decomposition
Multiscale Granger Causality and Information Decomposition
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
 
A hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering schemeA hybrid security and compressive sensing based sensor data gathering scheme
A hybrid security and compressive sensing based sensor data gathering scheme
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_Slide
 
Green computing
Green computingGreen computing
Green computing
 
West denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart gridsWest denmark short term load forecast_for smart grids
West denmark short term load forecast_for smart grids
 
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al..."Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
"Designing CNN Algorithms for Real-time Applications," a Presentation from Al...
 

Recently uploaded

Machine learning _new.pptx for a presentation
Machine learning _new.pptx for a presentationMachine learning _new.pptx for a presentation
Machine learning _new.pptx for a presentation
RahulS66654
 
Australian Catholic University degree offer diploma Transcript
Australian Catholic University  degree offer diploma TranscriptAustralian Catholic University  degree offer diploma Transcript
Australian Catholic University degree offer diploma Transcript
taqyea
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
javier ramirez
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
sharonblush
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
taqyea
 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
kamli sharma#S10
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
huseindihon
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
janvikumar4133
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
TARIKU ENDALE
 
International Journal of Fuzzy Logic Systems (IJFLS)
International Journal of Fuzzy Logic Systems (IJFLS)International Journal of Fuzzy Logic Systems (IJFLS)
International Journal of Fuzzy Logic Systems (IJFLS)
ijfls
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
dizzycaye
 
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
44annissa
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
kihus38
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
birajmohan012
 
Universidad Camilo José Cela degree offer diploma Transcript
Universidad Camilo José Cela  degree offer diploma TranscriptUniversidad Camilo José Cela  degree offer diploma Transcript
Universidad Camilo José Cela degree offer diploma Transcript
taqyea
 
Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...
chetankumar9855
 
The University of New England degree offer diploma Transcript
The University of New England  degree offer diploma TranscriptThe University of New England  degree offer diploma Transcript
The University of New England degree offer diploma Transcript
taqyea
 
Universidad de Alcalá degree offer diploma Transcript
Universidad de Alcalá  degree offer diploma TranscriptUniversidad de Alcalá  degree offer diploma Transcript
Universidad de Alcalá degree offer diploma Transcript
taqyea
 
University of Wollongong degree offer diploma Transcript
University of Wollongong  degree offer diploma TranscriptUniversity of Wollongong  degree offer diploma Transcript
University of Wollongong degree offer diploma Transcript
taqyea
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 

Recently uploaded (20)

Machine learning _new.pptx for a presentation
Machine learning _new.pptx for a presentationMachine learning _new.pptx for a presentation
Machine learning _new.pptx for a presentation
 
Australian Catholic University degree offer diploma Transcript
Australian Catholic University  degree offer diploma TranscriptAustralian Catholic University  degree offer diploma Transcript
Australian Catholic University degree offer diploma Transcript
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
 
International Journal of Fuzzy Logic Systems (IJFLS)
International Journal of Fuzzy Logic Systems (IJFLS)International Journal of Fuzzy Logic Systems (IJFLS)
International Journal of Fuzzy Logic Systems (IJFLS)
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
 
Universidad Camilo José Cela degree offer diploma Transcript
Universidad Camilo José Cela  degree offer diploma TranscriptUniversidad Camilo José Cela  degree offer diploma Transcript
Universidad Camilo José Cela degree offer diploma Transcript
 
Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...
 
The University of New England degree offer diploma Transcript
The University of New England  degree offer diploma TranscriptThe University of New England  degree offer diploma Transcript
The University of New England degree offer diploma Transcript
 
Universidad de Alcalá degree offer diploma Transcript
Universidad de Alcalá  degree offer diploma TranscriptUniversidad de Alcalá  degree offer diploma Transcript
Universidad de Alcalá degree offer diploma Transcript
 
University of Wollongong degree offer diploma Transcript
University of Wollongong  degree offer diploma TranscriptUniversity of Wollongong  degree offer diploma Transcript
University of Wollongong degree offer diploma Transcript
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 

What Would Shannon Do?

  • 1. >< WHAT WOULD SHANNON DO? BAYESIAN COMPRESSION FOR DL KAREN ULLRICH UNIVERSITY OF AMSTERDAM DEEP LEARNING AND REINFORCEMENT LEARNING SUMMER SCHOOL MONTREAL 2017 1KAREN ULLRICH, JUN 2017
  • 3. >< Motivation 3 - 1 Wh costs 0.0225 cent - running a Titan X for 1h: 5.625 cent KAREN ULLRICH, JUN 201701 MOTIVATION - facebook has 1.86 billion active users - VGG takes ~147ms/16 predictions - making one prediction for all users costs 20 k€
  • 4. >< Motivation - Summary 4 - mobile devices have limited hardware - energy costs for predictions KAREN ULLRICH, JUN 201701 MOTIVATION - bandwidth transmitting models - speeding up inference for real time processing - relation to privacy
  • 5. ><02 RECENT WORK Practical view on compression A : Sparsity learning • Structured Pruning: • CR: 5KAREN ULLRICH, MAR 2017 A = [ ] IC = [0 1 3 0 1 2] IR = [3, 0, 2, 3, …, 1] |w| |w6=0| • (Unstructured) Pruning • CR: .size = |w| |w6=0|.size = ⇡ |w| 2|w6=0|
  • 6. ><02 RECENT WORK Practical view on compression B : Bit per weight reduction • precision quantisation 6KAREN ULLRICH, MAR 2017 • Set quantisation by clustering 1 -3.5667 2 0.4545 3 0.8735 4 0.33546 • CR: 32/10 = 3 • PRO: fast inference • CON: savings is not too big ? • CR: 32/4 = 8 • PRO: extreme compressible with e.g. further Hoffman encoding • CON: inference?
  • 7. ><02 RECENT WORK Practical view on compression Summary - Properties 7KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning Structured pruning Set quantisation Bit quantisation Unstructured pruning - highest compression - flop and energy savings moderate Structured pruning - lowest expected compression - BUT will save considerable amount of flops and thus energy
  • 8. ><02 RECENT WORK Practical view on compression Summary - Applications 8KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning -“ZIP”-format for NN - transmitting via limited channels - save millions of nets efficiently Structured pruning - inference at scale - real time predictions - hardware limited devices
  • 9. >< 9 Variational lower bound KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING log p(D) L(q(w), w)) = Eq(w)[log p(D,w) q(w) ] = Eq(w)[log p(D|w)]] KL(q(w)||p(w)) Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • 10. >< MDL principle and Variational Learning 10 The best model is the one that compresses the data best. There are two costs, one for transmitting a model and one for reporting the data misfit. KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING Jorma Rissanen, 1978
  • 11. >< 11 Variational lower bound KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING transmitting the model transmitting data misfit log p(D) L(q(w), w)) = Eq(w)[log p(D,w) q(w) ] = Eq(w)[log p(D|w)]] KL(q(w)||p(w)) Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • 12. >< Variational lower bound 12KAREN ULLRICH, JUN 201703 MDL PRINCIPLE + VARIATIONAL LEARNING log p(D) L(q(w), w)) = Eq(w)[log p(D,w) q(w) ] = Eq(w)[log p(D|w)]] KL(q(w)||p(w)) Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • 13. ><02 RECENT WORK Practical view on compression Summary - Properties 13KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning - highest compression - flop and energy savings moderate Structured pruning - lowest expected compression - BUT will save considerable amount of flops and thus energy
  • 14. >< 14 • Solution: train a neural network with gaussian mixture model prior Soft weight-sharing for NN compression KAREN ULLRICH, EDWARD MEED & MAX WELLING ICLR 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK q(w) = Y q(wi) = (wi|µi) • Pruning by setting one component to zero with high mixing proportion Nowlan, Steven J., and Geoffrey E. Hinton. "Simplifying neural networks by soft weight-sharing." Neural computation 4.4 (1992): 473-493.
  • 15. >< 15 Soft weight-sharing for NN compression KAREN ULLRICH, EDWARD MEED & MAX WELLING ICLR 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 16. ><02 RECENT WORK Practical view on compression Summary - Properties 16KAREN ULLRICH, MAR 2017 Set quantisation Bit quantisation Unstructured pruning - highest compression - flop and energy savings moderate Structured pruning - lowest expected compression - BUT will save considerable amount of flops and thus energy
  • 17. >< Bayesian Compression for Deep Learning 17 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 • Idea: use dropout to learn architecture • the variational version of dropout learns the dropout rate • Solution: Learn dropout rate for each weight structure, when weights have a high dropout rate we can safely ignore them • uncertainty in left over weights to compute bit precision KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK Kingma, Diederik P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." NIPS. 2015. Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. "Variational Dropout Sparsifies Deep Neural Networks." arXiv preprint arXiv:1701.05369 (2017).
  • 18. >< Bayesian Compression for Deep Learning 18 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 push to zero for high dropout rates force high dropout rates KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK q(w|z) = Y q(wi|zi) = N(wi|ziµi, z2 i 2 i ) q(z) = Y q(zi) = N(zi|µz i , ↵i)
  • 19. >< Bayesian Compression for Deep Learning 19 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK p(w) = Z p(z)p(w|z)dz q(w|z) = Y q(wi|zi) = N(wi|ziµi, z2 i 2 i ) q(z) = Y q(zi) = N(zi|µz i , ↵i)
  • 20. >< Bayesian Compression for Deep Learning 20 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 21. >< Bayesian Compression for Deep Learning 21 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 22. >< Bayesian Compression for Deep Learning 22 CHRISTOS LOUIZOS, KAREN ULLRICH & MAX WELLING UNDER SUBMISSION NIPS 2017 KAREN ULLRICH, JUN 201704 CONTRIBUTED WORK
  • 23. >< Warning: Don’t be too enthusiastic! 23 These algorithms are merely proposals, little can be realised by common frameworks today. • Architecture pruning • Sparse matrix support (partially in big frameworks) • Reduced bit precision (NVIDIA is starting) • Clustering KAREN ULLRICH, JUN 201705 WARNING
  • 24. >< 24KAREN ULLRICH, JUN 2017 Thank you for your attention. Any questions? KARENULLRICH.INFO _KARRRN_