
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016

Applying Deep Learning at Facebook Scale: Facebook leverages deep learning for various applications, including event prediction, machine translation, natural language understanding, and computer vision, at very large scale. More than a billion users log on to Facebook every day, generating thousands of posts per second and uploading more than a billion images and videos daily. This talk explains how Facebook scaled deep learning inference for real-time applications with latency budgets in the milliseconds.

  1. Applying Deep Learning at Facebook Scale. Hussein Mehanna, Director of Engineering, Applied Machine Learning
  2. Applications of deep learning: event prediction, machine translation, large-scale computer vision, natural language processing
  3. Applications of deep learning: event prediction, machine translation, large-scale computer vision, natural language processing
  4. Why should I like this story?
  5. 1B+ new stories every day, plus billions of stories from this day in years past
  6. A billion people, thousands of stories, ranked in milliseconds
  7. Deep learning for ranking: sparse user and story features ("I like soccer", "I am from Australia", "I am 26", "I traveled to Argentina", story title "Deep learning") feed a massive sparse logistic regression combined with deep neural networks (see the ranking sketch after the slide list)
  8. Deep learning for ranking: massive sparse logistic regression + deep neural networks over the same sparse features
  9. Applications of deep learning: event prediction, machine translation, large-scale computer vision, natural language processing
  10. Applications of deep learning: event prediction, machine translation, large-scale computer vision, natural language processing
  11. Machine translation with neural networks: recurrent neural networks with an attention decoder. The encoder reads the input ("Vamos a divertirnos hoy") into encoded states; the attention model lets the decoder attend over those states while producing the translation ("Gonna have some fun today") (see the attention sketch after the slide list)
  12. Applications of deep learning: event prediction, machine translation, large-scale computer vision, natural language processing
  13. Applications of deep learning: natural language processing, large-scale computer vision, event prediction, machine translation
  14. Applications of deep learning: natural language processing, large-scale computer vision, event prediction, machine translation
  15. Applications of deep learning: large-scale computer vision, event prediction, machine translation, natural language processing
  16. Hundreds of convolutional neural networks run on photos and videos uploaded to Facebook
  17. Classification, detection, segmentation (example detections: person, plate, drink)
  18. Improving inference for deep learning: compute faster, memory usage in deep networks, compress models
  19. Improving inference for deep learning: compute faster, memory usage in deep networks, compress models
  20. Convolution implementation strategies: convolutions are 90%+ of runtime for modern vision models
  21. Compute faster: faster convolution algorithms for deep learning, from im2col + sgemm (2013) to FFT (2014) to tiled FFT and Winograd (2015) (see the im2col sketch after the slide list)
  22. NNPACK: "cuDNN for CPUs"
      • Easy integration: cuDNN-style C interface, easy to integrate
      • Supports the computationally intensive layers: convolutions (tiled FFT, Winograd), pooling, fully connected layers (GEMM/GEMV)
      • Implemented via an x86-64 meta-assembler (PeachPy)
      • Excellent performance: 2x-6x vs. a baseline CPU implementation
  23. NNPACK is open source (github.com/Maratyszcza/NNPACK) and integrated into several deep learning frameworks: Caffe/Caffe2 (github.com/ajtulloch/caffe/tree/nnpack-pr) and Torch (github.com/szagoruyko/nnpack.torch)
  24. Improving inference for deep learning: compute faster, memory usage in deep networks, compress models
  25. Improving inference for deep learning: compute faster, memory usage in deep networks, compress models
  26. The memory "Andy-Bill" theorem
      • Trend: ResNets in vision, deep LSTMs in language modeling
      • Scale: GPU memory is relatively stable (12GB on Titan X / M4, 16GB on P100)
      • Constraints: CPU memory has many constraints, especially in applied settings
  27. Memory spend is in activations: the bulk of memory is in the activations, so they must be reused. Borrowing ideas from compilers, view activations as virtual registers and run a register allocator (graph coloring on the interference graph). This gives 50%-90% memory savings for modern ConvNets and lets inference on an O(N)-layer ResNet run in O(1) memory (see the buffer-reuse sketch after the slide list)
  28. AlexNet
  29. AlexNet
  30. Inception Network
  31. Some implementations: MXNet (github.com/dmlc/mxnet-memonger), Caffe/Caffe2 (github.com/facebook/fb-caffe-exts/), Torch (github.com/fmassa/optimize-net). One can go further and explicitly trade off compute for memory: ResNet-1000 goes from 48GB to 7GB for ~30% slower timings (see the checkpointing sketch after the slide list)
  32. Improving inference for deep learning: compute faster, memory usage in deep networks, compress models
  33. Improving inference for deep learning: compute faster, memory usage in deep networks, compress models
  34. Deep compression pipeline (Han et al.), starting from the original network at its original size (see the pruning/quantization sketch after the slide list):
      • Pruning (fewer weights): train connectivity, prune connections, retrain the weights; same accuracy at a 10x reduction
      • Quantization (fewer bits per weight): cluster the weights, generate a code book, quantize the weights with the code book, retrain the code book; same accuracy at a 27x-31x reduction
      • Huffman encoding: encode the weights and the indices; same accuracy at a 35x-50x reduction
  35. All together, pruning + quantization + Huffman coding gives a 49x reduction (552MB down to 11.3MB) with essentially unchanged accuracy (top-1 error 31.50% vs. 31.17%, top-5 error 11.32% vs. 10.91%)
  36. Recap: applications (event prediction, machine translation, large-scale computer vision, natural language processing) and inference improvements (compute faster, memory usage in deep networks, compress models)
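
Ranking sketch (slides 7-8): the slides describe combining a massive sparse logistic regression with a deep neural network for event prediction. Below is a minimal NumPy sketch of that combination; the class names, feature ids, and dimensions are illustrative assumptions, not Facebook's implementation.

```python
# Illustrative sketch (not Facebook's code): combine a massive sparse
# logistic regression over one-hot features with a small deep network
# over dense features, summing their logits for event prediction.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseLogisticRegression:
    """Wide part: one weight per sparse feature id (e.g. hashed user/story traits)."""
    def __init__(self, num_features):
        self.w = np.zeros(num_features)
        self.b = 0.0

    def logit(self, active_ids):
        # Sparse dot product: only the active feature ids contribute.
        return self.w[active_ids].sum() + self.b

class SmallDeepNet:
    """Deep part: a tiny MLP over dense inputs (embeddings, counters, ...)."""
    def __init__(self, in_dim, hidden, rng):
        self.W1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=hidden)

    def logit(self, x):
        h = np.maximum(0.0, x @ self.W1)   # ReLU hidden layer
        return float(h @ self.w2)

rng = np.random.default_rng(0)
wide = SparseLogisticRegression(num_features=1_000_000)
deep = SmallDeepNet(in_dim=64, hidden=32, rng=rng)

active_ids = np.array([17, 4242, 987_654])   # hypothetical sparse feature ids
dense_x = rng.normal(size=64)                # hypothetical dense features
p_click = sigmoid(wide.logit(active_ids) + deep.logit(dense_x))
print(f"P(user engages with story) ~ {p_click:.3f}")
```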
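
Attention sketch (slide 11): the slide shows an encoder-decoder translator with an attention model between encoded states and the decoder. The sketch below computes a single attention step with dot-product scoring; the shapes, the scoring function, and the toy vocabulary are assumptions for illustration, not the exact model.

```python
# Illustrative attention step in an encoder-decoder translator: the decoder
# state attends over the encoded source states; the weighted sum (context)
# is combined with the decoder state to score the next target word.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
src_len, d = 5, 16                               # e.g. "Vamos a divertirnos hoy" + EOS
encoder_states = rng.normal(size=(src_len, d))   # one encoded state per source token
decoder_state = rng.normal(size=d)               # current decoder hidden state

scores = encoder_states @ decoder_state          # alignment scores
alpha = softmax(scores)                          # attention weights over the source
context = alpha @ encoder_states                 # weighted sum of encoder states

vocab = ["gonna", "have", "some", "fun", "today"]    # toy target vocabulary
W_out = rng.normal(size=(2 * d, len(vocab)))         # output projection (assumed)
logits = np.concatenate([decoder_state, context]) @ W_out
print("attention weights:", np.round(alpha, 2))
print("predicted next word:", vocab[int(np.argmax(logits))])
```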
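
im2col sketch (slide 21): im2col + sgemm is listed as the 2013-era baseline for convolution. The sketch below implements that strategy directly, unfolding every receptive field into a column and doing one large matrix multiply; it is a didactic version, not NNPACK's optimized FFT/Winograd kernels.

```python
# Illustrative im2col + GEMM convolution: unfold patches, then one big matmul.
import numpy as np

def im2col(x, kh, kw):
    """x: (C, H, W) -> (C*kh*kw, out_h*out_w), stride 1, no padding."""
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    idx = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                patch = x[c, i:i + out_h, j:j + out_w]
                cols[idx] = patch.reshape(-1)
                idx += 1
    return cols, out_h, out_w

def conv2d_im2col(x, weights):
    """weights: (K, C, kh, kw) -> output (K, out_h, out_w)."""
    K, C, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw)
    out = weights.reshape(K, -1) @ cols      # the single sgemm call
    return out.reshape(K, out_h, out_w)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))               # toy 3-channel input
w = rng.normal(size=(4, 3, 3, 3))            # 4 filters of size 3x3
print(conv2d_im2col(x, w).shape)             # (4, 6, 6)
```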
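
Buffer-reuse sketch (slide 27): the slide treats activations as virtual registers and runs a register allocator (graph coloring on the interference graph). The sketch below applies that idea to a toy chain of layers with greedy coloring; the layer names and lifetimes are made up, but it shows how a chain of N activations can run in a constant number of buffers.

```python
# Illustrative "activations as virtual registers": compute each activation's
# live interval, build the interference graph, and greedily colour it so
# non-overlapping activations share the same physical buffer.
layers = ["conv1", "relu1", "conv2", "relu2", "conv3"]   # producer order
# last_use[i] = index of the last layer that reads activation i
last_use = {0: 1, 1: 2, 2: 3, 3: 4, 4: 4}
live = {i: (i, last_use[i]) for i in range(len(layers))}

def interferes(a, b):
    (s1, e1), (s2, e2) = live[a], live[b]
    return not (e1 < s2 or e2 < s1)         # overlapping lifetimes cannot share

buffer_of = {}
num_buffers = 0
for i in range(len(layers)):                # greedy graph colouring
    taken = {buffer_of[j] for j in range(i) if interferes(i, j)}
    free = [b for b in range(num_buffers) if b not in taken]
    if free:
        buffer_of[i] = free[0]
    else:
        buffer_of[i] = num_buffers
        num_buffers += 1

for i, name in enumerate(layers):
    print(f"{name}: activation -> buffer {buffer_of[i]}")
print(f"{len(layers)} activations fit in {num_buffers} buffers")
```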
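
Checkpointing sketch (slide 31): the slide mentions explicitly trading compute for memory (ResNet-1000 from 48GB to 7GB for ~30% slower timings). The sketch below shows the generic recomputation idea behind that trade-off: keep only every sqrt(N)-th activation and recompute the rest from the nearest checkpoint when needed. The toy "layers" and the sqrt(N) schedule are assumptions, not the memonger implementation.

```python
# Illustrative compute-for-memory trade-off: store ~sqrt(N) checkpoints during
# the forward pass and recompute intermediate activations segment by segment.
import math

def forward_with_checkpoints(x, layers):
    """Run a chain of layers, keeping only segment-boundary activations."""
    stride = max(1, int(math.sqrt(len(layers))))
    checkpoints = {0: x}                      # layer index -> stored input
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % stride == 0:
            checkpoints[i + 1] = x            # keep ~sqrt(N) activations
    return x, checkpoints, stride

def activation_at(i, layers, checkpoints, stride):
    """Recompute the input of layer i from the nearest earlier checkpoint."""
    start = (i // stride) * stride
    x = checkpoints[start]
    for j in range(start, i):
        x = layers[j](x)                      # extra compute instead of memory
    return x

layers = [lambda v, k=k: v + k for k in range(16)]   # toy "layers"
out, ckpts, stride = forward_with_checkpoints(0, layers)
print(out, "-", len(ckpts), "checkpoints kept instead of", len(layers))
print("recomputed input of layer 10:", activation_at(10, layers, ckpts, stride))
```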
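
Pruning/quantization sketch (slide 34): the Deep Compression pipeline (Han et al.) prunes weights, clusters the survivors into a small codebook, and Huffman-codes the result. The sketch below shows only the first two stages on a toy matrix; the 90% pruning rate and 16-entry codebook are assumptions, retraining and Huffman coding are omitted, and the size estimate ignores the cost of storing the sparsity pattern.

```python
# Illustrative pruning + weight clustering, two stages of Deep Compression.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)     # a toy weight matrix

# 1) Pruning: drop the 90% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(W), 0.9)
mask = np.abs(W) >= threshold
pruned = W * mask

# 2) Quantization: cluster surviving weights into a 16-entry codebook,
#    so each kept weight is stored as a 4-bit index into shared centroids.
surviving = pruned[mask]
codebook = np.quantile(surviving, np.linspace(0, 1, 16))  # crude initialisation
for _ in range(10):                                       # Lloyd iterations
    idx = np.argmin(np.abs(surviving[:, None] - codebook[None, :]), axis=1)
    for k in range(len(codebook)):
        if np.any(idx == k):
            codebook[k] = surviving[idx == k].mean()

dense_bits = W.size * 32
compressed_bits = mask.sum() * 4 + codebook.size * 32     # indices + codebook
print(f"kept {mask.mean():.0%} of weights, "
      f"~{dense_bits / compressed_bits:.0f}x smaller (before Huffman coding, "
      f"ignoring sparse-index storage)")
```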
