SlideShare a Scribd company logo
Hacking GPUs for Deep Learning
MLConf New York
Jeff Johnson
Facebook AI Research
jhj@fb.com
Deep (convolutional) Neural Networks
Revolution in machine learning
Convolution: since 1980s. Deep: flops since 2000s
Avoid feature engineering
▪ With enough data, let network discover feature
representations
▪ Can work even for NLP. No word segmentation, use
raw character data.
2D Convolutional Nets (images)
LeCun, Bottou,
Bengio and Haffner,
1998
Krizhevsky,
Sutskever and
Hinton, 2012
2D Convolutional Nets
Progress towards smaller kernels and deeper nets
Network architecture ImageNet 1000 class top-5
error
AlexNet ~15%
OverFeat ~13%
ZeilerNet ~11%
Oxford-VGG ~7%
GoogLeNet ~6%, ~4.5%
PReLU (MSR) ~4.9%
Human performance 3-5%
3D Convolutional Nets (videos)
C3D (Tran et al., 2014)
DeepVideo (Karpathy et al., 2014)
1D Convolutional Nets (text, sequences)
Collobert et al., 2011
Zhang and LeCun, 2015
RNNs and LSTMs (text, sequences)
Graves, Mohamed and Hinton, 2013
Mikolov, 2014
Deep Neural Networks
Supervised learning. Unsupervised ???
Train with back-propagation/SGD variants
Strong scaling is unsolved
▪ Distributed parameter space exploration (e.g.,
Hogwild!; Niu et al. 2011)
▪ Distributed hyperparameter space exploration
(e.g., Bayesian optimization; Snoek et al. 2012)
Characteristics
Deep nets are flop eaters
Convolutions are expensive
Pointwise calcuations (log/exp, ReLU, */+, ...)
Neighborhood reductions (pooling, convolution)
Scaling network parameters
 increased learning capacity; overfitting
 more training data (real or synthetic),
regularization required
Deep nets are bandwidth eaters
More parameters = more memory, data to exchange
Barrier to cross-machine parallelism
▪ periodic exchanges, compression, quantization
Increase reuse of memory while local?
▪ interspersed reductions are resistant to fusion of
computations
▪ generalized programming language problem
Deep nets are latency sensitive
Serial dependency of training
fprop => bprop => fprop => ...
Serial dependency of multi-layer networks
layer 1 => layer 2 => layer 3 => ...
Multiple path dependent networks (RNNs, multi-layer
LSTMs)
Deep nets are also small?
Deeper = smaller feature planes, more of them
input Rm => expand to Rn => non-lin => reduce to
Rk
Problems are tiny in HPC terms
4096×4096 FFT, FE/PDE on massive grids, ...
NLP tasks can be sparse
Setup/kernel launch latency on GPU can dominate
compute
The tools
Vector processors
SIMD: Single Instruction,
Multiple Data
Serial processor with ability to
operate on more than one
piece of data concurrently
Cray-1 (1976)
Vector processors
Hard to use: instructions only operate on 4, 8, 16, ...
pieces of data at a time. Boundary/alignment
effects. Great if your vectors are large, but...
float* a = ...; // is this aligned (a % 16 == 0)?
float* b = ...; // is this aligned (b % 16 == 0)?
for (i = 0; i < 18; ++i) { // how to handle [16, 17]?
b[i] += a[i]; // SIMD this?!? masking/loop epilogue
}
“Vector cores”?
SIMD variant: NVIDIA calls
“SIMT”
Lots of simple cores (CM)
Hide latency through many
threads + switching (Tera)
“Pixel/vertex shaders” in 2000s
GPUs => GPGPU
CM-1 (1983)
Tera MTA (1995)
GPU versus CPU
GPUs represent a different form of vector
programming (“vector cores”)
▪ 32-wide vector of threads (“warp”)
Sufficiently optimized CPU code can be on par with
GPU perf (Tflop range with AVX2/512, exploit multi-
level caches, deep pipelines, prefetch, ...)
Vector programming: easier with GPUs than CPUs
Sweetspot is different from GPU codes
Parallelization + vectorization
Serial nature of commonly used CPU programming
languages sometimes hides opportunities
Auto-vectorizing/parallelizing compilers + DSLs can’t
yet compete with expert hand-rolled
▪ DSLs like Halide (Ragan-Kelley et al. 2013) show
promise but need a few more generations
Sprinkle in (OpenMP) doesn’t cut it
Who wins
CPU GPU
flops ✔
(vectorize: AVX2/512 gives
Tflop range)
✔
Tesla K40: 2880 fp32 ALU
pipelines
main memory b/w ✖
(Xeon Phi improves)
✔
latency ✔
(high clock, reordering;
caches are large and work if
you obey them)
✖
(threads slow, non-smem
caches irrelevant, CPU ->
GPU control overhead)
boundary effects,
small/irregular sizes
✔✖
(branches easy, vectorization
hard)
✖
(warp divergence, load
imbalance)
parallel programming model ✖
(vectorization hard, perf
black box)
✔✖
(CUDA is very different,
domain knowledge)
Tool + problem = solution?
Dive into 2D Convolutional Nets
Somewhat computationally expensive
O(b × f × f’ × n2 × k2)
1st layer AlexNet:
▪ 13.493 Gflop (1 flop here = fp32 multiply-add)
▪ 77.2 Mbyte in, 63.7 Mbyte out (fp32)
▪ Perfect caching + reuse, 175 flop/byte in
▪ No caching + reuse, 0.125 flop/byte in
The problem
Programmable caches (shared memory, registers, ...)
not large enough for perfect reuse
Space of all possible square 2D convolution
problems is 5/6-dimensional
Parameter Size
minibatch size (b) 128
input feature maps (f) 3
output feature maps (f’) 96
input feature size (n x n) 224
convolution kernel size (k x k) 11
convolution kernel stride (SxS) (optional) 4
Converting
Space of all possible matrix multiplications = 3
dimensional (ANxMBMxP = CNxP)
NVIDIA, Intel, others have put lots of effort into
optimizing many parts of this space
▪ Rephrase convolution as a matrix multiplication!
▪ NVIDIA’s cuDNN
But:
Sgemm originally optimized for large problems
13x13 * 3x3 is a small convolution. Unrolling it 192
times it might be enough to feed GPU
Large convolutions are intractable?
Small feature maps/
convolutions = boundary
effects bad for GPUs
Facebook AI Research work
2D convolution via FFT
Fast convolutional nets with fbfft: A GPU Performance
Evaluation (Vasilache, Johnson et al., 2015 ICLR
conference track oral)
Convolution => pointwise × in Fourier basis
Choice of basis is wide open! 2i is great perf
O(b f f’ n2 k2) => O(b f f’ n2 + (b f + f f’ + bf’) n2 log n)
▪ >= 5x5 kernels, faster than cuDNN
fbfft
cuFFT optimized for large FFT sizes
fbfft: smaller data, fit in registers, focus on warp
Data layout
Different problem sizes => different data layout
▪ cudaconv: DHWB (optimal for large b)
▪ deeper layers: HWBD/BHWD (many feature maps)
▪ b=1 faster convergence?
▪ b=128 better compute utilization
Smaller problems, exploit different layout/batching
▪ fbcunn 1D convolution
Latency hiding: what holds you back?
▪ Compute bound? (math)
▪ Memory b/w bound? (streaming)
▪ Memory latency bound? (sparse)
Almost all “deep learning” algorithms are b/w bound
on GPU. Low math intensity!
cuBLAS: Sgemm b/w bound. Dgemm compute
bound
Kernel fusion: CPU vs GPU
Reduces memory b/w pressure
Exploits cache locality and register reuse
CPU: fusion not necessary
Kernel tiling + interleaving works due to caches
GPU: fusion necessary
Tiling + interleaving doesn’t work: smem not
persistent, caches too small/irrelevant
Kernel fusion
CUDA kernel = hard optimization boundary on GPU
Loop interchange, lifting, better fusion on CPU
CUDA: parallelization layer not visible to optimizer.
Auto-tuning desired. HW specific non-linear tradeoffs
Scripting languages are further barrier to fusion on
both CPU and GPU (Torch)
Kernel fusion
Torch: transposition is common operation
▪ size (80, 40) stride (40, 1) => size (40, 80) stride
(1, 40)
▪ Old approach: transpose in memory, perform work,
copy back
▪ New approach: rewrite kernel to handle
transpositions. Optimize if non-transposed
Runtime fusion (CUDA 7.0, Theano)
Exploiting parallelism
end

More Related Content

What's hot

Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
Sang Jun Lee
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
Hye-min Ahn
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
Taegyun Jeon
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)
RCCSRENKEI
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
Grigory Sapunov
 
Naist2015 dec ver1
Naist2015 dec ver1Naist2015 dec ver1
Naist2015 dec ver1
Hiroki Nakahara
 
LSTM
LSTMLSTM
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Ganesan Narayanasamy
 
Rnn & Lstm
Rnn & LstmRnn & Lstm
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
Emanuel Di Nardo
 
FPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAFPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGA
Hiroki Nakahara
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for Finance
Ben Ball
 
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Universitat Politècnica de Catalunya
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
Brandon Liu
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
Tyrone Systems
 
Parallel computation
Parallel computationParallel computation
Parallel computation
Jayanti Prasad Ph.D.
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
Seiya Tokui
 
SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...
Kazuki Fujikawa
 
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
Hiroki Nakahara
 

What's hot (20)

Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
SPAA11
SPAA11SPAA11
SPAA11
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
 
Naist2015 dec ver1
Naist2015 dec ver1Naist2015 dec ver1
Naist2015 dec ver1
 
LSTM
LSTMLSTM
LSTM
 
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
 
Rnn & Lstm
Rnn & LstmRnn & Lstm
Rnn & Lstm
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
FPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAFPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGA
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for Finance
 
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
Parallel computation
Parallel computationParallel computation
Parallel computation
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
 
SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...
 
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
 

Viewers also liked

Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Universitat Politècnica de Catalunya
 
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
MLconf
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
MLconf
 
Quoc le, slides MLconf 11/15/13
Quoc le, slides  MLconf 11/15/13Quoc le, slides  MLconf 11/15/13
Quoc le, slides MLconf 11/15/13MLconf
 
Visualizing inference
Visualizing inferenceVisualizing inference
Visualizing inference
Alex Morrise
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
Jeremy Nixon
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for Graphs
Mathias Niepert
 
Convolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classificationConvolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classification
Yunchao He
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
WithTheBest
 
Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural Networks
Machine Learning Prague
 
The basics and design of lua table
The basics and design of lua tableThe basics and design of lua table
The basics and design of lua table
Shuai Yuan
 
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Keunwoo Choi
 
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
MLconf
 
Deep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewDeep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - Overview
Keunwoo Choi
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
Poo Kuan Hoong
 
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Universitat Politècnica de Catalunya
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection Pipeline
Abhinav Dadhich
 
101: Convolutional Neural Networks
101: Convolutional Neural Networks 101: Convolutional Neural Networks
101: Convolutional Neural Networks
Mad Scientists
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural Networks
Mark Chang
 
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Vincenzo Lomonaco
 

Viewers also liked (20)

Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
 
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
Quoc le, slides MLconf 11/15/13
Quoc le, slides  MLconf 11/15/13Quoc le, slides  MLconf 11/15/13
Quoc le, slides MLconf 11/15/13
 
Visualizing inference
Visualizing inferenceVisualizing inference
Visualizing inference
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for Graphs
 
Convolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classificationConvolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classification
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural Networks
 
The basics and design of lua table
The basics and design of lua tableThe basics and design of lua table
The basics and design of lua table
 
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
 
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
 
Deep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewDeep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - Overview
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection Pipeline
 
101: Convolutional Neural Networks
101: Convolutional Neural Networks 101: Convolutional Neural Networks
101: Convolutional Neural Networks
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural Networks
 
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
 

Similar to Jeff Johnson, Research Engineer, Facebook at MLconf NYC

Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wall
ugur candan
 
Trip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningTrip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine Learning
Renaldas Zioma
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
Frédéric Parienté
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Johan Andersson
 
Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT
Ceph Community
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
Vipin Varghese
 
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
Haim Yadid
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
Ferdinand Jamitzky
 
Inferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on SparkInferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on Spark
DataWorks Summit/Hadoop Summit
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
Jayanti Prasad Ph.D.
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
Amer Ather
 
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Red Hat Developers
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Slide_N
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
Sagar Dolas
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
RichardWarburton
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
Haris456
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
samthemonad
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
ShapeBlue
 

Similar to Jeff Johnson, Research Engineer, Facebook at MLconf NYC (20)

Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wall
 
Trip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningTrip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine Learning
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
Ndp Slides
Ndp SlidesNdp Slides
Ndp Slides
 
Inferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on SparkInferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on Spark
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
 
Xen in Linux (aka PVOPS update)
Xen in Linux (aka PVOPS update)Xen in Linux (aka PVOPS update)
Xen in Linux (aka PVOPS update)
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
MLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
MLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
MLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
MLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
MLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
MLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
MLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
MLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
MLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 

Recently uploaded (20)

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 

Jeff Johnson, Research Engineer, Facebook at MLconf NYC

  • 1. Hacking GPUs for Deep Learning MLConf New York Jeff Johnson Facebook AI Research jhj@fb.com
  • 2. Deep (convolutional) Neural Networks Revolution in machine learning Convolution: since 1980s. Deep: flops since 2000s Avoid feature engineering ▪ With enough data, let network discover feature representations ▪ Can work even for NLP. No word segmentation, use raw character data.
  • 3. 2D Convolutional Nets (images) LeCun, Bottou, Bengio and Haffner, 1998 Krizhevsky, Sutskever and Hinton, 2012
  • 4. 2D Convolutional Nets Progress towards smaller kernels and deeper nets Network architecture ImageNet 1000 class top-5 error AlexNet ~15% OverFeat ~13% ZeilerNet ~11% Oxford-VGG ~7% GoogLeNet ~6%, ~4.5% PReLU (MSR) ~4.9% Human performance 3-5%
  • 5. 3D Convolutional Nets (videos) C3D (Tran et al., 2014) DeepVideo (Karpathy et al., 2014)
  • 6. 1D Convolutional Nets (text, sequences) Collobert et al., 2011 Zhang and LeCun, 2015
  • 7. RNNs and LSTMs (text, sequences) Graves, Mohamed and Hinton, 2013 Mikolov, 2014
  • 8. Deep Neural Networks Supervised learning. Unsupervised ??? Train with back-propagation/SGD variants Strong scaling is unsolved ▪ Distributed parameter space exploration (e.g., Hogwild!; Niu et al. 2011) ▪ Distributed hyperparameter space exploration (e.g., Bayesian optimization; Snoek et al. 2012)
  • 10. Deep nets are flop eaters Convolutions are expensive Pointwise calcuations (log/exp, ReLU, */+, ...) Neighborhood reductions (pooling, convolution) Scaling network parameters  increased learning capacity; overfitting  more training data (real or synthetic), regularization required
  • 11. Deep nets are bandwidth eaters More parameters = more memory, data to exchange Barrier to cross-machine parallelism ▪ periodic exchanges, compression, quantization Increase reuse of memory while local? ▪ interspersed reductions are resistant to fusion of computations ▪ generalized programming language problem
  • 12. Deep nets are latency sensitive Serial dependency of training fprop => bprop => fprop => ... Serial dependency of multi-layer networks layer 1 => layer 2 => layer 3 => ... Multiple path dependent networks (RNNs, multi-layer LSTMs)
  • 13. Deep nets are also small? Deeper = smaller feature planes, more of them input Rm => expand to Rn => non-lin => reduce to Rk Problems are tiny in HPC terms 4096×4096 FFT, FE/PDE on massive grids, ... NLP tasks can be sparse Setup/kernel launch latency on GPU can dominate compute
  • 15. Vector processors SIMD: Single Instruction, Multiple Data Serial processor with ability to operate on more than one piece of data concurrently Cray-1 (1976)
  • 16. Vector processors Hard to use: instructions only operate on 4, 8, 16, ... pieces of data at a time. Boundary/alignment effects. Great if your vectors are large, but... float* a = ...; // is this aligned (a % 16 == 0)? float* b = ...; // is this aligned (b % 16 == 0)? for (i = 0; i < 18; ++i) { // how to handle [16, 17]? b[i] += a[i]; // SIMD this?!? masking/loop epilogue }
  • 17. “Vector cores”? SIMD variant: NVIDIA calls “SIMT” Lots of simple cores (CM) Hide latency through many threads + switching (Tera) “Pixel/vertex shaders” in 2000s GPUs => GPGPU CM-1 (1983) Tera MTA (1995)
  • 18. GPU versus CPU GPUs represent a different form of vector programming (“vector cores”) ▪ 32-wide vector of threads (“warp”) Sufficiently optimized CPU code can be on par with GPU perf (Tflop range with AVX2/512, exploit multi- level caches, deep pipelines, prefetch, ...) Vector programming: easier with GPUs than CPUs Sweetspot is different from GPU codes
  • 19. Parallelization + vectorization Serial nature of commonly used CPU programming languages sometimes hides opportunities Auto-vectorizing/parallelizing compilers + DSLs can’t yet compete with expert hand-rolled ▪ DSLs like Halide (Ragan-Kelley et al. 2013) show promise but need a few more generations Sprinkle in (OpenMP) doesn’t cut it
  • 20. Who wins CPU GPU flops ✔ (vectorize: AVX2/512 gives Tflop range) ✔ Tesla K40: 2880 fp32 ALU pipelines main memory b/w ✖ (Xeon Phi improves) ✔ latency ✔ (high clock, reordering; caches are large and work if you obey them) ✖ (threads slow, non-smem caches irrelevant, CPU -> GPU control overhead) boundary effects, small/irregular sizes ✔✖ (branches easy, vectorization hard) ✖ (warp divergence, load imbalance) parallel programming model ✖ (vectorization hard, perf black box) ✔✖ (CUDA is very different, domain knowledge)
  • 21. Tool + problem = solution?
  • 22. Dive into 2D Convolutional Nets Somewhat computationally expensive O(b × f × f’ × n2 × k2) 1st layer AlexNet: ▪ 13.493 Gflop (1 flop here = fp32 multiply-add) ▪ 77.2 Mbyte in, 63.7 Mbyte out (fp32) ▪ Perfect caching + reuse, 175 flop/byte in ▪ No caching + reuse, 0.125 flop/byte in
  • 23. The problem Programmable caches (shared memory, registers, ...) not large enough for perfect reuse Space of all possible square 2D convolution problems is 5/6-dimensional Parameter Size minibatch size (b) 128 input feature maps (f) 3 output feature maps (f’) 96 input feature size (n x n) 224 convolution kernel size (k x k) 11 convolution kernel stride (SxS) (optional) 4
  • 24. Converting Space of all possible matrix multiplications = 3 dimensional (ANxMBMxP = CNxP) NVIDIA, Intel, others have put lots of effort into optimizing many parts of this space ▪ Rephrase convolution as a matrix multiplication! ▪ NVIDIA’s cuDNN
  • 25. But: Sgemm originally optimized for large problems 13x13 * 3x3 is a small convolution. Unrolling it 192 times it might be enough to feed GPU Large convolutions are intractable? Small feature maps/ convolutions = boundary effects bad for GPUs
  • 26. Facebook AI Research work 2D convolution via FFT Fast convolutional nets with fbfft: A GPU Performance Evaluation (Vasilache, Johnson et al., 2015 ICLR conference track oral) Convolution => pointwise × in Fourier basis Choice of basis is wide open! 2i is great perf O(b f f’ n2 k2) => O(b f f’ n2 + (b f + f f’ + bf’) n2 log n) ▪ >= 5x5 kernels, faster than cuDNN
  • 27. fbfft cuFFT optimized for large FFT sizes fbfft: smaller data, fit in registers, focus on warp
  • 28. Data layout Different problem sizes => different data layout ▪ cudaconv: DHWB (optimal for large b) ▪ deeper layers: HWBD/BHWD (many feature maps) ▪ b=1 faster convergence? ▪ b=128 better compute utilization Smaller problems, exploit different layout/batching ▪ fbcunn 1D convolution
  • 29. Latency hiding: what holds you back? ▪ Compute bound? (math) ▪ Memory b/w bound? (streaming) ▪ Memory latency bound? (sparse) Almost all “deep learning” algorithms are b/w bound on GPU. Low math intensity! cuBLAS: Sgemm b/w bound. Dgemm compute bound
  • 30. Kernel fusion: CPU vs GPU Reduces memory b/w pressure Exploits cache locality and register reuse CPU: fusion not necessary Kernel tiling + interleaving works due to caches GPU: fusion necessary Tiling + interleaving doesn’t work: smem not persistent, caches too small/irrelevant
  • 31. Kernel fusion CUDA kernel = hard optimization boundary on GPU Loop interchange, lifting, better fusion on CPU CUDA: parallelization layer not visible to optimizer. Auto-tuning desired. HW specific non-linear tradeoffs Scripting languages are further barrier to fusion on both CPU and GPU (Torch)
  • 32. Kernel fusion Torch: transposition is common operation ▪ size (80, 40) stride (40, 1) => size (40, 80) stride (1, 40) ▪ Old approach: transpose in memory, perform work, copy back ▪ New approach: rewrite kernel to handle transpositions. Optimize if non-transposed Runtime fusion (CUDA 7.0, Theano)
  • 34. end

Editor's Notes

  1. Let’s go over the components of a shard.
  2. Let’s go over the components of a shard.
  3. GPUs excel at stream processing; CPUs excel at sparsity, branching, pointer chasing (strong word)
  4. Let’s go over the components of a shard.
  5. Let’s go over the components of a shard.