SlideShare a Scribd company logo
1 of 22
Download to read offline
Copyright © 2015 Altera 1
Dr. Deshanand Singh, Director of Software Engineering
12 May 2015
Efficient Implementation of
Convolutional Neural Networks using
OpenCL on FPGAs
Copyright © 2015 Altera 2
• Convolutional Neural Network
• Feed forward network Inspired by biological processes
• Composed of different layers
• More layers increases accuracy
• Convolutional Layer
• Extract different features from input
• Low level features
• e.g. edges, lines corners
• Pooling Layer
• Reduce variance
• Invariant to small translations
• Max or average value
• Feature over region in the image
• Applications
• Classification & Detection, Image recognition/tagging
Convolutional Neural Network (CNN)
Prof. Hinton’s CNN Algorithm
Copyright © 2015 Altera 3
Image
Pooling
Convolution
Basic Building Block of the CNN
Source: Li Deng
Deep Learning Technology Center, Microsoft
Research, Redmond, WA. USA
A Tutorial at International Workshop on
Mathematical Issues in Information Sciences
Copyright © 2015 Altera 4
• Dataflow through the CNN can proceed in pipelined fashion
• No need to wait until the entire execution is complete
• Can start a new set of data going to stage 1 as soon as the stage
complete its execution
Key Observation: Pipelining
Layer 1:
Convolving Kernel
Layer 1:
Pooling Kernel
Layer 2:
Convolving Kernel
Copyright © 2015 Altera 5
FPGAs : Compute fabrics that support pipelining
1-bit configurable
operation
Configured to perform any
1-bit operation:
AND, OR, NOT, ADD, SUB
Basic Element
1-bit register
(store result)
Basic Elements are
surrounded with a
flexible interconnect
…
16-bit add
Your custom 64-bit
bit-shuffle and encode
32-bit sqrt
Memory
Block
20 Kb
addr
data_in
data_out
Can be configured and
grouped using the
interconnect to create
various cache architectures
data_in
Dedicated floating point
multiply and add blocks
data_out
Blocks are connected into
a custom data-path that
matches your application.
Copyright © 2015 Altera 6
Programming FPGAs: SDK for OpenCL
C compiler
OpenCL
Host
Program
Altera Kernel
Compiler
OpenCL
Kernels
Host
Binary
Device
Binary
Altera OpenCL
host library
Users write
software
FPGA
specific
compilatio
n
target
Copyright © 2015 Altera 7
Accelerator
LocalMem
GlobalMem
LocalMemLocalMemLocalMem
Accelerato
r
Accelerato
r
Accelerato
r
Processor
Accelerator
LocalMem
GlobalMem
LocalMemLocalMemLocalMem
AcceleratorAcceleratorAcceleratorProcessor
• Host + Accelerator Programming Model
• Sequential Host program on microprocessor
• Function offload onto a highly parallel accelerator device
OpenCL Programming Model
Host Accelerator
LocalMem
GlobalMem
LocalMemLocalMemLocalMem
Accelerato
r
Accelerato
r
Accelerato
r
Processor
__kernel void
sum(__global float *a,
__global float *b,
__global float *y)
{
int gid = get_global_id(0);
y[gid] = a[gid] + b[gid];
}
main() {
read_data( … );
maninpulate( … );
clEnqueueWriteBuffer( … );
clEnqueueNDRange(…,sum,…);
clEnqueueReadBuffer( … );
display_result( … );
}
Copyright © 2015 Altera 8
• Data-parallel function
• Defines many parallel threads of
execution
• Each thread has an identifier
specified by “get_global_id”
• Contains keyword extensions to
specify parallelism and memory
hierarchy
• Executed by compute object
• CPU
• GPU
• FPGA
• DSP
• Other Accelerators
OpenCL Kernels
__kernel void
sum(__global const float *a,
__global const float *b,
__global float *answer)
{
int xid = get_global_id(0);
answer[xid] = a[xid] + b[xid];
}
float *a =
float *b =
float *answer =
__kernel void sum( … );
0 1 2 3 4 5 6 7
7 6 5 4 3 2 1 0
7 7 7 7 7 7 7 7
Copyright © 2015 Altera 9
• On each cycle the portions of the pipeline
are processing different threads
• While thread 2 is being loaded, thread 1
is being added, and thread 0 is being
stored
Dataflow / Pipeline Architecture from OpenCL
Load Load
Store
0 1 2 3 4 5 6 7
8 threads for vector add example
Thread IDs
+
Load Load
Store
0
1 2 3 4 5 6 7
8 threads for vector add example
Thread IDs
+
Load Load
Store
0
1
2 3 4 5 6 7
8 threads for vector add example
Thread IDs
+
Load Load
Store
1
2
3 4 5 6 7
8 threads for vector add example
Thread IDs
+
0
Load Load
Store
2
3
4 5 6 7
8 threads for vector add example
Thread IDs
+
0
1
Copyright © 2015 Altera 10
Convolutions: Our Basic Building Block
Inew 𝑥 𝑦 = Iold
1
𝑦′=−1
1
𝑥′=−1
𝑥 + 𝑥′ 𝑦 + 𝑦′ × F 𝑥′ 𝑦′
Copyright © 2015 Altera 11
Main
Memory
Cache
• A cache can hide poor memory access patterns
Processor (CPU/GPU) Implementation
for(int y=1; y<height-1; ++y) {
for(int x=1; x<width-1; ++x) {
for(int y2=-1; y2<1; ++y2) {
for(int x2=-1; x2<1; ++x2) {
i2[y][x] += i[y+y2][x+x2]
* filter[y2][x2];
CPU
Copyright © 2015 Altera 12
• Example Performance Point: 1 pixel per cycle
• Cache requirements: 9 reads + 1 write per cycle
• Expensive hardware!
• Power overhead
• Cost overhead: More built in addressing flexibility than we need
• Why not customize the cache for the application?
FPGA Implementation ?
Cache
Custom
Data-path
9 read ports!
Memory
Copyright © 2015 Altera 13
Optimizing the “Cache”
• Start out with the initial picture that is W pixels wide
w
• Let’s remove all the lines that aren’t in the neighborhood of the
window
ww
• Take all of the lines and arrange them as a 1D array of pixels
w
w
w
ww
• The pixels at the edges that we don’t need for the computation
w
w
w
ww
w
w
w
ww
w
• What happens when we move the window one pixel to the right?
w
w
w
ww
w
• What happens when we move the window one pixel to the right?
Copyright © 2015 Altera 14
Optimizing the “Cache”
ww
w
Copyright © 2015 Altera 15
data_out[9]
• Managing data movement
to match the FPGA’s
architectural strengths is
key to obtaining high
performance.
Shift Registers in Software
pixel_t sr[2*W+3];
while(keep_going) {
// Shift data in
#pragma unroll
for(int i=1; i<2*W+3; ++i)
sr[i] = sr[i-1]
sr[0] = data_in;
// Tap output data
data_out = {sr[ 0], sr[ 1], sr[ 2],
sr[ w], sr[ w+1], sr[ w+2]
sr[2*w], sr[2*w+1], sr[2*w+2]}
// ...
}
ww
data_in
sr[0]sr[2*W+2]
Copyright © 2015 Altera 16
• CIFAR-10 Dataset
• 60000 32x32 color images
• 10 classes
• 6000 images per class
• 50000 training images
• 10000 test images
• Many CNN implementations are available for CIFAR-10
• Cuda-convnet provides a baseline implementation that many
works build upon
Building a CNN for CIFAR-10 on an FPGA
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000
images per class. There are 50000 training images and 10000 test images.
Here are the classes in the dataset, as well as 10 random images from each:
airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck
Max-Pooling
Fully Connected
2-D Convolution (5x5)
On FPGA
Copyright © 2015 Altera 17
• High-latency: Requires access to global memory
• High memory-bandwidth
• Requires host coordination to pass buffers from one kernel to another
CIFAR-10 CNN: Traditional OpenCL
Implementation
Conv 1
Kernel
Pool 1
Kernel
Conv 2
Kernel
Global Memory (DDR)
Buffer Buffer Buffer Buffer
Stratix V Implementation :
• Processes 183 images per second
for CIFAR-10
Copyright © 2015 Altera 18
• Low-latency communication between kernels
• Significantly less memory bandwidth requirements
• Host is not involved in coordinating communication between
kernels
Global Memory (DDR)
CIFAR-10 CNN: Kernel-to-Kernel Channels
Buffer Buffer
Channels
Conv 1
Kernel
Pool 1
Kernel
Conv N
Kernel
Stratix V Implementation :
• Processes 400 images per second
for CIFAR-10
• Channel declaration:
• Create a queue:
value_type channel();
• Channel write:
• Push data into the queue:
void write_channel_altera(channel &ch, value_type data);
• Channel read:
• Pop the first element from the queue
value_type read_channel_altera(channel &ch);
channel int my_channel;
write_channel_altera(my_channel, x);
int y = read_channel_altera(my_channel);
Copyright © 2015 Altera 19
• Entire algorithm can be expressed in ~500
lines of OpenCL for the FPGA
• Kernels are written as standard building
blocks that are connected together
through channels
• The shift register convolution building
block is the portion of this code that is
used most heavily
• The concept of having multiple
concurrent kernels executing
simultaneously and communicating
directly on a device is currently unique
to FPGAs
• Will be portable in OpenCL 2.0 through
the concept of “OpenCL Pipes”
CIFAR-10: FPGA Code
#pragma OPENCL_EXTENSION cl_altera_channel
: enable
// Declaration of Channel API data types
channel float prod_conv1_channel;
channel float conv1_pool1_channel;
channel float pool1_conv2_channel;
channel float conv2_pool2_channel;
channel float pool2_conv3_channel;
channel float conv3_pool3_channel;
channel float pool3_cons_channel;
__kernel void
convolutional_neural_network_prod(
int batch_id_begin,
int batch_id_end,
__global const volatile float * restrict
input_global)
{
for(...) {
write_channel_altera(
prod_conv1_channel,
input_global[...]);
write_channel_altera(
prod_pool3_channel,
input_global[...]);
}
}
Copyright © 2015 Altera 20
• Altera’s Arria-10 family of FPGA introduces DSP blocks with a dedicated
floating point mode.
• Each DSP includes a IEEE 754 single precision floating-point multiplier
and adder
• FPGA logic blocks (lookup tables and registers) are no longer needed
to implement floating point functions
• Massive resource savings allows many more processing pipelines to
run simultaneously on the FPGA
Effect of Floating Point FPGAs
Arria-10 Implementation :
• Processes 6800 images per
second for CIFAR-10
Copyright © 2015 Altera 21
• CNN implementations are well suited to pipelined implementations
• Exploiting pipelining on the FPGA requires some attention to coding
style to overcome the inherent assumptions of the writing “software”
• FPGAs do not have caches
• Need to exploit data reuse in a more explicit way
• The concept of dataflow pipelining will not realize its full potential if
we write intermediate results to memory
• Bandwidth limitations begin to dominate compute
• Use direct kernel to kernel communication called channels
• Native support for floating point on the FPGA allows order of magnitude
performance increase
Lessons Learned
Copyright © 2015 Altera 22
• Altera’s SDK for OpenCL
• http://www.altera.com/products/software/opencl/opencl-
index.html
• FPGA Optimized OpenCL examples (filters, convolutions, …)
• http://www.altera.com/support/examples/opencl/opencl.html
• CIFAR-10 dataset
• http://www.cs.toronto.edu/~kriz/cifar.html
• CUDA-Convnet
• https://code.google.com/p/cuda-convnet/
Resources

More Related Content

What's hot

Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Intel® Software
 
Deep Learning Accelerator Design Techniques
Deep Learning Accelerator Design TechniquesDeep Learning Accelerator Design Techniques
Deep Learning Accelerator Design TechniquesMindos Cheng
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA Taiwan
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...Edge AI and Vision Alliance
 
Serving BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeServing BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeNidhin Pattaniyil
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen..."Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...Edge AI and Vision Alliance
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...Hiroki Nakahara
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingDESMOND YUEN
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceIntel Nervana
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...Edge AI and Vision Alliance
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digitsNVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digitsNVIDIA Taiwan
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 

What's hot (20)

Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
Deep Learning Accelerator Design Techniques
Deep Learning Accelerator Design TechniquesDeep Learning Accelerator Design Techniques
Deep Learning Accelerator Design Techniques
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
 
Serving BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeServing BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServe
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen..."Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
 
FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...FPT17: An object detector based on multiscale sliding window search using a f...
FPT17: An object detector based on multiscale sliding window search using a f...
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligence
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digitsNVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 

Viewers also liked

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationclCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationIntel® Software
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomicsUSC
 
OpenCL Programming 101
OpenCL Programming 101OpenCL Programming 101
OpenCL Programming 101Yoss Cohen
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...GeeksLab Odessa
 
FPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAFPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAHiroki Nakahara
 
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv..."Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...Edge AI and Vision Alliance
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and ClusteringYogendra Tamang
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep LearningBrahim HAMADICHAREF
 
FPGA Architecture Presentation
FPGA Architecture PresentationFPGA Architecture Presentation
FPGA Architecture Presentationomutukuda
 
Unsupervised Classification of Images: A Review
Unsupervised Classification of Images: A ReviewUnsupervised Classification of Images: A Review
Unsupervised Classification of Images: A ReviewCSCJournals
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a functionTaisuke Oe
 

Viewers also liked (15)

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationclCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomics
 
OpenCL Programming 101
OpenCL Programming 101OpenCL Programming 101
OpenCL Programming 101
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
 
FPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAFPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGA
 
MaPU-HPCA2016
MaPU-HPCA2016MaPU-HPCA2016
MaPU-HPCA2016
 
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv..."Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
CIFAR-10
CIFAR-10CIFAR-10
CIFAR-10
 
FPGA Architecture Presentation
FPGA Architecture PresentationFPGA Architecture Presentation
FPGA Architecture Presentation
 
Unsupervised Classification of Images: A Review
Unsupervised Classification of Images: A ReviewUnsupervised Classification of Images: A Review
Unsupervised Classification of Images: A Review
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a function
 

Similar to "Efficient Implementation of Convolutional Neural Networks using OpenCL on FPGAs," a Presentation From Altera

Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™Yury Gorbachev
 
Using VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersUsing VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersMichelle Holley
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)Kirill Tsym
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingKernel TLV
 
MattsonTutorialSC14.pptx
MattsonTutorialSC14.pptxMattsonTutorialSC14.pptx
MattsonTutorialSC14.pptxgopikahari7
 
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxVivek Kumar
 
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...Christopher Diamantopoulos
 
Intel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-finalIntel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-finalDeepak Mane
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesDr. Fabio Baruffa
 
G rpc talk with intel (3)
G rpc talk with intel (3)G rpc talk with intel (3)
G rpc talk with intel (3)Intel
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesLINE Corporation
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Community
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Michelle Holley
 
Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018
Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018
Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018Chaitali Sengupta
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationHiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationVEDLIoT Project
 

Similar to "Efficient Implementation of Convolutional Neural Networks using OpenCL on FPGAs," a Presentation From Altera (20)

Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
 
Using VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersUsing VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear Containers
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 
MattsonTutorialSC14.pptx
MattsonTutorialSC14.pptxMattsonTutorialSC14.pptx
MattsonTutorialSC14.pptx
 
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
 
Onnc intro
Onnc introOnnc intro
Onnc intro
 
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
 
Intel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-finalIntel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-final
 
GR740 User day
GR740 User dayGR740 User day
GR740 User day
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
MattsonTutorialSC14.pdf
MattsonTutorialSC14.pdfMattsonTutorialSC14.pdf
MattsonTutorialSC14.pdf
 
G rpc talk with intel (3)
G rpc talk with intel (3)G rpc talk with intel (3)
G rpc talk with intel (3)
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 
Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018
Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018
Qualcomm Centriq Arm-based Servers for Edge Computing at ONS 2018
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationHiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesSanjay Willie
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

"Efficient Implementation of Convolutional Neural Networks using OpenCL on FPGAs," a Presentation From Altera

  • 1. Copyright © 2015 Altera 1 Dr. Deshanand Singh, Director of Software Engineering 12 May 2015 Efficient Implementation of Convolutional Neural Networks using OpenCL on FPGAs
  • 2. Copyright © 2015 Altera 2 • Convolutional Neural Network • Feed forward network Inspired by biological processes • Composed of different layers • More layers increases accuracy • Convolutional Layer • Extract different features from input • Low level features • e.g. edges, lines corners • Pooling Layer • Reduce variance • Invariant to small translations • Max or average value • Feature over region in the image • Applications • Classification & Detection, Image recognition/tagging Convolutional Neural Network (CNN) Prof. Hinton’s CNN Algorithm
  • 3. Copyright © 2015 Altera 3 Image Pooling Convolution Basic Building Block of the CNN Source: Li Deng Deep Learning Technology Center, Microsoft Research, Redmond, WA. USA A Tutorial at International Workshop on Mathematical Issues in Information Sciences
  • 4. Copyright © 2015 Altera 4 • Dataflow through the CNN can proceed in pipelined fashion • No need to wait until the entire execution is complete • Can start a new set of data going to stage 1 as soon as the stage complete its execution Key Observation: Pipelining Layer 1: Convolving Kernel Layer 1: Pooling Kernel Layer 2: Convolving Kernel
  • 5. Copyright © 2015 Altera 5 FPGAs : Compute fabrics that support pipelining 1-bit configurable operation Configured to perform any 1-bit operation: AND, OR, NOT, ADD, SUB Basic Element 1-bit register (store result) Basic Elements are surrounded with a flexible interconnect … 16-bit add Your custom 64-bit bit-shuffle and encode 32-bit sqrt Memory Block 20 Kb addr data_in data_out Can be configured and grouped using the interconnect to create various cache architectures data_in Dedicated floating point multiply and add blocks data_out Blocks are connected into a custom data-path that matches your application.
  • 6. Copyright © 2015 Altera 6 Programming FPGAs: SDK for OpenCL C compiler OpenCL Host Program Altera Kernel Compiler OpenCL Kernels Host Binary Device Binary Altera OpenCL host library Users write software FPGA specific compilatio n target
  • 7. Copyright © 2015 Altera 7 Accelerator LocalMem GlobalMem LocalMemLocalMemLocalMem Accelerato r Accelerato r Accelerato r Processor Accelerator LocalMem GlobalMem LocalMemLocalMemLocalMem AcceleratorAcceleratorAcceleratorProcessor • Host + Accelerator Programming Model • Sequential Host program on microprocessor • Function offload onto a highly parallel accelerator device OpenCL Programming Model Host Accelerator LocalMem GlobalMem LocalMemLocalMemLocalMem Accelerato r Accelerato r Accelerato r Processor __kernel void sum(__global float *a, __global float *b, __global float *y) { int gid = get_global_id(0); y[gid] = a[gid] + b[gid]; } main() { read_data( … ); maninpulate( … ); clEnqueueWriteBuffer( … ); clEnqueueNDRange(…,sum,…); clEnqueueReadBuffer( … ); display_result( … ); }
  • 8. Copyright © 2015 Altera 8 • Data-parallel function • Defines many parallel threads of execution • Each thread has an identifier specified by “get_global_id” • Contains keyword extensions to specify parallelism and memory hierarchy • Executed by compute object • CPU • GPU • FPGA • DSP • Other Accelerators OpenCL Kernels __kernel void sum(__global const float *a, __global const float *b, __global float *answer) { int xid = get_global_id(0); answer[xid] = a[xid] + b[xid]; } float *a = float *b = float *answer = __kernel void sum( … ); 0 1 2 3 4 5 6 7 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
  • 9. Copyright © 2015 Altera 9 • On each cycle the portions of the pipeline are processing different threads • While thread 2 is being loaded, thread 1 is being added, and thread 0 is being stored Dataflow / Pipeline Architecture from OpenCL Load Load Store 0 1 2 3 4 5 6 7 8 threads for vector add example Thread IDs + Load Load Store 0 1 2 3 4 5 6 7 8 threads for vector add example Thread IDs + Load Load Store 0 1 2 3 4 5 6 7 8 threads for vector add example Thread IDs + Load Load Store 1 2 3 4 5 6 7 8 threads for vector add example Thread IDs + 0 Load Load Store 2 3 4 5 6 7 8 threads for vector add example Thread IDs + 0 1
  • 10. Copyright © 2015 Altera 10 Convolutions: Our Basic Building Block Inew 𝑥 𝑦 = Iold 1 𝑦′=−1 1 𝑥′=−1 𝑥 + 𝑥′ 𝑦 + 𝑦′ × F 𝑥′ 𝑦′
  • 11. Copyright © 2015 Altera 11 Main Memory Cache • A cache can hide poor memory access patterns Processor (CPU/GPU) Implementation for(int y=1; y<height-1; ++y) { for(int x=1; x<width-1; ++x) { for(int y2=-1; y2<1; ++y2) { for(int x2=-1; x2<1; ++x2) { i2[y][x] += i[y+y2][x+x2] * filter[y2][x2]; CPU
  • 12. Copyright © 2015 Altera 12 • Example Performance Point: 1 pixel per cycle • Cache requirements: 9 reads + 1 write per cycle • Expensive hardware! • Power overhead • Cost overhead: More built in addressing flexibility than we need • Why not customize the cache for the application? FPGA Implementation ? Cache Custom Data-path 9 read ports! Memory
  • 13. Copyright © 2015 Altera 13 Optimizing the “Cache” • Start out with the initial picture that is W pixels wide w • Let’s remove all the lines that aren’t in the neighborhood of the window ww • Take all of the lines and arrange them as a 1D array of pixels w w w ww • The pixels at the edges that we don’t need for the computation w w w ww w w w ww w • What happens when we move the window one pixel to the right? w w w ww w • What happens when we move the window one pixel to the right?
  • 14. Copyright © 2015 Altera 14 Optimizing the “Cache” ww w
  • 15. Copyright © 2015 Altera 15 data_out[9] • Managing data movement to match the FPGA’s architectural strengths is key to obtaining high performance. Shift Registers in Software pixel_t sr[2*W+3]; while(keep_going) { // Shift data in #pragma unroll for(int i=1; i<2*W+3; ++i) sr[i] = sr[i-1] sr[0] = data_in; // Tap output data data_out = {sr[ 0], sr[ 1], sr[ 2], sr[ w], sr[ w+1], sr[ w+2] sr[2*w], sr[2*w+1], sr[2*w+2]} // ... } ww data_in sr[0]sr[2*W+2]
  • 16. Copyright © 2015 Altera 16 • CIFAR-10 Dataset • 60000 32x32 color images • 10 classes • 6000 images per class • 50000 training images • 10000 test images • Many CNN implementations are available for CIFAR-10 • Cuda-convnet provides a baseline implementation that many works build upon Building a CNN for CIFAR-10 on an FPGA The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Here are the classes in the dataset, as well as 10 random images from each: airplane automobile bird cat deer dog frog horse ship truck Max-Pooling Fully Connected 2-D Convolution (5x5) On FPGA
  • 17. Copyright © 2015 Altera 17 • High-latency: Requires access to global memory • High memory-bandwidth • Requires host coordination to pass buffers from one kernel to another CIFAR-10 CNN: Traditional OpenCL Implementation Conv 1 Kernel Pool 1 Kernel Conv 2 Kernel Global Memory (DDR) Buffer Buffer Buffer Buffer Stratix V Implementation : • Processes 183 images per second for CIFAR-10
  • 18. Copyright © 2015 Altera 18 • Low-latency communication between kernels • Significantly less memory bandwidth requirements • Host is not involved in coordinating communication between kernels Global Memory (DDR) CIFAR-10 CNN: Kernel-to-Kernel Channels Buffer Buffer Channels Conv 1 Kernel Pool 1 Kernel Conv N Kernel Stratix V Implementation : • Processes 400 images per second for CIFAR-10 • Channel declaration: • Create a queue: value_type channel(); • Channel write: • Push data into the queue: void write_channel_altera(channel &ch, value_type data); • Channel read: • Pop the first element from the queue value_type read_channel_altera(channel &ch); channel int my_channel; write_channel_altera(my_channel, x); int y = read_channel_altera(my_channel);
  • 19. Copyright © 2015 Altera 19 • Entire algorithm can be expressed in ~500 lines of OpenCL for the FPGA • Kernels are written as standard building blocks that are connected together through channels • The shift register convolution building block is the portion of this code that is used most heavily • The concept of having multiple concurrent kernels executing simultaneously and communicating directly on a device is currently unique to FPGAs • Will be portable in OpenCL 2.0 through the concept of “OpenCL Pipes” CIFAR-10: FPGA Code #pragma OPENCL_EXTENSION cl_altera_channel : enable // Declaration of Channel API data types channel float prod_conv1_channel; channel float conv1_pool1_channel; channel float pool1_conv2_channel; channel float conv2_pool2_channel; channel float pool2_conv3_channel; channel float conv3_pool3_channel; channel float pool3_cons_channel; __kernel void convolutional_neural_network_prod( int batch_id_begin, int batch_id_end, __global const volatile float * restrict input_global) { for(...) { write_channel_altera( prod_conv1_channel, input_global[...]); write_channel_altera( prod_pool3_channel, input_global[...]); } }
  • 20. Copyright © 2015 Altera 20 • Altera’s Arria-10 family of FPGA introduces DSP blocks with a dedicated floating point mode. • Each DSP includes a IEEE 754 single precision floating-point multiplier and adder • FPGA logic blocks (lookup tables and registers) are no longer needed to implement floating point functions • Massive resource savings allows many more processing pipelines to run simultaneously on the FPGA Effect of Floating Point FPGAs Arria-10 Implementation : • Processes 6800 images per second for CIFAR-10
  • 21. Copyright © 2015 Altera 21 • CNN implementations are well suited to pipelined implementations • Exploiting pipelining on the FPGA requires some attention to coding style to overcome the inherent assumptions of the writing “software” • FPGAs do not have caches • Need to exploit data reuse in a more explicit way • The concept of dataflow pipelining will not realize its full potential if we write intermediate results to memory • Bandwidth limitations begin to dominate compute • Use direct kernel to kernel communication called channels • Native support for floating point on the FPGA allows order of magnitude performance increase Lessons Learned
  • 22. Copyright © 2015 Altera 22 • Altera’s SDK for OpenCL • http://www.altera.com/products/software/opencl/opencl- index.html • FPGA Optimized OpenCL examples (filters, convolutions, …) • http://www.altera.com/support/examples/opencl/opencl.html • CIFAR-10 dataset • http://www.cs.toronto.edu/~kriz/cifar.html • CUDA-Convnet • https://code.google.com/p/cuda-convnet/ Resources