SlideShare a Scribd company logo
1 of 35
Download to read offline
© 2020 Samsung
Improving Power Efficiency for
Edge Inferencing Through
Memory Management
Optimizations
Nathan Levy
Samsung
September 2020
© 2020 Samsung
Outline
• Introduction
• Tiling and forwarding
• Filters management
• Case study of different modern convolutional neural networks
• Conclusion
2
© 2020 Samsung
Counting the I/O in the TOps per Watt KPI
• Power consumption is composed of:
• Compute power: perform the arithmetic computations
• I/O power: bring the data to the compute engine and back
• For convolutional neural networks, I/O power can be higher than compute
• The compiler must optimize the memory management
• Scope: convolutional neural network (CNN) acceleration
• Focus on convolutions
• Assume the network is fixed (and quantized, pruned…)
• Inference only
3
© 2020 Samsung
Description of the working model
What is a convolution?
• Input: feature map and filters
• Apply: multiply and accumulate
• Output: feature map
CNN can be processed by CPU, DSP, GPU, NPU/TPU
• These often have multiple cores
• These have multiple levels of memories (cache and
scratch-pad)
4
Input feature map
Filter
zb
Output feature map
Output activation
Filter
za
Output activation
X
i
Yi
X
o
Yo
© 2020 Samsung
Description of the working model
Simplified toy example:
• Single processing core
• Small local memory with “free”
access and big remote memory
with expensive access
Remote memory contains:
• Input and output feature maps
• Convolution filters
• Intermediate feature maps (if
needed)
5
Remote memory Conv 1
Local memory
Processing
unit
Feature Map 1 Feature Map N
Conv M
Filters
© 2020 Samsung
Typical orders of magnitude for NPU
• The memory management problem
depends on the relative size of:
• The local memory of the processor
• The typical feature map of the
network
• Our working point:
Feature map size ~
10 * Local memory size
6
Local memory size [B]
Feature map size [B]
10k 100k 1M 10M
10k
100k
1M
10M
100M
IoT
Mobile
Automotive
© 2020 Samsung
Tiling and forwarding
© 2020 Samsung
Tiling and Forwarding
How to overcome the small memory size:
• Split the input feature map into smaller
parts → tiles
• Apply the convolution on the input feature
map tile
→ Creates an output feature map tile
The output feature tile can be either:
• Written in remote memory
• Used for the next convolution
→ This is called forwarding
8
Remote memory Conv 1
Local memory
Processing
unit
IFM (not in mem) OFM (not in
mem)
Feature Map 1 Feature Map N
Conv M
Conv i
Input feature
map tile
Output feature
map tile
Filters
© 2020 Samsung
How to tile and forward
How does the compiler tile and forward to minimize power
consumption?
Key optimization parameters:
• Power consumption: bandwidth (I/O) + compute
→ Focus of this talk
• Processing throughput (operations/second)
• Chip area → cost
• System complexity, flexibility and maintenance
9
© 2020 Samsung
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
Tiling and increasing receptive fields
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
For i in [1-12]:
• Load tile i of blue feature map from remote memory
• Run 1st convolution → tile i of green feature map
• Run 2nd convolution → tile i of yellow feature map
• Store tile i of yellow feature map to remote memory
© 2020 Samsung
Tiling and increasing receptive fields
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
For i in [1-12]:
• Load tile i of blue feature map from remote memory
• Run 1st convolution → tile i of green feature map
• Run 2nd convolution → tile i of yellow feature map
• Store tile i of yellow feature map to remote memory
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
© 2020 Samsung
Tiling and increasing receptive fields
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
For i in [1-12]:
• Load tile i of blue feature map from remote memory
• Run 1st convolution → tile i of green feature map
• Run 2nd convolution → tile i of yellow feature map
• Store tile i of yellow feature map to remote memory
© 2020 Samsung
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
Tiling and increasing receptive fields
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
For i in [1-12]:
• Load tile i of blue feature map from remote memory
• Run 1st convolution → tile i of green feature map
• Run 2nd convolution → tile i of yellow feature map
• Store tile i of yellow feature map to remote memory
© 2020 Samsung
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
Tiling and increasing receptive fields
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
For i in [1-12]:
• Load tile i of blue feature map from remote memory
• Run 1st convolution → tile i of green feature map
• Run 2nd convolution → tile i of yellow feature map
• Store tile i of yellow feature map to remote memory
© 2020 Samsung
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
Tiling and increasing receptive fields
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
For i in [1-12]:
• Load tile i of blue feature map from remote memory
• Run 1st convolution → tile i of green feature map
• Run 2nd convolution → tile i of yellow feature map
• Store tile i of yellow feature map to remote memory
© 2020 Samsung
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
Tiling and increasing receptive fields
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
Problem: 3x3 convolution requires margins
© 2020 Samsung
3x3
conv
3x3
conv
From
remote
memory
To
remote
memory
Tiling and increasing receptive fields
2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding)
Solution: Bigger blue and green tiles
Problems:
• Load more data for the first (blue) layer
• Compute more data for the intermediate (green) layer
2 pixels margin 1 pixel margin
© 2020 Samsung
Tiling and increasing receptive fields
Consider a long chain of convolutions. How many stages of forwarding do you want?
• Forwarding tiles through as many layers as possible avoids dumping to remote
memory
• This causes the overlaps to grow
• More data to load for the first layer
• Redundant processing
18
conv conv conv conv
© 2020 Samsung
2 convolutions
# of Convolutions Feature Map
Tiling
Bandwidth Bandwidth per
convolution
Operations per
convolution
Energy per
convolution
2 3x4 +6% -47% +5% -30%
3 3x4 +12% -63% +10% -39%
4 3x4 +17% -71% +16% -43%
5 4x4 +29% -74% +26% -42%
19
All numbers are compared to “no forwarding”
conv conv
To remote
memory
From remote
memory
Load the
overlaps
Compute the
overlaps
Less I/O
More compute
Break the feature map
into 12 to fit in memory
© 2020 Samsung
3 convolutions
# of Convolutions Feature Map
Tiling
Bandwidth Bandwidth per
convolution
Operations per
convolution
Energy per
convolution
2 3x4 +6% -47% +5% -30%
3 3x4 +12% -63% +10% -39%
4 3x4 +17% -71% +16% -43%
5 4x4 +29% -74% +26% -42%
20
All numbers are compared to “no forwarding”
conv conv conv
To remote
memory
From remote
memory
© 2020 Samsung
4 convolutions
# of Convolutions Feature Map
Tiling
Bandwidth Bandwidth per
convolution
Operations per
convolution
Energy per
convolution
2 3x4 +6% -47% +5% -30%
3 3x4 +12% -63% +10% -39%
4 3x4 +17% -71% +16% -43%
5 4x4 +29% -74% +26% -42%
21
All numbers are compared to “no forwarding”
conv conv conv conv
To remote
memory
From remote
memory
© 2020 Samsung
5 convolutions
# of Convolutions Feature Map
Tiling
Bandwidth Bandwidth per
convolution
Operations per
convolution
Energy per
convolution
2 3x4 +6% -47% +5% -30%
3 3x4 +12% -63% +10% -39%
4 3x4 +17% -71% +16% -43%
5 4x4 +29% -74% +26% -42%
22
All numbers are compared to “no forwarding”
conv conv conv conv conv
From remote
memory
To remote
memory
Overlaps are so big we
need smaller tiles to fit
in local memory
© 2020 Samsung
Tiling and increasing receptive fields
23
Minimal energy
Energy
[J]
• The trade-off between I/O and compute
power leads to a U-shaped behavior
• The trade-off requires an optimization
mechanism in the compiler
• In our toy example, all the convolutions
are identical
• In a real CNN, the convolutions are
different, so the optimization problem is
not one-dimensional
Number of convolutions
© 2020 Samsung
Tiling and increasing receptive fields
More constraints to take into account:
• Some hardware will prefer vertical tiling for memory access
• Increases overlap sizes
• Some hardware will impose constraints on tile size (multiple of a given number)
Out-of-scope ways to deal with overlaps:
• Multi-core processing → hardware considerations and system complexity
• Keep overlap data instead of loading/computing again: takes up memory and requires memory
fragmentation management → system complexity
24
Tile 4
Tile 3
Tile 2
Tile 1
Tile 3 Tile 4
Tile 1 Tile 2
© 2020 Samsung
Filters management
© 2020 Samsung
Filters management
Option 1: reload the filters from remote to
local memory for each tile
→ wasted bandwidth
26
Option 2: keep the filters of all convolutions
in local memory while processing all the tiles
→wasted space in the local memory
→ may increase the tiles number
Local memory
Filters for 1
convolution
Reload #tiles
times
Local memory
Filters for
convolution 1
Load once
Filters for
convolution n
© 2020 Samsung
Filters management
• No single strategy is the best all the time
• Choosing the right strategy can save up to
20% energy
• In practice, the problem is even more
complicated due to different convolutions
through the network
• The compiler should take the decision to
keep/reload the filters for each
convolution independently
27
0 100 200 300 400 500
Memory size [kB]
Energy
[J]
Local Memory Size [kB]
© 2020 Samsung
Case study of different modern
convolutional neural networks
28
© 2020 Samsung Huang 2017
Skip connections – residual block
Skip connections were introduced in the ResNet paper as
a way to facilitate back propagation
• Forwarding becomes more challenging: one strives
to keep both main path and skip connection in local
memory
Intensive usage in DenseNet
• Up to 6 skip connections alive simultaneously
29
He 2016
© 2020 Samsung
Skip connections – segmentation networks
U-net adds skip connections to encoder-decoder structure to
improve quality of segmentation
30
Ronneberger 2015.
© 2020 Samsung
Skip connections – segmentation networks
It can be mixed with residual blocks
31
Singla 2019
© 2020 Samsung
Inverted bottleneck
• Inverted bottleneck introduced in MobileNet
family
• Splitting the 3x3 convolution into 3x3 depthwise
convolution + 1x1 convolution reduces the
amount of parameters
• Can more easily be kept in memory and/or
reloaded
• 1x1 convolutions don’t create overlaps
• Skip connection on thinner feature map to
minimize memory impact
32
Sandler 2018
© 2020 Samsung
Conclusion
• Memory management is a complicated optimization problem
• The compiler affects many performance indices, in particular power and runtime
• The tradeoff is not intuitive and requires some evaluation mechanism
How do we do it?
• Map all the performance indices (power, runtime…) to a single cost
• The mapping may depend on the use case
• The compiler is able to analyze its decisions and evaluate the cost
• The compiler optimizes memory management to minimize the cost
33
© 2020 Samsung
Conclusion
What we didn’t cover:
• Pre-fetching for runtime reduction
• Multicore edge devices
• We described fixed hardware and fixed network. We actually co-design:
• Memory size, processing throughput, hardware limitations
• Network design to avoid access to remote memory
• Use of network architecture search
• Quantization and pruning to avoid access to remote memory
• Choice of bit-width and pruning rate per layer
34
© 2020 Samsung
Resources
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE
conference on computer vision and pattern recognition. 2016.
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE
conference on computer vision and pattern recognition. 2017.
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for
biomedical image segmentation." International Conference on Medical image computing and
computer-assisted intervention. Springer, Cham, 2015.
https://github.com/Nishanksingla/UNet-with-ResBlock
Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the
IEEE conference on computer vision and pattern recognition. 2018.
35

More Related Content

What's hot

A new Post-Processing Pipeline
A new Post-Processing PipelineA new Post-Processing Pipeline
A new Post-Processing PipelineWolfgang Engel
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-renderingmistercteam
 
Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...
Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...
Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...NTT Software Innovation Center
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S..."Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...Edge AI and Vision Alliance
 
Comparison between JPEG(DCT) and JPEG 2000(DWT) compression standards
Comparison between JPEG(DCT) and JPEG 2000(DWT) compression standardsComparison between JPEG(DCT) and JPEG 2000(DWT) compression standards
Comparison between JPEG(DCT) and JPEG 2000(DWT) compression standardsRishab2612
 
CS101- Introduction to Computing- Lecture 33
CS101- Introduction to Computing- Lecture 33CS101- Introduction to Computing- Lecture 33
CS101- Introduction to Computing- Lecture 33Bilal Ahmed
 
Advanced Real-time Post-Processing using GPGPU techniques
Advanced Real-time Post-Processing using GPGPU techniquesAdvanced Real-time Post-Processing using GPGPU techniques
Advanced Real-time Post-Processing using GPGPU techniquesJohan Andersson
 
A Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWTA Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWTIJSRD
 
Multimedia image compression standards
Multimedia image compression standardsMultimedia image compression standards
Multimedia image compression standardsMazin Alwaaly
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DLLeapMind Inc
 
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)Takahiro Harada
 
Image compression and it’s security1
Image compression and it’s security1Image compression and it’s security1
Image compression and it’s security1Reyad Hossain
 
Multimedia communication jpeg
Multimedia communication jpegMultimedia communication jpeg
Multimedia communication jpegDr. Kapil Gupta
 

What's hot (20)

A new Post-Processing Pipeline
A new Post-Processing PipelineA new Post-Processing Pipeline
A new Post-Processing Pipeline
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-rendering
 
Image Compression
Image CompressionImage Compression
Image Compression
 
Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...
Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...
Hybrid Computing Platform for Combinatorial Optimization with the Coherent Is...
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Image compression
Image compressionImage compression
Image compression
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S..."Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
 
Comparison between JPEG(DCT) and JPEG 2000(DWT) compression standards
Comparison between JPEG(DCT) and JPEG 2000(DWT) compression standardsComparison between JPEG(DCT) and JPEG 2000(DWT) compression standards
Comparison between JPEG(DCT) and JPEG 2000(DWT) compression standards
 
CS101- Introduction to Computing- Lecture 33
CS101- Introduction to Computing- Lecture 33CS101- Introduction to Computing- Lecture 33
CS101- Introduction to Computing- Lecture 33
 
ARPS Architecture
ARPS ArchitectureARPS Architecture
ARPS Architecture
 
Advanced Real-time Post-Processing using GPGPU techniques
Advanced Real-time Post-Processing using GPGPU techniquesAdvanced Real-time Post-Processing using GPGPU techniques
Advanced Real-time Post-Processing using GPGPU techniques
 
A Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWTA Review on Image Compression using DCT and DWT
A Review on Image Compression using DCT and DWT
 
Multimedia image compression standards
Multimedia image compression standardsMultimedia image compression standards
Multimedia image compression standards
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DL
 
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)
 
Image compression and it’s security1
Image compression and it’s security1Image compression and it’s security1
Image compression and it’s security1
 
Multimedia communication jpeg
Multimedia communication jpegMultimedia communication jpeg
Multimedia communication jpeg
 

Similar to “Improving Power Efficiency for Edge Inferencing with Memory Management Optimizations,” a Presentation from Samsung

Role of gis in telecommunications
Role of gis in telecommunicationsRole of gis in telecommunications
Role of gis in telecommunicationsAkhil Gupta
 
RECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP Project
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Universitat Politècnica de Catalunya
 
Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )
Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )
Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )Anand Bhojan
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKInfluxData
 
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...Jose Ricardo da Silva Junior
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetKrishnakoumarC
 
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...Edge AI and Vision Alliance
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044Jinwon Lee
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
 
"Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers..."Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers...Fwdays
 
CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2Khang Pham
 
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...Edge AI and Vision Alliance
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15MLconf
 
Presentation by Priyanka_Greendroid
Presentation by Priyanka_GreendroidPresentation by Priyanka_Greendroid
Presentation by Priyanka_GreendroidPRIYANKA KATKAR
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
 
Video Terminal Evolution and The Future of Browsers
Video Terminal Evolution and The Future of BrowsersVideo Terminal Evolution and The Future of Browsers
Video Terminal Evolution and The Future of BrowsersThomas Walker Lynch
 
From data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloudFrom data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloudFogGuru MSCA Project
 

Similar to “Improving Power Efficiency for Edge Inferencing with Memory Management Optimizations,” a Presentation from Samsung (20)

Role of gis in telecommunications
Role of gis in telecommunicationsRole of gis in telecommunications
Role of gis in telecommunications
 
RECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP: The Simulation Approach
RECAP: The Simulation Approach
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...
 
Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )
Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )
Energy Efficient Mobile Applications with Mobile Cloud Computing ( MCC )
 
Masked Occlusion Culling
Masked Occlusion CullingMasked Occlusion Culling
Masked Occlusion Culling
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Proc...
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
 
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
"Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers..."Stateful app as an efficient way to build dispatching for riders and drivers...
"Stateful app as an efficient way to build dispatching for riders and drivers...
 
CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2
 
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
 
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
 
Presentation by Priyanka_Greendroid
Presentation by Priyanka_GreendroidPresentation by Priyanka_Greendroid
Presentation by Priyanka_Greendroid
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 
Video Terminal Evolution and The Future of Browsers
Video Terminal Evolution and The Future of BrowsersVideo Terminal Evolution and The Future of Browsers
Video Terminal Evolution and The Future of Browsers
 
From data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloudFrom data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloud
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

“Improving Power Efficiency for Edge Inferencing with Memory Management Optimizations,” a Presentation from Samsung

  • 1. © 2020 Samsung Improving Power Efficiency for Edge Inferencing Through Memory Management Optimizations Nathan Levy Samsung September 2020
  • 2. © 2020 Samsung Outline • Introduction • Tiling and forwarding • Filters management • Case study of different modern convolutional neural networks • Conclusion 2
  • 3. © 2020 Samsung Counting the I/O in the TOps per Watt KPI • Power consumption is composed of: • Compute power: perform the arithmetic computations • I/O power: bring the data to the compute engine and back • For convolutional neural networks, I/O power can be higher than compute • The compiler must optimize the memory management • Scope: convolutional neural network (CNN) acceleration • Focus on convolutions • Assume the network is fixed (and quantized, pruned…) • Inference only 3
  • 4. © 2020 Samsung Description of the working model What is a convolution? • Input: feature map and filters • Apply: multiply and accumulate • Output: feature map CNN can be processed by CPU, DSP, GPU, NPU/TPU • These often have multiple cores • These have multiple levels of memories (cache and scratch-pad) 4 Input feature map Filter zb Output feature map Output activation Filter za Output activation X i Yi X o Yo
  • 5. © 2020 Samsung Description of the working model Simplified toy example: • Single processing core • Small local memory with “free” access and big remote memory with expensive access Remote memory contains: • Input and output feature maps • Convolution filters • Intermediate feature maps (if needed) 5 Remote memory Conv 1 Local memory Processing unit Feature Map 1 Feature Map N Conv M Filters
  • 6. © 2020 Samsung Typical orders of magnitude for NPU • The memory management problem depends on the relative size of: • The local memory of the processor • The typical feature map of the network • Our working point: Feature map size ~ 10 * Local memory size 6 Local memory size [B] Feature map size [B] 10k 100k 1M 10M 10k 100k 1M 10M 100M IoT Mobile Automotive
  • 7. © 2020 Samsung Tiling and forwarding
  • 8. © 2020 Samsung Tiling and Forwarding How to overcome the small memory size: • Split the input feature map into smaller parts → tiles • Apply the convolution on the input feature map tile → Creates an output feature map tile The output feature tile can be either: • Written in remote memory • Used for the next convolution → This is called forwarding 8 Remote memory Conv 1 Local memory Processing unit IFM (not in mem) OFM (not in mem) Feature Map 1 Feature Map N Conv M Conv i Input feature map tile Output feature map tile Filters
  • 9. © 2020 Samsung How to tile and forward How does the compiler tile and forward to minimize power consumption? Key optimization parameters: • Power consumption: bandwidth (I/O) + compute → Focus of this talk • Processing throughput (operations/second) • Chip area → cost • System complexity, flexibility and maintenance 9
  • 10. © 2020 Samsung 3x3 conv 3x3 conv From remote memory To remote memory Tiling and increasing receptive fields 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) For i in [1-12]: • Load tile i of blue feature map from remote memory • Run 1st convolution → tile i of green feature map • Run 2nd convolution → tile i of yellow feature map • Store tile i of yellow feature map to remote memory
  • 11. © 2020 Samsung Tiling and increasing receptive fields 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) For i in [1-12]: • Load tile i of blue feature map from remote memory • Run 1st convolution → tile i of green feature map • Run 2nd convolution → tile i of yellow feature map • Store tile i of yellow feature map to remote memory 3x3 conv 3x3 conv From remote memory To remote memory
  • 12. © 2020 Samsung Tiling and increasing receptive fields 3x3 conv 3x3 conv From remote memory To remote memory 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) For i in [1-12]: • Load tile i of blue feature map from remote memory • Run 1st convolution → tile i of green feature map • Run 2nd convolution → tile i of yellow feature map • Store tile i of yellow feature map to remote memory
  • 13. © 2020 Samsung 3x3 conv 3x3 conv From remote memory To remote memory Tiling and increasing receptive fields 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) For i in [1-12]: • Load tile i of blue feature map from remote memory • Run 1st convolution → tile i of green feature map • Run 2nd convolution → tile i of yellow feature map • Store tile i of yellow feature map to remote memory
  • 14. © 2020 Samsung 3x3 conv 3x3 conv From remote memory To remote memory Tiling and increasing receptive fields 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) For i in [1-12]: • Load tile i of blue feature map from remote memory • Run 1st convolution → tile i of green feature map • Run 2nd convolution → tile i of yellow feature map • Store tile i of yellow feature map to remote memory
  • 15. © 2020 Samsung 3x3 conv 3x3 conv From remote memory To remote memory Tiling and increasing receptive fields 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) For i in [1-12]: • Load tile i of blue feature map from remote memory • Run 1st convolution → tile i of green feature map • Run 2nd convolution → tile i of yellow feature map • Store tile i of yellow feature map to remote memory
  • 16. © 2020 Samsung 3x3 conv 3x3 conv From remote memory To remote memory Tiling and increasing receptive fields 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) Problem: 3x3 convolution requires margins
  • 17. © 2020 Samsung 3x3 conv 3x3 conv From remote memory To remote memory Tiling and increasing receptive fields 2 convolutions with 3x4 tiling. Without access to remote memory in the middle (forwarding) Solution: Bigger blue and green tiles Problems: • Load more data for the first (blue) layer • Compute more data for the intermediate (green) layer 2 pixels margin 1 pixel margin
  • 18. © 2020 Samsung Tiling and increasing receptive fields Consider a long chain of convolutions. How many stages of forwarding do you want? • Forwarding tiles through as many layers as possible avoids dumping to remote memory • This causes the overlaps to grow • More data to load for the first layer • Redundant processing 18 conv conv conv conv
  • 19. © 2020 Samsung 2 convolutions # of Convolutions Feature Map Tiling Bandwidth Bandwidth per convolution Operations per convolution Energy per convolution 2 3x4 +6% -47% +5% -30% 3 3x4 +12% -63% +10% -39% 4 3x4 +17% -71% +16% -43% 5 4x4 +29% -74% +26% -42% 19 All numbers are compared to “no forwarding” conv conv To remote memory From remote memory Load the overlaps Compute the overlaps Less I/O More compute Break the feature map into 12 to fit in memory
  • 20. © 2020 Samsung 3 convolutions # of Convolutions Feature Map Tiling Bandwidth Bandwidth per convolution Operations per convolution Energy per convolution 2 3x4 +6% -47% +5% -30% 3 3x4 +12% -63% +10% -39% 4 3x4 +17% -71% +16% -43% 5 4x4 +29% -74% +26% -42% 20 All numbers are compared to “no forwarding” conv conv conv To remote memory From remote memory
  • 21. © 2020 Samsung 4 convolutions # of Convolutions Feature Map Tiling Bandwidth Bandwidth per convolution Operations per convolution Energy per convolution 2 3x4 +6% -47% +5% -30% 3 3x4 +12% -63% +10% -39% 4 3x4 +17% -71% +16% -43% 5 4x4 +29% -74% +26% -42% 21 All numbers are compared to “no forwarding” conv conv conv conv To remote memory From remote memory
  • 22. © 2020 Samsung 5 convolutions # of Convolutions Feature Map Tiling Bandwidth Bandwidth per convolution Operations per convolution Energy per convolution 2 3x4 +6% -47% +5% -30% 3 3x4 +12% -63% +10% -39% 4 3x4 +17% -71% +16% -43% 5 4x4 +29% -74% +26% -42% 22 All numbers are compared to “no forwarding” conv conv conv conv conv From remote memory To remote memory Overlaps are so big we need smaller tiles to fit in local memory
  • 23. © 2020 Samsung Tiling and increasing receptive fields 23 Minimal energy Energy [J] • The trade-off between I/O and compute power leads to a U-shaped behavior • The trade-off requires an optimization mechanism in the compiler • In our toy example, all the convolutions are identical • In a real CNN, the convolutions are different, so the optimization problem is not one-dimensional Number of convolutions
  • 24. © 2020 Samsung Tiling and increasing receptive fields More constraints to take into account: • Some hardware will prefer vertical tiling for memory access • Increases overlap sizes • Some hardware will impose constraints on tile size (multiple of a given number) Out-of-scope ways to deal with overlaps: • Multi-core processing → hardware considerations and system complexity • Keep overlap data instead of loading/computing again: takes up memory and requires memory fragmentation management → system complexity 24 Tile 4 Tile 3 Tile 2 Tile 1 Tile 3 Tile 4 Tile 1 Tile 2
  • 26. © 2020 Samsung Filters management Option 1: reload the filters from remote to local memory for each tile → wasted bandwidth 26 Option 2: keep the filters of all convolutions in local memory while processing all the tiles →wasted space in the local memory → may increase the tiles number Local memory Filters for 1 convolution Reload #tiles times Local memory Filters for convolution 1 Load once Filters for convolution n
  • 27. © 2020 Samsung Filters management • No single strategy is the best all the time • Choosing the right strategy can save up to 20% energy • In practice, the problem is even more complicated due to different convolutions through the network • The compiler should take the decision to keep/reload the filters for each convolution independently 27 0 100 200 300 400 500 Memory size [kB] Energy [J] Local Memory Size [kB]
  • 28. © 2020 Samsung Case study of different modern convolutional neural networks 28
  • 29. © 2020 Samsung Huang 2017 Skip connections – residual block Skip connections were introduced in the ResNet paper as a way to facilitate back propagation • Forwarding becomes more challenging: one strives to keep both main path and skip connection in local memory Intensive usage in DenseNet • Up to 6 skip connections alive simultaneously 29 He 2016
  • 30. © 2020 Samsung Skip connections – segmentation networks U-net adds skip connections to encoder-decoder structure to improve quality of segmentation 30 Ronneberger 2015.
  • 31. © 2020 Samsung Skip connections – segmentation networks It can be mixed with residual blocks 31 Singla 2019
  • 32. © 2020 Samsung Inverted bottleneck • Inverted bottleneck introduced in MobileNet family • Splitting the 3x3 convolution into 3x3 depthwise convolution + 1x1 convolution reduces the amount of parameters • Can more easily be kept in memory and/or reloaded • 1x1 convolutions don’t create overlaps • Skip connection on thinner feature map to minimize memory impact 32 Sandler 2018
  • 33. © 2020 Samsung Conclusion • Memory management is a complicated optimization problem • The compiler affects many performance indices, in particular power and runtime • The tradeoff is not intuitive and requires some evaluation mechanism How do we do it? • Map all the performance indices (power, runtime…) to a single cost • The mapping may depend on the use case • The compiler is able to analyze its decisions and evaluate the cost • The compiler optimizes memory management to minimize the cost 33
  • 34. © 2020 Samsung Conclusion What we didn’t cover: • Pre-fetching for runtime reduction • Multicore edge devices • We described fixed hardware and fixed network. We actually co-design: • Memory size, processing throughput, hardware limitations • Network design to avoid access to remote memory • Use of network architecture search • Quantization and pruning to avoid access to remote memory • Choice of bit-width and pruning rate per layer 34
  • 35. © 2020 Samsung Resources He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015. https://github.com/Nishanksingla/UNet-with-ResBlock Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. 35