Flow Mapping and Data Distribution on Mesh-based
Deep Learning Accelerator
Presented by Hesam Shabani
Seyedeh Yasaman Hosseini Mirmahaleh1, Midia Reshadi1, Hesam Shabani2, Xiaochen Guo2, Nader Bagherzadeh3
1Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran,
2Lehigh University, Bethlehem, PA, USA
3Department of Electrical Engineering and Computer Science, University of California Irvine, Irvine, CA, USA
yasaman.hosseini@srbiau.ac.ir
NOCS2019
Outline
Introduction
Investigating some related works
The purposes of our proposed deep learning accelerator
Evaluated parameters
Flow mapping method on a mesh topology
Influence of dataflow on energy consumption
Row-node stationary-based dataflow approach
Traffic distribution based on distributer nodes
Experimental results
Conclusion
Acknowledgment
Introduction
- Machine learning applications are being deployed widely
  - Internet of Things (IoT)
  - Web search engines
  - Image processing and data mining applications
- Neural networks keep growing in depth and complexity
- Deeper and more complex convolutional and deep neural networks (CNNs and DNNs) raise challenges
  - Increasing energy consumption
  - Memory capacity
  - Bandwidth requirement
  - Memory access
  - Delay
- Deep learning accelerators proposed to address these CNN and DNN problems
  - Supercomputers
  - Communication networks
  - Memory logic
- Our method for improving delay, energy consumption, bandwidth, and memory requirements
  - Flow mapping
  - Distributer nodes
  - A new traffic distribution mechanism on a mesh topology
  - A simple router structure with tiny switches
Investigating some related works
Advantages and disadvantages of previously proposed deep learning accelerators (DLAs)

Accelerator | Advantages | Disadvantages
TPU [6] | Speeds up processing | Dataflow dependency
DaDianNao [1] | Speeds up processing compared with GPUs; improves memory capacity and energy consumption | Inflexible; complex neuron mapping; implements both training and inference phases; integrates optical interconnections with electrical connections; computation dependency
Eyeriss [5] | Improves memory access; reduces bandwidth requirement and delay | No flexibility or scalability; no support for sparse DNNs (SDNN); computation dependency
Eyeriss v2 [16] | Scalability; supports SDNN | Increased MAC complexity
MAERI [8] | Speeds up processing; improves memory access; flexible; independent of dataflow | Restricted to one direction for traffic distribution; higher power consumption than other accelerators

Advantages and disadvantages of GPU-based systems [38]
Advantage: Flexibility
Disadvantage: High energy consumption
The purposes of our proposed deep learning accelerator
- A new traffic distribution mechanism on a mesh topology using distributer nodes
- A flexible DLA structure based on the filter, kernel, and channel sizes of trained CNN and DNN models
- Focus on a mesh topology as the communication network for acceleration
- Flexible placement of distributer nodes on the mesh based on filter, kernel, and channel sizes
- Row-node stationary dataflow for flow mapping
- Improving online execution of trained models by reducing the following parameters
  - Delay
  - Energy consumption
  - Memory access
  - Bandwidth requirement
- Analyzing and distributing the traffic of AlexNet, VGG-16, and GoogleNet as example CNN and DNN models
Evaluated parameters
- Area consumption
- Energy consumption
- Delay
- Average utilization
- Bandwidth requirement
- Memory access
Flow mapping method on a mesh topology
- AlexNet traffic distribution on a mesh topology, as an example CNN
- Partitioning the mesh based on the kernel, filter, and channel sizes of AlexNet, as a worked example of the partitioning
- Our proposed mesh-based DLA architecture
  - Architecture of the proposed DLA
  - Router
  - Switches
  - Switch selector
AlexNet traffic distribution on a 12×14 2D mesh
[Figure: five panels, (a) to (e), each showing a 12×14 2D mesh. Layer labels: CONV1 11×55, CONV2 5×27, CONV3 3×13, CONV4 3×13, CONV5 3×13.]
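The layer labels in the figure above (11×55, 5×27, 3×13) can be reproduced from the layer shapes. The sketch below is only an illustration: it assumes the labels mean kernel height by output feature map height, and it uses the commonly quoted AlexNet layer parameters (227-pixel input, strides, and padding), none of which are stated on the slides. The function names are hypothetical.

```python
# Commonly quoted AlexNet convolution parameters (an assumption, not from the slides):
# (layer name, input height, kernel height, stride, padding)
ALEXNET_CONV = [
    ("CONV1", 227, 11, 4, 0),
    ("CONV2", 27, 5, 1, 2),
    ("CONV3", 13, 3, 1, 1),
    ("CONV4", 13, 3, 1, 1),
    ("CONV5", 13, 3, 1, 1),
]


def output_height(in_height, kernel, stride, pad):
    """Standard convolution output-size formula."""
    return (in_height + 2 * pad - kernel) // stride + 1


def logical_array(in_height, kernel, stride, pad):
    """Assumed logical array shape: kernel height x output height."""
    return kernel, output_height(in_height, kernel, stride, pad)


def row_bands(kernel, mesh_rows=12):
    """How many groups of `kernel` rows fit into the mesh rows."""
    return mesh_rows // kernel


for name, h, k, s, p in ALEXNET_CONV:
    rows, cols = logical_array(h, k, s, p)
    print(f"{name}: logical array {rows}x{cols}, "
          f"{row_bands(k)} band(s) of {rows} rows fit in 12 mesh rows")
```

Under these assumptions the logical arrays are wider than the 14 mesh columns, which is why the mesh has to be partitioned per layer.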
Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV1
[Figure: AlexNet architecture [19] and the partitioned mesh for CONV1. Region labels: 11×7, 11×7.]

Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV2
[Figure: partitioned mesh for CONV2. Region labels: 5×13, 5×14.]

Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV3-5
[Figure: partitioned mesh for CONV3-5. Region labels: 3×13, 3×13, 3×13, 3×13.]
Architecture of proposed DLA
[Figure: architecture of the proposed DLA. Labels: global buffer; ifmap, filter, and Psum; switch selector; 12×15 2D mesh; 12×14.]
Router
[Figure: router structure. Labels: switch; North, West, South, and East ports, each with a buffer; local buffer; multicast buffer.]
The router uses a multicast buffer, an on/off buffer backpressure mechanism, and a two-stage pipeline.
Switch
[Figure: switch structure. Labels: N, S, W, and E ports on both sides; Clk; EN; select signals s0 to s3; MUX; DeMUX; local ports.]
Switch selector
[Figure: switch selector. Labels: select signals S0 to S4; EN; inputs In0, In1, In3; switch address; C-decoder and R-decoder; Mux 0 to N-1.]
Influence of dataflow on energy consumption
- Weight stationary (WS): Weights are read from the global buffer (GB) and broadcast to the processing elements (PEs), where each weight is held fixed. Each PE then performs the convolution between its fixed weight and the ifmap elements broadcast from the GB [3], [4].
  - Microswitch array [12]
- Output stationary (OS): Outputs, or both weights and input activations, are mapped from the GB to the PEs. The Psum results are sent to the GB after the local computation finishes [2], [4], [7].
  - TPU
  - Systolic array
- Row stationary (RS): The ifmap and filter are transferred from the GB to the PEs horizontally, whereas Psums are accumulated vertically by the multiply-accumulate (MAC) operations of the PEs. The accumulated Psums are then transferred to the GB [5].
  - Eyeriss [5]
  - Eyeriss v2 [16]
  - Microswitch array [4]
- Row-node stationary (RNS): We propose row-node stationary (RNS) dataflow as a new approach for distributing the traffic of trained DNN models, based on flow mapping and the memory access mechanism. With RNS dataflow, an accelerator can transfer data to sets of nodes in the vertical and horizontal directions in parallel, using distributer nodes.
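The row-stationary idea described above, where each PE works on one filter row and one ifmap row and Psums are accumulated vertically, can be sketched in a few lines. This is a plain software illustration of the decomposition, assuming stride 1 and no padding; it is not the authors' accelerator model, and the function names are hypothetical.

```python
def conv1d_row(filter_row, ifmap_row):
    """Work of one PE: slide a filter row over an ifmap row (stride 1, no padding)."""
    r = len(filter_row)
    return [
        sum(filter_row[k] * ifmap_row[x + k] for k in range(r))
        for x in range(len(ifmap_row) - r + 1)
    ]


def conv2d_row_stationary(filt, ifmap):
    """PE (i, j) pairs filter row i with ifmap row i + j.

    Summing the partial rows down PE column j yields output row j.
    """
    num_filter_rows = len(filt)
    num_out_rows = len(ifmap) - num_filter_rows + 1
    out_width = len(ifmap[0]) - len(filt[0]) + 1
    output = []
    for j in range(num_out_rows):          # one PE column per output row
        psum = [0] * out_width
        for i in range(num_filter_rows):   # vertical Psum accumulation
            partial = conv1d_row(filt[i], ifmap[i + j])
            psum = [a + b for a, b in zip(psum, partial)]
        output.append(psum)
    return output


def conv2d_direct(filt, ifmap):
    """Reference 2D sliding-window sum of products, for comparison."""
    fr, fc = len(filt), len(filt[0])
    return [
        [
            sum(filt[a][b] * ifmap[y + a][x + b] for a in range(fr) for b in range(fc))
            for x in range(len(ifmap[0]) - fc + 1)
        ]
        for y in range(len(ifmap) - fr + 1)
    ]


example_filter = [[1, 0], [0, 1]]
example_ifmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d_row_stationary(example_filter, example_ifmap))
```

Each filter row is reused across a whole PE row and each ifmap row along a diagonal, which is the reuse pattern that the row-based dataflows exploit.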
Row-node stationary-based dataflow approach
[Figure: row-node stationary dataflow, panels (a), (b), and (c), showing filter rows, ifmap rows, and Psum rows mapped onto nodes around a distributer node.]
- A row of ifmap values is reused and distributed in the vertical and horizontal directions based on the location of the distributer node.
- A row of filter weights is reused and distributed in the vertical and horizontal directions based on the location of the distributer node.
- A row of Psums is accumulated vertically.
Traffic distribution based on distributer nodes
[Figure (a): AlexNet traffic distribution for CONV1 on a 12×15 2D mesh using distributer nodes. Labels: 12×14, ifmap, Psum, filter, shared bus, distributer node, destination node.]
[Figure (b): AlexNet traffic distribution for CONV2 on a 12×15 2D mesh using distributer nodes. Labels: 12×14, distributer node, destination node.]
[Figure (c): AlexNet traffic distribution for CONV1 on a 12×15 2D mesh without distributer nodes. Labels: 12×14, ifmap, Psum, filter, shared bus, destination node.]
[Figure (d): AlexNet traffic distribution for CONV2 on a 12×15 2D mesh without distributer nodes. Labels: 12×14, destination node.]
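One intuition for why the location of the node that injects traffic matters can be shown with a toy hop-count model. The sketch below assumes one unicast packet per destination and counts Manhattan (XY-routing) hops on a 12×14 array; it does not model the multicast mechanism, the shared bus, or the simulator used in this work, and the source positions are hypothetical.

```python
def total_hops(source, rows=12, cols=14):
    """Sum of Manhattan distances from `source` to every node of a rows x cols mesh."""
    src_row, src_col = source
    return sum(
        abs(r - src_row) + abs(c - src_col)
        for r in range(rows)
        for c in range(cols)
    )


corner_hops = total_hops((0, 0))   # traffic injected at a mesh corner
center_hops = total_hops((6, 7))   # traffic injected near the mesh center

print("corner:", corner_hops)
print("center:", center_hops)
```

In this simplified model a centrally placed source needs roughly half the total hops of a corner source, which illustrates why placing distributing nodes close to their destination regions can reduce traffic on the mesh.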
Experimental results
[Figure: Total energy (J), axis from 0.00E+00 to 3.00E-05. Comparison of the 12×15 2D mesh with distributer nodes, the 12×15 2D mesh without distributer nodes, and MAERI.]
[Figure: Total delay (cycles), axis from 4600 to 4720. Comparison of the 12×15 2D mesh with distributer nodes, the 12×15 2D mesh without distributer nodes, and MAERI.]
[Figure: Number of FPGA LUTs, axis from 0.00E+00 to 9.00E+03, for Eyeriss, MAERI, and the mesh. Comparison of the switch area consumption of the 12×15 2D mesh with distributer nodes, the 168 switches of Eyeriss, and the 64 multiplier switches of MAERI.]
[Figure: Memory access (cycles), axis from 0 to 350. Comparison of the 12×15 2D mesh with and without distributer nodes for AlexNet traffic distribution, based on the cycles for writing and reading memory.]
Table 1. Total runtime of various dataflows with 168 PEs for CONV1 and CONV11 of VGG-16

CONV | Dataflow | Total runtime (cycles)
1 | RN | 17034
1 | NLR | 501258240
1 | WS | 25961600
1 | Shi | 249446400
1 | DLA | 1157409792
1 | RS | 164204544
11 | RN | 17722
11 | NLR | 360316928
11 | WS | 217317376
11 | Shi | 2020081664
11 | DLA | 673876224
11 | RS | 830472192

Table 2. Average utilization and runtime of various topologies for AlexNet and GoogleNet traffic distribution

Trained model | Topology | Array size | Compute runtime (cycles) | Average utilization (%)
AlexNet | Proposed mesh-based DLA | 12×14 | 113352 | 88.57
AlexNet | TPU | 256×256 | 10026200 | 96.25
AlexNet | Systolic array | 32×32 | 2504183 | 99.12
AlexNet | Eyeriss | 12×14 | 16377164 | 98.05
GoogleNet | Proposed mesh-based DLA | 12×14 | 180182 | 84.52
GoogleNet | TPU | 256×256 | 259827 | 68.67
GoogleNet | Systolic array | 256×256 | 297163 | 68.67
Table 3. Bandwidth requirement of various topologies for AlexNet, GoogleNet, and VGG-16 traffic distribution

Trained model | Topology | Array size | Bandwidth requirement (bytes/cycle)
GoogleNet | Proposed mesh-based DLA | 12×14 | 0.08
GoogleNet | TPU | 256×256 | 3.62
GoogleNet | Systolic array | 256×256 | 49.71
AlexNet | Proposed mesh-based DLA | 12×14 | 0.08
AlexNet | TPU | 256×256 | 3.14
AlexNet | Systolic array | 256×256 | 3.14
AlexNet | Eyeriss | 12×14 | 1.02
VGG-16 | Proposed mesh-based DLA | 12×14 | 0.08
VGG-16 | TPU | 256×256 | 4.38
VGG-16 | Systolic array | 256×256 | 12.108
VGG-16 | Eyeriss | 12×14 | 0.9

[Figure: Total runtime (cycles), axis from 0.00E+00 to 3.50E+05, of the traffic distribution of AlexNet, VGG-16, and GoogleNet on the mesh.]
Simulation tools used
- A cycle-accurate SystemC-based simulator inspired by Noxim [13], [10], [15]
- Xilinx Vivado [11], [14]
- SCALE-Sim, a Python-based cycle-accurate tool [17], [18]
- MAESTRO, a SystemC-based tool [9], [12]

A summary of simulation results
- Distributing traffic with distributer nodes reduces energy consumption by approximately 8% compared with no distributer nodes
- The 12×15 2D mesh with distributer nodes reduces energy consumption and total delay by approximately 43.66% and 0.59%, respectively, compared with MAERI
- The 12×15 2D mesh with distributer nodes reduces LUT-based area consumption by approximately 93.56% compared with MAERI
- Distributer nodes reduce memory access by approximately 62.5% for AlexNet traffic on the 12×15 mesh, compared with using no distributer nodes
- Row-node stationary (RN) dataflow reduces total runtime by approximately 99% compared with weight stationary (WS) dataflow in CONV1 and CONV11 of VGG-16
- Our proposed DLA improves compute runtime and average utilization by approximately 30.65% and 18.75%, respectively, compared with TPU for the first nine convolutions of GoogleNet
- The mesh reduces the bandwidth requirement by approximately 98.17% and 91.1% compared with TPU and Eyeriss, respectively, for VGG-16 traffic distribution
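Several of the percentages in the summary can be recomputed from the values in Tables 1 to 3. The sketch below is a cross-check only, not the authors' evaluation script. The helper name is hypothetical, and the normalization used for the utilization figure (dividing by the proposed DLA's utilization) is an inference that happens to reproduce the reported 18.75%.

```python
def reduction_pct(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100


# Table 3, VGG-16 bandwidth requirement (bytes/cycle)
bw_vs_tpu = reduction_pct(4.38, 0.08)
bw_vs_eyeriss = reduction_pct(0.9, 0.08)

# Table 2, GoogleNet compute runtime (cycles): TPU vs. proposed mesh-based DLA
runtime_vs_tpu = reduction_pct(259827, 180182)

# Table 2, GoogleNet average utilization (%): gain normalized to the proposed
# DLA's utilization (an inferred normalization, not stated on the slides)
utilization_gain = (84.52 - 68.67) / 84.52 * 100

# Table 1, VGG-16 total runtime (cycles): WS vs. RN
rn_vs_ws_conv1 = reduction_pct(25961600, 17034)
rn_vs_ws_conv11 = reduction_pct(217317376, 17722)

print(f"bandwidth vs TPU:      {bw_vs_tpu:.2f}%")
print(f"bandwidth vs Eyeriss:  {bw_vs_eyeriss:.2f}%")
print(f"runtime vs TPU:        {runtime_vs_tpu:.2f}%")
print(f"utilization gain:      {utilization_gain:.2f}%")
print(f"RN vs WS, CONV1:       {rn_vs_ws_conv1:.2f}%")
print(f"RN vs WS, CONV11:      {rn_vs_ws_conv11:.2f}%")
```

The energy, area, and memory access percentages cannot be recomputed this way because the slides give those results only as charts.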
Conclusion
- The flow mapping method with distributer nodes reduced total energy and delay compared with the pattern without distributer nodes
- Distributing CNN and DNN traffic on a mesh network with distributer nodes improves performance and throughput
- Row-node stationary dataflow has a notable effect on reducing delay and energy consumption
- The proposed router, with its simpler structure and tiny switches, decreased area consumption and delay
- Multicast traffic distribution in multiple directions with the distributer nodes decreases total energy and flow on the mesh
Acknowledgment
We thank the Synergy Lab team at the Georgia Institute of Technology for answering our questions, for providing more information about the MAERI project, and for their kind help in compiling and using the MAESTRO and SCALE-Sim simulators.
REFERENCES
[1] Tao Luo, Shaoli Liu, Ling Li, Yuqing Wang, Shijin Zhang, Tianshi Chen, Zhiwei Xu, Olivier Temam, and Yunji Chen, DaDianNao: A Machine-Learning Supercomputer. Journal (Transactions on
Computers), 2016.
[2] Z. Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam, ShiDianNao: Shifting Vision Processing Closer to the Sensor. Conference
(ISCA), 2015.
[3] Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar, ISAAC: A Convolutional Neural Network
Accelerator with In-Situ Analog Arithmetic in Crossbars. Conference (Computer Architecture), 2016.
[4] Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna, Rethinking NoCs for Spatial Neural Network Accelerators. Conference (NOCS), 2017.
[5] Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze, Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. Journal (SOLID-STATE
CIRCUITS), 2016.
[6] N P. Jouppi et al., In-Datacenter Performance Analysis of a Tensor Processing Unit. Conference (ArXiv), 2017.
[7] Bert Moons and Marian Verhelst, A 0.3-2.6 TOPS/W Precision-Scalable Processor for Real-Time Large-Scale ConvNets. Symposium (VLSI), 2016.
[8] Hyoukjun Kwon, Joel S. Emer, and Tushar Krishna, MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. Conference (ASPLOS’18), 2018.
[9] Hyoukjun Kwon, Michael Pellauer, and Tushar Krishna, MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators. Conference (ArXiv), 2018.
[10] https://github.com/davidepatti/noxim
[11] https://www.xilinx.com/products/design-tools/vivado.html
[12] http://synergy.ece.gatech.edu/tools/maestro/
[13] Vincenzo Catania, Andrea Mineo, Maurizio Palesi, Davide Patti, and Salvatore Monteleone, Cycle-Accurate Network on Chip Simulation with Noxim. Journal (TOMACS), 2016.
[14] Hyoukjun Kwon, and Tushar Krishna, OpenSMART: Single-Cycle Multi-hop NoC Generator in BSV and Chisel. Conference (ISPASS), 2017.
[15] Kun-Chih (Jimmy) Chen and Ting-Yi Wang, NN-Noxim: High-Level Cycle-Accurate NoC-based Neural Networks Simulator. Conference (NOCARC), 2018.
[16] Yu-Hsin Chen, Joel S. Emer, and Vivienne Sze, Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks. Journal (ArXiv), 2018.
[17] Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna, SCALE-Sim: Systolic CNN Accelerator Simulator. Conference (ASPLOS’18), 2018.
[18] https://github.com/ARM-software/SCALE-Sim
[19] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks. Conference (NIPS), 2012.
Thank you for your attention
PERFORMANCE ANALYSIS IN CELLULAR NETWORKS CONSIDERING THE QOS BY RETRIAL QUEU...PERFORMANCE ANALYSIS IN CELLULAR NETWORKS CONSIDERING THE QOS BY RETRIAL QUEU...
PERFORMANCE ANALYSIS IN CELLULAR NETWORKS CONSIDERING THE QOS BY RETRIAL QUEU...IJCNCJournal
 
On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...
On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...
On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...CSCJournals
 
Optimal configuration of network coding in ad hoc networks
Optimal configuration of network coding in ad hoc networksOptimal configuration of network coding in ad hoc networks
Optimal configuration of network coding in ad hoc networksPvrtechnologies Nellore
 
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...IJCSEIT Journal
 
Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...
Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...
Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...M H
 
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...IRJET Journal
 
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHESTEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHESsipij
 
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENTDYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENTIJCNCJournal
 
NSGA-III Based Energy Efficient Protocol for Wireless Sensor Networks
NSGA-III Based Energy Efficient Protocol for Wireless Sensor NetworksNSGA-III Based Energy Efficient Protocol for Wireless Sensor Networks
NSGA-III Based Energy Efficient Protocol for Wireless Sensor NetworksIJCSIS Research Publications
 

What's hot (20)

IMPROVEMENTS IN ROUTING ALGORITHMS TO ENHANCE LIFETIME OF WIRELESS SENSOR NET...
IMPROVEMENTS IN ROUTING ALGORITHMS TO ENHANCE LIFETIME OF WIRELESS SENSOR NET...IMPROVEMENTS IN ROUTING ALGORITHMS TO ENHANCE LIFETIME OF WIRELESS SENSOR NET...
IMPROVEMENTS IN ROUTING ALGORITHMS TO ENHANCE LIFETIME OF WIRELESS SENSOR NET...
 
Review on Clustering and Data Aggregation in Wireless Sensor Network
Review on Clustering and Data Aggregation in Wireless Sensor NetworkReview on Clustering and Data Aggregation in Wireless Sensor Network
Review on Clustering and Data Aggregation in Wireless Sensor Network
 
AN OPTIMUM ENERGY CONSUMPTION HYBRID ALGORITHM FOR XLN STRATEGIC DESIGN IN WSN’S
AN OPTIMUM ENERGY CONSUMPTION HYBRID ALGORITHM FOR XLN STRATEGIC DESIGN IN WSN’SAN OPTIMUM ENERGY CONSUMPTION HYBRID ALGORITHM FOR XLN STRATEGIC DESIGN IN WSN’S
AN OPTIMUM ENERGY CONSUMPTION HYBRID ALGORITHM FOR XLN STRATEGIC DESIGN IN WSN’S
 
SECTOR TREE-BASED CLUSTERING FOR ENERGY EFFICIENT ROUTING PROTOCOL IN HETEROG...
SECTOR TREE-BASED CLUSTERING FOR ENERGY EFFICIENT ROUTING PROTOCOL IN HETEROG...SECTOR TREE-BASED CLUSTERING FOR ENERGY EFFICIENT ROUTING PROTOCOL IN HETEROG...
SECTOR TREE-BASED CLUSTERING FOR ENERGY EFFICIENT ROUTING PROTOCOL IN HETEROG...
 
Enhanced Hybrid Clustering Scheme for Dense Wireless Sensor Networks
Enhanced Hybrid Clustering Scheme for Dense Wireless Sensor NetworksEnhanced Hybrid Clustering Scheme for Dense Wireless Sensor Networks
Enhanced Hybrid Clustering Scheme for Dense Wireless Sensor Networks
 
Vol 8 No 1 - December 2013
Vol 8 No 1 - December 2013Vol 8 No 1 - December 2013
Vol 8 No 1 - December 2013
 
A DYNAMIC ROUTE DISCOVERY SCHEME FOR HETEROGENEOUS WIRELESS SENSOR NETWORKS B...
A DYNAMIC ROUTE DISCOVERY SCHEME FOR HETEROGENEOUS WIRELESS SENSOR NETWORKS B...A DYNAMIC ROUTE DISCOVERY SCHEME FOR HETEROGENEOUS WIRELESS SENSOR NETWORKS B...
A DYNAMIC ROUTE DISCOVERY SCHEME FOR HETEROGENEOUS WIRELESS SENSOR NETWORKS B...
 
Optimized Cluster Establishment and Cluster-Head Selection Approach in WSN
Optimized Cluster Establishment and Cluster-Head Selection Approach in WSNOptimized Cluster Establishment and Cluster-Head Selection Approach in WSN
Optimized Cluster Establishment and Cluster-Head Selection Approach in WSN
 
ON THE PERFORMANCE OF INTRUSION DETECTION SYSTEMS WITH HIDDEN MULTILAYER NEUR...
ON THE PERFORMANCE OF INTRUSION DETECTION SYSTEMS WITH HIDDEN MULTILAYER NEUR...ON THE PERFORMANCE OF INTRUSION DETECTION SYSTEMS WITH HIDDEN MULTILAYER NEUR...
ON THE PERFORMANCE OF INTRUSION DETECTION SYSTEMS WITH HIDDEN MULTILAYER NEUR...
 
Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427
 
Performance evaluation of hierarchical clustering protocols with fuzzy C-means
Performance evaluation of hierarchical clustering protocols with fuzzy C-means Performance evaluation of hierarchical clustering protocols with fuzzy C-means
Performance evaluation of hierarchical clustering protocols with fuzzy C-means
 
PERFORMANCE ANALYSIS IN CELLULAR NETWORKS CONSIDERING THE QOS BY RETRIAL QUEU...
PERFORMANCE ANALYSIS IN CELLULAR NETWORKS CONSIDERING THE QOS BY RETRIAL QUEU...PERFORMANCE ANALYSIS IN CELLULAR NETWORKS CONSIDERING THE QOS BY RETRIAL QUEU...
PERFORMANCE ANALYSIS IN CELLULAR NETWORKS CONSIDERING THE QOS BY RETRIAL QUEU...
 
On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...
On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...
On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar...
 
Optimal configuration of network coding in ad hoc networks
Optimal configuration of network coding in ad hoc networksOptimal configuration of network coding in ad hoc networks
Optimal configuration of network coding in ad hoc networks
 
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
ERROR PERFORMANCE ANALYSIS USING COOPERATIVE CONTENTION-BASED ROUTING IN WIRE...
 
Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...
Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...
Algorithmic Construction of Optimal and Load Balanced Clusters in Wireless Se...
 
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
 
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHESTEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
 
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENTDYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
DYNAMIC TASK SCHEDULING BASED ON BURST TIME REQUIREMENT FOR CLOUD ENVIRONMENT
 
NSGA-III Based Energy Efficient Protocol for Wireless Sensor Networks
NSGA-III Based Energy Efficient Protocol for Wireless Sensor NetworksNSGA-III Based Energy Efficient Protocol for Wireless Sensor Networks
NSGA-III Based Energy Efficient Protocol for Wireless Sensor Networks
 

Similar to Flow Mapping and Data Distribution on Mesh-based Deep Learning Accelerator

Congestion aware routing algorithm network on chip
Congestion aware routing algorithm network on chipCongestion aware routing algorithm network on chip
Congestion aware routing algorithm network on chipNiteshKumar198644
 
IRJET- Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...
IRJET-  	  Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...IRJET-  	  Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...
IRJET- Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...IRJET Journal
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...
PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...
PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...IJCSES Journal
 
Performance analysis of congestion-aware Q-routing algorithm for network on chip
Performance analysis of congestion-aware Q-routing algorithm for network on chipPerformance analysis of congestion-aware Q-routing algorithm for network on chip
Performance analysis of congestion-aware Q-routing algorithm for network on chipIAESIJAI
 
A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...
A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...
A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...IJERA Editor
 
Greedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor NetworksGreedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor NetworksAIRCC Publishing Corporation
 
GREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKS
GREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKSGREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKS
GREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKSijcsit
 
Greedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor NetworksGreedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor NetworksAIRCC Publishing Corporation
 
Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...TELKOMNIKA JOURNAL
 
Routing Optimization with Load Balancing: an Energy Efficient Approach
Routing Optimization with Load Balancing: an Energy Efficient ApproachRouting Optimization with Load Balancing: an Energy Efficient Approach
Routing Optimization with Load Balancing: an Energy Efficient ApproachEswar Publications
 
Guleria2019
Guleria2019Guleria2019
Guleria2019SSNayak2
 
A smart clustering based approach to
A smart clustering based approach toA smart clustering based approach to
A smart clustering based approach toIJCNCJournal
 
The impact of channel model on the performance of distance-based schemes in v...
The impact of channel model on the performance of distance-based schemes in v...The impact of channel model on the performance of distance-based schemes in v...
The impact of channel model on the performance of distance-based schemes in v...IJECEIAES
 
Traffic-aware adaptive server load balancing for softwaredefined networks
Traffic-aware adaptive server load balancing for softwaredefined networks Traffic-aware adaptive server load balancing for softwaredefined networks
Traffic-aware adaptive server load balancing for softwaredefined networks IJECEIAES
 
DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...
DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...
DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...cscpconf
 
DYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKS
DYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKSDYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKS
DYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKSpharmaindexing
 
MULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETS
MULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETSMULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETS
MULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETSIJCNCJournal
 
JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...
JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...
JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...csandit
 
A fuzzy delay-bandwidth guaranteed routing algorithm for vedio conferencing ...
A fuzzy  delay-bandwidth guaranteed routing algorithm for vedio conferencing ...A fuzzy  delay-bandwidth guaranteed routing algorithm for vedio conferencing ...
A fuzzy delay-bandwidth guaranteed routing algorithm for vedio conferencing ...Gopi Krishna
 

Similar to Flow Mapping and Data Distribution on Mesh-based Deep Learning Accelerator (20)

Congestion aware routing algorithm network on chip
Congestion aware routing algorithm network on chipCongestion aware routing algorithm network on chip
Congestion aware routing algorithm network on chip
 
IRJET- Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...
IRJET-  	  Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...IRJET-  	  Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...
IRJET- Aggregate Signature Scheme and Secured ID for Wireless Sensor Netw...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...
PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...
PERFORMANCE ANALYSIS OF WIRELESS MESH NETWORK USING ADAPTIVE INFORMANT FACTOR...
 
Performance analysis of congestion-aware Q-routing algorithm for network on chip
Performance analysis of congestion-aware Q-routing algorithm for network on chipPerformance analysis of congestion-aware Q-routing algorithm for network on chip
Performance analysis of congestion-aware Q-routing algorithm for network on chip
 
A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...
A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...
A Novel Weighted Clustering Based Approach for Improving the Wireless Sensor ...
 
Greedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor NetworksGreedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor Networks
 
GREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKS
GREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKSGREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKS
GREEDY CLUSTER BASED ROUTING FOR WIRELESS SENSOR NETWORKS
 
Greedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor NetworksGreedy Cluster Based Routing for Wireless Sensor Networks
Greedy Cluster Based Routing for Wireless Sensor Networks
 
Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...
 
Routing Optimization with Load Balancing: an Energy Efficient Approach
Routing Optimization with Load Balancing: an Energy Efficient ApproachRouting Optimization with Load Balancing: an Energy Efficient Approach
Routing Optimization with Load Balancing: an Energy Efficient Approach
 
Guleria2019
Guleria2019Guleria2019
Guleria2019
 
A smart clustering based approach to
A smart clustering based approach toA smart clustering based approach to
A smart clustering based approach to
 
The impact of channel model on the performance of distance-based schemes in v...
The impact of channel model on the performance of distance-based schemes in v...The impact of channel model on the performance of distance-based schemes in v...
The impact of channel model on the performance of distance-based schemes in v...
 
Traffic-aware adaptive server load balancing for softwaredefined networks
Traffic-aware adaptive server load balancing for softwaredefined networks Traffic-aware adaptive server load balancing for softwaredefined networks
Traffic-aware adaptive server load balancing for softwaredefined networks
 
DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...
DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...
DESIGN OF ENERGY EFFICIENT ROUTING ALGORITHM FOR WIRELESS SENSOR NETWORK (WSN...
 
DYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKS
DYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKSDYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKS
DYNAMIC HYBRID CHANNEL (WMN) FOR BANDWIDTH GUARANTEES IN AD_HOC NETWORKS
 
MULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETS
MULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETSMULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETS
MULTICASTING BASED ENHANCED PROACTIVE SOURCE ROUTING IN MANETS
 
JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...
JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...
JCWAEED: JOINT CHANNEL ASSIGNMENT AND WEIGHTED AVERAGE EXPECTED END-TO-END DE...
 
A fuzzy delay-bandwidth guaranteed routing algorithm for vedio conferencing ...
A fuzzy  delay-bandwidth guaranteed routing algorithm for vedio conferencing ...A fuzzy  delay-bandwidth guaranteed routing algorithm for vedio conferencing ...
A fuzzy delay-bandwidth guaranteed routing algorithm for vedio conferencing ...
 

Recently uploaded

Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Pooja Nehwal
 
Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...
Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...
Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...srsj9000
 
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311  Call Girls in Thane , Independent Escort Service ThanePallawi 9167673311  Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service ThanePooja Nehwal
 
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...anilsa9823
 
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》
《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》
《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》o8wvnojp
 
FULL ENJOY - 8264348440 Call Girls in Hauz Khas | Delhi
FULL ENJOY - 8264348440 Call Girls in Hauz Khas | DelhiFULL ENJOY - 8264348440 Call Girls in Hauz Khas | Delhi
FULL ENJOY - 8264348440 Call Girls in Hauz Khas | Delhisoniya singh
 
(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Service
(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Service(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Service
(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一ss ss
 
Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝
Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝
Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls in Thane 9892124323, Vashi cAll girls Serivces Juhu Escorts, powai...
Call Girls in Thane 9892124323, Vashi cAll girls Serivces Juhu Escorts, powai...Call Girls in Thane 9892124323, Vashi cAll girls Serivces Juhu Escorts, powai...
Call Girls in Thane 9892124323, Vashi cAll girls Serivces Juhu Escorts, powai...Pooja Nehwal
 
定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一
定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一
定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一ga6c6bdl
 
9004554577, Get Adorable Call Girls service. Book call girls & escort service...
9004554577, Get Adorable Call Girls service. Book call girls & escort service...9004554577, Get Adorable Call Girls service. Book call girls & escort service...
9004554577, Get Adorable Call Girls service. Book call girls & escort service...Pooja Nehwal
 
Alambagh Call Girl 9548273370 , Call Girls Service Lucknow
Alambagh Call Girl 9548273370 , Call Girls Service LucknowAlambagh Call Girl 9548273370 , Call Girls Service Lucknow
Alambagh Call Girl 9548273370 , Call Girls Service Lucknowmakika9823
 
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service Saharanpur
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service SaharanpurVIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service Saharanpur
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service SaharanpurSuhani Kapoor
 
Presentation.pptxjnfoigneoifnvoeifnvklfnvf
Presentation.pptxjnfoigneoifnvoeifnvklfnvfPresentation.pptxjnfoigneoifnvoeifnvklfnvf
Presentation.pptxjnfoigneoifnvoeifnvklfnvfchapmanellie27
 
Vip Noida Escorts 9873940964 Greater Noida Escorts Service
Vip Noida Escorts 9873940964 Greater Noida Escorts ServiceVip Noida Escorts 9873940964 Greater Noida Escorts Service
Vip Noida Escorts 9873940964 Greater Noida Escorts Serviceankitnayak356677
 

Recently uploaded (20)

Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
 
Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...
Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...
Hifi Defence Colony Call Girls Service WhatsApp -> 9999965857 Available 24x7 ...
 
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311  Call Girls in Thane , Independent Escort Service ThanePallawi 9167673311  Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
 
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
 
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》
《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》
《1:1仿制麦克马斯特大学毕业证|订制麦克马斯特大学文凭》
 
FULL ENJOY - 8264348440 Call Girls in Hauz Khas | Delhi
FULL ENJOY - 8264348440 Call Girls in Hauz Khas | DelhiFULL ENJOY - 8264348440 Call Girls in Hauz Khas | Delhi
FULL ENJOY - 8264348440 Call Girls in Hauz Khas | Delhi
 
(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Service
(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Service(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Service
(SANA) Call Girls Landewadi ( 7001035870 ) HI-Fi Pune Escorts Service
 
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
 
定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一定制(USF学位证)旧金山大学毕业证成绩单原版一比一
定制(USF学位证)旧金山大学毕业证成绩单原版一比一
 
Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝

Flow Mapping and Data Distribution on Mesh-based Deep Learning Accelerator

  • 1. Flow Mapping and Data Distribution on Mesh-based Deep Learning Accelerator. Presented by Hesam Shabani. Authors: Seyedeh Yasaman Hosseini Mirmahaleh (1), Midia Reshadi (1), Hesam Shabani (2), Xiaochen Guo (2), Nader Bagherzadeh (3). Affiliations: (1) Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran; (2) Lehigh University, Bethlehem, PA, USA; (3) Department of Electrical Engineering and Computer Science, University of California Irvine, Irvine, CA, USA. Contact: yasaman.hosseini@srbiau.ac.ir. NOCS 2019.
  • 2. Titles of presentation: Introduction; Investigating some related works; The purposes of our proposed deep learning accelerator; Evaluated parameters; Flow mapping method on a mesh topology; Influence of dataflow on energy consumption; Row-node stationary-based dataflow approach; Traffic distribution based on distributer nodes; Experimental results; Conclusion; Acknowledgment.
  • 3. Introduction. Applications based on machine learning algorithms are widely deployed: Internet of Things (IoT), web search engines, and image processing and data mining applications. Neural networks keep growing in depth and complexity. For convolutional and deep neural networks (CNNs and DNNs), this growth raises challenges in energy consumption, memory capacity, bandwidth requirement, memory access, and delay. Deep learning accelerators proposed to address these CNN and DNN problems rely on supercomputers, communication networks, and memory logic. Our method improves delay, energy consumption, bandwidth, and memory requirements through flow mapping, distributer nodes, a new traffic distribution mechanism on a mesh topology, and a simple router structure with tiny switches.
  • 4. Advantages and disadvantages of previously proposed deep learning accelerators (DLAs):
      TPU [6]. Advantage: speeds up processing. Disadvantage: dataflow dependency.
      DaDianNao [1]. Advantage: speeds up processing compared with GPU; improves memory capacity and energy consumption. Disadvantage: inflexible; complex neuron mapping; implements both training and inference phases; integrates optical interconnections and electrical connections; computation dependency.
      Eyeriss [5]. Advantage: improves memory access; reduces bandwidth requirement and delay. Disadvantage: no flexibility or scalability; no support for sparse DNNs (SDNN); computation dependency.
      Eyeriss V.2 [16]. Advantage: scalability; supports SDNN. Disadvantage: increased MAC complexity.
      MAERI [8]. Advantage: speeds up processing; improves memory access; flexible; independent of dataflow. Disadvantage: restricted to only one direction for traffic distribution; higher power consumption compared with other accelerators.
      GPU-based systems [38]. Advantage: flexibility. Disadvantage: high energy consumption.
  • 5. Purposes of the proposed DLA: a new traffic distribution mechanism on a mesh topology using distributer nodes; a flexible DLA structure based on the filter, kernel, and channel sizes of trained CNN and DNN models; a mesh topology as the communication network for acceleration; flexible placement of distributer nodes on the mesh based on filter, kernel, and channel sizes; row-node stationary dataflow for flow mapping; better online execution of trained models by reducing delay, energy consumption, memory access, and bandwidth requirement; analysis and distribution of the traffic of AlexNet, VGG-16, and GoogleNet as example CNN and DNN models.
  • 6. Evaluated parameters: area consumption, energy consumption, delay, average utilization, bandwidth requirement, memory access.
  • 7. Flow mapping method on a mesh topology: AlexNet traffic distribution as an example of a CNN on a mesh topology; partitioning the mesh based on the kernel, filter, and channel sizes of AlexNet, used as the example for describing partitioning; architecture of our proposed mesh-based DLA (router, switches, switch selector).
  • 8. AlexNet traffic distribution on a 12×14 2D mesh (figure, panels (a) to (e), one 12×14 mesh per layer): CONV1 11×55, CONV2 5×27, CONV3 3×13, CONV4 3×13, CONV5 3×13.
  • 9. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV1 (figure: AlexNet architecture [19]).
  • 10. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV1 (figure: one 11×7 partition).
  • 11. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV1 (figure: two 11×7 partitions).
  • 12. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV2.
  • 13. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV2 (figure: one 5×13 partition).
  • 14. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV2 (figure: one 5×13 and one 5×14 partition).
  • 15. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV3-5.
  • 16. Partitioning the mesh based on kernel, filter, and channel sizes of AlexNet for CONV3-5 (figure: four 3×13 partitions).
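The partition sizes shown on slides 10 to 16 can be sanity-checked against the 12×14 mesh with a few lines of Python. This is only a sketch under stated assumptions: the shelf-style placement (`fits_on_mesh`) and the occupancy measure (`occupied_nodes`, nodes covered divided by 168) are illustrative helpers introduced here, not the authors' mapping algorithm, and the occupancy figures are not the "average utilization" reported later in the slides.

```python
MESH_ROWS, MESH_COLS = 12, 14  # mesh size taken from the slides

# Partition sizes (rows, cols) read off slides 10-16.
PARTITIONS = {
    "CONV1": [(11, 7), (11, 7)],
    "CONV2": [(5, 13), (5, 14)],
    "CONV3-5": [(3, 13)] * 4,
}


def occupied_nodes(sets):
    """Number of mesh nodes covered by a list of (rows, cols) partitions."""
    return sum(rows * cols for rows, cols in sets)


def fits_on_mesh(sets, rows=MESH_ROWS, cols=MESH_COLS):
    """Assumed placement rule: fill a shelf left to right, open a new shelf
    below when the next partition no longer fits in the remaining columns."""
    used_rows = 0  # rows consumed by completed shelves
    shelf_h = 0    # height of the current shelf
    shelf_w = 0    # columns already used in the current shelf
    for h, w in sets:
        if w > cols:
            return False
        if shelf_w + w > cols:  # start a new shelf
            used_rows += shelf_h
            shelf_h, shelf_w = 0, 0
        shelf_h = max(shelf_h, h)
        shelf_w += w
    return used_rows + shelf_h <= rows


for layer, sets in PARTITIONS.items():
    share = occupied_nodes(sets) / (MESH_ROWS * MESH_COLS)
    print(f"{layer}: fits={fits_on_mesh(sets)}, occupancy={share:.1%}")
```

Under this placement rule, the two 11×7 sets sit side by side, the 5×13 and 5×14 sets stack vertically, and the four 3×13 sets fill all 12 rows.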
  • 17. Architecture of proposed DLA (figure labels: ifmap, filter, Psum, global buffer).
  • 18. Architecture of proposed DLA (figure labels: ifmap, filter, Psum, global buffer, switch selector).
  • 19. Architecture of proposed DLA (figure labels: ifmap, filter, Psum, global buffer, switch selector, 12×15 2D mesh, 12×14).
  • 20. Router (figure labels: switch; north, west, south, east, and local buffers; multicast buffer). The router uses a multicast buffer, an on/off buffer backpressure mechanism, and a two-stage pipeline.
  • 21. Switch (figure labels: N, S, W, E inputs and outputs; Clk; EN; select lines s0 to s3; MUX; DeMUX; local port).
  • 22. Switch selector (figure labels: select lines S0 to S4; EN; inputs In0, In1, In3; switch address; R-decoder; C-decoder; Mux 0 to Mux N-1).
  • 23. Dataflows:
      Weight stationary (WS): weight elements are received from the global buffer (GB) and broadcast to the PEs. Once a weight is fixed in each PE, the convolution is computed between that fixed weight and the ifmap elements broadcast from the GB to the PEs [3], [4]. Example: microswitch array [12].
      Output stationary (OS): outputs, or both weights and input activations, are mapped to the PEs from the GB. The Psum results are sent to the GB after local computation finishes [2], [4], [7]. Examples: TPU, systolic array.
      Row stationary (RS): the ifmap and filter are transferred from the GB to the PE units horizontally, whereas Psums are accumulated vertically by the multiply-accumulate (MAC) operations of the PEs; the accumulated Psums are then transferred to the GB [5]. Examples: Eyeriss [5], Eyeriss V.2 [16], microswitch array [4].
      Row-node stationary (RNS): we propose RNS dataflow as a new approach for traffic distribution of trained DNN models, based on flow mapping and the memory access mechanism. With RNS dataflow, an accelerator uses distributer nodes to transfer data to sets of nodes in the vertical and horizontal directions in parallel.
  • 24. Row-node stationary dataflow (figure, panels (a) to (c); labels: filter rows 1 to 3, ifmap rows 1 to 5, Psum rows 1 to 3, node, distributer node). A row of ifmap values is reused and distributed in the vertical and horizontal directions based on the location of the distributer node. A row of filter weights is reused and distributed in the vertical and horizontal directions based on the location of the distributer node. A row of Psums is accumulated vertically.
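The row-level computation behind the row-stationary description (filter rows and ifmap rows paired per node, Psums accumulated vertically) can be sketched in plain Python. This is a minimal illustration under stated assumptions: stride 1, no padding, a single channel, and function names (`conv1d_row`, `row_stationary_conv2d`, `direct_conv2d`) chosen here for the example. It models only the arithmetic, not the distributer-node traffic of the proposed RNS dataflow.

```python
def conv1d_row(ifmap_row, filter_row):
    """1D convolution (stride 1, no padding) of one ifmap row with one filter row.
    In a row-based dataflow this is the work assigned to a single node."""
    n_out = len(ifmap_row) - len(filter_row) + 1
    return [
        sum(ifmap_row[i + k] * w for k, w in enumerate(filter_row))
        for i in range(n_out)
    ]


def row_stationary_conv2d(ifmap, filt):
    """2D convolution built from per-row 1D convolutions.
    The node for (filter row r, output row e) uses ifmap row e + r;
    partial sums are accumulated over r, i.e. vertically."""
    n_filter_rows = len(filt)
    n_out_rows = len(ifmap) - n_filter_rows + 1
    out = []
    for e in range(n_out_rows):
        psum = None
        for r in range(n_filter_rows):
            partial = conv1d_row(ifmap[e + r], filt[r])
            psum = partial if psum is None else [a + b for a, b in zip(psum, partial)]
        out.append(psum)
    return out


def direct_conv2d(ifmap, filt):
    """Reference 2D convolution used to check the row-based version."""
    fr, fc = len(filt), len(filt[0])
    return [
        [
            sum(ifmap[y + r][x + c] * filt[r][c] for r in range(fr) for c in range(fc))
            for x in range(len(ifmap[0]) - fc + 1)
        ]
        for y in range(len(ifmap) - fr + 1)
    ]


# Same shape as the slide-24 figure: 3 filter rows, 5 ifmap rows, 3 Psum rows.
example_ifmap = [[5 * i + j for j in range(5)] for i in range(5)]
example_filter = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert row_stationary_conv2d(example_ifmap, example_filter) == direct_conv2d(
    example_ifmap, example_filter
)
```

Each filter row is reused across all output rows and each ifmap row is reused by several (filter row, output row) pairs, which is the reuse the slide captions describe.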
  • 25. AlexNet traffic distribution for CONV1 on 12×15 2D mesh using distributer nodes (figure (a); labels: 12×15 2D mesh, 12×14, ifmap, filter, Psum, shared bus, distributer node, destination node).
  • 26. AlexNet traffic distribution for CONV1 on 12×15 2D mesh using distributer nodes (figure (a), continued).
  • 27. AlexNet traffic distribution for CONV1 on 12×15 2D mesh using distributer nodes (figure (a), continued).
  • 28. AlexNet traffic distribution for CONV2 on 12×15 2D mesh using distributer nodes (figure (b); labels: 12×15 2D mesh, 12×14, distributer node, destination node).
  • 29. AlexNet traffic distribution for CONV2 on 12×15 2D mesh using distributer nodes (figure (b), continued).
  • 30. AlexNet traffic distribution for CONV2 on 12×15 2D mesh using distributer nodes (figure (b), continued).
  • 31. AlexNet traffic distribution for CONV1 on 12×15 2D mesh without distributer nodes (figure (c); labels: 12×15 2D mesh, 12×14, ifmap, filter, Psum, shared bus, destination node).
  • 32. AlexNet traffic distribution for CONV1 on 12×15 2D mesh without distributer nodes (figure (c), continued).
  • 33. AlexNet traffic distribution for CONV2 on 12×15 2D mesh without distributer nodes (figure (d); labels: 12×15 2D mesh, 12×14, destination node).
  • 34. AlexNet traffic distribution for CONV2 on 12×15 2D mesh without distributer nodes (figure (d), continued).
  • 35. Side-by-side view of panels (a) to (d): AlexNet traffic distribution for CONV1 on 12×15 2D mesh using distributer nodes, and AlexNet traffic distribution for CONV1 on 12×15 2D mesh without distributer nodes (labels: 12×14, ifmap, filter, Psum, shared bus, distributer node, destination node).
  • 36. Figure: total energy (J), axis from 0.00E+00 to 3.00E-05, comparing the 12×15 2D mesh with distributer nodes, the 12×15 2D mesh without distributer nodes, and MAERI. Figure: total delay (cycles), axis from 4600 to 4720, comparing the same three configurations.
  • 37. Figure: FPGA LUT count (number of LUTs), axis from 0.00E+00 to 9.00E+03, comparing switch area consumption of the 12×15 2D mesh with distributer nodes, the 168 switches of Eyeriss, and the 64 multiplier switches of MAERI. Figure: memory access (cycles), axis from 0 to 350, comparing the 12×15 2D mesh with and without distributer nodes for AlexNet traffic distribution, based on cycles for writing and reading memory.
  • 38. Table 1. Total runtime comparison between various dataflows with 168 PEs for CONV1 and CONV11 of VGG-16
      CONV | Dataflow | Total runtime (cycles)
      1 | RN | 17034
      1 | NLR | 501258240
      1 | WS | 25961600
      1 | Shi | 249446400
      1 | DLA | 1157409792
      1 | RS | 164204544
      11 | RN | 17722
      11 | NLR | 360316928
      11 | WS | 217317376
      11 | Shi | 2020081664
      11 | DLA | 673876224
      11 | RS | 830472192
    Table 2. Average utilization and runtime comparison between various topologies for AlexNet and GoogleNet traffic distribution
      Trained model | Topology | Array size | Compute runtime (cycles) | Average utilization (%)
      AlexNet | Proposed mesh-based DLA | 12×14 | 113352 | 88.57
      AlexNet | TPU | 256×256 | 10026200 | 96.25
      AlexNet | Systolic array | 32×32 | 2504183 | 99.12
      AlexNet | Eyeriss | 12×14 | 16377164 | 98.05
      GoogleNet | Proposed mesh-based DLA | 12×14 | 180182 | 84.52
      GoogleNet | TPU | 256×256 | 259827 | 68.67
      GoogleNet | Systolic array | 256×256 | 297163 | 68.67
  • 39. Table 3. Bandwidth requirement comparison between various topologies for AlexNet, GoogleNet, and VGG-16 traffic distributions
      Trained model | Topology | Array size | Bandwidth requirement (bytes/cycle)
      GoogleNet | Proposed mesh-based DLA | 12×14 | 0.08
      GoogleNet | TPU | 256×256 | 3.62
      GoogleNet | Systolic array | 256×256 | 49.71
      AlexNet | Proposed mesh-based DLA | 12×14 | 0.08
      AlexNet | TPU | 256×256 | 3.14
      AlexNet | Systolic array | 256×256 | 3.14
      AlexNet | Eyeriss | 12×14 | 1.02
      VGG-16 | Proposed mesh-based DLA | 12×14 | 0.08
      VGG-16 | TPU | 256×256 | 4.38
      VGG-16 | Systolic array | 256×256 | 12.108
      VGG-16 | Eyeriss | 12×14 | 0.9
    Figure: total runtime (cycles) of traffic distribution of AlexNet, VGG-16, and GoogleNet on the mesh, axis from 0.00E+00 to 3.50E+05.
  • 40. Simulation tools used:
      A cycle-accurate simulator based on SystemC, inspired by the Noxim tool [13], [10], [15]
      Xilinx Vivado [11], [14]
      Scale-sim, a Python-based cycle-accurate tool [17], [18]
      Maestro, a SystemC-based tool [9], [12]
    Summary of simulation results:
      Distributing traffic with distributer nodes reduces energy consumption by approximately 8% compared with no distributer nodes.
      The 12×15 2D mesh with distributer nodes decreases energy consumption and total delay by approximately 43.66% and 0.59%, respectively, compared with MAERI.
      LUT-based area consumption of the 12×15 2D mesh with distributer nodes is approximately 93.56% lower than that of MAERI.
      Memory access is reduced by approximately 62.5% compared with using no distributer nodes for AlexNet traffic on the 12×15 mesh.
      Row-node stationary (RN) dataflow decreases total runtime by approximately 99% compared with weight stationary (WS) dataflow in CONV1 and CONV11 of VGG-16.
      Compute runtime and average utilization of the proposed DLA improve by approximately 30.65% and 18.75%, respectively, compared with TPU for the first nine convolutions of GoogleNet.
      Bandwidth requirement of the mesh improves by approximately 98.17% and 91.1% compared with TPU and Eyeriss, respectively, for VGG-16 traffic distribution.
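Several of the percentages in this summary can be reproduced from the values in Tables 1 to 3. The sketch below uses a hypothetical helper, `reduction_pct`, defined here for illustration. Note one assumption: the 18.75% utilization figure is only reproduced when the gain is normalized by the proposed DLA's utilization rather than by the TPU's. The energy, delay, LUT, and memory-access percentages cannot be checked this way because the slides give them only as charts.

```python
def reduction_pct(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Table 3, VGG-16 bandwidth requirement (bytes/cycle).
bw_vs_tpu = reduction_pct(4.38, 0.08)      # about 98.17
bw_vs_eyeriss = reduction_pct(0.9, 0.08)   # about 91.1

# Table 2, GoogleNet compute runtime (cycles), TPU vs proposed DLA.
runtime_vs_tpu = reduction_pct(259827, 180182)  # about 30.65

# Table 2, GoogleNet average utilization (%): the slide's figure matches the
# gain divided by the proposed DLA's utilization (assumed normalization).
util_gain = 100.0 * (84.52 - 68.67) / 84.52     # about 18.75

# Table 1, VGG-16 total runtime (cycles), WS vs RN.
rn_vs_ws_conv1 = reduction_pct(25961600, 17034)
rn_vs_ws_conv11 = reduction_pct(217317376, 17722)

for name, value in [
    ("bandwidth vs TPU", bw_vs_tpu),
    ("bandwidth vs Eyeriss", bw_vs_eyeriss),
    ("runtime vs TPU", runtime_vs_tpu),
    ("utilization gain", util_gain),
    ("RN vs WS, CONV1", rn_vs_ws_conv1),
    ("RN vs WS, CONV11", rn_vs_ws_conv11),
]:
    print(f"{name}: {value:.2f}%")
```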
  • 41. Conclusion. The flow mapping method with distributer nodes reduced total energy and delay compared with the pattern without distributer nodes. Distributing CNN and DNN traffic on a mesh network with distributer nodes improves performance and throughput. Row-node stationary dataflow noticeably reduces delay and energy consumption. The proposed router, with its simpler structure and tiny switches, decreased area consumption and delay. Multicast traffic distribution from multiple sides using the distributer nodes decreases total energy and flow on the mesh.
  • 42. Acknowledgment. We thank the Synergy lab team at the Georgia Institute of Technology for responding to our questions, providing more information about the MAERI project, and kindly helping us compile and use the Maestro and Scale-sim simulators.
  • 43. References
      [1] Tao Luo, Shaoli Liu, Ling Li, Yuqing Wang, Shijin Zhang, Tianshi Chen, Zhiwei Xu, Olivier Temam, and Yunji Chen, DaDianNao: A Machine-Learning Supercomputer. Journal (Transactions on Computers), 2016.
      [2] Z. Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam, ShiDianNao: Shifting Vision Processing Closer to the Sensor. Conference (ISCA), 2015.
      [3] Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar, ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. Conference (Computer Architecture), 2016.
      [4] Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna, Rethinking NoCs for Spatial Neural Network Accelerators. Conference (NOCS), 2017.
      [5] Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze, Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. Journal (Solid-State Circuits), 2016.
      [6] N. P. Jouppi et al., In-Datacenter Performance Analysis of a Tensor Processing Unit. Conference (ArXiv), 2017.
      [7] Bert Moons and Marian Verhelst, A 0.3-2.6 TOPS/W Precision-Scalable Processor for Real-Time Large-Scale ConvNets. Symposium (VLSI), 2016.
      [8] Hyoukjun Kwon, Joel S. Emer, and Tushar Krishna, MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. Conference (ASPLOS'18), 2018.
      [9] Hyoukjun Kwon, Michael Pellauer, and Tushar Krishna, MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators. Conference (ArXiv), 2018.
      [10] https://github.com/davidepatti/noxim
      [11] https://www.xilinx.com/products/design-tools/vivado.html
      [12] http://synergy.ece.gatech.edu/tools/maestro/
      [13] Vincenzo Catania, Andrea Mineo, Maurizio Palesi, Davide Patti, and Salvatore Monteleone, Cycle-Accurate Network on Chip Simulation with Noxim. Journal (TOMACS), 2016.
      [14] Hyoukjun Kwon and Tushar Krishna, OpenSMART: Single-Cycle Multi-hop NoC Generator in BSV and Chisel. Conference (ISPASS), 2017.
      [15] Kun-Chih (Jimmy) Chen and Ting-Yi Wang, NN-Noxim: High-Level Cycle-Accurate NoC-based Neural Networks Simulator. Conference (NOCARC), 2018.
      [16] Yu-Hsin Chen, Joel S. Emer, and Vivienne Sze, Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks. Journal (ArXiv), 2018.
      [17] Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna, SCALE-Sim: Systolic CNN Accelerator Simulator. Conference (ASPLOS'18), 2018.
      [18] https://github.com/ARM-software/SCALE-Sim
      [19] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks. Conference (NIPS), 2012.
  • 44. Thank you for your attention. Questions?