Introduction to Eyeriss¹
Michael (Tao-Yi) Lee
tylee@mlpanda.rocks
NTU IoX Center
October 24, 2017
¹ Y. H. Chen et al. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks”. In: IEEE Journal of Solid-State Circuits 52.1 (Jan. 2017), pp. 127–138.
Outline
1 Introduction
2 Eyeriss Highlights
Memory Hierarchy
Row Stationary Data Flow
Network-on-a-chip (NoC)
Compression and Data Gating
3 Summary
4 Appendix
Michael Lee (NTU) Introduction to Eyeriss October 24, 2017 2 / 37
Introduction
Contributions of Eyeriss
A novel energy-efficient CNN dataflow that has been verified in
a fabricated chip
A taxonomy of CNN dataflows that classifies previous work into
three categories (WS, OS, NLR)
Figure: Eyeriss die photo, 4000 µm × 4000 µm, showing the 168-PE array and the global buffer (35 fps @ 278 mW running AlexNet [10])
Features of Eyeriss
Use row stationary (RS) on spatial architecture with 168
processing elements to reduce energy cost of data flow
4-level memory hierarchy: maximizes local data reuse
Network-on-a-chip (NoC)
Multicast
P2P single-cycle delivery
Compression and data gating
Run-length compression (RLC)
PE data gating
Architecture
Core clock: 100–250 MHz / Link clock: 90 MHz
Recap on CNN
Forward computation in CONV layers
Given ofmap O, ifmap I, bias B, weight W, stride size U
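\[
O[z][u][x][y] = \mathrm{ReLU}\Bigl( B[u] + \underbrace{\sum_{k=0}^{C-1} \sum_{i=0}^{R-1} \sum_{j=0}^{S-1} I[z][k][Ux+i][Uy+j] \times W[u][k][i][j]}_{\text{partial sum}} \Bigr) \tag{1}
\]
where $0 \le z < N$, $0 \le u < M$, $0 \le y < E$, $0 \le x < F$
\[ E = (H - R + U)/U \tag{2} \]
\[ F = (W - S + U)/U \tag{3} \]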
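As a cross-check of Eq. (1) to (3), here is a minimal pure-Python sketch of the CONV-layer forward pass. The function name `conv_forward` and the nested-list layout are assumptions made for illustration and do not come from the Eyeriss paper. The output size is derived per axis from the array dimensions, which coincides with E and F when ifmaps and filters are square (as in AlexNet).

```python
def conv_forward(I, W, B, U):
    """Direct evaluation of Eq. (1) with integer data.

    I: ifmaps  [N][C][dim0][dim1]
    W: filters [M][C][R][S]
    B: biases  [M]
    U: stride
    Returns O[N][M][out0][out1], indexed O[z][u][x][y] as in Eq. (1).
    """
    N, C = len(I), len(I[0])
    dim0, dim1 = len(I[0][0]), len(I[0][0][0])   # ifmap spatial sizes
    M = len(W)
    R, S = len(W[0][0]), len(W[0][0][0])         # filter spatial sizes
    out0 = (dim0 - R + U) // U                   # range of x
    out1 = (dim1 - S + U) // U                   # range of y
    O = [[[[0] * out1 for _ in range(out0)] for _ in range(M)] for _ in range(N)]
    for z in range(N):
        for u in range(M):
            for x in range(out0):
                for y in range(out1):
                    psum = B[u]                  # start from the bias
                    for k in range(C):
                        for i in range(R):
                            for j in range(S):
                                psum += I[z][k][U * x + i][U * y + j] * W[u][k][i][j]
                    O[z][u][x][y] = max(0, psum)  # ReLU
    return O


# Hypothetical toy data: one 3x3 ifmap, one 2x2 filter, bias 5, stride 1.
toy_ifmap = [[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]]
toy_filter = [[[[1, 0], [0, -1]]]]
print(conv_forward(toy_ifmap, toy_filter, [5], 1))
```

The seven nested loops make the reuse opportunities visible: every weight is used for all (z, x, y), and every pixel is used by all M filters, which is what the dataflows discussed later try to exploit.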
Eyeriss Highlights Memory Hierarchy
PE Matrix and Memory Hierarchy
1. Spatial Architecture: Allows data to flow in four directions
2. Each PE operates independently on the core clock CLKcore (i.e. the array is not systolic)
Figure: array of PEs backed by DRAM; each PE is a MAC unit with pixel, weight (W), and psum-in inputs and a psum-out output
Challenge
How to optimize data flow in order to minimize energy consumption?
PE Matrix and Memory Hierarchy (Eyeriss)
4-level memory hierarchy
DRAM → Global Buffer (GLB) → Network-on-a-Chip (NoC) →
Register File (RF)
Figure: on-chip, PEs each with a 1 kB RF (EC = 1X), connected through the NoC (EC = 2X) to the GLB (EC = 6X); off-chip DRAM, accessed through a FIFO (EC = 500X)
EC: relative energy cost
Eyeriss Highlights Row Stationary Data Flow
CNN dataflows
Row Stationary
Weight Stationary (WS)
Output Stationary (OS)
No Local Reuse (NLR)
Comparison of Dataflows (I)
Focus on flows of psum, weight and pixels in next slides
RS is 1.4X to 2.5X more energy efficient than the other dataflows
Row Stationary in 1D Convolution
Kernel [a b c] ∗ Image [a b c d e] = PSum [a b c]
Row Stationary in 1D Convolution (PE)
Kernel [a b c] ∗ Image [a b c d e] = PSum [a b c]
PE register file, step by step:
Step 1: image window (a, b, c) × kernel row (a, b, c) ⇒ psum a; pixels d and e not yet loaded
Step 2: image window (b, c, d) × kernel row (a, b, c) ⇒ psum b; pixel e not yet loaded; psum a already output
Step 3: image window (c, d, e) × kernel row (a, b, c) ⇒ psum c; psums a, b, c complete
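The three steps above can be sketched in a few lines of Python. This is only an illustrative model of one PE, assuming the kernel row stays in the register file while the image row slides through a window of the same length; the function name `pe_row_conv` is hypothetical.

```python
def pe_row_conv(filter_row, image_row):
    """One PE computing a 1D convolution with the filter row held stationary.

    The filter row stays put for the whole pass. The image row streams
    through a sliding window of the same length, and every window position
    yields one psum. Assumes len(image_row) >= len(filter_row).
    """
    S = len(filter_row)
    window = list(image_row[:S])              # first S pixels fill the window
    psums = []
    for next_pixel in list(image_row[S:]) + [None]:
        # Multiply-accumulate over the current window.
        psums.append(sum(w * p for w, p in zip(filter_row, window)))
        if next_pixel is not None:
            window = window[1:] + [next_pixel]  # drop oldest pixel, shift in a new one
    return psums


print(pe_row_conv([1, 2, 3], [1, 2, 3, 4, 5]))
```

Each pixel is fetched into the PE once and then reused for up to three psums, which is the local reuse the row stationary dataflow relies on.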
Row Stationary in 2D Convolution
Kernel (3 × 3):
1a 1b 1c
2a 2b 2c
3a 3b 3c
∗
Image (5 × 5):
1a 1b 1c 1d 1e
2a 2b 2c 2d 2e
3a 3b 3c 3d 3e
4a 4b 4c 4d 4e
5a 5b 5c 5d 5e
=
PSum (3 × 3):
1a 1b 1c
2a 2b 2c
3a 3b 3c
Row Stationary in 2D Convolution (PE)
PE array mapping: each PE holds one filter row and one image row, and contributes to one psum row
                psum row 1    psum row 2    psum row 3
filter row 1    image row 1   image row 2   image row 3
filter row 2    image row 2   image row 3   image row 4
filter row 3    image row 3   image row 4   image row 5
Step 1: every PE applies its filter row to columns a to c of its image row ⇒ the three PE columns accumulate psums 1a, 2a, 3a
Step 2: the window slides to columns b to d ⇒ psums 1b, 2b, 3b
Step 3: the window slides to columns c to e ⇒ psums 1c, 2c, 3c
Psums propagate vertically
Pixels (image rows) propagate diagonally
Weights (filter rows) propagate horizontally
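The 2D mapping can be checked with a small Python sketch. It is a functional model under stated assumptions (stride 1, no padding, hypothetical function names), not the chip's control logic: PE (i, j) runs a 1D convolution of filter row i with image row i + j (0-based), and the results for the same j are summed, which mirrors the vertical psum accumulation.

```python
def conv1d(filter_row, image_row):
    """1D convolution of one filter row with one image row (stride 1)."""
    S = len(filter_row)
    return [sum(filter_row[j] * image_row[x + j] for j in range(S))
            for x in range(len(image_row) - S + 1)]


def row_stationary_conv2d(kernel, image):
    """2D convolution assembled from per-PE 1D row convolutions."""
    R = len(kernel)
    out_rows = len(image) - R + 1
    out_cols = len(image[0]) - len(kernel[0]) + 1
    psum_rows = [[0] * out_cols for _ in range(out_rows)]
    for i in range(R):                  # PE row: filter row i is shared horizontally
        for j in range(out_rows):       # PE column: produces psum row j
            partial = conv1d(kernel[i], image[i + j])   # image row i + j is shared diagonally
            # Vertical accumulation of psums within one PE column.
            psum_rows[j] = [a + b for a, b in zip(psum_rows[j], partial)]
    return psum_rows


def direct_conv2d(kernel, image):
    """Reference 2D convolution used only to validate the mapping."""
    R, S = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * image[x + i][y + j]
                 for i in range(R) for j in range(S))
             for y in range(len(image[0]) - S + 1)]
            for x in range(len(image) - R + 1)]


# Hypothetical test data: a 5x5 ramp image and a 3x3 edge filter.
demo_image = [[r * 5 + c for c in range(5)] for r in range(5)]
demo_kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
print(row_stationary_conv2d(demo_kernel, demo_image))
```

With a 3 × 3 kernel and a 5 × 5 image the two loops visit exactly nine (i, j) pairs, one per PE in the array shown above.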
Weight Stationary (WS)³
Minimize weight read energy consumption
Maximize convolutional and filter reuse of weights
Examples:
Chakradhar et al. 2010
Gokhale et al. 2014
Park et al. 2015
Cavigelli et al. 2015
³ Image adapted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
Output Stationary (OS)⁴
Minimize partial sum R/W energy consumption
Maximize local accumulation
Examples:
Gupta et al. 2015
Du et al. 2015
Peemen et al. 2013
⁴ Image adapted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
No Local Reuse (NLR)⁵
Use a large global buffer as shared storage
Reduce DRAM access energy consumption
Examples:
Chen et al. 2014 (DianNao)
Chen et al. 2014 (DaDianNao)
Zhang et al. 2015
⁵ Image adapted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
Comparison of Dataflows (II)
RS makes heavy use of data held in the local register files (RF) ⇒ saves the energy of moving data
Beyond 2D Convolution - Multiple Images
Processing in PE
Concatenate image rows
Beyond 2D Convolution - Multiple Filters
Processing in PE
Interleave filter rows
Beyond 2D Convolution - Multiple Channels
Processing in PE
Interleave channels
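The three "beyond 2D" cases on the last slides can be summarized with a short Python sketch. It models only which operand is shared inside one PE, not the actual concatenated or interleaved storage order in the RF, and all function names are hypothetical.

```python
def conv1d(filter_row, image_row):
    """1D convolution of one filter row with one image row (stride 1)."""
    S = len(filter_row)
    return [sum(filter_row[j] * image_row[x + j] for j in range(S))
            for x in range(len(image_row) - S + 1)]


def multi_image(filter_row, image_rows):
    """Several images, one filter: the filter row is reused for every image row."""
    return [conv1d(filter_row, row) for row in image_rows]


def multi_filter(filter_rows, image_row):
    """Several filters, one image: the image row is reused for every filter row."""
    return [conv1d(f, image_row) for f in filter_rows]


def multi_channel(filter_rows, image_rows):
    """Several channels: per-channel results are accumulated into one psum row."""
    total = None
    for f, row in zip(filter_rows, image_rows):
        part = conv1d(f, row)
        total = part if total is None else [a + b for a, b in zip(total, part)]
    return total


print(multi_image([1, 1], [[1, 2, 3], [4, 5, 6]]))
print(multi_filter([[1, 1], [1, -1]], [1, 2, 3]))
print(multi_channel([[1, 1], [1, -1]], [[1, 2, 3], [4, 5, 6]]))
```

In the first two cases the number of psum rows grows with the number of images or filters, while in the channel case everything collapses into a single psum row, so psum storage does not grow with the channel count.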
AlexNet Revisited
AlexNet Convolutional Layer Configurations
Layer  Filter Size (R)  # Filters (M)  # Channels (C)  Stride  Max Pooling
1      11×11            96             3               4
2      5×5              256            48              1       3×3 S2
3      3×3              384            256             1       3×3 S2
4      3×3              384            192             1
5      3×3              256            192             1       3×3 S2
AlexNet PE Mapping
AlexNet Inter-Pass Data Caching
AlexNet Shape Parameters
How do we map AlexNet onto Eyeriss?
L  H    R   E   C    M    U  |  m   n  e   p   q  r  t
1  227  11  55  3    96   4  |  96  1  7   16  1  1  2
2  31   5   27  48   256  1  |  64  1  27  16  2  1  1
3  15   3   13  256  384  1  |  64  4  13  16  4  1  4
4  15   3   13  192  384  1  |  64  4  13  16  3  2  2
5  15   3   13  192  256  1  |  64  4  13  16  3  2  2
Shape parameters: L: layer, H: ifmap width, R: kernel width, E: ofmap width, C: # channels, M: # kernels, U: stride
Mapping parameters: m: # ofmap channels stored in GLB, n: # ifmaps, e: width of PE set, p: # filters processed, q: # channels processed, r: # PE sets processing different channels, t: # PE sets processing different filters
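The shape columns can be sanity-checked against Eq. (2) with a few lines of Python. The tuples below restate the H, R, E, C, M, U values of the table; the helper names and the MAC-count formula (E² ofmap pixels × R² × C products × M filters, for square ifmaps and filters) are assumptions added for illustration.

```python
ALEXNET_CONV = [
    # (layer, H, R, E, C, M, U)
    (1, 227, 11, 55, 3, 96, 4),
    (2, 31, 5, 27, 48, 256, 1),
    (3, 15, 3, 13, 256, 384, 1),
    (4, 15, 3, 13, 192, 384, 1),
    (5, 15, 3, 13, 192, 256, 1),
]


def ofmap_size(H, R, U):
    """Eq. (2): ofmap width for ifmap width H, kernel width R, stride U."""
    return (H - R + U) // U


def macs_per_image(E, R, C, M):
    """MACs for one ifmap: E*E ofmap pixels, R*R*C products each, M filters."""
    return E * E * R * R * C * M


for layer, H, R, E, C, M, U in ALEXNET_CONV:
    # Every row of the table should satisfy Eq. (2).
    assert ofmap_size(H, R, U) == E
    print(layer, ofmap_size(H, R, U), macs_per_image(E, R, C, M))
```

The H values already include padding, which is why layer 2 lists H = 31 for a 27-wide ofmap.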
AlexNet Shape Mapping Illustrated
Eyeriss Highlights Network-on-a-chip (NoC)
NoC Optimized for RS
Global input/output network (GI/ON): a Multicast Controller (MC) delivers GLB data only to the assigned PEs. Data is tagged with (row, col) in the GLB
filter GI/ON
ifmap GI/ON
psum GI/ON
Local network: a dedicated 64b data bus passes the psums directly from the bottom PE to the top PE
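The tag-matching idea behind the multicast delivery can be sketched as follows. This is a simplified software model, assuming every PE is configured with a (row, col) ID and accepts a datum only when its ID equals the datum's tag; the `PE` class and `multicast` function are hypothetical names, and the real controllers are hardware blocks on the buses.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    row_id: int
    col_id: int
    inbox: list = field(default_factory=list)   # data accepted by this PE


def multicast(pes, tag, value):
    """Deliver `value` to every PE whose (row_id, col_id) equals `tag`.

    Returns the number of PEs that accepted the value.
    """
    receivers = [pe for pe in pes if (pe.row_id, pe.col_id) == tag]
    for pe in receivers:
        pe.inbox.append(value)
    return len(receivers)


# Two PEs share the ID (0, 0), so one transfer reaches both of them.
array = [PE(0, 0), PE(0, 0), PE(1, 0)]
delivered = multicast(array, (0, 0), 7)
print(delivered, [pe.inbox for pe in array])
```

Giving several PEs the same ID turns a single GLB read into a multicast, while unique IDs degrade gracefully to point-to-point delivery.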
Global Input / Output Network
Populate Data with Global Input / Output Network
Eyeriss Highlights Compression and Data Gating
Run-Length Compression (RLC)
ReLU produces many zeros in the activated ofmaps; RLC compresses them to save power in DRAM R/W
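A minimal run-length codec in Python illustrates the idea. Each output pair is (run, level): the number of zeros preceding a value, then the value itself. The 31-zero limit per pair (a 5-bit run field) is an assumption of this sketch, and the packing of pairs into memory words is left out.

```python
MAX_RUN = 31  # assumption: the run field is 5 bits wide


def rlc_encode(values, max_run=MAX_RUN):
    """Encode a list of integers as (run, level) pairs."""
    pairs, run = [], 0
    for v in values:
        if v == 0 and run < max_run:
            run += 1                      # extend the current run of zeros
        else:
            # Either a nonzero value, or a zero that overflows the run field
            # and is therefore stored explicitly as level 0.
            pairs.append((run, v))
            run = 0
    if run > 0:
        pairs.append((run - 1, 0))        # trailing zeros: the last one becomes the level
    return pairs


def rlc_decode(pairs):
    """Invert rlc_encode."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    return out


sample = [0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22]
print(rlc_encode(sample))
```

The sparser the ofmap, the fewer pairs are written, so the DRAM traffic shrinks with the fraction of zeros that ReLU produces.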
Data Gating / Zero Skipping
Skip the MAC operation when either the pixel or the weight is zero
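In software terms the gating rule amounts to the following Python sketch. The function name `gated_dot` and the counters are illustrative assumptions; the point is only that a gated MAC leaves the result unchanged, because the skipped product is zero.

```python
def gated_dot(pixels, weights):
    """Dot product that skips every MAC whose pixel or weight is zero.

    Returns (result, macs_performed, macs_skipped).
    """
    acc, performed, skipped = 0, 0, 0
    for p, w in zip(pixels, weights):
        if p == 0 or w == 0:
            skipped += 1          # product would be zero, so the MAC is gated
            continue
        acc += p * w
        performed += 1
    return acc, performed, skipped


print(gated_dot([0, 2, 3, 0], [5, 0, 4, 1]))
```

The ratio of skipped to total MACs gives a rough feel for how much switching activity the gating can avoid on sparse data.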
Summary
Row Stationary Energy Breakdown
Performance Summary and Comparison
                             Eyeriss [5]   NVIDIA TK1
Technology                   65nm 1P9M     28nm
Chip Size (mm)               4.0×4.0       N/A
Core Area (mm)               3.5×3.5       N/A
Gate Count                   1176k         N/A
Word Bit-Width               16b Fixed     32b Float
Core Clock (MHz)             200           852
On-Chip Buffer Size (kB)     108           64
Total Register Size (kB)     75.3          256
#MAC                         168           192
Throughput (fps)             34.7          68
Measured Power Idle (mW)                   3700
Measured Power Active (mW)   278           10002
Summary
RS optimizes for best overall energy efficiency while existing
CNN dataflows only focus on certain data types.
RS has higher energy efficiency than existing dataflows
1.4X ∼ 2.5X higher in CONV layers
at least 1.3X higher in FC layers. (batch size ≥ 16)
Appendix
Bibliography I
Lukas Cavigelli et al. “Origami: A Convolutional Network Accelerator”. In: Proceedings of the 25th Edition on Great
Lakes Symposium on VLSI. GLSVLSI ’15. Pittsburgh, Pennsylvania, USA: ACM, 2015, pp. 199–204. isbn:
978-1-4503-3474-7. doi: 10.1145/2742060.2743766. url: http://doi.acm.org/10.1145/2742060.2743766.
Srimat Chakradhar et al. “A Dynamically Configurable Coprocessor for Convolutional Neural Networks”. In:
Proceedings of the 37th Annual International Symposium on Computer Architecture. ISCA ’10. Saint-Malo, France:
ACM, 2010, pp. 247–257. isbn: 978-1-4503-0053-7.
Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient
Dataflow for Convolutional Neural Networks.
Tianshi Chen et al. “DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning”. In:
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and
Operating Systems. ASPLOS ’14. Salt Lake City, Utah, USA: ACM, 2014, pp. 269–284. isbn: 978-1-4503-2305-5.
doi: 10.1145/2541940.2541967. url: http://doi.acm.org/10.1145/2541940.2541967.
Y. H. Chen et al. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural
Networks”. In: IEEE Journal of Solid-State Circuits 52.1 (Jan. 2017), pp. 127–138.
Y. Chen et al. “DaDianNao: A Machine-Learning Supercomputer”. In: 2014 47th Annual IEEE/ACM International
Symposium on Microarchitecture. Dec. 2014, pp. 609–622. doi: 10.1109/MICRO.2014.58.
Bibliography II
[7] Z. Du et al. “ShiDianNao: Shifting vision processing closer to the sensor”. In: 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). June 2015, pp. 92–104. doi: 10.1145/2749469.2750389.
[8] V. Gokhale et al. “A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks”. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. June 2014, pp. 696–701. doi: 10.1109/CVPRW.2014.106.
[9] Suyog Gupta et al. “Deep Learning with Limited Numerical Precision”. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. ICML’15. Lille, France: JMLR.org, 2015, pp. 1737–1746. url: http://dl.acm.org/citation.cfm?id=3045118.3045303.
[10] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. 2012, pp. 1097–1105.
[11] S. Park et al. “4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications”. In: 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. Feb. 2015, pp. 1–3. doi: 10.1109/ISSCC.2015.7062935.
[12] M. Peemen et al. “Memory-centric accelerator design for Convolutional Neural Networks”. In: 2013 IEEE 31st International Conference on Computer Design (ICCD). Oct. 2013, pp. 13–19. doi: 10.1109/ICCD.2013.6657019.
Bibliography III
[13] Chen Zhang et al. “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks”. In: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA ’15. Monterey, California, USA: ACM, 2015, pp. 161–170. isbn: 978-1-4503-3315-3. doi: 10.1145/2684746.2689060. url: http://doi.acm.org/10.1145/2684746.2689060.
Michael Lee (NTU) Introduction to Eyeriss October 24, 2017 37 / 37

More Related Content

What's hot

敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)cvpaper. challenge
 
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)Hironobu Suzuki
 
【DL輪読会】時系列予測 Transfomers の精度向上手法
【DL輪読会】時系列予測 Transfomers の精度向上手法【DL輪読会】時系列予測 Transfomers の精度向上手法
【DL輪読会】時系列予測 Transfomers の精度向上手法Deep Learning JP
 
DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会Masashi Shibata
 
論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"Yuta Koreeda
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
Hopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことHopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことNVIDIA Japan
 
[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法
[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法
[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法Deep Learning Lab(ディープラーニング・ラボ)
 
Scikit learnで学ぶ機械学習入門
Scikit learnで学ぶ機械学習入門Scikit learnで学ぶ機械学習入門
Scikit learnで学ぶ機械学習入門Takami Sato
 
(文献紹介)Depth Completionの最新動向
(文献紹介)Depth Completionの最新動向(文献紹介)Depth Completionの最新動向
(文献紹介)Depth Completionの最新動向Morpho, Inc.
 
Chips alliance omni xtend overview
Chips alliance omni xtend overviewChips alliance omni xtend overview
Chips alliance omni xtend overviewRISC-V International
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)Fellowship at Vodafone FutureLab
 
[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送
[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送
[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送Google Cloud Platform - Japan
 
On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsYoonho Lee
 
adversarial training.pptx
adversarial training.pptxadversarial training.pptx
adversarial training.pptxssuserc45ddf
 
SSII2018TS: 大規模深層学習
SSII2018TS: 大規模深層学習SSII2018TS: 大規模深層学習
SSII2018TS: 大規模深層学習SSII
 
fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也
fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也
fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也Preferred Networks
 
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2Preferred Networks
 
さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)Takanori Sejima
 

What's hot (20)

敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)
 
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
 
【DL輪読会】時系列予測 Transfomers の精度向上手法
【DL輪読会】時系列予測 Transfomers の精度向上手法【DL輪読会】時系列予測 Transfomers の精度向上手法
【DL輪読会】時系列予測 Transfomers の精度向上手法
 
DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会
 
論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
Hopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことHopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないこと
 
[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法
[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法
[Track2-2] 最新のNVIDIA AmpereアーキテクチャによるNVIDIA A100 TensorコアGPUの特長とその性能を引き出す方法
 
Scikit learnで学ぶ機械学習入門
Scikit learnで学ぶ機械学習入門Scikit learnで学ぶ機械学習入門
Scikit learnで学ぶ機械学習入門
 
(文献紹介)Depth Completionの最新動向
(文献紹介)Depth Completionの最新動向(文献紹介)Depth Completionの最新動向
(文献紹介)Depth Completionの最新動向
 
Chips alliance omni xtend overview
Chips alliance omni xtend overviewChips alliance omni xtend overview
Chips alliance omni xtend overview
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
 
[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送
[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送
[Cloud OnAir] BigQuery の一般公開データセットを 利用した実践的データ分析 2019年3月28日 放送
 
On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning Algorithms
 
BERT入門
BERT入門BERT入門
BERT入門
 
adversarial training.pptx
adversarial training.pptxadversarial training.pptx
adversarial training.pptx
 
SSII2018TS: 大規模深層学習
SSII2018TS: 大規模深層学習SSII2018TS: 大規模深層学習
SSII2018TS: 大規模深層学習
 
fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也
fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也
fpgax #11+TFUG ハード部:DNN専用ハードについて語る会-2019-02-02 MN-coreについて 金子 紘也
 
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
最新リリース:Optuna V3の全て - 2022/12/10 Optuna Meetup #2
 
さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)さいきんのMySQLに関する取り組み(仮)
さいきんのMySQLに関する取り組み(仮)
 

Similar to Eyeriss Introduction

ArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysGoon83
 
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...Advanced-Concepts-Team
 
Data Intensive Research with DISPEL
Data Intensive Research with DISPELData Intensive Research with DISPEL
Data Intensive Research with DISPELOscar Corcho
 
TMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems VisualizationTMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems VisualizationIosif Itkin
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 
pandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statisticspandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and StatisticsWes McKinney
 
Word Embedding Models & Support Vector Machines for Text Classification
Word Embedding Models & Support Vector Machines for Text ClassificationWord Embedding Models & Support Vector Machines for Text Classification
Word Embedding Models & Support Vector Machines for Text ClassificationNa'im Tyson
 
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...Lixi Conrads
 
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondKernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondAnne Nicolas
 
Cs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN ArchitectureCs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN ArchitectureYanbin Kong
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...semanticsconference
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017André Valdestilhas
 
Introduction to neural networks and Keras
Introduction to neural networks and KerasIntroduction to neural networks and Keras
Introduction to neural networks and KerasJie He
 
Property Graphs with Time
Property Graphs with TimeProperty Graphs with Time
Property Graphs with TimeopenCypher
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningRafael Ferreira da Silva
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics EnvironmentIan Foster
 
HPC I/O for Computational Scientists
HPC I/O for Computational ScientistsHPC I/O for Computational Scientists
HPC I/O for Computational Scientistsinside-BigData.com
 

Similar to Eyeriss Introduction (20)

ArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on Arrays
 
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
 
Data Intensive Research with DISPEL
Data Intensive Research with DISPELData Intensive Research with DISPEL
Data Intensive Research with DISPEL
 
TMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems VisualizationTMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems Visualization
 
Leopard ISWC Semantic Web Challenge 2017
Leopard ISWC Semantic Web Challenge 2017Leopard ISWC Semantic Web Challenge 2017
Leopard ISWC Semantic Web Challenge 2017
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
pandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statisticspandas: a Foundational Python Library for Data Analysis and Statistics
Eyeriss Introduction

  • 1. Introduction to Eyeriss¹. Michael (Tao-Yi) Lee, tylee@mlpanda.rocks, NTU IoX Center, October 24, 2017. ¹ Y. H. Chen et al. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks”. In: IEEE Journal of Solid-State Circuits 52.1 (Jan. 2017), pp. 127–138.
  • 2. Outline: 1. Introduction; 2. Eyeriss Highlights (Memory Hierarchy, Row Stationary Data Flow, Network-on-a-chip (NoC), Compression and Data Gating); 3. Summary; 4. Appendix.
  • 3. Introduction: Contributions of Eyeriss. (1) A novel energy-efficient CNN dataflow that has been verified in a fabricated chip. (2) A taxonomy of CNN dataflows that classifies previous work into three categories (WS, OS, NLR). Figure: Eyeriss die photo (35 fps @ 278 mW running AlexNet [10]); the die measures 4000 µm × 4000 µm and holds 168 PEs and the global buffer.
  • 4. Introduction: Features of Eyeriss. Uses the row stationary (RS) dataflow on a spatial architecture with 168 processing elements to reduce the energy cost of data flow. 4-level memory hierarchy: maximally local data reuse. Network-on-a-chip (NoC): multicast; P2P single-cycle delivery. Compression and data gating: run-length compression (RLC); PE data gating.
  • 5. Introduction: Architecture. Core clock: 100-250 MHz; link clock: 90 MHz.
  • 6. Introduction: Recap on CNN. Forward computation in CONV layers. Given ofmap O, ifmap I, bias B, weight W, and stride size U:
      $$O[z][u][x][y] = \mathrm{ReLU}\Big( B[u] + \underbrace{\sum_{k=0}^{C-1} \sum_{i=0}^{R-1} \sum_{j=0}^{S-1} I[z][k][Ux+i][Uy+j] \times W[u][k][i][j]}_{\text{partial sum}} \Big) \quad (1)$$
      where $0 \le z < N$, $0 \le u < M$, $0 \le y < E$, $0 \le x < F$, and
      $$E = (H - R + U)/U \quad (2)$$
      $$F = (W - S + U)/U \quad (3)$$
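    Equation (1) can be written out as a plain loop nest. The following is a minimal sketch, not the chip's implementation: it assumes square ifmaps and filters (H = W and R = S, so E = F), and the names `relu` and `conv_layer` are illustrative only.

    ```python
    def relu(v):
        """ReLU activation: negative values are clamped to zero."""
        return v if v > 0 else 0


    def conv_layer(ifmap, weight, bias, U):
        """Naive CONV layer following Eq. (1).

        ifmap[z][k][row][col], weight[u][k][i][j], bias[u], stride U.
        Assumption: square ifmaps and filters, so E = F.
        """
        N, C, H = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
        M, R = len(weight), len(weight[0][0])
        E = (H - R + U) // U  # Eq. (2)
        ofmap = [[[[0] * E for _ in range(E)] for _ in range(M)] for _ in range(N)]
        for z in range(N):              # batch
            for u in range(M):          # filters (ofmap channels)
                for x in range(E):
                    for y in range(E):
                        psum = 0        # the "partial sum" term of Eq. (1)
                        for k in range(C):
                            for i in range(R):
                                for j in range(R):
                                    psum += ifmap[z][k][U * x + i][U * y + j] * weight[u][k][i][j]
                        ofmap[z][u][x][y] = relu(bias[u] + psum)
        return ofmap
    ```

    For a 3×3 ifmap, a 2×2 filter, and stride 1, the sketch returns a 2×2 ofmap, as Eq. (2) predicts.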
  • 7. Eyeriss Highlights / Memory Hierarchy: PE Matrix and Memory Hierarchy. 1. Spatial architecture: allows data to flow in four directions. 2. Each PE operates independently with one CLKcore (i.e. not systolic). Figure: nine PEs, each a MAC unit with pixel, weight (W), psum-in (psumi) and psum-out (psumo) ports, backed by DRAM. Challenge: how to optimize data flow in order to minimize energy consumption?
  • 8. Eyeriss Highlights / Memory Hierarchy: PE Matrix and Memory Hierarchy (Eyeriss). 4-level memory hierarchy: DRAM → Global Buffer (GLB) → Network-on-a-Chip (NoC) → Register File (RF). Figure labels: on-chip, nine RFs of 1 kB each (EC = 1X); NoC (EC = 2X); DRAM; GLB (EC = 6X); FIFO (EC² = 500X). ² Relative energy cost.
  • 9. Eyeriss Highlights / Row Stationary Data Flow: CNN dataflows. Row Stationary (RS), Weight Stationary (WS), Output Stationary (OS), No Local Reuse (NLR).
  • 10. Eyeriss Highlights / Row Stationary Data Flow: Comparison of Dataflows (I). The next slides focus on the flows of psums, weights, and pixels. RS uses 1.4X to 2.5X lower energy than the other dataflows.
  • 11. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 1D Convolution. Kernel (a b c) ∗ Image (a b c d e) = PSum (a b c).
  • 14. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 1D Convolution (PE), step 1. Kernel (a b c) ∗ Image (a b c d e) = PSum (a b c). PE register file: image window c b a; kernel row c b a; psum a. Pixels d and e are still to be streamed in.
  • 15. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 1D Convolution (PE), step 2. PE register file: image window d c b; kernel row c b a; psum b. Pixel e is still to be streamed in; psum a is done.
  • 16. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 1D Convolution (PE), step 3. PE register file: image window e d c; kernel row c b a; psums c, b, a.
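    The three steps above can be sketched in a few lines. This is a simplified model under stated assumptions, not the PE's actual control logic: the kernel row stays in the register file, the image window slides by one pixel per psum, and the function name `pe_row_conv` is hypothetical.

    ```python
    def pe_row_conv(kernel_row, image_row):
        """Model one PE doing a 1D convolution with a stationary kernel row.

        Assumes len(image_row) >= len(kernel_row) and stride 1.
        """
        S = len(kernel_row)
        window = list(image_row[:S])      # initial image window in the register file
        psums = []
        # Each iteration produces one psum, then shifts in the next pixel (if any).
        for next_pixel in list(image_row[S:]) + [None]:
            psums.append(sum(w * p for w, p in zip(kernel_row, window)))
            if next_pixel is not None:
                window = window[1:] + [next_pixel]
        return psums
    ```

    With a 3-tap kernel and a 5-pixel image row the loop runs three times, matching the three animation steps on the slides.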
  • 17. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 2D Convolution. Kernel ∗ Image = PSum, with rows separated by "/":
      Kernel (3×3): 1a 1b 1c / 2a 2b 2c / 3a 3b 3c
      Image (5×5): 1a 1b 1c 1d 1e / 2a 2b 2c 2d 2e / 3a 3b 3c 3d 3e / 4a 4b 4c 4d 4e / 5a 5b 5c 5d 5e
      PSum (3×3): 1a 1b 1c / 2a 2b 2c / 3a 3b 3c
  • 20. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 2D Convolution (PE), step 1. Psums in progress: 1a, 2a, 3a. Each PE register file is listed as [image window | kernel row | psum | pixels still to be streamed in].
      PE row 1: [3c 3b 3a | 3c 3b 3a | 1a | 3d 3e]  [4c 4b 4a | 3c 3b 3a | 2a | 4d 4e]  [5c 5b 5a | 3c 3b 3a | 3a | 5d 5e]
      PE row 2: [2c 2b 2a | 2c 2b 2a | 1a | 2d 2e]  [3c 3b 3a | 2c 2b 2a | 2a | 3d 3e]  [4c 4b 4a | 2c 2b 2a | 3a | 4d 4e]
      PE row 3: [1c 1b 1a | 1c 1b 1a | 1a | 1d 1e]  [2c 2b 2a | 1c 1b 1a | 2a | 2d 2e]  [3c 3b 3a | 1c 1b 1a | 3a | 3d 3e]
  • 21. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 2D Convolution (PE), step 2. Psums so far: 1a, 2a, 3a, 1b, 2b, 3b. Each PE register file is listed as [image window | kernel row | current psum | pixel still to be streamed in | finished psum].
      PE row 1: [3d 3c 3b | 3c 3b 3a | 1b | 3e | 1a]  [4d 4c 4b | 3c 3b 3a | 2b | 4e | 2a]  [5d 5c 5b | 3c 3b 3a | 3b | 5e | 3a]
      PE row 2: [2d 2c 2b | 2c 2b 2a | 1b | 2e | 1a]  [3d 3c 3b | 2c 2b 2a | 2b | 3e | 2a]  [4d 4c 4b | 2c 2b 2a | 3b | 4e | 3a]
      PE row 3: [1d 1c 1b | 1c 1b 1a | 1b | 1e | 1a]  [2d 2c 2b | 1c 1b 1a | 2b | 2e | 2a]  [3d 3c 3b | 1c 1b 1a | 3b | 3e | 3a]
  • 22. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 2D Convolution (PE), step 3. Psums so far: 1a, 2a, 3a, 1b, 2b, 3b, 1c, 2c, 3c. Each PE register file is listed as [image window | kernel row | psums].
      PE row 1: [3e 3d 3c | 3c 3b 3a | 1c 1b 1a]  [4e 4d 4c | 3c 3b 3a | 2c 2b 2a]  [5e 5d 5c | 3c 3b 3a | 3c 3b 3a]
      PE row 2: [2e 2d 2c | 2c 2b 2a | 1c 1b 1a]  [3e 3d 3c | 2c 2b 2a | 2c 2b 2a]  [4e 4d 4c | 2c 2b 2a | 3c 3b 3a]
      PE row 3: [1e 1d 1c | 1c 1b 1a | 1c 1b 1a]  [2e 2d 2c | 1c 1b 1a | 2c 2b 2a]  [3e 3d 3c | 1c 1b 1a | 3c 3b 3a]
  • 23. Eyeriss Highlights / Row Stationary Data Flow: Row Stationary in 2D Convolution (PE), data movement. Same PE array state as the previous slide, annotated with the direction in which each data type moves: psums propagate vertically, pixels propagate diagonally, and weights propagate horizontally.
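    The 2D mapping can be modelled as a grid of 1D convolutions whose psums are summed down each PE column. This is a rough sketch under assumptions (stride 1, one PE per kernel-row and ofmap-row pair); `conv1d` and `row_stationary_2d` are illustrative names, not part of any Eyeriss tooling.

    ```python
    def conv1d(kernel_row, image_row):
        """1D convolution of one kernel row with one image row (stride 1)."""
        S = len(kernel_row)
        return [sum(kernel_row[j] * image_row[x + j] for j in range(S))
                for x in range(len(image_row) - S + 1)]


    def row_stationary_2d(kernel, image):
        """2D convolution built from per-PE 1D convolutions.

        PE (k_row, out_row) holds kernel row k_row (weights shared horizontally)
        and image row k_row + out_row (pixels shared diagonally); psums are
        accumulated vertically over k_row to form ofmap row out_row.
        """
        R = len(kernel)
        E = len(image) - R + 1
        ofmap = []
        for out_row in range(E):            # one PE column per ofmap row
            acc = [0] * (len(image[0]) - len(kernel[0]) + 1)
            for k_row in range(R):          # one PE per kernel row in that column
                psum_row = conv1d(kernel[k_row], image[k_row + out_row])
                acc = [a + b for a, b in zip(acc, psum_row)]
            ofmap.append(acc)
        return ofmap
    ```

    For the 3×3 kernel and 5×5 image on the slides, image row 3 is used by the PEs at (kernel row 3, ofmap row 1), (2, 2), and (1, 3), which is the diagonal sharing the annotations describe.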
  • 24. Eyeriss Highlights / Row Stationary Data Flow: Weight Stationary (WS)³. Minimizes weight-read energy consumption by maximizing convolutional and filter reuse of weights. Examples: Chakradhar et al. 2010; Gokhale et al. 2014; Park et al. 2015; Cavigelli et al. 2015. ³ Image adopted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
  • 25. Eyeriss Highlights / Row Stationary Data Flow: Output Stationary (OS)⁴. Minimizes partial-sum read/write energy consumption by maximizing local accumulation. Examples: Gupta et al. 2015; Du et al. 2015; Peemen et al. 2013. ⁴ Image adopted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
  • 26. Eyeriss Highlights / Row Stationary Data Flow: No Local Reuse (NLR)⁵. Uses a large global buffer as shared storage to reduce DRAM access energy consumption. Examples: Chen et al. 2014 (DianNao); Chen et al. 2014 (DaDianNao); Zhang et al. 2015. ⁵ Image adopted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
  • 27. Eyeriss Highlights / Row Stationary Data Flow: Comparison of Dataflows (II). RS reuses data heavily in the local register files (RF), which saves the energy of moving data.
  • 28. Eyeriss Highlights / Row Stationary Data Flow: Beyond 2D Convolution, Multiple Images. Processing in PE: concatenate image rows.
  • 29. Eyeriss Highlights / Row Stationary Data Flow: Beyond 2D Convolution, Multiple Filters. Processing in PE: interleave filter rows.
  • 30. Eyeriss Highlights / Row Stationary Data Flow: Beyond 2D Convolution, Multiple Channels. Processing in PE: interleave channels.
  • 31. Eyeriss Highlights / Row Stationary Data Flow: AlexNet Revisited. AlexNet convolutional layer configurations:
      Layer | Filter Size (R) | # Filters (M) | # Channels (C) | Stride | Max Pooling
      1     | 11x11           | 96            | 3              | 4      | -
      2     | 5x5             | 256           | 48             | 1      | 3×3 S2
      3     | 3x3             | 384           | 256            | 1      | 3×3 S2
      4     | 3x3             | 384           | 192            | 1      | -
      5     | 3x3             | 256           | 192            | 1      | 3×3 S2
  • 32. Eyeriss Highlights / Row Stationary Data Flow: AlexNet PE Mapping.
  • 33. Eyeriss Highlights / Row Stationary Data Flow: AlexNet Inter-Pass Data Caching.
  • 34. Eyeriss Highlights / Row Stationary Data Flow: AlexNet Shape Parameters. How do we map AlexNet onto Eyeriss?
      Layer shape⁶:
      L | H   | R  | E  | C   | M   | U
      1 | 227 | 11 | 55 | 3   | 96  | 4
      2 | 31  | 5  | 27 | 48  | 256 | 1
      3 | 15  | 3  | 13 | 256 | 384 | 1
      4 | 15  | 3  | 13 | 192 | 384 | 1
      5 | 15  | 3  | 13 | 192 | 256 | 1
      Mapping parameters⁷:
      L | m  | n | e  | p  | q | r | t
      1 | 96 | 1 | 7  | 16 | 1 | 1 | 2
      2 | 64 | 1 | 27 | 16 | 2 | 1 | 1
      3 | 64 | 4 | 13 | 16 | 4 | 1 | 4
      4 | 64 | 4 | 13 | 16 | 3 | 2 | 2
      5 | 64 | 4 | 13 | 16 | 3 | 2 | 2
      ⁶ H: ifmap width, R: kernel width, E: ofmap width, C: number of channels, M: number of kernels, U: stride.
      ⁷ m: number of ofmap channels stored in the GLB, n: number of ifmaps, e: width of the PE set, p: number of filters processed, q: number of channels processed, r: number of PE sets processing different channels, t: number of PE sets processing different filters.
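    The shape parameters on this slide can be checked against Eq. (2). The sketch below copies H, R, E, U, e, r, and t from the slide; the active-PE estimate R × e × r × t is an assumption of this example (one PE per kernel row and per PE-set column, times the number of PE sets), not a formula stated on the slides.

    ```python
    # Values per AlexNet CONV layer, copied from the slide above.
    LAYERS = {
        1: dict(H=227, R=11, E=55, U=4, e=7, r=1, t=2),
        2: dict(H=31, R=5, E=27, U=1, e=27, r=1, t=1),
        3: dict(H=15, R=3, E=13, U=1, e=13, r=1, t=4),
        4: dict(H=15, R=3, E=13, U=1, e=13, r=2, t=2),
        5: dict(H=15, R=3, E=13, U=1, e=13, r=2, t=2),
    }

    TOTAL_PES = 168  # number of PEs on the chip, from the slides


    def ofmap_width(H, R, U):
        """Eq. (2): E = (H - R + U) / U."""
        return (H - R + U) // U


    def active_pes(R, e, r, t):
        """Assumed estimate: each PE set is R x e PEs, and r * t sets run at once."""
        return R * e * r * t


    for layer, p in LAYERS.items():
        E = ofmap_width(p["H"], p["R"], p["U"])
        used = active_pes(p["R"], p["e"], p["r"], p["t"])
        print(f"layer {layer}: E = {E} (slide: {p['E']}), active PEs = {used} of {TOTAL_PES}")
    ```

    Under that assumption every layer fits within the 168 PEs, and the computed E matches the slide for all five layers.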
  • 35. Eyeriss Highlights / Row Stationary Data Flow: AlexNet Shape Mapping Illustrated.
  • 36. Eyeriss Highlights / Network-on-a-chip (NoC): NoC Optimized for RS. Global input/output network (GI/ON): uses Multicast Controllers (MC) to broadcast GLB data to the assigned PEs. Data is augmented with a (row, col) tag in the GLB. There are three such networks: filter GI/ON, ifmap GI/ON, and psum GI/ON. Local network: a dedicated 64b data bus is implemented to pass the psums from the bottom PE to the top PE directly.
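    Tag-based multicast can be illustrated with a small model. This is a sketch under assumptions, not the chip's controller logic: each PE is configured with a (row, col) ID, data leaves the GLB with a (row, col) tag, and a PE accepts the data only when the two match. The ID scheme in `configure_ifmap_ids` is hypothetical and chosen so that PEs needing the same ifmap row share one ID.

    ```python
    def configure_ifmap_ids(kernel_rows, ofmap_rows):
        """Hypothetical ID assignment for the ifmap network.

        PE (k_row, out_row) needs image row k_row + out_row, so all PEs on
        the same diagonal get the same (row, col) ID.
        """
        return {(k_row, out_row): (0, k_row + out_row)
                for k_row in range(kernel_rows)
                for out_row in range(ofmap_rows)}


    def multicast(pe_ids, tag, data):
        """Deliver data to every PE whose configured ID equals the tag."""
        return {pos: data for pos, pe_id in pe_ids.items() if pe_id == tag}


    ids = configure_ifmap_ids(3, 3)
    delivered = multicast(ids, (0, 2), "image row 3")
    print(sorted(delivered))  # positions of the PEs that accepted the data
    ```

    Because several PEs may hold the same ID, one transfer from the GLB reaches all of them, which is the point of multicasting instead of sending the row once per PE.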
  • 37. Eyeriss Highlights / Network-on-a-chip (NoC): Global Input / Output Network.
  • 38. Eyeriss Highlights / Network-on-a-chip (NoC): Populate Data with Global Input / Output Network.
  • 39. Eyeriss Highlights / Compression and Data Gating: Run-Length Compression (RLC). ReLU produces many zeros in the activated ofmap; RLC exploits them to save power in DRAM reads and writes.
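    A simplified run-length coder shows why zero-heavy ofmaps compress well. This is a sketch, not the chip's bit-level format: values are stored as (zero run, value) pairs, and the cap of 31 zeros per run (a 5-bit field) is an assumption of this example. The names `rlc_encode` and `rlc_decode` are illustrative.

    ```python
    def rlc_encode(values, max_run=31):
        """Encode a sequence as (number of preceding zeros, value) pairs."""
        pairs = []
        run = 0
        for v in values:
            if v == 0 and run < max_run:
                run += 1                    # extend the current run of zeros
            else:
                pairs.append((run, v))      # v may be 0 when the run is full
                run = 0
        if run > 0:
            # Trailing zeros: store run - 1 zeros followed by a zero value.
            pairs.append((run - 1, 0))
        return pairs


    def rlc_decode(pairs):
        """Invert rlc_encode."""
        out = []
        for run, v in pairs:
            out.extend([0] * run)
            out.append(v)
        return out
    ```

    A sequence such as `[0, 0, 5, 0, 0, 0, 7, 0]` becomes three pairs instead of eight values, and decoding the pairs restores the original sequence.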
  • 40. Eyeriss Highlights / Compression and Data Gating: Data Gating / Zero Skipping. Simply skip the task when either the pixel or the weight is zero.
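    The gating rule on this slide amounts to one extra check before each multiply-accumulate. The following is a minimal software sketch of that idea (the hardware gates logic rather than running a loop); `gated_mac` is a hypothetical name.

    ```python
    def gated_mac(pixels, weights):
        """Accumulate pixel * weight, skipping pairs where either operand is zero.

        Returns the psum and the number of MACs actually performed.
        """
        psum = 0
        performed = 0
        for p, w in zip(pixels, weights):
            if p == 0 or w == 0:
                continue                    # gated: the product would be zero anyway
            psum += p * w
            performed += 1
        return psum, performed
    ```

    Skipping never changes the result, since every skipped product is zero; it only reduces the number of operations that consume energy.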
  • 41. Summary: Row Stationary Energy Breakdown.
  • 42. Summary: Performance Summary and Comparison.
      Metric                     | Eyeriss [5] | NVIDIA TK1
      Technology                 | 65nm 1P9M   | 28nm
      Chip Size (mm)             | 4.0×4.0     | N/A
      Core Area (mm)             | 3.5×3.5     | N/A
      Gate Count                 | 1176k       | N/A
      Word Bit-Width             | 16b Fixed   | 32b Float
      Core Clock (MHz)           | 200         | 852
      On-Chip Buffer Size (kB)   | 108         | 64
      Total Register Size (kB)   | 75.3        | 256
      #MAC                       | 168         | 192
      Throughput (fps)           | 34.7        | 68
      Measured Power Idle (mW)   | -           | 3700
      Measured Power Active (mW) | 278         | 10002
  • 43. Summary. RS optimizes for the best overall energy efficiency, while existing CNN dataflows focus only on certain data types. RS has higher energy efficiency than existing dataflows: 1.4X to 2.5X higher in CONV layers, and at least 1.3X higher in FC layers (batch size ≥ 16).
  • 44. Appendix: Bibliography I.
      Lukas Cavigelli et al. “Origami: A Convolutional Network Accelerator”. In: Proceedings of the 25th Edition on Great Lakes Symposium on VLSI. GLSVLSI ’15. Pittsburgh, Pennsylvania, USA: ACM, 2015, pp. 199–204. isbn: 978-1-4503-3474-7. doi: 10.1145/2742060.2743766. url: http://doi.acm.org/10.1145/2742060.2743766.
      Srimat Chakradhar et al. “A Dynamically Configurable Coprocessor for Convolutional Neural Networks”. In: Proceedings of the 37th Annual International Symposium on Computer Architecture. ISCA ’10. Saint-Malo, France: ACM, 2010, pp. 247–257. isbn: 978-1-4503-0053-7.
      Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
      Tianshi Chen et al. “DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning”. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS ’14. Salt Lake City, Utah, USA: ACM, 2014, pp. 269–284. isbn: 978-1-4503-2305-5. doi: 10.1145/2541940.2541967. url: http://doi.acm.org/10.1145/2541940.2541967.
      Y. H. Chen et al. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks”. In: IEEE Journal of Solid-State Circuits 52.1 (Jan. 2017), pp. 127–138.
      Y. Chen et al. “DaDianNao: A Machine-Learning Supercomputer”. In: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. Dec. 2014, pp. 609–622. doi: 10.1109/MICRO.2014.58.
  • 45. Appendix: Bibliography II.
      Z. Du et al. “ShiDianNao: Shifting vision processing closer to the sensor”. In: 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). June 2015, pp. 92–104. doi: 10.1145/2749469.2750389.
      V. Gokhale et al. “A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks”. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. June 2014, pp. 696–701. doi: 10.1109/CVPRW.2014.106.
      Suyog Gupta et al. “Deep Learning with Limited Numerical Precision”. In: Proceedings of the 32nd International Conference on Machine Learning - Volume 37. ICML’15. Lille, France: JMLR.org, 2015, pp. 1737–1746. url: http://dl.acm.org/citation.cfm?id=3045118.3045303.
      Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. 2012, pp. 1097–1105.
      S. Park et al. “4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications”. In: 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. Feb. 2015, pp. 1–3. doi: 10.1109/ISSCC.2015.7062935.
      M. Peemen et al. “Memory-centric accelerator design for Convolutional Neural Networks”. In: 2013 IEEE 31st International Conference on Computer Design (ICCD). Oct. 2013, pp. 13–19. doi: 10.1109/ICCD.2013.6657019.
  • 46. Appendix: Bibliography III.
      Chen Zhang et al. “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks”. In: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA ’15. Monterey, California, USA: ACM, 2015, pp. 161–170. isbn: 978-1-4503-3315-3. doi: 10.1145/2684746.2689060. url: http://doi.acm.org/10.1145/2684746.2689060.