DESIGNING EFFICIENT PARALLEL PROCESSING IN 3D STANDARD-CHIP STACKING SYSTEM WITH STANDARD BUS
Takeshi Ohkawa* Kanemitsu Ootsu* Takashi Yokota*
Katsuya Kikuchi** Masahiro Aoyagi**
* Dept. of Information Systems Science,
Graduate School of Engineering, Utsunomiya University
** 3D Integration System Group,
Nano-electronics Research Institute, National Institute of
Advanced Industrial Science and Technology (AIST)
2017/9/18 MCSoC2017@Korea Univ., Seoul 1
Outline
• 1. Introduction
• 2. 3D-SCSS (Standard Chip Stacked System)
Design Method
• Model of target 3D-SCSS system and TSV technology
• Model-driven parallel system design flow
• 3. Design Example
• Overview of image recognition (ORB feature extraction)
• Parallel processing for Scale Pyramid Generation Process
• Discussion on Processing Performance
• Estimation of Power Consumption
• 4. Conclusion
Research Background
• Requirement: High-performance at low energy
• Smartphones, tablets, information home appliances, IoT (Internet
of Things), M2M (Machine to Machine), automotive
• e.g., image recognition (local features, DNN)
• General tradeoff: flexibility and energy efficiency
• Energy Efficiency
• ASIC : DSP/FPGA : SW = 1000 : 10 : 1
• Flexibility
• State-of-the-art algorithm changes fast!
• Heterogeneous integration of general/special LSIs
→ High energy-efficiency, High-performance
Heterogeneous Integration of LSIs
• Conventional
• Electronic Circuit Board integration
• TSV (Through-Silicon Via) technology
• extends the limit of horizontal integration
• opens holes (vias) through silicon chips to route electrical signals and power supply vertically to the back of the chips
• Related technologies
• Interposer (a chip dedicated for inter-chip wiring,
electrical [7] or optical [8])
• wireless (vertical) communications between chips [9]
[7] Kurita, Yoichiro, et al. "A novel 'SMAFTI' package for inter-chip wide-band data transfer." Electronic Components and Technology Conference, 2006. Proceedings. 56th. IEEE, 2006.
[8] Arakawa, Yasuhiko, et al. "Silicon photonics for next generation system integration platform." IEEE Communications Magazine 51.3, pp. 72-77, 2013.
[9] Miura, Noriyuki, et al. "Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect." IEEE Journal of Solid-State Circuits 40.4, pp. 829-837, 2005.
Research Objectives
• Proposal of 3D-SCSS with Standard BUS
(Three-Dimensional Standard-Chip Stacked System)
• Improve design productivity while meeting performance and energy requirements by exploiting the parallelism of the algorithm
• Study of an example case design
• Mapping parallel processing of image recognition
• Evaluate the performance and energy efficiency
Our Proposal: 3D-SCSS with Standard BUS
(Three-Dimensional Standard-Chip Stacked System)
• Heterogeneous 3D integration of standard and special-purpose LSI chips for performance/energy
• Define “Standard BUS” for Stock and Reuse
• With a Standard Socket defined (like PCI on a PC motherboard), the desired system is built up simply by stacking stocked chips. The socket defines:
• Physical size, layout
• Electrical voltage and impedance
• Layered communication protocols:
datalink, network, transport, application…
• Our GOAL
• Improve design productivity while meeting performance and energy requirements at the application level
3D Interconnect for Multi Core Internal Bus
■ 2D Interconnect (horizontal): multi-core system LSI
[Figure: 2D system with RF/Analog, Memory, Logic, and I/O blocks on a 64- or 128-bit on-chip bus]
- Long wiring for bus communication
- Limited number of signal lines
- Large bus-driver circuits
- Many repeater buffer circuits
- Difficulty of integrating heterogeneous chips
■ 3D Interconnect (vertical): heterogeneous multi LSI chip stack system
[Figure: 3D system with a 1600-bit wide interchip bus over a TSV/micro-bump array]
2017 update: 20,000-bit
- Short wiring for bus communication
- Wide interchip bus architecture
- Smaller bus interface circuits (low-capacitance TSVs)
TSV: Through-Si Via
3D interchip communication with a high data transfer rate between heterogeneous multi chips
Presented at the 3D Test Workshop, Anaheim, CA, USA, September 13, 2013
Wide Bus Chip-to-Chip Interconnection to Realize Scalable Stacking for Multi LSI Chip Stack System
[Figure: scalable stacking for a heterogeneous multi LSI chip stack system on a Si interposer and package substrate]
COOL Interconnect: wide bus chip-to-chip interconnection
- Standard interface circuits
- TSV (10 μm φ, 50 μm D)
- Micro-bump, 50 μm
- Chip-level stack process for chip stacking: low internal stress bonding
- High density of 3D interconnect: fine TSVs/micro-bumps, fine pitch
Wide Bus Communication Test LSI Device
Ultra Wide Bus Interface Circuits Block
- Occupied area: 2.16 mm square
- Power, GND: 400 Al pads
- Outer connection: wire bonding pads
- Al pad array area: 2 mm square, 40x40 (=1600) pads, 20 μm square, 50 μm pitch
Parameter: Value
Chip size: 8.3 mm x 6.0 mm
Clock frequency: 50 MHz
Power voltage: 2.5 V
Bus signal number: 1600
Bus data rate: 6.4 GB/s (@ 1024 bit, 50 MHz)
Bus occupied area: 4.67 mm²
TSV/bump area: 4 mm² (86 %)
Driver circuits area: 0.67 mm² (14 %)
Technology: 0.25 μm CMOS
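The bus data rate and the area split listed above follow from the other parameters by simple arithmetic. The sketch below recomputes them; it assumes one bit per data line per clock cycle, and all names are illustrative rather than taken from the original design.

```python
# Recompute the headline figures of the test LSI from its listed parameters.
# Assumption: each data line transfers one bit per clock cycle.

DATA_SIGNALS = 1024          # bus lines carrying data (out of 1600 signals)
CLOCK_HZ = 50e6              # 50 MHz
BLOCK_SIDE_MM = 2.16         # interface circuits block, 2.16 mm square
PAD_ARRAY_SIDE_MM = 2.0      # Al pad array, 2 mm square


def bus_capacity_bits_per_s(width_bits, clock_hz):
    """Raw capacity of a parallel bus in bits per second."""
    return width_bits * clock_hz


capacity_bps = bus_capacity_bits_per_s(DATA_SIGNALS, CLOCK_HZ)
capacity_gbit_s = capacity_bps / 1e9          # gigabits per second
capacity_gbyte_s = capacity_bps / 8 / 1e9     # gigabytes per second

bus_area_mm2 = BLOCK_SIDE_MM ** 2             # whole bus block
tsv_area_mm2 = PAD_ARRAY_SIDE_MM ** 2         # TSV/bump array
driver_area_mm2 = bus_area_mm2 - tsv_area_mm2 # remainder: driver circuits
tsv_share = tsv_area_mm2 / bus_area_mm2

print(f"capacity: {capacity_gbit_s:.1f} Gbit/s = {capacity_gbyte_s:.1f} GB/s")
print(f"bus area: {bus_area_mm2:.2f} mm2, TSV share: {tsv_share:.0%}")
```

Under these assumptions the 1024 data lines at 50 MHz give the listed 6.4 GB/s, and the 2 mm square pad array accounts for roughly 86 % of the 2.16 mm square block.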
Summary of 3D-SCSS standard BUS [10]
Parameter: Value
Physical size: 2 mm x 2 mm
BUS location: Center of the chip
Number of TSVs and bumps [TSVs for data signal]: 1600 (40x40) [1024]
Signal frequency: 50 MHz
Communication capacity: 51.2 Gbps
Power consumption (flip-chip result): 97 mW * @ 50% toggle rate
* Only the I/O power is measured separately.
[10] Aoyagi, M.; Imura, F.; Nemoto, S.; Watanabe, N.; Kato, F.; Kikuchi, K.; Nakagawa, H.; Hagimoto, M.; Uchida, H.; Matsumoto, Y., "Wide bus chip-to-chip interconnection technology using fine pitch bump joint array for 3D LSI chip stacking," 2nd IEEE CPMT Symposium Japan, Kyoto, 2012, pp. 1-4, 2012.
Design method of 3D-SCSS using standard BUS
• Conventional: There is no standard access method between vertically stacked chips.
• Intra-chip: CPU/Memory BUS (e.g., ARM AXI4)
• Inter-chip: High-speed serial, Ethernet, …
• Consider the future scalability/flexibility!
• Each chip will be a sufficiently complex system on its own.
• A loosely coupled architecture is needed.
Mapping KPN[16] on 3D-SCSS
• KPN[16]: Kahn Process Network (Process and FIFO model)
• Mapping
• A process onto a processor element on a chip
• A buffer onto a memory element on a chip
• Application layer
• Process to process data exchange through a buffer
• Control process to reduce the KPN[16] complexity
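The process-and-FIFO model above can be sketched in a few lines of Python. This is only an illustration of the KPN idea (processes that communicate solely through unbounded FIFOs with blocking reads), not the authors' implementation; the thread-based processes, the `END` token, and the stage functions are all assumptions made for the example.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream token, not part of the KPN model


def producer(out_fifo, items):
    """Source process: writes every item to its output FIFO."""
    for item in items:
        out_fifo.put(item)
    out_fifo.put(END)


def worker(in_fifo, out_fifo, func):
    """Generic process: blocking read, compute, write."""
    while True:
        item = in_fifo.get()          # blocks until a token is available
        if item is END:
            out_fifo.put(END)
            return
        out_fifo.put(func(item))


def run_network(items):
    """Build a three-process chain connected by unbounded FIFOs and run it."""
    fifo_ab, fifo_bc, fifo_out = queue.Queue(), queue.Queue(), queue.Queue()
    processes = [
        threading.Thread(target=producer, args=(fifo_ab, items)),
        threading.Thread(target=worker, args=(fifo_ab, fifo_bc, lambda x: x + 1)),
        threading.Thread(target=worker, args=(fifo_bc, fifo_out, lambda x: x * 2)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    results = []
    while True:
        item = fifo_out.get()
        if item is END:
            return results
        results.append(item)


print(run_network([1, 2, 3]))  # prints [4, 6, 8]
```

In the mapping described above, each `worker` would correspond to a processor element on a chip and each queue to a buffer in a memory element. Because every FIFO has a single writer and a single reader, the result does not depend on how the threads are scheduled.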
[Figure: a network of three processes coordinated by a control process]
[16] G. Kahn, “The semantics of a simple language for parallel programming,” Proc. of the IFIP Congress 74, North-Holland Publishing Co., pp. 471-475, 1974.
Example Process Network for Image
Recognition (ORB Feature Extraction)
• Image recognition using ORB Feature Descriptor
• Process1: Preprocessing of the input image
• Process2: Scaled Image Generation
• Process3: Key-point extraction
• Process4: Feature description
• Process5: Matching/ Machine Learning
• Process6: Output of the Recognition result
[Figure: process network under a controller. Input image → P1 (preprocessing of the input image) → preprocessed image → P2 (scaled image generation) → resized images → P3 (key-point extraction) → image + key-point information → P4 (feature description) → feature descriptor → P5 (matching / machine learning) → matching / learning result → P6 (output of the recognition result)]
Sub Process Network for Scaled
Images Generation Process
[Figure: two sub process networks that generate the scale pyramid (scales 1, 1/1.2, 1/1.44, ..., 1/2.99, 1/3.58)]
(a) Iterative Resize: sub-processes (a), (b), ..., (g) are chained, and each one scales the output of the previous one by 1/1.2
(b) Independent Resize: sub-processes (A), (B), ..., (G) each scale the original image directly, by 1/1.2, 1/1.44, ..., 1/3.58
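The difference between the two networks is where each stage takes its input, not which scales it produces. A minimal sketch, assuming the ratio of 1.2 and the seven levels shown above (function names are illustrative):

```python
RATIO = 1.2   # linear scale step between pyramid levels
LEVELS = 7    # sub-processes (a)..(g) or (A)..(G)


def iterative_scales(levels=LEVELS, ratio=RATIO):
    """Chained network: each stage shrinks the previous output by 1/ratio."""
    scales = []
    current = 1.0
    for _ in range(levels):
        current = current / ratio   # depends on the previous stage
        scales.append(current)
    return scales


def independent_scales(levels=LEVELS, ratio=RATIO):
    """Fan-out network: stage n shrinks the original by 1/ratio**n."""
    return [1.0 / ratio ** n for n in range(1, levels + 1)]


# Express each scale as the divisor used in the figure labels (1/x).
divisors = [round(1.0 / s, 2) for s in independent_scales()]
print(divisors)
```

Both functions yield the same scales up to floating-point rounding. The iterative form needs the result of stage n-1 before stage n can start, while the independent form needs only the original image, which is what makes the stages free of mutual dependencies.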
Processing time in PC environment
• Image size: 4096×2380
• Software: OpenCV 2.4.6.1 (ORB descriptor, resize)
• Resize algorithm: linear interpolation
• Hardware:
• CPU: AMD Phenom II 905e (2.5GHz)
• Observation
• Independent resize takes more time
• Reason: every level resizes from the large input image
[Figure: processing time (ms, 0 to 35) versus the level of the scale pyramid (1 to 7), for iterative image resize (1/1.2 x 8 times) and independent image resize ((1/1.2)^n)]
3D-SCSS Mapping Example (case 1: Iterative Resize)
• Input image: 4K (4096x2304), 1 byte/pixel
• One memory chip plus Processor Chip1, Processor Chip2, ..., Processor Chip7; each processor chip has a processor and on-chip memory
• FIFO buffers are assigned to the memory chip
• Sub-processes: (a) resize 1/1.2, (b) resize 1/1.2, ..., (g) resize 1/1.2
• Buffer sizes: 9.4MB → (a) → 6.5MB → (b) → 4.5MB ... 1.0MB → (g) → 0.7MB
Data transfer rate @ 10fps [MB/s]
• Processor Chip1: R 94MB/s, W 65MB/s
• Processor Chip2: R 65MB/s, W 45MB/s
• Processor Chip7: R 10MB/s, W 7MB/s
• Memory Chip (total): R 281MB/s, W 194MB/s
POINTS
・7 chips
・Each chip works independently
・No need for sync
・All the results are written to memory
[Figure: read and write data rate (MB/s, 0 to 100) for sub-processes a to g]
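The per-chip and total data rates of case 1 can be reproduced from the image size and frame rate alone. The sketch below is a reconstruction, not the authors' calculation: it assumes each level shrinks the pixel count by 1.2² and that per-level rates are truncated to whole MB/s before summing, which is the choice that matches the totals on the slide.

```python
WIDTH, HEIGHT = 4096, 2304     # 4K input
BYTES_PER_PIXEL = 1
FPS = 10
AREA_RATIO = 1.2 ** 2          # pixel count shrinks by 1.44 per level
LEVELS = 7                     # sub-processes (a)..(g)


def level_rates_mb_s(levels=LEVELS):
    """MB/s needed to move one image of each pyramid level every frame.

    Index 0 is the input image; values are truncated to whole MB/s.
    """
    base = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS / 1e6
    return [int(base / AREA_RATIO ** n) for n in range(levels + 1)]


def iterative_traffic():
    """Each stage reads the previous level and writes the next one."""
    rates = level_rates_mb_s()
    reads = rates[:-1]     # levels 0..6
    writes = rates[1:]     # levels 1..7
    return reads, writes


reads, writes = iterative_traffic()
for name, r, w in zip("abcdefg", reads, writes):
    print(f"({name}) R {r} MB/s, W {w} MB/s")
print(f"total R {sum(reads)} MB/s, W {sum(writes)} MB/s")
```

With these assumptions the totals come out as R 281 MB/s and W 194 MB/s, the figures listed for the memory chip.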
3D-SCSS Mapping Example (case 2: Independent Resize)
• Input image: 4K (4096x2304), 1 byte/pixel
• One memory chip plus Processor Chip1, Processor Chip2, ..., Processor Chip7; each processor chip has a processor and on-chip memory
• FIFO buffers are assigned to the memory chip
• Sub-processes: (A) resize 1/1.2, (B) resize 1/1.44, ..., (G) resize 1/3.58
• Buffer sizes: 9.4MB input; outputs of 6.5MB, 4.5MB, ..., 0.7MB
Data transfer rate @ 10fps [MB/s] (values in parentheses: with broadcast)
• Processor Chip1: R 94MB/s, W 65MB/s
• Processor Chip2: (R 94MB/s), W 76MB/s
• Processor Chip7: (R 94MB/s), W 7MB/s
• Memory Chip (total): R 658 (94)MB/s, W 194MB/s
POINTS
・Broadcast may reduce data transfer
・Sync is needed when broadcasting
・All the results are written to memory
[Figure: read and write data rate (MB/s, 0 to 100) for sub-processes A to G, without and with broadcast]
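The effect of broadcasting on case 2 can be checked with the same arithmetic. This is a sketch under stated assumptions: seven stages that all read the full input image, per-level rates truncated to whole MB/s, and a broadcast that delivers the input to all stages in a single read. The function names are illustrative.

```python
WIDTH, HEIGHT = 4096, 2304     # 4K input
BYTES_PER_PIXEL = 1
FPS = 10
AREA_RATIO = 1.2 ** 2          # pixel count shrinks by 1.44 per level
LEVELS = 7                     # sub-processes (A)..(G)


def level_rates_mb_s(levels=LEVELS):
    """MB/s per pyramid level (index 0 = input), truncated to whole MB/s."""
    base = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS / 1e6
    return [int(base / AREA_RATIO ** n) for n in range(levels + 1)]


def independent_traffic(broadcast):
    """Total (read, write) MB/s on the memory chip for independent resize."""
    rates = level_rates_mb_s()
    source = rates[0]                  # every stage reads the input image
    writes = rates[1:]                 # each stage writes its own level
    if broadcast:
        total_read = source            # one read shared by all stages
    else:
        total_read = source * len(writes)
    return total_read, sum(writes)


print("without broadcast:", independent_traffic(broadcast=False))
print("with broadcast:   ", independent_traffic(broadcast=True))
```

The read traffic drops from 658 MB/s to 94 MB/s when the input is broadcast, while the write traffic stays at 194 MB/s in both cases.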
Discussion
• Case 2 (independent resize) is better in terms of data transfer size when the input image is broadcast
• Data transfer is reduced by broadcasting!
• This is a different tradeoff from the conventional system, where independent resize takes more time.
[Figure: data transfer time (ms, 0.0 to 3.0), split into read and write, per sub-process. Case 1: iterative resize (a to g). Case 2: independent resize (A to G)]
[Figure (PC): processing time (ms, 0 to 35) versus the level of the scale pyramid (1 to 7), for iterative image resize (1/1.2 x 8 times) and independent image resize ((1/1.2)^n)]
Power estimation for data transfer
• TSV capacitance: 0.3 pF
• Energy for 1-bit transfer: 0.3 pJ (@ 1.0 V)
• Q [C] = CV, E [J] = QV = CV²
• E = 0.3 [pF] x (1.0)² [V²] = 0.3 [pJ]
• 1 byte (= 8 bit) transfer: 2.4 [pJ]
• Estimated results (10fps)
• Minimum 691.2 μW (69.12 μJ per frame)
[Figure: power consumption @ 10fps (µW, 0 to 2,500), split into read and write, for each network mapping: (a) Iterative, (b) Independent, (b') Independent, broadcast]
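The power estimate combines the per-byte energy with the data rates of each mapping. The sketch below applies the slide's E = CV² model to the read and write totals of the two mapping examples; the pairing of totals with mappings and all names are assumptions. Only the minimum value (691.2 µW) is stated in the source; the other two values are simply what this model yields.

```python
TSV_CAPACITANCE_F = 0.3e-12    # 0.3 pF per TSV
VOLTAGE_V = 1.0
FPS = 10


def energy_per_byte_j(c=TSV_CAPACITANCE_F, v=VOLTAGE_V):
    """Energy to move one byte: 8 bits, each costing E = C * V**2."""
    return 8 * c * v ** 2


def transfer_power_uw(mb_per_s):
    """Average power in microwatts for a sustained rate in MB/s."""
    return mb_per_s * 1e6 * energy_per_byte_j() * 1e6


# Read + write totals in MB/s at 10 fps (taken from the mapping examples).
TRAFFIC_MB_S = {
    "(a) Iterative": 281 + 194,
    "(b) Independent": 658 + 194,
    "(b') Independent, broadcast": 94 + 194,
}

power_uw = {name: transfer_power_uw(rate) for name, rate in TRAFFIC_MB_S.items()}
for name, p in power_uw.items():
    print(f"{name}: {p:.1f} uW, {p / FPS:.2f} uJ per frame")
```

The broadcast mapping gives 691.2 µW, or 69.12 µJ per frame, which matches the stated minimum.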
Conclusion
• Proposed a design method for 3D-SCSS (Three-Dimensional Standard-Chip Stacked System) with Standard BUS
• Stacking by: TSV (Through-Silicon Via) + bump
• Design method: KPN mapping
• Studied a design case of image scaling
• A performance improvement can be expected from introducing another type of parallel processing, whose tradeoff differs from that in a normal PC environment.
• Communication and synchronization mechanism
• Realizing broadcast communication in the 3D-SCSS is expected to reduce energy consumption further.
THANK YOU
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 

Recently uploaded (20)

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 

2017 09-ohkawa-MCSoC2017-presen

  • 1. DESIGNING EFFICIENT PARALLEL PROCESSING IN 3D STANDARD-CHIP STACKING SYSTEM WITH STANDARD BUS
    Takeshi Ohkawa*, Kanemitsu Ootsu*, Takashi Yokota*, Katsuya Kikuchi**, Masahiro Aoyagi**
    * Dept. of Information Systems Science, Graduate School of Engineering, Utsunomiya University
    ** 3D Integration System Group, Nano-electronics Research Institute, National Institute of Advanced Industrial Science and Technology (AIST)
    2017/9/18 MCSoC2017@Korea Univ., Seoul
  • 2. Outline
    • 1. Introduction
    • 2. 3D-SCSS (Standard Chip Stacked System) Design Method
      • Model of target 3D-SCSS system and TSV technology
      • Model-driven parallel system design flow
    • 3. Design Example
      • Overview of image recognition (ORB feature extraction)
      • Parallel processing for Scale Pyramid Generation Process
      • Discussion on Processing Performance
      • Estimation of Power Consumption
    • 4. Conclusion
  • 3. Research Background
    • Requirement: high performance at low energy
      • Smartphones, tablets, information home appliances, IoT (Internet of Things), M2M (Machine to Machine), automotive
      • e.g., image recognition (local features, DNN)
    • General tradeoff: flexibility versus energy efficiency
      • Energy efficiency: ASIC : DSP/FPGA : SW = 1000 : 10 : 1
      • Flexibility: state-of-the-art algorithms change fast!
    • Heterogeneous integration of general-purpose and special-purpose LSIs → high energy efficiency, high performance
  • 4. Heterogeneous Integration of LSIs
    • Conventional
      • Electronic circuit board integration
    • TSV (Through-Silicon Via) technology
      • extends the limit of horizontal integration
      • opens holes (vias) through silicon chips to route electrical signals and the power supply vertically to the back of the chips
    • Related technologies
      • Interposer (a chip dedicated to inter-chip wiring, electrical [7] or optical [8])
      • Wireless (vertical) communication between chips [9]
    [7] Kurita, Yoichiro, et al. "A novel 'SMAFTI' package for inter-chip wide-band data transfer." Proc. 56th Electronic Components and Technology Conference, IEEE, 2006.
    [8] Arakawa, Yasuhiko, et al. "Silicon photonics for next generation system integration platform." IEEE Communications Magazine 51.3, pp. 72-77, 2013.
    [9] Miura, Noriyuki, et al. "Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect." IEEE Journal of Solid-State Circuits 40.4, pp. 829-837, 2005.
  • 5. Research Objectives
    • Propose 3D-SCSS (Three-Dimensional Standard-Chip Stacked System) with a Standard BUS
      • Improve design productivity while meeting performance and energy requirements by exploiting the parallelism of the algorithm
    • Study an example design case
      • Map the parallel processing of image recognition
      • Evaluate performance and energy efficiency
  • 7. Our Proposal: 3D-SCSS with Standard BUS (Three-Dimensional Standard-Chip Stacked System)
    • Heterogeneous 3D integration of standard and special-purpose LSI chips for performance/energy
    • Define a “Standard BUS” for stock and reuse
      • With a standard socket defined (like PCI on a PC motherboard), the desired system is built simply by stacking stocked chips.
      • Physical size, layout
      • Electrical voltage and impedance
      • Layered communication protocols: datalink, network, transport, application…
    • Our GOAL
      • Improve design productivity while meeting performance and energy requirements at the application level
  • 8. 3D Interconnect for Multi Core Internal Bus
    • 2D interconnect: horizontal 2D system (multi-core system LSI with RF/analog, memory, logic, and I/O on a 64- or 128-bit on-chip bus)
      • Long wiring for bus communication
      • Limited number of signal lines
      • Large bus-driver circuits
      • Many repeater buffer circuits
      • Difficult to integrate heterogeneous chips
    • 3D interconnect: vertical 3D system (TSV/micro-bump array, 1600-bit wide inter-chip bus; 2017 update: 20,000-bit)
      • Short wiring for bus communication
      • Wide inter-chip bus architecture
      • Smaller bus interface circuits (low-capacitance TSVs)
    • Heterogeneous multi-LSI-chip stack system: 3D inter-chip communication with a high data transfer rate between heterogeneous chips
    • TSV: Through-Si Via
    Presented at the 3D Test Workshop, Anaheim, CA, USA, September 13, 2013
  • 9. Wide Bus Chip-to-Chip Interconnection to Realize Scalable Stacking for Multi LSI Chip Stack System
    • COOL Interconnect: wide-bus chip-to-chip interconnection
    • [Figure: scalable stacking for a heterogeneous multi-LSI-chip stack system, showing the package substrate, Si interposer, standard interface circuits, TSVs (10 μmΦ, 50 μmD), and micro-bumps (50 μm)]
    • Chip-level stack process for chip stacking: low-internal-stress bonding
    • High-density 3D interconnect: fine TSVs/micro-bumps, fine pitch
    Presented at the 3D Test Workshop, Anaheim, CA, USA, September 13, 2013
  • 10. Wide Bus Communication Test LSI Device
    • Ultra-wide bus interface circuit block: occupied area 2.16 mm square
    • Al pad array: area 2 mm square; 40 x 40 (= 1600) pads, 20 μm square, 50 μm pitch
    • Power, GND: 400 Al pads
    • Outer connection: wire-bonding pads
    Parameter                Value
    Chip size                8.3 mm x 6.0 mm
    Clock frequency          50 MHz
    Power voltage            2.5 V
    Number of bus signals    1600
    Bus data rate            6.4 GB/s @ 1024 bit, 50 MHz
    Bus occupied area        4.67 mm^2
    TSV/bump area            4 mm^2 (86 %)
    Driver circuit area      0.67 mm^2 (14 %)
    Technology               0.25 μm CMOS
    Presented at the 3D Test Workshop, Anaheim, CA, USA, September 13, 2013
  • 11. Summary of 3D-SCSS standard BUS [10]
    Parameter                                            Value
    Physical size                                        2 mm x 2 mm
    BUS location                                         Center of the chip
    Number of TSVs and bumps [TSVs for data signals]     1600 (40 x 40) [1024]
    Signal frequency                                     50 MHz
    Communication capacity                               51.2 Gbps
    Power consumption (flip-chip result)                 97 mW* @ 50% toggle rate
    * Only the I/O power is measured separately.
    [10] Aoyagi, M.; Imura, F.; Nemoto, S.; Watanabe, N.; Kato, F.; Kikuchi, K.; Nakagawa, H.; Hagimoto, M.; Uchida, H.; Matsumoto, Y., "Wide bus chip-to-chip interconnection technology using fine pitch bump joint array for 3D LSI chip stacking," 2nd IEEE CPMT Symposium Japan, Kyoto, pp. 1-4, 2012.
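The communication capacity listed for the standard BUS follows from the number of data TSVs and the signal frequency. A minimal sketch of that arithmetic, assuming one bit per data TSV per clock cycle (the function name is illustrative, not from the slides):

```python
def bus_capacity_bps(data_lines: int, clock_hz: float) -> float:
    """Raw bus capacity in bits/s, assuming 1 bit per data line per cycle."""
    return data_lines * clock_hz


# Values from the slides: 1024 data TSVs at 50 MHz.
capacity = bus_capacity_bps(1024, 50e6)
print(f"{capacity / 1e9:.1f} Gbps")      # 51.2 Gbps
print(f"{capacity / 8 / 1e9:.1f} GB/s")  # 6.4 GB/s
```

Both figures agree with the slides: 51.2 Gbps here and 6.4 GB/s on the test LSI slide are the same rate in different units.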
  • 12. Design method of 3D-SCSS using standard BUS
    • Conventional: there is no standard access method between vertically stacked chips.
      • Intra-chip: CPU/memory bus (e.g., ARM AXI4)
      • Inter-chip: high-speed serial, Ethernet, …
    • Consider future scalability and flexibility!
      • Each chip will itself be a sufficiently complex system.
      • A loosely coupled architecture is needed.
  • 13. Mapping KPN [16] on 3D-SCSS
    • KPN [16]: Kahn Process Network (process and FIFO model)
    • Mapping
      • A process onto a processor element on a chip
      • A buffer onto a memory element on a chip
    • Application layer
      • Process-to-process data exchange through a buffer
      • A control process reduces the complexity of the KPN [16]
    • [Figure: three processes and a control process]
    [16] G. Kahn, “The semantics of a simple language for parallel programming,” Proc. of the IFIP Congress 74, North-Holland Publishing Co., pp. 471-475, 1974.
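The process-and-FIFO model can be sketched in software with threads standing in for processor elements and queues standing in for buffers. This is only an illustration under stated assumptions: the helper names, the `END` sentinel, and the bounded FIFO depth are not from the slides, and a strict KPN uses unbounded FIFOs, whereas this sketch bounds them (as a finite memory chip would).

```python
import queue
import threading

END = object()  # end-of-stream sentinel (an assumption of this sketch)


def producer(out_fifo, items):
    """Source process: writes tokens to its output FIFO."""
    for item in items:
        out_fifo.put(item)  # blocks while the bounded FIFO is full
    out_fifo.put(END)


def worker(in_fifo, out_fifo, fn):
    """KPN process: blocking read, compute, write."""
    while True:
        token = in_fifo.get()  # blocking read is the defining KPN rule
        if token is END:
            out_fifo.put(END)
            return
        out_fifo.put(fn(token))


def run_pipeline(items, stages, depth=2):
    """Chain the stages with FIFOs and collect the final output stream."""
    fifos = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=producer, args=(fifos[0], items))]
    for i, fn in enumerate(stages):
        threads.append(
            threading.Thread(target=worker, args=(fifos[i], fifos[i + 1], fn))
        )
    for t in threads:
        t.start()
    results = []
    while True:
        token = fifos[-1].get()
        if token is END:
            break
        results.append(token)
    for t in threads:
        t.join()
    return results


print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10]))  # [20, 30, 40]
```

Because every process only blocks on its own input FIFO, the output is the same regardless of thread scheduling, which is the property that makes the model attractive for loosely coupled stacked chips.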
  • 15. Example Process Network for Image Recognition (ORB Feature Extraction)
    • Image recognition using the ORB feature descriptor
      • Process 1 (P1): Preprocessing of the input image
      • Process 2 (P2): Scaled image generation
      • Process 3 (P3): Key-point extraction
      • Process 4 (P4): Feature description
      • Process 5 (P5): Matching / machine learning
      • Process 6 (P6): Output of the recognition result
    • [Figure: process network under a controller: input image → P1 → preprocessed image → P2 → resized images → P3 → image + key-point information → P4 → feature descriptor → P5 → matching / learning result → P6]
  • 16. Sub Process Network for Scaled Images Generation Process
    • (a) Iterative resize: sub-processes (a), (b), …, (g) form a chain, each scaling its input by 1/1.2, producing the scales 1, 1/1.2, 1/1.44, …, 1/2.99, 1/3.58
    • (b) Independent resize: sub-processes (A), (B), …, (G) each scale the original image directly, by 1/1.2, 1/1.44, …, 1/3.58
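Both sub process networks produce the same seven scales; they differ only in which image each stage takes as input. A small sketch of the two schedules, assuming a step of 1.2 and seven levels as on the slide (function names are illustrative):

```python
import math

STEP = 1.2
LEVELS = 7


def iterative_factors(levels=LEVELS, step=STEP):
    """Cumulative scale after each stage of a chain of 1/step resizes."""
    factors, scale = [], 1.0
    for _ in range(levels):
        scale /= step  # each stage shrinks the previous stage's output
        factors.append(scale)
    return factors


def independent_factors(levels=LEVELS, step=STEP):
    """Scale applied directly to the original image by each stage."""
    return [1.0 / step ** n for n in range(1, levels + 1)]


for it, ind in zip(iterative_factors(), independent_factors()):
    print(f"iterative 1/{1 / it:.2f}   independent 1/{1 / ind:.2f}")
```

The last two rows come out as 1/2.99 and 1/3.58, the values shown in the figure. The iterative chain has a data dependency between stages, while the independent stages can all start as soon as the original image is available.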
  • 17. Processing time in PC environment
    • Image size: 4096×2380
    • Software: OpenCV 2.4.6.1 (ORB descriptor, resize)
    • Resize algorithm: linear interpolation
    • Hardware
      • CPU: AMD Phenom II 905e (2.5 GHz)
    • Observation
      • Independent resize takes more time
      • Reason: large input image size
    • [Chart: processing time (ms) versus level of the scale pyramid (1 to 7), comparing iterative image resize (1/1.2 x 8 times) with independent image resize (1/1.2)^n]
  • 18. 3D-SCSS Mapping Example (case 1: Iterative Resize)
    • Configuration: one memory chip plus Processor Chips 1 to 7 (processor with on-chip memory), running sub-processes (a) to (g), each a 1/1.2 resize
    • Input: 4K (4096x2304), 1 byte/pixel
    • FIFO buffers are assigned to the memory chip
    • Buffer sizes: 9.4 MB, 6.5 MB, 4.5 MB, …, 1.0 MB, 0.7 MB
    • Data transfer rate @ 10 fps [MB/s]
      • Processor Chip 1: R 94, W 65
      • Processor Chip 2: R 65, W 45
      • Processor Chip 7: R 10, W 7
      • Memory chip total: R 281, W 194
    • POINTS
      • 7 chips
      • Each chip works independently
      • No need for synchronization
      • All the results are written to memory
    • [Chart: read and write data rate (MB/s) for sub-processes a to g]
  • 19. 3D-SCSS Mapping Example (case 2: Independent Resize)
    • Configuration: one memory chip plus Processor Chips 1 to 7 (processor with on-chip memory), running sub-processes (A) resize 1/1.2, (B) resize 1/1.44, …, (G) resize 1/3.58
    • Input: 4K (4096x2304), 1 byte/pixel
    • FIFO buffers are assigned to the memory chip
    • Buffer sizes: 9.4 MB input; 6.5 MB, 4.5 MB, …, 0.7 MB outputs
    • Data transfer rate @ 10 fps [MB/s]
      • Per-chip rates shown for Processor Chips 1, 2, …, 7: R 94 / W 65, (R 94) / W 76, …, (R 94) / W 7
      • Memory chip total: R 658 (94 with broadcast), W 194
    • POINTS
      • Broadcast may reduce data transfer
      • Synchronization is needed when broadcasting
      • All the results are written to memory
    • [Charts: read and write data rate (MB/s) for sub-processes A to G, without and with broadcast]
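The memory-chip totals for the two mappings can be approximated from the pyramid geometry alone. The sketch below assumes idealized, real-valued level sizes (an actual resize rounds to whole pixels), a 4096x2304 frame at 1 byte/pixel, and 10 fps; the function and mapping names are hypothetical.

```python
FPS = 10
STEP = 1.2
LEVELS = 7
FRAME_BYTES = 4096 * 2304  # 4K frame at 1 byte/pixel, as on the slides


def level_bytes(n):
    """Idealized size in bytes of pyramid level n (level 0 is the input)."""
    return FRAME_BYTES / STEP ** (2 * n)


def traffic_mb_per_s(mapping):
    """Return (read, write) traffic at the memory chip in MB/s."""
    # Every mapping writes the seven resized images back to memory.
    writes = sum(level_bytes(n) for n in range(1, LEVELS + 1))
    if mapping == "iterative":
        # Stage n reads the output of stage n - 1.
        reads = sum(level_bytes(n) for n in range(LEVELS))
    elif mapping == "independent":
        # Every stage reads the full original frame separately.
        reads = LEVELS * level_bytes(0)
    elif mapping == "broadcast":
        # The original frame is read once and delivered to all stages.
        reads = level_bytes(0)
    else:
        raise ValueError(f"unknown mapping: {mapping}")
    return reads * FPS / 1e6, writes * FPS / 1e6


for name in ("iterative", "independent", "broadcast"):
    r, w = traffic_mb_per_s(name)
    print(f"{name:12s} R {r:6.1f} MB/s  W {w:6.1f} MB/s")
```

Under these assumptions the results land within a few percent of the slide totals (R 281 / W 194 for the iterative case, R 658 or R 94 for the independent case); the small gap is presumably rounding in the slide figures.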
  • 20. Discussion
    • Case 2 (independent resize) is better in terms of data transfer size
      • Data transfer is reduced by broadcasting!
    • This is a different tradeoff from the conventional system.
    • [Charts: data transfer time (ms), read and write, per sub-process for case 1 (iterative resize, a to g) and case 2 (independent resize, A to G); PC processing time (ms) per scale pyramid level for iterative image resize (1/1.2 x 8 times) and independent image resize (1/1.2)^n]
  • 21. Power estimation for data transfer
    • TSV capacitance: 0.3 pF
    • Energy for a 1-bit transfer: 0.3 pJ (@ 1.0 V)
      • Q [C] = CV, E [J] = QV = CV^2
      • E = 0.3 [pF] x (1.0)^2 [V^2] = 0.3 [pJ]
    • 1-byte (= 8-bit) transfer: 2.4 [pJ]
    • Estimated results (10 fps)
      • Minimum: 691.2 μW (69.12 μJ per frame)
    • [Chart: power consumption @ 10 fps (μW), read and write, for the network mappings (a) iterative, (b) independent, and (b') independent with broadcast]
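The estimate can be reproduced by multiplying the per-byte energy by the data rate. The sketch below uses the slide's model E = CV^2 per bit and, as an assumption, takes the memory-chip read/write totals from slides 18 and 19 as the data rates; the dictionary keys and function names are illustrative.

```python
TSV_CAPACITANCE_F = 0.3e-12  # 0.3 pF per TSV (from the slide)
VOLTAGE_V = 1.0              # signal voltage (from the slide)


def energy_per_byte_j(c=TSV_CAPACITANCE_F, v=VOLTAGE_V):
    """Energy to move one byte, using the slide's E = C * V^2 per bit."""
    return 8 * c * v ** 2


def transfer_power_uw(read_mb_s, write_mb_s):
    """Average transfer power in microwatts for the given data rates."""
    bytes_per_s = (read_mb_s + write_mb_s) * 1e6
    return bytes_per_s * energy_per_byte_j() * 1e6


# (read, write) totals in MB/s at 10 fps, taken from slides 18 and 19.
rates = {
    "iterative": (281, 194),
    "independent": (658, 194),
    "independent, broadcast": (94, 194),
}
for name, (r, w) in rates.items():
    print(f"{name:24s} {transfer_power_uw(r, w):7.1f} uW")
# iterative                 1140.0 uW
# independent               2044.8 uW
# independent, broadcast     691.2 uW
```

The broadcast case reproduces the 691.2 μW minimum quoted on the slide, which corresponds to 288 MB/s of total traffic, or 69.12 μJ per frame at 10 fps.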
  • 22. Conclusion
    • Proposed a design method for 3D-SCSS (Three-Dimensional Standard-Chip Stacked System) with a Standard BUS
      • Stacking by TSV (Through-Silicon Via) + bump
      • Design method: KPN mapping
    • Studied a design case of image scaling
      • A performance improvement can be expected from introducing another type of parallel processing, whose tradeoff differs from that of a normal PC environment.
    • Communication and synchronization mechanism
      • Realizing broadcast communication in the 3D-SCSS is expected to further reduce energy consumption.