DESIGNING EFFICIENT PARALLEL
PROCESSING
IN 3D STANDARD-CHIP STACKING
SYSTEM WITH STANDARD BUS
Takeshi Ohkawa* Kanemitsu Ootsu* Takashi Yokota*
Katsuya Kikuchi** Masahiro Aoyagi**
* Dept. of Information Systems Science,
Graduate School of Engineering, Utsunomiya University
** 3D Integration System Group,
Nano-electronics Research Institute, National Institute of
Advanced Industrial Science and Technology (AIST)
2017/9/18 MCSoC2017@Korea Univ., Seoul 1
Outline
• 1. Introduction
• 2. 3D-SCSS (Standard Chip Stacked System)
Design Method
• Model of target 3D-SCSS system and TSV technology
• Model-driven parallel system design flow
• 3. Design Example
• Overview of image recognition (ORB feature extraction)
• Parallel processing for Scale Pyramid Generation Process
• Discussion on Processing Performance
• Estimation of Power Consumption
• 4. Conclusion
Research Background
• Requirement: High-performance at low energy
• Smartphones, tablets, information home appliances, IoT (Internet
of Things), M2M (Machine to Machine), automotive
• e.g. Image recognition (local feature, DNN)
• General tradeoff: flexibility and energy efficiency
• Energy Efficiency
• ASIC : DSP/FPGA : SW = 1000 : 10 : 1
• Flexibility
• State-of-the-art algorithm changes fast!
• Heterogeneous integration of general/special LSIs
→ High energy-efficiency, High-performance
Heterogeneous Integration of LSIs
• Conventional
• Electronic Circuit Board integration
• TSV (Through Silicon Via) technology
• extends the limit of horizontal integration
• opens holes (vias) through silicon chips to connect electric signals and power supply vertically to the back of the chips
• Related technologies
• Interposer (a chip dedicated to inter-chip wiring, electrical [7] or optical [8])
• wireless (vertical) communications between chips [9]
[7] Kurita, Yoichiro, et al. "A novel 'SMAFTI' package for inter-chip wide-band data transfer." Proc. 56th Electronic Components and Technology Conference, IEEE, 2006.
[8] Arakawa, Yasuhiko, et al. "Silicon photonics for next generation system integration platform." IEEE Communications Magazine 51.3, pp. 72-77, 2013.
[9] Miura, Noriyuki, et al. "Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect." IEEE Journal of Solid-State Circuits 40.4, pp. 829-837, 2005.
Research Objectives
• Proposal of 3D-SCSS with Standard BUS
(Three-Dimensional Standard-Chip Stacked System)
• To improve design productivity while satisfying performance and energy requirements by exploiting the parallelism of the algorithm
• Study of an example case design
• Mapping parallel processing of image recognition
• Evaluate the performance and energy efficiency
Our Proposal: 3D-SCSS with Standard BUS
(Three-Dimensional Standard-Chip Stacked System)
• Heterogeneous 3D integration of standard and special-purpose LSI chips for performance/energy
• Define “Standard BUS” for Stock and Reuse
• By defining a Standard Socket (like PCI on a PC motherboard), the desired system is built up just by stacking stocked chips.
• Physical size, layout
• Electrical voltage and impedance
• Layered communication protocols: datalink, network, transport, application…
• Our GOAL
• To improve design productivity while satisfying performance and energy requirements at application level
3D Interconnect for Multi Core Internal Bus
■ 2D Interconnect (horizontal 2D system, 64 or 128-bit on-chip bus)
- Long wiring for bus communication
- Limitation of signal line number
- Large size bus-driver circuits
- Many repeater buffer circuits
- Difficulty of integration of heterogeneous chips
■ 3D Interconnect (vertical 3D system, 1600-bit wide interchip bus through a TSV/micro-bump array)
- Short wiring for bus communication
- Architecture of wide interchip bus
- Smaller size of bus interface circuits (low capacitance TSVs)
TSV: Through Si Via
[Figure: multi core system LSI (RF/Analog, Memory, Logic, I/O) compared with a heterogeneous multi LSI chip stack system]
3D interchip communication with high data transfer rate between heterogeneous multi chips
Presented in 3D Test Workshop, Anaheim CA, USA, September 13, 2013
2017 update!
20,000-bit
Wide Bus Chip-to-Chip Interconnection to Realize
Scalable Stacking for Multi LSI Chip Stack System
[Figure: scalable stacking for a heterogeneous multi LSI chip stack system on an Si interposer and package substrate. COOL Interconnect: wide bus chip-to-chip interconnection with standard interface circuits, TSVs (10 μm Φ, 50 μm D) and micro-bumps (50 μm)]
- Chip level stack process for chip stacking: low internal stress bonding
- High density of 3D interconnect: fine TSVs/micro-bumps, fine pitch
Presented in 3D Test Workshop, Anaheim CA, USA, September 13, 2013
Wide Bus Communication Test LSI Device
[Figure: test chip with labels. Ultra wide bus interface circuits block, occupied area: 2.16 mm square; Al pad array area: 2 mm square, 40x40 (=1600) pads, 20 μm square, 50 μm pitch; power, GND: 400 Al pads; outer connection: wire bonding pads]
Chip size: 8.3 mm x 6.0 mm
Clock frequency: 50 MHz
Power voltage: 2.5 V
Bus signal number: 1600
Bus data rate: 6.4 GB/s @ 1024 bit, 50 MHz
Bus occupied area: 4.67 mm^2
TSV/bump area: 4 mm^2 (86 %)
Driver circuits area: 0.67 mm^2 (14 %)
Technology: 0.25 μm CMOS
Presented in 3D Test Workshop, Anaheim CA, USA, September 13, 2013
Summary of 3D-SCSS standard BUS [10]
Parameter: Value
Physical size: 2 mm x 2 mm
BUS location: Center of the chip
Number of TSVs and bumps: 1600 (40x40) [TSVs for data signal: 1024]
Signal frequency: 50 MHz
Communication capacity: 51.2 Gbps
Power consumption (flip-chip result): 97 mW * @ 50% toggle rate
* Only the I/O power is measured separately.
[10] Aoyagi, M.; Imura, F.; Nemoto, S.; Watanabe, N.; Kato, F.; Kikuchi, K.; Nakagawa, H.; Hagimoto, M.; Uchida, H.; Matsumoto, Y., "Wide bus chip-to-chip interconnection technology using fine pitch bump joint array for 3D LSI chip stacking," 2nd IEEE CPMT Symposium Japan, Kyoto, pp. 1-4, 2012.
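The communication capacity in the summary follows from the number of data TSVs and the signal frequency. A minimal sketch of that arithmetic, assuming one bit per data line per clock cycle (the function name is illustrative, not from the slides):

```python
# Sketch: raw capacity of a parallel bus from its width and signal frequency.
# Assumption: single data rate, i.e. one bit per data line per clock cycle.

def bus_capacity_bits_per_s(data_lines: int, freq_hz: float) -> float:
    """Return the raw bus capacity in bits per second."""
    return data_lines * freq_hz

DATA_LINES = 1024       # TSVs used for data signals (from the summary)
SIGNAL_FREQ_HZ = 50e6   # 50 MHz signal frequency (from the summary)

capacity_bps = bus_capacity_bits_per_s(DATA_LINES, SIGNAL_FREQ_HZ)
capacity_gbps = capacity_bps / 1e9          # gigabits per second
capacity_gbytes = capacity_bps / 8 / 1e9    # gigabytes per second

print(f"{capacity_gbps:.1f} Gbps = {capacity_gbytes:.1f} GB/s")
```

Under this assumption the result agrees with both the 51.2 Gbps of the summary and the 6.4 GB/s quoted for the test device.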
Design method of 3D-SCSS using
standard BUS
• Conventional: There is no standard access method between vertically stacked chips.
• Intra-chip: CPU/Memory BUS (e.g. ARM AXI4)
• Inter-chip: High-speed serial, Ethernet, …
• Consider the future scalability/flexibility!
• Each chip would itself be a sufficiently complex system.
• Loosely-coupled architecture is needed.
Mapping KPN[16] on 3D-SCSS
• KPN[16]: Kahn Process Network (Process and FIFO model)
• Mapping
• A process onto a processor element on a chip
• A buffer onto a memory element on a chip
• Application layer
• Process to process data exchange through a buffer
• Control process to reduce the KPN[16] complexity
[Figure: process network of three processes and a control process]
[16] G. Kahn, “The semantics of a simple language for parallel programming,” Proc. of the IFIP
Congress 74. North-Holland Publishing Co., pp. 471-475, 1974
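The process and FIFO model can be tried out in software before any mapping onto chips. The sketch below is only an illustration: Python threads stand in for processor elements and `queue.Queue` objects stand in for the buffers held in memory elements; `SENTINEL`, `process` and `run_network` are hypothetical names, and the control process of the slide is not modeled.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of the KPN model


def process(func, fifo_in, fifo_out):
    """A KPN process: blocking read from the input FIFO, write to the output FIFO."""
    while True:
        token = fifo_in.get()          # blocks until a token is available
        if token is SENTINEL:
            fifo_out.put(SENTINEL)     # forward end-of-stream and stop
            return
        fifo_out.put(func(token))


def run_network(tokens, stages):
    """Chain the stages with unbounded FIFOs and run each stage in its own thread."""
    fifos = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=process, args=(func, fifos[i], fifos[i + 1]))
        for i, func in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for token in tokens:
        fifos[0].put(token)
    fifos[0].put(SENTINEL)

    results = []
    while True:
        token = fifos[-1].get()
        if token is SENTINEL:
            break
        results.append(token)
    for thread in threads:
        thread.join()
    return results


# Three toy processes connected in a chain.
results = run_network([1, 2, 3], [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
print(results)  # [1, 3, 5]
```

Because every read blocks and every channel has a single writer and a single reader, the output does not depend on thread scheduling, which is the property that makes the model attractive for loosely-coupled chips.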
Example Process Network for Image
Recognition (ORB Feature Extraction)
• Image recognition using ORB Feature Descriptor
• Process1: Preprocessing of the input image
• Process2: Scaled Image Generation
• Process3: Key-point extraction
• Process4: Feature description
• Process5: Matching/ Machine Learning
• Process6: Output of the Recognition result
[Figure: process network supervised by a Controller. Input image → P1 (Preprocessing of the input image) → Preprocessed image → P2 (Scaled Image Generation) → Resized images → P3 (Key-point extraction) → Image + Key-point information → P4 (Feature description) → Feature Descriptor → P5 (Matching / Machine Learning) → Matching / Learning Result → P6 (Output of the Recognition result)]
Sub Process Network for Scaled
Images Generation Process
(a) Iterative Resize: processes (a), (b), ..., (g) each apply Image Scaling 1/1.2 to the output of the previous process, giving scales 1, 1/1.2, 1/1.44, ..., 1/2.99, 1/3.58
(b) Independent Resize: processes (A), (B), ..., (G) each scale the original image (scale 1) directly, by 1/1.2, 1/1.44, ..., 1/3.58
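Both sub process networks produce the same set of scales; they differ only in which image each process reads. A small sketch of the two schemes, assuming 7 levels and the 1/1.2 ratio shown in the figure (function names are illustrative):

```python
BASE_RATIO = 1.2   # per-level scale ratio (from the figure)
LEVELS = 7         # sub processes (a)..(g) or (A)..(G)


def iterative_scales(levels, ratio=BASE_RATIO):
    """Each level rescales the previous level's output by 1/ratio."""
    scales, current = [], 1.0
    for _ in range(levels):
        current /= ratio
        scales.append(current)
    return scales


def independent_scales(levels, ratio=BASE_RATIO):
    """Each level rescales the original image directly by (1/ratio)^n."""
    return [(1.0 / ratio) ** n for n in range(1, levels + 1)]


it = iterative_scales(LEVELS)
ind = independent_scales(LEVELS)
for n, (s_it, s_ind) in enumerate(zip(it, ind), start=1):
    print(f"level {n}: 1/{1 / s_it:.2f} (iterative)  1/{1 / s_ind:.2f} (independent)")
```

The last two levels come out as 1/2.99 and 1/3.58, matching the labels in the figure. The iterative chain has a data dependency between neighbouring processes, while the independent scheme has none but every process needs the full-size input.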
Processing time in PC environment
• Image size: 4096×2380
• Software: OpenCV2.4.6.1 (ORB descriptor, resize)
• Resize algorithm: linear interpolation
• Hardware:
• CPU: AMD Phenom II 905e (2.5GHz)
• Observation
• Independent resize takes more time
• Reason: large input image size
[Figure: processing time (ms, 0 to 35) versus the level of the scale pyramid (1 to 7); series: Iterative Image Resize (1/1.2 x 8 times) and Independent Image Resize (1/1.2)^n]
3D-SCSS Mapping Example
(case 1: Iterative Resize)
[Figure: Memory Chip stacked with Processor Chip1, Processor Chip2, ..., Processor Chip7; each processor chip has a processor and on-chip memory and runs one process: (a) resize 1/1.2, (b) resize 1/1.2, ..., (g) resize 1/1.2; FIFO buffers are assigned to the memory chip]
Input: 4K (4096x2304), 1byte/pixel
Buffer sizes: 9.4MB → (a) → 6.5MB → (b) → 4.5MB → ... → 1.0MB → (g) → 0.7MB
Data transfer rate @ 10fps [MB/s]
・(a): R 94MB/s, W 65MB/s
・(b): R 65MB/s, W 45MB/s
・(g): R 10MB/s, W 7MB/s
・Total (memory chip): R 281MB/s, W 194MB/s
[Figure: bar chart of read and write data rate (MB/s, 0 to 100) for sub processes a to g]
POINTS
・7 chips
・Each chip works independently
・no need of sync
・All the results are written to memory
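The per-chip rates for the iterative mapping can be approximated from the input size alone. The sketch below assumes that each level shrinks the pixel count by exactly 1.2^2 and that 1 MB = 10^6 bytes; the slide's own numbers appear to be rounded slightly differently, so the totals agree only to within a few percent.

```python
WIDTH, HEIGHT = 4096, 2304   # 4K input, 1 byte/pixel (from the slide)
FPS = 10                     # frame rate (from the slide)
RATIO = 1.2                  # per-level linear scale ratio
LEVELS = 7                   # sub processes a..g


def level_bytes(n):
    """Approximate size in bytes of pyramid level n (n = 0 is the input image)."""
    return WIDTH * HEIGHT / RATIO ** (2 * n)


def iterative_rates_mb_s():
    """(read, write) rate in MB/s for sub processes a..g: read level n-1, write level n."""
    return [
        (level_bytes(n - 1) * FPS / 1e6, level_bytes(n) * FPS / 1e6)
        for n in range(1, LEVELS + 1)
    ]


rates = iterative_rates_mb_s()
for name, (read, write) in zip("abcdefg", rates):
    print(f"({name}) R {read:6.1f} MB/s  W {write:6.1f} MB/s")

total_read = sum(read for read, _ in rates)
total_write = sum(write for _, write in rates)
print(f"total R {total_read:.0f} MB/s  W {total_write:.0f} MB/s")
```

Each process reads what its predecessor wrote, so the read traffic shrinks along the chain together with the image size.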
3D-SCSS Mapping Example
(case 2: Independent Resize)
[Figure: Memory Chip stacked with Processor Chip1, Processor Chip2, ..., Processor Chip7; each processor chip has a processor and on-chip memory and runs one process: (A) resize 1/1.2, (B) resize 1/1.44, ..., (G) resize 1/3.58; FIFO buffers are assigned to the memory chip]
Input: 4K (4096x2304), 1byte/pixel
Buffer sizes: 9.4MB (input), 6.5MB, 4.5MB, ..., 0.7MB
Data transfer rate @ 10fps [MB/s] (values in parentheses: with broadcast)
・(A): R 94MB/s, W 65MB/s
・(B): (R 94MB/s), W 76MB/s
・(G): (R 94MB/s), W 7MB/s
・Total (memory chip): R 658 (94)MB/s, W 194MB/s
[Figure: bar charts of read and write data rate (MB/s, 0 to 100) for sub processes A to G, without and with broadcast]
POINTS
・Broadcast may reduce data transfer
・need of sync when broadcasting
・All the results are written to memory
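The effect of broadcasting on the independent mapping can be estimated in the same way. This sketch assumes that, with broadcast, a single read of the input image feeds all seven processes, and that 1 MB = 10^6 bytes; the function name is illustrative.

```python
INPUT_BYTES = 4096 * 2304    # 4K input, 1 byte/pixel (from the slide)
FPS = 10                     # frame rate (from the slide)
RATIO = 1.2                  # per-level linear scale ratio
LEVELS = 7                   # sub processes A..G


def independent_totals_mb_s(broadcast):
    """Total (read, write) MB/s at the memory chip for sub processes A..G."""
    input_rate = INPUT_BYTES * FPS / 1e6
    # Without broadcast every process fetches its own copy of the input image.
    # With broadcast one read is assumed to be shared by all processes.
    read = input_rate if broadcast else input_rate * LEVELS
    write = sum(input_rate / RATIO ** (2 * n) for n in range(1, LEVELS + 1))
    return read, write


read_plain, write_plain = independent_totals_mb_s(broadcast=False)
read_bcast, write_bcast = independent_totals_mb_s(broadcast=True)
print(f"no broadcast: R {read_plain:.0f} MB/s  W {write_plain:.0f} MB/s")
print(f"broadcast:    R {read_bcast:.0f} MB/s  W {write_bcast:.0f} MB/s")
```

The write traffic is unchanged by broadcasting; only the read traffic drops, by a factor equal to the number of processes, at the cost of synchronizing the receivers.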
Discussion
• Case 2: Independent Resize is better in terms of data transfer size
• Data transfer reduces with broadcasting!
• Different tradeoff from the conventional system.
[Figure: data transfer time (ms, 0.0 to 3.0), read and write, per sub process name; one panel for Case 1: iterative resize (a to g) and one for Case 2: independent resize (A to G)]
[Figure: PC processing time (ms, 0 to 35) versus the level of the scale pyramid (1 to 7); series: Iterative Image Resize (1/1.2 x 8 times) and Independent Image Resize (1/1.2)^n]
Power estimation for data transfer
• TSV capacitance: 0.3pF
• Energy for 1-bit transfer: 0.3pJ (@1.0V)
• Q [C] = CV, E [J] = QV = CV^2
• E = 0.3 [pF] x (1.0)^2 [V^2] = 0.3 [pJ]
• 1 byte (= 8 bit) transfer: 2.4 [pJ]
• Estimated results (10fps)
• Minimum 691.2μW (69.12μJ per frame)
[Figure: power consumption @ 10fps (µW, 0 to 2,500), split into read and write, for the network mappings (a) Iterative, (b) Independent, (b') Independent, broadcast]
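The estimate can be reproduced from the E = CV^2 model and the total transfer rates of the two mapping examples. In the sketch below only the 691.2 µW minimum is a figure stated on the slide; the other two values are what the same model gives for the stated rates and are shown as an assumption-based check (function names are illustrative).

```python
TSV_CAPACITANCE_F = 0.3e-12   # 0.3 pF per TSV (from the slide)
VOLTAGE_V = 1.0               # 1.0 V signal swing (from the slide)
FPS = 10                      # frame rate (from the slide)


def energy_per_byte_j(c=TSV_CAPACITANCE_F, v=VOLTAGE_V):
    """E = C * V^2 per bit, times 8 bits per byte."""
    return 8 * c * v ** 2


def transfer_power_uw(rate_mb_s):
    """Power in microwatts for a sustained transfer rate given in MB/s."""
    return rate_mb_s * 1e6 * energy_per_byte_j() * 1e6


# Total read + write rates in MB/s, taken from the two mapping slides.
mappings = {
    "(a) Iterative": 281 + 194,
    "(b) Independent": 658 + 194,
    "(b') Independent, broadcast": 94 + 194,
}

for name, rate in mappings.items():
    power = transfer_power_uw(rate)
    print(f"{name}: {power:.1f} uW, {power / FPS:.2f} uJ per frame")
```

With 94 + 194 = 288 MB/s the model gives 691.2 µW, or 69.12 µJ per frame, which is the minimum quoted above and corresponds to the independent mapping with broadcast.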
Conclusion
• Proposed the design method of 3D-SCSS (Three-Dimensional Standard-Chip Stacked System) with Standard BUS
• Stacking by: TSV (Through Silicon Via) + Bump
• Design method: KPN mapping
• A design case of image scaling is studied
• Improvement in performance can be expected by introducing another type of parallel processing, which presents a different tradeoff from that of a normal PC environment.
• Communication and synchronization mechanism
• By realizing broadcast communication in the 3D-SCSS, a further reduction in energy consumption is expected.
THANK YOU
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

2017 09-ohkawa-MCSoC2017-presen

  • 1. DESIGNING EFFICIENT PARALLEL PROCESSING IN 3D STANDARD-CHIP STACKING SYSTEM WITH STANDARD BUS Takeshi Ohkawa* Kanemitsu Ootsu* Takashi Yokota* Katsuya Kikuchi** Masahiro Aoyagi** * Dept. of Information Systems Science, Graduate School of Engineering, Utsunomiya University ** 3D Integration System Group, Nano-electronics Research Institute, National Institute of Advanced Industrial Science and Technology (AIST) 2017/9/18 MCSoC2017@Korea Univ., Seoul 1
  • 2. Outline • 1. Introduction • 2. 3D-SCSS (Standard Chip Stacked System) Design Method • Model of target 3D-SCSS system and TSV technology • Model-driven parallel system design flow • 3. Design Example • Overview of image recognition (ORB feature extraction) • Parallel processing for Scale Pyramid Generation Process • Discussion on Processing Performance • Estimation of Power Consumption • 4. Conclusion
  • 3. Research Background • Requirement: high performance at low energy • Smartphones, tablets, information home appliances, IoT (Internet of Things), M2M (Machine to Machine), automotive • e.g. image recognition (local features, DNN) • General tradeoff: flexibility versus energy efficiency • Energy efficiency • ASIC : DSP/FPGA : SW = 1000 : 10 : 1 • Flexibility • State-of-the-art algorithms change fast! • Heterogeneous integration of general/special LSIs → high energy efficiency, high performance
  • 4. Heterogeneous Integration of LSIs • Conventional • Electronic circuit board integration • TSV (Through Silicon Via) technology • extends the limit of horizontal integration • opens holes (vias) through silicon chips to carry electric signals and power supply vertically to the back of the chips • Related technologies • Interposer (a chip dedicated to inter-chip wiring, electrical [7] or optical [8]) • Wireless (vertical) communication between chips [9] • [7] Kurita, Yoichiro, et al. "A novel 'SMAFTI' package for inter-chip wide-band data transfer." Proceedings of the 56th Electronic Components and Technology Conference, IEEE, 2006. • [8] Arakawa, Yasuhiko, et al. "Silicon photonics for next generation system integration platform." IEEE Communications Magazine 51.3, pp. 72-77, 2013. • [9] Miura, Noriyuki, et al. "Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect." IEEE Journal of Solid-State Circuits 40.4, pp. 829-837, 2005.
  • 5. Research Objectives • Proposal of 3D-SCSS with Standard BUS (Three-Dimensional Standard-Chip Stacked System) • To improve design productivity in meeting performance and energy requirements by exploiting the parallelism of the algorithm • Study of an example design case • Mapping the parallel processing of image recognition • Evaluating the performance and energy efficiency
  • 6. Outline • 1. Introduction • 2. 3D-SCSS (Standard Chip Stacked System) Design Method • Model of target 3D-SCSS system and TSV technology • Model-driven parallel system design flow • 3. Design Example • Overview of image recognition (ORB feature extraction) • Parallel processing for Scale Pyramid Generation Process • Discussion on Processing Performance • Estimation of Power Consumption • 4. Conclusion
  • 7. Our Proposal: 3D-SCSS with Standard BUS (Three-Dimensional Standard-Chip Stacked System) • Heterogeneous 3D integration of standard and special-purpose LSI chips for performance/energy • Define a "Standard BUS" for stock and reuse • With a defined standard socket (like PCI on a PC motherboard), the desired system is built up simply by stacking stocked chips • Physical size, layout • Electrical voltage and impedance • Layered communication protocols: datalink, network, transport, application… • Our GOAL • To improve design productivity in meeting performance and energy requirements at the application level
  • 8. 3D Interconnect for Multi Core Internal Bus • 2D interconnect (horizontal 2D system): 64- or 128-bit on-chip bus connecting RF/analog, memory, logic and I/O blocks of a multi-core system LSI - Long wiring for bus communication - Limited number of signal lines - Large bus-driver circuits - Many repeater buffer circuits - Difficult to integrate heterogeneous chips • 3D interconnect (vertical 3D system): 1600-bit wide interchip bus over a TSV (Through Si Via)/micro-bump array (2017 update: 20,000-bit) - Short wiring for bus communication - Wide interchip bus architecture - Smaller bus interface circuits (low-capacitance TSVs) • Heterogeneous multi LSI chip stack system: 3D interchip communication with high data transfer rate between heterogeneous chips • Presented in 3D Test Workshop, Anaheim CA, USA, September 13, 2013
  • 9. Wide Bus Chip-to-Chip Interconnection to Realize Scalable Stacking for Multi LSI Chip Stack System • COOL Interconnect: wide bus chip-to-chip interconnection for scalable stacking of a heterogeneous multi LSI chip stack system • Figure labels: package substrate, Si interposer, standard interface circuits, TSV (10 μm Φ, 50 μm D), micro-bump (50 μm) - Chip-level stack process for chip stacking: low internal stress bonding - High density of 3D interconnect: fine TSVs/micro-bumps, fine pitch • Presented in 3D Test Workshop, Anaheim CA, USA, September 13, 2013
  • 10. Wide Bus Communication Test LSI Device • Ultra wide bus interface circuits block, occupied area: 2.16 mm square • Power, GND: 400 Al pads • Outer connection: wire bonding pads • Al pad array area: 2 mm square, 40x40 (=1600) pads, 20 μm square, 50 μm pitch • Chip size: 8.3 mm x 6.0 mm • Clock frequency: 50 MHz • Power voltage: 2.5 V • Number of bus signals: 1600 • Bus data rate: 6.4 GB/s @ 1024 bit, 50 MHz • Bus occupied area: 4.67 mm² • TSV/bump area: 4 mm² (86 %) • Driver circuits area: 0.67 mm² (14 %) • Technology: 0.25 μm CMOS • Presented in 3D Test Workshop, Anaheim CA, USA, September 13, 2013
  • 11. Summary of 3D-SCSS standard BUS [10] • Physical size: 2 mm x 2 mm • BUS location: center of the chip • Number of TSVs and bumps [TSVs for data signal]: 1600 (40x40) [1024] • Signal frequency: 50 MHz • Communication capacity: 51.2 Gbps • Power consumption (flip-chip result): 97 mW* @ 50% toggle rate • * Only the I/O power is measured separately. • [10] Aoyagi, M.; Imura, F.; Nemoto, S.; Watanabe, N.; Kato, F.; Kikuchi, K.; Nakagawa, H.; Hagimoto, M.; Uchida, H.; Matsumoto, Y., "Wide bus chip-to-chip interconnection technology using fine pitch bump joint array for 3D LSI chip stacking," 2nd IEEE CPMT Symposium Japan, Kyoto, pp. 1-4, 2012.
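The communication capacity in the summary above follows from the number of data TSVs and the signal frequency. The sketch below reproduces that arithmetic; the function name is illustrative, and the assumption that every data TSV carries one bit per clock cycle is ours, chosen because it matches both the 51.2 Gbps figure here and the 6.4 GB/s figure quoted for the test device.

```python
# Figures quoted on the slides; names are illustrative only.
DATA_TSVS = 1024        # TSVs carrying data signals (out of 1600 in total)
SIGNAL_FREQ_HZ = 50e6   # 50 MHz signal frequency


def bus_capacity_bps(data_lines: int, freq_hz: float) -> float:
    """Raw capacity, assuming one bit per data line per clock cycle."""
    return data_lines * freq_hz


capacity_bps = bus_capacity_bps(DATA_TSVS, SIGNAL_FREQ_HZ)
capacity_gbps = capacity_bps / 1e9          # gigabits per second
capacity_gbytes = capacity_bps / 8 / 1e9    # gigabytes per second

print(f"{capacity_gbps:.1f} Gbps = {capacity_gbytes:.1f} GB/s")
```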
  • 12. Design method of 3D-SCSS using standard BUS • Conventional: there is no standard access method between vertically stacked chips • Intra-chip: CPU/memory BUS (e.g. ARM AXI4) • Inter-chip: high-speed serial, Ethernet, … • Consider future scalability/flexibility! • Each chip will be a sufficiently complex system in itself • A loosely-coupled architecture is needed
  • 13. Mapping KPN [16] on 3D-SCSS • KPN [16]: Kahn Process Network (process and FIFO model) • Mapping • A process onto a processor element on a chip • A buffer onto a memory element on a chip • Application layer • Process-to-process data exchange through a buffer • Control process to reduce the KPN [16] complexity • [Figure: three processes and a control process] • [16] G. Kahn, "The semantics of a simple language for parallel programming," Proc. of the IFIP Congress 74, North-Holland Publishing Co., pp. 471-475, 1974.
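To make the process-and-FIFO model concrete, here is a minimal software sketch of a three-process Kahn-style network. It is not the authors' implementation: threads stand in for processor elements, `queue.Queue` objects stand in for buffers held in a memory element, and the `SENTINEL` end-of-stream marker and all function names are assumptions of this sketch.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)


def source(out_fifo, items):
    """Process that emits tokens into its output FIFO."""
    for item in items:
        out_fifo.put(item)
    out_fifo.put(SENTINEL)


def worker(in_fifo, out_fifo, func):
    """Process that blocks on its input FIFO and forwards transformed tokens."""
    while True:
        token = in_fifo.get()  # blocking read, as in a KPN
        if token is SENTINEL:
            out_fifo.put(SENTINEL)
            return
        out_fifo.put(func(token))


def sink(in_fifo, results):
    """Process that collects tokens until the stream ends."""
    while True:
        token = in_fifo.get()
        if token is SENTINEL:
            return
        results.append(token)


def run_network(items):
    """Wire source -> worker -> sink through two unbounded FIFOs."""
    fifo_a, fifo_b = queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=source, args=(fifo_a, items)),
        threading.Thread(target=worker, args=(fifo_a, fifo_b, lambda x: x * 2)),
        threading.Thread(target=sink, args=(fifo_b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_network([1, 2, 3]))
```

Because each process communicates only through blocking FIFO reads, the output does not depend on how the threads are scheduled, which is the property that makes the model attractive for loosely-coupled chips.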
  • 14. Outline • 1. Introduction • 2. 3D-SCSS (Standard Chip Stacked System) Design Method • Model of target 3D-SCSS system and TSV technology • Model-driven parallel system design flow • 3. Design Example • Overview of image recognition (ORB feature extraction) • Parallel processing for Scale Pyramid Generation Process • Discussion on Processing Performance • Estimation of Power Consumption • 4. Conclusion
  • 15. Example Process Network for Image Recognition (ORB Feature Extraction) • Image recognition using the ORB feature descriptor • Process 1: Preprocessing of the input image • Process 2: Scaled image generation • Process 3: Key-point extraction • Process 4: Feature description • Process 5: Matching / machine learning • Process 6: Output of the recognition result • [Figure: process network P1 to P6 with a controller; data passed between processes: input image, preprocessed image, resized images, image + key-point information, feature descriptor, matching/learning result]
  • 16. Sub Process Network for Scaled Images Generation Process • (a) Iterative resize: sub-processes (a), (b), …, (g) each scale the previous level by 1/1.2, giving scales 1, 1/1.2, 1/1.44, …, 1/2.99, 1/3.58 • (b) Independent resize: sub-processes (A), (B), …, (G) scale the original image directly by 1/1.2, 1/1.44, …, 1/3.58
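The two sub-process networks produce the same set of pyramid scales but differ in what each stage reads. The sketch below computes the scale divisors both ways using the step of 1.2 and the seven levels from the slide; the function names are illustrative and no image data is involved.

```python
SCALE_STEP = 1.2   # each pyramid level is 1/1.2 of the previous one
LEVELS = 7         # sub-processes (a)..(g) or (A)..(G)


def iterative_divisors(step: float, levels: int) -> list:
    """Each stage shrinks the previous stage's output by 1/step."""
    divisors, current = [], 1.0
    for _ in range(levels):
        current *= step
        divisors.append(current)
    return divisors


def independent_divisors(step: float, levels: int) -> list:
    """Each stage shrinks the original image directly by 1/step**n."""
    return [step ** n for n in range(1, levels + 1)]


iterative = iterative_divisors(SCALE_STEP, LEVELS)
independent = independent_divisors(SCALE_STEP, LEVELS)

for level, divisor in enumerate(independent, start=1):
    print(f"level {level}: 1/{divisor:.2f}")
```

In the iterative form, stage n cannot start before stage n-1 has produced its output; in the independent form, all stages depend only on the original image and can run in parallel.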
  • 17. Processing time in PC environment • Image size: 4096×2380 • Software: OpenCV 2.4.6.1 (ORB descriptor, resize) • Resize algorithm: linear interpolation • Hardware: • CPU: AMD Phenom II 905e (2.5 GHz) • Observation • Independent resize takes more time • Reason: large input image size • [Figure: processing time (ms) versus level of the scale pyramid (1 to 7), comparing iterative image resize (1/1.2 x 8 times) with independent image resize (1/1.2)^n]
  • 18. 3D-SCSS Mapping Example (case 1: Iterative Resize) • Stack: memory chip plus Processor Chip1, Processor Chip2, …, Processor Chip7 (each with a processor and on-chip memory) • Sub-processes: (a) resize 1/1.2, (b) resize 1/1.2, …, (g) resize 1/1.2 • Input: 4K (4096x2304), 1 byte/pixel • FIFO buffers are assigned to the memory chip • Frame sizes shown: 9.4 MB, 6.5 MB, 4.5 MB, …, 1.0 MB, 0.7 MB • Data transfer rates shown @ 10 fps: R 94 MB/s / W 65 MB/s, R 65 MB/s / W 45 MB/s, …, R 10 MB/s / W 7 MB/s; total R 281 MB/s / W 194 MB/s • POINTS ・7 chips ・Each chip works independently ・No need for sync ・All the results are written to memory • [Figure: read and write data rate (MB/s) for sub-processes a to g]
  • 19. 3D-SCSS Mapping Example (case 2: Independent Resize) • Stack: memory chip plus Processor Chip1, Processor Chip2, …, Processor Chip7 (each with a processor and on-chip memory) • Sub-processes: (A) resize 1/1.2, (B) resize 1/1.44, …, (G) resize 1/3.58 • Input: 4K (4096x2304), 1 byte/pixel • FIFO buffers are assigned to the memory chip • Frame sizes shown: 9.4 MB, 6.5 MB, 4.5 MB, …, 0.7 MB • Data transfer rates shown @ 10 fps: R 94 MB/s / W 65 MB/s, (R 94 MB/s) / W 76 MB/s, …, (R 94 MB/s) / W 7 MB/s; total R 658 (94 with broadcast) MB/s / W 194 MB/s • POINTS ・Broadcast may reduce data transfer ・Sync is needed when broadcasting ・All the results are written to memory • [Figure: read and write data rate (MB/s) for sub-processes A to G, without and with broadcast]
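The read totals for case 2 can be reproduced from the per-chip figure: without broadcast each of the seven processor chips reads the full input frame separately, while with broadcast the frame crosses the bus once. A small sketch, with illustrative names and the 94 MB/s per-chip read rate taken from the slide:

```python
READ_PER_CHIP_MB_S = 94   # full 4K frame (9.4 MB) read at 10 fps
PROCESSOR_CHIPS = 7       # sub-processes (A)..(G)


def total_read_rate(per_chip_mb_s: int, chips: int, broadcast: bool) -> int:
    """Aggregate read traffic on the bus for the independent-resize mapping."""
    transfers = 1 if broadcast else chips
    return per_chip_mb_s * transfers


without_broadcast = total_read_rate(READ_PER_CHIP_MB_S, PROCESSOR_CHIPS, False)
with_broadcast = total_read_rate(READ_PER_CHIP_MB_S, PROCESSOR_CHIPS, True)

print(f"read without broadcast: {without_broadcast} MB/s")
print(f"read with broadcast:    {with_broadcast} MB/s")
```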
  • 20. Discussion • Case 2 (independent resize) is better in terms of data transfer size • Data transfer is reduced by broadcasting! • This is a different tradeoff from that of the conventional system • [Figure: data transfer time (ms), read and write, per sub-process for case 1 (iterative resize, a to g) and case 2 (independent resize, A to G)] • [Figure: PC processing time (ms) versus level of the scale pyramid (1 to 7), iterative image resize (1/1.2 x 8 times) versus independent image resize (1/1.2)^n]
  • 21. Power estimation for data transfer • TSV capacitance: 0.3 pF • Energy for 1-bit transfer: 0.3 pJ (@ 1.0 V) • Q [C] = CV, E [J] = QV = CV² • E = 0.3 [pF] x (1.0)² [V²] = 0.3 [pJ] • 1 byte (= 8 bit) transfer: 2.4 [pJ] • Estimated results (10 fps) • Minimum 691.2 μW (69.12 μJ per frame) • [Figure: power consumption @ 10 fps (μW), read and write, for network mappings (a) iterative, (b) independent, (b') independent with broadcast]
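The estimate above can be reproduced by multiplying the energy per byte by the aggregate transfer rate of each mapping. The sketch below uses the slide's E = CV² model and the read/write totals quoted on the two mapping slides; the assumptions are ours that 1 MB means 10^6 bytes and that the 691.2 μW minimum corresponds to the broadcast mapping (94 MB/s read plus 194 MB/s write), which is the combination that matches the quoted value.

```python
TSV_CAPACITANCE_F = 0.3e-12   # 0.3 pF per TSV
SUPPLY_VOLTAGE_V = 1.0


def energy_per_byte_j(cap_f: float, volt: float) -> float:
    """E = C * V^2 per bit (the slide's model), times 8 bits."""
    return 8 * cap_f * volt ** 2


def transfer_power_w(rate_mb_s: float, cap_f: float, volt: float) -> float:
    """Power for a sustained transfer rate, taking 1 MB as 10**6 bytes."""
    return rate_mb_s * 1e6 * energy_per_byte_j(cap_f, volt)


# Read + write totals in MB/s at 10 fps, as quoted on the mapping slides.
mappings = {
    "iterative": 281 + 194,
    "independent": 658 + 194,
    "independent, broadcast": 94 + 194,
}

power_uw = {
    name: transfer_power_w(rate, TSV_CAPACITANCE_F, SUPPLY_VOLTAGE_V) * 1e6
    for name, rate in mappings.items()
}

for name, value in power_uw.items():
    print(f"{name}: {value:.1f} uW")
```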
  • 22. Conclusion • Proposed a design method for 3D-SCSS (Three-Dimensional Standard-Chip Stacked System) with Standard BUS • Stacking by: TSV (Through Silicon Via) + bump • Design method: KPN mapping • A design case of image scaling was studied • Performance improvement can be expected from introducing another type of parallel processing, which has a different tradeoff from that in a normal PC environment • Communication and synchronization mechanism • By realizing broadcast communication in the 3D-SCSS, energy consumption is expected to be reduced further