A Method for Reducing Communication Overhead in DMA-Based Many-Core Processors @SWoPP2011 ARC-196
Information Processing Society of Japan (IPSJ), Special Interest Group on Computer Architecture, ARC-196 (SWoPP2011 @ Kagoshima)


  3. 3. Intel Single Chip Cloud Computer: 48 cores (x86). TILERA TILE-Gx100: 100 cores (MIPS).
  5. 5. [Figure: target architecture. Labels: Node, Core, Local Memory, INCC, R, DRAM Controller (×4)]
  10. 10. [Figure: block diagram. Components: Core, Store Buffer, Violation Detector (Head Address 0x1000, Tail Address 0x2000, DMA Status), Local Memory (Scratchpad), DMA Controller, Router, On Chip Network. Signal labels: load, store, store stall, DMA Req, DMA status read, write DMA reg, DMA Transfer. Example operation: PUT to (3,2). Legend: data, control.]
  14. 14. Performance (Normalized Execution Cycle)

      # cores = 16

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | #buf=8 | 96.27% | 99.85% | 97.86% | 97.25% | 99.98% |
      | #buf=16 | 92.64% | 99.85% | 94.86% | 97.21% | 99.98% |
      | #buf=32 | 89.50% | 99.85% | 92.26% | 97.19% | 99.98% |

      # cores = 64

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | #buf=8 | 96.83% | 99.83% | 91.91% | 92.80% | 99.90% |
      | #buf=16 | 94.20% | 99.83% | 89.27% | 92.80% | 99.82% |
      | #buf=32 | 91.90% | 99.83% | 81.51% | 92.80% | 99.87% |
  15. 15. Core Stall Rate

      # cores = 16

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | Blocking | 10.22% | 0.22% | 21.39% | 1.32% | 3.76% |
      | Proposal: #buf=8 | 0.71% | 0.00% | 19.33% | 0.68% | 1.74% |
      | Proposal: #buf=16 | 0.55% | 0.00% | 18.02% | 0.66% | 1.73% |
      | Proposal: #buf=32 | 0.00% | 0.00% | 16.70% | 0.62% | 1.73% |

      # cores = 64

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | Blocking | 7.43% | 0.31% | 26.57% | 2.33% | 8.05% |
      | Proposal: #buf=8 | 0.16% | 0.00% | 19.42% | 1.64% | 4.46% |
      | Proposal: #buf=16 | 0.17% | 0.00% | 16.37% | 1.63% | 4.38% |
      | Proposal: #buf=32 | 0.00% | 0.00% | 9.88% | 1.65% | 4.36% |
  16. 16. Aggressive Store Rate (Rate of Store into Store Buffer)

      # cores = 16

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | Proposal: #buf=8 | 0.456% | 0.013% | 0.311% | 0.135% | 0.003% |
      | Proposal: #buf=16 | 0.691% | 0.013% | 0.517% | 0.131% | 0.003% |
      | Proposal: #buf=32 | 1.194% | 0.013% | 0.932% | 0.111% | 0.003% |

      # cores = 64

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | Proposal: #buf=8 | 0.330% | 0.000% | 1.147% | 0.046% | 0.014% |
      | Proposal: #buf=16 | 0.486% | 0.000% | 1.852% | 0.045% | 0.015% |
      | Proposal: #buf=32 | 0.843% | 0.000% | 3.092% | 0.044% | 0.017% |
  17. 17. Aggressive Store Rate (for Data) (Rate of Store into Store Buffer)

      # cores = 16

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | Proposal: #buf=8 | 0.000% | 0.000% | 0.311% | 0.000% | 0.000% |
      | Proposal: #buf=16 | 0.217% | 0.000% | 0.517% | 0.000% | 0.000% |
      | Proposal: #buf=32 | 0.703% | 0.000% | 0.932% | 0.000% | 0.000% |

      # cores = 64

      | Configuration | Pipeline | Stencil | All to All | Matrix Multiply (Cannon) | Bitonic Sort |
      |---|---|---|---|---|---|
      | Proposal: #buf=8 | 0.000% | 0.000% | 1.147% | 0.000% | 0.000% |
      | Proposal: #buf=16 | 0.147% | 0.000% | 1.852% | 0.000% | 0.001% |
      | Proposal: #buf=32 | 0.495% | 0.000% | 3.092% | 0.000% | 0.002% |
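
The labels on slide 10 (a Violation Detector holding Head Address 0x1000 and Tail Address 0x2000, a DMA Status flag, a Store Buffer, and "store stall" signals) suggest that a store aimed at the region of an in-flight DMA transfer is diverted into the store buffer instead of stalling the core, and that the core stalls only once the buffer is full. The Python model below is a minimal sketch of that reading, not the authors' design. The class and method names, the half-open address range, the mapping of `buffer_entries` to "#buf", and the policy of draining the buffer when the transfer completes are all assumptions made for illustration. Loads and store-to-load forwarding are not modeled.

```python
from collections import deque


class ViolationDetector:
    """Toy model of one in-flight DMA region [head, tail) guarding local memory.

    Hypothetical names throughout; this is an illustration, not the slide's hardware.
    """

    def __init__(self, buffer_entries):
        self.buffer_entries = buffer_entries  # assumed to correspond to "#buf"
        self.store_buffer = deque()           # pending (address, value) pairs
        self.dma_active = False               # stands in for the "DMA Status" flag
        self.head = 0
        self.tail = 0
        self.memory = {}                      # stands in for the local memory

    def start_dma(self, head, tail):
        """Record the address range of a transfer that has just been issued."""
        self.head, self.tail, self.dma_active = head, tail, True

    def conflicts(self, addr):
        """True when addr lies inside the region of an active transfer."""
        return self.dma_active and self.head <= addr < self.tail

    def store(self, addr, value):
        """Return 'written', 'buffered', or 'stall' for one store from the core."""
        if not self.conflicts(addr):
            self.memory[addr] = value
            return "written"
        if len(self.store_buffer) < self.buffer_entries:
            self.store_buffer.append((addr, value))
            return "buffered"
        return "stall"  # buffer full: the core would have to wait

    def finish_dma(self):
        """Mark the transfer complete and drain buffered stores in order."""
        self.dma_active = False
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value
```

With `buffer_entries=2` and a transfer over 0x1000 to 0x2000, the first two conflicting stores are buffered and the third reports a stall, while stores outside the range go straight to memory. Under this reading, a larger buffer would let more conflicting stores proceed before the core has to wait.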
