3D Microprocessor Design: Stacking at different granularities


  1. 3D Microprocessor Design: Stacking at different granularities. Alberto Villegas Erce, Seminar on Computer Systems, Turku University, April 2010.
  2. Introduction: Concepts review. Previously on 3D world... Industry trends: make it faster, smaller and cuter, but do not forget the price. 3D design benefits: shorter wire length, speed increase, lower power consumption. Challenges: risk of defects, heat problems, design complexity. Through-Silicon Vias (TSVs): vertical electrical connections passing completely through a silicon die. Low power consumption; low latency; increasing integration level (10k-100k per cm²).
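
To get a feel for the TSV integration level quoted above, the density range can be turned into an approximate spacing between vias. This is a minimal sketch that assumes the TSVs sit on a uniform square grid, which the slides do not state; the function name `tsv_pitch_um` is made up for illustration.

```python
import math

def tsv_pitch_um(density_per_cm2: float) -> float:
    """Centre-to-centre pitch in micrometres, assuming a uniform square grid."""
    cm_in_um = 1e4  # 1 cm = 10,000 micrometres
    # N vias per cm^2 on a square grid means sqrt(N) vias along each 1 cm edge.
    return cm_in_um / math.sqrt(density_per_cm2)

# The two ends of the density range given on the slide.
for density in (10_000, 100_000):
    print(f"{density} TSVs/cm^2 -> pitch of about {tsv_pitch_um(density):.1f} um")
```

Under that assumption, moving from the low to the high end of the range shrinks the pitch by a factor of about 3.2 (the square root of 10).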
  3. Introduction: Today. Three-dimensional puzzle: how to face 3D design? Decompose the 2D design at different granularities. (1) Entire cores and caches: add functionality with high 2D reuse. (2) Functional unit blocks: performance improvement and power reduction; must re-floorplan and retime paths. (3) Logic gates (block splitting): reduce latency and power on routes at every level; needs new 3D circuit design, methodologies and layout tools.
  4. Introduction: Index. 1 Stacking Complete Modules; 2 Stacking Functional Unit Blocks; 3 Splitting Functional Unit Blocks; 4 Conclusions.
  5. Stacking Complete Modules: Index. 1 Stacking Complete Modules; 2 Stacking Functional Unit Blocks; 3 Splitting Functional Unit Blocks; 4 Conclusions.
  6. Stacking Complete Modules: Idea. Three-dimensional stacked caches. Idea: break and stack existing modules. Starting point: a conventional dual-core processor featuring a 4MB L2 cache. Design options for 3D stacking: reduce space or increase storage.
  7. Stacking Complete Modules: Increasing storage. L2 cache controller in 3D. Objective: add more storage to the L2 cache. Stacking a second silicon layer: an additional 8MB of cache with nearly no impact on L2 access latency. Traditional 2D solution: double the silicon area, with increased latency.
  8. Stacking Complete Modules: Increasing storage. L2 cache controller in 3D (cont.). DRAM solution: much greater storage density; greater latency (50-150 cycles); silicon area reduced by half. Hybrid solution: SRAM stores only the tags; DRAM stores the actual data.
  9. Stacking Complete Modules: Increasing storage. L2 cache controller in 3D (testing). Three test programs. Program A: small working set that fits in the 4MB SRAM cache. Program B: larger working set that does not fit in the 4MB SRAM cache but does fit within the 32MB DRAM cache. Program C: streaming memory access patterns; poor cache hit rate for both configurations.
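
The behaviour of the three test programs can be illustrated with a simple average-access-time model. This is only a sketch: the hit rates, the SRAM hit latency and the main-memory penalty below are invented for illustration and are not measurements from the presentation; only the DRAM cache latency is picked from the 50-150 cycle range the slides mention.

```python
def avg_access_cycles(hit_rate, hit_latency, miss_penalty):
    """Average L2 access time: hit latency plus the expected cost of a miss."""
    return hit_latency + (1.0 - hit_rate) * miss_penalty

SRAM_HIT = 20       # assumed latency of the small SRAM cache, in cycles
DRAM_HIT = 100      # chosen inside the 50-150 cycle range from the slides
MEM_PENALTY = 300   # assumed extra cycles to reach main memory on a miss

# Assumed (SRAM hit rate, DRAM hit rate) for each test program.
PROGRAMS = {
    "A": (0.99, 0.99),  # working set fits in both caches
    "B": (0.40, 0.95),  # fits only in the larger DRAM cache
    "C": (0.05, 0.05),  # streaming: poor hit rate in both
}

def compare(name):
    """Return (SRAM cycles, DRAM cycles) for one of the test programs."""
    sram_rate, dram_rate = PROGRAMS[name]
    sram = avg_access_cycles(sram_rate, SRAM_HIT, MEM_PENALTY)
    dram = avg_access_cycles(dram_rate, DRAM_HIT, MEM_PENALTY)
    return sram, dram

for name in PROGRAMS:
    sram, dram = compare(name)
    print(f"Program {name}: SRAM {sram:.0f} cycles, DRAM {dram:.0f} cycles")
```

With these assumed numbers the small SRAM cache is faster for program A, the large DRAM cache is faster for program B, and neither helps program C much, which matches the qualitative picture on the slide.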
  10. Stacking Complete Modules: 3D optionality. 3D integration... for everyone? 3D integration increases the silicon required for the chip (layers) ⇒ increased manufacturing cost; extra manufacturing steps for bonding; impact on yield rates. 3D is not the general answer! One way to use 3D stacking is as a means to optionally augment the processor with some additional functionality.
  11. Stacking Complete Modules: 3D optionality. Introspective 3D processors. Objective: access to more dynamic information about the internal state of a microprocessor.
  12. Stacking Complete Modules: 3D optionality. Reliable 3D processors. Problem: the small device sizes in modern processors make them vulnerable to data corruption. Solutions: Redundancy: two or three copies of the processor operating in lock-step ⇒ multiple pipelines increase cost. Leading execution/trailing checking cores: the trailing core re-executes instructions (not in lock-step) ⇒ the additional pipeline still increases area. Extra wires eliminated. Stack it! Optional checker core. Unutilized silicon area.
  13. Stacking Functional Unit Blocks: Index. 1 Stacking Complete Modules; 2 Stacking Functional Unit Blocks; 3 Splitting Functional Unit Blocks; 4 Conclusions.
  14. Stacking Functional Unit Blocks: Introduction. Nowadays: these technologies are at an early stage of development. 3D integration will require design automation tools, layout support, and verification and validation methodologies. Future: reorganize the processor pipeline in new ways.
  15. Stacking Functional Unit Blocks: Removing wires. Pentium III & IV branch misprediction. Problem: wire delays have not improved as fast as transistor speeds. [Figures: PIII branch misprediction; PIV branch misprediction.] Solution: a 3D implementation in which distant blocks are vertically stacked on top of each other.
  16. Stacking Functional Unit Blocks: Removing wires. Alpha 21264. Problem: a superscalar processor with multiple execution units (EUs) requires a bypass network to forward results between all of the EUs ⇒ wiring. 2D solution: divide the EUs into two groups or clusters, each with its own bypass network, with communication between the clusters. 3D solution: stack the clusters.
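
The wiring pressure behind clustering can be illustrated by counting forwarding paths. The sketch below assumes a full bypass network in which every execution unit can forward its result to every unit (itself included), so the count grows quadratically; that counting model, the function names and the unit count of 8 are illustrative assumptions, not details taken from the slides.

```python
def full_bypass_paths(n_units):
    """Forwarding paths when every unit can forward to every unit (assumed model)."""
    return n_units * n_units

def clustered_bypass_paths(n_units, n_clusters=2):
    """Intra-cluster forwarding paths only; cross-cluster links are not counted."""
    per_cluster = n_units // n_clusters
    return n_clusters * per_cluster * per_cluster

n = 8  # hypothetical number of execution units
print("full bypass:", full_bypass_paths(n))            # 64 paths
print("two clusters:", clustered_bypass_paths(n, 2))   # 32 paths
```

In this model two clusters halve the fast forwarding paths, at the price of slower communication between clusters. Stacking the clusters targets exactly that remaining cross-cluster distance.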
  17. Stacking Functional Unit Blocks: Trade-offs. Removing wires: trade-offs. Cons: non-trivial engineering effort; modify the pipeline; verify and validate the new design; additional costs. Pros: opportunities to optimize the processor pipeline; physical reduction in the amount of wiring.
  18. Stacking Functional Unit Blocks: TSV reality. Removing wires: TSV reality. Problem: After stacking two blocks there is enough room for placing TSVs. Solution: different layouts of the TSVs. Wire overhead reintroduction: the reintroduced wires do not completely cancel the 3D wire reduction benefits.
  19. Splitting Functional Unit Blocks: Index. 1 Stacking Complete Modules; 2 Stacking Functional Unit Blocks; 3 Splitting Functional Unit Blocks; 4 Conclusions.
  20. Splitting Functional Unit Blocks: Introduction. Last level: logic gates. Split individual functional units across multiple layers. Reorganize the functional unit block ⇒ more compact 3D arrangement. Benefits: reduced length of intra-block wiring; improved operating frequencies. We will introduce a starting point for thinking about this.
  21. Splitting Functional Unit Blocks: 3D cache organizations. First view. Problem: the L2 cache consumes about half of the overall die area. Worst-case routing distance: 2x+4y. Two stacking possibilities. Banks on cores: half the space; access distance equal. Banks on banks: half the space; access distance reduced.
  22. Splitting Functional Unit Blocks: Splitting the cache in 3D. Problem: wires within each bank also impact overall latency. Split individual cache banks across multiple layers. Columns on columns: best latency. Rows on rows: energy reduction.
  23. Splitting Functional Unit Blocks: Splitting the cache in 3D (testing). Experimental results: SPICE simulation; column-on-column organization; SRAM implementations in a 65-nm process.
  24. Splitting Functional Unit Blocks: 3D adders. Classic look-ahead carry adder, n = 16 bits. Critical path along bit[0]-bit[n-1]. Several ways to split the adder. Based on inputs: x on the bottom layer, y on the top layer; the first level of the propagate layer is split; half the wire length. By significance: least significant bits on the bottom layer, most significant bits on the top layer; TSV between the root nodes.
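
The "by significance" split can be sketched in software with the generate/propagate signals that a look-ahead carry adder is built on. The code below is an illustrative model, not the circuit from the slides: it evaluates the carry recurrence bit by bit, whereas hardware computes the same carries with a parallel look-ahead tree. The names `gp_add` and `split_add` are made up, and the single carry handed from the lower half to the upper half stands in for the signal that would cross a TSV.

```python
def gp_add(x, y, n, carry_in=0):
    """Add two n-bit values using per-bit generate (g) and propagate (p) signals."""
    carry = carry_in
    result = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        g = xi & yi                  # this bit generates a carry
        p = xi ^ yi                  # this bit propagates an incoming carry
        result |= (p ^ carry) << i   # sum bit
        carry = g | (p & carry)      # carry into the next bit
    return result, carry

def split_add(x, y, n=16):
    """Model of the split by significance: low half on one layer, high half on the other."""
    half = n // 2
    mask = (1 << half) - 1
    # Bottom layer: least significant bits.
    low, carry_mid = gp_add(x & mask, y & mask, half)
    # Top layer: most significant bits; carry_mid models the inter-layer signal.
    high, carry_out = gp_add(x >> half, y >> half, half, carry_mid)
    return (high << half) | low, carry_out

total, carry = split_add(12345, 54321)
print(total, carry)
```

The point of the model is that only one carry has to travel between the two halves, so the critical path from bit[0] to bit[n-1] crosses layers once instead of running across the full width of a planar adder.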
  25. Conclusions: Index. 1 Stacking Complete Modules; 2 Stacking Functional Unit Blocks; 3 Splitting Functional Unit Blocks; 4 Conclusions.
  26. Conclusions. Benefits of organizing components in 3D: wire lengths can be significantly reduced; devices from different technologies can be tightly integrated and combined. 3D organizations may be required depending on the exact design constraints and objectives.
  27. Conclusions. Cons: finer granularity ⇒ more redesign. Stacking can increase heat. Long level of technological development. Every redesign process leads to a cost increase.
  28. References. Gabriel H. Loh, "Three-Dimensional Microprocessor Design", Springer Science, 2010. Gabriel H. Loh, "A Modular 3D Processor for Flexible Product Design and Technology Migration", ACM, 2008. B. Black, "Die-stacking (3D) microarchitecture", International Symposium on Microarchitecture, pp. 469-479, 2006.
  29. The end. Thank you. Questions? Please be nice.
