CSTalks-Polymorphic heterogeneous multicore systems-17Aug



  1. blog.nus.edu.sg/cstalks
  2. Polymorphic Heterogeneous Multi-Core Systems. Mihai Pricopi, CSTalks, August 17, 2011
  3. Motivation: Single-core performance (complexity) increase
  4. Motivation: Instruction-level parallelism (ILP)
       1: e = a + b
       2: f = c + d
       3: g = e * f
       4: h = f * 2
     [Figure: graph of instructions 1, 2, 3, 4]
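The four-instruction example on the ILP slide can be worked through in a few lines of Python. This is only a sketch, not something taken from the talk: it assumes every instruction takes one cycle, that any number of instructions can issue per cycle, and that only read-after-write dependences matter. The names `PROGRAM` and `schedule` are made up for illustration.

```python
# Each entry is (destination, source operands), mirroring the slide's example.
PROGRAM = [
    ("e", ("a", "b")),   # 1: e = a + b
    ("f", ("c", "d")),   # 2: f = c + d
    ("g", ("e", "f")),   # 3: g = e * f
    ("h", ("f",)),       # 4: h = f * 2
]


def schedule(program):
    """Return the earliest issue cycle of each instruction.

    Assumptions (not from the slides): unit latency, unlimited issue
    width, and only read-after-write dependences are tracked.
    """
    ready = {}    # register -> cycle at which its value becomes available
    cycles = []
    for dest, sources in program:
        # Inputs never written by the program are available at cycle 0.
        start = max((ready.get(src, 0) for src in sources), default=0)
        cycles.append(start)
        ready[dest] = start + 1
    return cycles


cycles = schedule(PROGRAM)
depth = max(cycles) + 1          # length of the longest dependence chain
ilp = len(PROGRAM) / depth       # average instructions per cycle
print(cycles, depth, ilp)        # prints: [0, 0, 1, 1] 2 2.0
```

Under these assumptions, instructions 1 and 2 can issue together, and 3 and 4 follow one cycle later, so the four instructions need two cycles instead of four.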
  5. Motivation
     [Figure: images labeled 2006 and 2007]
  6. Motivation: Thread-level parallelism (TLP)
       • Multi-threaded applications
       • Multi-programmed jobs
     [Figure: labels Process, Process0 and Process1 over cores P0 and P1]
  7. 7. Motivation nVidia Tesla many-core: up to 960 simple and identical cores. Massively exploiting the TLP. Sequential programs suffer from limited ILP exploitation. A gap between TLP and ILP. Solution: heterogeneous systems to accommodate the gap between TLP and ILP.Mihai Pricopi CSTalks 7
  8. Heterogeneous Chip Multi-processors
       • Multi-core systems that use cores with different performance parameters.
       • Existing results show that heterogeneous systems are more efficient than homogeneous ones in terms of performance, power, area and delay.
       • Heterogeneity can be reached by using:
           ◦ Asymmetric chip multi-processors (ACMPs)
           ◦ Multiprocessor system-on-chip (MPSoC)
           ◦ Architectures that dynamically reconfigure their internal structure to adapt to different software requirements (polymorphic)
  9. Heterogeneous Chip Multi-processors: Asymmetric chip multi-processors (ACMPs)
     [Figure: core layouts with cores labeled P0 to P4]
  10. Heterogeneous Chip Multi-processors: Multiprocessor system-on-chip (MPSoC)
      [Figure: block diagram labeled ARM, memory controller, bridges, DSP, video accelerator]
  11. Program Phase Behavior - gzip
  12. Program Phase Behavior - gcc
  13. Polymorphic Heterogeneous Multi-Core Systems
       • General-purpose applications
       • Novel architecture that can be tailored according to the software requirements
       • Base system: homogeneous processor
       • Reconfigurable capabilities
       • Internal structure adaptation
       • Core-coalition
       • Memory
      [Figure: grid of cores P0 to P15 with RF blocks]
  14. Polymorphic Heterogeneous Multi-Core Systems - Reconfigurable Fabric
       • Reconfigurable hardware shared by different processors
       • RF implements custom instructions
       • Dynamic reconfiguration at runtime for speedup
      [Figure: the example 1: e = a + b, 2: f = c + d, 3: g = e * f, 4: h = f * 2, with a Custom Instruction and an RF between P0 and P1]
  15. Polymorphic Heterogeneous Multi-Core Systems - Reconfigurable Fabric
       • Challenging problems:
           ◦ The amount of RF is limited.
           ◦ Deciding when to reconfigure the RF (scheduling).
           ◦ Choosing the set of custom instructions that gives the highest speedup.
           ◦ Overhead of the dynamic reconfiguration.
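One way to picture the custom-instruction selection problem under a limited amount of RF is as a 0/1 knapsack. The sketch below is an illustration only and is not the method used in the talk: the candidate names, their area costs and their cycle savings are invented, and reconfiguration overhead and scheduling are ignored.

```python
def select_custom_instructions(candidates, area_budget):
    """0/1 knapsack: maximize cycles saved within the RF area budget.

    candidates: list of (name, area, cycles_saved), with area a positive int.
    Returns (total cycles saved, tuple of chosen names).
    """
    # best[a] = (cycles saved, chosen names) using at most `a` area units
    best = [(0, ())] * (area_budget + 1)
    for name, area, saved in candidates:
        # Walk the budget downward so each candidate is picked at most once.
        for a in range(area_budget, area - 1, -1):
            prev_saved, prev_names = best[a - area]
            if prev_saved + saved > best[a][0]:
                best[a] = (prev_saved + saved, prev_names + (name,))
    return best[area_budget]


# Hypothetical candidates: (name, RF area units, cycles saved per run).
CANDIDATES = [
    ("ci_mac", 4, 30),
    ("ci_shift_add", 3, 20),
    ("ci_crc", 5, 45),
    ("ci_sat", 2, 12),
]

saved, chosen = select_custom_instructions(CANDIDATES, area_budget=8)
print(saved, chosen)   # prints: 65 ('ci_shift_add', 'ci_crc')
```

A realistic formulation would also have to charge for reconfiguration time and decide when to reconfigure, which are the other challenges listed on the slide.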
  16. Polymorphic Heterogeneous Multi-Core Systems - Core Structure Adaptation
       • Similar performance can be achieved by using smaller processor internal units.
       • Instruction fetch window size, issue width, instruction window size and frequency can be changed dynamically.
       • Power and thermal concerns.
  17. Polymorphic Heterogeneous Multi-Core Systems - Core-Coalition
       • Coalition helps create “stronger” cores from the existing light cores:
           ◦ accelerates serial applications by extracting more ILP (if available).
           ◦ uses a limited amount of hardware shared between cores.
           ◦ a coalition of up to 4 cores can be formed.
      [Figure: 2-core coalition: P0 (2-way) and P1 (2-way) ≡ P (4-way)]
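The "if available" qualifier on the coalition slide can be illustrated with a toy scheduler that compares a 2-way core with an idealized 4-way coalition. This is a rough sketch under strong assumptions that are not from the talk: unit latency, a greedy scheduler, and no coalition overhead at all. The function name and the two example programs are hypothetical.

```python
def cycles_needed(deps, issue_width):
    """Greedy list scheduling with unit latency.

    deps[i] lists the instructions that instruction i depends on
    (the dependence graph must be acyclic). At most `issue_width`
    ready instructions execute per cycle.
    """
    finish = {}                       # instruction -> cycle it executed in
    remaining = set(range(len(deps)))
    cycle = 0
    while remaining:
        # Ready means every producer executed in an earlier cycle.
        ready = sorted(
            i for i in remaining
            if all(d in finish and finish[d] < cycle for d in deps[i])
        )
        for i in ready[:issue_width]:
            finish[i] = cycle
            remaining.remove(i)
        cycle += 1
    return cycle


INDEPENDENT = [[] for _ in range(8)]   # eight instructions, no dependences
CHAIN = [[], [0], [1], [2]]            # each instruction needs the previous one

for name, deps in (("independent", INDEPENDENT), ("chain", CHAIN)):
    print(name, cycles_needed(deps, 2), cycles_needed(deps, 4))
# prints:
# independent 4 2
# chain 4 4
```

In this toy model the wider coalition halves the cycle count only when the program exposes enough independent instructions; a strictly serial chain gains nothing.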
  18. Polymorphic Heterogeneous Multi-Core Systems - Core-Coalition Execution Model
       • SF: Sentinel instruction fetch and global renaming
       • RF: Regular instruction fetch, decode and renaming
       • EX: Regular instruction execution
       • CM: Regular instruction commit
      [Figure: CFG with basic blocks B0 to B4, and a timeline of the SF, RF, EX and CM stages on Core 0 and Core 1]
      Mihai Pricopi, National University of Singapore
  19. Experimental Results - Speedup
  20. Experimental Results - Load Balance
  21. Proposed directions
       • Next steps:
           ◦ Implement Coalition on FPGA.
           ◦ More study on the overhead and power consumption determined by the shared resources.
           ◦ Implement a dynamic scheduler for Coalition.
  22. ?
  23. Next Week’s Talk: A Unified Framework for Recommendations in the Social Network, by Chen Wei. Join us next Wednesday! Wednesday, 31 August, 2011
