DFX Architecture for High
Performance Multi-core Processors




           Ishwar Parulkar
        Sun Microsystems, Inc.
DFX Architecture for High-performance Multi-core Microprocessors

This presentation was given at ITC 2008 (International Test Conference). It deals with DFX challenges and solution for high count multi-core microprocessors. Acknowledgment: Co-authors on ITC presentation - Gaurav Agarwal, Sriram Anandakumar, Gordon Liu, Rajesh Pendurkar, Krishna Rajan and Frank Chiu.

Transcript of "DFX Architecture for High-performance Multi-core Microprocessors"

  1. DFX Architecture for High Performance Multi-core Processors. Ishwar Parulkar, Sun Microsystems, Inc.
  2. Outline
     •  Processor Overview
     •  DFX Challenges/Opportunities in 3rd Gen CMT
     •  Scan flop and chain configurations
     •  Embedded Memories
     •  Deterministic Functional Test
     •  System Test and Debug
     •  Enhancing Yield and RAS
     •  Conclusions
  4. Processor Overview
  5. Processor Die Photograph [figure]
  7. Characteristics of 3rd Gen CMT Processors
     •  16 or more complex, multi-threaded cores
        –  Scout threads
        –  Execute ahead
        –  Simultaneous speculative threading
        –  Transactional memory
        –  Parallelization of programs
        –  Near-linear scalability for multiple sockets
     •  High bandwidth in and out of chip => SerDes
     •  Chip configurations with subset of cores
  8. DFX Challenges in 3rd Gen CMT Processors
     •  Amplification of DFX cost because of high degree of replication
        –  global versus local trade-offs
     •  Testing of complex structures
        –  3-D register files; multi-ported memories
     •  Testing large-scale implementation of SerDes
     •  Deterministic behavior on ATE and in system in presence of non-deterministic SerDes
  9. DFX Opportunities in 3rd Gen CMT Processors
     •  Yield enhancement
        –  Binning on throughput performance
     •  On-line Availability
        –  Detection and isolation of defective cores and/or thread hardware
     •  Rapid design of derivative chip family
        –  Minimal DFX design, verification and test pattern generation effort
  11. Scan Flop
      [Schematic: scan flop with D input and Q output, functional clock (func clk), scan-in clock (scan_in clk), scan_in and scan_out]
  12. Choice of Scan Flop
      •  Scan path and operation impervious to variation
         –  scan circuits are uniform across flop types - dynamic front-ends; pulse clocked
         –  scan path is static and robust across process variation
      •  Can be extended (by adding one more latch) to observe flop state dynamically
      •  Consumes less dynamic power because of reduced load on functional clock
  13. Scan Chain Architecture
      •  Requirements: how do you manage 1.35 million scan flops in a CMT design?
      •  Considerations in architecting scan chains
         –  Efficient identification of partial good cores
         –  Partial core chip configurations
         –  Handling of special flops in non-ATPG scenarios (e.g. redundancy registers, clock control, Logic BIST, etc.)
         –  Efficiency of scan patterns on ATE
         –  IDS probe loop time for debug
         –  Efficiency of scan-dump in system debug
         –  Usability of scan in presence of scan bugs
  14. Scan Chain Architecture
      [Diagram: CC Level Scan Configuration; m scan chains between chip scan inputs and chip scan outputs]
  16. Embedded Memory Test - Challenges
      •  Scale
         –  >1100 instances of embedded memories
      •  Variation of size and type
         –  2MB L2-cache to 8-32 entry queues
      •  Complex, specialized arrays
         –  3-D register files; CAM-RAM combinations; multi-ported memories
      •  Sun/TI specific test requirements
         –  Direct pin access to large memories
         –  Efficiency and ease of bit-mapping
  17. At-speed Test of Memories via Scan
      [Schematic labels: W Address, W Word Lines, W Enable, R Address, R Word Lines, R Enable, Memory Clock, W Data, R Data, Clock Header, Functional Clock, Scan_In Clock]
  18. SPARC ASI Network
      •  Access network on chip corresponding to Address Space Identifier (ASI) in SPARC memory model
      •  Uses of ASI accesses
         –  Normal operation
            •  Chip configuration by software
            •  Transfer of information programmed in E-fuse farm to internal registers
         –  Failures in field
            •  Diagnosis of failures
            •  Reconfiguration of chip
         –  Engineering Bring-up
            •  Error injection for post-silicon validation of RAS
            •  Observability during debug
  19. SPARC ASI Network Implementation
      •  ASI network is hierarchical
         –  Star and daisy chain
      •  ASI routing hubs in units
         –  Packets routed based on destination array ID
      •  Dedicated or shared ASI paths
         –  Muxing could be anywhere in the path to array
      [Diagram labels: CORE, Pipeline, Switch, ASI Network, Service Port, System Management Control Unit; MEMORY with WRITE DATA, ADDRESS, R/W CONTROL and READ DATA]
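The routing idea on slide 19 (a star at the switch, daisy-chained hubs inside units, packets steered by destination array ID) can be illustrated with a small software model. This is only a sketch: the packet fields, the array IDs, the chain names and the classes `AsiPacket`, `MemoryArray`, `Hub` and `Switch` are all hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AsiPacket:
    array_id: int          # destination array ID (hypothetical encoding)
    address: int
    write: bool
    data: int = 0


@dataclass
class MemoryArray:
    cells: Dict[int, int] = field(default_factory=dict)

    def access(self, pkt: AsiPacket) -> Optional[int]:
        # A write stores the data word; a read returns it (0 if never written).
        if pkt.write:
            self.cells[pkt.address] = pkt.data
            return None
        return self.cells.get(pkt.address, 0)


@dataclass
class Hub:
    """Routing hub in a unit: serve a local array or forward down the daisy chain."""
    arrays: Dict[int, MemoryArray]
    next_hub: Optional["Hub"] = None

    def route(self, pkt: AsiPacket) -> Optional[int]:
        if pkt.array_id in self.arrays:
            return self.arrays[pkt.array_id].access(pkt)
        if self.next_hub is None:
            raise LookupError(f"no array with ID {pkt.array_id} on this chain")
        return self.next_hub.route(pkt)


class Switch:
    """Star level: pick the daisy chain that owns the destination array."""

    def __init__(self, chains: Dict[str, Hub], chain_of: Dict[int, str]):
        self.chains = chains
        self.chain_of = chain_of      # array ID -> chain name (assumed static map)

    def send(self, pkt: AsiPacket) -> Optional[int]:
        return self.chains[self.chain_of[pkt.array_id]].route(pkt)


def build_example() -> Switch:
    # One chain with two hubs for a core, one single-hub chain for an L2 bank.
    core_tail = Hub({0x11: MemoryArray()})
    core_head = Hub({0x10: MemoryArray()}, next_hub=core_tail)
    l2_hub = Hub({0x20: MemoryArray()})
    return Switch({"core0": core_head, "l2": l2_hub},
                  {0x10: "core0", 0x11: "core0", 0x20: "l2"})
```

In this model a write to array `0x11` passes through the first core hub before reaching the hub that owns the array, which mirrors the daisy-chain portion of the hierarchy.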
  20. Memory Test Network
      [Diagram: CORE, Pipeline, Switch, ASI Network, Service Port, System Management Control Unit; MEMORY with WRITE DATA, ADDRESS, R/W CONTROL and READ DATA]
      ASI - Address Space Identifier
  21. Memory Test Network
      [Diagram as on slide 20, adding the IEEE 1149.1 TAP and the MTCU]
      MTCU - Memory Test Control Unit
  22. Memory Test Network
      [Diagram as on slide 21, adding the DMTA Port]
      DMTA - Direct Memory Test Access (Slow Speed)
  23. Memory Test Network
      [Diagram as on slide 22, adding the DMO Port and the Space/Time Multiplexer]
      DMO - Direct Memory Observe (High Speed)
  24. Memory Test Network
      •  DFX requirements imposed on ASI network (architectural and implementation)
         –  Loads and stores on consecutive clock cycles
         –  Order of transactions maintained during transit
         –  Direct access to memory via network
         –  Error checking logic disabled (parity, ECC)
         –  Data word replication for wide memories
         –  Broadcast mode (for initialization)
         –  Network integrity mode (for diagnosis)
  25. Central MBIST Programmability
      •  Parameters of Memory under Test
         –  ASI ID of Memory
         –  Routing information (core/unit ID)
         –  ASI data bits to be masked
         –  Size of address space
         –  R/W cycle access time of memory
      •  Address permutation programmability
         –  MBIST engine has incrementor/decrementor
         –  Program bit position of ASI address bit for MBIST sequencer before test
      •  Debug and bit-mapping support
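A minimal sketch of what address permutation programmability could look like in software terms, assuming the engine's incrementor/decrementor produces a plain binary count and a programmable table says which ASI address bit each counter bit drives. The function names and the `bit_map` representation are hypothetical; the slides do not describe the register format.

```python
from typing import List


def permute_address(count: int, bit_map: List[int]) -> int:
    """Map counter bit i onto ASI address bit position bit_map[i]."""
    addr = 0
    for i, pos in enumerate(bit_map):
        if (count >> i) & 1:
            addr |= 1 << pos
    return addr


def address_sequence(size: int, bit_map: List[int], descending: bool = False) -> List[int]:
    """Addresses visited when the counter increments (or decrements) through `size` values."""
    counts = range(size - 1, -1, -1) if descending else range(size)
    return [permute_address(c, bit_map) for c in counts]
```

With `bit_map = [2, 0, 1]` the fastest-toggling counter bit drives address bit 2, so an 8-entry array is visited in the order 0, 4, 1, 5, 2, 6, 3, 7. Every address is still covered exactly once; only the physical walk order changes, which is what lets one central engine adapt to arrays with different row and column address assignments.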
  26. 3-D Register File
      •  Stores multiple copies of architectural state for speculation and threading
         –  a static portion optimized for area
         –  an active portion optimized for speed
  27. 3-D Register File (Schematic) [figure]
  28. MBIST Algorithm for 3-D Memories
      •  Static Portion: Only Write Ports
      •  Active Portion: Write and Read Ports
      •  RESTORE Function: Transfers contents from Static to Active Portion
      •  MBIST Algorithm
         –  First, test Active array like a typical SRAM
         –  For Static array
            •  in place of READ of Static array, do a RESTORE followed by READ of Active array in next cycle
            •  align accesses to maintain back-to-back cycle accesses of March tests
  29. MBIST Algorithm for 3-D Memories
      Clock Cycles      0    1    2    3    4    5    6
      Accesses          R0   W1   R1   R0   W1   R1   R0
      Address Seq       Address X      Address X+1
  30. MBIST Algorithm for 3-D Memories
      Clock Cycles      0    1    2    3    4    5    6
      Accesses          R0   W1   R1   R0   W1   R1   R0
      Address Seq       Address X      Address X+1

      Clock Cycles          0    1    2    3    4    5    6
      Static Address Seq    Address X      Address X+1
      Static Accesses       ®    W1   ®    ®    W1   ®    ®
      Active Accesses            R0   _    R1   R0   _    R1
      Active Address Seq         Address X      Address X+1
      ® = Restore
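The substitution described on slides 28 to 30 (a RESTORE followed by a READ of the active array standing in for a READ of the static array) can be sketched as a behavioural model. This is an illustration under assumptions, not the presented implementation: cells hold a single bit, the initial write-0 pass and the `stuck_at_zero` fault injection are added here for demonstration, and the cycle-level overlap of accesses is not modelled.

```python
from typing import List, Optional, Tuple


class RegFile3D:
    """Static portion has write ports only; active portion has write and read ports."""

    def __init__(self, depth: int, stuck_at_zero: Optional[int] = None):
        self.static = [0] * depth
        self.active = [0] * depth
        self.stuck_at_zero = stuck_at_zero   # injected fault, for illustration only

    def write_static(self, addr: int, bit: int) -> None:
        self.static[addr] = 0 if addr == self.stuck_at_zero else bit

    def restore(self, addr: int) -> None:
        # RESTORE transfers the static contents into the active portion.
        self.active[addr] = self.static[addr]

    def read_active(self, addr: int) -> int:
        return self.active[addr]


def read_static(rf: RegFile3D, addr: int) -> int:
    """Stand-in for a static READ: RESTORE, then READ the active array."""
    rf.restore(addr)
    return rf.read_active(addr)


def march_static(rf: RegFile3D, depth: int) -> List[Tuple[int, int, int]]:
    """Run W0 on all addresses, then (R0, W1, R1) per address.

    Returns a list of (address, expected, observed) mismatches.
    """
    failures: List[Tuple[int, int, int]] = []

    def check(addr: int, expected: int) -> None:
        got = read_static(rf, addr)
        if got != expected:
            failures.append((addr, expected, got))

    for addr in range(depth):
        rf.write_static(addr, 0)
    for addr in range(depth):
        check(addr, 0)               # R0
        rf.write_static(addr, 1)     # W1
        check(addr, 1)               # R1
    return failures
```

A fault-free model returns no mismatches, while a static cell that cannot hold a 1 is caught by the R1 step at its address even though the static array is never read directly.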
  32. Determinism for Functional Test: The Problem
      •  Functional test required for
         –  Speed binning
         –  Timing path debug on ATE
         –  Repeatability for logic debug in system
         –  Emulate system behavior on ATE for correlation
      •  Sources of indeterminism
         –  Indeterminism in Rx (SerDes receivers)
         –  Indeterminism in Tx (SerDes transmitters)
         –  Asynchronous clock domain crossings
      •  Cache-resident functional test is a partial solution
  33. Processor Clock Domains
      [Diagram: Main Core Clock Domain (2.3 GHz), IO Logical Layer Clock Domain (1.33 GHz), SerDes Physical Layer Clock Domain (1.33 GHz); lanes Tx1 to TxN and Rx1 to RxN]
  34. Indeterminism on Tx path
      [Figure: one row of positions 0 to 8 with R and W markers]
  36. Indeterminism on Tx path
      [Figure: two rows of positions 0 to 8, each with W and R markers]
  38. Deterministic Tx path
      [Figure: two rows of positions 0 to 8, each with W and R markers]
  39. Indeterminism on Rx path
      [Figure: one row of positions 8 down to 0 with W and R markers]
  40. Indeterminism on Rx path
      [Figure: two rows of positions 8 down to 0, each with W and R markers]
  44. Deterministic Rx path
      [Figure: two rows of positions 8 down to 0, each with W and R markers]
  46. Deterministic Rx path
      [Figure: two rows of positions 8 down to 0, each with W and R markers; READ_DELAY]
      [Rx timeline labels: Rx enables sync byte detection; Rx starts incrementing write pointer; Rx enables byte alignment; Aligned? (YES / NO: jog by 1 bit); Rx detects sync byte]
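The "Aligned? NO: jog by 1 bit" loop on the Rx timeline can be sketched as a bit-slip search. This is a minimal sketch under assumptions: the sync pattern value `0xBC`, the MSB-first bit order, the jog limit and all function names are hypothetical, the ordering of the timeline steps is inferred rather than stated on the slide, and the write pointer and READ_DELAY handling are not modelled.

```python
from typing import List

SYNC = 0xBC  # hypothetical sync byte value; the slides do not give the pattern


def bits_of(data: bytes) -> List[int]:
    """Serialize bytes MSB first (assumed bit order)."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def byte_at(bits: List[int], offset: int) -> int:
    """Assemble the 8 bits starting at `offset` into one byte."""
    value = 0
    for b in bits[offset:offset + 8]:
        value = (value << 1) | b
    return value


def find_alignment(bits: List[int], sync: int = SYNC, max_jogs: int = 7) -> int:
    """Return the bit offset of the sync byte, jogging one bit per failed check."""
    for offset in range(max_jogs + 1):
        if len(bits) >= offset + 8 and byte_at(bits, offset) == sync:
            return offset            # aligned: byte boundaries are now known
    raise ValueError("sync byte not found within the allowed number of jogs")


def receive(bits: List[int], sync: int = SYNC) -> bytes:
    """Align on the sync byte, then return the whole bytes that follow it."""
    start = find_alignment(bits, sync) + 8
    count = (len(bits) - start) // 8
    return bytes(byte_at(bits, start + 8 * i) for i in range(count))
```

For a stream that arrives three bit times late, the search fails at offsets 0, 1 and 2 and succeeds at offset 3, after which the payload is read on byte boundaries. Making the number of jogs, and hence the point where the write pointer starts, repeatable from run to run is the kind of behaviour the deterministic mode is after.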
  48. Deterministic Functional Test Mode
      [Diagram: Main Core Clock Domain (2.3 GHz), IO Logical Layer Clock Domain (1.33 GHz), SerDes Physical Layer Clock Domain (1.33 GHz); lanes Tx1 to TxN and Rx1 to RxN]
      [Annotations: Ratioed (1:1) Synchronous; Ratioed 2:1 Synchronous; Fixed Phase in Half Data Rate Mode; CDR Output; De-skew Alignment; Pointer Passing]
  50. System Test/Debug
      •  ServiceLink – Serial System Management Interface with Service Processor (SP) as master
      •  Logic BIST in addition to scan ATPG
      •  Memory BIST
         –  Default configuration available via ServiceLink
      •  Interconnect BIST
         –  All loopback modes and programmable knobs (phase, amplitude, CDR sampling, etc.) accessible via ServiceLink
         –  Ability to plot eye diagrams in system
      •  BIST included in Power-on Self-test (POST)
  51. Use of DFX Features in System
      •  Use of DFX features in enterprise class systems?
         –  Productization/Engineering
            •  Early electrical validation of system infrastructure
            •  Correlation of measurements in ATE versus system environments
         –  Manufacturing
            •  High quality test of components in embedded environment
         –  In Field
            •  Efficient POST
            •  Reduction of field NTF (No Trouble Found)
  53. Enhancing Product Yield
      •  Size of core cluster (4 cores) = 58 mm² = 15% of die
         –  Defects in 30% of chip still yield chips with approx. ½ of max throughput
         –  Small memories below repair criteria add up to a large number of bits
      •  DFX features identify partial die configurations
      •  Information programmed into E-fuse farm during manufacturing
      •  Clocks to defective cores disabled and Solaris™ disallows scheduling threads
  54. Enhancing RAS
      •  Logic BIST, Memory BIST and Interconnect BIST run in the field
      •  Fault Management module in Solaris™ isolates and reconfigures
         –  cores, cache ways, cache lines
      •  Hypervisor can dynamically move workloads from a core
      •  Significant improvement in Availability (up-time) and Mean Time Between Unplanned System Interruptions (crashes)
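The arithmetic behind the yield slide can be checked with a short calculation. It rests on assumptions not stated on the slide: exactly 16 cores (the deck says "16 or more"), defects confined to whole core clusters, and throughput proportional to the number of working cores. The constant and function names are illustrative.

```python
CLUSTER_AREA_MM2 = 58.0        # from the slide: one 4-core cluster
CLUSTER_DIE_FRACTION = 0.15    # from the slide: a cluster is 15% of the die
CORES_PER_CLUSTER = 4
TOTAL_CORES = 16               # assumption: the minimum core count mentioned in the deck


def die_area_mm2() -> float:
    """Die area implied by 58 mm^2 being 15% of the die."""
    return CLUSTER_AREA_MM2 / CLUSTER_DIE_FRACTION


def defective_die_fraction(bad_clusters: int) -> float:
    """Share of the die taken by the disabled clusters."""
    return bad_clusters * CLUSTER_DIE_FRACTION


def throughput_fraction(bad_clusters: int) -> float:
    """Remaining throughput if it scales linearly with working cores (assumption)."""
    clusters = TOTAL_CORES // CORES_PER_CLUSTER
    return (clusters - bad_clusters) / clusters
```

Under these assumptions the die is roughly 387 mm², two defective clusters account for 30% of the die, and the remaining two clusters (8 of 16 cores) give half of the maximum throughput, which matches the figure quoted on the slide.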
  56. Conclusions
      •  Highly re-configurable scan chain architecture to manage > 1 million flops in CMT designs
      •  Balance between a central MBIST engine to cover most arrays and a few dedicated engines for specialized arrays
      •  Determinism for functional test/debug will become more challenging at > 10 Gbps – need more observability on chip
  57. Conclusions (contd.)
      •  Ability to sort partially defective chips critical to maximizing yield in CMT products
      •  Defect isolation at thread resolution essential for acceptable uptimes in systems with CMT chips
      •  Modularity and reconfigurability of DFX features enables faster design and productization of derivative CMT chips