"JAGUAR" X86 CORE
FUNCTIONAL VERIFICATION
Zihno Jusufovic
“JAGUAR” X86 LOW-POWER CORE




2 | Jaguar x86 Core Functional Verification | December 2012
TWO X86 CORES TUNED FOR TARGET MARKETS
§ “Bulldozer” family: performance and scalability
   –  Mainstream client and server markets
§ “Cat” family: flexible, low power, and small
   –  Small die area
   –  Low-power markets
   –  Optimized for cloud clients


Jaguar Hotchips 2012


“JAGUAR” – DESIGN FOR LOW-POWER X86 CORE
§ Jaguar is based on AMD’s Bobcat low-power x86 core, with the goals to:
   –  Improve IPC/power/frequency
   –  Update the ISA/feature set
§ Significant changes between Bobcat and Jaguar:
   –  Entirely new inclusive L2 cache shared among four Jaguar cores
   –  New power-management flow
   –  Update the ISA/feature set:
        –  SSE4.1, SSE4.2
        –  AES, CLMUL
        –  MOVBE
        –  AVX, XSAVE/XSAVEOPT
        –  F16C, BMI1
   –  40-bit physical address capable vs. 36-bit on Bobcat
   –  Improved virtualization
   –  Many design blocks totally or significantly redesigned
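
For scale, the physical-address change above can be worked out directly. This is only an arithmetic sketch; the function name is illustrative and not from the slides:

```python
def addressable_bytes(physical_address_bits: int) -> int:
    """Number of bytes addressable with the given physical address width."""
    return 1 << physical_address_bits


GIB = 1 << 30

bobcat_gib = addressable_bytes(36) // GIB   # 36-bit physical addresses
jaguar_gib = addressable_bytes(40) // GIB   # 40-bit physical addresses
growth = addressable_bytes(40) // addressable_bytes(36)

print(f"36-bit: {bobcat_gib} GiB, 40-bit: {jaguar_gib} GiB, ratio: {growth}x")
```

Four extra address bits multiply the addressable range by 16, from 64 GiB to 1 TiB.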
JAGUAR X86 CORE MICROARCHITECTURE
[Block diagram; recoverable labels listed below]
§ Front end: 32KB ICache, branch prediction, decode and microcode ROMs
§ Integer: Int rename, two schedulers, Int PRF, two ALUs, LAGU, SAGU, Mul, Div
§ Floating point: FP decode/rename, FP scheduler, FP PRF, two VALUs, VIMul, St Conv., FPAdd, FPMul
§ Load/store: 32KB DCache, Ld/St queues
§ BU: to/from shared cache unit
Jaguar Hotchips 2012

JAGUAR COMPUTE UNIT (CU)


§ Four independent Jaguar cores
§ Shared cache unit (SCU)
    –  4 L2 data banks (total 2MB)
    –  L2 interface tile
[Diagram: the CU contains four cores and the SCU; the SCU holds four L2D banks and the L2 interface, which connects to/from the NB.]


Jaguar Hotchips 2012


JAGUAR CORE PIPELINE
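[Pipeline diagram; stages numbered 0 to 13, rows listed as they appear]
§ Fetch and microcode row: Fetch0, Fetch1, Fetch2, Fetch3, Fetch4, Fetch5, uCode ROM, MDec
§ Decode and integer row: Dec0, Dec1, Dec2, iDec, Pack, FDec, Dispatch, Sched, RegRd, ALU, WB
§ FP and load/store row: Transit, FpDec, RegRen, Sched, RegRd1, RegRd2, EXE, WB; AGU, DC1, DC2
§ Branch mispredict latency: 14 cycles
§ Load-use latency, L1 hit: 3 cycles
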

Jaguar Hotchips 2012


"JAGUAR" FUNCTIONAL VERIFICATION




"JAGUAR" FUNCTIONAL VERIFICATION STRATEGY

§ Jaguar core is verified with test benches at multiple levels:
   –  Unit (Whacker)-level test benches
       §  ID
       §  DE
       §  FP
       §  LSDC
       §  BU
       §  L2I
       §  MP

   –  Top (Cluster or CPC)-level test bench
   –  System (SOC)-level test benches




"JAGUAR"
TEST BENCHES
[Figure: the core block diagram overlaid with unit-level test bench boundaries. Labels in the figure: ID Test Bench, DE Test Bench, FP Test Bench, LSDC Test Bench, and BU Test Bench.]


"JAGUAR" TEST BENCHES - CONT
[Figure: the CU diagram (four cores, SCU with four L2D banks and the L2 interface, to/from NB) overlaid with test bench boundaries.]
§ MP test bench: SCU + LS/DC/BU of each core
§ L2I test bench: SCU
§ Top (cluster) test bench: CU

"JAGUAR" FUNCTIONAL VERIFICATION STRATEGY

§ Cluster (Core) verification with mixed C++/SV(OVM/VMM)/assembly
  environment -- random and directed stimulus
§ Unit-level verification with SV OVM/VMM transaction-based random test
  benches
§ Formal verification used in FP and a few other blocks
§ Emulation done at SOC level
§ MVSIM used for power verification in cluster test bench
§ X-propagation targeted with special tool/regressions
§ Extensive use of coverage:
    –  Functional coverage
    –  Code coverage
    –  Microcode coverage


"JAGUAR" FUNCTIONAL VERIFICATION STRATEGY
§ Long bake (soak) time for bug hunting
§ Maintain high passing rate through the entire project
    –  Core/CPC team organized around “always tape-out ready” principle
         § Main code line should always be higher than 90% pass rate
         § Anything below 90% is considered a crisis -- all hands required to drive up pass rate
    –  Features developed on branches and merged when healthy enough to support
       main line pass rate > 90%
§ Different stimulus strategies used at different levels
    –  Core test bench uses mix of random exercisers (generators) and directed tests
       supported by global tools
         § Biased towards exerciser-based new development
         § Conscious effort to not write new directed tests because of maintenance costs
         § Rigorous core debug strategy
    –  Unit test benches use SV OVM/VMM-constrained random transaction-based
       tests
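
The 90% pass-rate rule above can be sketched as a simple gate. This is a minimal illustration, not the team's actual tooling; `RegressionSummary`, `main_line_status`, and `branch_may_merge` are hypothetical names, and only the 90% threshold comes from the slide:

```python
from dataclasses import dataclass

PASS_RATE_FLOOR = 0.90  # threshold quoted on the slide


@dataclass
class RegressionSummary:
    """Pass/fail counts for one regression run (hypothetical record type)."""
    passed: int
    failed: int

    @property
    def pass_rate(self) -> float:
        total = self.passed + self.failed
        return self.passed / total if total else 0.0


def main_line_status(summary: RegressionSummary) -> str:
    """Anything below the floor is treated as a crisis."""
    return "crisis" if summary.pass_rate < PASS_RATE_FLOOR else "healthy"


def branch_may_merge(projected: RegressionSummary) -> bool:
    """Merge only if the main line would stay strictly above the floor."""
    return projected.pass_rate > PASS_RATE_FLOOR


print(main_line_status(RegressionSummary(passed=850, failed=150)))
print(branch_may_merge(RegressionSummary(passed=960, failed=40)))
```

Here `projected` stands for the results of running the main-line regression on the feature branch before merging, which is an assumption about how "healthy enough" would be measured.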
JAGUAR TOP (CLUSTER)-LEVEL TEST BENCH BLOCK DIAGRAM

[Block diagram; recoverable components listed below]
§ DUT: CU with four cores and the SCU (L2I and four L2D banks)
§ Fake UNB
§ System model: DRAM memory, I/O memory
§ MP memory model: I/O memory, DRAM memory
§ x86 ISA models, one per core
§ Bridge code
§ Various core/CU-level checkers, irritators, and cache preloaders
§ Various monitors and programmable drivers


"JAGUAR" TOP-LEVEL STIMULUS

§ x86 random test generators
    –  Many single-threaded and multi-threaded generators
    –  The current generator has more directed-random capabilities and is used extensively in
       core/cluster-level test plan execution
§ Heavy emphasis on random and coverage for new stimulus requirements
§ Randomize control/configuration register state on per-test basis
§ L1/L2 cache preloaders and other dynamic, random irritators:
    –  MCA, TLBs, external probes, power-management events, interrupts, etc.
§ Fake UNB:
    –  Built-in randomization for things like memory-read latency
§ Large number of self-checking x86 directed tests, mostly legacy:
    –  Use coverage-based test case selection to reduce run cost
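
The slides do not describe how coverage-based test case selection was done. One common approach is a greedy pick of the test with the most new coverage per unit of run cost, sketched below. The test names, coverage points, and costs are made up for illustration:

```python
from typing import Dict, List, Set


def select_tests(coverage_by_test: Dict[str, Set[str]],
                 cost_by_test: Dict[str, float]) -> List[str]:
    """Greedily pick tests until every coverage point hit by any test is covered."""
    remaining: Set[str] = set().union(*coverage_by_test.values())
    selected: List[str] = []
    while remaining:
        candidates = [t for t in coverage_by_test if t not in selected]
        # Best ratio of newly covered points to run cost; ties keep dict order.
        best = max(candidates,
                   key=lambda t: len(coverage_by_test[t] & remaining) / cost_by_test[t])
        selected.append(best)
        remaining -= coverage_by_test[best]
    return selected


coverage = {
    "t_a": {"p1", "p2", "p3"},
    "t_b": {"p3", "p4"},
    "t_c": {"p1", "p2"},
    "t_d": {"p4"},
}
cost = {"t_a": 3.0, "t_b": 1.0, "t_c": 1.0, "t_d": 2.0}
print(select_tests(coverage, cost))
```

Greedy selection is not guaranteed to find the cheapest possible subset, but it is simple and usually removes most redundant legacy tests from a regression list.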



"JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS
§ Checking:
     –  x86 ISA model
         §  Architectural state compared at instruction retire
         §  MP memory model checks all memory accesses, ordering rules, and consistency
         §  Also used in MP unit-level test bench

     –  Cache coherency checkers
         §  MOESI state and corresponding data checked between all caches

     –  Variety of other cluster-level checkers (e.g., power management, probes, stalls)
     –  Thousands of inline RTL assertions
     –  All unit-level checkers re-used in top test bench
     –  Self-checking legacy-directed tests
§ Coverage:
     –  Heavy use and dependency on functional coverage
     –  Code coverage
     –  Microcode code coverage
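
The "architectural state compared at instruction retire" check above can be pictured as a lockstep comparison between the RTL and the x86 ISA model. The sketch below is an assumption about the general shape of such a checker; the record layout (a dict of register names to values) and the function name are hypothetical:

```python
from typing import Dict, List, Optional, Tuple

ArchState = Dict[str, int]  # hypothetical: register name -> value after retire


def first_mismatch(rtl_retires: List[ArchState],
                   model_retires: List[ArchState]) -> Optional[Tuple[int, str]]:
    """Return (retire index, field) of the first divergence, or None if none."""
    for index, (rtl, model) in enumerate(zip(rtl_retires, model_retires)):
        for field in sorted(set(rtl) | set(model)):
            if rtl.get(field) != model.get(field):
                return index, field
    if len(rtl_retires) != len(model_retires):
        # One side retired more instructions than the other.
        return min(len(rtl_retires), len(model_retires)), "retire_count"
    return None


model = [{"rip": 0x1000, "rax": 1}, {"rip": 0x1003, "rax": 2}]
rtl_bad = [{"rip": 0x1000, "rax": 1}, {"rip": 0x1003, "rax": 7}]
print(first_mismatch(rtl_bad, model))
```

Reporting the first divergence, rather than every one, keeps the fail signature close to the root cause, since later mismatches are usually consequences of the first.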
"JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS

§ 24x7 regression runs
     –  Use machine resources effectively
         §  Have enough pending sims to keep all machines busy

     –  Requires a good, organized debug effort to cover all fails
§ User-friendly regression database with many options/filters
     –  Helps synchronize debug efforts among multiple teams
§  Debug methodology
     –  Debug to root cause




"JAGUAR" TOP TEST BENCH METRIC

§ Test plan completeness
§ Functional, code, and microcode coverage
§ Regression cycles/instructions, pass rates, and fail signatures
§ RTL bug rates and open backlog
§ Verification bug rates and open backlog
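
The slides do not say how fail signatures were derived. One plausible approach is to normalize each failure message by masking run-specific numbers, then count the resulting buckets, as sketched here with made-up messages:

```python
import re
from collections import Counter
from typing import Iterable


def signature(message: str) -> str:
    """Mask hex literals and decimal numbers so similar failures share a key."""
    masked = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    return re.sub(r"\d+", "<n>", masked)


def bucket_failures(messages: Iterable[str]) -> Counter:
    """Count failures per normalized signature."""
    return Counter(signature(m) for m in messages)


failures = [
    "MOESI mismatch at 0x1f40 cycle 1200",
    "MOESI mismatch at 0x2a80 cycle 98",
    "timeout after 5000 cycles",
]
for sig, count in bucket_failures(failures).most_common():
    print(count, sig)
```

Grouping by signature lets one debug owner take a whole bucket instead of several people triaging the same underlying failure.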




UNIT-LEVEL TEST BENCHES SUMMARY (1 OF 3)

§ All unit-level test benches based on SV (VMM or OVM)
§ Most stimulus is constrained random transaction-based
    –  Coverage-driven random stimulus
    –  Randomization of control/configuration register is shared with higher-level test
       benches
    –  Stimulus “state targets” with time-outs
         §  Stimulus attempts to put the DUT in a targeted state, with a time-out, to catch deadlock/livelock bugs
         §  Examples: Artificial reduction of RTL queue size

§ Multi-unit test bench used to target coherency
§ Good simulation performance -- cycles per second (CPS)
    –  Goal: 5-10x faster than the top test bench
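
The "state target with time-out" idea above can be sketched with a toy model. Everything here is hypothetical: `ToyQueue` stands in for an RTL queue whose depth has been artificially reduced, and `drive_to_target` for the stimulus loop:

```python
from typing import Callable


def drive_to_target(step: Callable[[], None],
                    in_target_state: Callable[[], bool],
                    timeout_cycles: int) -> int:
    """Step the stimulus until the target state is reached; return cycles used.

    Raises TimeoutError if the target is not reached, which is how a
    deadlock or livelock would surface.
    """
    for cycle in range(timeout_cycles):
        if in_target_state():
            return cycle
        step()
    if in_target_state():
        return timeout_cycles
    raise TimeoutError(f"target state not reached in {timeout_cycles} cycles")


class ToyQueue:
    """Toy stand-in for a queue with a reduced depth."""

    def __init__(self, depth: int, stuck: bool = False) -> None:
        self.depth = depth
        self.entries = 0
        self.stuck = stuck  # models a bug that stops the queue from filling

    def push(self) -> None:
        if not self.stuck and self.entries < self.depth:
            self.entries += 1

    def is_full(self) -> bool:
        return self.entries == self.depth


queue = ToyQueue(depth=4)
print(drive_to_target(queue.push, queue.is_full, timeout_cycles=100))
```

Shrinking the queue depth makes the "full" state reachable in a handful of cycles, so the corner case is hit far more often than with the production depth.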




UNIT-LEVEL TEST BENCHES SUMMARY (2 OF 3)

§ 100% functional and code coverage, with a few coverage points waived
    –  Selectively exporting functional coverage points to high-level test benches
§ Checking done using assertions, high-level checkers, and x86 ISA model
    –  All checks are re-used in the higher-level test benches
    –  Checks for unit stimulus constraints exported to higher-level test benches
    –  Create overlap of critical checking functions between unit-level and higher-level
       checkers
    –  Black and white box checking
    –  White box checks for:
         § Consistency between fields of different internal queues and arrays
         § Many others…
    –  Thousands of inline RTL assertions

UNIT-LEVEL TEST BENCHES SUMMARY (3 OF 3)

§  Formal verification
    –  Still relying on simulation
    –  FP – FPA and FPM theorem proofs
    –  Debug bus
    –  LS (USQ)
    –  CRC32
    –  DE block
    –  Others
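
Formal proofs need a reference to compare against. Assuming the CRC32 item above refers to the SSE4.2 CRC32 instruction, which accumulates a CRC over the Castagnoli polynomial (CRC-32C), a bitwise reference model could look like the sketch below. This is not the team's formal setup, only an illustration of the kind of compact specification such a proof targets:

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reflected form of 0x1EDC6F41


def crc32c_accumulate(crc: int, data: bytes) -> int:
    """Fold bytes into a running CRC-32C value, one bit at a time."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc & 0xFFFFFFFF


def crc32c(data: bytes) -> int:
    """Conventional CRC-32C: all-ones initial value and final inversion."""
    return crc32c_accumulate(0xFFFFFFFF, data) ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))
```

The accumulate step is split out because a running update is the natural unit to check against hardware; the initial value and final inversion are a software convention layered on top.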




"JAGUAR" FP UNIT-LEVEL TEST BENCH BLOCK DIAGRAM

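[Block diagram; recoverable components listed below]
§ DUT: FPU RTL
§ Stimulus side: Test(s), Op db, Opgen, ME BFM, SRB BFM
§ Load and store paths, CCU, broadcast cloud
§ FPU KOS bridge and KOS
§ Monitors (FPU Mon) and checkers
§ CLK, reset, timeout
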

JAGUAR LSDC TEST BENCH BLOCK DIAGRAM
[Block diagram; recoverable components listed below]
§ DUT: DC (data cache, miss buffers, TLBs, table walker) and LS (scheduling and ordering queues, store queue)
§ Front-end agent (represents ID, ME, EX, etc.)
§ Back-end agent (represents BU, NB, etc.; not re-used)
§ MP memory model (I/O memory, DRAM memory) and system memory (DRAM memory, I/O memory)
§ MP probe/write irritator
§ Numerous monitors and scoreboard-based checkers, plus shadow models of D$ and TLBs
§ Transactional stimulus generator recipes (multiply selectable, interleavable)
§ Memory layout and page translation configuration engines
§ Dashed line indicates global re-use in MP test bench
JAGUAR L2I UNIT-LEVEL TEST BENCH BLOCK DIAGRAM
§ L2I connects 4 cores to NB and manages a shared, inclusive L2 cache.
[Block diagram; recoverable components listed below]
§ DUT (SCU): L2I with PRQ, CRQ, DSM, DPM, BANK/L2 TAG, and L2 DATA (several blocks are marked x4)
§ Test and test interface
§ Back door: memory preloaders, irritators, etc.
§ Interface stimulus per external device: transaction generator (test plug-ins), transaction driver (drive-x if invalid), stimulus state
§ Interface checking per device (external and internal): transaction monitor (x-checks), checker, monitored state


"JAGUAR" RTL BUGS FOUND PER TEST BENCH LEVELS


Test Bench Level                                        Percentage of Found RTL Bugs

Unit-level Test Benches                                               31%
Top-level Test Bench                                                  65%
System-level Test Bench                                               4%

•  Bug distribution rate does not match typical/expected distribution
•  Top-level test bench found most bugs due to:
    •  Some RTL blocks covered only in the top-level test bench
    •  Unit-level test benches used extensively by the RTL team to validate bug fixes,
       thanks to good simulation performance (CPS)
    •  Bug hunting late in the project relies more on top-level test benches
       to find corner cases involving multiple blocks

CHALLENGES




POWER-MANAGEMENT VERIFICATION (1 OF 3)
§ New power-management interface between Jaguar and rest of the system
§ Shared L2 cache increases complexity
§ Each core can independently go to different levels of power states
§ Number of possible states very high:
    –  Number of clusters
    –  Number of cores
    –  Number of possible power states
    –  Number of wake-up events
    –  Specific windows of interest
    –  Number of features affected by power-management events (example: probes,
       debug features, etc.)
§ Specification changes


POWER-MANAGEMENT VERIFICATION (2 OF 3)
§ Stimulus
    –  Random generators
         §  Random sequences to change power-management states
         §  Per-thread stimulus

    –  Power-management irritators
    –  UNB BFM built-in randomization for certain power-management events
    –  Very few directed tests
§ Coverage
    –  Functional coverage extensively used
         §  Per-core coverage points
         §  Cross-coverage points

    –  Microcode coverage
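
Per-core and cross-coverage points can be sketched as follows. The power-state names and the choice of three states are placeholders, not taken from the slides; only the four-core count comes from the compute unit description:

```python
from itertools import product
from typing import Iterable, Set, Tuple

POWER_STATES = ("C0", "C1", "CC6")  # hypothetical state names for illustration
NUM_CORES = 4                       # cores per compute unit

Bin = Tuple[str, ...]


def cross_bins() -> Set[Bin]:
    """Every combination of per-core power states across the compute unit."""
    return set(product(POWER_STATES, repeat=NUM_CORES))


def coverage_holes(observed: Iterable[Bin]) -> Set[Bin]:
    """Cross bins that no simulation has hit yet."""
    return cross_bins() - set(observed)


observed = [("C0", "C0", "C0", "C0"), ("CC6", "C0", "C0", "C0")]
print(len(cross_bins()), "bins,", len(coverage_holes(observed)), "holes")
```

Even this toy cross has 3^4 = 81 bins before wake-up events, windows of interest, or multiple clusters are crossed in, which is why the state count grows so quickly and why random stimulus is measured with coverage rather than enumerated with directed tests.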



POWER-MANAGEMENT VERIFICATION (3 OF 3)
§ Checking
    –  Power-management checker
    –  L2I and other checkers
    –  Self-checking directed tests
§  Test bench level
    –  Unit-level test benches (L2I)
    –  Top-level test bench
        §    Most verification is done here because of the heavy dependency on microcode and better simulation
              performance than the SOC-level test bench

    –  SOC (System)-level test benches
        §    For power-management verification, SOC-level test bench is very important:
              –    Some power-management features are very complex and not all details well-documented
              –    Use SOC-level test bench to check power-management constraints used at lower test benches
              –    First level at which all power-management components are integrated


COHERENCY, SELF-MODIFYING CODE, AND CROSS-MODIFYING CODE (1 OF 2)
§ Coherency -- traditionally an area of concern in multi-processor (MP) environments
    –  MOESI protocol
    –  Common core interface (CCI) protocol between cluster and NB
    –  Inclusive L2 cache shared among four cores (designed from scratch)
         §  Example: Flushing and invalidation of shared caches

    –  Complexity increases with increased number of clusters
§ SMC/CMC handled by hardware in x86
    –  Increased complexity due to inclusive shared L2 cache
§ Out-of-order execution adds complexity
    –  LS block totally redesigned
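
As a rough picture of what a MOESI checker enforces, the sketch below tests textbook MOESI invariants for one cache line across peer caches. It is a simplification under stated assumptions: it ignores the inclusive L2 hierarchy and in-flight transactions, and the function name and data layout are hypothetical:

```python
from typing import List, Optional, Sequence

VALID_STATES = {"M", "O", "E", "S", "I"}


def check_line(states: Sequence[str],
               data: Optional[Sequence[int]] = None) -> List[str]:
    """Return invariant violations for one line; states[i] is cache i's state."""
    unknown = [s for s in states if s not in VALID_STATES]
    if unknown:
        raise ValueError(f"unknown MOESI state(s): {unknown}")
    errors: List[str] = []
    valid = [i for i, s in enumerate(states) if s != "I"]
    exclusive = [i for i in valid if states[i] in ("M", "E")]
    owners = [i for i in valid if states[i] == "O"]
    if exclusive and len(valid) > 1:
        errors.append("M/E copy coexists with another valid copy")
    if len(owners) > 1:
        errors.append("more than one Owned copy")
    if data is not None and len({data[i] for i in valid}) > 1:
        errors.append("valid copies hold different data")
    return errors


print(check_line(["O", "S", "S", "I"], data=[7, 7, 7, 0]))
print(check_line(["M", "S", "I", "I"]))
```

The data check mirrors the "MOESI state and corresponding data" wording used for the cache coherency checkers: all valid copies of a line must agree, even when memory itself is stale.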




COHERENCY, SMC, AND CMC (2 OF 2)
§ Verification
      –  Done at multiple levels of test benches
         §  L2I test bench, top-level test bench, and SOC-level test benches

      –  Multiple levels of checkers (Jaguar-specific checkers and IP checkers)
         §  CCI IP protocol checkers (and coverage)

      –  MP memory-model checker used for ordering and data consistency
      –  Different types of stimulus (SV transaction-based, random generators, and
         directed tests)
         §  Some random generators created to target coherence/SMC/CMC specifically
              –  True and false sharing
         §  Cache preloading
         §  Functional coverage used to check quality of random stimulus
§    MP test bench created to target coherency
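
True and false sharing differ only in whether two threads touch the same bytes or merely the same cache line. The sketch below generates such address pairs; the 64-byte line size, the 4-byte access size, and all names are assumptions for illustration, not details of the generators described above:

```python
import random
from typing import Tuple

LINE_BYTES = 64  # assumed cache-line size


def line_of(address: int) -> int:
    """Index of the cache line containing the address."""
    return address // LINE_BYTES


def sharing_pair(rng: random.Random, kind: str,
                 access_bytes: int = 4) -> Tuple[int, int]:
    """Return two addresses in one line: equal ("true") or disjoint ("false")."""
    base = rng.randrange(1 << 20) * LINE_BYTES
    slots = LINE_BYTES // access_bytes
    first = rng.randrange(slots)
    if kind == "true":
        second = first
    elif kind == "false":
        second = rng.choice([s for s in range(slots) if s != first])
    else:
        raise ValueError(f"unknown sharing kind: {kind}")
    return base + first * access_bytes, base + second * access_bytes


rng = random.Random(2012)  # fixed seed so a failing test can be replayed
a, b = sharing_pair(rng, "false")
print(hex(a), hex(b), line_of(a) == line_of(b))
```

Both cases force the line to move between caches; false sharing additionally checks that unrelated bytes in the line survive the transfers intact.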



JAGUAR MP (LSDC + BU + L2I) TEST BENCH BLOCK DIAGRAM
[Block diagram; recoverable components listed below]
§ DUT: CPC containing the SCU (L2I and four L2D banks) and four “Core” instances
§ Each “Core” (exploded view): BU and LSDC, with BU monitors/checkers and LSDC monitors/checkers
§ Stimulus: LSDC test bench stimulus; IF stimulus (not re-used)
§ Fake NB
§ System memory: DRAM memory, I/O memory
§ MP memory model: I/O memory, DRAM memory
§ Memory layout and page translation configuration engines
§ Various core/CPC-level checkers, irritators, and cache preloaders


MISCELLANEOUS
§ Verification done in multiple geographic locations
    –  Time zone differences
         §  Good for 24-hours-a-day work on a project
         §  Challenge for meetings and communication

    –  Sharing methodologies and tools
§ Jaguar is designed to be used in multiple SOCs
§ Adding new features late in a project
§ Compressed schedule
§ Jaguar verification team worked on two very successful projects (Bobcat and
  Jaguar)
§ Verification team starts a new project




DISCLAIMER & ATTRIBUTION


The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.


The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise
this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.


AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.


AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD
BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY
INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the
United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for
informational purposes only and may be trademarks of their respective owners.




34 | Jaguar x86 Core Functional Verification | December 2012

More Related Content

What's hot

Sun sparc enterprise t5440 server technical presentation
Sun sparc enterprise t5440 server technical presentationSun sparc enterprise t5440 server technical presentation
Sun sparc enterprise t5440 server technical presentationxKinAnx
 
Sun sparc enterprise t5120 and t5220 servers technical presentation
Jaguar x86 Core Functional Verification

  • 1. "JAGUAR" X86 CORE FUNCTIONAL VERIFICATION Zihno Jusufovic
  • 2. “JAGUAR” X86 LOW-POWER CORE 2 | Jaguar x86 Core Functional Verification | December 2012
  • 3. TWO X86 CORES TUNED FOR TARGET MARKETS
      [Figure. Labels: “Bulldozer” family (performance and scalability); “Cat” family (flexible, low power, and small); mainstream client and server markets; low-power markets; small die area; optimized for cloud clients. Source: Jaguar Hotchips 2012]
  • 4. “JAGUAR” – DESIGN FOR LOW-POWER X86 CORE
      - Jaguar is based on AMD’s Bobcat low-power x86 core, with the goals to:
          - Improve IPC/power/frequency
          - Update the ISA/feature set
      - Significant changes between Bobcat and Jaguar:
          - Totally new inclusive L2 cache shared among four Jaguar cores
          - New power-management flow
          - Updated ISA/feature set:
              - SSE4.1, SSE4.2
              - AES, CLMUL
              - MOVBE
              - AVX, XSAVE/XSAVEOPT
              - F16C, BMI1
          - 40-bit physical address capable vs. 36-bit on Bobcat
          - Improved virtualization
          - Many design blocks totally or significantly redesigned
  • 5. JAGUAR X86 CORE MICROARCHITECTURE
      [Block diagram. Labeled blocks: 32KB ICache; branch prediction; decode and microcode ROMs; Int rename, scheduler, and Int PRF; ALU, ALU, LAGU, SAGU, Mul, Div; FP decode/rename, FP scheduler, and FP PRF; VALU, VALU, VIMul, St Conv., FPAdd, FPMul; 32KB DCache; Ld/St queues; BU, to/from the shared cache unit. Source: Jaguar Hotchips 2012]
  • 6. JAGUAR COMPUTE UNIT (CU)
      - Four independent Jaguar cores
      - Shared cache unit (SCU)
          - 4 L2 data banks (total 2MB)
          - L2 interface tile
      [Block diagram: CU containing four cores and the SCU (four L2D banks and the L2 interface, to/from NB). Source: Jaguar Hotchips 2012]
  • 7. JAGUAR CORE PIPELINE
      [Pipeline diagram, cycles 0-13. Stage labels: Fetch0-Fetch5, Dec0-Dec2, MDec, uCode ROM, iDec, Pack, FDec, Dispatch, Sched, RegRd, ALU, WB, AGU, DC1, DC2, Transit, FpDec, RegRen, Sched, RegRd1, RegRd2, EXE, WB. Source: Jaguar Hotchips 2012]
      - Branch mispredict latency: 14 cycles
      - Load-use latency, L1 hit: 3 cycles
  • 8. "JAGUAR" FUNCTIONAL VERIFICATION
  • 9. "JAGUAR" FUNCTIONAL VERIFICATION STRATEGY
      - Jaguar core is verified with test benches at multiple levels:
          - Unit (Whacker)-level test benches: ID, DE, FP, LSDC, BU, L2I, MP
          - Top (Cluster or CPC)-level test bench
          - System (SOC)-level test benches
  • 10. "JAGUAR" TEST BENCHES
      [Block diagram: the core microarchitecture diagram annotated with unit-level test bench labels: ID Test Bench, DE Test Bench, FP Test Bench, LSDC Test Bench, and BU Test Bench]
  • 11. "JAGUAR" TEST BENCHES - CONT
      [Block diagram of the CU (SCU with four L2D banks and the L2 interface to/from NB, plus four cores), annotated with test bench scopes:]
      - L2I Test Bench: SCU
      - MP Test Bench: SCU + LS/DC/BU of each core
      - Top (Cluster) Test Bench: CU
  • 12. "JAGUAR" FUNCTIONAL VERIFICATION STRATEGY
      - Cluster (core) verification with a mixed C++/SV (OVM/VMM)/assembly environment, using random and directed stimulus
      - Unit-level verification with SV OVM/VMM transaction-based random test benches
      - Formal verification used in FP and a few other blocks
      - Emulation done at SOC level
      - MVSIM used for power verification in cluster test bench
      - X-propagation targeted with special tool/regressions
      - Extensive use of coverage:
          - Functional coverage
          - Code coverage
          - Microcode coverage
  • 13. "JAGUAR" FUNCTIONAL VERIFICATION STRATEGY
      - Long bake (soak) time for bug hunting
      - Maintain high passing rate through the entire project
          - Core/CPC team organized around “always tape-out ready” principle
              - Main code line should always be higher than 90% pass rate
              - Anything below 90% is considered a crisis: all hands required to drive up pass rate
          - Features developed on branches and merged when healthy enough to support main line pass rate > 90%
      - Different stimulus strategies used at different levels
          - Core test bench uses a mix of random exercisers (generators) and directed tests supported by global tools
              - Biased towards exerciser-based new development
              - Conscious effort to not write new directed tests because of maintenance costs
              - Rigorous core debug strategy
          - Unit test benches use SV OVM/VMM constrained-random transaction-based tests
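The pass-rate policy on this slide can be sketched as a small gate. This is a minimal illustration, assuming regression results arrive as a list of pass/fail booleans; the function names, the treatment of exactly 90% as a crisis, and the branch merge rule are assumptions for the example, not the team's actual tooling.

```python
CRISIS_THRESHOLD = 0.90  # from the slide: main line should stay above 90%


def pass_rate(results):
    """Fraction of passing tests; results is a list of booleans."""
    if not results:
        raise ValueError("no regression results")
    return sum(results) / len(results)


def main_line_status(results, threshold=CRISIS_THRESHOLD):
    """Classify the main code line. Exactly at the threshold counts as a
    crisis here; that boundary choice is an assumption."""
    return "healthy" if pass_rate(results) > threshold else "crisis"


def branch_may_merge(branch_results, threshold=CRISIS_THRESHOLD):
    """Hypothetical merge rule: a feature branch is merged only when its own
    regression pass rate is above the main-line threshold."""
    return pass_rate(branch_results) > threshold
```

For instance, a main line with 95 passes out of 100 runs is classified as healthy, while 85 out of 100 is a crisis.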
  • 14. JAGUAR TOP (CLUSTER)-LEVEL TEST BENCH BLOCK DIAGRAM
      [Block diagram. Labeled components: system model with fake UNB, DRAM memory, and I/O memory; MP memory model; various core/CU-level checkers, irritators, and cache preloaders; CU containing the SCU (four L2D banks and L2I) and four cores; various monitors and drivers; programmable bridge code; x86 ISA models, 1 per core]
  • 15. "JAGUAR" TOP-LEVEL STIMULUS
      - x86 random test generators
          - Many single-threaded and multi-threaded generators
          - Contemporary generator has more directed random capabilities and is used extensively in core/cluster-level test plan executions
      - Heavy emphasis on random and coverage for new stimulus requirements
      - Randomize control/configuration register state on a per-test basis
      - L1/L2 cache preloaders and other dynamic, random irritators:
          - MCA, TLBs, external probes, power-management events, interrupts, etc.
      - Fake UNB:
          - Built-in randomization for things like memory-read latency
      - Large number of self-checking x86 directed tests, mostly legacy:
          - Use coverage-based test case selection to reduce run cost
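Coverage-based test case selection can be approximated with a greedy set-cover heuristic. The sketch below is one common way to do it and is not taken from the presentation; the data layout (a set of coverage points and a run cost per test) and all test names are hypothetical.

```python
def select_tests(coverage_by_test, cost_by_test):
    """Greedy selection: repeatedly pick the test with the best ratio of
    newly covered points to run cost until no uncovered point remains."""
    remaining = set().union(*coverage_by_test.values())
    selected = []
    while remaining:
        best = max(
            coverage_by_test,
            key=lambda t: len(coverage_by_test[t] & remaining) / cost_by_test[t],
        )
        gained = coverage_by_test[best] & remaining
        if not gained:
            break  # defensive: nothing left that any test can cover
        selected.append(best)
        remaining -= gained
    return selected


# Hypothetical legacy directed tests and the coverage points each one hits.
coverage = {
    "legacy_a": {"p1", "p2"},
    "legacy_b": {"p2"},
    "legacy_c": {"p3"},
    "legacy_d": {"p1", "p2", "p3"},
}
cost = {"legacy_a": 1, "legacy_b": 1, "legacy_c": 1, "legacy_d": 10}
chosen = select_tests(coverage, cost)
```

With these inputs the expensive `legacy_d` and the redundant `legacy_b` are left out, because the two cheap tests `legacy_a` and `legacy_c` already reach every point.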
  • 16. "JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS
      - Checking:
          - x86 ISA model
              - Architectural state compared at instruction retire
              - MP memory model checks all memory accesses, ordering rules, and consistency
              - Also used in MP unit-level test bench
          - Cache coherency checkers
              - MOESI state and corresponding data checked between all caches
          - Variety of other cluster-level checkers (e.g., power management, probes, stalls)
          - Thousands of inline RTL assertions
          - All unit-level checkers re-used in top test bench
          - Self-checking legacy directed tests
      - Coverage:
          - Heavy use of and dependency on functional coverage
          - Code coverage
          - Microcode coverage
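The kind of rule a MOESI coherency checker enforces can be sketched for a single cache line. This is a simplified illustration that assumes a flat set of peer caches, with no L1/L2 hierarchy or inclusion, and that every cache in the system is visible to the checker; the function name and data layout are hypothetical.

```python
def check_line(states, data, memory_value):
    """Check one cache line across all caches.

    states: list of MOESI state letters, one per cache
    data:   list of cached values (None where the state is 'I')
    Returns a list of violation strings (empty means the line is coherent).
    """
    errors = []
    valid = [i for i, s in enumerate(states) if s != "I"]
    exclusive = [i for i in valid if states[i] in ("M", "E")]
    owners = [i for i in valid if states[i] == "O"]

    # M and E mean "only copy": no other cache may hold the line.
    if exclusive and len(valid) > 1:
        errors.append("M/E copy coexists with another valid copy")
    # At most one cache may own (O) a shared dirty line.
    if len(owners) > 1:
        errors.append("multiple O owners")
    # All valid copies must hold the same data.
    values = {data[i] for i in valid}
    if len(values) > 1:
        errors.append("valid copies disagree on data")
    # With no dirty holder (M or O), cached copies must match memory.
    dirty = any(states[i] in ("M", "O") for i in valid)
    if valid and not dirty and values != {memory_value}:
        errors.append("clean copy differs from memory")
    return errors
```

For example, `check_line(["O", "S", "I", "S"], [7, 7, None, 7], 3)` reports no violations, because an O owner may hold data newer than memory while sharers hold the same value.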
  • 17. "JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS
      - 24x7 regression runs
          - Use machine resources effectively
              - Have enough pending sims to keep all machines busy
          - Requires a good, organized debug effort to cover all fails
      - User-friendly regression database with many options/filters
          - Helps synchronize debug efforts among multiple teams
      - Debug methodology
          - Debug to root cause
  • 18. "JAGUAR" TOP TEST BENCH METRICS
      - Test plan completeness
      - Functional, code, and microcode coverage
      - Regression cycles/instructions, pass rates, and fail signatures
      - RTL bug rates and open backlog
      - Verification bug rates and open backlog
  • 19. UNIT-LEVEL TEST BENCHES SUMMARY (1 OF 3)
      - All unit-level test benches based on SV (VMM or OVM)
      - Most stimulus is constrained random transaction-based
          - Coverage-driven random stimulus
          - Randomization of control/configuration registers is shared with higher-level test benches
          - Stimulus “state targets” with time-outs
              - Stimulus attempts to put the DUT in a targeted state, with a time-out, to catch deadlock/live-lock bugs
              - Example: Artificial reduction of RTL queue size
      - Multi-unit test bench used to target coherency
      - Good simulation performance, measured in cycles per second (CPS)
          - Goal: 5-10x compared to the top test bench
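The "state target with time-out" idea can be illustrated with a toy model. The sketch below is a minimal illustration only: `ToyQueue` is a hypothetical stand-in for an RTL queue whose depth has been artificially reduced, and the stimulus loop reports a time-out when the targeted state (queue full) is never reached, which is how a live-lock would show up.

```python
import random


class ToyQueue:
    """Hypothetical stand-in for an RTL queue with a configurable depth."""

    def __init__(self, depth, drain_prob, rng):
        self.depth = depth
        self.drain_prob = drain_prob
        self.rng = rng
        self.count = 0

    def step(self, push):
        # Drain first, then accept a push if there is room.
        if self.count and self.rng.random() < self.drain_prob:
            self.count -= 1
        if push and self.count < self.depth:
            self.count += 1


def drive_to_target(dut, target_reached, timeout_cycles):
    """Push every cycle until the target state is hit or the time-out expires.
    Returns the cycle count on success, or None on time-out."""
    for cycle in range(1, timeout_cycles + 1):
        dut.step(push=True)
        if target_reached(dut):
            return cycle
    return None


def is_full(dut):
    return dut.count == dut.depth


# A depth-2 queue that never drains fills up quickly.
fills = drive_to_target(ToyQueue(2, 0.0, random.Random(1)), is_full, 100)
# A queue that drains every cycle never fills: the time-out flags it.
stuck = drive_to_target(ToyQueue(2, 1.0, random.Random(1)), is_full, 100)
```

Reducing the depth makes the full state reachable in far fewer cycles, which is the point of the artificial queue-size reduction mentioned on the slide.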
  • 20. UNIT-LEVEL TEST BENCHES SUMMARY (2 OF 3)
      - 100% functional and code coverage, with only a few coverage points waived
          - Selectively exporting functional coverage points to high-level test benches
      - Checking done using assertions, high-level checkers, and x86 ISA model
          - All checks are re-used in the higher-level test benches
          - Checks for unit stimulus constraints exported to higher-level test benches
          - Create overlap of critical checking functions between unit-level and higher-level checkers
          - Black and white box checking
          - White box checks for:
              - Consistency between fields of different internal queues and arrays
              - Many others…
          - Thousands of inline RTL assertions
  • 21. UNIT-LEVEL TEST BENCHES SUMMARY (3 OF 3)
      - Formal verification
          - Still relying on simulation
          - FP: FPA and FPM theorem proofs
          - Debug bus
          - LS (USQ)
          - CRC32
          - DE block
          - Others
  • 22. "JAGUAR" FP UNIT-LEVEL TEST BENCH BLOCK DIAGRAM
      [Block diagram. Labels: broadcast cloud; load; CCU; store; monitors; FPU Mon; Op db; Opgen; ME BFM; FPU RTL; checkers; SRB BFM; FPU KOS; bridge; CLK, reset, timeout; test(s); KOS]
  • 23. JAGUAR LSDC TEST BENCH BLOCK DIAGRAM
      [Block diagram. Labels: MP memory model; system memory (DRAM, I/O); MP probe/write irritator; back-end agent (represents BU, NB, etc.; not re-used); DC miss buffers, TLBs, data cache, TableWalker, LS, Sched., ordering queues, store queue; numerous monitors and scoreboard-based checkers, plus shadow models of D$ and TLBs; transactional stimulus generator with recipes (multiply selectable, interleavable); front-end agent (represents ID, ME, EX, etc.); memory layout and page translation configuration engines]
      - Dashed line indicates global re-use in MP test bench
  • 24. JAGUAR L2I UNIT-LEVEL TEST BENCH BLOCK DIAGRAM
      - L2I connects 4 cores to NB and manages a shared, inclusive L2 cache.
      [Block diagram. Labels: test; SCU with PRQ, CRQ, DSM, DPM, L2 TAG x4, and L2 DATA x4; back-door memory preloaders, irritators, etc.; test interface; interface stimulus per external device (x4): transaction generator (test plug-ins), transaction driver (drive-x if invalid), external stimulus state; interface checking per device: BANK/internal transaction monitor (x-checks), checker, monitored state]
  • 25. "JAGUAR" RTL BUGS FOUND PER TEST BENCH LEVELS
      Test Bench Level            Percentage of Found RTL Bugs
      Unit-level test benches     31%
      Top-level test bench        65%
      System-level test bench     4%
      - Bug distribution rate does not match typical/expected distribution
      - Top-level test bench found most bugs due to:
          - Some RTL blocks covered only in the top-level test bench
          - Unit-level test benches extensively used by the RTL team to validate bug fixes, due to good simulation performance (CPS)
          - Bug hunting late in the project relies more on top-level test benches to find corner cases involving multiple blocks
  • 26. CHALLENGES
  • 27. POWER-MANAGEMENT VERIFICATION (1 OF 3)
      - New power-management interface between Jaguar and rest of the system
      - Shared L2 cache increases complexity
      - Each core can independently go to different levels of power states
      - Number of possible states very high:
          - Number of clusters
          - Number of cores
          - Number of possible power states
          - Number of wake-up events
          - Specific windows of interest
          - Number of features affected by power-management events (example: probes, debug features, etc.)
      - Specification changes
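A back-of-the-envelope count shows why the factors listed on this slide make the state space so large. The sketch below assumes a simple model in which every core picks its power state independently and the result is crossed with the other factors; every number in it is hypothetical and chosen only to illustrate the growth.

```python
# Hypothetical counts; none of these numbers come from the presentation.
factors = {
    "clusters": 2,
    "cores_per_cluster": 4,
    "power_states_per_core": 4,
    "wakeup_events": 8,
    "windows_of_interest": 10,
    "affected_features": 5,
}


def per_core_state_combinations(cores, states_per_core):
    """Each core picks a power state independently: states ** cores."""
    return states_per_core ** cores


def scenario_count(f):
    """Cross the per-core state combinations with the remaining factors."""
    total_cores = f["clusters"] * f["cores_per_cluster"]
    combos = per_core_state_combinations(total_cores, f["power_states_per_core"])
    return (
        combos
        * f["wakeup_events"]
        * f["windows_of_interest"]
        * f["affected_features"]
    )


total = scenario_count(factors)  # 4**8 * 8 * 10 * 5 = 26,214,400
```

Even with these small made-up counts the cross product is in the tens of millions, which is consistent with the reliance on random stimulus and functional coverage rather than directed tests.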
  • 28. POWER-MANAGEMENT VERIFICATION (2 OF 3)
      - Stimulus
          - Random generators
              - Random sequences to change power-management states
              - Per-thread stimulus
          - Power-management irritators
          - UNB BFM built-in randomization for certain power-management events
          - Very few directed tests
      - Coverage
          - Functional coverage extensively used
              - Per-core coverage points
              - Cross-coverage points
          - Microcode coverage
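Per-core and cross-coverage points can be modeled as bins over a cross product. The sketch below is a plain-Python illustration of the concept, not the SystemVerilog covergroups a real test bench would use; the class, the power-state names, and the wake-up event names are all hypothetical.

```python
from itertools import product


class CrossCoverage:
    """Tracks which (core, power_state, wake_event) combinations were hit."""

    def __init__(self, cores, states, events):
        self.bins = {combo: 0 for combo in product(cores, states, events)}

    def sample(self, core, state, event):
        """Record one observed combination."""
        self.bins[(core, state, event)] += 1

    def percent(self):
        """Percentage of cross bins hit at least once."""
        hit = sum(1 for n in self.bins.values() if n)
        return 100.0 * hit / len(self.bins)

    def holes(self):
        """Combinations never observed, useful for steering stimulus."""
        return sorted(combo for combo, n in self.bins.items() if n == 0)


cov = CrossCoverage([0, 1], ["active", "gated"], ["interrupt", "probe"])
cov.sample(0, "gated", "probe")
cov.sample(1, "active", "interrupt")
```

After the two samples above, two of the eight bins are hit, and `cov.holes()` lists the six combinations the random generators have not yet produced.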
  • 29. POWER-MANAGEMENT VERIFICATION (3 OF 3)
      - Checking
          - Power-management checker
          - L2I and other checkers
          - Self-checking directed tests
      - Test bench level
          - Unit-level test benches (L2I)
          - Top-level test bench
              - Most of the verification is done here because of heavy dependency on microcode and better simulation performance than on the SOC-level test bench
          - SOC (System)-level test benches
              - For power-management verification, the SOC-level test bench is very important:
                  - Some power-management features are very complex and not all details are well documented
                  - Use the SOC-level test bench to check power-management constraints used at lower test benches
                  - First level at which all power-management components are integrated
  • 30. COHERENCY, SELF-MODIFYING CODE, AND CROSS-MODIFYING CODE (1 OF 2)
      - Coherency: traditionally a feature of concern in multi-processor (MP) environments
          - MOESI protocol
          - Common core interface (CCI) protocol between cluster and NB
          - Inclusive L2 cache shared among four cores (designed from scratch)
              - Example: Flushing and invalidation of shared caches
          - Complexity increases with increased number of clusters
      - SMC/CMC handled by hardware in x86
          - Increased complexity due to inclusive shared L2 cache
      - Out-of-order execution adds complexity
          - LS block totally redesigned
  • 31. COHERENCY, SMC, AND CMC (2 OF 2)
      - Verification
          - Done at multiple levels of test benches
              - L2I test bench, top-level test bench, and SOC-level test benches
          - Multiple levels of checkers (Jaguar-specific checkers and IP checkers)
              - CCI IP protocol checkers (and coverage)
          - MP memory-model checker used for ordering and data consistency
          - Different types of stimulus (SV transaction-based, random generators, and directed tests)
              - Some random generators created to target coherence/SMC/CMC specifically
          - True and false sharing
              - Cache preloading
              - Functional coverage used to check quality of random stimulus
              - MP test bench created to target coherency
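The difference between true and false sharing stimulus can be shown with a small address generator. This is a minimal sketch that assumes a 64-byte cache line and 8-byte accesses; the function names are hypothetical and the code is not drawn from the Jaguar generators.

```python
import random

LINE_BYTES = 64  # assumed cache line size
SLOT_BYTES = 8   # assumed access size


def line_of(addr):
    """Cache line index that an address falls into."""
    return addr // LINE_BYTES


def true_sharing_addrs(rng, num_threads, line_base):
    """Every thread gets the same address: all threads touch the same data."""
    offset = rng.randrange(0, LINE_BYTES, SLOT_BYTES)
    return [line_base + offset] * num_threads


def false_sharing_addrs(rng, num_threads, line_base):
    """Each thread gets a distinct slot inside one shared line, so the
    threads contend for the line without touching the same bytes."""
    slots = rng.sample(range(0, LINE_BYTES, SLOT_BYTES), num_threads)
    return [line_base + s for s in slots]


rng = random.Random(0)
shared = true_sharing_addrs(rng, 4, 0x1000)
contended = false_sharing_addrs(rng, 4, 0x1000)
```

Both lists stay within one cache line; `shared` holds a single repeated address, while `contended` holds four different addresses that map to the same line. `line_base` is expected to be line-aligned, and `num_threads` can be at most 8 with these sizes.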
  • 32. JAGUAR MP (LSDC + BU + L2I) TEST BENCH BLOCK DIAGRAM
      [Block diagram. Labels: fake NB; system memory (DRAM, I/O); MP memory model; various core/CPC-level checkers, irritators, and cache preloaders; CPC containing the SCU (four L2D banks and L2I) and four “Core” blocks; exploded view of one “Core”: BU monitors and BU checkers, LSDC monitors and LSDC checkers, IF stimulus, LSDC tb stimulus (not re-used); memory layout and page translation configuration engines]
  • 33. MISCELLANEOUS
§ Verification done in multiple geographic locations
   –  Time zone differences
        –  Good for 24-hours-a-day work on a project
        –  Challenge for meetings and communication
   –  Sharing methodologies and tools
§ Jaguar is designed to be used in multiple SOCs
§ Adding new features late in a project
§ Compressed schedule
§ Jaguar verification team worked on two very successful projects (Bobcat and Jaguar)
§ Verification team starts a new project
  • 34. DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.