The document discusses the functional verification of the Jaguar x86 low-power core. It describes Jaguar's microarchitecture, which includes improvements over the previous Bobcat core such as a new shared L2 cache and updated ISA support. The verification strategy involves testing at the unit, cluster, and system levels using techniques like random stimulus generation, coverage analysis, and formal verification. Challenges included verifying the complex new power management features and shared L2 cache across multiple independent cores.
3. TWO X86 CORES TUNED FOR TARGET MARKETS
§ “Bulldozer” family: performance and scalability
– Mainstream client and server markets
§ “Cat” family: flexible, low power, and small
– Small die area
– Low-power markets
– Optimized for cloud clients
Jaguar Hotchips 2012
3 | Jaguar x86 Core Functional Verification | December 2012
4. “JAGUAR” – DESIGN FOR LOW-POWER X86 CORE
§ Jaguar is based on AMD’s Bobcat low-power x86 core, with the goals to:
– Improve IPC/power/frequency
– Update the ISA/feature set
§ Significant changes between Bobcat and Jaguar:
– Totally new inclusive L2 cache shared among four Jaguar cores
– New power-management flow
– Update the ISA/feature set:
– SSE4.1, SSE4.2
– AES, CLMUL
– MOVBE
– AVX, XSAVE/XSAVEOPT
– F16C, BMI1
– 40-bit physical address capable vs. 36-bit on Bobcat
– Improved virtualization
– Many design blocks totally or significantly redesigned
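The move from a 36-bit to a 40-bit physical address can be put in numbers with a little arithmetic. The following is a minimal sketch; the helper name `addressable_bytes` is illustrative and does not come from the slides.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a byte-granular physical address of this width."""
    return 1 << address_bits


GIB = 1 << 30

bobcat = addressable_bytes(36)  # 36-bit physical address (Bobcat)
jaguar = addressable_bytes(40)  # 40-bit physical address (Jaguar)

print(f"Bobcat: {bobcat // GIB} GiB")
print(f"Jaguar: {jaguar // GIB} GiB")
print(f"Ratio:  {jaguar // bobcat}x")
```

Four extra address bits give a 16x larger physical address space, which also widens every address-carrying structure the test benches have to model.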
5. JAGUAR X86 CORE MICROARCHITECTURE
Block diagram (source: Jaguar Hotchips 2012):
§ Front end: 32KB ICache, branch prediction, decode and microcode ROMs
§ Integer: Int rename, schedulers, Int PRF; ALU, ALU, LAGU, SAGU, Mul, Div
§ Floating point: FP decode/rename, FP scheduler, FP PRF; VALU, VALU, VIMul, St Conv., FPAdd, FPMul
§ Load/store: 32KB DCache, Ld/St queues
§ BU, to/from the shared cache unit
6. JAGUAR COMPUTE UNIT (CU)
§ Four independent Jaguar cores
§ Shared cache unit (SCU)
– 4 L2 data banks (total 2MB)
– L2 interface tile
Block diagram (source: Jaguar Hotchips 2012): CU containing four cores and the SCU; the SCU holds four L2D banks and the L2 interface, which connects to/from the NB.
9. "JAGUAR" FUNCTIONAL VERIFICATION STRATEGY
§ Jaguar core is verified with test benches at multiple levels:
– Unit (Whacker)-level test benches
§ ID
§ DE
§ FP
§ LSDC
§ BU
§ L2I
§ MP
– Top (Cluster or CPC)-level test bench
– System (SOC)-level test benches
10. "JAGUAR" TEST BENCHES
Block diagram: the Jaguar core microarchitecture diagram overlaid with the boundaries of the unit-level test benches: ID test bench, DE test bench, FP test bench, LSDC test bench, and BU test bench.
11. "JAGUAR" TEST BENCHES - CONT
§ MP test bench: SCU + LS/DC/BU of each core
§ L2I test bench: SCU
§ Top (cluster) test bench: CU
Block diagram: CU containing four cores and the SCU (four L2D banks and the L2 interface, to/from NB).
12. "JAGUAR" FUNCTIONAL VERIFICATION STRATEGY
§ Cluster (Core) verification with mixed C++/SV(OVM/VMM)/assembly
environment -- random and directed stimulus
§ Unit-level verification with SV OVM/VMM transaction-based random test
benches
§ Formal verification used in FP and a few other blocks
§ Emulation done at SOC level
§ MVSIM used for power verification in cluster test bench
§ X-propagation targeted with special tool/regressions
§ Extensive use of coverage:
– Functional coverage
– Code coverage
– Microcode coverage
13. "JAGUAR" FUNCTIONAL VERIFICATION STRATEGY
§ Long bake (soak) time for bug hunting
§ Maintain high passing rate through the entire project
– Core/CPC team organized around “always tape-out ready” principle
§ Main code line pass rate should always stay above 90%
§ Anything below 90% is considered a crisis -- all hands required to drive up pass rate
– Features developed on branches and merged when healthy enough to support
main line pass rate > 90%
§ Different stimulus strategies used at different levels
– Core test bench uses mix of random exercisers (generators) and directed tests
supported by global tools
§ Biased towards exerciser-based new development
§ Conscious effort to not write new directed tests because of maintenance costs
§ Rigorous core debug strategy
– Unit test benches use SV OVM/VMM-constrained random transaction-based
tests
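The "always tape-out ready" rule above (main line above 90%, branches merged only when healthy) can be sketched as a simple gate. This is an illustrative sketch only: the class and function names are hypothetical, and treating exactly 90% as not healthy is an assumption, since the slides only say "higher than 90%".

```python
from dataclasses import dataclass

PASS_RATE_THRESHOLD = 0.90  # from the slides: main line must stay above 90%


@dataclass
class RegressionResult:
    passed: int
    failed: int

    @property
    def pass_rate(self) -> float:
        total = self.passed + self.failed
        return self.passed / total if total else 0.0


def main_line_status(result: RegressionResult) -> str:
    """'crisis' means all hands are needed to drive the pass rate back up."""
    return "healthy" if result.pass_rate > PASS_RATE_THRESHOLD else "crisis"


def can_merge(branch_result: RegressionResult) -> bool:
    """Assumption: a feature branch merges only if it clears the same bar."""
    return branch_result.pass_rate > PASS_RATE_THRESHOLD


print(main_line_status(RegressionResult(passed=950, failed=50)))
print(main_line_status(RegressionResult(passed=850, failed=150)))
print(can_merge(RegressionResult(passed=9, failed=1)))
```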
14. JAGUAR TOP (CLUSTER)-LEVEL TEST BENCH BLOCK DIAGRAM
Block diagram of the top (cluster)-level test bench:
§ CU (DUT): SCU with four L2D banks and the L2I, plus four cores
§ System model: fake UNB, DRAM mem, I/O mem
§ MP mem model: I/O mem, DRAM mem
§ x86 ISA models, 1 per core
§ Various core/CU-level checkers, irritators, and cache preloaders
§ Various monitors and programmable bridge code drivers
15. "JAGUAR" TOP-LEVEL STIMULUS
§ x86 random test generators
– Many single-threaded and multi-threaded generators
– Contemporary generator has more directed random capabilities and is used extensively in
core/cluster-level test plan executions
§ Heavy emphasis on random and coverage for new stimulus requirements
§ Randomize control/configuration register state on per-test basis
§ L1/L2 cache preloaders and other dynamic, random irritators:
– MCA, TLBs, external probes, power-management events, interrupts, etc.
§ Fake UNB:
– Built-in randomization for things like memory-read latency
§ Large number of self-checking x86 directed tests, mostly legacy:
– Use coverage-based test case selection to reduce run cost
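Coverage-based test case selection, mentioned above for the legacy directed tests, can be illustrated with a greedy pick over per-test coverage. The slides do not say which algorithm was used; the sketch below assumes a greedy "most new coverage per unit cost" heuristic, and all test names, coverage points, and costs are made up.

```python
def select_tests(coverage_by_test, cost_by_test):
    """Greedily pick tests until every coverage point hit by any test is covered.

    Each round takes the test with the most not-yet-covered points per unit
    of run cost; ties are broken by test name so the result is deterministic.
    """
    remaining = set().union(*coverage_by_test.values())
    selected = []
    while remaining:
        candidates = [t for t in coverage_by_test if t not in selected]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda t: (len(coverage_by_test[t] & remaining) / cost_by_test[t], t),
        )
        gained = coverage_by_test[best] & remaining
        if not gained:
            break
        selected.append(best)
        remaining -= gained
    return selected


# Hypothetical directed tests and the coverage points each one hits.
coverage = {
    "smc_basic": {"smc", "icache_inval"},
    "probe_hit": {"probe", "moesi_o"},
    "legacy_all": {"smc", "probe"},
    "moesi_walk": {"moesi_o", "moesi_e", "probe"},
}
cost = {"smc_basic": 1.0, "probe_hit": 1.0, "legacy_all": 4.0, "moesi_walk": 2.0}

chosen = select_tests(coverage, cost)
print(chosen)
```

In this toy data set the expensive `legacy_all` test adds no coverage beyond the cheaper tests, so it drops out of the regular run list.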
16. "JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS
§ Checking:
– x86 ISA model
§ Architectural state compared at instruction retire
§ MP memory model checks all memory accesses, ordering rules, and consistency
§ Also used in MP unit-level test bench
– Cache coherency checkers
§ MOESI state and corresponding data checked between all caches
– Variety of other cluster-level checkers (e.g., power management, probes, stalls)
– Thousands of inline RTL assertions
– All unit-level checkers re-used in top test bench
– Self-checking legacy-directed tests
§ Coverage:
– Heavy use and dependency on functional coverage
– Code coverage
– Microcode code coverage
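The ISA-model check above, where architectural state is compared at instruction retire, amounts to a lockstep comparison between the DUT and a reference model. The sketch below is a simplified illustration, not the actual checker: the record layout and register names are assumptions, and a real x86 checker compares far more state.

```python
from dataclasses import dataclass


@dataclass
class RetireRecord:
    rip: int    # address of the retired instruction
    regs: dict  # architectural register snapshot after retire, e.g. {"rax": 1}


def first_mismatch(dut_retires, model_retires):
    """Return (retire index, description) of the first divergence, or None."""
    for i, (dut, ref) in enumerate(zip(dut_retires, model_retires)):
        if dut.rip != ref.rip:
            return i, f"rip: dut={dut.rip:#x} model={ref.rip:#x}"
        for name in sorted(set(dut.regs) | set(ref.regs)):
            if dut.regs.get(name) != ref.regs.get(name):
                return i, f"{name}: dut={dut.regs.get(name)} model={ref.regs.get(name)}"
    if len(dut_retires) != len(model_retires):
        return min(len(dut_retires), len(model_retires)), "retire count differs"
    return None


model = [
    RetireRecord(0x1000, {"rax": 1, "rbx": 0}),
    RetireRecord(0x1003, {"rax": 1, "rbx": 2}),
]
dut_ok = [
    RetireRecord(0x1000, {"rax": 1, "rbx": 0}),
    RetireRecord(0x1003, {"rax": 1, "rbx": 2}),
]
dut_bad = [
    RetireRecord(0x1000, {"rax": 1, "rbx": 0}),
    RetireRecord(0x1003, {"rax": 1, "rbx": 3}),
]

print(first_mismatch(dut_ok, model))
print(first_mismatch(dut_bad, model))
```

Flagging the first diverging retire, rather than the end-of-test state, keeps the failure close to its cause and supports debugging to root cause.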
17. "JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS
§ 24x7 regression runs
– Use machine resources effectively
§ Have enough pending sims to keep all machines busy
– Requires a good, organized debug effort to cover all fails
§ User-friendly regression database with many options/filters
– Helps synchronize debug efforts among multiple teams
§ Debug methodology
– Debug to root cause
18. "JAGUAR" TOP TEST BENCH METRICS
§ Test plan completeness
§ Functional, code, and microcode coverage
§ Regression cycles/instructions, pass rates, and fail signatures
§ RTL bug rates and open backlog
§ Verification bug rates and open backlog
19. UNIT-LEVEL TEST BENCHES SUMMARY (1 OF 3)
§ All unit-level test benches based on SV (VMM or OVM)
§ Most stimulus is constrained random transaction-based
– Coverage-driven random stimulus
– Randomization of control/configuration registers is shared with higher-level test
benches
– Stimulus “state targets” with time-outs
§ Stimulus attempts to put DUT in a targeted state, with a time out, to catch deadlock/live-lock bugs
§ Examples: Artificial reduction of RTL queue size
§ Multi-unit test bench used to target coherency
§ Good simulation performance -- cycles per second (CPS)
– Goal: 5-10x compared to the top test bench
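The "state targets with time-outs" idea above can be sketched with a toy model: stimulus is biased toward a target state (here, a full queue), and failing to reach it within a cycle budget is flagged as a possible deadlock or livelock. Everything in this sketch is hypothetical, including `ToyQueue`, which only stands in for a DUT queue whose depth has been artificially reduced.

```python
import random


class ToyQueue:
    """Stand-in for a DUT queue; a small `depth` mimics an artificially reduced RTL queue."""

    def __init__(self, depth):
        self.depth = depth
        self.occupancy = 0

    def step(self, push, pop):
        if pop and self.occupancy > 0:
            self.occupancy -= 1
        if push and self.occupancy < self.depth:
            self.occupancy += 1

    def is_full(self):
        return self.occupancy == self.depth


def drive_to_target(dut, target_reached, max_cycles, rng, push_bias=0.7):
    """Drive biased random stimulus until the target state is reached.

    Returns the cycle at which the target was observed, or None on time-out,
    which a real bench would report as a suspected deadlock/livelock.
    """
    for cycle in range(max_cycles):
        if target_reached(dut):
            return cycle
        dut.step(push=rng.random() < push_bias, pop=rng.random() < 1.0 - push_bias)
    return max_cycles if target_reached(dut) else None


rng = random.Random(2012)
result = drive_to_target(ToyQueue(depth=4), ToyQueue.is_full, max_cycles=1000, rng=rng)
print("target reached at cycle" if result is not None else "time-out", result)
```

Shrinking `depth` makes the full-queue corner reachable in far fewer cycles, which is the point of artificially reducing RTL queue sizes in unit-level benches.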
20. UNIT-LEVEL TEST BENCHES SUMMARY (2 OF 3)
§ 100% functional and code coverage, with a few coverage points waived
– Selectively exporting functional coverage points to high-level test benches
§ Checking done using assertions, high-level checkers, and x86 ISA model
– All checks are re-used in the higher-level test benches
– Checks for unit stimulus constraints exported to higher-level test benches
– Create overlap of critical checking functions between unit-level and higher-level
checkers
– Black and white box checking
– White box checks for:
§ Consistency between fields of different internal queues and arrays
§ Many others…
– Thousands of inline RTL assertions
21. UNIT-LEVEL TEST BENCHES SUMMARY (3 OF 3)
§ Formal verification
– Still relying on simulation
– FP – FPA and FPM theorem proofs
– Debug bus
– LS (USQ)
– CRC32
– DE block
– Others
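22. "JAGUAR" FP UNIT-LEVEL TEST BENCH BLOCK DIAGRAM
Block diagram of the FP unit-level test bench. Components shown:
§ FPU RTL (DUT)
§ Test(s), Opgen, and Op db
§ ME BFM and SRB BFM
§ CCU, load, store, and broadcast cloud
§ Monitors (FPU Mon) and checkers
§ KOS and FPU KOS bridge
§ CLK, reset, and timeout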
23. JAGUAR LSDC TEST BENCH BLOCK DIAGRAM
Block diagram of the LSDC unit-level test bench. Components shown:
§ DUT: DC miss buffers, TLBs, data cache, TableWalker, LS sched., ordering queues, store queue
§ Front-end agent (represents ID, ME, EX, etc.)
§ Back-end agent (represents BU, NB, etc.; not re-used)
§ Transactional stimulus generator recipes (multiply selectable, interleavable)
§ MP probe/write irritator
§ MP mem model (I/O mem, DRAM mem) and system mem (DRAM mem, I/O mem)
§ Numerous monitors and scoreboard-based checkers, plus shadow models of D$, TLBs
§ Memory layout and page translation configuration engines
§ Dashed line indicates global re-use in MP test bench
24. JAGUAR L2I UNIT-LEVEL TEST BENCH BLOCK DIAGRAM
§ L2I connects 4 cores to NB and manages a shared, inclusive L2 cache.
Block diagram of the L2I unit-level test bench. Components shown:
§ SCU (DUT): L2I with PRQ, CRQ, DSM, and DPM; BANK/L2 TAG x4; L2 DATA x4
§ Test and test interface
§ Back door: memory preloaders, irritators, etc.
§ Interface stimulus per external device (x4): external transaction driver (drive-x if invalid), transaction generator (test plug-ins), stimulus state
§ Interface checking per device (x4): transaction monitor (x-checks), checker, monitored state
§ Internal checking
25. "JAGUAR" RTL BUGS FOUND PER TEST BENCH LEVELS
Test Bench Level             Percentage of Found RTL Bugs
Unit-level test benches      31%
Top-level test bench         65%
System-level test bench      4%
• Bug distribution rate does not match typical/expected distribution
• Top-level test bench found most bugs due to:
• Some RTL blocks covered only in the top-level test bench
• Unit-level test benches extensively used by the RTL team to validate bug
fixes, due to good simulation performance (CPS)
• Bug hunting late in the project relies more on top-level test benches
to find corner cases involving multiple blocks
27. POWER-MANAGEMENT VERIFICATION (1 OF 3)
§ New power-management interface between Jaguar and rest of the system
§ Shared L2 cache increases complexity
§ Each core can independently go to different levels of power states
§ Number of possible states very high:
– Number of clusters
– Number of cores
– Number of possible power states
– Number of wake-up events
– Specific windows of interest
– Number of features affected by power-management events (example: probes,
debug features, etc.)
§ Specification changes
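A rough count shows why the number of possible power-management states grows so quickly with the factors listed above. The numbers below are placeholders chosen only for illustration, not Jaguar's actual state or event counts, and the formula is a deliberately crude lower bound that ignores timing windows and affected features.

```python
def power_state_space(clusters, cores_per_cluster, states_per_core, wake_events):
    """Crude lower bound: per-core state combinations times wake-up event types."""
    cores = clusters * cores_per_cluster
    return (states_per_core ** cores) * wake_events


# Hypothetical parameters: 4 cores per cluster, 4 power states, 8 wake-up events.
one_cluster = power_state_space(clusters=1, cores_per_cluster=4, states_per_core=4, wake_events=8)
two_clusters = power_state_space(clusters=2, cores_per_cluster=4, states_per_core=4, wake_events=8)

print(one_cluster)
print(two_clusters)
```

Because each core changes power state independently, the count is exponential in the number of cores, which is one reason to prefer random sequences and cross coverage over enumerating directed tests.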
28. POWER-MANAGEMENT VERIFICATION (2 OF 3)
§ Stimulus
– Random generators
§ Random sequences to change power-management states
§ Per-thread stimulus
– Power-management irritators
– UNB BFM built-in randomization for certain power-management events
– Very few directed tests
§ Coverage
– Functional coverage extensively used
§ Per-core coverage points
§ Cross-coverage points
– Microcode coverage
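Per-core and cross-coverage points can be pictured as bins over the combinations of states seen on different cores. In a SystemVerilog bench this would be a covergroup with a cross; the Python sketch below only mimics the bookkeeping, and the state names and the two-core cross are illustrative assumptions.

```python
from itertools import product


class CrossCoverage:
    """Hit counters for every combination of the given coverage dimensions."""

    def __init__(self, *dimensions):
        self.bins = {combo: 0 for combo in product(*dimensions)}

    def sample(self, *values):
        if values not in self.bins:
            raise ValueError(f"illegal combination: {values}")
        self.bins[values] += 1

    def holes(self):
        """Combinations that stimulus has not produced yet."""
        return sorted(combo for combo, hits in self.bins.items() if hits == 0)

    def percent(self):
        hit = sum(1 for hits in self.bins.values() if hits)
        return 100.0 * hit / len(self.bins)


STATES = ("C0", "C1", "C6")  # illustrative power-state names
cov = CrossCoverage(STATES, STATES)  # cross of core 0 state x core 1 state

for core0, core1 in [("C0", "C0"), ("C0", "C6"), ("C1", "C0"), ("C0", "C0")]:
    cov.sample(core0, core1)

print(f"{cov.percent():.1f}% covered, holes: {cov.holes()}")
```

The list of holes is what tells the team whether the random power-management sequences need re-biasing or a new irritator.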
29. POWER-MANAGEMENT VERIFICATION (3 OF 3)
§ Checking
– Power-management checker
– L2I and other checkers
– Self-checking directed tests
§ Test bench level
– Unit-level test benches (L2I)
– Top-level test bench
§ Most verification done here because of heavy dependency on microcode and better simulation
performance than on the SOC-level test bench
– SOC (System)-level test benches
§ For power-management verification, SOC-level test bench is very important:
– Some power-management features are very complex and not all details well-documented
– Use SOC-level test bench to check power-management constraints used at lower test benches
– First level at which all power-management components are integrated
30. COHERENCY, SELF-MODIFYING CODE, AND CROSS-MODIFYING CODE (1 OF 2)
§ Coherency -- traditionally an area of concern in multi-processor (MP) environments
– MOESI protocol
– Common core interface (CCI) protocol between cluster and NB
– Inclusive L2 cache shared among four cores (designed from scratch)
§ Example: Flushing and invalidation of shared caches
– Complexity increases with increased number of clusters
§ SMC/CMC handled by hardware in x86
– Increased complexity due to inclusive shared L2 cache
§ Out-of-order execution adds complexity
– LS block totally redesigned
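The kind of rule a MOESI coherency checker enforces can be sketched for a single cache line held by several peer caches. This follows the textbook MOESI invariants only; it is not the Jaguar checker, and it ignores the inclusive L2 hierarchy and in-flight transactions. The cache names are hypothetical.

```python
def check_line(states, data):
    """Check textbook MOESI invariants for one cache line.

    states: cache name -> one of "M", "O", "E", "S", "I"
    data:   cache name -> line contents, for every cache holding a valid copy
    Returns a list of violation messages (empty when the line is consistent).
    """
    errors = []
    valid = [c for c, s in states.items() if s != "I"]
    exclusive = [c for c in valid if states[c] in ("M", "E")]
    owners = [c for c in valid if states[c] == "O"]

    # M and E imply that no other cache holds a valid copy.
    if exclusive and len(valid) > 1:
        errors.append(f"{exclusive[0]} is {states[exclusive[0]]} but other valid copies exist")
    # At most one cache may own (O) a shared dirty line.
    if len(owners) > 1:
        errors.append(f"multiple owners: {sorted(owners)}")
    # All valid copies must hold identical data.
    if len({data[c] for c in valid}) > 1:
        errors.append("valid copies hold different data")
    return errors


ok = check_line(
    {"core0": "O", "core1": "S", "core2": "I", "core3": "I"},
    {"core0": 0xAB, "core1": 0xAB},
)
bad = check_line(
    {"core0": "M", "core1": "S", "core2": "I", "core3": "I"},
    {"core0": 0xCD, "core1": 0xAB},
)
print(ok)
print(bad)
```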
31. COHERENCY, SMC, AND CMC (2 OF 2)
§ Verification
– Done at multiple levels of test benches
§ L2I test bench, top-level test bench, and SOC-level test benches
– Multiple levels of checkers (Jaguar-specific checkers and IP checkers)
§ CCI IP protocol checkers (and coverage)
– MP memory-model checker used for ordering and data consistency
– Different types of stimulus (SV transaction-based, random generators, and
directed tests)
§ Some random generators created to target coherence/SMC/CMC specifically
– True and false sharing
§ Cache preloading
§ Functional coverage used to check quality of random stimulus
§ MP test bench created to target coherency
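True and false sharing differ only in whether two threads touch the same bytes or merely the same cache line. The sketch below classifies a pair of accesses and generates a false-sharing pair. It assumes a 64-byte line and accesses that do not cross line boundaries, and the function names are illustrative.

```python
import random

CACHE_LINE_BYTES = 64  # assumption for this sketch


def sharing_kind(addr_a, size_a, addr_b, size_b):
    """Classify two accesses from different threads as 'true', 'false', or 'none'."""
    overlap = addr_a < addr_b + size_b and addr_b < addr_a + size_a
    same_line = addr_a // CACHE_LINE_BYTES == addr_b // CACHE_LINE_BYTES
    if overlap:
        return "true"   # both threads touch at least one common byte
    if same_line:
        return "false"  # same cache line, disjoint bytes
    return "none"


def false_sharing_pair(rng, line_base, size=8):
    """Pick two distinct, aligned slots inside one cache line."""
    slots = CACHE_LINE_BYTES // size
    a, b = rng.sample(range(slots), 2)
    return line_base + a * size, line_base + b * size


rng = random.Random(31)
addr0, addr1 = false_sharing_pair(rng, line_base=0x1000)
print(hex(addr0), hex(addr1), sharing_kind(addr0, 8, addr1, 8))
```

A generator biased toward such pairs keeps lines bouncing between cores, which stresses the coherency and ordering checkers far more than unrelated addresses would.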
32. JAGUAR MP (LSDC + BU + L2I) TEST BENCH BLOCK DIAGRAM
Block diagram of the MP test bench. Components shown:
§ CPC: SCU with four L2D banks and the L2I, plus four “Core” instances
§ Fake NB
§ System mem (DRAM mem, I/O mem) and MP mem model (I/O mem, DRAM mem)
§ Various core/CPC-level checkers, irritators, and cache preloaders
§ Memory layout and page translation configuration engines
§ Exploded view of a “Core”: BU with BU monitors and checkers, LSDC with LSDC monitors and checkers, IF stimulus, LSDC tb stimulus (not re-used)
33. MISCELLANEOUS
§ Verification done in multiple geographic locations
– Time zone differences
§ Good for 24-hours-a-day work on a project
§ Challenge for meetings and communication
– Sharing methodologies and tools
§ Jaguar is designed to be used in multiple SOCs
§ Adding new features late in a project
§ Compressed schedule
§ Jaguar verification team worked on two very successful projects (Bobcat and
Jaguar)
§ Verification team starts a new project