1. More so than in other areas of processor design, visibility into completion remains fairly low at the end of the project: the dreaded "when will we find the last bug?" question.
2. Verification complexity increases non-linearly with design complexity.
3. Empirical evidence shows that projects are almost always delayed. In the best case they hit the externally published schedule, but usually this is the 2nd or 3rd internal schedule.
4. Conservative estimates mean lost design-win opportunities; optimistic estimates mean slipped schedules or buggy silicon.
1. Verification metrics are controlled by the DV team; the health of the design is somewhat out of the DV team's control.
2. All metrics can be applied at the full-chip level or at the unit level of the design.
The test plan only covers what you know to do, not what you don't yet know you need to do. The test plan is non-exhaustive, and when you find bugs in the design, new corner cases are exposed. This will happen all the way to the end of the project (historical data can help).
Health of the design can look better or worse than it really is, depending on what is currently happening on the testing side.
Most health-of-the-design metrics are trailing indicators, so you really need good historical data on similar projects to make full use of them.
Be careful to avoid meeting the letter of the law but not the intent: for example, if you have hard metrics on cycles run per week or tests written per week, test/cycle quality might go down.
Think up front about how you want to use metrics to make sure you track the right things, and account for the time needed to build the required infrastructure.
Historical data is very useful, but every project is different, and generally speaking future projects are more complex than previous ones, so it needs to be taken with a grain of salt.
Metrics won't replace subjective gut feel from experience. If the gut feel is that the design is not ready for tapeout, then it probably isn't. Metric results need to be taken with a grain of salt. This applies to the final "when are we done" call as well as to determining critical priorities throughout the project.
The total-bug graph is fairly linear, with one pronounced knee at about the 75% point. Bugs per week are fairly sporadic until they drop off at the knee. This is a 4-week rolling average; results are even more sporadic if the raw weekly count is used.
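The smoothing described above can be sketched as a simple rolling average over weekly bug counts. This is a minimal illustration, not the project's actual tooling; the weekly counts below are invented.

```python
# Minimal sketch: smooth sporadic weekly bug counts with a 4-week
# rolling average, as described in the notes above.

def rolling_average(counts, window=4):
    """Return the rolling average of counts over the given window.

    For the first few weeks (before a full window exists), average
    over however many weeks are available so far.
    """
    averaged = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)
        chunk = counts[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

weekly_bugs = [3, 9, 1, 7, 12, 2, 8, 0, 1, 0]  # hypothetical raw counts
smoothed = rolling_average(weekly_bugs)
print([round(x, 2) for x in smoothed])
```

Plotting the smoothed series rather than the raw counts is what makes the knee visible; the raw series jumps around too much week to week.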
A breakdown by unit can be useful to indicate early stability of certain units (or to point to a testing deficit). The relative number of bugs found per area is roughly consistent with expectations based on the complexity of each unit. The SIMD unit was an early focus and became stable before the rest of the design.
Getting up to the low 90% range happens fairly quickly, and most of the time is spent closing the final 5% of the points. Expect a few dips along the way as new coverage that wasn't originally planned is added to the design. Tracking may improve in the future: break out crosses vs. single points, and add some way to indicate the priority of points.
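The tracking improvements mentioned above could be sketched as follows: report coverage separately for single points vs. crosses, and weight the overall score by point priority. This is a hypothetical sketch; the data layout, priorities, and hit flags are invented for illustration.

```python
# Hypothetical sketch of the improved coverage tracking described above:
# per-kind breakdown (single points vs. crosses) plus a priority-weighted
# overall score, so high-priority holes drag the number down more.

def coverage_report(points):
    """Summarize coverage by kind and compute a priority-weighted percent.

    points is a list of dicts with keys:
      kind     -- "single" or "cross"
      priority -- weight (higher = more important to close)
      hit      -- True if the point has been covered
    """
    by_kind = {}
    for p in points:
        covered, total = by_kind.get(p["kind"], (0, 0))
        by_kind[p["kind"]] = (covered + (1 if p["hit"] else 0), total + 1)

    total_weight = sum(p["priority"] for p in points)
    hit_weight = sum(p["priority"] for p in points if p["hit"])
    weighted_pct = 100.0 * hit_weight / total_weight if total_weight else 0.0

    return by_kind, weighted_pct

points = [
    {"kind": "single", "priority": 3, "hit": True},
    {"kind": "single", "priority": 1, "hit": False},
    {"kind": "cross",  "priority": 2, "hit": True},
    {"kind": "cross",  "priority": 2, "hit": False},
]
by_kind, weighted = coverage_report(points)
print(by_kind, round(weighted, 1))
```

The weighted score gives a different picture than a raw point count: a design stuck at 95% raw coverage may look much worse, or much better, once the priority of the remaining holes is taken into account.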