TRACK F: Improving Utilization of Acceleration Platforms by Using Off-Platform Test Generation/ Arkadiy Morgenshtein

    Presentation Transcript

    • Slide 1: Improving Utilization of Acceleration Platforms by Using Off-Platform Test Generation
      May 1, 2013
      Wisam Kadry, Dmitry Krestyashyn, Arkadiy Morgenshtein, Amir Nahir, Vitali Sokhin, Elena Tsanko
      IBM Research - Haifa
    • Slide 2: Outline
      • Introduction
        – Functional verification
        – Exercisers for Post-Si validation
        – Exercisers on Accelerators (EoA)
      • Threadmill Overview
        – Architecture
        – Main features
      • Offline Generation Mode
        – Motivation
        – Methodology
      • Results
        – Utilization improvement
        – Coverage improvement
      • Conclusions and Future Work
    • Slide 3: Typical Functional Verification Flow
      [Figure: flow diagram with the blocks Test Template, Random Stimuli Generator, Test, Simulator, DUV, Checking/Assertions, Pass/Fail, Coverage Information, Coverage Reports, Coverage Analysis Tool]
    • Slide 4: Pre and Post Silicon Tradeoffs
      [Figure: the platforms Software Simulation, Acceleration, Prototyping, and Silicon compared on Speed (scale marks 10, 1K, 100K, 10M, 1G) and on Controllability and Observability]
    • Slide 5: Post Silicon Validation Alternatives
      • Run operating systems and applications
        – Very limited coverage
        – Very little variability
        – Hard to debug
      • Run test-cases generated by pre-silicon test-generators
        – Long generation time implies many servers need to feed one silicon platform
        – Low utilization due to loading time
        – Poor solutions for built-in online checking at test level
          • Pre-Si checking uses checkers of the simulation platforms, unavailable at Post-Si
      • Exercisers
    • Slide 6: Exercisers: Post Silicon Validation Tools
      Exerciser: a program that runs on a testing environment (accelerator and/or silicon) and “exercises” the design by testing interesting scenarios on it
    • Slide 7: Exerciser requirements
      • Include a random stimuli generation component (as in pre-silicon)
        – Valid stimuli
        – Adhere to user requests
        – High quality stimuli
        – Generate many test-cases from the same test-template
      • Simple and fast
        – Can run on early bring-up silicon
        – Eases debugging
        – Increases platform utilization
      • Self-contained
        – Minimal interaction with the environment
        – Loaded once on the DUV, runs “forever”
      • Bare-Metal
        – Contains OS services required by the test-cases
        – Enables complete machine control
    • Slide 8: Threadmill: IBM Post-Silicon Exerciser
      [Figure: architecture diagram. Labels: Test Template, System Configuration, Architectural Model & Testing Knowledge, Builder, Exerciser Image (Test Template, Topology, Architectural Model; Generator & Kernel with Generation, Execution, Checking, OS services), Reference Model, Accelerator, Silicon]
    • Slide 9: Threadmill - Main Features
      • Def language for test-templates
        – Rich language to describe the test-plan scenarios
        – Multi-threaded support (each thread with its own scenario)
      • Checking
        – Multi-pass checking: comparing values of architectural resources (GPRs, SPRs, memory) between different executions of the same test-case
        – Variability originates from changes to the state of the design
          • Timing variations in multithreaded processing
          • Randomization of uArch modes of the processors: thread priority, internal control modes
          • Variations in pipeline and cache states
        – User ability to specify self-checking as part of the test-case
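The multi-pass checking idea on this slide can be sketched as follows. This is a minimal illustration in Python, not Threadmill's implementation: `ArchState`, `multi_pass_check`, and the two toy "designs" are hypothetical names invented here. The only point taken from the slide is that the final architectural resources (GPRs, SPRs, memory) of repeated executions of the same test-case are compared with each other, so no reference model is needed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ArchState:
    """Final architectural state after one execution (hypothetical layout)."""
    gprs: tuple
    sprs: tuple
    memory: tuple


def multi_pass_check(execute: Callable[[int], ArchState], passes: int) -> List[int]:
    """Execute the same test-case `passes` times and return the indices of
    the passes whose final state differs from the first pass."""
    reference = execute(0)
    return [i for i in range(1, passes) if execute(i) != reference]


def good_design(pass_index: int) -> ArchState:
    # A correct design ends in the same state on every pass.
    return ArchState(gprs=(1, 2, 3), sprs=(0,), memory=(0xAB,))


def buggy_design(pass_index: int) -> ArchState:
    # Toy bug: on pass 2 a timing-dependent fault corrupts one GPR.
    gprs = (1, 2, 99) if pass_index == 2 else (1, 2, 3)
    return ArchState(gprs=gprs, sprs=(0,), memory=(0xAB,))


print(multi_pass_check(good_design, 4))   # []
print(multi_pass_check(buggy_design, 4))  # [2]
```

Note that a design producing the same wrong result on every pass would go undetected by this comparison.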
    • Slide 10: Threadmill - Main Features
      • Generation
        – Concurrent multi-threaded generation
        – Light-weight, on-platform
        – Static: no reference model and no state tracking
        – Very fast: 100s of tests per second on silicon
        – Utilization: 90% generation, 10% execution and checking
    • Slide 11: Accelerators
      • A large number of processors, each of which simulates a small portion of the design and passes its results to the others
      • Processors run in parallel, allowing high execution performance
      • Orders of magnitude faster than simulation
      • Allow good observability and coverage analysis
      • Allow test execution of billions of cycles at the pre-Si stage
      • The platform is used extensively and simultaneously by multiple projects and locations
      • High cost and limited resources call for efficient utilization
    • Slide 12: Exercisers on Accelerator
      • Motivation
        – Verification of early design models: more cycles, longer tests than in simulation
        – Debug at bring-up stage (better observability than Si, higher speed than simulation)
        – Utilization of failure event checkers, available only on the Accelerator
        – SW validation
        – Test quality analysis: coverage (count, specific functions hit)
      • Challenges
        – High system cost and limited resource availability dictate a need for improved utilization efficiency
        – Tests run by the exercisers should target coverage maximization within the constraints of limited resources
      • Proposed approach: Off-Platform Generation
    • Slide 13: Threadmill Offline Generation Mode
      [Figure: offline generation flow. Labels: Test Template, Configuration, Architectural Model, Builder, Exerciser Image (Test Template, Topology, Architectural Model; Generator & Kernel with Generation, Execution, Checking, OS services), Reference Model (Generation of TC1 through TC10 with Results), New Image, Accelerator (Execution and Checking of the test-cases)]
    • Slide 14: Threadmill Offline Generation Mode
      • Create an image with the generator component enabled
        – Include empty data structures for the test-cases, memory initializations, translation tables and expected results
      • Run the post-silicon application on a software reference model
      • Extract the necessary data of test-cases, memory and results from the run on the software reference model
        – Fill the data structures with all the data
      • Produce an image that includes all harvested data
        – Disable the generator component
      • Load the image to the acceleration platform
      • Run the image without the overhead associated with the generation of test-cases and initializations
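The steps on this slide can be sketched as a small pipeline. This is a hedged illustration only: `ExerciserImage`, `HarvestedTestCase`, `run_on_reference_model`, and `build_offline_image` are hypothetical names, and the "reference model" is a toy stand-in that produces random instructions and a checksum. It shows the shape of the flow (generate and harvest off-platform, then ship an image with the generator disabled), not the actual Threadmill tooling.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HarvestedTestCase:
    instructions: List[str]
    initial_memory: Dict[int, int]
    expected_results: Dict[str, int]


@dataclass
class ExerciserImage:
    generator_enabled: bool
    test_cases: List[HarvestedTestCase] = field(default_factory=list)


def run_on_reference_model(seed: int, length: int) -> HarvestedTestCase:
    """Toy stand-in for running the exerciser on a software reference model."""
    rng = random.Random(seed)
    instructions = [rng.choice(["add", "load", "store"]) for _ in range(length)]
    initial_memory = {addr: rng.randrange(256) for addr in range(4)}
    # Toy "expected results": a checksum over the initial memory.
    expected = {"checksum": sum(initial_memory.values())}
    return HarvestedTestCase(instructions, initial_memory, expected)


def build_offline_image(num_test_cases: int, length: int = 100) -> ExerciserImage:
    # Step 1: image with the generator enabled and empty data structures.
    draft = ExerciserImage(generator_enabled=True)
    # Steps 2-3: run off-platform and fill the data structures.
    for seed in range(num_test_cases):
        draft.test_cases.append(run_on_reference_model(seed, length))
    # Step 4: final image carries the harvested data; generator disabled.
    return ExerciserImage(generator_enabled=False, test_cases=draft.test_cases)


image = build_offline_image(50)
print(len(image.test_cases), image.generator_enabled)  # 50 False
```

The resulting image would then be loaded to the accelerator, where only execution and checking consume platform cycles.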
    • Slide 15: Offline vs. Regular Generation
      • Pros
        – No cycles “wasted” on on-platform generation
        – More test-cases can be run in the same number of cycles
        – Higher test coverage can be expected
        – Comparison with the SW reference model may reveal “2+2=5” bugs (results that are wrong but identical on every pass, which multi-pass comparison cannot expose)
      • Cons
        – Depends on a reference model
        – Loading a large image influences the number of test-cases
    • Slide 16: Experimental Setup
      • Two example test templates used as benchmarks:
        – Random: 100 random instructions
        – Directed: some threads perform loads/stores; other threads run a functional scenario
      • For each test template, 3 images were prepared:
        – Regular mode
        – Offline mode with 50 test-cases
        – Offline mode with 100 test-cases
    • Slide 17: Results – Random Test
                                   Regular mode   Offline mode 50 TC   Offline mode 100 TC
      Time to prepare image        0.6 min        8 min                15.8 min
      Image size                   3.5 MB         23.7 MB              44.3 MB
      Total Accelerator cycles     595 M          65 M                 135 M
      Num of test-cases            124            50                   100
      Cycles per test-case         4.8 M          1.3 M                1.35 M
      Accelerator utilization improvement: x3.7
    • Slide 18: Results – Directed Test
                                   Regular mode   Offline mode 50 TC   Offline mode 100 TC
      Time to prepare image        0.7 min        10.2 min             17.9 min
      Image size                   3.7 MB         24.6 MB              45.9 MB
      Total Accelerator cycles     295 M          70 M                 145 M
      Num of test-cases            42             50                   100
      Cycles per test-case         7 M            1.4 M                1.45 M
      Accelerator utilization improvement: x5
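The slides do not state how the improvement factors were computed. One reading that reproduces both reported figures is the ratio of cycles per test-case in regular mode to cycles per test-case in offline mode with 50 test-cases. The short Python check below works through that arithmetic with the numbers from the two results slides; the formula itself is an assumption, not something the source confirms.

```python
# Cycles per test-case, taken from the Random and Directed results slides.
cycles_per_test_case = {
    "random":   {"regular": 4.8e6, "offline_50": 1.3e6, "offline_100": 1.35e6},
    "directed": {"regular": 7.0e6, "offline_50": 1.4e6, "offline_100": 1.45e6},
}


def improvement(regular: float, offline: float) -> float:
    """Assumed metric: how many times fewer cycles one test-case costs."""
    return regular / offline


for name, row in cycles_per_test_case.items():
    vs_50 = improvement(row["regular"], row["offline_50"])
    vs_100 = improvement(row["regular"], row["offline_100"])
    print(f"{name}: x{vs_50:.1f} (50 TC), x{vs_100:.1f} (100 TC)")
# random: x3.7 (50 TC), x3.6 (100 TC)
# directed: x5.0 (50 TC), x4.8 (100 TC)
```

Under this reading, the 100-test-case images come out slightly lower (about x3.6 and x4.8), consistent with their marginally higher cycles per test-case.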
    • Slide 19: Coverage Comparison
      • About 50,000 coverage events are analyzed in the Accelerator model
      • A test of a new special feature of the next Power design was selected for coverage comparison
        – Only events related to the specific functionality were analyzed
        – Exerciser code does not use the analyzed feature: less coverage “noise”
      • Number of covered events (out of 310 analyzed events):
        – Offline: 237
        – Regular: 209
      • Total count of hits of all events:
        – Offline: 117,020
        – Regular: 56,708
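To put the coverage figures from this slide in relative terms, the Python snippet below derives percentages and ratios from the reported counts. Only the four counts and the total of 310 analyzed events come from the slide; the derived values are plain arithmetic added here for illustration.

```python
analyzed_events = 310
covered = {"offline": 237, "regular": 209}
total_hits = {"offline": 117_020, "regular": 56_708}


def coverage_percent(mode: str) -> float:
    """Share of the analyzed events hit at least once in the given mode."""
    return 100.0 * covered[mode] / analyzed_events


extra_events = covered["offline"] - covered["regular"]
hit_ratio = total_hits["offline"] / total_hits["regular"]

print(f"offline coverage: {coverage_percent('offline'):.1f}%")  # 76.5%
print(f"regular coverage: {coverage_percent('regular'):.1f}%")  # 67.4%
print(f"additional events covered offline: {extra_events}")     # 28
print(f"hit ratio offline/regular: {hit_ratio:.2f}")            # 2.06
```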
    • Slide 20: Coverage Comparison
      [Figure: number of hits per coverage event, on a scale marked 1, 10, 100, 1000, 10000, 100000, for the series hitCounter_offline and hitCounter_regular. Annotations: “Events hit only by Offline”; “Offline achieves more hits for most events”]
    • Slide 21: Conclusions and Future Work
      • More TCs: higher chance of triggering various scenarios
      • Improved coverage
      • Quality assessment of test content that is later used at bring-up
      • The Offline generation concept may be used in the future as a basis for a dedicated tool for Accelerator-based verification
    • Slide 22: References
      • A. Adir, S. Copty, S. Landa, A. Nahir, G. Shurek, A. Ziv, C. Meissner, J. Schumann, “A unified methodology for pre-silicon verification and post-silicon validation”, DATE 2011
      • A. Adir, M. Golubev, S. Landa, A. Nahir, G. Shurek, V. Sokhin, A. Ziv, “Threadmill: A post-silicon exerciser for multi-threaded processors”, DAC 2011
      • A. Adir, A. Nahir, G. Shurek, A. Ziv, C. Meissner, J. Schumann, “Leveraging pre-silicon verification resources for the post-silicon validation of the IBM POWER7 processor”, DAC 2011
    • Slide 24: Thank You!