Leveraging Low-Cost FPGA Prototyping for Validation of Highly Threaded Server-on-Chip


  1. Leveraging Low-Cost FPGA Prototyping for Validation of Highly Threaded Server-on-Chip
     DV Club - July 2009
     Jai Kumar, Verification Technologist, Sun Microsystems Inc.
     jai.kumar@sun.com
     http://sun.com
  2. Outline
     • Verification Challenges
     • Emulation alternatives
     • FPGA Prototyping Basics
     • Prototyping Challenges
     • Guidelines
     • Results
     • Summary
     What's in it for you:
     • Managers: requirements (effort, $$, time, tools)
     • Engineers: challenges, pitfalls to avoid
     • Vendors: enhancements to simplify adoption
  3. Design Challenges Impacting Verification
     [Charts: threads, design size, performance, and memory for the T1000, T5220, T5240, and T5440 servers; data labels in chart order]
     • Threads: 32, 64, 128, 256
     • Design size: 41M, 80M, 120M, 160M
     • Performance: 1X, 2.5X, 4X, 8X
     • Memory: 64G, 128G, 256G, 512G
     [Chart: simulation speed (cycles/sec) versus design size (M gates) for SW simulation, emulation, and FPGA prototyping]
  4. Server-on-Chip: Verification Complexity
     • 2x+ performance over UltraSPARC T1, within the same power envelope
     • Up to 8 cores @ 1.4 GHz
     • 2x the threads
       > Up to 64 threads per CPU
     • 2x the memory
       > Up to 128 GB memory
       > Up to 16 fully buffered DIMMs
       > 2.5x memory BW = 60+ GB/s
     • 8x FPUs, 1 fully pipelined floating point unit/core
     • 4 MB L2$ (8 banks), 16-way set associative
     • Security co-processor per core
       > DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2096 key, ECC
     • Powers SunFire T5120, T5220, T6320 servers
     [Block diagram: eight cores (C1-C8), each with 16 KB I$, 8 KB D$, an FPU, and an SPU; a crossbar; eight L2$ banks; four memory controllers, each with dual-channel FB-DIMM; NIU (10 Gb Ethernet); PCIe (x8 @ 2.5 GHz, 2 GB/s each direction); Sys I/F buffer switch core; SSI, JTAG debug port]
  5. Problem: cost of emulation going up
     [Images: Gulfstream jet; emulator HW (big iron)]
  6. FPGA Roadmap
     [Chart. Source: MPSOC Keynote 2006, Xilinx]
     FPGAs are getting bigger, cheaper and faster!
  7. Solution: Supplement Emulation with cheaper FPGA prototyping alternatives
     • Why use FPGA prototyping?
       > Not enough R&D dollars for HW emulators (big iron)
       > Need to run at close to real-time speed
       > New advancements in FPGA technology create opportunity for leverage
     • Benefits
       > Availability of standard off-the-shelf, mix-n-match FPGA HW/SW tools (small iron)
       > Allows you to stretch your R&D dollars
       > Deploy many replicas: multiple systems in parallel
       > Supplements your emulators (big iron); does not replace them
     Think Small, Fast and Many
  8. FPGA Prototyping 101
     What is Prototyping:
     • Process of mapping RTL functionality to FPGAs
     Hardware:
     • Multiple latest, largest FPGAs on a board
     • Two major vendors: Altera & Xilinx
     • Capacity: 3-150M gates
     • Performance: 5 to 50 MHz
     Software:
     • Synthesis, Design Partition, FPGA P&R
     • Debug Tools
  9. Big Picture
     [Chart: modeling effort versus simulation speed (Hz, from 1 to 1G+) for Simulation, Acceleration, Emulation, FPGA Prototyping, and Silicon. Usage ranges: HW verification, system-level (HW/SW) verification, SW development. Arrows: Productivity, Debug Productivity. Solaris boot time labels: 15 years, 1 day 18 hrs, 6 hours, 38 mins]
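
The Solaris boot-time labels on slide 9 reflect a simple relationship: wall-clock time is the workload's cycle count divided by the platform's speed. The sketch below illustrates that arithmetic only. The cycle count and the platform speeds are hypothetical placeholders chosen for illustration; they are not taken from the slide, and the slide does not state which boot time belongs to which platform.

```python
def wall_clock_seconds(cycles: int, speed_hz: float) -> float:
    """Time needed to run `cycles` design clock cycles at `speed_hz` cycles/sec."""
    if speed_hz <= 0:
        raise ValueError("speed_hz must be positive")
    return cycles / speed_hz


def humanize(seconds: float) -> str:
    """Render a duration using the largest unit that fits."""
    units = [
        ("years", 365 * 24 * 3600),
        ("days", 24 * 3600),
        ("hours", 3600),
        ("minutes", 60),
    ]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    # Hypothetical workload size and platform speeds (assumptions, not slide data).
    workload_cycles = 10_000_000_000
    platforms = {
        "software simulation": 10.0,
        "emulation": 500_000.0,
        "FPGA prototype": 8_000_000.0,
    }
    for name, hz in platforms.items():
        print(f"{name}: {humanize(wall_clock_seconds(workload_cycles, hz))}")
```

Because the relationship is linear, every 10x gain in platform speed cuts run time by 10x, which is why the chart's speed axis spans so many orders of magnitude.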
  10. FPGA Prototyping vs. Emulation

      Features                               FPGA Prototype   Emulation
      General:
        Capacity Expandability               Good             Very Good
        Memory Capacity                      Very Good        Good
        Ease of use                          Low              Very Good
        Cost                                 Low              High
      Model Build Efficiency:
        Compile Time                         OK               Very Good
        Model Size                           Smaller          Bigger
        RTL Flexibility                      OK               Good
        Test bench support                   OK               Very Good
      Simulation Efficiency:
        Simulation Speed                     Very Good        Good
        Save/Restore                         No               Very Good
        IO Expandability (PCIe, Ethernet)    Very Good        Good
      Debug Efficiency:
        Signal Visibility                    Limited          Very Good
        Waveforms w/o re-run                 No               Very Good
  11. FPGA Tools
      Off-the-Shelf, Mix-n-Match FPGA Emulation HW/SW Tools
      [Diagram, labels as extracted]
      • Flow stages: RTL Design, RTL Synthesis, Design Partition, Place & Route, FPGA, Debug, HW Boards
      • Altera path: Altera Place & Route, Altera Stratix3 FPGA, Altera SignalTap Debug
      • Xilinx path: Xilinx Place & Route, Xilinx Virtex5 FPGA, Xilinx Chipscope Debug
      • HW boards: Gidel HW, DINI HW, Vendor X
      • Tools and vendors: DINI, Synopsys, Auspy, Synopsys Certify, Altera Quartus, Synopsys Synplify, Xilinx ISE, ALDEC, DAFCA, Synopsys Identify Pro
      • Advanced Debug Tools
  12. Deployment Strategy
      • Understand platform capabilities and limitations
        > Build your use model
        > Set management, user expectations
      • Identify Applicable Model Configurations
        > Size limited to small capacity (<16M gates)
      • Identify Workload
        > Primary platform for SW development
        > Secondary platform for RTL/IO verification
      • Design Mapping
        > Automated FPGA RTL coding enforcements
      • Leverage simulators/emulators for debug
  13. Prototyping Challenges
      • Design Mapping: Size, Style
        > Limit to 4-6 FPGAs (~16M gates)
      • Memory Mapping
        > RTL arrays (custom logic): block RAM inferencing
        > Multi-ported arrays: over-clocking
        > Large system memory: mapping to DDR
      • Verification Infrastructure
        > TestBench: synthesizable, self-checking
        > Initialization: use back-door access to download/upload big memories
        > Monitors, SVA, and $display are not supported; use LA triggers
      • Mapping Transformation Verification
        > Gate-level simulation at every stage
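
Slide 13 lists over-clocking as the approach for multi-ported arrays. One common reading of that bullet is time multiplexing: a single-port RAM runs at N times the design clock and serves each of the N logical ports in its own fast-clock slot. The behavioral model below is a minimal sketch of that idea, not the flow used in the talk; the class name, request format, and fixed port order are all assumptions made for illustration.

```python
class TimeMultiplexedRam:
    """N logical ports served by one single-port RAM clocked N times faster."""

    def __init__(self, depth: int, ports: int):
        self.mem = [0] * depth
        self.ports = ports
        self.fast_cycles = 0  # fast-clock slots consumed so far

    def design_cycle(self, requests):
        """Serve one design-clock cycle.

        `requests` holds one entry per port: None (idle), ("r", addr),
        or ("w", addr, data). Returns the read data per port (None for
        idle or write ports). Ports are served in index order, one per
        fast-clock slot.
        """
        if len(requests) != self.ports:
            raise ValueError("one request slot per port is required")
        results = [None] * self.ports
        for slot, req in enumerate(requests):
            self.fast_cycles += 1
            if req is None:
                continue
            if req[0] == "w":
                _, addr, data = req
                self.mem[addr] = data
            elif req[0] == "r":
                results[slot] = self.mem[req[1]]
            else:
                raise ValueError(f"unknown operation {req[0]!r}")
        return results


if __name__ == "__main__":
    ram = TimeMultiplexedRam(depth=16, ports=2)
    # Port 0 writes and port 1 reads the same address in one design cycle.
    print(ram.design_cycle([("w", 3, 42), ("r", 3)]))
    print(ram.fast_cycles)
```

Note that the slot order decides whether a same-cycle read sees the old or the new value. A true multi-ported array may behave differently, which is one reason the slide pairs memory mapping with gate-level verification of each transformation.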
  14. Guidelines
      • RTL Coding Guidelines for FPGAs
        > No XMRs, no force/release; avoid latches and clock gating
        > No initializations (constant inits result in undesired synthesis optimizations)
        > Perform FPGA RTL linting check
      • Stand-alone synthesis & verification of custom logic
        > Check for RAM utilization & reduced CLK domains
        > Mixed-mode RTL-gate simulations
      • Perform full-chip gate simulations at different stages
        > After synthesis, after partitioning, after insertion of signal multiplexing logic
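
The RTL coding guidelines on slide 14 lend themselves to an automated check. The script below is a deliberately naive sketch of such a lint pass, using regular expressions over Verilog source text. It is not the linting tool used in the talk, and the rule set is an assumption: it flags force/release, initial blocks, $display calls, and dotted names that may be XMRs. It does not detect latches or clock gating, which need a real parser or synthesis tool, and the XMR pattern will produce false positives (for example on struct member access).

```python
import re

# Rule name paired with a pattern; all rules are illustrative assumptions.
RULES = [
    ("force/release", re.compile(r"\b(force|release)\b")),
    ("initial block", re.compile(r"\binitial\b")),
    ("$display", re.compile(r"\$display\b")),
    ("possible XMR", re.compile(r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*")),
]


def lint(source: str):
    """Return a list of (line_number, rule_name) findings for Verilog text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Ignore anything after a line comment marker.
        code = line.split("//", 1)[0]
        for name, pattern in RULES:
            if pattern.search(code):
                findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    example = """module t;
  initial x = 0; // init
  always @(posedge clk) force top.u1.q = 1;
  // release in comment
endmodule
"""
    for lineno, rule in lint(example):
        print(f"line {lineno}: {rule}")
```

Running a check like this before synthesis catches guideline violations early, when they are cheap to fix, rather than after a long FPGA build.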
  15. FPGA Flow
      [Flow diagram, stage labels as extracted: Emulation RTL Model, Modular Synthesis, Parallel Synthesis, Netlist Qualification, Design Partition, Design Visibility, FPGA Place & Route, C-API Compile, FPGA Platform]
      • Verification checkpoints: RTL Simulation, Gate-level Simulation
        > Verify latch and clk-gate conversions
        > FPGA partitioning
        > Pin multiplexing
  16. FPGA Prototyping Results
      • OpenSPARC T2 Model
        > 3.8M gates, runs @ 8 MHz
        > Being open-sourced soon: opensparc.net
      • Hardware:
        > 6M gates
        > 2 Altera Stratix III SL340 FPGAs
      • Software:
        > RTL partitioner, bundled FPGA tools
      • Effort:
        > 1 engineer; 3 months
      • Applications:
        > Verify core, SOC, IO
        > Verify firmware (HV/OBP), Solaris, application
      [Block diagram: the Server-on-Chip diagram from slide 4, with cores C1-C8, crossbar, L2$ banks, memory controllers, NIU, PCIe, and Sys I/F buffer switch core]
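
The capacity figures on slide 16 (a 3.8M-gate model on a 6M-gate, two-FPGA platform) can be turned into a quick sizing estimate. The sketch below assumes the platform capacity splits evenly across its FPGAs, giving 3M gates per device; the slides do not state a per-FPGA figure, and real partitioning is also limited by pins, RAM blocks, and routing, none of which is modeled here.

```python
import math


def utilization(design_gates: int, platform_gates: int) -> float:
    """Fraction of the platform's gate capacity used by the design."""
    if platform_gates <= 0:
        raise ValueError("platform_gates must be positive")
    return design_gates / platform_gates


def fpgas_needed(design_gates: int, gates_per_fpga: int,
                 max_utilization: float = 1.0) -> int:
    """Smallest FPGA count whose usable capacity covers the design."""
    if gates_per_fpga <= 0 or not 0 < max_utilization <= 1:
        raise ValueError("invalid capacity or utilization limit")
    return math.ceil(design_gates / (gates_per_fpga * max_utilization))


if __name__ == "__main__":
    # Figures from slide 16; per-FPGA capacity is an assumed even split.
    model_gates = 3_800_000
    platform_gates = 6_000_000
    gates_per_fpga = platform_gates // 2

    print(f"utilization: {utilization(model_gates, platform_gates):.0%}")
    print(f"FPGAs for the model: {fpgas_needed(model_gates, gates_per_fpga)}")
    # Slide 13 quotes ~16M gates as the practical ceiling.
    print(f"FPGAs for 16M gates: {fpgas_needed(16_000_000, gates_per_fpga)}")
```

Under the even-split assumption, a 16M-gate design needs ceil(16/3) = 6 devices, which sits at the upper end of the 4-6 FPGA limit quoted on slide 13. Lowering `max_utilization` shows how quickly the device count grows once headroom for routing is reserved.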
  17. Platform improvements to ease adoption
      • Bridge gap between Emulator and FPGA Prototyping
        > Learn from advances in the emulator space
        > Ease of model build
        > Support for RTL, SVA, TB constructs
        > Seamless RTL partitioning
        > Eliminate need for gate simulations
      • Support for Verification infrastructure
        > XMRs, preserve net names, ports
      • Enhance Debug experience
        > Improve debug tools, offload to simulators
  18. Summary
      • Low-cost FPGA prototyping supplements expensive emulators
      • Collaborate with vendors to implement feature sets for your use models
      • FPGA prototyping is effort-intensive, but will pay off in cost savings & higher performance
      • Benefit:
        > Higher HW & SW coverage (fewer silicon respins)
        > Debug bringup tools before TO (faster bringup; productization time savings)
