Snug 2014 China

1. Achieving Maximum System Performance on Multi-FPGA Designs Using the HAPS-70 FPGA Prototyping System
   Jaseel Abdulla, Synopsys
   SNUG 2014, May 16, 2014, Shanghai, China
2. Agenda
   • SNDK prototyping goals
   • SoC mapping to HAPS-70
   • Leveraging HSTDM technology for maximizing system performance
   • Smart debugging
   • Results & conclusion
   • Q&A
3. SNDK Prototyping Goals
4. SNDK Prototyping Goals
   • HW validation
   • Early FW/SW development
   • Full-SoC prototype
   • Maximum achievable timing performance
   [Figure: project timeline: Requirement, Architecture, Design, Verification, FW Validation, Tape-out; tracks for RTL, FPGA prototype, FW/SW]
5. Design Overview: SSD Controller
   [Block diagram: high-speed interfaces; processor, debug, peripheral I/F, etc.; storage I/F]
   • Multi-million gate count
6. SoC Mapping to HAPS-70
7. Prototyping Stages: Overall Planning
   High-level goals
   • Full-SoC prototyping
   • System frequency: 10 MHz
   FPGA platform and HW setup
   • Synopsys HAPS-70
   • Custom-designed daughter cards: DDR, Flash, IO card
   Tools and methodology
   • Certify: partitioning
   • Synplify: synthesis
   • Xilinx Vivado: P&R
   • Identify: debug
8. Prototyping Stages: Overall Planning → Design Mapping
   Clocks & resets
   • Clocking
     - On-board PLLs, MMCMs
     - Frequency division
     - Clock sync/replication
     - Gated-clock conversion
   • Resets
     - HAPS-70 global reset
     - Reset sync/replication
   IP mapping
   • PHYs
     - Emulation PHYs with speed bridges: DDR, PCIe, Flash
     - Daughter card design
   • Processor cores
     - Vendor-supplied netlists
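The slide lists "reset sync" without detail. A common structure for this is an asynchronous-assert, synchronous-deassert synchronizer: a short flop chain that clears immediately on reset and releases one stage per clock edge. The Python below is a minimal cycle-level behavioral sketch under that assumption. The class name, the active-low polarity, and the two-stage default are all hypothetical, and the code is not the design's actual RTL.

```python
class ResetSynchronizer:
    """Cycle-level model of an async-assert, sync-deassert reset synchronizer.

    Assumptions (hypothetical): active-low reset, chain of `stages` flops.
    """

    def __init__(self, stages: int = 2) -> None:
        self.chain = [0] * stages  # every flop starts in reset (0)

    def async_reset(self) -> None:
        # Assertion is asynchronous: all flops clear at once, no clock edge needed.
        self.chain = [0] * len(self.chain)

    def clock_edge(self, rst_n_in: int) -> int:
        """Apply one rising edge of the destination clock; return synchronized rst_n."""
        if not rst_n_in:
            # Reset still asserted at this edge: the chain stays cleared.
            self.async_reset()
        else:
            # Deassertion is synchronous: a 1 shifts in by one stage per edge.
            self.chain = [1] + self.chain[:-1]
        return self.chain[-1]

    @property
    def rst_n(self) -> int:
        return self.chain[-1]


if __name__ == "__main__":
    sync = ResetSynchronizer(stages=2)
    sync.async_reset()
    print([sync.clock_edge(1) for _ in range(3)])  # [0, 1, 1]
```

With two stages, the synchronized reset releases on the second clock edge after the external reset deasserts, so every flop in that clock domain leaves reset on the same edge. The slide also lists "replication". That would presumably mean placing one such chain in each FPGA or clock domain fed by the HAPS-70 global reset, but this mapping is an assumption.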
9. Prototyping Stages: Overall Planning → Design Mapping
   • Implementation
     - Partitioning
     - Synthesis
     - Place & route
   • Bring-up
     - IP bring-up
     - Full-system bring-up
   • Internal debug
     - Identify
     - ChipScope
     - Timing report analysis
   • External debug
     - Logic analyzer
     - UMRBus, JTAG
     - Protocol analyzer, etc.
10. Prototyping Stages: DDR3 IP Mapping
   [Figure: DDR3 PHY datapath: DDR3 controller ↔ ISERDES/OSERDES ↔ IDELAYE2/ODELAYE2 ↔ IBUF/OBUF ↔ DDR3 memory, with IDELAYCTRL]
   • DDR3 daughter card design
     - Length-matched data, strobe, and control signals
   • PHY mapped using IDELAYCTRL, IDELAYE2, ODELAYE2, ISERDESE2, and OSERDESE2 primitives
11. Prototyping Stages: DDR3 IP Bring-up
   • Simulation using a DDR3 PHY emulation model
   • DDR initialization HW validation:
     - Config registers
     - Training
     - Read/write leveling
     - Lab measurement of DQ/DQS alignment and skew
   • Basic testing
   • Firmware write-read-compare stress test
   • Functional validation at 50 MHz
   • Successfully tested at 80 MHz after full-SoC integration and performance tuning
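The write-read-compare stress test follows a simple pattern. First, fill a region with known data. Then read every word back and report any mismatches. On the prototype this test runs as firmware on the SoC. The Python below is only an illustrative sketch of the pattern against a dictionary standing in for DDR3. The function name, the 32-bit word addressing, the seeded pseudo-random data, and the injected fault are all assumptions.

```python
import random
from typing import Callable, Dict, List


def write_read_compare(
    write: Callable[[int, int], None],
    read: Callable[[int], int],
    base: int,
    num_words: int,
    seed: int = 0,
) -> List[int]:
    """Write a pseudo-random pattern, read it back, return mismatching addresses."""
    rng = random.Random(seed)
    expected: Dict[int, int] = {}
    # Pass 1: write the whole region first, so reads cannot just echo the last write.
    for i in range(num_words):
        addr = base + 4 * i  # 32-bit word addressing (assumed)
        value = rng.getrandbits(32)
        expected[addr] = value
        write(addr, value)
    # Pass 2: read back and compare every word.
    return [addr for addr, value in expected.items() if read(addr) != value]


if __name__ == "__main__":
    memory: Dict[int, int] = {}

    def faulty_read(addr: int) -> int:
        value = memory.get(addr, 0)
        # Injected fault (hypothetical): bit 3 flipped at one address.
        return value ^ 0x8 if addr == 0x1000_0040 else value

    bad = write_read_compare(memory.__setitem__, faulty_read, 0x1000_0000, 256)
    print([hex(a) for a in bad])  # ['0x10000040']
```

Seeding the generator makes a failing run reproducible, which helps when comparing results before and after timing changes.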
12. Prototyping Stages: PCI Express IP Mapping
   • PHY mapped using GTXE2 transceivers
   • PHY-to-controller speed bridge:
     - PHY: 125 MHz, 16-bit; controller: 62.5 MHz, 32-bit
   [Figure: 2:1 frequency step between the Xilinx PHY (pipe_clk; 16-bit Rx/Tx data; PhyStatus; RXN/TXN; lanes 0 and 1) and the controller (clk62MHz; 32-bit Rx/Tx data; PhyStatus)]
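The speed bridge works because both sides carry the same bit rate per lane. At 125 MHz × 16 bits and 62.5 MHz × 32 bits, each side moves 2 Gb/s per lane, which matches PCIe Gen1 (2.5 GT/s after 8b/10b encoding). The Python below is a minimal sketch of the 2:1 width conversion only. It assumes the earlier 16-bit word lands in the low half of the 32-bit word. It ignores the clock-domain crossing and the handling of PhyStatus and other sideband signals. The function names are hypothetical.

```python
from typing import Iterable, List


def pack_2to1(words16: Iterable[int]) -> List[int]:
    """Rx path: pack pairs of 16-bit PHY words into 32-bit controller words.

    Assumption: the earlier word goes in bits [15:0], the later one in [31:16].
    """
    words = list(words16)
    if len(words) % 2:
        raise ValueError("need an even number of 16-bit words")
    return [(words[i] & 0xFFFF) | ((words[i + 1] & 0xFFFF) << 16)
            for i in range(0, len(words), 2)]


def unpack_1to2(words32: Iterable[int]) -> List[int]:
    """Tx path: split each 32-bit controller word into two 16-bit PHY words."""
    out: List[int] = []
    for w in words32:
        out.extend([w & 0xFFFF, (w >> 16) & 0xFFFF])
    return out


# Throughput must match on both sides of the bridge.
PHY_BPS = 125_000_000 * 16
CTRL_BPS = 62_500_000 * 32
assert PHY_BPS == CTRL_BPS == 2_000_000_000  # 2 Gb/s per lane

if __name__ == "__main__":
    packed = pack_2to1([0x1111, 0x2222, 0x3333, 0x4444])
    print([hex(w) for w in packed])  # ['0x22221111', '0x44443333']
```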
13. Prototyping Stages: PCI Express IP Bring-up
   • IP and glue-logic simulation
   • PHY training and link bring-up using a protocol analyzer
   • Firmware driver for testing:
     - Enumeration
     - Read/write of PCIe config registers
   • Host read/write stress testing at Gen1 speed
   [Photo: MGB connector and PCIe IO cards; cables connect to the PCIe host]
14. Leveraging HSTDM Technology for Maximizing System Performance
15. HSTDM Concept
   • HSTDM: High-Speed Time-Division Multiplexing
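Time-division multiplexing lets several inter-FPGA signals share one physical connection. Within each system-clock period, the signals are sent in successive time slots. As a result, the link must run at least `ratio` times faster than the system clock, plus any framing or training overhead. The Python below is a generic sketch of slot ordering only. It does not reproduce the HAPS HSTDM encoding, training, or clocking, and the function names are hypothetical.

```python
from typing import List, Sequence


def tdm_transmit(frames: Sequence[Sequence[int]], ratio: int) -> List[int]:
    """Serialize each system-clock frame of `ratio` signal values into `ratio` slots."""
    stream: List[int] = []
    for frame in frames:
        if len(frame) != ratio:
            raise ValueError(f"each frame must carry exactly {ratio} signals")
        stream.extend(frame)  # slot k of this frame carries signal k
    return stream


def tdm_receive(stream: Sequence[int], ratio: int) -> List[List[int]]:
    """Demultiplex the serial stream back into one frame per system-clock cycle."""
    if len(stream) % ratio:
        raise ValueError("stream length must be a multiple of the ratio")
    return [list(stream[i:i + ratio]) for i in range(0, len(stream), ratio)]


if __name__ == "__main__":
    # Two system-clock cycles, four signals sharing one physical pair (ratio 4).
    cycles = [[1, 0, 1, 1], [0, 0, 1, 0]]
    wire = tdm_transmit(cycles, ratio=4)
    print(wire)                            # [1, 0, 1, 1, 0, 0, 1, 0]
    print(tdm_receive(wire, 4) == cycles)  # True
```

A higher ratio packs more signals onto each pair, but each transfer then takes more slots. This is the trade-off behind the delay table in the HSTDM flow.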
16. HSTDM Flow
   1. Pre-Certify preparation
   2. RTL_PREP
   3. Partitioning
   4. Estimate timing for CPM-qualified nets
   5. Trace assignment
   6. SLP Gen
   7. Estimate timing & time budgeting
   8. Interpreting the various reports for timing closure
   9. Synthesis + PAR
   10. Validating HSTDM training in the lab
17. HSTDM Flow: Manual Partitioning
   • Performance-aware partitioning, accounting for:
     - Number of inter-FPGA connections
     - Number of cables
19. HSTDM Flow: Using Timing Estimate and Qualified TDM Nets
   • Estimate timing on the partitioned design.
   • Select TDM qualification criteria:
     - All nets
     - Start and end at sequential elements
     - End at a sequential element
     - Start at a sequential element
   • Report timing of qualified TDM nets to get slack numbers.
     - Helps exclude timing-critical nets from TDM.
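The qualification step combines two filters. One is a structural criterion based on where a net starts and ends. The other is a slack check that keeps timing-critical nets off TDM. The Python below is a minimal sketch of that logic over hypothetical net records. The `Net` fields, the `Criterion` names, and the `min_slack_ns` threshold are illustrative assumptions, not Certify's data model or report format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Criterion(Enum):
    ALL_NETS = "all"
    START_AND_END_SEQ = "start_and_end"
    END_SEQ = "end"
    START_SEQ = "start"


@dataclass(frozen=True)
class Net:
    name: str
    starts_at_seq: bool  # driven directly by a sequential element (hypothetical field)
    ends_at_seq: bool    # loads only sequential elements (hypothetical field)
    slack_ns: float      # from the timing estimate on the partitioned design


def is_qualified(net: Net, criterion: Criterion) -> bool:
    if criterion is Criterion.ALL_NETS:
        return True
    if criterion is Criterion.START_AND_END_SEQ:
        return net.starts_at_seq and net.ends_at_seq
    if criterion is Criterion.END_SEQ:
        return net.ends_at_seq
    return net.starts_at_seq  # Criterion.START_SEQ


def tdm_candidates(nets: Iterable[Net], criterion: Criterion,
                   min_slack_ns: float) -> List[str]:
    """Names of qualified nets with enough slack to absorb TDM delay."""
    return [n.name for n in nets
            if is_qualified(n, criterion) and n.slack_ns >= min_slack_ns]


if __name__ == "__main__":
    nets = [
        Net("data_bus_q", True, True, 40.0),
        Net("status_flag", False, True, 25.0),
        Net("reset_n", True, True, 3.0),  # critical: too little slack for TDM
    ]
    print(tdm_candidates(nets, Criterion.START_AND_END_SEQ, min_slack_ns=10.0))
    # ['data_bus_q']
```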
20. HSTDM Flow: Grouping Nets into TDM Ratios
   • Factor in user-logic delay and TDM delay when grouping nets into different TDM ratios.
   • Don't TDM critical nets such as resets, feed-through nets, etc.
   • Use mixed-ratio TDM for better timing performance.

   TDM version  | Minimum delay | Freq. (10 ns user logic) | Freq. (20 ns user logic)
   HSTDM 4x2    | 21 ns         | 32 MHz                   | 24 MHz
   HSTDM 6x2    | 33 ns         | 23 MHz                   | 19 MHz
   HSTDM 8x2    | 40 ns         | 20 MHz                   | 17 MHz
   HSTDM 16x2   | 55 ns         | 15 MHz                   | 13 MHz
   HSTDM 32x2   | 75 ns         | 12 MHz                   | 10.5 MHz
   HSTDM 64x2   | 117 ns        | 7.9 MHz                  | 7.3 MHz
   HSTDM 128x2  | 197 ns        | 5 MHz                    | 4.6 MHz
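Every frequency in the table follows from f = 1 / (minimum TDM delay + user-logic delay), to within the slide's rounding. For example, HSTDM 4x2 with 10 ns of user logic gives 1 / (21 ns + 10 ns) ≈ 32 MHz. The Python below reproduces the table from the listed delays. It also sketches one plausible way to use it: choose the highest ratio that still meets a target frequency, since a higher ratio needs fewer cables per net. The selection helper is an assumption about usage, not a documented Certify step.

```python
from typing import List, Optional, Tuple

# (label, ratio, minimum TDM delay in ns) as listed on the slide.
HSTDM_DELAYS: List[Tuple[str, int, float]] = [
    ("HSTDM 4x2", 4, 21), ("HSTDM 6x2", 6, 33), ("HSTDM 8x2", 8, 40),
    ("HSTDM 16x2", 16, 55), ("HSTDM 32x2", 32, 75),
    ("HSTDM 64x2", 64, 117), ("HSTDM 128x2", 128, 197),
]


def max_freq_mhz(tdm_delay_ns: float, user_logic_ns: float) -> float:
    """f_max = 1 / (t_TDM + t_user); with delays in ns, 1000 / t gives MHz."""
    return 1000.0 / (tdm_delay_ns + user_logic_ns)


def highest_ratio_meeting(target_mhz: float, user_logic_ns: float) -> Optional[str]:
    """Largest TDM ratio whose f_max still meets the target, or None if none does."""
    for label, _ratio, delay in sorted(HSTDM_DELAYS, key=lambda row: row[1], reverse=True):
        if max_freq_mhz(delay, user_logic_ns) >= target_mhz:
            return label
    return None


if __name__ == "__main__":
    for label, _ratio, delay in HSTDM_DELAYS:
        print(f"{label:12} {max_freq_mhz(delay, 10):5.1f} MHz {max_freq_mhz(delay, 20):5.1f} MHz")
    print(highest_ratio_meeting(10.0, user_logic_ns=20))  # HSTDM 32x2
```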
21. HSTDM Flow: Cable Calculation
   • Total cables = number of HSTDM cables + number of non-HSTDM cables
   • Number of HT3 cables for HSTDM = total number of qualified nets between two FPGAs / (number of differential pairs × HSTDM ratio)
   • Example:
     - FPGA_A to FPGA_B: 3000 signals
     - Each cable needs a clock between the FPGAs, so usable differential-pair IOs per cable = 24 - 2 = 22
     - HSTDM ratio: 16
     - Number of HSTDM cables required: 2700 / (16 × 22) ≈ 7.7, rounded up to 8
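The cable formula is a ceiling division, because a partly filled cable still counts as a whole cable. The Python below is a small sketch of the two formulas on the slide, checked against the slide's example. The function names are hypothetical, and the non-HSTDM cable count is design-specific, so the example does not supply one.

```python
def hstdm_cables(qualified_nets: int, usable_pairs_per_cable: int, ratio: int) -> int:
    """Cables = ceil(qualified nets / (usable diff pairs per cable * HSTDM ratio))."""
    nets_per_cable = usable_pairs_per_cable * ratio
    return -(-qualified_nets // nets_per_cable)  # exact integer ceiling division


def total_cables(hstdm: int, non_hstdm: int) -> int:
    """Total cables = HSTDM cables + non-HSTDM cables."""
    return hstdm + non_hstdm


if __name__ == "__main__":
    usable_pairs = 24 - 2  # per the slide, 2 of 24 pairs are not usable for data
    print(hstdm_cables(2700, usable_pairs, 16))  # 8
```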
22. HSTDM Flow: SLPgen and Timing Estimate
   • SLPgen
     - RTL-based: mixed flow
     - Netlist-based: srs flow
     - time_est.srr: system-level timing analysis
   • Mixed mode is useful for Identify instrumentation
   • Run timing estimation
23. HSTDM Flow: Log Files and Timing-Performance Tuning Process
   • RTL_PREP: design.srr, *_cck.rpt
   • PARTITION
   • TIMING_EST: cpm_time_est.srr, qualified.txt
   • TRACE ASSIGN & SLPGEN: slpgen.srr, *_timing.sdc
   • HSTDM_TIMING_EST: result_time_est.srr
24. Smart Debugging
25. Smart Debugging and Validation: Deep Trace Debug (DTD) with SRAM Card
   • Previous solution: debug samples saved inside the FPGA
   • What was the limitation?
     - Performance and congestion
   • New solution: DTD SRAM card
     - Higher sample depth
     - Saves FPGA resources
     - Improved P&R and timing
26. Smart Debugging and Validation: Continuous Logging Using UMRBus
   • Previous solution: firmware log stored inside DDR3
   • What was the limitation?
     - Limited capacity
   • New solution: record the firmware log to the host via UMRBus
     - Continuous logging from power-up
   [Figure: HAPS-70 with UMRBus I/F]
27. Full-Prototype Lab Setup
28. Full-Chip Prototype Using HAPS-70
29. Results & Conclusion
   • Robust HW validation
   • Delivered a full SSD-drive prototype on the FPGA platform to the SW team before tape-out
   • Full-design prototyping
   • "Our approach":
     - Stand-alone IP bring-up for functional validation, followed by full-SoC integration
     - Proving the HSTDM flow on a simpler interface, then scaling it to the full design for performance
     - Using DTD and UMRBus for enhanced debug

   Prototyping system | System clock | DDR3 clock | TDM ratios        | TDM bit rate
   HAPS-70 S48        | 10 MHz       | 80 MHz     | Mixed (32, 16, 8) | 1000 Mbps
30. Thank You
31. Q&A