PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCIe Gen3 presentation by PLDA at 4th FPGA Camp in Santa Clara, CA. For more details visit http://www.fpgacentral.com/fpgacamp or http://www.fpgacentral.com


Slide 1: PCIe 3.0 Controller Case Study
Trupti Gowda, Field Application Engineer, PLDA
Copyright © 2011, PCI-SIG, All Rights Reserved
Slide 2: The challenge of Gen3
PCIe 2.0 introduced a 2x bandwidth increase and speed negotiation.
- The challenge was mostly on the electrical side; existing PCIe logic could be adapted easily.

Slide 3: The challenge of Gen3
PCI Express 3.0 introduces another 2x bandwidth increase and 128b/130b encoding.
- Major changes are required in the PHY layer.

Slide 4: Why a new architecture?
128b/130b encoding has a major impact on the PHY layer, and many parts must be significantly modified:
- Block encoding completely changes the way packets and ordered sets are processed and transmitted.
- Scrambling is different and affects data differently.
- Different and new ordered sets (SKP, SDS, ...).
- Equalization affects the LTSSM and training sets.
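A quick back-of-the-envelope check shows why Gen3 can double bandwidth while raising the line rate only from 5 GT/s to 8 GT/s: 8b/10b encoding wastes 20% of the raw rate, while 128b/130b wastes only about 1.5%. This sketch (not part of the original slides) uses the published per-lane rates:

```python
# Usable per-lane data rate after line-encoding overhead
def lane_data_gbps(rate_gts, payload_bits, total_bits):
    """Raw rate (GT/s) scaled by the encoding's payload/total bit ratio."""
    return rate_gts * payload_bits / total_bits

gen2 = lane_data_gbps(5.0, 8, 10)     # Gen2: 5 GT/s with 8b/10b  -> 4.0 Gb/s
gen3 = lane_data_gbps(8.0, 128, 130)  # Gen3: 8 GT/s with 128b/130b -> ~7.88 Gb/s
print(gen2, gen3, gen3 / gen2)        # ratio is close to 2x
```

So the encoding change, not just the faster SerDes, is what delivers the 2x increase, which is why it forces such deep PHY-layer rework.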
Slide 5: Why a new architecture?
Existing designs have often been defined for Gen1 and adapted for Gen2, but those technical choices are no longer optimal with another 2x throughput increase.
This is also a good opportunity to use accumulated PCIe experience to design a more powerful architecture.
Slide 6: Selecting datapath
Datapath width is an important choice because it strongly affects:
- Design effort
- Device scalability
- Ease of use for the end user
- Possible targeted FPGA devices
- Cost (gate count, ...)
Slide 7: Selecting datapath
Example datapaths for a x8 device.
Slide 8: Selecting datapath
Benefits of a small datapath (32/64-bit):
- Easier to design; minimal data alignment and corner-case issues.
- Only one (32-bit) or at most two (64-bit) packets received in the same clock cycle.
- Small gate count.
- Easy to connect to common peripherals.
Slide 9: Selecting datapath
Drawbacks of a small datapath (32/64-bit):
- More complex in some situations: 3-4 clock cycles (32-bit) or 2 clock cycles (64-bit) are required to receive a complete TLP header, so header checking spans several clock cycles.
- Not suitable for high-throughput devices.
- Requires a faster clock than a large datapath for the same throughput: more power consumed and a more expensive FPGA device.
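The cycle counts above follow directly from the TLP header sizes (3 or 4 DWs, 32 bits each). A minimal sketch of the arithmetic:

```python
import math

# Clock cycles needed to collect a full TLP header at a given datapath width
def header_cycles(datapath_bits, header_dwords):
    """Ceil of header bits over datapath bits per cycle."""
    return math.ceil(header_dwords * 32 / datapath_bits)

for width in (32, 64, 128):
    cycles = [header_cycles(width, dw) for dw in (3, 4)]  # 3DW and 4DW headers
    print(f"{width}-bit: {cycles[0]}..{cycles[1]} cycles")
```

This reproduces the slide's 3-4 cycles at 32-bit and 2 cycles at 64-bit; from 128-bit upward the whole header arrives in a single cycle, which is why header checking gets simpler even as alignment gets harder.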
Slide 10: Selecting datapath
Benefits of a large datapath (128/256-bit):
- Requires a slower clock than a small datapath for the same throughput: less power consumed.
- Can be implemented in slower, cheaper FPGA devices.
- Suitable for high-throughput devices (Gen2/Gen3).
Slide 11: Selecting datapath
Drawbacks of a large datapath (128/256-bit):
- Generally more complex to design: more data alignment and corner cases to handle. For example, up to 4 DLLPs (256-bit) can be received in a single clock cycle.
- Larger gate count.
- More difficult to interface to legacy peripherals.
Slide 12: Datapath effect on device
Note that datapath size can adversely affect a device. For example, datapath frequency can have a huge impact on power consumption.
Slide 13: Datapath effect on device
Example datapaths for a x4 Gen1 device:
- 256-bit @ 31.2 MHz: gates x2.5, freq x0.12
- 128-bit @ 62.5 MHz: gates x2.0, freq x0.25
- 64-bit @ 125 MHz: gates x1.5, freq x0.5
- 32-bit @ 250 MHz: gates x1.0, freq x1.0
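Those frequencies follow from the link's usable data rate: a x4 Gen1 link runs 4 lanes at 2.5 GT/s with 8b/10b encoding (80% efficiency), giving 8 Gb/s to sustain. A small sketch of the calculation (not from the slides):

```python
# Minimum datapath clock needed to sustain a link's usable throughput
def required_clock_mhz(lanes, rate_gts, encoding_eff, datapath_bits):
    """Clock (MHz) at which datapath_bits per cycle matches the link data rate."""
    usable_gbps = lanes * rate_gts * encoding_eff
    return usable_gbps * 1000 / datapath_bits

for width in (256, 128, 64, 32):
    mhz = required_clock_mhz(4, 2.5, 0.8, width)  # x4, 2.5 GT/s, 8b/10b
    print(f"{width}-bit: {mhz:g} MHz")
```

This reproduces the table above; the 256-bit case comes out at 31.25 MHz, which the slide rounds to 31.2 MHz.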
Slide 14: Datapath effect on device
Power consumption is related to gates x freq^2:
- 256-bit @ 31.2 MHz: gates x2.5, freq x0.12, power 4%
- 128-bit @ 62.5 MHz: gates x2.0, freq x0.25, power 12.5%
- 64-bit @ 125 MHz: gates x1.5, freq x0.5, power 37.5%
- 32-bit @ 250 MHz: gates x1.0, freq x1.0, power 100%
A x25 factor!
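Applying the slide's gates x freq^2 rule of thumb to its own gate and frequency factors reproduces the percentages above. A sketch (the 0.12 frequency factor for 256-bit is taken as the exact ratio 1/8, which yields 3.9%, rounded to 4% on the slide):

```python
# The slide's rule of thumb: relative power = gate factor x (frequency factor)^2
def relative_power(gate_factor, freq_factor):
    """Power relative to the 32-bit baseline configuration."""
    return gate_factor * freq_factor ** 2

# Gate/frequency factors from the x4 Gen1 table above
configs = {256: (2.5, 0.125), 128: (2.0, 0.25), 64: (1.5, 0.5), 32: (1.0, 1.0)}
for width, (g, f) in configs.items():
    print(f"{width}-bit: {relative_power(g, f):.1%} of the 32-bit power")
```

The ratio between the 32-bit and 256-bit configurations is roughly 25x, matching the slide's headline factor.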
Slide 15: Datapath effect on device
This shows that:
- All parts of the device must be designed carefully to avoid performance degradation.
- The datapath must be chosen according to the number of lanes and the PCIe generation in order to:
  - sustain throughput at a reasonable frequency
  - match gate count and power requirements
Slide 16: Clock Domain Crossing
PCIe logic typically runs at the PIPE interface clock frequency. Clock Domain Crossing (CDC) allows part of this logic to run at an application-specific clock frequency.
The CDC location is an important choice because:
- There can be many signals to re-synchronize, which consumes logic and adds latency.
- There can be required relationships between the PIPE clock and the application clock, which can limit the usability of CDC.
- There can be side effects inside the PCIe logic.
Slide 17: Clock Domain Crossing
Solution #1: Change clock frequency between the Transaction layer and the application.
Slide 18: Clock Domain Crossing
Benefits:
- Simple to implement, with no side effects on the PCIe logic.
- The application clock is fully independent of the PIPE clock and can be freely reduced to save power.
Slide 19: Clock Domain Crossing
Drawbacks:
- The whole receive/transmit datapath must be buffered with FIFOs or memories (consumes gates and adds latency).
- All PCIe logic always runs at full PIPE frequency, so this solution allows no power savings on the PCIe logic.
Slide 20: Clock Domain Crossing
Solution #2: Change clock frequency between the Data Link and Transaction layers.
Slide 21: Clock Domain Crossing
Benefits:
- The application clock is fully independent of the PIPE clock and can be freely reduced to save power.
- The receive/transmit datapaths can be buffered through the existing receive/transmit buffers (saves gates).
Slide 22: Clock Domain Crossing
Drawbacks:
- Part of the PCIe logic always runs at full PCIe speed, which limits power savings.
- Potential side effects: flow control overflow cannot be detected accurately, etc.
Slide 23: Clock Domain Crossing
Solution #3: Change clock frequency between the PHY/MAC and Data Link layers.
Slide 24: Clock Domain Crossing
Benefit:
- Most of the PCIe logic runs at the application frequency, which allows large power savings. For example, with an 8-bit PIPE interface the clock is 250 MHz in Gen1, but with a 32-bit datapath a 62.5 MHz application clock is enough.
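The 62.5 MHz figure is just the PIPE clock scaled by the width ratio, since the wider datapath carries proportionally more bits per cycle. A one-line sketch of that scaling:

```python
# Application clock that carries the same data rate as the PIPE interface
def app_clock_mhz(pipe_clock_mhz, pipe_width_bits, app_width_bits):
    """Scale the PIPE clock down by the datapath widening ratio."""
    return pipe_clock_mhz * pipe_width_bits / app_width_bits

# Gen1, 8-bit PIPE at 250 MHz, internal 32-bit datapath (values from the slide)
print(app_clock_mhz(250, 8, 32))  # 62.5
```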
Slide 25: Clock Domain Crossing
Drawbacks:
- The whole receive/transmit datapath must be buffered with FIFOs or memories (consumes gates and adds latency).
- The application clock frequency must always be high enough to match the data rate, because there is no way to limit PCIe throughput before the receive buffer.
Slide 26: Case Study Conclusion
These examples show that key architecture choices directly impact a PCIe device in terms of:
- flexibility
- performance
- logic and power usage
Slide 27: How can PLDA help your PCIe 3.0 design?
- PLDA has been designing PCIe IP since 2002: a 65 man-year effort and 20+ entries on the PCI-SIG Integrators List.
- PLDA has soft IP for Gen3: XpressRICH3.
  - 256-bit datapath for a smaller clock frequency.
  - User clock: integrated Clock Domain Crossing supporting a user-selected frequency at the Application layer.
  - Ease of use on FPGA.
- QuickPCIe layer to easily adapt the PCIe Gen1/2/3 soft/hard IP to the AXI4 interface, which is popular with FPGA vendors.
- Coming soon: Stratix IV based PCIe 3.0 hardware!
Slide 28: AXI4 Interconnect Support
ARM has created the AMBA AXI4 bus, which can be integrated into the FPGA fabric. The AXI4 protocol defines a point-to-point interface developed to address SoC performance challenges.
- It supports multiple clock domains, larger burst lengths (up to 256 beats), and Quality of Service (QoS) signaling.
- AXI4-Lite is a subset of the AXI4 protocol used for control interfaces.
- The AXI4-Stream protocol is used to exchange data between masters and slaves and does not have a defined maximum burst or packet length.
PLDA is among the first in the PCIe domain to standardise on the AMBA 4 specification as part of our interconnect strategy to support 'plug and play' FPGA design.
Slide 29: QuickPCIe Block Diagram
(Figure: block diagram of the QuickPCIe layer between the PCIe block and the application, showing an AXI4-Lite slave for internal registers, an AXI4-Lite master, an AXI4 descriptor master, AXI4 master(s) and slave(s) with address translator(s), DMA engine(s), AXI4-Stream ports, and an interconnect.)
Slide 30: Learn more about our soft IP controller and hardware solutions
Visit our booth here at FPGA Camp and meet Sales Manager Kate Martin, kmartin@plda.com, (408) 887-5981.
Thank you!
