xstream_network

  1. XStream: Rapid Generation of Custom Processors for ASIC Designs, by Ali Shahbazi
  2. Overview
     • What is XStream?
     • Comparison to Network Processors
     • Design Flow
     • Design Example: Ethernet Bridge/VLAN Switch
  3. What is XStream?
     • Software tool to rapidly generate high-performance custom stream processors
     • Stream processing: repeated application of an algorithm kernel to a sequence of packets, subject to throughput specifications
     • Resulting custom processors:
        • 40-90% of the performance of a custom ASIC
        • < 5% of the design effort of a custom ASIC
     • Rapidly develop your own ultra-high-performance network processors!
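The definition of stream processing on slide 3 can be illustrated in a few lines. This is only a minimal software sketch of the idea, not XStream's output or API: `stream_process` and `swap_first_two_bytes` are hypothetical names, and the kernel is a toy stand-in for a real packet-processing routine.

```python
from typing import Callable, Iterable, Iterator

Packet = bytes


def stream_process(packets: Iterable[Packet],
                   kernel: Callable[[Packet], Packet]) -> Iterator[Packet]:
    """Apply the same kernel to every packet, in arrival order."""
    for packet in packets:
        yield kernel(packet)


def swap_first_two_bytes(packet: Packet) -> Packet:
    # Hypothetical toy kernel: exchange the first two bytes of the packet.
    return packet[1:2] + packet[0:1] + packet[2:]


out = list(stream_process([b"\x01\x02\x03", b"\xaa\xbb"], swap_first_two_bytes))
```

In hardware the throughput specification would constrain how often a new packet can enter the kernel; the generator above only captures the "same kernel, every packet" structure.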
  4. When you use a Network Processor
     • What your product looks like
     • What your competitor’s product looks like
  5. XStream vs Network Processor
     • What if my application does not look like this?
  6. XStream vs Network Processor
     • What if my application does not look like this?
        • Network Processor: No help
        • XStream: Make a system that looks like my app in days
  7. XStream vs Network Processor
     • What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?
  8. XStream vs Network Processor
     • What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?
        • Network Processor: No help
        • XStream: Select a different controller from the GUI and plop it on the chip
  9. XStream vs Network Processor
     • What if I need:
        • Different type/number of micro-engines
        • More capable control processor
        • Additional high-performance processors for value-added services
        • More crypto cores
        • Different trie lookup hardware
        • Different DRAM bandwidth
        • Etc., etc., etc.
     • Network Processor: No help
     • XStream: Yes
  10. Design Flow
     • Draw an architecture diagram for your application
     • Select processors, interfaces, IP blocks, etc. from a GUI
     • Specify parameters, throughput requirements, etc.
     • Specify the high-level function of any additional custom coprocessors you need
     • Press a button and wait...
     • XStream generates the hardware for you
  11. Design Example
     • Objective:
        • Design a platform chip that is shared across different products to save cost
        • Product 1: 16-port Ethernet Bridge
        • Product 2: 16-port VLAN switch with advanced filtering abilities
     • Major differences:
        • Wimpy ingress/egress processors are OK on the bridge
        • VLAN Switch needs high-performance ingress/egress processors
        • VLAN Switch needs a high-performance filter rule engine
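For readers unfamiliar with Product 1, the forwarding decision of a transparent (learning) Ethernet bridge can be sketched as follows. This is generic textbook bridge behavior and an assumption about what the data-plane kernel would do; the slides do not show the actual code. `bridge_decision`, `NUM_PORTS`, and the dictionary-based table are hypothetical names for illustration.

```python
from typing import Dict, List

NUM_PORTS = 16  # matches the 16-port products in the design example


def bridge_decision(table: Dict[bytes, int], src_mac: bytes,
                    dst_mac: bytes, in_port: int) -> List[int]:
    """Return the list of output ports for one frame."""
    # Learn: remember which port the source address was seen on.
    table[src_mac] = in_port
    out_port = table.get(dst_mac)
    if out_port is None:
        # Unknown destination: flood to every port except the ingress port.
        return [p for p in range(NUM_PORTS) if p != in_port]
    if out_port == in_port:
        # Destination is on the same segment: filter (drop) the frame.
        return []
    return [out_port]


table: Dict[bytes, int] = {}
host_a = bytes.fromhex("000000000001")
host_b = bytes.fromhex("000000000002")
first = bridge_decision(table, host_a, host_b, in_port=0)   # B unknown: flood
reply = bridge_decision(table, host_b, host_a, in_port=3)   # A known on port 0
again = bridge_decision(table, host_a, host_b, in_port=0)   # B now known on port 3
```

A VLAN switch would add VLAN membership and filter rules on top of this decision, which is where the slides place the extra processing demand.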
  12. XStream: Designing a Platform Chip
     [Block diagram: 16 ports, each with a Link Interface, Port Ingress Processor, and Port Egress Processor; Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM]
  13. The Streams in XStream
     [Same block diagram as slide 12]
  14. The Streams in XStream
     [Same block diagram as slide 12]
  15. The Streams in XStream
     [Same block diagram as slide 12]
  16. XStream: Mapping the core processor
     [Same block diagram as slide 12]
  17. XStream: Mapping the core processor...
     [Diagram: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
     • Imagine a snazzy GUI here
     • Designer says:
        • Stream processor, 8 issue
        • Stream 1: Input, 16x1 queue, N deep
        • Stream 2: Output, 16x1 queue, M deep
        • Stream 3: Inout, RISC processor interface
        • Add a CAM: 2 port, 48-bit keys, 1024 entries, 4-way associative, hash=F(…)
     • The tool ponders for a while…
     • Says: “Yes master”
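The CAM parameters on slide 17 imply some simple derived numbers, sketched below. `CamSpec` and its field names are hypothetical, not XStream's configuration format. The slide leaves the hash function as `F(…)`, so `placeholder_hash` is explicitly a stand-in and not the designer's function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CamSpec:
    # Values taken from the slide; the field names are illustrative only.
    ports: int = 2
    key_bits: int = 48
    entries: int = 1024
    ways: int = 4

    @property
    def sets(self) -> int:
        # A 4-way associative table with 1024 entries has 1024 / 4 sets.
        return self.entries // self.ways

    @property
    def index_bits(self) -> int:
        # Bits needed to select one set (sets is a power of two here).
        return (self.sets - 1).bit_length()

    @property
    def key_storage_bits(self) -> int:
        # Storage for the keys alone, ignoring valid bits and payload.
        return self.entries * self.key_bits


def placeholder_hash(key: int, spec: CamSpec) -> int:
    # Assumption: a trivial modulo hash, standing in for the unspecified F(...).
    return key % spec.sets


spec = CamSpec()
set_index = placeholder_hash(0x001122334455, spec)
```

The 48-bit key width matches the length of an Ethernet MAC address, which is presumably why it was chosen for the bridge example.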
  18. XStream: Mapping the core processor...
     [Diagram: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
     • Imagine a snazzy GUI here
     • Designer writes 15 lines of code for the data plane, say in a subset of C
     • Designer says: Schedule and report
     • The tool ponders for a while… Says:
        • Compiled 45 instructions
        • Using modulo accelerator
        • Initiation interval = 8 cycles
        • Clock speed: 500 MHz
        • Throughput based on 64-byte (worst case) packet size, with one packet started per initiation interval:
           • 500 MHz / 8 cycles * 64 bytes * 8 bits/byte = 32 Gb/s
        • Area: 2.5 mm x 2.5 mm
        • Power: 1.2 W
     • Single stream processor @ 500 MHz = 32 Gb/s
     • Have designed up to 1 GHz processor in 0.13 µm process
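The throughput figure on slide 18 can be reproduced with a short calculation. The sketch assumes, as the slide's formula does, that one packet enters the pipeline every initiation interval; `throughput_bps` is a hypothetical helper name.

```python
def throughput_bps(clock_hz: float, initiation_interval: int,
                   packet_bytes: int) -> float:
    """Bits per second when one packet starts every initiation interval."""
    packets_per_second = clock_hz / initiation_interval
    return packets_per_second * packet_bytes * 8  # 8 bits per byte


# Slide values: 500 MHz clock, initiation interval of 8 cycles, 64-byte packets.
gbps_500mhz = throughput_bps(500e6, 8, 64) / 1e9

# Extrapolation (an assumption, not a slide claim): the same kernel and
# initiation interval at the 1 GHz clock mentioned on the slide.
gbps_1ghz = throughput_bps(1e9, 8, 64) / 1e9
```

Because throughput scales with packet size at a fixed packet rate, the 64-byte case is the worst case: larger packets at the same 62.5 million packets per second would carry more bits.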
  19. XStream: Mapping the ingress processor...
     [Same block diagram as slide 12]
  20. XStream: Mapping the ingress processor...
     [Diagram: Port Ingress Processor, Filter Rule Engine]
     • Imagine a snazzy GUI here
     • Designer says:
        • RISC processor engine, no cache
        • 2 issue, scratchpad memory
        • Stream 1: Input, link interface
        • Stream 2: Output, StreamProc:Ingress Queue
        • Add a Filter Rule Engine: Rule complexity = 64 terms, …
     • The tool ponders for a while… Says:
        • RISC core and compiler generated
        • Area: 1 mm x 1 mm (i.e., this can be replicated 100x on a 10 mm x 10 mm chip)
        • Power: 250 mW
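The replication claim on slide 20 is a simple tiling calculation, sketched here. It ignores routing, pads, and other overhead, as the slide does. `max_tiles` is a hypothetical helper, and the 16-processor power total assumes one ingress processor per port, which the slides imply but do not state as a figure.

```python
def max_tiles(chip_w_mm: float, chip_h_mm: float,
              tile_w_mm: float, tile_h_mm: float) -> int:
    """Number of whole tiles that fit on the chip in a regular grid."""
    return int(chip_w_mm // tile_w_mm) * int(chip_h_mm // tile_h_mm)


# Slide 20: a 1 mm x 1 mm RISC core on a 10 mm x 10 mm chip.
ingress_copies = max_tiles(10, 10, 1, 1)

# For comparison, the 2.5 mm x 2.5 mm stream processor from slide 18.
stream_proc_copies = max_tiles(10, 10, 2.5, 2.5)

# Assumption: one 250 mW ingress processor per port on the 16-port chip.
ingress_power_w = 16 * 250 / 1000
```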
  21. Summary
     • Showed network processor design
        • But might as well be multimedia or wireless product design
     • Very high performance custom processors replace ASIC modules
        • Reduce design time for stream-oriented ASIC modules by 95%
        • Retain 40-90% of ASIC performance
     • Software replaces hardware design
        • Software prototype already exists
        • Flexible, fast bug fixes, feature upgrades
        • Share chip across product family