This presentation was shown at a webinar entitled “Architecting a 10X Performance Breakthrough for LTE Core Networking”.

Usage Rights

© All Rights Reserved

Architecting10 x performanceforltewebinar4.20.10 Presentation Transcript

  • 1. Architecting a 10X Performance Breakthrough for LTE Core Networking. April 20, 2010, 9-10 PT / 12-1 ET / 5-6 UK. Panelist discussion = 50 min, Q&A = 10+ min. The webinar will be recorded and available for future playback.
  • 2. Architecting a 10X Breakthrough for LTE: Introduction. By The Linley Group, April 20, 2010.
  • 3. LTE Addresses Skyrocketing Bandwidth Demand
      • [Chart: Per-User Bandwidth in MB/month for Basic Phone, Blackberry, iPhone, and PC User. Source: Consumer Reports, US Telecom]
      • [Chart: Aggregate Bandwidth in PB/month, 2009 to 2014. Source: Cisco]
      • [Chart: Smartphone Forecast, Smartphone Shipments (MU), 2008 to 2014. Source: The Linley Group]
      • [Chart: Bandwidth Use By App Type: Streaming 29%, Browsing 27%, P2P 19%, Download 19%, VoIP & IM 3%, Other 3%. Source: Allot]
  • 4. LTE Specifies 10X Better Bandwidth, Users, Latency
      • [Chart: Peak Mbps Down for HSPA, HSPA+, and LTE configurations. Source: 3GPP]
      • [Chart: Latency (ms) for 2G, HSDPA, HSPA, DSL, HSPA+, and LTE. Source: NSN, Ericsson]
      • [Chart: # Users Per Sector for HSPA, EV-DO Rev. A, and LTE (20MHz). Source: Verizon, Siemens]
  • 5. LTE Streamlines The Data-Network Architecture
      • Four Elements in 3G Network: Node B, RNC, SGSN, GGSN, then the Internet
      • Two Elements in LTE Network: eNode B, Gateway, then the Internet
  • 6. Actual Networks Are A Bit More Complicated
      • [Diagram: LTE network showing the UE, E-UTRAN (macro, micro, and femto cells), MME, Serving Gateway, PDN Gateway, HSS, PCRF, EIR, MSC, 1xCS IWS, GMLC, E-SMLC, CBC, CALEA, Rel-8+ and pre-Rel-8 SGSNs, and the network operator's IP services (e.g., IMS, PSS, etc.), connected by interfaces including S1-MME, S1-U, S3, S5, S6a, S7, S10, S11, S12, S13, S102, Gn, Rx+, and SGi.]
  • 7. MME / Gateway Designers Have Many Processor Options
      • Multiple single / dual-core RISC
          • Advantages: similar software domain as all-CP approach
          • Disadvantages: integration
      • Multicore x86
          • Advantages: fast CPU, up to 8 threads, big cache
          • Disadvantages: integration & interfaces
      • NPU
          • Advantages: fast at L2-L4 functions, designed to hide latency
          • Disadvantages: requires control-plane processor, complex programming model
      • Integrated multicore processor for communications
          • Advantages: flexible partitioning, familiar software model, integration, performance features for communications
          • Challenges: partitioning software, exploiting performance features
  • 8. Multiprocessing Is Challenging
      • [Diagram with three panels: State Management (packets distributed across the cores of a multicore processor), Processing Latency (execution interrupted by branch mispredicts and cache misses), and Software Partitioning (how to split the stack across cores).]
  • 9. Hardware and Software Cooperate to Solve Challenges
      • [Diagram with three panels: HW for State Management (eight CPUs with shared memory and on-chip messaging), Multithreading Hides Latency (threads 1 to 3 interleaved on one core), and Specialized Software Balances Load, Manages State (layers L2 to L5, applications, management, and OS assigned to CPU0 through CPU7).]
  • 10. NetLogic, 6WIND, Continuous Achieve a 10X SCTP Speedup
      • [Chart: Throughput (pps) for XE50 x86 User Space, PP50 User Space, and PP50 Fast Path. Source: Continuous Computing]
  • 11. NetLogic, 6WIND, Continuous Architecture Overview
      • [Diagram: NetLogic XLR732 with eight cores of 4 vCPUs each, assigned to 6WIND/Linux, SCTP application (App), distributor (Dist.), and SCTP fast-path (FP) roles.]
  • 12. Up Next
      • NetLogic Microsystems, Jim Johnston: Multicore, Multithread Processors
      • 6WIND, Eric Carmes: Packet Processing for Multicore Processors
      • Continuous Computing, Deepak Wadhwa: Adapting Trillium Protocol Software for 10X Gains
  • 13. About The Linley Group
      • Areas of specialty: embedded processors, wireless semiconductors, wired-comm. semis
      • Services: consulting, reports, events
      • Contact info: www.linleygroup.com, 1-800-413-2881, 1-408-281-1947, cs@linleygroup.com, joe@linleygroup.com
  • 14. Architecting a 10X Performance Breakthrough for LTE Core Networking
  • 15. Leveraging NetLogic’s XLR MultiCore Processor in LTE Core Network Systems
      • [Block diagram: eight cores (Core 1 to Core 8), each with 32KB D and 32KB I caches and four threads (threads 1 to 32); eight-banked 2MB Level-2 cache; dual-channel DRAM interfaces on the Memory Distributed Interconnect; Memory and I/O Bridge; quad-channel QDR SRAM/LA1; Fast Messaging Network; I/O Distributed Interconnect; Packet Distribution Engines; high-speed Ethernet network interfaces (2 x10G, 4 x1G); Security Engine; DMA Engine; system I/O (PCI-X, HT, etc.); general-purpose I/O.]
      • Eight quad-threaded high-performance MIPS64 processor cores
      • Fast Message Network for high-speed intra-SOC communication
      • Autonomous Security Acceleration Engine minimizes processor overhead
      • High-speed memory infrastructure with low-overhead DMA Engine
  • 16. Benefits of Multithreading
      • Improves performance by hiding memory latency
      • When one vCPU (thread) stalls, the next one takes over
      • CPU usage is maximized by utilizing cycles otherwise wasted on memory latency
      • Without threading, CPUs will stall
      • vCPUs (threads) consume much less area and power at a given performance level
      • [Figure: processing time on a single CPU with a single thread, where each task waits out its memory latency in turn, compared with a single CPU with four threads (Thread 1 to Thread 4). Four threads in parallel take much less time to complete four tasks; the difference is labeled "Performance Improvement".]
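The latency-hiding argument on this slide can be put into numbers with a small idealized model. The sketch below is an illustration only: the function name, the zero-cost thread switch, and the 25/75 cycle split are assumptions made here, not figures from the webinar.

```python
def core_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of core cycles spent on useful work in an idealized model.

    Each thread alternates compute_cycles of work with stall_cycles of
    memory latency, and the core switches to another ready thread at no
    cost whenever the running thread stalls (an assumption of this sketch).
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("threads >= 1, compute_cycles > 0, stall_cycles >= 0 required")
    period = compute_cycles + stall_cycles   # one work/stall round of a single thread
    busy = threads * compute_cycles          # work available in that period
    return min(1.0, busy / period)           # the core cannot exceed 100% busy


if __name__ == "__main__":
    # Hypothetical workload: 25 cycles of work per 75 cycles of memory stall.
    for n in (1, 2, 4):
        print(n, "thread(s):", core_utilization(n, 25, 75))
```

With these assumed numbers a single thread keeps the core busy a quarter of the time, and four threads are enough to cover the stalls completely, which is the effect the slide describes.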
  • 17. Performance & Power Efficiency
      • [Figure: two identical 1.0GHz single-threaded CPUs (CPU0, CPU1; total 2.0GHz) compared with a single 2.0GHz dual-threaded CPU (vCPU0, vCPU1; total 2.0GHz). Labels in the figure include "Performance loss", "No performance loss", "Cache Miss, Memory Stall, Branch misprediction", "1.0GHz", "400MHz", and "Electric Power > Electric Power".]
      • Multi-threaded core is energy efficient, and captures lost cycles caused by memory latencies
  • 18. XLR – Fast Messaging Network™
      • Low-Latency Command Messaging Network for Thread-Level Processing
          • Lockless inter-agent communication
          • I/O to I/O, I/O to CPU, CPU to CPU
          • Dedicated “Mailboxes” for each message station
      • Efficient Mechanism for Internal Control and Communication
          • Packet information
          • Security requests
          • Computation management between CPUs/threads
      • Advantages
          • Breaks the packet distribution bottleneck created by multiple simultaneous processing elements
          • Eliminates CPU / thread communication overhead common to MP environments (locks and semaphores)
          • Prevents unnecessary context switches
          • Conserves memory accesses
          • Simplifies I/O driver architecture
      • [Diagram: message stations for two 10GbE ports, 4x 1GbE, DMA, the Security Engine, and Thread 0 through Thread 31.]
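The idea of a dedicated mailbox per message station can be modeled in a few lines. This is a toy software model, not the XLR hardware interface: the class names, the message fields, and the mailbox depth are all hypothetical. The point it shows is that when each mailbox has exactly one reader, agents exchange work by passing small messages instead of taking locks on shared queues.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    src: int       # sending station (hypothetical field)
    kind: str      # e.g. "rx_packet" or "security_request"
    payload: int   # e.g. address of a packet descriptor


class MessageNetwork:
    """Toy model: one bounded mailbox per station, read only by its owner."""

    def __init__(self, stations: int, depth: int = 4) -> None:
        self._mailboxes = [deque() for _ in range(stations)]
        self._depth = depth

    def send(self, dst: int, msg: Message) -> bool:
        """Deliver msg to station dst; return False if its mailbox is full."""
        box = self._mailboxes[dst]
        if len(box) >= self._depth:
            return False          # sender must retry later (back-pressure)
        box.append(msg)
        return True

    def receive(self, station: int) -> Optional[Message]:
        """Pop the oldest message for this station, or None if it is empty."""
        box = self._mailboxes[station]
        return box.popleft() if box else None


if __name__ == "__main__":
    net = MessageNetwork(stations=32)
    net.send(5, Message(src=0, kind="rx_packet", payload=0x1000))
    print(net.receive(5))
```

A bounded mailbox is used here so that a slow receiver pushes back on senders rather than growing an unbounded queue; whether the real hardware behaves this way is not stated in the slides.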
  • 19. Security Acceleration – Options
      • No Acceleration
      • CPU Instruction Set Enhancements: 100% CPU, all issues
      • Security Processing Offload: wait on interrupt or poll
      • Autonomous Security Accelerator: run to completion, no real-time requirements
      • [Figure: for each option, a plot of % CPU utilization for security (0% to 100%) over time across packets Pkt 1, Pkt 2, ... Pkt N.]
  • 20. Security Engine
      • Fully autonomous 10-Gbps bulk encrypt / decrypt and authentication
      • Completely offloads the CPU rather than sharing it, which keeps power consumption low for security applications and reserves CPU cycles for other applications (e.g. 1 core for 10Gbps)
      • Easy to manage export control laws
      • 4 programmable security pipes
      • Message-driven interface enables low-overhead asynchronous access
      • Minimizes use of memory bandwidth
      • Integrated DMA with full scatter/gather support
      • Supports both cipher and authentication in a single pass
  • 21. Efficient Packet Flow within XLR
      • [Diagram, steps 1 to 5: packets enter an Rx network interface (RGMII / XGMII / SPI-4.2) and its network accelerator (parser, packet director, DMA); packet data moves through memory while the Packet Distribution Engine sends an Rx Packet Descriptor Message over the Fast Messaging Network to a core thread's FMN station; the thread sends a Tx Packet Descriptor Message to the Tx side's Packet Distribution Engine, network accelerator, and network interface; a Free Back Descriptor Message returns the buffer.]
  • 22. Continuous Computing’s PP50 ATCA Packet Processing Blade
      • [Diagram: blade with two XLR processors (XLR #1, XLR #2), DDR2 memory, a 10GbE switch, TCAM & HT mezzanine, 10GbE SFP+ ports, GbE RJ45, and console.]
  • 23. Introducing NetLogic’s XLP® Processor Architecture
      • [Block diagram: eight EC4400 cores (Core 0 to Core 7), each with L1-I and L1-D caches, an L2 cache, and four vCPUs (vCPU0 to vCPU31); banked 8MB Level-3 cache; four DDR3 DRAM controllers; Memory Distributed Interconnect (MDI); Memory and I/O Bridge; Inter-Chip Interface (ICI); Fast Central Message Switch (CMS); I/O Distributed Interconnect (IODI); Network Acceleration Engine; POE; security, RSA, DMA, compression, and RAID engines; Interlaken, XAUI, SGMII, and PCI-E interfaces; general-purpose I/O (EJTAG, I2C, flash, GPIO, UART, SPI, SDHC, power management, debug).]
      • Leverages successful XLR/S architecture (FMN, Memory Interconnect, etc.)
      • Quad-issue, simultaneously quad-threaded core with out-of-order execution
      • Delivers a 2x to 3x performance-per-watt gain
      • 40Gbps+ network capacity meets the performance requirements of next-generation LTE Core systems
      • Enhanced autonomous acceleration engines further offload critical tasks from the main processor
      • Multi-chip cache-coherent scalability via Interchip Interface (ICI)
  • 24. NetLogic’s Multi-Core Solution Delivers High Performance for LTE Core System Solutions
      • Hardware Multi-Threading
          • Leveraged by both Continuous Computing and 6WIND to optimize control and data plane SW architecture and performance
      • Fast Message Network
          • Low-overhead packet interface for fast path
          • Inter-processor communication within the control plane and between control plane and fast path
      • Autonomous Security Accelerator
          • Integrated scatter/gather DMA eliminates memory copy requirement
      • Scalability
          • Dual XLR solution on Continuous Computing’s ATCA PP50 blade with software-compatible roadmap to Dual XLP solution (on PP80)
      • Delivers 10X performance benefit for LTE Core Systems as demonstrated by the platform developed by 6WIND, Continuous, and NetLogic
  • 25. NetLogic Microsystems Overview
      • Worldwide leader in intelligent semiconductor solutions powering next-generation Internet networks
          • Multi-Core, Multi-Threaded Processors
          • Knowledge-based Processors
          • Layer 7 Content Processors
          • 10/40/100GE PHYs
          • Low-Power Embedded Processors
      • NASDAQ: NETL
          • IPO in July 2004
          • $350M revenue forecasted for CY2010, based on forward-looking guidance provided on NETL’s 4th Quarter 2009 Earnings Call on Feb 2, 2010
      • Contact Information
          • Web: www.NetLogicMicro.com
          • General: info@netlogicmicro.com
          • Technical: support@netlogicmicro.com
  • 26. Architecting a 10X Performance Breakthrough for LTE Core Networking Eric Carmes, 6WIND
  • 27. High-Performance Packet Processing for LTE Core Networking
      • Converged, all-IP networks require high-performance packet processing
      • A standard OS cannot sustain multi-10Gb/s of complex network traffic that requires deep inspection to implement advanced services
          • An OS networking stack uses OS services
          • OS limitations such as preemptions, threads, timers, locking, etc.
          • Performance bottlenecks including L1/L2 CPU cache misses and pipeline mispredictions
      • [Chart: performance versus number of cores, comparing "OS + High-Performance Packet Processing" with "Standard Operating System".]
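To see why 10Gb/s rates leave so little room for OS overhead, it helps to work out the per-packet time budget. The sketch below uses standard Ethernet framing arithmetic (a minimum 64-byte frame plus 20 bytes of preamble and inter-frame gap on the wire); the 1 GHz clock in the example is an assumption chosen for round numbers, not a figure from this slide.

```python
# On-wire overhead per Ethernet frame: 8-byte preamble/SFD + 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def max_packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Line-rate packet rate for back-to-back frames of the given size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(link_bps: float, frame_bytes: int, cpu_hz: float) -> float:
    """CPU cycles one hardware thread may spend per packet to keep up alone."""
    return cpu_hz / max_packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    pps = max_packets_per_second(10e9, 64)
    print(f"{pps / 1e6:.2f} Mpps at 10 Gb/s with 64-byte frames")
    # Assumed 1 GHz clock: a single thread gets only a few dozen cycles per packet.
    print(f"{cycles_per_packet(10e9, 64, 1e9):.1f} cycles per packet at 1 GHz")
```

A single cache miss or lock acquisition can cost more than that budget, which is the motivation for spreading packets across many cores and threads and keeping the per-packet path free of OS services.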
  • 28. Design Challenges
      • Developing packet processing under an executive environment outside the Operating System environment
          • A modern IP networking stack is more complex than the initial IP-UDP-TCP stack
          • Developing a new stack from the ground up is very expensive
      • Porting an OS stack under the executive environment is a faster alternative, but
          • This only removes the OS scheduling limitations
          • Still requires development of APIs to synchronize control plane and data plane
      • Having a complete networking stack under the executive environment is not efficient
          • Only the processing performed on each packet has to be improved
          • It is useless to try to accelerate the processing of complex signaling protocols at the executive level as this is only a small fraction of the overall traffic
      • An OS networking stack provides some stable APIs for applications
          • Changing the design of the networking stack could lead to costly integration and validation steps, delaying adoption of the multicore architecture
  • 29. Fast Path Architecture: the Best of Both Worlds for Optimized Performance
      • [Diagram: each packet reaches the Fast Path, which checks whether it has the local information needed ("Local info?"). If OK, the packet is a fast path packet and gets lock-free packet processing for steady-state traffic. If NOK, it is an exception packet and goes to the Networking Stack, the full IP stack for processing exception traffic.]
      • Maximizes performance
      • Preserves the API of the OS networking stack
      • Requires complete synchronization of Fast Path, networking stack & Control Plane
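The fast path / exception path split on this slide can be sketched as a lookup with a fallback. This is a minimal model under stated assumptions, not 6WINDGate code: the class names, the dictionary used as the local table, and the way the slow path's answer is copied back are all hypothetical stand-ins for the synchronization the slide says is required.

```python
from typing import Dict, Optional


class SlowPath:
    """Stands in for the full OS networking stack (hypothetical)."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self._routes = routes

    def resolve(self, dst: str) -> Optional[str]:
        # A real stack would run routing, ARP, policy checks, and so on.
        return self._routes.get(dst)


class FastPath:
    """Forwards packets from local state; punts unknown flows upward."""

    def __init__(self, slow_path: SlowPath) -> None:
        self._slow_path = slow_path
        self._flow_table: Dict[str, str] = {}   # local info for steady-state traffic
        self.fast_hits = 0
        self.exceptions = 0

    def process(self, dst: str) -> Optional[str]:
        next_hop = self._flow_table.get(dst)
        if next_hop is not None:                 # "Local info?" -> OK
            self.fast_hits += 1
            return next_hop
        self.exceptions += 1                     # NOK: exception packet
        next_hop = self._slow_path.resolve(dst)
        if next_hop is not None:
            self._flow_table[dst] = next_hop     # synchronize state back down
        return next_hop


if __name__ == "__main__":
    fp = FastPath(SlowPath({"10.0.0.1": "eth1"}))
    for _ in range(1000):
        fp.process("10.0.0.1")
    print(fp.fast_hits, "fast path packets,", fp.exceptions, "exception packet(s)")
```

Only the first packet of a flow pays the cost of the full stack in this model, which is why accelerating the per-packet path matters more than accelerating the signaling path.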
  • 30. 6WINDGate
      • 6WINDGate™ is 6WIND’s packet processing software framework
      • Fast path is a high-performance simplified stack that processes and forwards the majority of packets without involvement by the stack or the control plane, maximizing the performance of the platform
      • Networking stack is a commercial-quality Linux stack with all standard networking features such as IPsec, firewall, NAT, QoS, filtering etc.
      • Control plane provides application-level signaling protocols such as routing, security, connectivity etc.
      • Support for industry-leading multicore architectures
      • [Diagram: Customer’s Application Software on top of 6WINDGate™, which comprises the Control Plane and Networking Stack on Linux and the Fast Path on NetLogic NetOS, all running on NetLogic XLS / XLR.]
  • 31. 6WINDGate for NetLogic Processor
      • Core allocation for Linux and NetOS is flexible and done at runtime using hybrid model and userapp
      • Fast Path uses NetOS services and a run-to-completion model
      • Linux runs in SMP mode (on one or more cores)
      • Each protocol has been optimized for XLS/XLR
          • Crypto-engine support for maximal IPsec / IKE performance
          • Ensures linear scalability over cores/threads
          • Zero-copy mechanism between NetOS and Linux with large zones of shared memory to provide extended system capabilities
      • Same binary for XLS/XLR
  • 32. Compelling Packet Processing Performance for Multicore Platforms
      • Up to 10X performance improvement vs. standard Linux implementation
      • “SDS” profile on multicore processor with one core dedicated to Linux
      • 10X improvement is typical; performance on specific platforms available under NDA
      • [Diagram: 6WINDGate™ SDS profile on a NetLogic multi-core processor, with the Control Plane and Slow Path on Linux and several Fast Path instances on NetOS.]
      • [Chart: performance in Mpps, standard Linux versus 6WINDGate (SDS profile), labeled "Up to 10X".]
  • 33. 6WINDGate Includes a Full Set of Networking Protocols
      • Control Plane Modules
          • Routing Protocols: static, RIP (IPv4, IPv6), RIPng, OSPFv2, OSPFv3, BGP-4, BGP-4+, ECMP (IPv4, IPv6), VRRP, PIMv4-SM, PIMv6-SM, IGMP/MLD snooping & proxy, static route monitoring & BFD
          • Security: IKE, IKEv2, EAP, VPN monitoring
          • Connectivity: PPP, Multi-link PPP, PPPoE, CHDLC, VLAN, GRE, 6in6, 4in4, L2TP, DHCPv4/v6, DNS proxy, RADIUS client
          • Switching: LACP
          • Mobility: home agent, FMIP, corresponding node, mobile node, IPsec integration, NEMO, proxy MIP
          • Virtual Routing (VRF): routing protocols, IKE and PPP / L2TP
          • High Availability: monitoring system, synchronization daemons for ARP-NDP, routing and IPsec
      • Networking Stack: optimized stack for multicore including
          • All Linux networking features (TCP/IP, filtering, NAT, IPsec…)
          • Optimized SMP, 2K VR for forwarding, firewalling, NAT and IPsec
          • Integrated crypto engine management for IPsec and SSL
          • VNB framework for fast Layer 2 through Layer 4 protocol integration
          • Network system calls optimization (UDP, SCTP, RAW)
          • Graceful Restart extensions for High Availability
      • Fast Path Modules: IP forwarding, NAT, IPsec, IPsec SVTI, ROHC, Layer 2 (VLAN, GRE, link aggregation), flow inspection, QoS, multicast, IP reassembly, GTP encapsulation, Extended Fast Path, TCP splicing, IP filtering, MPLS encapsulation, IPv6 tunneling and transition, high availability, distributed processing
  • 34. 6WINDGate for Continuous Computing / PP50
      • Two NetLogic XLR processors to double performance
      • Flexibility to share cores from the two processors between Fast Path and Linux to maximize performance of the system
      • Complete XML-based management solution (XMS)
      • High Availability support on dual PP50 architectures
      • Easy integration with Trillium stacks
      • [Diagram: Complete Continuous Computing Solution. From top to bottom: HA & Management Frameworks; Trillium LTE Stacks (eGTP, Diameter…); 6WINDGate™ with XMS, HA, Control Plane (Routing, IKE…), OS Networking Stack, Fast Path, and Fast Path Modules (IPv4/IPv6, IPsec, L2, QoS, firewall, GTP, GRE, virtual routing…); Continuous Computing PP50 integrated platform.]
  • 35. Full Support for High Availability System Architectures
      • 6WINDGate meets the requirements of High Availability (HA) systems
          • For networking and telecom equipment where five-nines reliability or zero downtime is critical
      • HA support fully integrated into 6WINDGate software architecture
      • Two industry-standard failover modes
          • Redundant hardware platforms, each running 6WINDGate with Control Planes maintaining coherent view of overall system
          • Dual active Fast Path subsystems, providing redundancy between network interfaces for Non-Stop Forwarding (NSF)
      • [Diagram: High Availability System Framework spanning Continuous Computing PP50 A and PP50 B, each running 6WINDGate™ (Control Plane, Networking Stack, Fast Path on XLR0 and XLR1), with Control Plane Synchronization and Fast Path Redundancy between them.]
  • 36. Summary
      • 6WIND provides multicore packet processing software that extracts the highest possible networking performance from NetLogic’s XLR processor
      • Fully compatible with existing Linux application software: no need for developers to rewrite their code to take advantage of 6WINDGate performance and features
      • Comprehensive set of optimized networking, routing and security protocols provides the ideal platform for Continuous Computing’s Trillium software
  • 37. About 6WIND
      • Company Overview
          • A networking and telecom software company
          • Providing high-performance packet processing software products for multicore platforms
          • Customers include Tier-1 networking and telecom equipment suppliers worldwide
          • Headquartered in France with approximately 45 employees worldwide (Asia, Europe, North America)
      • Contact Information
          • http://www.6wind.com
          • 6wind-contact@6wind.com
          • Charlie Ashton, VP of Marketing, charlie.ashton@6wind.com, +1 (512) 913-6231
  • 38. Architecting a 10X Performance Breakthrough for LTE Core Networking Deepak Wadhwa Chief Architect
  • 39. SCTP Requirements & Challenges
      • High number of SCTP associations (64K+)
      • High SCTP throughput (1 million packets/sec)
      • High rate of association establishment / teardown
      • Limitations of existing SCTP implementations
          • Kernel or user-space based
          • Scalability in multi-core environments (locks)
          • Avoiding buffer copy
          • Checksum calculation
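Checksum calculation appears in the list above because SCTP protects every packet with a CRC32c checksum (per RFC 4960), which touches every byte when computed in software. The sketch below is a plain table-driven CRC32c written for illustration; it is not the Trillium implementation, and the function names are chosen here.

```python
def _build_crc32c_table() -> list:
    """Precompute the 256-entry table for the reflected Castagnoli polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _build_crc32c_table()


def crc32c(data: bytes) -> int:
    """CRC32c of data: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # "123456789" is the conventional check string for CRC algorithms.
    print(hex(crc32c(b"123456789")))
```

Because the loop runs once per payload byte, the cost grows with packet size and packet rate together, which is why moving this work off the CPU is attractive at a million packets per second.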
  • 40. Trillium FastPath Architecture
      • [Diagram: NetLogic Microsystems XLR732 processor split between Linux OS and NETOS. On Linux: Control Plane Application and Control Plane Library (System Services, Messenger, Management and Control APIs, Protocol Control Logic) in user space, with the 6WINDGate Slow Path and IP stack in kernel space. On NETOS: Data Plane Application and Data Plane Protocol (Buffer Management, Messenger Distributor, Protocol Data APIs, Protocol Logic) over the 6WINDGate SDS Fast Path.]
      • Management API: provision the protocol and environment parameters, collect status & statistics, provide alarm indications
      • Data API: send / receive data & flow control indications
      • Control API: control & setup logic for the specific protocol (e.g., for SCTP, create / delete of end points & associations)
      • Reference Applications: sample Control, Data & Mgmt plane applications provided
  • 41. SCTP Architecture
      • [Diagram: on Linux, a Control App uses the SCTP Control & Management API library in user space, above an SCTP Distributor, IPSEC/IP, and the 6WIND Linux Control Plane in kernel space. On NETOS, Data Apps run on Core SCTP Fast Path threads above an SCTP Distributor, IPSEC/IP, and the 6WIND Fast Path. The two sides share memory and use the DMA Engine, the SAE Engine, the FMN, and the Packet Distribution Engine.]
  • 42. SCTP Optimizations
      • Data structure duplication
          • Avoid locking
      • Hardware acceleration engines
          • DMA engine for checksum calculation
          • SAE engine for IPsec encryption / decryption
      • Pre-fetching memory into cache
      • Atomic operations
          • Buffer reference counters
          • Profile counters
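One way to read "data structure duplication" to avoid locking is that each fast-path thread keeps its own copy of the association table and a distributor steers every packet of an association to the thread that owns it. The sketch below illustrates that reading only: the hashing scheme, the key fields, and the class names are assumptions, and the slides do not describe the actual Trillium distribution policy.

```python
import zlib
from typing import Dict, List, Tuple

AssocKey = Tuple[int, int, int]   # (source port, destination port, verification tag)


def owner_thread(key: AssocKey, threads: int) -> int:
    """Map an association to exactly one thread (hypothetical hashing policy)."""
    src_port, dst_port, vtag = key
    digest = zlib.crc32(f"{src_port}:{dst_port}:{vtag}".encode())
    return digest % threads


class ShardedAssociations:
    """Per-thread tables: each one is touched only by its owning thread."""

    def __init__(self, threads: int) -> None:
        self.threads = threads
        self.tables: List[Dict[AssocKey, int]] = [dict() for _ in range(threads)]

    def on_packet(self, key: AssocKey) -> int:
        """Count a packet against its association; return the thread that ran it."""
        tid = owner_thread(key, self.threads)
        table = self.tables[tid]              # no lock: single writer per table
        table[key] = table.get(key, 0) + 1
        return tid

    def total_packets(self) -> int:
        return sum(sum(table.values()) for table in self.tables)


if __name__ == "__main__":
    shards = ShardedAssociations(threads=12)
    for vtag in range(1000):
        shards.on_packet((2905, 2905, vtag))
    print([len(t) for t in shards.tables])
```

Since no table ever has two writers, per-association state needs no lock; only data that genuinely crosses threads, such as shared buffer reference counts, would need the atomic operations listed on the slide.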
  • 43. 10X Performance Breakthrough Highlights
      • Achieved peak rate of 1029K packets/sec
      • 10X better than comparative solutions
          • SCTP (User Space) on XE50 (x86): 100K
          • LKSCTP on PP50 (Linux): 30K
          • Trillium SCTP FastPath on PP50: 1029K
      • SCTP Fast Path Architecture
          • [Diagram: XLR cores assigned to 6WIND Linux, App, 6WIND Distributor, and SCTP FP roles.]
      • Scaled to more than 1M PPS on 12 SCTP threads
  • 44. Scalability Details (With APP Threads)
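The headline ratios follow directly from the three rates quoted on this slide. The short calculation below uses only those figures; the per-thread average assumes the load is spread evenly over the 12 SCTP threads, which the slide does not state.

```python
# Packet rates quoted on the slide, in thousands of packets per second.
RATES_KPPS = {
    "SCTP (User Space) on XE50 (x86)": 100,
    "LKSCTP on PP50 (Linux)": 30,
    "Trillium SCTP FastPath on PP50": 1029,
}


def speedup(fast_kpps: float, baseline_kpps: float) -> float:
    """How many times faster the fast configuration is than the baseline."""
    return fast_kpps / baseline_kpps


def per_thread_kpps(total_kpps: float, threads: int) -> float:
    """Average rate per thread, assuming an even spread across threads."""
    return total_kpps / threads


if __name__ == "__main__":
    fast = RATES_KPPS["Trillium SCTP FastPath on PP50"]
    for name, rate in RATES_KPPS.items():
        print(f"{name}: {rate}K pps, fast path is {speedup(fast, rate):.2f}x")
    print(f"about {per_thread_kpps(fast, 12):.2f}K pps per SCTP thread")
```

Against the x86 user-space figure the ratio is just over 10, which matches the "10X" in the title; against LKSCTP on the same PP50 hardware it is considerably larger.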
  • 45. GTP-u / eGTP-u Fast Path Architecture
      • [Diagram: a control side (Customer Control Application, GTP Control, PMIPv6, DHCP) and a data processing side (Customer Fast Path Application over GTP-u / UDP, GRE, IP, and IPSec) share memory holding the routing, security, and data flow database. A legend distinguishes Customer, 6WIND, and Trillium Protocol Software components.]
  • 46. GTP/IP L3 Routing Performance
      • [Chart: L3 routing with GTP, throughput in Gbps (0 to 10) versus packet size in bytes (64, 128, 256, 512, 768).]
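The per-packet work of a GTP-u fast path is small, which is what makes it a good fit for this architecture. The sketch below builds and parses the mandatory 8-byte GTPv1-U header for a G-PDU (flags 0x30, message type 0xFF, payload length, tunnel endpoint ID); the function names are chosen for this example, and sequence numbers and extension headers are deliberately left out.

```python
import struct
from typing import Tuple

GTPU_FLAGS = 0x30        # version 1, protocol type GTP, no optional fields
GTPU_MSG_GPDU = 0xFF     # G-PDU: the message type that carries user packets
GTPU_HEADER = struct.Struct("!BBHI")   # flags, type, length, TEID (8 bytes)


def gtpu_encapsulate(teid: int, payload: bytes) -> bytes:
    """Prefix a user packet with the mandatory GTP-U header."""
    header = GTPU_HEADER.pack(GTPU_FLAGS, GTPU_MSG_GPDU, len(payload), teid)
    return header + payload


def gtpu_decapsulate(packet: bytes) -> Tuple[int, bytes]:
    """Return (TEID, user packet); reject anything this sketch does not handle."""
    if len(packet) < GTPU_HEADER.size:
        raise ValueError("packet shorter than GTP-U header")
    flags, msg_type, length, teid = GTPU_HEADER.unpack_from(packet)
    if flags != GTPU_FLAGS or msg_type != GTPU_MSG_GPDU:
        raise ValueError("only plain G-PDUs are handled here")
    payload = packet[GTPU_HEADER.size:]
    if length != len(payload):
        raise ValueError("length field does not match payload")
    return teid, payload


if __name__ == "__main__":
    tunneled = gtpu_encapsulate(teid=0x1234, payload=b"user IP packet")
    print(tunneled.hex())
    print(gtpu_decapsulate(tunneled))
```

In a real gateway the TEID would index the shared data flow database shown in the diagram, and the encapsulated packet would travel over UDP; both steps are outside this sketch.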
  • 47. Conclusion
      • Unprecedented 10X Performance Breakthrough
          • Acceleration of key wireless protocols in the fast path
          • Exploits processing power of multi-core / multi-threaded packet processors
      • Integrated Package Simplifies Development
          • Delivered integrated with FlexPacket ATCA-PP50
          • Control, Management & Fast Path APIs + reference applications
      • Smooth Evolution to Next Generation Packet Processing
          • Roadmap to support PP80 featuring NetLogic Microsystems XLP processors
          • Built on 6WINDGate SDS Fast Path protocol suite: portability, rich feature set
      • Deployment & Interoperability-Proven Trillium Protocols
          • Products retain core software modules from their standard Trillium counterparts
          • Leverages decades of feedback, code hardening & interop into the fast path
  • 48. About Us
      • Private
      • Founded 1998
      • ~300 employees
      • HQ in San Diego
      • R&D in China & India
      • Acquired in 2003
      • 100% focus on telecom market
      • For more info: Brian Wood, VP Marketing, brian.wood@ccpu.com
  • 49. Architecting a 10X Performance Breakthrough for LTE Core Networking
      • Joe Byrne, Senior Analyst
      • Jim Johnston, Sr. Director Multicore Processors
      • Eric Carmes, Founder & CEO
      • Deepak Wadhwa, Chief Architect
      • The webinar has been recorded and will be available for future playback