• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
STAC Report Highlights
 

STAC Report Highlights

on

  • 577 views

Report of benchmarking of a full QFabric System, based on the same STAC M2 standards—the first-ever STAC benchmark on a data center network fabric (or any network environment more complex than a ...

Report of benchmarking of a full QFabric System, based on the same STAC M2 standards—the first-ever STAC benchmark on a data center network fabric (or any network environment more complex than a single switch). This is significant because in a traditional tiered network, latency and jitter add up, meaning the larger the network gets, the less deterministic it is. Only QFabric has proven to deliver fast, predictable performance across the more complex environment. The benchmark determined the mean latency for the QFabric System, which in the STAC M2 test was rated at 10 microseconds (μsec). For comparison purposes, the best published score Cisco recorded for its Nexus 5010 switch had a mean latency of 14 microseconds (μsec).

Statistics

Views

Total Views
577
Views on SlideShare
577
Embed Views
0

Actions

Likes
0
Downloads
8
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    STAC Report Highlights STAC Report Highlights Document Transcript

    • STAC Report™ HighlightsIBM WMQ LLM with IBM x3650 servers, Mellanox ConnectX®-2 EN with RoCE and Juniper Networks QFabric System (SUT ID: LLM120109) STAC-M2™ Benchmarks v1.0 Tested by: STAC Test date: 9 Jan 2012 Report 1.0.0, 23 Feb 2012 This document was produced by the Securities Technology Analysis Center, LLC (STAC® ), a provider of performance measurement services, tools, and research to the securities industry. To be notified of future reports, or for more information, please visit www.STACresearch.com. Copyright ©2012 Securities Technology Analysis Center, LLC. "STAC" and all STAC names are trademarks or registered trademarks of the Securities Technology Analysis Center, LLC. Other company and product names are trademarks of their respective owners.
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Disclaimer The Securities Technology Analysis Center, LLC (STAC® ) prepared this report at the request of Juniper Networks. It is provided for your internal use only and may not be redistributed, retrans- mitted, or published in any form without the prior written consent of STAC. All trademarks in this document belong to their respective owners. The test results contained in this report are made available for informational purposes only. STAC does not guarantee similar performance results. All information contained herein is provided on an “AS–IS” BASIS WITHOUT WARRANTY OF ANY KIND. STAC has made commercially reasonable efforts to adhere to published test procedures and otherwise ensure the accuracy of the contents of this document, but the document may contain errors. STAC explicitly disclaims any liability what- soever for any errors or otherwise. The evaluations described in this document were conducted under controlled laboratory conditions. Obtaining repeatable, measurable performance results requires a controlled environment with spe- cific hardware, software, network, and configuration in an isolated system. Adjusting any single element may yield different results. Additionally, test results at the component level may not be indicative of system level performance, or vice versa. Each organization has unique requirements and therefore may find this information insufficient for its needs. Customers interested in analyzing their own environment using the same methodology used in this report are encouraged to contact STAC or visit www.STACresearch.com/harnesses.Copyright ©2012 STAC, LLC. Page 2 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Contents Summary 4 1 Background on the STAC-M2 Benchmark specifications 7 1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.2 Test cases used to produce the STAC-M2 Report Card . . . . . . . . . . . . . . . . . . . . . . . . 9 2 Product background 12 3 Project participants and responsibilities 14 4 Contacts 14 5 Benchmark Status 14 6 Methodology 14 6.1 Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 6.2 STAC Test Harness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 7 Stack under test 16 7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 7.2 Producer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 7.2.1 Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 7.2.2 Network interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 7.2.3 Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.3 Consumer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.4 Interconnects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.5 Application Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.6 Configuration for Slow Consumers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 8 Vendor Commentary 20 9 STAC Notes 21Copyright ©2012 STAC, LLC. Page 3 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Summary The Securities Technology Analysis Center (STAC) ran STAC-M2 Benchmarks on a stack consisting of Juniper Networks’ QFabric System and IBM’s WebSphere MQ Low Latency Messaging (LLM) running on IBM xSeries® servers and using RDMA Over Converged Ethernet (RoCE) to drive Mellanox ConnectX®-2 EN adapters (see Section 7 for system details). This report documents the results. STAC-M2 v1.0 delivers hundreds of test results, which are presented through a variety of tables and visualiza- tions in this STAC Report™. Juniper, IBM, and Mellanox wish to highlight the following: This was the first STAC-M2 benchmark to utilize a data center-scale network fabric as the interconnect, rather than a single switch. Each message in these tests transited an ingress QFabric Node (QFX3500), a QFabric Interconnect (QFX3008), and an egress QFabric Node (QFX3500). Juniper’s intent in testing a complete QFabric System was to document the performance of a data center fabric that the company claims is managed as simply as a single switch, delivers the low-latency and jitter expected of a stan- dalone low-latency switch, but has the port capacity and backplane scalability to support thousands of endpoints (see Section 2 for more product background). Juniper believes that by delivering mean latency of 10 µsec and jitter of 1 µsec in the baseline test (STAC.M2.v1.0.BASELINE.LAT1), they have met that goal. According to Juniper, trading firms dealing with big data and other challenges increasingly want to enable the mid- and back-office to run at a speed comparable to their front-office trading systems. Juniper says it has seen use cases in which complex event processing, risk engines, back testing and multi-venue, multi-asset strategies all run in a single domain (with the associated security and virtualization required to segregate traffic where needed). The vendors in this project believe that the Juniper fabric and IBM messaging software, together with IBM servers and Mellanox NICs, represent a solution that can scale to thousands of servers while providing "front office" latency and capacity, allowing back-office systems to provide more timely support to front-office trading platforms. In these firms’ view, this audited benchmark demonstrates that the solution offers the low latency and deterministic performance required to deliver bids and orders at the lightning speed of today’s markets. STAC-M2 tests the ability of messaging middleware to handle real-time market data under a variety of condi- tions. The tests provide key performance metrics such as latency, throughput, power efficiency, and CPU/memory consumption under several scenarios, including both undisturbed flow and exception conditions like slow con- sumers. Version 1.0 of STAC-M2 was approved by the STAC Benchmark Council on 11 September 2009. Section 1 includes both an overview of STAC-M2 and diagrams that summarize each test case used to produce the STAC Report Card. The diagrams have labels that correspond to the spec IDs in the Report Card. Under- standing the tests in more detail requires access to the detailed STAC-M2 specifications, which are available to Contributor and Recipient members of the Council (membership is open to trading organizations and vendors; see www.STACresearch.com/council). STAC tested the solution against all of the required STAC-M2 Benchmark specifications, except for those that required the stack under test to provide a last-known-value cache. IBM attested that LLM does not support this functionality, and this exception is allowed under STAC-M2. IBM says that its customers are able to use LLM as the base for custom-built market data systems that include caching functionality and that LLM is also integrated with the IBM WebSphere Front Office for Financial Markets product (www.ibm.com/software/integration/wfo), which supports caching, among other features. For this report, IBM chose not to involve products beyond LLM.Copyright ©2012 STAC, LLC. Page 4 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Get the rest of the story! These STAC Report Highlights provide some insight into the performance of the stack that was tested, but there’s more to the story. The full STAC Report contains additional information, including: 1) Results from additional test cases: - Tests with three distinct types of slow consumers - Tests where one or a group of clients restarts during message flow - Tests with two Producers, not just one 2) Latency versus throughput for a range of message rates, including the highest rate tested. 3) Deeper latency analysis: - Full latency statistics for the highest message rate tested, not just the base rate - Latency percentiles all the way to six nines (99.9999th), not just 99th - Visualizations such as histograms and time plots of latency over test runs 4) Additional metrics (recovery times in restart cases, an additional latency metric that shows un- derlying transport speed, factoring out any latency due to queuing prior to send). The full STAC Report is in the STAC Vault™. The Vault is only accessible to those firms with an appropriate membership in the STAC Benchmark Council. For more information, see www.STACresearch.com/vault. The STAC Report Card summarizes results from several of the STAC-M2 test cases. Each benchmark has a unique identifier. If you are comparing these results to other STAC-M2 Benchmark results, make sure the identifiers match exactly. If they do not, they cannot be fairly compared. To obtain a copy of the STAC-M2 Test Harness to run the same STAC Benchmarks™on a system in your own lab, contact info@STACresearch.com. STAC and the STAC-M2 working group have taken great pains to identify and describe the limitations inherent in the STAC-M2 methodology (see Section 6.3). Some tradeoffs were made in version 1.0 in the interest of getting a set of specifications to market that could begin the process of standards-based benchmarking of high- performance messaging systems. As the industry gains experience with STAC-M2 v1.0 and provides feedback, the STAC-M2 working group will consider additional specifications and perhaps modifications to the existing specifications that will improve a future version of STAC-M2. To participate in this process, please contact info@STACresearch.com.Copyright ©2012 STAC, LLC. Page 5 NO REDISTRIBUTION
    • STAC-M2 REPORT CARD IBM WMQ Low Latency Messaging v2.6.0.3 (LLM120109) Type Spec ID Description VALUE MEAN MED 99P MAX STDV STAC.M2.v1.0.BASELINE.LAT1 SupplyToReceive Latency (Hybrid) at base rate in the 1:5 setup with no watchlist 10 10 15 35 1 overlap (µsec) Latency STAC.M2.v1.0.OVERLAP.LAT1 SupplyToReceive Latency (Hybrid) at base rate in the 1:5 setup with some watchlist 10 10 14 36 1 overlap (µsec) STAC.M2.v1.0.FLEXIBLE.LAT1 SupplyToReceive Latency (Hybrid) at base rate in the setup with flexible Consumer 11 11 15 35 1 resources (µsec) STAC.M2.v1.0.CACHING.LAT1 SupplyToReceive Latency (Hybrid) at base rate in the 1:5 setup with caching enabled NS NS NS NS NS (µsec) STAC.M2.v1.0.BASELINE.TPUT1 Highest supply rate tested in the 1:5 setup with no watchlist overlap (msg/sec) 1,700,000 Throughput STAC.M2.v1.0.OVERLAP.TPUT1 Highest supply rate tested in the 1:5 setup with some watchlist overlap (msg/sec) 1,700,000 STAC.M2.v1.0.FLEXIBLE.TPUT1 Highest supply rate tested in the setup with flexible Consumer resources (msg/sec) 1,600,000 STAC.M2.v1.0.CACHING.TPUT1 Highest supply rate tested in the 1:5 overlap setup with caching enabled (msg/sec) NS STAC.M2.v1.0.SINGLE.CPU1 Consumer CPU utilization at base rate in the 1:1 setup (core equivalents) Active 1.99 2.00 Cores = 26 STAC.M2.v1.0.SINGLE.CPU2 Producer CPU utilization at base rate in the 1:1 setup (core equivalents) Active 2.30 2.65 Cores = 3 CPU STAC.M2.v1.0.FIELDS1.CPU1 Consumer CPU utilization at base rate in the 1:1 setup with complete field consump- Active 1.99 2.00 tion (core equivalents) Cores = 2 STAC.M2.v1.0.TINYLISTS.CPU1 Consumer CPU utilization in the 1:5 setup with four tiny-watchlist Consumers at base Active 1.99 2.00 rate (core equivalents) Cores = 2 STAC.M2.v1.0.SINGLE.MEM1 Max memory utilization of the Consumer machine at base rate in the 1:1 setup (MB) 344 Memory STAC.M2.v1.0.SINGLE.MEM2 Max memory utilization of the Producer machine at base rate in the 1:1 setup (MB) 349 STAC.M2.v1.0.FIELDS1.MEM1 Max memory utilization of the Consumer machine at base rate in the 1:1 setup with 342 complete field consumption (MB) STAC.M2.v1.0.FLEXIBLE.MPW FLEXIBLE.TPUT1, divided by adjusted watts in the setup with flexible Consumer 32 resources (msg/Watt) Data-center Efficiency STAC.M2.v1.0.FLEXIBLE.MPU1 FLEXIBLE.TPUT1, divided by rack units in the setup with flexible Consumer re- 23,529 sources (msg/U) STAC.M2.v1.0.FLEXIBLE.MPO FLEXIBLE.TPUT1, divided by OS impressions in the setup with flexible Consumer 200,000 resources (msg/impression) . NOTE: "NS" means not supported. The vendor attests that the product configuration tested does not support the functionality necessary to produce this benchmark.
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 1. Background on the STAC-M2 Benchmark specifications 1.1 Overview The STAC-M2 Benchmark specifications test the ability of a solution such as messaging middleware to handle real-time market data in a variety of configurations. The specs reflect the input of seven leading trading firms and six vendors of high-performance messaging. They provide key performance metrics such as throughput, latency, power efficiency, and CPU/memory consumption under several scenarios. As shown in Figure 1, the software binaries required for a STAC-M2 test are Producer and Consumer appli- cations written to interface to the "stack under test" (SUT). These incorporate the STAC-M2 Library to control behavior. The test specs make no assumption about the architecture of the SUT (brokerless vs broker-based, appliance vs software, Ethernet vs InfiniBand, etc.). The fundamental hardware requirement is a small num- ber of machines to host the Producers and Consumers. Some specs are designed to enable arbitrarily large test harnesses for scale testing, but such testing is not required. STAC-M2 does not require any proprietary hardware for time sync or wire capture. Figure 1 - High-level view of STAC-M2 test harness The harness uses a "reflection" methodology for round-trip latency measurement, illustrated in Figure 2. A Producer transmits a "primary" message; a Consumer consumes the message and selectively republishes it as a "reflected" message; and the Producer consumes the reflected message. Multiple Consumers can receive primary messages from a given Producer, which in turn can receive reflected messages from multiple Consumers. Latency is measured from the earliest moment a primary message is available for sending in the Producer to the moment the reflected message payload is available for consumption in the proper format in the Producer (ReceiveTime minus SupplyTime in Figure 2). This result is divided by two to provide a "Hybrid Latency." Hybrid Latency is indicative of one-way latency but may not be exact, because the reflected message rate is always a fraction of the primary message rate.Copyright ©2012 STAC, LLC. Page 7 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Figure 2 - Component-level view of STAC-M2 Producer/Consumer methodology Version 1.0 of STAC-M2 draws from equities and options use cases: smart order routing, pairs trading, market- making, and black-box trading. Some specs emulate "latency minimizing" deployments, where application own- ers tend to over-provision resources to avoid contention. Other specs emulate "cost-minimizing" deployments, where application owners care more about total cost, meaning that they may load multiple apps onto a given machine. The specs vary the number of Producers and Consumers, the watchlist sizes and commonality among Con- sumers, whether the SUT must cache, and whether it must deliver parsed or opaque messages. They test common exception conditions like slow Consumers and application restarts. Version 1.0 supplies messages modeled on observed output of US equities order-book feed handlers deployed in the field. The STAC Library supplies messages to the Producer Adapter as pointers to C structs containing a 24- to 32-byte header and a 232-byte payload containing 136 bytes of data. The Producer Adapter can send the payload as an opaque buffer or use a SUT-specific encoding algorithm. Note that these sizes are purely the size in memory of data going into and out of the adapter. They say nothing about the size of a message as a SUT represents it internally or across a wire. Version 1.0 of STAC-M2 supplies messages at a steady-state rate. Future versions can include more message types, sizes, and timing patterns. For more details on STAC-M2, see www.STACresearch.com/m2. Most of the materials are restricted to Council members. To gain access or to provide input on future versions of STAC-M2, please contact council@STACresearch.Copyright ©2012 STAC, LLC. Page 8 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 1.2 Test cases used to produce the STAC-M2 Report Card Below are diagrams that summarize the test sequences (setups and scenarios) used to generate the results in the STAC Report Card (additional sequences were used to generate the full test results contained in the complete STAC Report). The specification IDs in the STAC Report Card embed the name of the relevant sequence below. The sequences are presented in the order in which they are referenced in the STAC Report Card. Sequence BASELINE: Figure 3 Sequence OVERLAP: Figure 4Copyright ©2012 STAC, LLC. Page 9 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Sequence FLEXIBLE: Figure 5 Sequence CACHING: Figure 6Copyright ©2012 STAC, LLC. Page 10 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Sequence SINGLE: Figure 7 Sequence FIELDS1: Figure 8Copyright ©2012 STAC, LLC. Page 11 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Sequence TINYLISTS: Figure 9 2. Product background Juniper Networks submitted the following information and claims about its products: Juniper Networks QFabric is a data center network fabric that flattens the entire network to a single tier, where up to 6,144 servers have equal access, eliminating the effects of network locality. The QFabric architecture also eliminates the need for protocols such as Spanning Tree, TRILL or SPB while reducing network latency to microseconds, improving the performance of every application in the data center. The QFabric System is composed of three separate but interdependent elements that are tradition- ally the primary components of a modular switch: line cards, backplane, and routing engines. By liberating these components from the confines of a single box, the QFabric System delivers a highly scalable fabric that easily grows along with the demands of the data center without suffering any performance degradation, while offering the management simplicity of a single switch. QFabric Node turns the line cards that typically reside within a modular chassis into high-density, fixed-configuration, top-of-rack edge devices that provide access into and out of the fabric. QFabric Interconnect is a high-speed transport device that serves as the backplane in a QFabric System, interconnecting all QFabric Node edge devices in a full-mesh topology. QFabric Director is the QFabric System’s Routing Engine, providing all control and management services for the fabric. These QFabric components work together, behaving as a single logical switch that enables any- to-any connectivity for servers, storage, existing network components, and other critical systems in the data center, while providing a foundation for cloud-ready, virtualized and mobile computing environments. The QFabric System scales to support 6,144 ports and consistently delivers very low latency in a nonblocking and lossless architecture that supports Layer 2, Layer 3 and Fibre Channel over Ethernet (FCoE) traffic. NOTE: The STAC-M2 tests in this project did not test a QFabric System at massive scale. The tests used 8 ports spread across several QFabric Nodes. See Section 9, Vendor Commentary, for discussion of potential scale benchmarks. IBM submitted the following information and claims about its products: Key to the vision of a high-performance network fabric is a message bus that can meet the scale, service requirements, and low-latency expectations of every department within a trading firm. IBMCopyright ©2012 STAC, LLC. Page 12 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Websphere MQ Low Latency Messaging (LLM) is a messaging transport that is highly optimized for the very high-volume, low-latency requirements of financial markets firms. Applications for LLM include the high-speed delivery of market data, transactional data, reference data, and event data in or between front-, middle-, and back-office. This includes: • Market data from exchanges to data consumers • Market and reference data within the enterprise to analytic or trading applications • Trade data such as positions or orders to direct-market access and other trading applications • Event notifications for systems-monitoring, risk-analytics, and compliance applications • Gateway to matching engine high-availability connection with total ordering for stock exchange scenarios. The product is fully daemonless and provides peer-to-peer transport for one-to-one, one-to-many and many-to-many data exchange. In addition to this message-delivery flexibility and the perfor- mance demonstrated in this report, LLM’s capabilities include: • Reliable message delivery with fine-grained control of message delivery assurance • High availability to maintain system service levels and to protect the integrity of the data stream when components fail • Message storage and replay at wire speeds for message recovery and auditing • Monitoring and congestion control to automatically detect bottlenecks and streamline data flow. • High-speed message filtering that supports fine-grained data multiplexing and efficient data segmentation. • Direct shared memory communication and best of breed native support for InfiniBand and 10GE connections. NOTE: Not all of these features were necessarily enabled during the STAC-M2 tests. To learn more about LLM’s abilities or to evaluate LLM for a specific use case, IBM asks customers to visit www.ibm.com/software/integration/wmq/llm or to contact IBM at IBM_WFO_Solutions@us.ibm.com. The tests were conducted on IBM’s xSeries®x3650 M3 servers equipped with the 3.47Ghz Intel® Xeon® Processor X5690 series and configured with 48GB of RAM on a 1333Mhz System bus. These servers provided the power, stability and configuration flexibility to optimize the performance on the STAC-M2 test harness, with particular flexibility to use and tune the threading features re- quired to optimize performance. The flagship System x rack server equipped with the intelligent Intel® Xeon® 5600 series proces- sors delivers high-performance per watt computing for business-critical applications thanks to low total cost of ownership. Low power options like low-voltage processors and high-efficiency power supplies draw less energy and produce less waste heat, helping to reduce data center energy costs. The x3650 M3 is the most expandable and upgradeable IBM two-socket rack server and can handle customers growing, demanding, data-dense and mission-critical application workloads until the next decade with high expandability/availability and manageability attributes that provide growth flexibility. And with a choice of up to sixteen 2.5-inch hot-swap SAS/SATA HDDs or solid-state drives, it offers up to 8.0TB of storage capacity. Mellanox submitted the following information and claims about its products: The Mellanox ConnectX®-2 EN Ethernet Network Interface Card (NIC) delivers the low-latency, high- throughput (10 & 40 GbE) and low CPU utilization, leveraging the RoCE (RDMA over Converged Ethernet) standard. ConnectX®-2 EN adapters with RoCE are widely used in financial services for removing I/O bottlenecks, lowering latency and jitter, and increasing message rates for high fre- quency trading, market data distribution and real-time risk management. More product information is available at: http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=63&menu_section=28. RoCE is based on the IBTA RoCE specifications and utilizes the Open Fabrics Enterprise Distribu- tion (OFED) verbs interface as the software interface between application layer and ConnectX®-2Copyright ©2012 STAC, LLC. Page 13 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 EN hardware. RoCE takes advantage of transport services support of various modes of communi- cation, such as reliable connected services and datagram services. RoCE uses well defined verbs operations including kernel bypass, send/receive semantics, RDMA read/write, user-level multicast, user level I/O access, zero copy and atomic operations. 3. Project participants and responsibilities The following firms participated in the project: • Juniper Networks • IBM • Mellanox • STAC The Project Participants had the following responsibilities: • IBM implemented the STAC-M2 adapter using the STAC Library. • Juniper and IBM configured and optimized the stack under test (SUT). • IBM provided the servers and messaging middleware. • Juniper Networks provided the QFabric System components including edge switches and interconnects. • Mellanox provided the 10GbE network adapters. • Juniper sponsored the Audit. • STAC conducted the STAC-M2 Benchmark Audit, which included inspecting the source code to the STAC- M2 Producer and Consumer, inspecting the STAC Test Harness configuration, validating required SUT functionality, and executing the tests. 4. Contacts For more information about Juniper products, please visit www.juniper.net/us/en/contact-us/. For questions about this system configuration or the test results, contact IBM_WFO_Solutions@us.ibm.com or visit www.ibm.com/software/integration/wmq/llm. For questions about STAC-M2, contact info@STACresearch.com. 5. Benchmark Status • These test results were audited by STAC or a STAC-certified third party, as indicated in the Responsibilities section above. As such, they are considered official results. For details, please see www.STACresearch.com/reporting. • The Vendor attests that the Producer and Consumer binaries provided for the audit were compiled from the same source code approved by STAC or a STAC-certified third party as part of this audit engagement. • The Vendor attests that it did not modify the SUT during the audit. 6. Methodology 6.1 Specifications This project followed the 1.0 version of the STAC-M2 Benchmark specifications. Full members of the STAC Benchmark Council can access these specifications at www.STACresearch.com/m2.Copyright ©2012 STAC, LLC. Page 14 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 6.2 STAC Test Harness STAC-M2 Library Version v1.0.4 STAC Lib Timing Config "system" (gettimeofday) 6.3 Limitations • The latency metrics in version 1.0 STAC-M2 specifications are not true one-way latencies. They are "hybrid" values (round trip divided by 2), which should be interpreted as imperfect proxies for one-way latency. Several factors potentially limit the accuracy of these proxies: 1) since the reflected message rate is a small fraction of the primary message rate, the workload is not symmetric, which means latency might not be symmetric (for example, latency from Producer to Consumer might be higher than from Consumer to Producer); 2) while the specs call for the primary and reflection paths to have symmetric configurations, such symmetry may not be possible for some SUTs, which could cause asymmetries in latency. • The STAC-M2 Test Harness relies on software instrumentation for measurement, which has an impact on performance. These limitations are consistent for all systems tested with this harness but are worth noting. In particular: 1) Because it is a reflection methodology, the latency measurements include some latency for the STAC-M2 Library to reflect messages (which is divided by 2 along with the rest of the round-trip latency); 2) reflection in the Consumer also takes CPU time, which increases the observed CPU utilization of the Consumer and may decrease throughput; 3) the Producer is required to call the STAC-M2 Library to record a "Send" timestamp, which consumes some CPU and may reduce the max sustainable rate. • When deployed in a Consumer, the threading model of the STAC Library used for this version of the specifications emulates an application that requires business logic to execute in a different thread from message-consumption threads and requires serialized input to the business-logic thread from those other threads. This imposes a single-threaded bottleneck on performance of a single Consumer. Another com- mon application pattern in the industry is for an application to scale across multiple cores by partitioning business logic by symbol and running multiple partition threads, each performing its own in-line message consumption. The latter pattern would potentially allow a single client to achieve higher throughput but is not supported in this version of the specs. Similarly, each Producer has a single Supplier thread, which limits the total possible message supply rate for a given Producer. Specifications that allow an arbitrary number of Suppliers on a given machine may be added in the future but are not part of the 1.0 specs. • Power measurements in this version of STAC-M2 are allowed to be rudimentary. It is acceptable to take periodic amp readings and multiply by voltage supplied to the power distribution units (PDU). This method is known to have two sources of error: 1) accuracy of the meters in the PDUs, and 2) lack of accounting for the power factor. More accurate power readings may be taken if available. • The timing of Consumer coordination is only accurate to +/- one second. This could lead to some small variation in results from test run to test run.Copyright ©2012 STAC, LLC. Page 15 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 7. Stack under test 7.1 Overview The stack under test (SUT) consisted of 8 IBM x3650 M3 Model 7945 rack-mounted servers. Each server was interconnected using one port of a Mellanox ConnectX®-2 10GE with RoCE 10 Gbps Ethernet Dual Port Adapter networked through the Juniper QFabric System. STAC-M2 Producer and Consumer applications used IBM’s WebSphere MQ Low Latency Messaging (LLM) API for all communications and were configured to use different multicast channels for primary messages from Producer(s) to Consumer(s) than for reflected messages from Consumer(s) to Producer. The STAC-M2 Adapter was dynamically linked with LLM. The OpenFabrics Enterprise Distribution (OFED) software was used for RDMA protocol support, allowing LLM and the STAC-M2 Adapter to work with a user- space kernel bypass protocol stack instead of the usual Linux kernel TCP/UDP stack. For more detailed configuration information on the servers or software, contact IBM_WFO_Solutions@us.ibm.com. Figure 10 - Test configuration The Juniper QFabric System was configured as shown in Figure 11, with default Layer 3 configurations to support low-latency, cut-through switching. Consumer and Producer applications were configured on servers S1-S8 so that no Consumer server was connected to the same QFX3500 QFabric Node as a Producer server with which it communicated. This meant that all messages were required to transit two QFX3500 QFabric Nodes plus a QFX3008 QFabric Interconnect.Copyright ©2012 STAC, LLC. Page 16 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Figure 11 - Test configurationCopyright ©2012 STAC, LLC. Page 17 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 7.2 Producer 7.2.1 Servers Vendor Model IBM System x3650 M3 Model 7945 Rack Units 1 Power Supply 675 W Processor type 2 x Intel Xeon CPU X5690 @ 3.47GHz Cache L1 I cache: 32K, L1 D cache: 32K, L2 cache: 256K, L3 cache: 12288K Bus Speed 1333 MHz Memory 48GB (6 x 8GB) DIMM Disk 192GB BIOS Vendor IBM BIOS Version D6E153AUS-1.12 BIOS Release Date 06/30/2011 Hyperthreading Disabled 7.2.2 Network interfaces NIC 2 – port 1 Mellanox MNPH29D-XTR ConnectX®-2 EN with RoCE Used for Message Traffic NIC Driver Version mlx4_en 1.5.3 (Jan 2011) NIC Firmware Ver- 2.9.1000 sion NIC 1 – port 1 Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20) Used for Test Harness Admin NIC Driver Version bnx2 v2.0.2 NIC Firmware Ver- 6.2.0 NCSI 2.0.11 sionCopyright ©2012 STAC, LLC. Page 18 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Network Stack Con- Contact Vendor for details figuration 7.2.3 Operating System Version Red Hat Enterprise Linux Server AS 5.5 Kernel Version 2.6.18-194.el5 (x86_64) 7.3 Consumer Same as Producer. 7.4 Interconnects Switch - Forwarding Plane Juniper QFX3500 QFabric Node Switch - Forwarding Plane 1U Rack Units Interconnect Juniper QFX3008 QFabric Interconnect Interconnect Rack Units 21U Director Juniper QFX3100 QFabric Director Director Rack Units 2U Switch - Control Plane Juniper EX4200 Ethernet Switch Switch - Control Plane Rack 1U Units 7.5 Application Software Product/component WebSphere WMQ Low Latency Messaging Version 2.6.0.3 Build ID 20111013.1Copyright ©2012 STAC, LLC. Page 19 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Product/component STAC Pack Author IBM Interface IBM WMQ LLM API v2.6.0.3 Source version v2.6.0.3 Producer build ID v 1.8 :: Dec 23 2011 - 14:03:20 Consumer build ID v 1.8 :: Dec 23 2011 - 14:03:19 7.6 Configuration for Slow Consumers STAC-M2 requires all Consumers to be configured the same. In an Audit, the vendor does not know which Consumers STAC will configure to exhibit slow consumption in a given test run. LLM allows for the specification of slow-consumer policies that can be used to automatically suspend or expel receivers that are not performing well. According to IBM, the LLM policies in these tests applied thresholds to the number of NAKs (negative acknowledgements) generated by a Consumer and expelled slow Consumers from the system. 8. Vendor Commentary Juniper Networks provided the following comments: This particular test harness was purposely unique, in that our objective was to measure, document and audit the performance characteristics of a large-scale data center fabric (the Juniper QFabric System). The network infrastructure offered far more capacity than required to meet the specific needs of the test in order to showcase the entire QFabric System. The network in this test harness comprised four (04) QFX3500 QFabric Nodes, two (02) QFX3008 QFabric Interconnects, and two (02) QFX3100 QFabric Directors. An additional EX4200 Ethernet Switch was also deployed to provide the management backplane (always discreet from the data path in a Juniper network) but was not in the data path for the STAC-M2 Test Harness. The latter two QFabric Interconnect and QFabric Director components are always deployed as pairs for purposes of high availability. When connected as a fabric, every component integrates into a single system image, and the full fabric can be administered as a single system, with tasks such as OS updates or security policy changes only needing to be applied one time. (More details on the Juniper QFabric System can be obtained at http://www.juniper.net/us/en/products-services/switching/qfx-series/) We distributed the eight System x3650 M3 Servers across the four nodes of the fabric and ran an unmodified STAC-M2 test harness. While normal tuning was applied to Kernel Settings, Ring Buffers and Core affinities with various threads, the test harness ran unmodified across the QFabric System. It is interesting to note that the QFabric Interconnects and QFabric Directors were actually located on a separate floor in the building. This is one more example of the flexibility and scale that the QFabric System offers with regards to physical location and floor planning. We would like to discuss several points on the test results that bear further explanation. Interoperability: While the QFabric System represents a new approach to data center networking, and can offer new levels of performance in large-scale environments, we would like to point out that interoperability with the entire ecosystem was quite simple and standards based. The network interface cards (and RDMA Over Converged Ethernet [RoCE] drivers) ran unmodified across the full environment, as did the IBM Websphere LLM platform and the STAC test harness itself. This highlights the fact that a QFabric System behaves as a single switch using standard protocols and interfaces.Copyright ©2012 STAC, LLC. Page 20 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 Power Consumption: The objectives for this test were a "scale out" to demonstrate the performance of a full- scale fabric. The SUT included nine powered system chassis to connect the eight servers, two of which (the QFX3008 QFabric Interconnect pair) require 208V power. This presented two challenges. First, the large number of network devices relative to the number of pro- ducer/consumer servers distorts the Messages Per Watt ratings on the STAC.M2.v1.0.FLEXIBLE.MPW test, and second, the power for the 208V products did not come from a monitored PDU, so the STAC Auditor took the most conservative metric: the power draw for the entire circuit. While we don’t believe the Messages Per Watt is particularly relevant in a test like this, the large number does merit some explanation. The full QFabric System has well-defined (and highly efficient) power consumption ratings which can be easily calculated from data sheet statistics for purposes of capacity plan- ning. The QFabric System configured for this STAC Benchmark is capable of supporting 193-254 10 Gbps attached servers; however, this test involved only eight servers. Conse- quently, the Messages Per Watt rating is artificially high. In a production system, we would expect the power efficiency to be proportionally better than the system documented here. Test Configuration and Ease of Large Scale benchmarks: We would like to thank the STAC organization for their responsiveness and flexibility as we pushed the boundaries of the STAC-M2 Benchmarks to measure a data center-scale fabric. In the future, we think that tests at a larger scale would provide additional insight into enterprise-scale performance and would like it to be easier to use more servers. STAC-M2 does contain optional benchmark specifications for arbitrarily large scaling. And according to STAC, the STAC-M2 Advanced Test Harness (used by trading firms for internal testing but not currently used for published benchmarks) can scale up to tens of thousands of Consumers and Producers, so the test framework exists. However, server procurement is a big challenge because the current specifications require identical systems. Even large manufacturers such as IBM and Juniper have a hard time securing large numbers of identical servers. We would have liked to test 100 or more servers but could only come up with eight identical systems. We would like to propose some discussion in the Council about how to model and test large environments, perhaps using server virtualization or at least mis-matched servers. 9. STAC Notes We hear Juniper’s request for a re-consideration of the "identical systems" requirement in the context of a large-scale test and will raise it with the STAC-M2 Working Group.Copyright ©2012 STAC, LLC. Page 21 NO REDISTRIBUTION
    • STAC Report Highlights STAC-M2.v1.0 - LLM120109 About STAC STAC is a technology-research firm that facilitates the STAC Benchmark Council™ (www.STACresearch.com/council), an organization of leading trading organizations and vendors that specifies standard ways to measure the performance of trading solutions. STAC Benchmarks cover workloads in market data, analytics, and trade execution. STAC helps end-user firms relate the performance of new technologies to that of their existing systems by supplying them with STAC Benchmark reports as well as standards-based STAC Test Harnesses™ for rapid execution of STAC Benchmarks in their own labs. End users do not disclose their results. Some STAC Benchmark results from vendor-driven projects are made available to the public, while those in the STAC Vault™ are reserved for members of the Council (see www.STACresearch.com/vault). To be notified when new STAC Benchmark results become available, please sign up for free at www.STACresearch.com.2000459-001-EN Feb 2012Copyright ©2012 STAC, LLC. Page 22 NO REDISTRIBUTION