This document describes a performance problem encountered when migrating a WebSphere MQ application from a non-IBM Unix platform to IBM AIX, using Confirmation on Arrival (COA) for the first time. After several weeks of analysis and tuning by IBM and the customer, the root cause turned out to be that, despite the tuning effort, channel negotiation left the AIX queue manager with much smaller effective TCP socket buffers than on the non-IBM platform, which explained the steadily increasing queue times seen on AIX; the problem was ultimately resolved with WebSphere MQ's under-documented qm.ini TCP buffer parameters.
1. WebSphere MQ & TCP Buffers –
Size DOES Matter!
MQTC V2.0.1.3
Arthur C. Schanz
Distributed Computing Specialist
Federal Reserve Information Technology (FRIT)
2. Disclaimer:
The views expressed herein are my own and not necessarily those
of Federal Reserve Information Technology, the Federal Reserve
Bank of Richmond, or the Federal Reserve System.
2
3. About the Federal Reserve System
• The Federal Reserve System, the Central Bank of the United States – Created in
1913 – aka “The Fed”
• Consists of 12 Regional Reserve Banks (Districts) and the Board of Governors
• Roughly 7500 participant institutions
• Fed Services – Currency and Coin, Check Processing, Automated Clearinghouse (FedACH℠) and Fedwire®
• Fedwire – A real-time gross settlement funds transfer system – Immediate, Final and Irrevocable
• Fedwire® Funds Service, Fedwire® Securities Services and National Settlement Services
• Daily average Volume – Approximately 600,000 Originations
• Daily average Value – Approximately $4 Trillion
• Average transfer – $6,000,000
• Single-day peak:
• Volume – 1.2 Million Originations
• Value – $6.7 Trillion
3
4. If you could spend $100,000,000 ($100 Million) per day, how long would it take you
to spend $4 Trillion?
• 1 year?
• 5 years?
• 10 years?
• 50 years?
• 100 years?
Actually, it would take over 109 years to spend the entire $4 Trillion.
Just for Fun: How much is $4 Trillion?
4
5. Scenario Details
• Major redesign of a critical application
• Platform change – moving from “non-IBM” to “IBM” (AIX)
• Initial use of Confirmation on Arrival (COA)
• QMGRs are located at data centers which are ~1500 Miles
(2400 KM) apart
• 40ms latency for round-trip messages between data centers
• SSL in use for SDR/RCVR channels between the QMGRs
• WMQ V7.1 / GSKit V8
• Combination of JMeter and ‘Q’ program to generate test msg
traffic
• Used 10K byte messages for application traffic simulation
5
8. Initial Observations…Which Led to WMQ PMR
• Functionally, everything is working as expected
• “Q” program reading file and loading 1000 10K byte messages
• No delay seen in generation of COA msgs
• IBM platform shows linear progression in time differential
between MQPUT of original msg by Q and MQPUT of the COA
• “non-IBM” platform does not exhibit similar behavior
• Network core infrastructure is common for both platforms
• Evidence points to a potential issue in WMQ’s use of the
resources in the communications/network layer (buffers?)
• Issue seen as a ‘show-stopper’ for platform migration by the application team
8
9. Week #1 – IBM Initial Analysis
• Several sets of traces, RAS output and test result spreadsheets
were posted to the PMR
• WMQ L2 indicated that they are “…not finding any unexpected
processing relating to product defects…”, and that “…the
differences between the coa.req and coa.reply times are due
to the time taken to transmit the messages to the remote
qmgr.”
• Received links to tuning documents and info on Perf & Tuning
Engagements ($$$)
• IBM: OK to close this PMR?
• Our Answer: Open a PMR w/ AIX
9
10. Week #2 – Let’s Get a 2nd Opinion (When in Doubt, Look at the Trace)
• AIX TCP/IP and Perf L2 request various trace/perfpmr data be
captured on QMGR lpars and VIOS, while executing tests
• Network interfaces showing enormous amount of errors
• SSL Handshake taking an inordinate amount of time, as the
channel PING or START is not immediate. This developed into
another PMR – However, this is not contributing to the
original problem
• Additional traces requested, using different criteria
• AIX PMR escalated to S1P1 – Executive conference call
• Internal review of WMQ traces reveals several interesting
details…
10
13. Week #2 (con’d) – …and When in Doubt, Look at the Trace
• Negotiated and Set Channel Options/Parameters:
• KeepAlive = 'YES'
• Conversation Type: SSL
• Timeout set to 360 (360) (30)
• GSKit version check '8.0.14.14' >= '8.0.14.12' OK
• TCP/IP TCP_NODELAY active
• TCP/IP KEEPALIVE active
• Current socket receive buffer size: 262088
• Current socket send buffer size: 262088
• Socket send buffer size: 32766
• Socket receive buffer size: 2048
• Final socket receive buffer size: 2048
• Final socket send buffer size: 32766
• PMR update: “This may be at the root of the problem, but I will leave that
determination to the experts.”
13
14. WebSphere MQ – Established TCP Channel Connection
[Diagram: SDR-side socket with a 32K send buffer and a 2K receive buffer, connected to a RCVR-side socket with a 32K receive buffer and a 2K send buffer]
Both sockets have a send buffer and a receive buffer for data (see the C sketch after this slide)
• write() adds data to the send buffer for transfer
• read() removes data arrived on the receive buffer
• poll() and select() wait for a buffer to be ready
14
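The "Current" and "Final" socket buffer sizes in the channel trace (slide 13) are simply what getsockopt() reports before and after the channel code calls setsockopt() on its socket. A minimal C sketch of that sequence (not MQ's actual source; show_bufsize is an illustrative helper, and 32766/2048 are the default SDR-side sizes quoted above):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Report what the OS says a socket buffer currently is - this is where the
 * "Current ..." / "Final ..." lines in the MQ channel trace come from. */
static void show_bufsize(int fd, int opt, const char *label)
{
    int size = 0;
    socklen_t len = sizeof(size);
    if (getsockopt(fd, SOL_SOCKET, opt, &size, &len) == 0)
        printf("%s: %d\n", label, size);
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return 1;

    /* "Current" sizes: whatever the OS default / auto-tuning provides. */
    show_bufsize(fd, SO_RCVBUF, "Current socket receive buffer size");
    show_bufsize(fd, SO_SNDBUF, "Current socket send buffer size");

    /* Ask for the default SDR-side sizes (qm.ini SndBuffSize / RcvBuffSize). */
    int snd = 32766, rcv = 2048;
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, sizeof(snd));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv, sizeof(rcv));

    /* "Final" sizes: AIX reports exactly what was requested, while the
     * non-IBM platform keeps its larger receive buffer (slides 19/20). */
    show_bufsize(fd, SO_RCVBUF, "Final socket receive buffer size");
    show_bufsize(fd, SO_SNDBUF, "Final socket send buffer size");

    close(fd);
    return 0;
}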
15. Week #3 – AIX/TCP Tuning (Lather…Rinse…Repeat)
• More traces
• AIX and TCP L2 suggest tuning changes: ‘Largesend=1’ and
‘tcp_sendspace=524288’ & ‘tcp_recvspace=524288’ for network interfaces
• Wireshark shows that ‘Largesend=1’ made performance worse. TCP
window size updates still constantly occurring during test
• Network interface errors were ‘bogus’ – known AIX bug
• VIOS level could be contributing to the problem
• High fork rate
• Tcp_nodelayack
• WMQ PMR escalated to S1
• IBM changes this entire issue to Crit Sit – we now have a dedicated team
to work the issue
15
16. Week #3 (con’d) – WMQ Tuning – Back to (Buffer) Basics
• More traces
• WMQ L2/L3 suggest increasing both WMQ TCP Send/Recv buffers to 32K,
then 64K, via qm.ini - (But why isn’t this needed on current platform?)
TCP:
SndBuffSize=65536
RcvBuffSize=65536
• Change had no positive effect
• We see (and L3 confirms) that trace shows SDR-side and RCVR-side
sockets are in many wait/poll calls, some of which are simultaneous
11:42:45.381709 9306238.433 RSESS:00011e ---{ cciTcpSend
11:42:45.381713 9306238.433 RSESS:00011e ----{ send
11:42:45.381723 9306238.433 RSESS:00011e ----}! send rc=Unknown(B)
• rc=Unknown(B), where hex B = decimal 11 (EAGAIN) - no buffer space available (see the sketch after this slide)
16
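The rc=Unknown(B) seen above is errno 11, EAGAIN: send() could not queue any data because the socket send buffer was already full, so the channel has to wait for space. A minimal C sketch of that send/poll pattern (not MQ's code; send_when_ready is an illustrative helper, and the 360-second wait simply mirrors the channel timeout shown on slide 13):

#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send on a non-blocking socket; when the send buffer is full, send()
 * fails with EAGAIN (errno 11) and we poll() for POLLOUT until the
 * kernel has drained enough of the buffer to accept more data. */
static ssize_t send_when_ready(int fd, const void *buf, size_t len)
{
    for (;;) {
        ssize_t n = send(fd, buf, len, 0);
        if (n >= 0)
            return n;                        /* some or all data queued */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                       /* a real error */

        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, 360000) <= 0)      /* wait up to 360 seconds */
            return -1;
    }
}

With only a 2K receive buffer on the far end and a 40 ms round trip between the data centers, it is easy for the 32K send buffer to fill while waiting for acknowledgements, which is consistent with the many simultaneous wait/poll calls noted above.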
17. Week #3 (con’d) – More WMQ and AIX Tuning
• More traces
• AIX Perf team suggests ‘hstcp=1’
• WMQ L3: “Delays are outside of MQ’s control and requires further
investigation by OS and Network experts”
• AIX L3: “We are 95% sure that an upgrade to VIOS will correct the issue”
• We determined that AIX L3 had made several tuning recommendations by
looking at the wrong socket pair…none of which had any positive effect
• AIX L3: “It appears that TCP buffers need to be increased.” Value chosen:
5MB.
• Pushed back on making that change, but IBM WMQ L2/L3 agreed with
that recommendation
• Tests also requested w/ Batchsize of 100, then 200
17
18. Week #3 (con’d) – More WMQ and AIX Tuning
• 5MB buffers showed some improvement (How could they not?!)
• Batchsize changes did not show significant improvement
• Removed trace hooks (30D/30E), set ‘tcp_pmtu_discover=1’ &
‘tcp_init_window=8’, as they were said to be “pretty significant” to
increased performance. They weren’t.
• Back to the $64,000 question – Why do we need changes on this new
platform, but not on the current platform, all other things being equal?
• While IBM contemplates their next recommendation, we decided to take a
closer look at the two environments, more specifically the WMQ TCP
buffers. What is actually happening ‘under the covers’ during channel
negotiation.
• Guess what we found?
18
19. Week #3 (con’d) – All Flavors of Unix are NOT Created Equal
• SDR channel formatted trace file for new (AIX) platform:
• Current socket receive buffer size: 262088
• Current socket send buffer size: 262088
• Socket send buffer size: 32766
• Socket receive buffer size: 2048
• Final socket receive buffer size: 2048
• Final socket send buffer size: 32766
• SDR channel formatted trace file for current (non-IBM) platform:
• Current socket receive buffer size: 263536
• Current socket send buffer size: 262144
• Socket send buffer size: 32766
• Socket receive buffer size: 2048
• Final socket receive buffer size: 263536
• Final socket send buffer size: 32766
IBM platform ends up with exactly the setsockopt() values; the non-IBM platform keeps a hybrid: the requested send size, but its larger OS-default receive size
19
20. Week #3 (con’d) – All Flavors of Unix are NOT Created Equal
• RCVR channel formatted trace file for new (AIX) platform:
• Current socket receive buffer size: 262088
• Current socket send buffer size: 262088
• Socket send buffer size: 2048
• Socket receive buffer size: 32766
• Final socket receive buffer size: 32766
• Final socket send buffer size: 2048
• RCVR channel formatted trace file for current (non-IBM) platform:
• Current socket receive buffer size: 263536
• Current socket send buffer size: 262144
• Socket send buffer size: 2048
• Socket receive buffer size: 32766
• Final socket receive buffer size: 263536
• Final socket send buffer size: 2048
IBM platform ends up with exactly the setsockopt() values; the non-IBM platform keeps a hybrid: the requested send size, but its larger OS-default receive size
20
21. Week #3 (con’d) – If Only it Were That Easy
• This should be simple – hard code the Send/Recv buffer values that are
being used in the current environment into the qm.ini for the QMGR in
the new environment, retest and we should be done:
TCP:
SndBuffSize=32766
RcvBuffSize=263536
• Success? Not so much – In fact, that test showed no difference from the ‘out of the box’ values?!? Now what?
• Of course – more AIX tuning – ‘tcp_init_window=16’, increase CPU
entitlements to lpars and increase VEA buffers…no help
• Wireshark shows 256K max ‘Bytes in Flight’, even with 5MB buffers
• AIX L3 supplies TCP Congestion Window ifix – “Silver Bullet”
• WMQ L3 requests a fresh set of traces, for both platforms
• This is their analysis:
21
22. Week #4 – WMQ L3 Analysis of Latest Traces
                    Non-IBM          AIX
                    WMQ 7.0.1.9      WMQ 7.1.0.1
                    non-SSL          SSL
Log write avg:      2359us           1562us
ccxSend avg:        165us            3234us
Send avg:           55us             122us
Send max:           187us            10529us
ccxReceive avg:     91ms             177ms
ccxReceive min:     74ms             100ms
Batch size:         50               50
MQGET avg:          6708us           2784us
Msg size:           10K              10K
Overall time:       10s              10s
Notes:
ccxSend: This is the time to execute the simple MQ wrapper function ccxSend
which calls the TCP send function.
Send: A long send might indicate that the TCP send buffer is full.
ccxReceive: This is the time to execute the simple MQ wrapper function ccxReceive
which calls the TCP recv function.
22
23. Week #4 (con’d) – WMQ L3 Analysis of Latest Traces
• We temporarily disabled SSL on AIX and saw a definite improvement – it
was comparable w/ the current platform. (The fallacy here is that it
should be better!)
• In addition, we are still using 5MB Send/Recv buffers, so this is not a viable
solution.
• With all of the recommended changes that have been made, we
requested IBM compose a list of everything that is still ‘in play’, for both
AIX and WMQ. What is really needed vs. what was an educated guess.
• IBM believes they have met/exceeded the original objective of
matching/beating current platform performance.
• Exec. conf call held to discuss current status, review recommended
changes and determine next steps.
• Most importantly – We are still seeing the increasing Qtime.
23
24. Week #4 (con’d) – You Can’t (Won’t) Always Get What You Want
• There had to be some reason why, when hard-coding the values being
used on the non-IBM system, we did not see the same results
TCP:
SndBuffSize=32766
RcvBuffSize=263536
• Looking at the traces, something just didn’t seem right
• Were we actually getting the values we were specifying in qm.ini? It did
not appear so.
• There had to be an explanation
• What were we missing?
• As it turns out, a very important, yet extremely difficult to find set of
parameters was the key…
24
25. Week #4 (con’d) – Let’s Go to the Undocumented Parameters
• Sometimes, all it takes is the ‘right’ search, using your favorite search
engine.
• We found some undocumented (‘under-documented’) TCP buffer parameters that complement their better-known siblings:
TCP:
SndBuffSize=262144
RcvBuffSize=2048
RcvSndBuffSize=2048
RcvRcvBuffSize=262144
• Now it’s starting to make sense.
• Now we see why the traces were not showing our values being accepted.
• Now we know how to correctly specify the buffers for both ends: SndBuffSize/RcvBuffSize for the SDR side, RcvSndBuffSize/RcvRcvBuffSize for the RCVR side
25
26. WebSphere MQ – Established TCP Channel Connection
[Diagram, as on slide 14: SDR-side socket with a 32K send buffer and a 2K receive buffer; RCVR-side socket with a 32K receive buffer and a 2K send buffer]
Both sockets have a send buffer and a receive buffer for data
• write() adds data to the send buffer for transfer
• read() removes data arrived on the receive buffer
• poll() and select() wait for a buffer to be ready
26
27. WebSphere MQ – Established TCP Channel Connection, with the ‘under-documented’ tuning parameters applied
[Diagram: SDR-side socket now has a 256K send buffer and a 2K receive buffer; RCVR-side socket has a 256K receive buffer and a 2K send buffer]
• SndBuffSize=256K (default is 32K): SDR-side send buffer
• RcvBuffSize=2K (default is 2K): SDR-side receive buffer
• RcvRcvBuffSize=256K (default is 32K): RCVR-side receive buffer
• RcvSndBuffSize=2K (default is 2K): RCVR-side send buffer
Both sockets have a send buffer and a receive buffer for data
• write() adds data to the send buffer for transfer
• read() removes data arrived on the receive buffer
• poll() and select() wait for a buffer to be ready
27
28. Conclusions – What We Learned
• Desired values for message rate and (lack of) latency were achieved by
tuning only WMQ TCP Send/Recv buffers. (‘tcp_init_window=16’ was also
left in place, as it produced a noticeable, albeit small performance gain)
• The use of the ‘under-documented’ TCP parameters was the key in
resolving this issue.
• Continue doing in-house problem determination, even after opening a
PMR.
• Do not be afraid to push back on suggestions/recommendations that do
not feel right. No one knows your environment better than you.
• Take some time to learn how to read and interpret traces. (Handy ksh command we used: find . -name '*.FMT' | xargs sort -k1.2,1 > System.FMTALL, which merges all of the individual formatted trace files into a single timestamp-sorted file)
• TCP is complicated – Thankfully, WMQ insulates you from almost all of it.
• Keep searching the Web, using different combinations of search criteria –
you never know what you might find.
28
29.
30. Arthur C. Schanz
Distributed Computing Specialist
Federal Reserve Information Technology
Arthur.Schanz@frit.frb.org
Quazi Ahmed, Chad Little, Josh Heinze, Larry Halley
31. Appendix: TCP Init Window Size
[Diagram: Site 1 and Site 2 connected by 10GE links, 41 ms round trip, TCP limited to a 256 KB maximum window]
• AIX Initial TCP Window Size is 0, which causes TCP to perform dynamic window size
increases (‘ramp-up’) as packets begin to flow across the connection.
• 1K -> 2K -> 4K -> 8K -> 16K ->…
• Increasing the ‘tcp_init_window’ parameter to 16K allows TCP to start at a larger
window size, therefore allowing a quicker ramp-up of data flow.
31
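A rough back-of-the-envelope check of the numbers above: TCP can only have one window of unacknowledged data in flight per round trip, so a 256 KB ceiling with a 41 ms round trip caps a single connection at roughly 262,144 bytes / 0.041 s ≈ 6.4 MB/s (about 51 Mbit/s), regardless of the 10GE links underneath. Filling even 1 Gbit/s at that latency would need roughly 125 MB/s × 0.041 s ≈ 5 MB of window, which is also why the 5 MB buffer experiment on slide 18 showed some improvement.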