High speed networks and Java (Ryan Sciampacone)

Networking technology has improved constantly over time, and it is now regularly possible to get bandwidths of 10 Gbps and often considerably more. Is this purely “free speed,” or does it simply create new application bottlenecks and scaling challenges? This session begins by discussing how to enable Java for high-speed communications, such as SDP, and then moves on to sharing some hard-learned real-world experiences showing how improving network speeds often results in unexpected surprises. Come hear about the amazing promise of RDMA and the sometimes sobering reality of high-speed networks. Take away a clear view of the issues, and hear some practical advice on achieving great performance when moving Java applications to high-speed networks.

  1. High Speed Networks: Free Performance or New Bottlenecks?
     Ryan Sciampacone, IBM Java Runtime Lead – 1st October 2012
  2. Important Disclaimers
     THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES. ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE. IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS.
  3. Introduction to the speaker
     ■ 15 years of experience developing and deploying Java SDKs
     ■ Recent work focus:
       – Managed Runtime Architecture
       – Java Virtual Machine improvements: multi-tenancy technology, native data access and heap density, footprint and performance
       – Garbage Collection: scalability and pause-time reduction, advanced GC technology
     ■ My contact information: Ryan_Sciampacone@ca.ibm.com
  4. What should you get from this talk?
     ■ Understand the current state of high speed networks in the context of Java development and take away a clear view of the issues involved. Learn practical approaches to achieving great performance, including how to understand results that initially don’t make sense.
  5. Life In The Fast Lane
     ■ “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” -- Andrew S. Tanenbaum, Computer Networks, 4th ed., p. 91
     ■ Networks are often thought of as just a simple interconnect between systems
     ■ No real differentiators: WAN vs. LAN, wired vs. wireless
     ■ APIs traditionally make this invisible – the socket API is good at hiding things (SDP, SMC-R, TCP/IP)
     ■ Can today’s network offerings be exploited to improve existing performance?
  6. Network Overview
  7. Network Speeds Over Time
     [Chart: comparison of network speeds – 10 Mb/s, 100 Mb/s, 1 GigE, 10 GigE, InfiniBand]
     ■ Consistent advancement in speeds over the years
     ■ Networks have come a long way in that time
  8. Network Speeds Over Time
     [Chart: the same comparison of network speeds, re-plotted]
     ■ Oh sorry – that was a logarithmically scaled chart!
  9. Network Speeds vs. The World
     [Chart: networks vs. other storage bandwidth – 1 GigE, 10 GigE, InfiniBand, Core i7, SSD]
     ■ Bandwidth differences between memory and InfiniBand are still a ways off
     ■ But the gap is getting smaller!
 10. Networks Now vs. Yesterday
     ■ Real opportunity to look at decentralized systems
     ■ Already true: cloud computing, data grids, distributed computation
     ■ Network distance isn’t as far as it used to be!
 11. What is InfiniBand?
     ■ Originated in 1999 from the merger of two competing designs
     ■ Features: high throughput, low latency, quality of service, failover, designed to be scalable
     ■ Offers low latency RDMA (Remote Direct Memory Access)
     ■ Uses a different programming model than traditional sockets
       – No “standard” API; the de facto standard is OFED (OpenFabrics Enterprise Distribution)
       – Upper layer protocols (ULPs) exist to ease the pain of development
 12. IB vs. IPoIB vs. SDP – InfiniBand (IB)
     [Diagram: Application → IB Services → IB Core → Device Driver. A modified application uses an IB-specific communication mechanism; kernel facilities are bypassed, effectively a “zero hop” to the communication layer.]
     ■ Handles all transmission aspects (guarantees, transmission units, etc.)
     ■ Extremely low CPU cost
 13. IB vs. IPoIB vs. SDP – IP over InfiniBand (IPoIB)
     [Diagram: Application → Socket API → TCP/IP → IPoIB → IB Core → Device Driver. The application uses standard socket APIs; the entire TCP/IP stack is used but sits on a mapping / conversion layer (IPoIB).]
     ■ Effectively the TCP/IP stack using a “device driver” to interface with the IB layer
     ■ High CPU cost
 14. IB vs. IPoIB vs. SDP – Sockets Direct Protocol (SDP)
     [Diagram: Application → Socket API → SDP → IB Core → Device Driver. Although socket-API based, SDP uses its own lighter-weight mechanisms and mappings to leverage IB.]
     ■ Largely bypasses the kernel but still incurs an extra hop during transmission
     ■ Medium CPU cost
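     As a minimal sketch of how SDP is typically switched on for an unmodified Java application (assuming a JDK 7-era runtime with SDP support and OFED installed; the addresses, ports, and file name below are placeholders, and the property names should be checked against your JDK’s SDP documentation): a small rules file tells the JDK which binds and connects should transparently use SDP instead of TCP/IP.

         # sdp.conf – “bind” rules cover listening sockets, “connect” rules cover outgoing connections
         bind    192.0.2.10     *           # accept over SDP on this IB interface, any port
         connect 192.0.2.0/24   5000-5010   # use SDP for these destination ports

         # Launch the unmodified application with SDP enabled:
         java -Dcom.sun.sdp.conf=sdp.conf \
              -Djava.net.preferIPv4Stack=true \
              -Dcom.sun.sdp.debug \
              MyServer

     With this in place the application keeps using the ordinary socket / NIO APIs; only matching connections are routed over SDP.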
 15–24. Throughput vs. Latency (built up progressively across slides 15–24)
     [Diagram: a pipe between two endpoints, with a data unit used for measuring throughput and latency]
     ■ Latency: the length of time for a data unit to travel from the start point to the end point (e.g., 10 ms)
     ■ Throughput: the number of data units that arrive per unit of time (e.g., 10 Gb/s)
     ■ Shower analogy: the diameter of the pipe gives you water throughput; the length determines the time it takes for a drop to travel from end to end
 25. Throughput vs. Latency
     ■ Motivations can characterize priorities – they are not necessarily related!
     ■ Higher throughput rates offer interesting optimization possibilities
       – Reduced pressure on compressing data
       – Reduced pressure on being selective about what data to send
     ■ For something like RDMA… just send the entire page
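     To make the distinction concrete, a minimal sketch of measuring the two metrics over a plain TCP socket (assuming a hypothetical server at 192.0.2.10:5000 that echoes the first probe byte and discards the rest; host, port, and sizes are made up): the round trip of a tiny message approximates latency, while the time to push a large volume one way approximates throughput.

         import java.io.*;
         import java.net.*;

         public class PingAndPump {
             public static void main(String[] args) throws IOException {
                 try (Socket s = new Socket("192.0.2.10", 5000)) {
                     OutputStream out = s.getOutputStream();
                     InputStream in = s.getInputStream();

                     // Latency: round-trip time of a 1-byte message
                     long t0 = System.nanoTime();
                     out.write(42);
                     out.flush();
                     in.read();
                     System.out.printf("round trip: %.1f us%n", (System.nanoTime() - t0) / 1_000.0);

                     // Throughput: time to push 256 MB one way in 64 KB chunks
                     byte[] chunk = new byte[64 * 1024];
                     long bytes = 256L * 1024 * 1024;
                     t0 = System.nanoTime();
                     for (long sent = 0; sent < bytes; sent += chunk.length) {
                         out.write(chunk);
                     }
                     out.flush();
                     double secs = (System.nanoTime() - t0) / 1e9;
                     System.out.printf("throughput: %.2f Gb/s%n", bytes * 8 / secs / 1e9);
                 }
             }
         }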
 26. Simple Test using IB
 27. Simple Test using IB – Background
     ■ Experiment: can Java exploit RDMA to get better performance?
     ■ Tests conducted
       – Send different sized packets from a client to a server
       – Time required to complete the write
       – Test variations include a communication layer with RDMA
     ■ Conditions: single threaded, 40 Gb/s InfiniBand
     ■ Goal being to look at
       – Network speeds
       – Baseline overhead that Java imposes over equivalent C programs
       – Existing issues that may not have been predicted
     ■ Also going to look at very basic Java overhead – comparisons are against an equivalent C program
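     The shape of the Java side of such a test: a minimal sketch (assuming a hypothetical server at 192.0.2.10:5000; payload size and iteration count are placeholders) of an NIO SocketChannel write loop driven from a DirectByteBuffer, so the payload is not copied off the Java heap on every write.

         import java.net.InetSocketAddress;
         import java.nio.ByteBuffer;
         import java.nio.channels.SocketChannel;

         public class DirectBufferClient {
             public static void main(String[] args) throws Exception {
                 int payload = 64 * 1024;                              // vary per test run
                 ByteBuffer buf = ByteBuffer.allocateDirect(payload);  // native memory, no JNI copy on write
                 try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("192.0.2.10", 5000))) {
                     long t0 = System.nanoTime();
                     for (int i = 0; i < 10_000; i++) {                // repeat to get a stable measurement
                         buf.clear();
                         while (buf.hasRemaining()) {
                             ch.write(buf);                            // may take several calls for large payloads
                         }
                     }
                     double secs = (System.nanoTime() - t0) / 1e9;
                     System.out.printf("%.2f Gb/s%n", 10_000L * payload * 8 / secs / 1e9);
                 }
             }
         }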
 28. Simple Test using IB – IPoIB Comparison
     [Chart: throughput comparison for C vs. Java over IPoIB (C IPoIB, Java DBB IPoIB), payload sizes from 1 KB to 256 MB]
     ■ DirectByteBuffer (NIO socket channel) used to avoid marshalling costs (JNI)
     ■ Observations
       – C code is initially faster than the Java implementation
       – Generally even after 128 KB payload size
 29. Simple Test using IB – SDP Comparison
     [Chart: throughput comparison for C vs. Java (C IPoIB, Java DBB IPoIB, C SDP, Java DBB SDP), payload sizes from 1 KB to 256 MB]
     ■ DirectByteBuffer (NIO socket channel) used to avoid marshalling costs (JNI)
     ■ Observations
       – C code is initially faster than the Java implementation
       – Generally even after 128 KB payload size
 30–36. Interlude – Zero Copy 64k Boundary (built up across slides 30–36)
     ■ Classic networking (java.net) package
     [Diagram: a byte[] on the Java heap is written; the data is copied across JNI into native memory, copied again into kernel memory, and then transmitted]
     ■ 2 copies before the data gets transmitted
     ■ Lots of CPU burn, lots of memory being consumed
 37–41. Interlude – Zero Copy 64k Boundary (continued)
     ■ Using DirectByteBuffer with SDP
     [Diagram: the write starts from the DirectByteBuffer in native memory, is copied once into kernel memory, and is then transmitted]
     ■ 1 copy before the data gets transmitted
     ■ Less CPU burn, less memory being consumed
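     A small sketch to make the two write paths above concrete (assuming hypothetical servers; addresses and ports are placeholders): the classic stream write hands a heap byte[] across JNI before the kernel copy, while the channel write of a DirectByteBuffer already starts from native memory.

         import java.io.OutputStream;
         import java.net.InetSocketAddress;
         import java.net.Socket;
         import java.nio.ByteBuffer;
         import java.nio.channels.SocketChannel;

         public class TwoWritePaths {
             // Classic java.net path: heap byte[] -> JNI copy to native -> kernel copy -> transmit
             static void streamWrite(Socket socket, byte[] payload) throws Exception {
                 OutputStream out = socket.getOutputStream();
                 out.write(payload);
                 out.flush();
             }

             // NIO path: DirectByteBuffer already lives in native memory -> kernel copy -> transmit
             static void channelWrite(SocketChannel channel, ByteBuffer direct) throws Exception {
                 direct.rewind();
                 while (direct.hasRemaining()) {
                     channel.write(direct);
                 }
             }

             public static void main(String[] args) throws Exception {
                 byte[] payload = new byte[32 * 1024];
                 ByteBuffer direct = ByteBuffer.allocateDirect(payload.length);
                 direct.put(payload);
                 direct.flip();

                 try (Socket s = new Socket("192.0.2.10", 5000);                                   // hypothetical server
                      SocketChannel ch = SocketChannel.open(new InetSocketAddress("192.0.2.10", 5001))) {
                     streamWrite(s, payload);
                     channelWrite(ch, direct);
                 }
             }
         }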
 42–48. Interlude – Zero Copy 64k Boundary (continued)
     ■ But when the payload hits the “zero copy” threshold in SDP…
     [Diagram: for payloads over 64 KB, the user-space memory is “registered” for use with RDMA (direct send from user-space memory), the data is transmitted, and the memory is “unregistered” when the send completes]
     ■ Registration is extremely expensive / slow!
     ■ 1 copy before the data gets transmitted
     ■ Register / unregister is prohibitively expensive (it happens on every transmit!)
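     One workaround this behaviour suggests, as a minimal sketch (my illustration, assuming the default 64 KB threshold discussed above; whether it actually helps depends on the workload and the SDP implementation): keep individual socket writes below the zero-copy threshold so SDP stays on the cheaper buffered path, by writing a large DirectByteBuffer in sub-64 KB slices.

         import java.io.IOException;
         import java.nio.ByteBuffer;
         import java.nio.channels.WritableByteChannel;

         public final class ChunkedWriter {
             private static final int CHUNK = 63 * 1024;   // stay just under the 64 KB zero-copy threshold

             /** Writes the buffer in slices so no single write crosses the SDP zero-copy threshold. */
             public static void write(WritableByteChannel ch, ByteBuffer payload) throws IOException {
                 while (payload.hasRemaining()) {
                     int oldLimit = payload.limit();
                     payload.limit(Math.min(oldLimit, payload.position() + CHUNK));  // expose at most one chunk
                     while (payload.hasRemaining()) {
                         ch.write(payload);
                     }
                     payload.limit(oldLimit);                                        // restore for the next slice
                 }
             }

             private ChunkedWriter() { }
         }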
 49. Simple Test using IB – SDP Comparison
     [Chart: throughput comparison for C vs. Java (C IPoIB, Java DBB IPoIB, C SDP, Java DBB SDP), payload sizes from 1 KB to 256 MB]
     ■ Past the zero copy threshold there is a sharp drop – the cost of memory register / unregister
     ■ Eventual climb and plateau – the benefits of zero copy cannot outweigh the drawbacks
 50. Simple Test using IB – RDMA Comparison
     [Chart: throughput comparison for C vs. Java (C IPoIB, Java DBB IPoIB, C SDP, Java DBB SDP, C RDMA/W, Java DBB RDMA/W), payload sizes from 1 KB to 256 MB]
     ■ No “zero copy” threshold issues – always zero copy; memory is registered once and reused
     ■ Throughput does finally plateau – single thread, the pipe is hardly saturated
 51. Simple Test using IB – What about that Zero Copy Threshold?
     [Chart: zero copy threshold comparison for thresholds from 4 KB to 512 KB, payload sizes from 1 KB to 256 MB]
     ■ SDP ultimately has a plateau here – possibly other, deeper tuning aspects are available
     ■ Pushing the zero copy threshold out has no advantage
     ■ Claw back is still ultimately limited – likely gated by some other aspect of the system
     ■ The 64 KB threshold (the default) seems to be the “sweet spot”
 52. Simple Test using IB – Summary
     ■ Simple steps to start using – IPoIB lets you use your application ‘as is’
     ■ Increased speed can potentially involve significant application changes
       – Potential need for deeper technical knowledge
       – SDP is an interesting stop gap
     ■ There are hidden gotchas! Increased load changes the game – but this is standard when dealing with computers
 53. ORB and High Speed Networks
 54. Benchmarking the ORB – Background
     ■ Experiment: how does the ORB perform over InfiniBand?
     ■ Tests conducted
       – Send different sized packets from a client to a server
       – Time required for a write followed by a read
       – Compare standard Ethernet to SDP / IPoIB
     ■ Conditions
       – 500 client threads
       – Echo style test (send to server, server echoes the data back)
       – byte[] payload
       – 40 Gb/s InfiniBand
     ■ Goal being to look at
       – ORB performance when the data pipe isn’t the bottleneck (time to complete the benchmark)
       – Threading performance
     ■ Realistically expecting to discover bottlenecks in the ORB
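     For orientation, a minimal sketch of the shape such an echo test client takes (assuming a stand-in EchoService interface in place of the real ORB-generated stub; the thread count matches the conditions above, everything else is a placeholder): N threads each send a payload and wait for it to come back, and the run is timed end to end.

         import java.util.concurrent.ExecutorService;
         import java.util.concurrent.Executors;
         import java.util.concurrent.TimeUnit;
         import java.util.concurrent.atomic.LongAdder;

         public class EchoBenchmark {
             /** Stand-in for the remote stub (e.g., an ORB-generated interface). */
             interface EchoService {
                 byte[] echo(byte[] payload);
             }

             public static void main(String[] args) throws Exception {
                 final int clientThreads = 500;                 // matches the deck's conditions
                 final int iterations = 1_000;
                 final byte[] payload = new byte[64 * 1024];    // one of the payload sizes swept
                 final EchoService stub = p -> p.clone();       // local stand-in; a real test resolves the remote stub via the ORB

                 ExecutorService pool = Executors.newFixedThreadPool(clientThreads);
                 LongAdder completed = new LongAdder();
                 long t0 = System.nanoTime();
                 for (int t = 0; t < clientThreads; t++) {
                     pool.submit(() -> {
                         for (int i = 0; i < iterations; i++) {
                             stub.echo(payload);                // write followed by read (round trip)
                             completed.increment();
                         }
                     });
                 }
                 pool.shutdown();
                 pool.awaitTermination(1, TimeUnit.HOURS);
                 double secs = (System.nanoTime() - t0) / 1e9;
                 System.out.printf("%d echoes in %.1f s%n", completed.sum(), secs);
             }
         }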
 55. Benchmarking the ORB – Ethernet Results
     [Chart: ORB echo test performance (time to complete) for Ethernet, payload sizes 1 KB to 1 MB]
     ■ Standard Ethernet with the classic java.net package
 56. Benchmarking the ORB – SDP
     [Chart: ORB echo test performance (time to complete) for Ethernet vs. SDP, payload sizes 1 KB to 1 MB]
     ■ …And this is with SDP (could be better)
 57–64. Benchmarking the ORB – ORB Transmission Buffers (built up across slides 57–64)
     [Diagram: a byte[] written through the ORB is copied into small internal transmission buffers (roughly 1–2 KB each) before being handed off and transmitted]
     ■ Many additional costs being incurred (per thread!) to transmit a byte array
 65. Benchmarking the ORB – ORB Transmission Buffers
     ■ 3 KB to 4 KB ORB buffer sizes were sufficient for Ethernet
     [Diagram: ORB → 4 KB buffer → socket layer → transmit]
     ■ Existing bottlenecks sat outside the ORB (buffer management)
     ■ Throughput couldn’t be pushed much further
 66. Benchmarking the ORB – ORB Transmission Buffers
     ■ 64 KB was the best buffer size for SDP
     [Diagram: ORB → 64 KB buffer → native copy → transmit]
     ■ The Zero Copy Threshold again!
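     If the IBM ORB is in use, buffer / fragment sizing is commonly adjusted through an ORB property when the ORB is initialised; a minimal sketch, assuming the property name com.ibm.CORBA.FragmentSize applies to your ORB release (verify the exact name and default against its documentation):

         import java.util.Properties;
         import org.omg.CORBA.ORB;

         public class OrbBufferTuning {
             public static void main(String[] args) {
                 Properties props = new Properties();
                 // Property name assumed from IBM ORB tuning guides; confirm for your ORB version.
                 props.setProperty("com.ibm.CORBA.FragmentSize", String.valueOf(64 * 1024));
                 ORB orb = ORB.init(args, props);   // pass the tuned properties when initialising the ORB
                 // ... resolve remote objects and run the echo test as usual ...
                 orb.shutdown(true);
             }
         }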
 67–70. Benchmarking the ORB – Garbage Collector Impact (built up across slides 67–70)
     ■ Allocating large objects (e.g., buffers) can be a costly operation
     [Diagram: a heap of interleaved free and allocated memory, with the question of where a large buffer allocation should go]
     ■ Premature garbage collections occur in order to “clear space” for large allocations
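     A common mitigation, as a minimal sketch (my illustration; the 64 KB size simply mirrors the SDP-friendly buffer size found above): allocate the large transmission buffers once per thread and reuse them, so the steady state stops asking the collector for large contiguous blocks.

         import java.nio.ByteBuffer;

         public final class TransmitBuffers {
             private static final int BUFFER_SIZE = 64 * 1024;   // the SDP-friendly size from the ORB experiments

             // One direct buffer per thread, allocated lazily and then reused for every transmit.
             private static final ThreadLocal<ByteBuffer> BUFFERS =
                     ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(BUFFER_SIZE));

             /** Returns this thread's reusable transmission buffer, cleared and ready to fill. */
             public static ByteBuffer acquire() {
                 ByteBuffer buf = BUFFERS.get();
                 buf.clear();
                 return buf;
             }

             private TransmitBuffers() { }
         }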
 71–82. Benchmarking the ORB – Thread Pools (built up across slides 71–82)
     ■ Thread and connection count ratios are a factor
     ■ 500 client threads sharing 1 connection
       – Highly contended resource
       – Couldn’t saturate the communication channel
     ■ 500 client threads across 500 connections
       – Context switching disaster
       – Threads queued and unable to complete their transmits
       – Memory / resource consumption nightmare
     ■ 500 client threads across 10 connections
       – 2–5% of the client thread count appeared to be best
       – Saturates the communication pipe enough to achieve the best throughput
       – Keeps resource consumption and context switches to a minimum
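     The 2–5% rule of thumb is easy to express in code; a minimal sketch (mine, with the Connection type and factory as placeholders for whatever the transport layer provides) of a fixed-size connection pool sized from the client thread count:

         import java.util.concurrent.ArrayBlockingQueue;
         import java.util.concurrent.BlockingQueue;
         import java.util.function.Supplier;

         public final class ConnectionPool<C> {
             private final BlockingQueue<C> idle;

             /** Sizes the pool at roughly 2% of the client thread count (minimum 1), per the findings above. */
             public ConnectionPool(int clientThreads, Supplier<C> connectionFactory) {
                 int size = Math.max(1, clientThreads / 50);
                 idle = new ArrayBlockingQueue<>(size);
                 for (int i = 0; i < size; i++) {
                     idle.add(connectionFactory.get());     // e.g., open a socket / ORB connection here
                 }
             }

             public C borrow() throws InterruptedException {
                 return idle.take();                        // callers block rather than opening new connections
             }

             public void release(C connection) {
                 idle.offer(connection);
             }
         }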
 83. Benchmarking the ORB – Post Optimization Round
     [Chart: ORB echo test performance (time to complete) for Ethernet vs. SDP, payload sizes 1 KB to 1 MB]
 84. Benchmarking the ORB – Post Optimization Round
     [Chart: as above, adding the tuned SDP (New) results]
     ■ Hey, great! Still not super (or the difference you’d expect) but it’s a good start
     ■ NOTE: the 64 KB threshold is definitely a big part of the whole thing
 85. Benchmarking the ORB – Post Optimization Round
     [Chart: as above, adding IPoIB results]
     ■ No surprises – IPoIB has higher overhead than SDP
     ■ The 64 KB numbers are actually quite close, so there are still issues to discover and fix
 86. Benchmarking the ORB – Summary
     ■ It’s not as easy as “stepping on the gas”
       – High speed networks alone don’t resolve your problems
       – Software layers are going to have bottlenecks
       – Improvements for high speed networks can help traditional ones as well
     ■ The benefit is not always clear cut
 87. And after all that…
 88. Conclusion
     ■ High speed networks are a game changer
     ■ Simple to use, hard to use effectively
     ■ Expectations based on past results need to be re-evaluated
     ■ Existing applications / frameworks may need tuning or optimization
     ■ Opens up potentially new possibilities
 89. Questions?
 90. References
     ■ Get Products and Technologies:
       – IBM Java Runtimes and SDKs: https://www.ibm.com/developerworks/java/jdk/
       – IBM Monitoring and Diagnostic Tools for Java: https://www.ibm.com/developerworks/java/jdk/tools/
     ■ Learn:
       – IBM Java InfoCenter: http://publib.boulder.ibm.com/infocenter/java7sdk/v7r0/index.jsp
     ■ Discuss:
       – IBM Java Runtimes and SDKs Forum: http://www.ibm.com/developerworks/forums/forum.jspa?forumID=367&start=0
 91. Copyright and Trademarks
     © IBM Corporation 2012. All Rights Reserved. IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web – see the IBM “Copyright and trademark information” page at www.ibm.com/legal/copytrade.shtml