Java Marshalling: A Performance Approach
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1ghWIic.

Todd Montgomery proposes a new approach to marshalling in Java using FIX/SBE, new marshalling API approaches, and the extensive application of mechanical sympathy to this problem domain. Filmed at qconsf.com.

Todd Montgomery is a networking hacker who has researched, designed, and built numerous protocols, messaging-oriented middleware systems, and real-time data systems, done research for NASA, contributed to the IETF and IEEE, and co-founded two startups. He currently works for Informatica as an architect.

Published in: Technology, News & Politics

Transcript

  • 1. Java Marshaling: A Performance Approach. Todd L. Montgomery, @toddlmontgomery, Informatica
  • 2. Watch the video with slide synchronization on InfoQ.com: http://www.infoq.com/presentations/java-marshalling-performance InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon San Francisco www.qconsf.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Why Performance? Isn’t this solved?
  • 5. Capital Markets: Performance can be a competitive advantage… Speed is relative
  • 6. The Trader [Diagram: from a common Starting Line (INFA at 37.00), a fast path through Market Data, Feed Handler, and Execution gets INFA at 37.00; a slow path through the same stages gets INFA at 38.00]
  • 7. The FX Trader. Currency Pairs: $ / £ = 0.6375, £ / ¥ = 123.77, ¥ / $ = 0.0127. Trades: $1,000,000 = £637,500; £637,500 = 78,903,375¥; 78,903,375¥ = $1,002,072. CHA-CHING! But with ¥ / $ = 0.0126: 78,903,375¥ = $994,182.53 !!!
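The arithmetic on slide 7 can be replayed in a few lines. This is a minimal sketch, not code from the talk: the class name `TriangularArbitrage` and the method `roundTrip` are hypothetical, and the only inputs are the rates printed on the slide. `BigDecimal` is used so the products are exact rather than subject to binary floating-point rounding.

```java
import java.math.BigDecimal;

// Hypothetical helper (not from the talk): replays the three conversions on slide 7.
final class TriangularArbitrage {

    // Converts an amount through three quoted rates in sequence: USD -> GBP -> JPY -> USD.
    static BigDecimal roundTrip(BigDecimal dollars, String usdToGbp, String gbpToJpy, String jpyToUsd) {
        return dollars
                .multiply(new BigDecimal(usdToGbp))
                .multiply(new BigDecimal(gbpToJpy))
                .multiply(new BigDecimal(jpyToUsd));
    }

    public static void main(String[] args) {
        BigDecimal start = new BigDecimal("1000000");
        // Rates as printed on the slide; the last rate is the one that moves.
        BigDecimal atFirstQuote = roundTrip(start, "0.6375", "123.77", "0.0127");
        BigDecimal atMovedQuote = roundTrip(start, "0.6375", "123.77", "0.0126");
        System.out.println("Executed at 0.0127: " + atFirstQuote.toPlainString());
        System.out.println("Executed at 0.0126: " + atMovedQuote.toPlainString());
    }
}
```

A move of 0.0001 in the last rate turns a gain of roughly $2,072 into a loss of roughly $5,817, which is the slide's point about acting on a quote before it changes.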
  • 8. The Exchange [Diagram labels: Venues, Trader, Order, Acknowledge, Cancel, Alpha, Tango, Romeo]
  • 9. What is Possible? App-to-App Latency: today, less than 100 50 ns; 10,000x improvement from 2004. Throughput / Core: today, more than 500M messages / sec. Connections / Core: just easily passed 1M! Efficiency! ☟Cost, ☝Capacity, ☝Profit
  • 10. Don’t Settle! Why Java? Why even an OS? Why even … Don’t Compromise!
  • 11. Mechanical Sympathy. Encoding/Decoding: Arrays = Predictable; Striding is Key; boundary alignment can be important. Java Best Practices: zero, low, controlled garbage generation; don’t create an object if you don’t have to; be green, recycle. Input = Data Layout: compute as much as possible before generation of code; use computed constants as much as possible. Full Stack Approach: know how to leverage the compiler to do the dirty work for you; know what the JIT will generate; managed runtime = know it, love it! (Know the OS. Know how disk/network work.) Don’t Generate Crap!
  • 12. Layout & Striding [Hex dump of a binary buffer, offsets 0x4f0 through 0x5c0, with its ASCII rendering]
  • 13. Does this look familiar? Data Layout > Code. How do you go from a schema to generated code?
  • 14. Intermediate Representation [Diagram labels: Schema, Annotations, IR, C++, Java, C#, On-The-Fly Encoder, On-The-Fly Decoder, Optimization]
  • 15. Desirable IR Properties. Serializable: a must to be a tool chain; should be self generating. Self-Contained: low amount of referencing; should be non-hierarchical. Efficient to Traverse: for On-The-Fly encoding/decoding. Imagine - IR is an instruction “set” and data is “memory”. Input = Data Layout: compute as much as possible before generation of code. Don’t Generate Crap!
  • 16. IR Optimization. Local view can’t optimize for composition. Not all fields aligned to boundaries. Access order isn’t sequential. Not a big deal here, but in general a very big concern. [Diagram: fields of types uint32, double, float, int32, uint64, and int64 at offsets between X and X + 40, numbered 1 to 7 by access order]
  • 17. IR Optimization. Alignment (Pad), Arrangement (Compact), Access (Order). [Diagram: alternative layouts of the uint32, int32, float, double, uint64, and int64 fields, with offset markers at 0, 8, 16, 32, and 64]
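The "Alignment (Pad)" and "Arrangement (Compact)" ideas can be sketched with a small offset calculator. This is an illustrative sketch only, not the optimizer from the talk: the class `LayoutSketch`, its methods, and the example field order are hypothetical, and it assumes each primitive's natural alignment equals its size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: shows how field order changes the padded size of a layout.
final class LayoutSketch {

    static final class Field {
        final String type;
        final int size; // size in bytes; assumed to equal the natural alignment
        Field(String type, int size) { this.type = type; this.size = size; }
    }

    // Rounds offset up to the next multiple of alignment (alignment must be a power of two).
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    // Total bytes used when fields are placed in the given order, each padded to its alignment.
    static int paddedSize(List<Field> fields) {
        int offset = 0;
        for (Field f : fields) {
            offset = align(offset, f.size) + f.size;
        }
        return offset;
    }

    // Arrangement: largest fields first, so smaller ones fill what would otherwise be padding.
    static List<Field> compact(List<Field> fields) {
        List<Field> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparingInt((Field f) -> f.size).reversed());
        return sorted;
    }

    // An example declaration order (an assumption, not the order on the slide).
    static List<Field> exampleFields() {
        return Arrays.asList(
                new Field("uint32", 4), new Field("int64", 8), new Field("float", 4),
                new Field("double", 8), new Field("int32", 4), new Field("uint64", 8),
                new Field("uint32", 4));
    }

    public static void main(String[] args) {
        List<Field> declared = exampleFields();
        System.out.println("declared order: " + paddedSize(declared) + " bytes");
        System.out.println("compacted order: " + paddedSize(compact(declared)) + " bytes");
    }
}
```

Sorting by size is only one possible arrangement policy; the slide's third axis, access order, would pull in a different direction and is not modelled here.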
  • 18. IR What does it look like?
  • 19. IR = List<Token>
      …
      BEGIN FIELD (name = “header”, id = 101)
      BEGIN COMPOSITE (offset = X, size = 16)
      ENCODING (type = int64, offset = 0, size = 8)
      ENCODING (type = double, offset = 8, size = 8)
      END COMPOSITE
      END FIELD
      …
      Data (X + 0 to X + 16): 0x1FEEDBEEF at X + 0, 3.14159… at X + 8
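The "IR is an instruction set and data is memory" idea from slides 15 and 19 can be sketched as a flat token list walked over a buffer. This is a minimal illustration under stated assumptions, not the SBE IR: the `Token` class, the `Signal` enum, and `decode` are hypothetical names, only the int64 and double encodings from the slide are handled, and the composite's offset X is passed in as `base`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a flat (non-hierarchical) token list drives decoding of a buffer.
final class TokenWalkSketch {

    enum Signal { BEGIN_FIELD, BEGIN_COMPOSITE, ENCODING, END_COMPOSITE, END_FIELD }

    static final class Token {
        final Signal signal;
        final String type; // primitive type name; only meaningful for ENCODING tokens
        final int offset;  // offset relative to the enclosing composite
        Token(Signal signal, String type, int offset) {
            this.signal = signal;
            this.type = type;
            this.offset = offset;
        }
    }

    // The token sequence shown on the slide for the "header" field.
    static List<Token> headerIr() {
        List<Token> ir = new ArrayList<>();
        ir.add(new Token(Signal.BEGIN_FIELD, null, 0));
        ir.add(new Token(Signal.BEGIN_COMPOSITE, null, 0));
        ir.add(new Token(Signal.ENCODING, "int64", 0));
        ir.add(new Token(Signal.ENCODING, "double", 8));
        ir.add(new Token(Signal.END_COMPOSITE, null, 0));
        ir.add(new Token(Signal.END_FIELD, null, 0));
        return ir;
    }

    // Walks the tokens once, front to back, reading each encoding from the buffer.
    static List<Object> decode(List<Token> ir, ByteBuffer data, int base) {
        List<Object> values = new ArrayList<>();
        int compositeBase = base;
        for (Token t : ir) {
            switch (t.signal) {
                case BEGIN_COMPOSITE:
                    compositeBase = base + t.offset;
                    break;
                case ENCODING: {
                    int at = compositeBase + t.offset;
                    if ("int64".equals(t.type)) {
                        values.add(data.getLong(at));
                    } else if ("double".equals(t.type)) {
                        values.add(data.getDouble(at));
                    } else {
                        throw new IllegalArgumentException("unsupported type: " + t.type);
                    }
                    break;
                }
                default:
                    break; // BEGIN/END markers carry no data in this sketch
            }
        }
        return values;
    }

    // Writes the slide's two values at X = 8 and decodes them back through the token list.
    static List<Object> demo() {
        int x = 8;
        ByteBuffer data = ByteBuffer.allocate(32);
        data.putLong(x, 0x1FEEDBEEFL);
        data.putDouble(x + 8, 3.14159);
        return decode(headerIr(), data, x);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the list is flat, the decoder is a single loop with no recursion, which is one reading of the "non-hierarchical" and "efficient to traverse" properties on slide 15.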
  • 20. Serializing IR. Yes, IR is serialized by its own schema, generated code, etc.
  • 21. Generated Java API (C++ similar). Object Reuse, Encoding/Decoding Flyweight: public interface MessageFlyweight { MessageFlyweight resetForEncode(buffer, offset); MessageFlyweight resetForDecode(buffer, offset); } Reduce/Eliminate Garbage Generation. “Protocol” for Encoding/Decoding: Random Access - NO!, Fast Sequential Access - YES! Low predictable latency - Random is not predictable. Low Bounds Checking Requirements, and use of Unsafe under the hood. Burden of access pattern shifted to application. Still going to be less buggy than C. Don’t Generate Crap!
  • 22. Down the Rabbit Hole
      IR:
        BEGIN MESSAGE (name = “Car”, id = 1, version = 0)
        …
        BEGIN FIELD (name = “modelYear”, id = 2)
        ENCODING (type = uint16, offset = 4, size = 2, ByteOrder = LITTLE)
        END FIELD
        …
        END MESSAGE (name = “Car”)
      Generated Java:
        public class Car implements MessageFlyweight {
          …
          public Car resetForEncode(final DirectBuffer buffer, final int offset)
          public Car resetForDecode(final DirectBuffer buffer, final int offset)
          …
          public long modelYearId() { return 2L; }
          public int modelYear()
          public Car modelYear(final int value)
          …
  • 23. Down the Rabbit Hole
      Encoding:
        Car car = new Car();
        int offset = 0;
        ByteBuffer buffer = ByteBuffer.allocateDirect(CAPACITY);
        DirectBuffer dirBuffer = new DirectBuffer(buffer);
        car.resetForEncode(dirBuffer, offset);
        …
        car.modelYear(2013);
        …
        offset += car.size();
        car.resetForEncode(dirBuffer, offset);
        …
        car.modelYear(2014);
        …
        // grab buffer and write it out
      Decoding:
        // read data into buffer
        offset = 0;
        …
        car.resetForDecode(dirBuffer, offset);
        …
        System.out.println(car.modelYear());
        …
        offset += car.size();
        car.resetForDecode(dirBuffer, offset);
        …
        System.out.println(car.modelYear());
  • 24. Down the Rabbit Hole: Decoding
        public int modelYear() {
          return CodecUtil.uint16Get(buffer, offset + 4, java.nio.ByteOrder.LITTLE_ENDIAN);
        }
        public static int uint16Get(final DirectBuffer buffer, final int index, final ByteOrder byteOrder) {
          return buffer.getShort(index, byteOrder) & 0xFFFF;
        }
        public short getShort(final int index, final ByteOrder byteOrder) {
          short bits = UNSAFE.getShort(byteArray, baseOffset + index);
          if (NATIVE_BYTE_ORDER != byteOrder) {
            bits = Short.reverseBytes(bits);
          }
          return bits;
        }
      Note on the byte-order check: Will be optimized out on x86
  • 25. Down the Rabbit Hole
        mov    0x1c(%rsi),%r10d          ;*getfield buffer
                                         ; - Car::modelYear@1 (line 100)
        mov    0xc(%rsi),%r11d
        add    $0x4,%r11d                ;*iadd
                                         ; - Car::modelYear@9 (line 100)
        mov    0x18(%r12,%r10,8),%r8d    ;*getfield byteArray
                                         ; - DirectBuffer::getShort@4 (line 266)
                                         ; - CodecUtil::uint16Get@3 (line 378)
                                         ; - Car::modelYear@13 (line 100)
        shl    $0x3,%r8
        movslq %r11d,%r11
        add    0x10(%r12,%r10,8),%r11
        movzwl (%r8,%r11,1),%eax         ;*iand
                                         ; - CodecUtil::uint16Get@8 (line 378)
                                         ; - Car::modelYear@13 (line 100)
      Demonstration Purposes Only. We look at this so you don’t always have to
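The flyweight calls on slides 21 to 24 depend on `DirectBuffer`, `CodecUtil`, and a generated `Car` class that are not shown in full. The following is a runnable stand-in under stated assumptions, not the generated code: `CarSketch` is a hypothetical class, it uses `java.nio.ByteBuffer` absolute accessors instead of `Unsafe`, and the 8-byte `SIZE` is an assumed message length. Only the `modelYear` encoding (uint16, offset 4, little-endian) comes from the slides.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the generated Car flyweight; one instance is reused for every message.
final class CarSketch {
    static final int MODEL_YEAR_OFFSET = 4; // from the IR on slide 22
    static final int SIZE = 8;              // assumption: message length is not given on the slides

    private ByteBuffer buffer;
    private int offset;

    // Rebinds this flyweight to a message position; no allocation takes place.
    CarSketch resetForEncode(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    CarSketch resetForDecode(ByteBuffer buffer, int offset) {
        return resetForEncode(buffer, offset);
    }

    // Reads the uint16 and widens it to an int, masking off sign extension.
    int modelYear() {
        return buffer.getShort(offset + MODEL_YEAR_OFFSET) & 0xFFFF;
    }

    CarSketch modelYear(int value) {
        buffer.putShort(offset + MODEL_YEAR_OFFSET, (short) value);
        return this;
    }

    int size() {
        return SIZE;
    }

    // Encodes two messages back to back, then decodes them with the same flyweight instance.
    static int[] roundTrip(int firstYear, int secondYear) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64).order(ByteOrder.LITTLE_ENDIAN);
        CarSketch car = new CarSketch();

        int offset = 0;
        car.resetForEncode(buffer, offset).modelYear(firstYear);
        offset += car.size();
        car.resetForEncode(buffer, offset).modelYear(secondYear);

        offset = 0;
        int first = car.resetForDecode(buffer, offset).modelYear();
        offset += car.size();
        int second = car.resetForDecode(buffer, offset).modelYear();
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] years = roundTrip(2013, 2014);
        System.out.println(years[0] + " " + years[1]);
    }
}
```

The byte order is set once on the buffer rather than checked per access, which differs from the `getShort(index, byteOrder)` shape on slide 24; the sketch trades that fidelity for being self-contained.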
  • 26. Performance. Initially… about 25x - 50x GPB (Google Protocol Buffers) … without tuning… … without aligning… Stay Tuned!
  • 27. Questions?
  • 28. Watch the video with slide synchronization on InfoQ.com: http://www.infoq.com/presentations/java-marshalling-performance