4. Performance Gain Summary
                    Initially    Now       % change
  Mean Time         611 ms       93 ms     85% reduction
  95th Percentile   620 ms       150 ms    76% reduction
  QPS               63           409       549% increase
9. Alternatives explored
• Netty fits the requirements we had, and we haven't found it more disadvantageous than the alternatives.
– Used by Facebook, Twitter and many others for high performance.
• We may later experiment with Nifty/Swift, a Thrift server based on Netty – it provides handlers that do Thrift decoding, plus a few other features.
– https://github.com/facebook/nifty/
– https://github.com/facebook/swift
• Other options explored:
– Dropwizard framework, based on Jetty & Jersey – has built-in integration with Yammer metrics.
– Phoenix 3, Servlet 3.0 async NIO on a Tomcat container.
11. What we did
• Connection Pool
– Initially
• We used custom-written Netty code for some partners and Ning for others – neither maintained a connection pool.
• Had a bug that timed out if the response was not written within the cutoff time.
– Now
• Connection pool for outbound requests to partners, using Ning. Connection setup and teardown time is saved. The integration team is following up with partners to make the change for maintaining persistent connections.
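The saving can be sketched with a minimal generic pool (plain java.util.concurrent, not Ning's actual API; `Connection` here is a hypothetical stub whose constructor stands in for TCP setup):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of connection pooling: reuse connections instead of paying
// setup/teardown per request. "Connection" is a stand-in, not a real client API.
public class ConnectionPoolSketch {
    static final AtomicInteger CREATED = new AtomicInteger();

    static class Connection {
        Connection() { CREATED.incrementAndGet(); }   // expensive setup happens here
        String send(String request) { return "response-to-" + request; }
    }

    private final BlockingQueue<Connection> idle;

    ConnectionPoolSketch(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(new Connection());
    }

    String execute(String request) {
        final Connection c;
        try {
            c = idle.take();                          // borrow; blocks if the pool is exhausted
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        try {
            return c.send(request);
        } finally {
            idle.offer(c);                            // return for reuse, keep-alive style
        }
    }

    // N requests over a pool of 2 still create only 2 connections.
    public static int connectionsUsedFor(int requests) {
        ConnectionPoolSketch pool = new ConnectionPoolSketch(2);
        for (int i = 0; i < requests; i++) pool.execute("req" + i);
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println(connectionsUsedFor(10));   // prints 2
    }
}
```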
• Extra Threads
– Had an ExecutionHandler, which created a new thread to execute a handler in the pipeline every time a request arrived. It was not required, as we were not doing any blocking operation in the handler.
– Shared the worker-pool threads between server and client.
• HashedWheelTimer
– Was polling every 5 ms to check for timeouts, which took around 16–20% CPU. Increased the tick interval to 20 ms.
• File logging
– Reduced to a minimum.
– Moved completely to Logback.
– Introduced turbo filters.
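For illustration, a turbo filter is wired up in logback.xml; the DuplicateMessageFilter shown here is one of Logback's stock turbo filters, used as an example rather than the exact CAS configuration:

```xml
<configuration>
  <!-- Turbo filters run before a log event is created, so filtered
       messages cost almost nothing. This stock filter drops repeats. -->
  <turboFilter class="ch.qos.logback.classic.turbo.DuplicateMessageFilter">
    <allowedRepetitions>5</allowedRepetitions>
  </turboFilter>

  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>cas.log</file>
    <encoder>
      <pattern>%d %level %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="WARN">
    <appender-ref ref="FILE" />
  </root>
</configuration>
```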
15. Channel Pipeline – Contd.
protected void initChannel(final SocketChannel ch) throws Exception {
    ChannelPipeline pipeline = ch.pipeline();
    pipeline.addLast("logging", loggingHandler);                  // wire-level request/response logging
    pipeline.addLast("incomingLimitHandler",
            incomingConnectionLimitHandler);                      // cap concurrent inbound connections
    pipeline.addLast("decoderEncoder", new HttpServerCodec());    // HTTP request decoder + response encoder
    pipeline.addLast("aggregator",
            new HttpObjectAggregator(1024 * 1024));               // merge chunks into one full request, 1 MiB max
}
Think of a handler in the pipeline as a filter in the servlet world, except that there are only filters and nothing special like a servlet at the end.
16. What we did next
• Moved from the Netty 3 framework to Netty 4
– Advantages:
• Less GC overhead and memory consumption
– Removal of Event Objects.
– Buffer pooling
• Better thread model – http://netty.io/wiki/new-and-noteworthy-in-4.0.html#wiki-h2-34
• Pool of Marshaller, Unmarshaller and DocumentBuilder instances; the JAXBContext is created once.
– Used Apache commons-pool for the pooling.
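The borrow/reset/return life cycle can be sketched in miniature with a plain blocking queue in place of commons-pool, pooling the JDK's DocumentBuilder (costly to create, not thread-safe, cheap to reset()):

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch of object pooling for parser instances: create up front, borrow per
// request, reset and return. Commons-pool provides the same borrow/return
// life cycle, plus validation and eviction, on top of this idea.
public class ParserPool {
    private final BlockingQueue<DocumentBuilder> pool;

    ParserPool(int size) {
        pool = new ArrayBlockingQueue<>(size);
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            for (int i = 0; i < size; i++) pool.add(factory.newDocumentBuilder());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public String rootTagOf(String xml) {
        final DocumentBuilder builder;
        try {
            builder = pool.take();                    // borrow
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            builder.reset();                          // clear state before reuse
            pool.offer(builder);                      // return
        }
    }

    public static void main(String[] args) {
        ParserPool pool = new ParserPool(4);
        System.out.println(pool.rootTagOf("<bid price='1.20'/>"));   // prints bid
    }
}
```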
• The JSON parsing library we used for request parsing was slow.
– Should have used Jackson.
– Now the requests are Thrift-serialized anyway.
• Persistent connections at the server side with UMP.
– CAS timeout handler for keep-alive connection timeouts. The existing Netty handlers fire read/write timeouts irrespective of whether the operation took place.
– Removed unnecessary thread waiting on a lock in a synchronized block while sending the response; the lock is now released as soon as possible.
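The lock-narrowing fix, sketched with hypothetical names (not the actual CAS classes): do the expensive serialization outside the lock and hold it only for the shared-state update.

```java
// Sketch of narrowing a synchronized block: do the slow work (serialization)
// without the lock, and guard only the shared mutable state.
public class ResponseSender {
    private final StringBuilder sharedLog = new StringBuilder();   // shared mutable state

    // Before: whole method synchronized, so other threads wait during serialize().
    public synchronized String sendBefore(String payload) {
        String wire = serialize(payload);      // slow work done under the lock
        sharedLog.append(wire).append('\n');
        return wire;
    }

    // After: serialize outside; the lock is held only for the append.
    public String sendAfter(String payload) {
        String wire = serialize(payload);      // slow work, no lock held
        synchronized (this) {
            sharedLog.append(wire).append('\n');
        }
        return wire;
    }

    private String serialize(String payload) {
        return "{\"body\":\"" + payload + "\"}";
    }

    public static void main(String[] args) {
        ResponseSender s = new ResponseSender();
        System.out.println(s.sendAfter("ok"));     // prints {"body":"ok"}
    }
}
```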
17. Removal of Event Objects
// Netty 3: a single catch-all callback, dispatching on the event's type
class Before implements ChannelUpstreamHandler {
    public void handleUpstream(ChannelHandlerContext ctx, ChannelEvent e) {
        if (e instanceof MessageEvent) { ... }
        else if (e instanceof ChannelStateEvent) { ... }
        ...
    }
}

// Netty 4: one method per event, so no event objects are allocated
class After implements ChannelInboundHandler {
    public void channelActive(ChannelHandlerContext ctx) { ... }
    public void channelInactive(ChannelHandlerContext ctx) { ... }
    public void channelRead(ChannelHandlerContext ctx, Object msg) { ... }
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) { ... }
    ...
}
18. Buffer Pooling
• Pool implementation using ByteBufAllocator, which is based on jemalloc (Facebook talk: https://www.facebook.com/video/video.php?v=696488619305)
– Why?
• Doesn’t waste memory bandwidth filling buffers with zeros: ‘new byte[capacity]’ no longer happens every time a new message arrives or a response is sent.
– Advantages:
• 5x less frequent GC pauses: 45.5 vs. 9.2 times/min.
• 5x less garbage produced: 207.11 vs. 41.81 MiB/s.
– Disadvantage:
• Memory management is in the hands of the user: buffers must be released explicitly. Netty provides leak-reporting facilities to help.
• We are using pooled direct memory in CAS.
– Pooled – the memory pool is maintained by Netty.
– Direct – memory is allocated outside the JVM heap, so the JVM doesn’t garbage-collect it.
• Direct buffers exist in Java NIO’s ByteBuffer too.
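The heap/direct distinction can be seen with plain NIO buffers. This sketch shows only the concept; Netty’s pooled allocator adds the pooling and leak tracking on top:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Heap vs direct buffers in plain NIO. A direct buffer lives outside the
// JVM heap; reusing it via clear() avoids re-allocating (and re-zeroing)
// a fresh byte[] per message.
public class DirectBufferDemo {
    public static boolean isOffHeap() {
        ByteBuffer heap = ByteBuffer.allocate(1024);          // backed by a byte[] on the heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);  // native memory, off-heap
        return !heap.isDirect() && direct.isDirect();
    }

    public static String reuse(ByteBuffer buf, String msg) {
        buf.clear();                                    // reuse: reset position/limit, no new allocation
        buf.put(msg.getBytes(StandardCharsets.UTF_8));
        buf.flip();                                     // switch from writing to reading
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static String roundTrip(String msg) {
        return reuse(ByteBuffer.allocateDirect(64), msg);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        System.out.println(isOffHeap());               // prints true
        System.out.println(reuse(buf, "first"));       // prints first
        System.out.println(reuse(buf, "second"));      // prints second -- same buffer reused
    }
}
```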
24. Tools Used
• tcpdump, Wireshark
– Ning was appending :80 to the Host header, which the DSP didn’t expect, even though it is permitted by the HTTP/1.1 RFC.
• JVisualVM, JProfiler, Eclipse Java Monitor
• Gatling, JMeter – for performance benchmarking
25. Further
• Use a standard Netty4-based HTTP client API. Waiting on Ning to release an update.
– Several good alternatives:
• https://github.com/brunodecarvalho/http-client
• https://github.com/timboudreau/netty-http-client
• Facebook’s Nifty/Swift
• Move from VMs to bare-metal boxes.