Java one2013

Distributed & highly available server
applications in Java and Scala
Max Alexejev, Aleksei Kornev
JavaOne Moscow 2013
24 April 2013

Lightweight SOA
Key principles
● S1, S2 - edge services
● Each service is 0..1 servers
and 0..N clients built together
● No special "broker" services
● All services are stateless
● All instances are equal
What about state?
State is kept in specialized
distributed systems and fronted
by specific services.
Example follows...

Case study: Talkbits backend
Recursive call

Requirements for a distributed RPC system
Must have and nice to have
● Elastic and reliable discovery - should handle nodes brought up and
shut down transparently and not be a SPOF itself
● Support for N-N topology of client and server instances
● Disconnect detection and transparent reconnects
● Fault tolerance - for example, by retries to the remaining instances when
the called instance goes down
● Built-in client backoff - i.e., clients should not overload servers
when load spikes, as far as possible
● Configurable load distribution - i.e., which server instance to call for
this specific request
● Configurable networking layer (keepalives & heartbeats, timeouts,
connection pools, etc.)
● Distributed tracing facilities
● Portability among different platforms
● Distributed stack traces for exceptions
● Transactions
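The fault-tolerance requirement above can be sketched as failover: retry the request against the remaining instances. This version is plain Scala without Finagle. The `Instance` type and the `callWithFailover` function are hypothetical. A real client would also need timeouts and backoff.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical instance: a name plus a call that may throw when the instance is down.
final case class Instance(name: String, call: String => String)

// Try the request on each instance in order and return the first success.
// If every instance fails, return the last failure.
def callWithFailover(instances: List[Instance], request: String): Try[String] =
  instances match {
    case Nil => Failure(new NoSuchElementException("no instances left to try"))
    case head :: tail =>
      Try(head.call(request)) match {
        case ok @ Success(_)             => ok
        case Failure(_) if tail.nonEmpty => callWithFailover(tail, request)
        case failed                      => failed
      }
  }

// Usage: the first instance is down, the second answers.
val down = Instance("a", _ => throw new RuntimeException("a is down"))
val up = Instance("b", req => s"b:$req")
val answer = callWithFailover(List(down, up), "ping")
```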

Key principles to be lightweight and get rid of architectural waste
● Java SE
● No containers. Even servlet containers are lightweight and embedded
● Standalone applications: unified configuration, deployment, metrics,
logging, single development framework - more on this later
● All launched instances are equal and process requests - no "special"
nodes or "active-standby" patterns
● Minimal dependencies and JAR size
● Minimal memory footprint
● One service - one purpose
● Highly tuned for this one purpose (app, JVM, OS, HW)
● Isolated fault domains - i.e., single datasource or external service is
fronted by one service only
No bloatware in technology stack!
"Lean" services

Talkbits implementation choices
Finagle library (twitter.github.io/finagle) acts as a distributed RPC
framework. Services are written in Java and Scala and use the Thrift
communication protocol.
Apache Zookeeper (zookeeper.apache.org)
Provides reliable service discovery mechanics. Finagle has a nice built-in
integration with Zookeeper.
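ZooKeeper-based discovery relies on two mechanics:

- **Ephemeral registrations.** A server's registration disappears when its session ends.
- **Watches.** Watching clients are told the new member set.

The toy in-memory model below illustrates only those semantics. `ToyRegistry` is hypothetical and is not the ZooKeeper or Finagle API. The paths and addresses are made up.

```scala
import scala.collection.mutable

// Toy stand-in for a ZooKeeper-style registry (NOT the ZooKeeper API).
// Servers register "ephemeral" entries tied to a session. When the session
// expires, those entries vanish and watchers receive the updated member set.
final class ToyRegistry {
  private val members  = mutable.Map.empty[String, Set[String]]          // path -> addresses
  private val sessions = mutable.Map.empty[Long, List[(String, String)]] // session -> (path, address)
  private val watchers = mutable.Map.empty[String, List[Set[String] => Unit]]

  def watch(path: String)(onChange: Set[String] => Unit): Unit =
    watchers(path) = onChange :: watchers.getOrElse(path, Nil)

  def register(session: Long, path: String, address: String): Unit = {
    sessions(session) = (path, address) :: sessions.getOrElse(session, Nil)
    update(path, current(path) + address)
  }

  // Simulates a server crash or shutdown: all of its ephemeral entries disappear.
  def expire(session: Long): Unit =
    sessions.remove(session).getOrElse(Nil).foreach { case (path, address) =>
      update(path, current(path) - address)
    }

  def current(path: String): Set[String] = members.getOrElse(path, Set.empty)

  private def update(path: String, next: Set[String]): Unit = {
    members(path) = next
    watchers.getOrElse(path, Nil).foreach(notify => notify(next))
  }
}
```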

Finagle server: networking
Finagle is built on top of Netty, an asynchronous, non-blocking (NIO) network application framework.
Finagle codec
trait Codec[Req, Rep]
// Abridged: the codec adds Netty handlers to the channel pipeline
class ThriftClientFramedCodec(...) extends Codec[ThriftClientRequest, Array[Byte]] {
  ...
  pipeline.addLast("thriftFrameCodec", new ThriftFrameCodec)
  pipeline.addLast("byteEncoder", new ThriftClientChannelBufferEncoder)
  pipeline.addLast("byteDecoder", new ThriftChannelBufferDecoder)
  ...
}
Finagle comes with ready-made codecs for
Thrift, HTTP, Memcache, Kestrel, HTTP streaming.
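The frame codec in that pipeline deals with Thrift's framed transport, where each message is preceded by its length as a 4-byte big-endian integer. Below is a plain-Scala sketch of that framing using only java.nio.

- `encodeFrame` and `decodeFrames` are hypothetical names.
- Unlike a production codec, the sketch does not reject negative or oversized lengths.

```scala
import java.nio.ByteBuffer

// Prefix the payload with its length as a 4-byte big-endian int
// (ByteBuffer's default byte order is big-endian).
def encodeFrame(payload: Array[Byte]): Array[Byte] = {
  val buf = ByteBuffer.allocate(4 + payload.length)
  buf.putInt(payload.length).put(payload)
  buf.array()
}

// Split a byte stream into complete frames. Trailing bytes that do not yet
// form a whole frame are returned so the caller can keep them for the next read.
def decodeFrames(bytes: Array[Byte]): (Vector[Array[Byte]], Array[Byte]) = {
  val buf = ByteBuffer.wrap(bytes)
  var frames = Vector.empty[Array[Byte]]
  var incomplete = false
  while (!incomplete && buf.remaining() >= 4) {
    buf.mark()
    val length = buf.getInt()
    if (buf.remaining() < length) { buf.reset(); incomplete = true } // wait for more bytes
    else {
      val frame = new Array[Byte](length)
      buf.get(frame)
      frames :+= frame
    }
  }
  val rest = new Array[Byte](buf.remaining())
  buf.get(rest)
  (frames, rest)
}
```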

Finagle services and filters
// Service is simply a function from request to a future of response.
trait Service[Req, Rep] extends (Req => Future[Rep])
// Filter[A, B, C, D] converts a Service[C, D] to a Service[A, B].
abstract class Filter[-ReqIn, +RepOut, +ReqOut, -RepIn]
  extends ((ReqIn, Service[ReqOut, RepIn]) => Future[RepOut])
abstract class SimpleFilter[Req, Rep] extends Filter[Req, Rep, Req, Rep]
// Service transformation example
val serviceWithTimeout: Service[Req, Rep] =
  new RetryFilter[Req, Rep](..) andThen
  new TimeoutFilter[Req, Rep](..) andThen
  service
Finagle comes with rate limiting, retries, statistics, tracing,
uncaught exception handling, timeouts and more.
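To show how `andThen` stacks filters in front of a service, here is a simplified stand-in for the Service/SimpleFilter shapes above. It is built on scala.concurrent.Future instead of Twitter's Future. `RetryFilter` and `CountingFilter` here are hypothetical toy versions, not Finagle's classes.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Simplified stand-ins mirroring the shapes above (NOT Finagle's classes).
trait Service[Req, Rep] extends (Req => Future[Rep])

abstract class SimpleFilter[Req, Rep] { self =>
  def apply(req: Req, next: Service[Req, Rep]): Future[Rep]

  // filter andThen service: a new service that runs this filter first
  def andThen(next: Service[Req, Rep]): Service[Req, Rep] =
    new Service[Req, Rep] { def apply(req: Req): Future[Rep] = self(req, next) }

  // filter andThen filter: a combined filter in which this one runs first
  def andThen(other: SimpleFilter[Req, Rep]): SimpleFilter[Req, Rep] =
    new SimpleFilter[Req, Rep] {
      def apply(req: Req, next: Service[Req, Rep]): Future[Rep] = self(req, other andThen next)
    }
}

// Toy retry filter: on failure, re-issues the request up to `retries` more times.
final class RetryFilter[Req, Rep](retries: Int)(implicit ec: ExecutionContext)
    extends SimpleFilter[Req, Rep] {
  def apply(req: Req, next: Service[Req, Rep]): Future[Rep] =
    next(req).recoverWith {
      case _ if retries > 0 => new RetryFilter[Req, Rep](retries - 1).apply(req, next)
    }
}

// Toy statistics filter: counts the requests that pass through it.
final class CountingFilter[Req, Rep] extends SimpleFilter[Req, Rep] {
  val count = new AtomicInteger
  def apply(req: Req, next: Service[Req, Rep]): Future[Rep] = {
    count.incrementAndGet()
    next(req)
  }
}

// Usage: stats -> retry -> a service that fails twice, then succeeds.
implicit val ec: ExecutionContext = ExecutionContext.global
val attempts = new AtomicInteger
val flaky: Service[String, String] = new Service[String, String] {
  def apply(req: String): Future[String] =
    if (attempts.incrementAndGet() < 3) Future.failed(new RuntimeException("transient failure"))
    else Future.successful(req.toUpperCase)
}
val stats = new CountingFilter[String, String]
val retry = new RetryFilter[String, String](retries = 2)
val service: Service[String, String] = stats andThen retry andThen flaky
val reply = Await.result(service("hello"), 1.second)
```

Each filter sees only a request and the next service. The statistics filter therefore counts one logical request, even though the retry filter calls the flaky service three times.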

Functional composition
Given Future[A]
Sequential composition (methods on Future[A])
def map[B](f: A => B): Future[B]
def flatMap[B](f: A => Future[B]): Future[B]
def rescue[B >: A](rescueException: PartialFunction[Throwable, Future[B]]): Future[B]
Concurrent composition (methods on the Future companion object)
def collect[A](fs: Seq[Future[A]]): Future[Seq[A]]
def select[A](fs: Seq[Future[A]]): Future[(Try[A], Seq[Future[A]])]
And more
Future.times(), Future.whileDo() etc.
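The standard library's scala.concurrent.Future offers close counterparts under different names. This sketch assumes the following stdlib analogues:

| Twitter Future | Stdlib analogue | Note |
|---|---|---|
| `rescue` | `recover` | |
| `collect` | `Future.sequence` | |
| `select` | `Future.firstCompletedOf` | Does not return the futures that are still pending |

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Sequential composition: flatMap chains a dependent step, and recover plays
// the role of Twitter's `rescue` (turn selected failures into a value).
val chained: Future[Int] = Future(20).flatMap(x => Future(x + 22))
val parsed: Future[Int] =
  Future("42x").map(_.toInt).recover { case _: NumberFormatException => -1 }

// Concurrent composition: Future.sequence gathers all results in input order.
// Future.firstCompletedOf completes with whichever future finishes first.
val all: Future[Seq[Int]] = Future.sequence(Seq(Future(1), Future(2), Future(3)))
val neverCompletes: Future[Int] = Promise[Int]().future
val first: Future[Int] = Future.firstCompletedOf(Seq(neverCompletes, Future.successful(7)))
```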

Functional composition on RPC calls
Sequential composition
val nearestChannel: Future[Channel] =
  metadataClient.getUserById(uuid) flatMap {
    user => geolocationClient.getNearestChannelId( user.getLocation() )
  } flatMap {
    channelId => metadataClient.getChannelById( channelId )
  }
Concurrent composition
val userF: Future[User] = metadataClient.getUserById(uuid)
val bitsCountF: Future[Integer] = metadataClient.getUserBitsCount(uuid)
val avatarsF: Future[List[Avatar]] = metadataClient.getUserAvatars(uuid)
// Future.join keeps each result's type; Future.collect needs a Seq of one type
// Note: get() blocks, so never call it on a worker thread
val (user, bitsCount, avatars) =
  Future.join(userF, bitsCountF, avatarsF).get()
*All this stuff works in Java just like in Scala, but does not look as cool.
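Here is a runnable version of both patterns, with two substitutions:

- Hypothetical in-memory stub clients replace the Thrift-generated ones.
- scala.concurrent.Future replaces Twitter's Future.

All type names, fields and data are made up for illustration. `zip` combines two differently typed futures into a tuple without blocking.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical domain types and stub clients (illustration only).
final case class User(id: String, location: (Double, Double))
final case class Channel(id: Long, name: String)

object metadataClient {
  def getUserById(id: String): Future[User] = Future(User(id, (55.75, 37.62)))
  def getChannelById(id: Long): Future[Channel] = Future(Channel(id, s"channel-$id"))
  def getUserBitsCount(id: String): Future[Int] = Future(3)
}
object geolocationClient {
  def getNearestChannelId(loc: (Double, Double)): Future[Long] =
    Future(if (loc._1 > 50) 1L else 2L)
}

// Sequential: each call depends on the previous result.
def nearestChannel(uuid: String): Future[Channel] =
  metadataClient.getUserById(uuid)
    .flatMap(user => geolocationClient.getNearestChannelId(user.location))
    .flatMap(channelId => metadataClient.getChannelById(channelId))

// Concurrent: both calls start immediately; zip pairs the typed results.
def profile(uuid: String): Future[(User, Int)] = {
  val userF = metadataClient.getUserById(uuid)
  val bitsF = metadataClient.getUserBitsCount(uuid)
  userF.zip(bitsF)
}
```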

Finagle server: threading model
To achieve high performance (throughput), never block worker threads.
For blocking IO or long computations, delegate to a FuturePool.
val diskIoFuturePool = FuturePool(Executors.newFixedThreadPool(4))
diskIoFuturePool( { scala.io.Source.fromFile(..) } )
The boss thread accepts new client connections and binds each NIO
Channel to a specific worker thread. Worker threads perform all client IO.
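The FuturePool idea can also be sketched with the standard library. A bounded, dedicated executor runs the blocking work, and the caller immediately gets a Future back. Notes on this sketch:

- It uses ExecutionContext.fromExecutorService rather than Twitter's FuturePool.
- The pool size of 4 mirrors the slide.
- `countLines` is a hypothetical example task.

```scala
import java.nio.file.{Files, Path}
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Dedicated, bounded pool for blocking work (disk IO, long computations),
// so that event-loop worker threads are never blocked.
val blockingExecutor = Executors.newFixedThreadPool(4)
val blockingPool: ExecutionContext = ExecutionContext.fromExecutorService(blockingExecutor)

// Hypothetical blocking task: count the lines of a file on the blocking pool.
def countLines(path: Path): Future[Long] =
  Future {
    val lines = Files.lines(path)
    try lines.count() finally lines.close()
  }(blockingPool)
```

Remember to shut the executor down when the application stops. Its threads are non-daemon threads and would otherwise keep the JVM alive.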

More gifts and bonuses from Finagle
In addition to all said before, Finagle has
● Load distribution in N-N topologies - HeapBalancer ("least active
connections") by default
● Client backoff strategies - comes with a TruncatedBinaryBackoff
implementation
● Failure detection
● Failover/Retry
● Connection Pooling
● Distributed Tracing (Zipkin project based on Google Dapper paper)

Finagle, Thrift & Java: lessons learned
Pros
● Gives a lot out of the box
● Production-proven and stable
● Active development community
● Lots of extension points in the library
Cons
● Good for Scala, usable with Java
● Works well with Thrift and HTTP (plus trivial protocols), but lacks
support for Protobuf and other protocols
● Poor exception handling experience with Java (no Scala match-es)
and ugly code
● finagle-thrift is a pain (old libthrift version lock-in, dependency
clashes with Cassandra, cannot return nulls, and more). All these problems
are avoidable, though.
● The cluster scatters and never gathers when the whole Zookeeper ensemble
is down.

Finagle: competitors & alternatives
Trending
● Akka 2.0 (Scala, OpenSource) by Typesafe
● ZeroRPC (Python & Node.js, OpenSource) by DotCloud
● RxJava (Java, OpenSource) by Netflix
Old
● JGroups (Java, OpenSource)
● JBOSS Remoting (Java, OpenSource) by JBOSS
● Spread Toolkit (C/C++, Commercial & OpenSource)

Configuration, deployment,
monitoring and logging
by Aleksei Kornev

Architecture of talkbits service
One way to configure service, logs, metrics.
One way to package and deploy service.
One way to launch a service.
Bundled in one-jar.

Delivery
One delivery unit. Contains:
● Java service - in a single executable fat-jar.
● Installation script - [re]installs the service on the machine and
registers it in /etc/init.d.
● Init.d script - contains instructions to start, stop and restart the
JVM and to get a quick status.

Logging
Configuration
● SLF4J as the API; all other logging libraries are redirected to it
● Logback as the logging implementation
● Each service logs to /var/log/talkbits/... (application logs, GC logs)
● Daily rotation policy applied
● Logs are also sent to loggly.com for aggregation, grouping, etc.
Aggregation
● loggly.com
● sshfs for analyzing logs with Linux tools such as grep, tail, less,
etc.
Aggregation alternatives
Splunk.com, Flume, Scribe, etc...

Metrics
Application metrics and health checks are implemented with the CodaHale library
(metrics.codahale.com), which reports metrics via JMX.
The Jolokia JVM agent (www.jolokia.org/agent/jvm.html) exposes JMX beans
via REST (JSON over HTTP), using the JVM's built-in HTTP server.
The monitoring agent uses the Jolokia REST interface to fetch metrics and send
them to the monitoring system.
All metrics are divided into common metrics (HW, JVM, etc.) and service-specific
metrics.
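A Jolokia read request is a plain HTTP GET of the form /jolokia/read/<mbean>/<attribute>, and it returns JSON. The sketch below makes these assumptions:

- The JVM agent listens on its default port 8778 under the default /jolokia context.
- `jolokiaReadUrl` and `fetch` are hypothetical helpers.
- MBean names containing '/' would need Jolokia's escaping, which is not handled here.

```scala
import scala.io.Source

// Build a Jolokia "read" URL for one attribute of one MBean.
// Assumption: default agent port 8778 and default /jolokia context.
def jolokiaReadUrl(host: String, mbean: String, attribute: String, port: Int = 8778): String =
  s"http://$host:$port/jolokia/read/$mbean/$attribute"

// Fetch the JSON response with a plain HTTP GET (a monitoring agent would parse it).
def fetch(url: String): String = {
  val src = Source.fromURL(url, "UTF-8")
  try src.mkString finally src.close()
}
```

For example, `fetch(jolokiaReadUrl("localhost", "java.lang:type=Memory", "HeapMemoryUsage"))` reads the heap usage of a JVM that runs the agent locally.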

Deployment
Fabric (http://fabfile.org) is used for environment provisioning and
service deployment.
Process
● A Fabric script provisions a new environment (or uses an existing one)
according to the cluster scheme
● Amazon instances are automatically tagged with a list of services
(i.e., instance roles)
● The Fabric script reads the instance roles and deploys (or redeploys)
the appropriate components.

Monitoring
As our monitoring platform we chose Datadoghq.com. Datadog is a SaaS that
is easy to integrate into your infrastructure. The Datadog agent is open
source and implemented in Python. Many predefined checksets (plugins, or
integrations) for popular products are available out of the box,
including JVM, Cassandra, Zookeeper and ElasticSearch.
Datadog provides a REST API.
Alternatives
● Nagios, Zabbix - need a bearded admin on the team. We wanted to
go SaaS and outsource infrastructure as far as possible.
● Amazon CloudWatch, LogicMonitor, ManageEngine, etc.
Process
Each service on a machine has its own monitoring agent instance. If the
node has the 'monitoring-agent' role in the roles tag of its EC2 instance,
a monitoring agent is installed for each service on this node.
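The role-driven install logic from the deployment and monitoring process can be sketched as a pure function. Assumptions:

- The roles tag is a comma-separated list such as "metadata,geolocation,monitoring-agent".
- The `Plan` type and `planFor` function are hypothetical.
- The actual Fabric scripts are not shown in the talk.

```scala
// Hypothetical install plan derived from an EC2 "roles" tag.
final case class Plan(services: List[String], monitoringAgents: List[String])

def planFor(rolesTag: String): Plan = {
  val roles = rolesTag.split(',').map(_.trim).filter(_.nonEmpty).toList.distinct
  val services = roles.filterNot(_ == "monitoring-agent")
  // One monitoring agent per service, but only on nodes tagged 'monitoring-agent'.
  val agents =
    if (roles.contains("monitoring-agent")) services.map(s => s"$s-monitoring-agent")
    else Nil
  Plan(services, agents)
}
```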

QA
Max Alexejev
HTTP://RU.LINKEDIN.COM/PUB/MAX-ALEXEJEV/51/820/AB9
http://www.slideshare.net/MaxAlexejev/
MALEXEJEV@GMAIL.COM
Aleksei Kornev
aleksei.kornev@gmail.com
