JavaOne 2013



  1. 1. Distributed & highly available server applications in Java and Scala
     Max Alexejev, Aleksei Kornev
     JavaOne Moscow 2013
     24 April 2013
  2. 2. What is talkbits?
  3. 3. Architecture
     by Max Alexejev
  4. 4. Lightweight SOA
     Key principles
     ● S1, S2 - edge services
     ● Each service is 0..1 servers and 0..N clients built together
     ● No special "broker" services
     ● All services are stateless
     ● All instances are equal
     What about state?
     State is kept in specialized distributed systems and fronted by specific services. Example follows...
  5. 5. Case study: Talkbits backend
     Recursive call
  6. 6. Requirements for a distributed RPC system
     Must have and nice to have
     ● Elastic and reliable discovery - should handle nodes brought up and shut down transparently and not be a SPOF itself
     ● Support for N-N topology of client and server instances
     ● Disconnect detection and transparent reconnects
     ● Fault tolerance - for example, by retrying on the remaining instances when the called instance goes down
     ● Client backoff built in - i.e., clients should, as far as possible, not overload servers when load spikes
     ● Configurable load distribution - i.e., which server instance to call for a specific request
     ● Configurable networking layer (keepalives & heartbeats, timeouts, connection pools etc.)
     ● Distributed tracing facilities
     ● Portability among different platforms
     ● Distributed stack traces for exceptions
     ● Transactions
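The fault-tolerance requirement above (retry on the remaining instances when the called one goes down) can be sketched in plain Java. This is only an illustration: `FailoverCall`, `callWithFailover` and the node names are hypothetical, and a real RPC library would add discovery, backoff and asynchronous calls on top.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical failover helper: tries each known instance in order until one answers. */
final class FailoverCall {

    static <I, R> R callWithFailover(List<I> instances, Function<I, R> call) {
        RuntimeException last = null;
        for (I instance : instances) {
            try {
                return call.apply(instance);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next instance
            }
        }
        throw new IllegalStateException("all instances failed", last);
    }

    /** Simulates a cluster in which the first node is down. */
    static String demo() {
        List<String> instances = List.of("node-a", "node-b", "node-c");
        return callWithFailover(instances, node -> {
            if (node.equals("node-a")) {
                throw new RuntimeException(node + " is down");
            }
            return "pong from " + node;
        });
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "pong from node-b"
    }
}
```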
  7. 7. "Lean" services
     Key principles to be lightweight and get rid of architectural waste
     ● Java SE
     ● No containers. Even servlet containers are light and built in
     ● Standalone applications: unified configuration, deployment, metrics, logging, single development framework - more on this later
     ● All launched instances are equal and process requests - no "special" nodes or "active-standby" patterns
     ● Minimal dependencies and JAR size
     ● Minimal memory footprint
     ● One service - one purpose
     ● Highly tuned for this one purpose (app, JVM, OS, HW)
     ● Isolated fault domains - i.e., a single datasource or external service is fronted by one service only
     No bloatware in technology stack!
  8. 8. Talkbits implementation choices
     Finagle library acts as a distributed RPC framework. Services are written in Java and Scala and use the Thrift communication protocol.
     Apache Zookeeper: reliable service discovery mechanics. Finagle has a nice built-in integration with Zookeeper.
  9. 9. Finagle server: networking
     Finagle is built on top of Netty - asynchronous, non-blocking TCP server.
     Finagle codec

         trait Codec[Req, Rep]

         class ThriftClientFramedCodec(...) extends Codec[ThriftClientRequest, Array[Byte]] {
           pipeline.addLast("thriftFrameCodec", new ThriftFrameCodec)
           pipeline.addLast("byteEncoder", new ThriftClientChannelBufferEncoder)
           pipeline.addLast("byteDecoder", new ThriftChannelBufferDecoder)
           ...
         }

     Finagle comes with ready-made codecs for Thrift, HTTP, Memcache, Kestrel, HTTP streaming.
  10. 10. Finagle services and filters

          // Service is simply a function from request to a future of response.
          trait Service[Req, Rep] extends (Req => Future[Rep])

          // Filter[A, B, C, D] converts a Service[C, D] to a Service[A, B].
          abstract class Filter[-ReqIn, +RepOut, +ReqOut, -RepIn]
            extends ((ReqIn, Service[ReqOut, RepIn]) => Future[RepOut])

          abstract class SimpleFilter[Req, Rep] extends Filter[Req, Rep, Req, Rep]

          // Service transformation example
          val serviceWithTimeout: Service[Req, Rep] =
            new RetryFilter[Req, Rep](..) andThen
            new TimeoutFilter[Req, Rep](..) andThen
            service

      Finagle comes with rate limiting, retries, statistics, tracing, uncaught exceptions handling, timeouts and more.
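The service-and-filter idea can be sketched in plain Java with `java.util.concurrent.CompletableFuture` standing in for Finagle's `Future`. This is not the Finagle API: the `Service` and `SimpleFilter` interfaces, `retryOnce`, `timeout` and the flaky demo service are all hypothetical names written for this illustration, and `orTimeout` and `failedFuture` assume Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class ServiceFilterSketch {

    /** A service is a function from a request to a future of a response. */
    interface Service<Req, Rep> extends Function<Req, CompletableFuture<Rep>> {}

    /** A simple filter wraps a service of the same type and may alter the call. */
    interface SimpleFilter<Req, Rep> {
        CompletableFuture<Rep> apply(Req request, Service<Req, Rep> next);

        /** Composes this filter with a service, yielding a new service. */
        default Service<Req, Rep> andThen(Service<Req, Rep> service) {
            return request -> apply(request, service);
        }
    }

    /** Hypothetical filter: repeats the call once if the first attempt fails. */
    static <Req, Rep> SimpleFilter<Req, Rep> retryOnce() {
        return (request, next) -> {
            CompletableFuture<Rep> result = new CompletableFuture<>();
            next.apply(request).whenComplete((rep, err) -> {
                if (err == null) {
                    result.complete(rep);
                } else {
                    // First attempt failed: try once more and forward that outcome.
                    next.apply(request).whenComplete((rep2, err2) -> {
                        if (err2 == null) result.complete(rep2);
                        else result.completeExceptionally(err2);
                    });
                }
            });
            return result;
        };
    }

    /** Hypothetical filter: fails the call if no response arrives in time. */
    static <Req, Rep> SimpleFilter<Req, Rep> timeout(long millis) {
        return (request, next) -> next.apply(request).orTimeout(millis, TimeUnit.MILLISECONDS);
    }

    static String demo() {
        AtomicInteger calls = new AtomicInteger();
        // Demo service that fails on its first invocation only.
        Service<String, String> flaky = name -> {
            if (calls.incrementAndGet() == 1) {
                return CompletableFuture.failedFuture(new IllegalStateException("first call fails"));
            }
            return CompletableFuture.completedFuture("hello " + name);
        };
        SimpleFilter<String, String> retry = retryOnce();
        SimpleFilter<String, String> timeLimit = timeout(1000);
        // Same shape as the slide: retry andThen timeout andThen service.
        Service<String, String> guarded = retry.andThen(timeLimit.andThen(flaky));
        return guarded.apply("talkbits").join();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello talkbits"
    }
}
```

Because a filter composed with a service is again a service, callers do not need to know how many filters sit in front of the real endpoint.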
  11. 11. Functional composition
      Given Future[A]
      Sequential composition

          def map[B](f: A => B): Future[B]
          def flatMap[B](f: A => Future[B]): Future[B]
          def rescue[B >: A](rescueException: PartialFunction[Throwable, Future[B]]): Future[B]

      Concurrent composition

          def collect[A](fs: Seq[Future[A]]): Future[Seq[A]]
          def select[A](fs: Seq[Future[A]]): Future[(Try[A], Seq[Future[A]])]

      And more
      times(), whileDo() etc.
  12. 12. Functional composition on RPC calls
      Sequential composition

          val nearestChannel: Future[Channel] =
            metadataClient.getUserById(uuid) flatMap { user =>
              geolocationClient.getNearestChannelId(user.getLocation())
            } flatMap { channelId =>
              metadataClient.getChannelById(channelId)
            }

      Concurrent composition

          val userF: Future[User] = metadataClient.getUserById(uuid)
          val bitsCountF: Future[Integer] = metadataClient.getUserBitsCount(uuid)
          val avatarsF: Future[List[Avatar]] = metadataClient.getUserAvatars(uuid)

          // collect returns a Seq, so the results are matched with a Seq pattern
          val Seq(user, bitsCount, avatars) =
            Future.collect(Seq(userF, bitsCountF, avatarsF)).get()

      *All this stuff works in Java just like in Scala, but does not look as cool.
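For the Java side of the same idea, here is a rough sketch using the JDK's `CompletableFuture` rather than Finagle's `Future`: `thenCompose` plays the role of `flatMap`, and `allOf` approximates `collect`. The three "client" methods are hypothetical stand-ins for the RPC clients on the slide and return canned values.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class CompositionSketch {

    // Hypothetical stand-ins for the RPC clients; each returns a canned value asynchronously.
    static CompletableFuture<String> getUserLocation(String userId) {
        return CompletableFuture.supplyAsync(() -> "55.75,37.61");
    }

    static CompletableFuture<Integer> getNearestChannelId(String location) {
        return CompletableFuture.supplyAsync(() -> 42);
    }

    static CompletableFuture<String> getChannelById(int channelId) {
        return CompletableFuture.supplyAsync(() -> "channel-" + channelId);
    }

    /** Sequential composition: each call starts only after the previous one completes. */
    static String nearestChannel(String userId) {
        return getUserLocation(userId)
                .thenCompose(CompositionSketch::getNearestChannelId)
                .thenCompose(CompositionSketch::getChannelById)
                .join();
    }

    /** Concurrent composition: three independent calls are in flight at once. */
    static String profile(String userId) {
        CompletableFuture<String> nameF = CompletableFuture.supplyAsync(() -> "user-" + userId);
        CompletableFuture<Integer> bitsCountF = CompletableFuture.supplyAsync(() -> 7);
        CompletableFuture<List<String>> avatarsF =
                CompletableFuture.supplyAsync(() -> List.of("a.png", "b.png"));

        // allOf completes once every future has completed; the joins below then return at once.
        CompletableFuture.allOf(nameF, bitsCountF, avatarsF).join();
        return nameF.join() + " has " + bitsCountF.join() + " bits and "
                + avatarsF.join().size() + " avatars";
    }

    public static void main(String[] args) {
        System.out.println(nearestChannel("u1")); // prints "channel-42"
        System.out.println(profile("u1"));        // prints "user-u1 has 7 bits and 2 avatars"
    }
}
```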
  13. 13. Finagle server: threading model
      To achieve high performance (throughput), you should never block worker threads.
      For blocking IO or long computations, delegate to FuturePool.

          val diskIoFuturePool = FuturePool(Executors.newFixedThreadPool(4))
          diskIoFuturePool( { scala.io.Source.fromFile(..) } )

      Boss thread accepts new client connections and binds NIO Channel to a specific worker thread.
      Worker threads perform all client IO.
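The same rule (keep blocking calls off the IO worker threads) can be illustrated with the JDK alone. This sketch uses `CompletableFuture.supplyAsync` with a dedicated executor as a stand-in for `FuturePool`; the class and method names are hypothetical, and `Files.readString` assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BlockingPoolSketch {

    /** Runs the blocking read on the given pool and returns a future immediately. */
    static CompletableFuture<String> readAsync(Path file, ExecutorService diskIoPool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return Files.readString(file); // blocking call, executed on a pool thread
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, diskIoPool);
    }

    static String demo() {
        ExecutorService diskIoPool = Executors.newFixedThreadPool(4);
        try {
            Path file = Files.createTempFile("futurepool", ".txt");
            try {
                Files.writeString(file, "hello from disk");
                // The caller waits here only for the demo; a server would chain callbacks instead.
                return readAsync(file, diskIoPool).join();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            diskIoPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello from disk"
    }
}
```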
  14. 14. More gifts and bonuses from Finagle
      In addition to everything said before, Finagle has
      ● Load distribution in N-N topologies - HeapBalancer ("least active connections") by default
      ● Client backoff strategies - comes with a TruncatedBinaryBackoff implementation
      ● Failure detection
      ● Failover/Retry
      ● Connection pooling
      ● Distributed tracing (Zipkin project based on the Google Dapper paper)
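As background for the client backoff bullet, this is a sketch of the textbook truncated binary exponential backoff calculation. It is not Finagle's `TruncatedBinaryBackoff` code; `BackoffSketch`, its method names and the 100 ms slot size are assumptions made for the example.

```java
import java.util.concurrent.ThreadLocalRandom;

final class BackoffSketch {

    /**
     * Largest number of wait slots for a given attempt: 2^min(attempt, maxExponent) - 1.
     * The exponent stops growing at maxExponent, which is the "truncated" part.
     * maxExponent is expected to stay well below 63.
     */
    static long maxSlots(int attempt, int maxExponent) {
        int exponent = Math.min(attempt, maxExponent);
        return (1L << exponent) - 1;
    }

    /** Picks a random delay between 0 and maxSlots * slotMillis, inclusive. */
    static long delayMillis(int attempt, int maxExponent, long slotMillis) {
        long slots = ThreadLocalRandom.current().nextLong(maxSlots(attempt, maxExponent) + 1);
        return slots * slotMillis;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.printf("attempt %d: wait up to %d ms, chose %d ms%n",
                    attempt, maxSlots(attempt, 4) * 100, delayMillis(attempt, 4, 100));
        }
    }
}
```

The random choice within the window keeps clients that failed at the same moment from retrying in lockstep.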
  15. 15. Finagle, Thrift & Java: lessons learned
      Pros
      ● Gives a lot out of the box
      ● Production-proven and stable
      ● Active development community
      ● Lots of extension points in the library
      Cons
      ● Good for Scala, usable with Java
      ● Works well with Thrift and HTTP (plus trivial protocols), but lacks support for Protobuf and other stuff
      ● Poor exception handling experience in Java (no Scala pattern matching) and ugly code
      ● finagle-thrift is a pain (old libthrift version lock-in, Cassandra dependencies clash, cannot return nulls, and more). All problems are avoidable, though.
      ● Cluster scatters and never gathers when the whole Zookeeper ensemble is down.
  16. 16. Finagle: competitors & alternatives
      Trending
      ● Akka 2.0 (Scala, open source) by Typesafe
      ● ZeroRPC (Python & Node.js, open source) by DotCloud
      ● RxJava (Java, open source) by Netflix
      Old
      ● JGroups (Java, open source)
      ● JBoss Remoting (Java, open source) by JBoss
      ● Spread Toolkit (C/C++, commercial & open source)
  17. 17. Configuration, deployment, monitoring and logging
      by Aleksei Kornev
  18. 18. Get stuff done...
  19. 19. Typical application
  20. 20. Architecture of talkbits service
      One way to configure service, logs, metrics.
      One way to package and deploy service.
      One way to launch service.
      Bundled in one-jar.
  21. 21. Delivery
      One delivery unit. Contains:
      ● Java service - in a single executable fat-jar.
      ● Installation script - [re]installs service on the machine, registers it in /etc/init.d
      ● Init.d script - contains instructions to start, stop, restart JVM and get quick status.
  22. 22. Logging
      Configuration
      ● SLF4J as an API, all other libraries redirected
      ● Logback as a logging implementation
      ● Each service logs to /var/log/talkbits/... (application logs, GC logs)
      ● Daily rotation policy applied
      ● Also sent to … for aggregation, grouping etc.
      Aggregation
      ● sshfs for analyzing logs by means of Linux tools such as grep, tail, less, etc.
      Aggregation, Flume, Scribe, etc.
  23. 23. Metrics
      Application metrics and health checks are implemented with the Coda Hale library. Coda Hale reports metrics via JMX.
      The Jolokia JVM agent exposes JMX beans via REST (JSON / HTTP), using the JVM's internal HTTP server.
      The monitoring agent uses the Jolokia REST interface to fetch metrics and send them to the monitoring system.
      All metrics are divided into common metrics (HW, JVM, etc.) and service-specific metrics.
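To show what "reporting a metric via JMX" amounts to underneath, here is a small sketch that uses only the JDK's `javax.management` API. It does not use the Coda Hale library or Jolokia; the `RequestCounter` bean and the `talkbits.example` domain are hypothetical names chosen for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxMetricSketch {

    /** Standard MBean interface: JMX derives the read-only attribute "Count" from getCount(). */
    public interface RequestCounterMBean {
        long getCount();
    }

    /** Hypothetical service-specific metric: a simple request counter. */
    public static final class RequestCounter implements RequestCounterMBean {
        private final AtomicLong count = new AtomicLong();

        void mark() {
            count.incrementAndGet();
        }

        @Override
        public long getCount() {
            return count.get();
        }
    }

    static long demo() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("talkbits.example:type=RequestCounter");
            RequestCounter counter = new RequestCounter();
            server.registerMBean(counter, name);
            try {
                counter.mark();
                counter.mark();
                // An agent attached to this JVM would read the same attribute remotely.
                return (Long) server.getAttribute(name, "Count");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```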
  24. 24. Deployment
      Fabric is used for environment provisioning and service deployment.
      Process
      ● Fabric script provisions a new environment (or uses an existing one) according to the cluster scheme
      ● Amazon instances are automatically tagged with the list of services (i.e., instance roles)
      ● Fabric script reads instance roles and deploys (redeploys) the appropriate components.
  25. 25. Monitoring
      As a monitoring platform we chose Datadog. Datadog is a SaaS that is easy to integrate into your infrastructure. The Datadog agent is open source and implemented in Python. There are many predefined checksets (plugins, or integrations) for popular products out of the box, including JVM, Cassandra, Zookeeper and ElasticSearch.
      Datadog provides a REST API.
      Alternatives
      ● Nagios, Zabbix - need to have a bearded admin in the team. We wanted to go SaaS and outsource infrastructure as far as possible.
      ● Amazon CloudWatch, LogicMonitor, ManageEngine, etc.
      Process
      Each service has its own monitoring agent instance on a single machine. If a node has the monitoring-agent role in the roles tag of its EC2 instance, a monitoring agent will be installed for each service on this node.
  26. 26. Talkbits cluster structure
  27. 27. Q&A
      Max Alexejev
      HTTP://RU.LINKEDIN.COM/PUB/MAX-ALEXEJEV/51/820/AB9