Modern Distributed Messaging and RPC

Tech talk I gave at the Moscow Big Systems / Big Data meetup in October 2012.

Published in: Technology
  • For more (and better) slides on Twitter Finagle also check http://www.slideshare.net/MaxAlexejev/distributed-highly-available-systems-in-java-and-scala

  1. LIGHTWEIGHT MESSAGING AND RPC IN DISTRIBUTED SYSTEMS
     Max A. Alexejev
     11.10.2012
  2. Some Theory to start with…
  3. Messaging System
     Message (not packet/byte/…) as a minimal transmission unit.
     The whole system unifies:
     • Underlying protocol (TCP, UDP)
     • UNICAST or MULTICAST
     • Data format (message types & structure)
     Tied with:
     • Serialization format (text or binary)
  4. Typical peer-to-peer messaging
     Producer [host, port] → Consumer [host, port]
  5. Typical broker-based messaging
     Producer [bhost, bport] → Broker ← Consumer [bhost, bport]
     • Broker is an indirection layer between producer and consumer.
     • Producer PUSHes messages to broker.
     • Consumer PULLs messages from broker.
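The push/pull indirection on the slide above can be sketched with an in-memory stand-in for the broker. This is only an illustration: the `Broker` class, its `push`/`pull` methods, and the `orders` topic are hypothetical names, not the API of any product discussed in the talk.

```python
from collections import deque


class Broker:
    """In-memory stand-in for a broker (hypothetical, for illustration only)."""

    def __init__(self):
        self._topics = {}  # topic name -> queue of pending messages

    def push(self, topic, message):
        # Producers only know the broker, never the consumers.
        self._topics.setdefault(topic, deque()).append(message)

    def pull(self, topic):
        # Consumers ask for the next message; None means nothing is pending.
        pending = self._topics.get(topic)
        return pending.popleft() if pending else None


broker = Broker()
broker.push("orders", "order-1")
broker.push("orders", "order-2")

first = broker.pull("orders")
second = broker.pull("orders")
third = broker.pull("orders")  # queue is empty by now
```

Neither side holds the other's address, which is the decoupling the broker exists to provide.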
  6. The trick is…
     Producer [bhost, bport] → Broker ← Consumer [bhost, bport]
     • Producers and consumers are logical units.
     • Both P and C may be launched in multiple instances.
     • p2p and pubsub terms are expressed in terms of these logical (!) units.
     • Even the broker may be a distributed or replicated entity.
  7. Generic SOA picture
     [Diagram: services S1, S2, S3, S4, S5, S6]
  8. In a generic case
     • A service may be both a consumer for many producers and a producer to many consumers.
  9. Characteristics and Features
     • Topology (1-1, 1-N, N-N)
     • Retries
     • Service discovery
     • Guaranteed delivery (if yes: at-least-once or exactly-once)
     • Ordering
     • Acknowledge
     • Disconnect detection
     • Transactions support (can participate in distributed transactions)
     • Persistence
     • Portability (one or many languages and platforms)
     • Distributed or not
     • Highly available or not
     • Type (p2p or broker-based)
     • Load balancing strategy for consumers
     • Client backoff strategy for producers
     • Tracing support
     • Library or standalone software
  10. Main classes
      • ESBs (enterprise service buses)
        – Slow, but most feature-rich. MuleESB, JBossESB, Apache Camel, many commercial.
      • JMS implementations
        – ActiveMQ, JBoss Messaging, GlassFish, etc.
      • AMQP implementations
        – RabbitMQ, Qpid, HornetQ, etc.
      • Lightweight modern stuff, unstandardized
        – ZeroMQ, Finagle, Kafka, Beanstalkd, etc.
  11. Messaging Performance
      As usual, it's about throughput and latency…
      Major throughput factors:
      – Network hardware used
      – UNICAST vs MULTICAST (for fan-out)
      Major latency factors:
      – Persistence (batched or single-message persistence involves sequential or random disk writes)
      – Transactions
      – Broker replication
      – Delivery guarantees (at-least-once & exactly-once)
  12. Guaranteed delivery
      Involves additional logic on the Producer, the Consumer, and the Broker (if any)!
      This is at-least-once delivery:
      • Producer needs to get ack'ed by Broker
      • Consumer needs to track the high-watermark of messages received from Broker
      Exactly-once delivery requires more work and is even more expensive. Typically implemented as a 2-phase commit.
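The two at-least-once ingredients from the slide above (producer waits for an ack, consumer tracks a high-watermark) can be sketched as follows. Everything here is an assumption made for illustration: `FlakyBroker` deliberately "loses" the first ack for every message to force a retry, sequence numbers start at 1, and messages are assumed to reach the consumer in sequence order.

```python
class FlakyBroker:
    """Hypothetical broker that stores every attempt but drops the first ack."""

    def __init__(self):
        self.log = []        # stored (seq, payload) pairs; duplicates are possible
        self._attempts = {}  # seq -> number of publish attempts seen

    def publish(self, seq, payload):
        self.log.append((seq, payload))
        self._attempts[seq] = self._attempts.get(seq, 0) + 1
        return self._attempts[seq] > 1  # False simulates a lost ack


def send_at_least_once(broker, seq, payload, max_attempts=5):
    """Producer side: resend until the broker acknowledges."""
    for _ in range(max_attempts):
        if broker.publish(seq, payload):
            return True
    return False


class Consumer:
    """Consumer side: drop anything at or below the high-watermark."""

    def __init__(self):
        self.high_watermark = 0
        self.processed = []

    def handle(self, seq, payload):
        if seq <= self.high_watermark:
            return  # duplicate caused by a producer retry
        self.processed.append(payload)
        self.high_watermark = seq


broker = FlakyBroker()
for seq, payload in enumerate(["a", "b", "c"], start=1):
    delivered = send_at_least_once(broker, seq, payload)

consumer = Consumer()
for seq, payload in broker.log:
    consumer.handle(seq, payload)
```

The broker ends up holding each message twice, yet the consumer processes each one once. That deduplication lives entirely on the consumer side, which is why at-least-once is much cheaper than a coordinated exactly-once protocol.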
  13. Ordering (distributed broker scenario)
      • No ordering: consumers receive messages in any order. Very cheap.
      • Partitioned ordering: messages are ordered within a single data partition, such as a stock symbol, account number, etc. It is possible to create a well-performing implementation of a distributed broker.
      • Global (fair) ordering: all incoming messages are fairly ordered. Scalability and performance are limited.
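Partitioned ordering can be sketched as routing by a hash of the partition key, so that all messages for one key land in the same ordered partition. The partition count, the CRC32 hash, and the sample stock symbols below are assumptions chosen for the example, not taken from any specific broker.

```python
import zlib

NUM_PARTITIONS = 4  # assumed value for the example


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partition key (e.g. a stock symbol) to a stable partition index."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is an independent, append-ordered list.
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [("AAPL", 1), ("MSFT", 1), ("AAPL", 2), ("MSFT", 2), ("AAPL", 3)]
for symbol, seq in events:
    partitions[partition_for(symbol)].append((symbol, seq))


def sequence_for(symbol):
    """Read back one key's events in the order its partition stored them."""
    return [seq for s, seq in partitions[partition_for(symbol)] if s == symbol]
```

Order is preserved per key because a key never changes partition. Nothing is promised about the relative order of events for different keys, which is what lets partitions be spread across brokers.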
  14. Remote procedure calls
      Inherently built on top of some messaging.
      Method call as a minimal unit (3 outcomes: may succeed returning an optional value, throw an exception, or time out).
      Adds some RPC-specific characteristics & features:
      • Sync or async
      • Distributed stack traces for exceptions
      • Interfaces and structs declaration (possibly via some DSL), which often comes with the serialization library
      • May support schema evolution
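The three possible outcomes of a remote call can be made explicit in code. The sketch below uses a local thread pool as a stand-in for the network, so it only models the caller's view; `call`, `divide`, and `slow` are hypothetical names introduced for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as CallTimeout


def call(executor, fn, *args, timeout=0.5):
    """Run fn as if it were remote and classify the outcome."""
    future = executor.submit(fn, *args)
    try:
        return ("ok", future.result(timeout=timeout))
    except CallTimeout:
        # No answer in time: the caller cannot tell whether the call ran.
        return ("timeout", None)
    except Exception as exc:
        # The callee raised; the exception travels back to the caller.
        return ("error", exc)


def divide(a, b):
    return a / b


def slow():
    time.sleep(0.2)
    return "late"


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = call(pool, divide, 6, 3)
    failed = call(pool, divide, 1, 0)
    timed_out = call(pool, slow, timeout=0.05)
```

The timeout case is the one local method calls do not have: the work may still complete on the other side, so retrying is only safe for idempotent operations.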
  15. Serialization libraries
      Currently, there are 4 clear winners:
      1. Google Protocol Buffers (with ProtoStuff)
      2. Apache Thrift
      3. Avro
      4. MessagePack
      All provide DSLs and schema evolution. The difference is in the wire format and the form of the DSL compiler (a program in C, in Java, or no compilation required).
  16. Messaging vs RPC
      Messaging
      • In the broker-enabled case, producers are decoupled from consumers. Just push a message and don't care who pulls it.
      • Natively matches messages to events in event-sourcing architectures.
      RPC
      • Need to know the destination (i.e., service A must know service B and the call signature).
      Messaging and RPC dictate different programming models. RPC requires higher coupling between interacting services.
  17. And Practice to continue!
  18. Today's Overview
      • ZeroMQ: broker[less] peer-to-peer messaging
      • Apache Kafka: broker-enabled persistent distributed pubsub
      • Twitter Finagle: multi-paradigm and feature-rich RPC in Scala
  19. ZeroMQ
      "It's sockets on steroids. It's like mailboxes with routing. It's fast! Things just become simpler. Complexity goes away. It opens the mind. Others try to explain by comparison. It's smaller, simpler, but still looks familiar."
      @ ZeroMQ 2.2 Guide
  20. ZeroMQ - features
      • Topology – all, very flexible.
      • Retries – no.
      • Service discovery – no.
      • Guaranteed delivery – no.
      • Acknowledge – no.
      • Disconnect detection – no.
      • Transactions support (can participate in distributed transactions) – no.
      • Persistence – kind of.
      • Portability (one or many languages and platforms) – yes, there are many bindings. However, the library itself is written in C, so there's only one "native" binding.
      • Distributed – yes.
      • Highly available or not – no.
      • Type (p2p or broker-based) – mostly p2p. In the case of an N-N topology, a broker is needed in the form of a ZMQ "Device" with ROUTER/DEALER type sockets.
      • Load balancing strategy for consumers – yes (???).
      • Client backoff strategy for producers – no.
      • Tracing support – no.
      • Library or standalone software – platform-native library + language bindings.
  21. ZeroMQ – features explained
      Aren't there too many "no"s?
      Yes and no. Most of the features are not provided out of the box, but may be implemented manually in the client and/or server.
      Some features are easy to implement (heartbeats, acks, retries, …), some are very complex (guaranteed delivery, persistence, high availability).
  22. ZeroMQ – what's bad about it
      • First of all, the name. Think of ZMQ as a sockets library and you're happy. Consider it messaging middleware and you get frustrated just reading the guide.
      • Complex implementation for multithreaded clients and servers.
      • There were issues with services going down due to corrupted packets (so it may not be suitable for WAN).
      • Some mess with the development process. The initial ZMQ developers forked ZMQ as Crossroads.io.
  23. ZeroMQ – what's good
      • Huge list of supported platforms.
      • MULTICAST support for fan-out (1-N) topology.
      • High raw performance.
      • Fluent connect/disconnect/reconnect behavior; it really feels how it should be.
      • Wants to be part of the Linux kernel.
  24. ZeroMQ – verdict
      • Good for non-reliable, high-performance communication, when delivery semantics are not strict. Example: the ngx-zeromq module for NGINX.
      • Good if you can invest sufficient effort in building a custom messaging platform on top of ZMQ as a network library. Example: the ZeroRPC lib by DotCloud.
      • Bad for any other purpose.
  25. Apache Kafka
      "We have built a novel messaging system for log processing called Kafka that combines the benefits of traditional log aggregators and messaging systems. On the one hand, Kafka is distributed and scalable, and offers high throughput. On the other hand, Kafka provides an API similar to a messaging system and allows applications to consume log events in real time."
      @ Kafka: a Distributed Messaging System for Log Processing, LinkedIn
  26. Kafka - features
      • Topology – all.
      • Retries – no.
      • Service discovery – yes (ZooKeeper).
      • Guaranteed delivery – no (at-least-once in the normal case).
      • Acknowledge – no.
      • Disconnect detection – yes (ZooKeeper).
      • Transactions support (can participate in distributed transactions) – no.
      • Persistence – yes.
      • Portability (one or many languages and platforms) – no.
      • Distributed – yes.
      • Highly available or not – no (work in progress).
      • Type (p2p or broker-based) – broker-enabled with distributed broker.
      • Load balancing strategy for consumers – yes.
      • Client backoff strategy for producers – yes.
      • Tracing support – no.
      • Library or standalone software – standalone + client libraries in Java.
  27. Kafka - Architecture
  28. Kafka - Internals
      • Fast writes
        – Configurable batching
        – All writes are continuous, no need for random disk access (i.e., works well on commodity SATA/SAS disks in RAID arrays)
      • Fast reads
        – O(1) disk search
        – Extensive use of sendfile()
        – No in-memory data caching inside Kafka; it fully relies on the OS file system's page cache
      • Elastic horizontal scalability
        – ZooKeeper is used for broker and consumer discovery
        – Pubsub topics are distributed among brokers
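The "continuous writes" and "O(1) disk search" points can be illustrated with an append-only log in which a message's identifier is simply its byte offset. This is a sketch under that assumption, not Kafka's actual storage code: it uses an in-memory `io.BytesIO` instead of a file, a 4-byte length prefix as the record format, and a hypothetical `AppendOnlyLog` class.

```python
import io
import struct


class AppendOnlyLog:
    """Append-only log addressed by byte offset (illustrative, in-memory)."""

    _HEADER = struct.Struct(">I")  # 4-byte big-endian payload length

    def __init__(self):
        self._buf = io.BytesIO()

    def append(self, payload):
        # Writes always go to the end, so the access pattern stays sequential.
        offset = self._buf.seek(0, io.SEEK_END)
        self._buf.write(self._HEADER.pack(len(payload)) + payload)
        return offset

    def read(self, offset):
        # One seek straight to the record: no index lookup, no scanning.
        self._buf.seek(offset)
        (length,) = self._HEADER.unpack(self._buf.read(self._HEADER.size))
        payload = self._buf.read(length)
        return payload, offset + self._HEADER.size + length


log = AppendOnlyLog()
first_offset = log.append(b"hello")
second_offset = log.append(b"kafka!")

payload, next_offset = log.read(first_offset)
```

Each read also returns the offset of the next record, so a consumer only has to remember a single number to resume where it left off.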
  29. Kafka - conclusion
      • Good for event-sourcing architectures (especially once HA support for brokers is added).
      • Good for decoupling the incoming stream from processing, to withstand request spikes.
      • Very good for log aggregation and monitoring data collection.
      • Bad for transactional messaging with rich delivery semantics (exactly-once, etc.).
  30. Twitter Finagle
      "Finagle is a protocol-agnostic, asynchronous RPC system for the JVM that makes it easy to build robust clients and servers in Java, Scala, or any JVM-hosted language. Finagle supports a wide variety of request/response-oriented RPC protocols and many classes of streaming protocols."
      @ Twitter Engineering Blog
  31. Finagle - features
      • Topology – all, very flexible.
      • Retries – yes.
      • Service discovery – yes (ZooKeeper).
      • Guaranteed delivery – no.
      • Acknowledge – no.
      • Disconnect detection – yes.
      • Transactions support (can participate in distributed transactions) – no.
      • Persistence – no.
      • Portability (one or many languages and platforms) – JVM only.
      • Distributed – yes.
      • Highly available – yes.
      • Type (p2p or broker-based) – p2p.
      • Load balancing strategy for consumers – yes (least connections, etc.).
      • Client backoff strategy for producers – yes (limited exponential).
      • Tracing support – yes (Zipkin).
      • Library or standalone software – Scala library.
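A "limited exponential" client backoff can be sketched as delays that grow by a constant factor until they hit a cap. The sketch is generic and in Python rather than Scala; it does not reproduce Finagle's API, and the function names, default values, and the use of `ConnectionError` as the retryable failure are all assumptions for the example.

```python
def backoff_delays(base=0.1, factor=2.0, cap=1.0, attempts=6):
    """Yield capped exponential delays, in seconds, one per attempt."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(operation, sleep, **policy):
    """Retry operation on ConnectionError, sleeping between attempts."""
    last_error = None
    for delay in backoff_delays(**policy):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)
    raise last_error


delays = list(backoff_delays())

# Demo: an operation that fails twice, then succeeds. The sleep function is
# injected, so the example records the delays instead of actually waiting.
slept = []
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server busy")
    return "done"


result = call_with_backoff(flaky, slept.append)
```

The cap is the "limited" part: without it, a long outage would push the delay so high that clients take minutes to notice the server has recovered.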
  32. Finagle – from authors
      Finagle provides a robust implementation of:
      • connection pools, with throttling to avoid TCP connection churn;
      • failure detectors, to identify slow or crashed hosts;
      • failover strategies, to direct traffic away from unhealthy hosts;
      • load-balancers, including "least-connections" and other strategies;
      • back-pressure techniques, to defend servers against abusive clients and dogpiling.
      Additionally, Finagle makes it easier to build and deploy a service that
      • publishes standard statistics, logs, and exception reports;
      • supports distributed tracing (a la Dapper) across protocols;
      • optionally uses ZooKeeper for cluster management; and
      • supports common sharding strategies.
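The "least-connections" strategy named above picks the host with the fewest in-flight requests. The sketch below shows the idea only; it is not Finagle's implementation, and the `LeastConnectionsBalancer` class and host names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Route each request to the host with the fewest active connections."""

    def __init__(self, hosts):
        self._active = {host: 0 for host in hosts}

    def acquire(self):
        # Ties go to the host listed first (dicts keep insertion order).
        host = min(self._active, key=self._active.get)
        self._active[host] += 1
        return host

    def release(self, host):
        # Call this when the request finishes, successfully or not.
        self._active[host] -= 1


balancer = LeastConnectionsBalancer(["a", "b", "c"])
picks = [balancer.acquire() for _ in range(3)]  # one request per host

balancer.release("b")             # host b finishes its request first
next_pick = balancer.acquire()    # so b is the least loaded host again
```

Compared with round-robin, this adapts to slow hosts automatically: a host that answers slowly accumulates open connections and receives less new traffic.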
  33. Finagle – Layered architecture
  34. Finagle - Filters
  35. Finagle – Event loop
  36. Finagle – Future pools
  37. Finagle - conclusion
      • Good for complex JVM-based RPC architectures.
      • Very good for Scala, worse experience with Java (but yes, they have some utility classes).
      • Works well with Thrift and HTTP (plus trivial protocols), but lacks support for Protobuf and other popular formats.
      • Active developer community (Google group), but project infrastructure (Maven repo, versioning, etc.) is still being improved.
  38. Resources
      • Moscow Big Systems / Big Data group: http://www.meetup.com/bigmoscow/
      • http://www.zeromq.org
      • http://zerorpc.dotcloud.com
      • http://kafka.apache.org
      • http://twitter.github.io/finagle/
  39. QUESTIONS? AND CONTACTS
      HTTP://MAKSIMALEKSEEV.MOIKRUG.RU/
      HTTP://RU.LINKEDIN.COM/PUB/MAX-ALEXEJEV/51/820/AB9
      HTTP://WWW.SLIDESHARE.NET/MAXALEXEJEV
      MALEXEJEV@GMAIL.COM
      SKYPE: MALEXEJEV
