Modern Distributed Messaging and RPC

Tech talk I gave at the Moscow Big Systems / Big Data meetup in October 2012.

For more (and better) slides on Twitter Finagle, see also http://www.slideshare.net/MaxAlexejev/distributed-highly-available-systems-in-java-and-scala

    1. LIGHTWEIGHT MESSAGING AND RPC IN DISTRIBUTED SYSTEMS
       Max A. Alexejev, 11.10.2012
    2. Some theory to start with…
    3. Messaging system
       A message (not a packet, byte, …) is the minimal transmission unit.
       The whole system unifies:
       • Underlying protocol (TCP, UDP)
       • UNICAST or MULTICAST
       • Data format (message types & structure)
       Tied to:
       • Serialization format (text or binary)
    4. Typical peer-to-peer messaging
       Diagram: Producer [host, port] → Consumer [host, port]
    5. Typical broker-based messaging
       Diagram: Producer [bhost, bport] → Broker → Consumer [bhost, bport]
       • The broker is an indirection layer between producer and consumer.
       • The producer PUSHes messages to the broker.
       • The consumer PULLs messages from the broker.
    6. The trick is…
       Diagram: Producer [bhost, bport] → Broker → Consumer [bhost, bport]
       • Producers and consumers are logical units.
       • Both P and C may be launched as multiple instances.
       • The terms p2p and pubsub are defined with respect to these logical (!) units.
       • Even the broker may be a distributed or replicated entity.
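As a rough illustration of slides 5 and 6, here is a single-process Python sketch. The `InMemoryBroker` class and the `orders` topic are hypothetical, and a real broker is networked, persistent and possibly distributed. A producer pushes messages into the broker. Two instances of one logical consumer pull from the same queue, so each message reaches only one of them.

```python
from collections import deque


class InMemoryBroker:
    """Hypothetical toy broker: an indirection layer with one FIFO queue per topic."""

    def __init__(self):
        self._queues = {}

    def push(self, topic, message):
        # Producers only need the broker's address, never the consumer's.
        self._queues.setdefault(topic, deque()).append(message)

    def pull(self, topic):
        # Consumers pull at their own pace; None means "nothing to read yet".
        queue = self._queues.get(topic)
        return queue.popleft() if queue else None


broker = InMemoryBroker()
for i in range(4):
    broker.push("orders", f"order-{i}")

# Two instances of the same logical consumer compete for the topic's messages.
instances = {"consumer-a": [], "consumer-b": []}
while True:
    progressed = False
    for name, received in instances.items():
        msg = broker.pull("orders")
        if msg is not None:
            received.append(msg)
            progressed = True
    if not progressed:
        break

print(instances)  # {'consumer-a': ['order-0', 'order-2'], 'consumer-b': ['order-1', 'order-3']}
```

In this sketch, adding a third consumer instance spreads the work further without the producer changing at all. That is the decoupling the broker provides.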
    7. Generic SOA picture
       Diagram: services S1, S2, S3, S4, S5, S6
    8. In the general case
       • A service may be both a consumer for many producers and a producer for many consumers.
    9. Characteristics and features
       • Topology (1-1, 1-N, N-N)
       • Retries
       • Service discovery
       • Guaranteed delivery (if yes: at-least-once or exactly-once)
       • Ordering
       • Acknowledge
       • Disconnect detection
       • Transactions support (can participate in distributed transactions)
       • Persistence
       • Portability (one or many languages and platforms)
       • Distributed or not
       • Highly available or not
       • Type (p2p or broker-based)
       • Load balancing strategy for consumers
       • Client backoff strategy for producers
       • Tracing support
       • Library or standalone software
    10. Main classes
        • ESBs (enterprise service buses)
          – Slow, but the most feature-rich: MuleESB, JBoss ESB, Apache Camel, many commercial products.
        • JMS implementations
          – ActiveMQ, JBoss Messaging, GlassFish, etc.
        • AMQP implementations
          – RabbitMQ, Qpid, HornetQ, etc.
        • Lightweight modern tools (not standardized)
          – ZeroMQ, Finagle, Kafka, Beanstalkd, etc.
    11. Messaging performance
        As usual, it’s about throughput and latency…
        Major throughput factors:
        – Network hardware used
        – UNICAST vs MULTICAST (for fan-out)
        Major latency factors:
        – Persistence (batched or single-message persistence involves sequential or random disk writes)
        – Transactions
        – Broker replication
        – Delivery guarantees (at-least-once & exactly-once)
    12. Guaranteed delivery
        Involves additional logic on the Producer, the Consumer and the Broker (if any)!
        For at-least-once delivery:
        • The Producer needs to get ack’ed by the Broker.
        • The Consumer needs to track the high-watermark of messages received from the Broker.
        Exactly-once delivery requires more work and is even more expensive. It is typically implemented as a two-phase commit.
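Here is a minimal Python sketch of the at-least-once scheme above. It assumes a single producer that numbers its messages sequentially, and `LossyAckBroker`, `produce` and `Consumer` are hypothetical names. When an ack is lost, the producer resends, so the broker stores a duplicate. The consumer's high-watermark lets it skip anything it has already processed.

```python
class LossyAckBroker:
    """Hypothetical broker that stores every message; acks for chosen attempts get 'lost'."""

    def __init__(self, lost_ack_attempts):
        self.log = []                              # everything received, duplicates included
        self._attempt = 0
        self._lost = set(lost_ack_attempts)

    def receive(self, seq, payload):
        self._attempt += 1
        self.log.append((seq, payload))            # the message is stored...
        return self._attempt not in self._lost     # ...but its ack may never arrive


def produce(broker, messages, max_attempts=5):
    """Producer side: resend each message until the broker acks it."""
    for seq, payload in enumerate(messages, start=1):
        for _ in range(max_attempts):
            if broker.receive(seq, payload):       # True means the ack arrived
                break
        else:
            raise RuntimeError(f"message {seq} was never acknowledged")


class Consumer:
    """Consumer side: the high-watermark is the last sequence number processed."""

    def __init__(self):
        self.high_watermark = 0
        self.processed = []

    def poll(self, log):
        for seq, payload in log:
            if seq <= self.high_watermark:
                continue                           # already seen: a retry duplicate or a re-read
            self.processed.append(payload)
            self.high_watermark = seq


broker = LossyAckBroker(lost_ack_attempts={2})
produce(broker, ["a", "b", "c"])
consumer = Consumer()
consumer.poll(broker.log)
consumer.poll(broker.log)                          # re-reading the log is harmless
print(broker.log)          # [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
print(consumer.processed)  # ['a', 'b', 'c']
```

This deduplication is only this simple because the sketch has one ordered producer. With many producers, the consumer would need per-producer watermarks or message IDs, which is part of why stronger guarantees cost more.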
    13. Ordering (distributed broker scenario)
        • No ordering: consumers receive messages in any order. Very cheap.
        • Partitioned ordering: messages are ordered within a single data partition, such as a stock symbol or an account number. A well-performing distributed broker implementation is possible.
        • Global (fair) ordering: all incoming messages are fairly ordered. Scalability and performance are limited.
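Partitioned ordering can be sketched by routing every message to a partition chosen from its key, here a stock symbol. This is a hypothetical Python illustration, not any broker's actual partitioner. The partition count and the use of CRC32 as the hash are assumptions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # arbitrary choice for the sketch


def partition_for(key: str) -> int:
    # crc32 is stable across runs (Python's built-in hash() of str is salted),
    # so a given key always lands in the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def route(events):
    """Append each (key, seq) event to its key's partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, seq in events:
        partitions[partition_for(key)].append((key, seq))
    return partitions


events = [("AAPL", 1), ("GOOG", 1), ("AAPL", 2), ("MSFT", 1), ("GOOG", 2), ("AAPL", 3)]
partitions = route(events)
for p in sorted(partitions):
    # Order holds per key inside a partition; nothing is promised across partitions.
    print(p, partitions[p])
```

Each partition can live on a different broker node, which is why this level of ordering scales while global ordering does not.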
    14. Remote procedure calls
        RPC inherently builds on top of some messaging layer.
        A method call is the minimal unit, with 3 possible outcomes: it may succeed, returning an optional value; throw an exception; or time out.
        RPC adds some specific characteristics & features:
        • Sync or async calls
        • Distributed stack traces for exceptions
        • Declaration of interfaces and structs (possibly via some DSL), which often comes with a serialization library
        • May support schema evolution
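The three outcomes of a remote call can be sketched with Python's `concurrent.futures`, where a local thread pool stands in for the network and the server. The names `call`, `get_price` and `slow_method` are hypothetical. Note that a timeout only means the caller stopped waiting; the remote side may still complete the work.

```python
import concurrent.futures as cf
import time

# A local thread pool stands in for the network and the remote server.
_pool = cf.ThreadPoolExecutor(max_workers=4)


def call(remote_method, *args, timeout):
    """Return ("ok", value), ("error", exception) or ("timeout", None)."""
    future = _pool.submit(remote_method, *args)
    try:
        return ("ok", future.result(timeout=timeout))
    except cf.TimeoutError:
        # The caller gives up waiting; the remote side may still finish later.
        return ("timeout", None)
    except Exception as exc:
        # result() re-raises whatever the remote method raised.
        return ("error", exc)


def get_price(symbol):
    return {"AAPL": 100}[symbol]      # raises KeyError for unknown symbols


def slow_method():
    time.sleep(0.3)
    return "too late"


print(call(get_price, "AAPL", timeout=1.0))   # ('ok', 100)
print(call(get_price, "XYZ", timeout=1.0))    # ('error', KeyError('XYZ'))
print(call(slow_method, timeout=0.05))        # ('timeout', None)
```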
    15. Serialization libraries
        Currently, there are 4 clear winners:
        1. Google Protocol Buffers (with ProtoStuff)
        2. Apache Thrift
        3. Avro
        4. MessagePack
        All provide DSLs and schema evolution.
        The difference is in the wire format and in the form of the DSL compiler (a program in C, a program in Java, or no compilation required).
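To show what schema evolution buys, here is a hedged Python sketch of the tagged-field idea used by Protocol Buffers and Thrift. Avro instead resolves writer and reader schemas by field name. JSON stands in for a compact binary wire format, and the schemas and function names are hypothetical. An old reader ignores a field it does not know, and a new reader fills in a default for a field an old writer never sent.

```python
import json

# Hypothetical schemas: field name -> (numeric tag, default value).
USER_V1 = {"id": (1, 0), "name": (2, "")}
USER_V2 = {"id": (1, 0), "name": (2, ""), "email": (3, None)}   # v2 adds tag 3


def encode(record, schema):
    # Fields travel under their numeric tag, never under their name or position.
    return json.dumps({str(schema[name][0]): value for name, value in record.items()})


def decode(data, schema):
    by_tag = json.loads(data)
    # Unknown tags are ignored; fields the writer did not send get defaults.
    return {name: by_tag.get(str(tag), default) for name, (tag, default) in schema.items()}


new_msg = encode({"id": 7, "name": "max", "email": "max@example.com"}, USER_V2)
old_msg = encode({"id": 7, "name": "max"}, USER_V1)
print(decode(new_msg, USER_V1))   # {'id': 7, 'name': 'max'}
print(decode(old_msg, USER_V2))   # {'id': 7, 'name': 'max', 'email': None}
```

Because tags, not names or positions, identify fields on the wire, fields can be renamed or added freely as long as existing tags are never reused.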
    16. Messaging vs RPC
        Messaging:
        • In the broker-enabled case, producers are decoupled from consumers: just push a message and don’t care who pulls it.
        • Natively matches messages to events in event-sourcing architectures.
        RPC:
        • Needs to know the destination (i.e., service A must know service B and the call signature).
        Messaging and RPC dictate different programming models. RPC requires higher coupling between interacting services.
    17. And practice to continue!
    18. Today’s overview
        • Broker[less] peer-to-peer messaging: ZeroMQ
        • Broker-enabled persistent distributed pubsub: Apache Kafka
        • Multi-paradigm and feature-rich RPC in Scala: Twitter Finagle
    19. ZeroMQ
        “It’s sockets on steroids. It’s like mailboxes with routing. It’s fast!
        Things just become simpler. Complexity goes away. It opens the mind. Others try to explain by comparison. It’s smaller, simpler, but still looks familiar.”
        @ ZeroMQ 2.2 Guide
    20. ZeroMQ – features
        • Topology – all, very flexible.
        • Retries – no.
        • Service discovery – no.
        • Guaranteed delivery – no.
        • Acknowledge – no.
        • Disconnect detection – no.
        • Transactions support (can participate in distributed transactions) – no.
        • Persistence – kind of.
        • Portability (one or many languages and platforms) – yes, there are many bindings. However, the library itself is written in C++ (with a C API), so there’s only one “native” binding.
        • Distributed – yes.
        • Highly available or not – no.
        • Type (p2p or broker-based) – mostly p2p. For an N-N topology, a broker is needed in the form of a ZMQ “device” with ROUTER/DEALER sockets.
        • Load balancing strategy for consumers – yes (???).
        • Client backoff strategy for producers – no.
        • Tracing support – no.
        • Library or standalone software – platform-native library + language bindings.
    21. ZeroMQ – features explained
        Aren’t there too many “no”s?
        Yes and no. Most of the features are not provided out of the box, but may be implemented manually in the client and/or server.
        Some features are easy to implement (heartbeats, acks, retries, …), while others are very complex (guaranteed delivery, persistence, high availability).
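Heartbeat-based disconnect detection is one of the features the slide calls easy to build yourself, and it comes down to the bookkeeping below. This is a hypothetical Python sketch, not ZeroMQ API. Sending and receiving the heartbeat messages over sockets is left out. The clock is passed in explicitly to keep the sketch deterministic, and the three-interval liveness threshold is an arbitrary assumption.

```python
class HeartbeatMonitor:
    """Hypothetical helper: declare a peer dead after `liveness` silent heartbeat intervals."""

    def __init__(self, interval, liveness=3):
        self.interval = interval
        self.liveness = liveness
        self._last_seen = {}          # peer -> timestamp of its latest heartbeat

    def heartbeat(self, peer, now):
        self._last_seen[peer] = now

    def dead_peers(self, now):
        limit = self.interval * self.liveness
        return sorted(p for p, t in self._last_seen.items() if now - t > limit)


monitor = HeartbeatMonitor(interval=1.0)
monitor.heartbeat("worker-1", now=0.0)
monitor.heartbeat("worker-2", now=0.0)
monitor.heartbeat("worker-1", now=2.5)      # worker-2 has gone silent
print(monitor.dead_peers(now=2.5))          # []
print(monitor.dead_peers(now=3.5))          # ['worker-2']
```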
    22. ZeroMQ – what’s bad about it
        • First of all – the name.
          Think of ZMQ as a sockets library and you’re happy. Consider it messaging middleware and you get frustrated just reading the guide.
        • Complex implementation for multithreaded clients and servers.
        • There were issues with services going down due to corrupted packets (so it may not be suitable for WAN).
        • Some mess in the development process: initial ZMQ developers forked ZMQ as Crossroads.io.
    23. ZeroMQ – what’s good
        • Huge list of supported platforms.
        • MULTICAST support for fan-out (1-N) topology.
        • High raw performance.
        • Fluent connect/disconnect/reconnect behavior – it really feels the way it should.
        • Aims to become part of the Linux kernel.
    24. ZeroMQ – verdict
        • Good for unreliable, high-performance communication when delivery semantics are not strict. Example: the ngx-zeromq module for NGINX.
        • Good if you can invest sufficient effort in building a custom messaging platform on top of ZMQ as a network library. Example: the ZeroRPC library by DotCloud.
        • Bad for any other purpose.
    25. Apache Kafka
        “We have built a novel messaging system for log processing called Kafka that combines the benefits of traditional log aggregators and messaging systems. On the one hand, Kafka is distributed and scalable, and offers high throughput. On the other hand, Kafka provides an API similar to a messaging system and allows applications to consume log events in real time.”
        @ Kafka: a Distributed Messaging System for Log Processing, LinkedIn
    26. Kafka – features
        • Topology – all.
        • Retries – no.
        • Service discovery – yes (ZooKeeper).
        • Guaranteed delivery – no (at-least-once in the normal case).
        • Acknowledge – no.
        • Disconnect detection – yes (ZooKeeper).
        • Transactions support (can participate in distributed transactions) – no.
        • Persistence – yes.
        • Portability (one or many languages and platforms) – no.
        • Distributed – yes.
        • Highly available or not – no (work in progress).
        • Type (p2p or broker-based) – broker-enabled with distributed broker.
        • Load balancing strategy for consumers – yes.
        • Client backoff strategy for producers – yes.
        • Tracing support – no.
        • Library or standalone software – standalone + client libraries in Java.
    27. Kafka – Architecture
    28. Kafka – Internals
        • Fast writes
          – Configurable batching
          – All writes are sequential, with no need for random disk access (i.e., works well on commodity SATA/SAS disks in RAID arrays)
        • Fast reads
          – O(1) disk search
          – Extensive use of sendfile()
          – No in-memory data caching inside Kafka – it fully relies on the OS file system’s page cache
        • Elastic horizontal scalability
          – ZooKeeper is used for broker and consumer discovery
          – Pubsub topics are distributed among brokers
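Here is a toy Python model of the append-only log idea. The class is hypothetical, and a byte buffer stands in for a segment file; there is no batching, `sendfile()` or page cache here. Each record's offset is its byte position, roughly how early Kafka versions addressed messages. A consumer resumes by handing back the offset where it stopped, and a read is a direct seek rather than an index lookup.

```python
import struct


class AppendOnlyLog:
    """Hypothetical toy commit log: sequential appends, reads by byte offset."""

    HEADER = struct.Struct(">I")   # 4-byte big-endian length prefix

    def __init__(self):
        self._buf = bytearray()     # stands in for a segment file on disk

    def append(self, payload: bytes) -> int:
        offset = len(self._buf)
        # Only ever write at the end: no random access needed.
        self._buf += self.HEADER.pack(len(payload)) + payload
        return offset

    def read(self, offset: int, max_records: int = 10):
        """Return up to max_records (payload, next_offset) pairs starting at offset."""
        out = []
        while offset < len(self._buf) and len(out) < max_records:
            (size,) = self.HEADER.unpack_from(self._buf, offset)
            start = offset + self.HEADER.size
            payload = bytes(self._buf[start:start + size])
            offset = start + size
            out.append((payload, offset))
        return out


log = AppendOnlyLog()
offsets = [log.append(event) for event in (b"login", b"click", b"logout")]
print(offsets)                      # [0, 9, 18]
batch = log.read(offsets[1])
print(batch)                        # [(b'click', 18), (b'logout', 28)]
next_offset = batch[-1][1]          # the consumer remembers where to resume
print(log.read(next_offset))        # []
```

Note that the broker keeps no per-consumer state in this model. Each consumer owns its offset, which keeps reads cheap.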
    29. Kafka – conclusion
        • Good for event-sourcing architectures (especially once HA support for brokers is added).
        • Good for decoupling the incoming stream from processing, to withstand request spikes.
        • Very good for log aggregation and monitoring data collection.
        • Bad for transactional messaging with rich delivery semantics (exactly-once, etc.).
    30. Twitter Finagle
        “Finagle is a protocol-agnostic, asynchronous RPC system for the JVM that makes it easy to build robust clients and servers in Java, Scala, or any JVM-hosted language.
        Finagle supports a wide variety of request/response-oriented RPC protocols and many classes of streaming protocols.”
        @ Twitter Engineering Blog
    31. Finagle – features
        • Topology – all, very flexible.
        • Retries – yes.
        • Service discovery – yes (ZooKeeper).
        • Guaranteed delivery – no.
        • Acknowledge – no.
        • Disconnect detection – yes.
        • Transactions support (can participate in distributed transactions) – no.
        • Persistence – no.
        • Portability (one or many languages and platforms) – JVM only.
        • Distributed – yes.
        • Highly available – yes.
        • Type (p2p or broker-based) – p2p.
        • Load balancing strategy for consumers – yes (least connections, etc.).
        • Client backoff strategy for producers – yes (limited exponential).
        • Tracing support – yes (Zipkin).
        • Library or standalone software – Scala library.
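“Limited exponential” backoff means the delay doubles after each failed attempt but never exceeds a cap. Below is a minimal Python sketch that assumes a 100 ms base and a 1 s cap (both arbitrary) and treats `ConnectionError` as the retryable failure. The function names and `flaky_call` are hypothetical, and this is not Finagle's API.

```python
import time


def backoff_delays_ms(base_ms, cap_ms, retries):
    """Limited exponential backoff: base, 2*base, 4*base, ... but never above cap."""
    return [min(cap_ms, base_ms * 2 ** attempt) for attempt in range(retries)]


def retry(operation, delays_ms, sleep=time.sleep):
    """Call operation(); on ConnectionError wait out the next delay and try again."""
    for delay in delays_ms:
        try:
            return operation()
        except ConnectionError:
            sleep(delay / 1000)
    return operation()   # last attempt: any error now propagates to the caller


print(backoff_delays_ms(100, 1000, 6))   # [100, 200, 400, 800, 1000, 1000]

attempts = []


def flaky_call():
    # Hypothetical remote call that fails twice before succeeding.
    attempts.append(len(attempts) + 1)
    if len(attempts) < 3:
        raise ConnectionError("host unavailable")
    return "ok"


waits = []
result = retry(flaky_call, backoff_delays_ms(100, 1000, 5), sleep=waits.append)
print(result, waits)   # ok [0.1, 0.2]
```

The cap keeps a long outage from pushing delays to absurd values. Injecting `sleep` lets the sketch be exercised without actually waiting.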
    32. Finagle – from the authors
        Finagle provides a robust implementation of:
        • connection pools, with throttling to avoid TCP connection churn;
        • failure detectors, to identify slow or crashed hosts;
        • failover strategies, to direct traffic away from unhealthy hosts;
        • load-balancers, including “least-connections” and other strategies;
        • back-pressure techniques, to defend servers against abusive clients and dogpiling.
        Additionally, Finagle makes it easier to build and deploy a service that
        • publishes standard statistics, logs, and exception reports;
        • supports distributed tracing (a la Dapper) across protocols;
        • optionally uses ZooKeeper for cluster management; and
        • supports common sharding strategies.
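The “least-connections” strategy mentioned above fits in a few lines of Python: send each request to the host with the fewest outstanding requests. The class name is hypothetical, and this sketch ignores the failure detection and failover that Finagle layers on top.

```python
class LeastConnectionsBalancer:
    """Hypothetical balancer: route to the host with the fewest in-flight requests."""

    def __init__(self, hosts):
        self.active = {host: 0 for host in hosts}   # outstanding requests per host

    def acquire(self):
        # min() returns the first minimal entry, so ties go to the earliest-listed host.
        host = min(self.active, key=self.active.get)
        self.active[host] += 1
        return host

    def release(self, host):
        # Call when the response (or failure) for a request to `host` comes back.
        self.active[host] -= 1


balancer = LeastConnectionsBalancer(["a", "b", "c"])
print([balancer.acquire() for _ in range(3)])   # ['a', 'b', 'c']
balancer.release("b")                           # b finishes first
print(balancer.acquire())                       # b
```

Unlike round-robin, this reacts to slow hosts automatically: a host that answers slowly accumulates outstanding requests and receives fewer new ones.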
    33. Finagle – Layered architecture
    34. Finagle – Filters
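In Finagle, a `Service` is essentially a function from a request to a `Future` of a response. A `Filter` wraps a service to add cross-cutting behavior (timing, retries, authentication, …), and filters are composed with `andThen`. The Python analogue below is synchronous and uses hypothetical names. It mirrors only the composition idea, not Finagle's actual types.

```python
import time
from typing import Callable

Service = Callable[[str], str]             # request -> response (synchronous here)
Filter = Callable[[str, Service], str]     # (request, next service) -> response


def and_then(filter_: Filter, service: Service) -> Service:
    """Compose a filter with a service, yielding a new service."""
    return lambda request: filter_(request, service)


def timing_filter(log):
    # Records (request, elapsed seconds) for every call that passes through.
    def apply(request, service):
        start = time.perf_counter()
        try:
            return service(request)
        finally:
            log.append((request, time.perf_counter() - start))
    return apply


def uppercase_filter(request, service):
    # Rewrites the request before handing it to the next service.
    return service(request.upper())


def echo_service(request):
    return f"echo:{request}"


timings = []
service = and_then(timing_filter(timings), and_then(uppercase_filter, echo_service))
print(service("hi"))   # echo:HI
```

Each filter knows nothing about the others. The stack is assembled at one place, which is what makes adding stats or retries to every client cheap.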
    35. Finagle – Event loop
    36. Finagle – Future pools
    37. Finagle – conclusion
        • Good for complex JVM-based RPC architectures.
        • Very good for Scala; the experience with Java is worse (though there are some utility classes for it).
        • Works well with Thrift and HTTP (plus trivial protocols), but lacks support for Protobuf and other popular protocols.
        • Active developer community (Google group), but project infrastructure (Maven repo, versioning, etc.) is still being improved.
    38. Resources
        • Moscow Big Systems / Big Data group: http://www.meetup.com/bigmoscow/
        • http://www.zeromq.org
        • http://zerorpc.dotcloud.com
        • http://kafka.apache.org
        • http://twitter.github.io/finagle/
    39. QUESTIONS? AND CONTACTS
        HTTP://MAKSIMALEKSEEV.MOIKRUG.RU/
        HTTP://RU.LINKEDIN.COM/PUB/MAX-ALEXEJEV/51/820/AB9
        HTTP://WWW.SLIDESHARE.NET/MAXALEXEJEV
        MALEXEJEV@GMAIL.COM
        SKYPE: MALEXEJEV
