LIGHTWEIGHT MESSAGING AND
RPC IN DISTRIBUTED SYSTEMS
Max A. Alexejev
11.10.2012
Some Theory
to start with…
Messaging System
Message (not packet/byte/…) as a minimal
transmission unit.
The whole system unifies
• Underlying protocol (TCP, UDP)
• UNICAST or MULTICAST
• Data format (message types & structure)
Tied with
• Serialization format (text or binary)
Typical peer-to-peer messaging
[Diagram: Producer [host, port] connected directly to Consumer [host, port]]
Typical broker-based messaging
[Diagram: Producer and Consumer each connect to the Broker at [bhost, bport]]
• Broker is an indirection layer between
producer and consumer.
• Producer PUSHes messages to broker.
• Consumer PULLs messages from broker.
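The push/pull indirection can be sketched with a toy in-process broker. This is a minimal illustration only: the `Broker` class, its `push`/`pull` methods and the `orders` topic are hypothetical names, and a real broker would sit behind a network address rather than inside the same process.

```python
import queue


class Broker:
    """Toy in-process broker: one FIFO queue per topic (hypothetical API)."""

    def __init__(self):
        self._topics = {}

    def _queue_for(self, topic):
        return self._topics.setdefault(topic, queue.Queue())

    def push(self, topic, message):
        # Producer side: hand the message to the broker and return at once.
        self._queue_for(topic).put(message)

    def pull(self, topic, timeout=0.1):
        # Consumer side: ask for the next message; None if nothing is waiting.
        try:
            return self._queue_for(topic).get(timeout=timeout)
        except queue.Empty:
            return None


broker = Broker()
broker.push("orders", {"id": 1})
broker.push("orders", {"id": 2})

first = broker.pull("orders")
second = broker.pull("orders")
third = broker.pull("orders")  # queue is empty by now
```

The producer never learns who pulled the message, which is the decoupling the broker provides.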
The trick is…
[Diagram: Producer and Consumer each connect to the Broker at [bhost, bport]]
• Producers and consumers are logical units.
• Both P and C may be launched in multiple
instances.
• p2p and pubsub terms are expressed in terms of
these logical (!) units.
• Even the broker may be a distributed or replicated
entity.
Generic SOA picture
[Diagram: services S1 through S6]
In a generic case
• A service may be both a consumer for many
producers and a producer to many
consumers
Characteristics and Features
• Topology (1-1, 1-N, N-N)
• Retries
• Service discovery
• Guaranteed delivery (if yes: at-least-once or exactly-once)
• Ordering
• Acknowledge
• Disconnect detection
• Transactions support (can participate in distributed transactions)
• Persistence
• Portability (one or many languages and platforms)
• Distributed or not
• Highly available or not
• Type (p2p or broker-based)
• Load balancing strategy for consumers
• Client backoff strategy for producers
• Tracing support
• Library or standalone software
Main classes
• ESBs (Enterprise service buses)
– Slow, but most feature-rich. MuleESB, JbossESB,
Apache Camel, many commercial.
• JMS implementations
– ActiveMQ, JBOSS Messaging, Glassfish, etc.
• AMQP implementations
– RabbitMQ, Qpid, HornetQ, etc.
• Lightweight modern stuff - unstandardized
– ZeroMQ, Finagle, Kafka, Beanstalkd, etc.
Messaging Performance
As usual, it's about throughput and latency…
Major throughput factors:
– Network hardware used
– UNICAST vs MULTICAST (for fan-out)
Major latency factors:
– Persistence (batched vs. single-message persistence,
i.e. sequential vs. random disk writes)
– Transactions
– Broker replication
– Delivery guarantees (at-least-once & exactly-once)
Guaranteed delivery
Involves additional logic on the Producer, the
Consumer, and the Broker (if any)!
This is at-least-once delivery:
• Producer needs to get ack’ed by Broker
• Consumer needs to track high-watermark of
messages received from Broker
Exactly-once delivery requires more work and is even
more expensive. It is typically implemented as a 2-phase
commit.
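A minimal sketch of the at-least-once scheme above, under stated assumptions: `FlakyBroker` is a hypothetical broker that always stores the message but sometimes loses the ack, sequence numbers come from a single producer, and messages reach the consumer in sequence order. The producer resends until acked, so duplicates land in the log; the consumer's high-watermark filters them out.

```python
import random


class FlakyBroker:
    """Hypothetical broker: stores every message, sometimes loses the ack."""

    def __init__(self, ack_loss_rate, seed=0):
        self.log = []  # (seq, payload) pairs; may contain duplicates
        self._rng = random.Random(seed)
        self._ack_loss_rate = ack_loss_rate

    def publish(self, seq, payload):
        self.log.append((seq, payload))
        # True means the ack reached the producer, False means it was lost.
        return self._rng.random() >= self._ack_loss_rate


def send_at_least_once(broker, seq, payload, max_attempts=50):
    # Producer: keep resending the same message until the broker acks it.
    for _ in range(max_attempts):
        if broker.publish(seq, payload):
            return
    raise RuntimeError("no ack after %d attempts" % max_attempts)


class Consumer:
    def __init__(self):
        self.high_watermark = -1  # highest sequence number processed so far
        self.processed = []

    def consume(self, log):
        for seq, payload in log:
            if seq <= self.high_watermark:
                continue  # duplicate caused by a resend; skip it
            self.processed.append(payload)
            self.high_watermark = seq


broker = FlakyBroker(ack_loss_rate=0.5)
for seq, payload in enumerate(["a", "b", "c"]):
    send_at_least_once(broker, seq, payload)

consumer = Consumer()
consumer.consume(broker.log)
```

The broker log may hold more than three entries, but `consumer.processed` contains each payload once.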
Ordering (distributed broker scenario)
• No ordering: consumers receive messages in any order. Very cheap.
• Partitioned ordering: messages are ordered within a single data
partition, such as a stock symbol or an account number. It is possible
to create a well-performing implementation of a distributed broker.
• Global (fair) ordering: all incoming messages are fairly ordered.
Scalability and performance are limited.
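Partitioned ordering can be sketched as key-based routing. This is an illustration under assumptions: `partition_for` and `route` are hypothetical helpers, CRC32 stands in for whatever hash a real broker uses, and each partition is a plain list that keeps arrival order.

```python
import zlib


def partition_for(key, num_partitions):
    # Stable hash, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    # Each partition is an independent, ordered list of (key, payload).
    partitions = [[] for _ in range(num_partitions)]
    for key, payload in messages:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


messages = [("AAPL", 1), ("GOOG", 1), ("AAPL", 2), ("GOOG", 2), ("AAPL", 3)]
partitions = route(messages, num_partitions=4)

# All AAPL messages share one partition, so their relative order survives.
aapl = [payload for part in partitions for key, payload in part if key == "AAPL"]
```

Order is guaranteed only per key; nothing is promised about how AAPL and GOOG messages interleave, which is what lets partitions be spread over several brokers.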
Remote procedure calls
Inherently build on top of some form of messaging.
A method call is the minimal unit, with 3 outcomes: it may
succeed (optionally returning a value), throw an exception,
or time out.
Adds some RPC-specific characteristics & features:
• Sync or async
• Distributed stack traces for exceptions
• Interface and struct declarations (possibly via
some DSL), which often come with a serialization library
• May support schema evolution
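The three outcomes of a call can be sketched with a thread pool standing in for the remote side. Everything here is an assumption for illustration: `call` is a hypothetical helper, and `add`, `fail` and `slow` play the role of remote methods.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def call(executor, fn, *args, timeout=0.2):
    """Return ('ok', value), ('error', exception) or ('timeout', None)."""
    future = executor.submit(fn, *args)
    try:
        return ("ok", future.result(timeout=timeout))
    except TimeoutError:
        # No answer within the deadline; the callee may still be running.
        return ("timeout", None)
    except Exception as exc:
        # The callee raised; the exception travels back to the caller.
        return ("error", exc)


def add(a, b):
    return a + b


def fail():
    raise ValueError("bad input")


def slow():
    time.sleep(0.5)
    return "late"


with ThreadPoolExecutor(max_workers=3) as pool:
    ok = call(pool, add, 2, 3)
    err = call(pool, fail)
    timed_out = call(pool, slow)
```

A timeout says nothing about whether the method actually ran, which is why retries in RPC systems need idempotent methods or deduplication.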
Serialization libraries
Currently, there are 4 clear winners:
1. Google Protocol Buffers (with ProtoStuff)
2. Apache Thrift
3. Avro
4. MessagePack
All provide DSLs and schema evolution.
They differ in wire format and in the form of the DSL
compiler (a program written in C, in Java, or no
compilation step at all).
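Schema evolution can be illustrated with a toy tag-length-value encoding. This is not the wire format of any of the libraries listed; the layout (1-byte tag, 2-byte length, raw value) and the `encode`/`decode` helpers are assumptions made for the sketch. The point is that a reader skips tags it does not know and tolerates tags that are missing.

```python
import struct


def encode(fields):
    # fields maps tag (int) to value (bytes): 1-byte tag, 2-byte length, value.
    return b"".join(
        struct.pack(">BH", tag, len(value)) + value
        for tag, value in sorted(fields.items())
    )


def decode(data, known_tags):
    # known_tags maps tag to field name for the reader's schema version.
    out, pos = {}, 0
    while pos < len(data):
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        value = data[pos:pos + length]
        pos += length
        if tag in known_tags:  # unknown tags are skipped, not rejected
            out[known_tags[tag]] = value
    return out


SCHEMA_V1 = {1: "name"}
SCHEMA_V2 = {1: "name", 2: "email"}  # v2 adds a field under a new tag

wire_v1 = encode({1: b"max"})
wire_v2 = encode({1: b"max", 2: b"max@example.org"})

old_reader = decode(wire_v2, SCHEMA_V1)  # old code, new data
new_reader = decode(wire_v1, SCHEMA_V2)  # new code, old data
```

Both directions work as long as tags are never reused for a different meaning.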
Messaging vs RPC
Messaging
• In Broker-enabled case:
Producers are decoupled
from Consumers. Just
push message and don’t
care who pulls it.
• Natively matches
messages to events in
event-sourcing
architectures.
RPC
• Need to know destination
(i.e., service A must know
service B and call
signature).
Messaging and RPC dictate different programming
models. RPC requires higher coupling between
interacting services.
And Practice
to continue!
Today’s Overview
• Broker[less] peer-to-peer messaging
ZeroMQ
• Broker-enabled persistent distributed
pubsub
Apache Kafka
• Multi-paradigm and feature-rich RPC in Scala
Twitter Finagle
ZeroMQ
“It's sockets on steroids. It's like mailboxes with
routing. It's fast!
Things just become simpler. Complexity goes
away. It opens the mind. Others try to explain
by comparison. It's smaller, simpler, but still
looks familiar.”
@ ZeroMQ 2.2 Guide
ZeroMQ - features
• Topology – all, very flexible.
• Retries – no.
• Service discovery – no.
• Guaranteed delivery – no.
• Acknowledge – no.
• Disconnect detection – no.
• Transactions support (can participate in distributed transactions) – no.
• Persistence – kind of.
• Portability (one or many languages and platforms) – yes, there are many bindings. However,
the library itself is written in C++ with a C API, so there’s only one “native” binding.
• Distributed – yes.
• Highly available or not – no.
• Type (p2p or broker-based) – mostly p2p. For an N-N topology, a broker is needed in the form of
a ZMQ “Device” with ROUTER/DEALER type sockets.
• Load balancing strategy for consumers – yes (???).
• Client backoff strategy for producers – no.
• Tracing support – no.
• Library or standalone software – platform-native library + language bindings.
ZeroMQ – features explained
Aren’t there too many “no”s?
Yes and no. Most of the features are not
provided out of the box, but may be
implemented manually in the client and/or server.
Some features are easy to implement
(heartbeats, acks, retries, …); some are very
complex (guaranteed delivery, persistence, high
availability).
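As an example of an "easy" feature, heartbeat-based disconnect detection can be sketched in a few lines. This is not ZeroMQ API: `HeartbeatMonitor` is a hypothetical class, and timestamps are passed in explicitly so the logic stays independent of any socket code or real clock.

```python
class HeartbeatMonitor:
    """Marks a peer dead when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_seen = {}

    def beat(self, peer, now):
        # Called whenever a heartbeat (or any message) arrives from `peer`.
        self._last_seen[peer] = now

    def dead_peers(self, now):
        # Peers whose last heartbeat is older than the timeout.
        return sorted(
            peer for peer, seen in self._last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("worker-1", now=0.0)
monitor.beat("worker-2", now=0.0)
monitor.beat("worker-1", now=2.5)  # worker-1 keeps reporting in

dead = monitor.dead_peers(now=4.0)  # worker-2 has been silent for 4 seconds
```

In a real client the `beat` call would be driven by messages read from the socket and `now` by a monotonic clock.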
ZeroMQ – what’s bad about it
• First of all – the name.
Think of ZMQ as a sockets library and you’re happy.
Consider it messaging middleware and you get frustrated
just reading the guide.
• Complex implementation for multithreaded
clients and servers.
• There were issues with services going down due
to corrupted packets (so, may not be suitable for
WAN).
• Some mess with the development process. The initial
ZMQ developers forked ZMQ as Crossroads.io.
ZeroMQ – what’s good
• Huge list of supported platforms.
• MULTICAST support for fan-out (1-N)
topology.
• High raw performance.
• Fluent connect/disconnect/reconnect
behavior – it really feels the way it should.
• Wants to be part of Linux kernel.
ZeroMQ – verdict
• Good for non-reliable high performance
communication, when delivery semantics is
not strict. Example - ngx-zeromq module for
NGINX.
• Good if you can invest sufficient effort in
building custom messaging platform on top
of ZMQ as a network library. Example –
ZeroRPC lib by DotCloud.
• Bad for any other purpose.
Apache Kafka
“We have built a novel messaging system for log
processing called Kafka that combines the benefits
of traditional log aggregators and messaging
systems. On the one hand, Kafka is distributed and
scalable, and offers high throughput. On the other
hand, Kafka provides an API similar to a messaging
system and allows applications to consume log
events in real time.”
@ Kafka: a Distributed Messaging System for Log Processing,
LinkedIn
Kafka - features
• Topology – all.
• Retries – no.
• Service discovery – yes (Zookeeper).
• Guaranteed delivery – no (at-least-once in normal case).
• Acknowledge – no.
• Disconnect detection – yes (Zookeeper).
• Transactions support (can participate in distributed transactions) – no.
• Persistence – yes.
• Portability (one or many languages and platforms) – no.
• Distributed – yes.
• Highly available or not – no (work in progress).
• Type (p2p or broker-based) – broker-enabled with distributed broker.
• Load balancing strategy for consumers – yes.
• Client backoff strategy for producers – yes.
• Tracing support – no.
• Library or standalone software – standalone + client libraries in Java.
Kafka - Architecture
Kafka - Internals
• Fast writes
– Configurable batching
– All writes are sequential, with no need for random disk access
(i.e., works well on commodity SATA/SAS disks in RAID
arrays)
• Fast reads
– O(1) disk search
– Extensive use of sendfile()
– No in-memory data caching inside Kafka – fully relies on
OS file system’s page cache
• Elastic horizontal scalability
– ZooKeeper is used for broker and consumer discovery
– Pubsub topics are distributed among brokers
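The write and read paths above can be sketched as an append-only file addressed by byte offset. This is a toy model, not Kafka's on-disk format: `AppendOnlyLog` and the 4-byte length prefix are assumptions. It shows why writes are sequential and why a lookup needs a single seek once the offset is known.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy log: each record is a 4-byte length followed by the payload."""

    def __init__(self, path):
        self._path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, payload):
        # Writes only ever go to the end of the file (sequential I/O).
        with open(self._path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(struct.pack(">I", len(payload)) + payload)
        return offset  # the byte offset doubles as the message id

    def read(self, offset):
        # One seek finds the record; no index lookup is needed.
        with open(self._path, "rb") as f:
            f.seek(offset)
            (length,) = struct.unpack(">I", f.read(4))
            payload = f.read(length)
        return payload, offset + 4 + length  # also return the next offset


with tempfile.TemporaryDirectory() as tmp:
    log = AppendOnlyLog(os.path.join(tmp, "topic-0.log"))
    first = log.append(b"hello")
    second = log.append(b"kafka")
    msg, next_offset = log.read(first)
    msg2, end_offset = log.read(second)
```

A consumer only has to remember the next offset to resume where it stopped.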
Kafka - conclusion
• Good for event-sourcing architectures
(especially when they add HA support for
brokers).
• Good to decouple incoming stream and
processing to withstand request spikes.
• Very good for log aggregation and
monitoring data collection.
• Bad for transactional messaging with rich
delivery semantics (exactly-once, etc.).
Twitter Finagle
“Finagle is a protocol-agnostic, asynchronous
RPC system for the JVM that makes it easy to
build robust clients and servers in Java, Scala,
or any JVM-hosted language.
Finagle supports a wide variety of
request/response-oriented RPC protocols and
many classes of streaming protocols.”
@ Twitter Engineering Blog
Finagle - features
• Topology – all, very flexible.
• Retries – yes.
• Service discovery – yes (ZooKeeper).
• Guaranteed delivery – no.
• Acknowledge – no.
• Disconnect detection – yes.
• Transactions support (can participate in distributed transactions) – no.
• Persistence – no.
• Portability (one or many languages and platforms) – JVM only.
• Distributed – yes.
• Highly available – yes.
• Type (p2p or broker-based) – p2p.
• Load balancing strategy for consumers – yes (least connections etc).
• Client backoff strategy for producers – yes (limited exponential).
• Tracing support – yes (Zipkin).
• Library or standalone software – Scala library.
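A client backoff strategy of the "limited exponential" kind can be sketched as a doubling delay with an upper bound. Reading "limited" as "capped" is an assumption, and `backoff_delays` is a hypothetical helper, not part of Finagle.

```python
def backoff_delays(base_ms, cap_ms, attempts):
    """Delays of base, 2*base, 4*base, ... each limited to cap_ms."""
    return [min(cap_ms, base_ms * (2 ** i)) for i in range(attempts)]


# Delay before each of six consecutive retries, in milliseconds.
delays = backoff_delays(base_ms=100, cap_ms=1000, attempts=6)
```

With these inputs the delay doubles up to 800 ms and then stays at the 1000 ms cap. Production clients usually add random jitter so that many clients do not retry in lockstep.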
Finagle – from authors
Finagle provides a robust implementation of:
• connection pools, with throttling to avoid TCP connection churn;
• failure detectors, to identify slow or crashed hosts;
• failover strategies, to direct traffic away from unhealthy hosts;
• load-balancers, including “least-connections” and other strategies;
• back-pressure techniques, to defend servers against abusive
clients and dogpiling.
Additionally, Finagle makes it easier to build and deploy a service that
• publishes standard statistics, logs, and exception reports;
• supports distributed tracing (a la Dapper) across protocols;
• optionally uses ZooKeeper for cluster management; and
• supports common sharding strategies.
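The "least-connections" strategy named in the list can be sketched as follows. This is an illustration, not Finagle's implementation: `LeastConnectionsBalancer` and the host names are hypothetical, and ties are broken by host order.

```python
class LeastConnectionsBalancer:
    """Sends each new request to the host with the fewest active requests."""

    def __init__(self, hosts):
        self.active = {host: 0 for host in hosts}

    def acquire(self):
        # Pick the least-loaded host; the first listed host wins a tie.
        host = min(self.active, key=self.active.get)
        self.active[host] += 1
        return host

    def release(self, host):
        # Call when the request to `host` has completed.
        self.active[host] -= 1


lb = LeastConnectionsBalancer(["a:9090", "b:9090", "c:9090"])
picks = [lb.acquire() for _ in range(3)]  # one request lands on each host
lb.release("b:9090")                      # b finishes first
next_pick = lb.acquire()                  # so b gets the next request
```

A slow host holds its requests longer and is therefore picked less often, which gives a simple form of load-aware routing.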
Finagle – Layered architecture
Finagle - Filters
Finagle – Event loop
Finagle – Future pools
Finagle - conclusion
• Good for complex JVM-based RPC
architectures.
• Very good for Scala; a worse experience with
Java (but yes, there are some utility classes).
• Works well with Thrift and HTTP (plus trivial
protocols), but lacks support for Protobuf and
other popular stuff.
• Active developer community (Google
group), but project infrastructure (Maven
repo, versioning, etc.) is still being improved.
Resources
• Moscow Big Systems / Big Data group
http://www.meetup.com/bigmoscow/
• http://www.zeromq.org
• http://zerorpc.dotcloud.com
• http://kafka.apache.org
• http://twitter.github.io/finagle/
QUESTIONS?
AND CONTACTS
• HTTP://MAKSIMALEKSEEV.MOIKRUG.RU/
• HTTP://RU.LINKEDIN.COM/PUB/MAX-ALEXEJEV/51/820/AB9
• HTTP://WWW.SLIDESHARE.NET/MAXALEXEJEV
• MALEXEJEV@GMAIL.COM
• SKYPE: MALEXEJEV

Modern Distributed Messaging and RPC

  • 1. LIGHTWEIGHT MESSAGING AND RPC IN DISTRIBUTED SYSTEMS Max A. Alexejev 11.10.2012
  • 3. Messaging System Message (not packet/byte/…) as a minimal transmission unit. The whole system unifies • Underlying protocol (TCP, UDP) • UNICAST or MULTICAST • Data format (message types & structure) Tied with • Serialization format (text or binary)
  • 5. Typical broker-based messaging Producer [bhost, bport] Broker Consumer [bhost, bport] • Broker is an indirection layer between producer and consumer. • Producer PUSHes messages to broker. • Consumer PULLs messages from broker.
  • 6. The trick is… Producer [bhost, bport] Broker Consumer [bhost, bport] • Producers and consumers are logical units. • Both P and C may be launched in multiple instances. • p2p and pubsub terms are expressed in terms of these logical (!) units. • Even broker may be distributed or replicated entity.
  • 8. In a generic case • A service may be both a consumer for many producers and a producer to many consumers
  • 9. Characteristics and Features • Topology (1-1, 1-N, N-N) • Retries • Service discovery • Guaranteed delivery (in case yes – at-least-once or exactly-once) • Ordering • Acknowledge • Disconnect detection • Transactions support (can participate in distributed transactions) • Persistence • Portability (one or many languages and platforms) • Distributed or not • Highly available or not • Type (p2p or broker-based) • Load balancing strategy for consumers • Client backoff strategy for producers • Tracing support • Library or standalone software
  • 10. Main classes • ESBs (Enterprise service buses) – Slow, but most feature-rich. MuleESB, JbossESB, Apache Camel, many commercial. • JMS implementations – ActiveMQ, JBOSS Messaging, Glassfish, etc. • AMQP implementations – RabbitMQ, Qpid, HornetQ, etc. • Lightweight modern stuff - unstandardized – ZeroMQ, Finagle, Kafka, Beanstalkd, etc.
  • 11. Messaging Performance As usual, its about throughput and latency… Major throughput factors: – Network hardware used – UNICAST vs MULTICAST (for fan-out) Major latency factors: – Persistence (batched or single-message persistence involves sequential or random disk writes) – Transactions – Broker replication – Delivery guarantees (at-least-once & exactly-once)
  • 12. Guaranteed delivery Involves additional logic both on Producer, Consumer and Broker (if any)! This is at-least-once delivery: • Producer needs to get ack’ed by Broker • Consumer needs to track high-watermark of messages received from Broker Exact-once delivery requires more work and even more expensive. Typically implemented as 2-phase commit.
  • 13. Ordering (distributed broker scenario) • Producers receive messages in any order. Very cheap. No Ordering • Messages are ordered within single data partition. Such as: stock symbol, account number, etc. Possible to create well-performing implementation of distributed broker. Partitioned Ordering • All incoming messages are fairly ordered. Scalability and performance is limited. Global (fair) ordering
  • 14. Remote procedure calls Inherently builds on top of some messaging. Method call as a minimal unit (3 states: may succeed returning optional value, throw exception, or time out). Adds some RPC-specific characteristics & features: • Sync or async • Distributed stack traces for exceptions • Interfaces and structs declaration (possibly, via some DSL) – often come with serialization library • May support schema evolution
  • 15. Serialization libraries Currently, there are 4 clear winners: 1. Google Protocol buffers (with ProtoStuff) 2. Apache Thrift 3. Avro 4. MessagePack All provide DSLs and schema evolution. Difference is in wire format and DSL compiler form (program in C, in Java, or does not require compilation).
  • 16. Messaging vs RPC Messaging • In Broker-enabled case: Producers are decoupled from Consumers. Just push message and don’t care who pulls it. • Natively matches messages to events in event-sourcing architectures. RPC • Need to know destination (i.e., service A must know service B and call signature). Messaging and RPC dictate different programming models. RPC requires higher coupling between interacting services.
  • 18. Today’s Overview • Broker[less] peer-to-peer messaging ZeroMQ • Broker-enabled persistent distributed pubsub Apache Kafka • Multi-paradigm and feature-rich RPC in Scala Twitter Finagle
  • 19. ZeroMQ “It's sockets on steroids. It's like mailboxes with routing. It's fast! Things just become simpler. Complexity goes away. It opens the mind. Others try to explain by comparison. It's smaller, simpler, but still looks familiar.” @ ZeroMQ 2.2 Guide
  • 20. ZeroMQ - features • Topology – all, very flexible. • Retries – no. • Service discovery – no. • Guaranteed delivery – no. • Acknowledge – no. • Disconnect detection – no. • Transactions support (can participate in distributed transactions) – no. • Persistence – kind of. • Portability (one or many languages and platforms) – yes, there are many bindings. However, the library itself is written in C++ with a C API, so there’s only one “native” binding. • Distributed – yes. • Highly available or not – no. • Type (p2p or broker-based) – mostly p2p. For an N-N topology, a broker is needed in the form of a ZMQ “Device” with ROUTER/DEALER type sockets. • Load balancing strategy for consumers – yes (???). • Client backoff strategy for producers – no. • Tracing support – no. • Library or standalone software – platform-native library + language bindings.
  • 21. ZeroMQ – features explained Aren’t there too many “no”s? Yes and no. Most of the features are not provided out of the box, but may be implemented manually in the client and/or server. Some features are easy to implement (heartbeats, acks, retries, …); some are very complex (guaranteed delivery, persistence, high availability).
  • 22. ZeroMQ – what’s bad about it • First of all, the name. Think of ZMQ as a sockets library and you’re happy. Consider it messaging middleware and you get frustrated just reading the guide. • Complex implementation for multithreaded clients and servers. • There were issues with services going down due to corrupted packets (so it may not be suitable for WAN). • The development process is somewhat messy: the initial ZMQ developers forked ZMQ as Crossroads.io.
  • 23. ZeroMQ – what’s good • Huge list of supported platforms. • MULTICAST support for fan-out (1-N) topology. • High raw performance. • Smooth connect/disconnect/reconnect behavior – it really feels like how it should be. • Wants to be part of the Linux kernel.
  • 24. ZeroMQ – verdict • Good for unreliable, high-performance communication where delivery semantics are not strict. Example – ngx-zeromq module for NGINX. • Good if you can invest sufficient effort in building a custom messaging platform on top of ZMQ as a network library. Example – ZeroRPC lib by DotCloud. • Bad for any other purpose.
  • 25. Apache Kafka “We have built a novel messaging system for log processing called Kafka that combines the benefits of traditional log aggregators and messaging systems. On the one hand, Kafka is distributed and scalable, and offers high throughput. On the other hand, Kafka provides an API similar to a messaging system and allows applications to consume log events in real time.” @ Kafka: a Distributed Messaging System for Log Processing, LinkedIn
  • 26. Kafka - features • Topology – all. • Retries – no. • Service discovery – yes (ZooKeeper). • Guaranteed delivery – no (at-least-once in the normal case). • Acknowledge – no. • Disconnect detection – yes (ZooKeeper). • Transactions support (can participate in distributed transactions) – no. • Persistence – yes. • Portability (one or many languages and platforms) – no. • Distributed – yes. • Highly available or not – no (work in progress). • Type (p2p or broker-based) – broker-based with a distributed broker. • Load balancing strategy for consumers – yes. • Client backoff strategy for producers – yes. • Tracing support – no. • Library or standalone software – standalone + client libraries in Java.
  • 28. Kafka - Internals • Fast writes – Configurable batching – All writes are sequential, no need for random disk access (i.e., works well on commodity SATA/SAS disks in RAID arrays) • Fast reads – O(1) disk search – Extensive use of sendfile() – No in-memory data caching inside Kafka – fully relies on the OS file system’s page cache • Elastic horizontal scalability – ZooKeeper is used for broker and consumer discovery – Pubsub topics are distributed among brokers
  • 29. Kafka - conclusion • Good for event-sourcing architectures (especially once HA support for brokers is added). • Good for decoupling the incoming stream from processing to withstand request spikes. • Very good for log aggregation and monitoring data collection. • Bad for transactional messaging with rich delivery semantics (exactly-once, etc.).
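The "sequential writes, O(1) search" idea can be illustrated with a toy append-only log. This is an in-memory sketch under stated assumptions, not Kafka's actual storage format: it assumes that a message's offset is simply its byte position in the log and that each record carries a 4-byte length prefix. `AppendOnlyLog` and its methods are hypothetical names.

```python
import struct

class AppendOnlyLog:
    """Toy log: records are only ever appended, never modified in place."""

    def __init__(self):
        self._buf = bytearray()  # stands in for a log file on disk

    def append(self, payload: bytes) -> int:
        """Write at the end of the log and return the record's offset."""
        offset = len(self._buf)
        self._buf += struct.pack(">I", len(payload)) + payload
        return offset

    def read(self, offset: int):
        """Jump straight to `offset`; return (payload, next_offset)."""
        (length,) = struct.unpack_from(">I", self._buf, offset)
        start = offset + 4
        return bytes(self._buf[start:start + length]), start + length

log = AppendOnlyLog()
first = log.append(b"a")
second = log.append(b"bc")
payload, next_offset = log.read(first)   # no scan: the offset is the position
payload2, end_offset = log.read(second)
```

Because the consumer holds the offset and each read returns the next one, the log itself keeps no per-consumer state in this sketch, and a lookup never has to scan earlier records.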
  • 30. Twitter Finagle “Finagle is a protocol-agnostic, asynchronous RPC system for the JVM that makes it easy to build robust clients and servers in Java, Scala, or any JVM-hosted language. Finagle supports a wide variety of request/response-oriented RPC protocols and many classes of streaming protocols.” @ Twitter Engineering Blog
  • 31. Finagle - features • Topology – all, very flexible. • Retries – yes. • Service discovery – yes (ZooKeeper). • Guaranteed delivery – no. • Acknowledge – no. • Disconnect detection – yes. • Transactions support (can participate in distributed transactions) – no. • Persistence – no. • Portability (one or many languages and platforms) – JVM only. • Distributed – yes. • Highly available – yes. • Type (p2p or broker-based) – p2p. • Load balancing strategy for consumers – yes (least connections, etc.). • Client backoff strategy for producers – yes (limited exponential). • Tracing support – yes (Zipkin). • Library or standalone software – Scala library.
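"Limited exponential" backoff generally means that the delay doubles after every failed attempt until it reaches a ceiling. The sketch below shows that schedule in plain Python; it is not Finagle's API, and `capped_exponential_backoff`, `base_ms` and `cap_ms` are hypothetical names chosen for the example.

```python
def capped_exponential_backoff(base_ms: int, cap_ms: int, attempts: int):
    """Delays of base, 2*base, 4*base, ... each limited to cap_ms."""
    return [min(cap_ms, base_ms * (2 ** i)) for i in range(attempts)]

delays = capped_exponential_backoff(base_ms=100, cap_ms=1000, attempts=6)
# delays == [100, 200, 400, 800, 1000, 1000]
```

The cap keeps a long outage from pushing the wait time up without bound, while the doubling keeps a struggling server from being hit at a constant rate.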
  • 32. Finagle – from authors Finagle provides a robust implementation of: • connection pools, with throttling to avoid TCP connection churn; • failure detectors, to identify slow or crashed hosts; • failover strategies, to direct traffic away from unhealthy hosts; • load-balancers, including “least-connections” and other strategies; • back-pressure techniques, to defend servers against abusive clients and dogpiling. Additionally, Finagle makes it easier to build and deploy a service that • publishes standard statistics, logs, and exception reports; • supports distributed tracing (a la Dapper) across protocols; • optionally uses ZooKeeper for cluster management; and • supports common sharding strategies.
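The “least-connections” strategy mentioned above can be sketched as follows. This is a generic illustration in Python, not Finagle code: `LeastConnectionsBalancer`, `acquire` and `release` are hypothetical names, and ties are broken by the order in which hosts were listed.

```python
class LeastConnectionsBalancer:
    """Route each request to the host with the fewest in-flight requests."""

    def __init__(self, hosts):
        self.active = {host: 0 for host in hosts}  # in-flight count per host

    def acquire(self) -> str:
        """Pick the least-loaded host and count the new request against it."""
        host = min(self.active, key=self.active.get)
        self.active[host] += 1
        return host

    def release(self, host: str) -> None:
        """Call when the request completes, successfully or not."""
        self.active[host] -= 1

lb = LeastConnectionsBalancer(["a", "b"])
h1 = lb.acquire()   # both idle, first listed host wins the tie
h2 = lb.acquire()   # "a" now has one in flight, so "b" is chosen
lb.release(h2)
h3 = lb.acquire()   # "b" is idle again, so it is chosen over busy "a"
```

A slow host accumulates in-flight requests and therefore receives fewer new ones, which is why this strategy pairs naturally with the failure detectors listed on the slide.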
  • 33. Finagle – Layered architecture
  • 37. Finagle - conclusion • Good for complex JVM-based RPC architectures. • Very good for Scala, worse experience with Java (but yes, they have some utility classes). • Works well with Thrift and HTTP (plus trivial protocols), but lacks support for Protobuf and other popular formats. • Active developer community (Google group), but project infrastructure (Maven repo, versioning, etc.) is still being improved.
  • 38. Resources • Moscow Big Systems / Big Data group http://www.meetup.com/bigmoscow/ • http://www.zeromq.org • http://zerorpc.dotcloud.com • http://kafka.apache.org • http://twitter.github.io/finagle/
  • 39. QUESTIONS? AND CONTACTS  HTTP://MAKSIMALEKSEEV.MOIKRUG.RU/  HTTP://RU.LINKEDIN.COM/PUB/MAX-ALEXEJEV/51/820/AB9  HTTP://WWW.SLIDESHARE.NET/MAXALEXEJEV  MALEXEJEV@GMAIL.COM  SKYPE: MALEXEJEV
