Message Architectures in Distributed Systems

Eric Lubow
@elubow
elubow@simplereach.com
#ddtx14

Message Architectures in Distributed Systems - Data Day Texas 2013-01-11

Message architectures are an important part of a distributed system. They are often overlooked because the prevailing sentiment is that the storage and processing engines are the important parts.

Published in: Technology
Message Architectures in Distributed Systems - Data Day Texas 2013-01-11

  1. Message Architectures in Distributed Systems Eric Lubow @elubow elubow@simplereach.com #ddtx14
  2. Overview
     • SimpleReach
     • Why is messaging important
     • Goals
     • Explanations
     • Questions
  3. Personal Vanity
     • CTO of SimpleReach
     • Co-author of Practical Cassandra
     • Skydiver, Mixed Martial Artist, Motorcyclist, Dog dad, NY Giants fan
     • IronMatt Foundation for Pediatric Brain Tumors (ironmatt.org)
  4. (no text on this slide)
  5. (no text on this slide)
  6. SimpleReach
     • Millions of URLs per day
     • Over 3.25 billion page views per month
     • 1.4b events per day (~16k events/second)
     • Auto-scale 125-160 machines depending on traffic
     • Built a predictive measurement algorithm for the social web
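The "~16k events/second" figure on this slide follows directly from the daily volume. A quick check of the arithmetic, assuming the load is spread evenly over the day (the slide does not say how peaks compare with the average):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 1.4e9  # "1.4b events per day" from the slide
events_per_second = events_per_day / SECONDS_PER_DAY

print(f"{events_per_second:,.0f} events/second on average")  # prints "16,204 events/second on average"
```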
  7. Why is Messaging Important?
     • Most large scale systems discussions only talk about storage
     • Direct high volumes of data around your infrastructure
     • Control flow of data through your infrastructure
     • Decouple important systems
     • Scalability, Elasticity, Deliverability, and Redundancy
     • Buffering and Asynchronous communication
  8. Data Flow (diagram, centered on App)
     ❶ incoming request
     ❷ sync persist data
     ❸ send response
     ❹ async queue message
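The four steps of the Data Flow slide can be sketched in a few lines. This is only an illustration under stated assumptions: `datastore`, `outbox`, `handle_request`, and `worker` are hypothetical names, a dict stands in for the datastore, and an in-process `queue.Queue` stands in for a real message queue.

```python
import queue
import threading

# Hypothetical stand-ins: a dict as the datastore, an in-process queue as the message queue.
datastore = {}
outbox = queue.Queue()


def handle_request(event_id, payload, send):
    """Step 1 is the call itself; steps 2-4 follow in the slide's order."""
    datastore[event_id] = payload                     # 2: sync persist data
    send({"status": "ok", "id": event_id})            # 3: send response
    outbox.put({"id": event_id, "payload": payload})  # 4: async queue message (returns immediately)


def worker(processed):
    """Drain the queue in the background; None is a stop sentinel."""
    while True:
        message = outbox.get()
        if message is None:
            break
        processed.append(message["id"])


processed, responses = [], []
thread = threading.Thread(target=worker, args=(processed,))
thread.start()
handle_request("e1", {"url": "http://example.com/"}, responses.append)
outbox.put(None)  # tell the worker to stop once the queue is drained
thread.join()
```

The point of the split is that the caller only waits for the persist step; anything downstream of the queue happens off the request path.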
  9. Goals
     • Consistent interfaces between systems
     • Allow access to many toolsets
     • Minimize downtime/Minimize cost of downtime
     • High availability
     • Clients should have minimal architecture knowledge
     • Horizontal Scaling
     • Controlled Data Flow Patterns
     • Enrichment/In-stream Modification Schemes
     • Monitoring and Instrumentation
  10. Messaging Systems
      • RabbitMQ
      • ZeroMQ
      • Kafka
      • Amazon SQS
      • NSQ
      • ActiveMQ
      • Resque
      • Custom
  11. What Did SimpleReach Choose?
  12. NSQ
      • Distributed and de-centralized topology
      • At least once delivery guaranteed
      • Multicast style message routing
      • Simple to configure and deploy
      • Allow for maintenance windows with no downtime
      • Ephemeral channels for testing
      • Channel sampling
      github.com/bitly/nsq
  13. Topics and Channels
      • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
      • a channel is an independent queue for a topic (a topic can have multiple channels)
      • consumers discover producers by querying nsqlookupd (a discovery service for topics)
      • topics and channels are created at runtime (just start publishing/subscribing)
      Diagram labels: nsqd (separate hosts); topic “event”; channels “metrics”, “enrichment”, “writer”; consumers A, B
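The topic and channel semantics described on this slide can be modeled with a toy in-memory sketch. It is not how nsqd is implemented: `Topic` and `drain` are hypothetical names, and the round-robin hand-off is a simplification of however a real channel picks among its connected consumers.

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy in-memory model of one topic; not the nsqd implementation."""

    def __init__(self, name):
        self.name = name
        self.channels = defaultdict(list)  # channel name -> queued messages

    def channel(self, name):
        # Channels come into existence on first use ("created at runtime").
        return self.channels[name]

    def publish(self, message):
        # Every channel gets its own copy of each message.
        for queued in self.channels.values():
            queued.append(message)


def drain(queued, consumers):
    """Hand each queued message to exactly one consumer (round-robin here)."""
    if not consumers:
        return {}
    deliveries = defaultdict(list)
    for consumer, message in zip(cycle(consumers), queued):
        deliveries[consumer].append(message)
    queued.clear()
    return dict(deliveries)


events = Topic("event")
for name in ("metrics", "enrichment", "writer"):
    events.channel(name)
events.publish({"url": "http://example.com/a"})
events.publish({"url": "http://example.com/b"})
split = drain(events.channel("writer"), ["A", "B"])
```

After this runs, consumers A and B have each received one of the two "writer" messages, while the "metrics" and "enrichment" channels still hold both messages for their own consumers.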
  14. Everyone Speaks The Same Language
      http:// + {“content-type”: “application/json”}
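One way to read "HTTP plus JSON" is that any service able to make an HTTP POST can publish an event. The sketch below builds such a request without sending it. Assumptions: the `/pub?topic=...` endpoint and port 4151 are what current nsqd documentation describes and may differ from the deployment in the talk; `build_pub_request`, the address, and the event fields are hypothetical. nsqd itself treats the body as opaque bytes, so JSON is a convention between producers and consumers.

```python
import json
import urllib.parse
import urllib.request


def build_pub_request(nsqd_http_address, topic, event):
    """Build (but do not send) an HTTP publish request carrying one JSON event."""
    query = urllib.parse.urlencode({"topic": topic})
    return urllib.request.Request(
        url=f"http://{nsqd_http_address}/pub?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pub_request("127.0.0.1:4151", "event", {"url": "http://example.com/a"})
# urllib.request.urlopen(request) would send it to a running nsqd.
```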
  15. Goals
      • Consistent interfaces between systems
  16. NSQ Tools
      • nsqadmin provides a web interface to administrate and introspect an NSQ cluster at runtime (and empty, pause, or delete topics/channels)
      • nsq_to_http - utility that helps transport an aggregate stream over HTTP
      • nsq_to_file - utility that safely persists an aggregated stream to disk
      • nsq_stat - iostat like utility for a topic/channel
      • nsq_tail - tail like utility for a topic/channel
  17. Right Tool For The Job
  18. Goals
      • Consistent interfaces between systems
      • Allow access to many toolsets
  19. How Does It Work?
      Diagram labels: API (×3); NSQD / NSQ (×3); PUBLISH; REGISTER; nsqlookupd (×2); SUBSCRIBE; DISCOVER; consumer (×2)
  20. The Schrute of the Problem
  21. Goals
      • Consistent interfaces between systems
      • Allow access to many toolsets
      • Minimize downtime/Minimize cost of downtime
      • High availability
  22. Simple Deployment & Automation
      • Chef cookbook - github.com/simplereach/chef-nsq
      • Written in Go
      • Easily distributable binaries
      • Deploy lookup nodes
      • nsqd instances installed locally
  23. Goals
      • Consistent interfaces between systems
      • Allow access to many toolsets
      • Minimize downtime/Minimize cost of downtime
      • High availability
      • Clients should have minimal architecture knowledge
  24. Runtime Discovery
      ➊ regularly poll for topic producers
      ➋ connect to all producers
      Diagram labels: nsqlookupd (×2), HTTP requests, consumer
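The two discovery steps can be sketched as a merge over the answers from every nsqlookupd instance. This is a sketch under assumptions, not a client implementation: in practice a client library performs the periodic `GET /lookup?topic=...` requests, and the field names (`producers`, `broadcast_address`, `tcp_port`) and the optional `data` envelope used here are assumptions about the response shape that should be checked against the nsqlookupd version in use. `producers_from_lookup` and `discover` are hypothetical helpers, and the sample bodies are made up.

```python
import json


def producers_from_lookup(body):
    """Extract (host, tcp_port) pairs from one nsqlookupd lookup response body."""
    doc = json.loads(body)
    # Assumption: some releases wrap the payload in a "data" envelope.
    payload = doc.get("data") or doc
    return {
        (producer["broadcast_address"], producer["tcp_port"])
        for producer in payload.get("producers", [])
    }


def discover(lookup_bodies):
    """Step 1 result: the union of producers reported by every nsqlookupd."""
    found = set()
    for body in lookup_bodies:
        found |= producers_from_lookup(body)
    return sorted(found)


# Made-up sample responses from two lookup nodes with overlapping knowledge.
lookupd_a = '{"channels": ["writer"], "producers": [{"broadcast_address": "api-1", "tcp_port": 4150}]}'
lookupd_b = (
    '{"status_code": 200, "data": {"producers": ['
    '{"broadcast_address": "api-1", "tcp_port": 4150}, '
    '{"broadcast_address": "api-2", "tcp_port": 4150}]}}'
)
targets = discover([lookupd_a, lookupd_b])
print(targets)  # prints [('api-1', 4150), ('api-2', 4150)]
```

Step 2 would then open a connection to every address in `targets`. Because the result is a union, a consumer keeps finding producers even if one lookup node is stale or unreachable.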
  25. Goals
      • Consistent interfaces between systems
      • Allow access to many toolsets
      • Minimize downtime/Minimize cost of downtime
      • High availability
      • Clients should have minimal architecture knowledge
      • Horizontal Scaling
  26. Path of a Packet
      Diagram labels: Fire Hose, API, SC, Internal API, Internet, Queue, EC, Consumers, Solr, C*, Mongo, Redis, Vertica
  27. (no text on this slide)
  28. Controlled Data Flow
      Diagram labels: NSQ, Broadcast, NSQ, Batch & Write, Processed Data, Social Event Collector, Social Data, Batch & Write, Raw Data, Calculate Score, Write
  29. Broadcast Importance for Polyglottany
      Diagram labels: NSQ, Broadcast, Mongo Writer, Redis Writer, Writer, Aggregator, Cassandra Writer, Solr Writer, Vertica Writer
  30. (no text on this slide)
  31. Controlled Data Flow
      Diagram labels: NSQ, Broadcast, NSQ, Batch & Write, Processed Data, Social Event Collector, Social Data, Batch & Write, Raw Data, Calculate Score, Write
  32. Goals
      • Consistent interfaces between systems
      • Allow access to many toolsets
      • Minimize downtime/Minimize cost of downtime
      • High availability
      • Clients should have minimal architecture knowledge
      • Horizontal Scaling
      • Controlled Data Flow
  33. What Is Enrichment?
      A mechanism to add value to a message to enhance processing in your system
  34. How Do We Enrich
      Diagram labels: NSQ, Broadcast, Consumer A, Raw Event, Enriched Event, Consumer B, Consumer C
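An enrichment consumer in this picture reads a raw event, adds fields, and emits the enriched event for other consumers. A minimal sketch of the transformation step alone, assuming JSON events with a `url` field; `enrich`, `PUBLISHER_BY_DOMAIN`, and the added field names are hypothetical and not taken from the talk.

```python
import json
from urllib.parse import urlsplit

# Hypothetical lookup table; a real system might query a datastore instead.
PUBLISHER_BY_DOMAIN = {"example.com": "Example Media"}


def enrich(raw_body):
    """Return an enriched JSON copy of a raw event; the input is not modified."""
    event = json.loads(raw_body)
    domain = urlsplit(event["url"]).hostname
    enriched = dict(event)
    enriched["domain"] = domain
    enriched["publisher"] = PUBLISHER_BY_DOMAIN.get(domain, "unknown")
    return json.dumps(enriched, sort_keys=True).encode("utf-8")


enriched_body = enrich(b'{"url": "http://example.com/a"}')
```

Keeping the step a pure function of the message makes it easy to test and to re-run when a message is delivered more than once, which at-least-once delivery allows.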
  35. Goals
      • Consistent interfaces between systems
      • Allow access to many toolsets
      • Minimize downtime/Minimize cost of downtime
      • High availability
      • Clients should have minimal architecture knowledge
      • Horizontal Scaling
      • Controlled Data Flow
      • Enrichment
  36. Monitoring / Instrumentation
      • Comes with statsd support built-in
      • Statsd talks to both Graphite and nsqadmin
      • Nsqadmin comes with graphs for message processing stats
      • Nagios plugins available for monitoring topic/channel depth
      • Average end to end latency calculations are done on a per-channel basis
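A depth check in the style of a Nagios plugin comes down to comparing a channel's backlog against two thresholds. The sketch below shows only that comparison, not any of the plugins the slide refers to: `check_depth` and the threshold values are hypothetical, and the depth is passed in as an argument because the slides do not show how it is read from nsqd.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def check_depth(depth, warn_at, crit_at):
    """Map a channel depth to a Nagios-style (status code, message) pair."""
    if depth >= crit_at:
        return CRITICAL, f"CRITICAL - channel depth {depth} >= {crit_at}"
    if depth >= warn_at:
        return WARNING, f"WARNING - channel depth {depth} >= {warn_at}"
    return OK, f"OK - channel depth {depth}"


status, message = check_depth(depth=1200, warn_at=1000, crit_at=5000)
```

A growing depth means consumers on that channel are falling behind the publishers, which is why it is a natural thing to alert on.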
  37. Goals
      • Consistent interfaces between systems
      • Allow access to many toolsets
      • Minimize downtime/Minimize cost of downtime
      • High availability
      • Clients should have minimal architecture knowledge
      • Horizontal Scaling
      • Controlled Data Flow
      • Enrichment
      • Monitoring and Instrumentation
  38. Summary
      • Large Systems are more than just storage
      • Abstraction
      • Highly Available
      • Controlled Data Flow Patterns
      • Monitoring & Automation
  39. We’re Hiring
  40. Questions are guaranteed in life. Answers aren’t. Eric Lubow @elubow elubow@simplereach.com #ddtx14 Thank you.
