Messaging, interoperability and log aggregation - a new framework

Tomas Doran (t0m) <bobtfish@bobtfish.net>

Who are you?
• Perl Developer
• Been paid to write perl code for ~14 years

• Open Source hacker
• Catalyst core team
• >160 CPAN dists

• Also C, Javascript, ruby, etc..

Sponsored by
• state51
• PB of MogileFS, 100+ boxes.
• > 4 million tracks on-demand via API
• > 400 reqs/s per server, >1Gb peak from backhaul
• Suretec VOIP Systems
• UK voice over IP provider
• Extensive API, including WebHooks for notifications
• TIM Group
• International Financial apps
• Java / ruby / puppet

What?
• This talk is about my new perl library:
Message::Passing

Why?
• I’d better stop, and explain a specific
problem.
• The solution that grew out of this is more
generic.
• But it illustrates my concerns and design
choices well.
• And everyone likes a story, right?

Once upon a time...

• I was bored of tailing log files across dozens
of servers
• splunk was amazing, but unaffordable

Centralised logging
• Syslog isn’t good enough
• UDP is lossy, TCP not much better
• Limited fields
• No structure to actual message
• RFC3164 - “This document describes the
observed behaviour of the syslog protocol”

Centralised logging
• Structured app logging
• We want to log data, rather than text
from our application
• E.g. HTTP request - vhost, path, time to
generate, N db queries etc..
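
A minimal sketch of "log data, rather than text", in Python purely for illustration (this is not the Message::Passing API). The field names and the make_request_event helper are hypothetical; the point is only that the app emits a hash, serialised as one JSON document per event.

```python
import json
import time


def make_request_event(vhost, path, elapsed_ms, db_queries):
    """Build a log event as data (a dict), not as a formatted string."""
    return {
        "type": "http_request",   # hypothetical field names, for illustration only
        "epochtime": time.time(),
        "vhost": vhost,
        "path": path,
        "elapsed_ms": elapsed_ms,
        "db_queries": db_queries,
    }


event = make_request_event("www.example.com", "/tracks/42", 12.5, 3)
line = json.dumps(event, sort_keys=True)  # one JSON document per event
print(line)
```

Any consumer can then read the fields back without parsing free text.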

Centralised logging
• Post-process log files to re-structure
• We can do this in cases we don’t control
• Apache logs, etc..
• SO MANY DATE FORMATS. ARGHH!!
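
A rough sketch of that post-processing step, again in Python as an illustration only. It assumes Apache's "common" log format and normalises the date to ISO 8601; the parse_common_log name and the output field names are hypothetical.

```python
import re
from datetime import datetime

# Apache "common" log format: host ident user [time] "request" status bytes
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common_log(line):
    """Turn one access log line into a dict, or None if it does not match."""
    m = COMMON_LOG.match(line)
    if m is None:
        return None
    event = m.groupdict()
    # Normalise Apache's own date format to ISO 8601.
    when = datetime.strptime(event.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    event["timestamp"] = when.isoformat()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
event = parse_common_log(sample)
print(event)
```

Every other log source needs its own pattern and its own date format, which is where the pain comes from.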

Centralised logging
• Post-process log files to re-structure
• Publish logs as JSON to a message queue
• JSON is fast, and widely supported
• Great for arbitrary structured data!

Message queue
• Flattens load spikes!
• Only have to keep up with average message
volume, not peak volume.
• Logs are bursty! (Peak rate 1000x average.)
• Easy to scale - just add more consumers
• Allows smart routing
• Great as a common integration point.
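
A toy simulation of "keep up with the average, not the peak", in Python for illustration. The traffic numbers and the simulate function are made up: one burst, then near silence, drained by a consumer that is only slightly faster than the average arrival rate.

```python
from collections import deque


def simulate(arrivals_per_tick, consumer_rate):
    """Return the queue depth after each tick for a fixed-rate consumer."""
    queue = deque()
    depths = []
    for arrivals in arrivals_per_tick:
        queue.extend(range(arrivals))                 # burst of new messages
        for _ in range(min(consumer_rate, len(queue))):
            queue.popleft()                           # consumer drains at a steady rate
        depths.append(len(queue))
    return depths


# Hypothetical traffic: one burst of 1000 messages, then 1 message per tick.
traffic = [1000] + [1] * 19          # average is about 51 messages per tick
depths = simulate(traffic, consumer_rate=60)
print(depths)
```

The queue absorbs the burst and the backlog drains to zero, even though the consumer never handles more than 60 messages per tick.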

elasticsearch
• Just tip JSON documents into it
• Figures out type for each field, indexes
appropriately.
• Free sharding and replication
• Histograms!

Histograms!
• elasticsearch does ‘big data’, not just text
search.
• Ask arbitrary questions
• Get back aggregate metrics / counts
• Very powerful.

Logstash
In JRuby, by Jordan Sissel

Simple: Input, Filter, Output

Flexible
Extensible
Plays well with others
Nice web interface

Logstash
INPUT -> FILTER -> OUTPUT

Java (JRuby) decoding AMQP is, however,
much much faster than perl doing that...

JVM+-

Logstash on each host
is totally out...

• Running it on elasticsearch servers which
are already dedicated to this is fine..
• I’d still like to reuse all of its parsing

This talk
• Is about my new library: Message::Passing
• The clue is in the name...
• Hopefully really simple
• Maybe even useful!

Wait a second!

• My app logs are already structured!
• Why don’t I just publish AMQP from the
app?

Good question!
• I tried that.
• App logging relies on RabbitMQ being up
• Adds a single point of failure.
• Logging isn’t that important!
• ZeroMQ to the rescue

ZeroMQ has the
correct semantics
• Pub/Sub sockets
• Never, ever blocking
• Lossy! (If needed)
• Buffer sizes / locations configurable
• Arbitrary message size
• IO done in a background thread
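
To make "never blocking" and "lossy" concrete, here is a plain Python stand-in. It is not the ZeroMQ API; LossyBuffer, high_water_mark and drain are hypothetical names. The publisher returns immediately, and once the bounded buffer is full, new messages are dropped rather than stalling the app.

```python
from collections import deque


class LossyBuffer:
    """Bounded send buffer: publish() never blocks, it drops when full."""

    def __init__(self, high_water_mark):
        self.high_water_mark = high_water_mark
        self.pending = deque()
        self.dropped = 0

    def publish(self, message):
        if len(self.pending) >= self.high_water_mark:
            self.dropped += 1          # lossy: the app carries on regardless
            return False
        self.pending.append(message)
        return True

    def drain(self, limit):
        """Stand-in for the background IO thread sending queued messages."""
        sent = []
        while self.pending and len(sent) < limit:
            sent.append(self.pending.popleft())
        return sent


buf = LossyBuffer(high_water_mark=3)
results = [buf.publish({"n": n}) for n in range(5)]
sent = buf.drain(limit=10)
print(results, buf.dropped, sent)
```

If the collector is down, the app loses some log lines but keeps serving requests, which is the trade-off wanted here.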

On host log collector
• ZeroMQ SUB socket
• App logs - pre structured
• Syslog listener
• Forward rsyslogd
• Log file tailer
• Ship to AMQP

Let’s make it generic!
• So, I wanted a log shipper
• I ended up with a framework for messaging
interoperability
• Whoops!
• Got sick of writing scripts..

Events - my model for
message passing
• a hash {}
• Output consumes events:
• method consume ($event) { ...
• Input produces events:
• has output_to => (..
• Filter does both

Simplifying assumption

$self->output_to->consume($message)
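
The same model sketched in Python, as an illustration rather than the library's actual classes: Collector, AddHostname and ListInput are hypothetical names. An output has consume, an input has output_to, and a filter has both.

```python
class Collector:
    """Output: consumes events (here it just remembers them)."""

    def __init__(self):
        self.events = []

    def consume(self, event):
        self.events.append(event)


class AddHostname:
    """Filter: consumes an event, then hands a modified copy onwards."""

    def __init__(self, hostname, output_to):
        self.hostname = hostname
        self.output_to = output_to

    def consume(self, event):
        self.output_to.consume(dict(event, hostname=self.hostname))


class ListInput:
    """Input: produces events and pushes each one to output_to."""

    def __init__(self, events, output_to):
        self.events = events
        self.output_to = output_to

    def run(self):
        for event in self.events:
            self.output_to.consume(event)


sink = Collector()
chain = ListInput([{"message": "hello"}],
                  output_to=AddHostname("web1", output_to=sink))
chain.run()
print(sink.events)
```

Each component only knows about the next one in the chain, so any input can be wired to any filter or output.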

That’s it.
• No, really - that’s all the complexity you
have to care about!
• Except for the complexity introduced by
the inputs and outputs you use.
• Unified attribute names / reconnection
model, etc.. This helps, somewhat..

Inputs and outputs
• ZeroMQ In / Out
• AMQP (RabbitMQ) In / Out
• STOMP (ActiveMQ) In / Out
• elasticsearch Out
• Redis PubSub In/Out
• Syslog In
• HTTP POST (“WebHooks”) Out

DSL
• Building more complex
chains easy!
• Multiple inputs
• Multiple outputs
• Multiple independent chains

CLI

• 1 Input
• 1 Output
• 1 Filter (default Null)

• For simple use, or testing.

CLI

• Encode / Decode step is just a Filter
• JSON by default
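
"Encode / Decode is just a Filter" can be sketched like this, in Python for illustration. JSONDecoder, JSONEncoder and Collector are hypothetical names, not the library's classes; each one consumes something and passes the converted form to output_to.

```python
import json


class Collector:
    """Output that remembers whatever it is given."""

    def __init__(self):
        self.items = []

    def consume(self, item):
        self.items.append(item)


class JSONDecoder:
    """Filter: consumes a JSON string, passes on the decoded hash."""

    def __init__(self, output_to):
        self.output_to = output_to

    def consume(self, raw):
        self.output_to.consume(json.loads(raw))


class JSONEncoder:
    """Filter: consumes a hash, passes on its JSON encoding."""

    def __init__(self, output_to):
        self.output_to = output_to

    def consume(self, event):
        self.output_to.consume(json.dumps(event, sort_keys=True))


wire = Collector()
JSONEncoder(output_to=wire).consume({"b": 2, "a": 1})

decoded = Collector()
JSONDecoder(output_to=decoded).consume(wire.items[0])
print(wire.items, decoded.items)
```

Swapping JSON for another serialisation then means swapping one filter, with no change to the input or output.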

Questions?

I can build my log shipper without using half
a GB of RAM.

Questions?

I built my log shipper.

Does this actually
work?
• YES - In production at two sites.
• Some of the adaptors are partially
complete
• Dumber than logstash - no multiple
threads/cores
• ZeroMQ is insanely fast

Other people are using
it in production!

Two people I know of are already writing adaptors!

What about logstash?
• Use my lightweight code on end nodes.
• Use ‘proper’ logstash for parsing/filtering
on the dedicated hardware (elasticsearch
boxes)
• Filter to change my hashes to logstash
compatible hashes
• For use with MooseX::Storage and/or
Log::Message::Structured

Other applications

• Anywhere an asynchronous event stream is
useful!
• Monitoring
• Metrics transport
• Queued jobs

Other applications
(Web stuff)

• User activity (ajax ‘what are your users
doing’)
• WebSockets / MXHR
• HTTP Push notifications - “WebHooks”

WebHooks

• HTTP PUSH notification
• E.g. Paypal IPN
• Shopify API

Messaging patterns
• Pub / Sub (AMQP / STOMP / Redis / ZMQ)
• Round robin (AMQP / STOMP / Redis /
ZMQ)
• Partial subscribe - ‘routing keys’
• AMQP - Best at this, wildcards anywhere
• Redis - wildcards as suffix
• ZMQ - Exact match
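
For the AMQP case, topic-exchange matching works on dot-separated words: "*" matches exactly one word and "#" matches zero or more. A small Python sketch of that rule follows; topic_matches is a hypothetical helper written for illustration, not part of any client library.

```python
def topic_matches(pattern, routing_key):
    """AMQP topic-style match: '*' is one word, '#' is zero or more words."""

    def match(p, k):
        if not p:
            return not k                      # pattern used up: key must be too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow zero or more words of the key
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


print(topic_matches("logs.*.error", "logs.web1.error"))
print(topic_matches("logs.#", "logs.web1.db.error"))
```

Because the wildcards can sit anywhere in the pattern, a consumer can subscribe to, say, errors from every host without knowing the host names.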

Demo?
• The last demo wasn’t silly enough!
• How could I top that?
• Plan - Re-invent mongrel2
• Badly

PSGI
• PSGI $env is basically just a hash.
• (With a little fiddling), you can serialize it as
JSON
• PSGI response is just an array.
• Ignore streaming responses!
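
A sketch of the "little fiddling", in Python for illustration only. The env below is a plain dict using real PSGI key names, with an in-memory stream standing in for the input filehandle; serialisable_env is a hypothetical helper that simply drops top-level values JSON cannot carry.

```python
import io
import json

# Types that json.dumps can encode directly (checked at the top level only).
JSON_SAFE = (str, int, float, bool, type(None), list, dict)


def serialisable_env(env):
    """Drop values JSON cannot carry, such as the request body handle."""
    return {k: v for k, v in env.items() if isinstance(v, JSON_SAFE)}


env = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/hello",
    "psgi.url_scheme": "http",
    "psgi.input": io.BytesIO(b""),   # a filehandle in real PSGI: not JSON-safe
}
wire_request = json.dumps(serialisable_env(env), sort_keys=True)

# A non-streaming response: status, flat list of header pairs, body chunks.
response = [200, ["Content-Type", "text/plain"], ["Hello"]]
wire_response = json.dumps(response)
print(wire_request)
print(wire_response)
```

Both directions are now plain JSON messages, so they can travel over any of the transports above.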

PUSH socket does fan
out between multiple
handlers.

Reply to address
embedded in request

Code
• https://metacpan.org/module/Message::Passing
• https://github.com/suretec/Message-Passing
• #message-passing on irc.perl.org
• Demo examples:
• git://gist.github.com/2941747.git
