Heka
Unified Data Processing
So. Much. Data.
• Server level ops data
• Process level data
• Ops data / metrics
• Business data
• Logging output
• Error reports / tracebacks
So. Many. Tools.
• collectd / tcollector
• statsd / graphite / etc.
• [r]syslog[-ng]
• Logstash
• Riemann / Esper / other CEP
• Nagios / Zenoss
One Basic Pattern
• Acquire data
• Transform and/or Transport data
• Output data
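
As a rough illustration of the pattern above, the three stages can be written as chained generators. This is a minimal Python sketch, not Heka code: the function names, the `name=value` record format, and the `sink` callback are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator


def acquire(lines: Iterable[str]) -> Iterator[str]:
    """Acquire: yield raw records from a source (here, any iterable of lines)."""
    for line in lines:
        yield line.rstrip("\n")


def transform(records: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Transform: turn each raw 'name=value' record into a structured event."""
    for record in records:
        name, _, value = record.partition("=")
        yield {"name": name.strip(), "value": value.strip()}


def output(events: Iterable[Dict[str, str]], sink: Callable[[Dict[str, str]], None]) -> int:
    """Output: hand each event to a sink and return how many were delivered."""
    delivered = 0
    for event in events:
        sink(event)
        delivered += 1
    return delivered


def run_pipeline(lines: Iterable[str], sink: Callable[[Dict[str, str]], None]) -> int:
    """Wire the three stages together."""
    return output(transform(acquire(lines)), sink)
```

Calling `run_pipeline(open("metrics.txt"), print)` would stream a (hypothetical) file through all three stages; swapping the source or the sink leaves the middle stage untouched, which is the point of treating the pattern as one pipeline.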
One Multi-Tool?
What would it be like to build a tool to tackle this in the general case?
Wins:
• Fewer processes to manage
• Increased client / configuration consistency
• Processing shared across domains
One Multi-Tool?

Requirements:
• Lightweight
• Flexible and configurable
• Easily extended
I know, I know...
BUT!

Replacing even two services on each box is a net ops win.
SCIENCE!
How Heka Is Put Together
Inputs

• Listen or fetch
• Just about the low level transport
Splitters
• Slice Inputs' raw data streams into discrete events
• Text or binary protocols
• Decouple protocols from their transports
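
To make the splitting step concrete, here is a minimal Python sketch of a delimiter-based splitter. It is an illustration only, not Heka's implementation: the class name `DelimiterSplitter` and its `feed` method are invented for the example. It buffers whatever bytes arrive and emits only complete records, so the same splitter works whether the chunks came from a TCP socket, a UDP packet, or a file.

```python
from typing import List


class DelimiterSplitter:
    """Slice an arbitrary byte stream into discrete, delimiter-terminated records."""

    def __init__(self, delimiter: bytes = b"\n") -> None:
        if not delimiter:
            raise ValueError("delimiter must not be empty")
        self.delimiter = delimiter
        self._buffer = b""  # bytes of a record that has not been terminated yet

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk from any transport; return the records completed by it."""
        self._buffer += chunk
        # Everything before the last delimiter is complete; the tail is kept
        # in the buffer until a later chunk terminates it.
        *records, self._buffer = self._buffer.split(self.delimiter)
        return records
```

A record that straddles two chunks is held back until its delimiter arrives, which is what lets the protocol (framing) stay independent of how the transport happens to deliver bytes.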
Decoders
• Parse event data to populate a metadata envelope for all event types
• Extract structure from unstructured data...
• ... or just wrap a blob
• Sandbox-able (Lua)
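
The two decoder behaviours above (extract structure, or just wrap a blob) can be sketched in a few lines of Python. This is a hedged illustration rather than a real Heka decoder, which would be written in Go or sandboxed Lua. The envelope keys reuse the field names that appear in the router examples (Type, Logger, Severity, Payload, Fields); the `LEVEL logger: text` line format, the "raw" type, and the default severity of 7 are assumptions made for the example.

```python
import re
from typing import Any, Dict

# Hypothetical log format: "LEVEL logger: free text"
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) (?P<logger>[\w.]+): (?P<text>.*)$")

# Standard syslog severity numbers for the levels this sketch understands.
SEVERITY = {"ERROR": 3, "WARNING": 4, "INFO": 6, "DEBUG": 7}


def decode(raw: str) -> Dict[str, Any]:
    """Populate a common envelope from one raw event."""
    match = LINE_RE.match(raw)
    if match is None or match["level"] not in SEVERITY:
        # Unrecognised data: wrap the blob untouched so it can still be routed.
        return {"Type": "raw", "Logger": "", "Severity": 7, "Payload": raw, "Fields": {}}
    # Recognised data: lift the structure out of the text.
    return {
        "Type": "applog",
        "Logger": match["logger"],
        "Severity": SEVERITY[match["level"]],
        "Payload": match["text"],
        "Fields": {"level": match["level"]},
    }
```

Because every event ends up in the same envelope, later stages can match on `Type` or `Logger` without knowing which decoder produced the message.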
Router
Simple, efficient grammar for matching messages:

Type == "counter" && Payload == "1"
Type == "applog" && Logger == "marketplace"
Type == "alert" && (Severity == 7 || Payload == "emergency")
Type == "myapp.metric" && Fields[name] =~ /.*\.stat/
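
To show what these matchers select, the sketch below hand-translates the four expressions into Python predicates and routes a message to every subscriber whose predicate holds. It does not parse the matcher grammar, and the subscriber names are hypothetical. The regular expression is written with an escaped dot, on the assumption that a literal ".stat" is intended, and `=~` is assumed to be an unanchored search.

```python
import re
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Matcher = Callable[[Message], bool]

STAT_NAME = re.compile(r".*\.stat")

# One predicate per matcher expression; the keys are made-up subscriber names.
MATCHERS: Dict[str, Matcher] = {
    "counter": lambda m: m.get("Type") == "counter" and m.get("Payload") == "1",
    "marketplace_log": lambda m: m.get("Type") == "applog"
    and m.get("Logger") == "marketplace",
    "alert": lambda m: m.get("Type") == "alert"
    and (m.get("Severity") == 7 or m.get("Payload") == "emergency"),
    "stat_metric": lambda m: m.get("Type") == "myapp.metric"
    and STAT_NAME.search(str(m.get("Fields", {}).get("name", ""))) is not None,
}


def route(message: Message) -> List[str]:
    """Return the names of all subscribers whose matcher accepts the message."""
    return [name for name, matches in MATCHERS.items() if matches(message)]
```

A message can match several subscribers or none, so routing is a fan-out over independent predicates rather than a first-match dispatch.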
Filters

• Watch flowing data
• Generate output messages
• Sandbox-able (Lua)
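
A filter in this sense can be pictured as a small stateful object: it sees each matching message, and every so often it emits a new message of its own. The Python sketch below is an assumption-laden illustration, not the sandbox API. The class name, the method names, and the "logger.counts" message type are all invented for the example.

```python
from collections import Counter
from typing import Any, Dict


class CountByLogger:
    """Count messages per Logger and summarise the counts on each timer tick."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def process_message(self, message: Dict[str, Any]) -> None:
        """Watch one message as it flows past."""
        self._counts[message.get("Logger", "")] += 1

    def timer_event(self) -> Dict[str, Any]:
        """Generate an output message with the counts seen since the last tick."""
        payload = "\n".join(
            f"{logger}\t{count}" for logger, count in sorted(self._counts.items())
        )
        self._counts.clear()
        return {"Type": "logger.counts", "Payload": payload, "Fields": {}}
```

The summary message would then go back through the router like any other message, so outputs or other filters can subscribe to it.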
Outputs
• Deliver to external service...
• … and/or to upstream Heka...
• … and/or directly to Heka Dashboard UI
• Configurable reconnect
Sandboxes Are Fun!
• Dynamically added to running Heka w/ no config changes, no restart
• CPU cycles and RAM usage monitored
• Misbehaving plugins are shut off
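
The "monitor, then shut off" idea can be sketched with a wrapper that stops calling a plugin once it misbehaves. This Python example is only an analogy: it swaps in a wall-clock budget checked after each call (plus exception handling) for the CPU-cycle and RAM accounting described above, so unlike a real sandbox it cannot interrupt a plugin that never returns. The class name `GuardedPlugin` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable


class GuardedPlugin:
    """Wrap a plugin callable and shut it off after its first violation."""

    def __init__(
        self,
        func: Callable[[Any], Any],
        max_seconds: float,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self._func = func
        self._max_seconds = max_seconds
        self._clock = clock
        self.running = True
        self.reason = ""

    def _terminate(self, reason: str) -> None:
        self.running = False
        self.reason = reason

    def __call__(self, message: Any) -> Any:
        if not self.running:
            return None  # plugin was shut off; the host keeps running
        start = self._clock()
        try:
            result = self._func(message)
        except Exception as exc:
            self._terminate(f"raised {type(exc).__name__}")
            return None
        if self._clock() - start > self._max_seconds:
            self._terminate("exceeded time budget")
            return None
        return result
```

The host process stays up in every case; only the offending plugin stops receiving messages.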
Sandboxes Are Fun!
• LPeg (parsing expression grammar) & JSON libraries for data parsing
• Circular buffer library for time series data
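
For readers unfamiliar with circular buffers for time series, here is a minimal single-column sketch in Python. It is not the sandbox library and makes no claim about that library's API: the class, its method names, and the choice to sum values within a row are assumptions. Each row covers a fixed number of seconds, and old rows are zeroed and reused as time advances, so memory use stays constant.

```python
from typing import List


class CircularBuffer:
    """Fixed-size time series: one row per interval, oldest rows overwritten."""

    def __init__(self, rows: int, seconds_per_row: int) -> None:
        if rows <= 0 or seconds_per_row <= 0:
            raise ValueError("rows and seconds_per_row must be positive")
        self.rows = rows
        self.seconds_per_row = seconds_per_row
        self._values = [0.0] * rows
        self._newest = 0  # interval number (timestamp // seconds_per_row) of newest row

    def add(self, timestamp: int, value: float) -> bool:
        """Add value to the row covering timestamp; False if it is too old."""
        interval = timestamp // self.seconds_per_row
        if interval > self._newest:
            # Time moved forward: zero every row that is about to be reused.
            steps = min(interval - self._newest, self.rows)
            for i in range(1, steps + 1):
                self._values[(self._newest + i) % self.rows] = 0.0
            self._newest = interval
        elif interval <= self._newest - self.rows:
            return False  # older than anything the buffer still holds
        self._values[interval % self.rows] += value
        return True

    def series(self) -> List[float]:
        """Return the rows in chronological order, oldest first."""
        oldest = self._newest - self.rows + 1
        return [self._values[i % self.rows] for i in range(oldest, self._newest + 1)]
```

With `CircularBuffer(rows=1440, seconds_per_row=60)` the buffer would always hold exactly the last 24 hours at one-minute resolution, which is the kind of data a dashboard graph can be drawn from directly.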
Sandboxes Are Fun!
Circular buffers auto-generate dashboard graphs
Try It Out
https://github.com/mozilla-services/heka
http://hekad.readthedocs.org
https://mail.mozilla.org/listinfo/heka
irc.mozilla.org, #heka
rmiller@mozilla.com

Heka - Rob Miller