Building a system for machine and event-oriented data - Velocity, Santa Clara 2015

building a system for machine and
event-oriented data
e. sammer | @esammer | may 28, 2015
velocity, santa clara, 2015

© 2015 Rocana, Inc. All Rights Reserved.
context

me
• i work here: rocana – cto and cofounder
• i used to work here: cloudera (’10 – ’14), magnetic, experian, …
• i do this: systems / distributed systems (storage, query, messaging, ...)
• i wrote this:
if marie (my editor) is in the room,
yes, i’m hard at work on the second
edition. honest. hi marie.

what we do
• we build a system for the operation of modern data centers
• triage and diagnostics, exploration, trends, advanced analytics of complex
systems
• our data: logs, metrics, human activity, anything that occurs in the data center
• “enterprise software” (i.e. we build for others.)
• today: how we built what we built
[figure: helpful venn diagram of today’s events (coffee!, selling, speaking)]

our typical customer use cases
• ~100K events / sec (8.6B events / day), sub-second end to end latency, full
fidelity retention, critical use cases
• quality of service - “are credit card transactions happening fast enough?”
• fraud detection - “detect, investigate, prosecute, and learn from fraud.”
• forensic diagnostics - “what really caused the outage last friday?”
• security - “who’s doing what, where, when, why, and how, and is that ok?”
• user behavior - “capture and correlate user behavior with system performance,
then feed it to downstream systems in realtime.”

depth: 3 meters

high level architecture

guarantees
• no single point of failure exists
• all components scale horizontally[1]
• data retention and latency are functions of cost, not tech[1]
• every event is delivered provided no more than N - 1 failures occur (where N is
the kafka replication level)
• all operations, including upgrade, are online[2]
• every event is (or appears to be) delivered exactly once[3]
[1] we’re positive there’s a limit, but thus far it has been cost.
[2] from the user’s perspective, at a system level.
[3] when queried via our UI. lots of details here.

events

modeling our world
• everything is an event
• each event contains a timestamp, type, location, host, service, body, and type-
specific attributes (k/v pairs)
• build specialized aggregates as necessary - just optimized views of the data

event schema
{
ts: long,
event_type_id: int,
location: string,
host: string,
service: string,
body: [ null, string ],
attributes: map<string>
}
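
a minimal sketch of the schema above as a python dataclass. the field names come from the slide; the python types, the `validate` helper, and reading `ts` as epoch milliseconds (suggested by the `ts / 60000` minute bins in the query example) are assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Event:
    # field names follow the slide; the python types are assumptions
    ts: int                       # observation time, assumed epoch milliseconds
    event_type_id: int
    location: str
    host: str
    service: str
    body: Optional[str] = None    # [ null, string ] on the slide
    attributes: Dict[str, str] = field(default_factory=dict)


def validate(event: Event) -> None:
    """hypothetical check: attributes must map strings to strings."""
    for key, value in event.attributes.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"attribute {key!r} must map a string to a string")
```

keeping every attribute value a string matches the examples on the slides, where even numeric fields such as `syslog_pid` are quoted.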

event types
• some event types are standard
– syslog, http, log4j, generic text record, …
• users define custom event types
• producers populate event type
• transformations can turn one event type into another
• event type metadata tells downstream systems how to interpret body and
attributes

ex: generic syslog event
event_type_id: 100, // rfc3164, rfc5424 (syslog)
body: … // raw syslog message bytes
attributes: { // extracted fields from body
syslog_message: “DHCPACK from 10.10.0.1 (xid=0x45b63bdc)”,
syslog_severity: “6”, // info severity
syslog_facility: “3”, // daemon facility
syslog_process: “dhclient”,
syslog_pid: “668”,
…
}
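
the severity and facility values above are what you get from a syslog PRI of 30, since rfc3164 and rfc5424 encode PRI as facility * 8 + severity. a small sketch of how a parser could derive the two attributes; `decode_pri` is a hypothetical helper, not the parser used in the product.

```python
def decode_pri(pri: int) -> dict:
    """split a syslog PRI value into facility and severity attributes."""
    # valid PRI values run from 0 to 191 (facility 0-23, severity 0-7)
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    # attribute values are strings, as in the example event
    return {
        "syslog_facility": str(facility),
        "syslog_severity": str(severity),
    }
```

for example, `decode_pri(30)` yields facility "3" (daemon) and severity "6" (info).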

ex: generic http event
event_type_id: 102, // generic http event
body: … // raw http log message bytes
attributes: {
http_req_method: “GET”,
http_req_vhost: “w2a-demo-02”,
http_req_path: “/api/v1/search?q=service%3Asshd&p=1&s=200”,
http_req_query: “q=service%3Asshd&p=1&s=200”,
http_resp_code: “200”,
…
}
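
a sketch of how the http attributes above could be extracted from a request line, using only the python standard library. the attribute names come from the slide; the `http_attributes` function and its parameters are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def http_attributes(method: str, vhost: str, request_target: str, status: int) -> dict:
    """build the attribute map for a generic http event (illustrative only)."""
    parts = urlsplit(request_target)
    return {
        "http_req_method": method,
        "http_req_vhost": vhost,
        # the slide keeps the query string in the path attribute
        "http_req_path": request_target,
        "http_req_query": parts.query,
        "http_resp_code": str(status),
    }


attrs = http_attributes(
    "GET", "w2a-demo-02", "/api/v1/search?q=service%3Asshd&p=1&s=200", 200
)
# downstream code can decode the query on demand
params = parse_qs(attrs["http_req_query"])
```

here `params["q"]` is `["service:sshd"]`, because `parse_qs` percent-decodes the values.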

consumers

consumers
• …do most of the work
• parallelism
• kafka offset management
• message de-duplication
• transformation (embedded library)
• dead letter queue support
• downstream system knowledge
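
a rough, in-memory sketch of how those responsibilities might fit together in one consumer loop. there is no kafka client here; the message id, the `consume` function, and the list-based sink and dead letter queue are all assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple

Message = Tuple[str, bytes]  # (message id, payload); the id is an assumption


def consume(
    messages: List[Message],
    transform: Callable[[bytes], object],
    sink: list,
    dead_letters: list,
    seen: Set[str],
) -> int:
    """process messages in offset order and return the next offset to commit."""
    next_offset = 0
    for offset, (msg_id, payload) in enumerate(messages):
        if msg_id not in seen:  # de-duplication: skip ids already handled
            try:
                sink.append(transform(payload))
            except Exception as exc:
                # dead letter queue: park the bad message instead of stalling
                dead_letters.append((msg_id, payload, repr(exc)))
            seen.add(msg_id)
        # advance only after the message is handled, so a crash replays
        # messages rather than losing them (at least once)
        next_offset = offset + 1
    return next_offset
```

committing the offset after handling means a restart can redeliver messages, which is why the de-duplication step matters.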

inside a consumer

metrics and time series

aggregation
• mostly for time series metrics
• two halves: on write and on query
• data model: (dimensions) => (aggregates)
• on write
– reduce(a: A, b: A): B over window
– store “base” aggregates, all associative and commutative
• on query
– perform same aggregate or build non-associative/commutative aggregates
– group by the same dimensions
– we use SQL (Impala)
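
a small sketch of the two halves. on write, values are folded into base aggregates with a merge that is associative and commutative; on query, a derived aggregate (mean, which cannot be merged directly) is built from the stored sum and count. the `Agg` type and function names are hypothetical, and `last` is left out of this sketch because it depends on arrival order.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Agg:
    count: int
    sum: float
    min: float
    max: float


def observe(value: float) -> Agg:
    """lift a single value into an aggregate."""
    return Agg(1, value, value, value)


def merge(a: Agg, b: Agg) -> Agg:
    """on write: associative and commutative, so partial results combine in any order."""
    return Agg(a.count + b.count, a.sum + b.sum, min(a.min, b.min), max(a.max, b.max))


def mean(a: Agg) -> float:
    """on query: derived from base aggregates rather than stored."""
    return a.sum / a.count


window = reduce(merge, map(observe, [1, 10, 8]))
```

because `merge` does not care about order or grouping, aggregates produced by different consumers (or at different times) for the same dimensions can be combined again at query time.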

aside: late arriving data (it’s a thing)
• never trust a (wall) clock
• producer determines observation time; the rest of the system always uses it
• data that shows up late is always processed according to observation time
• aggregation consequences
– the same time window can appear multiple times
– solution: aggregate every N seconds, potentially generating multiple aggregates for
the same time bin
• this is real and you must deal with it
– do what we did or
– build a system that mutates/replaces aggregates already output (eww) or
– delay aggregate output for some slop time; drop any late data that shows up after that
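
a sketch of the first option: bin by the producer's observation time, flush whatever has accumulated every N seconds, and accept that a late event produces a second aggregate for a bin that was already output. the `WindowedCounter` class is hypothetical and only counts events; the timer that calls `flush` is left out.

```python
from collections import defaultdict
from typing import List, Tuple


class WindowedCounter:
    def __init__(self, window_ms: int = 60_000) -> None:
        self.window_ms = window_ms
        self.pending = defaultdict(int)  # bin start (ms) -> count since last flush

    def add(self, observed_ts_ms: int) -> None:
        # bin by observation time, never by the local wall clock
        bin_start = observed_ts_ms - observed_ts_ms % self.window_ms
        self.pending[bin_start] += 1

    def flush(self) -> List[Tuple[int, int]]:
        """emit everything accumulated since the previous flush."""
        out = sorted(self.pending.items())
        self.pending.clear()
        return out


counter = WindowedCounter()
counter.add(1_000)
counter.add(2_000)
first = counter.flush()    # bin 0 is output
counter.add(61_000)
counter.add(3_000)         # late: belongs to bin 0, which was already output
second = counter.flush()   # bin 0 appears again with its own count
```

nothing already written is mutated; the two rows for bin 0 are simply summed when queried.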

ex: service event volume by host and minute
• dimensions: ts, window, location, host, service, metric
• on write, aggregates: count, sum, min, max, last
• epoch, 60000, us-west-2a, w2a-demo-1, sshd, event_volume =>
17, 42, 1, 10, 8
• on query:
– SELECT floor(ts / 60000) as bin, host, service, metric, sum(value_sum) FROM
events WHERE ts BETWEEN x AND y AND metric = 'event_volume' GROUP BY
bin, host, service, metric
• if late arriving data existed in events, the same dimensions would repeat with
another set of aggregates and would be rolled up as a result of the group by
• tl;dr: normal window aggregation operations
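
a runnable sketch of that rollup. sqlite (from the python standard library) stands in for impala here, integer division replaces `floor()`, and the sample rows are made up for illustration; the second row plays the part of a late aggregate for the same minute.

```python
import sqlite3

QUERY = """
SELECT ts / 60000 AS bin, host, service, metric, sum(value_sum)
FROM events
WHERE ts BETWEEN ? AND ? AND metric = 'event_volume'
GROUP BY bin, host, service, metric
ORDER BY bin
"""


def rollup(rows, start_ms, end_ms):
    """load aggregate rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(ts INTEGER, host TEXT, service TEXT, metric TEXT, value_sum INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (start_ms, end_ms)).fetchall()
    conn.close()
    return result


rows = [
    (0, "w2a-demo-1", "sshd", "event_volume", 42),
    (0, "w2a-demo-1", "sshd", "event_volume", 5),      # late aggregate, same bin
    (60000, "w2a-demo-1", "sshd", "event_volume", 7),
]
result = rollup(rows, 0, 119_999)
```

the two rows for the first minute come back as a single row with a sum of 47.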

extension, pain, and advice

extending the system
• custom producers
• custom consumers
• event types
• parser / transformation plugins
• custom metric definition and aggregate functions
• custom processing jobs on landed data

pain (aka: the struggle is real)
• lots of tradeoffs when picking a stream processing solution
– samza: right features, but low level programming model, not supported by vendors.
missing security features.
– storm: too rigid, too slow. not supported by all Hadoop vendors.
– spark streaming: tons of issues initially, but lots of community energy. improving.
– @digitallogic: “my heart says samza, but my head says spark streaming.”
– our (current) needs are meager; do work inside consumers.
• stack complexity, (relative im)maturity
• scaling solr cloud to billions of events per day

if you’re going to try this…
• read all the literature on stream processing[1]
• treat it like the distributed systems problem it is
• understand, make, and make good on guarantees
• find the right abstractions
• never trust the hand waving or “hello worlds”
• fully evaluate the projects/products in this space
• understand it’s not just about search
[1] wait, like all of it? yea, like all of it.

things I didn’t talk about
• reprocessing data when bad code / transformations are detected
• dealing with data quality issues (“the struggle is real” part 2)
• the user interface and all the fancy analytics
– data visualization and exploration
– event search
– anomalous trend and event detection
– metric, source, and event correlation
– motif finding
– noise reduction and dithering
• event delivery semantics (e.g. at least once, exactly once, etc.)
• alerting

questions?
thank you.
@esammer | esammer@rocana.com
