building a system for machine and
event-oriented data
e. sammer | @esammer | september 9, 2015
silicon valley data engineering meetup
© 2015 Rocana, Inc. All Rights Reserved.
context
me
• i work here: rocana – cto and cofounder
• i used to work here: cloudera (‘10 – ’14), magnetic, experian, …
• i do this: systems / distributed systems (storage, query, messaging, ...)
• i wrote this:
what we do
• we build a system for the operation of modern data centers
• triage and diagnostics, exploration, trends, advanced analytics of complex
systems
• our data: logs, metrics, human activity, anything that occurs in the data center
• “enterprise software” (i.e. we build for others.)
• today: how we built what we built
our typical customer use cases
• >100K events / sec (8.6B events / day), sub-second end to end latency, full
fidelity retention, critical use cases
• quality of service - “are credit card transactions happening fast enough?”
• fraud detection - “detect, investigate, prosecute, and learn from fraud.”
• forensic diagnostics - “what really caused the outage last friday?”
• security - “who’s doing what, where, when, why, and how, and is that ok?”
• user behavior - “capture and correlate user behavior with system performance,
then feed it to downstream systems in realtime.”
depth: 3 meters
high level architecture
guarantees
• no single point of failure exists
• all components scale horizontally[1]
• data retention and latency are functions of cost, not tech[1]
• every event is delivered provided no more than N - 1 failures occur (where N is
the kafka replication level)
• all operations, including upgrade, are online[2]
• every event is (or appears to be) delivered exactly once[3]
[1] we’re positive there’s a limit, but thus far it has been cost.
[2] from the user’s perspective, at a system level.
[3] when queried via our UI. lots of details here.
events
modeling our world
• everything is an event
• each event contains a timestamp, type, location, host, service, body, and
type-specific attributes (k/v pairs)
• build specialized aggregates as necessary - just optimized views of the data
event schema
{
id: string,
ts: long,
event_type_id: int,
location: string,
host: string,
service: string,
body: [ null, string ],
attributes: map<string>
}
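A minimal sketch of the schema above as a Python record type. The field names come from the slide; the class name, the default values, and the reading of ts as epoch milliseconds (suggested by the 60000 minute window used later in the deck) are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Event:
    """Illustrative mirror of the event schema; not the actual Rocana type."""
    id: str
    ts: int                      # observation time; epoch milliseconds is an assumption
    event_type_id: int
    location: str
    host: str
    service: str
    body: Optional[str] = None   # [ null, string ] in the schema
    attributes: Dict[str, str] = field(default_factory=dict)  # string -> string map


# Hypothetical values, loosely based on the syslog example in this deck.
event = Event(
    id="e-1",
    ts=0,
    event_type_id=100,
    location="us-west-2a",
    host="w2a-demo-1",
    service="dhclient",
    attributes={"syslog_severity": "6"},
)
```

Keeping attributes as a flat string map leaves interpretation to the event type metadata, which is how the deck describes downstream systems reading body and attributes.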
event types
• some event types are standard
– syslog, http, log4j, generic text record, …
• users define custom event types
• producers populate event type
• transformations can turn one event type into another
• event type metadata tells downstream systems how to interpret body and
attributes
ex: generic syslog event
event_type_id: 100, // rfc3164, rfc5424 (syslog)
body: … // raw syslog message bytes
attributes: { // extracted fields from body
syslog_message: "DHCPACK from 10.10.0.1 (xid=0x45b63bdc)",
syslog_severity: "6", // info severity
syslog_facility: "3", // daemon facility
syslog_process: "dhclient",
syslog_pid: "668",
…
}
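The severity and facility attributes above are the two halves of the syslog PRI header, where PRI = facility * 8 + severity. A small sketch of that extraction follows; the function name and the sample line are hypothetical, and this is not a full rfc3164 / rfc5424 parser.

```python
import re

# Leading "<PRI>" header shared by rfc3164 and rfc5424 messages.
_PRI = re.compile(r"^<(\d{1,3})>")


def syslog_attributes(raw: str) -> dict:
    """Extract facility and severity from the PRI header of a raw syslog line."""
    match = _PRI.match(raw)
    if match is None:
        raise ValueError("no PRI header found")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    # Values are kept as strings to match the k/v attribute map.
    return {"syslog_facility": str(facility), "syslog_severity": str(severity)}


# Hypothetical raw line: daemon facility (3), info severity (6) gives PRI 30.
sample = "<30>dhclient[668]: DHCPACK from 10.10.0.1 (xid=0x45b63bdc)"
attrs = syslog_attributes(sample)
```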
ex: generic http event
event_type_id: 102, // generic http event
body: … // raw http log message bytes
attributes: {
http_req_method: "GET",
http_req_vhost: "w2a-demo-02",
http_req_path: "/api/v1/search?q=service%3Asshd&p=1&s=200",
http_req_query: "q=service%3Asshd&p=1&s=200",
http_resp_code: "200",
…
}
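One way the attributes above could be derived from a parsed access log line, sketched with Python's standard urllib.parse. The function name and its parameters are assumptions; the attribute keys and sample values are the ones on the slide.

```python
from urllib.parse import parse_qs, urlsplit


def http_attributes(method: str, vhost: str, target: str, status: int) -> dict:
    """Build the k/v attribute map for a generic http event (illustrative only)."""
    return {
        "http_req_method": method,
        "http_req_vhost": vhost,
        "http_req_path": target,                   # the slide keeps the query string here
        "http_req_query": urlsplit(target).query,  # everything after the '?'
        "http_resp_code": str(status),             # attribute values are strings
    }


attrs = http_attributes(
    "GET", "w2a-demo-02", "/api/v1/search?q=service%3Asshd&p=1&s=200", 200
)
# Downstream code can decode the individual query parameters when needed.
params = parse_qs(attrs["http_req_query"])
```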
consumers
consumers
• …do most of the work
• parallelism
• kafka offset management
• message de-duplication
• transformation (embedded library)
• dead letter queue support
• downstream system knowledge
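A rough, in-memory sketch of three of the responsibilities listed above: de-duplication, transformation, and dead letter handling. The deck does not show the real consumer code, so every name here is hypothetical; a real consumer would read from kafka, commit offsets only after delivery, and keep its de-duplication state somewhere durable.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Message = Dict[str, str]


def process_batch(
    messages: Iterable[Message],
    transform: Callable[[Message], Message],
    seen_ids: Set[str],
) -> Tuple[List[Message], List[Tuple[Message, str]]]:
    """Return (delivered, dead_letters) for one batch of messages."""
    delivered: List[Message] = []
    dead_letters: List[Tuple[Message, str]] = []
    for msg in messages:
        event_id = msg["id"]
        if event_id in seen_ids:
            continue  # duplicate delivery, already handled
        seen_ids.add(event_id)
        try:
            delivered.append(transform(msg))
        except Exception as exc:
            # Park the message with the failure reason instead of blocking the stream.
            dead_letters.append((msg, repr(exc)))
    return delivered, dead_letters


def upper_host(msg: Message) -> Message:
    """Toy transformation; raises KeyError when 'host' is missing."""
    return {**msg, "host": msg["host"].upper()}


batch = [{"id": "a", "host": "h1"}, {"id": "a", "host": "h1"}, {"id": "b"}]
delivered, dead = process_batch(batch, upper_host, set())
```

In this toy batch the second copy of event "a" is skipped and event "b" lands in the dead letter list because the transformation cannot find a host.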
inside a consumer
metrics and time series
aggregation
• mostly for time series metrics
• two halves: on write and on query
• data model: (dimensions) => (aggregates)
• on write
– reduce(a: A, b: A) => A over window
– store “base” aggregates, all associative and commutative
• on query
– perform same aggregate or derivative aggregates
– group by the same dimensions
– we use SQL (Impala)
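The on-write half, reduce(a: A, b: A) => A, can be sketched as a merge over a small record of base aggregates. This is an illustration under assumed names (Agg, merge), not the actual implementation; it covers count, sum, min, and max, each of which is associative and commutative, so partial results can be combined in any order.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Agg:
    """Base aggregates for one set of dimensions over one window."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(value: float) -> "Agg":
        return Agg(1, value, value, value)


def merge(a: Agg, b: Agg) -> Agg:
    """reduce(a: A, b: A) => A; order and grouping of merges do not matter."""
    return Agg(
        a.count + b.count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


values = [1, 10, 3]
forward = reduce(merge, map(Agg.of, values))
backward = reduce(merge, map(Agg.of, reversed(values)))
# A derivative aggregate computed at query time from the stored base aggregates.
mean = forward.total / forward.count
```

Storing only base aggregates is what lets the query side compute derivatives such as a mean without revisiting the raw events.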
aside: late arriving data (it’s a thing)
• never trust a (wall) clock
• producer determines observation time, rest of the system uses this always
• data that shows up late always processed according to observation time
• aggregation consequences
– the same time window can appear multiple times
– solution: aggregate every N seconds, potentially generating multiple aggregates for
the same time bin
• this is real and you must deal with it
– do what we did or
– build a system that mutates/replaces aggregates already output (eww) or
– delay aggregate output for some slop time; drop it if late data shows up
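A sketch of the first option, under assumed names: events are binned by the producer's observation time, and a periodic flush emits whatever has accumulated. If a late event arrives after its bin has already been flushed, the next flush simply emits a second row for the same bin. The 60 second window and the (bin, count, sum) row shape are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_MS = 60_000  # assumed one-minute bins, in milliseconds


class WindowAggregator:
    """Accumulates (count, sum) per time bin until the next flush."""

    def __init__(self) -> None:
        self._open: Dict[int, List[int]] = defaultdict(lambda: [0, 0])

    def observe(self, ts_ms: int, value: int) -> None:
        # Bin by observation time from the producer, never by arrival time.
        bin_start = ts_ms - ts_ms % WINDOW_MS
        slot = self._open[bin_start]
        slot[0] += 1
        slot[1] += value

    def flush(self) -> List[Tuple[int, int, int]]:
        """Called every N seconds; emits (bin_start, count, sum) rows."""
        rows = [(b, c, s) for b, (c, s) in sorted(self._open.items())]
        self._open.clear()
        return rows


agg = WindowAggregator()
agg.observe(1_000, 5)
agg.observe(2_000, 7)
first = agg.flush()    # one row for bin 0
agg.observe(3_000, 1)  # late event for bin 0, arrives after the flush
agg.observe(61_000, 4)
second = agg.flush()   # bin 0 appears again, next to bin 60000
```

Nothing already written is mutated; the repeated rows for a bin are reconciled when the data is queried.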
ex: service event volume by host and minute
• dimensions: ts, window, location, host, service, metric
• on write, aggregates: count, sum, min, max, last
• epoch, 60000, us-west-2a, w2a-demo-1, sshd, event_volume =>
17, 42, 1, 10, 8
• on query:
– SELECT floor(ts / 60000) as bin, host, service, metric, sum(value_sum) FROM
events WHERE ts BETWEEN x AND y AND metric = 'event_volume' GROUP BY
bin, host, service, metric
• if late arriving data existed in events, the same dimensions would repeat with
another set of aggregates and would be rolled up as a result of the group by
• tl;dr: normal window aggregation operations
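The same roll-up expressed in plain Python, to show why repeated rows for a time bin are harmless at query time. The row layout and the sample values are hypothetical; only the column names and the grouping logic follow the query on this slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[int, str, str, str]


def rollup(rows: Iterable[dict], start: int, end: int, metric: str) -> Dict[Key, int]:
    """Sum value_sum grouped by (bin, host, service, metric), like the GROUP BY."""
    totals: Dict[Key, int] = defaultdict(int)
    for row in rows:
        if start <= row["ts"] <= end and row["metric"] == metric:
            key = (row["ts"] // 60000, row["host"], row["service"], row["metric"])
            totals[key] += row["value_sum"]
    return dict(totals)


rows = [
    # Original aggregate for the minute starting at ts = 0.
    {"ts": 0, "host": "w2a-demo-1", "service": "sshd",
     "metric": "event_volume", "value_sum": 42},
    # Aggregate written later for the same minute, produced by late arriving data.
    {"ts": 0, "host": "w2a-demo-1", "service": "sshd",
     "metric": "event_volume", "value_sum": 3},
    {"ts": 60000, "host": "w2a-demo-1", "service": "sshd",
     "metric": "event_volume", "value_sum": 5},
]
result = rollup(rows, 0, 119_999, "event_volume")
```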
extension, pain, and advice
extending the system
• custom producers
• custom consumers
• event types
• parser / transformation plugins
• custom metric definition and aggregate functions
• custom processing jobs on landed data
pain (aka: the struggle is real)
• lots of tradeoffs when picking a stream processing solution
– samza: right features, but low level programming model, not supported by vendors.
missing security features.
– storm: too rigid, too slow. not supported by all Hadoop vendors.
– spark streaming: tons of issues initially, but lots of community energy. improving.
– @digitallogic: “my heart says samza, but my head says spark streaming.”
– our (current) needs are meager; do work inside consumers.
• stack complexity, (relative im)maturity
• scaling solr cloud to billions of events per day
if you’re going to try this…
• read all the literature on stream processing[1]
• treat it like the distributed systems problem it is
• understand, make, and make good on guarantees
• find the right abstractions
• never trust the hand waving or “hello worlds”
• fully evaluate the projects/products in this space
• understand it’s not just about search
[1] wait, like all of it? yea, like all of it.
things I didn’t talk about
• reprocessing data when bad code / transformations are detected
• dealing with data quality issues (“the struggle is real” part 2)
• the user interface and all the fancy analytics
– data visualization and exploration
– event search
– anomalous trend and event detection
– metric, source, and event correlation
– motif finding
– noise reduction and dithering
• event delivery semantics (e.g. at least once, exactly once, etc.)
• alerting
questions?
thank you.
@esammer | esammer@rocana.com

Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessTreasure Data, Inc.
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...Treasure Data, Inc.
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataTreasure Data, Inc.
 

More from Treasure Data, Inc. (14)

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for Marketers
 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and Market
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
 
Hands On: Javascript SDK
Hands On: Javascript SDKHands On: Javascript SDK
Hands On: Javascript SDK
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with Data
 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without Data
 
Connecting the Customer Data Dots
Connecting the Customer Data DotsConnecting the Customer Data Dots
Connecting the Customer Data Dots
 
Harnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company Success
 
Scalable Hadoop in the cloud
Scalable Hadoop in the cloudScalable Hadoop in the cloud
Scalable Hadoop in the cloud
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_data
 
Introduction to Hivemall
Introduction to HivemallIntroduction to Hivemall
Introduction to Hivemall
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Building a system for machine and event-oriented data with Rocana

  • 1. building a system for machine and event-oriented data e. sammer | @esammer | september 9, 2015 silicon valley data engineering meetup
  • 2. © 2015 Rocana, Inc. All Rights Reserved. context
  • 3. me
      • i work here: rocana – cto and cofounder
      • i used to work here: cloudera (’10 – ’14), magnetic, experian, …
      • i do this: systems / distributed systems (storage, query, messaging, ...)
      • i wrote this:
  • 4. what we do
      • we build a system for the operation of modern data centers
      • triage and diagnostics, exploration, trends, advanced analytics of complex systems
      • our data: logs, metrics, human activity, anything that occurs in the data center
      • “enterprise software” (i.e. we build for others.)
      • today: how we built what we built
  • 5. our typical customer use cases
      • >100K events / sec (8.6B events / day), sub-second end to end latency, full fidelity retention, critical use cases
      • quality of service - “are credit card transactions happening fast enough?”
      • fraud detection - “detect, investigate, prosecute, and learn from fraud.”
      • forensic diagnostics - “what really caused the outage last friday?”
      • security - “who’s doing what, where, when, why, and how, and is that ok?”
      • user behavior - “capture and correlate user behavior with system performance, then feed it to downstream systems in realtime.”
  • 6. depth: 3 meters
  • 7. high level architecture
  • 8. guarantees
      • no single point of failure exists
      • all components scale horizontally[1]
      • data retention and latency is a function of cost, not tech[1]
      • every event is delivered provided no more than N - 1 failures occur (where N is the kafka replication level)
      • all operations, including upgrade, are online[2]
      • every event is (or appears to be) delivered exactly once[3]
      [1] we’re positive there’s a limit, but thus far it has been cost.
      [2] from the user’s perspective, at a system level.
      [3] when queried via our UI. lots of details here.
  • 9. events
  • 10. modeling our world
      • everything is an event
      • each event contains a timestamp, type, location, host, service, body, and type-specific attributes (k/v pairs)
      • build specialized aggregates as necessary - just optimized views of the data
  • 11. event schema
      {
        id: string,
        ts: long,
        event_type_id: int,
        location: string,
        host: string,
        service: string,
        body: [ null, string ],
        attributes: map<string>
      }
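As a rough illustration of the schema on slide 11, the same fields could be modeled in Python as below. This is a sketch and not Rocana's code: the `Event` class, the `new_event` helper, the use of a random UUID for `id`, and the choice of epoch milliseconds for `ts` (suggested by the `ts / 60000` minute binning on slide 22) are all assumptions.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical Python mirror of the slide's event schema."""
    id: str
    ts: int                      # observation time; epoch milliseconds is assumed
    event_type_id: int
    location: str
    host: str
    service: str
    body: Optional[str] = None   # nullable, per the [ null, string ] union
    attributes: Dict[str, str] = field(default_factory=dict)  # string k/v pairs


def new_event(event_type_id, location, host, service,
              body=None, attributes=None, ts=None):
    """Build an event; the producer stamps the observation time (hypothetical helper)."""
    return Event(
        id=str(uuid.uuid4()),    # assumption: any unique string id would do
        ts=int(time.time() * 1000) if ts is None else ts,
        event_type_id=event_type_id,
        location=location,
        host=host,
        service=service,
        body=body,
        attributes=dict(attributes or {}),
    )
```

For example, `new_event(100, "us-west-2a", "w2a-demo-1", "sshd", attributes={"syslog_pid": "668"})` yields an event whose `body` is `None` and whose attributes are all strings, matching the `map<string>` type.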
  • 12. event types
      • some event types are standard – syslog, http, log4j, generic text record, …
      • users define custom event types
      • producers populate event type
      • transformations can turn one event type into another
      • event type metadata tells downstream systems how to interpret body and attributes
  • 13. ex: generic syslog event
      event_type_id: 100,   // rfc3164, rfc5424 (syslog)
      body: …               // raw syslog message bytes
      attributes: {         // extracted fields from body
        syslog_message: “DHCPACK from 10.10.0.1 (xid=0x45b63bdc)”,
        syslog_severity: “6”,   // info severity
        syslog_facility: “3”,   // daemon facility
        syslog_process: “dhclient”,
        syslog_pid: “668”,
        …
      }
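To make the "extracted fields from body" step on slide 13 concrete, here is a minimal sketch of pulling those attributes out of an RFC 3164-style line. In syslog, the leading `<PRI>` value encodes facility * 8 + severity, so facility 3 (daemon) with severity 6 (info) arrives as `<30>`. The regular expression, the function name `syslog_attributes`, and the sample line's timestamp and hostname are assumptions for illustration; a production parser would need to handle far more variants.

```python
import re

# Assumed shape: <PRI>timestamp host process[pid]: message
LINE_RE = re.compile(r"^<(\d{1,3})>.*?\s(\S+?)\[(\d+)\]:\s(.*)$")


def syslog_attributes(raw: str) -> dict:
    """Extract slide-style attributes from one raw syslog line (illustrative only)."""
    m = LINE_RE.match(raw)
    if m is None:
        raise ValueError("line does not match the assumed syslog shape")
    pri, process, pid, message = m.groups()
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    facility, severity = divmod(int(pri), 8)
    return {
        "syslog_message": message,
        "syslog_severity": str(severity),
        "syslog_facility": str(facility),
        "syslog_process": process,
        "syslog_pid": pid,
    }


# Hypothetical raw line; only the message text comes from the slide.
sample = "<30>Sep  9 10:00:00 w2a-demo-1 dhclient[668]: DHCPACK from 10.10.0.1 (xid=0x45b63bdc)"
attrs = syslog_attributes(sample)
```

All values are kept as strings, as in the slide's attribute map.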
  • 14. ex: generic http event
      event_type_id: 102,   // generic http event
      body: …               // raw http log message bytes
      attributes: {
        http_req_method: “GET”,
        http_req_vhost: “w2a-demo-02”,
        http_req_path: “/api/v1/search?q=service%3Asshd&p=1&s=200”,
        http_req_query: “q=service%3Asshd&p=1&s=200”,
        http_resp_code: “200”,
        …
      }
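The relationship between `http_req_path` and `http_req_query` on slide 14 can be sketched with the Python standard library. The function name `http_path_attributes` is hypothetical, and the decoded parameter dictionary is an extra shown for illustration, not something the slide includes.

```python
from urllib.parse import parse_qs, urlsplit


def http_path_attributes(request_path: str) -> dict:
    """Derive the query-string attribute from the request path (illustrative)."""
    parts = urlsplit(request_path)
    return {
        "http_req_path": request_path,   # kept whole, as on the slide
        "http_req_query": parts.query,   # everything after the '?'
    }


attrs = http_path_attributes("/api/v1/search?q=service%3Asshd&p=1&s=200")
# Percent-decoding is a separate step; parse_qs turns %3A back into ':'.
params = parse_qs(attrs["http_req_query"])
```

Here `attrs["http_req_query"]` is `q=service%3Asshd&p=1&s=200`, and `params["q"]` is `["service:sshd"]`.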
  • 15. consumers
  • 16. consumers
      • …do most of the work
      • parallelism
      • kafka offset management
      • message de-duplication
      • transformation (embedded library)
      • dead letter queue support
      • downstream system knowledge
  • 18. inside a consumer
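The consumer responsibilities listed on the slides (offset management, de-duplication, transformation, dead letter queue) could fit together roughly as follows. This is a toy sketch with no Kafka client involved: `DedupWindow`, `consume`, the in-memory lists standing in for the sink and dead letter queue, and the policy of returning the next offset only after the batch is handled are all assumptions, not a description of Rocana's consumer.

```python
from collections import OrderedDict


class DedupWindow:
    """Bounded memory of recently seen event ids (oldest entries evicted first)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._seen = OrderedDict()

    def first_time(self, event_id: str) -> bool:
        if event_id in self._seen:
            self._seen.move_to_end(event_id)
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)
        return True


def consume(batch, transform, sink, dead_letters, dedup):
    """Handle (offset, event) pairs; return the next offset that is safe to commit."""
    next_offset = None
    for offset, event in batch:
        if dedup.first_time(event["id"]):
            try:
                sink.append(transform(event))
            except Exception as exc:
                # A bad event is parked rather than blocking the partition.
                dead_letters.append((event, repr(exc)))
        next_offset = offset + 1
    return next_offset


def upper_body(event):
    """Stand-in transformation; fails when body is None."""
    return {**event, "body": event["body"].upper()}


sink, dead_letters = [], []
batch = [
    (0, {"id": "a", "body": "x"}),
    (1, {"id": "a", "body": "x"}),    # redelivered duplicate
    (2, {"id": "b", "body": None}),   # will fail in transform
]
committed = consume(batch, upper_body, sink, dead_letters, DedupWindow(1000))
```

Committing only after a batch is processed gives at-least-once delivery from the log; the de-duplication step is what lets redelivered events appear to be delivered once downstream.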
  • 19. metrics and time series
  • 20. aggregation
      • mostly for time series metrics
      • two halves: on write and on query
      • data model: (dimensions) => (aggregates)
      • on write
          – reduce(a: A, b: A) => A over window
          – store “base” aggregates, all associative and commutative
      • on query
          – perform same aggregate or derivative aggregates
          – group by the same dimensions
          – we use SQL (Impala)
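The on-write half, `reduce(a: A, b: A) => A` over associative and commutative base aggregates, might look like this in Python. The `Agg` type and `merge` function are hypothetical. One assumption worth flagging: "last" is only order-independent if each partial carries the timestamp of its latest value, so the sketch stores `last_ts` alongside it.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Agg:
    """Base aggregates for one (dimensions, window) cell."""
    count: int
    sum: float
    min: float
    max: float
    last: float
    last_ts: int   # assumption: needed so "last" does not depend on merge order


def from_value(ts: int, value: float) -> Agg:
    return Agg(1, value, value, value, value, ts)


def merge(a: Agg, b: Agg) -> Agg:
    """Associative, commutative combine of two partial aggregates."""
    # Ties on timestamp are broken by value so the result never depends on order.
    newest = max(a, b, key=lambda x: (x.last_ts, x.last))
    return Agg(
        count=a.count + b.count,
        sum=a.sum + b.sum,
        min=min(a.min, b.min),
        max=max(a.max, b.max),
        last=newest.last,
        last_ts=newest.last_ts,
    )


samples = [(1000, 4), (2000, 1), (3000, 10), (4000, 8)]
partials = [from_value(ts, v) for ts, v in samples]
forward = reduce(merge, partials)
backward = reduce(merge, reversed(partials))
```

Because `merge` is order-independent, `forward == backward`, which is what allows partial aggregates to be combined again at query time. Derived values such as the mean (`sum / count`) can then be computed from the base aggregates.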
  • 21. aside: late arriving data (it’s a thing)
      • never trust a (wall) clock
      • producer determines observation time, rest of the system uses this always
      • data that shows up late always processed according to observation time
      • aggregation consequences
          – the same time window can appear multiple times
          – solution: aggregate every N seconds, potentially generating multiple aggregates for the same time bin
      • this is real and you must deal with it
          – do what we did or
          – build a system that mutates/replaces aggregates already output (eww) or
          – delay aggregate output for some slop time; drop it if late data shows up
  • 22. ex: service event volume by host and minute
      • dimensions: ts, window, location, host, service, metric
      • on write, aggregates: count, sum, min, max, last
      • epoch, 60000, us-west-2a, w2a-demo-1, sshd, event_volume => 17, 42, 1, 10, 8
      • on query:
          – SELECT floor(ts / 60000) as bin, host, service, metric, sum(value_sum)
            FROM events
            WHERE ts BETWEEN x AND y AND metric = "event_volume"
            GROUP BY bin, host, service, metric
      • if late arriving data existed in events, the same dimensions would repeat with another set of aggregates and would be rolled up as a result of the group by
      • tl;dr: normal window aggregation operations
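The late-data approach from slides 21 and 22 (flush aggregates periodically, allow several rows per time bin, roll them up at query time) can be sketched in plain Python. `WindowAggregator`, `rollup`, and the in-memory `table` list are hypothetical stand-ins for the write path and the SQL query; only the 60000 ms window and the sum-and-group-by idea come from the slides.

```python
from collections import defaultdict

WINDOW_MS = 60_000


class WindowAggregator:
    """Accumulates counts keyed by observation-time bin; flushed every N seconds."""

    def __init__(self):
        self._open = defaultdict(int)

    def observe(self, event_ts: int, key: str, value: int = 1) -> None:
        # Bin by the producer's observation time, never the local wall clock.
        bin_start = event_ts - event_ts % WINDOW_MS
        self._open[(bin_start, key)] += value

    def flush(self) -> list:
        rows = [(b, k, total) for (b, k), total in self._open.items()]
        self._open.clear()
        return rows


def rollup(rows, start: int, end: int) -> dict:
    """Query-time equivalent of sum(value_sum) ... GROUP BY bin, key."""
    totals = defaultdict(int)
    for ts, key, value_sum in rows:
        if start <= ts <= end:
            totals[(ts // WINDOW_MS, key)] += value_sum
    return dict(totals)


agg = WindowAggregator()
agg.observe(125_000, "sshd")
agg.observe(130_000, "sshd")
table = agg.flush()              # first output for the 120000 bin
agg.observe(121_000, "sshd")     # late arrival for the same bin
table += agg.flush()             # second row with the same dimensions
result = rollup(table, 0, 240_000)
```

`table` ends up with two rows for the same bin, and `result` is `{(2, "sshd"): 3}`: the duplicate dimensions collapse in the group-by, with no need to mutate rows already written.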
  • 23. extension, pain, and advice
  • 24. extending the system
      • custom producers
      • custom consumers
      • event types
      • parser / transformation plugins
      • custom metric definition and aggregate functions
      • custom processing jobs on landed data
  • 25. pain (aka: the struggle is real)
      • lots of tradeoffs when picking a stream processing solution
          – samza: right features, but low level programming model, not supported by vendors. missing security features.
          – storm: too rigid, too slow. not supported by all Hadoop vendors.
          – spark streaming: tons of issues initially, but lots of community energy. improving.
          – @digitallogic: “my heart says samza, but my head says spark streaming.”
          – our (current) needs are meager; do work inside consumers.
      • stack complexity, (relative im)maturity
      • scaling solr cloud to billions of events per day
  • 26. if you’re going to try this…
      • read all the literature on stream processing[1]
      • treat it like the distributed systems problem it is
      • understand, make, and make good on guarantees
      • find the right abstractions
      • never trust the hand waving or “hello worlds”
      • fully evaluate the projects/products in this space
      • understand it’s not just about search
      [1] wait, like all of it? yea, like all of it.
  • 27. things I didn’t talk about
      • reprocessing data when bad code / transformations are detected
      • dealing with data quality issues (“the struggle is real” part 2)
      • the user interface and all the fancy analytics
          – data visualization and exploration
          – event search
          – anomalous trend and event detection
          – metric, source, and event correlation
          – motif finding
          – noise reduction and dithering
      • event delivery semantics (e.g. at least once, exactly once, etc.)
      • alerting
  • 28. questions? thank you. @esammer | esammer@rocana.com