-
1.
Apache Samza*
Stream Processing at LinkedIn
Chris Riccomini
11/13/2013
* Incubating
-
2.
Stream Processing?
-
3.
0 ms
Response latency
-
4.
0 ms
Response latency
Synchronous
-
5.
0 ms
Response latency
Synchronous
Later. Possibly much later.
-
6.
0 ms
Response latency
Milliseconds to minutes
Synchronous
Later. Possibly much later.
-
7.
Newsfeed
-
8.
News
-
9.
Ad Relevance
-
10.
Email
-
11.
Search Indexing Pipeline
-
12.
Metrics and Monitoring
-
13.
Motivation
-
14.
Real-time Feeds
• User activity
• Metrics
• Monitoring
• Database changes
-
15.
Real-time Feeds
• 10+ billion writes per day
• 172,000 messages per second (average)
• 55+ billion messages per day to real-time consumers
-
16.
Stream Processing is Hard
• Partitioning
• State
• Re-processing
• Failure semantics
• Joins to services or databases
• Non-determinism
-
17.
Samza Concepts
&
Architecture
-
18.
Streams
Partition 0
Partition 1
Partition 2
-
19.
Streams
Partition 0: 1 2 3 4 5 6
Partition 1: 1 2 3 4 5
Partition 2: 1 2 3 4 5 6 7
-
24.
Streams
Partition 0: 1 2 3 4 5 6
Partition 1: 1 2 3 4 5
Partition 2: 1 2 3 4 5 6 7 ← next append
-
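The speaker notes mention that partition assignment happens on write. A minimal sketch of that idea (not Samza's or Kafka's actual implementation): hashing the message key to pick a partition, so all messages for a given key land in the same partition and are seen by the same task.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionerSketch {
    // Deterministic mapping: the same key always lands in the same
    // partition, so one task sees every message for the keys it owns.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (String key : new String[] {"member-1", "member-2", "member-1"}) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        // Both "member-1" messages end up in the same partition list.
        System.out.println(partitions);
    }
}
```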
25.
Tasks
Partition 0
-
26.
Tasks
Partition 0
Task 1
-
27.
Tasks
Partition 0
class PageKeyViewsCounterTask implements StreamTask {
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    // pageKeyViews is a member map of AtomicIntegers (elided on the slide)
    GenericRecord record = (GenericRecord) envelope.getMessage();
    String pageKey = record.get("page-key").toString();
    int newCount = pageKeyViews.get(pageKey).incrementAndGet();
    collector.send(countStream, pageKey, newCount);
  }
}
-
37.
Tasks
Partition 0
Task 1
-
38.
Tasks
Page Views - Partition 0: 1 2 3 4 → PageKeyViewsCounterTask → Output Count Stream (Partition 0, Partition 1)
-
46.
Tasks
Page Views - Partition 0: 1 2 3 4 → PageKeyViewsCounterTask
→ Output Count Stream (Partition 0, Partition 1)
→ Checkpoint Stream (Partition 1): offset 2
-
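The speaker notes describe this checkpointing model as at-least-once: the task periodically records its input offset, and on restart it resumes from the last checkpoint, so messages after the checkpoint may be processed twice. A toy simulation of those assumed mechanics (not Samza's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointSketch {
    static List<Integer> processed = new ArrayList<>();
    static int checkpointedOffset = 0;

    // Process input from startOffset; checkpoint every `checkpointEvery`
    // messages; simulate a crash when the given offset is reached.
    static void run(int[] input, int startOffset, int crashAt, int checkpointEvery) {
        for (int offset = startOffset; offset < input.length; offset++) {
            if (offset == crashAt) return;    // simulate a crash
            processed.add(input[offset]);     // "process" the message
            if ((offset + 1) % checkpointEvery == 0) {
                checkpointedOffset = offset + 1; // commit progress
            }
        }
    }

    public static void main(String[] args) {
        int[] input = {10, 20, 30, 40, 50};
        run(input, 0, 3, 2);                    // crash after offset 2; checkpoint = 2
        run(input, checkpointedOffset, -1, 2);  // restart from the checkpoint
        // Message 30 is processed twice: [10, 20, 30, 30, 40, 50]
        System.out.println(processed);
    }
}
```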
55.
Jobs
Stream A
Task 1
Task 2
Stream B
Task 3
-
56.
Jobs
Stream A
Task 1
Stream B
Task 2
Stream C
Task 3
-
57.
Jobs
AdViews
Task 1
AdClicks
Task 2
AdClickThroughRate
Task 3
-
60.
Dataflow
Streams A, B, C → Job 1 → Stream D → Job 2 → Stream E → Job 3
-
62.
YARN
-
66.
YARN
You: I want to run command X on two machines with 512M of memory.
YARN: Cool, where’s your code?
You: http://some-host/jobs/download/my.tgz
YARN: I’ve run your command on grid-node-2 and grid-node-7.
-
67.
YARN
Host 1
Host 2
Host 3
-
68.
YARN
Host 1
Host 2
Host 3
NM
NM
NM
-
69.
YARN
Host 0
RM
Host 1
Host 2
Host 3
NM
NM
NM
-
70.
YARN
Host 0
Client
RM
Host 1
Host 2
Host 3
NM
NM
NM
-
73.
YARN
Host 0
Client
Host 1
NM
RM
Host 2
AM
Host 3
NM
NM
-
76.
YARN
Host 0
Client
Host 1
NM
RM
Host 2
AM
Host 3
NM
NM
Container
-
84.
Containers
Stream A
Task 1
Task 2
Stream B
Task 3
-
85.
Containers
Stream A
Samza Container 1
Stream B
Samza Container 2
-
86.
Containers
Samza Container 1
Samza Container 2
-
87.
YARN
Host 1
Samza Container 1
Host 2
Samza Container 2
-
88.
YARN
Host 1
Host 2
NodeManager
NodeManager
Samza Container 1
Samza Container 2
-
89.
YARN
Host 1
Host 2
NodeManager
NodeManager
Samza Container 1
Samza Container 2
Samza YARN AM
-
90.
YARN
Host 1
Host 2
NodeManager
NodeManager
Samza Container 1
Kafka Broker
Samza Container 2
Samza YARN AM
Kafka Broker
-
91.
YARN
Host 1
Host 2
NodeManager
NodeManager
MapReduce Container
HDFS
MapReduce YARN AM
MapReduce Container
HDFS
-
92.
YARN
Host 1
Stream A
NodeManager
Samza Container 1
Kafka Broker
Stream C
Samza Container 2
-
97.
CGroups
Host 1
Host 2
NodeManager
NodeManager
Samza Container 1
Kafka Broker
Samza Container 2
Samza YARN AM
Kafka Broker
-
98.
(Not Running) Multi-Framework
Host 1
Host 2
NodeManager
NodeManager
Samza Container 1
Kafka
MapReduce Container
Samza YARN AM
HDFS
-
99.
Stateful Processing
-
100.
SELECT col1, count(*)
FROM stream1
INNER JOIN stream2 ON stream1.col3 = stream2.col3
WHERE col2 > 20
GROUP BY col1
ORDER BY count(*) DESC
LIMIT 50;
-
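The speaker notes map parts of this query (group by, count) onto stream processing. A minimal sketch of how the WHERE + GROUP BY + count(*) portion looks as per-message state in a task; the record shape (col1, col2) is taken from the query, and this is plain Java rather than Samza API:

```java
import java.util.HashMap;
import java.util.Map;

public class GroupByCountSketch {
    // In-task state: running counts per group key.
    static Map<String, Integer> counts = new HashMap<>();

    // Called once per incoming message, like a StreamTask's process().
    static void process(String col1, int col2) {
        if (col2 > 20) {                         // WHERE col2 > 20
            counts.merge(col1, 1, Integer::sum); // GROUP BY col1, count(*)
        }
    }

    public static void main(String[] args) {
        process("a", 25);
        process("a", 30);
        process("b", 10); // filtered out by the WHERE clause
        process("b", 40);
        System.out.println(counts); // counts now map a→2, b→1
    }
}
```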
104.
How do people do this?
-
105.
Remote Stores
Stream A
Task 1
Task 2
Task 3
Key-Value Store
Stream B
-
106.
Remote RPC is slow
• Stream: ~500k records/sec/container
• DB: far fewer
-
107.
Online vs. Async
-
108.
No undo
• Database state is non-deterministic
• Can’t roll back mutations if task crashes
-
109.
Tables & Streams
put(a, w)
put(b, x)
Database
put(a, y)
put(b, z)
Time
-
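The put sequence on this slide illustrates table/stream duality: replaying the changelog of puts rebuilds the table, and only the latest value per key survives, which is also why log compaction (mentioned in the notes) can safely delete over-written keys. A small sketch of that replay:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangelogSketch {
    // Replay a changelog of {key, value} puts; later puts overwrite
    // earlier ones, leaving exactly the table's current state.
    public static Map<String, String> replay(String[][] changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] put : changelog) {
            table.put(put[0], put[1]);
        }
        return table;
    }

    public static void main(String[] args) {
        String[][] changelog = {
            {"a", "w"}, {"b", "x"}, {"a", "y"}, {"b", "z"} // the slide's sequence
        };
        System.out.println(replay(changelog)); // {a=y, b=z}
    }
}
```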
110.
Stateful Tasks
Stream A
Task 1
Task 2
Stream B
Task 3
-
112.
Stateful Tasks
Stream A
Task 1
Task 2
Stream B
Task 3
Changelog Stream
-
124.
Key-Value Store
• put(table_name, key, value)
• get(table_name, key)
• delete(table_name, key)
• range(table_name, key1, key2)
-
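A hedged sketch of the four operations listed above, backed by an in-memory TreeMap so that range() is a sorted sub-view. Samza's real stores are per-task and persistent (the notes say the store API is pluggable), and here the table_name argument is implicit in the store instance; this only mirrors the shape of the API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class InMemoryKeyValueStore {
    private final TreeMap<String, String> map = new TreeMap<>();

    public void put(String key, String value) { map.put(key, value); }
    public String get(String key) { return map.get(key); }
    public void delete(String key) { map.remove(key); }

    // All entries with key1 <= key < key2, in key order.
    public SortedMap<String, String> range(String key1, String key2) {
        return map.subMap(key1, key2);
    }

    public static void main(String[] args) {
        InMemoryKeyValueStore store = new InMemoryKeyValueStore();
        store.put("member:1", "alice");
        store.put("member:2", "bob");
        store.put("member:3", "carol");
        store.delete("member:2");
        System.out.println(store.range("member:1", "member:3")); // {member:1=alice}
    }
}
```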
125.
Stateful Stream Task
public class SimpleStatefulTask implements StreamTask, InitableTask {
  private KeyValueStore<String, String> store;

  public void init(Config config, TaskContext context) {
    this.store = (KeyValueStore<String, String>) context.getStore("mystore");
  }

  public void process(
      IncomingMessageEnvelope envelope,
      MessageCollector collector,
      TaskCoordinator coordinator) {
    GenericRecord record = (GenericRecord) envelope.getMessage();
    String memberId = record.get("member_id").toString();
    String name = record.get("name").toString();
    System.out.println("old name: " + store.get(memberId));
    store.put(memberId, name);
  }
}
-
129.
Whew!
-
130.
Let’s be Friends!
• We are incubating, and you can help!
• Get up and running in 5 minutes
http://bit.ly/hello-samza
• Grab some newbie JIRAs
http://bit.ly/samza_newbie_issues
- stream processing for us = anything asynchronous, but not batch computed
- 25% of code is async, 50% is rpc/online, 25% is batch
- stream processing is the worst supported
- compute top shares, pull in, scrape, entity tag
- language detection
- send emails: friend was in the news
- requirement: has to be fast, since news is trendy
- relevance pipeline
- we send relatively data-rich emails
- some emails are time-sensitive (need to be sent soon)
- time sensitive
- data ingestion pattern
- other systems that follow this pattern: real-time OLAP system, and social graph system
- ecosystem at LinkedIn (some unique traits)
- hard unsolved problems in this space
- once we had all this data in kafka, we wanted to do stuff with it
- persistent, reliable, distributed message queue
- Kafka = first among equals, but stream systems are pluggable. Just like Hadoop with HDFS vs. S3.
- started with just a simple web service that consumes and produces kafka messages
- realized that there are a lot of hard problems that needed to be solved
- reprocessing: what if my algorithm changes and I need to reprocess all events?
- non-determinism: queries to external systems, time dependencies, ordering of messages
- open area of research
- been around for 20 years
partitioned
re-playable, ordered, fault tolerant, infinite. Very heavyweight definition of a stream (vs. S4, Storm, etc.)
partition assignment happens on write
At-least-once messaging. Duplicates are possible. Future: exactly-once semantics. Transparent to user. No ack’ing API.
connected by stream name only; fully buffered
split JobTracker up: resource management, process isolation, fault tolerance, security
- group by, sum, count
- stream to stream, stream to table, table to table
- buffered sorting
Changelog/redo log. State machine model.
Can also consume these streams from other jobs.
- can’t keep messages forever
- log compaction: delete over-written keys over time
store API is pluggable: Lucene, buffered sort, external sort, bitmap index, bloom filters and sketches