Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared

DBTA Workshop on Stream Processing
Berne, 3.12.2014
Guido Schmutz
BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA
2014 © Trivadis

Guido Schmutz
§ Working for Trivadis for more than 18 years
§ Oracle ACE Director for Fusion Middleware and SOA
§ Co-Author of different books
§ Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
§ Member of Trivadis Architecture Board
§ Technology Manager @ Trivadis
§ More than 25 years of software development experience
§ Contact: guido.schmutz@trivadis.com
§ Blog: http://guidoschmutz.wordpress.com
§ Twitter: gschmutz

Our company
Trivadis is a market leader in IT consulting, system integration,
solution engineering and the provision of IT services focusing
on and technologies in Switzerland,
Germany and Austria.
We offer our services in the following strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.

Agenda
1. Introduction
2. Apache Storm
3. Apache Spark (Streaming)
4. Unified Log
5. Stream Processing Architectures

What is Stream Processing?
Infrastructure for continuous data processing
Computational model can be as general as MapReduce but with the ability
to produce low-latency results
Data collected continuously is naturally processed continuously
Also known as Event Processing / Complex Event Processing (CEP)

Why Stream Processing?
[Figure: response latency spectrum. RPC: synchronous. Stream Processing: milliseconds to minutes. Beyond that: later, possibly much later.]

How to design a Stream Processing System?
[Figure: three design variants. (1) Event stream → collecting → queue (persist) → processing → result. (2) Event stream → collecting → processing → result. (3) Event stream → combined collecting/processing → result.]

How to scale a Stream Processing System?
[Figure: scaling with threads. The event stream is read by collecting threads 1 to n, events pass through a queue (persist), and processing threads 1 to n produce the results.]

[Figure: scaling with processes. The event stream is read by several collecting processes, events pass through queues 1 to n (persist), and several processing processes produce the results.]

[Figure: scaling a multi-stage pipeline. Collecting processes feed parallel instances of processing stage A, which feed parallel instances of processing stage B, via queues 1 to n.]

How to make a (stateful) Stream Processing System reliable?
Faults and stragglers are inevitable in large clusters running big data applications
Streaming applications must recover from them quickly
[Figure: event stream flowing through parallel collecting, processing A and processing B processes.]

How to make a (stateful) Stream Processing System reliable?
Solution 1: using active/passive system (hot replication)
• Both systems process the full load
• In case of a failure, automatically switch and use the “passive” system
• Stragglers slow down both active and passive system
[Figure: an active and a passive copy of the same pipeline (collecting, processing A, processing B), each with its own state. State = state in-memory and/or on-disk.]

How to make a (stateful) Stream Processing System reliable?
Solution 2: Upstream backup
• Nodes buffer sent messages and replay them to a new node in case of failure
• Stragglers are treated as failures
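A minimal sketch of the upstream-backup idea in plain Python, not tied to any framework. The class names `UpstreamNode` and `CountingNode`, the sequence numbers and the checkpoint step are assumptions made for illustration only.

```python
from collections import deque


class UpstreamNode:
    """Hypothetical upstream node that buffers every message it sends."""

    def __init__(self):
        self.buffer = deque()  # sent, but not yet acknowledged messages
        self.next_seq = 0

    def send(self, payload, downstream):
        message = (self.next_seq, payload)
        self.next_seq += 1
        self.buffer.append(message)  # keep a copy for a possible replay
        downstream.receive(message)

    def ack(self, seq):
        # Drop everything up to and including the acknowledged sequence number.
        while self.buffer and self.buffer[0][0] <= seq:
            self.buffer.popleft()

    def replay(self, new_downstream):
        # Re-send all unacknowledged messages to the replacement node.
        for message in list(self.buffer):
            new_downstream.receive(message)


class CountingNode:
    """Hypothetical stateful downstream node: sums the payloads it receives."""

    def __init__(self, total=0):
        self.total = total

    def receive(self, message):
        _seq, payload = message
        self.total += payload


upstream = UpstreamNode()
node = CountingNode()
for value in [1, 2, 3]:
    upstream.send(value, node)
checkpoint = node.total  # state saved after the message with sequence number 2
upstream.ack(2)          # the buffer can now forget messages 0..2
for value in [4, 5]:
    upstream.send(value, node)  # processed, but not yet checkpointed

# The node fails: a replacement starts from the checkpoint and gets the replay.
replacement = CountingNode(total=checkpoint)
upstream.replay(replacement)
```

After the replay, the replacement node holds the same total as the failed node did, because only the messages sent after the last acknowledged checkpoint are processed again.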
[Figure: pipeline of collecting, processing A and processing B nodes, each with a buffer and a state. Buffer = buffer for replay, in-memory and/or on-disk. State = state in-memory and/or on-disk.]

Processing Models
Batch Processing
• Familiar concept of processing data en masse
• Generally incurs high latency
(Event-) Stream Processing
• A one-at-a-time processing model
• A datum is processed as it arrives
• Sub-second latency
• Difficult to process state data efficiently
Micro-Batching
• A special case of batch processing with very small (tiny) batch sizes
• A nice mix between batching and streaming
• At the cost of latency
• Gives stateful computation, making windowing an easy task
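The difference between one-at-a-time processing and micro-batching can be sketched in a few lines of plain Python. The function name `micro_batches` and the sample events are hypothetical; real engines cut batches by arrival time rather than by a timestamp carried in the event.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into batches of `batch_interval` seconds."""
    batches = {}
    for timestamp, value in events:
        batch_id = int(timestamp // batch_interval)
        batches.setdefault(batch_id, []).append(value)
    return [batches[batch_id] for batch_id in sorted(batches)]


# Hypothetical events: (arrival time in seconds, payload)
events = [(0.1, "a"), (0.4, "b"), (0.6, "c"), (1.2, "d"), (1.3, "e")]

# One-at-a-time: every datum is its own unit of work.
one_at_a_time = [[value] for _, value in events]

# Micro-batching: all data of a half-second interval form one unit of work.
half_second_batches = micro_batches(events, batch_interval=0.5)
```

With a batch interval of half a second, an event waits for its interval to close before it is processed, which is where the added latency comes from.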

Message Delivery Semantics
At most once [0,1]
• Messages may be lost
• Messages never redelivered
At least once [1 .. n]
• Messages will never be lost
• but messages may be redelivered (might be ok if consumer can handle it)
Exactly once [1]
• Messages are never lost
• Messages are never redelivered
• Perfect message delivery
• Incurs higher latency for transactional semantics
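A small sketch of why at-least-once delivery "might be ok if the consumer can handle it". It is plain Python with hypothetical names (`deliver_at_least_once`, `NaiveConsumer`, `IdempotentConsumer`); the redelivery is simulated and message ids are assumed to be unique.

```python
def deliver_at_least_once(messages, redelivered_ids):
    """Yield every (msg_id, payload); ids in `redelivered_ids` are sent twice."""
    for msg_id, payload in messages:
        yield msg_id, payload
        if msg_id in redelivered_ids:
            yield msg_id, payload  # the ack was lost, so the sender retries


class NaiveConsumer:
    """Counts every delivery, so redeliveries distort the result."""

    def __init__(self):
        self.total = 0

    def handle(self, msg_id, payload):
        self.total += payload


class IdempotentConsumer:
    """Remembers processed ids, so a redelivery has no further effect."""

    def __init__(self):
        self.total = 0
        self.seen = set()

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.total += payload


messages = [(1, 10), (2, 20), (3, 30)]
naive = NaiveConsumer()
idempotent = IdempotentConsumer()
for msg_id, payload in deliver_at_least_once(messages, redelivered_ids={2}):
    naive.handle(msg_id, payload)
    idempotent.handle(msg_id, payload)
```

The naive consumer counts message 2 twice, while the idempotent consumer arrives at the same result as exactly-once delivery would, at the price of remembering the ids it has seen.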

Requirements dictate the choice
Latency
• Is the performance of the streaming application paramount?
Development Cost
• Is it desirable to have similar code bases for batch and stream processing (=> lambda architecture)?
Message Delivery Guarantees
• Is it important to process every single record, or is some amount of data loss acceptable?
Process Fault Tolerance
• Is high availability a primary concern?


Apache Storm
A platform for doing analysis on streams of data as they come in, so you
can react to data as it happens.
• A highly distributed real-time computation system
• Provides general primitives for real-time computation
• Simplifies working with queues & workers
• Scalable and fault-tolerant
• Complementary to Hadoop
• Written in Clojure, supports Java, Clojure
• Originated at Backtype, acquired by Twitter in 2011
• Open Sourced late 2011
• Part of Apache Incubator since September 2013

Apache Storm – Core concepts
Tuple
• Core data structure in Storm
• Immutable set of key/value pairs
• You can think of Storm tuples as events
• Values must be serializable
Stream
• Key abstraction of Storm
• An unbounded sequence of tuples that can be processed in parallel by Storm
• Each stream is given an ID, and bolts can produce and consume tuples from these streams on the basis of their ID
• Each stream also has an associated schema for the tuples that will flow through it

Apache Storm – Core concepts
Topology
• Wires data and functions via a DAG (directed acyclic graph)
• Executes on many machines similar to a MR job in Hadoop
Spout
• Source of data streams (tuples)
• Can be run in “reliable” and “unreliable” mode
Bolt
• Consumes 1+ streams and potentially produces new streams
• Complex operations often require multiple steps and thus multiple bolts
• Calculate, Filter, Aggregate, Join, Talk to database
[Figure: example topology. One spout emits stream A, a second spout is the source of stream B. One bolt subscribes to A and emits C, one bolt subscribes to A and emits D, one bolt subscribes to A & B and emits nothing, and one bolt subscribes to C & D and emits nothing.]

Storm – How does it work?
[Figure: the tweet "NFL: Peyton Manning and Denver’s elite offense fall flat in #Superbowl XLVIII ow.ly/tdQZn #seahawks #broncos #Superbowl" enters through the Twitter Spout. Split Sentence bolts split it into words (NFL, Peyton, Manning, Superbowl, ...), which are passed on to the Word Count bolts.]

[Figure: the Word Count bolts increment one counter per word (INCR): NFL = 1, Peyton = 1, Manning = 1, Superbowl = 2.]

[Figure: a Report bolt receives the counts from the Word Count bolts: Peyton = 1, Superbowl = 2, NFL = 1, Manning = 1.]
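The word-count flow can be imitated with plain Python generators. This is only a single-process sketch of the data flow, not Storm code: the function names mirror the boxes in the figure, and the tokenisation rule (split on whitespace, strip "#", ":", "," and ".") is an assumption.

```python
from collections import Counter


def twitter_spout():
    """Source of the stream: emits the sample tweet from the slide."""
    yield ("NFL: Peyton Manning and Denver's elite offense fall flat in "
           "#Superbowl XLVIII ow.ly/tdQZn #seahawks #broncos #Superbowl")


def split_sentence_bolt(sentences):
    """Consumes sentences and emits one tuple (here: a word) per token."""
    for sentence in sentences:
        for token in sentence.split():
            word = token.strip("#:,.")
            if word:
                yield word


def word_count_bolt(words):
    """Keeps one counter per word; each incoming word is an INCR."""
    counts = Counter()
    for word in words:
        counts[word] += 1
    return counts


def report_bolt(counts, words_of_interest):
    """Reports the counters for a few selected words."""
    return {word: counts[word] for word in words_of_interest}


counts = word_count_bolt(split_sentence_bolt(twitter_spout()))
report = report_bolt(counts, ["Peyton", "Superbowl", "NFL", "Manning"])
```

In a real topology every function would run as several parallel tasks on different machines, and the hand-over between them would be decided by a stream grouping.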

Storm - Topology
Each spout or bolt runs N instances (tasks) in parallel
[Figure: Twitter Spout → Split Sentence (shuffle grouping) → Word Count (fields grouping) → Report (global grouping).]
Stream groupings:
• Shuffle grouping: tuples are distributed randomly across the tasks
• Fields grouping: tuples are grouped by value, so that equal values always go to the same task
• All grouping: every tuple is replicated to all tasks
• Global grouping: all tuples go to one task
• None grouping: the bolt runs in the same thread as the bolt/spout it subscribes to
• Direct grouping: the producer (the task that emits) controls which consumer task receives the tuple
• Local or shuffle grouping: similar to shuffle grouping, but shuffles tuples among bolt tasks running in the same worker process, if any; otherwise falls back to shuffle grouping behavior
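How a grouping picks the receiving task can be sketched as follows. This is an illustration in plain Python, not Storm's implementation: the function names are hypothetical, and CRC32 modulo the number of tasks is merely an assumed stand-in for whatever hash the framework uses.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Random target task."""
    return rng.randrange(num_tasks)


def fields_grouping(tup, field, num_tasks):
    """Equal field values always map to the same task."""
    key = str(tup[field]).encode("utf-8")
    return zlib.crc32(key) % num_tasks


def all_grouping(tup, num_tasks):
    """The tuple is replicated to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Every tuple goes to one and the same task (here: task 0)."""
    return 0


words = ["Superbowl", "NFL", "Superbowl", "Manning"]
targets = [fields_grouping({"word": w}, "word", num_tasks=4) for w in words]
```

Fields grouping is what makes the word count correct: both occurrences of "Superbowl" reach the same Word Count task, so a single counter sees them.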

Storm - Creating Topology

Using a NoSQL database for storing
results (keeping state with counter type columns)
[Figure: Twitter Stream → Twitter Spout → Hashtag Splitter → Hashtag Counter. For the tweet "NFL: Peyton Manning ... #SuperBowl XLVIII ow.ly/tdQZn #seahawks #broncos #Superbowl", each hashtag triggers an INCR on its counter column: superbowl = 2, seahawks = 1, broncos = 1.]

Storm Trident
High-level abstraction on top of Storm
Simplifies building topologies
Core data model is the stream
• Processed as a series of batches (micro-batches)
• Stream is partitioned among nodes in cluster
5 kinds of operations in Trident
• Operations that apply locally to each partition and cause no network transfer
• Repartitioning operations that don't change the contents
• Aggregation operations that do network transfer
• Operations on grouped streams
• Merges and Joins

Storm Trident - Creating Topology
[Figure: Twitter Stream → Twitter Spout → (tweet) → Hashtag Splitter → (hashtag) → Hashtag Normalizer → (hashtag) → local groupBy → Persistent Aggregate.]

Trident Concepts - Function
• takes in a set of input fields and emits zero or more tuples as output
• fields of the output tuple are appended to the original input tuple in the
stream
• If a function emits no tuples, the original input tuple is filtered out
• Otherwise the input tuple is duplicated for each output tuple
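The semantics described above can be sketched in plain Python. This is not the Trident API: the helper `each`, the function `hashtag_splitter` and the dictionary-based tuples are assumptions chosen to show the append, duplicate and filter behaviour.

```python
def each(stream, input_fields, function, output_fields):
    """Apply `function` to the selected fields of every tuple in the stream."""
    for tup in stream:
        args = [tup[name] for name in input_fields]
        for emitted in function(*args):
            # The output fields are appended to a copy of the input tuple,
            # so the input tuple is duplicated once per emitted tuple.
            yield {**tup, **dict(zip(output_fields, emitted))}
        # If the function emitted nothing, the input tuple is dropped.


def hashtag_splitter(text):
    """Emits one single-field tuple per hashtag found in the text."""
    return [(token[1:],) for token in text.split()
            if token.startswith("#") and len(token) > 1]


tweets = [
    {"user": "a", "text": "#seahawks #broncos #Superbowl"},
    {"user": "b", "text": "no hashtags in this one"},
]
result = list(each(tweets, ["text"], hashtag_splitter, ["hashtag"]))
```

The first tweet yields three output tuples that still carry `user` and `text` next to the new `hashtag` field, while the second tweet disappears from the stream.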

Storm Core vs. Storm Trident
Criterion | Core Storm | Storm Trident
Community | > 100 contributors | > 100 contributors
Adoption | *** | *
Language Options | Java, Clojure, Scala, Python, Ruby, … | Java, Clojure, Scala
Processing Models | Event-Streaming | Micro-Batching
Processing DSL | No | Yes
Stateful Ops | No | Yes
Distributed RPC | Yes | Yes
Delivery Guarantees | At most once / At least once | Exactly once
Latency | sub-second | seconds
Platform | Storm Cluster, YARN | Storm Cluster, YARN


Apache Spark
Apache Spark is a fast and general engine for large-scale data processing
• The hot trend in Big Data!
• Based on 2007 Microsoft Dryad paper
• Written in Scala, supports Java, Python, SQL and R
• Can run programs up to 100x faster than Hadoop MapReduce in memory, or
10x faster on disk
• Runs everywhere – runs on Hadoop, Mesos, standalone or in the cloud
• One of the largest OSS communities in big data with over 200 contributors in
50+ organizations
• Originally developed in 2009 at UC Berkeley’s AMPLab
• Open sourced in 2010; part of the Apache Software Foundation since 2014

Apache Spark
Spark Core
• General execution engine for the Spark platform
• In-memory computing capabilities deliver speed
• General execution model supports wide variety of use cases
• DAG-based
• Ease of development – native APIs in Java, Scala and Python
Spark Streaming
• Run a streaming computation as a series of very small, deterministic batch jobs
• Batch size as low as ½ sec, latency of about 1 sec
• Exactly-once semantics
• Potential for combining batch and streaming processing in same system
• Started in 2012, first alpha release in 2013

Apache Spark - Generality
Libraries
• Spark SQL (Batch Processing)
• BlinkDB (Approximate Querying)
• Spark Streaming (Real-Time)
• MLlib, SparkR (Machine Learning)
• GraphX (Graph Processing)
Core Runtime
• Spark Core API and Execution Model
Cluster Resource Managers
• Spark Standalone, Mesos, YARN
Data Stores
• HDFS, Elasticsearch, Cassandra, S3 / DynamoDB
Adapted from C. Fregly: http://slidesha.re/11PP7FV

Apache Spark – Core concepts
Resilient Distributed Dataset (RDD)
• Core Spark abstraction
• Collections of objects (partitions) spread across cluster
• Partitions can be stored in-memory or on-disk (local)
• Enables parallel processing on data sets
• Built through parallel transformations
• Immutable, recomputable, fault tolerant
• Contains transformation history (“lineage”) for whole data set
Operations
• Stateless Transformations (map, filter, groupBy)
• Actions (count, collect, save)
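Lazy transformations, lineage and recomputation can be illustrated with a toy class in plain Python. `MiniRDD` is hypothetical and single-process; it imitates the concept only and is not Spark's API.

```python
class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazy, and aware of its lineage."""

    def __init__(self, compute, lineage):
        self._compute = compute        # zero-argument function yielding the data
        self.lineage = tuple(lineage)  # names of the steps that lead here

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: iter(data), ["parallelize"])

    # Transformations: return a new MiniRDD, compute nothing yet.
    def map(self, fn):
        return MiniRDD(lambda: (fn(x) for x in self._compute()),
                       self.lineage + ("map",))

    def filter(self, predicate):
        return MiniRDD(lambda: (x for x in self._compute() if predicate(x)),
                       self.lineage + ("filter",))

    # Actions: run the whole lineage and return a result.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


calls = []


def square(x):
    calls.append(x)  # record when the function is actually executed
    return x * x


numbers = MiniRDD.from_list(range(1, 6))
squares_of_even = numbers.filter(lambda x: x % 2 == 0).map(square)
calls_before_action = len(calls)        # transformations alone ran nothing
result = squares_of_even.collect()      # the action executes the lineage
recomputed = squares_of_even.collect()  # a lost result can be recomputed
```

Because each object only stores how to derive its data from its parent, a lost partition never needs a replica: replaying the lineage yields the same result again.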

RDD Lineage Example
[Figure: RDD lineage.
Input 1: HDFS file → SparkContext.hadoopFile() → HadoopRDD → filter() → FilteredRDD → map() → MappedRDD
Input 2: HDFS file → SparkContext.hadoopFile() → HadoopRDD → map() → MappedRDD
Both inputs: join() → ShuffledRDD → saveAsHadoopFile() → HDFS file output
Transformations are lazy; the action (saveAsHadoopFile) executes the transformations.]
Adapted from Chris Fregly: http://slidesha.re/11PP7FV

RDD Execution Example
[Figure: RDD execution across partitions 1 to 5. FileRDDs are processed partition by partition with filter() and map() (MappedRDD), while groupByKey() and join() produce ShuffledRDDs.]

Apache Spark Streaming – Core concepts
Discretized Stream (DStream)
• Core Spark Streaming abstraction
• Micro-batches of RDDs
• Operations similar to RDD
Input DStreams
• Represents the stream of raw data received from streaming sources
• Data can be ingested from many sources: Kafka, Kinesis, Flume, Twitter,
ZeroMQ, TCP Socket, Akka actors, etc.
• Custom sources can easily be written for custom data
Operations
• Same as Spark Core
• Additional Stateful transformations (window, reduceByWindow)
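A windowed reduce over micro-batches can be sketched as follows. This is plain Python, not the Spark Streaming API: the function `reduce_by_window` is hypothetical, and window length and slide interval are counted in batches here rather than in time units.

```python
from functools import reduce


def reduce_by_window(batches, window_length, slide_interval, reduce_fn):
    """Reduce the last `window_length` batches every `slide_interval` batches."""
    results = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        # Flatten all batches that fall into the current window.
        window = [item for batch in batches[start:end] for item in batch]
        if window:
            results.append(reduce(reduce_fn, window))
    return results


# Hypothetical stream: one list of values per batch interval.
batches = [[1, 2], [3], [4, 5], [6]]

# Sum over a sliding window of 2 batches, evaluated after every batch.
sums = reduce_by_window(batches, window_length=2, slide_interval=1,
                        reduce_fn=lambda a, b: a + b)
```

Since every micro-batch is an ordinary collection, a window is just the union of the most recent batches, which is why windowing comes almost for free in this model.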

Discretized Stream (DStream)
[Figure: a DStream over increasing time. The input stream is cut into one RDD per batch interval (RDD @time 1, 2, 3, ..., n), each holding messages 1 to n. map() applies f(message) to every message and yields a MappedDStream of results 1 to n per interval. The action saveAsHadoopFiles() triggers the Spark jobs along the DStream transformation lineage.]
Adapted from Chris Fregly: http://slidesha.re/11PP7FV

Storm Core vs. Storm Trident vs. Spark Streaming
Criterion | Core Storm | Storm Trident | Spark Streaming
Community | > 100 contributors | > 100 contributors | > 280 contributors
Adoption | *** | * | *
Language Options | Java, Clojure, Scala, Python, Ruby, … | Java, Clojure, Scala | Java, Scala, Python (coming)
Processing Models | Event-Streaming | Micro-Batching | Micro-Batching, Batch (Spark Core)
Processing DSL | No | Yes | Yes
Stateful Ops | No | Yes | Yes
Distributed RPC | Yes | Yes | No
Delivery Guarantees | At most once / At least once | Exactly once | Exactly once
Latency | sub-second | seconds | seconds
Platform | Storm Cluster, YARN | Storm Cluster, YARN | YARN, Mesos, Standalone, DataStax EE

Unified Log
This is what most people think of as a log:
137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-admin/images/date-button.gif HTTP/1.1" 200 111
137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 200 13593
137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/wp-tinymce.php?c=1&ver=349-20805 HTTP/1.1" 200 101114
137.229.78.245 - - [02/Jul/2012:13:22:28 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30747
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "POST /wp-admin/post.php HTTP/1.1" 302 -
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "GET /wp-admin/post.php?post=387&action=edit&message=1 HTTP/1.1" 200 73160
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/css/editor.css?ver=3.4.1 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30809
But this is what we mean here by log:
• A structured log (records are numbered beginning with 0, based on the order they are written)
• Also known as commit log or journal
[Figure: records 0 to 11 in a row. Record 0 is the first record; the next record is written after record 11.]

Central Unified Log for (real-time) subscription
Take all the organization’s data and put it into a central log for subscription
Properties of the Unified Log:
• Unified: “Enterprise”, single deployment
• Append-Only: events are appended, no update in place => immutable
• Ordered: each event has an offset, which is unique within a shard
• Fast: should be able to handle thousands of messages / sec
• Distributed: lives on a cluster of machines
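The append-only and ordered properties, and the fact that each subscriber reads at its own pace, can be sketched in plain Python. `UnifiedLog` and `Consumer` are hypothetical names; the sketch is single-process and in-memory, so it leaves out the "distributed" and "fast" properties.

```python
class UnifiedLog:
    """Append-only sequence of events; an event's offset is its position."""

    def __init__(self):
        self._records = []

    def append(self, event):
        self._records.append(event)  # never updated in place
        return len(self._records) - 1

    def read(self, offset, max_records):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consuming system tracks its own read position in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records):
        records = self.log.read(self.offset, max_records)
        self.offset += len(records)
        return records


log = UnifiedLog()
offsets = [log.append(f"event-{i}") for i in range(12)]  # offsets 0 to 11

system_a = Consumer(log)
system_b = Consumer(log)
system_a.poll(6)   # system A has read up to position 6
system_b.poll(10)  # system B has read up to position 10
next_for_a = system_a.poll(3)
```

The log itself knows nothing about its readers: a slow consumer such as system A simply continues later from its own offset, without affecting system B.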
[Figure: a collector writes to the log (records 0 to 11). Consumer system A reads at position 6, consumer system B reads at position 10.]

Apache Kafka - Overview
• A distributed publish-subscribe messaging system
• Designed for processing of real time activity stream data (logs, metrics
collections, social media streams, …)
• Initially developed at LinkedIn, now part of Apache
• Does not follow JMS Standards and does not use JMS API
• Kafka maintains feeds of messages in topics
[Figure: producers write to the Kafka cluster, consumers read from it. Anatomy of a topic: partitions 0, 1 and 2 are each an ordered sequence of numbered messages, from old to new; writes are appended at the new end.]

Apache Kafka - Motivation
LinkedIn’s motivation for Kafka was:
§ “A unified platform for handling all the real-time data feeds a large company
might have.”
Must haves
§ High throughput to support high volume event feeds.
§ Support real-time processing of these feeds to create new, derived feeds.
§ Support large data backlogs to handle periodic ingestion from offline
systems.
§ Support low-latency delivery to handle more traditional messaging use
cases.
§ Guarantee fault-tolerance in the presence of machine failures.

Apache Kafka - Performance
Up to 2 million writes/sec on 3 cheap machines
§ Using 3 producers on 3 different machines
§ http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Kafka at LinkedIn
§ 10+ billion writes per day
§ 172k messages per second (average)
§ 55+ billion messages per day to real-time consumers

Apache Kafka - Partition offsets
Offset: messages in the partitions are each assigned a unique (per
partition) and sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
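Per-partition offsets and consumer-side position tracking can be sketched in plain Python. This is not the Kafka client API: `Topic` and `GroupConsumer` are hypothetical, and choosing the partition by CRC32 of the key is an assumption made for the example.

```python
import zlib


class Topic:
    """A topic is a set of partitions; each partition is an ordered list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1  # sequential per partition
        return partition, offset


class GroupConsumer:
    """Tracks one position per (topic, partition)."""

    def __init__(self):
        self.positions = {}  # (topic name, partition) -> next offset to read

    def poll(self, topic, partition):
        position = self.positions.get((topic.name, partition), 0)
        records = topic.partitions[partition][position:]
        self.positions[(topic.name, partition)] = position + len(records)
        return records


topic = Topic("tweets", num_partitions=3)
placements = [topic.publish(key, value) for key, value in [
    ("superbowl", "t1"), ("seahawks", "t2"), ("superbowl", "t3"), ("broncos", "t4"),
]]

consumer = GroupConsumer()
first_poll = [consumer.poll(topic, p) for p in range(3)]
second_poll = [consumer.poll(topic, p) for p in range(3)]  # nothing new
```

Offsets are only meaningful within one partition, which is why the consumer needs the partition and the topic alongside the offset to describe where it is.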
[Figure: consumer group C1 reading from the partitions of a topic.]

Apache Kafka – two Options for Log Cleanup
Retaining a window of data
• Ideal for event data
• Window can be defined in time (days) or space (GBs) – defaults to 1 week
Retain a complete log (log compaction)
• Ideal for keyed data
• Keeps a space-efficient complete log of changes
• Log compaction runs in the background
• Ensures that at least the last known value for each message key within the log is always retained
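The effect of log compaction can be sketched as a pure function in plain Python. The function name `compact` and the sample log are hypothetical; a real broker compacts segment files in the background, which this sketch ignores.

```python
def compact(log):
    """Keep only the latest record per key, preserving offsets and order."""
    last_offset = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset  # later records overwrite earlier ones
    return [(offset, key, value)
            for offset, (key, value) in enumerate(log)
            if last_offset[key] == offset]


# Hypothetical keyed log: (key, value) in write order, offsets 0 to 4.
log = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k3", "d"), ("k2", "e")]
compacted = compact(log)
```

The surviving records keep their original offsets, so consumers that stored a position before the compaction can still continue from it.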

Data Flow Graphs using Unified Log
Stream processing allows for computing feeds off of other feeds
Derived feeds are no different from the original feeds they are computed from
Single deployment of the “Unified Log”, but logically different feeds
[Figure: Meter Readings Collector → Raw Meter Readings → Enrich / Transform → Meter with Customer → Aggregate by Minute → Meter by Minute, and Customer Aggregate → Meter by Customer by Minute. Persist steps store Raw Meter Readings and Meter by Minute.]

Architectural Pattern: Standalone Event Stream Processing
[Figure: architecture diagram with the components Social Media, Event Cloud, Internet of Things, Enterprise Event Bus (ingress), Event Streams, Event Processing (ESP / CEP), State Store / Event Store, Business Rule Management System, Result Store, Analytical Applications, DB, Enterprise Service Bus and Enterprise Event Bus.]

Architectural Pattern: Event Stream Processing as part of Lambda Architecture
[Figure: architecture diagram with the components Social Media, Event Cloud, Internet of Things, Enterprise Event Bus (ingress), Event Streams, Event Processing (ESP / CEP) with State Store / Event Store and Result Store, Hadoop Big Data Infrastructure (HDFS, Map/Reduce) with its own Result Store, Analytical Applications, DB and Enterprise Service Bus.]

Architectural Pattern: Event Stream Processing as part of Kappa Architecture
[Figure: architecture diagram with the components Social Media, Event Cloud, Internet of Things, ingress, Event Streams, Event Processing (ESP / CEP) with State Store / Event Store and Result Store, Hadoop Big Data Infrastructure (HDFS, used for replay), Analytical Applications, DB and Enterprise Service Bus.]

Questions and answers ...
Guido Schmutz
Technology Manager
