14. You don’t need
the batch layer.
Interesting.
That’s half
the cost.
Stream processing isn’t
reliable on its own!
15. A well-designed
streaming system
provides exactly-once
semantics, even in
the case of failure.
Receiving the data
Kafka is a reliable source.
Tracking the offsets in checkpoints.
Transforming the data
Repeatable transformations.
Pushing out the data
Idempotent updates.
Transactional updates. (Saving results and offsets.)
[Figure: records numbered by offset, 0 to 9.]
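A minimal sketch of the "transactional updates" idea, saving results and offsets together. It assumes an in-memory SQLite database standing in for the output store and a plain Python list standing in for a Kafka partition; the table names, `open_store`, `process` and the `crash_at` parameter are hypothetical and exist only for illustration.

```python
import sqlite3


def open_store():
    """Create a throwaway store holding both the results and the next offset."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE totals (key TEXT PRIMARY KEY, total INTEGER NOT NULL)")
    db.execute("CREATE TABLE progress (id INTEGER PRIMARY KEY, next_offset INTEGER NOT NULL)")
    db.execute("INSERT INTO progress VALUES (1, 0)")
    db.commit()
    return db


def process(db, log, crash_at=None):
    """Resume from the stored offset; one transaction per record.

    crash_at simulates a failure after the writes but before the commit.
    """
    (start,) = db.execute("SELECT next_offset FROM progress WHERE id = 1").fetchone()
    for offset in range(start, len(log)):
        key, amount = log[offset]
        with db:  # commits on success, rolls back if an exception escapes
            db.execute("INSERT OR IGNORE INTO totals VALUES (?, 0)", (key,))
            db.execute("UPDATE totals SET total = total + ? WHERE key = ?", (amount, key))
            db.execute("UPDATE progress SET next_offset = ? WHERE id = 1", (offset + 1,))
            if offset == crash_at:
                raise RuntimeError("simulated crash before commit")


def totals(db):
    """Return the current results as a plain dict."""
    return dict(db.execute("SELECT key, total FROM totals"))
```

Because the result and the offset are committed in the same transaction, a crash leaves neither or both in place, so restarting `process` neither skips nor double-counts a record.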
17. How can I
stream data from
my databases?
A stream is an ever-
growing, immutable
set of events.
Under the hood, a database
is also a stream of events:
creates, updates and deletes.
18. A database is a
view over this
stream of events.
[Figure: a stream of Create, Update and Delete events feeding a Database.]
Let’s capture this
internal stream.
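The "view over a stream" idea can be sketched as a fold over the events. This assumes each event is a hypothetical `(operation, key, value)` tuple; real databases use their own log formats.

```python
def materialize(events):
    """Replay create/update/delete events into the table they describe."""
    table = {}
    for operation, key, value in events:
        if operation in ("create", "update"):
            table[key] = value
        elif operation == "delete":
            table.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {operation}")
    return table


events = [
    ("create", 1, "alice"),
    ("create", 2, "bob"),
    ("update", 1, "alice b."),
    ("delete", 2, None),
]
current_state = materialize(events)
```

Replaying the same events always yields the same table, which is what makes the stream, rather than the table, the source of truth.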
19. A consistent snapshot of the entire
database contents at one point in time.
A real-time stream of changes from
that point onward.
PostgreSQL and
Oracle support both.
The technique is called
Change Data Capture.
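A rough sketch of how the two pieces fit together: a snapshot taken at a known position in the change log, followed by the changes after that position. It uses no real PostgreSQL or Oracle API; the list-based log and the function names are assumptions made for illustration.

```python
def apply_change(table, change):
    """Apply one (operation, key, value) change to a dict-based table."""
    operation, key, value = change
    if operation == "delete":
        table.pop(key, None)
    else:  # "create" and "update" both set the current value
        table[key] = value


def take_snapshot(log, position):
    """Consistent snapshot: the table as it stood after `position` changes."""
    table = {}
    for change in log[:position]:
        apply_change(table, change)
    return table


def follow(snapshot, position, log):
    """Build a replica from the snapshot plus every change after it."""
    replica = dict(snapshot)
    for change in log[position:]:
        apply_change(replica, change)
    return replica


log = [
    ("create", "x", 1),
    ("create", "y", 2),
    ("update", "x", 10),
    ("delete", "y", None),
]
snapshot = take_snapshot(log, 2)
replica = follow(snapshot, 2, log)
```

The key design point is that the snapshot and the stream share one position: starting the stream any earlier would reapply changes, and any later would lose them.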
20. And all this with a
single computational model,
without code duplication.
Complex
asynchronous
transformations…
…with low latency.
And fault-tolerance
through recomputation.
21. The SMACK stack
Spark for Micro-Batch Processing
Mesos for Cluster Management
Akka for Event Processing
Cassandra for Persistence
Kafka for Event Transport
22.         Event Processing    Micro-Batch Processing
Latency     Sub-second          Seconds to minutes
Power       Simple triggers     Complex transformations
A trade-off between latency
and computational power.
Responding to single
events in real-time or a
general analysis over
the stream.
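The trade-off can be illustrated with a toy sketch: a trigger that looks at one event at a time, next to an aggregate computed over a small batch. The event fields (`ts`, `user`, `amount`) and the threshold are hypothetical, and nothing here uses Akka or Spark.

```python
from collections import defaultdict


def event_trigger(event, threshold=100):
    """Event processing: decide from a single event as soon as it arrives."""
    return event["amount"] > threshold


def micro_batches(events, batch_seconds):
    """Group events into consecutive time windows of batch_seconds."""
    windows = defaultdict(list)
    for event in events:
        windows[event["ts"] // batch_seconds].append(event)
    return [windows[index] for index in sorted(windows)]


def average_per_user(batch):
    """Micro-batch processing: an analysis that needs several events at once."""
    sums = defaultdict(lambda: [0, 0])  # user -> [total amount, event count]
    for event in batch:
        sums[event["user"]][0] += event["amount"]
        sums[event["user"]][1] += 1
    return {user: total / count for user, (total, count) in sums.items()}


events = [
    {"ts": 0, "user": "u1", "amount": 50},
    {"ts": 1, "user": "u1", "amount": 150},
    {"ts": 5, "user": "u2", "amount": 20},
]
alerts = [event for event in events if event_trigger(event)]
batches = micro_batches(events, batch_seconds=5)
```

The trigger can fire immediately but sees only one event; the average has to wait for the window to close, which is where the extra latency comes from.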
23. Some other alternatives:
Storm, Flink, Samza.
Akka Streams
Reactive Streams
with back pressure.
Kafka Streams
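Back pressure means the consumer's demand limits how fast the producer may emit. The sketch below is only loosely inspired by that demand-signalling idea; the `Publisher` class and `consume` function are hypothetical and are not the Reactive Streams or Akka Streams API.

```python
class Publisher:
    """Emits items only when asked, and never more than was requested."""

    def __init__(self, items):
        self._items = iter(items)
        self.done = False

    def request(self, n):
        """Return at most n items; mark the stream done when it runs dry."""
        emitted = []
        for _ in range(n):
            try:
                emitted.append(next(self._items))
            except StopIteration:
                self.done = True
                break
        return emitted


def consume(publisher, buffer_size):
    """Pull in chunks the consumer can hold, so it is never flooded."""
    processed = []
    largest_chunk = 0
    while not publisher.done:
        chunk = publisher.request(buffer_size)
        largest_chunk = max(largest_chunk, len(chunk))
        processed.extend(chunk)
    return processed, largest_chunk


processed, largest_chunk = consume(Publisher(range(10)), buffer_size=3)
```

However fast the source could produce, the consumer never holds more than `buffer_size` unprocessed items at once.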
24.         Event Processing    Micro-Batch Processing
Latency     Sub-second          Seconds to minutes
Power       Simple triggers     Complex transformations
SQL
Machine Learning
Graph Analytics
Functional API
25. Cluster Management with
YARN
• Hadoop and related components.
• Job request comes in, YARN places the job.
Mesos
• Any application.
• Job request comes in, Mesos offers
resources, the job accepts or rejects them.
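The difference between the two placement models can be sketched in a few lines. This is a toy model under stated assumptions, not the YARN or Mesos API: nodes are a dict of free CPUs, and `yarn_style_place`, `mesos_style_place` and `prefers_n2` are hypothetical names.

```python
def yarn_style_place(nodes, cpus_needed):
    """The cluster manager decides: first node with enough free CPUs wins."""
    for name, free in nodes.items():
        if free >= cpus_needed:
            nodes[name] = free - cpus_needed
            return name
    return None


def mesos_style_place(nodes, decide):
    """The cluster manager offers; the job's own logic accepts or rejects.

    decide(name, free_cpus) returns the CPUs it takes, or 0 to reject.
    """
    for name, free in nodes.items():
        taken = decide(name, free)
        if taken:
            nodes[name] = free - taken
            return name
    return None


def prefers_n2(name, free_cpus):
    """Example job policy: needs 4 CPUs and only runs where its data lives."""
    return 4 if name == "n2" and free_cpus >= 4 else 0


yarn_choice = yarn_style_place({"n1": 8, "n2": 8}, cpus_needed=4)
mesos_choice = mesos_style_place({"n1": 8, "n2": 8}, prefers_n2)
```

In the first model the placement policy lives in the cluster manager; in the second it lives in the job, which is why the offer model can serve any kind of application.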