TODO: fix title
Introduce self
What is Stream Processing
Brief intro to Kafka
Kafka Streams
Database data, log data
Lots of systems—databases, specialized systems like search, caches
Business units
N^2 connections
Tons of glue code to stitch it all together
This is what that architecture looks like when it relies on streaming.
Two key uses:
Acts as a data pipeline between data systems and apps
Acts as a backbone for streams of data for stream processing
Exciting! Important!
About how inputs are translated into outputs (very fundamental)
HTTP/REST
All databases
Run all the time
Each request totally independent—No real ordering
Can fail individual requests if you want
Very simple!
About the future!
“Ed, the MapReduce job never finishes if you watch it like that”
Job kicks off at a certain time
Cron!
Processes all the input, produces all the output
Data is usually static
Hadoop!
DWH, JCL
Archaic but powerful. Can do analytics! Complex algorithms!
Also can be really efficient!
Inherently high latency
Generalizes request/response and batch.
Program takes some inputs and produces some outputs
Could be all inputs
Could be one at a time
Runs continuously forever!
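The generalization can be sketched in a few lines of Python. This is an illustrative stand-in, not any Kafka API: the same `process` function handles one input at a time (request/response), a whole finite dataset (batch), or an unbounded source (streaming).

```python
import itertools

# A stream processor is just a program that maps inputs to outputs.
# Request/response and batch differ only in how many inputs it sees
# and how long it runs. (Conceptual sketch, not a Kafka API.)

def process(records):
    """Continuously transform an iterable of inputs into outputs."""
    for record in records:
        yield record.upper()  # stand-in for any transformation

# Request/response: one input, one output
print(next(process(iter(["hello"]))))     # HELLO

# Batch: all inputs at once, then done
print(list(process(["a", "b", "c"])))     # ['A', 'B', 'C']

# Streaming: runs continuously over an unbounded source
unbounded = (f"event-{i}" for i in itertools.count())
stream = process(unbounded)
print([next(stream) for _ in range(3)])   # ['EVENT-0', 'EVENT-1', 'EVENT-2']
```

The point: streaming is the general case, and the other two styles fall out of it by bounding the input.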
Companies == streams
What does a retail store do?
Streams
Retail
- Sales
- Shipments and logistics
- Pricing
- Re-ordering
- Analytics
- Fraud and theft
Database data, log data
Lots of systems—databases, specialized systems like search, caches
Business units
N^2 connections
Tons of glue code to stitch it all together
This is what that architecture looks like when it relies on streaming.
Two key uses:
Acts as a data pipeline between data systems and apps
Acts as a backbone for streams of data for stream processing
Quick run-through of the features in Kafka.
It’s a streaming platform.
Lets you publish and subscribe to streams of data, stores them reliably, and lets you process them in real time.
The second half of this talk will dive into Apache Kafka and talk about how it acts as a streaming platform and lets you build event-driven stream processing microservices.
Event = Record = Message
Timestamp, an optional key and a value
Key is used for partitioning. Timestamp is used for retention and processing.
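Key-based partitioning can be sketched as follows. Kafka's default partitioner hashes the serialized key (using murmur2) modulo the partition count; the sketch below uses Python's `hashlib` as a deterministic stand-in, and the partition count is an arbitrary example value.

```python
import hashlib

NUM_PARTITIONS = 6  # arbitrary example value

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Records with the same key always land in the same partition,
    which is what gives per-key ordering. (Kafka itself uses murmur2;
    md5 here is just a deterministic stand-in.)"""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, every time
assert partition_for(b"user-42") == partition_for(b"user-42")
print(partition_for(b"user-42"), partition_for(b"user-99"))
```

Because the mapping is deterministic, all events for one key stay in order relative to each other, even as records for different keys spread across partitions.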
Logs
Distributed
Fault-tolerant
Change to “Logs unify batch and stream processing”
The world is many processes/threads: a total order within each, but no order between them
Can’t just scale storage, need to scale processing
Important: order
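A minimal in-memory sketch of the log abstraction (not Kafka's implementation): an append-only sequence where offsets give a total order within a partition, and each consumer tracks its own position.

```python
class PartitionLog:
    """Append-only log: each record gets the next offset, so reads
    in offset order always replay records in the order written."""
    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 10):
        return self._records[offset:offset + max_records]

log = PartitionLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

# Two independent consumers at different positions see the same order
print(log.read(0))   # ['created', 'paid', 'shipped']
print(log.read(1))   # ['paid', 'shipped']
```

Because consumers only track an offset, replaying history is just rereading from an earlier offset—which is why logs make both batch-style reprocessing and live streaming possible over the same data.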
Four APIs to read and write streams of events
First two are easy: the Producer and Consumer APIs let applications write to and read from Kafka.
The Connect API allows building connectors that integrate Kafka with existing systems or applications.
The Streams API allows stream processing on top of Kafka.
We’ll go through each of these briefly.
Core: Data pipeline
Venture bet: Stream processing
Current state
OpenGL Triangle
TODO: Like Streams library or scala collections or reactive thingies BUT stateful, fault-tolerant, distributed
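To make the “like a streams library, but stateful” point concrete: here is an illustrative Python sketch of a stateful count over a stream. Kafka Streams keeps this kind of state in fault-tolerant, log-backed state stores; a plain dict stands in here.

```python
from collections import defaultdict

def running_counts(events):
    """Stateful stream transform: for each incoming key, emit the
    updated count so far. The dict is the 'state store' stand-in;
    in Kafka Streams this state is replicated and recoverable."""
    counts = defaultdict(int)
    for key in events:
        counts[key] += 1
        yield key, counts[key]

stream = running_counts(["kafka", "streams", "kafka", "kafka"])
print(list(stream))
# [('kafka', 1), ('streams', 1), ('kafka', 2), ('kafka', 3)]
```

The difference from an ordinary collections library is exactly that `counts` survives failures and is partitioned across machines—the hard parts the framework handles for you.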
Add screenshot example
Add screenshot example
TODO: Summarize
Change to “Logs make reprocessing easy”
Time is hard
Need a model of time
Request/Response ignores the issue, you just set an aggressive timeout
Batch solves the issue usually by just freezing all data for the day
Stream processing needs to actually address the issue
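One concrete model of time is event-time tumbling windows: each record carries its own timestamp, and window membership is decided by that timestamp, not by when the record happens to arrive. An illustrative sketch (not the Kafka Streams API):

```python
def tumbling_window(timestamp_ms: int, size_ms: int) -> tuple:
    """Assign a record to a fixed, non-overlapping window based on
    its event timestamp (not its arrival time)."""
    start = (timestamp_ms // size_ms) * size_ms
    return (start, start + size_ms)

# Records arriving out of order still land in the right window
events = [(1000, "a"), (7500, "b"), (1900, "c")]  # (event_time_ms, value)
for ts, value in events:
    print(value, "->", tumbling_window(ts, size_ms=5000))
# a -> (0, 5000), b -> (5000, 10000), c -> (0, 5000)
```

Note that "c" arrives after "b" but still falls in the first window—this is the out-of-order arrival problem that request/response and batch never have to face.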
Kafka Streams:
Manage the set of live processors and route data to them
Uses Kafka’s group management facility
External framework
Start and restart processes
Package processes
Deploy code
Companies == streams
What does a retail store do?
Streams
Retail
- Sales
- Shipments and logistics
- Pricing
- Re-ordering
- Analytics
- Fraud and theft
But…no notion of time
It’s a streaming platform.
Lets you publish and subscribe to streams of data, stores them reliably, and lets you process them in real time.
The second half of this talk will dive into Apache Kafka and talk about how it acts as a streaming platform and lets you build event-driven stream processing microservices.
Also:
Other talks
Kafka Summit
Streaming data hackathon
Stop by the Confluent booth and ask your questions about Kafka or stream processing
Get a Kafka t-shirt and sticker.
We’re also giving away a few books: the early release of Kafka: The Definitive Guide, Making Sense of Stream Processing, and I Heart Logs
Meet the authors and get your book signed.
We also want to invite you to participate in the Stream Data Hackathon in San Francisco on the evening of April 25, the day before Kafka Summit
You might be interested in some of the other Confluent talks. If you missed it you’ll have access to the video recording.