12. What is Kafka?
Distributed streaming platform
Produce and consume messages, as in a conventional messaging system
Store data in a fault-tolerant way
Process events in order
14. Log
Persistent sequence of messages
New messages are appended to the end of the log
Similar to database transaction log
State recovery
Sequence of actions / events
Debug
Audit
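To make the log idea concrete, here is a minimal in-memory sketch, not Kafka's actual implementation. The class name, the replay helper, and the (account, amount) event shape are all hypothetical. It shows the two properties the slide relies on: records are only appended, and state can be recovered by replaying them in order.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class AppendOnlyLog:
    """Toy in-memory log: records are only ever appended, never modified."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record at the end and return its offset (position)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every record from the given offset to the end, in order."""
        return self.records[offset:]


def replay(log: AppendOnlyLog, apply: Callable[[dict, Any], None]) -> dict:
    """Rebuild state from scratch by applying every record in log order."""
    state: dict = {}
    for record in log.read_from(0):
        apply(state, record)
    return state


def apply_balance_event(state: dict, event: tuple) -> None:
    """Hypothetical event shape: (account, amount)."""
    account, amount = event
    state[account] = state.get(account, 0) + amount


log = AppendOnlyLog()
log.append(("alice", 100))
log.append(("bob", 50))
last_offset = log.append(("alice", -30))

# State recovery: the balances are derived purely from the event sequence.
recovered = replay(log, apply_balance_event)
print(last_offset, recovered)  # 2 {'alice': 70, 'bob': 50}
```

Because the log keeps the full sequence of events, the same replay can also serve debugging and auditing: reading from an earlier offset shows exactly what happened and in which order.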
19. Producer
Writes messages to partitions
Round Robin - Balanced
Semantic (key-based) strategy - beware of hot spots
Messages can be compressed
Sync
Send and wait for response.
Exceptions must be handled and the message resent manually
Async
Register a callback
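The partitioning strategies and the sync/async send styles above can be sketched with a toy producer. This is an illustration under assumptions, not a real Kafka client: the `send` and `choose_partition` functions are hypothetical, the partition count is made up, and CRC32 stands in for whatever hash a real client uses.

```python
import itertools
import zlib
from collections import Counter
from concurrent.futures import Future
from typing import Callable, Optional

NUM_PARTITIONS = 3  # assumed partition count for the toy topic
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[str]) -> int:
    """No key: rotate over the partitions. With a key: hash it, so equal
    keys always map to the same partition."""
    if key is None:
        return next(_round_robin)
    # CRC32 is a stand-in hash chosen for this sketch only.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def send(key: Optional[str], value: str,
         callback: Optional[Callable[[Future], None]] = None) -> Future:
    """Toy send: resolves immediately with the chosen partition.
    A real client would resolve the future when the broker responds."""
    future: Future = Future()
    if callback is not None:
        future.add_done_callback(callback)
    future.set_result(choose_partition(key))
    return future


# Sync style: block on the result and handle any exception at the call site.
sync_partition = send("user-42", "login").result()

# Async style: register a callback and continue without waiting.
async_results = []
send("user-42", "logout", callback=lambda f: async_results.append(f.result()))

# Round robin spreads keyless messages evenly over the partitions.
balanced = Counter(send(None, "tick").result() for _ in range(9))

# Key-based: one dominant key sends all its traffic to a single partition.
hot = Counter(send("big-customer", "order").result() for _ in range(9))

print(balanced, hot)
```

The second counter has a single entry, which is the hot spot the slide warns about: key-based partitioning keeps related messages together but gives up the even spread of round robin.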
20. Producer
Several configs
acks = 0
■ Very fast.
■ No delivery guarantee
acks = 1
■ Waits for the leader's response
■ Synchronous sends can increase latency
acks = all
■ Waits for all in-sync replicas to acknowledge
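The trade-off between the three acks settings can be modelled in a few lines. This is a simplified model for illustration only: the function name and the list-of-booleans representation of replicas are hypothetical, and real brokers involve timeouts and retries that are ignored here.

```python
from typing import List, Union


def send_confirmed(acks: Union[int, str], replica_acked: List[bool]) -> bool:
    """Return whether the producer considers the send successful.

    replica_acked[0] is the leader; the remaining entries are in-sync
    followers. True means that replica wrote the message.
    """
    if acks == 0:
        return True                # never waits, so never learns of a failure
    if acks == 1:
        return replica_acked[0]    # only the leader's response matters
    if acks == "all":
        return all(replica_acked)  # every in-sync replica must have it
    raise ValueError(f"unsupported acks value: {acks!r}")


# Leader wrote the message, one follower has not replicated it yet.
partial = [True, True, False]
# The leader itself failed to write the message.
leader_failed = [False, False, False]

results = {
    acks: (send_confirmed(acks, partial), send_confirmed(acks, leader_failed))
    for acks in (0, 1, "all")
}
print(results)
# {0: (True, True), 1: (True, False), 'all': (False, False)}
```

With acks = 0 the producer reports success even when the leader failed, which is why it is fast but gives no guarantee. With acks = all the send is only confirmed once replication has completed.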
27. Summary
A topic can have multiple partitions
Each partition is a log
Producers send messages
Consumers pull messages and control the offset
Within a consumer group, each partition is read by only one consumer, so the group receives each message once
The partition is the unit of parallel processing
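The consumer-side points in this summary can be sketched as follows. The round-robin assignment below is a simplification chosen for the example, not necessarily the strategy a real cluster uses, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_partitions(partitions: List[int],
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over the consumers of ONE group, round-robin.
    Each partition ends up with exactly one owner inside the group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def poll(log: List[str], offset: int,
         max_records: int = 2) -> Tuple[List[str], int]:
    """Pull up to max_records starting at offset. Returns the batch and the
    next offset, which the consumer (not the broker) keeps track of."""
    batch = log[offset:offset + max_records]
    return batch, offset + len(batch)


two_consumers = assign_partitions([0, 1, 2], ["c1", "c2"])
four_consumers = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(two_consumers)   # {'c1': [0, 2], 'c2': [1]}
print(four_consumers)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}

partition_log = ["a", "b", "c"]
first_batch, offset = poll(partition_log, 0)
second_batch, offset = poll(partition_log, offset)
print(first_batch, second_batch, offset)  # ['a', 'b'] ['c'] 3
```

With three partitions, a fourth consumer in the same group has nothing to read. That is the sense in which partitions bound the parallelism of a group.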
28. Used to connect data systems to Kafka (Kafka Connect)
Can replace common ETL systems
Available connectors:
Amazon S3
JDBC, MySQL
HANA
Cassandra
Elasticsearch
FTP
Connector
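As a rough idea of what configuring one of these connectors looks like, the sketch below builds the JSON payload for a JDBC source connector. It assumes the Confluent JDBC connector is installed on the Connect workers; the connector name, database URL, table, and column are hypothetical placeholders. The script only prints the payload and does not contact any server.

```python
import json

# Hypothetical connector definition: copy new rows of a MySQL table to Kafka.
connector = {
    "name": "inventory-jdbc-source",  # placeholder name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:mysql://localhost:3306/inventory",  # placeholder
        "table.whitelist": "orders",       # placeholder table
        "mode": "incrementing",            # detect new rows by a growing id
        "incrementing.column.name": "id",  # placeholder column
        "topic.prefix": "mysql-",          # rows land in topic "mysql-orders"
        "tasks.max": "1",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

In a typical setup a payload like this would be submitted to the Kafka Connect REST API with a POST to `/connectors`. No custom ETL code is needed, since the connector handles polling the table and writing to the topic.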
31. Other features
Stream
Aggregation
Generate new topics
Complex data pipelines
Dynamic data processor
KSQL
Real-time stream processing with a SQL-like language
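The aggregation idea can be illustrated without any Kafka library. The sketch below is plain Python, not the Kafka Streams or KSQL API. The function name and the page-view events are hypothetical. It consumes an input stream one event at a time and emits a stream of updated counts, which plays the role of the newly generated topic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def count_by_key(events: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Consume (key, value) events one at a time and emit the updated
    running count for that key after each event."""
    counts: Dict[str, int] = defaultdict(int)
    for key, _value in events:
        counts[key] += 1
        yield key, counts[key]


# Hypothetical input stream of (page, user) view events.
page_views = [("home", "u1"), ("cart", "u2"), ("home", "u3")]

# The output is itself a stream: one update per input event.
view_counts = list(count_by_key(page_views))
print(view_counts)  # [('home', 1), ('cart', 1), ('home', 2)]
```

A grouped count expressed in a SQL-like streaming language describes roughly the same computation declaratively: the result is continuously updated as new events arrive rather than computed once over a finished data set.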