5. Batch & Streaming processing
• Batch processing: Data Generator → Ingestion → Distributed File system → Processing → Data Store
• Stream Data processing: Data Generator → Ingestion → Message Queue → Processing → Data Store
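The two pipelines on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `process`, `run_batch`, and `run_stream` are hypothetical names, a plain list stands in for files on a distributed file system, and a `deque` stands in for a message queue.

```python
from collections import deque


def process(record):
    """Hypothetical processing step: normalize a raw event string."""
    return record.strip().upper()


def run_batch(stored_records):
    """Batch: all records are already in storage; process them in one pass."""
    return [process(r) for r in stored_records]


def run_stream(queue, data_store):
    """Stream: take messages off the queue one at a time and store each result."""
    while queue:
        data_store.append(process(queue.popleft()))
    return data_store


events = ["click \n", "view\n", "purchase\n"]

# The list stands in for files on a distributed file system (assumption).
batch_store = run_batch(events)

# The deque stands in for a message queue (assumption).
stream_store = run_stream(deque(events), [])
```

Both paths end in the same data store contents; the difference is that the batch path waits for the full data set, while the stream path handles each message as it is taken off the queue.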
11. Flume
• Easier to set up
• Rich set of built-in tools
• No inherent support for data replication
• Nodes work in isolation
• Memory channel vs File channel
14. Why is Kafka so fast?
• Fast writes:
• Kafka persists all data to disk, but essentially all writes go first to the page cache of the OS, i.e. RAM.
• Fast reads:
• Transferring data from the page cache to a network socket is very efficient
• Linux: sendfile() system call
• Combination of the two = fast Kafka!
• Example (Operations): On a Kafka cluster where the consumers are mostly caught up, you will see no read activity on the disks, because the data is served entirely from cache.
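The write and read paths above can be imitated with Python's standard library. This is a minimal sketch, not Kafka's actual implementation: `append_records`, `serve_log`, and `demo` are hypothetical names, a temporary file stands in for a log segment, and a local socket pair stands in for the broker-to-consumer connection. `socket.sendfile()` uses `os.sendfile()` where the platform provides it and falls back to plain `send()` otherwise.

```python
import os
import socket
import tempfile


def append_records(log_path, records):
    """Append records to a log file; the OS buffers the writes in its page cache."""
    with open(log_path, "ab") as log:
        for record in records:
            log.write(record)
        log.flush()  # hand the bytes to the OS; no fsync per write


def serve_log(log_path, conn, offset=0):
    """Send the log from `offset` onward to a socket; return the bytes sent."""
    with open(log_path, "rb") as log:
        # Uses os.sendfile() when available, so data need not pass through user space.
        return conn.sendfile(log, offset=offset)


def demo():
    records = [b"event-%d\n" % i for i in range(100)]
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "segment.log")
        append_records(log_path, records)

        broker_side, consumer_side = socket.socketpair()
        with broker_side, consumer_side:
            sent = serve_log(log_path, broker_side)
            broker_side.shutdown(socket.SHUT_WR)  # signal end of data

            received = b""
            while True:
                chunk = consumer_side.recv(4096)
                if not chunk:
                    break
                received += chunk
    return sent, received, b"".join(records)
```

Calling `demo()` returns the number of bytes sent, the bytes the consumer side received, and the expected payload, so the three can be compared. The payload is kept small here so that it fits in the socket buffer before the reader starts.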
Source: http://kafka.apache.org/documentation.html#persistence