Kafka
Streaming Data Platform
Traditional Messaging System
• Queue
• Topic
• Messages removed once consumed
• Out-of-order message delivery
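The toy simulation below is plain Python with no broker involved. It shows why a shared queue with competing consumers can finish messages out of order. Each message is removed as soon as a consumer takes it, and a faster consumer can complete a later message before a slower one completes an earlier message. The consumer names and processing times are made up for illustration.

```python
from collections import deque
import heapq


def simulate_competing_consumers(messages, processing_time):
    """Toy model of a traditional queue shared by competing consumers.

    processing_time maps a (hypothetical) consumer name to the time it
    needs per message. Returns messages in the order processing finishes.
    """
    queue = deque(messages)
    # Heap of (time the consumer becomes free, consumer name).
    free_at = [(0, name) for name in sorted(processing_time)]
    heapq.heapify(free_at)
    finished = []  # (finish time, order taken, message)
    taken = 0
    while queue:
        now, name = heapq.heappop(free_at)
        msg = queue.popleft()  # gone from the queue once a consumer takes it
        done = now + processing_time[name]
        finished.append((done, taken, msg))
        taken += 1
        heapq.heappush(free_at, (done, name))
    finished.sort()
    return [msg for _, _, msg in finished]
```

With one fast consumer (1 time unit per message) and one slow consumer (3 units), the call `simulate_competing_consumers(["m1", "m2", "m3", "m4"], {"fast": 1, "slow": 3})` returns `["m1", "m3", "m2", "m4"]`. Message m3 finishes before m2.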
What Is Kafka?
• Messaging system
• Polyglot Consumers / Producers
• Topics and Partitions
• Scalable
• Configurable Message Retention
• Guaranteed ordering within a partition
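Kafka's ordering guarantee applies within a single partition. The sketch below is a simplified model of that behavior. Records with the same key always map to the same partition, so they keep their send order relative to each other. There is no ordering guarantee across partitions.

The `crc32`-modulo partitioner is an assumption made for illustration. It is not Kafka's default partitioner, which hashes keys with murmur2.

```python
import zlib
from collections import defaultdict


def choose_partition(key: str, num_partitions: int) -> int:
    # Simplified stand-in for a key-based partitioner (NOT Kafka's murmur2).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records, num_partitions):
    """Append (key, value) records to an in-memory partitioned log.

    Returns {partition: [(offset, key, value), ...]}; offsets are
    assigned per partition, in append order.
    """
    log = defaultdict(list)
    for key, value in records:
        p = choose_partition(key, num_partitions)
        log[p].append((len(log[p]), key, value))
    return dict(log)


def values_for_key(log, key, num_partitions):
    # Every record for a key lives in one partition, so reading that
    # partition in offset order returns them in the order they were sent.
    partition = log.get(choose_partition(key, num_partitions), [])
    return [value for _, k, value in partition if k == key]
```

Interleaving sends for `"user-1"` and `"user-2"` still yields each user's values in send order. Comparing a `"user-1"` record with a `"user-2"` record says nothing about ordering unless both keys happen to land in the same partition.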
Use Cases
• Ordered Messaging
• Log Aggregation
• Metrics
• Web Activity Tracking
• Stream Processing
Kafka Brokers – Clusters and Replication
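• Topic partitions can be replicated across brokers
• Data is stored across multiple nodes (brokers)
• Each broker in a cluster needs a unique broker.id (the sample config/server.properties sets broker.id=0)
• ZooKeeper tracks offsets, topic names, and partitions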
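Every broker in a cluster needs its own broker.id. A quick pre-flight check over the brokers' server.properties files can therefore catch copied configs that still share an id. This is a minimal sketch. The parser only handles simple `key=value` lines and `#`/`!` comments, not the full Java properties syntax (continuations, escapes, `:` separators). The file names in the usage are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # / ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def duplicate_broker_ids(configs: dict) -> dict:
    """Map each broker.id used by more than one config to the configs using it.

    configs maps a (hypothetical) file name to that file's text.
    """
    owners = {}
    for name, text in configs.items():
        broker_id = parse_properties(text).get("broker.id")
        if broker_id is not None:
            owners.setdefault(broker_id, []).append(name)
    return {bid: names for bid, names in owners.items() if len(names) > 1}
```

An empty result means no two configs share an id.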
Demo – Local Kafka
• Start ZooKeeper
• bin/zookeeper-server-start.sh config/zookeeper.properties
• Start Kafka
• bin/kafka-server-start.sh config/server.properties
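The Kafka broker connects to ZooKeeper when it starts. Scripts that launch both processes therefore usually wait until ZooKeeper's client port accepts connections before starting Kafka. The sketch below uses only the standard library. Port 2181 matches the `--zookeeper localhost:2181` used in the demo commands. The timeout and polling interval are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused or timed out: not listening yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 2181)` returning `True` is the cue to run `bin/kafka-server-start.sh config/server.properties`. An open port only shows that something is listening; it does not prove ZooKeeper is fully healthy.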
Demo – Command-Line Tools
• bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
• bin/kafka-topics.sh --list --zookeeper localhost:2181
• bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
• bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
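Reading a Kafka topic does not remove records. Each consumer tracks its own position (offset) in the log, which is how `--from-beginning` can replay everything the topic still retains. The toy in-memory model below covers a single partition. The class and method names are hypothetical and are not the Kafka client API.

```python
class PartitionLog:
    """Toy append-only log for a single partition (not a Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def end_offset(self) -> int:
        return len(self._records)

    def read(self, offset: int, max_records: int = 100):
        # Reading never deletes anything; it just returns a slice.
        return self._records[offset:offset + max_records]


class ToyConsumer:
    def __init__(self, log: PartitionLog, from_beginning: bool):
        self.log = log
        # Like --from-beginning: start at offset 0; otherwise start at
        # the current end and only see records appended afterwards.
        self.position = 0 if from_beginning else log.end_offset()

    def poll(self):
        batch = self.log.read(self.position)
        self.position += len(batch)  # advance this consumer's own offset
        return batch
```

Suppose two records already exist. A consumer created with `from_beginning=True` gets all of them plus any later ones. One created with `from_beginning=False` only gets records appended after it started. Neither consumer affects what the other can read.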
Example Producer
• <CODE>
Example Consumer
• <CODE>
Deployment Options
• Standalone deployment
• Confluent.io
• Hortonworks
• AWS
Hortonworks Data Platform on AWS
Big data in a one-stop shop
Determine Cluster Sizing
• Implement a producer and consumer
• Use your own data structures
• 3 ZooKeeper nodes and 3 Kafka brokers
• Java heap = 2 GB
• Network saturation (1 Gbit/s vs. 10 Gbit/s links)
• Avro Data Serialization
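The network-saturation check can start from a back-of-the-envelope bound. Divide the link's bit rate by 8 to get bytes per second, then divide by the average serialized message size to get a ceiling on messages per second. The bound ignores protocol overhead, batching, compression, and replication traffic between brokers, so real throughput will be lower. The 1000-byte message size is an assumed example. A compact format such as Avro lowers it.

```python
def max_messages_per_second(link_gbps: float, message_bytes: int) -> float:
    """Upper bound on messages/s one link can carry (no overhead modeled)."""
    bytes_per_second = link_gbps * 1_000_000_000 / 8  # 1 Gbit/s = 125,000,000 B/s
    return bytes_per_second / message_bytes


for gbps in (1, 10):
    print(f"{gbps} Gbit/s: {max_messages_per_second(gbps, 1000):,.0f} msg/s")
# 1 Gbit/s: 125,000 msg/s
# 10 Gbit/s: 1,250,000 msg/s
```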
Producer for Throughput Testing
• <CODE>
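The harness below is a separate, library-agnostic sketch for timing any producer. It calls a send function repeatedly with a fixed-size payload and reports messages/s and MB/s. `send` and `flush` are hypothetical placeholders for whatever producer calls are being benchmarked. If sends are asynchronous, pass a flush so the timer covers delivery rather than just enqueueing.

```python
import time
from typing import Callable


def measure_throughput(
    send: Callable[[bytes], None],
    num_messages: int,
    message_bytes: int,
    flush: Callable[[], None] = lambda: None,
) -> dict:
    """Time num_messages calls to send() plus one flush(); return rates."""
    payload = b"x" * message_bytes
    start = time.perf_counter()
    for _ in range(num_messages):
        send(payload)
    flush()  # wait for buffered sends inside the timed region
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return {
        "messages_per_sec": num_messages / elapsed,
        "mb_per_sec": num_messages * message_bytes / elapsed / 1_000_000,
    }
```

Running it with message sizes taken from your own data structures ties the result to the cluster-sizing checklist above.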
Architectural Possibilities
• Streaming data platform
• Common interface
• High throughput
WARNING
• Kafka 0.8.x has a major bug that deletes data
• Make sure to use 0.9.0.x
Question & Answer
bryancjacobs@gmail.com

kafka-steaming-data