Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent
Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix, Walmart, Airbnb, Goldman Sachs and LinkedIn. In this talk Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.
- Spikes in usage
  - Real-world applications often have non-uniform usage patterns
  - Want to avoid huge over-provisioning
- Upgrades / outages
- What if you want to do something else with the data?
  - What if you want to adopt something other than Elasticsearch?
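The spike argument can be made concrete with a toy simulation of a log sitting between a bursty producer and a consumer. The arrival counts and the consumer capacity are made-up numbers. The consumer drains at most 150 messages per tick, well below the 400-per-tick peak but above the 120-per-tick average, and the backlog held in the log absorbs the difference instead of the consumer being provisioned for the peak.

```python
def simulate_backlog(arrivals, capacity):
    """Return the backlog (messages waiting in the log) after each tick."""
    backlog = 0
    history = []
    for incoming in arrivals:
        backlog += incoming
        # The consumer processes at most `capacity` messages per tick.
        backlog -= min(backlog, capacity)
        history.append(backlog)
    return history


# Hypothetical traffic: quiet, a two-tick spike, then quiet again.
arrivals = [50, 50, 400, 400, 50, 50, 50, 50, 50, 50]
print(simulate_backlog(arrivals, capacity=150))
# [0, 0, 250, 500, 400, 300, 200, 100, 0, 0]
```

The same buffering covers the upgrade/outage case: while the consumer is down its capacity is effectively zero, the backlog grows, and it drains once the consumer comes back.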
● Append only
● Delete earliest data based on time / size / never
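A minimal sketch of the two log properties above, assuming a simple in-memory list of records: writes only ever go to the end, and retention only ever removes the earliest records, by age, by total size, or never. The class name and the `max_age_s` / `max_bytes` knobs are hypothetical and only loosely mirror time- and size-based retention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    offset: int
    timestamp: float  # seconds; supplied by the caller in this sketch
    value: bytes


@dataclass
class AppendOnlyLog:
    # Hypothetical retention knobs; None means "never delete" on that axis.
    max_age_s: Optional[float] = None
    max_bytes: Optional[int] = None
    records: List[Record] = field(default_factory=list)
    next_offset: int = 0

    def append(self, value: bytes, timestamp: float) -> int:
        """Add a record at the end and return its offset (offsets only grow)."""
        offset = self.next_offset
        self.records.append(Record(offset, timestamp, value))
        self.next_offset += 1
        return offset

    def size_bytes(self) -> int:
        return sum(len(r.value) for r in self.records)

    def enforce_retention(self, now: float) -> None:
        """Drop records from the front only; surviving records are never modified."""
        if self.max_age_s is not None:
            while self.records and now - self.records[0].timestamp > self.max_age_s:
                self.records.pop(0)
        if self.max_bytes is not None:
            while self.records and self.size_bytes() > self.max_bytes:
                self.records.pop(0)


log = AppendOnlyLog(max_bytes=10)
for t, v in enumerate([b"aaaa", b"bbbb", b"cccc"]):
    log.append(v, timestamp=float(t))
log.enforce_retention(now=3.0)
print([r.offset for r in log.records])  # [1, 2]
```

Real Kafka applies retention to whole log segment files rather than individual records, so deletion is coarser than in this sketch, but the shape is the same: the oldest data goes first and offsets are never reused.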
• Allows topics to scale past the constraints of a single server
• Message → partition_id mapping is relevant to the application
• Ordering guarantees per partition but not across partitions
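A sketch of key-based partitioning and what "ordering per partition" means in practice. Assumptions: a hypothetical 3-partition topic, and `zlib.crc32` standing in for whatever hash the real client's partitioner uses. The only property that matters here is that the key-to-partition mapping is deterministic.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical topic size


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition id.

    Assumption: CRC32 stands in for the client's partitioner hash;
    the same key always lands on the same partition.
    """
    return zlib.crc32(key) % num_partitions


events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "click"),
    (b"user-2", "logout"),
    (b"user-1", "logout"),
]

partitions = defaultdict(list)  # partition id -> records in append order
for key, value in events:
    partitions[partition_for(key)].append((key, value))

for pid in sorted(partitions):
    print(pid, partitions[pid])
```

All events for one key end up in one partition, in the order they were produced, so per-key ordering holds. Events for different keys on different partitions have no defined relative order. Because the mapping is `hash % num_partitions`, changing the partition count also changes which partition a key maps to.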
Apache Kafka Replication
• cheap durability!
• choose # acks for
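A simplified model of the durability trade-off behind the producer's `acks` setting (`0`, `1`, or `all`, where `-1` is equivalent to `all`). The function name is hypothetical. The model assumes all replicas are in sync. In practice `acks=all` waits for the current in-sync replica set, and `min.insync.replicas` controls how small that set may become before such writes are rejected.

```python
def replicas_with_record_at_ack(acks: str, in_sync_replicas: int) -> int:
    """Minimum number of replicas guaranteed to hold a record when the
    producer is told the write succeeded (simplified model)."""
    if acks == "0":
        return 0  # fire-and-forget: the producer does not wait at all
    if acks == "1":
        return 1  # only the partition leader has written it
    if acks in ("all", "-1"):
        return in_sync_replicas  # leader waits for every in-sync replica
    raise ValueError(f"unsupported acks value: {acks!r}")


for acks in ("0", "1", "all"):
    n = replicas_with_record_at_ack(acks, in_sync_replicas=3)
    # If the leader dies right after the ack, the record survives only
    # if at least one other replica already has it.
    print(f"acks={acks}: {n} replica(s), survives leader loss: {n >= 2}")
```

More acks means higher produce latency but stronger durability. Because replication is built into the log, stronger settings cost only a little extra latency.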
Apache Kafka Consumer Groups
Partitions are spread across brokers
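Within a consumer group, each partition is read by exactly one member of the group. Kafka ships with several assignment strategies, including a range-based one. The following is a simplified single-topic sketch of range-style assignment, not the client's actual code. The consumer names are hypothetical.

```python
from typing import Dict, List, Sequence


def range_assign(partitions: Sequence[int], consumers: Sequence[str]) -> Dict[str, List[int]]:
    """Range-style assignment of one topic's partitions to a consumer group.

    Each partition goes to exactly one consumer; the first
    len(partitions) % len(consumers) consumers get one extra partition.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(len(parts), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


print(range_assign(range(5), ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 1, 2], 'consumer-b': [3, 4]}
print(range_assign(range(2), ["consumer-a", "consumer-b", "consumer-c"]))
# {'consumer-a': [0], 'consumer-b': [1], 'consumer-c': []}
```

The second call shows why the partition count caps a group's parallelism: consumers beyond the number of partitions receive nothing and sit idle.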
Server parameters you’re likely to want to tweak
# zookeeper.properties
# Location of database snapshots
dataDir=<data dir>
# Interval in hours at which the snapshot purge task is triggered (default 0: no purge)
autopurge.purgeInterval=12
# server.properties
# Location of Kafka log data
log.dir=<data dir>
# Whether topics are auto-created when referenced if they don't exist
auto.create.topics.enable=false
# Topics cannot be deleted unless this is set
delete.topic.enable=true
# ~Infinite retention (1,000,000 hours is roughly 114 years)
log.retention.hours=1000000
# Pre-allocated log compaction (dedupe) buffer size in bytes
log.cleaner.dedupe.buffer.size=20000000
KAFKA_HEAP_OPTS="-Xmx128M -Xms128M" ./bin/kafka-server-start server.properties
KAFKA_HEAP_OPTS="-Xmx64M -Xms64M" ./bin/zookeeper-server-start zookeeper.properties