Stream Processing with Apache Kafka and .NET

Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent

Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies, including Uber, Netflix, Walmart, Airbnb, Goldman Sachs, and LinkedIn. In this talk, Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics), and show you how to use Kafka from within your C#/.NET applications.


Stream Processing with Apache Kafka and .NET

  1. Stream Processing with Apache Kafka™ and .NET (Matt Howlett, Confluent Inc.)
  2. Agenda
     • Some Typical Use Cases
     • Technical Overview
     • [break]
     • Live Demo in C# [let’s build a massively scalable web crawler… in 30 minutes]
  3. Typical Use Cases
  4. • Application Logs
       { "log_level": 7, "time": "2017-03-03 11:45:05.737", "consumer-id": "rdkafka#consumer-1",
         "method": "RECV", "addr": "10.0.0.14:9092/0",
         "message": "Received HeartbeatResponse (v0, 2 bytes, CorrId 8, rrt 0.00ms)" }
     • Click / Meta Event Data (Analytics)
       { "ip": "192.168.0.43", "time": "2017-03-03 11:45:05.737", "user_id": 7423653,
         "product_id": 62345334, "page": "product.detail", "data": "32da-bfe89-116ac" }
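     An event like the click record above is just JSON, and getting it into Kafka from .NET is a few lines with the confluent-kafka-dotnet client. The following is only an illustrative sketch, not code from the deck: it assumes the current Confluent.Kafka 1.x API, a broker on localhost:9092, and a hypothetical "click-events" topic.

       using System;
       using System.Text.Json;
       using Confluent.Kafka;

       var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
       using var producer = new ProducerBuilder<Null, string>(config).Build();

       // Serialize a click event like the one above and send it as a plain JSON string.
       var clickEvent = new
       {
           ip = "192.168.0.43",
           time = DateTime.UtcNow.ToString("o"),
           user_id = 7423653,
           product_id = 62345334,
           page = "product.detail"
       };

       var dr = await producer.ProduceAsync("click-events",
           new Message<Null, string> { Value = JsonSerializer.Serialize(clickEvent) });
       Console.WriteLine($"Delivered to {dr.TopicPartitionOffset}");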
  5. • Web Server Logs
       192.168.1.13 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2
       "http://www.example.com/wordpress3/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US)
       AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"
     • Stack Traces
       at Confluent.Kafka.IntegrationTests.Tests.ConsumeMessage(Consumer consumer, Message`2 dr, String testString)
         in /git/confluent-kafka-dotnet/test/Confluent.Kafka.IntegrationTests/Tests/SimpleProduceConsume.cs:line 72
       at Confluent.Kafka.IntegrationTests.Tests.SimpleProduceConsume(String bootstrapServers, String topic, String partitionedTopic)
         in /git/confluent-kafka-dotnet/test/Confluent.Kafka.IntegrationTests/Tests/SimpleProduceConsume.cs:line 65
  6. Log Analytics v1.0 (diagram: Log files → ETL tool)
  7. Potential Problems
     - Spikes in usage
       - Real-world applications often have non-uniform usage patterns
       - Want to avoid huge over-provisioning
     - Upgrades / outages
     Missed Opportunities
     - What if you want to do something else with the data?
     - What if you want to adopt something other than Elasticsearch?
  8. Log Analytics v2 (diagram: Log files → Kafka Connect → Kafka → Kafka Connect)
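     For reference (not from the deck): the simplest way to get log files into Kafka with Kafka Connect is the FileStreamSource connector that ships with Kafka. A minimal standalone configuration might look like the sketch below; the connector name, file path, and topic name are placeholders, and a production pipeline would more likely use a dedicated log shipper or connector.

       # connect-file-source.properties (hypothetical example)
       name=log-file-source
       connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
       tasks.max=1
       file=/var/log/app/app.log
       topic=log-lines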
  9. + Alerting + Fraud/Spam Detection (diagram: Log files, User Info, and IP Addr. Info flow through Kafka Connect into Kafka, feeding a fraud detection stream processor and alerting)
  10. Before you know it: Kafka at the center, connecting the DWH, search, stream processing apps, K/V stores, monitoring, real-time analytics, Hadoop, and RDBMSs (diagram)
  11. • Central to architecture at many companies • Across industries
  12. Technical Overview
  13. (diagram)
  14. ● Persisted ● Append only ● Immutable ● Delete earliest data based on time / size / never
  15. • Allows topics to scale past the constraints of a single server • The message → partition_id mapping is deterministic; partitioning is relevant to the application • Ordering guarantees per partition, but not across partitions
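     A quick illustration of the deterministic message → partition mapping (not from the deck; it assumes the Confluent.Kafka 1.x API and a hypothetical multi-partition "click-events" topic): with the default partitioner the message key is hashed, so every event carrying the same key lands in the same partition and stays ordered relative to the others.

       using System;
       using Confluent.Kafka;

       var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
       using var producer = new ProducerBuilder<string, string>(config).Build();

       // Same key => same partition, so all events for this user are totally ordered.
       foreach (var value in new[] { "page.view", "add.to.cart", "checkout" })
       {
           var dr = await producer.ProduceAsync("click-events",
               new Message<string, string> { Key = "user-7423653", Value = value });
           Console.WriteLine($"key=user-7423653 -> partition {dr.Partition.Value}, offset {dr.Offset.Value}");
       }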
  16. Apache Kafka Replication • cheap durability! • choose the number of acks required to confirm a produced message
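     The acks trade-off shows up as a single producer setting in .NET. A hedged sketch (assuming the Confluent.Kafka 1.x API and a hypothetical "events" topic): Acks.All waits for all in-sync replicas before the produce is confirmed, Acks.Leader waits only for the partition leader, and Acks.None does not wait at all.

       using System;
       using Confluent.Kafka;

       var config = new ProducerConfig
       {
           BootstrapServers = "localhost:9092",
           Acks = Acks.All   // strongest durability; Acks.Leader and Acks.None trade safety for latency
       };

       using var producer = new ProducerBuilder<Null, string>(config).Build();
       var dr = await producer.ProduceAsync("events", new Message<Null, string> { Value = "hello" });
       Console.WriteLine($"Confirmed at {dr.TopicPartitionOffset}");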
  17. Apache Kafka Consumer Groups. Partitions are spread across brokers.
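     In .NET a consumer group is just a GroupId plus Subscribe: run several copies of the sketch below with the same GroupId and the topic's partitions are divided between them, with each partition consumed by exactly one member. This is not from the deck; it assumes the Confluent.Kafka 1.x API, the "pages" topic created in the demo, and a hypothetical group name.

       using System;
       using System.Threading;
       using Confluent.Kafka;

       var config = new ConsumerConfig
       {
           BootstrapServers = "localhost:9092",
           GroupId = "page-indexer",                 // hypothetical group name
           AutoOffsetReset = AutoOffsetReset.Earliest
       };

       using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
       consumer.Subscribe("pages");

       while (true)
       {
           // Blocks until a message arrives on one of this instance's assigned partitions.
           var cr = consumer.Consume(CancellationToken.None);
           Console.WriteLine($"partition {cr.Partition.Value}, offset {cr.Offset.Value}: {cr.Message.Value.Length} chars");
       }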
  18. (diagram)
  19. Kafka Summit New York: May 8. Kafka Summit San Francisco: August 28. Use the Apache Kafka community discount code kafcom17 to get $50 off at www.kafka-summit.org.
  20. Live Demo
  21. Basic Operation
      Links
        https://www.confluent.io/download/
        https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
        https://github.com/mhowlett/south-bay-dotnet
      Starting
        ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
        ./bin/kafka-server-start ./etc/kafka/server.properties
      Create Topics
        ./bin/kafka-topics --zookeeper localhost:2181 --create --topic url-queue --partitions 12 --replication-factor 1
        ./bin/kafka-topics --zookeeper localhost:2181 --create --topic pages --partitions 12 --replication-factor 1
      List High Watermark Offsets
        ./bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic pages --time -1
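      To connect the dots with the url-queue and pages topics above, here is a hedged sketch of what the crawler's core consume → fetch → produce loop might look like. The real demo code lives in the south-bay-dotnet repo linked above; this sketch assumes the current Confluent.Kafka 1.x API rather than the 2017-era client used in the talk, and omits error handling and retries.

        using System;
        using System.Net.Http;
        using System.Threading;
        using Confluent.Kafka;

        var consumerConfig = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092",
            GroupId = "crawler",                       // every crawler instance joins the same group
            AutoOffsetReset = AutoOffsetReset.Earliest
        };
        var producerConfig = new ProducerConfig { BootstrapServers = "localhost:9092" };

        using var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig).Build();
        using var producer = new ProducerBuilder<string, string>(producerConfig).Build();
        using var http = new HttpClient();

        consumer.Subscribe("url-queue");
        while (true)
        {
            var cr = consumer.Consume(CancellationToken.None);   // next URL to crawl
            var url = cr.Message.Value;
            var html = http.GetStringAsync(url).Result;          // fetch the page (blocking, no retries)
            producer.Produce("pages", new Message<string, string> { Key = url, Value = html });
        }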
  22. Server parameters you’re likely to want to tweak
      Zookeeper
        dataDir=<data dir>                       # location of database snapshots
        autopurge.purgeInterval=12               # time interval in hours at which the purge task is triggered (default: no purge)
      Kafka
        log.dir=<data dir>                       # location of kafka log data
        auto.create.topics.enable=false          # whether or not topics are auto-created when referenced if they don't exist
        delete.topic.enable=true                 # topics cannot be deleted unless this is set
        log.retention.hours=1000000              # ~infinite retention
        log.cleaner.dedupe.buffer.size=20000000  # pre-allocated compaction buffer size (bytes)
      Low Memory
        KAFKA_HEAP_OPTS="-Xmx128M -Xms128M" ./bin/kafka-server-start server.properties
        KAFKA_HEAP_OPTS="-Xmx64M -Xms64M" ./bin/zookeeper-server-start zookeeper.properties
  23. Thank You. @matt_howlett @confluentinc
