The document discusses how the Siphon team at Microsoft uses Kafka Connect for large-scale data integration. It describes some of the problems with Kafka Connect that led Siphon to make improvements around encryption, auditing, logging, metrics, and customizing the data pipeline. It then outlines how Siphon operates Kafka Connect at scale, including through self-serve APIs, Docker deployment, and monitoring systems.
1. Kafka Connect in a Docker Container at Scale
Ren Cao, Siphon Team, Microsoft
July 12, 2018
2. Content
• Why did we choose KConnect?
- Some background
- Our problems before KConnect
• Why is KConnect not good enough for us?
- Encryption, Auditing, Logging, Metrics and Pipeline Integration
• How do we operate KConnect at Siphon?
- A picture of Siphon
- User experience: self-serve
- Our experience: build, deployment, monitoring and alerting
4. KConnect Background
• A framework included in Kafka since 0.9
• Move data between Kafka and other systems scalably and reliably.
• Source Connectors import. Sink Connectors export.
• 100+ open-source connectors: S3, MySQL, MongoDB, HDFS, JDBC, Redis, Elasticsearch, Cassandra…
Pic: Confluent.io
5. How KConnect Works
• Start a KConnect cluster
• Implement a SinkConnector, or choose an existing one
• Create a connector of N tasks (based on your topic volume)
• Manage the connector through the KConnect REST API and JMX counters
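The management step above goes through the Connect REST API. A minimal sketch of building the request body for `POST /connectors` follows; the connector name and class here are illustrative examples, not Siphon's actual configuration:

```python
import json

def connector_request(name, connector_class, topics, tasks_max):
    """Build the JSON body for POST /connectors on the Connect REST API."""
    return {
        "name": name,
        "config": {
            "connector.class": connector_class,
            "tasks.max": str(tasks_max),  # Connect expects config values as strings
            "topics": ",".join(topics),
        },
    }

body = connector_request("my-sink", "io.confluent.connect.s3.S3SinkConnector",
                         ["events"], 5)
print(json.dumps(body, indent=2))
```

The same config object can later be updated in place via `PUT /connectors/{name}/config`, which is how the task count is rescaled as topic volume grows.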
6. How KConnect Works
Under the hood…
• A KConnect cluster consists of “Worker” processes
• KConnect still uses the traditional high-level consumer (HLC) model
• A “Connector” is a “Consumer-group”
• A “Task” is a “Consumer” object
• Performance and rebalance rules are the same in KConnect
• The master Worker assigns all the tasks in the cluster evenly across the Workers
7. Siphon Background
We are the connector team at Microsoft:
• host 1000+ connectors for 100+ customers
• 24 x 7 operation, with a 30-second SLA
• 50 clusters, peak volume 20 GB/s
• don’t do much processing
• currently internal; we aim to be public in the future
8. Our Problems before KConnect…
• Our own C# client is based on protocol version 0.8, which needs an upgrade
• Our other services run on the “Java + Docker + Linux” platform
• We need to implement a new uploader for every new destination
• Uneven consumer partition assignment. Example: consumers C0–C4 subscribe to topic T0 (5 partitions), T1 (3 partitions) and T2 (7 partitions):

Consumer | Assigned Partitions
C0       | T0P0 T1P0 T2P0 T2P5
C1       | T0P1 T1P1 T2P1 T2P6
C2       | T0P2 T1P2 T2P2
C3       | T0P3 T2P3
C4       | T0P4 T2P4
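The skew in the table can be reproduced with a small simulation, assuming partitions are dealt out round-robin within each topic, as the example illustrates:

```python
# Per-topic round-robin assignment: partition p of each topic goes to
# consumer p mod N. Topic/consumer names match the table above.
topics = {"T0": 5, "T1": 3, "T2": 7}
consumers = ["C0", "C1", "C2", "C3", "C4"]

assignment = {c: [] for c in consumers}
for topic, partitions in topics.items():
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(f"{topic}P{p}")

for consumer, parts in assignment.items():
    print(consumer, parts)
```

With 15 partitions across 5 consumers an even split would be 3 each, yet C0 and C1 end up with 4 partitions while C3 and C4 get only 2, because each topic's remainder lands on the first consumers in the list.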
16. Our Improvement - Metrics
Message lifecycle (from the latency diagram): customer creates the original message → send to Producer → received by Consumer → upload to destination → AuditTrail finish.
Latency segments measured along this path:
• pickup latency
• SDK-to-Connect latency
• parse, process, adapt and upload latency
• Siphon latency
• auditing latency
• E2E latency
At Siphon we care a lot about latency. We also track other metrics, including lag, data volume, etc.
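As a sketch, each segment is a difference between two stage timestamps. The stage names, the example timestamps and the exact stage pair per segment below are illustrative assumptions, not Siphon's actual schema:

```python
# Per-message stage timestamps in seconds (illustrative values).
stages = {
    "created":  0.0,  # customer creates the original message
    "sent":     0.4,  # handed to the Producer
    "consumed": 1.1,  # received by the Connect consumer
    "uploaded": 2.6,  # written to the destination
    "audited":  3.0,  # AuditTrail record finished
}

# Each latency segment is the gap between two stages (pairing assumed).
latencies = {
    "pickup":         stages["sent"] - stages["created"],
    "sdk_to_connect": stages["consumed"] - stages["sent"],
    "upload":         stages["uploaded"] - stages["consumed"],  # parse/process/adapt/upload
    "siphon":         stages["uploaded"] - stages["sent"],
    "auditing":       stages["audited"] - stages["uploaded"],
    "e2e":            stages["audited"] - stages["created"],
}
print(latencies)
```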
18. Our Improvement - Pipeline
Kafka → Parse → Process → Adapt → Upload → Destination
• Parse: transform the source format into SiphonRecord
• Process: processor chain: filtering, sampling, repartitioning, buffering, etc.
• Adapt: transform SiphonRecord into the destination format
• Upload: the final uploading step
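A minimal sketch of the four-stage chain; SiphonRecord and the stage functions are illustrative stand-ins, not Siphon's real classes:

```python
def parse(raw):
    """Parse: source format -> SiphonRecord (modeled here as a dict)."""
    return {"value": raw}

def process(record):
    """Process: processor chain; here a single filter that drops empty values."""
    return record if record["value"] is not None else None

def adapt(record):
    """Adapt: SiphonRecord -> destination format (here, a string)."""
    return str(record["value"])

uploaded = []
def upload(payload):
    """Upload: final step; here, append to an in-memory 'destination'."""
    uploaded.append(payload)

# Run a small batch through the chain; filtered records are skipped.
for raw in [1, None, 3]:
    record = process(parse(raw))
    if record is not None:
        upload(adapt(record))
# uploaded == ["1", "3"]
```

Keeping each stage a separate function is what lets the processor chain (filtering, sampling, repartitioning, buffering) be swapped or extended without touching parsing or uploading.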
22. A pic of Siphon
Architecture diagram (components and data flow):
• Rest Proxy (EventServer): clients push data in; data is pulled onward
• Kafka: buffers the data
• Kafka Connect: moves the data to destinations such as Cosmos, S3 and MySQL
• Sync-Services: propagate new topic configs and new connector configs
• Self-Serve API + DB and the Siphon UI: tenant, topic and subscription management
• Monitoring Services, backed by the monitoring system (Jarvis)
23. KConnect Operation – User experience
At Siphon, we support self-serve connector management.
24. KConnect Operation – User experience
Customers can monitor their topics and subscriptions through the UI.
25. KConnect Operation - Build
We use Visual Studio Team Services for code management and the build pipeline.
26. KConnect Operation - Deployment
We use Azure Container Registry for Docker image storage and version control.
27. KConnect Operation - Deployment
We use ExpressV2 for KConnect cluster deployment / upgrade.
28. KConnect Operation - Deployment
Most Siphon services run in Docker, with Mesos for orchestration.
29. KConnect Operation - Monitoring
We use Jarvis as our monitoring system for cluster health.