The document discusses how the Siphon team at Microsoft uses Kafka Connect for large-scale data integration. It describes some of the problems with Kafka Connect that led Siphon to make improvements around encryption, auditing, logging, metrics, and customizing the data pipeline. It then outlines how Siphon operates Kafka Connect at scale, including through self-serve APIs, Docker deployment, and monitoring systems.
1. Kafka Connect in a Docker Container at Scale
Ren Cao, Siphon Team, Microsoft
July 12, 2018
2. Content
• Why did we choose KConnect?
- Some background
- Our problems before KConnect
• Why is KConnect not good enough for us?
- Encryption, Auditing, Logging, Metrics and Pipeline Integration
• How do we operate KConnect at Siphon?
- A picture of Siphon
- User experience: self-serve
- Our experience: build, deployment, monitoring and alerting
4. KConnect Background
• A framework included in Kafka since 0.9
• Move data between Kafka and other systems scalably and reliably.
• Source Connectors import. Sink Connectors export.
• 100+ open-source connectors: S3, MySQL, MongoDB, HDFS, JDBC, Redis, Elasticsearch, Cassandra…
Pic: Confluent.io
5. How KConnect Works
• Start a KConnect cluster
• Implement a SinkConnector, or choose an existing one
• Create a connector of N tasks (based on your topic volume)
• Manage the connector through the KConnect REST API and JMX counters
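The management step above goes through the Connect REST API. A minimal sketch of building the request body for `POST /connectors` follows; the connector name and class here are illustrative examples, not Siphon's actual configuration:

```python
import json

def connector_request(name, connector_class, topics, tasks_max):
    """Build the JSON body for POST /connectors on the Connect REST API."""
    return {
        "name": name,
        "config": {
            "connector.class": connector_class,
            "tasks.max": str(tasks_max),  # Connect expects config values as strings
            "topics": ",".join(topics),
        },
    }

body = connector_request("my-sink", "io.confluent.connect.s3.S3SinkConnector",
                         ["events"], 5)
print(json.dumps(body, indent=2))
```

The same config object can later be updated in place via `PUT /connectors/{name}/config`, which is how the task count is rescaled as topic volume grows.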
6. How KConnect Works
Under the hood…
• A KConnect cluster consists of “Worker” processes
• KConnect still uses the traditional high-level consumer (HLC) model
• A “Connector” is a “Consumer-group”
• A “Task” is a “Consumer” object
• Performance and rebalance rules are the same in KConnect
• The master Worker assigns all the tasks in the cluster evenly across the Workers
7. Siphon Background
We are the connector team at Microsoft:
• host 1000+ connectors for 100+ customers
• 24 x 7 operation, with a 30-second SLA
• 50 clusters, peak volume 20 GB/s
• don’t do much processing
• currently internal; we aim to be public in the future
8. Our Problems before KConnect…
• Our own C# client is based on protocol version 0.8, which needs an upgrade
• Our other services run on the “Java + Docker + Linux” platform
• We need to implement a new uploader for every new destination
• Uneven consumer partition assignment. Example: consumers C0–C4 subscribe to topic T0 (5 partitions), T1 (3 partitions) and T2 (7 partitions):

Consumer | Assigned Partitions
C0       | T0P0 T1P0 T2P0 T2P5
C1       | T0P1 T1P1 T2P1 T2P6
C2       | T0P2 T1P2 T2P2
C3       | T0P3 T2P3
C4       | T0P4 T2P4
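The skew in the table can be reproduced with a small simulation, assuming partitions are dealt out round-robin within each topic, as the example illustrates:

```python
# Per-topic round-robin assignment: partition p of each topic goes to
# consumer p mod N. Topic/consumer names match the table above.
topics = {"T0": 5, "T1": 3, "T2": 7}
consumers = ["C0", "C1", "C2", "C3", "C4"]

assignment = {c: [] for c in consumers}
for topic, partitions in topics.items():
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(f"{topic}P{p}")

for consumer, parts in assignment.items():
    print(consumer, parts)
```

With 15 partitions across 5 consumers an even split would be 3 each, yet C0 and C1 end up with 4 partitions while C3 and C4 get only 2, because each topic's remainder lands on the first consumers in the list.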
16. Our Improvement - Metrics
Message lifecycle (from the latency diagram): customer creates the original message → send to Producer → received by Consumer → upload to destination → AuditTrail finish.
Latency segments measured along this path:
• pickup latency
• SDK-to-Connect latency
• parse, process, adapt and upload latency
• Siphon latency
• auditing latency
• E2E latency
At Siphon we care a lot about latency. We also track other metrics, including lag, data volume, etc.
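As a sketch, each segment is a difference between two stage timestamps. The stage names, the example timestamps and the exact stage pair per segment below are illustrative assumptions, not Siphon's actual schema:

```python
# Per-message stage timestamps in seconds (illustrative values).
stages = {
    "created":  0.0,  # customer creates the original message
    "sent":     0.4,  # handed to the Producer
    "consumed": 1.1,  # received by the Connect consumer
    "uploaded": 2.6,  # written to the destination
    "audited":  3.0,  # AuditTrail record finished
}

# Each latency segment is the gap between two stages (pairing assumed).
latencies = {
    "pickup":         stages["sent"] - stages["created"],
    "sdk_to_connect": stages["consumed"] - stages["sent"],
    "upload":         stages["uploaded"] - stages["consumed"],  # parse/process/adapt/upload
    "siphon":         stages["uploaded"] - stages["sent"],
    "auditing":       stages["audited"] - stages["uploaded"],
    "e2e":            stages["audited"] - stages["created"],
}
print(latencies)
```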
18. Our Improvement - Pipeline
Kafka → Parse → Process → Adapt → Upload → Destination
• Parse: transform the source format into SiphonRecord
• Process: processor chain: filtering, sampling, repartitioning, buffering, etc.
• Adapt: transform SiphonRecord into the destination format
• Upload: the final uploading step
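A minimal sketch of the four-stage chain; SiphonRecord and the stage functions are illustrative stand-ins, not Siphon's real classes:

```python
def parse(raw):
    """Parse: source format -> SiphonRecord (modeled here as a dict)."""
    return {"value": raw}

def process(record):
    """Process: processor chain; here a single filter that drops empty values."""
    return record if record["value"] is not None else None

def adapt(record):
    """Adapt: SiphonRecord -> destination format (here, a string)."""
    return str(record["value"])

uploaded = []
def upload(payload):
    """Upload: final step; here, append to an in-memory 'destination'."""
    uploaded.append(payload)

# Run a small batch through the chain; filtered records are skipped.
for raw in [1, None, 3]:
    record = process(parse(raw))
    if record is not None:
        upload(adapt(record))
# uploaded == ["1", "3"]
```

Keeping each stage a separate function is what lets the processor chain (filtering, sampling, repartitioning, buffering) be swapped or extended without touching parsing or uploading.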
22. A pic of Siphon
Architecture diagram (components and data flow):
• Rest Proxy (EventServer): clients push data in; data is pulled onward
• Kafka: buffers the data
• Kafka Connect: moves the data to destinations such as Cosmos, S3 and MySQL
• Sync-Services: propagate new topic configs and new connector configs
• Self-Serve API + DB and the Siphon UI: tenant, topic and subscription management
• Monitoring Services, backed by the monitoring system (Jarvis)
23. KConnect Operation – User experience
At Siphon, we support self-serve connector management.
24. KConnect Operation – User experience
Customers can monitor their topics and subscriptions through the UI.
25. KConnect Operation - Build
We use Visual Studio Team Services for code management and the build pipeline.
26. KConnect Operation - Deployment
We use Azure Container Registry for Docker image storage and version control.
27. KConnect Operation - Deployment
We use ExpressV2 for KConnect cluster deployment / upgrade.
28. KConnect Operation - Deployment
Most Siphon services run in Docker, with Mesos for orchestration.
29. KConnect Operation - Monitoring
We use Jarvis as our monitoring system for cluster health.