July 12, 2018
Ren Cao, Siphon Team, Microsoft
Kafka Connect in a Docker Container at Scale
Content
• Why did we choose KConnect?
- Some background
- Our problems before KConnect
• Why is KConnect not good enough for us?
- Encryption, Auditing, Logging, Metrics and Pipeline
- Integration
• How do we operate KConnect at Siphon?
- a picture of Siphon
- user experience: self-serve
- our experience: build, deployment, monitoring and alerting
KConnect Background
• A framework included in Kafka since 0.9
• Moves data between Kafka and other systems scalably and reliably.
• Source Connectors import. Sink Connectors export.
• 100+ open-source connectors: S3, MySQL, MongoDB, HDFS, JDBC, Redis, Elasticsearch, Cassandra…
Pic: Confluent.io
How KConnect Works
• Start a KConnect cluster
• Implement a SinkConnector, or choose an existing one
• Create a connector with N tasks (sized to your topic volume)
• Manage the connector through the KConnect REST API and JMX counters
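As a rough illustration of the "create a connector" step, the sketch below builds the `POST /connectors` request that the Kafka Connect REST API accepts. The worker address and the connector name are assumptions made up for this example; the config keys are taken from the sample config later in this deck. The request is only built, not sent.

```python
import json
import urllib.request


def build_create_connector_request(base_url, name, config):
    """Build (but do not send) a POST /connectors request for the Connect REST API."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_connector_request(
    "http://localhost:8083",  # assumed worker address (8083 is the default REST port)
    "my-http-sink",           # hypothetical connector name
    {
        "connector.class": "com.microsoft.siphon.sink.HttpConnector",
        "topics": "my_topic",
        "tasks.max": "5",
    },
)
# To submit it against a running worker: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before it reaches a real cluster.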
How KConnect Works
Under the hood…
• A KConnect cluster consists of “Worker” processes
• KConnect still uses the traditional high-level consumer (HLC) model
• A “Connector” is a “Consumer-group”
• A “Task” is a “Consumer” object
• Performance and rebalance rules are the same in KConnect
• The master Worker assigns all the tasks in the cluster evenly across the Workers
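The "assign tasks evenly" idea can be pictured with a small round-robin sketch. This is an illustration only, not Kafka Connect's actual assignment code; the worker and connector names are hypothetical.

```python
from collections import defaultdict


def assign_tasks(workers, connectors):
    """Spread every task of every connector over the workers in round-robin order.

    `connectors` maps a connector name to its task count (its tasks.max).
    """
    assignment = defaultdict(list)
    slot = 0
    for name in sorted(connectors):
        for task_id in range(connectors[name]):
            worker = workers[slot % len(workers)]
            assignment[worker].append(f"{name}-{task_id}")
            slot += 1
    return dict(assignment)


plan = assign_tasks(
    ["worker-0", "worker-1", "worker-2"],  # hypothetical worker ids
    {"http-sink": 5, "s3-sink": 2},        # hypothetical connectors and task counts
)
```

With seven tasks and three workers, no worker ends up with more than one task above any other.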
Siphon Background
We are the connector team at Microsoft:
• host 1000+ connectors for 100+ customers
• 24 x 7, SLA in 30s.
• 50 clusters, peak volume 20 GB/s
• don’t do much processing
• currently internal, aim to be public in future
Our Problems before KConnect…
• Our own C# client is based on Protocol 0.8, which needs to be upgraded
• Our other services run on the “Java + Docker + Linux” platform
• We need to implement a new uploader for every new destination
• Uneven consumer partition assignment:
[Figure: topics T0 (partitions P0-P4), T1 (P0-P2) and T2 (P0-P6) consumed by consumers C0-C4]
Consumer  Assigned Partitions
C0        T0P0 T1P0 T2P0 T2P5
C1        T0P1 T1P1 T2P1 T2P6
C2        T0P2 T1P2 T2P2
C3        T0P3 T2P3
C4        T0P4 T2P4
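The assignment above is what you get if partition i of every topic goes to consumer i mod 5. That rule is an assumption inferred from the table, not a description of the actual client code, but the sketch below reproduces the table and shows why C0 and C1 end up with twice the partitions of C3 and C4.

```python
def assign_partitions(topics, consumers):
    """Give partition i of every topic to consumer i mod len(consumers)."""
    assignment = {consumer: [] for consumer in consumers}
    for topic, partition_count in topics.items():
        for partition in range(partition_count):
            owner = consumers[partition % len(consumers)]
            assignment[owner].append(f"{topic}P{partition}")
    return assignment


topics = {"T0": 5, "T1": 3, "T2": 7}  # partition counts taken from the table
consumers = ["C0", "C1", "C2", "C3", "C4"]
assignment = assign_partitions(topics, consumers)
for consumer, partitions in assignment.items():
    print(consumer, " ".join(partitions))
```

Because every topic starts counting again at C0, the low-numbered consumers pick up the extra partitions of every topic whose partition count is not a multiple of the consumer count.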
Our Improvement - Encryption
[Figure: data -> Kafka (Disk) -> data]
Our Improvement - Encryption
[Figure: Producer (Encryption) -> encrypted data -> Kafka (Disk) -> encrypted data -> Consumer (Decryption)]
Our Improvement - Auditing
Siphon uses AuditTrail to monitor data loss for important customers.
Our Improvement - Logging
KConnect writes all logs to a single file, which is hard to search when you have tens of connectors.
Our Improvement - Logging
We separated connector-level logs, added more dimensions, and published them to Kusto.
Our Improvement - Metrics
KConnect version 1.0+ provides detailed built-in metrics.
• Framework-level Metrics
- Connect-metrics
- Connect-node-metrics
- Connect-coordinator-metrics
- Connect-worker-metrics
- Connect-worker-rebalance-metrics
• Connector-level Metrics
- Connector-metrics
- Connector-task-metrics
• Consumer-level Metrics
- Consumer-metrics
- Consumer-node-metrics
- Consumer-coordinator-metrics
- Consumer-fetch-manager-metrics
Our Improvement - Metrics
[Figure: message timeline]
• Events: Customer Original Message Creation, Send to Producer, Received by Consumer, Upload to Destination, AuditTrail Finish
• Latencies: pickup latency; SDK to Connect latency; parse, process, adapt, upload latency; Siphon latency; Auditing latency; E2E latency
At Siphon we care a lot about latency. Other metrics include lag, data volume, etc.
Our Improvement - Pipeline
{
"topics": "my_topic",
"tasks.max": "5",
"connector.class": "com.microsoft.siphon.sink.HttpConnector",
"http.connection.string": "http://abc.xyz"
}
Our Improvement - Pipeline
Kafka -> Parse -> Process -> Adapt -> Upload -> Destination
• Parse: convert the source format to SiphonRecord
• Process: processor chain (filtering, sampling, repartitioning, buffering, etc.)
• Adapt: convert SiphonRecord to the destination format
• Upload: final uploading step
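The four stages can be sketched as plain functions chained together. This is a minimal illustration under assumptions: a dict stands in for SiphonRecord, JSON stands in for both the source and destination formats, and all function names are hypothetical rather than Siphon's real classes.

```python
import json


def parse(raw):
    """Parse: turn a raw JSON message into a record (a dict stands in for SiphonRecord)."""
    return json.loads(raw)


def filtering(records, field):
    """Processor: keep only records that carry the given field."""
    return [record for record in records if field in record]


def sampling(records, every):
    """Processor: keep every n-th record, starting with the first."""
    return records[::every]


def adapt(record):
    """Adapt: serialize the record into the destination format (compact JSON here)."""
    return json.dumps(record, separators=(",", ":"), sort_keys=True)


def run_pipeline(raw_messages, processors, upload):
    """Run parse -> processor chain -> adapt -> upload over a batch of messages."""
    records = [parse(message) for message in raw_messages]
    for processor in processors:
        records = processor(records)
    for record in records:
        upload(adapt(record))


uploaded = []  # stands in for the destination
run_pipeline(
    ['{"AAA": 1}', '{"BBB": 2}', '{"AAA": 3}', '{"AAA": 4}'],
    processors=[lambda rs: filtering(rs, "AAA"), lambda rs: sampling(rs, 2)],
    upload=uploaded.append,
)
```

Each processor takes and returns a list of records, so processors can be added, removed, or reordered without touching the parse, adapt, or upload steps.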
Our Improvement - Pipeline
{
"topics": "my_topic",
"tasks.max": "5",
"siphon.decryption.enabled": "true",
"siphon.parser.type": "EvrParser",
"siphon.parser.extract.fields": "true",
"siphon.parser.forward.original": "true",
"siphon.processors": "proc1,proc2",
"siphon.processors.proc1.type": "FilteringProcessor",
"siphon.processors.proc1.filter.condition": "exists(\"AAA\")",
"siphon.processors.proc2.type": "BufferingProcessor",
"siphon.processors.proc2.buffer.size": "1024000",
"siphon.adapter.type": "PassthroughAdapter",
"connector.class": "com.microsoft.siphon.sink.HttpConnector",
"http.connection.string": "http://abc.xyz",
"http.certificate.name": "siphonkat"
}
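One way to read the flat `siphon.processors.*` keys is to split the processor list and collect each processor's settings by key prefix. The sketch below is an assumption about how such a config could be grouped, not Siphon's actual loader; `processor_chain` is a hypothetical helper name.

```python
def processor_chain(config):
    """Return [(processor_name, settings)] in the order listed in siphon.processors."""
    listed = config.get("siphon.processors", "")
    names = [name.strip() for name in listed.split(",") if name.strip()]
    chain = []
    for name in names:
        prefix = f"siphon.processors.{name}."
        # Strip the prefix so each processor only sees its own settings.
        settings = {
            key[len(prefix):]: value
            for key, value in config.items()
            if key.startswith(prefix)
        }
        chain.append((name, settings))
    return chain


config = {
    "siphon.processors": "proc1,proc2",
    "siphon.processors.proc1.type": "FilteringProcessor",
    "siphon.processors.proc1.filter.condition": "exists(\"AAA\")",
    "siphon.processors.proc2.type": "BufferingProcessor",
    "siphon.processors.proc2.buffer.size": "1024000",
}
chain = processor_chain(config)
```

The order of `siphon.processors` decides the order of the chain, so swapping `proc1` and `proc2` in that one value would buffer before filtering.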
Integration
Built on the Kafka Connect Framework:
• com.microsoft.siphon.sink.HttpConnector: used by Siphon
• SiphonSinkConnector: Decryption, Auditing, Logging, Internal Metrics; Parser, Processor, Adapter
• Uploader: com.microsoft.connectors.sink.HttpConnector: used by 3rd parties
Keep connectors public.
Keep our business internal.
A pic of Siphon
[Figure: Siphon architecture]
• Data path: Rest Proxy (EventServer) for push data / pull data, Kafka, Kafka Connect, destinations (Cosmos, S3, MySQL)
• Configuration: Siphon UI, Self-Serve API, DB, Sync-Services (new topic config, new connector config)
• Monitoring: Monitoring Services, Monitoring System (Jarvis)
• Tenant, Topic and Subscription Management
KConnect Operation – User experience
At Siphon, we support self-serve connector management.
KConnect Operation – User experience
Customers can monitor their topics and subscriptions through the UI.
KConnect Operation - Build
We use Visual Studio Team Services for code management and the build pipeline.
KConnect Operation - Deployment
We use Azure Container Registries for Docker image storage and version control.
KConnect Operation - Deployment
We use ExpressV2 for KConnect cluster deployment / upgrade.
KConnect Operation - Deployment
Most Siphon services run in Docker containers, with Mesos for orchestration.
KConnect Operation - Monitoring
We use Jarvis as our monitoring system for cluster health.
Thank you

More Related Content

What's hot

Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013Jun Park
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancingconfluent
 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...HostedbyConfluent
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Fieldconfluent
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...confluent
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!makker_nl
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David AndersonVerverica
 
Modular Architectures using Micro Services
Modular Architectures using Micro ServicesModular Architectures using Micro Services
Modular Architectures using Micro ServicesMarcel Offermans
 
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyA Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyHostedbyConfluent
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionJoel Koshy
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...HostedbyConfluent
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafkaconfluent
 
Moving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco RepositoryMoving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco RepositoryJeff Potts
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your streamEnno Runne
 
OSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga TeamOSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga TeamNETWAYS
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Pulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarniPulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarniStreamNative
 
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010Adrian Trenaman
 
How to Reduce Database Load with Sparse Branches
How to Reduce Database Load with Sparse BranchesHow to Reduce Database Load with Sparse Branches
How to Reduce Database Load with Sparse BranchesPerforce
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIALa Cuisine du Web
 

What's hot (20)

Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
The Good, The Bad, and The Avro (Graham Stirling, Saxo Bank and David Navalho...
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 
Modular Architectures using Micro Services
Modular Architectures using Micro ServicesModular Architectures using Micro Services
Modular Architectures using Micro Services
 
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan StanleyA Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
A Modern C++ Kafka API | Kenneth Jia, Morgan Stanley
 
Troubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolutionTroubleshooting Kafka's socket server: from incident to resolution
Troubleshooting Kafka's socket server: from incident to resolution
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Moving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco RepositoryMoving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco Repository
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your stream
 
OSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga TeamOSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga Team
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Pulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarniPulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarni
 
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010
 
How to Reduce Database Load with Sparse Branches
How to Reduce Database Load with Sparse BranchesHow to Reduce Database Load with Sparse Branches
How to Reduce Database Load with Sparse Branches
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
 

Similar to Ren cao kafka connect

The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...Josef Adersberger
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...QAware GmbH
 
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...confluent
 
Laying the Foundation for Ionic Platform Insights on Spark
Laying the Foundation for Ionic Platform Insights on SparkLaying the Foundation for Ionic Platform Insights on Spark
Laying the Foundation for Ionic Platform Insights on SparkIonic Security
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams APIconfluent
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Cloud: From Unmanned Data Center to Algorithmic Economy using Openstack
Cloud: From Unmanned Data Center to Algorithmic Economy using OpenstackCloud: From Unmanned Data Center to Algorithmic Economy using Openstack
Cloud: From Unmanned Data Center to Algorithmic Economy using OpenstackAndrew Yongjoon Kong
 
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Mike Broberg
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.Renzo Tomà
 
Modern application development with oracle cloud sangam17
Modern application development with oracle cloud sangam17Modern application development with oracle cloud sangam17
Modern application development with oracle cloud sangam17Vinay Kumar
 
Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)Jose Ulises Nino Rivera
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 
Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0Steffen Gebert
 
Matrix.org decentralised communication, Matthew Hodgson, TADSummit
Matrix.org decentralised communication, Matthew Hodgson, TADSummitMatrix.org decentralised communication, Matthew Hodgson, TADSummit
Matrix.org decentralised communication, Matthew Hodgson, TADSummitAlan Quayle
 
[DLHacks]Introduction to ChainerCV
[DLHacks]Introduction to ChainerCV[DLHacks]Introduction to ChainerCV
[DLHacks]Introduction to ChainerCVDeep Learning JP
 
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...Amazon Web Services
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksSenturus
 
Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...
Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...
Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...Patrick Chanezon
 
The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...Lucas Jellema
 

Similar to Ren cao kafka connect (20)

The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
 
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
 
Laying the Foundation for Ionic Platform Insights on Spark
Laying the Foundation for Ionic Platform Insights on SparkLaying the Foundation for Ionic Platform Insights on Spark
Laying the Foundation for Ionic Platform Insights on Spark
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Cloud: From Unmanned Data Center to Algorithmic Economy using Openstack
Cloud: From Unmanned Data Center to Algorithmic Economy using OpenstackCloud: From Unmanned Data Center to Algorithmic Economy using Openstack
Cloud: From Unmanned Data Center to Algorithmic Economy using Openstack
 
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 
Modern application development with oracle cloud sangam17
Modern application development with oracle cloud sangam17Modern application development with oracle cloud sangam17
Modern application development with oracle cloud sangam17
 
Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)Envoy @ Lyft: developer productivity (kubecon 2.0)
Envoy @ Lyft: developer productivity (kubecon 2.0)
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 
Envoy @ Lyft: Developer Productivity
Envoy @ Lyft: Developer ProductivityEnvoy @ Lyft: Developer Productivity
Envoy @ Lyft: Developer Productivity
 
Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0
 
Matrix.org decentralised communication, Matthew Hodgson, TADSummit
Matrix.org decentralised communication, Matthew Hodgson, TADSummitMatrix.org decentralised communication, Matthew Hodgson, TADSummit
Matrix.org decentralised communication, Matthew Hodgson, TADSummit
 
[DLHacks]Introduction to ChainerCV
[DLHacks]Introduction to ChainerCV[DLHacks]Introduction to ChainerCV
[DLHacks]Introduction to ChainerCV
 
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & Tricks
 
Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...
Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...
Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...
 
The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...The Application Server Platform of the Future - Container & Cloud Native and ...
The Application Server Platform of the Future - Container & Cloud Native and ...
 

More from Nitin Kumar

Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafkaNitin Kumar
 
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehnerNitin Kumar
 
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...Nitin Kumar
 
Processing trillions of events per day with apache
Processing trillions of events per day with apacheProcessing trillions of events per day with apache
Processing trillions of events per day with apacheNitin Kumar
 
Insta clustr seattle kafka meetup presentation bb
Insta clustr seattle kafka meetup presentation   bbInsta clustr seattle kafka meetup presentation   bb
Insta clustr seattle kafka meetup presentation bbNitin Kumar
 
EventHub for kafka ecosystems kafka meetup
EventHub for kafka ecosystems   kafka meetupEventHub for kafka ecosystems   kafka meetup
EventHub for kafka ecosystems kafka meetupNitin Kumar
 
Microsoft challenges of a multi tenant kafka service
Microsoft challenges of a multi tenant kafka serviceMicrosoft challenges of a multi tenant kafka service
Microsoft challenges of a multi tenant kafka serviceNitin Kumar
 
Net flix kafka seattle meetup
Net flix kafka seattle meetupNet flix kafka seattle meetup
Net flix kafka seattle meetupNitin Kumar
 
Brandon obrien streaming_data
Brandon obrien streaming_dataBrandon obrien streaming_data
Brandon obrien streaming_dataNitin Kumar
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Nitin Kumar
 
Microsoft kafka load imbalance
Microsoft   kafka load imbalanceMicrosoft   kafka load imbalance
Microsoft kafka load imbalanceNitin Kumar
 
Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016Nitin Kumar
 
Linked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafkaLinked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafkaNitin Kumar
 
Seattle kafka meetup nov 2015 published siphon
Seattle kafka meetup nov 2015 published  siphonSeattle kafka meetup nov 2015 published  siphon
Seattle kafka meetup nov 2015 published siphonNitin Kumar
 

More from Nitin Kumar (16)

Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
 
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
 
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
 
Processing trillions of events per day with apache
Processing trillions of events per day with apacheProcessing trillions of events per day with apache
Processing trillions of events per day with apache
 
Insta clustr seattle kafka meetup presentation bb
Insta clustr seattle kafka meetup presentation   bbInsta clustr seattle kafka meetup presentation   bb
Insta clustr seattle kafka meetup presentation bb
 
EventHub for kafka ecosystems kafka meetup
EventHub for kafka ecosystems   kafka meetupEventHub for kafka ecosystems   kafka meetup
EventHub for kafka ecosystems kafka meetup
 
Kafka eos
Kafka eosKafka eos
Kafka eos
 
Microsoft challenges of a multi tenant kafka service
Microsoft challenges of a multi tenant kafka serviceMicrosoft challenges of a multi tenant kafka service
Microsoft challenges of a multi tenant kafka service
 
Net flix kafka seattle meetup
Net flix kafka seattle meetupNet flix kafka seattle meetup
Net flix kafka seattle meetup
 
Avvo fkafka
Avvo fkafkaAvvo fkafka
Avvo fkafka
 
Brandon obrien streaming_data
Brandon obrien streaming_dataBrandon obrien streaming_data
Brandon obrien streaming_data
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
 
Microsoft kafka load imbalance
Microsoft   kafka load imbalanceMicrosoft   kafka load imbalance
Microsoft kafka load imbalance
 
Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016
 
Linked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafkaLinked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafka
 
Seattle kafka meetup nov 2015 published siphon
Seattle kafka meetup nov 2015 published  siphonSeattle kafka meetup nov 2015 published  siphon
Seattle kafka meetup nov 2015 published siphon
 

Ren cao kafka connect

  • 1. July 12, 2018 Ren Cao, Siphon Team, Microsoft Kafka Connect in a Docker Container at Scale
  • 2. Content • Why did we choose KConnect? - Some background - Our problems before KConnect • Why is KConnect not good enough for us? - Encryption, Auditing, Logging, Metrics and Pipeline - Integration • How do we operate KConnect at Siphon? - A picture of Siphon - User experience: self-serve - Our experience: build, deployment, monitoring and alerting
  • 4. KConnect Background • A framework included in Kafka since 0.9 • Moves data between Kafka and other systems scalably and reliably • Source Connectors import. Sink Connectors export. • 100+ open-source connectors: S3, MySQL, MongoDB, HDFS, JDBC, Redis, Elasticsearch, Cassandra… Pic: Confluent.io
  • 5. How KConnect Works • Start a KConnect cluster • Implement / Choose an existing SinkConnector • Create a connector of N tasks (based on your topic volume) • Manage the connector through KConnect REST API and JMX counters
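The last step above, managing a connector through the KConnect REST API, might look roughly like the following. This is a minimal sketch, assuming a Connect worker reachable at http://localhost:8083; the connector name `my-http-sink` and the helper function names are hypothetical. It only builds the requests; sending them needs a running cluster.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumption: address of a Connect worker


def build_create_request(name, config, base_url=CONNECT_URL):
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(name, base_url=CONNECT_URL):
    """Build a GET /connectors/{name}/status request."""
    return urllib.request.Request(
        url=f"{base_url}/connectors/{name}/status", method="GET"
    )


request = build_create_request(
    "my-http-sink",  # hypothetical connector name
    {
        "connector.class": "com.microsoft.siphon.sink.HttpConnector",
        "topics": "my_topic",
        "tasks.max": "5",
    },
)
# To submit against a live cluster: urllib.request.urlopen(request)
```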
  • 6. How KConnect Works Under the hood… • A KConnect cluster consists of “Worker” processes • KConnect still uses the traditional high-level consumer (HLC) model • A “Connector” is a “Consumer-group” • A “Task” is a “Consumer” object • Performance and rebalance rules are the same in KConnect • The master Worker assigns all the tasks in the cluster evenly across the Workers
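The idea of spreading all tasks in the cluster evenly over the Workers can be illustrated with a simple round-robin. This is only a sketch of the principle, not the assignment code Kafka Connect actually runs; the connector and worker names are made up.

```python
def assign_tasks(connectors, workers):
    """Spread every task of every connector round-robin over the workers."""
    # Flatten all connectors into one cluster-wide task list first,
    # so the round-robin does not restart for each connector.
    tasks = [f"{name}-{i}" for name, count in connectors.items() for i in range(count)]
    assignment = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        assignment[workers[index % len(workers)]].append(task)
    return assignment


connectors = {"conn-a": 5, "conn-b": 3, "conn-c": 7}  # hypothetical tasks.max values
workers = ["worker-0", "worker-1", "worker-2", "worker-3", "worker-4"]
assignment = assign_tasks(connectors, workers)
for worker, tasks in assignment.items():
    print(worker, len(tasks), tasks)
```

Because the tasks are pooled across connectors before being dealt out, the per-worker counts differ by at most one.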
  • 7. Siphon Background We are the connector team at Microsoft: • host 1000+ connectors for 100+ customers • 24 x 7, SLA in 30s • 50 clusters, peak volume 20 GB/s • don’t do much processing • currently internal, aim to be public in the future
  • 8. Our Problems before KConnect… • Our own C# client is based on Protocol 0.8, which needs an upgrade • Our other services are on the “Java + Docker + Linux” platform • We need to implement a new uploader for every new destination • Uneven consumer partition assignment. Example: topics T0 (partitions P0-P4), T1 (P0-P2) and T2 (P0-P6), consumed by consumers C0-C4. Consumer: Assigned Partitions. C0: T0P0, T1P0, T2P0, T2P5. C1: T0P1, T1P1, T2P1, T2P6. C2: T0P2, T1P2, T2P2. C3: T0P3, T2P3. C4: T0P4, T2P4.
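The uneven assignment on this slide can be reproduced with a short sketch. It assumes the old client handed out each topic's partitions round-robin and restarted at the first consumer for every topic; that rule is inferred from the slide's table, not stated by the author.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Assign each topic's partitions round-robin, restarting at the
    first consumer for every topic (rule inferred from the slide's table)."""
    assignment = defaultdict(list)
    for topic, partition_count in topics.items():
        for partition in range(partition_count):
            consumer = consumers[partition % len(consumers)]
            assignment[consumer].append(f"{topic}P{partition}")
    return dict(assignment)


topics = {"T0": 5, "T1": 3, "T2": 7}  # partition counts read off the slide
consumers = ["C0", "C1", "C2", "C3", "C4"]
assignment = assign_per_topic(topics, consumers)
for consumer in consumers:
    print(consumer, assignment[consumer])
```

C0 and C1 end up with four partitions each while C3 and C4 get two, because low-numbered consumers are favoured again on every topic.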
  • 10. Our Improvement - Encryption [Diagram, before: data -> Kafka -> Disk, data stored as-is]
  • 11. Our Improvement - Encryption [Diagram, after: Producer (Encryption) -> encrypted data -> Kafka / Disk -> encrypted data -> Consumer (Decryption)]
  • 12. Our Improvement - Auditing Siphon uses AuditTrail to monitor data loss for important customers.
  • 13. Our Improvement - Logging KConnect writes all logs to a single file, which is hard to search when you have tens of connectors.
  • 14. Our Improvement - Logging We separated connector-level logs, added more dimensions, and published them to Kusto.
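The idea of connector-level logs with extra dimensions can be sketched with Python's standard `logging` module. This is an analogy only: KConnect itself is a Java framework, the dimension names `connector` and `task` are assumptions, and publishing to Kusto is not shown.

```python
import io
import logging


def make_connector_logger(connector, task_id, stream):
    """Return a logger whose every record carries connector/task dimensions."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s connector=%(connector)s task=%(task)s %(message)s"))
    logger = logging.getLogger(f"connect.{connector}.{task_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these records out of the shared root log
    logger.addHandler(handler)
    # The adapter injects the dimensions into each record it emits.
    return logging.LoggerAdapter(logger, {"connector": connector, "task": task_id})


stream = io.StringIO()  # stand-in for a per-connector file or a log shipper
log = make_connector_logger("my-http-sink", 0, stream)
log.info("uploaded batch of %d records", 128)
print(stream.getvalue(), end="")
```

With one handler per connector, each connector's records can go to their own destination, and the added fields make them searchable by connector and task.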
  • 15. Our Improvement - Metrics KConnect version 1.0+ provides detailed built-in metrics. Framework-level Metrics: Connect-metrics, Connect-node-metrics, Connect-coordinator-metrics, Connect-worker-metrics, Connect-worker-rebalance-metrics. Connector-level Metrics: Connector-metrics, Connector-task-metrics. Consumer-level Metrics: Consumer-metrics, Consumer-node-metrics, Consumer-coordinator-metrics, Consumer-fetch-manager-metrics.
  • 16. Our Improvement - Metrics At Siphon we care about latency a lot. [Timeline diagram. Stages: Customer Original Message Creation -> Send to Producer -> Received by Consumer -> Upload to Destination -> AuditTrail Finish. Latencies shown: pickup latency; SDK to Connect latency; parse, process, adapt, upload latency; Siphon latency; Auditing latency; E2E latency.] Other metrics include lag, data volume, etc.
  • 17. Our Improvement - Pipeline { "topics": "my_topic", "tasks.max": "5", "connector.class": "com.microsoft.siphon.sink.HttpConnector", "http.connection.string": "http://abc.xyz" }
  • 18. Our Improvement - Pipeline Kafka -> Parse -> Process -> Adapt -> Upload -> Destination • Parse: transfer source format to SiphonRecord • Process: processor chain (filtering, sampling, repartitioning, buffering, etc.) • Adapt: transfer SiphonRecord to destination format • Upload: final uploading step
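The four pipeline stages can be sketched as plain functions chained together. This is a toy illustration, not Siphon's implementation: JSON stands in for the source and destination formats, a dict stands in for SiphonRecord, and all function names are hypothetical.

```python
import json


def parse(raw):
    """Parse: source format (JSON here, an assumption) -> record dict."""
    return json.loads(raw)


def filtering_processor(records, field):
    """Process: keep only records that contain the given field."""
    return [record for record in records if field in record]


def adapt(record):
    """Adapt: record -> destination format (compact JSON bytes here)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def run_pipeline(raw_messages, upload):
    """Run parse -> process -> adapt -> upload over a batch of messages."""
    records = [parse(message) for message in raw_messages]
    records = filtering_processor(records, "AAA")
    for record in records:
        upload(adapt(record))  # Upload: final step, injected by the caller


uploaded = []  # stand-in for a real destination
run_pipeline(['{"AAA": 1}', '{"BBB": 2}'], uploaded.append)
print(uploaded)
```

Passing the upload step in as a callable keeps the destination-specific part separate from the shared parse/process/adapt stages.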
  • 19. Our Improvement - Pipeline { "topics": "my_topic", "tasks.max": "5", "siphon.decryption.enabled": "true", "siphon.parser.type": "EvrParser", "siphon.parser.extract.fields": "true", "siphon.parser.forward.original": "true", "siphon.processors": "proc1,proc2", "siphon.processors.proc1.type": "FilteringProcessor", "siphon.processors.proc1.filter.condition": "exists(\"AAA\")", "siphon.processors.proc2.type": "BufferingProcessor", "siphon.processors.proc2.buffer.size": "1024000", "siphon.adapter.type": "PassthroughAdapter", "connector.class": "com.microsoft.siphon.sink.HttpConnector", "http.connection.string": "http://abc.xyz", "http.certificate.name": "siphonkat" }
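One way such a flat config could be turned into an ordered processor chain is sketched below. The key names come from the slide; the parsing logic and the `processor_chain` helper are assumptions for illustration, not Siphon's code.

```python
def processor_chain(config):
    """Return [(name, type, settings)] in the order listed in siphon.processors."""
    chain = []
    for name in config.get("siphon.processors", "").split(","):
        name = name.strip()
        if not name:
            continue
        prefix = f"siphon.processors.{name}."
        # Collect every key under this processor's prefix, minus the prefix.
        settings = {
            key[len(prefix):]: value
            for key, value in config.items()
            if key.startswith(prefix)
        }
        chain.append((name, settings.pop("type"), settings))
    return chain


config = {
    "siphon.processors": "proc1,proc2",
    "siphon.processors.proc1.type": "FilteringProcessor",
    "siphon.processors.proc1.filter.condition": 'exists("AAA")',
    "siphon.processors.proc2.type": "BufferingProcessor",
    "siphon.processors.proc2.buffer.size": "1024000",
}
chain = processor_chain(config)
for name, processor_type, settings in chain:
    print(name, processor_type, settings)
```

The order of names in `siphon.processors` decides the order of the chain, so here filtering runs before buffering.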
  • 20. Integration Keep connectors public. Keep our business internal. [Layer diagram on top of the Kafka Connect Framework: com.microsoft.siphon.sink.HttpConnector (used by Siphon); SiphonSinkConnector (Decryption, Auditing, Logging, Internal Metrics; Parser, Processor, Adapter); Uploader: com.microsoft.connectors.sink.HttpConnector (used by 3rd parties)]
  • 22. A pic of Siphon [Architecture diagram. Data path: Rest Proxy (EventServer) (push data, pull data) -> Kafka -> Kafka Connect -> Cosmos, S3, MySQL. Control path: Siphon UI, Self-Serve API and DB (Tenant, Topic and Subscription Management); Sync-Services (new topic config, new connector config). Monitoring: Monitoring Services, Monitoring System (Jarvis).]
  • 23. KConnect Operation – User experience At Siphon, we support self-serve connector management.
  • 24. KConnect Operation – User experience Customers can monitor their topics/subscriptions through the UI.
  • 25. KConnect Operation - Build We use Visual Studio Team Services for code management and the build pipeline.
  • 26. KConnect Operation - Deployment We use Azure Container Registry for Docker image storage and version control.
  • 27. KConnect Operation - Deployment We use ExpressV2 for KConnect cluster deployment / upgrade.
  • 28. KConnect Operation - Deployment Most Siphon services run in Docker, with Mesos for orchestration.
  • 29. KConnect Operation - Monitoring We use Jarvis as our monitoring system for cluster health.
  • 31. July 12, 2018 Ren Cao, Siphon Team, Microsoft Thank you