Building Kafka Connectors with Kotlin
A Step-by-Step Guide to Creation and Deployment
By Sami Alashabi and Ramzi Alashabi
2
Building Kafka
Connectors with
Kotlin
A Step-by-Step Guide to Creation
and Deployment
Sami Alashabi, Solutions Architect, Accenture/Essent
Ramzi Alashabi, Senior Data Engineer, ABN Amro
3
Sami Alashabi
12+ Year Journey in Data
Various Roles and Segments
Architecture, Big Data, Real-Time
Low Latency Distributed Systems,
AWS
Love to solve problems
Love spending time with family
when I’m not coding/architecting
Kafka Enthusiast
https://www.linkedin.com/in/sami-alashabi/
4
Ramzi Alashabi
10+ Years Data Specialist
Micro-services, ETLs, and Cloud
Engineering
Transform ideas to Production
Love learning new Languages &
hanging out with the fam.
Yes, I'm a Dog Person
https://www.linkedin.com/in/ramzialashabi/
5
Agenda
01 Kafka Connect: Overview, Architecture, Types & Concepts
02 Kotlin: Introduction, Background, Features & Advantages
03 Implementation & Code: Building a Source Connector, Test & Deployment Strategies
04 Key Learnings: Summary & Takeaways
05 Q&A: Questions & Follow Up
6
Kafka
7
Kafka Connector
8-20
Connect: Start Kafka Connector (diagram-only slide sequence illustrating how a Kafka connector starts)
21
Connect: Standalone vs Distributed
Standalone
● Ideal for development & testing
● Tasks executed in a single process
● Configuration in a properties file
● No fault tolerance: if the process fails, all tasks stop
● No automatic scalability: to scale up, you need to manually start more standalone processes
Distributed
● Ideal for large production deployments
● Tasks are distributed across multiple worker nodes
● Configuration stored in Kafka, which allows dynamic updates
● Fault tolerance: tasks are automatically redistributed
● Automatic scalability (elastic): more worker processes can be added to scale up
22
curl --location 'http://kafkaConnect:8083/connectors' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *******************' \
--data '{
"name": "GitlabSourceConnector-merge-requests",
"config": {
"name": "GitlabSourceConnector-merge-requests",
"connector.class":
"com.sami12rom.kafka.gitlab.GitlabSourceConnector",
"gitlab.repositories":
"kafka/confluent_kafka_connect_aws_terraform",
"gitlab.service.url": "https://gitlab.compny.nl/api/v4/",
"gitlab.resources": "merge_requests",
"gitlab.since": "2023-12-10T20:12:59.300Z",
"gitlab.access.token": "*****************",
"max.poll.interval.ms": "40000",
"topic.name.pattern": "gitlab-merge-requests",
"tasks.max": 1,
...
}
}'
Distributed Mode: Offsets & Config & Status
23
Single Message Transform
A Single Message Transform (SMT) is a way to modify individual messages as they flow through the Kafka Connect pipeline, e.g.
● ReplaceField:
org.apache.kafka.connect.transforms.ReplaceField$Key
● MaskField:
org.apache.kafka.connect.transforms.MaskField$Value
● InsertField:
org.apache.kafka.connect.transforms.InsertField$Value
"config": {
...
"transforms":"flatten,createKey",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": "_",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id,iid,project_id"
}
24
Data formats can be chosen depending on the specific requirements of your application:
● ProtobufConverter: when you need to optimize for speed and size - io.confluent.connect.protobuf.ProtobufConverter
● JsonSchemaConverter: when you want a human-readable format and are working with RESTful APIs - io.confluent.connect.json.JsonSchemaConverter
● AvroConverter: easiest for schema evolution - io.confluent.connect.avro.AvroConverter
● JsonConverter: when you want a human-readable format and don't need a schema - org.apache.kafka.connect.json.JsonConverter
Converters & Data Formats
"config": {
...
"key.converter":"io.confluent.connect.json.JsonSchemaConverter",
"key.converter.schema.registry.url":"http://schema-registry:8081",
"value.converter":"io.confluent.connect.json.JsonSchemaConverter",
"value.converter.schema.registry.url":"http://schema-registry:8081"
}
25
Kotlin
Background, Features
and Advantages
26
Introduction
Kotlin is a modern, statically typed programming
language that mainly targets the Java Virtual
Machine (JVM)
● It was first introduced by JetBrains in 2011.
● In 2019, Google announced Kotlin as an
official language for Android development.
● Growing Community of Developers.
27
Features
& Advantages
val message = "Hello, World!" // Type inference
if (message is String) { // Smart cast
println(message.length)} // Allows accessing String-specific funcs
// Using default arguments
fun greet(name: String = "John Doe", message: String = "Hello") {
println("$message, $name!")}
greet()
// Safe Calls (?.): Execute only when the value is not null
val name: String? = null
val length: Int? = name?.length
// Elvis Operator (?:): Use value if not null, otherwise use default
val name: String? = null
val length = name?.length ?: -1
// Not-null assertion (!!): Use when sure the value is not null
val name: String? = null
val length = name!!.length
// Higher-order function that takes a function as a parameter
fun calculate(x: Int, y: Int, operation: (Int, Int) -> Int): Int {
return operation(x, y)}
// Using lambda expression
val result = calculate(5, 3) { a, b -> a + b }
Concise Syntax
Reduces boilerplate which
allows writing clean, compact &
more readable code e.g.
● Type inference
● Smart casts
● Default arguments
Safe & Reliable
Built-in null safety features,
eliminating the infamous
NullPointerException errors
using
● safe calls (?.)
● the Elvis operator (?:)
● non-null assertion (!!)
Interoperability
It is fully compatible with Java,
which means you can seamlessly
use Kotlin code in Java projects
and vice versa.
Functional
Programming support
It embraces functional
programming and offers features
like higher-order & first-class
functions, lambda expressions,
functional utilities such as map,
filter, and reduce.
Implementation
& Code
Building A Source
Connector
29
build.gradle.kts
● Plugins: e.g. Java library plugin, the
Kotlin JVM plugin, the Git version plugin,
and the Maven Publish plugin.
● Repositories: specifies where to fetch
dependencies from.
● Dependencies: libraries the project
depends on, including both
implementation and test dependencies
● Tasks: Test, Build, Jar
● Publishing: publish to a Maven
repository
plugins {
`java-library`
kotlin("jvm") version "1.9.22"
id("com.palantir.git-version") version "1.0.0"
`maven-publish`
}
dependencies {
implementation("org.apache.kafka:connect-api:3.4.0”)
implementation("commons-validator:commons-validator:1.7")
testImplementation("org.testcontainers:kafka:1.19.6")
}
Gitlab: Building a
Source Connector
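The build.gradle.kts excerpt above shows only the plugins and dependencies blocks; the Tasks and Publishing items from the bullets are not reproduced on the slide. A minimal sketch of what they could look like, assuming the connector is packaged as a fat jar so Kafka Connect can load it together with its runtime dependencies (the Maven coordinates below are placeholders, not the project's real ones):

// Hypothetical sketch; names and coordinates are illustrative, not taken from the slides.
tasks.test {
    useJUnitPlatform()
}

tasks.jar {
    // Bundle runtime dependencies so the plugin directory contains everything the connector needs
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    from(configurations.runtimeClasspath.get().map { if (it.isDirectory) it else zipTree(it) })
}

publishing {
    publications {
        create<MavenPublication>("maven") {
            groupId = "com.example"             // placeholder coordinates
            artifactId = "kafka-connect-gitlab"
            version = project.version.toString()
            from(components["java"])
        }
    }
}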
30
Source Connector Interface
● GitlabSourceConnector extends from
SourceConnector.
● SourceConnector: part of the Kafka
Connect framework to stream data from
external data systems to Kafka.
● Version: Returns the version of the
connector and is often used for logging
and debugging purposes.
Gitlab: Building a Source Connector

class GitlabSourceConnector: SourceConnector() {
override fun version(): String {
return ConnectorVersionDetails::class.java.`package`.implementationVersion ?:
"1.0.0" }
override fun start(props: Map<String, String>) {}
override fun config(): ConfigDef {}
override fun taskClass(): Class<out Task> {}
override fun taskConfigs(maxTasks: Int):
List<Map<String, String>> {}
override fun stop() {}
}
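The taskClass() override is shown empty here; its only job is to tell Kafka Connect which task implementation to instantiate. A one-line sketch, assuming the GitlabSourceTask class introduced on slide 36:

// Task is org.apache.kafka.connect.connector.Task
override fun taskClass(): Class<out Task> = GitlabSourceTask::class.java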
31
Gitlab: Building a Source Connector

class GitlabSourceConnector: SourceConnector() {
override fun version(): String {}
override fun start(props: Map<String, String>) {
logger.info("Starting GitlabSourceConnector")
this.props = props
}
override fun config(): ConfigDef {}
override fun taskClass(): Class<out Task> {}
override fun taskConfigs(maxTasks: Int): List<Map<String, String>> {}
override fun stop() {
logger.info("Requested connector to stop at ${Instant.now()}")
}
Source Connector Lifecycle
● The start and stop methods are part of
the lifecycle of a Source Connector in
Kafka Connect.
● start(props) is called on initialization
and allows the set up of any resources
the connector needs to run. The props is
a map of configuration settings.
● stop is called when the connector is being shut down and is where it cleans up any resources that were opened or started in the start method.
32
Gitlab: Building a
Source Connector
Source Connector Task
Configuration
● taskConfigs method is used to divide
the work of the connector into smaller,
independent tasks that can be distributed
across multiple workers in a Kafka
Connect cluster, with benefits such as:
○ Parallelism
○ Scalability
○ Fault Isolation
○ Flexibility
override fun taskConfigs(maxTasks: Int): List<Map<String,
String>> {
val taskConfigs = mutableListOf<Map<String, String>>()
val repositories = props[REPOSITORIES].split(", ")
val groups = repositories.size.coerceAtMost(maxTasks)
val reposGrouped =
ConnectorUtils.groupPartitions(repositories, groups)
for (group in reposGrouped) {
val taskProps = mutableMapOf<String, String>()
taskProps.putAll(props?.toMap()!!)
taskProps.replace(REPOSITORIES,
group.joinToString(";"))
taskConfigs.add(taskProps)
}
return taskConfigs
}
Input config:
{"gitlab.repositories": "Repo#1, Repo#2, Repo#3", "tasks.max": 2}
Output config:
[ {gitlab.repositories=Repo#1;Repo#2}, {gitlab.repositories=Repo#3} ]
33
Gitlab: Building a
Source Connector
override fun config(): ConfigDef {}
const val GITLAB_ENDPOINT_CONFIG = "gitlab.service.url"
val CONFIG: ConfigDef = ConfigDef()
.define(
/* name = */ GITLAB_ENDPOINT_CONFIG,
/* type = */ ConfigDef.Type.STRING,
/* defaultValue = */ "https://gitlab.company.nl/api/v4",
/* validator = */ EndpointValidator(),
/* importance = */ ConfigDef.Importance.HIGH,
/* documentation = */ "GitLab API Root Endpoint Ex.
https://gitlab.example.com/api/v4/",
/* group = */ "Settings",
/* orderInGroup = */ -1,
/* width = */ ConfigDef.Width.MEDIUM,
/* displayName = */ "GitLab Endpoint",
/* recommender = */ EndpointRecommender()
)
Source Connector Configuration
● ConfigDef class is used to define the
configuration options the Kafka connector
accepts.
34
Gitlab: Building a
Source Connector
override fun config(): ConfigDef {}
class EndpointValidator : ConfigDef.Validator {
override fun ensureValid(name: String?, value: Any?) {
val url = value as String
val validator = UrlValidator()
if (!validator.isValid(url)) {
throw ConfigException("$url must be a valid URL, use
examples https://gitlab.example.com/api/v4/")
}
}
}
class EndpointRecommender : ConfigDef.Recommender {
override fun validValues(name: String, parsedConfig:
Map<String, Any>): List<String> {
return listOf("https://gitlab.company.nl/api/v4/")
}
override fun visible(name: String?, parsedConfig:
Map<String, Any>?): Boolean {
return true
}
}
Source Connector Configuration
● Enhancing usability and reducing the
likelihood of configuration errors.
● Recommender: Is an instance of
ConfigDef.Recommender that can
suggest values for the configuration
option and make it easier for users to
configure options correctly.
● Validator: Is an instance of
ConfigDef.Validator that is used to
validate the configuration values which
can help catch configuration errors early,
before they cause problems at runtime.
35
Gitlab: Building a Source Connector

val mergedRequest: Schema = SchemaBuilder.struct()
.name("com.sami12rom.mergedRequest")
.version(1).doc("Merged Request Value Schema")
.field("id", SchemaBuilder.int64())
.field("project_id", SchemaBuilder.int64())
.field("title", SchemaBuilder.string()
.optional().defaultValue(null))
.field("description", SchemaBuilder.string()
.optional().defaultValue(null))
.build()
val struct = Struct(Schemas.mergedRequest)
struct.put("id", mergedRequest.id)
struct.put("project_id", mergedRequest.project_id)
struct.put("title", mergedRequest.title)
struct.put("description", mergedRequest.description)
Data Schemas: SchemaBuilder
● Schemas define the structure of the
data in Kafka Connect and specify the
type of each field, whether it's required
or optional, and other properties.
○ Data types e.g. struct, map, array
○ Helps ensure data consistency
● Structs are used to hold the actual data and ensure that it conforms to the schema.
○ Needed for SourceRecord or
SinkRecord.
36
Gitlab: Building a
Source Connector
class GitlabSourceTask : SourceTask() {
override fun start(props: Map<String, String>?) {
initializeSource()
}
override fun poll(): MutableList<SourceRecord> {
val records = mutableListOf<SourceRecord>()
sleepForInterval()
val response = ApiCalls.GitLabCall(props!!)
val record = generateSourceRecord(response as
MergedRequest)
records.add(record)
return records
}
override fun stop() {}
Source Task Class
● poll: is called repeatedly to pull data from the external source into Kafka. It should return a list of SourceRecord objects, or null if there is no data available.
37
Source Record - Part 1
● topic: Name of the topic to write to.
● partition: Partition where the record will be
written, can be null to let Kafka assign it.
● keySchema & key: The schema & key for
this record.
● valueSchema & value: The schema & value
for this record. Value is the actual data that
will be written to the Kafka topic.
● timestamp: The timestamp for this record
and can be null to let Kafka assign the
current time.
● headers: Headers for this record.
Gitlab: Building a Source Connector

val record = SourceRecord(
/* sourcePartition = */ Map (Connector),
/* sourceOffset = */ Map (Connector),
/* topic = */ String,
/* partition = */ Integer (Optional),
/* keySchema = */ Schema (Optional),
/* key = */ Object (Optional),
/* valueSchema = */ Schema (Optional),
/* value = */ Object (Optional),
/* timestamp = */ Long (Optional),
/* headers = */ generateHeaders() (Optional)
)
38
val record = SourceRecord(
/* sourcePartition = */ Map,
/* sourceOffset = */ Map,
...
)
Connector Restart Offset:
override fun start(props: Map<String, String>?) {
initializeSource()
}
fun initializeSource(): Map<String, Any>? {
return context.offsetStorageReader()
.offset(sourcePartition())
}
Gitlab: Building a
Source Connector
Source Record - Part 2
(Restartability)
● sourcePartition: It defines the partition
of the source system that this record
came from, e.g. a table name for a
database connector.
● sourceOffset: It defines the position in
the source partition that this record came
from, e.g. an ID of the row for a database
connector.
● offsetStorageReader: Retrieve the last
stored offset for a specific partition to
resume data ingestion where it last left
off.
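Putting the two parts together, a possible shape for the generateSourceRecord helper called from poll. This is an illustrative sketch only: the map keys, the repositoryName/topicName values and the choice of offset field are assumptions, not taken from the actual connector code.

// Illustrative sketch; partition/offset keys and helper names are assumptions.
private fun generateSourceRecord(mergedRequest: MergedRequest): SourceRecord {
    // Where the record came from and how far we have read, so Connect can resume after a restart
    val sourcePartition = mapOf("repository" to repositoryName)   // hypothetical partition key
    val sourceOffset = mapOf("last_id" to mergedRequest.id)       // hypothetical offset field

    // Struct built against the schema from slide 35
    val struct = Struct(Schemas.mergedRequest)
    struct.put("id", mergedRequest.id)
    struct.put("project_id", mergedRequest.project_id)
    struct.put("title", mergedRequest.title)
    struct.put("description", mergedRequest.description)

    // Simplest SourceRecord constructor; other overloads also accept partition, key schema/key,
    // timestamp and headers, as listed on the previous slide
    return SourceRecord(sourcePartition, sourceOffset, topicName, Schemas.mergedRequest, struct)
}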
39
Testing Strategies
Soak & Error Handling Tests
Run your connector for an extended period under
typical load to identify long-term issues. Write tests to
ensure your connector handles errors gracefully and
recovers from failures.
Unit Tests
Isolated tests for individual functions or methods using
a testing framework like JUnit and a mocking library like
Mockito.
Integration Tests
Test the interaction between all components using
tools like Testcontainers to set up realistic testing
environments.
End to End & Performance Tests
Validate the entire flow from producing a record to the
source system to consuming it from Kafka. Measure the
throughput and latency of your connector under
different loads.
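As a concrete example of the unit-test level, a small JUnit 5 sketch that exercises the taskConfigs splitting shown on slide 32. It assumes the connector can be started with just the gitlab.repositories property, as in that slide's code; the expected output mirrors the slide's example.

import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class GitlabSourceConnectorTest {

    @Test
    fun `taskConfigs splits repositories across tasks`() {
        val connector = GitlabSourceConnector()
        connector.start(mapOf("gitlab.repositories" to "Repo#1, Repo#2, Repo#3"))

        val configs = connector.taskConfigs(2)

        // 3 repositories over 2 tasks -> 2 task configs: "Repo#1;Repo#2" and "Repo#3"
        assertEquals(2, configs.size)
        assertEquals("Repo#1;Repo#2", configs[0]["gitlab.repositories"])
        assertEquals("Repo#3", configs[1]["gitlab.repositories"])
    }
}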
40
Deployment Strategies
(Diagram labels) CodeArtifact, CodeBuild, CodeDeploy, CodePipeline, GitHub, GitLab, AWS ECR, AWS ECS; covering the connector plugin and GitLab-managed infrastructure paths.
41
Monitoring Strategies
(Diagram labels) CloudWatch, CloudWatch Alarm, Prometheus, Grafana
42
Error Handling
Retries
For transient errors, e.g.
temporary network issues
Custom Error Handling
In your SourceTask or SinkTask,
custom error handling logic can be
added e.g. catch exceptions,
log them, and decide whether to
fail the task or attempt to recover
and continue
Monitoring Metrics
Actively monitor and alert
on error message rates of the
connector e.g. Task Error
Metrics, Records
Produced/Consumed, Task
Status, Lag/Throughput
Metrics
Error Tolerance
errors.tolerance = none
● fail fast (default)
errors.tolerance = all
● silently ignore
errors.deadletterqueue.topic.name
● dead letter queues
Log Errors
Errors can be logged for
troubleshooting and can be
controlled by:
● errors.log.enable = true
● errors.log.include.messages
Avoid excessive use of Error or
Warn levels in your logging
Dead Letter Queue
Automatically send error records to a DLQ topic for later inspection, along with error headers.
PS: DLQ is currently only supported
for Sink Connectors and not for
Source Connectors
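To make the Retries and Custom Error Handling points concrete in task code, a hedged sketch of a poll implementation that treats I/O failures as retriable and fails the task on anything else. The exception mapping and logger are assumptions; RetriableException and ConnectException are the standard Kafka Connect error types.

import org.apache.kafka.connect.errors.ConnectException
import org.apache.kafka.connect.errors.RetriableException

override fun poll(): MutableList<SourceRecord> {
    return try {
        sleepForInterval()
        val response = ApiCalls.GitLabCall(props!!)
        mutableListOf(generateSourceRecord(response as MergedRequest))
    } catch (e: java.io.IOException) {
        // Transient problem (e.g. a network blip): ask the framework to retry this poll
        logger.warn("GitLab call failed, will retry", e)
        throw RetriableException(e)
    } catch (e: Exception) {
        // Anything else: fail the task so the problem becomes visible in task status and metrics
        throw ConnectException("Unrecoverable error while polling GitLab", e)
    }
}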
43
Key Learnings

1. Planning and Design
● Understand your data source
● Decide on the type (source or sink)
● Plan config inputs, defaults, validators, and recommenders
● Consider the volume of data your connector will need to handle (parallel processing)

2. Connector Development
● Add the required dependencies
● Define the actions for the start and stop methods
● Determine the number of tasks based on your parallelism requirements
● Implement the poll method, and decide on the frequency of polling

3. Data Management
● Develop a function to fetch data from your source system
● Define the Schema and Struct
● Define the contents of the source record
● Choose the right Converter for your data format (operations)
● Consider the usage of Single Message Transforms (operations)

4. Resilience and Error Handling
● Design your connector with restartability and fault tolerance in mind
● Implement error handling
● Consider how the connector will handle network failures, API rate limits, etc.

5. Testing, Deployment, and Monitoring
● Test, test & test under different scenarios
● Set up a monitoring mechanism
● Implement proper logging
● Track performance (JMX)
44
Q & A
Thank you for your attention
and participation
Please rate the session in the Kafka Summit App
Code
https://github.com/sami12rom/kafka-connect-gitlab