Madrid office: C/ Francisco Silvela, 54 Duplicado 1ºD 28028
Tel.: 91 080 82 44
Barcelona office: C/ Madrazo 27-29 4ª 08006
Tel.: 933 68 52 46
KAFKA INFRASTRUCTURE: SERVICES
$intro --help
1. Kafka Schema Registry
2. Kafka Connect & Connectors
3. KSQL
4. HTTP Proxy
KAFKA SCHEMA REGISTRY
- The Schema Registry is a distributed storage layer for Avro schemas that uses
Kafka as its underlying storage mechanism.
- It assigns a globally unique ID to each registered schema. Allocated IDs are
guaranteed to be monotonically increasing but not necessarily consecutive.
- Kafka provides the durable backend and functions as a write-ahead changelog for
the state of the Schema Registry and the schemas it contains.
- The Schema Registry is designed to be distributed, with a single-master
architecture; ZooKeeper or Kafka coordinates master election.
- Memory and CPU consumption is low, even for larger Kafka clusters or many
event schemas.
- The Schema Registry keeps no disk data of its own: all schemas reside in a Kafka
log, with indices held in memory.
- Avoid running a Schema Registry cluster across different data centers: increased
network latency and/or network availability problems may hinder the performance
of the whole Kafka cluster.
- Back up your schemas topic regularly (e.g. with a Kafka sink connector).
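The "Kafka log plus in-memory indices" design can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySchemaRegistry` is a hypothetical class, a plain Python list stands in for the schemas topic, and none of this is the real Schema Registry code. It shows why a restarted instance can rebuild its state by replaying the log, and why IDs only ever increase.

```python
import json


class ToySchemaRegistry:
    """Hypothetical in-memory view rebuilt from an append-only log."""

    def __init__(self, log=None):
        # The list stands in for the Kafka topic that stores the schemas.
        self.log = log if log is not None else []
        self.by_id = {}         # id -> canonical schema string
        self.id_by_schema = {}  # canonical schema string -> id
        self.next_id = 1
        for record in self.log:  # replay the changelog to rebuild the indices
            self._apply(record)

    def _apply(self, record):
        self.by_id[record["id"]] = record["schema"]
        self.id_by_schema[record["schema"]] = record["id"]
        self.next_id = max(self.next_id, record["id"] + 1)

    def register(self, schema):
        canonical = json.dumps(schema, sort_keys=True)
        if canonical in self.id_by_schema:
            return self.id_by_schema[canonical]  # already registered
        record = {"id": self.next_id, "schema": canonical}
        self.log.append(record)  # write to the log first, then update memory
        self._apply(record)
        return record["id"]
```

Creating a second `ToySchemaRegistry` over the same list yields the same `by_id` index, which mirrors how an instance with no local disk state can recover everything from the topic.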
KAFKA CONNECT
- Kafka Connect is a framework for connecting Kafka with external systems and
data sources.
- Source connectors ingest data from external systems, e.g. databases,
filesystems, logs, etc.
- Sink connectors, on the other hand, deliver data from Kafka topics to other
systems and indexes (Elasticsearch, Hadoop, etc.).
- It can be run standalone or as a cluster.
- Connectors provide an abstraction layer for pulling data into or pushing data
out of Kafka.
- They are flexible and scalable (clustering, batching, streaming, etc.).
- Connectors are easy to extend or reuse.
KAFKA CONNECT: Connectors
- Connectors define the source and target of the data.
- A connector instance manages the copying of data between Kafka and another
system.
- A connector plugin is the packaging of the classes used by a connector instance;
once installed, it can be instantiated and replicated at will.
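As a rough illustration of the plugin/instance distinction, a connector instance is typically created by posting a name and a configuration to the Connect REST API, where `connector.class` names the plugin. The sketch below only builds the request and does not send it. The URL, connector name, file path and topic are assumptions, and it presumes the `FileStreamSourceConnector` plugin is installed on the worker.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed address of a Connect worker


def build_create_connector_request(name, config):
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{CONNECT_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical instance of the file source plugin.
file_source_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",   # assumed input file
    "topic": "demo-lines",      # assumed target topic
}

request = build_create_connector_request("demo-file-source", file_source_config)
# urllib.request.urlopen(request) would submit it to a running worker.
```

The same plugin can back any number of instances: only the name and the configuration change.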
KAFKA CONNECT: Tasks
- Each connector instance coordinates a set of tasks that copy the data.
- Tasks store no state themselves, which makes parallelism and scaling
straightforward.
- Task state is stored in special Kafka topics for configuration and status, and
is managed by the associated connector instance.
KAFKA CONNECT: Task rebalancing
- When a connector is first submitted, the workers rebalance the full set of
connectors in the cluster and their tasks so that each worker has approximately
the same amount of work.
- When a worker fails, its tasks are rebalanced across the healthy workers.
- When a task fails, no rebalancing takes place; the task should be restarted
via the Connect REST API.
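The rebalancing behaviour can be pictured with a small simulation. This is an illustrative sketch only: real Connect workers agree on assignments through Kafka's group protocol, not through the hypothetical `assign_round_robin` function below. The restart path follows the Connect REST API layout for restarting a single task.

```python
def assign_round_robin(tasks, workers):
    """Spread tasks over workers so each gets roughly the same amount of work."""
    if not workers:
        raise ValueError("no healthy workers available")
    assignment = {worker: [] for worker in workers}
    for index, task in enumerate(sorted(tasks)):
        assignment[workers[index % len(workers)]].append(task)
    return assignment


def task_restart_path(connector, task_id):
    """REST path used to restart one failed task (sent with POST)."""
    return f"/connectors/{connector}/tasks/{task_id}/restart"


# Five tasks of a hypothetical connector, first on three workers...
tasks = [("jdbc-source", i) for i in range(5)]
before = assign_round_robin(tasks, ["worker-1", "worker-2", "worker-3"])
# ...then worker-2 fails and its tasks move to the survivors.
after = assign_round_robin(tasks, ["worker-1", "worker-3"])
```

A failed task, by contrast, stays where it is until someone posts to its restart path.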
KAFKA CONNECT: Workers
- Workers are responsible for running connectors and their associated tasks.
- There are two main types of workers: standalone and distributed.
- Standalone workers are the simplest: a single process is responsible for
executing all connectors and tasks.
- Distributed workers provide scalability and fault tolerance, since they start many
processes and coordinate execution of connectors and tasks across all of them.
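The difference between the two worker types shows up mostly in their configuration. The sketch below renders two minimal property sets; the keys are standard Connect worker settings, while the broker address, group id, topic names and file path are assumed values chosen for illustration.

```python
def to_properties(settings):
    """Render a dict as Java-style .properties text."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


common = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
}

# Standalone: one process, source offsets kept in a local file.
standalone_worker = {
    **common,
    "offset.storage.file.filename": "/tmp/connect.offsets",  # assumed path
}

# Distributed: workers sharing a group.id form one cluster and keep
# configuration, offsets and status in Kafka topics.
distributed_worker = {
    **common,
    "group.id": "connect-cluster",
    "config.storage.topic": "connect-configs",
    "offset.storage.topic": "connect-offsets",
    "status.storage.topic": "connect-status",
}

standalone_text = to_properties(standalone_worker)
distributed_text = to_properties(distributed_worker)
```

Every process started with the distributed settings and the same `group.id` joins the same Connect cluster.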
KAFKA CONNECT: Converters
- Converters handle the different data formats used when writing to or reading
from a Kafka cluster.
- Tasks use converters to change the format of the data.
- Kafka Connect deployments typically provide: AvroConverter (shipped with
Confluent's Schema Registry), JsonConverter, StringConverter, ByteArrayConverter.
- They are completely decoupled from connectors, allowing for reusability.
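The decoupling can be sketched as a Python analogy. The real converters are Java classes; the two classes and the `write_record`/`read_record` helpers below are hypothetical stand-ins that only mimic the idea of translating between in-memory values and the bytes stored in Kafka.

```python
import json


class StringConverter:
    """Toy stand-in: values are written as UTF-8 text."""

    def from_connect(self, value):
        return str(value).encode("utf-8")

    def to_connect(self, data):
        return data.decode("utf-8")


class JsonConverter:
    """Toy stand-in: values are written as JSON documents."""

    def from_connect(self, value):
        return json.dumps(value).encode("utf-8")

    def to_connect(self, data):
        return json.loads(data.decode("utf-8"))


def write_record(value, converter):
    """What a source task does: serialize before producing to Kafka."""
    return converter.from_connect(value)


def read_record(data, converter):
    """What a sink task does: deserialize after consuming from Kafka."""
    return converter.to_connect(data)
```

Because the task code only sees `converter`, swapping JSON for another format is a configuration change rather than a connector change.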
KAFKA CONNECT: Transforms
- Connectors can be configured with transformations to make simple and
lightweight modifications to individual messages.
- Really convenient for minor data adjustments.
- They are chainable, and you can write your own thanks to the transformation
interface.
- Kafka Connect already bundles several transformations:
Cast: Cast fields or the entire key or value to a specific type, e.g. to force an integer field to a smaller width.
ExtractField: Extract the specified field from a Struct when schema present, or a Map in the case of schemaless data.
Flatten: Flatten a nested data structure.
HoistField: Wrap data using the specified field name in a Struct or Map.
InsertField: Insert field using attributes from the record metadata or a configured static value.
MaskField: Mask specified fields with a valid null value for the field type.
RegexRouter: Update the record topic using the configured regular expression and replacement string.
ReplaceField: Filter or rename fields.
SetSchemaMetadata: Set the schema name, version, or both on the record’s key or value schema.
TimestampConverter: Convert timestamps between different formats.
TimestampRouter: Update the record’s topic field as a function of the original topic value and the record timestamp.
ValueToKey: Replace the record key with a new key formed from a subset of fields in the record value.
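Chaining can be sketched in two ways. The `smt_config` dict shows how a chain is usually declared in a connector configuration (the aliases `hoist` and `addSource` and the field values are assumptions). The functions below it are a hypothetical Python analogy of what HoistField and InsertField do to each record value, not the actual Java transformations.

```python
from functools import reduce

# Connector configuration fragment declaring a two-step chain.
smt_config = {
    "transforms": "hoist,addSource",
    "transforms.hoist.type": "org.apache.kafka.connect.transforms.HoistField$Value",
    "transforms.hoist.field": "line",
    "transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addSource.static.field": "source",
    "transforms.addSource.static.value": "demo-file",
}


def hoist_field(field):
    """Analogy of HoistField: wrap the whole value under one field name."""
    return lambda value: {field: value}


def insert_static_field(name, static_value):
    """Analogy of InsertField with a configured static value."""
    return lambda value: {**value, name: static_value}


def apply_chain(value, transforms):
    """Apply transformations in order, feeding each result to the next."""
    return reduce(lambda acc, transform: transform(acc), transforms, value)


chain = [hoist_field("line"), insert_static_field("source", "demo-file")]
result = apply_chain("hello", chain)
# result == {"line": "hello", "source": "demo-file"}
```

Order matters: InsertField needs a structured value, so it has to run after HoistField here.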
KAFKA KSQL
- KSQL is the open source streaming SQL engine for Apache Kafka
- Provides an easy-to-use yet powerful interactive SQL interface for stream
processing on Kafka, without the need to write code in a programming language
such as Java or Python.
- Scalable, elastic, fault-tolerant, and real-time.
- Supports a wide range of streaming operations, including data filtering,
transformations, aggregations, joins, windowing, and sessionization.
- Some applications are streaming ETL, real-time monitoring and analytics, data
exploration and discovery, anomaly detection, personalization, IoT, and customer
360.
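To give a feel for what such queries look like, the sketch below holds two KSQL statements (a stream over a topic and a windowed aggregation) and builds a request for the KSQL server's `/ksql` REST endpoint. The server address, topic, stream, table and column names are assumptions, and the request is only constructed, not sent.

```python
import json
import urllib.request

KSQL_URL = "http://localhost:8088"  # assumed address of a KSQL server

# Hypothetical stream and table definitions.
STATEMENTS = """
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
CREATE TABLE views_per_user AS
  SELECT userid, COUNT(*) AS view_count
  FROM pageviews WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY userid;
"""


def build_ksql_request(statements):
    """Build (but do not send) a POST /ksql request."""
    body = json.dumps({"ksql": statements, "streamsProperties": {}})
    return urllib.request.Request(
        url=f"{KSQL_URL}/ksql",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_ksql_request(STATEMENTS)
```

The same statements could be typed into the KSQL CLI; the REST form is convenient for scripted deployments.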
KAFKA KSQL components
KSQL Server
- The KSQL server runs the engine that executes KSQL queries. This includes
processing, reading, and writing data to and from the target Kafka cluster.
- KSQL servers form KSQL clusters and can run in containers, virtual machines, and
bare-metal machines. You can add and remove servers to/from the same KSQL
cluster during live operations to elastically scale KSQL’s processing capacity as
desired. You can deploy different KSQL clusters to achieve workload isolation.
KSQL CLI
- You can write KSQL queries interactively by using the KSQL command line
interface (CLI). The KSQL CLI acts as a client to the KSQL server. For production
scenarios you may also configure KSQL servers to run in a non-interactive
“headless” configuration, thereby preventing KSQL CLI access.
KAFKA REST Proxy
Provides a RESTful interface to a Kafka cluster, making it easy to produce and consume
messages, view the state of the cluster, and perform administrative actions without
using the native Kafka protocol or clients. Some of the key features are:
- Metadata: Most metadata about the cluster – brokers, topics, partitions, and
configs – can be read using GET requests for the corresponding URLs.
- Producers: Instead of exposing producer objects, the API accepts produce
requests targeted at specific topics or partitions and routes them all through a
small pool of producers. Producer instances are shared, so configs cannot be set
on a per-request basis. However, you can adjust settings globally by passing new
producer settings in the REST Proxy configuration.
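A minimal sketch of the metadata and produce calls, assuming a REST Proxy at `localhost:8082` and the v2 API content types. The topic name and record values are made up, and the requests are only built, not sent.

```python
import json
import urllib.request

REST_PROXY_URL = "http://localhost:8082"  # assumed REST Proxy address


def build_metadata_request(path="/topics"):
    """GET request for cluster metadata, e.g. /topics or /brokers."""
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}{path}",
        headers={"Accept": "application/vnd.kafka.v2+json"},
        method="GET",
    )


def build_json_produce_request(topic, values):
    """POST request that produces JSON-encoded values to a topic."""
    body = {"records": [{"value": value} for value in values]}
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
        method="POST",
    )


list_topics = build_metadata_request()
produce = build_json_produce_request("demo-events", [{"user": "ada", "action": "login"}])
```

Note that nothing in the produce request carries producer settings: those live in the REST Proxy configuration, as described above.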
KAFKA REST Proxy
- Consumers: The REST Proxy uses either the high-level consumer (v1 API) or the
new consumer introduced in Kafka 0.9 (v2 API) to implement consumer groups that
can read from topics. Consumers are stateful and therefore tied to specific REST
Proxy instances. Offset commit can be either automatic or explicitly requested by
the user. Currently limited to one thread per consumer; use multiple consumers for
higher throughput.
- Data Formats: The REST Proxy can read and write data using JSON, raw bytes
encoded with base64, or JSON-encoded Avro. With Avro, schemas are
registered and validated against the Schema Registry.
- REST Proxy Clusters and Load Balancing: The REST Proxy is designed to support
multiple instances running together to spread load and can safely be run behind
various load balancing mechanisms (e.g. round robin DNS, discovery services, load
balancers).
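The stateful consumer flow can be sketched as an ordered list of v2 API calls, plus a helper for the base64 "binary" format. The group, instance and topic names are hypothetical, and the function only describes the calls rather than issuing them.

```python
import base64


def consumer_lifecycle(group, instance, topics):
    """Return the (method, path, body) sequence for one consumer instance."""
    base = f"/consumers/{group}/instances/{instance}"
    create_body = {
        "name": instance,
        "format": "binary",              # records come back base64-encoded
        "auto.offset.reset": "earliest",
        "auto.commit.enable": "false",   # commit explicitly instead
    }
    return [
        ("POST", f"/consumers/{group}", create_body),          # create instance
        ("POST", f"{base}/subscription", {"topics": list(topics)}),
        ("GET", f"{base}/records", None),                      # poll for records
        ("POST", f"{base}/offsets", None),                     # commit fetched offsets
        ("DELETE", base, None),                                # release the instance
    ]


def decode_binary_record(record):
    """Decode the base64 key and value of a record in the binary format."""
    def decode(field):
        return None if field is None else base64.b64decode(field)
    return {**record, "key": decode(record.get("key")), "value": decode(record.get("value"))}


steps = consumer_lifecycle("demo-group", "demo-instance", ["demo-events"])
```

Because the instance lives on one REST Proxy process, every call after the first must reach that same process; behind a load balancer this usually means using the base URI returned by the create call.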
