Connecting Kafka Message
Systems with Scylla
Maheedhar Gunturu, Solutions Architect
Presenter
Maheedhar Gunturu, Solutions Architect
Maheedhar has held senior roles in both engineering and sales
organizations. He has over a decade of experience designing and
developing server-side applications in the cloud and working on
big data and ETL frameworks at companies such as Samsung,
MapR, Apple, VoltDB, Zscaler and Qualcomm. He holds a
master's degree in Electrical and Computer Engineering from
the University of Texas at San Antonio.
Agenda
1. Benefits of Message Queues
2. Kafka Connect Framework
Benefits of Message Queues
Benefits of messaging queues
■ Centralized Infrastructure.
■ Intermediate layer for buffering.
■ Export and Import capabilities.
■ Publish CDC streams.
■ Integrate with various applications.
■ Streaming Data Transformations.
■ Impedance mismatch between applications.
■ Ability to recreate state.
Centralized Infrastructure
[Figure: producers and consumers connected through streams of real-time events. Labels: Microservices; Apps; Operational Applications; Data Warehouse; Databases; Database change; Microservices events; SaaS data; Operational Alerts; Stream processing apps.]
Intermediate layer for buffering.
■ Provides flexibility for downstream consumers.
● Buffer data during upgrades, migrations, or troubleshooting.
■ Downstream systems don't have to be provisioned for peak traffic.
● Saves hardware costs.
■ Dynamically scalable layer to handle bursty loads.
● Add more partitions/brokers to increase parallelism and throughput.
● A Kafka operator can dynamically scale the cluster based on ingress traffic.
■ Provides resiliency and fault tolerance.
● Each topic's partitions are replicated and spread across multiple brokers.
● Set the retention period at the topic level to control how long data is kept.
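To illustrate why a buffering layer lets downstream systems be sized for average rather than peak traffic, here is a minimal Python sketch. It is a toy simulation, not Kafka code: the function name `simulate_backlog`, the per-tick arrival counts and the consumer capacity are all assumptions made for illustration.

```python
def simulate_backlog(arrivals, capacity_per_tick):
    """Return the backlog held in the buffer after each tick.

    arrivals: messages produced in each tick (hypothetical numbers).
    capacity_per_tick: messages the consumer can drain per tick.
    """
    backlog = 0
    history = []
    for produced in arrivals:
        backlog += produced                         # producer appends to the buffer
        backlog -= min(backlog, capacity_per_tick)  # consumer drains what it can
        history.append(backlog)
    return history


# A burst of 500 messages/tick hits a consumer sized for 200 messages/tick.
arrivals = [100, 100, 500, 500, 100, 100, 100, 100, 100, 100]
history = simulate_backlog(arrivals, capacity_per_tick=200)
print(history)
```

In this run the backlog peaks during the burst and drains back to zero once traffic returns to normal, so the consumer never needs the capacity to handle 500 messages per tick.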
Export and Import capabilities.
Publish CDC streams
■ Publish record-level changes to the corresponding topics.
● The level of detail in the change records is usually configurable.
■ Upstream changes to watched rows are emitted as change records.
● The record format is configurable (JSON, Avro, etc.).
■ Downstream processing for reporting, caching, or full-text indexing.
● Subscribe with the corresponding consumer (e.g. Scylla, Elasticsearch, Spark).
■ Changefeeds are emitted with at-least-once delivery guarantees.
● In most cases, each version of a row will be emitted once. However, some infrequent
conditions (e.g., node failures, network partitions) will cause them to be repeated.
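Because at-least-once delivery can repeat a change record, consumers are usually written so that applying the same record twice is harmless. The sketch below shows one way to do that in Python, under stated assumptions: the record fields `key`, `version` and `value` and the function `apply_changes` are hypothetical and not taken from any particular CDC connector.

```python
def apply_changes(records):
    """Apply change records to an in-memory table, ignoring repeats.

    Each record is a dict with hypothetical fields: key, version, value.
    A record is applied only if its version is newer than the stored one,
    so redelivering the same record leaves the table unchanged.
    """
    table = {}   # key -> (version, value)
    applied = 0
    for rec in records:
        current = table.get(rec["key"])
        if current is not None and rec["version"] <= current[0]:
            continue  # duplicate or stale delivery: skip it
        table[rec["key"]] = (rec["version"], rec["value"])
        applied += 1
    return table, applied


feed = [
    {"key": "sensor-1", "version": 1, "value": 20.5},
    {"key": "sensor-1", "version": 2, "value": 21.0},
    {"key": "sensor-1", "version": 2, "value": 21.0},  # repeated after a retry
    {"key": "sensor-2", "version": 1, "value": 18.2},
]
table, applied = apply_changes(feed)
print(table, applied)
```

The repeated record for `sensor-1` is skipped, so the final table is the same whether or not the duplicate arrives.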
Integrations
[Figure: App 1, App 2 and App 3, each with a serializer, a Schema Registry, and a Kafka topic. Example consumers: Scylla, Mongo, Elastic, HBase.]
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new fields)
● Prevent backwards-incompatible changes
● Support multi-data center environments
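The idea of preventing backwards-incompatible changes can be sketched as a small compatibility check. This is a deliberately simplified rule written for illustration, not the Schema Registry's actual algorithm: schemas are modelled as plain dicts, the function name `is_backward_compatible` is hypothetical, and any type change is rejected.

```python
def is_backward_compatible(old_fields, new_fields):
    """Simplified check: can a reader using new_fields read old data?

    Both arguments map field name -> {"type": str, optional "default": value}.
    """
    for name, spec in new_fields.items():
        if name not in old_fields:
            if "default" not in spec:
                return False  # old records carry no value for this field
        elif old_fields[name]["type"] != spec["type"]:
            return False      # type changes are rejected in this sketch
    return True


old = {"id": {"type": "string"}, "temp": {"type": "double"}}
with_default = dict(old, unit={"type": "string", "default": "C"})
without_default = dict(old, unit={"type": "string"})

print(is_backward_compatible(old, with_default))     # new field has a default
print(is_backward_compatible(old, without_default))  # new field has no default
```

Adding a field with a default is accepted because old records can still be read, while adding a required field is rejected.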
Streaming Data Transformations.
[Figure: a Streams API application between a producer and consumers, reading from and writing to topics in the Kafka cluster.]
Overview
• Write standard Java applications
• No separate processing cluster required
• Exactly-once processing semantics
• Elastic, highly scalable, fault-tolerant
• Fully integrated with Kafka security
Example Use Cases
• Event-Driven Microservices
• Continuous queries
• Continuous transformations
Impedance mismatch between applications.
■ Applications produce and consume data at different rates.
● Provides flexibility for the downstream applications to scale based on their SLAs.
■ Downstream applications can be independently scaled.
● Dynamically move partitions to optimize resource utilization and reliability.
● Enable elastic scaling by easily adding and removing nodes from your Kafka cluster.
■ Tuning a topic's configuration helps use consumers efficiently.
● Determine the ratio between the number of partitions in a topic and the number of
consumers.
● ADB (Confluent Auto Data Balancer) traffic is throttled during data transfers to preserve network bandwidth.
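The ratio between partitions and consumers matters because, within a consumer group, each partition is read by at most one consumer. The Python sketch below shows the consequence using a simple round-robin assignment. It is only an illustration: `assign_partitions` is a hypothetical helper, and Kafka's real assignors are more involved.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin partitions over the consumers of one group (illustrative only)."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


balanced = assign_partitions(6, ["c1", "c2", "c3"])
over_provisioned = assign_partitions(2, ["c1", "c2", "c3"])
print(balanced)
print(over_provisioned)
```

With six partitions every consumer gets two; with only two partitions the third consumer receives nothing and sits idle, so adding consumers beyond the partition count does not add throughput.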
Event Sourcing
■ Every change to the state of an application is captured in an event object.
● The order of the events needs to be maintained.
■ Ability to recreate state in your application and the supporting database.
● CQRS builds on event sourcing: its read side is analogous to a materialized view.
● Need to keep track of lineage and the transformations that were run on the data.
■ Newer versions of ML algorithms can operate on the raw event data
to recreate the state in the database.
● Better model serving/benchmarking.
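Recreating state from an event log comes down to replaying the events in order. A minimal Python sketch follows, assuming a hypothetical event shape with `seq`, `type`, `key` and `value` fields. It is not tied to any specific framework.

```python
def replay(events):
    """Rebuild state by applying events in sequence order."""
    state = {}
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["type"] == "set":
            state[event["key"]] = event["value"]
        elif event["type"] == "delete":
            state.pop(event["key"], None)
    return state


# Events arrive out of order here to show why ordering must be restored.
events = [
    {"seq": 2, "type": "set", "key": "temp", "value": 22},
    {"seq": 1, "type": "set", "key": "temp", "value": 20},
    {"seq": 3, "type": "set", "key": "mode", "value": "auto"},
    {"seq": 4, "type": "delete", "key": "mode"},
]
state = replay(events)
print(state)
```

If the events were applied in arrival order instead of sequence order, `temp` would end up as 20 rather than 22, which is why ordering has to be preserved.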
Kafka Connect Framework
Scylla & Confluent
Kafka Connect Features
01 A standard framework for Kafka connectors.
02 Distributed and standalone modes.
03 REST interface for configuration (port 8083).
04 Distributed and scalable by default.
05 Automatic offset management.
06 Streaming/batch integration.
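As a rough illustration of the REST interface, the Python sketch below builds (but does not send) a request that creates a connector through `POST /connectors`. The host `localhost` and the helper name `build_create_request` are assumptions, and the connector settings are a small sample rather than a complete configuration.

```python
import json
import urllib.request


def build_create_request(base_url, name, config):
    """Build a POST /connectors request for a Kafka Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "http://localhost:8083",  # assumed worker address; 8083 is the port named above
    "sink",
    {
        "connector.class": "io.confluent.connect.cassandra.CassandraSinkConnector",
        "topics": "temperature",
        "tasks.max": "1",
    },
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running worker.
```

Building the request separately from sending it keeps the sketch runnable without a live Connect cluster.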
[Figure: Kafka pipeline. Sources and sinks attach through the Kafka Connect API on each side, served by a pool of Connect workers. Labels: CDC, Database, Mongo, Cassandra, Elastic, Scylla, HDFS.]
■ Auto-recovery and fault tolerant
■ Manage hundreds of data sources and sinks
■ Preserves data schema
■ Integrated within Confluent Control Center
■ Simple parallelism
Configuring Kafka Connect (sink)
# Sample cassandra-sink.properties file
name=sink
topics=temperature
tasks.max=1
connector.class=io.confluent.connect.cassandra.CassandraSinkConnector
cassandra.contact.points=<PUBLIC IPs of your SCYLLA Cluster (IP1,IP2,IP3)>
cassandra.keyspace=demo
cassandra.compression=SNAPPY
cassandra.consistency.level=LOCAL_QUORUM
transforms=prune
transforms.prune.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.prune.whitelist=CreatedAt,Id,Text,Source,Truncated
1. Update the sink.properties file.
2. Update the connect-distributed.properties file.
3. Start the Connect framework using the Cassandra connector in distributed mode.
ref: https://www.scylladb.com/2018/12/19/scylla-and-confluent-for-iot/
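In distributed mode, connector settings are typically submitted to the workers as JSON rather than read from a properties file. The Python sketch below shows one way to convert properties text into that JSON shape. The helper names `parse_properties` and `to_connector_payload` are hypothetical, and the sample text is a shortened copy of the file above.

```python
import json


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def to_connector_payload(config):
    """Move 'name' to the top level, as the Connect REST API expects."""
    config = dict(config)
    name = config.pop("name")
    return {"name": name, "config": config}


sample = """
# Sample cassandra-sink.properties file
name=sink
topics=temperature
tasks.max=1
transforms=prune
transforms.prune.type=org.apache.kafka.connect.transforms.ReplaceField$Value
"""
payload = to_connector_payload(parse_properties(sample))
print(json.dumps(payload, indent=2))
```

The parser handles only plain `key=value` lines, which is enough for this file. Multi-line values and escape sequences from the full Java properties format are not covered.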
Kafka Connect Security
Encryption
■ Kafka Connect also works with SSL-encrypted connections to the Kafka brokers.
Authentication
■ Kafka Connect works with SASL – e.g. Kerberos, Active Directory.
Authorization
■ Restrict who can create, write to, and read from topics, and more.
■ The REST API on Kafka Connect nodes is not secure.
● A secure cluster requires an external proxy (e.g. Apache HTTP Server) to act as a
secure gateway to the REST services.
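A Connect worker talks to the brokers both directly and through the producers and consumers it runs for connectors, so the same security settings are usually repeated with `producer.` and `consumer.` prefixes. The Python sketch below generates those lines. The helper name, the truststore path and the chosen values are placeholders for illustration, not a recommended configuration.

```python
def worker_security_properties(base):
    """Repeat each setting for the worker, its producers and its consumers."""
    lines = []
    for prefix in ("", "producer.", "consumer."):
        for key, value in base.items():
            lines.append(f"{prefix}{key}={value}")
    return lines


base = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "GSSAPI",
    "sasl.kerberos.service.name": "kafka",
    "ssl.truststore.location": "/path/to/truststore.jks",  # hypothetical path
}
lines = worker_security_properties(base)
print("\n".join(lines))
```

The output can be pasted into a worker properties file once real paths and credentials replace the placeholders.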
Confluent Hub
■ Discover and share connectors.
■ Cassandra (OSS) and DynamoDB source/sink connectors available.
■ Scylla shard-aware connector to be published soon!
https://www.confluent.io/hub/confluentinc/kafka-connect-cassandra
Takeaways
■ Message queues are useful for a variety of reasons.
■ The Scylla Kafka connector (sink and CDC source) will be coming out soon!
■ Event streaming and event-driven microservices are useful - try them out!
Thank you Stay in touch
Any questions?
Maheedhar Gunturu
maheedhar@scylladb.com
@vanguard_space
Some useful links
Here are some useful links for further reading/watching.
1. Useful video explaining most things for a low level of understanding – https://www.confluent.io/kafka-summit-sf18/so-you-want-to-write-a-connector
2. Confluent's developer guide to connectors, which covers most basics – https://docs.confluent.io/current/connect/devguide.html
3. The source for the above developer guide is available through Maven here – https://mvnrepository.com/artifact/org.apache.kafka/connect-file/2.1.1
4. Useful guide providing additional best practices (now deprecated though still useful) – https://docs.google.com/document/d/1jEn_G-KDsrhdecPTGIWIcke1I4gw4fR0G8OVj8e3iAI/edit#
5. Verification guide, though a little generic as it covers both connectors and consumers/producers – https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
6. https://opencredo.com/blogs/kafka-connect-source-connectors-a-detailed-guide-to-connecting-to-what-you-love/
7. https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
8. https://www.confluent.io/blog/the-simplest-useful-kafka-connect-data-pipeline-in-the-world-or-thereabouts-part-2/
9. https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-3/

CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Connecting Kafka Message Systems with Scylla

  • 1. Connecting Kafka Message Systems with Scylla Maheedhar Gunturu, Solutions Architect
  • 2. Presenter Maheedhar Gunturu, Solutions Architect. Maheedhar has held senior roles in both engineering and sales organizations. He has over a decade of experience designing and developing server-side applications in the cloud and working on big data and ETL frameworks at companies such as Samsung, MapR, Apple, VoltDB, Zscaler and Qualcomm. He holds a master's degree in Electrical and Computer Engineering from the University of Texas at San Antonio.
  • 3. Agenda 1. Benefits of Message Queues 2. Kafka Connect Framework
  • 5. Benefits of messaging queues ■ Centralized Infrastructure. ■ Intermediate layer for buffering. ■ Export and Import capabilities. ■ Publish CDC streams. ■ Integrate with various applications. ■ Streaming Data Transformations. ■ Impedance mismatch between applications. ■ Ability to recreate state.
  • 7. Intermediate layer for buffering. ■ Provides flexibility for downstream consumers. ● Buffer data during upgrades, migrations, or troubleshooting. ■ Downstream systems don't have to be provisioned for peak traffic. ● Save hardware costs. ■ Dynamically scalable layer to handle bursty loads. ● Add more partitions/brokers to increase parallelism and throughput. ● Use a Kafka operator to dynamically scale the cluster based on ingress traffic. ■ Provides resiliency and fault tolerance. ● Each topic has multiple partitions, with replicas spread across multiple brokers. ● Set TTLs at the topic level to determine retention.
  • 8. Export and Import capabilities.
  • 9. Publish CDC streams ■ Publish record-level changes to the corresponding topics. ● The level of detail in the change records is usually configurable. ■ Upstream changes to watched rows are emitted as change records. ● The format of these records is configurable (JSON, Avro, etc.). ■ Downstream processing for reporting, caching, or full-text indexing. ● Subscribe with the corresponding consumer (e.g., Scylla, Elasticsearch, Spark). ■ Changefeeds are emitted with at-least-once delivery guarantees. ● In most cases, each version of a row is emitted once. However, some infrequent conditions (e.g., node failures, network partitions) cause records to be repeated.
  • 10. Integrations [Diagram labels: Apps 1-3 with serializers, Schema Registry, Kafka topic; example consumers: Scylla, Mongo, Elastic, HBase] ● Define the expected fields for each Kafka topic ● Automatically handle schema changes (e.g. new fields) ● Prevent backwards incompatible changes ● Support multi-data center environments
  • 11. Streaming Data Transformations. [Diagram labels: Streams API, producer, topics, consumers, Kafka cluster] Overview • Write standard Java applications • No separate processing cluster required • Exactly-once processing semantics • Elastic, highly scalable, fault-tolerant • Fully integrated with Kafka security Example Use Cases • Event-Driven Microservices • Continuous queries • Continuous transformations
  • 12. Impedance mismatch between applications. ■ Applications produce and consume data at different rates. ● Provides flexibility for the downstream applications to scale based on their SLAs. ■ Downstream applications can be independently scaled. ● Dynamically move partitions to optimize resource utilization and reliability. ● Enable elastic scaling by easily adding and removing nodes from your Kafka cluster. ■ Tuning a topic's configuration helps use consumers efficiently. ● Determine the ratio between the number of partitions in a topic and the number of consumers. ● ADB traffic is throttled during data transfers to preserve network bandwidth.
  • 13. Event Sourcing ■ Every change to the state of an application is captured in an event object. ● The order of the events needs to be maintained. ■ Ability to recreate state in your application and the supporting database. ● CQRS provides the benefit of event sourcing, analogous to a materialized view. ● Need to keep track of lineage and the transformations that were run on the data. ■ Newer versions of ML algorithms can operate on the raw event data to recreate the state in the database. ● Better model serving/benchmarking.
  • 16. Kafka Connect Features 01 A standard framework for Kafka connectors. 02 Distributed and standalone modes. 03 REST interface for configuration. Port: 8083 04 Automatic offset mgmt. 05 Distributed & scalable by default. 06 Streaming/batch integration.
  • 17. Kafka Connect API [Diagram labels: sources and sinks (CDC, Database, Mongo, Cassandra, Elastic, Scylla, HDFS) attached to the Kafka pipeline through Connect workers] ■ Auto-recovery and fault tolerant ■ Manage hundreds of data sources and sinks ■ Preserves data schema ■ Integrated within Confluent Control Center ■ Simple parallelism
  • 18. Configuring Kafka Connect (sink)
      # sample cassandra-sink.properties file
      name=sink
      topics=temperature
      tasks.max=1
      connector.class=io.confluent.connect.cassandra.CassandraSinkConnector
      cassandra.contact.points=<PUBLIC IPs of your SCYLLA Cluster (IP1,IP2,IP3)>
      cassandra.keyspace=demo
      cassandra.compression=SNAPPY
      cassandra.consistency.level=LOCAL_QUORUM
      transforms=prune
      transforms.prune.type=org.apache.kafka.connect.transforms.ReplaceField$Value
      transforms.prune.whitelist=CreatedAt,Id,Text,Source,Truncated
      1. Update the sink.properties file.
      2. Update the connect-distributed.properties file.
      3. Start the Connect framework using the Cassandra connector in distributed mode.
      ref: https://www.scylladb.com/2018/12/19/scylla-and-confluent-for-iot/
  • 19. Kafka Connect Security Encryption ■ Kafka Connect also works with SSL-encrypted connections to the Kafka brokers. Authentication ■ Kafka Connect works with SASL (e.g., Kerberos, Active Directory). Authorization ■ Restrict who can create, write to, and read from topics, and more. ■ The REST API for Kafka Connect nodes is not secured. ● When configuring a secure cluster, an external proxy (e.g., Apache HTTP Server) is required to act as a secure gateway to the REST services.
  • 20. Confluent Hub ■ Discover and share connectors. ■ Cassandra (OSS) and DynamoDB source/sink connectors available. ■ Scylla shard-aware connector to be published soon! https://www.confluent.io/hub/confluentinc/kafka-connect-cassandra
  • 22. Takeaways ■ Message queues are useful for a variety of reasons. ■ Scylla Kafka Connector (sink and CDC source) will be coming out soon! ■ Event streaming and event-driven microservices are useful - try them out!
  • 23. Thank you Stay in touch Any questions? Maheedhar Gunturu maheedhar@scylladb.com @vanguard_space
  • 24. Some useful links. Here are some useful links for further reading/watching.
      1. Useful video explaining most things for a low level of understanding: https://www.confluent.io/kafka-summit-sf18/so-you-want-to-write-a-connector
      2. Confluent's developer guide to connectors, which covers most basics: https://docs.confluent.io/current/connect/devguide.html
      3. The source for the above developer guide is available through Maven here: https://mvnrepository.com/artifact/org.apache.kafka/connect-file/2.1.1
      4. Useful guide providing additional best practices (now deprecated though still useful): https://docs.google.com/document/d/1jEn_G-KDsrhdecPTGIWIcke1I4gw4fR0G8OVj8e3iAI/edit#
      5. Verification guide, though a little generic as it covers both connectors and consumers/producers: https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
      6. https://opencredo.com/blogs/kafka-connect-source-connectors-a-detailed-guide-to-connecting-to-what-you-love/
      7. https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
      8. https://www.confluent.io/blog/the-simplest-useful-kafka-connect-data-pipeline-in-the-world-or-thereabouts-part-2/
      9. https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-3/
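
Slide 9 notes that changefeeds are delivered at least once, so a consumer may see the same row version more than once. One common way to cope is to make the consumer idempotent. The sketch below is only an illustration under assumptions: the `ChangeRecord` shape (a key, a per-key increasing version, a value) and the function name `apply_changes` are hypothetical and not part of any Scylla or Kafka API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record: row key, per-key increasing version, new value."""
    key: str
    version: int
    value: str


def apply_changes(records):
    """Apply change records idempotently.

    Returns the resulting table and the number of records that were skipped
    because the same (or a newer) version of that key had already been applied.
    """
    table = {}      # key -> latest applied value
    versions = {}   # key -> highest version applied so far
    skipped = 0
    for rec in records:
        if rec.version <= versions.get(rec.key, -1):
            skipped += 1  # redelivered or stale record: ignoring it is safe
            continue
        table[rec.key] = rec.value
        versions[rec.key] = rec.version
    return table, skipped


feed = [
    ChangeRecord("sensor-1", 1, "20.5"),
    ChangeRecord("sensor-2", 1, "18.0"),
    ChangeRecord("sensor-1", 1, "20.5"),  # repeated after a simulated retry
    ChangeRecord("sensor-1", 2, "21.0"),
]
table, skipped = apply_changes(feed)
print(table, skipped)
```

Tracking the highest applied version per key keeps the consumer correct whether a record arrives once or several times, at the cost of a small amount of extra state.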
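
Slide 13 describes event sourcing: every state change is captured as an event, order matters, and state can be recreated by replaying the events. A minimal sketch of that idea follows. The `Event` fields and the account-balance domain are assumptions chosen for illustration; they do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event: sequence number, account name, kind, amount."""
    seq: int
    account: str
    kind: str    # "deposit" or "withdraw"
    amount: int


def replay(events):
    """Rebuild account balances by applying events in sequence order."""
    state = {}
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.kind == "deposit":
            delta = ev.amount
        elif ev.kind == "withdraw":
            delta = -ev.amount
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
        state[ev.account] = state.get(ev.account, 0) + delta
    return state


# The log may be read out of order; replay sorts by sequence number first.
log = [
    Event(3, "alice", "withdraw", 30),
    Event(1, "alice", "deposit", 100),
    Event(2, "bob", "deposit", 20),
]
balances = replay(log)
print(balances)
```

Because the state is derived purely from the log, replaying the same events always yields the same balances, which is what makes it possible to rebuild a database or re-run newer processing logic over the raw events.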
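
Slides 16 and 18 mention that Kafka Connect exposes a REST interface on port 8083 and show a sink properties file. In distributed mode a connector is typically created by POSTing a JSON body with `name` and `config` fields to the `/connectors` endpoint. The sketch below converts a properties snippet into that body and builds the request without sending it. The host `localhost`, the contact-point IPs, and the helper names `parse_properties` and `build_create_request` are assumptions for illustration.

```python
import json
import urllib.request

# Values follow the slide; the contact-point IPs are placeholders.
SINK_PROPERTIES = """\
# sample cassandra-sink.properties
name=sink
topics=temperature
tasks.max=1
connector.class=io.confluent.connect.cassandra.CassandraSinkConnector
cassandra.contact.points=10.0.0.1,10.0.0.2,10.0.0.3
cassandra.keyspace=demo
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def build_create_request(props, base_url="http://localhost:8083"):
    """Build (but do not send) a POST /connectors request for Kafka Connect."""
    config = dict(props)
    name = config.pop("name")  # the connector name is a top-level field
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(parse_properties(SINK_PROPERTIES))
payload = json.loads(request.data)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

To actually submit the connector, the request could be passed to `urllib.request.urlopen` against a running Connect worker. Since slide 19 points out that the REST API is not secured by itself, in a secured cluster that call would go through the external proxy.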

Editor's Notes

  1. Kafka is highly available, resilient to node failures, and supports automatic recovery. These features make Apache Kafka well suited for communication and integration between components of large-scale, real-world data systems.
  2. A typical ratio of the number of partitions in a topic to the number of consumers in a group would be 1:1 or 1:2. See:
      https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
      https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
      https://www.confluent.io/blog/apache-kafka-supports-200k-partitions-per-cluster
  3. The topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. Kafka can replicate partitions across a configurable number of Kafka servers. Each partition has a leader server and zero or more follower servers. Leaders handle all read and write requests for a partition. Kafka also uses partitions for parallel consumer handling within a group. Each broker handles its share of data and requests by sharing partition leadership. The partitions of each topic that the consumers are subscribed to are assigned dynamically to the consumers in round-robin fashion.
  4. Phrase coined by Martin Fowler. CQRS (Command Query Responsibility Segregation) and Event Sourcing: https://martinfowler.com/eaaDev/EventSourcing.html
  5. Kafka Connect is an open source framework, built as another layer on core Apache Kafka, to support large scale streaming data.
  6. SoC (Separation of Concerns)
  7. Includes transformations (SMT). Has the ability to communicate with the schema registry. Currently the API is primarily Java and Scala only.
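
Notes 2 and 3 describe partitions being assigned to the consumers of a group in round-robin fashion. The sketch below is a simplified illustration of that idea, not Kafka's actual assignor implementation; the function name `assign_round_robin` and the consumer names are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over consumers one at a time, in order."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by three consumers: two partitions each.
balanced = assign_round_robin(range(6), ["c1", "c2", "c3"])
print(balanced)

# More consumers than partitions: the extra consumers receive nothing.
oversized = assign_round_robin(range(2), ["c1", "c2", "c3"])
print(oversized)
```

The second call shows why the partition count bounds consumer parallelism within a group: once every partition has an owner, additional consumers stay idle.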