Tools and Strategies for Stream
Governance with Confluent Platform
Federico Alberton
Customer Success Technical Architect,
Confluent Italy
Paolo Venturini
Customer Success Technical Architect,
Confluent Italy
Agenda
11:00 - 11:20  Stream Governance & technical features: Schema, Schema Registry, Data Contracts
11:20 - 11:40  Demo session
11:20 - 11:40  Roadmap & next steps: Stream Catalog, Data Portal, Stream Lineage, Client-Side Field Level Encryption
11:40 - 11:50  Q&A Session
12:00  Closing remarks
Confluent and Confluent Platform
Confluent Data Streaming
Platform
All your data continuously streamed, processed,
governed and shared as a product,
making it instantly valuable, usable, and
trustworthy everywhere.
CONNECT
PROCESS
GOVERN
STREAM
Accounts
Customer
Profile
Purchases
Shipments
Claims Orders
Clickstreams
From Data Mess To Data Products
To Instant Value
Everywhere
CONNECT
Data
Systems
Inventory
Replenishment
Forecasting
…
Custom Apps &
Microservices
Personalization
Recommendation
Fraud
…
Confluent Platform
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Application
Sticky Load Balancer
REST Proxy
Proxy
Kafka Brokers
Broker +
Rebalancer
Quorum Nodes (ZooKeeper or KRaft)
Q Q Q
Proxy
Broker +
Rebalancer
Broker +
Rebalancer
Broker +
Rebalancer
Schema Registry
Leader Follower
Q Q
Confluent
Control Center
Application
Clients
KStreams
App
Streams
Kafka Connect
Worker +
Connectors
or
Replicator
Microservice
Worker +
Connectors
or
Replicator
ksqlDB
ksqlDB
Server
ksqlDB
Server
Data contracts
Data producer/owner
- Produce high quality data
- Evolve data safely
- Make data contextualized
and discoverable
- Share data
Data platform team
- Design and offer a self-serve data
streaming platform
- Facilitate the onboarding of
developers and other data users
Data consumer
- Search and discover data
- Understand data
- Trust data
- Consume and build on high
quality data
Remove friction at scale without centralization
produces
Kafka without Schema
No structure can cause
many problems
Producer
Consumer
consume
Consumer
Consumers
produces
Kafka without Schema
Who is encoding and
decoding?
Producer
Consumer
consume
Consumer
Consumers
Serializer Deserializer
Major problems without the schema
Breaking changes Low quality data Security challenges
Data producers and
consumers evolve
independently. Changes to
the data schema (e.g.,
adding a new field,
changing data types) can
break consumers if not
managed properly.
Without a central
mechanism to enforce
schema adherence,
producers may send
malformed or incompatible
data, leading to data quality
issues downstream (missing
data, incorrect data, etc.).
Sensitive data cannot be
protected properly and can
be accessed by the wrong
users or applications.
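The breaking-change problem above can be sketched in a few lines. This is an illustrative example (not from the deck): a consumer written against the producer's original payload shape, with nothing in the pipeline to stop the producer from changing a field's type.

```python
import json

# Hypothetical consumer that expects `email` to be a string, written
# against the producer's original payload shape.
def handle_membership(raw: bytes) -> str:
    record = json.loads(raw)
    return record["email"].lower()  # breaks if email is no longer a string

ok_payload = json.dumps({"email": "John.Doe@example.com"}).encode()
print(handle_membership(ok_payload))  # john.doe@example.com

# The producer later "evolves" the field to an int; nothing stops it,
# and the consumer only fails at read time, deep in business logic.
bad_payload = json.dumps({"email": 100}).encode()
try:
    handle_membership(bad_payload)
except AttributeError as exc:
    print("consumer broke:", exc)
```

Without a schema, the failure surfaces on the consumer side, long after the bad record was written.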
{
"type": "record",
"name": "Membership",
"fields": [
{"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "email", "type": "string"},
{"name": "ssn", "type": "string"}
]
}
Example Avro Schema
{
"type": "record",
"name": "Membership",
"fields": [
{"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "email", "type": "string"},
{"name": "ssn", "type": "string"},
{"name": "full_name", "type": ["null", "string"]}
]
}
Change in Structure
Can be disruptive
Structure Ignored
Unpredictable outcomes
SCHEMA
"start_date", "int"
"end_date", "int"
"email", "string"
"ssn", "string"
{
"start_date": 19358,
"end_date": 19722,
"email": 100
}
// 2023-01-01
// 2023-12-31
INCORRECT DATA TYPE
MISSING SSN FIELD
OUT OF ORDER
{
"start_date": 19358,
"end_date": 0,
"email": "john.doe",
"ssn": "fizzbuzz"
}
// 2023-01-01
Domain integrity ignored
Garbage in, garbage out
SCHEMA
"start_date", "int"
"end_date", "int"
"email", "string"
"ssn", "string"
INVALID EMAIL
INVALID SSN
{
"start_date": 19358,
"end_date": 19722,
"email": "john.doe@example.com",
"ssn": "856-45-6789"
}
Example of high-quality
data
Proper structure and
semantics
// 2023-01-01
// 2023-12-31
SCHEMA
"start_date", "int"
"end_date", "int"
"email", "string"
"ssn", "string"
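The difference between the "garbage in" record and the high-quality one is domain integrity, not just types. As a rough sketch (the regexes below are simplified assumptions, not official email/SSN formats), a domain check on the slide's two records:

```python
import re

# Illustrative domain checks mirroring "proper structure and semantics".
# These patterns are simplified stand-ins, not authoritative formats.
EMAIL_RE = re.compile(r".+@.+\..+")
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")

def is_high_quality(record: dict) -> bool:
    return (record["end_date"] >= record["start_date"]
            and EMAIL_RE.fullmatch(record["email"]) is not None
            and SSN_RE.fullmatch(record["ssn"]) is not None)

good = {"start_date": 19358, "end_date": 19722,
        "email": "john.doe@example.com", "ssn": "856-45-6789"}
bad = {"start_date": 19358, "end_date": 0,
       "email": "john.doe", "ssn": "fizzbuzz"}
print(is_high_quality(good), is_high_quality(bad))  # True False
```

Both records satisfy the type-level schema; only domain rules catch the second one.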
CLEAR TEXT
{
"start_date": 19358,
"end_date": 19722,
"email": "john.doe@example.com",
"ssn": "856-45-6789"
}
Data security gaps
No Data Privacy
// 2023-01-01
// 2023-12-31
SCHEMA
"start_date", "int"
"end_date", "int"
"email", "string"
"ssn", "string"
promises trust
produces
Data contract
structure
Producer
Consumer
consume
Consumer
Consumers
Business
Logic
Business
Logic
semantics
rules
Data Contracts are the
answer
Adding structure,
meaning, and policies
promises trust
produces
Data Contracts
Schema Registry as the foundation
Producer
Schema Registry
Confluent Kafka
Consumer
consume
Consumer
Consumers
Confluent Schema Registry
Schema Repository and more!
promises trust
produces
Data contract
First step: Define the
structure
Structure is the
foundation of the
contract
Producer
Consumer
consume
Consumer
Consumers
structure
{
"type": "record",
"name": "Membership",
"fields": [
{"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "email", "type": "string"},
{"name": "ssn", "type": "string"}
]
}
Example Avro Schema
{
"type": "record",
"name": "Membership",
"fields": [
{"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
{"name": "email", "type": "string"},
{"name": "ssn", "type": "string"},
{"name": "full_name", "type": ["null", "string"]}
]
}
Constant Change
How do we know if this is allowed?
Set Compatibility for Schema Evolution
Compatibility enables predictable changes
COMPATIBILITY TYPE   | ALLOWED CHANGES                              | COMPARED TO           | UPGRADE FIRST
BACKWARD             | Delete fields; add optional fields           | Latest version        | Consumers
BACKWARD_TRANSITIVE  | Delete fields; add optional fields           | All previous versions | Consumers
FORWARD              | Add fields; delete optional fields           | Latest version        | Producers
FORWARD_TRANSITIVE   | Add fields; delete optional fields           | All previous versions | Producers
FULL                 | Add optional fields; delete optional fields  | Latest version        | Either
FULL_TRANSITIVE      | Add optional fields; delete optional fields  | All previous versions | Either
NONE                 | Any change allowed                           | Not compared          | Depends
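To make the BACKWARD row concrete, here is a toy sketch of its semantics — a consumer on the new schema must still be able to read data written with the old one. This is a deliberately simplified model (a field counts as "optional" when flagged as such); the real check is performed by Schema Registry against the full Avro resolution rules.

```python
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """Toy check of the BACKWARD rule: every field the new reader
    requires must either exist in the old schema or be optional
    (i.e. have a default the reader can fall back on)."""
    for name, spec in new_fields.items():
        if name not in old_fields and not spec.get("optional"):
            return False  # new required field: old data has no value for it
    return True  # fields deleted in the new schema are simply ignored

v1 = {"email": {}, "ssn": {}}
v2_ok = {"email": {}, "full_name": {"optional": True}}   # drop + add optional
v2_bad = {"email": {}, "ssn": {}, "full_name": {}}       # add required field
print(is_backward_compatible(v1, v2_ok), is_backward_compatible(v1, v2_bad))
# True False
```

This is why the table says consumers must upgrade first under BACKWARD: the new reader tolerates old data, but not vice versa.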
{
"schema": "...",
"metadata": {
"properties": {
"owner": "Carol Smith",
"email": "csmith@acme.com",
"gdpr_sensitive": "True",
"retention_period": "24"
}
}
}
Adding Metadata
promises trust
produces
Data contract
structure
Proper semantics for data
Enables proper flow of
data
Producer
Consumer
consume
Consumer
Consumers
Business
Logic
Business
Logic
semantics
rules
Use Data Contract Rules
Data Quality
Constrain the values of
fields and customize
follow-up actions on
incompatible messages.
Data Encryption
Identify and encrypt
the value of a field
based on a tag added
to the field.
Data Transformation
Change the value of a
specific field or an entire
message based on a
condition.
Schema Migration
A transform rule that
allows changes that would
otherwise break compatibility,
by adding upgrade and
downgrade rules.
// 2023-01-01
// 2023-12-31
{
"start_date": 19358,
"end_date": 19722,
"email": "john.doe@example.com",
"ssn": "856-45-6789"
}
Domain Integrity
Ensuring proper values
SCHEMA
"start_date", "int"
"end_date", "int"
"email", "string"
"ssn", "string"
Data Quality Rule
{
"schema": "...",
"schemaType": "AVRO",
"ruleSet": {
"domainRules": [{
"name": "validateEmail",
"kind": "CONDITION",
"mode": "WRITE",
"type": "CEL",
"doc": "Rule checks email is well formatted and sends record to a DLQ if not.",
"expr": "Membership.email.matches(r'.+@.+\\..+')",
"onFailure": "DLQ",
"params": {
"dlq.topic": "bad_members"
}
}]
}
}
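What this rule does at serialize time can be sketched in plain Python. This is an illustrative stand-in, not the actual rule engine: `send` below is a hypothetical placeholder for a real Kafka producer call, and the regex mirrors the CEL expression.

```python
import re

EMAIL_RE = re.compile(r".+@.+\..+")  # same pattern as the CEL rule

def produce_with_quality_rule(record: dict, topic: str, dlq_topic: str):
    """Sketch of the serializer-side WRITE rule: evaluate the condition
    and reroute failing records to the DLQ topic (onFailure: DLQ).
    `send` is a hypothetical stand-in for a real producer call."""
    if EMAIL_RE.fullmatch(record["email"]):
        return send(topic, record)
    return send(dlq_topic, record)

sent = []
def send(topic, record):  # fake producer, records what would be sent
    sent.append((topic, record))
    return topic

produce_with_quality_rule({"email": "a@b.com"}, "members", "bad_members")
produce_with_quality_rule({"email": "oops"}, "members", "bad_members")
print([t for t, _ in sent])  # ['members', 'bad_members']
```

The point of the real rule is that this gatekeeping runs in the client serializer, so bad records never reach the topic.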
// 2023-01-01
// 2023-12-31
{
"start_date": 19358,
"end_date": 19722,
"email": "for.mat@preserved.enc",
"ssn": "XXX-XX-6789"
}
Data Privacy
Protecting Sensitive Data SCHEMA
"start_date", "int"
"end_date", "int"
"email", "string"
"ssn", "string"
Data Encryption Rule
{
"schema": "...",
"schemaType": "AVRO",
"metadata": {
"tags": {
"Membership.email": ["PII"]
}
},
"ruleSet": {
"domainRules": [{
"name": "encryptPII",
"type": "ENCRYPT",
"doc": "Rule encrypts every field tagged as PII.",
"tags": ["PII"],
"params": {
"encrypt.kek.name": "ce581594-3115-486e-b391-5ea874371e73",
"encrypt.kms.type": "aws-kms",
"encrypt.kms.key.id": "arn:aws:kms:us-east-1:586051073099:key/ce58..."
}
}]
}
}
On Roadmap
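The mechanism worth noticing in the rule above is tag-driven field selection. The toy sketch below only masks values (real client-side field-level encryption envelope-encrypts them with a data key wrapped by the KMS key named in the rule params); it is meant purely to show how a PII tag picks which fields the rule touches.

```python
# Toy illustration of tag-driven field handling — NOT real encryption.
# The tag map mirrors the "metadata.tags" section of the rule; the ssn
# tag is an added example for illustration.
TAGS = {"Membership.email": ["PII"], "Membership.ssn": ["PII"]}

def apply_pii_rule(record_name: str, record: dict) -> dict:
    out = {}
    for field, value in record.items():
        if "PII" in TAGS.get(f"{record_name}.{field}", []):
            out[field] = "*" * len(str(value))  # stand-in for ciphertext
        else:
            out[field] = value
    return out

print(apply_pii_rule("Membership", {"email": "a@b.com", "start_date": 19358}))
# {'email': '*******', 'start_date': 19358}
```

Because selection is by tag rather than by field name, tagging a new field as PII protects it without touching the rule itself.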
Data Transformation Rule
{
"schema": "...",
"schemaType": "AVRO",
"ruleSet": {
"domainRules": [{
"name": "populateDefaultSSN",
"kind": "TRANSFORM",
"type": "CEL_FIELD",
"doc": "Rule checks if ssn is empty and replaces it with 'unspecified' if it is.",
"mode": "WRITE",
"expr": "name == 'ssn' ; value == '' ? 'unspecified' : value"
}]
}
}
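A rough Python equivalent of this CEL_FIELD rule, for intuition: the guard (`name == 'ssn'`) selects the field, and the expression replaces empty values with `'unspecified'` on write.

```python
# Sketch of the CEL_FIELD transform: applied per field, guarded by the
# field name, substituting a default when the value is empty.
def populate_default_ssn(record: dict) -> dict:
    return {name: ("unspecified" if name == "ssn" and value == "" else value)
            for name, value in record.items()}

print(populate_default_ssn({"email": "a@b.com", "ssn": ""}))
# {'email': 'a@b.com', 'ssn': 'unspecified'}
```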
Schema Migration Rule
{
"schema": "...",
"schemaType": "AVRO",
"ruleSet": {
"migrationRules": [{
"name": "changeSsnToSocialSecurityNumber",
"kind": "TRANSFORM",
"type": "JSONATA",
"doc": "Consumer is on new major version and gets socialSecurityNumber while producer sends ssn.",
"mode": "UPGRADE",
"expr": "$merge([$sift($, function($v, $k) {$k != 'ssn'}), {'socialSecurityNumber': $.'ssn'}])"
}, {
"name": "changeSocialSecurityNumberToSsn",
"kind": "TRANSFORM",
"type": "JSONATA",
"doc": "Consumer is on old major version and gets ssn while producer sends socialSecurityNumber.",
"mode": "DOWNGRADE",
"expr": "$merge([$sift($, function($v, $k) {$k != 'socialSecurityNumber'}), {'ssn': $.'socialSecurityNumber'}])"
}]
}
}
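What the two JSONata expressions accomplish, sketched in Python: rename the field in the direction of travel, so producers and consumers pinned to different major versions keep interoperating.

```python
# Sketch of the UPGRADE/DOWNGRADE migration pair as plain dict renames.
def upgrade(record: dict) -> dict:       # v1 producer -> v2 consumer
    out = {k: v for k, v in record.items() if k != "ssn"}
    out["socialSecurityNumber"] = record["ssn"]
    return out

def downgrade(record: dict) -> dict:     # v2 producer -> v1 consumer
    out = {k: v for k, v in record.items() if k != "socialSecurityNumber"}
    out["ssn"] = record["socialSecurityNumber"]
    return out

v1 = {"email": "a@b.com", "ssn": "856-45-6789"}
print(downgrade(upgrade(v1)) == v1)  # True
```

Because the two rules are inverses, a record can round-trip between major versions without loss — which is what lets an otherwise-breaking rename ship safely.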
contract
validation
trust
Data Contracts
Shift-left and ensure high quality data
Producer
Schema Registry
Confluent Kafka
Consumer
Consumer
Consumers
Serializer
content
validation
consume
prevented
promises
provides
Data Contracts
High quality data for processing
Producer
Schema Registry
Confluent Kafka
Consumer
Consumer
Consumers
Deserializer
content
validation
prevented
contract
validation
Solution overview
Breaking changes
- Use Schema Registry and define schemas
- Evolve them as needed
- Use schema validation
Low quality data
- Utilize validation and transformation rules
- Take advantage of metadata capabilities like tagging and business metadata
Security challenges
- Protect sensitive data using Confluent security and governance capabilities
Demo
Roadmap
● Provide hybrid monitoring, governance
and management in a unified, secure
architecture:
✓ Data Portal Sync
✓ Data Policies
✓ Observability - Metrics and Alerting
✓ Cluster Linking & Mirrored Topics
Hybrid Connected Cloud
Extend Data Streaming Platform capabilities to
self-managed clusters
On Roadmap for 2025
Data Discovery
An Example from Amazon
Discoverable
Owner
Accessible
Contextualized
Standardized
Secure
Trustworthy
Self Describing
Metadata
Search and discover data
using Data Portal
On Roadmap for 2025
Understand the data lineage
On Roadmap for 2025
Stream Catalog
On Roadmap for 2025
Q&A Session
falberton@confluent.io
pventurini@confluent.io
Thank you!